Summary
Some enterprise applications must go beyond simple caching tools and rely on data grids for low-latency, highly available data access. In a recent blog post, Oracle's Cameron Purdy explains how data grids differ from databases and other forms of caches.
Rapid and reliable access to data is an important requirement of many enterprise applications. Caching frequently accessed data in memory can speed up data access by several orders of magnitude. Indeed, in-memory caches are integral parts of some relational database products, including open-source ones such as MySQL and PostgreSQL.
Some applications, however, require very low-latency, high-throughput data access, and must go beyond simple in-memory caches. Such applications typically rely on data grids, which differ from other forms of in-memory caches in important ways. In a recent blog post, "Defining a Data Grid," Oracle's Cameron Purdy explains what those differences are and how they relate to enterprise application architecture:
[A data grid] is a system composed of multiple servers that work together to manage information and related operations – such as computations – in a distributed environment.
[An] in-memory data grid is a data grid that stores the information in memory in order to achieve very high performance, and uses redundancy – by keeping copies of that information synchronized across multiple servers – in order to ensure the resiliency of the system...
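The redundancy idea in the definition above can be illustrated with a minimal sketch. It is not how Oracle Coherence or any particular product implements replication; the `Node` and `ReplicatedGrid` classes, the one-backup policy, and the "next node is the backup" rule are all assumptions made for illustration.

```python
import zlib


class Node:
    """One server in the hypothetical grid; holds entries in a plain dict."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.alive = True


class ReplicatedGrid:
    """Toy grid: every write goes to a primary node and to one backup node."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def _owners(self, key):
        # A stable hash picks the primary; the backup is simply the next node.
        i = zlib.crc32(key.encode()) % len(self.nodes)
        return self.nodes[i], self.nodes[(i + 1) % len(self.nodes)]

    def put(self, key, value):
        # Keep both copies synchronized on every write.
        for node in self._owners(key):
            if node.alive:
                node.store[key] = value

    def get(self, key):
        # Read from the primary if it is up, otherwise from the backup.
        for node in self._owners(key):
            if node.alive and key in node.store:
                return node.store[key]
        raise KeyError(key)


grid = ReplicatedGrid(["a", "b", "c"])
grid.put("cart:42", {"items": 3})
primary, backup = grid._owners("cart:42")
primary.alive = False                        # simulate a server failure
assert grid.get("cart:42") == {"items": 3}   # still served from the backup copy
```

A real grid would also have to re-create the lost copy on a surviving server after a failure, which this sketch leaves out.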
Purdy points out a key difference between a data grid and a regular database cache: while a database, and even an in-memory cache associated with a database, typically holds data in row-oriented form, a data grid holds data in the application's native object form, such as Java or C# objects:
Application objects are the actual components of the application that contain information that is shared across multiple servers and that must survive server failure in order for the application to be continuously available...
[Application objects] are shared across multiple servers because a middleware application (such as eBay and Amazon.com) is horizontally scaled by adding servers, with each server running an instance of that application. Since the application instance running on one server may read and write some of the same information that an application instance running on another server reads and writes, that information must be shared.
Another data grid characteristic Purdy points out is that data grids are optimized for low-latency access. Relational databases, by contrast, are often optimized for easy access via SQL:
[A data grid provides] low response times for data access by keeping the information in-memory and in the application object form, and by sharing that information across multiple servers...
In other words, applications may be able to access the information that they require without any network communication and without any data transformation step such as ORM.
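To make the "no data transformation step" point concrete, the following sketch contrasts reading a database-style row, which must be mapped into an object, with looking up an object that is already stored in application form. The `Customer` class, the row layout, and the `local_partition` dictionary are hypothetical stand-ins, not part of any real grid API.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    name: str
    credit_limit: float


def customer_from_row(row):
    # Database-style access: each row read must be converted into an object.
    return Customer(id=row[0], name=row[1], credit_limit=float(row[2]))


row = (7, "Ada", "2500.00")        # hypothetical result row
mapped = customer_from_row(row)    # transformation step on every read

# Grid-style access: the object itself is stored and looked up by key.
local_partition = {7: Customer(7, "Ada", 2500.0)}  # stand-in for in-process grid storage
cached = local_partition[7]        # no network hop, no mapping

assert mapped == cached
```

Purdy's "may" matters here: the lookup avoids the network only when the entry happens to live in the same process, as it does in this sketch.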
Finally, Purdy notes several performance optimizations that lower the cost of network access to data, as well as some that provide high availability and clustering in a data grid:
Oracle Coherence [data grid] employs a ... sophisticated clustering protocol that can achieve wire speed throughput of information on each server, allowing the aggregate flow of information to increase linearly with the number of servers...
By partitioning the information, as servers are added each one assumes responsibility for its fair share of the total set of information, thus load-balancing the data management responsibilities into smaller and smaller portions...
By combining the wire speed throughput and the partitioning with automatic knowledge of the location of information within the Data Grid, Oracle Coherence routes all read and write requests directly to the servers that manage the targeted information, resulting in true linear scalability of both read and write operations; in other words, high throughput of information access and change.
For queries, transactions and calculations, particularly those that operate against large sets of data, Oracle Coherence can route those operations to the servers that manage the target data and execute them in parallel.
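The partitioning, direct routing, and parallel execution described in these excerpts can be sketched roughly as follows. This is an illustration under stated assumptions, not Coherence's actual protocol: the fixed partition count, the round-robin assignment of partitions to servers, the `PartitionedGrid` class, and the use of threads to stand in for separate servers are all invented for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

PARTITIONS = 12  # fixed partition count; an assumption for this sketch


def partition_of(key):
    # Stable hash, so every server computes the same partition for a key.
    return zlib.crc32(key.encode()) % PARTITIONS


class PartitionedGrid:
    def __init__(self, servers):
        self.servers = list(servers)
        self.data = {s: {} for s in self.servers}   # per-server storage

    def owner(self, key):
        # Partitions are dealt round-robin to the servers.
        return self.servers[partition_of(key) % len(self.servers)]

    def put(self, key, value):
        self.data[self.owner(key)][key] = value      # routed straight to the owner

    def get(self, key):
        return self.data[self.owner(key)][key]

    def partitions_per_server(self):
        counts = {s: 0 for s in self.servers}
        for p in range(PARTITIONS):
            counts[self.servers[p % len(self.servers)]] += 1
        return counts

    def parallel_sum(self):
        # Each server sums its own entries; the partial results are combined.
        with ThreadPoolExecutor(max_workers=len(self.servers)) as pool:
            partials = pool.map(lambda s: sum(self.data[s].values()), self.servers)
            return sum(partials)


grid = PartitionedGrid(["s1", "s2", "s3"])
for i in range(100):
    grid.put(f"order:{i}", i)
print(grid.partitions_per_server())  # {'s1': 4, 's2': 4, 's3': 4}
print(grid.parallel_sum())           # 4950
```

Constructing the grid with a fourth server drops each server's share from four partitions to three, which is the "smaller and smaller portions" effect. The sketch does not move existing data when membership changes; a real grid has to rebalance partitions as servers join or leave.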
What do you think of Purdy's explanation of data grids?