Turbo-Charging Applications with Mid-Tier Distributed Caching
Fast and predictable data access
By: Tim Middleton
Feb. 21, 2008 12:00 PM
Keeping data in the mid-tier ensures fast access, but poses a number of challenges in scalability and manageability.

Mid-tier caching or "data grid" solutions bring the data closer to applications, thereby removing load from the database. This lets applications scale linearly through the addition of commodity hardware, ensuring that the architecture can satisfy ever-increasing transaction throughput and availability demands.

This article is intended for application developers and software architects who would like to understand what a data grid is, what types of applications and industries can benefit from the technology, how it works under the covers, and what to consider when deploying such a solution.
Mid-tier Object Caching and Data Grid Solutions

Solutions such as Oracle Coherence, GigaSpaces XAP, and IBM ObjectGrid use slightly different methods for solving these issues in a mid-tier caching solution, but they generally rely on their clustering technology. Although caching is often the initial reason for adopting these technologies, their capabilities go far beyond simple data caching. The enhanced functionality provided by these solutions qualifies them as data grids.

Applications that require extremely fast and reliable access to data, massive parallelization of processing, predictable scalability, and extreme event processing capabilities can benefit greatly from data grid solutions. Industries that use these solutions include financial services (for online stock trading), airline and accommodation (for online search aggregation sites), telecommunications, and online gaming.

Data grid architecture typically comprises many commodity, multi-core machines with multiple Java Virtual Machines (JVMs) running on each machine. High-speed switched networks connect these machines, and clients connect to the data grid to process and manipulate data. Machines can be added to the grid as required, without bringing down or reconfiguring each grid server or client.

Data grid solutions provide a reliable, performant, scalable data tier that fits nicely with existing clustering solutions for objects such as HttpSession and EJB stateful session beans. In existing application server implementations, the client state represented in these artifacts is usually serialized and written to a back-end replication channel, which could be the data grid rather than the database, even though the data grid is itself "backed up" by the database.
Under the Covers

Some of these extensions include querying and aggregating data seamlessly across the data grid, grid-style data processing (sending the processing to the data), data locking and transactions, real-time events, and many more. Table 1 outlines some of these capabilities.

Typical use of data in these solutions involves accessing and updating it via the Map interface. For example, to create a new Customer object and put it in the cache, you do the following:
    NamedCache customers = CacheFactory.getCache(Customer.CACHENAME);

To access and then update the information from the cache:
    Customer myCust = (Customer)customers.get(key);

In this example we used a single Customer object in the data grid. Taking advantage of the distributed and parallel nature of the data grid lets us efficiently load information from the back-end data stores, making the data available to applications in a timely fashion.

In the mid-tier we work with data in object form: POJOs, PONOs, and POCOs. Typical usage scenarios require access to this cached data as well as grid-style processing. It's also important to be able to access this data seamlessly from object-oriented platforms and languages such as .NET (C#), C++, and Java.

Real-time event processing is particularly useful in the grid because it lets developers design and implement extremely fast event propagation across different applications. It also lets clients monitor changing data from, say, a .NET client on a workstation.

Most data grids manage data in one of two ways: federation or clustering. Federation is the more traditional approach and requires that the entire application be partitioned top to bottom (that is, a transaction can only work with data in a single partition). Because few applications can be fully partitioned, explicit messaging (for example, via a JMS API) is usually required to stitch the partitions manually into a cohesive whole. This messaging is typically asynchronous, so transactions will appear on different partitions at different points in time, depending on the degree of system load.

Clustering uses a cluster management protocol to ensure that data integrity is maintained, so it can provide a consistent view of the data across multiple servers. This eliminates the need to stitch together multi-partition operations and provides a "flat" view of the data grid. Clustering also increases data resiliency because there's no dependency on client-side timeouts.
This means that data can be failed over as soon as the underlying server fails, allowing new backups to be made immediately in case there's a subsequent failure.
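
The failover behavior described for clustering can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions, not how Oracle Coherence or any other product implements clustering: the class `ToyGrid` and its methods (`put`, `get`, `failNode`, `primaryNode`, `backupNode`) are hypothetical names invented for this illustration, "nodes" are plain in-memory maps, and there is no networking or concurrency. Each entry has a primary copy on one node and a backup copy on another. When a node fails, surviving backups are promoted and new backups are created immediately, so the data can survive a subsequent failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model: every key has a primary copy and, if possible, one backup copy. */
class ToyGrid {
    private final List<Map<String, String>> stores = new ArrayList<>(); // one store per node
    private final boolean[] alive;                                      // liveness per node
    private final Map<String, Integer> primaryOf = new HashMap<>();     // key -> primary node
    private final Map<String, Integer> backupOf = new HashMap<>();      // key -> backup node, -1 if none

    ToyGrid(int nodeCount) {
        alive = new boolean[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            stores.add(new HashMap<>());
            alive[i] = true;
        }
    }

    /** First live node at or after the key's hash slot, skipping 'exclude'; -1 if there is none. */
    private int pickNode(String key, int exclude) {
        int start = Math.floorMod(key.hashCode(), alive.length);
        for (int i = 0; i < alive.length; i++) {
            int node = (start + i) % alive.length;
            if (alive[node] && node != exclude) return node;
        }
        return -1;
    }

    /** Writes the value to the primary and, when one exists, to the backup. */
    void put(String key, String value) {
        if (!primaryOf.containsKey(key)) {
            int p = pickNode(key, -1);
            if (p < 0) throw new IllegalStateException("no live nodes");
            primaryOf.put(key, p);
            backupOf.put(key, pickNode(key, p));
        }
        stores.get(primaryOf.get(key)).put(key, value);
        int b = backupOf.get(key);
        if (b >= 0) stores.get(b).put(key, value);
    }

    /** Reads always go to the current primary; returns null for unknown keys. */
    String get(String key) {
        Integer p = primaryOf.get(key);
        return p == null ? null : stores.get(p).get(key);
    }

    /** Simulates a server failure: promote backups, then re-create missing backups at once. */
    void failNode(int failed) {
        alive[failed] = false;
        stores.get(failed).clear();
        for (String key : new ArrayList<>(primaryOf.keySet())) {
            int p = primaryOf.get(key);
            int b = backupOf.get(key);
            if (p == failed) {
                if (b < 0) {              // no surviving copy: the entry is lost
                    primaryOf.remove(key);
                    backupOf.remove(key);
                    continue;
                }
                p = b;                    // promote the backup to primary
                primaryOf.put(key, p);
                b = failed;               // the backup slot now needs refilling
            }
            if (b == failed) {
                b = pickNode(key, p);     // may be -1 when only one node is left
                backupOf.put(key, b);
                if (b >= 0) stores.get(b).put(key, stores.get(p).get(key));
            }
        }
    }

    // Accessors for keys that are present in the grid.
    int primaryNode(String key) { return primaryOf.get(key); }
    int backupNode(String key) { return backupOf.get(key); }

    public static void main(String[] args) {
        ToyGrid grid = new ToyGrid(3);
        grid.put("cust-1", "Alice");
        grid.failNode(grid.primaryNode("cust-1")); // first failure: backup is promoted
        System.out.println(grid.get("cust-1"));
        grid.failNode(grid.primaryNode("cust-1")); // second failure: the new backup takes over
        System.out.println(grid.get("cust-1"));
    }
}
```

In this model the entry stays readable after two successive failures in a three-node grid because a replacement backup is created as part of handling the first failure rather than after a timeout. A real data grid would also have to detect the failure, coordinate ownership across the cluster, and handle concurrent updates, none of which is modeled here.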