This document describes migrating an application that caches data locally to distributed caching with Oracle Coherence. It provides an overview of a sample dictionary application that initially runs with local caching, walks through running and exploring that application, and then updates it to use Coherence's distributed caching. Along the way it covers the key Coherence concepts involved: the configuration subsystem, cache stores, the distributed Worker implementation, and failover testing.
Migrating Applications from Local to Distributed Caching with Oracle Coherence
Contents

Introduction: Oracle Coherence Hands-On Lab
Base Environment
Quick overview of the application
Running the application with local cache
Explore the application. Understand how it works.
Update application to run with distributed cache
    Configuration Subsystem
    Cache Store
    Distributed Cache Worker implementation
Run application with distributed cache
Test failover and redundancy
Shutdown everything
Introduction: Oracle Coherence Hands-On Lab
In this hands-on lab, you will take an application that loads data from a set of large files, caches the data in memory, and performs complex queries and updates against the cached data, with changes persisted back to the files. You will then modify this application to cache the data in a Coherence data grid and perform these computations against it, learning the common Coherence APIs, configuration files, and usage patterns as you go along. By doing this, you will become familiar with some core Coherence concepts such as the configuration subsystem, the read-through and write-through mechanisms for loading and persisting data to the backend data source, the JMX subsystem, and the use of EntryProcessors for lock-free, concurrent, well-performing updates.
For this hands-on lab, we have built a sample application that simulates the access and update of a set of dictionaries of some outdated (and frankly made up) languages. In the next section, you will get an overview of the application. The initial version of the application will work against the dictionaries loaded up in your application's local JVM. We will then walk you through updating the application to work with the data loaded in a Coherence data grid.
This hands-on lab is focused on Coherence. There is no dependency on a database or an application server. The knowledge you gain here is directly applicable in use cases where Coherence is used within an Application Server or fronting a database.
Base Environment

The lab machine provides you with the following environment, which you will be using in this lab:
- Oracle Coherence 3.5 for Java
- Eclipse 3.5 Galileo
- JRockit 1.6.0

On the desktop, you will find a folder named S309137 Migrating Applications from Local to Distributed Caching with Oracle Coherence. This folder contains a shortcut to the lab you will be working with. The actual lab folder (accessed from the shortcut) contains scripts to assist you with the lab. There is also a solutions directory in there, which contains the solutions for this lab (for reference when you get stuck).
Quick overview of the application
This sample application allows you to access the following information about words in a set of languages:
- description
- synonyms
- antonyms
- daily frequency of use of each word
- the year in which the word became obsolete
The words and their metadata (as described above) are stored in a number of zip files in the working directory. For each language LANG, a zip file called lang-LANG.zip exists and contains an entry for each word.
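If you are curious about the file layout, the standard java.util.zip API is enough to peek at a single word's entry. The sketch below assumes each word has its own zip entry named after the word; the exact entry naming and record format are defined by the lab's Helper and LocalWorker classes, so treat it as illustrative only.

import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipPeek {
    public static void main(String[] args) throws Exception {
        // One zip file per language, e.g. lang-lang1.zip in the working directory.
        ZipFile zip = new ZipFile("lang-lang1.zip");
        // Assumed: each word has an entry named after the word itself.
        ZipEntry entry = zip.getEntry("word100003");
        if (entry != null) {
            InputStream in = zip.getInputStream(entry);
            System.out.println("word100003 entry is " + entry.getSize() + " bytes");
            in.close();
        }
        zip.close();
    }
}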
The application exposes a command line interface with very simple commands to interact with the dictionaries.
The section below shows some typical usage.
Running the application with local cache
For your convenience, batch scripts are provided which allow you to set up your environment and run the application. Hints are given below; double-clicking on a script lets you bypass typing the commands into the cmd shell. Every command must be run with the environment properly set up. You will need to do this for every cmd shell window you open.
set JAVA_HOME=c:\jrmc_3.1
set COHERENCE_HOME=c:\coherence
set CLASSPATH=.;%COHERENCE_HOME%\lib\coherence.jar;%COHERENCE_HOME%\lib\tangosol.jar
set PATH=.;%JAVA_HOME%\bin;%PATH%;
To start, compile the application:
javac -d . app\*.java
Hint: you can just run the app-compile.bat script.
Now generate the dictionaries as zip files containing the words for the different languages. This will take about 5 minutes.
java -Xms1g -Xmx1g app.CreateDictionary
Hint: you can just run the app-create-dict.bat script.
Now run the application using the local cache. Here, the cached data is stored in some Collection objects on the local Java heap.
java -Xms1g -Xmx1500m -Xmanagement app.Main -local
Hint: you can just run the app-local.bat script.
This will open up a prompt as below:
Main>
At the prompt, type help to see the different commands and how to use them:
Main> help
Let us get familiar with the application from a user's point of view. At startup, the dictionary has not been loaded up into memory.
Let us look up information about the word word100003 from the language lang1. Since the dictionary is not loaded up into memory, the application will retrieve the information from the zip files directly, and cache just that result into memory.
Main> show lang1 word100003
Now, update the description associated with the word word100003 from the language lang1:
Main> update -d lang1 word100003 this new description is cool
Run the show command again to ensure that the cache was updated. Also, check the zip file (lang-lang1.zip), look at the entry for word100003, and ensure that the change was written transparently to the backend zip file. You will notice that updates are slow. Unfortunately, Java has poor support for updating zip files, so we had to completely recreate the zip file for each changed word (this is why updates take a while).
Load up the whole dictionary for all languages into memory so that we can do some more complex queries. This may take up to 30 seconds to load all 10 dictionaries.
Main> load
Now, perform a query: search for words in the language lang1 that are synonymous with synonym674.
Main> find lang1 -s synonym674
Let us do a more complex query: search for words in the language lang1 that are synonymous with both synonym674 and synonym1430, but are antonyms of antonym2243.
Main> find lang1 -s synonym674 -s synonym1430 -a antonym2243
For both of these, you got an UnsupportedOperationException because the query support is hard to implement in our local cache. A database and SQL would have made life much easier but we do not have access to that for our sample application. Hint: This is possible and easy using Coherence.
Let us try some more commands.
Look up the languages currently supported in our dictionary:
Main> langs
Now look up the stats of the currently loaded cache. This should show you the number of words currently loaded into the cache for each language.
Main> stats
Explore the application. Understand how it works.
Now, we will go ahead and explore the application to see how it works under the hood. The application is run from the working directory. For each language LANG, there is a zip file in that directory called lang-LANG.zip which contains metadata for each word in the language. Open up lang-lang1.zip and look at its contents to get familiar with the format.
This working directory is already set up as an Eclipse project. Open up Eclipse and import the project into your workspace. From the File menu, select Import. In the dialog box that comes up, open the General node in the tree menu, select Existing Projects into Workspace, and click Next. Click the radio button beside Select root directory and click the Browse button to select the working directory. Select the project oow_hol_coherence and click Finish. Hint: you can just run the eclipse.bat script. The working directory is set up as a pre-configured Eclipse workspace and project.
You should now see the project in your workspace.
The Java source files all reside under the app directory. Please read through them to get familiar with what they do.
- Record.java: encapsulates a word in a language.
- Main.java: the main command line interface that parses the user input and calls the appropriate commands.
- Worker.java: an interface that allows us to decouple the implementation of the cache from the application.
- LocalWorker.java: an implementation of Worker that uses a local hash map to store the data and (tries to) manage the computation itself.
- Helper.java: contains some shared helper functions.
- Metrics.java: a table model containing the languages and the memory used for each language within the cache.
- Monitor.java: a Swing UI which regularly gets updated metrics from the cache and displays them as a Swing table.
Once you have a good understanding of the application, feel free to go back to the previous section and run the lab again. The rest of the lab will go a lot more smoothly once you get the hang of what the application is trying to accomplish.
Update application to run with distributed cache
Now, the lab gets more interesting. We will walk through a number of steps to update the application so it runs against the Coherence Data Grid.
In this lab, we aim to achieve the following using Coherence:
- Store the cached data in multiple external JVMs that appear to us as one giant local heap with quick access
- Transparently read records from the cache, even if they have not yet been loaded into Coherence
- Transparently let the cache persist updates to the backend data source
- Perform queries and computation on the Coherence cache
- Configure the Coherence cluster so that other users on the network do not conflict with us
- Efficiently make updates with the minimum number of network hops
- Monitor the application (including the distributed cache) transparently from a single location
To achieve this, we will understand and leverage the following Coherence concepts:
- Invocable Maps and Entry Processors
- JMX functionality
- Read-through and write-through caching strategies
- The configuration subsystem
- Coherence services
- Partitioned (distributed) and near cache topologies
First, let us define an architecture strategy. In general, Coherence supports a partitioned cache, where a number of backups are stored on one or more members in the data grid for failover. In addition, a near cache can wrap a partitioned cache, so that a copy of recently accessed data is also kept in each local JVM (allowing network access to be bypassed). We will leverage both mechanisms for our caching.
The actual words and their metadata are stored in the partitioned (distributed) cache. There will be a different named cache used for each language. For example, language lang1 will be stored in the cache called dict-lang1. The list of available languages and some other state we need will be stored in the near cache (called app-shared). This near cache will actually wrap a different partitioned cache.
We will also use a CacheStore so that read requests to the cache will transparently go to the backend when a cache miss happens, and updates will also transparently go to the backend zip files when we want them to.
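To make the read-through and write-through idea concrete before we get to the implementation, here is a minimal sketch of what such a cache store can look like. The real class is app\DistCacheStore.java (covered below and provided in the solutions directory), and the Helper methods shown here are hypothetical stand-ins for the lab's actual zip-file I/O.

package app;

import com.tangosol.net.cache.AbstractCacheStore;

// Sketch only: the real class is app\DistCacheStore.java (see the solutions
// directory). Helper.readRecordFromZip / writeRecordToZip are hypothetical
// stand-ins for the lab's zip-file I/O code.
public class DictCacheStoreSketch extends AbstractCacheStore {
    private final String language;

    // Coherence passes the cache name (e.g. "dict-lang1") as a constructor parameter.
    public DictCacheStoreSketch(String cacheName) {
        this.language = cacheName.substring("dict-".length());
    }

    // Read-through: called on a cache miss to fetch the word from lang-<LANG>.zip.
    public Object load(Object key) {
        return Helper.readRecordFromZip(language, (String) key);
    }

    // Write-through: called after a cache update to persist the word back to the zip.
    public void store(Object key, Object value) {
        Helper.writeRecordToZip(language, (String) key, (Record) value);
    }
}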
Configuration Subsystem

Coherence is an extensively configurable system.
At startup, Coherence will find the file tangosol-coherence-override.xml on your classpath. This configuration is for system-wide settings. In this file, you can configure things like your cluster address, cluster multicast port, cluster name, etc.
Copy tangosol-coherence-override.xml from the solutions directory into your working directory. Use the configuration that restricts the cluster to your local machine (so you do not conflict with other users on the network), and also define a separate distributed cache service for storing the shared state (e.g. languages).
Once Coherence gets a request to look up a cache, it will load the coherence-cache-config.xml file. You should configure your caches here, setting up those that will use the distributed (partitioned) scheme separately from those that should use the near scheme. Copy coherence-cache-config.xml from the solutions directory into your working directory, and ensure the following:
- Two separate cache schemes are defined, one for the dictionary caches (using a distributed scheme) and one for the shared cache (using a near scheme).
- A naming convention is used where caches with names matching dict-* are mapped to the distributed scheme, while the cache app-shared is mapped to the near scheme.
- A cache store is configured which takes the cache name as a parameter. The cache store should apply only to the dict-* caches, since these are the only ones that need to load or persist data to the backend zip files.
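Nothing in the application code refers to these schemes directly; the mapping happens purely by cache name. As a small illustration (not part of the required lab code), this is how the application picks up those mappings at runtime:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CacheLookupSketch {
    public static void main(String[] args) {
        // Matches the dict-* mapping, so it is backed by the partitioned scheme
        // (and, through the configuration, by the cache store).
        NamedCache dict = CacheFactory.getCache("dict-lang1");

        // Matches the app-shared mapping, so it is backed by the near scheme.
        NamedCache shared = CacheFactory.getCache("app-shared");

        System.out.println("dict-lang1 holds " + dict.size() + " entries");
        System.out.println("app-shared holds " + shared.size() + " entries");

        CacheFactory.shutdown();
    }
}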
Now that we have gotten the configuration out of the way, let us work on the actual application.

Cache Store

First, create the CacheStore, which does the work of reading data during a cache miss and persisting data after a cache update. We want an explicit update to the cache to write through to the zip files, but not a bulk load; in other words, not every cache put should write back to the zip files. We can control this by using a variable stored in the cache itself. The shared cache (app-shared) will be used to store this variable: during a bulk load, we will set the variable in the cache and remove it once the bulk load is done. The cache store will not do a write-through to the backend while the variable is present; any other cache put will result in a write-through to the backend. Look at the way app\LocalWorker.java reads and writes individual records to/from the zip files, and implement that same logic in app\DistCacheStore.java. Hint: look at the solutions directory for how this is done.

Distributed Cache Worker implementation

Next, create the actual implementation of the Worker interface, similar to the app\LocalWorker.java implementation, that can interact with Coherence. Call this app\DistWorker.java. In this implementation, we need to store all state in the Coherence cache, so that anyone in the Coherence cluster can access it. This includes:
- languages
- dictionaries (words)
Implement the following methods, with guidelines below:
- getLanguages: store the list of languages in the app-shared cache and retrieve it from there as needed.
- getRecord: simply retrieve the word from the cache. Coherence will check the backend zip file if it does not have it, since the cache store has been configured.
- update: one way to implement this would be to retrieve the Record from Coherence to your local JVM, make the updates on the local JVM, and then send the updated Record back to Coherence. However, this can cause unnecessary network traffic, which could become a bottleneck if the records are large (especially compared to the change you want to make). A more efficient way is to send your update directly to the Coherence node that hosts the data; you achieve this using EntryProcessors and the InvocableMap (a sketch appears after this list). In addition, this method must be smart enough to write through to the backend zip files only when requested. This can be done with the variable in the shared cache described above: the CacheStore only persists to the zip files when the variable allows it.
- bulkInsert: for large uploads, it is more efficient to upload to Coherence in batches. Coherence has a putAll API which can be used for this.
- find: unlike app.LocalWorker (where implementing complex queries without a database is difficult and left unimplemented), Coherence has extremely powerful query functionality which can easily simulate complex SQL queries: the Coherence Filters API. An added advantage is that Coherence runs your query in parallel across all the nodes in the cluster and returns your results faster. The more Coherence nodes you have, the less data each one holds and the less work each one does, so your query performance scales linearly with the number of Coherence nodes.
- updateMetrics: Coherence has an extensive JMX feature set, where all the management information can be federated into any number of nodes in the Coherence cluster that you deem should hold it. We will leverage this to keep track of the number of entries in the cache, how these entries are distributed in the Coherence data grid, and how memory is used across the grid.
- clear: remove all entries stored in the cache for a given language.
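Below is the EntryProcessor sketch referred to in the update guideline above. It assumes a setDescription method on Record (the lab text does not spell out Record's accessors); the full implementation is in the solutions directory.

package app;

import java.io.Serializable;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// Sketch only: ships a small description change to the node that owns the entry,
// instead of pulling the whole Record across the network and pushing it back.
public class UpdateDescriptionProcessor extends AbstractProcessor implements Serializable {
    private final String description;

    public UpdateDescriptionProcessor(String description) {
        this.description = description;
    }

    // Runs inside the grid, against the entry for the word being updated.
    public Object process(InvocableMap.Entry entry) {
        Record record = (Record) entry.getValue();
        record.setDescription(description);   // assumed setter on Record
        entry.setValue(record);                // re-store the modified value
        return null;
    }

    // Client side: invoke the processor against a single key.
    public static void main(String[] args) {
        NamedCache dict = CacheFactory.getCache("dict-lang1");
        dict.invoke("word100003", new UpdateDescriptionProcessor("this new description is cool"));
        CacheFactory.shutdown();
    }
}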
Look in the solutions directory for the full solution for the DistWorker.java implementation.
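For a flavor of the Filters API that find uses, the sketch below composes the complex query we tried earlier against the local cache. The getSynonyms and getAntonyms accessor names on Record are assumptions here.

import java.util.Set;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.extractor.ReflectionExtractor;
import com.tangosol.util.filter.AndFilter;
import com.tangosol.util.filter.ContainsFilter;

public class FindSketch {
    public static void main(String[] args) {
        NamedCache dict = CacheFactory.getCache("dict-lang1");

        // Words synonymous with both synonym674 and synonym1430, and antonymous
        // with antonym2243 (the same query the local cache could not answer).
        AndFilter filter = new AndFilter(
                new AndFilter(
                        new ContainsFilter(new ReflectionExtractor("getSynonyms"), "synonym674"),
                        new ContainsFilter(new ReflectionExtractor("getSynonyms"), "synonym1430")),
                new ContainsFilter(new ReflectionExtractor("getAntonyms"), "antonym2243"));

        // The query is executed in parallel across the storage-enabled members.
        Set results = dict.entrySet(filter);
        System.out.println("Matching entries: " + results.size());

        CacheFactory.shutdown();
    }
}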
Finally, update the command line interface app\Main.java so that the -dist command line parameter switches to the app.DistWorker implementation. Do this by un-commenting the line that instantiates DistWorker below.
public static void main(String[] args) throws Exception {
    Main m = new Main();
    m.worker = new LocalWorker();
    for (int i = 0; i < args.length; i++) {
        if (args[i].equals("-dist")) {
            //m.worker = new DistWorker();
        }
    }
    m.run();
}
Run application with distributed cache
Compile your application as was done in one of the prior sections, by opening up a cmd shell, setting up your environment and running javac. Hint: You can run the app-compile.bat script
Now, start up three Coherence cache servers, using the command line:
java -Xms256m -Xmx512m -Dtangosol.coherence.management.remote=true com.tangosol.net.DefaultCacheServer
Hint: you can just run the coherence-cache-server.bat script. Double-click it three times to start 3 servers.
This will start up a Coherence cache server configured to expose its management (JMX) information around the cluster.
Now run the application using the distributed cache implementation, using the command line:
java -Xms256m -Xmx512m -Xmanagement -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true -Dtangosol.coherence.distributed.localstorage=false app.Main -dist
Hint: you can just run the app-dist.bat script.
This will start the command line interface to your application, making that JVM a Coherence cluster member which does not store data, but which collects management information from all the Coherence cluster members and stores it in its JMX MBeanServer. You will thus be able to see all the management information from the whole Coherence cluster within your local JVM. Note that once the data has been loaded into Coherence, you can run multiple clients and have them all share the distributed in-memory data grid.
At the prompt, run stats, which will pop up a Swing UI that updates itself as the Coherence cluster membership and contents change. From this Swing table, you can see the membership of the Coherence cluster and how much of the data each member holds, in near real time (the Swing UI updates itself every 5 seconds).
Main> stats
As mentioned above, this Swing table updates itself every five seconds with the JMX information that has been federated to your local JVM. Each row represents the amount of data stored by one Coherence server on behalf of the cluster, and the memory being used in MB. For example, you may see that the Coherence JVM with node id 1 is the primary store for 10089 words of lang2, 10075 words of lang1, and 9924 words of lang3, and uses 114 MB of memory, while other rows show how much the Coherence JVMs with node ids 2 and 4 hold. As you add or shut down Coherence cache servers, and even as you bulk-load the records into Coherence, monitor this Swing UI and see how Coherence automatically distributes the cached data. Look at app\DistMonitor.java for the full implementation. You can also open up JConsole or JRMC to look at the rich set of JMX management metrics exposed by Coherence. Hint: you can run the jrockit-mc.bat script.
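If you would like to pull the same numbers programmatically rather than through the Swing UI, something along the following lines works against the federated MBeans. The Coherence:type=Cache object name pattern and the Size and Units attributes are standard Coherence cache MBean names, but verify them in JConsole for your version before relying on them.

import java.lang.management.ManagementFactory;
import java.util.Set;

import javax.management.MBeanServer;
import javax.management.ObjectName;

public class CacheMBeanSketch {
    public static void main(String[] args) throws Exception {
        // The -dist client federates the cluster's management data into its own MBeanServer.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();

        // One cache MBean per storage-enabled node for each named cache.
        Set<ObjectName> names =
                server.queryNames(new ObjectName("Coherence:type=Cache,name=dict-lang1,*"), null);

        for (ObjectName name : names) {
            Object size  = server.getAttribute(name, "Size");   // entries held by that node
            Object units = server.getAttribute(name, "Units");  // memory units consumed
            System.out.println(name + " size=" + size + " units=" + units);
        }
    }
}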
Once that is done, run the other commands as shown in the previous sections. A sampling of those commands is below.
Main> help
Main> langs
Main> show lang1 word100003
Main> update -d lang1 word100003 this new description is cool
Main> load
Main> find lang1 -s synonym674
Main> find lang1 -s synonym674 -s synonym1430 -a antonym2243
Main> exit
To show the beauty of this distributed solution, run another instance of the client. At the prompt, just run the find command. You will see that it works: a new client does not have to load up the data, since all the data and state is stored externally in the distributed cache. Hint: you can run the app-dist.bat script.
Test failover and redundancy
You can test failover and redundancy of your Coherence implementation by shutting down some instances and bringing others back up. You can do this by typing Ctrl-C in some of the Coherence cache server windows and/or starting some new Coherence servers. Hint: you can use the coherence-cache-server.bat script.
Watch the windows of the Coherence cache servers that are still live and notice how the Coherence cluster automatically rebalances the data. Watch the Swing UI to see updated stats on how the data is rebalanced and how memory is used in the cache servers.
Shutdown everything
To shut down everything, type Ctrl-C in all your open windows.