
Today we continue exploring various interesting technologies, their unusual applications, or simply original ideas. You may remember that I once wrote about the distributed cache project EHcache for the Java platform. Today it is time to return to this topic, but from a different perspective: as a standalone RESTful server.
First, a few words about EHcache. It is a high-performance, scalable caching system for Java, and a mature, serious project. Caching is available both in RAM and on disk, as well as in combined strategies (there is also an option to preserve data integrity across JVM or server restarts). Scalability is implemented through asynchronous replication and cache clustering using
JGroups, JMS or RMI; you can also build distributed systems on top of third-party products (Terracotta). What I like most is the flexibility: individual settings for each cache (synchronous or asynchronous replication) combined with the ability to run several different instances within one JVM (or in different ones). It should be noted that EHCache stores data in the memory of the JVM process, which imposes some restrictions on its volume (on 32-bit systems), but nobody has cancelled the disk cache, and serious servers are 64-bit these days; installations with 20 GB of data and more are known. In addition, multithreading, concurrent access and multi-core hardware are well supported (although, admittedly,
JBoss Cache seems to be even better in this regard, supporting transactions and some other goodies, but its API is rather hard to grasp - I failed to figure it out in two attempts, while EHcache worked right away).
The cache is very fast and beats other caching systems in published benchmarks (in clusters of 4 machines JBoss Cache shows a slightly better result, but that is a somewhat different system), including the most popular one, Memcached. I know the comparison is not quite fair, since EHcache is an in-process cache while memcached is a separate daemon working as an external network service (and therefore has no such restrictions on cache size). Still, if we do compare them, EHcache looks preferable thanks to much greater flexibility and scalability, the memory/disk combination and fine-grained cache tuning. And this is where the thought came to me: is it possible to use EHcache instead of memcached (or together with it) while staying on the PHP platform I am used to? Yes, you can!
The thing is that, in addition to the cache itself, the developer community has also implemented a caching server that can be accessed through a REST or SOAP interface. It is built on the embedded version of the Sun GlassFish v3 Prelude server and is self-contained, including all the necessary components and dependencies. The server is accessed over HTTP using the GET/POST/PUT/DELETE/OPTIONS/HEAD methods, or via SOAP (also on top of HTTP) in XML. All HTTP/1.1 features are supported, including keep-alive as well as Last-Modified and ETag - that is, the server sends all the correct headers, so the caching built into intermediate nodes or into the client itself can often be used. An interesting point is the ability to work with several data formats: to be precise, you can get the response in XML or JSON format, for which it is enough to set the correct MIME-type header in the request.
So, literally in one click, we can launch a caching server accessible via a simple and understandable protocol, and talk to it from any language or platform that supports HTTP requests. Let's try!
You can download the latest version of the server from SourceForge, but I recommend downloading the latest version of the cache itself in parallel and then updating the server files, since the server bundles a previous version of the cache. We are interested in ehcache-standalone-server, currently at version 0.7.
The distribution already contains scripts to run it (see the README); alternatively, go to the lib directory and launch it manually:
java -jar ./ehcache-standalone-server-0.7.jar 8080 ../war
After the main server jar come the port number it will listen on and the path to the directory with the web application (the exploded war). After launching, the console shows startup progress as well as information about running services and ports - for example, this is how I find out which port the JMX management service is available on. Since the embedded version of the GlassFish server is used (it is both a web server and an application server), its settings and capabilities are limited, but you can always deploy a full-fledged server (not necessarily even GlassFish) and then use only the EHcache server part, which is also available separately, without a web server.
By default, the caches are available at /ehcache/rest - opening it in a browser or performing a GET request returns an XML document describing all the current cache settings. The configuration file initially describes several caches, including a couple of distributed ones. To get started, it is best to delete all of those and create your own caches; we will now make a simple cache without replication.
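Since the interface is plain HTTP, everything can be checked from the command line. A minimal smoke test with curl (a sketch; it assumes the standalone server started above is running and listening on localhost:8080):

```shell
# Ask the cache manager for the list of caches; the response is an XML
# document describing every configured cache and its settings.
curl -s http://localhost:8080/ehcache/rest/
```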
All cache settings are concentrated in one XML file, /war/WEB-INF/classes/ehcache.xml, which we will edit. It contains plenty of comments describing all the options, so I will only briefly show how to make a basic cache so we can continue the experiments.
What these options mean:
- name - the name of the cache, used for access (it appears in the URL, so the same character restrictions apply; ideally a short name that clearly denotes the type of stored data). One server can host many caches with different names.
- maxElementsInMemory - the maximum number of elements that are placed in memory
- maxElementsOnDisk - the maximum number of items on the disk (0 - without restrictions)
- eternal - indicates that the lifetime settings should be ignored; elements then stay in the cache until you delete them manually
- overflowToDisk - indicates whether items can be preempted to disk if the maximum number of objects in memory has been reached
- timeToIdleSeconds - the time from the last access to an object until it is considered invalid (unless the cache is eternal). Optional parameter
- timeToLiveSeconds - the element lifetime (since version 1.6, if I am not mistaken, this option can also be set per individual cache object)
- diskPersistent - indicates that the cache state is kept on the disk between restarts
- diskExpiryThreadIntervalSeconds - the frequency of starting the process of checking objects on a disk for expiration of a TTL (lifetime).
- diskSpoolBufferSizeMB - the size of the buffer allocated to the cache for spooling writes to disk. When the buffer fills up, its contents are flushed to disk asynchronously.
- memoryStoreEvictionPolicy - the strategy for deciding which cache objects should be evicted to disk. It can be LRU (Least Recently Used, by date of last use), FIFO (First In First Out: first added, first evicted) or LFU (Least Frequently Used, by frequency of use)
We will not touch the replication options for now - that is an in-depth topic better covered by more competent specialists.
<cache name="testRestCache"
    maxElementsInMemory="10000"
    eternal="true"
    timeToIdleSeconds="0"
    timeToLiveSeconds="0"
    overflowToDisk="true"
    diskSpoolBufferSizeMB="4"
    maxElementsOnDisk="1000000000"
    diskPersistent="true"
    diskExpiryThreadIntervalSeconds="3600"
    memoryStoreEvictionPolicy="LFU"
/>
So, our cache is configured to keep 10 thousand elements in memory and up to a billion on disk, to ignore the lifetime settings, and to preserve data between restarts. I chose a rather small disk-write buffer, and a very large interval for the disk expiry check (ideally I would like to see whether it can be disabled altogether). What is this configuration for? I want to try building a simple key-value database on top of this cache (a very popular topic today), while also giving myself direct access to the cache both from external services and from inside a PHP web application. One caveat: even if you do not need lifetime checks and want a persistent cache, do not remove the timeToIdle/timeToLive parameters altogether - specify them explicitly (as 0 here), otherwise the server may not start (or rather the cache service; the server itself starts but returns a 404 error).
To test, save the edited ehcache.xml file and restart the server, then open this URL in your browser: http://localhost:8080/ehcache/rest/testRestCache. You should get an XML document with all the cache settings as well as current usage statistics (size, amount of data, hit and miss percentages) - this can later be parsed programmatically and displayed in any desired form (for example, in an admin panel).
From here on I will only consider the REST part; to work via SOAP, change rest in the URL to soap, fetch the service description in WSDL format, and so on. For performance I simply disabled everything unused, including the unneeded caches and the SOAP access. The servlet settings are in the web.xml file in the /war/WEB-INF directory.
Working with the cache consists of sending HTTP requests and parsing the responses. In case of an error the response comes as text/plain with the error text in the body and HTTP code 404 - for example, if you access a non-existent cache or element, the response body will be the string "Element not found: 333" (when requesting an element with key 333). This holds for URLs serviced by the EHcache servlet; if the error occurs elsewhere, you get a standard GlassFish 404 error page, which is less suited to automatic parsing.
You can work with the server as a whole (the cache manager), with each individual cache, and with individual elements - simply extend the URL and use the appropriate method with parameters.
For the whole server (the CacheManager):
- GET - returns an XML list of the caches available on the server and their parameters. This is an ordinary request through the browser, as we did above, for example: http://localhost:8080/ehcache/rest/
Further down the hierarchy, specifying a particular cache name in the URL lets you perform the following operations on it:
- OPTIONS - returns a WADL description of the available operations.
- HEAD - returns the same metadata describing the cache parameters, but as HTTP headers rather than in the response body (as GET does)
- GET - an XML document with the cache parameters and its statistics.
- PUT - creates a new cache (named in the URL) based on the default cache settings from the configuration file.
- DELETE - deletes the cache named in the URL. It removes the cache rather than clearing it (there is a separate command for clearing - oddly enough, among the operations on cache elements); apparently until the next server restart (I have not checked this point yet).
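The cache-level operations can be exercised with curl as well (a sketch against a locally running server; the cache name sampleCache is an arbitrary example of mine, not part of the distribution):

```shell
# Create a cache from the default settings, inspect it, then remove it.
curl -s -X PUT    http://localhost:8080/ehcache/rest/sampleCache
curl -s           http://localhost:8080/ehcache/rest/sampleCache
curl -s -X DELETE http://localhost:8080/ehcache/rest/sampleCache
```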
At the cache element level the following operations are supported:
- OPTIONS - returns a WADL description of the available operations.
- HEAD - returns the element as HTTP headers (the documentation is ambiguous here: in other cases HEAD duplicates GET, but for cache elements it is stated to return metadata rather than the value).
- GET - returns the contents of the cache element directly in the response body.
- PUT - puts data into the cache. The data itself is sent in the request body and the element name in the URL; an optional lifetime can be passed in an HTTP header named "ehcacheTimeToLiveSeconds". Note that if the header is absent, the value from the cache configuration is used; valid values range from 0 (forever) to 2147483647 (about 68 years).
- DELETE - deletes the specified element. To delete all elements of a cache, use the mask *; it is a pity the other methods do not support it (that is, there is no multi-get at the REST level, although the cache itself fully supports it in the Java API).
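A sketch of a full element round trip with curl (it assumes the server and the testRestCache cache configured earlier; the key name greeting and the TTL value are arbitrary examples):

```shell
# Store a value, passing an explicit MIME type and a per-element TTL.
curl -s -X PUT \
     -H 'Content-Type: text/plain' \
     -H 'ehcacheTimeToLiveSeconds: 3600' \
     -d 'hello' \
     http://localhost:8080/ehcache/rest/testRestCache/greeting

# Read it back; the body is returned with the MIME type it was stored under.
curl -s http://localhost:8080/ehcache/rest/testRestCache/greeting

# Delete one element, then clear the whole cache with the * mask.
curl -s -X DELETE http://localhost:8080/ehcache/rest/testRestCache/greeting
curl -s -X DELETE 'http://localhost:8080/ehcache/rest/testRestCache/*'
```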
When you save an element to the cache you can specify its MIME type (from the supported list), so that on retrieval you immediately get the data in the right form. Supported types:
- text/plain - plain text or arbitrary data.
- text/xml - an XML document, per RFC 3023
- application/json - the most interesting one, the JSON format (per RFC 4627)
- application/x-java-serialized-object - a serialized Java object
That covers the server itself; now for the practical part - how to work with it from a web application in PHP. The original idea was to write a dedicated cache backend for the Zend Framework, similar to the existing class for Memcached, but first I decided simply to experiment with how it all works. Perhaps I will write such a class if it turns out to be interesting and useful to anyone besides me.
We will use the Zend Framework for the experiments, in particular its class for HTTP requests (Zend_Http_Client) and its class for working with JSON (Zend_Json).
First we need to establish a connection to the server. Zend_Http offers several adapters for this; the Socket adapter was the fastest in my tests. I would use cURL only as a last resort, when the cache server is remote and cannot be reached any other way (for example, if SSL is required - a strange requirement for a cache, but sometimes necessary; correction: the socket adapter can use SSL as well).
We describe the connection options with maximum performance in mind, given that we will be making more than one request per page:
$_config = array(
    'timeout'      => 5,
    'maxredirects' => 1,
    'httpversion'  => 1.1,
    'adapter'      => 'Zend_Http_Client_Adapter_Socket',
    'options'      => array(
        'persistent' => true
    ),
    'keepalive'    => true
);
Recall that our base URL is:
$_url = 'http://localhost:8080/ehcache/rest/testRestCache';
For the first example we will try to put the contents of a large array, $_SERVER, into the cache, setting JSON as the data type (we convert the array to JSON before sending).
// create the connection object
$ehcache_connect = new Zend_Http_Client('http://localhost', $_config);
// name of our object in the cache, its unique id
$_chache_item_name = 'testitem1';
// set the full path to the element:
// localhost:8080/ehcache/rest/testRestCache/testitem1
$ehcache_connect->setUri($_url . '/' . $_chache_item_name);
// indicate that we use JSON
$ehcache_connect->setHeaders('Content-type', 'application/json');
// set the method
$ehcache_connect->setMethod(Zend_Http_Client::PUT);
// add the data, encoded as JSON
$ehcache_connect->setRawData(Zend_Json::encode($_SERVER));
// that's it! execute the request
$response = $ehcache_connect->request();
// the response is a Zend_Http_Response object
if ($response->isSuccessful()) {
    // everything is OK: the server returned a correct HTTP response with code 200
    echo 'Request OK!';
} else {
    echo $response->getMessage();
}
Now let's get our array back. We don't even need to change the URL - just change the request method; the rest is the same as in the previous code:
// URL of our object
$ehcache_connect->setUri($_url . '/' . $_chache_item_name);
// method
$ehcache_connect->setMethod(Zend_Http_Client::GET);
// execute the request
$_result = $ehcache_connect->request();
// if everything is OK
if ($_result->isSuccessful()) {
    // get the response body and decode it from JSON back into an array
    $_json_res = Zend_Json::decode($_result->getBody(), Zend_Json::TYPE_ARRAY);
    // display it
    Zend_Debug::dump($_json_res);
} else {
    echo $_result->getMessage();
}
The remaining commands can be issued in the same way. The first slight limitation is that Zend_Http does not support HEAD requests, but those usually duplicate the others, so there is no great need for them. The second drawback is that metadata about a cache or a specific element is returned only in XML, although element values can be handled as JSON; statistics come bundled with all the other data, though it would be nice to have them on a separate page. The third drawback is the lack of richer retrieval and insertion operations: you cannot put or request several elements at once (although the Java API allows it), yet you can delete everything at once. Finally, there is no security whatsoever, so do not store confidential data on a server reachable over HTTP from the outside.
In conclusion, the main idea behind this study. Since we can receive data directly in JSON, and the server supports all HTTP features, a client application - for example, an AJAX one - can easily interact with the cache, requesting data and receiving it as JSON, while the server side asynchronously stores new data as it appears. The client can first check whether the data is in the cache, and only if it is not, go directly to the server side.
Cache sharding and load balancing are also quite simple to implement. In that case it is better to deploy a server based on the full version of GlassFish, since the embedded one lacks some useful features such as the admin console, gzip traffic compression and a load balancer. You can also put nginx in front to balance the load between the servers, while they replicate among themselves in the background using Java-level tools. The HTTP protocol is simple and quite flexible, so we can implement any caching-server behavior strategy by combining the capabilities of HTTP and the Java platform.
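As an illustration, a hypothetical nginx fragment for the front-end balancing mentioned above (a sketch: the upstream name, addresses and ports are my assumptions, not part of the EHcache distribution):

```
# Spread REST cache traffic across two EHcache server nodes;
# replication between the nodes is handled by EHcache itself.
upstream ehcache_rest {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}
server {
    listen 80;
    location /ehcache/rest/ {
        proxy_pass http://ehcache_rest;
    }
}
```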
P.S. A few words about performance. Of course, my tests are far from real-world conditions and cannot be considered reliable or particularly meaningful. The average figure obtained on my machine (a development notebook: 1.5 GB RAM, Celeron M 1.7 GHz, WinXP SP3) while preparing this material was 0.020-0.025 sec per read/write operation (roughly twice as long via cURL). It would of course be interesting to test a variant with replication and load balancing, but that is a whole different level - though I would gladly take part and look at the results.
P.P.S. To answer the question - what is all this for? In some cases it can replace other caching systems, memcached included, since it provides more flexible cache settings, data persistence, various replication schemes, good scalability and distribution, and clients can fetch data directly (AJAX). And if part (or all) of your backend runs on Java, it is much easier for it to put data there. EHcache can also serve as a highly scalable and reliable key-value database, offering serious-grade replication and clustering; unlike many newer solutions, EHcache has a long history of development and optimization.
It seems to me that if we take only the servlet providing the REST interface and put it on some fast and lightweight web server such as Tjws, add a lightweight balancer, and dedicate a separate JVM to each cache (effectively deploying a two-node cluster on each physical server), we would get a much faster and lighter system with excellent scalability. And by adding our own servlet - literally a few lines - we could support other protocols and formats; such a cacher would be very interesting with the ability to fetch data via Thrift or Google Protocol Buffers, given that clients for those protocols exist on the client side too (in JS and ActionScript). The field for research is wide and interesting, isn't it?