
EHcache RESTful server, PHP, and a few experiments

Today we continue exploring new and less mainstream technologies, their unusual applications, or simply original ideas. You may remember that I once wrote about EHcache, a distributed cache project for the Java platform. Now it is time to return to this topic from a different angle: EHcache as a standalone RESTful server.


First, a few words about EHcache itself. It is a high-performance, scalable caching system for Java, and a mature, serious project. It can cache in RAM, on disk, or with combined strategies (there is also an option to preserve data integrity when the virtual machine or the server restarts). Scalability is achieved through asynchronous replication and cache clustering via JGroups, JMS, or RMI, and you can also build distributed systems on top of third-party products (Terracotta). What I like most is the distribution model: each cache gets its own settings (synchronous or asynchronous replication), and several independent instances can run within one JVM (or in different ones). Note that EHcache keeps data in the memory of the JVM process, so its volume is limited on 32-bit systems, but the disk cache is always there, and serious servers are 64-bit anyway. Installations with 20 GB of data and more are known. Multithreading, concurrent access, and multi-core machines are also well supported. JBoss Cache may be even better in this respect, since it supports transactions and some other "goodies", but its API is rather hard to understand: I failed to figure it out after two attempts, while EHcache worked right away.

The cache is very fast and beats other caching systems in benchmarks published online, including the most popular one, Memcached (in 4-machine clusters JBoss Cache shows slightly better results, but that is a somewhat different kind of system). I know the comparison is not entirely fair: EHcache is an in-process cache, while memcached is a separate daemon that works as an external network service (and therefore has no such restrictions on cache size). Still, even in that comparison EHcache comes out ahead thanks to its much greater flexibility and scalability, its combination of memory and disk storage, and fine-grained cache tuning. This is where the thought came to me: can EHcache be used instead of memcached (or alongside it) while staying on the PHP platform I am used to? Yes, it can!
The point is that, in addition to the cache itself, the developer community has also built a caching server that can be accessed through a REST interface or via SOAP. It is based on the embedded version of the Sun GlassFish v3 Prelude server and is self-contained, bundling all the necessary components and dependencies. The server is accessed over HTTP with the GET, POST, PUT, DELETE, OPTIONS, and HEAD methods, or via SOAP (also over HTTP) using XML. All HTTP/1.1 features are supported, including keep-alive as well as Last-Modified and ETag. In other words, the server sends correct headers, so responses can often be cached by intermediate nodes along the way or by the client itself. Another interesting point is support for several data formats: you can get a response in XML or JSON simply by setting the right MIME-type header in the request.
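Because the server sends ETag and Last-Modified, a PHP client can in principle revalidate a copy it already holds instead of downloading it again. Below is a minimal sketch, assuming PHP 7 or later, that builds the standard HTTP/1.1 conditional-request headers from validators saved from an earlier response. The function names are hypothetical, and whether the EHcache server actually answers such requests with 304 Not Modified is an assumption worth checking against your server version.

```php
<?php
// Sketch: build conditional-request headers from validators saved from an
// earlier response. Whether the EHcache server replies "304 Not Modified"
// to them is an assumption to verify; the function names are hypothetical.

/**
 * @param array $saved Validators from a previous response, e.g.
 *                     ['ETag' => '"abc"', 'Last-Modified' => 'Mon, 03 Aug 2009 10:00:00 GMT']
 * @return string[]    Header lines to send with the next GET
 */
function build_conditional_headers(array $saved): array
{
    $headers = [];
    if (!empty($saved['ETag'])) {
        // If-None-Match carries the entity tag exactly as the server sent it
        $headers[] = 'If-None-Match: ' . $saved['ETag'];
    }
    if (!empty($saved['Last-Modified'])) {
        // If-Modified-Since carries the HTTP date taken from Last-Modified
        $headers[] = 'If-Modified-Since: ' . $saved['Last-Modified'];
    }
    return $headers;
}

/** A 304 status means the copy held by the client is still current. */
function is_not_modified(int $status): bool
{
    return $status === 304;
}
```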

So, with practically one click, we can launch a caching server that speaks a simple, well-understood protocol and is reachable from any language or platform that can make HTTP requests. Let's try it!

You can download the latest version of the server from SourceForge, but I recommend also downloading the latest version of the cache itself and replacing the cache library in the server distribution, since the server ships with the previous version. We need ehcache-standalone-server, which is currently at version 0.7.

The distribution already includes startup scripts (see the README); alternatively, go to the lib directory and start the server manually:

    java -jar ./ehcache-standalone-server-0.7.jar 8080 ../war


The arguments after the main server JAR are the port the server will listen on and the path to the directory with the web application (the war directory). After launch, the console shows the startup progress and information about the running services and ports; this is how, for example, I find out which port the JMX management service is available on. Since this is the embedded version of GlassFish (which is both a web server and an application server), its settings and capabilities are limited, but you can always deploy a full-fledged server, not necessarily GlassFish, and run only the EHcache server on it, since the EHcache server is also available separately, without the embedded web server.

By default, the caches are available at /ehcache/rest: opening this address in a browser (or sending a GET request to it) returns an XML document describing all current cache settings. Initially, the configuration file describes several caches, including a couple of distributed ones. To get started, it's best to delete all the default definitions and create your own caches. Let's create a simple cache without replication.

All cache settings live in a single XML file, /war/WEB-INF/classes/ehcache.xml, which we will edit. It contains plenty of comments describing every option, so I will only briefly show how to set up a basic cache to continue the experiments.

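What the options in the cache definition below mean: maxElementsInMemory is the maximum number of elements kept in memory; eternal="true" means elements never expire, so timeToIdleSeconds and timeToLiveSeconds (the maximum idle time and the maximum lifetime of an element) are ignored; overflowToDisk moves elements to the disk store when the memory store is full; diskSpoolBufferSizeMB is the size of the buffer used to spool elements to disk; maxElementsOnDisk is the maximum number of elements in the disk store; diskPersistent keeps the disk store between JVM restarts; diskExpiryThreadIntervalSeconds is how often the thread that removes expired elements from disk runs; memoryStoreEvictionPolicy chooses which elements are evicted from memory (LRU, LFU, or FIFO).

We are not discussing replication options yet; that is a more specialized topic, better covered by more competent specialists.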

    <cache name="testRestCache"
           maxElementsInMemory="10000"
           eternal="true"
           timeToIdleSeconds="0"
           timeToLiveSeconds="0"
           overflowToDisk="true"
           diskSpoolBufferSizeMB="4"
           maxElementsOnDisk="1000000000"
           diskPersistent="true"
           diskExpiryThreadIntervalSeconds="3600"
           memoryStoreEvictionPolicy="LFU"
           />


So our cache is configured to keep 10 thousand elements in memory and 1 billion on disk, to ignore lifetime settings, and to preserve data between restarts. I made the disk spool buffer rather small and the interval of the disk expiry check very long (I think it could be even longer; ideally, see whether it can be disabled altogether). What is this configuration for? I want to try building a simple key-value database on top of this cache (a very popular topic these days), while keeping direct access to the cache both from external services and from inside a PHP web application. One caveat: even if you do not need lifetime checks and want a permanent cache, do not set the timeToIdle/timeToLive parameters to 0, otherwise the server may not start (more precisely, the cache service fails and the server itself starts returning 404 errors).

To test it, save the edited ehcache.xml file and restart the server. Now open http://localhost:8080/ehcache/rest/testRestCache in your browser. You should get an XML document with all the cache settings and current usage statistics (size, number of elements, hit and miss percentages), which you can also process programmatically and display in whatever form you need (for example, in an admin panel).

From here on I will cover only the REST interface. To work through SOAP, replace rest with soap in the URL, get the service description in WSDL format, and so on. For performance, I simply disabled everything I don't use, including the unneeded caches and the SOAP endpoint. Servlet settings are in the web.xml file in the /war/WEB-INF directory.

Working with the cache comes down to sending HTTP requests and parsing the responses. On error, the response has the text/plain content type, its body contains the error text, and the HTTP status is 404. For example, if you request a non-existent cache or element, say an element with the key 333, the response is the string "Element not found: 333". This holds only for URLs handled by the EHcache servlet, though: if the error happens elsewhere, you get GlassFish's standard 404 page, which is much less suited to automatic parsing.
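The two kinds of 404 can be told apart by the Content-Type of the response. A minimal sketch, assuming PHP 7 or later; the function name and the shape of the returned array are hypothetical conventions for this example.

```php
<?php
// Sketch: distinguish an EHcache servlet error (plain-text body such as
// "Element not found: 333") from a generic container error page (HTML).
// The function name and result shape are hypothetical.

function classify_error(int $status, string $contentType, string $body): array
{
    if ($status >= 200 && $status < 300) {
        return ['kind' => 'ok', 'message' => null];
    }
    // Content-Type may carry parameters, e.g. "text/plain; charset=UTF-8"
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    if ($mime === 'text/plain') {
        // Produced by the EHcache servlet: the body is the message itself
        return ['kind' => 'cache', 'message' => trim($body)];
    }
    // Anything else (typically GlassFish's HTML page) is not meant for parsing
    return ['kind' => 'container', 'message' => "HTTP $status"];
}
```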

You can work both with the server as a whole (the cache manager) and with each individual cache and element: just extend the URL path and use the appropriate HTTP method with parameters.
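A minimal sketch of building URLs for the three levels of this hierarchy, assuming PHP 7.1 or later. The function name is hypothetical; the base URL matches the one used in this article. Keys are percent-encoded with rawurlencode() so that each stays a single path segment; whether the server accepts encoded slashes inside keys is an assumption worth testing.

```php
<?php
// Sketch: URLs for the CacheManager, cache, and element levels.
// The function name is hypothetical.

function ehcache_url(string $base, ?string $cache = null, ?string $key = null): string
{
    $url = rtrim($base, '/');              // e.g. http://localhost:8080/ehcache/rest
    if ($cache === null) {
        return $url;                       // CacheManager level
    }
    $url .= '/' . rawurlencode($cache);    // cache level
    if ($key !== null) {
        $url .= '/' . rawurlencode($key);  // element level
    }
    return $url;
}
```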

For all caches (the CacheManager):



[Table: CacheManager operations]


One level down, if you specify a particular cache name in the URL, you can perform the following operations on that cache:

At the element level, the following operations are supported:

When you store an element, you can specify its MIME type (one of the supported ones); when you retrieve it, you get the data back in exactly that form. Supported types:
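A minimal sketch, assuming PHP 7.1 or later, of choosing the Content-Type for a PUT and decoding the body on GET. The function names and the mapping (arrays and objects as application/json, everything else as text/plain) are assumptions for illustration, not the server's list of supported types.

```php
<?php
// Sketch: pick a MIME type and body for storing a PHP value, and decode it
// back according to the Content-Type of the response. The mapping is a
// hypothetical convention.

function encode_for_cache($value): array
{
    if (is_array($value) || is_object($value)) {
        return ['application/json', json_encode($value)];
    }
    if (is_bool($value)) {
        return ['text/plain', $value ? 'true' : 'false'];
    }
    return ['text/plain', (string) $value];
}

function decode_from_cache(string $contentType, string $body)
{
    // Ignore parameters such as "; charset=UTF-8"
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    if ($mime === 'application/json') {
        return json_decode($body, true);   // JSON objects become PHP arrays
    }
    return $body;                          // plain text is returned as-is
}
```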

That's all for the server itself; now for the practical part: how to work with the server from a PHP web application. The original idea was to write a special cache backend for Zend Framework, similar to the Memcached one, but first I decided simply to experiment and see how it all works. I may write such a class if it turns out to be interesting and useful to anyone besides me.
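Such a backend is not written in this article, so here is only a minimal sketch of the idea, assuming PHP 7.1 or later: a tiny key-value wrapper around one REST cache. It does not implement Zend Framework's cache backend interface. The class, its methods, and the injected transport callable, which takes (method, URL, header lines, body) and returns [status, body], are all hypothetical, chosen so the logic can be tried without a running server.

```php
<?php
// Sketch: a key-value wrapper around one EHcache REST cache.
// All names are hypothetical; the transport is injected so that a real
// HTTP client (e.g. Zend_Http_Client) or a test double can be plugged in.

class EhcacheRestStore
{
    private $cacheUrl;
    private $transport;

    public function __construct(string $cacheUrl, callable $transport)
    {
        $this->cacheUrl = rtrim($cacheUrl, '/');
        $this->transport = $transport;
    }

    private function url(string $key): string
    {
        return $this->cacheUrl . '/' . rawurlencode($key);
    }

    /** Store a value as JSON; true on any 2xx status. */
    public function save(string $key, $value): bool
    {
        [$status] = ($this->transport)('PUT', $this->url($key),
            ['Content-Type: application/json'], json_encode($value));
        return $status >= 200 && $status < 300;
    }

    /** Return the decoded value, or null when the element is missing. */
    public function load(string $key)
    {
        [$status, $body] = ($this->transport)('GET', $this->url($key), [], null);
        return $status === 200 ? json_decode($body, true) : null;
    }

    /** Delete one element; true on any 2xx status. */
    public function remove(string $key): bool
    {
        [$status] = ($this->transport)('DELETE', $this->url($key), [], null);
        return $status >= 200 && $status < 300;
    }
}
```

Injecting the transport keeps the class independent of any particular HTTP library, so the same logic could sit behind Zend_Http_Client or PHP's stream functions.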

For the experiments we will use Zend Framework, in particular its HTTP client class (Zend_Http_Client) and its JSON class (Zend_Json).

First, we need to connect to the server. Zend_Http offers several ways to do this through different adapters; the Socket adapter was the fastest in my tests. I would use Curl only as a last resort, if the cache server is remote and cannot be reached any other way (for example, if you need SSL, which is a strange requirement for a cache but sometimes necessary; correction: the Socket adapter can use SSL too).

We set the connection options for maximum performance, keeping in mind that a single page will make more than one request:

    $_config = array(
        'timeout'      => 5,
        'maxredirects' => 1,
        'httpversion'  => '1.1',
        // the Socket adapter was the fastest in my tests
        'adapter'      => 'Zend_Http_Client_Adapter_Socket',
        // keep the TCP connection open between requests (Socket adapter option)
        'persistent'   => true,
        // ask the server to keep the connection alive (HTTP/1.1 keep-alive)
        'keepalive'    => true
    );


Recall that our base URL is: $_url = 'http://localhost:8080/ehcache/rest/testRestCache';

For the first example, let's put the contents of a large array, $_SERVER, into the cache, with JSON as the data type (we convert the array to JSON before sending it).

    // create the connection object
    $ehcache_connect = new Zend_Http_Client('http://localhost', $_config);
    // name of our object in the cache, its unique id
    $_chache_item_name = 'testitem1';
    // set the full path to the element:
    // http://localhost:8080/ehcache/rest/testRestCache/testitem1
    $ehcache_connect->setUri($_url . '/' . $_chache_item_name);
    // tell the server that the body is JSON
    $ehcache_connect->setHeaders('Content-Type', 'application/json');
    // set the method
    $ehcache_connect->setMethod(Zend_Http_Client::PUT);
    // attach the data, encoded as JSON
    $ehcache_connect->setRawData(Zend_Json::encode($_SERVER));
    // that's it, execute the request
    $response = $ehcache_connect->request();
    // the result is a Zend_Http_Response object
    if ($response->isSuccessful())
    {
        // the server returned a 2xx status: the element is stored
        echo 'Request OK!';
    }
    else
    {
        // status code, reason phrase and the servlet's plain-text error message
        echo $response->getStatus() . ' ' . $response->getMessage() . ': ' . $response->getBody();
    }
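If Zend Framework is not at hand, the same PUT can be sent with PHP's built-in HTTP stream wrapper. A minimal sketch, assuming PHP 7.1 or later; the helper names are hypothetical. The 'ignore_errors' context option makes file_get_contents() return the body of 4xx/5xx responses as well, so the servlet's plain-text error message stays readable.

```php
<?php
// Sketch: PUT a JSON-encoded value using PHP's HTTP stream wrapper.
// Helper names are hypothetical.

function put_json_context_options($value): array
{
    return [
        'http' => [
            'method'        => 'PUT',
            'header'        => "Content-Type: application/json\r\n",
            'content'       => json_encode($value),
            'timeout'       => 5,
            // return the body even for 4xx/5xx so error text can be read
            'ignore_errors' => true,
        ],
    ];
}

/** Returns [status line, body]; the status line is '' if no response arrived. */
function put_json(string $url, $value): array
{
    $context = stream_context_create(put_json_context_options($value));
    $body = file_get_contents($url, false, $context);
    // the HTTP wrapper fills $http_response_header in this local scope
    $statusLine = isset($http_response_header[0]) ? $http_response_header[0] : '';
    return [$statusLine, $body === false ? '' : $body];
}
```

For example, put_json('http://localhost:8080/ehcache/rest/testRestCache/testitem1', $_SERVER) would store the same array as the Zend example.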


Now let's get our array back. We don't even need to change the URL, only the request method; everything else is the same as in the previous code:

    // drop the JSON body and Content-Type left over from the PUT request
    $ehcache_connect->resetParameters();
    // URL of our object
    $ehcache_connect->setUri($_url . '/' . $_chache_item_name);
    // method
    $ehcache_connect->setMethod(Zend_Http_Client::GET);
    // execute the request
    $_result = $ehcache_connect->request();
    // if everything is OK
    if ($_result->isSuccessful())
    {
        // take the response body and decode it from JSON back into an array
        $_json_res = Zend_Json::decode($_result->getBody(), Zend_Json::TYPE_ARRAY);
        // display it
        Zend_Debug::dump($_json_res);
    }
    else
    {
        // status code, reason phrase and the servlet's error message
        echo $_result->getStatus() . ' ' . $_result->getMessage() . ': ' . $_result->getBody();
    }
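The read can be sketched with the stream wrapper too. The wrapper reports the status line in $http_response_header rather than as a number, so this sketch (PHP 7.1 or later, hypothetical helper names) adds a small parser for it and treats anything other than 200 as "no value".

```php
<?php
// Sketch: GET an element with PHP's HTTP stream wrapper and decode JSON.
// Helper names are hypothetical.

/** Extract the numeric code from a line such as "HTTP/1.1 404 Not Found". */
function parse_status_code(string $statusLine): int
{
    // Status line: HTTP-version SP status-code SP reason-phrase
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return 0;   // not an HTTP status line
}

/** Returns the decoded array, or null if the element is missing or unreadable. */
function get_json(string $url): ?array
{
    $context = stream_context_create(['http' => [
        'method'        => 'GET',
        'timeout'       => 5,
        'ignore_errors' => true,
    ]]);
    $body = file_get_contents($url, false, $context);
    // the HTTP wrapper fills $http_response_header in this local scope
    $status = isset($http_response_header[0]) ? parse_status_code($http_response_header[0]) : 0;
    if ($body === false || $status !== 200) {
        return null;   // missing element, server error, or no connection
    }
    $data = json_decode($body, true);
    return is_array($data) ? $data : null;
}
```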


The remaining commands work the same way. The first minor limitation is that Zend_Http does not support HEAD requests, but they usually duplicate other requests, so there is little need for them. The second drawback is that metadata about caches and individual elements is returned in XML, even though the elements themselves can be handled in JSON. Statistics are returned together with all the other data, although it would be nice to have them on a separate page. The third drawback is the lack of advanced ways to retrieve and add data: you cannot put or request several elements at once (although the Java API itself can), though you can delete everything at once. Finally, there is no security at all, so do not store confidential data in a cache that is reachable over HTTP from outside.
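Without a bulk operation over REST, several elements have to be fetched one request at a time. A minimal sketch, assuming PHP 7 or later; $fetchOne is a hypothetical callable that returns the value for a key or null, for example a wrapper around the GET request from the previous example.

```php
<?php
// Sketch: emulate a multi-get with one request per key.
// $fetchOne is a hypothetical callable: key => value or null.

function get_many(array $keys, callable $fetchOne): array
{
    $found = [];
    foreach ($keys as $key) {
        $value = $fetchOne($key);
        if ($value !== null) {
            $found[$key] = $value;   // keep only keys present in the cache
        }
    }
    return $found;
}
```

Each key still costs one round trip; with the persistent keep-alive connection configured earlier, it should at least avoid a new TCP handshake per request.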

In conclusion, a few words about the main idea behind this study. Since we can get data directly in JSON and the web server supports all HTTP features, a client application, for example an AJAX one, can easily talk to the cache directly: it requests data and receives JSON, while the server side asynchronously stores new data in the cache as it appears. The client can first check whether the data is in the cache and, if not, contact the server side directly.
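The read path described here can be sketched as follows, written in PHP for consistency with the rest of the article (a browser client would do the same in JavaScript). This is a minimal sketch assuming PHP 7 or later; both callables are hypothetical hooks, and, as in the scenario above, writing fresh data into the cache is left to the server side.

```php
<?php
// Sketch: check the cache first, fall back to the application on a miss.
// $cacheGet and $origin are hypothetical callables: key => value or null.

function read_with_fallback(string $key, callable $cacheGet, callable $origin)
{
    $value = $cacheGet($key);   // e.g. GET /ehcache/rest/testRestCache/{key}
    if ($value !== null) {
        return $value;          // cache hit: the backend is not involved
    }
    // Cache miss: ask the application directly; the server side is
    // responsible for storing fresh data in the cache.
    return $origin($key);
}
```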

Cache sharding and load balancing are also quite easy to implement. In that case, though, it is better to deploy the server on the full version of GlassFish, since the embedded one lacks some useful features such as the admin console, gzip compression of traffic, and a load balancer. You can also put nginx in front to balance the load between servers, while the servers replicate among themselves in the background using Java tools. HTTP is simple and quite flexible, so any behavior strategy for a caching server can be implemented by combining the capabilities of HTTP and the Java platform.
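For sharding, the simplest approach is to derive the server from the key itself, so that every client picks the same node for the same key without any coordination. A minimal sketch, assuming PHP 7 or later; the server URLs in the usage are hypothetical, and plain modulo hashing is used instead of consistent hashing for brevity.

```php
<?php
// Sketch: map a key to one of several cache servers by hashing it.
// Modulo hashing is used for brevity; the server list is hypothetical.

function pick_shard(string $key, array $servers): string
{
    if ($servers === []) {
        throw new InvalidArgumentException('no servers configured');
    }
    $servers = array_values($servers);
    // mask to a non-negative value (crc32() is signed on 32-bit builds)
    $index = (crc32($key) & 0x7fffffff) % count($servers);
    return $servers[$index];
}
```

Modulo hashing remaps most keys when a node is added or removed; consistent hashing limits that churn at the cost of a little more code.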

PS A few words about performance. Of course, my tests are far from realistic, cannot be considered reliable, and do not mean much in general. The average figure on my machine (a development notebook with 1.5 GB RAM, Celeron M 1.7 GHz, WinXP SP3) while preparing this material was 0.020 to 0.025 s per read or write operation (about twice as long with cURL). It would certainly be interesting to test a setup with replication and load balancing, but that is a completely different level; I would gladly take part and look at the results.
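To repeat such measurements, the per-operation average can be taken over many runs. A minimal sketch, assuming PHP 7.3 or later for the monotonic hrtime() clock; the function name is hypothetical, and the callable would wrap one GET or PUT from the examples above.

```php
<?php
// Sketch: average wall-clock seconds per call over several runs.
// The function name is hypothetical.

function average_seconds(callable $operation, int $runs = 100): float
{
    if ($runs < 1) {
        throw new InvalidArgumentException('runs must be positive');
    }
    $start = hrtime(true);   // monotonic clock, nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $operation();
    }
    return (hrtime(true) - $start) / 1e9 / $runs;
}
```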

PPS To answer the question "why all this?": in some cases it can replace other caching systems, memcached included, since it offers more flexible cache settings, data persistence, various replication options, good scalability and distribution, and it lets the client system fetch data directly (AJAX). Also, if part of your backend, or all of it, runs on Java, putting data into this cache will be much easier for it. EHcache can also serve as a highly scalable and reliable key-value database with serious-level replication and clustering; unlike many new solutions, it has a long history of development and optimization.

It seems to me that if you take only the servlet that provides the REST interface and run it on some fast, lightweight web server, for example TJWS, add a lightweight load balancer, and dedicate a separate JVM to each cache (in effect deploying a two-node cluster on each physical server), you get a much faster and lighter system with excellent scalability. And by adding your own servlet, literally a few lines of code, you can support other protocols and formats: a cache server that returns data via Thrift or Google Protocol Buffers would be very interesting, given that clients for these protocols exist for client-side platforms (JavaScript and ActionScript). The field for research is wide and interesting, isn't it?

Source: https://habr.com/ru/post/66464/

