With the advent and spread of memcached-like systems, another tier has appeared in web application architectures: cache servers. Usually this is a machine with a large amount of RAM that holds pre-prepared data: the results of complex database queries, rendered dynamic fragments of site pages, and so on. In practice the cache, like any other part of the system, can be used however the application needs.
The essence of caching is simple: it reduces the time spent preparing data and, as a result, the time it takes to generate a page.
In our project we actively use caching for one simple reason. From the start, we built the information model around the principle of maximum flexibility. What does that mean? It means we have two kinds of tables: reference tables and link tables. Reference tables store the entities used in the project, and link tables, as you can guess, store the relationships between those entities. Moreover, the relationships themselves can have properties of their own.
Consider an example. There are three entities: artist, album, song. And there are links between them, all of which have to be many-to-many: one artist can have several albums, and a compilation album has several artists, and so on. At the same time, a link, say the one between a performer and a song, may carry properties such as "versus", "feat" and others.
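To make this concrete, here is a minimal sketch of such a schema in Python with SQLite; the table and column names are illustrative rather than our actual ones.

```python
import sqlite3

# A minimal sketch of the "reference tables plus link tables" idea.
# Table and column names here are illustrative, not the real schema.
db = sqlite3.connect(":memory:")
db.executescript("""
-- reference tables: one per entity
CREATE TABLE artist (id INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE album  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE song   (id INTEGER PRIMARY KEY, title TEXT);

-- link tables: many-to-many relations that may carry their own properties
CREATE TABLE connection_artist_album (
    artist_id INTEGER,
    album_id  INTEGER,
    PRIMARY KEY (artist_id, album_id)
);
CREATE TABLE connection_artist_song (
    artist_id INTEGER,
    song_id   INTEGER,
    relation  TEXT,                -- e.g. 'feat', 'versus'
    PRIMARY KEY (artist_id, song_id)
);
""")
```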
In general, this example was meant to show how complex the links between entities can become and why we need a very flexible database structure.
But with such a structure, database queries become unwieldy. For example, just to display an artist and the list of their albums you have to join three tables: artist, connection_artist_album and album. Now add personalization on top of that, and you get a classic slow query.
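Roughly, the query behind such a page looks like the sketch below (again with illustrative names, matching the schema sketched above); once the personalization joins pile on top of it, this is where the slow queries come from.

```python
# The kind of query needed just to show one artist with a list of albums.
# Personalization would add further joins on top of this.
ARTIST_WITH_ALBUMS = """
SELECT artist.name, album.title
FROM artist
JOIN connection_artist_album ON connection_artist_album.artist_id = artist.id
JOIN album                   ON album.id = connection_artist_album.album_id
WHERE artist.id = ?
"""
```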
That is why we are forced to use caching. And here is where it gets interesting. As I said, the database holds two kinds of data: entities and relationships, and caching works on the same principle. Every object in the system is an instance of the DataObject class, which can load itself from the database and save itself back. An object may, of course, be composite, that is, loaded from and saved to several tables. This is a classic approach, used everywhere. At the same time, the object itself keeps track of whether it is cached and whether its cache entry is still up to date. Each object also has dynamic properties that represent collections of other objects related to it; a song, for example, has the properties artists and performers. These properties are initialized lazily, on first access.
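Below is a rough sketch of what such a self-caching object might look like. It is not our actual DataObject class: the plain dict stands in for a memcached client, and the column handling is simplified. The lazy collection properties are sketched separately further down.

```python
import json

CACHE = {}  # stand-in for a memcached client; a real deployment talks to memcached

class DataObject:
    """Sketch of a self-caching object; not the project's actual class."""
    table = None                 # primary table of the entity
    columns = ("id", "title")    # illustrative; real objects map their own columns

    def __init__(self, object_id):
        self.id = object_id
        self.data = None

    def cache_key(self):
        return f"{self.table}:{self.id}"

    def load(self, db):
        # The object checks its own cache entry before touching the database.
        cached = CACHE.get(self.cache_key())
        if cached is not None:
            self.data = json.loads(cached)
            return self
        row = db.execute(
            f"SELECT {', '.join(self.columns)} FROM {self.table} WHERE id = ?",
            (self.id,),
        ).fetchone()
        self.data = dict(zip(self.columns, row))
        CACHE[self.cache_key()] = json.dumps(self.data)
        return self

    def save(self, db):
        # Writes go to the database, and the cache entry is refreshed right away;
        # this is how the object keeps its own cache entry relevant.
        db.execute(
            f"UPDATE {self.table} SET title = ? WHERE id = ?",
            (self.data["title"], self.id),
        )
        CACHE[self.cache_key()] = json.dumps(self.data)

class Song(DataObject):
    table = "song"
```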
From this point on, I want to describe the key principle behind caching collections.
To keep the two concerns apart, we store chains of object identifiers separately from the objects themselves, and here is why: there is only one data set but many views of it. The same song can belong to an album, to a particular user's list of favorite songs, be part of the on-air rotation, and appear in many other places.
For each such content chain we generate the set of identifiers (on demand, of course) and put it in the cache for a limited period of time, since this data can change.
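As a sketch, caching a chain might look like this; the key format and TTL are made up for the example, and in a real setup memcached itself would handle the expiry.

```python
import time

CACHE = {}          # again a stand-in for memcached, which handles expiry itself
CHAIN_TTL = 300     # seconds; an illustrative lifetime, not our real setting

def chain_key(kind, owner_id):
    # one key per view of the data, e.g. "chain:user_favorites:42"
    # or "chain:album_songs:7"
    return f"chain:{kind}:{owner_id}"

def put_chain(kind, owner_id, song_ids):
    # only the ordered list of identifiers is stored, never the songs themselves
    CACHE[chain_key(kind, owner_id)] = (list(song_ids), time.time() + CHAIN_TTL)

def get_chain(kind, owner_id):
    entry = CACHE.get(chain_key(kind, owner_id))
    if entry is None:
        return None
    song_ids, expires_at = entry
    return song_ids if time.time() < expires_at else None
```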
Using the user's favorite songs as an example, let me show how this works. We land on the list of favorite songs on the user's page. A user object is created, and its dynamic property favorite_songs is accessed. On first access, the collection of this user's favorite songs is initialized: the cache is checked for the list of identifiers, and if it is not there, the list is read from the database. Next comes a check for the presence of each of those songs in the cache by identifier. A list of the missing ones is compiled, and a dedicated procedure reads them from the database and puts them into the cache. After that, the array of songs is handed up to the business logic, which prepares their visual presentation as a page.
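Put together, the read path might look roughly like the sketch below. The cache client, the key names and the connection_user_favorite_song table are assumptions made for the example, not our actual code, and a real memcached client would fetch all the song keys in a single multi-get.

```python
class DictCache:
    """Stand-in for a memcached client exposing get/set."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value, ttl=None):   # ttl ignored in this stand-in
        self._data[key] = value

def favorite_songs(user_id, db, cache):
    # 1. Try the cached chain of song identifiers for this user's favorites.
    ids = cache.get(f"chain:user_favorites:{user_id}")
    if ids is None:
        rows = db.execute(
            "SELECT song_id FROM connection_user_favorite_song WHERE user_id = ?",
            (user_id,),
        ).fetchall()
        ids = [r[0] for r in rows]
        cache.set(f"chain:user_favorites:{user_id}", ids, ttl=300)

    # 2. See which of these songs are already cached.
    songs = {i: cache.get(f"song:{i}") for i in ids}

    # 3. Read the missing ones from the database in one query and cache them.
    missing = [i for i in ids if songs[i] is None]
    if missing:
        placeholders = ",".join("?" * len(missing))
        for song_id, title in db.execute(
            f"SELECT id, title FROM song WHERE id IN ({placeholders})", missing
        ):
            songs[song_id] = {"id": song_id, "title": title}
            cache.set(f"song:{song_id}", songs[song_id])

    # 4. Hand the ordered array of songs up to the business logic.
    return [songs[i] for i in ids]
```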
Using this architecture has helped us significantly offload the database server, and for a time it has postponed the problem of scaling it out (splitting it across several machines).
Related Links:
memcached
XCache - we use it, as it is the most productive in our case.
eAccelerator