Caching directories in several steps

While working on a workflow system, we needed to cache the directories (reference tables) used on the client side. The system has a three-tier architecture (database, application server, client), so there was plenty of room for imagination.
Initial conditions: several dozen directories, ranging in size from a few entries to several tens of thousands. In most directories, each entry consists of useful data (usually a string) and a record identifier (an integer).

Step One

The easiest way to give the client access to directories is to load each one before its first use and drop it from memory when it is no longer needed. For example, a directory is loaded every time the user opens a window that requires it and discarded when the window is closed. The solution is simple, but it generates too much traffic between the client and the server.

Step Two

An optimization suggests itself: do not fetch all the data again on an update if it has not changed on the server. To do this, the client downloads not only the directory itself but also a modification label, a number that identifies the “version” of the directory (a timestamp).
When the application server starts, this number equals one for every directory (zero is reserved, as discussed below). Each modification of a directory on the server increments the number by one. Because the timestamp is an integer, a very simple thread-safe function, InterlockedIncrement, is enough to update it.
When a client needs a directory, it sends the server the last timestamp it knows. If the timestamps match, the server sends the client 0 and nothing else, that is, the client's copy of the directory is up to date. If the timestamps differ, the server sends the client the current timestamp together with the new version of the directory.
There are two pitfalls here. First, the server must send the client the timestamp value it read before requesting the directory from the database. Otherwise the server may send old data together with a new timestamp. Second, the server must update the directory timestamp only after the modifying transaction has completed successfully. This matters because several clients can modify and request data at the same time, and other clients will not receive the new data until that transaction is closed.
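The exchange and both pitfalls can be sketched roughly as follows. This is a minimal illustration in Python, not the original implementation: the class DirectoryServer, its methods, and the dictionary standing in for the database are hypothetical, a lock plays the role of InterlockedIncrement, and it is assumed that a client with no cached copy sends 0 as its known timestamp.

```python
import threading

NOT_MODIFIED = 0  # reserved reply value: "your copy is up to date"


class DirectoryServer:
    """Hypothetical in-memory stand-in for the application server."""

    def __init__(self, db):
        self._db = db                          # directory id -> {entry id: text}
        self._versions = {d: 1 for d in db}    # every timestamp starts at 1
        self._lock = threading.Lock()          # stands in for InterlockedIncrement

    def commit_change(self, directory_id, entry_id, text):
        # Apply the change first and bump the timestamp only after the
        # "transaction" has succeeded (second pitfall).
        self._db[directory_id][entry_id] = text
        with self._lock:
            self._versions[directory_id] += 1

    def fetch(self, directory_id, client_version):
        # Read the timestamp BEFORE reading the data (first pitfall), so the
        # reply can never pair old data with a newer timestamp.
        with self._lock:
            version = self._versions[directory_id]
        if version == client_version:
            return NOT_MODIFIED, None
        return version, dict(self._db[directory_id])
```

A client would keep the returned timestamp next to its cached copy and pass it to the next fetch call; a reply of NOT_MODIFIED means the cached copy can be used as is.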

Step Three

This method served us well for a long time until another problem showed up: on high-latency connections (over the Internet rather than a LAN), most of the time was spent waiting for the server's response for each directory rather than transferring the directories themselves. About a dozen directories had to be refreshed before most actions, and in most cases the server replied that no update was required. As a result, processing a directory update takes microseconds, while waiting for the server's response takes 10-100 milliseconds, so a full refresh takes about a second (10 directories at 100 milliseconds each).
The solution is simple: request all the required directories from the server at once, in a single packet. That is, the client sends an array of pairs <directory ID> + <last timestamp of the directory known to the client>. The server responds with an array of <directory ID> + <current timestamp> + (if required) <the directory itself>.
This allowed users to work remotely without using a remote desktop.
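A rough sketch of the batched exchange, together with the latency arithmetic from this step. The function names and the plain dictionaries standing in for the server state are assumptions made for illustration only.

```python
def refresh_batch(server_versions, server_data, known):
    """Answer one batched request.

    known: list of (directory id, last timestamp known to the client).
    Returns a list of (directory id, current timestamp, data or None).
    """
    reply = []
    for directory_id, client_version in known:
        version = server_versions[directory_id]   # read the timestamp first
        if version == client_version:
            reply.append((directory_id, version, None))   # client copy is current
        else:
            reply.append((directory_id, version, dict(server_data[directory_id])))
    return reply


def waiting_time_ms(directories, latency_ms, batched):
    """Time spent waiting for replies only, ignoring transfer time."""
    return latency_ms if batched else directories * latency_ms
```

With 10 directories and a 100 millisecond round trip, waiting_time_ms(10, 100, batched=False) gives 1000, while the batched variant gives 100: the number of round trips, not the amount of data, dominates on slow links.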

Step Four

The number of users grew, and the data was updated more and more often. It became frustrating to reload an entire directory of one hundred thousand entries when only a single entry had changed.
A solution suggests itself: keep a directory change log on the application server, holding the IDs of the entries added and deleted at each timestamp. Every change to a directory is recorded in the log (only the ID of the directory entry and the type of change are stored). When the client sends its timestamp, the server reads the log starting from that timestamp and sends the client the added entries (retrieving them from the database) and the IDs of the deleted ones. Traffic between the client and the server, as well as the load on the database, drops by a factor of hundreds.
UPD: In my case the change log is stored not in the database but in the application server's RAM. This reduces the load on the database, although it complicates the operations (as described below).
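One possible shape for such an in-memory log is sketched below. The class ChangeLog and its methods are hypothetical. The sketch assumes that when an entry changes several times after the client's timestamp, only the latest change matters, and that the caller then loads the added entries from the database by ID.

```python
ADDED, DELETED = "added", "deleted"


class ChangeLog:
    """In-memory change log for one directory (hypothetical structure)."""

    def __init__(self):
        self.version = 1      # current timestamp of the directory
        self._entries = []    # (timestamp, entry id, change type), oldest first

    def record(self, changes):
        """Append the changes of one committed transaction under a new timestamp."""
        self.version += 1
        for entry_id, kind in changes:
            self._entries.append((self.version, entry_id, kind))

    def delta_since(self, client_version):
        """Return (ids to load from the database, ids to delete on the client)."""
        last = {}
        for version, entry_id, kind in self._entries:
            if version > client_version:
                last[entry_id] = kind    # the latest change to an entry wins
        added = sorted(i for i, k in last.items() if k == ADDED)
        deleted = sorted(i for i, k in last.items() if k == DELETED)
        return added, deleted
```

Because the log lives in RAM and starts empty with the server, a client without any cached copy would still need the full directory; the delta only helps clients that already hold an earlier version.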
The main caveat is that if the log is written at the moment the directory is changed and the transaction is then rolled back, the log contains changes that were never actually made.
The solution was to keep a preliminary log in each thread until the transaction is confirmed. Only after the transaction has completed is the preliminary log written to the main log, which is accessible to all threads.
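The preliminary log could look roughly like this. It is a sketch under the assumption that each transaction runs entirely on one thread; the class TransactionalLog and its method names are hypothetical, and threading.local is used to give every thread its own pending list.

```python
import threading


class TransactionalLog:
    """Shared log plus a per-thread preliminary log (hypothetical structure)."""

    def __init__(self):
        self.version = 1
        self.entries = []                  # (timestamp, entry id, change type)
        self._lock = threading.Lock()
        self._local = threading.local()    # each thread sees its own pending list

    def _pending(self):
        if not hasattr(self._local, "pending"):
            self._local.pending = []
        return self._local.pending

    def record(self, entry_id, kind):
        # Stays invisible to other threads until commit() is called.
        self._pending().append((entry_id, kind))

    def rollback(self):
        # The transaction failed: forget its changes, the main log is untouched.
        self._pending().clear()

    def commit(self):
        # Call only after the database transaction has been confirmed.
        pending = self._pending()
        if not pending:
            return
        with self._lock:
            self.version += 1
            self.entries.extend((self.version, i, k) for i, k in pending)
        pending.clear()
```

A rolled-back transaction therefore never changes the timestamp or the main log, which keeps this step consistent with the rule of bumping the timestamp only after a successful transaction.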

It would be great if the community suggested further optimization steps.

Source: https://habr.com/ru/post/144249/
