
Cache it: increasing ONLYOFFICE server stability with Redis

The main task we set ourselves while working on ONLYOFFICE Enterprise Edition was to improve stability. Mono helped us a lot in building an office suite for Linux (we have already written a little about this), but it also caused us plenty of concern: it was Mono that was behind the crashes of our http web servers.

The situation, of course, is not the most pleasant, so we decided to play it safe and run not one server but two. In normal operation they work in parallel, and when problems begin they back each other up: if one goes down, the other takes over everything that is happening. But this created a cache synchronization problem between the servers, and to solve it we needed Redis.

Next, we will talk a little about how we started working with Redis and what came of it.


How things are arranged



Let's start from the beginning: ONLYOFFICE runs on the nginx http server, with fastcgi-mono-server4 hosting the .NET applications.

For load balancing, we use the nginx ngx_http_upstream_module, configured via the upstream and fastcgi_pass directives as follows:

    upstream fastcgi_backend {
        server 127.0.0.1:9000;
        server 127.0.0.1:9001;
        keepalive 64;
    }

    server {
        listen 80;
        location / {
            ...
            fastcgi_pass fastcgi_backend;
            ...

That is: two fastcgi-mono-server4 instances listen at 127.0.0.1:9000 and 127.0.0.1:9001, and nginx (thanks to it!) balances the load between them.

In this setup, all incoming requests are distributed evenly between the two servers in round-robin fashion: the first request goes to the first server, the second to the second, the third again to the first, and so on.
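The round-robin scheme nginx applies here can be sketched in a few lines. This is a minimal Python illustration of the dispatch pattern, not ONLYOFFICE or nginx code; only the backend addresses come from the config above:

```python
from itertools import cycle

# The two fastcgi-mono-server4 backends from the nginx config above.
backends = ["127.0.0.1:9000", "127.0.0.1:9001"]

# nginx's default upstream behavior is round-robin: walk the list in a circle.
rotation = cycle(backends)

def dispatch(request_id):
    """Return the backend that would receive this request."""
    return next(rotation)

# Four consecutive requests alternate between the two backends.
targets = [dispatch(i) for i in range(4)]
```

With only two backends this degenerates into simple alternation, which is exactly what makes the cache mismatch described below so easy to hit.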

To speed things up, we use caching. The cache carefully stores the most frequently used data: users, groups, user-to-group membership relations, access rights, lists of portals, alert subscriptions, billing information and quotas, settings, and much more.

Now, the problem



As mentioned above, nginx distributes requests between the servers in a circle. This, combined with the fact that the free version offers no way to pin a session to a specific ip address (user), led to numerous errors caused by cache mismatches.

For example, a user decides to start life from a clean slate and changes his name on the portal. The change request lands on the first server, which saves our hero's new name in the database and also records it in its own cache. When the next request arrives at the second server, it turns out that this server simply knows nothing about the change: the user sees his old name again and realizes the new life has not begun. All because the information in the second server's cache was never updated. The update will happen eventually and the second server will learn everything, but only a few minutes later, once its cache is synchronized with the data from the database.
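The bug can be reproduced in miniature. Below is a toy Python model of the situation: two "servers", each with its own local cache in front of a shared database. All names and structure are illustrative, not ONLYOFFICE code:

```python
# Shared database, and two servers each holding a private local cache.
database = {"user:1:name": "Old Name"}

class Server:
    def __init__(self):
        self.cache = {}

    def get_name(self, key):
        # Read through the local cache, falling back to the database.
        if key not in self.cache:
            self.cache[key] = database[key]
        return self.cache[key]

    def set_name(self, key, value):
        # Write to the database and update only *this* server's cache.
        database[key] = value
        self.cache[key] = value

server1, server2 = Server(), Server()

server2.get_name("user:1:name")            # warms server2's cache: "Old Name"
server1.set_name("user:1:name", "New Name")

# The next poll happens to hit server2, whose cache was never told.
stale = server2.get_name("user:1:name")    # still "Old Name"
```

The database holds "New Name", yet server2 keeps serving "Old Name" until its cache expires, which is precisely the mismatch described above.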

Enter Redis



To solve this problem, we decided to use Redis, which took on the role of a shared cache for our ONLYOFFICE servers. Of course, there were some difficulties here too. Below we describe what they were and how we overcame them.

Main difficulties



Issue with the StackExchange.Redis NuGet package

What was wrong: the StackExchange.Redis NuGet package we chose for working with Redis refused to cooperate with Mono. After installing it, we launched the .NET application successfully; under Mono, however, it constantly failed with the error:

    It was not possible to connect to the redis server(s); to create a disconnected multiplexer, disable AbortOnConnectFail. SocketFailure on PING

How we solved it: simply. We built StackExchange.Redis.dll from source under Mono.

Increased page response times

What was wrong: pages with many cache accesses (everything related to lists of users, groups and access rights), as well as pages where cached data makes up a significant share (the list of all portals), became less responsive. After all, their data now lives in the memory of another process, or even on another physical machine, and is reached over network sockets, which is much slower than reading a process's own memory.

How we solved it: for the areas critical in number of hits and data size, local caches remained in place as before, but we bolted a small notification system onto them.

To every method that changes data cached this way, we added code that publishes information about the changed data via the Redis publish/subscribe mechanism. All servers receiving the notification then re-synchronize the affected objects with the database.
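The invalidation pattern can be sketched as follows. In this minimal Python sketch a tiny in-process bus stands in for Redis pub/sub (with real Redis you would PUBLISH and SUBSCRIBE on a channel); the channel name, keys and classes are all illustrative assumptions, not ONLYOFFICE code:

```python
from collections import defaultdict

class Bus:
    """Minimal in-process stand-in for Redis publish/subscribe."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, handler):
        self.subscribers[channel].append(handler)

    def publish(self, channel, message):
        for handler in self.subscribers[channel]:
            handler(message)

database = {"user:1:name": "Old Name"}
bus = Bus()

class Server:
    def __init__(self, bus):
        self.cache = {}
        self.bus = bus
        # Every server listens for invalidation notifications.
        bus.subscribe("invalidate", self.on_invalidate)

    def on_invalidate(self, key):
        # Drop the stale entry; the next read reloads it from the database.
        self.cache.pop(key, None)

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = database[key]
        return self.cache[key]

    def set(self, key, value):
        database[key] = value
        # Tell every server (including ourselves) that this key changed.
        self.bus.publish("invalidate", key)

server1, server2 = Server(bus), Server(bus)
server2.get("user:1:name")                # warms server2's cache
server1.set("user:1:name", "New Name")    # publishes an invalidation
fresh = server2.get("user:1:name")        # reloads from the database
```

Compared with the stale-cache sketch earlier, the only change is the published invalidation, which is enough for server2 to serve the new name immediately instead of minutes later.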

The problem with long operations

What was wrong: when a user initiates a file operation, a task is created and queued; any thread freed from a previous task then picks up its execution. The number of threads is capped to avoid overloading the server with file operations (this functionality was implemented on top of a specialized version of TaskScheduler).

While file tasks run, the user is shown a progress indicator, and on completion either a success or an error message. This is implemented by periodically polling the status of file operations through the documents API module, with all progress information stored in the local cache of the process. But once several ONLYOFFICE servers were running, some status requests landed on a server other than the one executing the file operation. This led to errors in displaying operation status to the user (tasks would suddenly appear and then disappear).

How we solved it: we built a distributed task manager. It stores the status of running tasks in Redis, limits the number of worker threads, synchronizes task status between servers, and removes completely hopelessly hung tasks.

By the way, we took the standard System.Threading.Tasks tasks from Microsoft .NET 4.0 as a basis.
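The core idea, stripped of the thread-limiting and cleanup details, is that every server reads and writes task progress through one shared store. In this Python sketch a thread-safe dict stands in for Redis, and the class and task names are illustrative assumptions, not the actual task manager:

```python
import threading

class SharedStatusStore:
    """Stand-in for Redis: a thread-safe key/value store of task statuses."""
    def __init__(self):
        self._lock = threading.Lock()
        self._statuses = {}

    def set_status(self, task_id, progress):
        with self._lock:
            self._statuses[task_id] = progress

    def get_status(self, task_id):
        with self._lock:
            return self._statuses.get(task_id)

store = SharedStatusStore()

def run_file_operation(task_id):
    # The server executing the task reports progress to the shared store.
    for progress in (0, 50, 100):
        store.set_status(task_id, progress)

run_file_operation("move-folder-42")

# Any server answering a status poll sees the same value, regardless of
# which server actually executed the operation.
status_seen_elsewhere = store.get_status("move-folder-42")
```

Because the status lives outside any single process, it no longer matters which server nginx routes the poll to, which removes the appearing-and-disappearing tasks.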

Problem with session providers

What was wrong: we also decided to move sessions to Redis, choosing Microsoft's native session provider for Redis, RedisSessionStateProvider. It is used and maintained by Microsoft for the Azure platform, so you can run it on your own server, but only at your own risk. As it turned out, it works fine under Windows. But not with Mono: stability problems appeared immediately, and even under a small load the provider crashed with a NullReferenceException. We decided to try a third-party provider, but ran into the same NullReferenceException there.

How we solved it: we dug into the internal structure of the session providers and how their handling in ASP.NET differs between .NET and Mono. It turned out that if a request arrives without cookies, the session id under Mono is null, and the providers do not expect such a dirty trick. We submitted a pull request adding a null check on the session id.

Of the two session providers, the third-party one seemed faster and more lightweight to us, so we chose it. To plug it in, we replaced the standard provider in Web.config:

    <sessionState mode="Custom" customProvider="RedisSessionStateProvider">
      <providers>
        <add name="RedisSessionStateProvider"
             type="RedisSessionProvider.RedisSessionStateStoreProvider, RedisSessionProvider" />
      </providers>
    </sessionState>

In addition, some code had to be added at application startup:

    var configuration = RedisCachingSectionHandler.GetConfig();

    RedisConnectionConfig.GetSERedisServerConfig = (HttpContextBase context) =>
    {
        if (configuration.RedisHosts != null && configuration.RedisHosts.Count > 0)
        {
            var host = configuration.RedisHosts[0];
            return new KeyValuePair<string, ConfigurationOptions>(
                "DefaultConnection",
                ConfigurationOptions.Parse(String.Concat(host.Host, ":", host.CachePort)));
        }
        return new KeyValuePair<string, ConfigurationOptions>();
    };

Results



Here we will just briefly say that we are satisfied with the work done and its results. Thanks to Redis, we managed to increase both the server's resiliency and its scalability, which is simply vital for the enterprise server version with its large (and constantly growing) number of users.

As for plans, we are considering replacing Mono with the new cross-platform version of ASP.NET. For now, we are sizing each other up.

Source: https://habr.com/ru/post/276395/
