
Translation: How GitLab uses Unicorn and unicorn-worker-killer

This is a translation of a short article in which GitLab engineers explain how their application runs on Unicorn and what they do about memory leaks. It can be seen as a simplified version of an article already translated by another author.

Unicorn


To handle HTTP requests from git and from users, GitLab uses Unicorn, a prefork Ruby server. Unicorn is a daemon written in Ruby and C that can load and run a Ruby on Rails application, in our case GitLab Community Edition or GitLab Enterprise Edition.

Unicorn has a multi-process architecture, both to make use of multi-core systems (processes can run in parallel on different cores) and for fault tolerance (an abnormally terminated process does not bring down GitLab). On startup, the main Unicorn process loads Ruby and GitLab into memory and then starts a number of worker processes that inherit this "initial" memory snapshot. The main Unicorn process does not handle incoming requests; the worker processes do. The operating system's network stack receives incoming connections and distributes them among the worker processes.

In an ideal world, the main process spawns the pool of worker processes once, and they then handle incoming network connections forever. In practice, worker processes can crash or be killed on a timeout. If the main Unicorn process finds that one of the worker processes has been handling a request for too long, it kills that process with SIGKILL (kill -9). Regardless of how a worker process ended, the main process replaces it with a new one that inherits the same "initial" state. One of Unicorn's features is the ability to replace defective worker processes without dropping users' requests.
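As a purely conceptual illustration of this spawn-and-replace cycle (not Unicorn's actual code), the master/worker loop can be sketched in a few lines of Ruby: the master forks workers, then simply waits for any child to die and forks a replacement.

  # Conceptual sketch only: a toy prefork master that respawns dead workers.
  # This is an illustration of the idea, not Unicorn's implementation.
  WORKER_COUNT = 3

  def spawn_worker(number)
    Process.fork do
      # The child inherits the master's memory (the preloaded application)
      # and would run its own accept/handle-request loop here.
      Process.setproctitle("toy-worker #{number}")
      loop { sleep 1 } # placeholder for real request handling
    end
  end

  workers = {}
  WORKER_COUNT.times { |n| workers[spawn_worker(n)] = n }

  loop do
    # The master only supervises: when any worker dies (crash, timeout kill,
    # memory-based self-termination), it is replaced by a fresh fork.
    dead_pid = Process.wait
    number = workers.delete(dead_pid)
    workers[spawn_worker(number)] = number
  end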
Here is what a worker process timeout looks like in unicorn_stderr.log. The main process PID is 56227:

[2015-06-05T10:58:08.660325 #56227] ERROR -- : worker=10 PID:53009 timeout (61s > 60s), killing
[2015-06-05T10:58:08.699360 #56227] ERROR -- : reaped #<Process::Status: pid 53009 SIGKILL (signal 9)> worker=10
[2015-06-05T10:58:08.708141 #62538] INFO -- : worker=10 spawned pid=62538
[2015-06-05T10:58:08.708824 #62538] INFO -- : worker=10 ready


The basic Unicorn settings for managing processes are the number of worker processes and the timeout after which a worker is killed. A description of these settings can be found in this section of the GitLab documentation.
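For illustration, these settings live in Unicorn's Ruby configuration file (commonly unicorn.rb). The sketch below uses example values and paths rather than GitLab's exact shipped configuration:

  # unicorn.rb — a minimal sketch; the numbers and listen address are illustrative.
  worker_processes 3        # number of worker processes forked by the master
  timeout 60                # a worker busy with one request longer than this is killed with SIGKILL
  preload_app true          # load the Rails app in the master before forking,
                            # so workers inherit the "initial" memory snapshot
  listen "127.0.0.1:8080", tcp_nopush: true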

unicorn-worker-killer


GitLab has memory leaks. These leaks manifest themselves in long-running processes, in particular in the worker processes created by Unicorn (there are no such leaks in the main Unicorn process, since it does not handle requests).

To combat these memory leaks, GitLab uses unicorn-worker-killer, which modifies Unicorn worker processes so that they check their memory usage once every 16 requests. If a worker process exceeds the configured memory limit, it terminates, and the main Unicorn process automatically replaces it with a new one.

This is in fact a good way to deal with memory leaks, since Unicorn's design ensures that a user's request is not lost when a worker process terminates. Moreover, unicorn-worker-killer terminates the process between requests, so it does not interfere with request handling.
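The gem is wired in as Rack middleware in config.ru, before the application itself. The sketch below shows roughly what such a setup looks like; the 200 MB and 250 MB limits mirror the numbers mentioned in this article, and the file layout is assumed rather than copied from GitLab's repository:

  # config.ru — a sketch of enabling unicorn-worker-killer for a Rails app.
  require 'unicorn/worker_killer'

  # Restart a worker once its memory exceeds a threshold picked at random
  # between the two limits (the randomness keeps workers from restarting in
  # lockstep). By default memory is checked once every 16 requests.
  use Unicorn::WorkerKiller::Oom, (200 * (1 << 20)), (250 * (1 << 20))

  require ::File.expand_path('../config/environment', __FILE__)
  run Rails.application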

This is what a worker process restart caused by a memory leak looks like in unicorn_stderr.log. As you can see, the process with PID 125918 decides, after inspecting itself, to terminate. The memory threshold in this case is 254802235 bytes, i.e. roughly 250 megabytes. GitLab uses as the threshold a random value between 200 and 250 megabytes. The main Unicorn process with PID 117565 then spawns a new worker process with PID 127549:

[2015-06-05T12:07:41.828374 #125918] WARN -- : #<Unicorn::HttpServer:0x00000002734770>: worker (pid: 125918) exceeds memory limit (256413696 bytes > 254802235 bytes)
[2015-06-05T12:07:41.828472 #125918] WARN -- : Unicorn::WorkerKiller send SIGQUIT (pid: 125918) alive: 23 sec (trial 1)
[2015-06-05T12:07:42.025916 #117565] INFO -- : reaped #<Process::Status: pid 125918 exit 0> worker=4
[2015-06-05T12:07:42.034527 #127549] INFO -- : worker=4 spawned pid=127549
[2015-06-05T12:07:42.035217 #127549] INFO -- : worker=4 ready


Something else stands out when studying this log: the worker process had been serving requests for only 23 seconds before terminating because of excessive memory use. This is currently normal for gitlab.com.

Such frequent restarts of worker processes on GitLab servers can be a source of concern for system administrators and DevOps engineers, but in practice this is often normal behavior.

Source: https://habr.com/ru/post/270227/

