Updates on the fly (zero-downtime deployment) in general and in Ruby on Rails

First, let's settle the definitions. By updating on the fly we mean a system update that does not disrupt normal operation: customers keep working, visitors keep coming, and nobody sees errors, increased response times, or a “Closed for maintenance” sign.

Why would you need it? If you have to ask, you don't. Hang up the sign and sit down to lunch.

How is it done? With difficulty. Why? There are two main reasons:
- you cannot update the system instantly and atomically (that is, exactly between two HTTP requests). With a naive approach, users will at least notice a long response time, or even an error if, for example, the database has already been updated but the code has not;
- the state and configuration of the system live both on the client and on the server. Examples: session data, form field names, URLs in links, JavaScript state on a page the user already has open.

General solution

In general, the solution can be formulated as follows: it is necessary to ensure that the code of version N + 1 is compatible with the state of versions N and N + 1, then update the state to N + 1.
In practice, achieving such compatibility leads to a large number of difficulties, some obvious and some not. Let us examine typical cases in a Ruby on Rails application.

DB schema change

Adding a field to a table is theoretically compatible with the previous version of the code. In practice too, unless there is some particularly evil metaprogramming involved.

Deleting a field is obviously incompatible if the old code uses it, and non-obviously incompatible in any case: ActiveRecord caches the list of fields and names all of them in, for example, INSERT statements. The way out: first update the code to an intermediate version that a) does not use the field being deleted and b) removes the field from that cache itself; then update the database; then update the code to the final version.
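A framework-free sketch of why the cached field list matters and what the intermediate release has to do. `TinyModel`, its `ignored:` argument and the `users.nickname` field are hypothetical stand-ins, not ActiveRecord internals. In Rails 5 and later, the built-in `ignored_columns` setting on a model plays roughly the role of the `ignored` list here.

```ruby
# Hypothetical stand-in for a model that caches its field list at boot,
# the way the article describes ActiveRecord doing.
class TinyModel
  def initialize(table, columns, ignored: [])
    @table = table
    # The cache: computed once, then reused for every generated statement.
    @columns = (columns - ignored).freeze
  end

  # Builds an INSERT that names every cached field, even unset ones.
  # No escaping is done here; this is an illustration, not a query builder.
  def insert_sql(attrs)
    values = @columns.map { |c| attrs.key?(c) ? "'#{attrs[c]}'" : "NULL" }
    "INSERT INTO #{@table} (#{@columns.join(', ')}) VALUES (#{values.join(', ')})"
  end
end

schema_before_drop = %w[id email nickname]
attrs = { "id" => 1, "email" => "a@example.com" }

# Release N: still names `nickname`, so it breaks once the field is dropped.
puts TinyModel.new("users", schema_before_drop).insert_sql(attrs)

# Intermediate release: same schema, but the field is hidden from the cache,
# so dropping it afterwards cannot affect the generated SQL.
puts TinyModel.new("users", schema_before_drop, ignored: %w[nickname]).insert_sql(attrs)
```

The first statement still mentions `nickname` although the code never set it, which is exactly the non-obvious incompatibility.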

Renaming a field is a bit more complicated:
- create a field with the new name;
- update the code to an intermediate version that a) reads data from both fields (old and new) and b) writes data to both fields;
- migrate the data from the old field to the new one;
- finally, remove the old field correctly, as described in the previous paragraph.
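The intermediate step of the rename can be sketched without any framework. Everything named here is an assumption for illustration: the `login` (old) and `username` (new) fields, the `UserRow` wrapper, and a Hash standing in for a database row. The read order (new field first, old field as fallback) is one possible policy, not the only one.

```ruby
# Hypothetical wrapper used by the intermediate release while `login`
# is being renamed to `username`.
class UserRow
  def initialize(row)
    @row = row # a Hash standing in for one database row
  end

  # Read from both: prefer the new field, fall back to the old one for
  # rows the data migration has not reached yet.
  def username
    @row["username"] || @row["login"]
  end

  # Write to both, so release N code, which only knows `login`,
  # keeps seeing fresh data.
  def username=(value)
    @row["username"] = value
    @row["login"] = value
  end

  def to_h
    @row.dup
  end
end

# The data migration step: copy old values into the new field where missing.
def backfill!(rows)
  rows.each { |row| row["username"] ||= row["login"] }
end

row = UserRow.new("login" => "alice", "username" => nil)
puts row.username      # served from the old field
row.username = "alice2"
p row.to_h             # both fields now hold the new value
```

Once `backfill!` has run over every row and no release N code is left running, the old field can be dropped following the deletion procedure.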

Adding and removing indexes is compatible with the previous version of the code, provided that a) the code does not use query hints that name indexes explicitly, and b) removing the index does not slow the old code down.

For changes in data semantics it is hard to single out general cases, since everything depends on the application domain. Probably the only simple and typical case, changing a field's type, is handled the same way as renaming.

Client-server interaction change

Changing form field names, or anything more substantial, requires additional code (most likely in the controller) that accepts field values from both the old form and the new one. Browser windows can stay open for a long time, so this code has to remain in the application for a while.
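A minimal sketch of such a compatibility layer, assuming a hypothetical signup form whose fields `name` and `mail` were renamed to `full_name` and `email`. Params are modelled as a plain Hash with string keys; in a real controller they would come from the framework's params object.

```ruby
# Hypothetical rename: release N posted `name` and `mail`,
# release N+1 posts `full_name` and `email`.
LEGACY_FIELD_NAMES = {
  "name" => "full_name",
  "mail" => "email"
}.freeze

# Returns params keyed by the new names, whichever form they came from.
# A value sent under the new name wins over its legacy twin.
def normalize_signup_params(params)
  params.each_with_object({}) do |(key, value), result|
    new_key = LEGACY_FIELD_NAMES.fetch(key, key)
    legacy_duplicate = key != new_key && params.key?(new_key)
    result[new_key] = value unless legacy_duplicate
  end
end

p normalize_signup_params("name" => "Ann", "mail" => "ann@example.com")
p normalize_signup_params("full_name" => "Ann", "email" => "ann@example.com")
```

Both calls yield the same Hash, so the rest of the controller only ever deals with the new names, and the mapping can be deleted once old pages are no longer expected to be open.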

Changes to the semantics of data in sessions and cookies also have to be handled by separate code that understands both formats. Sessions live a long time, cookies even longer. You don't want to lose a customer's shopping cart or force them to enter their login and password again, do you? (Habr, shame on you!)
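A sketch of a reader that understands both formats, assuming a hypothetical shopping cart whose session layout changed from an array of product ids to a versioned Hash of quantities. The key names and the `"v"` marker are illustrative, and the session is modelled as a plain Hash.

```ruby
# Hypothetical session layouts:
#   release N:   session["cart"] = [12, 12, 40]   (array of product ids)
#   release N+1: session["cart"] = { "v" => 2, "items" => { "12" => 2, "40" => 1 } }

# Returns the cart as { product_id_string => quantity } for either layout.
def read_cart(session)
  raw = session["cart"]
  case raw
  when Hash
    raw.fetch("items", {})  # already in the new format
  when Array
    raw.map(&:to_s).tally   # old format: fold repeated ids into quantities (Ruby 2.7+)
  else
    {}                      # no cart yet
  end
end

# New code always writes the new layout, so old sessions convert lazily.
def write_cart(session, items)
  session["cart"] = { "v" => 2, "items" => items }
end

old_session = { "cart" => [12, 12, 40] }
write_cart(old_session, read_cart(old_session))
p old_session
```

Converting on the next write means no batch migration of sessions is needed, but the old-format branch must stay until the longest-lived session or cookie has expired.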

Changes to the addresses of application pages and actions should always be backward-compatible. Keep the old routes, point redirects at them, whatever works. URLs should be the most stable part of a web application: they are your public API, used by your users and by the search engines that bring you those users. If you treat them that way, you will have no problems on this front.
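To illustrate keeping old addresses alive, here is a sketch of a redirect table written as a Rack-style callable rather than as Rails routes, so that it runs on its own. The paths are hypothetical. In a Rails application the same effect is usually achieved in the routes file with the `redirect` helper.

```ruby
# Hypothetical map of retired paths to their current equivalents.
MOVED_PATHS = {
  "/catalog/show" => "/products",
  "/signup.html" => "/users/new"
}.freeze

# Rack-style callable: takes the request env Hash and returns
# [status, headers, body]. Known old paths get a permanent redirect
# that preserves the query string; everything else is a 404 here.
LegacyRedirects = lambda do |env|
  target = MOVED_PATHS[env["PATH_INFO"]]
  if target
    query = env["QUERY_STRING"].to_s
    target += "?#{query}" unless query.empty?
    [301, { "location" => target }, []]
  else
    [404, { "content-type" => "text/plain" }, ["Not Found"]]
  end
end

status, headers, = LegacyRedirects.call("PATH_INFO" => "/signup.html", "QUERY_STRING" => "ref=mail")
puts "#{status} #{headers['location']}"
```

A permanent (301) redirect is used on the assumption that the old address will never come back, which also lets search engines update their links.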

If you use the asset pipeline, it is enough not to delete the assets of the previous code version, so that pages rendered by the old code can still load them. It's that simple.

Restart

Code compatibility is not everything. How do you put the new code into service? How many web or app servers do you have? Let's consider the options.

If you have more than one server behind a load balancer, you can stop reading: you already know everything :) For everyone else it is also fairly obvious: ~~on a dark night,~~ pick the time of lowest load on the system and update the servers one at a time, taking each one out of the balancer for the duration of its update.

If you have a single server running Passenger behind Nginx or Apache httpd, you will have to move to Unicorn. Even Passenger 3, which claims zero-downtime restarts, does it rather naively: first it kills the old workers, then it starts new ones. As a result, visitors get a long response time, in fact no shorter than your application's startup time.

With Unicorn we can replay the multi-server scenario in miniature. In before_fork, send the TTOU signal to the old master process, so that each new worker shuts down one old worker. At the end, send QUIT to the old master, and that's it. If you have enough memory for twice the number of workers, you can simplify this and stop the old processes all at once at the end of the restart rather than gradually.

Tip: use the preload_app true option even if you are not on Ruby Enterprise Edition; otherwise you will find out too late that new workers crash on startup because of an error.
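The gradual restart and the preload_app tip can be combined in a Unicorn config file along the following lines. This is a sketch modelled on the example config shipped with Unicorn. The worker count and pid path are assumptions, and the ActiveRecord lines only matter if the application uses it. The restart itself is started by sending USR2 to the running master, which launches a new master next to the old one.

```ruby
# config/unicorn.rb (sketch; worker count and paths are assumptions)
worker_processes 4
pid "/var/www/app/shared/pids/unicorn.pid"

# Load the application in the master before forking, so that a boot error
# stops the new master instead of surfacing in every new worker.
preload_app true

before_fork do |server, worker|
  # With preload_app, the master's database connection must not be
  # shared with the forked workers.
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)

  # After USR2, the old master's pid file gets an ".oldbin" suffix.
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      # Each new worker retires one old worker (TTOU); the last new worker
      # tells the old master to finish its requests and exit (QUIT).
      sig = (worker.nr + 1) >= server.worker_processes ? :QUIT : :TTOU
      Process.kill(sig, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # The old master is already gone; nothing to do.
    end
  end
end

after_fork do |server, worker|
  # Each worker opens its own database connection.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end
```

For the simpler variant with enough memory for a double set of workers, the signal choice can be reduced to sending QUIT once, when the last new worker is forked.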

Conclusion

Think again: do you really need all this? Really? Maybe it would be easier to put a fresh episode of ~~+100500~~ TED on the maintenance page, run cap deploy and go have some tea? Oh, right... users, sales, profits...

Source: https://habr.com/ru/post/145793/
