Thoughts on deploying web applications on a test server

Foreword

The following text is the result of practical experience and self-educational impulses of a person who does not have a systematic education in any of the areas about which he (that is, me) is taken to reason. Therefore, abstruse reasoning here will be interspersed with platitudes. Beat me for the first and ignore the second. For someone, they can be a revelation.

I will try to describe the ideal options for setting up a test web sever, although I understand what kind of mess they are usually going on. I will focus on the situation when I have to deploy often, that is, the project lives on the server in the active development stage or several projects at different stages. Projects are handled by different developers or teams, so projects need to be isolated from each other. But the server is internal, therefore such a degree of isolation and automation of administrative processes, as on servers for rent, is not needed.

I will focus mainly on using different versions of Python as the language of supported web applications. Although many things will certainly be true for other languages, for example, Ruby or Perl.

The Python program interacts with the web server using the WSGI protocol (reads like “wiskey”), described in PEP 3333 and its predecessor, PEP 333 .
')

Single-tier architecture

The advantage of a one-tier approach is the ease of configuration and modest memory consumption at the expense of flexibility and, in some cases, performance.

The most popular web server operating in such conditions, the real “Swiss army knife” of the modern web is Apache HTTPd (this name is often abbreviated to “Apache”). I will keep in mind the current versions of Apache: 2.2 and 2.4.

To connect WSGI applications to Apache directly, without the participation of second-level servers, a plug-in (in Apache terminology - module) mod_wsgi is used . WSGI is an advanced, Python-specific version of the CGI protocol. Technically, the Python interpreter (more precisely, CPython) is built into mod_wsgi as an external library: statically or dynamically — this is determined when the module is built. It is clear that only one version of CPython can be built into mod_python at the same time, just like Apache can address only one mod_wsgi module in one installation. Hence the limitation: in the one-tier architecture with mod_wsgi, we cannot use Python2 and Python3 on the server at the same time .

The choice of a multiprocessing module is essential for us. To efficiently distribute the load between different computational cores in multi-core (multiprocessor) computer configurations, the web server must be able to simultaneously process several requests, initiating the launch of several instances of the web application. There are two types of multiprocessing modules that perform the function described above in different ways.

prefork

Simplified execution of a CGI request by a web server in a UNIX-like environment looks like this: upon receiving a request, the server spawns the CGI application process, passing request parameters to it in the environment variables, and the body in the standard input stream. The very first optimizations were caused by the fact that the process spawned long enough, which increases the response time to the query. In order not to waste time on launching a CGI program (in our case, the Python interpreter embedded in mod_wsgi), it is enough to start it in advance and keep it in suspension without closing the input stream until the server receives a request. If the server can execute several code streams at the same time, it makes sense to start several CGI handlers and ensure that new ones are launched as they are completed. The number (pool size) of running and waiting handlers can be configured. Having processed one request, the handler process can accept the next. If there are too few requests, the server will kill the unnecessary idle processes. The server also kills processes for preventive purposes, after the number of requests processed by them reaches a certain (also customizable) limit.

This is exactly how the mod_prefork and mod_itk modules work. The latter is the most interesting for us, as it is able to serve each virtual host on behalf of the user set in the settings of this virtual host. But more on that later.

worker

The second way to “write off” the costs of spawning a CGI process, and at the same time reducing the overall load on the server, is to use one process to simultaneously serve several requests, in other words, in multi-threaded mode.

The described strategy is implemented by the mpm_worker and mpm_event modules. The latter also relieves the main thread of processing a request from certain accounting actions and so far has experimental status.

Worker has long been considered unstable and unsafe - not by itself, but due to the fact that many other Apache modules were not adapted to a multi-threaded mode of operation or had errors in its implementation. But over time, these problems were solved, and now the worker is safe enough to use in action.

It must be said that due to such a feature of CPython as GIL, mod_wsgi was rather easily adapted to multithreaded work, however this same feature does not allow to get the full benefit from multithreading. GIL guarantees that within the framework of one process all threads of the Python code will be executed sequentially quantum by quantum, which creates only the illusion of simultaneous execution. But this does not apply to the C code that runs (or runs from) Python, but parsing the request parameters, mapping the address to a WSGI handler, etc. — all this is done in C code. Accordingly, a certain gain in performance from the use of mpm_worker sites written in Python still have.

Two-tier architecture

The two-tier architecture assumes sharing of web servers of different levels: external (front end) and internal (back end).

An external server serves as an intermediary between the user and internal servers using “reverse proxy” -like protocols and plug-ins. In addition, the external server does what the HTTP protocol was created for in 1990: it gives the client static files (assets). Although recently this function in the production environment is often transferred to CDN services.

Apache can also be used as an external server, but lightweight nginx or, under limited resources, lighttpd are more suitable for this role. In this role, the server requires only a quick response to requests and modesty regarding the resources consumed.

The internal server usually depends on the architecture and language of the application implementation. Python applications often use uWSGI , Green Unicorn or their own servers based on multi-threaded network frameworks such as twisted or Tornado. Many web frameworks also include web servers that are more or less usable in combat conditions. For example, the web server built into Django is designed exclusively for debugging, and should be replaced in production with something more serious. The built-in web server CherryPy , on the contrary, is optimized for combat load.

Special requirements for the test server (CI server)

File Access Issue

The network architecture in POSIX systems assumes that only privileged users have access to ports 1-1024. Ports 80 (HTTP) and 443 (HTTPS) are subject to this limitation, so the parent process of the web server (external, if we are talking about a two-tier architecture), always has root-rights.

But it is undesirable to give the CGI script unnecessarily wide rights for security reasons. Therefore, when working with scripts and data, the web server always reduces privileges to the level of a regular user. Such a user in Apache is set by the environment variables APACHE_RUN_USER and APACHE_RUN_GROUP. On Debian and its derivative distributions, this user is named 'www-data'. To further enhance the security of the system, the user www-data, like other service users, does not have a shell and cannot log in to the system.

If the web application is installed on the server once and will not be changed in the future (with the exception of the files uploaded by the user), then it would be logical to log in to the server as a user with high privileges, deploy the files to the appropriate directory ('/ var / www' in Debian) and change the ownership of these files:

# chown -R www-data:www-data /var/www/

It is clear that for the development server this method is inconvenient and raises concerns due to the constant use of the privileged account. The question of the ideal permissions to directories and web server data with a detailed answer can be found here . The article got into the list of canonical questions on Serverfault, and that means something.

The essence of the article is as follows:

if there is one responsible for the deployment of the site, you need to make it the user-owner of all the files and directories of the site, and the group-owner to make 'www-data', set the access codes to 750 for directories (770 for directories into which site users can download files) and 640 for files (660 for files recorded by the server). umask in startup scripts must be set to 027,
if the site must be accessed by several people, you must make the root user the owner group will be the group in which all developers will be, and the web server will get access to these files at the expense of the lower three bits,
or set the owner of the data www-data, and the group-owner - a group of developers, but then you will need to change its owner after creating each file. The other responders mention the possibility of setting the suid and guid bits on the directories so that the owner of the created object is not the creator, but the owner of the directory in which the object was created. Such a non-standard way of using these bits was transferred to Linux from * BSD, and I’m not even sure if it will work if the files are created by a web server,
although ideally, the development team should use automatic deployment tools ( Puppet , Chef ) or CI environment ( Jenkins ) to reduce the situation to one responsible user, with the only difference that this user will be a bot,
as a result, it is concluded that complete isolation of virtual hosts with the help of setting access rights is impossible to achieve: the vulnerability of one site will allow an attacker at least to read the code of all web applications on this server.

What is not mentioned in the discussion is the furious complexity of all these manipulations with chmod and chown, and, as a result, a high probability of human errors leading to the occurrence of vulnerabilities.

Everything becomes much simpler and safer if you use the means of privilege separation available in Apache, or switch to a two-tier architecture.

Apache Privilege Splitter

mpm_itk

The mpm_itk module enables us to determine in the simplest and most natural way which user will run our web application:

 <VirtualHost *:80> <IfModule mpm_itk_module> AssignUserID johnd developers </IfModule> … </VirtualHost>

mod_suexec

If for some reason we are forced to use another multiprocessor module, then mod_suexec will come to the rescue . Its disadvantage can be considered perhaps the relative difficulty in setting up, but this is only if we want to install it from source. In distributions everything is automated. And the use of mod_suexec is again reduced to a single directive in the virtual host configuration:

 <VirtualHost *:80> <IfModule mod_suexec.c> SuexecUserGroup johnd developers </IfModule> … </VirtualHost>

Two-tier architecture on the test server

Often the two-tier architecture is used to optimize server performance. Developers are more concerned with the convenience of deploying a web application, rather than the speed of its work. Using a two-tier architecture on a test server also has certain advantages.

The advantages include the fact that the developer is responsible for installing and configuring the internal server, he also determines its optimal parameters and the launch method. The internal server works with developer privileges, and the question of the optimal setting of access rights to program files and project directories is no longer worth it. Host logs are also fully at the disposal of the developer. A web application can use the version of Python that is specified using virtualenv, regardless of other projects. All that a server administrator needs to take care of is directories with statics and a UNIX socket or an unprivileged port through which the external and internal servers will talk.

The developer is also responsible for the automatic launch of the server and its uninterrupted operation. If the first is easily solved by the command “crontab -e”, then the second can be provided with the help of supervisor .

Web server restart

When using Apache as a single server, add the “ MaxRequestsPerChild 1 ” directive to the settings. With it, the developer will not need to restart the server. For each request, a new instance of the interpreter will be launched, therefore, all changes in the site code will immediately be reflected in its work.

Of course, a ban on reusing the request handler will reduce server performance, but otherwise you will have to restart the server every time the code changes. There are two ways to restart Apache: WSGI-specific and generic.

WSGI-specific way is to update the date of editing the WSGI script. This can be done with the touch command. This method works only when mod_wsgi is configured to work in daemon mode. (In general, it is highly desirable to configure mod_wsgi to work in daemon mode, although this mode is turned off by default.)
The common method — the “apachectl restart” command or a similar one — requires root rights. Of course, you can get away with the appropriate sudo setting, but using sudo is in itself harder and more dangerous than it might seem. Therefore, this method is undesirable.

If a two-tier architecture is used, then only the internal server needs to be rebooted. It works on behalf of the same user on whose behalf the deployment is being performed, therefore, there are no problems with its reboot (or setting up to work without the need for a reboot).

findings

The two-tier web server architecture is optimal for a test server used by several developers or development teams within the same enterprise. It will release the server administrator from a large amount of routine, developers will give more freedom in choosing the means of implementing their projects, will allow more efficient use of hardware resources. At the same time, ease of development can sometimes be achieved at the cost of reduced productivity. But even if the organization is so constrained in funds that it is forced to have test and combat projects on one server, security and a fairly good level of performance are quite achievable in this case.

Source: https://habr.com/ru/post/266473/

All Articles