Web application load scaling

As a web application grows in popularity, supporting it inevitably demands more and more resources. At first you can (and certainly should) fight the load by optimizing the application's algorithms and/or architecture. But what do you do when everything that could be optimized already has been, and the application still cannot cope with the load?

Optimization

First of all, sit down and consider whether you really have optimized everything:

Are database queries optimal (EXPLAIN analysis, index usage)?
Is the data stored correctly (SQL vs NoSQL)?
Is caching used?
Are there any redundant requests to the file system or the database?
Are the data processing algorithms optimal?
Are the environment settings optimal: Apache / Nginx, MySQL / PostgreSQL, PHP / Python?

Each of these points deserves an article of its own, so covering them in detail here would be out of place. The important thing is that before you start scaling the application, you should optimize it as far as possible: it may well turn out that no scaling is needed at all.

Scaling

So, suppose the optimization has been done but the application still cannot cope with the load. The obvious solution is then to spread it across several hosts, raising overall performance by adding resources. This approach is called scaling the application. More precisely, scalability is the ability of a system to increase its performance as the resources allocated to it increase. There are two kinds of scaling: vertical and horizontal. Vertical scaling means increasing application performance by adding resources (CPU, memory, disk) within a single node (host). Horizontal scaling is typical of distributed applications and means increasing application performance by adding more nodes (hosts).

Clearly, the simplest option is a plain hardware upgrade (CPU, memory, disk), that is, vertical scaling. This approach also requires no changes to the application. However, vertical scaling reaches its limit very quickly, after which the developer and the administrator have no choice but to scale the application horizontally.

Application architecture

Most web applications are distributed by design, since their architecture can be split into at least three layers: the web server, the business logic (application), and the data (database, static files).

Each of these layers can be scaled. So if your application and database live on the same host, the first step should of course be to move them to separate hosts.

Bottleneck

When you start scaling a system, the first thing to do is determine which layer is the bottleneck, that is, which one works slower than the rest of the system. To begin with, you can use ordinary utilities such as top (htop) to estimate CPU and memory consumption, and df and iostat to estimate disk usage. It is better, however, to set aside a separate host that emulates production load (using ab or JMeter) and profile the application there with tools such as xdebug, oprofile, and so on. To find slow database queries, you can use utilities like pgFouine (preferably on logs taken from the production server).
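As a rough illustration of locating the bottleneck from inside the application, here is a minimal Python sketch that accumulates the time spent in each layer of a request. The LayerTimer class and the layer names are hypothetical and are not part of any tool mentioned above; a real profiler gives far more detail.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LayerTimer:
    """Accumulate wall-clock time spent in each named layer of a request."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def measure(self, layer):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time even if the measured code raised.
            self.totals[layer] += time.perf_counter() - start

    def slowest(self):
        # The layer with the largest accumulated time is the bottleneck candidate.
        return max(self.totals, key=self.totals.get)


timer = LayerTimer()
with timer.measure("database"):
    time.sleep(0.05)  # stand-in for running queries
with timer.measure("code"):
    sum(i * i for i in range(1000))  # stand-in for business logic
print(timer.slowest())
```

Numbers like these only mean something when collected under a realistic load, which is why a separate host with emulated production traffic is preferable.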

It all depends on the application's architecture, but in general the most likely candidates for the bottleneck are the database and the code. If your application handles a large amount of user data, the bottleneck is likely to be the storage of static files.

DB scaling

As mentioned above, the database is often the bottleneck in modern applications. Its problems usually fall into two classes: performance and the need to store large amounts of data.

You can reduce the load on the database by spreading it across several hosts. This raises the problem of keeping the hosts synchronized, which can be solved with a master/slave scheme using synchronous or asynchronous replication. For PostgreSQL, asynchronous replication can be implemented with Slony-I or with the WAL-based streaming replication built in since 9.0, and synchronous replication with PgPool-II. Separating read and write requests, and balancing load among the available slaves, can be handled by a dedicated database access layer (PgPool-II).
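To make the idea of a database access layer concrete, here is a minimal Python sketch of read/write splitting: writes go to the master, reads are spread round-robin across the slaves. The ReplicatedRouter class and the host names are hypothetical, and the statement check is deliberately naive. This is not how PgPool-II is implemented.

```python
import itertools


class ReplicatedRouter:
    """Route writes to the master and spread reads across replicas."""

    # Naive assumption: a statement is a write if it starts with one of these.
    WRITE_PREFIXES = ("insert", "update", "delete", "create", "alter", "drop")

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replica is configured.
        self._read_cycle = itertools.cycle(replicas or [master])

    def host_for(self, sql):
        statement = sql.lstrip().lower()
        if statement.startswith(self.WRITE_PREFIXES):
            return self.master
        return next(self._read_cycle)


router = ReplicatedRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.host_for("SELECT * FROM users"))
print(router.host_for("UPDATE users SET name = 'x' WHERE id = 1"))
```

With asynchronous replication, a read sent to a slave immediately after a write may return stale data, so reads that must see the latest write are usually sent to the master as well.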

With relational databases, the problem of storing a large amount of data can be addressed using PostgreSQL's partitioning mechanism or by deploying the database on a distributed file system such as Hadoop DFS.

Both solutions are covered in an excellent book on PostgreSQL configuration.

For storing large amounts of data, however, the best solution is sharding, which is a built-in feature of most NoSQL databases (for example, MongoDB).
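The core of sharding is a rule that maps each record to one of several hosts. Here is a minimal Python sketch of the simplest such rule, hashing the key and taking the remainder. The shard_for function and the shard names are hypothetical, and this is not how MongoDB distributes data internally.

```python
import hashlib


def shard_for(key, shards):
    """Pick a shard for a key using a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a cryptographic digest is used to get the same answer everywhere.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


SHARDS = ["shard-0", "shard-1", "shard-2"]
print(shard_for("user:42", SHARDS))
```

The weakness of plain modulo sharding is that changing the number of shards remaps most keys, which is one reason to prefer a database that manages sharding itself.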

In addition, NoSQL databases are often faster than their SQL counterparts because they avoid the overhead of parsing and optimizing queries, checking the integrity of the data structure, and so on. Comparing relational and NoSQL databases is itself a broad topic that deserves a separate article.

Facebook's experience is also worth mentioning: it uses MySQL without JOIN queries. This strategy makes the database much easier to scale by moving load from the database into the code, which, as described below, scales more easily than the database.
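To show what moving a JOIN into the code can look like, here is a minimal Python sketch that runs two simple queries and combines the rows in the application. It uses an in-memory SQLite database as a stand-in for MySQL; the tables, data, and the posts_with_authors function are hypothetical and do not represent Facebook's actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO users VALUES (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 'first');
    INSERT INTO posts VALUES (11, 2, 'second');
    INSERT INTO posts VALUES (12, 1, 'third');
""")


def posts_with_authors(conn):
    """Return (title, author name) pairs without using a SQL JOIN."""
    posts = conn.execute(
        "SELECT id, user_id, title FROM posts ORDER BY id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in posts})
    if not user_ids:
        return []
    # Second simple query: fetch only the users that the posts refer to.
    placeholders = ",".join("?" * len(user_ids))
    users = dict(conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
    ).fetchall())
    # The "join" happens here, in application code.
    return [(title, users.get(user_id)) for _, user_id, title in posts]


print(posts_with_authors(conn))
```

Because each query touches a single table, the two tables could live on different database hosts without changing this function's logic.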

Code scaling

How hard it is to scale the code depends on how many shared resources the hosts need in order to run your application. Is it only sessions, or do you also need a shared cache and shared files? Either way, the first step is to run copies of the application on several hosts with identical environments.

Next, you need to set up load balancing of requests across these hosts. This can be done at the TCP level (haproxy), at the HTTP level (nginx), or via DNS.
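Whatever level it works at, the simplest balancing policy is round-robin: each new request goes to the next host in the list. Here is a minimal Python sketch of that policy; the RoundRobinBalancer class and the host names are hypothetical, and real balancers such as haproxy or nginx add health checks, weights, and other policies.

```python
class RoundRobinBalancer:
    """Hand out hosts in a fixed rotating order."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        self._next = 0

    def pick(self):
        host = self._hosts[self._next]
        # Advance and wrap around to the first host after the last one.
        self._next = (self._next + 1) % len(self._hosts)
        return host


balancer = RoundRobinBalancer(["app1", "app2", "app3"])
for _ in range(4):
    print(balancer.pick())
```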

The next step is to make static files, the cache, and web application sessions available on every host. For sessions, you can use a network-accessible server (for example, memcached). memcached is also a perfectly reasonable choice for the cache server, though of course on a separate host.
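The point of a network session server is that no application host keeps session state locally, so any host can serve any user. Here is a minimal Python sketch of that arrangement. SessionStore and InMemoryBackend are hypothetical names; the backend is an in-process stand-in for a memcached client and only assumes get/set methods.

```python
import json
import secrets


class InMemoryBackend:
    """Stand-in for a network key-value server such as memcached."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SessionStore:
    """Keep sessions in a shared backend instead of on the local host."""

    def __init__(self, backend, prefix="session:"):
        self._backend = backend
        self._prefix = prefix

    def create(self, data):
        session_id = secrets.token_hex(16)
        self._backend.set(self._prefix + session_id, json.dumps(data))
        return session_id

    def load(self, session_id):
        raw = self._backend.get(self._prefix + session_id)
        return json.loads(raw) if raw is not None else None


shared = InMemoryBackend()
host_a = SessionStore(shared)  # one application host creates the session
host_b = SessionStore(shared)  # another host can read it
sid = host_a.create({"user_id": 7})
print(host_b.load(sid))
```

The prefix keeps session keys apart from other keys if the same kind of server is also used as a cache.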

Static files can be mounted from shared file storage over NFS/CIFS, or kept on a distributed file system (HDFS, GlusterFS, Ceph).

You can also store files in a database (for example, MongoDB GridFS), which addresses both availability and scalability (given that NoSQL databases solve the scalability problem through sharding).

The problem of deploying to several hosts deserves a separate mention. How do you make sure that a user who clicks "Refresh" does not see different versions of the application? The simplest solution, in my opinion, is to remove hosts that have not yet been updated from the load balancer (web server) configuration and bring them back one by one as they are updated. You can also pin users to specific hosts by cookie or IP. If the update requires significant changes to the database, the easiest way is to take the project offline temporarily.
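The deployment procedure described above can be sketched in Python as follows. The rolling_update function, the active set standing in for the load balancer configuration, and the update_host callback are all hypothetical. The sketch assumes at least two hosts; with a single host there would be a window with nothing in rotation.

```python
def rolling_update(hosts, active, update_host):
    """Update hosts so that hosts in rotation always run one version.

    hosts: ordered list of host names to update.
    active: mutable set of hosts currently receiving traffic.
    update_host: callable that deploys the new version to one host.
    """
    hosts = list(hosts)
    if not hosts:
        return
    first, rest = hosts[0], hosts[1:]

    # Update the first host while the others keep serving the old version.
    active.discard(first)
    update_host(first)

    # Switch: from now on only updated hosts receive traffic.
    active.clear()
    active.add(first)

    # Bring the remaining hosts back one by one as they are updated.
    for host in rest:
        update_host(host)
        active.add(host)


in_rotation = {"app1", "app2", "app3"}
rolling_update(["app1", "app2", "app3"], in_rotation, lambda h: print("updating", h))
print(sorted(in_rotation))
```

The cost of this scheme is reduced capacity in the middle of the update, when only the first updated host is serving requests.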

FS scaling

When a large amount of static content must be stored, two problems arise: lack of space and the speed of access to the data. As noted above, the lack of space can be solved in at least three ways: a distributed file system, storing the data in a database that supports sharding, or organizing sharding "manually" at the code level.

Bear in mind that serving static files is itself not the easiest task under high load, so it is quite reasonable to have many servers dedicated to it. If you have shared storage (a distributed file system or a database), you can save a file's name without the host and substitute a host name at random when building the page (randomly balancing the load among the web servers that serve static files). If sharding is implemented manually (that is, logic in the code chooses the host to which the data is uploaded), the upload host must either be computed from the file itself or be chosen based on other data (user information, free space on the storage disks) and saved in the database along with the file name.
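Both variants can be sketched in a few lines of Python. The host names, the static_url and choose_upload_host functions, and the free-space figures are hypothetical. The first function covers shared storage, where any host can serve any file; the second covers manual sharding, where the host is chosen from other data and must be saved with the file name.

```python
import random

STATIC_HOSTS = ["static1.example.com", "static2.example.com"]


def static_url(path, hosts=STATIC_HOSTS, rng=random):
    """Shared storage: pick any static host at random when building the page."""
    return "http://%s/%s" % (rng.choice(hosts), path.lstrip("/"))


def choose_upload_host(free_space):
    """Manual sharding: upload to the host with the most free space.

    free_space maps host name to free bytes (assumed to be known).
    """
    return max(free_space, key=free_space.get)


print(static_url("/img/logo.png"))

# The chosen host must be saved with the file name, because it cannot
# be recomputed later once the free-space figures have changed.
host = choose_upload_host({"store1": 10_000, "store2": 50_000})
file_record = {"name": "avatar.png", "host": host}
print(file_record)
```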

Monitoring

Clearly, a large and complex system needs constant monitoring. The solution, in my opinion, is the standard one: Zabbix to monitor the load and operation of the system's nodes, plus monit watching the daemons as a safety net.

Conclusion

The above is a brief overview of many approaches to scaling a web application. There is no recipe for doing everything well and all at once: every task has many solutions, each with its own advantages and disadvantages. Which one to choose is up to you.

Source: https://habr.com/ru/post/113992/
