The main goal of our work is to make IaaS simple and straightforward, even for those without an IT background. To that end, we continuously optimize all of our systems and write about the results in our blog on Habré.
Today we decided to look at Western experience and briefly explore the topic of application scaling. What drew our attention was a guide by Nate Berkopec, a Ruby expert.
Photo: Juhan Sonin / CC
Nate begins by talking about the complexity of the topic as a whole: the methods suitable for scaling from 10 to 1,000 requests per minute differ from the ones companies like Twitter use to handle 600 requests per second.
The urge to scale is usually tempered by the fact that, at a certain stage, database I/O becomes the bottleneck. Managing the number of processes and the CPU and RAM load further complicates scaling for Rails developers.
Here Nate talks about using Memcached and Redis together with RabbitMQ and Kafka. He has mostly applied such solutions on the Heroku platform, which is generally adequate for scaling up to 1,000 requests per minute.
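As a rough illustration of the caching side, here is a minimal sketch of pointing a Rails app at Memcached or Redis. It assumes Rails 5.2+ with the dalli gem (for Memcached) or the redis gem in the Gemfile, and the host name and URL are placeholders:

```ruby
# config/environments/production.rb -- a minimal sketch, not Nate's setup.
Rails.application.configure do
  # Memcached as the Rails cache store (requires the `dalli` gem).
  # "memcached.example.com" is a placeholder host.
  config.cache_store = :mem_cache_store, "memcached.example.com"

  # Redis is a common alternative (Rails 5.2+, `redis` gem):
  # config.cache_store = :redis_cache_store, { url: ENV["REDIS_URL"] }
end
```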
He emphasizes that simply adding dynos on Heroku will not have the desired effect: the application will not get faster. Extra dynos can only improve reliability. On AWS, by contrast, moving from one instance type to another (from T2 to M4, say) can already affect application performance.
Some takeaways from Nate's examples:
- Scaling out affects latency only if requests are waiting in a queue
- To reason about the problem, you need to understand how your application server works
- The Heroku documentation does not explain all the nuances of HTTP request routing
- A Ruby web application needs protection from slow clients and slow responses
- You cannot scale based on response time alone: time spent waiting in a queue also plays a role. The same applies to worker hosts.
- A web application may be working not with one queue but with dozens, which form in the load balancer, the Heroku routers, the multi-process server, and the "main process"
- Little's Law lets you estimate the required number of application instances as the product of the average number of requests per second and the average response time (in seconds). Starting to scale at 25% utilization is something of a false start; utilization here is the ratio of the number of instances Little's Law says you need to the number actually running (a worked example follows below).
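To make the Little's Law estimate concrete, here is a small worked example; all the numbers are illustrative:

```ruby
# Little's Law: instances_needed = arrival_rate * average_response_time.
arrival_rate  = 1000 / 60.0  # 1000 requests per minute ~= 16.7 req/s
response_time = 0.25         # average response time, in seconds

instances_needed  = arrival_rate * response_time  # ~= 4.2 instances
instances_running = 16

utilization = instances_needed / instances_running  # ~= 0.26, i.e. ~26%
# At ~26% utilization the queue is almost always empty, so adding
# more instances now would be the "false start" described above.
```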
Request routing
Briefly, here is how the process works:
1. First, the request hits a load balancer, which distributes requests among the Heroku routers.
2. The router forwards the request to a random dyno belonging to your application.
3. While the router decides which dyno to pick, the request sits in the router's waiting queue; the process is similar to nginx sending health-check requests to its upstreams (a toy simulation of this queueing follows below).
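To see why random routing leads to queueing, here is a toy Ruby simulation. It is not Heroku's actual algorithm, and all the timings are made up:

```ruby
# Toy model: requests arrive every 100 ms, each takes 300 ms to serve,
# and the router assigns each one to a random dyno. Some requests end up
# queued behind a busy dyno even while other dynos sit idle.
DYNOS        = 4
SERVICE_TIME = 0.3  # seconds per request
ARRIVAL_GAP  = 0.1  # seconds between arrivals

free_at    = Array.new(DYNOS, 0.0)  # moment each dyno next becomes free
total_wait = 0.0

1000.times do |i|
  arrival = i * ARRIVAL_GAP
  dyno    = rand(DYNOS)                   # random routing
  start   = [arrival, free_at[dyno]].max  # wait if that dyno is busy
  total_wait += start - arrival
  free_at[dyno] = start + SERVICE_TIME
end

avg_wait = total_wait / 1000  # seconds per request, averaged
puts format("average queue wait: %.0f ms", avg_wait * 1000)
```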
The web server you have chosen then handles the request inside the dyno:
1. WEBrick opens a connection with the router, downloads the entire request, runs the application and streams the response back, and only then moves on to the next request. The whole cycle ties up the dyno, so in the meantime it cannot serve requests from other routers.
2. Thin works a little differently: it is event-driven, built on EventMachine, a model similar to Node.js. Thin accepts a request in chunks but hands it to the application only once it has been fully downloaded. That makes it good at coping with slow requests, though it does not process requests concurrently.
3. Unicorn is a single-threaded but multi-process web server: it forks workers that all listen on a single shared Unix socket. The slow-client situation resembles WEBrick's: each worker downloads the full request itself, so while one request is loading, that worker cannot accept other connections. The problem is solved by request buffering, which is what Passenger handles.
4. Phusion Passenger 5 puts something like nginx directly in front of the workers: it incorporates a reverse proxy that shields the worker processes from slow clients and slow uploads. Once a request has fully loaded, it is passed to the HelperAgent, which distributes requests among the workers. This setup is well suited for scaling; the alternatives are Puma in cluster mode or Unicorn combined with nginx (a config sketch follows below).
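As an illustration of that last alternative, a minimal Puma cluster-mode config could look like the sketch below; the worker and thread counts and the socket path are placeholder values, not recommendations:

```ruby
# config/puma.rb -- a minimal cluster-mode sketch.
workers 4        # fork 4 worker processes (this is what "cluster mode" means)
threads 1, 5     # 1 to 5 threads per worker

# Bind to a Unix socket so a reverse proxy (e.g. nginx) in front of Puma
# can buffer slow clients, as described above.
bind "unix:///tmp/puma.sock"

preload_app!     # load the app before forking so workers share memory
```

Running Puma behind nginx this way combines slow-client buffering with multi-process concurrency, which is the same division of labor Passenger 5 builds in.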