
Linux network application performance. Introduction

Web applications are now used everywhere, and among all application protocols HTTP takes the lion's share. When studying the nuances of web application development, most people pay very little attention to the operating system on which these applications actually run. The separation of development (Dev) and operations (Ops) only made things worse. But with the spread of DevOps culture, developers are beginning to take responsibility for running their applications in the cloud, so it is very useful for them to get thoroughly acquainted with the operating system underneath. This is especially true if you are trying to deploy a system that handles thousands or tens of thousands of simultaneous connections.

The limitations of web services are very similar to those of other applications. Whether they are load balancers or database servers, all of these applications face similar problems in a high-performance environment. Understanding these fundamental limitations, and the general ways to overcome them, will help you evaluate the performance and scalability of your web applications.

I am writing this series of articles in response to questions from young developers who want to become well-informed system architects. You cannot clearly understand how to optimize Linux applications without first diving into the basics of how they work at the operating system level. Although there are many types of applications, in this series I want to explore network applications rather than desktop applications such as a browser or a text editor. This material is intended for developers and architects who want to understand how Linux or Unix programs work and how to structure them for high performance.

Linux is the operating system of the server world, and your applications most likely run on it. Although I say “Linux”, most of the time you can safely assume I mean Unix-like operating systems in general. However, I have not tested the accompanying code on other systems, so if you are interested in FreeBSD or OpenBSD, your results may differ. Whenever I use something Linux-specific, I point it out.
You can use this knowledge to build a perfectly optimized application from scratch, but it is better not to. If you write a new C or C++ web server for your organization’s business application, it may be your last day at work. However, knowing how these applications are structured will help you choose among existing programs. You will be able to compare process-based systems with thread-based and event-based ones. You will understand and appreciate why Nginx performs better than Apache httpd, and why a Tornado-based Python application can serve more users than a Django-based one.

ZeroHTTPd: a learning tool


ZeroHTTPd is a web server that I wrote from scratch in C as an educational tool. It has no external dependencies, not even a Redis client library: it talks to Redis through its own routines. See below for details.

Although we could discuss theory for a long time, nothing beats writing code, running it, and comparing the server architectures with each other. This is the most illustrative approach. So we will write a simple web server, ZeroHTTPd, using each model: process-based, thread-based, and event-based. We will test each of these servers and see how they perform relative to each other. Each version of ZeroHTTPd is implemented in a single C file. The event-based server includes uthash, an excellent hash table implementation that comes in a single header file. Otherwise there are no dependencies, to keep the project simple.

The code is heavily commented to help you follow it. Besides being a simple web server written in very little code, ZeroHTTPd is also a minimal framework for web development. Its functionality is limited, but it can serve static files and very simple “dynamic” pages. I should say that ZeroHTTPd is well suited for learning how to build high-performance Linux applications. By and large, most web services wait for requests, inspect them, and process them. This is exactly what ZeroHTTPd does. It is a tool for learning, not for production. It is not good at handling errors, it hardly follows the best security practices (oh yes, I used strcpy), and it avoids abstruse C tricks. But I hope it does its job well.
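The wait, inspect, and process cycle described above can be sketched in C. This is a minimal illustration under simplifying assumptions, not ZeroHTTPd's own code: the helper names make_listener and serve_one are hypothetical, the request is read with a single read() call, and only the request method is checked.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on 127.0.0.1:port
 * (port 0 lets the kernel pick a free port). Returns the fd or -1. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: handle exactly one connection, then close it. */
int serve_one(int listen_fd)
{
    int client = accept(listen_fd, NULL, NULL);    /* 1. wait for a connection */
    if (client < 0)
        return -1;

    char req[1024];
    ssize_t n = read(client, req, sizeof req - 1);  /* 2. read the request (one read() only) */
    req[n > 0 ? n : 0] = '\0';

    /* 3. inspect: this sketch accepts only GET requests */
    const char *resp = (strncmp(req, "GET ", 4) == 0)
        ? "HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nok\n"
        : "HTTP/1.0 400 Bad Request\r\nContent-Length: 0\r\n\r\n";

    ssize_t w = write(client, resp, strlen(resp)); /* 4. process: send the response */
    close(client);
    return w < 0 ? -1 : 0;
}
```

An iterative server simply calls serve_one(fd) in an endless loop; the other architectures differ mainly in how they run many such handlers at once.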


The ZeroHTTPd home page. ZeroHTTPd can serve different types of files, including images.

Guestbook application


Modern web applications are usually not limited to static files. They interact in complex ways with various databases, caches, and so on. So we will create a simple web application called Guestbook, where visitors leave entries under their names. The guestbook page shows the entries left earlier, and there is also a visitor counter at the bottom of the page.


The ZeroHTTPd guestbook web application

Visitor counters and guestbook entries are stored in Redis. ZeroHTTPd talks to Redis through its own routines rather than an external library. I'm not a big fan of rolling out homebrew code when publicly available, well-tested solutions exist. But the goal of ZeroHTTPd is to study Linux performance, and calling external services while serving HTTP requests has a serious impact on performance. We must fully control communication with Redis in each of our server architectures: some architectures use blocking calls, others use event-based routines. An external Redis client library would not give us that control. Besides, our little Redis client implements only a few operations (getting, setting and incrementing a key; getting and appending to an array). And the Redis protocol is extremely elegant and simple; you hardly need to study it. The fact that our client does all its work in about a hundred lines of code shows how well thought out the protocol is.
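To show how simple the protocol is, here is a minimal sketch of the two pieces such a client needs: encoding a command and reading an integer reply. Redis clients send each command as a RESP array of bulk strings, and INCR answers with an integer reply such as ":42\r\n". The function names resp_encode and resp_parse_integer and the key name visitor_count are hypothetical, not ZeroHTTPd's; this encoder is also not binary-safe because it relies on strlen().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Encode a command as a RESP array of bulk strings:
 *   *<argc>\r\n  then, for each argument,  $<len>\r\n<bytes>\r\n
 * Returns the encoded length, or -1 if buf is too small. */
int resp_encode(char *buf, size_t bufsz, int argc, const char *args[])
{
    int n = snprintf(buf, bufsz, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= bufsz)
        return -1;
    size_t used = (size_t)n;

    for (int i = 0; i < argc; i++) {
        n = snprintf(buf + used, bufsz - used, "$%zu\r\n%s\r\n",
                     strlen(args[i]), args[i]);
        if (n < 0 || (size_t)n >= bufsz - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}

/* Parse an integer reply such as ":42\r\n" (what INCR returns).
 * Returns 0 and stores the value, or -1 for any other reply type. */
int resp_parse_integer(const char *reply, long long *out)
{
    if (reply[0] != ':')
        return -1;
    char *end;
    long long v = strtoll(reply + 1, &end, 10);
    if (end == reply + 1 || end[0] != '\r' || end[1] != '\n')
        return -1;
    *out = v;
    return 0;
}
```

For example, INCR visitor_count is sent over the socket as "*2\r\n$4\r\nINCR\r\n$13\r\nvisitor_count\r\n".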

The following figure shows what the application does when the client (browser) requests the /guestbook URL.


How the guestbook application works

To serve the guestbook page, the server makes one file system call to read the template into memory and three network calls to Redis. The template file contains most of the HTML for the page in the screenshot above. It also has special placeholders for the dynamic parts of the content: the entries and the visitor counter. We fetch these from Redis, insert them into the page, and send the fully formed content to the client. The third Redis call could be avoided, because Redis returns the new value of a key when it is incremented. However, for our asynchronous, event-based server, many network calls make a good test for learning purposes. So we discard the visitor count that Redis returns and fetch it with a separate call.
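Filling a placeholder amounts to splicing a string into the template. The sketch below assumes a placeholder spelled $VISITOR_COUNT$; that name and the function fill_placeholder are hypothetical, not taken from ZeroHTTPd's templates. It replaces only the first occurrence, so a page with several placeholders needs one call per placeholder.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of tmpl in which the first occurrence of
 * placeholder is replaced by value. Returns NULL if the placeholder is
 * missing or allocation fails. The caller must free() the result. */
char *fill_placeholder(const char *tmpl, const char *placeholder, const char *value)
{
    const char *pos = strstr(tmpl, placeholder);
    if (pos == NULL)
        return NULL;

    size_t prefix = (size_t)(pos - tmpl);   /* text before the placeholder */
    size_t plen = strlen(placeholder);
    size_t vlen = strlen(value);
    size_t rest = strlen(pos + plen);       /* text after the placeholder */

    char *out = malloc(prefix + vlen + rest + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, tmpl, prefix);
    memcpy(out + prefix, value, vlen);
    memcpy(out + prefix + vlen, pos + plen, rest + 1);  /* copies the NUL too */
    return out;
}
```

Allocating a fresh buffer keeps the template read from disk untouched, so it can be reused for the next request.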

ZeroHTTPd Server Architectures


We build seven versions of ZeroHTTPd with the same functionality but different architectures: iterative, forking, pre-forking, threading, pre-threading, poll-based and epoll-based.


We measure the performance of each architecture by loading the server with HTTP requests. When comparing architectures at higher levels of concurrency, we send correspondingly more requests. Each test is run three times, and we report the average.

Testing Methodology



Setup for load testing ZeroHTTPd

During testing, it is important that not all components run on the same machine. Otherwise the OS incurs additional scheduling overhead, as the components compete for the CPU. Measuring the operating system overhead of each server architecture is one of the most important goals of this exercise, and adding more variables would only get in the way. That is why the setup in the figure above works best.

What each of these servers does



All servers are restricted to a single processor core. The idea is to evaluate the maximum performance of each architecture under the same constraint. Since all server programs are tested on the same hardware, this gives a common baseline for comparing them. My test setup consists of virtual servers rented from DigitalOcean.
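The article does not say how the servers are restricted to one core. One way to do it on Linux is sched_setaffinity(); the shell equivalent is taskset from util-linux (for example, taskset -c 0 ./server). The sketch below uses a hypothetical helper, pin_to_single_cpu, that pins the calling thread to the first CPU it is currently allowed to use. This call is Linux-specific.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Restrict the calling thread (and any processes or threads it creates
 * afterwards, which inherit the mask) to the first CPU in its current
 * affinity mask. Returns that CPU number, or -1 on error. */
int pin_to_single_cpu(void)
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof one, &one) == 0 ? cpu : -1;
        }
    }
    return -1;  /* empty mask: should not happen */
}
```

Calling this at the top of main(), before any fork() or pthread_create(), keeps a forking or threading server on one core as well, because children inherit the affinity mask.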

What do we measure?


Many different metrics could be measured. We evaluate each architecture in this configuration by loading the servers with requests at different levels of concurrency: the load grows from 20 to 15,000 simultaneous users.

Test results


The following chart shows the performance of the servers with different architectures at different levels of concurrency. The y-axis shows requests per second; the x-axis shows concurrent connections.







Below is a table with the results.

Requests per second at each concurrency level:

Concurrency    Iterative      Forking        Pre-forking    Threading      Pre-threading  Poll           Epoll
20             7              112            2100           1800           2250           1900           2050
50             7              190            2200           1700           2200           2000           2000
100            7              245            2200           1700           2200           2150           2100
200            7              330            2300           1750           2300           2200           2100
300            -              380            2200           1800           2400           2250           2150
400            -              410            2200           1750           2600           2000           2000
500            -              440            2300           1850           2700           1900           2212
600            -              460            2400           1800           2500           1700           2519
700            -              460            2400           1600           2490           1550           2607
800            -              460            2400           1600           2540           1400           2553
900            -              460            2300           1600           2472           1200           2567
1,000          -              475            2300           1700           2485           1150           2439
1,500          -              490            2400           1550           2620           900            2479
2,000          -              350            2400           1400           2396           550            2200
2,500          -              280            2100           1300           2453           490            2262
3,000          -              280            1900           1250           2502           high variance  2138
5,000          -              high variance  1600           1100           2519           -              2235
8,000          -              -              1200           high variance  2451           -              2100
10,000         -              -              high variance  -              2200           -              2200
11,000         -              -              -              -              2200           -              2122
12,000         -              -              -              -              970            -              1958
13,000         -              -              -              -              730            -              1897
14,000         -              -              -              -              590            -              1466
15,000         -              -              -              -              532            -              1281

The graph and the table show that above 8,000 concurrent connections only two contenders remain: pre-threading and epoll. As the load grows, the poll-based server performs worse than even the threading one. The pre-threading architecture gives epoll serious competition, which shows how well the Linux kernel schedules a large number of threads.

ZeroHTTPd source code


The ZeroHTTPd source code has a separate directory for each architecture:

  ZeroHTTPd
  │
  ├── 01_iterative
  │   └── main.c
  ├── 02_forking
  │   └── main.c
  ├── 03_preforking
  │   └── main.c
  ├── 04_threading
  │   └── main.c
  ├── 05_prethreading
  │   └── main.c
  ├── 06_poll
  │   └── main.c
  ├── 07_epoll
  │   └── main.c
  ├── Makefile
  ├── public
  │   ├── index.html
  │   └── tux.png
  └── templates
      └── guestbook
          └── index.html

In addition to the seven architecture directories, the top-level directory contains two more: public and templates. The first contains the index.html file and the image from the first screenshot. You can put other files and folders there, and ZeroHTTPd should serve these static files without trouble. If a path requested by the browser maps to a directory inside the public folder, ZeroHTTPd looks for an index.html file in that directory. The guestbook content is generated dynamically. The guestbook has only a main page, and its content is based on the 'templates/guestbook/index.html' file. New dynamic pages can easily be added to ZeroHTTPd. The idea is that users can add templates to this directory and extend ZeroHTTPd as needed.

To build all seven servers, run make all from the top-level directory; all the executables will appear in that directory. The executables look for the public and templates directories in the directory they are launched from.

Linux API


You do not need a deep knowledge of the Linux API to follow this series of articles. However, I recommend reading more on the topic; there are many reference resources on the Web. Although we will touch on several categories of the Linux API, our focus will be mainly on processes, threads, events, and the network stack. In addition to books and articles about the Linux API, I also recommend reading the man pages for the system calls and library functions we use.

Performance and Scalability


A note on performance and scalability. In principle, the two are unrelated. You may have a web service that performs very well, with a response time of a few milliseconds, but does not scale at all. Conversely, a poorly performing web application that takes a few seconds to respond may scale to handle tens of thousands of simultaneous users. However, high performance together with scalability is a very powerful combination. High-performance applications generally use resources economically and can therefore serve more concurrent users per server, which reduces costs.

CPU-bound and I/O-bound tasks


Finally, computing tasks always fall into one of two types: I/O-bound or CPU-bound. Receiving requests over the Internet (network I/O), serving files (network and disk I/O), and talking to the database (network and disk I/O) are all I/O operations. Some database queries may put some load on the CPU (sorting, averaging a million results, etc.). Most web applications are limited by I/O, and the processor is rarely used at full capacity. If you see an I/O-bound task consuming a lot of CPU, it is most likely a sign of poor application architecture. It may mean that CPU time is being spent on process management and context switching, which is not useful work. If you do something like image processing, audio conversion, or machine learning, the application needs substantial CPU resources. But for most applications this is not the case.
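One rough way to tell the two kinds of task apart is to compare CPU time with wall-clock time. The sketch below measures the fraction of elapsed time the process spends on the CPU while running a task; cpu_share, fake_io_task and fake_cpu_task are hypothetical names, and the two tasks are stand-ins that sleep and spin for about 50 ms. On an idle machine an I/O-bound task should show a share close to 0, and a CPU-bound one a share close to 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Read a clock as seconds in a double. */
static double now_sec(clockid_t clk)
{
    struct timespec ts;
    clock_gettime(clk, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Fraction of wall-clock time the process spent on the CPU while running fn. */
double cpu_share(void (*fn)(void))
{
    double w0 = now_sec(CLOCK_MONOTONIC), c0 = now_sec(CLOCK_PROCESS_CPUTIME_ID);
    fn();
    double w1 = now_sec(CLOCK_MONOTONIC), c1 = now_sec(CLOCK_PROCESS_CPUTIME_ID);
    return (w1 > w0) ? (c1 - c0) / (w1 - w0) : 0.0;
}

/* Stand-in for an I/O-bound task: the process mostly waits. */
void fake_io_task(void)
{
    struct timespec d = { 0, 50 * 1000 * 1000 };  /* 50 ms */
    nanosleep(&d, NULL);
}

/* Stand-in for a CPU-bound task: burn about 50 ms of CPU time. */
void fake_cpu_task(void)
{
    double start = now_sec(CLOCK_PROCESS_CPUTIME_ID);
    volatile unsigned long x = 0;
    while (now_sec(CLOCK_PROCESS_CPUTIME_ID) - start < 0.05)
        x++;
    (void)x;
}
```

Applied to a real server under load, tools such as top report the same ratio per process; a high share on a server that mostly waits for the network is the warning sign described above.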

Learn more about server architectures.


  1. Part I. Iterative architecture
  2. Part II. Forking servers
  3. Part III. Pre-forking servers
  4. Part IV. Threading servers
  5. Part V. Pre-threading servers
  6. Part VI. Poll-based architecture
  7. Part VII. Epoll-based architecture

Source: https://habr.com/ru/post/455212/

