
[Translation] Envoy threading model

Hi, Habr! I present to you a translation of Matt Klein's article "Envoy threading model".

This article seemed quite interesting to me, and since Envoy is most often used as part of Istio or simply as a Kubernetes ingress controller, most people do not interact with it as directly as they do with, say, typical Nginx or HAProxy setups. Still, if something breaks, it helps to understand how it works from the inside. I tried to translate as many terms as possible into Russian; for those who find that painful, I left the originals in brackets. Welcome under the cut.

Low-level technical documentation on the Envoy codebase is currently rather sparse. To fix this, I plan to write a series of blog posts about Envoy's various subsystems. Since this is the first article, please let me know what you think and which topics you would like to see covered in future posts.

One of the most common technical questions I get about Envoy is a request for a low-level description of the threading model it uses. In this post I will describe how Envoy maps connections to threads, as well as the Thread Local Storage (TLS) system used internally to make the code both highly parallel and high-performance.

Threading overview




Envoy uses three different types of threads:

  1. Main: the main thread owns server startup and shutdown, all xDS API handling (including DNS, health checking, and general cluster management), runtime, stat flushing, administration, and general process management (signals, hot restart, etc.). Everything on this thread is asynchronous and non-blocking.
  2. Worker: by default, Envoy spawns one worker thread per hardware thread in the system. Each worker runs a non-blocking event loop that listens on every listener, accepts new connections, instantiates a filter stack for each connection, and processes all I/O for the connection's lifetime.
  3. File flusher: every file that Envoy writes (mainly access logs) has an independent blocking flush thread, so that writes to files never block the worker threads.

Connection handling


As discussed briefly above, all worker threads listen on all listeners without any sharding; the kernel is thus used to intelligently dispatch accepted sockets to worker threads. Modern kernels are in general very good at this: they use features such as I/O priority boosting to try to fill up one thread's work before starting to use other threads that also listen on the same socket, and they avoid using a single spinlock to process every accept.

Once a connection is accepted on a worker thread, it never leaves that thread. All further processing of the connection happens entirely on that worker thread, including any forwarding behavior.

This has several important consequences:


What does non-blocking mean?


The term "non-blocking" has been used several times so far when discussing how the main and worker threads operate. All code is written on the assumption that nothing ever blocks. However, this is not entirely true.

Envoy does use a few process-wide locks that can be held for longer periods:


Thread local storage


Because of the way Envoy separates main-thread responsibilities from worker-thread responsibilities, complex processing must be able to happen on the main thread and then be made available to each worker thread in a highly concurrent way. This section describes Envoy's Thread Local Storage (TLS) system at a high level. In the next section I will describe how it is used for cluster management.



As already described, the main thread handles virtually all management and control-plane functionality in the Envoy process. The term "control plane" is a bit overloaded here, but when considered within the Envoy process itself and contrasted with the forwarding that the worker threads perform, it seems appropriate. The general pattern is that the main thread does some work and then needs to update each worker thread with the result of that work, without the worker threads having to take a lock on every access.

Envoy's TLS (Thread Local Storage) system works like this:

  1. The main thread allocates a slot. Although abstracted in code, in practice this is an index into a process-wide vector, giving O(1) access.
  2. The main thread can set arbitrary data into its slot. When this is done, the data is posted to each worker thread as a normal event-loop event.
  3. Worker threads can read from their slot during event processing and will retrieve whatever thread-local data has been installed there.

This is a very simple and incredibly powerful paradigm, very similar in concept to RCU (Read-Copy-Update) locking. Essentially, worker threads never see any changes to the data in TLS slots while they are doing work; changes happen only during the quiescent period between work events.

Envoy uses it in two different ways:


Cluster update threading


In this section I will describe how TLS (Thread Local Storage) is used for cluster management. Cluster management includes handling the xDS APIs and/or DNS, as well as health checking.



Cluster update threading involves the following components and steps:

  1. The cluster manager is the component within Envoy that manages all known upstream clusters, the CDS (Cluster Discovery Service) API, the SDS (Secret Discovery Service) and EDS (Endpoint Discovery Service) APIs, DNS, and active out-of-band health checking. It is responsible for creating an "eventually consistent" view of every upstream cluster, including the discovered hosts as well as their health status.
  2. The health checker performs active health checks and reports health state changes back to the cluster manager.
  3. CDS (Cluster Discovery Service) / SDS (Secret Discovery Service) / EDS (Endpoint Discovery Service) / DNS are performed to determine cluster membership. State changes are reported back to the cluster manager.
  4. Each worker thread continuously performs an event loop.
  5. When the cluster manager determines that the state of a cluster has changed, it creates a new read-only snapshot of the cluster state and posts it to every worker thread.
  6. During the next quiescent period, the worker thread updates the snapshot in its allocated TLS slot.
  7. During an I/O event that needs to determine a host for load balancing, the load balancer queries the TLS (Thread Local Storage) slot for host information; no locks are required for this. Note also that TLS can fire events on update, so that load balancers and other components can recompute caches, data structures, etc. This is beyond the scope of this post, but it is used in various places in the code.

Using the procedure above, Envoy can process every request without taking any locks (other than those described earlier). Apart from the complexity of the TLS code itself, most of the code does not need to be aware of multithreading and can be written as if it were single-threaded. This makes most of the code easier to write, in addition to yielding excellent performance.

Other subsystems that make use of TLS


TLS (Thread local storage) and RCU (Read Copy Update) are widely used in Envoy.

Examples of using:


There are other cases as well, but the previous examples should give a good sense of what TLS is used for.

Known performance pitfalls


Although Envoy works quite well overall, there are several known areas that need attention when it is used with very high concurrency and throughput:


Conclusion


The Envoy threading model is designed for programming simplicity and massive parallelism, at the cost of potentially wasteful memory and connection usage if not tuned correctly. This model allows Envoy to perform very well at very high worker counts and throughput.

As I briefly mentioned on Twitter, the design could also run on top of a full user-mode networking stack such as DPDK (Data Plane Development Kit), in which case ordinary servers could handle millions of requests per second with full L7 processing. It will be very interesting to see what gets built over the next few years.

One last quick note: I have been asked many times why we chose C++ for Envoy. The reason remains that it is still the only widely deployed production-grade language in which the architecture described in this post can be built. C++ is certainly not right for all or even many projects, but for certain use cases it is still the only tool to get the job done.

Links to code


Links to the interface and implementation header files discussed in this post:

Source: https://habr.com/ru/post/449826/

