
Containers, microservices and service meshes

There are plenty of articles about service meshes on the Internet, and here is another one. Hooray! But why? Because I want to argue that service meshes should have appeared 10 years ago, before the advent of container platforms such as Docker and Kubernetes. I do not claim that my point of view is better or worse than anyone else's, but since service meshes are rather complex animals, multiple points of view help to understand them better.

I will talk about the dotCloud platform, which was built on more than a hundred microservices and supported thousands of containerized applications. I will explain the problems we ran into while developing and running it, and how a service mesh could have helped (or not).

dotCloud history


I have already written about the history of dotCloud and its architecture choices, but I have not said much about the networking layer. If you do not want to dive into the previous article about dotCloud, here is the short version: dotCloud was a PaaS that let customers run a wide range of applications (Java, PHP, Python...), with support for a wide range of data services (MongoDB, MySQL, Redis...) and a Heroku-like workflow: you push your code to the platform, it builds container images and deploys them.

I will describe how traffic was routed on the dotCloud platform. Not because it was especially cool (although the system worked well for its time!), but primarily because, with modern tools, such a design can easily be implemented by a modest team in a short time if they need a way to route traffic between a bunch of microservices or a bunch of applications. That makes it a useful baseline for comparing the options: developing everything yourself versus using an existing service mesh. The classic choice: build or buy.

Traffic routing for hosted applications


Applications on dotCloud could expose HTTP and TCP endpoints.

HTTP endpoints were dynamically added to the configuration of the Hipache load balancer cluster. This is similar to what Kubernetes Ingress resources and a load balancer like Traefik do today.

Clients connected to HTTP endpoints through their respective domains, provided that the domain name pointed to the dotCloud load balancers. Nothing special.
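
To make that concrete: Hipache reads its virtual-host-to-backend mapping from Redis, so dynamically adding an HTTP endpoint roughly boils down to pushing a few keys. The sketch below shows how that might look, if I recall the schema correctly; the host name, backend addresses, and Redis location are invented for illustration.

```python
import redis

# Hypothetical registration of an HTTP endpoint with a Hipache-style load
# balancer: Hipache stores each virtual host as a Redis list whose first
# element is an identifier and whose remaining elements are backend URLs.
r = redis.Redis(host="lb-redis.internal", port=6379)

r.rpush("frontend:myapp.example.com", "myapp")
r.rpush("frontend:myapp.example.com", "http://10.1.2.3:8080")
r.rpush("frontend:myapp.example.com", "http://10.1.2.4:8080")
```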

TCP endpoints were associated with a port number, which was then passed to all the containers of that stack through environment variables.

Clients could connect to TCP endpoints using the corresponding host name (something like gateway-X.dotcloud.com) and that port number.

That hostname resolved to a cluster of “nats” servers (not related to NATS), which routed incoming TCP connections to the correct container (or, in the case of load-balanced services, to the correct containers).

If you are familiar with Kubernetes, this will probably remind you of the NodePort service.

On the dotCloud platform there was no equivalent of ClusterIP services: for simplicity, services were accessed the same way from both inside and outside the platform.

Everything was organized quite simply: the initial implementations of the HTTP and TCP routing meshes were probably just a few hundred lines of Python each, using simple (I would say naive) algorithms that were refined as the platform grew and new requirements appeared.

Extensive refactoring of existing application code was not required. In particular, 12-factor applications could directly use the address and port obtained from environment variables.
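
Here is a minimal sketch of that 12-factor pattern. The variable names are hypothetical stand-ins for whatever the platform injects into the container; the point is that the application takes its endpoint from the environment rather than from a hard-coded config.

```python
import os
import socket

# The application reads the endpoint injected by the platform; the defaults
# below (gateway host and port) are invented fallbacks for illustration.
host = os.environ.get("REDIS_HOST", "gateway-X.dotcloud.com")
port = int(os.environ.get("REDIS_PORT", "14252"))

# Connect exactly as if the service were local; routing is the platform's job.
sock = socket.create_connection((host, port), timeout=5)
```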

How is this different from a modern service mesh?


Limited visibility. We had no metrics at all for the TCP routing mesh. As for HTTP routing, later versions added detailed HTTP metrics with error codes and response times, but modern service meshes go even further, providing integration with metrics collection systems such as Prometheus.

Visibility matters not only from an operational point of view (to help with troubleshooting), but also when rolling out new features: think safe blue-green deployments and canary deployments.

Routing performance was also limited. In the dotCloud routing mesh, all traffic had to go through a cluster of dedicated routing nodes. This meant potentially crossing several AZ (availability zone) boundaries and a significant increase in latency. I remember troubleshooting code that made more than a hundred SQL queries per page and opened a new connection to the SQL server for each query. Run locally, the page loaded instantly, but on dotCloud it took several seconds, because each TCP connection (and each subsequent SQL query) took tens of milliseconds. In that particular case, persistent connections solved the problem.
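
The fix is the classic one: open the connection once and reuse it for every query, instead of paying the connection setup cost (tens of milliseconds through the routing layer) per query. A schematic sketch, with connect() standing in for whatever database driver the application actually used:

```python
# `connect` stands in for the application's database driver (sqlite3-style
# connection objects, for instance, support conn.execute(...)). The point is
# where the connection is opened, not the specific API.

def render_page_naive(connect, queries):
    # Anti-pattern: a fresh connection (and its cross-AZ round trips) per query.
    results = []
    for q in queries:
        conn = connect()
        results.append(conn.execute(q).fetchall())
        conn.close()
    return results

def render_page_persistent(connect, queries):
    # Persistent connection: pay the connection setup cost once per page.
    conn = connect()
    try:
        return [conn.execute(q).fetchall() for q in queries]
    finally:
        conn.close()
```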

Modern service meshes cope better with such problems. First of all, they make sure that connections are routed at the source. The logical flow is the same (client → mesh → backend), but the mesh now runs locally rather than on remote nodes, so the client → mesh connection is a local one and therefore very fast (microseconds instead of milliseconds).

Modern service meshes also implement smarter load balancing algorithms. By monitoring the health and performance of backends, they can send more traffic to the faster backends, improving overall performance.

Security is better, too. The dotCloud routing mesh ran entirely on EC2 Classic and did not encrypt traffic (on the assumption that if someone managed to sniff EC2 network traffic, you already had much bigger problems). Modern service meshes can transparently protect all of our traffic, for example with mutual TLS authentication and encryption.

Traffic Routing for Platform Services


Okay, we have discussed traffic for hosted applications, but what about the dotCloud platform itself?

The platform itself consisted of roughly a hundred microservices responsible for various functions. Some accepted requests from others, and some were background workers that connected to other services but did not accept connections themselves. Either way, each service needed to know the addresses of the endpoints it had to connect to.

Many high-level services could use the routing mesh described above. In fact, a large share of dotCloud's hundred-plus microservices were deployed as ordinary applications on the dotCloud platform itself. But a small number of low-level services (in particular those implementing that routing mesh) needed something simpler, with fewer dependencies (since they could not depend on themselves to work: the good old chicken-and-egg problem).

These low-level, critical services were deployed by running containers directly on a few key nodes, without involving the platform's standard services: the linker, the scheduler, and the runner. If you want a comparison with modern container platforms, it is like bootstrapping the control plane with docker run directly on the nodes instead of delegating the task to Kubernetes. It is quite similar to the concept of static pods used by kubeadm or bootkube when bootstrapping a standalone cluster.

These services were exposed in a simple and crude way: their names and addresses were listed in a YAML file, and each client had to be deployed with a copy of that YAML file.

On the one hand, this was extremely reliable, because it did not require maintaining an external key/value store such as Zookeeper (remember, etcd and Consul did not exist yet at the time). On the other hand, it made it hard to move services around: every time a service moved, all of its clients had to receive an updated YAML file (and potentially be restarted). Not very convenient!
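
For flavor, here is a hypothetical reconstruction of that scheme: a static registry baked into a YAML file shipped with every client, plus a tiny lookup helper. The service names and addresses are invented.

```python
import yaml  # PyYAML

# A made-up static registry of low-level services, distributed to every client.
REGISTRY = """
services:
  scheduler: {host: 10.0.1.15, port: 4242}
  runner: {host: 10.0.1.16, port: 4243}
  hipache-redis: {host: 10.0.1.20, port: 6379}
"""

def endpoint(name, registry_text=REGISTRY):
    # Look up a service's address in the local copy of the registry.
    services = yaml.safe_load(registry_text)["services"]
    svc = services[name]
    return svc["host"], svc["port"]

print(endpoint("scheduler"))   # ('10.0.1.15', 4242)
```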

Later, we started rolling out a new scheme in which every client connected to a local proxy. Instead of an address and a port, a client only needed to know the port number of the service and connect through localhost. The local proxy handled the connection and forwarded it to the actual backend. Now, when a backend moved to another machine or was scaled out, instead of updating all of its clients, only these local proxies had to be updated; and restarts were no longer required.
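
Conceptually, each of those local proxies was just a dumb TCP forwarder with a configurable backend. A bare-bones sketch (invented addresses, no error handling, not the actual dotCloud code) might look like this:

```python
import socket
import threading

# Clients connect to a well-known port on localhost; the proxy forwards bytes
# to wherever the backend currently lives. Only the proxies need updating when
# the backend moves. Addresses and ports below are illustrative.
BACKEND = ("10.0.1.15", 4242)     # updated when the service moves
LISTEN = ("127.0.0.1", 4242)      # what clients actually connect to

def pipe(src, dst):
    # Copy bytes in one direction until the source closes.
    while (data := src.recv(4096)):
        dst.sendall(data)
    dst.close()

def serve():
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(LISTEN)
    srv.listen()
    while True:
        client, _ = srv.accept()
        upstream = socket.create_connection(BACKEND)
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

if __name__ == "__main__":
    serve()
```

With something like this in place, a client only needs to connect to 127.0.0.1:4242; moving the backend means updating the proxy's configuration on each node, not touching the clients.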

(There were also plans to encapsulate the traffic in TLS connections and to put another proxy on the receiving side that would check the TLS certificates without involving the receiving service, which would be configured to accept connections only on localhost. More on this later.)

This is very similar to Airbnb's SmartStack, with the significant difference that SmartStack was actually implemented and deployed to production, while dotCloud's internal routing system was shelved when dotCloud turned into Docker.

I personally consider SmartStack to be one of the predecessors of systems like Istio, Linkerd and Consul Connect, because they all follow the same pattern: run a proxy next to each application instance, route the application's traffic through it, and keep the proxies' configuration up to date from some central source of truth.


A modern service mesh implementation


If we needed to implement a similar mesh today, we could use similar principles. For example, set up an internal DNS zone that maps service names to addresses in the 127.0.0.0/8 space. Then run HAProxy on each node of the cluster, accepting connections on each service address (on that 127.0.0.0/8 subnet) and forwarding/balancing the load to the corresponding backends. The HAProxy configuration can be managed by confd, which lets you store the backend information in etcd or Consul and automatically push updated configuration to HAProxy whenever necessary.
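
Here is a rough, hand-rolled sketch of the rendering step that confd would normally do for us: take the current view of the service registry (which in practice would live in etcd or Consul and be watched by confd) and emit one HAProxy frontend/backend pair per service, each bound to its own loopback address. All names and addresses below are invented.

```python
# Illustrative registry contents; in real life this comes from etcd or Consul.
registry = {
    "users": {"vip": "127.0.0.2", "port": 8000,
              "backends": ["10.0.1.15:8000", "10.0.1.16:8000"]},
    "billing": {"vip": "127.0.0.3", "port": 8000,
                "backends": ["10.0.1.17:8000"]},
}

def render_haproxy_config(registry):
    # Emit a TCP-mode HAProxy config: one frontend per service, bound to that
    # service's loopback VIP, load-balancing across its registered backends.
    lines = ["defaults", "    mode tcp", "    timeout connect 5s",
             "    timeout client 50s", "    timeout server 50s", ""]
    for name, svc in registry.items():
        lines.append(f"frontend {name}")
        lines.append(f"    bind {svc['vip']}:{svc['port']}")
        lines.append(f"    default_backend be_{name}")
        lines.append(f"backend be_{name}")
        for i, addr in enumerate(svc["backends"]):
            lines.append(f"    server {name}{i} {addr} check")
        lines.append("")
    return "\n".join(lines)

print(render_haproxy_config(registry))
```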

That's pretty much how Istio works! But with a few differences: it uses Envoy Proxy instead of HAProxy, its control plane relies on the Kubernetes API instead of confd with etcd or Consul, and services get internal ClusterIP addresses allocated by Kubernetes instead of addresses in the 127.0.0.0/8 range.


Let's take a quick look at some of the differences.

Envoy proxy


Envoy Proxy was written by Lyft [Uber's rival in the ride-hailing market - translator's note]. It is in many ways similar to other proxies (for example HAProxy, NGINX, Traefik...), but Lyft wrote their own because they needed features that were missing from the other proxies, and it seemed more sensible to build a new one than to extend an existing one.

Envoy can be used all by itself. If I have a service that needs to connect to other services, I can configure it to connect to Envoy, and then dynamically configure and reconfigure Envoy with the locations of those other services, while getting a lot of excellent extra functionality, for example around visibility. Instead of a custom client library or instrumenting the code with call tracing, we send traffic to Envoy and let it collect metrics for us.
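
To make the standalone use case concrete, here is a rough sketch of a static Envoy bootstrap generated from Python (Envoy accepts JSON configuration as well as YAML). The listener port, cluster name and upstream host are invented, and the structure reflects my understanding of the v3 configuration format rather than a tested production config; for truly dynamic behavior you would point Envoy at a control plane via the xDS APIs instead of a static file.

```python
import json

# One local listener that TCP-proxies to one upstream cluster.
bootstrap = {
    "static_resources": {
        "listeners": [{
            "address": {"socket_address": {"address": "127.0.0.1", "port_value": 9200}},
            "filter_chains": [{"filters": [{
                "name": "envoy.filters.network.tcp_proxy",
                "typed_config": {
                    "@type": "type.googleapis.com/envoy.extensions.filters."
                             "network.tcp_proxy.v3.TcpProxy",
                    "stat_prefix": "search",
                    "cluster": "search_backend",
                },
            }]}],
        }],
        "clusters": [{
            "name": "search_backend",
            "type": "STRICT_DNS",
            "connect_timeout": "1s",
            "load_assignment": {
                "cluster_name": "search_backend",
                "endpoints": [{"lb_endpoints": [{"endpoint": {"address": {
                    "socket_address": {"address": "search.internal", "port_value": 9200}
                }}}]}],
            },
        }],
    }
}

# Write the bootstrap file that Envoy would be started with (envoy -c ...).
with open("envoy-bootstrap.json", "w") as f:
    json.dump(bootstrap, f, indent=2)
```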

But Envoy can also serve as the data plane of a service mesh. This means that Envoy is then configured by that service mesh's control plane.

Control plane


For its control plane, Istio relies on the Kubernetes API. This is not very different from using confd, which relies on etcd or Consul to watch a set of keys in a data store; Istio uses the Kubernetes API to watch a set of Kubernetes resources.
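
Istio's control plane is written in Go, but the underlying "watch the Kubernetes API" pattern is easy to illustrate with the official Kubernetes Python client. This is a hypothetical observer for the sake of the analogy, not Istio code:

```python
from kubernetes import client, config, watch

# Watch Service resources through the Kubernetes API and react to changes,
# much like confd watches keys in etcd or Consul.
config.load_kube_config()   # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_service_for_all_namespaces, timeout_seconds=60):
    svc = event["object"]
    print(event["type"], svc.metadata.namespace, svc.metadata.name,
          svc.spec.cluster_ip)
```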

Incidentally, I personally found this description of the Kubernetes API helpful:

The Kubernetes API server is a “dumb server” which offers storage, versioning, validation, and update semantics on API resources.

Istio is designed to work with Kubernetes; if you want to use it outside of Kubernetes, you need to run an instance of the Kubernetes API server (and its etcd backing service).

Service Addresses


Istio relies on the ClusterIP addresses that Kubernetes allocates, so Istio services get an internal address (not in the 127.0.0.0/8 range).

In a Kubernetes cluster without Istio, traffic to the ClusterIP address of a given service is intercepted by kube-proxy and sent to one of that service's backends. If you are interested in the technical details: kube-proxy sets up iptables rules (or IPVS load balancers, depending on how it is configured) to rewrite the destination IP addresses of connections going to the ClusterIP address.

After installing Istio in a Kubernetes cluster, nothing changes until it is explicitly enabled for a given consumer, or even an entire namespace, by injecting a sidecar container into user pods. That container runs an Envoy instance and sets up iptables rules to intercept traffic going to other services and redirect it to Envoy.

Combined with Kubernetes DNS, this means that our code can connect using a service name, and everything “just works”. In other words, our code issues a request like http://api/v1/users/4242, api resolves to 10.97.105.48, the iptables rules intercept connections to 10.97.105.48 and redirect them to the local Envoy proxy, and that local proxy forwards the request to the actual backend API. Phew!
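
Which is exactly why the application code can stay completely ordinary. Something like this (the api service name and path are the hypothetical ones from the example above) needs no mesh-specific library at all; DNS, iptables and Envoy do the rest:

```python
import requests

# Plain application code: talk to the service by name and let the mesh handle
# routing, retries, metrics and encryption transparently.
resp = requests.get("http://api/v1/users/4242", timeout=2)
resp.raise_for_status()
print(resp.json())
```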

Additional bells and whistles


Istio also provides end-to-end encryption and authentication via mTLS (mutual TLS). A component called Citadel is responsible for this.

There is also the Mixer component, which Envoy can query for every single request to make a specific decision about that request depending on various factors such as headers, backend load, and so on (don't worry: there are many ways to keep Mixer highly available, and even if it goes down, Envoy keeps working perfectly well as a proxy).

And, of course, we mentioned visibility: Envoy collects a huge number of metrics while also providing distributed tracing. In a microservice architecture, if a single API request has to pass through microservices A, B, C, and D, distributed tracing adds a unique identifier to the request when it enters the system and preserves that identifier across the sub-requests to all of these microservices, making it possible to correlate all the related calls, their latencies, and so on.
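
One caveat worth a sketch: the mesh can only stitch the spans together if every service forwards the tracing headers it received on to its own outbound calls. With Istio's default Zipkin/B3 setup these are typically x-request-id and the x-b3-* family; the billing service name and the incoming_headers dict below are invented for illustration.

```python
import requests

# Headers that typically need to be propagated for B3-style distributed tracing.
TRACE_HEADERS = ("x-request-id", "x-b3-traceid", "x-b3-spanid",
                 "x-b3-parentspanid", "x-b3-sampled", "x-b3-flags")

def call_downstream(incoming_headers, path):
    # Copy the tracing headers from the inbound request onto the outbound one,
    # so the sidecars can link both calls into a single trace.
    forwarded = {h: incoming_headers[h] for h in TRACE_HEADERS if h in incoming_headers}
    return requests.get(f"http://billing{path}", headers=forwarded, timeout=2)
```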

Build or buy


Istio has a reputation for being a complex system. By contrast, building a routing mesh like the one I described at the beginning of this post is relatively simple with existing tools. So, does it make sense to build your own service mesh instead?

If our needs are modest (no need for visibility, circuit breakers, and other niceties), then it is tempting to develop our own tool. But if we are using Kubernetes, it may not even be necessary, because Kubernetes already provides basic primitives for service discovery and load balancing.

But if we have more advanced requirements, then “buying” a service mesh is a much better option. (It is not always literally a “purchase”, since Istio is open source, but we still need to invest engineering time to understand how it works, deploy it, and operate it.)

What to choose: Istio, Linkerd or Consul Connect?


So far we have only talked about Istio, but it is not the only service mesh out there. A popular alternative is Linkerd, and there is also Consul Connect.

What to choose?

Honestly, I do not know. At the moment I do not consider myself competent enough to answer this question. There are several interesting articles comparing these tools and even benchmarks .

One promising approach is to use a tool like SuperGloo. It implements an abstraction layer that simplifies and unifies the APIs exposed by service meshes. Instead of learning the specific (and, in my opinion, relatively complex) APIs of the various service meshes, we can use SuperGloo's simpler constructs and easily switch from one mesh to another, much as if we had an intermediate configuration format describing HTTP frontends and backends that could generate the actual configuration for NGINX, HAProxy, Traefik, Apache...

I have tinkered a bit with Istio and SuperGloo, and in the next article I want to show how to add Istio or Linkerd to an existing cluster using SuperGloo, and how well the latter does its job, that is, whether it lets you switch from one service mesh to another without rewriting configurations.

Source: https://habr.com/ru/post/453204/

