
Let's try to evaluate Kubernetes

Hi, Habr!

For some time we have been keeping an eye on books about Kubernetes; fortunately, they are already coming out from Manning and O'Reilly. We would agree that in our field Kubernetes is still more interesting from an introductory and engineering point of view than from a practical one. Nevertheless, we are posting here the cover of a book on Kubernetes and a translation of an article by Daniel Morsing, who published an interesting teaser about this system on his blog.

Enjoy reading!

Introduction
Going on the Go conference circuit, about three times a year I listen to talks about yet another way to get Kubernetes up and running. At the same time, I had only a very shaky idea of what Kubernetes actually is, to say nothing of deep knowledge. But I had some free time, so I decided: I will try to port a simple application to Kubernetes, watch what happens to it, and describe my first impressions.

I am not going to write a "Meet Kubernetes" post, because the official documentation covers that far better, so I will not even attempt it. In fact, the documentation is very good. In particular, I really liked the "Concepts" section when I was trying to get a general idea of the system. Hats off to the documentation authors!

A few words about the application. I have my own blog server, which hosts the original of this article. It happens to run on a small Linode instance that I manage by hand. It might seem that using an entire cluster management stack to deploy such a small application is shooting sparrows with a cannon; frankly, it is. But I found it a convenient way to practice working with the system. At the time the original was published, my blog was running on a single-node Google Container Engine cluster.

Pod life cycles

Kubernetes is famous for its well-designed scheduler. Each deployment managed through Kubernetes is described as a group of containers (a so-called "pod"), which is assigned to a machine (a so-called "node") to run on. As you roll out and scale resources, new pods are created and destroyed depending on the replica requirements. Such scheduling allows resources to be used more rationally, but to me the scheduler itself is not as revolutionary as the environment in which Kubernetes operates. Kubernetes provides image management, an internal domain name system and rollout automation out of the box. Because of that, it seems quite feasible to me to build a system where specific tiers map to specific nodes, so that a scheduler is hardly needed at all.
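As a rough illustration (not from the original article), here is a minimal Go sketch, assuming a recent k8s.io/client-go and the in-cluster service-account credentials, that asks the API server which node the scheduler placed each pod on:

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Use the service-account credentials Kubernetes mounts into every pod.
	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// List all pods in all namespaces and print the node each one landed on.
	pods, err := clientset.CoreV1().Pods("").List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, pod := range pods.Items {
		fmt.Printf("%s/%s -> %s\n", pod.Namespace, pod.Name, pod.Spec.NodeName)
	}
}
```

Run from inside the cluster, this prints one line per pod and makes the pod-to-node assignment visible.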

One task that apparently cannot be solved within this life cycle is working with hot caches: the system simply has no notion of them. In principle, keeping such caches is a bad idea, but in practice they do turn up in clusters. If you have a memcache container and a server running alongside it, upgrading that server will inevitably require killing the memcache first. There is a mechanism for working with stateful pods, but it requires all state to be dumped to disk and then read back when the pod is rescheduled. If you actually face that need, you are unlikely to be thrilled by the prospect of waiting until all of that is restored from disk.
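To make the "dump state to disk, read it back on reschedule" pattern concrete, here is a minimal Go sketch; the path /data/cache.gob and the map-shaped cache are assumptions made for the example, with SIGTERM standing in for the stop signal a pod receives:

```go
package main

import (
	"encoding/gob"
	"log"
	"os"
	"os/signal"
	"syscall"
)

// cacheFile is a hypothetical path on a persistent volume mounted into the pod.
const cacheFile = "/data/cache.gob"

func load() map[string]string {
	cache := map[string]string{}
	f, err := os.Open(cacheFile)
	if err != nil {
		return cache // first start: nothing to restore
	}
	defer f.Close()
	if err := gob.NewDecoder(f).Decode(&cache); err != nil {
		log.Printf("could not restore cache: %v", err)
	}
	return cache
}

func save(cache map[string]string) {
	f, err := os.Create(cacheFile)
	if err != nil {
		log.Printf("could not persist cache: %v", err)
		return
	}
	defer f.Close()
	if err := gob.NewEncoder(f).Encode(cache); err != nil {
		log.Printf("could not persist cache: %v", err)
	}
}

func main() {
	// Restoring this on every reschedule is the slow part the author warns about.
	cache := load()

	// Kubernetes sends SIGTERM before stopping the pod; dump the cache then.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, os.Interrupt)
	<-sig
	save(cache)
}
```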

Network

The networking that handles communication between pods is rather elegant. On the Google Cloud Platform, each node gets a /24 subnet out of the private 10.0.0.0/8 range, and each pod then gets its own IP. On each pod you can then use whatever range of ports you like for your services. With this isolation, the situation where several applications try to bind to the same port simply cannot arise. If you want to run an http server on port 80, you can do so without worrying about other http servers. Most applications are "trained" to avoid port 80, but plenty of things open debug servers on 8080, and I know of entire systems that fell over because of it.
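A small sketch of what that isolation buys: a single pod can claim both port 80 for its main server and 8080 for a debug endpoint, and identical pods on the same node will not collide, because each pod listens on its own IP (the ports and paths here are just examples):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Debug endpoint on 8080; with per-pod IPs, another copy of this pod
	// on the same node can also bind 8080 without a conflict.
	go func() {
		debug := http.NewServeMux()
		debug.HandleFunc("/debug/vars", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "debug info here")
		})
		log.Fatal(http.ListenAndServe(":8080", debug))
	}()

	// The main server claims port 80 inside the pod's own network namespace.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from this pod")
	})
	log.Fatal(http.ListenAndServe(":80", nil))
}
```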

Another advantage of this approach to namespaces is that it lets you get along with programs that are hard to configure for non-standard ports. Keeping a DNS server on any port other than 53 is really damn hard, since you can hardly point anything at such a server.
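As an illustration of how awkward a non-standard DNS port is: in Go you can only reach one by overriding the resolver's Dial function, a hook most software does not offer at all. The address 10.0.0.53:5353 below is made up for the example:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net"
)

func main() {
	// Querying DNS on a port other than 53 requires explicit client support;
	// in Go that means supplying a custom Dial function to the resolver.
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{}
			// Hypothetical DNS server listening on a non-standard port.
			return d.DialContext(ctx, network, "10.0.0.53:5353")
		},
	}
	addrs, err := r.LookupHost(context.Background(), "example.com")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(addrs)
}
```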

The basis of this isolation is the "each node has its own subnet" arrangement. If your provider allocates only one IP per node, you will have to install some kind of overlay network, and such networks (in my experience) do not work particularly well.

So, communication between pods works well, but you still need to know their exact IP addresses in order to reach them. For this, Kubernetes has so-called "services". By default, each service gets its own cluster-internal IP. That IP can be looked up through the internal domain name system, and you connect to it. If several pods back a particular service, load balancing between them is switched on automatically and is handled by the node that initiated the connection.
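In practice this amounts to resolving the service name through the cluster DNS and connecting to the returned IP. A minimal sketch, with the service name "blog" and the namespace "default" as hypothetical stand-ins:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
)

func main() {
	// Cluster DNS resolves the service name to its single cluster IP.
	addrs, err := net.LookupHost("blog.default.svc.cluster.local")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("service IP(s):", addrs)

	// Connecting to that IP (or just to the DNS name) lands on one of the
	// pods behind the service; the balancing is done by the client's node.
	resp, err := http.Get("http://blog.default.svc.cluster.local/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, len(body), "bytes")
}
```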

I still cannot decide whether I like the "single cluster IP per service" approach. Most applications handle multiple DNS records poorly: they simply pick the first result they get, which makes the load distribution uneven. A single IP removes that problem. However, here we fall into the classic trap of conflating service discovery with liveness. As clusters grow, we run into more and more asymmetric network partitions. It can happen that the node hosting a pod runs the health check itself and reports success to the Kubernetes master, while other nodes cannot reach that pod at all. When that happens, the load balancer will keep trying to reach the failed pod, and since there is only the one IP, there is no fallback to try. You can close the connection and retry, hoping that on redistribution the TCP connection will go through another, working pod, but that solution is not optimal.

When deploying Kubernetes, I would advise organizing the health check like this: find out how many requests have been sent to a particular pod since it was last checked. If that number is below a certain threshold, mark the pod as unhealthy. That way the load balancer quickly learns to avoid the pod, even if it is still reachable from the Kubernetes master node. You can also configure services to hand out pod IPs directly, but unless you need a stable network identity, I do not see the point.
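A rough Go sketch of that suggestion (the threshold, port and /healthz path are arbitrary choices, not anything prescribed by Kubernetes): count requests served since the previous probe and report the pod unhealthy when the count falls below the threshold.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// minRequests is an arbitrary threshold: if fewer requests than this arrived
// since the previous probe, assume other nodes cannot reach this pod.
const minRequests = 1

var served int64 // requests handled since the last health probe

func main() {
	// Normal traffic: every handled request bumps the counter.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&served, 1)
		fmt.Fprintln(w, "ok")
	})

	// Health probe: report unhealthy if hardly any traffic got through,
	// even though the probe itself (from the local kubelet) still succeeds.
	// A real check would need a way to recover once traffic returns.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		n := atomic.SwapInt64(&served, 0)
		if n < minRequests {
			http.Error(w, "no traffic since last check", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "healthy")
	})

	http.ListenAndServe(":8080", nil)
}
```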
Since pods have private IPs that are not routable on the public Internet, some kind of address translation mechanism is needed if you want to reach anything inside the cluster from outside. Usually I am not thrilled about setting up NAT in such cases, so moving to IPv6 would be the most logical option: just allocate each node a publicly routable subnet and the problem solves itself. Unfortunately, Kubernetes does not support IPv6.

In practice, NAT is not that big a problem. For traffic coming from outside the cluster, Kubernetes encourages the use of services that interact with cloud load balancers, which provide a single IP. Since many Kubernetes developers worked on orchestration at Google, it is not surprising that the mechanism described resembles the Maglev model.

Unfortunately, I have not yet figured out how to expose a service to the outside world while also ensuring high availability in an environment where no such load balancer exists. You can tell Kubernetes to route all traffic arriving at the cluster to a given external IP, but that IP is not taken into account when pods are scheduled onto nodes. If the pod is moved off the node to which this IP is routed, that node has to do NAT, which means we are loading it with extra work. Another problem is port conflicts at the external IP level. If you have a controller that updates external IPs (as well as the records in whatever DNS service you use) depending on which nodes those IPs landed on, it may happen that two pods both want traffic on port 80, and you will have no way to tell one IP from the other.
For now I get by with a cloud load balancer. It is simpler, and I am not planning to use Kubernetes anywhere outside the cloud in the foreseeable future.

More about cloud magic

Another area where cloud magic is actively used is the management of persistent storage. The existing options for attaching a disk that a pod can access are clearly tied to cloud providers. Yes, you can create a volume that is essentially just a local disk on a node, but that feature is still in alpha and not meant for production. It would be interesting to check whether you could run an NFS server on Kubernetes itself and then have Kubernetes manage persistent storage through it, but I am not planning to take on that task yet.

Some consider this cloud magic the last straw that tips the scales against Kubernetes, but today cluster computing is so dependent on cloud services that it is very difficult to do without them. I notice that people try to stay away from this magic in order to avoid vendor lock-in, but in doing so they underestimate the cost of building it themselves, as well as the many implicit benefits of working on a cloud platform. Kubernetes provides a consistent interface through which it is convenient to use this cloud magic, so you can trust it, and it is (at least) technically standardized.

Conclusion

That concludes our little tour of Kubernetes; I hope you enjoyed it. Naturally, this article deals with toy problems, so I may simply be unaware of the difficulties that arise when Kubernetes is used at scale. I also run on a cloud platform, so I have no idea what the bare-metal story looks like. Still, as a new user of the system, I can guess why so many people are looking for alternative ways to get it up and running.

Source: https://habr.com/ru/post/341670/

