
Historically, and for no particular reason, the IT industry splits into two camps over just about any question: those "for" and those "against." The subject of the dispute can be completely arbitrary. Which OS is better, Windows or Linux? Android or iOS? Keep everything in the cloud, or dump it onto cold RAID storage and lock the drives in a safe? Do PHP developers deserve to be called programmers? These disputes are often purely existential and have no basis other than sporting interest.
With the arrival of containers and our whole beloved kitchen of Docker and the proverbial k8s, the debates "for" and "against" using the new capabilities spread into various areas of the backend. (Let us note in advance that although Kubernetes will most often be named as the orchestrator in this discussion, the choice of this particular tool is not important in principle; you can substitute any other one you consider the most convenient and familiar.)
At first glance this looks like just another argument between two sides of the same coin, as senseless and merciless as the eternal Windows vs Linux standoff, in which reasonable people live quite comfortably somewhere in the middle. But with containerization it is not so simple. Usually in such disputes there is no right side, yet when it comes to "using" or "not using" containers for the database, everything turns upside down, because in a certain sense both the supporters and the opponents of this approach are right.
Bright side
The argument of the bright side can be summed up in one phrase: "Hello, it's 2019 out there!" It sounds like populism, of course, but if you look into the situation in detail, there are real advantages. Let's walk through them now.
Suppose you have a large web project. It may have been built on a microservice approach from the start, or it may have arrived there by evolution; that does not really matter. You have split the project into individual microservices, set up orchestration, load balancing and scaling, and now, with a clear conscience, you sip a mojito in a hammock during "Habr effect" traffic spikes instead of reviving fallen servers. But you have to be consistent in everything. Very often only the application itself, the code, gets containerized. And what else do we have besides the code?
That's right, the data. The heart of any project is its data: a typical DBMS such as MySQL, PostgreSQL or MongoDB, a storage used for search such as ElasticSearch, a key-value store used for caching such as Redis, and so on. We are not going to talk about crooked backend implementations where the database falls over because of poorly written queries; instead, let's talk about ensuring the fault tolerance of the database itself under client load. After all, when we containerize the application and let it scale freely to handle any number of incoming requests, the load on the database naturally grows.
In effect, the access channel to the database and the server it runs on become the eye of the needle in our beautiful containerized backend. Meanwhile, the whole point of container virtualization is the mobility and plasticity of the structure, which lets us distribute peak load across all the infrastructure available to us as efficiently as possible. So if we do not containerize and roll out every existing element of the system into the cluster, we are making a very serious mistake.
It is much more logical to cluster not only the application itself, but also the services responsible for data storage. By clustering and deploying independently working web servers that share the load in k8s, we already solve the problem of data synchronization, such as comments on posts if we take some media or blogging platform as an example. In any case, we end up with an in-cluster, even if virtual, representation of the database, for example as a Service of type ExternalName. The problem is that the database itself is not clustered yet: the web servers deployed in the cube pull information about changes from our static production database, which runs separately.
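A minimal sketch of what such an in-cluster representation might look like; the namespace and the hostname db01.example.internal are illustrative assumptions, not anything from a real project:

```yaml
# Hypothetical ExternalName Service: inside the cluster the application
# talks to "db" as if it were a local service, while the actual DBMS
# keeps running on a separate, non-containerized host.
apiVersion: v1
kind: Service
metadata:
  name: db
  namespace: production
spec:
  type: ExternalName
  externalName: db01.example.internal   # assumed hostname of the standalone DB server
```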
Feel the catch? We use k8s or Swarm to distribute the load and keep the main web servers from falling over, but we do not do the same for the database. Yet if the database falls, our entire clustered infrastructure is pointless: what good are empty web pages returning a database connection error?
That is why clustering is needed not only for the web servers, as is usually done, but for the database infrastructure as well. Only then do we get a structure whose parts work as a single whole while remaining independent of each other. Then, even if half of our backend collapses under load, the rest survives, and the synchronization of the databases with each other inside the cluster, together with the ability to scale endlessly and deploy new clusters, lets us quickly reach the required capacity; all we need is enough racks in the data center.
In addition, a database model distributed across clusters lets you put the database wherever it is needed. If we are talking about a global service, it is rather illogical to spin up a web cluster somewhere around San Francisco while the packets for every database access travel to the Moscow region and back.
Database containerization also lets you bring all the elements of the system to the same level of abstraction. That, in turn, makes it possible to manage the system directly from code, by the developers themselves, without actively involving administrators. The developers decided that a new subproject needs its own DBMS? Easy: they write a YAML file, roll it out to the cluster, and it's done.
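As a sketch of what such a YAML file could look like, here is a deliberately minimal manifest for a per-subproject PostgreSQL instance; the names, image tag, password handling and storage size are all illustrative assumptions:

```yaml
# Hypothetical headless Service + single-replica StatefulSet for a subproject DB.
apiVersion: v1
kind: Service
metadata:
  name: subproject-db
spec:
  clusterIP: None            # headless service for the StatefulSet
  selector:
    app: subproject-db
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: subproject-db
spec:
  serviceName: subproject-db
  replicas: 1
  selector:
    matchLabels:
      app: subproject-db
  template:
    metadata:
      labels:
        app: subproject-db
    spec:
      containers:
        - name: postgres
          image: postgres:11
          env:
            - name: POSTGRES_PASSWORD
              value: changeme        # illustrative only; use a Secret in practice
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Applied with an ordinary `kubectl apply -f`, which is exactly the "wrote a YAML, rolled it out, done" from the paragraph above.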
And, of course, internal operations become much simpler. Tell me, how many times have you winced when a new team member stuck their hands into the production database, the only one you have, the one spinning right now? Of course, we are all adults here, and somewhere we have a fresh backup, and even further away, behind the shelf with grandma's pickles and the old skis, another backup, perhaps even in cold storage, because your office has already caught fire once. All the same, every new team member who gets access to the production infrastructure, and therefore to the production database, is a bucket of sedatives for everyone around. Who knows what this newcomer is like; maybe he's all thumbs? Scary, admit it.
Containerization, and with it a distributed physical topology of your project's database, helps avoid such nerve-racking moments. Don't trust the newcomer? Fine! Spin up a separate cluster for him to work in and disconnect it from the other database clusters, with synchronization only by manual push and the simultaneous turn of two keys (one held by the team lead, one by the administrator). And everyone is happy.
And now it is time to switch sides and speak for the opponents of database clustering.
Dark side
In arguing why you should not containerize the database and should keep running it on one central server, we will not stoop to the rhetoric of the orthodox and statements like "our grandfathers ran databases on bare metal, and so will we!" Instead, let's try to think of a situation in which containerization would actually bring tangible dividends.
Admit it, the projects that genuinely need the database in a container can be counted on the fingers of one hand of a not particularly lucky milling machine operator. For the most part, even the use of k8s or Docker Swarm itself is redundant; quite often these tools are adopted because of the general fondness of the top brass for high technology and their urge to push everything into clouds and containers. Well, because it is fashionable now and everyone is doing it.
In at least half of the cases, using Kubernetes or even plain Docker on a project is overkill. The problem is that not every team, or every outsourcing company hired to maintain the client's infrastructure, is aware of this. Worse still is when containers are imposed on the client, because they cost the client a certain amount of coin.
In general, there is a perception that the Docker/Kubernetes mafia simply steamrolls the customers who outsource these infrastructure questions. After all, to work with clusters you need engineers who are capable of it and who understand the overall architecture of the delivered solution. We once described our own case with the Republic publication: there we trained the client's team to work in the realities of Kubernetes, and everyone was satisfied. And it was done decently. More often, though, k8s "implementers" take the client's infrastructure hostage, because from then on only they understand how everything works there, and the client has no specialists of their own.
Now imagine that in this way we hand over to the outsourcer not only the web server part, but the maintenance of the database as well. We said the database is the heart, and losing the heart is fatal to any living organism. In short, the prospects are not the best. So instead of hyped-up Kubernetes, many projects would be perfectly happy with a normal AWS tier that solves all the problems with the load on their site or project. But AWS is no longer fashionable, and showing off is worth more than money, unfortunately, in the IT world too.
Okay. Suppose the project really does need clustering. With stateless applications everything is clear, but how do we provide decent network connectivity for a clustered database?
If we are talking about a seamless engineering solution, which is what a move to k8s is supposed to be, our main headache is data replication in a clustered database. Some DBMSs are fairly loyal from the start to distributing data among their individual instances. Many others are not so welcoming. And quite often the main argument in choosing the DBMS for a project was not its ability to replicate with minimal resource and engineering costs, especially if the project was not originally planned as a microservice system but simply evolved in that direction.
We probably do not need to tell you about the speed of network drives: they are slow. That means we still have no real way, if something happens, to re-raise a DBMS instance somewhere with more CPU or free RAM, because we would very quickly run into the performance of the virtualized disk subsystem. Accordingly, the DBMS has to be nailed to its own personal set of machines sitting right next to fast disks. Alternatively, you have to separately engineer some sufficiently fast synchronization of data to the presumed standbys.
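In Kubernetes terms, that "nailing down" usually means node affinity. A sketch, assuming a hypothetical node label disktype=local-ssd that an operator would have applied to the machines with fast local disks; this fragment would sit inside the pod template spec of something like the StatefulSet shown earlier:

```yaml
# Hypothetical fragment of a pod template: schedule the DBMS pod only
# onto nodes labeled with the assumed disktype=local-ssd label.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - local-ssd
```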
Continuing the topic of virtual filesystems: Docker volumes, unfortunately, are not trouble-free. In general, for something like long-term reliable data storage, we would like to keep things as technically simple as possible. Adding another layer of abstraction between the container's filesystem and the filesystem of the parent host is a risk in itself. And when the containerization layer itself has trouble passing data between these layers, it becomes a real disaster. At the moment most of the problems known to progressive humanity seem to have been eradicated, but you know how it goes: the more complex the mechanism, the easier it breaks.
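If you do containerize storage and want to keep that path as plain as possible, one option is a "local" PersistentVolume mapped straight to a directory on a specific node. A sketch, with the path, node name and capacity as illustrative assumptions:

```yaml
# Hypothetical local PersistentVolume: data lives directly on the host
# filesystem of node01, with no networked storage layer in between.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: db-data-node01
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/db-data       # assumed directory on the node's local disk
  nodeAffinity:                    # local volumes must be pinned to a node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node01
```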
In the light of all these "adventures", it is much more profitable and simpler to keep the database in one place; even if the application needs containerization, let it run on its own and receive, through a distribution gateway, simultaneous connections to a database that is read and written in only one place. This approach reduces the probability of errors and desynchronization to the lowest possible values.
What are we driving at? That database containerization is appropriate where there is a real need for it. You cannot shove in the database of a full-blown app and spin it as if you had two dozen microservices; it does not work that way. And that has to be clearly understood.
Instead of a conclusion
If you were waiting for a definitive verdict on whether to virtualize the database or not, we have to disappoint you: there will not be one. Because when creating any infrastructure solution, you should be guided not by fashion and progress but, first of all, by common sense.
There are projects that fit the principles and tools that come with Kubernetes perfectly, and on such projects peace reigns, at least in the backend area. And there are projects that need not containerization but a normal server infrastructure, because they fundamentally cannot be scaled into a clustered microservice model: they will simply fall over.