
How to manage a herd of OpenVZ containers when there are more than 300-500 of them

How to get a squad of well-trained services, instead of a circus tent full of daemons running on bare metal, by virtualizing absolutely everything — and why it is worth doing.



Artistic introduction


Long ago, when people did not yet know how to make fire, all services (applications, databases, HTTP backends, etc.) lived directly on the servers, and wizard admins were woken at night by alert calls from the monitoring staff (reaction time ~30 minutes, heh), because some daemon or other had gotten out of hand and was gobbling up whatever was free: disk, memory, CPU cycles...
There were also craftsmen among the developers who managed to churn out XMLs that held up to 8 gigabytes of them in memory.

Scheme


One service, one container.
A service being an HTTP frontend, an HTTP backend, an application, a database, whatever.

Since limits are handed out to containers only based on load-testing results (no results means no commercial use), a service can only go crazy within those limits. It does happen that a single container drives up the load average for the whole machine, since ioprio is not a strict cap.
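In practice, applying such limits boils down to a handful of vzctl calls. A minimal sketch, assuming a container with ID 101; every number here is illustrative, not a recommendation:

    CTID=101
    # Memory beancounters: barrier:limit, in 4 KB pages (~1 GB barrier here).
    vzctl set $CTID --privvmpages 262144:294912 --save
    vzctl set $CTID --shmpages 65536:65536 --save
    # CPU weight relative to other containers, plus a hard cap in percent.
    vzctl set $CTID --cpuunits 1000 --cpulimit 50 --save
    # Disk quota: soft:hard, in 1 KB blocks (~10 GB here).
    vzctl set $CTID --diskspace 10485760:11534336 --save
    # I/O priority, 0 (lowest) to 7 (highest); it weighs, it does not cap.
    vzctl set $CTID --ioprio 4 --save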

“Use Bacula if you do not plan to restore files from backup”

Databases are backed up via mysqldump and pg_dump to one host that is generous in terms of disks.
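A minimal sketch of such a nightly dump job; the paths are made up for illustration:

    #!/bin/sh
    DATE=$(date +%Y%m%d)
    DEST=/backup/dumps/$DATE   # hypothetical path on the disk-rich host
    mkdir -p "$DEST"
    # MySQL: one consistent, compressed dump of everything.
    mysqldump --all-databases --single-transaction | gzip > "$DEST/mysql-all.sql.gz"
    # PostgreSQL: pg_dumpall also covers roles; per-database pg_dump works too.
    pg_dumpall -U postgres | gzip > "$DEST/pgsql-all.sql.gz"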
All sorts of small files (settings, content, etc.) go into Bacula. Project code already lives in svn and git.
In addition, we use a "warm" backup: roughly every sixth machine is a nightly rsync replica of the containers running on the other five nodes. At the end of the sync, containers with MySQL are brought up on a spare address and a CHECK TABLE is run.
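For one container, the night job might look something like this sketch; the container ID, source node, and spare address are all hypothetical:

    #!/bin/sh
    CTID=101
    SRC=node3.example.com   # the production node being replicated
    # Pull the container's private area onto this replica node.
    rsync -aH --delete --numeric-ids root@$SRC:/vz/private/$CTID/ /vz/private/$CTID/
    # Bring the copy up on a spare address so it cannot collide with
    # the live container, then verify the MySQL tables.
    vzctl set $CTID --ipdel all --save
    vzctl set $CTID --ipadd 10.0.99.101 --save
    vzctl start $CTID
    vzctl exec $CTID "mysqlcheck --all-databases"
    vzctl stop $CTID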
So, everything is "saved" in one form or another, and the on-duty staff can quickly bring containers back by hand from a dead piece of iron.
Machines with heartbeat and DRBD don't count here.

Control


When you have three or four hundred containers on hand, then with this scheme, in addition to the "usual" monitoring, it is especially important to know when and where containers hit their limits and what the current load on the host nodes is.
Nothing suitable turned up at a glance (at least not two years ago), so we wrote Yabeda and Harvester ourselves.
Yabeda can show in the console which container has hit which parameter since the last run, as well as write that knowledge to a database and tattle about it over Jabber.
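The core trick amounts to watching the failcnt column in /proc/user_beancounters grow between runs. A stripped-down sketch (the state-file path is made up; the real tool adds the database and Jabber parts on top):

    #!/bin/sh
    STATE=/var/tmp/beancounters.prev
    CUR=/var/tmp/beancounters.cur
    # Columns per counter: held, maxheld, barrier, limit, failcnt;
    # lines that open a container's block are prefixed with "CTID:".
    awk 'NR > 2 {
            if ($1 ~ /:$/) { ctid = $1; sub(/:$/, "", ctid); res = $2 } else res = $1
            if ($NF > 0) print ctid, res, $NF
         }' /proc/user_beancounters | sort > "$CUR"
    # failcnt only grows, so any new or changed line means a fresh hit.
    [ -f "$STATE" ] && comm -13 "$STATE" "$CUR"
    mv "$CUR" "$STATE"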
Harvester can:
- for host nodes, collect committed memory (privvmpages and shmpages), CPU units, and disk space into the database;
- for VPSes, collect the basic parameters and record their changes in the database (a sketch of this collection step follows the list);
- for people, show the current picture in general and a detailed history in particular (see harvester-web, the Django frontend to Harvester's database).
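Under stated assumptions (field names as in vzlist(8); the sqlite schema is invented for illustration), the per-container collection step could be sketched like this:

    #!/bin/sh
    DB=/var/lib/harvester/harvester.db   # hypothetical path
    NOW=$(date +%s)
    # One row per container with its current beancounter values;
    # note: on older vzlist versions the ID field is called veid, not ctid.
    vzlist -a -H -o ctid,privvmpages,privvmpages.l,diskspace,cpuunits |
    while read ctid priv priv_l disk cpu; do
        sqlite3 "$DB" "INSERT INTO samples VALUES ($NOW, $ctid, $priv, $priv_l, $disk, $cpu);"
    done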

There is also a home-grown wrapper around ps and vzpid for taking snapshots of the process list on the host machine, since htop does not know how to write its output to a text file.
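The idea is simple enough to sketch; the output path is made up, and the snapshot keeps just the heaviest processes, each tagged with its container:

    #!/bin/sh
    OUT=/var/log/ps-snapshot.$(date +%Y%m%d-%H%M)
    # Top 30 processes by resident memory, each mapped to its CTID via vzpid
    # (vzpid prints a header line, then "pid ctid name").
    ps -eo pid,rss,pcpu,comm --sort=-rss --no-headers | head -30 |
    while read pid rss cpu comm; do
        ctid=$(vzpid "$pid" | awk 'NR == 2 { print $2 }')
        echo "$ctid $pid $rss $cpu $comm"
    done > "$OUT"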

If I have not explicitly described something, ask!

Source: https://habr.com/ru/post/73093/

