
Kubernetes tips & tricks: speeding up the bootstrap of large databases

This article opens a series of publications with practical recipes for making life easier for yourself (the ops team) and for developers in situations that come up literally every day. All of them are drawn from real experience solving problems for our clients and have been refined over time, yet they still make no claim to be perfect: treat them as ideas and starting points rather than finished solutions.



I will start with a trick for preparing large database dumps (MySQL, PostgreSQL) so that they can be deployed quickly for various needs, first of all in developers' environments. The context for the operations described below is our typical setup: a running Kubernetes cluster and GitLab (with dapp) for CI/CD. Let's go!






The main pain point when working with feature branches in Kubernetes is large databases: developers want to test or demonstrate their changes against a full (or nearly full) copy of the production database. For example:





Note: We usually back up MySQL databases with Percona's innobackupex, which captures all databases, users... in short, everything that may be needed. That is the example discussed in the rest of the article, although in general it makes absolutely no difference how exactly you make your backups.
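The backup side is out of scope for this article, but for context: judging by the restore command used further below (borg extract piped into xbstream), the dump is an xbstream stream stored in a Borg repository. Purely as an illustration, taking such a backup on the replica could look roughly like the following sketch (the repository address, archive naming and credentials are assumptions, not the original setup):

#!/bin/bash
# stream a full Percona XtraBackup of all databases into a Borg archive;
# borg stores the piped data as an item named "stdin", which is what the
# restore script extracts later
innobackupex --user=root --password="$MYSQL_ROOT_PASSWORD" --stream=xbstream /tmp \
  | borg create user@your.server.net:somedb-mysql::"$(date +%Y-%m-%d)" -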



So, let's say we have a database backup. What to do next?



Step 1: Preparing a new database from dump



First of all, we will create a Deployment in Kubernetes that consists of two init containers (special containers that run before the application containers of the pod and allow pre-configuration) and one regular container.



But where should it run? We have a large database (1 TB) and we want to spin up ten copies of it, so we need a server with a large disk (10+ TB). We order such a server specifically for this task and mark its node with the label dedicated: non-prod-db. We also apply a taint with the same name, which tells Kubernetes that only applications tolerant of it (i.e., that have the matching tolerations) may be scheduled onto this node; in Kubernetes terms, dedicated Equal non-prod-db.
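For reference, applying the label and the taint boils down to two kubectl commands (the node name db-node-1 here is just a placeholder):

kubectl label node db-node-1 dedicated=non-prod-db
kubectl taint node db-node-1 dedicated=non-prod-db:NoExecute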



Using nodeSelector and tolerations, we select the required node (the one on the server with the large disk):



  nodeSelector:
    dedicated: non-prod-db
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "non-prod-db"
    effect: "NoExecute"


... and take a look at the contents of this pod.



Init containers: get-bindump



We will call the first init container get-bindump. An emptyDir volume is mounted into it (at /var/lib/mysql), and the database dump fetched from the backup server is placed there. For this, the container has everything it needs: SSH keys and the backup server addresses. In our case this stage takes about 2 hours.



The description of this container in the Deployment is as follows:



  - name: get-bindump
    image: db-dumps
    imagePullPolicy: Always
    command: [ "/bin/sh", "-c", "/get_bindump.sh" ]
    resources:
      limits:
        memory: "5000Mi"
        cpu: "1"
      requests:
        memory: "5000Mi"
        cpu: "1"
    volumeMounts:
    - name: dump
      mountPath: /dump
    - name: mysqlbindir
      mountPath: /var/lib/mysql
    - name: id-rsa
      mountPath: /root/.ssh


The get_bindump.sh script used in the container:



#!/bin/bash
date
if [ -f /dump/version.txt ]; then
  echo "Dump file already exists."
  exit 0
fi
rm -rf /var/lib/mysql/*
borg extract --stdout user@your.server.net:somedb-mysql::${lastdump} stdin | xbstream -x -C /var/lib/mysql/
echo $lastdump > /dump/version.txt


Init containers: prepare-bindump



After the backup is downloaded, the second init container, prepare-bindump, takes over. It runs innobackupex --apply-log (the files are already available in /var/lib/mysql thanks to the emptyDir shared with get-bindump) and starts the MySQL server.



It is in this init container that we perform all the transformations needed to prepare the database for the chosen application: we truncate the tables where that is acceptable, adjust access rights inside the database, and so on. We then shut down the MySQL server and simply archive the whole /var/lib/mysql into a tar.gz file. As a result, the dump fits into a 100 GB file, an order of magnitude smaller than the original 1 TB. This stage takes about 5 hours.



Description of the second init container in the Deployment:



  - name: prepare-bindump
    image: db-dumps
    imagePullPolicy: Always
    command: [ "/bin/sh", "-c", "/prepare_bindump.sh" ]
    resources:
      limits:
        memory: "5000Mi"
        cpu: "1"
      requests:
        memory: "5000Mi"
        cpu: "1"
    volumeMounts:
    - name: dump
      mountPath: /dump
    - name: mysqlbindir
      mountPath: /var/lib/mysql
    - name: debian-cnf
      mountPath: /etc/mysql/debian.cnf
      subPath: debian.cnf


The prepare_bindump.sh script used in it looks like this:



#!/bin/bash
date
if [ -f /dump/healthz ]; then
  echo "Dump file already exists."
  exit 0
fi
innobackupex --apply-log /var/lib/mysql/
chown -R mysql:mysql /var/lib/mysql
chown -R mysql:mysql /var/log/mysql
echo "`date`: Starting mysql"
/usr/sbin/mysqld --character-set-server=utf8 --collation-server=utf8_general_ci --innodb-data-file-path=ibdata1:200M:autoextend --user=root --skip-grant-tables &
sleep 200
echo "`date`: Creating mysql root user"
echo "update mysql.user set Password=PASSWORD('password') WHERE user='root';" | mysql -uroot -h 127.0.0.1
echo "delete from mysql.user where USER like '';" | mysql -uroot -h 127.0.0.1
echo "delete from mysql.user where user = 'root' and host NOT IN ('127.0.0.1', 'localhost');" | mysql -uroot -h 127.0.0.1
echo "FLUSH PRIVILEGES;" | mysql -uroot -h 127.0.0.1
echo "truncate somedb.somedb_table_one;" | mysql -uroot -h 127.0.0.1 -ppassword somedb
/usr/bin/mysqladmin shutdown -uroot -ppassword
cd /var/lib/mysql/
tar -czf /dump/mysql_bindump.tar.gz ./*
touch /dump/healthz
rm -rf /var/lib/mysql/*


The pod



The final step is launching the main container, which happens after the init containers have finished. The pod runs a plain nginx, and the compressed, truncated 100 GB dump is handed to it via the shared emptyDir. The job of this nginx is simply to serve that dump.



Configuration:



  - name: nginx
    image: nginx:alpine
    resources:
      requests:
        memory: "1500Mi"
        cpu: "400m"
    lifecycle:
      preStop:
        exec:
          command: ["/usr/sbin/nginx", "-s", "quit"]
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
        scheme: HTTP
      timeoutSeconds: 7
      failureThreshold: 5
    volumeMounts:
    - name: dump
      mountPath: /usr/share/nginx/html
    - name: nginx-config
      mountPath: /etc/nginx/nginx.conf
      subPath: nginx.conf
      readOnly: false
  volumes:
  - name: dump
    emptyDir: {}
  - name: mysqlbindir
    emptyDir: {}
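The somedb-nginx-config ConfigMap itself is not shown in the article. Judging by the manifests, the dump emptyDir is mounted at /usr/share/nginx/html, so the healthz flag file created by prepare_bindump.sh is served as an ordinary static file at /healthz (which is what the liveness probe checks), and a minimal nginx.conf is enough. A sketch of what it might contain, not the original config:

user nginx;
worker_processes 1;
events {
  worker_connections 1024;
}
http {
  sendfile on;
  server {
    listen 80;
    # the dump emptyDir is mounted here, so both mysql_bindump.tar.gz
    # and the healthz flag file are served as static files
    location / {
      root /usr/share/nginx/html;
    }
  }
}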


This is what the entire Deployment looks like with its initContainers ...
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: db-dumps
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  revisionHistoryLimit: 2
  template:
    metadata:
      labels:
        app: db-dumps
    spec:
      imagePullSecrets:
      - name: regsecret
      nodeSelector:
        dedicated: non-prod-db
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "non-prod-db"
        effect: "NoExecute"
      initContainers:
      - name: get-bindump
        image: db-dumps
        imagePullPolicy: Always
        command: [ "/bin/sh", "-c", "/get_bindump.sh" ]
        resources:
          limits:
            memory: "5000Mi"
            cpu: "1"
          requests:
            memory: "5000Mi"
            cpu: "1"
        volumeMounts:
        - name: dump
          mountPath: /dump
        - name: mysqlbindir
          mountPath: /var/lib/mysql
        - name: id-rsa
          mountPath: /root/.ssh
      - name: prepare-bindump
        image: db-dumps
        imagePullPolicy: Always
        command: [ "/bin/sh", "-c", "/prepare_bindump.sh" ]
        resources:
          limits:
            memory: "5000Mi"
            cpu: "1"
          requests:
            memory: "5000Mi"
            cpu: "1"
        volumeMounts:
        - name: dump
          mountPath: /dump
        - name: mysqlbindir
          mountPath: /var/lib/mysql
        - name: log
          mountPath: /var/log/mysql
        - name: debian-cnf
          mountPath: /etc/mysql/debian.cnf
          subPath: debian.cnf
      containers:
      - name: nginx
        image: nginx:alpine
        resources:
          requests:
            memory: "1500Mi"
            cpu: "400m"
        lifecycle:
          preStop:
            exec:
              command: ["/usr/sbin/nginx", "-s", "quit"]
        livenessProbe:
          httpGet:
            path: /healthz
            port: 80
            scheme: HTTP
          timeoutSeconds: 7
          failureThreshold: 5
        volumeMounts:
        - name: dump
          mountPath: /usr/share/nginx/html
        - name: nginx-config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
          readOnly: false
      volumes:
      - name: dump
        emptyDir: {}
      - name: mysqlbindir
        emptyDir: {}
      - name: log
        emptyDir: {}
      - name: id-rsa
        secret:
          defaultMode: 0600
          secretName: somedb-id-rsa
      - name: nginx-config
        configMap:
          name: somedb-nginx-config
      - name: debian-cnf
        configMap:
          name: somedb-debian-cnf
---
apiVersion: v1
kind: Service
metadata:
  name: somedb-db-dump
spec:
  clusterIP: None
  selector:
    app: db-dumps
  ports:
  - name: http
    port: 80


Additional notes:



  1. In our case, we prepare a new dump every night using a scheduled job in GitLab. That is, every night we automatically roll out this Deployment, which pulls a fresh dump and prepares it for distribution to all the developers' test environments (a sketch of such a scheduled job is shown after the Dappfile below).
  2. Why do we also mount the /dump volume into the init containers (and why does the script check for /dump/version.txt)? This protects against a reboot of the server the pod is running on. The containers would be restarted, and without this check the dump would start downloading all over again. If we have already prepared a dump once, then on the next start (after a server reboot) the flag file /dump/version.txt tells us so.
  3. What is the db-dumps image? We build it with dapp, and its Dappfile looks like this:



     dimg: "db-dumps" from: "ubuntu:16.04" docker: ENV: TERM: xterm ansible: beforeInstall: - name: "Install percona repositories" apt: deb: https://repo.percona.com/apt/percona-release_0.1-4.xenial_all.deb - name: "Add repository for borgbackup" apt_repository: repo="ppa:costamagnagianfranco/borgbackup" codename="xenial" update_cache=yes - name: "Add repository for mysql 5.6" apt_repository: repo: deb http://archive.ubuntu.com/ubuntu trusty universe state: present update_cache: yes - name: "Install packages" apt: name: "{{`{{ item }}`}}" state: present with_items: - openssh-client - mysql-server-5.6 - mysql-client-5.6 - borgbackup - percona-xtrabackup-24 setup: - name: "Add get_bindump.sh" copy: content: | {{ .Files.Get ".dappfiles/get_bindump.sh" | indent 8 }} dest: /get_bindump.sh mode: 0755 - name: "Add prepare_bindump.sh" copy: content: | {{ .Files.Get ".dappfiles/prepare_bindump.sh" | indent 8 }} dest: /prepare_bindump.sh mode: 0755 
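The GitLab pipeline itself is not shown in the article (in our setup the deploy is driven by dapp). Purely as an illustration of the nightly job mentioned in note 1, a scheduled GitLab CI job could look roughly like the sketch below; the job name, manifest file name and the annotation trick used to force a fresh rollout are assumptions:

# .gitlab-ci.yml (fragment) - runs only from a GitLab schedule
rebuild_db_dump:
  stage: deploy
  only:
    - schedules
  script:
    # re-apply the manifest, then bump a template annotation so Kubernetes
    # starts a new pod whose init containers fetch and prepare a fresh dump
    - kubectl -n infra-db apply -f db-dumps.yaml
    - kubectl -n infra-db patch deployment db-dumps -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"dump-date\":\"$(date +%F)\"}}}}}"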


Step 2: Run the database in the developer's environment



For the MySQL database rolled out in a developer's test environment, there is a button in GitLab that triggers a re-deploy of the MySQL Deployment with the rollingUpdate.maxUnavailable: 0 strategy:







How is this implemented?
In GitLab, when you click the reload db button, a Deployment with the following specification is rolled out:



spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0


That is, we tell Kubernetes to update the Deployment (create a new pod) while guaranteeing that at least one pod stays alive. Since the new pod has init containers, it does not reach the Running status while they are executing, so the old pod keeps working. Only once MySQL has started in the new pod (and passed the readiness probe) does traffic switch over to it, and the old pod (with the old database) is deleted.
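The readiness probe itself is not shown in the Deployment at the end of this section; purely as an illustration, such a probe for the mysql container could look like the following sketch (mysqladmin ping against the local server; the password is the placeholder one from the manifest, and the timings are assumptions):

        readinessProbe:
          exec:
            command: ["mysqladmin", "ping", "-h", "127.0.0.1", "-ppassword"]
          initialDelaySeconds: 60
          periodSeconds: 10
          failureThreshold: 30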



Details of this scheme can be found in the following materials:





This approach lets us wait until the new dump has been downloaded, unpacked, and started, and only then is the old MySQL pod removed. So while a new dump is being prepared, we keep working with the old database undisturbed.



The init container of this Deployment uses the following command:



 curl "$DUMP_URL" | tar -C /var/lib/mysql/ -xvz 


That is, we download the compressed database dump prepared in step 1, unpack it into /var/lib/mysql, and then the Deployment's pod starts, running MySQL with the already prepared data. All of this takes about 2 hours.
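The full getdump.sh referenced in the init container is not given in the article; by analogy with the scripts from step 1, a minimal sketch could be (the idempotency check is an assumption):

#!/bin/bash
set -e
# if the data directory was already populated (e.g. after a node reboot),
# don't download and unpack the dump again
if [ -d /var/lib/mysql/mysql ]; then
  echo "Data directory already prepared."
  exit 0
fi
curl "$DUMP_URL" | tar -C /var/lib/mysql/ -xvz
chown -R mysql:mysql /var/lib/mysql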



And the Deployment looks like this ...
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mysql
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        service: mysql
    spec:
      imagePullSecrets:
      - name: regsecret
      nodeSelector:
        dedicated: non-prod-db
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "non-prod-db"
        effect: "NoExecute"
      initContainers:
      - name: getdump
        image: mysql-with-getdump
        command: ["/usr/local/bin/getdump.sh"]
        resources:
          limits:
            memory: "6000Mi"
            cpu: "1.5"
          requests:
            memory: "6000Mi"
            cpu: "1.5"
        volumeMounts:
        - mountPath: /var/lib/mysql
          name: datadir
        - mountPath: /etc/mysql/debian.cnf
          name: debian-cnf
          subPath: debian.cnf
        env:
        - name: DUMP_URL
          value: "http://somedb-db-dump.infra-db.svc.cluster.local/mysql_bindump.tar.gz"
      containers:
      - name: mysql
        image: mysql:5.6
        resources:
          limits:
            memory: "1024Mi"
            cpu: "1"
          requests:
            memory: "1024Mi"
            cpu: "1"
        lifecycle:
          preStop:
            exec:
              command: ["/etc/init.d/mysql", "stop"]
        ports:
        - containerPort: 3306
          name: mysql
          protocol: TCP
        volumeMounts:
        - mountPath: /var/lib/mysql
          name: datadir
        - mountPath: /etc/mysql/debian.cnf
          name: debian-cnf
          subPath: debian.cnf
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "password"
      volumes:
      - name: datadir
        emptyDir: {}
      - name: debian-cnf
        configMap:
          name: somedb-debian-cnf
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None
  selector:
    service: mysql
  ports:
  - name: mysql
    port: 3306
    protocol: TCP
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: somedb-debian-cnf
data:
  debian.cnf: |
    [client]
    host     = localhost
    user     = debian-sys-maint
    password = password
    socket   = /var/run/mysqld/mysqld.sock
    [mysql_upgrade]
    host     = localhost
    user     = debian-sys-maint
    password = password
    socket   = /var/run/mysqld/mysqld.sock


Results



As a result, we always have a Deployment that rolls out every night and does the following:





For the example considered here, creating a dump from a live replica takes about 6 hours, preparing the "base image" takes 7 hours, and updating the database in a developer's environment takes 2 hours. Since the first two steps happen "in the background", invisible to developers, in practice they can deploy a production-sized database (1 TB) in just those 2 hours.



Questions, criticism and corrections to the proposed scheme and its components are welcome in the comments!



P.S. Of course, we understand that with VMware and some other tools it would be possible to take a snapshot of the virtual machine and launch a new VM from that snapshot (which is even faster), but that option does not cover preparing the database, and once you account for that, the timing is no longer obvious... Not to mention that not everyone has the opportunity or the desire to use commercial products.






Source: https://habr.com/ru/post/417509/


