A lot has been written lately about the use of Docker and Docker Compose; for example, I recommend a recent article on Habr if you have not yet dug into the topic. It really is very convenient, especially in a bundle with Ansible, and I use it everywhere: from development to automatic integration testing on CI. Its use in testing has also been written about. It is great and convenient. However, for local development, for troubleshooting data "as in production", or for performance testing on data volumes close to production, I would like to have at hand an image with a database "like in production"!
Accordingly, it would be desirable that each developer, starting to work on the project, could launch it with a single command, for example:
./gradlew dockerRun
and the application would come up at once with all the necessary associated containers. And, most importantly, it would already contain data covering most development and bug-fixing cases, the standard users, and most of the working services, so one could start working immediately without wasting time exporting and importing images or demo data!
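A sketch of what those associated containers might look like in a docker-compose.yml (the service names, image names and port here are assumptions for illustration, not from the article):

```yaml
version: '2'
services:
  db:
    # the prebuilt Postgres image with baked-in data, as described below
    image: registry.example.com/somedb-with-data:latest
    ports:
      - "5432:5432"
  app:
    image: example/our-app:latest
    depends_on:
      - db
    environment:
      DB_URL: jdbc:postgresql://db:5432/somedb
```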
And as a nice bonus, isn't it great to have a database of several gigabytes and the ability to roll it back to its original state (or that of any other commit) within a couple of seconds?
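Rolling back is then simply a matter of recreating the container from the immutable image; a command-line sketch (the container and image names are assumptions):

```shell
# Throw away the current container state and start again
# from the data baked into the image at build time.
docker rm -f somedb
docker run -d --name somedb registry.example.com/somedb-with-data:latest
# A few seconds later the database is back to its pristine state.
```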
Of course, we will talk about writing a Dockerfile for such an image with data, and about some of the pitfalls of this process. That is what we will focus on here. In our application we actively use Postgres, so the story and the examples will be about a container with it, but this applies only to the examples: the essence of the presentation applies to any other relational or fashionable NoSQL database.
First, let's define the problem in more detail. We are preparing an image with data that everyone who works with our application can use, from developers to CI.
I will not start with what a Dockerfile is; I hope you are already familiar with that. Those who want to get an idea can refer to the article, or to the official documentation.
It is worth noting that the official Docker image of Postgres already has several extension points:

- the POSTGRES_* environment variables;
- the /docker-entrypoint-initdb.d directory, where you can put sh scripts or sql files to be executed at the first start. This is very convenient if, say, you want to create additional users or databases, set permissions, and initialize extensions.

However, for our purposes this is not enough:

- the data would be imported by the entrypoint at every container start rather than once at build time;
- everyone launching the container would see private data that should not be seen;
- configuration parameters such as --max_prepared_transactions=110 are needed, but we cannot easily put them into the image and make them the standard;
- build-time performance tweaks are desirable while the data is being loaded (such as disabling fsync).
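For completeness, a sketch of what a standard /docker-entrypoint-initdb.d extension script might look like (the file name, user and database are assumptions); the official image's entrypoint executes such scripts on the first start of an empty cluster:

```shell
#!/bin/bash
# docker-entrypoint-initdb.d/10-extra-setup.sh (hypothetical example)
set -e
psql -v ON_ERROR_STOP=1 -U "$POSTGRES_USER" <<'SQL'
CREATE USER reporting WITH PASSWORD 'reporting';
CREATE DATABASE reporting OWNER reporting;
SQL
```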
I'll probably show the prototype of the file right away (only some insignificant parts are cut out to make it smaller, for example the inclusion of the pg_hint_plan extension, which on Debian has to be repackaged from an RPM, because it is not available as a Deb in the official repositories):
```dockerfile
FROM postgres:9.6
MAINTAINER Pavel Alexeev

# Do NOT use /var/lib/postgresql/data/ because it is declared as a volume in the base image
# and can't be undeclared, but we want to persist data in the image
ENV PGDATA /var/lib/pgsql/data/
ENV pgsql 'psql -U postgres -nxq -v ON_ERROR_STOP=on --dbname somedb'
ENV DB_DUMP_URL 'ftp://user:password@ftp.somehost.com/desired_db_backup/somedb_dump-2017-02-21-16_55_01.sql.gz'

COPY docker-entrypoint-initdb.d/* /docker-entrypoint-initdb.d/
COPY init.sql/* /init.sql/
# Later in RUN we hack the config to include conf.d parts.
COPY postgres.conf.d/* /etc/postgres/conf.d/

# Unfortunately Debian /bin/sh is the dash shell instead of bash (https://wiki.ubuntu.com/DashAsBinSh)
# and some handy options like pipefail are unavailable.
# Separate RUN so the next one will be in bash instead of dash. Change the /bin/sh symlink
# as it is hardcoded: https://github.com/docker/docker/issues/8100
RUN ln -sb /bin/bash /bin/sh

RUN set -euo pipefail \
    && echo '1) Install required packages' `# https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#apt-get` \
    && apt-get update \
    && apt-get install -y \
        curl \
        postgresql-plperl-9.6 \
    && echo '3) Run postgres DB internally for init cluster:' `# Example how to run an instance of a service: http://stackoverflow.com/questions/25920029/setting-up-mysql-and-importing-dump-within-dockerfile` \
    && bash -c '/docker-entrypoint.sh postgres --autovacuum=off &' \
    && sleep 10 \
    && echo '4.1) Configure postgres: use conf.d directory:' \
    && sed -i "s@#include_dir = 'conf.d'@include_dir = '/etc/postgres/conf.d/'@" "$PGDATA/postgresql.conf" \
    && echo '4.2) Configure postgres: do NOT chown and chmod the PGDATA directory on each start (speedup on start, especially on Windows):' \
    && sed -i 's@chmod 700 "$PGDATA"@#chmod 700 "$PGDATA"@g;s@chown -R postgres "$PGDATA"@#chown -R postgres "$PGDATA"@g' /docker-entrypoint.sh \
    && echo '4.3) Rerun postgres DB to work with the new configuration:' \
    && gosu postgres pg_ctl -D "$PGDATA" -m fast -w stop \
    && sleep 10 \
    && bash -c '/docker-entrypoint.sh postgres --autovacuum=off --max_wal_size=3GB &' \
    && sleep 10 \
    && echo '5) Populate DB data: restore DB backup:' \
    && time curl "$DB_DUMP_URL" \
        | gzip --decompress \
        | grep -Pv '^((DROP|CREATE|ALTER) DATABASE|\\connect)' \
        | $pgsql \
    && echo '6) Execute build-time sql scripts:' \
    && for f in /init.sql/*; do echo "Process [$f]"; $pgsql -f "$f"; rm -f "$f"; done \
    && echo '7) Update DB to current migrations state:' \
    && time java -jar target/db-updater-*.jar -f flyway.url=jdbc:postgresql://localhost:5432/somedb -f flyway.user=postgres -f flyway.password=postgres \
    && echo '8) Vacuum full and analyze (no reindex needed then):' \
    && time vacuumdb -U postgres --full --all --analyze --freeze \
    && echo '9) Stop postgres:' \
    && gosu postgres pg_ctl -D "$PGDATA" -m fast -w stop \
    && sleep 10 \
    && echo '10) Cleanup pg_xlog, required to not include it in the image!:' `# Command inspired by http://www.hivelogik.com/blog/?p=513` \
    && gosu postgres pg_resetxlog -o $( LANG=C pg_controldata $PGDATA | grep -oP '(?<=NextOID:\s{10})\d+' ) -x $( LANG=C pg_controldata $PGDATA | grep -oP '(?<=NextXID:\s{10}0[/:])\d+' ) -f $PGDATA \
    && echo '11 (pair to 1)) Apt clean:' \
    && apt-get autoremove -y \
        curl \
    && rm -rf /var/lib/apt/lists/*
```
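A note on the restore pipeline in step 5: the dump is streamed through grep, which drops database-level statements (and \connect) so that the data lands in the database the container has already created. The filter can be tried standalone; the sample dump lines below are assumptions, the grep expression is taken from the Dockerfile:

```shell
# Feed a few lines that might appear in a plain-format pg_dump through the filter.
printf '%s\n' \
  'DROP DATABASE somedb;' \
  'CREATE DATABASE somedb;' \
  '\connect somedb' \
  'CREATE TABLE users (id int);' \
  'INSERT INTO users VALUES (1);' \
| grep -Pv '^((DROP|CREATE|ALTER) DATABASE|\\connect)'
# Only the last two lines survive:
# CREATE TABLE users (id int);
# INSERT INTO users VALUES (1);
```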
As you can see, I tried to add comments directly to the file; they are perhaps even more than exhaustive, but let us nevertheless dwell on a couple of points in more detail.
- ENV PGDATA /var/lib/pgsql/data/ is the key point: because we want the data populated during the build to be included in the image, we must not put it in the standard location, which is declared as a volume in the base image.
- DB_DUMP_URL is defined simply for ease of later editing. If desired, it can be passed in from the outside during the build.
- We start Postgres right during the build process, bash -c '/docker-entrypoint.sh postgres --autovacuum=off &', in order to perform some simple configuration steps.
- With sed we mainly enable include_dir in the main postgresql.conf. We need this to keep such manipulations of the config to a minimum, otherwise they would be very difficult to maintain, while still providing unlimited extensibility of the configuration. Note that a little higher up we use COPY postgres.conf.d/* /etc/postgres/conf.d/ to put the pieces of configuration specific to our build in place.
- We comment out the chown and chmod instructions in the entrypoint: since the database is initialized at build time, the files will already have the correct owners and permissions in the image, and it was found empirically that on the Docker version for Windows this operation can for some reason take a very long time, up to tens of minutes.
- Important: first start Postgres, and only then try to configure it! Otherwise we get an error at startup saying that the directory for cluster initialization is not empty.
- We then restart Postgres so that it re-reads the configuration we have put in place. Strictly speaking, this step is not mandatory at all. However, the defaults include very conservative memory settings such as shared_buffers = 128MB, and work with any significant amount of data drags on for hours.
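For illustration, a fragment that could be dropped into postgres.conf.d/ to override those conservative defaults (the specific values below are assumptions; max_prepared_transactions=110 and fsync are the parameters mentioned earlier in the article):

```ini
# /etc/postgres/conf.d/01-build.conf — hypothetical fragment picked up via include_dir
shared_buffers = 1GB
work_mem = 64MB
max_prepared_transactions = 110
fsync = off   # acceptable only for a disposable build/test database
```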
- The for f in /init.sql/* construct applies all SQL scripts from this directory during image creation (as opposed to the standard extension scripts, which run at container start). This is where we do the necessary obfuscation of data, sampling, cleaning, adding of test users, and so on.
- We run Postgres with autovacuum turned off (--autovacuum=off) to speed up the import.
- At the end I use pg_resetxlog to reset the accumulated WAL and not include it in the image, and at startup I use --max_wal_size=3GB to increase the allowed WAL size so that it is not recycled yet again during the import.

All that remains is to tag the finished image and push it into a repository. Most often, of course, it will be a private repository, unless you are working on some kind of public data sample.
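The way step 10 derives the arguments for pg_resetxlog is worth a closer look: it scrapes the next OID and next XID out of pg_controldata output with fixed-width lookbehind regexes. A standalone sketch (the sample lines and values below are made up; real input comes from pg_controldata "$PGDATA", and the sample deliberately uses the ten-space padding the regex expects):

```shell
# Two lines in the style of pg_controldata output (values are assumptions).
sample="Latest checkpoint's NextXID:          0:1746
Latest checkpoint's NextOID:          24576"

# Extract the values the same way the Dockerfile does:
echo "$sample" | grep -oP '(?<=NextOID:\s{10})\d+'        # prints 24576
echo "$sample" | grep -oP '(?<=NextXID:\s{10}0[/:])\d+'   # prints 1746
```

Note that the lookbehind requires exactly ten whitespace characters of padding, so the expression is tied to the column alignment of a particular pg_controldata version.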
I will be very happy if this helps someone make their process of preparing test images with data even a little easier.
Source: https://habr.com/ru/post/328226/