Three laws of configydynamics

In the article under the cat we will talk about how to deal with entropy in the configuration files.

The birth of configuration files

A long time ago, one developer wrote a simple web application for storing data on the salaries of company employees. He used two bases: a worker with real employees and salaries, and a test one with invented data.

One day, late at night, he transferred a new function to the workbase and forgot to change the hard link to the test base:

mysql_connect("db-staging.example.com", "admin", "admin");

The next morning, the boss logged in and found that instead of employees, there were mysteriously appeared Disney cartoon characters.

Boss is not pleased.

Firmly intending never to repeat this error again, the developer decided to replace the hard-coded host name with a variable.

This is how the first configuration file was born.

 ;    SVN [db] host = db1.example.com dbname = payrolls user = admin pass = s3cur3

Several years have passed, and the database connection settings are still the first thing that people try to put in the configuration files. But now we also have API keys, third-party system tokens, and other all sorts of necessary information, which has an unpleasant peculiarity that changes periodically.

Previously, in our buildo, web applications used 2 configuration files.

The Scala server APIs read configuration data from the application.conf file. It looked like this:

 app { db = { url = "jdbc:postgresql://localhost:5432/app" user = "postgres" password = "" driver = org.h2.Driver keepAliveConnection = true queriesTimeoutSeconds = 1 connectionTimeout = 1000 numThreads = 1 } interface = "0.0.0.0" port = 8082 allowedHostnames = ["localhost"] allowedHeaders = ["Content-Type", "Authorization", "Cache-Control", "Pragma"] #Timeout for reading routine time from file routineTimeDataTimeout = 1 localBackupPath = "/Users/fra/buildo/app/backup" serviceEndpoint = "http://localhost:8083" maximumItemsNumber = 100000000 #Wait 5 seconds before killing all connections waitBeforeKillingConnectionsMillis = 5000 #Size of the buffer used to read and write files streamBufferSize = 4096 }

The JS clients used the JSON file config.json :

 { "NODE_ENV": "development", "hostname": "localhost", "port": 9090, "apiEndpoint": "https://api-buildo-dev.example.com/v2", "gMapsAPIKey": "abcdefghijklmnopqrstuvwxyz", "title": "Yolo", "username": "test-temp@buildo.io", "password": "test", "debug": "state*,react-avenger*" }

Both files were added to .gitignore and never got into Git.

Wait, you need to use environment variables!

You may be familiar with the Twelve-Factor App manifest, which recommends storing configuration information in environment variables rather than configuration files. However, despite the fact that it is a convenient, language-independent way of storing configuration information, it does not solve the main problem. If an application needs several configuration parameters, you end up with a file that looks something like this:

 export MY_APP_VAR1="foo" export MY_APP_VAR2="bar"

Congratulations, you have just created a universal configuration file! You also discovered the first law of configydynamics:

The values of configuration parameters cannot be created or destroyed, they only go from one type to another.

https://commons.wikimedia.org/wiki/File:Carnot_heat_engine_2.svg

To share is to take care.

When I first start working on a project and clone the repository for the first time, the project often refuses to run without a configuration file.

 $ npm install && npm start [...] Error: Cannot find module './config.json'

Usually I ask a question on Slack, and another developer throws me a working version of the config in a personal chat (this is of course silly, but it’s known for certain that we are not the only ones who do this).

Having once again looked at the two previous examples, it can be noted that they are rather long. This is due to the second law of configydynamics:

The total length of the configuration file can only increase with time.

https://commons.wikimedia.org/wiki/File:PSM_V76_D246_Demonstration_of_the_second_law_of_thermodynamics.png

In fact, I missed several values from real configuration files that I sent over Slack several months ago. But is it always possible to be sure that these are the correct settings, and not some kind of test config?

The only way to deal with the second law is to take some energy (the developer) and look for configuration parameters that are rarely changing. You may not want to hard-drive them into the code, but no one will forbid you to pour these values into the repository in any other form.

For Scala applications, we use Lightbend Config. It allows you to define reference.conf with the default configuration settings that you can safely put in the repository.

Not so long ago, we began to pay more attention to what is happening in the reference.conf file. We want to be sure that this is not just a framework, but a full configuration file, which has all the necessary values for running the application.

If you want to overwrite these values, you can either set local environment variables or create an application.conf file that will not fall into commits, as it is added to .gitignore .

Here is the beginning of our reference.conf , designed in a new style:

 # This is the reference config file that contains all the default settings. # Make your edits/overrides in your application.conf. app { interface = "0.0.0.0" interface = ${?SERVICE_INTERFACE} port = 8080 port = ${?SERVICE_PORT} ... }

A similar situation develops with the client part, where we now always create a development.json file (falls into a commit), which contains default values that can be overridden in the optional local.json file (does not fall into a commit). We also create a production.json file that contains the settings for production. In this case, we do not use any open-source library, but wrote our own simple implementation .

It allows us to convert, for example, this old build CI script:

 echo '{ "NODE_ENV": "production", "port": 9090, "apiEndpoint": "/api", "uglify": true, "gzip": false, "title": "Awesome App" }' > config.json npm run build

to a new view:

 NODE_ENV=production npm run build

The legend of many environments

You should strive to ensure that you have one default configuration loaded into the repository, which is enough to run the application in the local developer environment.

Hence the third law of configydynamics is formulated:

The length of the ideal configuration file in the developer’s environment is zero.

https://commons.wikimedia.org/wiki/File:Can_T%3D0_be_reached.jpg

But what about the other environments? You may want to deploy the application in production. It is also quite likely that you have a staging server with minor differences (for example, more detailed logging).

First you need to reduce the number of optional differences. If identical settings are suitable for environments, they most likely need to be combined.

Next, it is necessary for each environment to create files with their specific settings, overwriting the default settings. Store them in Git in the application's repository or in a separate infrastructure repository. Your developers should be able to quickly find configurations for various environments and, if necessary, apply them in their development environments.

Finally, ensure that your artifacts, which are versioned in Git, are automatically deployed to servers. In every way resist the temptation to log into the server via SSH and fix the config manually. Use Ansible , Chef, or another configuration management tool; or take a Packer , collect new AMIs and deploy them with Terraform . Use the most convenient tool for yourself, but always keep your files in a synchronized state.

Docker power

We use Docker for packaging applications, and this simplifies working with configuration files by reducing the differences between environments.

The following Docker Compose file will work fine on both the MacBook and the production server. The name of the API host will always be api , the host with the base is db , and Docker will take care of associating these names with the correct containers. Now you do not need to register different host names for different environments!

 services: web: image: quay.io/buildo/app-frontend ports: - "80:5000" links: - api api: image: quay.io/buildo/app-backend links: - db db: image: postgres

To collect environment-specific settings into a single file, we use the functionality of multiple Docker compose files . Looking at testing.yml , you can immediately notice that the test environment uses non-standard HTTP ports, turns on the development token and loads a different database configuration.

 services: web: ports: - 8008:5000 api: environment: - "USE_DEVELOPMENT_TOKEN=true" ports: - 8005:8080 db: volumes: - ./config/postgres/staging.conf:/usr/share/postgresql/postgresql.conf.sample

Thus, our Docker images are exactly the same for all environments, and we use Compose files to set environment-specific settings using environment variables or configuration files.

In most cases, it is better to use environment variables, since the values of configuration parameters for different (micro) services can easily be included in a single Compose file, which is then stored in Git. As shown in the example above, if you think that the configuration file is better in some cases, it can be connected using a Docker volume. But do not forget that all configuration files that are mentioned in the Compose file must also be saved in Git.

How to keep secrets

Sometimes configuration file options are too sensitive to be stored in Git even if it is a private repository. These can be, for example, keys for AWS, production API tokens, etc. Diogo Monica recently stated that this data should also not be stored in environment variables.

In buildo, we most often use git-crypt to encrypt confidential information, so this information can be stored in Git, but it cannot be accessed without a PGP key from the white list.

More sophisticated solutions like Vault or Docker secrets have certain advantages. However, these tools are still in our development, and perhaps we will write about them in future articles ...

Conclusion

Please remember the consequences of the three laws of configydynamics:

moving configuration parameters to environment variables does not solve the problem;
it is necessary to regularly search for optional settings;
It is important to ensure that the application runs without configuration files.

Knowledge of these laws reduces the configurational entropy and ultimately saves a significant amount of time.

The following statement is sometimes called the zero law of configydynamics: