📜 ⬆️ ⬇️

Best Approaches to Database Versioning

Translation of the article “Database versioning best practices” published on the site enterprisecraftsmanship.com.

Tracking your application database changes is not an easy task. As a rule, database schemas do not match in different environments, data in one database may not have some important data pieces. Such circumstances can be unpleasant, especially if they arise in production.

The situation becomes even worse if you are a software distribution developer. In this case, each of your clients has their own copy of the database, the structure of which may differ from others. In such projects, tracking client database changes can be a nightmare.
')
Let's look at the best approaches to database versioning.

Database Versioning: Problem


When you are working on a project alone, which has not yet been released in production, there are no such problems as a version control problem for the database. You change the data scheme the way you want it and it always works.

Problems arise when your program starts working in production or a new team member joins the work on a database related to a part of your project. As soon as you have more than one instance of the database, desynchronization begins. Even with one instance, it takes a significant amount of time to synchronize changes when more than one developer works with it.

Looks familiar? I bet you have had such situations more than once. I, of course, have been.

Best Approaches to Database Versioning


Fortunately, we are not alone. There are many materials written on this topic. As well as software that allows you to solve this problem. I recommend this book if you want to study the subject further. This is a comprehensive guide on how to develop a database in conjunction with the code that uses it.

Ok, so what are these best practices for versioning databases?

Best practice # 1: we must treat the application database and reference data as normal code. This means that we must store both the schema and reference data in the version control system.

Please note that this rule includes not only the database schema, but also reference data. Reference data is data that is required to run the application. For example, if you have a dictionary of all possible types of clients on which your application is built, you should store it in a version control system.

Best practice # 2: we must store all changes to the database schema and reference data explicitly. This means that for each modification we need to create a separate SQL script with changes. If the modification affects both the schema and reference data, they should be reflected in one scenario.

image

Following this rule is an important part of building a successful database version control system. Many projects have database schemas in their version control system, but often this is just a snapshot of the latest database version, that's all. All changes in it are tracked by the version control system itself, they are not stored explicitly. In addition, all changes in reference data are often not tracked.

Tools such as Visual Studio emphasize and encourage programmers to use automatically generated scripts to update the database project data schema. This may work well in small projects, but in large projects, tracking changes in the database using automatically generated scripts becomes a burden. We will talk about database project in Visual Studio and other available tools in the next post .

Best practice # 3: Each SQL script file must be unchanged after deploying to production or staging environments.

The point of saving changes in separate files is that we can monitor (track) each of them. When we modify existing SQL scripts, we lose all the advantages of working with the best versioning practices of the databases provided to us. Keep script files unchanged after they are deployed. If you need to roll back the changes that have already been made - create a separate script for this.

Best practice # 4: All changes to the schema and reference data in databases should be applied through scripts. None of these can be applied manually.

If we change the database to bypass our scripts, the whole idea of ​​versioning the database becomes useless, so we need to make sure that the changes are made only with the help of the SQL scripts we create.

Best practice # 5: Each developer in the team should have their own copy of the database.

Often, teams start with a single database in the development environment. This works well at the beginning, but when the database becomes large enough, modifying it at the same time becomes harder and harder until it stops working at all.

Often, programmers make incompatible changes, so it is a good idea for each programmer to have a separate copy of the database in order to avoid such collisions. If developers make changes to some part of the database schema at the same time, then such conflicts can be resolved using version control systems, such as conflicts in C # / Java, etc.

In addition, if you have several branches of your code base, you can also create a separate database instance for each of them, depending on what database differences exist in these branches.

Best practice # 6: Database versions should be stored in the database itself. I usually create a separate table called “Settings” and store the version there. Do not use complex notation like “xyz” for version numbers, just use an integer.

What are the benefits of this approach?


So what advantages give us the best approaches to versioning databases?

The first and most important advantage is that, using this approach, we no longer have problems with the mismatch of the database schema. Automatic updating to the latest version solves them completely, of course, if we fully adhere to the rules described above.

This is especially useful when you do not have a single production database, but each client has its own copy of the database. Managing a version of the database in such conditions can become hell if you do not use the right version control techniques.

Another advantage of using best practices is the strong coherence of changes in the database. This means that each known modification in the scheme and reference data is reflected in one place, and not scattered throughout the application.

SQL upgrade scripts are endowed with strong connectivity, in the sense that they contain all the necessary changes for the database, so it is easy to understand that the changes were made in the database in order to unlock specific functionality. Keeping the schema and data changes related to each other in one file also helps a lot.

The approach described in this post is applicable, even if you have not followed it from the very beginning. To use this in practice, you just need to create the initial database schema script that you currently have in production and can gradually begin to change it from now on. The current version should be version # 1, with which you can move on using the approaches we discussed above.

Source: https://habr.com/ru/post/265769/


All Articles