
Theory and practice of migrating web systems to PostgreSQL

In recent months, the problem of migrating production systems to open-source data storage solutions has been capturing the minds of domestic developers. PostgreSQL is especially popular as a target platform. There are several reasons for this:

  1. The government's widely known import substitution policy;
  2. The promotion of PostgreSQL by enthusiasts and the growth of the Russian community through events such as PG Day and PGConf;
  3. The expanding functionality of PostgreSQL, which lets developers build flexible, "schema-less" applications without giving up the advantages of a mature DBMS: honest transactions, fault tolerance, scalability, and so on.


We were able to verify the effectiveness of PostgreSQL several years ago, when introducing this DBMS resolved a serious technological crisis in one of the company's large projects. We told this success story in detail at PG Day'14 Russia, held last year in St. Petersburg. Since then, we have had the opportunity to try the database on a wide range of problems.

Over four years we have carried out about fifteen migrations from various storage systems, on projects of varying criticality and load: from small services to large databases that generate up to a terabyte of transaction logs daily. Today, roughly 85-90% of the data storage and processing tasks in our company's projects are handled by PostgreSQL.
The goal of this article is to share our migration experience and to describe practices that will greatly simplify the migration process and reduce the potential labor costs. We will consider typical misconceptions and the inevitable realities of commercial development that you will have to face, and discuss some organizational nuances that are rarely thought about when planning a migration.

What is “migration”?


The general meaning of the term is clear to everyone. Migration is the process of moving from one data store to another. Quite often migration is perceived as a purely technological process (preparing the application to work with the new DBMS), and this is the most common mistake. Migration is a complex process that includes organizational and training tasks, planning, sensible allocation of the resources available to a manager, and risk assessment.

Before rushing to rewrite code, weigh the pros and cons and make sure you really need to migrate at all. People who make migration decisions often assess the need to change the database inadequately, guided by biased criteria. It frequently turns out that an acceptable result can be achieved on the existing database once it has been studied thoroughly and various options have been tried. If you have not done that yet, forget about the migration. Moving away from a database you have not explored is an unreasonably high risk. Unknown reefs will surface more than once or twice along the way, producing unaccounted-for dependencies, missed deadlines, and general discontent.

If you have decided to replace a database that works more or less decently simply because you read an article about an interesting new NoSQL product, please reconsider and do not create problems for your colleagues. There is nothing worse than an incompetent "technical lead" or a manager who makes serious decisions based on whims and personal interests.

Do not rush. The task of migration is complex, uninteresting, and labor-intensive, and it generates unreasonably high expectations for the resulting product. You will run into a mass of nuances, spend a great deal of time on sheer routine, of which our work already has plenty, and will inevitably blow through deadlines for one reason or another. The product obtained after the migration will keep turning up hidden problems for some time. Management with little technical savvy will decide that a bright future has arrived and begin inventing commercial tasks with redoubled energy.

In general, if you have the option of not doing the migration, do not do it. Try to solve the problem with the tools you already have. If you have no other choice, have exhausted all the lower-effort options, irresistible external forces are pushing you to migrate, and you have firmly decided to move forward, then read on.

How to start the migration?


Migration is the transfer of data to a new storage. It is logical to start with an assessment of the current storage's state. Study the schema and assess how tangled it is. How many tables does it hold, and which indexes, triggers, and foreign keys are defined on them? How much data is stored in those tables? Is there code living inside the storage (stored procedures and the like)? Which components and services of your system work with the database? What application code interacts with it?
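
For a MySQL source, most of these questions can be answered straight from the information_schema views. Below is a minimal inventory sketch, assuming the PyMySQL driver; all connection parameters are placeholders.

```python
# A minimal schema inventory for a MySQL source; connection
# parameters are placeholders, adapt the queries to your storage.
import pymysql

conn = pymysql.connect(host="old-db", user="auditor",
                       password="secret", database="appdb")

with conn.cursor() as cur:
    # Tables with approximate row counts and on-disk size.
    cur.execute("""
        SELECT table_name, table_rows,
               ROUND((data_length + index_length) / 1048576) AS size_mb
        FROM information_schema.tables
        WHERE table_schema = DATABASE()
        ORDER BY data_length + index_length DESC
    """)
    for name, rows, size_mb in cur.fetchall():
        print(f"{name}: ~{rows} rows, {size_mb} MB")

    # Code living inside the database: triggers and stored routines.
    cur.execute("SELECT trigger_name FROM information_schema.triggers "
                "WHERE trigger_schema = DATABASE()")
    print("triggers:", [t for (t,) in cur.fetchall()])
    cur.execute("SELECT routine_name, routine_type "
                "FROM information_schema.routines "
                "WHERE routine_schema = DATABASE()")
    print("routines:", cur.fetchall())

    # Foreign keys: the dependencies you will have to recreate.
    cur.execute("""
        SELECT table_name, column_name, referenced_table_name
        FROM information_schema.key_column_usage
        WHERE table_schema = DATABASE()
          AND referenced_table_name IS NOT NULL
    """)
    for fk in cur.fetchall():
        print("FK:", fk)
```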

Answers to these questions will allow you to understand several important things.

First of all, how much code will have to be rewritten, both inside the database and on the application side. It will not hurt to estimate for yourself which code is more complex and which less. Assuming that you, as a competent technical lead, know the abilities and capabilities of your team well, you can immediately see who can be entrusted with which layer of work and, accordingly, which part of the team must be involved in preparing for the migration.

Migration assumes that at some point the current storage will have to be stopped so that the data can be exported and then loaded into the new storage. One way or another, this implies downtime for all components of the system that work with the storage. Already at this stage it is useful to think about what this will mean in practice and what workarounds exist if shutting down the services is absolutely unacceptable.

If the unpleasant prospect of a migration without downtime looms on the horizon, it is worth considering right away whether the migration can be divided into several successive stages. Everything here depends heavily on the specific architecture of your particular project, so no universal answer can be given. I can only warn against inventing something that you and your colleagues will regret for the rest of your lives.

Suppose you want to migrate a database from MySQL to PostgreSQL and, to break the task into parts, you intend to make PostgreSQL a replication slave of the MySQL server. The trick looks feasible and seemingly simple: enable replication, parse the binary log with a home-grown script, convert the SQL statements into a PostgreSQL-compatible format, and push them into the target database. In practice this turns into a nightmare of hellish torment. Every SMS arriving on your phone will make you flinch nervously, expecting a "hello" from the broken script that failed to parse the next incoming statement.

It makes no sense to explain at length why such schemes are extremely dangerous. Anyone who has worked on a more or less large, commercially successful service understands that, with high probability, a migration split into two stages will be hastily wound down once the first half of the plan has been implemented, and the team will have to live with the "temporary" solution for another couple of years.

And, of course, evaluating the schema will give you an idea of how insanely the current data storage is organized and what problems the team will objectively face during preparation. At this stage that will not help much, except as an additional hint for estimating the timeline and a rough understanding of which nuances should be thought through in advance.

Armed with the above knowledge, you can walk into management's office and substantively discuss the complexity of the proposed migration, the number of people required, the approach to developing new features during the migration, the possibility of service downtime at the moment of the actual move, and similar aspects. Always pad the estimated dates. You will not only be reworking the code: the team has to be prepared and trained, and all sorts of pitfalls have to be dug up. Even reworking the code alone will take a third more time than you expect.

Schema and data processing


Before you even think about adapting the data schema, throw the accumulated garbage out of the database. Not once have I come across a database, however large or small, free of temporary tables, outdated information, and other obscure archaisms. There is bloat in absolutely every database. The more complex and older your project, the more likely it is that a third of the tables can be safely thrown out without disturbing the application logic in any way.

Do not be lazy: walk through the schema, search for references in the code, and eliminate everything unnecessary. Subject to the same purge all data that is not required for the service to operate. Every system has millions of rows and gigabytes of "all-time" statistics that are vital to the commercial department but in practice are dug up a couple of times a year, and terabytes of archive logs of the kind colleagues tell such fascinating stories about at conferences. All of this must be mercilessly moved out of the database. If anyone resists, argue that it shortens the migration in general and the data transfer in particular: less data means moving it to the new database and restarting the service faster. The math is simple.
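
The hunt for unused tables is easy to rough out in code. The sketch below, with hypothetical paths and connection details, flags tables that the application source tree never mentions; anything matched only through dynamic SQL will still need a manual check.

```python
# Flag tables never referenced in the application code: prime purge
# candidates. Paths and connection details are hypothetical.
import pathlib
import pymysql

conn = pymysql.connect(host="old-db", user="auditor",
                       password="secret", database="appdb")
with conn.cursor() as cur:
    cur.execute("SELECT table_name FROM information_schema.tables "
                "WHERE table_schema = DATABASE()")
    tables = [t for (t,) in cur.fetchall()]

# Glue the whole source tree into one haystack for a crude search.
code = ""
for path in pathlib.Path("/srv/app/src").rglob("*.py"):
    code += path.read_text(errors="ignore").lower()

for table in tables:
    if table.lower() not in code:
        print("not referenced anywhere:", table)
```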

Now it is time to think about adapting the data schema. Let me disappoint you right away: this process cannot be automated and, by and large, should not be. Do not waste time searching for a tool that will convert your schema automatically. Whatever comes out of it will not start on the first try, and you will spend a long time polishing the output of such a program. The data schema of a functioning product is too complex a phenomenon to be formalized cleanly. Any third-party utility will produce some approximation of "how it ought to be" and completely ignore most of the subtleties of your particular system.

Perhaps the only idea worse than searching for such a utility is writing one yourself. Do not waste the time! Why would you need a universal tool for a one-off, unique task? The saddest case in my practice was my predecessors' attempt to write exactly such a utility for one large project. A program that converted extremely complex SQL code with a pile of DDL constructs on the fly was developed for about a year and a half. No one ever saw it work adequately.

The way out of this situation is very simple, but deeply repugnant to any self-respecting programmer. You need to sit down and, sacrificing a day of your working time, manually rewrite the schema for the new storage, making wise and careful use of Copy & Replace. This is not a pleasant activity, but, as I have already mentioned, the migration process is not a particularly cheerful event to begin with. Nevertheless, the benefits of this approach are many. You will study the entire data schema thoroughly, find more useless things that can be thrown out, and work through all the nuances. Along the way you will probably spot some recurring problems and their possible solutions. Do not be lazy, take notes. They will come in handy.

With the schema sorted out, it is time to transfer the data. Here the whole story looks exactly the same. Forget about automation and universal solutions. Write a simple script on your knee that exports the data from the source storage and imports it into the target one. In our practice, the most effective approach turned out to be a banal dump of the tables in CSV format. Any more or less adequate database can produce such a dump once a handful of parameters are configured (the escape character, the delimiter, the line terminator, the NULL marker, and so on). The files come out relatively small and tidy, compress well with gzip for subsequent transfer over the network (again, with a banal rsync), and are consumed very quickly by PostgreSQL (hint: use COPY).
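
A minimal sketch of such a script, assuming a MySQL source, the PyMySQL and psycopg2 drivers, and an identically named table already created in PostgreSQL; everything else is a placeholder.

```python
# Dump one table from MySQL to CSV, then feed the file to PostgreSQL
# with COPY. NULLs are encoded as \N, the PostgreSQL default marker.
import csv
import pymysql
import psycopg2

TABLE = "clients"  # hypothetical table name

# Export: stream the rows out of MySQL into a CSV file.
src = pymysql.connect(host="old-db", user="dump", password="secret",
                      database="appdb")
with src.cursor() as cur, open(f"{TABLE}.csv", "w", newline="") as f:
    cur.execute(f"SELECT * FROM {TABLE}")
    writer = csv.writer(f)
    for row in cur:
        writer.writerow([r"\N" if v is None else v for v in row])

# Import: COPY swallows the file far faster than row-by-row INSERTs.
dst = psycopg2.connect(host="new-db", user="load", password="secret",
                       dbname="appdb")
with dst, dst.cursor() as cur, open(f"{TABLE}.csv") as f:
    cur.copy_expert(
        f"COPY {TABLE} FROM STDIN WITH (FORMAT csv, NULL '\\N')", f)
```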

Writing such a script is a matter of one evening. The special charm of this kind of tool is that you can easily program into it the handling of any specific situation. Feel free to hardcode. If someone once managed to save a copy-paste from Microsoft Word with special characters into some table, run the file through perl/sed or whatever else, fix it, and queue it for import. In general, this approach is the only one that lets you avoid surprises and guarantee a complete transfer of the entire database. In the same manner, prepare scripts that import auxiliary objects, for example the sequence values for the corresponding columns.
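
For example, the sequences behind serial columns still start at 1 after a COPY-based import. Here is a small sketch that bumps each one to the current maximum; the list of (table, column) pairs is hypothetical and would come from your schema notes.

```python
# Bump each sequence so that nextval() continues after the imported data.
import psycopg2

SERIAL_COLUMNS = [("clients", "id"), ("orders", "id")]  # hypothetical

conn = psycopg2.connect(host="new-db", user="load", password="secret",
                        dbname="appdb")
with conn, conn.cursor() as cur:
    for table, column in SERIAL_COLUMNS:
        # pg_get_serial_sequence finds the sequence owned by the column;
        # setval(..., false) makes the next value max + 1 and also
        # behaves correctly for empty tables.
        cur.execute(
            f"SELECT setval(pg_get_serial_sequence(%s, %s), "
            f"COALESCE(MAX({column}), 0) + 1, false) FROM {table}",
            (table, column))
```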

As soon as the tools are ready and a successful export-import has been achieved, organize an automated check of the whole setup. Treat it like a backup & recovery solution. Any self-respecting engineer knows that it is not enough to simply take backups regularly; they must also be regularly verified, because backups have a habit of breaking, and it is better not to let that slide. Do the same here. Once a week, run an automated (or manual, as you prefer) export-import pass with fresh data from the production database. The preparation for migration will go on for quite a while, and during that time the database will most likely acquire changes, or data, that you did not account for. To avoid dealing with such surprises in combat conditions during the real migration, identify the problems and solve them in good time.
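
The simplest automated check is a per-table row count comparison after each rehearsal run; a real check should also sample and compare actual rows. A sketch, with the table list and connections as placeholders:

```python
# Compare row counts between the source and the freshly loaded target.
import pymysql
import psycopg2

TABLES = ["clients", "orders", "payments"]  # hypothetical list

src = pymysql.connect(host="old-db", user="check", password="secret",
                      database="appdb")
dst = psycopg2.connect(host="staging-db", user="check",
                       password="secret", dbname="appdb")

with src.cursor() as mcur, dst.cursor() as pcur:
    for table in TABLES:
        mcur.execute(f"SELECT COUNT(*) FROM {table}")
        pcur.execute(f"SELECT COUNT(*) FROM {table}")
        m, p = mcur.fetchone()[0], pcur.fetchone()[0]
        print(f"{table}: source={m} target={p} "
              f"{'OK' if m == p else 'MISMATCH'}")
```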

Periodically re-running the export-import procedure will also help you estimate the time spent on the data transfer. At this point you can think about where and what can be optimized, for example making the tool multithreaded so that several tables are exported and imported simultaneously.
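
Parallelizing per table is a few lines around the same script. A sketch: transfer_table stands for the hypothetical dump-and-COPY routine shown above, and four workers is an arbitrary starting point to tune against what the source database and the network can sustain.

```python
# Run several per-table transfers at once.
from concurrent.futures import ThreadPoolExecutor, as_completed

TABLES = ["clients", "orders", "payments", "messages", "events"]

def transfer_table(table):
    ...  # dump to CSV, ship over the network, COPY into PostgreSQL
    return table

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(transfer_table, t) for t in TABLES]
    for done in as_completed(futures):
        print("transferred:", done.result())
```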

Reworking the code


Adapting the application code is the most unavoidable and dreary part of the whole undertaking. Before you begin, several important points need to be settled.

First, decide what to rewrite and in what order. The general rule is that the most complex and convoluted things are reworked first. Components with poorly formalized business logic that cannot be verified against a clearly defined set of input parameters (usually some kind of billing or analytics calculation) should be prepared immediately and kept running constantly on a test bench. Over time, assorted small flaws will surface, and you can fix them before anything is sent to production. Obviously, the more such incidents happen before the solution is rolled out into battle, the better.

Second, prepare a knowledge base. It is highly probable that your colleagues and subordinates do not yet have well-developed skills with the new database. So identify the best ways to solve individual problems, find the typical, recurring issues in the application code, and describe how to fix them. This will save time for everyone involved in the process.
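
A hypothetical excerpt from such a knowledge base: typical MySQL idioms paired with their PostgreSQL equivalents, kept as greppable strings so that the whole team fixes the same problem the same way.

```python
# Old MySQL-flavored queries and their PostgreSQL replacements.
TYPICAL_FIXES = [
    # Backtick quoting -> standard double quotes.
    ('SELECT `order`.`id` FROM `order`',
     'SELECT "order"."id" FROM "order"'),
    # IFNULL -> COALESCE.
    ("SELECT IFNULL(phone, '-') FROM clients",
     "SELECT COALESCE(phone, '-') FROM clients"),
    # LIMIT offset, count -> LIMIT count OFFSET offset.
    ("SELECT id FROM orders LIMIT 20, 10",
     "SELECT id FROM orders LIMIT 10 OFFSET 20"),
    # GROUP_CONCAT -> string_agg.
    ("SELECT GROUP_CONCAT(tag) FROM tags",
     "SELECT string_agg(tag, ',') FROM tags"),
]
```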

Third, forget about refactoring, optimization, code beautification, and the other small pleasures of a programmer. Your task is to transfer the application to the new platform as close to the original as possible. Do not rewrite business logic. Do not apply design patterns. Do not replace one library with another. Forbid all of this to your subordinate programmers as well. Any attempt at something of the sort will produce new bugs, slipped deadlines, and general anxiety. A piece of code may be rewritten and optimized in only one situation: when the ported section, for one reason or another, cannot cope with its task effectively.

Fourth, formalize and structure the rules for writing new code. It is quite likely that the migration was undertaken precisely because the current system could no longer be maintained adequately. If you do not want to repeat the mistakes of the past (and it is better not to repeat them; management will not approve a second migration), draw up common rules and make everyone follow them. Remember that a well-written project is one that can be maintained easily and systematically. Since such a large rework has been started anyway, make the effort to ensure that life after the migration is not the same torment as life before it.

Now you can open a text editor and start reworking the code. No one can tell you how difficult this will be or how long it will take: it is your code, and you know it best. Perhaps you will be lucky and there will not be three-story SQL queries in every module, each exploiting database-specific "features" of the language. Incidentally, I believe this is one of the few scenarios where using an ORM for CRUD queries objectively speeds up development and simplifies the programmer's life, rather than the other way around.
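
A sketch of why: with an ORM, CRUD code is dialect-agnostic, and the port shrinks to changing the connection URL. SQLAlchemy here is an illustration, not a recommendation of a specific library; the model and URLs are made up.

```python
# The same CRUD code runs against MySQL and PostgreSQL alike.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Client(Base):
    __tablename__ = "clients"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

# engine = create_engine("mysql+pymysql://app:pw@old-db/appdb")  # before
engine = create_engine("postgresql+psycopg2://app:pw@new-db/appdb")  # after

Session = sessionmaker(bind=engine)
with Session() as session:
    session.add(Client(name="test"))
    session.commit()
    print(session.query(Client).filter_by(name="test").count())
```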

About testing


If you expect that for the duration of the migration the business representatives will completely halt commercial development of the product and the technical department's related tasks, you are very naive and deeply mistaken. Tasks will keep appearing, and they will have to be done.

How to resolve this dilemma? Very simply. Set aside part of the team to support the existing system. Train them in the same design rules as the people working on the migration. Obviously, after the migration the entire team will be developing the new platform, so all engineers without exception must understand and accept the new realities of development. Introduce a new rule for working on tasks: code does not go into production until your fighters have prepared and tested a version for the new platform. Yes, this really means that for every task you will produce two versions of the code, the old and the new.

All of this is disgusting, tedious, and unpleasant, but there is no alternative. No one will ever give you time after the migration to adapt the new functionality. Not to mention that within a month the accumulated stack of changes will get out of control and you will never untangle it afterwards. The result will be an unstable product that has lost part of its functionality. Bosses tend to take offense at that.

If you are suddenly lucky and the business representatives have agreed to suspend commercial development for the duration of the migration, rejoice. But the probability of that is vanishingly small.

The testing process, like everything else here, holds no secrets or silver bullets. You need a competent tester who will go through everything that has been reworked and thoroughly check that the behavior of the new logic matches the old. Automated testing is a secondary aspect here. Personally, I advise against spending time writing "autotests"; that time is better spent on thorough manual testing. If you already have a developed infrastructure and the tests are written, by all means use them. Otherwise, do not bother.

Test a lot and often, and make repeated passes over what has already been tested. New flaws will keep surfacing. Do not begrudge the time and effort. The fewer problems emerge later in production, the more successful the result of the work will be, which will earn emphatic approval from management and help instill discipline in your team.

About training


As noted above, it is extremely important to set the right vector of movement toward a brighter future for your programmers. Well-formed development rules ensure a systematic approach to working on tasks and make a very important contribution to the professional growth of the entire technical department.

Of course, development rules and other "guidelines" alone are not enough. For colleagues to appreciate your efforts, the team must be trained. My years of practice have shown that training a programmer is a purely individual process, regardless of the task at hand, be it a migration or the standard daily development routine. Do you want a competent specialist? Be so kind as to spend the time.

What does this require? Immerse yourself in the development head-first and follow who is doing what and how. Organize team code review. Do not be afraid to engage in "micromanagement": explain clearly what you do not like and for what reason. Always argue your position. Get into the details and insist on the result you want to see. If one of your programmers does not understand some point, do not be too lazy to write up or explain why things work this way and not otherwise.

This is an extremely thankless job, and people will resent you for it. Those who are chronically unable to adapt to the common ideology will take lasting offense and leave the project. The payoff appears only after some time, when all the programmers have grown used to working systematically by the same rules and have seen in practice how easy and pleasant it is to develop code and maintain a system in which everything is written in a single style, cleanly and clearly. And that holds even for a team of ten people or more.

Launching the project after the migration is a kind of new life for the product. As a colleague of mine recently put it, no one has ever managed to catch the moment when a new project flips from the state "this time we will do everything right" to the state "everything was done wrong from the start". Still, everyone involved will be determined to do better this time. Seize the moment and direct your colleagues' energy in the right direction.

Zero hour


The migration procedure itself is a task no less complex and responsible than everything that preceded it. It, too, requires preparation.

Sit down with your colleagues and write out the migration sequence in detail: who stops and starts what, who rolls out which code where, and so on. Write everything down in great detail. In the end, every employee should have a step-by-step plan that lets them act without agonizing over what comes after what and whether they have missed something.

Having prepared such a scenario, be sure to conduct several test runs. They will confirm once and for all that everything works as it should, smooth out the rough spots, and eliminate gaps in the plan.

Be sure to have a plan B. There is a chance that somewhere something will go wrong. Serious unforeseen circumstances during the migration are a guaranteed reason to roll back and prepare for a second attempt. If that happens, you must be ready for it: the entire sequence of rolling back to the starting point must also be written down and rehearsed. Practice the rollback procedure with the team. Discuss the possibility of a rollback with the commercial side as well; they should be mentally prepared for it.

No less mandatory is simulating the behavior of the system after the actual switchover. This was one of the key mistakes of our first migration. We had checked billing performance under normal conditions on the new database, but failed to account for the fact that switching billing back on after the maintenance downtime would produce a large backlog of processing tasks. In practice, during the migration it turned out that billing could not cope. We had to cancel the migration, rework the billing algorithms around PostgreSQL's specifics, and try again. Do not repeat our mistake: simulate bringing the service up after downtime.

Share the time estimates from the test runs with the non-technical staff (managers, support, and so on). They are the ones who will hold the line and fend off nervous partners. Explain what will happen and how, and be prepared to promptly notify them of unforeseen circumstances and changes of plan.

Plan the migration for Monday morning, when everyone is rested and well-slept. Forbid abusing alcohol and other substances the night before, or grinding all night in an MMORPG. Do not allow software and OS updates on workstations, or Linux kernel rebuilds. Make sure there will be no problems with Internet access, and arrange a backup option in case the main connection fails. Think through any other circumstances that might get in your way.

A bright future?


How do you know the migration was successful? For certain, only after some time has passed. But it does not hurt to pick a few more or less objective criteria.

First of all, the migration procedure went according to the planned schedule, give or take thirty minutes for assorted trifles and the human factor. Any deviation from the original plan means that something went wrong, and it may well make itself felt later.

You did not need any terrible crutches to prop up key components of the business logic. A new life does not begin that way. Think hard about whether you are ready to take the risk of releasing such a thing into final and irrevocable production.

No wild bugs surfaced in the process whose elimination would require another month of standstill for business tasks. No one will like that, including you. Such problems must be identified before the migration: roll back, fix, test, and try again.

Any surprise that you do not know how to handle is a reason to roll back. Remember, the most important thing is not to lose what was already working before your intervention. No schedule slip compares with such risks. If you are not sure, do not make hasty decisions.

If the migration did take place, I hasten to congratulate you: there will now be even more problems in your life. The product will have to be combed and optimized for the capabilities of the new database for a long time to come, and on top of that the stream of tasks will only grow. The high bar of quality now set in the technical department will have to be maintained constantly, so as not to let things slide toward yet another migration.

So do not expect the migration effort to be vindicated the moment it is completed. The effect appears later, when a stably working service stops disturbing you every week with nighttime crashes, and a trained, well-drilled department solves problems amicably, smoothly, and efficiently.

For us, migration to PostgreSQL is not only a means of solving major technological problems. It is an excellent tool for building an effective development process, tuning the relevant workflows in a team, and systematically growing a common base of competences within the company.

One of the reasons the community chooses this DBMS for its tasks with increasing frequency is the realization that PostgreSQL has long been used to serve the needs of large businesses. Even representatives of the industrial and banking sectors (areas where commercial DBMSs traditionally call the shots) are not afraid to apply the free and open technology. A striking example is Caixa, the largest Brazilian bank: all card processing in the country's ATMs is built on PostgreSQL. One of the system's architects described this extremely interesting business case at the Canadian PGCon in 2010. We have decided to invite our colleague from distant Brazil to share the secrets of that success at the upcoming PG Day in St. Petersburg.

PostgreSQL has established itself as a product capable of solving serious problems. This is what motivates us to adopt the technology in our own projects and to contribute to its popularization in the domestic development market.

Source: https://habr.com/ru/post/254667/

