
Versioning and Deploying PostgreSQL Code

Hundreds of databases and thousands of stored procedures. How do you write, test, and deploy all of this to many servers, with the ability to roll back quickly, under 24x7 high load, and not die? Interested? Welcome under the cut!


As you already know, all Avito ads live in PostgreSQL. This database gives us much more than data storage: we also build our own API on top of it, providing access to the data through stored procedures, triggers, and functions. When working with such a structure, changes are needed all the time. In the simplest case, when a developer deals with one client and one database, the update process is quite simple: make the changes, write a migration script, and you are done. But this situation is rare; most often the clients and databases of a product number in the hundreds. So a code versioning mechanism is essential for a normal database life cycle.

In this article we will talk about two ways of versioning code inside databases that we implemented at Avito. You may find this information useful. First, a little context.
Avito is:


Our initial tasks:


The first versioning option


The first option we came up with was versioning via a dictionary.

Details



The dictionary here is the stored_procedures table:

| Column name | Description | Example |
|---|---|---|
| branch | Name of the branch in which the stored procedure was developed | location-id-fix |
| fn_name | Name of the stored procedure, including the schema | core.location_save |
| fn_md5 | Hash sum (md5) of the stored procedure code | 0539f31fee4efd845a24c9878cd721b2 |
| ver_id | Version number; incremented by 1 when the hash changes (default: 0) | 2 |
| create_txtime | Creation time | 2016-12-11 10:16:10 |
| update_txtime | Time of the last version update (when ver_id was incremented) | 2016-12-11 11:23:14 |
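For illustration, here is a minimal sketch of how such a dictionary table could be defined. The column names follow the description above; the types, defaults, and primary key are assumptions, not the exact production schema.

    -- Hypothetical definition of the stored_procedures dictionary table.
    CREATE TABLE stored_procedures (
        branch        text      NOT NULL,               -- branch in which the procedure was developed
        fn_name       text      NOT NULL,               -- procedure name including the schema
        fn_md5        text      NOT NULL,               -- md5 of the procedure source code
        ver_id        integer   NOT NULL DEFAULT 0,     -- incremented by 1 when the hash changes
        create_txtime timestamp NOT NULL DEFAULT now(), -- creation time
        update_txtime timestamp NOT NULL DEFAULT now(), -- time of the last ver_id update
        PRIMARY KEY (branch, fn_name)
    );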




The build assembles a PHP dictionary with the current versions of the stored procedures for the branch. An entry in it looks like this:

 1 => array ( 'verId' => 2, 'hash' => '0539f31fee4efd845a24c9878cd721b2', 'fnFullName' => 'core.location_save@master' ) 

In the application code, stored procedures are called through a %ver% placeholder, which is replaced with the version from this dictionary:

 $this->db->exec( "select core.location_save%ver%(...)" ); 


Stored procedures are deployed in the first steps of the project build, before the dictionaries are assembled.

For each stored procedure file in the project:


  1. The hash sum of the file contents is calculated, and then the stored_procedures table is searched for the minimum version of the stored procedure with this new hash sum (see the sketch after this list).
  2. If nothing was found (such a procedure has not been deployed in any branch before), then the version for the new procedure is incremented and the rollout of this procedure to the database is allowed.
  3. If this stored procedure with the new hash sum has already been used in other branches, then the current branch will also use that procedure with the minimum version, without a new deployment to the database.
  4. If this stored procedure was used earlier in this branch, and the new hash sum differs from the hash sum recorded in the stored_procedures table for the current entry, and this stored procedure with the new hash sum ...
    - has not been used in other branches and the minimum version is not known, then the version for the new procedure is incremented and its rollout to the database is allowed;
    - has been used in other branches and the minimum version is known, then the current branch will use the existing stored procedure with the minimum version, without a new code deployment to the database.
  5. On the initial registration in the stored_procedures table, or on a ver_id update, the stored procedure creation code is executed on the target database, with the version substituted into the SQL header of the stored procedure beforehand.

     CREATE OR REPLACE FUNCTION core.location_save(...) 

    is turned by the PHP code into

     CREATE OR REPLACE FUNCTION core.location_save_ver2(...) 

    and run on the database.

    The core.location_save.sql file itself remains unchanged.
  6. Next, the dictionary is assembled, which at this stage contains the current versions of the stored procedures for this branch.
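To make step 1 and the version registration more concrete, here is a minimal SQL sketch, assuming the stored_procedures table sketched above. The article does not show the exact queries used by the build script, so the details here are illustrative:

    -- Step 1 (sketch): find the minimum existing version of the procedure with the
    -- new hash, across all branches. NULL means this hash has never been deployed.
    SELECT min(ver_id) AS min_ver
      FROM stored_procedures
     WHERE fn_name = 'core.location_save'
       AND fn_md5  = '0539f31fee4efd845a24c9878cd721b2';

    -- Steps 2/5 (sketch): register a new version for the current branch when the hash is new.
    INSERT INTO stored_procedures (branch, fn_name, fn_md5, ver_id)
    SELECT 'location-id-fix', 'core.location_save', '0539f31fee4efd845a24c9878cd721b2',
           coalesce(max(ver_id), 0) + 1
      FROM stored_procedures
     WHERE fn_name = 'core.location_save';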

Advantages of this code versioning method:



Disadvantages:



The second versioning option


The next option, as you may have guessed, grew out of the drawbacks of the previous one. Let's call it versioning by creating a unique schema and user for each new build of the project.


Details


Information on all builds is stored in the build_history table in the database on the main server.

| Column name | Description | Example |
|---|---|---|
| build_branch | Name of the branch being built | deploy_search_path |
| build_tag | Name of the future archive with the project | Deploy_1501247988 |
| build_time | Project build time | 2017-07-28 13:19:48 |
| schema_name | Schema assigned to the build | z_build_1 |
| schema_user | Database user assigned to the build | user_1 |
| deploy_time | Time of switching to the new project code | 2017-07-28 14:05:22 |
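As with the first approach, a minimal sketch of how this table could be defined; the column names come from the description above, while the types and constraints are assumptions:

    -- Hypothetical definition of the build_history table on the main server.
    CREATE TABLE build_history (
        build_branch text      NOT NULL,               -- branch being built
        build_tag    text      NOT NULL,               -- name of the future archive with the project
        build_time   timestamp NOT NULL DEFAULT now(), -- project build time
        schema_name  text      NOT NULL,               -- schema assigned to the build, e.g. z_build_1
        schema_user  text      NOT NULL,               -- database user assigned to the build, e.g. user_1
        deploy_time  timestamp                         -- set when the symlink is switched to this build
    );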




The production deployment process during the project build:


  1. When the build starts, information about the new build is registered in the build_history table, and a unique schema and a user for connecting to the database servers are assigned to it.
  2. The user is written into the config, which is shipped together with the project code.
  3. A connection is made to the database servers under a special deployment user.
  4. On the databases, the schema assigned to the build is created (or recreated, if it already exists) together with the stored procedures (see the sketch after this list).
  5. After the project code has been laid out on all application servers, but before the symlink is switched to the new project code, one of these servers contacts the main server, on which:
    - the time of switching to the new project code is set in the build_history table;
    - the selected user is added to the production group, so that we know which build is live and do not accidentally overwrite the schema with stored procedures if the project is rebuilt again without a subsequent symlink switch;
    - on all servers where the schema was created, a new search_path of the form
    search_path = public, <assigned schema> is set for:
    - the selected user user_N;
    - developers and the DBA team;
    - the user for various cron jobs, etc.
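A minimal sketch of the SQL behind steps 4 and 5, assuming the schema z_build_1 and the user user_1 from the example above; the real deploy tooling runs the equivalent commands on every server and for every relevant role:

    -- Step 4 (sketch): (re)create the schema assigned to the build and deploy the
    -- project's stored procedures into it.
    DROP SCHEMA IF EXISTS z_build_1 CASCADE;
    CREATE SCHEMA z_build_1;
    -- ... CREATE FUNCTION z_build_1.location_save(...) for every procedure in the project

    -- Step 5 (sketch): point the relevant roles at the new schema.
    ALTER ROLE user_1 SET search_path = public, z_build_1;
    -- the same is done for developer/DBA roles and the cron user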

An important note about setting up pools in pgbouncer


If you are using pgbouncer, then in addition to pool_size you should use the max_db_connections option to limit the pool size. Without it, each user of the pool gets its own pool_size. This behavior is not documented, but max_db_connections works like this: it limits the number of active transactions for all users of the pool combined.


Pgbouncer pool example:


 my_database = host=localhost pool_size=5 max_db_connections=5 

In conclusion, I would like to note that both versioning approaches presented here have proven themselves well under 24x7 high load and are used by us in a hybrid mode, but lately we have been giving preference to the second method, based on search_path.


Thank you for your attention!

Source: https://habr.com/ru/post/342946/

