
Business logic in a database using SchemaKeeper

The purpose of this article is to show, using the schema-keeper library as an example, tools that can significantly simplify the development of databases in PHP projects that use the PostgreSQL DBMS.


The information in this article will be useful primarily for developers who want to make the most of PostgreSQL's capabilities but face the problems of maintaining business logic in the database.


The article will not discuss the advantages or disadvantages of storing business logic in a database. It is assumed that the reader has already made that choice.


The following questions will be considered:


  1. How to store the database structure in a version control system (hereinafter referred to as VCS)
  2. How to track changes in the database structure after a dump has been saved
  3. How to transfer database structure changes to other environments without conflicts and giant migration files
  4. How to organize parallel work on a project by several developers
  5. How to safely deploy a large number of database structure changes to a production environment

Note: SchemaKeeper is designed to work with stored procedures written in the PL/pgSQL language. It has not been tested with other languages, so with them it may be less effective or may not work at all.

How to store a dump of the database structure in VCS


The schema-keeper library provides a saveDump function that saves the structure of all database objects as separate text files. The output is a directory containing the database structure, split into grouped files that are easy to add to VCS.
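In code, this might look as follows. This is a minimal sketch based on the library's README: the host, credentials, and dump path are placeholders, and the exact constructor signatures may differ between library versions.

```php
<?php
// Minimal sketch of saving a dump with schema-keeper.
// Connection details and the dump path are placeholders.
$conn = new PDO('pgsql:dbname=mydb;host=127.0.0.1', 'user', 'pass');
$params = new \SchemaKeeper\Provider\PostgreSQL\PSQLParameters('127.0.0.1', 5432, 'mydb', 'user', 'pass');
$keeper = new \SchemaKeeper\Keeper($conn, $params);

// Saves the structure of all database objects as separate files
// under the given directory, ready to be committed to VCS.
$keeper->saveDump('/path/to/dump');
```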


Consider how database objects are converted to files, using a few examples:


Object type        Schema    Name                Relative file path
Table              public    accounts            ./public/tables/accounts.txt
Stored procedure   public    auth(hash bigint)   ./public/functions/auth(int8).sql
View               booking   tariffs             ./booking/views/tariffs.txt

The contents of each file is a textual representation of the structure of a specific database object. For stored procedures, for example, the file contains the complete definition of the procedure, starting with the CREATE OR REPLACE FUNCTION block.
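For example, the file ./public/functions/auth(int8).sql from the table above might look like this. The function body and column names here are invented purely for illustration; only the overall shape, a complete definition starting with CREATE OR REPLACE FUNCTION, reflects the article's description.

```sql
-- Illustrative contents of ./public/functions/auth(int8).sql
CREATE OR REPLACE FUNCTION public.auth(hash bigint)
 RETURNS text
 LANGUAGE plpgsql
AS $function$
BEGIN
    -- hypothetical body: look up a login by its authentication hash
    RETURN (SELECT login FROM accounts WHERE auth_hash = hash);
END;
$function$
```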


As can be seen from the table above, the file path encodes the type, schema, and name of the object. This approach makes it easier to navigate the dump and to review changes to the database.


The .sql extension was chosen for files containing stored procedure source code so that the IDE automatically provides database tooling when such a file is opened.

How to track changes in the database structure after saving a dump


With a dump of the current database structure saved in VCS, we can check whether the database structure has changed since the dump was created. For detecting such changes, the schema-keeper library provides the verifyDump function, which returns information about the differences without side effects.
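A sketch of such a check follows; the connection details are placeholders, and the exact shape of the returned diff depends on the library version.

```php
<?php
// Hypothetical sketch: detect drift between the database and the saved dump.
$conn = new PDO('pgsql:dbname=mydb;host=127.0.0.1', 'user', 'pass');
$params = new \SchemaKeeper\Provider\PostgreSQL\PSQLParameters('127.0.0.1', 5432, 'mydb', 'user', 'pass');
$keeper = new \SchemaKeeper\Keeper($conn, $params);

// Compares the current database structure with the dump directory and
// returns the differences without modifying anything.
$diff = $keeper->verifyDump('/path/to/dump');
var_export($diff); // an empty diff means the database matches the dump
```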


An alternative way to check is to call saveDump again, specifying the same directory, and then check for changes in the VCS. Since all database objects are saved in separate files, the VCS will show only the changed objects. The main disadvantage of this method is the need to overwrite the files in order to see the changes.


How to transfer changes in the database structure to other environments without conflicts and giant migration files


Thanks to the deployDump function, the source code of stored procedures can be edited in exactly the same way as regular application source code. You can add or delete lines in the code of a stored procedure and immediately commit the changes to the version control system, or create or delete stored procedures by creating or deleting the corresponding files in the dump directory.


For example, to create a new stored procedure in the public schema, it is enough to create a new file with the .sql extension in the public/functions directory, put the source code of the stored procedure in it, including the CREATE OR REPLACE FUNCTION block, and then call deployDump . Changing and deleting stored procedures works the same way. Thus, the code lands in both the VCS and the database at the same time.


If there is an error in the source code of any stored procedure, deployDump will not be applied and will throw an exception. With deployDump , a mismatch between the stored procedures in the dump and those in the current database is not possible.


When creating a new stored procedure, there is no need to come up with the correct file name manually. It is enough for the file to have the .sql extension. The correct name can be taken from the return value of deployDump and used to rename the file.

deployDump allows you to change a function's parameters or return type without additional actions, whereas the classical approach would require first executing DROP FUNCTION and only then CREATE OR REPLACE FUNCTION .
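For instance, changing a function's return type under the classical approach requires two statements, because CREATE OR REPLACE FUNCTION alone refuses to change the return type of an existing function. The function name and body below are hypothetical:

```sql
-- Classical approach: the old function must be dropped first,
-- since CREATE OR REPLACE cannot change an existing return type.
DROP FUNCTION public.auth(bigint);

CREATE OR REPLACE FUNCTION public.auth(hash bigint)
 RETURNS json  -- return type changed from text to json
 LANGUAGE plpgsql
AS $function$
BEGIN
    -- illustrative body
    RETURN json_build_object('ok', true);
END;
$function$;
```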


Unfortunately, there are situations where deployDump is not able to apply changes automatically, for example, when a trigger function that is still used by at least one trigger is deleted. Such situations are resolved manually with the help of migration files.


If schema-keeper is responsible for transferring changes to stored procedures, then migration files must be used to transfer the remaining structural changes. For example, doctrine/migrations is a good migration library.


Migrations must be applied before deployDump . This makes it possible to apply all structural changes and resolve any problem situations first, so that the changes in stored procedures are then transferred without issues.


Work with migrations will be described in more detail in the following sections.


How to organize parallel work on a project by several developers


You need a full database initialization script that each developer runs on their own machine, bringing the structure of the local database in line with the dump stored in VCS. The easiest way is to split local database initialization into 3 steps:


  1. Import a file with the base structure, called, for example, base.sql
  2. Apply migrations
  3. Call deployDump

base.sql is the starting point on top of which migrations are applied and deployDump is executed, that is, base.sql + migrations + deployDump = the current database structure. Such a file can be generated with the pg_dump utility. base.sql is used exclusively when initializing a database from scratch.
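The three steps above can be sketched as a shell script. The database name, migration command, and file paths here are all assumptions, and the wrapper script deploy.php around schema-keeper's deployDump is hypothetical:

```shell
#!/bin/sh
# refresh.sh — hypothetical full local-database initialization.
set -e

# base.sql itself can be produced from a reference database with:
#   pg_dump --schema-only --no-owner mydb > base.sql

# 1. Recreate the local database from the base structure.
dropdb --if-exists mydb
createdb mydb
psql -d mydb -f base.sql

# 2. Apply migrations (here via the doctrine/migrations CLI).
vendor/bin/doctrine-migrations migrations:migrate --no-interaction

# 3. Deploy stored procedures from the dump directory;
#    deploy.php is a hypothetical wrapper around Keeper::deployDump().
php deploy.php
```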

Let's call the full database initialization script refresh.sh . The workflow might look like this:


  1. The developer runs refresh.sh in his environment and gets the current database structure.
  2. The developer begins work on the task, modifying the local database to fit the needs of the new functionality ( ALTER TABLE ... ADD COLUMN , etc.)
  3. After completing the task, the developer calls saveDump to commit the changes made in the database to VCS.
  4. The developer runs refresh.sh again, then verifyDump , which now shows the list of changes to include in a migration.
  5. The developer transfers all structural changes to a migration file, runs refresh.sh and verifyDump again, and if the migration is written correctly, verifyDump will show that there are no differences between the local database and the saved dump.

The process described above is compatible with gitflow principles. Each branch in VCS will contain its own version of the dump, and when branches are merged, the dumps will be merged as well. In most cases no additional actions are needed after the merge, but if changes were made in different branches to, for example, the same table, a conflict may arise.


Consider a conflict situation with an example: there is a develop branch from which two branches, feature1 and feature2 , are branched; neither conflicts with develop , but they conflict with each other. The goal is to merge both branches into develop . In such a case it is recommended to first merge one of the branches into develop , then merge develop into the remaining branch and resolve the conflicts there, and finally merge that last branch into develop . During conflict resolution, you may have to correct the migration file in the last branch so that it matches the final dump, which includes the merge results.


How to safely deploy a large number of changes in the database structure in a production environment


Because the actual database structure is stored in VCS, it becomes possible to verify that the production database exactly matches the required structure. This guarantees that all the changes the developers intended were successfully transferred to production.


Since DDL in PostgreSQL is transactional, it is recommended to follow the deployment order below so that, in case of an unforeseen error, a ROLLBACK can be performed painlessly:


  1. Start a transaction
  2. Apply all migrations in the transaction
  3. In the same transaction, execute deployDump
  4. Without committing the transaction, execute verifyDump . If there are no errors, execute COMMIT ; if there is an error, execute ROLLBACK
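Put together as a sketch, the order above might look like this. The connection details and paths are placeholders, runMigrations is a hypothetical migration runner, and the exact shape of the value returned by verifyDump depends on the library version:

```php
<?php
// Hypothetical deployment sketch following the steps above.
$conn = new PDO('pgsql:dbname=mydb;host=127.0.0.1', 'user', 'pass');
$params = new \SchemaKeeper\Provider\PostgreSQL\PSQLParameters('127.0.0.1', 5432, 'mydb', 'user', 'pass');
$keeper = new \SchemaKeeper\Keeper($conn, $params);

$conn->beginTransaction(); // 1. start transaction
try {
    runMigrations($conn);                 // 2. hypothetical migration runner
    $keeper->deployDump('/path/to/dump'); // 3. deploy stored procedures

    // 4. verify; if $diff reports any difference (its shape is
    // version-dependent), throw so the catch block performs ROLLBACK.
    $diff = $keeper->verifyDump('/path/to/dump');

    $conn->commit();
} catch (\Throwable $e) {
    $conn->rollBack(); // DDL is transactional in PostgreSQL
    throw $e;
}
```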

These steps are fairly easy to integrate into existing application deployment approaches, including zero-downtime deployment.


Conclusion


Thanks to the approaches described above, you can squeeze maximum performance out of "PHP + PostgreSQL" projects while sacrificing relatively little development convenience compared to implementing all business logic in the main application code. Moreover, data processing in PL/pgSQL often looks more transparent and requires less code than the same functionality written in PHP.



Source: https://habr.com/ru/post/447746/

