Versioned database structure migration: another approach

I read an interesting and useful article ( 1 ) - and wanted to share my own experience. In our company for 12 years of working with one (its own, Oracle-oriented) program, hundreds of clients have accumulated a wealth of material on upgrading the database structure.

Initially, we assumed that it was enough to store the version number of the latest upgrade in each database and roll the scripts incrementally, raising the version to the required one. This technique was successfully used in the previous version of our program, which worked with Paradox DBMS. But everything went wrong with the Oracle DBMS, each client had its own vision of what its database should be, and version out of sync became an inevitable evil. The habitual upgrade technique for versions began to constantly lead to errors that no one paid any attention to, and the mismatch of the structures lasted for several years. As a result, each client had its own, not identical to any other, structure of the database, and the client part of the program somehow had to deal with it.

At some point, the management instructed the leading programmer of the company (me) to tackle the problem of upgrades to the database - a creative person who prefers to spend more time but avoid routine work. And then a tool was developed that allowed virtually eliminating the differences in database structures for all clients. About this technology I want to tell the world community.
')

The basis of the new technology is three algorithms:

- Getting the XML image of the real database
- Comparing two XML images and identifying differences
- Forming a script to eliminate differences

User interface

Visually, the tool looks like a window divided horizontally into three areas:
- in the left panel there is a tree structure of loaded objects (for brevity - “tree”);
- in the middle and right panels - a detailed description of the object selected in the tree in the structures being compared (I call these panels “comparison panels”).

One or two structures are loaded into the program - from a real database or from an XML file. The object tree displays a unified hierarchy, that is, objects that are present in at least one of the two loaded structures. Identical objects that are different or absent on one side are represented by different icons — white, blue or red-white, respectively (red from the side where the object is missing).

Above the tree at the touch of a button, the filter panel appears or disappears.

At the top of each of the comparison panels, the data source is displayed — an Oracle schema or file.

Above the comparison panels at the touch of a button, the options panel appears or disappears.

The user interface could be different, it doesn’t matter - and maybe we will redo it in the future - but for now it looks something like what I described.

Obtaining an XML image of a real database

Connecting to a real database, the program executes a series of SQL queries, obtaining information about Oracle metadata, and the obtained information is collected in the program memory into a hierarchical object structure, where each metadata object is associated with its own hierarchy node.

In this case, the user can impose a filter on the types and names of the metadata objects being loaded.

In general, structures of several schemes are subject to loading. The user selects one of three options:
- Abstract scheme
- Specific scheme
- All server schemes

If you select the first or second option, you can select several additional schemes on the next panel, and we immediately call the selected scheme the main one.

If the third option is selected, the structures of all server schemes are loaded, except for standard ones (created during the Oracle installation and the creation of the database instance)

"Abstract schema" means that the name of the main Oracle schema does not matter, so you can compare with each other similar in structure schema with different names. Practically, this means that the host name of the main schema will not receive the name of the loaded schema, but the abstract name user - you cannot create a schema with that name (and even if someone succeeds, then there is little point in it).

"Specific scheme" means that the name of the main scheme has a value, and its structure can only be compared with the structure of the same name scheme.

Additional schemas are always compared by name, that is, the nodes of these schemas are named after the original Oracle schemas.

The resulting object structure can be saved in an XML file that can be immediately packaged in ZIP format.

Accordingly, such a file can be read by the program, and the object model is restored from the saved image.

Each new release of our program comes with a standard database in a packed file so that the client’s server can analyze the differences by comparing the actual structure with the image from the attached file.

Comparing two XML images and identifying differences

The identification of the compared nodes is based on two criteria - a common parent node and the same name of the node itself. The node name is also used to sort the child nodes of a single parent node. Therefore, the naming of nodes of different types of objects is emphasized.

At the top level of the hierarchy are the schematic nodes — first there are additional schemas sorted by name, then the main schema (remember, this node is called either the schema name or the word user).

Child schematic nodes - groups of objects of the same type (tables, views, sequencers, stored procedures, etc.) The names of these nodes correspond to their purpose: tables, views, etc.

At the third level are the objects themselves, whose names must be unique within their group.

The fourth and further levels are determined by the type of object, and are not displayed in the tree. These levels correspond to objects or their groups stored and processed in the program memory and displayed on the comparison panels when selecting the corresponding third-level object.

For example, a table has a child node columns with its child nodes describing each field of the table, there are nodes triggers, constraints, etc.

Each object in the program's memory stores signs of the presence of an object in each of the compared structures and the differences between them. Composite objects are compared by differences in child nodes, end nodes are compared using individual methods for each type of object.

For PL / SQL objects (triggers, procedures, etc.) it is possible to launch an external utility to visually compare text data such as Beyond Compare, WinMerge or any other (it is assumed that this utility accepts the names of the compared files as the first two command line parameters) .

But the decision on the identity of these objects is made on the basis of its own comparison algorithm, which ignores differences in comments, delimiters (spaces, tabs, line breaks) or the register of characters (except for string constants).

Forming a script to eliminate differences

The script can be formed both from left to right and from right to left - both comparison panels are equal for all algorithms.

The script can be generated both for the entire loaded structure, and for the object selected in the tree (scheme, group of objects of one type or one object).

When forming the script, the same filters are applied as when loading the structure (by type and by name). For example, you can compare only triggers and procedures, etc. - even if the download occurred without a filter, since it can be changed after the structures are loaded.

The script is built in turn for each scheme in the order of their enumeration - thus, additional schemes are compared before the main one, and all operations on one scheme fall into the final script in one continuous fragment.

Within one schema, the script is built by object types — first synonyms are processed, then type headers, tables, foreign keys, sequencers, functions, procedures, package headers, views, type bodies, package bodies, triggers, and contexts.

Objects of the same type are processed in the order of their names (alphabetically). Sometimes (quite rarely), this still leads to conflicts that are eliminated by re-applying the synchronization algorithm — but in most cases, the specified order of type processing eliminates errors associated with dependencies between objects.

The following rules apply when handling differences:
- If there is no object in the target structure, a command is added to the script to create it.
- If there is a difference in the structure of the object, commands for making changes are added to the script.
- If there is no object in the original structure, nothing is done by default, but there is an optional possibility to insert commands to the script to delete the object.
- In the absence of changes in the default structure, nothing is done, but there is an optional possibility to insert commands to re-create the object in the script.

Connect to the database server

For the analysis of the scheme and for the application of the script, two connection options are possible - on behalf of the main scheme and on behalf of the superuser (sys). And the second is preferable, but if you do not have access (the sys password is unknown), you can work on behalf of the main scheme, but some this case is not available. The program itself tries both connection options, first trying to connect on behalf of sys.

The connection is established every time before downloading the metadata from the server and before executing the finished script, after performing the required operation, an automatic disconnection from the server is performed.

Conclusion

The described technology may not be perfect - and I will be happy to read the opinion of my colleagues, even if I disagree with them.

Nevertheless, the described tool has been working in our company for about three years, and during this time it has helped to analyze and eliminate the differences in most of our clients' schemes.

The tool is constantly improving and acquiring new features - now it can export and import data using a filter for data and metadata, make a full backup of the program (client, database and settings files), download and download files from the Oracle Directory, to which there is no direct access through the file system, it also implements the old method of stored scripts by version - we use this method in special cases, these scripts are executed before comparing with the new technology .
_{(I highlighted this place in color after I repeated it 3 or 4 times in the comments in response to the questions)}

All features of the tool are available through running in command line mode, that is, you can create batch command files that run on a schedule without human intervention.

Of course, such things as upgrading the database structure, especially with a large number of differences, it is better to do manually, analyzing each different object in turn. In addition, before such an operation it is strongly recommended to do a full backup of the database. But as the described algorithms improve, the value of the user's qualification decreases right before our eyes.

I am new to Habré, and here it seems that emoticons don't seem to be so I suggest that readers themselves guess in which phrases I allowed myself to joke.

Thank you for attention.

Source: https://habr.com/ru/post/121909/

All Articles

Versioned database structure migration: another approach

More articles: