
How to manage partitions in Oracle DB and not go crazy

We have already talked about why database partitioning is so important for the performance of a DLP system and how we implemented it in PostgreSQL. This article is about Oracle.

The specifics of using a DBMS in DLP solutions is that data volumes grow very quickly. They cannot all be kept in the operational archive, so long-term storage is a necessity even for a company of just 50 employees. At the same time, the operational archive fills up so fast that data has to be moved to the long-term archive at least once every two weeks. Doing this with the built-in tools of the DBMS alone requires knowledge and experience. That is the main difficulty, and it is, on the whole, obvious from the outset.

In addition, problems arise that are not obvious at first. How do you bring back from the long-term archive a partition containing data from an older version of the application and attach it to a newer one? What if they have different storage formats? What do you do if attaching a partition was interrupted and it is left "hanging" between the long-term and the operational archive?


In general, resolving these issues boils down to two main technical tasks: automating partition management in the Oracle DBMS (detaching and attaching partitions) and consistently rolling partitions back if something goes wrong during attachment.

What is wrong with the built-in mechanisms of Oracle DB


The Partitioning option helps to offload data to tape and bring it back from the long-term archive: with it, you can split a table into parts according to some principle, for example by a range of dates. Besides manageability and availability, such separation also improves performance. Each period is stored in a separate tablespace, which makes it possible to use transportable tablespaces to move data quickly between reporting and archive databases running different versions and even platforms. The problem is that the standard mechanisms are not always enough: they only let you create the basic structures, without taking the specifics of the application into account. The administrator is then forced to build a pile of management tooling around them. And the detach-attach-transfer process itself requires database administration skills. So the minimum task is to automate this process and make it accessible to application administrators.
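To make the setup concrete, here is a minimal sketch of a range-partitioned table with each period in its own tablespace; the table, column, and tablespace names are invented for illustration and are not taken from Solar Dozor.

```sql
-- Illustrative only: names and date ranges are assumptions.
CREATE TABLE messages (
    msg_id  NUMBER NOT NULL,
    sent_at DATE   NOT NULL,
    body    CLOB
)
PARTITION BY RANGE (sent_at) (
    PARTITION p2023_h1 VALUES LESS THAN (DATE '2023-07-01') TABLESPACE ts_2023_h1,
    PARTITION p2023_h2 VALUES LESS THAN (DATE '2024-01-01') TABLESPACE ts_2023_h2
);
```

Because each partition lives in its own tablespace, a filled-up period can be exchanged into a standalone table and its tablespace exported with the transportable tablespace mechanism (Data Pump export with `TRANSPORT_TABLESPACES`), moved to tape, and plugged back in later.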

We have developed a set of scripts for managing partitioned tables, retrieving any information about them, and so on. No knowledge of SQL commands or hands-on database experience is required: the application administrator simply runs a script or selects the desired action in the interface, specifies the partition, and everything happens by itself.

(Not) Version Compatibility


So, we have automated detaching partitions and sending them to the long-term archive. But the long-term archive has a problem of its own: sometimes the data needs to come back.

Suppose the administrator moved several partitions there under an old version. A year later, a new version was released in which new fields and new indexes were added to the tables, and a number of further partitions went to the long-term archive. Then a security officer investigates an incident and needs to pull up data from two years ago, i.e. restore a partition created several versions back and somehow attach it to the database.

The table structure of the current version often differs from the historical one, so the archived partition needs a number of checks and changes. Verification always starts by comparing the current versions of Solar Dozor and the DBMS with those recorded for the partition being attached. If there are differences, correction procedures are launched: metadata is fixed, the necessary fields, indexes, and keys are added, the consistency of the attached data is checked, and anything redundant is removed.
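As a sketch of what such a correction procedure does, the statements below align a staging table (built from an old archived partition) with a newer schema before the exchange. The staging table name follows the pattern from the command log later in this article; the added column and index names are hypothetical.

```sql
-- Hypothetical alignment of an archived partition's staging table
-- with the current schema; RISK_LEVEL is an invented example column.
ALTER TABLE address2_p10001 ADD (risk_level NUMBER DEFAULT 0 NOT NULL);

-- Recreate an index the newer version expects, so that
-- EXCHANGE PARTITION ... INCLUDING INDEXES can succeed:
CREATE INDEX ix_address2_risk_p10001
    ON address2_p10001 (risk_level)
    TABLESPACE dozor1_indexes;
```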

The use of Oracle Text indexes for searching in Solar Dozor adds further complexity. There are known bugs around EXCHANGE PARTITION for text indexes created in different versions of the DBMS, or when using transportable tablespaces (up to outright metadata corruption in version 12). Patches do not always exist for the right version or platform, and recreating the indexes on attach is a slow and rather resource-intensive procedure. We had to build workarounds into the partition attachment procedures: the structure of the DR$ tables of the attached partition's text index is "aligned" with the current one, and the corresponding row in the ctxsys.dr$index table is updated.

There is also protection against administrator mistakes. For example, at the application level any actions are forbidden on the partition that is currently being written to and has the "current" status.

"Houston, we have a problem"


While implementing these mechanisms, we ran into another problem that customers hit surprisingly often. During a detach, something can go wrong, down to a banal power outage, so detaching or attaching a partition can be interrupted at any moment. The result is a database stuck in an "intermediate" state.
Oracle has DDL and DML. DML comes with a transactional integrity mechanism that rolls back the results if a transaction fails. DDL has no such mechanism: every statement is committed immediately, so any action on a partition is a one-way street.
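The difference is easy to show in two statements. The ADDRESS2 table and its columns appear in the command log below; the foreign key name here is an invented example.

```sql
-- DML can be rolled back:
INSERT INTO address2 (addr_type, value) VALUES (1, 'test@example.com');
ROLLBACK;  -- the inserted row is gone

-- DDL cannot: each statement performs an implicit commit.
-- FK_ADDRESS2_MESSAGE is a hypothetical constraint name.
ALTER TABLE address2 DROP CONSTRAINT fk_address2_message;
ROLLBACK;  -- has no effect; the constraint stays dropped
```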

We have developed a mechanism that checks the execution of every step of the partition detach and attach procedures and corrects problems as they arise. If something goes wrong, the mechanism restarts the operation from the point of failure. Detach and attach errors are logged, so at any time you can see what problems occurred and when.

How does a partition detach work? A sequence of commands is built: foreign keys are disabled, a table identical in structure to the partition being detached is created, the necessary indexes and primary keys are created on it, the EXCHANGE PARTITION command is run, and the foreign keys are re-enabled. As each command executes, it is logged into a service table together with its undo operation (disable constraint - enable constraint, create table - drop table, etc.), the time the operation was performed, and its status. If something goes wrong, then after the procedure is restarted a check determines at which step the detach stopped, and either the next command is executed or a rollback takes place: the recorded undo operations are executed in reverse order.
| ID | STATEMENT | UNDO | STATUS |
|----|-----------|------|--------|
| 4411 | CREATE TABLE ADDRESS2_P10001 TABLESPACE DOZOR1_INDEXES AS SELECT * FROM ADDRESS2 WHERE 1 = 2 | DROP TABLE ADDRESS2_P10001 | Ok |
| 4412 | CREATE INDEX IX_ADDRESS2_MESSAGE_P10001 ON ADDRESS2_P10001 (MESSAGE) TABLESPACE DOZOR1_INDEXES | NULL | Ok |
| 4413 | CREATE UNIQUE INDEX IX_ADDRESS2_UNIQ2_P10001 ON ADDRESS2_P10001 (ADDR_TYPE, VALUE, MESSAGE) TABLESPACE DOZOR1_INDEXES | NULL | Ok |
| 4414 | ALTER TABLE ADDRESS2 EXCHANGE PARTITION p10001 WITH TABLE ADDRESS2_P10001 INCLUDING INDEXES WITHOUT VALIDATION | ALTER TABLE ADDRESS2 EXCHANGE PARTITION p10001 WITH TABLE ADDRESS2_P10001 INCLUDING INDEXES WITHOUT VALIDATION | Ok |
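The journaling and rollback idea behind this log can be sketched as follows; the table and column names here are invented, not the actual Solar Dozor code.

```sql
-- Minimal sketch of a step journal (names are assumptions).
-- Each step is recorded with its undo statement before execution.
CREATE TABLE partition_op_log (
    id        NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    statement CLOB NOT NULL,
    undo_stmt CLOB,           -- NULL when no undo is needed
    status    VARCHAR2(10)    -- e.g. 'Ok' / 'Error'
);

-- Rollback: replay the recorded undo operations in reverse order.
BEGIN
    FOR r IN (SELECT undo_stmt
              FROM   partition_op_log
              WHERE  status = 'Ok' AND undo_stmt IS NOT NULL
              ORDER  BY id DESC)
    LOOP
        EXECUTE IMMEDIATE r.undo_stmt;
    END LOOP;
END;
/
```

Note that for EXCHANGE PARTITION the undo statement is the very same command: running the exchange a second time swaps the segments back, which is why row 4414 in the log above records itself as its own undo.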

As a result, we end up either with the database in its initial state or with the detach procedure completed successfully.

Source: https://habr.com/ru/post/345014/

