
Continuing today's topic — versioning and data history — I will share a simple tool that we use.
Dklab_rowlog is a library of several PostgreSQL stored procedures that lets you add record versioning to any table in the database. In other words, whatever happens to the table — however the data changes (rows added, updated, or deleted) — it will be reflected in a dedicated log table.
Benefits:
- Versioning can be added to any table in about a minute with a single SQL command.
- You can specify which columns should be saved and which should not (saving space); a log entry is then added only if at least one of the specified columns has changed.
- You can designate a column to be treated as the "change author" ID.
- You can mark columns to be logged unconditionally, whether or not they changed.
Usage examples
Suppose we need to log changes to this table:
CREATE TABLE test_src1 (
    id bigint NOT NULL,
    a character varying(20),
    b character varying(20),
    c character varying(20),
    modified_by bigint NOT NULL
);
Example 1: we track changes only in columns "a" and "c". As soon as either of these fields changes, a record is added to public.rowlog.
CREATE TRIGGER t_rowlog
    AFTER INSERT OR DELETE OR UPDATE ON test_src1
    FOR EACH ROW
    EXECUTE PROCEDURE rowlog.t_rowlog_aiud('diff => a', 'diff => c', 'rowlog => public.rowlog');
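To see the effect of the "diff" mode, you can run a quick smoke test. This is a hypothetical sketch (the sample values and id 1 are made up; the exact logging behavior is as described above — only changes to "a" or "c" should produce log entries):

UPDATE test_src1 SET a = 'new value' WHERE id = 1;  -- "a" is tracked: a log entry should appear
UPDATE test_src1 SET b = 'other'     WHERE id = 1;  -- "b" is not tracked: no new log entry
SELECT operation, data_old, data_new FROM public.rowlog ORDER BY id;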
Example 2: always add an entry to rowlog whenever a row in the table changes, but save only columns "a" and "b". Incidentally, you can omit the 'rowlog => xxx' parameter, since it defaults to CURRENT_SCHEMA.rowlog.
CREATE TRIGGER t_rowlog
    AFTER INSERT OR DELETE OR UPDATE ON test_src1
    FOR EACH ROW
    EXECUTE PROCEDURE rowlog.t_rowlog_aiud('always => a', 'always => b');
Example 3: we store the ID of the "change author" in each log record. You can also explicitly specify the name of the table's primary key column (by default, "id").
CREATE TRIGGER t_rowlog
    AFTER INSERT OR DELETE OR UPDATE ON test_src1
    FOR EACH ROW
    EXECUTE PROCEDURE rowlog.t_rowlog_aiud('always => a', 'author => modified_by', 'pk => id');
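Once the author is recorded, the log can be filtered by who made a change. A hedged sketch of such an audit query (the author value 42 is an arbitrary example):

-- Recent changes made by a specific user.
SELECT stamp, operation, pk, data_new
FROM rowlog
WHERE author = 42
ORDER BY stamp DESC
LIMIT 10;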
The structure of the log table
The structure is approximately as follows:
CREATE TABLE rowlog (
    -- Row version primary key.
    id BIGSERIAL NOT NULL,
    -- Timestamp of this version's creation.
    stamp timestamp WITH time zone DEFAULT now() NOT NULL,
    -- Who modified the source row? You can specify any type, not only BIGINT.
    author bigint,
    -- Table OID of the changed row.
    rel regclass NOT NULL,
    -- Previous row columns.
    data_old hstore.hstore NOT NULL,
    -- Resulting row columns.
    data_new hstore.hstore NOT NULL,
    -- Change operation (INSERT / UPDATE / DELETE).
    operation enum_tg_op NOT NULL,
    -- Primary key of the source table row.
    pk bigint,
    CONSTRAINT "rowlog_pkey" PRIMARY KEY ("id")
);
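Because data_old and data_new are hstore values, the standard hstore operators can be used to inspect what actually changed in each version. A sketch (this query is an illustration, not part of the library; it relies on hstore's `-` operator, which removes from the left operand the pairs that also appear in the right one):

-- Change history of one test_src1 row, showing only the fields that differ.
SELECT stamp,
       operation,
       data_new - data_old AS changed_fields
FROM rowlog
WHERE rel = 'test_src1'::regclass
  AND pk  = 1
ORDER BY stamp;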
You can add other fields, create indexes, and so on. Records from different tables can be stored in the same log table (this is usually convenient, since adding versioning then comes down to a single CREATE TRIGGER command).
Restrictions
There are two things to consider when using it:
- The library is not designed for extremely high loads, because it runs several EXECUTE statements internally. Still, it easily sustains several thousand inserts per second.
- Storing the changed data in hstore is convenient because new fields can be added to the source table quickly, but there is a drawback: if the source table's structure changes over time (for example, fields are deleted or renamed), the old versions keep the old hstore structure.
Therefore, in practice, I would recommend using the library mainly for computing various statistics (KPIs) on a live database or its replica.
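For that statistics use case, the log lends itself to simple aggregate queries. For instance, a hedged sketch counting changes per source table per day (an illustration over the rowlog structure shown above, not part of the library):

-- Daily change counts per source table and operation.
SELECT rel,
       date_trunc('day', stamp) AS day,
       operation,
       count(*) AS changes
FROM rowlog
GROUP BY rel, date_trunc('day', stamp), operation
ORDER BY day, rel;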