📜 ⬆️ ⬇️

Versioning and data history

When developing databases, it is often necessary to provide support for versioning and storing object history. For example, an employee may change position, the position in turn may change salary - in multidimensional modeling this is called Slowly changing dimensions (hereinafter SCD) - rarely changing dimensions, that is, dimensions whose non-key attributes tend to change over time. In total there are 6 main types (methods) of SCD , which determine how the history of changes can be reflected in the model.



Type 0


It lies in the fact that the data after the first hit in the table further never change. This method is practically not used by anyone, because It does not support versioning. It is needed only as a zero point of reference for the SCD methodology.
')

Type 1


Type 1 is the usual rewriting of old data with new ones. In its pure form, this method also does not contain versioning and is used only where history is not actually needed. However, in some DBMS for this type, it is possible to add limited support for versioning by the means of the DBMS itself (for example, Flashback query in Oracle) or by tracking changes through triggers.

Advantages:

Disadvantages:


Type 2


This method consists in creating for each version a separate entry in the table with the addition of a key attribute field of this version, for example: version number, date of change, or date of beginning and end of the period of version existence.

Example:
IDNAMEPOSITION_IDDEPTDATE_STARTDATE_END
oneKolya21208/11/2010 10:42:2501/01/9999
2Denis23308/11/2010 10:42:2501/01/9999
3Boris26208/11/2010 10:42:2501/01/9999
fourSheldon22308/11/2010 10:42:2501/01/9999
fivea penny25208/11/2010 10:42:2501/01/9999


In this example, the default end date of the version is '01 .01.9999 ', instead of which one could specify, say, null, but then there would be a problem with creating the primary key from ID, DATE_START and DATE_END, and, moreover, The sampling condition for a specific date is simplified (" where snapshot_date between DATE_START and DATE_END " instead of " where snapshot_date>DATE_START and (snapshot_date < DATE_END or DATE_END is null) ".
With such an implementation, when an employee leaves the company, it will be possible to simply change the end date of the current version to the date of dismissal instead of deleting employee records.

Advantages:

Disadvantages:


Type 3


The record itself contains additional fields for the previous attribute values. When new data is received, old data is overwritten by current values.

IDUPDATE_TIMELAST_STATECURRENT_STATE
oneone08/11/2010 12:58:480one
2208/11/2010 12:29:16oneone
Advantages:

Disadvantages:


Type 4


The change history is contained in a separate table: the main table is always overwritten with current data with old data transferred to another table. Usually this type is used to audit changes or create archive tables (as I said, in Oracle, this same 4th type can be obtained from the 1st using the flashback archive). A subtype or a hybrid of this option (with the second type), it seems to me, should be considered sectioned on the basis of the current version with the allowed movement of lines, but this is beyond the limits of modeling and rather refers to administration.

Example:
 select * from emp 

IDNAMEPOSITION_IDDEPT
oneKolya212
2Denis233
3Boris262
fourSheldon223
fivea penny252


 select * from emp_history 

IDNAMEPOSITION_IDDEPTDATE
oneKolya21one08/11/2010 2:12:13 PM
2Denis23208/11/2010 2:12:13 PM
3Boris26one08/11/2010 2:12:13 PM
fourSheldon22208/11/2010 2:12:13 PM

Advantages:

Disadvantages:


Hybrid Type / Type 6 (1 + 2 + 3)


Type 6 was coined by Ralph Kimball as a combination of the above methods and is designed for situations that they do not take into account or for greater convenience of working with data. It consists in introducing additional redundancy: a type 2 is taken as a basis, a surrogate attribute is added for an alternative version review (type 3), and one or all previous versions (type 1) are overwritten.
Example:
VERSIONIDNAMEPOSITION_IDDEPTDATE_STARTDATE_ENDCURRENT
oneoneKolya21208/11/2010 10:42:2501/01/9999one
one2Denis23308/11/2010 10:42:2501/01/9999one
one3Boris26208/11/2010 10:42:2508/11/2010 11:42:250
23Boris26208/11/2010 11:42:2601/01/9999one


In this example, for example, adding a surrogate key adds the ability to refer from the fact tables to a specific version of the dimension, which may not belong to the lifetime of the fact itself, and the indicator of the current version may help to partition by the current versions (although it would be better to call the latest version sectioning since the version may become outdated without changing the record itself). However, the indicator of the current version can be created as a virtual computable field, without worsening normalization, if it is necessary in the table (if the DBMS supports such fields, they appeared in Oracle in the 11th version), and as a field in the view from this table.
In general, any combination of basic types of SCD refers to a hybrid type, so both their disadvantages and advantages depend on your specific implementation, but one thing is sure - the choice of a hybrid type can only be due to the complexity of your model and almost always (in any case, I don’t know cases when it can be otherwise) you can get by with the main 4 types.

Let me add a few tips on how to implement SCD:


Ps. Habralyudi, share what interesting hybrid implementations you met?

Source: https://habr.com/ru/post/101544/


All Articles