All developers of information systems are faced with periodic information, i.e. data varying in time. For example:
- Commodity prices
- Currency rates
- Posts
etc. Also, one periodic information changes frequently, the other rarely. Rarely changing information can be attributed, for example:
- Last name and first name
- Address of residence
- Family status
So, in most cases, there is no need to keep a history of changes in this rarely changing information, because no one will ever build a backdating report so that it displays this information that is valid on the date of the report.
In such cases, it is enough to keep just the fact of the information change, the previous value and the date of the change - this is needed in the rarest cases.
Then we will talk about the method of storing periodic information in a relational database for cases when such reports are obviously needed in the system and they require obtaining actual data on a certain date.
For example, take information about a person.
"". :
-
-
-
-
-
-
-
')
The time-varying requisites here theoretically can be: Last Name, First Name, Family Status, Residence Address and, condemned, Paul.
Create an object to store the history of changes to this data. Immediately agree that we will keep the entire record with the entire set of changing details - it is more convenient for work. Suppose we lose in the amount of data, but we will benefit in processing.
We take out the changeable details from the “Person” object and create an object for storing periodic information.
"". :
-
-
" ". :
-
-
-
-
-
-
-
(in fact, you can transfer all the details from the Person to Information, here the goal is to show that this is not necessary)
The details object is distinguished by the presence of a “Date of changes in details” and a link to the person’s master.
In all objects of the system, the person and the link to it should be used, and the information will be used only as a subordinate object without a direct link to them.
It would seem that everything ?! But it only seems.
Difficulties begin immediately when you try to display a list of Persons. Let's try to write a request to implement it. We turn on the fantasy and present the object SQL in Russian:
select ., ., .
from ,
where .ID = .
But here is a problem right away - the whole history of information on the person will seem to us. And we need a list of persons with current information for the current date. Adapt to the requirements:
select ., ., .
from ,
where .ID = .
and .ID = (select top 1 .ID from where .ID = . and . <= order by . desc)
The result is now correct, but the query runs awfully long and on large volumes can completely knock out the DBMS, since a bunch of subqueries are executed.
The alternative, as they say, is - ducks! As previously stated, it is better to have data redundancy, but not to lose speed when processing them.
Let's add the register with one date.
" ". :
-
-
-
-
...
This date will respectively indicate the date of the “neighboring” person’s information record. It is important that this field always has a value. Then, if the next date of change is unknown, then we fill it with a “maximum” date, for example, 12/31/9999. As a result, our request can be rewritten as:
select ., ., .
from ,
where .ID = .
and . >
and . <=
Everything is gorgeous fast here!
Fill in the following date for example in the trigger. Here it will be necessary to take into account the possibility of changing the "Date of information change" and "Person", and indeed the deletion of a record. These trigger algorithms are fairly simple. In total, while saving a record, a maximum of 2 “neighboring” records may change (one current neighbor, another new neighbor). This is a minor loss with rare changes.
Let's see how you can store another version of periodic information, for example, the place of work of the same Person in the context of Organizations.
. "". :
-
-
-
-
-
Selection of all company employees for the current date:
select ., ., ., ., .
from , ,
where .ID = .
and . >
and . <=
and .ID = .
and . <=
and . >
and . =
Everything is good and wonderful. It seems there is no problem ... until we are asked to count / withdraw those people who have worked in the company since the beginning of the year?
Let us think about which of the workers it could be:
a) the one who worked at the beginning of the year
b) the one who works on the current date
c) the one who came after the beginning of the year and retired until now
(for simplicity, we remove from the request DetailsPersons, because with them everything is already clear, we will assume that they are available from the Name field Persons)
select ., ., .
from ,
and .ID = .
and . =
and (
(. <= and . > ) /* */
or (. <= and . > ) /* */
or (. >= and . <= ) /* */
)
As a result, without knowing it at all, we get several records for a person who has changed his position within this period or interrupted work. How to get rid of the repetition of man? Apply distinct? We need his last post! You can try a subquery, but there is a better option.
Here you can also apply the approach to storage, similar to information about the person.
Another thing worth mentioning is the concept of “Key requisites” - this is a set of requisites, within the values ​​of which the intervals (in this case from the Dates of Admission to the Date of Dismissal) of records should not intersect. For the Employee, this set includes the “Person” and “Organization” requisites, i.e. A person cannot have several entries in one organization at the same time, but can work in different organizations.
So, the “Next Date” requisite also works within the framework of Key Details, i.e. when the position changes, these dates are set to the next record, and when the organization changes, they remain maximum.
"". :
-
-
-
-
-
-
We rewrite the same query:
select ., ., .
from ,
and .ID = .
and . =
and (
(. <= and . > and . > ) /* */
or (. <= and . > ) /* */
or (. >= and . <= and . > ) /* */
)
Now the data is correct and everything works fast enough.
Also, as with the Next date, you can add the Previous date, which will indicate the previous position or the "minimum" date, for example 01.01.0001. The previous date will be used for requests such as "closest appearing employees." You can also change the dates in the trigger. The maximum additional change will be 4 more entries.
Thanks to this design, you can simply get answers to the questions:
- a list of those who have ever (or in the period) worked in the company
- list of recently retired on date
- a list of first established from the date
- etc.
PS I hope not tired
UPD 03/28/2011
As noted by MaximKat, I updated the texts of requests for the Employee.
Also added a note about key details.