Unexpected need for data versioning

When the project is closer to the end of the development, it becomes clear that you cannot do without the versioning of the data, because any user can go in and delete what has been created for a long time by dozens of other people, you have to find a solution that requires minimal effort. Without going into the details of a specific project on which such a need arose, imagine a google docs spreadsheets document editable to any site visitor.

Suppose Google engineers are using a MySQL database with this structure:
')
documents
--id
--name
--creator_id

sheets
--id
--document_id
--name

rows
--number
--sheet_id
--height

columns
--number
--sheet_id
--width

cells
--id
--sheet_id
--color
--content
--row_number
--col_number
--creator_id

As it was already written above, anyone has access to the document, and, for example, changing the value of a cell to a swear word will permanently destroy the previous value.

Structure requirements

Required

Ability to save previous cell values
Ability to view previous revisions of the online document
The ability to roll back to the previous version
Some parameters may not have different versions (id, sheet_id)
Implementing the used ORM

Desirable

Minimum possible labor for data transfer to a new structure, selection, modification and addition of data
Saving disk space
Highlighting changed cells when viewing old revisions

Implementation

After much thought, the "formula" was derived. What if you make an additional table and zadzhoyny the main table and the new table containing different versions of values.

First of all, we need a table with revisions:

revisions
--id
--parent_id
--document_id
--created_date

Spreading the cells table by 2, so that the cells remain information that does not require versioning, and the cells_data - user-modified information. In addition, we add the created_in_revision_id, deleted_in_revision_id fields to the cells_data table, and the user ID that made the changes.

cells
--id
--sheet_id
--row_number
--col_number
--creator_id

cells_data
--cell_id
--color
--content
--data_creator_id
--created_in_revision_id
--deleted_in_revision_id
(primary key on cell_id + created_in_revision_id)

In the code for the Document object (if we program objectally) we add the getRevisionCondition method ($ revisionId = false), which should return a prefix to the SQL Quarae like "created_in_revision_id in (0,100,300,301) and deleted_in_revision_id not in (0,100,300,301)". Those. contain the current revision and all its parentes in the “in (...)” and “not in (...)” construction

Sample

Next kvar, which previously looked like this:

select * from cells where row_number=3 and col_number=2

turns into:

 select c.*,cd.* from cells c,cells_data cd where row_number=3 and col_number=2 and id=cell_id and $revisionCondition

Of course, the field names of these tables should not be duplicated.

Insert a new entry

Here everything is as simple as when sampling. Check if the revision has expired and take the last one. For example:

 $revision=$document->updateRevisionIfExpired()

We first insert into the main table (cells), then into the table with versioned data (cells_data). In the created_in_revision_id field, we write the ID of the last revision.

Delete record

Here we will try to save disk space. If, for example, we set 30 minutes as the revision lifetime, then we compare the revision of the deleted record with the current revision and:

if they are the same, just delete from the database
if the revision has already been changed, then we just write a fresh revision id in the table field cells_data - deleted_in_revision_id

For node.js, it would look like this:

 if(cellRevisionId==currentRevision.getId()){ db.online.query("delete from cells_data where cell_id="+cellId+" and created_in_revision_id="+cellRevisionId) }else{ db.online.update('cells_data',{'deleted_in_revision_id':currentRevision.getId()},{'cell_id': cellId, 'created_in_revision_id' : cellRevisionId}) }

Record Update

Changing data is something like deletion. If the revision has not changed, the data is simply updated; if it has changed, the deleted_in_revision_id field is updated and a new record is inserted into the cells_data table with the ID of the new revision created_in_revision_id.

Instead of conclusion

Because Most of the tables that need versioning have now been transferred to a new structure - I can highlight advantages and disadvantages:

Virtues

All selections and insertions (in other words, selects and inserts) were redone as quickly as possible and without any complications.
Does everything that was required and almost everything that was desired.

disadvantages

Deleting, updating the record still happens a little too tightly wound and a more automated script would not hurt
Perhaps you need to do something with requests like in (...) and not in (...), since they list all revisions for the document

Source: https://habr.com/ru/post/136202/

All Articles