⬆️ ⬇️

Unexpected need for data versioning

When the project is closer to the end of the development, it becomes clear that you cannot do without the versioning of the data, because any user can go in and delete what has been created for a long time by dozens of other people, you have to find a solution that requires minimal effort. Without going into the details of a specific project on which such a need arose, imagine a google docs spreadsheets document editable to any site visitor.







Suppose Google engineers are using a MySQL database with this structure:

')

documents

--id

--name

--creator_id



sheets

--id

--document_id

--name



rows

--number

--sheet_id

--height



columns

--number

--sheet_id

--width



cells

--id

--sheet_id

--color

--content

--row_number

--col_number

--creator_id



As it was already written above, anyone has access to the document, and, for example, changing the value of a cell to a swear word will permanently destroy the previous value.



Structure requirements



Required




Desirable






Implementation





After much thought, the "formula" was derived. What if you make an additional table and zadzhoyny the main table and the new table containing different versions of values.



First of all, we need a table with revisions:



revisions

--id

--parent_id

--document_id

--created_date



Spreading the cells table by 2, so that the cells remain information that does not require versioning, and the cells_data - user-modified information. In addition, we add the created_in_revision_id, deleted_in_revision_id fields to the cells_data table, and the user ID that made the changes.



cells

--id

--sheet_id

--row_number

--col_number

--creator_id



cells_data

--cell_id

--color

--content

--data_creator_id

--created_in_revision_id

--deleted_in_revision_id

(primary key on cell_id + created_in_revision_id)



In the code for the Document object (if we program objectally) we add the getRevisionCondition method ($ revisionId = false), which should return a prefix to the SQL Quarae like "created_in_revision_id in (0,100,300,301) and deleted_in_revision_id not in (0,100,300,301)". Those. contain the current revision and all its parentes in the “in (...)” and “not in (...)” construction



Sample


Next kvar, which previously looked like this:

select * from cells where row_number=3 and col_number=2 


turns into:

 select c.*,cd.* from cells c,cells_data cd where row_number=3 and col_number=2 and id=cell_id and $revisionCondition 


Of course, the field names of these tables should not be duplicated.



Insert a new entry


Here everything is as simple as when sampling. Check if the revision has expired and take the last one. For example:

 $revision=$document->updateRevisionIfExpired() 


We first insert into the main table (cells), then into the table with versioned data (cells_data). In the created_in_revision_id field, we write the ID of the last revision.



Delete record


Here we will try to save disk space. If, for example, we set 30 minutes as the revision lifetime, then we compare the revision of the deleted record with the current revision and:



For node.js, it would look like this:

 if(cellRevisionId==currentRevision.getId()){ db.online.query("delete from cells_data where cell_id="+cellId+" and created_in_revision_id="+cellRevisionId) }else{ db.online.update('cells_data',{'deleted_in_revision_id':currentRevision.getId()},{'cell_id': cellId, 'created_in_revision_id' : cellRevisionId}) } 


Record Update


Changing data is something like deletion. If the revision has not changed, the data is simply updated; if it has changed, the deleted_in_revision_id field is updated and a new record is inserted into the cells_data table with the ID of the new revision created_in_revision_id.



Instead of conclusion



Because Most of the tables that need versioning have now been transferred to a new structure - I can highlight advantages and disadvantages:



Virtues




disadvantages


Source: https://habr.com/ru/post/136202/



All Articles