Guide to designing relational databases. Cascade delete data

Addition to the cycle of translated articles.
Articles 1-3 , 4-6 , 7-9 , 10-13 , 14-15

The information in the article refers to the 5th part of the manual.

In the comments one of the users reasonably reproached the lack of information about the cascade deletion of data. Fill the gap. The author of the articles has no information on this topic, so I wrote a short article about it. It is quite logical to fit into the specified cycle.
To begin with, in order to avoid confusion, it is worth saying that it is not so much and not only about cascading data deletion, but about the topic of referential integrity and foreign keys, of which cascading data deletion is a part.
')

Introduction

If you start from the philistine position of a person who develops databases, then foreign keys are convenient and simplify life (in most cases, there are always exceptions.). Even being an ignoramus in the relational database theory, to the conscious need to use foreign keys, at a certain stage of its development, almost any practice comes (the statement is more relevant to beginners), which does not stand still in its development and continues to think. Even if he does not yet know that what he needs is called a link by a foreign key, he begins to organize the data himself in a certain way, split it into separate tables and link them together. So it becomes obvious.
But when using foreign keys, even if you do not know such a definition, it becomes necessary to monitor the data to be bound. The subject of this article is, if I may say so, a kind of satellite that follows such an organization of data. And in this case it is already much more useful to know the theory, since This can greatly simplify life in the process of working with the database.

Closer to the point.

Foreign keys were told in translations, I will not dwell on this. I'll tell you about the "satellite".

In case you don’t know the theory, you’ll have to keep track of the data connections yourself. An alternative option is to assign this task to the database. What is the so-called data link tracking? To understand, you need an example.

We have some things. They are scattered, a lot of them. We want to clean up the mess. Order is often classification (categorization) and inventory. We want order, and we can work with databases and do not want to write anything on paper. We write all things “in a column”. Next, we review the list and determine the categories to which things belong.

Let this part of our things, the rest do not consider:

book 1
book 2
book 3
PC mouse
keyboard
a pen
stapler

We define for ourselves that:

Book 1, book 2, book 3 - this is a book, oddly enough.
Computer mouse, keyboard - this is computer peripherals.
Pen, stapler - this is stationery.

We create two tables in the database: categories (categories) and stuff (things) .

Categories

category_id | name

1 | books
2 | computer peripherals
3 | stationery

Stuff (things)

stuff_id | category_id | name

1 | 1 | book 1
2 | 1 | book 2
3 | 1 | book 3
4 | 2 | PC mouse
5 | 2 | keyboard
6 | 3 | a pen
7 | 3 | stapler

PS Images from habrastorage.org are not displayed.

Total: we have books, computer peripherals, stationery.

We wanted to throw out or donate all our books, we don’t want to see these things as a category, at home, we like e-books. We remove the category “book” from the category table. At the same time, we have things from this category in another table, we refer to these categories in the table of things. This is called a violation of referential integrity . It would seem that we do not have a category, and therefore there are no books, but the entries in the table of things remain and there are a lot of things, and in the future the situation can repeat and repeat, and then we will have a mess, a lot of unnecessary information and all the ensuing consequences both in the convenience of working with our information, and in the technical part when working with the database (for example, searching for information). And here comes the understanding that we need to work with two tables, to watch out in which cases the connections can be broken, broken and make some gestures, and there are two options: do it yourself or, here knowledge is power, we can shift this headache on the database.

In the framework of the relational data model, the table of categories is an ancestor, and the table of things is a descendant. Everything is clear, as a parent and child. Moreover, cases in which connections can be broken are also identified (take and use). Our case is not the only one.

Relationships can break (if we speak in the "correct" language - referential integrity may be broken) in the following cases:

the foreign key (reference to the identifier in the category table) is updated in the descendant row. We update the category (number, identifier of this category) for some thing, and we are mistaken, there is no such category. And ... we have a thing suspended in the air.
a new child string is added. Add a new thing, but it does not belong to any category. By the way, we can add a category without things. We have a database so arranged that a thing cannot be without a category, but a category can, because it does not refer to a thing.
delete ancestor line This is exactly what happened in our case. They removed the category, but things remained.
primary key update in ancestor string. We changed the category identifier, and the former identifier is referenced by certain things. Bottom line: some of the things again in limbo.

Means to maintain the referential integrity of SQL ( I’ll say right away, beforehand when you need it, you will understand; if we talk about MySQL RDBMS, then using these tools together with foreign keys is possible only for InnoDB tables; you can use foreign keys in MyISAM, creating a certain data structure, but then the whole headache of keeping track of the connection falls on the user ) allows you to handle these cases.

And this is how these problems are solved (in the order of listing):

When updating the child table, the new value of the foreign key is checked. If the specified value is not among the primary keys of the ancestor table, an error is returned.
In our case, if we change a category number for a thing, but it does not exist.
When adding a new child string. If the specified foreign key value does not exist among the primary keys of the parent table, an error is returned.
In our case, if we add a thing and specify the number of a non-existent category for it.

Now the last two. Here the situation is more interesting.

Delete ancestor line. In our case, if we delete a category, and things in the table of things will refer to it. There may be several solutions to the problem. Which of them to do is decided by the database developer (you indicate).

1) when deleting a category, you can delete all related items.
2) prohibit the deletion of categories while some things refer to them.
3) for things that refer to the category being deleted, specify the value NULL as a reference to the category.
4) for things that refer to the category being deleted, specify a default value other than NULL.
In the case of updating the primary key in the ancestor row, the options similar to the previous ones.

Now about the cascade deletion of data, which was mentioned at the beginning.
When specifying the deletion and / or update rules in SQL, certain syntactic structures are used:

FOREIGN KEY [key_name] (col1, …) REFERENCES table (table_col, …)
[ON DELETE {CASCADE|SET NULL|NO ACTION|RESTRICT|SET DEFAULT}]
[ON UPDATE {CASCADE|SET NULL|NO ACTION|RESTRICT|SET DEFAULT}]

Where optional constructions ON DELETE and ON UPDATE allow you to specify the same solutions to the problem, which are discussed above. And these keywords call them:

CASCADE - when you delete or update an entry in the ancestor table that contains the primary key, the entries with references to this value in the descendant table are automatically deleted or updated. In our case, if we delete a category, then all things belonging to this category in the table of things will be deleted. If we update the identifier of the category, then the things that refer to this category will also change the identifier to a new one.
That, cascade, but, as you can see, not only removal.

SET NULL - when deleting or updating an entry in the ancestor table that contains the primary key, the foreign key values in the descendant table are set to NULL.
In our case, if we delete or update the category identifier in the category table, then all things that were referenced belong to this category in the field with the category identifier will be set to NULL.

NO ACTION - when deleting or updating an entry in the ancestor table that contains the primary key, no action will be taken in the child table.
In our case, if we delete or update the category identifier in the category table, this will not affect the table of things in any way.

RESTRICT - if there are records in the child table that refer to the existing primary key in the child table, then if you delete or update the record with the primary key in the ancestor table, an error will be returned.
In our case, if we try to update or change the category identifier, while there are things related to this category, we will get an error.

SET DEFAULT - here it is clear from the name that when you delete or update an entry in the ancestor table that contains the primary key, the default value will be set in the descendant table of the corresponding entries. There is one “BUT”. MySQL RDBMS does not use this keyword.

And now again - to cascade delete data. Why is it at the hearing? Why was asked about him in the first place, despite the fact that it is only one of. Probably because cascading data deletion is the most common solution to a problem.

Source: https://habr.com/ru/post/194738/

All Articles

Guide to designing relational databases. Cascade delete data

Introduction

Closer to the point.

More articles: