Storage of trees in the database. Part one, theoretical

Half a year ago I wrote the ClosureTable bundle for the Laravel 3 framework. The reason for writing was this wonderful presentation by Bill Karvin on how to store and process hierarchical data in MySQL using PHP.

So. There are several database design patterns for storing and processing hierarchical structures:

Adjacency List ("adjacency list")
Materialized Path ("materialized path")
Nested Sets
Closure Table ("table of relations")

What is Closure Table

The essence of this design pattern is that the relationships between entities are stored in a separate table, while the main table contains only the data of the entities themselves.

The link table must contain at least two fields:

reference to ancestor (ancestor)
child reference (descendant)

Let us work on the creation of the next SuperPuper CMS and have started the development of the text page editing module. We need two tables:

pages will contain data about the page
pages_treepath will contain page hierarchy data

DB tables schema
')
As an example, use the following page hierarchy:

Sample of descendants

This is the SQL query we can do if we want to select all the pages in the "About us" section.

 SELECT * FROM pages p JOIN pages_treepath t ON (p.id = t.descendant) WHERE t.ancestor = 1

A branch of the result. Arrows indicate links between pages.

"Descendant" means "descendant", and "ancestor" - ancestor. Accordingly, in order to get all the child pages, we pages_treepath link table provided that the page id page has the same value as the reference to the descendant descendant . At the same time, the ancestor link to the parent page is 1 , the identifier of the page “About the company”.

Sampling of ancestors

And now from the bottom up: let's see all the "parents" at the "Corporate" page.

 SELECT * FROM pages p JOIN pages_treepath t ON (p.id = t.ancestor) WHERE t.descendant = 11

In this case, the opposite. We search for pages higher in the hierarchy, therefore we join the link table with the condition that the id of the page id should be equal to the reference to the ancestor ancestor , and the selection is carried out by reference to the descendant descendant , in our case equal to 11 .

Insert a new item

You can add a new job. These values in our case do not represent, so let's look at the request itself.

 INSERT INTO pages VALUES (12, '  ', '', '    ', '0000-00-00 00:00:00', '0000-00-00 00:00:00') INSERT INTO pages_treepath (ancestor, descendant) SELECT ancestor, 12 FROM pages_treepath WHERE descendant = 4 UNION ALL SELECT 12, 12

With the first query, everything is clear - this is the simple insertion of new data. But the second query is worth sorting out in order, so let's see what happens here.

Links between elements after inserting a new job

 SELECT ancestor, 12 FROM pages_treepath WHERE descendant = 4

After completing this request, we get the following list of links:

 -------------------------
 |  ancestor |  descendant |
 -------------------------
 |  4 |  12 |
 |  1 |  12 |
 -------------------------

Add to the previous request one more by combining:

 SELECT ancestor, 12 FROM pages_treepath WHERE descendant = 4 UNION ALL SELECT 12, 12

Link link of the page to itself will be added to the list of links:

 -------------------------
 |  ancestor |  descendant |
 -------------------------
 |  4 |  12 |
 |  1 |  12 |
 |  12 |  12 |
 -------------------------

As you can see, this SELECT query allows you to establish links between the new page and all its ancestors. ancestor is always a link to ancestor, descendant is a link to a descendant. The INSERT query written at the beginning inserts the result into the pages_treepath table.

Remove item

And now close the job web designer.

 DELETE FROM pages_treepath WHERE descendant = 6 DELETE FROM pages WHERE id = 6

Everything is simple here. First, we delete all links where the link to the child is 6 (the Web Designer page), and then we delete the page itself.

Deleting a nested tree

Suddenly it happened that for some time ABC company stopped developing websites. We will need to execute such a query in order to delete the corresponding subsection:

 DELETE FROM pages WHERE id IN ( SELECT descendant FROM ( SELECT descendant FROM pages p JOIN pages_treepath t ON p.id = t.descendant WHERE t.ancestor = 7 ) AS tmptable ) DELETE FROM pages_treepath WHERE descendant IN ( SELECT descendant FROM ( SELECT descendant FROM pages_treepath WHERE ancestor = 7 ) AS tmptable )

Unlike the previous query, this one is somewhat more complicated and the pages themselves are deleted first and after that the links between them (since the latter are actively used when deleting the first ones).

The complexity of queries is partly due to the fact that MySQL does not allow you to perform a query for deleting records with a WHERE , which contains a SELECT sample from the same table. In the case of MySQL, we are forced to place SELECT queries in a temporary table. But in general, our requests would look like this:

 DELETE FROM pages WHERE id IN ( SELECT descendant FROM pages p JOIN pages_treepath t ON p.id = t.descendant WHERE t.ancestor = 7 ) DELETE FROM pages_treepath WHERE descendant IN ( SELECT descendant FROM pages_treepath WHERE ancestor = 7 )

If you look closely at the nested SELECT query in the DELETE query from the pages table, you will find that we have already considered a similar query. This one differs from the previous one only by the page identifier. As a result of the sample, we get all the child pages of the “Sites” section (including the section itself), and then delete all pages with the received identifiers.

After the pages are deleted, it remains to remove the links between them. To do this, we find all the links to descendants' descendant , where the link to the ancestor is equal to the ID of the “Sites” page.

Nesting level

You can also add a field in the relationship table that controls the nesting level of elements. This field will allow you to make more simple queries for sampling immediate ancestors or immediate descendants. For example:

 SELECT * FROM pages p JOIN pages_treepath t ON (p.id = t.descendant) WHERE t.ancestor = 4 AND t.level = 2

DB tables schema

To be continued.

Source: https://habr.com/ru/post/193166/

All Articles