📜 ⬆️ ⬇️

Recursive (hierarchical) queries in PostgreSQL

Following Oraklom with its 'connet by prior', all other DBMS introduce their own implementation of hierarchical queries (IZ). I would like to tell a wide audience how it is done in PostgreSQL.

Story


It all started like this
From: Evgen Potemkin <evgent@ns.terminal.ru>
Subject: Proposal of hierachical queries (a la Oracle)
Date: 2002-11-14 11:52:28 GMT
Hi there!
I want to order the hierarchical queries
posibility. It allows you to construct queries for Oracle for ex:
SELECT a, b FROM t CONNECT BY a PRIOR b START WITH cond; B
I've seen this type of query often made by adding a new type, which
stores position of row in the tree. But sorting such tree are very
tricky (i think).
Patch allows you to sorted, ie, subnodes of each node
be sorted by ORDER BY clause.
with regards, evgen


then
From: Tatsuo Ishii <ishii@postgresql.org>
Subject: RFP: Recursive query in 8.4
Date: 2008-02-19 08:36:00 GMT (1 year, 12 weeks, 6 days ago)
Hi,
I’ve promised
recursive query as defined in PostgreSQL 8.4.
Sumitomo Electric Information Systems Co., Inc.
Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan
(http://www.sraoss.co.jp).


Well And since version 8.4 Postgresql began to support the recursive request.

Theory


')
FROM in PostgreSQL is implemented on the basis of standard SQL clause WITH. Let's get a little deeper: non-recursive WITH allows you to save on repeated subqueries, split a complex query into several smaller ones, is a convenient shortcut for referring to a subquery and is convenient in terms of saving time when writing code. as in the example below, it was possible to avoid using the subquery in WHERE by applying a temporary table top_regions formed specifically for this query.

WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  1. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  2. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  3. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  4. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  5. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  6. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  7. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  8. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  9. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  10. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  11. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  12. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  13. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  14. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  15. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
  16. WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .


Adding an optional RECURSEVE statement allows a Postgre query to access its output. The query algorithm must consist of other parts. The first part is the base, usually returning one line from the starting point of the hierarchy or part of the hierarchy. Ie a place in the hierarchy where the countdown will start (for example, the root). and the second recursive part that will be associated with the temporary table we declared after WITH. The first and second parts are combined with the UNION or UNION ALL operator. so that we would set an example for this chaotic set of words.
Create a table that describes the structru of one company
CREATE TABLE KPO
(
"ID" character varying(55),
"DESCRIPTION" character varying(255),
"PARENT" character varying(55
) ;


after entering the data there:
Select * from kpo



IDDESCRIPTIONPARENT
== =============================================
KPOKARACHAGANAK PETROLEUM OPERATING{null}
AKSAYAKSAYKPO
URALSKKPOKPO
LONDONLONDONKPO
KPCKPCAKSAY
U2UNIT-2AKSAY
U3UNIT-3AKSAY
PRODPRODACTIONKPC
MAINTMAINTENANCEAKSAY
CMMSCMMS TEAMMAINT


Now the recursive query itself:

  1. WITH RECURSIVE temp1 ("ID", "PARENT", "DESCRIPTION", PATH , LEVEL ) AS (
  2. SELECT T1. "ID", T1. "PARENT", T1. "DESCRIPTION", CAST (T1. "ID" AS VARCHAR (50)) as PATH , 1
  3. FROM KPO T1 WHERE T1. "PARENT" IS NULL
  4. union
  5. select T2. "ID", T2. "PARENT", T2. "DESCRIPTION", CAST (temp1. PATH || '->' || T2. "ID" AS VARCHAR (50)), LEVEL + 1
  6. FROM KPO T2 INNER JOIN temp1 ON (temp1. "ID" = T2. "PARENT"))
  7. select * from temp1 ORDER BY PATH LIMIT 100
* This source code was highlighted with Source Code Highlighter .


The first part (rows 2-3) returns to the temporary table the first row in this case, the root entry of our structure, from which the countdown will begin in our hierarchy. the second part (lines 4-5) adds to the same temporary table the records associated with the string already contained in temp1 via JOIN (ID = PARENT) and so on until the end until all the leaves of our ROOTa are in temp1.
Also in this example, the Orakolavskaya function sys_connect_by_path was simulated.



"ID""PARENT""DESCRIPTION""Path""Level"
KPOKARACHAGANAK PETROLEUM OPERATINGKPOone
AKSAYKPOAKSAYKPO-> AKSAY2
KPCAKSAYKPCKPO-> AKSAY-> KPC3
PRODKPCPRODAUCTIONKPO-> AKSAY-> KPC-> PRODfour
MAINTAKSAYMAINTENANCEKPO-> AKSAY-> MAINT3
CMMSMAINTCMMS TEAMKPO-> AKSAY-> MAINT-> CMMSfour
U2AKSAYUNIT-2KPO-> AKSAY-> U23
U3AKSAYUNIT-3KPO-> AKSAY-> U33
LONDONKPOLONDONKPO-> LONDON2
URALSKKPOURALSKKPO-> URALSK2


Postgre does not have a built-in looping test, so if we received the data from the guys who were directly involved in creating the structure in Excel, we should check this structure for integrity. sometimes it is enough to use UNION instead of UNION ALL, but this is only sometimes. If you set the starting point in the hierarchy in the first part, and if there is even a break in the hierarchy, in principle, having started the above-mentioned error there will not be an error, just the lines “tweaks” will be ignored. But we need to know where the error is, and we can implement this by introducing an additional check before performing UNION.
  1. WITH RECURSIVE temp1 ("ID", "PARENT", "DESCRIPTION", PATH , LEVEL , cycle ) AS (
  2. SELECT T1. "ID", T1. "PARENT", T1. "DESCRIPTION", cast ( array [T1. "ID"] as varchar (100) []), 1, FALSE
  3. FROM KPO T1
  4. union all
  5. select T2. "ID", T2. "PARENT", T2. "DESCRIPTION", cast (temp1. PATH || T2. "ID" as varchar (100) []), LEVEL + 1,
  6. T2. "ID" = any (temp1. PATH )
  7. FROM KPO T2 INNER JOIN temp1 ON (temp1. "ID" = T2. "PARENT") AND NOT CYCLE )
  8. select * from temp1 WHERE CYCLE IS TRUE LIMIT 100;
* This source code was highlighted with Source Code Highlighter .


Here, as you see, the same Path field is created but all previous parents are organized in an array, which makes it possible for us to compare each new “ID” to a duplicate. we don’t use this line to search for descendants anymore, so looping is avoided (union all ... WHERE ... AND NOT CYCLE).

Advice from the official site: use the LIMIT operator as I did in the examples. This will help you keep your nerves.

Practical examples



Horsch's theory of course, but where all this can be applied in practice, tebolle in light of the cost of such requests.
In addition, it would be nice to draw a hierarchy in one query, and to find for example all the leaves of the current “ID” there are also other tasks.
Often you need to make changes, such as for example changing the phone code to all employees of a particular department in the telephone directory. Of course, you can use the column in the table of workers or even prefix the “ID”. but all this makes the base non-flexible and non-scalable. It is much better to make a separate Workers table that will display a hierarchy with three columns “ID”, “PARENT“, “HIERARCHID”. the “HIERARCHID” field will allow you to make more than one hierarchical structure. The table is called for an example ANCESTOR which will also consist of three columns “ID”, “ANCESTOR”, “HIERARCHID”. in the “ANCESTOR” field all ancestors will be contained, remember the “Path” array from Example 3. So this table can be filled with a recursive query.
  1. insert into ancestor (“ID” "ANCESTOR", "HIERARCHID")
  2. WITH RECURSIVE temp1 ("ID", "PARENT", "DESCRIPTION", PATH , LEVEL , cycle ) AS (
  3. SELECT T1. "ID", T1. "PARENT", T1. "DESCRIPTION", cast ( array [T1. "ID"] as varchar (100) []), 1, FALSE
  4. FROM KPO T1
  5. union all
  6. select T2. "ID", T2. "PARENT", T2. "DESCRIPTION", cast (temp1. PATH || T2. "ID" as varchar (100) []), LEVEL + 1,
  7. T2. "ID" = any (temp1. PATH )
  8. FROM KPO T2 INNER JOIN temp1 ON (temp1. "ID" = T2. "PARENT") AND NOT CYCLE
  9. )
  10. select "ID" AS "LOCATION", PATH [1] AS "ANCESTOR", 'DEPT' AS "DID" from temp1
* This source code was highlighted with Source Code Highlighter .


such a sign will turn out



LOCATIONANCESTORHIERARCHID
============================
AKSAYAKSAYDEPT
AKSAYKPODEPT
CMMSKPODEPT
CMMSMAINTDEPT
CMMSCMMSDEPT
CMMSAKSAYDEPT
KPCAKSAYDEPT
KPCKPCDEPT
KPCKPODEPT
KPOKPODEPT
LONDONLONDONDEPT
LONDONKPODEPT
MAINTAKSAYDEPT
MAINTMAINTDEPT
MAINTKPODEPT
PRODKPODEPT
PRODAKSAYDEPT
PRODKPCDEPT
PRODPRODDEPT
U2AKSAYDEPT
U2KPODEPT
U2U2DEPT
U3U3DEPT
U3AKSAYDEPT
U3KPODEPT
URALSKKPODEPT
URALSKURALSKDEPT


Now all our selects and updates will be done using this table. for example, with the same phone numbers, it would look something like this:

Update EMPLOYEE SET “TELNUM” = '545-454' FROM ANCESTOR WHERE “ANCESTOR”.”ANCESTOR” = 'AKSAY' AND EMPLOYEE.”LOCATION” = ANCESTOR.”LOCATION” AND EMPLOYEE.ISDPRT = 'N' ;

Moreover, it will be necessary to foresee a trigger for updating the table when adding a new one or deleting an entry in the KPO table.

Continue

Suppose that there is a table in which logs are recorded, imagine that you need to count how many records for the previous month were made and grouped by day. for this you will need a reference calendar for the past month, for example, so as not to miss the day when there were no records. such a calendar (list of days in one column) can be obtained by such a request.
  1. WITH RECURSIVE t (n) AS (
  2. VALUES (1)
  3. UNION ALL
  4. SELECT n + 1 FROM t WHERE n <( SELECT cast ( extract ( 'day' from date_trunc ( 'month' , current_date ) - interval '24 hour ' ) as integer ))
  5. )
  6. SELECT to_date (n || '-' || extract ( 'month' from date_trunc ( 'month' , current_date ) - interval '24 hour ' )
  7. || '-' || extract ( 'year' from date_trunc ( 'month' , current_date ) - interval '24 hour ' ), ' dd-mm-yyyy ' )
  8. FROM t;
* This source code was highlighted with Source Code Highlighter .


Another not very useful example for a snack. The original idea of ​​Graeme Joba (Graeme Job). the result is better to look at the text file
  1. WITH RECURSIVE z (ix, iy, cx, cy, x, y, i) AS (
  2. SELECT ix, iy, x :: float , y :: float , x :: float , y :: float , 0
  3. FROM ( select -1.88 + 0.016 * i, i from generate_series (0,150) as i) as xgen (x, ix),
  4. ( select -1.11 + 0.060 * i, i from generate_series (0.36) as i) as ygen (y, iy)
  5. UNION ALL
  6. SELECT ix, iy, cx, cy, x * x - y * y + cx, y * x * 2 + cy, i + 1
  7. FROM z
  8. WHERE x * x + y * y <16 :: float
  9. AND i <27
  10. )
  11. SELECT array_to_string (array_agg ( substring ( '. ,,, ----- ++++ %%%% @@@@ ####' ,
  12. greatest (i, 1), 1)), '' )
  13. FROM (
  14. SELECT ix, iy, max (i) AS i
  15. FROM z
  16. GROUP BY iy, ix
  17. ORDER BY iy, ix
  18. ) AS zt
  19. GROUP BY iy
  20. ORDER BY iy
* This source code was highlighted with Source Code Highlighter .


Well, such a post to start ;-)

Source: https://habr.com/ru/post/73700/


All Articles