Following Oraklom with its 'connet by prior', all other DBMS introduce their own implementation of hierarchical queries (IZ). I would like to tell a wide audience how it is done in PostgreSQL.
Story
It all started like this
From: Evgen Potemkin <evgent@ns.terminal.ru>
Subject: Proposal of hierachical queries (a la Oracle)
Date: 2002-11-14 11:52:28 GMT
Hi there!
I want to order the hierarchical queries
posibility. It allows you to construct queries for Oracle for ex:
SELECT a, b FROM t CONNECT BY a PRIOR b START WITH cond; B
I've seen this type of query often made by adding a new type, which
stores position of row in the tree. But sorting such tree are very
tricky (i think).
Patch allows you to sorted, ie, subnodes of each node
be sorted by ORDER BY clause.
with regards, evgen
then
From: Tatsuo Ishii <ishii@postgresql.org>
Subject: RFP: Recursive query in 8.4
Date: 2008-02-19 08:36:00 GMT (1 year, 12 weeks, 6 days ago)
Hi,
I’ve promised
recursive query as defined in PostgreSQL 8.4.
Sumitomo Electric Information Systems Co., Inc.
Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan
(http://www.sraoss.co.jp).
Well And since version 8.4 Postgresql began to support the recursive request.
Theory
')
FROM in PostgreSQL is implemented on the basis of standard SQL clause WITH. Let's get a little deeper: non-recursive WITH allows you to save on repeated subqueries, split a complex query into several smaller ones, is a convenient shortcut for referring to a subquery and is convenient in terms of saving time when writing code. as in the example below, it was possible to avoid using the subquery in WHERE by applying a temporary table top_regions formed specifically for this query.
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
WITH regional_sales AS ( SELECT region, SUM (amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > ( SELECT SUM (total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM (quantity) AS product_units, SUM (amount) AS product_sales FROM orders WHERE region IN ( SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter .
Adding an optional RECURSEVE statement allows a Postgre query to access its output. The query algorithm must consist of other parts. The first part is the base, usually returning one line from the starting point of the hierarchy or part of the hierarchy. Ie a place in the hierarchy where the countdown will start (for example, the root). and the second recursive part that will be associated with the temporary table we declared after WITH. The first and second parts are combined with the UNION or UNION ALL operator. so that we would set an example for this chaotic set of words.
Create a table that describes the structru of one company
CREATE TABLE KPO
(
"ID" character varying(55),
"DESCRIPTION" character varying(255),
"PARENT" character varying(55
) ;
after entering the data there:
Select * from kpo
ID | DESCRIPTION | PARENT |
---|
== ====== | ================================ | ======= |
KPO | KARACHAGANAK PETROLEUM OPERATING | {null} |
AKSAY | AKSAY | KPO |
URALSK | KPO | KPO |
LONDON | LONDON | KPO |
KPC | KPC | AKSAY |
U2 | UNIT-2 | AKSAY |
U3 | UNIT-3 | AKSAY |
PROD | PRODACTION | KPC |
MAINT | MAINTENANCE | AKSAY |
CMMS | CMMS TEAM | MAINT |
Now the recursive query itself:
- WITH RECURSIVE temp1 ("ID", "PARENT", "DESCRIPTION", PATH , LEVEL ) AS (
- SELECT T1. "ID", T1. "PARENT", T1. "DESCRIPTION", CAST (T1. "ID" AS VARCHAR (50)) as PATH , 1
- FROM KPO T1 WHERE T1. "PARENT" IS NULL
- union
- select T2. "ID", T2. "PARENT", T2. "DESCRIPTION", CAST (temp1. PATH || '->' || T2. "ID" AS VARCHAR (50)), LEVEL + 1
- FROM KPO T2 INNER JOIN temp1 ON (temp1. "ID" = T2. "PARENT"))
- select * from temp1 ORDER BY PATH LIMIT 100
* This source code was highlighted with Source Code Highlighter .
The first part (rows 2-3) returns to the temporary table the first row in this case, the root entry of our structure, from which the countdown will begin in our hierarchy. the second part (lines 4-5) adds to the same temporary table the records associated with the string already contained in temp1 via JOIN (ID = PARENT) and so on until the end until all the leaves of our ROOTa are in temp1.
Also in this example, the Orakolavskaya function sys_connect_by_path was simulated.
"ID" | "PARENT" | "DESCRIPTION" | "Path" | "Level" |
---|
KPO | | KARACHAGANAK PETROLEUM OPERATING | KPO | one |
AKSAY | KPO | AKSAY | KPO-> AKSAY | 2 |
KPC | AKSAY | KPC | KPO-> AKSAY-> KPC | 3 |
PROD | KPC | PRODAUCTION | KPO-> AKSAY-> KPC-> PROD | four |
MAINT | AKSAY | MAINTENANCE | KPO-> AKSAY-> MAINT | 3 |
CMMS | MAINT | CMMS TEAM | KPO-> AKSAY-> MAINT-> CMMS | four |
U2 | AKSAY | UNIT-2 | KPO-> AKSAY-> U2 | 3 |
U3 | AKSAY | UNIT-3 | KPO-> AKSAY-> U3 | 3 |
LONDON | KPO | LONDON | KPO-> LONDON | 2 |
URALSK | KPO | URALSK | KPO-> URALSK | 2 |
Postgre does not have a built-in looping test, so if we received the data from the guys who were directly involved in creating the structure in Excel, we should check this structure for integrity. sometimes it is enough to use UNION instead of UNION ALL, but this is only sometimes. If you set the starting point in the hierarchy in the first part, and if there is even a break in the hierarchy, in principle, having started the above-mentioned error there will not be an error, just the lines “tweaks” will be ignored. But we need to know where the error is, and we can implement this by introducing an additional check before performing UNION.
- WITH RECURSIVE temp1 ("ID", "PARENT", "DESCRIPTION", PATH , LEVEL , cycle ) AS (
- SELECT T1. "ID", T1. "PARENT", T1. "DESCRIPTION", cast ( array [T1. "ID"] as varchar (100) []), 1, FALSE
- FROM KPO T1
- union all
- select T2. "ID", T2. "PARENT", T2. "DESCRIPTION", cast (temp1. PATH || T2. "ID" as varchar (100) []), LEVEL + 1,
- T2. "ID" = any (temp1. PATH )
- FROM KPO T2 INNER JOIN temp1 ON (temp1. "ID" = T2. "PARENT") AND NOT CYCLE )
- select * from temp1 WHERE CYCLE IS TRUE LIMIT 100;
* This source code was highlighted with Source Code Highlighter .
Here, as you see, the same Path field is created but all previous parents are organized in an array, which makes it possible for us to compare each new “ID” to a duplicate. we don’t use this line to search for descendants anymore, so looping is avoided (union all ... WHERE ... AND NOT CYCLE).
Advice from the official site: use the LIMIT operator as I did in the examples. This will help you keep your nerves.
Practical examples
Horsch's theory of course, but where all this can be applied in practice, tebolle in light of the cost of such requests.
In addition, it would be nice to draw a hierarchy in one query, and to find for example all the leaves of the current “ID” there are also other tasks.
Often you need to make changes, such as for example changing the phone code to all employees of a particular department in the telephone directory. Of course, you can use the column in the table of workers or even prefix the “ID”. but all this makes the base non-flexible and non-scalable. It is much better to make a separate Workers table that will display a hierarchy with three columns “ID”, “PARENT“, “HIERARCHID”. the “HIERARCHID” field will allow you to make more than one hierarchical structure. The table is called for an example ANCESTOR which will also consist of three columns “ID”, “ANCESTOR”, “HIERARCHID”. in the “ANCESTOR” field all ancestors will be contained, remember the “Path” array from Example 3. So this table can be filled with a recursive query.
- insert into ancestor (“ID” "ANCESTOR", "HIERARCHID")
- WITH RECURSIVE temp1 ("ID", "PARENT", "DESCRIPTION", PATH , LEVEL , cycle ) AS (
- SELECT T1. "ID", T1. "PARENT", T1. "DESCRIPTION", cast ( array [T1. "ID"] as varchar (100) []), 1, FALSE
- FROM KPO T1
- union all
- select T2. "ID", T2. "PARENT", T2. "DESCRIPTION", cast (temp1. PATH || T2. "ID" as varchar (100) []), LEVEL + 1,
- T2. "ID" = any (temp1. PATH )
- FROM KPO T2 INNER JOIN temp1 ON (temp1. "ID" = T2. "PARENT") AND NOT CYCLE
- )
- select "ID" AS "LOCATION", PATH [1] AS "ANCESTOR", 'DEPT' AS "DID" from temp1
* This source code was highlighted with Source Code Highlighter .
such a sign will turn out
LOCATION | ANCESTOR | HIERARCHID |
---|
========= | ========= | ========== |
AKSAY | AKSAY | DEPT |
AKSAY | KPO | DEPT |
CMMS | KPO | DEPT |
CMMS | MAINT | DEPT |
CMMS | CMMS | DEPT |
CMMS | AKSAY | DEPT |
KPC | AKSAY | DEPT |
KPC | KPC | DEPT |
KPC | KPO | DEPT |
KPO | KPO | DEPT |
LONDON | LONDON | DEPT |
LONDON | KPO | DEPT |
MAINT | AKSAY | DEPT |
MAINT | MAINT | DEPT |
MAINT | KPO | DEPT |
PROD | KPO | DEPT |
PROD | AKSAY | DEPT |
PROD | KPC | DEPT |
PROD | PROD | DEPT |
U2 | AKSAY | DEPT |
U2 | KPO | DEPT |
U2 | U2 | DEPT |
U3 | U3 | DEPT |
U3 | AKSAY | DEPT |
U3 | KPO | DEPT |
URALSK | KPO | DEPT |
URALSK | URALSK | DEPT |
Now all our selects and updates will be done using this table. for example, with the same phone numbers, it would look something like this:
Update EMPLOYEE SET “TELNUM” = '545-454' FROM ANCESTOR WHERE “ANCESTOR”.”ANCESTOR” = 'AKSAY' AND EMPLOYEE.”LOCATION” = ANCESTOR.”LOCATION” AND EMPLOYEE.ISDPRT = 'N' ;
Moreover, it will be necessary to foresee a trigger for updating the table when adding a new one or deleting an entry in the KPO table.
Continue
Suppose that there is a table in which logs are recorded, imagine that you need to count how many records for the previous month were made and grouped by day. for this you will need a reference calendar for the past month, for example, so as not to miss the day when there were no records. such a calendar (list of days in one column) can be obtained by such a request.
- WITH RECURSIVE t (n) AS (
- VALUES (1)
- UNION ALL
- SELECT n + 1 FROM t WHERE n <( SELECT cast ( extract ( 'day' from date_trunc ( 'month' , current_date ) - interval '24 hour ' ) as integer ))
- )
- SELECT to_date (n || '-' || extract ( 'month' from date_trunc ( 'month' , current_date ) - interval '24 hour ' )
- || '-' || extract ( 'year' from date_trunc ( 'month' , current_date ) - interval '24 hour ' ), ' dd-mm-yyyy ' )
- FROM t;
* This source code was highlighted with Source Code Highlighter .
Another not very useful example for a snack. The original idea of ​​Graeme Joba (Graeme Job). the result is better to look at the text file
- WITH RECURSIVE z (ix, iy, cx, cy, x, y, i) AS (
- SELECT ix, iy, x :: float , y :: float , x :: float , y :: float , 0
- FROM ( select -1.88 + 0.016 * i, i from generate_series (0,150) as i) as xgen (x, ix),
- ( select -1.11 + 0.060 * i, i from generate_series (0.36) as i) as ygen (y, iy)
- UNION ALL
- SELECT ix, iy, cx, cy, x * x - y * y + cx, y * x * 2 + cy, i + 1
- FROM z
- WHERE x * x + y * y <16 :: float
- AND i <27
- )
- SELECT array_to_string (array_agg ( substring ( '. ,,, ----- ++++ %%%% @@@@ ####' ,
- greatest (i, 1), 1)), '' )
- FROM (
- SELECT ix, iy, max (i) AS i
- FROM z
- GROUP BY iy, ix
- ORDER BY iy, ix
- ) AS zt
- GROUP BY iy
- ORDER BY iy
* This source code was highlighted with Source Code Highlighter .
Well, such a post to start ;-)