📜 ⬆️ ⬇️

As an sql query, retrieve information from the database that is not there.

Under such a tricky title is quite an easy task, but first a small introduction:


Users come and ask: “Here we have entered the data into the database, but tell us what is missing? What data we have not yet entered into the database and they are not enough for complete happiness? "
The first (and frankly, very stupid) reaction: “How can I find you something that is not in the database?”.


But let us drop emotions and apply logic. After all, as a rule, data are required, the formation of which obeys a certain rule - the numbers of receipts, references and so on ... And I assume that all these numbers and identifiers can be converted into natural sequence.
That is, the task will be formulated as follows: a sequence of natural numbers in which there are gaps is stored in the database, and it is necessary to output the missing numbers for the user.
In this formulation, the task already looks pretty simple. Moreover, there is a desire to implement this task with a single sql query.


Let's create a table and fill it with some data.


CREATE TABLE IF NOT EXISTS `Test` (`id` int(6) NOT NULL); INSERT INTO `Test` (`id`) VALUES (3), (5), (7), (8) , (9) , (11) , (12), (16) , (17) ; 

The basic idea is as follows: compare the table with itself and for each value of the X and find the minimum IGREC (which is still greater than the X), where (X + 1) and (IGREC - 1) are our boundaries for the missing ranges of numbers. Adding a logical condition that, (X + 1) should be no less (IGREC - 1) we get the following ranges: from 4 to 4, from 6 to 6, from 10 to 10 and from 13 to 15.
What are the nuances:
1) The first element of the sequence can be skipped (in our case it is 1)
2) The last element of the sequence is unknown (what if it is 22). You can, of course, request this information from the user, but experience suggests that it is better to avoid it.
3) The range "from 4 to 4" looks buggy, you just have to replace it with one number
4) The result is still desirable to obtain the value of a single line, rather than a set of lines


We take into account the comments and get a version of the script under MySQL:


 SELECT GROUP_CONCAT( ranges ) FROM ( SELECT CASE WHEN id2 IS NULL THEN CONCAT( id1, '...' ) WHEN id1 = id2 THEN id1 ELSE CONCAT( id1, '-', id2 ) END ranges FROM ( SELECT id +1 id1, ( SELECT MIN( id ) -1 FROM `Test` t2 WHERE t2.id > t1.id )id2 FROM `Test` t1 UNION SELECT 1 , MIN( id ) -1 FROM `Test` t3 )t WHERE id1 <= id2 OR id2 IS NULL ORDER BY id1 )tt 

and option under Oracle:


 SELECT LISTAGG (ranges, ', ') FROM ( SELECT CASE WHEN id2 IS NULL THEN TO_CHAR (id1) || '...' WHEN id1 = id2 THEN TO_CHAR (id1) ELSE TO_CHAR (id1) || '-' || TO_CHAR (id2) END ranges FROM ( SELECT id + 1 id1, (SELECT MIN (id) - 1 FROM TEST t2 WHERE t2.id < t1.id) id2 FROM TEST t1 UNION SELECT 1, MIN (id) - 1 FROM TEST t3) t WHERE id1 <= id2 OR id2 IS NULL ORDER BY id1 ) tt 

The result of their execution is the string '1-2, 4, 6, 10, 13-15, 18 ...'
First, this line contains what the users wanted.
Secondly, the result looks clear to any user.
And most importantly, the query displays data that is not actually stored in the database!


UPD1:


Many thanks to everyone who suggested options for improving the scripts, and just expressed interest in this issue.
As is often the case, comments turned out to be more interesting and useful than the original article.
Indeed, it was worth noting that nuance number 4 is not implemented by the standard SQL syntax and for this task it is necessary to attract additional functionality implemented on each separate DBMS in its own way.
Below are the scripts that I pulled from comments on the article.


MySQL option from asmm


 SELECT CONCAT(IFNULL(CONCAT(GROUP_CONCAT(miss_num), ','), '') , IFNULL(MAX(id) + 1, @start_num) , '...' ) miss_num FROM ( SELECT @prev_id prev_id , CASE WHEN @prev_id + 1 = id THEN NULL WHEN @prev_id + 2 = id THEN @prev_id + 1 ELSE CONCAT(@prev_id + 1, '-', id - 1) END miss_num , @prev_id := id id FROM (SELECT @start_num := 1 start_num, @prev_id := @start_num - 1 prev_id) p , `Test` t WHERE t.id >= p.start_num ORDER BY t.id ) t 

Option for Oracle from xtender


 select listagg(id1||decode(id2 ,id1 ,null ,null,'...' ,'-'||id2) ,',') within group(order by id1)s from (select max(id)+1 id1 ,lead(min(id)) over(order by min(id)) id2 from (select 0 id, 0 rn from dual union all select id,row_number()over(order by id) rn from test) group by id - rn) 

Option for MSSQL from yizraor


 select rlist = ( select "text()" = iif(id1 < id2, convert(varchar(15), id1) + iif(id1 < (id2 - 1), '-' + convert(varchar(15), id2 - 1), ''), '') + iif(id3 is null, iif(id1 < id2, ', ', '') + convert(varchar(15), id2 + 1) + '...', ', ') from ( select id1 = isnull(lag(id) over (order by id), 0) + 1, id2 = id, id3 = lead(id) over (order by id) from test ) t where ( id1 < id2 ) or ( id3 is null ) order by id2 for xml path('') ) 

')

Source: https://habr.com/ru/post/303364/


All Articles