📜 ⬆️ ⬇️

SQL Olympiad: analyzing a calendar task

Hello, on the radio SQL!


We continue the theme of popularizing the SQL language among the broad masses of the IT population of our planet, this time in its Russian-speaking part. However, residents of other planets, too, pull up.


Tune in to our gravitational wave, brush the slime, straighten your shells and get comfortable - we start! ..


In this article, I am going to analyze a calendar task that I gave at the SQL Olympiad , about which I wrote earlier. Fascinating recursion and cryptic aggregate functions, subqueries, and armed groupings of data - all this awaits us today!


I draw your attention that this is exactly the analysis, and not a ready-made solution. To avoid a blunt copy-paste , I intend to take a couple of actions that will get the finished result only to those who work a little head.


First, I will not give the complete and final request code. To get the final solution, you have to intelligently assemble all the parts of the request into one. For humans, it is easy, just use the head ganglia. I will also omit some of the bulky, but completely uninteresting parts (such as aligning the names of the months in the center) for refining by a file in their own place. Accordingly, the collected result without some refinement will not formally be the correct solution to the original problem. But I don’t care about a drop, as my goal is to show how such problems are solved in principle, and not to get the finished result in this particular case.


Secondly, I will take another, not SQL SQL dialect. Of course, any somewhat non-trivial task requires all sorts of buns, which in different SQL variants are supported slightly differently, causing thoughts that our matrix still fails slightly. We will essentially need CTEs that allow us to assemble parts of a query with the WITH ... subexpression, and the input parameters in the statement of the problem are given as such. We will also need recursive queries or their analogue to generate sequences of a previously unknown length, and, finally, the aggregate function of stitching together strings to put everything together. With such soft limitations, the role of SQL server can claim any coffee grinder almost anything where in the description is an abbreviation of SQL. This PostgreSQL, and SQLite, and even MySQL has finally begun to support CTE. Commercial databases by themselves are all able for a long time.


After some time, I chose PostgreSQL to take a closer look at how it will look like in comparison with the Oracle database. Expressing all the necessary solution steps in a different SQL dialect should not be a problem, I personally coped with this quickly. Let me remind you that at the Olympiad, from where the task was taken, it was Oracle SQL that was used, on which the reference solution was originally written. Well, it will be more interesting for me, not only entertainment for the public.


Well, joking aside, proceed to the analysis. I will remind the condition.


Task number 1. The calendar

Write one SQL query that generates a pocket calendar. The parameters for the task indicate the year of the calendar and the number of rows and columns to form a matrix of months. Parameters are set by the following query:


 with param(year, c, r) (…) 

where respectively


  • year - the year of the calendar
  • c - the number of columns of the calendar matrix
  • r is the number of rows of the matrix.

Months are located in the cells of the calendar matrix in order from left to right and then from top to bottom. The numbers in each month are arranged according to the days of the week, the first day of the week in the first column, and so on. The beginning of the week should correspond to the database localization settings at the time of launching the request. The name of the month is also taken from the localization settings and is displayed in the center above the numbers. Between the months you need to leave a gap so that the numbers of the neighboring months “do not stick together”. The very first line should be centered year. Empty lines should not be.


For example, with the following parameters:


 with param(year, c, r) as (select 2016, 3, 4 from dual) 

The following query output should result:


  2016    1 2 3 1 2 3 4 5 6 7 1 2 3 4 5 6 4 5 6 7 8 9 10 8 9 10 11 12 13 14 7 8 9 10 11 12 13 11 12 13 14 15 16 17 15 16 17 18 19 20 21 14 15 16 17 18 19 20 18 19 20 21 22 23 24 22 23 24 25 26 27 28 21 22 23 24 25 26 27 25 26 27 28 29 30 31 29 28 29 30 31    1 2 3 1 1 2 3 4 5 4 5 6 7 8 9 10 2 3 4 5 6 7 8 6 7 8 9 10 11 12 11 12 13 14 15 16 17 9 10 11 12 13 14 15 13 14 15 16 17 18 19 18 19 20 21 22 23 24 16 17 18 19 20 21 22 20 21 22 23 24 25 26 25 26 27 28 29 30 23 24 25 26 27 28 29 27 28 29 30 30 31    1 2 3 1 2 3 4 5 6 7 1 2 3 4 4 5 6 7 8 9 10 8 9 10 11 12 13 14 5 6 7 8 9 10 11 11 12 13 14 15 16 17 15 16 17 18 19 20 21 12 13 14 15 16 17 18 18 19 20 21 22 23 24 22 23 24 25 26 27 28 19 20 21 22 23 24 25 25 26 27 28 29 30 31 29 30 31 26 27 28 29 30    1 2 1 2 3 4 5 6 1 2 3 4 3 4 5 6 7 8 9 7 8 9 10 11 12 13 5 6 7 8 9 10 11 10 11 12 13 14 15 16 14 15 16 17 18 19 20 12 13 14 15 16 17 18 17 18 19 20 21 22 23 21 22 23 24 25 26 27 19 20 21 22 23 24 25 24 25 26 27 28 29 30 28 29 30 26 27 28 29 30 31 31 

Solution approach


What I really like about this task is that it does not need any special knowledge of algorithms or anything else specific to computer science to solve it. Well, that is, you need to be able to program at all, it is useful to know what recursion is, etc. In my opinion all this is still taking place at school. But the task falls into several steps, each of which has its own nuances. As a result, the task turned out not too complicated, but also not to say so trivial.


The only really nontrivial thing that is there is to imagine how you can generate a calendar matrix when the width and height are given by the parameters. Everything is quite simple with height - all the SQL dialects somehow allow you to generate the number of records specified by the parameter by queries. These are usually recursive queries, although sometimes special constructions come across. For example, in the same PostgreSQL there was a special construct generate_series(MIN, MAX) , which generates a series of values ​​from MIN to MAX. You can also use the "classic" recursive query of the form:


 with recursive seq(n) as ( select MIN union all select n+1 from seq where n<MAX) 

but the special design will be shorter. So you can get the desired number of lines.


Now we decide on how to generate the number of columns specified by the parameter. In principle, everything is the same as with the lines above, you can generate the required number of records. And then, when they need to be output, we group and glue these records together with an aggregate function for working with strings. In PostgreQSL for this, a suitable string_agg() function was found:


 select string_agg(t::text,'-') from generate_series(MIN,MAX) as s(t); 

With the help of this technique, we will generate a matrix blank for the calendar, in which horizontally and vertically will be the required number of months, specified in the parameters. Each month we will be presented in the form of familiarity in 6 rows with 7 columns for each of the days of the month, like this:


 xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx 

Call each such construction a month, so it will be convenient to refer on. We will arrange the numbers according to the days of the week by month columns. And in rows in accordance with the number of the week in the month - first fill in the first line, then the second and so on. You need seven columns by the number of days of the week, let it be six rows. More than six weeks in a month can not be. Of course, I mean the earthly Gregorian calendar. The request to residents of other planets to treat with understanding, this task was invented primarily for earthlings of the XXI century (013 in the tenture, to the left of the BM).


Now that everything has become clear with the most non-trivial, we will deal with the remaining technical details. We will need to generate all the days of the year, then to place them in the matrix obtained above. There may be a hitch in order to correctly determine this number of days. For example, given the fact that the year may be a leap year. Or here in the terrestrial Gregorian calendar in 1582 there are no days from October 5 to October 14 (and Oracle honestly shows this!). So the days of the desired year will be received as follows: all days from the first day of the year specified in the parameters, and up to (but not including) the first day of the next year.


And the last step we will need to carefully put everything together, add the names of the months and the year number on top, remove the empty lines, etc.


Total in order:


  1. Generate all days of the year.
  2. Generate a blank matrix with the required number of rows and columns.
  3. Write the days of the year in the matrix of the calendar in the right places.
  4. Put it all together to display the results.

Implementation


Go.


Here are our initial parameters for the task:


 with params(year, r, c) as (select 2016, 3, 4) 

Generation of all days of the year. The days themselves can be generated via generate_series(START_DATE, END_DATE) , specifying the beginning of the first day of the year and the end of the previous day from the first day of the next year. Next, besides the date itself, we will need to get some useful data from it that will be useful to us: the number of the day of the week, the number of the month, the number of the day in the month and the day of the week for the first day of the month. We can get this data as needed, but it will be cumbersome, it is better to count right away. Having looked at the documentation on the functions of working with dates in PostgreSQL, I see that for this you can use the extract() function.


 ... days(day, moy, dom, dow, fdow) as (select d -- day of year (date) , extract(month from d)::int-1 -- month, 0-11 , extract(day from d)::int -- day of month, 1-31 , extract(isodow from d)::int-1 -- day of week, 0-6 , extract(isodow from date_trunc('month', d))::int-1 -- day of week of first day in month, 0-6 from params p , generate_series( (p.year ||'-01-01')::date , ((p.year+1)||'-01-01')::date - 1, '24:00') as s(d)) 

Now we will generate a calendar matrix with the required number of months in columns and rows. For the convenience of further placement of dates in places, in each month of the place (which is 7x6 cells), we immediately write down the number of the month that we will substitute for this place, and the position number of the day in the month in order. It will take a certain amount of obscure integer arithmetic with division completely and residues, but then the arrangement of dates will be simple and convenient:


 ... matrix(c, r, moy, pos) as (select cc, rr, cc/7 + rr/6*pc, cc%7 + rr%6*7 + 1 from params p , generate_series(0, pc*7-1) as c(c) -- columns , generate_series(0, pr*6-1) as r(r)) -- rows 

Now we gather together, setting the days in places. As I said, taking into account the preliminary preparation, this is done on time:


 ... cal (r,c,dom) as (select r,c,dom from matrix m left outer join days d on d.moy = m.moy -- same month and d.fdow+d.dom = m.pos -- position is day no plus weekday of first day ) 

Which is typical, if in the parameters of the task the calendar matrix in size is more or less than 12 months, then everything will work correctly. Either all the months will be filled and the part will remain empty, or the extra months will not fit, but in both cases the matrix will not go away.


Everything, the main part is done. It remains to carefully put everything together in cal_all . Let's start with the month with the numbers:


 ... cal_all (no, line) as (-- days in cal matrix select r, string_agg(lpad(coalesce(dom::text,' '), 3+case when c%7=0 then 2 else 0 end) ,'' order by c) from cal group by r ... 

Here the function string_agg() sticks together in one line everything that fell into one row of the calendar matrix. In this case, the empty day values ​​are replaced with spaces, and all numbers are padded to the left with spaces to align them. And within one month, 3 places are assigned to each number, and between months (condition c%7=0 ) - 5 each. This allows you to visually separate months from each other. Also note that we save the line number. It will determine the correct order in the final output.


Then we add the names of the months here. To do this, we select only the first days of the month from the days representation already made, group them by params.c pieces in a string, and glue them using string_agg() . If the calendar matrix does not fit all the months, then take the names of only those that fit. We do not forget to add each name with spaces so that the months are above their months, and also give each resulting line a number so that in the final sorting the months fall into place. That is, the first line with months must stand before the first line with numbers, the second before the seventh (remember that we took six lines a month away?) And so on. Everything comes together like this:


 ... union all -- month names select (moy/pc)*6-0.5, string_agg(lpad(to_char(day, 'Month'), 7*3+2) , '' order by moy) from days d, params p where dom = 1 -- first days of months only and moy < pr*pc group by (moy/pc) 

It remains to top in the middle to finish the year:


 ... union all select -1, lpad(year::text, (7*3+2)*c/2+length(year::text)/2) from params 

It now remains to select all the lines in the correct order from cal_all and discard the empty ones, if any:


 ... select line from cal_all where trim(line) <> '' order by no 

This is the final part of our request.


What is behind the scenes. It is params2 to make a representation of params2 and put in it constants like "number of familiarity for each day of the month" and "number of spaces between the months". Because if you suddenly need to change them, you will have to look for these numbers in code that is not always obvious, which is fraught with errors. Well, I have simplified all the alignment functions in order not to overload the code. And according to the condition of the problem, everything needs to be centered.


findings


The main and the main one, which I tried to convey to the audience: there is nothing too difficult in solving such problems.


What can you say about PostgreSQL in comparison with Oracle on the basis of this task. Functionally, everything corresponds, you can express approximately the same and approximately the same. Some nuances are more convenient in one dialect, some in another. The functions are different, but the documentation is given to us. Is there a significant difference in something? Yes there is. On the example of this task, I see at least in two places.


First, Oracle supports locales, and in different locales a week can start on different days of the week. For example, in most parts of Europe, the week starts on Monday, and in the US from Sunday. In PostgreSQL, there are no locale settings for the first day of the week and it is impossible to generate a calendar so that it starts from the day familiar to the user.


Secondly, there is support for date conversion and work with the calendar. In Oracle, October 4, 1582 is October 15 (as determined when entering the Gregorian calendar), there are 05 in PostgreSQL and all the other numbers of October 1582. The question is not as simple as it may seem at once, there is even a special section in the PostgreSQL documentation explaining the problem and why it has been solved in this way in PostgreSQL. But the fact remains: the calendars in Oracle and PostgreSQL are different, although both are Gregorian, and, accordingly, the logic of working with dates is significantly different. This can be important when porting.


If you liked the presentation, and you have ideas about what else you need to write about, I’m waiting for you in the comments. You can also answer the question below, whether you support the publication of the analysis of Olympiad problems.


At this today I say goodbye to our audience, stay tuned! ..


')

Source: https://habr.com/ru/post/359064/


All Articles