📜 ⬆️ ⬇️

Oracle SQL: Model dialect in examples. Part 1

image Any developer dealing with the generation of reports from databases regularly encounters the construction of cumbersome queries. Often this is due to database design errors, and, more often, with the difficulties of converting the extracted data. The latter include the use of iterative methods of calculation, calculation of subtotals by subgroups, calculations that use the values ​​of the adjacent sample lines, complex formatting of lines and similar tasks. Such transformations are often moved from the database level to the application server or client level, which affects the performance and convenience of maintaining the code. To solve these problems, SQL DBMS Oracle provides analytical functions and the MODEL operator - this is the topic of this article.

This SELECT extension is now available with DBMS version 10g. MODEL allows you to access sample data as multidimensional arrays, modify and add elements, perform complex aggregation, and solve a number of tasks that previously required the use of PL / SQL. At the same time, language constructs remain readable and declarative. In a word - Excel-like, plus the entire burden falls on the shoulders of the database server.

Syntax


MODEL [ IGNORE NAV ] [ RETURN UPDATED ROWS ]
[ PARTITION BY (partition_column_1, ...)]
DIMENSION BY (dimension_column_1, ...)
MEASURES (measured_column_1, ...)
RULES [ AUTOMATIC ORDER | ITERATE (value) [ UNTIL (expression)]] (
rule_1, ...
);

The MODEL operator is processed among the latter, after it only DISTINCT and ORDER BY. As a result of the application, sample columns are mapped into measured_column_ * arrays with dimension_column_ * dimensions . The optional parameter PARTITION BY defines paritizations similar to those used in analytic functions (each of them is processed by rule_ * rules as an independent array). The rules apply in the order listed.
')

The simplest examples


First, let's model a sample of numbers 1, 2, and 3:

SELECT *
FROM dual
MODEL DIMENSION BY (0 dimension)
MEASURES (dummy)
RULES (
dummy[5] = 1,
dummy[6] = 2,
dummy[7] = 3
);

DIMENSION R
---------- -
0 X
7 3
6 2
5 1

In this case, according to the three rules, the dummy array is filled with the dimension dimension . The dimension 0 alias is defined to add a new column. Let's sort the transformation in more detail. First, the selection and mapping of sampling columns ( 0 dimension in DIMENSION, dummy in MEASURES) occurs, then sampling is performed on these columns (the string dummy = X is returned , dimension = 0 ) and only after that the rules are fulfilled. First we search for a string with dimension = 5 , but since it is not located, a new one is created and dummy = 1 is filled in, similarly for the remaining two rules. If necessary, using the RETURN UPDATED ROWS directive, you can output only updated lines:

SELECT result, dummy
FROM dual
MODEL RETURN UPDATED ROWS
DIMENSION BY (dummy)
MEASURES (0 result)
RULES (
result[5] = 1,
result[6] = 2,
result[7] = 3
);

RESULT DUMMY
---------- ------
3 7
2 6
1 5

You can also apply rule sets in loops. The following query calculates several elements of the Fibonacci sequence:

SELECT sequence
FROM dual
MODEL DIMENSION BY (0 dimension)
MEASURES (0 sequence)
RULES ITERATE (100500) UNTIL (sequence[iteration_number] > 10) (
sequence[iteration_number] =
CASE iteration_number
WHEN 0 THEN 0
WHEN 1 THEN 1
ELSE sequence[iteration_number - 2] + sequence[iteration_number - 1]
END
);

SEQUENCE
----------
0
1
1
2
3
5
8
13

ITERATE specifies the number of iterations of the loop (starting from 0), and the optional UNTIL directive is a condition for exiting it (which worked, judging by the fact that you have not pressed Ctrl + End yet). Access to the counter is carried out through the variable iteration_number .

Ranges and Aggregations


image Above, we have seen examples with the calculation of the values ​​of individual elements. In addition, rules can be set for their groups, filtered by specified conditions. To do this, use the second type of links - symbolic (previously only positional). Indexes in symbolic links can contain validation conditions, for example:
In positional references, the index values ​​are uniquely determined:
With the help of symbolic links, you cannot create new elements - just update existing ones (if you really want, then, of course, you can, but more on that in the next part). Positional allow both to update and create. With the ranges specified by symbolic links, it is permissible to use any aggregate functions. Analytical functions inside the rules are prohibited.

Consider a table storing information about coffee consumed during the week:

SELECT * FROM coffee;

TYPE CNT DAY
-------------------- ---------- ----------
turkish 1 1
espresso 1 1
turkish 2 2
black 1 2
espresso 1 2
latte 3 3
black 2 4
ice 1 4

Suppose we want to report on how many cups of coffee are drunk on each of the days and in general, at the same time, we take into account that on Thursday the portions of black coffee were double. So the query:

SELECT *
FROM coffee
MODEL DIMENSION BY (day, type)
MEASURES (cnt)
RULES (
cnt[4, 'black' ] = cnt[cv(day), 'black' ] * 2,
cnt[ FOR day FROM 1 TO 4 INCREMENT 1, ' total for day' ] = sum<(cnt)[cv(day), ANY ],
cnt[ NULL , 'GRAND TOTAL' ] = sum(cnt)[ ANY , ' total for day' ]
)
ORDER BY day, type DESC ;

DAY TYPE CNT
--------- -------------------- ---------
1 turkish 1
1 espresso 1
1 total for day 2
2 turkish 2
2 espresso 1
2 black 1
2 total for day 4
3 latte 3
3 total for day 3
4 ice 1
4 black 4
4 total for day 5
GRAND TOTAL 14

Let's sort the rules in more detail. The first doubles the amount of coffee drunk on Thursday. The function cv (dimension_name) returns the current index value by dimension dimension_name for the element being processed (i.e., in this case, instead of cv (day), you could specify day = 4 or, if you wish, refer to the previous day as day = cv ( day) - 1) . The second rule calculates subtotals from Monday to Thursday. In addition to the cycle (which is already obvious), here we use references to the ranges of elements in the right-hand sides of the equations. You can specify ranges in the same ways as the checks in the WHERE clause; the additional keyword ANY is used to select any index values. Links to ranges in the right-hand side of the equation must be aggregated, in this case the sum function is used. And finally, the third rule considers the sum of subtotals.

Moving on. Consider a query that displays the subtotals of a drink by type of coffee, the amount of latte drunk on Tuesday, and also a message about whether the goal of the week was achieved - to drink an espresso on Wednesday:

SELECT *
FROM coffee
MODEL DIMENSION BY (day, type)
MEASURES (cnt)
RULES (
cnt[ NULL , FOR type IN ( SELECT DISTINCT type FROM coffee)] = sum(cnt)[ ANY , cv(type)],
cnt[ NULL , 'GRAND TOTAL' ] = sum(cnt)[ NULL , ANY ],
cnt[ NULL , ' drank ' || cnt[2, 'latte' ] || ' cups of latte on 2 day' ] = NULL ,
cnt[ NULL , CASE
WHEN cnt[3, 'espresso' ] IS PRESENT THEN ' ACHIEVED'
ELSE ' FAILED'
END || ': drank espresso on 3 day' ] = NULL
)
ORDER BY day, type DESC ;

DAY TYPE CNT
---------- ---------------------------------------- ----------
1 turkish 1
1 espresso 1
2 turkish 2
2 espresso 1
2 black 1
3 latte 3
4 ice 1
4 black 2
turkish 3
latte 3
ice 1
espresso 2
black 3
GRAND TOTAL 12
drank cups of latte on 2 day
FAILED: drank espresso on 3 day

The first rule uses a loop with iterations on the sample values. Nested queries used inside a MODEL must be uncorrelated. The following rule calculates the total amount. Pay attention to the line "drank cups of latte on 2 day". Because the cnt element [2, 'latte'] was not found, we got NULL by reference. This behavior can be changed with the IGNORE NAV directive (added after the word MODEL), then instead of the missing elements and NULL, the calculations will substitute: 0 for numbers, January 1, 2001 for dates, empty string for string types and NULL for everything else. And finally, the fourth rule demonstrates the use of the expression IS PRESENT, it returns true if the given element exists, but, alas, no espresso was drunk on Wednesday.

This introductory is complete, I hope you got a general idea and got used to the syntax. In the second part, we will talk about managing the update and creating elements, controlling the output of MODEL, questions of the applicability of the method and its performance. And, of course, more complex examples will be analyzed.

Read continuation

Literature

  1. Oracle® Database Data Warehousing Guide 11g Release 2 - Chapter 22 "SQL for Modeling". HTML
  2. The SQL Model Clause of Oracle Database 10g. PDF

Source: https://habr.com/ru/post/101003/


All Articles