Geekly Articles each Day

Any developer dealing with the generation of reports from databases regularly encounters the construction of cumbersome queries. Often this is due to database design errors, and, more often, with the difficulties of converting the extracted data. The latter include the use of iterative methods of calculation, calculation of subtotals by subgroups, calculations that use the values of the adjacent sample lines, complex formatting of lines and similar tasks. Such transformations are often moved from the database level to the application server or client level, which affects the performance and convenience of maintaining the code. To solve these problems, SQL DBMS Oracle provides analytical functions and the MODEL operator - this is the topic of this article.

This SELECT extension is now available with DBMS version 10g. MODEL allows you to access sample data as multidimensional arrays, modify and add elements, perform complex aggregation, and solve a number of tasks that previously required the use of PL / SQL. At the same time, language constructs remain readable and declarative. In a word - Excel-like, plus the entire burden falls on the shoulders of the database server.

#### Syntax

The MODEL operator is processed among the latter, after it only DISTINCT and ORDER BY. As a result of the application, sample columns are mapped into*measured_column_ ** arrays with *dimension_column_ * dimensions* . The optional parameter PARTITION BY defines paritizations similar to those used in analytic functions (each of them is processed by *rule_ ** rules as an independent array). The rules apply in the order listed.

')

#### The simplest examples

First, let's model a sample of numbers 1, 2, and 3:

In this case, according to the three rules, the*dummy* array is filled with the dimension *dimension* . The *dimension 0* alias is defined to add a new column. Let's sort the transformation in more detail. First, the selection and mapping of sampling columns ( *0 dimension* in DIMENSION, *dummy* in MEASURES) occurs, then sampling is performed on these columns (the string *dummy = X is* returned *, dimension = 0* ) and only after that the rules are fulfilled. First we search for a string with *dimension = 5* , but since it is not located, a new one is created and *dummy = 1 is* filled in, similarly for the remaining two rules. If necessary, using the RETURN UPDATED ROWS directive, you can output only updated lines:

You can also apply rule sets in loops. The following query calculates several elements of the Fibonacci sequence:

ITERATE specifies the number of iterations of the loop (starting from 0), and the optional UNTIL directive is a condition for exiting it (which worked, judging by the fact that you have not pressed Ctrl + End yet). Access to the counter is carried out through the variable*iteration_number* .

#### Ranges and Aggregations

Above, we have seen examples with the calculation of the values of individual elements. In addition, rules can be set for their groups, filtered by specified conditions. To do this, use the second type of links - symbolic (previously only positional). Indexes in symbolic links can contain validation conditions, for example:

Consider a table storing information about coffee consumed during the week:

Suppose we want to report on how many cups of coffee are drunk on each of the days and in general, at the same time, we take into account that on Thursday the portions of black coffee were double. So the query:

Let's sort the rules in more detail. The first doubles the amount of coffee drunk on Thursday. The function*cv (dimension_name)* returns the current index value by dimension *dimension_name* for the element being processed (i.e., in this case, instead of *cv (day),* you could specify *day = 4* or, if you wish, refer to the previous day as *day = cv ( day) - 1)* . The second rule calculates subtotals from Monday to Thursday. In addition to the cycle (which is already obvious), here we use references to the ranges of elements in the right-hand sides of the equations. You can specify ranges in the same ways as the checks in the WHERE clause; the additional keyword ANY is used to select any index values. Links to ranges in the right-hand side of the equation must be aggregated, in this case the *sum* function is used. And finally, the third rule considers the sum of subtotals.

Moving on. Consider a query that displays the subtotals of a drink by type of coffee, the amount of latte drunk on Tuesday, and also a message about whether the goal of the week was achieved - to drink an espresso on Wednesday:

The first rule uses a loop with iterations on the sample values. Nested queries used inside a MODEL must be uncorrelated. The following rule calculates the total amount. Pay attention to the line "drank cups of latte on 2 day". Because the*cnt* element *[2, 'latte'] was* not found, we got NULL by reference. This behavior can be changed with the IGNORE NAV directive (added after the word MODEL), then instead of the missing elements and NULL, the calculations will substitute: 0 for numbers, January 1, 2001 for dates, empty string for string types and NULL for everything else. And finally, the fourth rule demonstrates the use of the expression IS PRESENT, it returns true if the given element exists, but, alas, no espresso was drunk on Wednesday.

This introductory is complete, I hope you got a general idea and got used to the syntax. In the second part, we will talk about managing the update and creating elements, controlling the output of MODEL, questions of the applicability of the method and its performance. And, of course, more complex examples will be analyzed.

Read continuation

#### Literature

This SELECT extension is now available with DBMS version 10g. MODEL allows you to access sample data as multidimensional arrays, modify and add elements, perform complex aggregation, and solve a number of tasks that previously required the use of PL / SQL. At the same time, language constructs remain readable and declarative. In a word - Excel-like, plus the entire burden falls on the shoulders of the database server.

`MODEL [ IGNORE NAV ] [ RETURN UPDATED ROWS ]`

[ PARTITION BY (partition_column_1, ...)]

DIMENSION BY (dimension_column_1, ...)

MEASURES (measured_column_1, ...)

RULES [ AUTOMATIC ORDER | ITERATE (value) [ UNTIL (expression)]] (

rule_1, ...

);

The MODEL operator is processed among the latter, after it only DISTINCT and ORDER BY. As a result of the application, sample columns are mapped into

')

First, let's model a sample of numbers 1, 2, and 3:

`SELECT *`

FROM dual

MODEL DIMENSION BY (0 dimension)

MEASURES (dummy)

RULES (

dummy[5] = 1,

dummy[6] = 2,

dummy[7] = 3

);

DIMENSION R

---------- -

0 X

7 3

6 2

5 1

In this case, according to the three rules, the

`SELECT result, dummy`

FROM dual

MODEL RETURN UPDATED ROWS

DIMENSION BY (dummy)

MEASURES (0 result)

RULES (

result[5] = 1,

result[6] = 2,

result[7] = 3

);

RESULT DUMMY

---------- ------

3 7

2 6

1 5

You can also apply rule sets in loops. The following query calculates several elements of the Fibonacci sequence:

`SELECT sequence`

FROM dual

MODEL DIMENSION BY (0 dimension)

MEASURES (0 sequence)

RULES ITERATE (100500) UNTIL (sequence[iteration_number] > 10) (

sequence[iteration_number] =

CASE iteration_number

WHEN 0 THEN 0

WHEN 1 THEN 1

ELSE sequence[iteration_number - 2] + sequence[iteration_number - 1]

END

);

SEQUENCE

----------

0

1

1

2

3

5

8

13

ITERATE specifies the number of iterations of the loop (starting from 0), and the optional UNTIL directive is a condition for exiting it (which worked, judging by the fact that you have not pressed Ctrl + End yet). Access to the counter is carried out through the variable

Above, we have seen examples with the calculation of the values of individual elements. In addition, rules can be set for their groups, filtered by specified conditions. To do this, use the second type of links - symbolic (previously only positional). Indexes in symbolic links can contain validation conditions, for example:

- cnt [day <6, type LIKE 'latt%']
- cnt [day IN (3, 6), cv (type)]
- cnt [day BETWEEN 1 AND 16, regexp_like (type, '^. + (sso | tte) $')]

- cnt [2, 'black']
- cnt [7, 'latte']

Consider a table storing information about coffee consumed during the week:

`SELECT * FROM coffee;`

TYPE CNT DAY

-------------------- ---------- ----------

turkish 1 1

espresso 1 1

turkish 2 2

black 1 2

espresso 1 2

latte 3 3

black 2 4

ice 1 4

Suppose we want to report on how many cups of coffee are drunk on each of the days and in general, at the same time, we take into account that on Thursday the portions of black coffee were double. So the query:

`SELECT *`

FROM coffee

MODEL DIMENSION BY (day, type)

MEASURES (cnt)

RULES (

cnt[4, 'black' ] = cnt[cv(day), 'black' ] * 2,

cnt[ FOR day FROM 1 TO 4 INCREMENT 1, ' total for day' ] = sum<(cnt)[cv(day), ANY ],

cnt[ NULL , 'GRAND TOTAL' ] = sum(cnt)[ ANY , ' total for day' ]

)

ORDER BY day, type DESC ;

DAY TYPE CNT

--------- -------------------- ---------

1 turkish 1

1 espresso 1

1 total for day 2

2 turkish 2

2 espresso 1

2 black 1

2 total for day 4

3 latte 3

3 total for day 3

4 ice 1

4 black 4

4 total for day 5

GRAND TOTAL 14

Let's sort the rules in more detail. The first doubles the amount of coffee drunk on Thursday. The function

Moving on. Consider a query that displays the subtotals of a drink by type of coffee, the amount of latte drunk on Tuesday, and also a message about whether the goal of the week was achieved - to drink an espresso on Wednesday:

`SELECT *`

FROM coffee

MODEL DIMENSION BY (day, type)

MEASURES (cnt)

RULES (

cnt[ NULL , FOR type IN ( SELECT DISTINCT type FROM coffee)] = sum(cnt)[ ANY , cv(type)],

cnt[ NULL , 'GRAND TOTAL' ] = sum(cnt)[ NULL , ANY ],

cnt[ NULL , ' drank ' || cnt[2, 'latte' ] || ' cups of latte on 2 day' ] = NULL ,

cnt[ NULL , CASE

WHEN cnt[3, 'espresso' ] IS PRESENT THEN ' ACHIEVED'

ELSE ' FAILED'

END || ': drank espresso on 3 day' ] = NULL

)

ORDER BY day, type DESC ;

DAY TYPE CNT

---------- ---------------------------------------- ----------

1 turkish 1

1 espresso 1

2 turkish 2

2 espresso 1

2 black 1

3 latte 3

4 ice 1

4 black 2

turkish 3

latte 3

ice 1

espresso 2

black 3

GRAND TOTAL 12

drank cups of latte on 2 day

FAILED: drank espresso on 3 day

The first rule uses a loop with iterations on the sample values. Nested queries used inside a MODEL must be uncorrelated. The following rule calculates the total amount. Pay attention to the line "drank cups of latte on 2 day". Because the

This introductory is complete, I hope you got a general idea and got used to the syntax. In the second part, we will talk about managing the update and creating elements, controlling the output of MODEL, questions of the applicability of the method and its performance. And, of course, more complex examples will be analyzed.

Read continuation

Source: https://habr.com/ru/post/101003/