Geekly Articles each Day

This is a continuation of an article on using the SELECT statement extension — the Model construct. From the first part, you already have an idea of the purpose and some features of its use, as well as are familiar with half the syntax. Further, several complex examples will be analyzed, as well as an analysis of the scope and performance.

The use of the MODEL construction prohibits the use of aggregate functions inside other SELECT blocks (in general, all selectable columns should be derived from those mentioned in MODEL) - instead, these functions should be declared inside PARTITION BY, DIMENSION BY or MEASURES. For example, the following simple query

will be equivalent

')

To control the changes made by the rules, there are three semantics: UPDATE, UPSERT, UPSERT ALL. UPDATE only allows elements to be updated, UPSERT (works by default) to change and add using positional links, and UPSERT ALL allows elements to be created using symbolic links. You can change the semantics both for an individual rule (in this case, the directive is written before it), and at the level of the whole block (then the directive must be specified after RULES).

It is important to know that UPSERT ALL does not work in the same way as the FOR loop. The creation of new elimination in this case occurs in four steps.

#### Challenging examples

At the end of a couple of examples that show how different tasks can be solved with the help of MODEL. To begin with we will display a list of drunk coffee for each day in one line:

From the new function here,*presentv (ref, arg1, arg2)* , it returns *arg1* if the *ref* reference points to an existing element with a NOT NULL value and *arg2* otherwise. For row numbering and sorting by name, an analytical function is used (row_number () OVER (PARTITION BY day ORDER BY type)). Processing separately for each day is provided by partitioning (PARTITION BY) both in the model and in the analytical function.

And, at the end, a geometric example. On the straight line is given a set of intersecting segments [p1, p2], it is required to display a sorted list of ranges completely covered with segments:

It is believed that p2> = p1, the solution for arbitrarily stored segments is left for the warm-up by those interested. So, the example code:

The segments in the model are first sorted by left points and numbered (row_number () OVER (ORDER BY p1, p2)), and then viewed from the top down by the number (ORDER BY rn directive). Notice that the symbolic link is used for the first time on the left side of the equality; this is why you must specify the order in which the elements will be processed. If the point*p1 of the* viewed segment belongs to the previous segment, then it is replaced by *p1 of the* previous segment. *p2* is replaced by the rightmost *p2* among the current and previous segments). Thus, going down, we expand the range covered by segments. The AUTOMATIC ORDER directive is used so that *p1* and *p2 are* counted in turn for the __current__ segment. If you remove the directive, the first rule for __all__ lines will be executed first, and only then the second one. In general, this directive takes into account the dependencies between elements when processing rules and may violate the order of their traversal (as described above).

#### Practical application and performance

In general, MODEL is used~~extremely~~ rarely. This happens because 90% of really complex things are solved by analytic functions, and those 10% that remain for MODEL are traditionally removed from the database level. Despite this, here are the main application scenarios:

#### Literature

The use of the MODEL construction prohibits the use of aggregate functions inside other SELECT blocks (in general, all selectable columns should be derived from those mentioned in MODEL) - instead, these functions should be declared inside PARTITION BY, DIMENSION BY or MEASURES. For example, the following simple query

`SELECT employee_id, sum(amount)`

FROM sales

GROUP BY employee_id;

will be equivalent

`SELECT employee_id, amt`

FROM sales

GROUP BY employee_id

MODEL PARTITION BY (employee_id)

DIMENSION BY (0 dummy)

MEASURES (sum(amount) amt)

RULES ();

')

To control the changes made by the rules, there are three semantics: UPDATE, UPSERT, UPSERT ALL. UPDATE only allows elements to be updated, UPSERT (works by default) to change and add using positional links, and UPSERT ALL allows elements to be created using symbolic links. You can change the semantics both for an individual rule (in this case, the directive is written before it), and at the level of the whole block (then the directive must be specified after RULES).

It is important to know that UPSERT ALL does not work in the same way as the FOR loop. The creation of new elimination in this case occurs in four steps.

- All items that match the symbolic link are found.
- For each dimension is a set of unique index values.
- The Cartesian product of these sets is calculated.
- All non-existing elements present in the work are created.

At the end of a couple of examples that show how different tasks can be solved with the help of MODEL. To begin with we will display a list of drunk coffee for each day in one line:

`SELECT day, substr(type, 2) listing`

FROM coffee

MODEL RETURN UPDATED ROWS

PARTITION BY (day)

DIMENSION BY (row_number() OVER ( PARTITION BY day ORDER BY type) position)

MEASURES (type, cnt)

RULES ITERATE (100500) UNTIL (presentv(type[iteration_number + 2], 1, 0) = 0) (

type[0] = type[0] || ', ' || cnt[iteration_number + 1] || ' x ' || type[iteration_number + 1]

)

ORDER BY day;

DAY LISTING

---------- ----------------------------------------

1 1 x espresso, 1 x turkish

2 1 x black, 1 x espresso, 2 x turkish

3 3 x latte

4 2 x black, 1 x ice

From the new function here,

And, at the end, a geometric example. On the straight line is given a set of intersecting segments [p1, p2], it is required to display a sorted list of ranges completely covered with segments:

`SELECT * FROM lines;`

P1 P2

---------- ----------

1 6

5 7

2 4

8 20

11 28

30 32

It is believed that p2> = p1, the solution for arbitrarily stored segments is left for the warm-up by those interested. So, the example code:

`SELECT p1, max(p2) p2`

FROM (

SELECT p1, p2 FROM lines

MODEL DIMENSION BY (row_number() OVER ( ORDER BY p1, p2) rn)

MEASURES (p1, p2)

RULES AUTOMATIC ORDER (

p1[rn > 1] ORDER BY rn = CASE sign(p2[cv(rn) - 1] - p1[cv(rn)])

WHEN 1 THEN p1[cv(rn) - 1]

ELSE p1[cv(rn)]

END ,

p2[rn > 1] ORDER BY rn = greatest(p2[cv(rn) - 1], p2[cv(rn)])

)

)

GROUP BY p1

ORDER BY p1;

P1 P2

---------- ----------

1 7

8 28

30 32

The segments in the model are first sorted by left points and numbered (row_number () OVER (ORDER BY p1, p2)), and then viewed from the top down by the number (ORDER BY rn directive). Notice that the symbolic link is used for the first time on the left side of the equality; this is why you must specify the order in which the elements will be processed. If the point

In general, MODEL is used

- it is necessary to generate a really complex report (readability and maintainability are priority);
- sampling is not feasible even with the use of analytical functions;
- the query is dynamically generated in a high-level language (it is easier to generate rules in MODEL than analytics);
- resource-intensive iterative calculations are used, when MODEL is still enough, and the PL / SQL transition leads to a drop in performance;
- academic interest (declarative approach instead of procedural).

- SQL MODEL ORDERED [FAST] - means that the rules are processed in the sequence specified in the query (by default);
- SQL MODEL ACYCLIC [FAST] - cell dependencies are automatically calculated;
- SQL MODEL CYCLIC is the slowest option, with complex inter-cell dependencies.

Source: https://habr.com/ru/post/101301/