📜 ⬆️ ⬇️

Oracle SQL: Model dialect in examples. Part 2

image This is a continuation of an article on using the SELECT statement extension — the Model construct. From the first part, you already have an idea of ​​the purpose and some features of its use, as well as are familiar with half the syntax. Further, several complex examples will be analyzed, as well as an analysis of the scope and performance.

The use of the MODEL construction prohibits the use of aggregate functions inside other SELECT blocks (in general, all selectable columns should be derived from those mentioned in MODEL) - instead, these functions should be declared inside PARTITION BY, DIMENSION BY or MEASURES. For example, the following simple query

SELECT employee_id, sum(amount)
FROM sales
GROUP BY employee_id;

will be equivalent

SELECT employee_id, amt
FROM sales
GROUP BY employee_id
MODEL PARTITION BY (employee_id)
DIMENSION BY (0 dummy)
MEASURES (sum(amount) amt)
RULES ();

')
To control the changes made by the rules, there are three semantics: UPDATE, UPSERT, UPSERT ALL. UPDATE only allows elements to be updated, UPSERT (works by default) to change and add using positional links, and UPSERT ALL allows elements to be created using symbolic links. You can change the semantics both for an individual rule (in this case, the directive is written before it), and at the level of the whole block (then the directive must be specified after RULES).

It is important to know that UPSERT ALL does not work in the same way as the FOR loop. The creation of new elimination in this case occurs in four steps.
  1. All items that match the symbolic link are found.
  2. For each dimension is a set of unique index values.
  3. The Cartesian product of these sets is calculated.
  4. All non-existing elements present in the work are created.

Challenging examples


image At the end of a couple of examples that show how different tasks can be solved with the help of MODEL. To begin with we will display a list of drunk coffee for each day in one line:

SELECT day, substr(type, 2) listing
FROM coffee
MODEL RETURN UPDATED ROWS
PARTITION BY (day)
DIMENSION BY (row_number() OVER ( PARTITION BY day ORDER BY type) position)
MEASURES (type, cnt)
RULES ITERATE (100500) UNTIL (presentv(type[iteration_number + 2], 1, 0) = 0) (
type[0] = type[0] || ', ' || cnt[iteration_number + 1] || ' x ' || type[iteration_number + 1]
)
ORDER BY day;

DAY LISTING
---------- ----------------------------------------
1 1 x espresso, 1 x turkish
2 1 x black, 1 x espresso, 2 x turkish
3 3 x latte
4 2 x black, 1 x ice

From the new function here, presentv (ref, arg1, arg2) , it returns arg1 if the ref reference points to an existing element with a NOT NULL value and arg2 otherwise. For row numbering and sorting by name, an analytical function is used (row_number () OVER (PARTITION BY day ORDER BY type)). Processing separately for each day is provided by partitioning (PARTITION BY) both in the model and in the analytical function.

And, at the end, a geometric example. On the straight line is given a set of intersecting segments [p1, p2], it is required to display a sorted list of ranges completely covered with segments:

SELECT * FROM lines;

P1 P2
---------- ----------
1 6
5 7
2 4
8 20
11 28
30 32


It is believed that p2> = p1, the solution for arbitrarily stored segments is left for the warm-up by those interested. So, the example code:

SELECT p1, max(p2) p2
FROM (
SELECT p1, p2 FROM lines
MODEL DIMENSION BY (row_number() OVER ( ORDER BY p1, p2) rn)
MEASURES (p1, p2)
RULES AUTOMATIC ORDER (
p1[rn > 1] ORDER BY rn = CASE sign(p2[cv(rn) - 1] - p1[cv(rn)])
WHEN 1 THEN p1[cv(rn) - 1]
ELSE p1[cv(rn)]
END ,
p2[rn > 1] ORDER BY rn = greatest(p2[cv(rn) - 1], p2[cv(rn)])
)
)
GROUP BY p1
ORDER BY p1;

P1 P2
---------- ----------
1 7
8 28
30 32


The segments in the model are first sorted by left points and numbered (row_number () OVER (ORDER BY p1, p2)), and then viewed from the top down by the number (ORDER BY rn directive). Notice that the symbolic link is used for the first time on the left side of the equality; this is why you must specify the order in which the elements will be processed. If the point p1 of the viewed segment belongs to the previous segment, then it is replaced by p1 of the previous segment. p2 is replaced by the rightmost p2 among the current and previous segments). Thus, going down, we expand the range covered by segments. The AUTOMATIC ORDER directive is used so that p1 and p2 are counted in turn for the current segment. If you remove the directive, the first rule for all lines will be executed first, and only then the second one. In general, this directive takes into account the dependencies between elements when processing rules and may violate the order of their traversal (as described above).

Practical application and performance


In general, MODEL is used extremely rarely. This happens because 90% of really complex things are solved by analytic functions, and those 10% that remain for MODEL are traditionally removed from the database level. Despite this, here are the main application scenarios:
Performance is simple. In most cases, MODEL is faster than a similar PL / SQL code, but slower than using analytic functions. With EXPLAIN PLAN you can learn some details about processing a request.
The presence of FAST annotation means that everything is just fine. This happens under the condition that all the left-hand sides of equalities are positional links, or on the left-hand side there are symbolic links, but on the right side there are simple arithmetic aggregate functions (sum, avg, etc.). Accordingly, MODEL FAST is close in speed to the use of analytical functions, and MODEL CYCLIC can lose implementations in PL / SQL.

Literature


  1. Oracle® Database Data Warehousing Guide 11g Release 2 - Chapter 22 "SQL for Modeling". image
  2. The SQL Model Clause of Oracle Database 10g. image

Source: https://habr.com/ru/post/101301/


All Articles