Oracle SQL: Model dialect in examples. Part 2

This is a continuation of an article on using the SELECT statement extension — the Model construct. From the first part, you already have an idea of the purpose and some features of its use, as well as are familiar with half the syntax. Further, several complex examples will be analyzed, as well as an analysis of the scope and performance.

The use of the MODEL construction prohibits the use of aggregate functions inside other SELECT blocks (in general, all selectable columns should be derived from those mentioned in MODEL) - instead, these functions should be declared inside PARTITION BY, DIMENSION BY or MEASURES. For example, the following simple query

SELECT employee_id, sum(amount) FROM sales GROUP BY employee_id;

will be equivalent

SELECT employee_id, amt FROM sales GROUP BY employee_id MODEL PARTITION BY (employee_id) DIMENSION BY (0 dummy) MEASURES (sum(amount) amt) RULES ();

')
To control the changes made by the rules, there are three semantics: UPDATE, UPSERT, UPSERT ALL. UPDATE only allows elements to be updated, UPSERT (works by default) to change and add using positional links, and UPSERT ALL allows elements to be created using symbolic links. You can change the semantics both for an individual rule (in this case, the directive is written before it), and at the level of the whole block (then the directive must be specified after RULES).

It is important to know that UPSERT ALL does not work in the same way as the FOR loop. The creation of new elimination in this case occurs in four steps.

All items that match the symbolic link are found.
For each dimension is a set of unique index values.
The Cartesian product of these sets is calculated.
All non-existing elements present in the work are created.

Challenging examples

At the end of a couple of examples that show how different tasks can be solved with the help of MODEL. To begin with we will display a list of drunk coffee for each day in one line:

SELECT day, substr(type, 2) listing FROM coffee MODEL RETURN UPDATED ROWS PARTITION BY (day) DIMENSION BY (row_number() OVER ( PARTITION BY day ORDER BY type) position) MEASURES (type, cnt) RULES ITERATE (100500) UNTIL (presentv(type[iteration_number + 2], 1, 0) = 0) ( type[0] = type[0] || ', ' || cnt[iteration_number + 1] || ' x ' || type[iteration_number + 1] ) ORDER BY day; DAY LISTING ---------- ---------------------------------------- 1 1 x espresso, 1 x turkish 2 1 x black, 1 x espresso, 2 x turkish 3 3 x latte 4 2 x black, 1 x ice

From the new function here, presentv (ref, arg1, arg2) , it returns arg1 if the ref reference points to an existing element with a NOT NULL value and arg2 otherwise. For row numbering and sorting by name, an analytical function is used (row_number () OVER (PARTITION BY day ORDER BY type)). Processing separately for each day is provided by partitioning (PARTITION BY) both in the model and in the analytical function.

And, at the end, a geometric example. On the straight line is given a set of intersecting segments [p1, p2], it is required to display a sorted list of ranges completely covered with segments:

SELECT * FROM lines; P1 P2 ---------- ---------- 1 6 5 7 2 4 8 20 11 28 30 32

It is believed that p2> = p1, the solution for arbitrarily stored segments is left for the warm-up by those interested. So, the example code:

SELECT p1, max(p2) p2 FROM ( SELECT p1, p2 FROM lines MODEL DIMENSION BY (row_number() OVER ( ORDER BY p1, p2) rn) MEASURES (p1, p2) RULES AUTOMATIC ORDER ( p1[rn > 1] ORDER BY rn = CASE sign(p2[cv(rn) - 1] - p1[cv(rn)]) WHEN 1 THEN p1[cv(rn) - 1] ELSE p1[cv(rn)] END , p2[rn > 1] ORDER BY rn = greatest(p2[cv(rn) - 1], p2[cv(rn)]) ) ) GROUP BY p1 ORDER BY p1; P1 P2 ---------- ---------- 1 7 8 28 30 32

The segments in the model are first sorted by left points and numbered (row_number () OVER (ORDER BY p1, p2)), and then viewed from the top down by the number (ORDER BY rn directive). Notice that the symbolic link is used for the first time on the left side of the equality; this is why you must specify the order in which the elements will be processed. If the point p1 of the viewed segment belongs to the previous segment, then it is replaced by p1 of the previous segment. p2 is replaced by the rightmost p2 among the current and previous segments). Thus, going down, we expand the range covered by segments. The AUTOMATIC ORDER directive is used so that p1 and p2 are counted in turn for the current segment. If you remove the directive, the first rule for all lines will be executed first, and only then the second one. In general, this directive takes into account the dependencies between elements when processing rules and may violate the order of their traversal (as described above).

Practical application and performance

In general, MODEL is used ~~extremely~~ rarely. This happens because 90% of really complex things are solved by analytic functions, and those 10% that remain for MODEL are traditionally removed from the database level. Despite this, here are the main application scenarios:

it is necessary to generate a really complex report (readability and maintainability are priority);
sampling is not feasible even with the use of analytical functions;
the query is dynamically generated in a high-level language (it is easier to generate rules in MODEL than analytics);
resource-intensive iterative calculations are used, when MODEL is still enough, and the PL / SQL transition leads to a drop in performance;
academic interest (declarative approach instead of procedural).

Performance is simple. In most cases, MODEL is faster than a similar PL / SQL code, but slower than using analytic functions. With EXPLAIN PLAN you can learn some details about processing a request.

SQL MODEL ORDERED [FAST] - means that the rules are processed in the sequence specified in the query (by default);
SQL MODEL ACYCLIC [FAST] - cell dependencies are automatically calculated;
SQL MODEL CYCLIC is the slowest option, with complex inter-cell dependencies.

The presence of FAST annotation means that everything is just fine. This happens under the condition that all the left-hand sides of equalities are positional links, or on the left-hand side there are symbolic links, but on the right side there are simple arithmetic aggregate functions (sum, avg, etc.). Accordingly, MODEL FAST is close in speed to the use of analytical functions, and MODEL CYCLIC can lose implementations in PL / SQL.

Literature

Oracle® Database Data Warehousing Guide 11g Release 2 - Chapter 22 "SQL for Modeling".
The SQL Model Clause of Oracle Database 10g.

Source: https://habr.com/ru/post/101301/

All Articles

Oracle SQL: Model dialect in examples. Part 2

Challenging examples

Practical application and performance

Literature

More articles: