📜 ⬆️ ⬇️

“Dense_rank ()” vs “Max ()” or an investigation with an unexpected end

Hello colleagues.
In this article I will talk about my research in the question: "And what is better: dense_rank () or max ()" and, of course, why this research ended with an unexpected, at least for me, result.

Background:

So there were stars that I need to look for work now. Before each interview I study the company to which I was invited, in order to understand what the company does, what I will learn if I get an offer, etc. And so, at one fine moment, I received an invitation for an interview, for the position of a PL / SQL developer, from one excellent company. After reading about it, it seemed to me that I was in love and I want to work there. When I came to the interview itself and at that moment when everything was ready for the interview, but it had not yet begun simply because people were getting to know each other, hr offers coffee, etc., I already knew that I wanted, I really want to, here work.

For all the interviews with the team leader, I was once offered to write a simple request, otherwise everything took place in the mode: question-answer.

The challenge sounded like this:
“We have a table of operations, it has 4 columns: the operation id, the client id, the date of the operation, the amount of the operation. It is necessary to withdraw the last transactions for each client with the maximum amount for a certain period. "
')
And I, of course, began frantically to think how to write so that it was beautiful and effective. And besides oracle, I also worked with a teradata and at that moment my brain issued the following request:

Hidden text
/*id  - oper_id, id  - client_id,   - input_date,   - amount*/ select t.* , max(t.amount) over (partition by t.client_id) as m_a from some_table t qualify oper_id = max(t.oper_id) over (partition by t.client_id) where m_a = amount 
And in the case of the cheater, this would work, but not with the Oracle, alas. And clearly remembering that there is no “qualify” in the oracle, I wrote on a piece of paper something like:
Hidden text
 select t.* from some_table t where amount = max(t.oper_id) over (partition by t.client_id) 


To which I was asked a question: “Why was max () used instead of dense_rank ()?”, I don’t remember what I said exactly, but it sounded like this: “I used max () more often and I can, more or less Exactly, imagine what he will return to me, unlike dense_rank (). ” I will not describe the interview further; I will only say that I, of course, were refused. Later, at home, in trying to analyze everything and understand the mistakes, I came to the conclusion that I wanted to work too hard and I was worried, otherwise I cannot explain the mess in my head during the interview. It was something akin to the feeling when a schoolboy tries to talk to a girl whom he secretly loves even from kindergarten, but these attempts more and more embarrass him. Similarly, I, trying to look calm and adequate, showed myself as a worthless specialist. In general, I decided for myself to find out that it is better to use dense_rank () or max () when solving this problem.

Study

If you want to see with my own eyes everything that I write and touch it all with my own hands - I have prepared a set of data creation scripts for the test:

Hidden text
 /* */ create table habr_test_table_220414 ( oper_id number, client_id number, input_date date, amount number, constraint habr_test_table_220414_pk primary key (oper_id) ); grant all on habr_test_table_220414 to public; /* ,   oper_id   -  */ create sequence habr_test_sequence_220414 increment by 1; grant all on habr_test_sequence_220414 to public; /* ,     ,    oper_id    */ create trigger habr_test_trigger_220414 before insert on habr_test_table_220414 for each row begin :new.oper_id := habr_test_sequence_220414.nextval; end; /*           */ /*     ,     10- */ /* ,     cost = 3,   20000 ,    */ /*   -          counter */ declare counter number := 10000; i number := 0; begin loop insert into habr_test_table_220414 ( client_id , input_date , amount ) values ( trunc (dbms_random.value (1, 11)) , to_date(trunc(dbms_random.value(to_char(date '2013-01-01','j'),to_char(date '2013-12-31','j'))),'j') , trunc (dbms_random.value (1, 100000)) ); exit when (i = counter); i := i + 1; end loop; commit; /*   ,         */ /*  :*/ insert into habr_test_table_220414 select * from habr_test_table_220414; commit; /*   id   */ end; 


So, the test data is created, it’s time to proceed to the actual queries. In order not to cut off our 20,000 lines, we will not limit our sample to any specific period, it’s important for us to understand which method is better and more efficient, and
 where input_date between to_date('01.01.2013','dd.mm.yyyy') and to_date('01.05.2013','dd.mm.yyyy') 
we can add and then.

Request using max ()
 select * from ( select c.* , max(c.oper_id) over (partition by c.client_id) as m_o/*max_operation*/ from ( select t.* , max(t.amount) over (partition by t.client_id) as m_a/*max_amount*/ from habr_test_table_220414 t ) c where c.m_a = c.amount ) where m_o = oper_id; 

Request using dense_rank ()
 select * from ( select c.* , dense_rank() over (partition by c.client_id order by c.oper_id desc) as m_o/*max_operation*/ from ( select t.* , dense_rank() over (partition by t.client_id order by t.amount desc) as m_a/*max_amount*/ from habr_test_table_220414 t ) c where c.m_a = 1 ) where m_o = 1; 


Preliminary plans for these queries (obtained in pl / sql developer):

Max:
Hidden text
image

Dense_rank:
Hidden text
image


But these are preliminary plans; we’ll get real plans with the help of the SQLTUNE utility:

Training:
 /*   max()*/ DECLARE my_task_name varchar2(30);my_sqltext clob;rep_tuning clob; BEGIN Begin DBMS_SQLTUNE.DROP_TUNING_TASK('my_sql_tuning_task_max'); exception when others then NULL; end; MY_SQLTEXT:= 'select * from ( select c.* , max(c.oper_id) over (partition by c.client_id) as m_o/*max_operation*/ from ( select t.* , max(t.amount) over (partition by t.client_id) as m_a/*max_amount*/ from habr_test_table_220414 t ) c where c.m_a = c.amount ) where m_o = oper_id'; MY_TASK_NAME:=DBMS_SQLTUNE.CREATE_TUNING_TASK(SQL_TEXT => my_sqltext, TIME_LIMIT => 60, --     TASK_NAME =>'my_sql_tuning_task_max', DESCRIPTION=> my_task_name , SCOPE => DBMS_SQLTUNE.scope_comprehensive); begin DBMS_SQLTUNE.EXECUTE_TUNING_TASK('my_sql_tuning_task_max'); exception when others then null; end; END; /*   dense_rank()*/ DECLARE my_task_name varchar2(30);my_sqltext clob;rep_tuning clob; BEGIN Begin DBMS_SQLTUNE.DROP_TUNING_TASK('my_sql_tuning_task_dense'); exception when others then NULL; end; MY_SQLTEXT:= 'select * from ( select c.* , dense_rank() over (partition by c.client_id order by c.oper_id desc) as m_o/*max_operation*/ from ( select t.* , dense_rank() over (partition by t.client_id order by t.amount desc) as m_a/*max_amount*/ from habr_test_table_220414 t ) c where c.m_a = 1 ) where m_o = 1'; MY_TASK_NAME:=DBMS_SQLTUNE.CREATE_TUNING_TASK(SQL_TEXT => my_sqltext, TIME_LIMIT => 60, --     TASK_NAME =>'my_sql_tuning_task_dense', DESCRIPTION=> my_task_name , SCOPE => DBMS_SQLTUNE.scope_comprehensive); begin DBMS_SQLTUNE.EXECUTE_TUNING_TASK('my_sql_tuning_task_dense'); exception when others then null; end; END; /* ,              */ /* ,     */ 
and these real plans look like this:

Hidden text
 SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('my_sql_tuning_task_max') FROM DUAL; 


image

 SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('my_sql_tuning_task_dense') FROM DUAL; 


image


In addition to the real plan, SQLTUNE also gives recommendations on how to optimize the script, in our case it recommends collecting statistics, but since we have one tablet, the requests are in the same conditions.

Preliminary result
After all these manipulations, it is clear to me, as I hope you will, that when solving this task, max () works faster than dense_rank () 2 times and eats half the CPU time. Well, it is understandable so, without plans and other things, because max () is just the search for the greatest, while dense_rank () is, first of all, sorting, and only then numbering.

But it was not this that prompted me to write an article.

Suddenly
In the process of initial filling in the table for the test, I realized the scripts for the article, and for the first time everything happened almost in manual mode, and to check the condition of the experimental table, I used a query with order by .

Hidden text
 /*  10   */ insert into habr_test_table_220414...; .... .... insert into habr_test_table_220414...; commit; select * from habr_test_table_220414 t order by t.client_id; /*    :*/ insert into habr_test_table_220414 select * from habr_test_table_220414; commit; select * from habr_test_table_220414 t order by t.client_id; /*       */ 
After that, I modified this request to the final state “request with max ()”, without removing the order by .
Here's what happened:
 select * from ( select c.* , max(c.oper_id) over (partition by c.client_id) as m_o/*max_operation*/ from ( select t.* , max(t.amount) over (partition by t.client_id) as m_a/*max_amount*/ from habr_test_table_220414 t order by t.client_id ) c where c.m_a = c.amount ) where m_o = oper_id 


Later I wrote a “request with dense_rank ()” and started comparing plans, but noticing this ill-fated order by in the request with max (), deleted order by , but I saw the cost and remembered it. And when I saw the cost in the request with max () without an order by very surprised, because:

Preliminary plan
image
SQLTUNE real plan
image


Anyway, to say that I was very surprised - to say nothing ... How did it happen? Why order by speed the request 10 times faster? Decided to find the answer in the trace. I will not write exactly how to shoot the route in the Oracle, because this is a topic for a separate article, and articles describing this process are easy to find on the world wide web. I will provide only a set of scripts with which I conducted a trace and a link to such an article, I found it for quite some time, since it has been helping me out:

Hidden text
Link to article on enabling tracing
 alter system set timed_statistics=true; alter session set tracefile_identifier='test_for_habr_220414'; alter session set events '10046 trace name context forever, level 12'; select * from ( select c.* , max(c.oper_id) over (partition by c.client_id) as m_o/*max_operation*/ from ( select t.* , max(t.amount) over (partition by t.client_id) as m_a/*max_amount*/ from habr_test_table_220414 t order by client_id ) c where c.m_a = c.amount ) where m_o = oper_id; alter session set events '10046 trace name context off'; select value from v$parameter p where name='user_dump_dest'; /*      tkprof*/ /*     'test_for_habr_220414'*/ 
In the track we are interested in a piece that describes the actions of the oracle when executing the request, namely:
Hidden text
 select * from ( select c.* , max(c.oper_id) over (partition by c.client_id) as m_o/*max_operation*/ from ( select t.* , max(t.amount) over (partition by t.client_id) as m_a/*max_amount*/ from habr_test_table_220414 t order by client_id ) c where c.m_a = c.amount ) where m_o = oper_id call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.01 0.00 0 1 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 1 0.03 0.02 0 84 0 10 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 3 0.04 0.03 0 85 0 10 Misses in library cache during parse: 1 Optimizer mode: ALL_ROWS Parsing user id: SYS Rows Row Source Operation ----- --------------------------------------------------- 10 VIEW (cr=84 pr=0 pw=0 time=28155 us cost=23 size=1592850 card=21525) 20 WINDOW BUFFER (cr=84 pr=0 pw=0 time=28145 us cost=23 size=1313025 card=21525) 20 VIEW (cr=84 pr=0 pw=0 time=21628 us cost=23 size=1313025 card=21525) 22010 WINDOW SORT(cr=84 pr=0 pw=0 time=24393 us cost=23 size=1033200 card=21525) 22010 TABLE ACCESS FULL HABR_TEST_TABLE_220414(cr=84 pr=0 pw=0 time=5172 us cost=23 size=1033200 card=21525) 


Results

From here we see that both the preliminary and the real plans are not erroneous, there seems to be no dirty trick and one can rejoice at the tenfold acceleration. Is it true?

PS I could not answer this question and still do not believe that the request is really accelerated many times with the help of order by. I will continue to try to find out this moment, to which I urge you. And may the hidden secrets of oracle open before us!

PPS Thank you all for your attention! If you carried out a test with me, we don’t forget to clean the base behind you, especially if it is a bank of some kind.

Hidden text
 drop trigger habr_test_trigger_220414; drop sequence habr_test_sequence_220414; drop table habr_test_table_220414; 

Source: https://habr.com/ru/post/220455/


All Articles