
Advanced Oracle Troubleshooting Session - PGA/UGA Memory Fragmentation

Luxoft Training presents a translation of Randolf Geist's article "Advanced Oracle Troubleshooting Session - PGA/UGA Memory Fragmentation".

Randolf Geist specializes in troubleshooting Oracle database performance problems. He is one of the world's leading experts in SQL performance analysis and Oracle optimizer technology, and an Oracle Certified Professional DBA (OCP DBA) for versions 8i, 9i and 10g.


Advanced Oracle Troubleshooting Session - PGA/UGA Memory Fragmentation


A database diagnostics and troubleshooting session for one of my clients began with the following problem description: in certain environments, a particular batch process sometimes takes far longer than expected (hours instead of a few minutes). In addition, monitoring reveals SQL statements that behave unusually.
So, first of all, we needed to find a way to reproduce the problem.
Fairly quickly we traced it to a call to a complex PL/SQL stored procedure in a package that invoked many other stored procedures/packages and processed LOBs (large objects) containing XML data.

It turned out that the problem could be reproduced by calling this stored procedure several times in a loop, and in various database environments, as reported at the outset.

A unique feature of this stored procedure was that it ran in "exclusive" mode. This means it was executed on a fairly large IBM pSeries server (AIX 5.3, Oracle 11.1.0.7, 240 GB of RAM, numerous multi-core P6 processors, expensive storage) with no other active business processes running at the same time. Essentially it was a serial execution. Monitoring showed that the procedure spent most of its time on two simple and straightforward SQL queries. These were static SQL queries issued from PL/SQL that accessed the same small table via a unique index and ROWID. Thus we had two SQL queries that, under normal conditions, would in the worst case require two logical reads, since the index and the table fit entirely in the buffer cache.
While monitoring the execution of these SQL queries during the aforementioned loop, it became clear that they were called thousands of times (depending on the content of the XML LOB) per run.

Since this was static SQL, the PL/SQL cursor cache (as well as the data block cache) worked very well: each query was parsed only once, and every subsequent execution caused only logical, not physical, reads.

In addition, in most cases these SQL queries did not find the sought value in the small table. This means that on average a single logical read (a unique index scan) was needed per execution, since the value being looked up was absent.
Checking the SQL execution statistics in other environments (Linux instead of AIX) showed that the same queries were executed the same number of times, yet the batch job completed faster there. In that light the huge number of SQL executions looked strange, but it was not the root cause of the problem.
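Execution counts and the average elapsed time per execution can be compared across environments with a query against V$SQL, for instance. This is a sketch; the LIKE filter is a placeholder and would have to match the actual statements under investigation:

```sql
select sql_id
     , executions
     , buffer_gets
     , round(elapsed_time / nullif(executions, 0)) as avg_elapsed_us
from   v$sql
where  sql_text like 'SELECT%T_TESTLOOP%'  -- placeholder filter
order by executions desc;
```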

So:
• We identified a stored procedure working with LOBs that contain XML as the source of the problem.
• Most of the procedure's time was spent on two static SQL queries that were executed thousands of times per run of the stored procedure.
• However, these SQL queries used the "optimal" execution plan and actually generated only about one logical I/O per execution.
• Because of the "exclusive" nature of the batch job, contention with other processes was impossible.
• In other words, this was none of the usual suspects like "inefficient execution plan", "resource contention", "serialization", etc.
• Consequently, conventional performance-analysis methods (wait events, extended SQL trace, Active Session History, session/system statistics, ASH/AWR/ADDM reports) did not reveal an obvious cause of the problem.
• Running the SQL queries in a separate, standalone process showed that they took on average 10 microseconds (!) per execution, which is quite fast. This raises the question of how these queries could show up as the problem: executed in a simple loop, ten thousand runs per second are easily possible.

We were facing the classic "wait events explain nothing (sometimes)" situation. All we could see was that most of the time went into executing these SQL statements, and that all of it was spent on the single CPU that executed them.

But wait: while monitoring session statistics deltas (for example, with Tanel Poder's snapper utility), an interesting point came up: the process consumed more and more PGA/UGA memory as it executed.
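Without snapper, the same statistics can be sampled manually from V$SESSTAT; this is a sketch, and SID 123 is a placeholder for the session running the batch job:

```sql
select sn.name
     , st.value
from   v$sesstat  st
     , v$statname sn
where  st.statistic# = sn.statistic#
and    st.sid = 123  -- placeholder SID
and    sn.name in ('session pga memory', 'session pga memory max',
                   'session uga memory', 'session uga memory max');
```

Sampling this query repeatedly while the loop runs makes the steady growth of the PGA/UGA figures visible.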

Thus it was time to apply advanced troubleshooting methods. Oak Table member Tanel Poder, for example, has done a lot of research in this area and provides a large number of useful tools.

Further checks uncovered several anomalies that suggested where to look next.

The PGA/UGA consumption of the process grew with each iteration of the loop invoking the PL/SQL stored procedure, i.e. somewhere there was a memory problem. It therefore seemed a good idea to examine a memory dump of the process's PGA/UGA.

Beware: taking a memory dump can harm your system or process, so be very careful when doing this in a production environment. A PGA memory dump usually affects only a single process, so it is not too critical (unless, of course, it is a background process).

You can get a memory dump using EVENTS:

ALTER SESSION SET EVENTS 'immediate trace name heapdump level <lvl>';


You can also request a memory dump via ORADEBUG as SYSDBA, after identifying the process with SETOSPID, SETORAPID, or SETMYPID:

ORADEBUG DUMP HEAPDUMP <lvl>
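Put together, a complete ORADEBUG session might look like the following sketch (the OS PID 12345 is a placeholder for the target process):

```sql
CONNECT / AS SYSDBA
ORADEBUG SETOSPID 12345      -- placeholder: OS PID of the process to dump
ORADEBUG DUMP HEAPDUMP 5     -- 5 = 4 + 1, i.e. PGA + UGA heaps
ORADEBUG TRACEFILE_NAME      -- shows where the dump was written
```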


See also Julian Dyke's site for an overview of the available heap dump levels.

In this case a level 5 dump (5 = 4 + 1, i.e. PGA plus UGA) seems sufficient (if the current/user call heaps are also of interest, level 29 = 16 + 8 + 4 + 1 is needed).

In Chapter 8 of our book "Expert Oracle Practices: Oracle Database Administration from the Oak Table", Charles and I describe the most important heap dump options and other performance-monitoring techniques used in this case.

The resulting trace file can be given a first-pass analysis with, for example, Tanel's heapdump_analyzer script. Since in our case the trace file was quite large, we used a slightly modified version of the awk script from heapdump_analyzer to create a file that was then attached as an Oracle external table, so that more complex queries could be run against the data.

Here is a command that converts a trace file to a file suitable for use as an external Oracle table:

cat $TRACE_FILE | awk '
  /^HEAP DUMP heap name=/ { split($0, ht, "\""); HTYPE = ht[2]; doPrintOut = 1; }
  /Chunk/ {
    if ( doPrintOut == 1 ) {
      split($0, sf, "\"");
      printf "%10d , %16s, %16s, %16s\n", $4, HTYPE, $5, sf[2];
    }
  }
  /Total heap size/ { doPrintOut = 0; }
' > $EXT_TAB_DIR/heapdump.txt

The following is an example DDL for such an external table:
create table heapdump
(
  chunk_size   integer
, heap_type    varchar2(16)
, chunk_type   varchar2(16)
, alloc_reason varchar2(16)
)
organization external
(
  type oracle_loader
  default directory ext_tab_dir
  access parameters
  (
    records delimited by newline
    fields terminated by ','
    lrtrim
  )
  location ('heapdump.txt')
);
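Once the dump is attached as an external table, a simple aggregation shows which allocation reasons dominate the heaps. This is a sketch using the column names from the DDL above:

```sql
select alloc_reason
     , heap_type
     , count(*)        as chunk_count
     , sum(chunk_size) as total_bytes
from   heapdump
group by
       alloc_reason
     , heap_type
order by
       chunk_count desc;
```

In a case like the one described here, such a query would surface the enormous number of small XML-related chunks at the top of the list.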


The process causes a "leak" of temporary LOBs: V$TEMPORARY_LOBS shows a steady increase in the number of LOBs, albeit a slow one (one object per execution of the stored procedure), while space consumption in the temporary tablespace remains low.
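The leak can be watched with a query such as the following sketch, joining V$TEMPORARY_LOBS to V$SESSION (the SID filter is a placeholder):

```sql
select s.sid
     , s.program
     , l.cache_lobs
     , l.nocache_lobs
     , l.abstract_lobs
from   v$temporary_lobs l
     , v$session        s
where  l.sid = s.sid
and    s.sid = 123;  -- placeholder SID of the session under investigation
```

Re-running this between executions of the stored procedure shows the LOB counters creeping up by one per call.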

Calculating the average execution time of the two frequently repeated SQL queries made it clear that at first they executed as quickly as in the standalone test, but gradually slowed down.
After a few minutes, these queries, which initially took a few microseconds, required several milliseconds (!) per execution. No surprise that the batch job took hours, given how many times these queries were needed; by then each execution took a thousand times longer than at the very start of the process.

On closer inspection it became clear that the slowdown affected every SQL statement executed. But since many of those statements took a few milliseconds anyway and ran only a small number of times, an increase of a few milliseconds was not really significant for them. It was the statements executed most often that received the biggest "boost" in execution time.

As a result, after the loop invoking the PL/SQL stored procedure completed, it became clear that every operation in that session was affected. For example, a simple SELECT * FROM DUAL or an empty PL/SQL block (BEGIN NULL; END;) took about 30 hundredths of a second (0.3 seconds!). The same operation in a new session (while the other session was still open) completed instantly (in less than 0.01 seconds, measured with the SQL*Plus timing facility), so the behavior was specific to that session rather than to the instance as a whole.

Thus, something had appeared that consumed more and more CPU time, increasing the execution time of every operation, whatever it was.

It was time to drop to a deeper level and trace at the operating-system level in order to understand what took most of the time in the process. Unfortunately, we were dealing with AIX 5.3, so the set of tools available for the necessary measurements was very limited. AIX does not support DTrace, and the new ProbeVue tool is only available on AIX 6 and above. Therefore we only had Oracle's built-in ORADEBUG SHORT_STACK and AIX's procstack, which we used to sample the call stack of the foreground process.

But again Tanel came to the rescue. Using his OStackProf toolkit we were able to sample the call stack, only to find that the process spent most of its time in memory management. This remained an assumption, since nobody could say with certainty what the function names in the call stack actually meant. Judging by MOS note 175982.1, it all had to do with ADT (Oracle object) memory management.

This is what the stack trace looked like:
Below is the stack prefix common to all samples:
------------------------------------------------------------------------
Frame->function()
------------------------------------------------------------------------
# 36 ->main()
# 35 ->opimai_real()b0
# 34 ->sou2o()
# 33 ->opidrv()
# 32 ->opiodr()b98
# 31 ->opiino()f0
# 30 ->opitsk()
# 29 ->ttcpip()c
# 28 ->opiodr()b98
# 27 ->kpoal8()c
# 26 ->opiexe()
# 25 ->kkxexe()c
# 24 ->peicnt()
# 23 ->plsql_run()
# 22 ->pfrrun()
# 21 ->pfrrun_no_tool()c
# 20 ->pfrinstr_EXECC()c
# 19 ->pevm_EXECC()e4
# 18 ->psdnal()c
# 17 ->psddr0()
# 16 ->rpidrv()
#  ... (see call profile below)
#
# -#--------------------------------------------------------------------
# - Num.Samples ->in call stack()
# ----------------------------------------------------------------------
  76 ->rpiswu2()c0->rpidru()->skgmstack()c4->rpidrus()c8->opiodr()b98->opipls()->opiexe()e4->auddft()b8->audbeg()->ktcxbr0()a0->kocbeg()->koctxbg()->kohalc()->kohalmc()->kghfru()c->44c0->->
  20 ->rpiswu2()c0->rpidru()->skgmstack()c4->rpidrus()c8->opiodr()b98->opipls()->opiexe()->44c0->->
   1 ->rpiswu2()c0->rpidru()->skgmstack()c4->rpidrus()c8->opiodr()b98->opipls()->opiexe()b8c->kksfbc()ec->ktcsna()c->ktucloGetGlobalMinScn()->44c0->->
   1 ->rpiswu2()c0->rpidru()->skgmstack()c4->rpidrus()c8->opiodr()b98->opipls()->opiexe()->ksuvrl()c->44c0->->
   1 ->rpiswu2()c0->rpidru()->skgmstack()c4->rpidrus()c8->opiodr()b98->opipls()->opiexe()->44c0->->
   1 ->rpiswu2()c0->rpidru()->skgmstack()c4->rpidrus()c8->opiodr()->ksuprc()c->44c0->->


The "auddft"/"audbeg" function names might suggest something related to auditing, but auditing was not in use in this database apart from the "audit_trail = DB" parameter. Alex Fatkulin recently published a description of a problem in which enabled auditing caused a slowdown in 11.2.0.1, and indeed, setting "audit_trail = NONE" on 11.1.0.7 or 11.2.0.1 made the slowdown somewhat less pronounced. This means that audit-related code paths really were involved. However, it did not solve the problem: a significant slowdown was still observed.

An important observation was obtained when analyzing the memory dump of the PGA / UGA:

The PGA/UGA dump contained hundreds of thousands of very small chunks related to XML processing ("qmxdpls_subhea"). It thus became obvious that the problem had to do with memory management for XML.

Therefore we focused on the parts of the code responsible for XML processing. Very quickly a bug was found: the code never called DBMS_XMLDOM.FREEDOCUMENT.

Apparently this bug had been present in the code from the very beginning, but the application was designed for 10.2, where no such slowdown occurred. There were similar symptoms, e.g. increased PGA/UGA memory consumption and a temporary LOB leak, but no comparable slowdown. Once the database was migrated to 11.1.0.7, the slowdown appeared.

So, the problem could be reproduced or fixed simply by removing or adding the FREEDOCUMENT call in the code.
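A minimal sketch of the corrected pattern (with hypothetical variable names): every document obtained via NEWDOMDOCUMENT is released with FREEDOCUMENT once processing is done.

```sql
declare
  var  xmltype;
  doc  dbms_xmldom.DOMDocument;
  root dbms_xmldom.DOMNode;
begin
  var  := xmltype('<root><emp>1</emp></root>');
  doc  := dbms_xmldom.newDOMDocument(var);
  root := dbms_xmldom.makeNode(dbms_xmldom.getDocumentElement(doc));
  -- ... process the DOM here ...
  -- releasing the document frees the temporary LOB allocated under the covers
  dbms_xmldom.freeDocument(doc);
end;
/
```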

This led us to develop a sample test script that reproduces the problem on demand in most cases on database versions 11.1.0.7 and above.

Note: the problem appears to depend on the Oracle port for the specific operating system. Oracle 11.1.0.7 on Linux x64 behaved slightly differently: only some SQL statements slowed down, whereas on the AIX version every statement did.
drop table t_testloop;

purge table t_testloop;

create table t_testloop (
  id integer not null
, vc varchar2(255)
, constraint t_testloop_pk primary key (id)
);

insert into t_testloop (
  id
, vc
)
select
  level as id
, rpad('x', 100, 'x') as vc
from
  dual
connect by level <= 100;

commit;

exec dbms_stats.gather_table_stats(null, 'T_TESTLOOP')

-- This is supposed to be a INDEX UNIQUE SCAN + TABLE ACCESS BY ROWID
explain plan for
select
  id
, vc
from
  t_testloop
where
  id = to_number(:x);

set linesize 160
set pagesize 999

select * from table(dbms_xplan.display);

set timing on echo on serveroutput on

-- This is the normal (reference) execution time for running
-- the simple statement a thousand times
declare
  procedure check_key
  as
    x integer;
    n_id integer;
    s_vc varchar2(255);
  begin
    x := 42 * 3;
    select
      id
    , vc
    into
      n_id
    , s_vc
    from
      t_testloop
    where
      id = x;
  exception
  when NO_DATA_FOUND then
    null;
  end;
begin
  for i in 1..1000 loop
    check_key;
  end loop;
end;
/

-- "Deterministic" randomness :-))
exec dbms_random.seed(0)

declare
  start_time number;
  end_time   number;

  -- Generate some CLOB containing XML
  -- Note that it looks like the CLOB needs
  -- to be different for every iteration
  -- otherwise the issue couldn't be reproduced
  function return_clob
  return clob
  as
    the_lob clob;
    s_name  varchar2(20);
    n_sal   integer;
    s_job   varchar2(20);
  begin
    the_lob := '<root>
';
    for i in 1..20 loop
      s_name := dbms_random.string('U', trunc(dbms_random.value(1, 21)));
      n_sal  := trunc(dbms_random.value(1, 1001));
      s_job  := dbms_random.string('U', trunc(dbms_random.value(1, 21)));
      the_lob := the_lob ||
        '<emp attr1="val1" attr2="val2" attr3="val3">
  <empno attr1="val1" attr2="val2" attr3="val3">' || to_char(i, 'TM') || '</empno>
  <name attr1="val1" attr2="val2" attr3="val3">' || s_name || '</name>
  <sal attr1="val1" attr2="val2" attr3="val3">' || to_char(n_sal, 'TM') || '</sal>
  <job attr1="val1" attr2="val2" attr3="val3">' || s_job || '</job>
</emp>
';
    end loop;
    the_lob := the_lob || '</root>';
    return the_lob;
  end return_clob;

  -- Some usage of the PL/SQL XML DOM API
  -- Some dummy processing of the attributes of the given node
  procedure process_attributes
  (
    in_node dbms_xmldom.DOMNode
  )
  is
    len number;
    n   dbms_xmldom.DOMNode;
    nnm dbms_xmldom.DOMNamedNodeMap;
    key varchar2(1000);
    val varchar2(32767);
  begin
    nnm := dbms_xmldom.getAttributes(in_node);
    if (dbms_xmldom.isNull(nnm) = FALSE) then
      len := dbms_xmldom.getLength(nnm);
      -- loop through attributes
      for i in 0..len - 1 loop
        n   := dbms_xmldom.item(nnm, i);
        key := dbms_xmldom.getNodeName(n);
        val := dbms_xmldom.getNodeValue(n);
      end loop;
    end if;
  end process_attributes;

  -- Some usage of the PL/SQL XML DOM API
  -- Recursively walk the nodes of the DOM
  -- and call the attribute processing per node
  procedure walk_node
  (
    in_node dbms_xmldom.DOMNode
  )
  is
    nl        dbms_xmldom.DOMNodeList;
    len       number;
    n         dbms_xmldom.DOMNode;
    node_name varchar2(100);
  begin
    -- loop through elements
    node_name := dbms_xmldom.getNodeName(in_node);
    process_attributes(in_node);
    nl  := dbms_xmldom.getChildNodes(in_node);
    len := dbms_xmldom.getLength(nl);
    for i in 0..len - 1 loop
      n := dbms_xmldom.item(nl, i);
      node_name := dbms_xmldom.getNodeName(n);
      walk_node(n);
    end loop;
  end walk_node;

  -- The main procedure
  procedure process_xml_clob
  as
    the_lob  clob;
    var      XMLType;
    doc      dbms_xmldom.DOMDocument;
    root     dbms_xmldom.DOMNode;
    root_tag varchar2(100);
  begin
    -- Get the CLOB with the XML
    the_lob := return_clob;
    -- Instantiate an XMLTYPE
    var := xmltype(the_lob);
    -- Generate a new DOM document from the XMLType
    -- This seems to allocate a temporary LOB under the covers
    doc := dbms_xmldom.newDOMDocument(var);
    -- Some rudimentary XML DOM processing
    root := dbms_xmldom.makeNode(dbms_xmldom.getDocumentElement(doc));
    root_tag := dbms_xmldom.getNodeName(root);
    -- If you want to burn more CPU to exaggerate the effect
    -- uncomment this
    -- walk_node(root);
    -- This omission here causes a significant PGA/UGA memory leak
    -- and causes version 11.1 and 11.2 to slow down everything
    -- in this session
    -- Version 10.2 suffers from the same symptoms but doesn't slow down
    -- DBMS_XMLDOM.freeDocument(doc);
  end;
begin
  -- Run this a thousand times and measure / output the runtime per execution
  for i in 1..1000 loop
    start_time := dbms_utility.get_time;
    process_xml_clob;
    end_time := dbms_utility.get_time;
    dbms_output.put_line('Run ' || to_char(i, 'TM') || ': Time (in seconds)= ' ||
      ((end_time - start_time) / 100));
  end loop;
end;
/

-- Do the simple statement again a thousand times
-- Notice the difference in runtime when using 11.1.0.7 or 11.2.0.1
declare
  procedure check_key
  as
    x integer;
    n_id integer;
    s_vc varchar2(255);
  begin
    x := 42 * 3;
    select
      id
    , vc
    into
      n_id
    , s_vc
    from
      t_testloop
    where
      id = x;
  exception
  when NO_DATA_FOUND then
    null;
  end;
begin
  for i in 1..1000 loop
    check_key;
  end loop;
end;
/


The important line above is the commented-out "DBMS_XMLDOM.FREEDOCUMENT" call. If you run the script without calling FREEDOCUMENT on a database version subject to the slowdown (11.1.0.7 or 11.2.0.1), the result usually looks like this:

SQL> declare
  2    procedure check_key
  3    as
  4      x integer;
  5      n_id integer;
  6      s_vc varchar2(255);
  7    begin
  8      x := 42 * 3;
  9      select
 10        id
 11      , vc
 12      into
 13        n_id
 14      , s_vc
 15      from
 16        t_testloop
 17      where
 18        id = x;
 19    exception
 20    when NO_DATA_FOUND then
 21      null;
 22    end;
 23  begin
 24    for i in 1..1000 loop
 25      check_key;
 26    end loop;
 27  end;
 28  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.94

.
.
.
Run 1: Time (in seconds)= .49
Run 2: Time (in seconds)= .08
Run 3: Time (in seconds)= .08
Run 4: Time (in seconds)= .08
Run 5: Time (in seconds)= .05
Run 6: Time (in seconds)= .03
Run 7: Time (in seconds)= .03
Run 8: Time (in seconds)= .03
Run 9: Time (in seconds)= .03
Run 10: Time (in seconds)= .02
Run 11: Time (in seconds)= .03
Run 12: Time (in seconds)= .03
Run 13: Time (in seconds)= .03
Run 14: Time (in seconds)= .03
Run 15: Time (in seconds)= .03
Run 16: Time (in seconds)= .03
Run 17: Time (in seconds)= .02
Run 18: Time (in seconds)= .03
Run 19: Time (in seconds)= .03
Run 20: Time (in seconds)= .03
Run 21: Time (in seconds)= .03
Run 22: Time (in seconds)= .03
Run 23: Time (in seconds)= .03
Run 24: Time (in seconds)= .03
Run 25: Time (in seconds)= .03
.
.
.
Run 287: Time (in seconds)= .03
Run 288: Time (in seconds)= .03
Run 289: Time (in seconds)= .03
Run 290: Time (in seconds)= .03
Run 291: Time (in seconds)= .02
Run 292: Time (in seconds)= .03
Run 293: Time (in seconds)= .03
Run 294: Time (in seconds)= .03
Run 295: Time (in seconds)= .03
Run 296: Time (in seconds)= .03
Run 297: Time (in seconds)= .03
Run 298: Time (in seconds)= .02
Run 299: Time (in seconds)= .03
Run 300: Time (in seconds)= .03
Run 301: Time (in seconds)= .03
Run 302: Time (in seconds)= .03
Run 303: Time (in seconds)= .17
Run 304: Time (in seconds)= .17
Run 305: Time (in seconds)= .17
Run 306: Time (in seconds)= .17
Run 307: Time (in seconds)= .17
Run 308: Time (in seconds)= .17
Run 309: Time (in seconds)= .17
Run 310: Time (in seconds)= .17
Run 311: Time (in seconds)= .18
Run 312: Time (in seconds)= .17
Run 313: Time (in seconds)= .18
Run 314: Time (in seconds)= .18
Run 315: Time (in seconds)= .18
Run 316: Time (in seconds)= .17
Run 317: Time (in seconds)= .19
Run 318: Time (in seconds)= .18
Run 319: Time (in seconds)= .18
Run 320: Time (in seconds)= .19
Run 321: Time (in seconds)= .18
Run 322: Time (in seconds)= .19
.
.
.
Run 973: Time (in seconds)= .82
Run 974: Time (in seconds)= .83
Run 975: Time (in seconds)= .83
Run 976: Time (in seconds)= .82
Run 977: Time (in seconds)= .83
Run 978: Time (in seconds)= .83
Run 979: Time (in seconds)= .82
Run 980: Time (in seconds)= .82
Run 981: Time (in seconds)= .83
Run 982: Time (in seconds)= .82
Run 983: Time (in seconds)= .83
Run 984: Time (in seconds)= .83
Run 985: Time (in seconds)= .82
Run 986: Time (in seconds)= .84
Run 987: Time (in seconds)= .83
Run 988: Time (in seconds)= .86
Run 989: Time (in seconds)= .84
Run 990: Time (in seconds)= .83
Run 991: Time (in seconds)= .85
Run 992: Time (in seconds)= .84
Run 993: Time (in seconds)= .84
Run 994: Time (in seconds)= .85
Run 995: Time (in seconds)= .84
Run 996: Time (in seconds)= .85
Run 997: Time (in seconds)= .84
Run 998: Time (in seconds)= .87
Run 999: Time (in seconds)= .84
Run 1000: Time (in seconds)= .85

PL/SQL procedure successfully completed.

Elapsed: 00:06:00.49

SQL>
SQL> declare
  2    procedure check_key
  3    as
  4      x integer;
  5      n_id integer;
  6      s_vc varchar2(255);
  7    begin
  8      x := 42 * 3;
  9      select
 10        id
 11      , vc
 12      into
 13        n_id
 14      , s_vc
 15      from
 16        t_testloop
 17      where
 18        id = x;
 19    exception
 20    when NO_DATA_FOUND then
 21      null;
 22    end;
 23  begin
 24    for i in 1..1000 loop
 25      check_key;
 26    end loop;
 27  end;
 28  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.02

SQL>


Note how, after a certain number of iterations, execution slows down more and more. Even more interesting, the simple loop of 1,000 SQL executions afterwards takes significantly longer than it did before this block of code was run.

You may want to watch the PGA/UGA consumption via session statistics or V$PROCESS, and watch V$TEMPORARY_LOBS, while this block of code executes.
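A sketch of such a watch query, joining V$PROCESS to V$SESSION (the SID filter is a placeholder for the session running the test):

```sql
select s.sid
     , p.pga_used_mem
     , p.pga_alloc_mem
     , p.pga_freeable_mem
     , p.pga_max_mem
from   v$process p
     , v$session s
where  p.addr = s.paddr
and    s.sid  = 123;  -- placeholder SID
```

On an affected version, repeated sampling shows pga_alloc_mem climbing with every iteration of the loop.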

If you run the same code on 10.2.0.4, you will see the same symptoms of the problem, but execution time remains stable for at least 1,000 iterations. Obviously, at some point the ever-growing memory consumption would get the system into trouble, but that is separate from the problem described here.

If DBMS_XMLDOM.FREEDOCUMENT is called properly, the same code on 11.1.0.7 and 11.2.0.1 shows stable execution times, with no increased PGA/UGA memory consumption and no temporary LOB leak.

Nigel Noble also recently published a blog post about a PL/SQL memory problem that appears in 11.1.0.7 but not in 10.2.0.4.

Although the problems described by Alex Fatkulin and Nigel Noble are not necessarily related to the one described here, they do suggest that Oracle 11 introduced changes in memory management (possibly due to the implementation of Automatic Memory Management (AMM) and the memory_target parameter) that under certain conditions behave differently from previous versions.

On March 24, Randolf Geist will hold a master class in Moscow devoted to solving Oracle database performance problems.

Source: https://habr.com/ru/post/275901/

