MySQL has long been criticized for the lack of session traceability and for having nothing like Oracle's Statspack reports, which show exactly which events load the database. Starting with version 5.5, MySQL finally tackled this problem and shipped a prototype that may in the future grow into comparable tooling. Today's story is about this powerful (unfortunately, so far mostly useful to MySQL developers) tool: PERFORMANCE_SCHEMA. So set performance_schema = ON in the my.cnf configuration file, and let's study its limited but already extremely interesting features.
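For reference, a minimal sketch of the relevant my.cnf fragment. The sizing values below are purely illustrative (they mirror the defaults discussed later), not recommendations:

```ini
[mysqld]
# Enable PERFORMANCE_SCHEMA; the setting is read only at startup,
# so changing it requires a server restart
performance_schema = ON

# Optional sizing knobs, also read only at startup (example values)
performance_schema_max_mutex_instances            = 1000000
performance_schema_events_waits_history_long_size = 10000
```

All performance_schema_* variables share this property: none of them can be changed at runtime, for the reasons explained below.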
First, a little theory
To help you avoid stepping on the rakes that await you when working with this schema, I will make a small digression into theory. If you already know how the schema is organized but do not yet see how to apply it, you can skip straight to the practical part.
First of all: PERFORMANCE_SCHEMA is static. What does this mean for us, and why was it made this way? MySQL gradually learns from its mistakes, and after the experience with INFORMATION_SCHEMA, in particular with the INFORMATION_SCHEMA.PROCESSLIST view, the pattern for working with system views was changed. Running the SHOW PROCESSLIST command (and others like it) can hang the server (bugs #56299 and #61186). The reason for this misunderstanding lies in bug #42930, which states plainly that running this command acquires the LOCK_thread_count mutex. While it is held, MySQL cannot connect a new session, disconnect a finished session, create a new binary log, and so on, since this mutex is critical for the entire engine. The lock exists because threads can allocate memory dynamically: without it, a view reading session parameters could touch already-freed memory, get an error, and bring down the whole server. The PERFORMANCE_SCHEMA source code does not use dynamic allocation calls such as malloc; memory is allocated exactly once, at server startup. Therefore none of the schema's parameters can be changed at runtime. On the one hand, this lets the schema operate without locks; on the other, it causes some configuration headaches, a few of which can be worked around with tools built into the PERFORMANCE_SCHEMA engine itself. Once the server is started, the engine cannot be completely disabled without a restart.
Secondly: since this functionality monitors the entire database, it naturally consumes some resources. How much depends on the schema's configuration, which you can inspect by running:
show variables like 'performance%';
+---------------------------------------------------+---------+
| Variable_name                                     | Value   |
+---------------------------------------------------+---------+
| performance_schema                                | ON      |
| performance_schema_max_cond_instances             | 1000    |
| performance_schema_max_file_instances             | 10000   |
| performance_schema_max_mutex_instances            | 1000000 |
| performance_schema_max_rwlock_instances           | 1000000 |
| performance_schema_max_table_instances            | 50000   |
| performance_schema_max_thread_instances           | 1000    |
| performance_schema_max_file_handles               | 32768   |
| performance_schema_max_table_handles              | 100000  |
| performance_schema_events_waits_history_long_size | 10000   |
| performance_schema_events_waits_history_size      | 10      |
| performance_schema_max_cond_classes               | 80      |
| performance_schema_max_file_classes               | 50      |
| performance_schema_max_mutex_classes              | 200     |
| performance_schema_max_rwlock_classes             | 30      |
| performance_schema_max_thread_classes             | 50      |
+---------------------------------------------------+---------+
The *_instances parameters come first and matter most: sizing them adequately lets you fully monitor everything that happens in the database. The main thing here is not to overdo it, because a mutex + rwlock pair is created for every 16-kilobyte block in the buffer_pool. So if a large amount of RAM is allocated to the database, you will have to economize; but set these parameters too small and you will not be able to monitor all system activity. How to tell whether you have set them correctly is described a little further down.
The *_handles parameters are, I think, self-explanatory: they cap the number of open tables and files that can be monitored.
The *_size parameters depend on the load on your database; you will have to tune them yourself. For example, if you run a query and do not manage to see its statistics in events_waits_history_long before they are overwritten, you should think about increasing performance_schema_events_waits_history_long_size.
The *_classes parameters are, in theory, less important: they cap the number of distinct types of monitored objects. Here you depend on plugin developers: if they decide to instrument some mutexes, they usually state how many classes they need. The InnoDB engine, for example, has only about 35 such classes.
It is important to note that you should not skimp when setting these parameters: if you size the arrays too tightly, PERFORMANCE_SCHEMA becomes terribly slow at finding free array elements. Which is logical, in principle: finding a free element in an array that is 90% empty is easier than in an array that is 90% full.
So, once PERFORMANCE_SCHEMA is enabled, it starts to eat resources... sometimes a great deal of them. For example, here is the result from one production database:
show engine performance_schema status;
+--------------------+-----------------------------------------------------+-----------+
| Type               | Name                                                | Status    |
+--------------------+-----------------------------------------------------+-----------+
...
| performance_schema | events_waits_history.memory                         | 1200000   |
...
| performance_schema | events_waits_history_long.memory                    | 1200000   |
...
| performance_schema | mutex_instances.memory                              | 136000000 |
...
| performance_schema | rwlock_instances.memory                             | 200000000 |
...
| performance_schema | file_instances.memory                               | 6240000   |
...
| performance_schema | events_waits_summary_by_thread_by_event_name.memory | 17280000  |
...
| performance_schema | (pfs_table_share).memory                            | 24400000  |
...
| performance_schema | performance_schema.memory                           | 394468704 |
+--------------------+-----------------------------------------------------+-----------+
If you have 400 MB of RAM to spare, turn it on!
Besides consuming RAM, the schema also slows query execution. By some estimates, with all consumers enabled (more on them below) the degradation reaches about 25% for reads and 19% for writes, while merely setting performance_schema = ON without monitoring any activity costs about 8% for reads. And given that you cannot simply turn it off at runtime, be careful.
The third thing to pay attention to is the quality of monitoring. As I wrote above, if you have a large database, some of the default limits may not suit you. You can learn about this annoying misunderstanding by running:
show status like "performance%";
+------------------------------------------+-------+
| Variable_name                            | Value |
+------------------------------------------+-------+
| Performance_schema_cond_classes_lost     | 0     |
...
| Performance_schema_thread_instances_lost | 0     |
+------------------------------------------+-------+
If any of these counters is above zero, the corresponding parameter must be increased.
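Since the performance_schema_* limits are read only at startup, the fix goes into my.cnf followed by a restart. A hypothetical example, assuming it was Performance_schema_table_instances_lost that came back non-zero (the new value is illustrative; size it to your own instance):

```ini
[mysqld]
performance_schema = ON
# Raised from the 50000 default because
# Performance_schema_table_instances_lost was greater than zero
performance_schema_max_table_instances = 100000
```

After the restart, re-run show status like "performance%"; and confirm the counter stays at zero under normal load.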
Fourth. Once you have figured out what exactly you want to monitor and which parameters are responsible for it, you can save some system resources. Feel free to update this table:
select * from setup_consumers;
+----------------------------------------------+---------+
| NAME                                         | ENABLED |
+----------------------------------------------+---------+
| events_waits_current                         | YES     |
| events_waits_history                         | YES     |
| events_waits_history_long                    | YES     |
...
| file_summary_by_event_name                   | YES     |
| file_summary_by_instance                     | YES     |
+----------------------------------------------+---------+
Or this one:
select * from setup_instruments;
+----------------------------------------------+---------+-------+
| NAME                                         | ENABLED | TIMED |
+----------------------------------------------+---------+-------+
| wait/synch/mutex/sql/PAGE::lock              | YES     | YES   |
| wait/synch/mutex/sql/TC_LOG_MMAP::LOCK_sync  | YES     | YES   |
...
| wait/io/file/innodb/innodb_temp_file         | YES     | YES   |
+----------------------------------------------+---------+-------+
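A sketch of such trimming: disable the history consumers you do not need, and stop instrumenting a whole class of events. The filters below are examples only, not recommendations:

```sql
-- Keep only current-event data; drop the history ring buffers
UPDATE setup_consumers
   SET ENABLED = 'NO'
 WHERE NAME IN ('events_waits_history', 'events_waits_history_long');

-- Stop instrumenting (and timing) all MyISAM file I/O, as an example
UPDATE setup_instruments
   SET ENABLED = 'NO', TIMED = 'NO'
 WHERE NAME LIKE 'wait/io/file/myisam/%';
```

Unlike the startup limits, these tables take effect immediately, which is exactly why they exist: they are the runtime escape hatch of an otherwise static engine.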
Fifth. If you have changed configuration parameters or queries, or made other optimizations, and want to evaluate the result, you do not have to restart the server. You can simply truncate all the summary tables in PERFORMANCE_SCHEMA to get fresh results, wait an hour or so for statistics to accumulate, and clear the current and history tables in the same way.
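A minimal sketch of such a reset, using table names from the listing above (truncating a summary table zeroes its counters rather than deleting rows):

```sql
-- Reset the aggregate counters accumulated so far
TRUNCATE TABLE events_waits_summary_global_by_event_name;
TRUNCATE TABLE events_waits_summary_by_thread_by_event_name;
TRUNCATE TABLE file_summary_by_event_name;
TRUNCATE TABLE file_summary_by_instance;

-- Optionally empty the history ring buffers as well
TRUNCATE TABLE events_waits_history;
TRUNCATE TABLE events_waits_history_long;
```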
The last thing I would like to draw attention to: when analyzing performance, you often want to see waits on all events not in abstract units but in live, tangible seconds. That is easy to arrange.
select * from setup_timers;
+------+------------+
| NAME | TIMER_NAME |
+------+------------+
| wait | CYCLE      |
+------+------------+
With one reservation: if you work in processor cycles, you have the most exact values possible; if you switch to seconds, you introduce a measurement error whose magnitude is unique to each machine and can be discovered from the following table.
select * from performance_timers;
+-------------+-----------------+------------------+----------------+
| TIMER_NAME  | TIMER_FREQUENCY | TIMER_RESOLUTION | TIMER_OVERHEAD |
+-------------+-----------------+------------------+----------------+
| CYCLE       | 2395388059      | 1                | 72             |
| NANOSECOND  | 1000000000      | 1000             | 513            |
| MICROSECOND | 1000000         | 1                | 207            |
| MILLISECOND | 1038            | 1                | 225            |
| TICK        | 101             | 1                | 531            |
+-------------+-----------------+------------------+----------------+
From practice I can say that an overhead of this size (~500 cycles) for converting cycles into nanoseconds consumes about 5% of total test run time. However, I have seen machines where this overhead exceeds 12%. So be careful when interpreting results converted to real time.
And so, we have more or less figured out the setup. Now, where can we see the load data, and most importantly, how?
show tables;
+----------------------------------------------+
| Tables_in_performance_schema                 |
+----------------------------------------------+
...
| events_waits_current                         |
| events_waits_history                         |
| events_waits_history_long                    |
| events_waits_summary_by_instance             |
| events_waits_summary_by_thread_by_event_name |
| events_waits_summary_global_by_event_name    |
...
| file_summary_by_event_name                   |
| file_summary_by_instance                     |
...
| threads                                      |
+----------------------------------------------+
threads - contains references to the processes that also live in INFORMATION_SCHEMA.PROCESSLIST. If a process is dead, its thread row may still live for some time, so if you join against this table, use an outer join.
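A sketch of such a join. Column names vary between versions; here I assume the threads table exposes the processlist id as PROCESSLIST_ID, so check your build with DESCRIBE threads first:

```sql
-- LEFT JOIN so that threads whose session is already gone still show up
SELECT t.THREAD_ID, t.NAME, p.USER, p.HOST, p.COMMAND
  FROM performance_schema.threads AS t
  LEFT JOIN INFORMATION_SCHEMA.PROCESSLIST AS p
    ON p.ID = t.PROCESSLIST_ID;
```

Rows where the right-hand columns come back NULL are exactly the dead-but-lingering threads mentioned above.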
events_waits_current (and its _history, _history_long variants) - contain the most recent wait events; the table sizes are set by the parameters described above. These are the most needed tables for anyone facing obscure locks or trying to understand which part of the code is slow, because each row carries a reference to the source line.
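For instance, a minimal look at what each session is waiting on right now, using the columns of the 5.5 events_waits_current table:

```sql
-- One row per thread: the wait event it is in (or last completed),
-- with the source file and line that produced it
SELECT THREAD_ID, EVENT_NAME, SOURCE, TIMER_WAIT
  FROM events_waits_current
 ORDER BY TIMER_WAIT DESC;
```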
events_waits_summary_* - contain aggregated information, the "average temperature across the hospital" so to speak. The most interesting tables for a database administrator.
file_summary_* - input/output statistics; they let you find the hottest tables. Understand, though, that if you did not set innodb_file_per_table = 1 when creating the database, you will not see anything useful here for the InnoDB engine.
All the event tables also contain the input/output information. Since it is most likely duplicated in file_summary_* in a more digestible form, it is better to filter out event_name like 'wait/io/file/%' when querying them.
That seems to be it for theory; let's move on to practice: why all this is needed and what use can be extracted from this schema.
Practice
Unfortunately, at the moment you cannot extract as much information from this schema as one would like. I will write out the main approaches.
Hottest tables
select substring_index(file_name, '/', -1) file_name, event_name, count_read, count_write
from file_summary_by_instance
where COUNT_READ + COUNT_WRITE > 0
order by COUNT_READ + COUNT_WRITE desc
limit 30;
+-------------------------------------+--------------------------------------+------------+-------------+
| file_name                           | event_name                           | count_read | count_write |
+-------------------------------------+--------------------------------------+------------+-------------+
| proc.MYD                            | wait/io/file/myisam/dfile            | 102716781  | 0           |
| ib_logfile1                         | wait/io/file/innodb/innodb_log_file  | 0          | 1008958     |
| innodb1                             | wait/io/file/innodb/innodb_data_file | 43106      | 958070     |
| event.MYD                           | wait/io/file/myisam/dfile            | 633053     | 126470     |
| #sql149f_301db7_b.frm               | wait/io/file/sql/FRM                 | 180012     | 0           |
...
| event.MYI                           | wait/io/file/myisam/kfile            | 0          | 126466     |
| #sql149f_337a12_3.frm               | wait/io/file/sql/FRM                 | 71372      | 0           |
...
| job_events#P#job_events_201108.ibd  | wait/io/file/innodb/innodb_data_file | 8          | 34662      |
...
| olap_transactions_hourly_amount.frm | wait/io/file/sql/FRM                 | 3637       | 8264       |
+-------------------------------------+--------------------------------------+------------+-------------+
What to pay attention to:
proc.MYD is the dictionary table holding stored procedure code: we write nothing, read a lot. It is worth configuring the driver so that procedures are cached, and enabling a connection pool.
ib_logfile1 - here we are powerless; it is InnoDB being itself.
innodb1 - either we created InnoDB tables before setting innodb_file_per_table = 1, or we wait for MySQL 5.6 for a more detailed explanation of what gets written there.
event.MYD, event.MYI - well, I have already spoken about the quality of this product.
tmp/#sql149f_301db7_b.frm and the like - someone is actively using temporary tables. You need to dig for details in the event tables; you can also poll the processlist, and with luck you will catch them red-handed.
job_events#P#job_events_201108.ibd is one of the hottest InnoDB tables: few reads, many writes. It is worth thinking about the architecture: did we do everything right?
olap_transactions_hourly_amount.par - everything seems fair here: we switch partitions every hour, and the read level roughly matches the write level.
This is just an example of the analysis; your set of tables will differ, but as an administrator you need to know what is being done with each table.
Hottest locks
select event_name, source, sum(timer_wait) timer_wait
from events_waits_history_long
where event_name not like 'wait/io/file%'
group by event_name, source
order by 3 desc
limit 30;
+------------------------------------------------------+--------------------+---------------+
| event_name                                           | source             | timer_wait    |
+------------------------------------------------------+--------------------+---------------+
| wait/synch/cond/sql/COND_queue_state                 | event_queue.cc:765 | 1998358450083 |
| wait/synch/rwlock/myisam/MYISAM_SHARE::key_root_lock | mi_rnext.c:43      | 82397115      |
...
| wait/synch/mutex/sql/LOCK_plugin                     | sql_plugin.cc:744  | 607986        |
+------------------------------------------------------+--------------------+---------------+
(The information in this table churns very quickly on a live database. It is therefore better to use events_waits_summary_by_thread_by_event_name to compute the aggregate picture; it lacks the specific source line, but the information there is more objective. The example above is given precisely to show the source reference. So: off to the sources.)
The gap between first and second place is frightening. Let's go to the source.
if (!thd->killed)
{
  if (!abstime)
    mysql_cond_wait(&COND_queue_state, &LOCK_event_queue);
  else
    mysql_cond_timedwait(&COND_queue_state, &LOCK_event_queue, abstime); // here it is, our line
}
Well, nothing terrible there; it is just an idle wait. But you must agree, digging through MySQL source code is not exactly comme il faut. Yet in the event of a deadlock there is no other way, so I will describe the top of the most frequent problems arising on mutexes.
In recent MySQL versions, the much-promoted innodb_adaptive_hash_index feature appeared. As you know, searching a B-Tree is not the cheapest operation, so if your table is small and its index fits in memory, MySQL builds a copy of that index as a hash map in the buffer; lookups thus become as fast as possible. However, this solution has one pitfall: the cache is guarded by a single synchronization latch. The critical section is entered on both reads and writes, for any table or index, so if you have a large number of competing transactions this latch becomes hot and you will see at the top of the query:
| wait/synch/rwlock/innodb/btr_search_latch | btr0sea.c:XXX | 183897 |
If this is your case, it is worth thinking about disabling this feature.
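A sketch of doing so at runtime; innodb_adaptive_hash_index is a dynamic variable in these versions, so no restart is needed:

```sql
-- Disable the adaptive hash index and its single hot latch
SET GLOBAL innodb_adaptive_hash_index = OFF;
```

It is worth re-checking the lock top afterwards: if btr_search_latch drops out, the trade of faster lookups for less contention paid off.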
InnoDB uses atomic operations, rather than pthread primitives, to implement its mutexes and read-write locks; this implementation is currently considered optimal for multiprocessor systems. Instead of locks, as far as I understand, counters are incremented atomically there. If you notice the wait types wait/synch/mutex/innodb/rw_lock_mutex or wait/synch/mutex/innodb/thr_local_mutex in the tops, do not be lazy: go to the sources and check whether the line producing these waits sits under the #ifdef INNODB_RW_LOCKS_USE_ATOMICS section. If it instead falls into the #else /* INNODB_RW_LOCKS_USE_ATOMICS */ branch, your binary was compiled without atomics support and you are not using all the charm of this technology.
In one of the recent InnoDB plugin versions, a dedicated wait/synch/mutex/innodb/flush_list_mutex was introduced. As you know, dirty blocks hang around in RAM for some time and are flushed to disk only later. Previously the buffer pool mutex was used to synchronize flushing, but that was not a good solution, and a separate critical section was created for this process. This allows better scalability and lets you track the process of writing to disk.
If you see a large number of wait/synch/rwlock/innodb/checkpoint_lock waits, you either need to increase the size of the log files, or your innodb_max_dirty_pages_pct parameter is too low. Try to make the checkpointing process less aggressive.
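A hedged sketch of both knobs; the value below is illustrative, not a recommendation:

```sql
-- Raise the dirty-page watermark so checkpoints fire less aggressively
SET GLOBAL innodb_max_dirty_pages_pct = 75;

-- innodb_log_file_size, by contrast, is NOT dynamic: it has to be changed
-- in my.cnf, followed by a clean shutdown and restart of the server.
```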
Perhaps the most popular one will be wait/synch/mutex/innodb/buf_pool_mutex. This mutex guards loading data into the buffer pool. If it hangs at the top, you have two options: change the database architecture so that less data has to be pulled into RAM, or switch to XtraDB. I have heard the claim that this fork works much more efficiently with large databases than stock InnoDB, but I have not checked it myself.
There are locks that simply tell you things are bad, and nobody can help you with them. For example, kernel_mutex waits are very painful for those with many concurrent transactions, because each newly created transaction must copy the entire list of current transactions into its memory. If the list is long, this lock (vital to other processes) is held for a long time.
From my practice I can say that the InnoDB plugin mutexes are the best documented; for the rest, in most cases, you will have to read the source.
Also, using this schema, you can track locks when the instance hangs completely. But that is another story.
Conclusion
This mechanism, of course, will not replace the much-loved Oracle traces or the power of Statspack, but there is already something to look at. If your database freezes and you do not know where to start, start with PERFORMANCE_SCHEMA: there is always plenty of useful information in it. Especially since Oracle is not standing still: in the 5.6.X releases we have been promised two wonderful views showing usage statistics for all tables and indexes (I have already looked at what they output, and personally I found the information extremely useful; hunting for missing or unused indexes with them is simply wonderful). So: hope, and wait.
P.S. If you have fought through a problem on a particular mutex, do not forget to share it in the comments: at the moment there is no single place with complete information on the mutexes, and I think it would help many of those who have switched to using PERFORMANCE_SCHEMA.