📜 ⬆️ ⬇️

Counting the number of records found in PostgreSQL

At work in the new project is used PostgreSQL DBMS. Since I have worked with MySQL so far, now Postgre has to be studied and discovered. The first problem that interested me is the replacement of muscular SQL_CALC_FOUND_ROWS . When using this constant in MySQL, you can get the number of all the records found on the query, even if a query with a limit is indispensable for paging search results when using "heavy" queries.
Immediately ready solution was not found. The forums simply stated that there was no SQL_CALC_FOUND_ROWS in Postgry. Some wrote that it was necessary to count (*). And no more information. But I also knew from MySQL that searching with count () - the query works almost 2 times slower than with SQL_CALC_FOUND_ROWS. I consulted those who use PostgreSQL, tormented google for a day, and as a result received 4 options for replacing SQL_CALC_FOUND_ROWS in PostgreSQL, one of which is quite acceptable in speed.

So, immediately present those four options, which will be discussed. Our target query looks for entries in the table that contain the adf text in the `text` field. We select id 20 records starting from 180.000 in order and the number of total found.
Option 1 . Taken from phpPgAdmin. I just looked at the code for this client for PostgreSQL and looked at how the count was made when they looked at the table data. 2 queries with subqueries are used. Convenience is that you do not need to parse and change the original query to count the number of records found by it.
  select count (id) from (select id from testing where text like '% adf%') as sub;
 select * from (select id from testing where text like '% adf%') as sub limit 20 offset 180000 

Option 2 . The easiest option, which is usually used by beginners in MySQL, as well as in Postgres and other DBMS. 2 requests.
  select count (id) from testing where text like '% adf%';
 select id from testing where text like '% adf%' limit 20 offset 180000 

Option 3 . max_posedon . This is an attempt to emulate muscular SQL_CALC_FOUND_ROWS in Postgres by logic. The truth only works when sorting by id (in this case). This is the id of the last entry in the selection, i.e. records at number 180.000 + 20.
  select id from testing where text like '% adf%' limit 20 offset 180000;
 select count (id) from testing where text like '% adf%' and id> 132629; 

Option 4 . Following the advice of the users of irc.freenode.org, again max_posedon 'a, and this response on the PostgreSQL forum, which was hidden deep in Google. The cursor is used.
  DECLARE curs CURSOR FOR select id from testing where text like '% adf%';
 MOVE FORWARD 180000 IN curs;
 FETCH 20 FROM curs;
 MOVE FORWARD ALL IN curs; 

+ PQcmdTuples () API function Postgres (or $ count = pg_cmdtuples ($ result); in PHP).
Please note that all 4 query options should be executed in one transaction, then they are faster. The fourth option will not work at all if you do not use a single transaction: the cursor is lost.
Now about the speeds . I tested the speed of these four options. Vobschem tests confirmed expectations. But I will note an important fact. All queries were run on a PostgreSQL default configuration that is not optimized for performance. I didn’t have an optimized server at hand. So the numbers can be slightly adjusted when launched with a “good” config. However, the essence will not change.
Test runs were carried out in PHP, 20 repetitions, 2 times for each option. Available php-script that runs the tests. Who cares, there is a complete statistics of samples in Excel. Here I will publish only the pivot table:
Var 1Var 2Var 3Var 4
Wed time (ms)647.41648.25450.64370.67
Attitude to Var 41.751.751.22-

For comparison, the query time without using a transaction:

Results The fastest option 4 using the cursor. Its speed is due to the fact that the “heavy” search query is executed only once. Further operations are performed with the cursor. SQL_CALC_FOUND_ROWS in MySQL works in the same way. Option 3 is lagging 20% ​​of it - an attempt to emulate SQL_CALC_FOUND_ROWS operation in PostgreSQL. Options 1 and 2 operate at approximately the same speed and by 75% (more than 2/3!) Are inferior in speed to the query with the cursor.
PS for pg-guru. First, if this information seems obvious to you, then trust me - for novice PostgreSQL users, this is not at all obvious, and finding this information is not so easy. Secondly, I am waiting for comments from you about tests on a postgre tuning configuration or about your experience, and about other options for counting.

')

Source: https://habr.com/ru/post/30046/


All Articles