📜 ⬆️ ⬇️

Speed ​​up a SELECT COUNT (*) query for large tables in PostgreSQL

As we all know, SELECT COUNT (*) queries from large tables in PostgreSQL are very slow. I propose a complete solution to speed up this query using functions and triggers.

Consider the example of a table with ~ 200 000 entries:
SELECT COUNT (*) FROM users;
count
- 205043

Now we do the query analysis with enable_seqscan enabled:

SET enable_seqscan TO on; EXPLAIN ANALYZE SELECT COUNT(*) FROM users; 

QUERY PLAN
- Aggregate (cost = 15813.70..15813.71 rows = 1 width = 0) (actual time = 82.907..82.908 rows = 1 loops = 1)
-> Seq Scan on users (cost = 0.00..15296.16 rows = 207016 width = 0) (actual time = 0.014..55.505 rows = 205043 loops = 1)
Planning time: 0.211 ms
Execution time: 82.967 ms

Runtime was: 82.967 ms .

Now with enable_seqscan turned off:
')
 SET enable_seqscan TO off; EXPLAIN ANALYZE SELECT COUNT(*) FROM users; 

QUERY PLAN
- Aggregate (cost = 20156.95..20156.96 rows = 1 width = 0) (actual time = 117.553..117.554 rows = 1 loops = 1)
-> Bitmap Heap Scan on users (cost = 4343.25..19639.41 rows = 207016 width = 0) (actual time = 28.354..92.228 rows = 205043 loops = 1)
Heap Blocks: exact = 13226
-> Bitmap Index Scan on users_pkey (cost = 0.00 ..4291.50 rows = 207016 width = 0) (actual time = 25.247..25.247 rows = 229621 loops = 1)
Planning time: 0.141 ms
Execution time: 117.724 ms

The runtime was 117.724 ms , although in this case postgresql used the users_pkey index, it only got worse.

As you can see, the execution time of these queries is too long.

The solution I developed will reduce the request execution time to less than 1ms . The solution is as follows:

1. for each table, get the count of the number of records with the name TABLE_count_seq .
2. write the rows_count () function that will display the value of the counter or reset the counter.
3. write the rows_count_update_trigger () function that will be launched by table triggers to automatically change the counter for INSERT, DELETE, TRUNCATE requests.
4. we connect triggers which will be:
- increase the counter with INSERT
- decrease with DELETE
- reset when TRUNCATE
5. instead of SELECT COUNT (*), we will use SELECT rows_count ('TABLE')

So, let's begin.

1. Create a counter that will store the current number of records in the table.

 CREATE SEQUENCE users_count_seq MINVALUE 0 START 0; 

2. The rows_count () function - output the counter value or reset.

 CREATE OR REPLACE FUNCTION rows_count( tablename text, reset bool default false ) RETURNS bigint LANGUAGE plpgsql AS $$ DECLARE rows_count bigint; tablename_seq text; BEGIN tablename_seq := tablename || '_count_seq'; --   IF reset IS TRUE THEN EXECUTE 'SELECT setval($1,count(*)) FROM '||tablename USING tablename_seq INTO rows_count; --     ELSE EXECUTE 'SELECT last_value FROM '||tablename_seq INTO rows_count; END IF; RETURN rows_count; END; $$; 

3. The rows_count_update_trigger () function is a trigger function for automatically changing the counter.

 CREATE OR REPLACE FUNCTION rows_count_update_trigger() RETURNS TRIGGER LANGUAGE plpgsql AS $$ DECLARE tablename_seq text; BEGIN tablename_seq := TG_TABLE_NAME || '_count_seq'; --    INSERT IF TG_OP = 'INSERT' THEN EXECUTE 'SELECT nextval($1) FOR UPDATE' USING tablename_seq; RETURN NEW; --    DELETE ELSEIF TG_OP = 'DELETE' THEN EXECUTE 'SELECT setval($1,nextval($1)-2) FOR UPDATE' USING tablename_seq; RETURN OLD; --    TRUNCATE ELSEIF TG_OP = 'TRUNCATE' THEN EXECUTE 'SELECT setval($1,0) FOR UPDATE' USING tablename_seq; RETURN OLD; END IF; END; $$; 

4. Connecting the trigger function to the table.

CREATE TRIGGER rows_count_update_trigger
AFTER INSERT OR DELETE ON users
FOR EACH ROW EXECUTE PROCEDURE rows_count_update_trigger ();
 CREATE CONSTRAINT TRIGGER rows_count_update_trigger AFTER INSERT OR DELETE ON users INITIALLY DEFERRED FOR EACH ROW EXECUTE PROCEDURE rows_count_update_trigger(); CREATE TRIGGER rows_count_reset_trigger AFTER TRUNCATE ON users FOR EACH STATEMENT EXECUTE PROCEDURE rows_count_update_trigger(); 

5. We look at the results using SELECT rows_count ('TABLE')

First you need to reset the counter to keep the current number of records in the table.

We reset the counter:

SELECT rows_count ('users', true);
rows_count
- 205043

We see that the counter is reset and shows the current number of records 205043. A SELECT query rows_count ('users') will return the same result 205043.

Analyzing the query SELECT rows_count ('users'):

 EXPLAIN ANALYZE SELECT rows_count('users'); 

Conclusion:
QUERY PLAN
- Result (cost = 0.00..5.25 rows = 1000 width = 0) (actual time = 0.242..0.244 rows = 1 loops = 1)
Planning time: 0.033 ms
Execution time: 0.260 ms

Runtime was: 0.260 ms.

Another advantage is that the execution time SELECT rows_count ('TABLE') will always be the same for any number of records in the table.

Thanks for attention.

Source: https://habr.com/ru/post/276055/


All Articles