📜 ⬆️ ⬇️

Postgres 9.3 highlights of material features: materialized views

PostgreSQL 9.3 will come with a pretty cool feature called materialized views . The feature was developed by Kevin Grittner and recently committed to:

commit 3bf3ab8c563699138be02f9dc305b7b77a724307
Date: Sunday, March 4, 18:23:31 2013 -0600
Posted by: Kevin Grittner

Added materialized views
')
A materialized view has a rule, just like a regular view, and a heap, as well as other physical properties, like a table. The rule is used only for filling the table, the links in the queries indicate the materialized data.

Implemented minimal functionality, but it can be useful in many cases. Currently, data is loaded only “on demand” with the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW instructions. It is expected that in future releases incremental data updates with different update time settings will be added, and the notion of “fresh” data will be given a clearer definition. At some point, even queries will be able to use the materialized data instead of the data of the tables themselves, but this requires the implementation of the above described functionality in the first place.

Most of the documentation work was done by Robert Haas. Review: Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja. A security review, including a decision on how to implement sepgsql, is expected from KaiGai Kohei.
What is a materialized view? In short, this is a mutant of the table and the usual presentation. A view is a projection of data using a given relationship that does not have a repository. The table is ... table!

The materialized view is somewhere in the middle - it is a projection of tabular data that has its own repository. It uses the query to get its data as a view, but the data is stored as in a regular table. The materialized view can be updated with fresh data by re-executing the query used during the creation phase. In addition, it can be truncated . In the latter case, it remains in a non-scanning state. Also, since the materialized view has its own rich storage, it can use table spaces ( tablespace ) and its own indexes. Pay attention to the fact that it may be unlogged ( approx. Transl .: That is, the data is not written to write-ahead log ).

Together with this feature, 4 new SQL commands are introduced:

• CREATE MATERIALIZED VIEW
• ALTER MATERIALIZED VIEW
• DROP MATERIALIZED VIEW
• REFRESH MATERIALIZED VIEW

CREATE, ALTER and DROP — in this case, these are the usual DDL commands for manipulating the definition of a view. The most interesting team is the REFRESH (there were long disputes within the community about its name). This command can be used to update the materialized view with fresh data by restarting the scanning query. Note that REFRESH can also be used to clear data ( truncate ), although not real, by running with the WITH NO DATA option.

Materialized views have many advantages in various situations: quick access to data that must be obtained from a remote server (reading a file on a postgres server via file_fdw , etc.), using periodically updated data (a caching system), projection of data from ORDER BY from large tables, periodic execution of expensive “JOIN” -s in the background, etc.

I can already imagine some wonderful combinations of data updating procedures and background workers. Who ever said that automatically updating data in materialized views is impossible?

And now, let's see how it works:

postgres = # CREATE TABLE aa AS SELECT generate_series (1,1000000) AS a;
SELECT 1,000,000
postgres = # CREATE VIEW aav AS SELECT * FROM aa WHERE a <= 500000;
CREATE VIEW
postgres = # CREATE MATERIALIZED VIEW aam AS SELECT * FROM aa WHERE a <= 500000;
SELECT 500000

Dimensions for each relationship:

postgres = # SELECT pg_relation_size ('aa') AS tab_size, pg_relation_size ('aav') AS view_size, pg_relation_size ('aam') AS matview_size;
tab_size | view_size | matview_size
---------- + ----------- + --------------
36249600 | 0 | 18137088
( 1 row )

The materialized view uses the storage (in this case, 18 MB) to the extent necessary to store the data selected from the parent table (36 MB in size) during the execution of the request to create the view.
Updating a received view is very easy.

postgres = # DELETE FROM aa WHERE a <= 500000;
DELETE 500,000
postgres = # SELECT count (*) FROM aam;
count
- 500,000
( 1 row )
postgres = # REFRESH MATERIALIZED VIEW aam;
REFRESH MATERIALIZED VIEW
postgres = # SELECT count (*) FROM aam;
count
- 0
( 1 row )

Changes in the parent table were reflected in the materialized view only after the execution of the REFRESH command. Please note that at the time of this writing, REFRESH used an exclusive lock (eh ...).
The materialized view can be transferred to the unscannable state using the REFRESH option WITH the NO NO DATA option.

postgres = # REFRESH MATERIALIZED VIEW aam WITH NO DATA;
REFRESH MATERIALIZED VIEW
postgres = # SELECT count (*) FROM aam;
ERROR: materialized view "aam" has not been populated
HINT: Use the REFRESH MATERIALIZED VIEW command.

There was a new system table matviews, which contain information about the current state of the materialized views.

postgres = # SELECT matviewname, isscannable FROM pg_matviews;
matviewname | isscannable
------------- + -------------
aam | f
( 1 row )

DML queries cannot be performed on the materialized view, since these views may not correspond to the current value of the parent table. Regular views, on the other hand, perform the corresponding query every time it is needed, so it is possible to modify parent tables ( updatable views ) through them.

postgres = # INSERT INTO aam VALUES (1);
ERROR: cannot change materialized view "aam"
postgres = # UPDATE aam SET a = 5;
ERROR: cannot change materialized view "aam"
postgres = # DELETE FROM aam;
ERROR: cannot change materialized view "aam"

Now a few words about the improvement and degradation of performance that you can get using materialized views (given the fact that you can manipulate their indexes). For example, you can very easily improve the performance of sampling queries in materialized views without worrying about the data schema in the parent table at all:

postgres = # EXPLAIN ANALYZE SELECT * FROM aam WHERE a = 1;
QUERY PLAN
-------------------------------------------------- ------------------------------------------------
Seq Scan on aam ( cost = 0.00..8464.00 rows = 1 width = 4 ) ( actual time = 0.060..155.934 rows = 1 loops = 1 )
Filter: ( a = 1 )
Rows Removed by Filter: 499999
Total runtime: 156.047 ms
( 4 rows )
postgres = # CREATE INDEX aam_ind ON aam (a);
CREATE INDEX
postgres = # EXPLAIN ANALYZE SELECT * FROM aam WHERE a = 1;
QUERY PLAN
-------------------------------------------------- -------------------------------------------------- --------------
Index Only Scan using aam_ind on aam ( cost = 0.42..8.44 rows = 1 width = 4 ) ( actual time = 2.096..2.101 rows = 1 loops = 1 )
Index Cond: ( a = 1 )
Heap Fetches: 1
Total runtime: 2.196 ms
( 4 rows )

Note that indexes and constraints (materialized views can have constraints !) On the parent table are not copied to materialized views. For example, a quick query that scans the primary key of a table can result in a deadly long sequential brute force, running on a materialized view.

postgres = # INSERT INTO bb VALUES (generate_series (1,100000));
INSERT 0 100000
postgres = # EXPLAIN ANALYZE SELECT * FROM bb WHERE a = 1;
QUERY PLAN
-------------------------------------------------- -------------------------------------------------- -------------
Index Only Scan using bb_pkey on bb ( cost = 0.29..8.31 rows = 1 width = 4 ) ( actual time = 0.078..0.080 rows = 1 loops = 1 )
Index Cond: ( a = 1 )
Heap Fetches: 1
Total runtime: 0.159 ms
( 4 rows )
postgres = # CREATE MATERIALIZED VIEW bbm AS SELECT * FROM bb;
SELECT 100000
postgres = # EXPLAIN ANALYZE SELECT * FROM bbm WHERE a = 1;
QUERY PLAN
-------------------------------------------------- -------------------------------------------------
Seq Scan on bbm ( cost = 0.00..1776.00 rows = 533 width = 4 ) ( actual time = 0.144..41.873 rows = 1 loops = 1 )
Filter: ( a = 1 )
Rows Removed by Filter: 99999
Total runtime: 41.935 ms
( 4 rows )

Such anti-patterns are definitely not recommended for use on industrial systems!
In general, materialized views are a great feature, especially for use in applications that require caching. Enjoy!

Source: https://habr.com/ru/post/188802/


All Articles