
PostgreSQL 9.4 What's New?



Good day! The release of PostgreSQL 9.4 is not far off, so this is a good moment to go over some of the innovations introduced in this version. The article covers, for the most part, client-side changes.




Changes in auto-updatable views

Auto-updatable views ( autoupdatable views ) are views on which you can perform DML operations ( INSERT , UPDATE , DELETE ) directly. Roughly, a view qualifies when it reads from exactly one table (or one other updatable view) in its FROM clause and contains no aggregate or window functions, DISTINCT , GROUP BY , HAVING , LIMIT / OFFSET or set operations ( UNION , INTERSECT , EXCEPT ).

These views appeared in PostgreSQL 9.3, and version 9.4 adds several improvements. One of them relaxes the restriction that the view may contain no columns produced by functions or operators (in 9.3, a single such column made the whole view non-auto-updatable). In 9.4 the remaining plain columns stay updatable. We illustrate this with an example.
Create a book table and build a simple auto-updatable view on top of it:
CREATE TABLE book (
    id serial NOT NULL,
    name text NOT NULL,
    author text NOT NULL,
    year integer NOT NULL,
    CONSTRAINT pk_book_id PRIMARY KEY ( id ),
    CONSTRAINT uk_book UNIQUE ( name, author )
);

-- the names below are placeholders; the original (Russian) strings were lost in translation
INSERT INTO book ( name, author, year ) VALUES
    ( 'Book β„–1', 'Author A. A.', 2010 ),
    ( 'Book β„–2', 'Author A. A.', 2011 ),
    ( 'Book β„–3', 'Author A. A.', 2012 );

CREATE OR REPLACE VIEW vw_book AS
SELECT b.id, b.name, b.author, b.year,
       ( b.year >= extract ( year FROM current_date ) - 2 ) AS is_new
FROM book b;

Update the data:
-- works: only a plain column is touched
UPDATE vw_book SET name = 'Book β„–10' WHERE name = 'Book β„–1';
-- fails: is_new is a computed column
UPDATE vw_book SET is_new = false WHERE name = 'Book β„–3';
-- check the result
SELECT * FROM vw_book b;
-- see which of the view's columns are updatable
SELECT c.column_name, ( c.is_updatable = 'YES' ) AS is_updatable
FROM information_schema.columns c
WHERE c.table_name = 'vw_book'
ORDER BY c.ordinal_position;

For auto-updatable views, the WITH CHECK OPTION clause has become available. With it, INSERT and UPDATE statements against the view are checked so that the added or modified rows satisfy the view's WHERE condition. For example, suppose we have a view that selects old books, and we need to prevent adding new books through it:
-- a view on top of a view: archived (old) books only
CREATE OR REPLACE VIEW vw_book_archive AS
SELECT b.id, b.name, b.author, b.year
FROM vw_book b
WHERE b.is_new = false
WITH CHECK OPTION;

-- fails: with year = 2014 the book counts as new, so the row violates the check
INSERT INTO vw_book_archive ( name, author, year ) VALUES ( 'Book β„–100', 'Author A. A.', 2014 );
-- succeeds
INSERT INTO vw_book_archive ( name, author, year ) VALUES ( 'Book β„–100', 'Author A. A.', 2010 );

Furthermore, views declared with the security_barrier option no longer lose their auto-updatable status. Let's look at security_barrier in more detail.

Add a column to book that will hold a promotion code:
-- reset the data
DELETE FROM book;
INSERT INTO book ( id, name, author, year ) VALUES
    ( 1, 'Book β„–1', 'Author A. A.', 2010 ),
    ( 2, 'Book β„–2', 'Author A. A.', 2011 ),
    ( 3, 'Book β„–3', 'Author A. A.', 2012 );
-- add the column
ALTER TABLE book ADD COLUMN promotion_code text;
-- fill in the promotion codes
UPDATE book SET promotion_code = 'CODE_' || id;

Let's create a view that returns books (excluding a certain secret book) together with their promotion codes, and a function that prints the name of a book and its code via RAISE NOTICE:
CREATE OR REPLACE VIEW vw_book_list AS
SELECT b.*
FROM book b
WHERE b.name != 'Book β„–1';

CREATE OR REPLACE FUNCTION fn_book_promotion_code ( p_name text, p_code text )
RETURNS boolean AS
$BODY$
BEGIN
    RAISE NOTICE 'Book "%" has code "%"', p_name, p_code;
    RETURN true;
END
$BODY$
LANGUAGE plpgsql COST 100;

Let's select from this view (calling fn_book_promotion_code to print each book's promotion code) and look at the query plan:
SELECT name, promotion_code
FROM vw_book_list l
WHERE fn_book_promotion_code ( l.name, l.promotion_code );
-- query result:
--   Book β„–2   CODE_2
--   Book β„–3   CODE_3
-- notices:
-- NOTICE: Book "Book β„–2" has code "CODE_2"
-- NOTICE: Book "Book β„–3" has code "CODE_3"

EXPLAIN ANALYZE
SELECT name, promotion_code
FROM vw_book_list l
WHERE fn_book_promotion_code ( l.name, l.promotion_code );
-- Seq Scan on book b  (cost=0.00..1.79 rows=1 width=23) (actual time=0.185..0.217 rows=2 loops=1)
--   Filter: ((name <> 'Book β„–1'::text) AND fn_book_promotion_code(name, promotion_code))
--   Rows Removed by Filter: 1
-- Planning time: 0.064 ms
-- Execution time: 0.229 ms

Note that the rows are read sequentially, and the filter first checks the book's name and only then calls fn_book_promotion_code. Now let's lower the function's declared cost and see what changes in the query result and plan:
CREATE OR REPLACE FUNCTION fn_book_promotion_code ( p_name text, p_code text )
RETURNS boolean AS
$BODY$
BEGIN
    RAISE NOTICE 'Book "%" has code "%"', p_name, p_code;
    RETURN true;
END
$BODY$
LANGUAGE plpgsql COST 0.01;

SELECT name, promotion_code
FROM vw_book_list l
WHERE fn_book_promotion_code ( l.name, l.promotion_code );
-- the query result is the same:
--   Book β„–2   CODE_2
--   Book β„–3   CODE_3
-- but the notices now include the secret book:
-- NOTICE: Book "Book β„–1" has code "CODE_1"
-- NOTICE: Book "Book β„–2" has code "CODE_2"
-- NOTICE: Book "Book β„–3" has code "CODE_3"

EXPLAIN ANALYZE
SELECT name, promotion_code
FROM vw_book_list l
WHERE fn_book_promotion_code ( l.name, l.promotion_code );
-- Seq Scan on book b  (cost=0.00..1.04 rows=1 width=23) (actual time=0.215..0.240 rows=2 loops=1)
--   Filter: (fn_book_promotion_code(name, promotion_code) AND (name <> 'Book β„–1'::text))
--   Rows Removed by Filter: 1
-- Planning time: 0.064 ms
-- Execution time: 0.254 ms

Since the function's declared cost is now tiny, the planner calls it first and checks the book's name afterwards, which lets the caller learn the secret book's code. To prevent this, the vw_book_list view should be created with the security_barrier option:
CREATE OR REPLACE VIEW vw_book_list
WITH ( security_barrier = true ) AS
SELECT b.*
FROM book b
WHERE b.name != 'Book β„–1';

SELECT name, promotion_code
FROM vw_book_list l
WHERE fn_book_promotion_code ( l.name, l.promotion_code );
-- query result:
--   Book β„–2   CODE_2
--   Book β„–3   CODE_3
-- notices:
-- NOTICE: Book "Book β„–2" has code "CODE_2"
-- NOTICE: Book "Book β„–3" has code "CODE_3"

EXPLAIN ANALYZE
SELECT name, promotion_code
FROM vw_book_list l
WHERE fn_book_promotion_code ( l.name, l.promotion_code );
-- Subquery Scan on l  (cost=0.00..1.06 rows=1 width=23) (actual time=0.078..0.106 rows=2 loops=1)
--   Filter: fn_book_promotion_code(l.name, l.promotion_code)
--   ->  Seq Scan on book b  (cost=0.00..1.04 rows=2 width=52) (actual time=0.009..0.010 rows=2 loops=1)
--         Filter: (name <> 'Book β„–1'::text)
--         Rows Removed by Filter: 1
-- Planning time: 0.069 ms
-- Execution time: 0.122 ms

Now the planner has put the filter with the function call into a separate subquery, applied only after the view's own WHERE condition. The vw_book_list view remains auto-updatable:
UPDATE vw_book_list SET promotion_code = 'CODE_555' WHERE name = 'Book β„–2';
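A related detail worth knowing: the barrier applies to any function the planner cannot prove harmless. If you trust a function not to leak its arguments, marking it LEAKPROOF (a superuser-only operation) allows the planner to push it below the barrier again. A minimal sketch:

-- superuser only; only do this for functions that genuinely leak nothing
ALTER FUNCTION fn_book_promotion_code ( text, text ) LEAKPROOF;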


Changes in materialized views

The main problem with materialized views, which appeared in PostgreSQL 9.3, was that refreshing one took an ACCESS EXCLUSIVE lock, making it impossible to query the view during the refresh. In 9.4 the REFRESH MATERIALIZED VIEW command gained a CONCURRENTLY option, with which the refresh takes an EXCLUSIVE lock instead, compatible with the ACCESS SHARE locks acquired by concurrent SELECT queries. Behind the scenes, a temporary version of the view is built, compared with the current one, and the differences are applied with the corresponding INSERT and DELETE commands. This approach requires a UNIQUE index on one or more columns of the materialized view. Let's look at it in action:
-- authors
CREATE TABLE author (
    id serial NOT NULL,
    first_name text NOT NULL,
    last_name text NOT NULL,
    CONSTRAINT pk_author_id PRIMARY KEY ( id ),
    CONSTRAINT uk_author_name UNIQUE ( first_name, last_name )
);

-- publications
CREATE TABLE publication (
    id serial NOT NULL,
    author_id integer NOT NULL,
    name text NOT NULL,
    CONSTRAINT pk_publication_id PRIMARY KEY ( id ),
    CONSTRAINT fk_publication_author_id FOREIGN KEY ( author_id ) REFERENCES author ( id ),
    CONSTRAINT uk_publication_name UNIQUE ( author_id, name )
);

-- test data (the names below are placeholders; the original strings were lost in translation)
INSERT INTO author ( first_name, last_name ) VALUES ( 'Ivan', 'Ivanov' );  -- gets id = 1
INSERT INTO author ( first_name, last_name ) VALUES ( 'Petr', 'Petrov' );  -- gets id = 2
INSERT INTO publication ( author_id, name )
VALUES ( 1, 'Publication (β„–' || generate_series ( 1, 1000000 ) || ')' );
INSERT INTO publication ( author_id, name )
VALUES ( 2, 'Some publication' ), ( 2, 'Another publication' );

-- the materialized view and the unique index it needs
CREATE MATERIALIZED VIEW mvw_publication AS
SELECT p.id, a.first_name || ' ' || a.last_name AS author_name, p.name
FROM publication p
INNER JOIN author a ON a.id = p.author_id;
CREATE UNIQUE INDEX idx_mvw_publication_id ON mvw_publication ( id );

-- refresh the view concurrently
REFRESH MATERIALIZED VIEW CONCURRENTLY mvw_publication;
-- ...and meanwhile, in another session:
-- query the view (the query is not blocked)
SELECT * FROM mvw_publication;
-- inspect the locks
SELECT l.mode
FROM pg_locks l
INNER JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'mvw_publication';
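Without a suitable unique index, the concurrent refresh is rejected outright. A quick check (the exact error wording may differ slightly between minor versions):

DROP INDEX idx_mvw_publication_id;
REFRESH MATERIALIZED VIEW CONCURRENTLY mvw_publication;
-- ERROR:  cannot refresh materialized view "public.mvw_publication" concurrently
-- HINT:  Create a unique index with no WHERE clause on one or more columns of the materialized view.
-- (recreate the index afterwards to allow concurrent refreshes again)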


ALTER SYSTEM command to set configuration parameters

This command makes it possible to change server configuration parameters via an SQL query. Unlike SET and set_config, which affect only the session (or transaction), the change is persistent: the parameter is written to the $PGDATA/postgresql.auto.conf file, which the server reads (at startup or upon receiving a SIGHUP signal) after reading postgresql.conf. Examples:
-- set a parameter
ALTER SYSTEM SET log_min_duration_statement = '1min';
-- alternative syntax
ALTER SYSTEM SET log_min_duration_statement TO '2min';
-- reset the parameter to its default
ALTER SYSTEM SET log_min_duration_statement TO DEFAULT;
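ALTER SYSTEM only writes the file; the server still has to re-read its configuration before the change takes effect. A small follow-up, assuming sufficient privileges:

-- ask the server to reload the configuration (equivalent to sending SIGHUP)
SELECT pg_reload_conf();
-- verify the effective value
SHOW log_min_duration_statement;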


WITH ORDINALITY option for set-returning functions

The unnest function (and other set-returning functions) gained a WITH ORDINALITY option, which appends a column with the row number. Also, unnest can now take several arrays, each of which becomes a separate column:
SELECT *
FROM unnest ( ARRAY['cat', 'dog', 'mouse'], ARRAY['Tom', 'Jack', 'Lili'] )
     WITH ORDINALITY AS t ( animal, name, i );
-- result:
--   cat    Tom    1
--   dog    Jack   2
--   mouse  Lili   3

-- the pre-9.4 way to number the rows
SELECT *, row_number() OVER () AS i
FROM unnest ( ARRAY['cat', 'dog', 'mouse'], ARRAY['Tom', 'Jack', 'Lili'] ) AS t ( animal, name );
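WITH ORDINALITY is not specific to unnest; it works for any function placed in the FROM clause. A small illustration:

-- number the elements of a JSON array
SELECT * FROM json_array_elements ( '[ "a", "b", "c" ]'::json ) WITH ORDINALITY;
-- number the rows of generate_series
SELECT * FROM generate_series ( 10, 30, 10 ) WITH ORDINALITY AS t ( value, i );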


New features for data aggregation

In addition to the standard aggregate functions ( sum , avg , corr , etc.), ordered-set and hypothetical-set aggregates have been added:
-- test data
CREATE TABLE salary AS
SELECT ( random ( ) * 100 + 2000 )::int AS value
FROM generate_series ( 1, 100 );

-- the most frequent value
SELECT mode() WITHIN GROUP ( ORDER BY value DESC ) FROM salary;
-- discrete median: the first value at or past the requested fraction
SELECT percentile_disc ( 0.5 ) WITHIN GROUP ( ORDER BY value ) FROM salary;
-- continuous median (interpolated between neighboring values when needed)
SELECT percentile_cont ( 0.5 ) WITHIN GROUP ( ORDER BY value ) FROM salary;
-- relative position of the hypothetical value 2026 (ranges from 1/n to 1)
SELECT s.value, cume_dist ( 2026 ) WITHIN GROUP ( ORDER BY value )
FROM salary s
GROUP BY s.value;
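Two more members of the same family may be useful: the hypothetical-set rank, and percentile_cont taking an array of fractions:

-- where would a salary of 2026 rank among the existing values?
SELECT rank ( 2026 ) WITHIN GROUP ( ORDER BY value ) FROM salary;
-- several percentiles in one pass
SELECT percentile_cont ( ARRAY[ 0.25, 0.5, 0.75 ] ) WITHIN GROUP ( ORDER BY value ) FROM salary;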

There is also a FILTER clause that restricts which rows are fed into an aggregate function:
-- average salary, counting only values of at least 2050
SELECT avg ( s.value ) FILTER ( WHERE s.value >= 2050 ) FROM salary s;
-- the pre-9.4 equivalent via CASE WHEN …
SELECT avg ( CASE WHEN s.value >= 2050 THEN s.value ELSE NULL END ) FROM salary s;
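FILTER really pays off when several conditional aggregates are needed in a single pass over the table; a small sketch:

-- several conditional counts in one scan of salary
SELECT count ( * ) FILTER ( WHERE value < 2050 )  AS below_2050,
       count ( * ) FILTER ( WHERE value >= 2050 ) AS at_least_2050
FROM salary;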


GIN and GiST index improvements

GIN indexes, used among other things for full-text search, have become faster (especially for multi-key searches) and more compact. GiST gained support for the inet and cidr types:
-- machines and their IP addresses
CREATE TABLE machine ( ip cidr );
INSERT INTO machine ( ip ) VALUES
    ( '192.168.1.1'::cidr ),
    ( '192.168.1.10'::cidr ),
    ( '192.168.2.11'::cidr );
-- a GiST index using the inet operator class
CREATE INDEX idx_machine_ip ON machine USING GiST ( ip inet_ops );
-- all addresses overlapping the 192.168.1.0/24 subnet
SELECT * FROM machine WHERE ip && '192.168.1.0/24'::cidr;
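On three rows the planner will prefer a sequential scan; to see the index actually used, one can temporarily discourage sequential scans (a diagnostic trick, not a production setting):

SET enable_seqscan = off;
EXPLAIN SELECT * FROM machine WHERE ip && '192.168.1.0/24'::cidr;
-- expect an Index Scan (or Bitmap Index Scan) on idx_machine_ip
SET enable_seqscan = on;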


Improved EXPLAIN command output

The EXPLAIN output now shows the grouping key when the plan contains grouping. In addition, the planning time is now reported:
EXPLAIN ANALYZE
SELECT s.value, count ( * )
FROM salary s
GROUP BY s.value
HAVING count ( * ) >= 2;
-- HashAggregate  (cost=2.75..3.60 rows=68 width=4) (actual time=0.045..0.053 rows=26 loops=1)
--   Group Key: value
--   Filter: (count(*) >= 2)
--   Rows Removed by Filter: 42
--   ->  Seq Scan on salary s  (cost=0.00..2.00 rows=100 width=4) (actual time=0.007..0.015 rows=100 loops=1)
-- Planning time: 0.042 ms
-- Execution time: 0.082 ms

If a bitmap heap scan is used, the output now shows how many heap blocks were stored with exact tuple pointers ( exact ) and how many degraded to whole-page granularity ( lossy ) because the bitmap did not fit into work_mem :
-- more test data
INSERT INTO salary ( value )
SELECT ( random ( ) * 10000 + 1000 )::int AS value
FROM generate_series ( 1, 1000000 );
CREATE INDEX idx_salary_value ON salary ( value );

SET work_mem = '64kB';
EXPLAIN ANALYZE SELECT * FROM salary s WHERE s.value BETWEEN 2010 AND 2020;
-- Bitmap Heap Scan on salary s  (cost=28.83..2739.72 rows=1210 width=4) (actual time=0.370..17.824 rows=1030 loops=1)
--   Recheck Cond: ((value >= 2010) AND (value <= 2020))
--   Rows Removed by Index Recheck: 96457
--   Heap Blocks: exact=486 lossy=429
--   ->  Bitmap Index Scan on idx_salary_value  (cost=0.00..28.53 rows=1210 width=0) (actual time=0.286..0.286 rows=1030 loops=1)
--         Index Cond: ((value >= 2010) AND (value <= 2020))
-- Planning time: 0.098 ms
-- Execution time: 17.920 ms

SET work_mem = '32MB';
EXPLAIN ANALYZE SELECT * FROM salary s WHERE s.value BETWEEN 2010 AND 2020;
-- Bitmap Heap Scan on salary s  (cost=28.83..2739.72 rows=1210 width=4) (actual time=0.283..1.214 rows=1030 loops=1)
--   Recheck Cond: ((value >= 2010) AND (value <= 2020))
--   Heap Blocks: exact=915
--   ->  Bitmap Index Scan on idx_salary_value  (cost=0.00..28.53 rows=1210 width=0) (actual time=0.157..0.157 rows=1030 loops=1)
--         Index Cond: ((value >= 2010) AND (value <= 2020))
-- Planning time: 0.076 ms
-- Execution time: 1.269 ms


Warming up the cache

The new pg_prewarm extension provides a function of the same name, which forcibly loads a given relation into the cache (the OS page cache or PostgreSQL's shared buffers). Let's see how this works.

First, install the extension and create a test table:
CREATE EXTENSION pg_prewarm;

CREATE TABLE big AS
SELECT array_to_string ( array_agg ( t.x ), '' ) || '#' || generate_series ( 1, 10000 ) AS value
FROM ( SELECT 'a' || generate_series ( 1, 1000 ) AS x ) t;

-- enlarge the shared buffers so the table fits (this parameter needs a restart, which follows anyway)
ALTER SYSTEM SET shared_buffers = '256MB';

Now stop the PostgreSQL server, flush and drop the OS caches, and start the server again (the exact commands depend on your OS):
/etc/init.d/postgresql-9.4 stop
sync
/etc/init.d/postgresql-9.4 start
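Note that sync only forces dirty pages to disk; it does not empty the OS page cache. On Linux the cache can be dropped explicitly between the stop and the start (requires root; adjust for your OS):

/etc/init.d/postgresql-9.4 stop
sync
echo 3 > /proc/sys/vm/drop_caches   # drop the Linux page cache
/etc/init.d/postgresql-9.4 start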

Let's query the test table a few times, observing where the data comes from:
-- first query
EXPLAIN ( ANALYZE, BUFFERS ) SELECT * FROM big;
-- Seq Scan on big  (cost=0.00..76047.00 rows=5000000 width=8) (actual time=0.013..448.978 rows=5000000 loops=1)
--   Buffers: shared read=26047
-- Planning time: 0.081 ms
-- Execution time: 689.083 ms

-- second query
EXPLAIN ( ANALYZE, BUFFERS ) SELECT * FROM big;
-- Seq Scan on big  (cost=0.00..76047.00 rows=5000000 width=8) (actual time=0.044..449.973 rows=5000000 loops=1)
--   Buffers: shared hit=32 read=26015
-- Planning time: 0.027 ms
-- Execution time: 692.045 ms

It is clearly seen that at first the cache is empty and the data is read from disk ( shared read ), but each subsequent query fills the cache a bit more ( shared hit ).

Stop the PostgreSQL server again, reset the OS cache, and start the server. This time, before looking at the EXPLAIN result, fill the cache with the test table's data:
-- loads 26047 blocks
SELECT pg_prewarm ( 'big' );

EXPLAIN ( ANALYZE, BUFFERS ) SELECT * FROM big;
-- Seq Scan on big  (cost=0.00..76047.00 rows=5000000 width=8) (actual time=0.007..407.269 rows=5000000 loops=1)
--   Buffers: shared hit=26047
-- Planning time: 0.129 ms
-- Execution time: 642.834 ms

All data is already in the cache.
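pg_prewarm can also target the OS cache instead of shared buffers via its second argument; the supported modes are prefetch (asynchronous OS readahead), read (synchronous OS read) and buffer (into shared buffers, the default):

-- asynchronous readahead into the OS page cache
SELECT pg_prewarm ( 'big', 'prefetch' );
-- explicit load into PostgreSQL shared buffers (same as the default)
SELECT pg_prewarm ( 'big', 'buffer' );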

Triggers on foreign tables

PostgreSQL 9.3 introduced the postgres_fdw extension, which gives read/write access to tables located in another database, so-called foreign tables. In 9.4 it became possible to create triggers on them.

Create a foreign table:
-- in the test0 database
CREATE TABLE city (
    country text NOT NULL,
    name text NOT NULL,
    CONSTRAINT uk_city_name UNIQUE ( country, name )
);

-- in the test1 database
CREATE EXTENSION postgres_fdw;
-- the remote server
CREATE SERVER fdb_server FOREIGN DATA WRAPPER postgres_fdw
OPTIONS ( host 'localhost', port '5432', dbname 'test0' );
-- the user mapping
CREATE USER MAPPING FOR PUBLIC SERVER fdb_server OPTIONS ( password 'pwd' );
-- the foreign table itself
CREATE FOREIGN TABLE fdb_city ( country text, name text )
SERVER fdb_server OPTIONS ( table_name 'city' );

Create a trigger that forbids adding a city named N/A, and check it:
-- in the test1 database
CREATE OR REPLACE FUNCTION tfn_city_change ( ) RETURNS trigger AS
$BODY$
BEGIN
    IF ( NEW.name = 'N/A' ) THEN
        RAISE EXCEPTION 'City with name "N/A" not allowed';
    END IF;
    RETURN NEW;
END
$BODY$
LANGUAGE plpgsql VOLATILE;

CREATE TRIGGER tr_city_change
BEFORE INSERT ON fdb_city
FOR EACH ROW EXECUTE PROCEDURE tfn_city_change ( );

-- succeeds
INSERT INTO fdb_city ( country, name ) VALUES ( 'USA', 'New York' );
-- fails: the trigger raises an exception
INSERT INTO fdb_city ( country, name ) VALUES ( 'USA', 'N/A' );

But in the test0 database this trigger is β€œnot visible”, so any city name can be inserted directly:
-- in the test0 database
-- succeeds: the trigger lives on the foreign table in test1
INSERT INTO city ( country, name ) VALUES ( 'Italy', 'N/A' );


Changes to json and the new jsonb data type

Undoubtedly, the new jsonb type was the most anticipated feature of PostgreSQL 9.4. Syntactically it does not differ from json , but the data is stored in a decomposed binary format, which makes input somewhat slower but processing much faster. As a general rule, it is better to store JSON as jsonb .
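One visible consequence of the binary representation: json stores the input text verbatim, while jsonb normalizes it, dropping insignificant whitespace and keeping only the last of any duplicate keys:

SELECT '{ "b": 1, "a": 2, "a": 3 }'::json;
-- { "b": 1, "a": 2, "a": 3 }
SELECT '{ "b": 1, "a": 2, "a": 3 }'::jsonb;
-- {"a": 3, "b": 1}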

For jsonb it is possible to create indexes ( GIN , btree and hash ). GIN has two operator classes for it:

jsonb_ops (the default), which supports the @> , ? , ?| and ?& operators;

jsonb_path_ops , which supports only the @> operator.

Although jsonb_path_ops supports only one operator, it usually performs better and, as a rule, takes up less space for the same data than jsonb_ops .

New functions for working with JSON (most of which have jsonb_* counterparts) include json_typeof , json_build_object , json_build_array , json_object , json_to_record , json_to_recordset and json_array_elements_text .

JSON and PostgreSQL types map to each other as follows (null has no equivalent, since NULL in PostgreSQL has different semantics):

JSON type    PostgreSQL type
string       text
number       numeric
boolean      boolean
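In practice the mapping shows up when extracting scalars: the -> operator returns json / jsonb , while ->> returns text , which can then be cast to the PostgreSQL type from the table above:

SELECT ( '{ "pages": 500 }'::jsonb ->> 'pages' )::numeric;   -- 500 as numeric
SELECT ( '{ "gold": true }'::jsonb ->> 'gold' )::boolean;    -- true as boolean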

Let's try to work with JSON:
-- the books table (drop the book table from the earlier examples first, if it is still around)
DROP TABLE IF EXISTS book;
CREATE TABLE book (
    id serial NOT NULL,
    name text NOT NULL,
    params jsonb NOT NULL DEFAULT '{}'
);

-- test data
INSERT INTO book ( name, params )
SELECT 'Book #' || t.x,
       ( '{ "pages": ' || 500 + ( t.x % 500 )
         || CASE t.x % 1000 WHEN 0 THEN ', "gold_edition": true' ELSE '' END
         || ' }' )::jsonb
FROM ( SELECT generate_series ( 1, 1000000 ) AS x ) t;

-- count the "gold" editions
SELECT count ( * ) FROM book WHERE params @> '{ "gold_edition": true }'::jsonb;
-- does this book have a "pages" key?
SELECT count ( * ) > 0 FROM book WHERE name = 'Book #1' AND params ? 'pages';
-- get the number of pages
SELECT params -> 'pages' FROM book WHERE name = 'Book #11';
-- the JSON type of the "pages" value
SELECT jsonb_typeof ( params -> 'pages' ) FROM book WHERE name = 'Book #11';
-- a GIN index with the jsonb_path_ops operator class
CREATE INDEX idx_book_params ON book USING gin ( params jsonb_path_ops );

The @> and ? operators (and some others) are specific to the jsonb type. A more detailed review of jsonb and its performance deserves a separate article.
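To see the operator-class difference in practice: the jsonb_path_ops index created above can serve containment queries but not key-existence ones; the default class supports both at the cost of a larger index. A sketch:

-- can use idx_book_params (jsonb_path_ops supports @>)
EXPLAIN SELECT count ( * ) FROM book WHERE params @> '{ "gold_edition": true }'::jsonb;
-- falls back to a sequential scan (? is not supported by jsonb_path_ops)
EXPLAIN SELECT count ( * ) FROM book WHERE params ? 'gold_edition';
-- a jsonb_ops index would cover both operators
CREATE INDEX idx_book_params_ops ON book USING gin ( params );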

This article did not cover server-side improvements (in particular, those related to SSL, VACUUM and background workers); see the changelog for more information. In conclusion, PostgreSQL is confidently moving in both the relational and the NoSQL direction, gradually adding new features and refining them in subsequent versions, which is good news.

Thank you for your attention.

Source: https://habr.com/ru/post/234909/

