Part 2. TABLESAMPLEPart 3. GROUPING SETS, CUBE, ROLLUPIn the 4th quarter of 2015, PostgreSQL 9.5 is expected to be released. As always, the new version,
in addition to new bugs, brings new features and "buns". In this article, two of them will be considered, namely INSERT ... ON CONFLICT DO NOTHING / UPDATE and Row-level security. The second alpha version has already been released, so the most impatient can install it and try new functionality.
You can download it
here.INSERT ... ON CONFLICT DO NOTHING / UPDATE
He is colloquially UPSERT. Allows in case of a conflict when inserting to update the fields or ignore the error.
What was previously proposed to be implemented using the
stored function will now be available out of the box. You can use the
ON CONFLICT DO NOTHING / UPDATE clause in an
INSERT statement . In this case, the expression indicates separately
conflict_target (by which field / condition the conflict will be considered) and
conflict_action (what to do when the conflict occurred:
DO NOTHING or
DO UPDATE SET ).
')
The complete syntax for an
INSERT statement is:
[ WITH [ RECURSIVE ] with_query [, ...] ] INSERT INTO table_name [ AS alias ] [ ( column_name [, ...] ) ] { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query } [ ON CONFLICT [ conflict_target ] conflict_action ] [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ] where conflict_target can be one of: ( { column_name_index | ( expression_index ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ] ON CONSTRAINT constraint_name and conflict_action is one of: DO NOTHING DO UPDATE SET { column_name = { expression | DEFAULT } | ( column_name [, ...] ) = ( { expression | DEFAULT } [, ...] ) | ( column_name [, ...] ) = ( sub-SELECT ) } [, ...] [ WHERE condition ]
For us, the fun begins after ON CONFLICT.
Let's look at examples. Create a table in which the credentials of certain persons will lie:
CREATE TABLE account ( id bigserial, name varchar, surname varchar, address varchar, PRIMARY KEY (id), CONSTRAINT unique_person UNIQUE (name, surname, address) ); Query returned successfully with no result in 31 ms.
Execute the insert request
INSERT INTO account (id, name, surname, address) VALUES (1, '', '', ', ') ON CONFLICT (id) DO NOTHING; Query returned successfully: one row affected, 12 ms execution time. SELECT * FROM ACCOUNT;
id | name | surname | address |
---|
one | Vasya | Pupkin | Moscow Kremlin |
Here,
conflict_target is
(id) and
conflict_action is
DO NOTHING .
If you try to execute this query a second time, then the insertion will not occur, and it will not give you any error message:
Query returned successfully: 0 rows affected, 12 ms execution time.
If we did not specify
ON CONFLICT (id) DO NOTHING , we would get an error:
INSERT INTO account (id, name, surname, address) VALUES (1, '', '', ', '); ********** Error ********** ERROR: duplicate key value violates unique constraint "account_pkey" SQL state: 23505 Detail: Key (id)=(1) already exists.
The same behavior (as for
ON CONFLICT (id) DO NOTHING ) will be for the query:
INSERT INTO account (id, name, surname, address) VALUES (DEFAULT, '', '', ', ') ON CONFLICT (name, surname, address) DO NOTHING; Query returned successfully: 0 rows affected, 12 ms execution time.
In it, we already take the
id value by default (from a sequence), but indicate another
conflict_target — in the three fields that are subject to a unique constraint.
As mentioned above, you can also specify a
conflict_target using the
ON CONSTRAINT construct , specifying the constraint name itself:
INSERT INTO account (id, name, surname, address) VALUES (DEFAULT, '', '', ', ') ON CONFLICT ON CONSTRAINT unique_person DO NOTHING; Query returned successfully: 0 rows affected, 11 ms execution time.
This is especially useful if you have an exclusion constraint, which you can refer to only by name, and not by a set of columns, as is the case with the restriction of uniqueness.
If you have built a partial unique index, then this can also be specified in the condition. Let only our people named Vasya will have unique combinations of the surname + address in our table:
ALTER TABLE account DROP CONSTRAINT unique_person; CREATE UNIQUE INDEX unique_vasya ON account (surname, address) WHERE name='';
Then we can write this query:
INSERT INTO account (id, name, surname, address) VALUES (DEFAULT, '', '', ', ') ON CONFLICT (surname, address) WHERE name='' DO NOTHING; Query returned successfully: 0 rows affected, 12 ms execution time.
And finally, if you want
DO NOTHING to be triggered by any uniqueness / exception conflict on insertion, this can be written as follows:
INSERT INTO account (id, name, surname, address) VALUES (DEFAULT, '', '', ', ') ON CONFLICT DO NOTHING; Query returned successfully: 0 rows affected, 12 ms execution time.
It is worth noting that it is impossible to specify multiple
conflict_action , so if one of them is specified and the other is triggered, then there will be an error when pasting:
INSERT INTO account (id, name, surname, address) VALUES (DEFAULT, '', '', ', ') ON CONFLICT (id) DO NOTHING; ********** Error ********** ERROR: duplicate key value violates unique constraint "unique_person" SQL state: 23505 Detail: Key (name, surname, address)=(, , , ) already exists.
Let us turn to the possibilities of
DO UPDATE SET .
For
DO UPDATE SET , unlike
DO NOTHING, an indication of
conflict_action is required.
The
DO UPDATE SET construct updates the fields that are listed in it. The values ​​of these fields can be set explicitly, set by default, obtained from a subquery, or taken from the special expression
EXCLUDED , from which you can take data that was originally proposed for insertion.
INSERT INTO account (id, name, surname, address) VALUES (1, '', '', ', ') ON CONFLICT (id) DO UPDATE SET name='', surname=''; Query returned successfully: one row affected, 11 ms execution time. SELECT * FROM ACCOUNT;
id | name | surname | address |
---|
one | Petya | Petrov | Moscow Kremlin |
INSERT INTO account AS a (id, name, surname, address) VALUES (1, '', '', ', ') ON CONFLICT (id) DO UPDATE SET name=EXCLUDED.name || ' ( ' || a.name || ')', surname=EXCLUDED.surname || ' ( ' || a.surname || ')'; Query returned successfully: one row affected, 13 ms execution time. SELECT * FROM ACCOUNT;
id | name | surname | address |
---|
one | Petya (former Vasya) | Petrov (former Pupkin) | Moscow Kremlin |
INSERT INTO account (id, name, surname, address) VALUES (1, '', '', ', ') ON CONFLICT (id) DO UPDATE SET name=DEFAULT, surname=DEFAULT; Query returned successfully: one row affected, 11 ms execution time. SELECT * FROM ACCOUNT;
id | name | surname | address |
---|
one | Null | Null | Moscow Kremlin |
INSERT INTO account (id, name, surname, address) VALUES (1, '', '', ', ') ON CONFLICT (id) DO UPDATE SET name=(SELECT some_field FROM other_table LIMIT 1);
The
WHERE clause can also be used. For example, we want the
name field not to be updated if the
address field in the table row already contains the text “Kremlin”, otherwise it would be added:
INSERT INTO account AS a (id, name, surname, address) VALUES (1, '', '', ', ') ON CONFLICT (id) DO UPDATE SET name=EXCLUDED.name WHERE a.name not like '%%'; Query returned successfully: 0 rows affected, 12 ms execution time.
And if we want the
name field not to be updated, if the
address field in the inserted data contains the text "Kremlin", otherwise, it was added:
INSERT INTO account AS a (id, name, surname, address) VALUES (1, '', '', ', ') ON CONFLICT (id) DO UPDATE SET name=EXCLUDED.name WHERE EXCLUDED.name not like '%%'; Query returned successfully: one row affected, 11 ms execution time. SELECT * FROM ACCOUNT
id | name | surname | address |
---|
one | Vasya | Null | Moscow Kremlin |
ROW LEVEL SECURITY
Row-level security or row-level security is a mechanism for delimiting access to information to the database, which allows users to restrict access to individual rows in the tables.
This functionality may be interesting to those who use databases with a large number of users.
It works as follows: the rules for a specific table are described, according to which access to specific rows is limited when executing certain commands, using the
CREATE POLICY expression. Each rule contains a certain logical expression that must be true for the string to be visible in the query. The rules are then activated using the
ALTER TABLE ... ENABLE ROW LEVEL SECURITY statement. Then when trying to access, for example, when a
SELECT query is made, it is checked whether the user has the right to access a specific row and if not, they are not shown to him. The superuser can see all lines by default, since his default flag is BYPASSRLS, which means that no checks will be performed for this role.
The syntax for the
CREATE POLICY statement
is :
CREATE POLICY name ON table_name [ FOR { ALL | SELECT | INSERT | UPDATE | DELETE } ] [ TO { role_name | PUBLIC | CURRENT_USER | SESSION_USER } [, ...] ] [ USING ( using_expression ) ] [ WITH CHECK ( check_expression ) ]
Rules are created for specific tables, so there can be several rules in the database with the same name for different tables.
After the
FOR statement, it is indicated for which requests the rule is applied, the default is
ALL , that is, for all requests.
After
TO - for which roles, by default -
PUBLIC , that is, for all roles.
Further, in the
USING expression, a Boolean expression is specified, which must be true in order for the specific string to be visible to the user in queries that use already existing data (
SELECT ,
UPDATE ,
DELETE ). If the boolean expression returns null or false, then the string will not be visible.
The
WITH CHECK statement specifies a Boolean expression that must be true for the query to add or modify data (
INSERT or
UPDATE ) to succeed. If the boolean expression returns null or false, then there will be an error. The
WITH CHECK clause is executed after
BEFORE triggers (if present) and before any other checks. Therefore, if a
BEFORE trigger modifies a string so that the condition does not return true, there will be an error. For
UPDATE to succeed, it is necessary that both conditions return true, including if a conflict occurs in the
INSERT ... ON CONFILCT DO UPDATE request and the request attempts to modify the data. If the
WITH CHECK clause is omitted, the condition from the
USING clause will be substituted instead.
Under conditions, aggregate or window functions cannot be used.
Usually, it is required to control access based on which database user requests the data, so we can use functions that return
System Information (
System Information Functions ).
Let us turn to examples:
Add a
db_user field to the
account table, fill this field for an already existing record and add new records:
ALTER TABLE account ADD COLUMN db_user varchar; Query returned successfully with no result in 16 ms. UPDATE account SET db_user='pupkin' WHERE surname=''; INSERT INTO account (name, surname, address, db_user) VALUES ('', '', ', ', 'petrov'), ('', '', '-, ', 'sidorov'); Query returned successfully: 2 rows affected, 31 ms execution time.
Create roles:
CREATE ROLE pupkin WITH LOGIN PASSWORD 'pupkin'; CREATE ROLE petrov WITH LOGIN PASSWORD 'petrov'; Query returned successfully with no result in 31 ms.
Create a rule and enable RLS on the table:
CREATE POLICY select_self ON account FOR SELECT USING (db_user=current_user); ALTER TABLE account ENABLE ROW LEVEL SECURITY; Query returned successfully with no result in 12 ms.
In this query, we created a rule according to which, the user in the
SELECT query will see only those rows in which the value of the
db_user field coincides with the name of the current database user.
Run a request from the postgres user:
SELECT * FROM account
id | name | surname | address | db_user |
---|
one | Vasya | Pupkin | Moscow Kremlin | pupkin |
five | Peter | Petrov | Moscow Red Square | petrov |
6 | Ivan | Sidorov | St. Petersburg, Winter Palace | sidorov |
Perform the same request from the pupkin user:
id | name | surname | address | db_user |
---|
one | Vasya | Pupkin | Moscow Kremlin | pupkin |
Let's create a rule according to which only the pupkin user can insert lines with the last name "Pupkin":
CREATE POLICY insert_update_pupkin ON account WITH CHECK (surname<>'' OR current_user='pupkin')
Let's try to fulfill the request from the pupkin user:
INSERT INTO account (name, surname, address) VALUES ('', '', ', ') Query returned successfully: one row affected, 13 ms execution time.
Check:
select * from account;
id | name | surname | address | db_user |
---|
one | Vasya | Pupkin | Moscow Kremlin | pupkin |
Op-pa! We forgot to specify the
db_user field and the record that we inserted, we will not see. Well, let's fix this logic with the help of a trigger in which we fill in the
db_user field
with the name of the current user:
CREATE OR REPLACE FUNCTION fill_db_user() RETURNS TRIGGER AS $BODY$ BEGIN NEW.db_user = current_user; RETURN NEW; END; $BODY$ LANGUAGE plpgsql VOLATILE; CREATE TRIGGER fill_db_user BEFORE INSERT ON account FOR EACH ROW EXECUTE PROCEDURE fill_db_user();
We try again:
INSERT INTO account (name, surname, address) VALUES ('', '', ', '); select * from account;
id | name | surname | address | db_user |
---|
one | Vasya | Pupkin | Moscow Kremlin | pupkin |
21 | Ivan | Pupkin | Kiev, Maidan | pupkin |
Let's try to change the data on Ivan Pupkin user petrov:
UPDATE account SET db_user='petrov' WHERE id=21 Query returned successfully: 0 rows affected, 13 ms execution time.
As you can see, the data has not changed, it happened because the
USING condition from the
select_self rule
was not fulfilled.
If several rules correspond to one request, they are combined via
OR .
It is worth noting that the rules apply only when explicit requests are made to the tables and do not apply when the system performs checks (constaints, foreign keys, etc.). This means that the user, using queries, determine that any value exists in the database. For example, if a user can insert into a table that refers to another table, from which he cannot make a
SELECT . In such a case, it may try to make an INSERT in the first table and by the result (it was inserted or an error occurred while checking referential integrity) to determine if a value exists in the second table.
There are many ways to use row-level security:
- the same database is used by several applications with different functionality
- multiple instances of the same application with different rights
- access by roles or user groups
- etc.
In the following parts, I plan to consider such new features of PostgreSQL 9.5 as:
- Part 2. TABLESAMPLE
- SKIP LOCKED
- BRIN indices
- GROUPING SETS, CUBE, ROLLUP
- New features for JSONB
- IMPORT FOREIGN SCHEMA
- other