Last week, varanio gave a talk at DevConf aimed at everyone who has migrated to Postgres from MySQL but still doesn't use the database to its full potential. This publication grew out of that talk.
We are pleased to announce that preparations for PG Day'17 Russia are in full swing! We have published the full schedule of the upcoming event. We invite everyone to come and talk with Anton in person.
Since the DevConf talk was received generally positively, I decided to turn it into an article for those who for some reason could not attend the conference.
Why did the idea for such a talk come about? The fact is that PostgreSQL is clearly a hyped technology right now, and many people are switching to this DBMS: sometimes for objective reasons, sometimes simply because it is fashionable.
But quite often the situation is this: some notional programmer Vasya was writing against MySQL yesterday, and today he suddenly starts writing against Postgres. How will he write? Much the same as before, using only the most minimal set of the new database's features. Practice shows that years pass before a DBMS starts being used more or less fully.
A disclaimer right away: this is not a "MySQL vs. Postgres" article. Whether or not to switch to Postgres is up to you. Uber, for example, switched back to MySQL for reasons of its own.
We must give Oracle its due: it is clearly moving MySQL in the right direction. Version 5.7 made strict mode the default. The eighth version promises CTEs and window functions, as well as dropping the MyISAM engine for the system tables. In other words, it is clear that resources are being invested in the database and that user feedback is taken very seriously.
However, PostgreSQL is still full of unique features. So I tried to put together a brief overview of the database's capabilities from a developer's point of view.
Many data types are built into the database beyond the usual numeric and string ones, along with operators for working with them.
For example, there are the cidr, inet, and macaddr types for working with IP addresses.
-- does the ip 128.0.0.1 overlap the cidr network 127.0.0.0/24? (the && operator)
select '127.0.0.0/24'::cidr && '128.0.0.1'::inet; -- false
Or, for example, timestamps with time zone (timestamptz), time intervals, etc.
-- what time is it now in New York?
SELECT NOW() AT TIME ZONE 'America/New_York';

-- what is the difference between New York and Moscow time?
SELECT NOW() AT TIME ZONE 'America/New_York'
     - NOW() AT TIME ZONE 'Europe/Moscow';
-- result: -07:00:00
When I was preparing this slide, I decided out of curiosity to check what the offset from UTC was 100 years ago, in 1917:
select '1917-06-17 00:00:00 UTC' at time zone 'Europe/Moscow';
-- result: 1917-06-17 02:31:19
That is, Muscovites lived at UTC+02:31:19.
In addition to these, there are other built-in data types: UUID, JSONB, XML, bit strings, etc.
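For instance, a quick sketch with jsonb (the document and keys here are arbitrary illustrations, not from the original talk):

```sql
-- extract a field from a jsonb value as text
SELECT '{"user": "vasya", "age": 25}'::jsonb ->> 'user'; -- vasya

-- check whether a jsonb document contains a key
SELECT '{"user": "vasya"}'::jsonb ? 'user'; -- true
```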
The array type deserves a separate mention. Arrays have long been well integrated into PostgreSQL: multidimensional arrays, slices, intersection and union operators, etc. There are many functions for working with arrays.
-- do the arrays intersect?
SELECT ARRAY[1, 2, 8, 10] && ARRAY[1, 2, 3, 4, 5];

-- is the left array contained in the right one?
SELECT ARRAY[1, 2] <@ ARRAY[1, 2, 3, 4, 5];
There is a very convenient constructor called array: its argument is a SELECT query, and its output is the result of that query as an array.
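A minimal sketch (the users table and its status column are hypothetical here):

```sql
-- gather the ids returned by a subquery into a single array value
SELECT array(SELECT id FROM users WHERE status = 2);
```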
There is an inverse function, unnest: it takes an array and returns it as a query result set. This is convenient, for example, when you need to insert several identical records with different ids by hand but do not want to copy-paste:
INSERT INTO users (id, status, added_at)
SELECT user_id, 5, '2010-03-03 10:56:40'
FROM unnest(array[123, 1232, 534, 233, 100500]) as u(user_id);
You can create your own types in three ways. First, if you know C, you can create a base type on a par with int or varchar. An example from the manual:
CREATE TYPE box (
    INTERNALLENGTH = 16,
    INPUT = my_box_in_function,
    OUTPUT = my_box_out_function
);
That is, you create a pair of functions that convert your type from cstring and back. Then you can use the type, for example, in a table declaration:
CREATE TABLE myboxes (
    id integer,
    description box
);
The second method is a composite type. For example, to store complex numbers:
CREATE TYPE complex AS (
    r double precision,
    i double precision
);
And then use this:
CREATE TABLE math (
    result complex
);
INSERT INTO math (result) VALUES ((0.5, -0.6)::complex);
SELECT (result).i FROM math;
-- result: -0.6
The third kind you can create is the domain type. A domain is simply an alias for an existing type under a different name, one that matches your business logic.
CREATE DOMAIN us_postal_code AS TEXT;
us_postal_code is more semantic than some abstract text or varchar.
You can define your own operators. For example, addition of complex numbers (we defined the complex type above):
-- first, a function that adds the two values; plain SQL will do
CREATE OR REPLACE FUNCTION sum_complex(x COMPLEX, y COMPLEX)
RETURNS COMPLEX AS $$
    SELECT (x).r + (y).r, (x).i + (y).i;
$$ LANGUAGE SQL;

-- now the operator itself
CREATE OPERATOR + (
    PROCEDURE = sum_complex,
    LEFTARG = COMPLEX,
    RIGHTARG = COMPLEX
);
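Once the operator is registered, addition works directly on complex values. A quick sketch (the literals are arbitrary):

```sql
-- add two complex numbers with the new + operator
SELECT (0.5, 1.0)::complex + (1.5, 2.0)::complex;
-- the result is the complex value (2,3)
```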
Let's do a spherical example in a vacuum: create USD and RUR types and a rule for converting one into the other. Since I don't know C well, let's make do with simple composite types:
CREATE TYPE USD AS (sum FLOAT);
CREATE TYPE RUR AS (sum FLOAT);

-- the conversion function (a hard-coded rate of 60, for simplicity)
CREATE FUNCTION usd2rur(value USD) RETURNS RUR AS $$
    SELECT (value).sum * 60.0;
$$ LANGUAGE SQL;

-- register the cast so the conversion happens "automatically" on assignment
CREATE CAST (USD AS RUR) WITH FUNCTION usd2rur(USD) AS ASSIGNMENT;
That's all, actually; now you can use it. How much is 100 bucks in rubles?
select '(100.0)'::usd::rur;
The result will be:
  rur
--------
 (6000)
(1 row)
There are extensions that define data types and everything needed for them. For example, the ip4r extension describes types for IP addresses and their ranges. If you look at the source ( https://github.com/RhodiumToad/ip4r/blob/master/ip4r--2.2.sql ), you will see that the extension is essentially just a set of CREATE TYPE, CREATE OPERATOR, CREATE CAST, and so on.

The indexing rules are described as well. For example, the ip4r type (a range of IP addresses) can be indexed with a GiST index for the && operator (and others). This lets you build a table for looking up cities by IP.
Or, for example, there is the uri extension, which provides a type for storing links so that you can easily extract the scheme or the host from them (I have not tried it in production yet, I'm only planning to).
In addition to the standard btree, there are other index types: GIN (usable for some array operations, for jsonb, for full-text search), GiST, BRIN, etc.
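For example, a sketch of a GIN index over an array column (the articles table and tags column are hypothetical):

```sql
-- hypothetical table with an array column
CREATE TABLE articles (
    id   integer,
    tags text[]
);

-- GIN index that accelerates array containment queries
CREATE INDEX articles_tags_idx ON articles USING GIN (tags);

-- find articles tagged with both 'sql' and 'indexes'
SELECT id FROM articles WHERE tags @> ARRAY['sql', 'indexes'];
```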
There are situations when you have 10 million rows in a table and only 100 of them are, say, in the status "payment is being processed". And you constantly query that status with something like select ... where status = 2.
It is clear that an index is needed here. But a full index would take up a lot of space, while you really need only a very small part of it.
In Postgres, you can build an index not on the entire table but only on the rows matching a given condition (a partial index):
CREATE INDEX my_money_status_idx on my_money(status) WHERE status = 2;
This index will serve select * from my_money where status = 2 queries well while taking up little space.
In Postgres, you can also build indexes not on a single column but on an arbitrary expression. For example, you can index the first name concatenated with the last name:
CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
And then a query like this will be fast:
SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith';
In addition to the standard UNIQUE and NOT NULL, you can add other integrity checks in the database. A domain type can carry a CHECK:
CREATE DOMAIN us_postal_code AS TEXT
CHECK(
    VALUE ~ '^\d{5}$'
    OR VALUE ~ '^\d{5}-\d{4}$'
);
which ensures that a us_postal_code column will contain only 5 digits, or 5 digits, a hyphen, and 4 digits. Of course, you can write not only regular expressions here but any other conditions as well.
A check can also be written in a table:
CREATE TABLE users (
    id integer,
    name text,
    email text,
    CHECK (length(name) >= 1 AND length(name) <= 300)
);
That is, the name must be at least one character long and no more than 300.
Generally speaking, types themselves are also a kind of constraint, an additional check the database performs. For example, if you have the complex type (see above), which essentially consists of two numbers, you will not accidentally insert a string into it:
INSERT INTO math (result) VALUES ((0.5, '')::complex);
ERROR:  invalid input syntax for type double precision: ""
Thus, sometimes a composite type may be preferable to jsonb, because in json you can cram anything at all.
In contrast to plain UNIQUE or PRIMARY KEY uniqueness, you can enforce uniqueness only among the rows matching a condition. For example, email should be unique among non-deleted users:
CREATE UNIQUE INDEX users_unique_idx ON users(email) WHERE deleted = false;
Another amusing thing: you can enforce uniqueness not on a single field but on an arbitrary expression. For example, you can make sure the sum of two columns is never repeated in a table:
CREATE TABLE test_summ (
    a INT,
    b INT
);
CREATE UNIQUE INDEX test_summ_unique_idx ON test_summ ((a + b));

INSERT INTO test_summ VALUES (1, 2);
INSERT INTO test_summ VALUES (3, 0); -- fails: 3 + 0 duplicates 1 + 2
The EXCLUDE constraint ensures that, on insert or update, a row is compared against existing rows using a given operator. For example, a table containing non-overlapping IP ranges (checked with the intersection operator &&):
CREATE TABLE ip_ranges (
    ip_range ip4r,
    EXCLUDE USING gist (ip_range WITH &&)
);
In general, the usual UNIQUE is, in essence, EXCLUDE with the = operator.
Stored procedures can be written in SQL, PL/pgSQL, JavaScript (PL/V8), Python, etc. For example, you can compute some statistics in R and return a chart with the result.
This is a separate big topic, I advise you to look for the report of Ivan Panchenko on this subject.
CTEs will be in MySQL 8, but let's briefly dwell on them anyway.
CTEs are easy: you take a piece of a query and set it apart under a name of its own.
WITH subquery1 AS (
    SELECT ... -- some subquery
),
subquery2 AS (
    SELECT ... -- another subquery
)
SELECT * -- the main query
FROM subquery1
JOIN subquery2 ON ...
From the point of view of query optimization, it is necessary to take into account that each such CTE subquery is executed separately. This can be either a plus or a minus.
For example, if you have 20 joins with subqueries and groupings, the query planner may not understand your intentions, and the plan will be suboptimal. Then you can move part of the query into a CTE subquery and do the remaining filtering in the main query.
And vice versa: if you move part of a query into a CTE purely for readability, this can sometimes backfire.
In CTE, you can use not only SELECT queries, but also UPDATE.
An example: update users older than 20 and, in the same query, return the names of those updated along with their country.
WITH users_updated AS (
    UPDATE users SET status = 4
    WHERE age > 20
    RETURNING id
)
SELECT name, country
FROM users
JOIN countries ON users.country_id = countries.id
WHERE users.id IN (
    SELECT id FROM users_updated
);
But here we must understand that sometimes with the help of CTE you can shoot yourself well in the foot.
Such a query is syntactically correct, but its meaning is complete nonsense:
WITH update1 AS (
    UPDATE test SET money = money + 1
),
update2 AS (
    UPDATE test SET money = money - 1
)
SELECT money FROM test;
It seems we added a ruble and then took a ruble away, so everything should stay as it was.
But the fact is that when they execute, update1 and update2 both see the initial version of the table, so in practice one update overwrites the other's changes. Therefore, with UPDATE inside a CTE you need to know exactly what you are doing and why.
I have already written about window functions in detail here: https://habrahabr.ru/post/268983/ . Window functions are also promised in MySQL 8.
For aggregate functions (for example, COUNT or SUM) you can add a FILTER condition, i.e. aggregate not all rows but only those matching some expression:
SELECT
    count(*) FILTER (WHERE age > 20) AS old,
    count(*) FILTER (WHERE age <= 20) AS young
FROM users;
That is, we counted the people over twenty and the people twenty or younger.
Everyone knows that psql has commands for inspecting various objects, for example \d, \dt+, etc.
There is a special command, \watch. You execute a query, then type \watch 5, and your query will be re-executed every 5 seconds until cancelled.
This works not only with SELECT but with any query, for example UPDATE (say, when you need to update a large table slowly, bit by bit).
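A sketch of such a slow batch update in psql (the table, status values, and batch size are hypothetical):

```sql
-- update one small batch of rows per run
UPDATE big_table SET status = 2
WHERE id IN (
    SELECT id FROM big_table WHERE status = 1 LIMIT 1000
);
-- then, in psql, re-run it every 5 seconds:
-- \watch 5
```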
This is like a view, only cached (materialized). The cache can be refreshed with the REFRESH MATERIALIZED VIEW command. There is also the CONCURRENTLY keyword, so that Postgres does not block SELECT queries while refreshing.
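A minimal sketch (the orders table and its columns are hypothetical; note that CONCURRENTLY requires a unique index on the materialized view):

```sql
-- cache an expensive aggregation
CREATE MATERIALIZED VIEW daily_totals AS
SELECT added_at::date AS day, sum(amount) AS total
FROM orders
GROUP BY added_at::date;

-- CONCURRENTLY needs a unique index on the materialized view
CREATE UNIQUE INDEX daily_totals_day_idx ON daily_totals (day);

-- refresh without blocking readers
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_totals;
```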
I haven't tried it in production yet, so I don't know how applicable it is in practice (if you have used it, share your experience in the comments). The gist of LISTEN/NOTIFY is that you can subscribe to an event, and you can also notify subscribers that an event has occurred, passing along a string with additional information.
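A sketch (the channel name and payload string are arbitrary):

```sql
-- session 1: subscribe to a channel
LISTEN new_orders;

-- session 2: publish an event with a payload string
NOTIFY new_orders, 'order 100500 created';

-- or, equivalently, via a function call
SELECT pg_notify('new_orders', 'order 100500 created');
```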
The Foreign Data Wrappers mechanism lets you work with external data as if it were ordinary tables. For example, you can join a Postgres table, a MySQL table, and a CSV file.
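For instance, a sketch of attaching a CSV file via the bundled file_fdw extension (the file path and columns are hypothetical):

```sql
CREATE EXTENSION file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- expose a CSV file as a table
CREATE FOREIGN TABLE cities_csv (
    name       text,
    population integer
) SERVER csv_files
OPTIONS (filename '/tmp/cities.csv', format 'csv', header 'true');

-- now it can be queried and joined like a regular table
SELECT * FROM cities_csv WHERE population > 1000000;
```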
SEQUENCE is an advanced version of MySQL's AUTO_INCREMENT. Unlike in MySQL, a sequence can exist separately from tables or, conversely, "tick" for several tables at once. You can set various parameters, such as the increment size, cycling, and so on.
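A quick sketch (the names and parameters are arbitrary):

```sql
-- a standalone sequence that steps by 2 and wraps around at 100
CREATE SEQUENCE ticket_seq
    INCREMENT BY 2
    START WITH 1
    MAXVALUE 100
    CYCLE;

SELECT nextval('ticket_seq'); -- 1
SELECT nextval('ticket_seq'); -- 3

-- one sequence shared by two tables
CREATE TABLE a (id bigint DEFAULT nextval('ticket_seq'));
CREATE TABLE b (id bigint DEFAULT nextval('ticket_seq'));
```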
This is only the tip of the iceberg, really. There are many more nuances not mentioned in the article at all, because no single article would be enough for everything. On stored procedures alone you could write a whole book. Or see, for example, the full list of SQL commands in the current version: https://www.postgresql.org/docs/9.6/static/sql-commands.html
The main thing I wanted to show in this article: despite the current hype, PostgreSQL is a very mature DBMS that contains a great deal and is very extensible. So when switching to it from MySQL, it is worth looking through the manual, reading articles, and so on.
Source: https://habr.com/ru/post/331460/