📜 ⬆️ ⬇️

Why PostgreSQL is better than other open source SQL databases. Part 2

Friends, we present to your attention the second part of the translation “How is PostgreSQL better?”. Hopefully, it will cause the same heated discussion in the comments as the first part. And also with pleasure we will continue the discussion with you personally at PG Day'16 Russia , to which there is very little left!

The PostgreSQL slogan states that it is "The most advanced open source database in the world." In the first part of this series, we looked at data storage — the model, structures, types, and size limits — to give you several reasons why Postgres confirms his words with a deed. In the second part, we’ll talk about data manipulation and searching, including indexing, virtual tables and query capabilities. In this series, we find out what distinguishes PostgreSQL from other open source databases, namely from MySQL, MariaDB and Firebird.



Indexing


Postgres offers indexing options that other open source databases do not have. In addition to standard indexes, it supports partial indexes, functional indexes, GiST and GIN indexes. Let's take a look at some of them in more detail.
')
Partial indexes

Partial indexes can be created when you want to index only a single subset of the table. For example, only rows, where the values ​​in the columns correspond to certain conditions. This winning feature allows you to maintain adequate index sizes, which will improve performance and reduce disk space. A key aspect of partial indexes is that the column being indexed may differ from the columns by which conditions are defined. For example, you want to index only the accounts of paying users, and not those that were created for internal testing:

--       CREATE INDEX paying_accounts_idx ON accounts (account_id) WHERE account_type <> 'test'; 

It is important to note that sometimes in MySQL the term “partial index” is used to refer to the truncation of indexed values ​​to a certain number of bytes, and not to limit the indexed rows based on a condition. Partial indexes in the form described by us are not supported in MySQL.

Functional Indexes

Functional indexes (or expression-based indexes) can be created using any function to precompute a column for indexing. New values ​​are indexed and treated as constants for query execution, rather than being calculated whenever a query is run. For example, if you have a webclik of clicks that collects clicks on URLs in whatever format they come in, you might want to create an index that cites lower case for data normalization (PostgreSQL is case sensitive: compose.io and Compose. io will be considered different results):

 --    URL    CREATE INDEX webhits_lower_urls_idx ON webhits (lower(url)); 

GIST and GIN (as well as BRIN!)

GiST (Generalized Search Tree, Generalized Search Tree) allows you to combine B-tree, R-tree and user-defined types of indexes to create an individual index with advanced query capabilities. GiST is used in PostGIS (which we have made standard for all PostgreSQL installations since January) and OpenFTS (an open-source full-text search engine). Postgres also supports SP-GiST, which allows you to create partitioned search indexes for an incredibly fast search.

GIN (Generalized Inverted Index, Generalized Inverted Index) allows you to index composite data types, which make it possible to combine other data types in different ways to create something completely individual. A detailed description of the composite data types can be found in the first part of this series.

The syntax for creating GIST and GIN indexes is as follows: CREATE INDEX ... ON ... USING GIST | GIN .... Very simple!

In PostgreSQL 9.5, BRIN (Block Range Index) was introduced, which allows you to split large tables into ranges based on a column for indexing. This means that the query planner can only scan the range specified in the query. Also, when indexing ranges, the disk space required for indexing will be significantly less than with the standard B-Tree index.

For comparison

The other SQL databases we are considering narrowing the gap when it comes to functional indexes. MySQL 5.7.6 introduced the generated columns that can be used as functional indexes. In MariaDB, virtual (also known as "generated" or "calculated") columns appeared in version 5.2, but only support the use of built-in functions to create columns (there are no user-defined functions). In version 2.0, Firebird introduced expression indexing using computed columns. However, none of these databases support partial, GiST or GIN indices. In addition, we mentioned in the first part that native JSON data types cannot be indexed in these databases.

When you set up all the indexes and want to analyze their performance, do not forget to read Matt Barr's article on mySidewalk, “Simple Index Check in PostgreSQL” .

Virtual table functions


Virtual tables are required for many queries. All SQL databases we compare offer some kind of virtual table functionality. PostgreSQL can give you more.

CTEs and recursion

Postgres support Common Table Expressions (CTE) using a WITH expression. We demonstrated this feature in a PostgreSQL - Series, Random and With article. CTEs allow you to create virtual tables right in your query, expressing a logical sequence of operations. Thus, they are much easier to read and test than virtual tables created with subqueries somewhere else in the query. PostgreSQL CTEs can also be used recursively. This convenient feature allows you to go through a hierarchy with a query that repeatedly refers to itself, until there are more data levels that can be returned. Here is an example of a recursive CTE that identifies the levels, themes, and parental relationships in the systematics of a theme:

 --    CTE WITH RECURSIVE topic_taxonomy_recursive (level, parent_topic_name, topic_name) AS ( SELECT 1, tt.parent_topic_name, tt.topic_name FROM topic_taxonomy tt WHERE tt.parent_topic_name = 'All Topics' UNION ALL SELECT ttr.level + 1, tt.parent_topic_name, tt.topic_name FROM topic_taxonomy_recursive ttr, topic_taxonomy tt WHERE ttr.topic_name = tt.parent_topic_name ) SELECT level, parent_topic_name, topic_name FROM topic_taxonomy_recursive; 

MySQL and MariaDB do not use the WITH clause and, therefore, do not formally support CTE. Although you can create derived tables in these databases using subqueries, they do not allow recursion. In addition, despite the fact that the query optimizer in MySQL has been improved since the release of version 5.6, subqueries in this database are known for their problems and can significantly affect performance. Firebird is ahead of MySQL and MariaDB in this question and coincides in functionality with Postgres: it supports CTE with the help WITH and provides the possibility of recursion.

Materialized views

Materialized views are another handy feature of virtual tables that PostgreSQL supports. They, like regular views, represent the result of a query that you will use frequently, but the difference is that the result is stored on disk like a regular table. Materialized views can be indexed. In addition, unlike normal views, which are recreated every time they are called, views with a stored result are fixed in time. They are not updated unless intentionally done. This can significantly increase the speed with which queries are made using materialized views. Instead of using standard views or the need to perform complex table joins or to perform grouping functions in a query, use materialized views, where all the necessary data is already prepared and waiting on the disk. When you need to update data in a materialized view with a stored result, this can be done on demand using the REFRESH command. Here is an example of a materialized view that gives summary of revenue:

 --   ,      CREATE MATERIALIZED VIEW aggregatedMonthlyRevenue (year, month, total_revenue) AS ( SELECT date_part('year', date) AS year, date_part('month', date) AS month, SUM(revenue) AS total_revenue FROM revenue WHERE date >= '2014-01-01' GROUP BY date_part('year', date), date_part('month', date) ORDER BY date_part('year', date), date_part('month', date) ); --  ,   REFRESH MATERIALIZED VIEW aggregatedMonthlyRevenue; 

Firebird, MySQL and MariaDB do not support materialized views, although you can use a kind of workaround in these databases by creating a regular table and using a stored procedure or trigger to update it as needed.

Query capabilities


Postgres query capabilities are extensive.

We already talked a little about WITH in the previous section. Let's look at a couple more extra functions that can be used in SELECT queries.

Join requests

PostgreSQL provides UNION, INTERSECT, and EXCEPT conditions for interaction between SELECT queries. UNION will add the results of the second SELECT query to the results of the first. INTERSECT returns only those rows that match both SELECT queries. EXCEPT returns only those rows from the first SELECT query that do not match the rows from the second SELECT query. Let's look at an example using EXCEPT, where we want to return user contact information, except when the user received an email during the last week and answered it.

 /*      ,        */ SELECT c.lastName, c.firstName, c.email FROM customers c EXCEPT SELECT e.lastName, e.firstName, e.email FROM email_log e WHERE e.email_date > current_date - interval '7 days' AND e.email_action_date > current_date - interval '7 days' AND email_action_type = 'response'; 

Although MySQL, MariaDB, and Firebird support UNION, none of them supports INTERSECT or EXCEPT. However, using the union and EXISTS clause in the query, you can get the same result as in PostgreSQL. But the request will be more complicated.

Window functions

Window functions that are aggregate functions on top of some lines of the result (providing a “window” into a subset) can be extremely useful. In essence, they allow iteration through the lines in the sections that relate to the current line to perform the function. Standard features include ROW_NUMBER (), RANK (), DENSE_RANK () and PERCENT_RANK (). The keyword OVER, optionally used with PARTITION BY and ORDER BY, indicates that a window function is being used. As an example, in the “Functions and Not Only” section below, we used a window function with ROW_NUMBER () OVER ... to determine the median in a series of numeric values. Note that the WINDOW clause in queries with window functions is not mandatory, but allows you to create and name windows to preserve order.

Firebird, MySQL and MariaDB currently do not support window functions, although they were announced a few years ago when planning Firebird 3.

Lateral nested queries

The LATERAL keyword can be applied to subqueries in the FROM clause to add cross references between the subquery and other tables or virtual tables that were created before it. This way you can write more simple queries. This works in such a way that each row is evaluated against a cross-referenced table, which may mean an improvement in speed during the execution of a query. Let us give an example in which we want to get a list of students and information about whether they have recently read anything about technology:

 --    LATERAL    SELECT s.firstName, s.LastName, x.topic_name FROM students s JOIN content_log c ON c.student_id = s.id LEFT OUTER JOIN LATERAL ( SELECT t.topic_name FROM content_topics t WHERE t.parent_topic_name = 'Technology' AND t.id = c.topic_id AND c.date > current_date - interval '30 days' ) x ON true; 

MySQL, Firebird and MariaDB currently do not support lateral subqueries. Again, you can find workarounds, but this will make requests more complex.

And one more thing to note: MySQL and MariaDB do not support FULL OUTER JOIN, but you can use a workaround using UNION ALL to merge all the rows of two tables.

Functions and not only


PostgreSQL provides robust built-in operators and functions, including those that support specialized data types, discussed in detail in part one of this series. In addition, it allows you to create your own operators and functions (including aggregates), as well as stored procedures and triggers. We will not be able to examine them all in detail, since the topic is too extensive (!), But let's analyze a few simple examples of functions.

Postgres supports 4 types of user-defined functions: query language, procedural language, C language and internal. Each type can take and return both basic and composite data types. Note that in PostgreSQL, the CREATE FUNCTION command is not only used to create functions, but also stored procedures.

Let's look at an example of creating a function that returns a composite data type:

 --       "datetext" CREATE TYPE datetext AS ( date date, date_as_text text ); /*  ,           datetext */ CREATE FUNCTION show_date_as_text(date) RETURNS datetext -- this is our composite type AS $$ SELECT $1, to_char($1, 'FMMonth FMDD, YYYY') $$ LANGUAGE SQL; --     SELECT show_date_as_text('2015-01-01'); -- : (2015-01-01,"January 1, 2015") 

And here is an example of a real function for finding the median in a series of numerical data:

 --  ,        CREATE FUNCTION median(numeric[]) RETURNS numeric AS $$ SELECT AVG(x.result) FROM ( SELECT result, ROW_NUMBER() OVER (ORDER BY val) as ra, ROW_NUMBER() OVER (ORDER BY val DESC) as rd FROM unnest($1) result -- notice the use of array "unnest" ) AS x WHERE x.ra BETWEEN x.rd - 1 AND x.rd + 1; $$ LANGUAGE SQL; --     SELECT median(ARRAY[1,2,3,4,5,6,7]); -- : 4 

Although the other open source SQL databases involved in this comparison also allow you to create your own functions, stored procedures, and triggers, they do not have the diversity of data types and customization options like Postgres. In addition, PG allows you to create your own operators. Other compared databases do not support user-defined operators.

Postgres is uniquely customizable and has no equal among MySQL, MariaDB and Firebird.

Language extensions


PostgreSQL has many language extensions. Some of them are part of the distribution, and many others are available through third-party developers.

At Compose, we only support trusted language extensions for PostgreSQL to ensure the security of your installations. We added support for PL / Perl in February and PL / v8 (a procedural language based on JavaScript) - in August. These language extensions, which have more built-in functions than the SQL-based PL / pgSQL built-in language (also available in Compose installations), allow you to create sophisticated scripts for manipulating and processing data on the server.

Summarizing


PostgreSQL is extremely rich in functionality, with many built-in features and countless ways to customize and extend them to meet your needs. Add to this the generally accepted reliability and maturity, and it will become clear why this database solution is worth the effort of any large enterprise. At the same time, it remains affordable and effective for small projects as well.

Despite the fact that we told you only about a small number of features that make Postgres stand out against other open source SQL solutions, in fact there are many more of them (and even more appeared in version 9.5!). We hope that this series of two articles provided a convincing overview of the reasons why you should choose PostgreSQL.

Source: https://habr.com/ru/post/302160/


All Articles