Using functions in PostgreSQL as parameterized views

In daily work, there is often the challenge of clearly and simply referring to large lists of columns and expressions in a sample, and / or dealing with cumbersome and vague conditions in the where clause. Usually, representations are used for these purposes, which is quite convenient and intuitive. You can compare the query:

 select v.* from v_active_user vau, v_detailized_user v where v.id=vau.id

which is quite clearly perceived as "we take active users and get detailed information on them" and the same request, but, so to speak, in expanded form:

 select u.first_name, u.last_name, u.patronymic_name, format('%s %s %s', u.first_name, u.last_name, u.patronymic_name) as fio, birth_date, (extract(days from now() - birth_date)/365.25)::int as full_years_count, hire_date, (current_date - hire_date) as work_time, (select string_agg(format('%s %s', ch.first_name, ch.last_name),', ') from children ch where array[u.id] <@ array[ch.mother_id, ch.father_id]) as children, dep.id as dep_id, coalesce(dep.name, get_localized_message('  - ')) as dep_name .... .... as last_vaction_date, .... as salary_history from usr u join dep on u.dep_id=dep.id .... where u.state='active' and not exists(select * from pending_fires ...) and not exists(select * from usr_vacation uv where ...) and exists(...) and col123<>col321 ...

Requests of this type — with a large list of columns and expressions derived from them, with complex conditions and which in real life are often burdened by historical strata — are often completely unreadable and obscure. It is probably worth noting that the very change in the concept of “active” (for example, to remove or add remote workers or employees on maternity leave, etc.) can become not so nontrivial, but very tedious; yes and the number of errors it is unlikely to affect quite favorably; and changing the list of columns or just the expression entails similar consequences. Perhaps, it can be said that if for the tables the expression select * from table strictly unacceptable, then for representations of this type it is probably even preferable. Well for some, at least.

Consider another task. Suppose we have a simple table of users:

 create table usr( id serial primary key, name text, added timestamptz )

and friends table:

 create table friend( usr_id int references usr(id), friend_usr_id int references usr(id), primary key(usr_id, friend_usr_id) )

Required:
Get a specific user with a list of friends.

The solution is trivial:

 select u.id, array(select f.friend_usr_id from friend f where f.usr_id=u.id) as friends from usr u

Since this operation is required quite often, we create a representation for it:

 create view v_usr_with_friends as select u.id, array(select f.friend_usr_id from friend f where f.usr_id=u.id) as friends from usr u

Everything is good, but there is a new requirement: to get a user with a list of friends who are also friends of another user (for example, a viewer):

 select u.id, array(select f.friend_usr_id from friend f where f.usr_id=u.id and exists(select * from usr_friend f2 where f2.usr_id=another_usr_id and f2.friend_usr_id=f.friend_usr_id) ) as friends from usr u

Unfortunately, it is impossible to create a view based on this request - you cannot pass the second user ID as a parameter; but there is an opportunity to get around this restriction using Cartesian products

 create or replace view usr_with_common_friends as select u2.id as another_usr_id, u.id, array(select f.friend_usr_id from friend f where f.usr_id=u.id and exists(select * from friend f2 where f2.usr_id=u2.id and f2.friend_usr_id=f.friend_usr_id) ) as friends from usr u, usr u2

Using the resulting presentation is completely natural:

 select * from usr_with_common_friends where id=1 and another_usr_id=2

A new demand is coming: it is required to receive not just common friends, but common friends who have registered during the specified period of time. Since creating a table with all possible time intervals is not possible, you will have to create a function:

 create or replace function usr_with_common_friends_created_at_tr(tr tstzrange) returns table(another_usr_id int, id int, friends int[]) as $code$ select u2.id as another_usr_id, u.id, array(select f.friend_usr_id from friend f, usr u3 where f.usr_id=u.id and exists(select * from friend f2 where f2.usr_id=u2.id and f2.friend_usr_id=f.friend_usr_id) and u3.id=f.friend_usr_id and u3.added <@ tr ) as friends from usr u, usr u2; $code$ language sql **stable**

Use is also quite convenient:

 select * from usr_with_common_friends_created_at_tr(tstzrange(now() - make_interval(years:=1), now())) where id=1 and another_usr_id=2

It would seem that a query using this function will work straightforward - first, the function will return all possible strings, and then they will be filtered by condition. Let's get a look:

 explain select * from usr_with_common_friends_created_at_tr(tstzrange(now() - make_interval(years:=1), now())) where id=1 and another_usr_id=2

Plan:

 QUERY PLAN Nested Loop (cost=0.30..67.17 rows=1 width=8) -> Index Only Scan using usr_pkey on usr u (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 1) -> Index Only Scan using usr_pkey on usr u2 (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 2) SubPlan 1 -> Nested Loop (cost=19.44..50.82 rows=1 width=4) -> Hash Join (cost=19.29..30.01 rows=6 width=8) Hash Cond: (f.friend_usr_id = f2.friend_usr_id) -> Bitmap Heap Scan on friend f (cost=4.24..14.91 rows=11 width=4) Recheck Cond: (usr_id = u.id) -> Bitmap Index Scan on friend_pkey (cost=0.00..4.24 rows=11 width=0) Index Cond: (usr_id = u.id) -> Hash (cost=14.91..14.91 rows=11 width=4) -> Bitmap Heap Scan on friend f2 (cost=4.24..14.91 rows=11 width=4) Recheck Cond: (usr_id = u2.id) -> Bitmap Index Scan on friend_pkey (cost=0.00..4.24 rows=11 width=0) Index Cond: (usr_id = u2.id) -> Index Scan using usr_pkey on usr u3 (cost=0.15..3.46 rows=1 width=4) Index Cond: (id = f.friend_usr_id) Filter: (added <@ tstzrange((now() - '1 year'::interval), now()))

Surprisingly, this is not the case - the server was able to deploy the function directly to the request body. Yes, Postgresql in some cases is able to inject the function body directly into the request.
When does this happen?

Scalar functions:

The function is implemented in SQL ( LANGUAGE SQL ) as a simple select that returns a scalar type
Feature marked immutable or stable
The function does not contain subqueries.
Function not marked as security definer
The function has no specific set (n., set enable_seqscan=off , etc.)
The function returns only one column.
The returned type must match the type of the function.
And a number of restrictions (see the full list below)

This can be useful for encapsulating uncomplicated but cumbersome logic, for example:

 create or replace function is_system_catalog_table_name(r anyelement) returns boolean as $code$ select substring(r.relname from 1 for 3)='pg_' -- , ,      $code$ immutable language sql

Request:

 explain analyze select * from pg_class pc where is_system_catalog_table_name(pc)

Plan:

 Seq Scan on pg_class pc (cost=0.00..6955.59 rows=827 width=201) Filter: ("substring"((relname)::text, 1, 3) = 'pg_'::text)

As you can see, there is no function call here - the function code was inserted directly into the request body. This function can be considered as a kind of macro.

At the same time, I would like to pay attention to the compact syntax of the function call recording - the record is immediately passed as a parameter, and is not taken as a strictly specific type ( pg_class in this case), but as an arbitrary type with a relname column.

For example:

 create table t(id serial, relname text); insert into t(relname) select relname from pg_class limit 100; explain select * from t pc where is_system_catalog_table_name(pc) Seq Scan on t pc (cost=0.00..2.50 rows=1 width=25) Filter: ("substring"(relname, 1, 3) = 'pg_'::text)

Table functions have similar, but much softer limitations:

Function implemented in SQL ( LANGUAGE SQL )
immutable or stable function
Function not security definer
The function is not strict
No specific set
The function body contains a single select (and only select , insert / update / delete not allowed)
Return column types must match the types in the function declaration.
And some more specific limitations.

Thus, embedding the function body directly into a request in Postgres makes it possible to effectively implement the “presentation with parameters” that is absent in the standard, but nevertheless in demand and convenient.

Interestingly, in DB2 and SQL Server, the functions that are embedded in the query are also used to solve the "view with parameters" task.

References:
https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions
https://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000935.html

Source: https://habr.com/ru/post/277335/

All Articles

Using functions in PostgreSQL as parameterized views

More articles: