📜 ⬆️ ⬇️

Stored Functions on C in PostgreSQL


Hello, habracheloveki! Many of you have come up with rendering business logic in a DBMS in the form of stored functions / procedures, facilitating the client. This has both advantages and disadvantages. Today I would like to tell you how to create stored functions in PostgreSQL written in the C language. The article will contain the very basics you need to know to get started with them.

Description of user functions


Currently, PostgreSQL allows you to define custom functions of the following types:

SQL functions are functions in the body of which there is one or more SQL queries, with the return result being the result of the last query. However, if the return result is not void , INSERT , UPDATE , or DELETE with the RETURNING clause is allowed.

C functions are statically and dynamically loaded. Statically loaded (also called internal functions) are created on the server during database cluster initialization, dynamically loaded - loaded by the server on demand.

Functions in procedural languages ​​require the creation of appropriate extensions, and some of the languages ​​can be of two types - trusted and not trusted (for the latter there is no possibility to limit user actions). In the basic delivery of PostgreSQL go plpgsql , pltcl , plperl and plpython , a list of other languages ​​can be found here . Extensions for procedural languages ​​are created via SQL :
CREATE EXTENSION pltcl; --    pltcl CREATE EXTENSION pltclu; --     pltcl 

Or via the console (plpython is available only in a non-trusted form):
 createlang plpythonu 

Dynamically loadable C functions


Dynamically loaded functions on C are contained in dynamically loaded (or shared) libraries, which are loaded when the function is first called. An example of creating such a function:
 CREATE OR REPLACE FUNCTION grayscale ( r double precision, g double precision, b double precision ) RETURNS double precision AS 'utils', 'grayscale' LANGUAGE C STRICT; 

This example creates the function grayscale (which has three float parameters and returns a float ), which is located in the utils dynamic library. The STRICT keyword is used so that the function returns NULL if at least one of the arguments is NULL. If the absolute path is not specified, the directory specified in the variable dynamic_library_path is implied, whose value can be viewed like this:
 SELECT current_setting ( 'dynamic_library_path' ); 

Moreover, if the value of a variable or the path to the dynamic library begins with $ libdir , then $ libdir is replaced with the path to the directory containing the PostgreSQL libraries, which can be found using the console command:
 pg_config --pkglibdir 

Since the library is loaded with the rights of the user under which the PostgreSQL daemon is started (as a rule, it is postgres ), this user must have access rights to the library.
')
For functions, there are two types of convention: version 0 (obsolete) and version 1 . The functions of version 0 are not portable and have limited functionality, therefore, the functions of version 1 are implied below. To indicate that we are using version 1 , the function, before defining it, must be marked with a special macro:
 PG_FUNCTION_INFO_V1(grayscale); 


Dynamic library structure


Each library must have a magic block (one, regardless of the number of files), in order to be able to detect inconsistencies, for example, the older version of PostgreSQL server and the version of PostgreSQL with which the library is compiled. This block is declared like this:
 #include <fmgr.h> #ifdef PG_MODULE_MAGIC PG_MODULE_MAGIC; #endif 

If necessary, you can define the initialization function _PG_init (having no parameters and returning void ), which is called after loading the library, and the completion function _PG_fini (having the same signature as _PG_init ), which is called before unloading the library. Please note, the documentation states that currently the libraries are not being unloaded, therefore the _PG_fini function will never be called. Example functions:
 void _PG_init() { createLog(); } void _PG_fini() { destroyLog(); } 

Functions in the library have a specific form, for parameters, receiving arguments, returning the result and some other operations, special macros are provided (described in more detail below):
 Datum grayscale(PG_FUNCTION_ARGS) { float8 r = PG_GETARG_FLOAT8(0); float8 g = PG_GETARG_FLOAT8(1); float8 b = PG_GETARG_FLOAT8(2); PG_RETURN_FLOAT8(0.299 * r + 0.587 * g + 0.114 * b); } 


Data types


The basic data types used in functions are divided into three types:

Types of the first type can be 1, 2 or 4 bytes in size (or 8, if sizeof (Datum) is 8 on your platform). When defining your own types (for example, via typedef ), you must be sure that they are the same size on all architectures.

Types with a fixed length, passed to the pointer, are structures. To allocate memory for them (and types of the third kind), it is necessary using palloc , for example:
 typedef struct { float r, g, b, a; } Color; Color *color = (Color*)palloc(sizeof(Color)); 

For types of the third type, it is necessary to define a field (4 bytes) for storing the size of the entire type (data size + field size) and, in fact, the data itself, which is continuously located beyond this field. This can be done using the structure of the form:
 typedef struct { int32 length; char data[1]; } text; 

The value of a field with a type size is set implicitly using the SET_VARSIZE macro. Other macros for working with variable length types passed by the pointer:
 char data[10]; ... text *string = (text*)palloc(VARHDRSZ + 20); // VARHDRSZ -      SET_VARSIZE(string, VARHDRSZ + 20); // SET_VARSIZE -    memcpy(VARDATA(string), data, 10); // VARDATA -     

The correspondence between types in C and SQL functions is indicated in this table.

Regarding the types passed by the pointer, it should be noted that the data pointed to by this pointer cannot be changed, since this may be data that is directly on the disk, which can lead to damage to it. The consequences will not be very happy.

Function structure


The signature of the function should have the following form:
 Datum grayscale(PG_FUNCTION_ARGS); 

Datum is a special type for the return value of a function, essentially a typedef pointer. The PG_FUNCTION_ARGS macro is expanded into a pointer to a structure containing meta information about the function parameters: the context of the call, whether the value is NULL, and so on. The function arguments are accessed using the PG_GETARG_ * macros:
 float8 r = PG_GETARG_FLOAT8(0); //     float8 int32 x = PG_GETARG_INT32(1); //     int32 text *msg = PG_GETARG_TEXT_P(2); //     text* 

If the SQL function is declared without STRICT , you can use PG_ARGISNULL to check if the value of the argument is NULL . You can return the result of a function as NULL , via PG_RETURN_NULL . As an example, let's see how the implementation of a function without STRICT looks like:
 Datum grayscale(PG_FUNCTION_ARGS) { if (PG_ARGISNULL(0) || PG_ARGISNULL(1) || PG_ARGISNULL(2)) { PG_RETURN_NULL(); } float8 r = PG_GETARG_FLOAT8(0); float8 g = PG_GETARG_FLOAT8(1); float8 b = PG_GETARG_FLOAT8(2); PG_RETURN_FLOAT8(0.299 * r + 0.587 * g + 0.114 * b); } 

Function example



Now, knowing how to write a stored function in C, let's summarize and look at an example. The environment we will have is:

Create a utils.c file with the following contents:
 #include <postgres.h> #include <fmgr.h> #ifdef PG_MODULE_MAGIC PG_MODULE_MAGIC; #endif PG_FUNCTION_INFO_V1(grayscale); Datum grayscale(PG_FUNCTION_ARGS) { float8 r = PG_GETARG_FLOAT8(0); float8 g = PG_GETARG_FLOAT8(1); float8 b = PG_GETARG_FLOAT8(2); PG_RETURN_FLOAT8(0.299 * r + 0.587 * g + 0.114 * b); } 

Next, let's compile utils.c into an object file, while it should be with position-independent code (for gcc , this is the fpic option):
 cc -I/usr/local/pgsql/include/server -fpic -c utils.c 

The pg_config --includedir-server command will prompt the location of the directory with header files. Linking the object file as a dynamic library (if everything is in order, we should have the utils.so dynamic library, copy it to / usr / local / pgsql / lib ):
 cc -shared -L/usr/local/pgsql/lib -lpq -o utils.so utils.o 

Now, connect to our database and create the function grayscale_c in it, specifying some options:
 CREATE OR REPLACE FUNCTION grayscale_c ( r double precision, g double precision, b double precision ) RETURNS double precision AS 'utils', 'grayscale' LANGUAGE C STRICT VOLATILE COST 100;; 

Check its performance:
 SELECT grayscale_c ( 0.6, 0.5, 0.5 ); -- : 0.5299 

But that is not all. Let's compare this function with a similar one, but on plpgsql. Let's call it grayscale_plpgsql :
 CREATE OR REPLACE FUNCTION grayscale_plpgsql ( r double precision, g double precision, b double precision ) RETURNS double precision AS $BODY$ BEGIN RETURN 0.299 * r + 0.587 * g + 0.114 * b; END $BODY$ LANGUAGE plpgsql STRICT VOLATILE COST 100; 

And we will do some test:
 CREATE TABLE color AS SELECT random () AS r, random () AS g, random () AS b FROM generate_series ( 1, 1000000 ); SELECT grayscale_c ( r, g, b ) FROM color; --   : 926  SELECT grayscale_plpgsql ( r, g, b ) FROM color; --   : 3679  

A small check:
 SELECT * FROM color WHERE grayscale_c ( r, g, b ) != grayscale_plpgsql ( r, g, b ); -- 0  

Very good result.

As we have seen, the creation of dynamically loaded functions on C is not such a complicated thing. Difficulties, as a rule, are hidden in their implementation.

PS Thank you for your attention.

References:

Source: https://habr.com/ru/post/196544/


All Articles