
Hello, habracheloveki! Many of you have come up with rendering business logic in a DBMS in the form of stored functions / procedures, facilitating the client. This has both advantages and disadvantages. Today I would like to tell you how to create stored functions in
PostgreSQL written in the C language. The article will contain the very basics you need to know to get started with them.
Description of user functions
Currently,
PostgreSQL allows you to define custom functions of the following types:
- SQL functions
- C functions
- functions in procedural languages ( plpgsql , pltcl , plperl , etc.)
SQL functions are functions in the body of which there is one or more
SQL queries, with the return result being the result of the last query. However, if the return result is not
void ,
INSERT ,
UPDATE , or
DELETE with the
RETURNING clause is allowed.
C functions are statically and dynamically loaded. Statically loaded (also called internal functions) are created on the server during database cluster initialization, dynamically loaded - loaded by the server on demand.
Functions in procedural languages require the creation of appropriate extensions, and some of the languages can be of two types - trusted and not trusted (for the latter there is no possibility to limit user actions). In the basic delivery of
PostgreSQL go
plpgsql ,
pltcl ,
plperl and
plpython , a list of other languages can be found
here . Extensions for procedural languages are created via
SQL :
CREATE EXTENSION pltcl;
Or via the console (plpython is available only in a non-trusted form):
createlang plpythonu
Dynamically loadable C functions
Dynamically loaded functions on C are contained in dynamically loaded (or shared) libraries, which are loaded when the function is first called. An example of creating such a function:
CREATE OR REPLACE FUNCTION grayscale ( r double precision, g double precision, b double precision ) RETURNS double precision AS 'utils', 'grayscale' LANGUAGE C STRICT;
This example creates the function
grayscale (which has three
float parameters and returns a
float ), which is located in the
utils dynamic library. The
STRICT keyword is used so that the function returns
NULL if at least one of the arguments is
NULL. If the absolute path is not specified, the directory specified in the variable
dynamic_library_path is implied, whose value can be viewed like this:
SELECT current_setting ( 'dynamic_library_path' );
Moreover, if the value of a variable or the path to the dynamic library begins with
$ libdir , then
$ libdir is replaced with the path to the directory containing the
PostgreSQL libraries, which can be found using the console command:
pg_config --pkglibdir
Since the library is loaded with the rights of the user under which the
PostgreSQL daemon is started (as a rule, it is
postgres ), this user must have access rights to the library.
')
For functions, there are two types of convention:
version 0 (obsolete) and
version 1 . The functions of
version 0 are not portable and have limited functionality, therefore, the functions of
version 1 are implied below. To indicate that we are using
version 1 , the function, before defining it, must be marked with a special macro:
PG_FUNCTION_INFO_V1(grayscale);
Dynamic library structure
Each library must have a magic block (one, regardless of the number of files), in order to be able to detect inconsistencies, for example, the older version of
PostgreSQL server and the version of
PostgreSQL with which the library is compiled. This block is declared like this:
#include <fmgr.h> #ifdef PG_MODULE_MAGIC PG_MODULE_MAGIC; #endif
If necessary, you can define the initialization function
_PG_init (having no parameters and returning
void ), which is called after loading the library, and the completion function
_PG_fini (having the same signature as
_PG_init ), which is called before unloading the library. Please note, the documentation states that currently the libraries are not being unloaded, therefore the
_PG_fini function
will never be called. Example functions:
void _PG_init() { createLog(); } void _PG_fini() { destroyLog(); }
Functions in the library have a specific form, for parameters, receiving arguments, returning the result and some other operations, special macros are provided (described in more detail below):
Datum grayscale(PG_FUNCTION_ARGS) { float8 r = PG_GETARG_FLOAT8(0); float8 g = PG_GETARG_FLOAT8(1); float8 b = PG_GETARG_FLOAT8(2); PG_RETURN_FLOAT8(0.299 * r + 0.587 * g + 0.114 * b); }
Data types
The basic data types used in functions are divided into three types:
- fixed length passed by value
- with fixed length, passed by pointer
- variable length transmitted by pointer
Types of the first type can be 1, 2 or 4 bytes in size (or 8, if
sizeof (Datum) is 8 on your platform). When defining your own types (for example, via
typedef ), you must be sure that they are the same size on all architectures.
Types with a fixed length, passed to the pointer, are structures. To allocate memory for them (and types of the third kind), it is necessary using
palloc , for example:
typedef struct { float r, g, b, a; } Color; Color *color = (Color*)palloc(sizeof(Color));
For types of the third type, it is necessary to define a field (4 bytes) for storing the size of the entire type (data size + field size) and, in fact, the data itself, which is continuously located beyond this field. This can be done using the structure of the form:
typedef struct { int32 length; char data[1]; } text;
The value of a field with a type size is set implicitly using the
SET_VARSIZE macro. Other macros for working with variable length types passed by the pointer:
char data[10]; ... text *string = (text*)palloc(VARHDRSZ + 20);
The correspondence between types in C and
SQL functions is indicated in
this table.
Regarding the types passed by the pointer, it should be noted that the data pointed to by this pointer cannot be changed, since this may be data that is directly on the disk, which can lead to damage to it. The consequences will not be very happy.
Function structure
The signature of the function should have the following form:
Datum grayscale(PG_FUNCTION_ARGS);
Datum is a special type for the return value of a function, essentially a
typedef pointer. The
PG_FUNCTION_ARGS macro
is expanded into a pointer to a structure containing meta information about the function parameters: the context of the call, whether the value is
NULL, and so on. The function arguments are accessed using the
PG_GETARG_ * macros:
float8 r = PG_GETARG_FLOAT8(0);
If the
SQL function is declared without
STRICT , you can use PG_ARGISNULL to check if the value of the argument is
NULL . You can return the result of a function as
NULL , via
PG_RETURN_NULL . As an example, let's see how the implementation of a function without
STRICT looks like:
Datum grayscale(PG_FUNCTION_ARGS) { if (PG_ARGISNULL(0) || PG_ARGISNULL(1) || PG_ARGISNULL(2)) { PG_RETURN_NULL(); } float8 r = PG_GETARG_FLOAT8(0); float8 g = PG_GETARG_FLOAT8(1); float8 b = PG_GETARG_FLOAT8(2); PG_RETURN_FLOAT8(0.299 * r + 0.587 * g + 0.114 * b); }
Function example

Now, knowing how to write a stored function in C, let's summarize and look at an example. The environment we will have is:
- operating system: Ubuntu 12.10
- PostgreSQL version: 9.3
- compiler: gcc 4.7.2
Create a utils.c file with the following contents:
#include <postgres.h> #include <fmgr.h> #ifdef PG_MODULE_MAGIC PG_MODULE_MAGIC; #endif PG_FUNCTION_INFO_V1(grayscale); Datum grayscale(PG_FUNCTION_ARGS) { float8 r = PG_GETARG_FLOAT8(0); float8 g = PG_GETARG_FLOAT8(1); float8 b = PG_GETARG_FLOAT8(2); PG_RETURN_FLOAT8(0.299 * r + 0.587 * g + 0.114 * b); }
Next, let's compile utils.c into an object file, while it should be with
position-independent code (for
gcc , this is the
fpic option):
cc -I/usr/local/pgsql/include/server -fpic -c utils.c
The
pg_config --includedir-server command
will prompt the location of the directory with header files. Linking the object file as a dynamic library (if everything is in order, we should have the
utils.so dynamic library, copy it to
/ usr / local / pgsql / lib ):
cc -shared -L/usr/local/pgsql/lib -lpq -o utils.so utils.o
Now, connect to our database and create the function
grayscale_c in it, specifying some options:
CREATE OR REPLACE FUNCTION grayscale_c ( r double precision, g double precision, b double precision ) RETURNS double precision AS 'utils', 'grayscale' LANGUAGE C STRICT VOLATILE COST 100;;
Check its performance:
SELECT grayscale_c ( 0.6, 0.5, 0.5 );
But that is not all. Let's compare this function with a similar one, but on plpgsql.
Let's call it
grayscale_plpgsql :
CREATE OR REPLACE FUNCTION grayscale_plpgsql ( r double precision, g double precision, b double precision ) RETURNS double precision AS $BODY$ BEGIN RETURN 0.299 * r + 0.587 * g + 0.114 * b; END $BODY$ LANGUAGE plpgsql STRICT VOLATILE COST 100;
And we will do some test:
CREATE TABLE color AS SELECT random () AS r, random () AS g, random () AS b FROM generate_series ( 1, 1000000 ); SELECT grayscale_c ( r, g, b ) FROM color;
A small check:
SELECT * FROM color WHERE grayscale_c ( r, g, b ) != grayscale_plpgsql ( r, g, b );
Very good result.
As we have seen, the creation of dynamically loaded functions on C is not such a complicated thing. Difficulties, as a rule, are hidden in their implementation.
PS Thank you for your attention.
References: