
Custom aggregates are one of those unique features of PostgreSQL that no one seems to understand. However, as soon as you create at least one truly working example, the clouds will open up, and you will be surprised how you once lived without such a solid functional. So let's create such a simple unit. It will return a state, that is, the most frequently occurring value, for a logical (boolean) field.
Who and why might want a strange one? Well, imagine that you monitor some of your web servers day and night, and want to have an hourly picture of uptime. Suppose every 30 seconds the server status is recorded in a table:
TRUE - the server is running,
FALSE - the server is down. Then if most of the time the server worked, we want to return
TRUE , and if for the most part the server was, then
FALSE , respectively. And if the monitoring system itself lies and, therefore, there is no data, we will return
NULL .
Of course, all this can be turned with the help of many different other mechanisms, for example using the
WINDOW mechanism. However, imagine that in one request you will need to work with other accumulated statistics, for example, downtime or server operation. In this case, PostgreSQL offers you an elegant mechanism.
To begin, we need a statistical function that will accumulate data on the Boolean field. Usually such a function has two input parameters:
- the parameter in which the calculated values will be stored (for this function will be called for each line);
- the type parameter of the column, which will be the value for the current row.
')
Suppose we want to store the number of UP and DOWN readings for a server. To do this, you can use an integer array.
With the same success it can be done using the composite type, approx. translator . It is easy to write such a function even in pure SQL:
CREATE OR REPLACE function mode_bool_state(int[], boolean) RETURNS int[] LANGUAGE sql as $body$ SELECT CASE $2 WHEN TRUE THEN array[ $1[1] + 1, $1[2] ] WHEN FALSE THEN array[ $1[1], $1[2] + 1 ] ELSE $1 END; $body$;
Note that the result of the function int [] will be served as the first parameter to the input of the same function when calling for the next line, approx. translator .
To make a decision and display the final result, create another function:
CREATE OR REPLACE FUNCTION mode_bool_final(INT[]) RETURNS boolean LANGUAGE sql as $body$ SELECT CASE WHEN ( $1[1] = 0 AND $1[2] = 0 ) THEN NULL ELSE $1[1] >= $1[2] END; $body$;
Things are easy - to announce the unit:
CREATE AGGREGATE mode(boolean) ( SFUNC = mode_bool_state, STYPE = INT[], FINALFUNC = mode_bool_final, INITCOND = '{0,0}' );
Here,
SFUNC and
FINALFUNC are the names of our functions,
STYPE is the data type for collecting statistics,
INITCOND is the initial conditions.
Let's see how it works!
SELECT server_name, sum(CASE WHEN server_up THEN 0.5 ELSE 0 END) as minutes_up, mode(server_up) as mode FROM servers WHERE montime BETWEEN '2013-04-01' and '2013-04-01 01:00:00';
server_name minutes_up mode
web1 56.5 TRUE
web2 0.0 FALSE
web3 48.0 TRUE
web4 11.5 FALSE
PS Tom Brown's English
article also tells how to create a custom aggregate. In it, the author does not use the final optional function
FINALFUNC , since the type for collecting data
STYPE in his example coincides with the base type of the aggregate.