
Earlier in our blog on Habr we have written about the development of our product, Hydra, a billing system for telecom operators, and covered questions of working with infrastructure and adopting new technologies. For example, we looked at the advantages of Clojure and at the situations where MongoDB is and is not worth using.
Today we will talk about working with JSON, and in particular about applying constraints to it. Interesting material on this topic was published by developer Magnus Hagander in his blog; below we present the main ideas of his post.
Hagander writes that in a hallway conversation at one of the conferences he was asked whether it is possible to enjoy the advantages of SQL and NoSQL databases at the same time. In particular, his interlocutors wanted to know whether PostgreSQL's advanced constraints can be applied to JSON data. “If you think it might work for your particular case, it most likely will,” Hagander is convinced.
However, while everything is quite simple with UNIQUE keys and CHECK constraints, exclusion constraints take somewhat more work.
Besides the technical side, there is of course the question of whether this is worth doing at all: the more constraints you add to JSON data, the more structured it becomes. On the other hand, there are databases whose main selling point is a dynamic schema but which still require key indexes and constraints (unlike PostgreSQL, where binary JSONB remains unstructured even when indexed).
In PostgreSQL, keys and constraints can be defined both on columns and directly on arbitrary expressions, provided those expressions are immutable, i.e. the result depends only on the input values and not on anything external to the expression. This applies to JSONB as well.
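As a quick illustration (a sketch with illustrative table and column names, not taken from the original post), an index on an immutable expression is accepted, while an index on an expression that depends on external state is rejected:

CREATE TABLE people (name text);

-- lower() is IMMUTABLE, so an expression index on it is allowed:
CREATE INDEX people_lower_name_idx ON people (lower(name));

-- now() is only STABLE, so the following would fail with
-- "ERROR:  functions in index expression must be marked IMMUTABLE":
-- CREATE INDEX people_now_idx ON people ((now()));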
Consider a standard table holding JSON; a minimal declaration might look like this (the table and object names in the sessions below are illustrative):
postgres=# CREATE TABLE jsontable (j jsonb NOT NULL);
CREATE TABLE
Of course, declaring a table with nothing but a single JSONB field is rarely a good idea in practice; real tables usually carry more information and need more fields than just the JSONB one. But for this example that is what we will use.
By creating a standard inverted (GIN) index using jsonb_path_ops, you can index the fully unstructured JSONB content in the most efficient way. That index is not used in the current example, but in real development it is one of the main reasons to use JSONB in the first place. To demonstrate the constraints, let's add some booking records. We use a semi-fixed schema, and a UUID serves as the record key, as is commonly done these days (the values here are illustrative):
postgres=# CREATE INDEX j_idx ON jsontable USING gin (j jsonb_path_ops);
CREATE INDEX

postgres=# INSERT INTO jsontable (j) VALUES ($${
    "uuid": "4e9cf085-09a5-4b4f-bc99-bde2d2d51f41",
    "start": "2015-03-08 10:00",
    "end": "2015-03-08 11:00",
    "title": "test"
}$$);
INSERT 0 1
The first thing to look at is whether the uuid field can be made unique: the identifier must be unique across the whole table. As practice shows, applications cannot always be trusted to enforce this, so to guarantee the absence of duplicates a constraint is needed. It is pretty simple:
postgres=# CREATE UNIQUE INDEX j_uuid_idx ON jsontable (((j->>'uuid')::uuid));
CREATE INDEX
This creates a unique B-tree index on the value extracted from the uuid field. The index can be used both for looking records up by key and for rejecting duplicate keys. The expression j->>'uuid' extracts the value of the uuid field as text, and the ::uuid cast then converts it to the built-in type for unique identifiers.
In principle, the constraint could be created directly on the text value, but it is much more efficient to use uuid, since it is processed as a 128-bit integer. With the index in place, an attempt to insert a duplicate identifier fails:
postgres=# INSERT INTO jsontable (j) VALUES ($${"uuid": "4e9cf085-09a5-4b4f-bc99-bde2d2d51f41", "start": "2015-03-08 12:00", "end": "2015-03-08 13:00", "title": "test2"}$$);
ERROR:  duplicate key value violates unique constraint "j_uuid_idx"
One problem remains: nothing checks that the field exists at all. Records without a uuid field can still be inserted, because the ->> operator returns NULL for a missing key, and NULL does not trigger a uniqueness violation (one NULL is not equal to another NULL). To eliminate this loophole, add a CHECK constraint:
postgres=# ALTER TABLE jsontable ADD CONSTRAINT uuid_must_exist CHECK (j ? 'uuid');
ALTER TABLE
With this constraint it is no longer possible to insert records without a uuid field, and the unique index from the previous step guarantees the absence of duplicates. The cast to the uuid type ensures the value is in the correct format. Together this set of indexes and constraints replicates the functionality of a classic column defined as uuid NOT NULL UNIQUE.
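For comparison, a sketch of the classic relational declaration that this pair emulates (the table name is illustrative):

-- the relational equivalent of the expression index plus CHECK constraint above
CREATE TABLE registrations (
    uuid uuid NOT NULL UNIQUE
);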
The more interesting case, however, is exclusion constraints; that is exactly why a booking record was chosen as the example.
The main idea is that once the constraint is in place, the table cannot contain overlapping entries. For an ordinary relational schema such a mechanism is simple to implement: you just create a regular exclusion constraint, as in the sketch below.
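A minimal sketch of that relational case, assuming the time range is stored in a tsrange column r (all names are illustrative):

CREATE TABLE bookings (r tsrange NOT NULL);

-- no two rows may contain overlapping ranges
ALTER TABLE bookings ADD CONSTRAINT no_overlap EXCLUDE USING gist (r WITH &&);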
For JSON data, the same extraction-based approach as above can be applied. There is one catch, though: constraints on expressions can only be created if the expressions are immutable, and the conversion of text to timestamp is mutable.
That is because converting text to a timestamp can depend on values external to the expression, for example:
postgres=# SELECT 'today'::timestamp;
      timestamp      
---------------------
 2015-03-08 00:00:00
(1 row)
This is a good illustration of why the conversion is mutable: the result changes every day. But a value used in an index must not change (unless, of course, the stored value itself is explicitly updated).
If it is known that the database contains no such values, the conversion is in practice immutable for this data set. And if that is known for certain, you can create a small wrapper function that turns the expression above into an immutable one:
postgres=# CREATE FUNCTION immutable_tstamp(t text) RETURNS timestamp
           LANGUAGE sql IMMUTABLE AS
           $$SELECT t::timestamptz AT TIME ZONE 'UTC'$$;
CREATE FUNCTION
This function performs a conversion to timestamp that is hard-coded to the UTC time zone and declared IMMUTABLE. In addition, it makes sense to implement a CHECK constraint verifying that the data really contains only valid timestamp values; a sketch of one follows.
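A minimal sketch of such a constraint, reusing the start and end fields from the example above (the constraint name is illustrative):

-- rejects rows where start or end is missing;
-- values in an invalid format already fail inside the cast itself
ALTER TABLE jsontable ADD CONSTRAINT valid_tstamps CHECK (
    immutable_tstamp(j->>'start') IS NOT NULL AND
    immutable_tstamp(j->>'end') IS NOT NULL
);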
The next step is to combine this function with tsrange() to build an exclusion constraint on an expression that guarantees the absence of overlapping records. The start and end times are extracted from the record with the immutable conversion function, tsrange() assembles them into proper timestamp ranges, and those ranges are handed to the exclusion constraint together with the overlap operator (&&):
postgres=# ALTER TABLE jsontable ADD CONSTRAINT overlapping_times
           EXCLUDE USING gist (
               tsrange(immutable_tstamp(j->>'start'),
                       immutable_tstamp(j->>'end')) WITH &&
           );
ALTER TABLE
As a result, inserting records with overlapping time ranges is now prohibited:
postgres=# INSERT INTO jsontable (j) VALUES ($${"uuid": "bc7c0e6b-08ac-4d34-9e6f-e0b5ae2b0e73", "start": "2015-03-08 10:30", "end": "2015-03-08 11:30", "title": "test3"}$$);
ERROR:  conflicting key value violates exclusion constraint "overlapping_times"
With this function and constraint we have replicated the functionality of a regular exclusion constraint defined as EXCLUDE USING gist (r WITH &&) over a range column r.
So the answer to the original question, “is it possible to enjoy the advantages of both SQL and NoSQL databases at the same time?”, is yes, at least as long as the DBMS in use has both kinds of capabilities, as PostgreSQL does.
An important point: if the schema is known in advance, as in our example, the system will work faster and more efficiently when the fields are stored relationally. So use the right tool for the job, and reach for JSON only when the schema is at least semi-dynamic. That said, the ability to declare constraints on part of the schema is very useful even when the data is not relational and not everything is as fast. In the end, the whole point of a dynamic schema is its flexibility.
Our experience
We use JSON in PostgreSQL in several projects. In particular, in a business process management project we store the values of process variables in such fields; their structure is determined at the time the product is deployed rather than while it is being developed.
The table fields are accessed through the Ruby on Rails adapter for PostgreSQL. Reading and writing work natively in Ruby, through hashes and arrays, so data from the field can be used without additional conversions.
The article covers an important aspect of the integrity of stored data, but besides simply storing data you usually also need to search it. For this, PostgreSQL again offers functional (expression) indexes, which can significantly speed up data access for frequent queries. An illustrative example follows.
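A sketch of such an index for a hypothetical frequent query (all names here are illustrative, not our actual schema):

-- a JSONB column with process variables
CREATE TABLE process_instances (id serial PRIMARY KEY, variables jsonb NOT NULL);

-- expression index on one frequently queried key
CREATE INDEX process_instances_state_idx ON process_instances ((variables->>'state'));

-- a query of this shape can now use the index:
SELECT id FROM process_instances WHERE variables->>'state' = 'active';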