This article is a translation; the original article, by Rob Conery, is right here.
Postgres, as many know, supports JSON as a data storage type, and with release 9.4, Postgres now supports JSON storage in the form of jsonb, a binary format.
This is great news for those who want to step beyond simple "storing JSON as text". jsonb now supports indexing with the GIN index, and also has a special query operator that lets you take advantage of that index.
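To make that concrete, here is a minimal sketch (the docs table and its contents are hypothetical, purely for illustration):

create table docs(id serial primary key, body jsonb not null);

-- jsonb_path_ops is a GIN operator class optimized for the @> containment operator
create index idx_docs_body on docs using GIN(body jsonb_path_ops);

-- this containment query can use the GIN index above
select * from docs where body @> '{"name" : "Test"}';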
Who cares?
Discovering jsonb in Postgres and seeing what it can do is fun. The problem, in its own way, is that this is only a first acquaintance and some reflections; to get real work done, it is not enough.
The implication is that other systems (such as RethinkDB) ship with a huge amount of built-in functionality to help you save documents, query those documents, and optimize. Postgres also has some interesting capabilities in this direction, but what you get for queries out of the box is just a little... lacking, to be honest.
Let's look at this query:
select document_field -> 'my_key' from my_docs where document_field @> '{"some_key" : "some_value"}';
It reveals the slight weirdness of working with JSON in Postgres: it is all strings. Obviously, SQL cannot recognize JSON, so you have to format it as a string, which in turn means that working with JSON directly in SQL is a pain. Of course, if you have a good query design tool, the problem shrinks to a certain extent... but it still exists.
Moreover, document storage here is fairly loose. Do you use a single jsonb field? Or several fields within a larger table structure? It is all up to you, which, of course, is not bad, but too much freedom of choice can also be paralyzing.
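To make the trade-off concrete, here is a sketch of both extremes (the table and column names are mine, purely for illustration):

-- everything lives in a single jsonb field
create table docs_loose(
  id serial primary key,
  body jsonb not null
);

-- a hybrid: typed columns for well-known fields, jsonb for the rest
create table docs_hybrid(
  id serial primary key,
  email varchar not null,
  created_at timestamptz not null default now(),
  extras jsonb
);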
So why bother with this at all? If you want to use a document-oriented database, then use a document-oriented database. I agree with that... but there is one truly compelling reason (at least for me) to use Postgres... Postgres is ACID compliant, so you can expect it to actually record your data and, quite likely, not lose it.
In addition, Postgres is a relational database, which in turn means you can move to a stricter schema over time. There are a number of reasons why you might choose Postgres; for now, let's suppose the choice is made and it is time to start working with documents and jsonb.
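As a small illustration of that schema-tightening point: a key that starts life inside a document can later be promoted to a real, constrained column. A sketch, assuming a table with a jsonb body field like the my_docs table defined below:

-- promote the "name" key from the document into a typed, constrained column
alter table my_docs add column name varchar;
update my_docs set name = body ->> 'name';
alter table my_docs alter column name set not null;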
A Better API
As for me, I would like to see more features that support the idea of working with documents. At the moment we have built-in tools for dealing with the JSON types, but nothing that supports a higher level of abstraction. That does not mean we cannot build such an API with our own hands... which is exactly what I did. Here is how it begins...
A Document-Oriented Table
I want to store documents in a table that carries some metadata along with additional ways to work with the information, namely full-text search. The structure of the table itself may vary, so why not build this abstraction ourselves! Let's start with this:
create table my_docs(
  id serial primary key,
  body jsonb not null,
  search tsvector,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);
There is some duplication here. The document itself is stored in the body field, including its id, which is also stored as the primary key (that is necessary, since this is still Postgres). I accept the duplication, however, for the following reasons:
- This API is mine and I can be sure that everything is synchronized.
- This is how document-oriented systems do it.
Saving a Document
What would I like from the save_document function?
- Create tables on the fly
- Create relevant indexes
- Maintain timestamps and the search field (for the full-text index)
This can be achieved by writing my own save_document function and, for fun, I will use PLV8: javascript inside the database. In fact, I will create two functions: one will create my table in a specific way, the other will save the document itself.
First, create_document_table:
create function create_document_table(name varchar, out boolean) as $$
  var sql = "create table " + name + "(" +
            "id serial primary key," +
            "body jsonb not null," +
            "search tsvector," +
            "created_at timestamptz default now() not null," +
            "updated_at timestamptz default now() not null);";

  plv8.execute(sql);

  // index the table we just created
  plv8.execute("create index idx_" + name + " on " + name + " using GIN(body jsonb_path_ops)");
  plv8.execute("create index idx_" + name + "_search on " + name + " using GIN(search)");
  return true;
$$ language plv8;
This function creates the table and the appropriate indexes: one for the jsonb field in our document-oriented table, the other for the tsvector full-text index. Note that I build SQL strings on the fly and execute them with plv8.execute; that is how you get things done with javascript inside Postgres.
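Calling it by hand looks like this (normally save_document, below, will do it for you):

select create_document_table('my_docs');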
Next, let's create our save_document function:
create function save_document(tbl varchar, doc_string jsonb)
returns jsonb as $$
  var doc = JSON.parse(doc_string);
  var result = null;
  var id = doc.id;

  // create the table on the fly if it does not exist yet
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", tbl)[0];
  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }

  if(id){
    // existing document: update the body and the timestamp
    result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", doc_string, id);
  }else{
    // new document: insert, then copy the generated id back into the body
    result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", doc_string);
    id = result[0].id;
    doc.id = id;
    result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *", JSON.stringify(doc), id);
  }

  return result[0] ? result[0].body : null;
$$ language plv8;
I'm sure this function looks a bit strange, but read it line by line and it starts to make sense. So why the call to JSON.parse()?
This is because the Postgres jsonb type here is not JSON; it is a string. Outside our little PLV8 world it is still Postgres, and Postgres works with JSON as a string (storing it in jsonb binary format). So our document arrives in the function as a string that needs to be parsed if we want to work with it as a JSON object in javascript.
In the insert case, you can see that I have to synchronize the document's id with the primary key that was just created. A bit cumbersome, but it works well.
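To see what that synchronization means in practice, here is the shape of a document before and after an insert (the id value is illustrative, assuming a fresh table):

-- the document passed in has no id:
select save_document('test_run', '{"name" : "Test"}');

-- the stored row now carries the generated id in both places:
--   id = 1, body = {"id": 1, "name": "Test"}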
You can also see that for both the original insert and the update, doc_string is what gets passed as an argument to plv8.execute. Again, this is because JSON values must be handled as strings in Postgres.
This really can be confusing. If I try to pass in doc (our parsed JSON object), plv8 converts it to [object Object], which is strange.
Moreover, if I try to return a javascript object from this function (say, our doc variable), I get an error saying it is the wrong format for the jsonb type. Baffling.
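A contrived sketch of both pitfalls, written as javascript inside a plv8 function body (the docs table is hypothetical):

var doc = JSON.parse(doc_string);

// pitfall 1: passing the parsed object coerces it to "[object Object]"
plv8.execute("insert into docs(body) values($1)", doc);

// stringifying first works, since Postgres wants the JSON as a string
plv8.execute("insert into docs(body) values($1)", JSON.stringify(doc));

// pitfall 2: "return doc;" from a function declared "returns jsonb" fails
// with a format error; returning a string (like result[0].body) succeeds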
In the end I simply return the data from the query result, which, believe it or not, is a string, and I can pass it straight through as the return value. It is worth noting that everything plv8.execute returns comes back as elements you can work with like javascript objects.
Result
It works really well! And fast. If you want to try it yourself, you will need to install the PLV8 extension and then run something like this:
create extension plv8;
select * from save_document('test_run', '{"name" : "Test"}');
You should see a new table and a new entry in that table:
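Something along these lines, assuming a fresh database (ids and timestamps will vary):

select id, body from test_run;

 id |           body
----+----------------------------
  1 | {"id": 1, "name": "Test"}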
Future plans
In the next article I will add some additional features, namely:
- Automatically updating the search field
- Inserting multiple documents using arrays
This is a good start!