This article is a translation; the original article, by Rob Conery, is right here.
Postgres, as many know, supports JSON as a data storage type, and with release 9.4, Postgres now supports JSON storage in the form of jsonb, a binary format.
This is great news for those who want to step beyond simple "storing JSON as text". jsonb now supports indexing with the GIN index, and also has a special query operator that lets you take advantage of that index.
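To make that concrete, here is a minimal sketch (the docs table and its contents are hypothetical, purely for illustration):

create table docs(id serial primary key, body jsonb not null);

-- jsonb_path_ops is a GIN operator class optimized for the @> containment operator
create index idx_docs_body on docs using GIN(body jsonb_path_ops);

-- this containment query can use the GIN index above
select * from docs where body @> '{"name" : "Test"}';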
Who cares?
Discovering jsonb in Postgres and seeing what it can do is fun. The problem, in its own way, is that this is only a first acquaintance and some reflections; to get real work done, it is not enough.
The implication is that other systems (such as RethinkDB) ship with a huge amount of built-in functionality to help you save documents, query those documents, and optimize. Postgres also has some interesting capabilities in this direction, but what you get for queries out of the box is just a little... lacking, to be honest.
Let's look at this query:
select document_field -> 'my_key' from my_docs where document_field @> '{"some_key" : "some_value"}';
It reveals the slight weirdness of working with JSON in Postgres: it is all strings. Obviously, SQL cannot recognize JSON, so you have to format it as a string, which in turn means that working with JSON directly in SQL is a pain. Of course, if you have a good query design tool, the problem shrinks to a certain extent... but it still exists.
Moreover, document storage here is fairly loose. Do you use a single jsonb field? Or several fields within a larger table structure? It is all up to you, which, of course, is not bad, but too much freedom of choice can also be paralyzing.
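To make the trade-off concrete, here is a sketch of both extremes (the table and column names are mine, purely for illustration):

-- everything lives in a single jsonb field
create table docs_loose(
  id serial primary key,
  body jsonb not null
);

-- a hybrid: typed columns for well-known fields, jsonb for the rest
create table docs_hybrid(
  id serial primary key,
  email varchar not null,
  created_at timestamptz not null default now(),
  extras jsonb
);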
So why bother with this at all? If you want to use a document-oriented database, then use a document-oriented database. I agree with that... but there is one truly compelling reason (at least for me) to use Postgres... Postgres is ACID compliant, so you can expect it to actually record your data and, quite likely, not lose it.
In addition, Postgres is a relational database, which in turn means you can move to a stricter schema over time. There are a number of reasons why you might choose Postgres; for now, let's suppose the choice is made and it is time to start working with documents and jsonb.
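As a small illustration of that schema-tightening point: a key that starts life inside a document can later be promoted to a real, constrained column. A sketch, assuming a table with a jsonb body field like the my_docs table defined below:

-- promote the "name" key from the document into a typed, constrained column
alter table my_docs add column name varchar;
update my_docs set name = body ->> 'name';
alter table my_docs alter column name set not null;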
A Better API
As for me, I would like to see more features that support the idea of working with documents. At the moment we have built-in tools for dealing with the JSON types, but nothing that supports a higher level of abstraction. That does not mean we cannot build such an API with our own hands... which is exactly what I did. Here is how it begins...
A Document-Oriented Table
I want to store documents in a table that carries some metadata along with additional ways to work with the information, namely full-text search. The structure of the table itself may vary, so why not build this abstraction ourselves! Let's start with this:
create table my_docs(
  id serial primary key,
  body jsonb not null,
  search tsvector,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);
There is some duplication here. The document itself is stored in the body field, including its id, which is also stored as the primary key (that is necessary, since this is still Postgres). I accept the duplication, however, for the following reasons:
- This API is mine and I can be sure that everything is synchronized.
- This is how document-oriented systems do it.
Saving a Document
What would I like from the save_document function?
- Create tables on the fly
- Create relevant indexes
- Maintain timestamps and the search field (for the full-text index)
This can be achieved by writing my own save_document function and, for fun, I will use PLV8: javascript inside the database. In fact, I will create two functions: one will create my table in a specific way, the other will save the document itself.
First, create_document_table:
create function create_document_table(name varchar, out boolean) as $$
  var sql = "create table " + name + "(" +
            "id serial primary key," +
            "body jsonb not null," +
            "search tsvector," +
            "created_at timestamptz default now() not null," +
            "updated_at timestamptz default now() not null);";

  plv8.execute(sql);

  // index the table we just created
  plv8.execute("create index idx_" + name + " on " + name + " using GIN(body jsonb_path_ops)");
  plv8.execute("create index idx_" + name + "_search on " + name + " using GIN(search)");
  return true;
$$ language plv8;
This function creates the table and the appropriate indexes: one for the jsonb field in our document-oriented table, the other for the tsvector full-text index. Note that I build SQL strings on the fly and execute them with plv8.execute; that is how you get things done with javascript inside Postgres.
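Calling it by hand looks like this (normally save_document, below, will do it for you):

select create_document_table('my_docs');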
Next, let's create our save_document function:
create function save_document(tbl varchar, doc_string jsonb)
returns jsonb as $$
  var doc = JSON.parse(doc_string);
  var result = null;
  var id = doc.id;

  // create the table on the fly if it does not exist yet
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", tbl)[0];
  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }

  if(id){
    // existing document: update the body and the timestamp
    result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", doc_string, id);
  }else{
    // new document: insert, then copy the generated id back into the body
    result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", doc_string);
    id = result[0].id;
    doc.id = id;
    result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *", JSON.stringify(doc), id);
  }

  return result[0] ? result[0].body : null;
$$ language plv8;
I'm sure this function looks a bit strange, but read it line by line and it starts to make sense. So why the call to JSON.parse()?
This is because the Postgres jsonb type here is not JSON; it is a string. Outside our little PLV8 world it is still Postgres, and Postgres works with JSON as a string (storing it in jsonb binary format). So our document arrives in the function as a string that needs to be parsed if we want to work with it as a JSON object in javascript.
In the insert case, you can see that I have to synchronize the document's id with the primary key that was just created. A bit cumbersome, but it works well.
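To see what that synchronization means in practice, here is the shape of a document before and after an insert (the id value is illustrative, assuming a fresh table):

-- the document passed in has no id:
select save_document('test_run', '{"name" : "Test"}');

-- the stored row now carries the generated id in both places:
--   id = 1, body = {"id": 1, "name": "Test"}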
You can also see that for both the original insert and the update, doc_string is what gets passed as an argument to plv8.execute. Again, this is because JSON values must be handled as strings in Postgres.
This really can be confusing. If I try to pass in doc (our parsed JSON object), plv8 converts it to [object Object], which is strange.
Moreover, if I try to return a javascript object from this function (say, our doc variable), I get an error saying it is the wrong format for the jsonb type. Baffling.
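A contrived sketch of both pitfalls, written as javascript inside a plv8 function body (the docs table is hypothetical):

var doc = JSON.parse(doc_string);

// pitfall 1: passing the parsed object coerces it to "[object Object]"
plv8.execute("insert into docs(body) values($1)", doc);

// stringifying first works, since Postgres wants the JSON as a string
plv8.execute("insert into docs(body) values($1)", JSON.stringify(doc));

// pitfall 2: "return doc;" from a function declared "returns jsonb" fails
// with a format error; returning a string (like result[0].body) succeeds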
In the end I simply return the data from the query result, which, believe it or not, is a string, and I can pass it straight through as the return value. It is worth noting that everything plv8.execute returns comes back as elements you can work with like javascript objects.
Result
It works really well! And fast. If you want to try it yourself, you will need to install the PLV8 extension and then run something like this:
create extension plv8;
select * from save_document('test_run', '{"name" : "Test"}');
You should see a new table and a new entry in that table:
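Something along these lines, assuming a fresh database (ids and timestamps will vary):

select id, body from test_run;

 id |           body
----+----------------------------
  1 | {"id": 1, "name": "Test"}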
Future plans
In the next article I will add some additional features, namely:
- Automatically updating the search field
- Inserting multiple documents using arrays
This is a good start!