
PostgreSQL document-oriented API design: Finding what you are looking for (Part 3)

In the first and second parts of this little series I showed various ways to save a document and keep its search field up to date, and also how to save many documents in a single transaction. In this article I will look at the options for querying them.

The best way to search for documents


In the first part, we created a table that looks like this:

create table my_docs(
  id serial primary key,
  body jsonb not null,
  search tsvector,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);
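As a quick refresher from the earlier parts (a sketch only; the real save_document function, which also stamps the id into the body and refreshes the search field, is defined in Part 1), saving looks roughly like this:

-- sketch of the call from Part 1; the exact signature may differ slightly
select save_document('customers', '{"first": "Jolene", "last": "Conery"}');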

Since we control how the data is stored, we can write our own functions to pull that data back out in all sorts of fun ways! The hardest part (saving, updating, and so on) is behind us; now let's have some fun.

Fetching a document by ID


Every document has an id field, courtesy of the save_document function. This is still Postgres, so every table needs a primary key, and we also stamp it into the document itself. I used a plain integer, but you could just as well use a bigint with a Twitter-style snowflake ID if you wish. For now we will stick with serial.
The function for this is quite straightforward:

create function find_document(tbl varchar, id int, out jsonb) as $$
  // find the row by its id; parameters go in as an array
  var result = plv8.execute("select * from " + tbl + " where id=$1;", [id]);
  return result[0] ? result[0].body : null;
$$ language plv8;

select * from find_document('customers', 20);

This is the simplest of all possible functions. It takes the table name and the ID to find, and performs the fastest of all queries (which we love!): a primary-key lookup. Speed: we like it.
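You can verify this yourself: on a table with enough rows, EXPLAIN should show an index scan on the primary key rather than a sequential scan.

explain select * from my_docs where id = 20;
-- on a reasonably sized table, expect: Index Scan using my_docs_pkey on my_docs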

Now let's create one for bulk queries. For this I want to pass in some criteria and get back the first match. That only works predictably if I sort the result, so I add ORDER BY and make id the default sort column:

create function find_document(
  tbl varchar,
  criteria varchar,
  orderby varchar default 'id'
) returns jsonb as $$
  var valid = JSON.parse(criteria); // this will throw if the criteria is invalid JSON
  var results = plv8.execute(
    "select body from " + tbl +
    " where body @> $1 order by body ->> '" + orderby + "' limit 1;",
    [criteria]);
  return results[0] ? results[0].body : null;
$$ language plv8;

select * from find_document('customers', '{"last": "Conery"}', 'first');

That said, we can run into some strange behavior here depending on the driver we use. The first thing I would like to point out is that I am overloading the find_document function, because Postgres allows me to. Which in turn means that the only difference between our original find-by-id function and this one is the argument list.

For the npgsql driver this is not a problem. For the node_pg driver it is. Since I gave the orderby parameter a default value, some confusion arises over which function to run. As far as I can tell, the node_pg driver cares not about the types of the function arguments, only about their number. So if you try to call the find-by-id function above, this second function fires instead.

Again: Npgsql (the .NET driver) has no such problem. So if you run into trouble, just rename one of the functions, or drop the default value for the parameter.
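For example, one escape hatch is to give the criteria version its own name, so that no driver can confuse the two. A sketch (find_document_by_criteria is my name here, not part of the original API):

create function find_document_by_criteria(
  tbl varchar,
  criteria varchar,
  orderby varchar default 'id'
) returns jsonb as $$
  // same body as the overload above, under an unambiguous name
  var results = plv8.execute(
    "select body from " + tbl +
    " where body @> $1 order by body ->> '" + orderby + "' limit 1;",
    [criteria]);
  return results[0] ? results[0].body : null;
$$ language plv8;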

Another thing worth pointing out is that I declared the criteria parameter as varchar. That is not technically correct, but it makes the API a little nicer. If I declared it as jsonb, the query would have to be written like this:

 select * from find_document('customers','{"last": "Conery"}'::jsonb, 'first'); 

The difference is small, and we will mostly be driving this API from code anyway (which is the subject of the next post).

Filtering


Now let's do the same thing, but return multiple documents:

create function filter_documents(
  tbl varchar,
  criteria varchar,
  orderby varchar default 'id'
) returns setof jsonb as $$
  var valid = JSON.parse(criteria); // this will throw if the criteria is invalid JSON
  var results = plv8.execute(
    "select body from " + tbl +
    " where body @> $1 order by body ->> '" + orderby + "';",
    [criteria]);
  var out = [];
  for(var i = 0; i < results.length; i++){
    out.push(results[i].body);
  }
  return out;
$$ language plv8;

select * from filter_documents('customers', '{"last": "Conery"}');
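As a usage example (the city field is hypothetical, just to show that any key works), you can filter on one key and sort by another:

select * from filter_documents('customers', '{"city": "Portland"}', 'last');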

This is more interesting. The return type here is setof jsonb, which means I have to hand back a set of jsonb rows. It was not obvious how to do that with PLV8, and there may well be a better way than mine, but this is what I got working.

Once I have the result (rows from our document table), I loop over it and push the body of each row into an array, which I return at the end.

All of this works because the body field is jsonb, which comes back to PLV8 as plain text. It is not a JavaScript object; if it were, I would get an error (the old [object Object] parsing silliness).
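If you ever do need the document as a real JavaScript object inside PLV8 (to post-process it in JS before returning, say), you can parse and re-stringify it yourself. A minimal sketch, assuming the same table layout (touch_document is a hypothetical helper, not part of the original API):

-- hypothetical helper, not part of the original API
create function touch_document(tbl varchar, id int) returns jsonb as $$
  var rows = plv8.execute("select body from " + tbl + " where id=$1;", [id]);
  if(!rows[0]) return null;
  var doc = JSON.parse(rows[0].body);       // body arrives as text, so parse it
  doc.viewed_at = new Date().toISOString(); // now it is a plain JS object
  return JSON.stringify(doc);               // hand it back as text, avoiding [object Object]
$$ language plv8;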

SQL injection


Many of you will have noticed that the orderby parameter is concatenated straight into the SQL string. If you let your users write SQL against your database, then yes, this is a problem. But hopefully you will be calling this function from a driver that parameterizes your queries for you, so that something like this:

 db.filter("customers", { last : "Conery", orderBy : "a';DROP TABLE test; SELECT * FROM users WHERE 't' = 't" }, function(err, res){ console.log(err); console.log(res); }); 

… will not work. Why not? Because ideally you do something like this:

 select * from filter_documents($1, $2, $3); 

If not, you get what you deserve :).
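If you want extra protection, you can also validate orderby inside the function itself. A defensive sketch (my addition, not from the original article; filter_documents_safe is a hypothetical name):

-- hypothetical variant of filter_documents with an orderby whitelist
create function filter_documents_safe(
  tbl varchar,
  criteria varchar,
  orderby varchar default 'id'
) returns setof jsonb as $$
  // reject anything that isn't a simple key name before concatenating it
  if(!/^[a-zA-Z0-9_]+$/.test(orderby)){
    plv8.elog(ERROR, "invalid orderby: " + orderby);
  }
  var results = plv8.execute(
    "select body from " + tbl +
    " where body @> $1 order by body ->> '" + orderby + "';",
    [criteria]);
  var out = [];
  for(var i = 0; i < results.length; i++){
    out.push(results[i].body);
  }
  return out;
$$ language plv8;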

Full text search


Let's round this out with proper full-text search over our documents. This is my favorite part:

create function search_documents(tbl varchar, query varchar) returns setof jsonb as $$
  var sql = "select body, ts_rank_cd(search, to_tsquery($1)) as rank from " + tbl +
            " where search @@ to_tsquery($1)" +
            " order by rank desc;";
  var results = plv8.execute(sql, [query]);
  var out = [];
  for(var i = 0; i < results.length; i++){
    out.push(results[i].body);
  }
  return out;
$$ language plv8;

select * from search_documents('customers', 'jolene');

This is all pretty simple if you know how full-text indexing works in Postgres. Here we simply query the search field (GIN-indexed for speed), which we update on every save. The query is lightning fast and very easy to use.
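The index in question is an ordinary GIN index on the tsvector column. A minimal sketch of what it might look like (the index name is my choice; the original series sets this up in an earlier part):

create index idx_customers_search on customers using GIN(search);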

Making indexes more flexible


In the two functions that take criteria (find and filter), I use the containment operator: the little @> symbol.

This operator is specific to jsonb and lets us lean on the GIN index on the body field. That index looks like this:

 create index idx_customers on customers using GIN(body jsonb_path_ops); 

jsonb_path_ops is what gives this index its special flavor. It tells the indexer to optimize for jsonb containment operations (essentially: is this piece of jsonb contained within another piece of jsonb). That makes the index both faster and smaller.
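To make that concrete: jsonb_path_ops accelerates only containment (@>), which is exactly what our criteria queries use. If you also needed the key-existence operators (?, ?| and ?&) you would index with the default opclass instead, at the cost of a larger index. A sketch (the second index name is mine):

-- the kind of query the jsonb_path_ops index accelerates
select body from customers where body @> '{"last": "Conery"}';

-- default opclass: larger index, but also supports ? / ?| / ?& alongside @>
create index idx_customers_body on customers using GIN(body);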

Now, I could cite a pile of sources and articles on how PostgreSQL beats MongoDB and friends on reads and writes. But that would be misleading.

Read and write speed


If you pit a single PostgreSQL server against a single MongoDB server, MongoDB will come off looking rather bad: Postgres will smoke it on almost every metric. That is because Postgres was designed that way: as a "scale-up" database.

If you tune MongoDB and add servers to spread the load, the numbers get much closer, but then you are running a horizontally distributed system that can behave in ways you do not expect. All of this is very debatable, of course, so treat single-server benchmarks with caution.


In the next article I will dive into ways to call this stuff from code!

Source: https://habr.com/ru/post/272545/

