
PostgreSQL Document-Oriented API Design: Full-Text Search and Saving Many Documents (Part 2)

In the first part of this series, I created a solid save function, as well as another function that creates document-oriented tables on the fly. They work properly and do exactly what they need to, but there's still plenty we can do. Specifically: I want full-text search that's indexed on the fly, and the ability to save many documents within a single transaction.

Let's do that.

Full-text search


Our document-oriented table has a tsvector search field, indexed with a GIN index for speed. I want to update that field every time I save a document, and I don't want a lot of API noise when I do it.
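For context, here's roughly the shape of a document table from Part 1 (a sketch; the exact columns and defaults come from that post, and the table name is illustrative):

  -- roughly what create_document_table produces (a sketch)
  create table customer_docs(
    id serial primary key,
    body jsonb not null,
    search tsvector,
    created_at timestamptz default now(),
    updated_at timestamptz default now()
  );

  -- a GIN index on the search field for fast full-text lookups
  create index idx_customer_docs_search on customer_docs using GIN(search);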

For this, I'll rely on a bit of convention.

Usually, when building a full-text index, the indexed fields have fairly predictable names, such as first_name, last_name, email, description, and title.

At save time, I want to check the document for any keys I'd like to index, and then store their values in the search field. This can be done with a function I called update_search:
create function update_search(tbl varchar, id int) returns boolean as $$
  // get the record
  var found = plv8.execute("select body from " + tbl + " where id=$1", id)[0];
  if(found){
    var doc = JSON.parse(found.body);
    var searchFields = ["name","email","first","first_name",
                        "last","last_name","description","title",
                        "street","city","state","zip"];
    var searchVals = [];
    for(var key in doc){
      if(searchFields.indexOf(key.toLowerCase()) > -1){
        searchVals.push(doc[key]);
      }
    }
    if(searchVals.length > 0){
      var updateSql = "update " + tbl + " set search = to_tsvector($1) where id=$2";
      plv8.execute(updateSql, searchVals.join(" "), id);
    }
    return true;
  }else{
    return false;
  }
$$ language plv8;

I'm using JavaScript (PLV8) again here, pulling the document out by its ID. Then I loop over its keys, checking whether any of them are fields I want to index, and if so, push their values into an array.

If that array has any hits, I concatenate the values and save them to the document's search field using to_tsvector, a built-in Postgres function that takes plain text and turns it into indexable lexemes.
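If you haven't seen to_tsvector before, here's a minimal illustration of what it produces (the output below assumes the default 'english' text search configuration; yours may differ):

  -- turn plain text into indexable lexemes with positions
  select to_tsvector('Larry Smith Chicago');
  -- 'chicago':3 'larri':1 'smith':2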

There we go! Running the update_search script gives us the following:

[image]

Better still, I can now call it at the end of my save_document function, and it will run transactionally every time I save something:

create or replace function save_document(tbl varchar, doc_string jsonb) returns jsonb as $$
  var doc = JSON.parse(doc_string);
  var result = null;
  var id = doc.id;
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", tbl)[0];

  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }

  if(id){
    result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", doc_string, id);
  }else{
    result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", doc_string);
    id = result[0].id;
    doc.id = id;
    result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *;", JSON.stringify(doc), id);
  }

  // run the search indexer
  plv8.execute("select update_search($1, $2)", tbl, id);
  return result[0] ? result[0].body : null;
$$ language plv8;
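With that wired in, a save immediately makes the document searchable. A quick usage sketch (the table name and data here are illustrative):

  -- save a document; the table is created on the fly if needed
  select save_document('customer_docs', '{"name" : "Larry", "city" : "Chicago"}');

  -- the search field was populated in the same transaction
  select id, body from customer_docs
  where search @@ to_tsquery('chicago');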

Saving many documents


At the moment I can pass a single document to save_document, but I'd like to be able to pass it an array as well. I can do that by checking the type of the argument and looping when it's an array:

create or replace function save_document(tbl varchar, doc_string jsonb) returns jsonb as $$
  var doc = JSON.parse(doc_string);
  var exists = plv8.execute("select table_name from information_schema.tables where table_name = $1", tbl)[0];

  if(!exists){
    plv8.execute("select create_document_table('" + tbl + "');");
  }

  // function that executes our SQL statement
  var executeSql = function(theDoc){
    var result = null;
    var id = theDoc.id;
    var toSave = JSON.stringify(theDoc);

    if(id){
      result = plv8.execute("update " + tbl + " set body=$1, updated_at = now() where id=$2 returning *;", toSave, id);
    }else{
      result = plv8.execute("insert into " + tbl + "(body) values($1) returning *;", toSave);
      id = result[0].id;
      // put the id back on the document
      theDoc.id = id;
      // resave it
      result = plv8.execute("update " + tbl + " set body=$1 where id=$2 returning *;", JSON.stringify(theDoc), id);
    }
    plv8.execute("select update_search($1,$2)", tbl, id);
    return result ? result[0].body : null;
  };

  var out = null;

  // was an array passed in?
  if(doc instanceof Array){
    for(var i = 0; i < doc.length; i++){
      executeSql(doc[i]);
    }
    // just report back how many documents were saved
    out = JSON.stringify({count : i, success : true});
  }else{
    out = executeSql(doc);
  }
  return out;
$$ language plv8;

The nice thing about working in JavaScript here is that the logic for this kind of routine is straightforward (as opposed to PLPGSQL). I pulled the actual save out into its own inner function (this is JavaScript, after all) so I can avoid duplicating code.

Then I check whether the input argument is an array. If it is, I loop over its members, calling executeSql for each one, and report back a count of what was saved.

If it isn't an array, I do exactly what was done before and return the entire document. The result:

[image]
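For reference, a call like the following is what produces that result (illustrative):

  -- passing a json array saves each element and reports back a count
  select save_document('customer_docs',
    '[{"name" : "Larry"}, {"name" : "Susie"}]');
  -- {"count": 2, "success": true}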

Perfect! The best part is that it all happens inside a transaction. I like it!

Node weirdness


If only it worked as nicely from Node! I've tried this from both .NET and Node; with .NET everything just works (oddly enough) using the Npgsql library. From Node, not so much.

In short: the node_pg driver does a very strange conversion when it sees an array as an input parameter. Consider the following:

var pg = require("pg");
var assert = require("assert");

// connection settings (placeholder; adjust for your environment)
var args = { connectionString : "postgres://localhost/your_db" };

var run = function (sql, params, next) {
  pg.connect(args.connectionString, function (err, db, done) {
    // throw if there's a connection error
    assert.ok(err === null, err);

    db.query(sql, params, function (err, result) {
      // we have the results, release the connection
      done();
      pg.end();
      if(err){
        next(err, null);
      }else{
        next(null, result.rows);
      }
    });
  });
};

run("select * from save_document($1, $2)",
    ['customer_docs', {name : "Larry"}],
    function (err, res) {
      // works just fine
});

This is typical Node/PG code. At the end, the run function is set up to call my save_document function and pass in some data. When node_pg sees the plain object parameter, it serializes it to a string and the save works fine.

If you send in an array, however...

 run("select * from save_document($1, $2)", ['customer_docs', [{name : "Larry"}, {name : "Susie"}], function(err,res){ //crashes hard } 

I get back an error telling me the JSON is invalid. The error message (from Postgres) shows the badly formatted JSON it received:

 {"{name : "Larry"}, ...} 

What... yes, that's awful. I've tried to pin down exactly what's happening, but it looks like the node_pg driver mangles the outer array, possibly by running it through the Underscore library's flatten method. I don't know. To get around it, you need to change the call to the following:

 run("select * from save_document($1, $2)", ['customer_docs', JSON.stringify([{name : "Larry"}, {name : "Susie"}]), function(err,res){ //Works fine } 

Forward!


The save routine is now quite smooth, and that makes me happy. In the next article, I'll work on querying, and build out a full-text search function.

Source: https://habr.com/ru/post/272411/

