
Designing a Document-Oriented API in PostgreSQL: Complex Queries (Part 4)

Storing documents in Postgres is a bit easier now that we have solid save routines, the ability to run full-text search, and some simple search and filter routines.

That is only half the story, of course. Rudimentary lookups can serve an application's needs, but they will never hold up in the long run, when we need to ask deeper questions.

Source document


Document storage is a very big topic. How to store a document (and what to store) divides, for me, into three areas:


I am very much in the last camp. I am an information hoarder, and when something happens I want to know what, why, and where, in as much detail as possible.
That is what I used to do to record information about people buying something from Tekpub. This is the document format I was going to put into production, but never got to it (because of the sale to Pluralsight).

{ "id": 1, "items": [ { "sku": "ALBUM-108", "grams": "0", "price": 1317, "taxes": [], "vendor": "Iron Maiden", "taxable": true, "quantity": 1, "discounts": [], "gift_card": false, "fulfillment": "download", "requires_shipping": false } ], "notes": [], "source": "Web", "status": "complete", "payment": { //... }, "customer": { //... }, "referral": { //... }, "discounts": [], "started_at": "2015-02-18T03:07:33.037Z", "completed_at": "2015-02-18T03:07:33.037Z", "billing_address": { //... }, "shipping_address": { //... }, "processor_response": { //... } } 

This is a great document. I love big documents! This document is the exact capture of everything that happens during the checkout process:


I want this document to be an autonomous, self-contained object that does not need any other documents to be complete. In other words, I would like to be able to:


This document is complete on its own and that’s great!
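For context, here is a minimal sketch of the storage side this series assumes: a sales table with a serial id and the whole checkout document in a jsonb body column. The exact schema from the earlier parts may differ slightly; the column names here are illustrative.

 create table if not exists sales (
   id serial primary key,
   body jsonb not null,
   created_at timestamptz not null default now()
 );

 -- saving a checkout document is then a single insert (trimmed-down document shown)
 insert into sales (body)
 values ('{"id": 1, "status": "complete", "completed_at": "2015-02-18T03:07:33.037Z",
           "items": [{"sku": "ALBUM-108", "vendor": "Iron Maiden", "price": 1317, "quantity": 1}]}');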

OK, enough, let's write some reports.

Shaping the data: the fact table


When it comes to analytics, it is important to remember two things:


Running huge queries over joined tables takes forever and ultimately gets you nowhere. You should build reports on historical data that does not change (or changes very little) over time. Denormalization helps with speed, and speed is your friend when building reports.

Given that, we will use the goodness of PostgreSQL to shape our data into a sales fact table. A fact table is simply a denormalized set of data that represents an event in your system: the smallest amount of digestible information about a fact.

For us, that fact is a sale, and we want the event to look like this:

[image: sample rows from the sales fact table]

I am using the Chinook sample database with some random sales data generated with Faker.

Each of these records is a single event that I want to aggregate, and all of the dimension information I want to group it by (time, vendor) is already included. I could add more (category, etc.), but this will do for now.

This data is tabular, which means we need to extract it from the document shown above. Not an easy task, but much easier because we are using PostgreSQL:

 with items as (
   select
     body -> 'id' as invoice_id,
     (body ->> 'completed_at')::timestamptz as date,
     jsonb_array_elements(body -> 'items') as sale_items
   from sales
 ), fact as (
   select
     invoice_id,
     date_part('quarter', date) as quarter,
     date_part('year', date) as year,
     date_part('month', date) as month,
     date_part('day', date) as day,
     x.*
   from items, jsonb_to_record(sale_items) as x(
     sku varchar(50),
     vendor varchar(255),
     price int,
     quantity int
   )
 )
 select * from fact;

This is a set of common table expressions (CTEs) chained together in a functional way (more on that below). If you have never used CTEs they can look a bit odd ... until you look closer and realize that you are simply chaining things together and giving them names.
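If CTEs are new to you, here is a tiny, self-contained sketch (not from the query above) showing the idea: each step gets a name and feeds the next one.

 with raw as (
   select generate_series(1, 3) as n   -- produces the rows 1, 2, 3
 ), doubled as (
   select n, n * 2 as doubled_n from raw   -- transforms the previous step
 )
 select * from doubled;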

In the first CTE above ("items") I pull out the sale's id and call it invoice_id, then pull the timestamp and cast it to timestamptz; simple stuff, essentially.
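As a quick aside, the arrow operators differ in what they return: -> yields jsonb while ->> yields text, which is why the cast goes on the ->> form. A minimal illustration against the same sales table:

 select
   body -> 'id' as invoice_id,                             -- jsonb value
   body ->> 'completed_at' as completed_raw,               -- text value
   (body ->> 'completed_at')::timestamptz as completed_at  -- text cast to a real timestamp
 from sales
 limit 1;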

Where it gets more interesting is jsonb_array_elements, which pulls an array of objects out of the document and creates a row for each of them. So if we had a single document in the database with three items and ran the following query:

 select
   body -> 'id' as invoice_id,
   (body ->> 'completed_at')::timestamptz as date,
   jsonb_array_elements(body -> 'items') as sale_items
 from sales

Instead of one record representing the sale, we would get 3:

[image: three result rows, one per item in the array]
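You can also try jsonb_array_elements on its own with a literal array; a three-element array produces three rows (illustrative example, not from the article):

 select jsonb_array_elements(
   '[{"sku": "A"}, {"sku": "B"}, {"sku": "C"}]'::jsonb
 ) as item;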

Now that we have the item objects pulled out, we need to break them into separate columns. This is where the next trick comes in: jsonb_to_record. We can use this function directly, declaring the column types on the fly:

 select * from jsonb_to_record(
   '{"name" : "Rob", "occupation": "Hazard"}'
 ) as (
   name varchar(50),
   occupation varchar(255)
 )

In this simple example I am converting jsonb into a table; all I have to do is tell PostgreSQL how. That is exactly what we are doing in the second CTE ("fact") above. We also use date_part to break the date into parts.
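For reference, date_part is what produces the quarter/year/month/day columns in the fact CTE; here is a standalone sketch using the timestamp from the sample document:

 select
   date_part('quarter', d) as quarter,
   date_part('year', d) as year,
   date_part('month', d) as month,
   date_part('day', d) as day
 from (select '2015-02-18T03:07:33.037Z'::timestamptz as d) as t;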

This gives us a fact table that we can save as a view, if we want:

 create view sales_fact as -- the query above 

You might think this query would be terribly slow. In fact, it is quite fast. This is not a benchmark or anything like that, just a relative result to show you that this query is, in fact, quick. I have 1000 test documents in the database, and running this query over all of them returns in about a tenth of a second:

[image: query timing output]
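If you want to check the timing on your own data, something like explain analyze over the view shows the plan along with the actual execution time (results will obviously vary with hardware and data volume):

 explain analyze select * from sales_fact;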

PostgreSQL. Cool stuff.

Now we are ready for some rollups!

Sales report


From here everything gets easier. You simply roll up the data you want, and if you find you have forgotten something, you just add it to your view; you never have to worry about table joins. It is just a data transformation, and it happens to run fast.

Let's see the top five sellers:

 select
   sku,
   sum(quantity) as sales_count,
   sum((price * quantity) / 100)::money as sales_total
 from sales_fact
 group by sku
 order by sales_count desc
 limit 5

This query returns data in 0.12 seconds. Fast enough for 1000 entries.
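The same pattern works for any other rollup. For example, a hypothetical report of totals by year and quarter comes straight off the view, again with no joins:

 select
   year,
   quarter,
   sum(quantity) as sales_count,
   sum((price * quantity) / 100)::money as sales_total
 from sales_fact
 group by year, quarter
 order by year, quarter;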

CTEs and functional queries


One of the things I really like about RethinkDB is its query language, ReQL. It is inspired by Haskell (according to the team) and it is all about composition (which especially appeals to me):

To understand ReQL, it helps to understand functional programming. Functional programming falls into the declarative paradigm, in which the programmer aims to describe the value they want to compute rather than the steps needed to compute that value. Database query languages generally strive toward this declarative ideal, since it gives the query processor the most freedom to choose an optimal execution plan. But while SQL achieves this with special keywords and a specific declarative syntax, ReQL can express arbitrarily complex operations through functional composition.


As you can see above, we can approximate this with CTEs chained together, each of which transforms the data in a specific way.

Conclusion


There is a lot more I could write, but let's just sum it all up by saying that you can do everything other document-oriented systems can do, and then some. Postgres's query capabilities are very broad; there is a very short list of things you cannot do and, as you have seen, being able to transform your documents into a table structure helps a great deal.

And that is the end of this little series of articles.

Source: https://habr.com/ru/post/272675/

