When to use unstructured data types in PostgreSQL? Comparison Hstore vs. JSON vs. Jsonb

Since PostgreSQL began to support NoSQL (through HStore, JSON and JSONB), the question of when to use PostgreSQL in relational mode, and in which in NoSQL mode, has been raised quite often. Will you be able to completely abandon traditional table structures and work with document views in the future? Do you mix both approaches? The answer to this question is not surprising - it all depends on many factors. Every new data storage model including Hstore, JSON and JSONB has its ideal applications. Here we dig deeper and find out about the features of each of them and see when what to use.

HStore

Excluding XML , Hstore was the first truly unstructured data type added to PostgreSQL. Hstore was added quite a long time ago in Postgres 8.3 before upsert , before streaming replication , and before window functions . Hstore is essentially a key / value store directly in PostgreSQL. Using Hstore you are limited in the choice of data type used. Essentially, you only have strings. You don't even have data nesting; In short, it’s a single-level key / value data type.

The advantage of Hstore is that you do not need to define keys (as opposed to columns) in advance. You can simply insert a record, and it will store all the necessary data. Let's say you have an example of a table creation script:

CREATE TABLE products ( id serial PRIMARY KEY, name varchar, attributes hstore );

With Hstore, you can insert everything you want into the attribute column. In this case, the request to add these keys and values will look like this:

 INSERT INTO products (name, attributes) VALUES ( 'Geek Love: A Novel', 'author => "Katherine Dunn", pages => 368, category => fiction' );

The sample request will be:

 SELECT name, attributes->'author' as author FROM products WHERE attributes->'category' = 'fiction'

The obvious advantage of this approach is flexibility, but where it truly manifests itself completely is the ability to use different types of indexes. In particular, the GIN or GiST index will index each key and value within the Hstore. That is, the added index will be used for filtering if the PostgreSQL scheduler requires it.

Since Hstore is not the full equivalent of a document, it is important to understand whether it is beneficial to use it as such.
If you have relational data and also some data that may not always exist in a column, then this approach can be a great solution. For example, in most cases, product catalog attributes can be a great example for this type of data. Then, for some products, such as books (which you store in a separate “Products” table), parameters such as genre, year of publication may be defined. In another case, for products such as clothes that you also store in the same table, other parameters can already be defined - size and color. Adding a column to the product table for each possible parameter is redundant and unreasonable.

Json

Starting with version 9.2, Postgres has JSON support. Now, PostgreSQL can compete with MongoDB. (Although the JSON functionality in PostgreSQL 9.2 is, of course, a bit exaggerated . More on this below.)

The data type is in JSON format in Postgres, if you look into pretty much just a text field. All you get with the JSON data type is validation of the value when inserting. Postgres enforces the JSON format. One small potential advantage over JSONB (which we will look at next) is that JSON maintains padding in the data coming into the database. So if you are very demanding on the formatting of your data or you need to save the record in one or another structure, JSON can be useful.

In addition, over time, Postgres has acquired some pretty useful features . Should you use JSON? Indeed, the PostgreSQL type JSON simply provides a check on the text field. If you store some form of log data that you rarely request, the JSON data type works well in this case. Since JSON is fairly simple, it will have much higher bandwidth when inserted. For something more complicated, I would recommend using JSONB, which will be described below.

Jsonb

Finally, in Postgres 9.4 we got real and proper JSON support in the form of JSONB. B means “better” (Better). JSONB is a binary representation of data in JSON format. This means that the data is compressed and more efficient for storage than plain text. In addition, under the hood, he has a mechanism similar to Hstore. Technically, sometime during development, there was an almost implemented type of Hstore2 and a separate type of JSON, and they were subsequently merged into JSONB as it is now.

The JSONB type is pretty much what you would expect from a JSON data type. It allows you to implement nested structures, use basic data types, and also has a number of built-in functions for working with it. The best part of this similarity with Hstore is indexing. Creating a GIN index on a JSONB column will create an index for each key and value within this JSON document. The ability to index and nest data inside a document means that JSONB is superior to Hstore in most cases.

Although there is still a small question about when to use JSONB exclusively. Suppose you create a database of documents and select Postgres from all the options. With a package like MassiveJS, this can be quite convenient.

The most common examples of use are:

Track data events by adding variable payload events.
Storage of game data is fairly common, especially where you have a single player game and a variable data scheme based on the user's state.
Tools that combine multiple data sources, an example here can be a tool that integrates multiple customer databases to Salesforce, to Zendesk, or to something else. The combination of schemes makes it a more painful procedure than it should be.

Let's look at another example of working with JSONB. The script creates a table and inserts some data for an example:

 CREATE TABLE integrations (id UUID, data JSONB); INSERT INTO integrations VALUES ( uuid_generate_v4(), '{ "service": "salesforce", "id": "AC347D212341XR", "email": "craig@citusdata.com", "occurred_at": "8/14/16 11:00:00", "added": { "lead_score": 50 }, "updated": { "updated_at": "8/14/16 11:00:00" } }'); INSERT INTO integrations ( uuid_generate_v4 (), '{ "service": "zendesk", "email": "craig@citusdata.com", "occurred_at": "8/14/16 10:50:00", "ticket_opened": { "ticket_id": 1234, "ticket_priority": "high" } }');

In the above case, you can easily find all the events that occurred c user email craig@citusdata.com, and then do some actions. For example, you can conduct some form of behavioral analytics, and calculate the users who did foo and then bar, or make a simple report.
After adding the Gin index all the data within my JSONB fields are automatically indexed:

 CREATE INDEX idx_integrations_data ON integrations USING gin(data);

Conclusion

In most cases, JSONB is probably what you are looking for when you plan to use the non-relational data type. For Hstore and JSON, you can also find good use, even in rarer cases. JSONB does not always fit into the data model. If you can normalize the schema, then you will have an advantage, but if the schema has a large number of optional columns (for example, with event data) or one schema is very different from the other, then JSONB is much better suited.

In summary, the decision algorithm:

JSONB - In most cases
JSON - If you process logs, you do not often have to request data or do not need to use them as anything more than logging tasks.
Hstore - works great with textual data based on a key-value representation, but in general, JSONB also does an excellent job with this task.

Source: https://habr.com/ru/post/306602/

All Articles

When to use unstructured data types in PostgreSQL? Comparison Hstore vs. JSON vs. Jsonb

HStore

Json

Jsonb

Conclusion

More articles: