📜 ⬆️ ⬇️

JSONB requests in PostgreSQL

Earlier, I wrote how to enable jsonb support in postgres / psycopg2 . Today I experimented with how to request data in JSON type columns.
There is documentation on this subject, but it was not quite clear to me how the various operations work:

CREATE TABLE json_test ( id serial primary key, data jsonb ); INSERT INTO json_test (data) VALUES ('{}'), ('{"a": 1}'), ('{"a": 2, "b": ["c", "d"]}'), ('{"a": 1, "b": {"c": "d", "e": true}}'), ('{"b": 2}'); 


The request has worked, let's display all the data to check:

 SELECT * FROM json_test; id | data ----+-------------------------------------- 1 | {} 2 | {"a": 1} 3 | {"a": 2, "b": ["c", "d"]} 4 | {"a": 1, "b": {"c": "d", "e": true}} 5 | {"b": 2} (5 rows) 

Now let's filter the results. There are several operators that we can use, and we will see later why jsonb was chosen as the type.
')
Equality
In jsonb, we can verify that two JSON objects are identical:

 SELECT * FROM json_test WHERE data = '{"a":1}'; id | data ----+------ 1 | {"a": 1} (1 row) 


Restrictions
We can also get a json object containing another, i.e. "Being a subset":

 SELECT * FROM json_test WHERE data @> '{"a":1}'; 

Says: - Give us all the objects starting with key a and value 1:

  id | data ----+-------------------------------------- 2 | {"a": 1} 4 | {"a": 1, "b": {"c": "d", "e": true}} (2 rows) 

Restrictions in both directions:
In this case, the query will display an empty object and an exact match for the second:

 SELECT * FROM json_test WHERE data <@ '{"a":1}'; id | data ----+---------- 1 | {} 2 | {"a": 1} (2 rows) 


Existence key / element
The last batch of statements will check for the existence of a key (or a string element in the array).

  id | data ----+-------------------------------------- 2 | {"a": 1} 3 | {"a": 2, "b": ["c", "d"]} 4 | {"a": 1, "b": {"c": "d", "e": true}} (3 rows) 

Get the objects that have any keys from the list:

 SELECT * FROM json_test WHERE data ?| array['a', 'b']; id | data ----+-------------------------------------- 2 | {"a": 1} 3 | {"a": 2, "b": ["c", "d"]} 4 | {"a": 1, "b": {"c": "d", "e": true}} 5 | {"b": 2} (4 rows) 

And all the values ​​of objects having an exact match of the keys from the list:

 SELECT * FROM json_test WHERE data ?& array['a', 'b']; id | data ----+-------------------------------------- 3 | {"a": 2, "b": ["c", "d"]} 4 | {"a": 1, "b": {"c": "d", "e": true}} (2 rows) 

Keys to bypass
You can also filter entries matching key-> path. In simple cases, the use of restriction operators may be simpler, but not complex, they can not do. We can use these operations in SELECT, but it is still more interesting to apply them in the WHERE clause.

 SELECT * FROM json_test WHERE data ->> 'a' > '1'; 

We get all the records of the values ​​of the associative element with the key a equal to 1.
Note the need to use a text value, not a number:

  id | data ----+--------------------------- 3 | {"a": 2, "b": ["c", "d"]} (1 row) 

We can make a comparison between the primitives of objects and arrays:

 SELECT * FROM json_test WHERE data -> 'b' > '1'; id | data ----+-------------------------------------- 3 | {"a": 2, "b": ["c", "d"]} 4 | {"a": 1, "b": {"c": "d", "e": true}} 5 | {"b": 2} (3 rows) 

It turns out that arrays and objects are larger than numbers.
We can also look at a deeper path:

 SELECT * FROM json_test WHERE data #> '{b,c}' = '"d"'; 

We obtain an object where the element b has a child object c, and c is equal to the string “d”:

 id | data ----+-------------------------------------- 4 | {"a": 1, "b": {"c": "d", "e": true}} 

There are also versions of these operators that return text, not a JSON object. In the case of the last request, this means that we do not need to compare with the JSON object (in the version where we really want to get the string):

 SELECT * FROM json_test WHERE data #>> '{b,c}' = 'd'; id | data ----+-------------------------------------- 4 | {"a": 1, "b": {"c": "d", "e": true}} (1 row) 

So, up to this point, everything is fine. We can work with different data, and the same data can be used in jsonb indexes too. However, a more attentive reader may have noticed that we are dealing with JSON data, which have an object path from the root. It doesn’t have to be this way: arrays are also valid JSON, any of the valid examples are really like this:

 SELECT 'null'::json, 'true'::json, 'false'::json, '2'::json, '1.0001'::json, '"abc"'::json, '1E7'::jsonb; 

Note the last entry, which is a jsonb type and is converted to canonical form:

  json | json | json | json | json | json | jsonb ------+------+-------+------+---------+-------+---------- null | true | false | 2 | 1.00001 | "abc" | 10000000 (1 row) 

JSON is also null different from SQL NULL.
So, what happens when we store objects of mixed “type” in a JSON column?

 INSERT INTO json_test (data) VALUES ('[]'), ('[1,2,"a"]'), ('null'), ('1E7'), ('"abc"'); SELECT * FROM json_test; id | data ----+-------------------------------------- 1 | {} 2 | {"a": 1} 3 | {"a": 2, "b": ["c", "d"]} 4 | {"a": 1, "b": {"c": "d", "e": true}} 5 | {"b": 2} 6 | [] 7 | [1, 2, "a"] 8 | null 9 | 10000000 10 | "abc" (10 rows) 

The whole structure was derived without problems. Let's see if we can work with these objects and queries?
The equality check works great:

 SELECT * FROM json_test WHERE data = '{"a":1}'; SELECT * FROM json_test WHERE data = 'null'; 

Constraints also work as expected:

 SELECT * FROM json_test WHERE data @> '{"a":1}'; SELECT * FROM json_test WHERE data <@ '{"a":1}'; 

Keys and existing items also work. Not surprisingly, one request will match the elements in the array, as well as the keys in the object:

 SELECT * FROM json_test WHERE data ? 'a'; id | data ----+-------------------------------------- 2 | {"a": 1} 3 | {"a": 2, "b": ["c", "d"]} 4 | {"a": 1, "b": {"c": "d", "e": true}} 7 | [1, 2, "a"] (4 rows) 


 SELECT * FROM json_test WHERE data ?| array['a', 'b']; id | data ----+-------------------------------------- 2 | {"a": 1} 3 | {"a": 2, "b": ["c", "d"]} 4 | {"a": 1, "b": {"c": "d", "e": true}} 5 | {"b": 2} 7 | [1, 2, "a"] (5 rows) 


 SELECT * FROM json_test WHERE data ?& array['a', 'b']; id | data ----+-------------------------------------- 3 | {"a": 2, "b": ["c", "d"]} 4 | {"a": 1, "b": {"c": "d", "e": true}} (2 rows) 

But as soon as we start making keys or 'get' elements, we get problems;

(Apparently, the author of the article at the time of writing was installed PotgreSQL 9.4 betta version, so some of the requests were filled with errors, checked on 9.4.1 all requests are being processed):

 SELECT * FROM json_test WHERE data ->> 'a' > '1'; ERROR: cannot call jsonb_object_field_text (jsonb ->> text operator) on an array 

You can still use key-path traversal if you have non-scalar values:

 SELECT * FROM json_test WHERE data #> '{b,c}' = '"d"'; ERROR: cannot call extract path from a scalar SELECT * FROM json_test WHERE data #> '{b,c}' = '"d"' AND id < 8; id | data ----+-------------------------------------- 4 | {"a": 1, "b": {"c": "d", "e": true}} (1 row) 

Pay attention to the syntax for key path, for strings (must be json keys) or integer (in array indices).
This imposes very strict restrictions. I do not know how such things work in MondgoDB.

But in perspective, if you store data in arrays and json objects in one column, then there may be some problems in the future. But all is not lost. You can get strings based on basic objects:

 SELECT * FROM json_test WHERE data @> '{}'; id | data ----+-------------------------------------- 1 | {} 2 | {"a": 1} 3 | {"a": 2, "b": ["c", "d"]} 4 | {"a": 1, "b": {"c": "d", "e": true}} 5 | {"b": 2} (5 rows) 

Then you can combine this query with the query above:

 SELECT * FROM json_test WHERE data @> '{}' AND data ->> 'a' > '1'; id | data ----+--------------------------- 3 | {"a": 2, "b": ["c", "d"]} (1 row) 

Indeed, in Postgres, you don’t even need to be sure that data @> '{} comes first.
But what if we only need the array data types? It turns out you can use the same trick:

 SELECT * FROM json_test WHERE data @> '[]'; id | data ----+------------- 6 | [] 7 | [1, 2, "a"] (2 rows) 

And it can still be combined with other operators:

 SELECT * FROM json_test WHERE data @> '[]' AND data ->> 1 = '2'; id | data ----+------------- 7 | [1, 2, "a"] (1 row) 

Well, the @> operator entry is only available for jsonb columns, so you cannot query mixed data for regular json columns.

What's next?

Considering jsonb in Postgres was a third-party project, now I'm working on json (b) queries in ORM django. With Django 1.7 in the search functions you can write something like:

 # Exact MyModel.objects.filter(data={'a': 1}) MyModel.objects.exclude(data={}) # Key/element existence MyModel.objects.filter(data__has='a') MyModel.objects.filter(data__has_any=['a', 'b']) MyModel.objects.filter(data__has_all=['a', 'b']) # Sub/superset of key/value pair testing MyModel.objects.filter(data__contains={'a': 1}) MyModel.objects.filter(data__in={'a': 1, 'b': 2}) # Get element/field (compare with json) MyModel.objects.filter(data__get=(2, {'a': 1})) # Get element/field (compare with scalar, including gt/lt comparisons) MyModel.objects.filter(data__get=(2, 'a')) MyModel.objects.filter(data__get__gt=('a', 1)) # key path traversal, compare with json or scalar. MyModel.objects.filter(data__get=('{a,2}', {'foo': 'bar'})) MyModel.objects.filter(data__get=('{a,2}', 2)) MyModel.objects.filter(data__get__lte=('{a,2}', 2)) 

But I'm not sure that the names from the last set will work. The name “get” seems a bit universal, and maybe we could use different names for the input type substitution, although only integer and string are valid.

Source: https://habr.com/ru/post/254425/


All Articles