📜 ⬆️ ⬇️

Ideal catalog, an implementation option

In the continuation of the article " Ideal catalog, architecture sketch ", I will show with examples how to use the proposed database structure to store arbitrary data and perform arbitrary searches on this data. The scripts are in the repository - universal_data_catalog_idea .

I invite under the cut, those who are interested to look at these scripts with author comments.

Repository content


In the repository a full set of infrastructure scripts:

  1. creating tables
  2. filling data;
  3. cleaning tables from data;
  4. deletion of all created tables;

When I started writing scripts, their volume was small, but when I got to create test suites, the amount of code grew to 1000+ lines. It seems to me that such large scripts should be inserted into the article unnecessary, so if you want to touch the “live” data, then clone the repository and see how it turns out in real life.
')
In the database schema, I made some small changes - the editors (redactor_id) took out from the content (content), now the content itself, the editors themselves.

Highlights of data organization


The system itself does not dictate the rules of use, any logic can be applied to it.

The main purpose of the system is data retrieval, data storage is a necessary, but nonetheless secondary functionality.

You can store any data, any configuration. Catalog data consists of Entities (item) and their Values ​​(content). Each Value is a value of a specific Characteristic (property). Entities with the same set of Characteristics can be combined into Rubric.

An entity may belong to one Category, maybe to several, in the present embodiment, only to one.

Adding data (storage)


How to add data to the directory can be found in \ deploy \ commit_dml.sql. The sequence is as follows:

  1. create rubrics;
  2. if the hierarchy of Rubrics is necessary, then create a Hierarchy (element_tree) and distribute the rubrics across the hierarchy (rubric_element_tree);
  3. add Characteristics (property);
  4. if it is intended to use the Characteristics in accordance with any rules, then you can add the Options of these rules (tag) and dock the Characteristics and Options (property_tag) accordingly;
  5. assign rubrics for characteristics (rubric_property);
  6. add Entities (item);
  7. Group Entities by Rubrics (rubric_item);
  8. add content (content) for the characteristics;
  9. dock Values ​​with Entities (item_content);
  10. if it is assumed that Values ​​will have several editors, then add Editors (redactor) and assign Editors Values ​​(redactor_content);
  11. if not only string search is expected, then convert user input (content.raw) into a specific data type and write data to the appropriate table (date_matter.date_time, digital_matter.digital, duration_matter.duration, string_matter.string);

The list came out long, but in fact all the steps fit into two steps:

  1. add Entity;
  2. set values;

The remaining steps as necessary. With the theory of information storage sorted out. Now practical application.

Practical content catalog


Suppose we want to make our own Avito for the sale of excavators.
To do this, we will add the “Excavators” root heading, and to it, two subsidiaries: “Excavators for mining” and “Excavators for loaders”.
For the category "Excavators career" assign properties:
"Product Model";
"Trademark";
"Bucket capacity";
"The price of goods in rubles";
"Units of measurement for the item";
For the category "Excavators loaders" we assign a similar set of properties plus the property "Spade capacity".
Specifications:
"Product Model",
"Trademark",
"Bucket capacity",
"Shovel capacity",
are system values, the values ​​of these Characteristics are set by the content manager, we assign the Option to them - “SYSTEM_PROPERTY”.
“Bucket capacity” and “Spade capacity” are numerical data - The “DIGITAL_DATA_TYPE” option, search by them will be a search by value range - “BETWEEN_SEARCH_TYPE”.
“Product model” and “Trademark” are string data - “STRING_DATA_TYPE”, search by “Trademark” characteristic will be like in the manufacturers reference list - by enumeration - “ENUMERATION_SEARCH_TYPE”, by “Product model” we will look for the substring entry - “LIKE_SEARCH_TYPE” .

We hammer in Essences and Values. Now you can perform a general search.

General search


General search implies search everywhere, our everywhere is limited only by Rubrics and Entities, in principle, you can also search in Values, since they store user input in the format of strings. Search everywhere means searching for substrings, in which columns? Apparently only the "title" and "description".

Go!

--  "" (       ) SELECT 'RUBRIC', rr.code, rr.title, rr.description FROM rubric rr WHERE (rr.title ILIKE '%' || :SEARCH_PATTERN || '%' OR rr.description ILIKE '%' || :SEARCH_PATTERN || '%') AND EXISTS ( SELECT NULL FROM ( WITH RECURSIVE road_map ( id, element_tree_id, code, horizont ) AS ( SELECT cet.id AS id, cet.element_tree_id AS element_tree_id, r.code AS code, 0 AS horizont FROM element_tree cet LEFT JOIN rubric_element_tree ret ON cet.id = ret.element_tree_id LEFT JOIN rubric r ON ret.rubric_id = r.id WHERE r.code = rr.code UNION SELECT pet.id, pet.element_tree_id, r.code, horizont + 1 FROM element_tree pet JOIN road_map c ON (c.element_tree_id = pet.id) LEFT JOIN rubric_element_tree ret ON pet.id = ret.element_tree_id LEFT JOIN rubric r ON ret.rubric_id = r.id ) SELECT NULL FROM road_map rm WHERE rm.code = :CATALOG_ROOT ORDER BY horizont DESC LIMIT 1 ) R ) UNION SELECT 'ITEM', i.code, i.title, i.description FROM rubric rr JOIN rubric_item ri ON rr.id = ri.rubric_id JOIN item i ON ri.item_id = i.id WHERE (i.title ILIKE '%' || :SEARCH_PATTERN || '%' OR i.description ILIKE '%' || :SEARCH_PATTERN || '%') AND EXISTS ( SELECT NULL FROM ( WITH RECURSIVE road_map ( id, element_tree_id, code, horizont ) AS ( SELECT cet.id AS id, cet.element_tree_id AS element_tree_id, r.code AS code, 0 AS horizont FROM element_tree cet LEFT JOIN rubric_element_tree ret ON cet.id = ret.element_tree_id LEFT JOIN rubric r ON ret.rubric_id = r.id WHERE r.code = rr.code UNION SELECT pet.id, pet.element_tree_id, r.code, horizont + 1 FROM element_tree pet JOIN road_map c ON (c.element_tree_id = pet.id) LEFT JOIN rubric_element_tree ret ON pet.id = ret.element_tree_id LEFT JOIN rubric r ON ret.rubric_id = r.id ) SELECT NULL FROM road_map rm WHERE rm.code = :CATALOG_ROOT ORDER BY horizont DESC LIMIT 1 ) R ); 

Request parameters


Root heading


 :CATALOG_ROOT 

The root heading code (rubric.code) serves to limit the search area, we can search in “Excavators” in general, but we can search only in the quarry ones, or we can search in goods in general, but we can only in excavators.

Determining whether the next object in the ancestors has a given Rubric is made through a hierarchical query specific to each DBMS.

You can search without restrictions in the region, then if we added services to our catalog (for example, for renting excavators) and a resume (for example, an excavator driver) and looking for the substring “excavator”, then we also have the result of renting excavators and positions for an engineer excavator.

Search line


 :SEARCH_PATTERN 

The search string is actually the substring that we are looking for; you can split the search string into spaces (or any other delimiter) and search with the condition:

 WHERE (i.title ILIKE '%' || :PATTERN_PART1|| '%' OR i.description ILIKE '%' || :PATTERN_PART1 || '%') AND (i.title ILIKE '%' || :PATTERN_PART2|| '%' OR i.description ILIKE '%' || :PATTERN_PART2|| '%') --       AND (i.title ILIKE '%' || :PATTERN_PART_N|| '%' OR i.description ILIKE '%' || :PATTERN_PART_N|| '%') 

The resulting script certainly looks “gigantic”, but it is not written manually, our application writes it, and the DBMS swallows any script, so do not look at the number of letters - this is not significant.

Suppose with the help of this search, the user has found the heading he needs, now in the heading it is necessary to find an interesting position, for this it is necessary to make a search among the Entities in the specified parameters.

Search by parameters


What would the user imagine the search boundaries, for him it is necessary to calculate these boundaries.
If we do a search by category, then we show the parameters (boundaries) of the search only for system characteristics.

To do this, let's see what our heading has, there are system characteristics and what search methods are set:

 --         -       SELECT btrim(p.code) AS "property", btrim(tu.code) AS "author_type", btrim(ts.code) AS "search_type", btrim(tt.code) AS "data_type" FROM rubric r JOIN rubric_property rp ON rp.rubric_id = r.id JOIN property p ON rp.property_id = p.id JOIN property_tag ptu on p.id = ptu.property_id JOIN tag tu on ptu.tag_id = tu.id JOIN property_tag pts on p.id = pts.property_id JOIN tag ts on pts.tag_id = ts.id JOIN property_tag ptt on p.id = ptt.property_id JOIN tag tt on ptt.tag_id = tt.id WHERE r.code = 'ekskavatory-karernye' AND tu.code = 'SYSTEM_PROPERTY' AND ts.code IN ('LIKE_SEARCH_TYPE','BETWEEN_SEARCH_TYPE','ENUMERATION_SEARCH_TYPE') AND tt.code IN ('DIGITAL_DATA_TYPE','STRING_DATA_TYPE') ; /* MANUFACTURER_MODEL, SYSTEM_PROPERTY, LIKE_SEARCH_TYPE, STRING_DATA_TYPE TRADE_MARK, SYSTEM_PROPERTY, ENUMERATION_SEARCH_TYPE, STRING_DATA_TYPE BUCKET_CAPACITY_M3, SYSTEM_PROPERTY, BETWEEN_SEARCH_TYPE, DIGITAL_DATA_TYPE */ 

Explanation of the request


We look at what characteristics are systemic:

 tu.code = 'SYSTEM_PROPERTY' 

We look at what type of search is defined for these Characteristics:

 ts.code IN ('LIKE_SEARCH_TYPE','BETWEEN_SEARCH_TYPE','ENUMERATION_SEARCH_TYPE') 

We look at the data type for these Characteristics:

 tt.code IN ('DIGITAL_DATA_TYPE','STRING_DATA_TYPE') 

If one of the three search parameters for the Characteristics is not specified, then the search is not possible (joining the tu ts tt tables via JOIN).

Determine the properties for the category "Excavators career."

 r.code = 'ekskavatory-karernye' 

Query result


As a result, we obtain three characteristics and search parameters:

  1. MANUFACTURER_MODEL , SYSTEM_PROPERTY, LIKE_SEARCH_TYPE, STRING_DATA_TYPE
  2. TRADE_MARK , SYSTEM_PROPERTY, ENUMERATION_SEARCH_TYPE, STRING_DATA_TYPE
  3. BUCKET_CAPACITY_M3 , SYSTEM_PROPERTY, BETWEEN_SEARCH_TYPE, DIGITAL_DATA_TYPE

Now for each Characteristics we are looking for boundaries.

Calculation of search boundaries



Calculation of search bounds for “MANUFACTURER_MODEL”


For “MANUFACTURER_MODEL” - the type of search by entry (“LIKE_SEARCH_TYPE”) - we do not calculate the boundaries, we display the field for entering the search string for this Characteristic to the user .

Calculation of search bounds for TRADE_MARK


 --    SELECT sm.string FROM rubric r JOIN rubric_item ri ON r.id = ri.rubric_id JOIN item i ON ri.item_id = i.id JOIN item_content ic ON i.id = ic.item_id JOIN content c ON ic.content_id = c.id JOIN property p ON c.property_id = p.id JOIN string_matter sm ON c.id = sm.content_id WHERE p.code = 'TRADE_MARK' AND r.code = 'ekskavatory-karernye' GROUP BY sm.string; /*   */ 

Data type - "STRING_DATA_TYPE" - means we are analyzing - string_matter.string. The search method is “ENUMERATION_SEARCH_TYPE” - it means that we are doing “GROUP BY”. Calculated two values ​​- "Uralmash" and "Donex", the user displays two checkboxes .

Calculation of search bounds for "BUCKET_CAPACITY_M3"


 --    SELECT max(dm.digital) AS maximum, min(dm.digital) AS minimum FROM rubric r JOIN rubric_item ri ON r.id = ri.rubric_id JOIN item i ON ri.item_id = i.id JOIN item_content ic ON i.id = ic.item_id JOIN content c ON ic.content_id = c.id JOIN property p ON c.property_id = p.id JOIN digital_matter dm ON c.id = dm.content_id WHERE p.code = 'BUCKET_CAPACITY_M3' AND r.code = 'ekskavatory-karernye'; /* 25,0.75 */ 

Data type - “DIGITAL_DATA_TYPE” - means analyzing - digital_matter.digital. Search method - “BETWEEN_SEARCH_TYPE” - then we do MIN () and MAX (). Calculated the boundaries from 0.75 to 25, the user displays something like this:

 <input type="range" min="0.75" max="25"> 

Search by parameters


 --    /* 'ekskavatory-karernye' 'MANUFACTURER_MODEL' '12' 'TRADE_MARK' '' '' 'BUCKET_CAPACITY_M3' 0.75 25 */ SELECT i.code FROM rubric_item ri join rubric r on ri.rubric_id = r.id JOIN item i ON ri.item_id = i.id JOIN item_content ic ON i.id = ic.item_id JOIN content c ON ic.content_id = c.id JOIN string_matter sm ON c.id = sm.content_id JOIN rubric_property rp ON ri.rubric_id = rp.rubric_id JOIN property p ON c.property_id = p.id AND p.id = rp.property_id WHERE r.code = :CODE AND p.code = :MODEL_PROPERTY AND sm.string LIKE '%'||:MODEL_LIKE||'%' INTERSECT SELECT i.code FROM rubric_item ri join rubric r on ri.rubric_id = r.id JOIN item i ON ri.item_id = i.id JOIN item_content ic ON i.id = ic.item_id JOIN content c ON ic.content_id = c.id JOIN string_matter sm ON c.id = sm.content_id JOIN rubric_property rp ON ri.rubric_id = rp.rubric_id JOIN property p ON c.property_id = p.id AND p.id = rp.property_id WHERE r.code = :CODE AND p.code = :MARK_PROPERTY AND sm.string IN ( :MARK1 , :MARK2) INTERSECT SELECT i.code FROM rubric_item ri join rubric r on ri.rubric_id = r.id JOIN item i ON ri.item_id = i.id JOIN item_content ic ON i.id = ic.item_id JOIN content c ON ic.content_id = c.id JOIN digital_matter dm ON c.id = dm.content_id JOIN rubric_property rp ON ri.rubric_id = rp.rubric_id JOIN property p ON c.property_id = p.id AND p.id = rp.property_id WHERE r.code = :CODE AND p.code = :BUCKET_PROPERTY AND dm.digital BETWEEN :MIN_BUCKET AND :MAX_BUCKET ; 

For each Characteristic, we make a search in accordance with the conditions set by the user and select those Entities that satisfy all the conditions, that is, we intersect - INTERSECT - the results of each separate search with each other.

If the user sets one condition, then the search will be based on one Characteristic, if there are 100 conditions, then 100 subqueries will be executed and the Entities will be given to the user who are present in the results of each of the 100 subqueries.

As a result of the search by parameters, the user has determined an interesting position (Entity).
It should be noted that a good search had to be done with the selection of the user "SYSTEM", but I missed this point.

Search by user values


I remind you that we are doing Avito for excavators. That is, each position of the catalog has its price, and each user has his own price. And when our user opened the catalog position card, he saw several offers and accordingly wanted to perform a search on these offers.

We had the properties:


These are user properties, their values ​​are set by users, this is regulated by the “USER_PROPERTY” Option.

Search boundaries


Define a set of custom characteristics for the Excavators Loaders category:

 SELECT btrim(p.code) AS "property", btrim(tu.code) AS "author_type", btrim(ts.code) AS "search_type", btrim(tt.code) AS "data_type" FROM rubric r JOIN rubric_property rp ON rp.rubric_id = r.id JOIN property p ON rp.property_id = p.id JOIN property_tag ptu on p.id = ptu.property_id JOIN tag tu on ptu.tag_id = tu.id JOIN property_tag pts on p.id = pts.property_id JOIN tag ts on pts.tag_id = ts.id JOIN property_tag ptt on p.id = ptt.property_id JOIN tag tt on ptt.tag_id = tt.id WHERE r.code = 'ekskavatory-pogruzchiki' AND tu.code = 'USER_PROPERTY' AND ts.code IN ('LIKE_SEARCH_TYPE','BETWEEN_SEARCH_TYPE','ENUMERATION_SEARCH_TYPE') AND tt.code IN ('DIGITAL_DATA_TYPE','STRING_DATA_TYPE') ; /* GOODS_ITEM_PRICE_RUB,USER_PROPERTY,BETWEEN_SEARCH_TYPE,DIGITAL_DATA_TYPE GOODS_ITEM_UNITS_OF_MEASURE,USER_PROPERTY,ENUMERATION_SEARCH_TYPE,STRING_DATA_TYPE */ 

GOODS_ITEM_PRICE_RUB:


GOODS_ITEM_UNITS_OF_MEASURE:


 --      'jcb-4cx' SELECT min(dm.digital) AS minimum, max(dm.digital) AS maximum FROM item i JOIN item_content ic ON i.id = ic.item_id JOIN content c ON ic.content_id = c.id JOIN property p ON c.property_id = p.id JOIN digital_matter dm ON c.id = dm.content_id WHERE p.code = 'GOODS_ITEM_PRICE_RUB' AND i.code = 'jcb-4cx'; /* 3400000 4700000 */ SELECT sm.string FROM item i JOIN item_content ic ON i.id = ic.item_id JOIN content c ON ic.content_id = c.id JOIN property p ON c.property_id = p.id JOIN string_matter sm ON c.id = sm.content_id WHERE p.code = 'GOODS_ITEM_UNITS_OF_MEASURE' AND i.code = 'jcb-4cx' GROUP BY sm.string; /* /  . */ 

Search query


We see that all positions have a unit of measure in pieces, recorded simply in different ways, so we are only looking for the price.

 /* :ITEM_CODE => 'jcb-4cx' :PRICE_PROPERTY => 'GOODS_ITEM_PRICE_RUB' :MIN_PRICE => 3400000 :MAX_PRICE => 4000000 */ SELECT r.id, r.title, r.description, c.raw FROM item i JOIN item_content ic ON i.id = ic.item_id JOIN content c ON ic.content_id = c.id JOIN redactor_content rc ON c.id = rc.content_id JOIN redactor r ON rc.redactor_id = r.id JOIN digital_matter dm ON c.id = dm.content_id JOIN rubric_item ri ON i.id = ri.item_id JOIN rubric_property rp ON ri.rubric_id = rp.rubric_id JOIN property p ON c.property_id = p.id AND p.id = rp.property_id WHERE i.code = :ITEM_CODE AND p.code = :PRICE_PROPERTY AND dm.digital BETWEEN :MIN_PRICE AND :MAX_PRICE ; /*   , -3,3 800 000 ,      ,3 400 000 */ 

Actually this is a simplified search purely for one Characteristic "price".
As a result of the request, you can also issue digital_matter.digital, but then (if we have INTERSECT on several Characteristics), you should cast to the TEXT type (digital_matter.digital::TEXT), in principle, we output the data in text form, so you can issue content.raw.

Conclusion


Actually, this is enough to understand how to apply the idea of ​​the ideal catalog in practice. Cases certainly an order of magnitude more than the search for goods and supplier.

The most important thing I wanted to show is that the idea is flexible enough to be perfect Universal :)

Given my recovery_mode, in another week I will be able to lay out php scripts to dynamically generate SQL queries.

Thanks to everyone who read, I will be grateful for any criticism and any advice.

Addon


For the sake of completeness, there are not enough hierarchical scripts, for the withdrawal of all rubrics from the root, and for the withdrawal of all parents for an arbitrary element. These scripts are in \ deploy \ view_catalog_settings_and_data.sql.

Map rubrics:

 --       ,    'GOODS' WITH RECURSIVE road_map ( id, element_tree_id, code, horizont ) AS ( SELECT pet.id AS id, pet.element_tree_id AS element_tree_id, r.code AS code, 0 AS horizont FROM element_tree pet LEFT JOIN rubric_element_tree ret ON pet.id = ret.element_tree_id LEFT JOIN rubric r ON ret.rubric_id = r.id WHERE r.code = :ROOT UNION SELECT cet.id, cet.element_tree_id, r.code, horizont + 1 FROM element_tree cet JOIN road_map c ON (c.id = cet.element_tree_id) LEFT JOIN rubric_element_tree ret ON cet.id = ret.element_tree_id LEFT JOIN rubric r ON ret.rubric_id = r.id ) SELECT code, horizont FROM road_map ORDER BY horizont ASC; 

Path from the root to the specified node (rubric)

 --         'ekskavatory-karernye' WITH RECURSIVE road_map ( id, element_tree_id, code, horizont ) AS ( SELECT cet.id AS id, cet.element_tree_id AS element_tree_id, r.code AS code, 0 AS horizont FROM element_tree cet LEFT JOIN rubric_element_tree ret ON cet.id = ret.element_tree_id LEFT JOIN rubric r ON ret.rubric_id = r.id WHERE r.code = :CHILD UNION SELECT pet.id, pet.element_tree_id, r.code, horizont + 1 FROM element_tree pet JOIN road_map c ON (c.element_tree_id = pet.id) LEFT JOIN rubric_element_tree ret ON pet.id = ret.element_tree_id LEFT JOIN rubric r ON ret.rubric_id = r.id ) SELECT code, horizont FROM road_map ORDER BY horizont DESC; 

Show all Values ​​of all Characteristics of one Entity

 --     'doneks-eo-4112a-1' SELECT i.title, p.title, dm.digital::TEXT FROM rubric_item ri JOIN item i ON ri.item_id = i.id JOIN item_content ic ON i.id = ic.item_id JOIN content c ON ic.content_id = c.id JOIN digital_matter dm ON c.id = dm.content_id JOIN rubric_property rp ON ri.rubric_id = rp.rubric_id JOIN property p ON c.property_id = p.id AND p.id = rp.property_id WHERE i.code = :CODE UNION SELECT i.title, p.title, sm.string::TEXT FROM rubric_item ri JOIN item i ON ri.item_id = i.id JOIN item_content ic ON i.id = ic.item_id JOIN content c ON ic.content_id = c.id JOIN string_matter sm ON c.id = sm.content_id JOIN rubric_property rp ON ri.rubric_id = rp.rubric_id JOIN property p ON c.property_id = p.id AND p.id = rp.property_id WHERE i.code = :CODE; 

Source: https://habr.com/ru/post/323498/


All Articles