📜 ⬆️ ⬇️

Ideal catalog, architecture sketch

I turned up the task to develop a universal catalog of goods and services, part-time catalog of companies, documents and anything else. In the work, this “experience” was not useful, but the idea is good, in my opinion to my humble opinion :) I would like to share and listen to the criticism.

The catalog implies orderliness - a hierarchy, it implies the immediate storage of information, and of course the search, probably the analyst ... something else? Nothing else comes to mind.

Now the points.

Hierarchy


Definitely the grouping of information will be tree-like, from the abstract “Catalog” to more specific, to more specific (for example, “hammer”). The level of detail can be any, we will not keep ourselves within the framework of "section", "subsection", "category", "subcategory", let the depth of branching be infinite.
')
Information will be stored in some kind of DBMS, for working with a hierarchy this DBMS must be able to hierarchical queries, there are not a few such DBMS, of the free ones, the most pop is PostgreSQL.
DDL Decals Tree of Elements:

CREATE TABLE element_tree ( id SERIAL PRIMARY KEY NOT NULL, element_tree_id INTEGER, is_hidden INTEGER DEFAULT 0, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_element_tree FOREIGN KEY (element_tree_id) REFERENCES element_tree (id) ); CREATE INDEX ix_element_tree_element_tree_id_id ON element_tree (element_tree_id, id); 

Explanation of the table structure


id is the row identifier of the table.

element_tree_id - reference to the parent element of the tree.

is_hidden is the remote record flag (0 is a valid record, 1 is a remote one), why is the flag only raised for the record, instead of being deleted? because when you meet an identifier in the logs and you want to see what it is, it is very convenient to make selections in the current database, instead of doing this select in the backup (and it’s not a fact that this record will be found in this backup).

insert_date - the date of adding the record, conveniently when you know - this is a 100 year old record or it was inserted five minutes ago as a result of an unsuccessful insert.

Of course, people who are familiar with the system all this (is_hidden, insert_date) are not very necessary, but for those who look at the system like a ram on a new gate, these fields will be very useful, in my practice, usually in the role of a ram :)

CONSTRAINT fk_element_tree - the external key to itself - a pointer to the parent element.
INDEX ix_element_tree_element_tree_id_id - an index to search for child branches (nodes of descendants), if we find the parent by the primary key, then for an accelerated search of descendants we need to have a separate index.

As someone experienced and advanced could notice, there are no columns in the table for the name of the element. And why? And because the hierarchy is only a hierarchy, and not the nodes of the tree are ordered, but those tables that are docked to the hierarchy, therefore the names in the tables, and in the hierarchy only the grouping of elements.

Direct storage of information


In the end, any directory is a list of individual headings. A rubric is a certain group of entities with a unique set of characteristics. That is, we have the relation of the Essence itself and the grouped entities - the Rubric, and besides the Rubric is also a grouping for the unique Characteristics of these Entities.

That is, information is divided into three parts - Essence, Characteristic, Category, where the Rubric is the junction point of several Entities and Characteristics.

In the language of the DBMS, it sounds like this:

Table Rubrics
 CREATE TABLE rubric ( id SERIAL PRIMARY KEY NOT NULL, code CHAR(100), title VARCHAR(4000), description VARCHAR(4000), is_hidden INTEGER DEFAULT 0, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now() ); CREATE UNIQUE INDEX ux_rubric_code ON rubric (code); 

Table of Entities (a certain Stuck, which can turn out to be both a commodity and a service, or a company, or in general a report and anything else):

 CREATE TABLE item ( id SERIAL PRIMARY KEY NOT NULL, code CHAR(100), title VARCHAR(4000), description VARCHAR(4000), is_hidden INTEGER DEFAULT 0, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now() ); CREATE UNIQUE INDEX ux_item_code ON item (code); 

Table Characteristics (properties):

 CREATE TABLE property ( id SERIAL PRIMARY KEY NOT NULL, code CHAR(100), title VARCHAR(4000), description VARCHAR(4000), is_hidden INTEGER DEFAULT 0, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now() ); CREATE UNIQUE INDEX ux_property_code ON property (code); 

Here we see new columns:

code - a unique code (mnemonic) for writing, registering in queries and configs identifiers are not Camille, because identifiers can be different on different machines and make sure that they are the same slightly tedious, it is much more convenient to use a write code - it and it is easier to remember than a set of identifier digits, and in the code when you see the word, and not the magic numbers, the essence of what is happening becomes a little clearer.
title - name (name had to be replaced with title, because name is a keyword for PostgreSql).
description - description (the name is used for selection in the list, and the description for the actual description of the destination of the entry).

Now about how all this is connected.

Organization of information in the catalog


The rubrics are docked to the element tree, the docking is performed by a separate table:

 CREATE TABLE rubric_element_tree ( id SERIAL PRIMARY KEY NOT NULL, rubric_id INTEGER NOT NULL, element_tree_id INTEGER NOT NULL, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_rubric_element_tree_rubric_id FOREIGN KEY (rubric_id) REFERENCES rubric (id), CONSTRAINT fk_rubric_element_tree_element_tree_id FOREIGN KEY (element_tree_id) REFERENCES element_tree (id) ); CREATE UNIQUE INDEX ux_rubric_element_tree_rubric_id ON rubric_element_tree (rubric_id); CREATE UNIQUE INDEX ux_rubric_element_tree_element_tree_id ON rubric_element_tree (element_tree_id); 

The table performs the role of “one-to-one connection”, the table has two foreign keys, for each key column there is its own index.

Both the Rubric and the Tree Element can only be joined once, so an index with a unique character is made for each column.

Each rubric has its own set of Characteristics (properties):

 CREATE TABLE rubric_property ( id SERIAL PRIMARY KEY NOT NULL, rubric_id INTEGER NOT NULL, property_id INTEGER NOT NULL, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_rubric_property_rubric_id FOREIGN KEY (rubric_id) REFERENCES rubric (id), CONSTRAINT fk_rubric_property_property_id FOREIGN KEY (property_id) REFERENCES property (id) ); CREATE UNIQUE INDEX ux_rubric_property_rubric_id ON rubric_property (rubric_id, property_id); CREATE INDEX ix_rubric_property_property_id ON rubric_property (property_id); 

The table has two foreign keys, one-to-many relationships.

One Rubric has one Characteristic once - provided by the index, different Rubrics can have the same Characteristic - an index according to the Characteristic without uniqueness of values.
Each rubric has its own set of Entities (Pieces):

 CREATE TABLE rubric_item ( id SERIAL PRIMARY KEY NOT NULL, rubric_id INTEGER NOT NULL, item_id INTEGER NOT NULL, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_rubric_item_rubric_id FOREIGN KEY (rubric_id) REFERENCES rubric (id), CONSTRAINT fk_rubric_item_item_id FOREIGN KEY (item_id) REFERENCES item (id) ); CREATE UNIQUE INDEX ux_rubric_item_rubric_id_item_id ON rubric_item (rubric_id, item_id); CREATE UNIQUE INDEX ux_rubric_item_item_id ON rubric_item (item_id); 

The table has two foreign keys, one-to-many relationships. One Rubric has several Entities, each Entity can belong to only one Rubric.

It was the structure of information storage, and where is the information itself?

The information itself is kept separate.

Data storage


Table Values ​​(value of the information characteristics):

 CREATE TABLE content ( id SERIAL PRIMARY KEY NOT NULL, raw VARCHAR(4000), redactor_id INTEGER NOT NULL, property_id INTEGER NOT NULL, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_content_redactor_id FOREIGN KEY (redactor_id) REFERENCES redactor (id), CONSTRAINT fk_content_property_id FOREIGN KEY (property_id) REFERENCES property (id) ); CREATE INDEX ix_content_redactor_id ON content (redactor_id); CREATE INDEX ix_content_property_id ON content (property_id); 

This label is not quite normal, in fact it is just a “memory” cell that stores the value (raw). The value of a specific characteristic (property_id). The value specified by the specific Editor (redactor_id). From the plate it is not clear what the value of this characteristic refers to, whether to a hammer model, or to a video card model, docking with Essence is the task of a separate table, but it is still early, you need to tell Editors about:

 CREATE TABLE redactor ( id SERIAL PRIMARY KEY NOT NULL, code CHAR(100), title VARCHAR(4000), description VARCHAR(4000), is_hidden INTEGER DEFAULT 0, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now() ); CREATE UNIQUE INDEX ux_redactor_code ON redactor (code); 

Why do we need an editor? The information catalog was supposed to be something like Wikipedia, where each Editor could define his own version of the Characteristic values ​​for each Entity. And the System itself had to work with variations of the Editors to represent the same Entity, to consider the analytics on these variations.

The Table of Values ​​stores only a string representation of the characteristic information. This is actually user input. The system works with a different view of this information, with a view depending on the type of data. Each data type has its own table.

Strings

 CREATE TABLE string_matter ( content_id INTEGER NOT NULL, id SERIAL PRIMARY KEY NOT NULL, string VARCHAR(4000), insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_string_matter_content_id FOREIGN KEY (content_id) REFERENCES content (id) ); CREATE UNIQUE INDEX ux_string_matter_content_id ON string_matter (content_id); 

* TEXT should be used to store strings in PostgreSql

Numbers

 CREATE TABLE digital_matter ( content_id INTEGER NOT NULL, id SERIAL PRIMARY KEY NOT NULL, digital DOUBLE PRECISION, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_digital_matter_content_id FOREIGN KEY (content_id) REFERENCES content (id) ); CREATE UNIQUE INDEX ux_digital_matter_content_id ON digital_matter (content_id); 

Dates (time stamps)

 CREATE TABLE date_matter ( content_id INTEGER NOT NULL, id SERIAL PRIMARY KEY NOT NULL, date_time TIMESTAMP WITH TIME ZONE insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_date_matter_content_id FOREIGN KEY (content_id) REFERENCES content (id) ); CREATE UNIQUE INDEX ux_date_matter_content_id ON date_matter (content_id); 

Time intervals

 CREATE TABLE duration_matter ( content_id INTEGER NOT NULL, id SERIAL PRIMARY KEY NOT NULL, duration INTERVAL, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_duration_matter_content_id FOREIGN KEY (content_id) REFERENCES content (id) ); CREATE UNIQUE INDEX ux_duration_matter_content_id ON duration_matter (content_id); 

The data types are specially selected "international", so that you can transfer the database structure to any platform, to any DBMS.

The name "matter" is chosen for consonance with the words "matter" and "essence".

And I didn’t tell about one thing, these are options:

 CREATE TABLE option ( id SERIAL PRIMARY KEY NOT NULL, code CHAR(100), title VARCHAR(4000), description VARCHAR(4000), is_hidden INTEGER DEFAULT 0, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now() ); CREATE UNIQUE INDEX ux_option_code ON option (code); 

Options are needed as tags that determine how to handle the Characteristics, how to analyze them, what search mechanism to use, with which data type to save, what rights (permissions) are required for access and all such business logic is tied to options.

Options match with Features:

 CREATE TABLE property_option ( id SERIAL PRIMARY KEY NOT NULL, property_id INTEGER NOT NULL, option_id INTEGER NOT NULL, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_property_option_property_id FOREIGN KEY (property_id) REFERENCES property (id), CONSTRAINT fk_property_option_option_id FOREIGN KEY (option_id) REFERENCES option (id) ); CREATE UNIQUE INDEX ux_property_option_property_id_option_id ON property_option (property_id, option_id); CREATE INDEX ix_property_option_option_id ON property_option (option_id); 

Content connects with Entities:

 CREATE TABLE item_content ( id SERIAL PRIMARY KEY NOT NULL, item_id INTEGER NOT NULL, content_id INTEGER NOT NULL, insert_date TIMESTAMP WITH TIME ZONE DEFAULT now(), CONSTRAINT fk_item_content_item_id FOREIGN KEY (item_id) REFERENCES item (id), CONSTRAINT fk_item_content_content_id FOREIGN KEY (content_id) REFERENCES content (id) ); CREATE UNIQUE INDEX ux_item_content_item_id_content_id ON item_content (item_id, content_id); CREATE UNIQUE INDEX ux_item_content_content_id ON item_content (content_id); 

Actually this is all part of the Information catalog.

Chip "architecture"


Now the most important question, why such a number of connecting plates, is it really impossible to connect the tables directly?

The answer is that such an “architecture” is aimed at maximum modularity. Each table is sharpened for one function and these functions can be flexibly combined. Flexibility is violated only by the Values ​​table - content, of course, the connection with the Editors could be put into a separate table, but this is too much over the edge (although in the next implementation I will do so). The relationship of content with property is hard because Value (content) cannot be interpreted outside the Characteristics (property).

The flexibility of connections is made for the convenience of transferring subjects between other subjects of the System.

That is, we can transfer the Entity with the same set of Values ​​between different Rubrics, and in each Rubric we will see and work with the Entity only with those Characteristics that are defined for this Rubric. We can freely transfer Values ​​from one Entity to another, while not affecting the values ​​themselves.

We can use only the string representation of information and forget about the highly specialized views in the * _matter tables.

We can use only Rubrics without rubricating Rubrics around the Element Tree. And we can only scatter across the Tree only the Rubrics that we want to give users access, and system Rubrics should not join the tree and thus hide them from the users.
We can add or delete Characteristics for the Category, and the Values ​​will not be affected and will not be affected.

That is, from project to project we can use only the functionality that is needed, and which is not needed, you can cut it in two accounts, simply by eliminating unnecessary classes from the assembly.
In general, we can twist and wriggle with data as we please without any changes in the database structure, and accordingly without changes in the classes working with these data, if the logic changes, we would have to change only the business logic layer without changing the data access layer without changing the "primitive" classes responsible for the data editing interface.
With the increased overhead of data access, we got more flexibility and greater resistance to users' careless actions, we can conduct some experiments without the need for backups, for such “risky” and lazy programmers like me, this is a big plus :)

To all this "beauty" there is also a PHP code, but about it next time, and considering my “Recovery mode”, only in a week.

Ps. Probably, after your comments, it will be necessary to write once again about this “architecture”, and then you can tell about PHP classes to work with this data storage and processing system.

ER Chart




Continuation


Ideal catalog, architecture sketch

Source: https://habr.com/ru/post/322930/


All Articles