📜 ⬆️ ⬇️

Tree DBMS

All those who have experience using, as a data warehouse, tree-like DBMS are invited to discuss. It would be useful to share the experience of developing tree structures, describing the specifics of building the index tree and the full-text information retrieval algorithms within the data warehouse.

Since any computer system, in order to optimize the exchange, exchanges between memory and a disk in the form of blocks, the atomic element storing data on the disk is a block. It's no secret that many DBMSs (the same ORACLE and MSSQL) actually store data in B-trees. A B-tree is a set of logically related blocks arranged in a hierarchy, at each level of which blocks are defined, each of which has the same number of descendants. The description of the algorithm of the B-tree is beyond the scope of this blog.

Relational, object, or direct access is provided by a logical model. Let me try to assume that a reasonable use of a logical data model that is as close as possible to actual storage will allow you to more easily and quickly process low-level data than using other logical models (SQL, etc.), although the requirements for the level of development of data access mechanisms are significantly increased. It is possible that direct access can be represented by a logical tree. An example of a logical data tree is the global in Cache DBMS.
')
I will give several examples of using, from personal experience, tree data structures (globals).


The use of logical trees can be useful for describing an incomplete and fuzzy subject area, information about which will be supplemented in the process of using the system. An example of such a subject area is the newspaper ad . Expansion of the subject area is possible both due to the inclusion of new categories (initially there were only car ads, and in the future there are sections of real estate, work, dating, etc.), and due to the refinement and increase in knowledge of the already described category (dynamic-speed characteristics car massages and so on). Suppose that different categories may intersect with each other (known or unknown at the time of the initial description).

We describe the data structure in the form of a global. In the description below, there are no official words and symbols except:
  1. s - set command
  2. ^ - globall symbol (logical tree)
  3. [] - namespace (can be defined on a remote server)
  4. $$$ is a user constant

description of the structure of the vehicle
//---------------------------UniVehicleModel---------------------------------------
//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniVehicleModel","@p","makeId")="id."

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniVehicleModel","@p","modelId")="id."

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniVehicleModel","@p","year")="ta."

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniVehicleModel","@p","bodyTypeId")="id."


ad description (automotive only)
//----------------- ------------------------------------
//CarClassified
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"CarClassified")="variationType"
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"CarClassified","@t")="Classified"

//--------------------------------Classified---------------------------------------------
//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","Price")="enclosure"

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","contactList")="list"

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","imageList")="list"

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","additionalText")="s."

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@p","rubricId")="id."

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@v","CarClassified","@p","UniVehicleModel")="enclosure"


As you can see from the description of the structure of the Classified (Classified) it contains a list of contacts. But contacts can have different variations:
  1. phone
  2. e-mail
  3. address
  4. GPS
  5. other contacts

What is not displayed in the description of the structure of the ad, but is described in the contact structure:
//------------------------------------Contact-------------------------------------
//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@p","contactPerson")="ta."

// GPS
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@v","GPSContact","@p","GPS")="enclosure"

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@v","PhoneContact","@p","Phone")="enclosure"

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@v","AddressContact","@p","UniRealEstateAddress")="enclosure"

// -
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@v","WebSiteContact","@p","WebSite")="enclosure"

// eMail-
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Contact","@v","eMailContact","@p","eMail")="enclosure"

//--------------------- -------------------------------
//AddressContact
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"AddressContact")="variationType"
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"AddressContact","@t")="Contact"

//PhoneContact
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"PhoneContact")="variationType"
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"PhoneContact","@t")="Contact"

//GPSContact
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"GPSContact")="variationType"
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"GPSContact","@t")="Contact"

//WebSiteContact
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"WebSiteContact")="variationType"
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"WebSiteContact","@t")="Contact"

//eMailContact
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"eMailContact")="variationType"
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"eMailContact","@t")="Contact"


I apologize for the abundance of incomprehensible code in this blog, but suppose that we need to expand the system (newspaper ads) and include in it the subject area real estate . Just add a description of the new ad variation:
//----------------- --------------------------------------------
//RealEstateClassified
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"RealEstateClassified")="variationType"
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"RealEstateClassified","@t")="Classified"

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"Classified","@v","RealEstateClassified","@p","UniRealEstate")="enclosure"


And add a description of the structure of the property market:
//---------------------------------------UniRealEstate------------------------------------------
//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniRealEstate","@p","floorQuanity")="n."

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniRealEstate","@p","floorNumber")="n."

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniRealEstate","@p","housePlaningType")="ta."

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniRealEstate","@p","totalArea")="ta."

//
s ^[$$$MEDIA]$$$mBodyMediaGlobal($$$mType,"UniRealEstate","@p","cellingHight ")="ta."
//------------------------------------------------------------------------------------------------------------------


The structure is described.

Perhaps I chose not the most successful service flags for the description:
  1. "@t" - types
  2. "@p" - properties
  3. "@v" - variations
  4. "List" - lists
In any case, they are easy to replace with more correct ones.

In the future, you can easily delve into the description of the required subject area: whether it be contacts, cars, real estate and so on. Of course, a recursive structure processing mechanism is also needed, which, based on the tree described above, writes, reads and updates data. That is, the body comes to the input (for example, xml) the mechanism runs along the body tree at the input, compares it with the structure description storage tree, and performs processing. Writing such an algorithm is not difficult for a programmer with some experience, and I will not give my code for this mechanism - as I am sure there are more worthy examples. One of the advantages of storing a description of a data structure in the form of a logical tree is that data processing mechanisms know nothing about the subject area (about input data), which can evolve as knowledge is gained. Of course, knowledge of the subject area — in some form, must be at the interface level (it is possible to use similar structures) —but all the mechanisms within the system (including CRUD mechanisms, indexing and search mechanisms) are not tied to the subject area (they know nothing about data).

Of course, the description of the data structure in the tree is not enough for this blog. In the near future I plan to describe the storage of data and search indexes in trees. Also in the global structure description, it is very convenient to store rules (function names) that should be called recursively at the data processing stage - and can influence the structure traversal path. I would appreciate just criticism. Ready to answer clarifying questions.

The methodology described in this blog, to one degree or another, is used in a live Internet project.

Source: https://habr.com/ru/post/59380/


All Articles