On Habr you can often find articles that show how something is done, with a concrete implementation, code, examples, and a rationale (even if a controversial one). Someone posts an example of a controller, someone gives practical advice on JavaScript. However, I have not seen anyone discuss how a database structure is organized. The discussion never goes further than a few school examples (if I am mistaken, correct me and give references). No, SQL vs NoSQL does not interest me. In my humble opinion, the DBMS is secondary when it comes to organizing a database. The performance characteristics of a specific DBMS do not become relevant right away. Whatever DBMS is chosen for a specific task, there is only one performance requirement: performance must be sufficient. What I want to talk about is everything else: the ways to achieve that sufficiency; ways to lay data out conveniently and elegantly so that it can be retrieved quickly and easily; the organization of reference books and indexes; input and output; ways of scaling and changing the database structure over its lifetime; the methods used; solved and unsolved problems; useful recipes and tips.
Designing database structures is a very interesting and non-trivial process. In this vast area there are few live examples that can be examined and discussed. Is it always clear to you, database developers, what to do and how? Let's share knowledge: ask, tell, discuss, learn. Does it matter whether it is a table, an object, or a global? What matters is the meaning invested, the relationships built, and the means by which those relationships are implemented.
A couple of days ago a translation was published in which my approach to database programming was called extreme. I do not quite agree with that. In the comments, at least three people (@Ogoun, uaoleg, 4dmonster) said they would be interested in seeing MUMPS in live use and finding out why globals are nothing to be afraid of. This article is for them and for everyone who wants to discuss the topics I have raised.
Definition:
A reference book (directory) is a slowly changing list of unique positions, containing brief and accurate information of a scientific, industrial, or applied nature, united by a single theme. Address directories (countries, cities, streets ...) are an example. I deliberately added "slowly changing" to the Wikipedia definition.
Requirements:
The main requirements for a directory:
- Retrieving the name of a directory element (retrieve) must be fast.
- The name of any directory element can be changed in one place, and the change takes effect throughout the system.
Let's consider them in more detail:
Getting the name of a directory element quickly means not searching: the name is simply read from a known place and returned. The reasoning is that slowly changing information is accessed often, so the answer must come back quickly. I will show how this is done in Caché further on. A specialized search (find all cities starting with the letter A that are no more than x km from city N) may still consume processor time; returning the name of a directory element may not.
Changing the name of a directory element in one place means that the updateName operation is comparable in complexity and run time to the retrieve operation, except when the new name has to be checked for validity. Even then, no restructuring or re-indexing of the large data sets that use this directory element is required. This follows logically from one simple property of reference books and of data in general: there may be errors in the names. You can design, develop, and launch a system of any complexity, and some time later it turns out that a name was entered incorrectly. The error has to be fixed. You do not want to spend a long time checking and re-indexing all or most of your system to fix it. You do not want old and new analytical or operational reports and statistics to stop agreeing with each other because one name changed. You do not want to regenerate SEO and other page templates for your site. You want to change one name in one place and forget about it. To most database developers this requirement will probably seem obvious and not worth so many words. In practice, though, it turns out to be met much less often than one would imagine.
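The two main requirements can be illustrated with a minimal sketch. It is written in Python rather than MUMPS purely so that it can be run anywhere; the `names` table, the `pages` list, and the function bodies are hypothetical and only model the idea that records refer to a directory element by identifier while the name lives in exactly one place.

```python
# Hypothetical in-memory model: names live in one lookup table,
# and every other record stores only the element identifier.
names = {1: "Kiev", 2: "St. Petersburg"}

# Pages, reports and other records reference the city by id, never by name.
pages = [
    {"url": "/offices/1", "city_id": 1},
    {"url": "/reports/2010", "city_id": 1},
    {"url": "/offices/2", "city_id": 2},
]


def retrieve(element_id):
    """Read the name from its known place: one key lookup, no search."""
    return names.get(element_id, "")


def update_name(element_id, new_name):
    """Change the name in exactly one place; dependent records are untouched."""
    names[element_id] = new_name


def render(page):
    """Build the text shown for a page, resolving the name at read time."""
    return f"{page['url']}: {retrieve(page['city_id'])}"


update_name(1, "Mother of Russian Cities")
rendered = [render(page) for page in pages]
```

After the single `update_name` call, every page that refers to element 1 shows the new name, and nothing in `pages` had to be rewritten or re-indexed.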
Test
Let's do a simple test. Those of you who have a website (it does not have to be yours; it may be the website of your company or of a client, as long as you have access and can make changes) try right now to change, in one place, the name of any directory element (for example, a city). For example, rename St. Petersburg to an alternative spelling of the same name, or Kiev to Mother of Russian Cities, or anything else. Were you able to do it in one place? Did the tag value change on every page of your site? Did the addresses of all your pages stay the same? Did the captions and meta descriptions of the linked pages change automatically? No, I am not considering complex parsing systems, replacement of word endings, and the like here. Caching is also left out for simplicity. (Poll below.)
Additional requirements for a directory:
- It must be possible to store the names of directory elements in different languages while still meeting the previous requirements. The retrieve operation must be just as fast for any newly added language.
- A history of changes to directory elements (names, structural positions, and other characteristics) must be kept, with the ability to track and display information as of a time t (what this city was called yesterday, or t years ago).
These requirements are called additional because not every database needs them. Storing element names in different languages, for example, is not required by everyone. However, even if your website is tailored to one specific language or region, do not rush to strike this functionality from the list of requirements: it may still be useful to you. The URLs of some pages contain the Latin-script name of a directory element. If that name is generated from the directory name every time, then updateName can give an old page a new URL merely because an error in the name was corrected.
Example
The directory of the site example.com contains an element named Kiev. Everything relating to this city on the site is on the page example.com/kiev, where the value kiev was obtained by simple transliteration plus conversion to lower case. Now suppose you change the name of the directory element to Mother of Russian Cities. Since the page is tied to the transliteration of the element name and the name has changed, the transliteration changes too, and the page address becomes example.com/mat_gorodov_russkih.
These problems can be avoided by storing the name of the directory element in several languages. Let the first be ru and the second partUri. Then changing the name in one language does not automatically change it in the other (or at least the change can be controlled), and the page address stays the same.
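A small sketch of why a separate partUri "language" keeps addresses stable. It is in Python rather than MUMPS so it can be run anywhere; the `names` table and the `page_url` helper are hypothetical, and the values are taken from the example.com example.

```python
# Hypothetical model: one name per (element id, language) pair.
# "partUri" is treated as just another language, used only for URLs.
names = {
    (1, "ru"): "Kiev",
    (1, "partUri"): "kiev",
}


def page_url(element_id):
    """Build the page address from the stored partUri name, not from the ru name."""
    return "example.com/" + names[(element_id, "partUri")]


url_before = page_url(1)

# Correcting or changing the display name touches only the "ru" entry.
names[(1, "ru")] = "Mother of Russian Cities"

url_after = page_url(1)
```

Because the address is read from its own entry instead of being re-transliterated from the display name, `url_before` and `url_after` are the same string.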
Keeping a history of changes to directory elements is not always required, and it is a little harder to implement than multilingual names or name changes. It also increases the time needed to change a directory element. With a proper implementation, however, the increase is small. And since a directory is slowly changing information, there is nothing wrong with a change taking somewhat longer.
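One possible way to keep such a history, sketched in Python rather than MUMPS so that it can be run anywhere. The article only fixes that version 0 is the current one and the rest is history; the archiving rule used here (copy the outgoing value to the next free version number), the integer timestamps, and the function names are all assumptions made for illustration.

```python
# Hypothetical store: (element id, language, version) -> record.
# Version 0 is always the current name; versions 1, 2, ... are history.
names = {}


def update_name(element_id, language, new_name, now):
    """Archive the current name (if any), then overwrite version 0."""
    current = names.get((element_id, language, 0))
    if current is not None:
        version = 1
        while (element_id, language, version) in names:
            version += 1
        names[(element_id, language, version)] = current
    names[(element_id, language, 0)] = {"name": new_name, "update_time": now}


def retrieve(element_id, language):
    """Current name: still a single lookup, unaffected by the history."""
    record = names.get((element_id, language, 0))
    return record["name"] if record else ""


def name_at(element_id, language, t):
    """Name that was valid at time t, or "" if the element did not exist yet."""
    candidates = [
        record
        for (eid, lang, _version), record in names.items()
        if eid == element_id and lang == language and record["update_time"] <= t
    ]
    if not candidates:
        return ""
    return max(candidates, key=lambda record: record["update_time"])["name"]


# Hypothetical timestamps: plain integers standing in for real update times.
update_name(1, "ru", "Kiev", now=100)
update_name(1, "ru", "Mother of Russian Cities", now=200)
```

In this sketch only `update_name` and `name_at` pay for the history; reading the current name costs the same as it did without it.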
Implementation
Let all directory elements be stored in the global ^Dictionary.
A global is a global variable whose changes are saved to disk. The indices of the variable are written in parentheses after the variable name, separated by commas. For example:
    ^GlobalVariable("index1","index2",...,"indexN")="value"
Indices and values are quoted only if they are not numbers. An index can also be another variable (more on this later).
Assume that the indices of the global ^Dictionary mean the following, in order:
- ontology (a rough classification of directories); ours is Vehicle (vehicles)
- directory name; ours is TransmissionType (transmission type)
- directory element identifier (all identifiers are unique, even across different directories and ontologies)
- element version number (0 is the current version, the rest are history)
- element property name
The name of the global and the meaning of each index were invented by me (the developer). I will describe the rules behind these choices later (possibly in the following articles). For now, let's just see how the simplest single-level directory can be organized, without any nesting or other complications. The data in the example is taken from a real, live database. The zw command prints the values of a variable (global or local) for all defined indices. We will run the commands in the terminal. MONTOLOGY> is the name of the namespace (defined and named by me earlier). To display our directory, run the command
    zw ^Dictionary("Vehicle","TransmissionType")
and look at the result:
    MONTOLOGY>zw ^Dictionary("Vehicle","TransmissionType")
    ^Dictionary("Vehicle","TransmissionType",1,0,"UpdateTime")="62086,66625"
    ^Dictionary("Vehicle","TransmissionType",1,0,"uid")=888
    ^Dictionary("Vehicle","TransmissionType",2,0,"UpdateTime")="62086,66625"
    ^Dictionary("Vehicle","TransmissionType",2,0,"uid")=888
    MONTOLOGY>
Let's take a closer look at the output. We have printed all the elements of the TransmissionType directory of the Vehicle ontology. As you can see, this directory contains only two elements, with identifiers 1 and 2. Since the fourth index is always 0, all elements of this directory are current and have never been changed since they were added (there is no history). Each directory element has only two properties: UpdateTime (the date and time of the update in the Caché format) and uid (the identifier of the user who made the change). Note once again the almost complete absence of service words and symbols in both the command and the result.
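The same reading of the output can be reproduced on a small model. The sketch below is in Python rather than MUMPS so it can be run anywhere: the global is modeled as a dictionary keyed by index tuples, and the helper names are hypothetical. The UpdateTime value is assumed to be in the standard $HOROLOG form, that is, the number of days since December 31, 1840, a comma, and the number of seconds since midnight.

```python
from datetime import datetime, timedelta

# The ^Dictionary listing above, modeled as index tuple -> value.
dictionary = {
    ("Vehicle", "TransmissionType", 1, 0, "UpdateTime"): "62086,66625",
    ("Vehicle", "TransmissionType", 1, 0, "uid"): 888,
    ("Vehicle", "TransmissionType", 2, 0, "UpdateTime"): "62086,66625",
    ("Vehicle", "TransmissionType", 2, 0, "uid"): 888,
}


def element_ids(ontology, directory):
    """Identifiers (third index) of all elements in one directory."""
    return sorted({key[2] for key in dictionary if key[:2] == (ontology, directory)})


def has_history(ontology, directory):
    """True if any element has a version (fourth index) other than 0."""
    return any(key[3] != 0 for key in dictionary if key[:2] == (ontology, directory))


def horolog_to_datetime(value):
    """Convert a "days,seconds" $HOROLOG string to a datetime."""
    days, seconds = (int(part) for part in value.split(","))
    return datetime(1840, 12, 31) + timedelta(days=days, seconds=seconds)


ids = element_ids("Vehicle", "TransmissionType")
changed = has_history("Vehicle", "TransmissionType")
updated = horolog_to_datetime(
    dictionary[("Vehicle", "TransmissionType", 1, 0, "UpdateTime")]
)
```

Under that assumption about the format, "62086,66625" corresponds to 2010-12-26 18:30:25.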
As you can see, our directory is missing something important: the names. Let the names of all elements of all directories in all languages be stored in the global ^NameDictionaryElement.
Assume that the indices of the global ^NameDictionaryElement mean the following, in order:
- directory element identifier
- language
- version number of the name (0 is the current version, the rest are history)
- property name (we will use only UpdateTime)
To display the names of the elements we are interested in, run the command
    zw ^NameDictionaryElement(1),^NameDictionaryElement(2)
This command is equivalent to two commands executed one after the other:
    zw ^NameDictionaryElement(1)
and
    zw ^NameDictionaryElement(2)
Let's look at the result:
    MONTOLOGY>zw ^NameDictionaryElement(1),^NameDictionaryElement(2)
    ^NameDictionaryElement(1,"partUri",0)="akp"
    ^NameDictionaryElement(1,"partUri",0,"UpdateTime")="62086,66625"
    ^NameDictionaryElement(1,"ru",0)=""
    ^NameDictionaryElement(1,"ru",0,"UpdateTime")="62086,66625"
    ^NameDictionaryElement(2,"partUri",0)="meh"
    ^NameDictionaryElement(2,"partUri",0,"UpdateTime")="62086,66625"
    ^NameDictionaryElement(2,"ru",0)=""
    ^NameDictionaryElement(2,"ru",0,"UpdateTime")="62086,66625"
    MONTOLOGY>
As you can see, both elements have names in two languages: ru is Russian, and partUri is used in URLs. We also see that the names have not changed since they were added: there is no history (only the zero, current version exists).
Retrieve
Now let's write the simplest program, Dictionary, and add to it a retrieve function (method) that returns the name of a directory element in the required language and version (the code is from the live project).
q is the abbreviated form of the quit command (similar to return).
$g is the abbreviated form of the $get function, that is, safe access to a variable: if there is no value for the specified indices, the default given after the comma is returned, in our case the empty string "".
Now some examples of calling our function (subroutine). The call syntax is:
    w $$subprogram^ProgramName(parameters, if any)
w is the abbreviated form of the write command.
$$ means that the value-returning function was written by the developer (a system function takes a single dollar sign).
Do not be confused by the ^ symbol in the program call, even though it is also used to refer to globals (global variables). The point is that programs are stored in the same kind of globals as the ones I showed in the example (I will cover this later). So, run the following commands:
    MONTOLOGY>w $$retrieve^Dictionary(1)
    MONTOLOGY>w $$retrieve^Dictionary(2)
    MONTOLOGY>w $$retrieve^Dictionary(1,"partUri")
    akp
    MONTOLOGY>w $$retrieve^Dictionary(2,"partUri")
    meh
    MONTOLOGY>
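For readers without a Caché terminal at hand, the behaviour of these calls can be modeled roughly as follows. This is a Python stand-in, not the project's MUMPS code: the dictionary mirrors the name values from the ^NameDictionaryElement listing (including the empty ru names as they appear there), and the default arguments, language "ru" and version 0, are an assumption inferred from the calls that pass only an identifier.

```python
# Name values from the ^NameDictionaryElement listing,
# keyed by (element id, language, version).
name_dictionary_element = {
    (1, "partUri", 0): "akp",
    (1, "ru", 0): "",
    (2, "partUri", 0): "meh",
    (2, "ru", 0): "",
}


def retrieve(element_id, language="ru", version=0):
    """Return the stored name, or "" when nothing is stored for these indices.

    dict.get with a default plays the role of $get(variable, "") here:
    a missing entry yields the default instead of raising an error.
    """
    return name_dictionary_element.get((element_id, language, version), "")


akp = retrieve(1, "partUri")
meh = retrieve(2, "partUri")
missing = retrieve(99, "partUri")
```

The lookup is a single keyed read with no search, which is exactly what the first main requirement asks of retrieve.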
Of course, these examples are not far removed from "school" ones. In future articles I plan to describe how global indices are designed and how to create rules for globals and programs that work with those rules. The directories will get more complex structures. But the names of these globals will stay the same (new ones will simply be added), since they are actually used in a live project, and so will the retrieve method. All of that is for the following articles.
Thank you for your attention.
I will be glad to receive questions and comments.