
Blocks. Inside the Caché database file

Not long ago, the InterSystems blog on Habr published articles about globals in Caché: what they are and how they are used (part 1 and part 2). The freedom to work with whatever data model the developer wishes is interesting, of course. But what makes working with these globals fast?



Theory


A Caché database is a directory named after the database that contains the file CACHE.DAT. On *nix systems, a disk partition can also serve as a database.

Data in Caché is stored in blocks, which in turn are organized as a balanced B*-tree. Recall that globals are stored as a tree: in simplified terms, the subscripts of the globals are the branches and the values of the globals are the leaves. A balanced B*-tree differs from a regular B-tree in that its branches have right links, which (in our case, with globals) make it possible to traverse subscripts quickly with the $Order and $Query functions without going back up to the trunk of the tree.
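The role of the right links can be illustrated with a toy model. The sketch below is not Caché's on-disk format: the `DataBlock` class and the `iterate_from` helper are hypothetical names, and subscripts are plain strings compared in Python's default order. It only shows how following right links walks the leaves in order without returning to the upper levels of the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple


@dataclass
class DataBlock:
    """Toy leaf block: subscript -> value pairs plus a right link."""
    nodes: Dict[str, str] = field(default_factory=dict)
    right: Optional["DataBlock"] = None  # None plays the role of a zero link


def iterate_from(block: Optional[DataBlock]) -> Iterator[Tuple[str, str]]:
    """Yield every (subscript, value) pair by following right links only."""
    while block is not None:
        for subscript in sorted(block.nodes):
            yield subscript, block.nodes[subscript]
        block = block.right  # move sideways, never back up the tree


# Three leaves chained left to right.
third = DataBlock({"e": "5", "f": "6"})
second = DataBlock({"c": "3", "d": "4"}, right=third)
first = DataBlock({"a": "1", "b": "2"}, right=second)

subscripts_in_order = [s for s, _ in iterate_from(first)]
```

Starting from the leftmost leaf, the loop visits all six subscripts in order while touching only leaf blocks, which is roughly the saving the right links give an ordered traversal.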
The block size in a database file is fixed. By default it is 8192 bytes, but the system can be configured to allow creating databases with block sizes of 16 kB, 32 kB and 64 kB. The developer can choose the block size to suit the nature of the data to be stored. Keep in mind, though, that data is always read block by block: even if a single 1-byte value is requested, several blocks are read, and only the last block in that chain contains the data. Caché also keeps a separate global buffer for each block size. You cannot mount or create a database unless a global buffer is configured for its block size; attempting to do so results in an error.
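To make the "chain of blocks" point concrete, here is a minimal sketch of a lookup in a toy block tree. The block numbers, the `BLOCKS` table and the `lookup` function are all made up for illustration and do not reflect Caché's real block layout; the sketch only counts how many whole blocks have to be read to reach one value.

```python
from typing import Dict, List, Optional, Tuple

# Toy "database": block number -> (block type, contents).
# Pointer blocks map the first subscript of a child block to its block number;
# data blocks map subscripts to values. All numbers are hypothetical.
BLOCKS: Dict[int, Tuple[str, dict]] = {
    10: ("pointer", {"a": 11, "m": 12}),   # top pointer block
    11: ("pointer", {"a": 20, "g": 21}),   # bottom pointer blocks
    12: ("pointer", {"m": 22, "t": 23}),
    20: ("data", {"a": "1", "c": "3"}),
    21: ("data", {"g": "7", "k": "11"}),
    22: ("data", {"m": "13", "p": "16"}),
    23: ("data", {"t": "20", "x": "24"}),
}


def lookup(root: int, subscript: str) -> Tuple[Optional[str], List[int]]:
    """Return the value for a subscript and the list of blocks read on the way."""
    blocks_read: List[int] = []
    number = root
    while True:
        kind, contents = BLOCKS[number]
        blocks_read.append(number)  # every step costs one whole block read
        if kind == "data":
            return contents.get(subscript), blocks_read
        # Follow the last child whose first subscript is <= the one requested.
        candidates = [key for key in sorted(contents) if key <= subscript]
        number = contents[candidates[-1] if candidates else min(contents)]


value, path = lookup(10, "k")
```

In this toy tree, fetching the single value stored under "k" reads three blocks, and only the last of them is a data block.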





In the picture, memory for the global buffer is allocated only for databases with 8 kB blocks, so only databases with 8 kB blocks will work on this system. Blocks in a database are grouped into maps. With 8 kB blocks, one map describes 62464 blocks and is stored in the map block, which comes first in the map.
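The map figures translate into simple arithmetic. The sketch below uses only the two numbers given in the text (8192-byte blocks, 62464 blocks per map). The function names are hypothetical, and the assumption that maps simply follow one another with no other overhead is a simplification for the sake of the estimate.

```python
BLOCK_SIZE = 8192          # bytes, the default block size from the text
BLOCKS_PER_MAP = 62464     # blocks described by one map for 8 kB blocks


def map_span_bytes(block_size: int = BLOCK_SIZE,
                   blocks_per_map: int = BLOCKS_PER_MAP) -> int:
    """Bytes of database file described by a single map."""
    return block_size * blocks_per_map


def maps_needed(file_size_bytes: int) -> int:
    """Number of maps needed to describe a file of the given size (ceiling division)."""
    span = map_span_bytes()
    return -(-file_size_bytes // span)


span_mib = map_span_bytes() / (1024 * 1024)    # 488.0 MiB per map
maps_for_10_gib = maps_needed(10 * 1024 ** 3)  # 21 maps under the stated assumption
```

So one map covers 488 MiB of an 8 kB-block database, and a 10 GiB file would need about 21 of them.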

Types of blocks


There are several types of blocks. At each block level, the right link must point to a block of the same type, or to a zero block, which may mean that there is no further data.


The first block of a Caché database holds service information about the database file. The second is a map block. The first block of the global directory comes third (block number 3), and a database can have several directory blocks. Next come pointer blocks (branches), data blocks (the leaves of the tree) and big string blocks. The global directory block(s) store information about all the globals in the database. The directory can also keep the settings of a global that contains no data; in that case, the node describing the global has a null lower link. You can view the list of globals from the global directory in the management portal. You can also have a global kept in the directory after it is deleted, for example to preserve its collation.



In the same place you can create a new global. When you do, you can immediately choose any available collation, including one that differs from the database default.



In general, the tree of blocks can be represented as in the picture below. Blocks are marked in red.


Database integrity


Caché has matured to the point where the situations and errors that could corrupt a database have been minimized, and the need to repair a database arises less and less often. Even so, it is recommended to run an integrity check regularly and automatically. The ^Integrity utility exists for this purpose. It can be launched from the terminal in the %SYS namespace, from the Databases page of the management portal, or through the task manager. Incidentally, an automatic integrity check task is already configured by default but is disabled; you only need to activate it:





During the integrity check, the lower links, the block types and the right links are verified, and the order of the global nodes is checked against the collation. If the check finds errors, you can use the ^REPAIR utility, which is run in the %SYS namespace. It lets you view any block and, if necessary, edit it, i.e. repair the database.
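The kinds of checks listed above can be sketched on a toy block set. This is not how ^Integrity is implemented: the `Block` class, the `check` function and the block numbers are hypothetical, and plain string order stands in for the collation. The sketch only shows what "lower links, block types, right links and order" mean as concrete conditions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    """Toy block: a type, its keys, a right link (0 = none) and lower links."""
    kind: str                      # "pointer" or "data"
    keys: List[str]
    right: int = 0
    down: List[int] = field(default_factory=list)  # lower links, pointer blocks only


def check(blocks: Dict[int, Block]) -> List[str]:
    """Return a list of human-readable problems found in the toy block set."""
    errors: List[str] = []
    for number, block in sorted(blocks.items()):
        # Keys inside a block must follow the collation (here: plain string order).
        if block.keys != sorted(block.keys):
            errors.append(f"block {number}: keys out of order")
        # Every lower link must lead to an existing block.
        for child in block.down:
            if child not in blocks:
                errors.append(f"block {number}: lower link to missing block {child}")
        # The right link is either 0 or a block of the same type with larger keys.
        if block.right:
            neighbour = blocks.get(block.right)
            if neighbour is None:
                errors.append(f"block {number}: right link to missing block {block.right}")
            elif neighbour.kind != block.kind:
                errors.append(f"block {number}: right link to a {neighbour.kind} block")
            elif block.keys and neighbour.keys and neighbour.keys[0] <= block.keys[-1]:
                errors.append(f"block {number}: right neighbour is not in order")
    return errors


healthy = {
    1: Block("pointer", ["a", "m"], down=[2, 3]),
    2: Block("data", ["a", "c"], right=3),
    3: Block("data", ["m", "p"]),
}
damaged = {
    1: Block("pointer", ["a", "m"], down=[2, 4]),   # block 4 does not exist
    2: Block("data", ["c", "a"], right=1),          # unsorted, wrong-type neighbour
}

healthy_errors = check(healthy)
damaged_errors = check(damaged)
```

The healthy set produces no messages, while the damaged one reports a dangling lower link, an ordering problem and a right link that crosses block types.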

Practice


But all of this is theory. It is hard to judge from it what a global and its blocks actually look like. The only available way to view blocks is the ^REPAIR utility mentioned above. Its output looks like this:



I recently started a project that lets you walk through the block tree without the risk of damaging the database and view it conveniently in a browser, with the option of saving the visualization as SVG or PNG. The project is called CacheBlocksExplorer, and its source code is published on GitHub.



Of the features implemented:


What else needs to be done:


I also wanted to display the whole tree at once, but I have not yet found a library that can quickly render several hundred thousand blocks together with their links. With the current library, rendering in the browser turned out to be slower than reading the structure in Caché.

In the next article I will explain in more detail, with examples, how everything works and what can be learned about globals and blocks with a tool like the Cache Block Explorer that I developed.

Source: https://habr.com/ru/post/267951/

