A little more about how MarkLogic Server stores data.
About data formats
MarkLogic Server is an XML database, but in addition to XML it can store JSON, text, and binary data. JSON documents, however, are converted to XML when they are loaded into the database. Text documents are indexed as XML text nodes that have no parent element. Binary documents are not indexed by default, but an index can be built over their metadata and extracted content.
About indexes
Indexes are used throughout MarkLogic to improve database performance. Out of the box, a text index and a structure index are available. They cover all XML data and are used when XQuery queries are executed, which keeps those queries efficient. Metadata indexes are also available: Collection Indexes, Directory Indexes, Security Indexes, and Properties Indexes.
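To make the idea of the two default indexes more concrete, here is a minimal Python sketch of an inverted text index (word to document URIs) and a structure index (element path to document URIs) built over two tiny XML documents. It only illustrates the concept: the function name `build_indexes`, the sample URIs, and the data layout are assumptions made for this example and do not describe how MarkLogic implements its indexes internally.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(docs):
    """Build toy text and structure indexes (illustrative only).

    docs: dict mapping a document URI to an XML string.
    Returns (text_index, structure_index); each maps a key to a set of URIs.
    """
    text_index = defaultdict(set)       # word -> URIs containing the word
    structure_index = defaultdict(set)  # element path -> URIs containing the path

    def walk(elem, parent_path, uri):
        # Record the path of this element, then descend into its children.
        path = f"{parent_path}/{elem.tag}"
        structure_index[path].add(uri)
        for child in elem:
            walk(child, path, uri)

    for uri, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        walk(root, "", uri)
        # Lower-case all text content and split it into words.
        for word in re.findall(r"\w+", " ".join(root.itertext()).lower()):
            text_index[word].add(uri)

    return text_index, structure_index


# Hypothetical sample documents.
docs = {
    "/books/1.xml": "<book><title>XQuery basics</title><author>Ann</author></book>",
    "/books/2.xml": "<book><title>Index internals</title></book>",
}
text_index, structure_index = build_indexes(docs)

# "Documents that have an author element and mention 'xquery'" becomes an
# intersection of two index lookups instead of a scan over every document.
hits = text_index["xquery"] & structure_index["/book/author"]
print(sorted(hits))
```

The point of the sketch is the last step: once both indexes exist, a query that combines a word condition with a structural condition can be answered from the indexes alone.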
Note that indexes in MarkLogic Server can be 2 or even 3 times larger than the XML data itself, but only when a large number of indexes are enabled. The ratio is also affected by the fact that MarkLogic compresses XML data when storing it. With the default settings, the index size is usually small relative to the source data.
About internal representation
Let us look in a little more detail at how data is stored in MarkLogic Server. The main concepts are the following:
Database is the highest-level abstraction over the internal representation of data in MarkLogic Server. It provides access to the data as a single entity, regardless of scaling mechanisms and internal representation.
The Database object combines security settings, XML document schemas, a set of triggers, in-memory cache settings, indexes, search options, logging settings, replication options, backup settings, and a set of Forest objects.
Forests are the objects in which data and indexes are stored. A database can have more than one Forest, and the Forests can be located on the same server or on different servers. The “local-disk failover” mechanism operates on Forest objects: one or more “replica forest” objects are assigned to a Forest, which increases reliability.
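The relationship between a Forest and its replica forests can be sketched in a few lines of Python. This is a deliberately simplified model, not MarkLogic's failover logic: the `Forest` class, the `available` flag, the `pick_serving_forest` helper, and the forest and host names are all hypothetical, and the sketch ignores how replicas are kept in sync with the primary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Forest:
    # Hypothetical, simplified view of a forest for this example only.
    name: str
    host: str
    available: bool = True
    replicas: List["Forest"] = field(default_factory=list)


def pick_serving_forest(primary: Forest) -> Optional[Forest]:
    """Return the primary if it is up, otherwise the first available replica."""
    if primary.available:
        return primary
    for replica in primary.replicas:
        if replica.available:
            return replica
    return None  # neither the primary nor any replica can serve requests


replica = Forest(name="docs-1-replica", host="host-b")
primary = Forest(name="docs-1", host="host-a", replicas=[replica])

primary.available = False  # simulate losing the primary forest
serving = pick_serving_forest(primary)
print(serving.name if serving else "no forest available")
```

Placing the replica on a different host than the primary, as in the sample data, is what lets the data stay reachable when one host or disk is lost.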
A Forest has significantly fewer settings than a Database. For a Forest, you can configure where its data is kept on the file system (the “data directory”), where large objects are stored (the “large data directory”), and the location of the so-called “fast data directory”, i.e. a directory on a fast file system.
The “fast data directory” is used to store the transaction log and data fragments. It should be located on a different storage device from the “data directory”. When the “fast data directory” fills up, large objects from it are merged with the data in the “data directory”. Inside a Forest, data is stored in Stand objects.
A Stand is a component of a Forest. Each Stand is a packed binary file stored in a subdirectory of the Forest. The Stand itself consists of XML fragments.
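The containment hierarchy described above (Database, Forest, Stand, fragments) can be summarized as a small Python data model. This is only a sketch for orientation: the class and field names mirror the terms used in the text, while the method `all_fragments`, the directory paths, and the sample fragments are assumptions invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stand:
    # A stand holds XML fragments (represented here as plain strings).
    fragments: List[str] = field(default_factory=list)


@dataclass
class Forest:
    name: str
    data_directory: str                         # main location of the forest data
    large_data_directory: Optional[str] = None  # optional location for large objects
    fast_data_directory: Optional[str] = None   # optional location on fast storage
    stands: List[Stand] = field(default_factory=list)


@dataclass
class Database:
    name: str
    forests: List[Forest] = field(default_factory=list)

    def all_fragments(self) -> List[str]:
        """Expose the data as a single entity, regardless of forest or stand."""
        return [
            fragment
            for forest in self.forests
            for stand in forest.stands
            for fragment in stand.fragments
        ]


# Hypothetical layout: one database spread over two forests.
db = Database(
    name="Documents",
    forests=[
        Forest(
            name="docs-1",
            data_directory="/data/forests",
            fast_data_directory="/ssd/forests",
            stands=[Stand(["<a/>", "<b/>"]), Stand(["<c/>"])],
        ),
        Forest(
            name="docs-2",
            data_directory="/data/forests",
            stands=[Stand(["<d/>"])],
        ),
    ],
)

print(len(db.all_fragments()))
```

The `all_fragments` method reflects the role of the Database as described earlier: callers see one collection of data and do not need to know which Forest or Stand a given fragment lives in.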