NoSQL DBMS MarkLogic - a brief overview

This article introduces Habr readers to the MarkLogic (ML) NoSQL DBMS. A quick search suggests that it is little known among Russian-speaking IT people. With this overview I will try to correct that.

1) Its main purpose is to store large amounts of information that has no rigid structure (unlike in table-based DBMSs) and to search through it efficiently. MarkLogic is a self-contained server, not an add-on to another database server (for example, an SQL one).

2) The database model resembles files and folders. In effect, any database in ML is a virtual file system with directories, access control, timestamps, and so on. Each file is an XML document indexed by the server, and search works inside any XML document with regard to its markup. Plain-text and binary documents can also be stored; for binary documents, metadata can be indexed and searched. There is built-in processing of PDF files, images, archives, MS Office documents, and so on. Large binary files can be configured to be stored transparently in the host file system rather than in the database.
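To make "search with regard to markup" more concrete, here is a minimal Python sketch. It does not use MarkLogic at all: a plain dictionary stands in for the database, and the URIs, the `<article>` markup, and the `find_by_title` helper are all invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a database: document URI -> XML text.
# URIs and element names are hypothetical.
DOCS = {
    "/articles/2013/marklogic.xml":
        "<article><title>MarkLogic overview</title><body>XML search</body></article>",
    "/articles/2013/sql.xml":
        "<article><title>SQL basics</title><body>MarkLogic appears only here</body></article>",
    "/notes/todo.xml":
        "<note><title>MarkLogic todo</title></note>",
}

def find_by_title(docs, directory, word):
    """Return URIs under `directory` whose <article>/<title> contains `word`."""
    hits = []
    for uri, text in sorted(docs.items()):
        if not uri.startswith(directory):
            continue                      # restrict the search to one "folder"
        root = ET.fromstring(text)
        if root.tag != "article":
            continue                      # the markup matters, not just the text
        title = root.findtext("title", default="")
        if word.lower() in title.lower():
            hits.append(uri)
    return hits
```

The word "MarkLogic" occurs in all three documents, but only one of them has it inside the title of an article, so a markup-aware search returns just that one.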
3) Query language:

3.1) The basis is XPath (with some stretch, it can be seen as the counterpart of SQL for XML), although the more advanced constructs described in the XQuery standard and known as FLWOR expressions are used more often. The server can receive XQuery queries on the fly from the client program that opens the connection to the database, or take them from the database itself or from the host file system (see below).
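For readers who have not met FLWOR (for, let, where, order by, return), the following Python sketch mirrors the clauses of a small FLWOR expression step by step. The `<library>` document and the `titles_after` helper are made up for the example, and the code runs on Python's standard library rather than on an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Roughly equivalent XQuery, shown for comparison only:
#   for $b in /library/book
#   where $b/year > 2005
#   order by $b/title
#   return $b/title/text()

LIBRARY = """<library>
  <book><title>XQuery Basics</title><year>2007</year></book>
  <book><title>Old SQL</title><year>1999</year></book>
  <book><title>Document Stores</title><year>2012</year></book>
</library>"""

def titles_after(xml_text, year):
    """Titles of books published after `year`, sorted alphabetically."""
    root = ET.fromstring(xml_text)
    books = root.findall("./book")                                   # for
    recent = [b for b in books if int(b.findtext("year")) > year]    # where
    recent.sort(key=lambda b: b.findtext("title"))                   # order by
    return [b.findtext("title") for b in recent]                     # return
```

Calling `titles_after(LIBRARY, 2005)` yields the two titles published after 2005, in alphabetical order.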

3.2) In addition to XPath, a query can be built from ML's built-in functions, which allow fast index-based search and return results ordered by relevance. The result of any query is a set of items in XML, plain-text, or binary form.
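The idea of index-based search with relevance ordering can be sketched in a few lines of Python. This is a toy inverted index that scores documents by raw word counts; it is only an assumption-laden illustration of the principle and says nothing about how MarkLogic actually builds its indexes or computes relevance. The helper names and sample documents are hypothetical.

```python
import re
from collections import Counter, defaultdict

def build_index(docs):
    """Map each word to {uri: number of occurrences in that document}."""
    index = defaultdict(Counter)
    for uri, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word][uri] += 1
    return index

def search(index, query):
    """Return URIs matching any query word, highest score first."""
    scores = Counter()
    for word in re.findall(r"\w+", query.lower()):
        # Look the word up in the index instead of scanning every document.
        for uri, count in index.get(word, {}).items():
            scores[uri] += count
    # Ties are broken by URI so the order is deterministic.
    return sorted(scores, key=lambda uri: (-scores[uri], uri))

SAMPLE = {
    "/a.xml": "xml search and more xml",
    "/b.xml": "xml storage",
    "/c.xml": "binary files",
}
INDEX = build_index(SAMPLE)
```

With this data, `search(INDEX, "xml search")` ranks `/a.xml` ahead of `/b.xml`, because the query words occur three times in the former and once in the latter.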

4) Queries are executed either through the built-in HTTP server or through XDBC / ODBC.

4.1) Over HTTP you can both request documents from the database by path and name and execute saved XQuery queries. The result can be returned in any form (XML / HTML, JSON, binary, etc.). The query can read the request headers and control the response headers. The latest version of ML also makes it possible to create RESTful services.
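As a small illustration of the RESTful route, the Python sketch below only builds request URLs and sends nothing. It assumes a REST API instance that exposes the `/v1/documents` and `/v1/search` endpoints; the host, the port, and the document URI are placeholders, and authentication is left out entirely.

```python
from urllib.parse import urlencode

def document_url(host, port, uri):
    """URL asking the REST API for the document stored at `uri`."""
    return f"http://{host}:{port}/v1/documents?" + urlencode({"uri": uri})

def search_url(host, port, text):
    """URL asking the REST API for documents matching the search string `text`."""
    return f"http://{host}:{port}/v1/search?" + urlencode({"q": text})

# Hypothetical server and document, for illustration only.
DOC_URL = document_url("localhost", 8003, "/articles/overview.xml")
SEARCH_URL = search_url("localhost", 8003, "xml search")
```

The document URI travels as a query parameter, so `urlencode` escapes its slashes; the resulting URLs could then be fetched with any HTTP client.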

4.2) A query can also be sent via XDBC, for which there are connector modules for Java, C#, and Perl. Complex XQuery code can be debugged step by step. (I use the oXygen IDE for this, but reportedly there is also a plugin for Eclipse.)

5) Documents are saved and modified with ML's built-in functions. (Unlike SQL, XQuery in its pure form is so far only a query language.) Each XQuery query is a transaction, and by default its changes are committed to the database only after the script completes successfully. MarkLogic can validate data against an XML Schema when the appropriate XQuery statement is invoked.
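The "commit only on success" behavior can be pictured with the following Python sketch. It models the observable effect (all changes or none), not the mechanism MarkLogic uses internally; the dictionary database, the `run_transaction` helper, and the two sample scripts are hypothetical.

```python
import copy

def run_transaction(database, script):
    """Apply `script` to a working copy; commit only if it finishes without error."""
    working = copy.deepcopy(database)
    script(working)            # if this raises, `database` is left untouched
    database.clear()
    database.update(working)   # commit: make all changes visible at once

def good_script(db):
    db["/b.xml"] = "<b/>"

def bad_script(db):
    db["/c.xml"] = "<c/>"      # this change must not survive the failure below
    raise ValueError("script failed")

db = {"/a.xml": "<a/>"}
run_transaction(db, good_script)
try:
    run_transaction(db, bad_script)
except ValueError:
    pass                       # the failed script left no trace in `db`
```

After both calls the database contains `/a.xml` and `/b.xml` only: the document written by the failing script was discarded together with the rest of its transaction.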

6) Data can be loaded by executing the appropriate XQuery via XDBC, through a RESTful service, or via the built-in WebDAV server.
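For the RESTful variant, loading a document comes down to an HTTP PUT. The Python sketch below prepares such a request without sending it. It assumes a REST API instance with a `/v1/documents` endpoint that takes the target URI as a query parameter; the host, port, and document are placeholders, and authentication is omitted.

```python
from urllib.parse import urlencode
from urllib.request import Request

def upload_request(host, port, uri, xml_text):
    """Prepare (but do not send) a PUT request that stores `xml_text` at `uri`."""
    url = f"http://{host}:{port}/v1/documents?" + urlencode({"uri": uri})
    return Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="PUT",
    )

# Hypothetical server and document, for illustration only.
REQ = upload_request("localhost", 8003, "/notes/hello.xml", "<note>hi</note>")
```

Passing the prepared request to `urllib.request.urlopen` (with suitable credentials configured) would perform the actual upload.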

7) Separately, I would note the ability to perform XSLT transformations with ML. An XSLT stylesheet has access to all server functions and data. The stylesheet itself can be taken from the database (or from the host file system) or received from outside.

8) Configuration is done through the built-in web interface, although the configuration files can also be edited directly. As far as I can tell, there are no built-in facilities for working from the command line; I used Perl scripts for that.

9) ML contains a huge set of functions for administering the server itself, for managing the data stored in its databases, for searching by indexes, and so on. There are also functions for working with the file system of the host on which the DBMS runs, with network resources, and with transactions, schedules, events (triggers), etc. In essence, ML can be regarded as a kind of virtual OS for the scripts running on it (taking item 2 into account). This, in my opinion, is the main advantage of this DBMS: the server side of a project of any complexity can be built on any hardware platform (ML is released for all the more or less well-known ones), using a single language, XQuery, and knowing only one API.

10) Licensing: this is a commercial product, but there is a free version of the DBMS whose license permits commercial use (with reduced capabilities: for example, stored data is limited to 40 GB, no more than 2 processors are used, and clustering is unavailable). Until recently it was an industrial DBMS used mainly in large companies, projects, and public institutions (for example, the US Library of Congress). The free license, however, has made it convenient for startups as well. This approach to licensing is logical: most projects in their early stages rarely face high loads and do not need specialized functionality, but as the business grows and becomes profitable, buying the full-featured version becomes justified.

In the future I intend to publish my notes on developing scripts for this DBMS and on exploring its capabilities, and I will not ignore other DBMSs of this class either.

Source: https://habr.com/ru/post/170605/
