Data in MarkLogic Server [Part 1]

MarkLogic Server is a document-oriented native XML database. As in any document-oriented database, data in MarkLogic Server can be represented as a structure of folders and files; this is exactly how the repository appears when you access it via WebDAV. Besides XML, MarkLogic Server can store arbitrary binary data as files.

The internal representation of XML data in MarkLogic Server is quite complex and will be discussed later. For now, it is worth noting that you can store only well-formed XML in MarkLogic Server, because XML is kept not as plain text but as an XML data object. XML data is represented internally in Unicode, which eliminates many problems with different languages. All entities in XML data are expanded into numeric character references. If a document uses only such numeric references, this causes no problems; otherwise, MarkLogic Server must "know" every entity the document uses.

Let us consider two practical questions: how to access documents in MarkLogic Server, and how to fill it with those documents in the first place.
In what follows, we assume that the database stores documents of the form

 <horse xmlns="ns1">
   <location>vacuum</location>
   <geometry>spherical</geometry>
 </horse>

and that they are stored like this:

 /horses/
     horse1.xml
     horse2.xml
     …

In XPath expressions we bind the prefix ns1 to the namespace URI "ns1".

First of all, we will look at how to retrieve (read) XML documents from the database in XQuery code.

To read a single document, use the fn:doc function.

 let $id := "horse1"
 let $uri := fn:concat("/horses/", $id, ".xml")
 return fn:doc( $uri )

The function takes the document URI as a parameter and returns the document's contents:

 fn:doc( [$uri as xs:string*] ) as document-node()*
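Since the signature accepts a sequence of URIs, several documents can be read in one call. The sketch below also guards against missing documents with the standard fn:doc-available function. The URI list is an illustrative assumption following the /horses/horseN.xml scheme, not code from the original article.

```xquery
xquery version "1.0-ml";
declare namespace ns1 = "ns1";

(: Illustrative URIs following the /horses/horseN.xml scheme assumed above :)
let $uris := ("/horses/horse1.xml", "/horses/horse2.xml")
(: Keep only the URIs that actually resolve to a document :)
let $existing := $uris[fn:doc-available(.)]
return
  (: fn:doc accepts a sequence of URIs; guard the empty case explicitly :)
  if (fn:exists($existing))
  then fn:doc($existing)/ns1:horse
  else ()
```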

The result of this function is a document node. For an XML document its child is the root element; for a text document it is a text() node, and for a binary document a binary() node.
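To tell these cases apart in code, one can inspect the children of the returned document node with a typeswitch. This is a sketch assuming MarkLogic's 1.0-ml dialect, where binary() is available as a node test; the URI is illustrative.

```xquery
xquery version "1.0-ml";

(: Illustrative URI; any document in the database would do :)
let $uri := "/horses/horse1.xml"
(: fn:doc returns a document node; its children hold the actual content :)
for $node in fn:doc($uri)/node()
return
  typeswitch ($node)
    case element() return fn:concat("XML document, root element ", fn:name($node))
    case text() return "text document"
    case binary() return "binary document"
    default return "other node, for example a comment"
```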

Any XPath expression can be applied to the XML result of this function, for example:

 fn:doc( $uri )/ns1:horse

or, to get the list of ns1:location elements:

 fn:doc( $uri )/ns1:horse/ns1:location

or, for this document structure, equivalently:

 fn:doc( $uri )//ns1:location
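The ns1 prefix used in these paths has to be declared in the query prolog. A complete query module might look like the sketch below (variable names are illustrative). The two paths return the same elements here only because, in the assumed document structure, ns1:location occurs solely as a child of ns1:horse.

```xquery
xquery version "1.0-ml";
(: Bind the ns1 prefix to the namespace URI used by the horse documents :)
declare namespace ns1 = "ns1";

let $uri := "/horses/horse1.xml"
(: Explicit path from the root element :)
let $by-path := fn:doc($uri)/ns1:horse/ns1:location
(: Descendant search anywhere in the document :)
let $by-descendant := fn:doc($uri)//ns1:location
return (
  (: String value of the first location element, or "" if there is none :)
  fn:string($by-path[1]),
  (: Both paths find the same number of elements for this structure :)
  fn:count($by-path) eq fn:count($by-descendant)
)
```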

Documents in MarkLogic Server can be grouped logically by storing them in different directories. For example, the documents (objects) "horseN.xml" are stored in the directory "/horses/". However, quite often you need groups (associations) of documents that overlap. For this purpose, and also to speed up access to documents, MarkLogic Server provides a collection mechanism. A document can belong to any number of collections at the same time.

Next, let us look at how to get documents from a collection. Suppose our documents belong to the "horses-collection" collection; then the collection is accessed as follows:

 let $collections := ("horses-collection")
 return fn:collection($collections)

where $collections is the list of collections whose documents you want to get.

 fn:collection( [$uri as xs:string*] ) as document-node()*

This function returns the documents that belong to any of the specified collections, as a sequence of document nodes.
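Passing several collections to fn:collection gives their union. For overlapping groups you often need the intersection instead, which can be expressed with cts:search and cts:collection-query. A minimal sketch; the second collection name is hypothetical.

```xquery
xquery version "1.0-ml";

(: "horses-collection" comes from the text; "spherical-collection" is hypothetical :)
let $a := "horses-collection"
let $b := "spherical-collection"
(: Union: documents that belong to at least one of the collections :)
let $either := fn:collection(($a, $b))
(: Intersection: documents that belong to both collections :)
let $both := cts:search(
  fn:doc(),
  cts:and-query((cts:collection-query($a), cts:collection-query($b))))
return (fn:count($either), fn:count($both))
```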

The $uri parameter of fn:collection is optional. If it is omitted, fn:collection returns all documents in the database:

 fn:collection()

Note that collections in MarkLogic Server do not need to be created or configured in advance. When you add a document to a collection that does not exist yet, the collection is created along with it and the document is placed in it. This lets you create collections dynamically, which gives even more flexibility in organizing fast access to documents in the repository.
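Collections can also be attached to documents that already exist. The sketch below adds an existing document to a collection that has never been used before, assuming /horses/horse1.xml is already in the database; the collection name is hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical collection name that has never been used before :)
let $new-collection := "vacuum-horses"
(: No setup needed: adding the document brings the collection into existence :)
return xdmp:document-add-collections("/horses/horse1.xml", $new-collection)
```

Because this is an update, the change becomes visible only to later queries; a subsequent call to xdmp:document-get-collections("/horses/horse1.xml") should then list "vacuum-horses".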

The developers also provided a shorthand for accessing documents that simplifies XQuery code:

 /ns1:horse

this expression is equivalent to the following

 fn:collection()/ns1:horse

That is, examine all documents in the repository and return the ns1:horse root elements of those documents that have one.

In other words, an XPath expression can be written in XQuery code as a standalone expression; it is then applied to all documents in the repository, and the result of its evaluation is returned.

Use this approach with great care, because with a large number of documents it may require considerable resources and time.

The following expression returns the same ns1:horse elements (given our document structure), but evaluating it means checking every element in the database at any depth, which is a very resource-intensive task.
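One way to limit the cost is to scope the path to a collection, check the expected size first, and fetch results page by page. The sketch below assumes the "horses-collection" collection from above; xdmp:estimate returns an index-based count of a searchable expression without retrieving the documents, and the page size of 10 is arbitrary.

```xquery
xquery version "1.0-ml";
declare namespace ns1 = "ns1";

(: Index-based estimate of how many results the scoped expression yields :)
let $estimate := xdmp:estimate(fn:collection("horses-collection")/ns1:horse)
(: Fetch only the first page of results instead of everything :)
let $first-page := (fn:collection("horses-collection")/ns1:horse)[1 to 10]
return ($estimate, $first-page)
```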

 //ns1:horse

This way of retrieving data is justified only when you need to select all elements whose exact location in the XML documents is unknown or varies. Only then is it justified to scan all elements in the repository. Keep in mind, too, that query performance in this case will be an order of magnitude lower, especially with a large number of documents or of elements in them.

Sometimes documents are stored in one directory without being added to a collection, yet their placement in a single directory is meaningful to the application logic, which needs to access them as a single unit. For this task you can use the following:

 xdmp:directory("/horses/", "1")/ns1:horse

The xdmp:directory function returns all documents in the specified directory.

 xdmp:directory( $uri as xs:string*, [$depth as xs:string?] ) as document-node()*

Here the optional $depth parameter takes one of two values, "1" or "infinity", and sets how deeply nested the documents included in the result may be. With $depth = "1", MarkLogic Server limits itself to documents directly in the specified directory, while "infinity" makes it scan all subdirectories for documents.

All the XPath expressions above are simple, but in practice you may encounter something like this:
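The two depth settings can be compared directly. The sketch also shows cts:directory-query, which expresses the same directory scope as a query that can be combined with others (for example, with collection queries). Directory URIs are assumed to end with "/".

```xquery
xquery version "1.0-ml";
declare namespace ns1 = "ns1";

(: Only documents directly inside /horses/ :)
let $top-level := xdmp:directory("/horses/", "1")/ns1:horse
(: Documents in /horses/ and in all of its subdirectories :)
let $all-levels := xdmp:directory("/horses/", "infinity")/ns1:horse
(: The same "infinity" scope as a cts query, combinable with other queries :)
let $via-search :=
  cts:search(fn:doc(), cts:directory-query("/horses/", "infinity"))/ns1:horse
return (fn:count($top-level), fn:count($all-levels), fn:count($via-search))
```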

 //ns1:location[ (fn:starts-with(., "va") and fn:ends-with(., "m")) or (. eq "location1") ]

A complex XPath expression greatly increases the resources a query consumes. It is better to narrow down the documents retrieved from the database with the database's own mechanisms, such as collections and directories, and to use only simple XPath expressions where needed.

The second important question is how to fill the database. Let me note right away that MarkLogic Server does not support the XQuery Update Facility extension to XQuery; instead, it provides document manipulation functions through its own API.
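As an illustration of that advice, conditions in the same spirit as the XPath above (not an exact translation) can be expressed as a cts query and evaluated against a single collection, so the matching is driven by the indexes. A sketch assuming the "horses-collection" collection; the "wildcarded" option enables the * wildcard in the value.

```xquery
xquery version "1.0-ml";
declare namespace ns1 = "ns1";

(: QName of the element being tested, built without relying on a prefix :)
let $location := fn:QName("ns1", "location")
let $query := cts:or-query((
  (: Whole value starts with "va" :)
  cts:element-value-query($location, "va*", "wildcarded"),
  (: Whole value equals "location1" :)
  cts:element-value-query($location, "location1")
))
(: Search only within one collection, returning matching ns1:horse elements :)
return cts:search(fn:collection("horses-collection")/ns1:horse, $query)
```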

Filling the database can be done in several ways:

1. Create a document directly from XQuery code, like this:

 declare variable $collections := ("horses-collection");

 let $uri := "/horses/horse1.xml"
 let $horse :=
   <horse xmlns="ns1">
     <location>vacuum</location>
     <geometry>spherical</geometry>
   </horse>
 return xdmp:document-insert( $uri, $horse, xdmp:default-permissions(), $collections )

The signature of xdmp:document-insert is:

 xdmp:document-insert( $uri as xs:string, $root as node(), [$permissions as element(sec:permission)*], [$collections as xs:string*], [$quality as xs:int?], [$forest-ids as xs:unsignedLong*] ) as empty-sequence()

where
$uri - the document's address relative to the repository root. In the MarkLogic Server database settings you can enable automatic directory creation; then $uri may refer to a nonexistent directory, which will be created along with the document;
$root - the body of the document;
$permissions - the document's access settings;
$collections - the list of collections the document should belong to.
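Several documents can be created in one query with a FLWOR loop. A minimal sketch; the number of documents is an arbitrary assumption. If a document already exists at a URI, xdmp:document-insert replaces it.

```xquery
xquery version "1.0-ml";

(: Arbitrary batch of three documents following the /horses/horseN.xml scheme :)
for $i in 1 to 3
let $uri := fn:concat("/horses/horse", $i, ".xml")
return xdmp:document-insert(
  $uri,
  <horse xmlns="ns1">
    <location>vacuum</location>
    <geometry>spherical</geometry>
  </horse>,
  xdmp:default-permissions(),
  "horses-collection"
)
```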

2. Upload data via WebDAV. This method is suitable for loading fairly large amounts of data into the repository. To use it, you must create a WebDAV application server for the target database in MarkLogic Server.

3. For large amounts of data or very specific tasks, you can use the Java utility RecordLoader to load documents into MarkLogic Server.

4. AutoLoader is another useful utility that tracks changes in the file system and automatically uploads documents to MarkLogic Server. It uses RecordLoader to do the actual loading.

Source: https://habr.com/ru/post/194550/
