
Microsoft DocumentDB: second article, resources and concepts

As mentioned in the first article, DocumentDB exposes its functionality through a RESTful programming model. Entities stored in the database are called resources, and each one is addressed by a URI. You work with these resources using standard HTTP verbs, headers and status codes.
While we prepare a solid DocumentDB example (which takes time and thought) and answers to your questions on the first article, we suggest reading a little more about the resources and concepts DocumentDB is built on.




The DocumentDB resource model is a set of resources arranged in a fixed hierarchy within an account, each reachable through a stable URI. Everything starts with a DocumentDB account. An account is a logical container for databases; each database contains collections, which in turn contain documents along with stored procedures, triggers and UDFs. Each database also has users, and each user has a set of permissions for working with documents. Permissions are expressed as tokens, and collections are containers for JSON documents and the JavaScript logic that operates on them.
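To make the hierarchy concrete, here is a minimal sketch of how a resource path could be assembled from it. The helper `resourcePath` and the ids are hypothetical; the segment names (`dbs`, `colls`, `docs` and so on) follow the DocumentDB REST conventions, but the service may expect system-generated resource ids rather than the user-defined ids used here.

```javascript
// Hypothetical helper: builds a resource path from the account hierarchy.
// The ids used below are made up for illustration.
var SEGMENTS = {
    database: "dbs",
    collection: "colls",
    document: "docs",
    attachment: "attachments",
    storedProcedure: "sprocs",
    trigger: "triggers",
    udf: "udfs",
    user: "users",
    permission: "permissions"
};

function resourcePath(parts) {
    // parts is an ordered list of [resourceType, id] pairs, outermost first.
    return parts.map(function (part) {
        var segment = SEGMENTS[part[0]];
        if (!segment) throw new Error("Unknown resource type: " + part[0]);
        return segment + "/" + encodeURIComponent(part[1]);
    }).join("/");
}

console.log(resourcePath([
    ["database", "library"],
    ["collection", "books"],
    ["document", "book1"]
]));
// dbs/library/colls/books/docs/book1
```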

System resources (accounts, databases, collections, users, stored procedures, triggers and UDFs) have a fixed schema. Documents and attachments have no schema constraints and are therefore called user resources. Resources of both types are represented as JSON.


Each account, of which there can be many within a single Azure subscription, consists of capacity units that combine SSD storage with a fixed amount of throughput. Units can be added or removed at any time. You can create an account and change its settings in the Microsoft Azure Management Portal (portal.azure.com) or through the REST API; a good part of the platform's functionality can be managed via REST.
While the account is the top-level logical container, a database is a container for collections and users. An account can contain as many databases as you need.



You can store as much data in a database as you need, from a few gigabytes to petabytes, and all of it is kept on SSD storage with provisioned throughput. A database is not tied to a single machine: it can be a large database holding thousands of collections and terabytes of documents.

A collection is the next level of nesting: a container for JSON documents. A collection serves not only to group documents but also as the unit of scale for transactions and queries. The simplest way to scale is to add more collections and spread the data across them. Collections also resize automatically as you add or delete documents. While DocumentDB is in preview and offers a single tier (Standard Preview), a collection can grow to at most 10 GB.
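One possible way to spread documents over several collections is to pick the target collection on the client by hashing a key. The sketch below is only an illustration of that idea: `hashString`, `pickCollection` and the collection ids are hypothetical and are not part of DocumentDB.

```javascript
// Hypothetical client-side partitioning: choose a collection by hashing a key.
function hashString(value) {
    // Simple 32-bit rolling hash; good enough for an illustration only.
    var hash = 0;
    for (var i = 0; i < value.length; i++) {
        hash = (hash * 31 + value.charCodeAt(i)) >>> 0;
    }
    return hash;
}

function pickCollection(collectionIds, partitionKey) {
    if (collectionIds.length === 0) throw new Error("No collections configured");
    var index = hashString(String(partitionKey)) % collectionIds.length;
    return collectionIds[index];
}

var collections = ["books-0", "books-1", "books-2"];
console.log(pickCollection(collections, "alex"));
```

Note that with this simple modulo scheme, adding a collection changes the mapping for existing keys, so documents would have to be moved or the scheme replaced with something more stable.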

Automatic indexing



DocumentDB does not require you to define a schema at all. Documents do not assume one, and as soon as you add them to a collection DocumentDB indexes them automatically, so they can be queried right away. Automatic indexing of documents, with no need to think about schema or secondary indexes, is one of DocumentDB's main features. At the same time it sustains a steady volume of very fast writes while serving consistent queries.
Automatic indexing can be tuned by choosing an indexing policy, trading off performance against storage. You can disable automatic indexing altogether, or choose which documents are indexed and which are not. You can also choose between synchronous (consistent) and asynchronous (lazy) modes. By default the index is updated synchronously on every insert, replace or delete; switching to lazy mode may bring performance benefits for, say, collections with a large number of reads.
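The choices described above could be captured in a small policy object. This is a sketch under assumptions: `makeIndexingPolicy` is a hypothetical helper, `automatic` and `indexingMode` mirror the options in the text, and the `excludedPaths` field name and path syntax are assumed and may differ between service versions.

```javascript
// Hypothetical helper that assembles an indexing policy object.
function makeIndexingPolicy(options) {
    options = options || {};
    return {
        // Index every document unless automatic indexing is explicitly disabled.
        automatic: options.automatic !== false,
        // "Consistent" updates the index synchronously, "Lazy" asynchronously.
        indexingMode: options.lazy ? "Lazy" : "Consistent",
        // Assumed field name for paths that should not be indexed.
        excludedPaths: options.excludedPaths || []
    };
}

var lazyPolicy = makeIndexingPolicy({ lazy: true, excludedPaths: ["/notes/*"] });
console.log(JSON.stringify(lazyPolicy));
```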

Multi-document transactions



In an RDBMS, business logic is usually written as stored procedures and triggers that run inside a transaction, which forces the developer to know two different languages: the application language (JS, Python, etc.) and T-SQL. In DocumentDB, a JavaScript execution model is available at the collection level in the form of stored procedures and triggers, which gives efficient control over concurrency and indexing without juggling a separate set of tools.
DocumentDB wraps this logic in an ambient ACID transaction with snapshot isolation; if the JavaScript throws an exception during execution, the entire transaction is rolled back. The JavaScript runs inside the database engine, in the same address space as the buffer pool, which is good for performance.

function businessLogic(name, author) {
    var context = getContext();
    var collectionManager = context.getCollection();
    var collectionLink = collectionManager.getSelfLink();

    // Create a new document.
    collectionManager.createDocument(collectionLink, {id: name, author: author},
        function(err, documentCreated) {
            if(err) throw new Error(err.message);

            // Find the documents by a given author.
            var filterQuery = "SELECT * from root r WHERE r.author = 'George R.'";
            collectionManager.queryDocuments(collectionLink, filterQuery,
                function(err, matchingDocuments) {
                    if(err) throw new Error(err.message);
                    context.getResponse().setBody(matchingDocuments.length);

                    // Replace the author name in every matching document.
                    for (var i = 0; i < matchingDocuments.length; i++) {
                        matchingDocuments[i].author = "George RR Martin";
                        // we don't need to execute a callback because they are in parallel
                        collectionManager.replaceDocument(matchingDocuments[i]._self, matchingDocuments[i]);
                    }
                });
        });
}

All of this runs as a single transaction, started by an HTTP POST.

client.createStoredProcedureAsync(collection._self, {id: "CRUDProc", body: businessLogic})
    .then(function(createdStoredProcedure) {
        return client.executeStoredProcedureAsync(createdStoredProcedure.resource._self, "NoSQL Distilled", "Martin Fowler");
    })
    .then(function(result) {
        console.log(result);
    }, function(error) {
        console.log(error);
    });
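The all-or-nothing behaviour of such a script can be illustrated with a toy in-memory model. This sketch says nothing about how DocumentDB implements transactions internally; `runInTransaction` and the sample data are hypothetical and only show that a thrown exception discards every change made by the script.

```javascript
// Toy model of all-or-nothing execution over a plain JavaScript object.
function runInTransaction(store, script) {
    // Work on a deep copy so that a failure leaves the original untouched.
    var workingCopy = JSON.parse(JSON.stringify(store));
    try {
        script(workingCopy);
    } catch (err) {
        // Roll back: the caller keeps the original, unmodified store.
        return { committed: false, store: store, error: err.message };
    }
    return { committed: true, store: workingCopy };
}

var books = { book1: { author: "George R." } };

var failed = runInTransaction(books, function (docs) {
    docs.book1.author = "George RR Martin";
    throw new Error("validation failed"); // simulates an exception in the script
});
console.log(failed.committed, failed.store.book1.author);
// false George R.

var succeeded = runInTransaction(books, function (docs) {
    docs.book1.author = "George RR Martin";
});
console.log(succeeded.committed, succeeded.store.book1.author);
// true George RR Martin
```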


DocumentDB understands JSON and JavaScript natively, so there is no type mismatch between the application and the database. To learn more, see Azure DocumentDB REST APIs.

Stored procedures, triggers and UDF



As already mentioned, business logic can be written entirely in JavaScript as a stored procedure, trigger or UDF. Triggers and stored procedures can perform CRUD operations, while UDFs have no write access and are limited to simple computations, such as enumerating the results of a query and producing new results from them. Each stored procedure, trigger and UDF runs with a fixed quota of resources and cannot load external JavaScript libraries. If the allocated resources are exceeded, the operation is blocked.
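Because a UDF only computes a value from its inputs, its body is an ordinary side-effect-free function. The sketch below uses the same `{ id, body }` shape that this article uses for stored procedures; the id `REGEX_MATCH` and the sample inputs are chosen for illustration, and registration with the service is not shown.

```javascript
// A UDF definition: no access to the collection, only a computation on its arguments.
var regexMatchUdf = {
    id: "REGEX_MATCH",
    body: function (input, pattern) {
        // Returns true when the input string matches the given pattern.
        return input.match(pattern) !== null;
    }
};

// The body can be tried locally before it is registered anywhere.
console.log(regexMatchUdf.body("George R.", "^George"));
console.log(regexMatchUdf.body("Martin Fowler", "^George"));
```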

Stored procedures, triggers and UDFs are registered through the REST API. After registration they are precompiled and stored as bytecode, which is what gets executed.

Registration of stored procedures



Registering a stored procedure means creating a new stored procedure resource on a collection with an HTTP POST.

var storedProc = {
    id: "validateAndCreate",
    body: function (documentToCreate) {
        // Normalize the id before the document is written.
        documentToCreate.id = documentToCreate.id.toUpperCase();

        var collectionManager = getContext().getCollection();
        collectionManager.createDocument(collectionManager.getSelfLink(), documentToCreate,
            function(err, documentCreated) {
                if(err) throw new Error('Error while creating document: ' + err.message);
                getContext().getResponse().setBody('success - created ' + documentCreated.id);
            });
    }
};

client.createStoredProcedureAsync(collection._self, storedProc)
    .then(function (createdStoredProcedure) {
        console.log("Successfully created stored procedure");
    }, function(error) {
        console.log("Error");
    });


Execution of the stored procedure



A stored procedure is executed with another HTTP POST, with the necessary parameters passed in the request body.

var inputDocument = {id : "document1", author: "GG Marquez"};

client.executeStoredProcedureAsync(createdStoredProcedure.resource._self, inputDocument)
    .then(function(executionResult) {
        assert.equal(executionResult, "success - created DOCUMENT1");
    }, function(error) {
        console.log("Error");
    });
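At the HTTP level, registering and executing a stored procedure are both POST requests. The sketch below only describes those requests as plain objects and sends nothing; the helper names and the sample links are hypothetical, and the authentication and version headers that the service requires are deliberately left out.

```javascript
// Hypothetical helpers describing the two POST requests; no network calls are made.
function buildCreateSprocRequest(collectionLink, sproc) {
    return {
        method: "POST",
        // Stored procedures live under the "sprocs" segment of their collection.
        path: collectionLink.replace(/\/+$/, "") + "/sprocs",
        // A function cannot be serialized as JSON, so its source text is sent as a string.
        body: JSON.stringify({ id: sproc.id, body: sproc.body.toString() })
    };
}

function buildExecuteSprocRequest(sprocLink, params) {
    return {
        method: "POST",
        path: sprocLink.replace(/\/+$/, ""),
        // Parameters travel as a JSON array, in the order the procedure declares them.
        body: JSON.stringify(params)
    };
}

var createRequest = buildCreateSprocRequest("dbs/library/colls/books/", {
    id: "hello",
    body: function () { return 1; }
});
console.log(createRequest.method, createRequest.path);

var executeRequest = buildExecuteSprocRequest("dbs/library/colls/books/sprocs/hello/", [{ id: "document1" }]);
console.log(executeRequest.method, executeRequest.path, executeRequest.body);
```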


Trigger Registration



Registering a trigger means creating a new trigger resource on a collection with an HTTP POST. When doing so you can specify whether the trigger fires before or after the operation and the type of CRUD operation it applies to.

var preTrigger = {
    id: "upperCaseId",
    body: function() {
        var item = getContext().getRequest().getBody();
        item.id = item.id.toUpperCase();
        getContext().getRequest().setBody(item);
    },
    triggerType: TriggerType.Pre,
    triggerOperation: TriggerOperation.All
};

client.createTriggerAsync(collection._self, preTrigger)
    .then(function (createdPreTrigger) {
        console.log("Successfully created trigger");
    }, function(error) {
        console.log("Error");
    });
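To see what such a pre-trigger does to a document, its body can be run locally against a stand-in for the server-side context. The sketch below is a hypothetical test harness, not part of DocumentDB: `makeFakeContext` implements only the two request methods that the trigger uses, and `runPreTriggerLocally` supplies it in place of the real `getContext`.

```javascript
// Hypothetical stand-in for the server-side context (request part only).
function makeFakeContext(requestBody) {
    var body = requestBody;
    var request = {
        getBody: function () { return body; },
        setBody: function (newBody) { body = newBody; }
    };
    return { getRequest: function () { return request; } };
}

// Runs a trigger body against a document and returns what would be written.
function runPreTriggerLocally(triggerBody, documentToWrite) {
    var context = makeFakeContext(documentToWrite);
    // The trigger refers to getContext as a free variable, so it is rebuilt
    // from its source text with getContext supplied as a parameter.
    var bound = new Function("getContext", "return (" + triggerBody.toString() + ")();");
    bound(function () { return context; });
    return context.getRequest().getBody();
}

var upperCaseId = function () {
    var item = getContext().getRequest().getBody();
    item.id = item.id.toUpperCase();
    getContext().getRequest().setBody(item);
};

console.log(runPreTriggerLocally(upperCaseId, { id: "document1" }).id);
// DOCUMENT1
```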


You can remove the registration of a trigger by executing HTTP DELETE on the trigger resource.

client.deleteTriggerAsync(createdPreTrigger._self)
    .then(function(response) {
        return;
    }, function(error) {
        console.log("Error");
    });


Attachments



In DocumentDB you can store binary files (blobs) as special entities called attachments. An attachment is a special JSON document that references the actual file. For example:
The content of a book is kept in storage, either in DocumentDB or in any other store.
The application can store each user's metadata as a separate document; for example, Alex's metadata for book1 is available at /colls/alex/docs/book1.
Attachments point to the pages of the book: /colls/alex/docs/book1/chapter1, chapter2, and so on.
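The book example could be modelled as a list of attachment records, one per chapter, each pointing at the stored file. This is a sketch under assumptions: `makeChapterAttachments` is a hypothetical helper, the fields `id`, `contentType` and `media` are what an attachment record is assumed to carry, and the storage URL is a placeholder.

```javascript
// Hypothetical helper: builds attachment metadata that points at externally stored chapters.
function makeChapterAttachments(baseMediaUrl, chapterCount) {
    var attachments = [];
    for (var i = 1; i <= chapterCount; i++) {
        attachments.push({
            id: "chapter" + i,
            contentType: "application/pdf",
            // Link to the actual file; the document itself holds only metadata.
            media: baseMediaUrl + "/chapter" + i + ".pdf"
        });
    }
    return attachments;
}

var chapters = makeChapterAttachments("https://storage.example.com/book1", 2);
console.log(JSON.stringify(chapters, null, 2));
```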

Summary



In this introductory article we covered the basic principles and concepts of DocumentDB. The service is new, so we are still actively studying it ourselves, and we hope to present a good usage example soon. Stay in touch :)

Useful links


Source: https://habr.com/ru/post/241307/

