Hello,
In August, we released a large number of new features on Microsoft Azure, and quite naturally one of the most interesting for our audience was the document-oriented NoSQL database service called DocumentDB. The time has come to start writing about it, and the first article, as usual, is an introduction.

- What is it? Basic concepts.
- How do you create an account?
What is Azure DocumentDB?
Modern applications constantly produce, and often also consume, large amounts of data. The data, like the application, changes over time, and the data schema changes with it. This regularly leads to the idea that a schema-free NoSQL database is a good fit for such scenarios: a quick, simple and customizable solution. However, many of these technologies do not support complex queries or transactional processing, which makes nontrivial data models harder to manage.
Microsoft Azure DocumentDB is a document-oriented NoSQL database designed for web and mobile applications, with guaranteed fast reads and writes, schema flexibility and the ability to scale the database up and down quickly. DocumentDB also offers complex queries in a SQL dialect, JavaScript support, transactions spanning multiple documents and much more. DocumentDB works natively with JSON documents, and deep integration with JavaScript lets you execute business logic right inside the engine, within a transaction.
Azure DocumentDB offers:
- Queries with SQL syntax: store heterogeneous JSON documents in DocumentDB and query them.
- Highly concurrent, lock-free indexing technology with automatic indexing of document content (so there is no need to specify schema hints, secondary indexes or views).
- JavaScript inside the database: logic is written as stored procedures, triggers and UDFs, which means you can put the logic on top of JSON without the risk of the application and the database schema getting out of sync.
- Fully transactional execution of JavaScript logic in the engine (INSERT, REPLACE, DELETE and SELECT in JavaScript run as an isolated transaction).
- Four tunable consistency levels: Strong, Bounded Staleness, Session and Eventual.
- Full manageability: there is no need to manage the database and machine resources, since DocumentDB is provided as a service. Each database is automatically backed up and protected against regional failures.
- Simple scaling through units of storage and throughput.
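To make the first point concrete, here is a minimal sketch of querying heterogeneous JSON documents. The documents and field names are made up for illustration, and the Python filter is only an in-memory equivalent of the query string; it does not talk to DocumentDB.

```python
# Heterogeneous JSON documents: each one carries only the fields it needs.
# All ids and field names here are hypothetical.
documents = [
    {"id": "1", "type": "diet", "food": "oatmeal", "calories": 150},
    {"id": "2", "type": "exercise", "activity": "running", "minutes": 30},
    {"id": "3", "type": "diet", "food": "pizza", "calories": 700, "fat": 28},
]

# A query in the DocumentDB SQL dialect; "c" is an alias for the collection.
QUERY = "SELECT * FROM c WHERE c.type = 'diet' AND c.calories > 500"


def run_equivalent_filter(docs):
    """In-memory equivalent of QUERY, for illustration only."""
    return [
        d for d in docs
        if d.get("type") == "diet" and d.get("calories", 0) > 500
    ]


matches = run_equivalent_filter(documents)
print([d["id"] for d in matches])  # ['3']
```

Documents without a `calories` field simply do not match, so no schema has to be declared up front.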
Azure DocumentDB Resources
In Azure DocumentDB, data is replicated and addressed by URI, with simple RESTful access to all resources. You have a database account, which is a unique global namespace. All resources inside this namespace are represented as JSON documents with metadata and are grouped into collections. The figure shows the relationship between DocumentDB resources.

An account consists of a set of databases, each of which contains several collections, each of which in turn contains stored procedures, triggers, UDFs, documents and their attachments. Users can be assigned to a database with specific permissions to access collections, stored procedures, triggers, UDFs, documents and so on.
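The hierarchy above maps naturally onto resource paths. The sketch below assumes the path segments used by the REST API (`dbs`, `colls`, `docs` and so on); the ids are hypothetical, and the service may address resources by system-generated ids rather than by the readable names used here.

```python
# Path segment for each resource type (assumed from the REST API conventions).
SEGMENTS = {
    "database": "dbs",
    "collection": "colls",
    "document": "docs",
    "attachment": "attachments",
    "stored_procedure": "sprocs",
    "trigger": "triggers",
    "udf": "udfs",
    "user": "users",
    "permission": "permissions",
}


def build_path(*pairs):
    """Join (resource_type, id) pairs into a hierarchical resource path."""
    return "/".join(f"{SEGMENTS[kind]}/{rid}" for kind, rid in pairs)


doc_path = build_path(
    ("database", "health"), ("collection", "diet"), ("document", "entry-1")
)
perm_path = build_path(
    ("database", "health"), ("user", "alice"), ("permission", "read-diet")
)
print(doc_path)   # dbs/health/colls/diet/docs/entry-1
print(perm_path)  # dbs/health/users/alice/permissions/read-diet
```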
Develop with Azure DocumentDB
Since Azure DocumentDB exposes operations on resources through a REST API, requests can be made from any language that can speak HTTP/HTTPS. For several languages there are also dedicated libraries that simplify working with DocumentDB.
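Talking to the REST API without a library mostly comes down to signing each request. The sketch below builds the headers for a request using only the Python standard library and sends nothing over the network. It follows the master-key signing scheme as I understand it (HMAC-SHA256 over the verb, resource type, resource link and date), so treat the payload layout, the header names and the `x-ms-version` value as assumptions to check against the REST API reference. The key and the resource link are fake.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import quote


def auth_headers(verb, resource_type, resource_link, master_key_b64, date=None):
    """Build request headers for a master-key signed call (assumed scheme)."""
    date = date or formatdate(usegmt=True)  # RFC 1123 date in GMT
    # Assumed string-to-sign: verb, resource type, resource link, date, blank line.
    payload = (
        f"{verb.lower()}\n{resource_type.lower()}\n"
        f"{resource_link}\n{date.lower()}\n\n"
    )
    key = base64.b64decode(master_key_b64)
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    token = quote(f"type=master&ver=1.0&sig={signature}", safe="")
    return {
        "authorization": token,
        "x-ms-date": date,
        "x-ms-version": "2014-08-21",  # assumed preview API version
    }


# A fake key, base64-encoded the way a real account key would be.
FAKE_KEY = base64.b64encode(b"not-a-real-key").decode("ascii")
headers = auth_headers(
    "GET", "docs", "dbs/health/colls/diet", FAKE_KEY,
    date="Mon, 01 Sep 2014 00:00:00 GMT",
)
print(sorted(headers))
```

The resulting dictionary can be passed to any HTTP client together with the account host name.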
JavaScript transactions and execution
As already mentioned, in Azure DocumentDB you can write logic in JavaScript. These “programs” are registered on collections and operate on the documents within those collections. JavaScript code can be registered as triggers, stored procedures and UDFs; triggers and stored procedures can perform CRUD operations, while UDFs have no write access. All JavaScript logic executes inside an ambient ACID transaction with snapshot isolation, and it is positioned as a kind of modern replacement for T-SQL. If the JavaScript throws an exception at runtime, the entire transaction is rolled back.
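The all-or-nothing behavior is easy to picture with a small in-memory analogy. This is not DocumentDB code (real stored procedures are written in JavaScript and run inside the engine); it is a Python sketch, with hypothetical names, of the rule that an exception discards every write made by the logic.

```python
import copy


def run_in_transaction(collection, logic):
    """Run logic on a working copy; commit only if it finishes without raising."""
    working = copy.deepcopy(collection)
    try:
        logic(working)
    except Exception:
        return False  # rolled back: the original collection is untouched
    collection.clear()
    collection.update(working)
    return True


def move_favorite(docs):
    """Move a favorite between users, then fail after the partial writes."""
    docs["user-2"]["favorites"].append(docs["user-1"]["favorites"].pop())
    raise ValueError("validation failed")  # simulated runtime exception


store = {
    "user-1": {"favorites": ["oatmeal"]},
    "user-2": {"favorites": []},
}
committed = run_in_transaction(store, move_favorite)
print(committed, store["user-1"]["favorites"])  # False ['oatmeal']
```

Both writes happened on the working copy before the exception, yet neither is visible afterwards.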
Let's look at an example!
MSN (msn.com) is a huge portal visited by half a billion users per month, hence the need for large, scalable, distributed storage with a flexible schema. At some point, the development team decided to move everything to Azure and build a single distributed storage system there, the User Data Store (UDS), with the following requirements:
- Scaling to 425+ million unique users and 100+ million already authenticated users
- 20 terabytes of storage
- Write latency of up to 15 ms
- No fixed schema
- Transaction support
- Hadoop analytics over data
- Geographical distribution and availability
The choice fell on Azure DocumentDB. One part of the system, Health and Fitness, consists of the following components:
- Diet Tracker: daily diet monitoring; each entry contains data on calories, fat, protein, etc.
- Exercise Tracker: exercise monitoring.
- GPS Tracker: GPS tracking. Metadata about what is happening is stored in DocumentDB.
- Pedometer: steps.
- Weight Tracker: weight.
- Analysis: historical data on diet, exercise, GPS, etc.
- Favorites and custom: bookmarks for your favorite food, exercises, metadata, etc.
The new MSN portal stores user data in DocumentDB using 150 capacity units on SSDs across three geographic regions.
Documents range in size from 1 to 10 kilobytes and share no common schema. Most collections are configured for optimal throughput with minimal indexing overhead.

UDS distributes user information across collections, and each user's data is stored in documents. Horizontal scaling is achieved by distributing the data by user ID.
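One common way to distribute data by user ID is to hash the ID and use the result to pick a collection. The sketch below shows that idea only; the article does not describe the scheme UDS actually uses, and the collection names and count are hypothetical.

```python
import hashlib

# Hypothetical set of collections that together hold all user data.
COLLECTIONS = [f"users-{i}" for i in range(8)]


def collection_for(user_id, collections=COLLECTIONS):
    """Pick a collection for a user with a stable hash of the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collections)
    return collections[index]


for uid in ("user-1", "user-2", "user-3"):
    print(uid, "->", collection_for(uid))
```

A cryptographic hash is used instead of the built-in `hash()` so that the mapping stays the same across processes and restarts. Changing the number of collections remaps most users, which a real system would have to handle with a migration or a different partitioning scheme.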
Create a DocumentDB account
Go to the new Microsoft Azure Management Portal and click New -> DocumentDB Account.

Alternatively, open the “Data, storage, + backup” category and choose DocumentDB.

In New DocumentDB (Preview), select the desired configuration.

In Name, enter the account name; it will be used in the account's address (host). The Pricing Tier setting cannot be changed yet, since the service is in preview and only one pricing option is available (more information about prices is here). In the optional settings you can specify the capacity that will be allocated to the account. It is measured in units, and by adding or removing units you can quickly scale the solution (a unit consists of a weighted amount of storage and throughput, and the default for an account is 1 unit). Read more about performance and throughput here. Creating an account takes a few minutes.


The account is created and ready to use. The default consistency level is set to Session.

You can see what is happening with DocumentDB accounts in the Browse window.

To sum up: we looked at what DocumentDB is, covered the basic concepts of the service, walked through a usage example and created an account. The next part will go deeper into the concepts and usage.