
MongoDB - make good coffee

Introduction

Friends, first of all I want to thank you for appreciating my work; it is gratifying and motivates me to continue. As for why we should buy our elephants, I think you already understood that from the first article: some of you have already downloaded and tried it, and some are only about to. Anyway, let's begin.

Today we will install MongoDB, then look at the freshly baked HabraLogger and spy on Habr's main page in real time.

Add the coffee and pour boiling water

First, pick the release you like and download it - Downloads.
You can choose any release; I use 1.1.1. If you are not afraid of crashes, install the latest one, 1.1.2 (I am not saying it will crash, only that it might).

If there is a binary build for your OS, I recommend installing that; if not, you will have to build from source. The simplest installation from a build is described well here - Quickstart. Building from source means working with SCons, and under FreeBSD you will hit the error "shell/dbshell.cpp:77: error: 'sigrelse' was not declared in this scope"; just comment out that line.

So, let's assume it is installed. If you plan to use replication, I advise starting `mongod --master`; with this flag MongoDB keeps an operation log (oplog). You can now play around using `mongo test` (test is the database name):

> db.habratest.save({abra: "foo", ka: true, da: 123, bra: {foo: "bar"}}); // save a document
> db.habratest.find({ka: true}); // find documents where ka == true
{"_id" : ObjectId( "4af18c86977fd21033ca67f8") , "abra" : "foo" , "ka" : true , "da" : 123 , "bra" : {"foo" : "bar"}}

And now let's create an index - or rather, make sure it exists:

> db.habratest.ensureIndex({"da": 1});
true
> db.habratest.find({da: 123}); // this query can use the index on da
{"_id" : ObjectId( "4af18c86977fd21033ca67f8") , "abra" : "foo" , "ka" : true , "da" : 123 , "bra" : {"foo" : "bar"}}
> db.habratest.update({da: 123},{"$set": {pi: 3.14159265}}); // add the field pi
> db.habratest.find({da: 123});
{"_id" : ObjectId( "4af18c86977fd21033ca67f8") , "abra" : "foo" , "ka" : true , "da" : 123 , "bra" : {"foo" : "bar"} , "pi" : 3.14159265}
> db.habratest.update({da: 123},{"$set": {"bra.pi": 3.14159265}}); // set a field inside the nested object bra
> db.habratest.find();
{"_id" : ObjectId( "4af190176ca438316825ddef") , "abra" : "foo" , "ka" : true , "da" : 123 , "pi" : 3.14159265 , "bra" : {"foo" : "bar" , "pi" : 3.14159265}}
> db.habratest.remove({ka: true});
> db.habratest.find();
> db.habratest.drop();
{"nIndexesWas" : 2 , "msg" : "all indexes deleted for collection" , "ns" : "test.habratest" , "ok" : 1}
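The second update in the session uses a dotted key ("bra.pi") to reach a field inside the nested object. As a rough illustration only, here is a plain JavaScript sketch of that idea; the helper setPath is hypothetical and is not how MongoDB implements $set internally.

```javascript
// Hypothetical helper: set a value at a dotted path, creating
// intermediate objects when they are missing.
function setPath(doc, path, value) {
  const parts = path.split(".");
  let target = doc;
  for (const part of parts.slice(0, -1)) {
    if (typeof target[part] !== "object" || target[part] === null) {
      target[part] = {};
    }
    target = target[part];
  }
  target[parts[parts.length - 1]] = value;
  return doc;
}

const doc = { abra: "foo", ka: true, da: 123, bra: { foo: "bar" } };
setPath(doc, "pi", 3.14159265);     // top-level field, like {$set: {pi: ...}}
setPath(doc, "bra.pi", 3.14159265); // nested field, like {$set: {"bra.pi": ...}}
console.log(JSON.stringify(doc));
```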

The _id property is generated automatically by the client and is present in every object. If you wish, you can construct the ID yourself and put useful data into it.

I think everything is clear. For the juicy details of these functions, I advise you to refer to the documentation.

Now let's connect a driver to our development tool; the list of available drivers is at mongodb.org/display/DOCS/Drivers. In this article I will use PHP, but in any other language everything will be quite similar. In PHP the driver is provided by the 'mongo' extension in PECL.

We drink the first cup

I have prepared a small sample application - HabraLogger.

I think everything is clear from the code and its comments; if not, I will be happy to answer questions. You may have noticed that at the beginning of the script a couple of classes are inherited from the originals. This is not required and only provides syntactic sugar in the form of $mongo->mydb->mycollection. The execution time is encouraging (0.411 milliseconds).

Surely some of you noticed the group() operation in this piece of code:

$stat['hostsHours'] = $db->hosts->group(
array('hour' => true) // keys
,array('count' => 0) // initial object
,'function (obj, prev) {++prev.count;}' // reduce function
,array('url' => $url) // condition
);

The first argument specifies the keys (fields) to group by - GROUP BY hour.
The second is the initial state of the object, as it is before the first iteration.
The third is the function used to reduce the set.
And the fourth is the filter that selects the set to be reduced.
The SQL analogue of this action is SELECT hour, COUNT(*) AS count FROM hosts WHERE url = $url GROUP BY hour.
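To make the roles of the four arguments concrete, here is a plain JavaScript sketch that imitates the grouping over an in-memory array. It does not talk to MongoDB; the helper name group, the sample rows, and the url values are assumptions made for illustration, and only the reduce function is taken from the snippet above.

```javascript
// Illustrative re-implementation of the grouping idea (not the driver's code).
function group(docs, keys, initial, reduce, condition) {
  const groups = new Map();
  const keyFields = Object.keys(keys).filter((k) => keys[k]);

  for (const doc of docs) {
    // 4th argument: skip documents that do not match the condition
    const matches = Object.keys(condition).every((k) => doc[k] === condition[k]);
    if (!matches) continue;

    // 1st argument: the grouping key is built from the listed fields
    const groupKey = JSON.stringify(keyFields.map((k) => doc[k]));

    // 2nd argument: every new group starts from a copy of the initial object
    if (!groups.has(groupKey)) {
      const start = { ...initial };
      for (const k of keyFields) start[k] = doc[k];
      groups.set(groupKey, start);
    }

    // 3rd argument: the reduce function folds the document into its group
    reduce(doc, groups.get(groupKey));
  }
  return [...groups.values()];
}

// Made-up sample data standing in for the hosts collection
const hosts = [
  { url: "/a", hour: 10, ip: "10.0.0.1" },
  { url: "/a", hour: 10, ip: "10.0.0.2" },
  { url: "/a", hour: 11, ip: "10.0.0.1" },
  { url: "/b", hour: 10, ip: "10.0.0.3" },
];

const hostsHours = group(
  hosts,
  { hour: true },                          // keys
  { count: 0 },                            // initial object
  function (obj, prev) { ++prev.count; },  // reduce function
  { url: "/a" }                            // condition
);
console.log(hostsHours);
```

With this sample data the "/b" row is filtered out, and the remaining rows fall into one group per hour.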

The HabraLogger algorithm is quite simple. Each request is logged to the hits and hosts collections. The hosts collection has a unique index on IP and day, so only the first visit from each IP per day ends up there, and a simple grouping is performed when the graph is displayed.

I emphasize that this is only an example. Of course, a better solution would be to rotate the log and perform the aggregation once every n minutes. But we will look at optimization next time.
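The effect of the unique index can be imitated in plain JavaScript. This is only a sketch of the idea, not the HabraLogger code: the function logVisit, the record shape, and the sample addresses are assumptions, and a Map keyed by IP and day stands in for the unique index.

```javascript
const hits = [];          // every request is appended here
const hosts = new Map();  // stands in for a collection with a unique index on (ip, day)

// Hypothetical logger: returns true if the visit was a new host for that day.
function logVisit(url, ip, date) {
  const day = date.toISOString().slice(0, 10); // e.g. "2009-11-04"
  hits.push({ url, ip, day });

  const uniqueKey = ip + "|" + day;
  if (hosts.has(uniqueKey)) {
    return false; // a unique index would reject this duplicate
  }
  hosts.set(uniqueKey, { url, ip, day, hour: date.getUTCHours() });
  return true;
}

logVisit("/post/1", "10.0.0.1", new Date("2009-11-04T10:00:00Z"));
logVisit("/post/1", "10.0.0.1", new Date("2009-11-04T15:30:00Z")); // same IP, same day
logVisit("/post/1", "10.0.0.1", new Date("2009-11-05T09:00:00Z")); // same IP, next day
console.log(hits.length, hosts.size);
```

All three requests count as hits, but only two of them count as hosts.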

Clustering? I was waiting for this question!
To begin with, let's define the terms:
Sharding - splitting a database across several shards.
Shard - one or more equivalent servers that store the same piece of data.
Config server - a server that stores meta-information, primarily about which chunk lives on which shard.
Chunk - a range of documents by index; for example, the hosts collection can be split by an index on the url property.
mongos - a daemon that accepts requests from clients, talks to the relevant shards and config servers, and sends the finished response back to the client.
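The relationship between chunks and shards can be pictured with a toy lookup in plain JavaScript. This is a heavily simplified model, not the real mongos algorithm: the chunk table, the shard names, and the helper findShard are all hypothetical.

```javascript
// Each chunk covers sharding-key values in the range [min, max).
// Here the key is url, as in the hosts example above.
const chunks = [
  { min: "", max: "/g", shard: "shard-a" },
  { min: "/g", max: "/p", shard: "shard-b" },
  { min: "/p", max: null, shard: "shard-a" }, // null means no upper bound
];

// Find the shard whose chunk contains the given url.
function findShard(url) {
  const chunk = chunks.find(
    (c) => url >= c.min && (c.max === null || url < c.max)
  );
  return chunk.shard;
}

console.log(findShard("/blog/mongodb"));
console.log(findShard("/help"));
console.log(findShard("/post/74273"));
```

Note that in this model one shard holds more than one chunk, which is why the mapping has to be stored somewhere; in the scheme described above that is the job of the config server.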

It looks like this:



I will not retell the documentation or describe how mongos behaves for each type of request. Sharding is described in detail in the official English-language documentation - Sharding Introduction.
Let me just say that big daddy Google uses a similar scheme.

Turn off the lights, drain the water

That's all for today. Note that the topic of the next article will be chosen by listener request. I can talk about GridFS (a file system on top of MongoDB) and look at other examples.

Thank you for your attention, friends! I look forward to your feedback.

P.S. The visit statistics for this article can be found here, and the statistics for another article, which has been tracked since last night, are here.

UPD: We spy on the main page of Habr in real time.

Source: https://habr.com/ru/post/74273/
