
MongoDB in action: an online store

It has been almost a year since my first acquaintance with MongoDB. I was not the first to start working with it, but the technology is nevertheless still perceived as experimental.

Overall, I will say this: working with MongoDB is more convenient than with MS SQL. There are common scenarios that require more effort than in SQL, but as a result you know more about how your database is organized, and you have better control over what will be slow and what will not.

Habr is full of "Hello World"-style applications, so I will skip setting up the environment and proceed straight to the more advanced questions.


Why is it more convenient to store the entire object as a whole rather than spread across tables?


For many programmers it is still not obvious that fetching records, even by primary key, costs significant time. It seems you don't need to know about this: make yourself a table, write a stored procedure for lookups, and don't worry about anything else. In practice, however, objects are rarely flat, and the lazy loading everyone loves very quickly degrades system performance catastrophically.
Here are some example object compositions from my actual experience over the past year and a half:
- a product has an arbitrary number of pictures and videos
- a product has an arbitrary number of characteristics
- a category has products that represent it
- with inheritance that adds new properties, we either waste space in the table or need an additional subquery
- an object's name and description are given in an arbitrary number of languages.

Describing any such scenario in C# is easy; building a data layer that handles it efficiently over hundreds of thousands of records is hard.
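
For illustration, here is a minimal sketch of what such a document class might look like (the class and property names are hypothetical, not the actual abo.ua model):

 using System.Collections.Generic;
 using MongoDB.Bson;

 // A hypothetical product document: everything nests inside one record,
 // so loading it requires no joins at all.
 public class Product
 {
     public ObjectId Id { get; set; }
     public Dictionary<string, string> Title { get; set; }      // language code -> localized name
     public Dictionary<string, string> Parameters { get; set; } // arbitrary characteristics
     public List<Image> Images { get; set; }                    // arbitrary number of pictures
     public List<string> VideoUrls { get; set; }                // arbitrary number of videos
 }

 public class Image
 {
     public string Url { get; set; }
     public int Width { get; set; }
     public int Height { get; set; }
 }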

At the same time, with MongoDB you can save such an object in a single call:
DocumentCollection.Save<T>(document); 

Loading it back with all its nested classes is just as elementary:
 DocumentCollection.FindOneById(id); 
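
Put together, with a 1.x-era C# driver the whole round trip looks roughly like this; the connection string, database and collection names are placeholders (MongoClient appeared in driver 1.7, older versions used MongoServer.Create):

 using MongoDB.Bson;
 using MongoDB.Driver;

 var client = new MongoClient("mongodb://localhost");
 var database = client.GetServer().GetDatabase("shop");
 var products = database.GetCollection<Product>("Products");

 // One call persists the product together with all of its nested
 // images, parameters and localized strings.
 var product = new Product { Id = ObjectId.GenerateNewId() };
 products.Save(product);

 // One call reads it back, fully deserialized.
 var loaded = products.FindOneById(product.Id);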

For example, look at how a product page is assembled. For one single database access, a lookup by category id and SeoFreindlyUrl, which takes 0.0012s (!), I get:
- the product's own properties
- its parameters (here only two, but in general their number and types are arbitrary)
- images (reused across categories; 2 of them, each with a url + dimensions)
- videos (if there were any)
- similar products (as links)
- terms of sale (this object is reused in categories and in system settings for inherited configuration of the warranty period and return policy)
- the manufacturer
- SEO data (browser title, meta tags, SEO text)

And I can immediately proceed to rendering.

For statistics: the products collection currently holds 154 thousand records; on average one record takes 22 KB; and the collection size is 4 GB.

If we used SQL Server, the best-performing way to read such a complex object would be to manually serialize all of its properties to XML. MongoDB gives us all of this with no effort.

Our whole system is built on three classes:
- BaseMongoClass (Id, Title, LastChanged)
- EntityRef (a link; contains Id and Title, with fancier descendants)
- BaseRepository, which implements all the methods needed for everyday work. We settled on GetById, Get (by query), FirstOrDefault, GetAll, GetByIds (by a list of ids), GetByEntityRefs (by a list of EntityRefs), Save, DeleteById, DeleteByQuery.

A specific repository simply inherits from BaseRepository, specifying the type and the collection name (in Mongo terms, a collection is the analogue of a table), and implements a few logic-level operations such as "find products by category", as sketched below.
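
The full listing is linked in the PS below; here is a condensed sketch of the idea, with signatures abbreviated and details assumed rather than copied:

 using System;
 using System.Collections.Generic;
 using MongoDB.Bson;
 using MongoDB.Driver;
 using MongoDB.Driver.Builders;

 public abstract class BaseMongoClass
 {
     public ObjectId Id { get; set; }
     public string Title { get; set; }
     public DateTime LastChanged { get; set; }
 }

 public class EntityRef
 {
     public ObjectId Id { get; set; }
     public string Title { get; set; }
     public bool IsEmpty() { return Id == ObjectId.Empty; }
 }

 public abstract class BaseRepository<T> where T : BaseMongoClass
 {
     protected readonly MongoCollection<T> Collection;

     protected BaseRepository(MongoDatabase database, string collectionName)
     {
         Collection = database.GetCollection<T>(collectionName);
     }

     public virtual T GetById(ObjectId id) { return Collection.FindOneById(id); }
     public IEnumerable<T> Get(IMongoQuery query) { return Collection.Find(query); }
     public IEnumerable<T> GetAll() { return Collection.FindAll(); }
     public virtual void Save(T entity) { entity.LastChanged = DateTime.UtcNow; Collection.Save(entity); }
     public void DeleteById(ObjectId id) { Collection.Remove(Query.EQ("_id", id)); }
 }

 // A concrete repository only names its collection and adds logic-level queries.
 // Assumes Product derives from BaseMongoClass.
 public class ProductRepository : BaseRepository<Product>
 {
     public ProductRepository(MongoDatabase database) : base(database, "Products") { }

     public IEnumerable<Product> GetByCategory(ObjectId categoryId)
     {
         // the driver maps the nested EntityRef's Id member to "_id"
         return Get(Query.EQ("Category._id", categoryId));
     }
 }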

MongoDB is the most convenient database when you need to store hierarchical information, and the world is such that truly flat data is very, very rare.

PS: A listing of the base classes and the repository can be downloaded here.

How to deal with relations?



Of course, there are no relations in Mongo. In the abo.ua project we use the following approach:

A product may have one category (we actually allow more, but I am simplifying a little). The product type contains literally the following:
 public EntityRef Category { get; set; }

 [BsonIgnore] // computed on demand, not stored in the document
 public Category CategoryValue
 {
     get
     {
         if (Category == null || Category.IsEmpty())
             return null;
         return AppRequestContext.Factory.BuildCategoryRepository().GetById(Category.Id);
     }
 }

When we change the category, we set Category. When we need a convenient way to learn more about the category, we turn to CategoryValue.

To avoid wasting time reading and deserializing categories, which of course change relatively rarely, the CategoryRepository caches them all in RAM in an ObjectId -> Category dictionary, access to which is far faster than a call to MongoDB.
Whenever any category changes, we rebuild the entire dictionary.
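
A minimal sketch of such a cache (thread-safety simplified; the Category class and constructor wiring are assumed, building on the BaseRepository sketch above):

 using System.Collections.Generic;
 using System.Linq;
 using MongoDB.Bson;
 using MongoDB.Driver;

 public class Category : BaseMongoClass { /* category-specific fields elided */ }

 public class CategoryRepository : BaseRepository<Category>
 {
     // All categories fit comfortably in RAM; the dictionary is rebuilt
     // wholesale whenever any category changes.
     private static volatile Dictionary<ObjectId, Category> _cache;

     public CategoryRepository(MongoDatabase database) : base(database, "Categories") { }

     public override Category GetById(ObjectId id)
     {
         var cache = _cache ?? (_cache = GetAll().ToDictionary(c => c.Id));
         Category category;
         return cache.TryGetValue(id, out category) ? category : null;
     }

     public override void Save(Category category)
     {
         base.Save(category);
         _cache = null; // force a full rebuild on the next read
     }
 }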

You could, of course, use an in-memory database instead, but experiments showed that it is fundamentally slower than the process's own memory.

Another relational issue is keeping information in related objects up to date. For example, a category was changed or deleted, but we want the product to carry current information:
1. Always be prepared for stale information. In the code above, CategoryValue returns null if the category no longer exists, and the website returns a 404. That is still the simple scenario. When we read products, we also compare their properties with the product type definitions: has a property been removed? Has its list of acceptable values changed? What is the default value for a property newly added to the type? It sounds complicated, but in fact, when all the data is at hand, we can check 1000 products in about 0.1 s, which is quite enough.
2. Once you have taught the code to "self-heal" data integrity, it becomes easy to write code that repairs the data in the database. It looks like this:
 var products = prodRepo.GetAll().OrderBy(p => p.Id).Skip(start).Take(portionSize).ToArray();
 prodRepo.JoinPropertyTypes(products); // reconcile product properties with the type definitions
 products.AsParallel().ForAll(p => prodRepo.Save(p));

It only remains to call such code (asynchronously) for all objects that have been affected, for instance as sketched below.
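
A hypothetical runner: Task.Run and the paging loop are my assumptions, while JoinPropertyTypes comes from the snippet above:

 using System.Linq;
 using System.Threading.Tasks;

 public static class ProductFixup
 {
     // Walks the whole collection in portions on a background task and
     // re-saves each product, letting the self-healing read path fix it.
     public static Task RunAsync(ProductRepository prodRepo, int portionSize = 1000)
     {
         return Task.Run(() =>
         {
             for (var start = 0; ; start += portionSize)
             {
                 var products = prodRepo.GetAll()
                     .OrderBy(p => p.Id)
                     .Skip(start)
                     .Take(portionSize)
                     .ToArray();
                 if (products.Length == 0)
                     break;

                 prodRepo.JoinPropertyTypes(products); // reconcile with type definitions
                 products.AsParallel().ForAll(p => prodRepo.Save(p));
             }
         });
     }
 }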

In the following parts


I will describe:

Source: https://habr.com/ru/post/149047/

