
Performance of unloading large amounts of data from MongoDB through an ASP.NET Core Web API

There was a need to send a large amount of data from a MongoDB database to the client. The data is JSON with information about a vehicle, received from a GPS tracker at 0.5-second intervals. That works out to roughly 172,000 records per vehicle per day (86,400 seconds / 0.5 s ≈ 172,800).


The server code is written in ASP.NET Core 2.0 using the standard MongoDB.Driver 2.4.4. During testing, the Web API process showed significant memory consumption: about 700 MB while serving a single request, and over 1 GB when several requests run in parallel. Since the service is intended to run in a container on the cheapest droplet with 0.7 GB of RAM, this made it necessary to optimize the data unloading process.


The baseline implementation simply loads all the data into memory and sends it to the client in one response. This implementation is shown in the listing below.


Option 1 (all data sent at once)


// GET state/startDate/endDate
[HttpGet("{vin}/{startTimestamp}/{endTimestamp}")]
public async Task<StatesViewModel> Get(string vin, DateTime startTimestamp, DateTime endTimestamp)
{
    // Build the filter: match the VIN and the timestamp range
    var builder = Builders<Machine>.Filter;
    var filters = new List<FilterDefinition<Machine>>
    {
        builder.Where(x => x.Vin == vin),
        builder.Where(x => x.Timestamp >= startTimestamp && x.Timestamp <= endTimestamp)
    };
    var filterConcat = builder.And(filters);

    using (var cursor = await database
        .GetCollection<Machine>(_mongoConfig.CollectionName)
        .FindAsync(filterConcat).ConfigureAwait(false))
    {
        // Materialize the whole result set in memory, then map it to the view model
        var machines = await cursor.ToListAsync().ConfigureAwait(false);
        return _mapper.Map<IEnumerable<Machine>, StatesViewModel>(machines);
    }
}

As an alternative, paged queries were used: each query passes the index of the first record to skip and the number of records to return. In this variant the data is written directly to the Response stream to reduce RAM consumption.


Option 2 (paged queries with Skip/Limit, writing to the Response stream)


// GET state/startDate/endDate
[HttpGet("GetListQuaries/{vin}/{startTimestamp}/{endTimestamp}")]
public async Task<ActionResult> GetListQuaries(string vin, DateTime startTimestamp, DateTime endTimestamp)
{
    Response.ContentType = "application/json";
    await Response.WriteAsync("[").ConfigureAwait(false);

    // Build the filter: match the VIN and the timestamp range
    var builder = Builders<Machine>.Filter;
    var filters = new List<FilterDefinition<Machine>>
    {
        builder.Where(x => x.Vin == vin),
        builder.Where(x => x.Timestamp >= startTimestamp && x.Timestamp <= endTimestamp)
    };
    var filterConcat = builder.And(filters);

    int batchSize = 15000;
    int total = 0;
    var first = true;

    long count = await database
        .GetCollection<Machine>(_mongoConfig.CollectionName)
        .CountAsync(filterConcat)
        .ConfigureAwait(false);

    // Page through the result set with Skip/Limit subqueries
    while (total < count)
    {
        using (var cursor = await database
            .GetCollection<Machine>(_mongoConfig.CollectionName)
            .FindAsync(filterConcat, new FindOptions<Machine, Machine> { Skip = total, Limit = batchSize })
            .ConfigureAwait(false))
        {
            // Move to the next batch of docs
            while (cursor.MoveNext())
            {
                var batch = cursor.Current;
                foreach (var doc in batch)
                {
                    // Comma-separate the elements so the JSON array stays valid
                    if (!first) await Response.WriteAsync(",").ConfigureAwait(false);
                    first = false;

                    await Response.WriteAsync(JsonConvert.SerializeObject(doc))
                        .ConfigureAwait(false);
                }
            }
        }
        total += batchSize;
    }

    await Response.WriteAsync("]").ConfigureAwait(false);
    return new EmptyResult();
}

Another option sets the BatchSize parameter on the cursor; the data is again written to the Response stream.


Option 3 (using the BatchSize parameter and writing to the Response stream)


// GET state/startDate/endDate
[HttpGet("GetList/{vin}/{startTimestamp}/{endTimestamp}")]
public async Task<ActionResult> GetList(string vin, DateTime startTimestamp, DateTime endTimestamp)
{
    Response.ContentType = "application/json";

    // Build the filter: match the VIN and the timestamp range
    var builder = Builders<Machine>.Filter;
    var filters = new List<FilterDefinition<Machine>>
    {
        builder.Where(x => x.Vin == vin),
        builder.Where(x => x.Timestamp >= startTimestamp && x.Timestamp <= endTimestamp)
    };
    var filterConcat = builder.And(filters);

    await Response.WriteAsync("[").ConfigureAwait(false);
    var first = true;

    using (var cursor = await database
        .GetCollection<Machine>(_mongoConfig.CollectionName)
        .FindAsync(filterConcat, new FindOptions<Machine, Machine> { BatchSize = 15000 })
        .ConfigureAwait(false))
    {
        // Move through the result set one server-side batch at a time
        while (await cursor.MoveNextAsync().ConfigureAwait(false))
        {
            var batch = cursor.Current;
            foreach (var doc in batch)
            {
                // Comma-separate the elements so the JSON array stays valid
                if (!first) await Response.WriteAsync(",").ConfigureAwait(false);
                first = false;

                await Response.WriteAsync(JsonConvert.SerializeObject(doc))
                    .ConfigureAwait(false);
            }
        }
    }

    await Response.WriteAsync("]").ConfigureAwait(false);
    return new EmptyResult();
}

One record in the database has the following structure:


{
  "Id": "5a108e0cf389230001fe52f1",
  "Vin": "357973047728404",
  "Timestamp": "2017-11-18T19:46:16Z",
  "Name": null,
  "FuelRemaining": null,
  "EngineSpeed": null,
  "Speed": 0,
  "Direction": 340.0,
  "FuelConsumption": null,
  "Location": { "Longitude": 37.27543, "Latitude": 50.11379 }
}
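For reference, a document model along these lines would map to such a record. The original post does not show the class, so the BSON attributes and nullable types below are inferred from the JSON above and are assumptions:

using System;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class Location
{
    public double Longitude { get; set; }
    public double Latitude { get; set; }
}

public class Machine
{
    // Store the ObjectId as a string on the .NET side
    [BsonId]
    [BsonRepresentation(BsonType.ObjectId)]
    public string Id { get; set; }

    public string Vin { get; set; }
    public DateTime Timestamp { get; set; }
    public string Name { get; set; }

    // Nullable, since the sample record contains nulls for these fields
    public double? FuelRemaining { get; set; }
    public double? EngineSpeed { get; set; }
    public double? Speed { get; set; }
    public double? Direction { get; set; }
    public double? FuelConsumption { get; set; }

    public Location Location { get; set; }
}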

Performance was tested by issuing requests with HttpClient; a sketch of such a measurement follows below.
I consider the absolute values less interesting than their order of magnitude.
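A minimal sketch of the measurement, assuming the service listens on localhost:5000 and uses the route of option 1; the VIN and date range are illustrative, not taken from the original test:

using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoadTest
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task MeasureAsync()
    {
        var sw = Stopwatch.StartNew();

        // Buffers the whole response; sufficient for comparing the options' timings
        var json = await Client.GetStringAsync(
            "http://localhost:5000/state/357973047728404/2017-11-18/2017-11-19");

        sw.Stop();
        Console.WriteLine($"Received {json.Length:N0} chars in {sw.Elapsed}");
    }
}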


The performance test results for the three implementation options are summarized in the table below.

[Table: performance test results for options 1-3]




The data from the table are also presented in the form of diagrams:

[Diagrams: the table data in chart form]






Findings


Summing up: these measures to reduce RAM consumption degrade performance significantly, by more than a factor of two. I also recommend not unloading fields that the client does not currently need, as in the projection sketch below.
Share your methods of solving a similar problem in the comments.
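As an illustration of that recommendation, here is a minimal sketch of a server-side projection with MongoDB.Driver, reusing the filter from the listings above. The choice of Timestamp, Speed and Location as the retained fields is an assumption for the example:

// Only the included fields are returned by the server; everything else
// never leaves the database. The field list here is illustrative.
var projection = Builders<Machine>.Projection
    .Include(x => x.Timestamp)
    .Include(x => x.Speed)
    .Include(x => x.Location);

using (var cursor = await database
    .GetCollection<Machine>(_mongoConfig.CollectionName)
    .FindAsync(filterConcat, new FindOptions<Machine, Machine>
    {
        Projection = projection,
        BatchSize = 15000
    })
    .ConfigureAwait(false))
{
    // ... stream the documents to the Response as in option 3
}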


Additions


An implementation option using yield return was also tested.


Option 4 (using BatchSize and yield return)


[HttpGet("GetListSync/{vin}/{startTimestamp}/{endTimestamp}")]
public IEnumerable<Machine> GetListSync(string vin, DateTime startTimestamp, DateTime endTimestamp)
{
    var filter = Builders<Machine>.Filter
        .Where(x => x.Vin == vin && x.Timestamp >= startTimestamp && x.Timestamp <= endTimestamp);

    using (var cursor = _mongoConfig.Database
        .GetCollection<Machine>(_mongoConfig.CollectionName)
        .FindSync(filter, new FindOptions<Machine, Machine> { BatchSize = 10000 }))
    {
        while (cursor.MoveNext())
        {
            var batch = cursor.Current;
            foreach (var doc in batch)
            {
                // Yield each document lazily; MVC serializes the IEnumerable as it streams
                yield return doc;
            }
        }
    }
}

The augmented results are tabulated:

[Table: performance test results including option 4]



For option 3, I also measured separately the time to advance the cursor (await cursor.MoveNextAsync()) and the time to serialize the batch objects


foreach (var doc in batch)
{
    await Response.WriteAsync(JsonConvert.SerializeObject(doc));
}

while writing to the output stream. Advancing the cursor takes about one third of the time; serialization and output take the other two thirds. It is therefore advantageous to accumulate roughly 2,000 documents in a StringBuilder before writing: the extra memory cost is insignificant, the number of await Response.WriteAsync(JsonConvert.SerializeObject(doc)) calls drops sharply, and the data retrieval time falls by more than a third, to 6-7 seconds. The objects can also be serialized manually. A sketch of this buffering follows.
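A minimal sketch of that buffering, assuming it replaces the inner loop of option 3; the chunk size of 2,000 follows the measurement above, and the comma separators from the earlier listings are omitted for brevity:

// Inside the using(...) block of option 3; requires using System.Text;
var sb = new StringBuilder();
int buffered = 0;

while (await cursor.MoveNextAsync().ConfigureAwait(false))
{
    foreach (var doc in cursor.Current)
    {
        sb.Append(JsonConvert.SerializeObject(doc));

        // One WriteAsync per ~2,000 documents instead of one per document
        if (++buffered % 2000 == 0)
        {
            await Response.WriteAsync(sb.ToString()).ConfigureAwait(false);
            sb.Clear();
        }
    }
}

// Flush whatever remains in the buffer
if (sb.Length > 0)
{
    await Response.WriteAsync(sb.ToString()).ConfigureAwait(false);
}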

Source: https://habr.com/ru/post/343968/

