
Node.js and uploading a catalog from 1C to the site

We recently finished another project: building a new version of an online catalog. The old version of the site, for a number of reasons, no longer suited the client. The distinguishing feature of the project was its product database: the catalog held roughly 26,000 items spread over a tree of 513 nodes, plus product characteristics, and almost every item carried a 1-2 KB text description.

The catalog export file for the old site, in CommerceML 2 format, weighed 104 MB. It took about 10 minutes to generate on the 1C side, and after being transferred to the hosting it was parsed on the site side for an hour and a half (!) at 100% CPU load.

Way out


As an alternative to XML, we decided to export to JSON. The idea was to parse it with something that has a native parser implementation, namely Node.js with its JSON.parse().
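
The article does not show the export format itself; judging by the test code below, the file contains three top-level arrays: groups, items and properties. A purely illustrative fragment (the field names inside the objects are my assumption, not the real 1C export) could look like this:

{
  "groups": [
    { "id": "g0001", "parentId": null, "name": "Power tools" }
  ],
  "items": [
    {
      "id": "i00001",
      "groupId": "g0001",
      "name": "Drill D-500",
      "description": "A 1-2 KB text description of the item..."
    }
  ],
  "properties": [
    { "id": "p001", "name": "Weight, kg" }
  ]
}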

Our 1C developer, after getting familiar with the format that was new to him, managed in a few iterations to make the 1C export produce valid JSON. The export generation time dropped from 10 to 3.5 minutes, and the same data that took 104 MB in XML fit into 58 MB of JSON. That much was expected; the surprise came elsewhere...
To measure the parsing time of the export, I sketched out a test script:

// Node.js
var fs = require('fs');

function parser(filename, callback){
    fs.readFile(filename, { encoding: 'utf8' }, function(err, dataFromFile){
        var parsedData;
        if(err){
            callback(err);
        } else {
            try {
                console.time('parse data'); // start the timer...
                parsedData = JSON.parse(dataFromFile.toString().trim()); // <- parse the whole file in one call
                console.timeEnd('parse data'); // ...and print the elapsed time
                callback(null, parsedData);
            } catch (e){
                callback(e);
            }
        }
    });
}

parser('../import/import.json', function(err, data){
    if(err){ throw (err); }
    console.log('groups', data.groups.length);
    console.log('items', data.items.length);
    console.log('properties', data.properties.length);
});

After running it on my machine (3.3 GHz CPU), I did not even have time to get up and go make tea. The result, and the speed with which it appeared in the console, made me suspect there was a bug in the code and it was not working correctly...
> node parse.js

parse data: 718ms
groups 513
items 26098
properties 149

But it was not a bug. The data really was parsed and loaded into memory in a matter of seconds. The number of elements in each collection matched exactly what was declared in 1C. All that remained was to pick my jaw up off the floor and write a service with the full data processing cycle.

General architecture of the export processing service


Overall, the upload to the site follows the standard scheme.


The export handler service itself works according to the following scheme (a minimal sketch follows the list):
  1. unpack the archive;
  2. parse JSON;
  3. report in the HTTP response that everything is fine, or that an error has occurred;
  4. if everything is fine, set the busy flag and load the data into the database all the way to the end;
  5. die, freeing the memory;
  6. ...
  7. a new process rises in its place; Monit is responsible for that.
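
The article does not include the service code itself, so here is a minimal sketch of how steps 1-5 could look using only Node core modules. The file names, the port, the use of the system unzip utility and the way the archive reaches the server are all my assumptions, and the actual database import is reduced to a stub:

// A minimal sketch of the import service (steps 1-5), not the author's actual code.
var http = require('http'),
    fs = require('fs'),
    execFile = require('child_process').execFile;

var busy = false; // true while a previous import is still being written to the database

http.createServer(function(req, res){
    if(busy){
        res.writeHead(503);
        return res.end('busy'); // repeated calls get the busy flag back
    }
    // Step 1: unpack the uploaded archive with the system unzip utility (an assumption;
    // how the archive gets onto the server is out of scope here).
    execFile('unzip', ['-o', '../import/import.zip', '-d', '../import'], function(unzipErr){
        if(unzipErr){
            res.writeHead(500);
            return res.end('unzip failed');
        }
        // Step 2: read and parse the JSON export.
        fs.readFile('../import/import.json', { encoding: 'utf8' }, function(readErr, raw){
            var data;
            if(readErr){ res.writeHead(500); return res.end('read failed'); }
            try {
                data = JSON.parse(raw.trim());
            } catch(parseErr){
                res.writeHead(400);
                return res.end('invalid JSON'); // step 3: report the error
            }
            res.writeHead(200);
            res.end('ok');  // step 3: report that everything is fine
            busy = true;    // step 4: set the busy flag and start the long import
            importIntoDb(data, function(){
                process.exit(0); // step 5: die, freeing the memory; Monit starts a new process
            });
        });
    });
}).listen(8080);

// A stub for the database import; the real service writes groups, items and properties.
function importIntoDb(data, done){
    console.log('groups', data.groups.length);
    console.log('items', data.items.length);
    console.log('properties', data.properties.length);
    setImmediate(done);
}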


In production (DigitalOcean, the $10 plan), the service gets from the moment of the call to step 3 in about 3-4 seconds, after which repeated calls return the busy flag while the database is being filled. The full processing cycle of the export, including writing the data to the database, takes 80-90 seconds. The CPU load during parsing looks like a single spike up to 70% over a baseline of 10-30%.

In the end: the export generation time went from 10 to 3.5 minutes, the file size from 104 MB of XML to 58 MB of JSON, and the processing time on the site side from an hour and a half to under two minutes.


P.S. A remedy for headaches when debugging


For all its speed, JSON.parse() is very inconvenient to debug. If there is an error in the JSON structure, you get almost no diagnostic information. While your 1C specialist is still mastering JSON, the jsonlint module helps a lot. It can be used as a standalone utility or as a library. Unlike the native parser, it puts the line number of the offending place in the JSON file into the exception object, which makes life much easier when hunting down problems in a file of tens of megabytes. The price for this convenience is speed: it is 5-7 times slower than the native JSON.parse().

The same test code with JSON Lint will look like this:

// Node.js
var fs = require('fs'),
    jsonlint = require("jsonlint"); // module with a more talkative parser

function parser(filename, callback){
    fs.readFile(filename, { encoding: 'utf8' }, function(err, dataFromFile){
        var parsedData;
        if(err){
            callback(err);
        } else {
            try {
                console.time('parse data'); // start the timer...
                /* jsonlint reports the line number of the place in the JSON file
                 * where the error occurred.
                 * It is 5-7 times slower than the native JSON.parse(). */
                parsedData = jsonlint.parse(dataFromFile); // <- parse with jsonlint instead of JSON.parse
                console.timeEnd('parse data'); // ...and print the elapsed time
                callback(null, parsedData);
            } catch (e){
                callback(e);
            }
        }
    });
}

parser('../import/import.json', function(err, data){
    if(err){ throw (err); }
    console.log('groups', data.groups.length);
    console.log('items', data.items.length);
    console.log('properties', data.properties.length);
});


In conclusion, I would traditionally like to wish that this material turns out to be as useful to someone else as it was to us.

Source: https://habr.com/ru/post/189812/

