
GAE: batch put, distribution and a few shady maneuvers

A cryptic title and no picture: all in all, a post in a thoroughly absurd style.

Once upon a time, Emelya needed to find out where his hamster came from (I exaggerate), so Emelya asked MaxMind for their IP-to-city database.

Well, that is a clever enough opening...
Task: determine a user's city from their IP address in a GAE application.

The first problems:
- the database file cannot be uploaded directly (1 MB per-file limit)
- free resources are limited

And no, these are not the last of the problems.

Of course, I could have spun up a small server and loaded the database into MySQL, or simply used the binary database with the Python API, but I did not want to breed a zoo of technologies, and there was also a pile of other, much smaller "buts".

I did not dare to load 3.5 million objects into the main application, for two reasons:
- the application itself needs the resources
- there is no interdependence (the services are easy to separate)

So I created a separate application, which the main one will later call via urlfetch to get the data for a specific IP.
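To make the job of that separate application concrete, here is a minimal sketch of the lookup itself, done in memory with the standard library rather than against the datastore. The block tuples and location IDs are made-up sample data, and the function names are hypothetical; the article does not show how the real service queries its storage.

```python
import bisect
import ipaddress

# Hypothetical sample of IP blocks: (start_ip_num, end_ip_num, location_id),
# sorted by start_ip_num. Real data would come from the MaxMind blocks file.
BLOCKS = [
    (16777216, 16777471, 17),      # 1.0.0.0 - 1.0.0.255
    (16777472, 16778239, 49),      # 1.0.1.0 - 1.0.3.255
    (3232235520, 3232301055, 0),   # 192.168.0.0 - 192.168.255.255
]
STARTS = [block[0] for block in BLOCKS]


def ip_to_int(ip):
    """Convert a dotted IPv4 string to its integer form."""
    return int(ipaddress.IPv4Address(ip))


def find_location_id(ip):
    """Return the location id of the block containing ip, or None."""
    n = ip_to_int(ip)
    # Last block whose start is <= n.
    i = bisect.bisect_right(STARTS, n) - 1
    if i < 0:
        return None
    start, end, location_id = BLOCKS[i]
    return location_id if start <= n <= end else None
```

In the datastore the same idea would presumably become a query for the block with the largest start not exceeding the IP number, followed by a check against the block's end.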

I got around the 1 MB limit by writing a RequestHandler that receives a part of the file and loads the objects from that part into the datastore.
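The parsing half of such a handler might look roughly like the sketch below. It assumes the posted chunk is CSV text with three numeric columns per row (start IP number, end IP number, location id), which is my assumption about the blocks file layout, not something the article states. The function name is hypothetical, and the datastore write itself is left out so that the code runs without the GAE SDK.

```python
import csv
import io


def parse_block_lines(body):
    """Turn a posted chunk of CSV text into (start, end, location_id) tuples."""
    records = []
    for row in csv.reader(io.StringIO(body)):
        if len(row) != 3:
            continue  # blank lines, copyright notice, malformed rows
        try:
            start, end, location_id = (int(value) for value in row)
        except ValueError:
            continue  # header row or any other non-numeric row
        records.append((start, end, location_id))
    return records
```

Inside a real RequestHandler the body would come from the POST request, and each tuple would be turned into a model instance before being stored.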

On the client side, a small script sends the data with POST requests.
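A client of that kind could be sketched as follows. The upload URL is a placeholder and the function names are hypothetical; the chunk size of 200 lines is the figure mentioned in this article. This is Python 3 with the standard library only, not the author's actual script.

```python
import itertools
import urllib.request

CHUNK_SIZE = 200  # lines per request, the figure used in the article
UPLOAD_URL = "https://example-geo-loader.appspot.com/load"  # hypothetical URL


def chunks(lines, size=CHUNK_SIZE):
    """Yield consecutive lists of at most `size` items from `lines`."""
    iterator = iter(lines)
    while True:
        batch = list(itertools.islice(iterator, size))
        if not batch:
            return
        yield batch


def post_chunk(batch, url=UPLOAD_URL):
    """POST one batch of raw lines and return the HTTP status code."""
    body = "".join(batch).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/plain; charset=utf-8"}
    )  # supplying data makes urllib send a POST
    with urllib.request.urlopen(request) as response:
        return response.status


def upload(path, url=UPLOAD_URL):
    """Send the whole file, one chunk per request, until the end of the file."""
    with open(path, encoding="utf-8") as source:
        for batch in chunks(source):
            post_chunk(batch, url)
```

Keeping the chunking separate from the network call makes it easy to change the chunk size or retry a single failed request.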

And here the dancing "in the style of hard disco" begins. It turned out that inserting 40,000 objects consumed 1 hour of CPU time. Figuring that a separate put for each object was wasteful, I decided to try db.put() (the so-called batch put), which inserts objects in batches.

As it turned out, db.put() not only failed to reduce the CPU time per request (recall that the local script sends 200 lines of the IP blocks file per request, and so on until the end of the file), it actually increased it by about 10%. You will agree that something is wrong here. I still do not understand what the problem is, but it seems to me that db.put() simply calls put on every object in the list (yes, it also inserts the objects in a single transaction where possible, but that is not our case).
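The two strategies being compared can be sketched like this. To keep the code runnable outside GAE, the storage call is passed in as a `put` argument; on App Engine that would be `db.put`, which accepts either a single model instance or a list of them. The function names and the returned call counts are illustrative only.

```python
def store_one_by_one(entities, put):
    """Call put once per entity; return the number of put calls made."""
    calls = 0
    for entity in entities:
        put(entity)
        calls += 1
    return calls


def store_in_batches(entities, put, batch_size=200):
    """Call put once per slice of batch_size entities; return the call count."""
    calls = 0
    for i in range(0, len(entities), batch_size):
        put(entities[i:i + batch_size])
        calls += 1
    return calls
```

Batching clearly reduces the number of calls. The measurement above suggests that the billed CPU time nevertheless follows the number of objects written rather than the number of calls, though that is only an interpretation.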

A quick calculation gives about $10 per load of the entire database (without locations).
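That estimate can be roughly reproduced from the numbers in this article. The price per CPU hour below is an assumption on my part (the article does not state the rate it used), so treat the result as an order-of-magnitude check only.

```python
TOTAL_OBJECTS = 3_500_000       # objects to load, from the article
OBJECTS_PER_CPU_HOUR = 40_000   # measured insert rate, from the article
PRICE_PER_CPU_HOUR = 0.10       # ASSUMPTION: USD per CPU hour, not from the article

cpu_hours = TOTAL_OBJECTS / OBJECTS_PER_CPU_HOUR
cost = cpu_hours * PRICE_PER_CPU_HOUR

print(f"{cpu_hours:.1f} CPU hours, about ${cost:.2f}")
```

Under that assumed rate the load takes 87.5 CPU hours and costs a little under $9, which is in line with the figure of about $10.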

To sum up, when developing for GAE:




See you in my LiveJournal.

Source: https://habr.com/ru/post/72610/

