Faced with choosing a database for a project, I ran a small study of Google App Engine's data-processing speed. The results are presented as tables.
These figures should save time for anyone looking for a platform to host a project but unsure whether Google App Engine suits it. The tables can also serve as a kind of "cheat sheet" for roughly estimating how long a request will take to process and how best to optimize it.

When reading the article, keep in mind that Google App Engine really is automatically scalable. One article had misled me and I had begun to doubt it, so I checked in practice:

Under frequent requests, even from a single IP address, GAE automatically spins up new instances and distributes the load evenly among them. In the figure above, 8 instances were added, and that is far from the limit.

All measurements in this article apply to a single instance. As the load on the site grows, overall query throughput increases (thanks to the automatic addition of new instances), but the speed of an individual query does not improve (more precisely, it degrades by up to 30%).
## Test conditions

The storage tests used an entity with one key and one string property. The string was always 500 characters long (roughly the size of an average comment on a blog or news site). Note that increasing the size of the object does not change the read/write speed significantly.
## Test 1. Writing to the Google BigTable datastore

Writing one value at a time:

| Action | CPU time | Real time |
|---|---|---|
| 1 write | 90 ms | 40 ms |
| 10 writes, one at a time | 750 ms | 250 ms |
| 100 writes, one at a time | 7250 ms | 2100 ms |
| 1000 writes, one at a time | 71000 ms | 28000 ms |
Batch writing:

| Action | CPU time | Real time |
|---|---|---|
| 10 writes in one request | 670 ms | 60 ms |
| 100 writes in one request | 6600 ms | 370 ms |
| 1000 writes in one request | 65000 ms | 2500 ms |
As you can see, the CPU cost of a batch write is slightly lower than the cost of writing values one at a time, but in real time a batch write completes up to 10 times faster.
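A quick sanity check of that speedup claim, in plain Python, using the real-time figures from the two tables above:

```python
# Real-time latencies (ms) taken from the two write tables above.
one_at_a_time_ms = {10: 250, 100: 2100, 1000: 28000}
batch_ms = {10: 60, 100: 370, 1000: 2500}

for n in sorted(batch_ms):
    speedup = one_at_a_time_ms[n] / batch_ms[n]
    print(f"{n} writes: batch is {speedup:.1f}x faster in real time")
```

For 1000 writes the measured speedup comes out to about 11x, roughly in line with the "up to 10 times" figure above; the advantage grows with batch size.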
Cost of writing: about 10 writes consume 1 CPU second (slightly less in practice; I round with a margin):

| Amount | How many writes it buys |
|---|---|
| 1 cent | 3,600 writes |
| 1 dollar | 360 thousand writes |
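The cost table follows directly from CPU-hour billing. A minimal sketch, assuming the $0.10 per CPU-hour rate GAE charged at the time (the rate is my assumption here) and the ~10 writes per CPU second measured above:

```python
cents_per_cpu_hour = 10      # assumption: GAE billed $0.10 per CPU-hour
writes_per_cpu_second = 10   # measured above

cpu_seconds_per_cent = 3600 / cents_per_cpu_hour   # 360.0 CPU-seconds per cent
writes_per_cent = writes_per_cpu_second * cpu_seconds_per_cent
writes_per_dollar = writes_per_cent * 100

print(writes_per_cent)    # 3600.0  -> matches "3,600 writes per cent"
print(writes_per_dollar)  # 360000.0 -> "360 thousand writes per dollar"
```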
Maximum writes per request:

| Action | Comfortable maximum* | Technical maximum |
|---|---|---|
| One-at-a-time writes | 20 | 1 thousand |
| Batch write | 200 | 10 thousand |
For comparison: SQL Server Express R2 on an Amazon EC2 Micro instance manages about 500 writes per second.
\* Requests that take longer than 0.5 seconds feel uncomfortably slow. Where possible, a single request should perform no more writes than the table indicates (and the request's other operations must be taken into account as well).
## Test 2. Reading data from the Google BigTable datastore and rendering it on a page

Reading one value at a time:

| Action | CPU time | Real time |
|---|---|---|
| 1 read | 31 ms | 20 ms |
| 10 reads, one at a time | 160 ms | 100 ms |
| 100 reads, one at a time | 1600 ms | 750 ms |
| 1000 reads, one at a time | 12000 ms | 8500 ms |
Batch reading (a range query with > and < conditions):

| Action | CPU time | Real time |
|---|---|---|
| 10 reads in one request | 100 ms | 25 ms |
| 100 reads in one request | 1000 ms | 80 ms |
| 1000 reads in one request | 9600 ms | 400 ms |
As with writing, CPU time is spent at roughly the same rate whether you read in batches or one value at a time, but in real time batch reading is about 10 times faster.
Cost of reading: roughly 6 times cheaper than writing; 1 CPU second covers 60 reads:

| Amount | How many reads it buys |
|---|---|
| 1 cent | 21 thousand reads |
| 1 dollar | 2 million reads |
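As with writes, the read prices follow from CPU-hour billing. A sketch under the same assumption ($0.10 per CPU-hour, which is my assumption) and the 60 reads per CPU second measured above:

```python
cents_per_cpu_hour = 10     # assumption: GAE billed $0.10 per CPU-hour
reads_per_cpu_second = 60   # measured above

cpu_seconds_per_cent = 3600 / cents_per_cpu_hour   # 360.0 CPU-seconds per cent
reads_per_cent = reads_per_cpu_second * cpu_seconds_per_cent
reads_per_dollar = reads_per_cent * 100

print(reads_per_cent)    # 21600.0  -> "21 thousand reads" after rounding down
print(reads_per_dollar)  # 2160000.0 -> roughly "2 million reads per dollar"
```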
Maximum reads per request:

| Action | Comfortable maximum* | Technical maximum |
|---|---|---|
| One-at-a-time reads | 60 | 3 thousand |
| Batch read | 1,100 | 50 thousand |
\* Requests that take longer than 0.5 seconds feel uncomfortably slow. Where possible, a single request should perform no more reads than the table indicates (and the request's other operations must be taken into account as well).
## Test 3. Storing data in Memcache

Writing and reading take approximately the same time (reading is slightly faster, but for simplicity we round them together).

| Operations | CPU time / real time |
|---|---|
| 1 | 23 ms / 11 ms |
| 10 | 94 ms / 50 ms |
| 100 | 300 ms / 300 ms |
| 1000 | 3000 ms / 3000 ms |
Memcache cost: approximately 3 times cheaper than reading from the datastore and 15 times cheaper than writing to it.

| Amount | How many operations it buys |
|---|---|
| 1 cent | 60 thousand read/write operations |
| 1 dollar | 6 million read/write operations |
Maximum operations per request:

| Action | Comfortable maximum* | Technical maximum |
|---|---|---|
| Read/write | 150 | 9 thousand |
\* Requests that take longer than 0.5 seconds feel uncomfortably slow. Where possible, a single request should perform no more Memcache operations than the table indicates (and the request's other operations must be taken into account as well).
## Test 4. Scaling

Everything above concerns a single instance and a single request; the total throughput across all requests can be much higher.

I verified this with an experiment. An application was created that performs 5 datastore writes and 100 reads per request, rendering the results on the page (the dynamic payload is 500 characters × 100 = 50 KB of UTF-8). CPU time consumed: 1300 ms per request.

This application was then bombarded with requests in several parallel threads (from a server with a 100 Mbps channel). Results:
| Number of threads | Time per request |
|---|---|
| 1 | 100 ms |
| 10 | 150 ms |
So with a 10-fold increase in load, total request throughput grew about 7-fold (about 150 ms per request, giving 10 × 7 = 70 requests per second).

Over the course of the experiment, 16 thousand requests were served in 3 minutes (10 instances, 1600 requests each) and 800 MB of traffic was consumed.
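The throughput figures can be reproduced from the numbers above (plain Python; latency and totals are taken straight from the experiment):

```python
threads = 10
ms_per_request = 150   # measured latency at 10 threads

requests_per_second = threads * 1000 / ms_per_request
print(requests_per_second)   # ~66.7, i.e. the "10 * 7 = 70 requests per second" above

# Average over the whole run: 16 thousand requests in 3 minutes.
print(16000 / (3 * 60))      # ~88.9 req/s - higher, presumably because GAE
                             # kept adding instances as the run progressed
```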
## Conclusion

My conclusion is this: Google has once again pulled several steps ahead of its competitors in cloud-based, highly scalable hosting. The data-processing prices are certainly not small, but the unique pricing policy sets GAE apart from other cloud services. I would, of course, like to see a worthy answer to Google App Engine from Microsoft, but it is doubtful we will see one soon, since such technologies demand mastery of the KISS principle.
## UPDATE about the image

I see reproaches toward Google appearing in the comments about the image in this article, claiming GAE could not withstand the so-called "Habr effect". That is not true.

The image disappeared temporarily (from about 1 a.m. to 7 a.m.) because the free quota ran out. I have added 16 more hours of quota, which I think is enough. Note that even before I embedded the counter on Habr, I had already spent 30% of the free quota on experiments. The remaining 70% was enough for 20 thousand renders of the counter (given how slow it is, that is not so little).
By the way, it is not just a static image but a counter. The image with the current count is generated programmatically with the org.toyz.litetext library on every request (caching is disabled). Since org.toyz.litetext is very slow (I did not write it), each request consumes 500 ms of CPU time. In addition, every view is recorded in the datastore.
Right now (around 9 a.m. Moscow time) it is getting about 200 views per minute (anyone can watch the counter tick for themselves). GAE keeps 13–15 instances running (13 at the moment; there were 15), and the load is still rising.

How many instances there will be at the peak, I will report later. Details (with graphs) are in a new article:
habrahabr.ru/blogs/gae/115731