
The speed and cost of working with Google App Engine data in tables

Faced with choosing a database for a project, I ran a small study of how fast Google App Engine works with data. The results are presented below as tables.

These measurements should save time for anyone looking for a platform to host a project but unsure whether Google App Engine suits it. The tables can also serve as a kind of cheat sheet for roughly estimating how long a request will take to process and how best to optimize it.

[Image: the hit counter discussed in the update at the end of the article.]


When reading the article, keep in mind that Google App Engine really does scale automatically. One article had misled me and I had started to doubt it, so I checked in practice:

[Image: App Engine console showing the extra instances started under load.]

Under frequent requests, even from a single IP address, GAE automatically spins up new instances and distributes the load evenly between them. In the figure above it added 8 instances, and that is far from the limit.

All measurements in this article apply to a single instance. As the load on the site grows, the overall throughput will increase (thanks to the automatic addition of new instances), but the speed of a single request will not improve (more precisely, it may degrade by up to 30%).

Test conditions

For the storage tests, an entity with one key and one string property was used. The string was always 500 characters long (roughly the size of an average comment on a blog or news site). Note that increasing the size of the object does not change the read/write speed significantly.

Test 1. Write to Google BigTable data store

Writing one value at a time:

Act                          | CPU time  | Real time
1 record                     | 90 ms     | 40 ms
10 records, one at a time    | 750 ms    | 250 ms
100 records, one at a time   | 7250 ms   | 2100 ms
1000 records, one at a time  | 71000 ms  | 28000 ms


Batch writes:

Act                          | CPU time  | Real time
10 records in one request    | 670 ms    | 60 ms
100 records in one request   | 6600 ms   | 370 ms
1000 records in one request  | 65000 ms  | 2500 ms


As you can see, a batch write costs slightly less in CPU time than writing values one at a time, but in real time it executes up to 10 times faster.
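
The article does not show its benchmark code, but the two write patterns look roughly like this with App Engine's low-level Java datastore API (a minimal sketch; the `Comment` kind, the `text` property and the helper methods are my own assumptions mirroring the test conditions):

```java
import java.util.ArrayList;
import java.util.List;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;

public class WriteTest {

    private static final DatastoreService ds =
            DatastoreServiceFactory.getDatastoreService();

    // Writing one value at a time: every put() is a separate datastore call.
    static void writeOneByOne(int n, String text) {
        for (int i = 0; i < n; i++) {
            Entity comment = new Entity("Comment");   // hypothetical kind
            comment.setProperty("text", text);        // ~500-character string
            ds.put(comment);
        }
    }

    // Batch write: all entities are stored with a single put() call,
    // which is what makes the real time up to ~10x lower in the table above.
    static List<Key> writeBatch(int n, String text) {
        List<Entity> batch = new ArrayList<Entity>();
        for (int i = 0; i < n; i++) {
            Entity comment = new Entity("Comment");
            comment.setProperty("text", text);
            batch.add(comment);
        }
        return ds.put(batch);   // returns the auto-generated keys
    }
}
```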

Cost of writes:

Roughly 10 writes consume 1 CPU second (in practice slightly less; the figure includes a margin):

Amount    | How many records does it buy?
1 cent    | 3,600 records
1 dollar  | 360 thousand records


Maximum records per request

Act                    | Comfortable maximum * | Technical maximum
Writes one at a time   | 20                    | 1 thousand
Batch write            | 200                   | 10 thousand


For comparison: SQL Server Express R2 on an Amazon EC2 Micro instance handles about 500 records per second.

* Requests that take more than 0.5 seconds feel uncomfortably slow. If possible, a single request should not perform more writes than indicated in the table (and other operations in the same request must also be taken into account).

Test 2. Reading data from Google BigTable storage and displaying it on the page.

Reading one value at a time:

Act                        | CPU time  | Real time
1 read                     | 31 ms     | 20 ms
10 reads, one at a time    | 160 ms    | 100 ms
100 reads, one at a time   | 1600 ms   | 750 ms
1000 reads, one at a time  | 12000 ms  | 8500 ms


Batch reads (selection by a > and < range condition):

Act                        | CPU time  | Real time
10 reads in one request    | 100 ms    | 25 ms
100 reads in one request   | 1000 ms   | 80 ms
1000 reads in one request  | 9600 ms   | 400 ms


The same picture as with writes: CPU time is spent in roughly the same way for batch and single reads, but in real time a batch read takes about 10 times less.
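
Again as a sketch only (same low-level Java API; the `Comment` kind and the `num` property used for the range condition are assumptions), the access patterns measured above look like this:

```java
import java.util.List;
import java.util.Map;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Query;

public class ReadTest {

    private static final DatastoreService ds =
            DatastoreServiceFactory.getDatastoreService();

    // Reading one value at a time: one get() per key, one round trip each.
    static Entity readOne(long id) throws EntityNotFoundException {
        return ds.get(KeyFactory.createKey("Comment", id));
    }

    // Batch read by keys: one get() call fetches many entities at once.
    static Map<Key, Entity> readBatch(Iterable<Key> keys) {
        return ds.get(keys);
    }

    // Batch read by a range condition (the "> && <" selection in the table):
    // a single query returns all matching entities.
    static List<Entity> readRange(long from, long to) {
        Query q = new Query("Comment");
        q.addFilter("num", Query.FilterOperator.GREATER_THAN, from);
        q.addFilter("num", Query.FilterOperator.LESS_THAN, to);
        return ds.prepare(q).asList(FetchOptions.Builder.withLimit(1000));
    }
}
```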

Cost of reads:

Approximately 6 times cheaper than writing: 1 CPU second buys about 60 reads:

Amount    | How many reads does it buy?
1 cent    | 21 thousand reads
1 dollar  | 2 million reads


Maximum reads per request

Act                   | Comfortable maximum * | Technical maximum
Reads one at a time   | 60                    | 3 thousand
Batch read            | 1100                  | 50 thousand


* Requests that take more than 0.5 seconds feel uncomfortably slow. If possible, a single request should not perform more reads than indicated in the table (and other operations in the same request must also be taken into account).

Test 3. Data Storage in Memcache

Writes and reads take approximately the same time (reads are slightly faster, but for simplicity we round them to the same value).

Number of operations  | CPU time  | Real time
1                     | 23 ms     | 11 ms
10                    | 94 ms     | 50 ms
100                   | 300 ms    | 300 ms
1000                  | 3000 ms   | 3000 ms
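
For reference, a minimal sketch of the operations being measured, using the App Engine Java Memcache API (the key format and the fall-back logic are hypothetical, not the author's code):

```java
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class CacheTest {

    private static final MemcacheService cache =
            MemcacheServiceFactory.getMemcacheService();

    // Write: store the ~500-character string under a key.
    static void cacheComment(long id, String text) {
        cache.put("comment:" + id, text);
    }

    // Read: get() returns null on a miss, so fall back to the datastore.
    static String readComment(long id) {
        String cached = (String) cache.get("comment:" + id);
        if (cached == null) {
            // cache miss: load from the datastore and put() it back here
        }
        return cached;
    }
}
```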


Memcache cost

Approximately 3 times cheaper than reading from a database and 15 times cheaper than writing to a database.

Amount    | How many operations does it buy?
1 cent    | 60 thousand read/write operations
1 dollar  | 6 million read/write operations


Maximum operations per request

Act           | Comfortable maximum * | Technical maximum
Read / write  | 150                   | 9 thousand


* Requests that take more than 0.5 seconds feel uncomfortably slow. If possible, a single request should not perform more Memcache operations than indicated in the table (and other operations in the same request must also be taken into account).

Test 4. Scaling

Everything written above concerns one instance and one request. The total throughput across all requests can be much higher.

I ran the following experiment in practice. An application was created that performs 5 database writes and 100 reads per request and renders the information on the page (the dynamic data is 500 characters × 100 = 50 KB of UTF-8). CPU time consumption: 1300 ms per request. A sketch of such a test handler is shown below.
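
As an illustration only, a handler like that could look roughly as follows (a hypothetical servlet; the kind, property and output format are my assumptions, not the author's actual code):

```java
import java.io.IOException;
import java.util.List;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Query;

// Test servlet: 5 writes + 100 reads per request, ~50 KB of output.
public class LoadTestServlet extends HttpServlet {

    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        // 5 writes per request.
        for (int i = 0; i < 5; i++) {
            Entity comment = new Entity("Comment");
            comment.setProperty("text", sample(500));
            ds.put(comment);
        }

        // 100 reads via one query, then render ~50 KB of UTF-8 text.
        Query q = new Query("Comment");
        List<Entity> comments =
                ds.prepare(q).asList(FetchOptions.Builder.withLimit(100));

        resp.setContentType("text/plain; charset=UTF-8");
        for (Entity c : comments) {
            resp.getWriter().println((String) c.getProperty("text"));
        }
    }

    private static String sample(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) sb.append('x');
        return sb.toString();
    }
}
```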

This application was then bombarded with requests in several threads (the load-generating server had a 100 Mbps channel). Results:

Number of threads  | Time per request
1                  | 100 ms
10                 | 150 ms


So, when the load was increased 10-fold, the total request-processing throughput grew about 7-fold (about 150 ms per request, i.e. roughly 7 requests per second per thread, 10 × 7 = 70 requests per second in total).

In total, the experiment served 16 thousand requests in 3 minutes (10 instances, 1600 requests each) and consumed 800 MB of traffic.

Conclusion

My own conclusion is this: Google is once again several steps ahead of all its competitors in cloud-based, highly scalable hosting. Data-processing prices are, of course, not small, but the unique pricing policy sets GAE apart from other cloud services. I would, of course, like to see a worthy response from Microsoft to Google App Engine, but it is very doubtful we will see one, since such technologies require mastery of the KISS principle.

UPDATE about the picture

I see that reproaches toward Google have started in the comments about the image in the article, claiming GAE could not withstand the so-called "Habr effect". That is not true.

The image disappeared temporarily (from roughly 1 a.m. to 7 a.m.) because the free quota ran out. I have added 16 hours' worth of quota, which I think is enough. Note that even before I put the counter on Habr, I had already spent 30% of the free quota on experiments. The remaining 70% was enough for 20 thousand renders of the counter (given how slow it is, that is not so little).

By the way, this is not just a picture but a counter. The image with the current count is generated programmatically with the org.toyz.litetext library on every request (caching is disabled). Since org.toyz.litetext is very slow (I did not write it), each request consumes 500 ms of CPU time. In addition, information about every view is written to the database.

Right now (around 9 a.m. Moscow time) there are about 200 views per minute (anyone can watch the counter and verify this independently). GAE keeps 13-15 instances running (13 now; there were 15). The load is still rising. I will report later how many instances there are at the peak. Details (with graphs) are in a new article: habrahabr.ru/blogs/gae/115731

Source: https://habr.com/ru/post/115517/

