Faced with choosing a database for a project, I ran a small study of Google App Engine's data-processing speed. The results are presented as tables.
These figures should save time for anyone looking for a platform to host a project but unsure whether Google App Engine suits it. The tables can also serve as a kind of "cheat sheet" for roughly estimating how long a request will take to process and how best to optimize it.

When reading the article, keep in mind that Google App Engine really is automatically scalable. One article had misled me and I had begun to doubt it, so I checked in practice:

Under frequent requests, even from a single IP address, GAE automatically spins up new instances and distributes the load evenly among them. In the figure above, 8 instances were added, and that is far from the limit.

All measurements in this article apply to a single instance. As the load on the site grows, overall query throughput increases (thanks to the automatic addition of new instances), but the speed of an individual query does not improve (more precisely, it degrades by up to 30%).
## Test conditions

The storage tests used an entity with one key and one string property. The string was always 500 characters long (roughly the size of an average comment on a blog or news site). Note that increasing the size of the object does not change the read/write speed significantly.
## Test 1. Writing to the Google BigTable datastore

Writing one value at a time:

| Action | CPU time | Real time |
|---|---|---|
| 1 write | 90 ms | 40 ms |
| 10 writes, one at a time | 750 ms | 250 ms |
| 100 writes, one at a time | 7250 ms | 2100 ms |
| 1000 writes, one at a time | 71000 ms | 28000 ms |
Batch writing:

| Action | CPU time | Real time |
|---|---|---|
| 10 writes in one request | 670 ms | 60 ms |
| 100 writes in one request | 6600 ms | 370 ms |
| 1000 writes in one request | 65000 ms | 2500 ms |
As you can see, the CPU cost of a batch write is slightly lower than the cost of writing values one at a time, but in real time a batch write completes up to 10 times faster.
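A quick sanity check of that speedup claim, in plain Python, using the real-time figures from the two tables above:

```python
# Real-time latencies (ms) taken from the two write tables above.
one_at_a_time_ms = {10: 250, 100: 2100, 1000: 28000}
batch_ms = {10: 60, 100: 370, 1000: 2500}

for n in sorted(batch_ms):
    speedup = one_at_a_time_ms[n] / batch_ms[n]
    print(f"{n} writes: batch is {speedup:.1f}x faster in real time")
```

For 1000 writes the measured speedup comes out to about 11x, roughly in line with the "up to 10 times" figure above; the advantage grows with batch size.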
Cost of writing: about 10 writes consume 1 CPU second (slightly less in practice; I round with a margin):

| Amount | How many writes it buys |
|---|---|
| 1 cent | 3,600 writes |
| 1 dollar | 360 thousand writes |
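The cost table follows directly from CPU-hour billing. A minimal sketch, assuming the $0.10 per CPU-hour rate GAE charged at the time (the rate is my assumption here) and the ~10 writes per CPU second measured above:

```python
cents_per_cpu_hour = 10      # assumption: GAE billed $0.10 per CPU-hour
writes_per_cpu_second = 10   # measured above

cpu_seconds_per_cent = 3600 / cents_per_cpu_hour   # 360.0 CPU-seconds per cent
writes_per_cent = writes_per_cpu_second * cpu_seconds_per_cent
writes_per_dollar = writes_per_cent * 100

print(writes_per_cent)    # 3600.0  -> matches "3,600 writes per cent"
print(writes_per_dollar)  # 360000.0 -> "360 thousand writes per dollar"
```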
Maximum writes per request:

| Action | Comfortable maximum* | Technical maximum |
|---|---|---|
| One-at-a-time writes | 20 | 1 thousand |
| Batch write | 200 | 10 thousand |
For comparison: SQL Server Express R2 on an Amazon EC2 Micro instance manages about 500 writes per second.
\* Requests that take longer than 0.5 seconds feel uncomfortably slow. Where possible, a single request should perform no more writes than the table indicates (and the request's other operations must be taken into account as well).
## Test 2. Reading data from the Google BigTable datastore and rendering it on a page

Reading one value at a time:

| Action | CPU time | Real time |
|---|---|---|
| 1 read | 31 ms | 20 ms |
| 10 reads, one at a time | 160 ms | 100 ms |
| 100 reads, one at a time | 1600 ms | 750 ms |
| 1000 reads, one at a time | 12000 ms | 8500 ms |
Batch reading (a range query with > and < conditions):

| Action | CPU time | Real time |
|---|---|---|
| 10 reads in one request | 100 ms | 25 ms |
| 100 reads in one request | 1000 ms | 80 ms |
| 1000 reads in one request | 9600 ms | 400 ms |
As with writing, CPU time is spent at roughly the same rate whether you read in batches or one value at a time, but in real time batch reading is about 10 times faster.
Cost of reading: roughly 6 times cheaper than writing; 1 CPU second covers 60 reads:

| Amount | How many reads it buys |
|---|---|
| 1 cent | 21 thousand reads |
| 1 dollar | 2 million reads |
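As with writes, the read prices follow from CPU-hour billing. A sketch under the same assumption ($0.10 per CPU-hour, which is my assumption) and the 60 reads per CPU second measured above:

```python
cents_per_cpu_hour = 10     # assumption: GAE billed $0.10 per CPU-hour
reads_per_cpu_second = 60   # measured above

cpu_seconds_per_cent = 3600 / cents_per_cpu_hour   # 360.0 CPU-seconds per cent
reads_per_cent = reads_per_cpu_second * cpu_seconds_per_cent
reads_per_dollar = reads_per_cent * 100

print(reads_per_cent)    # 21600.0  -> "21 thousand reads" after rounding down
print(reads_per_dollar)  # 2160000.0 -> roughly "2 million reads per dollar"
```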
Maximum reads per request:

| Action | Comfortable maximum* | Technical maximum |
|---|---|---|
| One-at-a-time reads | 60 | 3 thousand |
| Batch read | 1,100 | 50 thousand |
\* Requests that take longer than 0.5 seconds feel uncomfortably slow. Where possible, a single request should perform no more reads than the table indicates (and the request's other operations must be taken into account as well).
## Test 3. Storing data in Memcache

Writing and reading take approximately the same time (reading is slightly faster, but for simplicity we round them together).

| Operations | CPU time / real time |
|---|---|
| 1 | 23 ms / 11 ms |
| 10 | 94 ms / 50 ms |
| 100 | 300 ms / 300 ms |
| 1000 | 3000 ms / 3000 ms |
Memcache cost: approximately 3 times cheaper than reading from the datastore and 15 times cheaper than writing to it.

| Amount | How many operations it buys |
|---|---|
| 1 cent | 60 thousand read/write operations |
| 1 dollar | 6 million read/write operations |
Maximum operations per request:

| Action | Comfortable maximum* | Technical maximum |
|---|---|---|
| Read/write | 150 | 9 thousand |
\* Requests that take longer than 0.5 seconds feel uncomfortably slow. Where possible, a single request should perform no more Memcache operations than the table indicates (and the request's other operations must be taken into account as well).
## Test 4. Scaling

Everything above concerns a single instance and a single request; the total throughput across all requests can be much higher.

I verified this with an experiment. An application was created that performs 5 datastore writes and 100 reads per request, rendering the results on the page (the dynamic payload is 500 characters × 100 = 50 KB of UTF-8). CPU time consumed: 1300 ms per request.

This application was then bombarded with requests in several parallel threads (from a server with a 100 Mbps channel). Results:
| Number of threads | Time per request |
|---|---|
| 1 | 100 ms |
| 10 | 150 ms |
So with a 10-fold increase in load, total request throughput grew about 7-fold (about 150 ms per request, giving 10 × 7 = 70 requests per second).

Over the course of the experiment, 16 thousand requests were served in 3 minutes (10 instances, 1600 requests each) and 800 MB of traffic was consumed.
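The throughput figures can be reproduced from the numbers above (plain Python; latency and totals are taken straight from the experiment):

```python
threads = 10
ms_per_request = 150   # measured latency at 10 threads

requests_per_second = threads * 1000 / ms_per_request
print(requests_per_second)   # ~66.7, i.e. the "10 * 7 = 70 requests per second" above

# Average over the whole run: 16 thousand requests in 3 minutes.
print(16000 / (3 * 60))      # ~88.9 req/s - higher, presumably because GAE
                             # kept adding instances as the run progressed
```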
## Conclusion

My conclusion is this: Google has once again pulled several steps ahead of its competitors in cloud-based, highly scalable hosting. The data-processing prices are certainly not small, but the unique pricing policy sets GAE apart from other cloud services. I would, of course, like to see a worthy answer to Google App Engine from Microsoft, but it is doubtful we will see one soon, since such technologies demand mastery of the KISS principle.
## UPDATE about the image

I see reproaches toward Google appearing in the comments about the image in this article, claiming GAE could not withstand the so-called "Habr effect". That is not true.

The image disappeared temporarily (from about 1 a.m. to 7 a.m.) because the free quota ran out. I have added 16 more hours of quota, which I think is enough. Note that even before I embedded the counter on Habr, I had already spent 30% of the free quota on experiments. The remaining 70% was enough for 20 thousand renders of the counter (given how slow it is, that is not so little).
By the way, it is not just a static image but a counter. The image with the current count is generated programmatically with the org.toyz.litetext library on every request (caching is disabled). Since org.toyz.litetext is very slow (I did not write it), each request consumes 500 ms of CPU time. In addition, every view is recorded in the datastore.
Right now (around 9 a.m. Moscow time) it is getting about 200 views per minute (anyone can watch the counter tick for themselves). GAE keeps 13–15 instances running (13 at the moment; there were 15), and the load is still rising.

How many instances there will be at the peak, I will report later. Details (with graphs) are in a new article:
habrahabr.ru/blogs/gae/115731