Data storage in Google App Engine

This article is based on a blog post by Nick Johnson. It has been supplemented with figures that are current at the time of writing and with some additional notes.

App Engine provides many ways to store information. Some (for example, the datastore) are well known, others much less so, and all of them have different characteristics. This article lists the available options and describes the advantages and disadvantages of each, so that you can make better-informed decisions about data storage.

Datastore

The best-known, most widely used, and most flexible storage option. The datastore is App Engine's non-relational database: it provides reliable long-term storage and the greatest flexibility in storing, retrieving, and processing data.

Advantages:
- Reliable - data is stored durably and for the long term.
- Read and write - applications can both read and write data in the datastore. The datastore also provides transactions to ensure integrity.
- Consistent - all application instances see the same view of the storage.
- Flexible - queries and indexes provide many ways to query and retrieve data.
Disadvantages:
- Speed - because the datastore keeps data on disk and guarantees durability, writes have to wait for confirmation that the data has been saved, and reads have to fetch the data from disk.
How and where to use:

The datastore is described in detail in its own documentation.
Use the datastore wherever the application needs to reliably save data that it will use later.
Where it is better not to use:

Developers often write debugging and technical information that only they need to the database. The built-in App Engine logs, discussed below, are much better suited to such cases.

Memcache

Memcache is known as a "secondary" data storage mechanism. The memcache API provides applications with the ability to optimistically cache data to avoid costly operations. Memcache is often used as a caching layer for other APIs, such as the datastore, or for caching the results of any calculations.

Advantages:
- Fast - access time to memcache is usually a few milliseconds.
- Consistent - all application instances see the same view of the storage. In addition, memcache provides atomic operations, so applications can ensure the integrity of the data stored in it.
Disadvantages:
- Unreliable - data can be deleted from memcache at any time.
- Not always available - memcache is unavailable during App Engine maintenance periods.
How and where to use:

As a cache for datastore data, URLFetch responses, or the results of calculations.
Where it is better not to use:

For storing important data. Remember that data can disappear from memcache at any time.
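
The usual way to use memcache as a caching layer can be sketched as follows. This is only an illustration, not App Engine code: `FakeMemcache` is a hypothetical in-process stand-in that mimics the general get/set shape of a cache client, and `load_profile_from_datastore` is a made-up placeholder for an expensive datastore read.

```python
class FakeMemcache:
    """Hypothetical stand-in for a memcache client (not the App Engine API)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss, like a typical cache client.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def flush_all(self):
        # Simulates eviction: cached data can disappear at any time.
        self._data.clear()


cache = FakeMemcache()
datastore_reads = 0  # counts how often the slow path is taken


def load_profile_from_datastore(user_id):
    """Placeholder for a slow but reliable datastore read."""
    global datastore_reads
    datastore_reads += 1
    return {"id": user_id, "name": "user-%d" % user_id}


def get_profile(user_id):
    """Try the cache first; on a miss, read the datastore and repopulate."""
    key = "profile:%d" % user_id
    profile = cache.get(key)
    if profile is None:
        profile = load_profile_from_datastore(user_id)
        cache.set(key, profile)
    return profile


get_profile(7)       # miss: reads the datastore
get_profile(7)       # hit: served from the cache
cache.flush_all()    # the cache loses its contents
get_profile(7)       # miss again, but the data is still recoverable
```

Because every miss falls back to the reliable store, losing the cache costs time but never data.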

Instance memory

Application instances can also cache data using global variables or class members. This method provides the highest speed, but has some drawbacks.

Advantages:
- Fast - literally as fast as possible, since the data is stored in the same process that requests it.
- Convenient - no API is needed; data is simply stored in global variables or class members.
- Flexible - data can be stored in any form your program can work with. There is no need to serialize or deserialize it.
Disadvantages:
- Unreliable - instances can start or stop at any time, so applications should use instance memory only as a cache.
- Inconsistent - each instance has its own environment and therefore its own global variables. Changes made in one instance are not reflected in the others.
- Limited capacity - instances have a memory limit and are destroyed when they exceed it. The practical limit for data in instance memory is about 50 MB; with more than that, instances will be destroyed very often.
How and where to use:

For caching frequently used and rarely modified data: session information, application settings, pages shown to guests, and so on. It is especially convenient to keep instance memory in dict variables, which lets you create a separate key-value store for each type of data.
Where it is better not to use:

For caching frequently modified data or data that the user interacts with. Different requests from the same user can be handled by different instances, and caching in this case will cause considerable confusion.
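
The dict-based approach described above might look roughly like this. The function names are hypothetical, and the expiry time is an addition of this sketch (one possible way to limit how stale an instance's private copy can become), not something the original text prescribes.

```python
import time

# Module-level globals live as long as the instance does, so each
# instance has its own private copy of this cache.
_caches = {}  # kind -> {key: (expires_at, value)}


def cache_put(kind, key, value, ttl_seconds=60.0, now=None):
    """Store a value in this instance's memory with an expiry time."""
    now = time.time() if now is None else now
    _caches.setdefault(kind, {})[key] = (now + ttl_seconds, value)


def cache_get(kind, key, now=None):
    """Return the cached value, or None if it is absent or expired."""
    now = time.time() if now is None else now
    entry = _caches.get(kind, {}).get(key)
    if entry is None:
        return None
    expires_at, value = entry
    if now >= expires_at:
        # Drop stale data so the instance does not keep it forever.
        del _caches[kind][key]
        return None
    return value


# The `now` argument is passed explicitly here only to make the demo
# deterministic; normal callers would omit it.
cache_put("settings", "theme", "dark", ttl_seconds=30, now=1000.0)
fresh = cache_get("settings", "theme", now=1010.0)   # still valid
stale = cache_get("settings", "theme", now=1031.0)   # expired
```

Values are stored as ordinary Python objects, so nothing is serialized, but another instance will never see what this one has cached.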

Blobstore

BLOB storage allows you to easily and efficiently store and deliver large amounts of data uploaded by the user.

Advantages:
- Supports large files - up to 2 GB per blob.
- Eliminates the need to write handlers.
- Provides a mechanism for high-performance serving of blobs, especially images.
- Applications can read the contents of blobs as if they were local files.
Disadvantages:
- ~~Read only - the application cannot create blobs or modify already uploaded ones.~~ The Files API appeared in App Engine on March 30, 2011, so data in the blobstore can now be changed.
- To use the blobstore you need to enable billing.
How and where to use:

For storing user-uploaded images, files, and other large objects.
Where it is better not to use:

For small files that the application itself needs to work with, a BlobProperty in the datastore is a better fit.

Local files

The application can read any file that was uploaded with the application and is not marked as static content, using standard file system operations. This is a way to ship read-only data that the application may need.

Advantages:
- Fast - reading local files involves only standard disk operations on the machine on which the application instance is running, so the speed is almost the same as that of memcache.
- Reliable - if the application is running, local files are always available.
- Flexible - you can use any format or mechanism for accessing local files.
Disadvantages:
- Read Only - applications cannot modify files.
- Limited size - limits are 10 MB per file and 150 MB per application.
How and where to use:

Storing application settings, templates, etc.
Where it is better not to use:

No contraindications
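
As an illustration of combining local files with instance memory, the sketch below reads a JSON settings file once and then reuses the parsed result. The file name, its contents, and the `load_settings` helper are all assumptions made for the example; the temporary file merely stands in for a file that would be uploaded together with the application.

```python
import json
import os
import tempfile

_settings = None  # filled on first use, then reused by this instance


def load_settings(path):
    """Read a JSON settings file once and keep the parsed result in memory."""
    global _settings
    if _settings is None:
        with open(path, "r") as f:
            _settings = json.load(f)
    return _settings


# Demo only: create a temporary file standing in for a file that would
# normally be deployed alongside the application code.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "settings.json")
with open(demo_path, "w") as f:
    json.dump({"items_per_page": 20}, f)

settings = load_settings(demo_path)
same_object = load_settings(demo_path) is settings  # second call skips the disk
```

Since local files cannot change while the application is running, the cached copy never goes stale within one deployed version.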

Task queue payloads

This is not storage in the traditional sense, but data attached to task queue tasks can eliminate the need to use other storage systems.

Advantages:
- Fast - the data is delivered to the task when it runs, so no additional API calls are needed to get it.
- Used properly, it avoids the need to store the data anywhere else.
Disadvantages:
- Single task only - the payload is useful only as storage for the data sent to one particular task.
- Limited size - a task, including its payload, must not exceed 10 KB.
How and where to use:

Background data processing, sending mail, updating the cache: any work that does not affect the response the user receives and that can be moved to the background to speed up that response.
Where it is better not to use:

Processing more than 10 KB of data requires other storage methods. Also keep in mind that in some cases tasks in the task queue can be executed with a significant delay.
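
One way to respect the size limit is to check the payload before enqueueing and, when it is too large, store the data elsewhere and pass only a key. The sketch below is a simplified illustration: `build_task_payload`, `store_large`, and `save_elsewhere` are hypothetical names, no real task queue is involved, and the check covers only the payload, whereas the limit applies to the whole task, so real code would need some headroom.

```python
import json

MAX_TASK_BYTES = 10 * 1024  # the 10 KB task limit quoted above


def build_task_payload(data, store_large):
    """Return a payload small enough to travel inside a task.

    Small data is embedded directly. Larger data is handed to
    `store_large` (a hypothetical callback that saves it elsewhere and
    returns a key), and only that key is put in the payload.
    """
    body = json.dumps({"inline": data}).encode("utf-8")
    if len(body) <= MAX_TASK_BYTES:
        return body
    key = store_large(data)
    return json.dumps({"stored_key": key}).encode("utf-8")


saved = {}


def save_elsewhere(data):
    """Demo stand-in for writing to the datastore or memcache."""
    key = "blob-%d" % (len(saved) + 1)
    saved[key] = data
    return key


small = build_task_payload({"email": "a@example.com"}, save_elsewhere)
large = build_task_payload({"text": "x" * 20000}, save_elsewhere)
```

The task handler would then read `inline` directly, or use `stored_key` to load the data from wherever it was saved.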

Email

With App Engine, email can be used not only to communicate with users but also for technical purposes. As a way of transferring data it is similar to a task queue payload, but email offers more options, such as transferring data to another App Engine application.

Advantages:
- Flexible - you can send large volumes of data as "regular" mail, or send "admin" mail without affecting the mail quotas.
- Convenient - a message with data arrives as a POST request, and the standard InboundMailHandler makes it easy to process.
- Data can be exchanged between applications.
Disadvantages:
- Spam - unexpected messages may arrive at the application's address, so incoming data needs additional verification.
- Different sending methods are needed depending on the amount of data: messages to administrators can be no larger than 16 KB, and sending regular messages is relatively expensive.
- To make full use of "admin" mail it is advisable to enable billing. An application with billing enabled can send 3,492,979 emails per day to administrators, while one with billing disabled can send only 5,000.
- Registering the application's address as an administrator is not trivial: you need a temporary handler in order to create a Google account for this address and add it to the list of administrators.
How and where to use:

Transferring small amounts of data (up to 16 KB) between applications.
Where it is better not to use:

Frequent transfer of large amounts of data quickly consumes the mail quota; URLFetch is better suited to this purpose.

URLFetch

The URL Fetch API lets an application retrieve information from other hosts using HTTP and HTTPS requests.

Advantages:
- Data can be retrieved from other applications and servers.
- Asynchronous - when data is fetched asynchronously, the application can perform other calculations while it waits.
- Size - an application can receive up to 32 MB per request, but can send no more than 1 MB through this API.
Disadvantages:
- The speed depends on the speed of the other host.
- Traffic - URLFetch and user requests share a single traffic quota, so excessive use of URLFetch can lead to denial of service for users.
How and where to use:

Background downloading and processing of data such as RSS feeds. Interaction with third-party services, for example reCaptcha.
Where it is better not to use:

Fetching data for the user when a faster method can be used instead.
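
The benefit of asynchronous fetching, doing other work while a request is in flight, can be illustrated with the sketch below. It does not use URLFetch itself: a standard-library thread pool stands in for the asynchronous call, and `fake_fetch` is a hypothetical function that only simulates a slow request without touching the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Hypothetical stand-in for a slow HTTP request; no network is used."""
    time.sleep(0.05)
    return "response from " + url


def expensive_calculation():
    """Work the application can do while the request is in flight."""
    return sum(i * i for i in range(1000))


with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(fake_fetch, "http://example.com/feed")  # start the request
    total = expensive_calculation()   # runs while the fetch is waiting
    body = pending.result()           # block only when the data is needed
```

The pattern is the same whatever the underlying mechanism: start the request early, and wait for the result as late as possible.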

Application logs

This option is often undeservedly forgotten, and the datastore is used to collect information about how the application is running. However, if you do not want collecting technical and debugging information to reduce application performance, logs work much better.

Advantages:
- Fast - logging takes a few milliseconds.
- Convenient separation of messages by priority.
Disadvantages:
- Write only - the application does not have access to the logs.
- Text only - and Cyrillic text in logs can cause errors.
- Parsing is required - the request_logs function in the developer tools returns logs only as text, so you need a separate parser to process them.
How and where to use:

For collecting information about how the application is running, measuring query execution time, and reporting slow functions or emergencies.
Where it is better not to use:

In some cases it makes more sense to store application statistics in the datastore. In such cases it is better to pass the data to a task queue task and write it to the datastore in the background.
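
Measuring execution time and flagging slow functions through the logs could be done with a small decorator built on Python's standard `logging` module. In this sketch the logger name, the `timed` decorator, the `render_page` function, and the half-second threshold are all illustrative assumptions.

```python
import functools
import logging
import time

logger = logging.getLogger("app.timing")  # the logger name is arbitrary

SLOW_SECONDS = 0.5  # hypothetical threshold for a "slow" call


def timed(func):
    """Log how long each call takes; warn when it exceeds the threshold."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.time() - started
            # Message priority separates routine timings from slow calls.
            level = logging.WARNING if elapsed > SLOW_SECONDS else logging.DEBUG
            logger.log(level, "%s took %.3f s", func.__name__, elapsed)
    return wrapper


@timed
def render_page(name):
    return "<h1>%s</h1>" % name


html = render_page("home")
```

Routine timings go out at debug priority and only slow calls are raised to warnings, which makes them easy to filter by level later.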

Conclusion

App Engine offers many more ways to store data than it seems at first glance. Each has its own trade-offs, so it is likely that one or more of them will suit your application. The optimal solution is often a combination of methods, for example datastore and memcache, or local files and instance memory.

Source: https://habr.com/ru/post/110901/
