I want to share some findings from my work on one of the most visited websites in the world.

I took part in this project as a consultant. The site receives about 200 million unique users per month. Such popularity also brings serious information security risks, in particular the risk of attacks, the most common of which is DDoS. The organization, which I will not name, has deployed a wide range of solutions to keep such attacks from affecting the performance of the service.
These protective measures are quite common. They are based on assembling content at edge nodes (CDN, ESI) and on a multi-level passive cache.
This design is good for keeping the service stable. However, building applications around a passive cache puts a considerable additional burden on the development teams. We discuss this in more detail below.
While working on the project, I found a way to protect against DDoS attacks that has the same advantages as the passive cache but does not constrain the architecture of the underlying services as rigidly. That approach is the subject of this article.
What is a passive cache?
A service using a passive cache can only read data from a cache. The service knows nothing about the source of this data.
In this configuration, the cache backend is a key-value data store (for example, Redis), and the primary data source is a relational database management system (for example, Oracle Database).
A service with an active cache first tries to read the data from the cache, and if this fails, it accesses the main data source.
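The difference between the two read paths can be sketched as follows. This is a minimal illustration, not the code of any real system: the in-memory maps stand in for a key-value store and a relational database, and all names (readPassive, readActive, readFromDatabase) are made up for the example.

```typescript
// Hypothetical in-memory stand-ins for a key-value cache (e.g. Redis)
// and a primary database (e.g. an RDBMS). All names are illustrative.
type Article = { id: number; title: string };

const cache = new Map<number, Article>();
const database = new Map<number, Article>([
  [1, { id: 1, title: 'First article' }],
  [2, { id: 2, title: 'Second article' }],
]);

// Counts how often the primary data source is queried.
let databaseReads = 0;

const readFromDatabase = (id: number): Article | undefined => {
  databaseReads++;
  return database.get(id);
};

// Passive cache: the service reads only from the cache and never
// touches the primary data source, even on a miss.
const readPassive = (id: number): Article | undefined => {
  return cache.get(id);
};

// Active cache: on a miss, fall back to the primary data source
// and store the result in the cache for later requests.
const readActive = (id: number): Article | undefined => {
  const cached = cache.get(id);
  if (cached !== undefined) {
    return cached;
  }
  const article = readFromDatabase(id);
  if (article !== undefined) {
    cache.set(id, article);
  }
  return article;
};
```

With the passive read path, a miss simply returns nothing, so the database load does not depend on what clients request. With the active read path, every miss turns into a database query.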
Using a passive cache to protect against DDoS
A passive cache architecture ensures that the primary data source never faces an unexpectedly large volume of requests. No matter how many requests the service handles or what they ask for, the primary data source is accessed only by the message queue service that populates the cache data store.
For example, http://gajus.com/blog/ is a blog service that hosts articles. A client can access individual articles by their unique identifiers, which are part of each article's address. In such addresses, values like "1", "2", "8" and "9" are resource identifiers: unique indexes used to look up data in the data store.
The blog service in question uses an active cache. When a client requests an article with index “1”, the service accesses the cache and returns the result, or (if there is no record with the data of the requested article in the cache), it accesses the database, gets the result and stores it for some time in the cache.
If an attacker floods the service with HTTP requests for the article with index "1", all of these requests are served from the cache. Reading from a key-value store is cheap, so overloading the lookup subsystem of such a store would require very serious attacking power.
The situation changes dramatically if the attack uses arbitrary values as article identifiers, for example values in the range from 1 to 1 million. Now every request for an identifier that is not yet in the cache has to go to the main database.
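The two attack patterns can be compared with a small simulation. This is a sketch under stated assumptions: the service below is a toy active-cache service with made-up names (createService, getArticle), the set of existing IDs is invented, and the second attack iterates over 1,000 identifiers instead of a million to keep the example short.

```typescript
// Hypothetical set of article IDs that exist in the primary database.
const existingArticleIds = new Set<number>([1, 2, 8, 9]);

// Builds a toy service with an active cache and a database query counter.
const createService = () => {
  const cache = new Map<number, string>();
  let databaseQueries = 0;

  const getArticle = (id: number): string | null => {
    const cached = cache.get(id);
    if (cached !== undefined) {
      return cached;
    }
    // Cache miss: the primary database has to be queried.
    databaseQueries++;
    if (!existingArticleIds.has(id)) {
      return null;
    }
    const body = `Article ${id}`;
    cache.set(id, body);
    return body;
  };

  return { getArticle, getDatabaseQueries: () => databaseQueries };
};

// Attack 1: 1,000 requests for the same existing article.
const sameIdService = createService();
for (let i = 0; i < 1000; i++) {
  sameIdService.getArticle(1);
}

// Attack 2: 1,000 requests, each with a different identifier.
const arbitraryIdService = createService();
for (let id = 1; id <= 1000; id++) {
  arbitraryIdService.getArticle(id);
}

console.log(sameIdService.getDatabaseQueries()); // 1
console.log(arbitraryIdService.getDatabaseQueries()); // 1000
```

In the first attack only the very first request reaches the database. In the second, every request carries an identifier the cache has not seen, so every request becomes a database query.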
Unlike lookups in a key-value store, queries to a relational database are resource-intensive. Requests and responses may have to pass through many more nodes, the response may need to be processed by application logic, the results may need to be cached, and so on.
A key-value store can be scaled quickly and inexpensively, which cannot be said of a relational database management system.
If the service uses a passive cache, scaling problems are limited to the cache itself. The passive cache architecture is designed so that cache capacity can be increased quickly and easily. However, such an architecture complicates development.
Developing services that use passive cache
When creating a service that uses a passive cache, you need to take into account several requirements.
- First, data can be read only from the cache.
- Second, for every create, update or delete operation requested from the service, the service must put a corresponding task in the queue of requests to the primary data store.
- Third, after every create, update or delete task has been applied to the primary data store, the service must queue a task to update the corresponding cache entries.
Each CRUD operation of the system must be implemented taking into account the above limitations.
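The three requirements above can be sketched as follows. This is a minimal single-process illustration with in-memory arrays standing in for a real message queue; the names (saveArticle, processWriteQueue, processCacheQueue and so on) are hypothetical and not taken from any particular system.

```typescript
type Article = { id: number; title: string };

type WriteTask =
  | { kind: 'upsert'; article: Article }
  | { kind: 'delete'; id: number };

type CacheTask = { id: number };

// In-memory stand-ins for the primary store, the cache and two queues.
const primaryStore = new Map<number, Article>();
const cache = new Map<number, Article>();
const writeQueue: WriteTask[] = [];
const cacheQueue: CacheTask[] = [];

// Requirement 1: reads are served only from the cache.
const getArticle = (id: number): Article | undefined => cache.get(id);

// Requirement 2: writes are not applied directly, they are queued.
const saveArticle = (article: Article): void => {
  writeQueue.push({ kind: 'upsert', article });
};

const deleteArticle = (id: number): void => {
  writeQueue.push({ kind: 'delete', id });
};

// Requirement 3: after a write is applied to the primary store,
// a cache update task is queued for the affected entry.
const processWriteQueue = (): void => {
  let task = writeQueue.shift();
  while (task !== undefined) {
    if (task.kind === 'upsert') {
      primaryStore.set(task.article.id, task.article);
      cacheQueue.push({ id: task.article.id });
    } else {
      primaryStore.delete(task.id);
      cacheQueue.push({ id: task.id });
    }
    task = writeQueue.shift();
  }
};

// The cache worker is the only reader of the primary store.
const processCacheQueue = (): void => {
  let task = cacheQueue.shift();
  while (task !== undefined) {
    const article = primaryStore.get(task.id);
    if (article === undefined) {
      cache.delete(task.id);
    } else {
      cache.set(task.id, article);
    }
    task = cacheQueue.shift();
  }
};
```

Note that a saved article is not visible to readers until both queues have been processed. This delay is the source of the stale-data errors and the slower feedback loop discussed in this section.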
During development, the task queue increases the time between requesting an operation and seeing its result. This slows down development and testing. In addition, the programmer needs to be aware of the specific errors that can arise from stale data in the cache.
On the other hand, during the development of an application that uses the active cache, the programmer can, in the course of work, simply disable the cache, achieving a very high speed of processing arbitrary requests to the main data store.
Systems built on a passive cache are harder to develop, but when security comes first, such difficulties are usually accepted. However, this does not mean that every method of protection against DDoS attacks requires tremendous effort. Above, we looked at an attack on the blog service that iterates over article identifiers. You can mitigate such attacks by making resource identifiers unpredictable.
Resource Identifier Signing
Active cache systems are vulnerable to the attacks described above because an attacker can easily construct a resource identifier. Whether the identifier is a numeric ID (as in our example), a base64-encoded GUID (as in GraphQL APIs) or a UUID (as in most document-oriented databases), the problem is the same: when the server receives a request, it does not know whether the requested resource exists. The only way to find out is to query either the cache or the primary data source and wait for a response. To let the server determine whether the requested resource can exist without querying anything, resource identifiers can be signed.
Signing makes it possible to check whether a request is well formed without seriously affecting the performance of the system. If resource identifiers are signed, an attacker cannot query for identifiers outside the limited set of IDs that have been published.
It all works like this: the service receives a request and tries to verify and open the signed resource identifier. If verification succeeds, the decoded value is used to look up the requested entry. If the identifier cannot be verified, request processing ends. I use this approach when creating GraphQL resource identifiers. In particular, the proxy that forwards GraphQL requests first checks whether the resource ID is valid.
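The flow described above can be sketched with Node.js's built-in crypto module. This is an illustration under stated assumptions, not the implementation of any particular library: the token layout (64-byte Ed25519 signature followed by a JSON payload, base64url-encoded) and the names signId, openId and handleRequest are choices made for this example, and the key pair is generated on the fly only for demonstration.

```typescript
import { Buffer } from 'node:buffer';
import { generateKeyPairSync, sign, verify } from 'node:crypto';

// Throwaway key pair for the demo. A real service would load a stored key.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');

// Ed25519 signatures are always 64 bytes long.
const SIGNATURE_LENGTH = 64;

// Assumed token layout for this sketch: signature || JSON payload.
const signId = (id: number): string => {
  const payload = Buffer.from(JSON.stringify({ id }), 'utf8');
  const signature = sign(null, payload, privateKey);
  return Buffer.concat([signature, payload]).toString('base64url');
};

// Returns the ID if the signature is valid, otherwise null.
const openId = (token: string): number | null => {
  const bytes = Buffer.from(token, 'base64url');
  if (bytes.length <= SIGNATURE_LENGTH) {
    return null;
  }
  const signature = bytes.subarray(0, SIGNATURE_LENGTH);
  const payload = bytes.subarray(SIGNATURE_LENGTH);
  if (!verify(null, payload, publicKey, signature)) {
    return null;
  }
  // The payload was signed by us, so its shape can be trusted here.
  const { id } = JSON.parse(payload.toString('utf8')) as { id: number };
  return id;
};

// Hypothetical data store and a counter of lookups that reach it.
const articles = new Map<number, string>([[1, 'First article']]);
let storeLookups = 0;

const handleRequest = (token: string): string | null => {
  const id = openId(token);
  if (id === null) {
    // Rejected without touching the cache or the database.
    return null;
  }
  storeLookups++;
  return articles.get(id) ?? null;
};
```

A request carrying an identifier produced by signId reaches the data store, while a guessed or tampered identifier is rejected after a signature check that needs no I/O.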
SGUID
Signed GUID, or sguid, is a package for Node.js in which I implemented the procedures for creating and verifying signed identifiers. You can sign an identifier using toSguid. To verify and open signed identifiers, use fromSguid. It looks like this:
    import {
      fromSguid,
      InvalidSguidError,
      toSguid,
    } from 'sguid';

    const secretKey = '6h2K+JuGfWTrs5Lxt+mJw9y5q+mXKCjiJgngIDWDFy23TWmjpfCnUBdO1fDzi6MxHMO2nTPazsnTcC2wuQrxVQ==';
    const publicKey = 't01po6Xwp1AXTtXw84ujMRzDtp0z2s7J03AtsLkK8VU=';
    const namespace = 'gajus';
    const resourceTypeName = 'article';

    // Sign a numeric article ID with the secret key.
    const generateArticleSguid = (articleId: number): string => {
      return toSguid(secretKey, namespace, resourceTypeName, articleId);
    };

    // Verify a signed identifier with the public key and return the article ID.
    const parseArticleSguid = (articleSguid: string): number => {
      try {
        return fromSguid(publicKey, namespace, resourceTypeName, articleSguid).id;
      } catch (error) {
        if (error instanceof InvalidSguidError) {
          // Handle the invalid identifier, e.g. reject the request.
        }

        throw error;
      }
    };
In addition to signing identifiers, Sguid incorporates a namespace and a resource type name into each identifier. This makes the identifiers globally unique.
Sguid uses the Ed25519 public-key cryptosystem. The resulting signature is encoded using the base64 URL encoding.
The disadvantage of this approach is that the identifiers are inconvenient for people to use:
pbp3h9nTr0wPboKaWrg_Q77KnZW1-rBkwzzYJ0Px9Qvbq0KQvcfuR2uCRCtijQYsX98g1F50k50x5YKiCgnPAnsiaWQiOjEsIm5hbWVzcGFjZSI6ImdhanVzIiwidHlwZSI6ImFydGljbGUifQ
The advantage is scalable protection against DDoS attacks at the application layer of the OSI model, without overly complicating the development process.
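It may help to see what the long identifier shown above actually carries. The sketch below assumes that the identifier is the base64url encoding of a 64-byte Ed25519 signature followed by a JSON payload. That layout is an assumption of this example, inferred from decoding the sample string, not a documented guarantee of the package. Note that this only reads the payload and does not verify the signature.

```typescript
import { Buffer } from 'node:buffer';

// The sample identifier from the text above.
const sampleSguid =
  'pbp3h9nTr0wPboKaWrg_Q77KnZW1-rBkwzzYJ0Px9Qvbq0KQvcfuR2uCRCtijQYsX98g1F50k50x5YKiCgnPAnsiaWQiOjEsIm5hbWVzcGFjZSI6ImdhanVzIiwidHlwZSI6ImFydGljbGUifQ';

// Ed25519 signatures are always 64 bytes long.
const ED25519_SIGNATURE_LENGTH = 64;

// Assumed layout: signature (64 bytes) || JSON payload.
const sampleBytes = Buffer.from(sampleSguid, 'base64url');
const payloadJson = sampleBytes
  .subarray(ED25519_SIGNATURE_LENGTH)
  .toString('utf8');

const decoded = JSON.parse(payloadJson) as {
  id: number;
  namespace: string;
  type: string;
};

console.log(decoded);
```

Under that assumption, the payload decodes to the article ID 1, the namespace "gajus" and the type "article". In other words, the identifier is signed but not encrypted: anyone can read its contents, but nobody without the secret key can produce a valid identifier for a different resource.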
Results
It should be noted that the method described here does nothing to protect against attacks that aim to saturate network links. Moreover, it is effective only if the cache can hold the data for all valid requests. Despite these limitations, it is a worthwhile approach to protecting against attacks designed to cause cache misses on the server.
It is also worth remembering that, in defending against cyber attacks, it matters how the results of the attack look from the attacker's point of view. Brute-forcing resource identifiers in a well-built system that uses the approach described here requires serious computing power. Perhaps the attacker does not expect this and, seeing that the system does not react to the attack (even though it may be running at the limit of its capacity), decides that every option has been tried and stops.
How do you protect against DDoS attacks?