Yes, all "real boys" are able to build web systems that can withstand monumental loads. Well, for "nepatsanoff" there is always
Google and the mass of
sites dedicated to this subject . However, the “growth problem” includes not only the question of the correct serving of data to the client and their correct replication / distribution on the cluster. Often problems arise from the fact that everything is just the opposite - it works too fast. Consider an example from recent practice:
Given:
- An event queue (events)
- Events can be linked into chains
- Events can spawn "new" events
- A processor (daemon) that works through the queue (instantiates the necessary classes, loads the libraries, does whatever the event requires), marks each entry in the queue as "processed", writes a log entry, and moves on.
- MySQL DBMS (historically)
So far everything is pretty clear: there is a queue table with fields along the lines of id (int), event_id (int, FK), event_data (blob), execute_at (datetime), executed_at (datetime). The daemon takes one event at a time and does its evil work :)
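For concreteness, here is a minimal sketch of such a table; the column names follow the list above, while the exact types, the index, and the storage engine are my assumptions (the engine matters, see the P.S.):

CREATE TABLE queue (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,   -- row id
    event_id    INT UNSIGNED NOT NULL,                  -- FK to the events table
    event_data  BLOB,                                   -- serialized event payload
    execute_at  DATETIME NOT NULL,                      -- when the event becomes due
    executed_at DATETIME DEFAULT NULL,                  -- set once the daemon is done
    PRIMARY KEY (id),
    KEY idx_execute_at (execute_at)                     -- assumed index for the due-events scan
) ENGINE=InnoDB;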
But now our project has grown: there are more users in the system, more machines in the cluster, and, accordingly, more tasks in the queue. Users waiting "3 seconds" until the daemon deigned to process the next step of their chain got fed up and began begging support for MORE throughput. Support ranted and raved and in the end decided to launch several more processors.
Accordingly, the architecture of the queue changed: we now have "locks". Simply put, before taking an event for processing, daemon "A" marks the event as "busy", and all the other daemons "do not see" it. The select-and-process flow began to look like this (pseudocode):
if (event = db.select("SELECT id FROM queue WHERE locked = 'false' AND execute_at < NOW() LIMIT 1")) {
    db.execute("UPDATE queue SET locked = 'true' WHERE id = " + event.id);
    [...]
}
And everything would be fine if the system were not built on a cluster of seriously souped-up machines, and if roughly 100 daemons were not roaming this "space". As practice showed, between the first and the second SQL query a few more daemons easily managed to wedge themselves in and start working on the same task. And since one task can spawn new ones, within a few days the queue can accumulate 1,000,000+ tasks, and the logs on the server will eat up 10+ gigs of usable space. How are we supposed to live!
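To make the race visible, here is one possible interleaving of two daemons running the pseudocode above (the id and the ordering are illustrative):

-- Daemon A: SELECT id FROM queue WHERE locked = 'false' AND execute_at < NOW() LIMIT 1;  -- returns id = 42
-- Daemon B: SELECT id FROM queue WHERE locked = 'false' AND execute_at < NOW() LIMIT 1;  -- also returns id = 42
-- Daemon A: UPDATE queue SET locked = 'true' WHERE id = 42;  -- "locks" the row
-- Daemon B: UPDATE queue SET locked = 'true' WHERE id = 42;  -- too late: B already read the row before the flag flipped

Both daemons process event 42, and every event it spawns lands in the queue twice; with chained events this snowballs.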
Who is to blame? What is to be done? Transactions by themselves do not give the desired result. How do we live? Whom do we beat? Whose hands do we tear off?
And what do we actually need? We need the DBMS to automatically lock, for all other processes, the RANGE of rows that our selection covers (locking the whole table is not an option). That is, the other daemons should simply not see the rows that fell into my sample.
So what? And how? It is all elementary: MySQL has the SELECT * FROM table FOR UPDATE construct, which does exactly that. Accordingly, we rewrite the processing code as follows:
db.execute ("TRANSACTION START");
if (event = db.select ("select id from queue where locked = 'false' and execute_at <NOW () LIMIT 1 FOR UPDATE")) {
[...]
}
db.execute ("COMMIT");
And that's all there is to it!
P.S. Warning! All of this works only on InnoDB tables!