
Such a Pain! The Crowd vs. the Web - 2:0. Episode Two: The Clones Come at Noon

We continue the war story of optimizing the PHP + MySQL site of the rarest words. One May day, while digging through the words, I got the idea of placing text boxes under them - that is, of using crowdsourcing [so fashionable in Web 2.0]. Write whatever you want, as long as it is about this particular word. But the collaboration was supposed to work like Google Docs: if you and someone else are editing words on the same page, the changes show up for both of you at once. Or, say, someone edits the word "google" on the Google page, and the word "google" also appears on the TechMeme page - visitors there will see those changes in real time, coming from the Google page [it is not essential to follow this]. If only I had known then what consequences this fun idea would have for MySQL...

It was easy to build: prototype.js in hand, an ajax call every 10 seconds - we ask the database what changed in the last 10 seconds, intersect that with the words on the current page, send back a JS highlight effect and the new text - voila. There were visitors, but very few of them actually typed anything in, so I came up with a "just said" block in the corner of the page. For example: "On the page 'fibonacci.com' it was just said that 'fibonacci' → 'crazy math scientist'." This interested people a great deal and the fill rate grew significantly. Everyone could see the activity on the site and got involved.
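A rough sketch of what that ajax endpoint looked like on the server side in its naive first version (the table and column names follow the log table described later; the request parameter, credentials, and output format are my illustration, not the real code):

 <?php
 // ajax.php - naive version of the 10-second poll (names and schema are illustrative)
 mysql_connect('localhost', 'user', 'password');   // placeholder credentials
 mysql_select_db('rarestwords');                   // placeholder database name

 $words_on_page = explode(',', $_GET['words']);    // words present on the caller's page

 // everything that was defined in the last 10 seconds
 $res = mysql_query("SELECT word, page, text1 FROM log
                     WHERE time1 > NOW() - INTERVAL 10 SECOND");

 $updates = array();
 while ($row = mysql_fetch_assoc($res)) {
     // keep only the changes whose words actually occur on this page
     if (in_array($row['word'], $words_on_page)) {
         $updates[] = $row;   // the client turns these into a highlight effect + new text
     }
 }
 echo json_encode($updates);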

Actually, I had built enough functionality for a rather amusing toy - "site wars over rare words", an auto-categorizer, synonyms, and so on. I sat peacefully editing the "live" site, uploading a new file, fixing errors as soon as I spotted them, without worrying that someone might see them. I had no idea that thousands of people were already seeing it. Then, after yet another typo in a function name, something odd happened... I could not upload index.php over FTP... It was TechCrunch.

One of the site's visitors had written an article for them, they published it ("TheRarestWords: Intriguing Semantic SEO Project from Russia"), and a million of their subscribers headed for the site [well, far fewer actually reached it].

The load on the server grew quickly. Suspecting nothing, I was editing index.php at that very moment, and promptly made a mistake in it. Many visitors were already sitting on open pages, being served by ajax alone. The consequences were inevitable - the point of no return had been passed... LoadAvg was climbing steadily. And with a database of 72 million records in MySQL, that left no hope of survival... After five or ten or fifteen minutes I finally managed to upload index.php - and so began 18 hours of "optimization in combat conditions".

Temporary band-aids


The first thing I did was replace index.php with:

  print "I, for one, welcome our TechCrunch overlords, but we are kind of overloaded"; 
   exit (); 

which knocked the load down and let me refine it into:

 if (uptime() > 50) {
   print "I, for one, welcome our TechCrunch overlords :) but we are kind of overloaded";
   exit();
 }

 // parse the 1-minute load average out of the output of `uptime`
 function uptime()
 {
   $fp = @popen('uptime', 'r');
   $s  = @fgets($fp);
   @pclose($fp);

   @preg_match('#load average: ([0-9\.]+)#', $s, $m);
   return $m[1];
 }

That is, if loadavg went above 50, the site simply switched itself off. This gave me a ceiling above which the server would not climb. At first it jumped, if I remember right, to almost a hundred; I definitely saw 70.

Then I started thinking hard about which blocks of the site took the longest to generate. I knew the auto-categorizer ate a good half-second per page, so it was immediately wrapped in if (uptime() > 5) ... else print "the block is unavailable now" - loadavg dropped noticeably, but still kept hitting the ceiling of 50. I went on assigning each block its own load ceiling at which it would switch itself off. This relieved the load a great deal, but it still pushed against the ceiling and kept shutting the site down - after all, a million TechCrunch readers were doing their job, especially since many of them stayed on the site and kept filling in the boxes.
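Each heavy block got the same treatment; schematically it was something like this (uptime() is the function above, render_autocategorizer() is a placeholder name for the expensive block):

 // disable the expensive block when the 1-minute load average is above its ceiling
 if (uptime() > 5) {
   print "the block is unavailable now";
 } else {
   print render_autocategorizer($word);   // placeholder for the half-second auto-categorizer
 }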

I had to think fast about what to do next. I remembered that I had played around with memcached a little and decided it was time to use it.

memcached


yum install memcached

First, after some thought, I realized that the statistics at the top of the page - "We have 17 million words in total, 12321 of them are now filled in" - took a long time to compute, because I was reading the log table, which stored every definition ever entered, and doing a GROUP BY word to count the unique words. On top of that, this happened every 10 seconds for every user. But I did not want to simply cache it; I wanted near-real-time statistics at any cost. The solution is obvious - use memcached as a second database. So: set the key 'count' to the real count from MySQL, and from then on serve that key to everyone. Whenever someone entered a word, I did a memcached INCREMENT on that key. That did not account for the uniqueness of words, so with probability rand(0,1000) <= 1 the key was replaced with the real count from MySQL. In other words, for roughly a thousand requests there could be a small error, but about every 1000 requests it was pulled back to the real value.
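In terms of the standard PHP Memcache extension, the idea looks roughly like this (count_unique_words_from_mysql() is my placeholder for the expensive GROUP BY query):

 $memcache = new Memcache();
 $memcache->connect('127.0.0.1', 11211);

 // seed the key once with the real (expensive) count from MySQL
 if ($memcache->get('count') === false) {
   $memcache->set('count', count_unique_words_from_mysql(), 0, 0);   // 0 = never expire
 }

 // when someone submits a definition, just bump the cached counter
 $memcache->increment('count');

 // roughly once per 1000 requests, resync with the real count to correct any drift
 if (rand(0, 1000) <= 1) {
   $memcache->set('count', count_unique_words_from_mysql(), 0, 0);
 }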

Incidentally, this is roughly what Google's engineers are said to recommend: aggregated data should be computed incrementally.

By the way, if you are wondering why I did not simply keep a table of words but instead kept a log with the full history of definitions for each word - it is because of the trolls. At first there was an awful lot of rubbish like "fsdhjfsdfsd" - registration was not required, write whatever you want. So in order to quickly roll back that junk, the obscenities, and the ads (of which there was plenty!), I needed a table like that, and I could not rework it into a new table in real time.


Live vs beta


Actually, at that moment I realized it was impossible to keep working on the "live" site, so in httpd.conf I made a complete copy of the VirtualHost block for the subdomain beta.site.com, copied all the scripts there, and worked on those - so that while I was debugging the site in real time under load, a hundred or two hundred people would not get cut off just because I had made a typo in a function name.

Clearly, the next thing to optimize was the piece that ran most often - namely the ajax call. I had really come to like memcached as an alternative database, because it guaranteed no disk I/O, so I started thinking about where else I could apply it. [loadavg was hitting 50 less and less often, but the site was still almost unusable]

My attention was drawn to the "just said" block. It had to find the last 5 records in the MySQL table, and that is no joke: it was done with ORDER BY time1 DESC on a table of 100,000 rows... every 10 seconds... for each of the thousands of people sitting on the site at once... that is, about a hundred times per second. Cache the block in memcached and voila - the load fell away!
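Roughly like this, reusing the $memcache connection from the sketch above (render_last_said() is a placeholder for the HTML rendering):

 // serve the "just said" block from memcached; rebuild it at most once per 10 seconds
 $block = $memcache->get('last_said_block');
 if ($block === false) {
   $res   = mysql_query("SELECT word, page, text1 FROM log ORDER BY time1 DESC LIMIT 5");
   $block = render_last_said($res);                     // placeholder rendering helper
   $memcache->set('last_said_block', $block, 0, 10);    // expires after 10 seconds
 }
 print $block;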

Hallelujah! Wait... where did the collaborative editing go? I don't see hundreds of people editing rare words on google.com... damn it, well, of course... I had cached the very ajax block that drove the field updates. It got cached once for some particular page and then tried to push the same changes into other pages, which did not have those words at all.

Memcached array


So I needed some way to temporarily store what people had done on the site in the last 10 seconds! I organized something like an array in memcached: keys named from 'tmp_cache_0' to 'tmp_cache_20' [20 was picked experimentally]. When someone entered something, I walked through them all looking for a free cell, and when I found one, I set that key for 10 seconds to the denormalized value 'word|page|what the person entered'. Since every user on the site polls via ajax every 10 seconds, the values are not needed after the next 10 seconds - memcached's expiring keys obviously played into my hands here, because I did not have to issue DELETEs that could lock the database, as I would have had to with MySQL.
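The write side of that "array" might look like this (the function name is mine; Memcache::add() only succeeds if the key is still free, which gives exactly the "find a free cell" behaviour):

 // store an edit in the first free slot; the key expires by itself after 10 seconds
 function store_recent_edit($memcache, $word, $page, $text)
 {
   for ($i = 0; $i <= 20; $i++) {
     if ($memcache->add("tmp_cache_$i", "$word|$page|$text", 0, 10)) {
       return true;    // found a free cell
     }
   }
   return false;       // all slots busy - acceptable to drop, this is not a banking app
 }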

And now, on every ajax call, I did not need to touch MySQL at all, except when someone entered something new: check 20 memcached keys, which is very fast, plus one more key to update the real-time stats. Voila, the load dropped to 10-15! Pleased as the elephant on the Postgres logo, I went to bed - by then the whole thing had taken 18 hours.
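For completeness, the read side that runs on every ajax call might look roughly like this (again the function name is mine):

 // scan the slots and keep only the edits whose words occur on the caller's page
 function recent_edits_for_page($memcache, $words_on_page)
 {
   $updates = array();
   for ($i = 0; $i <= 20; $i++) {
     $val = $memcache->get("tmp_cache_$i");
     if ($val === false) {
       continue;                                   // empty or expired slot
     }
     list($word, $page, $text) = explode('|', $val, 3);
     if (in_array($word, $words_on_page)) {
       $updates[] = array('word' => $word, 'text' => $text);
     }
   }
   return $updates;
 }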

I should add the caveat that I understood memcached is not permanent storage and keys can drop out, but I did not build a MySQL fallback, because this is not a banking application, and the fact that someone might miss one or two updates is not a disaster. I am an engineer at heart: if the application works reliably in 95% of cases, I will not spend another 50% of the time polishing the remaining 5%. Although, for my part, I never saw a single lost key - someone else quite possibly did.




Source: https://habr.com/ru/post/40180/

