The second day of RIT++ is over, and we can't wait to tell you how the whole world tried to break our ballot box. Under the cut: the code, the metrics, the names of the winners and the most active participants, and other dirty details.
Shortly before RIT++, we wondered how to entertain people, and decided to run a vote for the coolest programming language, with the results displayed on dashboards in real time. The voting procedure was simple: go to the ODN.PW website from any device, enter your name and e-mail, and vote for any language. We didn't overthink the list of languages: we took the nine most popular according to GitHub, with “Other” as the tenth item. The colors for the bars were also taken from GitHub.
But we understood that cheating would be inevitable, and with “heavy artillery” at that: our audience is full of advanced engineers, this is not a vote on a mommy forum. So we decided to welcome cheating in every possible way. Moreover, we invited the community to try to bring our polling station down with heavy load. To make it even easier for participants, we published a link to the API for stuffing the ballot box with bots. We also decided to award incentive prizes to the three participants with the highest RPS, with a separate nomination for the strongman who could take the ballot box down single-handedly.
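A minimal vote-stuffing bot could look something like the sketch below. This is an assumption, not the actual API contract: the /poll endpoint is mentioned later in this article, but the payload shape (email plus language) and the host details are guesses.

```javascript
const http = require('http');

// keepAlive lets each request loop reuse its TCP connection.
const agent = new http.Agent({ keepAlive: true });

// Payload fields are an assumption about the voting API.
const payload = JSON.stringify({ email: 'bot@example.com', language: 'JavaScript' });

function vote() {
  const req = http.request(
    {
      agent,
      host: 'odn.pw',
      path: '/poll',
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(payload),
      },
    },
    (res) => {
      res.resume();          // drain the body so the socket can be reused
      res.on('end', vote);   // fire the next request immediately
    }
  );
  req.on('error', () => setTimeout(vote, 100)); // brief back-off on errors
  req.end(payload);
}

// Run a handful of concurrent request loops per process.
for (let i = 0; i < 32; i++) vote();
```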
Voting was launched almost at the very start of RIT++ and ran until 6 p.m. Visitors and speakers of RIT++ enjoyed our entertainment: professionals in high-load services actively joined the race for RPS, guests of the booth had lively discussions about ways to take the vote down, and teams of adepts of one language or another spontaneously emerged and began inventing promotion strategies. Some people sat down right away and started writing microservices or bots to take part in the voting.
Some companies participating in RIT++ and providing secure hosting services also joined our competition. By the very end of the day, the participants' joint efforts managed to bring the system down for a short while. Well, “bring down” is a stretch: the service kept working, we simply hit the ceiling on the number of votes that could be registered concurrently. So at 6 p.m. we suspended the voting, otherwise the results would have been unreliable.
The first day's results: 160 million votes, with a peak load of 20,000 RPS. Curiously, the first and second places that day were taken by active RIT++ participant Nikolai Macievsky (Airi) and speaker Elena Grahovac from Openprovider.
Overnight we prepared to meet the next day fully armed: we optimized the communication with the database and put nginx in front of the Node.js application on each worker.
Many people were drawn to our invitation to take the vote down, because the race for RPS is a fascinating task. In the morning we were already being awaited: as soon as we switched the DNS, the RPS shot up to 100,000, and within half an hour the load climbed to 300,000 RPS.
It's funny that when we started developing the voting service, we decided it “would be nice to support 100,000 RPS”. Just in case, we designed for a maximum of 1 million RPS, but didn't seriously consider the possibility of ever approaching that figure. By the middle of the second day we were practically taking bets on whether the ceiling of a million requests per second would be broken. In the end we reached about 500,000 RPS.
We wrote the project in a day and a half, right before RIT++. The voting service is hosted on the Google Cloud Platform, with a three-tier architecture:
• Top tier: a load balancer acting as the front end, which receives the incoming stream of requests and spreads the load across the servers.
• Middle tier: the backend on Node.js 8.0. The number of machines involved is scaled to match the current load rather than provisioned with a margin, so we don't overpay for idle capacity. By the way, the project cost $8,000.
• Bottom tier: a clustered MongoDB for storing votes, consisting of three servers (one master and two slaves).
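To make the flow concrete, here is a minimal sketch of the day-one middle tier: plain Node.js accepting POST /poll and writing each vote straight to MongoDB. Collection, field names, and the connection string are assumptions; the real implementation lives in the counter-store repository linked below.

```javascript
const http = require('http');
const { MongoClient } = require('mongodb');

// Hypothetical URL for the master of the three-node MongoDB cluster.
MongoClient.connect('mongodb://mongo-master:27017', (err, client) => {
  if (err) throw err;
  const votes = client.db('poll').collection('votes');

  http
    .createServer((req, res) => {
      if (req.method !== 'POST' || req.url !== '/poll') {
        res.statusCode = 404;
        return res.end();
      }
      let body = '';
      req.on('data', (chunk) => (body += chunk));
      req.on('end', () => {
        try {
          const { email, language } = JSON.parse(body);
          // Day-one mode: one database write per incoming request.
          votes.updateOne(
            { _id: email },
            { $inc: { count: 1 }, $set: { language } },
            { upsert: true },
            () => res.end('ok')
          );
        } catch (e) {
          res.statusCode = 400;
          res.end();
        }
      });
    })
    .listen(3000); // on day one each worker process listened on port 3000
});
```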
All voting components are open source, available on GitHub:
• Backend: https://github.com/spukst3r/counter-store
• Frontend: https://github.com/weglov/treechart
While the backend was being developed, the idea was in the air of caching the vote-stuffing requests and periodically flushing them to the database. But given the lack of time, the uncertainty about the number of participants, and plain laziness, we decided to postpone that idea and keep writing to the database on every request. Incidentally, that also tested how MongoDB performs in this mode.
Well, as the first day showed, we should have bolted the cache on right away. Each Node.js worker topped out at around 3,000 RPS of POSTs to /poll, and the MongoDB master was coughing heavily with LA > 100. Even optimizing the statistics aggregation query by changing the read preference to read from the slaves didn't help much. No matter: it was time to implement a cache for the vote counters and for the email validity check (which was simply wrapped in _.memoize, since we never delete users). We also moved to a new Google Compute Engine project with higher quotas.
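The memoized check might have looked something like this sketch (the `users` collection and field names are assumptions):

```javascript
const _ = require('lodash');

// Hypothetical: `users` is the MongoDB collection of registered voters,
// obtained once at startup, e.g. client.db('poll').collection('users').

// Since users are never deleted, a cached answer never goes stale, so the
// lookup can be memoized for the lifetime of the process: one round-trip
// to MongoDB per distinct e-mail, everything after that is a Map lookup.
const isValidEmail = _.memoize((email) =>
  users.findOne({ _id: email }).then((doc) => doc !== null)
);
```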
After we enabled vote caching, MongoDB felt excellent, showing LA < 1 even at peak load, and the throughput of each worker grew by 50%, up to 4,500 RPS. For the periodic flush we used bulkWrite with the ordered parameter disabled, leaving the order of execution up to the database to optimize for speed.
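A sketch of what that flush could look like (the `votes` collection, document shape, and flush interval are assumptions):

```javascript
// In-memory vote cache: counts accumulate here between flushes.
const pending = new Map(); // email -> { language, count }

function recordVote(email, language) {
  const entry = pending.get(email) || { language, count: 0 };
  entry.count += 1;
  pending.set(email, entry);
}

// Periodically turn the accumulated counters into a single batch write.
setInterval(() => {
  if (pending.size === 0) return;
  const ops = [];
  for (const [email, { language, count }] of pending) {
    ops.push({
      updateOne: {
        filter: { _id: email },
        update: { $inc: { count }, $set: { language } },
        upsert: true,
      },
    });
  }
  pending.clear();
  // ordered: false lets MongoDB execute the batch in whatever order is
  // fastest, which is the speed optimization described above.
  votes.bulkWrite(ops, { ordered: false }).catch(console.error);
}, 1000); // flush once per second (interval is an assumption)
```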
On the first day, each worker ran a Node.js server that spawned four child processes through the cluster module, each listening on port 3000. For the second day, we abandoned that server and handed HTTP processing over to the professionals. Experiments showed that nginx, talking to the application over a unix socket, gives roughly +500 RPS. The configuration is fairly standard for a large number of connections: an increased worker_rlimit_nofile, sufficient worker_connections, and tcp_nopush and tcp_nodelay enabled. Disabling the Nagle algorithm, by the way, also helped raise the RPS in Node.js itself. On each VM we had to raise the limit on the number of open files and the maximum backlog size.
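In Node.js terms, the day-two setup boils down to something like this sketch; the socket path is an assumption, and handlePoll stands in for the real request handler. On the nginx side, an upstream pointing at this socket plus the directives listed above does the rest.

```javascript
const http = require('http');
const fs = require('fs');

const SOCKET = '/var/run/poll.sock'; // nginx's upstream points here

function handlePoll(req, res) {
  res.end('ok'); // stands in for the real /poll handling
}

const server = http.createServer(handlePoll);

// When the app listened on TCP directly (as on day one), disabling Nagle's
// algorithm on each connection avoided the small-write delay:
// server.on('connection', (socket) => socket.setNoDelay(true));

if (fs.existsSync(SOCKET)) fs.unlinkSync(SOCKET); // clear a stale socket
server.listen(SOCKET, () => fs.chmodSync(SOCKET, 0o666)); // let nginx connect
```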
Over the two days, no participant managed to take our service down single-handedly. But at the end of the first day, their combined efforts did ensure that the system could not keep up with registering all incoming requests, and on the second day we recorded a peak load of ~450,000 RPS. The discrepancy between the RPS shown on the front end (which computed an averaged RPS from the actual records in the database) and the readings of Google's monitoring remains a mystery to us for now.
And we are pleased to announce the winners of our little competition:
1st place - {"_id": "ivan@buymov.ru", "count": 2107126721}
2nd place - {"_id": "burik666@gmail.com", "count": 1453014107}
3rd place - {"_id": "256@flant.com", "count": 626160912}
For your prizes, write to kosheleva_ingram_micro!
UPD: TOP50 Hall of Fame
Source: https://habr.com/ru/post/330368/