In a remote, non-central region of our immense country, the regional stage of the All-Russian Olympiad in Informatics and Programming was once held. Until 2014 everything was fine: we ran the Olympiad on an old system written back in 2004, in Delphi, by a very talented programmer. Nobody had touched it since; it worked, and that was good enough. In 2014 we decided to try ejudge. We did not build it from source and instead took a ready-made virtual machine image. Everything was fine, everything worked.
But then came 2015, in which some of the Olympiad's rules were changed a little, just slightly, and the people who needed to know learned about these changes only 1-2 days before the start...
This is where the fun begins.
The fact is that almost all of these changes concerned only two of us: me and ripatti.
I was responsible for the server (Fedora 19, ejudge) and for keeping it running; he was responsible for preparing the tests and configuring the rounds in general. He has quite a lot of experience with this.
So, I will go in chronological order.
January 21, Wednesday
I am asked whether I can set up the Olympiad server on the university's dedicated machines. I say no: there was little time left, and the environment might be unfamiliar to me (I assumed they ran VMware, and I could only handle VirtualBox). In short, I could not guarantee that everything would be fine.
January 22, Thursday
I find out that there is such a thing as tokens. This meant one thing: participants' solutions had to be judged during the round, not after it. Remembering last year's round, I decided that one server would handle everything. Last year nothing crashed, everything worked, everyone was happy. I started working on the server and brought the machine (the hardware) to the university.
Explanation: last year I had set up an ejudge server at my lyceum in advance, before the Olympiad. That is why it was decided to try a ready-made solution at last year's regional stage.
In the evening I find out from my partner that the previous version of ejudge (2.3) does not meet the requirements. Just at that time Alexander Chernov published a working version. He even set up a new repository with all the settings for the trial round. It was very tempting, because I had been toying with the idea of customizing the old version. We decided to build the new version from source, since there was no ready-made image. This is where the first problems began.
Problem: how do we get ssh working on port 22?
Background and (partial) solution: the issue is the university. Like any organization, it blocks port 22 from the outside. We could work comfortably inside the university, but the problems would start as soon as we left. Thankfully, my supervisor was the administrator of a cluster that had an external IP, although I had no access to it. I asked him for help, and in the end he set everything up for us. Strictly speaking, I asked for ssh access to the cluster (from there I could quietly reach my server on port 22), but he really did not want to hand out access left and right. We decided to "solve the problem at the root": I gave him all the logins and passwords, and he promised to take a look. Yes, I am a trusting person.
In fact, I had tried to do it myself, but could not.
Excerpts from what he later sent:
... thirdly, the ssh server settings are stored in /etc/ssh/sshd_config, not ssh_config. To the former I added
Port 22
Port 5000
PermitRootLogin no
and everything came up as it should:
[root@localhost ssh]# service sshd status
Redirecting to /bin/systemctl status sshd.service
sshd.service - OpenSSH server daemon
Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled)
Active: active (running) since Thu 2015-01-22 21:01:38 YEKT; 4min 53s ago
Hooray!
Port 5000 is open for ssh, and I can connect to it.
But neither GitHub nor yum update works, nothing...
More precisely, I could not get these things working that night.
At 7 in the morning I called my partner (and woke him up) and told him everything. The problem was that we simply could not compile the sources: some libraries were missing, and I could not download them (ssh on port 5000). I tried installing them one by one, but the dependency chains were, damn it, very long.
We decided to create another server with the full ejudge (3.3) setup, so that we would not have to go to the server room later (the machine was there under lock and key, and physical access was hard to get).
January 23, Friday, trial round starts at 16:00
At 9 am I go to take a colloquium in functional analysis. The dean gave me some grade; I did not look. It seems it was not "unsatisfactory".
At 10 o'clock I start building the new ejudge in parallel with Artem. He moves a little faster, while I got stuck on a small step and stopped thinking any further.
The second problem.
Building ejudge version 3.3 on Fedora 19 with ejudge 2.3 already installed. We did not delete the old version; we simply started installing the new one.
We pull the source code from GitHub and build it.
git clone https://github.com/blackav/ejudge.git
cd ejudge/
./fedora-configure
make
su
make install
And, of course, ejudge-control picked up the old version.
Everything starts, we open the web interface, and we see the old version.
I renamed the folder containing the old binaries. This served two goals: removing the old version from the search path and keeping a backup of it.
Now we restart ejudge-control, which is located in /usr/bin/ejudge-control:
[ejudge@localhost ~]$ ejudge-control start
2015-01-27T19:03:18Z:info:ej-users 3.3.1, compiled 2015-01-23 09:25:21
mysql: SELECT config_val FROM config WHERE config_key = 'version' ;
2015-01-27T19:03:18Z:info:ej-super-server 3.3.1, compiled 2015-01-23 09:25:21
2015-01-27T19:03:18Z:info:configuration file parsed ok
2015-01-27T19:03:19Z:info:ej-jobs 3.3.1, compiled 2015-01-23 09:25:21
2015-01-27T19:03:19Z:info:ej-contests 3.3.1, compiled 2015-01-23 09:25:21
2015-01-27T19:03:19Z:info:using files as the new-server database
A little more shamanism, and the trial round is ready!
By the time we said this, it was about 5:00 pm.
I ran to the server room with the image. I arrive, and the screen is dark. I thought the monitor had gone to sleep. It was worse: some sysadmin had, for some reason, cut the power to my machine. Now I wait for Windows Server 2008 to boot, then copy the image, import it into VirtualBox, start it, assign static addresses, and configure ssh. Because last time all of this had been set up by my academic supervisor (Artur Vladimirovich Yuldashev), this time it took me a lot of time. It did not help that I could not google anything from the server room.
The time is 5:45 pm, the trial round is almost over, and our server is still not up... Calls keep coming in; we answer that that's it, we will not manage to bring the server up in time.
6:00 pm, the server is still not up. We gather with the rest of the jury and think about how to get out of this situation.
The decision was this: Artem and I do not sleep, we finish setting up the trial round and round 1 and have everything ready by 10:00; from 10:00 to 11:00 we run the trial round, and at 11:00 we start round 1. That is how we lost sleep for a second night.
We said goodbye and went home. At home we set everything up all over again. By morning everything was ready.
January 24, Saturday, 1st round (official schedule)
The trial round begins, and here we finally understand what we are dealing with.
Tokens: what are they?
Last year the situation was as follows: a participant submits source code to the testing system, which only runs it on the sample tests shown in the problem statement. If the submission fails them, it is not queued for full testing. That is why our venerable single server calmly coped with the whole load (there were 150 participants in total).
This year we had to judge each solution on all tests right away. To keep participants from abusing this, the notion of tokens was introduced. A token is, so to speak, the right to see the result of a submission. There were 10 of them. That is, I can submit a solution to a problem as many times as I like, but I can see the result only 10 times. Any further submissions are at my own risk.
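To make the rule concrete, here is a minimal Python sketch of the token accounting described above. It is an illustration only, not ejudge's implementation: the name TokenLedger and its fields are invented for the example, and it assumes a single counter of 10 tokens, since the text does not say whether the limit applied per problem or per participant.

```python
from dataclasses import dataclass


@dataclass
class TokenLedger:
    """Hypothetical per-participant counter of 'view the result' rights."""
    tokens_left: int = 10  # the limit mentioned in the post
    submissions: int = 0   # every submission is still judged on all tests

    def submit(self, spend_token: bool) -> bool:
        """Register a submission; return True if the result is revealed."""
        self.submissions += 1
        if spend_token and self.tokens_left > 0:
            self.tokens_left -= 1
            return True
        # No token spent (or none left): the submission counts, but blindly.
        return False


ledger = TokenLedger()
revealed = [ledger.submit(spend_token=True) for _ in range(12)]
print(revealed.count(True), ledger.tokens_left, ledger.submissions)
```

The point that matters for the server is the comment on submissions: tokens limit what a participant sees, not how much testing the server has to do.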
The trial round has started, and we have a 15-minute delay on the server. That is, a participant submits a solution, and it gets tested only 15 minutes later. We did not worry about this. We should have. We thought it would pass.
I do a Reload contest, which resets the entire submission queue. Nobody was told about this. As a result, 10 minutes before the end of the trial round, submissions pour in again. We quietly close the contest and open the contest for round 1.
11:00, round 1
Within literally 15-20 minutes several submissions arrive, and a nasty queue builds up. Artem understood at once what was happening. The first problem, the easiest one, had, as expected, only 48 tests. There is a brute-force solution that scores 50 points out of 100, and there is a good solution that you have to think of. But most participants were supposed to find this out only after their solution got TLE. As you have guessed, a single brute-force submission for problem A took 24 seconds of server time. There were more and more such submissions, and questions about testing time started coming in to the jury. Artem explained everything correctly and sent a message to everyone. Even so, almost everyone sent at least one "free" solution to A. And then the queue naturally began to grow: first 15 minutes, then abruptly 45. Everyone, especially the participants, was worried, tense, and unhappy, above all with us. Artem was at home at the time; I was on site and heard almost everything I deserved to hear. We started thinking about how to get out of the situation. We found the relevant article in the documentation, but could not use it. After that we simply closed our eyes to 30 questions and waited for it all to end.
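A rough way to see why the delay exploded is to model the judge as a first-come, first-served queue. The Python sketch below is only an illustration: the 24 seconds per brute-force submission comes from the text, but the arrival pattern (150 submissions, one every 8 seconds) is invented, and the function name queue_delays is hypothetical.

```python
import heapq


def queue_delays(arrival_times, service_time, workers=1):
    """Return how long each submission waits before testing starts (FIFO)."""
    free_at = [0.0] * workers  # moment at which each worker becomes idle
    heapq.heapify(free_at)
    delays = []
    for t in sorted(arrival_times):
        # The earliest-free worker takes the next submission in line.
        start = max(t, heapq.heappop(free_at))
        delays.append(start - t)
        heapq.heappush(free_at, start + service_time)
    return delays


# Assumed load: 150 submissions, one every 8 seconds, 24 seconds of testing each.
arrivals = [8 * k for k in range(150)]
one = queue_delays(arrivals, service_time=24, workers=1)
many = queue_delays(arrivals, service_time=24, workers=14)
print(f"1 worker: last submission waits {one[-1] / 60:.1f} min")
print(f"14 workers: longest wait is {max(many) / 60:.1f} min")
```

With these made-up numbers a single worker falls 16 seconds further behind on every submission, so the last one waits about 40 minutes, while 14 workers never build a queue at all.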
It is finally over! The testing delay was 1 hour. A participant had to submit a solution an hour before the end to have time to look at the testing protocol.
16:00, I go to the assembly hall and am met with unhappy looks. Of course: I had just cost children their chance at the final. How else could they look at me? I ran into one very well-known teacher and explained what the problem was and what the solution was: parallelize. The teacher wished me luck.
We announced the problem to everyone openly. We said that we had not expected such load, and so on. We immediately began looking for a way out.
Option number 1: put 1 server in each computer lab, 2 in the large ones. After the Olympiad we collect all the results; nobody has network problems, and the load drops by an order of magnitude, which lets us meet all the requirements 100%. The flaws are obvious: it is Saturday, and almost all computer labs are already closed, including the server room. We have no servers at hand, and no images for round 2 either. The computer labs are too far apart, in 3 buildings. Access via ssh is out of the question. Round 2 starts on Monday at 9 am. You cannot do such a thing on Monday morning when there are only two of us.
Option number 2: connect compute nodes to the main server. This option is ideal: nothing has to change in how the Olympiad is organized. The only problem is creating those compute nodes.
We had nothing at hand at the time. One phone call, and an hour later we have 13 laptops, each with a Core i7 and 8 GB of RAM. The only machine image I had was the trial round image.
20:00, we are sitting in the department, setting up a server on 1 laptop. We called Artem and asked him to come and help me set everything up (I did not know how to configure a round). Suddenly the organizer has an idea: his house is empty (his wife and grandchildren return only on Sunday afternoon), so we should come to his place for the night.
Everyone is happy, or rather, Artem and I are. Another teacher comes along to help us.
January 25, night - day
We took 7 laptops with us, arrived, and unpacked. A delicious meal was cooked for us, and, having regained our strength, we began.
We set up round 2, copied the image onto a drive, and thought: why not try to parallelize after all?
We had plenty of time and, it seemed, energy too.
And now the fun part: how ejudge works.
There is a service (daemon) responsible for compiling, running, and testing programs: ej-super-run. It takes its data from /home/judges/, where the configuration files, tests, checkers, and submitted solutions usually live.
I do not know which process is responsible for the web interface; we simply ran ejudge-control, which started the whole system. I did not go into details.
For parallelization, it was suggested to share the /home/judges/ directory. How does not matter: SSHFS, Samba, or NFS.
But for this you need to rebuild the worker nodes (slaves, as they are called in distributed systems) with a certain build option. Our OS labs had included creating network shares with NFS and Samba. I confidently started with Samba and immediately hit the first problem, which I was too lazy to solve. I dropped it and moved on to NFS. It was logical to expect plenty of problems there too. That left the last option, the one most familiar to me: SSHFS. Familiar because I had once been on good terms with SSH and worked with it often.
I opened the first tutorial I found and set everything up. First we make sure that the directory /home/judges/ is empty; otherwise we clear it.
sshfs ejudge@192.168.1.11:/home/judges/ /home/judges/
After that the directory /home/judges/ is shared with the server. For full convenience you can set it up to mount automatically, but we did not do this, for it was already morning.
If you need to specify a different port, add the -p parameter.
sshfs -p 5000 ejudge@192.168.1.11:/home/judges/ /home/judges/
In the case of our server, this was relevant.
And, thank God, it worked!
We chose one laptop as the server and another as the slave. More precisely, the virtual machines running on them.
Through the web interface I submitted 2 solutions (with while (true), so that they would get TLE on every test), which the server judged by itself, and noted the time. Then we started ej-super-run on the worker node and submitted the 2 solutions again. Happiness.
The worker node picked up a submission and began testing it. The testing time was almost 2 times shorter: 30 seconds versus 50.
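Why does sharing one directory let a second machine take over part of the work? A generic way to picture it is a directory used as a job queue, where each worker claims a job by renaming its file. The Python sketch below shows only that general idea; it is not ejudge's actual protocol or directory layout, and claim_next_job and the directory names are invented for the example.

```python
import os
import tempfile


def claim_next_job(queue_dir, claimed_dir):
    """Try to claim one job file by renaming it; return the new path or None."""
    for name in sorted(os.listdir(queue_dir)):
        src = os.path.join(queue_dir, name)
        dst = os.path.join(claimed_dir, name)
        try:
            # A rename either succeeds for this worker or fails because
            # another worker has already moved the file away.
            os.rename(src, dst)
        except FileNotFoundError:
            continue
        return dst
    return None


# Demo on a temporary local directory standing in for the shared one.
root = tempfile.mkdtemp()
queue_dir = os.path.join(root, "queue")
claimed_dir = os.path.join(root, "claimed")
os.mkdir(queue_dir)
os.mkdir(claimed_dir)
for job in ("001", "002"):
    open(os.path.join(queue_dir, job), "w").close()

print(claim_next_job(queue_dir, claimed_dir))
print(claim_next_job(queue_dir, claimed_dir))
print(claim_next_job(queue_dir, claimed_dir))  # queue is empty by now
```

On a network mount such as SSHFS every listing and rename goes over the wire, which is one plausible reason a heavily shared directory makes everything feel slower.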
The next step was to connect the worker node to the real server, since port 5000 no longer scared us.
We started setting up the rest of the laptops, tuning the settings as we went. We wanted to write a nice script that would apply all the settings easily, but, alas, our hands were too clumsy for that, and such things cannot be done right on the first try. All the settings were entered by hand. On the server we stopped the ej-super-run process and left it to deal with the web interface only.
Then we thought: each laptop has 4 cores, and 1 worker node can only test in single-threaded mode.
Give a man a mountain of gold, and he will want another.
Either we run 1 virtual machine, give it a lot of resources, and parallelize across cores inside it, or we simply run 2 virtual machines with 2 cores each.
We did not care whether the system sped up 2 or 3 times, since we still had plenty of machines. We decided to stop there and run 2 virtual machines per laptop. When all 7 laptops were ready, we decided to reward ourselves with sleep at 12 o'clock.
January 26, 08:30
Refreshed, I am at the university; Artem has arrived too. We got out all 13 laptops. The guys from the "network service" quickly crimped the cables and set up the network; as a result 12 of them were online, while the 13th laptop never got network access, apparently because the cable was old. We quickly brought up the first 7, after which the web interface began to slow down terribly; apparently sshfs was downloading the entire directory, which was quite hefty.
Round 2 has started, and we already have 14 worker nodes! I began quietly connecting further nodes one at a time, so as not to overload the system.
The queue on the server never exceeded 10 simultaneous submissions. That is, in principle, 5 laptops would have been enough to run the whole round.
A television crew came, and they were told that we had 24 worker nodes. I had to bring them all up before the end of the Olympiad to keep our word.
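The node counts above follow from simple arithmetic, sketched here in Python. The function names are invented for the example, and the sizing rule assumes that one worker node tests one queued submission at a time, as described for single-threaded testing.

```python
import math

VMS_PER_LAPTOP = 2  # from the text: 2 virtual machines per laptop, one node each


def worker_nodes(laptops, vms_per_laptop=VMS_PER_LAPTOP):
    """Number of worker nodes provided by the given number of laptops."""
    return laptops * vms_per_laptop


def laptops_needed(peak_queue, vms_per_laptop=VMS_PER_LAPTOP):
    """Laptops required so that every queued submission gets its own node."""
    return math.ceil(peak_queue / vms_per_laptop)


print(worker_nodes(7))     # the first 7 laptops
print(worker_nodes(12))    # all 12 laptops that made it onto the network
print(laptops_needed(10))  # peak queue observed during round 2
```

These reproduce the figures in the text: 14 nodes from 7 laptops, 24 from 12, and 5 laptops for a queue of 10.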
As a result, participants did much better in round 2 than in round 1, although there was a participant who scored 400 in round 1 and only 370 in round 2.
Alternative
In fact, everyone had known everything for a long time, and some even turned to Yandex for help. Yandex accepted applications from regions that could not run the regional stage under the new requirements on their own. Applications had to be submitted 10 days before the Olympiad, so we did not consider this route. 26 regions turned to Yandex.
We were also told that things were not going well in other regions either, and that scores were low in general.
This is how we, the technical jury, shamefully ran the regional stage of the All-Russian Olympiad.
Conclusion
UPD:
List of errors:
1) As a student whose curriculum includes parallel programming (OpenMP, MPI), I should have understood that it is impossible to run the Olympiad on 1 machine.
2) It seems I should start taking an interest even earlier, go and find out what is new myself, and not wait until they call. The thing is, this year I deliberately followed ROI news for a month or even 2, but was told nothing useful.