
On revising the results of the JS programming contest

Thanks to the contest participants for their patience. I am writing this post to acknowledge and correct a serious mistake we made when tallying the results.

We received many comments on how the solutions were tested. Our responses to these comments are below.

The correctness tests are incomplete


Our thanks to the two contest participants who sent additions to our set of correctness tests. These additions allowed us to identify several solutions that produce incorrect results in rare cases.

Performance tests give distorted results due to the testing methodology


We ran the participants' solutions inside the "virtual machine" provided by the Node.js vm module, both to prevent the use of require and other possible "surprises", and to block attempts to manipulate the test system. However, it turned out that the vm module seriously distorts the test results. In particular, under this kind of virtualization, access to the global namespace becomes disproportionately slow. Because of this, simply moving helper functions into the body of the filter function can improve performance several times over. Most participants measured the performance of their code with simple wrappers using require, so they did not optimize it for the quirks of this particular virtualization method. We believe that a stylistic choice such as placing helper functions inside or outside the main function should not have such a significant impact on the result.
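To illustrate the effect, here is a minimal sketch (the filter signature and the matches helper are hypothetical; only the placement of the helper differs between the two variants):

    // Minimal sketch of the vm distortion described above.
    const vm = require('vm');

    const helperOutside = `
      function matches(rule, email) { return email.endsWith(rule.slice(1)); }
      function filter(emails, rules) {
        // "matches" lives on the sandbox's global object here, so every
        // call resolves through the vm's slow global-access path.
        return emails.filter(e => rules.some(r => matches(r, e)));
      }
      filter;
    `;

    const helperInside = `
      function filter(emails, rules) {
        // The same helper as a local binding: an ordinary closure lookup,
        // with no slow global access.
        function matches(rule, email) { return email.endsWith(rule.slice(1)); }
        return emails.filter(e => rules.some(r => matches(r, e)));
      }
      filter;
    `;

    const ctx = vm.createContext({});
    const slowFilter = vm.runInContext(helperOutside, ctx);
    const fastFilter = vm.runInContext(helperInside, ctx);
    // Timing slowFilter vs. fastFilter on a large input exposes the gap.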

We were not aware of this peculiarity of the Node.js vm module, but once we learned of it, we considered it serious enough to revise the results. We now test inside an ordinary kernel-level virtual machine: we load each participant's module with require and call the filter function once, launching a new Node.js process for every test pass.
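A minimal sketch of what one such pass might look like (the file names, the filter signature, and the test-data format are illustrative assumptions, not the actual harness):

    // run_once.js -- executed in a fresh Node.js process for every pass.
    const path = require('path');

    const solution = require(path.resolve(process.argv[2])); // participant's module
    const { emails, rules } = require('./testdata');         // hypothetical test data

    const start = process.hrtime.bigint();
    solution.filter(emails, rules);                          // called exactly once
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(ms.toFixed(1));

    // A driver would then launch a new process per pass, e.g.:
    //   const { execFileSync } = require('child_process');
    //   const out = execFileSync('node', ['run_once.js', 'solutions/entry.js']);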

Unfortunately, this fix changes the top three. We apologize both to those whose victory was announced earlier and to those who were initially deprived of well-deserved prizes.

The performance tests have no ? characters in the rules


Unlike the correctness tests, which are designed to check the program's behavior even in artificially constructed rare situations, the performance tests should resemble real life. In particular, real users hardly ever use the ? character in their rules. Most of the rules in our performance test contain no asterisks at all, while in the rest the asterisk replaces an entire part, either before or after the @ symbol, which also reflects the typical use of this feature.
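For example (hypothetical rules, shaped as described above):

    const rules = [
      'boss@work.example',   // most performance-test rules: no wildcards at all
      '*@news.example',      // '*' replaces the entire local part
      'alice@*',             // '*' replaces the entire domain part
    ];
    // '?' and mid-mask '*' are exercised only by the correctness tests.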

Not enough rules in performance tests


Even 100 rules is somewhat more than a typical advanced mail user would have. Unlike the number of emails, we do not expect good scalability as the number of rules grows to such unrealistic values as, say, 1,000.

Not enough emails in performance tests


We decided to use 100,000 emails, since that is a realistic number for an active user over a year. As an experiment, we increased this number to 800,000; larger sizes already start to exhaust RAM for some solutions under the default V8 settings. An enlarged test can be generated with the generate_large_test.js script using the xlarge option. We ran all the correct solutions on it and saw that increasing the amount of input data shuffles the leaders somewhat.
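For reference, generating the enlarged test looks roughly like this (assuming the option is passed as a plain command-line argument):

    node generate_large_test.js xlarge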

100,000, 800,000, and values in between are all, in principle, equally valid options. We could have switched to the enlarged test, or taken some weighted average. However, we decided to keep the initial version with 100,000 emails in order to minimize the change in testing methodology relative to what was originally published. Whereas the virtualization problem above introduced distortions so monstrous that the competition could no longer be called fair, the size of the test data could legitimately be chosen anywhere within a reasonable range, and the size we chose was perfectly reasonable.

The tests should have been published in advance


We did not publish the tests at the start of the competition so as not to create a situation in which participants would optimize for the specific tests, and some might even try to game the test system. For example, noticing that our performance-test generator never uses the ? character and places the * character only at the beginning or end of a mask, a cunning participant could submit a fully correct version of the algorithm for the correctness tests (where the number of emails is small), and switch to a faster version that supports only the cases found in the performance tests when the number of emails is large.
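Such a dual-path entry might look roughly like this (the filter interface and the threshold are hypothetical; this is an illustration of the trick, not an actual submission):

    // Fully correct matcher: supports '?' and '*' anywhere in the mask.
    function matchFull(mask, email) {
      const re = new RegExp('^' + mask
        .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
        .replace(/\*/g, '.*')
        .replace(/\?/g, '.') + '$');
      return re.test(email);
    }

    // Fast matcher that handles only the shapes seen in the performance
    // tests: literal masks, '*@domain' and 'local@*'.
    function matchFast(mask, email) {
      if (mask.startsWith('*')) return email.endsWith(mask.slice(1));
      if (mask.endsWith('*'))   return email.startsWith(mask.slice(0, -1));
      return email === mask;
    }

    module.exports.filter = function (emails, rules) {
      // Few emails => probably a correctness test: take the slow, correct path.
      const match = emails.length < 1000 ? matchFull : matchFast;
      return emails.filter(e => rules.some(r => match(r, e)));
    };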

Nevertheless, we have decided to publish more information in the future about how solutions will be tested. For example, without revealing the test data itself, we could announce its volume in advance.

Results


The new testing system and the final results have been published in the repository on GitHub. The corrected results are in the next post.

Once again, I apologize to everyone affected by this story.

Source: https://habr.com/ru/post/274961/

