JS Programming Contest: Word Classifier (Preliminary Results)

Thank you for waiting! We are publishing the preliminary results of the programming contest.

A total of 312 solutions were tested. Of these, 50 crashed or hung, and 3 more were too slow to pass all the tests. Of the remaining 259 solutions, 12 were declared “out of competition” for various reasons: they did not work without a correction to the data file type (the authors forgot to tick the “gzip” box), or they were submitted by Hola employees.

These results are preliminary. If we made no mistakes in compiling them, they will become final on June 20, 2016. At that point, the solution IDs will be replaced with the names or pseudonyms of the authors.
The winning solution gave 83.67% correct answers. Full lists of solutions with their test results are in the English version of this post on GitHub.
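For readers who want to see how a "percentage of correct answers" figure like this can be computed, here is a minimal sketch. It is not the organizers' test harness: the helper name `accuracyPercent`, the classifier signature (string in, boolean out), and the shape of the test cases (an object mapping each word to the expected answer) are all assumptions made for illustration, and the toy classifier is deliberately naive.

```javascript
// Hypothetical helper: share of correct answers, as a percentage.
// `classify` stands in for a contestant's classifier (assumed: string -> boolean).
// `cases` maps each test word to the expected answer (assumed shape).
function accuracyPercent(classify, cases) {
  const words = Object.keys(cases);
  if (words.length === 0) return 0; // avoid dividing by zero on an empty test set
  let correct = 0;
  for (const word of words) {
    if (Boolean(classify(word)) === cases[word]) correct++;
  }
  return (100 * correct) / words.length;
}

// Toy classifier, for illustration only: accepts short strings made of
// lowercase letters and apostrophes.
const toyClassifier = (word) => /^[a-z']+$/.test(word) && word.length < 10;

// Made-up test cases; the last one is misclassified by the toy classifier.
const sampleCases = { cat: true, dog: true, xqzvbnmkltr: false, zzyzx: false };

console.log(accuracyPercent(toyClassifier, sampleCases).toFixed(2) + '%'); // prints "75.00%"
```

A real evaluation would run the same loop over a far larger set of test words.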

There we also publish “raw” machine-readable test results for each solution, which contain more varied information than the summary tables. You can use this data to run your own analysis of the results, which we would be very happy to see.

About choosing a dictionary

Many people wondered why we chose such a strange dictionary, in which many of the “words” can hardly be called English. It was important to us that a 100% result be unattainable; otherwise we could not have chosen the best among the solutions that reached 100% without additional criteria such as performance. Standard spelling dictionaries contain between 50,000 and 165,000 words. Even a 165,000-word dictionary could easily have been compressed to fit into 64 KiB together with the unpacking code. On the other hand, had we reduced the size limit proportionally (to 16 KiB or even less), there would have been a noticeable shortage of space for code, and the contest would have turned into a contest in minimizing code length. We did not want to go in that direction, so we chose the largest “dictionary” we could find. It includes every imaginable highly specialized term, rare spelling variants, and even some nonexistent words produced by false positives of the inflection (stemming) algorithm. As a result, only a quarter of the entries can properly be called English words. The remaining entries are not entirely random, however: they share similar statistical properties. That is why we accepted this compromise and chose the “insane” size dictionary from those offered by the SCOWL project.

Source: https://habr.com/ru/post/303178/
