
Today, May 11, 2013, at 01:41:39.8 UTC (05:41:39.8 Moscow time), the millionth article appeared in the Russian section of Wikipedia. By coincidence, the Russian section celebrates its 11th anniversary today. The Life Extension Foundation article was created by user UG72. Disputes have already flared up about whether the article deserves to exist, but the fact that it was this article that crossed the million mark has been established unequivocally.
The Wikipedia article counter shows the number of articles that contain at least one link (the rule has two other possible settings). Its value can therefore be affected not only by the creation and deletion of articles, but also by renaming and even by any ordinary edit. Add to this that on the eve of a milestone, participants start publishing their drafts en masse in the hope that one of them becomes the milestone article, and that the counter, being a not very important thing, is updated asynchronously. As a result, it becomes very difficult to work out which article is the right one. But everyone wants to know!
Still, there is a way out.
Because any action can affect the counter and everything happens very quickly, attempts to work out the article's number retrospectively are doomed. You need to watch the counter in real time. When Wikipedia was not yet so well known, it was enough to open the list of new articles at the right moment (the counter was displayed there at the time) and manage to take a screenshot. But today, for example, the round value of the counter lasted less than two seconds.
The Wikimedia Foundation has Toolserver, a set of servers to which the databases of the Foundation's projects are replicated. Any technically competent participant can request shell access to them. Once access is granted, Toolserver resources can be used for anything that benefits the Foundation's projects.
The current value of the article counter is stored in the database, and so, of course, is information about new articles. Therefore, to capture the milestone moment, it is enough to poll the database several times per second, watch for counter changes, and log them. The log looks like this:
06:05:25.02 999397 Kasiski,_Friedrich
06:05:25.51 999398 Kasiski,_Friedrich
06:09:02.67 999398 Krivolapov,_Grigory_Arhipovich
06:09:03.32 999399 Krivolapov,_Grigory_Arhipovich
06:10:16.17 999399 Light_industry_Russia
06:10:18.39 999400 Light_industry_Russia
Usually each article appears in it twice: the first time when the "last created article" field changes, the second time when the counter value changes. Thus, for example, the article Kasiski, Friedrich was number 999398.
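The polling approach can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: `watch` and `read_state` are hypothetical names, and `read_state` stands in for whatever query returns the current counter and the newest article title, since the tables involved are not named here.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Iterator, Optional, Tuple

State = Tuple[int, str]  # (article counter, title of the newest article)


def watch(read_state: Callable[[], State],
          interval: float = 0.2,
          max_polls: Optional[int] = None) -> Iterator[str]:
    """Poll read_state() and yield one log line whenever the state changes."""
    previous: Optional[State] = None
    polls = 0
    while max_polls is None or polls < max_polls:
        state = read_state()
        polls += 1
        if state != previous:
            # Timestamp with hundredths of a second, as in the log above.
            stamp = datetime.now(timezone.utc).strftime("%H:%M:%S.%f")[:-4]
            yield f"{stamp} {state[0]} {state[1]}"
            previous = state
        if interval > 0:
            time.sleep(interval)
```

Because the counter and the title are compared together, a new article normally produces two lines, one for each field changing, which matches the pairs seen in the log.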
Tonight, right around the milestone, there were problems with access to Toolserver. The tracking script kept running and registering new articles, but the counter value was different! I could not quickly figure out why this was happening. The monitoring tools reported that replication was running correctly and without lag. The difference in counter readings drifted slowly around 100 articles. So the script had to be urgently rewritten to take its data directly from the site. The instance working with the database was left running just in case.
MediaWiki has a great API that lets you pull out a lot of interesting data. You can formulate an API request that returns both the counter value and the latest new pages at once:
ru.wikipedia.org/w/api.php?format=jsonfm&meta=siteinfo&action=query&siprop=statistics&list=recentchanges&rctype=new
The required data is in the fields .query.statistics.articles and .query.recentchanges[0].title. You then do the same thing with this data: poll it constantly and log any changes. With this approach the asynchrony of the counter shows up in fewer cases.
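Reading those two fields from a script might look like the sketch below. It uses the same query parameters but `format=json` instead of `jsonfm`, since `jsonfm` is the human-readable HTML rendering. The function names and the User-Agent string are made up for this illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Tuple

API_URL = "https://ru.wikipedia.org/w/api.php"

# Same parameters as the request above, with machine-readable JSON output.
PARAMS = {
    "action": "query",
    "format": "json",
    "meta": "siteinfo",
    "siprop": "statistics",
    "list": "recentchanges",
    "rctype": "new",
}


def build_url() -> str:
    """Assemble the full request URL from the parameters."""
    return API_URL + "?" + urllib.parse.urlencode(PARAMS)


def extract_state(payload: dict) -> Tuple[int, str]:
    """Return (article counter, newest page title) from a decoded response."""
    query = payload["query"]
    return int(query["statistics"]["articles"]), query["recentchanges"][0]["title"]


def fetch_state(timeout: float = 5.0) -> Tuple[int, str]:
    """Perform one HTTP request and extract the two fields of interest."""
    request = urllib.request.Request(
        build_url(),
        headers={"User-Agent": "counter-watch-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_state(json.load(response))
```

Keeping `extract_state` separate from the network call makes it possible to check the field paths against a saved response without hitting the site.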
Since an HTTP request takes longer than a database query, I also launched the same script from my personal server, just in case. With that I calmed down, lay low, and waited.
The article was created. The three logs around the million mark look like this:
Toolserver, data from a copy of the database (https://toolserver.org/~kalan/ruwiki-1m.txt):
01:36:32.57 999878 Klavdievo
01:36:32.89 999879 Klavdievo
...
01:41:37.88 999908 Finely
01:41:38.30 999909 Finely
01:41:38.49 999909 Kruchinin,_Vladimir_Fyodorovich
01:41:38.93 999909 Kalyamin,_Vyacheslav_Ivanovich
01:41:39.09 999910 Kalyamin,_Vyacheslav_Ivanovich
01:41:40.69 999911 Kalyamin,_Vyacheslav_Ivanovich
01:41:40.75 999911 Life_Extension_Foundation
01:41:40.91 999912 Life_Extension_Foundation
01:41:41.95 999912 Fortigin,_Vitaly_Sergeevich
01:41:42.11 999913 Fortigin,_Vitaly_Sergeevich
01:41:43.07 999913 The_Emperor_-_power
01:41:43.29 999914 The_Emperor_-_power
01:41:43.35 999914 Chertova,_Nadezhda_Andreevna
01:41:43.97 999915 Glock_21
01:41:44.59 999916 Glock_21
01:41:44.65 999916 Volodya_Shishkin
01:41:44.69 999917 Volodya_Shishkin
...
01:43:17.60 999935 Bobrik_(stanitsa)
01:43:17.69 999936 Bobrik_(stanitsa)
Toolserver, data from the API (https://toolserver.org/~kalan/ruwiki-1m-2.txt):
01:36:32.67 999966 Klavdievo
01:36:32.93 999967 Klavdievo
...
01:41:38.01 999997 Finely
01:41:38.67 999997 Kruchinin, Vladimir Fyodorovich
01:41:39.12 999998 Kalyamin, Vyacheslav Ivanovich
01:41:39.35 999999 Kalyamin, Vyacheslav Ivanovich
01:41:39.80 1000000 Life Extension Foundation
01:41:41.12 1000000 Fortigin, Vitaly Sergeevich
01:41:41.56 1000001 Fortigin, Vitaly Sergeevich
01:41:41.79 1000000 Fortigin, Vitaly Sergeevich
01:41:42.00 1000001 Fortigin, Vitaly Sergeevich
01:41:42.63 1000002 The Emperor is power
01:41:43.09 1000003 Chertova, Nadezhda Andreevna
01:41:43.32 1000004 Glock 21
01:41:44.22 1000004 Volodya Shishkin
...
01:43:17.01 1000023 Bobrik (stanitsa)
01:43:17.22 1000024 Bobrik (stanitsa)
My server, data from the API (http://v.kalan.cc/ruwiki-1m-2.txt):
01:36:32.72 999966 Klavdievo
01:36:32.96 999967 Klavdievo
...
01:41:37.95 999996 Finely
01:41:38.19 999997 Finely
01:41:38.68 999997 Kruchinin, Vladimir Fyodorovich
01:41:38.92 999997 Kalyamin, Vyacheslav Ivanovich
01:41:39.17 999999 Kalyamin, Vyacheslav Ivanovich
01:41:39.88 1000000 Life Extension Foundation
01:41:41.25 1000000 Fortigin, Vitaly Sergeevich
01:41:41.73 1000001 Fortigin, Vitaly Sergeevich
01:41:42.68 1000002 The Emperor is power
01:41:42.92 1000002 Chertova, Nadezhda Andreevna
01:41:43.14 1000003 Chertova, Nadezhda Andreevna
01:41:43.38 1000004 Glock 21
01:41:44.32 1000004 Volodya Shishkin
...
01:43:17.10 1000023 Bobrik (stanitsa)
01:43:17.34 1000024 Bobrik (stanitsa)
All three logs make it clear that the Life Extension Foundation article crossed the mark. From the articles Klavdievo (999967) and Bobrik (stanitsa) (1000024), we can conclude that the difference between the Toolserver counter and the counter of Wikipedia itself was 88 throughout the interval of interest. The Toolserver reading of 999912 for Life Extension Foundation therefore corresponds to exactly 1000000.
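The arithmetic behind that offset can be checked in a few lines. The readings are copied from the logs above; the function name `counter_offset` is just for this illustration.

```python
# Counter readings taken from the logs: (Toolserver replica, live API).
ANCHORS = {
    "Klavdievo": (999879, 999967),
    "Bobrik (stanitsa)": (999936, 1000024),
}


def counter_offset(anchors: dict) -> int:
    """Return the replica-to-live offset, insisting that all anchors agree."""
    offsets = {api - replica for replica, api in anchors.values()}
    if len(offsets) != 1:
        raise ValueError(f"anchors disagree: {sorted(offsets)}")
    return offsets.pop()


offset = counter_offset(ANCHORS)
life_extension_replica = 999912  # replica reading for Life_Extension_Foundation
print(offset, life_extension_replica + offset)  # prints: 88 1000000
```

Both anchor articles give the same offset, so the replica log agrees with the two API logs on which article was the millionth.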
Fortunately, the vandal articles once again missed the mark this time.