
Wikipedia is a free, multilingual, universal online encyclopedia created through the efforts of many users. Today it contains 25 million entries in 285 languages, and nearly half a billion people access it every month. In completeness and depth of coverage, Wikipedia is comparable to the famous Encyclopædia Britannica. Thousands of volunteer editors from around the world constantly add fresh articles. Thanks to their selfless labor, this gigantic storehouse of knowledge keeps being built and developed.
Wikipedia has become the world's most popular source of educational, historical and scientific knowledge and is among the top 10 most visited sites on the Internet. It attracts not only those who seek knowledge or want to share it selflessly, but also marketers and PR managers who try to use the site as an advertising platform by placing paid, made-to-order articles there. A company called Wiki-PR was created, specializing in writing and posting articles and advertisements on Wikipedia. The price of placing one such article ranged from $500 to $1,000. A separate monthly fee of about $50-70 was charged to keep the article or edit from being deleted or, conversely, to have material the customer found undesirable removed and kept off Wikipedia's pages. This point deserves special attention.
Wikipedia is an open community; the first phrase that greets users on the site is: “Welcome to Wikipedia, the free encyclopedia that anyone can edit.” Thus anyone can add an article to Wikipedia or edit an existing one. But if the edits are promotional or biased, they will be noticed and removed during review. To avoid removal, hundreds of additional accounts were created: sockpuppets (from the English "sock puppet", a puppet made from a stocking or sock, worn on the hand, that speaks in its own voice, even to the puppeteer). These accounts took part in the discussion of the edits and created the appearance of active support and approval.
A small digression is needed here. Wikipedia does not prohibit a single user from creating additional accounts. It recognizes that there can be good reasons for them, for example editing articles on different subjects or discussing controversial topics. What Wikipedia does prohibit is taking part in the discussion of a particular topic from several accounts at once.
After the Daily Dot published an article stating that the placement of made-to-order material on Wikipedia was no longer an isolated occurrence but had become a business service, mass checks were carried out on the project. As a result, 250 additional user accounts were blocked; they had been used to post flattering articles about products or companies on the site and to actively lobby for their interests.

In her blog, Sue Gardner, executive director of the Wikimedia Foundation, said that the actions of the editors whose accounts were blocked violate the basic principles that make Wikipedia so highly valued by many people. “Our readers know that Wikipedia is not perfect, but they also know that it serves only their interests and never tries to sell them or recommend any product in one form or another,” Gardner writes.
Gardner stressed that the investigation into the use of sockpuppets for editing articles is not yet complete and that the Foundation intends to keep checking the impartiality and independence of Wikipedia editors.
One of the difficulties in identifying sockpuppets is that only certain site administrators have the right to use technical means of comparing users' IP addresses, and they resort to this only with good reason. The main method of identifying sockpuppets is therefore behavioral: comparing edits and comments for signs that they come from the same person. This requires experience and takes a lot of time, and even then it can fail.
To help Wikipedia, researchers Ragib Hasan and Thamar Solorio of the University of Alabama at Birmingham created a program that can help identify sockpuppets, that is, multiple accounts belonging to one person. The program analyzes text fragments added from different accounts and, on that basis, estimates the likelihood that they were written by the same person. The comparison uses grammatical, punctuation, syntactic and some lexical features of the text.
In experiments, the program identified additional accounts of the same person with 70-75% accuracy, and further work is expected to improve its effectiveness.
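The kind of comparison described above, in which texts from two accounts are compared by punctuation, lexical and rough syntactic features, can be illustrated with a small Python sketch. This is not the researchers' program: the feature set, the `FUNCTION_WORDS` list, the scaling constants and the use of cosine similarity are all assumptions chosen for illustration, and the names `style_features` and `style_similarity` are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative, hand-picked lists; NOT the feature set of the actual program.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]
PUNCTUATION = ",.;:!?-\"'()"


def style_features(text):
    """Map a text fragment to a small vector of writing-style features."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)   # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    word_counts = Counter(words)
    char_counts = Counter(text)

    features = []
    # Lexical: relative frequency of each function word.
    features += [word_counts[w] / n_words for w in FUNCTION_WORDS]
    # Punctuation: relative frequency of each mark.
    features += [char_counts[p] / n_chars for p in PUNCTUATION]
    # Rough syntactic proxies: average sentence and word length,
    # divided by 100 (an arbitrary choice) to keep them on a similar scale.
    features.append(len(words) / max(len(sentences), 1) / 100)
    features.append(sum(len(w) for w in words) / n_words / 100)
    # Share of uppercase characters.
    features.append(sum(c.isupper() for c in text) / n_chars)
    return features


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_similarity(text_a, text_b):
    """Score how alike two fragments are in style; higher means more alike."""
    return cosine_similarity(style_features(text_a), style_features(text_b))
```

Calling `style_similarity(comment_a, comment_b)` on two comments returns a score between 0 and 1. A score like this is only a crude signal: a real system would more plausibly train a classifier on many labeled examples, and the score by itself says nothing about the 70-75% accuracy reported for the actual program.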
The program itself, as well as the tools used to create and test it, can be found on the project page: docsig.cis.uab.edu/?page_id=68
Compared with JStylo, a similar program presented at the 29C3 conference in Berlin, this project has the advantage of being able to analyze small text fragments: JStylo requires about 6,500 words of sample material for each “suspect”, and the text whose authorship is to be established must be at least 500 words long.
A program that can analyze short texts and determine their authorship can be used not only to help Wikipedia identify sockpuppets, but also to identify additional user accounts on forums, in news discussions, on Twitter, and in other kinds of online interaction where short comments and texts are posted.