
Yandex supported Wikidata

Today at the SemTechBiz conference in San Francisco, it was announced that the Wikidata project received a grant in the amount of 150,000 euros from Yandex.

Wikidata is a Wikimedia Foundation project: a collaboratively edited knowledge base for the centralized storage of structured data.

Especially for our tech blog on Habr, we asked Denny Vrandečić, one of the founders of the project, what exactly Wikidata is, how it differs from similar projects, and what benefits it can bring to the infrastructure of the future Internet and to all its users.

What is Wikidata? What are the goals of this project? Why did Wikidata become the first new Wikimedia Foundation project since 2006?

Wikidata is a new Wikimedia Foundation project. The main task of the latter is to provide every person on the planet with free access to all possible knowledge. Our most famous project is Wikipedia, an open encyclopedia, available in more than 200 languages.


Versions in some of these languages (for example, Russian or English) are supported by very active communities. But for many others it is impossible to provide the same level of completeness and relevance. It also turns out that an encyclopedia in a language with too few editors is easier to damage: there are not enough people to correct errors and to keep information from becoming obsolete.

Wikidata is designed to partially fix this. We are building an open multilingual database of structured data that can be used in Wikipedia and in other projects, including ones outside Wikimedia. Our data can be used freely: the license allows almost any use. Anyone can edit the project's data, which is now available in more than 300 languages.

In general, Wikimedia launched this project to improve the quality of the language versions of Wikipedia and allow editors to spend their time more efficiently.

How is Wikidata different from other similar projects - Freebase, DBpedia? Why make another machine-readable database of structured information?

DBpedia extracts data from Wikipedia, i.e. it does almost the opposite of what Wikidata does. It also follows that data in DBpedia cannot be edited directly.

Freebase is a project very similar to Wikidata, and I can imagine us cooperating in the future, from checking the consistency of our data to exchanging it within the limits that our licenses allow. Let's see what happens. The main difference between Freebase and Wikidata is that multilingualism and the availability of sources are much more important for the latter. In fact, Freebase has both, but this is not very easy to see in its interface. The second obvious difference is that Freebase is run by Google, while Wikidata is run by a non-profit organization. This, we hope, slightly reduces the risks of using its data.

Do you plan to integrate with existing data repositories?

We are already integrating with more and more external databases, mainly by linking identifiers. Hundreds of thousands of Wikidata entries are already linked to VIAF, GND, MusicBrainz, IMDB and many other catalogs and databases. We believe that this may be one of the biggest contributions Wikidata will make to the future infrastructure of the web: creating a network of knowledge and connecting entities across the Internet.
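As a rough illustration of what linking through identifiers means, the sketch below models items that each carry identifiers from several external catalogs, and builds a reverse index so that a record in any catalog resolves to the same item. The item IDs, identifier values, and function names are placeholders invented for this example; they are not real Wikidata data and do not reflect its actual data model.

```python
from typing import Dict, Optional, Tuple

# Hypothetical items: item ID -> {catalog name: identifier in that catalog}.
# Every ID and value here is a made-up placeholder for illustration only.
ITEMS: Dict[str, Dict[str, str]] = {
    "Q-example-1": {"VIAF": "viaf-0001", "IMDB": "imdb-0001"},
    "Q-example-2": {"GND": "gnd-0002", "MusicBrainz": "mb-0002"},
}


def build_reverse_index(
    items: Dict[str, Dict[str, str]]
) -> Dict[Tuple[str, str], str]:
    """Map each (catalog, external identifier) pair back to its item ID."""
    index: Dict[Tuple[str, str], str] = {}
    for item_id, external_ids in items.items():
        for catalog, external_id in external_ids.items():
            index[(catalog, external_id)] = item_id
    return index


def resolve(
    index: Dict[Tuple[str, str], str], catalog: str, external_id: str
) -> Optional[str]:
    """Return the item ID linked to an external record, or None if unknown."""
    return index.get((catalog, external_id))
```

With such an index, two records from different catalogs can be recognized as describing the same entity whenever they resolve to the same item ID.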

What does Wikidata have to do with Wikipedia and how does it interact with its language sections?

Wikidata provides data that can be used in the individual language editions of Wikipedia. Our first step was to centralize the links between versions of an article in different languages, which were previously stored in a decentralized way, in each article separately. Now Wikidata is the single central location for such links, and this made it possible to remove a lot of meaningless repetitive information from the language versions of Wikipedia.
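The saving from centralizing these links can be sketched with simple counting. Assuming, for illustration only, that each language version of an article used to list links to all the other versions, n versions stored n * (n - 1) links in total, while one central record needs only n entries. The function names and the sample record below are hypothetical.

```python
from typing import Dict


def decentralized_link_count(n_languages: int) -> int:
    """Links stored when every version lists all the other versions."""
    return n_languages * (n_languages - 1)


def centralized_link_count(n_languages: int) -> int:
    """Entries stored when one central record lists each version once."""
    return n_languages


# Hypothetical central record for one article: language code -> title.
central_record: Dict[str, str] = {"en": "Berlin", "de": "Berlin", "ru": "Берлин"}


def links_for(language: str, record: Dict[str, str]) -> Dict[str, str]:
    """Derive the links shown on one language version from the central record."""
    return {lang: title for lang, title in record.items() if lang != language}
```

Under this assumption, an article that exists in 50 languages would need 2,450 links when each version keeps its own list, and 50 entries in a central record.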

The second step (also still an early one) is to provide Wikipedia with other kinds of structured data. For example, some language versions of Wikipedia already take IMDB identifiers from Wikidata and display them. We hope that this practice will gradually grow and become more and more useful for Wikipedia, although the process cannot be fast: first, Wikidata has to earn the trust of Wikipedians. And they, in turn, have to learn to make proper use of the new opportunities. The communities overlap widely, which will help a lot, but how exactly they start using Wikidata will be the most important and interesting question for us in the future.

Who do you see as a Wikidata user? Are there any examples of success?

We now have over 8,000 active editors on Wikidata. This means that, by number of editors, Wikidata would rank among the ten largest Wikipedias. And since Wikipedia is our main field of application, we are very glad that we are already so useful. So this is our main example and indicator of success.

There are some other great examples of using Wikidata: Wiri, a system that can take questions in natural language (in this case, English) and answer them; Geneology Visualizer; and “Tree of life”, an alternative interface for browsing Wikipedia. Some research projects are already using Wikidata data, for example to analyze gender on Wikipedia and to study the completeness of different language versions. With Wikidata, such things become much easier to explore.

I think this is very good for a project that appeared just a few months ago. And as new features become available — data types for time, coordinates, numbers, or an interface for queries — we hope to further increase our usefulness. We know that several companies already maintain their internal copies of Wikidata. I hope that they bring some benefit. :)

Photos from the SemTechBiz conference

You often speak at conferences and in different universities. How did the active community respond to Wikidata?

They were simply happy. Almost everyone who has ever dealt with links to articles in other languages was glad to see Wikidata emerge. And many people are very curious about where the infobox data will lead us as a community. Practically every Wikipedian I spoke with mentioned that they had been looking forward to such a project and had even thought of doing it themselves. So they are very happy to see that it has finally appeared. Wikidata did not appear overnight. The idea of such a project has been discussed since the first Wikimania conference in 2005 and even earlier. So, like many others, I am happy to see it realized.

Naturally, a community as heterogeneous, intellectual and critically minded as Wikipedia's cannot be expected to hold a single opinion. Quite a few participants are worried about problems that may arise with Wikidata. Their wish to wait, see how it works, make sure that the project is useful, and only then use it is understandable.
Voluntary participation is one of the basic principles of Wikidata. It is an offer. Any community can decide whether to accept it or not, and can choose, down to the smallest detail, what to use and what not to use.

So far, at least, I have been very pleased with the way the community has responded, and I hope that its members will continue to communicate with us constructively, whether by showing enthusiasm or by offering thoughtful criticism.

Tell us a little about the team. How much time did it take to develop the first version?

We started with a team of 12 people working full time, because we wanted to start quickly. The first year of work, full of ambitious goals, was clearly planned out. Our task was to show that we could really cope with the large and complex problems that arose in the work on the project. Everything went fine, and the release took place after about six months. After that, we began to add more and more features. After 10 months, the first Wikipedians began to use our data, and the Wikidata data itself began to be enriched.

We also needed some time to work out our development and deployment cycles and to learn how to communicate effectively with the main office in San Francisco. The Wikidata team is located in Berlin (the German chapter of Wikimedia plays the leading role in the development), and this is the first time a project of this magnitude is being developed without the direct participation of the Wikimedia Foundation. There were many things that could not be started without prior coordination.

At the end of the first year of development, we slowed the pace, and the team shrank accordingly. Currently 10 people are working on Wikidata, and not all of them full time. Much more needs to be done, but no longer in emergency mode: we must be careful, give the community time to catch its breath, and let it keep developing together with us. We continue to add many new features and to work off our technical debt.

The first version was launched about a year ago, and the second - just recently. Can you share some statistics? How many objects have already been added? Does it happen automatically, semi-automatically or completely manually?

The system now describes more than 13 million objects. The numbers are absolutely amazing: support for statements was added only in February, and now, at the end of May, we have passed the mark of 10 million statements. This is very good compared to our expectations: when we had to estimate the number of objects we should have by the end of the first year, we agreed on 100,000.

The work is heavily skewed towards semi-automatic editing. About 85-90% of all edits are made by three or four dozen bots. But because of the incredibly strong growth in the number of Wikidata edits (there are already more of them than on the English Wikipedia), we still have a large number of manual changes in absolute terms. Currently, about one million edits per month are made by more than 8,000 people. Also, the changes made by bots are very limited and tightly controlled by their creators. But this is exactly what we expected and hoped for: an environment in which bots and humans can work together more effectively than in ordinary wikis.

What future do you see for Wikidata? What are your short-term and long-term goals? How do you decide what to do first? Who can participate in making this decision?

In the short term, we still lack several important features: support for data types for time, coordinates, numbers, text and URLs, as well as several basic capabilities, such as sorting and ranking content. In addition, we are constantly working to support more export formats for our data, as well as the ability to query Wikidata. Also this year, the Visual Editor will appear on Wikipedia. We are planning how to integrate into its interface in order to make the interaction between information in Wikipedia and in Wikidata as convenient as possible. We are also working to support not only Wikipedia but other Wikimedia projects as well in the near future. In addition, we want to make our software usable for other use cases.

As for the long-term development plans for Wikidata, the key question for us is whether we can become what we hope to be: the core repository of entity identifiers for the Web. We see a future in which all entities are identified using Wikidata. Applications may or may not use data from Wikidata, but we seriously hope that its identifiers will become an important part of the Web in 2015. If Wikidata succeeds in this, I will consider that we have laid an important stone in the foundation of a more intelligent Web, where exchanging data between heterogeneous sources is easier and which is more useful to every user than we can even imagine now.

For now, though, our tasks are more modest: to support Wikipedia by improving its quality and reducing the effort needed to maintain it. And, in this way, to support the encyclopedia in its overarching mission of bringing knowledge to all people of the world.

Source: https://habr.com/ru/post/182360/

