
Google wants to measure the importance of sites by facts, not links

Google's research team has published a paper on arXiv.org titled Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources, which describes how to calculate a Knowledge-Based Trust (KBT) score, a reputation measure for a given web page. The plan is for KBT to become the basis of a future Google search algorithm that ranks sites according to their “reliability”.

PageRank, Google's link-based ranking algorithm, determines the importance of a web page from the number and importance of the links leading to it. The real Google search takes many more factors into account, such as the presence of certain words on a site's pages, the relevance of the information, the user's location, and mobile-friendliness; there are about 200 such factors. The September 2013 update of the search algorithm, known as Hummingbird, is believed to have taught Google to respond not only to keywords but also to the context and images that accompany them. Last year's Pigeon update made search results for location-dependent queries more relevant.

The new approach to ranking treats the importance of a web page as a numerical measure of how reliable its facts are. As before, a crawler scans the site, but it now extracts “assertions” from it and checks their accuracy against the Knowledge Vault knowledge base. This knowledge base, owned by Google, currently contains approximately 1.6 billion facts collected automatically from the Internet. Its main difference from the better-known Knowledge Graph is that it is “omnivorous”. Whereas Knowledge Graph draws on sources considered reliable, such as Wikipedia and Freebase, Knowledge Vault shuns nothing and gathers information from every site from which anything at all can be extracted. The reliability of a resource is then determined by how many of its extracted facts match those stored in Knowledge Vault.
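The matching idea can be illustrated with a deliberately naive sketch. It assumes that facts are stored as (subject, predicate, object) triples and that a page's score is simply the share of its checkable assertions that agree with the knowledge base. The function name, the triple format, and the sample data are all hypothetical, and this is not the probabilistic model from the paper, only a toy version of "count the matches".

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical representation: a fact is a (subject, predicate, object) triple.
Triple = Tuple[str, str, str]


def naive_trust_score(
    extracted: Iterable[Triple],
    knowledge_base: Dict[Tuple[str, str], str],
) -> Optional[float]:
    """Return the share of checkable assertions that match the knowledge base.

    Assertions whose (subject, predicate) pair is unknown to the knowledge
    base are skipped, since they can be neither confirmed nor refuted.
    Returns None when nothing on the page could be checked.
    """
    checked = 0
    matched = 0
    for subject, predicate, obj in extracted:
        known = knowledge_base.get((subject, predicate))
        if known is None:
            continue  # the knowledge base has no opinion on this assertion
        checked += 1
        if known == obj:
            matched += 1
    if checked == 0:
        return None
    return matched / checked


# Toy knowledge base and page, invented purely for illustration.
toy_kb = {
    ("Paris", "capital_of"): "France",
    ("Barack Obama", "nationality"): "USA",
}
toy_page = [
    ("Barack Obama", "nationality", "USA"),  # agrees with the knowledge base
    ("Paris", "capital_of", "Germany"),      # contradicts it
    ("Mars", "color", "red"),                # not covered, so skipped
]
score = naive_trust_score(toy_page, toy_kb)
```

Here two of the three assertions can be checked and one of them matches, so the toy score is 0.5. A real system would also have to account for extraction errors and for mistakes in the knowledge base itself, which simple counting ignores.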

On test data, the probabilistic model proposed by the authors showed satisfactory results. KBT scores were then calculated automatically for 119 million real web pages. Subsequent manual verification showed that real data lends itself well to the new ranking system. How soon the results of the study will affect Google's existing search algorithm is still unknown.


Source: https://habr.com/ru/post/377011/

