
Page Rank in the Web 2.0 era - Part 1

Elections are held in order to find out whose election forecast turned out to be more accurate. (c) Robert Orben
To appreciate Google’s contribution to search engine development, you need to go back about 20 years. In those troubled times, the amount of information on the Internet was hundreds of times smaller than it is now, yet finding what you needed was far harder. A user could spend a long time on a search engine’s site, rephrasing a query in different ways and still not get the desired result. There were even agencies that searched the Internet for a fee. At the dawn of search engines, the importance of a page was determined by many superficial factors, such as HTML markup, the number of matching terms, the headings, and the boldness of the fonts on the page. Not infrequently, a specially crafted page, or a copy of an original page stuffed with the right headings and terms, ended up at the top. From a human point of view such a page made no sense at all, yet it had a very high search engine ranking.

In 1997, two students at Stanford University proposed the famous PageRank algorithm. This is one of those rare cases where engineers climbed out of a long-standing swamp with a simple, elegant solution: a single step that closed a whole pile of problems and predetermined the outcome of the battle between SEO specialists and search engines for many years to come. The essence of PageRank is “democracy” in the world of the Web. Each page that contains a link to another site “votes” for it. In this way the most frequently cited, authoritative primary sources rise to the top: PageRank lifts the most popular sites, which, like air bubbles in water, float upward on the “opinions” of a large number of less popular sites.

This scheme worked well in the ecosystem of the early 2000s, which was dominated by small sites run by webmasters and content managers. With the advent of Web 2.0, users themselves became the main source of information, and that changed the Internet in two ways. First, the huge flow of user-generated content led to the emergence of giant sites with millions, and sometimes tens or hundreds of millions, of pages. Second, sites began to contain a lot of unstructured content poorly adapted for search engines, full of local memes and syntax errors. A topic created on a forum or blog under one heading can easily drift into an entirely different area of discussion. When searching such sites, the main problem is no longer determining the authority of the site but correctly ranking pages within the site itself, because a single query may now match hundreds or thousands of its pages. PageRank does not help in such cases, and many search engines fall back on techniques from the “pre-Google” era, such as analyzing headings, tags, and so on.
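To make the “voting” mechanism described above concrete, here is a minimal power-iteration sketch of classic PageRank. This is the standard textbook formulation, not Google’s production implementation; the function name, the toy link graph, and the damping factor of 0.85 are the usual illustrative defaults, not anything from the original article.

```python
import numpy as np

def pagerank(links, damping=0.85, tol=1e-8, max_iter=100):
    """Power-iteration PageRank over {page: [pages it links to]}.

    Assumes every page mentioned as a link target is also a key in `links`.
    """
    pages = sorted(links)
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)

    # Column-stochastic transition matrix: M[j, i] = 1/outdeg(i) if i links to j,
    # i.e. page i splits its single "vote" evenly among the pages it links to.
    M = np.zeros((n, n))
    for src, targets in links.items():
        if targets:
            for dst in targets:
                M[index[dst], index[src]] = 1.0 / len(targets)
        else:
            # Dangling page with no outgoing links: distribute its vote evenly.
            M[:, index[src]] = 1.0 / n

    # Start from a uniform distribution and iterate until the ranks settle.
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1 - damping) / n + damping * M @ rank
        if np.abs(new_rank - rank).sum() < tol:
            break
        rank = new_rank
    return dict(zip(pages, rank))

# Toy web: "c" is cited by three pages and ends up with the highest rank,
# while "d", which nobody links to, sinks to the bottom.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(web))
```

The key property is that a page’s rank depends not just on how many pages link to it, but on how authoritative those linking pages themselves are, which is exactly the “bubbles rising through the water” picture.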

In the next part, I will discuss whether this problem can be worked around with machine learning, and how to make a machine rank pages within a site, given the site’s unique terminology, using search on this very site as an example.


Source: https://habr.com/ru/post/429902/

