
Search Algorithms, Inverted Index - Part 1

With this article I begin a series on SEO covering theory, practice, and tips. Naturally, let's start with the basics. This material briefly describes the algorithms modern search engines use to search, how indexing works, and which mathematical models are used when searching for documents.


What will you learn?


Search algorithms. What indexing and the inverted index are. The mathematical models used by modern search engines.

Search algorithms


  1. Direct search - sequential enumeration of all the data;
  2. Inverted index - a list of words (the index file) sorted alphabetically, where each word is stored with its position and other parameters of its occurrence in the document.
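The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the collection `DOCS`, the function names, and the whitespace tokenization are all hypothetical and not taken from any real search engine.

```python
from collections import defaultdict

# Hypothetical toy collection: document id -> text.
DOCS = {
    1: "search engines build an index",
    2: "an inverted index maps words to documents",
    3: "direct search scans every document",
}


def direct_search(word, docs):
    """Direct search: scan every document sequentially on each query."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if word in text.lower().split()
    )


def build_inverted_index(docs):
    """Build a word -> set of document ids mapping once, ahead of time."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def indexed_search(word, index):
    """Inverted index search: a single lookup instead of a full scan."""
    return sorted(index.get(word, ()))


INDEX = build_inverted_index(DOCS)

print(direct_search("index", DOCS))
print(indexed_search("index", INDEX))
print(sorted(INDEX))  # the words of the index in alphabetical order
```

Both functions return the same documents; the difference is that direct search rereads all the text on every query, while the index is built once and then answers each query with a lookup.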

Inverted index


As you have probably guessed, search engines use the inverted index algorithm, since direct search is far more resource-intensive. Reconstructing a document from the inverted index is lossy (word cases, hyphens, commas, etc. are lost). For that reason a direct (forward) index of the document is also stored, so that a snippet (a fragment of the matching document text shown in the search results) can be displayed.

Document

There was a pop,
The conical forehead.
I went to the bazaar pop
See some product.

Inverted Document Index

bazaar (3,4)
was (1,2)
lived (1,1)
which (1,1)
coy (4,2)
forehead (2,1)
pop (1,3) (3,2)

The parameters here are deliberately primitive and serve only as an example: the line number and the word's position within that line. In practice, the parameters also store the case of the word and the passage it belongs to.
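Building such an index can be sketched as follows. This is a minimal illustration, not how any particular search engine does it: the function name `build_positional_index` and the regex tokenization are assumptions, and the positions are computed from the English lines exactly as printed in the Document section, so they need not coincide with the numbers in the listing above.

```python
import re
from collections import defaultdict

# The sample document, one verse line per text line.
DOCUMENT = """There was a pop,
The conical forehead.
I went to the bazaar pop
See some product."""


def build_positional_index(text):
    """Map each word to a list of (line number, position in line) pairs."""
    index = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Lowercasing and dropping punctuation is exactly the kind of
        # loss that makes the original text unrecoverable from the index.
        words = re.findall(r"\w+", line.lower())
        for position, word in enumerate(words, start=1):
            index[word].append((line_no, position))
    # Store the words in alphabetical order, as in an index file.
    return dict(sorted(index.items()))


index = build_positional_index(DOCUMENT)
for word, postings in index.items():
    print(word, *postings)
```

A word that occurs several times, such as "pop", gets one entry with several position pairs, which is what lets the engine find every occurrence with a single lookup.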

Mathematical model


Three types of mathematical models are used in search:
  1. Boolean (logical) - if the word is present, the document is found; if not, it is not found;
  2. Vector (used by all search engines) - word weight = TF * IDF;
    TF - the frequency of the word in the document
    IDF - the rarity of the word in the collection (the corpus)
  3. Probabilistic - search results are selected manually (with the help of assessors), and from these judgments the engine determines the relevance of pages on its own.
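The first two models can be sketched in Python. The article does not give exact formulas, so this sketch assumes one common variant: TF is the word count divided by the document length, and IDF is the natural logarithm of the number of documents divided by the number of documents containing the word. The collection `DOCS` and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection: document id -> text.
DOCS = {
    "d1": "pop went to the bazaar",
    "d2": "the bazaar sells some product",
    "d3": "the pop looked at the product",
}
TOKENS = {doc_id: text.split() for doc_id, text in DOCS.items()}


def boolean_match(word, doc_id):
    """Boolean model: the word is either present or absent."""
    return word in TOKENS[doc_id]


def tf(word, doc_id):
    """Term frequency: share of the document's words equal to `word`."""
    words = TOKENS[doc_id]
    return Counter(words)[word] / len(words)


def idf(word):
    """Inverse document frequency: higher for words rare in the collection."""
    containing = sum(1 for words in TOKENS.values() if word in words)
    return math.log(len(TOKENS) / containing) if containing else 0.0


def tf_idf(word, doc_id):
    """Vector model weight of a word in a document."""
    return tf(word, doc_id) * idf(word)


def rank(word):
    """Documents with a non-zero weight for `word`, best match first."""
    scores = {doc_id: tf_idf(word, doc_id) for doc_id in TOKENS}
    matching = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(matching, key=lambda doc_id: scores[doc_id], reverse=True)


print(boolean_match("bazaar", "d2"))
print(tf_idf("the", "d3"))  # "the" occurs in every document, so IDF is 0
print(rank("pop"))
```

Under these assumptions a word that appears in every document gets zero weight, however often it is repeated, while a rarer word ranks higher in the document where it makes up a larger share of the text.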

The main thing


Relevance is the degree to which a document matches the query. Promote only relevant documents.

Segalovich I.V., "How Search Engines Work".

P.S. To be continued…

Source: https://habr.com/ru/post/53987/

