
SeoPult Max - following the leaders' pattern

- Spadevil, are you a man or more than a man? - asked Maskull.
- He who is no more than a man is nothing.
- Where have you come from?
- From brooding, Maskull. No other mother can give birth to truth. I brooded and rejected, and brooded again.

D. Lindsay, "A Voyage to Arcturus"


What is the main question of modern SEO? "How do I get the site to the top for certain queries"? Nothing of the sort. What really matters is the cost of a targeted visitor, so the question is: "How do I attract the most targeted traffic for the least money?" And although the share of content in search marketing spending keeps growing, links remain the main budget item. The only place to buy them in the required quantities is a link exchange. But when it comes to concrete actions, it is easy to freeze up. How many links with TCI (Yandex's thematic citation index) 10, 20, 500 and so on should you buy to promote a specific query - for example, "lithium batteries" - within a given period? And TCI is only one of many important characteristics of links and donor pages (the system described below works with 184 criteria). For each of them you somehow need to find the optimal distribution and then assemble a matching link mass from the poorly differentiated pile of donors offered on the exchanges.
Buying links by eye should have stayed in the distant past, in the first decade of the 21st century. Yes, by collecting links from decent donors you can get a pre-optimized site into the TOP; that is no great feat. But with this approach you can spend almost 5 times more than the minimum possible, and at a high enough level of competition that automatically knocks you out of the fight for the first page of results, if not out of business altogether. Fortunately, nature abhors a vacuum: if among the legions of proverbial schoolkid SEOs there is a niche for a scientific approach and mathematics, it will be filled.

A new approach to empiricism in SEO

About two years ago we decided to develop a methodology that would take the experimental approach to promotion effectiveness to a new level. The basic SeoPult algorithms rely on cascades of filters that our analysts tune by hand, using averaged results of experiments on a large but still limited sample of acceptor-donor pairs. The new system was meant to be: a) automatic, based on collecting and processing data in real time; b) working with specific queries rather than groups of queries; c) analyzing far more factors than a human specialist can handle; d) able to identify the success factors of the current leaders of the search results; e) able to use these factors to buy a link mass whose factor patterns are as close as possible to those of the competing leaders for specific search queries.

Specialists from AlterTrader Research were brought in to develop the mathematical principles behind the SeoPult Max technology. (The company was founded in 2004 as a research laboratory whose priority was developing scientific methods for analyzing financial markets and systematic trading technologies; it currently has three departments: trading, financial mathematics and exploratory research.) The result was a methodology and two articles, to which we refer lovers of mathematics: "Algorithm for selecting the most effective set of donors for website promotion in search engines" and "Implementing and analyzing the effectiveness of the method for constructing an optimal set of donors for website promotion in search engines". Below is a popular version of the article by Ilya Zyabrev, Oleg Pozharkov and Irina Pozharkova, adapted for Habrahabr.

By the way, the algorithm is very demanding, both of its input data, which is obtained through varied and rather nontrivial parsing, and of processing power. Even the initial launch required a mini-cluster of servers, and now, given the high demand for SeoPult Max and the enormous amount of computation, the prefix "mini" hardly applies.

Frequency distributions

Consider an example: an acceptor that receives 100 links from donor pages. Let the donor TCI values range from 0 to 950. We divide the range 0...1000 into 10 equal intervals (0-90, 100-190, etc.) and count how many donor TCI values fall into each of them.

TCI        Frequency
0-90       50
100-190    28
200-290    7
300-390    4
400-490    3
500-590    2
600-690    2
700-790    2
800-890    1
900-990    1
Total      100

In this example the acceptor has 50 donors whose TCI falls between 0 and 90, 28 donors whose TCI falls between 100 and 190, and so on. These are absolute frequencies, but it is often more convenient to use relative frequencies, which here show the share of donors whose TCI falls into each interval. To convert to relative frequencies, divide the absolute frequencies by the total number of donors, in this case 100.

TCI        Relative frequency
0-90       0.50
100-190    0.28
200-290    0.07
300-390    0.04
400-490    0.03
500-590    0.02
600-690    0.02
700-790    0.02
800-890    0.01
900-990    0.01
Total      1.00
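
The conversion from absolute to relative frequencies can be sketched in Python. This is only an illustration: the function name tci_histogram and the synthetic donor list are assumptions made for the example, not part of SeoPult Max.

```python
from collections import Counter


def tci_histogram(tci_values, bin_width=100, n_bins=10):
    """Count donors per TCI interval and convert the counts to shares."""
    if not tci_values:
        raise ValueError("at least one donor is required")
    # Integer division maps 0-90 to bin 0, 100-190 to bin 1, and so on;
    # anything above the last interval is clamped into the last bin.
    counts = Counter(min(v // bin_width, n_bins - 1) for v in tci_values)
    absolute = [counts.get(i, 0) for i in range(n_bins)]
    total = len(tci_values)
    relative = [c / total for c in absolute]
    return absolute, relative


# Synthetic donor set (hypothetical values) that reproduces the table above.
bin_counts = [50, 28, 7, 4, 3, 2, 2, 2, 1, 1]
donors = [i * 100 + 50 for i, c in enumerate(bin_counts) for _ in range(c)]

absolute, relative = tci_histogram(donors)
for i, (a, r) in enumerate(zip(absolute, relative)):
    print(f"{i * 100}-{i * 100 + 90}: {a} donors, share {r:.2f}")
```

Relative frequencies make acceptors with different numbers of donors comparable, which matters once distributions of different sites are put side by side.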



A distribution can be built for each factor (we use 184 factors, identified in earlier experiments that measured how donor qualities affect the positions of test acceptors). To build one distribution, we pick a factor, take one acceptor and all of its donors. For the analyzed acceptor we obtain 184 distributions, which together form its frequency pattern, a very specific and distinctive characteristic of the link mass (that is, of the acceptor's full set of donors).

The algorithm implemented in SeoPult Max builds frequency patterns for the TOP-50 of the Yandex search results for a given query. This takes 50 x 184 = 9200 factor distributions, which make up 50 frequency patterns. From these distributions an "ideal" frequency pattern of a results leader is built for the specific search query; it also consists of 184 distributions. For each factor, the "ideal" frequency pattern has the properties that are clearly expressed in the TOP-10 (and more strongly toward the TOP-3) and, as a rule, less pronounced outside the top.
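
The article does not say exactly how the "ideal" pattern is derived from the individual ones. As a rough sketch only, one can assume a weighted average of the sites' distributions for a single factor, with larger weights for the TOP-3. The weighting scheme and the names position_weight and ideal_distribution are hypothetical.

```python
def position_weight(position):
    """Hypothetical weights: TOP-3 count most, the rest of the TOP-10 less."""
    if position <= 3:
        return 3.0
    if position <= 10:
        return 1.0
    return 0.0  # sites outside the TOP-10 do not shape the ideal pattern


def ideal_distribution(distributions, positions):
    """Weighted average of per-site relative-frequency distributions
    for ONE factor; a full pattern would repeat this for all factors."""
    weights = [position_weight(p) for p in positions]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no site from the TOP-10 was supplied")
    n_bins = len(distributions[0])
    return [
        sum(w * d[i] for d, w in zip(distributions, weights)) / total_weight
        for i in range(n_bins)
    ]


# Toy example with two bins and three sites at positions 1, 5 and 20.
site_distributions = [[0.5, 0.5], [0.1, 0.9], [1.0, 0.0]]
ideal = ideal_distribution(site_distributions, positions=[1, 5, 20])
print(ideal)
```

Because every input distribution sums to 1 and the weights are normalized, the averaged distribution also sums to 1.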

An example of working with individual factors

As an example, we show how three factors are calculated.
TRaslm - relevance of the donor to a given search query. The anchor of the link to the acceptor on the donor page may contain words from our query. These words may also appear in the text of the donor page, or the page may carry similar links to other sites. In any of these cases, the search engine will see the donor's relevance to the query as clearly greater than zero. The TRaslm factor is calculated from the aSLM relevance formula:
$$\mathrm{TRaslm}(D, Q) = \sum_{t \in D,\; t \in Q} \mathrm{aSLM}(t)$$
where t is a lemma of a word from donor D,
Q is the set of lemmas of the query words,
aSLM is an approximated spectral language model.

In other words, TRaslm is the sum of the aSLM values of those donor lemmas that also occur in the query. A lemma here is the canonical (dictionary) form of a word.
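
The verbal definition above translates into a few lines of Python. This is a sketch under two assumptions: the aSLM values are already available as a lookup table (how they are computed is outside the scope of this article), and each distinct lemma is counted once. The function name tr_aslm and the sample values are invented for illustration.

```python
def tr_aslm(donor_lemmas, query_lemmas, aslm):
    """Sum the aSLM values of the donor lemmas that occur in the query.

    aslm is assumed to be a dict {lemma: aSLM value}; lemmas missing
    from it contribute nothing.
    """
    query = set(query_lemmas)
    shared = set(donor_lemmas) & query  # each distinct lemma counted once
    return sum(aslm.get(t, 0.0) for t in shared)


# Made-up aSLM values for a toy donor page.
aslm_values = {"website": 0.4, "promotion": 0.7, "cheap": 0.2}
donor = ["website", "promotion", "cheap", "website"]
query = ["website", "promotion"]

score = tr_aslm(donor, query, aslm_values)
print(score)
```

A donor that shares no lemmas with the query gets a score of zero.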

TRCross is an index of the mutual relevance of the donor and the acceptor, which can also be called "pseudo-mathematical". It is calculated for a donor-acceptor pair. The texts of the donor and the acceptor may overlap in some set of words. For each word of a text, the relevance of the document to the one-word query consisting of that word is calculated. The sum of these relevances over the words shared by the donor and acceptor texts, divided by the sum of the relevances of all words in both texts, gives the value of TRCross. The formula for the factor looks like this:

where td are the lemmas of the words of donor D,
ta are the lemmas of the words of acceptor A.
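
One possible reading of this description, sketched in Python. It assumes that the per-word relevances of each document are already computed and passed in as dictionaries, and that the relevances of a shared lemma in both documents are added together in the numerator. The function name tr_cross and the numbers are hypothetical.

```python
def tr_cross(donor_rel, acceptor_rel):
    """Share of total word relevance that falls on lemmas common to
    the donor and the acceptor.

    donor_rel / acceptor_rel: dicts {lemma: relevance of that document
    to the one-word query made of this lemma} (assumed precomputed).
    """
    shared = donor_rel.keys() & acceptor_rel.keys()
    numerator = sum(donor_rel[t] + acceptor_rel[t] for t in shared)
    denominator = sum(donor_rel.values()) + sum(acceptor_rel.values())
    return numerator / denominator if denominator else 0.0


# Toy relevances: the documents share "seo" and "links" but not "cats".
donor_rel = {"seo": 0.5, "links": 0.3, "cats": 0.2}
acceptor_rel = {"seo": 0.6, "links": 0.4}

cross = tr_cross(donor_rel, acceptor_rel)
print(cross)
```

With non-negative relevances the value lies between 0 (no shared words) and 1 (identical vocabularies).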

HostDist - the "distance" between hosts. It is calculated for a donor-acceptor pair from their IP addresses. Yandex uses this distance to calculate the HostRank factor. The formula for the factor:

where n is an integer from 0 to 31, the number of the most significant bit in which the IP addresses of the hosts of donor D and acceptor A differ. If the IP addresses are the same, then HostDist(A, D) = 0.
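
The value n can be found with a XOR of the two addresses. The sketch below covers only this step and assumes that bits are numbered from 0 (least significant) to 31 (most significant); it does not reproduce the mapping from n to HostDist. The function name is invented for the example.

```python
import ipaddress


def most_significant_differing_bit(ip_a, ip_d):
    """Return the number n (0..31) of the highest bit in which two IPv4
    addresses differ, or None if the addresses are identical.

    Assumption: bit 0 is the least significant bit of the address.
    """
    a = int(ipaddress.IPv4Address(ip_a))
    d = int(ipaddress.IPv4Address(ip_d))
    diff = a ^ d  # bits set to 1 exactly where the addresses differ
    if diff == 0:
        return None  # same IP: the text defines HostDist(A, D) = 0 here
    return diff.bit_length() - 1


same_host = most_significant_differing_bit("192.168.1.1", "192.168.1.1")
neighbours = most_significant_differing_bit("192.168.1.1", "192.168.1.2")
far_apart = most_significant_differing_bit("10.0.0.1", "138.0.0.1")
print(same_host, neighbours, far_apart)
```

Under this numbering, hosts in the same small subnet give a small n, while hosts in unrelated networks give a large one.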

Let us calculate these factors for a specific query, "website promotion". At the time the AlterTrader Research team wrote the article, the TOP-10 of the Yandex results looked like this (we took the first 10 sites for the query, ignoring the "spectral" admixture).




For each factor, the distributions of the leading acceptors considered can be plotted on a single graph.


1. Distributions by TRaslm factor


2. Distributions by TRCross factor


3. Distributions by HostDist factor


The graphs for the individual TOP-10 sites look similar, so for each factor a common regularity can be identified and an "ideal" frequency pattern constructed.


Example: "ideal" frequency pattern for the TRaslm factor

If we build similar averaged patterns for each factor for sites outside the TOP-10, we get significant deviations from the ideal leader patterns, and the farther a site is from the TOP, the larger the deviations. The algorithm implemented in SeoPult Max can reduce this deviation to a minimum by changing the acceptor's link mass so that its pattern matches the patterns of the current leaders of the Yandex results for a particular search query.
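
The article does not name the measure of deviation it uses. As an illustration, the sketch below assumes the total variation distance between two relative-frequency distributions, averaged over the factors; the function names and the toy numbers are hypothetical.

```python
def distribution_deviation(p, q):
    """Total variation distance between two relative-frequency
    distributions over the same bins: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def pattern_deviation(pattern, ideal):
    """Mean per-factor deviation; both arguments are dicts
    {factor name: distribution} (184 entries in the real system)."""
    return sum(
        distribution_deviation(pattern[f], ideal[f]) for f in ideal
    ) / len(ideal)


# Toy patterns with a single factor and three bins.
ideal_pattern = {"TCI": [0.5, 0.3, 0.2]}
site_pattern = {"TCI": [0.8, 0.2, 0.0]}

dev = pattern_deviation(site_pattern, ideal_pattern)
print(dev)
```

Any other distance between distributions could be substituted here; the point is only that "closeness to the leaders" becomes a single number that can be minimized.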

All SeoPult clients can now enable the SeoPult Max algorithm for their projects. It performs this analysis across 184 factors and continuously monitors how well the distributions match the current pattern. The set of donors is restructured within a given budget and on the principle of minimal changes to the existing set.
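
To give a feel for what restructuring within a budget can mean, here is a deliberately simplified greedy sketch for a single factor. It is not the SeoPult Max algorithm: it only adds donors and never removes them, and the deviation measure, the candidate format and all names are assumptions made for the example.

```python
def deviation(counts, ideal):
    """Total variation distance between bin counts and an ideal distribution."""
    total = sum(counts)
    if total == 0:
        return 1.0
    return 0.5 * sum(abs(c / total - p) for c, p in zip(counts, ideal))


def greedy_buy(current_counts, ideal, candidates, budget):
    """Repeatedly buy the affordable candidate that reduces the deviation
    most; stop when no purchase helps or the budget runs out.

    candidates: list of (donor_id, bin_index, price) tuples (hypothetical).
    """
    counts = list(current_counts)
    remaining = list(candidates)
    bought = []
    while True:
        best, best_dev = None, deviation(counts, ideal)
        for cand in remaining:
            _, bin_index, price = cand
            if price > budget:
                continue
            counts[bin_index] += 1          # try the purchase
            dev = deviation(counts, ideal)
            counts[bin_index] -= 1          # undo the trial
            if dev < best_dev:
                best, best_dev = cand, dev
        if best is None:
            return bought, counts
        remaining.remove(best)
        counts[best[1]] += 1
        budget -= best[2]
        bought.append(best[0])


# The acceptor has 4 donors, all in bin 0; the ideal split is 50/50.
offers = [("a", 0, 1.0), ("b", 1, 2.0), ("c", 1, 2.0), ("d", 1, 2.0)]
bought, new_counts = greedy_buy([4, 0], [0.5, 0.5], offers, budget=5.0)
print(bought, new_counts)
```

In the example the budget allows two of the three useful donors; the cheap donor "a" is never bought because it would push the distribution further from the ideal one.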

Conclusion

"Thinking is, or ought to be, a coolness and a calmness; and our poor hearts throb, and our poor brains beat too much for that," wrote Melville in Moby Dick. A manual, "all too human" approach to promotion will not give optimal results in 2013. Only complex mathematical methods for processing large amounts of information (which, by the way, is not easy to obtain either) make it possible to work out a guaranteed optimal sequence of actions.

Source: https://habr.com/ru/post/166453/

