
Where the bots stop

The SEO community is famous for its variety of opinions on how much text search engines will index on a single web page.
The question is: how big should an optimized page be, and where is the balance between a "too small" page, which search engines find non-informative, and a "too big" page, where potentially important content may escape the search engines' attention?
As far as I know, no one has yet tried to answer this question with an experiment of their own. SEO forum participants usually limit themselves to quoting the recommendations published by the search engines themselves.
To this day, the SEO community's firm belief that the leading search engines cap indexed text at the notorious hundred kilobytes leaves their clients scratching their heads over what to do with text that goes beyond it.
Experiment
When I decided to set up an experiment to answer this question empirically, my goals were:


Here is how the experiment went. I took 25 pages of various sizes (from 45kb to 4151kb) and inserted unique, non-existent keywords into each page at intervals of 10kb (approximately every 10,000 characters). The keywords were generated automatically, specifically for the experiment, and served as labels for indexing depth. Then the pages were published, and I went to make myself some coffee, because the wait for the search engines promised to be a long one (that's a lot of coffee!).
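Generating such test pages is easy to sketch. This is a minimal reconstruction of the idea, not the author's actual script: filler words with a unique marker keyword planted roughly every 10,000 characters, and a map from each marker to its character offset. The "zqx" prefix is a hypothetical convention chosen to make collisions with real words unlikely.

```python
import random
import string

def make_test_page(size_chars, interval=10_000, seed=0):
    """Build a page of filler words with a unique marker keyword
    roughly every `interval` characters. Returns the text and a
    marker -> character-offset map (the depth labels)."""
    rng = random.Random(seed)
    words, markers = [], {}
    length = 0
    next_mark = interval
    while length < size_chars:
        if length >= next_mark:
            # Plant a depth marker, e.g. "zqxkfjwpeua" (hypothetical format).
            marker = "zqx" + "".join(rng.choices(string.ascii_lowercase, k=8))
            markers[marker] = length
            words.append(marker)
            next_mark += interval
        else:
            words.append("".join(rng.choices(string.ascii_lowercase,
                                             k=rng.randint(3, 9))))
        length += len(words[-1]) + 1  # +1 for the joining space
    return " ".join(words), markers

# A 45kb page gets markers at roughly 10k, 20k, 30k and 40k characters.
page, markers = make_test_page(45_000)
```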
Finally, I saw traces of the Big Three bots (Google, Yahoo, MSN) in the server logs, which gave me the information the experiment needed.
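Spotting those traces amounts to scanning the access log for the crawlers' user-agent strings. A minimal sketch, assuming an Apache combined-format log (note that "msnbot" from that era has since been replaced by "bingbot"):

```python
# User-agent substrings identifying the Big Three crawlers of the time.
BOTS = {"Googlebot": "Google", "Slurp": "Yahoo", "msnbot": "MSN"}

def bot_hits(log_lines):
    """Count crawler visits per engine by scanning user-agent strings."""
    hits = {}
    for line in log_lines:
        for needle, engine in BOTS.items():
            if needle in line:
                hits[engine] = hits.get(engine, 0) + 1
    return hits

# Illustrative combined-format log lines (IPs and sizes are made up).
sample = [
    '66.249.66.1 - - [...] "GET /test01.html HTTP/1.1" 200 46321 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '72.30.61.2 - - [...] "GET /test01.html HTTP/1.1" 200 46321 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp)"',
]
```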
It is important to note that I used special experimental pages for this test. They live on a domain I reserved for such experiments and contain only the text and keywords the test requires. Should a person happen upon these pages full of meaningless gibberish and keywords, they would immediately raise an eyebrow, but people are an entirely unwanted audience here.
After I went through the logs and made sure the search engine bots had dropped by, all that was left was to check each experimental page's ranking for every keyword I had used. For this I used Web CEO Ranking Checker. As you have probably guessed, if a search engine indexes only part of a page, the page will appear in the results only for the keywords located above the scanned limit.
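Once you know which marker keywords a page actually ranks for, the indexing depth falls out directly: it is the offset of the deepest ranking marker. A small sketch of that inference (function and data names are my own, not from the original experiment):

```python
def indexed_depth(marker_offsets, ranking_markers):
    """Estimate how deep an engine indexed a page: the character offset
    of the deepest marker keyword the page actually ranked for.
    `marker_offsets` maps marker -> offset on the page;
    `ranking_markers` is the set of markers the ranking checker
    found the page ranking for."""
    found = [off for m, off in marker_offsets.items() if m in ranking_markers]
    return max(found) if found else 0

# Hypothetical page with markers every 10,000 characters: if it ranks
# for the first two markers but not the third, the engine indexed
# roughly the first 20,000 characters.
offsets = {"zqxaaa": 10_000, "zqxbbb": 20_000, "zqxccc": 30_000}
```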
Results
This graph shows where the Big Three stopped returning my test pages:
[Graph: ranking depth for the Big Three]
Now that I have information about how much of a page's text the search bots download, I can calculate how much text on a page the search engines actually index. Believe me, the results are unexpected, to say the least. But it is all the more pleasant to share them with anyone interested in these burning search engine optimization questions.

[Graph: how much information Yahoo indexes on the test pages]
Second place belongs to the Great (in search quality) and Terrible (in its attitude toward SEO) Google. Googlebot can drag more than 600kb of information off to its myriad servers. Yet Google's results contain only those pages whose keywords were located no more than 520kb from the beginning of the page. This, according to Google, is the page size that is most informative and gives visitors the maximum useful information without forcing them to wade through endless text.
This graph shows how much information Google indexes on test pages.
[Graph: how much information Google indexes on the test pages]
The absolute champion in indexing depth is MSN. MSNbot can download up to 1.1MB of text from a single page. More importantly, it indexes all of this text and serves it in the results. If a page is larger than 1.1MB, the content beyond the limit remains unindexed.
Here is how MSN works with test pages:
[Graph: how much information MSN indexes on the test pages]
MSN behaved in a surprising way on its first visit to the pages. If a page was under 170kb, it showed up perfectly well in the results. Pages exceeding that barrier did not appear in the results at all, even though MSNbot had downloaded them completely.
It seemed that a page over 170kb simply had no chance of appearing in the results. However, after 4-5 weeks the large pages began to show up, revealing the engine's ability to index large pages over time. This makes me think that MSN's indexing speed depends on page size. So if you want certain information on your site to appear in MSN's results as soon as possible, place it on pages "weighing" less than 170kb.
The summary graph shows how much information the search engines download and how much of it ends up stored in their indexes.
[Table: downloaded vs. indexed text volume per engine]
Thus, the experiment confirmed that the leading search engines differ significantly in how much of a page they can actually scan. For Yahoo the limit is 210kb, for Google 520kb, and for MSN 1030kb. Pages under the limit are indexed completely; anything beyond it is not indexed at all.
Going beyond

So is it bad, then, to put texts on your site that go beyond the search engines' index limit?
Of course not! Text beyond what a search engine can index will not harm your position in the results, but it most likely will not help either. If the information is important and useful to your visitors, don't hesitate to leave it on the page.
However, there is a widespread belief that search engines pay more attention to words located at the beginning and at the end of a page. In other words, if the phrase "tennis ball" appears in both the first and last paragraphs of your page, it will carry noticeably more weight in the results than the same phrase written twice somewhere in the middle of the text.
If you want to follow this recommendation but your text exceeds the indexing limit, the important point to remember is that the "last paragraph" is not where you finished writing, but where the search engine finished indexing your page.
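In other words, for a page over the limit, the effective "end" of the page sits at the engine's measured cutoff, not at the real end of the text. A tiny sketch using the limits measured above (function name is my own):

```python
# Index limits measured in the experiment, in kilobytes of page text.
LIMITS_KB = {"Yahoo": 210, "Google": 520, "MSN": 1030}

def effective_last_paragraph_kb(page_size_kb, engine):
    """Where the 'last paragraph' effectively ends for ranking purposes:
    the end of the page, or the engine's index limit, whichever comes first."""
    return min(page_size_kb, LIMITS_KB[engine])

# A 600kb page: for Google, the "last paragraph" is the text around the
# 520kb mark; for Yahoo, around 210kb. A 100kb page ends where it ends.
```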
My translation of the article by Serge Bondar, "Search Engine Indexing Limit: Where Do the Bots Stop".

Source: https://habr.com/ru/post/9385/

