
Robots exclusion profile

Pages that are worth indexing very often contain information that is not meant to be indexed.

image

This is the fourth result for the query "there and here" on Habr.

And do not think that this applies only to navigation that repeats on every page. Almost no one wants news feeds from other sites, advertising, or highly dynamic content (“currently on the site ...”) to be indexed. Some would turn off indexing of comments, and some want to hide the body of their posts from search engines and leave only the headlines.
In principle, this problem will not exist in the Semantic Web, but each of us may well not live to see those bright times.

It turns out that a solution has existed for a long time: the Robot Exclusion Profile microformat.

Here’s how it should look:
<head profile="http://example.org/xmdp/robots-profile#">
...
<div class="robots-noindex">There once was a man from Nantucket…</div>
<p>This page is not about <span class="robots-noindex">pornography</span>.</p>

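To make the idea concrete, here is a minimal sketch in Python of how an indexer could honor the `robots-noindex` class from the example. It is purely hypothetical: the names `IndexableTextExtractor`, `indexable_text` and `VOID_TAGS` are made up for illustration, the code assumes reasonably well-formed markup (every non-void element is closed), and it does not check the `profile` attribute on `<head>`.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IndexableTextExtractor(HTMLParser):
    """Collects page text, skipping anything inside a robots-noindex element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0  # > 0 while we are inside an excluded element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start skipping at a robots-noindex element; track nesting inside it.
        if self.skip_depth or "robots-noindex" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def indexable_text(html):
    """Return the whitespace-normalized text an indexer would keep."""
    parser = IndexableTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


page = (
    '<div class="robots-noindex">There once was a man from Nantucket…</div>'
    '<p>This page is not about <span class="robots-noindex">pornography</span>.</p>'
)
print(indexable_text(page))  # prints: This page is not about .
```

Counting nesting depth instead of remembering a single tag name means that nested elements inside an excluded block are skipped as well.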


Only one thing spoils this fairy tale: as far as I know, the microformat has not yet been finalized and is not supported by search engines.

If anyone happens to be at Google Developer Day or Yandex Subbotnik, please ask the developers whether they would be willing to support at least the draft in their search engines. :)

P.S. If there is already a way to exclude part of a page from the index, please tell us about it.

UPD: I know about <noindex>. But it violates the standard and is not recognized by Google.

Source: https://habr.com/ru/post/70077/

