Sooner or later every blogger (yes, even an advanced one) faces the question: "What should I actually write in robots.txt so that everything just works?"
Naturally, this question came up for me too, but I wanted to do it competently and usefully. I went googling, and all I found were clumsy robots.txt examples copied from the
official site, which some authors passed off as their own creations, dictated by a rare web-building muse.
I probably don't need to say that such examples were poorly suited to our realities
(read: the Yandex search engine — author's note).
So, having gathered together all the information found on the net, plus my own thoughts and my understanding of "how it should be", I wrote the following version.
What do we have?
First, and most importantly: there are separate sections for Google (and everyone else) and for Yandex.
The reason is this: for Google, the canonical meta tag
(added to the template manually, or with the help of one of the numerous SEO plugins) is written on duplicate pages, which should solve the duplicate-content problem; Yandex, however, does not understand it yet, among other things...
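For reference, the canonical tag mentioned above is a single line in the page's `<head>`; the URL here is just a placeholder, and SEO plugins generate this line automatically:

```html
<!-- Canonical tag on a duplicate page (e.g. a category or paginated view),
     pointing search engines at the original post; the URL is a placeholder -->
<link rel="canonical" href="http://mysite.ru/original-post/" />
```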
Second, the Yandex section gets an explicit Host directive, which in any case won't hurt.
Third, there was no goal of keeping as many pages as possible open for link exchanges (Sape), so everything superfluous is closed.
Fourth, more or less standard permalink (SEO-friendly URL) settings and link structures are assumed. If your URL hierarchy and links are different
(for example, they are rewritten by some plugin), adjust the rules to match your settings.
The main errors I have seen:
- Often only the Host directive is written in the Yandex section, leaving Disallow empty — but this construction gives Yandex the right to index everything again, despite the prohibitions in the first section. Which is, in fact, logical: Yandex obeys only the section addressed to it.
- When closing categories, people forget to close the date archives and the author archives.
- System addresses are left open
(trackbacks, login and registration pages). Everything else I explained as best I could in the comments, which you can safely remove once you understand what is going on.
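To illustrate the first mistake, here is a sketch (host name is a placeholder) of the broken construction next to the fixed one:

```
# WRONG: Yandex reads only "its" section, so with an empty Disallow
# every prohibition from the "User-agent: *" section is lost
User-agent: Yandex
Disallow:
Host: mysite.ru

# RIGHT: repeat all the Disallow rules inside the Yandex section,
# and only then add Host
User-agent: Yandex
Disallow: /wp-admin
# (the rest of the Disallow rules from the first section go here)
Host: mysite.ru
```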
I don't claim it is universal or perfect, but I think it will serve as a good starting point for many. robots.txt:
User-agent: *
Disallow: /cgi-bin
# closing the WordPress system folders
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
# closing the login and registration pages
Disallow: /wp-login.php
Disallow: /wp-register.php
# closing trackbacks and RSS feeds
Disallow: /trackback
Disallow: /feed
Disallow: /rss
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: /xmlrpc.php
# closing author archives
Disallow: /author*
# closing comments
Disallow: */comments
Disallow: */comment-page*
# ""
Disallow: /*?*
Disallow: /*?
# but keeping uploads inside wp-content open
Allow: /wp-content/uploads
User-agent: Yandex
Disallow: /cgi-bin
# closing the WordPress system folders
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
# closing categories
Disallow: /category*
# closing date archives
Disallow: /2008*
Disallow: /2009*
# closing author archives
Disallow: /author*
# closing the login and registration pages
Disallow: /wp-login.php
Disallow: /wp-register.php
# closing trackbacks and RSS feeds
Disallow: /trackback
Disallow: /feed
Disallow: /rss
Disallow: */trackback
Disallow: */feed
Disallow: */rss
Disallow: /xmlrpc.php
# closing comments
Disallow: */comments
Disallow: */comment-page*
# ""
Disallow: /*?*
Disallow: /*?
# but keeping uploads inside wp-content open
Allow: /wp-content/uploads
# Host tells Yandex the site's main mirror
Host: mysite.ru
User-agent: Googlebot-Image
Disallow:
Allow: /*
# allowing Googlebot-Image to index all images
User-agent: YandexBlog
Disallow:
Allow: /*
# allowing the YandexBlog bot to index the RSS feed
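A quick way to sanity-check rules like these is Python's standard `urllib.robotparser`. One caveat: that parser matches rules as plain path prefixes and does not understand the `*` wildcards used in some lines above, so the sketch below verifies only the prefix-style directives against a trimmed copy of the file (with `mysite.ru` as a placeholder host, and `/2009*` rewritten as the prefix `/2009`):

```python
# Sanity-check a trimmed copy of the robots.txt above with the standard library.
# Caveat: urllib.robotparser ignores '*' wildcards (prefix matching only),
# so only prefix-style directives are checked here.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-login.php
Allow: /wp-content/uploads

User-agent: Yandex
Disallow: /wp-admin
Disallow: /wp-login.php
Disallow: /2009
Host: mysite.ru
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# System folders and the login page are blocked for everyone...
print(rp.can_fetch("*", "http://mysite.ru/wp-admin/options.php"))       # False
print(rp.can_fetch("Yandex", "http://mysite.ru/wp-login.php"))          # False
# ...uploads stay open, and date archives are closed only for Yandex.
print(rp.can_fetch("*", "http://mysite.ru/wp-content/uploads/a.png"))   # True
print(rp.can_fetch("*", "http://mysite.ru/2009/05/hello-world/"))       # True
print(rp.can_fetch("Yandex", "http://mysite.ru/2009/05/hello-world/"))  # False
```

The unknown `Host:` line is simply ignored by the parser, which mirrors how non-Yandex crawlers treat it.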
P.S. I use this file on my own blogs; I checked its validity and correctness in the webmaster panel and got the result I needed. So if something doesn't suit you, test it and add rules of your own.
P.P.S. I am not a seasoned SEO specialist yet, so I may be mistaken somewhere. But with robots.txt, the only person who makes no mistakes is the one who has no such file at all. :)