
Closing a site mirror from indexing the right way

Hello, %habrauser%.

Today I will show how not to close a site mirror from indexing, and how to do it properly.



Backstory


I work as a webmaster at a company that is large by our city's standards.

We built a site for a client to advertise and sell their products.

The client chose a domain name in the .RU zone, and the site ran successfully for about a year. We still maintain the site and edit it as needed.

After some time, the client wanted a second domain for the site, this time in the .РФ zone.

On his own he found some “professional SEO promotion specialists” whose website ranked first in Yandex results for certain queries.

He signed a contract with them and paid this SEO shop a considerable sum every month.

After a couple of weeks the SEO people finally woke up, contacted us, and got FTP access to the site along with the password for the admin panel.



The "optimization" begins


It all started with them not understanding PHPShop, the engine the site runs on.

They sent us a letter. We told them where things are edited and which files contain the tags they needed, and attached a few links to the official documentation.



The site is built so that the layout skeleton of the main page is in the index.tpl file, and the HTML skeleton of the inner pages is in the shop.tpl file. The content itself lives in the database and is edited through the admin panel, either in a visual editor or as raw HTML.


The SEO people adjusted the layout as they needed, but they also wanted to put their own advertising copyright line on the main page.

The contract, by the way, states that this indexable link to their website may not be removed.



These would-be optimizers did not understand even the basic admin panel. They stuffed the site with their own scripts and hacks, which interfered with the normal operation of the engine and defied common sense. The most harmless of them: they made a static copy of the main page and put this static index.html next to index.php.



Then, using their scripts, they set up a redirect from index.php to index.html.

Why? Even setting aside the fact that everything is supposed to be edited through the admin panel, such a redirect can be done at the web server level with a few lines in .htaccess and mod_rewrite, without involving mod_php just to send a 301 header...
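For illustration only, here is a minimal sketch of what such a server-level redirect could look like. It assumes Apache with mod_rewrite enabled and both files sitting in the site root; it is not taken from the site in question. The THE_REQUEST condition is an added assumption that limits the rule to direct client requests for /index.php, so that internal DirectoryIndex lookups are not redirected.

```apache
RewriteEngine On
# Assumption: redirect only when the client itself asked for /index.php.
# THE_REQUEST holds the original request line, e.g. "GET /index.php HTTP/1.1".
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.php[\s?]
# Send a permanent (301) redirect to the static page and stop processing rules.
RewriteRule ^index\.php$ /index.html [R=301,L]
```

With this in place the web server answers with the 301 itself and PHP is never started for the request.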



The surprise the SEO people had left behind surfaced a month later, when the client wanted to change the information on the main page and came to us.



I do everything according to the spec, save, and slowly start going crazy: the main page still shows the old content.

I check the database: the new information is there. I dig into the engine, and debugging shows that it outputs the information correctly. I go in over FTP and find four unfamiliar scripts in the site root, each with the SEO company's name in its file name.



I spent about an hour poking around in their code. By the way, the PHP code of their scripts earned a place of honor on govnokod.ru.



I clean up their mess and move the work they did from the static index.html into the engine's database.



"We do not mind surprises"


A couple of weeks later a new blunder turns up. It is the one this post is about.



The client calls and angrily says that we have done something to the site again and nothing works. Puzzled, we start investigating once more.



We open the main page of the mirror in the .РФ zone and see a 404 error.

Once again we marvel a little at these people's ability to create work for us out of thin air, and ask ourselves: how?

We open the main domain in the .RU zone (the one they are promoting): everything is fine.

Once more I go digging through the would-be optimizers' code.



And ... I find the following code:

The domain names are deliberately withheld, and I will also keep quiet about which SEO company came up with this.



    if ($_SERVER['HTTP_HOST'] != 'domain.ru' && $_SERVER['HTTP_HOST'] != 'www.domain.ru') {
        $page = file_get_contents_curl('http://domain.ru/hjgjgjhgjh');
        $page = iconv('utf-8', 'windows-1251', $page);
        header('HTTP/1.1 404 Not Found', true, 404);
        $page = str_replace('<head>', '<head><base href="http://domain.ru/">', $page);
        echo $page;
        echo 'not found';
        exit();
    }




facepalm.jpg



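As we can see, they did not want the mirror to get into the Yandex index, and built a brutal hack: for any host other than domain.ru or www.domain.ru, the script fetches a page from the main domain, sends it back with a 404 status, and exits. The mirror is closed not only to robots but to visitors as well.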



I rip out their sloppy code and do the following:



Create a robots.php file with the following contents:

    <?php
    header("Content-type: text/plain");
    // Check which host the request came to
    if (strpos($_SERVER['HTTP_HOST'], 'domain.ru') === false) {
        // Not the main domain, so this is a mirror: close everything from indexing
        echo "User-agent: *\nDisallow: /\nHost: domain.ru";
    } else {
        // The request came to domain.ru: output the regular rules
    ?>
    User-agent: Yandex
    Disallow: /gbook/ID
    Disallow: /search/
    Disallow: /highslide/
    Disallow: /java/
    Disallow: /license/
    Disallow: /pageHTML/
    Disallow: /tagcloud/
    Disallow: /data/
    Disallow: /capcha/
    Disallow: /pages/
    Host: domain.ru

    User-Agent: Slurp
    Disallow: /

    User-agent: *
    Disallow: /gbook/ID
    Disallow: */*.swf
    Disallow: /search/
    Disallow: /highslide/
    Disallow: /java/
    Disallow: /license/
    Disallow: /pageHTML/
    Disallow: /tagcloud/
    Disallow: /webstat/
    Disallow: /data/
    Disallow: /capcha/
    Disallow: /pages/
    Sitemap: http://domain.ru/sitemap.xml
    <?php
    }
    ?>




Half the job is done. Now we delete robots.txt and add the following lines to the .htaccess file (this requires Apache with mod_rewrite):

    RewriteEngine On
    RewriteRule ^robots\.txt$ robots.php [L]




UPD: here is the same rule converted for nginx. I have not tested it myself, but it should work.

If you point out any problems in the comments, I will fix it.

    location = /robots.txt {
        rewrite ^(.*)$ /robots.php;
    }




That's it! When robots request robots.txt, the server gives them the output of robots.php.
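As a possible variation, the same host check can be expressed in mod_rewrite alone, without PHP. This is only a sketch under stated assumptions: the regular robots.txt is kept in place for the main domain, and robots-mirror.txt is a hypothetical static file (not part of the setup described here) that closes everything from indexing.

```apache
RewriteEngine On
# Any host other than domain.ru / www.domain.ru is treated as a mirror.
RewriteCond %{HTTP_HOST} !^(www\.)?domain\.ru(:\d+)?$ [NC]
# robots-mirror.txt is a hypothetical file with the following contents:
#   User-agent: *
#   Disallow: /
RewriteRule ^robots\.txt$ robots-mirror.txt [L]
```

Unlike the strpos() check in robots.php, the anchored pattern matches the host name exactly, so a host that merely contains the string domain.ru is still treated as a mirror.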



P.S. I hope that anyone who closes mirrors with crooked methods like the one above will add this approach to their “piggy bank” of optimization scripts instead.



P.P.S. As it happens, there are very few professional SEO companies these days, and the result is an eternal standoff between SEO people and developers: one side builds a website, the other breaks its functionality through clumsy hands and lack of understanding.



P.P.P.S. If anyone is interested, I can also share a link to that SEO shop.



Thanks for your attention!



UPD: Not as anti-advertising, but as a warning: demis.ru are the SEO heroes of this story.



UPD: Another incident. They went in to change something, and the layout broke. We are already tired of fixing their blunders at our own expense and on our own time.

Source: https://habr.com/ru/post/135921/


