I administer a classified-ads board, and apparently my efforts to keep its content unique (the details are a topic for a separate post) are not in vain: I have noticed a nasty tendency for shady characters to steal ads, either through the RSS feed or by parsing the site's HTML pages.
As SEO folks know, such duplication of content can hurt a site's ranking, because someone else's site may be indexed first and will then be the only one treated as relevant in the search results.
The first and simplest countermeasure is to ping the site that steals the content to find its IP and block it, which helped only at first. Sometimes I simply did not have enough time to track down the thieves, and some sites started using proxy servers.
As a real programmer, I started looking for an automated solution and quickly found one. It is enough to remember that one of the most important differences between a web server and an ordinary user's computer on the Internet is that the server has open ports served by HTTP / SMTP / POP daemons. That is, refusing to serve content to IPs with open ports is a good obstacle.
To start identifying the shady characters and their sites, this PHP code turned out to be enough:
// $ip is $_SERVER["REMOTE_ADDR"] or $_SERVER["HTTP_X_FORWARDED_FOR"]
$fp = @fsockopen($ip, 80, $errno, $errstr, 1); // 1-second connect timeout
if ($fp !== false)
{
    // Port 80 is open: an HTTP server is probably running on this IP
    fclose($fp);
}
Note that the HTTP_X_FORWARDED_FOR field is set when the user goes through an ordinary (non-anonymous) proxy. But this field can be forged, so you need to check both the IP given in HTTP_X_FORWARDED_FOR and the one in REMOTE_ADDR.
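A minimal sketch of collecting both addresses for the check. The helper name candidateIps is hypothetical and not part of the original script. It also assumes that HTTP_X_FORWARDED_FOR may hold a comma-separated chain of addresses, and it discards values that are not valid IPs, since the header is controlled by the client.

```php
<?php
// Hypothetical helper (not from the original script): gather every address worth probing.
function candidateIps(array $server): array
{
    $raw = [];
    if (isset($server['REMOTE_ADDR'])) {
        $raw[] = $server['REMOTE_ADDR'];
    }
    if (isset($server['HTTP_X_FORWARDED_FOR'])) {
        // The header may carry several addresses separated by commas.
        foreach (explode(',', $server['HTTP_X_FORWARDED_FOR']) as $part) {
            $raw[] = $part;
        }
    }

    $ips = [];
    foreach ($raw as $value) {
        $value = trim($value);
        // Skip anything that is not a syntactically valid IPv4/IPv6 address.
        if (filter_var($value, FILTER_VALIDATE_IP) !== false) {
            $ips[$value] = true; // array keys remove duplicates
        }
    }
    return array_keys($ips);
}
```

In a request handler this would be called as candidateIps($_SERVER), with each returned address then passed to the port check.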
Since a session is started for each visitor to the board, the check was performed only on the second hit from each unique IP.
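The "second hit only" rule could look roughly like this. The function shouldProbe and the plain array used as a hit counter are assumptions for illustration; the post does not say where the per-IP counters are actually stored (the session, a database table, or something else).

```php
<?php
// Hypothetical helper: returns true exactly once per IP, on its second hit.
// $hits is any persistent counter store represented as an array (assumption).
function shouldProbe(array &$hits, string $ip): bool
{
    $hits[$ip] = ($hits[$ip] ?? 0) + 1;
    return $hits[$ip] === 2;
}
```

Probing on the second hit rather than the first keeps one-off visitors from paying the connect timeout, while a scraper that walks many pages is still caught almost immediately.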
A one-day trial run of the script proved highly effective: 4 content-stealing sites were identified and neutralized. More than 2000 IPs with port 80 open were caught! So, to analyze those IPs, I used the WHOIS service kindly provided by nic.ru, plus another script. Luckily, they have no protection against automated queries, which is nice ;)
Now everyone on my ban list receives ad text like "This ad was illegally copied from xxxx, which is a violation of copyright and related rights. Please do not use sites that steal content, because such sites may be used to distribute hidden malware and may also collect sensitive information about you." ;)
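Swapping the ad body for banned addresses can be sketched as follows. The function adTextFor, its parameters, and the shortened notice are hypothetical; the ban list is assumed to be a simple array of IP strings.

```php
<?php
// Hypothetical helper: banned IPs get a notice instead of the real ad text.
function adTextFor(string $ip, array $banList, string $adText, string $sourceUrl): string
{
    if (in_array($ip, $banList, true)) {
        return 'This ad was illegally copied from ' . $sourceUrl
            . ', which is a violation of copyright and related rights.';
    }
    return $adText;
}
```

Because the scraper republishes whatever it receives, the notice ends up on the copying site itself.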
However, you should not blindly block every IP that has ports open to the outside. It turns out there are home networks where port 80 serves the proxy's usage statistics (yes, to any visitor!), listing in detail this audience's favorite sites, with the number of hits and megabytes and who downloaded them...
Additional Information:
- To detect visits made through a proxy server, also check ports 8080, 1080 and 3128.
- Searching for duplicates: http://www.copyscape.com/ (I have a premium account; in my personal opinion, the service is wrong very often).
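The proxy-port check from the first item above can be folded into the same probe as port 80. This is only a sketch: openPorts and PROBE_PORTS are hypothetical names, and the third proxy port is taken to be 3128 (Squid's default), which is an assumption. The connector is injectable so the loop can be exercised without touching the network.

```php
<?php
// Ports to probe: 80 from the snippet above plus common proxy ports (3128 is an assumption).
const PROBE_PORTS = [80, 8080, 1080, 3128];

// Hypothetical helper: returns the subset of $ports that accept a TCP connection on $ip.
function openPorts(string $ip, array $ports = PROBE_PORTS, ?callable $connect = null): array
{
    // Default connector: the same fsockopen() probe, with a 1-second timeout per port.
    $connect = $connect ?? function (string $ip, int $port): bool {
        $fp = @fsockopen($ip, $port, $errno, $errstr, 1);
        if ($fp === false) {
            return false;
        }
        fclose($fp);
        return true;
    };

    $open = [];
    foreach ($ports as $port) {
        if ($connect($ip, $port)) {
            $open[] = $port;
        }
    }
    return $open;
}
```

With four ports and a 1-second timeout, a silent host can cost up to about four seconds per probe, which is one more reason to run the check only once per IP.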
I welcome your ideas and comments in the thread!