We often celebrate a surge in site traffic when we see it in Google Analytics: it looks like growing interest in the resource, and such growth seems like good news.
But it is not always a reason for joy. Later we may discover that most of this referral traffic came from spammers. Referral spam has become a serious problem lately.
Referral spam occurs when your site receives fake traffic referrals from spam bots, and this fake traffic is recorded by Google Analytics. If you notice traffic from spam sources in your Analytics reports, you need to take certain steps to remove this data from your statistics.

What is a bot?
Bots are programs whose task is to perform repetitive operations with maximum speed and accuracy.
The traditional use of bots is web indexing: search engines regularly crawl and index the content of Internet resources. But bots can also be used for malicious purposes, for example:
- click fraud;
- harvesting email addresses;
- scraping content from websites;
- distributing malware;
- artificially inflating a resource's traffic.
Based on the tasks they perform, bots can be divided into safe and dangerous.
Dangerous and safe bots
An example of a good bot is Googlebot, which Google uses to crawl and index web pages on the Internet.
Most bots (whether safe or dangerous) do not execute JavaScript, but some do.
Bots that execute JavaScript (such as the Google Analytics tracking code) show up in Google Analytics reports and distort traffic figures (direct traffic, referral traffic) and other session-based metrics (bounce rate, conversion rate, etc.).
Bots that do not execute JavaScript (for example, Googlebot) do not distort the data above, but their visits are still recorded in the server logs. They also consume server resources, eat up bandwidth, and can adversely affect site loading speed.
Safe bots, unlike dangerous ones, obey the directives in robots.txt. Dangerous bots, by contrast, can create fake user accounts, send spam, harvest email addresses, and bypass CAPTCHAs.
Dangerous bots use various methods that make them hard to detect. They can masquerade as a regular web browser (for example, Chrome or Internet Explorer) or as traffic coming from a normal site.
It is impossible to say for certain which dangerous bots can distort Google Analytics data and which cannot, so treat every dangerous bot as a threat to data integrity.
Spam bots
As the name implies, the main task of these bots is spam. They visit a huge number of web resources daily, sending HTTP requests to sites with fake referrer headers. This allows them to avoid being detected as bots.
The forged referrer header contains the address of the website the spammer wants to promote or build backlinks for.
When your site receives an HTTP request from a spam bot with a fake referrer header, it is immediately recorded in the server log. If your server log is publicly accessible, it can be crawled and indexed by Google, which treats the referrer value in the log as a backlink and so ultimately affects the ranking of the website the spammer is promoting.
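To see what the server actually records, here is a minimal sketch (the field names and the sample line are illustrative) that pulls the referrer field out of a line in the common "combined" log format used by Apache and Nginx:

```python
import re

# Regex for the Apache/Nginx "combined" log format; group names are illustrative.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def extract_referrer(log_line):
    """Return the referrer recorded in one combined-log line, or None."""
    match = LOG_PATTERN.match(log_line)
    if not match:
        return None
    referrer = match.group("referrer")
    return referrer if referrer != "-" else None

line = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET / HTTP/1.1" 200 512 '
        '"http://buttons-for-website.com/" "Mozilla/5.0"')
print(extract_referrer(line))  # http://buttons-for-website.com/
```

Every request, legitimate or forged, leaves such a line; a spam bot simply fills the referrer field with the site it wants to promote.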
More recently, Google's indexing algorithms have been built to ignore data from such logs, which negates the efforts of the creators of these bots.
Spam bots that can execute JavaScript, however, are able to bypass the filtering methods used by Google Analytics, and thanks to this ability their traffic shows up in Google Analytics reports.
Botnets
When a spam bot uses a botnet (a network of infected computers, local or spread around the world), it can access a website from hundreds of different IP addresses. In this case, IP blacklists and rate limiting (restricting the rate of traffic sent or received) become largely useless.
A spam bot's ability to distort your website traffic is directly proportional to the size of the botnet it uses. With a large botnet spanning many IP addresses, a spam bot can access your website without being blocked by a firewall or other traditional security mechanisms.
Not all spam bots send referrer headers. In that case, their traffic does not appear as a referral source in Google Analytics reports; it looks like direct traffic, which makes it even harder to detect. In other words, whenever a referrer is not transmitted, Google Analytics treats the traffic as direct.
A spam bot can fabricate dozens of fake referrer headers. If you block one referrer source, spam bots will simply send the site another fake one. Therefore, spam filters in Google Analytics or .htaccess do not guarantee that your site is fully protected from spam bots.
Now you know that not all spam bots are dangerous. But some of them really are.
Truly dangerous spam bots
The purpose of truly dangerous spam bots is not just to distort your traffic, scrape your content, or harvest email addresses. Their goal is to infect other people's computers with malware and make your machine part of a botnet.
Once your computer joins a botnet, it starts being used to send spam, viruses, and other malware to other computers on the Internet.
Hundreds of thousands of computers around the world are used by real people while simultaneously being part of a botnet.
There is a real chance that your computer is part of a botnet without your knowledge. And if you decide to block a botnet, you are most likely also blocking traffic from real users.
There is likewise a chance that merely visiting a suspicious site from your referral traffic report will infect your machine with malware.
Therefore, do not visit suspicious sites from analytics reports on a machine that is not properly protected (without antivirus software installed). It is preferable to use a separate machine specifically for visiting such sites. Alternatively, ask your system administrator to deal with the problem.
Smart spam bots
Some spam bots (such as darodar.com) can send artificial traffic without ever visiting your site. They do this by replaying the HTTP requests that the Google Analytics tracking code makes, using your web property ID. They can send you not only fake traffic but also fake referrers, for example bbc.co.uk. Since the BBC is a legitimate site, when you see this referrer in your report you do not suspect that traffic from a reputable site could be fake. In reality, no one from the BBC has visited your site.
These smart and dangerous bots do not need to visit your website or execute JavaScript. Since they never actually visit your site, their visits are not recorded in the server log. And because they are not in the server log, you cannot block them by any server-side means (IP blocking, user-agent blocking, referral blocking, etc.).
Smart spam bots crawl sites looking for web property IDs. People who do not use Google Tag Manager leave the Google Analytics tracking code directly on their web pages, and that tracking code contains the web property ID. A clever spam bot steals the ID and may pass it on to other bots; there is no guarantee that the bot that stole your web property ID and the bot sending you artificial traffic are the same "person".
You can mitigate this problem by using Google Tag Manager (GTM): deploy your Google Analytics tracking through GTM. If your web property ID has already been harvested, it is probably too late to solve the problem this way; all you can do then is switch to a different ID or wait for a fix from Google.
Not every site comes under attack from spam bots. Spam bots are built to find and exploit a web resource's weak points, so they attack poorly protected sites. Accordingly, a page hosted on "budget" hosting, or one running a home-grown CMS, has a good chance of being attacked.
Sometimes, for a site that frequently comes under attack from dangerous bots, simply changing your web hosting is enough. This simple step can really help.
Follow the steps below to discover sources of spam.
1) Go to the referral traffic report in your Google Analytics account and sort it by bounce rate in descending order:

2) Look at referrers with a 100% or 0% bounce rate and 10 or more sessions. Most likely, they are spammers.
3) If one of your suspicious referrers belongs to the list of known spam sites below, this is referral spam and you do not need to check it yourself:
buttons-for-website.com
7makemoneyonline.com
ilovevitaly.ru
resellerclub.com
vodkoved.ru
cenokos.ru
76brighton.co.uk
sharebutton.net
simple-share-buttons.com
forum20.smailik.org
social-buttons.com
forum.topic39398713.darodar.com
An exhaustive list of spam sources can be downloaded here.
4) If a suspicious referrer's identity could not be verified, you may take the risk and visit the questionable website; perhaps it really is a normal resource. Make sure antivirus software is installed before visiting such resources: they can infect your computer the moment you open their page.
5) After confirming which referrers are dangerous bots, the next step is to block them from visiting your site again.
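The bounce-rate check from step 2 can be automated if you export the referral report from Google Analytics as CSV. A minimal sketch (the column names `Source`, `Sessions`, and `Bounce Rate` are assumptions about your export format):

```python
import csv
import io

def flag_suspicious(csv_text, min_sessions=10):
    """Flag referrers whose bounce rate is exactly 0% or 100%
    with at least `min_sessions` sessions (column names are assumed)."""
    suspects = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sessions = int(row["Sessions"])
        bounce = float(row["Bounce Rate"].rstrip("%"))
        if sessions >= min_sessions and bounce in (0.0, 100.0):
            suspects.append(row["Source"])
    return suspects

sample = """Source,Sessions,Bounce Rate
buttons-for-website.com,42,100.00%
example-blog.org,15,63.40%
ilovevitaly.ru,12,0.00%
"""
print(flag_suspicious(sample))  # ['buttons-for-website.com', 'ilovevitaly.ru']
```

Flagged referrers still need a manual look, but a script like this narrows the list quickly.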
How can you protect your site from spam bots?
Create an annotation on your chart with a note explaining what caused the unusual traffic surge. This will let you discount that traffic during analysis.

Block referral spam by its signature. Add the following code to the .htaccess file (or to the web configuration if you use IIS):
RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*buttons-for-website\.com [NC]
RewriteRule .* - [F]
This code will block all HTTP and HTTPS requests referred from buttons-for-website.com, including its subdomains.
Block the IP address used by the spam bot. Open the .htaccess file and add the code shown below:
RewriteEngine On
Options +FollowSymlinks
Order Deny,Allow
Deny from 234.45.12.33
Note: do not simply copy this code into your .htaccess; the example IP will not match your attackers. It is shown only to illustrate how IP blocking in an .htaccess file works.
Spam bots can use many different IP addresses, so systematically update the list of blocked spam-bot IP addresses on your site.
Block only the IP addresses that actually affect your site. There is no point trying to block every known address: the .htaccess file will become bulky and hard to manage, and web server performance will suffer.
Is the number of blacklisted IP addresses growing rapidly? That is a clear sign of a security problem. Contact your web host or system administrator. You can use Google to find published blacklists of IP addresses, and automate the work with a script that finds and bans IP addresses whose harmfulness is beyond doubt.
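As a starting point for such automation, here is a hedged sketch that turns an IP blacklist (however you collect it) into Apache 2.2-style Deny directives; the function name and sample addresses are illustrative:

```python
def build_htaccess_denies(blacklisted_ips):
    """Generate Apache 2.2-style Deny directives from an IP blacklist.
    Deduplicates and sorts so the output stays stable between runs."""
    lines = ["Order Deny,Allow"]
    for ip in sorted(set(blacklisted_ips)):
        lines.append(f"Deny from {ip}")
    return "\n".join(lines)

# Example blacklist; in practice this would come from your log analysis.
ips = ["234.45.12.33", "198.51.100.7", "234.45.12.33"]
print(build_htaccess_denies(ips))
```

A cron job could regenerate this fragment and splice it into .htaccess, keeping the blocklist current without hand-editing.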
Block entire ranges of IP addresses used by spam bots. When you are confident that a specific range of IP addresses is used by a spam bot, you can block the whole series at once, as shown below:
RewriteEngine On
Options +FollowSymlinks
Order Allow,Deny
Allow from all
Deny from 76.149.24.0/24
Here, 76.149.24.0/24 is a CIDR range (CIDR is a method of representing ranges of IP addresses).
Blocking by CIDR is more efficient than blocking individual IP addresses, because a single line covers an entire range and the configuration stays small.
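Python's standard library can do the range-to-CIDR conversion for you; a small sketch using the stdlib `ipaddress` module:

```python
import ipaddress

def range_to_cidrs(first_ip, last_ip):
    """Collapse an inclusive IP range into the smallest set of CIDR blocks."""
    return [str(net) for net in ipaddress.summarize_address_range(
        ipaddress.ip_address(first_ip), ipaddress.ip_address(last_ip))]

def cidr_bounds(cidr):
    """Expand a CIDR block back into its first and last addresses."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.broadcast_address)

print(range_to_cidrs("76.149.24.0", "76.149.24.255"))  # ['76.149.24.0/24']
print(cidr_bounds("76.149.24.0/24"))  # ('76.149.24.0', '76.149.24.255')
```

This is handy for checking how many real addresses a `Deny from` line actually covers before you add it.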
Note: you can convert a range of IP addresses into CIDR notation, and back, with this tool:
www.ipaddressguide.com/cidr
Block banned user agents used by spam bots. Analyze your server log files weekly to detect and block malicious user agents. Once blocked, they will no longer be able to access the web resource. An example is shown below:
RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F,L]
A Google search will turn up extensive lists of known bad user agents. Use that information to identify such user agents in your own logs.
The easiest way is to write a script that automates the whole process: build a database of all known banned user agents, and have the script automatically identify and block them based on that database. Replenish the database regularly; new banned user agents appear with enviable consistency.
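The matching step of such a script can be sketched as follows (the blocklist entries and log lines here are illustrative; a real script would read your actual access log and user-agent database):

```python
# Hypothetical blocklist; in practice, load this from your user-agent database.
BANNED_AGENTS = ["Baiduspider", "EvilScraperBot"]

def find_banned_agents(log_lines, banned=BANNED_AGENTS):
    """Return the banned user-agent names seen in the given log lines,
    using case-insensitive substring matching."""
    seen = set()
    for line in log_lines:
        for agent in banned:
            if agent.lower() in line.lower():
                seen.add(agent)
    return sorted(seen)

logs = [
    '203.0.113.7 - - "GET / HTTP/1.1" 200 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0)"',
    '198.51.100.9 - - "GET /about HTTP/1.1" 200 "-" '
    '"Mozilla/5.0 (Windows NT 10.0)"',
]
print(find_banned_agents(logs))  # ['Baiduspider']
```

The agents this reports can then be written out as RewriteCond rules like the .htaccess example above.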
Block only the user agents that actually affect your resource. There is no point trying to block every known one: the .htaccess file will become too large and hard to manage, and server performance will drop.
Use the bot-filtering option available in Google Analytics: "Exclude all hits from known bots and spiders".
Monitor server logs at least weekly. The fight against dangerous bots really starts at the server level. As long as you have not yet managed to keep spam bots away from your resource, do not simply hide them from Google Analytics reports.
Use a firewall. A firewall is a reliable filter between your computer (or server) and the outside world, and it can protect your web resource from dangerous bots.
Get expert help from your system administrator. Round-the-clock protection of client web resources from malicious actors is their main job, and whoever is responsible for network security has far more tools for repelling bot attacks than a site owner does. If you find a new bot threatening the site, inform your sysadmin immediately.
Use Google Chrome to surf the web. If you are not behind a firewall, it is best to browse the Internet with Google Chrome.
Chrome can detect malware, and it opens web pages quickly while still scanning them for malicious software. Using Chrome reduces the risk of picking up malware on your computer, even when you open a suspicious resource from a Google Analytics referral traffic report.
Use custom alerts to monitor unexpected traffic spikes. Personalized notifications in Google Analytics let you quickly detect and neutralize harmful bot requests, minimizing their impact on the site.
Use the filters available in Google Analytics. To do this, open the "Admin" tab, select "Filters" in the "View" column, and create a new one.

Setting up filters is quite simple; the main thing is to know how.

You can also tick the bot-filtering checkbox in the "View Settings" section of the "Admin" tab. It does not hurt.

Despite the ease of using filters in Google Analytics, we still do not recommend relying on them in practice.

There are three good reasons for this:
- There are hundreds of thousands of bad bots, and a huge number of new ones appear daily. How many filters would you have to create and apply to your reports?
- The more filters you apply, the harder it becomes to analyze the reports from Google Analytics.
- Blocking spam traffic in Google Analytics hides the problem but does not solve it. You lose the ability to assess how badly spam bots are distorting your traffic.
Similarly, do not block referral traffic using the "Referral exclusion list"; this will not solve the problem. On the contrary, that traffic will then be counted as direct, and you will lose the ability to monitor the impact of spam on your web resource's traffic.
Once a spam bot has made it into your Google Analytics statistics, the data is distorted forever; you will not be able to fix it.
Conclusion
We hope the recommendations above help you get rid of all the spam sources hitting your site. There are different ways to do this; we have described the ones that have helped many resources protect their data in Google Analytics.