The goal of the study was to obtain an up-to-date list of all active domains in the .RU zone. As of 01.01.2016, 5040277 names were registered in it. We decided to visit every name with a crawler and analyze the results.
Server responses were as follows:
Full table with response codes:
code | cnt | % |
200 | 2670175 | 53.0 |
IPFAIL | 826869 | 16.4 |
TIMEOUT | 486924 | 9.7 |
301 | 444719 | 8.8 |
404 | 191831 | 3.8 |
302 | 176133 | 3.5 |
403 | 108624 | 2.2 |
503 | 43330 | 0.9 |
CHARSETFAIL | 32606 | 0.6 |
500 | 19603 | 0.4 |
401 | 6847 | 0.1 |
303 | 5919 | 0.1 |
429 | 5501 | 0.1 |
502 | 5340 | 0.1 |
402 | 4232 | 0.1 |
0 | 2954 | 0.1 |
NONHTML | 1796 | 0.0 |
423 | 1688 | 0.0 |
400 | 1654 | 0.0 |
409 | 1125 | 0.0 |
307 | 1014 | 0.0 |
521 | 273 | 0.0 |
999 | 203 | 0.0 |
410 | 191 | 0.0 |
523 | 150 | 0.0 |
504 | 138 | 0.0 |
509 | 98 | 0.0 |
508 | 93 | 0.0 |
204 | 46 | 0.0 |
520 | 45 | 0.0 |
434 | 32 | 0.0 |
CLEX | 32 | 0.0 |
406 | 20 | 0.0 |
501 | 14 | 0.0 |
479 | 8 | 0.0 |
407 | 8 | 0.0 |
418 | 7 | 0.0 |
405 | 7 | 0.0 |
451 | 4 | 0.0 |
435 | 4 | 0.0 |
304 | 4 | 0.0 |
201 | 3 | 0.0 |
300 | 2 | 0.0 |
456 | 2 | 0.0 |
3 | 1 | 0.0 |
507 | 1 | 0.0 |
101 | 1 | 0.0 |
126 | 1 | 0.0 |
422 | 1 | 0.0 |
557 | 1 | 0.0 |
412 | 1 | 0.0 |
413 | 1 | 0.0 |
420 | 1 | 0.0 |
Total: | 5040277 | 100.0 |
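The % column is each row's share of the total number of registered names. A minimal Python sketch of that calculation for a few rows of the table; the `share` helper is a hypothetical name, and rounding to one decimal place is an assumption about how the column was produced:

```python
TOTAL = 5040277  # value of the table's Total row

# A few (response code, count) rows copied from the table.
rows = [("200", 2670175), ("IPFAIL", 826869), ("TIMEOUT", 486924), ("301", 444719)]


def share(count: int, total: int = TOTAL) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for code, count in rows:
    print(f"{code} | {count} | {share(count)} |")
```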
IPFAIL - the domain could not be resolved (not delegated, name servers are not specified, etc.).
TIMEOUT - the IP address was resolved, but the server returned nothing and the connection timed out.
CHARSETFAIL - the content encoding could not be detected.
NONHTML - sites whose web servers did not execute their scripts but served them as plain text, database connection details and other delights included.
CLEX - crawler exceptions caused by a response size over 10 MB.
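The categories above can be read as a decision procedure applied to each name. The article does not describe the crawler's internals, so the following is only a rough sketch using the Python standard library: `classify`, `NoRedirect`, the 10-second timeout and the `ERROR` bucket are all assumptions, and CHARSETFAIL and NONHTML detection is left out.

```python
import socket
import urllib.error
import urllib.request

MAX_BYTES = 10 * 1024 * 1024  # CLEX threshold from the text: 10 MB


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so that 301/302 are reported as-is."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def classify(domain, timeout=10.0, resolve=socket.gethostbyname):
    """Return IPFAIL, TIMEOUT, CLEX, ERROR, or the HTTP status code as a string."""
    try:
        resolve(domain)  # step 1: can the name be resolved at all?
    except OSError:  # socket.gaierror is a subclass of OSError
        return "IPFAIL"
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(f"http://{domain}/", timeout=timeout) as resp:
            body = resp.read(MAX_BYTES + 1)  # read one byte past the limit
            return "CLEX" if len(body) > MAX_BYTES else str(resp.status)
    except urllib.error.HTTPError as err:
        return str(err.code)  # unfollowed 3xx, plus 4xx and 5xx
    except urllib.error.URLError as err:
        # Connect-phase timeouts arrive wrapped in URLError.
        return "TIMEOUT" if isinstance(err.reason, socket.timeout) else "ERROR"
    except socket.timeout:
        return "TIMEOUT"  # timeout while reading the response
```

The `resolve` parameter exists only so that the DNS step can be replaced in tests; `ERROR` is a catch-all for failures the table does not name (refused connections, for example).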
301 redirect (permanent):
bulk - a zoo of satellite networks, alternative site addresses, and so on.
redirect target | cnt | % |
http://www.domain | 215289 | 48.4 |
bulk | 144275 | 32.4 |
http://domain/page | 76417 | 17.2 |
https://domain | 7617 | 1.7 |
https://www.domain | 1121 | 0.3 |
Total: | 444719 | 100.0 |
302 redirect (temporary):
bulk - the same satellite networks, plus errors, installers of various CMSs, etc.
redirect target | cnt | % |
bulk | 135464 | 76.9 |
http://domain/page | 22658 | 12.9 |
http://www.domain | 10660 | 6.1 |
https://domain | 7168 | 4.1 |
https://www.domain | 183 | 0.1 |
Total: | 176133 | 100.0 |
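The rows of both redirect tables amount to a classification of the `Location` header relative to the domain that was requested. The exact rules the author used are not given, so this Python sketch is a guess at them: `redirect_bucket` is a hypothetical helper, and sending anything that fits no named row (https://domain/page, for instance) to bulk is an assumption.

```python
from urllib.parse import urljoin, urlsplit


def redirect_bucket(domain: str, location: str) -> str:
    """Map a Location header to one of the table rows for the requested domain."""
    # Resolve relative redirects such as "/index.php" against the requested URL.
    parts = urlsplit(urljoin(f"http://{domain}/", location))
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    domain = domain.lower()
    has_page = parts.path not in ("", "/") or bool(parts.query)
    if parts.scheme not in ("http", "https"):
        return "bulk"
    if host == domain:
        if has_page:
            return "http://domain/page" if parts.scheme == "http" else "bulk"
        return "https://domain" if parts.scheme == "https" else "bulk"
    if host == "www." + domain and not has_page:
        return f"{parts.scheme}://www.domain"
    return "bulk"  # other hosts: satellites, alternative addresses, and so on


print(redirect_bucket("example.ru", "http://www.example.ru/"))
```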
We also check redirects made through meta refresh, but this time there was nothing interesting there. It is the most popular way to send a user to an exploit kit.
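A meta refresh redirect lives in the page body rather than in the HTTP headers, so it has to be extracted from the HTML. A minimal sketch with Python's standard `html.parser`; `MetaRefreshParser` and `meta_refresh_target` are hypothetical names, and only the first such tag is considered:

```python
from html.parser import HTMLParser
from typing import Optional


class MetaRefreshParser(HTMLParser):
    """Collect the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.ru/".
        _, _, rest = attr.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def meta_refresh_target(html: str) -> Optional[str]:
    """Return the meta refresh target URL, or None if the page has none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    return parser.target
```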
All 2670175 domains that returned 200 OK are hosted on 192213 IP addresses. Among the top 10 we meet some really interesting players:
180983 domains on IP 109.206.190.54 (6.77% of all active domains) are mirrors of www.homes.ru (matched not only by IP, of course). They lead by a huge margin, ahead of even the parking services. These people work on a grand scale.
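Counting domains per IP address is a simple grouping step. A minimal sketch in Python, assuming the crawl produced (domain, IP) pairs for every domain that returned 200 OK; the sample data, the `top_ips` helper and the addresses (taken from the 192.0.2.0/24 documentation range) are all made up for illustration:

```python
from collections import Counter

# Hypothetical sample of (domain, resolved IP) pairs.
resolved = [
    ("a.ru", "192.0.2.10"),
    ("b.ru", "192.0.2.10"),
    ("c.ru", "192.0.2.20"),
    ("d.ru", "192.0.2.10"),
]


def top_ips(pairs, n=10):
    """Return (ip, domain count, % of all domains) for the n busiest IPs."""
    counts = Counter(ip for _, ip in pairs)
    total = sum(counts.values())
    return [(ip, cnt, round(100.0 * cnt / total, 2))
            for ip, cnt in counts.most_common(n)]


for ip, cnt, pct in top_ips(resolved):
    print(f"{ip} | {cnt} | {pct} |")
```

Sharing an IP address does not by itself prove that sites are mirrors (shared hosting looks the same), which is why the text notes that the comparison was not made by IP alone.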
A few averages for the content of RuNet home pages:
Average title length | 47 |
Average keywords length | 220 |
Average words per page | 515 |
Average page weight (in octets) | 42320 |
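Per-page values like these can be collected with a small HTML parser and then averaged over all pages. The article does not say how words or lengths were counted, so everything in this Python sketch is an assumption: `PageStats` and `page_stats` are hypothetical names, words are whitespace-separated tokens outside `<title>`, `<script>` and `<style>`, and the page weight is taken as the UTF-8 size of the HTML.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Collect the title, meta keywords and a visible-word count for one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.keywords = ""
        self.words = 0
        self._stack = []  # currently open title/script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "script", "style"):
            self._stack.append(tag)
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "keywords":
                self.keywords = attr.get("content") or ""

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current == "title":
            self.title += data
        elif current is None:  # skip script and style bodies
            self.words += len(data.split())


def page_stats(html: str) -> dict:
    """Return the four per-page measurements for one HTML document."""
    parser = PageStats()
    parser.feed(html)
    return {
        "title_len": len(parser.title.strip()),
        "keywords_len": len(parser.keywords),
        "words": parser.words,
        "size_octets": len(html.encode("utf-8")),
    }
```

Averaging is then a matter of applying `statistics.mean` to each field across all crawled pages.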
The word 'habrahabr' occurs in the text of 262 domains.
Links from home pages to Habr user profiles
List of domains that returned 200 OK:
dataoperator.ru/ru_domains_200_ok.zip