⬆️ ⬇️

Domain distribution statistics by AS, IP, NS, MX and other parameters

Once upon a time 1stat.ru stopped working - frankly, for us it was a tragedy (now it seems to work somehow).



The working day usually began with a cup of coffee and a pleasant view of the values ​​of domain growth. Of course, such metrics do not show the success of the company, nor its capitalization, nor the type of customers that are placed with the company. But despite this, you can indirectly assess the dynamics of growth or degradation of the company - if the number of domains on NS servers grows, and does not fall, this is good. As practice has shown, it somehow reflects the state of affairs in the market.







The first version of firststat.ru



Without thinking twice, they threw a script with a table, where they collected the values ​​for the NS servers for the domain and made some statistics. The relevance of the table was not the best, since we summarized all the domains on all servers and simply divided them by the number of servers, which over time with some providers changed. But this did not prevent us from arranging corporate events when moving upward by rating =). There was nothing interesting in this realization, it was done in one evening and it was too lazy to talk about it.

')

The second version of firststat.ru



About a year ago, pavelodintsov sent interesting statistics on the distribution of domains across autonomous systems.







As well as a link to the code that collects these statistics. The collection itself was designed to run various bash, perl and sqlite scripts. At the same time, the algorithm was quite simple and interesting, in principle, from Pavel, as a talented specialist, nothing should have been expected, although he worked rather slowly and collected data only on AS.







The algorithm was improved - it was completely rewritten in Python, historical tables were introduced, collection and storage of all records from the DNS was added. We also wrote a simple wrapper for displaying data on firststat.ru , which, given the availability of time and interest from the outside, we will definitely improve.







Statistics collection



Original project: https://github.com/pavel-odintsov/ru_open_statistics

My fork rewritten in Python: https://github.com/AlexeyManikin/domain_statistic



To work, you need the Python modules:

A MySQL server is also needed and a fast DNS resolver is highly desirable, as the bulk of the load falls on it. You must request all records (A, AAAA, SOA, TXT, MX, SRV, NS) for each of the 6,000,000+ domains. At first, I also collected information from similar projects to check the correctness of the results.



The structure of the project to describe, I believe, does not make sense, the code is quite simple to understand.

update_as_info.py - updates data on AS

update_domain.py - updates data by domains

update_statistic.py - aggregates statistics in the database



DB structure



Below are tables with fields that can be useful when retrieving data:



as_list : the table of correspondence of the AS number and its description, the country



domain : the table that stores the current state of the domain, I think most of the fields are clear from the title



domain_history : contains the history of changes in the records in the domain table, in addition to all the fields that exist in the domain, there are also other fields:



The following tables contain aggregated information:



All statistics can be selected from the domain and domain_history tables, but the queries are quite long. We decided to make separate tables for speeding up.



Server load



As a co-founder of hosting, after much persuasion, I was still assigned a server =) Accordingly, I started collecting statistics on an ongoing basis from January 2016, although I sometimes forgot to turn it on after a reboot.

The whole thing works on a 2 x Intel® Xeon® CPU E5-2620 0 @ 2.00GHz server with 32 Gb RAM.

It takes approximately 11 hours to collect data and aggregate statistics for all domains. Currently, the domain_history table contains 41436250 records, the total size of the database is 23 gigabytes. The number of threads pdns and parallel handlers was chosen experimentally, stopped at processing 150 threads.



CPU, LA, and memory usage graphs are shown below:















Run the statistics collection script





The easiest way to run through Docker . Install Docker and git



sudo apt-get update sudo apt-get install docker.io git 


download the repository



  cd /home git clone https://github.com/AlexeyManikin/domain_statistic.git 


collect images



  cd domain_statistic/docker docker-compose build 


we launch containers



  docker-compoces up -d 




Access to database



If anyone is interested, created a copy of the database, opened access to the database in ReadOnly mode, you can go play (limit 300 connections). If there is a habraeffekt - will be stupid =).

Server: manikin.beget.ru

Port: 3310

User: amanikin_stat

Password: openstatistic

Base: domain_statistic



If someone does not know how to use the console, install PhpMyAdmin : pma.amanikin.ru



Interesting SQL, which are useful to us:



 #  ,    MX NS- mysql> SELECT count(*) FROM domain WHERE mx1 LIKE 'ns1.%'; +----------+ | count(*) | +----------+ | 1064 | +----------+ 1 row in set (22.05 sec) #  mysql> SELECT count(*) FROM domain WHERE ns1 LIKE 'mx%'; +----------+ | count(*) | +----------+ | 436 | +----------+ 1 row in set (0.01 sec) #  ,   ,        mysql> SELECT count(*) FROM domain WHERE register_date_end <= '2016.05.27'; +----------+ | count(*) | +----------+ | 231235 | +----------+ 1 row in set (27.99 sec) #     ,  LIKE,  ,     FULLTEXT INDEX. #  ,    domain,   6 . ,     habrahabr   mysql> SELECT domain_name FROM domain WHERE MATCH (domain_name) AGAINST ('+habrahabr*' IN BOOLEAN MODE); +---------------+ | domain_name | +---------------+ | habrahabrs.ru | | habrahabr.ru | | habrahabra.ru | | habrahabr.su | | habrahabru.ru | +---------------+ 5 rows in set (2.18 sec) #  ,     LIKE   . #  - ,    ns    beget mysql> SELECT count(*) FROM domain USE INDEX(ns_all_ft) WHERE MATCH (ns1,ns2,ns3,ns4) AGAINST('+beget*' IN BOOLEAN MODE) +----------+ | count(*) | +----------+ | 224715 | +----------+ 1 row in set (2.18 sec) #     ru     tld = 'ru' 


Or how the DNS for the domain habrahabr.ru changed



Sample result
 mysql> select * from domain_history where domain_id = '1717034' \G *************************** 1. row *************************** id: 1717033 domain_id: 1717034 date_start: 2015-07-31 date_end: 2015-09-18 domain_name: habrahabr.ru registrant: registrator-ru tld: ru register_date: 2006-04-18 00:00:00 register_date_end: 2016-04-18 00:00:00 free_date: 2016-05-19 00:00:00 delegated: Y a1: 178.248.233.33 a2: NULL a3: NULL a4: NULL ns1: ns1.habradns.net. ns2: ns2.habradns.net. ns3: ns3.habradns.net. ns4: NULL mx1: alt1.aspmx.l.google.com. mx2: alt2.aspmx.l.google.com. mx3: aspmx.l.google.com. mx4: aspmx2.googlemail.com. txt: mailru-verification: 78856df53544c6fd v=spf1 include:spf.habramail.net include:aspmx.googlemail.com ~all asn1: 197068 asn2: NULL asn3: NULL asn4: NULL aaaa1: NULL aaaa2: NULL aaaa3: NULL aaaa4: NULL cname: NULL nserrors: *************************** 2. row *************************** id: 15762868 domain_id: 1717034 date_start: 2015-09-18 date_end: 2016-04-12 domain_name: habrahabr.ru registrant: registrator-ru tld: ru register_date: 2006-04-18 00:00:00 register_date_end: 2016-04-18 00:00:00 free_date: 2016-05-19 00:00:00 delegated: Y a1: 178.248.233.33 a2: NULL a3: NULL a4: NULL ns1: ns1.habradns.net. ns2: ns2.habradns.net. ns3: ns3.habradns.net. ns4: NULL mx1: alt1.aspmx.l.google.com. mx2: alt2.aspmx.l.google.com. mx3: aspmx.l.google.com. mx4: aspmx2.googlemail.com. txt: google-site-verification=kdqnhqj_7jlohipxgkwxizyedshff0askqtc2ovh1em mailru-verification: 78856df53544c6fd v=spf1 include:spf.habramail.net include:aspmx.googlemail.com ~all asn1: 197068 asn2: NULL asn3: NULL asn4: NULL aaaa1: NULL aaaa2: NULL aaaa3: NULL aaaa4: NULL cname: NULL nserrors: *************************** 3. row *************************** id: 35689334 domain_id: 1717034 date_start: 2016-04-12 date_end: 2099-01-01 domain_name: habrahabr.ru registrant: registrator-ru tld: ru register_date: 2006-04-18 00:00:00 register_date_end: 2017-04-18 00:00:00 free_date: 2017-05-19 00:00:00 delegated: Y a1: 178.248.233.33 a2: NULL a3: NULL a4: NULL ns1: ns1.habradns.net. ns2: ns2.habradns.net. ns3: ns3.habradns.net. ns4: NULL mx1: alt1.aspmx.l.google.com. mx2: alt2.aspmx.l.google.com. mx3: aspmx.l.google.com. mx4: aspmx2.googlemail.com. txt: google-site-verification=kdqnhqj_7jlohipxgkwxizyedshff0askqtc2ovh1em mailru-verification: 78856df53544c6fd v=spf1 include:spf.habramail.net include:aspmx.googlemail.com ~all asn1: 197068 asn2: NULL asn3: NULL asn4: NULL aaaa1: NULL aaaa2: NULL aaaa3: NULL aaaa4: NULL cname: NULL nserrors: 3 rows in set (0.04 sec) 




The full SQL dump in the form of an archive can be seen from the link (size 2.2G)



Similar projects



http://1stat.ru/

http://statonline.ru/

http://stat.nic.ru/

https://myip.ms/view/best_hosting/RUS

other…



Someone uses the statistics of visits, someone does not understand how they think, someone has a lot of advertising. In any case, I did not find the source of any service.



Links



Site - http://firststat.ru

Repository - https://github.com/AlexeyManikin/domain_statistic

Creators:

pavelodintsov - creator of the idea and prototype

redfenix - implementation in Python, data collection and storage, site prototype

ilin - site refinement



We will be glad to suggestions for improvement. If anyone needs access to the database for your applications - write, open.

Source: https://habr.com/ru/post/301894/



All Articles