
How to view 20 million domain names and be satisfied

Friends, welcome! Below is the story of how 20 million domain names were analyzed and what came of it. The results can be viewed by downloading the CSV file or by restoring the database dump in PostgreSQL.






If you wish, you can play around with the source here or directly with containers, using



docker-compose.yml

version: "2"
services:
  app:
    image: danieljust/domain-finder-v1
    tty: true
    ports:
      - "3000:3000"
  rabbit:
    image: rabbitmq:3
  db:
    image: postgres
    environment:
      POSTGRES_PASSWORD: example
      POSTGRES_USER: postgres
      POSTGRES_DB: postgres


Instructions can also be found on GitHub.

Enjoy reading!



Disclaimer!



Nothing you see or read in this article is a call to, or promotion of, domaining, let alone cybersquatting. Everything was done out of curiosity and, as they say, “for fun”.



Introduction



Many companies that want to rebrand, or that are just entering the wider market, want to choose a beautiful domain.

Out of curiosity, it was decided to treat short domains of 1-3 characters as beautiful domains.



Terminology





The structure of the table in the database



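| id | sldlength | tld | domain | price | roubleprice | available | definitive |
|----|-----------|-----|--------|-------|-------------|-----------|------------|
| 1 | 1 | actor | 1.actor | 20,000 | 1199520 | True | True |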


id - record identifier

sldlength - length of second-level domain

tld - top level domain

domain - the actual domain name

price - price in dollars

roubleprice - price in rubles

available - flag indicating domain availability

definitive - flag indicating whether the available flag was verified with the registry
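As a rough illustration, one row of this table could be modelled in Python as shown below. This is a sketch, not the project's actual code: the class name DomainRecord and the Python types are assumptions, and only the field names come from the table above.

```python
from dataclasses import dataclass


@dataclass
class DomainRecord:
    """One row of the table; field names mirror the columns above."""
    id: int
    sldlength: int      # length of the second-level domain
    tld: str            # top-level domain, e.g. "actor"
    domain: str         # full domain name, e.g. "1.actor"
    price: float        # price in dollars
    roubleprice: float  # price in rubles
    available: bool     # domain is reported as available
    definitive: bool    # availability was verified with the registry


# The sample row from the table above.
sample = DomainRecord(1, 1, "actor", "1.actor", 20000, 1199520, True, True)

# sldlength can be derived from the domain name itself.
sld = sample.domain.split(".", 1)[0]
assert len(sld) == sample.sldlength
```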


Result



Along the way, some interesting domain names turned up. They are listed in the table below.



| domain | roubleprice |
|--------|-------------|
| 2.pizza | 47981 |
| 0.fail | 23991 |
| a.xyz | 1199520 |
| ab.xyz | 299880 |
| ad.money | 11876 |
| as.mba | 2400 |
| as.guru | 11996 |
| at.network | 23991 |
| js.army | 47981 |


2.pizza - perfect for a new pizzeria;

0.fail - for something super-reliable;

a.xyz, ab.xyz - for those who want to be closer to Google;

ad.money - for advertising space;

as.guru, as.mba - for consulting firms;

at.network - for companies involved in network administration;

js.army - proletarians of all countries, unite.

Most two-character domains that were still free came at a biting price.

Among two-character names under country top-level domains, four free domains were found (all in the Czech zone), each for the modest sum of about 1,000 rubles.

Among three-character names under country top-level domains, many more were free, and at affordable prices.

Free names under generic top-level domains outnumber those under country domains many times over (country domains make up only 4% of the total number of free domain names).






The path to the results



Stage 1. Start



The set of possible SLD characters was taken to be -1234567890abcdefghijklmnopqrstuvwxyz (37 characters in total).

For names of length n over an alphabet of p characters, this is the number of arrangements with repetition, $p^n$.

Total for lengths 1 to 3: $37 + 37 \cdot 37 + 37 \cdot 37 \cdot 37 = 52059$ options.

Since an SLD cannot begin or end with a hyphen, we exclude such cases and get $36 + 36 \cdot 36 + 36 \cdot 37 \cdot 36 = 49284$.
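The two counts can be checked by brute force. The snippet below is a minimal sketch written for this article, not the project's code; the names all_slds and is_valid_sld are made up for illustration.

```python
from itertools import product

ALPHABET = "-1234567890abcdefghijklmnopqrstuvwxyz"  # 37 characters


def all_slds(max_len=3):
    """Yield every string of length 1..max_len over ALPHABET."""
    for n in range(1, max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            yield "".join(chars)


def is_valid_sld(sld):
    """An SLD may not begin or end with a hyphen."""
    return not (sld.startswith("-") or sld.endswith("-"))


total = sum(1 for _ in all_slds())
valid = [s for s in all_slds() if is_valid_sld(s)]
print(total, len(valid))  # prints 52059 49284
```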

But this is only the beginning.



Stage 2. Selecting an API



Many sites let you check through a web interface whether a given domain is taken.

Manual data entry is not enough for a task of this size; it needs an API.

During the search, the following options were encountered and rejected:



  1. provide us with your passport data, and we will give you access to the API;
  2. pay us once (from 5 to 15 dollars) and get lifetime access to the API;
  3. pay for access to the API once a month;
  4. pay about $0.01 for each API request.


But the soul wanted to bring something useful to the world of open source, and as free as possible.

The solution was the GoDaddy API.



Its advantages:



  1. free;
  2. allows you to process up to 500 domains in one request;
  3. API documentation.


Its cons:



  1. limited number of requests per minute;
  2. the responses from the server do not always match what the UI offers.


For example, an API response may say that a domain is taken and cannot be bought, while the same domain name is available for purchase through the UI.



How to be confident in the availability of the domain?



Correspondence with technical support revealed that a control check of availability is performed at the final confirmation of the purchase of the selected domain.

From observation, the definitive flag makes it possible to conclude that the domain name is taken.



Stage 3. Choice of tools and solution preparation



Using the GoDaddy API, you can get a list of TLDs in which domain names can be purchased.

From these, only single-word TLDs were kept (*.com.ru and the like were removed), which left 400 TLDs. Simple arithmetic gives $49284 \cdot 400 = 19{,}713{,}600$ domains to check.

The GoDaddy API can handle up to 500 domains in one request, but it limits the number of requests per minute.

In accordance with the above, the algorithm of the program was as follows:



  1. split all the domains to be checked into chunks of 5,000 domains;
  2. put the chunks in the RabbitMQ queue;
  3. take one chunk from the queue;
  4. split it into batches of 500 domains and send 10 requests;
  5. process the data and put information about free domains into the database;
  6. wait 20 seconds;
  7. if there are messages left in the queue, perform steps 3-6 again.
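The loop described in the steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the project's code: an in-memory queue.Queue stands in for RabbitMQ, and check_batch and save_free are hypothetical callbacks for the API request and the database write.

```python
import queue
import time

CHUNK_SIZE = 5000   # domains per queue message
BATCH_SIZE = 500    # domains per API request
PAUSE_SECONDS = 20  # pause after each chunk


def split(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run(domains, check_batch, save_free, pause=PAUSE_SECONDS):
    """Run steps 1-7; `check_batch` and `save_free` are hypothetical callbacks.

    check_batch(batch) is assumed to return a list of dicts that each
    contain at least an "available" key.
    """
    tasks = queue.Queue()  # in-memory stand-in for the RabbitMQ queue
    for chunk in split(domains, CHUNK_SIZE):      # steps 1-2
        tasks.put(chunk)
    while not tasks.empty():                      # step 7
        chunk = tasks.get()                       # step 3
        for batch in split(chunk, BATCH_SIZE):    # step 4: up to 10 requests
            results = check_batch(batch)
            save_free([r for r in results if r["available"]])  # step 5
        time.sleep(pause)                         # step 6
```

Keeping the pause as a parameter makes it easy to adapt the loop to a different rate limit, or to set it to zero when trying the code out with a fake check_batch.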


For convenience, PostgreSQL and RabbitMQ were run as Docker containers.



Stage 4. Data analysis



After the script finished, it remained to extract something interesting and useful from the collected data.

The data is available in domains.sql and domains.csv.




Below, filtering means looking up the found SLDs in a list of the most frequent English letter combinations, taken from this source.
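Such a filter could look roughly like the sketch below. The function name and the sample data are assumptions made for illustration: the letter combinations and the .example names are made up and are not results of the analysis.

```python
def filter_by_combinations(domains, frequent_combinations):
    """Keep the domains whose SLD appears in the frequent combinations."""
    frequent = {c.lower() for c in frequent_combinations}
    return [d for d in domains if d.split(".", 1)[0].lower() in frequent]


# Illustrative data only; the real list comes from the source cited above.
frequent = ["th", "he", "in", "the", "and"]
found = ["the.example", "xt1.company", "in.example", "0.fail"]
print(filter_by_combinations(found, frequent))
# prints ['the.example', 'in.example']
```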



image

image



From the pair of graphs above, we can conclude that the number of free domain names containing commonly used combinations of letters of the English alphabet tends to zero.



The five most expensive domain names



| domain | roubleprice |
|--------|-------------|
| ads.cloud | 11,906,200 |
| vod.cloud | 11,852,400 |
| usa.cloud | 11,852,400 |
| seo.cloud | 11,852,400 |
| vip.cloud | 11,852,400 |


The five cheapest domain names



| domain | roubleprice |
|--------|-------------|
| xt1.company | 590 |
| xt1.casa | 590 |
| xsz.company | 590 |
| xt1.click | 590 |
| xt1.business | 590 |
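Rankings like the two above can be produced from the data set with a few lines of Python. The sketch below is only an illustration: the function top_by_price is hypothetical, and the rows are a handful of entries copied from the tables in this article rather than the full data.

```python
import heapq


def top_by_price(rows, n=5, cheapest=False):
    """Return the n most expensive (or cheapest) rows by roubleprice."""
    pick = heapq.nsmallest if cheapest else heapq.nlargest
    return pick(n, rows, key=lambda row: row["roubleprice"])


# A few rows taken from the tables in this article.
rows = [
    {"domain": "ads.cloud", "roubleprice": 11906200},
    {"domain": "a.xyz", "roubleprice": 1199520},
    {"domain": "as.mba", "roubleprice": 2400},
    {"domain": "xt1.casa", "roubleprice": 590},
]
print(top_by_price(rows, n=1)[0]["domain"])                 # prints ads.cloud
print(top_by_price(rows, n=1, cheapest=True)[0]["domain"])  # prints xt1.casa
```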


Conclusion



That's all Folks!



This walk through the Internet turned up many fun domains. Most importantly, new companies should not despair: interesting domain names are still free, and it only remains to spot them.




Source: https://habr.com/ru/post/334992/


