
Current data on the telephone codes of Russian cities

Many applications need up-to-date telephone codes for Russian cities. An Internet search gives a disappointing picture: plenty of questionable sites publish city codes, but their currency and accuracy leave much to be desired or cannot be verified at all. There are online services that return the code for a single city, and one could use them, but few people would enjoy making several tens of thousands of lookups by hand.

The primary source of this information is the Federal Communications Agency (Rossvyaz), which publishes the current telephone numbering data. The files, however, are titled in such a way that even search engines, for all their budding artificial intelligence, will not surface this magic page for the usual queries about a list of valid phone codes: they are called “Extract from the Register of the Russian System and Numbering Plan”.

My guess is that government agencies are now obliged to publish open data, so the agency has to make these files publicly available. Still, the officials' habit of “helping individually” wealthy citizens and leading companies shows through. The files are not easy to stumble upon, and in them the telephone codes of all localities in the Russian Federation are three-digit, that is, exactly 3 digits long!

The second drawback of these files is that the settlement names were made up by the telecom people themselves and do not match one hundred percent the names used by the postal service (the well-known KLADR database). There are probably other sources of settlement names, but I relied on KLADR. As a result, when joining this data with your own list of cities, you will have to sweat a bit and record the specific discrepancies by hand.
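The manual discrepancy list can be kept as a small override table consulted before the lookup. Below is a minimal sketch in Python rather than the author's MySQL. The function names, the override table and all settlement spellings are hypothetical. For brevity it matches on the settlement name only; a real join should also key on region and district.

```python
def normalize(name: str) -> str:
    """Collapse runs of whitespace (tabs included) and ignore case."""
    return " ".join(name.split()).casefold()


def match_settlements(registry_names, kladr_names, overrides):
    """Map each registry settlement name to a KLADR name.

    `overrides` is the hand-written discrepancy table (hypothetical):
    normalized registry name -> KLADR spelling. Names that still do not
    match are returned separately so they can be added to the table.
    """
    kladr_index = {normalize(n): n for n in kladr_names}
    matched, unmatched = {}, []
    for name in registry_names:
        key = normalize(name)
        target = normalize(overrides.get(key, name))
        hit = kladr_index.get(target)
        if hit is None:
            unmatched.append(name)
        else:
            matched[name] = hit
    return matched, unmatched


# Hypothetical spellings, for illustration only.
registry = ["Berdsk", "Iskitim\t", "Ob' town", "Koltsovo settlement"]
kladr = ["Berdsk", "Iskitim", "Ob"]
manual = {"ob' town": "Ob"}

matched, unmatched = match_settlements(registry, kladr, manual)
print(unmatched)  # ['Koltsovo settlement']
```

Each round of reviewing `unmatched` adds a line to the override table, which is the "sweating by hand" part.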
Since I needed the result as MySQL tables, I wrote the processing code in MySQL itself.

The actual code is probably beyond the scope of an article like this, so I describe the algorithm and some of the pitfalls instead. Perhaps this is even a plus for readers, because it lets them implement the algorithm in whatever environment they find more convenient and familiar.

First, import the data from the files (with the LOAD DATA INFILE statement).
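For readers working outside MySQL, the import step could look like the following Python sketch. The column layout (code;from;to;capacity;operator;location, with a header row) and the semicolon delimiter are assumptions, as are the `NumberRange` and `read_registry` names. Check the layout against the actual files and open them with the encoding they really use, for example `encoding="cp1251"` if they turn out to be Windows-1251.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class NumberRange:
    abc: str       # three-digit code as published by the agency
    start: str     # first seven-digit number of the range
    end: str       # last seven-digit number of the range
    operator: str  # legal entity the range is allocated to
    location: str  # settlement as written in the file


def read_registry(stream, delimiter=";"):
    """Parse one registry extract. ASSUMED layout, one range per line:
    code;from;to;capacity;operator;location, preceded by a header row."""
    reader = csv.reader(stream, delimiter=delimiter)
    next(reader, None)  # skip the assumed header row
    ranges = []
    for record in reader:
        if len(record) < 6:
            continue  # skip blank or truncated lines
        abc, start, end, _capacity, operator, location = (f.strip() for f in record[:6])
        # Restore leading zeros in case a number was exported as an integer.
        ranges.append(NumberRange(abc, start.zfill(7), end.zfill(7), operator, location))
    return ranges


sample = io.StringIO(
    "ABC;From;To;Capacity;Operator;Location\n"
    "383;4120000;4129999;10000;Operator A;Berdsk\n"
)
rows = read_registry(sample)
print(rows[0].start, rows[0].end)  # 4120000 4129999
```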

Since the agency has kindly provided the first 3 digits of each phone code, our task is to recover the remaining digits and append them to those first 3.

Besides the three-digit code, each line of the file contains a range of seven-digit numbers, the legal entity to which the range is allocated, and the settlement. It is known that the full code is from 3 to 6 digits long, and that no locality has been allocated more than 2 telephone codes.
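The two known facts limit the work. A full code has at most 6 digits and the first 3 come from the file, so at most the first 3 digits of a seven-digit number can belong to the code. A range can therefore be reduced to the set of three-digit prefixes it covers. The Python sketch below does this; the helper name `covered_prefixes` is hypothetical.

```python
def covered_prefixes(start: str, end: str, width: int = 3) -> set[str]:
    """Return every `width`-digit leading prefix that occurs in [start, end].

    Only these leading digits can extend the three-digit code, because the
    full code is never longer than 3 + width = 6 digits.
    """
    lo, hi = int(start[:width]), int(end[:width])
    return {f"{p:0{width}d}" for p in range(lo, hi + 1)}


print(sorted(covered_prefixes("4120000", "4139999")))  # ['412', '413']
print(len(covered_prefixes("4100000", "4199999")))     # 10
```

Working with prefixes rather than only the two range bounds matters. A block such as 4100000-4199999 really does use all ten values of its third digit, even though its bounds show only 0 and 9.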

For each locality we calculate the “variability”: the number of distinct digits that appear in the first position after the three-digit code across all of its number ranges. In the same way we calculate it for four-digit codes (obtained by appending the first digit of the range to the right of the three-digit code), and for five-digit and six-digit codes.

Let me illustrate this with an example. The city code of Berdsk, Novosibirsk Region, has 5 digits: 38341. So if all telephone numbers of this city are written in seven-digit form (after the three-digit code 383), their first two digits will always be the same, 41. The third digit, however, will already take more than one value (3 or more, because 2 values could still mean that the city has 2 telephone codes)!
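The variability table can be sketched in Python as follows, one possible reading of the description rather than the author's query. The input is assumed to be the three-digit prefixes of the locality's numbers. The number list below is hypothetical, chosen so that the real code is 38341. The count measured after the five-digit candidate is the one that decides whether the code grows to six digits.

```python
from collections import defaultdict


def variability_table(abc, prefixes, max_len=6):
    """For each candidate code (abc, abc + 1 digit, ...) count the distinct
    digits that follow it in the locality's numbers.

    `prefixes` are the leading digits of the seven-digit numbers; three are
    enough because a code never exceeds `max_len` = 6 digits.
    """
    next_digits = defaultdict(set)
    for p in prefixes:
        for depth in range(max_len - len(abc)):
            next_digits[abc + p[:depth]].add(p[depth])
    return {code: len(digits) for code, digits in sorted(next_digits.items())}


# Hypothetical number prefixes for a locality whose real code is 38341.
berdsk = ["412", "413", "415"]
print(variability_table("383", berdsk))
# {'383': 1, '3834': 1, '38341': 3}
```

Variability 1 after 383 and after 3834 means the 4 and the 1 belong to the code. Variability 3 after 38341 means the next digit is already part of the subscriber number.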

That is essentially the core of the algorithm. It is not perfect in theory: if all the numbers actually in use in a locality have only 2 variants of the first digit after the code, this approach produces an error, giving the locality 2 codes instead of one.
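The decision rule can be sketched as: keep extending the code while the next digit takes at most 2 values, and stop at the first position where it takes 3 or more or when the code reaches 6 digits. This is a hypothetical Python rendering of the rule, not the author's code. Its second call reproduces the weak spot just described.

```python
def derive_codes(abc, prefixes, max_len=6, max_codes=2):
    """Grow the code digit by digit while the next digit takes at most
    `max_codes` values in total (a locality has at most 2 codes); stop at
    the first position where it varies more, or at `max_len` digits.
    """
    groups = {abc: list(prefixes)}
    depth = 0
    while len(abc) + depth < max_len:
        extended = {}
        for code, members in groups.items():
            for p in members:
                extended.setdefault(code + p[depth], []).append(p)
        if len(extended) > max_codes:
            break  # this digit already belongs to the subscriber number
        groups = extended
        depth += 1
    return sorted(groups)


print(derive_codes("383", ["412", "413", "415"]))  # ['38341']
# Weak spot: only two values of the digit after the real code 38341.
print(derive_codes("383", ["412", "413"]))         # ['383412', '383413']
```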

Pitfalls


Before this query, I run another one that cleans garbage out of the data, because the same region, district or settlement is not always written identically in the agency files. In some places there are tab characters, in others the district name is missing, and so on. For example, for the file with codes starting with the digit 3, my cleaning query is about 20 lines long, so catching all these cases is not difficult.
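The two kinds of garbage mentioned, stray tabs and missing district names, could be handled along these lines. This is a Python sketch, not the author's cleaning query. The field names `region`, `district` and `settlement` and the sample rows are hypothetical.

```python
import re


def clean_field(value: str) -> str:
    """Turn tabs and other whitespace runs into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def fill_missing_districts(rows):
    """Copy a known district into rows where the same region + settlement
    pair was written without one."""
    known = {}
    for row in rows:
        if row["district"]:
            known.setdefault((row["region"], row["settlement"]), row["district"])
    for row in rows:
        if not row["district"]:
            row["district"] = known.get((row["region"], row["settlement"]), "")
    return rows


# Hypothetical rows describing the same settlement in two different ways.
raw = [
    {"region": "Novosibirsk\tRegion ", "district": "", "settlement": "Berdsk"},
    {"region": "Novosibirsk Region", "district": "District A", "settlement": " Berdsk"},
]
rows = fill_missing_districts([{k: clean_field(v) for k, v in r.items()} for r in raw])
print(rows[0] == rows[1])  # True
```

Namesake settlements in different districts of one region would defeat this simple key, so such cases still need a manual rule.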

Note that the files also contain codes with no geographic binding: 800, codes allocated to mobile operators, and codes for premium-rate numbers. Naturally, they should be excluded from the main query described above.
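A filter for non-geographic codes might look like this Python sketch. Two things are assumptions to verify against the current files: that mobile codes are exactly those starting with 9, and which codes are premium-rate. For that reason the premium list is passed in explicitly, and the one used below is hypothetical.

```python
# ASSUMPTIONS (check against the current files): mobile codes start with 9,
# 800 is the toll-free code; premium-rate codes are passed in explicitly.
MOBILE_FIRST_DIGIT = "9"
TOLL_FREE = {"800"}


def is_geographic(abc: str, extra_excluded=frozenset()) -> bool:
    """True if a three-digit code should take part in the city-code derivation."""
    if abc.startswith(MOBILE_FIRST_DIGIT):
        return False
    return abc not in TOLL_FREE and abc not in extra_excluded


codes = ["383", "800", "913", "812", "809"]
premium = {"809"}  # hypothetical premium-rate list, for illustration
print([c for c in codes if is_geographic(c, premium)])  # ['383', '812']
```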

Since the algorithm is imperfect in theory, it makes sense to add a verification query. Compare the results with the reference data you already have, and spot-check individual localities against online services whose accuracy and currency you are 100 percent sure of. If some of the calculated city codes turn out to be wrong, you can delete them with one more query.
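The verification step splits naturally into an automatic comparison and a random sample for manual checking. Here is a Python sketch; the function names and all locality data are hypothetical. "Town X" stands for the two-codes error of the algorithm.

```python
import random


def compare_with_reference(computed, reference):
    """Both arguments: dict locality -> iterable of codes.

    Returns (mismatches, unchecked): localities whose codes differ from the
    reference, and localities the reference does not cover at all.
    """
    mismatches, unchecked = {}, []
    for locality, codes in computed.items():
        if locality not in reference:
            unchecked.append(locality)
        elif set(codes) != set(reference[locality]):
            mismatches[locality] = (sorted(codes), sorted(reference[locality]))
    return mismatches, unchecked


def spot_check_sample(computed, k, seed=None):
    """Pick up to k localities at random for a manual check via a trusted service."""
    rng = random.Random(seed)
    return rng.sample(sorted(computed), min(k, len(computed)))


# Hypothetical data for illustration only.
computed = {"Berdsk": {"38341"}, "Town X": {"383451", "383452"}, "Town Y": {"38346"}}
reference = {"Berdsk": {"38341"}, "Town X": {"38345"}}
mismatches, unchecked = compare_with_reference(computed, reference)
print(mismatches)  # {'Town X': (['383451', '383452'], ['38345'])}
print(unchecked)   # ['Town Y']
```

Localities in `unchecked` are natural candidates for the manual spot check, since no reference data confirms them.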

Because each file needs its own “garbage” cleanup and the garbage differs from file to file, it is better to write a separate cleaning query for each file. I also have not looked at how the garbage changes over time, since I worked with the data only once, using the October release. An attempt to fully automate updating the Russian city codes whenever the agency refreshes its files would probably run into garbage that keeps changing, so full automation is most likely not feasible.
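One way to keep per-file cleaning manageable is to store each file's rules as data rather than code. This Python sketch assumes hypothetical file keys and regex rules, including an imagined "obl." abbreviation. A new release then needs only a new or edited rule list, which fits the observation that the garbage changes between files.

```python
import re

# Hypothetical per-file rules: file key -> list of (regex pattern, replacement).
CLEANING_RULES = {
    "ABC-3xx": [(r"\t+", " "), (r"\bobl\.", "Region")],
    "ABC-4xx": [(r"\s{2,}", " ")],
}


def clean_location(file_key: str, text: str) -> str:
    """Apply the rules registered for one file, in order, then trim the result.
    Files without rules pass through with only the trimming applied."""
    for pattern, replacement in CLEANING_RULES.get(file_key, []):
        text = re.sub(pattern, replacement, text)
    return text.strip()


print(clean_location("ABC-3xx", "Novosibirsk\tobl."))  # Novosibirsk Region
print(clean_location("ABC-4xx", "Some   Town "))       # Some Town
```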

Conclusion


I hope this article helps someone with their own tasks. It would also be interesting to hear from the agency's programmers why all settlements in Russia get three-digit codes in these files. I would also like to know why the texts (the names of the region, district and settlement) vary so much, when by the very nature of this data they ought to be identical.

P.S. As part of my task, I wrote similar queries for international codes. There is no point in describing the steps, though: I could not find this data in the public domain and got the dial plan of one IP telephony operator “through an acquaintance”. From his explanations, another operator would have its own dial plans and, accordingly, its own quirks in isolating the codes.

Source: https://habr.com/ru/post/337184/

