How to determine the mobile operator and home region by phone number

When we try to work out which operator a phone number belongs to, we usually look at its DEF code. For example, if the number starts with 916 it is MTS, 968 is Beeline, and 926 is MegaFon (it all depends on your region). But this method is only a rough rule of thumb and is completely unsuitable when accurate data is needed. In reality, everything is more complicated: a single DEF code is often split among several operators, and the number you are looking up does not have to belong to one of the big four operators at all. Finally, the number may simply have been ported.

In this article I will explain how to reliably identify the mobile operator that serves a phone number, and how to get an additional, “free” piece of information: the subscriber’s home region. You can use this data however you like, from prefilling the address in a user profile and redirecting to the regional version of your service to using it in processing and statistics. At the end of the article there is a link to the sources on GitHub.

I should say up front that the subscriber's home region is, by and large, in no way connected with the user's current location: the region answers the question “Where does the number come from?” and not “Where is the user?”.

Data sources


Rossvyaz


We receive a phone number when we sign a service contract with a telecom operator. The allocation of number ranges to telecom operators, as well as standardization and general oversight of telecommunication services, is handled by the relevant state and international organizations. In Russia, that organization is the Federal Communications Agency (Rossvyaz).
Thus, the most reliable source of information about who serves a Russian phone number is Rossvyaz, namely the open data that the agency publishes on its website: www.rossvyaz.ru/opendata . The latest list of mobile number ranges is distributed as a CSV file. Each line in the file looks like:

DEF-, , , ,

However, since 2013 it has been possible to port a number from one operator to another. So, relying only on the Rossvyaz registries, one cannot say for certain that a number is served by a particular operator. The region, however, can be stated with confidence, because number portability works only within the home region: porting a number from MTS Novosibirsk to Tele2 St. Petersburg is impossible in principle.

Thus, if the task is only to determine the user's region, the Rossvyaz registries are sufficient.

Ported Numbers Database (BDPN)


If you need to determine the operator accurately, you cannot do without the Ported Numbers Database (BDPN), which is operated by ZNIIS. The procedure for connecting to the database is described on their website: zniis.ru . Unfortunately, as far as I know, getting a direct connection is not easy, and once you have one you may not share the database with anyone.

The structure of this database is extremely simple: it consists of three CSV files that list entries in the format “number, operator name”.


At the time of this writing, there are about 6 million entries in the BDPN.

To summarize: we have ranges of numbers that map to specific operators and regions (Rossvyaz), and a list of individual numbers that are exceptions to these ranges (BDPN), which overrides only the operator name.

How to identify subscribers


The most obvious solution to this problem is to take the word "range" literally and use the listed number ranges as they are. That is, to look up a number, we sort all operator entries by their ranges and search for the entry with the smallest range that contains the number. The complexity of this algorithm is that of a binary search, which is pretty good.

But there is a more original and universal implementation whose complexity is constant regardless of the size of the data. It relies on number masks.
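To make the range approach concrete, here is a minimal sketch in Python. It is not the code from my repository: the names `build_index` and `find_by_range` are hypothetical, the sample ranges are made up for illustration, and the sketch assumes that the ranges do not overlap.

```python
import bisect
from typing import List, Optional, Tuple

# (range start, range end, operator, region); sample values are made up.
Range = Tuple[int, int, str, str]


def build_index(ranges: List[Range]) -> Tuple[List[int], List[Range]]:
    """Sort the ranges by start and return (starts, sorted ranges)."""
    ordered = sorted(ranges, key=lambda r: r[0])
    return [r[0] for r in ordered], ordered


def find_by_range(number: int, starts: List[int],
                  ordered: List[Range]) -> Optional[Range]:
    """Binary-search the last range that starts at or before `number`."""
    i = bisect.bisect_right(starts, number) - 1
    if i < 0:
        return None  # the number is below every known range
    candidate = ordered[i]
    # Assumes non-overlapping ranges, so only one candidate needs checking.
    return candidate if number <= candidate[1] else None


SAMPLE_RANGES: List[Range] = [
    (79160000000, 79169999999, "MTS", "Moscow"),      # illustrative only
    (79031000000, 79031999999, "Beeline", "Moscow"),
]
STARTS, ORDERED = build_index(SAMPLE_RANGES)
```

Calling `find_by_range(79031001234, STARTS, ORDERED)` returns the Beeline entry, while a number that falls between two ranges yields `None`.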

Number mask


A number mask is a string consisting of digits and a special single-character wildcard (“?”), which means that any digit may appear in its place. A question mark can be followed only by another question mark.

Thus, one of the Beeline ranges in Moscow, "79031000000 - 79031999999", is written as the mask "79031??????".
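A range does not always collapse into a single mask, so in general a range has to be split into several masks. The sketch below shows one possible way to do that in Python; the function name `range_to_masks` is hypothetical, and it assumes that both ends of the range have the same number of digits.

```python
from typing import List


def range_to_masks(start: int, end: int) -> List[str]:
    """Split the inclusive range [start, end] into number masks.

    Greedy approach: at each position take the largest block of 10**k
    numbers that is aligned on 10**k and still fits inside the range.
    """
    width = len(str(end))
    masks: List[str] = []
    cur = start
    while cur <= end:
        k = 0
        while (k < width
               and cur % 10 ** (k + 1) == 0
               and cur + 10 ** (k + 1) - 1 <= end):
            k += 1
        digits = str(cur).zfill(width)
        # Keep the fixed prefix and replace the last k digits with "?".
        masks.append(digits[: width - k] + "?" * k)
        cur += 10 ** k
    return masks
```

For the Beeline range above, `range_to_masks(79031000000, 79031999999)` gives the single mask `79031??????`; a range that is not aligned on a power of ten produces several shorter-wildcard masks.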

It is very convenient to work with such masks, for example, to set them manually in the configuration. In addition, the representation of ranges in the form of masks makes it possible to use more efficient storage methods and simple search algorithms.

Hash table


For example, one such algorithm stores the mask-to-operator mapping in a hash table (or any other key-value store). The idea is as follows: all masks are added to a hash table as keys, and the values are operator objects with regions.

The lookup is most clearly explained by example. Let's say we are looking for information about the number 7 (903) 100-1234, and we have the mask 79031?????? - Beeline, Moscow.

First, we look in the table for a key that is exactly the original number: 79031001234.
If it is not found, we change the last digit of the number to "?" and look for the key 7903100123?.

If nothing is found again, we change the next digit from the end to "?" and look for 790310012??, and so on.

Finally, we search for the key 79031?????? and find that the number belongs to the operator Beeline, Moscow.

In this case the complexity of the algorithm equals that of several hash table lookups, each of which, in a correct implementation, usually takes constant time. The number of lookups is bounded by the length of the phone number, which according to ITU-T Recommendation E.164 does not exceed 15 digits.

The same algorithm works for ported numbers: they can simply be added to the same hash table.
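The lookup loop described above fits in a few lines. This is a minimal Python sketch with an ordinary `dict` as the hash table; the function name `find_by_mask`, the ported number and the operator name "Mno1" in the sample table are hypothetical.

```python
from typing import Dict, Optional, Tuple

Info = Tuple[str, str]  # (operator, region)


def find_by_mask(number: str, table: Dict[str, Info]) -> Optional[Info]:
    """Try the exact number first, then replace digits with "?" from the end."""
    for wildcards in range(len(number) + 1):
        key = number[: len(number) - wildcards] + "?" * wildcards
        info = table.get(key)
        if info is not None:
            return info
    return None


TABLE: Dict[str, Info] = {
    "79031??????": ("Beeline", "Moscow"),
    # Hypothetical ported number stored as a full-length key.
    "79031005555": ("Mno1", "Moscow"),
}
```

Because the exact key is tried before any mask, a ported number stored in the same table takes priority over the range it originally belonged to.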

Prefix tree


A much faster method is to build a prefix tree from the masks, relying on the fact that phone numbers consist of digits. Each node of this tree can have up to 10 digit child nodes (0-9) and one wildcard node. A wildcard node can have only wildcard descendants. When a new mask is added to the tree, each character of the mask in turn becomes a node. In effect, we represent all of our masks as a single tree.
For example, a tree consisting of the masks:
7913? - Mno1
791?? - Mno3
7952 - Mno2
7953 - Mno3
795? - Mno1
stores the common prefixes 7, 79, 791 and 795 only once (in the original illustration, the listed masks appear in the tree from left to right).


The search algorithm for the tree is probably already clear: we take each digit of the number in order and descend the tree from the root. At each step we first try to descend to a digit node; if there is no matching digit node, we check whether there is a "?" node. If there is, we finally check the length of the mask, and if it matches the length of the number, the operator is found.
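Here is a minimal Python sketch of such a tree, using the five example masks. It is not my Java implementation: the class names `Node` and `MaskTree` are hypothetical. One addition beyond the description above is labeled in the code: if the deepest digit node has no suitable "?" branch, the sketch falls back to "?" branches of shallower nodes, which matters when a specific mask and a wider mask overlap.

```python
from typing import Dict, List, Optional

WILDCARD = "?"


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.value: Optional[str] = None  # operator, set on a mask's last node


class MaskTree:
    def __init__(self) -> None:
        self.root = Node()

    def add(self, mask: str, value: str) -> None:
        node = self.root
        for ch in mask:  # each mask character becomes a node
            node = node.children.setdefault(ch, Node())
        node.value = value

    def find(self, number: str) -> Optional[str]:
        # Descend through digit nodes as far as possible, keeping the path.
        path: List[Node] = [self.root]
        for digit in number:
            nxt = path[-1].children.get(digit)
            if nxt is None:
                break
            path.append(nxt)
        if len(path) == len(number) + 1 and path[-1].value is not None:
            return path[-1].value  # exact, wildcard-free match
        # Assumption (my addition): try "?" branches from the deepest
        # reached node up to the root, so the most specific mask wins.
        for depth in range(len(path) - 1, -1, -1):
            node: Optional[Node] = path[depth]
            remaining = len(number) - depth
            if remaining == 0:
                continue
            for _ in range(remaining):
                node = node.children.get(WILDCARD)
                if node is None:
                    break
            # Reaching a valued node here means the mask length matches.
            if node is not None and node.value is not None:
                return node.value
        return None


tree = MaskTree()
for mask, operator in [("7913?", "Mno1"), ("791??", "Mno3"),
                       ("7952", "Mno2"), ("7953", "Mno3"), ("795?", "Mno1")]:
    tree.add(mask, operator)
```

With this tree, `tree.find("79135")` resolves through the mask 7913?, `tree.find("79125")` through 791??, and a number that matches no mask returns `None`.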

Conclusion


Depending on your constraints, you can combine these approaches and keep the ported numbers and the Rossvyaz masks in separate stores. For example, in terms of memory it is more economical to use the hash table approach for ported numbers, while for the Rossvyaz registries a mask tree is always the better choice. When searching, first look in the table, and if nothing is found there, look in the tree. Separate stores are convenient above all for automatic updates: if the BDPN has changed (and it changes constantly), there is no need to reread the Rossvyaz ranges.

For maximum performance, you can keep all the data directly in RAM. In my Java implementation, the Rossvyaz mask tree takes no more than 20-30 MB, and the hash table with the ported numbers takes about 500-600 MB. If the ported numbers are stored in a prefix tree instead, roughly 1.5 times more memory is needed because the tree nodes turn out to be very sparse. On the other hand, this gives a significant performance gain.
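The two-store arrangement might be sketched like this in Python. The class `OperatorResolver` and its method names are hypothetical, and the range lookup is passed in as a plain function so that any of the structures above can stand behind it. The sketch also assumes that the ported store holds only operator names, so the region always comes from the ranges.

```python
from typing import Callable, Dict, Optional, Tuple

Info = Tuple[str, str]  # (operator, region)


class OperatorResolver:
    """Ported numbers in a dict, ranges behind a lookup function."""

    def __init__(self, range_lookup: Callable[[str], Optional[Info]]) -> None:
        self._range_lookup = range_lookup
        self._ported: Dict[str, str] = {}  # number -> operator name

    def reload_ported(self, ported: Dict[str, str]) -> None:
        # Swap in a fresh snapshot without touching the range store.
        self._ported = dict(ported)

    def operator(self, number: str) -> Optional[str]:
        ported = self._ported.get(number)
        if ported is not None:
            return ported  # found in the table, no need to check ranges
        info = self._range_lookup(number)
        return info[0] if info is not None else None

    def region(self, number: str) -> Optional[str]:
        # Porting works only within the home region, so ranges are enough.
        info = self._range_lookup(number)
        return info[1] if info is not None else None


# Stand-in range lookup for illustration only.
def sample_range_lookup(number: str) -> Optional[Info]:
    return ("Beeline", "Moscow") if number.startswith("79031") else None


resolver = OperatorResolver(sample_range_lookup)
resolver.reload_ported({"79031005555": "Mno1"})  # hypothetical ported number
```

Only `reload_ported` has to run when a new BDPN snapshot arrives; the range store stays as it was.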

Thank you for your attention!

→ All source code is available on GitHub.

Source: https://habr.com/ru/post/337338/

