Foreword
Anyone who has ever worked on user authorization or registration systems has probably had to ask: “How do I find out more about a user?” What is this for? In most cases, to identify this particular user. Sometimes, to offer additional features and information depending on various social parameters or on the user's location or region of residence. Sometimes, for example, to perform scoring. This article focuses on determining the user's geographic location.
Effective methods for determining location
There are many ways to obtain the geographic position of an Internet user. Each has its own set of pros and cons and is more or less effective depending on the application. Here I describe only the methods currently used by the project I work on, that is, the ones I use directly. Over the lifetime of the project, enough statistics have been gathered on them to draw some conclusions.
1. Data from social networks
It has become extremely popular to use accounts from various social networks and blogs for authorization (or as a source of additional information), which lets you use the data stored in them. By authorizing a user this way, you can learn a lot about them. Admittedly, the reliability of this data is questionable, because many people enter in social networks not their “real” details but “desired” ones, or simply the first thing that came to mind. Weeding out such cases is usually the developer's main task. To do this, you need to fetch information about all of the user's friends and cross-check the data they have in common. You can, for example, find the most common place of residence among the user's colleagues, schoolmates, university classmates, and friends (in the blue social network, for example, this is very convenient) and, based on this data, work out the real region, city, and even district of the city where the user lives, works, or studies.
Also, some social networks make it possible to obtain the user's current coordinates if they are online. The accuracy of this data sometimes leaves much to be desired, but at a minimum the district of the city where the user is located can be determined quite reliably.
Pros:
- Relatively high accuracy when using scoring models based on friends
- Most users have social network accounts
- The received data can be checked for accuracy against data from friends
Cons:
- Complex to implement, since you need to study the APIs of several social networks and design and implement models for analyzing the data
- The user must have a valid social network account (and I believe that, however widespread such accounts are, you cannot demand one from the user)
- Low speed, if you include the analysis of friends' data
Unfortunately, I cannot share the implementation because it is a trade secret.
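The friend-based check described above can be illustrated with a simple majority vote. This is only a minimal sketch and not the project's scoring model: the function name `infer_city`, the `min_share` threshold, and the idea that friends' cities are already fetched as plain strings are all assumptions made for illustration.

```python
from collections import Counter


def normalize(city):
    """Lower-case and trim a city name so that 'Omsk ' and 'omsk' match."""
    return city.strip().lower()


def infer_city(declared_city, friend_cities, min_share=0.5):
    """Return (city, confidence) for a user.

    declared_city: what the user wrote in the profile (may be None).
    friend_cities: cities taken from friends' profiles (None or "" if absent).
    min_share: assumed threshold, the fraction of friends that must agree.
    """
    known = [normalize(c) for c in friend_cities if c and c.strip()]
    declared = normalize(declared_city) if declared_city else None
    if not known:
        # Nothing to verify against: keep the declared value with zero confidence.
        return declared, 0.0
    city, votes = Counter(known).most_common(1)[0]
    share = votes / len(known)
    if share >= min_share:
        # Enough friends agree, so their city overrides the declared one.
        return city, share
    # No clear majority: fall back to the declared city and report
    # how many friends actually support it.
    declared_share = known.count(declared) / len(known) if declared else 0.0
    return declared, declared_share
```

For example, a user who declares "Paris" while three of four friends with a known city live in Omsk would be assigned "omsk" with a confidence of 0.75. A real model would also weigh the type of relationship (colleague, classmate, and so on).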
2. GeoIP data
Probably the simplest and most accessible method, but for the Russian Federation it is often inaccurate today.
Why? The fact is that most regional providers have by now been bought up and absorbed by federal-level operators. And why is that bad? Here is why. Imagine a situation: in the city of "H" there were 5 small providers. Each worked in its own neighborhood of the city and, accordingly, had its own pool of IPv4 addresses. Even a dynamically issued “white” (public) IP could be roughly tied to a specific area of the city. Now a federal-level provider comes and buys all 5 small providers together with their address pools, and then brings their networks into line with the rest of its own network. What do we have in the end? This federal provider has a huge number of clients and a huge number of IP pools, which it uses in whatever region they are needed. That is, an address that previously belonged to the pool of a small local provider can now be issued to a client in a completely different city, simply because the pool now serves all clients of this provider. And, naturally, nobody will tell anyone in which area this IP has been issued. Moreover, tomorrow it may be issued to someone else.
Also, nothing prevents the user from going online through, for example, a proxy or VPN and thus appearing under a different IP. In this case GeoIP becomes absolutely useless, because it returns information about that particular proxy or VPN server. The same happens if the provider gives its customers Internet access via NAT (and, given the shortage of free IPv4 addresses, this is becoming more and more common), although in this case you can usually at least get the district, region, or city.
So you cannot always rely entirely on GeoIP data, although the method is very convenient: after all, we get the information almost instantly. A previously downloaded local database is usually used for this.
Pros:
- Easy to use, with many implementations in different languages
- High accuracy (with some exceptions, see above)
- Speed (the result is available almost instantly, since it is just 1 request to the database)
Cons:
- The IP database has to be kept up to date
- The accuracy of the data cannot be verified (other than by querying several databases)
- A rather large percentage of erroneous data for the Russian Federation at the moment (see above)
I won't go into “how to do it”, because the net, Habr included, is full of detailed descriptions. There are many free libraries and tools for obtaining GeoIP data. For example, for PHP you can use the geoip extension.
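To show why a lookup in a local database is "just 1 request", here is a toy sketch of what such a lookup boils down to: finding the most specific address range that contains the IP. The table is entirely made up (it uses the documentation-only networks from RFC 5737 and hypothetical place names), and a real GeoIP library uses an indexed binary file rather than a linear scan.

```python
import ipaddress

# Made-up table: documentation-only networks (RFC 5737) mapped to
# hypothetical places. A real database holds millions of ranges.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "City A"),
    (ipaddress.ip_network("198.51.100.0/24"), "City B"),
    (ipaddress.ip_network("198.51.100.128/25"), "City B, north district"),
]


def lookup(ip, table=GEO_TABLE):
    """Return the place for the most specific matching network, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, place) for net, place in table if addr in net]
    if not matches:
        return None
    # The longest prefix is the narrowest range, hence the best guess.
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

The sketch also makes the weakness discussed above visible: the answer is only as good as the table, so once a pool is reassigned to another city, the lookup keeps returning the old place until the database is updated.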
3. Using the JavaScript Geolocation API
A rather useful and effective method, but only for mobile devices. On a desktop computer it is no more useful than GeoIP. The point is that a mobile device (a modern smartphone, tablet, etc.) will use all the positioning tools and hardware that are available and permitted by the user, including GPS, Wi-Fi, and data from cell towers. A home PC, however, in most cases has neither a mobile network connection (if there is a GSM/3G modem, data from it is not used) nor GPS, so all we can get is GeoIP data, which JS will happily report to us. I have already written above about its accuracy. Still, I would not neglect this method: after all, more and more people use tablets and phones to access the Internet.
As a result, this method has a fairly narrow range of applications: mobile devices, or cases where approximate GeoIP-level data is enough.
Pros:
- Easy to implement, with lots of documentation and examples on the Internet
- Accurate, because positioning by cell towers, Wi-Fi, and GPS can be used
- Fast, because the position is determined by software on the client
Cons:
- Not supported by all browsers on home PCs
- Requires the user's permission
- In practice, only applicable to mobile devices
- The data is relatively easy to fake
Examples of implementation can be found here or here.
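Because the coordinates arrive from the client and are relatively easy to fake, the server may want to sanity-check them against a coarser estimate obtained another way (for example, the city from GeoIP). The sketch below is one possible check and not something the Geolocation API itself provides: the function names and the 300 km tolerance are assumptions, and the distance is computed with the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_plausible(reported, coarse, max_km=300.0):
    """Check client-reported (lat, lon) against a coarse server-side estimate.

    max_km is an assumed tolerance; pick it to match how coarse the
    second source really is.
    """
    lat, lon = reported
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return False  # not valid coordinates at all
    return haversine_km(lat, lon, coarse[0], coarse[1]) <= max_km
```

A mismatch does not prove the data is fake (a VPN or a reassigned IP pool gives the same picture), so it is better treated as a reason to lower confidence than as a reason to reject the user.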
4. Location via “locator”-type services from mobile operators
I think some readers have heard of these services, some even use them, and some have to use them in a corporate environment. I mean services like Locator from the “egg” company and Coordinates from the “yellow-striped” one. Yes, these services were originally intended for end users, but what prevents us from using them? This method has few advantages, but they are weighty: high accuracy and almost 100% reliability of the data. There are unpleasant aspects too. First, these services are paid. Second, the user has to provide a mobile phone number during registration and send a free SMS to a short number, which can scare many people away. The time it takes to receive the information by SMS is also considerable by the standards of a web application. But in some cases information of this kind, and reliable information at that, is simply necessary. Moreover, this method can replace confirming an action with a code from an SMS. Besides, faking information obtained this way is almost impossible.
I will not give a working implementation, for the same reasons as in the first case, but I will briefly describe how it is done just below.
Pros:
- Highly reliable data, almost 100%
- High accuracy, regardless of the device used and the method of Internet access
- Automatically confirms the mobile number
Cons:
- Difficult to implement and support
- Low speed: it takes time to send and receive the SMS and to get the user's reply
- Not free (operators' tariffs for this service are very “voracious”)
- User consent is required
How to do it
We will need:
- An old mobile phone with a cable, or a 3G/GSM modem, one for each operator
- SIM cards of these operators
- Some PC, preferably running *nix (Windows with Cygwin is also possible), which will act as the “geo-gateway”
- A little patience and time
- smstools3
1) The instructions may differ depending on the OS, but the gist is the same: you need to install the SMSTools package from your software repository
On Gentoo, it looks like this:
If you need statistics of sent / received SMS, then:
nogood-work ~
or (if you have all USE flags in one file):
nogood-work ~
Then install smstools itself from Portage:
nogood-work ~
On FreeBSD:
root@kenny:/usr/ports
For statistics, just select "STATS" in the options.
If there is no ready-made package for your system, you can build from source:
nogood-work ~
2) Connect the modem(s) and check whether the serial port devices have appeared in /dev
For Gentoo:
nogood-work ~
Several ports may appear. With a single modem we are usually interested in ttyUSB0. If there are several modems, connect them one at a time; in each case the first of the newly appeared ports is the one we need.
For FreeBSD:
root@kenny:~
The idea is the same: the first of the several ports that appear is ours.
3) Configure SMSTools
smsd.conf may be located in /etc/ or in /usr/local/etc/, depending on your distribution. Bring it to roughly the following form:
4) Create the trsms.sh file (event handler)
This is an example with minimal functionality. It logs the requests and responses received for the “egg” operator. Ideally, you should also add a condition on the number the message came from, based on the from variable. This also makes it possible to determine the operator, since the service numbers of different operators are, as a rule, different.
Don't forget to grant execute permission to the user under which smsd will run.
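For readers who prefer Python to shell, the same minimal handler logic can be sketched as follows. It relies on how smsd calls an event handler (first argument is the event type, such as RECEIVED; second is the path to the message file) and on the message file layout of `Name: value` header lines, an empty line, and then the text. The function names, the log format, and the assumption that the message text is readable as UTF-8 are mine, not part of smstools.

```python
def parse_sms(raw):
    """Split a message file into a dict of headers and the message text."""
    head, _, body = raw.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            headers[key.strip()] = value.strip()
    return headers, body.strip()


def handle(event, path, log_path, service_numbers):
    """Log a received SMS if it came from one of the operator's numbers.

    service_numbers: set of sender numbers to accept (hypothetical values
    in any usage; take the real ones from your operator's service).
    Returns True if the message was logged.
    """
    if event != "RECEIVED":
        return False  # ignore SENT, FAILED and other events
    with open(path, encoding="utf-8", errors="replace") as fh:
        headers, body = parse_sms(fh.read())
    sender = headers.get("From", "")
    if sender not in service_numbers:
        return False  # not a reply from the location service
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{sender}\t{body}\n")
    return True
```

A wrapper script registered as the event handler would call `handle(sys.argv[1], sys.argv[2], ...)` with its own log path and set of numbers. Filtering on the `From` header is the "condition on the number" mentioned above, and mapping each number to an operator is a natural next step.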
5) Start the smsd daemon and add it to autostart
For Gentoo:
nogood-work ~
For FreeBSD:
root@kenny:~
Check the logs. If everything is fine and there are no error messages, go to the next step.
6) Try to send an SMS to your phone
nogood-work ~
If the SMS was sent successfully, you can try sending an SMS with the appropriate text to the coveted service number and then check the logs.
Then you can simply call the command
sendsms <number> "<text>"
from your script and check, for example from cron, whether a response from the desired number has appeared in the SMS log file.
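If the calling script is not a shell script, it can also queue the message itself. As far as I know, sendsms is a small wrapper that places a text file into smsd's outgoing spool directory, and that file consists of a `To:` header, an empty line, and the text. The sketch below does the same from Python. The function names are hypothetical, and the spool path is passed in explicitly because it depends on your smsd.conf (`/var/spool/sms/outgoing` is a common default, but treat that as an assumption).

```python
import os
import shutil
import tempfile


def build_sms_file(to, text):
    """Return the contents of an outgoing message file for smsd."""
    number = to.lstrip("+")  # international format without the leading '+'
    if not number.isdigit():
        raise ValueError(f"not a phone number: {to!r}")
    return f"To: {number}\n\n{text}\n"


def queue_sms(to, text, outgoing_dir):
    """Write the message to a temp file, then move it into the spool.

    Writing elsewhere first keeps smsd from picking up a half-written file.
    Returns the final path of the queued file.
    """
    content = build_sms_file(to, text)
    fd, tmp_path = tempfile.mkstemp(prefix="geo_sms_")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    final_path = os.path.join(outgoing_dir, os.path.basename(tmp_path))
    shutil.move(tmp_path, final_path)
    return final_path
```

The request text and the service number are whatever your operator's locator service expects. The reply then arrives asynchronously through the event handler, which is why the result has to be picked up later, for example by a cron job.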
Conclusion
Each of these methods suits particular purposes and conditions, and which one to use is up to you. Of course, not all location methods are covered here. I have described only those that I have tried myself and consider the most effective. For greater effectiveness, I would also recommend combining them, which is what we do in our project. That's all. I hope someone finds this information useful.