Now more and more mobile applications are becoming geo-dependent. Some simply do not make sense without knowledge of the location of the user, others become more convenient with him. These are the so-called Location Based Services (LBS): navigators, forkswera, instagrams with photo geotags and even reminder applications that work around a specific place, for example, near an office or a store.
For Yandex services and applications, we created our own implementation of the method of determining the location without GPS - Yandex.
Locator . It saves user time and makes our applications a little smarter. In Navigator and Maps, it saves you from entering the starting point of the route, even if you are in a covered parking lot. And when choosing a movie in the Kinoafish or product in the mobile Market it helps to immediately show where to find them in your area of ​​the city. Well, of course, when searching for cafes and ATMs - it allows you to show you immediately, even when you are on the subway.

We
have long discovered the technology as a free API. Today we want to tell how it works.
')
Why no GPS and how else
Satellite navigation systems (
GNSS ), in our case, this is GPS and GLONASS, is the most accurate method of geo-determination today. The corresponding modules are in almost all modern smartphones. But not always and not everywhere he can solve the problems of LBS.
First, the search for satellites sometimes takes a few minutes, and there are situations in which the speed of determination is important even with loss of accuracy. For example, when you need to build a preliminary route in the navigator or
check in . Secondly, satellites are usually not “visible” in rooms or underground. Thirdly, GPS-modules are not in every mobile phone or tablet, and they are almost absent in laptops. That is, LBS needs alternatives.
And there are, of course, alternatives - you can determine the location by the nearest GSM towers, Wi-Fi networks and even by IP address. The accuracy of determining each of these methods is much worse than that of GPS. But if you combine them, they together will give an acceptable quality. At the same time, some of the shortcomings of one are neutralized by the capabilities of the other. GSM-towers are practically everywhere, and Wi-Fi networks are not. In this case, Wi-Fi accuracy is better. Therefore, the combined method of completeness and accuracy is better than each separately. Less well known is the fact that two routers in different parts of the city can have the same MAC address. The combination of GSM and Wi-Fi solves such collisions. Most likely, these routers will have towers with different identifiers nearby - after all, the probability of coincidence within a quarter is much less than in the whole city.
In the world there are several implementations of such a combined method of geo-determination. And it seems that the first question that all developers faced was where to get information about the location of Wi-Fi networks and cell towers?
Network Location Base
In the buy or create dilemma, we ultimately preferred the latter. The main reason is that with own data and algorithms it is much easier to control the quality of the result. Users of mobile Yandex.Maps helped us collect information.
When we started developing the Locator, there were already hundreds of thousands of people on the streets of the cities with Yandex.Maps included in their phones. With the user's consent, the application constantly transmits its GPS coordinates - Yandex.Probits are built based on this information. We thought that along with this, the application can mark which base station the phone is servicing in these coordinates, which Wi-Fi networks are visible (while, of course, not connecting to the networks themselves, so as not to create privacy risks).
A person does not need to do anything specifically to participate in such crowdsourcing - just use the application. As well as on coordinates, the data on the surrounding Wi-Fi networks and GSM stations are impersonal. They practically do not "weigh", and the battery does not sit down faster from their transmission, respectively.
Thus, users began to help each other:

Some, with a GPS receiver in the phone, find out the exact location of the networks and transmit information to Yandex. Others who do not have GPS modules send a list of networks that they see at the moment and receive their approximate location on the map in reply.
The base is collected and regularly updated. And here we are faced with the following problem.
"Moving" networks
Experience shows that cell tower identifiers are constantly changing - the number that was in the center of the city yesterday may be on the outskirts tomorrow. Wi-Fi routers can also move - along with their owners. And it turns out that with each move you need to invalidate a significant part of the data.
This is how we managed to solve at the same time problems with moving both towers and routers. The user is asked to determine the location along with information about which network he sees. If there is one in the list of networks that was seen in different parts of the city, the algorithm takes into account how many signals from it are accumulated in each district and the age of the latter. Each dense cluster of signals from a Wi-Fi network or cell tower we call a “cloud”. The more signals in the cloud and the fresher they are, the more trustworthy it is. The answer will be, respectively, the largest and freshest. And we consider the cloud, in which there are no signals for more than a month, obsolete - even if no more fresh cloud appeared in this area for another network.
Cloud radius
Since the position is approximately determined, it is impossible to show a point - you need to draw a circle (because the radio signal in the absence of interference is distributed in all directions evenly). Although, if you look at the actual picture of the signals, most often it is an ellipse. Indeed, motorists use mobile maps the most. Their GPS-traces remain on the roads, and from the courtyards and, especially, from the buildings, signals are practically not received.

For the answer to be extremely accurate, the radius of the circle must be minimal. If you simply circle the circle around all points of the signals of a particular network, the radius will be too large. Reduce it helped mat. statistics. Signal density is subject to normal distribution, that is, the
three-sigma rule applies. 99.7% of the points fall into the vicinity of such a radius.


We decided to go further and experimentally selected sigma for a coefficient that minimized the radius as much as possible, but retained acceptable accuracy. It was possible, because in most cases the user sees several networks. That is, the “open” areas with a decrease in the coefficient are most likely overlapped by other clouds.
Cloudy signals
Unfortunately, not all GPS signals from users are simply bundled into the clouds. It turned out that if you put on the map all the signals of a single network, in addition to the “ellipses”, points and lines will appear on it. These are, respectively, single signals, far removed from the accumulation of signals from the same network, and very long GPS tracks (ie, a chain of GPS signals).

"Singles" appear, for example, when a person moves on the subway. The phone loses connection with the cell at one station, and when it reaches the other one, it still believes that it is serviced by that cell. Such signals Locator filters. In addition, we set a minimum threshold for clouds so as not to rely on clusters of signals that are too small.
Long GPS tracks appear, for example, when a person drives a car through the whole city. The phone “drags” the tower identifier from the beginning of the route and reports that it allegedly sees it all the way. It is known that base stations have a limited range, so the GPS locator also filters out such GPS tracks. Tracks, the length of which fits into the range of the tower, remain. As a rule, they are noticeable in areas where there is little data. There they become a chain of small clouds.
We consider single signals, small clouds and long tracks “noise”. When the user sees a single network for which we know only such signals, he receives the answer that the location could not be determined. We believe this is more correct than giving a deliberately wrong, according to our estimates, result.
When little data was accumulated, there was another difficulty with combining all the signals into one cloud. It happened that the signals from the tower from one city came also from another. The presence in the identifiers of GSM networks of the location area code LAC (Location Area Code) helped us. Since the towers with the same code should be close by the standard, to the clouds that were “not in their city” (ie, among the clouds with another LAC), the Locator began to give underweight.

Improved accuracy of determination ...
... via GSM networks
Once upon a time, information about only one base station was available to applications, although the phone most often sees several. After the advent of the Android platform, applications were able to learn
to see them all (except for connecting to the 3G standard, which allows you to find out only one cell tower). The location was determined more precisely - no longer on one cloud, but on the totality of several. It turned out that for many clouds you can use the same approach as for one. The radius is calculated from the standard deviation of the signals included in the combination of clouds, and the center is calculated from their average coordinates.
... via Wi-Fi networks
When a smartphone is within range of several Wi-Fi networks, it can report not only their list, but also the signal strength of each. Knowledge of this power we used to clarify the center of the circle in which the user is located. We began to hang imaginary springs to the centers of the observed clouds - the tighter the stronger the signal. And their free ends - to connect. The point at which these springs are balanced is the specified center.

The resulting quality
First, a few words about how we evaluate the quality of our decision. As already mentioned, from users who have a GPS module in their devices, Latitude receives both coordinates and a list of networks that see devices. To assess the quality, he first determines the approximate location, focusing only on these networks. It then checks to see if the true coordinates from the user are in the circle suggested by Locator.

Using this technique, we obtained the following figures:
- for 83% of requests per day, the location is determined correctly - the GPS coordinates of the device fell into an area called Locator
- 14% of signals - with an error:
- 7% - error less than 100 meters
- 5.6% - from 100 meters to several kilometers
- 1.4% - Locator wrong city
- the remaining 3% of requests receive the answer "Location not found"


Is it possible to achieve better quality? Yes. The advantage of the method is that at a certain maturity of the algorithms, it is enough to collect more data in order to determine the location more accurately. And this is quite easy, because the number of Wi-Fi networks and the number of users of our applications is growing.
But there are technological limits:
- if the phone reports only one GSM tower - the minimum radius will be several hundred meters in the city, and several kilometers outside the city
- if the phone sees several towers - the center can be determined more precisely, but the radius can hardly be reduced
- if the Wi-Fi network is visible - the minimum radius will be 10 meters
Calculation volumes
To quickly respond to the user, you need to prepare in advance the entire answer, or at least a substantial part. Every night, a cluster based on our distributed computing system
YAMR aggregates signals received up to yesterday, getting ready for the answer "clouds". At the time of the request, the Latitude remains only the right way to combine them. So terabytes of "raw signals" shrunk to 1.5-2 GB of ready-made answers that easily fit into memory. And the preparation of the response almost always fits into 1 ms, and each server in the cluster withstands 10 thousand RPS.
And so that the duration of the daily calculation does not grow linearly with the growth of the history of GPS signals, we have achieved the "additivity" of the clouds. Now it is enough to store only a few indicators for each cloud, and there is no need to re-process the entire old history every day.
Preparing a more complete answer is ineffective. If you cluster each combination of networks into a separate cloud, you get a combinatorial explosion. The volume of ready-made answers grows by several orders of magnitude, and with partial overlap of the networks, even more calculations are needed to prepare an answer.
Analogs
Location services without GPS, as we have said, are not only available to Yandex. Developers can contact a commercial provider (such as
Altergeo in Russia and
Skyhook Wireless in the world), or use the mobile platform or browser API.
In general, such a database can be assembled in three ways:
- go around the cities of interest by car, scan the network, and then periodically go around again to update the base
- create a massive mobile application (for example, Yandex.Maps)
- create a mobile platform (for example, iOS or Android)
But only the developer of a geo-dependent application has to choose between different solutions, and the user “lives” with this choice. In the absence of a uniform comparison methodology, attention should be paid to the accuracy of determination (radius of the “tolerance” and the percentage of errors) in the regions of interest.
True, the developer may not always choose. On iOS and Windows Mobile, the application can only use geo-detection functions built into the operating system. The application does not have access to the current base station and / or a list of WiFi networks other than the current one.
Another situation in web services. All modern browsers have a
geo-definition API . And changing the browser, the user changes the geo-determiner. Firefox and Google Chrome use the Google implementation, Safari uses Apple, and IE uses Microsoft. Our Locator works in the
Yandex browser .