
Updating Google Maps with deep learning and Street View

Every day, Google Maps provides useful routes, traffic information, and business details to millions of people. To keep the product useful, this information has to reflect a constantly changing world in near real time. Street View cars collect millions of images every day, and it is impossible to manually analyze the more than 80 billion high-resolution images gathered so far to find new or updated information suitable for publishing on Google Maps. One of the goals of the Ground Truth team is to automatically extract information from geo-located imagery to improve Google Maps.

In " Extracting structured information from the Street View image database using the attention algorithms, " we described our approach to accurately automatic recognition of street names in very complex Street View photographs from different countries using a deep neural network. Our algorithm showed 84.2% accuracy on a complex French Street Name Signs dataset (FSNS), and seriously outpaced the previous leaders in this field. Importantly, our system scales easily to extract other types of information from Street View photos, and now it helps us automatically recognize commercial signs. And we are pleased to announce that this model is shared !

Image: a street name sign successfully recognized by the system. The same sign can appear in several photos, up to four in total.

Recognizing text in natural environments is a challenging problem in computer vision and machine learning. While traditional optical character recognition (OCR) systems deal with extracting text from scanned documents, text in street photographs is much harder to recognize because of visual artifacts such as distortion, occlusion, blur, cluttered backgrounds, and varying viewpoints. Our work on these research problems began back in 2008, when we used neural networks to blur faces and license plates in order to protect our users' privacy. From that early research we realized that, with a sufficiently large amount of labeled data, machine learning could be used not only to protect users' privacy but also to keep Google Maps up to date with fresh information.
In 2014, the Ground Truth team published a state-of-the-art method for recognizing house numbers in the Street View House Numbers (SVHN) dataset, developed by Ian Goodfellow, then a student and now a Google employee. This work was not only of academic interest; it was critical for improving the accuracy of Google Maps. Today, the locations of about a third of all addresses worldwide have been improved thanks to this system. In some countries, such as Brazil, the algorithm has refined the locations of more than 90% of the addresses on Google Maps, greatly improving their usability.
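The paper itself describes the full architecture; the much-simplified sketch below only illustrates the core idea of reading a whole multi-digit house number at once: a single CNN looks at the cropped number and predicts the sequence length plus one digit per position with separate softmax heads. The layer sizes and the five-digit cap are illustrative assumptions, not the published architecture.

```python
# Simplified multi-digit recognition sketch with separate classification heads.
import tensorflow as tf

MAX_DIGITS = 5  # assumed cap on house-number length

inputs = tf.keras.Input(shape=(64, 64, 3))
x = tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(256, activation="relu")(x)

# One head predicts how many digits there are; one head per position predicts
# that digit (11 classes: 0-9 plus a "blank" for unused positions).
length_head = tf.keras.layers.Dense(MAX_DIGITS + 1, name="length")(x)
digit_heads = [tf.keras.layers.Dense(11, name=f"digit_{i}")(x) for i in range(MAX_DIGITS)]

model = tf.keras.Model(inputs, [length_head] + digit_heads)
model.summary()
```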

The next logical step was to transfer these techniques to street names. To tackle this problem, we created and released the French Street Name Signs (FSNS) dataset, a large collection of more than a million street name signs. The FSNS dataset is the result of years of work aimed at giving everyone the opportunity to improve their OCR models on a challenging, real-world dataset. FSNS is much larger and more difficult than SVHN, because accurately recognizing a street name can require combining information from several different images.

Image: examples of hard-to-read signs that our system recognized correctly by combining several different photos. Random noise is used when fewer than four different photos of a sign are available.
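The caption above hints at how the model inputs are assembled. A minimal sketch of that packing step, assuming 150x150 tiles as in the FSNS images, might look like this:

```python
# Pack up to four views of the same sign into one model input; fill missing
# views with random noise, as described in the caption above.
import numpy as np

TILE = 150        # assumed width/height of a single view
MAX_VIEWS = 4

def pack_views(views):
    """views: list of up to MAX_VIEWS arrays shaped (TILE, TILE, 3), values in [0, 1]."""
    tiles = list(views[:MAX_VIEWS])
    while len(tiles) < MAX_VIEWS:
        # No photo available for this slot: fill it with random noise,
        # so the model learns to ignore uninformative tiles.
        tiles.append(np.random.rand(TILE, TILE, 3))
    return np.concatenate(tiles, axis=1)   # shape (TILE, MAX_VIEWS * TILE, 3)

packed = pack_views([np.random.rand(TILE, TILE, 3)] * 2)  # two real views, two noise tiles
print(packed.shape)  # (150, 600, 3)
```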

Armed with this dataset, Google intern Zbigniew Wojna spent the summer of 2016 developing a deep learning model for automatically labeling Street View imagery. One of the interesting and useful features of the new model is its ability to normalize the text to conform to our naming conventions, as well as to ignore extraneous text in the images.

Image: an example of text normalization on Brazilian data. "Av." becomes "Avenida" and "Pres." becomes "Presidente".

Image: in this example the model is not thrown off by the presence of two different signs: it correctly expands "Av." into "Avenue" and correctly ignores the house number "1600".
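The model learns this normalization end to end from training data; the tiny rule-based mapping below is only an illustration of the kind of output it is trained to produce (the abbreviation table itself is a made-up example, not part of the system).

```python
# Illustrative, rule-based stand-in for the learned text normalization.
EXPANSIONS = {
    "av": "Avenida",
    "pres": "Presidente",
    "r": "Rua",          # hypothetical additional entry
}

def normalize_street_name(raw_text):
    words = []
    for token in raw_text.split():
        key = token.rstrip(".").lower()
        words.append(EXPANSIONS.get(key, token))
    return " ".join(words)

print(normalize_street_name("Av. Pres. Castelo Branco"))
# -> "Avenida Presidente Castelo Branco"
```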

Combined with house number extraction, the new system allows us to create new addresses directly from imagery in places where we previously knew neither the street name nor the address. Now, every time a Street View car drives along a newly built road, our system can analyze the tens of thousands of images it captures, extract the street names and house numbers, and correctly place the new addresses on Google Maps.
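As a purely hypothetical sketch of that combination step (none of the names, coordinates, or thresholds below come from the actual system), pairing each recognized house number with the nearest recognized street name might look like this:

```python
# Hypothetical pairing of recognized house numbers with nearby street names.
from dataclasses import dataclass

@dataclass
class Detection:
    text: str        # recognized transcription ("Avenida Presidente Vargas", "1600", ...)
    lat: float
    lng: float

def build_addresses(street_names, house_numbers, max_distance_deg=0.0005):
    """Pair each house number with the closest recognized street name."""
    addresses = []
    for number in house_numbers:
        closest = min(
            street_names,
            key=lambda s: (s.lat - number.lat) ** 2 + (s.lng - number.lng) ** 2,
            default=None,
        )
        if closest is not None and (
            abs(closest.lat - number.lat) < max_distance_deg
            and abs(closest.lng - number.lng) < max_distance_deg
        ):
            addresses.append((closest.text, number.text, number.lat, number.lng))
    return addresses

streets = [Detection("Avenida Presidente Vargas", -22.9050, -43.1900)]
numbers = [Detection("1600", -22.9051, -43.1901)]
print(build_addresses(streets, numbers))
```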

But automatically creating addresses is not enough - we also want to provide navigation to businesses by name. In 2015 we published "Large Scale Business Discovery from Street Level Imagery", which proposed a method for accurately detecting business storefronts. However, once a storefront has been found, its name still has to be extracted accurately: the model must figure out where the name appears in the image and which text is unrelated to it. We call this extracted information "structured text" - not just text, but text paired with its semantic meaning.

By using different training data, the same model that reads street names can be trained to extract business names from building facades. We can then extract a name and check whether we already know about that business in Google Maps, which lets us build more accurate and up-to-date business listings.

Image: the system correctly recognized the store name as "Zelina Pneus", even though it had no information about the store's location. It also correctly ignored the names of the tire brands sold in the store.
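"Structured text" can be pictured as a transcription paired with a semantic label. The record type below is a hypothetical illustration of such an output for the storefront example above: the business name is kept, while other detected text is labeled and can be ignored.

```python
# Hypothetical "structured text" record: a transcription plus its semantic role.
from dataclasses import dataclass

@dataclass
class StructuredText:
    text: str     # what was read on the facade
    label: str    # semantic role: "business_name", "other_text", ...

extracted = [
    StructuredText("Zelina Pneus", "business_name"),
    StructuredText("(tire brand)", "other_text"),   # e.g. a brand name painted on the facade
]
business_names = [t.text for t in extracted if t.label == "business_name"]
print(business_names)  # ['Zelina Pneus']
```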

Applying these large models to more than 80 billion Street View images requires significant computing power. That is why the Ground Truth team was one of the first to gain access to Google's Tensor Processing Units, announced this year, which drastically reduce the computational cost of this work.

People rely on Google Maps being accurate and helpful. We keep Google Maps up to date by dealing with constantly changing urban landscapes, and roads and businesses still pose technical challenges that we have not yet fully solved. The goal of the Ground Truth team is to stay at the forefront of machine learning and to build a better product for more than a billion Google Maps users.

Source: https://habr.com/ru/post/404031/

