
Google Research: Fast, accurate identification of 100,000 categories of objects on a single machine.

People can distinguish roughly 10,000 high-level visual categories, but we can discriminate among a much larger set of visual stimuli, referred to as visual features. These features may correspond to parts of objects, animal limbs, architectural details, items on the ground, and other visual patterns whose names we do not know; yet it is this much larger set of features that we use as the basis for reconstructing and explaining our everyday visual experience. Such features provide the building blocks for more complex visual stimuli and establish the context that is essential for resolving ambiguous scenes.

In contrast to current practice in computer vision, the explanatory context needed to resolve visual details need not be purely local. A fast-moving red blob bouncing along the ground could be a child's toy in the context of a playground, or a rooster in the context of a barnyard. To tell these two interpretations apart, it would be useful to have a large number of object detectors capable of signaling the presence of context-defining items: detectors for sandboxes, swings, slides, cows, chickens, sheep, and farm machinery.

This year's winners of the CVPR Best Paper Award (for the best paper on computer vision and pattern recognition), a team of Googlers including Tom Dean, Mark Ruzon, Mark Segal, Jonathan Shlens, Sudheendra Vijayanarasimhan and Jay Yagnik, describe a technology that lets a computer vision system extract exactly the kind of semantically rich contextual information needed to recognize visual categories, even when close examination of the pixels covering the object in question is not enough to identify it in the absence of such contextual clues. In particular, we consider the basic operation in machine vision: estimating, for every location in an image, the likelihood that a particular object is present there.
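To make that basic operation concrete, here is a minimal sketch — not the paper's implementation — of how a single detector template is scored at every image location by a sliding dot product; the function and variable names are illustrative.

```python
import numpy as np

def response_map(image, template):
    """Slide a detector template over the image and score every location.

    Each score is the dot product between the template and the image
    patch at that location -- the multiply-add inner loop of a convolution.
    """
    ih, iw = image.shape
    th, tw = template.shape
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + th, x:x + tw]
            out[y, x] = np.sum(patch * template)  # multiply-add inner loop
    return out

# Toy example: a bright 2x2 blob, and a template shaped to match it.
img = np.zeros((5, 5))
img[1:3, 1:3] = 1.0
tmpl = np.ones((2, 2))
scores = response_map(img, tmpl)
peak = np.unravel_index(np.argmax(scores), scores.shape)
print(peak)  # (1, 1): the detector fires where the blob sits
```

With one template this loop is cheap; running it for every one of 100,000 object categories is exactly the cost the paper's hashing technique avoids.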
This is the so-called convolution operator, one of the key building blocks of machine vision and, more generally, of all signal processing. Unfortunately, it is computationally expensive, so researchers either use it sparingly or resort to specialized SIMD hardware such as GPUs and FPGAs to reduce the overhead. We turn everything upside down and show how fast table lookup — a technique called hashing — can be used to trade time for space, replacing the computationally expensive inner loop of the convolution operator — the sequence of multiply-add operations needed to perform millions of convolutions — with table lookups.
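As a rough illustration of the idea — a sketch of winner-take-all (WTA) style hashing, not the paper's actual implementation — the code below indexes detector templates by hash codes, so candidate detectors are retrieved by hashing a patch once and counting table collisions instead of convolving the patch with every template. All names and parameters here are invented for the example.

```python
import numpy as np

def wta_hash(vec, perms, k=4):
    """Winner-take-all hash: for each permutation, record which of the
    first k permuted entries is largest. Vectors with similar rankings
    tend to produce matching codes, so hash collisions approximate a
    high dot product without any multiplications."""
    return tuple(int(np.argmax(vec[p[:k]])) for p in perms)

rng = np.random.default_rng(0)
dim, n_perms = 16, 8
perms = [rng.permutation(dim) for _ in range(n_perms)]

# Index "detector templates" by each band of their hash code.
templates = {name: rng.standard_normal(dim)
             for name in ("rooster", "toy", "swing")}
tables = [dict() for _ in range(n_perms)]
for name, t in templates.items():
    for band, c in enumerate(wta_hash(t, perms)):
        tables[band].setdefault(c, set()).add(name)

# At query time the patch is hashed once; matching detectors are found
# by table lookup and vote counting, replacing millions of multiply-adds.
patch = templates["rooster"].copy()
votes = {}
for band, c in enumerate(wta_hash(patch, perms)):
    for name in tables[band].get(c, ()):
        votes[name] = votes.get(name, 0) + 1
print(votes["rooster"])  # 8: the matching detector wins every band
```

The design point is that the per-patch cost now depends on the number of hash bands, not on the number of detectors, which is what makes scaling to 100,000 categories feasible.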

We demonstrate the advantages of our approach by scaling object detection from its current state of a few hundred or, at best, a few thousand object categories up to 100,000 categories, which is equivalent to performing over a million convolutions. Moreover, our demonstration runs on a single ordinary computer that needs only a few seconds per image. The underlying technology is used in several parts of Google's infrastructure and can be applied to problems outside computer vision, such as audio signal processing.

On Wednesday, June 26, the Google engineers responsible for this research received the Best Paper Award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in Portland, Oregon.

The full paper can be found here.

The purpose of publishing this on Habr: to read comments about the prospects of the technology behind this research and its applications on the Internet.

P.S.
This is my first post on Habr. I will be glad to hear your comments, so please don't judge too harshly.
Due to a lack of karma, I cannot publish in the "Artificial Intelligence" and "Google" hubs.
I would be grateful if someone could tell me how to move the post to those hubs.

Source: https://habr.com/ru/post/190460/

