
Geoffrey Hinton's team won the ImageNet computer vision competition by a twofold margin

The ImageNet competition, held in October 2012, focused on classifying objects in photographs. Entrants had to recognize images across 1,000 categories.



Hinton's team used deep learning and convolutional neural networks, along with infrastructure built at Google under the direction of Jeff Dean and Andrew Ng. In March 2013, Google invested in Hinton's startup, based at the University of Toronto, and thereby obtained all rights to the technology. Within six months, the photo search service photos.google.com had been developed.



The service uses convolutional neural networks, originally developed by Professor Yann LeCun in the late 1990s. Even then, the technology could reliably solve handwriting recognition problems. Since then, computing power has grown significantly, and new algorithms for large-scale training of neural networks have emerged.



As for the technical infrastructure, I partially described it in the article Building high-level features using large-scale unsupervised learning. For a detailed description, see the article (pdf); I will limit myself to a few numbers. Because the networks are locally connected, as is typical of two-dimensional image processing, up to 32 computers with 16 cores each (512 cores in total) can be used effectively to train a single large neural network. With distributed algorithms for optimizing and replicating the model parameters, the number of processor cores working effectively in parallel can be increased to tens of thousands!
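The idea of several model replicas sharing one set of parameters can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the infrastructure described in the article: the names `ParameterServer`, `replica_gradient`, and `train` are hypothetical, the model is a single weight fitted to synthetic data, and the replicas run one after another rather than concurrently. Only the machine and core counts come from the text.

```python
# Toy simulation of data-parallel training with a shared parameter store.
# All class and function names are hypothetical, chosen for illustration.

MACHINES = 32            # figure quoted in the text
CORES_PER_MACHINE = 16   # figure quoted in the text
TOTAL_CORES = MACHINES * CORES_PER_MACHINE


class ParameterServer:
    """Holds the shared weight and applies gradients pushed by replicas."""

    def __init__(self, weight=0.0, learning_rate=0.05):
        self.weight = weight
        self.learning_rate = learning_rate

    def fetch(self):
        return self.weight

    def push(self, gradient):
        self.weight -= self.learning_rate * gradient


def replica_gradient(weight, shard):
    """Mean gradient of the squared error 0.5 * (w*x - y)**2 over one shard."""
    return sum((weight * x - y) * x for x, y in shard) / len(shard)


def train(shards, steps=200):
    server = ParameterServer()
    for _ in range(steps):
        # Each replica fetches the current weight, computes a gradient on its
        # own data shard, and pushes it back. A real system would run the
        # replicas concurrently; here they take turns.
        for shard in shards:
            current = server.fetch()
            server.push(replica_gradient(current, shard))
    return server.weight


# Synthetic data for y = 3x, split across four model replicas.
data = [(x / 10.0, 3.0 * x / 10.0) for x in range(1, 41)]
shards = [data[i::4] for i in range(4)]
learned_weight = train(shards)

print(TOTAL_CORES)               # 512
print(round(learned_weight, 4))  # 3.0
```

Each replica only ever sees its own shard, yet the shared weight converges because every update is applied to the same parameter store.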


In particular, 16 million images of 100x100 pixels were used to train the network that won the ImageNet competition. The output layer of the network consisted of 21,000 one-vs-all logistic classifiers. The total number of optimized parameters (the network's weights) was 1.7 billion. Training used 81 machines, or almost 1,300 cores.
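A one-vs-all logistic output layer scores every class independently, so the outputs do not have to sum to one as they would with softmax. The sketch below is a minimal illustration with three classes and two input features; the weights, the feature vector, and the function names are made up for the example, and the 16 cores per machine is an assumption carried over from the figures quoted earlier.

```python
import math

# Core count for the training run: 81 machines (from the text), assuming
# 16 cores per machine as in the earlier infrastructure figures.
TRAINING_MACHINES = 81
ASSUMED_CORES_PER_MACHINE = 16
CORES_FOR_TRAINING = TRAINING_MACHINES * ASSUMED_CORES_PER_MACHINE


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def one_vs_all_scores(features, weights, biases):
    """Return one independent probability per class."""
    return [
        sigmoid(sum(w * f for w, f in zip(class_weights, features)) + bias)
        for class_weights, bias in zip(weights, biases)
    ]


# Hypothetical toy layer: 3 classes, 2 features (the real layer had 21,000).
weights = [[2.0, -1.0], [-1.0, 2.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]
features = [1.0, 0.0]

scores = one_vs_all_scores(features, weights, biases)
# Every class whose score exceeds the threshold is reported.
predicted = [i for i, s in enumerate(scores) if s > 0.5]

print(CORES_FOR_TRAINING)  # 1296
print(predicted)           # [0]
```

Because each classifier answers only "is this my class or not", an image may legitimately receive several labels or none.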

Putting academic technology to work less than a year after Google acquired it made it possible to build, in a very short time, an unsurpassed search service for unlabeled images. Here are some interesting results:



Generalization


Despite the significant difference between the images in the training and test sets, the search engine generalizes well. For example, the concept of “flower” might be learned from macro photographs with ideal composition: a single flower in the center of the frame. The trained network nevertheless finds flowers in amateur photos with arbitrary composition and scale.



image

An image of a flower from the training set




image

An image in which the system found flowers




Multimodal Classes


The network was able to recognize classes whose images differ significantly in appearance. For example, the system assigns photos of both a car's exterior and its interior to the class “car”. This is all the more surprising because the output layer essentially uses linear classifiers that partition the multidimensional feature space.
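One way to see how a linear classifier can cover two visually different modes is to look at it in feature space rather than pixel space. The sketch below is purely illustrative: the three features, their values, and the weights are hypothetical and hand-picked, whereas in a real network the features come from the layers below the output and are learned.

```python
def linear_score(features, weights, bias):
    """Signed score of a point relative to the classifier's hyperplane."""
    return sum(w * f for w, f in zip(weights, features)) + bias


# Hypothetical learned features: [wheels_visible, dashboard_visible, fur_visible]
examples = {
    "car_exterior": [1.0, 0.0, 0.0],
    "car_interior": [0.0, 1.0, 0.0],
    "dog": [0.0, 0.0, 1.0],
}

# A single hand-picked linear classifier for the class "car".
car_weights = [1.0, 1.0, -1.0]
car_bias = -0.5

is_car = {
    name: linear_score(features, car_weights, car_bias) > 0
    for name, features in examples.items()
}

print(is_car)
# {'car_exterior': True, 'car_interior': True, 'dog': False}
```

The exterior and interior examples share no active feature, yet both fall on the positive side of the same hyperplane, which is the situation the paragraph above describes.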



Classification of abstract concepts


The system copes well with abstract or highly generalized classes such as “dance”, “kiss”, and “food”. This is interesting because such concepts have no obvious simple visual features such as color, texture, or shape.



image

image

The system found food in these images.




Meaningful mistakes


Unlike many computer vision systems, when this system is wrong, its errors seem quite reasonable. A person could well have made the same mistakes: see, for example, the slug classified as a snake or the donkey classified as a dog.



image

Banana slug mistakenly recognized as a snake




image

Donkey mistakenly recognized as a dog




Recognition of highly specialized classes


The system was able to recognize very specific classes, such as particular kinds of flowers (hibiscus, etc.). For a system that can also recognize broad concepts such as “dawn”, the ability to classify such subtle distinctions is impressive.



image

The system has determined that it is a polar bear ...




image

... and this is a grizzly bear

Source: https://habr.com/ru/post/183380/


