For the past few years, the phrase "deep learning" has been appearing in the media more and more often. Magazines like KDnuggets and DigitalTrends try not to miss news from this sphere and write about popular frameworks and libraries.
Even mainstream publications like The NY Times and Forbes regularly cover what scientists and developers in the field of deep learning are doing, and interest in deep learning is still not fading. Today we will talk about what deep learning is capable of now and how it may develop in the future.

A few words about deep learning, neural networks and AI
What is the difference between a deep learning algorithm and an ordinary neural network? According to
Patrick Hall, lead data scientist at SAS, the most obvious difference is that a neural network used in deep learning has more hidden layers. These layers sit between the first (input) and the last (output) layer of neurons. At the same time, it is not at all necessary to connect every neuron in one layer to every neuron in the next.
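To make the difference concrete, here is a minimal Python sketch (the layer sizes and random weights are invented purely for illustration): a "shallow" network has a single hidden layer between input and output, while a deep one stacks several.

```python
import numpy as np

def forward(x, layer_sizes, rng):
    """Run one forward pass through a fully connected network
    with the given layer sizes and random (untrained) weights."""
    activation = x
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(0.0, 0.1, size=(n_in, n_out))
        b = np.zeros(n_out)
        activation = np.tanh(activation @ w + b)  # nonlinearity between layers
    return activation

rng = np.random.default_rng(0)
x = rng.normal(size=4)                  # a toy 4-dimensional input

shallow = [4, 8, 2]                     # input -> one hidden layer -> output
deep    = [4, 16, 16, 16, 16, 2]        # input -> several hidden layers -> output

print("shallow output:", forward(x, shallow, rng))
print("deep output:   ", forward(x, deep, rng))
```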
The distinction between deep learning and artificial intelligence is not so straightforward. For example,
Pedro Domingos, a professor at the University of Washington, holds the view that deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence. Domingos notes that in practice their areas of application intersect quite rarely.
However, there is another opinion.
Hugo Larochelle, a professor at the University of Sherbrooke, is confident that these concepts are almost unrelated: AI focuses on the goal, while deep learning is a specific technology or methodology for machine learning. Therefore, hereinafter, when speaking of achievements in the field of AI (such as AlphaGo, for example), we will keep in mind that such systems use deep learning algorithms alongside other techniques from AI in general and machine learning in particular [as Pedro Domingos rightly notes].
From the “deep neural network” to deep learning
Deep neural networks appeared quite a long time ago, back in the
1980s. So why did deep learning begin to develop actively only in the 21st century? Representations in a neural network are built up layer by layer, so it was logical to assume that more layers would let the network learn better. But the training method plays a big role. Previously, deep networks were trained with
the same algorithm used for ordinary artificial neural networks, backpropagation. This method could effectively train only the last layers of the network, so the process was extremely slow and the hidden layers of a deep neural network did not really "work".
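The trouble is easy to see in a toy calculation (the numbers below are made up; this only illustrates the effect): with sigmoid activations the local derivative never exceeds 0.25, so the error signal that backpropagation sends towards the early layers shrinks roughly geometrically, and those layers barely learn.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_layers = 10
# Pre-activations of one unit per layer, just for illustration.
z = rng.normal(size=n_layers)

# By the chain rule, the gradient reaching layer k is (roughly) a product
# of the local derivatives of all the later layers.  sigmoid'(z) <= 0.25,
# so the product shrinks geometrically as we move towards the input.
grad = 1.0
for k in reversed(range(n_layers)):
    grad *= sigmoid(z[k]) * (1.0 - sigmoid(z[k]))   # sigmoid'(z)
    print(f"gradient magnitude reaching layer {k}: {grad:.2e}")
```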
Only in 2006 did
three independent groups of scientists manage to overcome these difficulties. Geoffrey Hinton pre-trained networks using
restricted Boltzmann machines, teaching each layer separately. To solve image recognition problems, Yann LeCun proposed using a
convolutional neural network consisting of convolutional layers and subsampling (pooling) layers.
The stacked autoencoder developed by Yoshua Bengio also made it possible to use all the layers of a deep neural network.
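The layer-wise idea can be sketched roughly as follows (toy data, toy hyperparameters and plain autoencoders, not the exact 2006 recipes): each layer is first trained, without labels, to reconstruct its own input, and the stacked encoders then initialize the deep network before ordinary backpropagation fine-tunes it on the real task.

```python
import numpy as np

def train_autoencoder_layer(data, n_hidden, lr=0.1, epochs=200, seed=0):
    """Train one autoencoder layer to reconstruct its own input,
    then return the hidden representation and the learned encoder."""
    rng = np.random.default_rng(seed)
    n_in = data.shape[1]
    w_enc = rng.normal(0, 0.1, (n_in, n_hidden))
    w_dec = rng.normal(0, 0.1, (n_hidden, n_in))
    for _ in range(epochs):
        h = np.tanh(data @ w_enc)           # encode
        recon = h @ w_dec                   # decode (linear output)
        err = recon - data                  # reconstruction error
        # Gradients of the mean squared reconstruction error.
        grad_dec = h.T @ err / len(data)
        grad_enc = data.T @ ((err @ w_dec.T) * (1 - h ** 2)) / len(data)
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return np.tanh(data @ w_enc), w_enc

# Toy unlabeled data: 100 samples with 20 features.
rng = np.random.default_rng(1)
x = rng.normal(size=(100, 20))

# Greedy layer-wise pretraining: each layer learns to encode the
# representation produced by the previous one.
representation = x
encoders = []
for n_hidden in (16, 12, 8):
    representation, w = train_autoencoder_layer(representation, n_hidden)
    encoders.append(w)
    print("pretrained a layer with", n_hidden, "hidden units")

# The stacked encoders can then initialize a deep network that is
# fine-tuned on the actual task with ordinary backpropagation.
```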
Projects that "see" and "hear"
Today, deep learning is used in completely different fields, but perhaps most of the
examples lie in the field of image processing. Face recognition has been around for a long time but, as they say, there is no limit to perfection.
The developers of the OpenFace service believe the problem is not yet solved, because recognition accuracy can still be improved. And these are not just words: OpenFace can tell apart even people who look alike. Details about how the program works have already been covered in
this article. Deep learning also helps with black-and-white images, which
Colornet can colorize automatically.
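Returning to face recognition: the idea behind verification systems like OpenFace can be sketched in a few lines (the threshold and the "embeddings" below are made up for illustration): a deep network maps each aligned face photo to a compact embedding, and two photos are considered to show the same person when their embeddings lie close together.

```python
import numpy as np

# Face verification in the OpenFace style, sketched with made-up numbers:
# a deep network maps each aligned face image to a 128-dimensional embedding,
# and two faces are judged to be the same person when the embeddings are close.
def same_person(emb_a, emb_b, threshold=0.9):
    return np.linalg.norm(emb_a - emb_b) < threshold

rng = np.random.default_rng(0)
anna_1 = rng.normal(size=128)
anna_2 = anna_1 + rng.normal(0, 0.05, size=128)   # same person, slight variation
boris  = rng.normal(size=128)

print(same_person(anna_1, anna_2))   # True: the embeddings almost coincide
print(same_person(anna_1, boris))    # False: the embeddings are far apart
```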
In addition, deep networks can now recognize human emotions. Together with the ability to track the use of a company's logo in photos and to analyze the accompanying text, this gives us a
powerful marketing tool. Similar services are being developed by IBM, for example. Its tool makes it possible to evaluate the authors of texts when searching for bloggers for cooperation and advertising.
NeuralTalk can describe images in a few sentences. A set of images is loaded into the program, along with five sentences describing each of them. At the training stage, the algorithm learns to predict the next word from the previous context. And at the prediction stage, a recurrent neural network generates the sentences describing the pictures.
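The generation step looks roughly like the sketch below (the vocabulary, the dimensions and the untrained random weights are all made up; NeuralTalk learns the real parameters from image-caption pairs): the recurrent state is first conditioned on the image features, and each next word is then predicted from the previous word and the current state.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<start>", "a", "dog", "plays", "in", "the", "park", "<end>"]
n_vocab, n_hidden, n_image = len(vocab), 32, 64

# Random (untrained) parameters, just to show the shape of the computation.
W_img = rng.normal(0, 0.1, (n_image, n_hidden))   # image features -> initial state
W_in  = rng.normal(0, 0.1, (n_vocab, n_hidden))   # previous word  -> hidden
W_hh  = rng.normal(0, 0.1, (n_hidden, n_hidden))  # hidden -> hidden (recurrence)
W_out = rng.normal(0, 0.1, (n_hidden, n_vocab))   # hidden -> word scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

image_features = rng.normal(size=n_image)          # stand-in for CNN features
h = np.tanh(image_features @ W_img)                # condition the state on the image
word = vocab.index("<start>")

caption = []
for _ in range(10):                                # generate at most 10 words
    x = np.zeros(n_vocab); x[word] = 1.0           # one-hot previous word
    h = np.tanh(x @ W_in + h @ W_hh)               # recurrent state update
    probs = softmax(h @ W_out)                     # distribution over the next word
    word = int(np.argmax(probs))                   # pick the most likely word
    if vocab[word] == "<end>":
        break
    caption.append(vocab[word])

print(" ".join(caption))   # gibberish here: the weights are untrained
```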
Today there are many applications that solve different problems in working with audio. For example, the
Magenta application developed by a Google team can compose music. But most applications focus on speech recognition. The
Google Voice service can transcribe voicemail and offers SMS management functions; to train the deep networks, the researchers used existing voice messages.
Projects in the "conversational genre"
According to scholars such as
Noam Chomsky, it is impossible to teach a computer to fully understand speech and hold a conscious dialogue, because even the mechanism of human speech is not fully understood. Attempts to teach machines to speak began in 1968, when Terry Winograd created the
SHRDLU program. It could recognize parts of speech, describe objects, answer questions, and even had a small memory. But attempts to expand the machine's vocabulary made it impossible to keep the application of its rules under control.
But today, deep learning at Google, in the person of developer
Quoc Le, has taken a big step forward. His systems can reply to emails in Gmail and even help Google technical support specialists. And the
Cleverbot program studied dialogues from 18,900 films, so it can even answer questions about the meaning of life: the bot believes that the meaning of life is to serve the good. However, scientists once again ran into the fact that such artificial intelligence only mimics understanding and
has no idea about reality. The program perceives speech only as a combination of certain symbols.
Teaching machines language can also help with translation. Google has long been working on improving the quality of translation in its service. But how close can machine translation get to the ideal if even a person cannot always correctly understand the meaning of a statement? Ray Kurzweil proposes to solve this problem by representing the semantic meaning of words graphically. The process is quite labor-intensive: into a special directory, the
Knowledge Graph created at Google, scientists have loaded data on almost 700 million topics, places and people, between which almost a billion different connections were made. All this is aimed at improving the quality of translation and the perception of language by artificial intelligence.
The idea of representing language by graphical and/or mathematical methods is not new. Back in the 1980s, scientists faced the task of presenting a language in a format a neural network could work with. The proposed solution was to represent words as mathematical vectors, which makes it possible to measure the semantic proximity of different words (for example, the words "boat" and "water" should be close to each other in vector space). Google's current research, which modern researchers describe not as "vectors of individual words" but as "thought vectors", builds on those studies.
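Semantic proximity in such a vector space is usually measured with cosine similarity. Here is a toy sketch (the three-dimensional vectors are invented for illustration; real embeddings are learned from text and have hundreds of dimensions):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "word vectors", invented purely for illustration;
# real embeddings (word2vec, GloVe, ...) are learned from large corpora.
vectors = {
    "boat":  np.array([0.9, 0.8, 0.1]),
    "water": np.array([0.8, 0.9, 0.2]),
    "piano": np.array([0.1, 0.2, 0.9]),
}

print("boat ~ water:", round(cosine(vectors["boat"], vectors["water"]), 2))
print("boat ~ piano:", round(cosine(vectors["boat"], vectors["piano"]), 2))
```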
Deep learning and health care
Today, deep learning is penetrating even the sphere of public health and helps monitor patients' condition no worse than doctors do. For example, the
Dartmouth-Hitchcock Medical Center in the United States uses Microsoft's specialized
ImagineCare service, which allows physicians to detect subtle changes in patients' condition. The algorithms receive data on weight changes, monitor blood pressure, and can even recognize a patient's emotional state based on an analysis of telephone conversations.
Deep learning is also used in pharmaceuticals. Today, molecular therapy is used to treat various types of cancer, but to create an effective and safe medicine it is necessary to identify active molecules that act only on a given target, thus avoiding side effects. The search for such molecules can be carried out with deep learning (a description of the project conducted jointly by scientists from universities in Austria and Belgium and the R&D department of Johnson & Johnson can be found in
this scientific paper).
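Very schematically, the task reduces to classification: a molecule is described by a fingerprint vector, and a network predicts whether it is active against the target. The sketch below uses synthetic data and a tiny one-hidden-layer network, nothing like the multi-task deep networks in the actual study, but it shows the shape of the problem.

```python
import numpy as np

# Toy stand-in for activity prediction: each "molecule" is a binary
# fingerprint vector, and the label says whether it hits the target.
# The data, model size and training schedule are all invented here.
rng = np.random.default_rng(0)
n_molecules, n_bits = 500, 64
X = rng.integers(0, 2, size=(n_molecules, n_bits)).astype(float)
true_w = rng.normal(size=n_bits)
y = (X @ true_w > 0).astype(float)          # synthetic "active / inactive" labels

# One hidden layer trained with plain gradient descent on logistic loss.
W1 = rng.normal(0, 0.1, (n_bits, 32)); b1 = np.zeros(32)
w2 = rng.normal(0, 0.1, 32);           b2 = 0.0

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(300):
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ w2 + b2)                # predicted probability of activity
    grad_out = (p - y) / n_molecules        # gradient of the mean logistic loss
    grad_w2 = h.T @ grad_out
    grad_b2 = grad_out.sum()
    grad_h = np.outer(grad_out, w2) * (1 - h ** 2)
    grad_W1 = X.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)
    W1 -= 0.5 * grad_W1; b1 -= 0.5 * grad_b1
    w2 -= 0.5 * grad_w2; b2 -= 0.5 * grad_b2

print("training accuracy:", ((p > 0.5) == y).mean())
```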
Does the algorithm have intuition?
How deep is deep learning really?
AlphaGo developers can answer this question. This algorithm cannot speak and cannot recognize emotions, but it can beat anyone at a board game. At first glance there is nothing special about that: almost 20 years ago, a computer developed by IBM
beat a human at chess for the first time. But AlphaGo is another matter. The board game Go appeared in ancient China. At the start it looks something like chess: opponents play on a grid board, black pieces against white. But that is where the similarities end, because the pieces are small stones, and the goal of the game is to surround the opponent's stones with your own.
The main difference, though, is that there are no winning combinations known in advance, and it is impossible to think just a few moves ahead. The machine cannot simply be programmed to win, because a winning strategy cannot be built in advance. This is where deep learning comes into play. Instead of being programmed with specific moves, AlphaGo
analyzed hundreds of thousands of games and played a million games against itself. Artificial intelligence can be trained in practice to perform complex tasks, acquiring what a person would call "an intuitive understanding of a winning strategy."
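The self-play idea can be illustrated on a deliberately tiny game instead of Go (everything below, the game, the update rule, the numbers, is a made-up toy and not AlphaGo's actual method, which combines deep policy and value networks with tree search): the program plays against itself and reinforces the moves that led to wins.

```python
import random

# A toy game instead of Go: players take turns adding 1 or 2 to a counter;
# whoever reaches exactly 10 wins.  The "policy" is a table of move
# preferences per counter value, improved by reinforcement from self-play.
TARGET = 10
prefs = {s: {1: 1.0, 2: 1.0} for s in range(TARGET)}   # uniform preferences

def pick_move(state):
    moves = [m for m in (1, 2) if state + m <= TARGET]
    weights = [prefs[state][m] for m in moves]
    return random.choices(moves, weights)[0]

def self_play_game():
    """Play one game of the policy against itself, recording each move."""
    state, player, history = 0, 0, []
    while state < TARGET:
        move = pick_move(state)
        history.append((player, state, move))
        state += move
        player = 1 - player
    winner = 1 - player          # the player who just moved reached TARGET
    return winner, history

random.seed(0)
for _ in range(5000):            # self-play: reinforce the winner's moves
    winner, history = self_play_game()
    for player, state, move in history:
        prefs[state][move] *= 1.05 if player == winner else 0.97

# With enough self-play the table tends to favour moves that leave the
# counter on 1, 4 or 7 (or reach 10 directly), the losing positions for
# the opponent in this toy game.
for s in range(TARGET - 2):
    best = max(prefs[s], key=prefs[s].get)
    print(f"counter={s}: prefer adding {best}")
```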
Machines won't take over the world
Despite the staggering successes of AlphaGo, artificial intelligence is far from enslaving the human race. Machines have learned a kind of "intuitive thinking" by processing huge arrays of data but, according
to Fei-Fei Li, head of the Stanford Artificial Intelligence Laboratory, abstract and creative thinking is not available to them.
Despite some progress in image recognition, a computer may still confuse a traffic sign with a refrigerator. Together with her colleagues, Li is building a database of images with detailed descriptions and a large number of tags that will allow computers to get more information about real objects.
According to Li, this approach of learning from a photo and its detailed description is similar to how children learn, associating words with objects, relationships and actions. Of course, the analogy is rather rough: for a child to understand the interconnections of objects in the real world, it is not necessary to describe each object and its environment in detail.
Professor
Josh Tenenbaum, who studies cognitive science at MIT, notes that the way a computer learns about the world is very different from how humans do it; for all their size, artificial neural networks cannot be compared with biological ones. For instance, the ability to speak forms in a person very early and relies on visual perception of the world and control of the musculoskeletal system. Tenenbaum is
sure that machines cannot be taught full-fledged thinking without imitating human speech and psychology.
Fei-Fei Li agrees with this opinion. According to the scientist, the current level of work on artificial intelligence will not bring it close to human intelligence, if only because people have emotional and social intelligence. So the takeover of the world by machines will have to be postponed for at least another couple of decades.
PS Additional reading: Our IaaS digest - 30 materials on the applicability of cloud technologies.