
A few days ago an article appeared on Habré,
What do neural networks hide? It is a free retelling of the English article
The Flaw Lurking In Every Deep Neural Net , which in turn discusses a specific study of certain properties of neural networks (
Intriguing properties of neural networks ).
In the article describing the study, the authors chose a somewhat sensational way of presenting the material and wrote the text in the spirit of “a serious problem has been found in neural networks” and “we cannot trust neural networks in security-related problems”. Many of my acquaintances shared the link to the Habré post, and several discussions on the topic started on Facebook at once. At the same time I got the impression that after two retellings some of the information from the original research had been lost, and that many questions arose about neural networks that were not considered in the original text. So it seems to me there is a need to describe in more detail what was actually done in the study and, along the way, to try to answer those questions. The Facebook format does not suit texts this long at all, so I decided to try to put my thoughts together in a post on Habré.
Content of the original article
The original article, “Intriguing properties of neural networks”, was written by a group of seven researchers, three of whom work in Google's neural network research department. The article examines two non-obvious properties of neural networks:
- It is commonly believed that if, for a specific neuron in a deep layer of the network, you select input images that activate that particular neuron, the selected images will share some common semantic feature. The researchers showed that the same statement holds if you consider not the activation of a single neuron but a linear combination of the outputs of several neurons.
- For every element of the training set it is possible to find a visually very similar example that is classified incorrectly; the researchers call such examples the blind spots of the network.
Let's try to figure out in more detail what these two properties mean.
The meaning of individual neurons
Let's deal with the first property quickly.
There is an assumption, popular among fans of neural networks, that a neural network decomposes its input into separate, clear properties, and that at the deep layers of the network each neuron is responsible for some specific property of the original object.
This statement is usually checked by visual inspection:
- A neuron in a trained network is selected.
- Images from a test set that activate this neuron are selected.
- The selected images are viewed by a person and it is concluded that all these images have some common property.
What the researchers did in the article: instead of examining individual neurons, they examined linear combinations of neurons and searched for images that activate a particular combination. It turned out that such images also share common semantic properties. From this the authors conclude that knowledge about the subject area is stored not in specific neurons but in the overall configuration of the network.
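To make the check concrete, here is a minimal sketch of how one might rank images by the activation they produce in a chosen layer. The `hidden_activations` callback and the `layer_size` value are placeholders for whatever trained network and layer is actually being probed; nothing here comes from the paper's own code.

```python
import numpy as np

def top_images_for_direction(images, hidden_activations, direction, k=10):
    """Rank images by how strongly they activate one 'direction' in a deep layer.

    `hidden_activations(image)` is assumed to return the vector of outputs of
    some chosen deep layer of a trained network for a single image.  If
    `direction` is a basis vector (all zeros and a single 1), this reproduces
    the classic single-neuron check; a random dense vector gives the
    'linear combination of neurons' variant studied in the paper.
    """
    scores = [float(np.dot(hidden_activations(img), direction)) for img in images]
    order = np.argsort(scores)[::-1]              # highest activation first
    return [images[i] for i in order[:k]]

# Single-neuron check:       direction = np.eye(layer_size)[j]
# Linear-combination check:  direction = np.random.randn(layer_size)
```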
Generally speaking, I don't really want to discuss this part of the article seriously, because it belongs more to the realm of religion than of science.
The initial claim that specific neurons are responsible for specific features comes from the very abstract argument that a neural network should resemble the human brain in how it works. Where the claim comes from that the selected features should be understandable to a human, I could not find anywhere. Verifying that claim is a very strange exercise, because it is easy to find a common feature in a small collection of arbitrary images if you really want to, and a statistically significant check on a large volume of images is impossible, since the process cannot be automated. As a result, the researchers got a natural outcome: if you can find common features in one set of images, you can do the same in any other.

An example of images with the same property from the original article.
At the same time, the general conclusion of this part of the article seems reasonable: knowledge about the subject area is contained in the architecture of the whole network and the parameters of its neurons rather than in any particular neuron separately.
Blind spots of the network

The researchers conducted the following experiment: they set out to find objects that the network classifies incorrectly yet that lie as close as possible to objects of the training set. To search for them, the authors developed a special optimization algorithm that moved away from the original image in the direction that worsens the network's answers until the classification of the object broke.
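As an illustration of the general idea only: the sketch below walks away from an image with simple gradient steps until the predicted class changes. The `classify` and `loss_gradient` callbacks are assumed to be provided by the trained network under study, and the paper itself formulates the search differently, as a box-constrained optimization that minimizes the size of the distortion and is solved with L-BFGS.

```python
import numpy as np

def find_blind_spot(x, true_label, classify, loss_gradient,
                    step=0.01, max_iters=1000):
    """Walk away from image `x` in the direction that worsens the network's
    answer until the predicted class changes.

    `classify(x)` should return the predicted label and `loss_gradient(x, y)`
    the gradient of the network's loss with respect to the input; both are
    assumed to come from the trained network being studied.
    """
    x_adv = x.copy()
    for _ in range(max_iters):
        if classify(x_adv) != true_label:                      # classification broke
            return x_adv, x_adv - x                            # the image and its distortion
        g = loss_gradient(x_adv, true_label)                   # direction of growing loss
        x_adv = np.clip(x_adv + step * np.sign(g), 0.0, 1.0)   # stay a valid image
    return None, None                                          # no blind spot found
```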
The experiment resulted in the following:
- For any object of the training set there is always an image that a person cannot tell apart from the original by eye, yet on which the network breaks.
- Images with such distortions remain poorly recognized by the network even if its architecture is changed or it is trained on a different subset of the training set.
These blind spots are what most of the discussion is about, so let's try to answer the questions that come up. But first, let's go through a few common objections that occur to people reading the description of the study:
- “Very simple neural networks are used in the study, nobody uses them any more” - no, the study used six different types of networks, from simple single-layer ones to large deep networks, all taken from well-known papers of the last two or three years. Not every experiment was performed on every type of network, but none of the main conclusions of the article depend on the network type.
- “The researchers use a neural network that takes raster images as input rather than extracted features - that is inefficient from the start” - the article actually never states clearly what exactly is fed to the input of the network. At the same time, their networks show good quality on large image datasets, so it is hard to accuse them of building an inefficient system.
- “The researchers took a badly overfitted neural network - naturally they got poor results outside the training set” - no, the results they report show that the networks they trained were not overfitted. In particular, the article gives the network's results on the original sample with random noise added, and there is no drop in quality on it. An overfitted system would not behave like that.
- “The distortions that are added to the images are very special and cannot occur in real life” - not quite. On the one hand, these distortions are not random; on the other hand, they change the image very slightly, on average an order of magnitude less than random noise that is imperceptible to the human eye (the corresponding numbers are in the article). So I would not claim that such distortions cannot occur in reality: the probability is small, but it cannot be excluded.
What is the real news here?
The fact that a neural network can have blind spots near the objects of the training set is not really big news, because nobody ever promised local accuracy for neural networks.
There are classification methods (for example,
Support Vector Machines ) whose training explicitly maximizes the separation between the objects of the training set and the boundaries between classes. Neural networks have no requirement of this kind; moreover, because of their complexity, the partition of the input space that they produce usually defies straightforward interpretation and investigation. So the fact that areas of local instability can be found in networks is not news but a confirmation of something that was already well known.
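For contrast, a small sketch on synthetic data of what “maximum separation” means for an SVM: the training procedure explicitly maximizes the width of the band between the decision boundary and the nearest training points, a guarantee that neural network training does not give. The data here is made up purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated synthetic clusters stand in for a training set.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + [2, 2], rng.randn(50, 2) - [2, 2]])
y = np.array([0] * 50 + [1] * 50)

# A linear SVM places the boundary as far as possible from the nearest
# training points (the support vectors); that distance is the margin.
svm = SVC(kernel="linear", C=1.0).fit(X, y)

w = svm.coef_[0]
margin = 1.0 / np.linalg.norm(w)      # half-width of the separating band
print("margin half-width:", margin)
print("number of support vectors:", len(svm.support_vectors_))
```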
What is really new here is that the distortions that lead to errors keep this property when the network architecture is changed or when the training set is changed. This is indeed a very unexpected discovery, and I hope the authors will find an explanation for it in subsequent papers.
Are neural networks a dead end?

No, neural networks are not a dead end. They are a very powerful tool for solving a certain set of very specific problems.
The popularity of neural networks is based on two ideas:
- Rosenblatt's perceptron convergence theorem: for any training set one can choose the architecture and weights of a neural network with a single hidden layer so that the training set is classified with 100% accuracy.
- Almost everything in the process of training a neural network (recently including the choice of architecture) is fully automated.
Therefore, a neural network is a means of quickly obtaining acceptable solutions to very complex recognition problems. Nothing more was ever promised of neural networks (although there have been plenty of attempts to promise more). The key words here are “quickly” and “complex problems”:
- If you want, within a year of work, to learn to reliably tell cats from dogs on YouTube, then apart from neural networks you currently have no tools of comparable quality and convenience: designing features for simpler classifiers and tuning them would take much longer. But you will have to accept that the black box of the neural network will sometimes make errors that look strange from a human point of view and are hard to correct.
- And if you want to recognize text or tell positive reviews from negative ones, better take a simpler classifier: you will have much more control over what is happening, even if getting the first results takes somewhat longer.
Can you trust neural networks?
The main conclusion of the article discussing the original research was: “Until this happens, we cannot rely on neural networks where safety is crucial...”. In the discussions that followed, the Google self-driving car kept coming up for some reason (apparently because of the authors' place of work and the picture of a car in the article).
In fact, neural networks can be trusted, and there are several reasons for this:
- What matters to the user (as opposed to the researcher) of a neural network is not where exactly it makes mistakes but how often. Believe me, you will not care whether your self-driving car failed to recognize a truck that was in its training set or one it had never seen before. The whole study is devoted to finding errors in specific regions near the training set, while the overall quality of neural networks (and ways of evaluating it) is not called into question.
- No recognition system ever works 100% correctly; there are always errors. One of the first principles learned in robotics is that you should never act on a single reading of an individual sensor: you always take a sliding window of values and discard the outliers (see the sketch after this list). The same is true of any critical system: in any real task there is a constant stream of data, and even if the system fails at some moment, the neighbouring data will correct the situation.
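A minimal sketch of the kind of filtering meant here, with an arbitrary window size and threshold chosen just for illustration: readings arrive in a stream, and a value that jumps too far from the recent median (say, a single wrong answer from a recognizer) is simply ignored.

```python
from collections import deque
import statistics

def filtered_reading(window, new_value, max_jump=5.0):
    """Add a sensor value to a sliding window and return a robust estimate.

    A value that jumps too far from the current median is treated as an
    outlier and dropped; the decision is always based on the window as a
    whole, never on one individual reading.  The threshold is arbitrary
    and task-specific.
    """
    if not window or abs(new_value - statistics.median(window)) <= max_jump:
        window.append(new_value)                   # plausible reading, keep it
    # implausible readings are silently discarded
    return statistics.median(window)

window = deque(maxlen=10)                          # floating window of recent values
for v in [1.0, 1.1, 0.9, 42.0, 1.05, 1.0]:         # 42.0 plays the role of a glitch
    print(filtered_reading(window, v))
```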

So in any critical system a neural network should be treated as just another kind of sensor: one that generally gives correct data but sometimes makes mistakes, and whose mistakes you have to insure against.
What is important in this article?
It would seem that if the article contains no great revelations, why was it written at all?
In my opinion, the article has one main result: a well-thought-out method for noticeably improving the quality of a neural network during training. When training recognition systems, a standard trick is to use, in addition to the original objects of the training set, the same objects with added noise.
The authors of the article showed that one can instead use objects with the distortions that lead to network errors: this eliminates the errors on those distortions and at the same time improves the quality of the whole network on the test set. This is an important result for working with neural networks.
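Schematically, the trick looks like the sketch below. The `train_step` and `make_distortion` callbacks are hypothetical stand-ins for whatever framework and blind-spot search are actually used; this is an outline of the augmentation idea, not the authors' exact training procedure.

```python
def train_with_hard_examples(model, dataset, make_distortion, train_step, epochs=10):
    """Augment training with distorted copies that the current model gets wrong.

    `make_distortion(model, x, y)` is assumed to return a slightly perturbed
    version of `x` that the current model misclassifies (a 'blind spot' near x),
    or None if it cannot find one; `train_step(model, x, y)` performs one
    ordinary gradient update.
    """
    for _ in range(epochs):
        for x, y in dataset:
            train_step(model, x, y)                # usual update on the clean example
            x_bad = make_distortion(model, x, y)   # current blind spot near x, if any
            if x_bad is not None:
                train_step(model, x_bad, y)        # extra update on the hard copy
    return model
```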
In the end, I can only recommend not reading articles with “sensational” headlines, but finding the primary sources and reading those instead: everything there is much more interesting.