For the second year now, we at DZ Systems have been filming a series of programs on digital transformation. These are usually "pro-business" broadcasts, aimed mostly at top managers and designed to help them understand the business value of what is called digital transformation.
But this year we are also shooting a second "line" of broadcasts, DZ Online Tech, focused on the technological aspects of the same topic. In short, what's "under the hood."
Below is the transcript of the latest such program, in which Ivan Yamshchikov (Yandex, ABBYY, and a genuinely top-class professional) and I talk about the use of neural networks in the modern world.
If you're interested, you can watch the program itself.
And for those who prefer reading, the transcript is below:
- Hello. Our guest today is Ivan Yamshchikov from ABBYY, who will tell us how modern artificial intelligence works.
When it comes to AI, there are, roughly speaking, two positions. There are people who say: "We don't want to understand anything about the essence of what is happening inside the system. We have statistical methods that extract a model from the outside world on their own. That model will be correct; it will capture all the semantic subtleties." And there are people who say: "No, that won't do. We understand what is going on. We must put that understanding into the artificial intelligence system, and then it will be more valuable and better." Does this battle have any criteria to judge it by?

- Let me explain in less philosophical language. There are people who say: "We need stronger, more efficient algorithms and large amounts of data. We'll take a more efficient algorithm, and at larger scale it will give us better quality on the target metric, no matter what." I don't know anyone who says they don't need data or algorithms. So the second group of people, in my view, takes the following approach: "On top of all that, it would be nice to also add human labeling in one form or another, some kind of expert knowledge."
There is a joke at Google that is often quoted: the fewer linguists working on a system, the better the final quality. That joke is probably borne out by the practice of mass B2C services. But when we talk about B2B in the context of narrow product solutions, of a very clearly defined task and a well-defined domain, expert knowledge begins to play a rather important role. Inside ABBYY we combine ontological models built by linguists with pure machine-learning approaches.
- I want to give an example: we did a project for Mosvodokanal. The task was this: Mosvodokanal has a complex network; it works and behaves in some way. And we wanted to understand something about it, ideally to predict accidents, to sense when something wrong is going on.

- You built a monitoring system.

- Yes, we built a kind of behavior-analysis system that was supposed to say: "something is wrong in this corner." We can't actually tell whether it's an accident or just a fluctuation in behavior, because the two are physically indistinguishable...

- I built roughly the same kind of system for monitoring traffic.
- A very similar theme. During that project we argued with engineers who said: "Listen, what you're doing is garbage. You need to measure all the pipes, their external and internal diameters, then add information about the smoothness of the walls. And then compute a hydrodynamic model, and it will show you everything." And we said: "Don't. Give us the data from the sensors, we'll feed it into a statistical model, and that model, knowing nothing about physics, will still work, because it will extract the real behavior." This is exactly the extreme case of what we're talking about. On one side, complete knowledge of the physics of the phenomenon, which we encode directly and semantically; on the other side, complete lack of understanding: we don't really understand how the hydrodynamics works there, and we never even wanted to.

- A certain arrogance is characteristic of people who know statistics well. As Mark Twain said: "There are three kinds of lies: lies, damned lies, and statistics."
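To make the "feed the sensor data into a statistical model" idea concrete, here is a minimal sketch of that kind of anomaly detection. Everything in it is hypothetical: the data is synthetic, and IsolationForest is just one convenient off-the-shelf detector, not what was actually used at Mosvodokanal.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Learn "how the network behaves" from raw readings, with no physics at all.
# Each row stands in for a (pressure, flow) measurement from one sensor.
rng = np.random.default_rng(0)
normal_readings = rng.normal(loc=[3.0, 1.2], scale=[0.2, 0.1], size=(5000, 2))

model = IsolationForest(contamination=0.01, random_state=0)
model.fit(normal_readings)

new_readings = np.array([
    [3.1, 1.25],   # looks like ordinary behavior
    [5.0, 0.20],   # "in this corner something is wrong"
])
print(model.predict(new_readings))   # 1 = normal, -1 = anomaly
```

As in the conversation, the model cannot say whether the flagged reading is an accident or a harmless fluctuation, only that the behavior deviates from what it has seen before.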
- In the end we won the argument for one very simple reason: collecting information about all those pipes is simply impossible. But on the other hand, some depth of knowledge of the subject area never hurts.

- People who carry that knowledge believe it is the truth, because it is their area of expertise. But at the same time, we actually understand much less about natural language, from the computer-science point of view, than we would like, because many terms and categories are defined not mathematically but intuitively. As a result, people who come entirely from computer science have an understandable distrust of people who come from the linguistics side, and vice versa. At ABBYY this is resolved by having both work on the product and answer for different parts of it, so you can measure how much quality one side adds and how much the other. That is done through tests and experiments.
- That is also a big source of trouble. We all know there is a problem of local optimization.

- Of course. That's overfitting. But very often it is precisely the general linguistic approaches that help fight overfitting. Linguists usually try to create some general rule, and then there is a great and beautiful story about the exceptions. Anyone who read Rosenthal's book on the Russian language at school was perplexed: my God, what are these philologists doing? What they call rules is really...
- A set of exceptions.

- But in essence this is exactly the same story as error on a test set. If you look at it from the machine-learning point of view, a large number of linguistic rules cover a fairly large number of examples and leave some error on the test data. If you take those rules and apply them to data your model has never seen, the model will make mistakes in exactly those places. But many linguistic heuristics do let you protect yourself from overfitting.
- Did I hear you correctly: if we take a textbook on the Russian language, feed it into a model, and extrapolate its rules, the model will necessarily make mistakes?

- Of course. Exactly. Any rigid rules will always lead to errors, because, unfortunately or fortunately, artificial intelligence is much more flexible than any set of simple rules.
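A toy illustration of this trade-off (my sketch, not an ABBYY example): an unconstrained model memorizes noisy training data perfectly but stumbles on unseen data, while a model restricted to a few simple, rule-like decisions leaves some training error yet usually generalizes better.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data: 15% of the labels are flipped on purpose.
X, y = make_classification(n_samples=400, n_features=20, n_informative=4,
                           flip_y=0.15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, depth in [("unconstrained tree", None), ("rule-like, depth 3", 3)]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"{name:20s} train={tree.score(X_tr, y_tr):.2f} "
          f"test={tree.score(X_te, y_te):.2f}")
```

The unconstrained tree scores 100% on the training data by memorizing the noise; the shallow, "rule-like" one trades training accuracy for better behavior on data it has never seen.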
- That is also because, when we talk about formalizing the rules of a natural language, we are inevitably taking on an unsolvable task. The depth of that process is endless.

- That is a philosophical question. At the machine level the depth does not appear to be infinite, but there is an interesting article, from 2015 if I remember correctly. A brief digression: there is a branch of mathematics called information theory; in particular, it is used in coding theory. In Russia it was developed by Kolmogorov and his colleagues, in the USA by Shannon. It was originally conceived in the context of cryptography.
In information theory there is a notion of mutual information. To explain it on the fingers: imagine how the meanings of words in a text correlate depending on the distance between them, and imagine a metric for that. If the text says "Petya" here, then n words, and then the word "ate", the words "ate" and "Petya" do in fact correlate, even though "ate" may be quite far from "Petya". If we statistically build a model of these correlations, it turns out that, as a function of distance, this mutual information in texts decreases rather slowly: not exponentially but polynomially, which is slower. Roughly speaking, natural-language texts contain correlations between words that are far apart.
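Here is a minimal sketch of how one could measure this decay on any long text. It estimates the mutual information between symbols at distance d (characters rather than words, so the toy estimate isn't starved for data); corpus.txt is a placeholder for whatever text you feed it.

```python
import math
from collections import Counter

text = open("corpus.txt", encoding="utf-8").read().lower()

def mutual_information(text, d):
    """I(x_t ; x_{t+d}) estimated from symbol co-occurrence counts."""
    pairs = Counter(zip(text, text[d:]))   # joint counts of (x_t, x_{t+d})
    left = Counter(text[:-d])              # marginal counts of x_t
    right = Counter(text[d:])              # marginal counts of x_{t+d}
    n = sum(pairs.values())
    mi = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
    return mi

for d in (1, 2, 4, 8, 16, 32, 64):
    print(f"d={d:3d}  I={mutual_information(text, d):.4f} bits")
```

On real text the printed values shrink with d noticeably more slowly than the exponential decay a simple Markov chain would produce.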
Approximately the same thing is observed in the "texts" of DNA: our nucleotides also correlate over relatively long distances. Complexity theory, among other things, tries to describe systems of this kind. The whole story about the butterfly effect is about the same thing: a small deviation in one place can lead to significant changes far away. Natural language is described by dependencies of this kind. Now, LSTM (Long Short-Term Memory) networks are considered the most advanced, in terms of memory, of the neural networks used for language analysis, precisely so as to catch these correlations between elements that are far from each other. And yet, damn it, their memory decays faster than it should.
This is a big research topic; in particular, we are trying to work on it at the Max Planck Institute. There is an interesting result from graph theory which says that if a network contains cycles, it should have more memory. We know the brain has certain characteristic frequencies, and there are cycles in the brain: a signal runs around them, neurons stimulating each other in a circle at a given frequency. In artificial neural networks we still cannot reproduce this.
- Why can't we? Add cycles! Come on, out with it.

- I'll tell you. How do we train neural networks? By backpropagation of error. Backpropagation means you have a forward pass through the network and a backward one.

- And as soon as there are cycles, you immediately get problems with that error going around in circles?

- Yes! So what do you do? How do you run backpropagation? Friends, invent backpropagation over a cycle, and you will make a powerful breakthrough in the development of artificial intelligence. I tell everyone: do this, it's very cool. It is a genuinely difficult task.
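For context, a sketch of how current practice sidesteps the problem rather than solving it: a recurrent connection is unrolled for a fixed number of time steps, so the computation graph becomes acyclic and ordinary backpropagation applies (backpropagation through time). The numbers here are arbitrary.

```python
import torch

torch.manual_seed(0)
T, dim_in, dim_h = 20, 8, 16
W_xh = torch.randn(dim_in, dim_h, requires_grad=True)
W_hh = torch.randn(dim_h, dim_h, requires_grad=True)  # the "cycle": h feeds back into h
xs = torch.randn(T, dim_in)                           # a toy input sequence

h = torch.zeros(dim_h)
for x in xs:                  # unrolling in time breaks the cycle
    h = torch.tanh(x @ W_xh + h @ W_hh)

loss = h.pow(2).sum()         # a toy loss on the final state
loss.backward()               # gradients flow back through all T steps
print(W_hh.grad.norm())       # the recurrent weight does receive a gradient
```

The unrolled graph has a finite horizon, not a genuinely cyclic dynamic, which is why the loop running at a fixed frequency that Ivan describes has no direct analogue here.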
- And if the people who study the brain figure out how this works in the brain, can it be transplanted? It would seem that the anthropomorphism of what we do today is very low.

- Look at it this way: what do Google's ImageNet models and a mollusk have in common? It turns out, more or less everything. At some point the mollusk was dissected, and it was seen that its visual fields are arranged like modern convolutional networks, if you like. Back in the 1950s, Rosenblatt and his colleagues did the dissecting and invented the perceptron, in many ways by looking at living and very simple things. They thought: now we understand how primitive organisms work, and next we will start building complex ones.
- Why didn't they manage it? Was it believed back then that the perceptron was not viable? Not enough computing power?

- There are a lot of problems. Look at it this way: there have been several AI winters. Each time, people come up with some new breakthrough in artificial intelligence and think: "That's it, tomorrow Jarvis will be my best friend and will talk to me better than my psychoanalyst." And then something happens, just as with Jarvis. I really love that joke from the movie "Iron Man", where everything goes well at first, and then out comes some nonsense. That's what Jarvis answers the main character when he asks whether all systems have been debugged.
- What does this look like in practice? Where do the limits lie, if you look from the applied side?

- First, even the most powerful things we assemble artificially are smaller than our brain by orders of magnitude. And the second point is that we do not understand why these things work. That is a separate large area of research.
- It would seem people are already starting to explain it.

- First we found out that it works; then we began to figure out how. There is a separate line of work on visualizing what a neural network does. There is a separate mathematical formalism, called Information Decomposition, which attempts to describe how information is decomposed into different streams inside the network, in order to understand what is going on in which layers. With images this has started to work out over the last few years. With texts it is more complicated.
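The simplest form of such visualization is just capturing what individual layers output. A small sketch with PyTorch forward hooks; resnet18 is an arbitrary pretrained stand-in, not a network discussed in the conversation.

```python
import torch
import torchvision.models as models

net = models.resnet18(weights="IMAGENET1K_V1").eval()
captured = {}

def grab(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()   # stash this layer's activations
    return hook

net.layer1.register_forward_hook(grab("layer1"))   # early, edge-like features
net.layer4.register_forward_hook(grab("layer4"))   # late, abstract features

x = torch.randn(1, 3, 224, 224)   # stand-in for a real image
net(x)

for name, act in captured.items():
    print(name, tuple(act.shape))  # layer1 -> (1, 64, 56, 56), layer4 -> (1, 512, 7, 7)
```

From the captured tensors one can render feature maps as images, which is where the familiar pictures of "what the network sees" come from.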
Why don't we understand how it works? Because we have no good mathematical result that would explain it all. We have no proven theorem saying that it must work. For example, at the level of a convolutional neural network: you have a picture with a dog drawn on it. The picture has so many pixels, and each pixel has so many possible values, that if you combinatorially try to count the number of pixel combinations that still add up to a dog, you will wear yourself out. In theory, you have a space of very large dimension and an enormous number of possible solutions. Moreover, if you train a convolutional network whose number of parameters is much smaller than the number of potential dog images, you train it in a relatively simple way: it tells you at the output whether it's a dog or not, and you tell it yes or no. Suddenly, after a while, it turns out that it gives very good quality on pictures of dogs it has never seen.
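The counting argument is easy to make concrete. A sketch with an arbitrary small architecture, assuming 64x64 RGB inputs:

```python
import torch.nn as nn

# A small binary "dog / not dog" convolutional net.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 1),   # a single logit: dog vs. not dog
)

n_params = sum(p.numel() for p in net.parameters())
print(f"parameters: {n_params:,}")          # on the order of 10^4

# A 64x64 RGB image with 256 values per channel admits 256**(64*64*3)
# configurations, roughly 10^29592, incomparably more than the parameter
# count, yet such nets generalize to dogs they have never seen.
print(f"input configurations: 256**{64 * 64 * 3}")
```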
- The degree of generalization is unexpectedly high?

- Yes, it is an unexpected degree of generalization. Everyone has come to terms with the fact that it works, everyone applies it everywhere, but a rigorously grounded mathematical result that would explain why such a degree of generalization is possible does not exist. There are several hypotheses, of which one seems the most interesting to me. The point is not what happens in each individual neuron, but how you connect those neurons. The very structure of the network, apparently, lets you achieve a certain generalization at a certain level. It is an interesting hypothesis because, if it is correct, it ties in well with neurophysiology, and then you can go and borrow something else from neurophysiology. There are other conjectures too, but it remains an open question: people are still writing kilograms of papers a month about how this works.
- There is a feeling that Python is the language of AI. Is that a coincidence or not? Why Python, when there are so many languages to choose from?

- Because a rather large part of a data scientist's work nowadays is prototyping. It is convenient to prototype in Python; it was created as a language for prototyping, not as a language for industrial solutions. At ABBYY we have people who prototype in Python, and people who write the final models in C++, and those are what go into production. The Python community is actively riding this wave, and there is positive feedback: there is demand, that is, data science is increasingly done in Python, so the community keeps filling up with people who try to develop the language itself. It is all connected.
- When we talk about prototyping, it implies running a large number of tests and experiments. Here a problem of computational resources arises.

- Computational resources themselves became cheaper, and cloud solutions appeared that made them affordable. To put it bluntly, a student with internet access can, for reasonable money, briefly get a fairly powerful server, run something on it, obtain some model, and bolt AI onto, say, a coffee maker. Many factors came together and reinforce one another. Thanks to the internet, the threshold for getting into programming, and into technology in general, has dropped. A lot of relatively cheap hardware appeared, and it too moved into the cloud: you can buy time rather than hardware. A lot of live data appeared. In the 80s, for example, people doing data science had a fundamental problem: where do you get data? Now, for a heap of applied tasks, it is clear where to get it.
The key elements of machine learning are the algorithm, the data, and the hardware the algorithm runs on. All three have become more accessible. The algorithm became more accessible in the sense that good-quality off-the-shelf solutions appeared, implemented in a language with intuitively simple syntax, a low barrier to entry, and a heap of educational resources.
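To show what "off the shelf" means in practice, here is the whole of a working classifier; scikit-learn is one such boxed solution, and the bundled toy dataset stands in for real data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)   # a toy dataset shipped with the library
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)                  # no custom math anywhere
print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")
```

Nothing here requires understanding the underlying algorithm, which is precisely the low barrier to entry being described.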
- The guys from Microsoft told a story about how a small group took a neural network and the business model of a small, uncomplicated company that delivered bread, and out of sticks and string managed to build a model that optimized the business and added 10% to its efficiency. Are such pictures the exception or the rule?

- More of a rule. In my opinion, Kelly (the famous futurist) has a good lecture about the future of AI, in which he says that in 20 years we will be regarded the way we now regard the pioneers of the internet. We now say: "How easy it was for you to do internet business in the 90s." In 20 years people will regard us the same way and say: "How easy it was for you to do business with AI. You took anything, added AI to it, and became the leader in that category." At least that is Kelly's opinion, and I share it.
- You and I have lived through a fair share of what has happened in this industry, and we have seen the picture where what is now a commodity was once state of the art. Based on that experience, can we advise people who are entering AI technology now where and how they should move?

- I have two pieces of advice that seem reasonable to me. First, don't sit alone in a corner. Find a couple of like-minded people, work together, and show what you do to the wider community. And second, think less about the specific models you are going to use, because they will change and get better. If you are not at the level where you can improve them yourself, you need to know less about exactly how a given model works and why it is better; you need to think more about the problem you are solving.