Machine Learning: State of the art

In 2015, a new word entered the art world: “inceptionism”. Machines learned to redraw paintings, and in 2016 millions of people downloaded Prisma. Today we talk about art, machine learning and artificial intelligence with Ivan Yamshchikov, author of the acclaimed "Neural Defense".


Meet Ivan Yamshchikov. He received a PhD in Applied Mathematics from Brandenburg University of Technology (Cottbus, Germany). He is currently a researcher at the Max Planck Institute (Leipzig, Germany) and an analyst / consultant at Yandex.

- Neurona, Neural Defense and Pianola: how did such a serious passion for creative AI begin? At what point did you decide to take up this topic in earnest?

Ivan Yamshchikov: I would not call it a serious passion for creative artificial intelligence. Alexey Tikhonov simply shared his ideas about a neural poet with me one day, and in the summer of 2016 we decided to record the album “Neural Defense” together. Since then it has become clear that the field is much wider, and now I work on artificial intelligence on an ongoing basis.

This is an incredibly interesting topic, and right now it is in motion. There have been several “winters” in the history of AI, periods of disillusionment after inflated and unjustified expectations. We are now in the third period of intense interest in AI, and it is possible that we will soon face a similar problem again. Despite this, there is a genuine qualitative leap in how many systems perform: machine translation, aggregation systems, autonomous systems.

- The idea of combining mathematical models with music / painting is not new, so why did the approach take off right now?

Ivan Yamshchikov: This is probably one of my favorite topics: when you played shooters in the 90s, you were unwittingly helping to develop AI.

Video cards (GPUs) were improved to serve graphical interfaces and games, but at some point people realized they could be used for parallel computing, and CUDA appeared. At first this was used in science, in fluid dynamics, where a number of models can be computed efficiently on a graphics card, while the same calculation on a CPU would be several times more expensive. A few years later it turned out that neural networks also parallelize well and can be trained on graphics cards. This boom in scientific computing made it possible to build neural networks of a size that was previously out of reach.

Cloud computing also played a role: now you do not even have to buy a GPU, you can rent one, and in the same way you can rent the right amount of CPU. This lowered the barrier to entry, and in technology it is always like this: when the barrier is lower, results appear much faster.

As for painting, the key paper here is "A Neural Algorithm of Artistic Style", written by researchers from Tübingen. Their experiments showed that the features responsible for style (how it is drawn) concentrate on some layers of the neural network, and the semantics (what is depicted) on others. The famous Prisma application grew out of this paper.

We decided to make music because we love literature and poetry. And we chose Yegor Letov because we love him and wanted to try to imitate his style. On the whole, these are purely aesthetic preferences.

In general, working with music is much more practical than working with text. When you work with a dictionary, the representation is based on one-hot encoding: all words are numbered, and the i-th word is a vector with 1 in the i-th position and 0 everywhere else. After processing a set of documents, you end up with a very high dimension. The dimension is then artificially reduced using one of a number of methods, for example word2vec ( https://ru.wikipedia.org/wiki/Word2vec ; https://habrahabr.ru/post/249215/ ).
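
To make the one-hot description concrete, here is a minimal Python sketch. The helper names build_vocab and one_hot and the two toy documents are illustrative assumptions, not something taken from the project discussed in the interview.

```python
def build_vocab(documents):
    """Number every distinct word in order of first appearance."""
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Return a vector with 1 at the word's index and 0 everywhere else."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


# Toy corpus (hypothetical): the vector length equals the vocabulary size.
docs = ["the machine writes poetry", "the poet reads poetry"]
vocab = build_vocab(docs)
poetry_vec = one_hot("poetry", vocab)
```

Even this two-sentence corpus gives six-dimensional vectors; on a real corpus the vector length equals the vocabulary size, which is why the dimension is reduced afterwards.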

One way or another, we are talking about a space of several hundred dimensions, not three or four. A space of that dimension is usually hard to work with: some regions have a high density of data, while others are too sparse, so the structure is very complex. If we are talking about music and take notes, then each note is a combination of an octave and a pitch: 12 notes (counting sharps / flats) in an octave, and 4-5 octaves. From this point of view, the space has a much lower dimension.
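
The arithmetic behind this can be sketched in a few lines of Python. The note spellings, the zero-based relative octave numbering, and the names note_to_index and index_to_note are assumptions made for illustration; the interview does not describe the actual encoding used.

```python
# Twelve pitch classes per octave, counting sharps.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
NUM_OCTAVES = 5  # assumed upper bound of the "4-5 octaves" mentioned above


def note_to_index(name, octave):
    """Map (pitch name, relative octave) to a single integer token."""
    if not 0 <= octave < NUM_OCTAVES:
        raise ValueError("octave out of range")
    return octave * len(NOTE_NAMES) + NOTE_NAMES.index(name)


def index_to_note(index):
    """Inverse mapping: integer token back to (pitch name, relative octave)."""
    octave, pitch = divmod(index, len(NOTE_NAMES))
    return NOTE_NAMES[pitch], octave


# The whole note alphabet has only 12 * 5 = 60 symbols.
ALPHABET_SIZE = NUM_OCTAVES * len(NOTE_NAMES)
```

Sixty symbols is tiny next to a dictionary of words, which is the contrast the answer draws.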

And if you treat the melody as a whole, you can scale the data so that the parameter space is dense, with few gaps. When we experimented with neural networks on different types of data, we found that this property of music lets us see better and faster whether the model being trained works or not, so this was a pragmatic decision.

- Where should a technical person start with creative AI? Are there good resources, courses or lectures on this topic?

Ivan Yamshchikov: Let's eat the elephant one piece at a time. First, neural networks != artificial intelligence. On the other hand, neural networks are one of the most popular topics, and quite a lot of material is available on them. There are plenty of courses and materials on AI and machine learning too. The main Russian-language ones are the joint HSE and Yandex course, Vorontsov's machine learning course, Vetrov's course on Bayesian methods and Lempitsky's deep learning course; the English-language ones include courses on Udacity (including one on TensorFlow) and Coursera.

There are no courses on creative AI as such - the theme itself lies at the intersection of science and art; and most of the questions here, at the junction, are open.

What I really recommend looking at and spending time on is a machine learning course (see above), including one on deep learning.

- Many say that modern machine learning methods just copy pieces of existing works and combine them according to the canons they have found. Was there anything in works created by AI that really surprised you?

Ivan Yamshchikov: In general, this is, of course, reasonable and justified criticism, but I have two counter-arguments to it: technical and philosophical.

Let's start with the technical one. Previously we did not know how to do this, and now we have learned. The fact that we can now technically create such things is already a breakthrough. Perhaps not from the point of view of art history, but from a purely technical point of view it certainly is.

On the philosophical side: a postmodernist does the same. If we live in the era of postmodernism, then virtually any author, in a sense, copies, imitates or is inspired by prior experience. And in general, if we look at the problem of learning (of course, not everything here has a fully formalized mathematical apparatus), it is the transformation of a flow of information into knowledge. Knowledge is information that has been filtered, organized and ranked in a certain way. If you look deeper, the basis of any learning, including human learning, is experience that has been combined and transformed. It may seem that something new has appeared, but in reality it stems from a combination of earlier experience.

As for things that surprised me: I have a favorite line from Neurona, "The God who is always welcome in Iraq". It is a completely unexpected line.

There is a term in psychology: apophenia, the tendency to find patterns where there are none. In this sense, machine creativity today certainly appeals to apophenia: the stronger this trait is in a person, the more interesting machine creativity is to them.

- Continuing the previous question: in the AlphaGo vs. Lee Sedol match, AlphaGo played on the 5th line, a move that no human would have made (and which caused a storm in the Go community). Are there examples in the generated works of something clearly alien to the human style?

Ivan Yamshchikov: A person has access to a huge amount of data: tactile and taste sensations, smells, and much more. This vast experience available to a person, in a sense, determines consciousness. Machines have no experience on this scale and, accordingly, their being is much less diverse and interesting than a person's. As a result, texts written by a machine are radically different from those written by a person.

The fundamental question here is: have we understood the principle by which a person creates texts? The scientific community has no definite and clear answer. Generating discrete sequences, whether text or music, is an open problem right at the frontier of scientific knowledge, and people around the world are struggling with it.

- A number of technical experts name the lack of objective quality criteria as one of the problems in evaluating creative AI: how do people assess the quality of generated music and of pictures drawn by neural networks?

Ivan Yamshchikov: I really like this question, and if you have ideas, come, we will write an article together. I am not kidding.

For now, the working quality criterion is collective evaluation by people. Neural Defense was listened to by 400 thousand people in the first week, and from the distribution of ratings and comments one can assess how much people liked it and how similar it turned out.

In more technical detail, we can consider two cases: supervised and unsupervised learning. In the first case we have answers, labeled by people or verified in advance, which the algorithm tries to match; in the second we do not. If there are answers, then for each specific task you can introduce some metric of similarity to the answer and objectively measure the result. If there are none, it is not at all obvious how to introduce such a metric.
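
For the case where answers exist, here is a minimal sketch of one possible similarity metric in Python: the share of positions where a generated sequence matches the reference. The interview does not name a specific metric, so the function token_accuracy and the toy melodies are assumptions for illustration only.

```python
def token_accuracy(predicted, reference):
    """Share of positions where the prediction matches the reference.

    Dividing by the longer length penalizes outputs that are too short
    or too long.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / max(len(predicted), len(reference))


# Toy example (hypothetical data): three of four notes match.
reference = ["C", "E", "G", "C"]
predicted = ["C", "E", "A", "C"]
score = token_accuracy(predicted, reference)
```

Without a reference sequence there is nothing to compare against, which is exactly the difficulty described for the second case.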

- It is extremely interesting: how exactly were the Neural Defense texts generated? Can you give an intuitive explanation of the mathematics behind the generator?

Ivan Yamshchikov: Yandex employee Yuri Zelenkov developed a series of poetic heuristics that evaluate rhythm and rhyme in Russian. We used a combination of these heuristics and an LSTM network (Long Short Term Memory: https://habrahabr.ru/company/wunderfund/blog/331310/ ) that read a corpus of Russian poetry. It was given <verses, author> pairs, and the corpus was all the Russian poetry we could find, that is, roughly from Pushkin to the present day, including Russian rock and pop. However, even this amount of data was not enough, so we had the machine read each text in random order, so that each poem was read 10 times. This let us significantly increase the amount of data and significantly improve the quality.

Next, we feed an author to the input and say: "Come on, write like this author." And the author we fed in was Yegor Letov. I will tell you more about this at the SmartData 2017 conference, where I will reveal many details.

When we generated English texts for Neurona, we no longer used poetic heuristics. Lesha Tikhonov proposed including the phonetics of words in the latent feature space formed inside the neural network, and the algorithm itself “understood” what can be rhymed and how.

- AI already plays poker and Go, redraws pictures and video, writes music and poems: what next? What is the next unconquered peak for creative AI?

Ivan Yamshchikov: There is already a short film based on a plot created by an RNN. Unfortunately, it is rather mediocre. People still do not know how to "explain" the concept of a plot to a neural network.

But in terms of beautiful applications, everything is limited only by the author’s imagination. Right now the most promising direction, it seems to me, is interaction with the network, that is, creating objects that interact with the viewer / listener.

In a sense, computer games are the art of the future. Inside a game, you live through the story differently, i.e. the experience is individual. Such interactivity in art is the next step.

For example, you listen to music and it sounds like a live concert, and you can interact with the singer / band / music. A simple example: Yandex.Music or Spotify can match rhythm and music to your mood, or select tracks specifically for, say, a workout.

If you recall Live Photos from Apple, they are, in fact, several versions of a frame. Accordingly, one can assume that when a musician records an album, he will in a sense record several versions or variations of a composition within certain limits. Then the track will be able to adapt to the listener's mood, relying on some external data. The analogy is quite simple: if you sit down with friends and play a song on a guitar, the same song will come out differently depending on the mood, but it will still be the same song. I am sure that something similar can now be implemented technologically in music as well.

- One of the most popular topics of discussion now is human + machine pairs; such games are held at chess and Go tournaments. Are there any interesting examples of human + machine pairs in art?

Ivan Yamshchikov: In general, this is already happening: people already create music on computers. From time to time I discuss this topic with, so to speak, skeptics who worry that robots will replace people and who paint apocalyptic scenarios. I try to calm everyone down with the following argument. We usually build a machine to do what we do badly. When we need a machine that digs the ground well, we do not build a giant man with a shovel; we build an excavator, which digs far better than that giant man would.

There is a cognitive trap: when we talk about artificial intelligence, we assume it will resemble human intelligence, that is, like ours, only more of it!

For example, when science fiction writers at the beginning of the 19th century tried to imagine the future, it had a giant airship in it, not an airplane. In general, people easily anticipate quantitative changes but can hardly imagine qualitative ones. It is easy to imagine that everything will be faster, cheaper, bigger (or smaller)... But leaps in technology are predicted poorly. And it seems to me that the same thing is happening now with artificial intelligence.

But a qualitative (and not only quantitative) breakthrough is already happening in applications and work related to understanding what a person wants: what he is looking for, what he is going to buy. And on the basis of text generation we can build fundamentally different ways of communicating with a person. People will be able to use new tools as they become available. For example, a programmer will have smart assistants for writing code; for an artist this could be a paint selection system, and for a composer a system that provides inspiration and helps convey emotions correctly in a work.

If machine learning is as close to your heart as it is to ours, we would like to draw your attention to a number of talks at the upcoming SmartData 2017 conference, which will be held on October 21 in St. Petersburg:

Recommender systems: from matrix factorization to deep learning (Mikhail Kamalov, Epam Systems)
Deep learning, probabilistic programming and metacomputation: the point of intersection (Alexey Potapov, ITMO)
Automatic search for contact information on the Internet (Alexander Sibiryakov, Scrapinghub)
Applied machine learning in e-commerce: scenarios and architectures of pilot and production projects (Alexander Serbul, 1C-Bitrix)
Deep Learning: Recognizing scenes and sights on images (Andrei Boyarov, Mail.ru)

Source: https://habr.com/ru/post/338654/
