
Russian neural network chatbot

I already wrote some time ago about a chatbot built on neural networks. Today I will talk about how I tried to make a full-scale Russian version.



Trainable dialogue systems have recently gained unexpected popularity. Unfortunately, everything done so far in the area of neural network dialogue systems has been done for English. Today we will fill this gap and teach the model to speak Russian.


Method
I decided to start by rejecting word-by-word text generation. It is cool, but not as useful as it seems, and it is especially difficult for Russian with its large number of word forms. Instead, I decided to pick the right answer from a large base. That is, the task is to create a neural network that determines whether a sentence is an appropriate answer given the context of the conversation or not.

Why is that:
- You don't need a big softmax layer to select words, which means we can devote more of the network's capacity to the actual text analysis task.
- The resulting matching model is suitable for different purposes: in theory, you can make the chatbot talk on various special topics simply by loading a new database of texts, without retraining. This is useful in practice.
- You can make a model that works quickly and can actually serve many users simultaneously without multiple GPUs on the server.
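
To make the matching formulation concrete, here is a minimal sketch of how training examples can be built (this is my illustration in Python, not code from the project): the actual next reply in a dialogue is a positive example, and a randomly drawn reply from elsewhere serves as a negative one.

```python
import random

def make_training_pairs(dialogues, neg_per_pos=1):
    """Build (context, response, label) triples from a list of dialogues.
    Each dialogue is a list of consecutive utterances."""
    all_replies = [utt for d in dialogues for utt in d]
    pairs = []
    for dialogue in dialogues:
        for i in range(1, len(dialogue)):
            context, response = dialogue[i - 1], dialogue[i]
            pairs.append((context, response, 1))           # real continuation
            for _ in range(neg_per_pos):                   # random distractor
                pairs.append((context, random.choice(all_replies), 0))
    return pairs

dialogues = [["hello!", "hi.", "how are you?", "I don't know."]]
for context, response, label in make_training_pairs(dialogues):
    print(label, "|", context, "->", response)
```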

What is it for
Generally thematic dialogue systems like this are useful, for example, in online assistants (so the assistant can chat on topics unrelated to its main task), in games, and in a number of other applications.

Why neural networks?
Could we approach the task with a more classical method: load the set of answers into a database and search a full-text index for the previous phrase? You can, but the result is not very good. For example:

H: hello!
K: Greg, Maria, this is Ali...
H: how are you?
K: a complaint about noise came in from the next room, monsieur
H: what is your name?
K: thanks, thank you.
H: how old are you?
K: do you live somewhere nearby...?

A search over a large base of answers returns many results, but their relevance is low, so the dialogues come out rather poor. This is where the neural network should help us: we will use it to separate the good answers from the bad ones.
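
As a toy illustration of that classical baseline (again my sketch, not the project's code): index the previous phrase of every stored pair with TF-IDF and return the replies of the most similar pairs. The real base holds millions of pairs, but the idea is the same.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Each stored pair: a previous phrase (context) and the reply that followed it.
contexts = ["hello!", "how are you?", "what is your name?", "how old are you?"]
replies = ["hi.", "I don't know.", "my name is Unis.", "Fifteen."]

vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(contexts)  # full-text-style index of contexts

def retrieve(query, k=2):
    """Rank stored pairs by context similarity and return their replies."""
    sims = cosine_similarity(vectorizer.transform([query]), index)[0]
    return [replies[i] for i in sims.argsort()[::-1][:k]]

print(retrieve("hello, how are you?"))
# The neural network's job is then to re-score such candidates
# and separate the good answers from the bad ones.
```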

Where to get training data
The most painful question for many. Here and here people used a base of movie subtitles. Such a base also exists for Russian, though it is smaller. But the big problem with this base is that it contains a lot of monologues and assorted garbage, and in general it is hard to tell where one dialogue ends and another begins.

Therefore I decided to go another way and, in addition to the subtitles, collect dialogues from publicly available books. Novice writers and authors of all sorts of fan fiction have created an incredible amount of material, so it would be a sin not to use it. Of course, there is plenty of nonsense in it. In the course of the work I inevitably had to read some of it, and my head was spinning from the long conversations of some Sergei with Sailor Moon (who hasn't had that idea!). But overall it is a better base than subtitles, although gathering it is not so easy and takes time.
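
One plausible heuristic for pulling dialogues out of book texts (a sketch under my own assumptions; the post does not show the actual extraction code): in Russian typography, direct speech usually starts with a dash at the beginning of a line, so consecutive dash-prefixed lines can be grouped into an exchange.

```python
import re

def extract_dialogues(text):
    """Group consecutive dash-prefixed lines into exchanges; any other
    paragraph ends the current exchange."""
    dialogues, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(("—", "–", "-")):
            # Drop the dash and a trailing author's remark,
            # e.g. "— Hello! — said Sergei." -> "Hello!"
            reply = re.split(r"\s[—–-]\s", line.lstrip("—–- "))[0].strip()
            if reply:
                current.append(reply)
        else:
            if len(current) > 1:        # keep only real exchanges
                dialogues.append(current)
            current = []
    if len(current) > 1:
        dialogues.append(current)
    return dialogues

text = "— Hello! — said Sergei.\n— Hi.\nIt was raining outside.\n— How are you?\n— Fine."
print(extract_dialogues(text))  # [['Hello!', 'Hi.'], ['How are you?', 'Fine.']]
```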

Architecture
There is plenty of room for imagination here. I started with simple options and moved on to more complex ones in order to understand the real benefit of the various bells and whistles. In papers, people usually try to look cool and bolt on some new gadget, and since the area is new, the benefit of doing so is not always obvious.

The simplest model takes the word vectors of the context and of the response together and feeds them all into an ordinary fully connected layer. Since answers have different lengths, each is written into a vector of fixed length, with the "extra" positions filled with zeros. This is considered bad practice. We'll see.
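
A sketch of this baseline in PyTorch (the framework and layer sizes here are my choice for illustration; the post doesn't say what was actually used):

```python
import torch
import torch.nn as nn

EMB_DIM, MAX_LEN, HIDDEN = 100, 20, 700   # illustrative sizes

class Model1(nn.Module):
    """Zero-padded context and response word vectors, concatenated and
    pushed through a single fully connected block."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * MAX_LEN * EMB_DIM, HIDDEN),
            nn.ReLU(),
            nn.Linear(HIDDEN, 1),   # "is this reply appropriate?" score
        )

    def forward(self, context_vecs, response_vecs):
        # both inputs: (batch, MAX_LEN, EMB_DIM), padded with zeros
        x = torch.cat([context_vecs, response_vecs], dim=1).flatten(1)
        return torch.sigmoid(self.fc(x))

model = Model1()
ctx = torch.zeros(4, MAX_LEN, EMB_DIM)
rsp = torch.zeros(4, MAX_LEN, EMB_DIM)
print(model(ctx, rsp).shape)   # torch.Size([4, 1])
```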

The second option differs in that, before the context and the response are "mixed", each gets its "own" layer that forms its representation. After that there can be several processing layers; the figure shows two.
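
Sketched the same way, with the same illustrative constants and framework as above (again my assumption, not the post's code):

```python
import torch
import torch.nn as nn

EMB_DIM, MAX_LEN, HIDDEN = 100, 20, 700   # illustrative sizes

class Model2(nn.Module):
    """Context and response each pass through their own representation
    layer before being mixed; two joint layers follow, as in the figure."""
    def __init__(self):
        super().__init__()
        self.ctx_repr = nn.Sequential(nn.Linear(MAX_LEN * EMB_DIM, HIDDEN), nn.ReLU())
        self.rsp_repr = nn.Sequential(nn.Linear(MAX_LEN * EMB_DIM, HIDDEN), nn.ReLU())
        self.joint = nn.Sequential(
            nn.Linear(2 * HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1),
        )

    def forward(self, context_vecs, response_vecs):
        c = self.ctx_repr(context_vecs.flatten(1))   # "own" context layer
        r = self.rsp_repr(response_vecs.flatten(1))  # "own" response layer
        return torch.sigmoid(self.joint(torch.cat([c, r], dim=1)))
```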

The third option encodes the variable-length sequences with a recurrent LSTM encoder. It is much slower and takes longer to train, but it seems like it should work better.
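
A corresponding sketch with LSTM encoders (again illustrative; I take the final hidden state as the sequence representation, one common choice):

```python
import torch
import torch.nn as nn

EMB_DIM, HIDDEN = 100, 700   # illustrative sizes

class Model3(nn.Module):
    """Variable-length context and response are encoded by two LSTMs;
    the final hidden states are compared by a small classifier."""
    def __init__(self):
        super().__init__()
        self.ctx_enc = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True)
        self.rsp_enc = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True)
        self.out = nn.Linear(2 * HIDDEN, 1)

    def forward(self, context_vecs, response_vecs):
        _, (h_ctx, _) = self.ctx_enc(context_vecs)   # final hidden state
        _, (h_rsp, _) = self.rsp_enc(response_vecs)
        return torch.sigmoid(self.out(torch.cat([h_ctx[-1], h_rsp[-1]], dim=1)))

model = Model3()
ctx = torch.zeros(4, 13, EMB_DIM)   # no fixed length required here
rsp = torch.zeros(4, 7, EMB_DIM)
print(model(ctx, rsp).shape)        # torch.Size([4, 1])
```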


Fig. 1. Neural network architecture

Results
Measured by the ability to choose the correct answer from several proposed options (the others taken at random), the best result was shown by model number 2, with five data-analysis layers of 700 neurons each (plus one input-representation layer, also of 700 neurons). To train such a deep network I had to use highway layers, which let information flow more easily from the lower layers to the upper ones, and then wait a few days while the model trained on the GPU. Model number 1 was clearly undertrained, while number 3 took a long time to train and still fell a few percent short of model number 2. Which reminds us that, although the simplest models are not suitable for large amounts of data, one should never multiply entities beyond necessity.
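
For reference, a highway layer looks like this (a standard formulation following Srivastava et al., 2015, not the post's exact code): a learned gate decides how much of the transformed signal versus the raw input to pass through, which is what makes deep stacks easier to train.

```python
import torch
import torch.nn as nn

class Highway(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        h = torch.relu(self.transform(x))   # candidate update
        t = torch.sigmoid(self.gate(x))     # transform gate in (0, 1)
        return t * h + (1 - t) * x          # carry the rest of x unchanged

# Five analysis layers of 700 neurons each, as in the winning model:
stack = nn.Sequential(*[Highway(700) for _ in range(5)])
print(stack(torch.zeros(4, 700)).shape)   # torch.Size([4, 700])
```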

The resulting system has a base of 30 million reply pairs, and the neural network has about 3 million trained parameters.
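
The post does not describe the serving pipeline, but with 30 million pairs you cannot run the network over the whole base for every message; one plausible two-stage scheme (my assumption) is to retrieve candidates cheaply first and let the network score only those:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def answer(user_message, score_fn, vectorizer, index, replies, k=100):
    """Cheap retrieval narrows the reply base down to k candidates,
    then the trained network (wrapped in score_fn) picks the best one."""
    sims = cosine_similarity(vectorizer.transform([user_message]), index)[0]
    candidates = [replies[i] for i in sims.argsort()[::-1][:k]]
    scores = [score_fn(user_message, reply) for reply in candidates]
    return candidates[scores.index(max(scores))]

# Toy usage with a dummy scorer standing in for the neural network:
contexts = ["hello!", "how are you?", "what is your name?"]
replies = ["hi.", "I don't know.", "my name is Unis."]
vectorizer = TfidfVectorizer().fit(contexts)
index = vectorizer.transform(contexts)
print(answer("hello there!", lambda c, r: len(r), vectorizer, index, replies))
```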

What kind of dialogues does this produce?

H: Hi
K: hi.
H: How are you?
K: I don't know.
H: What is your name?
K: my name is Unis.
H: How old are you?
K: Fifteen.
H: Where do you live?
K: At home.
H: do you study at school?
K: No
H: and where do you study?
K: in the Bundeswehr.
H: Do you like music?
K: Yes, I love music.
H: What kind of music do you like?
K: I mainly listen to r&b, funk.
H: Do you believe in god?
K: I still believe in God, old man
H: what is morality?
K: two people together - isn't that moral?
H: what color is the sea?
K: the colors of the ocean after the storm.


Good enough in my opinion. Here is another example:

H: Where does the hippo live?
K: Lived
H: what, he doesn't live anymore?
K: I don't know.
H: then where did he live before?
K: in the park, on Maple Street.
H: where did he go?
K: where did he go?
H: I'm asking you, where did he go?
K: I don't understand why I am in the hospital again.
H: because you are sick
K: I just want to help people.
H: Then tell me where the hippo is!
K: why do you want to know this so much?
H: just answer me
K: Do you want me to help you say goodbye to your relatives?
H: Is that a threat?
K: An observation
H: So where is the hippo?
K: he is in Hawaii, but the raj rules from here
H: Who is the raj?
K: you haven't met him?
H: No, never
K: Once upon a time there lived a husband and a wife, only the husband disappeared every night.

How can you talk to this bot live?
I made an Android application that communicates with the server where the bot runs. The application is raw and still in testing; to install it you first need to join the beta testers group:
plus.google.com/u/0/communities/103302070341792486151

After that you can install it from:
play.google.com/apps/testing/mindy.bot

PS
At the moment the purpose of this application is research: it does not take money and does not show advertising. For now the application uses a simplified model to reduce the load on the server.

PPS
If the model tries to communicate in English, just answer it in Russian, and it will correct its mistake.



Findings
It turned out funny, but the low quality of the training data still shows. To develop the model further it would be useful to collect more real-world dialogues. Nevertheless, the results are encouraging: to obtain fairly reasonable answers, there was no need to manually create any templates or rules for selecting answers.

Source: https://habr.com/ru/post/280268/

