Earlier, in the article "Negotiating a neuro-based interlocutor", I looked at using feed-forward neural networks to build a chatbot. The experiments made it clear that using such networks for generating text is a bad idea. Thanks to Roman_Kh, daiver19 and vladshow, who showed how to change the network and which direction to move in.
The next stage of testing is recurrent LSTM networks. As in the latest experiments with feed-forward networks, the dictionary is built with the Word2Vec tool, with the words distributed uniformly in the vector space. Each word is represented by a vector of length 50.
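As a rough illustration, this is how such a dictionary could be built with the gensim implementation of Word2Vec (the article only says "the Word2Vec tool", so gensim and the parameters below are assumptions):

from gensim.models import Word2Vec

# Every sentence is a list of tokens; the service tags introduced below are
# added to the corpus so that they get vectors of their own.
sentences = [
    ["HELLO", "#GEN#", "#BOS#", "HELLO", "#EOS#"],
    ["GOOD", "DAY", "#GEN#", "#BOS#", "EXCELLENT", "DAY", "#EOS#"],
]

# vector_size=50 matches the word-vector length used in this article
# (older gensim versions call this parameter "size").
w2v = Word2Vec(sentences, vector_size=50, min_count=1)
print(w2v.wv["HELLO"].shape)   # (50,)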
Preparation for sequence generation
Sentence encoding
Recurrent networks can generate sequences, so a suitable encoding method can be used: we feed the network the question sentence word by word and then ask it to generate the answer sentence word by word. In text form, the training base is stored as a set of "Question = Answer" pairs, for example:
1 HELLO = HELLO (2 words)
2 LONG TIME NO SEE = HELLO FRIEND (5 words)
3 GOOD DAY = EXCELLENT DAY (4 words)
4 WHAT DAY IS TODAY = TODAY IS A GREAT DAY (6 words)
5 LET'S BE FRIENDS = LET'S BE FRIENDS (5 words)
6 BE MY FRIEND = GOOD WHEN MANY FRIENDS (7 words)
7 SEE YOU = SEE YOU (4 words)
The following service tags are used to control sequence generation; they are encoded with Word2Vec along with the other words:
#GEN# - end of the question sentence; the answer may now be generated;
#BOS# - start generating the answer;
#EOS# - stop generating the answer.
For training the neural network, two matrices TrainX and TrainY are formed as follows. Each matrix has size N x M x D, where N is the number of sentences in the database (7 in this example), M is the largest number of words in a pair plus 3 (for #GEN#, #BOS#, #EOS#), which is 10 here, and D is the length of the word vector (50).
All sequences are padded to the length of the longest one. In this example the longest is sentence number 6, so every pair is padded to seven words; with the three service tags this gives the ten time steps shown below, and the empty positions at the end are filled with #EOS#:
t            =  0      1      2      3      4      5      6      7      8      9
TrainX[0][t] =  HELLO  #GEN#  #BOS#  HELLO  #EOS#  #EOS#  #EOS#  #EOS#  #EOS#  #EOS#
TrainY[0][t] =  NULL   #BOS#  HELLO  #EOS#  #EOS#  #EOS#  #EOS#  #EOS#  #EOS#  #EOS#

t            =  0      1      2      3      4      5      6      7      8      9
TrainX[1][t] =  LONG   TIME   NO     SEE    #GEN#  #BOS#  HELLO  FRIEND #EOS#  #EOS#
TrainY[1][t] =  NULL   NULL   NULL   NULL   #BOS#  HELLO  FRIEND #EOS#  #EOS#  #EOS#
NULL means the network's output at that time step does not matter; when encoding, these positions are filled with zeros.
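For illustration, here is a minimal sketch of how one "Question = Answer" pair could be turned into a TrainX / TrainY row. The helper name and the gensim-style w2v dictionary are my assumptions; only the layout of the rows comes from the tables above.

import numpy as np

D = 50   # word-vector length
M = 10   # longest pair (7 words) + 3 service tags

def encode_pair(question, answer, w2v):
    # Build one (M, D) input row and one (M, D) target row.
    x_words = question + ["#GEN#", "#BOS#"] + answer + ["#EOS#"]
    y_words = ["NULL"] * len(question) + ["#BOS#"] + answer + ["#EOS#"]
    # Pad both sequences with #EOS# up to M time steps.
    x_words += ["#EOS#"] * (M - len(x_words))
    y_words += ["#EOS#"] * (M - len(y_words))
    x = np.array([w2v.wv[w] for w in x_words])
    y = np.array([np.zeros(D) if w == "NULL" else w2v.wv[w] for w in y_words])
    return x, y

# Example: the first pair from the training base.
# x, y = encode_pair(["HELLO"], ["HELLO"], w2v)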
Network response generation
After training, all the words of the question and then the #GEN# tag are fed to the network word by word. After that, the network is asked to predict the next word from the previously fed one, each prediction being fed back in, until the #EOS# tag appears. Removing the service tags from this sequence gives the answer.
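A sketch of this generation loop, assuming the trained model returns a predicted word vector for every time step and w2v is the Word2Vec dictionary from above; nearest_word and generate_answer are illustrative names, not the author's code.

import numpy as np

def nearest_word(vector, w2v):
    # Map a predicted vector back to the closest word in the dictionary.
    return w2v.wv.similar_by_vector(vector, topn=1)[0][0]

def generate_answer(question_words, model, w2v, max_len=10):
    # Feed the question and the #GEN# tag word by word.
    sequence = [w2v.wv[w] for w in question_words] + [w2v.wv["#GEN#"]]
    answer = []
    for _ in range(max_len):
        x = np.array(sequence)[np.newaxis, :, :]         # shape (1, t, D)
        predicted = model.predict(x, verbose=0)[0, -1]    # last time step
        word = nearest_word(predicted, w2v)
        if word == "#EOS#":                               # stop generating
            break
        if word != "#BOS#":                               # drop service tags
            answer.append(word)
        sequence.append(w2v.wv[word])                     # feed the prediction back
    return answer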
Architecture testing
For these experiments I use Python and the Keras library.
Option 1. Single LSTM cell, 65,450 parameters
The experiment involved:
Layer 1: LSTM with D neurons at the input and 2D at the output;
Layer 2: a feed-forward layer with D neurons (a Keras sketch of this model is shown below).
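The article gives only the layer sizes; below is a minimal Keras sketch of this option, assuming D = 50 (the word-vector length) and a mean-squared-error loss on the predicted word vectors (the loss and optimizer are my assumptions). With these sizes the layers add up to the 65,450 parameters stated above (60,400 for the LSTM and 5,050 for the output layer).

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

D = 50   # word-vector length

model = Sequential()
# Layer 1: LSTM with D inputs and 2D outputs, one output vector per time step.
model.add(LSTM(2 * D, input_shape=(None, D), return_sequences=True))
# Layer 2: feed-forward layer mapping each 2D state back to a D-dimensional word vector.
model.add(TimeDistributed(Dense(D)))
model.compile(loss="mse", optimizer="adam")   # assumed loss and optimizer
model.summary()                               # 60,400 + 5,050 = 65,450 parameters

Training then amounts to model.fit(TrainX, TrainY) on the matrices described above.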
After training on and memorizing 109 sentences, you can get answers like these:
- HELLO
- HELLO
- WHAT IS YOUR NAME?
- MY NAME IS NETWORK
- HELLO, NETWORK
- HELLO
- HELLO, DO YOU KNOW ME?
- I AM JUST LEARNING
- IS YOUR NAME OLEG?
- NICE TO MEET YOU
- IS YOUR NAME NETWORK?
- MY NAME IS NETWORK
- ARE YOU NICE?
- I THINK YOU
- DO YOU LIKE ANYTHING?
- I LIKE TO LISTEN
- WANT TO TALK?
- I WILL BE GLAD OF AN INTERESTING CONVERSATION
- ABOUT WHAT?
- LET'S TALK ABOUT
Option 2. Two LSTM cells, 93,150 parameters
The experiment involved:
Layer 1: LSTM with D neurons at the input and 2D at the output;
Layer 2: LSTM with 2D neurons at the input and D at the output (a Keras sketch is shown below).
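A sketch of this option under the same assumptions as before. The article lists only the two LSTM layers, but the stated 93,150 parameters match two LSTMs plus a Dense output layer of size D (60,400 + 30,200 + 2,550), so that output layer is assumed here.

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

D = 50

model = Sequential()
model.add(LSTM(2 * D, input_shape=(None, D), return_sequences=True))  # D in, 2D out
model.add(LSTM(D, return_sequences=True))                             # 2D in, D out
model.add(TimeDistributed(Dense(D)))                                  # assumed output layer
model.compile(loss="mse", optimizer="adam")
model.summary()   # 60,400 + 30,200 + 2,550 = 93,150 parameters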
We ask the same questions:
- HELLO
- HELLO
- WHAT IS YOUR NAME?
- MY NAME IS NETWORK
- HELLO, NETWORK
- THIS IS A FRIEND
- HELLO, DO YOU KNOW ME?
- I AM JUST LEARNING
- IS YOUR NAME OLEG?
- MY NAME IS
- IS YOUR NAME NETWORK?
- MY NAME IS NETWORK
- ARE YOU NICE?
- I THINK LEARN
- DO YOU LIKE ANYTHING?
- I LIKE TO LISTEN TO MUSIC
- WANT TO TALK?
- I WILL BE GLAD OF AN INTERESTING CONVERSATION
- ABOUT WHAT?
- LET'S TALK ABOUT
Option 3. Three LSTM cells, 63,150 parameters
The experiment involved:
Layer 1: LSTM with D neurons at the input and D at the output;
Layer 2: LSTM with D neurons at the input and D at the output;
Layer 3: LSTM with D neurons at the input and D at the output (a Keras sketch is shown below).
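The same kind of sketch for three equal-width LSTM layers; a Dense output layer of size D is again assumed, which brings the total to the stated 63,150 parameters (3 * 20,200 + 2,550).

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

D = 50

model = Sequential()
model.add(LSTM(D, input_shape=(None, D), return_sequences=True))  # D in, D out
model.add(LSTM(D, return_sequences=True))                          # D in, D out
model.add(LSTM(D, return_sequences=True))                          # D in, D out
model.add(TimeDistributed(Dense(D)))                                # assumed output layer
model.compile(loss="mse", optimizer="adam")
model.summary()   # 3 * 20,200 + 2,550 = 63,150 parameters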
And we get this dialogue:
- HELLO
- HELLO
- WHAT IS YOUR NAME?
- MY NAME IS NETWORK
- HELLO, NETWORK
- THIS IS FOR YOU
- HELLO, DO YOU KNOW ME?
- I AM JUST LEARNING
- IS YOUR NAME OLEG?
- ME TO MEET
- IS YOUR NAME NETWORK?
- MY NAME IS NETWORK
- ARE YOU NICE?
- I THINK IN
- DO YOU LIKE ANYTHING?
- I LIKE TO LISTEN TO MUSIC
- WANT TO TALK?
- I WILL BE GLAD OF AN INTERESTING CONVERSATION
- ABOUT WHAT?
- LET'S BE FRIENDS
Summary
For testing, I specially selected questions that are not in the training database (except the first one) in order to check the "rationality" of the constructed models. It seems to me that recurrent networks work much better: they are not strongly affected by the absence of some words in a question or by the word order (the answers to "What is your name?" and "Is your name Network?" are the same). Of course, this result is still far from "good".
Interestingly, the first of the three models responds to the greeting most adequately: it is not thrown off by its own name appearing in the sentence. However, it still does not really know its name. The second model, on the contrary, responds to a greeting that differs from the training one as badly as can be, but, unlike the first model, it tried to answer the question about its name correctly ("Is your name Oleg?" - "My name is"). Although this implementation does not provide for memorizing the context of the dialogue and previous answers, the choice of conversation topic in the first two models looks more adequate than in the third.
Conclusion: of the entire test set, the first model answers one part of the questions adequately and fails the test spectacularly on the rest, while the other models answer the second part of the questions and cope poorly with the first. It is a pity that one cannot simply assemble a set of neural networks that would answer all the questions of the test set correctly...
Therefore, the next task is to study how the types and number of layers of an ANN affect the quality of its answers, with fixed training and test sets, in order to build a neural network model that passes my test.