Although I managed to deal with classifying the intent, a more difficult task remained - extracting additional parameters from the phrase. I know that this is done with tags. I had once successfully applied
sequence_tagging, but I was not thrilled about keeping a dictionary of word vector representations weighing more than 6 gigabytes.
Attempt zero
I found
an example of a tagger implementation in Keras and, in the best tradition of my experiments, started copying pieces of code from it
without thinking. In the example, the neural network processes the input string as a sequence of characters, without splitting it into words. But further down there is an example using an Embedding layer. And since I had already learned to use hashing_trick, I felt a keen desire to apply that skill.
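For reference, hashing_trick turns a phrase into a list of integer word indices without storing any dictionary at all; a minimal illustration (the phrase is made up):

from keras.preprocessing.text import hashing_trick

# each word is mapped to an index in [1, n] by a hash, so no vocabulary needs to be kept
indices = hashing_trick('please remind me tomorrow to buy stuff', n=1000, hash_function='md5')
print(indices)   # seven integers, always the same ones for the same words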
What I built trained much slower than the classifier. I turned on debugging output in Keras and, thoughtfully watching the slowly appearing lines, noticed the loss value. It was not decreasing much, and at the same time it seemed rather large to me. Accuracy, meanwhile, stayed small. I was too lazy to sit and wait for the result, so I remembered one of Andrew Ng's recommendations - try the neural network on a smaller training set. By looking at how the loss depends on the number of examples, you can judge whether to expect good results.
So I stopped the training, generated a new training set - 10 times smaller than the previous one - and started training again. Almost immediately I got the same loss and the same accuracy. Which means that increasing the number of training examples was not going to make things better.
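In code, that check is roughly the following (build_model, X and Y are hypothetical names for my model constructor and training data):

for n in (len(X) // 100, len(X) // 10, len(X)):
    model = build_model()                          # fresh model each time
    history = model.fit(X[:n], Y[:n], epochs=5, verbose=0)
    print(n, history.history['loss'][-1])          # if loss barely moves as n grows, more data won't help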
I nevertheless waited for training to finish (about an hour, whereas the classifier trained in a few seconds) and decided to try it out. And then I realized I should have copied more: in the case of seq2seq, training and real use require different models. I poked at the code a bit more and decided to stop and think about what to do next.
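For context, this is the standard pattern from the Keras seq2seq examples: the training model and the inference-time encoder and decoder are assembled from the same layers, but are different models (the sizes here are illustrative):

from keras.models import Model
from keras.layers import Input, LSTM, Dense

latent_dim, num_encoder_tokens, num_decoder_tokens = 256, 70, 40   # illustrative sizes

# training model: reads the input sequence and the shifted target sequence (teacher forcing)
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
training_model = Model([encoder_inputs, decoder_inputs], decoder_dense(decoder_outputs))

# inference: a separate encoder plus a step-by-step decoder built from the same layers
encoder_model = Model(encoder_inputs, encoder_states)
state_h_in, state_c_in = Input(shape=(latent_dim,)), Input(shape=(latent_dim,))
dec_out, h, c = decoder_lstm(decoder_inputs, initial_state=[state_h_in, state_c_in])
decoder_model = Model([decoder_inputs, state_h_in, state_c_in],
                      [decoder_dense(dec_out), h, c])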
The choice before me was: take the finished example again, but without my amateur improvisations; take the
ready-made seq2seq library; or go back to the tool I had already worked with - the sequence tagger built on NERModel, only without GloVe this time.
I decided to try all three in the reverse order.
NER model from sequence tagging
The desire to edit the existing code disappeared as soon as I looked inside. So I went another way - pull various classes and methods out of sequence_tagging, take gensim.models.Word2Vec and feed it all together. After an hour of attempts I was able to build the training data sets, but I could not replace the dictionary. I stared at an error coming from somewhere deep inside numpy and gave up on this idea.
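For context, training your own word vectors with gensim instead of loading GloVe looks roughly like this (sentences is a hypothetical list of tokenized training phrases; the parameter names are from the older gensim API):

from gensim.models import Word2Vec

# sentences: e.g. [['remind', 'me', 'tomorrow'], ['what', 'is', 'the', 'weather'], ...]
w2v = Word2Vec(sentences, size=100, window=5, min_count=1, workers=4)
vector = w2v.wv['remind']        # a 100-dimensional vector for the word
w2v.save('word2vec.model')       # far smaller than a multi-gigabyte GloVe dictionary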
I made a
commit, just so the work was not lost.
Seq2Seq
The Seq2Seq documentation describes only how to set it up, not how to use it. I had to find
an example and once again try to adapt it to my needs. Another couple of hours of experiments, and the result: accuracy during training was stubbornly stuck at 0.83, regardless of the size of the training data. So once again I had mixed something up somewhere.
What I did not like about the example was that, first, the training data is manually split into chunks, and second, the embedding is done by hand. In the end I put the Embedding layer and then Seq2Seq into a single Keras model and prepared the data as one large piece.
It turned out beautiful:

model = Sequential()
model.add(Embedding(256, TOKEN_REPRESENTATION_SIZE, input_length=INPUT_SEQUENCE_LENGTH))
model.add(SimpleSeq2Seq(input_dim=TOKEN_REPRESENTATION_SIZE,
                        input_length=INPUT_SEQUENCE_LENGTH,
                        hidden_dim=HIDDEN_LAYER_DIMENSION,
                        output_dim=output_dim,
                        output_length=ANSWER_MAX_TOKEN_LENGTH,
                        depth=1))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
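To actually train such a model, the input would be integer sequences (for example from hashing_trick, kept below the Embedding's input_dim of 256) padded to INPUT_SEQUENCE_LENGTH, and the target a one-hot tag vector for each of the ANSWER_MAX_TOKEN_LENGTH output steps. A rough sketch, with hypothetical phrases and tag_ids names:

import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import hashing_trick
from keras.utils import to_categorical

# phrases: list of training strings; tag_ids: list of tag-id sequences (hypothetical names)
X = pad_sequences([hashing_trick(p, 255) for p in phrases], maxlen=INPUT_SEQUENCE_LENGTH)
Y = np.array([to_categorical(pad_sequences([ids], maxlen=ANSWER_MAX_TOKEN_LENGTH)[0],
                             num_classes=output_dim)
              for ids in tag_ids])
model.fit(X, Y, epochs=10, batch_size=32)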
But beauty did not save the day - the network's behavior did not change.
Another commit, and on to the third option.
Manual seq2seq
At first I honestly copied everything and tried to run it as is. The input is simply the sequence of characters of the original phrase; the output should be a sequence of characters that can be split on spaces to get a list of tags. Accuracy looked good - because the network quickly learned that once it started spelling out a tag, it should write it out to the end without mistakes. But the tags themselves did not match the desired result at all.
I made a small change: the result should not be a sequence of characters but a sequence of tags drawn from a fixed list. Accuracy dropped immediately - because now it became honestly clear that the network was not coping.
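Concretely, the output vocabulary becomes a small fixed tag list instead of all possible characters; a sketch with an illustrative (not complete) tag list:

import numpy as np

TAGS = ['O', 'B-what', 'B-when', 'B-count', 'B-time']   # illustrative subset of the tag list
tag_to_id = {t: i for i, t in enumerate(TAGS)}

def tags_to_onehot(tags, output_len):
    # pad with 'O' up to the fixed output length and one-hot encode
    ids = [tag_to_id[t] for t in tags] + [tag_to_id['O']] * (output_len - len(tags))
    onehot = np.zeros((output_len, len(TAGS)))
    onehot[np.arange(output_len), ids] = 1
    return onehot

def onehot_to_tags(prediction):
    # turn the per-step softmax output back into tag names
    return [TAGS[i] for i in np.argmax(prediction, axis=-1)]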
Nevertheless, I let the training run to the end and looked at what exactly the network produces, because a stable result of 20% probably means something. As it turned out, the network had found a way not to strain itself too much:
please, remind me tomorrow to buy stuff
O
That is, it pretends that the whole phrase is a single word carrying no data (in the sense of data that the classifier has not already eaten). We look at the training data... and indeed, about 20% of the phrases are exactly like that - yes, no, part of ping (all sorts of hello) and part of acknowledge (all sorts of thanks).
Time to throw a spanner into the network's works. I cut the number of yes / no phrases by a factor of 4, ping / acknowledge by a factor of 2, and added a pile of one-word garbage that does carry data. At this point I decided I did not need the tags to be explicitly bound to a class, so, for example, B-makiuchi-count turned into just B-count (with something like the snippet below). The new "garbage" was plain numbers with the B-count class, times like "4:30 AM" with the expected B-time tag, and date indications like "now", "today" and "tomorrow" with the B-when tag.
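Dropping the intent part from a tag is a one-liner; a small sketch (the function name is mine):

def strip_intent(tag):
    # 'B-makiuchi-count' -> 'B-count', 'O' stays 'O'
    if tag == 'O':
        return tag
    prefix, rest = tag.split('-', 1)
    return prefix + '-' + rest.split('-')[-1]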
Still no luck. The network no longer gives the unambiguous answer "O and nothing else", but accuracy stays around 18%, and the answers are completely inadequate.
not yet
expected ['O', 'O']
actual ['O', 'O', 'B-what']

what is the weather outside?
expected ['O', 'O', 'O', 'O', 'O']
actual ['O', 'O', 'B-what']
A dead end, for now.
Interlude - Comprehension
A lack of results is also a result. I now had at least a superficial understanding of what exactly happens when I construct models in Keras. I learned how to save them, load them and train them further as needed. But I had not achieved what I wanted - translating "human" speech into "the language of the bot". And I had no ideas left.
And then I started writing an article. In its original version, the previous article ended right here - I have a classifier, but no tagger. After some deliberation I abandoned that idea and kept only the story of the more or less successful classifier, merely mentioning the problems with the tagger.
The calculation paid off - I got a link to
Rasa NLU. At first glance it looked like exactly the right thing.
Rasa NLU
For several days I did not return to my experiments. Then I sat down and hooked Rasa NLU up to my experimental scripts in an hour and a bit. It would be a stretch to call it difficult.
The make_sample code:

tag_var_re = re.compile(r'data-([a-z-]+)\((.*?)\)|(\S+)')

def make_sample(rs, cls, *args, **kwargs):
    tokens = [cls] + list(args)
    for k, v in kwargs.items():
        tokens.append(k)
        tokens.append(v)
    result = rs.reply('', ' '.join(map(str, tokens))).strip()
    if result == '[ERR: No Reply Matched]':
        raise Exception("failed to generate string for {}".format(tokens))
    cmd, en, rasa_entities = cls, [], []
    for tag, value, just_word in tag_var_re.findall(result):
        if just_word:
            en.append(just_word)
        else:
            _, tag = tag.split('-', maxsplit=1)
            words = value.split()
            start = len(' '.join(en))
            if en:
                start += 1
            en.extend(words)
            end = len(' '.join(en))
            rasa_entities.append({"start": start, "end": end,
                                  "value": value, "entity": tag})
            assert ' '.join(en)[start:end] == value
    return cmd, en, rasa_entities
After this, saving the training data is not difficult:
rasa_examples = []
for e, p, r in zip(en, pa, rasa):
    sample = {"text": ' '.join(e), "intent": p}
    if r:
        sample["entities"] = r
    rasa_examples.append(sample)

with open(os.path.join(data_dir, "rasa_train.js"), "w") as rf:
    json.dump({"rasa_nlu_data": {"common_examples": rasa_examples,
                                 "regex_features": [],
                                 "entity_synonims": []}}, rf)
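A single entry in the resulting rasa_train.js then looks roughly like this (the phrase itself is a made-up illustration):

{
    "text": "remind me to buy milk tomorrow",
    "intent": "remind",
    "entities": [
        {"start": 13, "end": 21, "value": "buy milk", "entity": "action"},
        {"start": 22, "end": 30, "value": "tomorrow", "entity": "when"}
    ]
}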
The hardest part of creating a model is getting the config right.
training_data = load_data(os.path.join(data_dir, "rasa_train.js"))
config = RasaNLUConfig()
config.pipeline = registry.registered_pipeline_templates["spacy_sklearn"]
config.max_training_processes = 4
trainer = Trainer(config)
trainer.train(training_data)
model_dir = trainer.persist(os.path.join(data_dir, "rasa"))
And the hardest part of using it is finding the saved model.
config = RasaNLUConfig()
config.pipeline = registry.registered_pipeline_templates["spacy_sklearn"]
config.max_training_processes = 4
model_dir = glob.glob(data_dir + "/rasa/default/model_*")[0]
interpreter = Interpreter.load(model_dir, config)
parsed = interpreter.parse(line)
result = [parsed['intent_ranking'][0]['name']]
for entity in parsed['entities']:
    result.append(entity['entity'] + ':')
    result.append('"' + entity['value'] + '"')
print(' '.join(result))
please, find me some pictures of japanese warriors
find what: "japanese warriors"
remind me to have a breakfast now, sweetie
remind action: "have a breakfast" when: "now" what: "sweetie"
... although there is still something to work on.
Among the shortcomings: the training process is completely silent. Surely verbose output can be turned on somewhere, but in any case the whole training took only about three minutes. Also, spacy needs a language model to work, though it weighs considerably less than GloVe - for English it is under 300 megabytes. On the other hand, there is no Russian model yet, and the ultimate goal of my experiments is to work with Russian. I will have to look at the other pipelines available in Rasa.
All the code is available
on github.