
Simple intent classifier

Last time I learned how to use neural networks to figure out what the user wants from the bot. What I did then had a number of drawbacks. First, I limited myself to a single kind of phrase. Second, I used the heavyweight nmt to extract the intent from the phrase, while this kind of problem is usually solved with ordinary classifiers.



More convenient training data generation



Last time, I wrote some ad-hoc Python to generate phrases. Even for a single kind of phrase that was too unmaintainable a solution. Now I needed more variety, so writing it in pure Python was no longer appealing, especially when there is a more convenient tool: RiveScript.



In RiveScript I made templates for the different phrases: only the intent and, possibly, some parameters go in, and RiveScript generates the phrase itself.



code
make_sample
tag_var_re = re.compile(r'data-([a-z-]+)\((.*?)\)|(\S+)')

def make_sample(rs, cls, *args, **kwargs):
    tokens = [cls] + list(args)
    for k, v in kwargs.items():
        tokens.append(k)
        tokens.append(v)
    result = rs.reply('', ' '.join(map(str, tokens))).strip()
    if result == '[ERR: No Reply Matched]':
        raise Exception("failed to generate string for {}".format(tokens))
    cmd, en, tags = [cls], [], []
    for tag, value, just_word in tag_var_re.findall(result):
        if just_word:
            en.append(just_word)
            tags.append('O')
        else:
            _, tag = tag.split('-', maxsplit=1)
            words = value.split()
            en.append(words.pop(0))
            tags.append('B-' + tag)
            for word in words:
                en.append(word)
                tags.append('I-' + tag)
            cmd.append(tag + ':')
            cmd.append('"' + value + '"')
    return cmd, en, tags
using
rs = RiveScript(utf8=True)
rs.load_directory(os.path.join(this_dir, 'human_train_1'))
rs.sort_replies()

for c in ('yes', 'no', 'ping'):
    for _ in range(COUNT):
        add_sample(make_sample(rs, c))

to_remind = ['wash hands', 'read books', 'make tea', 'pay bills', 'eat food',
             'buy stuff', 'take a walk', 'do maki-uchi', 'say hello', 'say yes',
             'say no', 'play games']
for _ in range(COUNT):
    r = random.choice(to_remind)
    add_sample(make_sample(rs, 'remind', r))
RiveScript
+ hello
- hello
- hey
- hi

+ ping
- {@hello}{random}|, sweetie{/random}
- {@hello} there
- {random}are |{/random}you {random}here|there{/random}?
- ping
- yo

+ yes
- yes
- yep
- yeah

+ no
- no
- not yet
- nope
+ remind *
@ maybe-please remind-without-please data-remind-action(<star>)

+ remind-without-please *
- remind me to <star>
- remind me data-remind-when({@when}) to <star>

+ when
- today
- later
- tomorrow

+ maybe-please *
- <@> {weight=3}
- please, <@>
- <@>, please


As a result of these tricks, we get something like this:


Input line for generation: remind do maki-uchi

RiveScript output: please, remind me data-remind-when(tomorrow) to data-remind-action(do maki-uchi)

The string "in English": please, remind me tomorrow to do maki-uchi

The bot's line: remind when: "tomorrow" what: "do maki-uchi"

The corresponding tags: O O O B-when O B-action I-action



Although the tags are not needed for classification, they will be needed later for the tagger.



The classifier itself



My main problem last time was complete ignorance of the terminology. Now I already knew some keywords, so I simply typed "classify sentence tensorflow" into a search engine and got a pile of more or less usable material. Even that turned out to be unnecessary, though, because I already had a bookmarked example that suited me almost completely. I especially liked that it needs no separate dictionary, because the model proposed there can build word embeddings directly from the training set.



word embeddings
To be honest, for a long time I did not understand what word embeddings are. In fact, it is just a kind of dictionary in which each word corresponds to a vector of floats, and for "close" words these vectors are close too. Whatever that means.
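To make that a bit more concrete, here is a toy illustration (not from the article, with made-up numbers): an embedding is just a table of word vectors, and "closeness" can be measured, for example, with cosine similarity.

import numpy as np

# Made-up example vectors; real embeddings are learned, not hand-written.
embedding = {
    'tea':    np.array([0.8, 0.1, 0.3]),
    'coffee': np.array([0.7, 0.2, 0.3]),
    'bills':  np.array([-0.5, 0.9, 0.0]),
}

def cosine(a, b):
    # Cosine similarity: close to 1.0 for similar directions, lower otherwise.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embedding['tea'], embedding['coffee']))  # close to 1: "close" words
print(cosine(embedding['tea'], embedding['bills']))   # noticeably smaller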


The network shown in the example requires only one thing: instead of words it must be fed a list of integers. Of course, I could build a list of all the available words and replace each word with its index in that list, but that would not be very interesting. Besides, the example suggested using the one_hot function from keras.preprocessing.text.
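For comparison, here is a small sketch of the two encoding options; the vocabulary and the HASH_SIZE value are illustrative, not taken from the article.

from keras.preprocessing.text import one_hot

HASH_SIZE = 1000  # illustrative hash-space size

# Option 1: an explicit vocabulary, each word replaced by its index.
vocab = {w: i + 1 for i, w in enumerate(['remind', 'me', 'to', 'make', 'tea'])}
print([vocab[w] for w in 'remind me to make tea'.split()])

# Option 2: keras' one_hot, which hashes each word into the range [1, HASH_SIZE).
print(one_hot('remind me to make tea', HASH_SIZE))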



code
classifier itself
def _embed(sentence):
    return one_hot(sentence, HASH_SIZE)


def _make_classifier(input_length, vocab_size, class_count):
    result = Sequential()
    result.add(Embedding(vocab_size, 8, input_length=input_length))
    result.add(Flatten())
    result.add(Dense(class_count, activation='sigmoid'))
    result.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    return result


def _train(model, prep_func, train, validation=None, epochs=10, verbose=2):
    X, y = prep_func(*train)
    validation_data = None if validation is None else prep_func(*validation)
    model.fit(X, y, epochs=epochs, verbose=verbose, shuffle=False,
              validation_data=validation_data)


class Translator:
    def __init__(self, class_count=None, cls=None, lb=None):
        if class_count is None and lb is None and cls is None:
            raise Exception("Class count is not known")
        self.max_length = 32
        self.lb = lb or LabelBinarizer()
        if class_count is None and lb is not None:
            class_count = len(lb.classes_)
        self.classifier = cls or _make_classifier(self.max_length, HASH_SIZE, class_count)

    def _prepare_classifier_data(self, lines, labels):
        X = pad_sequences([_embed(line) for line in lines],
                          padding='post', maxlen=self.max_length)
        y = self.lb.transform(labels)
        return X, y

    def train_classifier(self, lines, labels, validation=None):
        _train(self.classifier, self._prepare_classifier_data, (lines, labels), validation)

    def classifier_eval(self, lines, labels):
        X = pad_sequences([_embed(line) for line in lines],
                          padding='post', maxlen=self.max_length)
        y = self.lb.transform(labels)
        loss, accuracy = self.classifier.evaluate(X, y)
        print(loss, accuracy * 100)

    def classify(self, line):
        res = self._classifier_predict(line)
        if max(res[0]) > 0.1:
            return self.lb.inverse_transform(res)[0]
        else:
            return 'unknown'

    def classify2(self, line):
        res = self._classifier_predict(line)
        print('\n'.join(map(str, zip(self.lb.classes_, res[0]))))
        m = max(res[0])
        c = self.lb.inverse_transform(res)[0]
        if m > 0.05:
            return c
        elif m > 0.02:
            return 'probably ' + c
        else:
            return 'unknown ' + c + '? ' + str(m)
training
def load_sentences(file_name):
    with open(file_name) as fen:
        return [l.strip() for l in fen.readlines()]


def load_labels(file_name):
    with open(file_name) as fpa:
        return [line.strip().split(maxsplit=1)[0] for line in fpa]
sentences = load_sentences(os.path.join(data_dir, "train.en"))
labels = load_labels(os.path.join(data_dir, "train.pa"))
tags = load_sentences(os.path.join(data_dir, "train.tg"))

label_count = len(set(labels))
translator = Translator(label_count)
translator.lb.fit(labels)
translator.train_classifier(sentences, labels)
using
classifier = model_from_json(os.path.join(data_dir, "trained.cls"))
with open(os.path.join(data_dir, "trained.lb"), 'rb') as labels_file:
    lb = pickle.load(labels_file)

translator = Translator(lb=lb, cls=classifier, tagger=tagger)
line = ' '.join(sys.argv)
print(translator.classify2(line))


I composed the first 4 classes of phrases (yes, no, ping and remind), implemented saving and loading, and decided to give it a try. To my surprise, the classifier misclassified even phrases from the training set. So I added an evaluation step to the training script; it reported an accuracy of 98-99%. Then I copied the translation script, but instead of analyzing a phrase passed as an argument, I fed in the validation data again, and got 25%. Exactly as if the neural network were picking one of the four classes at random.
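The sanity check looked roughly like this (a sketch built from the helpers above, not the exact script): load the saved model, then run classifier_eval on the data files instead of classifying a single command-line phrase.

# Hypothetical re-check, reusing load_sentences/load_labels and the loaded
# translator from the listings above: after a fresh start this reported ~25%.
sentences = load_sentences(os.path.join(data_dir, "train.en"))
labels = load_labels(os.path.join(data_dir, "train.pa"))
translator.classifier_eval(sentences, labels)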



The one_hot function came under suspicion. It bothered me that to encode words you only need to know the size of the dictionary, but not its contents. Experiments showed that one_hot produces the same results within a single run of the script, but different results across runs. After several unsuccessful attempts to use something else, I decided to read the documentation more carefully.
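The root of that behaviour is easy to reproduce (a small demonstration, not from the article): one_hot hashes words with Python's built-in hash(), which is salted per interpreter process, so the word-to-integer mapping changes from run to run.

from keras.preprocessing.text import one_hot

# Run this script twice: both lines print different values on each run,
# because string hashing is randomized per process (see PYTHONHASHSEED).
print(hash('remind') % 1000)
print(one_hot('remind me to make tea', 1000))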



As it turned out, that was time well spent.

one_hot

One-hot encodes a text into a list of word indexes of size n.

This is a wrapper to the hashing_trick function.
Nothing here hints at anything, it would seem.
hashing_trick

Converts a text to a sequence of indexes in a fixed-size hashing space.
Again, nothing suspicious, it would seem. But if you scroll a bit further down to the list of arguments...
hash_function: defaults to python hash function, can be 'md5' or any function. ... hash is not a stable hashing function, so it is not consistent across different runs.
I changed one_hot to hashing_trick with md5, but the result did not change: I got the same 25% of correct answers. Using one_hot was certainly a mistake, but not the only one.
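The switch itself is a one-line change to the _embed helper from the classifier listing (a sketch; HASH_SIZE is the same hash-space size used there):

from keras.preprocessing.text import hashing_trick

HASH_SIZE = 1000  # illustrative; the real value comes from the classifier module

def _embed(sentence):
    # 'md5' is a stable hash, so the same word maps to the same index in every run.
    return hashing_trick(sentence, HASH_SIZE, hash_function='md5')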



The next suspect was the saving and loading of the trained neural network. As it turned out, model.to_json and model_from_json work only with the network architecture and neither save nor load the weights. And to save the weights, the h5py package also had to be installed. After correcting this annoying mistake, I finally got results that look like the truth:

$ ./translate4.py 'please, remind me to make some tea'

probably remind
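For reference, the corrected save/load scheme looks roughly like this (a sketch, not the article's exact code; file names are illustrative): the JSON holds only the architecture, while the weights go into a separate HDF5 file, which is where h5py comes in.

from keras.models import model_from_json

def save_classifier(model, arch_path='trained.cls', weights_path='trained.weights'):
    # Architecture only, as JSON text.
    with open(arch_path, 'w') as f:
        f.write(model.to_json())
    # Weights are stored in HDF5, which requires the h5py package.
    model.save_weights(weights_path)

def load_classifier(arch_path='trained.cls', weights_path='trained.weights'):
    # Rebuild the architecture, then restore the trained weights.
    with open(arch_path) as f:
        model = model_from_json(f.read())
    model.load_weights(weights_path)
    return model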




After that, I composed several more classes of phrases, bringing their total number to 10. Counting the different variants, that makes 13: two variants of remind (one action or two) and three variants of find (search by one key phrase, or by two combined with AND or with OR).



Result



I got a simple classifier that trains quickly (a few seconds) and gives good results, much better than using nmt for this. The next step should be a tagger. I could again take the ready-made sequence tagging project, but I really do not want to keep the multi-gigabyte GloVe around. So I am continuing the experiment, trying to make a tagger that suits me. So far without success.



At some point, while fiddling with the tagger, I almost gave up. But then I came across an article about Alice. Just the day before, I had decided to take a break from text analysis and think instead about how the brain ought to work. What I came up with turned out to be a first step toward how it is done in Alice. Besides, it is again about semantic analysis of phrases, and they pulled it off, so there is hope that I can too. But first I will have to figure out how to use bidirectional LSTMs instead of ordinary ones, what the current state of the art is, and so on.
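For completeness, a bidirectional LSTM tagger in Keras could look something like this (a minimal sketch of the general technique, not the article's code; layer sizes are arbitrary):

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, TimeDistributed, Dense

def make_tagger(input_length, vocab_size, tag_count):
    model = Sequential()
    model.add(Embedding(vocab_size, 8, input_length=input_length))
    # The LSTM reads the sentence left-to-right and right-to-left at once.
    model.add(Bidirectional(LSTM(32, return_sequences=True)))
    # One tag prediction per input token.
    model.add(TimeDistributed(Dense(tag_count, activation='softmax')))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
    return model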



All the code for my experiments is available on GitHub.

Source: https://habr.com/ru/post/348224/


