Automating answers to frequently asked questions in an "Alice" skill with the DeepPavlov library

For more than a year, the Laboratory of Neural Systems and Deep Learning at MIPT has been developing DeepPavlov, an open-source library for building dialogue systems. It contains a set of pretrained language-analysis components that can be used to solve business problems effectively.

One such problem is answering frequently asked questions from customers. Hiring staff to do this through a call center, a website widget or a social network is the easy part. The real challenge is to optimize the process so that it runs automatically, with minimal errors, and in a convenient user interface, for example in the voice assistant "Alice" from Yandex.

In this article we explain how to solve the FAQ answering problem effectively with natural language processing and how to integrate the solution into Alice.

Text classification and how to do it
Creating a question-answering skill with the DeepPavlov library
Installing the DeepPavlov library
Running the skill on Alice
Conclusion

Text classification and how to do it

The problem of finding, in a ready-made set of question-answer pairs, the question closest to a given one is solved by algorithms for semantic similarity or text classification.
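
To make the idea of semantic similarity concrete, here is a minimal sketch that matches a query against FAQ questions using TF-IDF weights and cosine similarity. It is only an illustration in plain Python, not DeepPavlov's implementation; the FAQ entries, the tokenizer and all function names are assumptions made for this example.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    return [w.strip('.,?!') for w in text.lower().split() if w.strip('.,?!')]

def build_idf(documents):
    """Smoothed inverse document frequency for every token in the documents."""
    n = len(documents)
    df = Counter(token for doc in documents for token in set(doc))
    return {token: math.log((1 + n) / (1 + count)) + 1 for token, count in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector; tokens that never occur in the FAQ are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def best_match(query, faq):
    """Return (answer, similarity) for the FAQ question closest to the query."""
    questions = [tokenize(q) for q, _ in faq]
    idf = build_idf(questions)
    query_vec = tfidf(tokenize(query), idf)
    scores = [cosine(query_vec, tfidf(q, idf)) for q in questions]
    best = max(range(len(faq)), key=scores.__getitem__)
    return faq[best][1], scores[best]

# Hypothetical FAQ entries, for illustration only.
FAQ = [
    ("When is the exam?", "The exam is on Monday."),
    ("Where is the library?", "The library is in the main building."),
    ("How do I reset my password?", "Use the reset link on the login page."),
]

answer, score = best_match("where can I find the library", FAQ)
print(answer, round(score, 2))
```

The similarity score can double as a confidence value: a low score for every FAQ question suggests that the query is outside the knowledge base.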

There are two ways to solve this problem in production: you can hire an in-house NLP specialist, or you can outsource the solution.

Both options share the same disadvantages: 1) the need to collect data, 2) endless iterations of model training and quality measurement, 3) high requirements for developer qualifications. Even integrating a ready-made language processing solution is not easy, let alone creating one from scratch. Foreign cloud platforms (Google Assistant or Microsoft Cortana) offer comprehensive text classification solutions (DialogFlow, Azure Bot Service), but issues remain with scaling, dependence on paid API services, and Russian language support.

Fortunately, there is an alternative: an open-source software library that greatly simplifies building a FAQ answering solution in Russian and integrating it into a voice assistant.

Creating a question-answering skill with the DeepPavlov library

DeepPavlov is just such a library. It contains a set of pretrained components for language analysis, including text classification components. You can learn more about the various DeepPavlov components in the documentation.

Working with DeepPavlov does not require any special skills from the developer. The library is free and offers ample opportunities for fine-tuning.

All instructions for creating a skill based on a knowledge base can be found in this tutorial. We recommend copying the code from the tutorial into a separate script and launching the skill from that script.

Installing the DeepPavlov library

First, install Python 3.6 and activate the development environment. Then install DeepPavlov.

    source activate py36
    pip install -q deeppavlov

Skill development

A skill in DeepPavlov is an entity that has a unified input and output format regardless of its functionality (text classification, open-domain question answering, etc.). Skills are designed to be assembled into a single stack of a simple dialogue system which, upon receiving a request, takes the answer from the skill with the highest confidence.

Create an object of the SimilarityMatchingSkill class, which responds to the user's request based on a list of frequently asked questions.

    from deeppavlov.contrib.skills.similarity_matching_skill import SimilarityMatchingSkill

    faq = SimilarityMatchingSkill(data_path='http://files.deeppavlov.ai/faq/dataset_ru.csv',
                                  x_col_name='Question',
                                  y_col_name='Answer',
                                  save_load_path='./model',
                                  config_type='tfidf_autofaq',
                                  edit_dict={},
                                  train=True)

An object of the SimilarityMatchingSkill class has the following parameters:

data_path - the path to the csv data file (comma-separated)
x_col_name - the name of the column with questions in the csv file (Question by default)
y_col_name - the name of the column with answers in the csv file (Answer by default)
config_type - the name of the configuration to use for classification (see the list of all configurations)
edit_dict - a dict with parameters to override in the configuration selected by config_type
save_load_path - the path where the trained model is saved (and from which it is loaded)
train - whether to train the model
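
As an illustration of the data layout that data_path, x_col_name and y_col_name describe, the sketch below parses comma-separated FAQ data with Python's standard csv module. The file contents and the load_faq helper are hypothetical and exist only for this example; they are not part of DeepPavlov.

```python
import csv
import io

# Hypothetical file contents; a real file would sit at the location given in data_path.
CSV_TEXT = """Question,Answer
When is the exam?,The exam is on Monday.
"Where is the library, exactly?",In the main building.
"""

def load_faq(text, x_col_name="Question", y_col_name="Answer"):
    """Parse comma-separated FAQ data into (question, answer) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row[x_col_name], row[y_col_name]) for row in reader]

pairs = load_faq(CSV_TEXT)
for question, answer in pairs:
    print(question, "->", answer)
```

Note that a question or answer containing a comma has to be wrapped in double quotes, otherwise the row will be split into too many columns.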

Once the model has been trained, loading it is enough to start using it:

    faq = SimilarityMatchingSkill(save_load_path='./model', train=False)

The SimilarityMatchingSkill class simplifies access to text classification components. If you want to change part of the configuration, you can do so through the edit_dict parameter. An object of the SimilarityMatchingSkill class (like any skill) takes three arguments: a list of sentences to classify, a list of dialogue histories and a list of states (for SimilarityMatchingSkill, the last two may be empty lists).

 faq(['  ?'],[],[])

A typical dialogue system usually contains several skills. To demonstrate how several skills work together, we will create a few skills of the PatternMatchingSkill class.

    from deeppavlov.skills.pattern_matching_skill import PatternMatchingSkill

    hello = PatternMatchingSkill(responses=['', ''], patterns=['', ''])
    bye = PatternMatchingSkill(responses=['', ' '], patterns=['', ' '])
    fallback = PatternMatchingSkill(responses=[' '], default_confidence=0.3)

PatternMatchingSkill is a simple skill class that is triggered when a user request matches one of the elements of the patterns list and responds with a randomly chosen entry from the responses list with confidence default_confidence. You can set the default_confidence parameter manually to prioritize the responses of different skills.
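
The behaviour described above can be imitated with a small stand-in class. This is a sketch under our own assumptions (case-insensitive substring matching, zero confidence when nothing matches, a skill without patterns always answering); ToyPatternSkill is a hypothetical name and not the library class.

```python
import random

class ToyPatternSkill:
    """Toy imitation of a pattern-matching skill (not the DeepPavlov class)."""

    def __init__(self, responses, patterns=None, default_confidence=1.0):
        self.responses = responses
        self.patterns = [p.lower() for p in patterns] if patterns else None
        self.default_confidence = default_confidence

    def __call__(self, utterances, history=None, states=None):
        replies, confidences = [], []
        for utterance in utterances:
            # Assumption: a skill created without patterns matches every request.
            matched = self.patterns is None or any(
                p in utterance.lower() for p in self.patterns)
            replies.append(random.choice(self.responses))
            confidences.append(self.default_confidence if matched else 0.0)
        return replies, confidences

# Hypothetical English phrases, chosen only for this example.
hello = ToyPatternSkill(responses=["Hello!", "Hi there!"],
                        patterns=["hello", "good morning"])
fallback = ToyPatternSkill(responses=["Please rephrase your question."],
                           default_confidence=0.3)

hello_replies, hello_conf = hello(["Hello, Alice", "What time is it?"])
fallback_replies, fallback_conf = fallback(["What time is it?"])
print(hello_conf, fallback_conf)
```

Giving the fallback skill a low default_confidence means it answers only when no other skill is more confident.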

The final step is to combine the skills into an agent and configure the skill selector. HighestConfidenceSelector chooses the response of the skill with the highest confidence.

    from deeppavlov.agents.default_agent.default_agent import DefaultAgent
    from deeppavlov.agents.processors.highest_confidence_selector import HighestConfidenceSelector

    agent = DefaultAgent([hello, bye, faq, fallback], skills_selector=HighestConfidenceSelector())
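
The selection rule itself is simple enough to sketch in a few lines. The code below is an illustration of choosing the most confident skill per request, not the library's selector; the three stand-in skills and their confidence values are invented for the example.

```python
def select_highest_confidence(utterances, skills):
    """For each utterance, return the reply of the most confident skill."""
    # Each skill returns (replies, confidences), one entry per utterance.
    results = [skill(utterances) for skill in skills]
    chosen = []
    for i in range(len(utterances)):
        replies, _ = max(results, key=lambda result: result[1][i])
        chosen.append(replies[i])
    return chosen

def greeting(utterances):
    confidences = [1.0 if "hello" in u.lower() else 0.0 for u in utterances]
    return ["Hello!"] * len(utterances), confidences

def faq_stub(utterances):
    # Pretend classifier: fairly confident about anything that mentions "exam".
    confidences = [0.8 if "exam" in u.lower() else 0.1 for u in utterances]
    return ["The exam is on Monday."] * len(utterances), confidences

def fallback(utterances):
    return ["Please rephrase your question."] * len(utterances), [0.3] * len(utterances)

replies = select_highest_confidence(
    ["Hello", "When is the exam?", "Tell me a joke"],
    [greeting, faq_stub, fallback])
print(replies)
```

In this toy setup the fallback reply wins only for the third request, where the other two skills report a confidence below 0.3.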

Next, start the server, specifying the request path endpoint='/faq' and the connection port port=5000.

    from deeppavlov.utils.alice import start_agent_server

    start_agent_server(agent, host='0.0.0.0', port=5000, endpoint='/faq')

Please note that Yandex.Dialogs requires the Webhook URL to point to a server with an external IP address that is accessible over https. For quick prototyping you can use Ngrok, which creates a tunnel to the DeepPavlov server on your local network. To do this, run

 ngrok http 5000

on the server running DeepPavlov. Two tunnels will be created in response, one for http and one for https. Copy the https tunnel address and append the endpoint /faq to it; the resulting link is the Webhook URL for our Yandex.Dialog.

Running the skill on Alice

To test the interaction with the Yandex.Dialogs platform, go to dialogs.yandex.ru/developer and create a new dialog. Specify a unique name and an activation name. For the Webhook URL, specify the link obtained earlier. Save the changes. To interact with the skill, go to the "Testing" tab.
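
When debugging the webhook, it can help to know roughly what travels over it. The sketch below shows the general shape of the JSON exchange as we understand the Yandex.Dialogs protocol: the user's text arrives in request.command, and the reply goes back in response.text together with the echoed session identifiers and the version. The field layout is an assumption to be checked against the official protocol reference, the sample request is hypothetical, and build_alice_response is our own helper. With DeepPavlov, start_agent_server takes care of this exchange for you.

```python
import json

def build_alice_response(request_body, text):
    """Wrap a reply text in the envelope the platform expects (assumed layout)."""
    session = request_body["session"]
    return {
        "response": {"text": text, "end_session": False},
        # The session identifiers are echoed back from the incoming request.
        "session": {
            "session_id": session["session_id"],
            "message_id": session["message_id"],
            "user_id": session["user_id"],
        },
        "version": request_body["version"],
    }

# Hypothetical incoming request, trimmed to the fields used here.
incoming = json.loads("""
{
  "request": {"command": "when is the exam", "type": "SimpleUtterance"},
  "session": {"new": false, "session_id": "s-1", "message_id": 3, "user_id": "u-1"},
  "version": "1.0"
}
""")

user_text = incoming["request"]["command"]
reply = build_alice_response(incoming, "The exam is on Monday.")
print(json.dumps(reply, ensure_ascii=False, indent=2))
```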

Conclusion

Now you know how to use text classification models from the DeepPavlov library to create a question-answering bot, and how to quickly prototype skills with DeepPavlov and connect them to Alice.

By the way, our library also implements interfaces for connecting to Amazon Alexa and Microsoft Bot Framework.

We welcome feedback in the comments. You can also leave any questions about DeepPavlov on our forum.

Source: https://habr.com/ru/post/445748/
