Alice's B2B skill: from prototype to the first ruble saved

Not long ago, St. Petersburg hosted the second Conversations conference, devoted to conversational AI, where I was fortunate enough to give a talk. The topic was the development of a prototype B2B skill for a large company. The talk described how the skill was made to work with relatively slow web services and the company's closed infrastructure. That is what this article is about.

If you do not know what Alice's skills are, look under the spoiler: it briefly explains what is what.

For the uninitiated

~~What~~ Who Alice is, I think many people know. But just in case: she is a voice assistant from Yandex. Besides what she can do out of the box, developers have a platform at their disposal for extending her functionality: Yandex.Dialogs (also known as Alice's skills).

From the user's point of view, a skill is a special mode of Alice that is invoked by certain activation phrases. In this mode, Alice forwards the user's utterances to a third-party web service and replies with the message that the service returns.
From a technical point of view, a skill is that third-party web service, which must accept requests containing the users' utterances. Its responses may contain text, links, pictures, sounds, and so on.
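The request/response exchange can be sketched as a single function. This is a minimal illustration, not the author's code (the project itself was written in C#; Python is used here for brevity). The field names `request.original_utterance`, `session.new`, `response.text` and `response.end_session` are assumptions based on the public Yandex.Dialogs protocol and should be checked against its documentation; the reply texts are made up.

```python
from typing import Any, Dict


def handle_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Build a reply for one incoming skill request (field names are assumed)."""
    utterance = body["request"].get("original_utterance", "")
    is_new_session = body["session"].get("new", False)

    if is_new_session or not utterance:
        # First message of a conversation: greet the user.
        text = "Hello! Where would you like to go?"
    else:
        # Placeholder logic: a real skill would run its dialog here.
        text = f"You said: {utterance}"

    return {
        "version": body.get("version", "1.0"),
        "session": body["session"],  # echoed back to the platform
        "response": {"text": text, "end_session": False},
    }
```

In a real deployment this function would sit behind an HTTPS endpoint that Alice calls with a JSON body.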

Idea

How did it all start? On March 13, 2018, beta testing of the Yandex.Dialogs platform (Alice's skills) was announced. By then many people were already interested in the virtual assistant, so it was a great opportunity to reach a fairly large audience. The idea of a chat bot had been spinning around in my head for a long time, so I decided it would be interesting to build a skill based on that idea in my spare time. And if it also proved useful at work, all the better.

Our company provides a full range of business travel services, so a skill could help a user set off on a business trip.
At the time I was on the team developing a mobile application in which you can find flight and hotel options for business trips and book the suitable ones. One of the key metrics we were fighting to increase was the number of downloads. The idea arose that if the skill brought users into the application, it would help us raise that figure. This project was meant to test that hypothesis.

For a skill to be useful to anyone at all, it must solve a specific user task. In this case, that task is searching for travel options. That is, the skill must collect information about where the user wants to go and show the results in the mobile application. The user gets the desired options through an engaging voice dialog and continues working in our application, and the developers get the desired growth in their metrics.

So the skill should work like this: greet the user; find his profile by name; ask clarifying questions and thereby obtain the necessary trip parameters: the cities (origin and destination) and the dates. Next, show the recognized parameters. If everything is correct, start the search and give a link to the application.
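The scenario above can be sketched as a small state machine. This is only an illustration under assumptions: the stage names, the `Trip` fields, the prompts and the confirmation words are all hypothetical, and the profile lookup and city/date recognition are omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    ASK_NAME = "ask_name"
    ASK_ORIGIN = "ask_origin"
    ASK_DESTINATION = "ask_destination"
    ASK_DATE = "ask_date"
    CONFIRM = "confirm"
    DONE = "done"


@dataclass
class Trip:
    stage: Stage = Stage.ASK_NAME
    name: Optional[str] = None
    origin: Optional[str] = None
    destination: Optional[str] = None
    date: Optional[str] = None


def step(trip: Trip, utterance: str) -> str:
    """Consume one user utterance, update the trip, return the next prompt."""
    if trip.stage is Stage.ASK_NAME:
        trip.name, trip.stage = utterance, Stage.ASK_ORIGIN
        return f"Hello, {utterance}! Where are you leaving from?"
    if trip.stage is Stage.ASK_ORIGIN:
        trip.origin, trip.stage = utterance, Stage.ASK_DESTINATION
        return "Where are you going?"
    if trip.stage is Stage.ASK_DESTINATION:
        trip.destination, trip.stage = utterance, Stage.ASK_DATE
        return "When do you leave?"
    if trip.stage is Stage.ASK_DATE:
        trip.date, trip.stage = utterance, Stage.CONFIRM
        return f"{trip.origin} to {trip.destination}, {trip.date}. Is that correct?"
    if trip.stage is Stage.CONFIRM:
        if utterance.strip().lower() in {"yes", "correct"}:
            trip.stage = Stage.DONE
            return "Search started. The results will appear in the app."
        # Anything else restarts the collection of trip parameters.
        trip.stage = Stage.ASK_ORIGIN
        return "Let's try again. Where are you leaving from?"
    return "The search has already been started."
```

Keeping the whole dialog in one explicit `stage` field makes it easy to persist between requests, since each request from Alice is handled independently.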

Resources and limitations

To accomplish its task, the skill needs to interact with our internal APIs, and its web service also has to be published somewhere. It could have been hosted at work, but, as already mentioned, the development was done in my free time, so I did not want to depend on any specially allocated company resources. So I had to use what was available by default.

For example, a test server. Developers have enough rights to deploy a web application on it, but it will be reachable only from the company's internal network, because the server is not exposed to the outside. At the same time, it has access to the Internet, which means it can be put to use.

The skill's web service must be reachable from the outside (so that Alice has somewhere to send requests), so it had to be placed on an external hosting service.

For the skill to do its job, it needs a company web service that can search for profiles and cities and is reachable from the outside. The mobile application API fits, although it has its own nuances. You can connect to the API on behalf of only one specific user, which means that the range of searchable profiles is limited. And, most unpleasant of all, the results of a search launched through the API are sent only to that user. However, it has the necessary functionality, so it is possible to work with it.

So, the skill on the external hosting will interact with the API. The API is quite fast, but tests showed that sometimes the response does not arrive within the required 1500 ms (a requirement of the Yandex.Dialogs platform). And to deliver the results to the right user, the search has to be started on his behalf through a search service that is available only on the internal network. The API, unfortunately, does not help with this, which means the request must somehow be passed from the skill directly into the internal infrastructure.
We will solve these problems as they arise.

Stages. Problems and Solutions

To begin with, to implement the described scenario, the skill has to store its state somewhere: the stage, the user's name, the cities and the date. That is not much information, so there is no point in deploying a whole database for it, especially since that would mean too much hassle. The state can be stored in a cache.

The choice fell on Redis. It performed well in response-time tests, and we use it heavily at work, which means that, if successful, the project could easily be moved into the company (spoiler: we did move it). The user ID in the skill (it is specified in the request) can serve as the key, and the value can hold the state data in JSON format. A free Redis instance can be deployed on Heroku, and for some time now it has also been supported in Yandex.Cloud.
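Storing the state as JSON under the user ID might look like the sketch below. To keep it runnable without a server, `InMemoryCache` is a hypothetical stand-in that mimics only the string `get`/`set` calls of a Redis client; the `state:` key prefix and the default state are likewise assumptions.

```python
import json
from typing import Dict, Optional


class InMemoryCache:
    """Stand-in for a Redis client: string get/set only, no expiry."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def load_state(cache: InMemoryCache, user_id: str) -> dict:
    """Return the saved dialog state, or a fresh one for a new user."""
    raw = cache.get(f"state:{user_id}")
    return json.loads(raw) if raw is not None else {"stage": "ask_name"}


def save_state(cache: InMemoryCache, user_id: str, state: dict) -> None:
    """Serialize the dialog state to JSON and store it under the user ID."""
    cache.set(f"state:{user_id}", json.dumps(state, ensure_ascii=False))
```

With a real Redis client, the same two functions would work against the remote cache as long as the client is configured to return strings rather than bytes.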

Now let's take a closer look at the stages of the skill. On the very first launch, the user sees the usual welcome phrase. Next, he must give his full name, which the skill uses to look for a profile.

If the profile is found, the name should be saved, and, since a cache is in use anyway, the rest of the necessary profile information can be put there too. Now, when the client returns to the skill, he will see a personal greeting. If the same person comes in from another device and gives his full name, his profile will also be found in the cache, which avoids a repeated search through the API and saves request-processing time.

Next, the trip parameters are collected. As a user of voice skills, I want to name cities and dates however I like, for example “Peter” and “in a week”. The skill should be able to recognize such phrases in order to pass the full city name to the API and run the search for the right day. Today the skill's web service receives this information already recognized, directly in the request.

But that feature appeared around October 2018, and the skill was developed a little earlier, so Dialogflow was chosen for natural language understanding. It has an excellent annotation system, and from time to time you can go in and train it, indicating what the user meant by one phrase or another.

So, the client names the city and the date in his own way, the skill passes his words to Dialogflow and then sends the recognized city name to the API, from which it receives the required identifier. The chain is long, so once again there is a high probability that the required 1500 ms will not be met.

The obvious way out is caching. The key can be exactly what the user said, and the value can be the city identifier from our system. There may then be several cache entries for one city, for example for the words “Peter” and “St. Petersburg”. But this is not critical as long as the value does not hold too much information. In any case, this approach fills the cache with popular cities that other users have requested, and the cache can also be “warmed up” in advance. As a result, Dialogflow and the API are called less often, which again saves time.
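A minimal sketch of this idea follows, assuming a hypothetical `slow_lookup` callable that stands for the Dialogflow-plus-API chain and a plain dictionary in place of the shared cache. Lowercasing and whitespace trimming of the key is an added assumption, not something the project is known to do.

```python
from typing import Callable, Dict


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share one key."""
    return " ".join(phrase.lower().split())


class CityResolver:
    """Resolve a spoken city name to an ID, caching by what the user said."""

    def __init__(self, slow_lookup: Callable[[str], int]) -> None:
        self._slow_lookup = slow_lookup  # hypothetical Dialogflow + API chain
        self._cache: Dict[str, int] = {}
        self.slow_calls = 0  # counts cache misses, for illustration only

    def warm_up(self, known: Dict[str, int]) -> None:
        """Pre-fill the cache with popular phrases."""
        for phrase, city_id in known.items():
            self._cache[normalize(phrase)] = city_id

    def resolve(self, phrase: str) -> int:
        key = normalize(phrase)
        if key not in self._cache:
            self.slow_calls += 1
            self._cache[key] = self._slow_lookup(phrase)
        return self._cache[key]
```

Several keys may point to the same city ID; since each value is just an identifier, the duplication costs little.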

The most interesting stage is launching the search. All the necessary parameters are available, but for the results to reach the right person, the internal search service has to be triggered somehow. In addition, the search itself takes a long time, and long-running operations are better performed not in the same web service but in a separate application.

It is time to use the company server that is available. An application can be deployed on it that will somehow pick up information from the outside and perform long-running tasks, including launching the search.

A background service is a good fit for such an application.

As the name suggests, this is an application without a UI that starts together with the server and performs scheduled actions or actions triggered by a specific command (message). We usually build such services on the Topshelf framework, and they can receive commands, for example, from a message queue based on the AMQP protocol.

In short, the queue works like this: there is a broker to which senders add messages of a certain type, and there are readers that connect to the broker and receive the messages they need.
A more detailed description can be found, for example, in this article.

A good cloud solution that provides a message queue as a service turned up on the Internet: CloudAMQP. It has a free plan, yet it works reliably. Another argument in its favor is that the service is based on RabbitMQ, which we also use heavily at work.

So, let's look at how the skill works as a whole: the skill's web service interacts with the mobile application API and Dialogflow. The results of those calls are cached in Redis, and the state is stored there as well. After the trip parameters are confirmed, the skill sends the broker a message with all the necessary information. The background service on the test server is connected to the broker, and when a message appears, it starts a search, whose results are sent to the mobile application.
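The hand-off between the skill and the background service can be sketched with the standard library alone. This is an assumption-laden illustration: `queue.Queue` stands in for the AMQP broker, a thread stands in for the background service, and the message fields and the `STOP` sentinel are hypothetical.

```python
import json
import queue
import threading
from typing import List

STOP = "__stop__"  # hypothetical sentinel that tells the worker to exit


def publish_search(broker: queue.Queue, user_id: str, origin_id: int,
                   destination_id: int, date: str) -> None:
    """Skill side: send the confirmed trip parameters as one JSON message."""
    broker.put(json.dumps({
        "user_id": user_id,
        "origin_id": origin_id,
        "destination_id": destination_id,
        "date": date,
    }))


def run_worker(broker: queue.Queue, handled: List[dict]) -> None:
    """Background-service side: block until a message arrives, then handle it."""
    while True:
        raw = broker.get()
        if raw == STOP:
            break
        # A real service would start the internal search here.
        handled.append(json.loads(raw))


def demo() -> List[dict]:
    broker: queue.Queue = queue.Queue()
    handled: List[dict] = []
    worker = threading.Thread(target=run_worker, args=(broker, handled))
    worker.start()
    publish_search(broker, "user-1", 1, 78, "2019-07-20")
    broker.put(STOP)
    worker.join()
    return handled
```

The point of the design is that the worker initiates the connection to the broker, so a service inside a closed network can receive work without being reachable from outside.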

When the client downloads and installs the application, he will find the results among his requests.

At this point the skill's work is done.

Results

What happened next? The skill was shown to several clients to get feedback, and here is what we found out: users are reluctant to switch to a mobile application on their own, no matter how cool it is. Some of them find it easier to call our agent and ask him to look for what they need.

As practice shows, in this particular case users find it more interesting to interact with the voice assistant. The assistant then takes over part of the agent's work, saving the agent some time, and at the same time motivates clients to download the application in order to continue working with the options there.

It turns out that the skill makes it possible to save certain resources and to raise some key metrics; that is, the assumption that the skill would benefit our company was confirmed.

I would like to highlight some of the findings. The obvious one: to stay within 1500 ms, avoid unnecessary requests to web services and cache their results. The same information can be stored under different cache keys; this pays off as soon as at least one person hits a cache entry created by another user. And most importantly: long-running operations are better performed in a separate background service. Besides decoupling that work from the skill, this leaves the skill with fewer multithreading problems, and, if necessary, the service can be deployed inside the company's closed network and pick up messages from the outside.

Instead of an epilogue

Chat bots and skills are often written in JavaScript and Python (judging by the number of GitHub repositories returned for the query "chatbot"). One reason is how easy it is to publish such projects to servers. This project was written in C# on .NET Core. With the classic .NET Framework there are certain difficulties with publishing (it runs mainly on Windows, and so on), but with the advent of .NET Core a lot has changed. For each service or framework mentioned above, there are libraries that fully support this technology. Thanks to this, the skill can potentially run on Linux servers, and certainly on any hosting that supports Docker. If you happen to be looking for a platform, I recommend taking a look at this framework: it is becoming a good alternative for developing chat bots.

Source: https://habr.com/ru/post/460036/
