
How to build a voice bot: speech recognition, speech synthesis and NLP in a few lines of code

We regularly write about voice bots and the automation of incoming and outgoing calls. Delivery confirmations, order handling, greeting a client and auto-responding while the call is being connected to the company: it is all part of the same story. In the comments, people reasonably pointed out that I talk a lot about bots but show very little. That is easy to fix! The S7 Hackathon in Gorky Park is in full swing, 50 teams are building prototypes of all sorts of interesting features, and I have a chance to try to fit into as few lines of code as possible. Minimalism in examples is cool.

How it will work


For the demonstration, I will take the simplest case: chatting about the weather with the help of the well-known NLP engine api.ai, recently acquired by Google. In response to an arbitrary request, this service returns JSON with the result of its "understanding" of that request. If the request looks like a weather question, you can then use openweathermap to get a text description, for example "cloudy". Like it is right now outside the coworking windows. But I hope it will clear up by midday!

The Voximplant platform will take care of renting a phone number, receiving the call, recognizing the caller's speech and synthesizing the answer. One of our key features is JavaScript that runs in our cloud in parallel with the call. And not just runs, but runs in real time. On top of that, this JavaScript can make HTTP requests to other services, so we do not need a backend as such: everything happens in the same cloud that processes the call, so that the user gets the smallest possible delay between speaking and hearing the answer. We are building a bot, not a turn-based strategy game with Asterisk, right?

Step one: get a phone number and answer the incoming call


I have a good introduction to Voximplant, but it is in English. That is fine for our customers around the world, but not great for a tutorial article on Habr, so allow me a brief retelling. After registering, go to the Scenarios section of the admin panel and create a new scenario: this is the JavaScript code that will be executed in the cloud. The simplest scenario answers the call, synthesizes "hello, user" and hangs up. Here is its code:
VoxEngine.addEventListener(AppEvents.CallAlerting, function(e) {
  var inc = e.call;  // the incoming call arrives in the 'e' event object
  inc.answer();      // pick up the call
  inc.addEventListener(CallEvents.Connected, function(e) {  // audio is now connected
    // greet the caller ("Hello, user") with the Russian female voice
    inc.say("Привет, пользователь", Language.RU_RUSSIAN_FEMALE);
    inc.addEventListener(CallEvents.PlaybackFinished, function(e) {
      VoxEngine.terminate();  // stop the JavaScript session and hang up
    });
  });
});

To organize the code and specify when and what to run, we have "applications" and "rules". Go to the Applications section, create a new application and add a rule with the default dot-asterisk mask, which means "for calls to any number". In our example we will use a rented number, so calls will obviously come "to that number"; in the general case, though, a call can also arrive from other telephony providers or from the Web SDK, and rules help route such calls without extra ifs in the scenarios. Finally, assign the created JavaScript scenario to this rule.
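The dot-asterisk mask behaves like a regular expression matched against the dialed number. A minimal sketch of that idea in plain JavaScript (the sample numbers are made up for illustration):

```javascript
// Sketch: a rule mask such as the default ".*" is effectively a regular
// expression tested against the destination number, so it accepts any call.
var mask = new RegExp("^.*$");                  // the default dot-asterisk mask
var matchesRented = mask.test("74957250770");   // a hypothetical rented number
var matchesVirtual = mask.test("1000");         // a hypothetical virtual test number
// both are true, so a single rule catches every incoming call
```

A more specific mask would let you send calls to different numbers into different scenarios without writing ifs inside the scenario code.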

What else do you need to call a number? Right, a number. Numbers are rented in the Numbers section: just buy one. Important: "real numbers" is a toggle. If you switch it, the interface goes into the technical mode of virtual numbers for debugging. A number in "Gotham City" can be bought for 1 cent; calls to such numbers go through a single access number plus an extension.

Having rented a number, go to the "My phone numbers" section in the top menu and attach the created application to the number. That is it, you can call and check. By the way, if the starting balance runs out during your tests, you can message me in a PM and I will top it up. Habr is first of all a community; we should support our own.

Step two: try to understand the caller


A little later I will show how to start recognition from Voximplant JavaScript and get text instead of voice. For now, let's assume the text already exists and we need to "understand" what the user said. To do this, register at api.ai, connect a Google account, go to the "Prebuilt Agents" section and add a brain to the project that can talk about the weather. Well, "talk": it answers simple questions. After that, select the created project in the left menu and click the gear icon next to it. In the project settings window we are interested in the "Client access token"; with it we will be able to send requests. For example, this is how the query "weather in Moscow" is recognized:

curl \
  -H "Authorization: Bearer a42ee31de39c43f8b31a291397473e4b" \
  -H "Content-Type: application/json; charset=utf-8" \
  https://api.api.ai/v1/query \
  --data @- <<EOT
{
  "query": "погода в Москве",
  "lang": "ru",
  "sessionId": "1"
}
EOT

In response you get a rather big JSON, shown below. The most valuable part is the result key: its action field lets you check the topic, and parameters.address.city tells you which city's weather the caller is interested in. Note that this is a very simple demo, and to the question "what is the weather outside the window" you will receive the address "outside the window".

{
  "id": "4e936c5b-8432-48e4-9d82-dd5996d0049d",
  "timestamp": "2017-05-21T05:06:13.051Z",
  "lang": "ru",
  "result": {
    "source": "agent",
    "resolvedQuery": "погода в Москве",
    "speech": "",
    "action": "weather",
    "parameters": {
      "address": {
        "city": "Москва"
      },
      "date-time": "",
      "unit": ""
    },
    "metadata": {
      "inputContexts": [],
      "outputContexts": [],
      "intentName": "weather",
      "intentId": "f1b75ecb-a35f-4a26-88fb-5a8049b92b02",
      "webhookUsed": "false",
      "webhookForSlotFillingUsed": "false",
      "contexts": [
        "weather"
      ]
    },
    "score": 0.95
  },
  "status": {
    "code": 200,
    "errorType": "success"
  },
  "sessionId": "1"
}
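Inside a scenario, picking the useful pieces out of this response can be sketched like so (the object literal below is abridged from the response above and stands in for the parsed HTTP body):

```javascript
// Sketch: extracting the intent and the city from an api.ai /v1/query response.
// In a real scenario 'res' would come from JSON.parse(e.text).
var res = {
  result: {
    action: "weather",
    parameters: { address: { city: "Москва" } },
    score: 0.95
  }
};
var isWeather = !!(res.result && res.result.action === "weather");
var city = (isWeather && res.result.parameters && res.result.parameters.address)
  ? res.result.parameters.address.city
  : null;
// city now holds the name to forward to the weather API, or null
// when the query was not understood as a weather question
```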



Step three: find out the weather on Mars


Having received the city whose weather the caller wants to know (or the information that what the caller said is not about the weather at all), you can fetch the weather itself. There are a million APIs for this; for the demonstration I will use the first one that came up, openweathermap.org, where you can register and get an API key. Note that the key does not start working immediately. An example of a request that returns the weather in Moscow:

curl "http://api.openweathermap.org/data/2.5/weather?q=Moscow&lang=ru&appid=3bb8d31acec1fa4b5c0a23f07169a0fd"

In response we likewise receive JSON, which contains a description field ready to be pronounced. In Moscow it is currently overcast:

{
  "coord": {"lon": 37.62, "lat": 55.75},
  "weather": [
    {"id": 804, "main": "Clouds", "description": "пасмурно", "icon": "04d"}
  ],
  "base": "stations",
  "main": {"temp": 283.9, "pressure": 1010, "humidity": 66, "temp_min": 283.15, "temp_max": 284.15},
  "visibility": 10000,
  "wind": {"speed": 7, "deg": 10},
  "clouds": {"all": 90},
  "dt": 1495359000,
  "sys": {"type": 1, "id": 7323, "message": 0.0024, "country": "RU", "sunrise": 1495328828, "sunset": 1495388778},
  "id": 524901,
  "name": "Moscow",
  "cod": 200
}
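Turning this response into a speakable phrase is a one-liner, with one gotcha worth knowing: main.temp is in Kelvin unless you add units=metric to the request. A small sketch (the object literal is abridged from the response above):

```javascript
// Sketch: building a speakable summary from the OpenWeatherMap response.
// In a real scenario 'w' would come from JSON.parse(e.text).
var w = {
  weather: [{ description: "пасмурно" }],
  main: { temp: 283.9 }
};
var description = w.weather[0].description;          // ready to pronounce
var tempC = Math.round(w.main.temp - 273.15);        // Kelvin -> Celsius: 283.9 K is about 11 °C
var phrase = description + ", " + tempC + " градусов";
```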



The last step: putting it all together


All that is left is to enable streaming recognition in the Voximplant JavaScript scenario (we have already written about it), wait for the user's question, send it to the NLP service, get the city name, then query the weather service, get the weather description and synthesize it into the call. For the user this takes less than a second, and all of it is handled by this code:

require(Modules.ASR);

var mycall, myasr;

VoxEngine.addEventListener(AppEvents.CallAlerting, function (e) {
  mycall = e.call;
  mycall.addEventListener(CallEvents.Connected, handleCallConnected);
  mycall.answer();
});

function handleCallConnected() {
  // greet the caller and invite a weather question
  mycall.say("Привет, пользователь! Задайте вопрос о погоде.", Language.RU_RUSSIAN_FEMALE);
  mycall.addEventListener(CallEvents.PlaybackFinished, function onWelcomeFinished() {
    // stream the caller's audio into the recognizer only after the greeting ends
    mycall.sendMediaTo(myasr);
    mycall.removeEventListener(CallEvents.PlaybackFinished, onWelcomeFinished);
  });
  mycall.addEventListener(CallEvents.Disconnected, VoxEngine.terminate);
  myasr = VoxEngine.createASR({
    lang: ASRLanguage.RUSSIAN_RU
  });
  myasr.addEventListener(ASREvents.Result, function (e) {
    recognitionEnded();
    var userSpeech = e.text;
    Net.httpRequest("https://api.api.ai/v1/query", function (e) {
      var res = JSON.parse(e.text);
      if (!res.result || res.result.action !== "weather") {
        // the NLP engine did not classify this as a weather question
        mycall.say("Это не похоже на вопрос о погоде. Попробуйте ещё раз.", Language.RU_RUSSIAN_FEMALE);
        mycall.addEventListener(CallEvents.PlaybackFinished, firstPlaybackFinished);
      } else if (!res.result.parameters || !res.result.parameters.address || !res.result.parameters.address.city) {
        // weather intent detected, but no city extracted
        mycall.say("Не поняла, для какого города. Попробуйте ещё раз.", Language.RU_RUSSIAN_FEMALE);
        mycall.addEventListener(CallEvents.PlaybackFinished, firstPlaybackFinished);
      } else {
        var city = res.result.parameters.address.city;
        Net.httpRequest("http://api.openweathermap.org/data/2.5/weather?q=" + encodeURIComponent(city) + "&lang=ru&appid=a9fa46a8d49e57dbb4ba2d60cb934782",
          function (e) {
            var weatherDescription = JSON.parse(e.text);
            mycall.say("Погода в городе " + city + ": " + weatherDescription.weather[0].description, Language.RU_RUSSIAN_FEMALE);
            mycall.addEventListener(CallEvents.PlaybackFinished, function () {
              VoxEngine.terminate();
            });
          });
      }
    }, {
      headers: ["Authorization: Bearer 732ae2b69fbf4da3a885714630b47d67",
        "Content-Type: application/json; charset=utf-8"],
      method: "POST",
      // send valid JSON instead of concatenating single-quoted strings
      postData: JSON.stringify({ query: userSpeech, lang: "ru", sessionId: "1" })
    });
  });
}

function recognitionEnded() {
  myasr.stop();
}

function firstPlaybackFinished(e) {
  // remove the one-shot handler and restart the greet-and-listen cycle
  mycall.removeEventListener(CallEvents.PlaybackFinished, firstPlaybackFinished);
  handleCallConnected();
}

Source: https://habr.com/ru/post/329122/

