
This article discusses how to turn an arbitrary natural-language query into real data that your application can work with. Specifically, it covers the REST API of the SpeechMarkup service, which converts an ordinary line of text into JSON listing all the semantic entities it found, each with its specific data.
Yes, this is the technology that underlies every voice assistant and is used in search engines. It lets you interpret a request unambiguously and then return the result to your application as an ordinary data set.
In this article I will explain why you might want this API and walk through a small working example application.
Why do we need it?
Today, user interfaces are becoming ever more minimalist and simple, and for good reason: the simpler the interface, the faster and more comfortable it is to use your service or application.
Instead of offering the user complex forms, where you have to switch between fields, type here, pick something there, and so on, it is easier and more convenient to type a few words into a single field.
Moreover, on Android you can tap the microphone at any moment and simply say the data you don't want to type, or that is inconvenient or slow to type. On iOS the situation with voice input has also improved thanks to the support of Russian in dictation. Nothing prevents you today from bolting voice input onto your application, putting robots into a call center, or even building your own voice assistant for a smart home.

But even if we set speech recognition aside (the state of which, though far from ideal, improves year by year), in many cases replacing forms with a single plain-text input field makes a service more convenient and easier to understand.
The user writes or says, say, "Two tickets, Petersburg to Moscow, tomorrow morning," and your service immediately shows suitable flights! Or "Saturday at 6 pm, football," and the event lands in the calendar! Or "Mikhalych, come to work early tomorrow morning," and an SMS goes out to the right contact, or a task is assigned in the task tracker (or, better, both).
But it's not that simple...
So we have received text from the user (or from some speech recognition system); what do we do with it next? Right: we just need to pull out the data our service needs, and that's it! For example, the date and time of a flight plus the departure and arrival cities, or a date-time plus the reminder text.
Sounds simple, but it turns out to be quite hard.
Given that this is natural language, with its inherent features such as morphology, arbitrary word order, recognition errors, and so on, the task of correctly interpreting even a short sentence of 5-10 words becomes genuinely difficult.
A date, for instance, can be given absolutely or relatively: "the day after tomorrow" or "in two days", "December 2nd" or "Saturday". The same goes for time, and numbers can be written as digits or spelled out in words! Cities have synonyms (Petersburg, St. Petersburg, Leningrad) and can be written with or without a hyphen (New York). Understanding that a substring is a person's full name, or that two adjacent surnames refer to different people, is harder still...
Do you want to solve this with regexps? Or dive into the intricacies of NLP, mathematics, AI theory, and the like? I don't. All I need is to pull a couple of pieces of data out of the line, the ones my application's logic requires.
What to do?
Solving exactly this problem is what an API like SpeechMarkup is for.
It does not perform speech recognition itself. It takes an ordinary line of text as input and turns it into JSON in which all the recognized entities are listed in a consistent format. Say, "In five minutes" becomes "18:15", "On Saturday" becomes "11/15/2014", and so on.
Or better, here is an example response.
{
  "string": " ",
  "tokens": [
    {
      "type": "Date",
      "substring": " ",
      "formatted": "17.11.2014",
      "value": {"day": 17, "month": 10, "year": 2014}
    },
    {
      "type": "Person",
      "substring": " ",
      "formatted": " ",
      "value": {"firstName": "", "surName": ""}
    },
    {"type": "Text", "substring": "", "value": ""},
    {
      "type": "City",
      "substring": "",
      "value": [{"lat": 59.93863, "lon": 30.31413, "population": 5028000, "countryCode": "RU", "timezone": "Europe/Moscow", "id": "498817", "name": "-"}]
    },
    {"type": "Text", "substring": "", "value": ""},
    {"type": "Number", "substring": " ", "value": 52},
    {"type": "Text", "substring": "", "value": ""}
  ]
}
As you can see, SpeechMarkup "marks up" the source text with the data it can find and returns the tokens in the same order in which they appear in the text.
In other words, our application can send a line and get back plain JSON in which each entity has its own type and a fixed format, independent of the language of the original request! As the SpeechMarkup REST API documentation explains, the entities currently supported are dates, times, numbers, cities, and people's names. Everything else is marked as plain text.
Custom entities
The service appeared only recently, but it plans to give its users the ability to create their own entities along with the logic for converting them into data of the required format.
It is important to note that SpeechMarkup does not work with the context of the request. Interpreting the data extracted from the text is the job of the specific service. That is, if your service is not interested in, say, person names, it can ignore their markup and treat them as an ordinary string if it needs to. A simple example shows how this works.
Simple sample application

As an example of using the API, let's take a demo project that implements a reminder service. Of course, an application on any platform, written in any programming language, can use the REST API, since all you need to do is send an HTTP request with the text and a few parameters and get JSON back. This example uses JavaScript.
So what does our test reminder service do? It saves reminders. All the user has to do is enter text, which is then interpreted; if it contains all the required data, it becomes a reminder. If the text mentions a person's name, that name is additionally highlighted in the list item.
You can try it yourself and look at the examples. Let's look at the part of the JavaScript code that sends the request text and receives the answer, from which it constructs a list item with the date, time, and reminder text.
Sending text with parameters
$('#form').bind('submit', function(event) {
  event.preventDefault();
  var val = $.trim(text.val());
  if (val) {
    var date = new Date();
    $.ajax({
      url: 'http://markup.dusi.mobi/api/text',
      type: 'GET',
      data: {text: val, timestamp: date.getTime(), offset: date.getTimezoneOffset()},
      success: onResult
    });
  }
  return false;
});
It's simple: when the user submits the form, we take the value of the text field and send it by GET to
http://markup.dusi.mobi/api/text
Two additional parameters are needed so that SpeechMarkup's server side can convert dates and times correctly: timestamp, the client's current date-time in milliseconds, and offset, the UTC time offset in minutes. It is important to pass them, because otherwise SpeechMarkup cannot tell what the client means by, for example, "in 5 minutes".
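If you are not using jQuery, the same request parameters can be assembled by hand. Here is a rough sketch; the helper name is mine, and only the parameter names text, timestamp, and offset come from the snippet above:

```javascript
// Build the query string for the request shown above. Taking the Date
// as an argument (rather than calling new Date() inside) keeps the
// function easy to test against a fixed moment in time.
function buildMarkupQuery(text, date) {
  var params = {
    text: text,
    timestamp: date.getTime(),         // client's current time in milliseconds
    offset: date.getTimezoneOffset()   // UTC offset in minutes
  };
  return Object.keys(params)
    .map(function (key) { return key + '=' + encodeURIComponent(params[key]); })
    .join('&');
}
```

The resulting string can then be appended after `?` to the endpoint URL in any HTTP client.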
And here is the code that handles the response.
function onResult(data) {
  var resp = JSON.parse(data);
  var item = createItem(resp);
  if (!item.text) {
    warning$.text('Remind about what?');
  } else if (!item.time) {
    warning$.text('At what time?');
  } else {
    warning$.empty();
    if (!item.date) {
      item.datetime = moment();
      if (item.time.value.hour < item.datetime.hour()) {
        if (!item.time.value.part && item.time.value.hour < 12 &&
            item.time.value.hour + 12 > item.datetime.hour()) {
          item.time.value.hour += 12;
        } else {
          item.datetime.add(1, 'd');
        }
      }
      item.datetime.hour(item.time.value.hour).minute(item.time.value.minute);
    } else {
      item.datetime = moment([item.date.value.year, item.date.value.month,
                              item.date.value.day, item.time.value.hour,
                              item.time.value.minute]);
    }
    items.push(item);
    appendItem(item, items.length - 1);
    text.val('');
  }
}
Since we work with dates and times, it is convenient to use the Moment.js library. There is a bit more code here, but it is also simple. Most importantly, it does not operate on the text itself and does not parse it, but works with data that SpeechMarkup has already prepared.
This code tries to construct a reminder from the available data. If no text or time was specified, it says so; and if everything is present except the date, it figures out, from the given time, which date the reminder should be created for.
At the beginning of the method you saw a call to createItem, which assembles a convenient object from the response. Here is its code.
function createItem(resp) {
  var tokens = resp.tokens;
  var item = {text: tokens.length > 0 ? '' : resp.string};
  for (var i = 0; i < tokens.length; i++) {
    var token = tokens[i];
    switch (token.type) {
      case 'Person':
        item.text = $.trim(item.text + ' ' +
          '<span class="label label-warning">' + token.substring + '</span>');
        break;
      case 'Date':
        item.date ? item.text = $.trim(item.text + ' ' + token.substring)
                  : item.date = token;
        break;
      case 'Time':
        item.time ? item.text = $.trim(item.text + ' ' + token.substring)
                  : item.time = token;
        break;
      default:
        item.text = $.trim(item.text + ' ' + token.substring);
    }
  }
  return item;
}
This is the part that parses the JSON returned by the server and adds each entity either to the reminder text or to the date or time.
To fully understand what a token or substring is, let's look at the SpeechMarkup API in a bit more detail.
The SpeechMarkup API
As we have already seen, SpeechMarkup takes a line and a few additional parameters as input and returns JSON with the source string (the string field) and an array of the entities it found (the tokens field). If the array is empty, no specific entities were found and everything is plain text (remember that SpeechMarkup works with a fixed set of entities, which you will soon be able to extend with your own).
Each token is an object that specifies the entity's type (the type field), the part of the string it refers to (substring), and the transformed, language-independent final value (value). For the Text type, this field simply contains the substring itself.
There may also be an optional formatted field with a compact rendering of the data. For example, a date is written in the format "DD.MM.YYYY", a time as "HH:mm:ss", and a Person as "LastName FirstName".
Each entity type has its own format for the value field. For dates, it is an object with day, month, and year fields; for times, hour, minute, and second.
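Assuming the month convention visible in the sample response above (month 10 alongside the formatted date 17.11.2014, i.e. zero-based, matching the JavaScript Date constructor), a Date token and a Time token could be combined into a native Date like this (illustrative helper, not part of the API):

```javascript
// Combine a Date token and an optional Time token from a SpeechMarkup
// response into a native JS Date. Assumes the zero-based month seen in
// the sample response (10 = November), which matches new Date(...).
function tokensToDate(dateToken, timeToken) {
  var d = dateToken.value;                      // {day, month, year}
  var t = timeToken ? timeToken.value : {};     // {hour, minute, second} or {}
  return new Date(d.year, d.month, d.day,
                  t.hour || 0, t.minute || 0, t.second || 0);
}
```

With a real token stream you would pass the first Date and first Time token found, much as createItem does later in the article.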
For cities, value is not an object but an array (since many cities share the same name). Each city entry contains coordinates, population, country code, and a canonical name.
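Since a City token may list several candidates, the service's client has to choose one. A simple heuristic (my own, not prescribed by the API) is to take the most populous candidate, using the population field seen in the sample response:

```javascript
// Pick one city from a City token's candidate array by choosing the
// most populous entry. (Heuristic sketch; the API itself does not
// rank candidates for you.)
function pickCity(cityToken) {
  return cityToken.value.reduce(function (best, city) {
    return city.population > best.population ? city : best;
  });
}
```

A real service might instead rank candidates by the user's country code or proximity to their coordinates.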
An entity of the Person type has the fields firstName, surName, and patrName, some of which may be missing if the user specified, for example, only a first name.
With this data you can walk through the tokens in order (they appear exactly in the order they occur in the original text) and, depending on each entity's type and value, apply whatever logic you need.
In our example, if time occurs several times in the text, all occurrences but the first are added back into the reminder text. The same goes for dates. If there is a name in the text, it is additionally highlighted.
To sum up: SpeechMarkup offers a free API for marking up entities in natural-language queries, which lets your application interpret speech as easily as plain text input. Over time, API users will also be able to create their own entities and their own conversion logic, making it possible to handle more specialized queries.
Here are some links to help you learn more about the project and keep up with new features:
SpeechMarkup project website
Documentation on GitHub
Google+ developer community