
Artificial Intelligence for Android with an Open API


Today, only the laziest person hasn't heard of the Siri voice assistant. When this product was shown two years ago at the presentation of the new iPhone 4S, many took a fresh look at where the IT industry was heading. Indeed, no one had yet shown artificial intelligence in your pocket that understands natural speech.

At the time, many began saying that at the next WWDC Apple might give all iOS programmers access to an open Siri API for their own programs. The picture looked rosy: any application would be able to respond to user phrases by executing various commands. Indeed, if the App Store has so many useful applications, why not let users control them by voice? Moreover, speech as a way of communicating with the user quickly became a trend after the release of the iPhone 4S.

Read on to find out whether Apple managed to do this, and what we managed to do ourselves.


Time passed, and a Siri API never appeared



It should be noted that most people confuse plain speech recognition with the actual capabilities of an assistant as artificial intelligence. There is a huge difference between the two concepts: speech-to-text solutions have been on the market for a long time (in Android OS, for example, one is available to everyone), but no one has yet succeeded in creating an open technology for an interactive dialogue system (with context support, meaning extraction, and so on). Few people also thought about how many problems would arise if many programs shared access to a single AI brain center in the form of Siri, or about the entirely new technologies programmers would have to deal with.

The idea of creating a voice assistant with an open, publicly accessible "artificial intelligence" API was already in our heads at that time, and we decided to implement it.

Assistant in Russian



Our small group of enterprising developers took on the project now known as "Assistant in Russian".

It is worth noting that creating such a voice platform requires knowledge in specialized areas such as speech recognition (ASR) and speech synthesis (TTS), as well as NLP, which extracts meaning from the user's speech and manages the context of the dialogue. The NLP component is the connecting link of any artificial intelligence system: it allows not only turning speech into text, but also understanding what the user wants. This is what distinguishes speech recognition technology from artificial intelligence technology.

Our goal was to make an accessible tool for using these technologies.

By launch time, the application could solve a user's everyday tasks by voice, and users of Android Jelly Bean could execute voice commands without an internet connection.

An open artificial intelligence API


From day one, every service in "Assistant in Russian" was built on the same platform that we planned to open to everyone in the future. In English this principle is called "eating your own dog food". This way, we could design the voice architecture and the assistant's own functionality at the same time.

The result of our work was an application with an open API and a "hybrid" NLP technology which, on the one hand, makes it possible to program a voice interface without any servers, using only your device and the Android SDK, and, on the other hand, to offload part of the processing to the cloud when necessary. For example, your contacts are never sent to any servers (hello, Siri), while the list of all the cities that, say, the Weather service supports is not stored on the client.

All of the assistant's services were created by different programmers, some of whom have no special knowledge of ASR, TTS, or NLP. Even so, using our "Assistant" API presented no particular difficulty, since we had set ourselves the task of making the platform open, accessible, and understandable to everyone.

"Assistant in Russian" uses the interprocess communication (IPC) mechanisms of Android OS, so the assistant itself acts as a voice interface between the user and your own application. At the same time, your application can display its own GUI inside the assistant's interface; RemoteViews and other similar techniques are used for this.
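To make this concrete, here is a minimal sketch (jumping ahead to the Request and Response API described under "The basics" below) of how an agent might hand a GUI card to the assistant. Whether setContent() accepts a RemoteViews instance directly is our assumption, and the layout and view IDs are hypothetical:

 // Inside an agent's command handler; a sketch, not the actual API.
 // Assumption: Response.setContent() can carry a RemoteViews object.
 // R.layout.weather_card and its child views are hypothetical.
 RemoteViews card = new RemoteViews(getPackageName(), R.layout.weather_card);
 card.setTextViewText(R.id.city, "Москва");
 card.setTextViewText(R.id.temperature, "+21 °C");

 Response response = request.createResponse();
 response.setContent(card);
 request.addResponse(response);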

What the API can do


Using the "Assistant in Russian" API, you can create far more interesting scenarios, where the assistant's functionality goes beyond the device it runs on. For example, the third-party application "AssistantConnect" uses our assistant's API to provide voice control over various "smart home" and home theater devices.



At the same time, AssistantConnect is a regular Android application that sends requests over HTTP to the XBMC home theater software and Z-Wave commands to the Vera smart home controller.
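To give an idea of what such an add-on boils down to, here is a minimal sketch of sending a play/pause command to XBMC over its JSON-RPC HTTP interface. The host and port are placeholders, and this is our own illustration, not AssistantConnect's actual code:

 import java.io.OutputStream;
 import java.net.HttpURLConnection;
 import java.net.URL;

 public final class XbmcClient {
     // Hypothetical address of the XBMC machine on the local network.
     private static final String ENDPOINT = "http://192.168.1.10:8080/jsonrpc";

     // Toggles playback of player 1 via the standard XBMC JSON-RPC call.
     // On Android this must be run off the main thread.
     public static void playPause() throws Exception {
         String body = "{\"jsonrpc\":\"2.0\",\"method\":\"Player.PlayPause\","
                 + "\"params\":{\"playerid\":1},\"id\":1}";
         HttpURLConnection conn =
                 (HttpURLConnection) new URL(ENDPOINT).openConnection();
         conn.setRequestMethod("POST");
         conn.setRequestProperty("Content-Type", "application/json");
         conn.setDoOutput(true);
         OutputStream out = conn.getOutputStream();
         out.write(body.getBytes("UTF-8"));
         out.close();
         conn.getResponseCode(); // fire the request; the reply is ignored here
         conn.disconnect();
     }
 }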

You can also see how the same add-on can be used to control, for example, an ordinary browser. All this demonstrates the capabilities of the assistant's API, which lets you build a new kind of communication with users.

How to get the API


You can try the API in your own projects right now by downloading it from our website. Here we give only a brief description of how to use it. In the following articles we will go into the technical details of the implementation of the whole "Assistant in Russian" platform, as well as the nuances of using the API itself.

This article is the very first step in publishing the assistant's API. Much will change in the near future: we plan to provide more features, including a catalog of add-ons through which users can find all the applications in the Play Store that support voice control, as well as a commercial SDK for creating your own voice assistants.

The basics


To use the library with the assistant's API in your application, you will not need to learn any new programming languages or technologies. All you need is the Android SDK and an IDE for development; we suggest using Android Studio. The libraries are connected simply by specifying the dependencies in your build.gradle file:

 repositories {
     maven {
         url 'http://voiceassistant.mobi/m2/repository'
     }
 }

 dependencies {
     compile 'mobi.voiceassistant:base:0.1.0-SNAPSHOT'
     compile 'mobi.voiceassistant:client:0.1.0-SNAPSHOT'
 }


The API lets you establish a connection between your application and "Assistant in Russian" in such a way that all user phrases relevant to your application are redirected to a special service that you implement. We call these services Agents.

Agents and modules


The assistant extracts all the necessary data from the text of the phrase in advance and provides it to the agent as a semantic parse tree, a Token. This is made possible by the special grammars (Modules) that you define for your service.

A module is a set of commands with phrase patterns (Patterns) to which your agent should respond (the pattern syntax is described in detail in the API documentation). The agent may at any time limit the set of such modules available to the user, thereby forming the context of the dialogue. Here is an example of the simplest module:

 <?xml version="1.0" encoding="utf-8"?>
 <module xmlns:android="http://schemas.android.com/apk/res/android">

     <pattern name="UserName" value="*"/>

     <command android:id="@+id/cmd_hello">
         <pattern value="* привет *"/>
     </command>

     <command android:id="@+id/cmd_name">
         <pattern value="* меня зовут $UserName"/>
     </command>

 </module>


A module is simply an XML file stored in your application's xml resource directory. The example above shows a simple module with two commands and very simple patterns.
As you can see, the module contains no control code: all the code is described in your agent's class. This reflects the basic principle of our approach to the voice API: the declarative part describing the grammar of the dialogue is separated from the control code that implements the processing logic, and is completely language-independent.

An agent is, in fact, a layer on top of a regular Android service. It implements the interface between the assistant and the logic of your application.

 public class HelloAgent extends AssistantAgent {
     @Override
     protected void onCommand(Request request) {
         switch (request.getDispatchId()) {
             case R.id.cmd_hello:
                 onHello(request);
                 break;
             case R.id.cmd_name:
                 onName(request);
                 break;
         }
     }
     ...
 }


This is a simple example of how an agent can handle the commands described earlier in the module. The AssistantAgent abstraction provides many different methods for processing commands, managing the dialogue context, invoking third-party activities, and so on.

A Request contains all the necessary information about the user's query: the command identifier, the content of the request (a token or something else), the session, and so on. For any request, the agent should form a Response containing the content of the reply and, if necessary, instructions to the assistant about switching the dialogue context.

 request.addQuickResponse("Привет!");


This is an example of forming a quick response in one line. And here is a slightly more complicated example:

 Response response = request.createResponse();
 response.setContent(getString(R.string.hello_say_name));
 response.enterModalQuestionScope(R.xml.name);
 request.addResponse(response);


Here the response, in addition to content in the form of a string (other content types can be passed as well, for example a GUI), also carries information about a change of the dialogue context. The user will now only have access to commands from the R.xml.name module, and after the assistant voices the agent's reply the microphone turns on automatically; this is called "modal mode".
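For completeness, here is what a handler for the user's answer might look like. The Token accessors below (getToken(), findToken(), getSourceText()) are hypothetical names used only for illustration; the actual methods for extracting data from phrases are described in the API documentation:

 // Inside HelloAgent: a sketch of handling the reply from the R.xml.name
 // module. The Token access methods shown here are hypothetical.
 private void onName(Request request) {
     Token token = request.getToken();           // the semantic parse tree
     Token name = token.findToken("UserName");   // the $UserName subtree
     request.addQuickResponse("Приятно познакомиться, "
             + name.getSourceText() + "!");
 }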

Each agent is a service, and therefore it must be declared in the application manifest, AndroidManifest.xml:

 <service android:name=".HelloAgent">
     <intent-filter>
         <action android:name="mobi.voiceassistant.intent.action.COMMAND"/>
         <data android:scheme="assist" android:host="mobi.voiceassistant.ru"/>
     </intent-filter>
     <meta-data
         android:name="mobi.voiceassistant.MODULE"
         android:resource="@xml/hello"/>
 </service>


Here you specify the agent's main module and the package of the "Assistant in Russian" build that the agent can work with.

After you build your application and install it on a device, "Assistant in Russian" will pick up the information from your manifest and load the module. It will then redirect relevant user requests to your agent whenever the assistant's NLP engine decides that a phrase is the best match for the patterns of the module's commands.

To be continued


In this post we very briefly covered the basics of using our API and the basic principles of working with it. Of course, the assistant library provides many more advanced features: remote and fuzzy patterns, RemoteViews, dynamically changing response content, extracting data from phrases, and more. All of this is described in the documentation, which we will keep expanding as the library itself improves.

We suggest you try voice control in your own projects, join the developer community, and help improve this tool.

Source: https://habr.com/ru/post/202132/

