
Kettle and voice assistants. The beginning of a great friendship



What does the voice assistant (GA, as we will abbreviate it below) world look like today? A known fact: each of the large IT companies has its own platform for working with smart homes, and each vendor provides its own API for those interested in integration. At the initial stage a vendor may even pay developers for new skills (actions, capabilities, etc., depending on the vendor's terminology).

The most convenient and practical service to date, according to our experts, is Amazon Alexa. It offers more options for detailed skill definition than Google Assistant, Yandex's "Alice", Mail.Ru's "Marusia", Tinkoff's "Oleg" and others. For Alexa, a device is a parametric virtual entity, so skills can be customized for each device individually. For example, in addition to water temperature, you can specify consumables that the assistant will offer to buy on Amazon. Unfortunately, Alexa does not currently support the Russian language and does not operate in Russia, so this GA is useless for a Russian user. With Google and Yandex, the assistant is more "natural": it receives and responds to commands in "human" language and can hold a dialogue with the user, which makes these GAs more pleasant to use. The one serious drawback of Google used to be that its Actions did not support Russian; however, since July 24, 2019, Google Actions work on phones in Russian, so our colleagues have eliminated that drawback.
So far, so good. But what if we want to integrate one device with several GAs?

It's possible. Through the device itself.

A device is an entity with its own behavior in the system; this principle is common to all vendors. And here it is worth pausing, because this is where the fun begins: the differences lie in the approaches. Google and Yandex, for example, are trying to standardize appliance control. That is, code no longer has to be written for each individual device; one program for the whole product series is enough. And even if the firmware changes, the code has to be changed only once, which is very convenient. Our company already has integrations with Google, Yandex, and Amazon: our appliances listen to Alice, Alexa, and the Google Assistant. Earlier we showed what is inside voice assistants.

Where did the voice assistants come from?


One of the most advanced speech recognition systems in the world belongs to Google; its history goes back to 2002. The company released Voice Search, on whose basis the Google Assistant was later developed. The Assistant was presented at Google I/O in 2016. Google Home is one of the "surfaces" on which the Google Assistant runs. Today the speech recognition accuracy of this GA is estimated at 95%, nearly on par with humans.

The Alexa voice assistant was introduced by Amazon in 2014, together with the Amazon Echo smart speaker, which can control a large number of devices within a smart home.

Yandex SpeechKit is Yandex's speech recognition system, used in 400+ applications. The company also embeds its GA, Alice, in browsers and electronic devices. The Russian company introduced its GA in 2017, and already in the fall of 2018 Yandex launched its Yandex.Station smart speaker.

Our experts say that by the hundred and fifty-sixth year ...


We are joking; so far only by 2020. A bit of statistics:

  1. In 2017, approximately 33 million voice-controlled devices were registered worldwide;
  2. Western experts named voice search one of the top 3 SEO trends of 2017;
  3. As of 2018, Google Assistant works on 400 million devices worldwide, and this figure keeps growing;
  4. According to the Global Web Index, 25% of people aged 16 to 24 use voice search on mobile devices;
  5. According to Comscore forecasts, by 2020, 50% of search queries will be made by voice;
  6. According to 2018 research by Walker Sands, every fifth owner of an Amazon smart speaker had made a purchase with it, and a third planned to do so within the next year;
  7. According to a PwC study, 71% of users who search the web would rather use voice than typing.

As you can see, GA usage is on the rise, which suggests it is time to pick a vendor and launch an assistant of your own. For us, the key feature is the ability to control smart devices, which is what will distinguish SkyFriend from other assistants.

And let's integrate!


Our task, however, is to work with the existing vendor approaches and adapt them to our own appliance control protocol. We follow the path of standardization and practical application, treating a device as a set of skills: every kettle can boil water (a skill), heat water to a desired temperature (a skill), maintain that temperature for a given time, and so on. The "On/Off" command, for example, is standard for any device; the task is to translate that command from the vendor's service into our protocol. What is special about our protocol? It connects different voice assistants (three now, eventually all the major ones) and lets all of them work with the devices, including simultaneously. The communication is one-to-many. The only question is how exactly we adapt our protocol to all these approaches.
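The device-as-a-set-of-skills idea can be sketched in a few lines. This is a minimal illustration, not our actual implementation; the class names, the kettle model, and the one-byte payloads are all made up for the example.

```python
# Sketch: a device modeled as a set of named skills, each of which
# turns command parameters into a payload of the device protocol.
# All names and payload formats here are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Skill:
    name: str
    execute: Callable[[dict], bytes]  # params -> protocol payload

class Device:
    def __init__(self, model: str):
        self.model = model
        self.skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def handle(self, command: str, params: dict) -> bytes:
        if command not in self.skills:
            raise ValueError(f"{self.model} does not support '{command}'")
        return self.skills[command].execute(params)

# A kettle exposes on/off and temperature setting as separate skills.
kettle = Device("demo-kettle")
kettle.register(Skill("on_off",
                      lambda p: b"\x01" if p.get("state") == "on" else b"\x00"))
kettle.register(Skill("set_temperature",
                      lambda p: bytes([p["celsius"]])))

payload = kettle.handle("on_off", {"state": "on"})  # -> b"\x01"
```

Adding a new capability then means registering one more skill, not touching the dispatch logic.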

Let's see. Separate projects for each GA would look like this:


When new assistants appear on the market, we would have to increase the staff and the amount of work proportionally, so it is logical that we rejected this option. However, despite the different approaches of the voice assistants, they do have something in common: what each of them fundamentally works with is a skill (action, capability; the names differ, but the essence is the same). So the task is to develop our own "skill" that the assistants will understand. In the future we will only need to add new vendors, which solves the scaling problem. We also keep in mind that a significant share of our appliances uses BLE as a transport, which dictates certain architectural features.

We have developed two microservices that work as a pair.



The first is the command layer. Its task is to perform the conversion (mapping) between a vendor API and our protocol. It works like this: a vendor-specific assistant request is mapped to our skill, and the skill is mapped to the device protocol. With this approach it is easy to add new skills: the mapping is performed down to the final R4S (Ready for Sky) protocol, and the result is handed over to the second service. The last step may be skipped when a command is transmitted over Wi-Fi.
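The two-step mapping (vendor request, canonical skill, protocol command) can be pictured as two lookup tables. The vendor command identifiers below are only illustrative, loosely following each vendor's public naming, and the protocol opcodes are invented for the example:

```python
# Sketch of the command layer: each vendor-specific request is mapped
# to a canonical skill name, then to an opcode of the internal protocol.
# Both tables below are illustrative, not real R4S values.
VENDOR_TO_SKILL = {
    ("google", "action.devices.commands.OnOff"): "on_off",
    ("yandex", "devices.capabilities.on_off"): "on_off",
    ("alexa", "Alexa.PowerController.TurnOn"): "on_off",
}

SKILL_TO_PROTOCOL = {
    "on_off": 0x03,           # hypothetical opcode
    "set_temperature": 0x05,  # hypothetical opcode
}

def map_command(vendor: str, vendor_command: str) -> int:
    """Translate a vendor-specific command into a protocol opcode."""
    skill = VENDOR_TO_SKILL[(vendor, vendor_command)]
    return SKILL_TO_PROTOCOL[skill]

opcode = map_command("google", "action.devices.commands.OnOff")  # -> 3
```

Supporting a new vendor then amounts to adding rows to the first table; the second table and the devices stay untouched.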

The second service or transport layer is used for:

  1. Establishing a session with a client gateway;
  2. Raising and maintaining a Bluetooth connection;
  3. Receiving commands from and passing responses back to the first service.

This service is part of a higher-level entity: a BT device plus an intermediary gateway, working on the principle of receiving commands over the Internet and forwarding them over BT. Wireless connections can be unreliable. Why? The radio channel can be constrained by the environment: thick concrete walls and the like. As a result, devices can simply drop off, so maintaining a stable connection becomes an important task for the transport layer.
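The "keep the flaky radio link alive" part of the transport layer boils down to reconnecting with backoff before every send. Here is a minimal sketch under stated assumptions: the `link` object and its `connect`/`is_up`/`send` interface are hypothetical stand-ins for a real BLE stack.

```python
# Illustrative transport-layer sketch: retry a BLE connection with a
# simple backoff before transmitting. All names here are made up.
import time

class BleTransport:
    def __init__(self, link, max_retries: int = 3):
        self.link = link              # object with connect()/is_up()/send()
        self.max_retries = max_retries

    def ensure_connected(self) -> None:
        attempt = 0
        while not self.link.is_up():
            if attempt >= self.max_retries:
                raise ConnectionError("device unreachable over BLE")
            self.link.connect()
            attempt += 1
            time.sleep(0.05 * attempt)  # back off a little more each time

    def send_command(self, payload: bytes) -> None:
        self.ensure_connected()
        self.link.send(payload)

class DemoLink:
    """Stand-in link that fails twice before connecting successfully."""
    def __init__(self):
        self.failures_left, self.up, self.sent = 2, False, []
    def is_up(self): return self.up
    def connect(self):
        if self.failures_left:
            self.failures_left -= 1
        else:
            self.up = True
    def send(self, payload): self.sent.append(payload)

link = DemoLink()
BleTransport(link).send_command(b"\x03\x01")  # succeeds on the third attempt
```

A production version would of course also watch for mid-session disconnects, which is exactly the "maintaining a stable connection" task described above.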



Connection policy can be different:

1. Continuous communication support.

Pros: minimum delay in executing GA commands.
Cons: expensive in traffic and power consumption; there is a limit on the number of simultaneously connected devices (six in this generation, Bluetooth 4.0/4.2; up to twenty in Bluetooth 5.0). It also requires additional server resources.

2. Connection on demand.

Pros: connecting on demand requires almost no traffic or battery charge.
Cons: a high delay in command execution, and the execution itself is not guaranteed (the connection may drop or fail to establish). With this approach we may not fit into the time the GA waits for an answer: the session simply times out, and that is the end of it.

A question also remains: once a command has been received and executed, what to do with the connection, drop it or keep it up? Note that Apple HomeKit works in exactly the same way with BLE end devices (via an Apple TV or iPad as a gateway). It looks like this: the first attempt to send a command takes quite a long time (noticeable to the user, let's say), but subsequent commands are executed almost instantly. After the user is done with the device, the operating system tears the session down after a reasonable amount of time, and then the process starts over from scratch.
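That "connect on demand, then linger until an idle timeout" policy can be sketched as follows. This is a toy model, not HomeKit's or our real code; the callback-based interface and the 30-second timeout are assumptions for the example.

```python
# Sketch of an "on demand, then linger" policy: the first command pays
# the connection cost, later ones reuse the link until an idle timeout,
# after which the session is torn down and the cycle starts over.
import time

class LingeringConnection:
    def __init__(self, connect, disconnect, idle_timeout: float = 30.0):
        self.connect = connect          # callback that opens the link (slow)
        self.disconnect = disconnect    # callback that tears the link down
        self.idle_timeout = idle_timeout
        self.connected = False
        self.last_used = 0.0

    def send(self, payload: bytes, transmit) -> None:
        now = time.monotonic()
        if self.connected and now - self.last_used > self.idle_timeout:
            self.disconnect()           # idle too long: start fresh
            self.connected = False
        if not self.connected:
            self.connect()              # slow first command
            self.connected = True
        transmit(payload)               # fast subsequent commands
        self.last_used = now

events = []
conn = LingeringConnection(connect=lambda: events.append("connect"),
                           disconnect=lambda: events.append("disconnect"))
conn.send(b"\x01", transmit=lambda p: events.append("tx"))
conn.send(b"\x00", transmit=lambda p: events.append("tx"))
# events is ["connect", "tx", "tx"]: only the first command paid the cost
```

Picking the idle timeout is exactly the trade-off between the two policies above: a long timeout drifts toward "continuous support", a short one toward "on demand".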

However, that is not all.


Difficulty 1. Gateway routing.



If there are several gateways in a room, the questions arise: which one should a device connect to, and which gateway is currently holding which device? Right now everything works on a first-come-first-served basis: whoever manages to connect, connects. The result is not always good, because the nearest gateway (and therefore the one capable of the most reliable connection) may be busy in the given time slot. Then whichever gateway is free and capable connects, without any regard for link quality. It is therefore important to build a hierarchy and a scheme of work that makes the user as comfortable as possible.
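One simple improvement over "whoever succeeds, connects" is to prefer the gateway with the best signal among those that still have a free BLE slot. A hedged sketch, with made-up gateway names and RSSI values; the six-connection cap echoes the Bluetooth 4.x limit mentioned earlier:

```python
# Hypothetical gateway-selection sketch: among gateways with a free
# BLE slot, pick the one with the strongest signal (RSSI closest to 0).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Gateway:
    name: str
    rssi_dbm: int              # higher (closer to 0) means a better link
    connections: int
    max_connections: int = 6   # Bluetooth 4.x simultaneous-device limit

    @property
    def has_free_slot(self) -> bool:
        return self.connections < self.max_connections

def pick_gateway(gateways: List[Gateway]) -> Optional[Gateway]:
    candidates = [g for g in gateways if g.has_free_slot]
    if not candidates:
        return None            # every gateway is saturated
    return max(candidates, key=lambda g: g.rssi_dbm)

best = pick_gateway([
    Gateway("kitchen", rssi_dbm=-48, connections=6),  # closest, but full
    Gateway("hall", rssi_dbm=-63, connections=2),
    Gateway("bedroom", rssi_dbm=-80, connections=0),
])
# best is the hall gateway: the nearest one that still has a free slot
```

A real scheme would also weigh load and historical reliability, but even this ranking avoids handing a device to a distant gateway while a better free one is available.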

Difficulty 2. Multiple users.

This is a situation where several users can use one gateway or device at the same time (with a high level of security, of course), for example from different GAs, or from a GA and a user's phone. A swarm of questions follows: which device to switch first if GA commands contradict each other, which command has priority and must be executed earlier, and so on. Part of the problem is solved by Redis, a database in which we store user sessions, device statuses, and in-flight commands, and which serves as the data bus between the first and second services. But that is where the ready-made solution ends.
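To make the conflict concrete, here is one possible arbitration policy, last-writer-wins per device, sketched with an in-memory dict standing in for the Redis state; the store interface, device IDs, and commands are all hypothetical:

```python
# Illustrative arbitration sketch: when several users target one device,
# keep only the newest pending command per device (last-writer-wins).
# In production this state could live in Redis keyed by device id;
# here a plain dict stands in for it. All names are made up.
from typing import Dict, Tuple

class CommandStore:
    """In-memory stand-in for a store of pending device commands."""
    def __init__(self):
        self._seq = 0                                  # global ordering
        self._pending: Dict[str, Tuple[int, str, dict]] = {}

    def submit(self, device_id: str, user: str, command: dict) -> None:
        self._seq += 1
        # A newer submission simply replaces the older conflicting one.
        self._pending[device_id] = (self._seq, user, command)

    def take(self, device_id: str) -> dict:
        seq, user, command = self._pending.pop(device_id)
        return command

store = CommandStore()
store.submit("kettle-1", "alice-user", {"skill": "set_temperature", "celsius": 70})
store.submit("kettle-1", "google-user", {"skill": "set_temperature", "celsius": 90})
winner = store.take("kettle-1")   # the later command wins
```

Last-writer-wins is only one answer; per-user priorities or command merging are equally defensible, which is why the problem does not end with the storage layer.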

What have we done? We made SkyFriend: our own voice assistant for appliance control, which will also support the Russian language. The key feature of our GA is that it is designed to interact with Ready for Sky smart appliances directly, without additional hardware. The device is two-in-one: the assistant is combined with the gateway, and it receives commands either from the user's smartphone or directly by voice. SkyFriend also has additional features that let it compete with the assistants that already exist. On request it can set reminders, determine the user's geolocation, search Wikipedia, recommend films, make toasts, read the news, answer questions, tell the time and weather in any city in the world, play "Riddles" and "Cities" with the user, and crack jokes. Buying tickets and ordering a taxi are still in alpha testing. And this is only part of the functionality.

Quite recently Google announced that its speaker works on a similar architecture: the execution script is loaded directly into the Google Home speaker. The win for the user is a shorter command execution time: the command does not need to travel to the appliance manufacturer's server; it flies from the Google server straight to the speaker over the same communication channel and is executed there.

However, Google still does not support other transports (Bluetooth, ZigBee, Z-Wave, RF, etc.) directly on the speaker, while SkyFriend supports Bluetooth 5.0.

What is left for us? Work on system resources: add memory, processor power, and so on. And then we are ready to offer users a new level of GA quality.

What can we say in conclusion?


GA is a trend; it's convenient and practical. The topic is new and broad, and many of its issues are still hard to solve, especially alone. So we invite you to a discussion.

What comes next? Our next article, on the SkyFriend architecture. We will tell and show everything. But later.

P.S. Suggestions and reviews can be left in the comments.

Source: https://habr.com/ru/post/461363/

