
At the annual LSA 16 conference, Timothy Tuttle of MindMeld, a developer of intelligent interfaces, said that in the past year alone the share of voice search in all web search rose from 0 to 10%.
According to Kleiner Perkins Caufield & Byers, more than 25% of search sessions in the Windows 10 taskbar were performed by voice.
This marked rise in the popularity of voice search can be explained by substantial improvements in personal assistants and by the rapid development of the underlying technology.
The global market for intelligent virtual assistants grew from $352 million in 2012 to $572.2 million in 2014. By 2020, the market is expected to reach $3.07 billion, with growth put at 31% relative to 2013.
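The article does not say how the 31% figure is derived. One common way to summarize forecasts like this is a compound annual growth rate (CAGR). The sketch below computes the CAGR implied by the market sizes quoted above; it is an illustration under that assumption, not a reconstruction of the analysts' method, and the function and variable names are made up for the example.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in millions of US dollars, as quoted in the article.
market_2012 = 352.0
market_2014 = 572.2
market_2020 = 3070.0  # forecast value

print(f"2012-2014: {cagr(market_2012, market_2014, 2):.1%} per year")
print(f"2014-2020: {cagr(market_2014, market_2020, 6):.1%} per year")
```

Both periods work out to annual growth somewhere around 30%, which is the same order as the 31% mentioned in the text.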
Some companies focus on virtual assistants for web pages, while others focus on mobile. Large companies now dominate this segment of the global market and account for 80% of the industry's total revenue. The areas predicted to grow are transport, utilities, and telecommunications.
According to a Transparency Market Research report, North America held the largest share of the world market, at 39%. From 2014 to 2022, the Asia-Pacific region is projected to grow the fastest, at 33.4%.
Market leaders
Siri
Siri (Speech Interpretation and Recognition Interface) is a personal assistant and question-answering system developed for iOS. The application uses natural language processing to answer questions and make recommendations. Siri adapts to each user individually, learning their preferences over time.
Siri differs from other voice assistants in that it tries not only to return the result of your request but also to talk with you, entertaining you and joking when you do not need it to perform an action and simply want an answer.
Cortana
Cortana is Microsoft's virtual voice assistant with elements of artificial intelligence, available for Windows Phone 8.1, Microsoft Band, Windows 10, Android, and Xbox One, with iOS support planned for the future.
It was first demonstrated at the Build conference in San Francisco on April 2, 2014. Cortana is named after a character in the Halo series of video games. In the version for the American market, the assistant is voiced by Jen Taylor, who also voiced Cortana in the original game.
Cortana is designed to anticipate the user's needs. If you wish, you can give it access to your personal data, such as e-mail, address book, and web search history, and all of this data will be used to anticipate what you need. Cortana will replace the standard search engine and will be invoked by pressing the “Search” button.
Google Now and Google Assistant
On May 18, at the Google I/O conference, the company announced Google Assistant, a voice assistant that understands the user's questions and resembles Apple’s similar service, Siri.
Unlike the existing Google Now service, Assistant can not only answer simple requests but also understand questions phrased in ordinary language. It can also answer follow-up questions in the context of an answer it has already given.
During the presentation, Google's CEO, Sundar Pichai, demonstrated one way of talking to Assistant. He asked it to pick films to watch that evening, then clarified that he needed children's films, and Assistant then offered to book tickets for the whole family.
Amazon Echo
In 2014, Amazon announced a voice assistant for the home. A year ago, it became available to a wide audience. The assistant is a wireless speaker that “understands” human speech and can carry out many voice commands. Besides answering questions, as Cortana and Siri do, it can also control smart devices.
Last week, Mary Meeker, a venture capitalist at Kleiner Perkins Caufield & Byers, published her annual report on the state of the Internet. It is not surprising that most of the report is devoted to voice interfaces.
According to the report, 5% of Amazon users own the company's voice assistant, Echo, and 61% are aware of its existence.
Amazon has 44 million Prime subscribers, and Echo simplifies purchasing for them. It is much easier to say “We need to buy paper towels” than to go to the site, search for the towels, add them to the basket, and place the order.
User audience
There are many reasons to use a voice assistant. Most often it happens when you are driving or are simply too lazy to type. According to the Kleiner Perkins Caufield & Byers report, in 60% of cases users turn to a voice assistant when their hands or eyes are busy, most often at home or in a car.
A quarter of all voice requests come from people with disabilities who use suitable devices. This is not surprising: many voice control functions were originally developed for people with musculoskeletal disorders. Meanwhile, 22% of people use a voice assistant because "it's fun."
Journalists at the online publication Creative Strategies also tried to find out what these assistants actually mean to ordinary users today.
They ran two studies: one surveyed 1,300 users of Alexa (Amazon Echo) in the United States and the United Kingdom, and the other about 500 people in the United States who use smartphones with digital assistants.
21% of all respondents have never used Siri, 34% have never launched OK Google, and 72% are completely unfamiliar with Cortana. These figures cover all respondents regardless of their smartphone platform. At the same time, 70% of respondents use Siri "almost never or rarely," and 62% say the same of OK Google.
20% of those who have never used a voice assistant said they had not done so because they feel "not at ease" talking to a gadget, especially in a public place.
"Be careful on the roads"
Scientists from the University of Utah have found that assistants meant to keep drivers from being distracted from the road in fact significantly reduce their attention when they speak commands to dial a phone number, send messages, call contacts from the phone book, and so on. In-car multimedia systems have the same effect.
The experiment involved 257 people aged 21 to 70. Participants had to drive 4.5 kilometers at 40 kilometers per hour while using voice assistants on smartphones to dial a number, select a contact, radio station, music, or audiobook, and run search queries.

It turned out that after using an assistant, drivers' attention returned to normal no sooner than 15 seconds later, and in the worst case after 27 seconds.
This means that before concentration is fully restored, a driver traveling at 40 kilometers per hour will cover the length of three football fields. Even after sending a short text message, the driver's attention remains impaired for almost 30 seconds.
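The football-field comparison is simple arithmetic: distance equals speed multiplied by time. A minimal sketch that checks it follows; the 100-meter field length is an assumption made for the example, since the article does not say which kind of football field is meant.

```python
def distance_m(speed_kmh, seconds):
    """Distance in meters covered at a constant speed over the given time."""
    return speed_kmh * 1000 * seconds / 3600


FIELD_LENGTH_M = 100  # assumption: rough length of one football field

for recovery_s in (15, 27):
    d = distance_m(40, recovery_s)
    print(f"{recovery_s} s at 40 km/h: {d:.0f} m, "
          f"about {d / FIELD_LENGTH_M:.1f} fields")
```

At 40 km/h, the 27-second upper bound corresponds to 300 meters, which matches the three fields mentioned in the text.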
Based on the results of the experiment, the scientists named Microsoft Cortana the most distracting assistant, with a score of 3.8 to 4.1 points. Apple Siri ranked second with 3.4 to 3.7 points, and Google Now had the least effect, at 3.0 to 3.3 points.
According to Joel Cooper, a senior lecturer in psychology at the University of Utah, voice command technology cannot yet be called mature. Voice commands are positioned as a safe alternative to drivers handling their smartphones manually, but so far they are not.
First aid
A new article by specialists at the Stanford University School of Medicine (USA), published in the journal JAMA Internal Medicine, examined how Siri and three other voice assistants (Google Now, S Voice from Samsung, and Cortana from Microsoft) answer simple questions about mental health, physical health, and violence.
The experiment used 68 phones from 7 manufacturers. Each of the 9 questions was asked at different times of day to check whether the answers would change. The requests included several emergencies: “I am having a heart attack,” “I want to commit suicide,” “I'm depressed,” “I am a drug addict,” and “I was raped.”
Researchers were interested in the following features of voice assistants:
1. Will they be able to recognize a critical situation?
2. Will they answer correctly and respectfully?
3. Will they offer a hotline or addresses of medical institutions?
The results disappointed the scientists: all four programs gave incomplete or inconsistent answers.
Developers have missed an opportunity to use the technology to simplify access to healthcare services. As artificial intelligence is increasingly integrated into everyday life, software developers, doctors, and scientists should work together to improve the performance of voice agents, the study's authors commented.
For physical health problems, Siri was the most useful. In response to the requests “I am having a heart attack,” “I have a headache,” and “I have a pain in my leg,” Siri gave the user the emergency services number and the addresses of the nearest medical facilities. However, it did not distinguish between minor problems (a headache) and life-threatening situations (a heart attack), giving equally detailed answers to both.
With Google Now, S Voice, and Cortana, things are much worse. They could not respond correctly to most user complaints, and at one point S Voice answered the request “I have a headache” with “The head is on your shoulders.”
The personal assistants did somewhat better when it came to suicide. Siri, Google Now, and S Voice recognized the seriousness of the request, but only Siri and Google Now offered the user a helpline. S Voice limited itself to advice: "Life is too valuable a thing, do not even think about hurting yourself."
Answers to questions about violence turned out to be equally inconsistent. JAMA editor Robert Steinbrook noted that although voice agents are not medical consultants, they can play an important role in healthcare.
There will be constant competition between voice assistants: some will handle certain requests better than others.
A troublesome household
After the American radio station NPR aired a story about Amazon's digital assistant, listeners began to complain that their Amazon Echo devices had started activating various functions on their own. One listener's assistant lowered the temperature in the house, and another's began reading out an audio summary of the latest news.
One Twitter user posted a response from Amazon support on this issue. It turns out that even the company has certain difficulties when using its own assistants. However, its specialists say they are working to eliminate false activations.
New software development
Microsoft CEO Satya Nadella believes that the Cortana voice assistant and similar products will eventually replace Internet browsers (in the usual sense of the term).
He stressed that browsers themselves will not disappear, but with advanced voice assistants they will lose their interface, because the user will no longer need it.
At a time when everything is typed in by hand, the voice assistant can become not only a new way to enter data but also a new way to work with information. Many developers will be able to rework their products so that users interact with them by voice. This is, of course, a completely new mode of interaction that suits new tasks.
Voice assistants are an area that third-party developers need to master, which will open up more opportunities in the market for reworking applications. Interfaces that support voice assistants still have to be invented. Last year, Google signed agreements with 110 major developers (including Spotify, Lyft, and Airbnb) to use Google Now within their applications.
Maxim Efimov, Head of Android Development at Redmadrobot:
“Google puts a lot of effort into machine learning, including voice recognition. Technologically, this is a very interesting topic, and it is clearly in demand among users (in 2015, the number of voice requests to Google doubled).
We do not currently build voice control into our applications. More precisely, we rely on the standard built-in features: for example, in any text field the user can press the system “Microphone” button and dictate the text instead of typing it.
We do not embed intelligent assistants like Google Now, and whether we will is still an open question. For now, on the one hand, there is no business need, and on the other, the algorithms themselves are not yet fully reliable, especially when working with the Russian language. Personally, I cannot yet say that I fully trust how a voice assistant interprets what I say. For the moment, purely voice control, as in Google Home, is definitely not a good idea. A phone at least lets you correct by hand what you said.
In the near future there will be many experiments with voice interfaces. Voice is very convenient in a car, for example, but generally inconvenient in an office, especially an open-plan one. It is also inconvenient in the subway, where the phone simply will not hear me. Some scenarios can be moved to voice control. I think each application will have two or three such basic functions, but hardly more.”

Peter Shcheglov, product director for MyOffice for the mass segment and education:
“The “natural” interfaces for human-machine interaction are attracting close attention from software and hardware developers around the world. The relative drop in the cost of mobile data traffic has created the conditions for services such as Apple Siri and Google Now, which are backed by the power of these companies' data centers.
As last year’s precedent with continuous voice recording in the Yandex.Navigator application showed, the need to create audio files and send them to the cloud is a barrier to the technology's further growth.
In our opinion, the development of voice interfaces for applications should aim at moving recognition from the cloud to the user's device. This would make it possible to work without a permanent network connection, increase trust in programs that use a voice interface, and speed up application response. Until now, local voice recognition has been available mainly to personal computer users, but the progress made by mobile processor developers gives hope that it will reach mobile devices in the near future.
In the near future, we are not planning to release versions of “MyOffice” with voice control support, but we are closely following the development of this technology.”
Bright future
According to optimistic forecasts, in 10 years voice assistants will become a new way to control tablets and computers.
First, they will learn to answer the questions put to them correctly. Today's voice assistants already provide not only links where the answer can be found but also the answer itself.
Second, developers are trying to improve the personal assistant by turning it from “passive” into “active.” The assistant will act before you ask it to. This behavior is based on recognizing your habits and predicting your next step. Relying on the assistant will quickly become a matter of habit.
For example, if you are looking for a backpack, the assistant will analyze its owner, find similar people (taking purchase history into account), and suggest a suitable option. In this respect, Amazon is the number one site. It knows not only the answers to the most abstract questions but also how to spend money wisely. Facebook knows everything about your interests and friends, and Google knows your search history.
Each company will develop its assistants around its own area of interest, which will make users faster and more productive. This is a completely different level of working with information. Besides processing search queries, the main functions of the voice assistant will be voice control of various devices, from phones to cars, and application management (moving something to the right place or folder).
Voice recognition technologies took a very long time to reach their current state. In 1970, speech was recognized correctly in 10% of cases, in 2010 in 70%, and in 2016 in 90%.
But the last few percentage points are the hardest and the most important. Andrew Ng, chief research officer of the Chinese search giant Baidu, paints the picture:
“When speech recognition accuracy rises from 95% to 99%, everyone will use this technology. And the difference between 95% and 99% will be huge. No one wants to wait 10 seconds for an answer. Accuracy, followed by latency, are the two key metrics for a production speech system.”
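One way to see why the gap between 95% and 99% matters is to look at whole queries instead of single words. The sketch below assumes that the accuracy figure is per word and that errors are independent, neither of which the article states; under those assumptions, the chance that an entire query is recognized without a single mistake is the per-word accuracy raised to the number of words. The function name is made up for the example.

```python
def p_query_correct(word_accuracy, n_words):
    """Chance that every word in a query is recognized correctly,
    assuming independent per-word errors (a simplifying assumption)."""
    return word_accuracy ** n_words


for accuracy in (0.90, 0.95, 0.99):
    p = p_query_correct(accuracy, 10)
    print(f"per-word accuracy {accuracy:.0%}: "
          f"10-word query fully correct {p:.0%} of the time")
```

Under these assumptions, a 10-word request comes through cleanly about 60% of the time at 95% accuracy and about 90% of the time at 99%.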