The past, present and future of speech recognition technology

Voice is the future. The world's technology giants are competing for vital market share, and ComScore predicts that "up to 50% of all search queries will be performed by voice by 2020."

However, the historical antecedents that brought us to this point are as important as they are surprising. In this report, we take a journey through the history of speech recognition technology, then provide a comprehensive overview of the current landscape and the tips that all marketers need to bear in mind to prepare for the future.

The history of speech recognition technology

Speech recognition technology has entered the public consciousness relatively recently, with glossy launch events from the tech giants making headlines around the world.

Our admiration is instinctive: we are fascinated by machines that can understand us.

From an anthropological point of view, we developed the spoken word long before its written counterpart, and we can speak 150 words per minute, compared with the paltry 40 words that the average person can type in 60 seconds.

In fact, communicating with devices by voice has become so popular and natural that we may justifiably ask why the richest companies in the world have only now begun to offer these services.

The history of the technology shows that speech recognition is not a new concern, even if the pace of development has not always matched the level of interest in the topic. As we will see, major breakthroughs dating back to the 18th century provided the platform for the digital assistants we all know today.

The earliest advances in speech recognition focused primarily on creating vowel sounds, as the basis of a system that could also learn to interpret phonemes (the building blocks of speech) from nearby speakers.

These inventors were hampered by the technological context in which they lived, with only rudimentary means at their disposal for building a talking machine. Nevertheless, they provided an important precursor to more recent innovations.

Dictation machines, pioneered by Thomas Edison at the end of the 19th century, were able to record speech and became popular among doctors and secretaries, who had large volumes of notes to take every day.

However, it was not until the 1950s that this line of research led to genuine speech recognition. Up to that point we had seen attempts to create and record speech, but not yet to interpret it.

Audrey, a machine created by Bell Labs, could understand the digits 0 to 9 with an accuracy of up to 90%. Interestingly, this level of accuracy was recorded only when its inventor spoke; it dropped to between 70% and 80% when other people spoke to Audrey.

This points to some of the persistent problems of speech recognition: each person has their own voice, and spoken language can be very inconsistent. Unlike text, which has a much higher level of standardization, the spoken word varies greatly with regional dialect, speed, accent, and even social class and gender. Scaling any speech recognition system has therefore always been a significant obstacle.

Alexander Waibel, who worked on Harpy, a machine developed at Carnegie Mellon University that could understand more than 1,000 words, expands on this point:

"So you have things like 'euthanasia', which could be 'youth in Asia'. Or if you say 'Give me a new display', it could be understood as 'give me a nudist play'."

Until the 1990s, even the most advanced systems were based on pattern matching, in which sound waves were converted into a set of numbers and stored. A stored pattern was then triggered when the same sound was spoken into the device. Of course, this meant that one had to speak very clearly, slowly and without background noise to have a good chance of being recognized.
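The pattern-matching idea described above can be sketched in a few lines. This is a simplified illustration rather than a reconstruction of any historical system: the stored templates, the four-number "sound" vectors and the distance threshold are all invented for the example.

```python
import math

# Hypothetical stored templates: each word is a fixed-length list of numbers
# standing in for a digitized sound wave (values are made up for illustration).
TEMPLATES = {
    "yes": [0.1, 0.9, 0.4, 0.2],
    "no": [0.8, 0.2, 0.1, 0.7],
}

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(signal, threshold=0.3):
    """Return the closest stored word, or None if nothing is close enough."""
    best_word = min(TEMPLATES, key=lambda w: distance(signal, TEMPLATES[w]))
    if distance(signal, TEMPLATES[best_word]) <= threshold:
        return best_word
    return None

clean = [0.12, 0.88, 0.41, 0.19]  # clear speech, close to the stored "yes"
noisy = [0.5, 0.6, 0.6, 0.5]      # the same word distorted by background noise

print(recognize(clean))
print(recognize(noisy))
```

The clean signal falls within the threshold of a stored template and is recognized, while the noisy one matches nothing, which is why such systems demanded slow, clear speech in quiet conditions.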

IBM Tangora, released in the mid-1980s and named after Albert Tangora, then the world's fastest typist, could adapt to the speaker's voice. It still required slow, clear speech and no background noise, but its use of hidden Markov models allowed greater flexibility through data clustering and the prediction of upcoming phonemes based on recent patterns.

Although it required 20 minutes of training data from each user (in the form of recorded speech), Tangora could recognize up to 20,000 English words and some complete sentences.
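To give a feel for how a hidden Markov model uses context, here is a minimal sketch. It is not Tangora's implementation: the three phonemes, the acoustic cluster labels (c1 to c3) and every probability are assumptions made up for illustration. The decoding routine is the standard Viterbi algorithm, which finds the most likely sequence of hidden phonemes for a series of observed sound clusters.

```python
# Toy hidden Markov model: hidden states are phonemes, observations are
# acoustic cluster labels. All probabilities are invented for illustration.
STATES = ["k", "ae", "t"]
START = {"k": 0.8, "ae": 0.1, "t": 0.1}
TRANS = {
    "k": {"k": 0.1, "ae": 0.8, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.2, "t": 0.7},
    "t": {"k": 0.3, "ae": 0.3, "t": 0.4},
}
EMIT = {
    "k": {"c1": 0.7, "c2": 0.2, "c3": 0.1},
    "ae": {"c1": 0.1, "c2": 0.7, "c3": 0.2},
    "t": {"c1": 0.3, "c2": 0.1, "c3": 0.6},
}

def viterbi(observations):
    """Return the most likely phoneme sequence for the observed clusters."""
    # best[s] = (probability, path) of the best path that ends in state s
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        step = {}
        for s in STATES:
            # Pick the predecessor state that makes reaching s most likely
            prob, path = max(
                ((best[p][0] * TRANS[p][s], best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            step[s] = (prob * EMIT[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]

def predict_next(phoneme):
    """Most probable next phoneme given the current one."""
    return max(TRANS[phoneme], key=TRANS[phoneme].get)

print(viterbi(["c1", "c2", "c3"]))
print(viterbi(["c3", "c2", "c3"]))
print(predict_next("k"))
```

In the second call the first sound, taken alone, looks most like "t", yet the model still decodes the sequence as k, ae, t because the transition probabilities make that path more plausible overall. This is the kind of flexibility that rigid pattern matching lacked.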

The seeds of voice recognition technology sown in this period are among the most significant developments in the field. Researchers came to believe that speech recognition could succeed only by adapting to each person's unique way of communicating, but this breakthrough proved very difficult to achieve.

It was only in 1997 that the world's first “continuous speech recognizer” (i.e., users no longer had to pause between words) was released, in the form of Dragon's NaturallySpeaking software. Able to understand 100 words per minute, it is still in use today (albeit in an updated form) and is popular with doctors.

Machine learning, as in many areas of scientific discovery, has provided most of the breakthroughs in speech recognition this century. Google combined the latest technology with the power of cloud computing to share data and improve the accuracy of its machine learning algorithms.

This led to the launch of the Google Voice Search app for the iPhone in 2008.

Thanks to the large amount of training data, the Voice Search app showed remarkable improvements in accuracy over previous speech recognition technologies. Google introduced personalization into its voice search results and used this data to develop its Hummingbird algorithm, gaining a much more nuanced understanding of language in use. These threads came together in Google Assistant, which is now installed on almost 50% of all smartphones.

It was Siri, Apple's entry into the voice recognition market, that first captured the public imagination. The result of years of research, this AI-powered digital assistant brought a human touch to the world of speech recognition.

After Siri, Microsoft launched Cortana, Amazon launched Alexa, and the gears were set in motion. The tech giants are now battling for supremacy with the most advanced voice recognition platform.

In effect, we have spent hundreds of years teaching machines to complete a journey that takes the average person only a few years. Starting from the phoneme and building up to individual words, then phrases, and finally sentences, machines can now understand speech with close to 100% accuracy.

The methods used to make these leaps forward have grown more sophisticated, to the point that they are now loosely based on the workings of the human brain. Cloud-connected computers have entered millions of homes and can be controlled by voice, even offering conversational responses to a wide range of requests.

This journey is still not complete, but we have come a long way from the room-sized computers of the 1950s.

Speech Recognition Today

Smartphones were originally the only home for digital assistants such as Siri and Cortana, but the concept has been decentralized over the past few years.

Currently, the focus is mainly on voice-activated home speakers, but this is essentially a Trojan horse strategy. By capturing a dominant place in the consumer's home, these systems become the gateway for a proliferation of smart, connected devices that fall under the broad concept of the “Internet of Things”. A Google Home or Amazon Echo can already be used to control a wide range of Internet-enabled devices, and by 2020 many more are expected to join the list: smart refrigerators, headphones, mirrors and fire alarm systems, along with a rapidly growing list of third-party integrations.

A recent Google survey found that more than 50% of users keep their voice-activated speaker in the living room, with a significant number also reporting that they have one in the bedroom or kitchen.

And that is the point: Google (and its competitors) want us to buy more than one of these home devices. The more convenient they are, the more people will keep using them.

Their ambitions are greatly helped by the fact that the technology is now genuinely useful for everyday tasks. Ask Alexa, Siri, Cortana or Google what the weather will be like tomorrow, and it will give a clear spoken report. The technology is still imperfect, but speech recognition has now reached an acceptable level of accuracy for most people, with all major platforms reporting error rates of less than 5%.
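The article does not say how these error rates are measured. Assuming they refer to word error rate (WER), the usual metric for speech recognition, the figure is the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in that transcript. The sketch below computes it with a standard edit-distance calculation; the sample sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes the reference transcript contains at least one word.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # previous[j] = edits needed to match the reference words processed
    # so far against the first j hypothesis words
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

# Hypothetical 20-word transcript and a recognizer output with one wrong word
spoken = ("ask the assistant what the weather will be like tomorrow morning "
          "and it will read out a short clear report")
heard = ("ask the assistant what the whether will be like tomorrow morning "
         "and it will read out a short clear report")

print(f"{word_error_rate(spoken, heard):.0%}")
```

One misheard word in a 20-word sentence is a 5% error rate, so a platform below that threshold gets, on average, fewer than one word in twenty wrong.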

As a result, these companies are trying to “plant their flag” in our homes as soon as possible. Hardware such as a home speaker system is not something most people buy often. If consumers buy a Google Home, for example, it seems likely that they will complement it with other devices that support Google Assistant, rather than buying from a competitor and creating disconnected digital ecosystems under one roof. It is much easier to choose devices that offer continuity and convenience.

This is why Amazon was willing to sell the Echo Dot for only $29.99. That amounts to a short-term financial loss for Amazon on every device sold, but the long-term gain should more than compensate for it.

According to current estimates, about 33 million smart devices are already in use (VoiceLabs report, 2017), and both younger and older generations are rapidly adopting the technology.

TechCrunch reports:

In fact, the “super user” of personal assistant apps, meaning someone who spends twice as much time with personal assistants each month as the average user, is a 52-year-old woman who spends 1.5 hours a month using assistant apps.

Perhaps the most important goal for the large technology companies is to get consumers to use voice more actively when shopping through their devices.

Google reports that 62% of users plan to make a purchase by voice over the next month, while 58% use it to create a weekly shopping list.

The short-term implications for the business strategies of Amazon and Google, in particular, are relatively clear. First-mover advantage looks likely to be decisive in this arena, especially as speech recognition continues to evolve into conversational interactions that lead comfortably to purchases.

We have written before about the two focal points of the tech giants' voice search strategy: the technology should be ubiquitous and it should be seamless. Voice is already a multi-platform ecosystem, but we are still some distance from the ubiquity it is pursuing.

To get an idea of the likely outcome of this competition, it is worth assessing the strengths and weaknesses of four key players in Western markets: Amazon, Google, Apple and Microsoft.

Amazon

Hardware: Echo, Echo Dot, Echo Show, Fire TV Stick, Kindle.
Digital Assistant: Alexa

Usage statistics:

“Tens of millions of devices with Alexa support” sold worldwide during the 2017 holiday season (Amazon)
75% of all smart speakers sold to date are Amazon devices (TechRepublic)
Echo Dot was the number one selling device on Amazon during the holidays, with the Alexa-enabled Fire TV Stick in second place (Amazon)
The average Alexa user spends 18 minutes per month interacting with the device, compared to five minutes for Google Home (Gartner)
More than 25,000 skills are currently available for Alexa (Amazon)

Overview:

The cylindrical Echo and its smaller sibling, the Echo Dot, have been runaway successes among smart speakers. By connecting the system to a number of popular third-party services, Amazon has managed to make the Echo a useful addition to millions of households.

As Amazon executive Dave Limp said recently, “We think of it as ambient computing, which is computer access that's less dedicated personally to you but more ubiquitous.”

Ubiquity looks like a realistic prospect, judging by sales figures.
After the holiday season, in which the Echo Dot became the most popular product on Amazon worldwide, the Alexa app took the top spot on the App Store, ahead of Google's rival product.

Amazon's heritage as an online store gives it a built-in advantage when it comes to monetizing the technology. The acquisition of Whole Foods adds extra weight, with the ability to integrate the offline and online worlds in a way that other companies will envy.

Moreover, Amazon has never depended on advertising to keep its stock price afloat. Quite the opposite, in fact. As a result, it faces less short-term pressure, which lets it seize the initiative and take the lead in smart speakers.

With advertisers looking for a genuine online alternative to Google and Facebook, Amazon is in an excellent position to capitalize. But there is a delicate balance to strike here. Amazon has the most to lose in terms of consumer trust and reputation, so it will move carefully when introducing advertising on Alexa.

The company denies that it has any plans to do this, but, as research firm L2 Inc. recently reported, Amazon has approached major brands to ask whether they would be willing to pay for Amazon's Choice, the designation it gives to the best products in a particular category.

We should expect to see more attempts from Amazon to offer something beyond paid ads in search results. Voice requires new advertising solutions, and Amazon will tread lightly to make sure it does not disrupt the Alexa experience. The recently announced partnership with the publishing giant Hearst is a sign of things to come.

The key to Alexa's success will be the integration of Amazon's own assets, along with third-party support, which has already led to the creation of more than 25,000 skills. With support announced for new headphones, watches, refrigerators and more, Amazon looks set to stay at the forefront of voice recognition technology for some time.

Google

Hardware: Google Home, Google Home Mini, Google Home Max, Pixelbook, Pixel smartphones, Pixel Buds, Chromecast, Nest smart home products.

Digital Assistant: Google Assistant

Usage statistics:

Google Home has a 24% share of the US smart speaker market (eMarketer)

More than 1,000 actions are available for Google Home (Google)

Google Assistant works with more than 225 home control brands and more than 1,500 devices (Google)

The most popular Google Assistant apps are games, followed by home control apps (Voicebot.ai)

Overview:

Google Assistant is directly linked to the world's largest search engine, giving users direct access to the largest database of information ever known to mankind. That is not a bad resource for a digital assistant to draw on, especially as Google continues to improve its speech recognition software.

A recent study by Stone Temple Consulting covering 5,000 queries showed that Google provides the most accurate answers by a fairly wide margin.

In combination with Google Photos, Google Maps, YouTube and a number of other effective services, Google Assistant does not lack integration capabilities.

Google may not have planned to re-enter the hardware market after the lukewarm reception its products received in the past. However, this new market has prompted the search giant to act decisively. There is little room for error at the moment, so Google has taken matters into its own hands with Pixel smartphones, Chromecast and, of course, its Home smart speakers.

The Home Mini has been very popular, and Google has added the Home Max to the collection, which comes at a higher price than even the Apple HomePod. All bases are well covered.

Google knows that selling its own hardware is not a long-term solution. It is a necessary strategy for the here and now, but Google will want to convince other hardware manufacturers to integrate the assistant, just as it did with Android software for smartphones. That would remove the cost of manufacturing while retaining the vital currency: consumers' attention.

This plan is already under way, and support has been announced for a number of smart displays.

This innovation adds a fresh visual element to consumer interaction with smart devices and, crucially, makes it possible to use Google Photos, Hangouts and YouTube.

Google also wants to add a “more human touch” to its AI assistant and has hired a team of comedians, video game designers and empathy experts to give the product some personality.

Google is, after all, an advertising company, so the next step will undoubtedly be to monetize this technology. For now, the main goal is to provide a better, more human experience than the competition and to gain ground in more households. The search giant will no doubt find new ways to make money from that position.

Although it was slower off the mark than Amazon, Google’s new advertising campaign and growing range of products mean that it is still a serious contender in both the short and the long term.

Apple

Hardware: Apple HomePod (due to launch in 2018 for $349), iPhone, MacBooks, AirPods

Digital Assistant: Siri

Usage statistics:

42.5% of smartphones have the Apple Siri digital assistant installed (HigherVisibility)
41.4 million monthly active users in the United States as of July 2017, which is 15% fewer than in the previous year (Verto Analytics)
19% of iPhone users interact with Siri at least daily (HubSpot)

Overview:

Apple holds an enviable position in the smartphone and laptop markets, which has allowed it to integrate Siri with its operating systems in a way that other companies simply cannot replicate. Even Samsung, with its Bixby assistant, cannot boast this level of synergy, since its smartphones run on Android and, as a result, Bixby has to compete with Google Assistant for the user's attention.

Nevertheless, the statistics show that Apple lags a little when it comes to hardware in consumers' smart homes. The HomePod will almost certainly deliver much better sound than the Echo Dot or the Google Home Mini, as it should with a price tag of about $350. It will include many impressive features, including the ability to assess the surrounding space and adjust the sound quality accordingly.

The launch of the HomePod was postponed, and industry insiders said Siri was the cause. Apple's closed, protective approach brings certain benefits for users, but it has drawbacks when it comes to technologies such as voice recognition. Google has access to a huge amount of data that it processes in the cloud and uses to improve the assistant for all users. Apple has no resource on anything like the same scale, which has slowed Siri's development since its launch.

However, it seems that these are most likely short-term problems.

Apple will stick to its core business strategy, which has served it very well so far. The HomePod will sit at the premium end of the market and, drawing on Apple’s design heritage, will focus on delivering superior sound. It will only work with Apple Music, so unless Apple opens its doors to third parties, the product may appeal only to its most ardent fans. Fortunately for Apple, there are enough of them to give the product a solid start. We'll see.

Microsoft

Hardware: Harman Kardon Invoke, Windows smartphones, Microsoft laptops

Digital Assistant: Cortana

Usage statistics:

5.1% of smartphones have the Cortana assistant installed
Cortana now has 133 million monthly users (TechRadar)
25% of Bing searches are made by voice (Microsoft)

Overview:

Microsoft has been relatively quiet on the speech recognition front, but its trump card is that it owns many of the components needed for a successful speech recognition product.
With a very significant market share, the Office suite of services and popular products such as Skype and LinkedIn, Microsoft should not be written off.

Apple’s decision to default to Google results instead of Bing on its Siri assistant was a blow to Microsoft’s ambitions, but Bing could still be a competitive advantage for Microsoft in this arena. Bing is a source of invaluable data and has helped turn Cortana into a much more effective speech recognition tool.

The Invoke speaker, developed by Harman Kardon with Cortana integrated into the product, has also been reduced to a more affordable $99.95.

New speakers with Cortana support are in development, as are smart home products such as thermostats. These may yet bring an increase in demand, but there is a distinct feeling that Microsoft may be a little late to this party.

Where Microsoft can compete very seriously is the office environment, which has also become a focus for Amazon. Microsoft is prepared to take a different route to gain a foothold in this market, and it could still prove an extremely profitable segment.

The Future of Speech Recognition Technology

We are still some distance from realizing the true potential of voice recognition technology. The problem concerns both the sophistication of the technology itself and its integration into our lives. Current digital assistants can interpret speech very well, but they are not yet the conversational interfaces that technology providers are aiming for. Moreover, speech recognition is still limited to a very small number of products.

The pace of progress, compared with the earliest discoveries in speech recognition, is actually quite phenomenal.

And, based on this, we can look into the near future and predict the transformation of the way we interact with the world around us. Amazon’s notion of "ambient computing" seems to be quite appropriate here.

The smart speaker market has significant room for growth, with 75% of US homes projected to have at least one by the end of 2020.
Now that users are beginning to overcome the initial awkwardness of talking to their devices, the idea of asking Alexa to boil the kettle or make an espresso no longer seems so far-fetched.

Voice is becoming an interface in its own right, extending beyond the smartphone to the home and, soon, to many other contexts.

We expect to see more complex input/output relationships as the technology advances. Voice alone still somewhat limits the range of possible responses, but innovations such as the Amazon Echo Show and Google's support for smart displays will open up many new opportunities for interaction. Apple and Google will also bring in their AR and VR applications once consumer appetite reaches the required level.

However, some problems remain. First of all, voice search providers must find a way to offer choice through a medium that is best suited to short answers. Otherwise, how can they ensure that the user receives the best response to a request, and not the answer backed by the highest advertising budget?

Modern consumers are savvy and have access to an almost endless supply of information, so any missteps by brands will be documented and shared online by users.

A new study from Google showed growing acceptance among consumers that brands will use smart speakers to communicate with them. A significant number were willing to receive information about deals and sales, and almost half wanted to receive personalized tips.
Speech recognition technology provides a platform for reliable communication, but marketers need to build honest and mutually beneficial relationships with their audience.

Key takeaways

Brands must consider how they can make the interaction more valuable to the user. The clear advantage of voice search is that it is fast and convenient. By intruding on this relationship between technology and consumer, however, brands can easily alienate users. The "Beauty and the Beast" example is an early cautionary tale for all of us.

Amazon is in an excellent position to monetize its speech recognition technology, but it still faces obstacles. Sponsorship of Amazon's Choice has been identified as one route to generating revenue without losing customers.

Google has made voice recognition a central focus for growing its business. With vast amounts of data at its disposal and increasing third-party support, Google Assistant will pose a serious threat to Amazon's Alexa this year.

Marketers should apply technical best practices for voice search to increase their visibility today. Although the technology is still developing, we need to give it a helping hand as it tackles truly gigantic tasks.

The best way to understand how people can use speech recognition technology is to interact with it as often as possible. Marketers who are serious about identifying areas with additional capabilities should conduct their own research at home, at work, and on the go.

Source: https://habr.com/ru/post/346928/
