Hello.
Prompted by the article "Customer self-service using Google ASR", I would like to describe the business risks of using Google speech recognition in your call center.
I represent the Speech Technology Center (STC). We develop Russian speech synthesis and recognition technologies, and above all we build solutions for call-center automation.
We have been working in speech technology for over 20 years. About 70-80 scientists and programmers are working on this right now, the speech departments of the country's leading universities help us, and we have our own speech technology department at ITMO, where we train our young scientists.
In this post I will draw parallels between speech recognition from STC and from Google with regard to its use in the corporate sector (call centers). Our technologies (STC's and Google's) are compared again and again at various meetings and talks, we are asked provocative questions about it, and by now we have well-rehearsed answers.
But first, I must admit that Google's speech recognition works very well. That does not mean, however, that STC's recognition is worse. You can see for yourself: video.
Another important point: in the corporate sector, recognition quality is not the only thing that matters; there are many other factors to take into account. I will describe them below.
What is speech recognition technology based on? Statistics: the processing of thousands, millions, and in Google's case billions of real words and expressions that people use when building their phrases.
Where does Google get the data for its recognition? It is very simple: the Google search bar. They can recognize everything that people have ever typed into the search box.
This is called recognition with a general language model: speech about anything at all, on any topic, using standard colloquial words and expressions.
For example, Google easily and flawlessly recognizes phrases such as "What is the weather now?", "Where is the Bering Strait?", "What is the exchange rate?". People have asked Google all of these many times over.
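To illustrate how such a statistical language model prefers common phrasings (a toy sketch over a handful of queries, not Google's actual implementation), here is a minimal bigram model:

```python
from collections import Counter

# Toy corpus standing in for billions of real search queries (illustrative only).
corpus = [
    "what is the weather now",
    "what is the exchange rate",
    "where is the bering strait",
    "what is the weather today",
]

# Count unigrams and bigrams over the corpus.
unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def score(sentence):
    """Probability of a sentence under the bigram model (add-one smoothing)."""
    words = ["<s>"] + sentence.split()
    vocab = len(unigrams)
    p = 1.0
    for a, b in zip(words, words[1:]):
        p *= (bigrams[(a, b)] + 1) / (unigrams[a] + vocab)
    return p

# A frequent query scores higher than an equally "audible" rare one, so when
# the acoustics are ambiguous the recognizer prefers what people actually say.
print(score("what is the weather now") > score("what is the feather now"))  # True
```

In a real recognizer such language-model probabilities are multiplied with acoustic scores, which is why "weather" beats "feather" even in a noisy recording.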
The first difference is the quality of recognition.
Google cannot recognize speech on narrow, specialized topics.
Google's recognition, in general, cannot be specifically trained on your vocabularies. For example, the phrase "37 cm above the Z-line relative to the X-axis" is a real phrase from one of our projects that Google does not recognize correctly, because people do not use it in everyday speech, and it also mixes alphabets.
We (STC) build our own speech recognition, so we can train it on any topic and make any edits and adjustments. When creating voice menus, we and the client have all the tools to make recognition work as it should, not as someone else built it with no way to influence it.
A simple example: the words "bank" and "punk" sound very similar, but when you call a bank's call center, the likelihood that someone will say "punk" is tiny. Therefore that word is not even in the vocabulary, which noticeably improves recognition quality. When the speaker is hard to hear (heavy noise, poor diction, line interference, etc.), the system has to guess what the caller is saying, and a restricted vocabulary makes that guess far more reliable than picking among 1000 similar-sounding words. After all, our task is to solve the client's problem, not to show off how impressive the recognition is. If something can be made simpler, make it simpler; it is always safer.
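The "bank"/"punk" idea can be sketched in a few lines (the scores and word lists here are hypothetical; real systems rank whole lattices of scored hypotheses, not single words):

```python
# Hypothetical acoustic scores for what a noisy caller might have said;
# in a real recognizer these come from the acoustic model.
hypotheses = {"bank": 0.48, "punk": 0.52, "tank": 0.45}

# Domain lexicon for a bank's call center: "punk" is simply absent.
domain_vocab = {"bank", "balance", "card", "credit", "transfer"}

def pick(hypotheses, vocab=None):
    """Return the best-scoring hypothesis, optionally restricted to a vocabulary."""
    if vocab is not None:
        hypotheses = {w: s for w, s in hypotheses.items() if w in vocab}
    return max(hypotheses, key=hypotheses.get)

print(pick(hypotheses))                # punk  (open vocabulary: the noise wins)
print(pick(hypotheses, domain_vocab))  # bank  (restricted vocabulary)
```

Shrinking the search space does the error correction for free: the wrong word never gets a chance to win.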
Two more points: one important, the other less so in the context of call centers.
Important. Google is not responsible for the quality of its recognition: if it misrecognizes something, no one is to blame. It owes you nothing and guarantees you nothing. Use it if you want, or don't.
STC is contractually responsible for the quality of its recognition. That is exactly what customers pay for: the result. If something is not recognized, we will keep tuning it until it is. Remember, we have a whole research department.
The less important point. Google recognizes speech in segments of up to 15 seconds.
STC offers streaming, continuous recognition with no limit on audio length; you could read out an entire book in one go.
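The difference can be sketched with two hypothetical client interfaces (neither is Google's or STC's actual API): a segment-based service forces you to cut the audio into fixed pieces, while a streaming one consumes chunks as they arrive.

```python
def split_into_segments(duration_s, limit_s=15):
    """Segment-based API: a long recording must be cut into <= 15 s pieces,
    and every cut point risks splitting a word or losing context."""
    segments = []
    start = 0
    while start < duration_s:
        end = min(start + limit_s, duration_s)
        segments.append((start, end))
        start = end
    return segments

def stream(chunks):
    """Streaming API: feed chunks as they arrive; no total-length limit."""
    for i, chunk in enumerate(chunks):
        yield f"partial transcript after chunk {i}"

# A 40-second call already needs three separate requests in segment mode.
print(split_into_segments(40))  # [(0, 15), (15, 30), (30, 40)]
```

For short IVR commands the limit rarely matters, which is why I call this point less important; for dictation or full-call transcription it matters a lot.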
The second difference is the communication channels.
All Google recognition works over the internet. No internet, no recognition, and your system is down; your business stops. This is a real project risk.
Add to this network equipment failures, packet loss, disconnection for non-payment, and everything that goes with them.
STC's solutions run locally, on a server that sits next to you and hums. Everything is your own, in-house; many will understand me.
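A minimal sketch of that risk, with a stand-in `cloud_recognize` function that simulates an outage (hypothetical names, not any vendor's SDK): every cloud ASR call needs a timeout and a fallback path, a failure mode a local engine simply does not have.

```python
import socket

def cloud_recognize(audio, timeout=2.0):
    """Stand-in for a cloud ASR call; here it simulates the network being down."""
    raise socket.timeout("no route to recognition service")

def recognize_with_fallback(audio):
    """Wrap the cloud call so the IVR can still answer the caller somehow."""
    try:
        return cloud_recognize(audio)
    except (socket.timeout, OSError):
        # No transcript: fall back, e.g. route the caller to a live operator.
        return None

print(recognize_with_fallback(b"..."))  # None -> caller goes to an operator
```

The fallback keeps the call alive, but the self-service scenario is still lost for as long as the connection is down.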
The third difference is the contractual relationship.
You cannot sign a legal contract with Google for speech recognition services. They do not officially provide recognition as a service at all; officially, it may only be attached to mobile applications. Any attempt to bolt it onto your call center is a business risk, working through a back door.
They can close this loophole at any time, as has already happened with other projects.
I will go further: to be completely honest, building commercial call-center projects on Google recognition is not even legal.
The fourth difference is technical support.
Google offers no technical support for speech recognition bolted onto your call center. If something breaks, there is no one to call and complain to, especially if it is Asterisk and you built everything yourself (whatever my personal feelings about Asterisk).
STC's technical support works 24x7x365, with on-site visits to the customer. You can write any SLA conditions into the contract (reasonable ones, of course).
The fifth difference is security.
Everything Google recognizes is processed on servers in another country. That means non-compliance with personal data protection requirements, and everything else that follows from it. No bank or medical institution will accept that.
STC's solution is local and works in whatever network segment you specify. It has no business being on the internet at all.
Why did I write all this? Many customers who came to us kept pointing out that Google recognizes speech perfectly and is essentially free. That is true.
But is this the most important thing for a reliable business?
It is your choice and it is your risk.
Well, perhaps not exactly yours, but your manager's for sure.
P.S. By the way, I cannot say that our recognition is expensive: it costs about 30% less than comparable offerings from foreign developers.
+ We have special prices for Asterisk developers.
+ We offer cloud technologies, i.e. renting ASR and TTS resources (remote access).
At one time we actively cooperated with Asterisk developers and focused on small and medium-sized call-center businesses.
We wrote a great deal of instructions and documentation on integration with Asterisk: here.