Intelligent CPaaS: industry innovations and what AI / ML has brought to it

The latest APIDays, a conference for everyone who builds or uses APIs in one way or another, took place in Amsterdam in June. The theme of the conference was “the flowering of contextual communications,” that is, communications in which both parties immediately and fully understand the context of the conversation. That sounds abstract, so here are a couple of examples. You get a call from an unfamiliar number: you do not know who is calling, from where, or why, so there is no context. Conversely, if you are performing an operation in an Internet banking application and something goes wrong at some step, you can call support directly from that step, and the context will be clear to both you and the operator right away. To provide this kind of awareness, businesses use communication platforms (CPaaS, Communications Platform as a Service), and those, in turn, use AI and machine learning. This is what our CEO Alexey Ailarov spoke about at APIDays, and today we are publishing an adaptation of his June talk.

CPaaS success

CPaaS is a fast-growing business. Why? There are several reasons for the success of the concept.

First, CPaaS flourished mainly thanks to the rise of the “new enterprise”: when companies like Uber and Lyft proved their viability, it suddenly became clear to everyone that all of yesterday's start-ups were using cloud-based communication platforms. Once the market understood this, demand for CPaaS began to grow, because cloud solutions let you assemble a ready-made “boxed” product from basic building blocks and start making money.

Second, we must remember that CPaaS platforms have always been aimed at developers, and every modern start-up has developers who find CPaaS easy to use.

Third, the cloud is the cloud: the service is available around the world, it scales, and capacity grows on request. And all this comes without a headache for whoever uses CPaaS.

Finally, most platforms offer pay-as-you-go pricing, where you pay only for what you use: if you use speech recognition and speech-to-text, those functions are billed; if you do not use recognition, you do not pay for it. This is very flexible and transparent.
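The pay-as-you-go arithmetic can be illustrated with a minimal sketch. The feature names and rates below are made up for the example and are not any platform's real price list; the point is only that a feature you do not use adds nothing to the bill.

```python
from decimal import Decimal

# Hypothetical per-unit rates, invented for illustration only.
RATES = {
    "call_minutes": Decimal("0.010"),
    "asr_minutes": Decimal("0.020"),  # speech recognition, billed per minute
}


def monthly_bill(usage: dict[str, int]) -> Decimal:
    """Sum rate * quantity over the features that were actually used."""
    total = Decimal("0")
    for feature, quantity in usage.items():
        total += RATES[feature] * quantity
    return total


# Same call volume, with and without speech recognition.
with_asr = monthly_bill({"call_minutes": 1000, "asr_minutes": 1000})
without_asr = monthly_bill({"call_minutes": 1000})
print(with_asr, without_asr)
```

Using `Decimal` rather than floats avoids rounding surprises when small per-unit rates are multiplied by large volumes.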

New in the industry

The first thing to mention here is serverless, which has taken the convenience of CPaaS to a new level. We have already written about this topic in detail, so here we will confine ourselves to the main point: serverless does not mean there are no servers at all, only that there are none on the customer's side. In terms of computing resources this is the same pay-as-you-go model, because the fee depends on the load the customer places on the provider. Another important point is that customers can be given access to the platform's runtime, which reduces latency and increases reliability.

Another trend is WYSIWYG editors. This is a step towards the business audience, which (most often) does not know how to code but can still assemble the logic of a bot or a call center in a visual editor. Implementations differ slightly (see Smartcalls from Voximplant, Studio from Twilio, FlowBuilder from MessageBird, etc.), but the essence is the same: instead of writing code, the user works with visual blocks, arranging them and the connections between them. Some of these editors, our Smartcalls for example, still allow code as an advanced feature, but that is a slightly different story.

Finally, cloud IDEs. For now they can hardly be compared with something like IntelliJ IDEA, but a comparison with VS Code is quite fair. If a CPaaS gives developers a powerful tool for working with code, those developers will most likely be very satisfied. A proper debugger, smart autocomplete, syntax highlighting, custom themes, tabs, etc.: when all of this lives in the web interface and works quickly, the platform earns extra karma points for its flexibility.

But our joy would not be complete ...

... if not for AI. Machine learning gives new degrees of freedom to communication platforms, namely:

Recognition

Speech recognition and synthesis: some companies develop them in-house, but this is very time-consuming. Instead, you can turn to large players such as Google, Amazon, or Yandex. Their models already recognize human speech very well and imitate it just as well (a nod in the direction of WaveNet).

NLU / NLP Automation

Natural Language Understanding (Processing) is now the hottest topic in the communications world. If a business solution is built on NLU, a typical exchange looks like this: the robot's phrase is synthesized into speech, the person says something in response, that speech is transcribed into text, the text is passed back to the robot, and the robot selects the text of its reply, which is synthesized again. It does not sound like rocket science, but it is still reasonable to rely on existing tools here: Google Dialogflow, IBM Watson, Amazon Lex, etc.
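The loop described above (recognize speech, work out the intent, pick a reply, synthesize it) can be sketched as follows. This is a toy illustration, not how Dialogflow, Watson, or Lex work: the keyword table stands in for a real NLU service, and `fake_recognize` / `fake_synthesize` are hypothetical stubs standing in for real speech recognition and synthesis.

```python
from typing import Callable

# Toy intent table: a stand-in for a real NLU service.
INTENTS = {
    "balance": ("balance", "how much"),
    "atm": ("atm", "cash machine"),
}
REPLIES = {
    "balance": "Your balance is available in the app.",
    "atm": "The nearest ATM is shown on the map in the app.",
    "fallback": "Sorry, I did not understand. Connecting you to an operator.",
}


def detect_intent(text: str) -> str:
    """Return the first intent whose keyword occurs in the text."""
    lowered = text.lower()
    for intent, keywords in INTENTS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return "fallback"


def dialog_turn(
    audio: bytes,
    recognize: Callable[[bytes], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """One turn: speech -> text -> intent -> reply text -> speech."""
    text = recognize(audio)
    intent = detect_intent(text)
    return synthesize(REPLIES[intent])


# Hypothetical stubs: here "audio" is just UTF-8 encoded text.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply = dialog_turn(b"Where is the nearest ATM?", fake_recognize, fake_synthesize)
print(reply.decode("utf-8"))
```

Passing the recognizer and synthesizer in as functions keeps the dialog logic independent of whichever speech provider is plugged in.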

Operator assistance

While a call center operator talks to a customer, the system can analyze the conversation in the background and give the operator additional information so that no time is wasted. For example, a customer may ask where the nearest ATM is: the system recognizes the question and displays the answer on the operator's screen, and the operator simply reads it out instead of asking the customer to wait.

Analysis of emotions

Almost everyone is interested in this, but it is currently the most difficult area in CPaaS, because people tend to convey the same information in different ways and quite often use cultural references in speech. Many companies now analyze emotions from text. Such solutions exist, but they cannot be called successful, because text analysis alone will not get you far: emotions are obviously not only in what exactly was said, but also in HOW it was said. Convincing real-time emotion analysis is therefore a matter for the (near?) future.

Audio / Video Enhancement

Everyone knows about noise reduction: when you talk on the phone, a trained model "removes" the background noise so that the other person hears only you. Sometimes the speaker's own voice suffers, since models cannot always tell which frequencies belong to the background and which to the voice. But overall it already works quite well. As for the picture, we know how modern smartphones produce bokeh (a blurred background) using AI. The same approach will be in demand for video calls: imagine that you no longer need to look for the perfect backdrop, because the AI will blur whatever is behind you. Although why “imagine”? Skype already has this functionality.

Video analysis

Analyzing a video stream or a recording helps to understand what is in the frame. So far this is a very resource-intensive task, so today it is best handled by those with a lot of computing power: Google, Microsoft, and other major players.

Call Analytics

This includes more than data classification and segmentation. Imagine you have tens of thousands of call recordings: you can convert them to text and then search through it. But it is much more effective to let AI go through these recordings and sort them into groups (these are sales calls, those are warranty calls), show where the call center operator behaved correctly and where not so much (you can also identify exactly how the person behaved and what the emotions were), and note that in one call the customer asked only about buying a car, while in another they asked about the car, the insurance, and a test drive. With machine learning you can extract as much information as you want from such a data set.
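The grouping step can be illustrated with a deliberately simple sketch. A real system would use a trained classifier; here plain keyword matching stands in for it, and the topic names and keywords are invented for the example. One call may land in several groups, as in the car / insurance / test drive case above.

```python
from collections import defaultdict

# Invented topics and keywords: a stand-in for a trained classifier.
TOPIC_KEYWORDS = {
    "purchase": ("buy", "price"),
    "insurance": ("insurance",),
    "test_drive": ("test drive",),
    "warranty": ("warranty", "repair"),
}


def tag_topics(transcript: str) -> set[str]:
    """Return every topic whose keyword occurs in the transcript."""
    lowered = transcript.lower()
    return {
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }


def group_calls(transcripts: dict[str, str]) -> dict[str, list[str]]:
    """Map each topic to the IDs of the calls that mention it."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for call_id, text in transcripts.items():
        for topic in tag_topics(text):
            groups[topic].append(call_id)
    return dict(groups)


calls = {
    "call-1": "What is the price of this model?",
    "call-2": "My warranty repair is taking too long.",
    "call-3": "I want to buy a car and ask about insurance and a test drive.",
}
groups = group_calls(calls)
print(groups)
```

Calls that match no topic are simply left out of the result; a production system would more likely route them to an "other" group for review.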

Answering machine detection

A special case, but a good example: we implemented answering machine detection in our platform. The platform can now detect answering machines in Russian: we trained a model on a large number of calls, and it can distinguish a live person from a recorded message. Conventional detection methods (for example, those based on the audio signal) are not very effective, but AI helped us achieve accuracy of up to 99%, with detection taking only 2 seconds.

Difficulties

Machine learning requires a lot of resources. It is not just about computing power, but also about people with special skills: data scientists who build and tune models and know what data is needed. Such people are hard to find and their work is expensive. They are also in great demand among the major players, and it is hard, though possible, to compete for talent with the likes of Google. So instead of competing it is better to cooperate with the giants: most CPaaS players build on the achievements of large companies, and that is normal. On the other hand, this means the partner giant controls the costs of the other players, since it sets and changes the rates for speech recognition and synthesis (think of Google's WaveNet). If you use a giant's solutions and it suddenly decides to change its rates, you have to do the same, which may not please your users. Add to this the fact that you will be sending data to the giant, which is a problem for some businesses. However, you do not have to depend on a single partner: you can use solutions with similar functionality from several giants. All in all, such cooperation is convenient and beneficial for CPaaS players.
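One way to avoid depending on a single partner is to hide every speech provider behind the same interface and choose between them at run time. The sketch below is only an illustration under that assumption: the `SpeechProvider` type, the provider names, and the prices are all hypothetical, and the `recognize` functions are stubs rather than real API calls.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class SpeechProvider:
    """Hypothetical common interface over interchangeable providers."""
    name: str
    price_per_minute: float  # invented figure, not a real rate
    recognize: Callable[[bytes], str]


def pick_cheapest(providers: list[SpeechProvider]) -> SpeechProvider:
    """Choose the provider with the lowest current rate."""
    if not providers:
        raise ValueError("at least one provider is required")
    return min(providers, key=lambda p: p.price_per_minute)


# Stub recognizers standing in for real provider SDK calls.
provider_a = SpeechProvider("provider-a", 0.020, lambda audio: "text from A")
provider_b = SpeechProvider("provider-b", 0.015, lambda audio: "text from B")

before = pick_cheapest([provider_a, provider_b])

# Provider B raises its rate; the selection switches without code changes.
provider_b_new = replace(provider_b, price_per_minute=0.030)
after = pick_cheapest([provider_a, provider_b_new])
print(before.name, after.name)
```

Price is just one possible criterion; the same structure works for choosing by language support, latency, or data-residency requirements.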

Instead of conclusion

New technologies are coming that will affect communications just as WebRTC did in its time: 5G and AV1.

5G is designed to implement the principle of "always online". That is the ultimate goal, though it clearly will not happen overnight. With the arrival of this technology CPaaS will have more opportunities, because even those who have not used mobile data before will start to. The communications infrastructure will change, and traditional telecommunications businesses will change with it.

The AV1 video codec will also be useful for CPaaS, since it is free, which means you will not have to worry about licenses. A free codec that is more efficient than H.265 and available to everyone will also change the world of communications.

The future is happening before our eyes, and Voximplant not only monitors what is happening, but also participates in this process.

Source: https://habr.com/ru/post/459368/
