Speech AI with Python & Google API

Good day!

The idea of building a Russian-speaking "talker" recently came to mind. I had a simple scheme in my head:

1) Recognize speech from a microphone.
2) Come up with a more or less reasonable answer. A lot of interesting things can be done at this step, for example controlling something physical, or something less tangible.
3) Convert this answer to speech and play it back.
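The three steps above can be sketched as a loop body with pluggable parts. This is only a minimal illustration: the function name run_turn and the toy stand-ins are my own assumptions, not code from the project.

```python
from typing import Callable, Optional


def run_turn(
    listen: Callable[[], Optional[str]],
    think: Callable[[str], str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run one listen -> think -> speak cycle and return the reply (or None)."""
    heard = listen()
    if not heard:
        return None  # nothing was recognized, so there is nothing to answer
    reply = think(heard)
    speak(reply)
    return reply


# Toy stand-ins for the three steps, just to show the data flow.
spoken = []
reply = run_turn(lambda: "привет", lambda text: text.upper(), spoken.append)
```

Because each step is just a callable, swapping the recognizer, the bot, or the synthesizer does not touch the other two.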

Most interesting of all, there were Python libraries for each of these steps, and I used them.

The result is a pipeline that is almost independent of the spoken language you choose.

Speech recognition

SpeechRecognition

This library is a wrapper around many popular speech recognition services and libraries.
Of all the services it supports, Google Speech Recognition was the first one I got working, so that is what I used from then on.
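A minimal sketch of recognizing one phrase from the microphone with SpeechRecognition might look like this. The function name and the "ru-RU" default are my assumptions; the library calls (Recognizer, Microphone, recognize_google) are the ones the package provides.

```python
def recognize_from_microphone(language="ru-RU"):
    """Record one phrase from the default microphone and return its text.

    Returns None when the speech could not be understood.
    """
    # Third-party package: pip install SpeechRecognition (PyAudio is needed
    # for the microphone). Imported here so it is only required when called.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
        audio = recognizer.listen(source)            # blocks until the phrase ends

    try:
        # Sends the audio to Google Speech Recognition over the network.
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the service could not make out any words
    # sr.RequestError (network or API failure) is left to propagate.
```

Calling recognize_from_microphone("en-US") switches the language; that parameter is the only language-specific part of this step.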

Speech processing

Chatterbot

The library uses machine learning techniques. Training takes place on data sets in a dialog format.

The training process in the chatterbot library

Training data can come from files in a simple format.
In essence, they are a set of dialogs of the form:

-  -  - ... -

For English, there is a good set of trainer classes: one takes dialogs from the Ubuntu Dialog Corpus, and another from Twitter.

Unfortunately, I did not find an alternative to the Ubuntu Dialog Corpus of comparable size for Russian, although the same TwitterTrainer should work.

As an experiment, I tried using the dialogues from the first volume of War and Peace.

The result was funny but hardly practical, because the dialogues there are often addressed to specific characters in the novel.

Since it is hard to get an interesting conversation partner out of a bot without a lot of data, the search for a good conversation dataset continues.
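For a small hand-made dialog, training could look roughly like the sketch below. It assumes ChatterBot 1.0 or newer, where a trainer wraps the bot; older releases used bot.set_trainer(...) followed by bot.train(...). The helper name and the sample dialog are hypothetical.

```python
def train_on_dialog(bot, dialog):
    """Teach `bot` one conversation given as a list of alternating turns."""
    # Third-party package: pip install chatterbot
    from chatterbot.trainers import ListTrainer

    trainer = ListTrainer(bot)
    trainer.train(dialog)  # each item is learned as a reply to the previous one


# Hypothetical sample dialog; a usable bot needs far more data than this.
SAMPLE_DIALOG = [
    "Привет",
    "Здравствуй!",
    "Как дела?",
    "Хорошо, спасибо.",
]
```

After train_on_dialog(bot, SAMPLE_DIALOG), a call such as bot.get_response("Привет") should return a reply picked from what the bot has learned.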

The chatterbot library also provides a set of "logic modules" (LogicAdapter), with which you can, for example, filter the answer or teach the bot to do arithmetic or tell the current time.

The library is quite flexible: it lets you write your own trainer classes and logic modules.
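As an illustration, a bot that can also count and tell the time could be configured as below. This is a sketch that assumes the adapter paths shipped with ChatterBot; the helper name build_bot and the bot name are made up.

```python
# Dotted paths of logic adapters bundled with ChatterBot.
LOGIC_ADAPTERS = [
    "chatterbot.logic.BestMatch",               # closest known reply
    "chatterbot.logic.MathematicalEvaluation",  # simple arithmetic questions
    "chatterbot.logic.TimeLogicAdapter",        # "what time is it?"
]


def build_bot(name="Talker"):
    """Create a ChatBot that consults every adapter in LOGIC_ADAPTERS."""
    # Third-party package: pip install chatterbot
    from chatterbot import ChatBot

    return ChatBot(name, logic_adapters=LOGIC_ADAPTERS)
```

A custom module would be added to the same list by its own dotted path.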

Speech synthesis and reproduction

Google Text to Speech

This library converts a string into an MP3 file with speech. Since it relies on Google's text-to-speech service, there are many languages to choose from, including Russian.
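Synthesis and playback together might look like the following sketch. It uses gTTS to write the MP3 and PyGame's mixer to play it; the function names and the "reply.mp3" file name are assumptions for the example.

```python
def synthesize(text, path="reply.mp3", lang="ru"):
    """Convert `text` to speech and save it as an MP3 file; return the path."""
    # Third-party package: pip install gTTS (needs network access)
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(path)
    return path


def play(path):
    """Play an audio file and block until playback has finished."""
    # Third-party package: pip install pygame
    import pygame

    pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.wait(100)  # poll every 100 ms instead of spinning
```

In use this comes down to play(synthesize("Привет!")); passing lang="en" is all it takes to switch the voice to English.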

First successes

Project code

Available at the link: GHub

How to install and run?

I recommend creating a separate virtual environment for Python right away, for example with conda:

 conda create --name speech_ai
 source activate speech_ai
 conda install python=3.5

The following is suitable for experimenting with the libraries above:

Python 3 (because it avoids the hassle with non-ASCII characters that Python 2 has)

Install the packages following the instructions on their sites:

Google Text to Speech
SpeechRecognition
Chatterbot
PyGame (for playing back the synthesized speech)

Also, when installing SpeechRecognition, one of its dependencies (PyAudio) sometimes has to be installed by hand:

 sudo apt-get install python-pyaudio python3-pyaudio
 pip3 install pyaudio

The chatterbot documentation advises using MongoDB in production.
By default, a JSON file is used as the data store, which slows things down many times over when training on medium-sized datasets.
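Switching the storage to MongoDB is a matter of configuration. The sketch below assumes the keyword names used by ChatterBot 1.0 (storage_adapter, database_uri) and a MongoDB server on localhost; the helper name and the database name in the URI are placeholders.

```python
def build_mongo_bot(name="Talker",
                    uri="mongodb://localhost:27017/chatterbot-database"):
    """Create a ChatBot that keeps its learned statements in MongoDB."""
    # Third-party packages: pip install chatterbot pymongo
    from chatterbot import ChatBot

    return ChatBot(
        name,
        storage_adapter="chatterbot.storage.MongoDatabaseAdapter",
        database_uri=uri,  # placeholder URI, adjust to your server
    )
```

Older releases may expect different keyword arguments, so it is worth checking the documentation for the installed version.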

What's next?

Some ideas:

Diversify the bot's logic, for example by adding an adapter that sends search queries to Google
Use computer vision, for example to announce the objects the bot sees or the names of people passing by
Add emotions to the bot using a state machine
Try training the bot on the Ubuntu Dialog Corpus
Use something similar in robotics (for a smart home)

Source: https://habr.com/ru/post/323570/
