Stanislavsky for the robot. How to use the possibilities of emotional synthesis

Happy upcoming holidays to everyone! This post is less technical than festive, in a New Year fairy-tale spirit. By tradition, we prepare gifts for our partners and customers before the New Year. But we also wanted to do something nice for a wider audience. The best gift is a handmade one, and giving presents to children is the most pleasant of all. So this year we came up with the project “A robot reads fairy tales to children.” We took 12 New Year's fairy tales, voiced them with speech synthesis, and published them on our Voice Fabric portal.

As you may know, the Speech Technology Center has created several TTS (text-to-speech) voices, which are used in IVR systems for contact centers, in voice notification systems, and in the mobile applications Radio RSS and Reader.

How the voices were chosen

In total, our team of “voice robots” currently has 5 female and 2 male voices. Each voice has its own name. Female voices predominate because users find women's voices more comfortable to listen to in information systems.
The individuality of each voice comes from several factors. First, the timbre is preserved from the studio recording of the announcer whose voice serves as the prototype. Second, we preserve the intonation features of the speaker's voice by applying a statistically calculated voice model built from that speaker's studio recordings.
When choosing the fairy tales, we were guided, of course, by the New Year and Christmas theme, by the absence of copyright restrictions, and by personal preference. The collection turned out to be varied: it includes well-known Russian folk tales (“Two Frosts”, “Little Fox-Chanterelle”, “Snow Maiden”) and European classics by the Brothers Grimm and H. C. Andersen. We also found lesser-known but charming Christmas tales, "The Tailor of Gloucester" and "Little Twickly". And, of course, we could not pass over O. Henry's wonderful parable “The Gift of the Magi”, which is not really a fairy tale but a very touching story about true love.

How the voicing went

To begin with, we assigned the roles: male characters mostly went to male voices and female characters to female voices. There were exceptions, but they were rare. The narrator's text was read by both male and female voices, simply for variety. Some characters received an additional characteristic timbre. For example, the mice were voiced by a female voice whose timbre was further shifted toward that of “small” cartoon characters.
Text preprocessing before voicing was minimal. We inserted SSML “voice” tags into the text to set the voice name for each passage, along with additional tags for tuning the voice. Sometimes we had to specify the stress position in an ambiguous word: “They are running, they are dear”.
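The role assignment and voice tagging described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the character names and the voice names (`VoiceA` and so on) are placeholders, and only the standard SSML `voice` element with its `name` attribute is assumed. A real synthesis service may also require extra attributes on `speak`.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical cast: character -> voice name. These names are placeholders,
# not the real voices available on Voice Fabric.
CAST = {
    "narrator": "VoiceA",
    "old_man": "VoiceB",
    "old_woman": "VoiceC",
}


def to_ssml(lines, cast):
    """Wrap each (character, text) pair in an SSML <voice> element."""
    parts = ["<speak>"]
    for character, text in lines:
        voice = cast[character]
        # quoteattr adds the surrounding quotes; escape handles &, < and >.
        parts.append(f"  <voice name={quoteattr(voice)}>{escape(text)}</voice>")
    parts.append("</speak>")
    return "\n".join(parts)


script = [
    ("narrator", "Once upon a time there lived an old man & an old woman."),
    ("old_woman", "Where have you been, old man?"),
]
ssml = to_ssml(script, CAST)
print(ssml)
```

Keeping the cast in one dictionary makes it easy to swap a voice for a whole character without touching the fairy tale text itself.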

Our goal was a natural sound, so the voices had to convey the characters' emotions. And our voice robots can do that. For example, they can laugh: just add a cheerful emoticon to the text. They can also sigh: just add a sad emoticon.
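As a small sketch of how such cues might be added to a script before synthesis: the exact emoticon spellings are an assumption here (`:)` for a laugh, `:(` for a sigh), since the text above only says “cheerful” and “sad”, and the helper names are hypothetical.

```python
# Assumed emoticon spellings; the synthesizer's actual markers may differ.
CUES = {"laugh": ":)", "sigh": ":("}


def add_cue(sentence, cue):
    """Append the emoticon for the given cue ("laugh" or "sigh") to a sentence."""
    return f"{sentence} {CUES[cue]}"


def count_cues(text):
    """Count how many times each cue marker appears in a prepared text."""
    return {name: text.count(mark) for name, mark in CUES.items()}


prepared = " ".join([
    add_cue("What a fine frost today!", "laugh"),
    add_cue("But the road home is long.", "sigh"),
])
print(prepared)
print(count_cues(prepared))
```

Counting the markers afterwards is a cheap way to check that a long text was not over-decorated with laughter and sighs.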

Expressing other emotions turned out to be harder. It turned out that our robots are not trained to express negative emotions. In the fairy tale “Little Fox-Chanterelle”, the old woman scolds the old man very gently and tenderly.


And the voice sounds not at all elderly, but melodious and young. Here one might exclaim, in the spirit of Stanislavsky, “I don't believe it!”
A funny moment in the same fairy tale: “The fox jumped and landed headfirst in a tub of tEstom.” It is immediately clear that the actress has a technical education and is a long way from home economics.
Admittedly, our New Year project is more of a creative initiative. But our synthesized voices are also used in a real theater. The Speech Technology Center is launching a project at the Alexandrinsky Theater in St. Petersburg: a speech synthesizer is being developed that can read a literary text while preserving the speech and emotional characteristics of a given actor's voice.

What happened?

We launched a page with the fairy tales on Voice Fabric, the Speech Technology Center portal dedicated to speech synthesis technologies, and announced it on the company's VKontakte and Facebook pages.
We were very worried about how children would receive our project. However, readers liked our fairy tales. More than half of the survey participants on VKontakte rated the project positively. Naturally, there was criticism that no robot can replace parents reading books to their children. And we fully agree! But introducing children to the world of high technology in an engaging, playful way is certainly possible. In addition, this technology has an important feature: it makes it possible to read fairy tales to visually impaired children.
What do you think?

P.S. Most recently, an English-speaking colleague, Carol, joined our team of robots, and we will definitely come up with something interesting for her creative debut!

Source: https://habr.com/ru/post/207536/
