
"I hear voices", or does Siri have a face?

We hear voices all the time: in the metro, in navigation apps, and on our smartphones. While there is no doubt that the voices in the subway belong to real people, the question of who voices virtual assistants and robots may soon stop having such a clear-cut answer.



On the other hand, voice actors need not fear for their jobs just yet: even for the voice of the BB-8 robot in Star Wars, the filmmakers brought in Bill Hader, a star of NBC's famous American show Saturday Night Live. All the details are in today's article.



Photo: Vancouver Film School, CC BY


Siri



Almost everyone has heard the voice of the American version of Siri, but few people realize that it belongs to a real person: professional voice actress Susan Bennett. Ironically, while making the recordings, the actress herself had no idea that her voice would one day come out of every pocket. The recordings were made for a text-to-speech company whose technology Apple later adopted.



In 2005, Susan spent 20 hours a week in a recording studio, and they were very intense hours: she often had to take breaks, drink plenty of water, and read out absolute nonsense made up of unrelated words. For the recorded sounds to be combined later into words that sound natural, every possible combination of sounds in the language has to be pronounced. Finishing the voice work in 2011 took another 4 months, even though the "voice of Siri" worked only two hours a day.
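Recording "every possible combination of sounds" is what concatenative synthesis needs: the engine later cuts the recordings into short units and stitches them together to form new words. The sketch below is a minimal illustration, assuming diphone units (the transition between two adjacent sounds) and a simple linear crossfade at each seam. The inventory, phoneme names, and sample values are hypothetical toy data; real engines also pick among many candidate units and smooth pitch and duration, which is not shown.

```python
from typing import Dict, List, Sequence

# Hypothetical diphone inventory: each key names a transition between two
# adjacent sounds ("sil" = silence), each value is a list of audio samples.
# A real inventory holds thousands of units cut from hours of studio takes.
Inventory = Dict[str, List[float]]


def to_diphones(phonemes: Sequence[str]) -> List[str]:
    """Turn a phoneme sequence into the diphones covering its transitions."""
    padded = ["sil", *phonemes, "sil"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def crossfade_concat(units: List[List[float]], overlap: int) -> List[float]:
    """Join units end to end, linearly crossfading `overlap` samples per seam."""
    out: List[float] = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises toward 1
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])  # the rest of the unit is appended unchanged
    return out


def synthesize(phonemes: Sequence[str], inventory: Inventory,
               overlap: int = 2) -> List[float]:
    """Look up every needed diphone and glue the recordings together."""
    needed = to_diphones(phonemes)
    missing = [d for d in needed if d not in inventory]
    if missing:
        # A gap in the inventory makes the word unpronounceable, which is
        # why the actor has to record every sound combination in advance.
        raise KeyError(f"no recording for diphones: {missing}")
    return crossfade_concat([inventory[d] for d in needed], overlap)
```

For example, with a toy inventory containing only `sil-h`, `h-i`, and `i-sil`, `synthesize(["h", "i"], inventory)` succeeds, while any word that needs an unrecorded transition raises `KeyError`.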

Susan Bennett herself tells more about Siri and how the recording went in her TED talk:





The actress is worried that voice actors' rights are not protected: their voices can be used for any purpose, and they receive no additional payment even for such commercial use.



The British male version of Siri, named Daniel, was voiced by TV and radio presenter Jon Briggs, who also did not know his voice would be used for Siri until he saw the commercial on TV. He had recorded the voice for ScanSoft back in 2005; the company was later bought by Nuance, which developed Siri together with Apple. Jon recorded 5,000 sentences in three weeks, but unlike Susan, he is quite satisfied with the fee he received.



Women vs Men



The actress who records the voice for Google Now, however, prefers not to show her face. But you can see how the recording process itself works:





She notes that the process is quite difficult, since you have to speak at the same pace and with the same timbre. You cannot change your voice at any point during the recording, and you have to keep the intonation right. At Google, this is monitored by a team that includes a linguist and a stage speech coach, which ultimately makes the speech sound more natural.



With Microsoft's Cortana, the situation is quite different: the assistant's very image and name were borrowed from the Halo video game series. So Microsoft invited the same actress who had voiced the character of the same name in the games. Jen Taylor knew exactly what her recordings would be used for, and she never hid her identity: in 2012 she even played Cortana in the mini-series “Halo 4: Forward Unto Dawn”.



Most virtual assistants speak in a female voice or have female names. Some even see this as a form of digital sexism. However, research shows that users themselves choose the female voice more often: people find it friendlier, while a male voice is perceived as more aggressive.



This is not always the case, of course: intonation and timbre play a big role. The difference in how two male voices are perceived can be seen in Mark Zuckerberg's home virtual assistant. The assistant is called Jarvis, and with Morgan Freeman's voice it comes across as a very courteous and cultured system:





Riding, riding, riding



Even more people encounter a synthesized voice in navigation apps. The male voice of Yandex.Navigator was recorded by a professional announcer, while a company employee recorded the female version. The recording took only 3 hours, and the script fit on 4 pages, which is very little compared with the voice work for virtual assistants.



The sentences the navigator speaks are assembled from separate words, but the speaker had to read whole phrases from the script so that the result sounded more natural. For the Olympics, Vasily Utkin was invited to voice the navigator: he spent several hours in the studio and recorded 160 phrases. Only 120 of them are used in the navigator, but the developers promised to swap some of them out to add variety to trips. Vasily even came up with some of the phrases himself.
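Assembling prompts from prerecorded phrases can be pictured as building a playlist of audio clips. The sketch below is a hypothetical illustration, not Yandex's implementation: the clip file names, the set of recorded distances, and the `flavor_chance` parameter are all made up. It shows two ideas from the paragraph: whole recorded phrases are chained together, and an occasional extra phrase is picked at random from a pool to add variety.

```python
import random
from typing import Dict, List, Optional

# Hypothetical clip library (all file names are invented for illustration).
# Each entry is a whole recorded phrase, not a single word, so joins sound natural.
DISTANCE_CLIPS: Dict[int, str] = {100: "in_100_m.wav", 300: "in_300_m.wav", 500: "in_500_m.wav"}
MANEUVER_CLIPS: Dict[str, str] = {"left": "turn_left.wav", "right": "turn_right.wav"}
# Interchangeable "flavor" phrases, played now and then for variety.
FLAVOR_CLIPS: List[str] = ["flavor_01.wav", "flavor_02.wav", "flavor_03.wav"]


def nearest_distance(meters: int) -> int:
    """Snap the real distance to the closest distance that has a recording."""
    return min(DISTANCE_CLIPS, key=lambda d: abs(d - meters))


def build_prompt(meters: int, maneuver: str,
                 rng: Optional[random.Random] = None,
                 flavor_chance: float = 0.2) -> List[str]:
    """Return the ordered list of clips to play for one spoken instruction."""
    if rng is None:
        rng = random.Random()
    playlist = [DISTANCE_CLIPS[nearest_distance(meters)], MANEUVER_CLIPS[maneuver]]
    # With probability flavor_chance, append one randomly chosen extra phrase.
    if rng.random() < flavor_chance:
        playlist.append(rng.choice(FLAVOR_CLIPS))
    return playlist
```

Recording phrases rather than isolated words trades a larger script for smoother joins; the random pool is one simple way to keep a fixed set of recordings from sounding repetitive.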



Subway announcements have their own peculiarities. For example, the first recordings with today's metro voices were made more than 20 years ago, which means they were recorded on tape. So the actors had no room for error: if they made a mistake, everything had to be re-recorded from the beginning. And even now, to add new information to a recording, the announcements for the entire line have to be re-recorded.



And it is not only Siri that has a face: so does the Moscow metro. In fact, it has three: actors and radio and TV presenters Julia Romanova-Kutyina, Sergey Kulikovskikh, and Alexey Rossoshansky. Celebrities or children are brought in to record announcements for various holidays. Ordinary people can also influence what is said in the metro. For example, after activists objected to the phrase “Request to vacate the cars”, it was replaced with “Please step out of the cars.”



But in the near future, speech synthesis will look completely different thanks to a development from Google. WaveNet does not assemble speech from fragments of recorded human voice: the program generates the sound wave itself, sample by sample, using convolutional neural networks trained on recordings (listen here).
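In the published WaveNet design (DeepMind, 2016), each new audio sample is predicted from the samples before it, and the network "sees" that past through a stack of dilated causal convolutions: causal means an output never depends on future samples, and dilated means each layer skips over inputs at a growing stride, so the window of context doubles per layer. The sketch below shows only this one building block in plain Python with toy weights. It omits the gated activations, residual connections, mu-law quantization, and training that the real model uses, so treat it as an illustration of the idea rather than Google's implementation.

```python
from typing import List, Sequence


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float],
                        dilation: int) -> List[float]:
    """y[t] = sum_k weights[k] * x[t - k * dilation]; samples before t=0 count as 0.

    The output at time t uses only the present and past inputs, never the future.
    """
    y: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(dilations: Sequence[int], kernel_size: int = 2) -> int:
    """Number of input samples that one output sample can depend on."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Dilations 1, 2, 4, ..., 512 with kernel size 2 give a 1024-sample window,
# matching the block structure described in the WaveNet paper.
WAVENET_BLOCK = [2 ** i for i in range(10)]
```

Feeding a single impulse through layers with dilations 1, 2, and 4 spreads it over exactly 8 output samples, which is what `receptive_field([1, 2, 4])` predicts; doubling the dilation per layer grows the context exponentially while the number of layers grows only linearly.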



Besides voices, it can even imitate music. The technology is still quite expensive, since training the network and processing recordings take a lot of resources and time, but already 50% of people in the control group took WaveNet speech for a human voice. In the future it will be possible to imitate the voice and intonation of any person, although training will still require recordings of real people's voices.






Source: https://habr.com/ru/post/370257/


