Understand me if you can

Last week we received a letter from a candidate who did not pass the English-language interview. It turned out that our colleagues had unwittingly taken part in a technical experiment run in parallel with the interview. We are publishing the letter with minimal edits and thank the author for an interesting idea and the courage to carry it out.

“I suppose I am not a fit for you, because unfortunately I failed the English test. Yes, I lack practice in spoken English, but that does not stop me from reading datasheets and corresponding with foreign support by email. Actually, that is not the point. Knowing in advance that I would not pass the language proficiency test, I could not pass up the chance to apply a technical approach to the problem. Although I had no opportunity to try my method beforehand, especially in conversation with someone who is fluent in English and has an ear for pronunciation, I decided to try.
I must apologize to the employee who conducted the test, both for the experiment and for the poor sound quality caused by the technical peculiarities of "my" setup.

As compensation for taking part in the experiment, I will describe the idea behind it. It is not new, but I think it will interest your technical specialists, and with adequate skills and a team effort it could lead to an interesting result and even a commercial product.

The essence is this: I used two computers connected to the Internet and Google Translate with speech recognition and speech synthesis. To do this, I connected the analog audio path of my mobile phone to the sound cards of the two system units.

The first system unit was configured to translate from English to Russian, so the phone's audio output was connected to the line input of its sound card. In the sound card settings, the line input was mirrored to the headphone output of my headset, so I heard the original speech and saw Google Translate recognize it correctly.

The second system unit was configured to translate from Russian to English; its microphone input was connected to the microphone of the headset I was wearing. I connected this unit's audio output to the analog path of the phone.

Thus I heard the original speech, saw the recognized English text and its translation, and, by speaking my reply to the second computer, had it synthesized into English speech. Since Google speaks in a female voice, I used digital filters in Fruity Loops to lower it to a male pitch and sent the processed sound to the analog path of the phone.
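As an illustration of the chain the letter describes, here is a minimal sketch of one translation direction as a sequence of pluggable stages. It is not the author's code: the class name `TranslationChain` and the stage names `recognize`, `translate`, `synthesize` and `post_filter` are hypothetical, and the stages are passed in as plain functions rather than calls to any real Google service.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


def _identity(audio: bytes) -> bytes:
    """Default post-filter: pass the audio through unchanged."""
    return audio


@dataclass
class TranslationChain:
    """One direction of the setup: audio in -> text -> translation -> audio out.

    All stage names are hypothetical placeholders for whatever recognizer,
    translator, synthesizer and pitch filter are actually used.
    """
    recognize: Callable[[bytes], str]    # speech recognition
    translate: Callable[[str], str]      # text translation
    synthesize: Callable[[str], bytes]   # speech synthesis
    post_filter: Callable[[bytes], bytes] = _identity  # e.g. pitch lowering

    def process(self, audio: bytes) -> Tuple[str, str, bytes]:
        heard = self.recognize(audio)            # text shown on screen
        translated = self.translate(heard)       # translation shown on screen
        spoken = self.post_filter(self.synthesize(translated))
        return heard, translated, spoken


# Toy stand-ins so the sketch runs without any network or audio hardware.
toy_dictionary = {"привет": "hello"}
outgoing = TranslationChain(
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: toy_dictionary.get(text, text),
    synthesize=lambda text: text.encode("utf-8"),
)
heard, translated, spoken = outgoing.process("привет".encode("utf-8"))
```

In the setup from the letter, two such chains would run side by side, one per system unit: English to Russian for the incoming speech and Russian to English, with the pitch filter, for the replies.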

Although I failed the English test, the cause was not faulty recognition. I was let down by the bottleneck of the system, namely the 3G Internet connection; unfortunately, I have no faster line. In the morning, as I understand it, the base station was less loaded and I had enough bandwidth. But after lunch the network faltered at the most inopportune moment.

The result of the experiment: I held out for the first few sentences, then speech synthesis could not keep up, I switched to answering on my own, and failed. Although I could see the correct translation, I cannot build sentences quickly. Amusingly, the interviewer did not notice the switch from robot to human and remarked only that the comfort noise had disappeared, which made them think during pauses that the connection had dropped completely.

The system needs improvement: the microphone signal should be pre-processed to cut off the noise that prevented the silence Google relies on. Then it could stop processing during pauses and isolate the voice component, which would reduce outgoing traffic, speed up the connection, and avoid losing UDP packets. With that, the system becomes quite usable in the field. Port it to two Raspberry Pi boards and you get a real-time translator.
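The preprocessing step can be illustrated with a simple energy-based noise gate. This is only a sketch under assumptions, not the DSP algorithm the author has in mind: the function names, the frame size, the threshold and the "hangover" of a few frames kept after speech are all illustrative choices, and the samples are assumed to be signed 16-bit integers.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def noise_gate(samples: Sequence[int],
               frame_size: int = 160,
               threshold: float = 500.0,
               hangover_frames: int = 2) -> List[int]:
    """Replace quiet frames with digital silence.

    A frame is kept if its RMS level reaches `threshold`, or if it falls
    within `hangover_frames` frames after the last loud frame (so word
    endings are not clipped). All other frames are zeroed.
    """
    out: List[int] = []
    hang = 0
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        if rms(frame) >= threshold:
            hang = hangover_frames      # speech: keep and re-arm hangover
            out.extend(frame)
        elif hang > 0:
            hang -= 1                   # quiet, but right after speech: keep
            out.extend(frame)
        else:
            out.extend([0] * len(frame))  # background noise: mute
    return out


# Toy signal: one noise frame, one loud frame, three noise frames.
noise = [20, -20] * 80
speech = [4000, -4000] * 80
gated = noise_gate(noise + speech + noise * 3, hangover_frames=1)
```

With true silence between phrases, the recognizer has a clear cue for where an utterance ends, and the muted frames compress well or need not be sent at all.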

I cannot implement this idea on my own: I need a DSP for the preprocessing, which means designing the circuit, laying out a printed circuit board, writing the processing algorithm for the DSP (although such algorithms are publicly available), and then writing a program for the Raspberry Pi with convenient functionality and proper interaction with the Google API. The task is within my abilities, but it is better suited to a development team; it simply has too many subtasks.

I think the idea is interesting and will find a buyer; a company can implement it easily, and it will not be left gathering dust on the shelves of my own enthusiasm.

Thank you for considering my candidacy!”

Source: https://habr.com/ru/post/398561/
