
Summer 2015. Exam session passed successfully. A normal person would probably say: “Hurray! Freedom! I'll play football all day and fly off to the seaside in Turkey.” But not a real researcher with an inquisitive mind. I decided that, in any case, I would work on some project of my own... But time was slipping away unproductively. And then a bright idea came to me: why not go for an internship at Yandex? Surely they have plenty of interesting research problems. Moreover, it is invaluable experience at a huge company full of professionals in their fields, with a lot to learn from them. Today I want to share how to get an internship at Yandex, what you can do there, and what awaits you afterwards.
First, a few words about myself. My name is Muammar, I am 21, and I am currently a fifth-year student at the Faculty of Mechanics and Mathematics of Moscow State University. I am also a ShAD graduate, I lead seminars on Natural Language Processing at ShAD, and I am a junior developer in the Yandex speech technology team. I am no super-genius, but I love to work and know how to. Enough self-praise; let's talk about the internship. If you're interested, read on!
Team selection process
So, I decided to spend the summer working at Yandex. Without thinking twice, I wrote to Yulia Krivova, the curator of student programs, said that I wanted to work on challenging problems, and did not forget to mention that I was interested in NLP. As they say on “Field of Miracles” (the Russian TV game show): taking this opportunity, I'd like to say hi to my mom and convey huge thanks to Yulia. She put a lot of effort into finding a team that I ended up liking. In general, the process of finding a team deserves special attention in my story. Besides, everyone always wonders what is asked at those “terribly scary” Yandex interviews and how to prepare for them, and best of all would be a ready-made recipe for passing them!
Initially, the machine translation team became interested in me. They had no internship vacancy, but they still decided to talk to me. Everything was fine, except that what I took for a heart-to-heart conversation turned into a technical interview by the end. I was not mentally prepared for this, and as a result I was terribly slow and solved everything only with a thousand hints. I left with a heap of thoughts about my own insignificance. Still, a few days later Yulia said that, on the whole, they liked me, so I should wait for an internship opening.
Some time later, one of my ShAD information retrieval teachers, Sasha Bolkhovityanov, got in touch: his team, duplicate search, happened to need a junior developer. A week later I was invited to three back-to-back in-person technical interviews: algorithms, system design, and C++. I knew algorithms and data structures perfectly, and nothing supernatural was asked anyway. In the design interview, with questions like “how would you build a distributed fault-tolerant system” and “what if in this system...”, quick thinking helped. But C++ was a serious problem. Although I understood how to write a program and what any particular line of code does, at the time I had little idea how it all works at a deeper level, in memory. I am a sociable person, so to soften the fiasco somewhat, I had a heart-to-heart chat with the interviewer. Overall I made a positive impression, but the tasks they offered did not inspire me personally. And I firmly decided: better a small salary and tasks that interest me; let me go to machine translation as an intern rather than to duplicate search as a junior developer.
Time passed, but otherwise everything stood still... but not for long! I don't remember exactly when, but for dramatic effect let's say it was three o'clock in the morning when I received a letter from Yulia: the voice interface development team was looking for someone! “Yes, this is pure NLP,” flashed through my head. Information extraction from text, named entity recognition, chat bots: I really did catch fire. First, the team arranged a Skype interview on Python, checking my knowledge of Bash along the way. If I did well on Python, as for Bash, I didn't even know what it was! After the interview they gave me a sizable homework assignment: write an interactive agent that, given a human query such as “What is the Eiffel Tower?”, would find the most relevant Wikipedia articles. I still think I did a solid job, but, be that as it may, their team lead said the team needed an experienced developer, not an intern who would have to be coddled and taught everything.
The only team left was speech recognition. To be honest, I knew nothing about this field, and I only vaguely understood where NLP fit into it. My current team lead, Ilya Edrenkin (iliia), assured me that I would like it there, and the problems they were working on seemed extremely challenging to me. Two more interviews awaited me: first, on advanced algorithms, where besides the usual knowledge a lot of ingenuity was required, and second, on machine learning. I passed them not brilliantly, but decently. Just a day or two later I received a letter from Yulia saying that they were ready to take me on. Hurray, comrades, happiness has come!
Separately, I would like to mention a few things that, in my opinion, help in interviews. First, not being nervous: I went to Yandex in the mood that they were not interviewing me, but I was interviewing all of them, and the outcome did not matter to me: if they take me, fine; if not, thank God as well. Second, even when I was slow somewhere, I still tried to keep the conversation going, to talk about general topics, and to present myself in a positive light. Third, I never spent more than three hours preparing for an interview, and all I did in preparation was refresh my knowledge of the relevant area.
Technical aspect of the internship
Each intern at Yandex is given a small but fairly creative task, on the assumption that they will solve it in 3 to 6 months. The word “solve” should not be taken literally. Sometimes it really is a matter of solving a small problem; in other cases it may mean improving metrics; and sometimes you need to come up with a new method or approach to the problem. In my case, I had to learn how to estimate the degree of confidence in the recognition of phrases and individual words by our ASR system (ASR: Automatic Speech Recognition). Let me remind you how speech recognition works. Converting sound into text consists of several stages.
- Pre-processing. We split the recording into 25 ms frames with a 10 ms step. (Possibly we also try to reduce noise with some methods.)
- Feature extraction. Numerical features are extracted from each frame.
- Transforming features into a distribution over phonemes (in a real model these are not phonemes but senones, which are classes of context-dependent phonemes). The features of several neighboring frames are concatenated, and some kind of neural network (DNN, LSTM, GRU, whatever your heart desires) is applied to the result. At the output we get a probability distribution over phonemes.
- Decoding. Then, taking into account the language model, the lexicon, and our model of the world, the resulting probability distributions are decoded, and we get a sequence of words.
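The first three stages can be sketched roughly as follows. This is a toy illustration, not Yandex's pipeline: the 16 kHz sampling rate, the single log-energy feature (real systems use filterbanks or MFCCs), and the ±2 frame context window are my assumptions; only the 25 ms / 10 ms framing comes from the text.

```python
import math
from typing import List

SAMPLE_RATE = 16_000        # assumed sampling rate in Hz (not stated in the text)
FRAME_MS, STEP_MS = 25, 10  # frame length and step from the text


def split_into_frames(signal: List[float], sample_rate: int = SAMPLE_RATE) -> List[List[float]]:
    """Cut a signal into overlapping 25 ms frames, one every 10 ms."""
    frame_len = sample_rate * FRAME_MS // 1000  # 400 samples at 16 kHz
    step = sample_rate * STEP_MS // 1000        # 160 samples at 16 kHz
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // step
    return [signal[i * step:i * step + frame_len] for i in range(n_frames)]


def log_energy(frame: List[float]) -> float:
    """Toy per-frame feature; real systems use filterbanks or MFCCs instead."""
    return math.log(sum(x * x for x in frame) + 1e-10)


def splice(features: List[List[float]], context: int = 2) -> List[List[float]]:
    """Concatenate each frame's features with `context` neighbours on each side,
    repeating the first/last frame at the edges of the recording."""
    n = len(features)
    spliced = []
    for t in range(n):
        window: List[float] = []
        for k in range(t - context, t + context + 1):
            window.extend(features[min(max(k, 0), n - 1)])
        spliced.append(window)
    return spliced


# One second of a 440 Hz tone as a stand-in for speech.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
frames = split_into_frames(tone)
feats = [[log_energy(f)] for f in frames]
net_input = splice(feats, context=2)  # what the acoustic network would consume
print(len(frames), len(net_input[0]))  # 98 5
```

One second of audio yields 98 overlapping frames; after splicing, each network input holds the features of five consecutive frames.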
The whole problem lies in step 3. The neural network is a discriminative model. It tells how well one phoneme can be distinguished from another in a given fragment of sound, but not how well or badly a particular phoneme is represented in that fragment. Here is an analogy. Some people we know so well that we can describe them completely: that is a generative model. And sometimes we have seen a person and something stuck in our memory; we cannot describe them, but if shown a photo of an arbitrary person, we can say whether it is them or not: that is a discriminative model. The problem with such a model is this. Suppose we have a certain frame on which the sound “n” was pronounced, once in clean conditions and once in noisy conditions. If in both cases this sound is, in principle, distinguishable from other sounds, then the neural network may well assign it the same probability in both cases, but we would still like to tell these two cases apart.
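A deliberately simplified way to see why a discriminative output cannot tell clean from noisy: a softmax posterior depends only on the differences between class scores. In this sketch the class scores and the uniform "noise" shift are hypothetical; real noise does not lower all scores equally, but the posterior still encodes only how the classes compare with each other, not how well the frame matches any of them.

```python
import math


def softmax(logits):
    """Posterior over classes; only differences between logits matter."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical network scores for the classes ("n", "m", "l") on one frame.
clean = [5.0, 1.0, 0.0]
noisy = [z - 3.0 for z in clean]  # toy assumption: noise lowers every score equally

p_clean, p_noisy = softmax(clean), softmax(noisy)
print([round(p, 3) for p in p_clean])
print(all(abs(a - b) < 1e-12 for a, b in zip(p_clean, p_noisy)))  # True
```

The two posteriors are identical, even though in the "noisy" case the frame matches every class worse.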
So, for the reasons above, the ASR system has a poor sense of how clean or noisy the recording was. And it is important for us to know this, because if the recording is very noisy, it is better to ask the person to repeat their request than to recognize it and give them complete nonsense. The degree of certainty about a recording is called confidence. Phrase confidence is also useful for selecting which recordings to send to assessors for labeling: only those with medium confidence. High-confidence recordings are recognized well, low-confidence ones are just noise, but medium-confidence ones are the most useful, since these are the hard examples for our system. And the more hard examples we give the speech recognition system, the faster it learns (just like a person: if you keep giving them 2 + 2, they will never learn to compute integrals).
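The two uses of phrase confidence described above could be wired up roughly like this. The thresholds and the recording IDs are hypothetical; in practice they would be tuned on held-out data.

```python
from typing import Dict, List

REPROMPT_BELOW = 0.3      # hypothetical: below this, ask the user to repeat
LABEL_BAND = (0.3, 0.8)   # hypothetical "medium confidence" band for assessors


def should_reprompt(phrase_confidence: float) -> bool:
    """Live interaction: a very low confidence suggests noise, so ask again."""
    return phrase_confidence < REPROMPT_BELOW


def select_for_labeling(recordings: Dict[str, float]) -> List[str]:
    """Offline: keep only medium-confidence recordings, the hard examples.
    High-confidence ones are already recognized well; low ones are mostly noise."""
    lo, hi = LABEL_BAND
    return sorted(rec_id for rec_id, conf in recordings.items() if lo <= conf < hi)


batch = {"rec1": 0.95, "rec2": 0.12, "rec3": 0.55, "rec4": 0.71}
print(select_for_labeling(batch))      # ['rec3', 'rec4']
print(should_reprompt(batch["rec2"]))  # True
```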
Generally speaking, the problem of estimating confidence has two related subtasks:
- estimating the confidence of each word separately,
- estimating the confidence of the phrase as a whole.
In the first case, it is a classification task: either we recognized the word correctly (class 1) or we made a mistake (class 0). In the second case, it is a regression task. I forgot to mention that recognition quality is measured by the WER (Word Error Rate) metric. So in the second case we try to predict 1 - WER, and we call this value phrase confidence. It is also worth noting that if we can solve the first task well, then the average of the word confidences is a good estimate of 1 - WER, so that is where I focused most of my effort.
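To make the two targets concrete, here is a minimal sketch of computing WER with a word-level Levenshtein alignment, deriving 0/1 labels for each recognized word (the classification target), and averaging word confidences as an estimate of 1 - WER. Labeling hypothesis words by alignment is my assumption of how such targets are typically built; the word confidences in the example are made up.

```python
from typing import List, Tuple


def align(ref: List[str], hyp: List[str]) -> Tuple[int, List[int]]:
    """Word-level Levenshtein alignment.

    Returns the edit distance (substitutions + deletions + insertions) and,
    for every hypothesis word, a label: 1 if it is aligned to an identical
    reference word, 0 if it is a substitution or an insertion.
    """
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]  # d[i][j]: cost of ref[:i] -> hyp[:j]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    labels = [0] * m
    i, j = n, m
    while i > 0 and j > 0:  # backtrace; leftover hyp words stay labelled 0
        cost = 0 if ref[i - 1] == hyp[j - 1] else 1
        if d[i][j] == d[i - 1][j - 1] + cost:
            labels[j - 1] = 1 - cost
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return d[n][m], labels


def wer(ref: List[str], hyp: List[str]) -> float:
    if not ref:
        raise ValueError("WER is undefined for an empty reference")
    return align(ref, hyp)[0] / len(ref)


def phrase_confidence(word_confidences: List[float]) -> float:
    """Estimate of 1 - WER: the average of the per-word confidences."""
    return sum(word_confidences) / len(word_confidences)


ref = "the grass is green".split()
hyp = "the glass is green".split()
dist, labels = align(ref, hyp)
print(dist, labels, 1 - wer(ref, hyp))  # 1 [1, 0, 1, 1] 0.75
print(phrase_confidence([0.9, 0.3, 0.95, 0.85]))
```

If a classifier gave the misrecognized word a low confidence and the others high ones, their mean lands close to the true 1 - WER of 0.75.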
Did you think the technical details ended there? Not yet. This is probably not very secret information, so I will describe the final solution in general terms. For word confidences, two types of features were used: acoustic and linguistic. The acoustic features were an attempt to deal with the discriminative nature of the neural network. We collected statistics over the entire training corpus (average phoneme duration, etc.), and then, when recognizing a specific phrase, we compared the parameters of each word with the corpus averages. This softened the discriminativeness and brought in some information about the outside world. As linguistic features we took n-gram probabilities, in order to catch various inconsistencies of the “green grass” kind. Finally, a gradient boosting model was trained on top of all this, and voilà: we have word-level confidences! The results were quite decent: a quality gain of about 25–50% (depending on the model) relative to the baseline (sMBR). Only two features were used to estimate phrase-level confidence: the average word confidence and the degree to which the N best hypotheses produced by the ASR system during recognition differ from each other (the idea is that if the hypotheses differ greatly, the phrase was most likely pronounced unclearly, and its WER will be high).
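The features described above can be sketched roughly as follows. This is not the team's code. The specific choices are my assumptions for illustration: phoneme duration z-scores as the acoustic statistic, an add-one-smoothed bigram model as the n-gram feature, and `difflib.SequenceMatcher` as the measure of N-best disagreement. All corpus numbers are made up.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


# Acoustic: compare a word's phoneme durations with corpus-wide statistics.
def acoustic_features(phonemes: List[Tuple[str, float]],
                      corpus_stats: Dict[str, Tuple[float, float]]) -> List[float]:
    """phonemes: (phoneme, duration in ms) pairs for one recognized word;
    corpus_stats: phoneme -> (mean duration, std) over the training corpus.
    Returns the mean z-score and the largest absolute z-score."""
    z = [(dur - corpus_stats[ph][0]) / corpus_stats[ph][1] for ph, dur in phonemes]
    return [sum(z) / len(z), max(abs(v) for v in z)]


# Linguistic: bigram log-probability with add-one (Laplace) smoothing.
class BigramLM:
    def __init__(self, sentences: List[List[str]]):
        self.history: Counter = Counter()  # how often each token precedes another
        self.bigrams: Counter = Counter()
        vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence + ["</s>"]
            vocab.update(tokens)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def logprob(self, prev: str, word: str) -> float:
        return math.log((self.bigrams[(prev, word)] + 1) /
                        (self.history[prev] + self.vocab_size))


# Phrase level: how much the N-best hypotheses disagree with each other.
def nbest_divergence(hypotheses: List[List[str]]) -> float:
    """Mean pairwise dissimilarity (1 - SequenceMatcher.ratio) of word lists."""
    pairs = list(combinations(hypotheses, 2))
    if not pairs:
        return 0.0
    return sum(1 - SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def phrase_features(word_confidences: List[float], hypotheses: List[List[str]]) -> List[float]:
    """The two phrase-level features: mean word confidence and N-best divergence."""
    return [sum(word_confidences) / len(word_confidences), nbest_divergence(hypotheses)]


stats = {"g": (60.0, 15.0), "r": (50.0, 10.0), "a": (90.0, 20.0), "s": (80.0, 20.0)}
print(acoustic_features([("g", 65.0), ("r", 20.0), ("a", 95.0), ("s", 85.0)], stats))
lm = BigramLM([["green", "grass"], ["the", "grass", "is", "green"]])
print(lm.logprob("green", "grass") > lm.logprob("grass", "green"))  # True
nbest = [["turn", "on", "the", "light"], ["turn", "on", "the", "night"]]
print(nbest_divergence(nbest))  # 0.25
```

An unusually short "r" stands out with a z-score of -3, a word pair never seen in that order gets a lower n-gram score, and N-best lists that disagree on one word out of four get a nonzero divergence.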
A few words about the experiments themselves. They do not always succeed, and rarely right away. There were two subtasks: coming up with features and choosing a machine learning algorithm. Sometimes you have lots of feature ideas, you try them, and none gives an improvement. You try others, and again no improvement. This happens often in research. Patience, and more patience. I told myself firmly: keep trying, come what may, and never get discouraged. It only seems that you come up with features, bang-bang, and there's your improvement; in reality, behind it lies painstaking work and a lot of failed experiments.
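As for the algorithm, the text only says that gradient boosting was trained on top of the features. For readers who have not met the technique, here is a minimal pure-Python sketch of gradient boosting with decision stumps for the binary “word correct / incorrect” task. The two features and the toy data are hypothetical. Each stump is fitted by least squares to the negative gradient of log loss (y - p); mature libraries use more refined leaf values and deeper trees, and a real system would use one of them rather than this.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Least-squares regression stump fitted to the pseudo-residuals r."""
    best, best_err = None, math.inf
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[f] <= thr]
            right = [ri for x, ri in zip(X, r) if x[f] > thr]
            lv = sum(left) / len(left)  # never empty: thr is a data value
            rv = sum(right) / len(right) if right else lv
            err = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            if err < best_err:
                best, best_err = (f, thr, lv, rv), err
    return best


def predict_stump(s: Stump, x: List[float]) -> float:
    f, thr, lv, rv = s
    return lv if x[f] <= thr else rv


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_boosting(X, y, n_rounds: int = 50, lr: float = 0.3):
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    base = math.log(p0 / (1 - p0))  # start from the prior log-odds
    F = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(Fi) for yi, Fi in zip(y, F)]  # negative gradient
        s = fit_stump(X, residuals)
        stumps.append(s)
        F = [Fi + lr * predict_stump(s, x) for Fi, x in zip(F, X)]
    return base, lr, stumps


def predict_proba(model, x: List[float]) -> float:
    """Word confidence: probability that the word was recognized correctly."""
    base, lr, stumps = model
    return sigmoid(base + lr * sum(predict_stump(s, x) for s in stumps))


# Hypothetical features per word: [max |duration z-score|, bigram log-prob].
X = [[0.1, -2.0], [0.3, -1.5], [0.2, -2.5], [2.5, -6.0],
     [3.0, -5.5], [2.8, -7.0], [0.0, -3.0], [2.2, -2.0]]
y = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = word recognized correctly
model = fit_boosting(X, y)
print([round(predict_proba(model, x), 2) for x in X])
```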
Finally, I'll note that Ilya Edrenkin suggested calling confidences “Muammar numbers.” It was quite funny: I occasionally heard these words when colleagues were discussing something, but each time I thought I had misheard. Then at one of the team meetings they told me: “Don't you know that confidences have been called Muammar numbers for a month already?” Well, after that the name somehow stuck :).
Cons and pros of an internship at Yandex
Among the minuses, I would note two things. The first is that from time to time you have to do things outside your own work. This happens not because everyone around is evil and wants to shove some of their duties onto you, but simply because sometimes there is nobody else to do it and the matter is important. The second is that, having reached the level of Yandex, it is easy to lose the incentive to move on. Even if you end up in a team you don't like, you may still stay, because “well, it's Yandex.”
Now about the pros. As you may have guessed, I am a fairly religious person, so the first advantage I'll mention is the possibility of observing religious practices. At Yandex, people are valued not for their faith or their political or other convictions, but for how well they do their work, and everything is arranged to make work more convenient. So even on such a sensitive issue, Yandex is willing to accommodate. Praying five times a day is no problem at all: I find a free meeting room, and there I can even stand on my head. But that's not all. As many probably know, every adult Muslim man is obliged to attend the Friday sermon. So I had firmly decided for myself that if there were no opportunity to attend the sermon, such a job would not suit me. Although the schedule is flexible, it happened by chance that our weekly team meeting fell on Friday, exactly during the Friday sermon. I immediately said: “Ilya, I'm sorry, but I won't be able to attend at this particular time.” To my great surprise, for the sake of an intern who had not even worked a week on the team, about ten people (some of them old enough to be my father, with many years of experience) agreed to move the meeting to a time convenient for me. To say that such an attitude from the team members is pleasing and endearing is to say nothing.
The second plus: Yandex provides many opportunities for professional development. You can attend a variety of scientific seminars, research talks, and much more. I even managed to fly to a conference with ShAD, but that is the exception rather than the rule! Moreover, acquiring new knowledge in computer science is part of your job. Yandex wants its employees to become smarter! I also have a hobby, teaching: I simply like teaching people things. “No problem at all,” says Yandex: “Teaching at ShAD? Please! Speaking at scientific seminars? Delighted!” Continuing the research theme, I'd note that in our team each person is allocated quite a lot of computing power by my standards, so if you want to pursue your own research outside working hours, go for it; nobody will reproach you.
And the third plus: the comfortable Moscow office makes for comfortable work. It's hard not to mention it, but I won't go on about it here; the topic is well known.
Is there life after the internship?
Some may think that only the internship is fun and joyful, and as soon as you become a developer, the hard working days begin. That's not so... at least, not always! For example, a considerable part of my work consists of reading scientific papers on speech recognition, then modifying and implementing their methods. By the way, this is another reason I advise you to be selective when choosing a team: I love this terribly, but to someone else it may seem terribly boring. Choose an area you like, so that you work with a smile rather than a sour face! To sum up, I repeat: an internship at Yandex is a worthwhile way to spend the summer. It may well lead to an interesting permanent job later.