
Yandex residency program, or how an experienced backend developer can become an ML engineer



Yandex is opening a machine learning residency program for experienced backend developers. If you have written a lot of C++ or Python and want to apply that knowledge to ML, we will teach you to do applied research and pair you with experienced mentors. You will work on key Yandex services and gain skills in areas such as linear models and gradient boosting, recommender systems, and neural networks for image, text, and audio analysis. You will also learn how to evaluate your models properly with both offline and online metrics.

The program lasts one year, during which participants will work in Yandex's Machine Intelligence and Research division and attend lectures and seminars. Participation is paid and implies full-time employment: 40 hours a week, starting July 1 of this year. Applications are already open and will be accepted until May 1.

And now, in more detail: what kind of audience we are looking for, what the workflow will look like, and, in general, how a backend specialist can switch to a career in ML.

Focus


Many companies run residency programs, including, for example, Google and Facebook. These are mostly aimed at junior and mid-level specialists who are trying to take a step towards ML research. Our program is for a different audience. We invite backend developers who have already gained enough experience and know for certain that they need to move their expertise towards ML, and who want practical skills for solving industrial machine learning problems rather than the skills of a scientist. This does not mean that we do not support young researchers. For them we have organized a separate program, the Ilya Segalovich Award, which also offers a chance to work at Yandex.

Where residents will work


In the Machine Intelligence and Research division, we come up with project ideas ourselves. The main source of inspiration is the scientific literature: papers and trends in the research community. My colleagues and I analyze what we have read and look for ways to improve or extend the methods that researchers propose. Each of us takes into account their own area of knowledge and interests and formulates a task based on the directions they consider important. A project idea is usually born at the junction of external research results and our own expertise.

This system is good because it largely solves the technological problems of Yandex services before they even arise. When a service runs into a problem, its representatives come to us, and most likely they take technologies we have already prepared, which only need to be applied correctly in the product. If something is not ready, at least we can quickly recall where to "start digging" and which papers to search for a solution. As you know, the scientific approach is to stand on the shoulders of giants.

What you will be doing


At Yandex, and even within our division specifically, every current area of ML is being developed. Our task is to improve the quality of a wide variety of products, and that is an incentive to try out everything new. In addition, new services appear regularly. So the lecture program covers all the key, well-proven areas of machine learning in industrial development. In putting together my part of the course, I drew on my experience of teaching at the Yandex School of Data Analysis (ShAD), as well as on the materials and work of other ShAD teachers. I know that my colleagues did the same.

In the first months, coursework will take about 30% of your working time, and later about 10%. However, it is important to understand that work on the ML models themselves will still take about four times less time than all the surrounding processes. These include preparing the backend, obtaining data, writing a pipeline to preprocess it, optimizing the code, adapting it to specific hardware, and so on. An ML engineer is, if you will, a full-stack developer (only with a strong bias towards machine learning) who is capable of solving a problem from beginning to end. Even with a finished model, you will probably need to do a number of other things: parallelize its execution across several machines, and package the implementation as an API endpoint, a library, or a component of the service itself.

Student selection
If you got the impression that the best route to becoming an ML engineer is to work as a backend developer first, that is not the case. Enrolling in ShAD without real experience in service development, studying there, and becoming highly sought after in the job market is a great option. Many specialists at Yandex ended up in their current positions this way. If a company is ready to offer you a job in ML right after university, it is probably worth accepting the offer. Try to get into a good team with an experienced mentor, and get ready to learn a lot.

What usually gets in the way of doing ML


A backend developer who wants to become an ML engineer can, regardless of the residency program, choose between two paths of development.

The first is to study as part of an educational course. Lessons on Coursera will bring you closer to understanding the basic techniques, but to immerse yourself in the profession sufficiently you need to devote much more time to it, for example by graduating from ShAD. Over the years, ShAD has offered a varying number of courses directly on machine learning, about eight on average. Each of them is genuinely important and useful, in the opinion of graduates as well.

The second is to take part in production projects where you need to implement one ML algorithm or another. However, there are very few such projects on the IT development market: most tasks do not use machine learning. Even in banks that are actively exploring ML-related opportunities, only a few people work on data analysis. If you have not managed to join one of these teams, what remains is either to start your own project (where you will most likely set your own deadlines, which has little in common with real production tasks) or to start competing on Kaggle.

Indeed, it is relatively easy to team up with other members of the community and try yourself in competitions, especially if you back up your skills with the training and courses mentioned above. Each competition has a deadline, which will serve as an incentive and prepare you for a similar system in IT companies. This is a good path, though it is also somewhat removed from real processes. On Kaggle, you are given preprocessed, if not always perfect, data; you are not asked to think about the contribution to the product; and, most importantly, you are not required to produce solutions suitable for production. Your algorithms will probably work and achieve high accuracy, but your models and code will resemble Frankenstein's monster, stitched together from different parts. In a production project, such a construction will run too slowly and will be hard to update and extend (for example, language and voice algorithms are always partly rewritten as the language develops). Companies care that this work can be done not only by you (it is clear that you, as the author of the solution, can do it) but also by any of your colleagues. Much has been said about the difference between competitive and industrial programming, and Kaggle trains exactly "athletes", even if it does so very well and lets you gain part of the experience.

I have described two possible lines of development: learning through educational programs and learning in the field, for example on Kaggle. The residency program combines these two approaches. Lectures and seminars at the ShAD level await you, as well as real production projects.

Source: https://habr.com/ru/post/446554/

