
“The main challenge is a personnel shortage”: a panel discussion on selecting data teams. Data Science Week 2017

Hi, Habr! This is the final part of our review of Data Science Week 2017, held in Moscow on September 12-14. Today we cover the panel discussion “Selecting teams for working with data and evaluating their effectiveness”. Olga Filatova, Vice President for Personnel and Educational Projects at Mail.ru Group, moderated, and the participants were Victor Kantor (Yandex), Andrey Uvarov (MegaFon), Pavel Klemenkov (Rambler & Co) and Alexander Yerofeyev (Sberbank).



- Colleagues, tell us about your company and its data science team. How is the team organized and how many people are on it? What interesting projects can you tell us about? What challenges do you encounter most often?

Andrey (MegaFon): I am the head of analytical services at MegaFon; in practice, I am responsible for big data analytics in the company's technical unit. Analytical competencies at MegaFon are currently distributed across different divisions. Ours has about 18 people, and we are actively growing. As for how we organize our work, we use dynamic teams: when a new project appears, we put together a team with the skills needed to deliver it. These teams are not only data scientists: they also include data engineers for the pipelines, developers who put models into production, testers and business analysts.
Our customers are MegaFon's various internal functions, whether the goal is to increase revenue or reduce risk. Any division of the company can come to us and describe its “pain”, and then we work out how to solve the problem.

Among our latest projects, one worth highlighting is choosing locations for new base stations. It is no secret that user satisfaction depends directly on how we develop our network, so it is very important to understand where stations need to be built. A whole range of tasks is tied to this: predicting traffic growth down to each base station, and analyzing customer preferences, locations and traffic patterns. This whole stream of tasks, which we call Smart CAPEX, is aimed at prioritizing our investments properly.

Within the team we work according to Agile. We have open planning sessions in which every employee, regardless of position, can put forward an idea and be heard. With customers we work in Scrum and test hypotheses together with them. Some hypotheses work and some do not, but the most important thing is that along the way the customer gets deeper into the subject area and starts to approach the formulation of tasks differently.

The main challenge for us is working out how to involve the customer in the project so that they share their subject expertise. Data scientists often lack deep knowledge of the customer's field, so people who have worked there for years can be very helpful in understanding the true nature of the data. There are also all kinds of HR challenges: development, motivation, hiring and team building, including how to divide areas of responsibility. Do you want one superman, or a team with a data engineer, a tester and a data scientist?

Alexander (Sberbank): I work in the corporate data management department at Sberbank. The department has two directions: data scientists work in the first, and the second is made up of people responsible for the data itself, meaning its availability and the storage and processing infrastructure.

We now have a little over 100 data scientists (we plan to grow this to 300), who work in 10 units and try to make each of them more data-driven. Because of this, the main challenge for us right now is a personnel shortage: there are not enough qualified specialists on the market. This applies not only to data scientists but also to infrastructure specialists. What is specific to Sberbank is that we have a rather complicated landscape with a huge number of data sources: about 100 connected storage systems at the moment. Add to this the work with partners who share data, and the result is a very large scale that needs the right people to manage it. The problem is made worse by the fact that as soon as developers gain enough experience, they quickly receive offers from foreign companies and move abroad.

Our products fall into two groups. The first is directly about data: the result of the developer's work is a stable data mart. In connection with this, we recently introduced the term DQA (Data Quality Agreement), a document that spells out the range of the data, the update frequency and so on. The second group is analytical models: the result of the joint work of a developer and a data scientist, deployed to production. By the way, one of our performance indicators for a data scientist is the number of models that have reached the implementation stage.

At the bank level we are also actively managing a portfolio of initiatives so that our investments in the units bring the highest possible return. We update the portfolio at regular intervals. In addition, each unit has an internal transition plan, since it often happens that after some time a data engineer wants to become a data scientist.

One of the latest initiatives we launched is smart loans: selecting personalized offers for small businesses, which significantly shortens the loan decision process thanks to detailed analytics on the client. One peculiarity here is that, unlike Internet and telecom companies, we have a significant amount of data that is created manually: when loan applications are processed, when contracts for issuing debit cards are signed, and so on. Naturally, with processes like these, data quality comes to the fore. Traditionally, all responsibility for data quality lay with the people on the data storage side, but we are now evening this out: each unit has its own DQA.
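To make the DQA idea a little more concrete, an agreement that spells out a value range and an update frequency can be pictured as a small machine-checkable contract. The sketch below is purely illustrative: the class `DataQualityAgreement`, the function `check_mart`, the field names and the `loan_amount` column are all assumptions made for this example, not Sberbank's actual format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataQualityAgreement:
    """Hypothetical DQA: one column's allowed range plus a freshness limit."""
    column: str
    min_value: float
    max_value: float
    max_age: timedelta  # agreed update frequency


def check_mart(rows, last_updated, now, dqa):
    """Return a list of violations; an empty list means the mart conforms."""
    violations = []
    if now - last_updated > dqa.max_age:
        violations.append("stale: last update is older than the agreed frequency")
    for i, row in enumerate(rows):
        value = row[dqa.column]
        if not dqa.min_value <= value <= dqa.max_value:
            violations.append(f"row {i}: {dqa.column}={value} out of range")
    return violations


# Invented sample data, for illustration only.
dqa = DataQualityAgreement("loan_amount", 0.0, 1_000_000.0, timedelta(hours=24))
now = datetime(2017, 9, 14, 12, 0)
rows = [{"loan_amount": 50_000.0}, {"loan_amount": -10.0}]

fresh = check_mart(rows, datetime(2017, 9, 14, 6, 0), now, dqa)
stale = check_mart(rows[:1], datetime(2017, 9, 12, 6, 0), now, dqa)
```

Here `fresh` reports only the out-of-range row, while `stale` reports only the missed update window. A real agreement would presumably cover many columns and richer rules.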

Pavel (Rambler & Co): I manage the machine learning department at Rambler & Co, one of the largest media holdings in the country. Most of the company's data analysis expertise is concentrated here, and it is divided into three areas. The first is everything related to advertising. The second is recommendation systems; we have the Price.ru platform (an analogue of Yandex.Market). The third is outsourced data science: we work with external companies and solve their cases using our knowledge, data and experience. For example, we launched product recommendations on the website of Ecco, the well-known shoe retailer, and got impressive results: unexpectedly, the average order value doubled.

We also do computer vision. One of our clients was KB STRELKA, a company that works on various urban projects. One of their areas is urban anthropology, which studies how people behave in a city, where they spend time and which points of attraction exist for them. In this project we used photos that visitors to various places in the city had posted on social networks to build a “portrait” of those visitors and understand what kind of people they are (of course, the data is completely anonymous: we see only these people's faces).

Our team consists of 10 data scientists and 2 infrastructure engineers, and there is also a department of administrators. Our data scientists work end-to-end, from the business problem through to production code, and we try to do everything according to Scrum.

Hiring is a major challenge for us too. There are few specialists in general and even fewer qualified ones, and there are just as few places where they are professionally trained. The availability of online education is both a plus and a problem, because the content and its quality do not always suit us.



Victor (Yandex): In May of this year I joined Yandex.Taxi to build a machine learning unit. We now have dozens of projects in several main areas, for example predicting the exact waiting time and finding convenient pickup and drop-off points. Another area is analyzing the behavior of drivers and passengers in real time and working out how we can influence them and increase their activity by offering them something beneficial, so that drivers complete more trips and clients ride more.

We have 13 people on the team, all data scientists, but with different strengths: some are better at development, and some have rich experience in competitions and can push model quality as high as possible. So there is always someone who can solve a particular task faster and better, and we actively make use of that.

In my experience, the challenge is often formulating the task well, whether the customer is internal or external. I previously worked at Yandex Data Factory, and it often happened there that the problem statement was reformulated several times during a project. This is not because the customers were bad, but because it takes a lot of effort to find a suitable formulation. We are now trying to build our processes so that there are clear stages with measurable results and a formulation that does not need to be changed very often on the go.

- Share some life hacks for colleagues who want to get a job with you. What do you look for in candidates? Give a few tips on how to prepare for your company's selection process.

Victor: We usually check how a candidate is doing with machine learning, both theory and practice. On the theory side, we need to make sure that the person understands what is going on, can explain why an algorithm behaves the way it does, and knows which optimization methods are used, because sometimes algorithms have to be applied in non-trivial ways. As for practice, we ask about the tasks the person has worked on. If there is no experience yet, we ask them to describe how they would approach a task. Of course, the task is not given in the form in which it comes from a customer, but in fairly general terms, along the lines of: “We need to develop a recommendation algorithm for such-and-such a situation; describe how you will measure quality, what data you will use and which features you will build the model on.” A common problem is that candidates start talking not about features but about the model they will train, without thinking about what it is going to predict at all.

- Do you prefer olympiad participants or people who have gone through various hackathons, or does it not matter?

Victor: Our approach is very simple: everyone is equal, and only knowledge shows how valuable a person will be to the team. I am against a team made up entirely of olympiad participants, but I have seen cases where both olympiad programmers and Kagglers were very useful on a team. You just need to work with them in the right way.

Alexander: I do not deal with HR, but there was a case when we gave potential data scientists a mathematics test and a few case studies on solving applied problems. A fair share of the candidates dropped out right after the test.

In practice, each unit recruits its own people. There is also a unit responsible for developing competencies: we are building partnerships with universities and trying to bring people in at an early stage. We try to invest as much as possible in internal resources by raising the skills of our employees. The units now work in a community format, and one of the community's tasks is cross-development of competencies. For example, in the risk and retail units data science is at a very high level, and it is mainly these units that help improve data analysis in the others.

- What is missing in training? What would you advise training organizations and the students themselves?

Pavel: It seems to me that, unfortunately, not everyone knows how to learn. Even when someone has completed an online specialization, the outcome is often quite superficial: people pick up a lot of buzzwords and have shallow knowledge. This shows especially clearly when they start bringing things to production, where you often need to understand how something works internally. It may seem that nobody can help with the ability to learn, but you can work on it yourself: there is an excellent “Learning How to Learn” course on Coursera, and there is the excellent Feynman method.

The second problem, in my personal experience, is very poor programming. It is not even that a person cannot implement a moderately complex algorithm; it is an inability to program at all. If a person imports Pandas just to read a file in Python, that is strange. However, we are starting to deal with this, because otherwise we will not be able to hire anyone. In short, programming is a very important part of work in data science.
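As an illustration of the remark about Pandas (a sketch added for clarity, not something shown at the panel): a small delimited file can be read and summarized with the Python standard library alone. The helper names `read_rows` and `mean_of_column` and the sample columns are hypothetical, and an in-memory `io.StringIO` stands in for a real file opened with `open(path, newline="")`.

```python
import csv
import io


def read_rows(source):
    """Parse CSV text from an open file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def mean_of_column(rows, column):
    """Average a numeric column; returns 0.0 when there are no rows."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values) if values else 0.0


# Invented sample data standing in for a file on disk.
sample = io.StringIO("trip_id,wait_minutes\n1,4.0\n2,6.0\n")
rows = read_rows(sample)
average_wait = mean_of_column(rows, "wait_minutes")
```

None of this argues against Pandas for real analysis work; the point is only that simply reading a file does not require it.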

Andrey: I will continue the topic that Pavel raised. Data scientists often program poorly, and there is no getting away from that. At the same time, there are first-class developers who may not know many algorithms and have never trained them hands-on. So the most important thing (and the first life hack) is to work out where you fit: the industry is rich and diverse, and everyone can find their place in it.

For example, right now we cannot find a good tester to test models in production. On the one hand, this should be a person with automation experience and good knowledge of operating systems, databases and so on; on the other hand, they should understand at least a little about machine learning and data analysis algorithms.

Next, to reveal a bit of MegaFon's inner workings: we start from people, and we do not have perfectly precise job descriptions. They exist, of course, but we do not stick to them. A person sends us a résumé, and we look at whether they can be useful to us and, if so, in which projects, in which division and so on. If we find a project that matches their skills, it is a win-win.

Questions


- There is now a huge number of online and offline data analysis courses. Given a certain level of basic training, what is the best way to move on: continue self-education and start a career a few months later, or develop through practice and work, solving applied problems?

Pavel: In my opinion, once you have a certain level, it is best to practice and get a job at some company, because the field is truly huge, and if you want to, you can keep studying your whole life. Besides, if you do want to keep studying, an academic career would be a better fit; otherwise it becomes education for the sake of education. People also often underestimate themselves and think: “Yandex has such strong people that I am obviously not at their level.” Most of the time it turns out that you are better than the average candidate.

Victor: I agree, it is better to get practice as early as possible, but within reasonable limits. For example, I first tried to gain practical experience without having the theoretical knowledge, and I do not recommend that to anyone. It is very unpleasant when nothing works for a long time and you do not understand why, and later you realize how much time was wasted.

It often happens that you simply do not know that something exists because you have never studied it. So you should not stop learning: you should always stay curious about what exists in the field, at least at the level of understanding how it works.

You can watch a video of the panel discussion on the New Professions Lab Facebook page.

This concludes our review of Data Science Week 2017. We will be glad to see you in early March 2018 at the same venue, Deworkacy, for Data Science Weekend 2018.

The event's partner was MegaFon, and the information partner was Pressfeed.

Pressfeed is a way to get free publications about your company: a subscription service that delivers journalists' requests to business representatives and PR specialists. A journalist posts a request, and you answer. Sign up. Good luck with your work.

Source: https://habr.com/ru/post/340806/

