Russian girls in Data Science. Part 1

As you know, men significantly outnumber women in IT, although women are often not inferior in knowledge and skills. In our observation, this imbalance is even stronger in Data Science, although, again, women process data and build models no worse than men. The final results of our latest “Big Data Specialist” group confirmed this for us: 3 of the top 5 participants were girls (and there were only four girls in the group).

We set out to find girls in different companies and industries who work with big data and manage teams. We collected so much interesting material that it does not fit into one article, so expect a series of publications!

We open the series with interviews with Anna Kryuchkova and Maria Anisimova, who talk about their work, their career paths, and the future of girls in Data Science.

- Tell us about the company you work for and your position there. What data analysis tasks arise in the company? What are you responsible for?

Anna: I work at PJSC MegaFon as an expert on segment programs. In effect, I am a product specialist working with big data models: I make them “come to life” and generate income.

Maria: I work in the Big Data area of the Moscow Department of Information Technology, where we develop smart analytics on a city-wide scale. I am the head of the data modeling department, but the main area in which I launch and oversee projects is Internet analytics. In our products this has less to do with web analytics and optimizing the UI of city portals (for example, www.mos.ru), as many people think, than with profiling Internet users. My responsibilities include supporting projects at every stage, from initiation and launch through to completion and handover to production, that is, the creation of the actual product. This support involves deciding which mathematical methods to apply, collecting and analyzing existing data, and identifying the technical infrastructure required.

- What was your educational and career path? How did you get into the company?

Anna: I started my career in consulting. I passed a rigorous selection process at a boutique consultancy with a small number of clients, whom we helped in several areas at once. What is rare in consulting is that our work with clients did not end with the strategy presentation but only began there: we were responsible for implementing the changes. Since I graduated from university with honors in mathematics and economics, I immediately asked for analytical and marketing projects. It was very interesting to solve real problems and to see your ideas work and your predictions come true. Work became my life, and clients became almost family.

However, I could only dream of working with “big data”. A medium-sized retail and manufacturing business teaches you to work with “small data”: to enrich it and preprocess it so that the results do not depend on chance or rare events. One of our clients was the company “White Wind Digital”. We worked with it for only a couple of years, but I became so involved in its tasks that I wanted to join it full-time as head of the analytics department. There was never enough customer data, and in every case it turned out that we needed to learn how to accumulate this data, analyze it, make individual offers, and even engage customers emotionally with our brand. A loyalty program was the obvious answer. It is a rather expensive tool, but we found ways to bring it practically to payback. So I became, among other things, the head of the loyalty program.

It was a very interesting experience. In two years we implemented a complex technical solution, built a system for communicating with customers, and began to take feedback into account. And it became clear to me that I wanted more. More data, and bigger data. Banks or telecom, I figured, and I ended up in telecom. Once again I was lucky to work with a super team on super ambitious tasks, although not touching the data directly but acting as the so-called customer of analytics.

Anna Kryuchkova

Maria: I earned my bachelor's degree at the Higher School of Economics, specializing in Statistics and Data Analysis in Economics. After graduating, I immediately entered the master's program at the same university, this time in Management with a specialization in Project Management, where I “met” my current employer.

My career path was fairly simple: while studying at the university, I worked at various organizations connected in one way or another with statistics, but those roles were still far from Data Science as such. This is partly due to the specifics of how statistical analysis is used in our country: a few years ago, such solutions were applied only in very narrow areas such as banking, insurance, and strategic sales planning in commercial organizations. Apart from Rosstat and students, almost no one worked with socio-demographic statistics.

- Which machine learning tasks come up most often in your work? What algorithms and models do you use to solve them?

Anna: Obviously, in telecom these are primarily classification and segmentation tasks. Another part of the team handles the algorithms, which did not stop us from brainstorming together and working out exactly how they would be applied.

Maria: Most often the tasks fall under the code names “profiling” and “forecasting”. The first mainly involves clustering: segmenting users by the available attributes, that is, factors that often number not one but tens or hundreds. The second involves building a vector of user behavior and then searching for “similar” users (look-alike) in order to infer which segment “unidentified” users belong to and to predict the user's next action.
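
As a rough illustration of the look-alike idea described above (not the department's actual method), a minimal Python sketch might compare a user's behavior vector with the average vector of each known segment using cosine similarity. The segment names, page categories, and visit counts below are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length behavior vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def assign_segment(user_vector, segments):
    """Return (segment_name, similarity) for the most similar segment centroid."""
    scores = {name: cosine(user_vector, centroid(members))
              for name, members in segments.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical data: visit counts for [news, sports, city services] pages.
segments = {
    "news_readers": [[9, 1, 0], [8, 2, 1]],
    "service_users": [[1, 0, 9], [0, 1, 7]],
}

# An "unidentified" user is attributed to the closest known segment.
segment, score = assign_segment([1, 0, 5], segments)
print(segment, round(score, 2))
```

Cosine similarity is only one possible choice here; it compares the shape of behavior rather than its volume, which is why a light user and a heavy user with the same habits land in the same segment.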

Accordingly, the models we use for all these tasks are standard: random forests, gradient-boosted trees, logistic regression, and ensembles of these algorithms for classification problems; PCA (principal component analysis) and DBSCAN (for noisy data) for clustering problems. For text analytics tasks (for example, identifying thematic interests from the types of Internet content consumed), we use the naive Bayes classifier, VSM (the vector space model), k-means, and maximum entropy classification.
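
To illustrate one of the text-analytics methods mentioned, here is a minimal sketch of a naive Bayes topic classifier with add-one smoothing in plain Python. The toy documents and topic labels are hypothetical, and whitespace splitting stands in for real tokenization; a production pipeline would look quite different.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Returns the counts needed for prediction."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data: content snippets labelled with a topic of interest.
docs = [
    ("football match score", "sports"),
    ("team wins match", "sports"),
    ("bank loan rate", "finance"),
    ("stock market rate", "finance"),
]
model = train(docs)
topic = predict("match score today", model)
print(topic)
```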

As you can see, this set of models and algorithms is similar to that of any team doing analytics. But I believe that solving any big data problem comes down to more than building models: a large share of the work goes into collecting and preparing the data (Data Mining) and interpreting the results, that is, adapting them to business applications. Put simply, it is not enough just to identify patterns from correlation matrices. It is important to understand what to do next and how to use the findings in the product, and not just on pretty slides with infographics.


Maria Anisimova

- Do you think working in data analysis suits only people with a certain background, or can anyone master data science with enough persistence?

Anna: So far, only a few educational institutions offer a serious DS specialization, so for the most part you have to complete your education on your own. Of course, a background in mathematics makes it much easier to understand, but the most important thing here, as in any field, is 1% inspiration and 99% perspiration. With that formula, anything is possible.

Maria: There is a common view that anyone can learn anything, as long as the desire is there. In addition, as I said earlier, analytics is not limited to building mathematical models: there are many other important stages in this work. A model has nothing to build on unless someone has collected, structured, and normalized enough data to meet a specific business goal. And given the variety of industries in which analytics is now used, you can come from any education or profession. A teacher who analyzes the performance of the children in his class and then builds the curriculum around it is also a participant in the new-fangled Data Science, even with a small dataset of 40-50 records in an Excel table (I am exaggerating).

- What advice would you give to beginners? Which online and offline courses have you taken, and which would you recommend?

Anna: I decided to dive into this area with courses from Newprolab. After completing them and becoming more or less familiar with the topic, I began to read a lot of books: you can't do without Sebastian Raschka, the rich pantheon of O'Reilly authors, and the classic texts by Bishop and Murphy. And of course, the best way to learn is by practice, so I hope to get to machine learning competitions.

Maria: For beginners in this field, whether you are a student or someone with more than 20 years of work experience who has decided to retrain, I advise you to first decide in which area you would be interested in studying data. I mean choosing an industry (education, health, finance) or, if you go by function, the Internet, text analytics, or photo and video analysis. Start with basic mathematics courses online on Coursera, then dig into existing practical work (for example, you can read Habrahabr or follow competitions on Kaggle). That way you will understand what particularly interests you, talk to people closely connected with the field, study the trends, and begin to learn from practical experience. After that, if you want to work in this direction, study the employers that are developing data analysis, or by that time an employer will find you. :)

- Tell us, does the company have any special policy toward girls? As a woman in IT, and in DS in particular, what advantages and disadvantages do you see? Are there any difficulties?

Anna: Needless to say, you rarely meet a girl in Data Science or in IT, and among mathematicians girls are far from the majority. The reasons are probably sociocultural. However, the rare “specimen” who does make it into DS turns out to be highly motivated and so genuinely interested that she quickly earns the respect of her colleagues.

Maria: There is no particular company policy. For some reason, programming used to be mostly men. Now the world is changing, and the boundaries between professions in relation to gender are blurring. At university we learned to analyze data in Excel and SPSS, but at work, when you face arrays of tens and hundreds of millions of records, you start to think about learning programming languages that let you work with particular DBMSs. In my opinion this is not difficult, although it is harder for girls to adapt to new solutions: girls are embarrassed to ask questions and to start learning something new. Men are more flexible and decisive in this respect. If you are a girl without hang-ups, brave and young, then everything will work out. :) But this applies not only to DS, it is true everywhere.

- Will the ratio of women to men in data science change in the future? In your view, how is women's attention being drawn to data analysis?

Anna: Society has already begun to change: daughters are bought not only dolls but also construction sets with toy cars, and they are enrolled not only in dance classes but also in programming lessons. Inquisitiveness, curiosity, the ability to notice patterns: all of this is inherent in girls as much as in men, so interest in a field where all these qualities can be put to use is inevitable for both sexes. Especially when you consider how much this interest is fueled by the media.

Maria: To me, the more accurate question is: “How do we attract people's attention in general to the field of data analysis, beyond stock-exchange and banking projects?”. If we do talk about gender differences, given that specialized educational programs are now being launched at universities and literally every commercial organization is creating data analytics units, the ratio of women to men in DS will even out in the future. In general, of course, the more beautiful girls there are in teams, especially in harsh IT, the greater the motivation for the male half to accomplish great things and change the world. :)

P.S. By the way, in the upcoming 7th intake of the Big Data Specialist program, the coordinator will also be a girl, one of our graduates.

Source: https://habr.com/ru/post/336256/