
Overview of the labor market in big data and data science

Hello, Habr! Relevant search queries returned about 1,000 vacancies, which we then filtered manually by title and description. This survey is based on the resulting 288 active vacancies in big data and data science from HeadHunter.

In reality there are more active vacancies, since other sources were not taken into account (for example, SuperJob, Blastim, social networks, company websites). Keep in mind as well that this is only a snapshot of the current situation: every day vacancies are filled and new ones appear.

The data was obtained through the HeadHunter API; data acquisition and processing were done with Python libraries.
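As a rough illustration of the acquisition step, here is a minimal sketch of a paged search request. It assumes the public vacancy search endpoint `https://api.hh.ru/vacancies` with the `text`, `page` and `per_page` parameters; the helper names and the User-Agent string are hypothetical, and this is not necessarily the code used for the survey.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint of the public HeadHunter vacancy search.
HH_VACANCIES_URL = "https://api.hh.ru/vacancies"


def build_search_url(query, page=0, per_page=100):
    """Build the search URL for one page of results (hypothetical helper)."""
    params = {"text": query, "page": page, "per_page": per_page}
    return f"{HH_VACANCIES_URL}?{urlencode(params)}"


def fetch_page(query, page=0, per_page=100):
    """Download one page of results as a dict. Requires network access."""
    request = Request(
        build_search_url(query, page, per_page),
        # Hypothetical User-Agent; the API may reject requests without one.
        headers={"User-Agent": "vacancy-survey-sketch/0.1"},
    )
    with urlopen(request) as response:
        return json.load(response)
```

Calling `fetch_page("data science")` in a loop over `page` would collect the raw search results, which then still need the manual filtering described above.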

The geographical distribution of vacancies posted on HeadHunter is:
image

Almost half of all active vacancies (128) are in Moscow. St. Petersburg has less than a third as many (42), followed by the capitals of neighboring countries, Belarus (16) and Ukraine (12), though not Kazakhstan, and by other large Russian cities. A small number of vacancies in developed countries, together with vacancies in other cities of Russia and the CIS, went into the Others group (58).

Almost all vacancies in the sample are for full-time employment, but a fair number (32) allow a flexible schedule, and 11 offer the possibility of remote work. Still, the absolute majority of vacancies (244, about 85%) require being in the office full-time.

image

image

The distribution of vacancies by required experience:

image

Jobs requiring expert-level experience in the field, more than 6 years, are the rarest category: there were only 9 such vacancies. This may be because the field is young and developing rapidly. The most in-demand ranges are the middle ones: 1-3 years (152) and 3-6 years (110). There are also opportunities for those with no work experience yet: 17 such vacancies in the sample.

Most vacancies do not state a salary. Still, we considered the subsample of vacancies that do (56) sufficient to estimate the approximate market pay level.

image

For some of the vacancies that stated a salary, it was given in a foreign currency.

image

All amounts in foreign currencies were converted to rubles at the current exchange rate.

Salaries on HeadHunter are given as a range: from a certain amount to a certain amount. If both bounds were given, their average was taken as the salary estimate. If only “from” was given, 10% was added to it; if only “to” was given, the value was reduced by 10%. Salaries were calculated by work-experience category, separately for Moscow, developed countries, and all other cities of Russia and the CIS.
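The estimation rule above can be sketched as a small function. The function and parameter names are hypothetical, and the amounts are assumed to be already converted to rubles.

```python
def estimate_salary(salary_from=None, salary_to=None):
    """Estimate one salary figure from an optional "from"/"to" range."""
    if salary_from is not None and salary_to is not None:
        # Both bounds given: take the average of the two.
        return (salary_from + salary_to) / 2
    if salary_from is not None:
        # Only "from" given: add 10%.
        return salary_from * 110 / 100
    if salary_to is not None:
        # Only "to" given: subtract 10%.
        return salary_to * 90 / 100
    # No salary stated: the vacancy is left out of the salary sample.
    return None


# Illustrative amounts in thousands of rubles, not taken from the survey data.
print(estimate_salary(100, 140))         # 120.0
print(estimate_salary(salary_from=100))  # 110.0
print(estimate_salary(salary_to=200))    # 180.0
```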

image

As the table shows (values are in thousands of rubles), salaries in Moscow are higher at every level of work experience. The gap is especially large for junior specialists: with less than 3 years of experience, salaries in Moscow are a third higher, and all vacancies requiring no experience (among those that state a salary) are in Moscow. There was only one vacancy from a developed country in the list, in Japan; the pay there is significantly higher, almost twice the maximum salary in Moscow. The average salary in the sample was 138 thousand rubles; with no work experience it is less than half of that, only 63 thousand rubles. The maximum stated salary in Russia is 220 thousand rubles.

HeadHunter provides a separate field for key skills in the job description, but in most vacancies in the sample it was left empty. In addition, key skills are typed in manually rather than selected from a fixed list, so the same skill may be spelled in different ways. For this reason, a Top-50 list of key skills was generated from the vacancy list and then extended by expert judgment. For many skills, several synonyms were specified, including in different languages (for example, Machine Learning and its Russian equivalent, JavaScript and JS). For some skills, a list of stop words was specified to separate C from C++, Java from JavaScript, SQL and MySQL from NoSQL, and so on. These keywords were then searched for with regular expressions in the combined text of the key skills and the job description, with each skill counted at most once per vacancy.
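Here is a minimal sketch of this kind of keyword matching, with synonyms, stop words, and at most one count per vacancy. The skill dictionary below is a small hypothetical excerpt, and the approach (blanking out stop words before searching) is one possible way to do it, not necessarily the one used for the survey.

```python
import re
from collections import Counter

# Hypothetical excerpt of a skill dictionary: synonyms to find, stop words to ignore.
SKILLS = {
    "C": {"synonyms": ["c"], "stop": ["c++", "c#"]},
    "C++": {"synonyms": ["c++"], "stop": []},
    "Java": {"synonyms": ["java"], "stop": ["javascript", "java script"]},
    "JavaScript": {"synonyms": ["javascript", "js"], "stop": []},
    "SQL": {"synonyms": ["sql"], "stop": ["nosql", "no sql"]},
}


def _pattern(terms):
    """Compile a case-insensitive pattern matching any term as a whole token."""
    longest_first = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in longest_first)
    # Lookarounds instead of \b, so that terms like "c++" are handled correctly.
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


def skills_in_text(text):
    """Return the set of skills mentioned in one vacancy text."""
    found = set()
    for skill, spec in SKILLS.items():
        cleaned = text
        if spec["stop"]:
            # Blank out stop words first so they cannot produce false matches.
            cleaned = _pattern(spec["stop"]).sub(" ", cleaned)
        if _pattern(spec["synonyms"]).search(cleaned):
            found.add(skill)
    return found


def count_skills(vacancy_texts):
    """Count, for each skill, the number of vacancies that mention it."""
    counts = Counter()
    for text in vacancy_texts:
        counts.update(skills_in_text(text))  # a set, so one count per vacancy
    return counts
```

Each element of `vacancy_texts` is assumed to be the key skills and the description of one vacancy joined into a single string.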

image

The most in-demand skill in this field turned out to be Python: it is mentioned in 170 of 288 vacancies. Java is mentioned in 92 vacancies, C++ in 58, Scala in 46, Matlab in 44. Other languages turned out to be much less popular, including R (21) and Julia (3), which are popular in the data analysis community. The second most in-demand skill is SQL (140 vacancies). Knowledge of machine learning methods is required in 104 vacancies, data mining methods in 81, deep learning in 52 (the keywords here include, besides Deep Learning, the names of the main deep learning libraries, for example TensorFlow and Theano), and natural language processing (including Text Mining) in 23. Knowledge of big data technologies is required in 122 vacancies, although it is not entirely clear what exactly is meant by that. More specifically, Hadoop is mentioned in 99 vacancies, Spark in 84, Hive in 39, MapReduce in 29, Kafka in 19. Experience with NoSQL databases is required in 37 vacancies, 21 of which mention MongoDB. 41 vacancies require knowledge of English, and 22 require knowledge of statistics. The data analysis competition site Kaggle is mentioned in 25 vacancies.

image

The diagram above shows the distribution of vacancies according to the HeadHunter specialization classifier (one vacancy can belong to several specializations at once). As it shows, most vacancies in the sample fall under design (185) and data analysis (162). The remaining specializations trail by a wide margin, among them project management (66) and mathematics (60).
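Because one vacancy can carry several specializations, the per-specialization counts add up to more than the number of vacancies. A minimal sketch of such multi-label counting, on made-up toy data with a hypothetical function name:

```python
from collections import Counter


def count_specializations(vacancies):
    """Count vacancies per specialization; a vacancy may have several labels."""
    counts = Counter()
    for specializations in vacancies:
        # set() guards against the same label being listed twice for one vacancy.
        counts.update(set(specializations))
    return counts


# Toy data for illustration only, not taken from the survey.
toy_vacancies = [
    ["Data analysis", "Mathematics"],
    ["Data analysis"],
    ["Project management", "Data analysis"],
]
toy_counts = count_specializations(toy_vacancies)
# Three vacancies, but five labels in total.
print(len(toy_vacancies), sum(toy_counts.values()))
```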

image

By professional field, the absolute majority of vacancies are in information technology; some (66) are in science and education, apparently because of mathematics and algorithms.

The sample did not allow us to build a meaningful ranking of employers, because it is largely random: it reflects only currently active vacancies, not all the positions that companies have filled. We therefore chose to break vacancies down by industry instead.

The HeadHunter API does not provide an industry breakdown for employers, so for the 165 employers in the sample the industry had to be assigned manually, based on their names and descriptions. The resulting distribution of vacancies by industry is shown in the diagram below.

image

The largest industry is companies specializing exclusively in information technology (93 vacancies). Within it, several groups were singled out separately: companies built around Internet portals (Internet, for example Yandex and Avito, 19 vacancies), telecommunications companies (16 vacancies), IT consulting (16 vacancies) and IT security (for example Kaspersky Lab, 4 vacancies). The second largest industry by number of vacancies, Marketing, includes media and advertising agencies, as well as a smaller number of companies doing marketing research; it accounts for 23 vacancies. The banking sector had 20 active vacancies, and the rest of the financial sector another 18. The game development industry also turned out to be a fairly large employer (18 vacancies), although for this industry the sample includes multiple duplicates of the same position posted for different regions. Retail, including fashion retail, contributed 9 vacancies to the sample. FMCG and pharmaceutical companies are practically absent from the sample. Despite the popularity of data analysis in biology and medicine and of the bioinformatics profession, the number of active vacancies in these fields was relatively small (3 in health care and 2 in biotechnology).

We remind you that the “Big Data Specialist” program starts in March :)

Source: https://habr.com/ru/post/320336/

