
The most sought-after skills in data science

A lot of knowledge is expected of data science specialists: machine learning, programming, statistics, mathematics, data visualization, communication and deep learning. Each of these areas covers dozens of languages, frameworks and technologies that can be studied. So how should data professionals budget their study time to be as valuable as possible to employers?

I searched job sites carefully to find out which skills employers currently want most. I looked at the broader disciplines related to working with data and, in a separate analysis, at specific languages and tools. I searched LinkedIn, Indeed, SimplyHired, Monster and AngelList on October 10, 2018. The graph below shows how many data science vacancies each of these sites listed.



I read through many job descriptions and surveys to understand which skills are mentioned most often. Terms like “management” were not included in the analysis, because job sites use them in a very wide range of contexts.
The searches were restricted to the United States and used queries of the form "data science" "keyword", where the keyword is the skill being counted. To cut down the number of results, I counted only exact matches. This method ensured that all results were relevant to data science and that the same criteria were applied to every query.
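The query pattern described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_query` and the keyword list are hypothetical, and it assumes a search box that treats each quoted phrase as an exact match and requires both phrases to be present.

```python
def build_query(base_term, keyword):
    """Wrap both terms in double quotes so each is searched as an exact phrase."""
    return f'"{base_term}" "{keyword}"'


# Hypothetical keyword list, used only to show the shape of the queries.
keywords = ["machine learning", "Python", "SQL"]
queries = [build_query("data science", kw) for kw in keywords]

for query in queries:
    print(query)
```

Using the same base term in every query is what keeps the counts comparable across keywords.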

AngelList does not report the total number of data-related jobs, only the number of companies offering such jobs. I excluded this site from both analyses, because its search algorithm appears to work on an "OR" basis and offers no way to switch to "AND". AngelList works when you enter something like “data scientist” “TensorFlow”, since a match for the second term implies a match for the first. But with keywords like “data scientist” “react.js”, it returns many vacancies unrelated to data science.

Glassdoor also had to be excluded. The site claimed to have 26,263 vacancies for working with data, but it displayed no more than 900. It also seems highly doubtful to me that it had collected more than three times as many vacancies as any other major site.

For the final stage of the research, I kept the keywords for which LinkedIn returned a large number of results: more than 400 for general skills and more than 200 for specific technologies. Of course, there were no duplicate offers. I recorded the results of this stage in a Google document.
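The selection rule above amounts to a simple threshold filter. The sketch below uses the two cut-offs from the text, but the candidate names and counts are made up for illustration and are not the article's data.

```python
GENERAL_MIN = 400  # general skills: more than 400 LinkedIn results
TECH_MIN = 200     # specific technologies: more than 200 LinkedIn results


def passes_threshold(linkedin_hits, kind):
    """Return True if a keyword has enough LinkedIn results for its kind."""
    threshold = GENERAL_MIN if kind == "general" else TECH_MIN
    return linkedin_hits > threshold


# Hypothetical (name, kind, LinkedIn result count) triples.
candidates = [
    ("skill_a", "general", 950),
    ("skill_b", "general", 380),
    ("tool_a", "technology", 240),
    ("tool_b", "technology", 200),
]

selected = [name for name, kind, hits in candidates if passes_threshold(hits, kind)]
print(selected)
```

The comparison is strict because the text says "more than", so a keyword with exactly 200 results is left out.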

Then I downloaded the files in .csv format, loaded them into JupyterLab, calculated how often each keyword occurred as a percentage, and averaged the values across the sites. I then compared the results for languages with those reported in Glassdoor's study of data science vacancies from the first half of 2017. Combined with the KDnuggets usage survey, it appears that some skills are gaining popularity while others are gradually losing value. But more on that later.
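The percentage-and-average step described above can be sketched as follows. This is a minimal illustration in plain Python, not the author's notebook code. The site names and all counts are invented, and it assumes the percentage is taken against each site's total number of data science listings and that the average across sites is unweighted.

```python
from statistics import mean

# Hypothetical counts for illustration only; these are NOT the article's figures.
# totals: listings returned for the base data science query on each site.
totals = {"site_a": 1000, "site_b": 500}

# hits: listings returned for the base query plus a keyword on each site.
hits = {
    "python": {"site_a": 700, "site_b": 300},
    "sql": {"site_a": 400, "site_b": 250},
}


def share_by_site(keyword_hits, site_totals):
    """Percentage of each site's listings that mention the keyword."""
    return {
        site: 100 * count / site_totals[site]
        for site, count in keyword_hits.items()
    }


def average_share(keyword_hits, site_totals):
    """Unweighted mean of the per-site percentages."""
    return mean(share_by_site(keyword_hits, site_totals).values())


# Rank keywords by their average share, highest first.
ranking = sorted(
    ((kw, average_share(site_hits, totals)) for kw, site_hits in hits.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for kw, pct in ranking:
    print(f"{kw}: {pct:.1f}%")
```

Converting to percentages before averaging means a site with many listings does not dominate the result, which is one reasonable way to combine sites of very different sizes.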

In my Kaggle Kernel you will find interactive graphics and additional analysis. I used Plotly for visualization. Getting Plotly and JupyterLab to work together takes some tinkering, at least at the time of writing. The instructions are at the end of my Kaggle Kernel and in the Plotly documentation.

Broad skills


Here is a graph showing the general skills that employers most often want to see in candidates.



The results show that analytics and machine learning still form the core of a data scientist's work. The main purpose of the profession is to draw useful conclusions from data. Machine learning aims to build systems that can predict outcomes, so it is in great demand.

Data science requires knowledge of statistics and the ability to write code, which is no surprise. Statistics, mathematics and software engineering are also fields taught at universities, which may affect how often they are mentioned.

Interestingly, almost half of the vacancy descriptions mention communication: data professionals need to be able to communicate their findings to people and work in a team.

AI and deep learning are mentioned less often than some other terms. However, these areas are branches of machine learning. Deep learning is increasingly used for tasks that were previously handled by other machine learning algorithms. For example, the best algorithms for natural language processing problems now come from deep learning. I believe deep learning will become more and more popular, and that machine learning will gradually come to be seen as a synonym for it.

Which specific software tools should data scientists master, according to employers? The next section turns to this question.

Technological skills


Below are the 20 specific languages, libraries and tools that employers most often want data scientists to have experience with.



Let's quickly go through the leaders.



Python is the most popular option. Many have noted how popular this open source language is among programmers. It is a very convenient choice for beginners, because there are many learning resources. The vast majority of new tools for working with data are compatible with it. For all these reasons, Python can be called the main language for data scientists.



R follows Python by a small margin. It was once the main language for data scientists. I was surprised to see that there is still such active interest in it. The language has its roots in statistics and is therefore very popular among statisticians.

Virtually all vacancies require knowledge of one of these two languages, Python or R.



SQL is also in high demand. The abbreviation stands for Structured Query Language, and it is the main tool for interacting with relational databases. SQL is often neglected in the data science community, but it is a skill you should be fluent in if you plan to enter the job market.




Next come Hadoop and Spark, both open source tools from Apache for working with big data. Far fewer tutorials and Medium articles are written about them. I assume that far fewer applicants know them than know Python or R. If you can work with Hadoop and Spark, or have the opportunity to learn them, this can be a good advantage over your competitors.




Next up are Java and SAS. I was surprised that these two languages ranked so high. Both are backed by large companies, and both have a certain amount of free material available. Among data scientists, however, neither Java nor SAS generates much interest.



Next in the ranking is Tableau, an analytics platform and visualization tool that is both powerful and easy to use. Its popularity is growing steadily. Tableau has a free public version, but you will have to pay if you want to keep your data private. If you are not familiar with Tableau at all, it makes sense to take a short course, such as Tableau 10 A-Z on Udemy. I am not paid to advertise it. I took the course myself and found it very useful.

The graph below shows an extended list of popular languages, frameworks and other tools for working with data.



Historical comparison


The Glassdoor team published a study of the ten most popular skills for data scientists, covering January through July 2017. The graph below compares their term frequencies with the averages I calculated for LinkedIn, Indeed, SimplyHired and Monster.



In general, the results are similar. Both my analysis and the Glassdoor study found that Python, R and SQL are in highest demand. The top nine skills are also the same in both, although their exact order differs.

Judging by the results, demand for R, Hadoop, Java, SAS and MATLAB has fallen since the first half of 2017, while Tableau has become more popular. This is to be expected given the results of the KDnuggets developer survey, which clearly show that R, Hadoop, Java and SAS have been declining for several years while Tableau has been steadily on the rise.

Recommendations


Based on these findings, I would like to offer some recommendations for data specialists who are already on the market, or are just preparing to start a career, and want to improve their competitiveness.


When an employer looks for someone who works with Python, they will most likely expect candidates to be familiar with the main data processing libraries: numpy, pandas, scikit-learn and matplotlib. If you want to learn this set, I recommend the following resources:


If you want to break into deep learning, I advise you to start with Keras or FastAI and then move on to TensorFlow or PyTorch. “Deep Learning with Python” by Chollet is a great help for those learning to work with Keras.

Beyond these recommendations, I think it is worth concentrating on what interests you, although of course many other considerations can shape how you allocate your study time.

If you are looking for data jobs on online portals, I advise you to start with LinkedIn, which consistently returned the most results. Keywords also play a very important role when searching for vacancies or posting a resume. For example, on every site considered, the query “data science” returns three times as many results as “data scientist”. On the other hand, if you are interested only in data scientist positions, the latter query is the better choice.

Whichever site you choose, I recommend creating an online portfolio that demonstrates your skills in as many in-demand areas as possible. Your LinkedIn profile should ideally contain some evidence of the skills you list.

I may cover the rest of the research results in other articles. If you want to learn more about the code or the interactive graphics, see my Kaggle Kernel.

Source: https://habr.com/ru/post/426557/

