
Data science skills





We continue our series of analytical studies on the demand for skills in the labor market. This time, thanks to Pavel Surmenka (sharky), we look at a new profession: Data Scientist.



Over the past year, the term Data Science has been gaining popularity. It is written about widely and discussed at conferences. Some companies even hire people for a position with the resonant title of Data Scientist. What is Data Science? And who are Data Scientists?





Who are the Data Scientists?



If you put this question to a resident of San Francisco, you may hear that a Data Scientist is a statistician who lives in San Francisco. Funny, though not very encouraging for those who live elsewhere. Here is another definition: a Data Scientist is someone who understands statistics better than any programmer and programming better than any statistician. This one is closer to the point. A Data Scientist is a hybrid of a statistician and a programmer. Since both statisticians and programmers vary widely, it is better to view the profession as a broad spectrum running from pure statisticians to pure programmers.


Robert Chang, a Data Scientist at Twitter, divides the profession into two groups: Type A Data Scientist vs Type B Data Scientist.



Type A, where A is Analysis. These people mostly extract meaning from static data. They closely resemble statisticians, and some may indeed be statisticians who simply changed their job title to Data Scientist; as we know, a change of title alone can bring a significant salary increase, plus honor and respect. Besides statistics, they also know the practical side: how to clean data, how to work with large data sets, how to visualize data, and how to describe the results of their work.



Type B, where B is Building. These people also know statistics, but they are strong, experienced programmers as well. They are more interested in applying data in real systems. They often build models that interact with users, for example recommendation systems for products, films, and advertisements.



Data Science also overlaps with areas such as Machine Learning and Artificial Intelligence; practitioners in these fields are close to Type B Data Scientists.



Data Scientist Skills



On the English-language Internet, interest in Data Science has been rising noticeably since about 2012 ( https://www.google.com/trends/explore#q=Data%20Science ). In the past few years, interest in related areas has also grown markedly: Machine Learning, Artificial Intelligence, Deep Learning. Gartner placed Machine Learning at the peak of its hype cycle in 2015: Gartner's 2015 Hype Cycle for Emerging Technologies Identifies the Computing Innovations That Organizations Should Monitor. And in 2012 Harvard Business Review published an article with an intriguing headline: Data Scientist: The Sexiest Job of the 21st Century.







What should someone who wants to become a Data Scientist learn, and which skills are needed? Let's look at the requirements American employers set for candidates for positions in Data Science and Machine Learning.



We analyzed 549 vacancies published on Monster.com, one of the world's largest job search portals, that mentioned Data Science and Machine Learning in their requirements.
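As an illustration of this kind of analysis, here is a minimal sketch in Python that counts how many vacancy texts mention each skill. It is not the code used in the study: the sample vacancy texts, the skill vocabulary, and the function name count_skill_mentions are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical vacancy texts; the data set from the study is not reproduced here.
VACANCIES = [
    "Data Scientist with strong statistics, Python and machine learning skills.",
    "Machine learning engineer: Python, Spark, statistics.",
    "Analyst experienced in R, statistics and data visualization.",
]

# Hypothetical skill vocabulary. "R" is matched case-sensitively as a
# standalone word; everything else ignores case.
SKILL_PATTERNS = {
    "statistics": re.compile(r"\bstatistics\b", re.IGNORECASE),
    "machine learning": re.compile(r"\bmachine learning\b", re.IGNORECASE),
    "python": re.compile(r"\bpython\b", re.IGNORECASE),
    "r": re.compile(r"\bR\b"),
    "spark": re.compile(r"\bspark\b", re.IGNORECASE),
}


def count_skill_mentions(vacancies, patterns):
    """Count, for each skill, the number of vacancies that mention it."""
    counts = Counter()
    for text in vacancies:
        for skill, pattern in patterns.items():
            if pattern.search(text):
                counts[skill] += 1  # each vacancy counts once per skill
    return counts


skill_counts = count_skill_mentions(VACANCIES, SKILL_PATTERNS)
for skill, n in skill_counts.most_common():
    print(f"{skill}: {n} of {len(VACANCIES)} vacancies")
```

Simple keyword matching like this is crude. Real vacancy texts would need synonym handling and more careful treatment of short names such as R.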



Data Scientist Hard Skills



We begin with the requirements for professional skills (hard skills).



As the ranking shows, the most in-demand skills are fundamental knowledge of mathematics, statistics, computer science, and machine learning. Beyond theory, a Data Scientist should be able to extract, clean, model, and visualize data. Experience in software development and quality management also matters.







Data Science Tools and Technologies



The main Data Scientist tools are the programming languages Python and R.



R is a specialized programming language for statistical computing, which is why statisticians and data scientists are so fond of it. It lets you quickly load a data set, calculate basic statistical characteristics, visualize data, and build data models.



Python, although it is a general-purpose programming language, has a huge number of high-quality libraries and platforms for Data Science and Machine Learning.



Remarkably, 39% of vacancies require knowledge of both R and Python, so it is better to learn both languages rather than try to choose one of them.
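A share like this can be computed once each vacancy has been reduced to a set of required skills. The sketch below is only an illustration under that assumption: the sample data and the function name share_requiring_all are hypothetical, and the result does not reproduce the 39% figure from the study.

```python
# Hypothetical per-vacancy skill sets, assumed to be already extracted
# from the vacancy texts.
VACANCY_SKILLS = [
    {"python", "r", "sql"},
    {"python", "spark"},
    {"r", "statistics"},
    {"python", "r"},
    {"hadoop", "java"},
]


def share_requiring_all(vacancy_skills, required):
    """Return the fraction of vacancies that list every skill in `required`."""
    required = set(required)
    if not vacancy_skills:
        return 0.0  # avoid division by zero on an empty data set
    matches = sum(1 for skills in vacancy_skills if required <= skills)
    return matches / len(vacancy_skills)


both_share = share_requiring_all(VACANCY_SKILLS, {"python", "r"})
print(f"Vacancies requiring both R and Python: {both_share:.0%}")
```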



For big data, employers prefer Hadoop and Spark. Among databases, MySQL and MongoDB are popular.







Data Scientist Soft Skills



Compared with professional skills, general competencies (soft skills) are in less demand: they are mentioned less than half as often in vacancies. The average salary for vacancies that require soft skills is also significantly lower, by roughly 20%, than for those that require hard skills and knowledge of technologies.



Among the soft skills that do appear, the most important are the ability to communicate, visualize data, give presentations, and write and speak effectively. Teamwork, management, and problem-solving skills are also useful.







Data Scientist Domain Knowledge



Some vacancies require domain knowledge, ranging from physics and biology to real estate and the hotel business. The leaders here are economics, marketing, and medicine.







Data Scientist Specializations



Before starting the study, we intended to identify sub-specializations within the Data Scientist profession: for example, to separate those who mainly analyze and visualize data from those who build models for predictive analytics or machine learning algorithms. But as the data analysis showed, the requirements in most vacancies are fairly uniform, and there is no clear division into specialties.



Still, some patterns are interesting. For example, if a vacancy requires knowledge of Python or C++, it is unlikely to also require communication and management skills, and vice versa.
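One way to quantify such a pattern is the phi coefficient of a 2x2 contingency table: values near -1 mean two skills rarely appear together, values near +1 mean they usually do. The sketch below is an assumption about how this could be measured, not the method used in the study; the sample data and the function name phi_coefficient are hypothetical.

```python
import math

# Hypothetical per-vacancy skill sets.
SAMPLE_VACANCIES = [
    {"python", "statistics"},
    {"python", "c++"},
    {"python"},
    {"python", "communication"},
    {"communication", "management"},
    {"communication", "r"},
    {"communication"},
    {"sql"},
]


def phi_coefficient(vacancy_skills, skill_a, skill_b):
    """Phi coefficient between the presence of two skills across vacancies."""
    n11 = n10 = n01 = n00 = 0
    for skills in vacancy_skills:
        has_a, has_b = skill_a in skills, skill_b in skills
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0:
        return 0.0  # undefined when one skill is always or never present
    return (n11 * n00 - n10 * n01) / denominator


phi = phi_coefficient(SAMPLE_VACANCIES, "python", "communication")
print(f"phi(python, communication) = {phi:.2f}")
```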



The impact of technology on wages



O'Reilly's 2015 Data Science Salary Survey lets us look at the labor market from the other side. The study is based on a survey of 600 Data Scientists, and the data collected includes salaries, demographic information, and the amount of time respondents spend on different types of tasks. The key findings of this study are as follows:





We recommend reading the entire report. Among other things, it describes a mathematical model of how a Data Scientist's salary depends on where they live, what education they have, and what tasks they work on. For example, Data Scientists who spend more time in meetings earn more, while those who spend more than 4 hours a day exploring data earn less.



How to study Data Science?



In recent years, many online courses on this topic have appeared, and they are a very good way to start!



If you are more inclined towards data analysis, a good option is the Data Science specialization on Coursera: Launch Your Career in Data Science. Earning the specialization is not free, but if you do not need a certificate, you can take all of its courses for free: just note the name of each course and use the search to find it.



For those interested in Machine Learning, we can recommend the course by Andrew Ng, Chief Scientist at Baidu Research, who also teaches part-time at Stanford and is a co-founder of Coursera: Machine Learning.



What is Data Science?



Data Science is a new field, so the requirements for Data Scientists are not yet fully settled. Given how quickly things change today, Data Science may never become a separate profession taught at universities and may instead remain a set of practices and skills. But these are exactly the practices and skills that will be in great demand in the coming years.

Source: https://habr.com/ru/post/271085/


