
8 skills you need to work as a Data Scientist

Svetlana Shapovalova, editor of the Netology blog, adapted an article by Dave Holtz in which he describes eight skills that will help you start a career as a Data Scientist.

Interested in the Data Scientist profession? It's time to start learning: in an article for the Harvard Business Review, Thomas Davenport and D.J. Patil, well-known leaders in the field, called data scientist "the sexiest job of the 21st century".

But how do you become a data scientist? Most sources give the impression that you need at least an advanced degree spanning many fields: software development, data processing, databases, statistics, machine learning, and data visualization.


Do not worry. Experience shows that this is not the case. You do not need to absorb everything about data and pick up every skill as fast as possible; that can drag on for half a lifetime. Instead, learn to read job descriptions carefully. That lets you apply for positions for which you already have the necessary skills, or develop the specific data skills needed to land the job you want.

I will describe eight important skills for a Data Scientist.

Basic toolkit


Whatever company you join, you will be expected to know the standard tools of the trade: a programming language for statistical computing, such as R or Python, and a database query language, such as SQL.
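To give a feel for how the two tools work together, here is a minimal sketch that runs an SQL aggregation from Python using the standard library's sqlite3 module. The table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical sample data: (customer, city, order amount).
ORDERS = [
    ("alice", "New York", 120.0),
    ("bob", "Boston", 80.0),
    ("carol", "New York", 60.0),
    ("dave", "Boston", 40.0),
    ("erin", "Chicago", 200.0),
]


def revenue_by_city(rows):
    """Load rows into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, city TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        query = """
            SELECT city, COUNT(*) AS n_orders, SUM(amount) AS revenue
            FROM orders
            GROUP BY city
            ORDER BY revenue DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for city, n_orders, revenue in revenue_by_city(ORDERS):
        print(f"{city}: {n_orders} orders, revenue {revenue:.2f}")
```

The same query would run largely unchanged against a production database; only the connection step is specific to SQLite.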

Basic knowledge of statistics


A basic understanding of statistics is vital in Data Science. One interviewer complained to me that most of the candidates he interviewed could not even give a clear definition of a p-value. You need to understand statistical tests, distributions, maximum likelihood estimation, and so on.
Recall what you were taught in your statistics classes. You will also need it when working with machine learning.
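As a reminder of what a p-value is: it is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one actually observed. The sketch below estimates it with a permutation test for a difference in means. This is one possible illustration, not a method from the original article; the function name and parameters are chosen for this example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Null hypothesis: group labels are exchangeable, so any reshuffling of
    the pooled data is as likely as the split that was actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / n_permutations


if __name__ == "__main__":
    control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
    variant = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]
    print("estimated p-value:", permutation_p_value(control, variant))
```

A small p-value means the observed difference would be rare if the groups did not really differ; it is not the probability that the null hypothesis is true.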

Most important of all, however, is to understand when to use which approach.

Knowledge of statistics is needed everywhere, but it is especially important in companies that are fully focused on working with data, where stakeholders make decisions based on the data they are given.

Machine learning


Machine learning methods are useful when working with large volumes of data and in companies whose product is built entirely on data. This means you will have to learn what the well-known machine learning terms mean: k-nearest neighbors, random forests, ensemble methods.

Many of these methods can be implemented entirely with R or Python libraries, so you do not have to reinvent the wheel unless you are a leading specialist with a worldwide reputation.
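Library implementations are the practical choice, but a tiny from-scratch sketch can help show what one of these terms means. Below is k-nearest neighbors in plain Python: a new point gets the label that is most common among its k closest training points. The data and the function name are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training label with its Euclidean distance to the query.
    distances = sorted(
        ((math.dist(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-dimensional points in two well-separated groups.
    points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    labels = ["small", "small", "small", "large", "large", "large"]
    print(knn_predict(points, labels, (2.0, 2.0)))
    print(knn_predict(points, labels, (8.0, 7.0)))
```

This version scans every training point for each query, which is fine for a demonstration but slow on large datasets.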

What matters more is the ability to see the big picture and to understand when a particular method is appropriate.

Multivariable calculus and linear algebra


You will most likely be asked for examples of results you achieved with machine learning or statistics at your previous job. If not, the interviewer may ask questions about multivariable calculus or linear algebra, since they form the basis of many methods.

You may ask why you need to understand this material when there are plenty of ready-made implementations in sklearn or R. The point is that if the development team decides at some point to build its own implementation, this knowledge will be very useful to you.
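As one illustration of where this math shows up in a hand-written implementation, here is a minimal sketch of linear regression fitted by gradient descent. The update rule comes straight from the partial derivatives of the mean squared error with respect to the slope and the intercept. The function name, learning rate, and step count are assumptions chosen for this toy example.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Residuals of the current model on every data point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) w.r.t. w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Hypothetical noise-free data generated from y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    w, b = fit_linear_regression(xs, ys)
    print(f"w = {w:.4f}, b = {b:.4f}")
```

The learning rate has to be small enough for the iteration to converge; the value used here suits this particular dataset and would need tuning for others.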

Understanding these concepts is especially important in companies where the product is driven by data and small improvements in predictive performance or algorithm optimization can lead to huge gains.

Data processing


The data you analyze is often messy, which makes it difficult to work with. It is therefore important to know how to deal with its imperfections: missing values, inconsistent string formatting (for example, “new york” and “ny” instead of “New York”), inconsistent date formatting ('01/01/2014' instead of '2014-01-01'), and so on. This skill is important both in small companies that are just starting to work with data and in data-driven companies.
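The kinds of cleanup mentioned above can be sketched with the Python standard library alone. The alias table, the list of accepted date formats, and the record layout below are assumptions made for this example. In particular, '01/01/2014' is read here as month/day/year, which real data may not guarantee.

```python
from datetime import datetime

# Hypothetical mapping of spelling variants to one canonical city name.
CITY_ALIASES = {"ny": "New York", "new york": "New York"}

# Date formats this sketch knows how to read (an assumption about the input).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def clean_city(raw):
    """Return a canonical city name, or None for a missing value."""
    if raw is None or not raw.strip():
        return None
    text = raw.strip()
    return CITY_ALIASES.get(text.lower(), text)


def clean_date(raw):
    """Return the date as 'YYYY-MM-DD', or None if missing or unparseable."""
    if raw is None or not raw.strip():
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


def clean_record(record):
    """Normalize the 'city' and 'date' fields of one raw record."""
    return {
        "city": clean_city(record.get("city")),
        "date": clean_date(record.get("date")),
    }


if __name__ == "__main__":
    raw_records = [
        {"city": "New York", "date": "2014-01-01"},
        {"city": " ny ", "date": "01/01/2014"},
        {"city": "", "date": None},
    ]
    for record in raw_records:
        print(clean_record(record))
```

Missing values are kept as None rather than guessed, so later analysis can decide explicitly how to handle them.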

Data visualization and communication


Visualizing and communicating data is incredibly important, especially in young companies that are making data-based decisions for the first time, or in companies where the data scientist is the person who helps others make decisions based on data.

Communicating data means describing your findings or working methods to both technical and non-technical audiences.

As for data visualization, it is worth getting to know tools such as ggplot and d3.js. It is important not only to learn how to use visualization tools, but also to understand the principles of visually encoding data and conveying information.

Software development


If you are interviewing at a small company and will be one of its first data specialists, software development experience will definitely help. You will be responsible for handling large amounts of data and possibly for developing data-driven products.

Thinking in terms of data


Companies want to know that you can solve problems using data.

This means that at some point in the interview you may be asked about a high-level problem, for example a test the company wants to run or a product it may want to develop. You need to understand what matters and what does not. How would you, as a Data Scientist, interact with developers and product managers? What methods would you use?

Data science is still in its infancy and does not yet have clear boundaries. To get a job, it matters more to find a company whose needs match your skills than to develop skills aimlessly. Of course, these are just my personal impressions.

Source: https://habr.com/ru/post/329234/

