
Full Stack Data Analyst

"Data analysis" is often organized as follows: on one side are the developers of the data warehouse (DWH), on the other are the analysts. The DWH people know SQL, and the analysts know how to work with Excel. If something needs to be analyzed, you go to the analysts, and they go to the DWH team for the data. It seems logical, and many regard it as a normal division of labor. In this article I want to argue that this division of labor is a mistake and greatly reduces the efficiency and productivity of the entire data analysis process.


A typical cycle of work on an analytical task is as follows:


  1. The business comes with a problem and asks for an answer.
  2. Analysts discuss with the business what needs to be done.
  3. Analysts figure out what the business wants from them and roughly what data they will need.
  4. Analysts write a request to the DWH team to get the data.
  5. The DWH team takes the request, reads it, asks questions, clarifies, extracts the data, and hands it over.
  6. Analysts realize that they did not ask for everything, or that they were misunderstood, and write another request to the DWH team.
  7. The DWH team takes the request, reads it, asks questions, clarifies, extracts the data, and hands it over.
  8. Analysts realize that they did not ask for everything, or that they were misunderstood, and write yet another request to the DWH team.
  9. Repeat steps 7 and 8.

One day, the DWH team says that it cannot provide the data or is not ready to process so many requests from analysts. In response, analysts start keeping their data outside the DWH, in assorted Excel files. There they begin to assemble their own ETL processes as best they can, based on what they can get from the DWH “without a fight.”


What we end up with:


  1. The DWH does not adequately cover the needs of its consumers (while from the DWH side it looks as if the users do not know what they want).
  2. Analysts start writing poor ETL processes and create pseudo-DWHs that rival the real one in data volume, but lack backups and access control, perform badly, and so on.
  3. The interaction between the DWH team and the analysts suffers, because the former do not care about the business, and the latter do not want to decipher technical jargon.
  4. Getting an answer to a business question takes longer, because data processing has become a pile of manual work outside the DWH. And why did we build the DWH, if not to have a single storage?
  5. Small changes in the problem statement from the business restart the data analysis cycle almost from scratch, because the DWH again shows no flexibility, and the analysts do not have the data in the new breakdown.

What could be the solution? If you want to get rid of the interaction problem between the DWH team and the analysts, you must bring the competencies of both together. A person who combines these competencies can be called a Full Stack Data Analyst.


What should such a Full Stack Data Analyst be able to do?


  1. Work with raw data sources and understand how the data storage is organized.
  2. Formulate what needs to be changed in the warehouse in terms of data content: what data to add and how to process it methodologically, so that hardcore DWH developers can implement it.
  3. Understand business needs, discuss requirements, and help the customer, internal or external, formulate the problem and a solution to it.
  4. Design an analytical solution, i.e. understand how to solve the problem, what data is needed, what has to be “invented”, and what assumptions have to be made.
  5. Visualize the results and report them to customers (internal or external).
  6. Produce a “reproducible” study, that is, an analysis that can always be repeated on the same data with the same result. This requires working with R, Python, or another system that lets you formalize the analysis process.
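As an illustration of the last point, here is a minimal Python sketch of what a “reproducible” analysis might look like. It is not taken from any particular project: the function names, the `revenue` field, the bootstrap method, and the sample rows are all assumptions made for the example. The idea is only that the whole analysis is a function of its input data, any randomness uses a fixed seed, and the report records a fingerprint of the data it was computed from.

```python
import hashlib
import json
import random
import statistics


def data_fingerprint(rows):
    """Hash the input rows so a report can state exactly which data it used."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def bootstrap_mean(values, n_resamples=1000, seed=42):
    """Mean with a percentile bootstrap interval; the fixed seed makes it repeatable."""
    rng = random.Random(seed)  # local generator, independent of global random state
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(statistics.fmean(sample))
    means.sort()
    low = means[int(0.025 * n_resamples)]
    high = means[int(0.975 * n_resamples) - 1]
    return statistics.fmean(values), low, high


def run_analysis(rows):
    """The whole analysis as one function: same rows in, same report out."""
    values = [row["revenue"] for row in rows]
    mean, low, high = bootstrap_mean(values)
    return {
        "fingerprint": data_fingerprint(rows),
        "n": len(values),
        "mean": mean,
        "ci95": (low, high),
    }


# Hypothetical sample data, invented for illustration only.
ROWS = [
    {"order_id": 1, "revenue": 120.0},
    {"order_id": 2, "revenue": 80.0},
    {"order_id": 3, "revenue": 200.0},
    {"order_id": 4, "revenue": 150.0},
    {"order_id": 5, "revenue": 50.0},
]

if __name__ == "__main__":
    print(run_analysis(ROWS))
```

Running `run_analysis(ROWS)` twice returns identical reports, and the fingerprint changes as soon as any input row changes, which makes it easy to tell whether two results were computed from the same data.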

If you combine technical and analytical competencies in one analyst, you get a really solid employee who can solve a problem end to end. This is very important for analytical tasks, since only such an analyst understands both what they are doing and why. The division into those who “analyze” and those who “process data” leaves each of these employees incomplete: the analyst is as if without hands, unable to obtain and process data at scale, and the data engineer is as if “without a brain”, not thinking about how the data will be used or what the hypotheses are.


The division of labor is very important, but it must run along a slightly different line. The analyst should be able to get everything needed for the analysis, and the task of the Data Engineer is to build systems that efficiently provide data in any breakdown that may interest the analyst. For the Data Engineer, this means that the data should be stored in a fairly flexible yet easy-to-use form: partly denormalized, partly accessible via cubes, partly pre-aggregated and precomputed.


And if you cannot find a Full Stack Analyst, then at least include a Data Engineer in the team of analysts, so that the competence in working with data is not moved out of the analysis into an external service.


It is not a data analyst’s job to maintain data extraction from the Google AdWords API, but neither is it a Data Engineer’s job to write SELECTs to get revenue data for the past month.
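To make the second half of that sentence concrete, here is a sketch of the kind of query an analyst could write without involving anyone else. It uses Python's built-in `sqlite3` module with an in-memory database standing in for the warehouse; the `orders` table, its columns, and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3
from datetime import date, timedelta


def previous_month_bounds(today):
    """Return [start, end) dates covering the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    last_of_prev_month = first_of_this_month - timedelta(days=1)
    return last_of_prev_month.replace(day=1), first_of_this_month


def revenue_last_month(conn, today):
    """Sum order amounts for the previous calendar month (0 if there are none)."""
    start, end = previous_month_bounds(today)
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders "
        "WHERE order_date >= ? AND order_date < ?",
        (start.isoformat(), end.isoformat()),  # ISO dates compare correctly as text
    ).fetchone()
    return row[0]


# Hypothetical stand-in for the warehouse: an in-memory table with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2018-08-31", 40.0),
        ("2018-09-01", 100.0),
        ("2018-09-15", 250.5),
        ("2018-09-30", 49.5),
        ("2018-10-01", 75.0),
    ],
)

print(revenue_last_month(conn, date(2018, 10, 26)))  # prints 400.0
```

Passing the reference date in explicitly, rather than calling `date.today()` inside the function, keeps the result repeatable when the same analysis is rerun later.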



Source: https://habr.com/ru/post/427999/

