"Data analysis" is often organized as follows: here we have the developers of the repository, and here we have the analysts. In the DWH (data warehouse, storage) can SQL, and analysts, we know how to work with Excel. If we need to analyze something, then go to the analysts, and they follow the data to the DWH for the data. It seems to be logical. And many perceive that this is a normal division of labor. In this article I want to convey the idea that this division of labor is erroneous and grandly reduces the efficiency and productivity of the entire data analysis process.
A typical cycle of work on an analytical task is as follows:
One day, the DWH team says that it cannot provide the data, or that it is not ready to process so many requests from analysts. In response, the analysts start keeping their data outside the DWH, in assorted Excel files. There they assemble their own ETL processes as best they can, based on whatever they can get from the DWH “without a fight.”
What we have in the end:
What could the solution be? If you want to get rid of the interaction problem between the DWH team and the analysts, you have to bring the competencies of both together. A person who combines these competencies can be called a Full Stack Data Analyst.
What should such a Full Stack Data Analyst be able to do?
If you combine technical and analytical competencies in one analyst, you get a really strong employee who can solve a problem end to end. This is very important for analytical tasks, because only such an analyst understands what is being done and why. Splitting people into those who “analyze” and those who “process data” handicaps both: the analyst is left “without hands,” unable to obtain or process anything at scale, and the data engineer is left “without brains,” not thinking about how the data will be used or what the hypotheses are.
The division of labor is very important, but it should run along a slightly different line. The analyst should be able to get everything needed for the analysis, and the Data Engineer's task is to build systems that efficiently serve data in whatever slices may interest the analyst. For the Data Engineer, this means the data should be stored in a form that is flexible yet easy to use: partly denormalized, partly accessible through cubes, partly pre-aggregated and precomputed.
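As a rough illustration of what "pre-aggregated" can mean in practice, here is a minimal sketch in Python with the standard `sqlite3` module. The `orders` table, its columns, and the `daily_revenue` table are hypothetical names invented for this example; a real warehouse would use its own schema and engine.

```python
import sqlite3


def build_daily_revenue(conn):
    """Materialize a pre-aggregated table from the (hypothetical) raw orders table."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS daily_revenue;
        CREATE TABLE daily_revenue AS
        SELECT order_date,
               channel,
               COUNT(*)     AS order_count,
               SUM(revenue) AS revenue
        FROM orders
        GROUP BY order_date, channel;
        """
    )


# In-memory database with a few made-up rows, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, order_date TEXT, channel TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_date, channel, revenue) VALUES (?, ?, ?)",
    [
        ("2018-10-01", "search", 100.0),
        ("2018-10-01", "search", 50.0),
        ("2018-10-01", "email", 30.0),
        ("2018-10-02", "search", 70.0),
    ],
)

build_daily_revenue(conn)

# The analyst reads the small aggregated table instead of scanning raw orders.
rows = conn.execute(
    "SELECT order_date, channel, order_count, revenue "
    "FROM daily_revenue ORDER BY order_date, channel"
).fetchall()
```

The point of the split is that the engineer owns `build_daily_revenue` and keeps it fast and fresh, while the analyst is free to slice the resulting table however the current hypothesis requires.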
And if you cannot find a Full Stack Analyst, then at least include a Data Engineer in the analyst team, so that the competence in working with data is not handed off from the analysis to an external service.
It is not the data analyst's job to maintain data retrieval from the Google AdWords API, but neither is it the Data Engineer's job to write SELECT queries to get the revenue for the past month.
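To make the analyst's side of that boundary concrete, here is a small sketch of the kind of query an analyst should be able to write without help. It again uses Python's standard `sqlite3` module, and the `orders` table with `order_date` (ISO date text) and `revenue` columns is an assumption made up for the example.

```python
import sqlite3
from datetime import date


def previous_month_bounds(today):
    """Return (first day of previous month, first day of current month)."""
    first_of_this_month = today.replace(day=1)
    if first_of_this_month.month == 1:
        first_of_prev = date(first_of_this_month.year - 1, 12, 1)
    else:
        first_of_prev = date(
            first_of_this_month.year, first_of_this_month.month - 1, 1
        )
    return first_of_prev, first_of_this_month


def revenue_last_month(conn, today):
    """Sum revenue over the calendar month preceding `today`."""
    start, end = previous_month_bounds(today)
    row = conn.execute(
        "SELECT COALESCE(SUM(revenue), 0.0) FROM orders "
        "WHERE order_date >= ? AND order_date < ?",
        (start.isoformat(), end.isoformat()),
    ).fetchone()
    return row[0]


# Made-up data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2018-09-30", 10.0),   # month before last, excluded
        ("2018-10-01", 100.0),  # included
        ("2018-10-31", 50.0),   # included
        ("2018-11-01", 70.0),   # current month, excluded
    ],
)

total = revenue_last_month(conn, date(2018, 11, 15))
```

Using a half-open date range (`>= start` and `< end`) avoids having to know how many days the previous month had, and ISO-formatted date strings compare correctly as text.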
Source: https://habr.com/ru/post/427999/