How we develop # bigdataX5 and who is needed in Big Data

Our team in a short time has gone from a dozen employees to a whole subdivision of almost 200 people, and we want to share some milestones from this path. Plus, we will speculate on who exactly is needed in big data and what is the real threshold of entry.

The recipe for success in the new field

Working with big data is a relatively new technological area, which, like everything, as it progresses goes through the maturation cycle.

From the point of view of a particular specialist, work in the technological field at each stage of this cycle has its advantages and disadvantages.
')
Stage 1. Implementation

At the first stage, this is the brainchild of R & D units, which still does not give real profit.

From the pros: a lot of money is invested in it. Along with investments, there are growing hopes for solving previously inaccessible tasks and returning investments.

Disadvantages: any technology, no matter how promising it looks at the start, has its limitations: it cannot be used to eliminate all existing problems. These limits are detected as experiments with a new idea, which leads to a cooling of interest in technology after the so-called “peak of high expectations”.

Stage 2. Growth

The real take-off will be only for the technology, which will overcome the subsequent depression of disappointments due to its real possibilities, and not marketing noise.

Pros: at this stage, technology attracts long-term investments: not only money, but time specialists in the labor market. When it becomes clear that this is not just a hyip, but a new approach or even a market segment, it's time for the specialists to get built into the “trend”. This is an ideal moment for the development of promising technologies in terms of career takeoff.

Cons: at this stage, the technology is still poorly documented.

Stage 3. Maturity

Mature technology is the real workhorse of the market.

Pros: as they grow older, the volume of accumulated documentation grows, trainings and courses appear, it becomes easier to enter the technology.

Cons: at the same time increasing competition in the labor market.

Stage 4. Recession

Stage of recession (sunset) comes in all technologies, although they continue to work.

Pros: by this moment the technology is already fully described, the boundaries are clear, a huge amount of documentation and courses are available.

Cons: in terms of obtaining new knowledge and prospects, it is no longer so attractive. In essence, this is an accompaniment.

The growth stage is most attractive for anyone who wants to start working in a new technological field: both for young professionals and for already established professionals from related segments.

The development of big data is now at this stage. Inflated expectations were left behind. Business has already proven that it is possible to make a profit from big data, and therefore ahead of the productivity plateau. This moment gives an excellent chance to specialists in the labor market.

Our story big data

The introduction of technology in any single company essentially repeats the overall cycle of maturity. And our experience here is quite typical.

We started collecting our big data team in X5 a year and a half ago. Then it was only a small group of key specialists, and now we have almost 200 people.

Our project teams went through several evolutionary stages, as we gained a deeper understanding of roles and tasks. As a result, we have our own team format. We stopped at the agile approach. The main idea is that the team has all the competencies to solve the problem, and how exactly they are distributed among the experts is not so important. On this basis, the composition of the roles of the teams was formed gradually, including taking into account the maturation of technology. And now we have:

Product Owner (product owner) - has an understanding of the subject area, formulates a general business idea and predicts how it can be monetized.
Business Analytic (business analyst) - is working on this task.
Data quality (data quality specialist) - checks whether existing data can be used to solve the problem.
Directly Data Science / Data Analytic (data scientist / data analyst) - builds mathematical models (there are different subspecies, including those working only with spreadsheets).
Test Managers (testers).
Developers (developers).

In our case, infrastructure and data are used by all teams, and the following roles are implemented for teams as services:
Infrastructure (infrastructure).
ETL (data loading command).

How we came to the dream team

Dream-ne'dream, but, as I said, the composition of teams changed under the influence of the maturity of big data analytics and its penetration into the daily routine of X5 and our retail chains.

“Quick start” - minimum roles, maximum speed

The first team included only two roles:

Product Owner proposed a model, gave recommendations.
Data Analytic - collected statistics based on existing data.

All quickly planned and manually implemented in the business.

“And do we think?” - we learned to understand business and produce the most useful result.

New roles have appeared for interaction with business:

Business Analytic - described the requirements for processes.
Data Quality - performed data consistency check.
Depending on the task, Data Analytic / Data Scientist analyzed data statistics / performed model calculations on a local workstation.

“Need more resources” - local settlement tasks moved to the cluster and began to touch external systems

To support scaling required:

The infrastructure that raised the HADOOP server.
Developers - they implemented integration with external IT-systems, and the user interfaces at this stage checked themselves.

Now Data Analytic / Data Scientist could check several options for calculating the model on a cluster, although the manual implementation in the business is still preserved.

“Loads continue to grow” - new data appears, new capacities are required to process them

These changes could not but be reflected in the team:

The infrastructure has developed a cluster of HADOOP under growing loads.
The ETL team began regular downloads and updates.
Appeared testing functionality.

“Automation in everything” - the technology got accustomed, it’s time to automate the implementation in business

At this stage, DevOps appeared in the team, which set up automatic building, testing and installation of functionality.

Key thoughts on team building

1. It’s not a fact that everything would work out if we didn’t initially have the right specialists around whom we were able to build a team. This is the skeleton on which the muscles began to grow.

2. The big data market is quite green, so there are not enough “ready” specialists for each of the roles. Of course, it would be very convenient to recruit a whole subdivision of senior-s, but, obviously, it is impossible to build a lot of such “star” teams. We decided not to chase only “ready” frames. As we have already mentioned, adhering to agile, we only need to ensure that the team as a whole has enough competences to solve a specific task. In other words, we can take (and take) into one team of professionals and beginners with a certain technical and mathematical base, so that they together form a set of competencies necessary to achieve the desired results.

3. Each of the roles implies an understanding of the principles of working with big data, but demanding its own depth of understanding. The greatest variation in the roles that have direct analogies in the classical development - testers, analysts, etc. For them, there are both tasks where belonging to big data is almost imperceptible, and tasks in which you have to dive a little deeper. Anyway, to start a career, a certain experience, an understanding of IT, a desire to learn and some theoretical knowledge about the tools used (which can be obtained by reading articles) is enough.

4. Practice has shown that in spite of the fact that the technology is well-known and many would like to do it, not every specialist who would be suitable to start a career in big data (and would like to work there deep down) really tries to come here .
Many excellent candidates believe that working in BigData teams is strictly Data Science. What is the cardinal change of activity with a high threshold of entry. However, they underestimate their competencies or simply do not know that people of different profiles are in demand in big data, and it would be easier to start a career in an alternative role - any of the ones listed above.

a. In fact, to start working in a mixed team in many roles, you do not need a narrow specialized education in the field of big data.

b. We actively expanded the team, adhering to the idea of building mixed structural units. And the most interesting thing is that the people who came to our tasks, who had never worked with big data before, perfectly got accustomed to the company, having coped with the tasks. They were able to learn the practice of big data in a short time.

5. Even without great experience, you can dive deeper, learn the necessary languages and tools, being motivated to grow in this segment in order to engage in more strategic tasks within the project. And the accumulated experience helps to move to those roles where knowledge in big data is required and an understanding of the logic of work in this area. By the way, in this sense a mixed team helps a lot to speed up development.

How to get into BigData?

In our case, the idea of balanced teams from specialists of different levels “took off” - the group implemented more than one internal project. It seems to me, given the shortage of ready personnel and the growing need of business in similar teams, other companies will come to the same scenario.

If you seriously want to choose this direction, dive specifically in Data Sciense - Kagle, ODS and other specialized resources will help you. Moreover, if you don’t see yourself soon in the role of Data Scientist, but the direction itself is interesting for you, you still need Big Data!

To increase your value:

Update your math knowledge. To solve ordinary problems big data does not require a doctoral degree, but basic knowledge of higher mathematics is still needed. Understanding the mechanisms underlying the math statistics, it will be easier for you to understand the processes;
Choose the roles that are closest to your current specialty. Find out what tasks you will face in this role (and in a particular company where you want to go). And if you solved similar problems earlier, they should be emphasized in the resume;
tools that are specific to the chosen role are very important, even if it seems that this has nothing to do with big data. For example, with the development of our internal solution, it turned out that we need a lot of front-end developers who work with complex interfaces;
Remember that the market is actively developing. Someone builds and pumps teams inside, and someone expects to find ready-made specialists in the labor market. If you are a beginner, try to get into a strong team, where there will be an opportunity to gain additional knowledge.

PS By the way, right now we are continuing to grow actively and are looking for a data engineer , testing specialist , developer of React , and a UI / UX specialist . May 10-11, we will discuss including work in # bigdatax5 with everyone at our booth at DataFest .

Source: https://habr.com/ru/post/450930/

All Articles

How we develop # bigdataX5 and who is needed in Big Data

The recipe for success in the new field

How to get into BigData?

More articles: