Evaluating processes in a development team based on objective data.
Software development is often considered a poorly measurable process, and it seems that to manage it effectively you need a special flair. And if intuition and emotional intelligence are not well developed, deadlines inevitably slip, product quality sinks, and delivery speed drops. Sergey Semenov believes this happens mainly for two reasons.
There are no tools and standards for evaluating the work of programmers. Managers have to fall back on subjective assessment, which in turn leads to errors.
Automatic monitoring of processes in the team is not used. Without proper monitoring, processes in development teams stop doing their job, because people start following them only partially or ignore them altogether.
As a solution, he proposes an approach to assessing and monitoring processes based on objective data.
Below are the video and the text version of Sergey's talk, which, according to the audience vote, took second place at Saint TeamLead Conf.
About the speaker: Sergey Semenov (sss0791) has worked in IT for 9 years; he has been a developer, team lead, and product manager, and is now the CEO of GitLean. GitLean is an analytics product for managers, technical directors, and team leads, designed to help them make objective management decisions. Most of the examples in this talk are based not only on personal experience, but also on the experience of client companies with development staffs of 6 to 200 people.
My colleague Alexander Kiselyov and I already talked about evaluating developers in February at the previous TeamLead Conf. I will not dwell on that in detail, but I will refer to that article for some of the metrics. Today we will talk about processes and how to measure and control them.
Data sources
If we are talking about measurements, it would be good to understand where to get the data. First of all, we have:
Git, with information about the code;
Jira or any other task tracker, with information about tasks;
GitHub, Bitbucket, or GitLab, with code review information.
In addition, there is such a useful mechanism as collecting various subjective assessments. The caveat is that it has to be done systematically if we want to rely on this data.
Of course, there is dirt and pain waiting for you in the data; there is nothing you can do about that, and it is not so scary. The most annoying thing is that data about how your processes work often simply does not exist in these sources. This happens when the processes were built in a way that leaves no artifacts in the data.
The first rule we recommend following when designing and building processes is to make them leave artifacts in the data. We need to build not just Agile, but measurable Agile.
I'll tell you a scary story we ran into at one of our clients, who came to us asking to improve product quality. To give you a sense of scale: around 30-40 bugs a week were coming in from production for a team of 15 developers. We started digging into the causes and found that 30% of tasks never reached the "testing" status. At first we thought it was just a data error, or that testers were not updating task statuses. But it turned out that 30% of tasks really were not being tested at all. At some point there had been an infrastructure problem because of which one or two tasks per iteration did not make it into testing. Then everyone forgot about the problem, testers stopped mentioning it, and over time it grew into 30%. In the end this led to much bigger problems.
So the first important requirement for any process is that it leaves data behind. Be sure to watch for this.
Sometimes, for the sake of measurability, you have to sacrifice some Agile principles and, for example, prefer written communication to oral in some places.
The due date practice has proven itself very well. We introduced it in several teams in order to improve predictability. Its essence is this: when a developer takes a task and drags it into "in progress", he must set a due date by which the task will be either released or ready for release. This practice teaches the developer to be a kind of micro project manager of his own tasks, that is, to take external dependencies into account and to understand that a task is done only when the client can use its result.
For the learning to actually happen, if the due date is missed, the developer has to go to Jira, set a new due date, and leave a comment in a pre-agreed format explaining why it happened. It may look like pointless bureaucracy, but after two weeks of this practice we export all such comments from Jira with a simple script and run a retrospective on that material. It produces a pile of insights about why deadlines slip. It works really well, I recommend it.
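As an illustration, here is a minimal sketch of such an export, assuming Jira's REST API and a team convention of prefixing explanation comments with a "[due-date]" marker; the instance URL, credentials, JQL filter, and marker are all assumptions that would need to match your own setup.

```python
import requests

JIRA_URL = "https://your-company.atlassian.net"   # hypothetical Jira instance
AUTH = ("bot@example.com", "api-token")           # hypothetical credentials

# Hypothetical JQL: issues whose due date changed in the last two weeks.
JQL = "project = APP AND duedate changed AFTER -14d"

def fetch_due_date_comments():
    """Collect comments explaining missed due dates, marked by a team convention."""
    resp = requests.get(
        f"{JIRA_URL}/rest/api/2/search",
        params={"jql": JQL, "fields": "summary,comment", "maxResults": 100},
        auth=AUTH,
    )
    resp.raise_for_status()
    notes = []
    for issue in resp.json()["issues"]:
        for comment in issue["fields"]["comment"]["comments"]:
            # Assumed convention: explanation comments start with a "[due-date]" marker.
            if comment["body"].startswith("[due-date]"):
                notes.append((issue["key"], issue["fields"]["summary"], comment["body"]))
    return notes

if __name__ == "__main__":
    for key, summary, body in fetch_due_date_comments():
        print(f"{key} | {summary}\n  {body}\n")
```

Dumping the result into a document before the retrospective is usually enough; no dashboards are required.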
Starting from problems
When measuring processes we follow this approach: start from the problems. We imagine some ideal practices and processes, and then get creative about the ways in which they might fail to work.
It is violations of processes that need to be monitored, not how closely we follow a particular practice. Processes usually stop working not because people maliciously violate them, but because a developer or manager does not have enough attention and memory to follow all of them. By tracking rule violations we can automatically remind people about what needs to be done, and we get automatic control.
To understand which processes and practices need to be put in place, you have to understand why this should be done in the development team and what the business needs from development. Everyone understands that what is needed is not that much:
that the product is delivered within an adequate, predictable time;
that the product is of decent quality, not necessarily perfect;
that all of this happens fast enough.
That is, predictability, quality, and speed matter. So we will look at every problem and metric in terms of how it affects predictability and quality. We will hardly discuss speed, because of the nearly 50 teams we have worked with in one way or another, only two were in a position to work on speed. To increase speed you need to be able to measure it, and it has to be at least somewhat predictable, which brings us back to predictability and quality.
In addition to predictability and quality, we introduce one more dimension: discipline. By discipline we mean everything that ensures the basic functioning of processes and the collection of the data on which the analysis of predictability and quality problems is based.
Ideally, we want to build the following workflow: data is collected automatically; from that data we build metrics; using the metrics we find problems; and problems are signaled directly to the developer, the team lead, or the team. Then everyone can react to them in time and deal with the problems that were found. I will say right away that it is not always possible to get to crisp signals. Sometimes metrics will remain just metrics that you have to analyze, looking at values, trends, and so on. Sometimes even the data will be a problem: it cannot always be collected automatically and has to be gathered by hand (I will point out such cases separately).
Now let's go through the problems with discipline, predictability, and quality that can arise at each of these stages.
Problems with discipline at the planning stage
There is a lot of material here, so I will focus on the most important points. They may look quite basic, but a very large number of teams run into them.
The first problem that often arises during planning is a trite organizational problem — not everyone who should be there is present at the planning meeting.
Example: the team complains that the tester is testing the wrong things. It turns out that testers in this team never attend planning at all. Or, instead of sitting down and planning, the team frantically searches for a place to sit because it forgot to book a meeting room.
You do not need metrics and signals for this; just please make sure you do not have these problems. The meeting is in the calendar, everyone is invited, the room is booked. However funny it may sound, teams of all kinds run into this.
Now let's discuss the situations where signals and metrics are needed. At the planning stage, most of the signals I will talk about should be sent to the team about an hour after the planning meeting ends, so as not to distract the team during the meeting itself, but while the context is still fresh.
The first disciplinary problem is that tasks have no description, or are described poorly. This is trivial to control. There is a format tasks are supposed to follow, so check whether they do. For example, we check that acceptance criteria are set, or that frontend tasks have a link to the layout. You also need to check that the component field is filled in, because the description format is often tied to the component: one description format makes sense for a backend task, another for a frontend one.
The next frequent problem is that priorities are agreed orally, or not at all, and are not reflected in the data. As a result, by the end of the iteration it turns out that the most important tasks were never done. You need to make sure the team uses priorities, and uses them sensibly. If 90% of the tasks in a team's iteration have high priority, that is the same as having no priorities at all.
We try to arrive at this distribution: 20% high-priority tasks (not releasing them is not an option); 60% medium priority; 20% low priority (not a disaster if they are not released). We set up signals for all of this.
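A minimal sketch of such a signal, assuming the sprint backlog has already been exported as a list of issues with a priority field; the field name, priority labels, and tolerance threshold are illustrative, while the 20/60/20 split comes from the text above.

```python
from collections import Counter

# Target shares agreed with the team: 20% high, 60% medium, 20% low.
TARGET = {"High": 0.20, "Medium": 0.60, "Low": 0.20}
TOLERANCE = 0.10  # allowed deviation before the signal fires

def priority_signal(sprint_issues):
    """Return warnings if the sprint's priority mix drifts too far from the target."""
    counts = Counter(issue["priority"] for issue in sprint_issues)
    total = sum(counts.values()) or 1
    warnings = []
    for priority, target_share in TARGET.items():
        share = counts.get(priority, 0) / total
        if abs(share - target_share) > TOLERANCE:
            warnings.append(f"{priority}: {share:.0%} of tasks (target ~{target_share:.0%})")
    return warnings

# A sprint where almost everything is "High" triggers the signal.
sprint = [{"priority": "High"}] * 9 + [{"priority": "Medium"}]
print(priority_signal(sprint))
```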
The last disciplinary problem at the planning stage is that there is simply not enough data, including for the metrics that come later. The basic cases: tasks have no estimates (a signal should be set up for this) or task types are used inadequately, that is, bugs are filed as ordinary tasks and technical debt tasks are not tracked at all. Unfortunately, the second kind of problem cannot be controlled automatically. We recommend simply looking through the backlog once every couple of months, especially if you are a CTO with several teams, and making sure that people file bugs as bugs, stories as stories, and technical debt tasks as technical debt.
Predictability problems at the planning stage
We turn to problems with predictability.
The basic problem is that we do not fit into deadlines and estimates; we estimate incorrectly. Unfortunately, there is no magic signal or metric that will solve this. The only way is to help the team learn to estimate better, going through concrete examples of why this or that estimate was wrong. And this learning process can be supported with automatic tools.
The first thing you can do is deal with the obviously problematic tasks, the ones with a high execution-time estimate. We set an SLA and check that all tasks are reasonably well decomposed. We recommend starting with a limit of two days of work at most, and later moving towards one-day tasks.
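A minimal sketch of that check over exported issues, assuming estimates are stored in hours; the field name and the example keys are hypothetical, the two-day limit is the one recommended above.

```python
MAX_ESTIMATE_HOURS = 16  # roughly two working days, the starting limit suggested above

def poorly_decomposed(issues):
    """Flag tasks whose estimate exceeds the limit, or that have no estimate at all."""
    flagged = []
    for issue in issues:
        estimate = issue.get("estimate_hours")
        if estimate is None or estimate > MAX_ESTIMATE_HOURS:
            flagged.append(issue["key"])
    return flagged

print(poorly_decomposed([
    {"key": "APP-1", "estimate_hours": 8},
    {"key": "APP-2", "estimate_hours": 40},  # too big, should be decomposed further
    {"key": "APP-3"},                        # no estimate, also worth a signal
]))
```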
The next thing helps collect the material on which you can run that training and take apart with the team why an estimate went wrong. We recommend the due date practice for this; it has proven itself very well.
Another tool is a metric called code churn within a task. Its essence is that we look at what percentage of the code written for the task did not survive until release (see the previous talk for details). This metric shows how well tasks are thought through. Accordingly, it is worth paying attention to tasks with churn spikes and figuring out what we failed to take into account and why the estimate was wrong.
The next story is standard: the team planned something and filled the sprint, but in the end did something entirely different from what it had planned. You can set up signals for mid-sprint additions and priority changes, but for most of the teams we tried this with they turned out to be irrelevant. These are often legitimate actions: the product manager throws something into the sprint or changes a priority, so there will be many false positives.
What can be done here? Calculate fairly standard basic metrics: the completion rate of the initial sprint scope, the number of tasks added mid-sprint, the completion rate of those additions, priority changes, and the structure of the additions. After that, estimate how many tasks and bugs you usually end up throwing into an iteration, and then use a signal to check that you reserve this quota at the planning stage.
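A sketch of those basic scope metrics, assuming we have snapshots of the sprint contents at planning time and at the end of the sprint; the issue keys and data structures are illustrative.

```python
def sprint_scope_metrics(initial_scope, final_scope, done_keys):
    """Basic predictability metrics for one sprint.

    initial_scope / final_scope: sets of issue keys at planning time and at sprint end;
    done_keys: keys that reached a done/released status.
    """
    added = final_scope - initial_scope  # tasks thrown into the sprint after planning
    return {
        "initial_scope_completion": len(initial_scope & done_keys) / len(initial_scope) if initial_scope else 0.0,
        "added_tasks": len(added),
        "added_completion": len(added & done_keys) / len(added) if added else 0.0,
    }

print(sprint_scope_metrics(
    initial_scope={"APP-1", "APP-2", "APP-3", "APP-4"},
    final_scope={"APP-1", "APP-2", "APP-3", "APP-4", "APP-9", "APP-10"},
    done_keys={"APP-1", "APP-2", "APP-9"},
))
# Half of the planned scope was closed, two tasks were added mid-sprint.
```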
Quality problems at the planning stage
The first problem: the team does not think the functionality of released features through. I will talk about quality in a broad sense: there is a quality problem whenever the client says there is. It can be a product-level oversight, or it can be something technical.
For product-level oversights, a metric such as 3-week churn works well: it flags tasks whose churn in the three weeks after release is above the norm. The essence is simple: the task was released, and within three weeks a rather high percentage of its code was deleted. Apparently the task was not thought through well. We catch such cases and go over them with the team.
The second metric is for teams that have problems with bugs, crashes, and quality in general. We suggest building a chart of the bug and crash balance: how many bugs are open right now, how many arrived yesterday, how many were fixed yesterday. You can hang such a real-time monitor right in front of the team so it sees it every day. It is a great way to focus the team on quality problems. We did this with two teams, and they really did start thinking tasks through better.
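A minimal sketch of the numbers behind such a monitor, assuming bugs have been exported with their created and resolved dates; the field names and sample data are illustrative.

```python
from datetime import date, timedelta

def bug_balance(bugs, today=None):
    """The three numbers for a team 'bug balance' monitor."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return {
        "open_now": sum(1 for b in bugs if b.get("resolved") is None),
        "arrived_yesterday": sum(1 for b in bugs if b["created"] == yesterday),
        "fixed_yesterday": sum(1 for b in bugs if b.get("resolved") == yesterday),
    }

bugs = [
    {"created": date(2018, 5, 14), "resolved": None},
    {"created": date(2018, 5, 14), "resolved": date(2018, 5, 14)},
    {"created": date(2018, 5, 10), "resolved": date(2018, 5, 14)},
]
print(bug_balance(bugs, today=date(2018, 5, 15)))
# {'open_now': 1, 'arrived_yesterday': 2, 'fixed_yesterday': 2}
```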
The next very standard problem is that the team has no time for technical debt. This is easy to monitor if you keep an eye on task types, that is, technical debt tasks are estimated and filed in Jira as technical debt tasks. We can then calculate what share of the team's time went into technical debt over the quarter. If we agreed with the business on 20% but spent only 10%, we can take that into account and allocate more time to technical debt in the next quarter.
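A sketch of that quota check, assuming time spent and a task type are recorded for each issue in the tracker; the type labels and numbers are illustrative, and the 20% quota is the example from the text.

```python
def tech_debt_share(issues):
    """Share of the team's logged time that went into tech-debt tasks."""
    total = sum(i["time_spent_hours"] for i in issues)
    debt = sum(i["time_spent_hours"] for i in issues if i["type"] == "tech-debt")
    return debt / total if total else 0.0

quarter = [
    {"type": "story", "time_spent_hours": 300},
    {"type": "bug", "time_spent_hours": 60},
    {"type": "tech-debt", "time_spent_hours": 40},
]
share = tech_debt_share(quarter)
print(f"tech debt: {share:.0%} of the quarter (agreed quota: 20%)")  # 10% here, so carry the difference over
```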
Problems with discipline at the development stage
We now turn to the development stage. What problems can there be with discipline?
Unfortunately, it happens that developers do nothing, or that we cannot tell whether they are doing anything. This is easy to track by two banal signs:
commit frequency - at least once a day;
at least one active task in Jira.
If neither is there, it does not necessarily mean you should rap the developer on the knuckles, but you do need to know about it; a minimal version of this check is sketched below.
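A sketch of those two checks, assuming plain git access to the repository and a mapping of developers to their "in progress" issues pulled from Jira beforehand; the repository path, author identifiers, and that mapping are assumptions.

```python
import subprocess
from datetime import date, timedelta

def commits_yesterday(repo_path, author):
    """Count yesterday's commits by an author, using plain `git log`."""
    yesterday = date.today() - timedelta(days=1)
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--oneline",
         f"--author={author}",
         f"--since={yesterday} 00:00",
         f"--until={yesterday} 23:59"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())

def silent_developers(repo_path, developers, active_tasks_by_dev):
    """Developers with no commits yesterday and no task in an 'in progress' status."""
    return [
        dev for dev in developers
        if commits_yesterday(repo_path, dev) == 0 and not active_tasks_by_dev.get(dev)
    ]
```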
The second problem, which can wear down even the strongest and smartest developer, is constant overtime. It would be good for you, as a team lead, to know when a person is overworking: writing code or doing code review outside working hours.
Various rules for working with Git can also be violated. The first thing we urge every team to do is to put the task key from the tracker into commit messages, because only then can we link a task with its code. Here it is better not even to build signals but to configure a git hook directly. For any additional git rules you have, for example "no commits straight to master", we also configure git hooks.
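As an example, a minimal commit-msg hook written in Python (hooks are usually shell scripts, but any executable works); the APP-123 key pattern is an assumption and should be adjusted to your tracker's project keys.

```python
#!/usr/bin/env python3
"""Minimal commit-msg hook: reject commits whose message has no task key.

Save as .git/hooks/commit-msg and make it executable. The APP-123 pattern is
only an example; adjust the regex to your tracker's project keys.
"""
import re
import sys

TASK_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # e.g. APP-123

def main(message_file):
    with open(message_file, encoding="utf-8") as fh:
        message = fh.read()
    if not TASK_KEY.search(message):
        sys.stderr.write("Commit rejected: add a task key (e.g. APP-123) to the commit message.\n")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```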
The same goes for any agreed practices. At the development stage there are many practices the developer has to follow. For example, in the case of due dates there will be three signals:
tasks for which no due date is set;
tasks with an overdue due date;
tasks whose due date was changed but that have no explanatory comment.
Signals can be set up for all of this (a sketch of these three checks follows below). For any other practice you can set up similar things.
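A minimal sketch of the three due-date checks over "in progress" issues; the field names (due_date, due_date_changes, due_date_comment) are hypothetical and stand in for whatever your tracker export provides.

```python
from datetime import date

def due_date_signals(in_progress_issues, today=None):
    """The three due-date checks: missing, overdue, and changed without an explanation."""
    today = today or date.today()
    signals = []
    for issue in in_progress_issues:
        if issue.get("due_date") is None:
            signals.append((issue["key"], "no due date set"))
        elif issue["due_date"] < today:
            signals.append((issue["key"], "due date is overdue"))
        if issue.get("due_date_changes", 0) > 0 and not issue.get("due_date_comment"):
            signals.append((issue["key"], "due date changed without a comment"))
    return signals

print(due_date_signals(
    [{"key": "APP-1"},
     {"key": "APP-2", "due_date": date(2018, 5, 1), "due_date_changes": 1}],
    today=date(2018, 5, 15),
))
```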
Problems with predictability at the development stage
A lot of things can go wrong in the forecasts during the development phase.
A task may simply hang in development for a long time. We already tried to address this at the planning stage by decomposing tasks into small pieces. Unfortunately, that does not always help, and some tasks still hang. To begin with, we recommend simply setting an SLA on the "in progress" status, with a signal when that SLA is violated. This will not make tasks ship faster right now, but it will, again, let us collect the facts, react to them, and discuss with the team what happened and why the task has been hanging for so long.
Predictability can also suffer when too many tasks are on one developer. The number of parallel tasks a developer is working on is better checked by the code rather than by Jira, because Jira does not always reflect the real picture. We are all human, and when we juggle many parallel tasks, the risk that something goes wrong somewhere grows.
A developer may also have problems he does not talk about but which are easy to spot in the data. For example, yesterday the developer had very little code activity. That does not necessarily mean there is a problem, but you, as a team lead, can come over and find out. Perhaps he is stuck and needs help but is embarrassed to ask for it.
Another example is the opposite: a developer has some big task that keeps growing and spreading through the code. This can also be spotted and, possibly, decomposed, so that later there are no problems at the code review or testing stages.
It also makes sense to set up a signal for cases where the code is rewritten over and over during work on a task. Perhaps the requirements keep changing, or the developer cannot decide which architectural solution to choose. This is easy to detect in the data and discuss with the developer.
Quality problems at the development stage
Development directly affects quality. The question is how to understand which of the developers most influences the decline in quality.
We suggest doing it as follows. You can calculate a "bug-proneness" criterion for a developer: take all the tasks that were in the tracker over three months; among them find the tasks of type "bug"; look at the code of those bug tasks; and then look at whose tasks' code each bug fix had to change. From that we can compute the ratio of tasks in which defects were later found to all the tasks a developer did; this is the "bug-proneness" criterion.
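A sketch of the aggregation step, assuming the link from each bug to the tasks whose code its fix changed has already been recovered (for example, by running git blame on the lines touched by the fix and reading the task key from the culprit commit message); all task keys and author names here are illustrative.

```python
def bug_proneness(tasks_by_author, culprit_tasks_by_bug):
    """Share of each developer's tasks in which a defect was later found.

    tasks_by_author: {"alice": {"APP-1", ...}, ...}
    culprit_tasks_by_bug: {"APP-50": {"APP-1"}, ...} - for each bug, the tasks whose
    code its fix had to change (recovered beforehand, e.g. with git blame).
    """
    buggy_tasks = set()
    for culprits in culprit_tasks_by_bug.values():
        buggy_tasks |= culprits
    return {
        author: (len(tasks & buggy_tasks) / len(tasks) if tasks else 0.0)
        for author, tasks in tasks_by_author.items()
    }

print(bug_proneness(
    {"alice": {"APP-1", "APP-2", "APP-3"}, "bob": {"APP-4", "APP-5"}},
    {"APP-50": {"APP-1"}, "APP-51": {"APP-4", "APP-5"}},
))
# alice: a third of her tasks later needed a bug fix; bob: both of his did
```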
If we add statistics on returns from testing, that is, the share of a developer's tasks that testing sent back for rework, then we can assess which developer has the most quality problems. As a result, we understand for whom the code review and testing processes need to be tightened, whose code should be reviewed more carefully, and whose tasks should go to the most meticulous testers.
The next quality problem at the development stage is that we write code that is hard to maintain, a kind of "layered" architecture. I will not go into detail here; I described it at length last time. There is a metric called Legacy Refactoring, which shows how much effort goes into embedding new code into the existing code, and how much old code gets deleted or changed when new code is written.
Probably one of the most important things when assessing quality at the development stage is SLA control for high-priority bugs. I hope you already track this. If not, I recommend starting, because this is often one of the most important indicators for the business: the development team commits to closing high-priority and critical bugs within a certain time.
The last thing you often run into is the absence of autotests. First, they need to be written. Second, you need to monitor that coverage stays at a certain level and does not fall below the threshold. Many teams write autotests but forget to track coverage.
Problems with discipline at the code review stage
Let's move on to the code review stage. What problems can there be with discipline? Let's start with probably the silliest cause: forgotten pull requests. First, the author may simply not assign a reviewer to the pull request, and as a result it gets forgotten. Or, for example, someone forgot to move the ticket to the "in review" status, and the developers check in Jira which tasks need review. We must not forget to watch for this, so we set up simple signals. If you have a practice that there should be at least two or three reviewers per task, that is also easy to control with a simple signal.
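A sketch of the "forgotten pull request" signals using the GitHub REST API; the repository name, token handling, and the 24-hour staleness threshold are assumptions.

```python
from datetime import datetime, timedelta, timezone
import requests

REPO = "your-org/your-repo"          # hypothetical repository
STALE_AFTER = timedelta(hours=24)    # illustrative threshold

def forgotten_pull_requests(token):
    """Open pull requests with no reviewer assigned, or untouched for too long."""
    resp = requests.get(
        f"https://api.github.com/repos/{REPO}/pulls",
        params={"state": "open", "per_page": 100},
        headers={"Authorization": f"token {token}"},
    )
    resp.raise_for_status()
    now = datetime.now(timezone.utc)
    signals = []
    for pr in resp.json():
        updated = datetime.strptime(pr["updated_at"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        if not pr["requested_reviewers"]:
            signals.append((pr["number"], "no reviewer assigned"))
        elif now - updated > STALE_AFTER:
            signals.append((pr["number"], "no review activity for more than a day"))
    return signals
```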
The next story: the reviewer opens a pull request, cannot quickly figure out which task it belongs to, is too lazy to ask, and puts it off. Here we also set up a signal: we make sure the pull request always contains a link to the Jira ticket, so the reviewer can easily read up on it.
The next problem, unfortunately, cannot be eliminated. There will always be huge pull requests in which a lot has been done. The reviewer opens one, looks at it and thinks: "No, I'd better check this later, there is too much here." In this case the author can help the reviewer get into the change, and we can control that. Large pull requests must have a good, clear description that follows a specific format, and that format is different from the Jira ticket's.
The second practice for large pull requests, which can also be monitored, is for the author to leave comments in the code in advance, in the places where something needs to be discussed or where there is a non-obvious solution, thereby inviting the reviewer into a discussion. Signals for this are also easy to set up.
Another problem we run into very often: the author says he simply does not know when he can start fixing things, because he does not know whether the review is finished. For this, an elementary disciplinary practice is introduced: at the end of the review the reviewer must leave a special comment along the lines of "I have finished, you can fix." Accordingly, you can configure automatic notifications to the author about it.
Please configure a linter. In half of the teams we work with, for some reason no linter is configured, and people do this syntax-level code review themselves, doing by hand work that a machine handles much better.
Problems with predictability at the code review stage
If tasks keep hanging, we recommend configuring SLAs for a task that has been waiting too long either for fixes or for review. Accordingly, be sure to ping both the author and the reviewer.
If the SLA does not help, I recommend introducing a morning or evening "code review hour", whichever is more convenient. This is a time when the whole team sits down and does nothing but code review. Whether the practice is actually followed is very easy to monitor: the activity in pull requests should shift towards the chosen hour.
It also happens that some people are overloaded with code review, and that is not good either. For example, in one team the CTO had been there from the very beginning of the system and had written most of it, and it just so happened that he was always the main reviewer. All the developers constantly hung their code review tasks on him. At some point it got to where, in a team of 6 people, more than 50% of the code review was hanging on the CTO and kept accumulating, and while the CTO was busy with other work, those reviews simply waited.
To catch situations like this in time, it is worth watching how the code review load is distributed across the team and setting up signals for reviewers on whom too many pull requests have accumulated, so the load can be redistributed before one person becomes a bottleneck.
Quality problems at the code review stage
First of all, the problem may be simply a very superficial code review. There are two good metrics for monitoring this. You can measure the reviewer's activity as the number of comments per 100 lines of code. Some people review every 10 lines of code, while others scroll through whole screens and leave one or two comments. Of course, not all comments are equally useful. So you can refine this metric by measuring the reviewer's influence: the percentage of comments that pointed at a line of code that was later changed as part of the review. That way we understand who is the most meticulous and the most effective, in the sense that their comments most often lead to code changes.
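A minimal sketch of these two metrics, assuming the review comments and the number of reviewed lines have been exported beforehand, with a precomputed flag saying whether the commented line changed later in the review; the data structures are illustrative.

```python
def reviewer_activity(comments_count, lines_reviewed):
    """Comments per 100 reviewed lines of code."""
    return 100.0 * comments_count / lines_reviewed if lines_reviewed else 0.0

def reviewer_influence(comments):
    """Share of comments whose target line was changed later in the same review.

    comments: list of dicts with a precomputed 'line_changed_after_comment' flag,
    obtained by comparing pull request revisions.
    """
    if not comments:
        return 0.0
    return sum(1 for c in comments if c["line_changed_after_comment"]) / len(comments)

print(reviewer_activity(comments_count=12, lines_reviewed=400))   # 3 comments per 100 lines
print(reviewer_influence([
    {"line_changed_after_comment": True},
    {"line_changed_after_comment": False},
    {"line_changed_after_comment": True},
]))  # about 0.67
```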
Another useful metric is churn within the pull request, that is, what share of the pull request's code was changed after the review started; it shows how much the review actually influences the code. It is also worth catching commits that are added to a branch after the review has finished, since nobody looks at them anymore, as well as pull requests that end up merged without any review at all.
Let's move on to the testing stage and problems with discipline there. The most frequent one we see is that there is no information about testers in Jira: people save on licenses and do not add testers to Jira; tasks simply never reach the "testing" status; we cannot tell from the task tracker that a task was returned for rework. We recommend setting up signals for all of this and making sure the data accumulates, otherwise it will be extremely hard to say anything about testing.
Another story that affects quality at the testing stage is the constant ping-pong between testing and development: the tester returns the task to the developer, and the developer, without changing anything, sends it back to the tester. You can either track this as a metric or set up a signal for such tasks and look closely at what is going on there and whether there are problems.
Metrics Methodology
We have talked about metrics, and now the question is how to work with all of this. I covered only the most basic things, and even those are plenty. What do you do with all of it and how do you use it?
We recommend automating this process as much as possible and delivering all the signals to the team with a bot in a messenger. We tried different communication channels: both email and dashboards work poorly. A bot has proven itself best. You can write the bot yourself, take an open source one, or buy ours.
The point is very simple: the team reacts to signals from a bot much more calmly than to a manager pointing out problems. If possible, deliver most signals to the developer directly first, and only to the team if the developer does not react within, say, one or two days.
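A minimal sketch of delivering one such signal, here through a Slack incoming webhook; the webhook URL, recipient, and message are assumptions, and any messenger with a bot API would work just as well.

```python
import requests

WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # hypothetical incoming webhook

def send_signal(recipient, text):
    """Deliver one signal to a channel or a developer via a Slack incoming webhook."""
    resp = requests.post(WEBHOOK_URL, json={"text": f"{recipient}: {text}"})
    resp.raise_for_status()

# Example: ping the developer first; escalate to the team channel only if nothing changes in a day or two.
send_signal("@alice", "APP-2 has been 'in progress' for 4 days. Is it blocked?")
```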
Do not try to build all the signals at once. Most of them simply will not work, because you will not have the data, thanks to banal discipline problems. So first establish discipline and set up signals for the disciplinary practices. In the experience of the teams we have talked to, building up discipline in a development team without automation took about a year and a half. With automation, with constant signals, a team starts working in a disciplined way in about a couple of months, that is, much faster.
Any signal that you make public or send directly to a developer cannot simply be switched on. First you need to agree on it with the developer and with the team. It is advisable to write down all the thresholds in the team agreement, along with the reasons you are doing this, what the next steps will be, and so on.
Keep in mind that every process has exceptions, and account for that when designing it. We are not building a concentration camp for developers. For example, a developer can put a special "no-tracking" label on a task so that signals are not triggered for it. The important part is that the use of "no-tracking" itself stays visible and limited, so that exceptions remain exceptions and do not quietly turn into the norm.
The goal of all this automation is not to police people. It is to have the team itself see the signals and fix process problems without the manager constantly stepping in. That is what brings closer the moment when the team becomes autonomous, and you can still go to Bali.
Conclusions
Collect data. Build processes in such a way that the data gets collected. Even if you do not want to build metrics and signals now, you will be able to do a great retrospective analysis later if you start collecting data today.
Automatically monitor processes. When designing processes, always think about how they could be hacked, and how those hacks could be recognized in the data.
When the signals have not fired for several weeks, well done! It means the processes really work and the team follows them. That is exactly the moment to think about what to improve next: which new practices to introduce and which signals to set up for them :)