How to learn to evaluate tasks if you are not able: 4 complexity factors

When I was a novice programmer — and then later, when I was a novice lead developer — I thought that it was absolutely impossible to predict the terms in which you would do something. Or, that a good forecast requires very detailed design and preparation, approximately the same in duration as the task itself.

Later, of course, I discovered that several smart books were written on the subject of forecasts, which, together with some experience, make the assessment of tasks, though ungrateful, but also unreliable. The most convenient way, of course, is to evaluate by analogy: when you have already done something like this, you quite accurately know how much effort this task will require. But how to be in a situation where there is relatively little experience or analogy to take from nowhere, but still want to evaluate?

In one of the teams where I worked, we came up with an original method for the preliminary assessment of tasks. The method synthesizes some methods known from literature, but in the given form, perhaps, is not described by anyone. The concept was as follows: objectivity (connection with measurable indicators); integrability with Agile; repeatability; speed of assessment (less than 0.5% of the task volume); accessibility for novice developers. I will be happy to discuss our idea and do not rule out that someone from the Habrowd Audience will like it.

Four factors of difficulty: problem analysis

So, we take the task (poorly set, not formally described - everything is like in life). We perform express analysis: namely, we collect in our head (or on the board) an intuitive idea of what kind of functionality needs to be implemented and how technically this task can be performed.
')
Our first goal is to flesh out the presentation that has arisen so that it can be numerically evaluated using the four scales below.

The complexity factors , they are projections , are conditionally called:

"Surface"
"Testing"
"Requirements"
"Technical Risk"

These terms in the method are understood in a rather original way (for example, “surface” is a generally non-standard concept), but they are formulated in such a way as to have a connection with objective reality, i.e. with measurable quantities:

The surface in general is the area of contact between our refinement and the outside world. For example, for a GUI component, a certain quantitative representation of the future number of interface elements should be made. For the library - about the number of methods in the public API. For bug fixes - about the volume of the affected code. For the state machine - about the classes of incoming events. If we have in the task “both that, and another, and the third”, then we need to think about what is external and what is, in essence, private, and it is the public part that we should evaluate.

Testing is our idea of the number of elementary automatic tests that we will have to write for performance in the style of TDD. This does not necessarily imply that we will develop through testing (they say, real cowboys do not write tests at all), but in order to assess the complexity of the problem, we need to think about how much tests it corresponds to.

Requirements - this is our idea of the scope of what is called requirements in classical approaches (requirements). We can understand this factor as the size of the specification that we would have to write if we wanted to set the task completely formally. Naturally, we almost never spent time on the drafting of such specifications - the drafting of requirements was not part of our process - but we had an idea of how voluminous such documents are obtained. We make an assessment in the form: "To formally describe this, we would need pages 15" and go on.

Technical risk - this factor reflects the expectations that you will have to do some research, try several approaches, or throw out part of the work due to the fact that something does not grow together. In theory, this value can be expressed as a number from 0% to 100%, but in practice it is the most subjective of 4 values. You can act as follows: to assess the risk, we come up with all the ways to overwhelm this task. If, as a result, we feel depressed and we have serious doubts that we will eventually be dealt with within a reasonable time frame, then we put the rating close to the maximum.

The resulting 4 "projections" form a kind of profile of the task - it becomes clear what constitutes its main complexity. At the same time, when analyzing a task, the validation (verification) of its formulation occurs: if you, for example, realize that you do not know how you will test your revision, then you most likely cannot start writing code. The ratio of projections is also of some importance - large indicators at once by several factors hint that the task has a “hidden volume” (which threatens to repeatedly exceed our expectations), while “flat” tasks (tasks with one dominant “projection” a) usually easily viewed and made in the expected time. In the same case, when the task initially seemed small, and really turned out to be small by all estimates, the time for reflection was not spent in vain anyway: at least, we confirmed that we have a sufficient understanding of what needs to be implemented.

Such an express analysis is carried out quickly, and gives a fairly clear idea of the complexity and complexity of the task. It would be possible to finish on this, having confined to a qualitative approach. However, we actually went much further and synthesized an experimental metric from this concept.

Below is a way to get a quantitative value from 4 qualitative assessments, which, ultimately, helps to create backlog of iteration or to mark the prediction of time. Behind it is not worth serious scientific research, but in our practice, it gave pretty good accuracy with minimal effort.

Experimental metric

As we already understood, at the initial stage 4 projections are expressed like this: “in our component, about fifty methods are expected, two hundred tests; it would take 20 pages of description, which will never be there, and there is about a 10% chance of redoing everything anew. ” Now, when we have estimated all this, we will need to aggregate these quantities. But, of course, it’s impossible to simply put tests and odds together, so our estimates have to be converted into conventional units of complexity.

Complexity , in general, is understood as a measure of time-consuming minus the writing of an obvious boilerplate and all other routine and predictable operations. For initial conversion, the following nonlinear scale is used:

0 - means “everything is simple” (you can immediately present all the details that need to be done, and actually do it in a short time);
1 - this is the complexity that can be overcome in a time slice (for example, in 4 or 8 hours)
3 - more than one unit by 3-5 times, which is already considered quite high, somewhat problematic value
5 - maximum score, means "epic" (so difficult that it is definitely worth breaking, trimming or giving up)

Formula

To obtain the final metric, we summarize all our estimates in units of complexity on four scales.

However, if in two or more scales we get ≥2 - then we do not add the two largest numbers, but multiply them.

Illustration:

  Problem 1. 1 0 0 0 -> 1 -> 1
 Problem 2. 1 0 2 0 -> 1 + 2 -> 3 
 Problem 3. 1 1 2 0 -> 1 + 1 + 2 -> 4
 Task 4. 3 3 1 2 -> 3 * 3 + 1 + 2 -> 12 
 Task 5. 1 4 1 5 -> 4 * 5 + 1 + 1 -> 22

Using

The complexity metric gives an idea of the amount of intellectual work, explicit and hidden, and can even be converted, almost mechanically, into a pessimistic estimate of time. How to do this, see below, but first I’ll consider a few lighter usage scenarios. Some of the following methods are not at all tied to exactly how you got the metric - they just show how you can get some benefit from a similar numerical value.

The choice of technical solutions

When you write a large and complex project, it happens that you have two or three acceptable ways to add some functionality, and you need to predict in advance which way will be easier. To make a decision, you can evaluate each of the options by 4 factors and compare the values (although choosing a simpler way, it is worth considering whether you are adding technical debt ).

Task validation

The method has an interesting ability to reveal the hidden complexity of a task due to the way arithmetic and scales are arranged in it (that hidden volume ; see also the section “Experience of Use”). If you are not limited to qualitative analysis, but calculate the metric, then you can make the formulation of problems better. For this, two criteria are introduced:

The complexity metric should not exceed a certain limit. Volume tasks are relatively homogeneous (not too small, not too large);
There are no "three-dimensional" and "four-dimensional" tasks, i.e. the task receives high marks on a maximum of 2 projections.

In these criteria, some constants are introduced that characterize the desired size of the partition. They can be customized. I prefer the option “no more than 10 in metric, no more than 1 in 2 lower projections,” but this is quite large.

Backlog compilation

There are a great many software development methodologies, but a significant part of them involves a certain iteration, which is that the team sets a goal for a period of time and defines a set of tasks that must be done to achieve this goal. At the end of the iteration, the product version should be ready for delivery.

At the beginning of the iteration, an assessment is made of what is generally possible to do during this period of time and the tasks are clarified, resulting in a certain list similar to the backlog of the User Story and / or other forms of the task description.

A well-known advantage of a standard backlog in the User Story format over the “work plan for a period” is that it is not necessary to accurately evaluate it, due to the unique “magic” of the scram. But there is an important nuance. If tasks, as it happens in many teams, are not broken in the canonical User Story format, but larger, or more “architecturally”, then the mentioned “magic” easily breaks down: the quality of the size of the task at the beginning of the iteration begins to play a critical role. This is a very common situation in which you may need both a historical data set for past tasks and a computable metric that you use at the beginning of an iteration as an estimate of the size for problems that are in doubt. Since the wrong composition of the iteration harms both the product and the developer’s motivation, it is more than justified to make efforts to adequately assess the tasks. I would recommend to do it.

Boost your Scrum

Scrum teams, as a rule, assign tasks to the problem in conventional units (Story Point). It is not always easy to do this. Practiced relatively complex procedures, such as Planning Poker.

The application of the method in the style of Agile is that we consider the proposed metric in cases where we have doubts about the size of the problem or the possibility of its implementation. Since for a particular task, all calculations and estimates using this method can be done in the mind, this takes place quite painlessly and without major methodological disputes. There is also the possibility of a "partisan" application of this method - if you evaluate the task as one, as a developer, and the 4-factor method works for you, then you do not have to tell all this mechanics on a stand-up.

Need a plan?

I apologize in advance to those readers for whom “planning” is an abusive word. I would like to rejoice the rest of the readers that immediate time estimates are derived from the metrics, if we act carefully. Strategically plan this way probably will not work. However, one of the advantages of the method is the ability to pre-evaluate fairly large blocks of tasks.

Planning with metrics is based on the assumption that tasks will be related in terms of their future actual volume as their volume metrics. Of course, complexity and volume are not equivalent concepts, because sometimes only a part of the developer’s efforts is directed at the actual solution of the problem. If you, in addition to programming, have to do some little intellectual activity, like writing tons of XML, manual ineffective testing, or a time-consuming deployment procedure; if you also negotiate with the customer and write documentation manuscripts, and all such classes are unevenly distributed between tasks - I can only sympathize with you. You will have to figure out how to evaluate non-programmer and little intellectual activity. However, if there is little such work, and in good projects it is not enough, it is quite possible to allow yourself to consider for planning purposes the Volume == Difficulty, and not to make additional amendments.

When I had to draw up medium-term plans, I acted as follows:

Some list of tasks is highlighted and our metric is calculated for each item.
Tasks whose assessments signal the need for change or splitting are broken down (criteria — see Validation of Tasks)
Further, we, focusing on our intuitive sensations, compress or stretch this graph over time, more or less observing the calculated proportions of the volume and not allowing the appearance of unrealistic looking items.

Of course, this is a rather naive approach - you can make a plan without any assessment of the tasks - but the result will be worse. The metric helps by the fact that at the time of planning we see numerical estimates (relationships between tasks), which insures us against distortions and does not allow us to reduce the timeframe below the reasonable limit. It turns out that in the long run, the deadline is still not chosen mechanically, but partly intuitively, but with strong support from the metric.

Alternatively, if you plan for yourself one, you can do more simply: calculate the historically some “speed” factor and mechanically calculate the time forecast, dividing the next task metric by the factor. But planning is worse than when you see the whole picture. In addition, the calibration of this coefficient is not a pleasant occupation. A similar approach with “speed” works quite well in Scrum with its BurnDown charts, but there it looks more efficient and natural. It can be said that the concept of Scrum Velocity is a more subtle mechanism than the ratio between volume and time.

Experience of use

For a small task, as practice shows, estimates and arithmetic are performed within a minute in the mind. For large components, a little more time is required, but still the method may look like a quick, verbal calculation, yielding an acceptable accuracy result — usually, more accurate than an intuitively voiced forecast of N days. There were cases when I counted these numbers and put them in mind at the rally while my colleagues from other teams were doing real work - secretly painting the cells in Buzzword Bingo.

This method can hardly be called very simple, but it has a useful ability to detect implicit complexity - even one that is hidden in bad libraries and tools, in bad architecture, in non-linearity of labor, in technical debt. Partly, this is facilitated by nonlinear arithmetic, but in general, the detection of hidden complexity occurs rather due to the fact that the assessment scales are not completely orthogonal: consider several variations of the real problem, and you will notice that even the changes that give an increase mainly on the same scale are still partly appear in the rest. It is characteristic that in some cases such transfer is manifested strongly, and in others it is weak. And the less pronounced the transference, the more intuitively simpler and easier the corresponding refinement of the task looks, isn't it? The formula, indeed, uses the observation that various “pitfalls” never manifest themselves primarily on a separate scale, but their presence significantly raises two or three projections simultaneously. The nonlinearity of the formula “catches” this effect, creating a significant increase in the calculated value.

Because of these anatomical details, a 4-factor estimate, although it takes more time, it almost always works better than simple estimates “by analogy” or estimates based on a single factor.

I myself have been practicing this method occasionally for several years. My attachment to it is due to several reasons. First, with careful application, it gives a noticeably more accurate prediction than other primitive methods. Secondly, it adapts quite flexibly to various development methodologies: with some stretch, it is possible to use it in the “clean” Agile environment, in hybrid projects, and in the conventionally “classic” project management with Microsoft Project on top of the programmer's team. I also noticed some pleasant scalability in it - it is sometimes possible to apply a similar assessment not only on small (bug, story ...), but also on larger time scales (functionality block, component). Finally, the method helps the evaluator to learn: it is always possible to compare the actual indicators (volume of code, complexity of testing, etc.) with their early assessments, and draw substantive conclusions.

∗ ∗ ∗

In my article, in principle, I did not set out to discuss whether it is necessary to evaluate tasks in projects at all, and whether to apply some metrics in software development, but simply wanted to offer a useful tool for those who have such a need. What appeals to me is that, with fairly good performance, this method is little known; and I confess, it would be very interesting for me to find the original source of some of his concepts.

How to learn to evaluate the task, if you can not? Answering this question, I talked about my favorite method, which is mechanical enough to be used by a relative newcomer. Generally speaking, there are many dozens of assessment methods, and if you want to study this topic more deeply, I would advise you to read one of the well-known books: