About the importance of catching fleas. What is Global OpenStack Bug Smash for?

Authors: Igor Marnat, Ilya Stechkin

From March 7 to March 9, we held the Global OpenStack Bug Smash Mitaka. Mirantis hosted two of the event's sites, one of them in the company's Moscow office. This was the first time Russia joined the Bug Smash, and we are glad that it happened with our direct participation.
Why is it so important for the community that major market players, such as Intel, Rackspace, Mirantis, IBM, HPE, Huawei, CESI, or SUSE, support testing and improving code quality? What place does this event occupy in the process of creating the OpenStack platform?

Developing new functionality and fixing bugs are two different but closely interconnected processes. This is true of any software product. In open source, however, there is a nuance: anyone who is able to create something new will prefer to do just that rather than spend time reading someone else's code and correcting it.

Even in literature, music, or art there are those who create works and those who analyze them (critics of all stripes). But while the productivity of a writer, composer, or artist does not always benefit from the close attention of critics, a software product always improves: it becomes cleaner and more stable when someone looks at it with fresh eyes. The essence of the OpenStack development process is that it is conducted in the open and every change is always checked by several people.

In addition, every release cycle leaves behind technical debt, which hampers the development of new functionality. A simple example from the inner workings of Mirantis: up to the MOS 7.0 release, the master node was based on an outdated version of CentOS. This slowed down product improvement and made some interesting features impossible to implement. In the MOS 8.0 release we paid off this technical debt by updating the master node, which lets developers use the latest versions of libraries and lets users receive updates on time.

However, experience shows that technical debt accumulates from release to release, and developers are forced to divide their time between creating new functionality and partially paying off the debt. As a result, "catching fleas" gets only whatever time is left over, since the pressure from product management is always directed at developing cool new features that can be sold rather than at fixing boring old bugs in what has already been sold.

Critical bugs are eliminated first. These are bugs that prevent a feature from working at all. Next come high-priority bugs. With such a bug the feature works, but not quite as it should: only "with crutches", that is, with significant workarounds that compensate for the error. For example, the user has to restart a service so that it continues to work correctly.

Bugs below the "high" level rarely get the attention of developers, who are always squeezed by the schedule of a new release while also fixing high and critical bugs in previous releases. Lower-priority bugs can therefore hang around for years, carried over from release to release. The problem is that medium-priority bugs (medium bugs) are related to usability, and it is often the users themselves who run into them. One could say that their presence spoils the experience of working with the ecosystem's projects (the customer experience).

Here is an example of such a bug with a FOUR-YEAR history ("dhcp server defaults to gateway for filtering when unset"). And here is a "younger" bug, only two years old ("Enable metadata when create server groups"). Both bugs belong to the Nova project (the project's official name is OpenStack Compute). In total, this project, which is extremely important for the ecosystem, has 483 bugs of varying severity (as of this writing)! So the only hope is that once per release cycle developers will set their regular work aside to go bug hunting, and the code will become cleaner.
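The backlog described above can be pictured with a small sketch. It is only an illustration: the `Bug` record, the importance labels, the one-year age threshold, and the sample data are all assumptions made for this example and are not taken from any real bug tracker or from the Nova bug list.

```python
from dataclasses import dataclass
from datetime import date

# Importance levels from lowest to highest (hypothetical labels for this sketch).
IMPORTANCE_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class Bug:
    title: str
    importance: str
    reported: date


def bug_smash_candidates(bugs, today, min_age_days=365):
    """Return bugs below 'high' that are at least min_age_days old, oldest first."""
    threshold = IMPORTANCE_ORDER.index("high")
    stale = [
        bug for bug in bugs
        if IMPORTANCE_ORDER.index(bug.importance) < threshold
        and (today - bug.reported).days >= min_age_days
    ]
    return sorted(stale, key=lambda bug: bug.reported)


# Made-up sample data, for illustration only.
bugs = [
    Bug("feature does not start at all", "critical", date(2016, 2, 1)),
    Bug("confusing default in a network filter", "medium", date(2012, 3, 1)),
    Bug("missing metadata option", "medium", date(2014, 3, 1)),
    Bug("typo in a log message", "low", date(2016, 1, 15)),
]

candidates = bug_smash_candidates(bugs, today=date(2016, 3, 7))
for bug in candidates:
    print(bug.reported.year, bug.importance, bug.title)
```

In this sketch the critical bug is skipped because regular release work already covers it, and the recent low-priority bug is skipped because it is not yet old. What remains is exactly the kind of long-lived medium bug that an event like Bug Smash is meant to pick up.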

Let us determine where Bug Smash fits in the quality assurance (QA) process. There is an opinion that QA is nothing but testing. However, experienced developers (including those working at companies that build proprietary software, such as Cisco) know that testing is only one part of QA. A large number of bugs can be caught when other developers check the quality of the code (code review). Code review usually precedes testing, which means that an error found during review costs less to fix.

It is widely known that the earlier a problem is found, the cheaper it is to eliminate. For example, according to the data given in Steve McConnell's "Code Complete", correcting an error at the testing stage costs ten times more than at the stage when the code is being developed. Testing is a laborious procedure and therefore not a cheap one. You need to set up a lab with the appropriate characteristics, write a test scenario, run the tests, and troubleshoot the problems the tests reveal.
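To make the arithmetic concrete, here is a minimal sketch. The only figure taken from the text is the tenfold cost of a fix at the testing stage; the function name, the stage names, and the bug counts are hypothetical and chosen purely for illustration.

```python
# Relative cost of fixing one bug, by the stage at which it is found.
# Only the 10x figure for testing comes from the text above.
RELATIVE_COST = {
    "development": 1,  # baseline: found while the code is being written or reviewed
    "testing": 10,     # ten times the development-stage cost
}


def total_fix_cost(found_at_stage, cost_table=RELATIVE_COST):
    """Sum the relative cost for a mapping of stage -> number of bugs found there."""
    return sum(cost_table[stage] * count for stage, count in found_at_stage.items())


# Hypothetical scenario: the same 100 bugs, caught mostly late or mostly early.
late = total_fix_cost({"development": 20, "testing": 80})
early = total_fix_cost({"development": 80, "testing": 20})
print(late, early)
```

Under these assumed counts, catching most of the bugs early reduces the total relative cost from 820 to 280 units, which is the effect the paragraph above describes.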

The most expensive error is the one found by the user: the one that the reviewers missed and the testers did not catch. In this case the chain of fixes begins with support. The support specialists receive the client's request and diagnose the problem, for which they will most likely repeat the testing procedure: they set up a lab, and so on, as described above.

The most advanced users, those who have people from the OpenStack community on their teams, discover bugs themselves and report them to the community. However, since these bugs do not fall into the critical or high categories, developers rarely have a chance to work on them. The circle closes.

Thus, it is difficult to overestimate the importance of the OpenStack Bug Smash, a marathon that takes place within each OpenStack release cycle and lets developers set aside time for the bugs that usually remain outside their field of vision.

Everybody wins from this: the users, who finally receive a solution to their problems; the contributing companies, which save money by detecting and correcting errors in the code early; and the ecosystem as a whole, because customer satisfaction grows, which opens up new opportunities for business growth built on creating and deploying OpenStack-based solutions.

Another significant result of the event is that it attracts new contributors and, more broadly, spreads knowledge about OpenStack around the world. At a similar event in New York, the entire first day was spent training beginners who had come to learn how to work with OpenStack, and only on the second day did they start fixing bugs. In Moscow, we also set aside a day to work with those who are just beginning their immersion in OpenStack. Good luck and stable code!

Source: https://habr.com/ru/post/279085/
