Article translation: Best Practice for Creating Git Commit from OpenStack

I propose to the readers of "Habrakhabr" a translation of the article "Good practice in communicating commits from OpenStack" .

1 Git Commit Best Practice

The following document is based on the experience of developing code, fixing bugs and viewing code in a number of projects using Git, including libvirt, QEMU and OpenStack Nova. Consideration of other open source projects, such as Kernel, CoreUtils, GNULIB and others, assumes that they all follow a fairly common practice. This is motivated by the desire to improve the quality of the history of the Nova Git project. Quality is an abstract term for definition in development; when for one person a certain code “Beautiful” (Thing of Beauty) - then for another it is “Crutch” (Evil Hack). Nevertheless, we can formulate some general recommendations on how and what to do, or, conversely, what not to do when sending Git commits to merge with projects in OpenStack.

This topic can be divided into two areas:

The order of combining or splitting into multiple commits
Information in commit messages

Content

1 Git Commit Best Practice

1.1 Overview

The thoughts and examples described in this document should clearly demonstrate the importance of breaking changes into a group of consecutive commits, as well as the importance of writing good messages to them. If these instructions are widely used, it will significantly improve the quality of the git OpenStack history. Carrots and sticks are needed to effectively change the history of git. This document should be a carrot, drawing people's attention to the benefits and benefits, while for others the use of the Gerrit code inspection system will act like a whip. ;-P

In other words, when inspecting in Gerrit, it’s not just necessary to look at the correctness of the code. When inspecting, you first need to examine the commit message and point out improvements to the message itself. Get accustomed to commits that combine many logical changes and require the sender to split them into separate commits. Make sure that changes with formatting indents are not merged with functional changes. Also make sure that the refactoring of the code is fixed separately from the functional changes, and so on.

It is also worth mentioning that Gerrit's handling of a series of patches is not entirely perfect. However, you should not consider this a valid reason to avoid creating a series of patches. The tools used must comply with the needs of the developers, and since they are open source, they can be fixed or improved. The source code is “often read, write from time to time,” and therefore the most important criterion is to improve the long-term support of the code with the help of a large number of developers in the community, and you shouldn’t sacrifice a lot because of one author who may never touch code again.

And now the detailed principles, as well as examples of good and bad practice.

1.2 Structural separation of changes

The basic rule for creating good commits is to provide only one “logical change” for each commit. There are many reasons why this is an important rule:

The less the code changes, the faster and easier it is to check it and identify potential flaws.
If it is detected that the commit is incorrect or broken, a bad commit may need to be rolled back. And it will be much easier to do if there are no other unrelated changes in the code that are mixed with the original commit.
When troubleshooting issues, such as when using the git bisect command, small, well-formed commits will help determine exactly where and when a problem was introduced in the code.
When viewing the git history with git annotate or git blame, small well-formed commits help to find and isolate the problematic part of the code, as well as to understand where the problem fragment came from.

1.2.1 What to avoid when creating commits

You need to know and understand common examples of bad practice in order to avoid them:

Mixing indent and space formatting with functional code changes.

Formatting changes will hide important functional changes, making it difficult for a reviewer to accurately determine if a commit is correct. Solution: Create 2 commit, one with formatting changes, the other with functional changes. Usually, changing indents is done first, but this is not a strict rule.

Mixing two unrelated functional changes.

Again, it will be more difficult for a reviewer to identify flaws if you mix two unrelated changes. If the need arises for a later rollback of a broken commit, it will be necessary to sort out and unravel two unrelated changes with the additional risk of creating new errors.

Sending large new features in one giant commit.

It is possible that the code for the new function is useful only when the entire implementation is present. However, this does not mean that the entire function should be provided in one commit. New features often entail refactoring existing code. It is highly desirable that any refactoring be performed in separate commits, and not in those in which the new function is implemented. This will help reviewers and test suites confirm that refactoring does not have unintended functional changes. Even newly written code can often be divided into several parts that can be independently considered. For example, changes that add new internal APIs or classes may be in separate commits. Again, this simplifies the verification of the code. It also allows other developers to get small chunks of work with git cherry-pick, if the new feature is not quite ready to merge. Adding new public HTTP APIs or RPC interfaces should be done in commits separate from the actual internal implementation. This will encourage the author and reviewers to think about the overall design of the API or RPC, and not just choose a design that is simpler for the currently selected internal implementation. If the patch affects public HTTP, use the APIImpact flag (see Including External Links ).

The main thing is to follow the rule:

, . - . .

1.2.2 Examples of bad practice

Now let's illustrate a bit of examples from the history of Git Nova. Note: although the comm hashes are cited for reference, the names of the authors are removed, since we should not blame or insult a single person. From time to time, each of us is guilty of violating the rules of good form. In addition, people who have reviewed and approved these commits are as guilty as the one who wrote and sent them;

1.2.2.1 Example 1

 : ae878fc8b9761d099a4145617e4a48cbeb390623 : [] : . 1  01:44:02 2012 . +0000    create libvirt *       create *   wait_for_destroy    undefine *        *  reset     create / destroy *  resume_host_state     hard_reboot *  rescue/unrescue        Change-Id: I2072f93ad6c889d534b04009671147af653048e7

This commit has made at least two independent changes.

Switch to the use of the new reset API in the “hard_reboot” method
Correction of internal driver methods to not use "hard_reboot"

What is the problem here:

First, there are no compelling reasons why these changes needed to be made simultaneously (in one commit). The first commit could include changes to stop the call in different places in the “hard_reboot” method. The second commit could rewrite the implementation of hard_reboot.
Secondly, the transition to the use of the reset libvirt method was hidden in a large code refactoring, and the reviewers missed it, and this introduced dependence on the newer version of the API libvirt. This commit was found as a problem fairly quickly, but a trivial rollback is impossible due to the large number of unrelated changes.

1.2.2.2 Example 2

 : e0540dfed1c1276106105aea8d5765356961ef3d : [] : C. 16  15:17:53 2012 +0400  lvm-disk-images     LVM   VM.   LVM    libvirt. VM-     LVM      `libvirt_images_volume_group`.   `libvirt_local_images_type` ,     .  : `raw`, `lvm`, `qcow2`, `default`.  `libvirt_local_images_type` = `default`,     `use_cow_images`.   `libvirt_sparse_logical_volumes`  ,       (   virtualsize        ).       `False`.     : `Raw`, `Qcow2`  `Lvm`.      ,      `LibvirtConnection._cache_image` ` libvirt_info`      `LibvirtGuestConfigDisk`  libvirt.  `Backend` ,     . Change-id: I0d01cb7d2fd67de2565b8d45d34f7846ad4112c2

This change introduces one major new function, so at first glance it seems reasonable to use a single commit, but looking at the patch, it is clearly very confusing, since it contains a significant amount of code refactoring with the new LVM function. This fact makes it difficult to determine probable regressions with the support of QCow2 / Raw images. It should be divided into smaller ones, at least into four separate commits.

Replaced the use_cow_images configuration flag with the new libvirt_local_images_type flag with backward compatibility support for the obsolete use_cow_images flag.
Creating an internal “Image” class and implementing the Raw & QCow2 subclasses.
Refactoring the libvirt driver to replace the raw / qcow2 image management code by calling the new class “Image”.
Introduction of a new implementation of the “LVM” image class.

1.2.3 Examples of good practice

1.2.3.1 Example 1

 : 3114a97ba188895daff4a3d337b2c73855d4632d : [] : . 11  17:16:10 2012 +0100      KVM  VM  PIT  RTC : 573ada525b8a7384398a8d7d5f094f343555df56 : [] : . 1  17:09:32 2012 + 0100       VM libvirt

Together, these two changes provide support for configuring guest KVM timers. The introduction of new APIs for creating the XML configuration libvirt was clearly separated from the change in the KVM guest creation policy, which uses new APIs.

1.2.3.2 Example 2

 : 62bea64940cf629829e2945255cc34903f310115 : [] : . 1  14:49:42 2012 -0400     rpc.queue_get_for(). Change-Id: Ifa7d648e9b33ad2416236dc6966527c257baaf88 : cf2b87347cd801112f89552a78efabb92a63bac6 : [] : . 30  14:57:03 2012 -0400   shared_storage_test   rpcapi. ......  get_instance_disk_info    rpcapi. ......  remove_volume_connection    rpcapi. ......  compare_cpu    rpcapi. ......  get_console_topic()    rpcapi. ......  refresh_provider_fw_rules()    rpcapi. ...    ...

This commit sequence reorganized the entire RPC API layer inside OpenStack Compute (Nova) so that the implementation of the connected message systems can be used. For such a significant change in the main part of the functionality, it was key to dividing the work into a larger sequence of commits, allowing you to make a meaningful check of the code, and also simplified tracking changes and finding possible regressions at each stage of the process.

1.3 Information in commit messages

The text of the commit message is as important as the code changes themselves. When creating a message there are some important things to remember:

Do not assume that the reviewer understands what the problem was originally.

It is often not always clear what the problem was, even if a bug report was read, as well as a couple of comments. The commit message must contain a clear statement of the original problem. A specific bug is just an interesting historical background on how the problem was identified. The reviewer should be clear what the proposed patch does for correct decision making without studying this defect in the error tracking system.

Do not assume that the reviewer has access to external web services or sites.

After 6 months, when someone on the train / plane / bus / beach / bar resolves the problem and views the history of git, there is no guarantee that he will have access to the online error report or to the server with the documentation. Distributed source control systems (SCM) have taken a big step forward in that you no longer need to be “online” to have access to all the information in the repository. The commit message must be completely self-contained in order to continue to benefit from git.

Do not assume that the code is self-evident / self-documented.

What is obvious to one person may be completely incomprehensible to another person. Always document what the original problem was and how it was corrected for every change, of course, except for fixing obvious typos or formatting changes.

Describe why the changes were made.

A common mistake is to simply describe how the code was written, without describing why the developer decided to do it that way. The most important thing is to explain the intention and motivation of the changes, but also do not forget to describe the general structure of the code.

Read the commit message to see if it indicates an improvement in the code structure.

Often when describing a large message it becomes obvious that the change had to be divided into two or more parts. Do not be afraid to go back and split it into several commits.

Provide enough information to make the right decision about what to check.

When Gerrit sends email alerts about new patch fixes, the email contains minimal information, mainly a commit message and a list of changed files. Given the large amount of patches, one should not expect all reviewers to study them in detail. Thus, the commit message should contain sufficient information to warn potential reviewers that this is the patch they should be looking at.

The first line of a commit is the most important.

In Git, the first line of a commit message has a special meaning. It is used as the subject of the email, in the git annotate command messages, in the gitk viewer program, when merging a branch and in many other places where there is not much room for text. In addition to a brief description of the change itself, you should not forget which part of the code is affected. For example, if this affects the libvirt driver, enter “libvirt” somewhere on the first line.

Describe any limitations of the current code.

If the code being modified has features for future enhancements or any limitations you know, be sure to mention them in the commit message. This demonstrates to the reviewer that the author has considered a broader picture, as well as which compromises have been made in terms of short-term goals compared to long-term desires.

Do not include comments that relate exclusively to this patch.

In other words, if you redo your commit, please do not add “Patch number 2: redone” (Patch set 2: rebased) to your message. This will not make any difference after merging the changes. However, write a note in Gerrit as a comment on your change. This will help reviewers find out what has changed between patchsets. This also applies to comments such as "Added unit tests", "Fix localization problems" or any other patches that do not affect the overall goal of your commit.

The main thing is to follow the rule:

, . - . .

1.3.1 Inclusion of external links

Although the message is mainly intended for human interpretation, it always contains the metadata provided for machine processing. In OpenStack, they include a “Change Id” (Change ID), as well as optional links to bug identifiers, blueprint links (schemes / documents), DocImpact flag, APIImpact flag, and SecurityImpact flag.

The “Change-Id” string is a unique hash describing the change that is generated by the Git interceptor (hook). It should not change after being inspected while editing a commit with the rebase command, it is also used by Gerrit to track patch versions.
The string 'bug' can refer to an error in several ways. Gerrit creates a link to the bug when viewing the patch on review.openstack.org, so that reviewers can quickly access the bug in Launchpad.
- Closes-Bug: # 1234567 - use the “Closes-Bug” if the commit is intended to completely eliminate and close the error you are referring to.
- Partial-Bug: # 1234567 - use 'Partial-Bug' if a commit is only a partial fix and more work is required to eliminate this bug.
- Related-Bug: # 1234567 - use the 'Related-Bug' if the commit is simply associated with the specified error.
The name 'blueprint' should contain the name Launchpad blueprint (documentation on the Launchpad website), if the commit is intended to implement a certain function. Gerrit links to the blueprint when viewing the patch on review.openstack.org so that reviewers can quickly access the document on the Launchpad.
The DocImpact line contains the line DocImpact and a comment about why this change affects the documentation. Put the DocImpact on a separate line. Use this flag to indicate that the documentation is contained in a patch, or the change affects the documentation so that the reviewer can understand the reasons for the change. Include as much information as possible. When this flag is included in the commit message, Gerrit creates a bug-report for the openstack-manuals project (openstack instructions) for tracking and ordering, or moves, if necessary, to the openstack-api-site.
The APIImpact line contains the APIImpact line and a comment about why this change affects the public HTTP API. Place the APIImpact on a separate line. Use this flag to indicate that the patch creates, updates, or deletes the public HTTP API or modifies the behavior of the public API. Include as much information as possible. When this flag is included in the commit message, all related reviews can be found using this link , and a page from the API_Working_Group documentation can help in finding relevant reviews. In addition, the API Working Group can be directly contacted via IRC on the Freenode channel # openstack-sdks .
The SecurityImpact string simply contains the SecurityImpact string. It is used to indicate that the change has security implications and should be reviewed by the OpenStack Security Group.
The UpgradeImpact line contains the UpgradeImpact line and a comment about why this change affects upgrades. It is used to indicate that the change has the consequences of updates for those who make continuous deployment, or from N to N + 1 updates. It may also indicate updates to the “Upgrade Notes” section in the release notes for the affected project.

All metadata is intended for machines and is of secondary importance for people, therefore, all of them should be grouped together at the end of the commit message.

Note: Although in many open source projects using Git, it is a popular practice to include the "Signed-off-by" tag (generated by using the "git commit -s" command), this is not necessary for OpenStack. Before being able to send code to Gerrit, OpenStack requires all participants to sign a CLA (Contributor License Agreement, License Agreement for Members), which serves an equivalent purpose.

We encourage the use of Co-Authored-By: name name@example.com ( Collaboration ) in commit messages to identify people who worked on a specific patch. This is an agreement on the recognition of several authors, and our projects call upon statistical collection tools to follow these fields as they are analyzed.

1.3.2 Overview of the commit message structure

In the first line, provide a brief description of the changes.
Insert an empty line after the first line.
In the following lines write a detailed description of the changes, breaking up the paragraphs where necessary.
The first line must be limited to 50 characters and must not end with a period.
Subsequent lines must be enclosed in 72 characters.
- vim (by default in the $ EDITOR environment variable on most distributions) can automatically wrap the lines for you. In most cases, you just need to copy the vimrc sample file (which can be found somewhere, for example /usr/share/vim/vim74/vimrc_example.vim ) to ~/.vimrc if you do not have it.
- After editing a paragraph, you can wrap it by pressing the escape key, placing the cursor inside the paragraph and entering the gqip combination.
Put the “Change-id”, “Closes-Bug #NNNNN” and “blueprint NNNNNNNNNNNN” lines at the very end if necessary.

For example:

   libvirt get_cpu_info    API  get_cpu_info   libvirt      XPath      XML .       LibvirtConfigCaps.        . DocImpact Closes-Bug: #1003373 Implements: blueprint libvirt-xml-cpu-model Change-Id: I4946a16d27f712ae2adf8441ce78e6c0bb0bb657

1.3.3 Some examples of bad practice

Let's look at some examples from the history of Git Nova, again with the names of the authors removed, since we do not want to blame or insult anyone.

1.3.3.1 Example 1

 : 468e64d019f51d364afb30b0eed2ad09483e0b98 : [] : . 18  16:07:37 2012 -0400     compute/utils.py Fixes bug #1014829

Problem: there is no mention of what should have been imported, where this import is not enough, and why it is so necessary. This information was included in the error tracker and had to be copied to the commit message in order to provide a self-contained description. For example:

    "exception"  compute/utils.py nova/compute/utils.py      exception.NotFound,     .

1.3.3.2 Example 2

 : 2020fba6731634319a0d541168fbf45138825357 : [] : . 15  11:12:45 2012 -0600    ec2id     Fixes bug #1013765 *      ec2utils.id_to_ec2_id() Change-Id: I5e574f8e60d091ef8862ad814e2c8ab993daa366

Problem: there is no mention of what is wrong with the current (wrong) format, or what the new corrected format is. This information was also available in the error tracker and should be included in the commit message. Moreover, this error corrected the regression caused by the earlier change, but there is no mention of what happened before. For example:

    ec2id        uuid' ,    XXXXXXX,   ec2               (i-xxxxx).      vol-xxx  snap-xxxx.      ec2utils.id_to_ec2_id () Fixes bug #1013765

1.3.3.3 Example 3

 : f28731c1941e57b776b519783b0337e52e1484ab : [] : . 13  10:11:04 2012 -0400     libvirt. Fixes LP bug #1012689 Change-Id: I91c0b7c41804b2b25026cbe672b9210c305dc29b

Problem: in this commit, the message simply documents what was done, and not why it was done. It was necessary to mention that earlier the change set introduced a new minimal version of libvirt. It was also necessary to report what the current behavior is, if the check fails.
For example:

    libvirt,  0.9.7   XXXXXXXX   API  'reset'     libvirt 0.9.7  .     libvirt  ,     .     ,   . Fixes LP bug #1012689 Change-Id: I91c0b7c41804b2b25026cbe672b9210c305dc29b

1.3.4 Examples of good practice

1.3.4.1 Example 1

 : 3114a97ba188895daff4a3d337b2c73855d4632d : [] : . 17  17:16:10 2012 +0100      KVM  VM  PIT  RTC      PIT  RTC    KVM     ()    .  ,  Windows 7       KVM  ,    Linux      .  PIT ,        , ..    RTC  ,        ,    ''    libvirt XML <clock offset='utc'> <timer name='pit' tickpolicy='delay'/> <timer name='rtc' tickpolicy='catchup'/> </clock>   KVM   -no-kvm-pit-reinjection -rtc base=utc,driftfix=slew     ,       .              Closes-Bug: #1011848 Change-Id: Iafb0e2192b5f3c05b6395ffdfa14f86a98ce3d1f

, :

, ( KVM )
( PIT / RTC)
( XML / QEMU args)
( )
Closes-Bug

1.3.4.2 2

 : 31336b35b4604f70150d0073d77dbf63b9bf7598 : [] : . 06  22:45:25 2012 -0400      CPU   ,      ,      ARM    X86_64  .         ,     .  libvirt            permitted_instances_types   cpu_info .  Xen     .         permitted_instances_types     .   ARM     .   ArchFilter  . Change-Id: I17bd103f00c25d6006a421252c9c8dcfd2d2c49b

, :

, ( )
( )
( libvirt )
(, Xen)

Source: https://habr.com/ru/post/329992/

All Articles