📜 ⬆️ ⬇️

Article translation: Best Practice for Creating Git Commit from OpenStack

I propose to the readers of "Habrakhabr" a translation of the article "Good practice in communicating commits from OpenStack" .


1 Git Commit Best Practice


The following document is based on the experience of developing code, fixing bugs and viewing code in a number of projects using Git, including libvirt, QEMU and OpenStack Nova. Consideration of other open source projects, such as Kernel, CoreUtils, GNULIB and others, assumes that they all follow a fairly common practice. This is motivated by the desire to improve the quality of the history of the Nova Git project. Quality is an abstract term for definition in development; when for one person a certain code “Beautiful” (Thing of Beauty) - then for another it is “Crutch” (Evil Hack). Nevertheless, we can formulate some general recommendations on how and what to do, or, conversely, what not to do when sending Git commits to merge with projects in OpenStack.


This topic can be divided into two areas:


  1. The order of combining or splitting into multiple commits
  2. Information in commit messages

Content



1.1 Overview


The thoughts and examples described in this document should clearly demonstrate the importance of breaking changes into a group of consecutive commits, as well as the importance of writing good messages to them. If these instructions are widely used, it will significantly improve the quality of the git OpenStack history. Carrots and sticks are needed to effectively change the history of git. This document should be a carrot, drawing people's attention to the benefits and benefits, while for others the use of the Gerrit code inspection system will act like a whip. ;-P


In other words, when inspecting in Gerrit, it’s not just necessary to look at the correctness of the code. When inspecting, you first need to examine the commit message and point out improvements to the message itself. Get accustomed to commits that combine many logical changes and require the sender to split them into separate commits. Make sure that changes with formatting indents are not merged with functional changes. Also make sure that the refactoring of the code is fixed separately from the functional changes, and so on.


It is also worth mentioning that Gerrit's handling of a series of patches is not entirely perfect. However, you should not consider this a valid reason to avoid creating a series of patches. The tools used must comply with the needs of the developers, and since they are open source, they can be fixed or improved. The source code is “often read, write from time to time,” and therefore the most important criterion is to improve the long-term support of the code with the help of a large number of developers in the community, and you shouldn’t sacrifice a lot because of one author who may never touch code again.


And now the detailed principles, as well as examples of good and bad practice.


1.2 Structural separation of changes


The basic rule for creating good commits is to provide only one “logical change” for each commit. There are many reasons why this is an important rule:



1.2.1 What to avoid when creating commits


You need to know and understand common examples of bad practice in order to avoid them:



Formatting changes will hide important functional changes, making it difficult for a reviewer to accurately determine if a commit is correct. Solution: Create 2 commit, one with formatting changes, the other with functional changes. Usually, changing indents is done first, but this is not a strict rule.



Again, it will be more difficult for a reviewer to identify flaws if you mix two unrelated changes. If the need arises for a later rollback of a broken commit, it will be necessary to sort out and unravel two unrelated changes with the additional risk of creating new errors.



It is possible that the code for the new function is useful only when the entire implementation is present. However, this does not mean that the entire function should be provided in one commit. New features often entail refactoring existing code. It is highly desirable that any refactoring be performed in separate commits, and not in those in which the new function is implemented. This will help reviewers and test suites confirm that refactoring does not have unintended functional changes. Even newly written code can often be divided into several parts that can be independently considered. For example, changes that add new internal APIs or classes may be in separate commits. Again, this simplifies the verification of the code. It also allows other developers to get small chunks of work with git cherry-pick, if the new feature is not quite ready to merge. Adding new public HTTP APIs or RPC interfaces should be done in commits separate from the actual internal implementation. This will encourage the author and reviewers to think about the overall design of the API or RPC, and not just choose a design that is simpler for the currently selected internal implementation. If the patch affects public HTTP, use the APIImpact flag (see Including External Links ).


The main thing is to follow the rule:


, . - . .


1.2.2 Examples of bad practice


Now let's illustrate a bit of examples from the history of Git Nova. Note: although the comm hashes are cited for reference, the names of the authors are removed, since we should not blame or insult a single person. From time to time, each of us is guilty of violating the rules of good form. In addition, people who have reviewed and approved these commits are as guilty as the one who wrote and sent them;


1.2.2.1 Example 1


 : ae878fc8b9761d099a4145617e4a48cbeb390623 : [] : . 1  01:44:02 2012 . +0000    create libvirt *       create *   wait_for_destroy    undefine *        *  reset     create / destroy *  resume_host_state     hard_reboot *  rescue/unrescue        Change-Id: I2072f93ad6c889d534b04009671147af653048e7 

This commit has made at least two independent changes.


  1. Switch to the use of the new reset API in the “hard_reboot” method
  2. Correction of internal driver methods to not use "hard_reboot"

What is the problem here:



1.2.2.2 Example 2


 : e0540dfed1c1276106105aea8d5765356961ef3d : [] : C. 16  15:17:53 2012 +0400  lvm-disk-images     LVM   VM.   LVM    libvirt. VM-     LVM      `libvirt_images_volume_group`.   `libvirt_local_images_type` ,     .  : `raw`, `lvm`, `qcow2`, `default`.  `libvirt_local_images_type` = `default`,     `use_cow_images`.   `libvirt_sparse_logical_volumes`  ,       (   virtualsize        ).       `False`.     : `Raw`, `Qcow2`  `Lvm`.      ,      `LibvirtConnection._cache_image` ` libvirt_info`      `LibvirtGuestConfigDisk`  libvirt.  `Backend` ,     . Change-id: I0d01cb7d2fd67de2565b8d45d34f7846ad4112c2 

This change introduces one major new function, so at first glance it seems reasonable to use a single commit, but looking at the patch, it is clearly very confusing, since it contains a significant amount of code refactoring with the new LVM function. This fact makes it difficult to determine probable regressions with the support of QCow2 / Raw images. It should be divided into smaller ones, at least into four separate commits.



1.2.3 Examples of good practice


1.2.3.1 Example 1


 : 3114a97ba188895daff4a3d337b2c73855d4632d : [] : . 11  17:16:10 2012 +0100      KVM  VM  PIT  RTC : 573ada525b8a7384398a8d7d5f094f343555df56 : [] : . 1  17:09:32 2012 + 0100       VM libvirt 

Together, these two changes provide support for configuring guest KVM timers. The introduction of new APIs for creating the XML configuration libvirt was clearly separated from the change in the KVM guest creation policy, which uses new APIs.


1.2.3.2 Example 2


 : 62bea64940cf629829e2945255cc34903f310115 : [] : . 1  14:49:42 2012 -0400     rpc.queue_get_for(). Change-Id: Ifa7d648e9b33ad2416236dc6966527c257baaf88 : cf2b87347cd801112f89552a78efabb92a63bac6 : [] : . 30  14:57:03 2012 -0400   shared_storage_test   rpcapi. ......  get_instance_disk_info    rpcapi. ......  remove_volume_connection    rpcapi. ......  compare_cpu    rpcapi. ......  get_console_topic()    rpcapi. ......  refresh_provider_fw_rules()    rpcapi. ...    ... 

This commit sequence reorganized the entire RPC API layer inside OpenStack Compute (Nova) so that the implementation of the connected message systems can be used. For such a significant change in the main part of the functionality, it was key to dividing the work into a larger sequence of commits, allowing you to make a meaningful check of the code, and also simplified tracking changes and finding possible regressions at each stage of the process.


1.3 Information in commit messages


The text of the commit message is as important as the code changes themselves. When creating a message there are some important things to remember:



It is often not always clear what the problem was, even if a bug report was read, as well as a couple of comments. The commit message must contain a clear statement of the original problem. A specific bug is just an interesting historical background on how the problem was identified. The reviewer should be clear what the proposed patch does for correct decision making without studying this defect in the error tracking system.



After 6 months, when someone on the train / plane / bus / beach / bar resolves the problem and views the history of git, there is no guarantee that he will have access to the online error report or to the server with the documentation. Distributed source control systems (SCM) have taken a big step forward in that you no longer need to be “online” to have access to all the information in the repository. The commit message must be completely self-contained in order to continue to benefit from git.



What is obvious to one person may be completely incomprehensible to another person. Always document what the original problem was and how it was corrected for every change, of course, except for fixing obvious typos or formatting changes.



A common mistake is to simply describe how the code was written, without describing why the developer decided to do it that way. The most important thing is to explain the intention and motivation of the changes, but also do not forget to describe the general structure of the code.



Often when describing a large message it becomes obvious that the change had to be divided into two or more parts. Do not be afraid to go back and split it into several commits.



When Gerrit sends email alerts about new patch fixes, the email contains minimal information, mainly a commit message and a list of changed files. Given the large amount of patches, one should not expect all reviewers to study them in detail. Thus, the commit message should contain sufficient information to warn potential reviewers that this is the patch they should be looking at.



In Git, the first line of a commit message has a special meaning. It is used as the subject of the email, in the git annotate command messages, in the gitk viewer program, when merging a branch and in many other places where there is not much room for text. In addition to a brief description of the change itself, you should not forget which part of the code is affected. For example, if this affects the libvirt driver, enter “libvirt” somewhere on the first line.



If the code being modified has features for future enhancements or any limitations you know, be sure to mention them in the commit message. This demonstrates to the reviewer that the author has considered a broader picture, as well as which compromises have been made in terms of short-term goals compared to long-term desires.



In other words, if you redo your commit, please do not add “Patch number 2: redone” (Patch set 2: rebased) to your message. This will not make any difference after merging the changes. However, write a note in Gerrit as a comment on your change. This will help reviewers find out what has changed between patchsets. This also applies to comments such as "Added unit tests", "Fix localization problems" or any other patches that do not affect the overall goal of your commit.


The main thing is to follow the rule:


, . - . .


1.3.1 Inclusion of external links


Although the message is mainly intended for human interpretation, it always contains the metadata provided for machine processing. In OpenStack, they include a “Change Id” (Change ID), as well as optional links to bug identifiers, blueprint links (schemes / documents), DocImpact flag, APIImpact flag, and SecurityImpact flag.



All metadata is intended for machines and is of secondary importance for people, therefore, all of them should be grouped together at the end of the commit message.


Note: Although in many open source projects using Git, it is a popular practice to include the "Signed-off-by" tag (generated by using the "git commit -s" command), this is not necessary for OpenStack. Before being able to send code to Gerrit, OpenStack requires all participants to sign a CLA (Contributor License Agreement, License Agreement for Members), which serves an equivalent purpose.


We encourage the use of Co-Authored-By: name name@example.com ( Collaboration ) in commit messages to identify people who worked on a specific patch. This is an agreement on the recognition of several authors, and our projects call upon statistical collection tools to follow these fields as they are analyzed.


1.3.2 Overview of the commit message structure



For example:


   libvirt get_cpu_info    API  get_cpu_info   libvirt      XPath      XML .       LibvirtConfigCaps.        . DocImpact Closes-Bug: #1003373 Implements: blueprint libvirt-xml-cpu-model Change-Id: I4946a16d27f712ae2adf8441ce78e6c0bb0bb657 

1.3.3 Some examples of bad practice


Let's look at some examples from the history of Git Nova, again with the names of the authors removed, since we do not want to blame or insult anyone.


1.3.3.1 Example 1


 : 468e64d019f51d364afb30b0eed2ad09483e0b98 : [] : . 18  16:07:37 2012 -0400     compute/utils.py Fixes bug #1014829 

Problem: there is no mention of what should have been imported, where this import is not enough, and why it is so necessary. This information was included in the error tracker and had to be copied to the commit message in order to provide a self-contained description. For example:


    "exception"  compute/utils.py nova/compute/utils.py      exception.NotFound,     . 

1.3.3.2 Example 2


 : 2020fba6731634319a0d541168fbf45138825357 : [] : . 15  11:12:45 2012 -0600    ec2id     Fixes bug #1013765 *      ec2utils.id_to_ec2_id() Change-Id: I5e574f8e60d091ef8862ad814e2c8ab993daa366 

Problem: there is no mention of what is wrong with the current (wrong) format, or what the new corrected format is. This information was also available in the error tracker and should be included in the commit message. Moreover, this error corrected the regression caused by the earlier change, but there is no mention of what happened before. For example:


    ec2id        uuid' ,    XXXXXXX,   ec2               (i-xxxxx).      vol-xxx  snap-xxxx.      ec2utils.id_to_ec2_id () Fixes bug #1013765 

1.3.3.3 Example 3


 : f28731c1941e57b776b519783b0337e52e1484ab : [] : . 13  10:11:04 2012 -0400     libvirt. Fixes LP bug #1012689 Change-Id: I91c0b7c41804b2b25026cbe672b9210c305dc29b 

Problem: in this commit, the message simply documents what was done, and not why it was done. It was necessary to mention that earlier the change set introduced a new minimal version of libvirt. It was also necessary to report what the current behavior is, if the check fails.
For example:


    libvirt,  0.9.7   XXXXXXXX   API  'reset'     libvirt 0.9.7  .     libvirt  ,     .     ,   . Fixes LP bug #1012689 Change-Id: I91c0b7c41804b2b25026cbe672b9210c305dc29b 

1.3.4 Examples of good practice


1.3.4.1 Example 1


 : 3114a97ba188895daff4a3d337b2c73855d4632d : [] : . 17  17:16:10 2012 +0100      KVM  VM  PIT  RTC      PIT  RTC    KVM     ()    .  ,  Windows 7       KVM  ,    Linux      .  PIT ,        , ..    RTC  ,        ,    ''    libvirt XML <clock offset='utc'> <timer name='pit' tickpolicy='delay'/> <timer name='rtc' tickpolicy='catchup'/> </clock>   KVM   -no-kvm-pit-reinjection -rtc base=utc,driftfix=slew     ,       .              Closes-Bug: #1011848 Change-Id: Iafb0e2192b5f3c05b6395ffdfa14f86a98ce3d1f 

, :



1.3.4.2 2


 : 31336b35b4604f70150d0073d77dbf63b9bf7598 : [] : . 06  22:45:25 2012 -0400      CPU   ,      ,      ARM    X86_64  .         ,     .  libvirt            permitted_instances_types   cpu_info .  Xen     .         permitted_instances_types     .   ARM     .   ArchFilter  . Change-Id: I17bd103f00c25d6006a421252c9c8dcfd2d2c49b 

, :



')

Source: https://habr.com/ru/post/329992/


All Articles