
Git: automatic server-side verification of commit messages using Python

Target audience, motivation


I hope this post will be useful to readers with an intermediate knowledge of Git and a beginner's knowledge of Python. It also assumes basic familiarity with how Unix systems work and with regular expressions.

My development team needed a way to enforce the format of commit messages. Practice showed that a page of rules in the corporate knowledge base was not enough to ensure compliance, so I wanted to forcibly reject pushes of poorly formatted commits on the server. Having recently started learning Python, I knew that the language is well suited to system scripting thanks to its extensive standard library. Experience also suggested that a concrete goal helps a lot when learning anything new. So, setting aside the fear of the unknown, I set out to solve the problem in an unfamiliar language. Note that the end of the post contains links to detailed information on all the topics covered in the text.

Git hooks


Git provides a rich set of hooks that let you run custom scripts at the right moment, both on the server side and on the client side. When a push reaches the server, the update file in the hooks subdirectory of the repository directory is responsible for handling it. Git runs this file once for each branch pushed to the server.

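The update hook takes three parameters, in this order: the name of the ref being updated, the old object name stored in the ref, and the new object name to be stored in the ref.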

On exit, the hook must return an exit code: 0 to allow the push, or 1 to reject it. Everything the script writes to the standard output stream is sent back to the client.
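As a rough illustration of this contract, a minimal update hook could be sketched as follows. This is not the author's script: the function name `update_hook` and the placeholder check are hypothetical, and the sketch assumes Python 3 rather than the 2.7 interpreter used later in the post.

```python
import sys


def update_hook(argv):
    """Return the exit code for one update hook invocation.

    argv mirrors sys.argv: script name, ref name, old object name,
    new object name.
    """
    if len(argv) != 4:
        print("usage: update <ref> <old> <new>")
        return 1
    ref, old, new = argv[1:4]
    # Anything printed here is relayed to the client running `git push`.
    print("checking {}: {}..{}".format(ref, old[:7], new[:7]))
    # Placeholder: a real hook would inspect the pushed commits here
    # and return 1 if any of them breaks the rules.
    return 0
```

In a real hook file the last line would presumably be `sys.exit(update_hook(sys.argv))`, so that the returned value becomes the exit code Git sees.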

Formulation of the problem


The requirements for the message format were shaped by authoritative opinions found on the web and by the development process adopted in the company.

They are described by the following pattern:
    projectName-taskId action annotation
    detailsString1
    detailsString2
    ...
    detailsStringN

Explanations:

Based on the format requirements, a simple algorithm emerges:

Coding with the Python standard library


This section only provides brief code snippets to illustrate the use of the standard language library and Git commands. A link to the full version of the script can be found at the end of the text.

The main lines of the script:
    import sys
    ...
    if __name__ == "__main__":
        sys.exit(main())

Each .py file is a module: a set of data, user-defined types, and functions. __name__ is a built-in attribute of the module. When the script is run from the command line, this attribute is set to the special value __main__ . If the module is imported by another module, __name__ contains the name of the imported module (the file name without the .py extension). Thanks to the conditional expression above, the file can be used both as a module and as a stand-alone script. sys.exit() ends the script with the exit code returned by the main() function, which contains the main logic.

Next, a function that executes console commands:
    import subprocess
    ...
    def runBash(commandLine):
        # Run the command line through the shell and capture its standard output.
        process = subprocess.Popen(commandLine, shell=True, stdout=subprocess.PIPE)
        out = process.stdout.read().strip()
        return out

subprocess.Popen() creates a child process by launching the program described by its arguments. Here, because of shell=True, the standard command shell is started (/bin/sh on Unix systems) and the commandLine string is passed to it for execution. The command's standard output is written to a pipe, and the function reads the contents of that pipe and returns them. strip() returns a copy of the string without leading and trailing whitespace.
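For comparison, the same idea can be sketched without a shell by passing the command as a list of arguments. This is an alternative, not the author's code: the name `run_command` is hypothetical, and the sketch assumes Python 3.5 or later, where `subprocess.run()` is available.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments and return its stripped stdout.

    Raises subprocess.CalledProcessError if the command exits with a
    non-zero code.
    """
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the output to text
        check=True,
    )
    return result.stdout.strip()


# Example: run the current Python interpreter as the child process.
print(run_command([sys.executable, "-c", "print('hello')"]))
```

Passing a list avoids quoting problems when arguments come from outside, at the cost of losing shell features such as pipes.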

Now, using the runBash() function, you just need to get a list of commits:
    import sys
    ...
    COMMAND_LIST = "git rev-list {}..{}"
    ...
    def main():
        refOld = sys.argv[2]
        revNew = sys.argv[3]
        commits = runBash(COMMAND_LIST.format(refOld, revNew)).split("\n")
        ...
        for commit in commits:
            ...

The sys.argv list contains the command line arguments that Git passes to the script. The string method format() substitutes these arguments into the command string.

It is convenient to store the project name in the Git settings: there can be many projects (and, accordingly, many Git repositories), so hard-coding the name as a constant in the script will not work. To set the project name for a repository, run the command git config --add project.name HABR

The function for getting the project name then looks like this:
    COMMAND_PROJECT_NAME = "git config project.name"
    ...
    def getProjectName():
        return runBash(COMMAND_PROJECT_NAME)

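A function for checking a single commit:
    # sed drops everything up to and including the first empty line,
    # that is, the commit headers, leaving only the message.
    COMMAND_COMMIT_MESSAGE = "git cat-file commit {} | sed '1,/^$/d'"
    ...
    def checkCommit(hash):
        commitMessage = runBash(COMMAND_COMMIT_MESSAGE.format(hash))
        return checkMessage(commitMessage)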
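The sed expression relies on the layout of a raw commit object: header lines first, then an empty line, then the message. A minimal sketch of the same extraction in pure Python is shown below. The function name `extract_message` and the sample commit (including the "add" action) are made up for illustration, and the sketch assumes Python 3.

```python
def extract_message(raw_commit):
    """Return the message part of a raw commit object.

    Headers (tree, parent, author, committer) come first, followed by an
    empty line and then the message. If there is no empty line, the result
    is empty, which matches what sed '1,/^$/d' would leave.
    """
    _headers, _separator, message = raw_commit.partition("\n\n")
    return message


# Hypothetical raw commit, shaped like the output of `git cat-file commit <hash>`.
SAMPLE = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1378000000 +0400\n"
    "committer A U Thor <author@example.com> 1378000000 +0400\n"
    "\n"
    "HABR-1 add first line\n"
    "\n"
    "details\n"
)
print(extract_message(SAMPLE))
```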

Checking the first line of a commit message with a regular expression:
    import re
    ...
    def checkFirstLine(line):
        ...
        expression = r"^({0}\-\d+ )?({1})(\/({1}))* .*".format(
            getProjectName(), AVAILABLE_ACTIONS
        )
        if not re.match(expression, line):
            ...
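To see how this expression behaves, here is a small self-contained sketch. The project name and the list of actions are assumptions: the post does not show the contents of AVAILABLE_ACTIONS, so "add|fix|remove" is purely illustrative, as are the names `FIRST_LINE` and `is_valid_first_line`. The sketch also wraps the project name in `re.escape()`, which the original does not do.

```python
import re

# Hypothetical values; in the article the name comes from
# `git config project.name` and the actions from a constant not shown.
PROJECT_NAME = "HABR"
AVAILABLE_ACTIONS = "add|fix|remove"

# Optional "PROJECT-123 " prefix, one or more actions joined by "/",
# then a space and the rest of the line.
FIRST_LINE = re.compile(
    r"^({0}\-\d+ )?({1})(\/({1}))* .*".format(
        re.escape(PROJECT_NAME), AVAILABLE_ACTIONS
    )
)


def is_valid_first_line(line):
    """Return True if the first line of a commit message matches the pattern."""
    return FIRST_LINE.match(line) is not None


for candidate in ("HABR-42 add login form", "fix/remove obsolete check", "update docs"):
    print(candidate, is_valid_first_line(candidate))
```

With these assumed values, the first two candidates match and the third does not, because "update" is not in the list of actions.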

One last nuance. The script is intended to be run by the Python interpreter version 2.7, while the Git repository uses UTF-8 encoding. To reconcile the two, the first lines of the file should look like this:
    #!/usr/local/bin/python
    # -*- coding: utf-8 -*-

And checking the lengths of strings is done using decode() :
    if len(line.decode("utf-8")) > LENGTH_MAX:
        ...
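The reason decode() matters is that the length of a UTF-8 byte string differs from the number of characters as soon as the text contains non-ASCII letters. The sketch below shows the difference; it assumes Python 3, where bytes and text are separate types, and the helper name `is_too_long` is hypothetical.

```python
def is_too_long(raw_line, limit):
    """Compare the number of characters, not bytes, against the limit."""
    return len(raw_line.decode("utf-8")) > limit


# Raw UTF-8 bytes, as they would arrive from a Git command.
raw = "Привет, мир".encode("utf-8")

print(len(raw))                  # length in bytes
print(len(raw.decode("utf-8")))  # length in characters
```

Each Cyrillic letter takes two bytes in UTF-8, so a limit checked against the byte length would reject lines that are visually well within bounds.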

Testing, improvement


On the first day of trial use of the hook, one of the push attempts produced the following error message:
 fatal: Invalid revision range 0000000000000000000000000000000000000000..b12e460740edf4ea41984a676834bee71479aa52 

The commits themselves were formatted correctly. The peculiarity was that a new branch was being pushed to the server, so the old revision consisted entirely of zeros. The git rev-list command cannot handle such a range, and the situation had to be treated as a special case:
    import sys
    ...
    COMMAND_LIST = "git rev-list {}..{}"
    COMMAND_FOR_EACH = "git for-each-ref --format='%(objectname)' 'refs/heads/*'"
    COMMAND_LOG = "git log {} --pretty=%H --not {}"
    ...
    ref = sys.argv[1]
    refOld = sys.argv[2]
    revNew = sys.argv[3]
    if refOld == REF_EMPTY:
        # New branch: list commits reachable from revNew but not from existing heads.
        headList = runBash(COMMAND_FOR_EACH)
        heads = headList.replace(ref + "\n", "").replace("\n", " ")
        commits = runBash(COMMAND_LOG.format(revNew, heads)).split("\n")
    else:
        # Existing branch: list commits in the range refOld..revNew.
        commits = runBash(COMMAND_LIST.format(refOld, revNew)).split("\n")
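The branching logic can also be isolated into a small function that only builds the command, which makes it easy to check without a repository. This is a sketch, not the author's code: the names `ZERO_REV` and `commit_list_command` are hypothetical, the value of 40 zeros is taken from the error message above, and the commands are built as argument lists rather than shell strings.

```python
ZERO_REV = "0" * 40  # old value seen in the error message when a branch is created


def commit_list_command(old_rev, new_rev, existing_heads):
    """Build the argument list for the Git command that lists pushed commits."""
    if old_rev == ZERO_REV:
        # New branch: everything reachable from new_rev but not from existing heads.
        return ["git", "log", new_rev, "--pretty=%H", "--not"] + list(existing_heads)
    # Existing branch: commits in the range old_rev..new_rev.
    return ["git", "rev-list", "{}..{}".format(old_rev, new_rev)]


print(commit_list_command(ZERO_REV, "b12e460", ["a1b2c3d"]))
print(commit_list_command("a1b2c3d", "b12e460", []))
```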

However, this was not enough when a new branch containing no new commits was pushed. In that case the error message looked like this:
    usage: git cat-file (-t|-s|-e|-p|<type>|--textconv) <object>
       or: git cat-file (--batch|--batch-check) < <list_of_objects>

To fix this, the script must finish with a successful exit code when there are no commits:
    for commit in commits:
        if len(commit) == 0:
            sys.exit(0)
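The empty entry appears because splitting an empty string still yields a list with one element, the empty string. A possible alternative, sketched here under a hypothetical name `split_commits`, is to drop empty lines when the output is split, so that the loop simply has nothing to iterate over.

```python
def split_commits(output):
    """Split command output into commit hashes, dropping empty lines."""
    return [line for line in output.split("\n") if line]


print("".split("\n"))            # one empty string, not an empty list
print(split_commits(""))         # no commits at all
print(split_commits("abc\ndef"))
```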

Conclusion


As an exercise, I suggest that students of Python and Git add to the script a check of the validity of the XML files contained in commits.

The program environment in which the script operates:

Links



PS


Beginning authors always note that this is their first post on Habr and ask readers to report flaws in the presentation of the text by private message. So do I. :)

In the comments I will be glad to see remarks on the code, as well as information on how to port the script to Python 3 (is anything needed beyond removing # -*- coding: utf-8 -*- and the calls to decode() ?).

Source: https://habr.com/ru/post/192190/

