
Grafting legacy history into a tree: finding the optimal branch point

As part of my job, I inherited a system with about 15 years of history and a few dozen installations at different organizations. The system itself is relatively small (~25K lines of code, ~1K commits), but release management was the problem.





In practice, of course, the grateful customers of this system still want support and bug fixes from time to time, and sometimes even major improvements to the core of the system.



After a short discussion, we concluded that continuing to support such a distributed system in svn was impractical, and decided to migrate to git.


Problem number one, moving the master tree from svn to git, was solved easily with the standard git-svn tools.



Problem set number two, merging the numerous forks from different installations into the tree, we decided to tackle as the forks became available. Whenever the next organization woke up, we had to:



  1. get their fork
  2. understand where it was forked from and which revision it was last updated (rebased) to, if it is an svn checkout
  3. create a new branch for this fork
  4. try to split the changes into smaller, more or less semantically related pieces and commit them all to this branch


The main snag, unexpectedly, was step 2: working out where the next installation had been forked from. With an svn checkout one could at least inspect the current state of the working copy; with an svn export, guessing was far from trivial. After doing a semi-manual archaeological study of the code a couple of times, I got bored and decided to automate the search. I found no ready-made solution (git bisect, unfortunately, does not help here), so I ended up with the following script:



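    #!/bin/sh -ef
    # Rank every commit by how much its tree differs from a candidate directory.

    if [ $# -ne 2 ]; then
        echo "Usage: $0 <git-repo-dir> <candidate-checkout-dir>" >&2
        exit 1
    fi

    GIT_REPO="$1"
    CANDIDATE_DIR=$(cd "$2" && pwd)
    TAB=$(printf '\t')

    cd "$GIT_REPO"
    COMMITS=$(git log --all --format=format:%H)

    # Remember the current branch (or the commit, if HEAD is detached)
    CURRENT_COMMIT=$(git symbolic-ref --quiet --short HEAD || git rev-parse HEAD)

    for C in $COMMITS; do
        git checkout --quiet "$C"
        # printf instead of the non-portable "echo -n"
        printf '%s\t' "$C"
        # Size of the diff, in lines, between the candidate and this commit
        diff -urN --exclude=.git --exclude=.svn "$CANDIDATE_DIR" . | wc -l
    done | sort -t"$TAB" -k2,2n

    # Restore the original branch or commit
    git checkout --quiet "$CURRENT_COMMIT"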


The script takes two parameters: (1) the path to the git repository and (2) the path to the fork candidate whose attachment point in the project's development tree needs to be found. It simply computes the size of the diff (in lines) between each commit in the repository and the candidate directory. The commit with the smallest diff is, with high probability, the best place to base the branch on. The output looks like this:



 3810315aaa238e32a7106312f9973f1d1f0ea097 651
 19b595d87eecc43933ea60d89882319c7ac3f512 835
 989cee69664733b773a4a81cc49e2a1a0cdff38a 872
 9026dae1154f98018c808b73c7f1c6cd09310dc7 885
 802943edf287ad28d5e71a57510400afacb49176 894
 c5bd4050fce754e16664e6e1eeb57a4ff3ed06c6 894
 dcb70c4a2e9fc0431ceb6154ecd1688189362622 908
 ...
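The number in the second column is just the line count of a recursive unified diff. To recompute it for a single commit (for example, to look more closely at near-ties such as the two commits at 894 lines) without checking anything out, one option is to export the commit with git archive into a scratch directory. This is only a sketch: `diff_size` is a hypothetical helper name, it assumes `mktemp -d` is available, and because git archive honors `export-ignore` attributes, the count may differ from the script's if the repository uses them.

```shell
#!/bin/sh
# Sketch only: diff_size is a hypothetical helper, not part of the original script.
# Prints the number of diff lines between one commit's tree and a candidate directory.
diff_size() {
    repo=$1
    commit=$2
    candidate=$3
    # Assumption: mktemp -d exists (widely available, not strictly POSIX).
    tmp=$(mktemp -d) || return 1
    # Export the commit's tree into the scratch directory; the working tree is untouched.
    git -C "$repo" archive "$commit" | tar -x -f - -C "$tmp"
    # diff exits with 1 when the trees differ, which is the expected case here.
    { diff -urN --exclude=.git --exclude=.svn "$candidate" "$tmp" || [ "$?" -eq 1 ]; } \
        | wc -l | tr -d ' '
    rm -rf "$tmp"
}
```

It could be called as `diff_size <git-repo-dir> 802943edf287ad28d5e71a57510400afacb49176 <candidate-checkout-dir>`, using the same two directories that were passed to the script.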




The best candidate is at the top of the list, so the fork can most likely be attached like this:

 $ git branch new-organization 3810315aaa238e32a7106312f9973f1d1f0ea097
 $ git checkout new-organization
 $ cp -r ../new-organization-fork/* .


After that, you can work through the changes, try to split them into pieces, and commit them (perhaps even with --date and --author, if you can figure those out).
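As a rough illustration of that last remark, a small wrapper can stage a group of files and record them under the original author and date. `commit_as` is a hypothetical helper, and the author and date in the usage line are placeholders, not values from the real project. Note that `--author` and `--date` set the author fields only; the committer recorded in the commit is still whoever runs the command.

```shell
#!/bin/sh
# Sketch only: commit_as is a hypothetical helper name.
# Usage: commit_as "<Name <email>>" "<date>" "<message>" <path>...
# Must be run inside the git working tree.
commit_as() {
    author=$1
    date=$2
    message=$3
    shift 3
    # Stage only the paths that belong to this logical piece of the fork.
    git add -- "$@"
    # Record the historical author and author date on the new commit.
    git commit --quiet --author="$author" --date="$date" -m "$message"
}
```

For example: `commit_as "Jane Doe <jane@example.com>" "2009-03-14 12:00:00 +0000" "Fix report totals" reports/totals.c`.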



I would be glad if this solution proves useful to someone else. Comments and suggestions for improvement are welcome.

Source: https://habr.com/ru/post/191696/


