File line or activity on the file

Most developers are familiar with code_swarm, a visualizer hosted on Google Code. Probably one in three has exported a repository log for it and made a video that visualizes the development of an application by showing the activity of its programmers. And surely every second developer has seen videos of this kind. Virtually all of them are built on the programmer-file relationship.
This article describes how to build a log for the file-line relationship instead, so that the generated video shows the activity of work inside a file.

If that sounds interesting, read on.
A note up front: the generation of the diff file is described for git. The script can be adapted to another version control system if desired, but here I share my own experience.
The finished, working result is here .


Log generation script for code_swarm

For code_swarm to analyze the history, the history has to be supplied in a specific format. It is an XML file that looks like this:
<?xml version="1.0"?>
<file_events>
<event date="" author="" filename="" action="" comment="" />
</file_events>
In fact, code_swarm can display any statistics that change over time and that have an object, which performs an action, and a subject, on which the action is performed. In the classic case, when the log for code_swarm is exported by, say, a platform such as showteamwork, the object is the programmer and the subject is the file. In our case the object is the file and the subject is a line that is added or deleted.
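To make the swap of roles concrete, here is a minimal sketch that prints a tiny log in this format, with the file in the author attribute and an identifier of the changed line in the filename attribute. The helper name emit_event and the sample values (line-0001 and so on) are invented for illustration and are not part of the author's script.

```shell
# Hypothetical helper: print one code_swarm event for a changed line.
#   $1 - timestamp in milliseconds
#   $2 - file that was changed (takes the place of the author)
#   $3 - identifier of the changed line (takes the place of the file name)
#   $4 - action: A (line added) or D (line deleted)
emit_event() {
    printf '<event date="%s" author="%s" filename="%s" action="%s" comment=""/>\n' \
        "$1" "$2" "$3" "$4"
}

# A two-event log: one line added to and one line deleted from ajax/ajax.js
echo '<?xml version="1.0"?>'
echo '<file_events>'
emit_event 1142998387000 ajax/ajax.js line-0001 A
emit_event 1142998387000 ajax/ajax.js line-0002 D
echo '</file_events>'
```

Feeding such a file to code_swarm should make ajax/ajax.js appear where a programmer's name normally would.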
We take the data from a diff file that mostly looks like a classic diff, but also carries the commit information from the repository. The file has the following form:
1142998387000:John Resig
&ajax/ajax.js
new file mode 100644
+// AJAX Plugin
+// Docs Here:
+// http://jquery.com/docs/ajax/
+if ( typeof XMLHttpRequest == 'undefined' && typeof window.ActiveXObject == 'function') {
+var XMLHttpRequest = function() {
+return new ActiveXObject((navigator.userAgent.toLowerCase().indexOf('msie 5') = 0) ?
-Microsoft.XMLHTTP : Msxml2.XMLHTTP);
-};
-}
+.xml = function( type, url, data, ret ) {
+var xml = new XMLHttpRequest();
+if ( xml ) {
+xml.open(type || GET, url, true);
+if ( data )
+xml.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
+if ( ret )
+xml.onreadystatechange = function() {
This file is produced by the following command:
git log -U0 --diff-filter=AMD --reverse --pretty="%at000:%cn" -10 | \
grep -v "^\(-\{3\}\|+\{3\}\) " | \
grep -v "^[+-][ ]*$" | \
grep -v "^[+-]$" | \
grep -v "^[ ]*$" | \
sed -e "s/diff .* b\//\&/g" \
-e "s/^+[ ]\+/+/g" \
-e "s/^-[ ]\+/-/g" \
-e "s/[ ]\+$//g" \
-e "s/^$//g" \
-e 's/\\/\\\\/g' \
-e "s/[\"\`<>$]//g"
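It is worth describing what each part does and why:
  1. log - show the commit log.
  2. -U0 - add the diff of every commit to the log with zero lines of context, so only the changed lines are shown.
  3. --diff-filter=AMD - show only files with the statuses A (added), M (modified) and D (deleted).
  4. --reverse - output the commits in reverse order, oldest first.
  5. --pretty="%at000:%cn" - the format of the commit header: %at is the author date as a UNIX timestamp (the appended 000 turns seconds into milliseconds), %cn is the committer name.
  6. -10 - only the last 10 commits.
  7. grep drops the lines that begin with three + or - characters followed by a space, all empty lines, and all lines that contain only a + or - and nothing else.
  8. sed transforms the lines: it marks the file name with & in place of the "diff ... b/" prefix, removes the whitespace after +/- and at the end of the line, doubles backslashes and removes the characters ", `, <, > and $ so that the lines are safe for the shell.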
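The effect of the filters is easier to see on a small input than on a real repository. The sketch below applies roughly the same cleanup to text read from stdin. It is an approximation, not the author's command: the function name clean_diff and the sample diff are invented, extended regular expressions (grep -E, sed -E) are used instead of the \+ and \| forms of the original, and the shell-escaping steps are left out.

```shell
# Hypothetical helper: clean up "git log -U0" style output read from stdin.
clean_diff() {
    # drop the "--- a/file" and "+++ b/file" header lines
    grep -Ev '^(-{3}|\+{3}) ' |
    # drop empty lines and lines that hold only a + or - plus whitespace
    grep -Ev '^[+-]?[[:space:]]*$' |
    # mark the file name with &, strip whitespace after +/- and at line end
    sed -E -e 's|^diff .* b/|\&|' \
           -e 's/^([+-])[[:space:]]+/\1/' \
           -e 's/[[:space:]]+$//'
}

# Sample input: a commit header followed by a small made-up diff
printf '%s\n' \
    '1142998387000:John Resig' \
    '' \
    'diff --git a/ajax/ajax.js b/ajax/ajax.js' \
    '--- /dev/null' \
    '+++ b/ajax/ajax.js' \
    '+// AJAX Plugin' \
    '+' \
    '+    var x = 1;   ' \
    '-  old();' | clean_diff
```

What remains is the commit header, the file name prefixed with &, and the changed lines with their leading and trailing whitespace removed.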
Now for the most interesting part. There is a very useful utility called awk, a language for parsing and processing an input stream, and it does all the heavy lifting here. I first implemented the idea with plain sed, cut and while, but the performance was terrible. After I rewrote everything in awk, generation became two to three times faster. Enough words, here is the complete script (the latest version of the file ):
#!/bin/sh

# Turns the prepared diff file ($gitdiff) into a code_swarm XML log.
# $1 is the line conversion type: md5 (default), crypt or ch_code.
generate() {
    # if stdout is a terminal, write the XML to $logfile instead
    if test -t 1; then
        exec > $logfile
    fi

    echo -e "<?xml version=\"1.0\"?>\n<file_events>"
    echo "generating ..." >&2

    awk -v typegen=$1 '
    BEGIN {
        # frames of the progress indicator written to stderr
        split("\b\b\b\b\b. . . . . \b- \b\b- \b\b- \b\b- \b\b- \b= = = = =", st, " ")
        ist = 0
        _ord_init()

        typehash = 0
        if (typegen == "ch_code") {
            typehash = 1
        }
        else if (typegen == "crypt") {
            typehash = 2
        }
    }

    # build the table that maps a character to its numeric code
    function _ord_init(low, high, i, t)
    {
        low = sprintf("%c", 7)
        if (low == "\a") {
            low = 0
            high = 127
        } else if (sprintf("%c", 128 + 7) == "\a") {
            low = 128
            high = 255
        } else {
            low = 0
            high = 255
        }

        for (i = low; i <= high; i++) {
            t = sprintf("%c", i)
            _ord_[t] = i
        }
    }

    # numeric code of the first character of str
    function ord(str, c) {
        c = substr(str, 1, 1)
        return _ord_[c]
    }

    # commit header "<timestamp>:<committer>": remember the date
    /^[0-9]/ {
        sub(/:.*/, "")
        d = $0
        next
    }
    # "&<path>": remember the current file
    /^&/ {
        sub(/&/, "")
        f = $0
        next
    }
    /^\+/ { a = "A" }
    /^-/ { a = "D" }
    # an added or deleted line: convert its text into a pseudo file name
    /^[\+-]/ {
        sub(/[\+-]/, "")
        str = ""
        if (typehash == 1) {
            # ch_code: concatenate the codes of all characters of the line
            for (i = 1; i <= length($0); i++) {
                str = str "" ord(substr($0, i, 1))
            }
            gsub(/32|16/, "/sd", str)
            # put a dot before the last two characters
            str = substr(str, 1, length(str) - 2) "." substr(str, length(str) - 1, 2)
        }
        else {
            # md5 (default): hash the line with md5sum
            cmd="echo \"" $0 "\" | md5sum | cut -f1 -d \" \" | sed -e \"s@[32|16]@/sd@g;\" -e \"s/\\(..\\)\$/.\\1/\""
            # crypt: use the crypt function of Perl instead
            if ( typehash == 2 )
                cmd="C:/Perl/bin/perl -e \"print crypt($ARGV[0], $ARGV[1])\" \"" $0 "\" \"1/5l58j/jk\""
            cmd | getline str
            close(cmd)
        }

        if (str != "")
            print "<event date=\"" d "\" author=\"" f "\" filename=\"" str "\" action=\"" a "\" comment=\"\"/>"

        # advance the progress indicator
        system("echo -ne \"" st[ist++] "\" >&2")
        if (ist > 16) ist = 0
    }
    ' $gitdiff

    echo -ne "\b\b\b\b\b\b\b\b\b\b\b\bcompleted!" >&2
    echo "</file_events>"
    rm $gitdiff
}

# Writes the cleaned-up git history to $gitdiff.
# $1 is an optional commit limit in the form -<n>.
prepare_git() {
    git log -U0 --diff-filter=AMD --reverse --pretty="%at000:%cn" $1 | \
    grep -v "^\(-\{3\}\|+\{3\}\) " | \
    grep -v "^[+-][ ]*$" | \
    grep -v "^[+-]$" | \
    grep -v "^[ ]*$" | \
    sed -e "s/diff .* b\//\&/g" \
        -e "s/^+[ ]\+/+/g" \
        -e "s/^-[ ]\+/-/g" \
        -e "s/[ ]\+$//g" \
        -e "s/^$//g" \
        -e 's/\\/\\\\/g' \
        -e "s/[\"\`<>$]//g" > $gitdiff
}

# unique prefix for the temporary file and the output file
fileaction="$(date +%j%H%M%s)"

typehash=md5
[ -n "$1" ] && typehash=$1 || echo -e "Conversion type is not set, md5 is used.\n" \
"Available types:\n" \
"\t\tmd5 - default\n\t\tcrypt\n\t\tch_code\nusing: $0 crypt" >&2
echo "conversion type: $typehash" >&2

[ -n "$2" ] && countcommit=$2 || echo -e "Commit limit is not set, all commits are used.\n" \
"git log --help\n:\t-<n>\n\t\tLimits the number of commits to show.\nusing: $0 crypt -10" >&2
echo -n "commit limit: " >&2
[ -n "$2" ] && echo "$2" >&2 || echo "none" >&2

gitdiff=$fileaction".temp"
logfile=$fileaction"actions.xml"

prepare_git $countcommit
generate $typehash


I will not explain in detail how the awk program works, only the general idea. For the lines (the subjects) to be digestible by the visualizers, they are converted in one of several ways:
  1. md5 - the line is hashed with the md5sum utility, then 32 and 16 in the sum are meant to be replaced by /sd, and a dot is inserted before the last two characters. This is done so that the gource visualizer builds a tree. Note that the sed expression [32|16] used in the script is a character class, so it actually replaces every single digit 1, 2, 3 and 6.
  2. crypt - the line is passed to the Perl crypt function, which hashes the input with a key and returns the result.
  3. ch_code - every character is simply converted to its numeric code, and the sequences 32 and 16 in the result are replaced by /sd.
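The md5 conversion can be tried outside of awk. The following is a minimal sketch of that step, assuming a Linux-like environment with md5sum; the function names digest_to_path and line_to_path are invented here. The class [1236] is used because the [32|16] expression of the script matches the single characters 3, 2, |, 1 and 6, and a hex digest never contains a | character.

```shell
# Hypothetical helper: turn a hex digest into a path-like name.
digest_to_path() {
    # replace every digit 1, 2, 3 and 6 with /sd, then put a dot
    # before the last two characters of the result
    printf '%s\n' "$1" | sed -e 's@[1236]@/sd@g' -e 's/\(..\)$/.\1/'
}

# Hypothetical helper: hash one source line (with a trailing newline,
# as echo would add) and convert the digest.
line_to_path() {
    digest_to_path "$(printf '%s\n' "$1" | md5sum | cut -d ' ' -f1)"
}

digest_to_path c9ab04
line_to_path '}'
```

Every /sd acts as a directory separator, which is what makes gource draw a tree, while the final dot makes the tail of the name look like a file extension. Identical lines, such as a lone closing brace, always map to the same name.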
The script takes two parameters:
  1. conversion type - selects how lines are converted and accepts the values listed above. If it is omitted, md5 is used.
  2. number of commits - passed on to the function that generates the diff in order to limit the number of commits. It has the form -num, where num is the number of commits. If it is omitted, all commits are taken.
The output goes to an automatically created file, but it can be redirected to any other file if desired. To make the log generation available from sh in any of your repositories, run the following command:
$ echo "{ } \$@" > /bin/genlogcs
That is all there is to generating the activity log.


Config for code_swarm

Now let's talk about the config for code_swarm. To begin with, I built code_swarm from source; the resulting file can be downloaded from here . Put it into the dist directory of your own code_swarm installation.
Create a file called my.config with the following contents:
# names matching a digit followed by a letter
ColorAssign1="DigitLetter",".*[0-9][a-z]", 43,170,215, 43,170,215
# names matching a letter followed by a digit
ColorAssign2="LetterDigit",".*[a-z][0-9]", 255,134,51, 255,134,51
# names matching two letters
ColorAssign3="LetterLetter",".*[a-z][a-z]", 43,110,214, 43,110,214
# names matching two digits
ColorAssign4="DigitDigit",".*[0-9][0-9]", 41,242,185, 41,242,185

Width=1280
Height=720
InputFile=data/my/data/actions.xml
PhysicsEngineConfigDir=physics_engine
PhysicsEngineSelection=PhysicsEngineOrderly
ParticleSpriteFile=src/particle.png
Font=Helvetica
FontSize=16
BoldFontSize=16
#MillisecondsPerFrame=2254085
MaxThreads=4
Background=0,0,0
TakeSnapshots=true
SnapshotLocation=data/my/png/cs-#####.png
DrawNamesSharp=true
DrawNamesHalos=true
DrawFilesSharp=false
DrawFilesFuzzy=true
DrawFilesJelly=false
ShowLegend=true
ShowHistory=true
ShowDate=true
ShowEdges=false
ShowDebug=false
EdgeLength=36
EdgeDecrement=-2
FileDecrement=-1
PersonDecrement=-1
FileSpeed=7.0
PersonSpeed=2.0
FileMass=2.0
PersonMass=10.0
EdgeLife=250
FileLife=200
PersonLife=255
HighlightPct=5
UseOpenGL=false
ShowUserName=true
IsInputSorted=false
We will need this file later.

Script for generating video activity visualization

To avoid a long explanation of what to create where and what the directory structure should be: in the code_swarm directory, inside the data directory, create a directory named my with the structure shown here . We also need to create two directories in it, png and results.

Script gen_log

Since it is interesting to see the result of the work on files not only in code_swarm but also in gource, I wrote a script that generates a log for gource as well. The script is called gen_log (the latest version of the file ):
#!/bin/sh
# This script relies on bash features (arrays, (( )) and [[ ]]),
# so sh has to be bash.
uses() {
    echo -e "using\n$0 file_codeswarm.xml"
}

generatelog() {
    echo "generating... "
    state=("\\" "|" "/" "-")
    i=0
    if [ -f "$1" ]; then
        result=${1%.*}'.log'
        echo -n > $result
        # take only the event lines, then strip the leading spaces and the
        # <event ... /> wrapper so that only the attr="value" pairs remain
        grep -e "event " $1 | \
        sed -e "s/^[ ]*//;s/^<event //g;s|/>$||g" | \
        while read line
        do
            date=""
            # define the variables date, author, filename, action, comment
            eval $line
            # cut the date down to its first 10 digits (milliseconds to seconds)
            [ -n "$date" ] && [ "`echo -n $date | wc -c`" -gt "10" ] && date=`echo $date | sed -e "s/^\(.\{10\}\).*/\1/"`
            [ -n "$date" ] && echo "$date|$author|$action|$filename" >> $result
            # advance the progress spinner
            echo -ne "\b${state[$i]}"
            (( i += 1 ))
            [[ $i -eq 4 ]] && i=0
        done
        echo -ne "\bcompleted!"
    else
        echo -e "code_swarm log file does not exist!\n$1"
    fi
}

[ -n "$1" ] && generatelog $1 || uses
This script relies on the handy eval builtin, which executes its argument as if you had typed it on the command line. That is convenient here, because each input line has the following form:
date="1142998387000" author="ajax/ajax.js" filename="c9/sd/sd9db4/sd/sd/sdb945/sdb89a/sd/sd7/sd/sdfbfdf.04" action="A" comment=""
As you can see, the shell executes this line as a series of assignments, and we get five variables: date, author, filename, action and comment (thanks to bliznezz ). These variables are written to a file in the following format:
date|author|action|filename
Admittedly, gource lets you customize this format. The description of the format it processes is in the file {gource_home}/data/gource.style
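The same conversion can also be done without eval, which avoids executing log content as shell code. The sketch below is an alternative, not the author's method: it assumes every event line has its attributes in exactly the order printed by the generation script, and the function name event_to_gource is made up. It keeps the first 10 digits of the timestamp (milliseconds to seconds) and reorders the fields into date|author|action|filename.

```shell
# Hypothetical helper: convert code_swarm <event .../> lines on stdin
# into "date|author|action|filename" lines.
event_to_gource() {
    sed -n -E 's/.*date="([0-9]{10})[0-9]*" author="([^"]*)" filename="([^"]*)" action="([^"]*)".*/\1|\2|\4|\3/p'
}

# Two sample events, deliberately out of order; sort restores the order
# by comparing the first |-separated field numerically
printf '%s\n' \
    '<event date="1142998390000" author="b.js" filename="ff.01" action="D" comment=""/>' \
    '<event date="1142998387000" author="ajax/ajax.js" filename="c9/sd.04" action="A" comment=""/>' |
    event_to_gource | sort -t '|' -k1,1n
```

Lines that do not match the expected attribute layout are silently skipped, so the XML header and the file_events tags need no special handling.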

Script run.bat

Now we put everything together in one file that processes the activity file you generated with the genlogcs command and placed at {code_swarm_home}/data/my/data/actions.xml.
Here are its contents (the latest version of the file ):
call sh gen_log ./data/actions.xml

call sh sort_log ./data/actions.log > data\gource.log

pushd png
del *.png
popd

pushd ..\..
call run.bat data\my\my.config
popd

pushd png
call "..\tools\nt\mencoder" mf://*.png -mf fps=19:type=png -ovc x264 -x264encopts pass=1:bitrate=1000 -oac copy -audiofile "..\data\audio.wav" -o "..\results\result.avi"
popd

pushd "tools\gource"
call gource.exe --hide filenames,dirnames --user-scale 2 --output-framerate 25 --stop-position 1 --highlight-all-users --seconds-per-day 1 --output-ppm-stream "..\..\results\resultgource.ppm" "..\..\data\gource.log"
popd

pushd "tools\nt"
call ffmpeg -y -b 9000K -f image2pipe -vcodec ppm -i "..\..\results\resultgource.ppm" -fpre "..\ll.ffpreset" -i "..\..\results\resultgource.ppm" -vcodec libx264 "..\..\results\resultgource.avi"

call mencoder "..\..\results\resultgource.avi" -ovc x264 -x264encopts pass=1:bitrate=10000 -ofps 19 -speed 2 -o "..\..\results\resultgource.fps"

call mencoder "..\..\results\resultgource.fps" -ovc x264 -x264encopts pass=1:bitrate=10000 -oac copy -audiofile "..\..\data\audio.wav" -o "..\..\results\resultgource.avi"
popd

del results\resultgource.ppm
del results\resultgource.fps
del data\actions.log
This script performs the following actions:
  1. Converts the code_swarm log to the gource format with gen_log and sorts it with sort -k1 -t "|". Since we are working under Windows, I moved the sort call into a separate file (sort_log), otherwise Windows complains. Sorting is necessary because gource works correctly only with sorted data.
  2. Runs code_swarm with the config described in my.config. As a result, code_swarm generates a large number of png files.
  3. Converts the png files into a video with mencoder and attaches a soundtrack. The duration of the video is controlled by the -mf fps=19:type=png parameter; ideally the fps value (19 here) is the number of png files divided by the length of the audio track in seconds. I do not like that result, though, so I use a value that suits me.
  4. Starts gource, which writes its result as a ppm stream. Be warned: the file is very large, several gigabytes, so choose the output path with that in mind.
  5. Converts the ppm file into an avi file with the ffmpeg utility. The result is very large and long, so mencoder speeds it up to the same 19 fps, and then mencoder runs once more to attach the soundtrack. In the end we get a file of just a few tens of megabytes.
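The frame rate rule from step 3 is plain arithmetic, so it can be computed instead of guessed. A minimal sketch, with an invented function name fps_for and made-up sample numbers; the frame count and the audio length in seconds are passed in by hand.

```shell
# Hypothetical helper: frame rate at which a given number of frames
# lasts as long as an audio track of a given length in seconds.
fps_for() {
    frames=$1
    seconds=$2
    # integer division, rounded to the nearest whole number
    echo $(( (frames + seconds / 2) / seconds ))
}

# 5700 png files and a 300 second (5 minute) track
fps_for 5700 300
```

The number of frames can be taken from the png directory, for example with ls png/*.png | wc -l, and the result goes into the fps value of the -mf parameter.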
I borrowed the video conversion approach from the showteamwork platform. After all the steps have completed, the directory you specified for the results (in my case {code_swarm_home}/data/my/results) contains two video clips with the visualized activity.


Execution result, with md5 generation type

Here are my results from the jquery repository.

code_swarm



What does this mean (the "attraction" model): the brightest objects here are most likely the curly braces that occur so often in js scripts. A closing brace usually sits alone on its line, so the md5sum is the same for all of them. The md5sum can also coincide for several different lines after the conversion, but that is something one can live with. If you need the most objective picture, use the ch_code generation type.

gource



What does this mean (the "bees and honeycombs" model): an interesting picture that resembles a fractal. Each branch acts as a directory, and every leaf on the tree is a file.


Results

The main point of this article is that the code_swarm and gource visualizers can process any statistics that vary over time; the important thing is to feed these statistics to them in the right form.
All of this is, of course, more of a game. For me, at least, that is what it is. Let's say that such things add variety to a programmer's work.
Clone my repo
$ git clone git://github.com/artzub/code_swarm-gource-my-conf.git test
and post your results. I would be very interested to see them.


upd: Nobody mentioned it, but the log generation script for code_swarm contained an error, or perhaps a typo, I am not sure what to call it. The regular expression /^+/ is wrong, because the + must be escaped, like this: /^\+/. Oddly, everything worked on Windows, but awk complained when the script was run under Debian! =)
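For reference, this is how the escaped form behaves. A minimal sketch with an invented function name classify; it only demonstrates the patterns and is not part of the generation script.

```shell
# Hypothetical helper: label each stdin line as added (A), deleted (D)
# or other (?), using the escaped plus sign in the awk pattern.
classify() {
    awk '/^\+/ { print "A"; next }
         /^-/  { print "D"; next }
               { print "?" }'
}

printf '%s\n' '+added line' '-removed line' 'context line' | classify
```

In an extended regular expression an unescaped + is a repetition operator, so /^+/ has nothing to repeat after the anchor and different awk implementations are free to treat it differently.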

Source: https://habr.com/ru/post/114630/