File-line, or activity on a file

Most developers are familiar with the code_swarm visualizer (on Google Code). At least one in three has probably exported a log for it and produced a video that visualizes the development of an application by showing programmer activity. And of course every second person has seen videos of this kind. Virtually all of these videos are built around the programmer-file relationship.
This article describes how to build a log around the file-line relationship instead, so that the generated video shows the activity of work on a file.

This article uses:

Git - the VCS.
code_swarm - a repository history visualizer.
gource - a repository history visualizer.
A Linux environment emulator on Windows, or a UNIX OS (on Windows, git ships with the msysgit emulator).
MEncoder - a free video encoder.
ffmpeg - a program for converting video with various codecs.

To be clear up front: the diff file generation is described for git. The script can be adapted to another VCS if desired; here I simply share my own experience.
The finished working result is here.

Log generation script for code_swarm

For code_swarm to analyze the history, the history must be supplied in a specific format. The file is XML and looks like this:

<?xml version="1.0"?>
<file_events>
  <event date="" author="" filename="" action="" comment=""/>
</file_events>
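As a minimal sketch of this format, the shell snippet below writes a well-formed log containing a single event. The helper names emit_event and write_log are hypothetical, and the sample values are invented for illustration.

```shell
#!/bin/sh
# emit_event is a hypothetical helper: it prints one <event/> element.
# Arguments: date (milliseconds), author, filename, action (A or D here).
emit_event() {
    printf '<event date="%s" author="%s" filename="%s" action="%s" comment=""/>\n' \
        "$1" "$2" "$3" "$4"
}

# write_log wraps the events in the root element shown above.
# The event values are invented sample data.
write_log() {
    printf '<?xml version="1.0"?>\n<file_events>\n'
    emit_event 1142998387000 "ajax/ajax.js" "c9/sd/9db4.04" A
    printf '</file_events>\n'
}

write_log
```

Redirecting the output of write_log to a file gives something code_swarm's InputFile setting could point at.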

In fact, code_swarm can display any statistic that changes over time and has an object (the one acting) and a subject (the thing the action is performed on). In the classic case, when the log for code_swarm is exported from, say, a platform such as showteamwork, the object is the programmer and the subject is the file. In our case the object is the file and the subject is the line that is added or deleted.
The data comes from a diff file that mostly looks like an ordinary diff but also carries commit information from the repository. The file has the following form:

1142998387000:John Resig
&ajax/ajax.js
new file mode 100644
+// AJAX Plugin
+// Docs Here:
+// http://jquery.com/docs/ajax/
+if ( typeof XMLHttpRequest == 'undefined' && typeof window.ActiveXObject == 'function') {
+var XMLHttpRequest = function() {
+return new ActiveXObject((navigator.userAgent.toLowerCase().indexOf('msie 5') = 0) ?
-Microsoft.XMLHTTP : Msxml2.XMLHTTP);
-};
-}
+.xml = function( type, url, data, ret ) {
+var xml = new XMLHttpRequest();
+if ( xml ) {
+xml.open(type || GET, url, true);
+if ( data )
+xml.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
+if ( ret )
+xml.onreadystatechange = function() {

This file is produced by the following command:

git log -U0 --diff-filter=AMD --reverse --pretty="%at000:%cn" -10 | \
    grep -v "^\(-\{3\}\|+\{3\}\) " | \
    grep -v "^[+-][ ]*$" | \
    grep -v "^[+-]$" | \
    grep -v "^[ ]*$" | \
    sed -e "s/diff .* b\//\&/g" \
        -e "s/^+[ ]\+/+/g" \
        -e "s/^-[ ]\+/-/g" \
        -e "s/[ ]\+$//g" \
        -e "s/^$//g" \
        -e 's/\\/\\\\/g' \
        -e "s/[\"\`<>$]//g"

Here is what each part does:

log - show the commit log.
-U0 - include the diff in the log with 0 lines of context, so only changed lines appear.
--diff-filter=AMD - show only files with status A (added), M (modified) or D (deleted).
--reverse - output the commits in reverse order, oldest first.
--pretty="%at000:%cn" - the log format: %at is the author date as a UNIX timestamp (the appended 000 turns seconds into milliseconds), %cn is the committer name.
-10 - only the last 10 commits.
grep - drops lines that start with three + or - followed by a space, all empty lines, and lines that contain only a + or -.
sed - transforms the lines: replaces the diff header with & followed by the file name, removes extra spaces after +/-, and makes the lines safe for the shell.
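To see the filters in action without a repository, here is a small sketch that runs a trimmed version of the same grep and sed chain over invented sample text. The function names clean_diff and sample are hypothetical, and GNU grep and sed are assumed because of the \| and \+ operators.

```shell
#!/bin/sh
# clean_diff is a hypothetical name; it applies a trimmed version of the
# filters above to text read from stdin (GNU grep and sed assumed).
clean_diff() {
    grep -v "^\(-\{3\}\|+\{3\}\) " |
    grep -v "^[+-][ ]*$" |
    grep -v "^[ ]*$" |
    sed -e "s/diff .* b\//\&/g" \
        -e "s/^+[ ]\+/+/g" \
        -e "s/^-[ ]\+/-/g" \
        -e "s/[ ]\+$//g" \
        -e "s/[\"\`<>$]//g"
}

# Invented sample imitating the output of git log -U0 with the pretty format.
sample() {
    printf '%s\n' \
        '1142998387000:John Resig' \
        '' \
        'diff --git a/ajax/ajax.js b/ajax/ajax.js' \
        '--- a/ajax/ajax.js' \
        '+++ b/ajax/ajax.js' \
        '+   var xml = "x";   ' \
        '+' \
        '-}'
}

sample | clean_diff
```

The timestamp line survives unchanged, the diff header collapses to &ajax/ajax.js, the ---/+++ lines and the empty + line disappear, and the added line loses its indentation and quotes.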

Now for the most interesting part. There is a very useful utility called awk, a language for parsing and processing an input stream, and that is where all the fun is. I first implemented the idea with plain sed, cut and while, but the performance was terrible; once I rewrote everything in awk, generation became two or even three times faster. Enough words, here is the complete script (the most current version of the file):

#!/bin/sh

generate() {
    # When stdout is a terminal, write the XML to the log file instead.
    if test -t 1; then
        exec > $logfile
    fi
    echo -e "<?xml version=\"1.0\"?>\n<file_events>"
    echo "generating ..." >&2
    awk -v typegen=$1 '
    BEGIN {
        # Frames of the progress indicator.
        split("\b\b\b\b\b. . . . . \b- \b\b- \b\b- \b\b- \b\b- \b= = = = =", st, " ")
        ist = 0
        _ord_init()
        typehash = 0
        if (typegen == "ch_code") {
            typehash = 1
        } else if (typegen == "crypt") {
            typehash = 2
        }
    }
    # Build the character -> code table used by ord().
    function _ord_init(low, high, i, t) {
        low = sprintf("%c", 7)
        if (low == "\a") {
            low = 0
            high = 127
        } else if (sprintf("%c", 128 + 7) == "\a") {
            low = 128
            high = 255
        } else {
            low = 0
            high = 255
        }
        for (i = low; i <= high; i++) {
            t = sprintf("%c", i)
            _ord_[t] = i
        }
    }
    function ord(str, c) {
        c = substr(str, 1, 1)
        return _ord_[c]
    }
    # Commit line: remember the timestamp.
    /^[0-9]/ { sub(/:.*/, ""); d = $0; next }
    # File line: remember the file name.
    /^&/ { sub(/&/, ""); f = $0; next }
    /^\+/ { a = "A" }
    /^-/ { a = "D" }
    # Added or deleted line: convert it and print an event.
    /^[\+-]/ {
        sub(/[\+-]/, "")
        str = ""
        if (typehash == 1) {
            for (i = 1; i < length($0); i++) {
                str = str "" ord(substr($0, i, 1))
            }
            gsub(/32|16/, "/sd", str)
            str = substr(str, 0, length(str) - 2) "." substr(str, length(str) - 1, 2)
        } else {
            cmd = "echo \"" $0 "\" | md5sum | cut -f1 -d \" \" | sed -e \"s@[32|16]@/sd@g;\" -e \"s/\\(..\\)\$/.\\1/\""
            if (typehash == 2)
                cmd = "C:/Perl/bin/perl -e \"print crypt($ARGV[0], $ARGV[1])\" \"" $0 "\" \"1/5l58j/jk\""
            cmd | getline str
            close(cmd)
        }
        if (str != "")
            print "<event date=\"" d "\" author=\"" f "\" filename=\"" str "\" action=\"" a "\" comment=\"\"/>"
        system("echo -ne \"" st[ist++] "\" >&2")
        if (ist > 16) ist = 0
    }
    ' $gitdiff
    echo -ne "\b\b\b\b\b\b\b\b\b\b\b\bcompleted!" >&2
    echo "</file_events>"
    rm $gitdiff
}

prepare_git() {
    git log -U0 --diff-filter=AMD --reverse --pretty="%at000:%cn" $1 | \
        grep -v "^\(-\{3\}\|+\{3\}\) " | \
        grep -v "^[+-][ ]*$" | \
        grep -v "^[+-]$" | \
        grep -v "^[ ]*$" | \
        sed -e "s/diff .* b\//\&/g" \
            -e "s/^+[ ]\+/+/g" \
            -e "s/^-[ ]\+/-/g" \
            -e "s/[ ]\+$//g" \
            -e "s/^$//g" \
            -e 's/\\/\\\\/g' \
            -e "s/[\"\`<>$]//g" > $gitdiff
}

fileaction="$(date +%j%H%M%s)"
typehash=md5
[ -n "$1" ] && typehash=$1 || echo -e " " + \
    " \n :\n" + \
    "\t\tmd5 -\n\t\tcrypt\n\t\tch_code\nusing: $0 crypt" >&2
echo " : " $typehash >&2
[ -n "$2" ] && countcommit=$2 || echo -e " \n" + \
    "git log --help\n:\t-<n>\n\t\tLimits the number of commits to show.\nusing: $0 crypt -10" >&2
echo -n " : " >&2
[ -n "$2" ] && echo $2 ' ' >&2 || echo " " >&2
gitdiff=$fileaction".temp"
logfile=$fileaction"actions.xml"
prepare_git $countcommit
generate $typehash

I will not explain the awk program in detail. In general terms:

If the line starts with a digit, it holds the commit time; we extract and remember that value and go straight to the next line.
If the line starts with the & sign, the characters that follow are the file name; we extract it and go to the next line.
If the line starts with the + sign, the action type is A; we continue analyzing the line.
If the line starts with the - sign, the action type is D; we continue analyzing the line.
If the line starts with + or -, we process the line, print the result to STDOUT and go to the next line.
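The rules above can be sketched as a stripped-down awk program that keeps only the state machine and skips the hashing. The name parse_diff is hypothetical, and the pipe-separated output is a simplification chosen for this sketch, not the XML the real script prints.

```shell
#!/bin/sh
# parse_diff is a hypothetical name for a reduced version of the awk
# program: it tracks date, file and action, and prints one record per
# changed line as date|file|action|line.
parse_diff() {
    awk '
        /^[0-9]/ { sub(/:.*/, ""); date = $0; next }    # commit timestamp
        /^&/     { sub(/^&/, "");  file = $0; next }    # file name
        /^\+/    { action = "A" }                       # added line
        /^-/     { action = "D" }                       # deleted line
        /^[+-]/  {
            sub(/^[+-]/, "")                            # drop the marker
            print date "|" file "|" action "|" $0
        }
    '
}

# Invented sample input in the cleaned diff format.
printf '%s\n' '1142998387000:John Resig' '&ajax/ajax.js' \
    'new file mode 100644' '+var xml = x;' '-}' | parse_diff
```

Lines that match none of the patterns, such as "new file mode 100644", are silently skipped, which is why the cleaned diff does not need to be filtered any further.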

So that the visualizers can digest the lines (subjects), the lines are converted in one of several ways:

md5 - the md5sum utility is applied, then every digit 1, 2, 3 or 6 in the sum is replaced by /sd (the sed bracket expression [32|16] matches single characters, not the numbers 32 and 16), and a dot is inserted before the last two characters. This is done so that gource builds a tree.
crypt - the perl crypt function is applied; it encrypts the input with a key and returns the result.
ch_code - every character is converted to its numeric code, and every occurrence of 32 or 16 in the resulting digit string is replaced by /sd.
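A minimal sketch of the md5 conversion follows. The function names to_path and line_to_name are hypothetical. The bracket expression is written as [1236], which is equivalent to the script's [32|16] for hexadecimal input because a | never occurs in an md5 sum. The md5sum utility is assumed to be available.

```shell
#!/bin/sh
# to_path turns a hex digest read from stdin into a pseudo path:
# each digit 1, 2, 3 or 6 becomes "/sd", then a dot is inserted
# before the last two characters.
to_path() {
    sed -e 's@[1236]@/sd@g' -e 's/\(..\)$/.\1/'
}

# line_to_name hashes one source line and converts the digest.
line_to_name() {
    printf '%s\n' "$1" | md5sum | cut -d ' ' -f 1 | to_path
}

line_to_name '}'
```

Identical lines always map to the same name, which is why frequently repeated lines such as a closing brace show up as one heavily used subject in the video.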

The script takes two parameters:

Type of string conversion - selects how lines are converted and accepts the values listed above. If it is omitted, md5 is used.
Number of commits - passed on to the diff generation function to limit the number of commits. It must be given as -num, where num is the number of commits. If it is omitted, all commits are taken.

The output goes to an automatically created file, but it can be redirected to any other file if desired. To make the activity log generator available from sh in any of your repositories, run the following command:

$ echo "{ } \$@" > /bin/genlogcs

That is all there is to generating the activity log.

Config for code_swarm

Now let's talk about the config for code_swarm. To begin with, I compiled code_swarm from source; the resulting file can be downloaded from here. Put it in the dist directory inside your code_swarm directory.
Create a file called my.config with the following contents:

# ColorAssign1="DigitLetter",".*[0-9][az]", 43,170,215, 43,170,215
# ColorAssign2="LetterDigit",".*[az][0-9]", 255,134,51, 255,134,51
# ColorAssign3="LetterLetter",".*[az][az]", 43,110,214, 43,110,214
# ColorAssign4="DigitDigit",".*[0-9][0-9]", 41,242,185, 41,242,185
Width=1280
Height=720
InputFile=data/my/data/actions.xml
PhysicsEngineConfigDir=physics_engine
PhysicsEngineSelection=PhysicsEngineOrderly
ParticleSpriteFile=src/particle.png
Font=Helvetica
FontSize=16
BoldFontSize=16
#MillisecondsPerFrame=2254085
MaxThreads=4
Background=0,0,0
TakeSnapshots=true
SnapshotLocation=data/my/png/cs-#####.png
DrawNamesSharp=true
DrawNamesHalos=true
DrawFilesSharp=false
DrawFilesFuzzy=true
DrawFilesJelly=false
ShowLegend=true
ShowHistory=true
ShowDate=true
ShowEdges=false
ShowDebug=false
EdgeLength=36
EdgeDecrement=-2
FileDecrement=-1
PersonDecrement=-1
FileSpeed=7.0
PersonSpeed=2.0
FileMass=2.0
PersonMass=10.0
EdgeLife=250
FileLife=200
PersonLife=255
HighlightPct=5
UseOpenGL=false
ShowUserName=true
IsInputSorted=false

We will need this file later.

Script for generating video activity visualization

To avoid a long explanation of what to create where and what the directory structure should be, I will just say this: in the code_swarm directory, inside the data directory, create a directory named my with the structure shown here. You need to take the following:

The data directory with all its content.
The tools directory with all its content.
The generator_logs directory, which holds the script that generates the activity file for code_swarm. It is already there.
The gen_log file, discussed below; it generates a log for gource from a log for code_swarm.
my.config, described above.
sort_log, a script that sorts the log for gource.
run.bat, discussed below.

We also need to create two directories, png and results.

Script gen_log

Since it is interesting to see the result of work on files not only in code_swarm but also in gource, I wrote a script that generates a log for gource. The script is called gen_log (the most current version of the file):

#!/bin/sh
# Needs a bash-compatible sh: uses arrays, (( )) and [[ ]].

uses() {
    echo -e "using\n$0 file_codeswarm.xml"
}

generatelog() {
    echo "generating... "
    state=("\\" "|" "/" "-")
    i=0
    if [ -f "$1" ]; then
        result=${1%.*}'.log'
        echo -n > $result
        # Keep only the event lines, then strip the leading "<event "
        # and the trailing "/>" so that only the attributes remain.
        grep -e "event " $1 | \
        sed -e "s/^[ ]*//;s/^<event //g;s|/>$||g" | \
        while read line
        do
            date=""
            # eval turns the attributes into shell variables.
            eval $line
            # Keep the first 10 characters of date (seconds, not milliseconds).
            [ -n "$date" ] && [ "`echo -n $date | wc -c`" -gt "10" ] && date=`echo $date | sed -e "s/^\(.\{10\}\).*/\1/"`
            [ -n "$date" ] && echo "$date|$author|$action|$filename" >> $result
            # Progress indicator.
            echo -ne "\b${state[$i]}"
            (( i += 1 ))
            [[ $i -eq 4 ]] && i=0
        done
        echo -ne "\bcompleted!"
    else
        echo -e "file log code_swarm not exists!\n$1"
    fi
}

[ -n "$1" ] && generatelog $1 || uses

This script relies on the shell's eval builtin, which executes its argument as if you had typed it on the command line. That is convenient here because each input line looks like this:

date="1142998387000" author="ajax/ajax.js" filename="c9/sd/sd9db4/sd/sd/sdb945/sdb89a/sd/sd7/sd/sdfbfdf.04" action="A" comment=""

The shell evaluates this line as five assignments, giving us the variables date, author, filename, action and comment (thanks to bliznezz). These variables are written to a file in the following format:

date|author|action|filename

With gource, though, you can customize this format. The format definition lives in the file {gource_home}/data/gource.style
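For comparison, the same conversion can be sketched without eval, using a single sed expression to pull the attributes out of each event line, followed by the numeric sort that gource needs. This is an alternative technique, not the method used by gen_log. The function names event_to_gource and to_gource_log are hypothetical, and the sketch assumes the attributes appear in the order written by the generator.

```shell
#!/bin/sh
# event_to_gource reads code_swarm XML on stdin and prints one
# date|author|action|filename record per event, with the date cut
# to its first 10 digits (seconds instead of milliseconds).
event_to_gource() {
    sed -n 's/^[ ]*<event date="\([0-9]\{10\}\)[0-9]*" author="\([^"]*\)" filename="\([^"]*\)" action="\([^"]*\)".*$/\1|\2|\4|\3/p'
}

# to_gource_log also sorts the records by timestamp.
to_gource_log() {
    event_to_gource | sort -t '|' -k 1,1n
}

# Invented sample events, deliberately out of order.
printf '%s\n' \
    '<file_events>' \
    '<event date="1142998388000" author="b.js" filename="c9/sd.04" action="D" comment=""/>' \
    '<event date="1142998387000" author="ajax/ajax.js" filename="aa/bb.cc" action="A" comment=""/>' \
    '</file_events>' | to_gource_log
```

Avoiding eval means a stray shell metacharacter in the log can never be executed, at the cost of depending on a fixed attribute order.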

Script run.bat

Now let's put everything together in one file that processes the activity file you generated with the genlogcs command and saved as {code_swarm_home}/data/my/data/actions.xml.
Here are its contents (the most current version of the file):

call sh gen_log ./data/actions.xml
call sh sort_log ./data/actions.log > data\gource.log
pushd png
del *.png
popd
pushd ..\..
call run.bat data\my\my.config
popd
pushd png
call "..\tools\nt\mencoder" mf://*.png -mf fps=19:type=png -ovc x264 -x264encopts pass=1:bitrate=1000 -oac copy -audiofile "..\data\audio.wav" -o "..\results\result.avi"
popd
pushd "tools\gource"
call gource.exe --hide filenames,dirnames --user-scale 2 --output-framerate 25 --stop-position 1 --highlight-all-users --seconds-per-day 1 --output-ppm-stream "..\..\results\resultgource.ppm" "..\..\data\gource.log"
popd
pushd "tools\nt"
call ffmpeg -y -b 9000K -f image2pipe -vcodec ppm -i "..\..\results\resultgource.ppm" -fpre "..\ll.ffpreset" -i "..\..\results\resultgource.ppm" -vcodec libx264 "..\..\results\resultgource.avi"
call mencoder "..\..\results\resultgource.avi" -ovc x264 -x264encopts pass=1:bitrate=10000 -ofps 19 -speed 2 -o "..\..\results\resultgource.fps"
call mencoder "..\..\results\resultgource.fps" -ovc x264 -x264encopts pass=1:bitrate=10000 -oac copy -audiofile "..\..\data\audio.wav" -o "..\..\results\resultgource.avi"
popd
del results\resultgource.ppm
del results\resultgource.fps
del data\actions.log

This script performs the following actions:

Generates the gource log with gen_log and sorts it with sort -k1 -t "|". Because we run under Windows, the sort call lives in a separate file (sort_log); otherwise Windows complains. Sorting is necessary because gource works correctly only with sorted data.
Runs code_swarm with the config we described in my.config. As a result, code_swarm generates a large number of png files.
Converts the png files into a video with mencoder, attaching a soundtrack along the way. The duration of the video is controlled by the -mf fps=19:type=png parameter; ideally 19 would be replaced by the ratio of the number of png files to the duration of the audio track in seconds. I do not like the result of that, though, so I use a value that suits me.
Starts gource, which writes its output as ppm. Be warned: the file is very large, several gigabytes, so the output path is chosen with this in mind.
Converts the ppm file to avi with ffmpeg, but the result is very large and long. We speed it up with mencoder to the same 19 fps, and then run mencoder once more to attach the soundtrack. The result is a file of just a few tens of megabytes.
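The frame rate ratio mentioned in the mencoder step is simple arithmetic. Here is a small sketch with a hypothetical helper fps_for; the frame count and duration are invented numbers.

```shell
#!/bin/sh
# fps_for is a hypothetical helper: given the number of png frames and
# the audio duration in seconds, it prints the frame rate, rounded to
# the nearest integer, at which the video lasts as long as the audio.
fps_for() {
    frames=$1
    seconds=$2
    [ "$seconds" -gt 0 ] || return 1
    echo $(( (frames + seconds / 2) / seconds ))
}

# Example: 5700 frames and a 300 second soundtrack.
fps_for 5700 300
```

In practice the frame count could be obtained by counting the files in the png directory, and the printed value would replace 19 in -mf fps=19:type=png.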

I borrowed the video conversion steps from the showteamwork project. After all the steps complete, the directory you specified for the results (in my case {code_swarm_home}/data/my/results) will contain two video clips with the visualized activity.

Results with the md5 generation type

Here are my results from the jquery repository.

code_swarm

What it shows (the "attraction" model):

The object attracts subjects to itself.
Objects repel each other.
Objects are attracted to each other if they use the same subject.
Subjects that are used often grow in size and brightness.

Here the brightest subjects are most likely curly braces, which are very common in js scripts. A closing brace usually sits alone on its line, so the md5sum is the same for all of them. The md5sum can also coincide for several different lines, but that is tolerable. If you need the most objective picture, use the ch_code generation type.

gource

What it shows (the "bees and honeycombs" model):

Object - a bee.
Subject - a honeycomb.
Bees build honeycombs: a green beam means an addition, a red beam a deletion.

The picture is interesting and resembles a fractal. Each branch acts as a directory, and every leaf on the tree is a file.

Results

The main point of this article is that the code_swarm and gource visualizers can process any statistic that varies over time; the key is to feed the statistic to them in the right format.
All of this is, of course, more of a game. For me at least, that is what it is. Let's say that such things add variety to a programmer's work.
Clone my repo:

$ git clone git://github.com/artzub/code_swarm-gource-my-conf.git test

and post your results; I am very curious to see them.

upd: Nobody has mentioned it, but the log generation script for code_swarm contains an error, or rather a typo. In the regular expression /^+/ the + must be escaped, like this: /^\+/. Oddly, everything worked under Windows, but awk on Debian complained when I ran it there. =)
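A short sketch of the escaped form in isolation; the function name classify and the sample lines are invented for illustration.

```shell
#!/bin/sh
# classify marks added and deleted lines using the escaped pattern.
# In awk regular expressions + is a repetition operator, so a literal
# plus sign at the start of a line is written as \+ (or [+]).
classify() {
    awk '
        /^\+/ { print "A " substr($0, 2); next }
        /^-/  { print "D " substr($0, 2); next }
    '
}

printf '%s\n' '+added line' '-removed line' 'context' | classify
```

Writing the pattern as /^[+]/ is another portable way to match a literal plus sign.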

Source: https://habr.com/ru/post/114630/
