
Vim croquet

Translator's note: I am no translator at all, but I simply could not pass this article by, because it radiates waves of coolness and its concentration of zen is off the charts. So, welcome.

Introduction


I recently discovered an interesting game called VimGolf. The goal of the game is to transform a piece of text from one form to another in as few keystrokes as possible. While playing with various puzzles on the site, I began to wonder what text-editing habits I had developed. I wanted to understand better how I manipulate text in Vim and to see whether I could find inefficiencies in my workflow. I spend a huge amount of time in my text editor, so eliminating even minor rough edges can add up to a significant productivity gain. In this post I will describe my analysis and how I reduced the number of keystrokes I use in Vim. I call this game vim-croquet.

Data collection


I started my analysis with data collection. All text editing on my computer happens in Vim, so for 45 days I logged every keystroke in it using the scriptout (-w) flag. For convenience, I made an alias that records the keystrokes to a log:

alias vim='vim -w ~/.vimlog "$@"' 

After that the data had to be parsed, which turned out to be non-trivial. Vim is a modal editor, in which a single command can mean several different things in different modes. Commands are also context-dependent: their behavior can vary depending on where inside the Vim buffer they are executed. For example, the command cib issued in normal mode puts the user into insert mode if the cursor is inside a pair of parentheses, but leaves the user in normal mode if it is executed outside them. If cib is typed in insert mode, the behavior is entirely different: the characters "cib" are simply written into the current buffer.
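
To illustrate why parsing such a log is inherently stateful, here is a toy sketch (my own illustration in Python, not part of the original toolchain) that tracks nothing but the normal/insert distinction:

 # Toy model: the meaning of a keystroke depends on the current mode.
 # Real Vim has many more modes and transitions; this only shows why
 # a parser must carry state from one keystroke to the next.
 ESC = "\x1b"                  # <Esc> returns to normal mode
 ENTER_INSERT = set("iIaAoO")  # a few keys that switch normal -> insert

 def classify(keys):
     """Yield (mode, key) pairs for a stream of keystrokes."""
     mode = "normal"
     for k in keys:
         yield mode, k
         if mode == "normal" and k in ENTER_INSERT:
             mode = "insert"
         elif k == ESC:
             mode = "normal"

 for mode, key in classify("ihello" + ESC + "x"):
     print(mode, repr(key))
 # 'i' is a command, 'hello' is literal text typed in insert mode,
 # <Esc> switches back to normal mode, so the trailing 'x' is again a command.
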
I reviewed several candidates for parsing the Vim commands, including industrial-strength libraries such as ANTLR and Parsec, as well as vimprint, a project that specializes in Vim. After some thought I decided to write my own tool, since spending a lot of time learning a fairly complex parser seemed unreasonable for this task.

I wrote a rough lexer in Haskell to split the keystrokes I had collected into individual Vim commands. The lexer uses monoids to extract the normal-mode commands from the log for further analysis. Here is its source:

import qualified Data.ByteString.Lazy.Char8 as LC
import qualified Data.List as DL
import qualified Data.List.Split as LS
import Data.Monoid
import System.IO

main = hSetEncoding stdout utf8 >> LC.getContents >>= mapM_ putStrLn . process

process = affixStrip . startsWith . splitOnMode . modeSub
        . capStrings . split mark . preprocess

subs = appEndo . mconcat . map (Endo . sub)

sub (s,r) lst@(x:xs)
    | s `DL.isPrefixOf` lst = sub'
    | otherwise = x : sub (s,r) xs
    where sub' = r ++ sub (s,r) (drop (length s) lst)
sub (_,_) [] = []

preprocess = subs meta . DL.intercalate " " . DL.words
           . DL.unwords . DL.lines . LC.unpack

splitOnMode = DL.concat . map (\el -> split mode el)

startsWith = filter (\el -> mark `DL.isPrefixOf` el && el /= mark)

modeSub = map (subs mtsl)

split s r = filter (/= "") $ s `LS.splitOn` r

affixStrip = clean . concat . map (\el -> split mark el)

capStrings = map (\el -> mark ++ el ++ mark)

clean = filter (not . DL.isInfixOf "[M")

(mark, mode, n) = ("-(*)-", "-(!)-", "")

meta = [("\"",n), ("\\",n), ("\195\130\194\128\195\131\194\189`",n),
        ("\194\128\195\189`",n), ("\194\128kb\ESC",n),
        ("\194\128kb",n), ("[>0;95;c",n), ("[>0;95;0c",n),
        ("\ESC",mark), ("\ETX",mark), ("\r",mark)]

mtsl = [(":",mode), ("A",mode), ("a",mode), ("I",mode), ("i",mode),
        ("O",mode), ("o",mode), ("v",mode), ("/",mode), ("\ENQ","βŒƒe"),
        ("\DLE","βŒƒp"), ("\NAK","βŒƒu"), ("\EOT","βŒƒd"), ("\ACK","βŒƒf"),
        ("\STX","βŒƒf"), ("\EM","βŒƒy"), ("\SI","βŒƒo"), ("\SYN","βŒƒv"),
        ("\DC2","βŒƒr")]
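
The heart of the lexer is sub, which walks the input and, whenever one of the (search, replace) pairs matches a prefix of the remaining text, emits the replacement and skips over the match; subs then chains one such rewriter per pair through the Endo monoid. For readers who do not speak Haskell, here is a rough Python paraphrase of those two functions (an illustration of the idea, not part of the toolchain):

 # Python paraphrase of the lexer's `sub` and `subs`.
 def sub(pair, text):
     search, replace = pair
     out, i = [], 0
     while i < len(text):
         if text.startswith(search, i):
             out.append(replace)       # emit replacement, don't rescan it
             i += len(search)
         else:
             out.append(text[i])
             i += 1
     return "".join(out)

 def subs(pairs, text):
     # mconcat of Endos composes right-to-left, so the *last*
     # pair in the list is applied to the text first.
     for pair in reversed(pairs):
         text = sub(pair, text)
     return text

 print(subs([("\x1b", "-(*)-"), ("\r", "-(*)-")], "iab\x1byyp\r"))
 # -> iab-(*)-yyp-(*)-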

Here is an example of data before and after processing:

cut -c 1-42 ~/.vimlog | tee >(cat -v;echo) | ./lexer
`Mihere's some text^Cyyp$bimore ^C0~A.^C:w^M:q
`M
yyp$b
0~

The lexer reads from standard input and writes the processed commands to standard output. In the example above, the raw data is on the second line, and the processing result follows it. Each output line represents a group of normal-mode commands executed in sequence. The lexer correctly determined that I started in normal mode by jumping to a mark with `M, then typed "here's some text" in insert mode, then copied and pasted the line and moved to the beginning of its last word with yyp$b. Then I entered some more text and finally moved to the beginning of the line, capitalizing the first character with 0~.

Key usage map


After processing the logged data, I forked Patrick Wied's wonderful heatmap-keyboard project and added a custom layer to it that reads the lexer output. The project did not recognize most meta characters such as Esc, Ctrl, and Cmd, so I had to write a data loader in JavaScript and make a few other modifications. I translated the meta characters used in Vim into Unicode symbols and projected them onto the keyboard. Below is what I got from a corpus of nearly 500,000 commands (color intensity indicates how frequently a key is used).
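
The loader itself is JavaScript inside the fork, but the shape of the data it consumes is simple. A sketch of the per-key tally that feeds such a heatmap might look like this (the file name and output format here are illustrative, not the fork's actual interface):

 # Sketch: turn lexer output (one normal-mode command group per line)
 # into per-key frequencies suitable for a keyboard heatmap.
 import json
 from collections import Counter

 counts = Counter()
 with open("normal_cmds.txt", encoding="utf-8") as f:
     for line in f:
         counts.update(line.rstrip("\n"))  # one count per keystroke

 print(json.dumps(counts.most_common(10), ensure_ascii=False))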



The resulting map shows that Ctrl is my most used key by far: it drives many of my Vim movement commands, for example ^p for ControlP, or cycling through open buffers with ^j and ^k.

Another thing that caught my eye when analyzing the map was the heavy use of ^e and ^y. I use these commands every day to move up and down through code, yet vertical movement with them is inefficient: each press scrolls the view only a line at a time. It would be more effective to use ^u and ^d, which move the cursor half a screen per keystroke.

Command frequency


The key usage map gives a good picture of how individual keys are used, but I wanted to know more about how I use key sequences. I sorted the lines of the lexer output by frequency to find my most used normal-mode commands, using this one-liner:

$ sort normal_cmds.txt | uniq -c | sort -nr | head -10 | \
  awk '{print NR,$0}' | column -t
1   2542  j
2   2188  k
3   1927  jj
4   1610  p
5   1602  βŒƒj
6   1118  Y
7   987   βŒƒe
8   977   zR
9   812   P
10  799   βŒƒy
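
The same tally is easy to reproduce outside the shell; here is a rough Python equivalent of the pipeline above (illustrative only):

 # Equivalent of: sort normal_cmds.txt | uniq -c | sort -nr | head -10
 from collections import Counter

 with open("normal_cmds.txt", encoding="utf-8") as f:
     freq = Counter(line.strip() for line in f if line.strip())

 for rank, (cmd, n) in enumerate(freq.most_common(10), 1):
     print(rank, n, cmd)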

I was surprised to see zR in eighth place. Thinking it over, I realized a serious inefficiency in my editing setup: my .vimrc was configured to automatically fold blocks of text, but in practice I unfolded everything almost immediately, so the folding served no purpose. I simply removed that setting from my config, eliminating the need for frequent use of zR.

Command complexity


Another optimization I wanted to look at is the complexity of my normal-mode commands. I was curious whether I could find commands that I use daily but that require an excessive number of keystrokes; such commands could be mapped to shortcuts, which would speed up their execution. As a measure of command complexity I used entropy, which I computed with the following short Python script:
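
Concretely, for a command s of length n in which character c occurs n_c times, the script computes the Shannon entropy and then applies a length penalty, so that long commands are not rewarded merely for being long (my transcription of what the code below does):

 H(s) = -\sum_{c} \frac{n_c}{n}\,\log_2 \frac{n_c}{n},
 \qquad
 H_{\mathrm{corrected}}(s) = \frac{H(s)}{\ln(1 + n)}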

#!/usr/bin/env python
import sys
from codecs import getreader, getwriter
from collections import Counter
from operator import itemgetter
from math import log, log1p

sys.stdin = getreader('utf-8')(sys.stdin)
sys.stdout = getwriter('utf-8')(sys.stdout)


def H(vec, correct=True):
    """Calculate the Shannon entropy of a vector"""
    n = float(len(vec))
    c = Counter(vec)
    h = sum(((-freq / n) * log(freq / n, 2)) for freq in c.values())
    # impose a penalty to correct for size
    if all([correct is True, n > 0]):
        h = h / log1p(n)
    return h


def main():
    k = 1
    lines = (_.strip() for _ in sys.stdin)
    hs = ((st, H(list(st))) for st in lines)
    srt_hs = sorted(hs, key=itemgetter(1), reverse=True)
    for n, i in enumerate(srt_hs[:k], 1):
        fmt_st = u'{r}\t{s}\t{h:.4f}'.format(r=n, s=i[0], h=i[1])
        print fmt_st


if __name__ == '__main__':
    main()

The script reads from standard input and prints the commands with the highest entropy. I fed it the lexer output:

$ sort normal_cmds.txt | uniq -c | sort -nr | sed "s/^[ \t]*//" | \
  awk 'BEGIN{OFS="\t";}{if ($1>100) print $1,$2}' | \
  cut -f2 | ./entropy.py
1	ggvG$"zy	1.2516
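
As a sanity check, this score is easy to reproduce by hand: ggvG$"zy is eight characters with g occurring twice, and plugging those counts into the formula above gives exactly the 1.2516 reported. The script above is Python 2; here is a standalone check written for Python 3:

 # Reproduce the 1.2516 entropy score of ggvG$"zy.
 from collections import Counter
 from math import log, log1p

 cmd = 'ggvG$"zy'
 n = len(cmd)                                  # 8 characters
 h = sum(-c / n * log(c / n, 2) for c in Counter(cmd).values())
 print(round(h, 4))                            # 2.75, uncorrected
 print(round(h / log1p(n), 4))                 # 1.2516, with the length penalty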

I select the commands that were executed more than 100 times, and then find the one with the highest entropy among them. The analysis singled out the command ggvG$"zy, which I had executed 246 times over the 45 days. It takes 11 rather awkward keystrokes and copies the entire current buffer into the z register; I usually use it to move the whole contents of one buffer into another. Naturally, I added a new shortcut to my config:

 nnoremap <leader>ya ggvG$"zy 

Conclusions


My game of vim-croquet identified three optimizations that reduce the number of keystrokes in Vim:

1. Use ^u and ^d instead of ^e and ^y for vertical movement, covering half a screen per keystroke instead of a single line.
2. Remove the automatic folding setting from my .vimrc, eliminating the constant need for zR.
3. Map the 11-keystroke whole-buffer copy ggvG$"zy to the shortcut <leader>ya.

These three simple changes save me thousands of unnecessary keystrokes every month.

The pieces of code presented above are somewhat isolated and may be hard to use on their own. To make the steps of my analysis clearer, here is a Makefile that shows how the code in this article fits together:

SHELL := /bin/bash
LOG := ~/.vimlog
CMDS := normal_cmds.txt
FRQS := frequencies.txt
ENTS := entropy.txt
LEXER_SRC := lexer.hs
LEXER_OBJS := lexer.{o,hi}
LEXER_BIN := lexer
H := entropy.py
UTF := iconv -f iso-8859-1 -t utf-8

.PRECIOUS: $(LOG)
.PHONY: all entropy clean distclean

all: $(LEXER_BIN) $(CMDS) $(FRQS) entropy

$(LEXER_BIN): $(LEXER_SRC)
	ghc --make $^

$(CMDS): $(LEXER_BIN)
	cat $(LOG) | $(UTF) | ./$^ > $@

$(FRQS): $(H) $(LOG) $(CMDS)
	sort $(CMDS) | uniq -c | sort -nr | sed "s/^[ \t]*//" | \
	awk 'BEGIN{OFS="\t";}{if ($$1>100) print NR,$$1,$$2}' > $@

entropy: $(H) $(FRQS)
	cut -f3 $(FRQS) | ./$(H)

clean:
	@- $(RM) $(LEXER_OBJS) $(LEXER_BIN) $(CMDS) $(FRQS) $(ENTS)

distclean: clean

Source: https://habr.com/ru/post/211108/

