
Performance of consoles and shells

There is a good MSR demo from 2012 that shows the effect of response time when working on a tablet. If you don't want to watch three minutes of video: they essentially built a device that can simulate arbitrary delays, down to a fraction of a millisecond. A delay of 100 ms (0.1 seconds), typical of modern tablets, looks terrible. At 10 ms (0.01 seconds) the delay is noticeable, but you can already work normally, and at a delay of less than 1 ms everything is just perfect, as if you were writing in pencil on paper. If you want to check this yourself, take any Android tablet with a stylus and compare it with a current-generation iPad Pro with the Apple stylus. The Apple device's response time is still well above 10 ms, but the difference is dramatic: so much so that I actually use the new iPad Pro to write notes and draw diagrams, while I consider Android tablets completely unacceptable as a replacement for pencil and paper.

You will see something similar in VR headsets at different latencies: 20 ms looks fine, 50 ms lags, and 150 ms is already unbearable.

Strangely, you rarely hear complaints about keyboard or mouse input latency. One might think the reason is that keyboard and mouse input is very fast, happening almost instantly. I am often told that this is indeed the case, but I think the situation is exactly the opposite. The idea that computers respond quickly to input, so quickly that people can't notice the difference, is the most common misconception I've heard from professional programmers.

When testers measure real end-to-end latency in games on normal computer configurations, the delay usually turns out to be in the 100 ms range.
If you look at the breakdown of latency across the gaming pipeline, which Robert Menzel has done, it's easy to understand where the 100+ ms comes from:


Note that this assumes a gaming mouse and a fairly decent LCD display; in practice one can often see much higher latency from the mouse and from pixel switching.

You can tune the system and fit into the 40 ms range, but the vast majority of users don't. And even if they do, that is still very far from the 10-20 ms range in which tablets and VR headsets start to behave "as they should."

Measuring the delay between pressing a key and seeing the character is usually done in games, because latency matters more to gamers than to most other people, but I don't think most other applications differ much from games in response time. Although games usually do more work per frame than "typical" applications, they are also much better optimized. Menzel budgets 33 ms per game frame, half for game logic and half for rendering. So what is the response time of non-gaming applications? Pavel Fatin measured it in text editors and found latencies from a few milliseconds to hundreds of milliseconds; he wrote a special application for taking these measurements, which we can also use to evaluate other applications. It uses java.awt.Robot to generate keystrokes and capture the screen.
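
To make the approach concrete, here is a minimal sketch of such a measurement in Java. This is not Fatin's actual tool: the probed screen region, the choice of key, and the polling loop are all my assumptions, but it shows the idea of generating a keystroke with java.awt.Robot and polling screen captures until the character appears.

import java.awt.AWTException;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.awt.image.BufferedImage;

public class LatencyProbe {
    // Pixel-by-pixel comparison of two screen captures of the same size.
    static boolean same(BufferedImage a, BufferedImage b) {
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws AWTException, InterruptedException {
        Robot robot = new Robot();
        // Hypothetical region around the text cursor; this assumes the cursor
        // does not blink inside this window during the measurement.
        Rectangle probe = new Rectangle(100, 100, 40, 20);

        for (int i = 0; i < 1000; i++) {
            BufferedImage before = robot.createScreenCapture(probe);
            long start = System.nanoTime();
            robot.keyPress(KeyEvent.VK_PERIOD);
            robot.keyRelease(KeyEvent.VK_PERIOD);
            // Busy-poll until the probed region changes, i.e. the typed
            // character has actually been drawn to the screen.
            while (same(before, robot.createScreenCapture(probe))) {
                // spin; a real tool would add a timeout here
            }
            System.out.printf("%.1f ms%n", (System.nanoTime() - start) / 1e6);
            Thread.sleep(100); // throttle to roughly 10 keystrokes per second
        }
    }
}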

Personally, I wanted to look at the response time of different consoles and shells, for a couple of reasons. First, I spend a significant part of my time in a console, usually editing text in it, so input latency there matters directly to me. Second, the most common console benchmark (by about two orders of magnitude) is text output speed, usually measured by running cat on a large file. This seems to me a rather useless benchmark. I cannot remember the last time a task I performed was limited by the speed of cat-ing a file to stdout in the console (well, unless I use eshell in emacs), and I cannot think of a single task for which such a specialized measurement would be useful. The closest thing that does matter to me is how quickly ^C interrupts a command when I have accidentally sent too much output to stdout. But as we will see from the actual measurements, a console's ability to absorb a large amount of stdout output says very little about its response to ^C. Full-page scrolling speed seems relevant too, but in real measurements these two parameters don't correlate much (for example, emacs-eshell scrolls quickly but absorbs stdout extremely slowly). The other thing I care about is input latency, and the fact that a particular console processes stdout quickly says little about its latency.

Let's look at the response time of some consoles: do any of them add a noticeable delay? Measuring the time from pressing a key to the internal screen capture on my laptop, the delays for different consoles are as follows:



These graphs show the distribution of latencies for each console. The vertical axis shows latency in milliseconds; the horizontal axis shows the percentile (for example, 50 means that 50% of keystrokes have latency below that value, i.e. the median keystroke). Measurements are on macOS unless otherwise noted. The graph on the left corresponds to an unloaded machine, the one on the right to a machine under load. If you look only at the medians, some terminals look good: terminal.app and emacs-eshell are at approximately 5 ms on an unloaded system, low enough that most people won't notice the delay. But most consoles (st, alacritty, hyper and iterm2) are in a range where users can already notice the added latency even on an unloaded system. If you look at the tail of the graph, say the response at the 99.9th percentile, every console falls into the range where the added delay should be noticeable, according to research on human perception. For comparison, the latency between an internally generated keystroke and the GPU's memory exceeds, for some consoles, the round-trip time of a network packet from Boston to Seattle, about 70 ms.
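
As a side note on methodology, percentiles like those on the graphs can be computed from raw latency samples in a few lines. Below is a minimal sketch in Java using the nearest-rank method; the sample values are hypothetical.

import java.util.Arrays;

public class Percentiles {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Hypothetical keystroke latencies in milliseconds.
        double[] latencies = {4.8, 5.1, 5.3, 6.0, 7.2, 12.5, 45.0};
        System.out.println("p50   = " + percentile(latencies, 50));
        System.out.println("p99.9 = " + percentile(latencies, 99.9));
    }
}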

All measurements were taken with each console tested individually, on a full battery and not plugged into AC power. The measurements under load were made during a Rust compilation (likewise on a full battery and unplugged; for reproducibility, each measurement started 15 seconds after kicking off a clean Rust build with all dependencies already downloaded, with enough time between runs to avoid interference from thermal throttling).

If you look at the median latency under load, then apart from emacs-term, the other consoles fare not much worse than on an unloaded machine. However, in the tail of the graph, at the 90th or 99.9th percentile, every console becomes much less responsive. Switching from macOS to Linux does not change the picture much, although it affects different consoles differently.

These results are much better than the worst case (on low battery, waiting 10 minutes after the start of compilation to aggravate thermal throttling, the delays reach hundreds of milliseconds), but even so, every console's tail latency should be noticeable to a human. Also remember that this is only a fraction of the total end-to-end response time of the input-to-output pipeline.

Why don't people complain about the delay between keyboard input and screen output the way they complain about the delay when drawing with a stylus or wearing a VR headset? My theory is that for VR and tablets, people have extensive experience with analogous "applications" that have much lower latency. For tablets, this "application" is pencil and paper; for virtual reality, it is the ordinary world around us, in which we also turn our heads, just without a headset on. But the response time between keyboard input and screen output is so long in all applications that most people simply take the large delay as a given.

An alternative theory might be that keyboard and mouse input is fundamentally different from tablet input in a way that makes latency less noticeable. Even without my data this theory seems implausible, because when I connect to a remote terminal that adds tens of extra milliseconds, I feel a noticeable lag when pressing keys. And it is known from A/B testing that when extra delay is added, people can notice, and indeed do notice, delays in the range we discussed earlier.

So, to set the most popular benchmark (stdout throughput) against latency in different consoles, let's measure how quickly different consoles absorb input written to stdout:

Console             stdout    idle50   load50   idle99.9   load99.9   mem    ^C
                    (MB/s)    (ms)     (ms)     (ms)       (ms)       (MB)
alacritty           39        31       28       36         56         18     ok
terminal.app        20        6        13       25         30         45     ok
st                  14        25       27       63         111        2      ok
alacritty tmux      14        -        -        -          -          -      -
terminal.app tmux   13        -        -        -          -          -      -
iterm2              11        44       45       60         81         24     ok
hyper               11        32       31       49         53         178    fail
emacs-eshell        0.05      5        13       17         32         30     fail
emacs-term          0.03      13       30       28         49         30     ok

The relationship between stdout throughput and how fast a console feels is not obvious. In this test, terminal.app looked very bad: when scrolling, the text moved in jerks, as if the screen was only rarely updated. Problems were also observed in hyper and emacs-term. Emacs-term could not keep up with the output at all; after the test ended, it took a few more seconds to finish updating (the status bar showing the number of lines appeared to be up to date, even though the display itself lagged until well after the test). Hyper lagged even further behind and, after blinking a couple of times, practically stopped updating the screen. The Hyper Helper process stayed pegged at 100% CPU for about two minutes, and the console was completely unresponsive the whole time.

Alacritty was tested under tmux, since this console does not support scrollback, and its documentation states that tmux should be used for that. For comparison, terminal.app was also tested with tmux. For most consoles, tmux does not appear to reduce stdout throughput, but alacritty and terminal.app were fast enough that in practice their throughput was limited by tmux itself.

Emacs-eshell is not technically a console, but I tested eshell too, because it can in some cases serve as a console replacement. In fact, emacs, with both eshell and term, turned out to be so slow that it hardly matters how fast it actually drains stdout. In the past, when using eshell or term, I have sometimes had to wait while a few thousand lines of text scrolled past after running a command with verbose logging to stdout or stderr. Since this happens quite rarely, it is not too big a problem for me, even when the delay reaches 0.5 or 1 second, although any other console handles the same thing fine.

On the other hand, I type fast enough to notice long-tail latency. For example, if I type 120 words per minute, that is 10 characters per second, then a 99.9th-percentile event (1 out of 1000 keystrokes) will occur about every 100 seconds!

In any case, more than the cat "benchmark," I care about whether I can interrupt a process with ^C when I accidentally run a command that prints millions of lines to the screen instead of thousands. Almost every console passes this test, except hyper and emacs-eshell: both hang for at least ten minutes (after ten minutes I killed the tasks and did not wait for the process to finish).
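
If you want to reproduce this test, any program that floods stdout will do. Here is a trivial sketch in Java (the line count is arbitrary): run it in the console under test, press ^C while it is printing, and watch how long the output keeps scrolling.

public class Flood {
    public static void main(String[] args) {
        // Print millions of lines; while this runs, press ^C in the console
        // under test and see how long output keeps flowing afterwards.
        for (int i = 0; i < 10_000_000; i++) {
            System.out.println("line " + i);
        }
    }
}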

The table also includes memory usage at startup, since I have seen this parameter often used when comparing consoles. Although it seems a bit odd to me that a console can take 40 MB of memory at startup, even my three-year-old laptop has 16 GB of RAM, so optimizing those 40 MB down to 2 MB doesn't particularly affect working with the program. Hell, even the $300 Chromebook we recently bought has 16 GB of RAM.

Conclusion


Most consoles have a fairly long response time that could be optimized to improve the user experience, if developers concentrated on this parameter rather than on adding new features or other aspects of performance. But when I looked for console benchmarks, I found that when program authors measured performance at all, it was either stdout throughput or memory usage at startup. This is unfortunate, since most of the "slow" consoles already emit stdout several orders of magnitude faster than people are able to follow, so further optimization of stdout speed has relatively little effect on actual usability for most users. The same goes for reducing memory usage at startup, when a console uses 0.01% of the memory on my old laptop or on a modern cheap model.

If you work in the console, you may care more about optimizing response time and interactivity (for example, the reaction to ^C) and relatively less about throughput and memory usage at startup.

Update. In response to this article, the alacritty author explained where alacritty's latency comes from and described how it can be reduced.

Appendix: negative results


Tmux and latency. I tried tmux with different consoles and found that the difference is within the measurement error.

Shells and latency. I checked different shells, but even in the fastest console the difference between them was within the measurement error. In my experimental setup it was a bit difficult to test Powershell, because it handles colors incorrectly (the first typed character appears in the color set in the console, but subsequent characters are yellow regardless of the settings; this bug appears to have been closed), which breaks the image-recognition setup I used. Powershell also does not always place the cursor in the right place, randomly jumping along the line, which also breaks the image recognition. But despite these problems, Powershell's performance is quite comparable to other shells.

Shells and stdout throughput. As in the previous cases, the difference between shells is within the measurement error.

Single-line vs. multi-line text and throughput. Although some text editors struggle with extremely long lines, throughput was practically unchanged whether I pushed a file consisting of one such line into the console or split it into 80-character lines.

Input queueing / dropped keystrokes. I ran these tests at an input rate of 10.3 characters per second, but it turned out that input speed has no particular effect on latency. In theory a console could be overwhelmed, and hyper was the first to start failing at very high input rates, but those rates are much faster than the typing speed of anyone I know.

Appendix: experimental setup


All tests were conducted on a dual-core mid-2014 Macbook Pro 13" at 2.6 GHz. The machine has 16 GB of RAM and a screen resolution of 2560 × 1600 pixels. The OS was macOS 10.12.5. Some tests were run on Linux (Lubuntu 16.04) to compare macOS and Linux. Each latency measurement covered 10,000 keystrokes.

The latency measurements were done by pressing the . key; throughput was measured on the default base32-encoded output, i.e. plain ASCII text. George King noted that other kinds of text can affect output speed:

I noticed that Terminal.app dramatically slows down when outputting non-latin text. I think there may be three reasons for this: the need to load different font pages, the need to parse code points outside the Basic Multilingual Plane (BMP), and multi-byte Unicode encodings.

The first probably comes down to a rather complex combination of lazy loading of font glyphs, fallback-font calculation, and glyph-page caching, or however that is done.

The second is a bit speculative, but I would guess that Terminal.app uses Cocoa's NSString, which is based on UTF-16; that almost certainly slows things down for code points above the BMP because of surrogate pairs.

Consoles were expanded to full screen before running the tests. This affects the results, and resizing the console window can change performance significantly (for example, you can make hyper much slower than iterm2 just by changing the window size, all other factors held constant). On macOS, st was run as an X client under XQuartz. To check the hypothesis that XQuartz is inherently slow, I tried runes, another Linux-native console that uses XQuartz. It turned out that runes has much lower tail latency than st and iterm2.

The latency tests on the "unloaded" system were performed immediately after a reboot. All terminals were open, but text was entered in only one of them.

Tests "under load" were conducted during the background compilation of Rust, 15 seconds after the start of compilation.

Console throughput tests were performed by creating a large file of pseudo-random text:

timeout 64 sh -c 'cat /dev/urandom | base32 > junk.txt'

and then running

timeout 8 sh -c 'cat junk.txt | tee junk.term_name'
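
The MB/s figure in the table above presumably follows from the size of the captured file divided by the 8-second window, since tee can only write to junk.term_name as fast as the console drains its stdout. A trivial sketch of that computation (the file name comes from the commands above):

import java.nio.file.Files;
import java.nio.file.Paths;

public class Throughput {
    public static void main(String[] args) throws Exception {
        // junk.term_name contains whatever the console consumed in the 8-second run
        long bytes = Files.size(Paths.get("junk.term_name"));
        System.out.printf("throughput: %.2f MB/s%n", bytes / 8.0 / (1024 * 1024));
    }
}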

Terminator and urxvt were not tested, since installing them on macOS is non-trivial and I did not want to spend time getting them to work. Terminator builds easily from source, but it hangs on startup and never shows a command line. Urxvt installs via brew, but one of its dependencies (also installed via brew) was the wrong version, so the console would not start.

Source: https://habr.com/ru/post/346054/

