
Computer response time: 1977–2017

I have a nagging feeling that modern computers are slower than the computers I used as a child. I don't trust that kind of impression, because human perception has repeatedly been shown to be unreliable in empirical studies, so I took a high-speed camera and measured the response time of every device that passed through my hands over the last few months. Here are the results:

Computer                      Response (ms)   Year   Clock     Transistors
Apple 2e                      30              1983   1 MHz     3,500
TI 99/4A                      40              1981   3 MHz     8,000
Haswell-E 165 Hz              50              2014   3.5 GHz   2 billion
Commodore PET 4016            60              1977   1 MHz     3,500
SGI Indy                      60              1993   0.1 GHz   1.2 million
Haswell-E 120 Hz              60              2014   3.5 GHz   2 billion
ThinkPad 13 ChromeOS          70              2017   2.3 GHz   1 billion
iMac G4 OS 9                  70              2002   0.8 GHz   11 million
Haswell-E 60 Hz               80              2014   3.5 GHz   2 billion
Mac Color Classic             90              1993   16 MHz    273,000
PowerSpec G405 Linux 60 Hz    90              2017   4.2 GHz   2 billion
MacBook Pro 2014              100             2014   2.6 GHz   700 million
ThinkPad 13 Linux chroot      100             2017   2.3 GHz   1 billion
Lenovo X1 Carbon 4G Linux     110             2016   2.6 GHz   1 billion
iMac G4 OS X                  120             2002   0.8 GHz   11 million
Haswell-E 24 Hz               140             2014   3.5 GHz   2 billion
Lenovo X1 Carbon 4G Win       150             2016   2.6 GHz   1 billion
NeXT Cube                     150             1988   25 MHz    1.2 million
PowerSpec G405 Linux          170             2017   4.2 GHz   2 billion
Packet around the world       190
PowerSpec G405 Win            200             2017   4.2 GHz   2 billion
Symbolics 3620                300             1986   5 MHz     390,000
These are measurements of the latency between a keystroke and the corresponding character appearing in the console (see the appendix for details). Results are sorted from fastest to slowest. Where multiple operating systems were tested on one computer, the OS is noted in the entry name; where multiple refresh rates were tested on one computer, the refresh rate is noted in the entry name.

The last two columns show the clock frequency and the number of transistors in the processor.

For reference, the table also includes the time it takes a packet to travel around the world over fiber, from New York back to New York via Tokyo and London.

Looking at the results as a whole, the fastest machines are the ancient ones; newer computers are scattered throughout the table. Exotic modern gaming rigs with unusually high refresh rates can nearly keep up with machines from the late 70s and early 80s, but "ordinary" modern computers cannot compete with computers that are 30-40 years old.

We can also look at mobile devices. Here, the measurement is scroll latency in the browser:

Device                      Response (ms)   Year
iPad Pro 10.5" Pencil       30              2017
iPad Pro 10.5"              70              2017
iPhone 4S                   70              2011
iPhone 6S                   70              2015
iPhone 3GS                  70              2009
iPhone X                    80              2017
iPhone 7                    80              2017
iPhone 6                    80              2014
Gameboy Color               80              1998
iPhone 5                    90              2012
Blackberry Q10              100             2013
Huawei Honor 8              110             2016
Google Pixel 2 XL           110             2017
Galaxy S7                   120             2016
Galaxy Note 3               120             2016
Nexus 5X                    120             2015
OnePlus 3T                  130             2016
Blackberry KeyOne           130             2017
Moto E (2G)                 140             2015
Moto G4 Play                140             2017
Moto G4 Plus                140             2016
Google Pixel                140             2016
Samsung Galaxy Avant        150             2014
Asus Zenfone 3 Max          150             2016
Sony Xperia Z5 Compact      150             2015
HTC One M4                  160             2013
Galaxy S4 Mini              170             2013
LG K4                       180             2016
Packet around the world     190
HTC Rezound                 240             2011
Palm Pilot 1000             490             1996
Kindle Paperwhite 3         630             2015
Kindle 4                    860             2011

As before, the results are sorted by response time from fastest to slowest.

If we exclude the Gameboy Color, which belongs to a different class of device, all of the fastest devices are Apple phones or tablets. Next comes the BlackBerry Q10. We don't have enough data to explain why the BlackBerry Q10 is so unusually fast for a non-Apple device, but one plausible guess is its physical buttons: it is easier to achieve a fast response with physical keys than with a touchscreen and a virtual keyboard. The other two devices in the table with physical buttons are the Gameboy Color and the Kindle 4.

After the iPhones and the button devices comes a variety of Android devices from different years. At the very bottom sit the ancient Palm Pilot 1000 and a couple of e-readers. The Palm's sluggishness is explained by a touchscreen and display from an era when touchscreen technology was much slower. The Kindles use e-ink displays, which are far slower than the displays in modern phones, so their lag is not surprising.

Why is Apple 2e so fast?


The Apple 2 greatly outperforms modern computers (except the iPad Pro) in input and output speed, because the Apple 2 doesn't have to deal with context switches, buffering between processes, and so on.

If you look at modern keyboards, input is usually scanned at 100 Hz to 200 Hz (Ergodox, for example, claims 167 Hz). By comparison, the Apple 2e effectively scans its input at 556 Hz. See the appendix for details.
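As a rough sketch of what those scan rates mean for latency (the function name is mine; the only assumption is that a keypress is noticed on the next scan pass at the latest):

```python
def worst_case_scan_ms(scan_hz):
    """A keyboard scanned at scan_hz only notices a keypress on the
    next scan pass, so the worst case adds one full scan period."""
    return 1000.0 / scan_hz

print(round(worst_case_scan_ms(167), 1))  # Ergodox: 6.0 ms worst case
print(round(worst_case_scan_ms(556), 1))  # Apple 2e: 1.8 ms worst case
```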

Looking at the other end of the I/O pipeline, the display, we can find another source of latency. My display advertises a 1 ms response time, but if you measure the real time from when a character starts being drawn until it is fully displayed, 10 ms can easily pass. This effect shows up even on some high-refresh-rate displays that are marketed on their supposedly fast response.

At 144 Hz, each frame takes 7 ms. Changing the picture on the screen adds between 0 ms and 7 ms of latency from waiting for the next frame boundary before drawing (on average we expect half the maximum, i.e. 3.5 ms). On top of that, even though my home display claims a 1 ms switching time, it actually takes about 10 ms to completely change color once the transition starts. Adding the wait for the next frame to the real color-change time gives an expected latency of 7/2 + 10 = 13.5 ms.
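The arithmetic above can be captured in a small sketch (the function name is mine; it assumes the half-frame wait and the panel transition are the only display-side terms):

```python
def display_latency_ms(refresh_hz, transition_ms):
    """Expected display-side latency: on average we wait half a frame
    for the next refresh boundary, then the panel spends transition_ms
    actually changing color."""
    frame_ms = 1000.0 / refresh_hz
    return frame_ms / 2 + transition_ms

print(round(display_latency_ms(144, 10), 1))  # 144 Hz LCD -> 13.5
print(round(display_latency_ms(60, 0), 1))    # 60 Hz CRT, near-instant phosphor -> 8.3
```

The second line reproduces the 8.3 ms figure for the Apple 2e's 60 Hz CRT, where the phosphor transition is effectively instantaneous.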

On the Apple 2e's old CRT monitor, we expect a latency of half the 60 Hz refresh period (16.7 ms / 2), i.e. 8.3 ms. That result is hard to beat today: the best "gaming monitors" can get latency down to roughly those values, but in terms of market share such displays sit on a tiny fraction of systems, and even monitors advertised as fast are not always actually fast.

IOS rendering pipeline


If you look at every process between input and output, cataloging the differences between the Apple 2e and a modern computer would fill a book. To get a picture of what happens on a modern machine, here is a high-level sketch of the iOS pipeline from iOS/UIKit engineer Andy Matuschak, though he calls this description "his outdated memories of outdated information":


Andy notes that "the amount of work here is usually quite small. A couple of milliseconds of CPU time. The latency after a keypress comes from the following:"


By comparison, the Apple 2e has virtually no handoffs, locks, or process boundaries. A very simple piece of code writes the result into display memory, and it automatically appears the next time the screen refreshes.

Refresh rate and response time


One interesting thing about testing response time on computers is the effect of screen refresh rate. Going from 24 Hz to 165 Hz saves 90 ms. At 24 Hz, each frame takes 41.67 ms; at 165 Hz, 6.06 ms. As we saw above, without buffering the average frame-wait latency would be 20.8 ms in the first case and 3.03 ms in the second (since a frame boundary arrives at a random point, the wait is uniformly distributed between 0 ms and the maximum), a difference of about 18 ms. In reality the difference is 90 ms, which implies (90 − 18) / (41.67 − 6.06) ≈ 2 frames of latency from buffering.
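The inference in this paragraph can be written out as a sketch (the function name is mine; it assumes the only latency terms that change with refresh rate are the half-frame wait and a fixed number of buffered frames):

```python
def buffered_frames(latency_diff_ms, slow_hz, fast_hz):
    """Infer how many frames of buffering a pipeline adds, from the
    measured latency difference between two refresh rates on the same
    machine."""
    slow_frame = 1000.0 / slow_hz  # 41.67 ms at 24 Hz
    fast_frame = 1000.0 / fast_hz  # 6.06 ms at 165 Hz
    # Without buffering, only the average half-frame wait would differ:
    expected_diff = slow_frame / 2 - fast_frame / 2
    extra = latency_diff_ms - expected_diff
    return extra / (slow_frame - fast_frame)

print(round(buffered_frames(90, 24, 165), 1))  # -> 2.0 frames
```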

If you plot the response results at different refresh rates on the same machine (not shown here), they roughly fit a curve that assumes 2.5 frames of latency when running PowerShell on that machine, regardless of refresh rate. This lets us estimate the latency of a hypothetical display with an infinite refresh rate on a low-latency gaming machine: roughly 140 − 2.5 × 41.67 ≈ 36 ms, almost as fast as the computers of the 70s and 80s.
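The extrapolation can be sketched the same way (assuming, as the text does, 2.5 frames of buffering independent of refresh rate; the function name is mine):

```python
def infinite_refresh_latency_ms(measured_ms, refresh_hz, frames_buffered=2.5):
    """Estimate latency on a hypothetical infinite-refresh-rate display
    by subtracting the time spent waiting on buffered frames."""
    frame_ms = 1000.0 / refresh_hz
    return measured_ms - frames_buffered * frame_ms

# 140 ms measured at 24 Hz, 2.5 buffered frames
print(round(infinite_refresh_latency_ms(140, 24)))  # -> 36 ms
```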

Complexity


Almost every computer and mobile device today is slower than typical computers of the 70s and 80s. Low-latency gaming desktops and the iPad Pro can compete with fast machines that are 30-40 years old, but most off-the-shelf models are not even close.

If you try to identify the main reason for the growth in response time, the answer is "complexity". Of course, everyone knows that complexity is bad. If over the last decade you have attended at least one non-academic, non-corporate technology conference, you have very likely heard at least one talk about how complexity is the root of all evil and how we should strive to reduce it.

Unfortunately, doing that in practice is much harder than declaring it from a stage. Complexity often buys us something, directly or indirectly. Comparing input handling on a modern keyboard with the Apple 2 keyboard, we see extra latency from processing keystrokes with a powerful, resource-hungry processor instead of the simpler and cheaper dedicated keyboard logic. But using a processor makes the keyboard easy to customize, and it moves the problem of "programming" the keyboard from hardware into software, which lowers the cost of producing keyboards. The more expensive chip raises the unit cost, but once you account for the full design cost of these semi-artisanal, small-batch keyboards, the savings from easy programmability appear on the whole to outweigh the extra expense.

We see this kind of trade-off at every stage of the pipeline. The starkest comparison is a modern desktop OS against the bare loop on the Apple 2. A modern OS lets programmers write generic code that runs concurrently with other programs on the same machine with quite reasonable overall performance, but we pay a steep price for it in complexity, and the machinery involved in multitasking easily adds significant response time.

Much of this complexity could be called accidental complexity, but even that is mostly here because it is convenient. At every level, from the hardware architecture and the syscall interface to the I/O frameworks, we pile on complexity, the lion's share of which could be eliminated if we sat down today and rewrote all these systems and their interfaces. But reinventing the universe to reduce complexity is too inconvenient, and the economics favor shipping what works, so we live with what we have.

For these and other reasons, in practice the performance problems caused by "excessive" complexity are often solved by making the system even more complex. In particular, the advances that brought us back within reach of the fastest machines of 30-40 years ago did not come from heeding the exhortations to reduce complexity, but from complicating the systems further.

The iPad Pro is a feat of modern engineering: its developers raised the device's refresh rate on both input and output, and optimized the software pipeline to eliminate unnecessary buffering. Designing and manufacturing high-refresh-rate screens that reduce latency is nontrivially harder in many ways, none of which were necessary in the era of the archaic standard 60 Hz display.

This is what reducing latency usually looks like in practice. The most popular trick is to add a cache, but adding a cache makes the system more complex. For systems that generate fresh data and cannot use a cache, even more elaborate solutions have been devised; large-scale RoCE deployments are one example. They can cut remote-data access latency from milliseconds to microseconds, which opens the door to a new class of applications. But this is achieved by adding complexity: designing and properly optimizing the first large-scale RoCE systems took dozens of person-years and demanded enormous operational effort.

Conclusion


It seems a bit paradoxical that a modern gaming machine running 4,000 times faster than an Apple 2, with 500,000 times as many transistors in its CPU (and 2,000,000 times as many in its GPU), can barely match the Apple 2's response time, and even then only in carefully written applications and on a monitor with triple the Apple 2's refresh rate. It is even more absurd that on the PowerSpec G405 in its default configuration (the fastest computer for single-threaded computation until October 2017), the latency from keypress to screen (traversing maybe one to three meters of actual cable) is longer than the time it takes to send a packet around the globe (26,000 km from New York via London to Tokyo and back to New York).

On the other hand, we are clearly emerging from the dark ages of huge latencies: today you can already assemble a computer, or buy a tablet, with response times in the same range as the standard machines of the 70s and 80s. This resembles the dark ages of screen resolution and pixel density, when, until relatively recently, 90s-era CRT monitors outperformed the standard LCDs on desktop computers. Today 4k displays have finally become commonplace and 8k displays have fallen to reasonable prices, leaving consumer CRT monitors far behind. I don't know whether response time will see the same progress, but let's hope it does.

Appendix: Why measure response time?


Latency matters! In very simple tasks, people can perceive latencies of 2 ms or less. Moreover, increased latency is not just noticeable to users; it also makes simple tasks measurably less precise. If you want a visual demonstration of what latency looks like and don't have an old computer at hand, check out the MSR demo on touchscreen latency.


Performance also matters, but it is well understood and frequently measured. Visit almost any benchmarking site or open a typical review and you will find a huge number of performance measurements, so additional measurements there add little value.

Appendix: the Apple 2 keyboard


Instead of a programmable microcontroller, the Apple 2e reads its keyboard with a much simpler dedicated chip, the AY 3600. The AY 3600 datasheet specifies the scan time as 90 * 1/f and the debounce time as strobe_delay. These parameters are set by capacitors and a resistor: 47 pF, 100 kΩ, and 0.022 μF. Plugging these into the formula from the AY 3600 datasheet gives f = 50 kHz, which works out to a 1.8 ms scan delay and a 6.8 ms debounce delay (capacitors can degrade over time, so on our old Apple 2e the real delays may be lower), for a total of 8.6 ms in the keyboard logic itself.
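As a sketch of the arithmetic (the 90-cycle scan figure is from the datasheet as quoted above; the 340-cycle debounce figure is my back-calculation from the 6.8 ms result, not a datasheet value):

```python
def ay3600_delays_ms(f_khz, scan_cycles=90, debounce_cycles=340):
    """Scan and debounce delays of the AY 3600 keyboard chip at a given
    clock frequency (kHz). scan_cycles = 90 comes from the datasheet;
    debounce_cycles = 340 is back-computed from the 6.8 ms figure."""
    cycle_ms = 1.0 / f_khz  # one clock cycle, in milliseconds
    return scan_cycles * cycle_ms, debounce_cycles * cycle_ms

scan, debounce = ay3600_delays_ms(50)  # the RC network works out to f = 50 kHz
print(round(scan, 1), round(debounce, 1), round(scan + debounce, 1))  # 1.8 6.8 8.6
```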

Compare this with a keyboard scanning at 167 Hz with two extra scan passes for debouncing: the equivalent figure is 3 * 6 = 18 ms. At a 100 Hz scan rate it is 3 * 10 = 30 ms. Keyboard scanning in 18-30 ms plus additional debounce delay is consistent with preliminary real measurements of keyboard response times.

For reference, the Ergodox keyboard uses a 16 MHz microcontroller with roughly 80,000 transistors, while the Apple 2e computer has a 1 MHz CPU with 3,500 transistors.

Appendix: experimental setup


Most measurements were made with a 240 FPS camera (4.167 ms resolution). Devices with response times below 40 ms were re-measured with a 1000 FPS camera (1 ms resolution). The numbers in the tables are averages over several measurements, rounded to the nearest 10 ms to avoid an impression of false precision. For desktops, the response time is measured from the start of the key's travel to the end of the screen update. Note that this differs from most key-to-screen measurements you can find online: those benchmarks typically use setups that effectively exclude most of the keyboard's latency. As an end-to-end test, that is realistic only if you have a telepathic link to your computer (although such measurements have their uses: if you are a programmer who needs a reproducible benchmark, it is good to eliminate factors outside your control; just not for end users).

People often advocate measuring one of these instead: {key travel to bottom-out, switch actuation}. Beyond convenience, there is no particular reason to measure either, yet people often present these numbers as the keyboard's "real" behavior. But they do not reflect the switch's actual response time: the interval between touch and actuation, and between tactile feedback and actuation, is arbitrary and tunable. When a tester declares one of these to be the "real" user experience, it generally means the tester misunderstands how keyboards work. Although that position is possible to hold, I see no reason to enshrine one particular misconception about keyboards as the metric when people fervently advocate a variety of such misconceptions. For more on keyboard misconceptions, see this article with keyboard response-time measurements.

Another important difference is that measurements were made with settings as close as possible to the OS defaults, since roughly 0% of users change display settings to reduce buffering, disable the compositor, and so on. Waiting for the screen update to finish also differs from what most benchmarks measure: most consider the update "complete" as soon as any movement on the screen is detected. Waiting for the update to complete is analogous to the "visually complete" time in WebPagetest.

The desktop results were obtained in each system's default console (for example, PowerShell on Windows, LXTerminal on Lubuntu), which can easily mean a 20-30 ms difference between a fast console and a slow one. Between measuring in the console and measuring the full end-to-end time, the results in this article should be slower than in other articles on the topic (which often measure only until changes begin to appear on screen, in games).

The baseline PowerSpec G405 result was obtained on integrated graphics (the machine ships without a video card); the 60 Hz result was obtained with a cheap video card.

The mobile results were obtained in the default browser after loading https://danluu.com, measuring the delay from the finger starting to move until the first movement of the picture on screen signaling the start of scrolling. Where this test made no sense (Kindle, Gameboy Color, etc.), another meaningful action on that platform was measured instead (turning a page on the Kindle, pressing a button in a game on the Gameboy Color, etc.). Unlike the desktop and laptop measurements, these were taken up to the first change on screen, to avoid counting many frames of scrolling. For ease of measurement, the finger initially rested on the screen and the timer started when it began to move (avoiding the problem of determining exactly when the finger touches the screen).

Where results tied, the order in the table was determined by the unrounded latency, but this should not be treated as meaningful. A difference of 10 ms should not be considered significant either.

The Haswell-E machine was also tested with G-Sync enabled; there was no noticeable difference. The release year for this computer is somewhat arbitrary, since the processor came out in 2014 but the display is newer (I believe no 165 Hz display existed before 2015).

The transistor counts for some modern machines are approximate, since exact figures have not been disclosed. Feel free to let me know if you find a more accurate estimate!

All Linux results were obtained on a pre-KPTI kernel. KPTI may affect latency.

This work is not finished yet. I plan to benchmark more old computers on my next visit to Seattle. If you know of older computers that can be tested in the New York area (with their original displays or the like), let me know! If you have a device you are willing to donate for testing, you can send it to my address:

Dan Luu
Recurse Center
455 Broadway, 2nd Floor
New York, NY 10013

Source: https://habr.com/ru/post/345584/

