Features of the cache in relation to realtime on x86

This continues my posts about using x86 hardware in real-time systems. There I briefly described how well x86 meets real-time requirements and what gets in the way.

A small lyrical digression. Real-time systems are one of the least-known drivers of progress in computing. For example, the first laptop owes its existence to them. For some reason it is now commonly believed that the first mass-produced laptop was the Osborne. In fact, the device in the picture above was built at Siemens as a tool for controlling and programming industrial automation two years before the Osborne. Portable computers of this family (Siemens Simatic) are still sold today, although the hardware has of course changed many times.

But let's get down to business. In this post I will look more closely at one of the factors that hurts the predictability of real-time code execution time. The text under the cut is short but tricky.

Efficient use of the cache matters for most workloads, not just real-time ones. Here is a very true-to-life example: one core runs code that wakes up once a millisecond, polls the sensors, executes the PLC program, and controls some piece of hardware. Meanwhile the other core runs the GUI code, which shows all of this on the monitor and occasionally lets the operator intervene. A modern GUI is quite "fat" and gladly uses all the cache it can get, which, by the way, is shared with the first core. So when the real-time code wakes up, it finds none of its data in the shared cache and has to pull it in from memory again, spending tens or hundreds of microseconds.

In general, the x86 architecture offers few ways to control the cache from software. I will list them all; the fingers of one hand are enough:

1. PREFETCHx - pull a line from memory into the cache
2. CLFLUSH, WBINVD (a very "evil" instruction, by the way) - flush a single line or the entire cache
3. Non-temporal MOVNTDQ / MOVNTDQA / MOVNTPS and ordering control (L / S / MFENCE) - control the cacheability of particular data operations
4. Assorted indirect methods. Here I mean clever ways of laying out and accessing data, for example ones that are friendlier to the hardware prefetchers.
5. Writing directly into the cache via DMA. This not very popular feature is relevant for peripheral manufacturers.
You can disable the cache altogether, but this is somehow too extreme.
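
To make the first three items more concrete, here is a minimal sketch in C using the SSE/SSE2 compiler intrinsics that map to PREFETCHT0, CLFLUSH, MOVNTDQ and the fences. The helper names, the 64-byte line size and the prefetch distance are my assumptions for illustration, not something the instruction set prescribes. WBINVD is not shown because it is a privileged instruction and cannot be issued from user code.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in _mm_prefetch and _mm_sfence */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u          /* assumed line size in bytes */
#define PREFETCH_AHEAD 64u      /* assumed prefetch distance, in elements */

/* Hypothetical helper: sum an array while hinting at data further ahead. */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* PREFETCHT0 once per line; it is only a hint, the CPU may ignore it. */
        if (i % (CACHE_LINE / sizeof *data) == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)&data[i + PREFETCH_AHEAD], _MM_HINT_T0);
        sum += data[i];
    }
    return sum;
}

/* Hypothetical helper: flush every cache line that overlaps [buf, buf + len). */
void flush_buffer(const void *buf, size_t len)
{
    uintptr_t addr = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)buf + len;
    for (; addr < end; addr += CACHE_LINE)
        _mm_clflush((const void *)addr);   /* CLFLUSH */
    _mm_mfence();                          /* order the flushes against later accesses */
}

/* Hypothetical helper: fill a 16-byte-aligned buffer with non-temporal
   stores (MOVNTDQ), so the written data is not kept in the cache. */
void fill_streaming(void *dst, size_t len_bytes, int value)
{
    assert(((uintptr_t)dst & 15u) == 0);   /* MOVNTDQ needs 16-byte alignment */
    __m128i v = _mm_set1_epi32(value);
    __m128i *out = (__m128i *)dst;
    for (size_t i = 0; i < len_bytes / 16; i++)
        _mm_stream_si128(&out[i], v);
    _mm_sfence();                          /* make the streamed stores globally visible */
}
```

None of these calls reserves cache space for anyone: they only move individual lines in or out, or bypass the cache for a particular store.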

As you can see, none of these methods resembles, for example, cache lockdown - a feature that exists in ARM - or the similar features of MIPS. Reserving a piece of the cache would help real-time developers, for example in the case described above. Something similar may appear in x86 someday, although it contradicts the ideology of transparent memory operation. For now you can use a palliative.

The picture shows that the physical page address and the address within the cache (the set index) have 5 bits in common. That is just how it turned out - pure luck. This makes it possible to "color" physical memory page by page in 32 colors. What for? When the OS kernel creates an address space for an application, it can then back its virtual memory only with pages of one color. If the main cache consumers get memory of different colors, their data cannot evict each other from the cache.
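
The arithmetic behind the colors can be sketched in C as follows. The cache geometry used in the usage note (2 MiB, 16 ways, 4 KiB pages) is my assumption, chosen because it yields the 32 colors mentioned above; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* assumed 4 KiB pages */

/* Number of page colors = bytes covered by one cache way / page size. */
unsigned num_colors(uint64_t cache_bytes, unsigned ways)
{
    uint64_t way_bytes = cache_bytes / ways;
    assert(way_bytes >= (1u << PAGE_SHIFT));
    return (unsigned)(way_bytes >> PAGE_SHIFT);
}

/* Color of the page containing a physical address: the set-index bits
   that lie above the page offset. colors must be a power of two. */
unsigned page_color(uint64_t phys_addr, unsigned colors)
{
    assert(colors != 0 && (colors & (colors - 1)) == 0);
    return (unsigned)((phys_addr >> PAGE_SHIFT) & (colors - 1));
}
```

With these assumptions, `num_colors(2 * 1024 * 1024, 16)` gives 32, and physical pages 0x0000, 0x1000, 0x2000 and so on get colors 0, 1, 2 and so on, wrapping around every 128 KiB. Two pages of different colors can never map to the same cache set.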

In the example above it is obvious that allocating memory of different colors to the two tasks solves the problem. The GUI gets less cache, but the operator will most likely not notice any slowdown. There is, of course, one big drawback: this only covers the applications' virtual memory, and if something in the kernel also wants a lot of cache, nothing can stop it.
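
As a rough illustration of how a kernel could hand out pages by color, here is a toy frame allocator in C. Everything in it is hypothetical: the pool size, the bitmask interface and the linear scan are my simplifications (a real kernel would more likely keep a separate free list per color), and it is not taken from any actual OS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_COLORS 32u    /* as in the text: 5 shared bits */
#define NUM_FRAMES 256u   /* toy pool of physical page frames */

static bool frame_used[NUM_FRAMES];

/* Return a free frame number whose color is allowed by the mask
   (bit c set = color c allowed), or -1 if no such frame is left. */
long alloc_frame(uint32_t allowed_colors)
{
    assert(allowed_colors != 0);
    for (size_t pfn = 0; pfn < NUM_FRAMES; pfn++) {
        unsigned color = (unsigned)(pfn % NUM_COLORS);
        if (!frame_used[pfn] && ((allowed_colors >> color) & 1u)) {
            frame_used[pfn] = true;
            return (long)pfn;
        }
    }
    return -1;
}
```

Giving the real-time task the mask `0x0000000F` (colors 0 to 3) and the GUI the complement means the two tasks never receive pages of the same color, at the cost of the real-time task being limited to one eighth of the frames in this toy pool.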

The same technique is used in the Windows and FreeBSD kernels to spread memory more evenly across the sets of the associative cache. With low cache associativity this matters, so that no part of the cache goes to waste. The programmer does not have to do anything to benefit from this - the OS does it all. But no production OS currently uses cache coloring to separate processes from each other in this way; there are only unofficial patches.

And, by the way, all real-time developers on x86 should remember to disable C-states and SpeedStep.

By the way, if anyone knows a Russian replacement for any of the Anglicisms I used in this post, please let me know in the comments and I will fix the text.

Source: https://habr.com/ru/post/117760/
