TrustZone: hardware implementation in ARMv7A

Today we begin exploring how TrustZone (a trademark of ARM) works inside.

The name itself is a commercial one: marketers invented it to tell the whole world about the key property of this technology. The idea is that we get some kind of trusted, secure, highly reliable place, like a house where, having locked the doors and turned on the light, we feel comfortable and safe.

So I will start by saying that TrustZone is not a “place” in the processor. It cannot be found on the chip the way a cache or an ALU can. And trusted programs are not, in fact, executed in some physically dedicated zone of the processor.
Even if we looked at the source code of an ARM core, we could not clearly point to TrustZone. By analogy with software, TrustZone is rather a few modules plus a set of patches to almost every other part of the processor.

In this article, we will look at how TrustZone is implemented in hardware in ARM Cortex-A processors (ARMv7A).
In ARMv8A things are much the same, but in the microcontroller profile (ARMv8M) everything is completely different: for marketing reasons the technology is called TrustZone there too, but it works differently.

Mode

The first component of TrustZone is the processor mode. It is set by the NS (Non-Secure) bit in the SCR (Secure Configuration Register). If NS = 1, we are in Non-Secure mode; if NS = 0, we are trusted, that is, in Secure mode.
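To make the bit layout concrete, here is a minimal C sketch that decodes an SCR value. The positions of NS, IRQ, FIQ and EA follow the ARMv7-A SCR layout; the helper names are our own. On real hardware, Secure privileged code would read the register with MRC p15, 0, <Rt>, c1, c1, 0; the sketch only decodes a value it is given, so it runs anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few SCR bit positions (ARMv7-A with the Security Extensions). */
#define SCR_NS  (1u << 0)  /* 1: Non-Secure, 0: Secure */
#define SCR_IRQ (1u << 1)  /* 1: IRQs are taken to Monitor mode */
#define SCR_FIQ (1u << 2)  /* 1: FIQs are taken to Monitor mode */
#define SCR_EA  (1u << 3)  /* 1: external aborts are taken to Monitor mode */

/* True if the NS bit is set in the given SCR value. */
bool scr_is_non_secure(uint32_t scr)
{
    return (scr & SCR_NS) != 0;
}

/* Human-readable name of the world selected by NS. */
const char *scr_world_name(uint32_t scr)
{
    return scr_is_non_secure(scr) ? "Non-Secure" : "Secure";
}
```

Note that only bit 0 decides the world; the other bits merely choose where exceptions are routed.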

[Figure: the SCR register of the Cortex-A5]

Regardless of NS, all the usual processor modes remain in place. The most common of them are:

User - the mode in which application code runs;
Supervisor - the OS kernel mode;
IRQ - the mode for handling interrupts.

Because of NS, we get Secure User, Non-Secure User, Secure Supervisor, Non-Secure Supervisor, and so on.


The User/Supervisor names used here apply to all 32-bit ARM processors up to and including ARMv7. ARMv8 uses different notation, the exception levels EL0/EL1 (the ARMv7 documentation also speaks of privilege levels PL0/PL1). The essence does not change.

The NS bit affects how individual processor functions behave, denies access to certain blocks, and changes the behavior of some registers, both in the processor core and in peripheral devices.

Moreover, the NS bit cannot simply be changed from any of the ordinary processor modes; this is prohibited. Changing NS requires a whole ceremony: the processor enters a separate Secure Monitor mode, which acts as the gateway between the two worlds (in Monitor mode the processor always runs as Secure, whatever the value of NS). But we'll talk about this in the next article.

So NS splits the processor into two unequal modes of operation: Secure and Non-Secure. Each mode has everything needed to run an OS and programs; only the privileges for accessing certain CPU functions and peripherals differ.

Mode, not zone!

We continue to lift the veil.

The trusted mode of program execution is simply the state where NS = 0, that's all!

There is no additional instruction pipeline, no separate ALU, no separate program memory, nothing of what one might imagine on hearing the name TrustZone. There is no boundary to this zone, and no hostile instructions trying to “crawl” into the trusted zone like viruses through a cell membrane.

Typically, the pipeline is executing instructions of an untrusted program (NS = 1), then (bang!) an interrupt arrives, the processor switches to trusted mode (NS = 0) and immediately executes trusted code.

In fact, TrustZone gives us tools for taking a number of measures (separating the memory of trusted and untrusted programs, partitioning access to peripherals) to build a reliable barrier between Secure and Non-Secure. But how reliable that barrier is depends on the quality and completeness of the trusted software.

So much for lifting the veil.

NS signal

The NS bit does not merely tell the processor core which mode to operate in. It is also an external signal routed from the processor to almost all peripherals.
How can we picture this? In general, peripherals are connected to the CPU by address, data, and control buses. On processors that implement TrustZone, NS is one of the control signals. Thus the CPU sends devices not just Read and Write commands, but Secure Read, Non-Secure Read, Secure Write, and Non-Secure Write.
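In AMBA AXI, the NS information travels in the protection signals: bit 1 of ARPROT (reads) and AWPROT (writes) is set for a Non-Secure access. The C sketch below models a transaction as it leaves a master; the struct and function names are illustrative and do not come from any real SoC header.

```c
#include <stdbool.h>
#include <stdint.h>

/* AXI AxPROT bits: [0] privileged, [1] non-secure, [2] instruction. */
#define AXPROT_PRIVILEGED  (1u << 0)
#define AXPROT_NONSECURE   (1u << 1)
#define AXPROT_INSTRUCTION (1u << 2)

/* Hypothetical model of one bus transaction as seen by a slave. */
struct bus_txn {
    uint32_t addr;
    uint32_t data;
    bool     write;   /* true: write channel (AW), false: read channel (AR) */
    uint8_t  prot;    /* AWPROT or ARPROT */
};

/* Build a data access; `ns` mirrors the NS bit of the issuing core. */
struct bus_txn make_data_txn(uint32_t addr, uint32_t data, bool write,
                             bool ns, bool privileged)
{
    struct bus_txn t = { addr, data, write, 0 };
    if (ns)
        t.prot |= AXPROT_NONSECURE;
    if (privileged)
        t.prot |= AXPROT_PRIVILEGED;
    return t;
}

/* A slave decides Secure vs Non-Secure from the prot bits alone. */
bool txn_is_secure(const struct bus_txn *t)
{
    return (t->prot & AXPROT_NONSECURE) == 0;
}
```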

Cortex-A processors always come as part of a System on Chip (SoC), so all these buses are hidden from us inside the chip. However, some SoCs allow the NS signal to be brought outside, in case external peripherals that support secure mode are connected.

Which peripherals support Secure/Non-Secure access? One example is the GIC interrupt controller, which in ARM is a peripheral device inside the SoC. In Secure mode, it lets you route certain interrupts to the Secure FIQ and forbid Non-Secure software from changing this setting.

Here is what happens when the CPU works with the GIC: when the CPU writes a GIC register in Secure mode, the signal NS = 0 travels along with the register address and data. The GIC understands that the write is trusted and grants full access. If NS = 1, the GIC restricts access to some of its registers, for both writing and reading.
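This behavior can be modeled as a register file in which some registers are marked Secure-only. The four-register layout and the names are made up for illustration; in this model a Non-Secure access to a Secure-only register is ignored on write and reads as zero (often abbreviated RAZ/WI, read-as-zero/write-ignored).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NREGS 4

/* Hypothetical register file of a TrustZone-aware peripheral. */
struct tz_regs {
    uint32_t value[NREGS];
    bool     secure_only[NREGS];   /* true: Non-Secure access is RAZ/WI */
};

void reg_write(struct tz_regs *r, unsigned idx, uint32_t v, bool ns)
{
    assert(idx < NREGS);
    if (ns && r->secure_only[idx])
        return;                    /* write silently ignored */
    r->value[idx] = v;
}

uint32_t reg_read(const struct tz_regs *r, unsigned idx, bool ns)
{
    assert(idx < NREGS);
    if (ns && r->secure_only[idx])
        return 0;                  /* reads as zero, no error raised */
    return r->value[idx];
}
```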

Other SoC blocks that support the NS signal include memory controllers, the real-time clock (RTC), key storage, the reset controller, and power management.

Note that in ARMv7A TrustZone support is optional: when designing a SoC, the Security Extensions option (read: TrustZone) can be turned off. Unnecessary blocks and wiring are then removed from the chip; in particular, there is no need to route the NS line across the chip, and the NS inputs of peripherals are tied to 0 (at least, that is how we can picture it). The chip topology becomes simpler.

Multiprocessing

What happens when a SoC contains several processor cores? Each core (ARM documentation usually calls a core a CPU) can run in either Secure or Non-Secure mode. At any given moment, some cores may be Secure while others are not.


Moreover, cores that do not support TrustZone can be combined with cores that do on the same SoC.

Let's look at the internals of a modern ARM system to understand how TrustZone works in this case.

In ARM processors, all processor cores, memory, and peripherals are connected by an internal bus called AMBA ( https://en.wikipedia.org/wiki/Advanced_Microcontroller_Bus_Architecture ). Starting around ARMv4, the AMBA bus contains a switching unit that connects blocks called Bus Masters to various Slave devices.

Only a true die-hard will master the details of AXI and AMBA, and for the full picture you would also need to add AHB and APB and account for implementation details in different architectures. But the general idea is grasped very quickly.

For example, a processor core (or rather, its D-cache and I-cache) is a Bus Master, and some I2C controller is a Slave. A Bus Master initiates a bus transaction, i.e., a read or a write. A Slave is the block being written to or read from. This, incidentally, gives us the set of masters: processor cores, DMA controllers, and peripherals with built-in DMA (such as a USB host).

Let's look at the Master/Slave switching unit in more detail. In ARMv7A, it is called the Interconnect and is part of the AXI (Advanced eXtensible Interface) implementation. In the ARM926, this unit had the telling name Bus Matrix and was part of the AHB (Advanced High-performance Bus) internal bus interface. In essence, it is the same thing.

We have M Masters and N Slaves, and a switching matrix connecting the former to the latter. At any moment, each Master can be connected to one Slave or to none at all. But several Masters can be active at the same time if they are connected to different devices.

In general, not every connection has to exist. In particular, the system designer can eliminate unnecessary paths: for example, if there is no reason for the Ethernet controller (Master) to write directly to the I2C controller (Slave), that path can simply be left out.

In addition, some devices can be both Master and Slave. For example, a USB host is a Master when it stores data in memory via DMA, and a Slave when we configure its registers.

Here, each Master is also a source of the NS signal, and each Slave is its recipient. AXI carries the NS signal from a Master through the Interconnect to the corresponding Slave, and thanks to this, Secure and Non-Secure transactions can take place in the SoC at the same time.
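The switching matrix and the NS forwarding can be sketched as a small C model. The set of masters and slaves, the wiring table (with the Ethernet-to-I2C path left out, as in the example above) and all function names are hypothetical; real interconnects also arbitrate, queue and pipeline transactions, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

enum { M_CPU0, M_CPU1, M_DMA, M_ETH, NUM_MASTERS };
enum { S_DDR, S_SRAM, S_I2C, NUM_SLAVES };

/* Paths the SoC designer actually wired (hypothetical). */
static const bool wired[NUM_MASTERS][NUM_SLAVES] = {
    [M_CPU0] = { true, true, true  },
    [M_CPU1] = { true, true, true  },
    [M_DMA]  = { true, true, true  },
    [M_ETH]  = { true, true, false },  /* no Ethernet -> I2C path */
};

static int  owner[NUM_SLAVES];     /* master connected to each slave, or -1 */
static bool slave_ns[NUM_SLAVES];  /* NS value currently seen by each slave */

void matrix_reset(void)
{
    for (int s = 0; s < NUM_SLAVES; s++)
        owner[s] = -1;
}

/* Connect master m to slave s for one transaction, forwarding its NS. */
bool matrix_connect(int m, int s, bool ns)
{
    assert(m >= 0 && m < NUM_MASTERS && s >= 0 && s < NUM_SLAVES);
    if (!wired[m][s] || owner[s] != -1)
        return false;              /* no such path, or slave is busy */
    for (int i = 0; i < NUM_SLAVES; i++)
        if (owner[i] == m)
            return false;          /* a master drives one slave at a time */
    owner[s] = m;
    slave_ns[s] = ns;              /* NS travels with the transaction */
    return true;
}

void matrix_release(int s)
{
    assert(s >= 0 && s < NUM_SLAVES);
    owner[s] = -1;
}
```

Two masters with different NS values can hold two slaves at once, which is exactly the simultaneous Secure and Non-Secure traffic described above.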

Peripherals

Now we see how ARM Cortex-A supports several processor cores and multiple peripherals working on the internal bus at the same time, in both Secure and Non-Secure modes. Getting a little complicated?

When creating a SoC, the developer takes blocks from ARM, blocks from third-party vendors, and blocks of their own design, and connects them into a single system.

Blocks taken from ARM include:

processor cores, for example, Cortex-A, Cortex-M4, or a whole multiprocessor system such as the Cortex-A9 MPCore;
GIC interrupt controller, for example, PL390;
cache controller, for example, L2C-310.

All of them support TrustZone and internally separate trusted and untrusted access based on NS.

For example, the cache controller knows which lines were stored in Secure mode and which in Non-Secure mode, and issues the corresponding AXI transactions when writing the data back to physical memory.
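One way to picture this is that NS acts as an extra tag bit: the same physical address cached from the Secure and from the Non-Secure world occupies two different lines, and each line is written back with its own NS value. The sketch below is a deliberately tiny direct-mapped, write-allocate cache built on that assumption; its sizes and names are illustrative and not those of the L2C-310.

```c
#include <stdbool.h>
#include <stdint.h>

#define NLINES     8
#define LINE_SHIFT 5   /* 32-byte lines */

/* Hypothetical cache line: NS is stored alongside the tag. */
struct cache_line {
    bool     valid, dirty, ns;
    uint32_t tag;
};

static struct cache_line lines[NLINES];

static unsigned line_index(uint32_t addr) { return (addr >> LINE_SHIFT) % NLINES; }
static uint32_t line_tag(uint32_t addr)   { return addr >> LINE_SHIFT; }

/* A hit needs both the address tag and the NS bit to match. */
bool cache_hit(uint32_t addr, bool ns)
{
    const struct cache_line *l = &lines[line_index(addr)];
    return l->valid && l->tag == line_tag(addr) && l->ns == ns;
}

/* Write with allocation. Returns true if a dirty victim had to be written
 * back; *wb_ns then holds the NS value used for that AXI write. */
bool cache_write(uint32_t addr, bool ns, bool *wb_ns)
{
    struct cache_line *l = &lines[line_index(addr)];
    if (l->valid && l->tag == line_tag(addr) && l->ns == ns) {
        l->dirty = true;           /* hit: just mark the line dirty */
        return false;
    }
    bool writeback = l->valid && l->dirty;
    if (writeback)
        *wb_ns = l->ns;            /* the victim keeps its own world */
    *l = (struct cache_line){ .valid = true, .dirty = true, .ns = ns,
                              .tag = line_tag(addr) };
    return writeback;
}
```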

Further, many blocks are bought from third-party (reliable and well-known) developers; they are the same even in processors from different vendors. Examples are the USB host and the SDHC host. Other blocks the SoC developer reuses across all of its processors almost without change, for example, the Ethernet MAC and the I2C, UART, and SPI controllers.

These purchased and in-house blocks may have no TrustZone support at all. This is understandable: it is hard to imagine why access to a UART would need to be split between Secure and Non-Secure. But the question of how to integrate such devices into TrustZone remains open.

Integrating these devices is up to the SoC manufacturer. In fact, the manufacturer has to solve two problems:

for Bus Masters without TrustZone support, supply the correct NS bit;
for Bus Slaves, provide configuration and access control.

Bus Masters without TrustZone support

Let's see what this means for a Bus Master, using the example of a video controller that fetches data from memory and sends it directly to HDMI.

Suppose we want to implement the notorious DRM: an encrypted video stream comes from Linux to the Secure OS, where it is decrypted and sent to the screen. The decrypted data is placed in a memory area accessible only to Secure Read/Write; reading this area from Linux (Non-Secure) produces an access error. Thus we prevent Linux from copying the decoded stream. A video adapter with Secure access rights reads the decrypted video data without difficulty and displays it.

For the video adapter to fetch data from Secure memory via AXI, it must access it with NS = 0. However, if we don't need DRM, we may not want to give the video controller privileged access.

So that the controller can work either way, the system gets a setting: the access type for each Bus Master that does not support TrustZone, that is, at least 1 bit per Bus Master. Perhaps this is just one register, but that is the SoC designer's job and responsibility. And this, of course, is a source of incompatibility between processors from different vendors.
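A minimal sketch of such a setting, assuming a hypothetical 32-bit register with one bit per legacy master, where a set bit means "issue this master's accesses as Secure". The master list, the register, and its reset value of 0 (everything Non-Secure) are all assumptions; each SoC defines its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical list of Bus Masters without TrustZone support. */
enum legacy_master { LM_VIDEO = 0, LM_USB = 1, LM_SDHC = 2 };

/* Hypothetical SoC register; on real hardware only Secure software
 * would be able to write it. Reset value 0: all masters Non-Secure. */
static uint32_t master_secure_cfg;

void set_master_secure(enum legacy_master m, bool secure)
{
    if (secure)
        master_secure_cfg |= 1u << m;
    else
        master_secure_cfg &= ~(1u << m);
}

/* NS value the interconnect attaches to this master's transactions. */
bool master_ns(enum legacy_master m)
{
    return (master_secure_cfg & (1u << m)) == 0;
}
```

In the DRM scenario, Secure software would call set_master_secure(LM_VIDEO, true) only while protected playback is active.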

Bus Slaves without TrustZone support

For each Slave, it makes sense to define the following access rights for AXI transactions:

whether Secure Read access is allowed;
whether Secure Write access is allowed;
whether Non-Secure Read access is allowed;
whether Non-Secure Write access is allowed.

This set is simply every combination of Read/Write operations and Secure/Non-Secure modes.
How exactly to divide the rights is, in fact, decided by the SoC manufacturer. For example, you can reduce the number of settings by always allowing Secure access. Or you can increase it by also distinguishing User/Supervisor access types.

For such access control, a register with 2, 4, or 8 bits per Bus Slave can be provided, allowing or denying access to the device depending on the access type.
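For example, a 4-bit field per slave, one bit per combination listed above, could be packed into a single register. The layout below (eight slaves, bit order SR, SW, NSR, NSW) is an illustrative assumption, not any particular vendor's register map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-bit permission field per Bus Slave. */
#define P_SR  (1u << 0)   /* Secure read allowed */
#define P_SW  (1u << 1)   /* Secure write allowed */
#define P_NSR (1u << 2)   /* Non-Secure read allowed */
#define P_NSW (1u << 3)   /* Non-Secure write allowed */

#define NUM_SLAVES 8

/* Slave i uses bits [4*i+3 : 4*i]; reset value 0 denies everything. */
static uint32_t slave_perm;

void slave_set_perm(unsigned slave, unsigned perm)
{
    assert(slave < NUM_SLAVES && perm <= 0xFu);
    slave_perm &= ~(0xFu << (4 * slave));
    slave_perm |= (uint32_t)perm << (4 * slave);
}

bool slave_access_allowed(unsigned slave, bool ns, bool write)
{
    assert(slave < NUM_SLAVES);
    unsigned perm = (slave_perm >> (4 * slave)) & 0xFu;
    unsigned need = ns ? (write ? P_NSW : P_NSR) : (write ? P_SW : P_SR);
    return (perm & need) != 0;
}
```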

And this brings us to another topic: what happens if a Bus Master starts an access that the Bus Slave does not allow?

Access errors

Where there is a restriction, there will be violations. If some type of access to a device is denied, something should happen when such an access is attempted.

Actually, not always. For example, in the same GIC, Non-Secure writes to protected registers are silently ignored, and reads return zeros. Nothing happens, and this is intentional: it allows the same OS (for example, Linux) to run in both Secure and Non-Secure modes.
In Secure mode, Linux configures everything itself, while in Non-Secure mode the controller is preconfigured and Linux can only configure what it is still allowed to. Yet Linux won't bat an eye or notice the trick, because the GIC reports no error on writes to a forbidden area.

What if we use less clever devices? Then, for example, a Non-Secure write to a Secure memory area causes an Abort. Abort is the ARM exception type raised when a device or memory area cannot be accessed.
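On the AXI bus, the difference between these two behaviors shows up in the response a slave returns: OKAY for a RAZ/WI-style device, or an error response (SLVERR or DECERR) that the core turns into an external abort. A sketch, assuming the slave has already decided whether the access is allowed; the policy enum and function names are ours, and which error code a real controller returns is implementation-specific.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* AXI response codes (RRESP/BRESP). */
enum axi_resp { AXI_OKAY = 0, AXI_EXOKAY = 1, AXI_SLVERR = 2, AXI_DECERR = 3 };

/* How a slave treats a denied access (hypothetical names). */
enum deny_policy { DENY_RAZ_WI, DENY_ERROR };

/* For a denied read under RAZ/WI, *rdata (if non-NULL) receives zero;
 * a denied write is simply dropped. */
enum axi_resp slave_respond(bool allowed, enum deny_policy policy,
                            uint32_t *rdata)
{
    if (allowed)
        return AXI_OKAY;
    if (policy == DENY_RAZ_WI) {
        if (rdata != NULL)
            *rdata = 0;
        return AXI_OKAY;          /* the master never learns it was denied */
    }
    return AXI_SLVERR;            /* the core turns this into an external abort */
}

bool causes_abort(enum axi_resp r)
{
    return r == AXI_SLVERR || r == AXI_DECERR;
}
```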

Most often, the result will be an Asynchronous Data Abort.

It is a Data Abort because it happens while reading or writing data rather than fetching instructions. It is asynchronous because it does not occur at the moment of the erroneous access, but some time later. More on this below.
In general, an access violation can cause either a synchronous or an asynchronous abort.

For example, when Linux loads an application, it may not load it entirely: it places only some of the pages in physical memory and sets up the rest to generate an Abort on access. The application starts, and when it reaches a page that is not in physical memory, a synchronous abort occurs. It is synchronous because it happens exactly on the instruction that made the memory access. When the processor enters Abort mode, Linux loads the required page and returns control to the same instruction that caused the Abort. As a result, the program keeps running as if nothing had happened.
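The demand-paging loop can be simulated in a few lines. This is a toy model, not Linux code: each "instruction" touches one page, and because the abort is synchronous the handler knows exactly which page to load and the same instruction is retried.

```c
#include <stdbool.h>

#define NPAGES 4

static bool present[NPAGES];   /* pages currently in physical memory */
static int  faults;            /* number of synchronous aborts taken */

/* Stand-in for the OS abort handler: it is told exactly which page
 * faulted, loads it, and returns. */
static void abort_handler(int page)
{
    faults++;
    present[page] = true;
}

/* Each "instruction" touches one page. On a missing page the abort is
 * taken on that very instruction, which is then re-executed. */
void run_program(const int *pages, int n)
{
    for (int pc = 0; pc < n; ) {
        if (!present[pages[pc]]) {
            abort_handler(pages[pc]);
            continue;              /* retry the same instruction */
        }
        pc++;                      /* access succeeded, move on */
    }
}
```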

But with TrustZone, things are not so smooth. Some processors generate synchronous exceptions, but most generate an asynchronous Abort for most access errors.


In principle, ARMv7A processors that have both the Security Extensions and the Virtualization Extensions can be configured to generate synchronous Aborts. One example is the Cortex-A17, but the bulk of ARMv7A chips (by number produced) lack virtualization.

Ask yourself two questions:

Why exactly an asynchronous abort?
What is bad about it?

Why asynchronous?

To begin with, ARMv7A is a pipelined architecture: the processor decodes instructions ahead of time and does not execute them strictly one after another. Some instructions can execute in parallel with others. For example:

	STR r1, [r2]        @ *r2 = r1;
	ADD r2, r2, #16     @ r2 = r2 + 16;

Here the first instruction stores r1 at the address held in r2, and the second increments r2. After the first instruction executes, the store to memory has generally only started, and it may not have finished even when the second instruction has fully completed.

Next, the processor has a cache in which the written value may sit for an indefinite time, and an access error can only arise when the cache is synchronized with memory.

Then, even if the memory area is not cached: memory in ARM is classified as Normal, Strongly Ordered, or Device, and each type gives the processor different freedom to reorder the actual memory and device accesses over AXI. As a result, an AXI transaction may not happen immediately, for example because the device is busy serving another access.

And finally, if an access to an ordinary Bus Slave causes an Abort, that Abort arrives as a logic signal from outside the processor core. The core does not expect this signal to be synchronized with what is currently in the pipeline, and rightly so: the core cannot even determine the cause of such an abort with 100% certainty.

In any of these cases, the ARM core generates an Asynchronous Abort, telling us that a denied access was attempted, but it does not know how many cycles or instructions ago.
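The effect of posted writes can be shown with a toy write buffer. In this simplified sketch (all names and the 0x8000 boundary are hypothetical) stores are accepted at once and only checked when the buffer drains, so the abort surfaces several instructions after the faulting store, and nothing records which store it was.

```c
#include <stdbool.h>

#define WB_DEPTH 4

/* Hypothetical posted-write buffer: stores retire later, in batches. */
static unsigned wb[WB_DEPTH];
static int      wb_count;

/* Hypothetical rule: the slave rejects Non-Secure writes at or above 0x8000. */
static bool addr_denied(unsigned addr) { return addr >= 0x8000u; }

/* Drain the buffer. Only "some write failed" is reported, not which one. */
static bool wb_drain(void)
{
    bool error = false;
    for (int i = 0; i < wb_count; i++)
        if (addr_denied(wb[i]))
            error = true;
    wb_count = 0;
    return error;
}

/* Execute a sequence of stores. Returns the index of the instruction at
 * which the abort is noticed (n if at the final drain), or -1 if none. */
int run_stores(const unsigned *addrs, int n)
{
    wb_count = 0;
    for (int pc = 0; pc < n; pc++) {
        wb[wb_count++] = addrs[pc];   /* the store "completes" at once */
        if (wb_count == WB_DEPTH && wb_drain())
            return pc;
    }
    return wb_drain() ? n : -1;
}
```

With the stores 0x100, 0x9000, 0x200, 0x300, the bad store is the second one, but the abort is only noticed at the fourth.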

What is bad about an Asynchronous Abort?

The fact that we cannot determine where the failure happened and cannot fix anything. After an erroneous access, the program may run for dozens of cycles and, in that time, drift so far from correct behavior that it can only be stopped and restarted, possibly with a full processor reset if some peripherals or internal OS structures were damaged by what the program did after the Abort.

... and what follows from this

When working with TrustZone, it is tempting at first to use it as a hardware virtualization technology. But because of Asynchronous Abort, this cannot be done.

Indeed, there are two modes, Secure and Non-Secure. Secure mode can create a kind of sandbox for Non-Secure and restrict its access to peripherals.

However, the next step would be to virtualize part of the peripherals, for example, Flash memory that both the guest OS and the hypervisor work with. And here we run into the fact that we cannot simply close access to the device for the guest OS.

What we would like:

the guest OS accesses the device and a (synchronous) Abort occurs;
the hypervisor understands what happened;
the hypervisor emulates the operation the guest OS expected;
the hypervisor returns control to the guest OS, which continues as if nothing had happened.

What actually happens:

the guest OS accesses the device, setting the stage for an Asynchronous Abort;
the guest OS keeps running, unaware of it;
suddenly, out of the blue, the system raises an Abort;
the hypervisor sees that the Abort is asynchronous and cannot work out which instruction caused it, at what address, or which device was accessed;
the hypervisor terminates the guest OS.
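The hypervisor's dilemma can be expressed as a decision function. The abort_info structure and the emulated address window are hypothetical stand-ins for what a real handler would read from the fault status and fault address registers; the point is that every useful branch requires a synchronous abort.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical emulated device window (e.g. a shared Flash controller). */
#define EMU_BASE 0x40000000u
#define EMU_SIZE 0x00100000u

/* What the abort handler can learn; address, PC and direction are only
 * meaningful for a synchronous abort. */
struct abort_info {
    bool     synchronous;
    uint32_t fault_addr;
    uint32_t fault_pc;
    bool     is_write;
};

enum action { EMULATE_AND_RESUME, FORWARD_TO_GUEST, KILL_GUEST };

static bool is_emulated(uint32_t addr)
{
    return addr - EMU_BASE < EMU_SIZE;  /* unsigned wrap covers addr < base */
}

enum action hyp_on_abort(const struct abort_info *a)
{
    if (!a->synchronous)
        return KILL_GUEST;          /* no address, no PC: nothing to repair */
    if (is_emulated(a->fault_addr))
        return EMULATE_AND_RESUME;  /* emulate, then resume after fault_pc */
    return FORWARD_TO_GUEST;        /* a genuine fault inside the guest */
}
```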

Conclusion: TrustZone technology cannot be used on its own for hardware virtualization.

You can make the guest OS ask the Secure OS to access the devices it is not allowed to touch, and this is the main way of partitioning devices between the Secure OS and the guest OS. But we'll talk about that next time.

And memory, memory?

What about access to ordinary memory? Can part of the system DDR RAM be reserved for Secure access?

ARM took less care of this than you might expect!

Memory controllers vary, for example:

a static memory controller (SRAM), often used for memory inside the SoC;
a dynamic memory controller, for example, for DDR3;
a general-purpose parallel memory controller, which can be used for SRAM or NOR Flash.

All of these controllers are typical Bus Slaves. ARM does not develop them, so Secure/Non-Secure access control falls on the SoC developer, following the scheme described above.

The most basic option, used almost always: access to the embedded SRAM is configured as Secure, and access to DDR as Non-Secure.

This is a fairly secure approach, because all Secure data stays inside the chip and never leaves its perimeter. But embedded SRAM amounts to a meager tens or hundreds of kilobytes, which may not be enough for a full-featured Secure OS and protected data.

A more flexible option is available if the SoC manufacturer has, at its discretion, implemented a DDR controller that supports partitioning memory into zones by NS = 0/1. There can be many implementation variants, but this does not change the essence.

In general, such a controller offers at least the following:

There are zones with different access rights, at least three of them.
One zone is configured as Non-Secure; Linux or another guest OS runs there. This is the largest part of memory.
Another zone is configured as Secure and holds the Secure OS data. This zone is much smaller.
The third zone is configured for both Secure and Non-Secure access. It is used to exchange large amounts of data between Linux and the Secure OS, typically just a few MB.
More flexible settings let you create Secure Write / Non-Secure Read areas, and vice versa, for one-way data exchange.
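Controllers of this kind are often called TrustZone address space controllers. The sketch below checks an access against a region table matching the zones just listed; the addresses, sizes, and the rule that a later region overrides an earlier one are illustrative assumptions, not a particular controller's behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PERM_SR  (1u << 0)   /* Secure read */
#define PERM_SW  (1u << 1)   /* Secure write */
#define PERM_NSR (1u << 2)   /* Non-Secure read */
#define PERM_NSW (1u << 3)   /* Non-Secure write */

struct region { uint32_t base, size; unsigned perm; };

/* Hypothetical DDR layout; later entries override earlier ones. */
static const struct region regions[] = {
    { 0x80000000u, 0x40000000u, PERM_NSR | PERM_NSW },            /* 1 GiB: Linux */
    { 0xBE000000u, 0x01000000u, PERM_SR | PERM_SW },              /* 16 MiB: Secure OS */
    { 0xBF000000u, 0x00400000u,
      PERM_SR | PERM_SW | PERM_NSR | PERM_NSW },                  /* 4 MiB: shared */
    { 0xBF400000u, 0x00100000u, PERM_SR | PERM_SW | PERM_NSR },   /* 1 MiB: Secure -> NS */
};

bool tzasc_allowed(uint32_t addr, bool ns, bool write)
{
    unsigned need = ns ? (write ? PERM_NSW : PERM_NSR)
                       : (write ? PERM_SW : PERM_SR);
    unsigned perm = 0;   /* address outside every region: no access */
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++)
        if (addr - regions[i].base < regions[i].size)
            perm = regions[i].perm;   /* a later match overrides earlier ones */
    return (perm & need) != 0;
}
```

Note that such a check only decides whether the transaction is accepted; as discussed next, a rejected write from a cached area still reaches the CPU as an asynchronous Abort.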

Fortunately, manufacturers do include such controllers in their SoCs.

It is a pity that ARM did not take care of this, so we have a variety of solutions.

This implementation has a drawback: since ordinary program and data memory in ARM is cached, and the memory controller is an ordinary Bus Slave, we cannot immediately learn that a write went to a forbidden address. An asynchronous Abort will occur, and all we can do is clean up the wreckage of the program.

Conclusion

In this article, we looked at the hardware implementation of TrustZone in ARMv7A and dispelled some misconceptions about this technology.

We covered:

Secure and Non-Secure modes;
operation with one and with several cores;
working with peripherals via AXI;
working with peripherals designed without TrustZone support;
the types of access errors that can occur;
access control to physical memory.

You could say we have looked under the hood, but have not turned the ignition yet. In the next article, we will start the processor, look at how it runs in Secure and Non-Secure modes, and switch between them via Secure Monitor mode.

Source: https://habr.com/ru/post/340912/
