I was recently asked to give a serious audience a brief overview of how an operating system boots on ARM and to assess the security threats to that process. The brief covered ARM processors in general and operating systems in general: everyone has heard of ARM, and everyone knows what an OS is. The explanation was to stay at the level of boxes and arrows.
Booting ARM in four boxes is under the cut.
First, let us limit the level of detail. We are interested in what happens, not how, so we set aside specific processor instructions. We will try to find what is common to all processors and all operating systems, and look for security threats.
Varieties of ARM processors
If you know about ARM, then this section can be safely skipped.
ARM processors of five architectures are currently in production and in use: ARMv4, ARMv5, ARMv6, ARMv7 and ARMv8. ARM gives its processor families commercial names, so ARMv4 cores are sold, for example, as ARM7 and ARMv5 cores as ARM9, while the Cortex name covers processors on the ARMv6, ARMv7 and ARMv8 architectures. The following table lists the main varieties.
Architecture | Commercial name | Common core | Runs Linux |
ARMv4 | ARM7 | ARM7TDMI | Impractical |
ARMv5 | ARM9 | ARM926EJ-S | Yes |
ARMv6 | ARM11 | ARM1176JZF-S | Yes |
ARMv6 | Cortex-M0 | Cortex-M0 | No |
ARMv7 | Cortex-M | Cortex-M3 | Impractical |
ARMv7 | Cortex-A | Cortex-A9 | Yes |
ARMv7 | Cortex-R | Cortex-R4 | Yes |
ARMv8 | Cortex-A | Cortex-A53 | Yes |
For example, feature phones mostly use ARM7, while smartphones use Cortex-A. Modern smartphones are built primarily on ARMv8, the only 64-bit architecture of the five. ARM7 and ARM9 processors were widely used in industrial controllers and network equipment, where the focus is now shifting to Cortex-A. Cortex-M microcontrollers are used in household appliances, small electronic devices, security systems and so on.
In general, all ARM devices can be roughly divided into microcontrollers and application processors.
- Microcontrollers are distinguished by having flash memory and working RAM on the chip. They are used for small automation tasks.
- Application processors primarily use external memory: DDR RAM and Flash. From here on we will simply call them processors. The scale of their tasks is larger.
For a long time, the same ARM7 and ARM9 architectures were used to build both processors and microcontrollers. With the arrival of the Cortex line the two separated: microcontrollers are now called Cortex-M, and processors are called Cortex-A and Cortex-R.
Types of OS
The options for running an OS are as follows:
- microcontrollers usually run a small real-time OS (RTOS) or just a program without an OS;
- processors often run a general-purpose OS (Linux, Android), sometimes a small RTOS, and sometimes a full-featured RTOS (such as VxWorks).
For example, tablets and smartphones use Android, iOS or a Linux variant. Telecommunications equipment may run Linux or one of the RTOS variants. Simpler equipment may use an RTOS or a program without an OS.
From here on we will only discuss starting a general-purpose OS (Linux, Android) or an RTOS on ARM. In terms of how they start, the "big" RTOSs fall into the same group as Linux, and the "small" RTOSs are grouped with programs that run without an OS.
ARM9, ARM11 and Cortex-A processors are well suited to running Linux. A cut-down version of Linux can also be run on ARM7, Cortex-M4 and Cortex-M7, but this is impractical.
ARM7, ARM9 and Cortex-M microcontrollers and processors are suitable for running a small RTOS. In some cases entry-level Cortex-A models, for example Cortex-A5, are used with an RTOS. Most Cortex-A processors are so complex that their capabilities can only be used together with the Linux / Android SDK supplied by the manufacturer, which tips the choice in favor of Linux.
OS bootloader
From the developer's point of view, the system software of a device is divided into the bootloader and the OS. The device's main function is always performed by a program running under the OS or RTOS.
The bootloader boots the OS and provides service functions, such as:
- checking the integrity of the OS image before launch;
- software update;
- service functions and initial setup of the device;
- self-test.
In the case of an RTOS, the bootloader is often written by the device developer and is a small specialized program. In the case of general-purpose operating systems, open source bootloaders such as u-boot are widely used.
Thus, from the product developer's point of view, the OS starts as follows:
Here the // sign marks the moment power is applied or the processor is reset. Some ARM7 processors really did start this simply. In later processors the startup process is more complicated than the diagram above shows, but for the developer of the end product this usually does not matter.
The "bootloader, then OS" scheme is very convenient in practice, because the bootloader takes over all the low-level work:
- it initializes memory before starting the OS and loads the OS kernel into memory;
- it initializes part of the peripherals;
- it often manages two stored OS images: a current one and a backup, or a recovery image;
- it checks the OS image before loading it;
- it provides a service mode even when the OS image is damaged.
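The image check and the current/backup fallback listed above can be illustrated with a minimal Python sketch. It is only a model of the idea, not code from any real bootloader: the `OsImage` structure, the slot names and the use of CRC32 as the integrity check are all assumptions made for the example.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class OsImage:
    name: str        # hypothetical slot label, e.g. "current" or "backup"
    payload: bytes   # raw image contents as read from storage
    stored_crc: int  # CRC32 recorded when the image was written


def image_is_intact(image: OsImage) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored value."""
    return (zlib.crc32(image.payload) & 0xFFFFFFFF) == image.stored_crc


def select_image(current: OsImage, backup: OsImage) -> Optional[OsImage]:
    """Prefer the current image, fall back to the backup, else return None."""
    for candidate in (current, backup):
        if image_is_intact(candidate):
            return candidate
    return None  # no bootable image: the caller would enter service mode


good = b"kernel image bytes"
current = OsImage("current", b"damaged image", zlib.crc32(good))
backup = OsImage("backup", good, zlib.crc32(good))
chosen = select_image(current, backup)
print(chosen.name if chosen else "service mode")
```

A CRC only detects accidental damage. It does nothing against a deliberately modified image, which is the case the signature checks discussed later in the article are meant to cover.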
For example, to run Linux on ARM, the bootloader must initialize memory and at least one terminal, load the kernel image and the Device Tree into memory, and transfer control to the kernel. All of this is described in <https://www.kernel.org/doc/Documentation/arm/Booting>. The Linux kernel initialization code will not do the bootloader's job for it.
At the same time, the bootloader is often weakly protected or not protected at all. On most home routers it is enough to open the case and connect to the UART header to reach the bootloader control menu. In higher-class telecommunications equipment, the bootloader menu can often be entered with an undocumented key combination or by pressing a button while the device powers on. In other words, the bootloader is often unprotected against a local intruder.
Let us walk through the bootloader's work step by step, using u-boot loading Linux as the example.
- After power-on or reset, the processor loads the u-boot image stored in Flash memory into RAM and transfers control to the first instruction of this image.
- u-boot initializes DDR RAM.
- u-boot initializes the drivers for the boot media, for example eMMC or NAND Flash.
- u-boot reads the region holding its configuration variables from the boot media. The configuration defines the boot script, which u-boot then executes.
- u-boot prints a prompt on the console offering to interrupt the boot process and configure the device. If the user does not respond within 2-3 seconds, the boot script runs.
- Sometimes the script begins by searching all available media for a suitable OS image to load. In other cases the boot media is hard-coded in the script.
- The script loads the Linux kernel image (zImage) and the Device Tree file with the kernel parameters (*.dtb) from the boot media into DDR RAM.
- In addition, the script can load an initrd image into DDR RAM: a small file system with the device drivers needed for startup. Modern Linux distributions sometimes use an initrd and sometimes do not.
- Once the 2 or 3 loaded files are in memory, the script transfers control to the first instruction of the zImage image (the Linux kernel).
- zImage consists of an unpacker and a compressed kernel image. The unpacker decompresses the kernel in memory, and the OS starts.
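Before handing over control, a bootloader can at least check that the files it has loaded look like what it expects. The Python sketch below shows such a sanity check on the two headers. The constants are the ARM zImage magic number (at offset 0x24, little-endian kernels) and the flattened Device Tree magic number as I understand them from the kernel boot documentation and the Devicetree specification, so they are worth confirming against the documentation for your kernel. The function names are invented for this example.

```python
import struct

ZIMAGE_MAGIC = 0x016F2818     # ARM zImage magic, little-endian kernels (assumed)
ZIMAGE_MAGIC_OFFSET = 0x24    # offset of the magic word inside the zImage header
DTB_MAGIC = 0xD00DFEED        # flattened Device Tree magic, stored big-endian


def looks_like_zimage(blob: bytes) -> bool:
    """Check for the zImage magic word at its fixed header offset."""
    if len(blob) < ZIMAGE_MAGIC_OFFSET + 4:
        return False
    (magic,) = struct.unpack_from("<I", blob, ZIMAGE_MAGIC_OFFSET)
    return magic == ZIMAGE_MAGIC


def looks_like_dtb(blob: bytes) -> bool:
    """Check the DTB magic and that the declared total size fits in the buffer."""
    if len(blob) < 8:
        return False
    magic, total_size = struct.unpack_from(">II", blob, 0)
    return magic == DTB_MAGIC and total_size <= len(blob)


# Synthetic buffers standing in for files read from the boot media.
fake_zimage = bytes(0x24) + struct.pack("<I", ZIMAGE_MAGIC) + bytes(8)
fake_dtb = struct.pack(">II", DTB_MAGIC, 16) + bytes(8)
print(looks_like_zimage(fake_zimage), looks_like_dtb(fake_dtb))
```

A header check like this only catches loading the wrong file or reading from the wrong offset. It says nothing about whether the image has been tampered with.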
Starting the bootloader: the preloader
In reality, however, the bootloader's instructions are almost never the first thing executed after the processor powers on or resets. That was still the case on ARM7 processors, but it has almost never been seen since.
Any ARM processor core begins execution after reset at address 0, where the reset vector is located. The older processor series literally booted from external memory mapped at address zero, so the first instruction the processor executed was a bootloader instruction. However, only parallel NOR Flash or ROM is suitable for this kind of boot. These types of memory work very simply: given an address, they return the data. A typical example of parallel NOR Flash is the BIOS chip in personal computers.
Modern systems use other types of memory, because they are cheaper and offer more capacity: NAND, eMMC, SPI / QSPI Flash. These types of memory no longer work on the "present an address, read the data" principle, so instructions cannot be executed from them directly. Even a simple read requires a driver, which creates a chicken-and-egg problem: the driver has to be loaded from somewhere first.
For this reason, modern ARM processors have a ROM with a preloader integrated into the chip. The ROM is mapped into the processor's address space at address 0, and it is from there that the processor begins executing instructions.
The preloader's tasks include the following:
- determining the configuration of the connected devices;
- determining the boot media;
- initializing the devices and the boot media;
- reading the bootloader from the boot media;
- transferring control to the bootloader.
The preloader configuration is usually set in one of two ways:
- in the circuit design, by tying specific processor pins to ground or to the power rail;
- by writing it to the processor's one-time programmable memory at the production stage.
In general, it is almost always possible to specify either a single boot option, or a primary option plus several alternatives. The alternatives may include an initial boot over USB or a serial port, which is very convenient for first-time programming in production.
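The "primary option plus alternatives" behavior can be sketched in a few lines of Python. Everything specific here is hypothetical: the strap-pin values, the media names and the `probe` callback are invented for illustration, and each real processor defines its own table in its reference manual.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mapping from strap-pin (or OTP) value to an ordered list of
# boot sources: the primary option first, then the alternatives.
BOOT_ORDER_BY_STRAPS: Dict[int, List[str]] = {
    0b00: ["emmc", "usb"],
    0b01: ["nand", "uart"],
    0b10: ["spi_flash", "usb"],
    0b11: ["usb"],
}


def pick_boot_source(
    straps: int,
    probe: Callable[[str], Optional[bytes]],
) -> Optional[Tuple[str, bytes]]:
    """Try each configured source in order; return the first that yields a bootloader.

    `probe` stands in for the media driver: it returns the bootloader image
    read from the given source, or None if nothing usable was found there.
    """
    for source in BOOT_ORDER_BY_STRAPS.get(straps, []):
        image = probe(source)
        if image is not None:
            return source, image
    return None  # nothing bootable on any configured source


# Example: eMMC is empty (fresh board on the production line), USB supplies a loader.
available = {"usb": b"bootloader-over-usb"}
print(pick_boot_source(0b00, available.get))
```

The fallback to USB or a serial port is what makes first-time programming possible, and it is also an entry point that deserves attention in a threat assessment.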
A preloader of this kind is built into ARM processors such as the Cortex-A and into microcontrollers, even small ones like the Cortex-M0. With the preloader included, the OS startup procedure looks like this:
Threat analysis at this stage
The preloader's source code is written by the processor manufacturer, not by ARM. It is part of the chip as the manufacturer's product and is protected by copyright. For example, in Atmel and NXP ARM processors the preloaders are written by Atmel and NXP respectively.
In some cases the preloader can be read out of ROM and analyzed, but sometimes access to it is restricted. For example, the preloader of the Cypress PSoC 4000 series was covered with several layers of protection (but was hacked by a talented hacker).
In most scenarios, using the preloader cannot be avoided. It can be thought of as a counterpart to the BIOS, which ARM systems do not have.
In fact, ARMv8-A has ARM Trusted Firmware: system software responsible, for example, for power management (PSCI). This code can be regarded as the BIOS for ARMv8. ARMv7 and earlier have no such standard software.
By itself, a preloader in ROM carries the risk of disrupting the boot order and executing arbitrary code. But once control has passed to the OS bootloader, the preloader is harmless: we can simply never transfer control to it again, reconfigure all interrupt handlers, and so on.
In some small microcontrollers, manufacturers put peripheral libraries into ROM, and these have to be called throughout the microcontroller's operation. In this case the system software (bootloader and OS) itself periodically transfers control somewhere into the preloader area, and the control flow looks as follows:
This is unsafe in general, but it occurs only in some ARM-based microcontrollers. Such microcontrollers usually run programs without an operating system or with a small RTOS, and the system designer is in a position to assess the risks.
Booting with TrustZone
TrustZone technology is built into ARM Cortex-A and Cortex-R processors. It provides two execution modes at the hardware level: Secure and Non-Secure (guest).
These processors are aimed mainly at the smartphone and tablet market, and TrustZone is used to create a trusted sandbox in Secure mode for executing code related to cryptography, DRM and the storage of user data.
In Secure mode a special OS is started, generically called a TEE (Trusted Execution Environment), while a normal OS such as Linux, Android or iOS starts in Non-Secure mode. The normal OS has restricted access rights to some devices, which is why it is also called a guest OS.
Because of these restrictions, the guest OS has to call TEE functions from time to time to perform certain operations. The TEE keeps running in parallel with the guest OS the whole time, and the guest OS can do nothing about it.
For example, the guest OS uses TEE functions for:
- turning processor cores on and off (in ARMv8-A this goes through PSCI, a part of ARM Trusted Firmware, and in ARMv7 it is done differently by each processor manufacturer);
- storing keys, bank card data and so on;
- storing full disk encryption keys;
- cryptographic operations;
- displaying DRM content.
From a security point of view, each such call transfers control to unknown, unverified code. We cannot say with certainty what Samsung KNOX or Qualcomm's QSEE is doing.
Why do system developers agree to this mode of operation? Processors with TrustZone also integrate a Secure Boot mechanism in one form or another.
With Secure Boot, the preloader verifies the signature of the boot image using a public key programmed into the chip during production. This guarantees that only a signed image will be loaded. This is a security feature.
As a result, the OS boot sequence becomes the following:
- the preloader starts from ROM. It loads from ROM the keys used to verify the TEE signature;
- the preloader loads the TEE image into memory and verifies its signature. If the check succeeds, the TEE starts;
- the TEE configures the Secure and Non-Secure modes. It then loads the main OS bootloader and switches to it in Non-Secure mode. The TEE itself stays in Secure mode and waits;
- the main OS bootloader loads the OS as usual;
- the OS is forced to call TEE functions from time to time to perform certain tasks.
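The verify-before-run step at the start of this sequence can be modeled in Python. One loud caveat: real Secure Boot uses an asymmetric signature checked against a public key in the chip, whereas this sketch substitutes HMAC-SHA256 with a shared key purely so that it runs on the standard library alone. The function names and return strings are invented for the example.

```python
import hashlib
import hmac


def sign_image(key: bytes, image: bytes) -> bytes:
    """Stand-in for the manufacturer's signing step (HMAC, NOT a real signature)."""
    return hmac.new(key, image, hashlib.sha256).digest()


def verify_image(key: bytes, image: bytes, signature: bytes) -> bool:
    """Stand-in for the preloader's signature check, using a constant-time compare."""
    return hmac.compare_digest(sign_image(key, image), signature)


def preloader_boot(rom_key: bytes, tee_image: bytes, tee_signature: bytes) -> str:
    """Start the TEE only if its image verifies against the key held in the chip."""
    if not verify_image(rom_key, tee_image, tee_signature):
        return "boot refused"
    return "TEE started"


rom_key = b"key programmed at production"   # hypothetical key material
tee = b"vendor TEE image"
signature = sign_image(rom_key, tee)

print(preloader_boot(rom_key, tee, signature))               # untouched image
print(preloader_boot(rom_key, tee + b" patched", signature)) # modified image
```

The sketch also makes the article's point visible: whoever holds the signing key decides which TEE the processor will agree to run.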
However, a manufacturer typically supplies signed bootloader and TEE images as part of the processor SDK, and ships processors that are already programmed with the manufacturer's key. In that case the preloader in ROM will not execute any bootloader unless it is signed by the manufacturer. All the main smartphone processors are now delivered already programmed to run the manufacturer's own TEE before the OS bootloader.
Then laziness takes over: with the TEE everything works, and without it nothing even starts. Developers use the SDK with the TEE, call closed binary code from the Linux kernel and do not worry.
How to check your project for calls into TrustZone
It may even seem that TrustZone does not exist at all, at least in your particular design. Checking this is easy.
The point is that all processors with TrustZone start in Secure mode and only then switch to Normal (Non-Secure) mode. If your OS is running in Normal mode, then some Secure OS (TEE) exists in the system and has put it into that mode.
The litmus test is the call into the TEE that enables the L2 cache. For some reason, the ARM architecture does not allow this to be done from the Normal World. Therefore, to enable the cache, the OS kernel has to make at least one call into TrustZone. This is done with a single instruction, smc #0, and you can search for it yourself in the Linux or Android kernel.
Of course, we searched ourselves and found such calls in the support code for a number of processors from Qualcomm, Samsung, Mediatek, Rockchip, Spreadtrum, HiSilicon, Broadcom and Cavium.
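A search like this is easy to script. The following Python sketch walks a source tree and reports lines containing an smc #0 instruction. The set of file suffixes and the regular expression are assumptions chosen for the example. Vendor code may also issue the instruction through macros or raw opcodes, which a plain text search like this will miss.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Matches "smc #0" with flexible whitespace, but not e.g. "smc #0x10".
SMC_PATTERN = re.compile(r"\bsmc\s+#\s*0\b", re.IGNORECASE)
SOURCE_SUFFIXES = {".c", ".h", ".S"}  # assumed set of interesting file types


def find_smc_calls(root: Path) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line text) for every line that contains smc #0."""
    for path in sorted(root.rglob("*")):
        if path.suffix not in SOURCE_SUFFIXES or not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if SMC_PATTERN.search(line):
                yield path, lineno, line.strip()


if __name__ == "__main__":
    import sys

    # Usage: python find_smc.py /path/to/kernel/source
    for file_path, lineno, line in find_smc_calls(Path(sys.argv[1])):
        print(f"{file_path}:{lineno}: {line}")
```

Each hit is a place where the kernel hands control to whatever is running in the Secure world.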
ARM Cortex-A boot and threat analysis
So, here is the promised process of booting an OS on ARM (here, Cortex-A) in four blocks:
In the diagram, the dotted line shows the path of calls from the OS kernel to the TEE.
Two of the blocks contain code that is unknown to us. Let us see what this means.
Technically, any component of the system software may contain errors, intentional backdoors and so on. However, in most cases the bootloader, the OS and the rest of the system software can be checked by examining the source code. We will concentrate on possible threats coming from the preloader and the TEE, whose source code is closed.
The preloader works at the earliest stage, when it is not yet known how the various peripherals are wired to the processor, no communication devices (WiFi, 3G and so on) are configured, and no communication protocols are running. In addition, a preloader is a small program, with a code size on the order of several tens of kilobytes, and it is hard to imagine it containing full protocol stacks or serious heuristics for identifying connected devices. The preloader is therefore unlikely to hide serious backdoors for surveillance, data exfiltration and the like.
The TEE is a much more interesting point of attack, because its functions are called while the OS is running, when all peripherals are working and the communication protocols are configured. A spying backdoor planted in the TEE code makes it possible to monitor the user of the device almost without limit.
In a small study, we showed that a backdoor in the TEE that stealthily intercepts Linux system calls is feasible. Activating the backdoor takes just one call from the Linux kernel into the TEE (for example, the one that enables the second-level cache), after which the system is fully under its control. This makes it possible to:
- control file reads and writes and modify data on the fly;
- intercept user input, including characters entered on the on-screen keyboard;
- quietly inject data into communication with remote servers, including over HTTPS, disguising the transfer of spied-on information as ordinary encrypted Web traffic.
Undoubtedly, the capabilities identified are only the tip of the iceberg, and creating backdoors was not the goal of the study.
Conclusions
We have reviewed the boot process of various ARM microcontrollers and processors.
In microcontrollers, the OS is the most vulnerable place in the boot process.
Modern ARM Cortex-A processors include TrustZone, and there is no getting away from it. TrustZone implies that a TEE, a trusted execution environment, is started before the OS.
The TEE is the most vulnerable point in the OS boot process on ARM Cortex-A, because calls into the TEE execute closed system code that is known to the manufacturer but hidden from us.
Without control over the TEE, it is impossible to ensure the security and trustworthiness of any OS running on ARM Cortex-A.