Quickly picked up doesn't count as dropped. Improving the fault tolerance of embedded systems

A year ago I had a rather interesting job: developing an embedded computer for an electronics company. The computer itself was nothing fundamentally interesting: a Cortex-A8 processor running at sub-gigahertz frequencies, 512 MB of DDR3, 1 GB of NAND, and a lightweight Linux build. However, the device the computer was built into, and therefore the computer itself, had to work in fairly harsh conditions. A wide temperature range (from -40 to +85 degrees Celsius), moisture resistance, resistance to electromagnetic interference, kilovolt pulses on the power line, protection against 4 kV static discharge, and many other interesting things that are well described in the various standards for such equipment: all of that applies to it. One of the customer's main requirements was a time to failure of at least 10 years. On top of that, the manufacturer provides warranty repair for five years, so the question is not rhetorical but a serious financial one. A suitable component base was designed into the product. The device passed its tests with honors and received the required certificates, but that is not what this story is about. The problems started when the pilot batch was built and the devices were handed out to departments and design offices for application software development. They started coming back with the diagnosis: "Something doesn't boot."

It was a FAIL

Inspection showed that in 100% of the failures the NAND partition with the root file system (rootfs) was damaged, while all other partitions were intact, mounted normally and could be read. Questioning the witnesses revealed that the device refused to start after a hard, emergency loss of power. The direction of the investigation was clear: a file system failure can be caused by cutting the power while the medium is being written. We built a test bench whose job is to power the device up, wait for Linux to boot and start a test script (it generates files and writes them to flash), and then cut the power. And so on in a loop. One full cycle took just over a minute. We put several devices on test. On average, after 2000 iterations each device refused to boot: the rootfs partition was corrupted! It looked like we had found the cause.
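The write-load half of such a bench can be sketched in shell roughly as follows. This is a minimal sketch, not the script that ran on the bench: the function name `stress_write`, the file naming and the sizes are assumptions made for illustration, and the power cut itself is done by external hardware.

```shell
#!/bin/sh
# Hypothetical write-load helper for a power-cut test bench.
# stress_write DIR COUNT SIZE_KB
#   Writes COUNT files of SIZE_KB kilobytes of random data into DIR and
#   flushes after each one, so that flash is busy when the power drops.
stress_write() {
    dir=$1
    count=$2
    size_kb=$3
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        dd if=/dev/urandom of="$dir/file_$i.bin" bs=1024 count="$size_kb" \
            2>/dev/null || return 1
        sync
        i=$((i + 1))
    done
}

# On the target this would be called in an endless loop after boot, e.g.:
#   while :; do stress_write /var/stress 16 64; done
```

Flushing after every file keeps the flash busy almost all the time, which raises the chance that the power cut lands in the middle of a write.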

For reasons of durability and reliability, our device uses SLC NAND as its non-volatile storage. Options with eMMC (embedded MultiMediaCard) were rejected right away because of the small number of rewrite cycles. Today eMMC is not a standard choice for industrial applications, which is probably why so few such chips are offered with a lower operating temperature limit of -40 °C. The main limitation for use in industrial systems is the short guaranteed data retention period: for SLC NAND it is about 10 years, for eMMC about a year.

Unlike an eMMC-based solution (or an ordinary SD “Secure Digital” card), where the software layer that talks to the physical medium (FTL - Flash Translation Layer) is implemented by a controller built into the memory, with raw NAND the FTL has to be implemented on the central processor. Although this complicates the implementation, it gives tangible advantages in the configuration flexibility of the final system and, thanks to the possibility of using specialized algorithms for leveling the wear of physical memory cells, increases the durability of the medium. (In fact, the FTL built into eMMC also implements wear-leveling algorithms, but it is a “black box”.)

Linux uses a number of file systems for working with raw NAND media: JFFS2 and its evolutionary successor UBI/UBIFS (thanks to Nokia for this), as well as a competitor, LogFS. Based on the combination of parameters, we chose the UBI/UBIFS pair. UBI/UBIFS is two software layers: UBI (Unsorted Block Images) works directly with the physical medium, and UBIFS (UBI File System) is the file system proper.

Main features of UBI:

works with volumes (partitions): lets you create, delete, or resize them;
levels wear by spreading writes across the entire medium;
handles bad blocks;
minimizes the likelihood of data loss on power failure or other faults.

UBIFS, among other things, takes care of journaling.

Although UBI and UBIFS were, in general, designed with tolerance to power interruption in mind, practice has shown that under certain conditions the partition gets corrupted after an emergency shutdown (in other words, a power failure). If it is the rootfs partition, the device as a whole stops working. The probability of this event is low: the device can run stably for several months or even several years and survive power failures along the way. However, this factor cannot be ignored if the device is meant to work in a hard-to-reach place with limited human access, or if its failure can have fatal consequences.

The cause of the failure lies in the physical structure of NAND. Data is written page by page, and a page must be erased beforehand, which sets all of its bits to ones. Erasing is done in blocks; such a block is called a PEB (physical eraseblock). To erase a page, you have to erase the entire block. One block can hold many pages, for example 4 KB pages in a 256 KB block. The developers of UBI/UBIFS are aware of this problem and put the blame on so-called “unstable bits”. They point to four main situations in which data on the medium may be lost.
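To make the page/block relationship concrete, here is the arithmetic for the example sizes from the text. The function name `pages_per_block` is only an illustration; real chips report their geometry themselves.

```shell
# Number of NAND pages that share one erase block.
# Both sizes are given in bytes.
pages_per_block() {
    page_size=$1
    block_size=$2
    echo $((block_size / page_size))
}

# Example from the text: 4 KB pages, 256 KB block.
pages_per_block $((4 * 1024)) $((256 * 1024))   # prints 64
```

An erase therefore always touches 64 pages at once, so an interrupted erase can leave unstable bits in far more than a single page.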

Causes of failure and loss of information in NAND

Power was cut before work on a memory page was completed. After a reboot the page may read correctly, but a later read may return an ECC error. This happens because a number of unstable bits have appeared, which may read correctly or incorrectly.
Power was cut just as work on a NAND page began. After a reboot the page may read correctly, as all ones (0xFF), but sometimes only zeros are read from the same area. In addition, if this page is then written again, an ECC error may sometimes occur. The reason is, again, unstable bits.
Power was cut during a block erase. After a reboot unstable bits may again appear, and the data in the block becomes corrupted.
Power was cut just after a block erase operation had started. Again, after a reboot the block contains unstable bits: it either returns zeros when read, or yields corrupted data when you try to write information to it.

In all these cases, after an emergency power loss the memory area may read correctly, so the journaling system will not notice anything wrong. But on a subsequent access to this area the data may turn out to be damaged. The number of such “unstable bits” may exceed what the ECC algorithm can correct. As a result, previously readable pages become unreadable or, conversely, a previously unreadable page suddenly becomes readable. The problem is made worse by the fact that unstable bits may appear in the file system journal, since statistically this is the most frequently modified area of the NAND.

Saving the system

To increase the survivability of the file system, we decided to introduce redundancy into the architecture of the root file system. The idea is as follows: we build one “virtual” partition out of two physical partitions on the medium. One partition contains a rootfs image and is read-only; while the operating system is running, all changes are written to the second partition, which is available for both reading and writing. Since writes go only to the second partition, only that partition can be damaged by a power failure. The first partition stays pristine. This technique is known as a union mount.

In addition, we decided to place the system software (meaning rootfs; the kernel was on a separate read-only partition from the start) and the application software on different physical partitions. Because of the specifics of our device (it works with massive databases), we also set aside a partition for backups. At this point we were glad that the device had been given plenty of memory (1 GiB).

The union mount is done with the auxiliary file system aufs. As mentioned above, two physical partitions are merged. The first partition, which initially holds the image of the working root file system, is available only for reading (RO - read only); the second partition, initially empty, stores the changes and is therefore available for both reading and writing (RW - read write). In aufs terms, the first and second partitions are called branches. The branches are merged at mount time, so the operating system sees the mounted area as a single whole. Access to the data goes through the kernel driver. For a read, the driver first looks for the file in the RW branch; if the data is there, it is returned, and if not, the file is read from the RO branch. Writes go to the RW branch. When a file is deleted, a marker saying that the file has been deleted is added to the RW branch (an empty hidden file with a special prefix in its name is created). Physically the file remains in the RO branch. This approach avoids write operations on the partition with critical information. In addition, since the RO branch is read-only, it is in principle possible to add extra data integrity control. This can be implemented with UBI/UBIFS by creating the volume as static. A static volume is read-only, and its data is protected by a checksum (CRC-32).
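The lookup and delete rules described above can be imitated with two ordinary directories. The following is only a simulation of the described behavior, not aufs code: the function names are hypothetical, and the `.wh.` prefix is the one aufs uses for its whiteout files.

```shell
# Split "dir/name" into its directory part ($sub) and file name ($name).
split_path() {
    name=${1##*/}
    case $1 in
        */*) sub=${1%/*} ;;
        *)   sub=. ;;
    esac
}

# union_lookup RW RO PATH: print the real path that a read would hit.
union_lookup() {
    rw=$1; ro=$2; rel=$3
    split_path "$rel"
    [ -e "$rw/$sub/.wh.$name" ] && return 1    # whiteout: file was deleted
    if [ -e "$rw/$rel" ]; then echo "$rw/$rel"; return 0; fi
    if [ -e "$ro/$rel" ]; then echo "$ro/$rel"; return 0; fi
    return 1
}

# union_delete RW PATH: hide a file without touching the RO branch.
union_delete() {
    rw=$1; rel=$2
    split_path "$rel"
    mkdir -p "$rw/$sub" && rm -f "$rw/$rel" && : > "$rw/$sub/.wh.$name"
}
```

For example, after `union_delete rw etc/conf` a call to `union_lookup rw ro etc/conf` fails even though `ro/etc/conf` is still physically present, which is exactly why the RO branch never needs to be written.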

In total, we want to end up with the following root file system architecture:

The “rootfs_” partitions contain the system part of the root file system, which keeps the Linux operating system running, and the “data_” partitions store application software, configuration files, and databases. The “backup” partition is for periodic backups of the current system settings and databases. Backups are handled by the application software.

Cooking aufs

Currently aufs is not included in the mainline Linux kernel, so in addition to building the utilities for working with it, you have to apply the patches to the kernel sources yourself. To deploy aufs on the target Linux platform you need to:

Apply the patches to the kernel. All patches and a how-to can be found on the project website.
Enable aufs in the kernel configuration.
Build the kernel.
Build the utilities for working with aufs.
Copy the kernel and the utilities to the target.

You can check the technology on the target by running:

mount -t aufs -o br=/tmp/rw=rw:${HOME}=ro none /tmp/aufs

Command format:

 mount [-fnrsvw] [-t FS_type] [-o parameters] device directory
 mount -t aufs -o br=/tmp/rw:${HOME} none /tmp/aufs

As a result, the contents of the home directory appear in /tmp/aufs; you can write and delete files there, and the contents of ${HOME} will not change.

Great! aufs is up and running; now for the most interesting part: how do we make the system boot from it? By default we cannot point the kernel at a rootfs on aufs via cmdline at boot. When the kernel starts, no such partition exists yet; it still has to be created. This means that during system startup, before the init process starts (the process with PID 1, systemd in my case), we have to mount the auxiliary aufs partition, chroot into it, and only then run /sbin/init. There is a pre-initialization mechanism for tasks like this: in cmdline you specify the path to a script that runs before the init daemon starts. Add this parameter to cmdline:

 init=/sbin/preinit

The script is written in shell, so by the time it runs the system must already have all the utilities it needs. In other words, to execute the script, the partition with rootfs must already be mounted! For this you can use a rootfs on a RAM disk, or boot from the production rootfs partition right away, but in read-only mode; the latter is our choice. Edit cmdline accordingly and add the parameters (9 is the number of the MTD partition that holds my rootfs_ro):

 root=ubi0:rootfs_ro ro ubi.mtd=9

Preinit script

Mount the system file systems (the shell needs them):

 mount -t proc none /proc
 mount -t tmpfs tmpfs /tmp
 mount -t sysfs sys /sys

The rootfs_ro partition is already mounted (we booted from it); now we mount rootfs_rw into a temporary directory:

 ubiattach -m 10 -d 1 > /dev/null
 mkdir -p /tmp/aufs/rootfs_rw
 mount -t ubifs ubi1:rootfs_rw /tmp/aufs/rootfs_rw

If something went wrong during mounting, we can safely format rootfs_rw, and if that does not help, delete the volume and create it again. Then we try to mount once more. I will not give the code here: it contains too many “magic numbers” determined by the NAND architecture. I will only say that you will need the UBI utilities.
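The control flow of such a fallback, without any of the NAND-specific numbers, might look like the sketch below. It is not the author's script: `mount_with_recovery` and the hook names in the usage note are hypothetical, and the sketch retries the mount after every recovery step that succeeds, which is a slight generalization of the description above.

```shell
# mount_with_recovery MOUNT_FN RECOVER_FN...
# Try MOUNT_FN; on failure run each RECOVER_FN in turn and retry the
# mount after every recovery step that reports success.
mount_with_recovery() {
    mount_fn=$1
    shift
    "$mount_fn" && return 0
    for recover_fn in "$@"; do
        "$recover_fn" || continue    # step failed: escalate to the next one
        "$mount_fn" && return 0
    done
    return 1
}
```

On the target it could be called as `mount_with_recovery mount_rw format_rw recreate_rw`, where the three hooks are shell functions you write yourself around `mount` and the UBI utilities; their exact arguments depend on the NAND geometry and are deliberately left out here.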

Bind the current rootfs mount point into a temporary directory:

 mkdir -p /tmp/aufs/rootfs_ro
 mount --bind / /tmp/aufs/rootfs_ro

We assemble the layer cake by mounting the aufs partition:

 mount -t aufs -o br:/tmp/aufs/rootfs_rw:/tmp/aufs/rootfs_ro=ro none /aufs

After that, the new rootfs is available in /aufs.

Now a neat trick: move the rootfs_ro and rootfs_rw mount points into the new root:

 mount --move /tmp/aufs/rootfs_ro /aufs/aufs/rootfs_ro
 mount --move /tmp/aufs/rootfs_rw /aufs/aufs/rootfs_rw

And move /dev while we are at it:

 mount --move /dev /aufs/dev

Obviously, the directories that the mount points are moved into must be created in advance.
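One way to make sure of that is a small helper that creates the target directories for the moves shown above. The helper name `ensure_mount_points` is an assumption; the directory names are the ones used by the `mount --move` commands.

```shell
# ensure_mount_points NEW_ROOT
# Create the directories that the mount points are moved into.
ensure_mount_points() {
    new_root=$1
    mkdir -p "$new_root/aufs/rootfs_ro" \
             "$new_root/aufs/rootfs_rw" \
             "$new_root/dev"
}
```

In the preinit script it would be called as `ensure_mount_points /aufs` after the aufs mount and before the moves. Alternatively, the directories can simply be baked into the rootfs image.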

Clean up after ourselves by unmounting the system file systems:

 umount -l /proc
 umount -l /tmp
 umount -l /sys

Switch to the new root file system and start init:

 exec /usr/sbin/chroot /aufs /sbin/init

In the production script we build the same kind of “cake” for /appl and mount /backup. The figure below shows the resulting architecture of the final root file system.

To improve reliability, exclusive access to the /backup partition is given to exactly one utility, which is responsible for backup and restore. The utility itself lives in the data_ro partition.

Conclusion

As a result, the overall survivability of the system in the event of an emergency power loss increased dramatically. Although union mounting of the root file system is shown here on NAND, the principle does not depend on the physical type of storage and carries over easily to eMMC, SD, and others. If the system does not accumulate data during operation and only executes a fixed algorithm (an ordinary router, for example), it makes sense to use a RAM disk as the RW branch when mounting the aufs partition.

And in place of a P.S.: nobody has abolished backup power yet.
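A RAM-backed RW branch could be set up along these lines. This is a sketch under assumptions: the helper names `run` and `mount_ram_overlay`, the `DRY_RUN` switch and the default tmpfs size of 16m are all illustrative, and the branch syntax simply follows the aufs mount commands shown earlier.

```shell
# Run a command, or only print it when DRY_RUN is set (handy off-target).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# mount_ram_overlay RO_DIR RW_DIR TARGET [TMPFS_SIZE]
# Put a tmpfs (RAM) branch on top of a read-only branch.
mount_ram_overlay() {
    ro_dir=$1
    rw_dir=$2
    target=$3
    size=${4:-16m}    # assumed default, tune to the available RAM
    run mkdir -p "$rw_dir" || return 1
    run mount -t tmpfs -o "size=$size" tmpfs "$rw_dir" || return 1
    run mount -t aufs -o "br:$rw_dir=rw:$ro_dir=ro" none "$target"
}
```

With this layout every change disappears at power-off, so there is nothing left on flash that a power failure could corrupt.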

Further reading

UBIFS - UBI File-System
Official site of the AUFS project. Build instructions and source code can be found there.

Source: https://habr.com/ru/post/273425/
