Flash memory for booting FreeNAS and other embedded OS

Abstract

Analysis of errors and technical failures is traditionally the least disclosed yet the most interesting topic, since it shows how successful an engineering idea really was. Having built a NAS server from old hardware some time ago, we now turn to the failures that occurred with it. This article discusses failures caused by the bootable flash drive and their probable causes, and gives recommendations on choosing flash memory. It is partly applicable to other embedded systems, such as home DVRs.

Another do-it-yourself NAS, part 2: good memories *

* The advertising slogan of a well-known memory manufacturer reads “Good memories start here”.

Almost all experts recommend not skimping on the bootable flash drive and not using popular no-name brands. Even though the FreeNAS version 9 root file system is mounted read-only, compact consumer USB flash drives failed twice in half a year, in both cases after roughly a couple of months of operation. Moreover, both brands used are, in the opinion of some industry representatives I asked, quite decent and respected, and not known for shipping outright defective products.

In the photo: two quite adequate representatives of tiny flash drives for 8GB and 16GB

What is the matter? Is the flash memory resource endless? Let's try to figure it out.

How it was

The FreeNAS-based system worked well for about two months on average. Then the daily email reports stopped arriving, and the web administration interface went down after them. But the server did not give up so easily: SSH worked, and the network share services bravely held out to the last, continuing to serve the Business. Commendable resilience.
While SSH access still worked, I looked for the cause of the web interface failure and found the problem page (a Python script), in which individual letters in the text had quite clearly been replaced. It is hard to describe in words, but it looked as if, at strictly equal character intervals, the programmer's finger had landed between the keys. Then the programmer seemed to lose it completely, or pulled his stuck finger out of the keyboard in a panic, and the meaningful script broke off and turned into a random set of bytes. Is it a bit flip? It turned out that other users had seen a very similar picture. Rewriting the flash drive completed without a single error and hid the defect, and many hours of testing with Mikhail's utility (chkflsh, see Links) did not reveal any errors either. Just the perfect flash drive. The most annoying part is that I wiped the sample of corrupted data, which I regret very much. Some kind of devilry.

Who is to blame?

Let me say right away that I have not yet found an engineering-grade, confirmed answer to question No. 1 of the Russian intelligentsia. But I want to dispel some myths and explain the role of the notorious human factor. At least my problem has a solution, and I give it below.

offtopic to question number 1

“Who is to blame?” is a novel in two parts by Alexander Ivanovich Herzen.

What is a flash drive

This topic was covered well by the popularizer and Habr author Tiberius in the article Inside Look: Flash Memory and RAM. If you have not read it, do: it really is a look from the inside, bravo! I will try to cover it compactly from a different angle. There are less invasive ways to get inside a flash drive; see, for example, http://flashboot.ru/iflash/ .
So, a modern consumer flash drive is a tiny computer with its own processor (microcontroller), a small amount of RAM and ROM, data buses, an I/O interface and the NAND flash memory itself, usually on separate chips.
The microcontroller communicates with the host over the USB bus, reads, erases and writes blocks (“sectors”), can calculate block checksums, manages the wear of its NAND chips (see wear leveling) and does a lot of other things that we mere mortals have no idea about.

offtopic microcontrollers

~~The magic world of microelectronics~~ The microcontroller market, by the way, has taken a big step toward amateurs and enthusiasts in recent years: consider the Arduino platform or the Chelyabinsk project by DI HALT, God bless them. So those who are tired of cars and boats can try new toys, with that particular smell of industrial brutality ;-)

What the flash drive manufacturer does

In my opinion, this is a very simple question: the manufacturer assembles the “computer” described above from components, prepares the software (microcontroller firmware), puts its brand on the finished product and sells it. The reliability of a flash drive depends on the quality of the components and on the firmware. Sometimes the components are made in-house, sometimes by third parties (there are very large fabs). A serious firm does not do shoddy work, as the saying goes: the company values its reputation, and therefore it tests the chips.
There is a widespread view that chip (component) manufacturers sort products of the same type into different quality classes, called “baskets” (bins) in the jargon: say, one for developed markets and one for all the others, or one for business and one for consumer use. However the split is made, what matters to the consumer (home and business alike) is that the same vendor product with the same part code does not necessarily have the same “insides”. Therefore, buying electronics on eBay at half price carries the risk of getting goods with chips intended for Third World markets, despite a complete match of all external codes. At least, this opinion is shared by local electronics vendors, who have to compete with global online commerce.

What happens to flash memory

This topic was introduced by user alexzeynikov in his article A brief excursion into the history of flash memory; see also the translated 2007 post RAM, ROM, NAND, NOR - what do these capital letters mean?
Consumer flash drives use NAND memory. A very quick look at the NAND flash market reveals the following main chip technologies (in chronological order of market entry): single-level cells, SLC (1 bit per cell); multi-level cells, MLC (the most popular, 2 bits using 4 charge levels); and TLC, which is gaining popularity (3 bits, 8 charge levels). Information is packed more densely by using several distinct charge levels in one microelectronic element (cell), squeezing more stored bits into the same physical volume and, most importantly, at approximately the same production cost. To avoid arguing over how to expand the abbreviations MLC and TLC, I recommend the Russian-language Wikipedia article: Flash memory, SLC and MLC devices (thanks to a5b). A 16-level technology is on the way.
Understandably, density is paid for with storage reliability and, therefore, with more sophisticated error handling. In theory it is sometimes easier to beat the errors down with powerful “mathematics”; it all depends on the parameters of the system.
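The relationship between bits per cell and charge levels mentioned above can be written down in a few lines. This is only an illustrative sketch: the function names are made up for this example, and the only data used are the bit counts quoted in the text.

```python
# Bits stored per cell for each technology, as quoted in the text.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3}


def charge_levels(bits_per_cell):
    """A cell that stores n bits must distinguish 2**n charge levels."""
    return 2 ** bits_per_cell


def bits_for_levels(levels):
    """Inverse relation; valid when `levels` is a power of two."""
    return levels.bit_length() - 1


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell, {charge_levels(bits)} charge levels")

print(f"16 levels would store {bits_for_levels(16)} bits per cell")
```

Each extra bit doubles the number of levels that have to fit into the same voltage window, which is why the margin between neighbouring levels shrinks so quickly.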

A piece of the evolution of household flash memory. From left to right: 2GB, 8GB, 16GB, 32GB

By application, memory can be divided into industrial (space, energy, high-tech weapons, etc.), business (servers, professional equipment) and consumer (games, music and pictures on a smartphone). By the way, car DVRs and professional photo and video cameras are almost industrial applications, because they record nearly continuously over fairly wide temperature ranges (though, fortunately, mostly without radiation). So do not blindly chase capacity alone when choosing a flash card for your DVR: it is not a car stereo but potentially evidence in court, with all that this implies. Take a professional or industrial product.

What is the rewrite endurance of flash memory

NAND flash memory is well known to be subject to electrical wear and to have a finite number of write cycles. For high-quality SLC memory, an endurance of 100 thousand write cycles is considered adequate, and that is what we see in industrial products. However, everything has its price, and SLC stores little data compared with MLC and TLC of the same dimensions and cost. This is where it gets interesting: manufacturers are actively bringing ever more capacious products to market, but are vague about their actual electrical endurance, because it hurts sales. At the same time, no one denies that the endurance of MLC is lower than that of SLC, but higher than that of TLC.
So how much? Some sources give the following numbers.

Evaluation of rewriting resource for different types of NAND-memory

Technology	Rewrite Resource, cycles
SLC 34nm	100,000
MLC 34nm	10,000
MLC 24nm IMFT	5,000
MLC 20nm	3,000
TLC 20nm	1,000

That is, in supercompact TLC (which I expect to find in a 32GB microSD, though it is better to check with Dr. X-Ray) we can expect “only” about 1000 rewrite cycles. However, the reader should not panic right away: this is usually enough for storing music and photos on a smartphone. The microcontroller of any modern flash drive should distribute the wear evenly, so there should be no blocks that are “worn to holes” while others sit untouched, regardless of the file system type. In theory, to wear out an 8GB flash drive rated for 1000 cycles, you need to write about 8TB of data to it in total. In practice it will of course die earlier, but if you use good brands with high-quality chips and do not infect your smartphone with an evil flash-killing virus, everything will work well and for a long time.
Here is an engineering paradox: in theory, an old, bulky 2GB flash drive (probably SLC), even with only the remainder (!) of its endurance left, can outdo a brand-new supercompact 16GB “crumb” (probably MLC or TLC) in reliability. Although I would not test this argument on a production server.
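The back-of-the-envelope estimate above (capacity times rewrite cycles) is easy to reproduce. The sketch below assumes ideal wear leveling and no write amplification, uses the cycle estimates from the table above rather than measured values, and treats the cell types of the two drives in the “paradox” as guesses, just as the text does.

```python
# Estimated rewrite cycles per technology, copied from the table above.
ESTIMATED_CYCLES = {
    "SLC 34nm": 100_000,
    "MLC 34nm": 10_000,
    "MLC 24nm IMFT": 5_000,
    "MLC 20nm": 3_000,
    "TLC 20nm": 1_000,
}


def total_write_volume_tb(capacity_gb, rewrite_cycles):
    """Ideal amount of data (in TB, 1 TB = 1000 GB) that can be written
    before wear-out, assuming perfectly even wear across all blocks."""
    return capacity_gb * rewrite_cycles / 1000


# The example from the text: an 8GB drive rated for 1000 cycles.
print(total_write_volume_tb(8, ESTIMATED_CYCLES["TLC 20nm"]), "TB")

# The "paradox": an old 2GB drive (assumed SLC) with only 10% of its
# endurance left, against a new 16GB drive (assumed TLC).
old_slc_remaining = 0.10 * total_write_volume_tb(2, ESTIMATED_CYCLES["SLC 34nm"])
new_tlc = total_write_volume_tb(16, ESTIMATED_CYCLES["TLC 20nm"])
print(f"old 2GB SLC, 10% left: {old_slc_remaining:.0f} TB")
print(f"new 16GB TLC: {new_tlc:.0f} TB")
```

The 10% figure is an arbitrary illustration of “the remainder of its endurance”; real drives will fall short of these ideal numbers.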

(FreeNAS || NAS4free) && NAND

An attentive reader will, of course, ask: what does all this mean for FreeNAS if its root file system is mounted read-only? That hits the nail on the head.
The FreeNAS boot device requires 2GB, of which approximately 1GB is occupied by the root file system, which is indeed mounted read-only. In addition, a small read-write /data partition is created on the same flash drive to store the settings and the useful system statistics gathered by collectd (so as not to “forget”, say, a month of RAM usage history on reboot). The other 1GB is not used.
By the way, NAS4free, a relative of FreeNAS, works a little differently. It creates a single root partition that holds both the system and the settings (it also offers, unobtrusively, to create a swap partition on the flash drive). NAS4free's system statistics are rather rudimentary and do not survive reboots (there is not much there to survive, and for many this is not critical). More importantly, the NAS4free settings (in XML form) are stored on the read-only partition, so to save them the entire root file system has to be remounted from read-only to read-write and then back to read-only. Quite awkward, but it works.

Subtotal

Given the finite endurance of flash memory, both FreeNAS and NAS4free are a good choice thanks to the read-only file system.
A 2GB flash drive is definitely enough for FreeNAS, with a margin, and by design no other partitions can be created on the flash drive (in NAS4free they can).
Because it preserves statistics, FreeNAS regularly writes to the flash drive, although in small portions (about 1 MB per hour, or about 8 GB per year, but this is a very rough estimate).
NAS4free does not write to the flash drive regularly, but at the cost of stripped-down system statistics and of keeping the OS and the settings on the same root partition (with all the consequences: start in read-only, switch to read-write, save the settings, switch back to read-only, up to and including the need to reboot).
Judging by the external symptoms, it is unlikely that our failures were caused by write wear of the flash drive, although that is the first thing that usually comes to mind. A broken web script located on the read-only partition is hard to explain as a result of writing.
One thing can be said quite definitely: all other things being equal, the physically smaller the flash drive, the lower its endurance and reliability.
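To see why write wear looks like an unlikely culprit, the rough FreeNAS write rate from the list above can be set against the ideal write budget of a drive. This is a sketch under loud assumptions: the 1 MB per hour figure is the author's rough estimate, the drive parameters are examples, and `write_amplification` is a hypothetical fudge factor for small writes that cost the drive more than their nominal size.

```python
HOURS_PER_YEAR = 24 * 365


def yearly_write_gb(mb_per_hour):
    """Data written per year in GB (1 GB = 1000 MB)."""
    return mb_per_hour * HOURS_PER_YEAR / 1000


def years_to_wear_out(capacity_gb, cycles, mb_per_hour, write_amplification=1.0):
    """Years until the ideal write budget (capacity * cycles) is used up.

    write_amplification is an assumed multiplier, not a measured value.
    """
    budget_gb = capacity_gb * cycles
    return budget_gb / (yearly_write_gb(mb_per_hour) * write_amplification)


print(f"FreeNAS statistics: about {yearly_write_gb(1):.2f} GB per year")
for factor in (1, 10, 100):
    years = years_to_wear_out(8, 1000, 1, write_amplification=factor)
    print(f"8GB drive, 1000 cycles, amplification x{factor}: {years:.0f} years")
```

Even with a pessimistic amplification factor, the estimate stays far above the two months the drives actually lasted, which is consistent with the conclusion that something other than write wear was at work.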

The myth of infinite flash reads

It is commonly believed that the number of read cycles of a flash drive is infinite. For NAND memory, however, this is not quite true, if only because of the read disturb effect described by Jim Cooke in the report The Inconvenient Truths of NAND Flash Memory (direct link; it may move, but a search engine should find the report by its title; see slides 19-20). This effect is electrically reversible and should be removed completely transparently by the built-in microcontroller, using error correction (see below) and block relocation. One phrase put me on alert:

Disturbed bits are effectively managed with ECC

This means that bit flips in NAND are expected and can be corrected on the fly with correction codes. It is too early to panic, though, because the same thing has long been happening in hard disks, communication devices and elsewhere.
Interestingly, according to the same report, SLC NAND withstands about 1 million read cycles and MLC about 100 thousand. The microcontroller must take this into account and copy a block at risk to a new location in advance, which removes the disturbance and frees the old block. Meanwhile, error control should monitor the integrity of the data, and if a block is damaged beyond the capability of the correction scheme in use, the flash drive should return a read error.
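The relocation logic described above can be pictured with a toy model. This is not how any real controller firmware is written; the class, its methods and the 90% safety margin are invented for illustration, and only the read limits come from the report cited in the text.

```python
from dataclasses import dataclass

# Approximate read limits per block, as quoted from the report.
READ_LIMIT = {"SLC": 1_000_000, "MLC": 100_000}


@dataclass
class Block:
    data: bytes
    reads: int = 0


class ToyController:
    """Toy model: move a block to a fresh location before its read
    count reaches the limit for the given cell type."""

    def __init__(self, cell_type, margin_percent=90):
        # Relocate early, at an assumed 90% of the quoted limit.
        self.threshold = READ_LIMIT[cell_type] * margin_percent // 100
        self.blocks = {}
        self.relocations = 0

    def write(self, address, data):
        self.blocks[address] = Block(data)

    def read(self, address):
        block = self.blocks[address]
        block.reads += 1
        if block.reads >= self.threshold:
            # Copy the data to a fresh block (read counter starts at zero);
            # the old physical block would be erased and freed.
            self.blocks[address] = Block(block.data)
            self.relocations += 1
        return block.data


controller = ToyController("MLC")
controller.write(0, b"web interface script")
for _ in range(200_000):
    controller.read(0)
print("relocations after 200,000 reads:", controller.relocations)
```

A firmware that skips or botches this bookkeeping would let a heavily read block drift past what its ECC can repair, which is exactly the scenario the article speculates about.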
For a number of reasons I deliberately avoid discussing “full-fledged” SSDs in this article, but I assume that something similar happens in them, at different speeds, with more elaborate logic and richer peripherals. And since we have touched on SSDs, let me remind you of the notorious 25% of free space (Things you don’t need to do with a solid-state drive (SSD), or Exploring the Relationship in Modern SSDs).
Personally, I can suggest only one explanation for the bit flip ~~programmer's stuck finger syndrome~~ described above: could it be the read disturb effect, which slipped past the integrity check because of a bug in the microcontroller firmware or excessively simplified logic? This is the most provocative question of this article.

By the way:

For those who want to know how an SSD is built

Colleagues, chip in for a test specimen and give it to Tiberius; perhaps he will put his own business aside, split the specimen into atoms and write another awesome article. Only for the disk it will be strictly a one-way ticket :)

Who controls errors and how

Warning: ECC is sometimes expanded as Elliptic Curve Cryptography, but in this article it means Error Correction Code.
Data integrity is the microcontroller's concern, and it uses special coding algorithms for this. As is well known, the length of the checksum determines the maximum number of erroneous bits that can be detected (and sometimes corrected). Recall RS232: one extra parity bit can detect a single erroneous bit in a block. But two erroneous bits will go unnoticed; catching them takes more check bits and a smarter algorithm. And so on: the smarter the algorithm and the more “spare” bits are put into the message, the better the system can eliminate errors without resending (recopying) the entire message. Error-resistant coding is, as they say, what our entire information world rests on.
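The parity example is small enough to try directly. The sketch below uses even parity over an arbitrary 8-bit payload; the helper names are made up for this illustration and are not taken from any serial-port library.

```python
def parity_bit(bits):
    """Even parity: the extra bit makes the total number of ones even."""
    return sum(bits) % 2


def is_valid(frame):
    """A frame (data bits plus parity bit) is valid if it has an even number of ones."""
    return sum(frame) % 2 == 0


def flip(frame, index):
    """Return a copy of the frame with one bit inverted."""
    damaged = list(frame)
    damaged[index] ^= 1
    return damaged


data = [1, 0, 1, 1, 0, 0, 1, 0]
frame = data + [parity_bit(data)]

one_error = flip(frame, 2)
two_errors = flip(one_error, 5)

print("intact frame valid:", is_valid(frame))
print("one flipped bit detected:", not is_valid(one_error))
print("two flipped bits detected:", not is_valid(two_errors))
```

The last line reports that the double error was not detected: the second flip restores even parity, so the damaged frame looks perfectly healthy.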
Turning to the document TN-29-17: NAND Flash Design and Use Considerations (link) from one of the NAND chip manufacturers, we find a recommendation for flash drive developers (i.e. the “assemblers” who then use these chips as components):

Use More Powerful ECC :
It is correct that the correctable limit is true. The device will begin reading from the new location.

That is, as usual, the chip manufacturer states minimum and recommended requirements for the complexity (and cost) of the microcontroller, and the developer chooses between them based, of course, on how the memory will be used. For example, industrial tasks call for expensive logic with a long ECC code, while everyday tasks can get by with a simpler code and cheaper chips.
Another document found at random is the Texas Instruments Raw NAND ECC wiki page, which for MLC recommends 4-, 8- or 16-bit ECC for each 512-byte block:

Why is ECC required for NANDs?
Data stored in NANDs can get corrupted (randomly). The process of the process is subject to an error. SLC NANDs have less ECC requirements than MLC NANDs. The NAND datasheet gives the ECC requirement for the NAND device. For SLC NANDs, 1 / 4bits per 512 bytes are common currently. For MLC, devices with 4/8/16 bits per 512 bytes ECC requirements are in the market.

It also links to popular algorithms: single-bit errors are “cured” by Hamming codes, multi-bit errors are commonly handled with Bose-Chaudhuri-Hocquenghem (BCH) codes, and somewhere in between are the Reed-Solomon codes popular in the data storage industry (a special case of BCH). Here is another randomly found document on the topic: Types of ECC Used on Flash (link).
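To make “single-bit errors are cured by Hamming codes” concrete, here is a minimal sketch of the classic Hamming(7,4) code, which protects 4 data bits with 3 check bits. It is a textbook toy, not the scheme used in any particular flash controller (those work on 512-byte blocks with much longer codes), and the function names are chosen for this example.

```python
def hamming74_encode(data):
    """Encode 4 data bits into 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Return (data bits, syndrome). A non-zero syndrome is the 1-based
    position of the bit that the decoder assumed to be wrong and flipped."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
code = hamming74_encode(data)

# One flipped bit: located and corrected.
damaged = list(code)
damaged[4] ^= 1
print("single error:", hamming74_decode(damaged))

# Two flipped bits: the decoder "corrects" the wrong bit and reports no failure.
damaged[0] ^= 1
print("double error:", hamming74_decode(damaged))
```

The double-error case is the interesting one for this article: the decoder confidently returns wrong data without raising any error, a small-scale picture of silent corruption when the damage exceeds what the code was designed for.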
But let us not hover too long in the clouds of abstract algebra; it is time to return to the solid ground of engineering. If a block has accumulated too many corrupted bits, and if the drive manufacturer economized by using a cheaper microcontroller, a simpler error correction algorithm or less skilled developers, then the chance of silent data corruption (that is, without an explicit read failure) increases, in theory. Do not forget that firmware is not written by gods.
I used a consumer flash drive to host an embedded-type system that can read certain blocks very intensively (especially when RAM is short, as in my case). However, data loss due to the read disturb effect is too serious a charge and requires more careful investigation. In the meantime, I can derive one more criterion for judging the reliability of a flash drive: other things being equal, the longer the ECC, the better.

What to do?

The answer to question No. 2 of the Russian intelligentsia, oddly enough, turned out to be easier to find.

offtopic to question number 2

“What to do?” is a novel by the Russian philosopher, journalist and literary critic Nikolai Chernyshevsky, written in solitary confinement in the Peter and Paul Fortress and subsequently banned by the censors.

The question is, why did two identical failures occur? After the first failure there was a strong desire to switch to industrial flash memory immediately (only 2GB is needed, half of which is used), but finding it with an ordinary USB connector proved difficult: what was on sale was either pin-header modules (for example, the Transcend TS2GUFM-V), or Compact Flash, or even Disk-on-Module with an IDE interface. And since my quest for industrial flash memory went in the wrong direction, I bought a consumer USB flash drive for the third time, this time not a “crumb” but a “standard”-size Kingston.
Pondering other options, I even decided, just in case, to prepare for a move to industrial Compact Flash, in keeping with the canons of Crepsondo practice, and booted the system in test mode from a card reader (reader, be careful: the card reader can fail on its own). And, by the way, speaking of SSDs: for a simple boot device an SSD is relatively expensive and, oddly enough, not a panacea either.

Above: crumb on 8GB; below: Kingston at 8GB

In the end, the full-size 8GB Kingston flash drive worked for three months without complaints, and a salesman in the store said he had been opening bottles with the same model for a year, and nothing had happened to it. But to sleep more soundly, I still chose another option, which I will describe right now.

Meet: industrial memory

The representative of industrial memory TS2GUFM-V

In the end it turned out (reference) that the “female” pin interface of that very industrial TS2GUFM-V is a two-row 10-pin connector with a 2.54mm (1/10") pitch, designed, among other things, to fit the header for the “front” USB ports on a perfectly ordinary consumer motherboard (4 of the 10 contacts are used). Hooray, my quest is over.

Connectors "front" USB-cable and flash drives are depicted with positional matching

So, the TS2GUFM-V is a 2GB industrial flash memory module in a vertical case (the letter V), even fitted with latches that keep it from falling out of the connector under shock and vibration. So if the reader suddenly needs to embed an OS in a self-guided CNC hammer, this is a good option. There is also a horizontal version, the TS2GUFM-H, but it is even more brutal (caseless and fixed with three bolts), is sold less often and is even harder to fit on an ordinary motherboard. Of course, nothing is impossible; it all depends on desire, ingenuity and the design of the case.

Product Specifications TS512M ~ 4GUFM-V

Parameter	Value
Technology	SLC
Capacity	512MB to 4GB
Write endurance	100,000 cycles
Read speed	up to 33 MB/s
Write speed	up to 20 MB/s
ECC bits	8
Year of launch	2006
Price	about €25

The product, as you can see, is not new at all, but for industrial parts even 10 years is sometimes no age, and over time the price can fall from military to almost everyday levels (let me remind you that I spent the same money on two ordinary flash drives, just as in the famous proverb). For comparison: budget consumer flash drives have a write speed of only about 5 MB/s, so the TS2GUFM with its 20 MB/s is an excellent solution by the canons of Crepsondo philosophy. Only Compact Flash for professional cameras is cooler: when they “shoot” bursts in RAW format, no card will find it easy. The TS2GUFM-V does, however, occupy a header meant for two USB ports while using only one, but this can be fixed with adapters if desired.
To ~~receive its flight mission~~ be loaded with the boot image, this tough warrior has to be connected to a sysadmin's laptop, whose USB connectors are nothing like industrial in severity. The reader can use any convenient option (try an image search for “USB 10pin adapter”; you will learn a lot). But, by a strange coincidence, my old sysadmin trunk contained a crimping tool (apparently I had done something like this before, I just do not remember what). With the crimper I made a pin adapter from a discarded cable that had been run over by an office chair wheel.

Stripping the wires and crimping them with the crimper

Protecting with heat-shrink tubing

Take a 2.54mm male header (although a paper clip will also do)

Let's check how the flash drive sits on the cable

Ready for flashing

The FreeNAS image is written in the usual way, after which we install our industrial flash drive onto the header for the front USB ports on the motherboard. Do not mix up the pins: the product has no foolproofing. Contact number 9 should land on the position of the “sawed-off” pin.

"Pinout" connector

ATTENTION: the dimensions of our tough TS2GUFM-V can make it hard to fit onto the header because of electronic components, wires and other connectors sticking out here and there, even in very spacious tower cases. For example, squeezing the TS2GUFM-V onto the USB4_5 header of the ASRock P4i65G motherboard, between the onboard audio and the LAN, proved impossible without destroying something, so it went to the spare USB67 header. But even there I had to dodge the capacitors and the plug of the case speaker, which crowd close to the coveted pins.
Therefore the reader, especially when using compact cases (for example, the well-known Harlampy-Pankrat MicroServer brand), should carefully check that suitable 10-pin headers are present, as well as what surrounds them. If necessary, take measures in the form of adapters (image search: “USB adapter 10pin”). Or choose another flash drive.

The flash drive is seated on the header and working

Conclusions

The electrophysical processes inside solid-state drives are far from as unambiguous and simple as they seem from the outside (thanks, Captain Obvious).
The reliability of flash memory depends both on the NAND technology (SLC, MLC, TLC, etc.) and on the complexity of the microcontroller, and a manufacturer can economize on either one.
(ECC, ): , .
, , - .
-, MLC (TLC) .
.
FreeNAS «» , — SLC 2.

In the following parts we will continue with the topic of failures, touch on case engineering and system tuning, and show some Unix kung fu tricks for beginners.

Other parts of the story about another do-it-yourself NAS :
part 1: from what was
part 2: good memories (Flash memory for booting FreeNAS and other embedded OS)
part 3: adventures in the old tower
part 4: the ghost of Chernobyl

Links

www.wikipedia.org/wiki/Flash_memory#NAND_flash
www.wikipedia.org/wiki/Wear_leveling
www.wikipedia.org/wiki/Single-level_cell
www.wikipedia.org/wiki/Multi-level_cell
www.wikipedia.org/wiki/Triple-level_cell
ru.wikipedia.org/wiki/%D0%A4%D0%BB%D0%B5%D1%88-%D0%BF%D0%B0%D0%BC%D1%8F%D1%82%D1%8C#SLC-_.D0.B8_MLC-.D0.BF.D1.80.D0.B8.D0.B1.D0.BE.D1.80.D1.8B
www.wikipedia.org/wiki/Error_detection_and_correction
www.wikipedia.org/wiki/BCH_code
www.pcper.com/reviews/Editorial/Taking-Accurate-Look-SSD-Write-Endurance
collectd.org
www.transcendusa.com/support/dlcenter/EDM/UFM-EDM.pdf
www.micron.com/-/media/Documents/Products/Presentation/flash_mem_summit_jcooke_inconvenient_truths_nand.pdf
pt.slideshare.net/Flashdomain/tn2917-nand-flash-design-and-use-considerations
processors.wiki.ti.com/index.php/Raw_NAND_ECC
www.spansion.com/Support/Application%20Notes/Types_of_ECC_Used_on_Flash_AN.pdf
forums.freenas.org/threads/data-corruption-on-usb-flash-drive.15505/#post-80954
forums.freenas.org/threads/intel-passed-power-loss-protected-ssd-tests.17168
mikelab.kiev.ua/index_en.php?page=PROGRAMS/chkflsh_en
lifehacker.ru/2013/06/26/veshhi-kotorye-ne-nuzhno-delat-s-ssd
www.anandtech.com/show/6489/playing-with-op
flashboot.ru/iflash

Source: https://habr.com/ru/post/214803/
