
Slow SD cards: who is to blame and what to do?

I had long meant to write an article for Habr but never quite dared. I seem to have thoughts that might interest the community, yet I was held back by the suspicion that this "seems" stems from excessive self-esteem. Nevertheless, I will try. I have worked professionally in electronics, and in microcontroller programming in particular, for quite a long time (longer, I suspect, than most Habr readers have been alive), and a fair number of interesting cases have accumulated over that time. Here is the story of one of them.



In one project I needed to store significant amounts of data for later transmission over a network to a processing center. Since the device was intended for mass production, I chose relatively inexpensive components, with a microcontroller as the central element of the system. At that time (mid-2012) the choice of microcontrollers with an on-chip Ethernet PHY was not wide (and the situation is not much better even now), so I picked the TI Stellaris family, specifically the LM3S8962, all the more so because I already had an evaluation board for it. The MCU was relatively new then and actively promoted by TI (it was only at the end of 2013 that TI SUDDENLY moved the entire series to NRND status), and its parameters were sufficient for the task. For storage I chose an SD card, primarily because such cards are cheap and readily available, and also because the evaluation board had a card slot and the CD supplied with the board contained numerous examples, including ones for SD cards. I used the simplest interface to the card, SPI, and the supplied examples worked right away. This approach also let me work on the data before the interface was written, simply by moving the card from the device to a PC card reader, so the initial debugging of the algorithms for interacting with the controlled object caused no problems, at least in this part of the project. As everyone will have guessed, the problems came somewhat later...





Once the algorithms were debugged and the device as a whole started working, test runs began. And here it turned out that the SD card could not record data at the rate at which the controlled object delivered it. The speeds differed many times over, and given the size of a storage unit (2.7 megabytes), an intermediate buffer could not be built at an acceptable price. In specific numbers: a 2.7 megabyte file had to be written to the SD card in no more than 1.6 seconds, while the data was actually written in 30 seconds, even though I had bought class 10 cards, which are rated for a write speed of 10 MB/s. The fight for speed went through several stages, and the opponents were the microcontroller, the standard library (TI's own, by the way), and finally the SD card itself.
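To put the gap in numbers, the figures quoted above can be turned into average throughput. This is only a small arithmetic sketch; the function name is hypothetical and "megabyte" is left in whatever unit the measurements used.

```c
#include <assert.h>

/* Average throughput, in megabytes per second, for `megabytes`
   of data moved in `seconds`. */
double throughput_mb_per_s(double megabytes, double seconds)
{
    assert(seconds > 0.0);
    return megabytes / seconds;
}
```

With the numbers from the text, `throughput_mb_per_s(2.7, 1.6)` gives roughly 1.69 MB/s required, while `throughput_mb_per_s(2.7, 30.0)` gives 0.09 MB/s measured, a gap of almost 19 times and far below the 10 MB/s that the card class suggests.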


The first stage. I examine the write timings and immediately find that different portions of the data take different times to write, and that the write time for blocks of the same size varies significantly (by several times). Experimenting with different write block sizes, I establish a simple pattern: the larger the blocks, the shorter the write time per unit of data. The library modules support FAT and write data sector by sector, and I see no sense in rewriting them, so I reformat the card with a 32 KB allocation unit (cluster) size and get a write time of 14 seconds. 1 point to SD.
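One plausible reading of this pattern (an assumption, since the cause was not measured directly) is that every write operation carries a fixed overhead, so fewer and larger operations cost less in total. The sketch below only counts the operations; the function name is hypothetical, and 2.7 megabytes is taken as 2,700,000 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Number of write operations needed to store `total_bytes`
   when each operation carries `block_bytes` (rounded up). */
uint32_t writes_needed(uint32_t total_bytes, uint32_t block_bytes)
{
    assert(block_bytes > 0u);
    return (total_bytes + block_bytes - 1u) / block_bytes;
}
```

For one file, `writes_needed(2700000u, 512u)` is 5274 single-sector operations, while `writes_needed(2700000u, 32768u)` is only 83 operations of 32 KB each.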



The second stage. I check the operation of the SPI interface and find that it runs at 12.5 MHz, although the documentation allows an SPI clock of 25 MHz (half the 50 MHz processor clock). It turns out that the library routine that sets the SPI clock limits it to 12.5 MHz, while the documentation for the microcontroller's interface module contains no such limitation.

    i = ROM_SysCtlClockGet() / 2;
    if(i > 12500000)
    {
        i = 12500000;
    }


I change the code and the write time halves, to 7 seconds. 1 point to TI.
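The modified code is not shown here. One way to make such a change, sketched under the assumption that the ceiling becomes a parameter instead of a hard-coded constant (the function name and the parameter are hypothetical, not part of the TI library):

```c
#include <assert.h>
#include <stdint.h>

/* Pick the SPI bit rate: half the system clock, limited to `ceiling_hz`. */
uint32_t sd_spi_bit_rate(uint32_t sysclk_hz, uint32_t ceiling_hz)
{
    uint32_t rate = sysclk_hz / 2u;

    assert(ceiling_hz > 0u);
    return (rate > ceiling_hz) ? ceiling_hz : rate;
}
```

With a 50 MHz system clock, a ceiling of 12500000 reproduces the library behavior, while a ceiling of 25000000 yields the 25 MHz that the hardware documentation permits.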



The third stage. I examine the modules that exchange data with the SD card and discover very wasteful use of time in the low-level procedures. The SPI module in the microcontroller includes an 8-entry FIFO buffer, which makes it possible to speed up work with it. Before sending the next byte, the transmit routine checks the "transmit buffer not full" flag, and so far everything looks fine. But after each byte is sent, a receive routine is called (SPI receives a byte for every byte it transmits) to remove the unneeded received byte from the receive buffer. That routine polls the "receive buffer not empty" flag, which is set only once the byte just sent has been fully serialized. In other words, the code waits until the current byte has been completely transmitted and only then prepares the next one, so the FIFO is never actually used.

    void xmit_spi(BYTE dat)
    {
        uint32_t ui32RcvDat;

        SSIDataPut(SDC_SSI_BASE, dat);          /* Write */
        SSIDataGet(SDC_SSI_BASE, &ui32RcvDat);  /* flush data */
    }


I correct this error (what else can one call it?) and get a file write time of 3 seconds. 1 point to TI.

And here is what the optimization produced, before taking the specifics of the task into account.

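    static void xmit_spi_my(BYTE const *dst, int length)
    {
        /* volatile: hardware registers must be re-read on every access */
        volatile int *d = (volatile int *)(SDC_SSI_BASE + SSI_O_DR);  /* data register */
        volatile int *p = (volatile int *)(SDC_SSI_BASE + SSI_O_SR);  /* status register */

        /* length must be greater than zero */
        do {
            while (!(*p & SSI_SR_TNF)) {}  /* wait while the transmit FIFO is full */
            *d = *dst++;
        } while (--length);

        while (*p & SSI_SR_RNE) (void)*d;  /* discard the received bytes */
    }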


The fourth stage. I examine the higher-level modules and find that, since data can be passed to the interface only from memory, the work is done twice: first the data stream is read from the controlled object into the microcontroller's RAM (which, by the way, costs a 32 KB buffer), and then it is copied from memory to the SPI interface registers. I write my own module that transfers data directly from register to register and get a write time of 1.6 seconds. I hide the call to my module inside a standard call, so that the file system believes 32 kilobytes have been transferred. 1 point to TI.
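The module itself is not listed, so the following is only a sketch of the register-to-register idea, not the code used in the device. All names are hypothetical, the status bits are assumed to mirror the SSI status register layout, and the data source is assumed to always have a byte ready (real code would also poll a "data ready" flag on the source side). The registers are passed in as pointers so the routine can be exercised on a PC against ordinary variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed status bits, mirroring the SSI status register layout. */
#define SR_TNF 0x02u  /* transmit FIFO not full  */
#define SR_RNE 0x04u  /* receive FIFO not empty  */
#define SR_BSY 0x10u  /* transfer in progress    */

/* Copy `count` bytes from a source data register straight into the SPI
   data register, with no intermediate RAM buffer. Returns bytes sent. */
size_t stream_reg_to_spi(volatile const uint32_t *src_reg,
                         volatile uint32_t *spi_dr,
                         volatile const uint32_t *spi_sr,
                         size_t count)
{
    size_t sent = 0;

    assert(src_reg != NULL && spi_dr != NULL && spi_sr != NULL);
    while (sent < count) {
        while (!(*spi_sr & SR_TNF)) { }   /* wait for room in the TX FIFO */
        *spi_dr = *src_reg & 0xFFu;       /* register-to-register copy */
        ++sent;
    }
    while (*spi_sr & SR_BSY) { }          /* let the last byte shift out */
    while (*spi_sr & SR_RNE) {            /* discard the received bytes */
        (void)*spi_dr;
    }
    return sent;
}
```

On the target the three pointers would be the addresses of the source peripheral's data register and of the SPI data and status registers. The saving comes from removing one full pass over the data through RAM.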



The fifth stage. The goal has already been reached, but the optimization continues by inertia. I look at the signals on the interface again and find that what is actually transmitted is not a continuous train of clock pulses but 8 data bits followed by a pause of 2 clock periods. Fine, the ninth bit is needed for the frame synchronization signal (not to be confused with the clock signal), which I do not need at all, but what is the tenth for? Experiments with different SPI modes produced a signal of exactly 8 bits per byte with no gaps and, accordingly, a write time of 1.3 seconds. 1 point to the Stellaris.
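The effect of the two idle clock periods can be estimated from the time the bytes spend on the wire alone. This is a back-of-the-envelope sketch, assuming a 25 MHz SPI clock and taking 2.7 megabytes as 2,700,000 bytes; the function name is hypothetical, and command overhead and card busy time are ignored.

```c
#include <assert.h>

/* Time in seconds to clock `bytes` over SPI at `clock_hz`,
   spending `clocks_per_byte` clock periods on each byte. */
double spi_wire_time_s(double bytes, double clock_hz, double clocks_per_byte)
{
    assert(clock_hz > 0.0);
    return bytes * clocks_per_byte / clock_hz;
}
```

At 10 clock periods per byte the file needs about 1.08 seconds of pure wire time, and at 8 periods about 0.86 seconds, a saving of roughly 0.2 seconds, which is the same order as the measured improvement from 1.6 to 1.3 seconds. The same formula shows that at 12.5 MHz even a gap-free transfer takes about 1.73 seconds, so the 1.6 second target was out of reach with the library's clock limit in place.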



The sixth stage. Everything seems fine, but quite unexpectedly another problem appears: when several files are streamed in a row, the first three fit into the required interval, even with a small margin, but the fourth takes much longer to write, up to 1.8-2.0 seconds, and so breaks the sequence. I try the obvious explanation, assuming that the cause is crossing the page boundaries of the flash memory, and exclude those places from processing. Now the files that used to be written on time start taking too long. Numerous experiments lead to the conclusion that this behavior is somehow tied to the internal organization of the flash. My guess is that the internal high-voltage generator used for programming (it undoubtedly exists) cannot sustain the required voltage during long operations and needs some time to recover its charge. The overall average speed is maintained, but what I need is not the average speed but a guaranteed write time for each file. A data buffer to even out the load could have helped here, but another solution was found: I bought SD cards from different manufacturers, and some of them gave a constant write time of 1.4 seconds without significant scatter. I will not name the manufacturers, so that the article is not taken for advertising. 1 point to SD.
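The difference between average speed and per-file write time is easy to miss when comparing cards. The sketch below, with hypothetical function names, checks a series of measured write times both ways: by the mean and by the number of files that missed the deadline.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of `n` per-file write times, in seconds. */
double mean_time_s(const double *times_s, size_t n)
{
    double sum = 0.0;

    assert(times_s != NULL && n > 0u);
    for (size_t i = 0; i < n; ++i) {
        sum += times_s[i];
    }
    return sum / (double)n;
}

/* Number of files whose write time exceeded the per-file deadline. */
size_t deadline_misses(const double *times_s, size_t n, double deadline_s)
{
    size_t misses = 0;

    assert(times_s != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (times_s[i] > deadline_s) {
            ++misses;
        }
    }
    return misses;
}
```

For an illustrative series of 1.3, 1.3, 1.3 and 1.9 seconds (made-up values in the range described above), the mean is 1.45 seconds, comfortably under the 1.6 second limit, yet one file still misses the deadline. A card has to be judged by its worst file, not by its average.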



The result: the problem was solved, and the devices were shipped to the customer and work without failures. The final score for problems found and fixed: SD cards 2, TI library 3, microcontroller quirks 1. The following conclusions can be drawn:

1. Treat the existing libraries of standard routines and their usage examples with particular care. As a rule they work, sometimes even without errors, but they are in no way optimized for performance. So look at the source code (fortunately, it is available) and modify it creatively. I even formed the opinion that such freely distributed libraries are deliberately made suboptimal in order to encourage the purchase of their paid counterparts.

2. Be wary of performance figures for any device. Read the specifications carefully to see in which modes and under what conditions the numbers are achieved, rather than glancing at one or two headline figures and deciding that they suit you.

3. Read the documentation on the microcontroller's modules carefully and try to understand their internal structure, and do not forget the oscilloscope for studying real processes on a real board.



One small note to end the article. I decided to see how similar procedures are implemented in the new support package for the TIVA-C microcontrollers (TivaWare_C_Series-2.0.1.11577). Well, what can I say: the traditions are unbroken. Exactly the same rakes lie in exactly the same places, with one more added: functions are now called not directly from flash but from the so-called ROM library through double indirection, which does not add speed. As Mikhail Zhvanetsky said, "Either we will live well, or my works will always be relevant." So far, the second is true.

Source: https://habr.com/ru/post/220433/


