
Masked bugs in embedded development

Getting stuck is inevitable when developing any software. In embedded work, hardware problems may also chip in their generous five kopecks, but that is a separate song. A purely software ambush, though, when you get stuck on what looks like an empty spot... For me, these come in three types.

The easiest kind is when the manual, the standard, or, say, the procedure for configuring a library for the hardware is not fully understood. Here everything is clear: not all the moves have been exhausted yet. Patience and work, a few more experiments, and it will come to life. An oscilloscope and the scientific poke method will help.


Selecting a frequency divider to configure the CAN bus
Worse is when the problem is a typo or a logic error that you cannot see at point-blank range until you have gone over the spot twenty times, both by eye and in step-by-step debugging. Then it dawns on you: a ringing slap on the forehead, a cry of "Well, what a fool!", an edit. It works.

And the gloomy third kind: a glitch entrenched in someone else's library that crawls out at the junction with the hardware. The steady glow of the monitor lights up Shakespearean passions. "But the system cannot, cannot behave like this, because it never can! Come on, really! Eh?!" Nope. Take delivery and sign for it.

As a result, reality turns out to be wider than expected. A couple of examples:

Story number 1. MicroSD flash and working over DMA


Anamnesis


I need to dump data to a file on an SD card. Of course, I have neither the time nor the desire to write a file system and an SDIO driver myself, so I take a ready-made library. I configure it for the hardware, and everything works fine. At first. Then it turns out that the data is written in a bizarre way: the sizes are exact, but inside the files individual pairs or triples of bytes are sometimes duplicated and sometimes missing, with no apparent pattern. Not good!

The experiments begin. I write test data: everything is OK. I write real data: some kind of devilry. I change the size of the data buffers, how often they are flushed, the data patterns: all useless. In the buffers themselves everything is always fine; the data in memory is exactly what it should be. And yet the glitches on the card are right there.

Digging out where the dog was buried took about a couple of days.

Diagnosis


The problem was the interaction of the library with the DMA hardware.

SD cards have a peculiarity: they are written only in blocks of 512 bytes. To handle this, the library accumulates data in a 512-byte array and, when it fills up, flushes it from there to the card via DMA. But!

If I pass to the write call a fragment larger than <512xN + the free space in the library buffer> bytes, the library (evidently to avoid shuffling memory back and forth) does the following: it tops up its own buffer, writes it to the card, and then feeds the next 512xN bytes to the DMA straight from my buffer! And if some tail is left over, it copies that into its own buffer again, until next time.

And all would be fine, but the DMA controller requires the data in memory to be aligned on a 4-byte boundary. The library buffer is always aligned that way; the language guarantees it. But at what address in my buffer those remaining 512xN-plus bytes start, after part of the data has been copied out, God only knows. And the library does not check this in any way: the address is passed to the DMA controller as is.

"They have sent something crooked... Oh well, to hell with it." The controller silently clears the 2 low bits of the address it was given. And starts the transfer.


The address, which was not originally a multiple of 4, is replaced by one that is. Voila: up to three of the last bytes that already went through the library buffer are written to the file a second time from my buffer, and the same number of bytes from my buffer are lost without a trace. As a result, the total amount of data is correct and the operations complete without errors, but what lands on the disk is nonsense.
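The effect is easy to reproduce on a PC. The sketch below is only a model, not the library's code: `dma_copy` and `simulate_write` are hypothetical names, an "address" is just an index into a vector, and the library buffer is assumed to hold `buffered` bytes before the call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlock = 512;  // SD card block size

// Stand-in for the DMA controller: it silently clears the two low bits of
// the source address. Here an "address" is simply an index into `mem`.
inline void dma_copy(const std::vector<std::uint8_t>& mem, std::size_t src,
                     std::size_t len, std::vector<std::uint8_t>& card) {
    const std::size_t masked = src & ~static_cast<std::size_t>(3);
    const auto first = mem.begin() + static_cast<std::ptrdiff_t>(masked);
    card.insert(card.end(), first, first + static_cast<std::ptrdiff_t>(len));
}

// Models one write call: the library buffer already holds `buffered` bytes,
// and the caller passes just enough data to complete that block plus one
// more full block. Returns everything that ends up on the "card".
inline std::vector<std::uint8_t> simulate_write(std::size_t buffered) {
    assert(buffered > 0 && buffered < kBlock);
    const std::size_t head = kBlock - buffered;  // bytes that top up the library buffer

    // Caller's buffer, placed at "address" 0 and filled with a counting pattern.
    std::vector<std::uint8_t> user(head + kBlock);
    for (std::size_t i = 0; i < user.size(); ++i) {
        user[i] = static_cast<std::uint8_t>(i + 1);
    }

    // Step 1: the library tops up its own aligned buffer and writes it out.
    std::vector<std::uint8_t> card(buffered, 0);
    card.insert(card.end(), user.begin(),
                user.begin() + static_cast<std::ptrdiff_t>(head));

    // Step 2: the next full block goes to the DMA straight from the caller's
    // buffer, starting at an offset that may not be a multiple of 4.
    dma_copy(user, head, kBlock, card);
    return card;
}
```

With `simulate_write(510)` the second block starts at offset 2, the model rounds it down to 0, and the first two bytes of the caller's data appear on the card twice while the last two never arrive. With `simulate_write(508)` the offset is 4 and the stream is intact. The total size is the same in both cases.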

Treatment


I had to add one more buffer right before the call to the hardware write function. If the address passed for writing is not a multiple of 4, the data is first copied into this buffer. As a bonus, the average speed went up thanks to a sensible choice of buffer size. Of course, this cost some memory, but what is 4 kilobytes for a good cause when you have a whopping 192!
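A minimal sketch of such a bounce buffer might look like this. It is not the actual fix: `HwWriteFn`, `write_blocks_aligned` and `fake_hw_write` are hypothetical names, the real library's write function has a different signature, and the 8-block size is only chosen to match the 4 kilobytes mentioned above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBlockSize = 512;
constexpr std::size_t kBounceBlocks = 8;  // 8 * 512 bytes = 4 KiB (assumed size)

// Hypothetical signature of the low-level block writer that feeds the DMA.
using HwWriteFn = bool (*)(const std::uint8_t* src, std::size_t block_count);

// Writes `block_count` blocks, routing unaligned sources through an aligned
// intermediate buffer. Not reentrant: the bounce buffer is shared.
inline bool write_blocks_aligned(HwWriteFn hw_write, const std::uint8_t* src,
                                 std::size_t block_count) {
    if (reinterpret_cast<std::uintptr_t>(src) % 4 == 0) {
        return hw_write(src, block_count);  // already safe for the DMA
    }
    alignas(4) static std::uint8_t bounce[kBlockSize * kBounceBlocks];
    while (block_count > 0) {
        const std::size_t n =
            block_count < kBounceBlocks ? block_count : kBounceBlocks;
        std::memcpy(bounce, src, n * kBlockSize);  // realign the data
        if (!hw_write(bounce, n)) {
            return false;
        }
        src += n * kBlockSize;
        block_count -= n;
    }
    return true;
}

// Host-side stand-in for the driver, so the sketch can be tried on a PC:
// it only records how many blocks arrived and whether any source address
// was not a multiple of 4.
static std::size_t g_blocks_written = 0;
static bool g_saw_unaligned = false;

inline bool fake_hw_write(const std::uint8_t* src, std::size_t block_count) {
    if (reinterpret_cast<std::uintptr_t>(src) % 4 != 0) {
        g_saw_unaligned = true;
    }
    g_blocks_written += block_count;
    return true;
}
```

Calling `write_blocks_aligned(fake_hw_write, data + 1, 10)` on a 4-byte-aligned array `data` goes through the bounce buffer in two passes (8 blocks, then 2), and the stand-in never sees an unaligned address.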

Story number 2. The runtime and the heap


Prologue


After yet another change, the program started crashing, and crashing hard, throwing the processor into the Hard Fault handler. And it ended up there right after startup, before execution even reached main(), that is, before a single line of my code had a chance to run.

The first impression is "that's it, the chip is dead and needs replacing". Or maybe the programmer itself has given up the ghost. But no: the old firmware version works reliably, while the new one reliably crashes somewhere in the obscure assembly depths between startup and my code. I had no idea what kind of heresy this was.

Chapter 1


I went online to see how to get at least some additional information. I found a procedure for analyzing the aftermath of a Hard Fault: the state of the registers, a stack dump. Adapted it. Used it.
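The core of such a procedure is reading the registers that the processor pushed onto the stack when the fault hit. The sketch below assumes a Cortex-M-class core (the article does not name the chip) and the basic eight-word exception frame without FPU state. `ExceptionFrame`, `decode_frame` and `print_frame` are hypothetical names. The small assembly stub that fetches the right stack pointer inside the real handler is target-specific and omitted here.

```cpp
#include <cstdint>
#include <cstdio>

// The eight words a Cortex-M core stacks on exception entry, lowest address
// first (basic frame, no floating-point context).
struct ExceptionFrame {
    std::uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

// Copies the stacked words into named fields. `stacked_sp` must point at
// the first stacked word (the saved r0).
inline ExceptionFrame decode_frame(const std::uint32_t* stacked_sp) {
    return ExceptionFrame{stacked_sp[0], stacked_sp[1], stacked_sp[2],
                          stacked_sp[3], stacked_sp[4], stacked_sp[5],
                          stacked_sp[6], stacked_sp[7]};
}

// Prints the two most useful values: where the fault happened (pc) and
// where the faulting function would have returned to (lr).
inline void print_frame(const ExceptionFrame& f) {
    std::printf("pc=0x%08lX lr=0x%08lX xpsr=0x%08lX\n",
                static_cast<unsigned long>(f.pc),
                static_cast<unsigned long>(f.lr),
                static_cast<unsigned long>(f.xpsr));
}
```

On the target, the stacked `pc` is the value to look up in the disassembly or the map file, which is how a fault can be traced to code inside the runtime library.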

It turned out that the crash is caused by a bus fault. I decided that this was unaligned access again, a problem of the same kind as in the first story but from a different angle. The worst part, though, was where the error occurred. It arose inside the runtime library, that is, in code that should, in theory, be polished as thoroughly as a cat grooms itself on a sunny day.

Further analysis showed that the glitch is the result of an attempt to initialize local static variables.

Lyrical digression
By the way, while looking at the disassembled code I also learned the answer to a question I had sometimes asked myself but had been too lazy to google: what happens when two or more threads try to initialize such a variable at the same time? It turns out that the compiler wraps the initialization in semaphores, ensuring that only one thread at a time goes through the whole procedure while the rest wait for the first one to finish. This behavior has been standardized since C++11. Did you know? I didn't.
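The guarantee can be observed on a desktop machine with a few threads. This is a sketch with hypothetical names (`Tracked`, `instance`, `race_for_instance`); it shows only the effect visible to the program, not the locking code the compiler generates, which depends on the toolchain.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Counts how many times the constructor below has actually run.
static std::atomic<int> g_constructions{0};

struct Tracked {
    Tracked() { ++g_constructions; }
};

// The local static is initialized on the first call. Since C++11, threads
// that arrive while the initialization is in progress wait for it to finish.
inline Tracked& instance() {
    static Tracked t;
    return t;
}

// Starts `thread_count` threads that all race to the first call, waits for
// them, and returns the total number of constructions so far.
inline int race_for_instance(int thread_count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([] { instance(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return g_constructions.load();
}
```

However many threads are started, `race_for_instance` reports a single construction. On a microcontroller, this guard and the destructor bookkeeping are extra runtime code that runs as soon as the first such variable is touched.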

Chapter 2


Since the runtime takes care of constructing these variables, it also has to call their destructors when the program ends (even if the program never actually finishes, which is perfectly normal for microcontrollers). To do this, it needs to store somewhere information about all the variables it has managed to initialize.

And it was exactly at the point where this information is saved in some internal list that the runtime crashed. The malloc() function, which allocates memory for the elements of this list and which, according to the standard, returns blocks guaranteed to be aligned to at least 8 bytes, after a number of successful calls returned a block that was not aligned to that boundary.



Changes in the new firmware code broke malloc?! How is that even possible? I certainly did not override malloc; I don't even use it anywhere myself!

I dug into the compiler options and searched for keywords and references, but everywhere it was stated clearly: malloc() guarantees that the returned memory is aligned to the fundamental boundary. Or it returns a null pointer if there is not enough memory.

Chapter 3


For a long time I stared blankly at the code, set breakpoints, suffered, and understood nothing, until at some point I took the trouble to look more carefully at the addresses malloc returned. Until then, the whole analysis had consisted of checking whether the last digit of the address was a multiple of 0x4. Now I started comparing the full addresses returned by successive calls to malloc.

And, oh, a miracle!

All the successful calls returned addresses from the RAM address space (0x20000000 and up for this chip), increasing sequentially from call to call. And the first unsuccessful one returned 0x00000036. That is, not only was the address unaligned, it was not in the RAM address space at all! The processor tried to write something there and, naturally, crashed.

And, curiously, even if malloc() had behaved according to the standard and returned 0 when it ran out of space, it would have changed nothing as far as the crash is concerned (except that the cause of the bug would have come to light sooner). The value returned by malloc is not checked in any way anyway; it is put to use immediately.
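The check that eventually exposed the problem, looking at the whole address and not just its last digit, can be written down as a small helper. This is a sketch, not part of the runtime: `PtrCheck` and `check_heap_pointer` are hypothetical names, and the RAM bounds are parameters because the real ones depend on the chip.

```cpp
#include <cstddef>
#include <cstdint>

enum class PtrCheck { Ok, Null, OutsideRam, Misaligned };

// Classifies an address returned by an allocator. The range check comes
// before the alignment check: an address outside RAM is wrong no matter
// how its low bits look.
inline PtrCheck check_heap_pointer(std::uintptr_t addr,
                                   std::uintptr_t ram_begin,
                                   std::uintptr_t ram_end,
                                   std::size_t alignment) {
    if (addr == 0) {
        return PtrCheck::Null;  // the standard "out of memory" result
    }
    if (addr < ram_begin || addr >= ram_end) {
        return PtrCheck::OutsideRam;
    }
    if (addr % alignment != 0) {
        return PtrCheck::Misaligned;
    }
    return PtrCheck::Ok;
}
```

With RAM assumed to start at 0x20000000, the value 0x00000036 is classified as `OutsideRam`, which points at an exhausted heap and not at an alignment bug.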

Epilogue


Increased the size of the heap in the configuration file, and everything was fixed.

But before that I had not given its size a second thought. What the hell do I need it for, I reasoned; all my variables and objects are either static or on the stack anyway. So, purely out of inertia, I left 0x300 bytes for it, since all the sample projects allocate some amount for the heap. But no: the C++ runtime still needs dynamically allocated memory, and in quantities that are quite noticeable by microcontroller standards.

Live and learn.

Source: https://habr.com/ru/post/453944/

