Articles on low-power modes are easy to find. Most of them describe the strengths and weaknesses of a particular microcontroller, and their recommendations boil down to one generic phrase: use sleep modes.
In this article, I would like to dig a little deeper into these recommendations and describe the power-saving techniques I personally ran into while developing one of my devices.
The background: I needed to develop an analog signal logger. The circuitry itself was straightforward:
- 11 ADC channels
- Bluetooth
- SD card
- OLED display 128x64
- powered by one AAA battery
The idea was as follows: the user turns on the device, sets the parameters using the buttons and the display, and starts the measurements. The channels are then digitized and saved to the memory card. Optionally, Bluetooth can be turned on to view the measurements in real time on a smartphone, or to download previously saved data from the card. In measurement mode, the device had to run for 3 days on a single AAA battery.
For readers unfamiliar with such calculations, here is a rough estimate:
An AAA battery with a nominal voltage of 1.5 V delivers, on average, 600–1000 mAh. Over 3 days (72 hours), the device may therefore draw 600 / (24 * 3) ≈ 8.3 mA in the worst case and 1000 / (24 * 3) ≈ 13.9 mA in the best case. There is one very important caveat, though: these currents are drawn at 1.5 V. The memory card and the microcontroller run at 3 V, so, referred to the 3 V rail, the figure is roughly half as large, i.e. about 4–7 mA, and even less once converter losses are counted. The requirements for the modes with Bluetooth and the display turned on were more relaxed, so they are not considered here.
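The budget arithmetic above can be written out as a small C sketch. It only restates the text's numbers; the converter efficiency parameter `eta` is an added assumption, and with `eta = 1.0` it reproduces the idealized "half the current at twice the voltage" estimate.

```c
/* Average current (mA) that drains a battery of capacity_mah in `hours` hours. */
double budget_ma(double capacity_mah, double hours)
{
    return capacity_mah / hours;
}

/* Current available on a higher-voltage rail for a given battery current,
 * assuming a boost converter with efficiency eta (1.0 = ideal, no losses):
 * P_in * eta = P_out  =>  I_rail = I_batt * V_batt * eta / V_rail. */
double rail_budget_ma(double battery_ma, double v_batt, double v_rail, double eta)
{
    return battery_ma * v_batt * eta / v_rail;
}
```

For example, `budget_ma(600.0, 24.0 * 3.0)` gives about 8.3 mA, and `rail_budget_ma(8.3, 1.5, 3.0, 1.0)` about 4.2 mA; any realistic `eta` below 1.0 shrinks the 3 V budget further.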
There was no doubt about the choice of platform: STM32, mainly because of availability, recommendations from other developers, and the fact that I had long been familiar with it. The L0 series did not fit because there were no chips with the required amount of memory and peripherals, so the choice fell on the STM32L151. I considered the STM32L4, but at the time it was more expensive, and there was no obvious reason to choose it.
At that time, I had no experience developing low-power devices, but overall the requirements and preliminary power estimates made with the datasheet and CubeMX suggested that everything should fit. For a sense of scale: run an ordinary microcontroller such as the STM32F103 at its maximum frequency of 72 MHz, and the core alone will draw tens of mA, without any peripherals. Even an ordinary red LED draws about 10 mA at 1.9 V. So the plan was for the device to sleep most of the time.
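Why "sleep most of the time" works can be checked with a simple duty-cycle model: the average current is the time-weighted mix of the active and sleep currents. The sketch below assumes just two states with constant currents; the example values are hypothetical, not measurements from this device.

```c
/* Average current of a device that alternates between an active state and a
 * sleep state. duty is the fraction of time spent active (0.0 .. 1.0). */
double average_current_ma(double active_ma, double sleep_ma, double duty)
{
    return active_ma * duty + sleep_ma * (1.0 - duty);
}

/* Largest active-time fraction that keeps the average current within limit_ma.
 * Returns 0.0 if even continuous sleep exceeds the limit, and 1.0 if the device
 * could stay active all the time. */
double max_duty_for_budget(double active_ma, double sleep_ma, double limit_ma)
{
    if (sleep_ma >= limit_ma)
        return 0.0;
    if (active_ma <= limit_ma)
        return 1.0;
    return (limit_ma - sleep_ma) / (active_ma - sleep_ma);
}
```

For instance, with a hypothetical 20 mA active current and negligible sleep current, `max_duty_for_budget(20.0, 0.0, 5.0)` returns 0.25: the device may be active only a quarter of the time to stay within 5 mA.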

There is nothing special about the circuit. It uses an already proven power scheme: the 1.5 V battery was boosted to 3 V, and the analog (ADC) section was powered from 2.5 V. When the battery voltage dropped to 0.95 V, the device switched off.
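A minimal sketch of the 0.95 V cut-off decision, assuming the battery voltage is sampled directly by a 12-bit ADC referenced to 2.5 V and that shutdown is triggered only after several consecutive low readings, so that a short dip under load does not switch the device off. The filter length `LOW_READINGS_MAX` and all names are hypothetical; the text does not say how the cut-off was implemented, and it may well have been done in hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define CUTOFF_MV        950u  /* shutdown threshold from the text */
#define LOW_READINGS_MAX 8u    /* hypothetical: consecutive low samples before shutdown */

/* Converts a raw 12-bit ADC reading to millivolts for a given reference voltage.
 * Assumes the battery is connected to the ADC input without a divider. */
uint32_t adc_to_mv(uint16_t raw, uint32_t vref_mv)
{
    return ((uint32_t)raw * vref_mv) / 4095u;
}

typedef struct {
    uint32_t low_count;  /* consecutive readings below CUTOFF_MV */
} battery_monitor_t;

/* Feeds one battery reading; returns true once the voltage has stayed below
 * the cut-off for LOW_READINGS_MAX consecutive samples. */
bool battery_should_shut_down(battery_monitor_t *m, uint32_t battery_mv)
{
    if (battery_mv < CUTOFF_MV) {
        if (m->low_count < LOW_READINGS_MAX)
            m->low_count++;
    } else {
        m->low_count = 0;  /* a single good reading resets the filter */
    }
    return m->low_count >= LOW_READINGS_MAX;
}
```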

The first problem was how to get the data out of the ADC. There were 11 channels, each sampled at its own rate; in addition, some channels were 12-bit and some 8-bit. The total data rate was about 6.5 kB per second. There were two options: 1. collect the data via DMA; 2. start conversions from a timer and read the results in an interrupt.
An inexperienced reader will say that DMA + timer is the way to go: start the conversions and put the CPU to sleep. However, the L151 has only one ADC, so it cannot sample different channels at different rates simultaneously. If one channel is sampled at 1 Hz and another at 2 kHz, then with DMA both have to be sampled at 2 kHz. The drawbacks are obvious: extra RAM usage, extra CPU time spent sorting through the buffer, and the extra current drawn by the DMA itself.
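The cost of a single shared ADC scan is easy to quantify. This sketch, assuming one sample per channel per conversion, counts samples per second when every channel runs at its own rate versus when one DMA-driven scan has to convert every channel at the fastest rate:

```c
#include <stddef.h>
#include <stdint.h>

/* Samples per second when each channel is converted at its own rate
 * (timer + interrupt). */
uint32_t samples_individual(const uint32_t *rates_hz, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += rates_hz[i];
    return total;
}

/* Samples per second when a single ADC scans every channel at the fastest
 * rate (DMA scan). */
uint32_t samples_single_scan(const uint32_t *rates_hz, size_t n)
{
    uint32_t max_rate = 0;
    for (size_t i = 0; i < n; i++)
        if (rates_hz[i] > max_rate)
            max_rate = rates_hz[i];
    return max_rate * (uint32_t)n;
}
```

With the two channels from the example, {1, 2000} Hz, the individual approach needs 2001 samples/s, while the single scan produces 4000, almost half of which would be discarded after the CPU has processed the buffer.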
Obviously, with a different distribution of sampling rates the picture would be different, but in my particular case DMA turned out to be completely unprofitable. Tests showed that polling in interrupts was considerably easier on the battery. I repeat: it all depends on the specific situation.
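Option 2 from above (starting conversions from a timer and collecting the results in the interrupt) needs a small per-channel scheduler. A minimal sketch, assuming the timer interrupt fires at the rate of the fastest channel and every other rate divides it evenly; `BASE_TICK_HZ`, the struct and the function are hypothetical names, and the actual ADC start/read calls are left to the platform code:

```c
#include <stddef.h>
#include <stdint.h>

#define BASE_TICK_HZ 2000u  /* hypothetical timer interrupt rate = fastest channel */

typedef struct {
    uint8_t  adc_channel;  /* hypothetical hardware channel number */
    uint16_t divider;      /* convert every `divider` ticks: rate = BASE_TICK_HZ / divider */
    uint16_t countdown;    /* ticks left until the next conversion; must start in 1..divider */
} channel_sched_t;

/* Called from the timer interrupt. Writes the indices of the channels that are
 * due on this tick into `due` (room for n entries) and returns their count. */
size_t sched_tick(channel_sched_t *ch, size_t n, size_t *due)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (--ch[i].countdown == 0) {
            ch[i].countdown = ch[i].divider;  /* re-arm for the next period */
            due[count++] = i;
        }
    }
    return count;
}
```

Each due channel would then be converted in the interrupt, and the CPU goes back to sleep until the next tick. Initializing `countdown` to different values staggers the channels so that the slow ones do not all land on the same tick.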
Another interesting note on the ADC + DMA + timer combination: keep in mind that the timers themselves differ noticeably in current consumption, so be sure to check the datasheet before choosing a particular timer.
Despite the variety of sleep modes in the L1, they are quite limited. For example, do you want to run DMA + ADC with the CPU stopped? Then the only mode available is Sleep, because the ADC is clocked from HSI, and HSI only runs in Sleep.

The next question to investigate was which interface to use for the memory card: SPI or SDIO. Unfortunately, I don't remember the details, but for single-sector writes the consumption was about the same, whereas for multiblock writes the ratio of write speed to power clearly favored SDIO.
Multiblock writes also turned out to be more efficient in general, so buffering as much data as possible in RAM is the right decision. There was the option of adding external RAM, but it introduced many new risks, so I had to make do with the internal RAM.
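Buffering in RAM so that the card only ever sees multiblock writes can be sketched as a staging buffer that is flushed one full burst at a time. The burst length of 8 sectors and the driver function `sd_write_multiblock` are hypothetical; the real limit is how much internal RAM is left over.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE       512u
#define SECTORS_PER_BURST 8u   /* hypothetical burst length; limited by free RAM */
#define BURST_BYTES       (SECTOR_SIZE * SECTORS_PER_BURST)

typedef struct {
    uint8_t  buf[BURST_BYTES];  /* RAM staging area for one multiblock write */
    size_t   fill;              /* bytes currently buffered */
    uint32_t next_lba;          /* card sector where the next burst starts */
} sd_batcher_t;

/* Copies as much of `data` as fits into the current burst; returns bytes consumed. */
size_t batcher_fill(sd_batcher_t *b, const uint8_t *data, size_t len)
{
    size_t room = BURST_BYTES - b->fill;
    size_t n = len < room ? len : room;
    memcpy(b->buf + b->fill, data, n);
    b->fill += n;
    return n;
}

bool batcher_full(const sd_batcher_t *b)
{
    return b->fill == BURST_BYTES;
}

/* Call after the full buffer has been written with one multiblock command. */
void batcher_commit(sd_batcher_t *b)
{
    b->next_lba += SECTORS_PER_BURST;
    b->fill = 0;
}

/* Typical caller (sd_write_multiblock is a hypothetical driver function):
 *
 *   while (len > 0) {
 *       size_t n = batcher_fill(&b, data, len);
 *       data += n;
 *       len  -= n;
 *       if (batcher_full(&b)) {
 *           sd_write_multiblock(b.next_lba, b.buf, SECTORS_PER_BURST);
 *           batcher_commit(&b);
 *       }
 *   }
 */
```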
Another piece of work was the file system. The memory card had to be readable on any PC without additional software, which, given the requirements, left only FAT32. At the same time, data must not be lost on a sudden power failure.
From the energy point of view, as already mentioned, multiblock writes were much more efficient, and at the same time the number of accesses to the card had to be minimized. The problem with FAT is that it periodically accesses the FAT table, where the cluster chains are updated.
The solution turned out to be simple: before recording began, space was allocated for a large file, and during operation the data was written without the FAT library, using ordinary low-level sector writes. This also avoided the problems associated with sudden power loss.
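Writing "without the FAT library" works because a contiguous file maps to a simple range of card sectors. The sketch below uses the standard FAT32 cluster-to-sector formula based on the boot sector (BPB) fields and assumes the preallocated file occupies one contiguous cluster run; that has to be ensured when the file is created (FatFs, for example, offers `f_expand` for contiguous allocation). Names other than the BPB field names are hypothetical.

```c
#include <stdint.h>

/* Fields from the FAT32 boot sector; partition_lba is where the volume
 * starts on the card. */
typedef struct {
    uint32_t partition_lba;
    uint16_t reserved_sectors;     /* BPB_RsvdSecCnt */
    uint8_t  num_fats;             /* BPB_NumFATs */
    uint32_t fat_size_sectors;     /* BPB_FATSz32 */
    uint8_t  sectors_per_cluster;  /* BPB_SecPerClus */
    uint16_t bytes_per_sector;     /* BPB_BytsPerSec, 512 on SD cards */
} fat32_geom_t;

/* Absolute card sector of the first sector of cluster n (n >= 2). */
uint32_t fat32_cluster_to_lba(const fat32_geom_t *g, uint32_t n)
{
    uint32_t first_data_sector =
        g->reserved_sectors + (uint32_t)g->num_fats * g->fat_size_sectors;
    return g->partition_lba + first_data_sector + (n - 2u) * g->sectors_per_cluster;
}

/* Card sector holding byte `offset` of a file that occupies one contiguous
 * cluster run starting at first_cluster (the precondition of this technique). */
uint32_t contiguous_file_offset_to_lba(const fat32_geom_t *g,
                                       uint32_t first_cluster, uint32_t offset)
{
    return fat32_cluster_to_lba(g, first_cluster) + offset / g->bytes_per_sector;
}
```

Once the starting sector is known, logging reduces to multiblock writes at increasing sector numbers, and the FAT table and directory entry are never touched during the measurement.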
SD memory cards themselves are a separate story. Despite how common they are, there is simply no fully reliable information online on how the cards work internally, so it had to be gathered bit by bit. I ran into two problems: 1. cards differ in power consumption; 2. sector writes behave differently from card to card.
There is simply nothing you can do about problem 1. A simple example: take one card and write a sector, which costs N mA. After the write finishes, the card keeps drawing those N mA for, say, 64 ms while doing nothing. Take another card, and it stops drawing current immediately after the sector is written.
Problem 2. You may have heard of wear leveling: in short, a controller inside the memory card makes sure that the sectors wear evenly. Apparently there is no single standard for this, and the only mention of it I came across was (if I'm not mistaken) from Toshiba. So there is reason to believe that this controller works completely differently in different cards, and some cards have none at all.
In practice it looks like this: you write to the same sector many times. On cards that apparently lack such a controller, the sector becomes unreadable after N writes. On other cards, at certain intervals a single write suddenly takes much longer. Moreover, tests showed that this time can reach 1 second. Yes, there is no mistake in this figure. Here is an example of writing the same sector 5,000 times, with periodic jumps in write time.

In practice, I noticed no difference between brand-name and no-name cards. I tested a bunch of cards from different manufacturers. It happened that a card from one manufacturer showed excellent results in tests, while the same card bought in another store, or with a different capacity, performed terribly.
The only solution was to build a memory card tester into the device: buy a card, insert it into the device, and test it; if it passes on power consumption, buy a batch. Within one batch, the cards behaved similarly in terms of consumption.
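A tester of this kind boils down to writing the same sector many times, timing each write, and judging the statistics. The measurement loop is hardware-specific and not shown; the sketch below assumes the write times (in microseconds) have already been collected and flags writes slower than `spike_factor` times the fastest one. The acceptance thresholds are hypothetical, and the power-consumption check mentioned in the text is not covered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min_us;  /* fastest single write (UINT32_MAX if n == 0) */
    uint32_t max_us;  /* slowest single write */
    size_t   spikes;  /* writes slower than spike_factor * min_us */
} write_stats_t;

/* Summarizes repeated write timings of the same sector. */
write_stats_t analyze_write_times(const uint32_t *t_us, size_t n, uint32_t spike_factor)
{
    write_stats_t s = {UINT32_MAX, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (t_us[i] < s.min_us) s.min_us = t_us[i];
        if (t_us[i] > s.max_us) s.max_us = t_us[i];
    }
    for (size_t i = 0; i < n; i++)
        if ((uint64_t)t_us[i] > (uint64_t)s.min_us * spike_factor)
            s.spikes++;
    return s;
}

/* Hypothetical acceptance rule: reject cards whose worst write or number of
 * spikes exceeds the given limits. */
bool card_passes(const write_stats_t *s, uint32_t max_allowed_us, size_t max_spikes)
{
    return s->max_us <= max_allowed_us && s->spikes <= max_spikes;
}
```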
In the end, the STM32L151 version managed to fit into 10 mA at 1.5 V. During development many additional requirements appeared, so the initial idea that the device would sleep most of the time turned out to be fundamentally wrong. Overall this met the 3-day requirement, but it turned out that the budget also had to leave room for about 4 mA of additional features the customer wanted :). The only hope was to move the project to the STM32L4.
The main advantage of the STM32L4 was its three ADCs instead of one. Why does that matter? Each ADC can run its conversions independently, so ADC + DMA + timer no longer carried the overhead it had on the L1. Now the channels could be grouped by sampling rate. This let the processor sleep more often and spend a minimum of time processing the buffer.
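Grouping channels by sampling rate can be sketched as giving every distinct rate its own ADC, so that each ADC gets one timer trigger and one DMA buffer. This assumes at most three distinct rates (`NUM_ADCS` = 3, matching the three ADCs); with more rates, channels would have to be merged onto a common rate, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_ADCS 3u

/* Assigns each channel to an ADC so that channels with equal sampling rates
 * share one ADC. adc_of[i] receives the ADC index (0..NUM_ADCS-1) of channel i.
 * Returns the number of ADCs used, or 0 if there are more distinct rates than ADCs. */
size_t group_by_rate(const uint32_t *rate_hz, size_t n, uint8_t *adc_of)
{
    uint32_t group_rate[NUM_ADCS];
    size_t groups = 0;
    for (size_t i = 0; i < n; i++) {
        size_t g = 0;
        while (g < groups && group_rate[g] != rate_hz[i])
            g++;
        if (g == groups) {
            if (groups == NUM_ADCS)
                return 0;  /* would need a fourth ADC */
            group_rate[groups++] = rate_hz[i];
        }
        adc_of[i] = (uint8_t)g;
    }
    return groups;
}
```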
Compare the ADC clock tree of the L1

with that of the L4

As mentioned earlier, the options in the L1 are quite limited. For example, the ADC is clocked directly from HSI: if you need to measure something, HSI has to be running at 16 MHz, and nothing else will do. In the STM32L4 almost everything is configured independently, and the ADC can be clocked from several sources.
The most pleasant surprise was the MSI oscillator. The L1 has one too, but, as mentioned above, the ADC cannot be clocked from it. In my case, the difference in consumption between HSI at 16 MHz and MSI at 8 MHz was simply huge.
However, measurements of consumption versus clock frequency showed that lowering the clock to 4 MHz did not reduce consumption much, while performance dropped dramatically.
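One plausible explanation for this observation is a simple linear model in which run-mode current has a frequency-independent part: lowering the clock cuts only the dynamic part, while the job takes longer and the static part is paid for longer. The sketch below assumes that model with made-up coefficients; real numbers should come from the datasheet's current-versus-frequency tables and from measurement.

```c
/* Linear model of run-mode current: i(f) = static_ma + ma_per_mhz * f_mhz.
 * All coefficients are hypothetical. */
typedef struct {
    double static_ma;   /* frequency-independent part (regulator, leakage, ...) */
    double ma_per_mhz;  /* dynamic part per MHz */
    double sleep_ma;    /* current while waiting for the next period */
} power_model_t;

/* Average current over one period of period_s seconds in which a job of
 * `cycles` CPU cycles runs at f_mhz and the core sleeps for the rest.
 * Returns -1.0 if the job does not fit into the period. */
double avg_current_ma(const power_model_t *m, double f_mhz, double cycles, double period_s)
{
    double active_s = cycles / (f_mhz * 1e6);
    if (active_s > period_s)
        return -1.0;
    double run_ma = m->static_ma + m->ma_per_mhz * f_mhz;
    return (run_ma * active_s + m->sleep_ma * (period_s - active_s)) / period_s;
}
```

With the example model {1.0 mA static, 0.1 mA/MHz, 0 mA sleep} and a 4-million-cycle job every second, 8 MHz averages 0.9 mA while 4 MHz averages 1.4 mA, and at 2 MHz the job no longer fits the period.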
The clocks of the remaining peripherals can now also be lowered. The L4 also offered bonuses such as low-power clocking for SDIO and the ADC, where the clock runs only during actual transfers. Using the hardware CRC module also paid off. Another useful feature: peripherals can be configured so that their clocks are switched off automatically when the core enters Sleep.
The key step that made the result possible was compression. To be precise, the first version of the device also used compression, with an algorithm the customer had developed specifically for this device. However, tests showed that LZ4 compresses much better and uses significantly less CPU time. On average, 6.5 kB of data shrank to 1.5–2 kB.
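The effect of compression on card traffic is simple arithmetic. The sketch below takes the text's figures (6.5 kB/s raw, 1.5–2 kB/s compressed, with 1 kB taken as 1000 bytes) and a hypothetical multiblock burst size; it does not model LZ4 itself.

```c
#include <stdint.h>

/* Bytes produced over `seconds` at a data rate of rate_bps bytes per second. */
uint64_t bytes_over(uint32_t rate_bps, uint32_t seconds)
{
    return (uint64_t)rate_bps * seconds;
}

/* Compression ratio raw/compressed, scaled by 100 to stay in integers
 * (e.g. 325 means 3.25x). compressed must be > 0. */
uint32_t ratio_x100(uint32_t raw, uint32_t compressed)
{
    return (uint32_t)((uint64_t)raw * 100u / compressed);
}

/* Number of multiblock bursts of burst_bytes needed to store `total` bytes,
 * rounded up. burst_bytes must be > 0. */
uint64_t bursts_needed(uint64_t total, uint32_t burst_bytes)
{
    return (total + burst_bytes - 1) / burst_bytes;
}
```

Over 3 days (259,200 s), `bytes_over(6500, 259200)` is about 1.68 GB of raw data, against about 0.52 GB at 2 kB/s; `ratio_x100(6500, 2000)` gives 325, i.e. roughly 3.25 times fewer multiblock writes for the same measurements.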
Another important factor was checking every component value for leakage and thoroughly cleaning the printed circuit boards: any drop of poorly removed flux caused additional leakage. Practice has shown that boards are best cleaned with plain alcohol. Seemingly insignificant assembly defects matter a lot, so I strongly recommend paying close attention to them.
In conclusion, the move to the L4 and its higher price are fully justified for devices that need really low power consumption. In the end, the target consumption of 4.5–5.5 mA at 1.5 V was achieved. In tests, the device ran for more than 3 days on a single battery.