Is there anything more permanent in the world than temporary variables?
Browsing a hobbyist forum, I came across the traditional “My sketch doesn’t work, tell me what’s wrong” post. Such posts make up only slightly less than all of them, but the title mentioned working with an SD card, so I decided to take a look. What pleased me most was a reply along the lines of “most likely it’s the card, but I decided to ask to see the code, maybe I’ll see something interesting”; I cannot vouch for the exact wording, but I have conveyed the meaning. Indeed, there was a lot of interesting material there, and my eye caught on the expression
unsigned long Interval = 2000; .... while (micros() < StartInterval + Interval) {};
where the variable was used purely as a constant that sets the delay. Let's leave aside this way of handling time; that is not the subject here.
My first reflex was to write that #define should be used, but then, as in the fable, “I grew thoughtful, still holding the cheese in my mouth”. Perhaps I am missing something, and constants in this form can be beneficial under certain conditions? A careful reading of the manuals for various microcontrollers (MCUs) led to interesting findings, which (together with the answer to my question) I am going to share below.
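For reference, the spellings in question can be put side by side. This is a minimal sketch in plain C, not the code from the forum: the names INTERVAL_US, interval_us_const, interval_us_var and the deadline_from_* helpers are made up for illustration, and the Arduino micros() call is replaced by an explicit start argument so that the fragment compiles anywhere. Only the value 2000 comes from the original sketch.

```c
#include <stdint.h>

/* Hypothetical names; only the value 2000 comes from the forum sketch. */
#define INTERVAL_US 2000UL                          /* preprocessor constant: no object is created  */
static const uint32_t interval_us_const = 2000UL;  /* typed constant: the compiler may fold it      */
uint32_t interval_us_var = 2000UL;                  /* plain variable, as in the forum sketch        */

/* The same deadline computed from each of the three spellings. */
uint32_t deadline_from_macro(uint32_t start) { return (uint32_t)(start + INTERVAL_US); }
uint32_t deadline_from_const(uint32_t start) { return start + interval_us_const; }
uint32_t deadline_from_var(uint32_t start)   { return start + interval_us_var; }
```

All three functions return the same value; what differs is where the number 2000 ends up in the generated code, which is the question the rest of the text deals with.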
To begin with, let us fix a number of statements about the architecture of modern MCUs, on which we will rely in the discussion that follows.
First, an MCU has at least two types of memory, program memory and data memory, and they are accessed in different ways. As a rule there are two physically separate buses even when the address space is unified and linear; often there are different instructions for accessing the different types of memory, and the address spaces may overlap.
Second, these two types of memory can have different word widths (by width I mean the number of bits in a word), and neither has to equal the width of the MCU registers (which in turn can be changed by combining registers into groups). This adds a special piquancy to the rest of the discussion.
Let us illustrate this with the specific architectures that we will keep referring to.
Number zero is most likely the champion in architectural beauty, though unfortunately not in efficiency: the PDP-11 from DEC, my first love in technology. Sadly, it is no longer with us, but its memory will always remain in our hearts (and not only the memory: many modern architectures were clearly inspired by its ideas, for example the MSP430 from TI). The register width (R) is 16 bits, the program word (P) is 16 bits, the data word (D) is 16 bits (the high and low bytes of a word can be accessed separately), the address bus (A) is 16 bits (20 in extended mode), the address space is linear and unified, and all instructions are uniform.
The first architecture is the 8051 from Intel, an undoubted classic that is still not dead, which makes it a great candidate. The registers are 8 bits wide, program words are 8 bits, data words are 8 bits, the address bus is 16 bits; the address spaces overlap and form three sections (program, data, external data), there are three different instructions for reading from the different sections, and registers can form register pairs.
The second architecture, AVR (Tiny, Classic, Mega) from Atmel, is widespread (Arduino) and quite good, so I see no reason not to look at it, especially since the program in question was written for it. The registers are 8 bits wide, program words are 16 bits, data words are 8 bits, the address bus is 16 bits; the address spaces overlap and form two sections (program and data), there are two different instructions for reading from the different sections, and registers can form register pairs.
The third architecture is the undoubted leader at present, Cortex-M from ARM, so I cannot leave it out. I will not name a specific implementation; there are thousands of them. The registers are 32 bits wide, program words are 32 bits (16 in Thumb mode), data words are 32 bits, the address bus is 32 bits, the address space is linear and unified, and all instructions are uniform.
Undoubtedly there are many more architectures worth considering, but some are of historical interest only (i8048, PDP8), others are not as common (Sparc, MIPS, PDP11 / 78), some I know only superficially (PIC, Scenix), and some I frankly dislike (HC08, x86). Interesting solutions from these will be mentioned too, but the focus will be on the four listed above.
So, let's state the task precisely: we need to put into a processor register a predefined number whose value is known statically at compile time (a constant). Let us look in more detail at how this can be done and what the advantages and disadvantages of each method are.
The first and obvious way is direct loading: the constant is part of the instruction itself, that is, the instruction word consists of the operation code and the constant in its pure form. This method is free of the flaws inherent in the other methods, but it has to be practically feasible, which means that the program word must be wider than the register (P > R), otherwise such instructions simply cannot be implemented. And not just wider but considerably wider, so that more than one load instruction can be implemented (or at least a load into more than one register). Architecture 2 uses this method perfectly well, but for the others it is simply impossible, since the basic condition is not met.
To change the ratio between the register width and the program word width and satisfy the condition R < P, there are two (equally obvious) ways: reduce the first value or increase the second.
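How tight the bit budget of direct loading is can be seen on architecture 2. The sketch below packs the AVR LDI instruction into its 16-bit word, assuming the bit layout 1110 KKKK dddd KKKK from the AVR instruction set manual; the function name avr_encode_ldi is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Encode AVR "LDI Rd, K". Assumed layout: 1110 KKKK dddd KKKK, valid for r16..r31 only. */
uint16_t avr_encode_ldi(unsigned rd, uint8_t k)
{
    assert(rd >= 16 && rd <= 31);                      /* only the upper 16 registers fit */
    return (uint16_t)(0xE000u
                      | ((uint16_t)(k & 0xF0u) << 4)   /* high nibble of K -> bits 11..8  */
                      | ((rd - 16u) << 4)              /* register number  -> bits 7..4   */
                      | (k & 0x0Fu));                  /* low nibble of K  -> bits 3..0   */
}
```

Four bits of opcode, eight bits of constant and four bits of register number fill the word completely, which is why only half of the registers can be loaded this way. A 16-bit value such as 2000 (0x07D0) needs two such instructions, for example avr_encode_ldi(24, 0xD0) and avr_encode_ldi(25, 0x07).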
Let's start with the first way and introduce indirect loading: instead of a “pure” constant of width R, the instruction carries a narrower field of width R' < R with information on how to form the constant. All architectures use this method in one form or another. After all, the clear instruction (CLR) is a load of the special value 0, and the constant is implicitly present in the instruction code. Some architectures also have an instruction that sets all the bits of a register (SET), and the increment instruction, which performs an operation with the constant 1, is present almost everywhere. Beyond that, various modifications of the method are possible, depending on the features of a particular MCU and the imagination of its designers.
For example, the STM8 (a great MCU, by the way, especially considering its price, and I don’t understand why it did not become a de facto standard in its time) has a special constant register which, combined with the different addressing modes, provides five frequently encountered constants (-1, 0, 1, 2, 4).
ARM showed another approach: the constant field encodes two fields, XX and YY, together with the way they are used, which makes it possible to form a set of constants ranging from the simplest (0, 1, 2, -1 and the like) to striped patterns (010101 ...). It should be understood that the number of possible constants cannot exceed 2^(length of the encoding field). Indirect loading keeps all the advantages of direct loading, such as a uniform execution flow and uniform pipeline loading, and has almost no drawbacks apart from the limited set of values.
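To make the idea of an encoded constant concrete, here is a sketch of a check for whether a 32-bit value can be expressed that way. It assumes that the scheme meant above is the Thumb-2 “modified immediate” encoding as I understand it from the ARMv7-M manual: a byte XY placed as 0x000000XY, 0x00XY00XY, 0xXY00XY00 or 0xXYXYXYXY, or an 8-bit value shifted to another position in the word. The function name is hypothetical, and the details should be checked against the architecture manual before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if v fits the assumed Thumb-2 modified-immediate patterns. */
bool thumb2_imm_encodable(uint32_t v)
{
    uint32_t lo = v & 0xFFu;
    uint32_t hi = (v >> 8) & 0xFFu;

    if (v == lo)                        return true;  /* 0x000000XY */
    if (v == (lo | (lo << 16)))         return true;  /* 0x00XY00XY */
    if (v == ((hi << 8) | (hi << 24)))  return true;  /* 0xXY00XY00 */
    if (v == lo * 0x01010101u)          return true;  /* 0xXYXYXYXY */

    /* An 8-bit value shifted left by 1..24 bits (no wrap-around). */
    for (unsigned s = 1; s <= 24; s++) {
        if ((v & ~(0xFFu << s)) == 0)
            return true;
    }
    return false;
}
```

Under these assumptions the value 2000 from the forum sketch is encodable (it is 0xFA shifted left by 3), and so is the striped pattern 0x55555555, while something like 0x12345678 is not and has to be obtained in another way.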
We continue along the first way and reduce the register width R. At first glance the idea is absurd, because we cannot do that. It turns out we can, but for this we need to treat the register as a set of narrower segments and operate on them one at a time.
A striking example of this approach is the MIPS architecture, which has instructions for loading the lower and the upper half of a register. Where is the gain, if loading the entire register takes as many as two instructions? The gain comes from the fact that many typical constants can be obtained by loading only the lower half of the word and extending its most significant bit into the upper half; that is, any number from -2^(R/2 - 1) to 2^(R/2 - 1) - 1 can be obtained with one instruction. If we also add a two-bit field encoding one of four possible operations on the upper half (clear, set, sign-extend, repeat the lower half), the number of constants that can be formed grows even more. The main thing here is to stop in time, because every such extension reduces the space left for other instructions.
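The reasoning about halves can be sketched as a small planner that decides how many load instructions a 32-bit constant needs. It is an illustration under the assumption of a 32-bit register and 16-bit immediates, in the spirit of the MIPS lui / ori / addiu instructions; the type split_load_plan and both function names are made up.

```c
#include <stdint.h>

typedef struct {
    int      instructions;  /* how many load instructions are needed */
    uint16_t upper;         /* upper half of the constant            */
    uint16_t lower;         /* lower half of the constant            */
} split_load_plan;

split_load_plan plan_split_load(uint32_t c)
{
    split_load_plan p;
    p.upper = (uint16_t)(c >> 16);
    p.lower = (uint16_t)(c & 0xFFFFu);

    if (c <= 0x7FFFu || c >= 0xFFFF8000u)
        p.instructions = 1;   /* lower half, sign-extended: -32768..32767 */
    else if (p.upper == 0)
        p.instructions = 1;   /* lower half, zero-extended                */
    else if (p.lower == 0)
        p.instructions = 1;   /* upper half only                          */
    else
        p.instructions = 2;   /* both halves must be loaded               */
    return p;
}

/* What the two-instruction sequence produces in the register. */
uint32_t combine_halves(uint16_t upper, uint16_t lower)
{
    return ((uint32_t)upper << 16) | lower;
}
```

With these rules 2000, -1 (0xFFFFFFFF) and 0x12340000 each take one instruction, and only a value such as 0x12345678 needs both.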
Now let's take the second way and start increasing the program word width P. Since we don't want to change the parameters of the architecture significantly, the only option left is to form a new unit out of more than one program word and interpret it as an instruction of extended length P' > P. Let us call this method immediate loading, as the analogous addressing mode in architecture 0 is called. The instruction itself keeps its standard length and carries no information about the constant in its body, but the program word following the instruction is treated as the constant. Note that this method requires the condition P >= R, which constrains the MCU architecture, but much less severely than direct loading does. Besides, nothing prevents us from extending this approach and, if necessary, using more than one program word to represent a constant.
Since this method is used only slightly less often than always, let us consider it in more detail. It can be implemented either as an ordinary move instruction with a special addressing mode (027 in my favorite architecture) or as dedicated load instructions (LDI). The advantage is obvious: we can form absolutely any constant of arbitrary length, and the hardware implementation is very simple. The drawbacks are less obvious, but they exist: execution time grows (at least for the second and subsequent program words), the instructions are not uniform (which complicates disassembly), and the pipeline (if there is one) is loaded unevenly. The last (I am not sure it really is the last, but I cannot think of more) but not least drawback is memory redundancy: every time a constant appears in the code, space is allocated for it in program memory. The previously described methods spend memory too, but there the part of the instruction that holds the constant could not be used for anything else anyway (frankly, this is debatable for VLIW architectures, but we do not consider them), so things were not so bad. Here we have to allocate an additional memory word each time, and the drawback becomes glaring. But there is no escaping it: “one has to pay for everything in this world”.
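As an illustration of immediate loading, the sketch below builds the two program words of the architecture 0 instruction MOV #value, Rn: the opcode word with source mode 27 (octal), followed by the constant itself. The struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t opcode;   /* first program word: the instruction itself */
    uint16_t operand;  /* second program word: the constant          */
} pdp11_mov_imm;

/* MOV #value, Rn: opcode 01, source mode 2 on register 7 (PC), destination mode 0 on Rn. */
pdp11_mov_imm pdp11_encode_mov_imm(uint16_t value, unsigned rn)
{
    pdp11_mov_imm w;
    assert(rn <= 7);
    w.opcode  = (uint16_t)(012700u | rn);  /* octal literal, as PDP-11 code is usually written */
    w.operand = value;
    return w;
}
```

Every use of the constant costs one extra program word here, which is exactly the memory redundancy described above.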
The topic of direct and immediate loading of a constant into a register can be considered closed. We have considered all the possible implementations and see that, along with undoubted advantages, this method either has limited applicability, or places significant demands on the architecture parameters of the MCU, or increases the program size.
But those who first met MCUs through architecture 0 (I strongly recommend getting acquainted with it; I have not met a cleaner, more transparent implementation or a clearer description) will immediately ask: could indirect addressing (code 037) in its various forms be used as well? Of course the correct answer is “yes”, otherwise I would not have asked the question, but things are not that simple. First, briefly about the essence of the method: the instruction contains information not about the operand itself (in our case, the constant) but about its location in the address space, from which the constant can be fetched. Naturally, for constants this method is acceptable only if the address information is narrower (and much narrower) than the register. We can, in architecture 0, specify a 16-bit address instead of the 16-bit constant itself, but then we keep all the drawbacks of the immediate method, somewhat aggravate the execution time problem, and gain nothing in return. In none of the architectures I mentioned is the condition A < R satisfied, so the method looks hard to apply. But, as always, there are nuances.
First, the address width can be reduced by singling out special regions (and creating special instructions for working with them), as is done in the already mentioned STM8, where the address is 24 bits wide in general, but there is a special prefix for working with the lower 64 KB (hmm, “the lower 64 KB” on an MCU die; how I would have laughed at such a notion in my youth, when an address extender existed only in the SM-4, whose processor module occupied half of a two-meter rack), in which case the address is only 16 bits wide, and just 8 bits when working with the first 256 bytes of the address space. But this method is of mostly theoretical interest, since none of the architectures under consideration accesses constants this way.
The second method is much more promising; in architecture 0 it is called “index” or “indirect with offset” addressing (code 06x). As always, a few details: the address of the operand (for us, the constant) is formed by adding the value of an index register and an offset specified in the instruction itself. In architecture 0 the index register can be any general-purpose register, in architectures 1 and 3 only some of them, and in architecture 2 special register pairs. Of course, if the index register has to be set up beforehand, we gain nothing in principle, but there are two options that make this approach acceptable. The first is to organize a pool of constants, set a pointer to it in some register and then address the pool by offset. This pays off when there are many constants, but at the cost of a register, which is not always acceptable. The second option is much more interesting: use the instruction counter as the index register. We (well, not we but the compiler, but what is the difference) know its value at the moment any instruction executes, so applying an offset to it gives a completely unambiguous result, and for free, because we were not going to (and often could not) use the instruction counter as a general-purpose register anyway. This method has its limitations: to save space we use fairly short offsets, so such a local pool of constants is not reachable from the whole program. We do save a lot, at least in instruction length, but with execution time things are not so straightforward.
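The arithmetic of a pool addressed relative to the instruction counter can be sketched as follows. The example assumes the 16-bit Thumb “LDR (literal)” form as I understand it: the base is the instruction address plus 4, rounded down to a multiple of 4, and the offset is an 8-bit field counted in 4-byte words. The function names are hypothetical.

```c
#include <stdint.h>

/* Address of the literal selected by an 8-bit word offset (assumed Thumb LDR literal rules). */
uint32_t literal_address(uint32_t insn_addr, uint8_t imm8)
{
    uint32_t base = (insn_addr + 4u) & ~3u;   /* instruction counter as seen by the load, word-aligned */
    return base + 4u * imm8;
}

/* Offset field needed to reach lit_addr from insn_addr, or -1 if the literal is out of reach. */
int literal_imm8(uint32_t insn_addr, uint32_t lit_addr)
{
    uint32_t base = (insn_addr + 4u) & ~3u;
    uint32_t delta;

    if (lit_addr < base)
        return -1;                            /* the pool must follow the instruction */
    delta = lit_addr - base;
    if ((delta & 3u) != 0 || delta > 1020u)
        return -1;                            /* misaligned or farther than 255 words */
    return (int)(delta / 4u);
}
```

The short offset is what keeps the instruction at 16 bits, and it is also why the pool has to sit within about 1 KB after the code that uses it.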
To clarify the last statement, let us look once more at how memory is organized in an MCU and recall that there are at least two types of memory, program and data, which differ not in name or in location in the address space but in the physical principles of operation. (There are implementations, for example in the MSP430 family, where both types of memory sit on one universal FRAM medium and are fundamentally indistinguishable, but it is hard to call such families particularly successful, which brings to mind the well-known saying “Who is a generalist? Someone who can do a wide variety of things equally badly”, so we will not consider such implementations closely.) Since the physical media differ, they are accessed differently and can (and will) have different parameters, including access time.
Program memory (or part of it) must be non-volatile, otherwise the device could not operate, and that is its advantage. Since data memory exists as well, it is logical to assume that it has advantages of its own, otherwise its use would be hard to explain. Indeed, data memory is much faster to write (or rather, program memory is much slower, which does not change the comparison), the physical size of a data memory cell is much smaller, it is easier to integrate with the MCU itself in terms of manufacturing technology, and so on. But there is one more advantage that matters for our subject: it is usually faster to read (this is true only for static memory, but I have not yet seen dynamic memory inside an MCU, although who knows what lies ahead). Manufacturers resort to various tricks so that this undoubted drawback of program memory does not affect the performance of the MCU as a whole, but there are not many options, and practically only one is actually implemented: a buffer in one form or another, a kind of program cache. It can be explicit, it can be hidden, it can simply take the form of a wider word read from program memory, but all these methods speed up reading of consecutive memory cells and are not very effective for random access.
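Why a read buffer helps sequential fetches and not random ones can be shown with a toy model. Everything in it is an assumption made for illustration: a single buffered line of 16 bytes, 3 wait states for a fresh read, and the names flash_buffer, sequential_cost and alternating_cost; real devices differ in all of these.

```c
#include <stdint.h>

typedef struct {
    uint32_t line_bytes;    /* width of one buffered read (assumed)          */
    uint32_t wait_states;   /* extra cycles for a fresh memory read (assumed) */
    uint32_t current_line;  /* index of the line held in the buffer          */
    int      valid;         /* 0 until the first read                        */
} flash_buffer;

/* Cycles spent on one read: 1 on a buffer hit, 1 + wait states on a miss. */
uint32_t flash_read_cycles(flash_buffer *fb, uint32_t addr)
{
    uint32_t line = addr / fb->line_bytes;
    if (fb->valid && line == fb->current_line)
        return 1;
    fb->current_line = line;
    fb->valid = 1;
    return 1 + fb->wait_states;
}

/* Cost of reading `count` items of `step` bytes one after another. */
uint32_t sequential_cost(uint32_t start, uint32_t count, uint32_t step)
{
    flash_buffer fb = { 16, 3, 0, 0 };
    uint32_t total = 0;
    for (uint32_t i = 0; i < count; i++)
        total += flash_read_cycles(&fb, start + i * step);
    return total;
}

/* Cost of bouncing between code at `a` and a constant at `b`, `pairs` times. */
uint32_t alternating_cost(uint32_t a, uint32_t b, uint32_t pairs)
{
    flash_buffer fb = { 16, 3, 0, 0 };
    uint32_t total = 0;
    for (uint32_t i = 0; i < pairs; i++) {
        total += flash_read_cycles(&fb, a);
        total += flash_read_cycles(&fb, b);
    }
    return total;
}
```

In this model sixteen consecutive 2-byte reads cost 22 cycles, while sixteen reads that alternate between two distant addresses cost 64, because every one of them misses the buffer.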
Returning to the placement of constants and their indexed addressing, we see that, since the indexing is relative to the instruction counter, the constants are located in program memory and have to be read from there. So after the instruction has been decoded and the effective address formed, we face the tedious procedure of reading the constant from program memory.
Of course, if program memory is accessed with an extended-width word and the constant is well aligned, we can read it in one go, for example in Thumb mode on architecture 3, but the delay for the first read is unavoidable. It may be possible to overlap the addressing phase with decoding of the next instruction, since that instruction is already moving along the pipeline and we do not have to skip a phase as with immediate loading, but here my reasoning is somewhat speculative. So I do not really understand what led ARM to this decision; it is quite possible that when the program is placed in a special section of data memory, the gain from keeping the pipeline loaded is large enough to compensate for the expected losses.
Constants in program memory:
3+. They are ready for use immediately after the device is switched on, which matters when printing diagnostic messages.
1-. As a rule, they are slower by themselves, and they are often accessed with slower instructions than data memory, which does not speed things up.
2-. They reside in non-volatile memory and cannot be modified at all (note that I have just praised this), so I am not being consistent.
Constants in data memory:
1+. Access to them is usually faster, and using them can save program memory by avoiding duplication of frequently used constants.
2+. They can be modified if necessary, which may require careful programming if you have a good architecture (at last I will explain what I mean by these words: the presence of a mechanism that protects address space segments and controls access to them).
3+. They are the default in many architectures and programming systems and are, in a sense, natural.
Having weighed all the pros and cons, we can say when a set of constants in program memory is definitely preferable: when data memory is scarce, or when the set is used only once in the program (for example, to print a welcome line). If the set has to be modified, the only possible solution is a set of constants in data memory. All other cases can be left to the compiler, because now you are doing so quite consciously, knowing the possible consequences.
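The “welcome line used once” case might look like this on architecture 2. The sketch assumes avr-gcc with avr-libc, whose avr/pgmspace.h header provides PROGMEM and pgm_read_byte; on any other compiler the two are replaced by trivial stand-ins so that the fragment still builds. The names greeting, greeting_char and greeting_length are made up.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>                 /* avr-libc: PROGMEM, pgm_read_byte */
#else
#define PROGMEM                           /* host build: no separate program memory */
#define pgm_read_byte(p) (*(const unsigned char *)(p))
#endif

/* The string stays in program memory and takes no data memory on AVR. */
static const char greeting[] PROGMEM = "Hello";

/* Read one character through the program-memory access path. */
char greeting_char(size_t i)
{
    assert(i < sizeof greeting);
    return (char)pgm_read_byte(&greeting[i]);
}

/* Length of the greeting, computed by reading it character by character. */
size_t greeting_length(void)
{
    size_t n = 0;
    while (greeting_char(n) != '\0')
        n++;
    return n;
}
```

The price is visible in the code: every character goes through a dedicated read instead of an ordinary memory access, which matches the first minus in the list for program memory.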
The results below can be checked on godbolt.org, which offers compilers for msp430, mips, avr and arm.
Architecture 0. Placing the constant in data memory requires more space in program memory and more space in data memory, and is unequivocally slower; there is no reason for such a decision.
Architecture 1. The previous phrase can be repeated word for word.
Architecture 2. The situation is even worse for the proposed method, since for constants that fit in a byte there is the adiw instruction, which speeds up the other method.
Architecture 3. Here there is a chance: if the program is located in data memory (many implementations of this architecture support this), then indexed access to data memory may be faster than fetching a constant from program memory, so performance can improve, though at the cost of a larger data section and a larger program. But if the constant can be formed by encoding, there is nothing to choose: placement in program memory wins unconditionally.
And finally, a small reminder in case you decide to modify constants “in place”, without making a copy. In that case you can observe a very funny phenomenon (funny, that is, for an outside observer; you will not be nearly as amused). Some compilers allocate space for identical constants, especially string constants, only once, so by changing one of them you may find that another has changed as well. Of course, if you have a good architecture you will get a runtime error, but if you don't, an exciting search is guaranteed: “Just a week of debugging can save you an hour of thinking through the program architecture.” Perhaps there are compiler options that prevent this behavior, but I personally do not know them, and most likely they would be non-portable. Therefore this technique, attractive as it is in some respects, should be treated as a dirty trick and avoided in every way unless you have a good reason to do otherwise. “No animal shall kill any other animal ... without a good reason.”
One more thing: I am planning a series of articles on working with interfaces such as UART (Modbus), SPI (SD card) and USB.
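The safe alternative to modifying a constant in place is a private copy. In the sketch below two identical string literals may or may not share storage (the C standard leaves that to the compiler), so nothing is ever written through them; the change is made to an array initialised from the literal. The names first, second, literals_share_storage and modify_private_copy are made up for the example.

```c
/* Two textually identical literals: the compiler is allowed, not required, to store them once. */
static const char *const first  = "mode";
static const char *const second = "mode";

/* 1 if this particular compiler merged the two literals, 0 otherwise. */
int literals_share_storage(void)
{
    return first == second;
}

/* Modify a private, writable copy; both literals must stay untouched. */
int modify_private_copy(void)
{
    char copy[] = "mode";   /* array initialised from the literal: its own storage */
    copy[0] = 'M';
    return copy[0] == 'M' && first[0] == 'm' && second[0] == 'm';
}
```

Writing through first or second directly would be undefined behaviour in C, and with merged literals it is exactly the “changed one, the other changed too” effect described above.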
The planned parts:
7. HAL,
8. The compromise between portability and efficiency,
9. Modular middleware organization,
10. Related issues.
At the moment I am finishing the first two parts, which do not depend on a specific interface, and am pondering which direction to take next. If the topic interests you, the survey is waiting for you.