On the question of pins

The Day of Knowledge is dedicated to ...


This post is devoted to something every Arduino user deals with (Arduino is hereinafter referred to as A; keep in mind that this letter stands both for the chip itself and for the development environment), namely working with I/O ports.

You may ask what exactly there is to consider. The port functions are clearly documented, there are plenty of examples, so using ports presents no difficulty. The introductory LED-blinking program uses these functions and works fine, so what are we talking about?

All this is true, but only until you need to connect something to A beyond an LED (I have nothing against these wonderful devices, but by their nature they rarely demand any particular speed). Then you need to work with pins efficiently (for brevity we will call the I/O ports pins), and questions such as “why does my program run so slowly” appear on the forums, to which the young gurus instantly answer “work with the registers directly and you will be happy” and show exactly how, in their opinion, it should be done.
Although their answer contains a grain of truth, it is far from complete, not quite correct (look at how they recommend working with registers), not quite optimal, and not very clear. This post is intended to fill the gap between the standard sketches and that kind of answer.

Since I expect readers with different levels of training, I will try to cover a range of knowledge about MCUs (microcontrollers). If something seems well known to you (and there will be many such places), feel free to skip that fragment; if something is not entirely clear, the comments exist precisely for asking questions. I read them and try to answer as far as I can.

First, a little theory. Pins, as their definition implies, are how the program you write interacts with the outside world (the inner world being just the MCU itself). To use pins you must know they exist and, since there is more than one, be able to point at a particular one (this is called naming, or addressing). So every pin available to you (there are inaccessible ones too, but they are of no practical interest) must have a unique parameter that identifies it. In A this parameter is the pin number and, although this is not the only possible approach, it is not a bad one. Beyond that, all you need to know is how a pin with a given number is connected to a particular MCU pin and, accordingly, to a contact on the header through which you attach external devices to your board (it is an A board, of course, but since you bought it, it is yours), and hence how a particular pin affects a particular external device.

For example, if you want the LED on the A board to light up, you need to drive pin 13 high, and to clock data through a shift register chip you need to produce a low-to-high transition on pin 6, and so on. In the first case, LED control, further reasoning makes little sense: the receiver of the information is the human eye, whose speed does not exceed a dozen or so hertz. For the shift register, however, the time it takes to change the level on the pin matters a great deal, because transferring data to an external device (a seven-segment indicator, say) takes many such changes, and their total time may be unacceptable in a particular case (it will block other parts of the program that are sensitive to the polling period, such as encoder handling, and disrupt their work). And if you drive the pins of a software SPI this way, the resulting speed of the SD card will surprise you very unpleasantly. So speeding up pin access is a quite practical task.

To control pins, A provides predefined functions, the main one being digitalWrite, which takes the number of the pin to modify and the value the pin should have once the function has executed. But if your troubles end once you have written digitalWrite(13, LOW) (provided you did not forget the pin mode command somewhere earlier), for the runtime system they are only beginning. There are MCU architectures in which every pin really does have a unique address, so that your command maps easily onto the MCU instruction set, which is the job of the runtime system (the compiler plus the system library). Atmel, however, at the time A was created, did not indulge its admirers with such delights (this is not quite true, but it will do as a first approximation). The Mega family of microcontrollers, on which the A platform was historically based, uses a somewhat different scheme. Here the outside world is reached not through individually addressed pins but through input/output ports, each a group of pins (no more than 8 in this case), so physically every pin has two parameters: the port name (a letter from A to E in different members of the MCU family) and the bit number within the port (a digit from 0 to 7). For example, pin 13 may have the physical address PB.5 in one MCU and PC.0 in another.
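The two-parameter addressing can be sketched on a host machine as follows. This is only an illustration: the names PortBit, pinToPortBit and bitMask are hypothetical, and the table assumes the common layout of an ATmega328P-based board (digital pins 0 to 7 on port D, 8 to 13 on port B); other boards map pins differently.

```cpp
#include <cstdint>

// Physical address of a pin: port letter plus bit number inside the port.
// (Hypothetical type, not part of the Arduino core.)
struct PortBit {
    char    port;  // port letter, e.g. 'B' or 'D'
    uint8_t bit;   // bit number inside the port, 0..7
};

// Assumed layout: digital pins 0-7 -> PD0..PD7, pins 8-13 -> PB0..PB5.
constexpr PortBit pinToPortBit(uint8_t pin) {
    return pin < 8  ? PortBit{'D', pin}
         : pin < 14 ? PortBit{'B', static_cast<uint8_t>(pin - 8)}
                    : PortBit{'?', 0};  // no such digital pin in this sketch
}

// The mask used for read-modify-write on the port register.
constexpr uint8_t bitMask(PortBit pb) {
    return static_cast<uint8_t>(1u << pb.bit);
}
```

With this layout pin 13 resolves to PB.5, the first of the two physical addresses mentioned above, and its mask is 0x20.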

So the first task of the runtime system is to convert the pin number into its physical representation. This can be solved in various ways and, in my opinion, A does not do it in the best way; fortunately, the system is open, and we can make the necessary changes and corrections.

First of all, note that the runtime system consists of two components: the compiler (this collective name actually hides at least the preprocessor, the compiler proper, the assembler, the linker and the librarian) and the execution subsystem, represented by the library modules (sketches). The conversion can therefore be done at compile time, at run time, or split between the two. It is preferable to do as much as possible at compile time, since the time spent there is negligible compared with the time spent actually writing the code, whereas any cost (memory or time) at run time consumes MCU resources, which are limited compared with those of the development platform. Unfortunately this is not always achievable, but when the exact composition of the runtime system is known and all the source files are available, it can be very useful. But I have run a little ahead; let us pause and look at how the pin function is implemented in A. Here is its source code.

    void digitalWrite(uint8_t pin, uint8_t val)
    {
        uint8_t timer = digitalPinToTimer(pin);
        uint8_t bit = digitalPinToBitMask(pin);
        uint8_t port = digitalPinToPort(pin);
        volatile uint8_t *out;

        if (port == NOT_A_PIN) return;

        // If the pin that support PWM output, we need to turn it off
        // before doing a digital write.
        if (timer != NOT_ON_TIMER) turnOffPWM(timer);

        out = portOutputRegister(port);

        uint8_t oldSREG = SREG;
        cli();

        if (val == LOW) {
            *out &= ~bit;
        } else {
            *out |= bit;
        }

        SREG = oldSREG;
    }

I note right away that this code, like the source code that follows, is taken from here; it does not look like a fake, and I came across exactly the same code in other sources, so we will treat it as the real code of A.

Before we go through the text, let me express my displeasure with the signature of the function, a displeasure that is well founded. Both parameters are obviously not arbitrary integers but take a limited set of values, so they should be defined as enumerated types. That lets the compiler check the actual arguments, at least for constant expressions, and this is what I have done in my implementation. Now we can turn to the code.
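A minimal sketch of what such a typed signature might look like, assuming C++11 scoped enumerations. The names Pin, Level, digitalWriteTyped and the variable fakePortB (a plain byte standing in for the real PORTB register so that the example runs on a host) are hypothetical and not part of A's libraries.

```cpp
#include <cstdint>

// Only the pins of one port are listed to keep the sketch short.
enum class Pin   : uint8_t { D8 = 8, D9, D10, D11, D12, D13 };
enum class Level : uint8_t { Low, High };

static uint8_t fakePortB = 0;  // stands in for PORTB on the host

inline void digitalWriteTyped(Pin pin, Level level) {
    // D8..D13 are assumed to sit on bits 0..5 of the port.
    const uint8_t mask =
        static_cast<uint8_t>(1u << (static_cast<uint8_t>(pin) - 8));
    if (level == Level::Low) {
        fakePortB &= static_cast<uint8_t>(~mask);
    } else {
        fakePortB |= mask;
    }
}

// digitalWriteTyped(137, 1);  // does not compile: there is no implicit
//                             // conversion from int to Pin or Level
```

A caller can still force an arbitrary number through with an explicit static_cast, but then the intention is spelled out in the source.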

The first thing we see is the conversion of the pin number into a physical address, implemented by reading from a table of constants. Let us look at this operation in more detail: we walk through the source text and find the operation that obtains the bit mask:

 #define digitalPinToBitMask(P) ( pgm_read_byte( digital_pin_to_bit_mask_PGM + (P) ) ) 

And we find that this is a macro which passes the pin number on to another function, adding the name of the table where the bit masks are stored. Next we find that:

 #define pgm_read_byte(address_short) pgm_read_byte_near(address_short) 

This is also a macro wrapper that passes its arguments on to the next function, and it immediately turns out that:

 #define pgm_read_byte_near(address_short) __LPM((uint16_t)(address_short)) 

This is ... yes, right, a macro wrapper (you are beginning to grasp the principle) that passes its arguments on to the function:

 #define __LPM(addr) __LPM_enhanced__(addr) 

Which is (who would have thought) a macro wrapper for a function that is ... and here you did not guess, it all had to end sooner or later: a real macro that expands to an inline assembler insert. I do not quite understand the need for four wrappers, but since we agreed (or rather, I stated it, and you did not argue, or argued but I did not notice) that compile time costs nothing, we will not dwell on this. Let us just be surprised out loud, record our surprise on paper, and go on looking at the code.

    #define __LPM_enhanced__(addr)          \
    (__extension__({                        \
        uint16_t __addr16 = (uint16_t)(addr); \
        uint8_t __result;                   \
        __asm__                             \
        (                                   \
            "lpm %0, Z" "\n\t"              \
            : "=r" (__result)               \
            : "z" (__addr16)                \
        );                                  \
        __result;                           \
    }))

Let us look more closely at the text of the function that reads data from the table; there are a number of interesting points. Since we are dealing with an assembler insert, we have to take the specifics of the MCU architecture into account; what matters to us is that it is an 8-bit architecture that works on data through registers.

Let us look at the first two lines, where the actual parameter is taken not from the macro argument directly but from the intermediate variable __addr16, and ask why it was done this way, since it looks like an unnecessary copy. The answer I came up with: it may really be needed if we want to use expressions that cannot be computed statically as the pin number, that is, expressions whose value cannot be determined at compile time. Then, when the macro is expanded, code is generated that computes the expression on the fly, the result is placed in the intermediate variable and then used as intended. In other words, this accommodates a feature of the preprocessor rather than of the MCU architecture, and we pay for it in speed. You have already guessed that I am not thrilled by a practice in which every call of the function, including calls with constant arguments (the overwhelming majority), pays for the ability to use statically undefined expressions.

Let us think about alternatives right away. First, there could be two implementations selected by the kind of parameter, if this preprocessor allows it (I do not know it well enough to offer such a solution). Second, statically undefined expressions could simply be prohibited, so that digitalWrite(BasePinNumber+6, LOW) results in a compiler error and has to be turned into int PinNumber=BasePinNumber+6; digitalWrite(PinNumber,LOW), which seems to me a reasonable price for better performance in all other cases.

(Later note. Inspecting the actual code showed that this line generates no additional code with the compiler in question: removing it and passing the macro argument straight to the assembler insert does not change the generated code at all. So this legacy of the accursed past does not affect speed and can stay. Still, I do not want to delete the text above; the idea was not a bad one.)

A necessary disclaimer: although I discuss the shortcomings of the implementation of A's functions, I have to admit that I do not have the board. That is, I do have one, but it is the so-called A-compatible Intel Edison board which, although it can be programmed from A, is in no way a replacement for an A board proper. The next sin I must confess: I do not use the A development environment, for which there are many reasons ...

There is a great joke on this subject. After one of the battles, Napoleon asks his marshal of artillery why the guns did not fire. The marshal replies: "There were many reasons, Sire. First, we had no shells. Second ..." Napoleon interrupts him: "Enough."

... of which the decisive one at the moment is that it is very inconvenient to inspect intermediate files, including the assembly code. Therefore the results below refer to code obtained from the online compiler at gcc.godbolt.org in AVR gcc 4.5.2 mode with -O optimization enabled. I want to assure the reader right away that this compiler generates very efficient code; had I written it by hand, it would not be much better (though still a little better), and I believe the A compiler will not produce results better than those obtained this way.

Next (or rather, a little earlier) we see additional information being fetched from an auxiliary table, which tells us whether this pin is a timer output. I do not really understand why this should make the runtime system unwilling to work with such a pin without first disconnecting the timer (it seems to me the system takes too much upon itself here), but there is no doubt that this test costs extra time. Perhaps it is a legacy of the accursed past, and the true meaning of the decision is inaccessible to those not initiated into the innermost secrets of A. What can we do to speed things up? We can combine this check with fetching the port index by introducing a special value, especially since a similar validity check of the port index is performed a few lines later. We can also introduce conditional compilation, letting the user decide whether he is willing to pay run time for the additional check (if it were compile time, I would only be glad). And, as the cherry on the cake, the condition itself, with its double negation, seems somewhat convoluted to me: a plain if (timer == IS_ON_TIMER) is in no way inferior to the original condition and reads better. Note also that the value is obtained in one place and used (just once) much later, which is not pretty either. In C this is unavoidable, but we have C++ and can do it more properly, if not faster.

Another interesting point about the timer check. There is an undoubted error here: if you look at the code generated for the PWM-off function turnOffPWM() (look at this code yourself), you will see that it can disrupt the operation of other modules. It will show up extremely rarely, perhaps never (but do not forget Murphy's laws), yet it is there and should definitely be fixed, because otherwise children may see THAT and decide: "I did not know you could do it that way."

This is followed by a validity check of the port index, since not every pin has a physical representation in a particular MCU. And again, not everything is done well. First, it breaks the rule that every function should have a single exit point; a slight correction of the text is enough here. True, the cyclomatic complexity will increase, but you have to choose which of two mutually exclusive rules to follow. Second, we again pay for the developers' paranoia in run time, and nobody asks whether we need such care. There are two ways to speed up this fragment. The first is conditional compilation. The second is more cunning: for the table entries that correspond to missing pins, specify a port address and bit whose modification is neutral with respect to the external outputs (and, more importantly, to the internal MCU registers). The easiest choice is a zero bit mask for a real port, but other options are possible. In that case we do useless work for a wrong pin number (one absent in this MCU), but we waste no time when the number is correct.
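The zero-mask trick can be tried out on a host with a simulated port. Everything here is hypothetical (the PinEntry type, the three-entry pinTable, the fakePort byte); it is a sketch of the idea, not the layout of A's real tables.

```cpp
#include <cstdint>

static uint8_t fakePort = 0xA5;  // simulated output register

struct PinEntry {
    uint8_t* port;  // address of the output register
    uint8_t  mask;  // bit mask inside the register
};

// Pin 2 is "missing": it points at a real port but has a zero mask.
static const PinEntry pinTable[] = {
    { &fakePort, 0x01 },  // pin 0
    { &fakePort, 0x02 },  // pin 1
    { &fakePort, 0x00 },  // pin 2, absent on this imaginary MCU
};

// No validity check on the hot path: a zero mask changes nothing.
inline void writePin(uint8_t pin, bool high) {
    const PinEntry& e = pinTable[pin];
    if (high) {
        *e.port |= e.mask;
    } else {
        *e.port &= static_cast<uint8_t>(~e.mask);
    }
}
```

Writing to the missing pin still performs a read-modify-write on the port, so the value is rewritten unchanged; the trick removes the branch, not the memory access. It also does nothing for pin numbers beyond the end of the table.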

What one cannot but agree with the developers on is the need to keep internal MCU register bits from being modified when the pin number is wrong, which is what this fragment does. But then why on earth is there no such check at the very beginning of the function? If we ask to change the state of pin 137, we get completely unpredictable behavior, and the developer does not care at all. It is a funny combination of paranoia and not giving a damn; I thought the latter was peculiar to us Slavs. We could insert such a check into the function, but it is much better to do what I advised at the very beginning: create a custom type, so that the compiler performs the necessary checks itself and nothing is left for run time. You can still bypass this check easily and shoot wherever you like, but you will have to tell the compiler about your intention clearly and explicitly, and then do not complain.

Looking further, we notice that the bit mask is fetched in one step and the port address in two: first the pin number gives the port index, a number from 0 to the number of ports, and then the index gives the port address itself. Why is it done this way, when it is obviously slower than fetching the needed address directly? I can think of two explanations. First, the technique gives a great deal of flexibility; frankly, a far-fetched explanation. Second, it saves ROM: the saving in bytes is the number of pins minus the size of the additional code, that is, some 6 to 8 bytes, which seems to me clearly insufficient compensation for a significant loss of performance. Moreover, the same result can be achieved by storing in the first table not an index but an offset, and then turning it into an address in a cheaper way, by adding it to a base or even just combining the two, as my implementation demonstrates. Yes, this method is not as portable as the original, but as far as I remember nobody has abolished conditional compilation directives. In fact, I have the impression that many components of A's libraries were put together in haste from existing universal blanks (a universalist being someone who can do many things equally badly) and then left alone on the principle "Does this thing work? Yes. Then do not touch it." Unfortunately, the much faster way of accessing ports through the in and out instructions is not applicable here, since the port number cannot be given as a variable argument to the instruction.
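The difference between the two-step lookup and the offset-plus-base variant can be shown with a host-side model. The array fakeIo stands in for the I/O register block, and all table contents and function names are made up for the illustration (the offsets 5, 8 and 11 are merely chosen to resemble three output registers spaced three bytes apart).

```cpp
#include <cstdint>

static uint8_t fakeIo[16] = {};  // simulated block of I/O registers

// Two-step scheme: pin -> port index -> register address (two table reads).
static uint8_t* const portAddr[]   = { nullptr, &fakeIo[5], &fakeIo[8], &fakeIo[11] };
static const uint8_t  pinToIndex[] = { 3, 3, 1, 1 };

// One-step scheme: pin -> offset from a known base (one table read plus an add).
static const uint8_t pinToOffset[] = { 11, 11, 5, 5 };

inline uint8_t* outRegTwoStep(uint8_t pin) { return portAddr[pinToIndex[pin]]; }
inline uint8_t* outRegOneStep(uint8_t pin) { return fakeIo + pinToOffset[pin]; }
```

Both functions return the same address for every pin in the tables; the second one simply trades the second table read for an addition, at the price of hard-wiring the base address.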

Let us continue. We have the port address and the bit mask and can perform the operation itself, and here we see exactly what the young gurus of A recommend. Once again, as I have done repeatedly in my posts, I beg you not to do it this way. That is, there is no other way to change a bit in a port (here I deceived you: there is one, but more on that later), but there is no need to write it out as a bare operation. Be sure to wrap register access in a macro or an inline function; it can save you a lot of time when debugging. Of course, if you are among the lucky ones who have never forgotten the ~ in front of the mask, you do not need a wrapper (but then why are you reading this at all, my post is not for the demigods of programming), though it does no harm either; for normal people, who are prone to mistakes, it is very useful. Moreover, the authors of A know about this need: look at the implementation of turning off the PWM, where there is precisely such a macro for clearing a bit. Yet in this particular place they arrogantly ignore it.
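A wrapper of the kind recommended here might look like this. The names setBits and clearBits are hypothetical; the point is only that the ~ is written once, inside the wrapper, and cannot be forgotten at the call site.

```cpp
#include <cstdint>

// Set every bit of 'mask' in the register.
inline void setBits(volatile uint8_t& reg, uint8_t mask) {
    reg = static_cast<uint8_t>(reg | mask);
}

// Clear every bit of 'mask' in the register; the inversion lives here.
inline void clearBits(volatile uint8_t& reg, uint8_t mask) {
    reg = static_cast<uint8_t>(reg & static_cast<uint8_t>(~mask));
}
```

With optimization enabled an inline function like this is normally expanded in place, so the wrapper should not cost anything at run time, though that is worth confirming in the generated code for the compiler in use.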

Note also that the actual work with the register, the read-modify-write, is framed by additional lines whose purpose is not obvious to neophytes. We all understand that this is protection of a shared resource, done in the classic critical-section style with the interrupt enable bit used as the semaphore, but it inevitably costs execution time even when a particular program (I hope nobody is offended that I call a sketch that) does not need it. By bringing in conditional compilation we can improve the speed of the function further. Note that all the newly introduced compilation conditions are initially set so that the behavior of the function in default mode does not change at all, to preserve continuity: some sketch may rely on the function taking exactly this long, and that time should not change.

By the way, an interesting observation. Without protection against register sharing, the change to the port bits that gets lost is the one made by the part of the program that interrupted another part, never the other way round. Apparently this is how universal justice asserts itself, as expressed in the folk wisdom "he who meddles gets beaten."
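The lost update can be replayed step by step on a host. The "interrupt" here is just a statement placed by hand between the read and the write of the main code; the variable port and the function unprotectedSequence are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

static uint8_t port = 0;  // simulated output register

// Main code wants to set bit 0; an "interrupt" sets bit 1 in the middle.
inline uint8_t unprotectedSequence() {
    port = 0;
    uint8_t tmp = port;  // main: read
    port |= 0x02;        // interrupt: complete read-modify-write of bit 1
    tmp |= 0x01;         // main: modify the stale copy
    port = tmp;          // main: write back, wiping out the interrupt's bit
    return port;
}
```

The function returns 0x01: the bit set by the interrupting code is gone, while the interrupted code's change survives, which is exactly the asymmetry described above.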

And another small stone thrown into A's garden: the text of the function shows that its description is not quite right. If the second parameter is LOW, the bit is cleared; otherwise (and not only when the parameter equals HIGH, as the description says) the bit is set. If the parameter could take only the specified values, this remark would be pointless, but in the original function its values are limited by nothing except the programmer's goodwill.

So that you can follow the calculations below, here under the spoiler is the compiled code with comments:

Assembly code
    digitalWrite(unsigned char, unsigned char):
        ; preamble: 9 cycles
        push r15
        push r16
        push r17
        mov r16,r22
        mov r18,r24
        ldi r19,lo8(0)
        ; uint8_t timer = digitalPinToTimer(pin);   7 cycles
        mov r30,r18
        mov r31,r19
        subi r30,lo8(-(digital_pin_to_timer_PGM))
        sbci r31,hi8(-(digital_pin_to_timer_PGM))
        lpm r24, Z
        ; uint8_t bit = digitalPinToBitMask(pin);   7 cycles
        mov r30,r18
        mov r31,r19
        subi r30,lo8(-(digital_pin_to_bit_mask_PGM))
        sbci r31,hi8(-(digital_pin_to_bit_mask_PGM))
        lpm r17, Z
        ; uint8_t port = digitalPinToPort(pin);   7 cycles
        subi r18,lo8(-(digital_pin_to_port_PGM))
        sbci r19,hi8(-(digital_pin_to_port_PGM))
        mov r30,r18
        mov r31,r19
        lpm r15, Z
        ; if (port == NOT_A_PIN) return;   2 cycles
        tst r15
        breq .L12
        ; if (timer != NOT_ON_TIMER) turnOffPWM(timer);   3 cycles
        cpse r24,__zero_reg__
        rcall turnOffPWM(unsigned char)
        ; out = portOutputRegister(port);   14 cycles
        mov r30,r15
        ldi r31,lo8(0)
        lsl r30
        rol r31
        subi r30,lo8(-(port_to_output_PGM))
        sbci r31,hi8(-(port_to_output_PGM))
        lpm r24, Z+
        lpm r25, Z
        mov r30,r24
        mov r31,r25
        ; uint8_t oldSREG = SREG;   1 cycle
        in r24,__SREG__
        ; cli();   1 cycle
        cli
        ; if (val == LOW) { *out &= ~bit; } else { *out |= bit; }   10/8 cycles (LOW/HIGH)
        tst r16
        brne .L15
        ld r25,Z
        com r17
        and r17,r25
        st Z,r17
        rjmp .L16
    .L15:
        ld r25,Z
        or r17,r25
        st Z,r17
    .L16:
        ; SREG = oldSREG;   1 cycle
        out __SREG__,r24
    .L12:
        ; postamble: 10 cycles
        pop r17
        pop r16
        pop r15
        ret
    main:
        ....
        ; call: 4 cycles
        ldi r24,lo8(1)
        ldi r22,lo8(0)
        rcall digitalWrite(unsigned char, unsigned char)
        .....
        ret


Now we can take stock. Call + preamble + postamble: 4 + 9 + 10 = 23; resource protection: 3; timer protection: 7 + 3 = 10; pin protection: 2; getting the bit mask: 7; getting the port address: 7 + 14 = 21; modifying the value: 10/8. This gives an execution time of 76/74 clock cycles, or 4.75/4.625 microseconds at an MCU clock of 16 MHz, a result quite expected by anyone who has seen the source code and knows the AVR architecture. In various sources I have seen different figures for the execution time of digitalWrite, but all of them were larger than the one obtained here.
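The arithmetic of this tally is easy to check mechanically. The sketch below only restates the figures quoted in the text; the names cyclesToMicroseconds, totalLow and totalHigh are hypothetical.

```cpp
// Execution time in microseconds for a given cycle count and clock in MHz.
constexpr double cyclesToMicroseconds(unsigned cycles, double clockMHz) {
    return cycles / clockMHz;
}

// call+preamble+postamble, resource protection, timer protection,
// pin protection, bit mask, port address, modification (LOW / HIGH).
constexpr unsigned totalLow  = 23 + 3 + 10 + 2 + 7 + 21 + 10;
constexpr unsigned totalHigh = 23 + 3 + 10 + 2 + 7 + 21 + 8;

static_assert(totalLow == 76, "LOW path should take 76 cycles");
static_assert(totalHigh == 74, "HIGH path should take 74 cycles");
```

At 16 MHz one cycle lasts 62.5 ns, so 76 and 74 cycles come to 4.75 and 4.625 microseconds, matching the figures above.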

An interesting observation: the times for setting and clearing a bit differ, which is not too good. This is easy to fix by inverting the condition, which gives 9/9. The alignment makes one of the times longer, but at least they are equal: a trifle, but a pleasant one.

It would be great
Equal times are also obtained with the skip instruction, giving 8/8, but for some reason (actually it is clear why: the logic of the function changes, which is unacceptable) the compiler refuses to use it and will not produce the code we would like. From the following code

    register char tmp;
    tmp = *out;
    tmp |= bit;
    bit = ~bit;
    if (val != HIGH) tmp &= bit;
    *out = tmp;

one would like to get the following, shortest possible, program:

    ld r18,Z
    or r18,r17
    com r17
    cpse r16,one_reg
    and r18,r17
    st Z,r18

Unexpectedly, the following consideration came to mind: the standard way of temporarily disabling interrupts, recommended by Atmel in numerous examples, is unsafe. It is vulnerable in the window between reading the current value of the status register and disabling interrupts: if a routine that interrupts the program in this window changes the interrupt enable bit, that change is lost when the status register is restored from the saved value.

Here is the function with the conditional compilation switches described above:

    #define I_NEED_TIMER_CHECKING 0
    #define I_NEED_PORT_CHEKING 0
    #define I_NEED_OLD_PORT 0
    #define I_NEED_OLD_DATA 0
    #define I_NEED_INTERRUPTS 0

    void digitalWrite(uint8_t pin, uint8_t val)
    {
    #if ( I_NEED_TIMER_CHECKING == 1)
        uint8_t timer = digitalPinToTimer(pin);
    #endif
        uint8_t bit = digitalPinToBitMask(pin);
        uint8_t port;
    #if I_NEED_OLD_PORT == 1
        port = digitalPinToPort(pin);
    #else
        port = digitalPinToPortNew(pin);
    #endif
    #if I_NEED_PORT_CHEKING == 1
        if (port == NOT_A_PIN) return;
    #endif
    #if ( I_NEED_TIMER_CHECKING == 1)
        if (timer != NOT_ON_TIMER) turnOffPWM(timer);
    #endif
        uint8_t *out;
    #if I_NEED_OLD_PORT == 1
        out = (uint8_t *) portOutputRegister(port);
    #else
        out = (uint8_t *) ( port + BASEPORT );
    #endif
    #if I_NEED_INTERRUPTS ==1
        uint8_t oldSREG = SREG;
        cli();
    #endif
    #if I_NEED_OLD_DATA == 0
        if (val == LOW) { *out &= ~bit; } else { *out |= bit; }
    #else
        if (val != LOW) { *out |= bit; } else { *out &= ~bit; }
    #endif
    #if I_NEED_INTERRUPTS ==1
        SREG = oldSREG;
    #endif
    };

    digitalWrite(unsigned char, unsigned char):
        ; preamble: 1 cycle
        ldi r25,lo8(0)
        ; uint8_t bit = digitalPinToBitMask(pin);   7 cycles
        mov r30,r24
        mov r31,r25
        subi r30,lo8(-(digital_pin_to_bit_mask_PGM))
        sbci r31,hi8(-(digital_pin_to_bit_mask_PGM))
        lpm r18, Z
        ; uint8_t port = digitalPinToPortNew(pin);   7 cycles
        subi r24,lo8(-(digital_pin_to_port_new_PGM))
        sbci r25,hi8(-(digital_pin_to_port_new_PGM))
        mov r30,r24
        mov r31,r25
        lpm r24, Z
        ; uint8_t *out = (uint8_t *) ( port + BASEPORT );   2 cycles
        mov r26,r24
        ldi r27,lo8(0)
        ; if (val == LOW) { *out &= ~bit; } else { *out |= bit; }   8/8 cycles
        tst r22
        brne .L13
        com r18
        ld r30,X
        and r18,r30
        st X,r18
        ret
    .L13:
        ld r30,X
        or r18,r30
        st X,r18
        ; postamble: 4 cycles
        ret
    main:
        ....
        ; call: 4 cycles
        ldi r24,lo8(1)
        ldi r22,lo8(0)
        rcall digitalWrite(unsigned char, unsigned char)
        ....

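Call + preamble + postamble: 4+1+4 = 9; resource protection: 0; timer protection: 0; pin protection: 0; getting the bit mask: 7; getting the port address: 7+2 = 9; modifying the value: 8/8. In total 9+7+9+8/8 = 33/33 cycles, or 2.062/2.062 microseconds at 16 MHz, that is, more than twice as fast as the original.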

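The conversion of the pin number into a physical address can also be done once, ahead of time, and its result reused on every write:

    PinAdr Pin13=TransferPin(PIN13);
    DigitalPut(Pin13,LOW);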

, — 13+3+9+8/8 = 33 30 ( , , ), , . , 10%, ? , , (9) .

, DigitalPut2(Pin13.Port,Pin13.Mask,LOW); 15+8/8 = 24 (27 ), 30% 30 3 :

    typedef struct { uint8_t *Port; uint8_t Mask; } PinAdr;

    PinAdr TransferPin(uint8_t Pin)
    {
        PinAdr PinAdrTmp;
        uint8_t port;
    #if I_NEED_OLD_PORT == 1
        port = digitalPinToPort(Pin);
    #else
        port = digitalPinToPortNew(Pin);
    #endif
    #if I_NEED_PORT_CHEKING == 1
        if (port == NOT_A_PIN) PinAdrTmp.Mask=0; else
    #endif
        PinAdrTmp.Mask = digitalPinToBitMask(Pin);
    #if ( I_NEED_TIMER_CHECKING == 1)
        uint8_t timer = digitalPinToTimer(Pin);
        if (timer != NOT_ON_TIMER) turnOffPWM(timer);
    #endif
    #if I_NEED_OLD_PORT == 1
        PinAdrTmp.Port = (uint8_t *) portOutputRegister(port);
    #else
        PinAdrTmp.Port = (uint8_t *) ( port + BASEPORT );
    #endif
        return PinAdrTmp;
    }

    void DigitalPut(PinAdr &Pin, uint8_t val)
    {
    #if I_NEED_INTERRUPTS ==1
        uint8_t oldSREG = SREG;
        cli();
    #endif
        if (val != LOW) { *(Pin.Port) |= Pin.Mask; } else { *(Pin.Port) &= ~ Pin.Mask; }
    #if I_NEED_INTERRUPTS ==1
        SREG = oldSREG;
    #endif
    };

    void DigitalPut2(uint8_t *Port, uint8_t Mask, uint8_t val)
    {
    #if I_NEED_INTERRUPTS ==1
        uint8_t oldSREG = SREG;
        cli();
    #endif
        if (val != LOW) { *Port |= Mask; } else { *Port &= ~ Mask; }
    #if I_NEED_INTERRUPTS ==1
        SREG = oldSREG;
    #endif
    };

    DigitalPut(PinAdr&, unsigned char):
        mov r30,r24
        mov r31,r25
        tst r22
        breq .L14
        ld r26,Z
        ldd r27,Z+1
        ld r25,X
        ldd r24,Z+2
        or r24,r25
        st X,r24
        ret
    .L14:
        ld r26,Z
        ldd r27,Z+1
        ldd r24,Z+2
        com r24
        ld r25,X
        and r24,r25
        st X,r24
        ret
    DigitalPut2(unsigned char*, unsigned char, unsigned char):
        mov r30,r24
        mov r31,r25
        tst r20
        breq .L17
        ld r24,Z
        or r22,r24
        st Z,r22
        ret
    .L17:
        com r22
        ld r24,Z
        and r22,r24
        st Z,r22
        ret
    main:
        .....
        mov r24,r28
        mov r25,r29
        adiw r24,1
        ldi r22,lo8(0)
        rcall DigitalPut(PinAdr&, unsigned char)
        ldd r24,Y+1
        ldd r25,Y+2
        ldd r22,Y+3
        ldi r20,lo8(0)
        rcall DigitalPut2(unsigned char*, unsigned char, unsigned char)

Not bad.


The point is that when I declared that there was no other way of changing the state of a bit in a register, I hinted that one did exist. It relies on special bit instructions, namely those with the mnemonics CBI and SBI. Why did we leave this option for last? Because it is a machine-dependent solution and is poorly supported (in fact, not supported at all) within the C language; but the same can be said about reading data from tables, so resorting to assembler is inevitable.

Unfortunately, neither instruction takes its operands from registers: the pin to modify and the kind of modification cannot be passed as run-time parameters, because this information is part of the instruction itself. In theory we have 32 different instructions for 32 different ports; since each port has 8 bits, that makes 256 instructions; and since the kind of modification, setting or clearing the bit, is also part of the instruction, there are 512 different instructions in total.
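To make the count concrete, here is a host-side sketch that builds the 16-bit opcode from its three fields. It assumes the usual AVR encoding, 1001 1010 AAAA Abbb for SBI and 1001 1000 AAAA Abbb for CBI (five bits of I/O address, three bits of bit number); the function name encodeBitOp is hypothetical.

```cpp
#include <cstdint>

// Build the opcode of "sbi ioAddr,bit" (set == true) or "cbi ioAddr,bit".
// ioAddr is the I/O address 0..31, bit is 0..7.
constexpr uint16_t encodeBitOp(bool set, uint8_t ioAddr, uint8_t bit) {
    return static_cast<uint16_t>((set ? 0x9A00u : 0x9800u)
                                 | ((ioAddr & 0x1Fu) << 3)
                                 | (bit & 0x07u));
}

// 32 addresses x 8 bits x 2 operations.
constexpr unsigned distinctBitOps = 32u * 8u * 2u;
static_assert(distinctBitOps == 512, "512 distinct bit instructions");
```

Because the address and the bit number live inside the opcode, choosing between these instructions at run time means choosing which piece of code to execute, not which argument to pass.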

A rather large number: the 51 architecture had no more than 256 instructions in total, and here we have such luxury, but a 16-bit instruction set can afford it. Of course, not all of these 512 instructions do something meaningful on a particular MCU, but some do, and we need to be able to execute one specific instruction of this set as quickly as possible.

The reader will immediately ask why we need all this trouble with assembler, and I answer at once: these instructions are atomic (non-interruptible), which saves a significant amount of execution time, because the protection against resource sharing can be dropped: an atomic operation does not need it. We can invoke the specific instruction we need in the classical way, by building a table of instructions and indexing it with a value computed on the fly, as the following code fragment shows:

The first option with a table
 typedef void func (void);
 void fnull(void) {};
 void fres1(void) {(__extension__({__asm__("cbi PORTD,0""\n\t");}));};
 void fset1(void) {(__extension__({__asm__("sbi PORTD,0""\n\t");}));};
 void fres2(void) {(__extension__({__asm__("cbi PORTD,1""\n\t");}));};
 void fset2(void) {(__extension__({__asm__("sbi PORTD,1""\n\t");}));};
 func *funcAdr_PGM[] PROGMEM = {
   fnull,fnull,
   fres1,fset1,
   fres2,fset2,
 };
 #define funcOfPin(P) ( (func *)(pgm_read_word( funcAdr_PGM + (P))) )
 void digitalWriteF(uint8_t Pin, uint8_t val) {
 #if I_NEED_PORT_CHEKING == 1
   uint8_t port;
 #if I_NEED_OLD_PORT == 1
   port = digitalPinToPort(Pin);
 #else
   port = digitalPinToPortNew(Pin);
 #endif
   if (port == NOT_A_PIN) return;
 #endif
 #if ( I_NEED_TIMER_CHECKING == 1)
   uint8_t timer = digitalPinToTimer(Pin);
   if (timer != NOT_ON_TIMER) turnOffPWM(timer);
 #endif
   Pin=Pin*2;
   if (val!=LOW) Pin++;
   funcOfPin(Pin)();
 };
 main:
   digitalWriteF(13,LOW);

And here is the result of compilation:
 digitalWriteF(unsigned char, unsigned char):
 Pin=Pin*2; if (val!=LOW) Pin++;
   lsl r24
   cpse r22,__zero_reg__
   subi r24,lo8(-(1))
 func *p=funcOfPin(Pin);
   mov r30,r24
   ldi r31,lo8(0)
   lsl r30
   rol r31
   subi r30,lo8(-(funcAdr_PGM))
   sbci r31,hi8(-(funcAdr_PGM))
   lpm r24, Z+
   lpm r25, Z
 p();
   mov r30,r24
   mov r31,r25
   icall
   ret
 main:
   ldi r24,lo8(13)
   ldi r22,lo8(0)
   rcall digitalWriteF(unsigned char, unsigned char)

++ — 4+0+4 = 8, — 0 ( ), — 3, — 6, — 3+3+2 = 8, 3+2+4 = 9, 8+3+6+8+9 = 34/34 . , , , ( ) , , p() asm («ijmp \n\t») ( , ), 4 , 30 .

3* ( ) 2 — 1 , 1 1 . , , 2*1 ( + 1)+ , .

:

 void funcAll(void) {
   // each entry is two one-word instructions, i.e. 4 bytes per entry
   asm volatile ("nop \n \t ret \n\t");
   asm volatile ("nop \n \t ret \n\t");
   asm volatile ("cbi PORTD,0 \n \t ret \n\t");
   asm volatile ("sbi PORTD,0 \n \t ret \n\t");
   asm volatile ("cbi PORTD,1 \n \t ret \n\t");
   asm volatile ("sbi PORTD,1 \n \t ret \n\t");
 };
 typedef func *PinAdr;
 PinAdr TransferPin(uint8_t Pin) {
 #if I_NEED_PORT_CHEKING == 1
   uint8_t port;
 #if I_NEED_OLD_PORT == 1
   port = digitalPinToPort(Pin);
 #else
   port = digitalPinToPortNew(Pin);
 #endif
   // a non-void function must return a value: fall back to the first "nop; ret" entry
   if (port == NOT_A_PIN) return (PinAdr) funcAll;
 #endif
 #if ( I_NEED_TIMER_CHECKING == 1)
   uint8_t timer = digitalPinToTimer(Pin);
   if (timer != NOT_ON_TIMER) turnOffPWM(timer);
 #endif
   Pin=Pin*4;
   PinAdr PinTmp = (PinAdr) ((int)funcAll + Pin);
   return PinTmp;
 };
 void DigitalPut(func *Pin, uint8_t val) {
   if (val != LOW) { Pin = (PinAdr) ((int)(Pin)+2); };
   Pin();
 };
 main:
   DigitalPut(Pin13,LOW);

, ,
 DigitalPut(void (*)(), unsigned char):
   cpse r22,__zero_reg__
   adiw r24,2
   mov r30,r24
   mov r31,r25
   icall ( ijmp )
   ret
 main:
   lds r24,main::Pin13
   lds r25,main::Pin13+1
   ldi r22,lo8(1)
   rcall DigitalPut(void (*)(), unsigned char)

++ — 7+0+4 = 11, — 0 ( ), — 3, — 0, — 0, 2+3+2+4 = 11(7), 11+3+11(7) = 25(21) , , 6 . , , , 2* . , , .

( ), 11+3+(3+2)5=19 , , , .

— , , . , , , . , 2 4 ( , , ), 4+4=8 , 1+4=5 2 , — , . .

, , , , , ( ), , .

What are these cases, and why do they occur quite often but not always? We are talking about a situation where two conditions hold: first, the source code of the program modules that need fast work with pins is available, and second, the pin numbers are constant, that is, known at compile time (constant, but not necessarily statically defined, which is important). Then we can create a macro for working with pins of the following form:

The perfect solution in terms of speed
 #define digitalWriteC(pin,val) \
 if (pin == 0) { \
   if (val == LOW) asm volatile ("cbi PORTD, 0 \n\t"); else asm volatile ("sbi PORTD, 0 \n\t");\
 }; \
 if (pin == 1) { \
   if (val == LOW) asm volatile ("cbi PORTD, 1 \n\t"); else asm volatile ("sbi PORTD, 1 \n\t");\
 }; \
 if (pin == 2) { \
   if (val == LOW) asm volatile ("cbi PORTD, 2 \n\t"); else asm volatile ("sbi PORTD, 2 \n\t");\
 };
 main:
   digitalWriteC(13,HIGH);

, , , , , - ( ), . , - , , , , . , , ( , ) (, , ), . , .

, , , I2C SPI , , , , , , , , .

, , :

 #define DigitalPut(Pin,Data) ( asm ( "DW 0x0123+((Pin / 8) << 4)+(Pin % 8) + (Data << 8)) 

But for some reason I did not manage to get such a construction to build. If anyone knows how to do it, please tell me in the comments.
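One possible way around this, offered as a sketch and not as the construction the author had in mind: instead of emitting the raw opcode word, let the compiler substitute constants into the instruction operands through GCC extended inline assembly. The `"I"` operand constraint and the `_SFR_IO_ADDR()` macro are the ones described in the avr-libc inline assembler documentation. The macro names `PIN_SET_D`, `PIN_CLEAR_D` and `DIGITAL_PUT_D` are hypothetical, as is the variable `sim_portd`, which only stands in for the port so that the same file can be tried on a PC. The sketch assumes that the bit number is a compile-time constant and covers PORTD only.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
/* The "I" constraint asks for a compile-time constant, so the port
 * address and the bit number become part of the instruction itself. */
#define PIN_SET_D(bit) \
    __asm__ volatile ("sbi %0, %1" : : "I" (_SFR_IO_ADDR(PORTD)), "I" (bit))
#define PIN_CLEAR_D(bit) \
    __asm__ volatile ("cbi %0, %1" : : "I" (_SFR_IO_ADDR(PORTD)), "I" (bit))
#else
/* Host fallback (assumption, for experiments on a PC only):
 * a plain variable plays the role of the PORTD register. */
static volatile uint8_t sim_portd;
#define PIN_SET_D(bit)   (sim_portd |= (uint8_t)(1u << (bit)))
#define PIN_CLEAR_D(bit) (sim_portd &= (uint8_t)~(1u << (bit)))
#endif

/* Hypothetical wrapper: when val is a constant and optimization is on,
 * the compiler is expected to keep only one of the two branches. */
#define DIGITAL_PUT_D(bit, val) \
    do { if (val) PIN_SET_D(bit); else PIN_CLEAR_D(bit); } while (0)
```

A call such as `DIGITAL_PUT_D(3, 1);` is then expected to collapse into a single `sbi` when built for the MK, while on a PC it simply sets bit 3 of `sim_portd`.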

If the preprocessor had a normal macro language, we could write a generalized substitution that would generate the optimal variant for each case, without forcing the programmer to think about which construct to call and which paradigm to use. But this belongs to the realm of baseless dreams, since the standard preprocessor does not provide such capabilities (at least it does not provide them to me; if you are on friendlier terms with it, share your secrets).

, , , . , (, , , ), , , , , , , . , — , , .

Source: https://habr.com/ru/post/308662/

