Inventing the vusb library

Introduction

After reading the title, a logical question may arise: why study a software implementation of low-speed USB today, when there are plenty of cheap controllers with a hardware USB module? The point is that a hardware module hides the exchange of logic levels and turns the USB protocol into a kind of magic. To get a feel for how this “magic” works, nothing beats reproducing it from scratch, starting from the lowest level.

To this end, we will try to build a device that presents itself as USB-HID, based on the ATmega8 controller. Unlike most of the literature, we will go not from theory to practice but from the lowest level to the highest: we start with logic voltages on the pins and finish with the “invention” of vusb itself, checking after each step that the code works as expected. I should note separately that I am not inventing an alternative to this library. Rather, I reproduce its source code step by step, preserving the original structure and names as much as possible and explaining what each section is for. My usual coding style, however, differs from that of the vusb authors. I also honestly admit that besides an altruistic interest (explaining a difficult topic to others) I have a selfish one: to study the topic myself and catch as many subtle points as possible. It follows that some important point may be missed, or some topic not fully covered.

To make the code easier to follow, I tried to mark the changed sections with comments and to remove those comments from sections discussed earlier. The source code is the main source of information; the text explains what was done and why, and what result is expected.

I also note that only low-speed USB is considered, without even mentioning what distinguishes the faster varieties.

Step 0. Hardware and other preparation

As a test bench, let's take a homemade debugging board based on the ATmega8 with a 12 MHz crystal. I will not give the schematic, it is quite standard (see the official vusb website); the only thing worth mentioning is the pins used. In my case D+ is connected to PD2, D- to PD3, and the pull-up resistor hangs on PD4. In principle, the pull-up resistor could be tied directly to the supply, but manual control seems a little more consistent with the standard.

Power (5 V) is taken from the USB connector, but no more than 3.6 V is expected on the signal lines (why that is so remained a mystery to me). So you need either to lower the controller's supply voltage or to put Zener diodes on the signal lines. I chose the second option, but by and large it does not matter.

Since we are “inventing” the implementation, it would be nice to see what goes on in the controller's brain, so at least some debugging output is needed. In my case these are two LEDs on PD6 and PD7 and, most importantly, a UART on PD0 and PD1 configured for 115200 baud, so that the controller's chatter can be read with the ordinary screen utility or any other program for working with a COM port:

$ screen /dev/ttyUSB0 115200

Wireshark with the appropriate module is also a useful tool for USB debugging (it does not always work out of the box, but solutions to such problems are easy to find on the Internet and are outside the scope of this article).

I could spend another kilobyte of text here describing the programmer, makefiles and so on, but that hardly makes sense. Likewise, I will not dwell on peripheral settings that are unrelated to USB. If someone cannot figure out even that, isn't it too early to dig into the guts of software USB?

The source code for all steps is available on GitHub.

Step 1. Receive at least something

According to the documentation, USB supports several fixed speeds, of which an AVR can handle only the lowest: 1.5 megabits per second. The speed is selected by a pull-up resistor and the subsequent communication. For our chosen speed, the resistor should connect D- to a 3.3 V supply and have a nominal value of 1.5 kOhm, but in practice it can be connected to +5 V with a slightly different value. At a controller frequency of 12 MHz there are only 8 clock cycles per bit. Such accuracy and speed are clearly achievable only in assembler, so we start the file drvasm.S. It also follows that an interrupt is needed to catch the beginning of a byte. Fortunately, the first byte transmitted over USB is always the same, SYNC, so being late for its beginning is not a problem. As a result, only 64 controller cycles pass from the beginning of a byte to its end (in fact, the margin is even smaller), so you should not use other, non-USB interrupts.
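
The cycle budget quoted above is plain arithmetic. As a small illustration (the macro and function names here are made up for this sketch and do not come from vusb):

```c
#include <stdint.h>

/* Low-speed USB bit rate in bits per second. */
#define USB_LOW_SPEED_BPS 1500000UL

/* Hypothetical helper: whole CPU clock cycles available per USB bit. */
static uint32_t cycles_per_bit(uint32_t f_cpu_hz)
{
    return f_cpu_hz / USB_LOW_SPEED_BPS;
}

/* Hypothetical helper: cycles available for one 8-bit byte without stuffing. */
static uint32_t cycles_per_byte(uint32_t f_cpu_hz)
{
    return 8u * cycles_per_bit(f_cpu_hz);
}
```

For a 12 MHz clock this gives 8 cycles per bit and 64 cycles per byte, the figures used in the text.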

We immediately move the configuration into a separate file, usbconfig.h. It defines the pins responsible for USB, as well as the bits, constants and registers used.

Theoretical insert
Transfer over USB is carried out in packets of several bytes each. The first byte is always the synchronization byte SYNC, equal to 0b10000000; the second is the packet identifier byte, PID. Each byte is transmitted from the least significant bit to the most significant one (this is not entirely true, but vusb ignores this subtlety and accounts for it elsewhere) using NRZI encoding. In this scheme a logical zero is transmitted by switching the line to the opposite level, and a logical one by leaving the level unchanged. In addition, there is protection against desynchronization of the signal source and receiver (we will not use it, but we must take it into account): if the transmitted sequence contains six ones in a row, that is, the line state does not change for six consecutive bit times, a forced inversion is added to the transmission, as if a zero were transmitted. Thus, a byte can take 8 or 9 bits on the wire.
It is also worth mentioning that the USB data lines are differential: when D+ is high, D- is low (this is called the K-state), and vice versa (the J-state). This is done for better noise immunity at high frequency. There is an exception, though: the end-of-packet signal (called SE0) is transmitted by pulling both signal lines to ground (D+ = D- = 0). Two more signals are transmitted by holding D+ low and D- high for different lengths of time. If the time is short (about one byte time or a little longer), this is Idle, the pause between packets; if it is long (several milliseconds), the bus is considered suspended. A bus reset, by contrast, is signaled by holding SE0 for a long time.

So, transmission uses a differential pair, apart from the exotic SE0 case, which we will not consider yet. To determine the state of the USB bus we therefore need only one line, D+ or D-. By and large it makes no difference which one to choose; for definiteness, let it be D-.

The beginning of a packet can be recognized by receiving the SYNC byte after a long Idle. The Idle state corresponds to logic 1 on the D- line (this is also the J-state), and the SYNC byte is 0b10000000, but it is transmitted from the least significant bit to the most significant one and is also NRZI-encoded, that is, each zero inverts the signal and each one keeps the level unchanged. So the sequence of states on D- is as follows:

byte	Idle	SYNC	PID
USB	1..1	00000001	????????
D-	1..1	01010100	????????
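
The D- row of the table can be reproduced with a few lines of C. This is only a sketch of the encoding rule described above; `nrzi_encode_byte` is a hypothetical helper and not part of vusb:

```c
#include <stdint.h>

/* Hypothetical helper: NRZI-encode one byte, least significant bit first.
 * 'level' is the D- level before the byte (1 = Idle on low-speed USB).
 * out[i] receives the D- level during bit i as the character '0' or '1';
 * out must have room for 9 characters including the terminator.
 * Returns the line level after the last bit. */
static int nrzi_encode_byte(uint8_t byte, int level, char out[9])
{
    for (int i = 0; i < 8; i++) {
        int bit = (byte >> i) & 1;
        if (bit == 0)
            level = !level;          /* a zero toggles the line */
        /* a one leaves the line unchanged */
        out[i] = level ? '1' : '0';
    }
    out[8] = '\0';
    return level;
}
```

Calling it with the SYNC byte 0x80 and an initial level of 1 fills the output with 01010100, the same sequence as in the D- row.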

The beginning of a packet is easiest to detect on a falling edge, so that is what we configure the interrupt for. But what if the controller is busy when reception starts and cannot enter the interrupt immediately? To avoid losing count of the cycles in that situation, we use the SYNC byte for its intended purpose. It consists entirely of edges at the bit boundaries, so we can wait for one of them, then wait another half bit, and land right in the middle of the next bit. However, waiting for “some” edge is not a good idea, because we need not only to hit the middle of a bit but also to know which bit it is. SYNC helps here too: it ends with two zero levels in a row (both are K-states). Those are what we will catch. So a piece of code appears in drvasm.S, from the interrupt entry to the foundK label. Moreover, because of the time spent checking the port state, on the unconditional jump and so on, we arrive at that label not at the beginning of the bit but right in its middle. There is no point in checking the same bit again, since we already know its value. So we wait 8 clock cycles (filled with empty nops for now) and check the next bit. If it is also zero, we have found the end of SYNC and can proceed to receiving the significant bits.

All the remaining code simply reads two more bytes and sends them to the UART. It also waits for the SE0 state, so that we do not accidentally run into the next packet.

Now you can compile the resulting code and see what bytes our device receives. In my case the sequence is as follows:

 4E 55 00 00 4E 55 00 00 4E 55 00 00 4E 55 00 00 4E 55 00 00

Remember that we are printing raw data, without removing the stuffed zeros and without NRZI decoding. Let's try to decode it by hand, starting with the least significant bit:


	4E
NRZI	01001110	0 (previous bit)
byte	00101101
	2D


	55
NRZI	01010101	0 (previous bit)
byte	00000000
	00

There is no point in decoding the zeros, since 16 identical levels in a row cannot occur inside a packet.

Thus, we have written firmware that receives the first two bytes of a packet, although without decoding so far.

Step 2. Demo version of NRZI

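To avoid decoding by hand, we can hand the job to the controller itself: the XOR operation does exactly what is needed, although the result comes out inverted, so we add one more inversion after it:

 mov   temp, shift   ; copy the raw received byte
 lsl   shift         ; shift left so that each bit lines up with its neighbor
 eor   temp, shift   ; XOR of neighbors: 1 where the level changed
 com   temp          ; invert: 1 now means "no change", as NRZI requires
 rcall uart_hex      ; print the decoded byte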

The result is quite expected:

 2D 00 FF FF 2D 00 FF FF 2D 00 FF FF 2D 00 FF FF 2D 00 FF FF
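
For readers more comfortable with C, the same trick can be written as follows. This is a sketch that mirrors the assembler snippet, not code from vusb; the function name and the explicit `prev` argument are assumptions made for illustration:

```c
#include <stdint.h>

/* Hypothetical helper mirroring the assembler snippet.
 * 'raw' holds eight sampled D- levels, first-received bit in bit 0.
 * 'prev' is the level sampled just before bit 0 (0 right after SYNC,
 * which ends in the K-state).
 * A decoded bit is 1 when the level did not change and 0 when it toggled. */
static uint8_t nrzi_decode_byte(uint8_t raw, uint8_t prev)
{
    /* Line up every bit with its predecessor. */
    uint8_t shifted = (uint8_t)((raw << 1) | (prev & 1));
    /* XOR marks the changes; the inversion turns "no change" into 1. */
    return (uint8_t)~(raw ^ shifted);
}
```

With the bytes from the raw dump, `nrzi_decode_byte(0x4E, 0)` gives 0x2D and `nrzi_decode_byte(0x55, 0)` gives 0x00, matching the manual decoding.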

Step 3. Get rid of the byte receive cycle

Let's take one more small step and unroll the loop that receives the first byte into straight-line code. This produces a lot of nops that are needed only to wait for the start of the next bit. Some of them can be replaced by the NRZI decoder; others will come in handy later.

The result is no different from the previous version.

Step 4. Read to the buffer

Reading into separate registers is, of course, quick and elegant, but when there is a lot of data it is better to write to a buffer somewhere in RAM. To do this, we declare an array of sufficient size in main and write to it from the interrupt.

Theoretical insert
The packet structure in USB is standardized and consists of the following parts: the SYNC byte, the PID + CHECK byte (two fields of 4 bits each), a data field (sometimes 11 bits, but more often an arbitrary number of 8-bit bytes) and a CRC checksum of either 5 bits (for the 11-bit data field) or 16 bits (for the rest). Finally, the end-of-packet (EOP) indication is two bit times of pause, but this is no longer data.

Before working with the array we still need to set up the registers, and the free nops before the first bit are not enough for that. Therefore, the reading of the first two bits has to go into a straight-line section of code, with the initialization code inserted between its instructions, after which we jump into the middle of the read loop, to the rxbit2 label. A word about the buffer size. According to the documentation, no more than 8 bytes of data can be transferred in one packet. Adding the service bytes PID and CRC16 gives a buffer size of 11 bytes. The SYNC byte and the EOP state are not stored. We cannot control the interval between requests from the host, but we do not want to lose them either, so we take a double margin for reading. We will not use the whole buffer for now, but to avoid coming back to this later it is better to allocate the required amount right away.
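
The buffer arithmetic can be written down as a short sketch. The names USB_BUFSIZE and usbRxBuf are the ones that appear later in the article; USB_MAX_PAYLOAD is introduced here only for illustration:

```c
#include <stdint.h>

/* Assumed helper constant: data bytes per low-speed packet. */
#define USB_MAX_PAYLOAD 8

/* PID + data + CRC16 = 1 + 8 + 2 = 11 bytes. */
#define USB_BUFSIZE (1 + USB_MAX_PAYLOAD + 2)

/* Double margin, as described above: room for two full packets. */
static uint8_t usbRxBuf[2 * USB_BUFSIZE];
```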

Step 5. Working with the buffer properly

Instead of reading the first bytes of the array directly, we write a piece of code that reads exactly as many bytes as were actually written to it. At the same time we add a separator between packets.
Now the output looks like this:

 >03 2D 00 10 >01 FF >03 2D 00 10 >01 FF >03 2D 00 10 >01 FF >03 2D 00 10 >01 FF >03 2D 00 10 >01 FF

Step 6. Handling the stuffed zero

It is finally time to bring the reading of the bitstream up to the standard. The last item we have happily done without is the fake zero inserted after every six consecutive ones. Since byte reception is unrolled into the straight-line body of the loop, the check has to be made after each bit, in all eight places. Consider the first two bits as an example:

 unstuff0:                   ;1 (extra cycle for the taken breq)
     andi  x3, ~(1<<0)       ;1 [15] clear bit 0 of the inversion mask
     mov   x1, x2            ;1 [16] x2 holds the last sampled (stuffed) bit
     in    x2, USBIN         ;1 [17] <-- sample bit 1 again
     ori   shift, (1<<0)     ;1 [18] write 1 into bit 0 of shift
     rjmp  didUnstuff0       ;2 [20]
 ;<---//--->
 rxLoop:
     eor   shift, x3         ;1 [0]
     in    x1, USBIN         ;1 [1]
     st    y+, shift         ;2 [3]
     ldi   x3, 0xFF          ;1 [4]
     nop                     ;1 [5]
     eor   x2, x1            ;1 [6]
     bst   x2, USBMINUS      ;1 [7] copy bit 0 into shift
     bld   shift, 0          ;1 [8]
     in    x2, USBIN         ;1 [9] <-- sample bit 1 (or the stuffed bit)
     andi  x2, USBMASK       ;1 [10]
     breq  se0               ;1 [11]
     andi  shift, 0xF9       ;1 [12]
 didUnstuff0:
     breq  unstuff0          ;1 [13]
     eor   x1, x2            ;1 [14]
     bst   x1, USBMINUS      ;1 [15] copy bit 1 into shift
     bld   shift, 1          ;1 [16]
 rxbit2:
     in    x1, USBIN         ;1 [17] <-- sample bit 2 (or the stuffed bit)
     andi  shift, 0xF3       ;1 [18]
     breq  unstuff1          ;1 [19]
 didUnstuff1:

For convenience, the instructions are referred to below by the labels on the right. Note that these labels were introduced to count controller clock cycles, so they do not appear in order. Reading of the next byte starts at the rxLoop label: the previous byte is inverted and written to the buffer [0, 3]. Next, at [1], the state of the D- line is read, and XOR with the previously received state decodes NRZI (recall that plain XOR yields the inverted value; to fix this we introduce the mask register x3, initialized with all ones, 0xFF), after which the result is written to bit 0 of the shift register [7, 8]. Then the fun part begins: we check whether the received bit was the sixth unchanged one in a row. An unchanged level on D- corresponds to writing a zero to the register (not a one! It becomes a one at the end, after the XOR). So we need to check whether bits 0, 7, 6, 5, 4, 3 are all zeros. The remaining two bits do not matter: they are left over from the previous byte and were checked earlier. To get rid of them, we apply a mask [12] in which all the bits of interest are set to 1: 0b11111001 = 0xF9. If all bits are zero after masking, a stuffed bit is detected and we jump to the unstuff0 label. There, one more bit is read [17] to replace the superfluous one that was read earlier, in between other operations [9]. We also swap the roles of the current and previous value registers x1 and x2. The point is that for each bit the value is read into one register and then XORed with the other, after which the registers trade places. Accordingly, the same has to be done when reading the stuffed bit. But the most interesting thing is that we write to the shift data register not the zero we honestly received, but the one that the host was trying to transmit [18].
The reason is that this position will also be examined when the following bits are received, and if a zero stayed there, the mask check could not tell that the stuffed bit had already been handled. As a result, all bits in the shift register are inverted (relative to what the host transmitted) except bit 0. To keep this mess out of the buffer, the final inversion is done by XOR not with 0xFF [0] but with 0xFE, that is, with a mask in which the corresponding bit is cleared and therefore causes no inversion. This is why bit 0 of the mask is cleared at [15].

The situation is similar for bits 1-5. For example, bit 1 corresponds to checking bits 1, 0, 7, 6, 5, 4, while bits 2 and 3 are ignored. This corresponds to the mask 0xF3.
Bits 6 and 7, however, are processed differently:

 didUnstuff5:
     andi  shift, 0x3F       ;1 [45] check bits 5-0
     breq  unstuff5          ;1 [46]
 ;<---//--->
     bld   shift, 6          ;1 [52]
 didUnstuff6:
     cpi   shift, 0x02       ;1 [53] check bits 6-1
     brlo  unstuff6          ;1 [54]
 ;<---//--->
     bld   shift, 7          ;1 [60]
 didUnstuff7:
     cpi   shift, 0x04       ;1 [61] check bits 7-2
     brsh  rxLoop            ;3 [63]
 unstuff7:

The mask for bit 6 is 0b01111110 (0x7E), but it cannot be applied to the shift register, because that would clear bit 0, which has to be written to the array. Besides, the mask applied at [45] has already cleared bit 7. So the stuffed bit has to be handled when bits 1-6 are zero, whatever the value of bit 0. In other words, the register value must be 0 or 1, which is neatly checked by the comparison “less than 2” [53, 54].

The same principle is used for bit 7: instead of applying the 0xFC mask, a check for “less than 4” is performed [61, 63].
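
Stripped of the cycle counting and the inverted shift register, the effect of all the unstuff labels can be described by a simple reference model. The following C sketch is not the vusb algorithm, only a plain restatement of the rule "drop the zero that follows six ones"; the function name and the one-bit-per-element representation are assumptions made for clarity:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model (not the cycle-exact vusb code):
 * removes stuffed zeros from an already NRZI-decoded bit sequence.
 * in[]  : bits in wire order, one per element (0 or 1)
 * out[] : receives the payload bits; must hold at least n elements
 * Returns the number of payload bits, or -1 if a one is found where
 * a stuffed zero was required (bit-stuffing error). */
static int unstuff_bits(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t written = 0;
    int ones = 0;                    /* length of the current run of ones */

    for (size_t i = 0; i < n; i++) {
        if (ones == 6) {             /* this bit must be the stuffed zero */
            if (in[i] != 0)
                return -1;
            ones = 0;                /* drop it and restart the run */
            continue;
        }
        out[written++] = in[i];
        ones = in[i] ? ones + 1 : 0;
    }
    return (int)written;
}
```

For the input 1 1 1 1 1 1 0 1 0 the seventh element is recognized as the stuffed zero and removed, leaving eight payload bits.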

Step 7. Sorting the packets

Now that we can receive a packet whose first byte (PID) equals 0x2D (SETUP), let's try to sort what we receive. By the way, why did I call the 0x2D packet SETUP when it looks like ACK? The point is that USB transmits from the least significant bit to the most significant one within each field, not within each byte, while we receive byte by byte. The first significant field, PID, occupies only 4 bits and is followed by 4 more CHECK bits, which are the bitwise inversion of the PID field. So the first byte we receive is not PID + CHECK but rather CHECK + PID. This makes little difference, though, since all the values are known in advance and swapping the nibbles is not difficult. We will write the main codes that may be useful to us into the usbconfig.h file right away.
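
The relation between the two nibbles is easy to express in code. A minimal sketch, with hypothetical function names that are not taken from vusb:

```c
#include <stdint.h>

/* Hypothetical helper: a received PID byte is valid when its high nibble
 * (CHECK) is the bitwise inverse of its low nibble (PID). */
static int pid_byte_is_valid(uint8_t b)
{
    return ((b ^ (b >> 4)) & 0x0F) == 0x0F;
}

/* Hypothetical helper: the 4-bit PID field, which ends up in the low
 * nibble of the received byte. */
static uint8_t pid_field(uint8_t b)
{
    return b & 0x0F;
}
```

For the received byte 0x2D the low nibble is 0xD (binary 1101, SETUP) and the high nibble 0x2 is its inverse, so the byte passes the check.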

Before we start adding the PID-processing code, note that it must be fast (that is, written in assembler), but cycle-exact alignment is no longer required, because the packet has already been received. This section will therefore later be moved to asmcommon.inc, a file for assembler code that is not tied to the clock frequency. For now, we simply mark it with a comment.
Now let's move on to sorting the received packets.

Theoretical insert
Data packets on the USB bus are grouped into transactions. Each transaction begins with the host sending a special marker packet that carries information about what the host wants from the device: to configure it (SETUP), to send it data (OUT) or to receive data from it (IN). The marker packet is followed by a pause of two bit times. Then comes a data packet (DATA0 or DATA1), which may be sent by either the host or the device, depending on the marker packet. After that there is another pause of two bit times and the reply, HANDSHAKE, a confirmation packet (ACK, NAK, STALL; we will consider them another time).

Since the exchange takes place over the same lines, the host and the device constantly have to switch between transmitting and receiving. The two-bit pause evidently exists precisely for this purpose, so that they do not end up in a tug-of-war, both trying to drive data onto the bus at the same time.

SETUP		DATA0		Handshake
host-> device	pause	host-> device	pause	device-> host

OUT		DATA0 / DATA1		Handshake
host-> device	pause	host-> device	pause	device-> host

IN		DATA0 / DATA1		Handshake
host-> device	pause	device-> host	pause	host-> device

So, we now know all the packet types needed for the exchange. We add a check of the received PID byte against each of them. At the moment the device cannot write even such primitive packets as ACK to the bus, which means it is unable to tell the host what it is. Therefore, commands such as IN cannot be expected. So we will only check for reception of the SETUP and OUT commands, and in the corresponding branches we turn on the corresponding LEDs.
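
In the firmware this check lives in assembler, but its logic can be sketched in C. The byte values follow from the USB PID table combined with the inverted CHECK nibble; the constant names imitate vusb's naming and should be treated as assumptions, and the enum and function are purely illustrative:

```c
#include <stdint.h>

/* PID bytes as they appear in the receive buffer
 * (CHECK nibble high, PID nibble low). Names are assumed. */
#define USBPID_SETUP 0x2D
#define USBPID_OUT   0xE1
#define USBPID_IN    0x69
#define USBPID_DATA0 0xC3
#define USBPID_DATA1 0x4B

/* Hypothetical classification used only in this sketch. */
enum packet_kind { PKT_SETUP, PKT_OUT, PKT_IN, PKT_DATA, PKT_OTHER };

static enum packet_kind classify_pid(uint8_t pid)
{
    switch (pid) {
    case USBPID_SETUP: return PKT_SETUP;
    case USBPID_OUT:   return PKT_OUT;
    case USBPID_IN:    return PKT_IN;
    case USBPID_DATA0: /* fall through: both are data packets */
    case USBPID_DATA1: return PKT_DATA;
    default:           return PKT_OTHER;
    }
}
```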

In addition, it is worth moving the log output out of the interrupt, somewhere into main.

After making these changes we flash the device and observe the following sequence of received bytes:

 2D|80|06|00|01|00|00|40|00 C3|80|06|00|01|00|00|40|00 2D|80|06|00|01|00|00|40|00 C3|80|06|00|01|00|00|40|00

On top of that, both LEDs are lit. So we caught both SETUP and OUT.
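
The eight bytes that follow the C3 byte in the dump are the payload of the SETUP transaction. As an illustration only, they can be unpacked into the fields of a standard USB request; the field names follow the USB specification, while the struct and function names are hypothetical and not taken from vusb:

```c
#include <stdint.h>

/* Hypothetical container for the 8-byte SETUP payload. */
struct setup_request {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
};

/* Hypothetical helper: multi-byte fields are little-endian on the wire. */
static struct setup_request parse_setup(const uint8_t d[8])
{
    struct setup_request r;
    r.bmRequestType = d[0];
    r.bRequest      = d[1];
    r.wValue        = (uint16_t)(d[2] | (d[3] << 8));
    r.wIndex        = (uint16_t)(d[4] | (d[5] << 8));
    r.wLength       = (uint16_t)(d[6] | (d[7] << 8));
    return r;
}
```

Applied to 80 06 00 01 00 00 40 00, this yields bRequest = 6 (GET_DESCRIPTOR in the USB specification), wValue = 0x0100 and wLength = 64.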

Step 8. Read the address on the envelope

Theoretical insert
Marker packets (SETUP, IN, OUT) serve not only to show the device what is wanted from it, but also to address a specific device on the bus and a specific endpoint inside it. Endpoints are needed to single out a particular subfunction of a device. They can differ in polling frequency, transfer rate and other parameters. For example, if the device is a USB-COM adapter, its main task is to receive data from the bus and pass it to the port (first endpoint) and to receive data from the port and send it to the bus (second endpoint). By their nature, these endpoints are meant for a large flow of unstructured data. Besides this, from time to time the device must exchange with the host the state of the control lines (RTS, DTR and the like) and the link settings (speed, parity). No large amounts of data are expected here. It is also convenient when service information is not mixed with the data. So it turns out that a USB-COM adapter is conveniently served by at least 3 endpoints. In practice, of course, it varies...
An equally interesting question is why the device is sent its own address at all, since nothing else can be plugged into that particular port anyway. This is done to simplify the design of USB hubs. They can be quite “dumb” and simply relay signals from the host to all devices without bothering to sort them. The device itself then decides whether to process a packet or ignore it.
So, both the device address and the endpoint address are contained in the marker packets. The structure of such packets is shown below:

, - ( - PID = SETUP OUT) (IN) , .

field	SYNC	addr	endpoint	CRC	Eop
USB bits	0-7	0123456	0123	01234	01
received bits		0123456	7012	34567
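
The "received bits" row translates directly into shifts and masks. The sketch below unpacks the two bytes that follow the PID and checks the 5-bit CRC; the struct and function names are hypothetical, and the CRC routine is a straightforward bit-by-bit implementation of the USB CRC5 (polynomial x^5 + x^2 + 1, initial value all ones, result inverted) rather than anything taken from vusb:

```c
#include <stdint.h>

/* Hypothetical container for the fields of a marker packet. */
struct token {
    uint8_t addr;      /* 7-bit device address */
    uint8_t endpoint;  /* 4-bit endpoint number */
    uint8_t crc5;      /* CRC as received, first-transmitted bit in bit 0 */
};

/* Split the two received bytes according to the table. */
static struct token parse_token(uint8_t b0, uint8_t b1)
{
    struct token t;
    t.addr     = b0 & 0x7F;
    t.endpoint = (uint8_t)(((b0 >> 7) & 1) | ((b1 & 0x07) << 1));
    t.crc5     = b1 >> 3;
    return t;
}

/* USB CRC5 over 'nbits' bits of 'data', fed least significant bit first.
 * The result uses the same bit order as the crc5 field above. */
static uint8_t usb_crc5(uint16_t data, int nbits)
{
    uint8_t crc = 0x1F;
    for (int i = 0; i < nbits; i++) {
        uint8_t bit = (data >> i) & 1;
        if ((crc ^ bit) & 1)
            crc = (uint8_t)((crc >> 1) ^ 0x14);  /* bit-reversed polynomial */
        else
            crc >>= 1;
    }
    return (uint8_t)(crc ^ 0x1F);
}

/* Returns 1 when the CRC matches the 11 address and endpoint bits. */
static int token_crc_ok(uint8_t b0, uint8_t b1)
{
    uint16_t data = (uint16_t)(b0 | ((b1 & 0x07) << 8));
    return usb_crc5(data, 11) == (b1 >> 3);
}
```

For the bytes 00 10 seen in the earlier dump after the SETUP PID, this gives address 0, endpoint 0 and a matching CRC.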

, (-) (Handshake) :

: , , NAK
-: SETUP OUT, , IN — ,
. , , ,

« — » . PID', , . «PID» . usbCurrentTok. PID' (DATA0, DATA1) , . , ? : , ( 0 usbCurrentTok ), , . ( SE0) , - , D+, D- . , SYNC, . , , . «» , . .

, . x3, (, , , ).

, USB , , . , , , CRC ( ). , [21]. 0- . , [26]. , CRC, .

Step 9.

, , « », ACK. NAK', ( cnt — ). USB , , SYNC PID. Y, cnt ( ). , — ACK. x3 — 1 , . x3 ( r20) 20.

( SETUP, ), ACK' , , , . , .

, D+, D- ( ), — . XOR , , , , - .

, , , , . , , , . . vusb : txBitloop 2 ([00], [08]). 3 , 6 . , . 1 3 : 171. ( 171, 11 , ), — , . cnt=4:

4 - 171 = -167 = (mod 256) 89 (+ carry)
89 - 171 = -82 = (mod 256) 174 (+ carry)
174 - 171 = 3.
, .

, 3 , 1. 6 , , x4. D+, D- , . .
:

 2D|80|06|00|01|00|00|40|00 69|00|10|00|01|00|00|40|00

C3 . , , UART . , , IN , . , .

Step 10. NAK

NAK , . , . , - .

, . , , - , . usbRxBuf, . , — , USB_BUFSIZE. usbInputBufOffset, . .

NAK handleData , [22]. (usbRxLen), - . ( — ), usbRxLen, , — usbRxToken, SETUP OUT - . : , , ACK .

. , , - , -, . ? , , , , - .

 2D|80|06|00|01|00|00|40|00

, NAK`, , .

Step 11.

, , . — . , , , , , . . . , USB, usbPoll. — , . — . SETUP , PID CRC, SETUP 5- , 16-. 3 «» . «» PID usbRxToken, CRC , , . usbProcessRx, , .

, , — , SE0. , USB .

. SETUP, . . SETUP usbRequest_t 8 . : ( USB-) , - . , . .
, , , .

Step 12. SETUP

, , . . usbDriverSetup, . , . , ( , , ) . , : ACK NAK, .

Step 13.

, SETUP + DATAx, DATAx 8 . IN DATAx, . , . , ACK NAK. , . — usbTxBuf, , usbTxLen . low-speed USB 8 ( PID, CRC), usbTxLen 11. PID, , . , 16, , 0x0F, . PID , . IN, , (handshake , ).

:
SETUP + DATAx, ACK NAK . , , usbPoll, , ( PID=DATA1 ( DATA0 DATA1 , , DATA1). CRC . , , - . — 4 . , 3 , 4. , SYNC . « IN NAK?» NAK. , , DATA1 .

, — USBRQ_SET_ADDRESS ( , ). . (drvsdm.S, make SE0). , , , DATA1 , , . , , , , , . , , .

Step 14.

, . , USBRQ_GET_DESCRIPTOR USBRQ_SET_ADDRESS, , . usbDriverDescriptor, . , USBRQ_GET_DESCRIPTOR. , , :

USBDESCR_DEVICE — : USB (1.1 ), , , . .
USBDESCR_CONFIG — , , . .
USBDESCR_STRING — , .
, , USBDESCR_DEVICE, , .

Step 15.

. -, . , - - , , HID, , . Vendor ID Product ID, USB, . , vusb .

, , - . , , , (, ) usbMsgPtr, — len, usbMsgLen. ( ) 18 , 8. , , 3 . - , STALL.

usbDeviceRead. , memcpy_P, , , .

, , , . , , .

, , .

PID' DATA0 DATA1 . PID' , , - .

, DATA0 / DATA1 ( ), , , 3 , . XOR PID', . , , XOR' . PID DATA1, XOR PID , XOR DATA0 .

, , USBDESCR_CONFIG.

Step 16.

USBDESCR_CONFIG USBDESCR_DEVICE. ( , ) . , - USB-, , D+, D-.

, : , , . , ( , ). , UTF-16, . USB UTF-8 .

vusb , lsusb . VID, PID , . , VID, PID, — .

, , ( ). SETUP: , , . 0, , — . , , , .
.

Step 17. (HID)

HID — human interface device, , , . HID , . , , , , , . «» . HID ( low-speed 800 ), .

HID , USBDESCR_HID_REPORT. vusb, . , usbDriverSetup ( ) usbFunctionSetup ( ). , SETUP, OUT. , , , usbFunctionWrite.

, usbDeviceRead usbFunctionRead, . , , usbFunctionSetup ( , ) USB_FLG_USE_USER_RW, usbDriverSetup .

— — usbFunctionWrite usbFunctionRead. . — , .

usbDriverSetup.

Step 18.

, , . HID, , , ( udev - ). , , . , , , .
UPD: ramzes2 , HIDAPI

Step 19. vusb

vusb , .

drvasm.S - usbdrvasm.S asmcommon.inc, -, , usbdrvasm12.inc — usbdrvasm20.inc.

main.c main.c ( ) usbdrv.c ( vusb)
usbconfig.h ( ), , , usbconfig.h.

Conclusion

vusb, , , . , , . . , , , USB-HID. , , , vusb, , , , .

https://www.obdev.at/products/vusb/index.html ( vusb)
http://microsin.net/programming/arm-working-with-usb/usb-in-a-nutshell-part1.html
.. USB:
https://radiohlam.ru/tag/usb/
http://we.easyelectronics.ru/electro-and-pc/usb-dlya-avr-chast-1-vvodnaya.html
http://usb.fober.net/cat/teoriya/

PS - (, ) ,

Source: https://habr.com/ru/post/460815/
