Taking the USB stack apart and putting it back together

An illustrated projection of the OSI network model onto the Universal Serial Bus.

Three "wonderful" levels of the USB stack

I was not satisfied with the view of the USB stack that you find most often on the Internet:

Not very useful USB stack

Bus level, logical level, functional level... These are wonderful abstractions, of course, but they are mostly for those who are going to write a driver or application software for the host. On the microcontroller side, I expect a ready-made state machine into whose nodes we usually embed our own useful code, and at first that code will be buggy, by all the laws of the genre. Or the host software will be buggy. Or the driver. In any case, somebody will fail. The MCU libraries cannot be figured out at a glance either. And so I look at the USB bus traffic with an analyzer, where events in an unfamiliar language do not match those three wonderful levels at all. I wonder: is that dissonance only in my head, from the flu fever?
If the reader has had similar feelings, I offer an alternative view of the USB stack that came to my overheated brain with unexpected clarity, based on everyone's favorite 7-layer OSI model. I limited myself to five levels:

I am not saying that all software and libraries are, or should be, designed around this model. For engineering reasons, real code will mix the levels heavily. But I want to help those who are just getting acquainted with the USB bus, who want to understand the device exchange protocols and the terminology of the field, to get closer to ready-made examples and libraries and to find their way around them. This model is meant to be loaded not into the MCU but into your brilliant minds, dear friends. And then your golden hands will do the rest by themselves, I have no doubt :)

So here we go; correct me if you spot mistakes. This is a draft, and if something like it has already been drawn somewhere, please forgive me: I did not find it and so put it together myself. The picture will not run away, so for now let me explain to the esteemed public why I took up this publication at all.

Another flashback from the nineties

I shook my first bug out of someone else's code in the late nineties, as a student with a part-time job. It was pppd under FreeBSD, which we were hooking up to a modem pool. The Motorola modems would eventually hang, nobody could dial in, the line was wasted, and the only remaining remedy, PPP keep-alive, was for some reason buggy. That is when I found out that pppd was waiting for six LCP bytes instead of four. I felt like quite the dashing bug hunter of the nineties :-) What does PPP have to do with this? It simply resembles USB: packet-based and point-to-point. Unlike USB 2.0, though, it is full duplex.

Whether we like it or not, the evolution of microcontrollers is clearly not standing still. Every now and then "heavy peripherals" turn up in publications ( http://habrahabr.ru/post/208026/ , http://habrahabr.ru/post/233391/ ): a USB bus implementation built into the chip, with walkthroughs of examples, use of HID, and so on. Credit is due to the author RaJa: out of the eight examples in the standard library STSW-STM32121 (UM0424), which is documented after a fashion, he chose the most useful one (Custom HID), ported it to the free Em::Blocks environment, explained it in plain language and polished it a little. Bravo! It saved me a lot of time.

How do I get to the library?

Having fetched from GitHub the RHIDDemo project for Em::Blocks, kindly published by the author, I started porting it to Keil (my debugger is an FTDI-based CoLink; if anyone knows of a CooCox plugin for Em::Blocks, please tell me). But I could not work out where on earth the author got SPL release 3.6.1 from 2012 when the site only offers 3.5.0 from 2011. I went through a rather tedious quest, which to my surprise led... straight to a ready-made Custom HID project for Keil inside the USB FS 4.0.0 library. It lies there in plain view, quiet as a mouse. Well, fine. At least I finally sorted out the STMicroelectronics releases, found the description of the USB FS library STSW-STM32121 (UM0424) and put an end to the developer's attempts to drive me crazy. Tell me, is it normal to put a vintage CMSIS 1.30 from 2009 into the SPL 3.5.0 package released in 2011, to hide the newer SPL 3.6.1 from 2012 inside USB-FS 4.0.0 released in 2013 (putting CMSIS 3.0.1 from 2012 in there as well), even though the latest CMSIS 3.30 from 2014 has also been published? By the way, SPL 3.6.x for STM32F10X fixes a couple of USART bugs related to buffer overflow flags. Thanks for at least leaving the release notes...

HID vs SNMP

So, with an STM32F103C8T6 in hand, I too decided to dabble in USB HID, since the USB HID abstraction fits the concept of all kinds of sensors, gauges and other PWM-driven power drivers all too well. It reminded me of SNMP, only in a highly simplified form: HID descriptors play the role of the SNMP MIB. When the device is initialized it tells the host: "Hello, host! I am a coffee maker. I have a [start] button, [cream] and [sugar] regulators, and sensors for [coffee left], [water left], [sugar left] and [cream left]. Load the driver, push the button, have some coffee." Doesn't it remind you of anything? An example of an SNMP conversation: "Well, hello, management station with $100,000 worth of software. And I am a switch chassis for $200,000, with 4 more modules at $100,000 each; each of them has 16 ports of indecent speed, and there are too many functions to list here... ask about each item separately; oh yes, the CPU load is such-and-such, the memory usage is such-and-such...". And a dozen more pages in the same spirit.

I liked the idea of HID. But as soon as I stepped outside Windows and beyond the tutorial tasks of blinking LEDs (onward, to real UNIX environments!), all the gaps in my knowledge began to show, and I felt like a helpless lamer. While debugging the project I instinctively reached for some kind of tcpdump (it is called usbdump(8), or usbmon), but saw only messages in an unfamiliar language.

It became obvious: I lacked fundamental knowledge of the USB bus. Any seasoned IT specialist understands the OSI model and the TCP/IP stack somewhere at spinal-cord level, simply out of necessity, but with USB the situation is different. That is understandable: with networks you can (and must) sniff traffic with that same tcpdump and configure hardware and software, whereas with USB it is all plug and play, and whatever can be fixed is fixed by updating the driver or firmware (or reinstalling the OS). But you and I are here precisely to write good firmware, aren't we? After reading a few USB descriptions on the Internet, I was surprised at how confusingly documentation can be written. I even got the feeling that someone deliberately wanted to lead us astray, spread some fog and nip the competition in the bud. I do not accept this state of affairs!

Another great scheme

On the Internet I came across another illustration of this kind (it was in BMP format, no joke):

At first it looks promising: finally, the stack has been taken apart. The frames, though, are drawn poorly: I would mark them with vertical dashed lines, and EOF is just a pause during which no data is actually transmitted. But then we start reading the accompanying text and ~~lose the~~ understanding of the author's true intention (to confuse us):

the host controller's USB bus interface forms frames;
frames are transmitted as a serial bit stream using the NRZI method.

And here's another:

each frame consists of the highest-priority parcels, whose composition is determined by the host driver;
each transfer consists of one or more transactions;
each transaction consists of packets;
each packet consists of a packet identifier, data (if any) and a checksum.

It all seems to be drawn and correct, but the more you read, the more questions arise. Is the smallest data structure on the bus a frame or a packet? Are we supposed to read this from the top down or from the bottom up? And what exactly is NRZI-encoded: frames, packets, or simply the whole bit stream on the bus? Does a transaction consist of a parcel, a transfer, or perhaps a registered parcel post?
Why can't it be put simply: the host groups packets into transactions and distributes them over time slots called frames, so as to give priority to time-critical data (video, audio) given the current bus bandwidth? Yes, USB has its nuances in scheduling packet transfers; I am not touching on them yet.

My vision of the USB stack

I consider USB in a NutShell, already mentioned here on Habr (hurray, there is a translation), and USB Made Simple to be good documentation. Based on them I assembled my own version of the USB stack; here it is once more.

Physical level

At the physical level, a set of electrical states of a differential pair of conductors (together with ground) is used to encode a bit stream by the NRZI method with bit stuffing: after six consecutive ones (say I wanted to send 0xffff) a zero is inserted so that the receiver does not sit in one state for too long; the receiver recognizes the inserted zero and does not count it as data. This is a fairly common coding trick for better clock recovery. A pair of wires together with ground can form at least four static states (denoted J, K, SE0 and SE1). In USB 2.0, SE1 is not used, and the remaining three are additionally played out dynamically (with timing and transitions) to transmit a few more control symbols (packet boundaries, reset, connect/disconnect, suspend/resume). Good illustrations can be found in USB Made Simple, Part 3 - Data Flow.
That is, in the end the data is transmitted as zeros and ones plus a few control symbols, so that normal data packets can be cooked up out of this whole electrodynamic kitchen.
(added at the request of readers)
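The bit stuffing and NRZI rules described above can be sketched in a few lines. This is only an illustration on lists of bits, not firmware code: the function names are mine, and the numeric line level (1 or 0) is an abstract stand-in for the J and K states, with the starting level chosen arbitrarily.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 6:
                out.append(0)  # stuffed bit, carries no data
                ones = 0
        else:
            ones = 0
    return out


def bit_unstuff(bits):
    """Drop the bit that follows every run of six consecutive 1s."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:  # this is the stuffed 0: discard it
            skip, ones = False, 0
            continue
        out.append(b)
        if b == 1:
            ones += 1
            skip = ones == 6
        else:
            ones = 0
    return out


def nrzi_encode(bits, level=1):
    """NRZI as used by USB: a 0 toggles the line level, a 1 keeps it."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, level=1):
    """A change of level decodes as 0, no change decodes as 1."""
    out = []
    for cur in levels:
        out.append(1 if cur == level else 0)
        level = cur
    return out


data = [1] * 16                      # the 0xffff from the text
line = nrzi_encode(bit_stuff(data))  # what goes onto the wire
assert bit_unstuff(nrzi_decode(line)) == data
```

Without stuffing, sixteen ones would leave the line in a single state for sixteen bit times; with it, the receiver sees a transition at least once every seven bits.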

Packet level

At the packet level, addressless packets are transmitted between the host and the device (a pair of devices on a half-duplex line can do without addressing). A packet consists of a SYNC field for synchronizing the receiver's clock, a sequence of bytes, and an EOP symbol. The packet length is variable, but is negotiated through the upper levels of the stack. The first byte is called the Packet Identifier (PID); it has a simple redundant format for noise immunity and is suitable for feeding into the state machine of the next level (which assembles transactions from packets). Packets with a payload (longer than the single PID byte) carry a checksum (a short CRC5 or a long CRC16, depending on the packet type). A protocol analyzer should, at a minimum, show us packets.
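To make the "simple redundant format" of the PID byte concrete, here is a small sketch. The PID codes and their groups are taken from the USB 2.0 specification rather than from this article, and the function names are my own. The low four bits carry the PID itself and the high four bits carry its bitwise complement, so a receiver can reject a damaged identifier before looking at anything else.

```python
# 4-bit PID codes per the USB 2.0 specification (not from the article itself)
PID_NAMES = {
    0x1: "OUT", 0x9: "IN", 0x5: "SOF", 0xD: "SETUP",       # Token group
    0x3: "DATA0", 0xB: "DATA1",                            # Data group
    0x2: "ACK", 0xA: "NAK", 0xE: "STALL", 0x6: "NYET",     # Handshake group
}
# The two low bits of the PID select the group
PID_GROUPS = {0b01: "Token", 0b11: "Data", 0b10: "Handshake", 0b00: "Special"}


def make_pid_byte(pid):
    """Build the PID byte: PID in the low nibble, its complement in the high one."""
    return ((~pid & 0x0F) << 4) | (pid & 0x0F)


def decode_pid_byte(byte):
    """Return (name, group), or None if the redundancy check fails."""
    pid = byte & 0x0F
    check = (byte >> 4) & 0x0F
    if check != (~pid & 0x0F):
        return None  # corrupted PID: the packet must be ignored
    return PID_NAMES.get(pid, "unknown"), PID_GROUPS[pid & 0b11]


print(hex(make_pid_byte(0x9)))   # byte value for an IN token
print(decode_pid_byte(0xD2))     # an ACK handshake
print(decode_pid_byte(0x68))     # one flipped bit, check fails
```

The group returned here is exactly what the next level's state machine needs: a Token opens a transaction, Data carries the payload, a Handshake closes it.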

Transaction level

At the next level, transactions are assembled from packets. A transaction is a small set of packets (in Full Speed USB: one, two or three) following one another, which the host exchanges (in half-duplex mode) with an endpoint, and with exactly one endpoint. It is very important that a transaction is opened only by the host; this is specific to USB (and means less trouble for us in the MCU firmware). At the transaction level you can already talk about a pipe between the host and one of the device's endpoints, but I deliberately avoid the term "data link" from the OSI model. A protocol analyzer must at least decode transactions.

Transfer level

Above transactions we place the level of transfers. USB uses four types of them: control transfers with endpoint 0, interrupt transfers, isochronous transfers and bulk transfers. The last three are variants of stream pipes, about which I will say a few words later. A good protocol analyzer should display this level too.

Application layer

Crowning the stack, as usual, is the application layer. Here we find: the host assigning an address to the device, the device describing itself in the language of descriptors, host commands to select a configuration (control transfers), data exchange with HID devices (the examples I found use interrupt transfers; I want to try control transfers), printing and scanning, access to USB storage (bulk), talking through headsets and webcams (isochronous), and many other wonderful things.

Finishing touch

Having run through the levels, we can add that the host periodically puts those same Start of Frame (SOF) packets on the bus, splitting time into equal intervals without splitting the transactions themselves. SOF packets can therefore be regarded as independent transactions. Do not confuse a USB frame with its namesake from the data link layer of the OSI model. Think instead of the frames of an audio CD: a frame is just a time slice. The host "ticks" on the bus with SOF packets so that connected devices can plan ahead for the so-called isochronous transfers that carry real-time data streams. Or put it this way: the host schedules groups of transactions into time intervals called frames. A frame lasts 1 ms for Full Speed and 125 µs for High Speed USB, but High Speed is a more complex standard and is better studied separately.
UPD:
Readers asked a good question: what about fragmentation? I found no signs of fragmentation in USB 2.0 at the transaction level or below, i.e. a transaction is supposed to be transmitted whole. Transfers, on the other hand, can and in some cases must be split into several transactions, especially where isochronous modes are involved. And let me repeat that all this scheduling works in our favor as long as the host is in charge (there is less to think about on the MCU side).
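Splitting a transfer into transactions can be pictured like this. It is a sketch under my own assumptions: the 64-byte default is the largest packet size allowed for a Full Speed bulk endpoint, the DATA0/DATA1 alternation is how bulk and interrupt endpoints tag consecutive data packets, and the function name is invented for illustration. Zero-length packets and isochronous specifics are deliberately left out.

```python
def split_transfer(data, max_packet_size=64):
    """Cut one transfer into per-transaction data packets.

    Returns a list of (pid, payload) pairs. The PID alternates between
    DATA0 and DATA1 so the receiver can spot a lost or repeated packet.
    """
    packets = []
    for n, offset in enumerate(range(0, len(data), max_packet_size)):
        pid = "DATA1" if n % 2 else "DATA0"
        packets.append((pid, data[offset:offset + max_packet_size]))
    return packets


# A 150-byte buffer becomes three transactions: 64 + 64 + 22 bytes
for pid, payload in split_transfer(bytes(150)):
    print(pid, len(payload))
```

Each pair then travels inside its own transaction; no single packet is ever cut in half, which is the point made above about the lack of fragmentation below the transfer level.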

Looking at USB traffic

A good selection of illustrations is in the mentioned book USB Made Simple, Chapter 5: www.usbmadesimple.co.uk/ums_5.htm

Here is one of them

So, a transaction is always initiated by the host and addressed to one selected endpoint on the device (besides the special endpoint 0, a single device can have up to 15 more: for example, a combined keyboard with a mouse, a thermometer, a flash drive, a coffee maker and a ~~plumber~~ pizza order button).
When the host receives data from the device, the device cannot open the transaction itself; it can only wait for the right moment and take part. The host opens the transaction with a packet with PID = IN (Token group) and guarantees that the bus is free at the right time; the device puts a packet from the Data group on the bus; depending on the transaction type, the host may confirm success with a third packet from the Handshake group (ACK, NAK, STALL, NYET); the transaction is closed.
When sending data to the device (PID = OUT, Token group), the host opens the transaction, sends a data packet (Data) and, depending on the mode, may receive a Handshake packet confirming that the transaction succeeded.
Once the transaction is over, everything returns to its initial state, and the device again waits for token packets from the host.
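The who-speaks-when logic of the transactions just described fits into a tiny state machine. This is a sketch with names of my own choosing; it only works out which side sends each packet and does not check every rule of the specification (for instance, which handshakes the host is allowed to send).

```python
TOKENS = {"IN", "OUT", "SETUP"}
DATA = {"DATA0", "DATA1"}
HANDSHAKES = {"ACK", "NAK", "STALL", "NYET"}


def who_sends(packets):
    """Label each packet of one transaction with its sender.

    packets: PID names in bus order, e.g. ["IN", "DATA0", "ACK"].
    """
    if not packets or packets[0] not in TOKENS:
        raise ValueError("a transaction is always opened by a host token")
    if len(packets) > 3:
        raise ValueError("at most three packets per transaction")
    token = packets[0]
    result = [("host", token)]
    data_seen = False
    for pid in packets[1:]:
        if pid in DATA:
            # data flows device -> host for IN, host -> device otherwise
            sender = "device" if token == "IN" else "host"
            data_seen = True
        elif pid in HANDSHAKES:
            # the handshake comes from whoever received the data; a device
            # with nothing to send answers an IN token with a bare handshake
            sender = "host" if (token == "IN" and data_seen) else "device"
        else:
            raise ValueError("unexpected packet: " + pid)
        result.append((sender, pid))
    return result


print(who_sends(["IN", "DATA1", "ACK"]))
print(who_sends(["OUT", "DATA0", "NAK"]))
print(who_sends(["IN", "NAK"]))
```

Comparing such a listing with what the analyzer shows is a quick way to see which side broke the protocol.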

USB transfer modes in STM32 USB FS examples

So that a single pair of wires can carry copying from a disk at the same time as an audio/video stream, mouse gestures and the signal of a high-speed oscilloscope, there are different types of messages and transfers.
Just above I described a simple stream pipe between the host and an endpoint, where payload packets (Data group) carry no special or control information for the USB subsystem itself. There is complete freedom of correspondence: the controller's library should provide primitives for moving a buffer of arbitrary size from MCU memory to the host and back. Let the MCU library, together with the host driver, take care of cutting into packets, forwarding and "defragmenting". In STM32 these are USB_SIL_Write() and USB_SIL_Read(), described in UM0424. They are that very logical level of abstraction. On the host side, see the description of the corresponding driver (on FreeBSD, for example, this is ugen(4)).
However, I consider it sacrilege to use heavy peripherals like USB just to organize a simple stream pipe (the question being: what was wrong with the USART?). But all sorts of situations happen, of course.
In any case, for the USB subsystem to come to life at all and for the device to be detected, an exchange of control transactions is required.

DISCLAIMER

The examples mentioned below come from the same STMicroelectronics library for Full Speed USB (UM0424), but they are designed for ST's own demo boards. Follow the example of the author RaJa and show some engineering ingenuity in adapting the projects to your own demo board.

As for software, everything is clear: these examples are not meant for industrial use, there may be bugs, and some parts (such as the reference table in the Mass storage example) are protected by a patent, so you have no right to use them in a commercial project. But that is nothing: Chinese vendors manage to sell USB products in which nobody even bothered to change the library VID and PID.

As for hardware, as I understand it, you have to start with the crystal. I have a PinBoard II from Chelyabinsk with a 12 MHz crystal (all the libraries are tuned for 8 MHz), so I changed the PLL multiplier from 9 to 6 (link with explanations); otherwise the MCU would be overclocked to 108 MHz instead of 72 MHz, and USB at 72 MHz instead of 48 MHz would not work at all. You can also slow the MCU down to 48 MHz by changing the USB bus divider from one and a half to one. Specialists do not like the MCU's internal HSI oscillator: its frequency may drift a little with temperature, and I find it hard to predict the consequences for USB. And do not forget the peripherals, of course. Without the SPI/SDIO flash memory from the Mass storage example you can only build an analogue of /dev/null, though you can format it to your heart's content :-)
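The clock arithmetic above is easy to check. This is just a calculator sketch with a function name of my own, assuming the crystal feeds the PLL directly with no extra prescaler; the combination with a multiplier of 4 is my own reading of the "slow the MCU down to 48 MHz" option.

```python
from fractions import Fraction


def clocks(crystal_mhz, pll_mul, usb_div):
    """Return (system clock, USB clock) in MHz for a given PLL setup."""
    sysclk = crystal_mhz * pll_mul
    usbclk = Fraction(sysclk) / Fraction(usb_div)
    return sysclk, usbclk


ONE_AND_A_HALF = Fraction(3, 2)

print(clocks(8, 9, ONE_AND_A_HALF))    # library default: 72 MHz core, 48 MHz USB
print(clocks(12, 9, ONE_AND_A_HALF))   # 12 MHz crystal, untouched multiplier: 108 and 72
print(clocks(12, 6, ONE_AND_A_HALF))   # multiplier changed from 9 to 6: 72 and 48 again
print(clocks(12, 4, 1))                # assumed slow variant: 48 MHz core, divider of one
```

Whatever the combination, the second number has to come out as exactly 48 MHz, otherwise the USB peripheral will not work.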

Control transfers and message channels

Thinking about USB, I recall the good old PPP protocol with its LCP, IPCP, CCP and other xCPs. The host's exchange of special messages with endpoint 0 is the local equivalent of xCP.
Through control transfers the device is initialized, receives an address and tells the host about itself in the language of descriptors (so that the host can find and activate the right driver). Without control operations a simple stream will not "go": if the device does not answer in the proper form, the host will quickly shut down the port. The protocol must be followed.
In principle, the protocol does not forbid hanging data exchange on endpoint 0 as well, much like the interrupt mode. While you are at it, think about how you will update the MCU firmware in the field, so to speak. With a programmer at the ready? There is another solution.
Example: Device firmware upgrade
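Every control transfer begins with an 8-byte setup packet sent to endpoint 0, and it is worth being able to read one in an analyzer dump. The field layout below follows the USB 2.0 specification (chapter 9), not this article; the function name and the sample byte strings are mine.

```python
import struct

REQUEST_TYPES = ("standard", "class", "vendor", "reserved")


def parse_setup(raw):
    """Decode the 8-byte setup packet that opens a control transfer."""
    bm_request_type, b_request, w_value, w_index, w_length = struct.unpack("<BBHHH", raw)
    return {
        "direction": "device-to-host" if bm_request_type & 0x80 else "host-to-device",
        "type": REQUEST_TYPES[(bm_request_type >> 5) & 0x3],
        "recipient": bm_request_type & 0x1F,
        "bRequest": b_request,
        "wValue": w_value,
        "wIndex": w_index,
        "wLength": w_length,
    }


# GET_DESCRIPTOR (bRequest 6) for the device descriptor (type 1 in the
# high byte of wValue), 18 bytes expected back
print(parse_setup(bytes.fromhex("8006000100001200")))

# SET_ADDRESS (bRequest 5) assigning address 5 to the device
print(parse_setup(bytes.fromhex("0005050000000000")))
```

These two requests are among the first things a freshly plugged device sees, so they make a convenient first breakpoint in the firmware.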

Interrupt transfers

This kind of transfer (interrupt transfer) is designed for exchanging small transactions, similar to control ones. No, the device cannot interrupt the host; it waits to be polled, and the polling interval and packet sizes are agreed in advance in the device's endpoint descriptors. It suits all kinds of consoles, sensors, gauges, mice, LEDs and other HID coffee makers. The interrupt pipe of each endpoint is unidirectional.
Examples: Custom HID , Joystick mouse , Virtual COM port
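The polling interval and packet size mentioned above live in the 7-byte endpoint descriptor. The layout follows the USB 2.0 specification; the function name and the sample bytes are hypothetical and are not copied from any of the ST examples.

```python
import struct

TRANSFER_TYPES = ("control", "isochronous", "bulk", "interrupt")


def parse_endpoint_descriptor(raw):
    """Decode a standard 7-byte endpoint descriptor."""
    (b_length, b_type, b_address,
     bm_attributes, w_max_packet, b_interval) = struct.unpack("<BBBBHB", raw[:7])
    if b_length != 7 or b_type != 0x05:
        raise ValueError("not an endpoint descriptor")
    return {
        "number": b_address & 0x0F,
        "direction": "IN" if b_address & 0x80 else "OUT",
        "type": TRANSFER_TYPES[bm_attributes & 0x03],
        "max_packet_size": w_max_packet & 0x07FF,
        # for Full Speed interrupt endpoints the interval is in milliseconds
        "interval": b_interval,
    }


# Hypothetical descriptor: endpoint 1 IN, interrupt, 64-byte packets,
# to be polled every 32 ms
print(parse_endpoint_descriptor(bytes([0x07, 0x05, 0x81, 0x03, 0x40, 0x00, 0x20])))
```

A bidirectional HID device therefore needs two such descriptors, one IN and one OUT, since each interrupt pipe is unidirectional.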

Isochronous transfers

Χρόνος in Greek means "time". Isochronous transfer is the local high-tech that lets you move a data stream in real time. Its distinguishing features are guaranteed (though not necessarily wide) bandwidth and no acknowledgements, almost like UDP with QoS. A broken packet? That was the god Chronos nudging the MCU's pin. Do not try to send the packet again, or the god will be upset. Checksums, though, may be verified quietly behind Chronos's back. Isochronous transfers are good for audio and video, for real-time measurement systems and for other dual-use toys. Although for some of them it may be more interesting to hang an AVR of some kind on the task, linking it to our ARM via USART or SPI. Isochronous operations take part in frame signaling (recall the ticking of SOF packets).
Example: USB voice speaker

Bulk transfers

No, we will not be hauling bags of cement. I think everyone recognizes the operating mode of various USB drives. Bulk transfers aim to send as much data as possible as fast as possible, always retransmitting broken packets but with no bandwidth guarantees, yielding bandwidth to isochronous transfers when necessary (like TCP without QoS). The internals of USB flash drives have already been described to you; now you can download and run a working prototype. I have not tried it myself, but the table of SCSI commands in the description of the example (mentioned as if in passing) is very telling. I found no signs of a wear-leveling algorithm for NAND memory :-)
ATTENTION: STM patent protection is in place.
Example: Mass storage

What remains undisclosed

I have no intention of writing yet another USB tutorial; there are enough of them without me, and they describe well: the electrical part, protocol details, working with hubs, the descriptor language and the HID abstraction level, problems with unique VID/PID, USB 3.0 and many other great features of the USB bus, both useful to us and not so much. To IT specialists I especially recommend an excursion to the dark side with an overview of hostile devices (a flash drive with a disguised HID keyboard that does scary things).

Links

Adaptation of the Custom HID example to the free Em::Blocks environment and the STM32F103C8T6 budget demo board from LC-Tech: habrahabr.ru/post/208026
Battle for the UPS: habrahabr.ru/post/233391
Another battle for the UPS: habrahabr.ru/post/233391/#comment_7944489
Excursion to the dark side (spy device on an AVR): habrahabr.ru/post/153571
Instructions for USB analysis in Wireshark for Windows and Linux: wiki.wireshark.org/CaptureSetup/USB
USB in a NutShell: www.beyondlogic.org/usbnutshell/usb1.shtml
Translation of USB in a NutShell: microsin.ru/content/view/1107/44
USB Made Simple (really simplified): www.usbmadesimple.co.uk
STSW-STM32121, the STMicroelectronics USB full speed device library with all the examples mentioned (UM0424): www.st.com/web/en/catalog/tools/PF258157

PS
Reading publications on Habr that deal with microelectronics to one degree or another, I noticed two engineering castes; let us call them, conditionally, industrial electronics engineers and IT specialists. They are a kind of engineering Yin and Yang, and each of us has a share of both.

Industrial electronics engineers have brilliant knowledge of and skill with hardware, and can solder components as thin as a hair with their left hand and eyes closed (and it works afterwards). One look at a schematic and they almost physically feel all its currents and potentials; they also work with power circuits and with (large, fast, dangerous) industrial equipment. Their approach to MCU programming is to match: the chip just has to put the right logic levels on the right pins at the right time, and how it does so is not that important. They are conservative about technology (if it works, don't touch it) and not especially fond of heavy MCU peripherals. Discussions of object-oriented programming, information security, giant projects with a million lines of code and all sorts of fancy graphical interfaces bore them. Instead of the packet-oriented USB bus they prefer the USART in stream mode, reinforced with either plain RS-232 or the more brutal RS-485 (a serial bus for industrial applications: up to 10 Mbit/s at 15 m, up to 100 kbit/s at 1200 m, up to 32 devices).

IT specialists are brought up on an understanding of operating systems, network infrastructure and complex interactions; their elite is well versed in information security and knows all sorts of invisible ways of getting into someone else's system. Some of them are very fond of cats (well, how can you not love them? Though I, honestly, do not keep, breed or cook them :-). Many love freedom of information, curse corporations and governments, and conquer the forces of nature by sheer effort of thought. They are pathologically lazy, but they adore new technologies and twisted engineering puzzles involving expensive toys (preferably solvable in software or, as a last resort, with jumpers). Their relationship with the soldering iron is wary: do not ask an IT specialist whether he loves the soldering iron, he may misunderstand; better ask whether he likes soldering electronic circuits.

What am I getting at? We simply see this world differently... After all, those Linux guys did build the Linux kernel out of modules in C with assembly inserts for specific platforms, and they seem to have managed without holy wars. I see a really serious project as a multi-core system that combines the latest MCUs with heavy peripherals, but I do not rule out pairing them with classic models such as the AVR: they can be entrusted with some of the critical, fast-spinning parts of technical progress. If the code has been proven over the years, then why not?

Source: https://habr.com/ru/post/236401/
