DMA in general and in particular

If I had known where I would fall, I would have laid down some straw

Many embedded developers have heard of DMA (Direct Memory Access), in Russian PDP ("pryamoy dostup k pamyati", direct memory access), but they use it far less often than it (PDP) deserves. By the way, I will stick to the Russian abbreviation, not because I am a stubborn patriot and opponent of English borrowings, but only because I am too lazy to switch the keyboard layout yet again.

There are three main reasons why PDP is underused in MCU programs: 1) the relative complexity of the device, which, together with 2) a poor understanding of the benefits it brings, leads to a reluctance to study and master it (as they say in such cases, "big sister won't allow it"; for those who missed the joke, this is about the laziness that was born before we were), aggravated by 3) the lack of good, understandable examples of PDP use in the manuals supplied with MCUs. The first two reasons are clearly subjective, but the third is undoubtedly objective, and the paranoiac inside me wakes up and insists that this was done on purpose, to keep domestic MCU developers from reaching the advanced levels, somewhere above 60. (The paranoiac ignores the fact that developers in the rest of the world are in the same position, because either 1) the proper examples are distributed outside Russia, or 2) for the great goal of keeping you-know-who from rising off their knees, the bourgeoisie are ready to make any sacrifice.)

Joking aside, the examples really do contain, at best, a setup module for a single PDP channel, and you will not find a coherent system with a PDP driver (not even in CMSIS, about which I will write a post someday). I do not know why this is so; the chip developers know best. The only rational justification that comes to mind is that PDPs are quite device-specific, so "you can't just take code and port it from another source", and because PDP is in low demand in real-world development, the absence of such examples is not considered a significant drawback. This post is meant to fill the knowledge gap I have pointed out (an immodest statement, but if you do not praise yourself, you walk around all day feeling spat upon), so those I have intrigued can click through and read on.

Still, I must warn the impatient reader that he will not find a silver bullet here that can safely be dropped into his designs. He will find only (though this is no small thing) some thoughts and approaches that will make it easier to build his own MCU systems using PDP. That is, I will plant flags in the places where rakes definitely lie, but I do not guarantee that there are no unmarked rakes, which, however, does not stop you from tossing the flags aside and stepping on the rakes yourself. In general, those who read my posts have probably noticed that I focus not on what should be done and how, but on why I would recommend doing it that way.

So, the PDP is a part of the MCU hardware that moves data between different components of the MCU (and of the system built around it) without using processor resources (more precisely, with minimal processor involvement, since doing anything in an MCU system WITHOUT the processor is a bold and far-reaching idea).

The main idea behind these devices is that, while a program executes, not every instruction makes the system bus access external devices or main memory (accesses to program memory do not count), so the unit serving the system bus has idle cycles that it would be nice to put to use.

Another reason for the creation of PDP was the appearance of fast I/O devices, which consumed a significant amount of CPU time when serviced by polling and were almost impossible to service by interrupts. The forerunners of PDP were the so-called peripheral processors with local memory in mainframe architectures, which then evolved into PDPs proper: units that move data between high-speed I/O devices and main RAM and are part of the hardware controlling a specific external device.

You could see a picture (for example, on the DVK) where one external device worked via PDP and another by interrupts or even in a polling loop (the MX driver, if anyone remembers). When the first MCUs with on-chip memory appeared, building a device with PDP external to the chip became a very nontrivial task, and the PDP naturally migrated inside the MCU and became part of its architecture. This process was not linear and progressive: you can find both MCUs based on the 8051 architecture with PDP support and MCUs based on the Cortex-M3 with a rich set of peripherals but without PDP support (for example, Stellaris). Nevertheless, most modern ARM-based MCUs have a PDP, so we can move on to studying them and, to begin with, look at their characteristics.

As mentioned above, the main purpose of the PDP is to transfer data between different devices with minimal processor involvement. As a rule, one of these devices is the main memory of the MCU, although transfers between two external devices and from memory to memory are sometimes possible as well. Some PDPs support these capabilities and some do not, and this should be taken into account when choosing an MCU.

The next important characteristic is the number of simultaneously serviced transfer channels. "Simultaneously" here means, of course, that only one piece of data can travel over the bus at a time, but several requests can be outstanding and will be serviced in turn, again with minimal use of CPU resources. Naturally, the minimum number of PDP channels is 1 (fewer would hardly be useful), and the maximum is practically unlimited. The number of external devices + 1 (for memory-to-memory requests) can be a sensible limit, but nothing (except cost) prevents us from making as many channels as we like.

The next characteristic to consider is the mechanism for servicing requests within a single channel. Let us dwell on this point in more detail and walk through, in general terms, how one request on one PDP channel is handled.

To perform a transfer, we must tell the PDP what we intend: where the data comes from, where it should go, how much of it to transfer, the transfer modes (more on those later), and possibly some service information. We can put this information either in the registers that control the PDP (an outdated method, and we will soon see why) or in the system's RAM. In the latter case it must either live in a dedicated memory area, so that the PDP knows where to find it, or some PDP register must hold the location of the data in RAM (in plain language, a pointer). It is the latter method that is used in practice, because the MCU's RAM is a valuable resource and carving it into fixed blocks is not good practice.
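
The "descriptor in RAM plus a pointer in a register" layout can be sketched in C as follows. This is only a host-side model under my own assumptions: the names tcb_t, dma_channel_t and dma_model_run are hypothetical, the field set is the minimum listed above, and memcpy stands in for the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transfer descriptor: everything the controller needs to
 * know about one transfer, kept in ordinary RAM. */
typedef struct {
    const void *src;    /* where to read from */
    void       *dst;    /* where to write to */
    size_t      count;  /* number of bytes to move */
    uint32_t    mode;   /* transfer mode and service bits (unused here) */
} tcb_t;

/* Hypothetical channel "register": it holds only a pointer to the
 * descriptor, not the transfer parameters themselves. */
typedef struct {
    const tcb_t *tcb;
} dma_channel_t;

/* Software stand-in for the hardware: perform the transfer that the
 * channel register points at. Returns the number of bytes moved. */
static size_t dma_model_run(const dma_channel_t *ch)
{
    assert(ch != NULL && ch->tcb != NULL);
    memcpy(ch->tcb->dst, ch->tcb->src, ch->tcb->count);
    return ch->tcb->count;
}
```

The descriptor can live anywhere in RAM; only the single pointer has a fixed location.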

So, once the requested transfer is done, we receive an end-of-transfer signal (most likely an interrupt), and we may well need to start the next transfer as soon as possible. If the transfer parameters live in registers, then either we cannot modify them until the current transfer ends, or the hardware has to provide shadow registers. If the register holds a pointer to the data describing the transfer (a TCB, Transfer Control Block), then we can set up the TCB for the next transfer in advance and, when the current transfer ends, simply change the pointer in the PDP register.
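
A minimal sketch of that "prepare the next block in advance, then swap the pointer" idea, again as a host-side model. The names dma_prepare_next and dma_on_complete, and the separate pending slot, are my assumptions for illustration and not any particular controller's interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transfer control block (TCB). */
typedef struct {
    const void *src;
    void       *dst;
    size_t      count;
} tcb_t;

typedef struct {
    const tcb_t *active;   /* models the PDP register holding the pointer */
    const tcb_t *pending;  /* next TCB, prepared while 'active' is running */
} dma_channel_t;

/* Queue the next transfer. This is safe at any time because the active
 * TCB is not touched. Returns 0 on success, -1 if a TCB is already queued. */
static int dma_prepare_next(dma_channel_t *ch, const tcb_t *next)
{
    assert(ch != NULL && next != NULL);
    if (ch->pending != NULL) {
        return -1;
    }
    ch->pending = next;
    return 0;
}

/* Called from the end-of-transfer handler: just swap the pointer.
 * Returns 1 if a new transfer was started, 0 if the channel went idle. */
static int dma_on_complete(dma_channel_t *ch)
{
    assert(ch != NULL);
    ch->active  = ch->pending;
    ch->pending = NULL;
    return ch->active != NULL;
}
```

The end-of-transfer handler does no setup work at all, which is the whole point of keeping the parameters outside the registers.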

In practice there are various combined schemes, mainly to save MCU hardware: a pointer to a memory area holding 2 or more TCBs that are switched cyclically as they complete; a pointer to the next TCB in a service field of the current TCB; hierarchical access, where a TCB refers to a sequence of TCBs that are actually executed; and so on.

In the cases described above we also need a special flag indicating that all transfers requested so far have completed; it will be part of the service information. Other service information often includes flags that specify the transfer mode (the width of the transferred data, the step and direction in which the source and destination addresses change, information for the memory management unit...).
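
To show why the "everything is done" flag is needed, here is a small model of the scheme with a pointer to the next block in a service field. The flag name TCB_FLAG_LAST, the field layout and dma_model_run_chain are hypothetical; memcpy again plays the role of the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCB_FLAG_LAST 0x1u  /* hypothetical "no more work after this block" flag */

/* Hypothetical control block with a pointer to the next block kept in
 * its service fields. */
typedef struct tcb {
    const void       *src;
    void             *dst;
    size_t            count;
    uint32_t          flags;
    const struct tcb *next;
} tcb_t;

/* Walk the chain as a controller would. The chain may be cyclic, so the
 * end is decided by the flag, not by the pointer. Returns the number of
 * blocks executed. */
static size_t dma_model_run_chain(const tcb_t *t)
{
    size_t done = 0;
    while (t != NULL) {
        memcpy(t->dst, t->src, t->count);
        done++;
        if (t->flags & TCB_FLAG_LAST) {
            break;
        }
        t = t->next;
    }
    return done;
}
```

With two blocks that point at each other, the walk would never stop without the flag on one of them.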

We now turn from PDP in general to a specific implementation, namely the 1986VE1T (1986ВЕ1Т) MCU from Milandr.

Why this one? Well, first, I work with it; second, it has a fairly rich PDP; and third, it has rich "features" (yes, of the "it's not a bug, it's a feature" kind) that make working with this MCU a fascinating occupation, after which working with counterparts from well-known manufacturers will seem easy and simple.

First, the good news: the PDP supports 32 independent channels, one for each external device in the MCU plus one for memory-to-memory transfers. In addition, as you have gathered from the previous sentence, the PDP supports all possible transfer modes, namely device registers to memory, memory to device registers, device registers to device registers, and memory to memory. If it seems to you that this is how it should always be, it is not: the different modes work slightly differently and are not implemented everywhere.

Further, the PDP supports different data sizes: 1 byte, 2 bytes, and 4 bytes (a word in our architecture), as well as different address increments, set separately for source and destination: 1, 2, and 4 (decrement is not supported). The PDP has an arbitration system for the serviced channels, with flexible per-channel priorities and a configurable size of the elementary transaction (the number of transfers on one channel after which arbitration is performed). In addition, each channel can have up to 2 TCBs, which can alternate cyclically or work in a hierarchical mode; a one-shot mode is also possible.
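
As an illustration of how such mode settings are typically packed into one service word of a TCB, here is a sketch. The bit layout below is invented for the example and is NOT the real register layout of this chip; only the allowed values (sizes and increments of 1, 2 and 4) come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes are stored as log2 of the byte count, so 1, 2 and 4 bytes fit
 * into two bits. */
typedef enum {
    DMA_SIZE_1 = 0,
    DMA_SIZE_2 = 1,
    DMA_SIZE_4 = 2
} dma_size_t;

/* Hypothetical layout: bits 1:0 data width, bits 3:2 source increment,
 * bits 5:4 destination increment. */
static uint32_t dma_mode_pack(dma_size_t width, dma_size_t src_inc,
                              dma_size_t dst_inc)
{
    assert(width <= DMA_SIZE_4 && src_inc <= DMA_SIZE_4 && dst_inc <= DMA_SIZE_4);
    return (uint32_t)width | ((uint32_t)src_inc << 2) | ((uint32_t)dst_inc << 4);
}

/* Decode the fields back into byte counts. */
static unsigned dma_mode_width(uint32_t mode)   { return 1u << (mode & 3u); }
static unsigned dma_mode_src_inc(uint32_t mode) { return 1u << ((mode >> 2) & 3u); }
static unsigned dma_mode_dst_inc(uint32_t mode) { return 1u << ((mode >> 4) & 3u); }
```

Keeping the source and destination increments in separate fields is what makes it possible to, say, read halfwords and store them on word boundaries.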

To keep the balance, let us talk about the less good parts. First, the documentation again: if you do not already know how a PDP works, you certainly will not learn it from the company's documentation. The documentation is clearly a translation; it has translation errors that distort the meaning and some very vaguely described places, but on the whole it can be enough for a trained developer who is used to doing the author's thinking for him. The more significant shortcomings (compared with an ideal device, of course) are the considerable cost in bus resources (6 accesses per transfer, although I may have misunderstood something) and a number of peculiarities, which are covered a little later.

And here is what awaits us:

Ambush #1 from the developers: the end-of-transfer interrupt is not routed to the external devices. That is, we have a single interrupt vector for the end of a transaction on any of the programmed channels. Moreover, there is no register that stores the number of the channel that completed the transaction, or even a bit register with per-channel flags. So the only way to find out which channel completed its transfer is to loop through all the channels and inspect the corresponding TCB fields, and we have to do this in the interrupt handler, which should take minimal time. The tempting solution, moving the channel search into the bottom half of the driver, does not work, because next comes:
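
The scan described above might look roughly like this. It is a sketch under my own assumptions: tcb_state_t with its remaining and armed fields is a hypothetical stand-in for "the corresponding TCB fields", and a real handler would read the controller's actual control structures instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_CHANNELS 32u

/* Hypothetical per-channel state as the driver sees it in the TCB. */
typedef struct {
    uint32_t remaining;  /* transfers still to be done, 0 when finished */
    uint8_t  armed;      /* set by the driver when it starts the channel */
} tcb_state_t;

/* With no "which channel finished" register, the handler has to look at
 * every channel. Returns a bitmask of channels that have just completed
 * and disarms them so that each completion is reported only once. */
static uint32_t dma_find_completed(tcb_state_t tcb[DMA_CHANNELS])
{
    uint32_t mask = 0;
    assert(tcb != NULL);
    for (unsigned ch = 0; ch < DMA_CHANNELS; ch++) {
        if (tcb[ch].armed && tcb[ch].remaining == 0) {
            tcb[ch].armed = 0;
            mask |= (uint32_t)1 << ch;
        }
    }
    return mask;
}
```

The loop always costs 32 iterations, whether one channel finished or none, which is exactly the time we would rather not spend inside an interrupt handler.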

Ambush #2 from the developers: the interrupt is level-triggered, and we cannot leave the top half without explicitly clearing it. Moreover, there is also

Ambush #3 from them: we cannot clear the interrupt by manipulating the PDP registers; instead, we must disable request generation in the registers of the external device. Yes, exactly: the PDP driver has to know something about the register layout of the devices it serves, and a more monstrous violation of the principle of encapsulation (which holds for hardware design too) is hard to imagine. I do not know what the PDP developers were smoking, but, as Haiduk wrote, "something very interesting". That is, you may not believe it, but even if we block requests from the corresponding channel and disable its processing, we still DO NOT CLEAR the interrupt.
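
One conceivable way to contain this encapsulation leak (my own sketch, not something taken from the vendor's materials) is to let each device driver register a small callback that knows how to switch off its own request, so the PDP driver calls it without knowing any device registers. All names here, including the fake UART register, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_CHANNELS 32u

/* A callback supplied by the device driver that switches off request
 * generation in that device's own registers. */
typedef void (*dma_req_disable_fn)(void);

static dma_req_disable_fn dma_req_disable[DMA_CHANNELS];

/* Called by a device driver at init time. Returns 0 on success,
 * -1 on a bad channel number or a NULL callback. */
static int dma_register_req_disable(unsigned ch, dma_req_disable_fn fn)
{
    if (ch >= DMA_CHANNELS || fn == NULL) {
        return -1;
    }
    dma_req_disable[ch] = fn;
    return 0;
}

/* Called by the PDP driver's interrupt handler for a finished channel.
 * Returns 0 if the request was switched off, -1 if nobody registered. */
static int dma_clear_interrupt(unsigned ch)
{
    if (ch >= DMA_CHANNELS || dma_req_disable[ch] == NULL) {
        return -1;
    }
    dma_req_disable[ch]();
    return 0;
}

/* Example device side: a fake UART control register whose bit 0 stands
 * for "request generation enabled". */
static uint32_t fake_uart_dmacr = 1u;

static void fake_uart_disable_req(void)
{
    fake_uart_dmacr &= ~(uint32_t)1u;
}
```

The knowledge about device registers still has to exist somewhere, but this way it stays in the device driver rather than in the PDP driver.
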
This has come out longer than planned, so let us leave readers to ponder the difficult situation in which the hero of our story finds himself, which is especially important on a Friday evening, while I go and write the next part. To be continued...

Source: https://habr.com/ru/post/228531/
