Shaking off the dust: intercepting the data stream between an i386 and an ATA controller with an STM32

Good afternoon, dear Habr readers. In today's article we indulge in some nostalgia, work with good old hardware, and try to bolt some equally good new things onto it. Along the way, we will recall how the ISA bus works and how an x86 processor talks to its peripherals in general.

Introduction

A couple of days ago my very first computer caught my eye: an old 386 with an AMD processor (Am386-DX), 4 megabytes of RAM, a VGA video card, and a multi-I/O card that combines the floppy controller, the hard disk controller, and the parallel and serial ports.
Of course, it had long since lost its case and its old, dead hard drive; all that remained was a motherboard with a pair of expansion cards. A few years ago I connected a newer 10 GB hard drive to it (it originally had only a 200 MB drive) and installed FreeDOS on it.

However, this time it refused to boot past the BIOS: judging by the sounds, the 10 GB drive, after several years in the closet, had followed the 200 MB one to the grave.
And then I felt a keen desire to do something with this computer, to get my hands on the antique with which my acquaintance with IT began, this time as a developer rather than a user. Ideally, of course, I would like to build a hard disk emulator backed by an SD card, but we will approach that goal gradually. Let's start with a simpler task: a device that sits in parallel with a real hard disk controller and logs the data exchange, so that we can find out exactly how the old BIOS detects hard disks. Initially I intended to do the same for the floppy controller, but after my last hard drive died, all I had left was the BIOS, and it does not probe for floppy drives. It does, however, have a couple of menu items related to hard disks: hard drive autodetection and formatting tools.
Of course, this would be very easy on an FPGA thanks to its architecture, but we will stick to the budget option and try to do it with an STM32F103 microcontroller and a few discrete logic chips. So, let's begin.

Hardware

Traditionally, we start from the very bottom, the hardware. Let's recall what the ISA bus, the foundation of old computers, is and how to connect to it. For those who have little idea of the circuitry inside x86 machines, this should shed some light on the architecture of such systems. In fact, everything is very simple: "pure" ISA has no Plug & Play facilities (they appeared only in a later standard) and therefore no way of assigning addresses to devices.
Thus, ISA cards are devices with hardwired addresses (fixed by the circuitry or, at best, with a base address selectable by jumpers). The bus itself carries 20 address lines, 16 data lines, several power lines, several IRQ lines, and a set of control signals.

How does all this work? Suppose we need to turn several LEDs on our device on and off. To do this, we put a register chip on our ISA board, for example a 74HC273 .
This is the most common 8-bit "latch", which remembers the signal that was applied to its inputs. We connect the register outputs to the LEDs and forget about them. From the software point of view, interaction with a device on the ISA bus can be implemented in two ways.

The first is memory mapping: we decode the memory read / write signals and drive the results onto the bus instead of the DRAM controller. This is how a video card works: its video memory is mapped into the computer's memory address space, so for the computer writing to video memory is no different from writing to RAM and is done with the ordinary MOV instruction.
For devices that do not need to transfer large blocks of data there is the second way, the so-called “I/O space”: a separate address space reserved for peripheral devices and limited to 16 address bits. It is accessed with the IN and OUT instructions (reading from and writing to I/O ports).

In fact, behind the words "separate address space" lies a physically simple thing: the ISA bus has four signals, MEMW , MEMR , IOR and IOW . When a memory read / write ( MOV ) or an I/O read / write ( IN , OUT ) instruction executes, the address is placed on the same bus lines, A0-A19 . The data also travels over the same lines, D0-D15 . The only difference is which strobe goes active: MEMR for a memory read, MEMW for a memory write, IOR for an I/O port read, and IOW for an I/O port write.
Thus, to make a simple device with one register and LEDs, we need to detect when our address appears on the bus (remember that nobody assigns addresses to us, so we have to pick one that does not conflict with existing peripherals) and, on the IOW signal, latch the data from lines D0-D7 into our register.
In more complex devices with several registers, the upper address lines go to a decoder that produces an active output when they match the device's “base” address, while the lower lines select which register is being addressed.
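The base-plus-register split described above can be sketched in C as a pure software model. The base address 0x300 , the register count of eight and the choice to decode only A0-A9 are assumptions made for this illustration; none of them comes from a real card.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values, assumed for this sketch only. */
#define CARD_BASE      0x300u  /* hypothetical base address of the card */
#define CARD_REG_COUNT 8u      /* number of registers, a power of two */
#define DECODED_MASK   0x3FFu  /* assumption: only A0-A9 are decoded */

/* Upper address lines: compare against the base, as a hardware decoder would. */
bool card_selected(uint16_t addr)
{
    assert(CARD_BASE % CARD_REG_COUNT == 0u);
    return (addr & DECODED_MASK & ~(CARD_REG_COUNT - 1u)) == CARD_BASE;
}

/* Lower address lines: the register number inside the card. */
unsigned card_register(uint16_t addr)
{
    return addr & (CARD_REG_COUNT - 1u);
}
```

With these values, card_selected(0x305) is true and card_register(0x305) is 5. Because the model ignores everything above A9, address 0x700 selects the same card as 0x300.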
Let us turn to a more specific example, our ATA controller. For a better understanding of how it works, I recommend reading the article on the OSDev wiki.
It is controlled through nine I/O registers, eight of which sit in a row starting at base address 0x1F0 . The ninth, unfortunately, is at address 0x3F6 , which complicates the decoding circuit somewhat.
Of course, we will not run all the address lines to the microcontroller and decode them in software, because there would not be enough time: the bus clock is 8 MHz and an I/O cycle, according to the specification, lasts 4 clocks (500 ns), which at a core clock of 72 MHz leaves us only 36 cycles to think. So we use cheap discrete logic chips instead.
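The time budget above is simple arithmetic, and a small helper makes it easy to redo for other clock combinations. This is only a sketch; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* How many microcontroller clock cycles fit into one bus I/O cycle.
 * bus_hz     - bus clock frequency in Hz
 * bus_clocks - length of the I/O cycle in bus clocks
 * mcu_hz     - microcontroller core clock in Hz
 * The result is rounded down. */
uint32_t mcu_cycles_per_io_cycle(uint32_t bus_hz, uint32_t bus_clocks, uint32_t mcu_hz)
{
    assert(bus_hz != 0u);
    return (uint32_t)(((uint64_t)mcu_hz * bus_clocks) / bus_hz);
}
```

For the numbers in the text, mcu_cycles_per_io_cycle(8000000, 4, 72000000) gives 36. This budget has to cover interrupt entry as well as the actual read of the port.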
If it were not for that ninth register sticking out at 0x3F6 , we would only need a circuit that outputs an active signal when lines A9 and A3 are zero and A4-A8 are one (that is, for addresses 0x1F0-0x1F7 ). Bits above A9 are usually not decoded by ISA cards, which ignore the fact that the same device then also responds at higher addresses.
The three lower bits could then be handled by the microcontroller. Alas, register 0x3F6 would still be left uncovered.
The initial conditions (active A4-A8 and inactive A3 ) always hold, since these bits have the same states for both 0x1F0-0x1F7 and 0x3F6 . To them we add one more condition, which can be stated as follows: when A9 is active, A1 and A2 must be active too (address 0x3F6 ).
That is,

    CS0 = A8 & A7 & A6 & A5 & A4 & ~A3
    CS1 = A1 & A2 & A9
    CS2 = CS0 & (~A9 | CS1)
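To see which addresses these equations actually select, they can be evaluated in software over all 1024 combinations of A0-A9 . The sketch below is a plain C model of the three equations; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* State of address line An for a given port address. */
bool addr_bit(uint16_t addr, unsigned n)
{
    assert(n < 16u);
    return ((addr >> n) & 1u) != 0u;
}

/* Direct transcription of the CS0, CS1 and CS2 equations. */
bool cs2_select(uint16_t addr)
{
    bool a9  = addr_bit(addr, 9);
    bool cs0 = addr_bit(addr, 8) && addr_bit(addr, 7) && addr_bit(addr, 6) &&
               addr_bit(addr, 5) && addr_bit(addr, 4) && !addr_bit(addr, 3);
    bool cs1 = addr_bit(addr, 1) && addr_bit(addr, 2) && a9;
    return cs0 && (!a9 || cs1);
}

/* Number of addresses in 0x000-0x3FF for which CS2 is active. */
unsigned cs2_match_count(void)
{
    unsigned count = 0;
    for (uint16_t addr = 0; addr < 0x400u; addr++) {
        if (cs2_select(addr)) {
            count++;
        }
    }
    return count;
}
```

The model selects 0x1F0-0x1F7 and 0x3F6 as intended. Since A0 takes no part in CS1 , it also selects 0x3F7 , for ten matching addresses in total.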

Using the online logic circuit simulator Logic.Ly , I built this circuit from the chips I had on hand: the 74HC04 (hex inverter), the 74HC30 (eight-input NAND ) and the 74HC10 (triple three-input NAND ).
Since we have no OR gate, we recall De Morgan's laws: the negation of a conjunction is the disjunction of the negations, and the negation of a disjunction is the conjunction of the negations, or, written as logical identities:

    ~(A & B) = ~A | ~B
    ~(A | B) = ~A & ~B

This is exactly what we use:

    (~A9 | CS1) = ~(~(~A9 | CS1)) = ~(A9 & ~CS1) = (A9 NAND ~CS1)
    CS2 = CS0 & (A9 NAND ~CS1)

We have no plain AND gate either, so we feed its terms into a three-input NAND gate and take the result in inverted form, triggering the interrupt on its falling edge.
As you can see, all the logic fits into exactly three chip packages.

To these conditions we add the presence of an active level on IOR or IOW . Do not forget that, according to the standard, their active level is low, so the bus lines already carry the inverted signals, ~IOR and ~IOW , and can be fed straight into a NAND gate:

    CS = CS2 & (IOR | IOW)
    (IOR | IOW) = ~(~(IOR | IOW)) = ~(~IOR & ~IOW) = (~IOR NAND ~IOW)
    CS = CS2 & (~IOR NAND ~IOW)
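Before wiring anything, the NAND-only form can be checked against the plain equations in software. The sketch below models one possible arrangement of inverters and NAND gates; it is an assumption about the wiring, not necessarily the circuit that was actually built. Unused gate inputs are tied high, and the output is active low.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

bool bit(uint16_t v, unsigned n) { assert(n < 16u); return ((v >> n) & 1u) != 0u; }
bool inv(bool a) { return !a; }                                 /* one 74HC04 inverter */
bool nand3(bool a, bool b, bool c) { return !(a && b && c); }   /* one 74HC10 gate */
bool nand8(const bool in[8])                                    /* the 74HC30 gate */
{
    for (int i = 0; i < 8; i++) {
        if (!in[i]) return true;
    }
    return false;
}

/* Gate-level model. n_ior and n_iow are the bus levels (low = active).
 * Returns the level of the active-low select output (false = selected). */
bool ncs_gates(uint16_t addr, bool n_ior, bool n_iow)
{
    const bool hi = true;  /* unused inputs tied to the supply rail */
    const bool cs0_in[8] = { bit(addr, 8), bit(addr, 7), bit(addr, 6), bit(addr, 5),
                             bit(addr, 4), inv(bit(addr, 3)), hi, hi };
    bool cs0    = inv(nand8(cs0_in));
    bool n_cs1  = nand3(bit(addr, 1), bit(addr, 2), bit(addr, 9));
    bool a9_ok  = nand3(bit(addr, 9), n_cs1, hi);   /* ~A9 | CS1 */
    bool strobe = nand3(n_ior, n_iow, hi);          /* IOR | IOW */
    return nand3(cs0, a9_ok, strobe);               /* ~CS */
}

/* The same function written directly from the equations (active-high logic). */
bool cs_equations(uint16_t addr, bool ior, bool iow)
{
    bool cs0 = bit(addr, 8) && bit(addr, 7) && bit(addr, 6) &&
               bit(addr, 5) && bit(addr, 4) && !bit(addr, 3);
    bool cs1 = bit(addr, 1) && bit(addr, 2) && bit(addr, 9);
    bool cs2 = cs0 && (!bit(addr, 9) || cs1);
    return cs2 && (ior || iow);
}

/* Exhaustive comparison over A0-A9 and both strobes; returns the mismatch count. */
unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (uint16_t addr = 0; addr < 0x400u; addr++) {
        for (unsigned s = 0; s < 4u; s++) {
            bool n_ior = (s & 1u) != 0u;
            bool n_iow = (s & 2u) != 0u;
            if (ncs_gates(addr, n_ior, n_iow) != !cs_equations(addr, !n_ior, !n_iow)) {
                bad++;
            }
        }
    }
    return bad;
}
```

If count_mismatches() returns 0, the gate-level form and the equations agree for every address and strobe combination.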

The final circuit looks like this:

Now we start building it in hardware on a breadboard. First we place the first three chips, the ones that receive most of the input signals, and take care of connecting them to the power and ground rails.

Carefully add the output wiring, then add the inputs in the form of sufficiently long probes, which will later be plugged into the computer's motherboard:

For convenience, I temporarily grouped the upper address inputs ( A3-A9 ) on the left and the lower ones ( A0-A2 ) on the right, and brought the CS2 signal out in the middle.

Let's take a break from assembly and look at what we have with an oscilloscope. We connect the address inputs to the bus; since ISA is a bus, there is no need to squeeze the probes into the same slot where the ATA controller card sits, and any convenient slot will do. Unfortunately, the probes turned out to be too thin for the slot contacts, so I pushed a strip of straight pin headers in on top of them: on their own both the probes and the pins fall out, but together they hold quite well.
Also, do not forget to connect the ISA ground and power to the circuit, and the ground of the oscilloscope probes as well.
Turn on the oscilloscope and the computer (I went straight into the BIOS setup menu) and probe the CLK signal. We should see something like this:

This, of course, is the bus clock, whose frequency is usually 8 MHz. On my motherboard it is 7.19 MHz, which is also what the BIOS settings show. Apparently this is a quirk of the hardware: the BIOS did not let me lower this frequency or set it to exactly 8 MHz, stubbornly staying at 7.19 MHz. Oh well.
Next we check the contacts at the inputs of our circuit: probing any of them should show a chaotic signal on the oscilloscope, since the system constantly accesses different addresses and ports. If an input is silent, the contact has come loose and needs to be rechecked.
Now we connect to our CS2 signal and see the following picture:

Quite expectedly, since the IOR and IOW signals take no part in forming CS2 , it becomes active whenever the address on the bus matches the ones we decode ( 0x1F0-0x1F7 and 0x3F6 ). The system performs regular DRAM refresh, so we get a nice periodic signal. Now is a good time to adjust the oscilloscope's timebase and levels to see the signals in all their glory.
After making sure that everything works, we power the circuit down and finish assembling it, ending up with a hellish tangle of wires like this:

Turn on the computer again, go to the BIOS setup menu, turn on the oscilloscope.
No signals! Well, it's time to check whether our calculations are right: select the “Autodetect hard drive” item. The first disk is detected quickly, and we will most likely not notice anything on the oscilloscope unless we switch it to single-shot mode.
But detection of the second disk (because it is absent) takes long enough for us to see this on the computer screen:

And this on the oscilloscope screen:

To make sure we are right, we leave disk detection, switch the oscilloscope to single-shot mode and watch its screen carefully: nothing! No matter how long we wait, CS does not become active. But as soon as we enter disk detection again, we catch the familiar picture, which fully conforms to the standard: the I/O cycle lasts four bus clocks.
Well, it's time to take an STM32 board and connect it to the system!
I connected it as follows:

The ISA data bus lines ( D0 - D7 ) go to GPIOD.0 - GPIOD.7 ,
The three lower address lines ( A0 - A2 ) go to GPIOD.8 - GPIOD.10 ,
Address line A9 goes to GPIOD.11 (after all, we need this bit to tell that an access goes to 0x3F6 rather than 0x1F6 !),
The IOW and IOR lines go to GPIOD.12 and GPIOD.13 ,
The CS signal goes to GPIOB.0 .

Now, on an interrupt from GPIOB.0 , we just need to read GPIOD->IDR (Input Data Register): its lower 8 bits are the data, the next four are the address (with this wiring the possible combinations are 0000 - 0111 and 1110 , that is, A9 set and the lower bits equal to 110 , corresponding to ports 0x1F0 - 0x1F7 and 0x3F6 ), and the next two are the access mode (read for 01 , write for 10 ).
Note one important thing: if a sample arrives with the mode bits in an invalid state, 00 or 11 , this signals a capture error. This fact will come in handy very soon.
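The bit layout described above can be sketched as a small C decoder that runs on the PC side. The type and function names are made up for illustration; the layout follows the wiring list ( D0-D7 on bits 0-7, A0-A2 on bits 8-10, A9 on bit 11, IOW on bit 12, IOR on bit 13, both strobes active low).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { BUS_INVALID = 0, BUS_READ, BUS_WRITE } bus_mode_t;

typedef struct {
    uint16_t   port;  /* reconstructed I/O port address */
    uint8_t    data;  /* D0-D7 */
    bus_mode_t mode;  /* read, write, or invalid strobe combination */
} bus_sample_t;

/* Decode one raw GPIOD->IDR value. */
bus_sample_t decode_sample(uint16_t idr)
{
    bus_sample_t s;
    unsigned low3    = (idr >> 8) & 0x7u;    /* A0-A2 */
    unsigned a9      = (idr >> 11) & 0x1u;   /* A9 */
    unsigned strobes = (idr >> 12) & 0x3u;   /* bit 1 = IOR line, bit 0 = IOW line */

    s.data = (uint8_t)(idr & 0xFFu);
    s.port = (uint16_t)((a9 ? 0x3F0u : 0x1F0u) | low3);
    if (strobes == 0x1u) {
        s.mode = BUS_READ;      /* IOR low, IOW high */
    } else if (strobes == 0x2u) {
        s.mode = BUS_WRITE;     /* IOW low, IOR high */
    } else {
        s.mode = BUS_INVALID;   /* 00 or 11: the sample was taken too late */
    }
    return s;
}

/* Decode a whole capture buffer; out must have room for count entries. */
void decode_log(const uint16_t *raw, size_t count, bus_sample_t *out)
{
    assert(raw != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        out[i] = decode_sample(raw[i]);
    }
}
```

For example, the raw value 58453 ( 0xE455 ) decodes to a write of 0x55 to port 0x1F4 . The two topmost bits are not connected and are ignored.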
So, on to the software.

Software

The software is extremely simple: we configure GPIOD and GPIOB.0 as inputs and then set up a falling-edge interrupt on the EXTI line connected to GPIOB.0 .
In the interrupt handler we only read the value from GPIOD and advance the buffer index. The buffer can then be sent to a computer for analysis over any interface, or we can skip that and just inspect it in the debugger.
The setup code is shown below:

    GPIO_InitTypeDef GPIO_InitStructure;
    EXTI_InitTypeDef EXTI_InitStructure;

    /* Enable clocks for GPIOD, GPIOB and AFIO (AFIO routes pins to EXTI lines) */
    RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOD | RCC_APB2Periph_GPIOB | RCC_APB2Periph_AFIO, ENABLE);

    /* PD0-PD13: data, address and strobe inputs, pulled down */
    GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
    GPIO_InitStructure.GPIO_Mode = GPIO_Mode_IPD;
    GPIO_InitStructure.GPIO_Pin = GPIO_Pin_0 | GPIO_Pin_1 | GPIO_Pin_2 | GPIO_Pin_3 |
                                  GPIO_Pin_4 | GPIO_Pin_5 | GPIO_Pin_6 | GPIO_Pin_7 |
                                  GPIO_Pin_8 | GPIO_Pin_9 | GPIO_Pin_10 | GPIO_Pin_11 |
                                  GPIO_Pin_12 | GPIO_Pin_13;
    GPIO_Init(GPIOD, &GPIO_InitStructure);

    /* PB0: CS input, pulled up */
    GPIO_InitStructure.GPIO_Mode = GPIO_Mode_IPU;
    GPIO_InitStructure.GPIO_Pin = GPIO_Pin_0;
    GPIO_Init(GPIOB, &GPIO_InitStructure);

    /* PB2: push-pull output (used later as a timing marker) */
    GPIO_InitStructure.GPIO_Mode = GPIO_Mode_Out_PP;
    GPIO_InitStructure.GPIO_Pin = GPIO_Pin_2;
    GPIO_Init(GPIOB, &GPIO_InitStructure);

    /* Route PB0 to EXTI line 0 and trigger on the falling edge */
    GPIO_EXTILineConfig(GPIO_PortSourceGPIOB, GPIO_PinSource0);
    EXTI_InitStructure.EXTI_Line = EXTI_Line0;
    EXTI_InitStructure.EXTI_Mode = EXTI_Mode_Interrupt;
    EXTI_InitStructure.EXTI_Trigger = EXTI_Trigger_Falling;
    EXTI_InitStructure.EXTI_LineCmd = ENABLE;
    EXTI_Init(&EXTI_InitStructure);

    /* Enable the EXTI0 interrupt with the highest priority */
    NVIC_InitTypeDef NVIC_InitStructure;
    NVIC_SetVectorTable(NVIC_VectTab_FLASH, 0x0);
    NVIC_PriorityGroupConfig(NVIC_PriorityGroup_0);
    NVIC_InitStructure.NVIC_IRQChannel = EXTI0_IRQn;
    NVIC_InitStructure.NVIC_IRQChannelPreemptionPriority = 0;
    NVIC_InitStructure.NVIC_IRQChannelSubPriority = 0;
    NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
    NVIC_Init(&NVIC_InitStructure);

And here is the interrupt handler code:

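    #define LOG_SIZE 1024

    volatile uint16_t Log[LOG_SIZE];
    volatile uint16_t ptr = 0;

    void EXTI0_IRQHandler(void)
    {
        uint16_t sample = GPIOD->IDR;   /* read the bus first: data, address and strobes at once */
        if (ptr < LOG_SIZE) {           /* stop logging when the buffer is full instead of overrunning it */
            Log[ptr] = sample;
            ptr++;
        }
        EXTI_ClearITPendingBit(EXTI_Line0);
    }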

Tests, debugging and finishing

It's time to see what we've got! We start the computer and enter BIOS setup. We start a debug session on the STM32. We enter disk detection and, once drive C has been detected, pause the microcontroller program. In the debugger window we see that some data has been captured, and quite a lot of it!
Then I did the following: I copied the buffer contents from the debugger window into MS Excel to split them into columns and get rid of the first one, which contains the variable name, then copied the column of values into a new text file and got something like this:

    58453
    54527
    42069
    38143
    42069
    38143
    ...

Now it's time to write a program to process the results, in any convenient language; I used C# for this. We need to split each input value into data, address and access mode and produce a readable report. This is done very simply with ordinary bit shifts and bitwise operations, for example like this:

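    var busData = uint.Parse(entry);             // one captured GPIOD->IDR value
    uint data = (busData & 0xFF);                // D0-D7
    uint address = ((busData & 0xFF00) >> 8);    // upper byte: address and strobes
    uint rw = (address & 0x30) >> 4;             // access mode bits
    address = (address & 0x0F);                  // A9 and A0-A2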

However, after launching the program I ran into a big problem: many records in the file had access mode 11 , which means that neither the read nor the write strobe was active. Since the interrupt could only be triggered while one of these signals was active, I concluded that interrupt entry takes longer than the bus cycle lasts, and I simply do not manage to read valid data in time.
To test this hypothesis, I decided to set the GPIOB.2 pin to 1 on entry to the interrupt handler and reset it to 0 on exit, and hung an oscilloscope probe on it.
The result was depressing:

As you can see, the system enters the handler at the very end of the I/O cycle, despite the promised interrupt latency of 12 cycles. Even the naked attribute did not help; the difference was negligible.
This upset me, but I decided to try overclocking the microcontroller: AVRs, for instance, overclock very well, so why not see how the STM32 fares. To do this, open the file system_stm32f10x.c , go to the system clock initialization routine SetSysClockTo72 , and find the line

 RCC->CFGR |= (uint32_t)(RCC_CFGR_PLLSRC_HSE | RCC_CFGR_PLLMULL9);

In this line, replace the constant RCC_CFGR_PLLMULL9 with something bigger. I will say right away: I tried all the options and eventually settled on the maximum value, RCC_CFGR_PLLMULL16 . With it the microcontroller happily started at 128 MHz instead of 72, without even heating up.
By the way, it would be very good to derive the microcontroller clock from the CLK signal of the ISA bus in order to work with the bus synchronously, but I really did not want to desolder the crystal on the development board, so I did not do that.
Let's see what the oscilloscope shows now:

Finally, we enter the interrupt early enough to process it in time! Let's capture the data again and try to analyze it.
I extended the program a little so that the report is formatted and includes the names of the registers being accessed. For an incorrect access mode, a line about invalid data is added to the report.
Here is the program's output:

    WRITE: Cylinder Low [0x1F4] VALUE: 0x55
    READ: Cylinder Low [0x1F4] VALUE: 0x55
    WRITE: Cylinder Low [0x1F4] VALUE: 0xAA
    READ: Cylinder Low [0x1F4] VALUE: 0xAA
    WRITE: Cylinder Low [0x1F4] VALUE: 0x0F
    READ: Cylinder Low [0x1F4] VALUE: 0x0F
    WRITE: Cylinder Low [0x1F4] VALUE: 0x00
    READ: Cylinder Low [0x1F4] VALUE: 0x00
    READ: Status [0x1F7] VALUE: 0x50
    WRITE: Drive/Head [0x1F6] VALUE: 0xA0
    READ: Status [0x1F7] VALUE: 0x50
    WRITE: Drive/Head [0x1F6] VALUE: 0x04
    WRITE: Drive/Head [0x1F6] VALUE: 0x00
    READ: Status [0x1F7] VALUE: 0x50
    WRITE: Drive/Head [0x1F6] VALUE: 0xA0
    READ: Status [0x1F7] VALUE: 0x50
    WRITE: Drive/Head [0x1F6] VALUE: 0xA0
    READ: Status [0x1F7] VALUE: 0x50
    WRITE: Command [0x1F7] VALUE: 0x10
    READ: Status [0x1F7] VALUE: 0x50
    READ: Status [0x1F7] VALUE: 0x50
    WRITE: Drive/Head [0x1F6] VALUE: 0xA0
    READ: Status [0x1F7] VALUE: 0x50
    WRITE: Drive/Head [0x1F6] VALUE: 0xA0
    READ: Status [0x1F7] VALUE: 0x50
    WRITE: Command [0x1F7] VALUE: 0xEC
    READ: Status [0x1F7] VALUE: 0x58
    READ: Data [0x1F0] VALUE: 0x5A
    READ: Data [0x1F0] VALUE: 0xFF
    READ: Data [0x1F0] VALUE: 0x00
    READ: Data [0x1F0] VALUE: 0x10
    READ: Data [0x1F0] VALUE: 0x00
    READ: Data [0x1F0] VALUE: 0x00
    READ: Data [0x1F0] VALUE: 0x3F
    READ: Data [0x1F0] VALUE: 0x00
    READ: Data [0x1F0] VALUE: 0x00
    READ: Data [0x1F0] VALUE: 0x00
    READ: Data [0x1F0] VALUE: 0x45
    ...
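A report in this format could be produced along the following lines. This is a C sketch rather than the C# program actually used; the function names are made up, and the register names follow the usual ATA task-file layout, where several ports mean different things on read and on write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Name of an ATA register for a given port and access direction. */
const char *ata_register_name(uint16_t port, bool is_write)
{
    switch (port) {
    case 0x1F0: return "Data";
    case 0x1F1: return is_write ? "Features" : "Error";
    case 0x1F2: return "Sector Count";
    case 0x1F3: return "Sector Number";
    case 0x1F4: return "Cylinder Low";
    case 0x1F5: return "Cylinder High";
    case 0x1F6: return "Drive/Head";
    case 0x1F7: return is_write ? "Command" : "Status";
    case 0x3F6: return is_write ? "Device Control" : "Alternate Status";
    default:    return "Unknown";
    }
}

/* Format one report line into out; returns what snprintf returns. */
int format_entry(char *out, size_t out_size, uint16_t port, bool is_write, uint8_t value)
{
    assert(out != NULL && out_size > 0u);
    return snprintf(out, out_size, "%s: %s [0x%03X] VALUE: 0x%02X",
                    is_write ? "WRITE" : "READ",
                    ata_register_name(port, is_write),
                    (unsigned)port, (unsigned)value);
}
```

Calling format_entry(buf, sizeof buf, 0x1F4, true, 0x55) fills buf with "WRITE: Cylinder Low [0x1F4] VALUE: 0x55".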

As we can see, there is no more invalid data.
Let's try to understand how the BIOS performs detection.
First it persistently writes to and reads back the registers that hold the address, making sure that the value read matches the value written. If there is no ATA controller in the system, the BIOS keeps trying to write and read back this same register, 0x1F4 , for a long time. Here is an example of the report with the controller card pulled out:

    WRITE: Cylinder Low [0x1F4] VALUE: 0x55
    READ: Cylinder Low [0x1F4] VALUE: 0xFF
    WRITE: Cylinder Low [0x1F4] VALUE: 0x55
    READ: Cylinder Low [0x1F4] VALUE: 0xFF
    ...
    WRITE: Cylinder Low [0x1F4] VALUE: 0x55
    READ: Cylinder Low [0x1F4] VALUE: 0xFF
    WRITE: Cylinder Low [0x1F4] VALUE: 0x55
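The write-and-read-back probe seen in these logs can be sketched as follows. This is a guess at the logic based only on the captured traffic, not the actual BIOS code: the retry loop is left out, and the port access callbacks and the two stand-in "devices" are hypothetical helpers for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void    (*port_write_fn)(uint16_t port, uint8_t value, void *ctx);
typedef uint8_t (*port_read_fn)(uint16_t port, void *ctx);

/* Write each test pattern to Cylinder Low (0x1F4) and check that it reads back. */
bool ata_register_responds(port_write_fn wr, port_read_fn rd, void *ctx)
{
    static const uint8_t patterns[] = { 0x55, 0xAA, 0x0F, 0x00 };
    assert(wr != NULL && rd != NULL);
    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        wr(0x1F4, patterns[i], ctx);
        if (rd(0x1F4, ctx) != patterns[i]) {
            return false;
        }
    }
    return true;
}

/* Stand-in for a present controller: a register that stores what is written. */
void latch_write(uint16_t port, uint8_t value, void *ctx) { (void)port; *(uint8_t *)ctx = value; }
uint8_t latch_read(uint16_t port, void *ctx) { (void)port; return *(uint8_t *)ctx; }

/* Stand-in for an empty slot: writes go nowhere, reads return 0xFF. */
void floating_write(uint16_t port, uint8_t value, void *ctx) { (void)port; (void)value; (void)ctx; }
uint8_t floating_read(uint16_t port, void *ctx) { (void)port; (void)ctx; return 0xFF; }
```

With the latch stand-in the probe succeeds; with the floating stand-in it fails on the very first pattern, just as in the log with the card pulled out.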

Then it issues command 0x10 , whose meaning was kindly suggested by mark_ablov : it is the obsolete RECALIBRATE command, which makes the drive move its heads to cylinder 0. Then the BIOS checks the status byte, waiting for the command to complete (you can see that the drive does not even enter the BUSY state and immediately reports that it is done).
And finally comes command 0xEC , IDENTIFY DRIVE , in response to which the drive returns 256 16-bit words of information about itself.
Before reading them from port 0x1F0 , the BIOS requests the status byte from register 0x1F7 , waiting for the drive to be ready.
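The status values in the log are easier to read once split into flags. The sketch below uses the standard ATA status register bit assignments; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ATA_SR_BSY  0x80u  /* drive is busy; the other bits are not valid */
#define ATA_SR_DRDY 0x40u  /* drive ready */
#define ATA_SR_DSC  0x10u  /* seek complete */
#define ATA_SR_DRQ  0x08u  /* data request: the data port may be read or written */
#define ATA_SR_ERR  0x01u  /* the previous command ended with an error */

bool ata_busy(uint8_t status)
{
    return (status & ATA_SR_BSY) != 0u;
}

/* Data may be transferred when the drive is not busy and DRQ is set. */
bool ata_data_ready(uint8_t status)
{
    return (status & (ATA_SR_BSY | ATA_SR_DRQ)) == ATA_SR_DRQ;
}

/* Index of the first status byte with BSY clear, or count if there is none. */
size_t first_not_busy(const uint8_t *statuses, size_t count)
{
    assert(statuses != NULL || count == 0u);
    for (size_t i = 0; i < count; i++) {
        if (!ata_busy(statuses[i])) {
            return i;
        }
    }
    return count;
}
```

In these terms 0x50 is DRDY plus DSC (ready, idle), and 0x58 after command 0xEC adds DRQ , which means the identification block can be read from port 0x1F0 .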
Here, unfortunately, I realized my mistake: I had assumed that the data is transferred 8 bits at a time, since the control registers are 8-bit. As it turned out, however, the data is transferred 16 bits at a time, so I captured only the 256 low bytes. To get the complete information, the circuit would have to be reworked a bit, giving the whole of GPIOD to the data lines and moving the address and access mode to other pins, which of course will increase the time needed to process them.
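Even the low bytes alone carry some readable information. In the identification block, word 1 is the number of cylinders, word 3 the number of heads and word 6 the number of sectors per track, and the last two normally fit into a single byte. The sketch below shows how a full word would be assembled from both bus halves and what can be pulled out of a low-byte-only capture. The names are made up, and the assumption that the missing high bytes of words 3 and 6 are zero cannot be confirmed from this capture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_WORD_CYLINDERS         1u
#define ID_WORD_HEADS             3u
#define ID_WORD_SECTORS_PER_TRACK 6u

/* Combine both halves of the 16-bit data bus into one identification word. */
uint16_t make_word(uint8_t d0_d7, uint8_t d8_d15)
{
    return (uint16_t)(d0_d7 | ((uint16_t)d8_d15 << 8));
}

typedef struct {
    uint8_t heads_low;    /* low byte of word 3 */
    uint8_t sectors_low;  /* low byte of word 6 */
} partial_geometry_t;

/* Extract what is recoverable when only D0-D7 of each word were captured.
 * low_bytes[i] is the low byte of identification word i. */
partial_geometry_t geometry_from_low_bytes(const uint8_t *low_bytes, size_t count)
{
    partial_geometry_t g;
    assert(low_bytes != NULL && count > ID_WORD_SECTORS_PER_TRACK);
    g.heads_low   = low_bytes[ID_WORD_HEADS];
    g.sectors_low = low_bytes[ID_WORD_SECTORS_PER_TRACK];
    return g;
}
```

For the data bytes in the log above ( 0x5A, 0xFF, 0x00, 0x10, 0x00, 0x00, 0x3F, ... ) this gives 16 heads and 63 sectors per track, while the cylinder count stays unknown because its high byte was lost.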
So for now I have paused, although perhaps in the near future I will continue and try to sit on the bus not as a monitor but as a device. The ISA bus has a wonderful IOCHRDY signal: by driving it to the inactive level, a device signals that the I/O cycle needs to be extended, which means I may have enough time to switch the pins to output and put my own status on the bus.

That's all for now, thank you for your attention.

Update:
I just discovered an interesting thing. If you set the optimization level to O1 instead of O3 in the project settings, the interrupt entry time becomes exactly what it should be. After some research I found that with O2 or O3 the compiler reorders instructions, so the signal pin is set to 1 not immediately on entering the handler but after part of the code has already executed.
Thus, with O1 the oscilloscope shows that the system enters the interrupt handler earlier, but the handler takes longer to execute than with O3.

Source: https://habr.com/ru/post/161617/
