
8088 MPH: we will break all your emulators

image

One of the items on my wish list, ever since I read my first party report in 1991, was to attend a European demoparty and compete in a compo. I competed at NAID '96 and even placed there, but my dream was always to compete against the best of the best. I am pleased to announce that after six months of hard work with good friends and incredibly talented people, we succeeded: our demo 8088 MPH won the oldskool demo compo at Revision 2015. (A personal victory for me was that our demo was shown last in the compo, which was a sign of respect from the organizers.) As of April 7, 2015, no IBM PC emulator in the world could run our demo correctly; they hung or crashed before the demo finished, and the colors were wrong. The same goes for any hardware other than the target hardware (see below). To see what 8088 MPH looks like, I recommend watching the video captures of the demo running on real hardware:


The demo contains so many technical world firsts, and exploits the hardware in ways nobody had thought of before us, that it is only fair to explain how we did it. One of my roles was “organizer” of the demo, so I will go through it scene by scene, briefly explaining the basics of each trick. I will say a bit more about my own parts; for deep dives into the technology, I will update this post with links to the posts by reenigne, VileR and Scali. We hope this write-up sparks interest in “old school” programming for the platform. After reading this overview, I recommend following the links to the articles that discuss individual parts of the demo in more detail.


More general information:


Target Equipment Specifications


Before getting to the individual parts, let's look at the target system for this demo: a 1981 IBM 5150 (the very first “IBM PC”) with 640 KB of RAM, a floppy disk drive, an IBM CGA card and the built-in speaker. System contents:

Requiring 640 KB of RAM may seem excessive, but the first IBM PC could be expanded to that amount, and by 1985 it was a fairly ordinary configuration. If that still bothers you, note that the only effect that needs this much RAM is the Kefrens bars: the extra memory lets the repeating pattern repeat less often, which is easier on the eye. We could have reduced it, but then the audience would have noticed the pattern repeating sooner. With the Kefrens bars effect the demo uses 507 KB of RAM; without it, only 349 KB. Most effects use far less memory, and some are tiny: the plasma takes only 6 KB (including the banner graphics), and the picture of the girl needs only 18 KB (2 KB more than the image data itself). We deliberately traded size for speed. It was a conscious decision, made to fit as many effects as possible into the 8 minutes the demo runs (the limit imposed by the compo rules). With a few more minutes of running time we could probably have fit the whole demo into 256 KB or even less, but the pauses between effects would have been longer.

It is also worth noting that there are two versions of the IBM CGA card, which differ mainly in how they generate composite colors. We had equal numbers of “old”-style and “new”-style IBM CGA cards, so we decided to create the graphics for the “old” one. If you have a “new”-style CGA card, the demo will still run, but with slightly distorted colors.

Technical Overview


Development tools used



All data files were embedded directly in the .exe / .com files. This kept everything in a single binary, which meant the data could be compressed as well (see below). Most development cycles consisted of designing in wetware (in our heads), coding on modern systems (or in DOSBox running on a modern system), testing and debugging in DOSBox, and then transferring the result to real hardware for final testing. Once an effect became sophisticated enough that it no longer ran in the emulator, the cycle slowed down, because testing could only be done on real hardware. We used various methods to move code to the hardware: Scali used a serial cable; I had an Ethernet card with a packet driver and mTCP; at the party, we used an 8-bit ISA IDE adapter (Silicon Valley ADP-50) connected to a CF-to-IDE adapter to turn a CF card into a hard disk, with a USB CF card reader on the modern side. The most intriguing method was reenigne's: a custom controller connected to the keyboard port, which used the IBM BIOS test mode as a “poor man's serial port”. (I hope Andrew will write more about this!)

Bootloader, API and general structure


We all had our favorite languages and environments, so early on we decided to create a common “loader” that would execute .EXE and .COM files, letting each developer write effects in whatever environment suited them. The concept is not new; the famous Second Reality demo used it for the same reasons, and the same technique appeared even earlier in plenty of demos on other platforms. (Before you ask: no, we did not copy the Second Reality code. In fact, we did not even consult its developers, because we needed unusually compact code to minimize memory use, and it had to run on an 8088, whereas the Second Reality code uses 80186 opcodes.) The loader's API services come to approximately 450 bytes of code. The loader is responsible for the following aspects:


Running an effect through the loader follows this procedure:

  1. The loader prints text on the screen and animates it using an interrupt and the 6845 start address register
  2. The loader executes the effect
  3. The effect unpacks itself, performs its precalculations, and then signals the loader that it is ready to begin
  4. The loader clears the moving text from the screen, and then signals the effect that it can begin
  5. The effect starts, and the magic begins
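The two-way handshake in this procedure can be modeled in a few lines. This is only a sketch of the protocol, not the loader itself: the real loader is roughly 450 bytes of 8088 code driven by interrupts, while here a Python generator stands in for the effect, and every name (`Handshake`, `example_effect`, `run_with_loader`) is hypothetical.

```python
class Handshake:
    """Flags shared between loader and effect (hypothetical names)."""

    def __init__(self):
        self.effect_ready = False    # set by the effect after unpack/precalc
        self.start_allowed = False   # set by the loader once the text is gone


def example_effect(flags, log):
    """Stand-in effect, written as a generator so it can wait for the loader."""
    log.append("effect: unpack and precalculate")
    flags.effect_ready = True
    yield  # hand control back to the loader and wait for permission
    if not flags.start_allowed:
        raise RuntimeError("effect resumed before the loader allowed it")
    log.append("effect: running")


def run_with_loader(effect):
    """Walk one effect through the five steps and return the event log."""
    flags, log = Handshake(), []
    log.append("loader: print and animate text")   # step 1
    running = effect(flags, log)                   # step 2: execute the effect
    next(running)                                  # step 3: runs up to the yield
    if not flags.effect_ready:
        raise RuntimeError("effect never signalled readiness")
    log.append("loader: clear moving text")        # step 4
    flags.start_allowed = True
    for _ in running:                              # step 5: effect takes over
        pass
    return log
```

Calling `run_with_loader(example_effect)` returns the four log entries in the order the steps describe, which makes it easy to see that the effect never starts drawing while the loader's text is still on screen.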

It was extremely important to get this part of the design right, because any bug would have crashed the entire demo. The design was finished before the first line of code was written. For the curious, I have posted the design doc online. (I wrote the loader.)

Background music playback had to be as lightweight as possible so that it would not interfere with any effect. In practice, the only option was a simple PC speaker beep that changes (or goes silent) every frame, so the background music consists solely of beeps updated at 60 Hz. The speaker timer values were generated with the MONOTONE composing program. Even though the playback code is only 18 lines of assembler, it takes up two raster lines' worth of time, so anything more complex would have cost even more CPU time and made some full-screen 60 Hz effects impossible.
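To make “speaker timer values” concrete: the PC speaker is driven by a timer channel whose input clock is about 1.193182 MHz, so a tone is selected by writing a 16-bit divisor. The sketch below shows that conversion for a list of per-frame notes. It is an illustration under that assumption only; it does not reproduce MONOTONE's actual output format, and the function names are made up.

```python
PIT_CLOCK_HZ = 1_193_182  # timer input clock on the IBM PC (14.31818 MHz / 12)


def pit_divisor(freq_hz):
    """Return the 16-bit timer divisor that produces roughly freq_hz."""
    divisor = round(PIT_CLOCK_HZ / freq_hz)
    if not 1 <= divisor <= 0xFFFF:
        raise ValueError(f"{freq_hz} Hz does not fit in a 16-bit divisor")
    return divisor


def frame_values(notes_hz):
    """One entry per 60 Hz frame; None means 'speaker off' for that frame."""
    return [None if f is None else pit_divisor(f) for f in notes_hz]
```

With the values precomputed like this, the per-frame work on the 8088 reduces to fetching one word and writing it to the timer, which fits the tiny playback routine described above.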

Executable compression


Another aspect we considered at the earliest stages of development was whether the executable could be compressed. We needed to find out the following:


I took most of the classic and modern executable packers and tested them on old programs similar to what we were going to build. The results pleasantly surprised me. The compression ratios were high enough that we could afford to embed precalculated data rather than computing it on the fly. At the same time, decompression was fast enough that loading a compressed program from floppy was actually slightly faster overall than loading it uncompressed. In the end we picked a winner: pklite. For comparison, I have posted the test results online. (If I missed a packer with significant advantages over the ones I tested, let me know. About 100 packers were written for DOS, but unless one compresses better than apack or upx, or unpacks faster than pklite or lzexe, while remaining compatible with the 8088, I do not want to hear about it.)

Scene breakdown


Below I explain each effect, scene by scene. As mentioned above, I describe in detail only the scenes I worked on myself; the other team members may write technical analyses of their own parts if they wish. The description of each effect follows its screenshot.

image

This intro had two jobs: to introduce the audience to the system and explain what we were up against in making a world-class demo on such hardware, and at the same time to temper their expectations. The text mode is, of course, simulated; I duplicated the basic BIOS text-mode functions, but in graphics mode. The cursor and text blink the same way the 6845 does it, reinforcing the illusion. It is (almost) impossible to change the display start address in graphics mode so that each raster line is fetched from a different place, so the title screen unrolls by brute force: new raster lines are copied into memory during vertical retrace, where the copying is hidden. The title screen fades out with a “dimming” effect from the top edge, done by ANDing successive lines of screen data with a mask.
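The AND-mask dimming can be sketched as follows. This is a simplified model, not the demo's routine: it treats the screen as a linear array of rows (real CGA graphics memory interleaves even and odd scanlines), and the mask values and the one-row-per-step pacing are hypothetical choices made for illustration.

```python
MASKS = (0xFF, 0x55, 0x00)  # hypothetical stages: untouched, half the bits, black


def dim_step(screen, row_bytes, step):
    """Apply one step of a top-down dimming wipe to `screen` in place.

    Row r enters the wipe at step r + 1, so rows nearer the top are always
    further along. Each affected byte is ANDed with the mask for its stage.
    """
    rows = len(screen) // row_bytes
    for row in range(rows):
        stage = min(max(step - row, 0), len(MASKS) - 1)
        mask = MASKS[stage]
        if mask == 0xFF:
            continue  # this row has not been reached yet
        start = row * row_bytes
        for i in range(start, start + row_bytes):
            screen[i] &= mask
    return screen
```

Calling `dim_step` with step 1, 2, 3, and so on moves the dimmed edge down one row per call; because AND only ever clears bits, repeating a step on an already dimmed row is harmless.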

image

Many people think the title screen is the same picture VileR showed several years ago. It is not! He redid it for 16-color composite output specifically for this demo, and touched it up a bit as well.

image

The bobbing effect was achieved by creating a software vertical retrace interrupt that fires at the same place on the screen every time (immediately after the last displayed line) and handling it with code that changes the 6845 display start address. Flags were used to tell the interrupt when it was time to erase the letters. Erasing is done simply by using REP STOSW to fill the screen with black lines. Because the 6845 displays two scanlines per “row”, the text can only move to even lines, so the motion is not as smooth as it could be. To be fair, it would have been possible to move it to any line, but that would have been expensive in CPU terms. The whole point of the loader is to use as little CPU time as possible, so I had to compromise. The simulated vertical retrace interrupt is also available to the other effects through the loader API services: an effect can disable it, reinitialize it, and hook or unhook its own routines.

image

The moire effect was built on 40 × 25 text mode, half-character ASCII block characters, and a lot of unrolled code. Circles were chosen to show off the classic effect, but the technique can actually combine any two images. This effect was created by reenigne.

image

The rotozoomer is the same tired old routine I showed in 1996 in the 8086 compo, only optimized and sped up by drawing only every other line. A miscommunication between VileR and me meant that the texture used is not the best one for showing off the effect, but it still works well. We planned to add a 60 Hz version of this effect but ran out of time.

image

The core idea of the 1024-color mode is heavy abuse of the 80 × 25 text mode with NTSC composite color enabled. VileR originally came up with this trick for 512 colors, and reenigne managed to raise the count to 1024 with a CRT controller trick. Some people assumed the whole demo was done in this mode. It was not, because 80-column text mode suffers from the famous CGA “snow” defect whenever CGA RAM is written directly. Unfortunately, this is noticeable in the plasma effect (see below). Incidentally, I first saw this picture in 2013, and that was when I realized I had to bring all these people together to make a demo. Look at it, it's awesome! My jaw dropped when I saw it. If I had not seen how the collaboration between VileR and reenigne led to this picture, the 8088 MPH demo might never have happened.

image

These stars are actually the result of unrolled code and tables of precalculated values that together take bytes from one place in video memory and move them to another. Although we had other patterns ready, such as a swirl, we decided a starfield suited a typical oldskool demo better. The effect was created by reenigne.

image

The sprite part looks like black magic, but it is really a combination of Scali's sprite compiler and vertical screen positioning via the 6845 start address register. The CGA has only one screen's worth of video memory, so when the address moves down, the screen scrolls up and the data wraps around at the edges of the screen. The wrapped data does not line up evenly across the boundary, however, so it needs special handling. We tracked the timer to know when the scanline containing the last pixel of the sprite had been drawn, and then redrew the sprite. (In other words, redrawing the sprite was an exercise in “racing the beam”.) The timing had to be very tight to avoid screen and sprite tearing.

image

Also built on compiled sprites, this part shows 30 vector balls at 30 Hz. We had another version with fewer balls running at 60 Hz, but at the last minute Scali had the idea of having the balls spell out something like "8088" or "IBM", so he wrote the code changes at the party. Updates use double buffering: the sprites occupy only a small rectangular area of the screen, so the CRT controller was reprogrammed to create a video mode with a small window in the middle of the physical screen that uses only half of the available video memory. That gave us a true hidden page in which the vector balls could be drawn and erased, and then made visible through the 6845 display start address register.
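The hidden-page idea can be modeled like this. It is a sketch under simplifying assumptions: video memory is treated as two linear 8 KB halves (the real graphics-mode layout interleaves scanlines), the start address is assumed to count 2-byte units, and the class and method names are hypothetical.

```python
VRAM_BYTES = 16 * 1024        # CGA video memory
PAGE_BYTES = VRAM_BYTES // 2  # the shrunken mode needs only half of it


class DoubleBuffer:
    """Two pages in one block of video memory, flipped via the start address."""

    def __init__(self):
        self.vram = bytearray(VRAM_BYTES)
        self.visible_page = 0

    @property
    def hidden_page(self):
        return self.visible_page ^ 1

    def draw(self, offset, data):
        """Write bytes into the hidden page only; the visible page is untouched."""
        if offset < 0 or offset + len(data) > PAGE_BYTES:
            raise ValueError("write falls outside the hidden page")
        base = self.hidden_page * PAGE_BYTES
        self.vram[base + offset:base + offset + len(data)] = data

    def flip(self):
        """Swap pages and return the value to load as the display start address.

        Assumption: the start address is expressed in 2-byte units.
        """
        self.visible_page = self.hidden_page
        return (self.visible_page * PAGE_BYTES) // 2

    def visible(self):
        base = self.visible_page * PAGE_BYTES
        return bytes(self.vram[base:base + PAGE_BYTES])
```

The point of the design is that nothing drawn with `draw` can be seen until `flip` runs, so balls can be erased and redrawn at leisure without flicker, at the cost of halving the memory available to each frame.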

image

This plasma uses a variation of the 1024-color screen mode in which only the attribute byte is updated (which limits the number of colors to 256). The effect writes to video memory only while the CRT beam is in horizontal or vertical retrace. Unfortunately, the timing needed to get this right stopped working at the party for some reason (perhaps because we changed the order of the effects), so you can see a line of noise down the left side of the screen and a little noise at the top. This is my fault, because I wrote the effect using a lazy polling routine. Alas, the CGA “snow” is still there. Without all the retrace handling the effect could run at 60 fps; in the demo, snow and all, it runs at only 20 fps. Perhaps VileR will write in more detail about how this screen mode and color system work; if so, I will update the links at the beginning of the article. If we do a final version of the demo, fixing this bug will be a top priority [translator's note: the video at the beginning of the article shows the final version]. In fact, I am sure reenigne could replace the polling approach with cycle counting, which would not only eliminate the “snow” but also increase the speed.

image

The 1024-color mode replicates the start address every two lines. I used this behavior to create a simple “run-off” (dripping) effect for VileR's stunning picture. Much more sophisticated effects are certainly possible (think of the Copper demo), but I did not have time to take it further.

image

This classic Kefrens bars effect was created by reenigne in 320×200×4 mode. It is a cycle-counted effect, because there is simply no time to poll for horizontal retrace. We went to great lengths to guarantee a constant cycle count, including changing the system's default DRAM refresh interval from 18 to 19 so that the DRAM refresh periods line up with the CRT controller accesses.
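One way to see why 19 is a convenient refresh interval is a little arithmetic on CGA timing. The figures below are not from this article: as I understand it, a CGA scanline lasts 912 dot clocks at 14.31818 MHz and the timer that paces DRAM refresh runs at one twelfth of that clock, so treat the constants as assumptions.

```python
HDOTS_PER_SCANLINE = 912   # assumed CGA horizontal total, in 14.31818 MHz clocks
HDOTS_PER_PIT_TICK = 12    # assumed: the refresh timer runs at 14.31818 MHz / 12
PIT_TICKS_PER_SCANLINE = HDOTS_PER_SCANLINE // HDOTS_PER_PIT_TICK


def refresh_phase_drift(interval):
    """Timer ticks by which the refresh pattern shifts from one scanline to the next.

    Zero means every scanline sees DRAM refresh at exactly the same positions.
    """
    return PIT_TICKS_PER_SCANLINE % interval


def refreshes_per_scanline(interval):
    """Whole refresh periods that fit into one scanline."""
    return PIT_TICKS_PER_SCANLINE // interval
```

Under these assumptions a scanline is 76 timer ticks, which is exactly 4 × 19: with an interval of 19 the refresh cycles land in the same place on every line, whereas the default of 18 drifts by 4 ticks per line and makes cycle-exact code see a different stall pattern each time.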

image

This effect was made by Scali, who was inspired by his own 1991 demo, which also featured a big torus. Here is what happens:



image

At the party, reenigne remarked that it ought to be possible to restart the CRT controller's start address on every raster line. That would yield a video mode only 100 lines tall, giving us a 1024-color mode at a resolution of 80 × 100. The picture above is the result of that code, plus a great deal of work on a CGA NTSC composite signal simulator that reenigne had written a few months earlier, which was used to convert the image. (No, we are not giving it out. And before you ask: the picture of the girl and the CGA 1k picture are not straight conversions. They were painted by hand by VileR in Photoshop, and the 4-color / 16-color / "Until Now" screens were created in a version of Pablodraw that he extended.) We did not have time to put text on this picture. The people shown above appear in the same order as in the credits: Trixter, reenigne, Scali, VileR, Phoenix and virt. Our apologies to coda and puppeh, but as you can see, if the picture were squeezed any further, the faces would be unrecognizable. Sorry!

image

And finally, the finishing blow: a multichannel music engine for the PC speaker. We did not want to simply copy a ZX Spectrum engine or engines like the one in Music Construction Set; instead we decided to set the bar incredibly high and play a Protracker mod file through the speaker. Other mod players for the PC speaker exist, but they need a 10 MHz 80286 and barely manage output at a 6 kHz sampling rate. Our player faithfully reproduces all Protracker effects, mixing and outputting to the speaker in real time at 16.5 kHz, all on a 4.77 MHz CPU. It is reenigne's creation and a true technical achievement, one that took lateral thinking and serious knowledge of the 8088. I am sure he will write a more detailed post about building the player. In the meantime, I can mention the following details:


A fun fact: after the tune is preconverted and turned into a self-playing .exe, the final compressed result is smaller than the original module.
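To put the 16.5 kHz figure in perspective, it helps to work out how many CPU clock cycles are available per output sample. The sketch below does only that division; the clock values are the nominal ones quoted above, and it says nothing about how the player actually spends its budget.

```python
CPU_HZ_8088 = 14_318_180 / 3  # IBM PC CPU clock, about 4.77 MHz


def cycles_per_sample(cpu_hz, sample_rate_hz):
    """CPU clock cycles available to mix and output one sample."""
    return cpu_hz / sample_rate_hz


budget_8088 = cycles_per_sample(CPU_HZ_8088, 16_500)
budget_286 = cycles_per_sample(10_000_000, 6_000)
```

That is roughly 289 cycles per sample on the 4.77 MHz machine, against about 1,666 cycles per sample for a 10 MHz CPU outputting at 6 kHz, and all channel mixing and effect handling has to fit inside that smaller window.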

Sprint at the party


We arrived at the party with the project 90% done. Before leaving, we had put together something we felt was worthy of the competition, plus two “rescue” videos: one to show on the big screen, and one recording the demo running on real hardware as proof for the judges. We were afraid the hardware we were bringing might be damaged in transit, so we wanted to be sure we could show at least something. Fortunately, reenigne's and Scali's IBM 5160s arrived unscathed (especially noteworthy given that reenigne had to haul his machine from Britain to Germany on several trains!). We also brought two CGA cards, two capture devices, and three different ways of transferring new builds from our laptops to the old hardware. You can never prepare for everything!

We spent most of our coding time at the party adding the Kefrens bars and the photos at the end of the demo, fixing bugs, adding nice transitions, shaving seconds off each part to fit the compo limit, and reordering the parts so that the BTTF-inspired intro of virt's tune would coincide with the sprite part. Nearly all our time before the compo went to coding, eating and hygiene, and we only got to socialize afterwards. Although we arrived with an entry that was almost compo-ready, the time spent at the party was invaluable: it let us turn a rough draft into something that could truly compete for first place. We all sat at the same table, so we could communicate instantly. Lesson learned: there is rarely a substitute for working face to face!

One outcome of that joint work at the party was the decision to change the end credits from a text-only scroller with variable speed to a smoother ANSI-style scroller, which in my opinion turned out better than any of the parts we brought from home.

To save time (and to get the video conversion right; sorry, but most people do not know how to handle interlaced video properly), I offered to give Gasman a 720@60p video. NTSC CGA output is slightly off-spec: instead of 262.5 lines per field, it generates 262. As a result it produces 59.92 fields (29.96 frames) per second instead of the NTSC broadcast standard of 59.94 (29.97 fps). This trips up most modern capture devices; for example, Scali had a high-quality Blackmagic Intensity Shuttle, but it could not capture the signal. I knew from experience that some cheap video capture devices, such as the Terratec Grabby or the Dazzle DVC100, are more tolerant because they were designed for VCR sources, so I brought these devices with me and sent one to reenigne for testing. For the capture we used the DVC100, with slight proc amp adjustments so that the capture looked as close as possible to the output of a CRT monitor. To improve the capture further, we used VirtualDub as the capture software, which can dynamically resample the incoming audio to match the capture frame rate if it drifts slightly. This combination of software and hardware worked very well. To capture the sound, we first hooked alligator clips to the speaker, but Scali had brought a Sound Blaster with a real PC speaker header that could be connected with an internal cable, so we used that for the capture.
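The field rates quoted above can be reproduced with a short calculation. The text only mentions the 262 versus 262.5 line counts; the line lengths used below (912 dot clocks for CGA, 910 for broadcast NTSC, both at 14.31818 MHz) are my own assumption about the timing, so take this as a sketch rather than a specification.

```python
DOT_CLOCK_HZ = 315e6 / 22  # 14.31818 MHz, four times the NTSC color subcarrier


def field_rate(dots_per_line, lines_per_field):
    """Fields per second for a given line length and field height."""
    return DOT_CLOCK_HZ / (dots_per_line * lines_per_field)


cga_fields = field_rate(912, 262)      # assumed CGA timing
ntsc_fields = field_rate(910, 262.5)   # assumed broadcast NTSC timing

print(f"CGA:  {cga_fields:.2f} fields/s, {cga_fields / 2:.2f} frames/s")
print(f"NTSC: {ntsc_fields:.2f} fields/s, {ntsc_fields / 2:.2f} frames/s")
```

The difference is only about 0.03%, but a capture device that locks strictly to the broadcast rate has no way to absorb it, which is consistent with the more forgiving VCR-oriented devices coping where stricter hardware did not.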

A look into the future


After watching the demo and reading this write-up, you may wonder whether there is still room for improvement. Believe it or not, there is: there are certainly alternative ways of generating sound and additional loop tricks to try. We could have packed more effects into the demo, but we ran out of time: first development time, and second the running time of the demo, because the compo rules at Revision limit entries to 8 minutes. All told, I have known the people who worked on this demo for a combined 60 years. Working on a demo with them was an honor and a privilege. Will we work together again? I would say it is definitely possible; the day after the compo we tossed around a few ideas, such as making a game instead of a demo. Personally, I was burned out and spent the next few weeks playing games I had long wanted to play and recovering my health. I also have a couple of other large projects I want to launch this summer: one that the PC software preservation community is eagerly awaiting, and the other an online sound card museum. But who knows…



Bonus: what 8088 MPH looks like in DOSBox:

Source: https://habr.com/ru/post/358122/

