There is a lot of good literature on the Quake engine: books, countless articles on the Internet, blogs, and wiki pages. Among them, my favorites are Michael Abrash's “Graphics Programming Black Book”, published in 1997, and David L. Craddock's “Rocket Jump: Quake and the Golden Age of First-Person Shooters” (2018).
Unfortunately, very little has been written about the hardware developed around 1996 to improve 3D rendering and, in particular, the graphics of id Software's revolutionary game. The architecture and design of these pieces of silicon tell the story of the technological duel between the Rendition V1000 and the 3dfx Interactive Voodoo.
After vQuake shipped in early December 1996, Rendition seemed to have the upper hand. The V1000 was a fast card capable of running Quake with hardware acceleration and, according to its developer, it delivered a fill rate of 25 megapixels/s [1]. Just before Christmas, Rendition dominated the market, letting players run the game at higher resolutions and frame rates, and in 16-bit color [2]. But, as history showed, a flaw in the design of the Vérité 1000 proved fatal for the innovative company.
The right timing and killer applications
The idea of specialized hardware for accelerating graphics did not appear overnight. Back in 1954, United Airlines already had flight simulators for pilot training. The largest player in the field, Silicon Graphics, Inc. (SGI), was founded in 1982 and offered workstations that were powerful for their day, such as the Indy, O2 and Indigo². However, these machines were priced far beyond the reach of ordinary consumers (in 1993 an SGI InfiniteReality could sell for $100,000, equivalent to $177,262 in 2019). What changed the situation in the second half of the 90s was a combination of three factors.
First, the price of RAM dropped significantly. Even though there was a severe RAM shortage in 1995 (mainly because 8 MB of memory was recommended for Microsoft Windows 95), the price of RAM fell by almost 90% over the year. This opened the door to cards with what were then enormous frame buffers (640x480 with 16-bit RGB color) and enough room left over to store textures locally.
Second, RAM performance improved. Fast Page RAM was a step forward compared to plain DRAM, but with the arrival of EDO RAM latency dropped by a further 30%, bringing RAM access time down to 50 ns [3].
The third and final piece of the puzzle was the “killer apps”. PCs now had powerful CPUs, such as the 166 MHz Intel Pentium, which developers used to create high-quality 3D games. In 1996, everyone was talking about two games: Core Design's Tomb Raider and id Software's Quake.
Rendition and V1000
Rendition Inc. was founded in 1993. Two years later, in 1995, the company announced the V1000 architecture, which was quickly licensed by four OEMs. The Creative Labs 3D Blaster PCI, Sierra Screamin' 3D, Canopus Total3D and Intergraph Reactor were the first to reach the market, and MiRO soon followed.
Intergraph Reactor. Image from vgamuseum.ru.
Creative Labs 3D Blaster. Image from the "Retro Graphics Cards" club.
Note that the first V1000-E chip was later replaced with the V1000L-P, which consumed less power and was 20% faster [4].
MiroCrystal VRX. Image from vgamuseum.info.
Canopus Total3D. Image from vgamuseum.ru.
The names of the cards varied, but the chips used in them were the same. The only parameter manufacturers could use to balance price against performance was the quality of the RAM installed on the card. A typical card carried the following components:
- VGA port for connecting to a CRT monitor.
- RAMDAC, usually from Brooktree (Bt), but sometimes an AT&T chip.
- The core of the card: a V1000-E, V1000-P or V1000-L chip.
- Eight 512 kibibyte DRAM / EDO chips (4 mebibytes in total) for storing framebuffers and textures.
- 64 kibibyte EEPROM containing BIOS.
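As a rough illustration of why 4 mebibytes was a meaningful amount of memory, the sketch below works out how much of it a 640x480, 16-bit display would occupy and how much would remain for textures. It assumes two color buffers (double buffering) and no separate depth buffer; the function names are made up for this example and are not part of any Rendition SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one framebuffer of the given size and color depth. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);   /* whole bytes per pixel only */
    return width * height * (bits_per_pixel / 8);
}

/* Bytes left for textures after reserving `buffer_count` framebuffers
   (hypothetical helper: ignores microcode, alignment and any z-buffer). */
uint32_t texture_budget_bytes(uint32_t vram_bytes, uint32_t fb_bytes, uint32_t buffer_count)
{
    assert(fb_bytes * buffer_count <= vram_bytes);
    return vram_bytes - fb_bytes * buffer_count;
}
```

Under these assumptions, `framebuffer_bytes(640, 480, 16)` gives 614,400 bytes (600 KiB) per buffer, and two such buffers leave 2,896 KiB of a 4 MiB card for textures.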
The V1000 had two defining characteristics that are worth noting, because the 3dfx Voodoo (which I will discuss later) took a radically different approach.
First, the card was meant to replace whatever graphics card the buyer already had installed. The chip could render both 2D and 3D over VGA and, thanks to context switching, it offered an impressive “3D in a window” mode. That is why the card had a single VGA output port.
The second characteristic was its “big iron” architecture, built around a single MIPS CPU with access to the full 4 mebibytes of memory. The 64-bit data bus between them had no special properties. This standardized design made the card easy to program through microcode loaded at startup (which arguably made it the first GPU for the PC, long before Nvidia coined the term).
V1000 programming
The SDK [5] came with a set of C header files (RRedline for Windows and Speedy3D for DOS). Drawing a textured triangle resembled what Vulkan offers today, with manual management of VRAM. The vertex-based API rendered textured triangles and also supported alpha testing, alpha blending and fog.
    #include <string.h>
    #include <windows.h>
    #include <redline.h>

    int WINAPI WinMain(HINSTANCE instance, HINSTANCE prevInstance, LPSTR cmdLine, int cmdShow) {
      int WIDTH = 640, HEIGHT = 480;
      HWND hWndMain = ... ;

      // Set up the Verite board and the resolution/refresh rate
      v_handle verite;
      VL_OpenVerite(hWndMain, &verite);
      V_SetDisplayType(verite, V_FULLSCREEN_APP);
      V_SetDisplayMode(verite, WIDTH, HEIGHT, 16, 75);

      // Copy the texture into locked memory the board can read
      bmp_info bmp = loadBMP("data\\rlogo.bmp");
      v_memory memObj = V_AllocLockedMem(verite, bmp.linebytes * bmp.height);
      memcpy(V_GetMemoryObjectAddress(memObj), bmp.addr, bmp.linebytes * bmp.height);

      // Create the display surface (2 buffers) and the texture surface (1 buffer)
      v_surface *display, *texture;
      VL_CreateSurface(verite, &display, V_SURFACE_PRIMARY, 2, V_PIXFMT_565, WIDTH, HEIGHT);
      VL_CreateSurface(verite, &texture, 0, 1, V_PIXFMT_565, bmp.width, bmp.height);

      // Build a command buffer: load the texture, then bind destination and texture
      v_cmdbuffer cmdbuffer = V_CreateCmdBuffer(verite, 0, 0);
      VL_LoadBuffer(&cmdbuffer, texture, 0, bmp.linebytes, bmp.width, bmp.height, memObj, 0);
      VL_InstallDstBuffer(&cmdbuffer, display);
      VL_InstallTextureMap(&cmdbuffer, texture);
      VL_SetSrcFunc(&cmdbuffer, V_SRCFUNC_REPLACE);

      // Clear screen to black
      VL_FillBuffer(&cmdbuffer, display, 1, 0, 0, display->width, display->height, 0);

      // Populate the command buffer with triangle and texture coordinates
      v_kaxyzuvq vertex[3] = ... ;
      VL_Triangle(&cmdbuffer, V_FIFO_KAXYZUVQ, &vertex[0], &vertex[1], &vertex[2]);

      V_IssueCmdBuffer(verite, cmdbuffer);
      VL_SwapDisplaySurface(&cmdbuffer, display);
      return 0;
    }
RRedline supplied 128 kibibytes of microcode to the Vérité and translated C calls into calls to V1000 assembly functions.
An interesting fact: the API name “RRedline” was a play on the phrase “Rendition Ready” and was most likely chosen collectively. The name Speedy3D, however, was Walt Donovan's idea.
In fact, the V1000 was just a slow CPU (25 MHz) with a single-cycle 32x32 multiply (which took up a good chunk of the die!), a single-cycle instruction for computing an approximate reciprocal (which gave a two-cycle approximate integer division), and the usual set of RISC instructions. Oh, and a “bilinear load” instruction that read a 2x2 block from linear memory and performed bilinear filtering based on the fractional u and v values passed to the instruction. The card had a tiny cache, I think only 4 pixels. So if a perfectly matching 2x2 block came up, the load on memory bandwidth went down.
There was no hardware support for z-buffers. So the software running on the V1000 had to read Z, perform the comparison, and then decide whether to write or not.
- Walt Donovan (architect of algorithms)
To send textures and microcode to the card, the driver used DMA to transfer data over PCI without involving the CPU. In practice, bus mastering was implemented incorrectly on many motherboards, so games had to fall back to PCI FIFO mode, which hurt performance [6]. Inside the card, all operations were performed on 32-bit fixed-point integers.
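To make the fixed-point arithmetic and the “bilinear load” idea more concrete, here is a minimal sketch in C. The text only says that values were 32-bit fixed point, so the 16.16 layout is an assumption, and the function names are invented for illustration. This is not the V1000 microcode. It blends one color channel of a 2x2 texel block using the fractional parts of u and v.

```c
#include <stdint.h>

/* Assumed 16.16 fixed-point layout: 16 integer bits, 16 fractional bits. */
typedef int32_t fx16;
#define FX_ONE ((fx16)65536)

fx16 fx_from_int(int32_t v) { return (fx16)(v * FX_ONE); }
int32_t fx_to_int(fx16 v)   { return v / FX_ONE; }   /* truncates toward zero */

/* Multiply in 64 bits, then scale back down to 16.16. */
fx16 fx_mul(fx16 a, fx16 b)
{
    return (fx16)(((int64_t)a * (int64_t)b) / FX_ONE);
}

/* Linear interpolation between a and b; t is a fraction in [0, FX_ONE]. */
fx16 fx_lerp(fx16 a, fx16 b, fx16 t)
{
    return a + fx_mul(b - a, t);
}

/* Bilinear blend of a 2x2 block (c00 c10 / c01 c11) for one color channel,
   weighted by the fractional texture coordinates frac_u and frac_v. */
fx16 fx_bilinear(fx16 c00, fx16 c10, fx16 c01, fx16 c11, fx16 frac_u, fx16 frac_v)
{
    fx16 top    = fx_lerp(c00, c10, frac_u);
    fx16 bottom = fx_lerp(c01, c11, frac_u);
    return fx_lerp(top, bottom, frac_v);
}
```

Sampling exactly halfway between four texels with channel values 0, 100, 100 and 200 yields 100, the average of the block.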
The designers decided that the Rendition chip would be fully programmable, but it had no clever pipelining or fast synchronization. So if it took 25 instructions to write a pixel, you got only 1 megapixel/s. With fixed-function hardware, you can build a pipeline equivalent to those 25 instructions and reach 25 megapixels/s. The 3dfx people came from SGI, so they chose what turned out to be the right approach: a fixed-function triangle engine in hardware, controlled through a subset of OpenGL functions. The V1000 designers had a completely different background. They did not know OpenGL, and so they decided that building a CPU was the better way to go.
- Walt Donovan (architect of algorithms)
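The arithmetic behind that comparison can be written down in a few lines. The sketch below assumes an idealized machine that retires one instruction per clock, and a fixed-function pipeline that, once full, emits one pixel per clock. Both are simplifications, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels per second for a programmable core that needs a fixed number of
   instructions per pixel (assumes one instruction retired per clock). */
uint64_t programmable_fill_rate(uint64_t clock_hz, uint32_t instructions_per_pixel)
{
    assert(instructions_per_pixel > 0);
    return clock_hz / instructions_per_pixel;
}

/* Pixels per second for a fixed-function pipeline with the same clock
   (assumes one pixel leaves the pipeline every clock once it is full). */
uint64_t pipelined_fill_rate(uint64_t clock_hz)
{
    return clock_hz;
}
```

With the 25 MHz clock mentioned earlier, `programmable_fill_rate(25000000, 25)` gives 1,000,000 pixels/s, while `pipelined_fill_rate(25000000)` gives 25,000,000 pixels/s, matching the figures in the quote.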
On top of this feature set, the card also had an innovative anti-aliasing system, which had an amusing side effect.
The anti-aliasing algorithm used in vQuake was patented (patent number 6005580). It had one amusing quirk. It worked only on triangles, not on spans. Quake used the concept of “perfect z-buffering”, in which the scene was divided into spans that were sorted by visibility using the BSP / PVS (binary space partitioning / potentially visible set). As a result, the engine produced a set of spans that covered the screen exactly, with no overlaps and no missing pixels, so rendering required just one write per pixel to display memory (with no z-buffering!). However, the source data for those spans was triangles. The anti-aliasing algorithm looked for silhouette edges and smoothed them. (For more on this idea, see the Geometric Post-process Anti-Aliasing entry dated March 2011 on humus.name; its author reinvented the technique!) But because the smoothing ran after the screen had been rendered (all spans had already been drawn), the algorithm had no way of knowing whether an edge was visible or not. It drew it anyway. (Had a z-buffer been used, only visible edges would have been redrawn!) In practice this was not a big problem, because the BSP usually did a very good job of culling invisible triangles.
But not for character models! So vQuake let the player spot people hiding behind doors and walls, because they showed up as a small, moving distortion in the textures!
- Walt Donovan (architect of algorithms)
vQuake
When the cards were released, they already supported several good games. Descent II, Grand Prix Legends, IndyCar Racing II, Myst, Nascar Racing, EF2000 and Tomb Raider were all fine titles, but the true jewel in the crown, the most demanding game and the biggest selling point, was Quake. The id Software game got its own Vérité port called vQuake, released on December 2, 1996. It was written by Walt Donovan and Stefan Podell of Rendition in collaboration with Michael Abrash of id Software.
The work was painstaking, but the port delivered. A 166 MHz Pentium that rendered Quake in software at 320x200 and 26 frames per second could jump to 640x480 with bilinear filtering and still run at 22 frames per second [8]. In practice, players chose 512x384, which looked good and ran at 32 frames per second on a P166. For a short time, vQuake was the best way to play Quake.
Software rendering. Vérité V1000.
Many thanks to @swaaye from the vogons.org forum for taking the V1000 screenshots and to Fruit of the Dojo for its high-quality, easy-to-hack Quake port for Mac OS X [9].
Software rendering. Vérité V1000.
The z-buffer flaw
What the V1000 lacked (and what indirectly hurt its successor, the V2200) was hardware acceleration of the z-buffer. As soon as a developer enabled the depth test, the fill rate dropped to 12.5 megapixels/s and the frame rate was halved. As Stefan Podell [10] later explained, vQuake (and every other game) was ported to the V1000 in a way that minimized z-buffer reads.
The developers found that the only way to reach the desired speed was to move most of the work to the CPU. For vQuake, this meant using the card as a super-fast horizontal span renderer that always wrote to the z-buffer but read and compared z only when rendering enemies. And although the developers managed to ship good products, the consequences of this architectural choice were felt for a long time.
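The split described above can be sketched as two inner loops over one scanline. This is only an illustration in plain C, not vQuake or V1000 code. It assumes a constant depth per span (real spans interpolate z across their length), a "smaller is closer" depth convention, and invented function names.

```c
#include <stdint.h>

/* World spans: the CPU has already sorted them so that they never overlap,
   so color and depth are written unconditionally, with no depth read. */
void draw_world_span(uint16_t *color_row, uint16_t *depth_row,
                     int x0, int x1, uint16_t rgb565, uint16_t z)
{
    for (int x = x0; x < x1; x++) {
        color_row[x] = rgb565;
        depth_row[x] = z;        /* write-only: feeds the later model pass */
    }
}

/* Model (enemy) pixels: each one is read, compared and conditionally written
   against the depth laid down by the world spans. Returns pixels written. */
int draw_model_span(uint16_t *color_row, uint16_t *depth_row,
                    int x0, int x1, uint16_t rgb565, uint16_t z)
{
    int written = 0;
    for (int x = x0; x < x1; x++) {
        if (z < depth_row[x]) {  /* the costly read-and-compare step */
            depth_row[x] = z;
            color_row[x] = rgb565;
            written++;
        }
    }
    return written;
}
```

Because only the comparatively few model pixels pay for the read and compare, most of the frame is drawn at the card's full write speed, which is the trade-off the paragraph above describes.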
3dfx and the fall of Rendition
id Software released GLQuake on January 22, 1997. It was built on miniGL (a subset of the OpenGL 1.0 standard which, among other things, lacked GL_LIGHT and GL_FOG). This binary opened the door to every hardware-accelerated card for the PC. 3dfx Interactive's Voodoo cards stood out in particular: their stunning performance (41 fps at 512x384 with 16-bit color on a P166 [11]) became the de facto benchmark for 3D accelerators. The V1000's fill rate of 25 megapixels/s, which had once compared so favorably with software rendering on a Pentium, now looked mediocre next to the Voodoo card's 50 megapixels/s, which was not even affected by z-tests.
Rendition's answer was the more powerful V2x00, which paradoxically made matters worse. It was advertised as twice as fast thanks to its hardware z-buffer, yet it failed to improve the frame rate in vQuake at all. This anomaly undermined customers' trust and reflected badly on vQuake developer Stefan Podell, who felt he had to explain why vQuake's performance was limited by the CPU rather than by the GPU [12].
... my reputation has suffered because VQuake and VHexen2 did not run faster on the V2x00, so I have to explain why that happened.
[...]
Walt and Michael decided that, since the Verite 1000 did not perform well on Z-buffered pixels, letting the Pentium do the span sorting could reduce the number of pixels the Verite had to draw. On top of that, we could turn off Z checking in the Verite.
[...]
... whatever the Verite chip, the CPU was left with a lot of work.
- Stefan Podell
Moreover, there were significant hardware architecture problems that initially caused the V2x00 to fail [13]. It took several months to fix the problem, and even then the board still ran at 50 MHz, while the NVidia NV3 and the Voodoo2 had already reached 100 MHz.
The third generation, based on the V3300, could have changed the course of history, but it came too late. The project was canceled in 1998, after Micron Technology acquired Rendition.
We made a lot of mistakes at Rendition. The V1000 could have shipped a few months earlier (with no competitors at all during those months) if we had done the circuit layout ourselves rather than handing it over to the fab. The quality control of the chip was also questionable. One guy at our company spent several months implementing MPEG decompression in V1000 assembly language, but he could not get it to work because of unpredictable chip bugs.
vQuake worked well only because the V1000 did not do much work. “Render this list of spans”, “smooth this edge”: that was almost all it did. Mike Abrash and I spent far too much time keeping Quake compatible with the V1000, so this model was not sustainable in the long run.
- Walt Donovan (architect of algorithms)
After the collapse of Rendition, 3dfx redoubled its efforts behind the Voodoo2, whose outstanding specifications swept away all competitors. For a while, 3dfx reigned as the king of 3D graphics on the PC. Then the game moved on and new competitors appeared on the scene, among them the Canadian company ATI and a then almost unknown company called Nvidia.
Reference materials
[1] Source: VGA Museum, V1000 Texel Fillrate (MTexel/s) reported as 25
[2] Source: John Carmack, .plan, Aug 22, 1996, "At 512 * 384 [...]"
[3] Source: 3dfx VOODOO1 Reference Rev. 1.0
[4] Source: Review of the V1000
[5] Source: Rendition Verite V1000 SDK
[6] Source: The immaturity of the PCI bus [...] caused DMA bugs to surface
[7] Source: RRedline Programming Guide
[8] Source: Benchmarks to compare the Rendition Vérité V1000-E and V1000L-P
[9] Source: Mac OS X Quake port source code on github.com
[10] Source: Stefan Podell BBS post
[11] Source: Comparison of Frame-rates in GLQuake Using Voodoo1
[12] Source: Stefan Podell BBS post
[13] Source: wikipedia.org, Downfall section