Michael Abrash, a veteran of 3-D graphics programming, uses the development of the original Quake to illustrate the need for creative thinking in programming.

Many years ago, I worked for the now-defunct video card manufacturer Video Seven, where I helped develop a VGA clone. My colleague Tom Wilson, who had worked around the clock for months on Video Seven's VGA chip, was trying to make the VGA as fast as possible and was confident that its performance was optimized nearly to the maximum. However, as Tom was putting the finishing touches on the chip design, we heard rumors that our competitor Paradise had achieved even higher performance in its own clone by adding a FIFO to it.
That was the extent of the rumor: we didn't know what kind of FIFO it was, or how much it helped, or anything else. Nevertheless Tom, normally a friendly and relaxed person, turned into a driven, obsessed fanatic with too much caffeine in his blood. From those scraps of information he tried to figure out what Paradise had managed to do. Eventually he concluded that Paradise had probably inserted a FIFO write buffer between the system bus and the VGA, so that when the CPU wrote to video memory, the written data would go straight into the FIFO, letting the CPU continue processing rather than stall every time it wrote to display memory.
Tom had neither the logic gates nor enough time to implement a full FIFO, but he managed to implement a FIFO one operation deep, which allowed the processor to get one write ahead of the VGA chip. Tom wasn't sure it would help much, but it was the only thing he could do, so he implemented the system and sent the chip to production.
It turned out that the one-operation FIFO worked incredibly well: at the time, Video Seven's VGA chips were the fastest on the market; they became proof of Tom's ingenuity and of what a designer is capable of under the pressure of circumstances. However, the most remarkable part of this story is that Paradise's FIFO design had nothing in common with Tom's, and didn't work nearly as well. Paradise had inserted a FIFO read buffer between the display memory and the video output stage of the VGA chip, which allowed the video output to read ahead, so that when the CPU needed to access display memory, pixels could be taken from the FIFO and the CPU request was serviced immediately. That really did improve performance, but not as much as Tom's FIFO write buffer.
This story makes a good parable about the very nature of the creative process. The scraps of news about Paradise's chip contained almost no factual information, but they forced Tom to push past the boundaries he had subconsciously set while developing the original chip design. I believe this is the most essential element of any brilliant design, whether in hardware, software, or any other creative field, and it is exactly what the news about Paradise instilled in Tom: the ability to detect the boundaries you have built up while working on a project, and to get past them.
The problem, of course, is how to get past boundaries you don't even know exist. There is no formula for success, but two principles can serve you well: first, simplify; second, keep trying new things.
Usually, when you feel the code getting complicated and find yourself making minor changes to a frozen design, there is a good chance you can improve performance and shrink the code by inventing a new design. A really good design should bring a moment of deep satisfaction when everything falls into place, and you are surprised at how little code is needed and how well all the boundary cases work out.
While rethinking a design, it is important to examine any idea that comes to mind, no matter how insane it may seem. Many of the truly brilliant ideas I have heard seemed like nonsense at first, because they didn't fit my current picture of the world. Often such ideas really are crazy, but just as the news about Paradise's chip stimulated Tom's imagination, aggressively exploring seemingly delusional ideas can open up new possibilities.
A good illustration: the evolution of Quake's 3-D graphics engine.
The hardest problem in 3-D graphics
I have spent the last seven months working on Quake, the successor to id Software's DOOM, and I expect that in three more months, by the time you read this article, Quake will have been released as shareware.
In graphics terms, Quake will be to DOOM what DOOM was to its predecessor, Wolfenstein 3D. Quake adds true, arbitrary 3-D (the player can look up and down, lean to the sides, and even fall sideways), detailed lighting and shadows, and 3-D monsters and characters in place of DOOM's sprites. I will talk about all of this in more detail soon, but this month I want to discuss what I consider the hardest problem in 3-D graphics: visible surface determination (drawing the proper surface at each pixel), and the closely related problem of culling (discarding invisible polygons as quickly as possible, as a way of accelerating visible surface determination). For brevity, I will use the abbreviation VSD to cover both visible surface determination and culling.
Why do I consider VSD the hardest problem in 3-D graphics? Although rasterization tasks such as texture mapping are just as demanding and important, they have a fairly limited scope, and once 3-D accelerators appear, their solution will be moved into hardware; besides, their cost scales only with screen resolution, which is fairly modest.
In contrast, VSD is an open-ended problem, and dozens of approaches are in use to solve it. More importantly, the performance of a naive VSD solution scales with scene complexity, which usually grows as a quadratic or cubic function, so VSD quickly becomes the limiting factor in creating realistic worlds. I believe that over the next few years VSD will become the increasingly dominant problem in real-time PC 3-D graphics, because the detail of 3-D worlds will keep increasing. Even today, a good-sized Quake level contains about 10,000 polygons, almost three times as many as a comparably sized DOOM level.
The structure of Quake levels
Before diving into VSD, let me mention that every Quake level is stored as a single huge 3-D BSP tree. Like any BSP tree, it subdivides space, in this case along the planes of the polygons. However, unlike the BSP trees I have demonstrated before, the BSP tree in Quake does not store polygons in the tree nodes, as parts of the splitting planes, but in the empty (non-solid) leaves, as shown in the plan view of Figure 1.
Figure 1. In Quake, polygons are stored in empty leaves. Dark areas are solid leaves (solid volumes, such as the interiors of walls).

The correct drawing order can be obtained by drawing the leaves in front-to-back or back-to-front BSP order. Also, because BSP leaves are always convex and the polygons are the boundaries of BSP leaves, facing inward, the polygons of a given leaf can never overlap one another and can be drawn in any order. (This is a general property of convex polyhedra.)
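To make the preceding description concrete, here is a minimal sketch in C of how such a tree might be represented; the type and field names are illustrative, not taken from Quake's actual source.

    /* Splitting plane: points p with dot(normal, p) == dist lie on it. */
    typedef struct {
        float normal[3];
        float dist;
    } plane_t;

    /* Polygons live only in empty leaves, linked into a per-leaf list. */
    typedef struct polygon_s {
        int                numverts;
        float            (*verts)[3];      /* numverts vertices           */
        struct polygon_s  *next;
    } polygon_t;

    /* One struct doubles as interior node and leaf, selected by is_leaf. */
    typedef struct node_s {
        int              is_leaf;
        /* interior nodes: splitting plane and two children */
        plane_t          plane;
        struct node_s   *front, *back;
        /* leaves: solid flag and (for empty leaves) boundary polygons */
        int              solid;            /* e.g., the inside of a wall  */
        polygon_t       *polys;            /* convex, inward-facing; may
                                              be drawn in any order       */
    } node_t;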
Culling and visible surface determination
Ideally, the VSD process would work as follows. First, you would discard all polygons entirely outside the view pyramid, and clip away the invisible portions of any partially visible polygons. Then you would draw only those pixels of each polygon that are actually visible from the current viewpoint, as shown in the plan view of Figure 2, without wasting any time redrawing pixels; note how little of the polygon set in Figure 2 actually needs to be drawn. Finally, in an ideal world, determining which parts of which polygons are visible would be cheap, and the processing time would be the same for all possible viewpoints, so the game would run smoothly.
Figure 2. An ideal VSD architecture renders only the visible parts of visible polygons.

As it happens, it is easy to determine which polygons are entirely outside the view pyramid or partially clipped, and it is perfectly possible to figure out exactly which pixels need to be drawn. Alas, the world is far from ideal, and all those checks are expensive, so the trick is to accelerate or skip various checks while still producing the correct result.
As I explained in detail in previous articles, given a BSP you can easily and cheaply walk the world in front-to-back or back-to-front order. The simplest VSD solution is simply to walk the tree back to front, clip each polygon against the view pyramid, and draw it if it faces the camera and is not entirely clipped away (the painter's algorithm). But is that an adequate solution?
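Concretely, the naive walk just described might look like the following sketch, using the illustrative structures from the earlier listing; clip_and_draw is a hypothetical helper standing in for view-pyramid clipping, backface testing, and rasterization.

    void clip_and_draw(polygon_t *poly);    /* hypothetical helper */

    /* Signed distance of the eye from a plane: > 0 means the front side. */
    static float plane_dist(const plane_t *p, const float eye[3])
    {
        return p->normal[0]*eye[0] + p->normal[1]*eye[1]
             + p->normal[2]*eye[2] - p->dist;
    }

    /* Painter's algorithm: visit the far child first, so nearer polygons
       are drawn later and overwrite whatever they occlude. */
    void draw_back_to_front(const node_t *n, const float eye[3])
    {
        if (n == NULL)
            return;
        if (n->is_leaf) {
            if (!n->solid)
                for (polygon_t *poly = n->polys; poly; poly = poly->next)
                    clip_and_draw(poly);
            return;
        }
        if (plane_dist(&n->plane, eye) >= 0) {
            draw_back_to_front(n->back, eye);     /* far side first */
            draw_back_to_front(n->front, eye);    /* near side last */
        } else {
            draw_back_to_front(n->front, eye);
            draw_back_to_front(n->back, eye);
        }
    }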
For relatively simple worlds it is perfectly usable. However, it does not scale well. One problem is that as new polygons are added to the world, more and more transformations and tests are required to cull the invisible polygons; past a certain threshold, this begins to slow performance significantly.
Happily, this particular problem has a good workaround. As mentioned above, each leaf of a BSP tree describes a convex subspace, bounded by the nodes above it in the tree. Less obviously, each node of a BSP tree also describes a subspace: the subspace composed of all the node's children, as shown in Figure 3. Another way to think of it: each node splits into two parts the subspace created by the nodes above it in the tree, and the node's children then carve that subspace into all the leaves that descend from the node.
Figure 3. Node E describes the shaded subspace, which contains leaves 5, 6, and 7 and node F.

Since a node's subspace is bounded and convex, it can be tested for being entirely outside the view pyramid. If it is, then all of the node's children are certain to be fully clipped and can be rejected without any further processing. Since most of the world is usually outside the view pyramid, many of the world's polygons can be culled almost for free, in huge node-subspace chunks. Performing an exact clipping test on subspaces is fairly expensive, so bounding boxes or spheres are usually stored for each node and used in the clipping tests instead.
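Here is what that bounding-sphere test might look like, in the same sketch style; the view pyramid is assumed to be given as a set of inward-facing planes, and a sphere wholly behind any one plane means the whole subtree can be discarded.

    typedef struct {
        float center[3];
        float radius;
    } sphere_t;

    /* Returns 1 if the node's bounding sphere lies entirely outside the
       view pyramid, so the node and all its children can be culled.
       Note the test is conservative: a sphere that straddles only a
       corner of the pyramid is kept and handled further down the tree. */
    int sphere_outside_pyramid(const sphere_t *s,
                               const plane_t *planes, int numplanes)
    {
        int i;
        for (i = 0; i < numplanes; i++) {
            float d = planes[i].normal[0]*s->center[0]
                    + planes[i].normal[1]*s->center[1]
                    + planes[i].normal[2]*s->center[2]
                    - planes[i].dist;
            if (d < -s->radius)
                return 1;      /* wholly behind this plane: cull   */
        }
        return 0;              /* possibly visible: keep recursing */
    }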
So view-pyramid culling is not a problem, and the BSP can be used to draw back to front. What, then, is the problem?
Overdraw
The problem John Carmack, the principal author of the DOOM and Quake technology, ran into during Quake's development was that in a complex world, many scenes contain an enormous number of polygons within the view pyramid. Most of those polygons are partially or fully obscured by other polygons, but the painter's algorithm described above requires that every pixel of every polygon in the view pyramid be drawn, often only to be overdrawn by others. On a 10,000-polygon Quake level it is easy to hit a worst case of overdraw in which pixels are drawn 10 or more times; that is, in some frames each pixel is drawn 10 or more times on average. No rasterizer is fast enough to compensate for an order of magnitude more work than is needed to render the scene; worse still, the painter's algorithm produces a huge gap between best-case and worst-case performance, which causes the frame rate to vary dramatically as the viewer moves.
So John faced the problem of reducing overdraw to acceptable levels. Ideally, each pixel would be drawn exactly once, and no more than two or three times in the worst case. As for view-pyramid culling, the ideal was for all invisible polygons inside the pyramid to be culled at almost no extra cost. It would also be a plus to draw only the visible portions of partially visible polygons, but the balance had to be maintained: the operation must cost less than the overdraw it eliminated.
When I arrived at id in early March, John already had a prototype engine and a design sketched out, so I assumed our job would simply be to finish and optimize that engine. Had I known id's history, though, I would have known better. John had created not only DOOM but also the engines for Wolfenstein 3D and several earlier games, and had developed several distinct versions of each engine along the way (once building four engines in four weeks); in four years he had written roughly 20 engines. John's relentless pursuit of ever newer and better Quake engine technology would end only when the game shipped.
Three months after my arrival, only one element of the original VSD design remained in the engine, and John's drive to "try new things" had gone further than I had ever seen before.
The beam tree
In John's original Quake design, rendering proceeded front to back using a second BSP tree that tracked which parts of the screen were already drawn and which were still empty and therefore remained to be painted by the remaining polygons. Logically, you can think of this BSP tree as a 2-D region describing the solid and empty areas of the screen, as shown in Figure 4, but in fact it was a 3-D tree, known as a beam tree. A beam tree is a collection of 3-D wedges (beams), bounded by planes, projecting out from some center point, in our case the viewpoint, as shown in Figure 5.
Figure 4. Quake's beam tree effectively partitioned the screen into 2-D regions.

Figure 5. Quake's beam tree was composed of 3-D wedges, or beams, projecting out from the viewpoint past the edges of polygons.

In John's design, the beam tree initially consisted of a single beam describing the view pyramid; everything outside that beam was marked solid (so nothing would be drawn there), and the inside of the beam was marked empty. As each new polygon was reached during the front-to-back walk of the world BSP tree, that polygon was converted into a beam by running planes from its edges through the viewpoint, and any part of that beam intersecting empty beams in the beam tree was considered drawn and added to the beam tree as a solid beam. This continued until either the polygons ran out or the beam tree became entirely solid. Once the beam tree was completed, the visible portions of the polygons that had ended up in it were drawn.
The advantage of working with a 3-D beam tree rather than a 2-D region is that determining which side of a beam plane a polygon vertex is on involves only checking the sign of the dot product of the ray to the vertex with the plane normal, because all the beam planes pass through the origin of the coordinate system (the viewpoint). Also, because a beam plane is fully described by a single normal, generating a beam from a polygon edge requires only a cross product of the edge with a ray from the viewpoint to the edge. Finally, bounding spheres of BSP nodes can be used for the grouped view-pyramid culling described earlier.
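Here is a sketch of those two operations, with vertices expressed relative to the viewpoint (so the viewpoint is the origin); since every beam plane passes through the origin, a normal alone describes the plane, and the usual plane-distance constant disappears.

    static void cross(const float a[3], const float b[3], float out[3])
    {
        out[0] = a[1]*b[2] - a[2]*b[1];
        out[1] = a[2]*b[0] - a[0]*b[2];
        out[2] = a[0]*b[1] - a[1]*b[0];
    }

    static float dot(const float a[3], const float b[3])
    {
        return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
    }

    /* Beam plane from a polygon edge (v0 to v1, both relative to the
       viewpoint): one cross product of the edge with the ray to v0. */
    void beam_plane_from_edge(const float v0[3], const float v1[3],
                              float normal[3])
    {
        float edge[3] = { v1[0] - v0[0], v1[1] - v0[1], v1[2] - v0[2] };
        cross(edge, v0, normal);
    }

    /* Which side of the beam plane is a vertex on? Only the sign of one
       dot product is needed: > 0 one side, < 0 the other, 0 on the plane. */
    float vertex_side(const float normal[3], const float vert[3])
    {
        return dot(normal, vert);
    }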
At first, the beam tree's occlusion property (it is finished once it becomes entirely solid) seemed appealing, because it looked as if it would cap worst-case performance. Unfortunately, there could still be scenes where you could see all the way to the sky or the back wall of the world, so in the worst case every polygon in the view pyramid still had to be tested against the beam tree. Similar problems could also arise from tiny cracks, due to the limits of numerical precision. Beam-tree clipping ate up a lot of time, and in scenes with long lines of sight, such as the view from the top of a level, the total cost of beam processing dropped Quake's frame rate to a crawl. In the end, the beam-tree approach turned out to suffer from much the same ailments as the painter's algorithm: the worst case was far worse than the average case, and it scaled poorly as levels grew.
A new 3-D engine every day
Once the beam tree was working, John worked tirelessly to speed up the 3-D engine, constantly trying to improve its design rather than just tweaking the implementation. At least once a week, and often every day, he would come into my office and say, "Last night I couldn't sleep, so I was thinking...", and I knew my brain was in for another stretching. John tried many ways to improve the beam tree, with some success, but far more interesting was the profusion of entirely different approaches he generated. Some were only hinted at in conversation; others were implemented in overnight or weekend bursts of intense coding. In either case, they were eventually discarded or evolved until they met the necessary criteria. Here are some examples, described in minimal detail, in the hope that they will spark your imagination the way Paradise's FIFO buffer sparked Tom Wilson's.
Subdividing raycast: rays are cast in an 8x8 screen-pixel grid; this is a very efficient operation, because the first intersection with a surface can be found simply by clipping the ray into the BSP tree, starting from the viewpoint, until a solid leaf is reached. If adjacent rays do not hit the same surface, a ray is cast halfway between them, and so on until all adjacent rays either hit the same surface or land on adjacent pixels; then the block around each ray is drawn from the polygon that ray hit. This scales very well, being limited by the number of pixels, and there is no overdraw. Its problem is dropouts: there is a high probability that small polygons will fall between the rays and vanish.
Vertex-free surfaces: the world is represented as a set of surface planes. Polygons exist implicitly at the intersections of the planes and are extracted from them as a final step before drawing. This gives fast clipping and a very small data set (planes are described far more compactly than polygons), but extracting polygons from planes takes a long time.
The draw buffer: like a z-buffer, but with only one bit per pixel, indicating whether the pixel has been drawn yet. This eliminates overdraw, but at the cost of an inner-loop buffer test, extra writes and cache misses, and, worst of all, a significant increase in code complexity. Variations include testing the draw buffer a byte at a time and skipping fully occluded bytes outright, or branching off each draw-buffer byte into one of 256 unrolled inner loops for drawing 0-8 pixels, which in theory could exploit the x86's ability to perform the perspective floating-point divide in parallel while 8 pixels are processed. (A sketch of the one-bit draw buffer appears after this list.)
Span-based drawing: polygons are rasterized into spans, which are added to a global span list and clipped against that list so that only the nearest span remains at each pixel. Little sorting is needed with front-to-back traversal, because in case of overlap the span already in the list is the nearer one. This eliminates overdraw, but at the cost of a lot of span arithmetic; also, every polygon still has to be turned into spans.
Portals: the holes where polygons are missing from surfaces are tracked, because the line of sight can extend only through such portals. Drawing proceeds front to back, and when a portal is encountered, the polygons and portals behind it are clipped to its boundaries, until no visible polygons or portals remain. Applied recursively, this allows drawing only the visible portions of visible polygons, but at the cost of a significant amount of portal clipping.
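As promised above, here is a sketch of the one-bit draw buffer; the resolution and the pixel-write interface are illustrative assumptions, not Quake's.

    #include <stdint.h>
    #include <string.h>

    #define WIDTH  320
    #define HEIGHT 200

    static uint8_t drawn[WIDTH * HEIGHT / 8];    /* 1 bit per pixel */

    void clear_draw_buffer(void)
    {
        memset(drawn, 0, sizeof drawn);          /* once per frame  */
    }

    /* With front-to-back traversal, the first write to a pixel wins;
       returns 1 (and marks the bit) only if the caller should draw. */
    int try_draw_pixel(int x, int y)
    {
        int     i    = y * WIDTH + x;
        uint8_t mask = (uint8_t)(1u << (i & 7));

        if (drawn[i >> 3] & mask)
            return 0;                            /* occluded: skip    */
        drawn[i >> 3] |= mask;
        return 1;                                /* first touch: draw */
    }

    /* The byte-at-a-time variant mentioned above skips 8 pixels at once
       whenever drawn[i >> 3] == 0xFF. */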
Breakthrough
Eventually John decided that the beam tree was a second-order structure, reflecting information already implicit in the world BSP tree, so he tackled the problem of extracting the visibility information directly from the world BSP tree. He spent a week on it, in the process devising a perfect DOOM (2-D) visibility architecture, in which a single, linear walk of the DOOM BSP tree produces zero-overdraw 2-D visibility. Solving the same problem in 3-D, however, proved far more difficult, and by the end of the week John was exasperated by the mounting complexity and persistent glitches in the visibility code. Although the direct-BSP solution was close to working, it kept demanding more and more refinement, and a simple, clean architecture refused to emerge. When I left work that Friday, John was preparing to spend the weekend getting the direct-BSP approach to work.
On Monday morning, John looked like a man who had broken through into another world, and also like a man who had not slept much. He had worked on the direct-BSP approach all weekend and gotten it working reasonably well, taking notes along the way on what remained to be finished. But at 3:30 a.m. Monday, as he lay in bed thinking about portals, it occurred to him that it should be possible to precompute and store in each leaf a list of all the leaves visible from that leaf, and then at runtime simply draw the visible leaves back to front, for whichever leaf the viewpoint happened to be in, ignoring all the other leaves entirely.
The catch was size: the initial uncompressed potentially visible set (PVS) of leaves took several megabytes. However, the PVS could be stored as a bit vector, with 1 bit per leaf, a structure that compresses very well with simple run-length encoding of zero bytes. John also proposed changing the BSP heuristic to generate fewer leaves. This was the opposite of what I had suggested a few months earlier, namely picking as the next splitter the polygon that splits the fewest other polygons, which by the latest data had appeared to be the best heuristic. John's approach also allowed sealing off the space outside the level, so the BSP processor could remove the outside surfaces, which the player can never see; in the end, the PVS for a good-sized level shrank to about 20 kilobytes.
In exchange for those 20 kilobytes, culling of leaves outside the view pyramid is accelerated (only the leaves in the PVS need to be considered), and culling inside the view pyramid costs nothing more than a little overdraw (the PVS for a leaf includes all the leaves visible from any point in that leaf, so some overdraw occurs, typically around 50% but occasionally reaching 150%). Better yet, precomputing the PVS levels out performance; the worst case is now not much worse than the best, because no additional VSD processing is needed, only more polygons and perhaps a little extra overdraw in complex scenes. When John first showed me his working prototype, I deliberately went to the toughest scene I knew, a spot where the frame rate used to drop into single digits, and it ran smoothly, with no perceptible slowdown.
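As a sketch of what the runtime side of this might look like, assume the compressed PVS uses the zero-byte run-length scheme described above (a zero byte followed by a count of zero bytes) and that leaves are numbered consecutively; this mirrors the idea rather than id's actual on-disk format.

    #include <stdint.h>

    /* Expand the compressed PVS for the leaf containing the viewpoint
       into a flat bit vector, one bit per leaf. */
    void decompress_pvs(const uint8_t *in, uint8_t *out, int numbytes)
    {
        uint8_t *end = out + numbytes;

        while (out < end) {
            if (*in != 0) {
                *out++ = *in++;        /* literal byte: 8 leaf bits  */
            } else {
                int count = in[1];     /* zero byte, then run length */
                in += 2;
                while (count-- > 0 && out < end)
                    *out++ = 0;
            }
        }
    }

    /* Per frame, only leaves whose bit is set are even considered. */
    int leaf_visible(const uint8_t *pvs, int leafnum)
    {
        return pvs[leafnum >> 3] & (1 << (leafnum & 7));
    }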
John says that precalculating the PVS was a logical development of the approaches he had been considering, and that there was no single moment of "eureka". Nevertheless, it was the discovery of a completely new, qualitatively better design, one that, together with a sorted-edge rasterizer then in development that eliminates overdraw entirely, comes very close to the "ideal" requirements we set at the outset.
Simplify, and keep trying new things
What does all this mean? Exactly what I said at the start: simplify, and keep trying new things. The precalculated PVS is simpler than any of the other schemes we considered (although precalculating the PVS is an interesting problem in its own right, which I will discuss another time). In fact, at runtime the precalculated PVS is just a constrained version of the painter's algorithm. Does that mean it is not particularly profound?
Not at all. All really great designs seem simple and even obvious, once they have been designed. But the process of getting there requires incredible persistence and a willingness to try many different ideas until the right one falls into place, as happened here.
My friend Chris Hecker has a theory that all approaches work out to the same thing in the end, since they all reflect the same underlying state and functionality. In terms of underlying theory, I have found that to be true: whether you do perspective texture mapping with a divide or with incremental hyperbolic calculations, the numbers do exactly the same thing. When it comes to implementation, however, my experience is that simply shifting when the work is done, or matching the hardware better, or caching can make an astonishing difference. My friend Terje Mathisen likes to say that "almost all programming can be viewed as an exercise in caching", and that is exactly what John did. No matter how fast he made his VSD calculations, they could never be as fast as precalculating and looking up the visibility; his most inspired move was to step outside the "faster code" mindset and realize that it was possible to precalculate (in effect, to cache) and look up the PVS.
The hardest thing in the world is to step outside a familiar, pretty good solution to a difficult problem and look for a different, better one. The best ways I know to do that are to keep trying new, wacky things, and always, always, always try to simplify. One of John's goals is to have fewer lines of code in each new 3-D game than in the previous one, on the assumption that as he learns more, he should be able to do things better with less code.
So far, that approach seems to be working out pretty well for him.
Learn now, pay forward
There is one more thing I would like to mention before I close up shop for this month. Much of what I have learned, and a great deal of what I have written, has been in the pages of Dr. Dobb's Journal, which for as long as I can remember has epitomized the attitude that sharing programming information is a good thing. I know a lot of programmers who were able to leap ahead in their development thanks to Hendrix's Tiny C, or Stevens' D-Flat, or simply by browsing through DDJ's annual collections. (Me, for one.) Understandably, most companies will not share their code with the world, but individuals do not have to follow suit, and it is in that spirit that these articles appear in DDJ.
In the same spirit, id Software has allowed me to describe in these articles how the Quake technology works. And id has made the complete source code for Wolfenstein 3D available at ftp.idsoftware.com/idstuff/source [translator's note: now on github]; it is not the latest technology, but it is a complete, working game engine; see the wolfsrc.txt file there for the terms under which the code may be used.
A final thought: none of us learns in a vacuum; we all stand on the shoulders of giants. Lend your own shoulders to building the future, so that someday others can stand on them. Learn now, pay forward!