A year ago I came across a series of very interesting articles by Simon. Simon loves to take apart how games are made, in particular the graphics tricks behind one element or another: from chipped edges on stone slabs to how cutting pieces out of objects is implemented. But his series of articles under the general title "Render Hell" is especially interesting: in it he examines in detail how the rendering of 3D objects happens at the hardware level (and at the software level too).
This is a free translation. I made it for myself so that at some point I could come back and reread what I didn't catch on the first pass, or simply forgot.
So, shall we start?
Book One. Overview
(original book here)
Hold on, folks: as far as the PC is concerned, your 3D work is nothing more than a list of vertices and textures. All this data is converted into the next-gen image, mostly by the central processor (CPU) and the graphics processor (GPU).
First, the data is loaded from your hard disk (HDD) into random access memory (RAM) for quick access. After that, the objects (meshes) and textures needed for display (rendering) are loaded into the video card's memory (VRAM), because the video card can access VRAM much faster.
If a texture is no longer needed in RAM (after it has been uploaded to VRAM), it can be removed from RAM (but you must be sure you won't need it again soon, because loading from the HDD takes a very long time). Meshes should remain in RAM, because the CPU will most likely want access to them, for example for collision detection.
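The staging described above can be sketched roughly like this. All class and method names here are invented for illustration; a real engine would do the VRAM upload through a graphics API such as Direct3D, OpenGL or Vulkan:

```python
# Hypothetical sketch of the HDD -> RAM -> VRAM staging described above.
# All names are made up for illustration.

class ResourceManager:
    def __init__(self):
        self.ram = {}    # assets currently in system memory
        self.vram = {}   # assets uploaded to the video card

    def load_from_disk(self, name):
        # Slow: reading from the HDD
        self.ram[name] = f"<bytes of {name}>"

    def upload_to_vram(self, name):
        # Copy over the bus so the GPU can use it
        self.vram[name] = self.ram[name]

    def free_ram_copy(self, name):
        # Safe for a texture once uploaded; meshes usually stay in RAM
        # because the CPU still needs them (e.g. for collision checks).
        del self.ram[name]

mgr = ResourceManager()
mgr.load_from_disk("rock_texture")
mgr.upload_to_vram("rock_texture")
mgr.free_ram_copy("rock_texture")   # texture no longer needed in RAM

mgr.load_from_disk("rock_mesh")
mgr.upload_to_vram("rock_mesh")     # the mesh keeps its RAM copy

print("rock_texture" in mgr.ram)    # False
print("rock_mesh" in mgr.ram)       # True
```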
Amended by the second edition
Now all the information is on the video card (in the video card's RAM, the VRAM). But the transfer speed from VRAM to the GPU is still low: the GPU can process far more information than it receives.
Consequently, the engineers put a small amount of memory straight into the graphics processor (GPU) itself and called this memory a cache. It is small because putting a large amount of memory directly into the processor is incredibly expensive. The GPU copies into the cache only what it needs right now, and in small portions.
The copied information now sits in the Level 2 cache (L2 cache). Basically, this is a small amount of memory (for example, 2048 KB on the NVIDIA GM204) which is installed in the GPU and can be read much faster than VRAM.
But even this is not enough to work efficiently! So there is an even smaller Level 1 cache (L1 cache). On the NVIDIA GM204 it is 384 KB, and it is available not only to the GPU cores but also to their nearest co-processors.
In addition, there is memory intended for the input and output data of the GPU cores: the registers. The GPU takes, for example, two values from the registers, computes with them, and writes the result back into a register. Afterwards these results are moved back into L1/L2/VRAM to make room for new calculations. As a programmer, you usually don't have to worry about the registers.
Why is all this necessary? As stated above, it is all about access time. If you compare the access times of, say, an HDD and the L1 cache, the difference between them is a black hole. You can read the exact latency figures at this link: gist.github.com/hellerbarde/2843375
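To get a feel for that "black hole", here are ballpark figures taken from the well-known "latency numbers every programmer should know" list linked above (orders of magnitude, not exact values for any particular hardware):

```python
# Ballpark latencies (nanoseconds), rounded order-of-magnitude figures
# from the linked "latency numbers" gist; real hardware varies.
latency_ns = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk (HDD) seek": 10_000_000,
}

l1 = latency_ns["L1 cache reference"]
for name, ns in latency_ns.items():
    # Show each latency relative to a single L1 cache hit
    print(f"{name}: {ns:>14,.1f} ns  ({ns / l1:,.0f}x L1)")
```

An HDD seek is about twenty million times slower than an L1 cache hit, which is why the data is staged closer and closer to the cores.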
Before rendering begins, the CPU sets some global values that describe how the meshes should be rendered. These values are called the Render State.
Render state
These are, in effect, parameters for how meshes should be rendered. They contain information about which texture and which vertex and pixel shaders should be used to draw the subsequent meshes, lighting, transparency, and so on.
AND IT IS IMPORTANT TO UNDERSTAND: every mesh that the CPU sends to the GPU for rendering will be rendered with the parameters (Render State) that were set before it. That is, you can render a sword, a stone, a chair and a car, and they will all be rendered with the same texture if you do not set the Render State parameters before each of these objects.
When all the preparations are complete, the CPU can finally call the GPU and tell it to draw. This command is called Draw Call.
Drawcall
This is a CPU command telling the GPU to render one mesh. The command specifies only the mesh to render and contains no information about materials and the like: all of that is set in the Render State.
Mesh is already loaded in VRAM memory.
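The relationship between the Render State and draw calls can be modelled in a few lines. This is a toy sketch with invented names; real APIs (OpenGL, Direct3D, Vulkan) expose the same idea through their own state-setting and draw functions:

```python
# Toy model of the Render State / Draw Call relationship.
# All names are invented for illustration.

class GPU:
    def __init__(self):
        self.render_state = {}   # global settings: texture, shaders, ...
        self.drawn = []

    def set_render_state(self, **settings):
        self.render_state.update(settings)

    def draw_call(self, mesh):
        # The draw call names only the mesh; everything else comes from
        # whatever Render State is current at this moment.
        self.drawn.append((mesh, dict(self.render_state)))

gpu = GPU()
gpu.set_render_state(texture="stone", shader="lit")
gpu.draw_call("sword")
gpu.draw_call("chair")   # oops: also drawn with the stone texture!
gpu.set_render_state(texture="wood")
gpu.draw_call("chair")   # now drawn with the wood texture
```

This is exactly the sword/stone/chair pitfall from above: forget to change the state, and the next mesh inherits the previous one's texture.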
After the command is sent, the GPU takes the Render State data (material, textures, shaders) as well as all the information about the object's vertices, and converts this data into (we would like to believe) beautiful pixels on your screen. This conversion process is called the pipeline.
Pipeline
As mentioned earlier, objects are nothing more than a collection of vertices and texture information. To turn this into a mind-blowing picture, the video card builds triangles from the vertices, calculates how they should be lit, draws the textures onto them, and so on.
These actions are called pipeline stages. Most of this work is done by the GPU cores, but some steps, such as triangle setup, are handled by other co-processors on the video card.
Amended by the second edition
This example is extremely simplified and should be treated only as a rough overview, a "logical" pipeline: each triangle/pixel goes through these logical steps, but what actually happens in hardware is slightly different.
Here is an example of the steps the hardware performs for a single triangle:
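The logical flow for one triangle can be sketched as code. This is a heavily simplified illustration with made-up functions (the rasterization step here is deliberately faked); it only shows the conceptual order: vertices, then a triangle, then fragments, then coloured pixels:

```python
# A heavily simplified "logical" pipeline for one triangle.
# Real hardware reorders and parallelises these stages.

def vertex_stage(vertices):
    # e.g. transform object-space positions into screen space
    return [(x * 100, y * 100) for (x, y) in vertices]

def rasterize(triangle):
    # Find which pixels the triangle covers. Faked here: we just
    # return the integer pixel at each transformed vertex.
    return [(int(x), int(y)) for (x, y) in triangle]

def pixel_stage(fragments, color):
    # Run the "pixel shader" for every covered fragment
    return {frag: color for frag in fragments}

verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
screen_verts = vertex_stage(verts)
fragments = rasterize(screen_verts)
framebuffer = pixel_stage(fragments, color="red")
print(framebuffer)
```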
Rendering a picture means solving tens or hundreds of thousands of such tasks and drawing millions of pixels on the screen. And all of this should (I hope) fit into at least 30 frames per second.
Modern CPUs have 6-8 cores each, while graphics processors have several thousand (not as powerful as CPU cores, but powerful enough to crunch through piles of vertices and other data).
Book number 2 is devoted to the details of the organization of high and low levels in the graphics processor.
When information (for example, a handful of vertices) enters the pipeline, several cores perform the transformation from vertices into the finished image, so a whole batch of these elements is turned into the image simultaneously (in parallel).
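The idea of many cores transforming vertices at the same time can be imitated in miniature. Here a thread pool stands in for the GPU cores; this is only a conceptual sketch, not how GPUs are actually programmed:

```python
# GPU-style parallelism in miniature: a pool of workers each applies
# the same per-vertex operation to its own slice of the data.
from concurrent.futures import ThreadPoolExecutor

def transform(vertex):
    x, y, z = vertex
    return (x * 2, y * 2, z * 2)   # a trivial per-vertex operation

vertices = [(i, i + 1, i + 2) for i in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() hands vertices out to the workers and collects the
    # results back in the original order
    transformed = list(pool.map(transform, vertices))

print(transformed[0])   # (0, 2, 4)
```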
Now we know that the GPU can process information in parallel. But what about the communication between the CPU and the GPU? Does the CPU wait for the GPU to finish before sending new tasks to it?
NO!
Fortunately, no! Otherwise a bottleneck would form: a weak link where the CPU cannot send new tasks quickly enough. The solution is a list of commands to which the CPU appends commands for the GPU while the GPU is processing the previous ones. This list is called the Command Buffer.
Command buffer
The command buffer lets the CPU and GPU operate independently of each other. When the CPU wants to render something, it pushes commands into the queue, and when the GPU is free, it takes them out of the list (buffer) and executes them. Commands are taken in order of arrival: the first command in is the first one executed.
By the way, there are different kinds of commands. For example, one command can be a draw call, while the next one changes the Render State to new parameters.
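The command buffer idea can be sketched with a plain FIFO queue. The command names below are illustrative, not a real API:

```python
# Sketch of a command buffer: the CPU enqueues commands, the GPU later
# dequeues and executes them in first-in-first-out order.
from collections import deque

command_buffer = deque()

# "CPU side": record commands without waiting for the GPU
command_buffer.append(("set_render_state", {"texture": "stone"}))
command_buffer.append(("draw_call", "rock_mesh"))
command_buffer.append(("set_render_state", {"texture": "metal"}))
command_buffer.append(("draw_call", "sword_mesh"))

# "GPU side": pop commands from the front, oldest first
executed = []
while command_buffer:
    cmd, arg = command_buffer.popleft()
    executed.append(cmd)

print(executed)   # first command recorded is the first one executed
```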
Well, that is basically the first book. Now you have an idea of how information is rendered, what Draw Calls and the Render State are, and how the CPU and GPU interact.