
Instancing
Imagine a scene containing a huge number of models that mostly share the same vertex data and differ only in the transformation matrices applied to them. For example, consider a field of grass, where each blade is a small model made of just a couple of triangles. To achieve the desired effect, you would have to render this model not once but thousands, even tens of thousands of times per frame. Since each blade consists of just a couple of triangles, rendering it is almost instantaneous. However, the thousands of repeated draw calls add up and hit performance very hard.
If we really did render a large number of objects this way, the code would look something like this:
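for (unsigned int ix = 0; ix < model_count; ++ix) {
    // preparations: bind the VAO, bind textures, set uniforms, etc.
    glDrawArrays(GL_TRIANGLES, 0, vertex_count); // vertex_count: number of vertices in the model
}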
When rendering many instances of the same model this way, we quickly hit a performance bottleneck: the sheer number of draw calls. Compared to the rendering itself, telling the GPU to render something with functions like glDrawArrays or glDrawElements takes quite a lot of time. That time goes into the preparation OpenGL needs before it can output vertex data: telling the GPU which buffer to read from, where the vertex attribute data is located and in what format, and so on. All of this communication travels over the relatively slow bus between the CPU and the GPU. The result is a paradox: rendering the vertex data is lightning fast, but issuing the commands to render it is comparatively slow.
It would be great if we could send the necessary data to the GPU once and then ask OpenGL, with a single call, to render many objects using that data. Welcome to the world of instancing!
Instancing is a technique that lets us draw many objects with a single draw call, which saves us the CPU-to-GPU communication for every object. To start using instancing, all you need to do is replace the glDrawArrays and glDrawElements calls with glDrawArraysInstanced and glDrawElementsInstanced, respectively. The instanced versions take the same parameters as their regular counterparts plus one more: the number of instances to draw. This way we send all the data required for rendering to the GPU once, and then tell it, in a single function call, how many instances of the object to draw. The GPU then renders the whole set of objects without constantly going back to the CPU.
On its own, this capability is not very useful. If we render thousands of objects in exactly the same way at the same position, we end up with what looks like a single object, because all the copies overlap. To solve this, GLSL provides a built-in vertex shader variable, gl_InstanceID.
When rendering with one of the instanced draw functions, this variable starts at zero and increases by one for each rendered instance. So when the 43rd instance of the object is rendered, gl_InstanceID in the vertex shader equals 42. With a unique index per instance we could, for example, index into a large array of position vectors to place each instance at a different location in the scene.
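To make the indexing concrete, here is a minimal CPU-side sketch in plain C++ of what an instanced draw does conceptually. The vertex stage runs once per vertex per instance, and gl_InstanceID is the zero-based instance counter used to pick a per-instance offset. The names `Vec2`, `VertexOut`, and `simulateInstancedDraw` are hypothetical and no OpenGL is involved. A real GPU does not necessarily process the invocations in this order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D vector type standing in for GLSL vec2.
struct Vec2 {
    float x;
    float y;
};

// One simulated "vertex shader output".
struct VertexOut {
    int instanceId;  // the value gl_InstanceID would have
    Vec2 position;   // aPos + offsets[gl_InstanceID]
};

// Conceptual model of glDrawArraysInstanced(GL_TRIANGLES, 0, vertices.size(), instanceCount):
// the "vertex shader" body runs for every vertex of every instance.
std::vector<VertexOut> simulateInstancedDraw(const std::vector<Vec2>& vertices,
                                             const std::vector<Vec2>& offsets,
                                             int instanceCount) {
    assert(instanceCount >= 0 &&
           static_cast<std::size_t>(instanceCount) <= offsets.size());
    std::vector<VertexOut> out;
    out.reserve(vertices.size() * static_cast<std::size_t>(instanceCount));
    for (int instanceId = 0; instanceId < instanceCount; ++instanceId) {
        // Same lookup as `offsets[gl_InstanceID]` in the shader.
        const Vec2 offset = offsets[static_cast<std::size_t>(instanceId)];
        for (const Vec2& v : vertices) {
            out.push_back({instanceId, {v.x + offset.x, v.y + offset.y}});
        }
    }
    return out;
}
```

Drawing a 3-vertex triangle with 43 instances produces 129 outputs, and the last of them carries instance index 42.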
To get a better feel for instancing, let's work through a simple example that renders a hundred quads (rectangles) in normalized device coordinates (NDC) with a single draw call. Each quad's offset is looked up in a uniform array containing one hundred offset vectors. The result is a neat grid of rectangles filling the entire window:
Each quad consists of two triangles, which gives six vertices. Each vertex contains a two-component NDC position vector and a color vector. Below is the vertex data used in the example. The triangles are small enough that a large number of them fits on the screen:
float quadVertices[] = {
The fragment shader sets the quad's color. It simply forwards the interpolated vertex color received from the vertex shader to the output variable:
#version 330 core
out vec4 FragColor;

in vec3 fColor;

void main() {
    FragColor = vec4(fColor, 1.0);
}
Nothing new so far. The vertex shader, however, is a different story:
#version 330 core
layout (location = 0) in vec2 aPos;
layout (location = 1) in vec3 aColor;

out vec3 fColor;

uniform vec2 offsets[100];

void main() {
    vec2 offset = offsets[gl_InstanceID];
    gl_Position = vec4(aPos + offset, 0.0, 1.0);
    fColor = aColor;
}
Here we declare a uniform array called offsets that holds one hundred offset vectors. In the shader, we retrieve the offset for the current instance by indexing the array with gl_InstanceID. As a result, this shader lets us render a hundred quads, each at a different position on the screen.
There is some extra work, though: the offsets array will not fill itself. We fill it in the application before entering the render loop:
glm::vec2 translations[100];
int index = 0;
float offset = 0.1f;
for (int y = -10; y < 10; y += 2) {
    for (int x = -10; x < 10; x += 2) {
        glm::vec2 translation;
        translation.x = (float)x / 10.0f + offset;
        translation.y = (float)y / 10.0f + offset;
        translations[index++] = translation;
    }
}
This creates one hundred translation vectors that form an evenly spaced 10x10 grid.
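To check the grid arithmetic without GLM, here is a sketch of the same loop in plain C++. It uses a hypothetical `Vec2` in place of `glm::vec2` and a hypothetical function name `makeTranslations`. Both x and y step through the ten values -10, -8, ..., 8. Divided by 10 and shifted by 0.1, these become grid centers from -0.9 to 0.9 in steps of 0.2, which spans NDC symmetrically.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for glm::vec2 so the sketch compiles without GLM.
struct Vec2 {
    float x;
    float y;
};

// Mirrors the translation loop: 10 rows (y) x 10 columns (x),
// index 0 is the lower-left cell and index 99 the upper-right one.
std::array<Vec2, 100> makeTranslations() {
    std::array<Vec2, 100> translations{};
    std::size_t index = 0;
    const float offset = 0.1f;
    for (int y = -10; y < 10; y += 2) {
        for (int x = -10; x < 10; x += 2) {
            assert(index < translations.size());
            translations[index++] = {static_cast<float>(x) / 10.0f + offset,
                                     static_cast<float>(y) / 10.0f + offset};
        }
    }
    assert(index == translations.size());  // exactly 100 vectors written
    return translations;
}
```

Calling `makeTranslations()` yields approximately (-0.9, -0.9) at index 0 and (0.9, 0.9) at index 99, with a new row starting every 10 entries.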
Don't forget to upload the generated data to the shader's uniform array:
shader.use();
for (unsigned int i = 0; i < 100; i++) {
    stringstream ss;
    string index;
    ss << i;
    index = ss.str();
    shader.setVec2(("offsets[" + index + "]").c_str(), translations[i]);
}
In this snippet we convert the loop counter i into a string so that we can build the uniform's name dynamically and look up the location of the uniform with that name. For each element of the offsets array, we pass the corresponding generated offset vector.
(Translator's note: if C++11 or newer is available, std::to_string() is a better choice.)
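With C++11, the name-building step shrinks to a single expression. The sketch below shows only the string part. `offsetUniformName` and `offsetUniformNames` are hypothetical helper names and do not belong to the article's `Shader` class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the uniform name "offsets[i]" with std::to_string (C++11),
// replacing the std::stringstream round trip.
std::string offsetUniformName(unsigned int i) {
    return "offsets[" + std::to_string(i) + "]";
}

// Hypothetical helper: every element name of a uniform array of `count` entries.
std::vector<std::string> offsetUniformNames(unsigned int count) {
    std::vector<std::string> names;
    names.reserve(count);
    for (unsigned int i = 0; i < count; ++i) {
        names.push_back(offsetUniformName(i));
    }
    assert(names.size() == count);
    return names;
}
```

Inside the upload loop, the call then becomes `shader.setVec2(offsetUniformName(i).c_str(), translations[i]);`.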
Now that the preparation is done, we can finally move on to rendering. To invoke instanced rendering, remember to use glDrawArraysInstanced or glDrawElementsInstanced. Since the example does not use an index buffer, we use the following code:
glBindVertexArray(quadVAO);
glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 100);
The parameters are the same as those of glDrawArrays, except for the last one, which specifies the number of instances to render. Since we want to draw 100 quads in a 10x10 grid, we pass 100. Running the code should produce the familiar picture of a hundred colorful rectangles.
Instanced Arrays
The previous implementation does its job for this particular case. However, if our appetite grows and we want to render far more than 100 instances, we will soon hit the limit on the amount of uniform data that can be sent to a shader. An alternative to passing data through uniforms is instanced arrays. They are defined as vertex attributes, but their values are fetched only when the instance being rendered changes, not for every vertex. This lets us pass much larger amounts of data, and in a more convenient way.
For regular vertex attributes, the vertex shader fetches new attribute values on each invocation, that is, for each vertex. When a vertex attribute is declared as an instanced array, however, a new value is fetched for each new instance of the object instead of for each vertex. This way we can use ordinary vertex attributes for per-vertex data and instanced arrays for data unique to each instance.
To see how this works, let's modify the example to use an instanced array instead of the uniform array. We'll have to update the vertex shader by adding a new vertex attribute:
#version 330 core
layout (location = 0) in vec2 aPos;
layout (location = 1) in vec3 aColor;
layout (location = 2) in vec2 aOffset;

out vec3 fColor;

void main() {
    gl_Position = vec4(aPos + aOffset, 0.0, 1.0);
    fColor = aColor;
}
Here we no longer use gl_InstanceID and can read the aOffset attribute directly, without indexing into an array.
An instanced array is a vertex attribute, just like the position and color attributes. We therefore need to store its data in a vertex buffer object and configure its attribute pointer. First, we store the contents of the translations array in a new buffer object:
unsigned int instanceVBO;
glGenBuffers(1, &instanceVBO);
glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
glBufferData(GL_ARRAY_BUFFER, sizeof(glm::vec2) * 100, &translations[0], GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
Then we configure the vertex attribute pointer and enable the attribute. The quad's VAO must be bound at this point, since this state is stored in the VAO:
glEnableVertexAttribArray(2);
glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(float), (void*)0);
glBindBuffer(GL_ARRAY_BUFFER, 0);
glVertexAttribDivisor(2, 1); // attribute 2 advances once per instance
This code is familiar, except for the last line, which calls glVertexAttribDivisor. This function tells OpenGL when to advance to the next element of a vertex attribute. Its first parameter is the attribute index, and the second is the attribute divisor. By default the divisor is 0, which means the attribute advances for every vertex processed by the vertex shader. Setting it to 1 tells OpenGL to advance the attribute once per instance. Setting it to 2 advances it once every two instances, and so on. In effect, setting the divisor to 1 declares the attribute an instanced array.
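The divisor rule can be written as a small function. For divisor 0, the element read is the vertex's own index. For a non-zero divisor, it is instance / divisor (integer division), plus the base instance, which is 0 for glDrawArraysInstanced. The sketch models only that rule, and `fetchedElement` and `requiredElements` are hypothetical names.

```cpp
#include <cassert>

// Which element of an attribute's buffer is read for a given vertex and
// instance, under the rule glVertexAttribDivisor sets up (base instance 0).
unsigned int fetchedElement(unsigned int vertexIndex,
                            unsigned int instanceIndex,
                            unsigned int divisor) {
    if (divisor == 0) {
        return vertexIndex;          // ordinary per-vertex attribute
    }
    return instanceIndex / divisor;  // advances once every `divisor` instances
}

// Minimum number of elements an instanced array needs for `instanceCount`
// instances with a non-zero divisor: ceil(instanceCount / divisor).
unsigned int requiredElements(unsigned int instanceCount, unsigned int divisor) {
    assert(divisor > 0 && "per-vertex attributes are sized by vertex count");
    return (instanceCount + divisor - 1) / divisor;
}
```

With divisor 1, as in the example, the 100 offsets in `instanceVBO` are consumed one per instance. With divisor 2, 50 offsets would suffice and pairs of quads would share a position.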
If we now draw the scene using
glDrawArraysInstanced , we get the following picture:
It is exactly the same as before, but this time it is implemented with an instanced array, which lets us pass far more data to the vertex shader for instanced rendering.
Just for fun, let's gradually scale down the quads from the upper-right corner towards the lower-left one, again using gl_InstanceID, because why not?
void main() {
    vec2 pos = aPos * (gl_InstanceID / 100.0);
    gl_Position = vec4(pos + aOffset, 0.0, 1.0);
    fColor = aColor;
}
As a result, the first instances are drawn tiny, and as the instance index approaches 100, each quad's size approaches its original size. Combining instanced arrays with gl_InstanceID like this is perfectly valid.
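The scale factor in that shader is plain arithmetic, so it can be checked on the CPU. In the sketch below, `instanceScale` is a hypothetical mirror of `gl_InstanceID / 100.0` for a 100-instance draw. Note the two extremes. Instance 0, which is the lower-left quad in the translation grid, gets scale 0 and collapses to a point. Instance 99 reaches 99% of the original size, never quite 100%.

```cpp
#include <cassert>

// CPU mirror of `aPos * (gl_InstanceID / 100.0)`: the scale factor
// applied to instance `instanceId` when drawing 100 instances.
float instanceScale(int instanceId) {
    assert(instanceId >= 0 && instanceId < 100);
    return static_cast<float>(instanceId) / 100.0f;
}
```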
If you are not sure you've mastered instanced rendering, or you simply want to study how the whole example fits together, the source code is available here.
All of this is nice, but these examples give only a faint idea of how useful instancing really is. They show the technical details, but the real point of instancing only becomes apparent when rendering an enormous number of similar objects, something we haven't done yet. That is why in the next section we head into outer space to see the true power of instancing firsthand.
Asteroid field
Imagine a scene in which a huge planet is surrounded by a massive asteroid belt. Such a belt could contain thousands or even tens of thousands of rock formations. Rendering such a scene naively quickly becomes almost impossible, even on a good video card. Yet this is exactly the scenario where instancing is the natural choice, since all the asteroids in the belt can be represented by a single model. Each asteroid still looks slightly different from its neighbors thanks to its unique transformation matrix.
To show the benefit of instancing, we first try to render this scene without it. The scene contains a large planet, whose model can be downloaded here, and a large set of asteroids arranged around the planet in a particular way. The asteroid model can be downloaded here.
In the application code, we load the model data using the model loader developed in the model loading lessons.
To achieve the desired layout, we create a unique transformation matrix for each asteroid and use it as that asteroid's model matrix when rendering it. The matrix is built in several steps. First, a translation places the asteroid somewhere within the ring, and a small random displacement makes the distribution look more natural. Next, a random scale and a random rotation around a rotation axis are applied. The resulting matrix places each asteroid somewhere around the planet while giving it a unique look, so the belt ends up filled with a variety of dissimilar rocks.
unsigned int amount = 1000;
glm::mat4 *modelMatrices;
modelMatrices = new glm::mat4[amount];
srand(glfwGetTime()); // initialize the random seed
This code snippet may look daunting, but all it does is place each asteroid in the XZ plane along a circle of the given radius, with a small random displacement in the range (-offset, offset) relative to that circle. The Y coordinate is displaced less, so that the belt stays flat like a ring. Scaling and rotation are then applied, and the result is stored in the modelMatrices array of size amount. In this example, 1000 model matrices are created, one per asteroid.
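The placement step described above can be sketched in plain C++. This is not the article's code. It computes positions only (the article additionally applies random scale and rotation with GLM), uses `<random>` with an explicit seed instead of `rand()`, and works in radians. `Vec3`, `ringPositions`, and the damping factor `yScale` = 0.4 are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Vec3 {
    float x;
    float y;
    float z;
};

// Places `amount` asteroids on a circle of `radius` in the XZ plane, each
// displaced by a random amount in [-offset, offset) per axis. The Y
// displacement is damped by `yScale` (an assumed 0.4) to keep the belt flat.
std::vector<Vec3> ringPositions(unsigned int amount, float radius, float offset,
                                unsigned int seed, float yScale = 0.4f) {
    assert(amount > 0 && offset > 0.0f);
    const float twoPi = 6.28318530718f;
    std::mt19937 rng(seed);  // fixed seed gives a reproducible belt
    std::uniform_real_distribution<float> displacement(-offset, offset);

    std::vector<Vec3> positions;
    positions.reserve(amount);
    for (unsigned int i = 0; i < amount; ++i) {
        // Spread the asteroids evenly around the full circle.
        const float angle = static_cast<float>(i) / static_cast<float>(amount) * twoPi;
        const float x = std::sin(angle) * radius + displacement(rng);
        const float y = displacement(rng) * yScale;
        const float z = std::cos(angle) * radius + displacement(rng);
        positions.push_back({x, y, z});
    }
    return positions;
}
```

For example, `ringPositions(1000, 50.0f, 2.5f, 42u)` (arbitrary values) keeps every asteroid within about 2.5 * sqrt(2) of the circle in XZ and within 1.0 of the plane in Y. Each position would then be fed to a translation, followed by scale and rotation, to build the model matrix.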
After loading the models of the planet and the asteroid, as well as compiling the shaders, you can proceed to the rendering code:
First, we draw the planet model, which has to be translated and scaled a little to fit into the scene. Then we render as many asteroids as there are matrices in the prepared array. Before drawing each asteroid, we have to set the corresponding model matrix uniform.
The result looks like a scene from outer space, with a fairly convincing planet surrounded by an asteroid belt:
This scene issues 1001 draw calls per frame, 1000 of which are for the asteroid model. The source code is available here.
If we start increasing the number of asteroids, we'll quickly notice that the scene no longer renders smoothly and the frame rate drops sharply. By the time we try to render 2000 asteroids, the scene becomes so unresponsive that simply moving around in it is almost impossible.
Now let's try the same thing with instancing. First, we adjust the vertex shader:
#version 330 core
layout (location = 0) in vec3 aPos;
layout (location = 2) in vec2 aTexCoords;
layout (location = 3) in mat4 instanceMatrix;

out vec2 TexCoords;

uniform mat4 projection;
uniform mat4 view;

void main() {
    gl_Position = projection * view * instanceMatrix * vec4(aPos, 1.0);
    TexCoords = aTexCoords;
}
We no longer use a model matrix uniform. Instead, we declare a new vertex attribute of type mat4, which will hold an instanced array of transformation matrices. There is one peculiarity to keep in mind when declaring a vertex attribute whose type is larger than a vec4. Since a mat4 is essentially four vec4s, the attribute occupies four vertex attribute locations. Here we assigned the attribute location 3, so the columns of the matrix get locations 3, 4, 5, and 6.
In the application code, we have to set up an attribute pointer for each of these implicitly defined locations, and we must not forget to configure each of them as an instanced array:
Note that we cheated a little by making VAO a public rather than a private member of the Mesh class, which gives us easy access to the vertex array object. It is not the most elegant or cleanest solution, but it will do for a simple example. Apart from this small hack, the code should be clear. We simply tell OpenGL how to interpret the buffer contents for each of the attribute locations that make up the matrix, and we mark each of them as an instanced array.
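The per-column setup follows a fixed pattern. Each column is a vec4 at location base + i, with a stride of one whole matrix and a byte offset of i vec4s. The plain C++ sketch below computes those arguments without calling OpenGL. It assumes tightly packed, column-major 4x4 float matrices (64 bytes each, as with `glm::mat4`). `ColumnPointer` and `mat4ColumnPointers` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Arguments for the calls that feed one column of a per-instance mat4:
//   glEnableVertexAttribArray(location);
//   glVertexAttribPointer(location, size, GL_FLOAT, GL_FALSE, stride, (void*)offset);
//   glVertexAttribDivisor(location, 1);
struct ColumnPointer {
    unsigned int location;  // base location + column index
    int size;               // components per column: 4 floats
    std::size_t stride;     // bytes between consecutive matrices
    std::size_t offset;     // byte offset of this column inside one matrix
};

// A mat4 attribute at `baseLocation` occupies four consecutive locations.
std::vector<ColumnPointer> mat4ColumnPointers(unsigned int baseLocation) {
    const std::size_t vec4Bytes = 4 * sizeof(float);
    const std::size_t mat4Bytes = 4 * vec4Bytes;
    std::vector<ColumnPointer> columns;
    for (unsigned int column = 0; column < 4; ++column) {
        columns.push_back({baseLocation + column, 4, mat4Bytes, column * vec4Bytes});
    }
    assert(columns.size() == 4);
    return columns;
}
```

For `instanceMatrix` at location 3, this yields locations 3 to 6, all with a stride of 16 floats and offsets of 0, 4, 8, and 12 floats.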
Next, we bind the VAO of the loaded models again and issue the draw call:
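Here we render the same number of asteroids as in the previous example, but now with instancing. Visually, the result is the same, but the real power of instancing shows once we increase the amount. Without instancing, we could smoothly render only about 1000 to 1500 asteroids. With instancing, the count can be raised to 100000. With an asteroid model of 576 vertices, that is roughly 57 million vertices per frame without a significant drop in performance, and with only two draw calls!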
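The draw-call and vertex counts for this scene follow from simple arithmetic. The sketch assumes that each model is a single mesh, as the 1001-calls-for-1000-asteroids figure implies, and takes the asteroid model's vertex count as 576. `drawCallsPerFrame` and `asteroidVerticesPerFrame` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Draw calls per frame: one for the planet plus either one per asteroid
// (plain loop) or a single instanced call for all asteroids.
std::uint64_t drawCallsPerFrame(std::uint64_t asteroids, bool instanced) {
    return 1 + (instanced ? 1 : asteroids);
}

// Vertices processed for the asteroids alone each frame.
std::uint64_t asteroidVerticesPerFrame(std::uint64_t asteroids,
                                       std::uint64_t verticesPerModel) {
    assert(verticesPerModel > 0);
    return asteroids * verticesPerModel;
}
```

Without instancing, 1000 asteroids cost 1001 calls. With instancing, 100000 asteroids cost 2 calls while the GPU processes 100000 * 576 = 57,600,000 asteroid vertices per frame.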
The scene was rendered with 100000 asteroids, radius = 150.0f and offset = 25.0f. The source code is available here.
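On some machines, 100000 asteroids may be too many, so adjust the values until you reach an acceptable frame rate.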
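As you can see, in the right kind of scene, instancing can make an enormous difference to rendering performance. That is why instanced rendering is commonly used for grass, vegetation, particles, and scenes like this one. Essentially any scene with many repeating shapes can benefit from it.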