Today we will continue studying the graphics pipeline, and I will talk about such wonderful things as the Compute Shader and the Geometry Shader, using the example of a system of 1,000,000+ particles, which are not points but squares (billboard quads) with their own texture. In other words, we will render 2,000,000+ textured triangles at 100+ FPS (on a budget GeForce 550 Ti video card).
Introduction
I have written a lot about shaders in my articles, but so far we have always operated with only two types: the Vertex Shader and the Pixel Shader. However, with the advent of DX10+, new shader types appeared: Geometry Shader, Domain Shader, Hull Shader, Compute Shader. Just in case, let me remind you what the graphics pipeline looks like now:
Let me say right away that in this article we will not touch the Domain Shader and Hull Shader; I will write about tessellation in future articles.
Only the Geometry Shader remains unexplored for now. So what is a Geometry Shader?
Chapter 1: Geometry Shader
The Vertex Shader processes vertices, the Pixel Shader processes pixels, and, as you might guess, the Geometry Shader processes primitives.
This shader is an optional part of the pipeline, i.e. it may be absent entirely: vertices go straight to the Primitive Assembly Stage and the primitive is rasterized. The Geometry Shader sits between the Primitive Assembly Stage and the Rasterizer Stage.
As input, it can receive information both about the assembled primitive and about adjacent primitives:
As output we get a stream of primitives, to which we append primitives ourselves. Moreover, the type of the returned primitive may differ from the input type: for example, we may receive a Point and return a Line. Here is an example of a simple geometry shader that does nothing and simply passes its input through to the output:
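Below is a minimal sketch of such a pass-through shader; the struct name VS_OUT is an assumption, adapt it to your vertex shader's output:

struct VS_OUT
{
    float4 Position : SV_Position;
};

[maxvertexcount(3)]
void GS(triangle VS_OUT input[3], inout TriangleStream<VS_OUT> stream)
{
    // Emit the incoming triangle unchanged: input goes straight to output.
    [unroll]
    for (int i = 0; i < 3; i++)
        stream.Append(input[i]);
    stream.RestartStrip();
}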
Chapter 2: Structured Buffers
DirectX 10+ has a buffer type called the Structured Buffer. Such a buffer can be laid out by the programmer however he pleases; in the classical sense it is a homogeneous array of structures of some type, stored in GPU memory.
Let's try to create such a buffer for our particle system. First, let's describe what properties a particle has (on the C# side):
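A sketch of what this might look like with a SharpDX Toolkit-style API; the Velocity field and the exact Buffer.New helper signature are assumptions:

using System.Runtime.InteropServices;
using SharpDX;

[StructLayout(LayoutKind.Sequential)]
public struct GPUParticleData
{
    public Vector3 Position;   // particle position in world space
    public Vector3 Velocity;   // used later by the compute-shader solver
}

// Create the GPU buffer from an initial array of particles:
particlesBuffer = Buffer.New(
    graphicsDevice,
    initialParticles,
    BufferFlags.ShaderResource | BufferFlags.StructuredBuffer | BufferFlags.UnorderedAccess);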
Here initialParticles is an array of GPUParticleData whose size is the desired number of particles.
It is worth noting the flags that are set when creating the buffer:
BufferFlags.ShaderResource - so the buffer can be read from a shader
BufferFlags.StructuredBuffer - indicates that the buffer is a structured buffer
BufferFlags.UnorderedAccess - so the buffer can be modified from a shader
Let's create a buffer of 1,000,000 elements and fill it with random values:
GPUParticleData[] initialParticles = new GPUParticleData[PARTICLES_COUNT];
for (int i = 0; i < PARTICLES_COUNT; i++)
{
    initialParticles[i].Position = random.NextVector3(new Vector3(-30f, -30f, -30f), new Vector3(30f, 30f, 30f));
}
After that, a buffer of 1,000,000 elements with random values is stored in GPU memory.
Chapter 3: Rendering Point Particles
Now we need to figure out how to draw this buffer. After all, we do not even have vertices! We will generate the vertices on the fly, based on the values in our structured buffer.
Let's create two shaders, a Vertex Shader and a Pixel Shader. First, let's describe the input data for the shaders:
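A sketch of the shader-side declarations, mirroring the C# structure above (register slots and struct names are assumptions):

struct Particle
{
    float3 Position;
    float3 Velocity;
};

StructuredBuffer<Particle> Particles : register(t0);

struct PS_IN
{
    float4 Position : SV_Position;
};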
The magic happens here: we simply read the corresponding particle from the particle buffer by the current VertexID (which ranges from 0 to 999,999) and, using the particle's position, project it into screen space.
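A sketch of such a vertex shader under the declarations above; WorldViewProj is an assumed constant-buffer matrix:

float4x4 WorldViewProj;

PS_IN VS(uint vertexID : SV_VertexID)
{
    PS_IN output;
    // Fetch the particle for this vertex and project its position.
    float3 pos = Particles[vertexID].Position;
    output.Position = mul(float4(pos, 1.0f), WorldViewProj);
    return output;
}

Note that no vertex buffer is bound at all on the C# side: with raw SharpDX, for example, we would set a PointList topology and simply call context.Draw(PARTICLES_COUNT, 0).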
Chapter 4: Quad Billboards
If you have not forgotten the first chapter, we can now safely turn our set of points into full-fledged billboards, each consisting of two triangles.
A few words about what a quad billboard is: it is a square made of two triangles, and this square always faces the camera.
How do we create this square? We need an algorithm for generating such squares quickly. Let's take another look at the Vertex Shader. There we deal with three spaces when building SV_Position:
World Space - vertex position in world coordinates
View Space - vertex position in view coordinates
Projection Space - vertex position in screen coordinates
View Space is exactly what we need, because these coordinates are relative to the camera, and a quad (-1 + px, -1 + py, pz) -> (1 + px, 1 + py, pz) created in this space will always have a normal pointing at the camera.
Therefore, we will output to SV_Position not a projection-space position but a view-space position, so that we can create the new primitives in the Geometry Shader in view space.
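A sketch of the billboard geometry shader: it assumes the vertex shader now multiplies by WorldView only (projection is applied here), and ParticleSize and the UV layout are illustrative choices:

float4x4 Projection;
static const float ParticleSize = 0.5f;

struct GS_OUT
{
    float4 Position : SV_Position;
    float2 UV : TEXCOORD0;
};

[maxvertexcount(4)]
void GS(point PS_IN input[1], inout TriangleStream<GS_OUT> stream)
{
    // Corner offsets of a 4-vertex triangle strip: two triangles forming a quad.
    const float2 offsets[4] = { float2(-1, 1), float2(1, 1), float2(-1, -1), float2(1, -1) };

    GS_OUT v;
    [unroll]
    for (int i = 0; i < 4; i++)
    {
        // Offset the corner in view space, so the quad always faces the camera...
        float3 viewPos = input[0].Position.xyz + float3(offsets[i] * ParticleSize, 0.0f);
        // ...and only now project into clip space.
        v.Position = mul(float4(viewPos, 1.0f), Projection);
        v.UV = offsets[i] * float2(0.5f, -0.5f) + 0.5f;
        stream.Append(v);
    }
    stream.RestartStrip();
}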
Now everything is ready: we have a special buffer in GPU memory and a particle renderer built with the Geometry Shader, but such a system is static. You could, of course, update positions on the CPU, reading the buffer back from the GPU every frame, modifying it and uploading it again, but then what kind of GPU power can we talk about? Such a system would not survive even 100,000 particles.
To work with such buffers on the GPU, there is a special shader: the Compute Shader. It stands outside the traditional render pipeline and can be used on its own.
What is a Compute Shader?
In short, the Compute Shader is a special pipeline stage that replaces all the traditional ones (though it can be used alongside them): it allows you to execute arbitrary code on the GPU and read/write data in buffers (including textures). Moreover, this code executes with exactly as much parallelism as the developer configures.
At the very beginning of the shader code there is the numthreads attribute, which specifies the number of threads per group. For now we will not use multiple threads per group and will keep one thread per group. The uint3 DTiD parameter (SV_DispatchThreadID) identifies the current thread:
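A minimal compute-shader skeleton, as a sketch:

[numthreads(1, 1, 1)]
void CS(uint3 DTiD : SV_DispatchThreadID)
{
    // DTiD.xyz identifies the current thread across the whole dispatch.
}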
The next step is to launch this shader, which is done as follows:
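A sketch using raw SharpDX Direct3D11 calls (a framework may wrap these differently):

context.ComputeShader.Set(computeShader);
context.Dispatch(1, 1, 1); // one group along each dimension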
In the Dispatch method we specify how many thread groups we want; the maximum along each dimension is 65535. If we execute this code, the shader code on the GPU will run once, because we have 1 group of threads with 1 thread per group. If we instead call Dispatch(5, 1, 1), the shader code on the GPU will run five times: 5 groups of threads, 1 thread per group. If we also change numthreads to (5, 1, 1), the code will run 25 times: 5 groups of threads with 5 threads per group. This is easier to see in the picture:
Now, back to the particle system. What do we have? We have a one-dimensional array of 1,000,000 elements, and the task is to update the particles' positions. Since the particles move independently of each other, this task parallelizes very well.
In DX10 (we use this version of CS to support DX10 cards) the maximum total number of threads per thread group is 768, across all three dimensions. I create 32 * 24 * 1 = 768 threads per thread group, i.e. one group can process 768 particles (1 thread per particle). Next we need to calculate how many thread groups are required (given that one group processes 768 particles) to handle N particles. It can be computed by the formula:

numGroups = ceil(N / 768)

For N = 1,000,000 this gives ceil(1,000,000 / 768) = 1303 groups.
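On the C# side this is the usual integer-ceiling idiom (a sketch; names are illustrative):

const int THREADS_PER_GROUP = 32 * 24; // 768, the cs_4_0 per-group maximum
int groupsCount = (PARTICLES_COUNT + THREADS_PER_GROUP - 1) / THREADS_PER_GROUP; // integer ceil(N / 768)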
More magic happens here: we use our particle buffer as a special resource, an RWStructuredBuffer, which means we can both read from and write to this buffer. (!) A prerequisite for writing: the buffer must have been created with the UnorderedAccess flag.
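On the HLSL side the declaration looks as follows (u0 is the only UAV slot available to cs_4_x shaders):

RWStructuredBuffer<Particle> Particles : register(u0);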
And now the final stage: we bind our buffer to the shader as an UnorderedAccessView and call Dispatch:
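A sketch with raw SharpDX calls; the variable names are assumptions:

context.ComputeShader.Set(particleSolver);
context.ComputeShader.SetUnorderedAccessView(0, particlesUAV);
context.Dispatch(groupsCount, 1, 1);
context.ComputeShader.SetUnorderedAccessView(0, null); // unbind, see the note below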
After the dispatch, the UnorderedAccessView must be unbound from the shader, otherwise we will not be able to use the buffer elsewhere (a resource cannot be bound as a UAV and as a shader resource at the same time)!
Let's do something with the particles and write the simplest solver:
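A self-contained sketch of such a solver; the Particle struct matches the one above, and the motion rule itself is just an illustrative integration step:

#define THREADS_X 32
#define THREADS_Y 24
#define PARTICLES_COUNT 1000000

cbuffer Params
{
    float DeltaTime;
};

struct Particle
{
    float3 Position;
    float3 Velocity;
};

RWStructuredBuffer<Particle> Particles : register(u0);

[numthreads(THREADS_X, THREADS_Y, 1)]
void CS(uint3 groupID : SV_GroupID, uint groupIndex : SV_GroupIndex)
{
    // Flatten (group, thread-in-group) into a particle index: 768 particles per group.
    uint index = groupID.x * (THREADS_X * THREADS_Y) + groupIndex;
    if (index >= PARTICLES_COUNT)
        return; // guard the tail of the last, partially filled group

    Particle p = Particles[index];
    p.Position += p.Velocity * DeltaTime; // the simplest possible integration
    Particles[index] = p;
}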
Speaking of particles, nothing prevents you from building a full-fledged and powerful particle system on this basis: the points are easy enough to sort (to get correct transparency), you can apply the soft particles technique when drawing, and you can also light the "non-emissive" particles. Compute shaders are also used to create the Bokeh Blur effect (geometry shaders are needed there as well), to build a Tiled Deferred Renderer, and so on. Geometry shaders, in turn, are useful whenever you need to generate a lot of geometry; the most striking examples are grass and particles. In general, the uses of GS and CS are endless and limited only by the developer's imagination.
Conclusion
Traditionally, I attach the full source code and a demo to the post. P.S. To run the demo you need a video card with DX10 and Compute Shader support.
I am always pleased when people show interest in what I write, and the reaction to an article is very important to me, be it a plus or a minus with a constructive comment: that is how I can tell which topics are more interesting to the Habr community and which are not.