
OpenGL ES 2.0. One million particles

In this article we will look at one way to implement a particle system on OpenGL ES 2.0. We will discuss the limitations in detail, describe the underlying principles, and walk through a small example.


Restrictions



In general, we will require two additional capabilities from OpenGL ES 2.0 that the specification does not guarantee: access to texture units from the vertex shader (vertex TIU) and high-precision (highp) floating-point calculations in the fragment shader.

GPU information:

GPU                       | Vertex TIU | Precision (bits) | Range
--------------------------|------------|------------------|----------------
Snapdragon Adreno 2xx     | 4          | 23               | [-2^62, 2^62]
Snapdragon Adreno 3xx     | 16         | 23               | [-2^127, 2^127]
Snapdragon Adreno 4xx     | 16         | 23               | [-2^127, 2^127]
Snapdragon Adreno 5xx     | 16         | 23               | [-2^127, 2^127]
Intel HD Graphics         | 16         | 23               | [-2^127, 2^127]
ARM Mali-T6xx             | 16         | 23               | [-2^127, 2^127]
ARM Mali-T7xx             | 16         | 23               | [-2^127, 2^127]
ARM Mali-T8xx             | 16         | 23               | [-2^127, 2^127]
NVIDIA Tegra 2/3/4        | 0          | 0                | 0
NVIDIA Tegra K1 / X1      | 32         | 23               | [-2^127, 2^127]
PowerVR SGX (Series5)     | 8          | 23               | [-2^127, 2^127]
PowerVR SGX (Series5XT)   | 8          | 23               | [-2^127, 2^127]
PowerVR Rogue (Series6)   | 16         | 23               | [-2^127, 2^127]
PowerVR Rogue (Series6XT) | 16         | 23               | [-2^127, 2^127]
VideoCore IV              | 8          | 23               | [-2^127, 2^127]
Vivante GC1000            | 4          | 23               | [-2^127, 2^127]
Vivante GC4000            | 16         | 23               | [-2^127, 2^127]


There is a problem with NVIDIA Tegra 2/3/4: a number of popular devices, such as the Nexus 7, HTC One X and ASUS Transformer, are built on this series.
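
The article does not show host-side code, but both capabilities can be queried at run time through the standard glGetIntegerv and glGetShaderPrecisionFormat queries. A minimal C sketch (the helper name is ours, not from the article):

#include <GLES2/gl2.h>
#include <stdio.h>

/* Hypothetical helper: verifies the two assumptions above. */
int check_capabilities(void)
{
    GLint vertex_tiu = 0;
    glGetIntegerv(GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, &vertex_tiu);

    GLint range[2] = {0, 0}; /* log2 of the min/max representable values */
    GLint precision = 0;     /* bits of effective mantissa precision */
    glGetShaderPrecisionFormat(GL_FRAGMENT_SHADER, GL_HIGH_FLOAT,
                               range, &precision);

    printf("Vertex TIU: %d, highp: %d bits, range: [-2^%d, 2^%d]\n",
           vertex_tiu, precision, range[0], range[1]);

    /* We need at least one vertex texture unit and true highp floats. */
    return vertex_tiu > 0 && precision >= 23;
}

A device from the Tegra 2/3/4 series would fail the vertex TIU check here.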

Particle system



For particle systems simulated on the CPU, as the amount of processed data (the number of particles) grows, the main performance problem is copying (uploading) the data from main memory to video memory on every frame. Our primary task, therefore, is to avoid this copying by moving the calculations entirely onto the graphics processor.
Recall that OpenGL ES 2.0 has no built-in mechanisms such as Transform Feedback (available in OpenGL ES 3.0) or Compute Shaders (available in OpenGL ES 3.1) that would let us perform such calculations on the GPU.


The essence of the method is to use textures as the data buffers that store the values characterizing a particle (coordinates, accelerations, etc.), and to process them with vertex and fragment shaders, in much the same way that normals are stored and sampled in normal mapping. The size of the buffer, in our case, is proportional to the number of particles being processed. Each texel stores a separate value (or several) for an individual particle; accordingly, for a fixed texture size, the number of quantities processed is inversely related to the number of particles. For example, to process positions and accelerations for 1,048,576 particles we need two 1024x1024 textures (assuming there is no need to preserve an aspect ratio).

There are additional restrictions we need to take into account. To be able to write anything into a texture, its pixel format must be supported by the implementation as a color-renderable format. This means the texture can be attached as the color buffer of a framebuffer object for render-to-texture. The specification describes only three such formats: GL_RGBA4, GL_RGB5_A1 and GL_RGB565. Given our problem domain, we need at least 32 bits per pixel to represent values such as coordinates or accelerations (in the two-dimensional case), so the formats listed above are not enough for us.

To provide the necessary minimum, let us consider two additional texture formats: GL_RGBA8 and GL_RGBA16F. Such textures are often called LDR (SDR) and HDR textures, respectively.


According to GPUINFO data for 2013-2015, support for the relevant extensions is as follows:

Extension                        | Devices (%)
---------------------------------|------------
OES_rgb8_rgba8                   | 98.69%
GL_OES_texture_half_float        | 61.5%
GL_OES_texture_half_float_linear | 43.86%
GL_EXT_color_buffer_half_float   | 32.78%


Generally speaking, HDR textures suit our purposes better. First, they allow us to process more information without sacrificing performance, for example, to manipulate particles in three-dimensional space without increasing the number of buffers. Second, they need no intermediate mechanisms for unpacking and packing the data on read and write. However, because HDR texture support is so much weaker, we will choose LDR.
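
Host-side setup is not shown in the article; the following is a minimal C sketch, assuming a square GL_RGBA8 buffer, of creating a data texture and verifying that it is actually renderable by attaching it to an FBO (the helper name is ours):

#include <GLES2/gl2.h>

/* Hypothetical helper: returns the texture and stores the FBO in
   *out_fbo; returns 0 on failure. */
GLuint create_data_buffer(GLsizei size, GLuint *out_fbo)
{
    GLuint texture, fbo;

    glGenTextures(1, &texture);
    glBindTexture(GL_TEXTURE_2D, texture);
    /* No filtering: every texel is an independent particle record. */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    /* In core ES 2.0 an RGBA8 texture is requested as GL_RGBA +
       GL_UNSIGNED_BYTE; OES_rgb8_rgba8 (see the table above) covers
       the renderable 8-bit format. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, size, size, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, texture, 0);

    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
        return 0; /* the format is not color-renderable on this device */

    *out_fbo = fbo;
    return texture;
}

The glCheckFramebufferStatus call is what actually tells us whether the chosen format is color-renderable on a given device.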

So, returning to the subject, the general scheme of what we are going to do looks like this:

[diagram: the general scheme of computation and rendering passes]

The first thing we need to do is split the calculations into passes. The split depends on the number and type of the characterizing values we are going to process. Since the data buffer is a texture, and given the pixel-format restrictions described above, each pass can process no more than 32 bits of information per particle. For example, the first pass computes the accelerations (32 bits, 16 bits per component), and the second updates the positions (32 bits, 16 bits per component).

Each pass processes its data in double-buffered (ping-pong) mode, which provides access to the state of the system from the previous frame; a sketch of this scheme follows below.
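
An illustrative C sketch of the ping-pong scheme (the types and names are ours, not from the article): each pass renders into one buffer while reading the previous state from the other, then swaps their roles:

#include <GLES2/gl2.h>

typedef struct {
    GLuint texture[2]; /* data buffers (see create_data_buffer above) */
    GLuint fbo[2];     /* FBOs with the textures attached */
    int current;       /* index of the buffer holding the latest state */
} pass_t;

void pass_run(pass_t *pass, GLuint program)
{
    int src = pass->current;
    int dst = 1 - src;

    /* Render into the destination buffer... */
    glBindFramebuffer(GL_FRAMEBUFFER, pass->fbo[dst]);
    glUseProgram(program);

    /* ...reading the previous state as a texture. */
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, pass->texture[src]);
    glUniform1i(glGetUniformLocation(program, "u_prev_state"), 0);

    /* Full-screen quad as two triangles; vertex attributes are
       assumed to be set up beforehand. */
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);

    pass->current = dst; /* swap roles for the next frame */
}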

The core of a pass is ordinary texture mapping onto two triangles covering the whole buffer, with our data buffers acting as the texture maps. The general form of the shaders is as follows:
// Vertex shader
attribute vec2 a_vertex_xy;
attribute vec2 a_vertex_uv;

varying vec2 v_uv;

void main()
{
    gl_Position = vec4(a_vertex_xy, 0.0, 1.0);
    v_uv = a_vertex_uv;
}

// Fragment shader
precision highp float;

varying vec2 v_uv;

// State of the processed quantity from the previous frame
// (double buffering)
uniform sampler2D u_prev_state;

// Results of the preceding passes of the current frame (if any)
uniform sampler2D u_pass_0;
...
uniform sampler2D u_pass_n;

// Unpacking functions
<type>   unpack(vec4 raw);
<type_0> unpack_0(vec4 raw);
...
<type_n> unpack_n(vec4 raw);

// Packing function
vec4 pack(<type> data);

void main()
{
    // Read the values of the particle addressed by v_uv
    <type>   data        = unpack(texture2D(u_prev_state, v_uv));
    <type_0> data_pass_0 = unpack_0(texture2D(u_pass_0, v_uv));
    ...
    <type_n> data_pass_n = unpack_n(texture2D(u_pass_n, v_uv));

    // Compute the new value
    <type> result = ...

    // Pack it and write it out
    gl_FragColor = pack(result);
}


The implementation of the unpacking/packing functions depends on the quantities we process. At this stage we rely on the high-precision requirement described at the beginning: a 16-bit fixed-point value in [0, 1) is split into, and reassembled from, two 8-bit channels.
For example, for two-dimensional coordinates (16 bits per [x, y] component), the functions might look like this:
vec4 pack(vec2 value)
{
    vec2 shift = vec2(255.0, 1.0);
    vec2 mask = vec2(0.0, 1.0 / 255.0);
    vec4 result = fract(value.xxyy * shift.xyxy);
    return result - result.xxzz * mask.xyxy;
}

vec2 unpack(vec4 value)
{
    vec2 shift = vec2(1.0 / 255.0, 1.0);
    return vec2(dot(value.xy, shift), dot(value.zw, shift));
}

Drawing



After the computation stage comes the rendering phase. To access the particles at this stage we need a kind of external index. A vertex buffer (Vertex Buffer Object) holding the texture coordinates of the data buffer will serve as this index. The index is created and initialized (uploaded to video memory) once and does not change afterwards.
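
A sketch of building such an index (a hypothetical helper, assuming a square size x size data buffer): one UV pair per particle, pointing at the center of the particle's texel:

#include <GLES2/gl2.h>
#include <stdlib.h>

GLuint create_index_vbo(GLsizei size)
{
    GLsizei count = size * size; /* one UV pair per particle */
    GLfloat *uv = (GLfloat *)malloc((size_t)count * 2 * sizeof(GLfloat));

    for (GLsizei i = 0; i < count; ++i) {
        /* Address the center of particle i's texel. */
        uv[2 * i + 0] = ((GLfloat)(i % size) + 0.5f) / (GLfloat)size;
        uv[2 * i + 1] = ((GLfloat)(i / size) + 0.5f) / (GLfloat)size;
    }

    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* Uploaded once, never modified afterwards. */
    glBufferData(GL_ARRAY_BUFFER, count * 2 * (GLsizei)sizeof(GLfloat),
                 uv, GL_STATIC_DRAW);
    free(uv);
    return vbo;
}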

At this step the requirement of texture access from the vertex shader comes into play. The vertex shader is similar to the fragment shader from the computation stage:
// Vertex shader

// Index: texture coordinates into the data buffers
attribute vec2 a_data_uv;

// Particle positions
uniform sampler2D u_positions;

// Other particle data (if any)
uniform sampler2D u_data_0;
...
uniform sampler2D u_data_n;

// Position unpacking function
vec2 unpack(vec4 data);

// Unpacking functions for the remaining quantities
<type_0> unpack_0(vec4 data);
...
<type_n> unpack_n(vec4 data);

void main()
{
    // Read and unpack the particle position
    vec2 position = unpack(texture2D(u_positions, a_data_uv));
    gl_Position = vec4(position * 2.0 - 1.0, 0.0, 1.0);

    // Read and unpack the remaining quantities
    <type_0> data_0 = unpack_0(texture2D(u_data_0, a_data_uv));
    ...
    <type_n> data_n = unpack_n(texture2D(u_data_n, a_data_uv));
}
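
On the host side, drawing then reduces to a single call over the index (an illustrative sketch; the names are ours):

#include <GLES2/gl2.h>

void draw_particles(GLuint program, GLuint index_vbo,
                    GLuint positions_tex, GLsizei particle_count)
{
    glUseProgram(program);

    /* Expose the position buffer to the vertex shader as a texture. */
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, positions_tex);
    glUniform1i(glGetUniformLocation(program, "u_positions"), 0);

    /* Feed the index VBO into the a_data_uv attribute. */
    GLint loc = glGetAttribLocation(program, "a_data_uv");
    glBindBuffer(GL_ARRAY_BUFFER, index_vbo);
    glEnableVertexAttribArray((GLuint)loc);
    glVertexAttribPointer((GLuint)loc, 2, GL_FLOAT, GL_FALSE, 0, 0);

    /* One point per particle; positions come from the texture. */
    glDrawArrays(GL_POINTS, 0, particle_count);
}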

Example



As a small example, we will generate a dynamic system of 1,048,576 particles known as a strange attractor.

[images: strange attractors]

Frame processing consists of several stages:

[diagram: frame processing stages]

Computation stage



At the computation stage we have only one independent pass, responsible for the positions of the particles. It is based on a simple formula:
X(n+1) = sin(a * Y(n)) - cos(b * X(n))
Y(n+1) = sin(c * X(n)) - cos(d * Y(n))


Such a system is also known as the Peter de Jong attractor. Over time we will change only the coefficients a, b, c, d.
// Vertex shader
attribute vec2 a_vertex_xy;

varying vec2 v_uv;

void main()
{
    gl_Position = vec4(a_vertex_xy, 0.0, 1.0);
    v_uv = a_vertex_xy * 0.5 + 0.5;
}

// Fragment shader
precision highp float;

varying vec2 v_uv;

uniform lowp float u_attractor_a;
uniform lowp float u_attractor_b;
uniform lowp float u_attractor_c;
uniform lowp float u_attractor_d;

vec4 pack(vec2 value)
{
    vec2 shift = vec2(255.0, 1.0);
    vec2 mask = vec2(0.0, 1.0 / 255.0);
    vec4 result = fract(value.xxyy * shift.xyxy);
    return result - result.xxzz * mask.xyxy;
}

void main()
{
    // Map the texture coordinates to the attractor domain [-2, 2]
    vec2 pos = v_uv * 4.0 - 2.0;

    // A few iterations of the Peter de Jong map
    for (int i = 0; i < 3; ++i)
    {
        pos = vec2(sin(u_attractor_a * pos.y) - cos(u_attractor_b * pos.x),
                   sin(u_attractor_c * pos.x) - cos(u_attractor_d * pos.y));
    }
    pos = clamp(pos, vec2(-2.0), vec2(2.0));

    // Pack the position, remapped back to [0, 1]
    gl_FragColor = pack(pos * 0.25 + 0.5);
}
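
The coefficients u_attractor_a through u_attractor_d can be animated from the host side. A hypothetical C sketch (the particular base values and functions of time are our illustration, not the article's):

#include <GLES2/gl2.h>
#include <math.h>

void update_attractor(GLuint program, float t)
{
    /* Any slowly varying functions of time will do here. */
    glUseProgram(program);
    glUniform1f(glGetUniformLocation(program, "u_attractor_a"),
                1.4f + 0.2f * sinf(0.10f * t));
    glUniform1f(glGetUniformLocation(program, "u_attractor_b"),
               -2.3f + 0.2f * sinf(0.13f * t));
    glUniform1f(glGetUniformLocation(program, "u_attractor_c"),
                2.4f + 0.2f * sinf(0.17f * t));
    glUniform1f(glGetUniformLocation(program, "u_attractor_d"),
               -2.1f + 0.2f * sinf(0.19f * t));
}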

Rendering stage



At the rendering stage we will draw our particles as ordinary point sprites.
// Vertex shader

// Index
attribute vec2 a_positions_uv;

// Positions (here, the result of the single computation pass)
uniform sampler2D u_positions;

varying vec4 v_color;

vec2 unpack(vec4 value)
{
    vec2 shift = vec2(0.00392156863, 1.0); // 1.0 / 255.0
    return vec2(dot(value.xy, shift), dot(value.zw, shift));
}

void main()
{
    vec2 position = unpack(texture2D(u_positions, a_positions_uv));
    gl_Position = vec4(position * 2.0 - 1.0, 0.0, 1.0);
    // Note: when rendering GL_POINTS, ES 2.0 leaves the point size
    // undefined unless the shader writes it explicitly
    gl_PointSize = 1.0;
    v_color = vec4(0.8);
}

// Fragment shader
precision lowp float;

varying vec4 v_color;

void main()
{
    gl_FragColor = v_color;
}

Result: [image]

Postprocessing



Finally, at the post-processing stage, we will apply several effects.

Gradient mapping. Adds color based on the brightness of the original image.
Result: [image]


Bloom . Adds a slight glow.
Result: [image]

Code



Project repository on GitHub.
Currently available:

Continuation



In this example we can see that the computation stage, the rendering stage and the post-processing stage each consist of several dependent passes.
In the next part we will try to cover the implementation of multi-pass rendering, taking into account the requirements imposed by each stage.

I would be glad to hear comments and suggestions (you can email yegorov.alex@gmail.com).
Thanks!

Source: https://habr.com/ru/post/303142/

