
Writing a shader in AGAL

It's no secret that Flash Player 11 supports GPU-accelerated graphics. The new version introduces the Molehill API, which lets you work with the video card at a fairly low level. On the one hand, this gives your imagination free rein; on the other, it requires a deeper understanding of how modern 3D graphics works.

This article focuses on the shader language AGAL (Adobe Graphics Assembly Language). It is assumed that the reader is familiar with the fundamentals of modern real-time 3D graphics and, ideally, has experience with OpenGL or Direct3D. For everyone else, a short digression follows:

Syntax

The current implementation of AGAL exposes a trimmed-down Shader Model 2.0, i.e. a hardware feature set limited to roughly 2005. But remember that this restricts only the capabilities of the shader program, not the performance of the hardware. Perhaps in future versions of Flash Player the bar will be raised to SM 3.0, and we will be able to render to several targets at once and sample textures directly from the vertex shader, but given Adobe's policy this will happen no earlier than the next generation of mobile devices.

Any AGAL program is essentially written in a low-level assembly language. The language itself is very simple, but it demands a fair amount of attentiveness. Shader code is a set of instructions of the form:
opcode [dst], [src1], [src2] 
which loosely means "execute the opcode command with the parameters src1 and src2, writing the result to dst". A shader can contain up to 256 instructions. dst, src1 and src2 are register names: va, vc, fc, vt, ft, op, oc, v, fs. Each of these registers, with the exception of fs, is a four-component (xyzw or rgba) vector. You can work with individual components of a vector, including swizzling (reordering components):
 dp4 ft0.x, v0.xyzw, v0.yxww 

Let's consider each register type in more detail.

Output registers

As the result of its computation, a vertex shader must write the window-space position of the vertex to the op (output position) register, and a fragment shader must write the final pixel color to oc (output color). In a fragment shader the output can also be discarded with the kil instruction, which is described below.
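A minimal pair of programs that does nothing beyond satisfying these requirements might look like this (a sketch, assuming vc0..vc3 hold a ready transformation matrix and fc0 holds an RGBA color):
 // vertex
 m44 op, va0, vc0   // write the transformed vertex position to op
 // fragment
 mov oc, fc0        // write a constant color to oc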
Attribute registers

A vertex can contain up to 8 vector attributes, which are accessed from the shader through the va registers; their position in the vertex buffer is specified by the Context3D.setVertexBufferAt function. Attribute data can be in the FLOAT_1, FLOAT_2, FLOAT_3, FLOAT_4 and BYTES_4 formats. The number in the name indicates the number of vector components. Note that in the case of BYTES_4 the component values are normalized, i.e. divided by 255.
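On the ActionScript side the binding might look like this (a sketch; the interleaved layout and the variable names are assumptions):
 import flash.display3D.Context3DVertexBufferFormat;
 
 // an interleaved layout of position (3 floats) followed by UV (2 floats);
 // context3D and vertexBuffer are assumed to already exist
 context3D.setVertexBufferAt(0, vertexBuffer, 0, Context3DVertexBufferFormat.FLOAT_3); // va0 = position
 context3D.setVertexBufferAt(1, vertexBuffer, 3, Context3DVertexBufferFormat.FLOAT_2); // va1 = uv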

Interpolator registers

In addition to writing to the op register, the vertex shader can pass up to 8 vectors to the fragment shader through the v registers. The values of these vectors are linearly interpolated across the entire area of the polygon during rasterization. Let's illustrate how interpolators work with a triangle whose vertices store a color attribute that the fragment shader outputs directly:
 // vertex
 mov op, va0   // the position, unchanged
 mov v0, va1   // pass the color attribute to the interpolator
 // fragment
 mov oc, v0    // output the interpolated color



Temporary registers

The vertex and fragment shaders each have up to 8 registers, vt and ft respectively, for storing intermediate results. For example, suppose a fragment shader needs to compute the sum of four vectors received from the vertex program (registers v0..v3):
 add ft0, v0, v1  // ft0 = v0 + v1
 add ft0, ft0, v2 // ft0 = ft0 + v2
 add ft0, ft0, v3 // ft0 = ft0 + v3

As a result, ft0 holds the sum we need, and everything seems fine, but there is an unobvious optimization opportunity here, one directly related to the architecture of the GPU pipeline and partly responsible for its high performance.

Shaders are built around the concept of ILP (instruction-level parallelism), which, as the name suggests, allows several instructions to execute simultaneously. The main condition for this mechanism to work is that the instructions be independent of each other. Applied to the example above:
 add ft0, v0, v1   // ft0 = v0 + v1
 add ft1, v2, v3   // ft1 = v2 + v3
 add ft0, ft0, ft1 // ft0 = ft0 + ft1

The first two instructions can execute at the same time, since they work with independent registers. It follows that the key factor in your shader's performance is not so much the number of instructions as their independence from each other.

Constant registers

Storing numeric constants directly in the shader code is not allowed, i.e. all constants the shader needs must be passed in before the Context3D.drawTriangles call; they become available in the vc (128 vectors) and fc (28 vectors) registers. A constant register can also be addressed by index using square brackets, which is very convenient for skeletal animation or material indexing, as sketched below. It is important to remember that setting shader constants is a relatively expensive operation and should be avoided where possible. For example, there is no point in uploading the projection matrix before rendering each object if it does not change within the current frame.
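A sketch of indexed access in the spirit of skeletal animation (the register layout and the meaning of va2.x are assumptions; indexed constant access applies to vertex constants):
 // vertex: va2.x holds the number of the first constant register of this vertex's bone matrix
 m44 vt0, va0, vc[va2.x] // transform the position by the selected bone matrix
 m44 op, vt0, vc0        // then by the view-projection matrix in vc0..vc3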

Sampler registers

Up to 8 textures can be passed to the fragment shader using the Context3D.setTextureAt function; they are accessed through the corresponding fs registers, which are used exclusively in the tex instruction. Let's slightly change the triangle example: as the second vertex attribute we will pass texture coordinates, and in the fragment shader we will sample the texture at these (by then interpolated) coordinates:
 // vertex
 mov op, va0                 // the position, unchanged
 mov v0, va1                 // pass the texture coordinates to the interpolator
 // fragment
 tex oc, v0, fs0 <2d,linear> // sample the texture at the interpolated coordinates
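The host-side counterpart, as a small ActionScript 3 sketch (context3D and a 256x256 bitmapData are assumed to already exist):
 import flash.display3D.Context3DTextureFormat;
 import flash.display3D.textures.Texture;
 
 var texture:Texture = context3D.createTexture(256, 256, Context3DTextureFormat.BGRA, false);
 texture.uploadFromBitmapData(bitmapData); // upload mip level 0, enough for <2d,linear>
 context3D.setTextureAt(0, texture);       // the shader sees it as fs0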



Operators

At the moment (October 2011), AGAL implements the following operators:
 mov  dst = src1
 neg  dst = -src1
 abs  dst = abs(src1)
 add  dst = src1 + src2
 sub  dst = src1 - src2
 mul  dst = src1 * src2
 div  dst = src1 / src2
 rcp  dst = 1 / src1
 min  dst = min(src1, src2)
 max  dst = max(src1, src2)
 sat  dst = max(min(src1, 1), 0)
 frc  dst = src1 - floor(src1)
 sqt  dst = src1^0.5
 rsq  dst = 1 / (src1^0.5)
 pow  dst = src1^src2
 log  dst = log2(src1)
 exp  dst = 2^src1
 nrm  dst = normalize(src1)
 sin  dst = sin(src1)
 cos  dst = cos(src1)
 slt  dst = (src1 < src2) ? 1 : 0
 sge  dst = (src1 >= src2) ? 1 : 0
 dp3  dst = src1.x*src2.x + src1.y*src2.y + src1.z*src2.z
 dp4  dst = src1.x*src2.x + src1.y*src2.y + src1.z*src2.z + src1.w*src2.w
 crs  cross product:
      dst.x = src1.y*src2.z - src1.z*src2.y
      dst.y = src1.z*src2.x - src1.x*src2.z
      dst.z = src1.x*src2.y - src1.y*src2.x
 m33  multiply a vector by a 3x3 matrix:
      dst.x = dp3(src1, src2[0])
      dst.y = dp3(src1, src2[1])
      dst.z = dp3(src1, src2[2])
 m34  multiply a vector by a 3x4 matrix:
      dst.x = dp4(src1, src2[0])
      dst.y = dp4(src1, src2[1])
      dst.z = dp4(src1, src2[2])
 m44  multiply a vector by a 4x4 matrix:
      dst.x = dp4(src1, src2[0])
      dst.y = dp4(src1, src2[1])
      dst.z = dp4(src1, src2[2])
      dst.w = dp4(src1, src2[3])
 kil  stops drawing the pixel if the value of src1 is negative; an analogue of alpha-test, available only in the fragment shader
 tex  dst = a sample from texture src2 at the coordinates src1, with options given in angle brackets, for example:
      tex ft0, v0, fs0 <2d,repeat,linear,miplinear>
      Available options: type: 2d, cube; filtering: nearest, linear; mipmapping: nomip, miplinear, mipnearest; addressing: clamp, repeat

The remaining operators, including conditional jumps and loops, are planned for future versions of Flash Player. But this does not mean the usual if is out of reach even now: the slt and sge instructions are quite suitable for such tasks, as the sketch below shows.
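A branchless select built on slt (a sketch; fc1.x is an assumed threshold, fc2 and fc3 are two assumed colors, and fc0.z = 1.0 comes from the helper constant vector used later in this article):
 // oc = (ft0.x < fc1.x) ? fc2 : fc3
 slt ft1.x, ft0.x, fc1.x // mask = (ft0.x < threshold) ? 1 : 0
 sub ft1.y, fc0.z, ft1.x // inverse mask = 1 - mask
 mul ft2, fc2, ft1.x     // first branch, weighted by the mask
 mul ft3, fc3, ft1.y     // second branch, weighted by the inverse mask
 add oc, ft2, ft3        // the sum selects one of the two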

Effects

We have covered the basics; now for the most interesting part of the article: putting the new knowledge into practice. As mentioned at the very beginning, the ability to write shaders completely frees the hands of a graphics programmer, i.e. the real limits are only the developer's imagination and mathematical ingenuity. You saw above that the assembly language itself is simple, but behind that simplicity hides the difficulty of finding your way around code you have long forgotten. I therefore highly recommend commenting the key sections of your shader code so you can navigate it quickly when needed.

The blank

The starting point for all the following examples will be a small "blank" in the form of a teapot. Unlike the triangle example, we need a projection-and-camera transformation matrix to create the effect of perspective and of rotating around the object. We will pass it through the constant registers. It is important to remember here that a 4x4 matrix occupies exactly 4 registers, so writing it starting at vc0 occupies vc0..vc3. We will also need a constant vector of numbers frequently used in the shader: (0.0, 0.5, 1.0, 2.0).
In total, the shader's base code looks like this:
 // vertex
 m44 op, va0, vc0  // transform the position by the viewProj matrix
 // fragment
 mov ft0, fc0.xxxz // write the color (0, 0, 0, 1) to ft0
 mov oc, ft0       // output ft0 as the final color
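Host-side, these constants might be uploaded like this (an ActionScript 3 sketch; viewProjMatrix is an assumed Matrix3D, and the transpose flag is set to true, the usual idiom when transforming with m44):
 import flash.display3D.Context3DProgramType;
 
 // vc0..vc3 = the viewProj matrix
 context3D.setProgramConstantsFromMatrix(Context3DProgramType.VERTEX, 0, viewProjMatrix, true);
 // fc0 = the helper vector (0.0, 0.5, 1.0, 2.0)
 context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 0, Vector.<Number>([0.0, 0.5, 1.0, 2.0]));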



Texture mapping

A shader can use up to 8 textures, with an almost unlimited number of samples from them, so in practice the limit loses its significance when you use atlases or cube textures. Let's improve our example: instead of setting the color in the fragment shader, we will fetch it from a texture using the texture coordinates passed from the vertex shader through an interpolator:
 // vertex
 ...
 mov v0, va1 // pass the texture coordinates to the interpolator
 // fragment
 tex ft0, v0, fs0 <2d,repeat,linear,miplinear> // sample the diffuse color



Lambert shading

The most primitive lighting model that imitates reality. It is based on the premise that the intensity of light falling on a surface depends linearly on the cosine of the angle between the incident light vector and the surface normal. Recall from school mathematics that the dot product of unit vectors gives the cosine of the angle between them, so our Lambert lighting formula looks like this:
Lambert = Diffuse * (Ambient + max(0, dot(LightVec, Normal)))
Color = Lambert
where Diffuse is the color of the object at the point (taken from a texture, for example),
Ambient is the background light color,
LightVec is the unit vector from the point to the light source,
Normal is the perpendicular to the surface,
Color is the final pixel color.

The shader gains two new constant parameters: the light source position and the ambient light value:
 // vertex
 ...
 mov v1, va2      // v1 = normal
 sub v2, vc4, va0 // v2 = lightPos - vertex (lightVec)
 // fragment
 ...
 nrm ft1.xyz, v1             // normal ft1 = normalize(lerp_normal)
 nrm ft2.xyz, v2             // lightVec ft2 = normalize(lerp_lightVec)
 dp3 ft5.x, ft1.xyz, ft2.xyz // ft5 = dot(normal, lightVec)
 max ft5.x, ft5.x, fc0.x     // ft5 = max(ft5, 0.0)
 add ft5, fc1, ft5.x         // ft5 = ambient + ft5
 mul ft0, ft0, ft5           // color *= ft5
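The two new constants could be supplied like this (a sketch; the light position and ambient values are placeholders):
 // vc4 = light source position
 context3D.setProgramConstantsFromVector(Context3DProgramType.VERTEX, 4, Vector.<Number>([100.0, 100.0, 100.0, 1.0]));
 // fc1 = ambient light color
 context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 1, Vector.<Number>([0.2, 0.2, 0.2, 1.0]));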



Phong shading

Adds a specular highlight from the light source to the Lambert lighting model. The highlight intensity is determined by a power function of the cosine of the angle between the vector to the light source and the direction obtained by reflecting the observer's view vector about the surface normal.
Phong = pow(max(0, dot(LightVec, reflect(-ViewVec, Normal))), SpecularPower) * SpecularLevel
Color = Lambert + Phong
where ViewVec is the observer's view vector,
SpecularPower is the exponent that determines the size of the highlight,
SpecularLevel is the intensity level of the highlight, or its color,
reflect is the reflection function f(v, n) = 2 * n * dot(n, v) - v.

For complex models it is customary to use Specular and Gloss maps, which define the color/intensity (SpecularLevel) and the highlight size (SpecularPower) across different parts of the model's texture space. In our case we will make do with constant values for the exponent and intensity. To the vertex shader we pass a new parameter, the observer's position, for the subsequent calculation of ViewVec:
 // vertex
 ...
 sub v3, va0, vc5 // v3 = vertex - viewPos (viewVec)
 // fragment
 ...
 nrm ft3.xyz, v3             // viewVec ft3 = normalize(lerp_viewVec)
 // compute reflect(-viewVec, normal)
 dp3 ft4.x, ft1.xyz, ft3.xyz // ft4 = dot(normal, viewVec)
 mul ft4, ft1.xyz, ft4.x     // ft4 *= normal
 add ft4, ft4, ft4           // ft4 *= 2
 sub ft4, ft3.xyz, ft4       // reflect ft4 = viewVec - ft4
 // phong
 dp3 ft6.x, ft2.xyz, ft4.xyz // ft6 = dot(lightVec, reflect)
 max ft6.x, ft6.x, fc0.x     // ft6 = max(ft6, 0.0)
 pow ft6.x, ft6.x, fc2.w     // ft6 = pow(ft6, specularPower)
 mul ft6, ft6.x, fc2.xyz     // ft6 *= specularLevel
 add ft0, ft0, ft6           // color += ft6



Normal mapping

A relatively simple method of simulating surface relief using normal textures. The normal direction in such a texture is usually stored as an RGB value obtained by mapping its coordinates into the 0..1 range (xyz * 0.5 + 0.5). Normals can be represented either in Object Space or in Tangent Space, a relative basis built from the texture coordinates and the vertex normal. The first has a number of sometimes significant drawbacks (high texture memory consumption, since such maps can be neither tiled nor mirror-textured), but it saves instructions. In the example we will use the more flexible and generic Tangent Space variant, for which, in addition to the normal, we need two more basis vectors: Tangent and Binormal. The implementation comes down to transforming the viewVec and lightVec vectors into the TBN (Tangent, Binormal, Normal) basis and then sampling the relative normal from the texture in the fragment shader.
 // vertex
 ...
 // transform lightVec
 sub vt1, vc4, va0 // vt1 = lightPos - vertex (lightVec)
 dp3 vt3.x, vt1, va4
 dp3 vt3.y, vt1, va3
 dp3 vt3.z, vt1, va2
 mov v2, vt3.xyzx  // v2 = lightVec
 // transform viewVec
 sub vt2, va0, vc5 // vt2 = vertex - viewPos (viewVec)
 dp3 vt4.x, vt2, va4
 dp3 vt4.y, vt2, va3
 dp3 vt4.z, vt2, va2
 mov v3, vt4.xyzx  // v3 = viewVec
 // fragment
 tex ft1, v0, fs1 <2d,repeat,linear,miplinear> // ft1 = normalMap(v0)
 // 0..1 to -1..1
 add ft1, ft1, ft1   // ft1 *= 2
 sub ft1, ft1, fc0.z // ft1 -= 1
 nrm ft1.xyz, ft1    // normal ft1 = normalize(normal)
 ...



Toon shading

A type of non-photorealistic lighting model that simulates cartoon-style shading. It can be implemented in many ways, the simplest of which is to select a color from a 1D texture using the cosine from the Lambert model as the coordinate. In our case we will use a 16x1 texture as an example:

 // fragment
 ...
 dp3 ft5.x, ft1.xyz, ft2.xyz       // ft5 = dot(normal, lightVec)
 tex ft0, ft5.xx, fs3 <2d,nearest> // color = toonMap(ft5)



Sphere mapping

The simplest way to simulate reflection, often used for a chrome metal effect. It represents the environment as a texture with a spherical "fisheye" distortion.

The main task is to convert the coordinates of the reflection vector to the corresponding texture coordinates:
uv = (xy / sqrt(x^2 + y^2 + (z + 1)^2)) * 0.5 + 0.5
The multiplication and shift by 0.5 bring the normalized result into the 0..1 texture coordinate space. In the simplest case of a perfectly reflective surface, the map is applied directly; for more complex cases where a diffuse component is required, it is customary to use an approximation of the Fresnel formula. Also, for complex models, Reflection maps are often used to specify the reflection intensity across different parts of the model's texture space.
 // fragment
 ...
 add ft6, ft4, fc0.xxz // ft6 = reflect + (0, 0, 1)
 dp3 ft6.x, ft6, ft6   // ft6 = ft6^2
 rsq ft6.x, ft6.x      // ft6 = 1 / sqrt(ft6)
 mul ft6, ft4, ft6.x   // ft6 = reflect / sqrt(...)
 mul ft6, ft6, fc0.y   // ft6 *= 0.5
 add ft6, ft6, fc0.y   // ft6 += 0.5
 tex ft0, ft6, fs2 <2d,nearest> // color = sphereMap(ft6)


That is probably all for now. The examples presented here mostly describe the properties of an object's material, but shaders find application in other tasks as well: skeletal animation, shadows, water, and other relatively complex things (including non-visual ones). And with properly honed skills, they let you implement fairly complex effects in a short time.


Conclusion

Flash games are easy! An example from the article.

Source: https://habr.com/ru/post/130454/

