The Noise sample code attached to this article includes an implementation of the Perlin noise algorithm, which is useful for generating natural-looking textures, such as marble or clouds, for three-dimensional graphics. The sample includes a test that uses Perlin noise to create a “cloud” image. (For more information about the Perlin noise algorithm, see the Reference materials section.) Two-dimensional and three-dimensional versions of the algorithm are included; that is, the functions take two or three input values and produce one Perlin noise output value.
The Noise sample also includes pseudo-random number generator (RNG) functions whose results are good enough for the resulting image to look random. One-dimensional, two-dimensional, and three-dimensional versions are included; here too, the number of dimensions is the number of input values from which one pseudo-random output value is computed.
Introduction and Motivation
Many applications require some degree of “randomness” or, more precisely, “pseudo-randomness”: sequences of values that appear arbitrary and erratic to a person, that is, “noise”. For repeatability, however, applications often also require that a random number generator reliably produce exactly the same sequence when given the same input value (or set of values).
Most random number generation algorithms satisfy this requirement by making each generated value depend on the previously generated value, with the first value in the sequence derived directly from the input value. This approach is problematic for highly parallel languages such as OpenCL. If each of many compute threads has to wait for a single sequential source of random numbers, the benefits of parallelization and of algorithms based on parallel processing are lost.
One possible solution is to compute a large table of random values in advance. Each parallel thread then computes a unique but deterministic index into this table. For example, an OpenCL kernel that processes an image can select an entry from the precomputed table by calculating an index from the coordinates of the pixel that the kernel is processing or creating.
The disadvantage of this approach is that the random numbers must be generated in a lengthy sequential pass before the parallel algorithm starts, which limits the speedup gained from parallelization. It also requires knowing in advance (before the parallel algorithm runs), at least approximately, how many random numbers are needed. This can be difficult for parallel algorithms that determine dynamically how many random values each thread will use.
The OpenCL kernel-level functions in the Noise code sample use an approach better suited to the way OpenCL partitions work into parallel operations.
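The table approach described above can be sketched in plain C as follows. This is only an illustration, not code from the Noise sample: the names fill_table and table_lookup are hypothetical, and a simple linear congruential generator stands in for whatever sequential generator an application would use. Indexes are unique only if the table has at least width × height entries; otherwise they wrap around.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills the table in one sequential pass.
 * A linear congruential generator is used as a stand-in; note that
 * each value depends on the previous one, so this loop cannot be
 * split across parallel threads. */
void fill_table(uint32_t *table, size_t count, uint32_t seed)
{
    uint32_t state = seed;
    for (size_t i = 0; i < count; ++i) {
        state = state * 1664525u + 1013904223u;
        table[i] = state;
    }
}

/* Hypothetical helper: what each parallel thread would do.
 * The index is derived only from the pixel coordinates, so the
 * same pixel always reads the same table entry. */
uint32_t table_lookup(const uint32_t *table, size_t count,
                      uint32_t x, uint32_t y, uint32_t width)
{
    size_t index = ((size_t)y * width + x) % count;
    return table[index];
}
```

The sequential cost is visible in fill_table: the whole table must exist, and its size must be chosen, before any lookup can run.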
Creating noise and random numbers for OpenCL
In OpenCL, a global work space (an array of work items) is defined using one, two, or three dimensions. Each work item in this global space has a unique set of integer identifiers corresponding to its coordinates along the X, Y, and Z axes of the global space.
The Perlin noise and random number generator functions in the Noise sample create a random number (noise) from up to three input values, which can be the global identifiers of each work item. Alternatively, one or more of the input values can be formed by combining the global identifiers with any data values obtained or computed by the kernel.
For example, the following fragment of OpenCL kernel code creates random numbers based on the two-dimensional global identifier of the work item.
kernel void genRand()
{
    uint x = get_global_id(0);
    uint y = get_global_id(1);

    uint rand_num = ParallelRNG2( x, y );

    ...
Figure 1. Example of using random numbers (two dimensions)
This approach makes it possible to run the random number generator or noise functions in parallel across work items while still obtaining repeatable sequences of values that differ randomly both between work items and between successive values within the same work item. If you need to create multiple two-dimensional sets of values, you can use the three-dimensional functions: the first two input values are taken from the global identifiers of the work item, and the third is obtained by successively incrementing some initial value for each additional value required. The same technique can be extended to produce several sets of three-dimensional random or noise values, as in the following example with Perlin noise.
kernel void multi2dNoise( float fScale, float offset )
{
    float fX = fScale * get_global_id(0);
    float fY = fScale * get_global_id(1);
    float fZ = offset;

    float randResult = Noise_3d( fX, fY, fZ );

    ...
Figure 2. An example of using the Perlin noise algorithm (three dimensions)
Restrictions
The Noise_2d and Noise_3d functions use the same basic Perlin noise algorithm, but their implementations differ, following Perlin's recommendations. (See the first entry in the list of reference materials.) In the Noise sample, only Noise_3d is used in the noise example; the Noise_2d test kernel is included in the Noise.cl file for readers who want to modify the sample and experiment with it.
The Noise_2d and Noise_3d functions should be called with floating-point input values. The values should span a range, for example (0.0, 128.0), which sets the size of the “table” (see Figure 3) of random values. Readers should look at the cloud example to understand how Perlin noise can be transformed into a variety of “natural-looking” images.
The default ParallelRNG function used in the random test produces random-looking results but is not the fastest random number generator algorithm. It is based on the “Wang hash”, which was not originally intended for use as a random number generator. However, some common random number generator functions (a commented-out example is in the Noise.cl file) showed visible repetition when filling two-dimensional images, especially in the low-order bits of the results. Readers can experiment with other (faster) random number generator functions.
The default ParallelRNG function produces only 32-bit unsigned integers. If floating-point values in a range such as (0.0, 1.0) are required, the application must map the results to that range. The random sample maps the unsigned random integer result to the range (0, 255) to create grayscale pixel values, simply by using a binary AND operation to select 8 bits.
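The two mappings mentioned above might look like the following in plain C. This is a sketch under stated assumptions: the function names to_gray and to_unit_float are hypothetical, and the float mapping shown here (top 24 bits, giving a half-open range that includes 0.0 and excludes 1.0) is one common choice, not necessarily what an application using the sample would pick.

```c
#include <stdint.h>

/* Hypothetical helper: keeps the low-order 8 bits with a binary AND,
 * giving a gray level in the range 0..255. */
uint8_t to_gray(uint32_t r)
{
    return (uint8_t)(r & 0xFFu);
}

/* Hypothetical helper: maps a 32-bit result to a float in [0.0, 1.0).
 * Only the top 24 bits are used, because a float mantissa can
 * represent 24-bit integers exactly; 16777216 is 2^24. */
float to_unit_float(uint32_t r)
{
    return (float)(r >> 8) * (1.0f / 16777216.0f);
}
```

Using the high-order bits for the float mapping also sidesteps generators whose low-order bits are the weakest.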
The default ParallelRNG function will not produce all 4,294,967,296 (2^32) unsigned integer values when called successively with the previously generated value as input. Depending on the initial value, the length of the pseudo-random sequence (cycle) ranges from as few as 7,000 unique values to approximately 2 billion values. ParallelRNG has about 20 different cycles. The author considers it unlikely that any work item of an OpenCL kernel will require more successively generated random numbers than are available in the smallest cycle.
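Readers who want to check cycle lengths for a generator of their own could start from a sketch like the one below. It is not part of the Noise sample: cycle_length and toy_flip are hypothetical names, and the routine assumes the step function is a bijection on 32-bit values, so that feeding each output back in as the next input eventually returns to the starting value.

```c
#include <stdint.h>

/* A step function maps the previous value to the next one. */
typedef uint32_t (*step_fn)(uint32_t);

/* Counts how many steps it takes for the sequence that starts at
 * seed to return to seed. Returns 0 if that does not happen within
 * max_steps, which keeps the search bounded for long cycles. */
uint64_t cycle_length(step_fn step, uint32_t seed, uint64_t max_steps)
{
    uint32_t v = seed;
    for (uint64_t n = 1; n <= max_steps; ++n) {
        v = step(v);
        if (v == seed)
            return n;
    }
    return 0;
}

/* Toy step function, used only to exercise the routine:
 * flipping the lowest bit gives cycles of length 2. */
uint32_t toy_flip(uint32_t v)
{
    return v ^ 1u;
}
```

Passing a real generator in place of toy_flip, with a max_steps of a few billion, would be enough to distinguish short cycles from long ones.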
The two-dimensional and three-dimensional versions of this function, ParallelRNG2 and ParallelRNG3, “blend” the cycles by applying a binary XOR operation between the result of the previous ParallelRNG call and the next input value, which changes the cycle lengths. However, this changed behavior has not been analyzed in detail, so readers are advised to verify carefully that the ParallelRNG functions meet the needs of their application.
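The chaining described above can be sketched in plain C. Everything here is an assumption for illustration: rng1 uses a widely circulated form of Thomas Wang's 32-bit integer hash as a stand-in, and the exact body of the sample's ParallelRNG may differ; rng2 and rng3 follow the XOR chaining described in the text, but are not copied from Noise.cl.

```c
#include <stdint.h>

/* Stand-in for a one-input generator: a commonly cited form of
 * Wang's 32-bit integer hash (assumed, not taken from the sample). */
uint32_t rng1(uint32_t a)
{
    a = (a ^ 61u) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2du;
    a = a ^ (a >> 15);
    return a;
}

/* Two passes for two inputs: the result of the first pass is
 * XORed with the next input value before the second pass. */
uint32_t rng2(uint32_t x, uint32_t y)
{
    return rng1(rng1(x) ^ y);
}

/* Three passes for three inputs, chained the same way. */
uint32_t rng3(uint32_t x, uint32_t y, uint32_t z)
{
    return rng1(rng2(x, y) ^ z);
}
```

Because every step of this hash is invertible, two calls that share all but the last input always return different values; for example, rng2(5, 1) and rng2(5, 2) cannot collide.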
Project structure
This section lists only the main elements of the sample application source code.
NoiseMain.cpp:
main ()
Main entry point. After parsing the command-line options, it initializes OpenCL, builds the OpenCL program from the Noise.cl file, prepares one of the kernels for launch, and calls ExecuteNoiseKernel () and then ExecuteNoiseReference (). After checking that the two implementations produce the same results, main () reports the execution time of each and saves the resulting images.
ExecuteNoiseKernel ()
Sets up and runs the selected Noise kernel in OpenCL.
ExecuteNoiseReference ()
Sets up and runs the selected Noise reference code in C.
Noise.cl:
defaut_perm [256]
Table of random values 0–255 for the three-dimensional Perlin noise kernel. For additional randomness, this table could instead be generated and passed to the Perlin noise kernel.
grads2d [16]
16 uniformly distributed unit vectors: gradients for the two-dimensional Perlin noise kernel.
grads3d [16]
16 vector gradients for the three-dimensional Perlin noise kernel.
ParallelRNG ()
Pseudo-random number generator; one pass for one input value. An alternative random number generator function is included in the file but commented out, in case readers want to test a function that runs faster but produces poorer results.
ParallelRNG2 ()
A random number generator making 2 passes for 2 input values.
ParallelRNG3 ()
A random number generator that makes 3 passes for 3 input values.
weight_poly3 (), weight_poly5 () and WEIGHT ()
These are alternative weight functions used by the Perlin noise algorithm to ensure continuity of gradients. The second (preferred) function also ensures the continuity of the second derivative. The WEIGHT macro selects which function to use.
NORM256 ()
A macro that converts a range (0, 255) to (-1.0, 1.0).
interp ()
Bilinear interpolation using OpenCL.
hash_grad_dot2 ()
Selects a gradient and computes its dot product with the XY input values as part of the Perlin noise function Noise_2d.
Noise_2d ()
Perlin noise generator with two input values.
hash_grad_dot3 ()
Selects a gradient and computes its dot product with the XYZ input values as part of the Perlin noise function Noise_3d.
Noise_3d ()
Perlin noise generator with three input values.
cloud ()
Creates one pixel of the cloud output image for CloudTest using Noise_3d.
map256 ()
Converts the Perlin noise output range (-1.0, 1.0) to the range (0, 255) required for grayscale pixels.
CloudTest ()
Test that creates the cloud image. The slice parameter is passed to the cloud function so that the host code can create other cloud images.
Noise2dTest ()
Test of Noise_2d; not used by default.
Noise3dTest ()
Test of Noise_3d, the default Perlin noise function. Uses map256 to get grayscale pixel values.
RandomTest ()
Test of ParallelRNG3; uses the low-order byte of the unsigned integer result to produce a grayscale image.
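To show how the pieces listed above fit together, here is a minimal two-dimensional gradient noise sketch in plain C. It borrows the function names from the list for readability, but the bodies are assumptions, not the code in Noise.cl: the weight polynomials are the standard cubic and quintic forms, hash2 is a hypothetical integer hash used in place of a permutation table, and only 8 gradient directions are used instead of 16 to keep the example short.

```c
#include <stdint.h>

/* Cubic weight 3t^2 - 2t^3: continuous first derivative. */
float weight_poly3(float t) { return t * t * (3.0f - 2.0f * t); }

/* Quintic weight 6t^5 - 15t^4 + 10t^3: the second derivative is
 * continuous as well. */
float weight_poly5(float t)
{
    return t * t * t * (t * (6.0f * t - 15.0f) + 10.0f);
}

/* Linear interpolation between a and b with weight w. */
float interp(float w, float a, float b) { return a + w * (b - a); }

/* Hypothetical stand-in for the permutation-table lookup. */
uint32_t hash2(int32_t ix, int32_t iy)
{
    uint32_t h = ((uint32_t)ix * 0x9E3779B1u) ^ ((uint32_t)iy * 0x85EBCA77u);
    h ^= h >> 15;
    h *= 0x2C1B3C6Du;
    h ^= h >> 12;
    return h;
}

/* 8 evenly spaced unit vectors (the sample's table has 16). */
static const float grads[8][2] = {
    { 1.0f, 0.0f }, { 0.70710678f, 0.70710678f },
    { 0.0f, 1.0f }, { -0.70710678f, 0.70710678f },
    { -1.0f, 0.0f }, { -0.70710678f, -0.70710678f },
    { 0.0f, -1.0f }, { 0.70710678f, -0.70710678f }
};

/* Picks a gradient for lattice point (ix, iy) and returns its dot
 * product with the offset (dx, dy) from that lattice point. */
float hash_grad_dot2(int32_t ix, int32_t iy, float dx, float dy)
{
    const float *g = grads[hash2(ix, iy) & 7u];
    return g[0] * dx + g[1] * dy;
}

/* Floor for values in int32 range, without needing the math library. */
int32_t ifloor(float v)
{
    int32_t i = (int32_t)v;
    return ((float)i > v) ? i - 1 : i;
}

/* Two-dimensional gradient noise: blend the four corner
 * contributions of the lattice cell that contains (x, y). */
float noise_2d(float x, float y)
{
    int32_t ix = ifloor(x), iy = ifloor(y);
    float dx = x - (float)ix, dy = y - (float)iy;
    float wx = weight_poly5(dx), wy = weight_poly5(dy);

    float n00 = hash_grad_dot2(ix,     iy,     dx,        dy);
    float n10 = hash_grad_dot2(ix + 1, iy,     dx - 1.0f, dy);
    float n01 = hash_grad_dot2(ix,     iy + 1, dx,        dy - 1.0f);
    float n11 = hash_grad_dot2(ix + 1, iy + 1, dx - 1.0f, dy - 1.0f);

    return interp(wy, interp(wx, n00, n10), interp(wx, n01, n11));
}

/* Maps a noise value in (-1.0, 1.0) to a gray level in 0..255. */
uint8_t map256(float v)
{
    float g = (v + 1.0f) * 127.5f;
    if (g < 0.0f) g = 0.0f;
    if (g > 255.0f) g = 255.0f;
    return (uint8_t)g;
}
```

Noise of this kind is exactly zero at integer lattice points, which is one reason the inputs need to be scaled floating-point values that span a range rather than raw integer pixel coordinates.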
Two Microsoft Visual Studio solution files for Visual Studio versions 2012 and 2013 are provided. These are the Noise_2012.sln and Noise_2013.sln files. If the reader uses a newer version of Visual Studio, it should be possible to use the Visual Studio solution and project update feature to create a new solution based on these files.
Both solutions assume that Intel® OpenCL Code Builder is installed on the system.
Controlling the sample
You can run the sample from the Microsoft Windows* command line, from the folder that contains the .exe file.
Noise.exe <>
Options
-h --help
Displays help on the command line. The demos are not run.
-t --type [ all | cpu | gpu | acc | default | <OpenCL device type constant> ]
Selects the type of device on which the OpenCL kernel runs. The default value is all.
<OpenCL device type constant>
CL_DEVICE_TYPE_ALL | CL_DEVICE_TYPE_CPU | CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR | CL_DEVICE_TYPE_DEFAULT
-p --platform < >
Selects the platform to use. When the demo starts, a list of all platform numbers and names is displayed, with [Selected] to the right of the platform in use. When specifying a string, provide enough letters to identify the platform name uniquely. The default is Intel.
-d --device < >
Selects the device on which the OpenCL kernels run, by number or name. When the demo starts, a list of all device numbers and names on the platform is displayed, with [Selected] to the right of the current device. The default value is 0.
-r --run [ random | perlin | clouds ]
Selects the demo to run. There are demo kernels for the random number generator, the Perlin noise algorithm, and the cloud image generator. The default is random.
-s --seed < >
Integer input value that changes the result of the algorithm. The default value is 1.
Noise.exe displays the execution time of the OpenCL kernel and of the equivalent C reference code, as well as the names of the output files of both. When the output is complete, the program waits for the user to press ENTER before exiting. Note that the C reference code has not been optimized; its only purpose is to verify the correctness of the OpenCL kernel code.
Results analysis
After Noise.exe completes, view the OutputOpenCL.bmp and OutputReference.bmp image files in the working folder to compare the results of the OpenCL code and the reference code. The two images should be identical, although very slight differences are possible between the two Perlin noise images or between the two cloud images.
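Because slight differences are possible for the floating-point demos, a comparison of the two outputs is more useful with a tolerance than with an exact match. The sketch below is an assumption about how such a check could be written, not a description of how main () in the sample compares results; max_abs_diff is a hypothetical name, and the buffers are assumed to hold the decoded grayscale pixel bytes of the two images.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the largest absolute per-byte difference between two
 * pixel buffers of the same length n. A result of 0 means the
 * images are identical. */
unsigned max_abs_diff(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned worst = 0;
    for (size_t i = 0; i < n; ++i) {
        unsigned d = (a[i] > b[i]) ? (unsigned)(a[i] - b[i])
                                   : (unsigned)(b[i] - a[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A caller might accept the images as matching when the result is at most one gray level for the Perlin noise and cloud demos, and require 0 for the integer-only random demo.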
The result of the noise algorithm (Perlin noise) should look like that shown in Figure 3.
Figure 3. The result of the Perlin noise algorithm
The result of the random algorithm (random number generator) should look like that shown in Figure 4.
Figure 4. The result of the random number generator
The result of the cloud algorithm should look like that shown in Figure 5.
Figure 5. The result of the cloud creation algorithm
Reference materials
- K. Perlin, “Improving Noise”
- "Hashing 4-byte integers"
- M. A. Overton, “Fast, High-Quality Parallel Random Number Generators,” Dr. Dobb's website (2011)
- Implementing and Using the Intel® Digital Random Number Generator (DRNG) Library
- Intel License Agreement to Use Source Code Samples
- Intel® OpenCL ™ Code Builder