
Introducing OpenGL Interoperability

Good day to all

I hope that if you are reading this post in a feed reader, my picture did not scare you. Today I want to describe how CUDA can interact with OpenGL, using my small pet project as an example; I described its first version in an earlier article. If you are interested in the topic known in English as CUDA and OpenGL interoperability, please read on.

All the articles you find on Google describe the theory in roughly the same way: classically, before a frame is rendered, the program sets up the scene logic and initialization on the CPU, and the GPU then renders it. Now imagine initializing that same scene in a massively parallel environment. For example, you can generate an array of points in a three-dimensional coordinate system with CUDA, or even draw the whole picture yourself in your kernel, and then use OpenGL simply to display the result on the screen.
That is exactly what I am going to do in a small example. Oh, and if you want more theory than I have described here, you can take a look here. In my task, I want to compute the electric field strength at every point of the current frame and map it to a color. The faster this computation runs, the more frames per second I can generate.

When you start putting the project template together, it is important not to miss a few points:
  1. Some of the functions needed for binding to a buffer require GLEW, so its header must be included above the freeglut header:
    #include <GL/glew.h>
    #include <GL/freeglut.h>
    ...
    #include <cuda_gl_interop.h>
    P.S. If you get the order wrong, the compiler will warn you.
  2. Once you have declared the variables that point to the CUDA resource and the video buffer, be careful not to accidentally free them somewhere in the middle of the code, thinking they are not needed in the current run. I racked my brains for a long time trying to understand where my segfault was coming from:
     GLuint vbo;
     struct cudaGraphicsResource *cuda_vbo_resource;
  3. Now we can use these variables in the functions responsible for associating the frame buffer with CUDA:
     void createVBO(GLuint *vbo, struct cudaGraphicsResource **vbo_res,
                    unsigned int vbo_res_flags)
     {
         unsigned int size = width * height * sizeof(uchar4);
         glGenBuffers(1, vbo);
         glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, *vbo);
         glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, size, NULL, GL_DYNAMIC_DRAW);
         HANDLE_ERROR( cudaGraphicsGLRegisterBuffer(vbo_res, *vbo, vbo_res_flags) );
     }

     void deleteVBO(GLuint *vbo, struct cudaGraphicsResource *vbo_res)
     {
         HANDLE_ERROR( cudaGraphicsUnregisterResource(vbo_res) );
         glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, *vbo);
         glDeleteBuffers(1, vbo);
         *vbo = 0;
     }
    The basic idea is that later in the code you can obtain a pointer to the created buffer as an array of pixels and then modify it in the kernel. Unfortunately, I am not very well versed in OpenGL, so it is still not clear to me why glBindBuffer is often called once more right before deletion, and what the 1 passed as the first argument means. If I understand correctly, as long as I use only one buffer in my application, I can simply delete it when I am done.
  4. The helper functions are in place, so now we can use them to create and release the CUDA resource (and bind the pixel buffer); a rough sketch of the overall initialization order follows this list:
     HANDLE_ERROR( cudaGLSetGLDevice(deviceId) );
     ...
     createVBO(&vbo, &cuda_vbo_resource, cudaGraphicsMapFlagsWriteDiscard);
     ...
     deleteVBO(&vbo, cuda_vbo_resource);
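Roughly, the initialization order looks like the sketch below. This is not a verbatim copy of main() from the project: the window title is a placeholder, and vbo, cuda_vbo_resource, width, height, deviceId and HANDLE_ERROR are the globals and macro from the snippets above.

 int main(int argc, char **argv)
 {
     // The OpenGL context must exist before any CUDA-OpenGL interop calls,
     // so the window is created first.
     glutInit(&argc, argv);
     glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);
     glutInitWindowSize(width, height);
     glutCreateWindow("CUDA + OpenGL interop");

     glewInit();                                  // gives us glGenBuffers & co.
     HANDLE_ERROR( cudaGLSetGLDevice(deviceId) ); // tie CUDA to the GL device

     createVBO(&vbo, &cuda_vbo_resource, cudaGraphicsMapFlagsWriteDiscard);

     glutDisplayFunc(draw);
     glutIdleFunc(idle);
     glutMainLoop();   // by default freeglut calls exit() on window close,
                       // so the cleanup below runs only if that behaviour is changed

     deleteVBO(&vbo, cuda_vbo_resource);
     return 0;
 }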
That seems to be everything! Now we can handle the application's idle event, in which we generate the frame and then simply draw it. Keep in mind that we render the frame directly into the video card's buffer and then just ask it to display the result.
 void idle(void)
 {
     uchar4* dev_screen;
     size_t size;

     HANDLE_ERROR( cudaGraphicsMapResources(1, &cuda_vbo_resource, 0) );
     HANDLE_ERROR( cudaGraphicsResourceGetMappedPointer((void**) &dev_screen, &size, cuda_vbo_resource) );

     // Render Image
     renderFrame<<<blocks, threads>>>(dev_screen);

     HANDLE_ERROR( cudaDeviceSynchronize() );
     HANDLE_ERROR( cudaGraphicsUnmapResources(1, &cuda_vbo_resource, 0) );

     glutPostRedisplay();
 }
P.S. I have stripped out of the code the part responsible for measuring the frame generation time and displaying it in the window title.
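The renderFrame kernel itself is not listed here, so below is a minimal sketch of what such a kernel can look like: each thread computes the coordinates of one pixel and writes a uchar4 color directly into the mapped buffer. The field-strength calculation is deliberately omitted, and WIDTH/HEIGHT are assumed to be compile-time constants only so that the signature matches the renderFrame<<<blocks, threads>>>(dev_screen) call above.

 #define WIDTH  1024
 #define HEIGHT 768

 // Sketch of a per-pixel kernel: each thread fills one pixel of the mapped
 // buffer. The real field-to-color mapping is omitted; a simple gradient is
 // written instead so that the interop plumbing can be verified.
 __global__ void renderFrame(uchar4 *screen)
 {
     int x = blockIdx.x * blockDim.x + threadIdx.x;
     int y = blockIdx.y * blockDim.y + threadIdx.y;
     if (x >= WIDTH || y >= HEIGHT)
         return;

     int offset = y * WIDTH + x;
     unsigned char v = (unsigned char)(255.0f * x / WIDTH);
     screen[offset] = make_uchar4(v, v, 255 - v, 255); // RGBA, as expected by glDrawPixels
 }

With a two-dimensional launch such as dim3 threads(16, 16) and dim3 blocks((WIDTH + 15) / 16, (HEIGHT + 15) / 16), each thread ends up responsible for exactly one pixel.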
Drawing is handled by the draw function, which is called by freeglut automatically or on demand via glutPostRedisplay:
 void draw(void)
 {
     glClearColor(0.0, 0.0, 0.0, 1.0);
     glClear(GL_COLOR_BUFFER_BIT);
     glDrawPixels(width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);
     glutSwapBuffers();
 }
Note that I do not pass an array of pixels holding the image: while a pixel unpack buffer is bound, the last argument of glDrawPixels is treated as an offset into that buffer rather than a host pointer. The buffer has already been filled by the kernel, so we only need to tell OpenGL to show the changes.

When I moved away from a simple structure representing a charge towards float4, I was able to make the code run faster. For example, a 1024×768 frame with 10 charges is now generated in 25-30 ms, which allowed me to create a smooth drag & drop animation with the mouse:
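For the curious, here is roughly what I mean by switching to float4; this is a simplified sketch rather than the exact code from the repository. I assume here that each charge is packed into a single float4 with its position in .x/.y and its charge value in .w, and that the strength at a pixel is the usual Coulomb superposition E = Σ k·qᵢ/rᵢ².

 // Simplified sketch of the per-pixel field evaluation with charges packed
 // into float4 (position in .x/.y, charge value in .w). The exact layout in
 // the real project may differ; this only illustrates the idea.
 __device__ float fieldStrength(float px, float py, const float4 *charges, int count)
 {
     const float k = 8.99e9f;                   // Coulomb constant
     float e = 0.0f;
     for (int i = 0; i < count; ++i) {
         float4 c = charges[i];                 // one 16-byte load per charge
         float dx = px - c.x;
         float dy = py - c.y;
         float r2 = dx * dx + dy * dy + 1e-6f;  // guard against division by zero
         e += k * c.w / r2;                     // E = k * q / r^2, summed over all charges
     }
     return e;
 }

A float4 is read as a single 16-byte transaction per thread, whereas a hand-written struct may be split into several smaller loads, which is presumably where most of the speed-up comes from.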
Well, as always, the finished code can be found here. If you have questions or feedback, write to me; I will try to improve.

Afterword
In the future I plan to add a mode that shows the field strength in the style of iron filings scattered on paper: the idea is to draw short line segments at regular intervals. And if I am really lucky, I will try to implement it in 3D :)

Source: https://habr.com/ru/post/170461/

