
CUDA: Getting Started


This is the first publication in a series of articles on GPGPU and nVidia CUDA. I plan to keep each post fairly short, so as not to tire the readers too much, but to publish them often enough.

I assume the reader already knows what CUDA is; if not, an introductory article can be found on Habr.

What you will need:


1. An nVidia GeForce 8xxx/9xxx series video card or newer
2. CUDA Toolkit v2.1 (you can download it here: www.nvidia.ru/object/cuda_get.html )
3. CUDA SDK v2.1 (available in the same place as the Toolkit)
4. Visual Studio 2008
5. CUDA Visual Studio Wizard (download available here: sourceforge.net/projects/cudavswizard )

Creating a CUDA project:


After installing everything necessary, a new C++ project type named CUDA WinApp will appear in VS; this is exactly what we need. This project type provides additional CUDA-specific configurations that let you customize the compilation settings for the GPU, for example the Compute Capability version depending on the GPU model (see the nvcc example below), etc.
I usually create an Empty Project, since Precompiled Headers are hardly useful for CUDA.
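As a point of reference, the Compute Capability setting exposed by the project configuration corresponds roughly to nvcc's -arch option. If you were compiling by hand, the command line might look something like this (the file name here is just an example, not a fixed convention):

nvcc -arch=sm_11 -o cudaInfo.exe cudaInfo.cu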
It is important to understand how a CUDA application is built. Files with the *.cpp extension are processed by the MS C++ compiler (cl.exe), while files with the *.cu extension go through the CUDA compiler (nvcc.exe), which in turn decides which code will run on the GPU and which on the CPU. The CPU-side code from *.cu files is handed off to the MS C++ compiler; this feature is useful for writing dynamic libraries that export functions which use the GPU for calculations.
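To make this more concrete, here is a minimal sketch (the file and function names are my own, not part of the article) of a *.cu file that exports a plain C function usable from ordinary *.cpp code; nvcc compiles the kernel for the GPU and hands the host function over to cl.exe:

// FileName: gpuExport.cu (hypothetical example)
#include <cuda_runtime_api.h>

// Device code: compiled by nvcc for the GPU.
__global__ void dummyKernel()
{
}

// Host code: passed to cl.exe by nvcc; it can be exported from a DLL
// and called from regular *.cpp code.
extern "C" __declspec(dllexport) void runOnGpu()
{
    dummyKernel<<<1, 1>>>();   // launch the kernel with a single thread
    cudaThreadSynchronize();   // wait for the GPU to finish (CUDA 2.x runtime API)
}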
Below is the listing of a simple CUDA program that prints information about the hardware capabilities of the GPU.
Listing. CudaInfo program.

//FileName: cudaInfo.cu

#include <stdio.h>
#include <cuda_runtime_api.h>

int main()
{
    int deviceCount;
    cudaDeviceProp deviceProp;

    // Number of CUDA-capable devices in the system.
    cudaGetDeviceCount(&deviceCount);

    printf("Device count: %d\n\n", deviceCount);

    for (int i = 0; i < deviceCount; i++)
    {
        // Query the properties of device i.
        cudaGetDeviceProperties(&deviceProp, i);

        // Print the hardware capabilities of the device.
        printf("Device name: %s\n", deviceProp.name);
        printf("Total global memory: %d\n", deviceProp.totalGlobalMem);
        printf("Shared memory per block: %d\n", deviceProp.sharedMemPerBlock);
        printf("Registers per block: %d\n", deviceProp.regsPerBlock);
        printf("Warp size: %d\n", deviceProp.warpSize);
        printf("Memory pitch: %d\n", deviceProp.memPitch);
        printf("Max threads per block: %d\n", deviceProp.maxThreadsPerBlock);

        printf("Max threads dimensions: x = %d, y = %d, z = %d\n",
               deviceProp.maxThreadsDim[0],
               deviceProp.maxThreadsDim[1],
               deviceProp.maxThreadsDim[2]);

        printf("Max grid size: x = %d, y = %d, z = %d\n",
               deviceProp.maxGridSize[0],
               deviceProp.maxGridSize[1],
               deviceProp.maxGridSize[2]);

        printf("Clock rate: %d\n", deviceProp.clockRate);
        printf("Total constant memory: %d\n", deviceProp.totalConstMem);
        printf("Compute capability: %d.%d\n", deviceProp.major, deviceProp.minor);
        printf("Texture alignment: %d\n", deviceProp.textureAlignment);
        printf("Device overlap: %d\n", deviceProp.deviceOverlap);
        printf("Multiprocessor count: %d\n", deviceProp.multiProcessorCount);

        printf("Kernel execution timeout enabled: %s\n",
               deviceProp.kernelExecTimeoutEnabled ? "true" : "false");
    }

    return 0;
}



In the program I include the header cuda_runtime_api.h. Strictly speaking, this is not necessary, since it gets included automatically, but without it IntelliSense will not work (although it still fails from time to time anyway).
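One more remark: the listing above does not check return codes. All CUDA runtime calls return a cudaError_t, so in real code it is worth checking it; a minimal sketch (not part of the original program) could look like this:

cudaError_t err = cudaGetDeviceCount(&deviceCount);
if (err != cudaSuccess)
{
    printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
    return 1;
}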

Conclusion


I think this is the easiest way to write CUDA programs, since it takes a minimum of effort to configure and set up the environment. The only real problem is getting IntelliSense to work.
Next time I will cover using CUDA for mathematical calculations and working with video card memory.

P.S. Feel free to ask questions.

Source: https://habr.com/ru/post/54330/

