📜 ⬆️ ⬇️

Parallel bubble sorting on CUDA

Hi, Habr. I thought someone could use parallel sorting with a relatively simple implementation and high performance on the CUDA platform. This is the bubble sorting method. Under the cat, an explanation is given and a code that may be useful (or maybe not ...). At once I will say that the presented program is a benchmark in comparison of performance on the GPU and CPU. If you do not mind, reader, then compile it, please, and put the calculation results in the comments of this article. This is not for science. Just interesting =)


I will remind a little


The essence of the bubble sorting method is pair-wise comparison of elements and replacing them with places if a certain condition is met, depending on the sorting direction. The following shows the iterations of the algorithm in single-threaded mode, when it sorts an increasing array of integers in descending order (each line is the state of the array at the moment) - this is Table 1.

Table 2 shows the iterations of bubble sorting in multithreaded mode, which is extremely simple: while one element moves to the side, after 2 movements the next element can also be sorted by the bubble method and starts moving, and after 2 more one and so on Thus, the entire array starts to “float up”, which significantly reduces the time for calculation.

Profit


Already by the number of expended actions, you can see that parallel sorting is faster. But this is not always: with small sizes of arrays, single-threaded mode runs faster due to the lack of problems with data transfer. I spent here a little test calculations, and here is their result:
image

The abscissa axis is the number of elements, the ordinate is the execution time in seconds.


Implementation


I present the implementation code in CUDA , which contains the code that performs sorting on DEPLOYMENT on CUDA and CPU (in single-threaded mode). In fact, this is a benchmark for your company on how smart sorting works on the video card and on the process. Part of this program is the sorting function on the video card on the CUDA platform, which you can use to your health in your masterpieces by elementary copy'n'past. The code is presented below the tables.

How to compile it ???


Well, for this you need to download drivers for CUDA from the Nvidia website, install them, download the Nvidia Toolkit, install and the Nvidia SDK, also install. Fortunately, now everything is already in one package. How to install CUDA I will write if there is a demand, and so just wish good luck if you are a beginner.
When everything is ready and nvidia CUDA works on your computer, then it is enough just to compile the main.cu file (or whatever you called it) and run the program, get the result and put it here.
Wat and all on now, thanks for reading. See you later.
PS I would be super grateful if you tell me how to speed up the algorithm =)

Table 1: Bubble Sorting in Single-threaded Mode


0one23fourfive67eight9teneleven121314
one023fourfive67eight9teneleven121314
one203fourfive67eight9teneleven121314
one230fourfive67eight9teneleven121314
one23four0five67eight9teneleven121314
one23fourfive067eight9teneleven121314
one23fourfive607eight9teneleven121314
one23fourfive670eight9teneleven121314
one23fourfive67eight09teneleven121314
one23fourfive67eight90teneleven121314
one23fourfive67eight9ten0eleven121314
one23fourfive67eight9teneleven0121314
one23fourfive67eight9teneleven1201314
one23fourfive67eight9teneleven12130
one23fourfive67eight9teneleven1213140
one23fourfive67eight9teneleven121314
2one3fourfive67eight9teneleven1213140
23onefourfive67eight9teneleven1213140
23fouronefive67eight9teneleven1213140
23fourfiveone67eight9teneleven1213140
23fourfive6one7eight9teneleven1213140
23fourfive67oneeight9teneleven1213140
23fourfive67eightone9teneleven1213140
23fourfive67eight9oneteneleven1213140
23fourfive67eight9tenoneeleven1213140
23fourfive67eight9tenelevenone1213140
23fourfive67eight9teneleven12one13140
23fourfive67eight9teneleven1213one140
23fourfive67eight9teneleven121314one
23fourfive67eight9teneleven121314
32fourfive67eight9teneleven121314one0
3four2five67eight9teneleven121314one0
3fourfive267eight9teneleven121314one0
3fourfive627eight9teneleven121314one0
3fourfive672eight9teneleven121314one0
3fourfive67eight29teneleven121314one0
3fourfive67eight92teneleven121314one0
3fourfive67eight9ten2eleven121314one0
3fourfive67eight9teneleven2121314one0
3fourfive67eight9teneleven1221314one0
3fourfive67eight9teneleven1213214one0
3fourfive67eight9teneleven1213142one0
3fourfive67eight9teneleven121314
four3five67eight9teneleven1213142one0
fourfive367eight9teneleven1213142one0
fourfive637eight9teneleven1213142one0
fourfive673eight9teneleven1213142one0
fourfive67eight39teneleven1213142one0
fourfive67eight93teneleven1213142one0
fourfive67eight9ten3eleven1213142one0
fourfive67eight9teneleven31213142one0
fourfive67eight9teneleven12313142one0
fourfive67eight9teneleven12133142one0
fourfive67eight9teneleven12131432one0
fourfive67eight9teneleven121314one0
fivefour67eight9teneleven12131432one0
five6four7eight9teneleven12131432one0
five67foureight9teneleven12131432one0
five67eightfour9teneleven12131432one0
five67eight9fourteneleven12131432one0
five67eight9tenfoureleven12131432one0
five67eight9tenelevenfour12131432one0
five67eight9teneleven12four131432one0
five67eight9teneleven1213four1432one0
five67eight9teneleven121314four32one0
five67eight9teneleven1213142one0
6five7eight9teneleven121314four32one0
67fiveeight9teneleven121314four32one0
67eightfive9teneleven121314four32one0
67eight9fiveteneleven121314four32one0
67eight9tenfiveeleven121314four32one0
67eight9tenelevenfive121314four32one0
67eight9teneleven12five1314four32one0
67eight9teneleven1213five14four32one0
67eight9teneleven121314fivefour32one0
67eight9teneleven12131432one0
76eight9teneleven121314fivefour32one0
7eight69teneleven121314fivefour32one0
7eight96teneleven121314fivefour32one0
7eight9ten6eleven121314fivefour32one0
7eight9teneleven6121314fivefour32one0
7eight9teneleven1261314fivefour32one0
7eight9teneleven1213614fivefour32one0
7eight9teneleven1213146fivefour32one0
7eight9teneleven121314four32one0
eight79teneleven1213146fivefour32one0
eight97teneleven1213146fivefour32one0
eight9ten7eleven1213146fivefour32one0
eight9teneleven71213146fivefour32one0
eight9teneleven12713146fivefour32one0
eight9teneleven12137146fivefour32one0
eight9teneleven12131476fivefour32one0
eight9teneleven121314fivefour32one0
9eightteneleven12131476fivefour32one0
9teneighteleven12131476fivefour32one0
9teneleveneight12131476fivefour32one0
9teneleven12eight131476fivefour32one0
9teneleven1213eight1476fivefour32one0
9teneleven121314eight76fivefour32one0
9teneleven1213146fivefour32one0
ten9eleven121314eight76fivefour32one0
teneleven9121314eight76fivefour32one0
teneleven1291314eight76fivefour32one0
teneleven1213914eight76fivefour32one0
teneleven1213149eight76fivefour32one0
teneleven12131476fivefour32one0
eleventen1213149eight76fivefour32one0
eleven12ten13149eight76fivefour32one0
eleven1213ten149eight76fivefour32one0
eleven121314ten9eight76fivefour32one0
eleven121314eight76fivefour32one0
12eleven1314ten9eight76fivefour32one0
1213eleven14ten9eight76fivefour32one0
121314eleventen9eight76fivefour32one0
1213149eight76fivefour32one0
131214eleventen9eight76fivefour32one0
131412eleventen9eight76fivefour32one0
1314ten9eight76fivefour32one0
141312eleventen9eight76fivefour32one0

')

Table 2: Bubble Sorting in Multi-threaded Mode


0one23fourfive67eight9teneleven121314
0one23fourfive67eight9teneleven121314
0one23fourfive67eight9teneleven121314
one023fourfive67eight9teneleven121314
one203fourfive67eight9teneleven121314
2one30fourfive67eight9teneleven121314
23onefour0five67eight9teneleven121314
32fouronefive067eight9teneleven121314
3four2fiveone607eight9teneleven121314
four3five26one70eight9teneleven121314
fourfive3627oneeight09teneleven121314
fivefour6372eightone90teneleven121314
five6four73eight29oneten0eleven121314
6five7foureight392tenoneeleven0121314
67fiveeightfour93ten2elevenone1201314
76eightfive9fourten3eleven212one13014
7eight69fivetenfoureleven312213one140
eight796tenfiveelevenfour12313214one0
eight97ten6elevenfive12four133142one0
9eightten7eleven612five13four1432one0
9teneighteleven712613five14four32one0
ten9eleveneight12713614fivefour32one0
teneleven912eight137146fivefour32one0
eleventen12913eight1476fivefour32one0
eleven12ten13914eight76fivefour32one0
12eleven13ten149eight76fivefour32one0
1213eleven14ten9eight76fivefour32one0
131214eleventen9eight76fivefour32one0
131412eleventen9eight76fivefour32one0
141312eleventen9eight76fivefour32one0
141312eleventen9eight76fivefour32one0



Code:


#include <stdio.h> #include <time.h> __global__ void bubbleMove(int *array_device, int N, int step){ int idx = blockDim.x * blockIdx.x + threadIdx.x; if (idx<(N-1)) { if (step-2>=idx){ if (array_device[idx]<array_device[idx+1]){ int buffer = array_device[idx]; array_device[idx]=array_device[idx+1]; array_device[idx+1] = buffer; } } } } void bubleSortCUDA(int *array_host,int N, int blockSize){ int *array_device; cudaMalloc((void **)&array_device, N * sizeof(int)); for (int i = 0; i < N; i++) array_host[i]=i; cudaMemcpy(array_device,array_host,N*sizeof(int),cudaMemcpyHostToDevice); int nblocks = N / blockSize+1; for (int step = 0; step <= N+N; step++) { bubbleMove<<<nblocks,blockSize>>>(array_device,N,step); cudaThreadSynchronize(); } cudaMemcpy(array_host,array_device,N*sizeof(int),cudaMemcpyDeviceToHost); cudaFree(array_device); } void bubleSortCPU(int *array_host, int N){ for (int i = 0; i < N; i++){ for (int j = 0; j < Ni-1; j++) { if (array_host[j]<array_host[j+1]){ int buffer = array_host[j]; array_host[j]=array_host[j+1]; array_host[j+1] = buffer; } } } } int checkArray(int *array_host, int N){ int good=1; for (int i = 0; i < N-1; i++) if (array_host[i]<array_host[i+1]) {good=0; printf("i=%da=%d\n", i,array_host[i] );} return good; } float measureCUDA(int N,int blockSize){ int *array_host = (int *)malloc(N * sizeof(int)); for (int i = 0; i < N; i++) array_host[i]=i; clock_t start = clock(); bubleSortCUDA(array_host,N,blockSize); clock_t end = clock(); if (checkArray(array_host,N)==1){ free(array_host); return (float)(end - start) / CLOCKS_PER_SEC; } else { free(array_host); return -1; } } float measureCPU(int N) { int *array_host = (int *)malloc(N * sizeof(int)); for (int i = 0; i < N; i++) array_host[i]=i; clock_t start = clock(); bubleSortCPU(array_host,N); clock_t end = clock(); if (checkArray(array_host,N)==1){ free(array_host); return (float)(end - start) / CLOCKS_PER_SEC; } else { free(array_host); return -1; } } int main(int argc, char const *argv[]) { for (int i = 1; i < 10000000; i*=2) printf("%d %f\t%f\n",i, measureCUDA(i,256),measureCPU (i )); return 0; } 

Source: https://habr.com/ru/post/235391/


All Articles