We write on Rust + CUDA C

Hello!

In this guide I want to tell how to make friends CUDA C / C ++ and Rust. And as an example, we write a small program on Rust to calculate the scalar product of vectors, the calculation of the scalar product will be performed on the GPU using CUDA C.

Who cares under the cat!

CUDA C

The first step is to install the CUDA compiler - nvcc. What is CUDA and why I should not describe it, you can read about it for example here . Let me just say that with its help, you can write code that will run on NVIDIA video cards (hereinafter referred to as GPU) and use all their power for parallel computing and graphics processing. Once again, this tutorial is not about how to write code on CUDA, but about how to use its advantages and write parallel computations on the GPU as well as from the code on Rust.
')
So install the nvcc and CUDA Toolkit. With this difficulty, detailed instructions should not arise: on the off site .

RUST + CUDA C

In the course of this tutorial, as mentioned earlier, we will write a program on Rust to find the scalar product of two vectors, the calculation process itself will occur on the GPU.

Scalar product of two vectors.

Suppose we have two vectors:

a = [a_{1}, a_{2}, . . . a_{n}]

$a = [a_1, a_2, ... a_n]$ and

b = [b_{1}, b_{2}, . . ., b_{n}]

$b = [b_1, b_2, ..., b_n]$ The dot product of these vectors is:

a c d o t b = s u m_{i = 1}^{n} a_{i} b_{i}

$a \ cdot b = \ sum_ {i = 1} ^ {n} {a_ib_i}$

Let's start creating our program. Further, I assume that nvcc is successfully installed, rustc and cargo are also installed to compile rust code.

First of all, create a project folder. In the project folder, create the Cargo.toml file, which contains instructions for the cargo collector. The file looks like this:

[package] name = "rust_cuda" #   version = "0.1.0" #   authors = ["MoonL1ght <ixav1@icloud.com>"] #    build = "build.rs" #    rust links = "cudart" #  cuda,    [dependencies] libc = "0.2" #  rust     rand = "0.5.5" #  rust      [build-dependencies] cc = "1.0" # rust

Also in the root folder of the project create a file build.rs which will contain instructions for building a program for rust and compiling CUDA C code.

Add the src folder to the root of the project in which we place the files with the source code. Create four files in the src folder: main.rs is the code of the main program, dot.cpp is C ++ binding (a wrapper for CUDA C), dot_gpu.h, dot_gpu.cu is a file containing the code executed on the GPU.

Total we have the following project structure:

 rust-cuda/ src/ main.rs dot.cpp dot_gpu.h dot_gpu.cu Cargo.toml build.rs

The most important thing in the build.rs file is to register it:

 println!("cargo:rustc-link-search=native=/Developer/NVIDIA/CUDA-10.1/lib"); println!("cargo:rustc-link-search=/Developer/NVIDIA/CUDA-10.1/lib"); println!("cargo:rustc-env=LD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-10.1/lib"); println!("cargo:rustc-link-lib=dylib=cudart");

where /Developer/NVIDIA/CUDA-10.1/lib is the path to the CUDA executable files, in unix-like system this path can be found for example with the command:

 which nvcc

In addition, in the build.rs file you need to specify the path to the dot.cpp and dot_gpu.cpp files:

 .files(&["./src/dot.cpp", "./src/dot_gpu.cu"])

All code of build.rs file

 extern crate cc; fn main() { cc::Build::new() .cuda(true) .cpp(true) .flag("-cudart=shared") .files(&["./src/dot.cpp", "./src/dot_gpu.cu"]) .compile("dot.a"); println!("cargo:rustc-link-search=native=/Developer/NVIDIA/CUDA-10.1/lib"); println!("cargo:rustc-link-search=/Developer/NVIDIA/CUDA-10.1/lib"); println!("cargo:rustc-env=LD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-10.1/lib"); println!("cargo:rustc-link-lib=dylib=cudart"); }

Now you can start writing the main program code. In the main.rs file, you need to create an interface with C / C ++ functions to call directly from the code in Rust. More information about this can be found in the official documentation in the FFI section.

 extern "C" { //  C        fn dot(v1: *mut c_float, v2: *mut c_float, N: size_t) -> c_float; }

To call it, you need to use unsafe code block, passing mutable pointer to type Vec as arguments:

 unsafe { gpu_res = dot(v1.as_mut_ptr(), v2.as_mut_ptr(), VEC_SIZE); }

The full code of the main.rs file

 extern crate libc; extern crate rand; use libc::{c_float, size_t}; use rand::Rng; const VEC_SIZE: usize = 10; const MAX: f32 = 10.; const MIN: f32 = 0.; extern "C" { fn dot(v1: *mut c_float, v2: *mut c_float, N: size_t) -> c_float; } fn cpu_dot(v1: Vec<f32>, v2: Vec<f32>) -> f32 { let mut res: f32 = 0.; for i in 0..v1.len() { res += v1[i] * v2[i]; } return res; } fn main() { let mut v1: Vec<f32> = Vec::new(); let mut v2: Vec<f32> = Vec::new(); let mut gpu_res: c_float; let mut cpu_res: f32 = 0.; let mut rng = rand::thread_rng(); for _ in 0..VEC_SIZE { v1.push(rng.gen_range(MIN, MAX)); v2.push(rng.gen_range(MIN, MAX)); } println!("{:?}", v1); println!("{:?}", v2); println!("GPU computing started"); unsafe { gpu_res = dot(v1.as_mut_ptr(), v2.as_mut_ptr(), VEC_SIZE); } println!("GPU computing finished"); println!("GPU dot product result: {}", gpu_res); cpu_res = cpu_dot(v1, v2); println!("CPU dot product result: {}", cpu_res); }

Now we start writing the binding in C ++, as well as the code for calculating the scalar product of vectors in CUDA C.

In the dot.cpp file, we will write a function binding, and we actually call this function from the Rust code:

 extern "C" { float dot(float *v1, float *v2, size_t N) { float *gpu_res; float res = 0.0; gpu_res = gpu_dot(v1, v2, N); //   GPU for (int i = 0; i < blocksPerGrid; i++) { res += gpu_res[i]; } free(gpu_res); return res; } }

Full dot.cpp file code

 #include <iostream> #include "dot_gpu.h" using namespace std; void display_vector(float *v, size_t N) { cout << "["; for (size_t i = 0; i < N; i++) { cout << v[i]; if (i != N - 1) { cout << ", "; } } cout << "]" << endl; } extern "C" { float dot(float *v1, float *v2, size_t N) { cout << "Calling gpu dot product" << endl; cout << "Got two vectors from rust:" << endl; display_vector(v1, N); display_vector(v2, N); float *gpu_res; float res = 0.0; gpu_res = gpu_dot(v1, v2, N); for (int i = 0; i < blocksPerGrid; i++) { res += gpu_res[i]; } free(gpu_res); return res; } }

The following is the code from the dot_gpu.cu file in which the main calculation is made, I will not explain the code itself in this tutorial, since it is not dedicated to programming in CUDA.

dot_gpu.cu

 #include "dot_gpu.h" __global__ void dot__(float *v1, float *v2, float *res, int N) { __shared__ float cache [threadsPerBlock]; int tid = threadIdx.x + blockIdx.x * blockDim.x; int cacheIndex = threadIdx.x; float temp = 0.0; while (tid < N) { temp += v1[tid] * v2[tid]; tid += blockDim.x * gridDim.x; } cache[cacheIndex] = temp; __syncthreads(); int i = blockDim.x / 2; while (i != 0) { if (cacheIndex < i) { cache[cacheIndex] += cache[cacheIndex + i]; } __syncthreads(); i /= 2; } if (cacheIndex == 0) { res[blockIdx.x] = cache[0]; } } float * gpu_dot (float *v1, float *v2, size_t N) { float *dev_v1, *dev_v2, *dev_res, *res; res = new float[blocksPerGrid]; cudaMalloc((void**)&dev_v1, N * sizeof(float)); cudaMalloc((void**)&dev_v2, N * sizeof(float)); cudaMalloc((void**)&dev_res, blocksPerGrid * sizeof(float)); cudaMemcpy(dev_v1, v1, N * sizeof(float), cudaMemcpyHostToDevice); cudaMemcpy(dev_v2, v2, N * sizeof(float), cudaMemcpyHostToDevice); dot__<<<blocksPerGrid, threadsPerBlock>>>(dev_v1, dev_v2, dev_res, (int)N); cudaMemcpy(res, dev_res, blocksPerGrid * sizeof(float), cudaMemcpyDeviceToHost); cudaFree(dev_v1); cudaFree(dev_v2); cudaFree(dev_res); return res; }

All our little program is written and ready for assembly. In order to collect it in the console, call the command:

 cargo build

For start:

 cargo run

After assembling the program in the main directory of the project folder will appear The executable file of our program will be located in the folder: ./target/debug/

Moreover, if we just run our executable file, we get an error: dyld library not loaded. That is, it cannot find the path to the cuda dynamic library. To solve this problem, you can set the environment variable LD_LIBRARY_PATH = path_to_CUDA_lib_directory / before running the executable file or create symbolic links in the rust toolchain folder for CUDA:

 ln -s /Developer/NVIDIA/CUDA-10.1/lib/* /Users/Alexander/.rustup/toolchains/nightly-x86_64-apple-darwin/lib

where /Users/Alexander/.rustup/toolchains/nightly-x86_64-apple-darwin/lib is my way to the installed rust toolchain, it may be slightly different for you.

When launching the program through cargo run, such an error did not occur, because we registered the environment variable LD_LIBRARY_PATH in the file build.rs.

Eventually

We have the ability to run CUDA C code straight from the Rust code. In order to check this, we created a small program, it works with vectors and performs all calculations on the GPU. The full code can also be viewed on github .

Source: https://habr.com/ru/post/447968/

All Articles

We write on Rust + CUDA C

CUDA C

RUST + CUDA C

Eventually

More articles: