Sometimes calculations need to be sped up, ideally severalfold at once. That means abandoning convenient but slow tools and resorting to something lower-level and faster. R has quite advanced facilities for working with dynamic libraries written in C/C++, Fortran, or even Java. Out of habit, I prefer C/C++.

On the C side you include the headers `R.h` and `Rmath.h` (the latter if R's mathematical functions are used). Let's start with a simple example: the inner product of two vectors.
```c
#include <R.h>

void iprod(double *v1, double *v2, int *n, double *s)
{
    double ret = 0;
    int len = *n;
    for (int i = 0; i < len; i++) {
        ret += v1[i] * v2[i];
    }
    *s = ret;
}
```

The code is compiled into a shared library with:

```bash
R CMD SHLIB inner_prod.c
```

The resulting library is loaded with the `dyn.load()` function. To call the C function itself, use `.C()` (there are also `.Call()` and `.External()`, with slightly different functionality, and heated arguments sometimes flare up between supporters of `.C()` and `.Call()`). I will only note that C code written to be called through `.C()` turns out cleaner and more readable. Special attention should be paid to the correspondence between R and C variable types (this is described in detail in the documentation for `.C()`).

The wrapper function in R:

```r
iprod <- function(a, b) {
  if (!is.loaded('iprod')) {
    dyn.load("inner_prod.so")
  }
  n <- length(a)
  v <- 0
  return(.C("iprod", as.double(a), as.double(b),
            as.integer(n), as.double(v))[[4]])
}
```

A quick check against the pure-R version:

```r
> n <- 1e7; a <- rnorm(n); b <- rnorm(n)
> iprod(a, b)
[1] 3482.183
> sum(a * b)
[1] 3482.183
```

To work with CUDA you will need the nvidia-cuda-toolkit package. CUDA, of course, deserves a separate huge topic of its own, and since my level in this area is "beginner", I will not frighten people with my crooked and unfinished code, but will allow myself to copy a few lines from the manual.
The example uses the Thrust library, which lets us abstract away from low-level CUDA/C operations. The data are represented as vectors, to which standard algorithms are applied (elementwise operations, reductions, prefix sums, sorting).

```cpp
#include <thrust/transform_reduce.h>
#include <thrust/device_vector.h>
#include <cmath>

// Functor raising its argument to the 6th power; executed on the GPU (device)
template <typename T>
struct power {
    __device__ T operator()(const T& x) const {
        return std::pow(x, 6);
    }
};

extern "C"
void nrm(double *v, int *n, double *vnorm)
{
    // Copy the data pointed to by *v to the GPU
    thrust::device_vector<double> dv(v, v + *n);
    // Transform-reduce step: the power functor is applied to each
    // element, and the results are summed
    *vnorm = std::sqrt(
        thrust::transform_reduce(dv.begin(), dv.end(),
                                 power<double>(), 0.0,
                                 thrust::plus<double>()));
}
```

`extern "C"` is necessary here, otherwise R will not see the `nrm()` function. To compile the code we now use nvcc. Remember the output of the command `R CMD SHLIB`...? Here we adapt it a bit so that a library using CUDA/Thrust can be called from R without any problems:

```bash
nvcc -g -G -O2 -arch sm_30 -I/usr/share/R/include -Xcompiler "-Wall -fpic" -c thr.cu -o thr.o
nvcc -shared -lm thr.o -o thr.so -L/usr/lib/R/lib -lR
```

The wrapper function in R:

```r
gpunrm <- function(v) {
  if (!is.loaded('nrm')) {
    dyn.load("thr.so")
  }
  n <- length(v)
  vnorm <- 0
  return(.C("nrm", as.double(v), as.integer(n), as.double(vnorm))[[3]])
}
```
A simple benchmark comparing the GPU version with the pure-R one:

```r
gpu_time <- c()
cpu_time <- c()
n <- seq.int(1e4, 1e8, length.out = 30)
for (i in n) {
  v <- rnorm(i, 1000)
  gpu_time <- c(gpu_time, system.time({p1 <- gpunrm(v)})[1])
  cpu_time <- c(cpu_time, system.time({p2 <- sqrt(sum(v^6))})[1])
}
```

There are also ready-made packages (e.g., gputools) adapted for working with the GPU (here you can read more about this).

Source: https://habr.com/ru/post/220927/