
Running R functions on multiple machines

As we showed in “A little introduction to parallel programming in R”, one of the advantages of R is the ease with which you can take advantage of parallel programming to speed up computations. In this article we will explain how to go from running functions on multiple processors or cores to running them on multiple machines (with the goal of even greater scaling and speedup).

R itself was not designed for parallel computing; it does not expose many parallel constructs to the user. Fortunately, the data processing tasks for which we most often use R are very well suited to parallel execution, and there are a number of excellent libraries that exploit this. Here are the three main ways to take advantage of the parallelization these libraries provide:


Let us consider the third approach in more detail.

The third approach is, in essence, a fairly coarse-grained remote procedure call: it relies on shipping copies of code and data to remote processes and then collecting the results. This is wasteful for very small tasks, but works well for a reasonable number of medium or large ones. This strategy is used by R's parallel library and by Python's multiprocessing library (although with multiprocessing in Python you may need several additional libraries to move from a single machine to cluster computing).

This method may seem less efficient and less sophisticated than distributed-memory methods, but because it relies on shipping objects, it is very easy to extend the technique from one machine to several (“cluster computing”). That is what we will do with the R code in this article (moving from one machine to a cluster introduces many system/network/security issues, which you will have to manage).
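To make that “ship code and data, collect results” pattern concrete, here is a minimal single-machine sketch using the standard parallel package (the function and variable names are invented for illustration):

 library(parallel)

 # build a local cluster: one worker process per core
 cl <- makeCluster(detectCores())

 # some data and a function we would like evaluated in the workers
 base <- 10
 f <- function(x) { x^2 + base }

 # ship a copy of 'base' to every worker, then dispatch the calls;
 # each worker receives a copy of f and one element of the input, and sends its result back
 clusterExport(cl, "base")
 results <- parLapply(cl, 1:8, f)

 stopCluster(cl)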

You will need all the R code from the previous article. This article also assumes that you can configure ssh, or that you have someone who can assist with that configuration. Instead of starting a parallel cluster with “parallelCluster <- parallel::makeCluster(parallel::detectCores())”, do the following.

Collect a list of addresses of machines you can ssh to. This is the hard part; it depends on your operating system and may require assistance if you have not done it before. In this example I use IPv4 addresses; for Amazon EC2 you would use host names.

In my case, the list consists of the two machines used in the code below: 192.168.1.235 and 192.168.1.70 (user johnmount, four cores each).


Please note that we do not collect passwords: we assume that the appropriate "authorized_keys" and key pairs have been set up in the ".ssh" configuration of all of these machines. We will call the machine from which the overall computation is coordinated the “primary” machine.

You should definitely try all of these addresses with ssh in a terminal before using them from R. Also, the address of the machine chosen as primary must be reachable from the worker machines (i.e. you cannot use “localhost”, or pick a machine the workers cannot reach, as the primary). Try ssh-ing manually from the primary to the other machines and back; only then are the settings ready to be used in R.

Once the system setup is done, the R part looks like this. Start your cluster:

 primary <- '192.168.1.235'
 machineAddresses <- list(
   list(host=primary,       user='johnmount', ncore=4),
   list(host='192.168.1.70', user='johnmount', ncore=4)
 )

 spec <- lapply(machineAddresses,
                function(machine) {
                  rep(list(list(host=machine$host,
                                user=machine$user)),
                      machine$ncore)
                })
 spec <- unlist(spec, recursive=FALSE)

 parallelCluster <- parallel::makeCluster(type='PSOCK',
                                          master=primary,
                                          spec=spec)
 print(parallelCluster)

 ## socket cluster with 8 nodes on hosts
 ##                  '192.168.1.235', '192.168.1.70'

That's all. Now you can run your functions on multiple cores of multiple machines. For the right kind of task the speedup can be substantial. It always makes sense to work in steps: first write a simple “hello world” for your cluster, then make sure a smaller version of your calculation works locally, and only then move the work to the cluster.
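As a minimal sketch of that staged approach (using the parallelCluster object created above; the work function is invented for illustration), the “hello world” step could look like this:

 # a toy function to evaluate remotely
 f <- function(n) { sum(rnorm(n)) }

 # dispatch the calls across all 8 worker processes on both machines
 results <- parallel::parLapply(parallelCluster, rep(1e6, 100), f)

 # shut the workers down when you are finished
 parallel::stopCluster(parallelCluster)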

There is another way to work with clusters in R. First we need a build of R with the Rmpi and snow packages pre-installed. For this purpose I suggest the HSPCC R build maintained by Casper Daniel Hansen. Here are the installation instructions.

We will consider two more simple methods of parallel data processing in R. The first, using the multicore package, is limited to the processors of a single node. Although this may seem a serious limitation, it can in fact be a strong advantage, since interaction between the processes is several orders of magnitude faster. The second option is to use the snow package, which allows a subcluster of nodes to be used for computation via MPI (a message-passing interface between nodes).

Using multiple cores inside a node


The multicore package is very effective at speeding up simple computations by using more than one processor core at a time. It is very easy to use and quick to set up.

Start a session on a cluster node with multiple processors by typing the following command:

 qrsh -l mcmc -pe local 10-12 

Note that here we request 10 to 12 processors on a single node in the mcmc queue. The number of cores actually allocated is stored in the NSLOTS environment variable, which can be read from R with the following command:

 as.integer(Sys.getenv("NSLOTS")) 

The next step is to start R and load the multicore library. After that you can simply use mclapply instead of lapply in your R code (passing the number of cores to use in the mc.cores argument).
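Putting those steps together, a minimal sketch could look like the following (the work function is invented for illustration; on recent versions of R the parallel package provides the same mclapply interface as the older multicore package):

 library(multicore)   # or library(parallel) on recent R

 # use however many cores the scheduler allocated, falling back to 1 if NSLOTS is unset
 ncores <- as.integer(Sys.getenv("NSLOTS"))
 if (is.na(ncores)) ncores <- 1

 # mclapply forks worker processes, running up to mc.cores tasks at a time
 f <- function(n) mean(rnorm(n))
 results <- mclapply(rep(1e6, 20), f, mc.cores = ncores)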

Using snow between multiple nodes


Below is a simple example of how to start an MPI cluster in R and how to use it for simple calculations.

First of all, I recommend setting up passwordless ssh access to the cluster nodes before continuing. This will save some time (although it is not strictly necessary).

To run an MPI cluster with 12 nodes (cores), you need to type this:

 qrsh -V -l cegs -pe orte 12 /opt/openmpi/bin/mpirun -np 12 ~hcorrada/bioconductor/Rmpi/bstRMPISNOW 

This command starts 12 instances of R, one of which acts as the primary. Note that the job was submitted to the cegs queue. You can then get a handle on the nodes started by mpirun by typing the following in R:

 cl <- getMPIcluster() 

Now you can use whatever snow commands you need. For example,

 clusterCall(cl, function() Sys.info()[c("nodename","machine")]) 

will display a list of the nodes in your cluster (more precisely, of the 11 running worker nodes). Here is a list of the available commands.
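For example, a typical snow session might copy a value to the workers and then use one of the parallel apply functions. Here is a minimal sketch, assuming the snow package is already loaded in the session (as it is in the bstRMPISNOW setup above); the variable names are invented for illustration:

 # 'k' lives on the primary; copy it to every worker first
 k <- 2
 clusterExport(cl, "k")

 # each worker evaluates the function on its share of the input,
 # using its own copy of 'k'
 res <- parSapply(cl, 1:100, function(i) i * k)

 # clusterApply, parLapply and clusterEvalQ work in the same way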

When calculations on the cluster are complete, close the nodes with the following command:

 stopCluster(cl) 

Warning: make sure you do not kill the R processes with Ctrl+C. This can cause problems for snow.

Example: comparison of sequential and parallel processing


 > f.long<-function(n) {
 +   xx<-rnorm(n)
 +   log(abs(xx))+xx^2
 + }

 # multicore ############
 > system.time(mclapply(rep(5E6,11),f.long,mc.cores=11))

    user  system elapsed
  26.271   3.514   5.516

 # snow (MPI) ############
 > system.time(sapply(rep(5E6,11),f.long))

    user  system elapsed
  17.975   1.325  19.303

 > system.time(parSapply(cl,rep(5E6,11),f.long)) 

    user  system elapsed
   4.224   4.113   8.338

Please note that parallel processing with snow improves the computation time by more than 50%. One might expect the improvement to be 10/11 = 91%, but it is important to remember that the processors are not necessarily on the same node, and communication between nodes can be rather slow. That communication can be slow enough that the multicore run, which avoids it entirely, improves the computation time by roughly a further 40%.

Accordingly, it is worth remembering that the gains in computation time from using more than one cluster node can be largely offset by the time spent on communication between the nodes. Of course, this depends on the computations performed and on how they are implemented.

Source: https://habr.com/ru/post/309052/

