
Java for HPC. Calculation of the scalar product of vectors

Hello,



This post is a continuation of the first post on the topic.





This post is a brief extract from the article “Java for High Performance Computing”, which I will present at the Tomsk Polytechnic university conference.


The scalar (dot) product of two vectors is the sum of the products of their corresponding elements.
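As a minimal sequential sketch in plain Java (no MPI; the class and method names here are mine, not from the article):

```java
public class DotProduct {
    // Sequential scalar (dot) product: sum of element-wise products.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {4, 5, 6};
        System.out.println(dot(a, b)); // 1*4 + 2*5 + 3*6 = 32.0
    }
}
```

The parallel versions below do exactly this, but each process computes the partial sum over its own chunk.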



To solve the problem, two programs were written: one in C (not by me :-)) and one in Java.



Both programs were tested on the SKIF-Polytech supercomputer cluster installed at Tomsk Polytechnic University, consisting of 24 nodes, each with two Intel Xeon 5150 2.66 GHz processors and 8 GB of RAM, running SuSE Linux Enterprise version 10.3.



In the first case, the data set was two vectors of 99999999 integer elements initialized with random values; in the second case, two vectors of 99999999 floating-point elements. Both programs were run 21 times for each data set, with the number of processor cores varied from 2 to 40, two runs per configuration.



It should be noted that both programs:

* are not optimized;

* use the same functionality (apart from language-specific internals).



Therefore, additions and corrections in the comments are very welcome.



Theory needed to solve the problem



There are two differences between the MPI implementations for C and Java that can be confusing at first:

1) In the message-passing functions, the first argument in C may be a pointer to any buffer, while in Java it must be a one-dimensional array;

2) The arguments come in a different order.



To calculate the scalar product of the vectors, the following subtasks must be solved:



1) Create two vectors of N elements each and initialize their values;

2) Split the vectors into chunks to be sent to the nodes;

3) Send out the chunks;

4) Receive the chunks on the nodes;

5) Perform the calculations;

6) Send the partial results back;

7) Sum them and obtain the result;

8) Measure the time spent by the program.



Point by point:

1) Create two vectors of N elements each and initialize their values.



In C, you need to allocate the appropriate amount of memory for the arrays, malloc(n * sizeof(double)), and initialize the values in a loop using rand(). In Java, it is enough to create the arrays and a Random object (note that creating objects takes a lot of time, so be careful) and use that object to initialize the arrays.
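The Java side of this step can be sketched as follows (plain Java, no MPI; the helper name is illustrative — the point is to create one Random and reuse it for both vectors rather than constructing objects in the loop):

```java
import java.util.Random;

public class InitVectors {
    // Fill a vector of n elements with random values using a shared Random.
    static double[] randomVector(int n, Random r) {
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
            v[i] = r.nextDouble();
        }
        return v;
    }

    public static void main(String[] args) {
        Random r = new Random();          // created once, reused for both vectors
        double[] a = randomVector(10, r);
        double[] b = randomVector(10, r);
        System.out.println(a.length + " " + b.length);
    }
}
```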



2) Split the vectors into chunks to be sent to the nodes.



For C and Java, the solution is the same:

n = total / numprocs + 1, where

n is the number of elements per node,

total is the length of the vector,

numprocs is the number of processes in the pool (MPI_Comm_size).
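This division can be sketched in plain Java (names are mine; the last process simply takes whatever remains, so the chunks always cover the whole vector):

```java
public class ChunkSize {
    // Elements per node for the first numprocs-1 processes.
    static int chunk(int total, int numprocs) {
        return total / numprocs + 1;
    }

    // The remaining elements, processed by the root itself.
    static int lastChunk(int total, int numprocs) {
        return total - chunk(total, numprocs) * (numprocs - 1);
    }

    public static void main(String[] args) {
        int total = 99999999, numprocs = 4;
        int n = chunk(total, numprocs);          // 25000000
        int last = lastChunk(total, numprocs);   // 24999999
        // All chunks together cover the whole vector:
        System.out.println(n * (numprocs - 1) + last == total); // true
    }
}
```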



3) Send out the chunks.



The MPI_Bcast function from the MPI library is used to send an object to all processes in the pool. For the specification, you can refer to the vendor's website.



As a result, sending the arrays in Java looks like this:



MPI.COMM_WORLD.Bcast(d, 0, 1, MPI.INT, 0);

MPI.COMM_WORLD.Send(a,0,a.length,MPI.DOUBLE,dest,0);

MPI.COMM_WORLD.Send(b,0,b.length,MPI.DOUBLE,dest,0);





where d holds the chunk length,

a is the first vector,

b is the second vector.



4) Receive the chunks on the nodes

MPI.COMM_WORLD.Recv(a,0,d[0],MPI.DOUBLE,0,0);

MPI.COMM_WORLD.Recv(b,0,d[0],MPI.DOUBLE,0,0);



No comments.



5) Perform the calculations



for (int i = 0; i < d[0]; i++) {

    sum[0] += a[i] * b[i];

}





6) Send the partial results back; 7) Sum them and obtain the result.

And here is an interesting point: we combine the two tasks into one. We use the reduction function, which performs all the necessary actions for us: it collects the partial results and puts them into the one-dimensional array result (remember that in the Java implementation the buffers must be arrays, not plain variables!).

MPI.COMM_WORLD.Reduce(sum,0,result,0,1,MPI.DOUBLE,MPI.SUM,0);





8) Measure the time spent by the program



For this, the built-in function MPI.Wtime (wall time), a wrapper around the standard MPI_Wtime, is called twice: once at the beginning of the program, and once at the end to obtain the total execution time of the program (not just the calculation).
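MPI.Wtime is only available inside an MPI launch, so purely to illustrate the same bracketing pattern, here is a plain-Java sketch using System.nanoTime (names are mine):

```java
public class Timing {
    // Run a task and return the elapsed wall time in seconds,
    // mirroring the startwtime/endwtime bracketing done with MPI.Wtime().
    static double timeIt(Runnable task) {
        long start = System.nanoTime();   // analogue of startwtime = MPI.Wtime()
        task.run();                       // the whole program body goes here
        long end = System.nanoTime();     // analogue of endwtime = MPI.Wtime()
        return (end - start) / 1e9;
    }

    public static void main(String[] args) {
        double elapsed = timeIt(() -> {
            double sum = 0.0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        });
        System.out.println("wall clock time = " + elapsed + " s");
    }
}
```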



Findings



Despite all of Java's drawbacks and the large gap between the execution times of the C and Java programs, the final choice of programming language can only be made after a thorough analysis of the subject area and of the situation the research team is in. In some cases C is far more justified thanks to its higher performance and closeness to the hardware (and hence greater scope for optimizing the whole process). At the same time, C places greater responsibility on the programmer, who must be competent enough to keep the situation under control and prevent critical cases in which the program can leak memory and drag down the entire computation. This is a very important point in serious research.



On the other hand, using Java can also be justified. Despite the loss of performance and the issues with floating-point computing, Java offers advantages such as the control provided by the virtual machine, a well-developed toolkit for catching exceptions, a low entry threshold for developers, and the absence of complex and error-prone mechanisms such as pointers and manual memory allocation. All of this can be a sufficient argument for choosing Java as the language for developing parallel programs in a research team that does not include a competent C programmer.



C program

#include "mpi.h"

#include <stdio.h>

#include <stdlib.h>

#include <math.h>

#include <signal.h>



#define MYTAG 1



int myid, j;

char processor_name[MPI_MAX_PROCESSOR_NAME];

double startwtime = 0.0, endwtime;



int main(int argc,char *argv[])

{

int total, n, numprocs, i, dest;

double *a, *b, sum, result;

int namelen;

MPI_Status status;



MPI_Init(&argc,&argv);

MPI_Comm_size(MPI_COMM_WORLD,&numprocs);

MPI_Comm_rank(MPI_COMM_WORLD,&myid);

MPI_Get_processor_name(processor_name,&namelen);



if (myid == 0) {

total = atoi(argv[1]);

}



printf("Process %d of %d is on %s\n",

myid, numprocs, processor_name);



startwtime = MPI_Wtime();



if (myid == 0) n = total / numprocs + 1; /* total is only set on rank 0 */

MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);



a = malloc(n*sizeof(double));

b = malloc(n*sizeof(double));



if ((a == NULL) || (b == NULL)) {

fprintf(stderr,"Error allocating vectors (not enough memory?)\n");

exit(1);

}



if (myid == 0) {

for (dest=1; dest < numprocs; dest++) {

for (i=0; i < n; i++) {

a[i] = 4294967296;//rand();

b[i] = 4294967296;//rand();

}

MPI_Send(a, n, MPI_DOUBLE, dest, MYTAG, MPI_COMM_WORLD);

MPI_Send(b, n, MPI_DOUBLE, dest, MYTAG, MPI_COMM_WORLD);

}

n = total - n*(numprocs-1);

for (i=0; i < n; i++) {

a[i] = rand();

b[i] = rand();

}

} else {

MPI_Recv(a, n, MPI_DOUBLE, 0, MYTAG, MPI_COMM_WORLD, &status);

MPI_Recv(b, n, MPI_DOUBLE, 0, MYTAG, MPI_COMM_WORLD, &status);

}



printf("Process %d on node %s starting calc at %f sec\n",

myid, processor_name, MPI_Wtime()-startwtime);



sum = 0.0;

for (i=0; i<n; i++)

sum += a[i]*b[i];



printf("Process %d on node %s ending calc at %f sec\n",

myid, processor_name, MPI_Wtime()-startwtime);

MPI_Reduce(&sum, &result, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);



if (myid == 0) {

endwtime = MPI_Wtime();

printf("Answer is %f\n", result);

printf("wall clock time = %f\n", endwtime-startwtime);

fflush(stdout);

}



MPI_Finalize();

return 0;

}





Java program



import mpi.*;

import java.util.*;



public class scalar {



public static void main(String args[]){

MPI.Init(args);



double[] result = new double[1];

int me = MPI.COMM_WORLD.Rank();

int size = MPI.COMM_WORLD.Size();



double startwtime=0.0;

double endwtime=0.0;

int total = 99999999;



int[] d = new int[1];

d[0] = total/size+1;



double[] a = new double[d[0]];

double[] b = new double[d[0]];

Random r = new Random();



MPI.COMM_WORLD.Bcast(d, 0, 1, MPI.INT, 0);



if (me == 0){

startwtime = MPI.Wtime();

for (int dest=1; dest<size;dest++){

for (int i=0; i<d[0]; i++){

a[i] = r.nextDouble();

b[i] = r.nextDouble();

}



MPI.COMM_WORLD.Send(a, 0, a.length, MPI.DOUBLE, dest, 0);

MPI.COMM_WORLD.Send(b, 0, b.length, MPI.DOUBLE, dest, 0);

}



d[0] = total - d[0]*(size-1);

for (int i=0; i<d[0];i++){



a[i] = r.nextDouble();

b[i] = r.nextDouble();

}



} else {



MPI.COMM_WORLD.Recv(a, 0, d[0], MPI.DOUBLE, 0, 0);

MPI.COMM_WORLD.Recv(b, 0, d[0], MPI.DOUBLE, 0, 0);



}



double[] sum = new double[1];



for (int i=0; i<d[0];i++){



sum[0]+=a[i]*b[i];



}



MPI.COMM_WORLD.Reduce(sum, 0, result, 0, 1, MPI.DOUBLE, MPI.SUM, 0);



if (me == 0){



System.out.println("answer is " + result[0] + ", time of calcs is " + (MPI.Wtime() - startwtime));



}

MPI.Finalize();



}

}







Source: https://habr.com/ru/post/105408/


