"Weakly load 40 cores?" Or a simple parallel programming contest Acceler8 2011

Continuing the week of expressing opinions about higher education at Habré , I’ll tell you about the competition, which is somewhere in the middle, between work (as denied by the respected AlexanderLarin ) and the ability to learn (as promoted by the user Alexnn ).

Competition parallel programming Acceler8 2011

And I will tell you about the Acceler8 parallel programming contest . Yes, this is a contest in which there is an opportunity not only to learn , but also to work for one of the “small” prizes: 2 powerful HP laptops and 10 Asus netbooks .
Want to try your hand?

At first glance, this time the competition is quite simple: you need to solve only one programming task.
At second glance, the solution should be parallel and, if possible, scalable.
So what are we talking about?

Briefly about the task

Given a two-dimensional matrix of values, you need to find a rectangular submatrix in it with the maximum sum of elements in it among all submatrices.
More information about the task can be found on the contest page .

Briefly about the decision

It is clear that the successive search into the forehead will solve the problem, but the solution will be too slow:

int maxSum = A[0][0]; int maxI0 = 0, maxJ0 = 0, maxI1 = 1, maxJ1 = 1; for (int i = 0; i < M; i++) for (int j = 0; j < N; j++) for (int width = 1; width <= M - i; width++) for (int height = 1; height <= N - j; height++) { int currentSum = sum_subarray(A, i, j, i + width, j + height); if (currentSum > maxSum) { maxSum = currentSum; maxI0 = i; maxJ0 = j; maxI1 = i + width - 1; maxJ1 = j + height - 1; } } cout << "Max subarray: [" << maxI0 << "][" << maxJ0 << "] .. [" << maxI1 << "][" << maxJ1 << "]"; cout << endl << "Max value = " << maxSum << endl; cout << "Area = " << (maxI1 - maxI0 + 1) * (maxJ1 - maxJ0 + 1) << endl;

Obviously, such an approach is “not at all epic”.
')

How to solve intelligently

There are many approaches to this problem.

First, you can use interesting caching (based on the properties of matrices), described in the Daily One Algorithm :
To begin with, we will create an intermediate matrix B, initially filled with zeros, in which each element of B [i] [j] stores the sum of the elements of the submatrix from the upper left corner (A [0] [0]) to A [i] [j].
Now the sum of elements in any submatrix can be found as
```
 sum = B[i2][j2]- B[i1-1][j2] - B[i2][j1-1] + B[i1-1][j1-1] 
```
Secondly, you can try to apply the Kadane algorithm , originally written for a one-dimensional array, or you can port and parallelize the modified algorithm given by one of the users of stackoverflow.com
Thirdly, you can think a little and write a beautiful, parallel and quick solution;)

We will not reveal all the cards. Now everyone is on an equal footing - a month of time (decisions are made until November 15) and free access to Google and textbooks.
Sounds easy, doesn't it?
And the most delicious - for a snack.

How will we evaluate scalability?

There are many approaches and a lot of flood on this issue. We decided to take the simplest implementation, but also the solution closest to reality:
The code will be tested on two different platforms. The first is a 40-core Manycore Testing Lab machine, with well-known hardware features, a well-known brand of processor and a known second-level cache size (see “ MTL Configuration ”). The second is a small system, about which almost nothing is unknown (2-4 cores, a few gigs of memory — that's all the information).
Is it difficult in such a situation to optimize the solution? Yes.
On the other hand, it is possible to work the fastest on a “small machine” and moderately on MTL — this may be enough to win.
Or maybe someone will achieve better results on both machines? Time will tell.
So, for those interested, a small link: registration for the competition . Questions can be written in the comments or leave on the forum of the competition .

And, this, be epic ! ;)

Source: https://habr.com/ru/post/130588/

All Articles

"Weakly load 40 cores?" Or a simple parallel programming contest Acceler8 2011

Briefly about the task

Briefly about the decision

How to solve intelligently

How will we evaluate scalability?

More articles: