
Hello!
Last week on Habré was marked by a series of hacker posts - they cracked both
VoIP and
online traffic jams .
I propose to continue the week more creatively - to solve
the global problem of genetics on parallel programming.
In a month, you need to do nothing at all:
find in two strings consisting of nucleotides of the characters A, T, G and C, the longest common substring .
The prizes have grown and strengthened compared to the previous time - today there are
6 Asus Zenbook UX31E ultrabooks and
10 SSD disks with a total capacity of 800 gigs at stake.
Tempting?
What is it about?
You are given a reference string (for example, the following:
GATGAGCATGTGTTGAATCCTCA
) and many long input lines (here’s one of them:
GTCCTCCAGTTTGTAGCATGTGTATTTATGTCCTCCAGTTTGTAGCATGTGTATTTAT
). It is necessary for each pair of reference and input strings to find the longest common substrings and return their “coordinates”. In the example, the answer is
(5 13 15 23)
and
(5 13 44 52)
, that is, two substrings are found:
At your disposal there is a
reference code that correctly, but very slowly solves this problem by brute force.
Sounds easy?
How to solve?
This is where the fun begins. I will offer several options that may be useful:
- The easiest way: parallelize the reference code by data, for example, using OpenMP and divide the work on input tests between threads. But you can divide the work in different ways. Input lines only? Fragments of the reference line? It strongly depends on their size and quantity - you decide.
- A more interesting way: take the reference code, run it through the Intel VTune Amplifier XE and parallelize the smart algorithm itself.
- A smarter approach: take one of the more than 9000 substring search algorithms in a string and try to find the best match for this task.
- The most advanced approach: merge the previous paragraphs, taking a smart algorithm and parallelize it according to the instructions and data
')
What to choose? I want a hint!
What you choose, we can not advise. Some people like to write on pthreads, someone on OpenMP, and someone likes to use parallel functions from TBB and maybe even MKL.
One thing is for sure - very clever ideas and thoughts are often discussed on our
forum . For example, it is worth looking at the instructions in SSE4.2.
What can I write?
Unfortunately, our automated testing system is too young to support all programming languages.
This time we taught her to understand C ++ (despite my sincere love of python and Java Script), so I’ll have to write in good old C ++.
The development platform can be any, but the code will be compiled and tested on a Linux machine (Debian stable - kernel 2.6.32) with gcc installed (version 4.6.2 for pthreads fans) and Intel Parallel Studio XE 2011 (for those who choose Intel Compiler, optimizing code for our processors).
And what about the prizes? I want an ultrabook!
The first three teams that wrote the fastest code and sent the solution before May 16, we will give
Asus Zenbook UX31E to the participant.
The second three teams - on
SSD 320 Series 80Gb .
Another 4 participants who write us the most interesting posts in the
blogs of the Intel Software Network will also get SSD.
So, once again:
one task , one month,
6 ultrabooks and 10 SSD for the best participants from Russia and the CIS.
Anyone who
decides to participate , good luck!
The organizers and judges of the competition are ready to answer any of your questions in the comments.