Ahead of the next post in the “5 weeks with Intel” program, I suggest reading an interesting article that describes applications of high-performance computing (HPC), using several Russian projects as examples.

In May, Intel held the 32 Core Testing Plan competition, inviting the scientific community to test their applications on a multi-core computing system. Of the five winners, as many as three are teams from Russia. Below are a few words about the complex and beautiful mathematics behind the finalists' work.
Anton Pankratov works at the Institute of Mathematical Problems of Biology, which deals with a wide variety of tasks (the complete list is on the Institute's home page), ranging from the study of the primary structures of biopolymers and the modeling of biomolecular dynamics to neural network models and biodiversity problems.

Together with his colleague Ruslan Tetuev, Anton is working on spectral methods for processing and analyzing genetic data, with the goal of recognizing homologous genetic sequences. If you did not understand the last sentence, don't worry: almost no one outside their field understands what it is about.

Genetics for dummies: DNA is a very long polymer made up of a large number of building blocks called nucleotides. Nucleotides are built around four recurring nucleobases: adenine, thymine, guanine, and cytosine, abbreviated A, T, G, and C. Serious genetic studies set the chemistry aside and work directly with these four letters, which encode almost all living organisms on Earth: ATCGATTG... is roughly what a stretch of DNA code looks like. It goes on and on, because these sequences are very long: the longest human chromosome, number one, is about 220 million base pairs long.
One of the major problems of modern computational genetics is pattern recognition, that is, finding repeated parts of DNA. It is one thing to simply write down the entire human genome; it is quite another to find repeated or similar parts in it and try to establish the relationships between them. This is what Anton's team does, continuing a twenty-year IMPB project that was started by Anton's mentors and builds on the work of the great Russian mathematician Pafnuty Chebyshev. “We call our method NASCA, the Numeric Analytical Spectral Comparing Approach: a method of spectral analysis based on approximation,” says Anton as we stroll along the sun-drenched second building of Moscow State University on Sparrow Hills, Anton's alma mater.
“Using the formulas of the orthogonal polynomials of Chebyshev, who made a great contribution to approximation theory, we can process very large sequences that conventional dynamic programming methods, which establish a direct letter-by-letter correspondence, cannot handle.” This is what “homology” means: similar, but not an exact match. Anton's team looks for similar, but not exact, repeats within the DNA code. “We abstracted away from the literal text and moved to its statistical profiles: we no longer see the individual letters A, T, G, and C; instead we see statistics along the text, which we can process using the approximation methods of spectral analysis.”
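To make the idea of a “statistical profile” more concrete, here is a minimal sketch. It is not the NASCA implementation: the choice of GC content as the statistic, the window size, the polynomial degree, and all function names are assumptions made for illustration. It turns a DNA string into a numeric profile, approximates that profile with a few Chebyshev coefficients using NumPy's `chebfit`, and compares two sequences by the distance between their coefficient vectors.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def gc_profile(seq, window):
    """Fraction of G/C letters in each non-overlapping window (assumed statistic)."""
    is_gc = np.array([ch in "GC" for ch in seq.upper()], dtype=float)
    n_windows = len(is_gc) // window
    return is_gc[: n_windows * window].reshape(n_windows, window).mean(axis=1)


def chebyshev_spectrum(profile, degree):
    """Least-squares Chebyshev coefficients of a profile sampled evenly on [-1, 1]."""
    x = np.linspace(-1.0, 1.0, len(profile))
    return C.chebfit(x, profile, degree)


def spectral_distance(seq_a, seq_b, window=4, degree=3):
    """Euclidean distance between the Chebyshev spectra of two equal-length sequences."""
    spec_a = chebyshev_spectrum(gc_profile(seq_a, window), degree)
    spec_b = chebyshev_spectrum(gc_profile(seq_b, window), degree)
    return float(np.linalg.norm(spec_a - spec_b))


original = "GGGG" * 4 + "AAAA" * 4
mutated = "A" + original[1:]            # one letter changed
unrelated = "AAAA" * 4 + "GGGG" * 4     # profile flipped end to end

print(spectral_distance(original, mutated))
print(spectral_distance(original, unrelated))
```

In this toy setup a single changed letter moves the spectrum only slightly, while an unrelated sequence moves it a lot, which is the sense in which a spectral comparison tolerates inexact repeats.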
A fragment of a matrix computed for a sequence of one and a half million base pairs; each colored pixel of the matrix stands not for a single repeat but for a repeat of 500 nucleobases. Similar segments are marked in red (the red diagonal is, of course, the sequence matching itself); reverse sequences are shown in green.

This map is a ready-made scientific tool for genome research. Anton's team has already found a repeat that is very difficult to detect using well-known methods: “Ruslan sent our find to the Genetic Information Research Institute database, and we hope that our method will also take its place in the arsenal of modern genetics.”
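The structure of such a map can be illustrated with a toy sketch. The assumptions are significant: it compares windows by exact string equality rather than by spectral similarity, it treats “reverse” as plain string reversal, and the function name and the letters `D`/`R` (standing in for the red and green pixels) are invented for this example.

```python
def repeat_map(seq, window):
    """Compare all non-overlapping windows of seq pairwise.

    Each cell is 'D' for a direct match, 'R' for a reversed match,
    and '.' when the two windows do not match either way.
    """
    chunks = [seq[i:i + window] for i in range(0, len(seq) - window + 1, window)]
    grid = []
    for a in chunks:
        row = []
        for b in chunks:
            if a == b:
                row.append("D")
            elif a == b[::-1]:
                row.append("R")
            else:
                row.append(".")
        grid.append("".join(row))
    return grid


# Windows: ACGT, TTAA, ACGT, TGCA (the last one is ACGT reversed).
for line in repeat_map("ACGTTTAAACGTTGCA", 4):
    print(line)
```

As in the figure, the main diagonal is always a direct match, because every window matches itself.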
“We learned about the competition at the Intel laboratory at Moscow State University. We are interested in multi-core architectures because our method parallelizes very well on them, thanks to the technique we developed for computing the expansion coefficients. The current implementation of our work uses the popular OpenMP and Intel IPP packages and speeds up almost linearly on multi-core architectures. On Intel's 32-core test system, we achieved a 27-fold speedup of the algorithmic part of our program.”

“It is already clear that we learned quite a lot while testing our algorithms on the remote 32-core machine provided by Intel. Parallel programming makes you think and strive for beautiful solutions.”
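A quick way to read the 27-fold figure on 32 cores is through parallel efficiency and Amdahl's law. The sketch below is only a back-of-the-envelope estimate: it assumes the speedup follows Amdahl's model exactly, which the article does not claim, and the function names are made up for this example.

```python
def parallel_efficiency(speedup, cores):
    """Speedup achieved per core, where 1.0 would be perfectly linear scaling."""
    return speedup / cores


def amdahl_serial_fraction(speedup, cores):
    """Serial fraction s that solves speedup = 1 / (s + (1 - s) / cores)."""
    return (cores / speedup - 1) / (cores - 1)


def amdahl_speedup(serial_fraction, cores):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


efficiency = parallel_efficiency(27, 32)     # 27 / 32 = 0.84375
serial = amdahl_serial_fraction(27, 32)      # 5 / 837, roughly 0.006
print(f"efficiency: {efficiency:.1%}, implied serial fraction: {serial:.2%}")
```

Under that assumption, about 84% efficiency on 32 cores corresponds to well under 1% of the algorithmic part running serially, which is consistent with the “almost linear” scaling described in the quote.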
Ekaterina Zhmud from Novosibirsk State University also uses multi-core systems for an extremely beautiful kind of code. “Our project is related to coding theory and deals with algorithms for computing the automorphism groups of q-ary codes. It does not sound very clear, but in general automorphism groups are widely used in modern cryptography, which is becoming an increasingly important part of the technical world. My part of the work on this project concerns finding the symmetry groups of combinatorial objects, not necessarily codes. In the future, we are going to build a special tool that cryptographers could use.”
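For readers who have not met automorphism groups of codes before, a brute-force toy example may help. This is not Ekaterina's algorithm: it only considers permutations of coordinates (the full automorphism group of a q-ary code can also involve symbol changes), it tries every permutation, so it is usable only for very short codes, and the function name is an assumption.

```python
from itertools import permutations


def permutation_automorphisms(code):
    """Return all coordinate permutations that map the set of codewords onto itself."""
    words = set(code)
    n = len(next(iter(words)))
    result = []
    for perm in permutations(range(n)):
        # Apply the permutation to every codeword and compare the two sets.
        image = {tuple(word[i] for i in perm) for word in words}
        if image == words:
            result.append(perm)
    return result


# A small binary code of length 4, written as tuples of symbols.
code = {(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)}
group = permutation_automorphisms(code)
print(len(group))  # 8 of the 24 possible permutations preserve this code
```

The eight symmetries are exactly the permutations that keep the coordinate pairs {0, 1} and {2, 3} together. Real tools avoid checking all n! permutations, which is where the algorithmic work lies.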
Photo by: Maya Shelkovnikova

Like Anton's team, Ekaterina uses matrices to check the code; individual rows of a matrix, or parts of rows, can be processed independently, so the algorithm lends itself to a very high degree of parallelism. “We use VTune and Thread Checker to analyze the parallelization of our code,” Ekaterina adds.
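The “rows can be processed independently” pattern looks roughly like this. The sketch shows only the structure: the per-row work (counting nonzero entries) is a placeholder, the names are hypothetical, and Python threads will not speed up CPU-bound work because of the interpreter lock, so a real implementation would use processes or, as in the projects described here, native threads.

```python
from concurrent.futures import ThreadPoolExecutor


def row_weight(row):
    """Placeholder for the work done on one row: count its nonzero entries."""
    return sum(1 for x in row if x != 0)


def process_rows(matrix, workers=4):
    """Apply row_weight to every row; rows share no state, so they can run in any order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of the input rows.
        return list(pool.map(row_weight, matrix))


matrix = [[1, 0, 2], [0, 0, 0], [1, 1, 1]]
print(process_rows(matrix))
```

Because no row depends on another, the result is the same as a plain sequential loop, whatever the number of workers.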
Anton Pankratov and his team are thinking about using Intel's tools for testing and optimizing programs, but “for the time being we have not used any specialized debugging tools: we just write our programs very carefully. Our observations confirm the conventional wisdom that memory is the bottleneck in high-performance computing systems, so we use indexing and data compression, which relieve the load on memory and in turn have a positive effect on parallelization. Intel's help was also important in the organizational sense: in 2006, I went to a conference on parallel computing and got my bearings in the existing tools and parallel programming environments.”
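The article does not say which compression scheme the team uses. As one common illustration of how sequence data can be made lighter on memory, the sketch below packs each of the four letters into 2 bits, so four nucleotides fit into one byte. The encoding table and function names are assumptions.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding
LETTERS = "ACGT"


def pack(seq):
    """Pack a DNA string into bytes, four letters per byte (2 bits each)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, ch in enumerate(seq):
        out[i // 4] |= CODES[ch] << (2 * (i % 4))
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` letters from packed bytes."""
    return "".join(
        LETTERS[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


packed = pack("ATCGATTG")
print(len(packed), unpack(packed, 8))
```

The original length has to be stored separately, because a partially filled last byte is padded with zero bits that would otherwise decode as extra `A` letters.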
Alexey Nikolaev, director of Intel education programs in Russia, sums up the topic: “Both the competition and the laboratory's activities show that we are at the forefront of industry trends, scientific trends, and educational issues; combining them allows us to discover new methodological aspects, to provide new knowledge, and to determine the most effective direction for education to move in.”
* * *

Text taken from the Intel Galaxy project.

And in the afternoon, as part of the “5 weeks with Intel” program, one more article on supercomputers (HPC) will be posted, so you can already start preparing tricky questions.
To be continued.