
Computers for Genomics

Sergey Naumenko, a researcher at the Laboratory of Evolutionary Genomics, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, told PostNauka about the laboratory, the supercomputers used to process genomic data, and the problems that have to be solved along the way.


The Human Genome Project, begun in the early 1990s, was completed in 2003. It produced the sequence of the human genome, although some gaps remained in it.
Below is a table showing the dynamics of research:


Sequencing that first human genome cost $3 billion and took a huge research effort organized as an international project. The situation has changed with the invention of new high-throughput sequencing instruments: 10 genomes can now be read in 2 weeks for a modest fee. The microbiologist Konstantin Severinov has already spoken about this, citing European and American practice, in which scientists outsource various practical and experimental tasks to third-party firms.

The main difficulty bioinformatics faces now is building a specialized computer and maintaining it. This requires a specialist who is responsible for its architecture, and such people are few in Russia. The reason one is needed: without a personally responsible architect, the architecture will be assembled from existing off-the-shelf solutions that may not suit the task of processing genomic data.

But once there is an architect and a computer, new tasks arise:

1. Choosing an operating system.
For example, the Laboratory of Evolutionary Genomics at the Faculty of Bioengineering and Bioinformatics of Lomonosov Moscow State University runs Scientific Linux, a distribution based on the enterprise distribution Red Hat Enterprise Linux and developed under the lead of CERN and Fermilab.

2. Choosing file systems. MSU uses Lustre (a distributed file system that is commonly used on very powerful supercomputers, Lomonosov in particular, and that spreads the load across disk arrays) as well as OCFS2 and XFS.

3. Monitoring the system. At MSU this is done with Nagios. Such a system is needed so that the administrator knows immediately if something has broken or shut down.

4. Configuring the nodes. At MSU this is handled by Puppet, which configures all nodes automatically.
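
The monitoring task in item 3 can be illustrated with a small check script. This is only a sketch, not the laboratory's actual setup: it follows the standard Nagios plugin convention (one line of status text, with the state reported as 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN), while the function names and the 80% and 90% thresholds are assumptions chosen for illustration.

```python
import shutil

# Nagios plugin convention: the state is reported as a small integer.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ("OK", "WARNING", "CRITICAL")


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a state (thresholds are assumed values)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (state, one-line message) for the file system holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The path is missing or unreadable: report UNKNOWN instead of crashing.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    used_percent = 100.0 * usage.used / usage.total
    state = classify(used_percent, warn, crit)
    return state, f"DISK {LABELS[state]} - {path} is {used_percent:.1f}% full"
```

A real plugin would print the message and exit with the returned state as its process exit code, which is what the monitoring server reads to decide whether to alert the administrator.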

All of the above is routine for specialists, but when biologists run into these problems they start solving them from scratch. In Russia, laboratories are only beginning to be equipped with sequencers, and as that happens the problem of hardware for processing sequencing data arises, because biologists, as Naumenko admits, cannot manage anything more powerful than a laptop.
As a result they are forced to turn for help either to vendors or to physicists, who have been using such computers since the calculations for the atomic and hydrogen bombs. However, the physicists' computers do not fit biologists' tasks.

How a supercomputer differs from the computer in the Laboratory of Evolutionary Genomics at Moscow State University: a supercomputer (a list can be found on supercomputers.ru) is a powerful machine with a huge number of processors, a very fast interconnect between them, and relatively little data storage. Biologists need a computer that can hold a huge amount of data and move it at high speed but has relatively modest computing power. In other words, the number of processors in it is comparable to the number of storage devices.

The MSU laboratory has such a computer. It has about 500 terabytes of disk arrays, roughly one third of the disk storage of the Lomonosov supercomputer, and about 300-400 cores.
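
To make the balance between storage and compute concrete, the figures quoted above can be turned into a storage-per-core ratio. The sketch below uses only the approximate numbers from the text; the total for Lomonosov is not stated in the source and is merely what "about one third" implies.

```python
# Approximate figures quoted in the article.
lab_storage_tb = 500
lab_cores_range = (300, 400)

# "About one third" of Lomonosov's disk storage implies roughly this total
# (an inference from the text, not a published specification).
implied_lomonosov_storage_tb = lab_storage_tb * 3


def storage_per_core(storage_tb, cores):
    """Terabytes of disk available per CPU core."""
    return storage_tb / cores


for cores in lab_cores_range:
    ratio = storage_per_core(lab_storage_tb, cores)
    print(f"{cores} cores: {ratio:.2f} TB per core")
print(f"Implied Lomonosov disk storage: about {implied_lomonosov_storage_tb} TB")
```

With these numbers the laboratory machine has somewhere between 1.25 and 1.67 TB of disk per core, which is the "lots of storage, modest compute" profile described above.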


The Lomonosov supercomputer

The computer in the Laboratory of Evolutionary Genomics at Moscow State University handles the following tasks: it receives data from two sequencers, assembles genomes de novo from short reads, annotates them (that is, marks up the regions that encode proteins and those that do not), and does basic processing of the raw data coming from the sequencers.

To handle these tasks, nodes with large memory had to be built: de novo genome assembly needs a lot of RAM, 512 GB.
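
The source does not say which assembler the laboratory uses. As a hedged illustration of why assembly is memory-hungry: many short-read assemblers keep an index of every k-mer (length-k substring) of the reads in memory, and that index grows with the size of the genome and the volume of data. The toy reads and the function name below are invented for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Three overlapping toy reads (invented data, far shorter than real reads).
reads = ["ACGTAC", "CGTACG", "GTACGT"]
kmers = count_kmers(reads, k=4)
print(len(kmers), "distinct 4-mers held in memory")
```

Here the table holds only a handful of entries, but for a real genome the same structure contains an enormous number of distinct k-mers, which is one reason such work calls for nodes with hundreds of gigabytes of RAM.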

One problem that, oddly enough, is acute is electricity: even at MSU there are outages, so powerful uninterruptible power supplies have to be installed that can keep the whole computer system running for anywhere from 5 minutes to half an hour.
High-throughput sequencers made it possible for the first time to obtain cheap genomic data at the level of the complete genome, rather than particular parts of particular genes, and this opened up entirely new possibilities in evolutionary and medical genomics. One can take a population, read the genomes of 20-50 samples at once, and study its entire population genetics from the genotypes of those organisms alone. In medicine, any prediction requires many replicates: 50-100 patients have to be sequenced before anything can be said. That is why sequencers are needed and why such huge volumes of data have to be processed.
As a result, biologists now have to retrain: learn to write programs and study statistics in order to understand what the data mean.

“Maybe someday this era will pass and give way to a more sensible approach, in which experiments are planned with an understanding of which data should be obtained and which are not needed. But now is the time of conquering the prairies, when everything that can be sequenced is sequenced, every dataset within reach is collected, and people then try to process it. So the need for computers will keep growing until a more mature research methodology is developed,” says Sergey Naumenko.

Source: https://habr.com/ru/post/204076/

