Bioinformatics is a promising field of science and a rapidly growing industry. Applying information technology to biological research now makes it possible to test drugs in a virtual environment and to determine DNA sequences in a matter of hours. In this article we talk about bioinformatics and the work under way in this area at ITMO University.
What is bioinformatics
Many scientists agree that the purpose of bioinformatics is to study biological processes using modern computing technologies. In practice, specialists in the field use programs for visualizing amino acid sequences and develop algorithms based on probability theory and mathematical statistics. The original goal of bioinformatics was more general, however: in 1970 Paulien Hogeweg and Ben Hesper defined it as “the study of information processes in biotic systems”.
By that definition, the origins of the science can be traced to the 13th century, when Fibonacci built the first mathematical model of a breeding rabbit population. Since then, scientists have applied increasingly formal methods to describe biological processes. In 1953 came one of the most important events in the history of bioinformatics, and possibly of science in general: Francis Crick and James Watson revealed the structure of DNA, now familiar to everyone from high school.
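Fibonacci's rabbit model mentioned above is simple enough to write down in a few lines. The sketch below uses the classic idealized assumptions (a newborn pair matures after one month, every adult pair produces one new pair each month, and no rabbits die); the function name `rabbit_pairs` and the choice to start from a single newborn pair are illustrative, not taken from the article.

```python
def rabbit_pairs(months: int) -> int:
    """Return the number of rabbit pairs after `months` months.

    Idealized assumptions: we start with one newborn pair, a pair
    becomes adult after one month, each adult pair produces one new
    pair per month, and rabbits never die.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    young, adult = 1, 0  # one newborn pair at month 0
    for _ in range(months):
        # Last month's young pairs mature; every adult pair breeds.
        young, adult = adult, adult + young
    return young + adult


if __name__ == "__main__":
    for month in range(7):
        print(month, rabbit_pairs(month))
```

Each month's total is the sum of the two previous totals, which is exactly the recurrence behind the Fibonacci numbers.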
Two decades later, methods for DNA sequencing (determining the order of its nucleotides) were developed, and the first complete genome, that of the bacteriophage φX174, was obtained. Advances in sequencing technology sped the process up, which made it possible to assemble the genomes of yeast and of the fruit fly Drosophila.
A turning point in the history of bioinformatics was the assembly of the human genome in 2003: scientists from around the world had spent 13 years compiling its sequence piece by piece. This moment marks the start of the so-called postgenomic era of bioinformatics. Its main feature is a colossal volume of biological data that cannot be processed by hand. This is where digital technologies come in: they make it possible not only to interpret molecular data, whether nucleic acid sequences, amino acid sequences, or protein structures, but also to organize them into databases. For example, the GenBank data bank stores more than 11 billion genes from more than one hundred thousand organisms.
Incidentally, researchers themselves are not fond of the term “deciphering” the genome: they prefer to speak of “assembling” the genome or “determining its sequence”. This wording makes the point that even in areas scientists have studied for many years, unsolved problems remain. For example, part of the human genome sequence is still unknown.
Moreover, even knowing the entire genome sequence says nothing about its function. That is why many bioinformaticians now study the links between already known genes and their effect on the phenotype: in effect, researchers have to solve familiar problems, but faster and more accurately, using new methods and technologies.
Bioinformatics is closely intertwined with other sciences, in particular genomics and proteomics. Genomics studies the totality of genes in an organism. With a large collection of genomes, we can identify similarities and differences between the genotypes of living beings and so draw conclusions about the characteristics of individual species and about evolution in general; this is the subject of comparative genomics. Functional genomics studies the functions of genes and the influence of genes on one another. Structural genomics provides methods for building three-dimensional models of the proteins encoded by specific genes.
Proteomics studies the totality of the products of gene expression: proteins. Comparative proteomics, which compares the protein composition (the proteome) of living organisms, is developing especially actively. Comparing the proteomes of two organisms makes it possible to identify the reasons for the differences in their phenotypes, which in turn helps to understand the course of evolution. Comparative proteomics also makes it possible to identify proteins that contribute to the development of a disease and to test drugs against them.
On the one hand, bioinformatics is an interdisciplinary field that draws on molecular biology, genetics, mathematics and computer science. On the other hand, while it uses discoveries from these sciences, it also contributes significantly to their development. This is partly reflected in the names of modern technologies: decision trees, neural networks, and genetic algorithms.
Developments at ITMO University
ITMO University hosts numerous research projects in bioinformatics. In 2011 the Laboratory of Structural Bioinformatics was established; it carries out experiments on protein modeling and on predicting protein-protein interactions. One of the laboratory's latest developments is a method for studying protein dynamics based on the principle of mass transfer. The resulting model describes movements over relatively large distances adequately and eliminates the shortcomings of earlier models.
Andrey Kajava, one of the heads of the Research Institute of Bioengineering, considers identifying protein functions equally important. Random changes in protein structure can lead to neurodegenerative diseases such as Alzheimer's and Parkinson's. Bioinformatics makes it possible to study an amino acid sequence and predict the likelihood of these diseases. The ArchCandy method and program, developed by Kajava's research team, helps diagnose neurodegenerative diseases at an early stage.
Staff of the Computer Technologies Department have taken an active part in a number of research projects. Their path in bioinformatics began with the international de novo Genome Assembly Assessment Project. The participants developed and tested a genome assembly method that corrects errors in reads, the sequence fragments produced by sequencing machines.
Another paper by young researchers from ITMO University describes a method for assembling contigs (long contiguous DNA sequences built from overlapping reads) that splits the assembly into two stages: the first uses a de Bruijn graph, the second an overlap graph. A later paper describes a method in which one of the stages is microassembly: a de Bruijn graph is built from reads, and it turns out to be much smaller than the graph from the first stage, hence the name “microassembly”. The result of this work is the genome assembly program ITMO Assembler, which is available for download.
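To make the de Bruijn graph stage more concrete, here is a minimal sketch of the general idea, not the ITMO Assembler implementation. It assumes error-free reads from a single strand, treats each (k-1)-mer as a node and each k-mer as an edge, and extends a contig only while the current node has exactly one outgoing edge. The function names, the toy reads and the choice of k are all hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # The k-mer links its prefix (k-1)-mer to its suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Spell a contig by walking from `start` while the path does not branch.

    Simplification: only out-degree is checked, and the walk stops if it
    would revisit a node. Real assemblers also check in-degree, handle
    reverse complements and correct read errors.
    """
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in visited:
            break
        contig += next_node[-1]  # each step adds one new nucleotide
        visited.add(next_node)
        node = next_node
    return contig


if __name__ == "__main__":
    toy_reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # made-up overlapping reads
    graph = build_de_bruijn(toy_reads, k=4)
    print(extend_unambiguous(graph, "ATG"))
```

With these toy reads the walk reconstructs the sequence the reads were cut from. On real data the graph branches at repeats and sequencing errors, which is where later stages such as an overlap graph come in.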
This work continued with the participation of ITMO University staff in the MetaFast project. The aim of the project is to develop a software package for comparing metagenomes (the combined DNA of the microorganisms in a sample) from different environments. The DNA of organisms that are incapable of reproduction, such as viruses, is hard to assemble because it yields only fragmentary data. DNA databases hold too little data on viruses and bacteria for fragments of the obtained metagenomes to be compared against them, and deep analysis takes too long.
The program developed in the project works much faster because it performs only a partial assembly and comparison of genomes. In addition, the algorithm can detect patterns even in unfamiliar environments. According to Vladimir Ulyantsev, a researcher at ITMO University's computer technologies laboratory and the algorithm's lead developer, this approach helps find the microorganisms in patients that are responsible for a predisposition to a particular disease. By comparing the microflora of healthy and sick people, one can quickly identify the cause of a disease and take measures to eliminate it.
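The sketch below is not the MetaFast algorithm. It only illustrates the broader idea of comparing samples without assembling them fully or looking them up in a reference database. As an assumption made for illustration, each sample is reduced to a table of k-mer counts, and two samples are compared with the Bray-Curtis dissimilarity (0 for identical profiles, 1 for profiles with nothing in common). The function names and toy reads are hypothetical.

```python
from collections import Counter


def kmer_profile(reads, k):
    """Count every k-mer that occurs in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two k-mer count profiles."""
    total = sum(profile_a.values()) + sum(profile_b.values())
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    keys = set(profile_a) | set(profile_b)
    # Counter returns 0 for missing k-mers, so absent keys are safe.
    shared = sum(min(profile_a[key], profile_b[key]) for key in keys)
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    sample_a = kmer_profile(["ACGTACGT"], k=3)  # made-up reads
    sample_b = kmer_profile(["ACGTTT"], k=3)
    print(bray_curtis(sample_a, sample_b))
```

Computing such a dissimilarity for every pair of samples gives a distance matrix, which can then be clustered to see, for example, whether samples from healthy and sick people separate.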
MetaFast has been tested on a wide variety of environments, including ones with a high content of viruses. For example, scientists used it to show that microbes living underground are safe: samples taken in the New York subway turned out to belong, for the most part, to already known bacteria.
The new algorithm can also be useful for studying the processes of urbanization. The urban environment negatively affects our microflora, and modern food products destroy bacteria that the body needs. By comparing the metagenomes of inhabitants of large cities and of remote settlements, one can find out which bacteria are beneficial and how to preserve them.
ITMO University staff also took part in an international project to develop a web service for the integrated study of how cells work. The GAM program (genes and metabolites), developed by ITMO University graduate student Alexey Sergushichev, identifies links between genes and changes in metabolism.
For example, to study tumor development, the program takes input data on gene expression and on the concentrations of metabolites (simple substances involved in metabolism) and compares them with data in the KEGG database. It then builds a map of metabolic pathways that shows how substances change as a result of chemical reactions.
The service will be useful in treating cancer and diseases associated with an impaired immune system. Maps of metabolite changes help track the development of a tumor and devise ways to contain it at an early stage. Using the algorithm, scientists have already shown that slowing down the metabolic process in lung cancer reduces the rate of tumor growth.
Unlike its counterparts, the GAM web service is simple, efficient and, importantly, free, so anyone can use it. It is already used in dozens of laboratories and pharmaceutical companies.
Conclusion: briefly, for those interested in bioinformatics

Many students and graduates, including programmers and mathematicians, want to know how to get into bioinformatics. First, decide which problems you are interested in solving. The range of tasks in bioinformatics is very wide: from pure computer science and theorem proving to pure biology, which newcomers have to work hard to understand. Naturally, most research sits at the junction of several areas.
Next, find out where people are working on what interests you. To do this, read the papers of specific laboratories and judge whether you really want to take part in their work. In parallel, it does no harm to enroll in courses at the Institute of Bioinformatics or to look for online courses such as those offered on Coursera. This will give you an idea of what bioinformatics is working on now and how it operates.
It is important to understand that, because bioinformatics lies at the junction of several areas, its projects may involve not only applying computer science to problems in biology but also the reverse. A striking example is compiling a training schedule using DNA computers. Then there is synthetic biology, which tries to create or modify microorganisms for a specific purpose, for example to process biofuels better.
These projects, and bioinformatics in general, are a vivid example of how exciting and fascinating modern science can be, not only on the big screen but also in real life. And you do not have to study or work abroad to take part in such work: many interesting and significant bioinformatics projects are under way at Russian universities, ITMO University among them.