Hello again! We are sharing a publication that was translated specifically for students of the course “Neural Networks in Python”.

Today we describe DeepMind's first significant milestone in showing how artificial intelligence research can drive new scientific discoveries. Drawing on the interdisciplinary nature of our work, DeepMind has brought together experts from structural biology, physics, and machine learning to apply advanced methods to predicting the three-dimensional structure of a protein based solely on its genetic sequence.
The AlphaFold system, which we have been working on for the past two years, builds on years of prior research in using extensive genomic data to predict protein structure. The three-dimensional protein models that AlphaFold generates are much more accurate than any obtained before. This marks significant progress on one of the core challenges in biology.
What is the protein folding problem?
Proteins are large, complex molecules that are essential for sustaining life. Almost every function of our body, be it muscle contraction, light perception, or the conversion of food into energy, can be traced back to one or more proteins and how they move and change. The recipes for these proteins, called genes, are encoded in our DNA.
The properties of a protein depend on its unique three-dimensional structure. For example, the antibody proteins that make up our immune system are Y-shaped and resemble special hooks. By latching onto viruses and bacteria, antibody proteins can detect and tag pathogens for subsequent destruction. Similarly, collagen proteins are shaped like cords, which transmit tension between cartilage, ligaments, bones, and skin. Other types of proteins include Cas9, which, guided by CRISPR sequences, acts like scissors that cut DNA and paste in new sections; antifreeze proteins, whose three-dimensional structure allows them to bind to ice crystals and prevent organisms from freezing; and ribosomes, which act like a programmed assembly line that helps build proteins.
Determining the three-dimensional structure of a protein solely from its genetic sequence is a complex task that scientists have struggled with for decades. The problem is that DNA contains information only about the sequence of a protein's building blocks, called amino acid residues, which form long chains. Predicting how these chains will fold into the intricate 3D structure of a protein is known as the “protein folding problem”.
The larger the protein, the harder it is to model, because there are more interactions between amino acids to take into account. As Levinthal's paradox points out, enumerating all possible configurations of a typical protein before reaching the correct three-dimensional structure would take longer than the age of the universe.
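The scale of this argument is easy to check with rough arithmetic. The sketch below is only an illustration: the chain length, the number of conformations per residue, and the sampling rate are assumed round numbers, not values taken from this article.

```python
# Back-of-the-envelope estimate in the spirit of Levinthal's paradox.
# Every constant below is an illustrative assumption, not a measured value.
RESIDUES = 100                  # assumed chain length of a small protein
STATES_PER_RESIDUE = 3          # assumed conformations available to each residue
SAMPLES_PER_SECOND = 1e13       # assumed rate at which conformations are tried
AGE_OF_UNIVERSE_YEARS = 1.38e10
SECONDS_PER_YEAR = 3.156e7


def brute_force_years(residues, states, rate):
    """Years needed to visit every conformation exactly once."""
    conformations = states ** residues
    return conformations / rate / SECONDS_PER_YEAR


years = brute_force_years(RESIDUES, STATES_PER_RESIDUE, SAMPLES_PER_SECOND)
print(f"conformations: {STATES_PER_RESIDUE ** RESIDUES:.2e}")
print(f"years to enumerate: {years:.2e}")
print(f"multiples of the universe's age: {years / AGE_OF_UNIVERSE_YEARS:.2e}")
```

Under these assumptions the exhaustive search takes on the order of 10^27 years, which is why real proteins cannot be folding by trying every configuration, and why prediction methods need something smarter than enumeration.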

Why is protein folding important?
The ability to predict the shape of a protein is extremely useful because it is fundamental to understanding the protein's role in the body, as well as to diagnosing and treating diseases such as Alzheimer's, Parkinson's, Huntington's, and cystic fibrosis, which are believed to be caused by misfolded proteins.
We are especially excited that the ability to predict the shape of a protein can improve our understanding of how the body works, which will allow new drugs to be developed more effectively. As modeling gives us more information about the shapes of proteins and how they work, new opportunities for drug discovery open up and the cost of experiments falls. Ultimately, these discoveries could improve the quality of life for millions of patients around the world.
Understanding the protein folding process can also help with protein design, which could bring significant benefits. For example, advances in biodegradable enzymes made possible by protein design could help manage pollutants such as plastic and oil, breaking down waste in ways that are less harmful to the environment. In fact, researchers have already begun to design bacteria that produce proteins that make waste biodegradable and easier to process.
To stimulate research and measure progress on the newest methods for improving prediction accuracy, a large-scale biennial competition, the Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP), was established in 1994 and has become the gold standard for assessing techniques.
How does AI change the situation?
Over the past five decades, scientists have been able to determine the shapes of proteins in the laboratory using experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance, or X-ray diffraction, but each method depends on a great deal of trial and error, which can take years and cost tens of thousands of dollars. That is why biologists are now turning to AI methods as an alternative to this long and laborious process for difficult proteins.
Fortunately, the field of genomics is rich in data thanks to the rapid fall in the cost of genetic sequencing. As a result, deep learning approaches to the prediction problem that rely on genomic data have become increasingly popular over the past few years. DeepMind's work on this problem resulted in AlphaFold, which we submitted to CASP this year. We are proud to be part of what the CASP experts called “unprecedented progress in the ability of computational methods to predict protein structure”. As a result, we placed first in the team rankings (our entry is A7D).
Our team focused specifically on the task of modeling target shapes from scratch, without using previously solved proteins as templates. We achieved a high degree of accuracy in predicting the physical properties of a protein structure, and then used two different methods to construct predictions of full protein structures.
Using neural networks to predict physical properties
Both of these methods relied on deep neural networks that are trained to predict properties of a protein from its genetic sequence. The properties our networks predict are (a) the distances between pairs of amino acids and (b) the angles between the chemical bonds that connect those amino acids. The first of these is an advance on commonly used techniques that estimate whether pairs of amino acids are close to each other.
We trained a neural network to predict a separate distribution of distances for every pair of residues in a protein. These probabilities were then combined into a score that estimates how accurate a proposed protein structure is. We also trained a second neural network that uses all the distances in aggregate to estimate how close the proposed structure is to the right answer.
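One plausible way to turn per-pair distance distributions into a single score is a negative log-likelihood: look up the probability the network assigned to each distance actually observed in a candidate structure and sum the negative logarithms. The sketch below assumes that combination rule; the bin edges, the probabilities, and all function names are made up for illustration and are not taken from AlphaFold.

```python
import math

# Hypothetical distance bins (in angstroms) shared by all residue pairs.
BIN_EDGES = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0]


def bin_index(distance):
    """Index of the bin containing `distance`; the last bin catches overflow."""
    for i in range(len(BIN_EDGES) - 1):
        if distance < BIN_EDGES[i + 1]:
            return i
    return len(BIN_EDGES) - 2


def pair_distances(coords):
    """Map each residue pair (i, j) with i < j to its Euclidean distance."""
    return {
        (i, j): math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    }


def structure_score(coords, predicted):
    """Negative log-likelihood of the structure's distances; lower is better."""
    score = 0.0
    for pair, d in pair_distances(coords).items():
        prob = predicted[pair][bin_index(d)]
        score -= math.log(max(prob, 1e-9))  # clamp to avoid log(0)
    return score


# Made-up per-pair distributions standing in for a network's output.
predicted = {
    (0, 1): [0.90, 0.05, 0.03, 0.01, 0.01],
    (0, 2): [0.10, 0.80, 0.05, 0.03, 0.02],
    (1, 2): [0.85, 0.10, 0.03, 0.01, 0.01],
}
compact = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0)]
stretched = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (18.0, 0.0, 0.0)]
print(structure_score(compact, predicted))
print(structure_score(stretched, predicted))
```

The compact candidate places every pair in the bin the toy distributions favor, so it receives the lower (better) score, while the stretched candidate is penalized for distances the distributions consider unlikely.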


New methods for predicting protein structures
Using these scoring functions, we were able to search for structures that match our predictions. Our first method builds on techniques widely used in structural biology: it repeatedly replaces pieces of a protein structure with new fragments. We trained a generative neural network to propose new fragments, which are used to continually improve the score of the proposed protein structure.
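The fragment-replacement loop can be sketched as a simple search: splice a proposed fragment into the current structure and keep the change only when the score improves. The toy below works on a 2D chain described by turning angles, uses a squared distance error as a stand-in score, and replaces the generative network with a random proposal function; all of these are simplifying assumptions for illustration, not the actual method.

```python
import math
import random


def chain_coords(angles, bond=1.0):
    """Build a 2D chain by turning by each angle and stepping one bond length."""
    x = y = heading = 0.0
    coords = [(x, y)]
    for a in angles:
        heading += a
        x += bond * math.cos(heading)
        y += bond * math.sin(heading)
        coords.append((x, y))
    return coords


def score(angles, target):
    """Toy score: squared error between chain distances and target distances."""
    c = chain_coords(angles)
    return sum((math.dist(c[i], c[j]) - d) ** 2 for (i, j), d in target.items())


def propose_fragment(rng, length):
    """Stand-in for the generative network: random turning angles."""
    return [rng.uniform(-math.pi / 2, math.pi / 2) for _ in range(length)]


def fragment_search(angles, target, steps=2000, frag_len=3, seed=0):
    """Repeatedly splice in a fragment and keep it only if the score improves."""
    rng = random.Random(seed)
    best, best_score = list(angles), score(angles, target)
    for _ in range(steps):
        start = rng.randrange(len(best) - frag_len + 1)
        candidate = list(best)
        candidate[start:start + frag_len] = propose_fragment(rng, frag_len)
        s = score(candidate, target)
        if s < best_score:
            best, best_score = candidate, s
    return best, best_score


# Target distances come from a hidden "true" chain; the search starts straight.
true_angles = [0.6, -0.4, 0.9, 0.3, -0.7, 0.5]
true_coords = chain_coords(true_angles)
target = {
    (i, j): math.dist(true_coords[i], true_coords[j])
    for i in range(len(true_coords))
    for j in range(i + 1, len(true_coords))
}
start_angles = [0.0] * len(true_angles)
initial_score = score(start_angles, target)
found_angles, final_score = fragment_search(start_angles, target)
print(initial_score, final_score)
```

Because a candidate is accepted only when it lowers the score, the search can never get worse; a learned proposal function would, in principle, make useful fragments far more likely than the uniform random ones used here.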

The second method optimized the scores using gradient descent (a mathematical technique commonly used in machine learning to make small, incremental improvements), which resulted in highly accurate structures. This method was applied to whole protein chains rather than to pieces that must be folded separately before being assembled, which reduces the complexity of the prediction process.
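To make the idea of gradient descent on a whole structure concrete, here is a minimal sketch that moves all points of a small 2D structure at once so that their pairwise distances approach a set of target distances. The squared-error loss, the learning rate, and the four-point example are assumptions chosen for illustration; they are not the scoring function described in this article.

```python
import math


def loss_and_grad(coords, target):
    """Squared distance error and its gradient with respect to every coordinate."""
    loss = 0.0
    grad = [[0.0, 0.0] for _ in coords]
    for (i, j), d_target in target.items():
        dx = coords[i][0] - coords[j][0]
        dy = coords[i][1] - coords[j][1]
        d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
        err = d - d_target
        loss += err ** 2
        # d(err^2)/d(coords[i]) = 2 * err * (coords[i] - coords[j]) / d
        gx = 2 * err * dx / d
        gy = 2 * err * dy / d
        grad[i][0] += gx
        grad[i][1] += gy
        grad[j][0] -= gx
        grad[j][1] -= gy
    return loss, grad


def descend(coords, target, lr=0.05, steps=500):
    """Take small steps against the gradient, updating all points together."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        _, grad = loss_and_grad(coords, target)
        for p, g in zip(coords, grad):
            p[0] -= lr * g[0]
            p[1] -= lr * g[1]
    return coords


# Target distances of a unit square: four sides of 1 and two diagonals of sqrt(2).
target = {
    (0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0,
    (0, 2): math.sqrt(2), (1, 3): math.sqrt(2),
}
start = [(0.0, 0.0), (1.5, 0.2), (1.2, 1.4), (-0.3, 0.8)]
initial_loss, _ = loss_and_grad(start, target)
final_coords = descend(start, target)
final_loss, _ = loss_and_grad(final_coords, target)
print(initial_loss, final_loss)
```

Every point is updated in every step, which mirrors the idea of optimizing the entire chain together instead of folding pieces separately and assembling them afterwards.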
What's next?
The success of our first foray into protein folding shows that machine learning systems can integrate diverse sources of information to help scientists quickly come up with creative solutions to complex problems. Just as we have seen AI help people master complex games through systems such as AlphaGo and AlphaZero, we hope that one day AI breakthroughs will also help humanity solve fundamental scientific problems.
It is exciting to see these first signs of progress in protein folding, which demonstrate the usefulness of AI for scientific discovery. Although we still have a lot of work to do, we know that we can contribute to treating diseases, helping the environment, and much more, because the potential is enormous. With a dedicated team focused on how machine learning can advance science, we are exploring the many ways in which our technology can influence the world around us.