RNAInSpace and folding of tRNA - season closing, new season - Structural alignment

Less than a year and a half later, I have managed to fold the tertiary structure of tRNA. As a reminder, I wrote about this topic earlier on Habr in "Development of RNAInSpace, CRA algorithm, code problems on Linux and others". I should say that I did not work on this for about a year, but in that time my second scientific article on the topic, “Application of game theory for the task of folding ribonucleic acids”, was published (for those who want to discuss it professionally). Recently, though, I obtained the tertiary structure of tRNA and checked it against the sample in the PDB database, which was obtained experimentally (by crystallography).

Under the cut: 3D renderings of the tRNA structure, explanations, and future plans.

Tertiary structure of tRNA - results

I could have made a video of the folding, but I was too lazy, and it would show very little. As an example of the early stages you can watch this, then this; after that, folding turns the chain into the tRNA shown in the figure.

The figures show the tRNA from two viewing angles. Green is the model I obtained; red is the model from the PDB database. For the specialists: RMSD = 6.71 (a measure of how similar the two models are). As we can see, the overall shape is almost the same. In my model, almost all standard hydrogen bonds are formed, and the non-canonical hydrogen bonds are close to forming.
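For readers unfamiliar with RMSD, here is a minimal sketch of how such a number can be computed. It assumes the two models are already superimposed and that atom i in one list corresponds to atom i in the other. The function name `rmsd` and the toy coordinates are illustrative assumptions, not part of RNAInSpace, and the article does not say which atoms or which superposition were used for the 6.71 figure.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: both models are already superimposed and point i in one
    list corresponds to point i in the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two corresponding points.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical toy coordinates, not taken from any real model.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 0.0, 2.0)]
print(rmsd(model, reference))
```

A lower value means the models are closer; identical models give 0.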

It should be noted (or recalled, if you have read my earlier articles) that I obtain the tertiary model from the primary structure alone (so-called de novo), provided that the hydrogen bonding sites can be predicted and the critical stacking sites found. If there is interest, I am ready to explain the details and discuss these results.

Season closing

Having brought this line of work to a logical conclusion, I would like to use this article to close the series I have been writing on Habr. In essence, I have achieved my goals, and here is the story:

1. The first article on the Internet dates back to 2009. It poses the folding problem in the spirit of cybernetic ideas.
2. Next, I tried to develop an open source project at Wikiversity.

The main thesis was this: "you can get serious results knowing only a certain minimum and without a specialized education in biology, physics, or chemistry." I now have no doubt that I have obtained serious results, and that the method I developed surpasses all other methods that exist at the moment.

So, gentlemen, do not be afraid to start. Along the way you will meet plenty of opposition and criticism from people who know little about the subject but are eager to show off their erudition. Once you have results, they will have to retreat.

3. I had to abandon many modern approaches in this field. Sometimes I even had the feeling that methods were being applied not to solve the problem but to demonstrate how a particular method works. At first I pinned my hopes on some of them, including artificial intelligence methods, but they turned out to be unsuitable. Only the general ideology of game theory and the agent-based approach proved suitable. Beyond that, everything comes down to particular heuristics for finding the objective function. (In more detail, the algorithms I developed contain some subtle refinements, but that level of immersion is beyond the scope of this article.)

4. For me personally, two articles in refereed journals on this topic are enough. Thank you for your attention.
5. In effect, I have developed a method and an approach; the rest is up to technology and followers.

6. That brings me to the question of "what for, and why?". More on this in the next section.

"The difference between the living and the non-living"

Back in the first article I answered the question of why one should study the three-dimensional structure of RNA (apart from its being interesting in itself and possibly useful to biologists).

We have a clear biological task: "To find out exactly which changes in the three-dimensional structure of an RNA chain of 50-100 nucleotides, and how large, fundamentally affect whether that chain is a ribozyme." In other words, which mutations of a ribozyme improve or worsen its capacity for self-replication, up to losing it entirely. Put in popular terms, this would be a detailed answer to the question of how the living differs from the non-living.

Of course, looking back, this now seems somewhat naive. Nevertheless, it makes a certain sense. I will try to explain.

Earlier I repeatedly pointed out that the modern theory of sequence alignment is fundamentally flawed: in essence it lets you fit the results rather than obtain the true picture. I also wrote, in “Genomes of sequenced organisms - errors in bases”, that annotations in biological databases contain many errors, and those who work on them were forced to agree.

Now, looking back, I can say that at the time, essentially without knowing bioinformatics, I "made a bet" in my first article on so-called structural alignment. This means finding genes in the genome and then comparing genomic sequences in a way that takes into account NOT the mutations of individual nucleotides and their statistics, but the tertiary structure of functionally similar genes.

Indeed, my approach to obtaining the tertiary structure now makes it possible to judge whether a given nucleotide sequence can fold into a particular structure. This means we can work out which parts of the nucleotide sequence must be conserved and where mutations are possible.

None of this information, which really does affect the ability of a tRNA, a ribozyme, or any other RNA structure to function, is used in a simple analysis (alignment). That means there will inevitably be errors, and they will not even be noticeable to a researcher who pays no attention to the functionality of the tertiary structure. The statistical approach now commonly used for this only obscures the question further.

Now that we know the tertiary structure (approximately), we can build what I will call a functional profile, for example of tRNA. After that, and only after that, can we locate all tRNAs in the DNA with sufficient accuracy.

But building this functional profile is not so easy. It turns out that there are very few 100% conserved regions: in absolute terms, almost everything can change. To see this, consider the example of tRNA.

So let's compare two tRNAs:

gcgcggauagcucagucgguagagcaggggauugaaaauccccguguccuugguucgauuccgaguccgcgc
gcggauuuagcucaguugggagagcgccagacugaagucuggagguccuguguucgauccacagaauucgca

Try to align these two tRNAs and say how they differ. In reality the problem is much worse: the sequences are not singled out as in this example but sit among millions of similar g, c, a, u characters, and we do not know where the tRNA we need is.

You can, of course, busy yourself with nonsense and align these characters, guessing where gaps and insertions arose during mutation.

But there is a simpler way: let's find the hydrogen bonds, starting with at least the classical ones. We get:

(((((((.. ((((........)))). (((((((...)))) () (((.......))))))))))))
(((((((.. ((((........)))). ((((((....))))) ..... ((( ((.......)))))))))))).

Doesn't that look more encouraging? The difference turns out to be not so big. We need to allow tolerances of plus or minus 1-3 dots (unpaired nucleotides) and 1-3 pairs of brackets (nucleotides paired by a hydrogen bond). For greater accuracy, we can also match the non-canonical hydrogen bonds (which stabilize the structure at the 3D level).
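The tolerance idea can be sketched in a few lines. The code below collapses a dot-bracket string into runs of identical symbols and treats two structures as similar when the runs come in the same order and each run length differs by no more than a tolerance. The names `profile` and `similar` and the toy hairpins are my illustrative assumptions, not the comparison actually used in RNAInSpace.

```python
from itertools import groupby

def profile(structure):
    """Collapse a dot-bracket string into runs: '((..))' -> [('(', 2), ('.', 2), (')', 2)]."""
    # Keep only structure symbols; spaces and any other characters are ignored.
    symbols = [ch for ch in structure if ch in "()."]
    return [(symbol, len(list(run))) for symbol, run in groupby(symbols)]

def similar(structure_a, structure_b, tolerance=3):
    """True if both structures have the same order of runs and every
    run length differs by at most `tolerance`."""
    runs_a, runs_b = profile(structure_a), profile(structure_b)
    if len(runs_a) != len(runs_b):
        return False
    return all(
        sym_a == sym_b and abs(len_a - len_b) <= tolerance
        for (sym_a, len_a), (sym_b, len_b) in zip(runs_a, runs_b)
    )

# Hypothetical toy hairpins, not the tRNA structures from the article.
hairpin_a = "((((....))))"
hairpin_b = "(((......)))"
two_hairpins = "((((....))))..((..))"

print(similar(hairpin_a, hairpin_b))     # stems differ by 1, loop by 2
print(similar(hairpin_a, two_hairpins))  # different number of runs
```

This simple version rejects any pair of structures with a different number of runs, so a bulge or loop that disappears entirely would need a more forgiving comparison, for example an alignment of the runs themselves.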

Of course, it is still difficult to find these structures among millions of g, c, a, u characters. But there is a landmark here. We split the task into parts and look not for all tRNAs, but only for those that carry phenylalanine. In that case we know for certain that the sequence gaa sits in the center. We can then search the genome for all sequences that have gaa in the middle and also match the corresponding profile:

(((((((.. ((((........)))). ((((((((gaa)))))) ..... ((((( (.......))))))))))))
(((((((.. ((((........)))). ((((((gaa.))))) ..... (((((( .......)))))))))))).

All within the permitted structural tolerances.
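As a rough illustration of the first step of such a search, the sketch below scans a sequence for gaa and keeps only the hits that sit in a loop closed by a stem of complementary nucleotides. The fixed layout (a 5-nucleotide stem, 2 loop nucleotides on each side of gaa), the function name `find_anticodon_candidates`, the `min_pairs` threshold, and the toy sequence are all assumptions for the example. A real search would also check the rest of the profile and allow the tolerances discussed above.

```python
# Classical pairs plus the g-u wobble pair.
PAIRS = {("g", "c"), ("c", "g"), ("a", "u"), ("u", "a"), ("g", "u"), ("u", "g")}

def find_anticodon_candidates(seq, anticodon="gaa", flank=2, stem_len=5, min_pairs=4):
    """Return start indices of `anticodon` hits that sit in a loop closed by a stem.

    Assumed layout around each hit:
    stem_len nt | flank nt | anticodon | flank nt | stem_len nt
    """
    seq = seq.lower()
    hits = []
    pos = seq.find(anticodon)
    while pos != -1:
        loop_start = pos - flank
        loop_end = pos + len(anticodon) + flank
        # Skip hits too close to either end to have a full stem.
        if loop_start - stem_len >= 0 and loop_end + stem_len <= len(seq):
            left = seq[loop_start - stem_len:loop_start]
            right = seq[loop_end:loop_end + stem_len]
            # The 5' side of the stem pairs with the 3' side read backwards.
            paired = sum((l, r) in PAIRS for l, r in zip(left, reversed(right)))
            if paired >= min_pairs:
                hits.append(pos)
        pos = seq.find(anticodon, pos + 1)
    return hits

# Hypothetical toy sequence: stem "ccaga", loop "cugaaga", stem "ucugg".
toy = "uuuu" + "ccaga" + "cugaaga" + "ucugg" + "uuuu"
print(find_anticodon_candidates(toy))
```

Requiring `min_pairs` rather than a perfect stem leaves room for one mismatch, which is in the spirit of the tolerances above.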

That is what I plan to do in the near future: reliably find all tRNAs in the sequenced genomes of bacteria. If anyone wants to take part, you are welcome.

Source: https://habr.com/ru/post/230615/
