Background
I apologize up front for the complexity: both the problem itself and the solution are hard to apply, but the result is elegant and effective :)
It all started when I described one problem in the post about the problems of OOP. Then, by chance, the discussions got me thinking about design patterns, and in connection with the topic of “full copying of an object” I came across the Flyweight pattern. If you are not familiar with it, please read about it first in “Design Patterns: Elements of Reusable Object-Oriented Software” (in the book itself, not on Wikipedia).
The basic idea is this:
The Flyweight pattern describes how to share very fine-grained objects without prohibitively high costs. Each flyweight object has two parts: intrinsic and extrinsic state. The intrinsic state is stored (shared) in the flyweight and consists of information that does not depend on its context. The extrinsic state is stored or computed by client objects and passed to the flyweight when its methods are called.
Task
Let's look at the improvement on a concrete example. The example comes from bio-computations, but I will say very little about them: I will try to strip out the biology completely and leave only the structure.
P.S. If anyone is interested in the computational problems of RNA/protein folding, let me know and I will write a separate article.
So, there is an RNA object (RNA). It derives from the more general Chain class (a chain of molecules; other chains include DNA, proteins, etc.). Each RNA consists of many molecules, nucleotides (Molecule). A molecule can be one of four types (subclasses of Molecule): Cytosine, Uracil, Guanine, Adenine. Depending on its type, each molecule consists of 28-33 atoms (Atom). Each atom has three computed angles:
public class Angles
{
    private float phi;
    private float theta;
    private float d;
}
From the primary structure of the chain, for example aagaggucggcaccugacgu, we need to build a three-dimensional model, i.e. create about 1000 interconnected atoms that together form a graph. Building this graph takes a lot of time.
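To make the object model concrete, here is a minimal sketch of how such a graph might be built from the primary structure. Everything in it is an assumption for illustration: the class bodies, the `Create` helper, and the per-nucleotide atom counts (placeholders chosen inside the 28-33 range mentioned above, not real chemistry). The bonds between atoms, which turn the lists into a real graph, are left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the object model; class bodies are illustrative assumptions.
public class Angles
{
    public float Phi, Theta, D;
}

public class Atom
{
    public Angles Angles = new Angles();
}

public abstract class Molecule
{
    public readonly List<Atom> Atoms = new List<Atom>();

    protected Molecule(int atomCount)
    {
        for (int i = 0; i < atomCount; i++)
            Atoms.Add(new Atom());
    }
}

// Placeholder atom counts inside the 28-33 range from the text, not real chemistry.
public class Adenine : Molecule { public Adenine() : base(33) { } }
public class Guanine : Molecule { public Guanine() : base(32) { } }
public class Cytosine : Molecule { public Cytosine() : base(29) { } }
public class Uracil : Molecule { public Uracil() : base(28) { } }

public class Chain
{
    public readonly List<Molecule> Molecules = new List<Molecule>();
}

public class RNA : Chain
{
    // Builds one Molecule (with its atoms) per letter of the primary structure.
    // Bonds between atoms are omitted in this sketch.
    public RNA(string sequence)
    {
        foreach (char c in sequence.ToLowerInvariant())
            Molecules.Add(Create(c));
    }

    private static Molecule Create(char c)
    {
        switch (c)
        {
            case 'a': return new Adenine();
            case 'g': return new Guanine();
            case 'c': return new Cytosine();
            case 'u': return new Uracil();
            default: throw new ArgumentException("Unknown nucleotide: " + c);
        }
    }

    public int AtomCount
    {
        get
        {
            int n = 0;
            foreach (Molecule m in Molecules) n += m.Atoms.Count;
            return n;
        }
    }
}
```

For the 20-letter example sequence this creates 20 molecules and, with these placeholder counts, 618 atom objects; every copy made by rebuilding this structure pays that allocation cost again.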
Next, we need to run the bio-computations. The chain takes on some 3D configuration: the coordinates of each atom and its angles relative to other atoms are computed. The computation follows this scheme: we take the last “good” state of the RNA and try to “improve” it. We make, say, 1000 attempts to rotate one molecule by some angle. Of these 1000, only the single best variant is selected and fixed (it becomes the new “good” state), and the computation repeats.
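The scheme is a greedy local search. A sketch, assuming the state can be reduced to a plain `float[]` of angles and using a toy `Score` function (hypothetical; higher is better here), with one extra assumption: a round's best attempt is fixed only if it beats the current “good” state.

```csharp
using System;

// Hypothetical sketch: the state is reduced to a float[] of angles, and Score is a toy
// placeholder (higher is better here). The real rotation and scoring code is not shown.
public static class GreedyFolding
{
    // Toy score that peaks when every angle is zero.
    public static double Score(float[] angles)
    {
        double s = 0;
        foreach (float a in angles) s -= a * a;
        return s;
    }

    public static float[] Improve(float[] initial, int rounds, int attempts, int seed)
    {
        var rng = new Random(seed);
        float[] best = (float[])initial.Clone();   // the last "good" state
        double bestScore = Score(best);

        for (int r = 0; r < rounds; r++)
        {
            float[] roundBest = null;
            double roundBestScore = double.NegativeInfinity;

            for (int k = 0; k < attempts; k++)
            {
                // Every attempt starts from a fresh copy of the "good" state.
                float[] current = (float[])best.Clone();
                int i = rng.Next(current.Length);
                current[i] += (float)(rng.NextDouble() - 0.5);   // "rotate" one angle
                double s = Score(current);
                if (s > roundBestScore)
                {
                    roundBestScore = s;
                    roundBest = current;
                }
            }

            // Assumption: the round's best attempt is fixed only if it improves the score.
            if (roundBest != null && roundBestScore > bestScore)
            {
                best = roundBest;
                bestScore = roundBestScore;
            }
        }
        return best;
    }
}
```

Here the per-attempt copy is a cheap array clone; in the real model the same line is a full RNA clone executed once per attempt, which is exactly the cost problem this article is about.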
The brute-force solution
This is how it was solved at first; the approach was inherited from the code of the Rosetta@home project.
Clearly, to make 1000 rotation attempts you need to copy the initial state. That is, there are objects
RNA BestRNA;
RNA CurrentRNA;
and we need to do
CurrentRNA = BestRNA.Clone();
Calculate(CurrentRNA);
The problem is precisely this Clone(). Recall that building the graph of all atoms is a very resource-intensive procedure.
And if we do not rebuild it, we overwrite the angles of all atoms in the “good” state.
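The dilemma fits in a few lines. In this hypothetical sketch (`Atom` and `Rna` are stripped-down stand-ins, not the project's classes), `ShallowClone` uses `Object.MemberwiseClone`, so the copy shares the `Atom` instances, while `DeepClone` recreates every atom, which is the expensive part in the real model.

```csharp
using System.Collections.Generic;

// Hypothetical stripped-down classes, only to show the copy semantics.
public class Atom
{
    public float Phi;
}

public class Rna
{
    public List<Atom> Atoms = new List<Atom>();

    // Shallow copy: the clone shares the same Atom objects, so its angles alias the original's.
    public Rna ShallowClone() => (Rna)MemberwiseClone();

    // Deep copy: every atom is recreated; in the real model this means rebuilding the graph.
    public Rna DeepClone()
    {
        var copy = new Rna();
        foreach (Atom a in Atoms)
            copy.Atoms.Add(new Atom { Phi = a.Phi });
        return copy;
    }
}
```

Rotating an atom in the shallow copy changes the same atom in the “good” state, while the deep copy stays independent but pays for recreating every atom.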
Classic Flyweight
The classic Flyweight suggests moving the angle properties out of the atoms and into objects higher up, all the way to an array stored in the RNA object. This array is indexed so that each index maps unambiguously to a specific atom.
Then cloning only requires copying this array; the atom graph does not need to be copied.
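A minimal sketch of this classic variant, assuming hypothetical `FlyAtom` and `FlyChain` classes and showing only phi (theta and d would be handled the same way). The atom keeps only its index (intrinsic state); the angle array is the extrinsic state that the client passes in.

```csharp
// Hypothetical sketch of the classic Flyweight variant (only phi is shown).
public class FlyAtom
{
    // Intrinsic state: which slot of the chain's angle array belongs to this atom.
    public readonly int Index;

    public FlyAtom(int index) { Index = index; }

    // Extrinsic state (the angle array) is passed in by the client.
    public float GetPhi(float[] phi) => phi[Index];
    public void SetPhi(float[] phi, float value) => phi[Index] = value;
}

public class FlyChain
{
    public readonly FlyAtom[] Atoms;   // the expensive graph, built once and shared
    public float[] Phi;                // per-state data, cheap to copy

    public FlyChain(int atomCount)
    {
        Atoms = new FlyAtom[atomCount];
        for (int i = 0; i < atomCount; i++) Atoms[i] = new FlyAtom(i);
        Phi = new float[atomCount];
    }

    private FlyChain(FlyAtom[] atoms, float[] phi)
    {
        Atoms = atoms;
        Phi = phi;
    }

    // Cloning copies only the angle array; the atom objects are shared.
    public FlyChain Clone() => new FlyChain(Atoms, (float[])Phi.Clone());
}
```

Note how the atom can no longer answer “what is my angle?” on its own; it needs the chain's array, which is the loss of encapsulation criticized next.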
But all this seriously violates OOP principles. In essence, properties are taken away from objects just to speed up the computation. The model loses its meaning: the angles are no longer properties of atoms but properties of the whole chain. That is not serious design; it is a hallmark of structured programming, not object-oriented programming.
What to do?
The key thing to realize is that our computations actually proceed in time, i.e. we need to keep the trajectory of the atoms over time. Seen this way, the object picture is restored.
What follows is a lot of code with minimal comments; if something is unclear, ask.
So, the first step is to replace each single angle with an array over time, accessed through the current time:
public class Angles
{
    private float[] phi = new float[Chain.TimeHistory];
    private float[] theta = new float[Chain.TimeHistory];
    private float[] d = new float[Chain.TimeHistory];

    // Each property reads and writes the slot of the current time step
    public float Phi
    {
        get { return phi[Chain.Time]; }
        set { phi[Chain.Time] = value; }
    }
    public float Theta
    {
        get { return theta[Chain.Time]; }
        set { theta[Chain.Time] = value; }
    }
    public float D
    {
        get { return d[Chain.Time]; }
        set { d[Chain.Time] = value; }
    }

    // Copies the values stored at argTime into the current time slot
    public void SetInitial(int argTime)
    {
        Phi = phi[argTime];
        Theta = theta[argTime];
        D = d[argTime];
    }
}
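To see the mechanism in isolation, here is a standalone sketch of the same idea. The static `Clock` class is a hypothetical stand-in for `Chain.Time` and `Chain.TimeHistory`, and only `Phi` is shown.

```csharp
// Hypothetical stand-in for Chain.Time / Chain.TimeHistory.
public static class Clock
{
    public const int History = 5;
    public static int Time;
}

// Standalone version of the time-indexed angle (only Phi is shown).
public class TimedAngle
{
    private readonly float[] phi = new float[Clock.History];

    // Reads and writes go to the slot of the current time step.
    public float Phi
    {
        get => phi[Clock.Time];
        set => phi[Clock.Time] = value;
    }

    // Copies the value from an earlier slot into the current one.
    public void SetInitial(int fromTime) => Phi = phi[fromTime];
}
```

Writing `Phi` at time 1 does not touch the value stored for time 0, so the same object can hold both the “good” and the “current” state.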
Time is then controlled at the chain level, i.e. by the topmost object. Here is a piece of code, which I will explain gradually. The whole idea becomes clear only at the end (the presentation is like in mathematics: at first it is not clear why something is needed, and only at the end does it become clear what it was all for).
using System;

public class Chain
{
    private static Chain instance;

    public Chain(int MolCount)
    {
        instance = this;
    }

    /// Current time
    private int time = 0;
    public static int Time
    {
        get { return instance.time; }
        set { instance.time = value; }
    }

    public static int OldTime
    {
        get
        {
            int oldTime = instance.time - 1;
            if (oldTime == -1)
            {
                oldTime = TimeHistory - 1;
            }
            return oldTime;
        }
    }

    /// The number of time steps that the object stores
    public static int TimeHistory = 5;

    private static int generation;
    public static int Generation
    {
        get { return generation; }
    }

    /// Moves the time one step forward (wrapping around starts a new generation)
    protected void NextTime(int argGeneration)
    {
        Time++;
        if (Time >= TimeHistory)
        {
            Time = 0;
            generation = argGeneration + 1;
            CurrentMaxTimeID = 0;
        }
    }

    public void CheckTime(int argTimeID, int argGeneration)
    {
        // A stamp may be at most one generation older than the current one;
        // within the current generation, the time stamp cannot exceed the stamps issued so far
        // (a mismatch is only possible between the current and the previous generation);
        // and a stamp can never come from a newer generation than the current one.
        if (argGeneration < (generation - 1) ||
            (argGeneration == generation && CurrentMaxTimeID < argTimeID) ||
            argGeneration > generation)
        {
            Console.WriteLine("ErrorGenerationRNA");
            Console.ReadLine();
        }
        if (Time != argTimeID)
        {
            Time = argTimeID;
        }
    }

    private static int CurrentMaxTimeID = 0;

    public static int GetNextTimeID(int argCurrentTimeID, int argGeneration)
    {
        int NextTime = -1;
        if (argGeneration == generation)
        {
            NextTime = argCurrentTimeID + 1;
        }
        if (argGeneration == generation - 1)
        {
            NextTime = 0;
        }
        if (CurrentMaxTimeID < NextTime)
        {
            CurrentMaxTimeID = NextTime;
        }
        return NextTime;
    }
}
Chain implements the Singleton pattern, though not in the classic way. First, whenever an RNA object (a subclass of Chain) is created, its constructor replaces the instance reference. So this is not a single object at all, but the most recently created one. Second, the reference to the object itself is not exposed: only the current time Time is accessible. Thus “old” RNA objects may still exist, but they are no longer current in time.
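This “last created wins” variant can be reduced to its core. In this hypothetical `LastCreated` class every constructor call replaces the shared reference, and only the time is exposed, never the instance.

```csharp
// Hypothetical sketch of the "last created wins" singleton variant.
public class LastCreated
{
    private static LastCreated instance;
    private int time;

    // Every new object replaces the shared reference.
    public LastCreated()
    {
        instance = this;
    }

    // Only the time of the most recently created object is reachable.
    public static int Time
    {
        get => instance.time;
        set => instance.time = value;
    }
}
```

An older object keeps its own `time` field, but nothing outside the class can reach it any more, which is what “exists, but not in time” means here.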
The arrays are sized by TimeHistory: the number of steps of the state trajectory that must be available at the same time. In our example two would be enough in principle (Best and Current), but for efficiency it was raised slightly, to 5.
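The arithmetic behind NextTime and OldTime is that of a ring buffer. A simplified sketch with a hypothetical `TimeRing` class: unlike `Chain.NextTime`, it simply increments the generation on wrap-around instead of deriving it from the caller's generation, and it leaves out `CurrentMaxTimeID`.

```csharp
// Hypothetical, simplified ring-buffer clock: Time cycles through 0..History-1,
// OldTime is the previous slot (wrapping around), and each wrap starts a new generation.
public class TimeRing
{
    public readonly int History;
    public int Time { get; private set; }
    public int Generation { get; private set; }

    public TimeRing(int history)
    {
        History = history;
    }

    public int OldTime => Time == 0 ? History - 1 : Time - 1;

    public void Next()
    {
        Time++;
        if (Time >= History)
        {
            Time = 0;
            Generation++;
        }
    }
}
```

With History = 5, slot 0's predecessor is slot 4, and after five steps the clock is back at slot 0 in generation 1; the generation counter is what lets CheckTime tell a fresh stamp from a stale one that happens to point at the same slot.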
Next, we need full control over access to the RNA properties and methods so that they work in time. We use the “Mediator” pattern, but transparently, so that the calling class is not aware of it.
The RNA class is renamed to RNARealise, and instead of the real RNA we create a “mediator”:
public class RNA
{
    private RNARealise body;
    private int TimeID = 0;
    private int GenerationID = 0;

    public Molecule[] Molecules
    {
        get
        {
            body.CheckTime(TimeID, GenerationID);
            return body.Molecules;
        }
    }

    public RNA(RNASeq argSeq)
    {
        body = new RNARealise(argSeq);
    }

    public RNA Clone()
    {
        body.CheckTime(TimeID, GenerationID);
        RNA NewRNA = (RNA)this.MemberwiseClone();
        NewRNA.body = NewRNA.body.Clone(this.GenerationID);
        NewRNA.TimeID = Chain.GetNextTimeID(this.TimeID, this.GenerationID);
        NewRNA.GenerationID = Chain.Generation;
        return NewRNA;
    }

    public void Refold(int argPosition)
    {
        body.CheckTime(TimeID, GenerationID);
        body.Refold(argPosition);
    }
}
What matters here? The object gets an identifier: the TimeID time stamp and the GenerationID generation stamp. Before every property access and every method call, body.CheckTime(TimeID, GenerationID) checks that the object is in its own time and switches the time if it is not.
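The stamp-and-check idea can be sketched without the RNA specifics. In this hypothetical example `SharedClock` plays the role of `Chain.Time`, `Body` the role of RNARealise, and `Handle` the role of the RNA wrapper. Generation stamps are omitted, and the next slot is simply `(timeId + 1) % History`, a simplification of `Chain.GetNextTimeID`. Structurally, a wrapper that controls access to another object like this is close to what the GoF book calls Proxy.

```csharp
// Hypothetical stand-ins: SharedClock ~ Chain.Time, Body ~ RNARealise, Handle ~ the RNA wrapper.
public static class SharedClock
{
    public const int History = 5;
    public static int Time;
}

public class Body
{
    private readonly float[] slots = new float[SharedClock.History];

    // The body always works in whatever time the shared clock currently shows.
    public float Value
    {
        get => slots[SharedClock.Time];
        set => slots[SharedClock.Time] = value;
    }
}

public class Handle
{
    private readonly Body body;
    private readonly int timeId;

    public Handle(Body body, int timeId)
    {
        this.body = body;
        this.timeId = timeId;
    }

    // Counterpart of body.CheckTime(TimeID, GenerationID): switch to this handle's time first.
    private void CheckTime() => SharedClock.Time = timeId;

    public float Value
    {
        get { CheckTime(); return body.Value; }
        set { CheckTime(); body.Value = value; }
    }

    // Counterpart of Clone(): only the handle is new; it gets the next slot,
    // initialized with this handle's current value.
    public Handle Clone()
    {
        float current = Value;
        var copy = new Handle(body, (timeId + 1) % SharedClock.History);
        copy.Value = current;
        return copy;
    }
}
```

Each handle touches only its own slot, so the “best” and “current” states do not overwrite each other even though they share one body.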
Now consider how cloning is emulated by public RNA Clone(). In reality only the mediator shell is cloned, and it receives a new time and generation stamp. Inside the body's Clone, the time is also moved forward with NextTime(argGeneration). At the molecule level, cloning merely initializes the angles from the previous time step:
for (int i = 1; i < FullAngles.Length; i++)
{
    FullAngles[i].SetInitial(Chain.OldTime);
}
This costs far less than full cloning. And the calling code does not change at all: it keeps calling Clone() exactly as before. The computation looked roughly like this:
public void BlockFolding()
{
    RNA saveRNA = CurrentRNA.Clone();
    locScore = NucFluctuation(saveRNA);
    // save the best state found
    save();
}

public double NucFluctuation(RNA argRNA)
{
    for (int j = 1; j < FragmentCount; j++)
    {
        RNA locRNA = argRNA.Clone();
        // Make a rotation
        Rotate.AtomAngleRotate(locRNA);
        // Evaluate how favorable the rotation is
        RNAScore.Score(locRNA);
    }
    // (selecting the best attempt and returning its score are omitted in this excerpt)
}
Try to figure out why the angles from different times do not get mixed up, even though nobody controls the time directly from above. I leave this as homework for the reader; if it is still hard to follow, I will explain in the comments.