From bubble sorting to genetic algorithms

This article is a brief overview of what genetic algorithms are. Being a novice in bioinformatics, I will start with things that are close to and understandable to people of technical specialties: sorting algorithms, then I will go on to describe one of the tasks of genetic programming and what biologists understand algorithms.

Sorting

I think that every programmer implicitly believes that he knows on this subject if not all, then almost everything. Perhaps, at the moment it is so, and the reader of these lines has studied the third volume of the Art of Programming by Donald Knut. But let's do an experiment: take your favorite algorithm and try to implement it with a pencil on a piece of paper. If there is no favorite algorithm, then I suggest sorting by bubble. Happened? I have repeatedly offered this task at interviews, but few of the proposed options turned out to be workers, and only a few turned out to be a “bubble”. Recently, I happened to see the lecture notes on Python, where the teacher gives the following code under the guise of bubble sorting:

def bubble_sort(numbers): for first in range(0, len(numbers)-1): for second in range(first+1, len(numbers)): if numbers[first] > numbers[second]: numbers[first], numbers[second] = numbers[second], numbers[first]

The algorithm is working, it will be sorted, but only this is clearly not a “bubble”. With some stretch, it can be called sorting by sampling, but even here there is an annoying oversight: the advantage of sorting by sampling is to minimize the number of substitutions, that is, the number of times when we swap the elements should be limited to O (n), which is not observed in the above code.

In a conversation on the topic of algorithms, it is interesting to reverse the question of quick sorting. As a rule, people easily answer the question of how good it is: this is both a good algorithmic complexity on average, and no need for additional memory (the stack is usually forgotten). But then it is interesting to ask, why is it bad, since, apart from her, there are many other algorithms? Answers to this question are much more scarce and monotonous. And what are the disadvantages of Quick Sort you can call?
')
To my satisfaction, sometimes it is possible to meet aesthetes who demonstrate knowledge of exotic methods, for example, gnome sort.
Conclusion:
1. If you want to pass for an interview with an esthete, study a couple of little-known sorting methods;
2. If you yourself accept interviews, learn to distinguish different methods from each other, and then the candidate following paragraph 1 is not enough ...

Pancake sorting

Following the tips above, we consider a problem that has no practical meaning, but is interesting from a theoretical point of view: pancake sorting. Its impracticality lies in the fact that this algorithm does not take into account the number of comparisons of elements at all (it is believed that these operations are very cheap and fast), and the only operation that has a price is to turn the top of a stack of sorted pancakes. Of course, initially they are arranged in a random order, and the desired result is a kind of tower of Hanoi, when pancakes of a larger diameter lie at the bottom and smaller ones are located at the top.

Regarding this task, I would like to note two interesting facts. First, the method of finding a fairly effective (although not optimal) solution was at one time proposed by the notorious Bill Gates while he was still a student. This algorithm, proposed in 1979, remained the most effective until 2008, when the result was surpassed. Secondly, as it was proved later, the problem of finding the optimal solution belongs to the class NP-complete. Also, Gates and his leader, Christ Papadimitriou, proposed a complicated version of the problem, known as the problem of burnt pancakes.

Burnt pancake problem

In this form of the task, each element, in addition to its size, also has a binary attribute “direction”, that is, each “pancake” has one side burnt and the other is rosy; the result of solving the problem is a stack, ordered by size, but, in addition to this, all the "pancakes" are burnt side down. Without going into details, let us say that this problem is also NP-complete, and that, in the general case, its sufficiently effective, but non-optimal solutions are known. However, there are restrictions on the data, under which the task becomes polynomial. For the problem in question, such data are simple permutations . To understand what it is, consider the permutation of numbers from 1 to 7: 2 6475 13. Note that the bold sequence is itself a permutation of numbers from 4 to 7 (called a block ). Simple permutations are those where there are no non-trivial blocks (blocks of length 1 and n).

Behind the figurative formulation of the problem, when the elements are represented by burnt pancakes, it is difficult to discern the fact that it has a biological significance. However, a common mutation is a revolution in the DNA molecule, and if you count the minimum number of revolutions necessary to convert one molecule to another (without considering smaller point mutations), you can roughly estimate the relationship of organisms. For example, the human and mouse genomes are only separated by about 120 global mutations; I confess, I used to believe that between these types of difference much more ...

Genetic algorithm for solving the problem of burnt pancakes

In comparative genomics, the algorithm is used for analytically finding the number of mutations that separate organisms, but look at the problem in another projection. Now we will declare the solution of the analytical task to be the goal, and the living organisms and the processes proceeding in them will be made an instrument for its solution.

Bacteria are known to be able to divide, ensuring an exponential growth of the population if they are provided with the necessary conditions; Of course, after a while the colonies of bacteria will no longer be enough nutrient medium, other factors will also appear that affect the growth of the colony, but this takes time. Experimentally, we can find out the average time it takes for the bacteria of a given species to develop a global mutation associated with the upheaval of a part of the DNA.

Now we set the task for the bacteria. We genetically modify one of them (biologists like to use E. coli in such experiments), where the antibiotic resistance gene will be broken down into several parts and mixed together, changing not only the order but also the direction of some pieces. Thus, each inverted and rearranged piece of a gene in our case will be a “burnt pancake”.

The task of the bacteria is set, you can start the experiment. We place it in a nutrient medium and wait for the allotted time, during which one mutation-coup of DNA is expected. I draw attention to the fact that we consider one mutation: it is unique not for the whole colony (there will be many of these mutations in the colony); this is just the expected number of coups in each of the living bacteria compared to their progenitor.

Is one pancake coupling enough to solve the problem? We can easily verify this by placing part of the colony in a dangerous environment for it. The bacteria placed in the antibiotic did not survive, and we continue to monitor the remaining ones. It will take another two or three periods, and, finally, the group of bacteria placed in the antibiotic will remain alive. "Collective Intelligence" coped with the task!

Results

Of course, the usefulness of the solved problem is unlikely to impress us: in actual experiments performed, the number of sorted “pancakes” does not exceed four, and the number of mutations occurring in the allotted time can be estimated only as probabilistic. And yet I personally was struck by the fantasy of those biologists who were able to set up the experiment; Those who could solve the traveling salesman problem by a biological method had no less fantasy (I will leave the details of this experiment beyond the scope of this article). In many respects, the complexity of the problems solved by genetic algorithms is comparable to quantum computing, and I want to believe that both directions of non-classical calculations will be able to produce results that are not yet achievable in modern conditions.

Source: https://habr.com/ru/post/212893/

All Articles