For the soul, experimenting with genetic algorithms.
I wanted to share my experience, observations and reflections in a small series of essays.
So, the first question over which I thought: what size of the population to choose.
If we formulate this question systemically, it will sound like this:
')
What is the principle, based on what criteria, to calculate the size of the population?
To answer this question, it’s worth deciding what genetic algorithms are:
In my opinion, this is such a street smart kind of brute force algorithms.
(street smart, since the efficiency of using a class of genetic algorithms for finding optimal solutions in systems with an enormous amount of possible combinations of elements is empirically confirmed by billions of years of the evolution of life on Earth)
Simplified mechanics of the genetic algorithm is as follows:
- we initiate the initial population - by randomly choosing a subset of possible values (chromosome population);
further actually the genetic algorithm itself:
- we calculate for each chromosome from the population the value of the fitness function (which is set in such a way that would characterize the parameter by which the search for the optimal solution is carried out);
- choose the chromosomes that will participate in the crossing (the greater the value of the chromosome fitness function, the more likely it is to take part in the crossing)
- we form the next generation with the help of a genetic crossing operator, i.e. we carry out pairwise recombination of the original chromosomes (the influence of the mutation operator is aside for the time being)
And so we continue cyclically (more or less consistently improving the population according to the optimized trait) until we find the chromosome with the maximum value.
For simplicity, take as an example the chromosome consisting of L genes, in which the alleles of each gene are the values {0, 1}
Theoretically, chromosome populations may be within [1, 2 ^ L],
where the population size 2 ^ L is the limiting case in which the genetic algorithm degenerates into a brute force algorithm (the situation when the solution is found without the use of genetic operators, since the fitness function is calculated for all possible values);
and the population size 1 is a degenerate case (a random choice of any value and voluntaristly declaring it optimal).
This degenerate case may well be applicable when having any answer (even not optimal) - much better than no answer at all.
By the way, the scope of applicability of such an approach is well characterized by an anecdote:
Chapaev Petka: devices?
Petka: twenty
Chapaev: what is twenty?
Petka: and what devices?
Let the size of our chromosome population be N
Since we produce recombinations on a subset, but our goal is to find the optimal value for the whole set, the set of chromosomes must contain all the values necessary for recombining any of the elements of the whole set.
Those. The population size must be such that for each gene (in our case, the locus), all possible allele values are represented in the entire population.
Let me explain with an example: let L = 5, N = 4, and at some point the population looks like this:
10110
10011
11,100
10001
So:
- the first locus in all four chromosomes is represented only by allele 1;
- the second locus is represented by three values of 0 and one value of 1;
- the third locus is represented by two values of 0 and two values of 1;
- the fourth locus is represented by two values of 0 and two values of 1;
- the fifth locus is represented by two values of 0 and two values of 1.
Accordingly, if the optimal value is reached for chromosome 01010, then no matter how much the genetic algorithm works (without the use of the mutation operator), this value will never be found.
Now let's calculate the probability that a random set of chromosomes will contain all the necessary elements:
The probability that one of the loci will contain in all chromosomes only 1 is equal to (1/2) ^ N
The probability that one of the loci in all chromosomes will contain only 0 is equal to (1/2) ^ N
It is obvious that in all other cases there will be a full set of {0,1}
Thus, the probability for one locus is: 1 - (1/2) ^ N - (1/2) ^ N = 1 - (1/2) ^ (N-1)
Accordingly, the probability that all L loci will contain a full set of alleles each:
P1 = (1 - (1/2) ^ (N-1)) ^ L
Now we express N (the number of chromosomes in the population) through P (the probability that the population will contain all the elements necessary for recombination) and L (the number of loci in the chromosome)
N = 1 + LOG2 (1 / (1-P1 ^ (1 / L)))
For example, for L = 100 and P1 = 0.999 (that is, 99.9%) we get the required population size N = 18 (with the total number of possible different chromosomes ~ 10 ^ 30)
Note that no matter how large we take the value of P1, the probability that any of the iterations will result in such a set of chromosomes from which any possible chromosome cannot be obtained by recombination is non-zero.
This is where the need for a mutation operator comes into effect.
The fact is that in itself it is rather harmful, since we cross breeding with an eye to the values of the fitness function of the crossed chromosomes (i.e., we recombine the best-fit sets) and by this we achieve a population fitness increase, but the mutation operator is absolutely chaotic.
(It is worth noting that in nature damage to chromosomes, which can lead to mutations, is a fairly frequent phenomenon, but effective repair mechanisms for the overwhelming majority of these damages are developed in cells. So, as a result, mutations are a very rare phenomenon and this is not accidental)
But in the “homeopathic doses” the mutation operator turns out to be useful precisely because of the impossibility of maintaining a set of 100% sufficient for recombinations.
The probability that a locus will contain either only 1 for all chromosomes or only 0 is equal to (1/2) ^ (N-1)
We will try to express to some extent the necessary (to restore the lost diversity) probability of mutation P2
So, P2 is the probability of a single chromosome mutation.
Then the probability of mutation of a particular chromosome locus is P2 / L.
The probability that a given chromosome locus does not mutate is 1 - P2 / L
The probability that this locus does not mutate on any of the chromosomes is (1 - P2 / L) ^ N
Accordingly, the probability that a given locus mutates in at least one of the chromosomes is 1 - (1 - P2 / L) ^ N
For reliability, it is necessary that the probability of a mutation of a given locus in at least one of the chromosomes is an order of magnitude greater than the probability that any of the values (0 or 1) are absent in a given locus in all chromosomes.
Moreover, the fact of mutation does not mean that we are dealing with the mutation we need.
Thus, we have the inequality:
1 - (1 - P2 / L) ^ N> 10 * (1/2) ^ (N-1)
Transforming it, we get the following value:
P2> L * (1- (1-10 * (1/2) ^ (N-1)) ^ (1 / N))
Focusing on the lower limit of inequality, we obtain the following formula for calculating the chromosome mutation probability:
P2 = L * (1- (1-10 * (1/2) ^ (N-1)) ^ (1 / N))
For N = 18 (calculated by us, based on the values of L = 100 and P1 = 99.9%), we get P2 ~ 0.05%
Those. for chromosomes of length L = 100 loci, we obtain the minimum required population size N = 18 and the necessary probability of mutation of one chromosome P2 = 0.04%
For L = 1000, we get N = 22 and P2 = 0.03%
Update:It should be noted that the above calculations of the population size and mutation frequency set only the necessary conditions for the stable performance of the genetic algorithm (that is, if these conditions are not met, the algorithm will be unstable or the algorithm will not work in principle - it will not even reach local optima).
To assess the adequacy of conditions, it is necessary to take into account the factor of convergence / rate of convergence to local / global optima (this is the topic of the next essay - I think that in a week - two).