
Genetic algorithm: optimal population size

In the previous article ( Genetic algorithm: struggling with premature convergence ), the mitosis operator was chosen as an effective way of dealing with premature convergence:

Each chromosome to which the mitosis operator is applied deliberately produces at least one descendant that is an exact copy of itself.
In addition, the more a chromosome's fitness exceeds the average fitness of the entire population, the higher the probability that it will produce a second descendant.

Chromosomes with lower fitness form a pool for crossover. Chromosomes from the pool are crossed pairwise until the required total population size is reached (taking into account the chromosomes already created by the mitosis operator). Chromosomes for crossing are chosen at random by roulette-wheel selection (fitter chromosomes get a larger sector on the wheel). The population size remains constant throughout the run of the genetic algorithm.
The proportion of chromosomes to which the mitosis operator is applied is inversely proportional to the population's average fitness and directly proportional to the standard deviation of the population's fitness (the dependence is chosen to be logarithmic rather than linear). For the first generation, the proportion of the population that reproduces by mitosis is set to 50%.
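
A minimal sketch of one generation under these rules is shown below. The chromosome encoding, the crossover operator, and the exact formulas for the mitosis share and for the probability of a second copy are illustrative assumptions, since the article describes them only qualitatively.

```python
import math
import random
import statistics


def crossover(a, b):
    """Single-point crossover on list-encoded chromosomes (placeholder operator)."""
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def next_generation(population, fitness, pop_size):
    """One generation: mitosis for the fitter part, roulette-wheel crossover for the rest.

    `fitness` is assumed to return positive values. The exact logarithmic formula
    for the mitosis share and the probability of a second copy are assumptions;
    the article only states the qualitative dependencies.
    """
    scores = [fitness(c) for c in population]
    mean_f = statistics.mean(scores)
    std_f = statistics.stdev(scores)

    # Share of the population reproducing by mitosis: grows with the standard
    # deviation, shrinks with the mean (assumed log form, clamped to [0, 1]);
    # the article fixes this share at 0.5 for the first generation.
    mitosis_share = min(1.0, max(0.0, math.log1p(std_f / mean_f)))
    n_mitosis = min(int(mitosis_share * pop_size), pop_size - 2)

    # Fitter chromosomes are copied unchanged; those above the mean fitness may
    # produce a second identical descendant with a probability that grows with
    # how far they exceed the mean.
    ranked = sorted(population, key=fitness, reverse=True)
    offspring = []
    for c in ranked[:n_mitosis]:
        offspring.append(c)
        if fitness(c) > mean_f and random.random() < (fitness(c) - mean_f) / mean_f:
            offspring.append(c)

    # Less fit chromosomes form the crossover pool; parents are drawn by
    # roulette-wheel selection until the constant population size is restored.
    pool = ranked[n_mitosis:]
    weights = [fitness(c) for c in pool]
    while len(offspring) < pop_size:
        a, b = random.choices(pool, weights=weights, k=2)
        offspring.append(crossover(a, b))

    return offspring[:pop_size]
```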


The dependence of the result on the population size is as follows:
[Figure: average achieved fitness vs. population size]
- the vertical axis shows the average fitness of the chromosomes produced by the genetic algorithm, where 1 is the fitness value equal to the global maximum;
- the horizontal axis shows the population size, where 1 corresponds to a population of size N (the minimum required population size).
N = 1 + log2(1 / (1 - P^(1/L))), where:
P is the required probability that the population contains all the elements needed for recombination;
L is the number of loci in the chromosome.
For this example, N = 20.
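
This minimum size can be computed directly from the formula. The values of P and L below are illustrative assumptions only (the article does not state them), chosen so that N comes out to 20:

```python
import math


def min_population_size(p, loci):
    """N = 1 + log2(1 / (1 - P**(1/L))), rounded up to a whole number of chromosomes."""
    return math.ceil(1 + math.log2(1 / (1 - p ** (1 / loci))))


# Illustrative values only: with P = 0.999 and L = 500 loci the formula gives N = 20.
print(min_population_size(0.999, 500))  # -> 20
```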

The graph clearly shows that, with practically unlimited resources, the population size can simply be increased until saturation is reached (which corresponds to reaching the global maximum).

If resources are limited (in the end, it all comes down to time constraints), the question of how efficiently they are used becomes important.

Consider which population size is the most advantageous in terms of the ratio of the achieved result to the resources spent on the computation.

Let us reformulate the problem as follows: which is more advantageous, running the GA once with a population size of 2N, or running it twice with a population size of N and taking the better of the two results?
The results of the comparison can be seen in the following graph:
[Figure: average achieved fitness vs. number of fitness-function evaluations, for population sizes N, 2N, ...]

- the vertical axis shows the average fitness of the chromosomes produced by the genetic algorithm, where 1 is the fitness value equal to the global maximum;
- the horizontal axis shows the number of fitness-function evaluations, where 1 corresponds to N evaluations.
The number of evaluations is defined as the population size (expressed in units of N) multiplied by the number of generations (until the algorithm stops) and by the number of runs of the algorithm.
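
To make the units on the horizontal axis explicit, this cost measure is simply a product of three factors; the helper below is a hypothetical illustration with made-up numbers:

```python
def evaluation_cost(pop_size_in_n, generations, runs):
    """Number of fitness evaluations, in units of N (as on the horizontal axis)."""
    return pop_size_in_n * generations * runs


# Illustrative numbers: two runs with a population of 2N, each lasting
# 30 generations, cost 2 * 30 * 2 = 120 units of N.
print(evaluation_cost(2, 30, 2))  # -> 120
```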

The points marked x1 correspond to a population of size N: the first point corresponds to a single run of the GA; the second point corresponds to two runs of the GA with the maximum of the results obtained, and so on.

The points marked x2 correspond to a population of size 2N, and so on.

It is clear from the graph that a significant gain in efficiency is achieved only in the transition from a population of size N to a population of size 2N.
Increasing the population size further does not bring any significant additional gain.

Thus, under a tight time limit, an effective strategy is to run the GA with a population size of 2N; if the time limit has not yet been reached, run the GA again and keep the best result, and so on until the limit is reached.
In this way the result improves steadily until the time runs out.
The number of runs completed before the limit is reached will vary from case to case (both because the number of generations before the algorithm stops varies, and because the same amount of processor time translates into different amounts of wall-clock time depending on how heavily other tasks load the CPU).
If the hardware allows, several runs can also be executed in parallel.
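
A sketch of this restart strategy is shown below, assuming a hypothetical run_ga(pop_size) function that performs one complete GA run (until its own stopping criterion) and returns the best fitness it found:

```python
import time


def best_under_time_limit(run_ga, n, time_limit_sec):
    """Repeatedly run the GA with a population of 2N until the time limit is hit,
    keeping the best result seen so far."""
    deadline = time.monotonic() + time_limit_sec
    best = None
    runs = 0
    while time.monotonic() < deadline:
        result = run_ga(2 * n)  # one full GA run with population size 2N
        runs += 1
        if best is None or result > best:
            best = result
    return best, runs
```

When several cores are available, independent runs could equally well be submitted to a process pool (for example, concurrent.futures.ProcessPoolExecutor) instead of being executed sequentially.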

Source: https://habr.com/ru/post/124758/

