The run_step method is run at each delta_t (full code):

    import numpy as np

    class Creature(object):
        INPUTS = 2      # z_setpoint, current z position
        OUTPUTS = 1     # PWM for all 4 motors
        STATE_VARS = 1
        ...

        def run_step(self, inputs):
            # state: [state_vars ... inputs]
            # out_values: [state_vars, ... outputs]
            self.state[self.STATE_VARS:] = inputs
            out_values = np.dot(self.matrix, self.state) + self.constant
            self.state[:self.STATE_VARS] = out_values[:self.STATE_VARS]
            outputs = out_values[self.STATE_VARS:]
            return outputs
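To make the shapes concrete, here is a minimal runnable sketch of the same step. The constructor is hypothetical (the excerpt elides it with `...`): it assumes a randomly initialised `matrix` and `constant` and a zeroed `state`, which may not match the real project.

```python
import numpy as np


class Creature(object):
    INPUTS = 2      # z_setpoint, current z position
    OUTPUTS = 1     # PWM for all 4 motors
    STATE_VARS = 1

    def __init__(self, seed=0):
        # Hypothetical constructor, not taken from the original code:
        # random parameters and a zeroed state vector.
        rng = np.random.RandomState(seed)
        rows = self.STATE_VARS + self.OUTPUTS   # 2: [state_vars, outputs]
        cols = self.STATE_VARS + self.INPUTS    # 3: [state_vars, inputs]
        self.matrix = rng.uniform(-1.0, 1.0, size=(rows, cols))
        self.constant = rng.uniform(-1.0, 1.0, size=rows)
        self.state = np.zeros(cols)

    def run_step(self, inputs):
        # Same body as in the excerpt above.
        self.state[self.STATE_VARS:] = inputs
        out_values = np.dot(self.matrix, self.state) + self.constant
        self.state[:self.STATE_VARS] = out_values[:self.STATE_VARS]
        outputs = out_values[self.STATE_VARS:]
        return outputs


creature = Creature(seed=42)
outputs = creature.run_step([1.0, 0.0])   # z_setpoint = 1.0, z = 0.0
print(creature.matrix.shape, outputs.shape)
```

With one state variable, two inputs and one output, the matrix has 2 rows and 3 columns, and every step is a single small matrix-vector product plus a vector addition.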
self.state contains arbitrary values of unknown size that are passed from one step to the next; self.matrix and self.constant contain the logic itself. By putting the “correct” values in them, we could in theory get a perfectly tuned PID controller. They are randomly mutated between generations.

run_step is called at 100 Hz (in the virtual time frame of the simulation). A generation consists of 500 creatures, each of which we test for 12 virtual seconds. Thus, each generation involves 600,000 run_step executions.

    $ python -m ev.main
    Generation 1: ... [population = 500] [12.06 secs]
    Generation 2: ... [population = 500] [6.13 secs]
    Generation 3: ... [population = 500] [6.11 secs]
    Generation 4: ... [population = 500] [6.09 secs]
    Generation 5: ... [population = 500] [6.18 secs]
    Generation 6: ... [population = 500] [6.26 secs]
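The 600,000 figure follows directly from the numbers given in the text, and combining it with the timings above gives a rough cost per call. The sketch below is only arithmetic on the quoted values; the per-call number is an upper bound, since the measured wall time presumably also covers the rest of the simulation, not just run_step.

```python
RATE_HZ = 100                 # run_step calls per virtual second
SECONDS_PER_CREATURE = 12     # virtual seconds each creature is tested
POPULATION = 500              # creatures per generation

steps_per_creature = RATE_HZ * SECONDS_PER_CREATURE
steps_per_generation = steps_per_creature * POPULATION

# Generations 2-6 of the CPython run above; the first, noticeably
# slower generation is left out.
cpython_numpy_secs = [6.13, 6.11, 6.09, 6.18, 6.26]
avg_secs = sum(cpython_numpy_secs) / len(cpython_numpy_secs)

# Upper bound on the cost of one run_step call, in microseconds.
per_call_us = avg_secs / steps_per_generation * 1e6

print(steps_per_generation)
print(round(per_call_us, 1))
```

That works out to roughly ten microseconds per call, for what is mathematically a handful of multiplications and additions.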
    $ pypy -m ev.main
    Generation 1: ... [population = 500] [63.90 secs]
    Generation 2: ... [population = 500] [33.92 secs]
    Generation 3: ... [population = 500] [34.21 secs]
    Generation 4: ... [population = 500] [33.75 secs]
(On the cpyext-avoid-roundtrip branch, the result is better than with CPython, but we'll talk about this in another post.)

    $ pypy -m ev.main   # using numpypy
    Generation 1: ... [population = 500] [5.60 secs]
    Generation 2: ... [population = 500] [2.90 secs]
    Generation 3: ... [population = 500] [2.78 secs]
    Generation 4: ... [population = 500] [2.69 secs]
    Generation 5: ... [population = 500] [2.72 secs]
    Generation 6: ... [population = 500] [2.73 secs]
The operations for np.dot(...) + self.constant are between lines 1217 and 1456. In the excerpt that calls np.dot(...), most of the operations cost nothing, but at line 1232 we see a call to the RPython function descr_dot; looking at its implementation, we see that it creates a new W_NDimArray to store the result, which means it has to do a malloc().
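The same allocation pattern is visible from the Python side. The sketch below is plain numpy on made-up values, not the PyPy trace itself: it shows that the expression produces two freshly allocated result arrays, and contrasts it with numpy's existing `out=` parameter, which writes into a preallocated buffer. Whether the original project could use `out=` under numpypy is not something the text states.

```python
import numpy as np

# Illustrative values only; the shapes match the controller (2x3 matrix).
matrix = np.arange(6, dtype=float).reshape(2, 3)
state = np.array([1.0, 2.0, 3.0])
constant = np.array([0.5, -0.5])

# Plain expression: each operation returns a newly allocated array.
product = np.dot(matrix, state)     # temporary array no. 1
result = product + constant         # temporary array no. 2
assert result is not product
assert not np.shares_memory(product, result)

# Preallocated buffer: same values, results written in place.
buf = np.empty(2)
np.dot(matrix, state, out=buf)
np.add(buf, constant, out=buf)
assert np.allclose(buf, result)

print(result)
```

For arrays this small, allocating and filling in the result objects costs far more than the arithmetic they hold.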
For + self.constant, the call to W_NDimArray.descr_add was inlined by the JIT, so it is easier to see what is happening. In particular, we see a call to __0_alloc_with_del____, which allocates the W_NDimArray for the result, and a raw_malloc, which allocates the array itself. Then there is a long list of 149 simple operations that set the fields of the resulting array, create an iterator, and finally do a call_assembler: this is the actual logic that performs the addition, which was JIT-compiled separately. call_assembler is one of the operations for making JIT-to-JIT calls.

The shape of self.matrix is always (2, 3), which means we do a lot of work, including 2 malloc() calls for temporary arrays, just to call two functions that in total perform 6 multiplications and 6 additions. Note that this is not the fault of the JIT: CPython + numpy has to do the same work, just hidden inside C calls.

    class SpecializedCreature(Creature):

        def __init__(self, *args, **kwargs):
            Creature.__init__(self, *args, **kwargs)
            # store the data in a plain Python list
            self.data = list(self.matrix.ravel()) + list(self.constant)
            self.data_state = [0.0]
            assert self.matrix.shape == (2, 3)
            assert len(self.data) == 8

        def run_step(self, inputs):
            # state: [state_vars ... inputs]
            # out_values: [state_vars, ... outputs]
            k0, k1, k2, q0, q1, q2, c0, c1 = self.data
            s0 = self.data_state[0]
            z_sp, z = inputs
            #
            # compute the output
            out0 = s0*k0 + z_sp*k1 + z*k2 + c0
            out1 = s0*q0 + z_sp*q1 + z*q2 + c1
            #
            self.data_state[0] = out0
            outputs = [out1]
            return outputs
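A quick way to convince yourself that the unrolled arithmetic computes the same thing as the numpy expression is to run both on the same parameters. The sketch below is a standalone check under assumed random values; the function names `run_step_numpy` and `run_step_unrolled` are made up for illustration and are not part of the project.

```python
import numpy as np


def run_step_numpy(matrix, constant, state):
    """Generic version: one matrix-vector product plus a constant vector."""
    return np.dot(matrix, state) + constant


def run_step_unrolled(data, s0, z_sp, z):
    """Unrolled version for a (2, 3) matrix: 6 multiplications, 6 additions."""
    k0, k1, k2, q0, q1, q2, c0, c1 = data
    out0 = s0*k0 + z_sp*k1 + z*k2 + c0
    out1 = s0*q0 + z_sp*q1 + z*q2 + c1
    return out0, out1


rng = np.random.RandomState(0)              # arbitrary seed, assumption
matrix = rng.uniform(-1.0, 1.0, size=(2, 3))
constant = rng.uniform(-1.0, 1.0, size=2)

# Same flattening as SpecializedCreature: matrix rows first, then constants.
data = list(matrix.ravel()) + list(constant)

s0, z_sp, z = 0.25, 1.0, 0.4                # arbitrary state and inputs
expected = run_step_numpy(matrix, constant, np.array([s0, z_sp, z]))
actual = run_step_unrolled(data, s0, z_sp, z)

# Compare with a tolerance: the summation order may differ slightly.
assert np.allclose(actual, expected)
```

The row-major order of `ravel()` is what makes `k0, k1, k2` the first row of the matrix and `q0, q1, q2` the second.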
Creature.run_step.

    $ python -m ev.main
    Generation 1: ... [population = 500] [7.61 secs]
    Generation 2: ... [population = 500] [3.96 secs]
    Generation 3: ... [population = 500] [3.79 secs]
    Generation 4: ... [population = 500] [3.74 secs]
    Generation 5: ... [population = 500] [3.84 secs]
    Generation 6: ... [population = 500] [3.69 secs]
    Generation 1: ... [population = 500] [0.39 secs]
    Generation 2: ... [population = 500] [0.10 secs]
    Generation 3: ... [population = 500] [0.11 secs]
    Generation 4: ... [population = 500] [0.09 secs]
    Generation 5: ... [population = 500] [0.08 secs]
    Generation 6: ... [population = 500] [0.12 secs]
    Generation 7: ... [population = 500] [0.09 secs]
    Generation 8: ... [population = 500] [0.08 secs]
    Generation 9: ... [population = 500] [0.08 secs]
    Generation 10: ... [population = 500] [0.08 secs]
    Generation 11: ... [population = 500] [0.08 secs]
    Generation 12: ... [population = 500] [0.07 secs]
    Generation 13: ... [population = 500] [0.07 secs]
    Generation 14: ... [population = 500] [0.08 secs]
    Generation 15: ... [population = 500] [0.07 secs]
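To compare the listings on equal terms, the sketch below averages each run while skipping its first, slower generation, and expresses the result relative to the first CPython run. The timings are copied from the listings in this article; the labels are mine, and the final listing appears without its command line, so its label is only a placeholder.

```python
def steady_mean(timings):
    """Mean seconds per generation, skipping the slower first generation."""
    rest = timings[1:]
    return sum(rest) / len(rest)


# (label, seconds per generation) in the order the listings appear.
runs = [
    ("python -m ev.main (first listing)",
     [12.06, 6.13, 6.11, 6.09, 6.18, 6.26]),
    ("pypy -m ev.main",
     [63.90, 33.92, 34.21, 33.75]),
    ("pypy -m ev.main # using numpypy",
     [5.60, 2.90, 2.78, 2.69, 2.72, 2.73]),
    ("python -m ev.main (second listing)",
     [7.61, 3.96, 3.79, 3.74, 3.84, 3.69]),
    ("last listing (command not shown)",
     [0.39, 0.10, 0.11, 0.09, 0.08, 0.12, 0.09, 0.08,
      0.08, 0.08, 0.08, 0.07, 0.07, 0.08, 0.07]),
]

baseline = steady_mean(runs[0][1])
speedups = {}
for label, timings in runs:
    mean = steady_mean(timings)
    speedups[label] = baseline / mean
    print("%-36s %6.2f s/gen %7.2fx" % (label, mean, speedups[label]))
```

A ratio below 1 means the run is slower than the first CPython listing. Timings reported with two decimals make the ratio for the fastest run quite sensitive to rounding, so treat it as an order of magnitude rather than a precise figure.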
    $ git clone https://github.com/antocuni/evolvingcopter
    $ cd evolvingcopter
    $ {python,pypy} -m ev.main --no-specialized --no-numpypy
    $ {python,pypy} -m ev.main --no-specialized
    $ {python,pypy} -m ev.main
Source: https://habr.com/ru/post/349230/