Sometimes you spend the day trying to explain to the chief accountant without using the terms “recursive call” and “idiots” why a simple change in the accounting system is actually delayed for almost a week due to a spelling mistake made by someone in the 2009 code. On such days, I want to wrestle the hands of the wise man who created this world and rewrite everything from scratch.

TL; DR
Under the cut, the story of how I, as a practice to learn Python, develop my library for
agent-based modeling with machine learning and gods.
')
Link to github. To work out of the box you need
pygame . For an introductory example,
sklearn is required.
The origin of the idea
The idea to build a bicycle, which I will talk about, appeared gradually. First, the popularity of the machine learning topic has not bypassed me.
Several courses on the cursor gave a deceptive sense of belonging. Several open contests and registration for kaggle slightly corrected self-esteem, however, they did not cool enthusiasm.
Secondly, being a representative of the
untouchable caste of the domestic IT community, I rarely have the opportunity to practice my favorite Python. And from smart people I heard that my project in this regard is just what we need.
But the impetus was the disappointment of No Man's Sky. Technically smart idea, but the procedurally generated world was empty. And like any disappointed fan, I began to think what I would do if they asked me. And he came up with the idea that the world was empty because there is actually very little intelligent life in it. The endless expanses, the habit of relying only on oneself, the joy of the discoverer is all, of course, good. But there is not enough opportunity to return to the base, look around the market, find out the latest gossip in the diner. Deliver the parcel and get your 100 gold for it, after all. It is clear that any city, any dialogue or quest in games is the fruit of the labor of a living person and it is not possible to populate such a huge world with human forces. But what if we could also procedurally generate NPCs with their needs, small stories and quests?
Plan in general
This is how the idea of a library appeared, or even, if you will, a framework that would have the following usage scenarios:
- Classical agent-based modeling (the existence of which I learned only when I sat down to write this article). We create the world, describe the actions of agents in this world, look what happened, change some parameters, run the simulation again. And so in a circle, until we find out how changes in the actions of individual agents affect the overall picture. Very useful stuff.
- Training with reinforcement (it is also reinforced learning ). Building learning models that adapt to interact with a specific environment. A simple example is learning a game whose rules you do not know, but at any time you can get information about the status of the party, choose one of a specific set of actions and see how it affected the number of points you earned (the competition on this topic , however, has already ended ). There are many differences from the usual models of classifiers or regressions. This is a possible delayed outcome, and the need for planning, and many other features.
- And finally, after we create the world and populate it with living creatures that are reasonable and not very good, it would be nice to be able to go there personally, grabbing your faithful blaster, favorite sword, multi-functional pickaxe or red nail puller.
Few technical details
So, first we need to decide on the low-level physics of our world. It should be simple, but flexible enough to simulate different situations:
- We take as a basis the usual cellular automaton - a two-dimensional rectangular discrete world, each object of which occupies one Planck length in a square. Distances shorter than the Planck length will not make sense - you cannot place an object between two cells, you cannot arrange it so that it takes up more than one cell, not even completely.
- We will measure the distance in steps in only four directions, that is, the cells next to each other will have 4, not 8. There will be 2 steps in each diagonal direction.
- To slightly dilute the solidity of the resulting structure, add a little depth: each object will have a sign of permeability. For the same spatial coordinates in the world there can be at least one, but not more than two objects: passable and / or impassable. You can think of it as a surface on which objects stand and on which objects move. Types of surfaces are different, types of objects, too. It is possible to put a curbstone (impassable object) on the carpet (passable object). But you can not lay linoleum on laminate (because who does this at all?) And you can not put a chair on the cabinet.
- But in the cabinet can be stored different items. And in the carpet can, and in the pockets of active objects, too. That is, any object can be a container for items. But not for other objects, otherwise we will violate the third law.
- Time also goes discretely. Each step, each object lives one Planck time, during which it can receive information about the world around it as of this epoch from outside. Now this is the weakest point - the objects have to act in turn, because of this some rassinhron is obtained. Objects to which the “move” comes later have to take into account the state of objects that have already “walked” in this era. If you allow objects to focus only on the beginning of an epoch, then this can lead to two impassable objects, for example, taking up the same free cell at the beginning of an era. Or they will take the same sock out of the dresser. This can be leveled out a little by addressing the objects of each epoch in a random order, but this approach does not solve the whole problem.
This gives us a few necessary basic objects: the world itself (Field), the object of this world (Entity) and the object (Substance). Hereinafter, the code in the article is just an illustration. You can view it fully
in the library on github.
Entity classes with examplesclass Entity(object): def __init__(self):
Field class class Field(object): def __init__(self, length, height): self.__length = length self.__height = height self.__field = [] self.__epoch = 0 self.pause = False for y in range(self.__height): row = [] self.__field.append(row) for x in range(self.__length): if y == 0 or x == 0 or y == (height - 1) or x == (length - 1): init_object = Block() else: init_object = Blank() init_object.x = x init_object.y = y init_object.z = 0 row.append([init_object])
The Substance class does not make sense to describe, it has nothing.
For the time we will meet the world itself. Each era, he will poll all objects in it and force them to make a move. How they will make this move is their business:
Time forward! class Field(object): ... def make_time(self): if self.pause: return for y in range(self.height): for x in range(self.length): for element in self.__field[y][x]: if element.z == self.epoch: element.live() self.__epoch += 1 ...
But why do we need peace, and even with the planned opportunity to place a protagonist in it, if we cannot see it? On the other hand, if you start to deal with graphics, you can get distracted and the world rule will be postponed indefinitely. Therefore, without wasting time, we master
this wonderful article about writing a platformer using
pygame (in fact, we only need the first third of the article), give each object a color sign, and now we already have some sort of map.
Visualization code class Field(object): ... def list_obj_representation(self): representation = [] for y in range(self.height): row_list = [] for cell in self.__field[y]: row_list.append(cell[-1]) representation.append(row_list) return representation .... def visualize(field): pygame.init() screen = pygame.display.set_mode(DISPLAY) pygame.display.set_caption("Field game") bg = Surface((WIN_WIDTH, WIN_HEIGHT)) bg.fill(Color(BACKGROUND_COLOR)) myfont = pygame.font.SysFont("monospace", 15) f = field tick = 10 timer = pygame.time.Clock() go_on = True while go_on: timer.tick(tick) for e in pygame.event.get(): if e.type == QUIT: raise SystemExit, "QUIT" if e.type == pygame.KEYDOWN: if e.key == pygame.K_SPACE: f.pause = not f.pause elif e.key == pygame.K_UP: tick += 10 elif e.key == pygame.K_DOWN and tick >= 11: tick -= 10 elif e.key == pygame.K_ESCAPE: go_on = False screen.blit(bg, (0, 0)) f.integrity_check() f.make_time() level = f.list_obj_representation() label = myfont.render("Epoch: {0}".format(f.epoch), 1, (255, 255, 0)) screen.blit(label, (630, 10)) stats = f.get_stats() for i, element in enumerate(stats): label = myfont.render("{0}: {1}".format(element, stats[element]), 1, (255, 255, 0)) screen.blit(label, (630, 25 + (i * 15))) x = y = 0 for row in level: for element in row: pf = Surface((PLATFORM_WIDTH, PLATFORM_HEIGHT)) pf.fill(Color(element.color)) screen.blit(pf, (x, y)) x += PLATFORM_WIDTH y += PLATFORM_HEIGHT x = 0 pygame.display.update()
Of course, later it will be possible to write a somewhat more intelligible visualization module, but not one. But while the colorful running squares are enough to immerse themselves in the atmosphere of the emerging life. In addition, it develops a fantasy.

Now you need to think about how active agents will act. First, all significant actions will be objects (Python objects, not objects of the world, I apologize for ambiguity). So you can keep history, manipulate their state, distinguish one action from another, even if they are of the same type. So, the actions will look like this:
- Every action must have a subject. The subject of action can only be the object of our world (Entity).
- Every action must have results. At a minimum, “completed / not completed” and “goal achieved / goal not achieved”. But there may be additional ones, depending on the type of action: for example, the action “Search for the Next Pizzeria” may have, in addition to the required ones, the coordinates or the pizzeria object as results.
- Each action may or may not have a set of parameters. For example, the action “Pouring a CupCoffee” may not have parameters, since it does not require clarification, while for the action “Pour” you need the opportunity to clarify what to pour and where.
- Action may be instantaneous or non-instantaneous. During one epoch, one object can perform no more than one non-instantaneous action and any number of instantaneous. This is a controversial point - if we have discretely space and we cannot move half a cell, then the ability to perform an unlimited number of actions during one epoch looks strange and blurs a clear discrete course of time. There was also an idea to ask each type of action the time that it is necessary to spend on it, ranging from 0 to 1, where the action of a duration of 1 takes the entire era. While I stopped at the version with the sign of instantaneousness, since for the definition of discrete time, all actions necessary for the simulation can always be made not instantaneous, but the option with the duration makes it all too complicated.
Thus, from a technical point of view, an action object (Action) is some kind of function, which can be set parameters, execute, get a result, and which itself in itself stores both the parameters passed to it, and the result, and everything that associated with its implementation, starting with the one who caused it, and ending with the state of the world around during its implementation. Therefore, we can create it at one time, set the parameters in another, execute the parameters in the third, get the return value and put it on the shelf for further analysis.
Action object class Action(object): def __init__(self, subject): self.subject = subject self.accomplished = False self._done = False self.instant = False def get_objective(self): return {} def set_objective(self, control=False, **kwargs): valid_objectives = self.get_objective().keys() for key in kwargs.keys(): if key not in valid_objectives: if control: raise ValueError("{0} is not a valid objective".format(key)) else: pass
If someone besides me suddenly wants to create a cozy little world for himself with the help of this library, then it is assumed that out of the box it will contain a set of necessary low-level actions - go to the coordinates, follow the object, find a specific object or object, pick up an object, etc. d. These actions can be used both on their own and combined to produce some complicated manipulations. An example of such complex actions will be further, in the description of the first experiment.
Secondly, every self-respecting active agent should be able to plan their actions. Therefore, we divide the phase of its activity during the epoch into 2 stages: planning and action. As a planning tool, we will have a simple lineup of actions that we are going to consistently perform. However, if we already have a plan, then there is nothing to ponder once more, we must act quickly, decisively. It turns out that at the beginning of the move, the active object determines whether it is necessary to plan for this move (for a start, we will assume that it is necessary when the action queue is empty), then plans if it decided that this is necessary, and at the end performs actions. Should planning, as a serious process, which does not tolerate haste, take up the whole course - a debatable question. For my own purposes, I have stopped at the fact that there is no - my agents do not think for a long time and start to fulfill the plan on the same turn.
Planning and action class Agent(Entity): ... def live(self): ... if self.need_to_update_plan(): self.plan() if len(self.action_queue) > 0: current_action = self.action_queue[0] self.perform_action(current_action) while len(self.action_queue) > 0 and self.action_queue[0].instant: current_action = self.action_queue[0] self.perform_action(current_action) def need_to_update_plan(self): return len(self.action_queue) == 0 def perform_action(self, action): results = action.do_results() if results["done"] or not action.action_possible(): self.action_log.append(self.action_queue.pop(0)) return results ... ...
In addition to this, it seemed to me convenient to introduce such an entity as a state of an object that could influence its actions. After all, the agent can be tired, not in the mood, get wet, get poisoned or vice versa, be cheerful and full of energy. Sometimes even at the same time. Therefore, we add to our objects an array of states, each of which will affect the object at the beginning of an epoch.
Status code class State(object): def __init__(self, subject): self.subject = subject self.duration = 0 def affect(self): self.duration += 1 class Entity(object): def __init__(self): ... self._states_list = [] ... ... def get_affected(self): for state in self._states_list: state.affect() def live(self): self.get_affected() self.z += 1 self.age += 1 ...
For modeling and training, it is necessary to be able to assess how well we have written an action algorithm or chose a training model. To do this, we add a simple simulation and evaluation module with the ability to describe a method for determining the end of the simulation and collecting the results.
Like that import copy def run_simulation(initial_field, check_stop_function, score_function, times=5, verbose=False): list_results = [] for iteration in range(times): field = copy.deepcopy(initial_field) while not check_stop_function(field): field.make_time() current_score = score_function(field) list_results.append(current_score) if verbose: print "Iteration: {0} Score: {1})".format(iteration+1, current_score) return list_results
At this stage, everything, in principle, is ready to close the first scenario of using our library: modeling, if we don’t want to train agents, but want to prescribe the logic of their actions independently. The procedure in this case is as follows:
- We decide which static objects we want to see in the world: walls, mountains, furniture, types of surfaces, etc. We describe them by inheriting the Entity class. We do the same with objects and the Substance class.
- Create a world of the right size, fill it with a landscape of these objects and objects.
- We inherit the class Action and describe all the actions we need. We do the same with the State class and states if we need them for our simulation.
- We create a class of our agents, inheriting Agent. We add service functions to it, we describe the planning process.
- We inhabit our world with active agents.
- To debug actions and enjoy the contemplation of your creation, you can drive visualization.
- And in the end, having played enough with visualization, we start the simulation and evaluate how well the agents we created play according to the rules we create in the world we created.
Proof of concept i
So, we announce the conditions of the first experiment.
- The world consists of: walls, earth. Walls - just impassable walls, nothing else. With the land it is more interesting - every era there is a non-zero probability that a unit of resource will appear in any or several cells of the earth.
- Population: creatures of two sexes. Since, for simplicity, we will store the floor in a logical variable, the floor may be false or true.
- Creatures of the false sex are greedy and have the purpose of their life to collect as much resource as possible. As soon as they appear, they find the nearest cell with the resource, go to it, collect the resource and so on in a circle. However, it is they who are endowed with the ability to bear children.
- Creatures of the true sex are somewhat more diverse. They have a choice of two actions: also collect a resource or look for a partner for mating (naturally, of the opposite sex, so as not to accidentally land in places where there are few resources, and it’s better not to even think about possible partners for mating).
- When the creature of the true sex that decides to mate catches up with the chosen partner and invites him to retire, the creature of the false sex decides whether it is located for mating, according to certain rules based on the amount of the resource from both participants. If the provider has more resources, he gets consent. If less or the same, then the probability of pairing depends on the difference in the amount of resources.
- Ten epochs after conception, a creature of random sex is born. It is born immediately, an adult, and acts in accordance with the rules for its gender.
- All creatures must die. Every being, every epoch, beginning with the tenth after its birth, has a non-zero and constant probability of ceasing its active existence.
Our task will be to write a planning procedure for creatures of the true sex in such a way that the population of creatures reproduces as quickly as possible.
Out of gratitude, I will not tire out reading to this place with long illustrations, I will show only:
A variant of the implementation of integrated action on the example of mating class GoMating(Action): def __init__(self, subject): super(GoMating, self).__init__(subject) self.search_action = SearchMatingPartner(subject) self.move_action = MovementToEntity(subject) self.mate_action = Mate(subject) self.current_action = self.search_action def action_possible(self): if not self.current_action: return False return self.current_action.action_possible() def do(self): if self.subject.has_state(states.NotTheRightMood): self._done = True return if self.results["done"]: return if not self.action_possible(): self._done = True return first = True while first or (self.current_action and self.current_action.instant) and not self.results["done"]: first = False current_results = self.current_action.do_results() if current_results["done"]: if current_results["accomplished"]: if isinstance(self.current_action, SearchMatingPartner): if current_results["accomplished"]: self.current_action = self.move_action self.current_action.set_objective(**{"target_entity": current_results["partner"]}) elif isinstance(self.current_action, MovementXY): self.current_action = self.mate_action self.current_action.set_objective(**{"target_entity": self.search_action.results["partner"]}) elif isinstance(self.current_action, Mate): self.current_action = None self.accomplished = True self._done = True else: self.current_action = None self._done = True else: break def check_set_results(self): self.accomplished = self._done
And the planning option that I decided that the model works class Creature(Agent): ... def plan(self): nearest_partner = actions.SearchMatingPartner(self).do_results()["partner"] if nearest_partner is None: chosen_action = actions.HarvestSubstance(self) chosen_action.set_objective(** {"target_substance_type": type(substances.Substance())}) self.queue_action(chosen_action) else: self_has_substance = self.count_substance_of_type(substances.Substance) partner_has_substance = nearest_partner.count_substance_of_type(substances.Substance) if partner_has_substance - self_has_substance > 2: self.queue_action(actions.GoMating(self)) else: chosen_action = actions.HarvestSubstance(self) chosen_action.set_objective(**{"target_substance_type": type(substances.Substance())}) self.queue_action(chosen_action) ...
About machine learning and gods
Making sure that the simple modeling works, we will begin to increase the degree of fun and add the possibility of machine learning. At the time of this writing, not all of the planned capabilities are implemented, however, I promised to tell you about the gods.
But first we need to decide how we want to train our creatures. Take the same task with resource searches and pairing. If we solved it in the traditional way, then first we would have to decide on a set of signs, based on which we plan to make decisions. Then, acting randomly or somehow, collect the training and test datasets and save them. You should train a couple of models on these datasets, compare them and choose the best one. Finally, rewrite the planning process using this model, run the simulation and see what happens. And here we would have thought of using a new feature, which means reassembling the data, overtraining the models, re-aligning them with each other and restarting, in order to reexamine what will happen again.
And what would we like ideally? Ideally, I would like to define a set of features, configure the training model and run a simulation that would already assemble the datasets, train the model, connect it to the planning process and give us ready results of several runs that we could compare with the results of other models or other sets signs.
And that's how I imagine it:
- In order to gain datasets and make decisions, we need to declaratively describe the receipt of features on which we will train and which we will use to predict the model in the planning process. Now I implemented it as an array of functions, each of which returns the value of a particular attribute. For our test task, this is, for example, a function that returns the presence of possible partners, another one that considers the amount of a resource in a decision maker, a distance to the nearest resource, etc. Perhaps it would have been more successful to have a function that returns an array of attributes, but at the time of writing, the array of functions seemed more convenient to me. Thus, we get the opportunity at any time to get a description of the world around us.
- We also declaratively describe the method of obtaining the result of interest to us. In the case of our example, this is, say, mating successfully or unsuccessfully.
- Specify the training model that we want to use, and its parameters. For example, a stochastic gradient descent, a random forest, some kind of neural network, or anything at all.
- Run the simulation. First, every time when it is necessary to make a decision, the creatures, according to the rules described by us, receive an array of signs and choose some kind of action without using the model (which is still empty). Having performed this action, they again, according to the rules described by us, determine the result, connect it with a description of the environment in which the decision was made (set of signs obtained before the action), and voila, we have a sample ready for the training dataset. Having done this a certain number of times and having accumulated enough data, the creatures finally feed the training dataset into the model. After the model is trained, they begin using it in the planning process.
- Further options are possible. You can stop at this. And you can continue to memorize the sets of signs and the results of actions and feed them with pieces of the model, if the model allows you to complete your education. Or, for example, overtraining it if the percentage of desired results starts to decrease.
Here we need a few new objects. First, creatures should have some kind of memory in which they will add their datasets. She should be able to separately memorize a set of signs. Separately attach to them the result of the decision made with this set of features. Return to us in a convenient way. Well, and forget all that we were taught in high school.
Secrets of memory class LearningMemory(object): def __init__(self, host): self.host = host self.memories = {} def save_state(self, state, action): self.memories[action] = {"state": state} def save_results(self, results, action): if action in self.memories: self.memories[action]["results"] = results else: pass def make_table(self, action_type): table_list = [] for memory in self.memories: if isinstance(memory, action_type): if "state" not in self.memories[memory] or "results" not in self.memories[memory]: continue row = self.memories[memory]["state"][:] row.append(self.memories[memory]["results"]) table_list.append(row) return table_list def obliviate(self): self.memories = {}
Secondly, we need to teach agents to get jobs and to remember the environment and the results of their actions.
Getting jobs class Agent(Entity): def __init__(self): ... self.memorize_tasks = {} .... ... def set_memorize_task(self, action_types, features_list, target): if isinstance(action_types, list): for action_type in action_types: self.memorize_tasks[action_type] = {"features": features_list, "target": target} else: self.memorize_tasks[action_types] = {"features": features_list, "target": target} def get_features(self, action_type): if action_type not in self.memorize_tasks: return None features_list_raw = self.memorize_tasks[action_type]["features"] features_list = [] for feature_raw in features_list_raw: if isinstance(feature_raw, dict): if "kwargs" in feature_raw: features_list.append(feature_raw["func"](**feature_raw["kwargs"])) else: features_list.append(feature_raw["func"]()) elif callable(feature_raw): features_list.append(feature_raw()) else: features_list.append(feature_raw) return features_list def get_target(self, action_type): if action_type not in self.memorize_tasks: return None target_raw = self.memorize_tasks[action_type]["target"] if callable(target_raw): return target_raw() elif isinstance(target_raw, dict): if "kwargs" in target_raw: return target_raw["func"](**target_raw["kwargs"]) else: return target_raw["func"]() else: return target_raw def queue_action(self, action): if type(action) in self.memorize_tasks: self.private_learning_memory.save_state(self.get_features(type(action)), action) self.public_memory.save_state(self.get_features(type(action)), action) self.action_queue.append(action) def perform_action_save_memory(self, action): self.chosen_action = action if type(action) in self.memorize_tasks: results = self.perform_action(action) if results["done"]: self.private_learning_memory.save_results(self.get_target(type(action)), action) self.public_memory.save_results(self.get_target(type(action)), action) else: results = self.perform_action(action) ...
, , , - , , , , , , . , , - -, , , .
Demiurge, . , , insert_object. , :
class Demiurge(object): def handle_creation(self, creation, refuse): pass class Field(object): def __init__(self, length, height): ... self.demiurge = None ... def insert_object(self, x, y, entity_object, epoch_shift=0): if self.demiurge is not None: refuse = False self.demiurge.handle_creation(entity_object, refuse) if refuse: return assert x < self.length assert y < self.height self.__field[y][x][-1] = entity_object entity_object.z = self.epoch + epoch_shift entity_object.board = self entity_object.x = x entity_object.y = y ...
:
- ( ) . , .
- — , , , . , , ,
.
- ( ) , ( , , ). , , .
. — - . . — , , . , — . — , , .
Proof of concept II
, . , , . , .
:
- (Priapus).
- , . , , SGDClassifier sklearn .
- handle_creation.
- , : , . , , , .
- : , (, , sklearn , ?). ( 20 , ), , . , . , .
- 30 500 500 . , , .
- .
Conclusion
, , - . ( ) . :
- . .
- (/ , , / ..). — , - -.
- .
- .
- , , .
- Optimization
- ...
- PROFIT
, , , -, , -, . , , - , -, , . , , . , .
github.
pygame sklearn , .