Dirty programming with a pure soul: the development of heuristic systems (part 2)

In the first part of this article, we talked about complex heuristic software systems, which I called dirty. In this part, we discuss some practical aspects of working with such systems.

We talked about the frightening complexity of heuristic systems. It is a matter of life and death: the complexity you pay for improving the system's quality either grows, or grows too fast. In the second case, even small improvements become harder and harder each time, and Achilles never catches up with the tortoise. In the first case, there is still a chance to make it in time for the soup.

Rubber gloves

Working in dirt calls for protection, so the normal development of such a system requires observing sanitary standards. Much of this echoes the recommendations of my colleague, so I will describe the necessary measures briefly.
• Testing system. It is needed for a comprehensive study of the program's behavior and for uncovering hidden patterns. Ideally, it should let you feel what is happening at your fingertips. It is good if the system is equipped with a set of metrics that expose different aspects of the behavior of the program under study. Be prepared to spend serious resources on writing and developing the testing system, and on preparing and expanding the reference databases. Run the testing system every time you change the program.

• Logging system. It is needed to understand quickly and conveniently the subtleties of how the program worked and how the heuristics interacted in a particular case. Depending on the problem being solved, the logging system may look different, but expect it to be something more than a dump into a text file. Take care of your own convenience, because you will be doing post-mortems often.

• Documentation system. All means are good here, but the most effective everyday one seems to be comments in the code. Do not be afraid to think out loud. In comments on heuristics, describe the situations you had in mind when writing them. For example: “we cannot make the penalty too large, because long separators are often decorative elements”, or “Rough protection against L-shaped false inserts. It also fires on very tall U-shaped true frames. Consider separating these cases.”

• Communication with colleagues. Notify them about the changes you have made to the program and about what aspects they may affect.

• Purposefulness. This is what distinguishes you from undirected evolution, and it is your fundamental advantage. Changes must be made deliberately, with an eye to the future. How to do this is discussed below.

• Code quality. One of the greats said: “one must reason about the transcendent with transcendental clarity.” Whatever that means, dirty code is hard to maintain. You cannot write good unit tests for heuristics; errors are often discovered when it is too late to fix them; the code is hard to read because it is difficult to fully understand which situations the author had in mind. You have to compensate for this with higher quality in those aspects of the code where dirt can be avoided.
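
As an illustration of the testing system with metrics described in the list above, here is a minimal sketch of a regression check over a reference set. Everything in it is an assumption made for the example: the `Sample` structure, the per-tag accuracy metric, and the names `evaluate` and `regressions` do not come from any real system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One reference item: an input, the expected answer, and an aspect tag."""
    data: str
    expected: str
    tag: str  # hypothetical label for the aspect of behavior; must not be "all"


def evaluate(program: Callable[[str], str], samples: List[Sample]) -> Dict[str, float]:
    """Return accuracy over all samples and separately for every tag."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for sample in samples:
        ok = program(sample.data) == sample.expected
        for key in ("all", sample.tag):
            totals[key] += 1
            hits[key] += int(ok)
    return {key: hits[key] / totals[key] for key in totals}


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                tolerance: float = 0.0) -> List[str]:
    """Names of metrics that dropped below the baseline by more than tolerance."""
    return sorted(key for key, value in baseline.items()
                  if current.get(key, 0.0) < value - tolerance)


# Toy usage: the "program" upper-cases text; the changed version breaks long inputs.
samples = [Sample("ab", "AB", "short"), Sample("abcd", "ABCD", "long")]
baseline = evaluate(str.upper, samples)
changed = evaluate(lambda s: s.upper() if len(s) < 3 else s, samples)
worse = regressions(baseline, changed)  # the metrics "all" and "long" got worse
```

Per-tag metrics are one simple way to see which aspect of the behavior a change has hurt, even when the overall figure looks acceptable.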

Division of responsibility

In heuristic programming, it is inevitable that some components adapt to the undesirable quirks of others and that components get ground to fit one another. Suppose there are two components, A and B, that indirectly affect each other. Suppose that after some change to A, component B starts working under new conditions in which its defects show more strongly. As a result, the change may be unacceptable even though it improves A (recall the classifier example from the first part of the article). Of course, in this case B should be changed so that it works better under the new conditions, but we live in the real world, where most tasks have deadlines, after which the PM comes running in a not-so-good mood. So only those changes are made to A under which B's defects do not show. A begins to adapt implicitly to the roughness and irregularities of B.

After this, B's defects are no longer easy to fix: A is already adapted to B's unpleasant quirks, and changing the status quo will disrupt its operation. If you then need to create a component C that uses B directly, it may be easier to write C so that it explicitly accounts for B's errors and corrects them itself than to fix the obstinate B. A new dependency has appeared, and the chances of improving B are now very slim. With a generous hand, let us add components D, E, and so on to the picture, and recall their interactions with one another. It is easy to arrive at a situation where any change to the system turns out to be harmful.

The programmer's task is to minimize such unwanted mutual grinding. To do this, you must constantly keep track of the implicit dependencies between components. Carefully identify and analyze the side effects of your changes and decide how critical they are for the system in the long run. Do not make changes that have a positive effect but blur the logic and complicate the interactions. Dirty components by definition have a blurred area of responsibility. Your task is to keep them within bounds and, where possible, not let them do someone else's work or work that is simply out of character for them.

Deep penetration

Suppose that in some case the system did not work perfectly. You can trace the components along the chain from the end to the beginning, find the ones that misbehaved, and fix them. But perhaps the real problem was hidden somewhere else entirely. A component implicitly assumed that other parts of the system would create the conditions for it to work at its best, and those conditions were not created. How could you even have guessed that such parts of the system exist and that they were supposed to work in exactly this way?

Unfortunately, there is no better answer than “you should have known”. It follows that to tune the system properly, you need to know all the implicit agreements about how the components involved in the process work. If you did not know something, no immediate catastrophe will happen: you will only complicate the interactions more than was necessary, which, as we know, threatens the heuristic system with premature death.

Hence the high demands on the preparation of developers who start working on dirty projects. A programmer who has not studied the system deeply enough tends to harm it, even when writing excellent code that improves the quality of the heuristics. It is funny: when I first started working with our document analysis system, I ... no, I am too embarrassed to tell that story.

Untangling the connections

Divide and conquer is one of the most powerful principles for managing any complex system. A compact but tightly coupled system is much worse than one that is more cumbersome and voluminous but whose interactions the mind can still grasp. The necessary evil of untangling the knot of connections is that the system swells and the amount of code grows.

Methods of dealing with complexity are well known in general, but heuristic programming has a number of effective techniques that are considered dirty in the ordinary world. Some of them are discussed below.

Once is not copy-paste

In traditional programming, code duplication is rightly condemned. In heuristic programming, it sometimes turns out to be the lesser evil when untangling the knot of interactions.

Recall that components in dirty programming have a fuzzy area of responsibility, and that any dependency tends to deform this area. Suppose there is a component A that is used by components B and C. It may turn out that B and C tend to “pull” A in different directions (each of them needs something different from A). For A to withstand the load, it has to be made more complex (for example, by introducing additional parameters). That complication may be a less acceptable solution than simply copying the code of A and turning it into components A_for_B and A_for_C. (By the way, copying is one of the main creative tools of evolution. Often a gene is first simply duplicated by chance, and then one of the copies adapts to new tasks.)
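
The trade-off can be sketched with a toy component. The text-normalization task and all function names below are hypothetical and chosen only for illustration: the first function is a shared A that has accumulated a flag for every client need, and the other two are the copies A_for_B and A_for_C.

```python
import re


# Shared component A after its clients have pulled it in different directions:
# each new need of B or C has become another parameter.
def normalize(text: str, lowercase: bool = False, strip_punct: bool = False,
              collapse_spaces: bool = True) -> str:
    if strip_punct:
        text = re.sub(r"[^\w\s]", "", text)
    if collapse_spaces:
        text = re.sub(r"\s+", " ", text).strip()
    if lowercase:
        text = text.lower()
    return text


# The copy-paste alternative: two independent copies, each free to drift
# toward its only user.
def normalize_for_b(text: str) -> str:
    """Copy for hypothetical client B (say, a search index): aggressive cleanup."""
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def normalize_for_c(text: str) -> str:
    """Copy for hypothetical client C (say, display): only whitespace is touched."""
    return re.sub(r"\s+", " ", text).strip()
```

With the copies, a new heuristic needed only by B goes into `normalize_for_b` without any risk of changing what C sees.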

The main risk of copying is that new changes are hard to synchronize between the copies. With fuzzy components this is often not required: you can let the copies drift freely. Of course, it is good to carry useful ideas over to all copies (in nature this is called convergent evolution), and forgetting to do so is unpleasant. This disadvantage is outweighed by an important advantage: each copy is ground to fit its user much more closely. The harm of grinding components to fit one another was discussed above, but with a one-to-one relationship between a component and its user most of that harm is neutralized, while the heuristics can be made more expressive.

Copy-paste at the level of individual heuristics, not exceeding a couple of dozen lines of code, is often used with success. Copying large pieces of code is reasonable less often, but I have had to deal with justified copy-paste at the level of a subsystem of several classes.

Love for special cases

A common programming error is a solution that fails to account for some special cases. The main defense is to make the solution as general as possible, so that special cases follow from it automatically. In heuristic programming, this is often not the best way to solve problems.

Suppose you need to develop a module that finds pictures of a certain kind in an image. Many of the most sophisticated and inhuman serial killers showed a pathological tendency toward cruelty to animals from childhood. This fact has nothing to do with the subject of the article; I just wanted to make sure you had not fallen asleep. So, suppose you need to develop a module that finds pictures of a certain kind in an image. Initially it is assumed that the pictures can have any shape, so you need a solution that places no requirements on their shape. Nevertheless, implementing a separate module that searches only for rectangular pictures alongside it can be quite productive (even though the general solution is meant to find rectangular pictures too). The reasons are as follows. First, rectangular pictures make up a very large share of pictures in real images. Second, a search module for rectangular pictures may be much easier to write, and you can expect better quality than from the general solution. The third reason is less obvious. If the most important yet simple case is handled separately, the load on the main module is seriously reduced. After some evolution, the main module may on average find rectangular pictures worse than it would have if it had developed alone. But its quality on non-trivial cases should be better.
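
A minimal sketch of this arrangement follows, under strong simplifying assumptions: candidate regions are already segmented and given as sets of pixel coordinates, "rectangular" means that a region fills its bounding box, and the general detector is replaced by a trivial area threshold. None of these names or criteria come from a real system.

```python
from typing import List, Set, Tuple

Region = Set[Tuple[int, int]]  # (row, col) pixels of one candidate region


def is_rectangle(region: Region) -> bool:
    """A region counts as rectangular if it fills its bounding box completely."""
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return len(region) == box_area


def find_rectangular_pictures(regions: List[Region]) -> List[Region]:
    """Special-case module: simple, strict, handles only the common case."""
    return [r for r in regions if r and is_rectangle(r)]


def find_pictures_general(regions: List[Region], min_area: int = 4) -> List[Region]:
    """Stand-in for the general module (hypothetical heuristic: big enough)."""
    return [r for r in regions if len(r) >= min_area]


def find_pictures(regions: List[Region]) -> List[Region]:
    """Run the special case first; the general module sees only what is left."""
    rectangles = find_rectangular_pictures(regions)
    rest = [r for r in regions if r not in rectangles]
    return rectangles + find_pictures_general(rest)


square = {(0, 0), (0, 1), (1, 0), (1, 1)}
l_shape = {(0, 0), (1, 0), (2, 0), (2, 1)}
speck = {(5, 5), (5, 6), (6, 5)}
found = find_pictures([l_shape, square, speck])  # square first, then l_shape
```

Because the general module never sees the rectangles, its heuristics can be tuned for irregular shapes only.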

Breeding a zoo

One of the principles of the Python philosophy reads: "There should be one-- and preferably only one --obvious way to do it." In dirty programming, it is normal for the same problem to be solved in several ways at once.

In essence, this principle generalizes the principle discussed above of creating special-case solutions alongside the general one. The argument is the same: reducing the load on a component lets it focus on what it does best.

Since the problem is solved in several ways, the solutions may disagree. A popular technique is to create an independent component that acts as an arbitrator and chooses the best of the solutions presented by the competitors. If such a judge works well, there is hope that at least one of the components will “hit the mark”.
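
The arbitrator can be sketched as follows. This is an illustration under assumptions, not a description of any particular system: the `Candidate` structure, the two toy solvers, and the judge that simply trusts each solver's self-reported confidence are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    source: str        # which competing component produced the answer
    answer: str
    confidence: float  # self-reported by the component


Solver = Callable[[str], Optional[Candidate]]
Judge = Callable[[str, Candidate], float]


def arbitrate(task: str, solvers: List[Solver], judge: Judge) -> Optional[Candidate]:
    """Ask every solver, then let an independent judge pick the best answer."""
    candidates = [c for c in (solve(task) for solve in solvers) if c is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda c: judge(task, c))


# Two toy competitors that extract a number from text in different ways.
def digits_solver(task: str) -> Optional[Candidate]:
    match = re.search(r"\d+", task)
    return Candidate("digits", match.group(), 0.9) if match else None


def words_solver(task: str) -> Optional[Candidate]:
    words = {"one": "1", "two": "2", "three": "3"}
    for word, digit in words.items():
        if word in task.lower():
            return Candidate("words", digit, 0.6)
    return None


def trusting_judge(task: str, candidate: Candidate) -> float:
    """Simplest possible judge: believe the self-reported confidence."""
    return candidate.confidence


best = arbitrate("page 12 of two", [digits_solver, words_solver], trusting_judge)
```

In practice the judge would carry heuristics of its own; keeping it a separate component means the competitors do not need to know about each other.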

The importance of the principle of diversity of solutions is hard to overestimate, and it is widely used in practice. For example, it can be used to gradually get rid of components that have stalled in their development, i.e. have reached such a degree of complexity that control over them can be considered lost. Instead of abandoning an obsolete subsystem at once, you can gradually shrink its area of responsibility by introducing and developing components that partially replace it. In parallel, the problematic subsystem can be simplified by removing the parts that have become superfluous. Sooner or later the subsystem will either become manageable or can be painlessly removed from the system.
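
One way to picture this gradual shrinking of responsibility is a small router placed in front of the old subsystem. The sketch below is only an assumption about how that could look; the `Router` class, the notion of an input "kind", and the handlers are invented for the example.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class Router:
    """Sends each kind of input to its replacement component if one exists,
    and to the legacy subsystem otherwise."""

    def __init__(self, legacy: Handler) -> None:
        self.legacy = legacy
        self.replacements: Dict[str, Handler] = {}

    def take_over(self, kind: str, handler: Handler) -> None:
        """Move one kind of input out of the legacy area of responsibility."""
        self.replacements[kind] = handler

    def handle(self, kind: str, payload: str) -> str:
        return self.replacements.get(kind, self.legacy)(payload)


# Toy usage: a new component takes over "table" inputs; everything else
# still goes to the legacy subsystem.
router = Router(legacy=lambda payload: "legacy:" + payload)
router.take_over("table", lambda payload: "new:" + payload)
```

Once a kind of input has been taken over, the code that handled it inside the legacy subsystem becomes dead weight and can be removed.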

Of course, the principle under discussion has side effects. Firstly, its use leads to serious swelling of the code. Secondly, competing solutions create nontrivial dependencies between the respective components, and the harm of such dependencies has already been discussed.

Conclusion

Of course, the term "dirty programming" is inaccurate (but it makes the title of the article attract attention). The code called dirty here should be treated with no less, and in fact with even more, care and reverence than ordinary code. However, despite the pleasure of writing it, nobody writes it because life is good. There is a feeling that this is a kind of brute force in programming. Therefore, dirty programming is needed only for those tasks for which other solutions have not yet been invented.

Source: https://habr.com/ru/post/144913/
