
The Internet with immunity, or why God does not play Lego

Most programmers are constantly learning. We read the gurus' books and study the code of professionals, and we argue about which method is better and which solution is more elegant.
But let's imagine there is a super-professional whose code we have managed to see. What can we learn from him? And what conclusions can we draw?
So: artificial immune systems.


The real immune system is an extremely complex and still far from fully understood thing, consisting of cells with long-term memory whose combined weight is comparable to that of the human brain.
Describing the entire immune system is expensive, and a detailed article would better suit a portal for insomniacs than Habr. So I will go over the key points very superficially, skipping many details that are less critical for in silico reproduction (although very important in vivo).

The immune system itself (hereinafter the IS) is usually divided into two parts, innate and adaptive. However, nature is a poor system architect and does not like levels of abstraction, but it does love to reuse inherited solutions. The human IS is a conglomerate of ancient algorithms and newer developments that are not always easy to separate.
So let us first list the objects and processes.
1. Lymphocytes. These cells form the basis of the immune system, an abstract base class for specialized cell versions (B-cells and T-cells). Some types of lymphocytes carry a molecular detector on their surface (Major histocompatibility complex, MHC) responsible for binding to a protein molecule (the closest analogy is a key and a lock). This detector may be unique within the system. Many types of lymphocytes can change their type during maturation and have states that switch under the influence of various activators.
1.1. T-lymphocytes. Cells with many subspecies and specializations. Various types inherit from this base class and can change their specialization.
1.1.1 T-killers (Tk), killer cells. Seasoned executioners in the service of total control, purging their own territory. They destroy cancer cells, damaged cells, and cells infected with a virus.
1.1.2 T-helpers (Th). Cells that activate other cells and carry information.
1.2. B-cells. Cells responsible for the adaptive response of the organism. During their lifetime, B-cells can change their genome and grow different detectors at different stages of development.
1.2.1 Plasma B-cells. The main suppliers of antibodies.
1.2.2 Long-term memory cells, which carry information about the mechanism for producing antibodies.
2. Antigen-presenting cells (APC). A group of cells that includes macrophages and dendritic cells (DC). Their purpose is to "show" other cells of the immune system protein fragments: digests formed by splitting cellular proteins into pieces. A fragment binds to the MHC and is exposed on the cell membrane. In effect, this is a presentation of the "lock" for the selection of a matching key.
2.1. DC, dendritic cells. It is with them that the "danger theory" is primarily associated. For most of their lives they are inactive, but when problems arise they activate and get to work.
3. Antigen (Ag). The part of an external threat that the immune system detects; usually proteins or toxins.
4. Antibody. Y-shaped proteins, also known as immunoglobulins, able to bind to an antigen. They include molecular "hinges" to hold various digests. Toxins are blocked by binding to the antigen directly. If a cell is affected, the antibody binds to the complexes on the cell surface and marks it for subsequent destruction.

When we talk about the key-and-lock metaphor, we must understand that it is only a metaphor.
A key either fits a lock or it does not: a binary attribute. With a molecular detector, the "fits / does not fit" value is real-valued and is called affinity. This is a very important point, and the system uses it to tune affinity on the fly.
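To make the difference concrete, here is a toy sketch of real-valued affinity. Detector and antigen are encoded as bit strings and scored by the fraction of matching bits; both the encoding and the scoring rule are illustrative assumptions, not how molecular binding actually works.

```python
# Toy illustration: a binary "key fits / does not fit" check replaced by
# a real-valued matching score (affinity) in [0, 1].

def affinity(detector: str, antigen: str) -> float:
    """Fraction of matching bits: 1.0 is a perfect fit, 0.0 no fit at all."""
    assert len(detector) == len(antigen)
    matches = sum(d == a for d, a in zip(detector, antigen))
    return matches / len(detector)

print(affinity("10110", "10110"))  # 1.0 (a perfect key)
print(affinity("10110", "10011"))  # 0.6 (a partial fit)
```

A binary lock would reject the second pair outright; a real-valued affinity lets the system rank partial fits and improve them later.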

A certain part of the IS (primarily the innate subsystem) basically follows the traditional detector-reaction scheme also used in modern antiviruses. But not all of it. The real IS contains something our defense systems lack: processes that make a zero-day attack very unlikely.

We will come back to this; for now, let us proceed to a (much abbreviated) description of the processes.

1. Apoptosis: controlled cell death, an everyday genocide within a single organism carried out with the consent of the central government and the general public. Sometimes, for one reason or another, a cell of your own body must be destroyed: the planned withdrawal of used material from service. A garbage collector and a bioreactor in one. The integrity of the membrane is maintained to the end.
2. Necrosis: unplanned destruction, such as an external attack. The permeability of the membrane breaks down, the cell's contents are released, and the body is able to register the mass death. This death from external causes is a signal to start. After this signal the body's forces are mobilized: the temperature rises (speeding up the necessary chemical reactions), blood flow accelerates, and cellular training mechanisms are activated so that the attack is met fully armed.

The difference between these two processes is the essence of what is called the Danger Theory.
This is a relatively new theory; its birth is dated to 1994 and associated with the name of the legendary Polly Matzinger. The fundamental difference between the concepts is that the organism (unlike a traditional antivirus) responds not to external signals but to a combination of internal state and external reactions. In other words, the system is essentially reflexive. It may not know that someone is attacking it. But it knows when it feels bad.
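The decision rule of the danger theory can be sketched in a few lines: react not to an external pattern alone but to the balance of internal distress. Signal names, weights, and the threshold below are illustrative assumptions in the spirit of danger-theory models, not any published algorithm.

```python
# "Danger theory" in miniature: the system mobilises when danger signals
# (necrosis-like distress) outweigh safe signals (planned apoptosis),
# regardless of what the external event looked like.

def danger_score(danger_signals: list[float], safe_signals: list[float]) -> float:
    """Positive score -> the organism 'feels bad'; negative -> all is well."""
    return sum(danger_signals) - sum(safe_signals)

def should_react(danger_signals, safe_signals, threshold=0.5):
    return danger_score(danger_signals, safe_signals) > threshold

# An external event alone is not enough: strong safe signals keep the
# system calm, strong distress mobilises it.
print(should_react([0.9, 0.8], [0.1]))   # True: internal distress dominates
print(should_react([0.3], [0.9, 0.8]))   # False: the body feels fine
```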

This is a very critical boundary.
In fact, we move from a simple friend-or-foe classifier to a complex anomaly detector that determines the moment when the system begins to lose integrity and something must be done. And this anomaly detector has excellent support for concept drift.

However, reaction to external patterns also exists in the real immune system.
The bone marrow continuously creates B and T cells, many with a randomized part of the genome responsible for detection. The sensory part is then tested for reaction against the body's own antigens, and any cell that detects anything belonging to the self is immediately destroyed. This enforces a hard zero-false-positive criterion. At the same time, a huge array of sensors with randomized receptors is created, providing maximum coverage of possible attacks without covering the organism itself. This stage is called negative selection.
In fact, this is the first step of what the computer world calls an anomaly (novelty) detector.
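The negative-selection stage can be sketched as follows. Detectors are random bit strings, "self" is a small set of such strings, and the matching rule is exact equality; all three are simplifying assumptions (real AIS work uses r-contiguous-bit or similar partial matching).

```python
# Negative selection: mass-produce random detectors, destroy any that
# react to "self", and keep the survivors to watch for everything else.
import random

random.seed(42)

def make_detector(n_bits=8):
    return "".join(random.choice("01") for _ in range(n_bits))

def negative_selection(self_set, n_detectors=50, n_bits=8):
    detectors = set()
    while len(detectors) < n_detectors:
        d = make_detector(n_bits)
        if d not in self_set:          # reacts to self -> destroyed
            detectors.add(d)
    return detectors

self_set = {"00000000", "11111111"}
detectors = negative_selection(self_set)

# The hard zero-false-positive criterion: no surviving detector matches self.
assert not (detectors & self_set)
print(len(detectors))  # 50
```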
The last point is what is called affinity maturation: a rapid change of the genome, somatic hypermutation. When activated in case of danger, the cell begins to randomly modify its genome, thereby tuning the detector, changing its generalization, widening or narrowing its capture space, so as to catch a particular attack variant more precisely.
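A toy version of affinity maturation: the current detector is cloned, the clones are randomly mutated, and the clone with the best affinity to the current antigen replaces the parent. This is CLONALG-flavoured hill climbing; the bit-string encoding, mutation rate, and population sizes are all illustrative assumptions.

```python
# Somatic hypermutation as hill climbing: clone, mutate, keep the best.
import random

random.seed(1)

def affinity(detector, antigen):
    return sum(d == a for d, a in zip(detector, antigen)) / len(antigen)

def mutate(detector, rate=0.2):
    flip = lambda b: "1" if b == "0" else "0"
    return "".join(flip(b) if random.random() < rate else b for b in detector)

def mature(detector, antigen, n_clones=20, generations=30):
    for _ in range(generations):
        clones = [mutate(detector) for _ in range(n_clones)]
        # The parent competes with its clones, so affinity never decreases.
        detector = max(clones + [detector], key=lambda d: affinity(d, antigen))
    return detector

antigen = "1101001110"
start = "0000000000"
tuned = mature(start, antigen)
print(affinity(start, antigen), "->", affinity(tuned, antigen))
```

After a few generations the detector's capture space has been narrowed onto this particular attack variant, which is exactly the point of the mechanism.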

Now imagine the entire array of sensors scattered throughout your own body. All these millions of cells whose combined weight is comparable to the weight of the brain.
And each of which has a unique detector.
Personally, I am impressed by the scale and possibilities of such a system.
No modern supercomputer is comparable to the computational power of the immune system of an average gopnik chewing sunflower seeds, let alone a mobile cluster of such gopniks.
However, let us not forget that the PM of this project (may his backups never be lost) had unlimited funding and no deadlines, i.e. what we can only dream of. Perhaps with such a budget we would have done better, too.

By the logic of the story, I should now describe the in silico equivalents of the immune system. However, that task is even more thankless than listing all the variants of genetic algorithms.
I will give only the names of the main methods built on the analysis of immune networks that seemed interesting to me personally. Those interested can find descriptions in the relevant journals or simply google plenty of PDFs.

DCA - dendritic cell algorithm.
AIRS - artificial immune recognition system
FIN - formal immune networks
CLONALG - clonal selection algorithm
aiNet - an artificial immune network for clustering and filtering
libtissue - library for experiments in the field of immune systems

For professionals, I can also point to the site www.dangertheory.com.
Practitioners may be interested in www.artificial-immune-systems.org/algorithms.shtml
A number of these algorithms have been ported to Weka, so you can play with them in your personal sandbox.

Now let us turn to a topic closer to the author: a discussion of practical implementation in code. I do not mean implementations of algorithms pulled out of pieces of a ready-made living system; we are interested in a holistic understanding.

A review of the code of the Chief Programmer that accidentally fell into our hands may be more useful than reading books by C++ practitioners. Nature's design patterns are very different from our politically correct restrictions and the rules of "good" code. Therefore, in this article I focus on general ideas rather than specific algorithms: there are too many of them and, consequently, none of them works completely.

The study of immune systems by interdisciplinary teams of biologists, mathematicians, and programmers was a very useful step, distributing a considerable number of grants among the stakeholders. Open-source libraries, the DCA algorithms, the beautiful term Immunocomputing, and a fresh look at information security problems are considerable achievements too.
But what do we want to end up with?
What can we learn from the Chief System Architect?
Directly reproducing nature's solutions in hardware is silly; otherwise our cars would have metal legs instead of wheels. Neural networks have little in common with biological neurons, and genetic algorithms are not an exact copy of the real processes of double-helix reproduction.

Below I will give the modest opinion of an engineer not so well versed in immunology.

1. We are accustomed to classifying processes by our own primitives, subjective and strictly human. Therefore all the algorithms above implement only the part of what we managed to understand in nature. Should a future AIS have an anomaly detector, a clusterizer, supervised or unsupervised learning?
The real immune system has no such components in explicit form. Unlike us, God does not play Lego. The true purpose of the immune system is not even the distinction between self and non-self, as previously thought. Maintaining one's own integrity is something that has no catchy title in IT marketers' booklets. So when we say that an AIS includes such-and-such a component, this is not quite correct. An AIS is not a classifier, an anomaly detector, or data fusion, although it has the corresponding functionality. It is a separate term, and decomposing it into components loses something.
2. The security system is not a superstructure on top of the protected object; it is part of the object itself. It does not protect against attacks: we all live in constant defense against the outside world. It maintains the viability and integrity of the object in an unfriendly environment. Therefore the deep reflexivity of the protected system is built in from the start; it cannot be bolted on as a separate layer.
3. As a consequence of deep reflexivity, internal sensors should be randomly drawn from the widest possible list of component states and their combinations. Then the defense can monitor its own state fairly completely. In the limit, the system should have access to all of its own data, including its code, at an arbitrary level of detail. In modern commercial systems this looks unrealistic if we recall all the politically correct rules of "good" coding and the closed source of components. A Gödel machine is hard to create while adhering to style guides, generally accepted programming practices, and expert recommendations.
4. The security system is individual to the object it protects. A bank secretary's computer may need protection from playing solitaire; a sandbox in an antivirus laboratory must be open to infection by a trojan. In each case the concepts of "norm" and "danger" are different. An accounting server and a render farm are too different to build their defenses from a single shared list of filters, patterns, and access lists.
For some, reading a political post about a disliked president is an attempted intrusion, and therefore their Kindle should be protected from such texts. The concept of "danger" expands to "undesirable behavior". Spam filters, porn filters, and opinion filters are different markets, but AIS can be used in each of them, in the limit creating Lem's Ethicosphere in the information space.
5. A multi-level protection system, and not in the sense antivirus marketers write about. This is a conglomerate of subsystems in which not only does each component have its own immune system, but the conglomerate itself is protected by its own rules, independent of the components' protection. In theory, you can create protection for a corporate network alongside individual protection for each computer: the threats to a computer and to its parent network are different. And in theory you can create protection for the entire Internet; the system scales very easily.
6. Protection must be adaptive. This does not mean it has to be trained after every new installation, but it must evolve as new threats appear, without waiting for updates from a fashionable antivirus.
7. The real immune system has a huge number of objects but a small number of relationship types; such low coupling does not come easily to modern architects. There is no single coordination center, although the dendritic cells and the medium itself do some of the coordination work. It is a purely distributed system making maximum use of distribution and parallelism.
8. The most important and last point. There are no business models for the mass application of such systems, just as there were no business models for peer-to-peer networks before Napster and the CD industry. People from several antivirus companies I spoke with confirmed this. It was clear from the very beginning anyway: the antivirus market is too large and too capitalized for sudden movements and revolutionary changes.

And finally, a few thoughts on a possible prototype implementation.

Anomaly detectors are the key point. There are many of them, from one-class classifiers to probabilistic models. However, this level is very rough and yields many false positives in real situations. For research prototypes one could try incremental decision trees, for example asymmetric random forests built from one-class very fast decision trees.
Ideally this would be genetic algorithms, if we could forget their hunger for resources and weak generalization.
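To make the one-class idea concrete, here is a deliberately crude stand-in for the classifiers mentioned above: a "novelty detector" that learns the normal profile from clean observations only and flags anything far from it. The feature (CPU load) and the z-score rule are illustrative assumptions; a real prototype would use one-class SVMs, isolation forests, or the incremental trees just mentioned.

```python
# A minimal one-class novelty detector: fit on "normal" data only,
# then flag observations too far from the learned profile.
from statistics import mean, stdev

class ZScoreDetector:
    def fit(self, samples):                      # samples: list of floats
        self.mu = mean(samples)
        self.sigma = stdev(samples) or 1e-9      # avoid division by zero
        return self

    def is_anomaly(self, x, threshold=3.0):
        return abs(x - self.mu) / self.sigma > threshold

normal_cpu_load = [0.20, 0.22, 0.19, 0.21, 0.23, 0.20, 0.18, 0.22]
det = ZScoreDetector().fit(normal_cpu_load)
print(det.is_anomaly(0.21))  # False: looks like the usual load
print(det.is_anomaly(0.95))  # True: far outside the learned "norm"
```

Note that, like negative selection, this detector never sees a single example of an attack; it only knows what "self" looks like.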

Sensors are random combinations of system states with good selectivity. They can be obtained with genetic algorithms (somewhat expensive) or plain Monte Carlo. In any case, sensor construction should be fully automatic, without human intervention: there will simply be too many of them.
One implementation example is DASTON, a set of software sensors with minimal intelligence in the heap of an executable program.
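A Monte Carlo sensor factory of this kind can be sketched in a few lines. Each sensor watches a random subset of system-state features against random thresholds; the feature names, the value range, and the firing rule are all made-up illustrations.

```python
# Fully automatic sensor generation: each sensor is a random combination
# of system-state features with a random threshold rule, mass-produced
# without any human in the loop.
import random

random.seed(7)

FEATURES = ["cpu", "disk_io", "net_out", "syscall_rate", "open_files"]

def make_sensor():
    """Monte-Carlo sensor: fires when a random subset of features
    simultaneously exceeds random thresholds."""
    subset = random.sample(FEATURES, k=random.randint(1, 3))
    thresholds = {f: random.random() for f in subset}
    def sensor(state):          # state: dict feature -> value in [0, 1]
        return all(state[f] > t for f, t in thresholds.items())
    return sensor

sensors = [make_sensor() for _ in range(1000)]   # cheap to mass-produce
state = {f: 0.5 for f in FEATURES}
fired = sum(s(state) for s in sensors)
print(f"{fired} of {len(sensors)} sensors fired on this state")
```

Each sensor alone is nearly useless; the coverage comes from the size and randomness of the array, just as with lymphocytes.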

Concept drift. A computer can be handed over from accountants to designers or gamers, and the "normal" state immediately becomes "abnormal". The system should be able to forget what is unimportant for the new environment but remember what both behaviors share. That is, there should be a characteristic period of re-initialization or retraining. The affinity maturation technique itself allows this, but its limitations are not yet clear.
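The simplest mechanism for this kind of forgetting is a sliding window: the model of "normal" only remembers recent observations, so the previous owner's behavior gradually falls out while a shared baseline survives as long as it keeps occurring. The window size and the scalar "workload" feature are illustrative assumptions.

```python
# Handling concept drift by forgetting: "normal" is the mean over a
# sliding window, so old behaviour ages out automatically.
from collections import deque
from statistics import mean

class DriftingNorm:
    def __init__(self, window=100):
        self.history = deque(maxlen=window)   # old observations fall out

    def observe(self, x):
        self.history.append(x)

    def norm(self):
        return mean(self.history)

model = DriftingNorm(window=5)
for x in [1.0, 1.0, 1.0, 1.0, 1.0]:   # previous-owner workload
    model.observe(x)
print(model.norm())                    # 1.0
for x in [9.0, 9.0, 9.0, 9.0, 9.0]:   # new workload pushes the old one out
    model.observe(x)
print(model.norm())                    # 9.0
```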

The system should be able to separate the general from the particular. Exchanging rules and detector components between computers is possible, but only as a form of co-evolution, without the possibility of being compromised by a neighboring machine. Even in the case of co-evolution, that is quite difficult to guarantee.

The real immune system is often treated as an anomaly detector. As I mentioned earlier, this is fundamentally wrong. The body knows that it is sick. How can a computer or a network get a signal that something is wrong?
This is a very difficult question. There are no formal signs of a "sick" system. A trojan differs from a browser only in authorship: both allow remote control and download updates without the user's consent, both collect passwords and can send them off. What is a healthy computer and what is a sick one? A living organism can distinguish an infection from an accidental change of state by watching the level of necrosis. What can a computer watch? User reaction? A count of corrupted files? A change in the Kolmogorov complexity of the system?
As soon as we weaken the requirements and move from "necrosis" to an abstraction, we lose some part of the logic of AIS operation.
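Of the candidates above, the complexity one is the easiest to prototype. Kolmogorov complexity is uncomputable, but a compression ratio is a usable proxy: a sudden change in how well the system's data compresses can signal that "something is wrong". The traffic samples below are made up for illustration; this is a toy sketch, not a hardened metric.

```python
# Compression ratio as a cheap proxy for Kolmogorov complexity.
import zlib

def complexity(data: bytes) -> float:
    """Compressed size relative to original: higher means less regular."""
    return len(zlib.compress(data)) / len(data)

healthy = b"GET /index.html\n" * 200           # repetitive, regular traffic
infected = bytes(range(256)) * 12 + healthy    # high-entropy payload mixed in

print(round(complexity(healthy), 3))
print(round(complexity(infected), 3))
assert complexity(infected) > complexity(healthy)
```

A prototype would track this ratio over time and treat a jump as a danger signal, in the same role that necrosis plays for the organism.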

To understand that an anomaly has occurred is simple.

Source: https://habr.com/ru/post/189826/

