
How regular expressions appeared

Small preface


I have always been interested in the history of how scientific concepts emerge. When you start studying a new subject, you first meet a series of faceless definitions. Some of them stay that way, while others catch your attention and eventually grow into full-fledged parts of your picture of the world. As an unattainable ideal of this process, one can cite Littlewood's remark about Ramanujan:
every positive integer was one of his personal friends

I have always been interested not only in mastering a concept, but also in figuring out how it appeared. Behind every definition there is always a person. It is interesting to understand what ideas lay at the foundation of a particular concept, and why new definitions were accepted and supported by others with such enthusiasm that they ended up in textbooks.

Below is a small study of this kind, whose object is the concept of a regular expression.


McCulloch-Pitts Neural Networks


Surprisingly, neural networks were the main motivation for introducing the concept of a regular expression. A little history of how that concept itself came about.
The concept of a neural network appeared as a result of interdisciplinary interaction. In 1942, Walter Pitts, an American logician who worked mainly in the field of cognitive psychology, met the renowned physiologist Warren McCulloch, who had come to Chicago to work at the local university. Pitts had nowhere to live, and McCulloch invited him to stay at his home. The result of this collaboration was the paper "A Logical Calculus of the Ideas Immanent in Nervous Activity."

The paper proposed a model of human nervous activity based on an abstraction of the concept of a neuron. The mathematical object derived from this abstraction is called a McCulloch-Pitts neuron. A McCulloch-Pitts neuron is a formal model of a nerve cell consisting of a body and several processes called nerve fibers. In the McCulloch-Pitts model, a cell is a node equipped with a special numeric attribute, the so-called threshold. From each neuron there is an outgoing edge that models a nerve fiber and is called an axon. An axon may branch, but it is necessarily an input for one or several other neurons. Thus, in addition to the outgoing fiber (the axon), a neuron has several incoming fibers (synapses).

Some of the inputs can be excitatory, others inhibitory. Excitatory fibers are drawn as ordinary arrows, inhibitory fibers as dashed arrows. A neuron generates a signal on its axon only if the number of active excitatory inputs reaches the neuron's threshold (and, in the original model, no inhibitory input is active). Axons are drawn as solid outgoing arrows, and the body of a neuron is drawn as a triangle. Thus, a McCulloch-Pitts neuron can be depicted as follows:
[Figure: schematic of a McCulloch-Pitts neuron with excitatory and inhibitory input fibers]
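
The firing rule can be written down in a few lines. The sketch below is my own illustration, not code from the 1943 paper; the function name is mine, and I assume the classical convention that a single active inhibitory input blocks firing while the excitatory inputs must reach the threshold.

```python
# Minimal sketch of the McCulloch-Pitts firing rule (illustrative, not from the paper).
# Assumption: any active inhibitory input suppresses the neuron; otherwise it fires
# when the count of active excitatory inputs is at least the threshold.

def mp_neuron(excitatory, inhibitory, threshold):
    """excitatory, inhibitory: lists of 0/1 signals on the incoming fibers."""
    if any(inhibitory):                          # inhibition suppresses the neuron
        return 0
    return 1 if sum(excitatory) >= threshold else 0

print(mp_neuron([1, 1], [0], threshold=2))   # 1: two excitatory inputs reach the threshold
print(mp_neuron([1, 1], [1], threshold=2))   # 0: the inhibitory fiber blocks firing
print(mp_neuron([1, 0], [0], threshold=2))   # 0: the threshold is not reached
```
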
Neurons are combined into networks called McCulloch-Pitts neural networks. The whole network operates as a state machine. To this end, a notion of discrete time is introduced. At each time step, the output signal of a neuron is delivered to the inputs of the neurons to which its axon is connected. Thus, by fixing a moment of time we obtain the state of the network, and the transition from one state to the next happens as the current time step advances.

The axon of a neuron in the network can serve as a synapse for that same neuron, i.e. neurons can form "loops". The presence of loops is very important for McCulloch-Pitts networks: loops allow the network to memorize signals that arrived at its inputs. If a signal is sent to the input of a neuron, its axon, which feeds back into the same neuron, may become excited; from that moment on the neuron stays active, re-exciting itself at every subsequent time step.

Consider, for example, a McCulloch-Pitts network that acts as the simplest memory cell. The memory cell proper is a single neuron with a loop; the remaining elements of the network merely serve it.
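
A tiny simulation makes the memory behaviour concrete. The sketch below is my own illustration and models only the looped neuron, with an assumed threshold of 1, no inhibitory inputs and no "serving" elements: once an external pulse arrives, the feedback keeps the neuron firing at every later step.

```python
# Sketch of a one-neuron "memory cell": the neuron's axon loops back into one
# of its own synapses. Threshold 1 and the absence of inhibitory inputs are
# simplifying assumptions made for this illustration.

def memory_cell(external_inputs):
    """external_inputs: 0/1 signal on the external synapse at each time step."""
    loop = 0                                  # signal returning over the neuron's own axon
    outputs = []
    for x in external_inputs:
        fired = 1 if x + loop >= 1 else 0     # fires if either input is active
        outputs.append(fired)
        loop = fired                          # the axon feeds the loop on the next step
    return outputs

# A single pulse at step 3 keeps the output excited from then on.
print(memory_cell([0, 0, 1, 0, 0, 0]))        # [0, 0, 1, 1, 1, 1]
```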

The excitation of a signal at an output of the network will be called an event. A question arises: given such an event, is it possible to reconstruct its history, i.e. the sequence of signals that arrived at the inputs of the network at previous moments of time? Bearing in mind that McCulloch-Pitts networks were designed to model brain activity, the question can be restated as follows: which external factors led the mind to the state it is in right now? It turns out that the presence of loops makes it impossible to determine exactly when a given input event occurred, while the presence of alternatives (inputs coming into a neuron from two different neurons) makes it impossible to determine exactly where it occurred. McCulloch and Pitts believed that precisely this limitation underlies the psychological phenomenon called "abstraction."

Regular events


The American mathematician Stephen Kleene studied events in McCulloch-Pitts networks. In his paper "Representation of Events in Nerve Nets and Finite Automata", Kleene proposed an elegant way of describing such events using the language of regular expressions. One can say that every McCulloch-Pitts network can be represented by a regular expression that describes the inputs of the network and the history of input signals that causes one of its outputs to become excited.

Recall that an event in a McCulloch-Pitts network is the appearance of a signal at one of its outputs, i.e. on an axon that is not a synapse of any neuron of the network. Kleene studied the question: the excitation of which inputs (i.e., synapses of the network's neurons that are not axons of any of its neurons) led to the appearance of a signal at a given output? Consider as an example the network shown in the figure below:

Suppose a signal is supplied to the input synapse denoted by a. The only neuron of this network has a threshold of 2, so its axon becomes excited only if the start synapse S is active as well. In that case the axon R is active, and so is its branch that feeds back into the neuron. What can we say at time t about the history behind the appearance of the signal R at the output? It turns out that input a was certainly active at time t-1. We can also assume that a may have been excited at time t-2 and earlier, but we cannot say for sure whether it actually was, and if it was, at which moments. We express this uncertainty with a*. Thus, the history of the event R can be described by the expression a*a. Clearly, wherever a network contains loops, we cannot pin down the exact time at which the exciting signal appeared at the input.

There is also ambiguity of another kind. Consider the following network:

Here the output R is excited if input c and the axon of the preceding neuron are active. What can we say about when that axon becomes excited? It turns out that one of the inputs a or b was excited, but we cannot say which. We denote this fact by the expression a|b. Thus, the history of the event R can be described as (a|b)c.

In the examples above we obtained a convenient way of describing events in McCulloch-Pitts networks. Three operators are used: *, |, and concatenation (juxtaposition). It turns out that these three operators are enough to describe the events in any McCulloch-Pitts network; there are no events in such networks that cannot be described this way. This is precisely what Kleene's theorems establish. Events that can be described with these operators Kleene called regular (in the sense that there are no others). Regular events are described by expressions built from the operators *, | and concatenation; such expressions are called regular expressions. Since every McCulloch-Pitts network is a finite state machine, we have obtained a convenient way to describe automaton languages using regular expressions.
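
As a quick modern illustration, the event histories derived in the two examples above can be checked directly with Python's re module (used here as a stand-in for Kleene's notation; the syntax differs slightly, but the three operations are the same):

```python
import re

# a*a: input a was certainly active one step ago, and possibly earlier as well
print(bool(re.fullmatch(r"a*a", "a")))       # True
print(bool(re.fullmatch(r"a*a", "aaaa")))    # True

# (a|b)c: either a or b was active, followed by c
print(bool(re.fullmatch(r"(a|b)c", "ac")))   # True
print(bool(re.fullmatch(r"(a|b)c", "bc")))   # True
print(bool(re.fullmatch(r"(a|b)c", "cc")))   # False
```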

The forgotten returns


Kleene's work was published in the mid-1950s. As often happens with scientific concepts that do not immediately find an application, it was largely forgotten. It might have stayed that way, but in the late 1960s the American programmer Ken Thompson noticed that regular expressions were a convenient way to specify patterns for searching strings in long texts. A regular expression is converted into a state machine that finds the strings matching the pattern. To build this finite automaton, Thompson devised a special algorithm that is now called the "Thompson construction". Thompson built a regular expression search facility into his text editor ed, and since then regular expressions have become the de facto standard for specifying search patterns.
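
For the curious, here is a compact sketch of a Thompson-style construction. It is a simplified reading of the idea, not Thompson's original implementation; in particular, taking the regex in postfix notation (with '.' standing for concatenation) is a shortcut I chose to avoid writing a parser. The regex is compiled into an NFA with epsilon transitions, which is then simulated by tracking the set of reachable states.

```python
# Simplified Thompson-style construction: postfix regex -> NFA -> simulation.
# Illustrative sketch only; operators supported: literal symbols, '.' (concat),
# '|' (alternation), '*' (Kleene star).

class State:
    def __init__(self):
        self.edges = {}   # symbol -> list of successor states
        self.eps = []     # epsilon (empty) transitions

def postfix_to_nfa(postfix):
    """Compile a postfix regex into a (start, accept) pair of NFA states."""
    stack = []
    for token in postfix:
        if token == '.':                          # concatenation
            start2, accept2 = stack.pop()
            start1, accept1 = stack.pop()
            accept1.eps.append(start2)
            stack.append((start1, accept2))
        elif token == '|':                        # alternation
            start2, accept2 = stack.pop()
            start1, accept1 = stack.pop()
            start, accept = State(), State()
            start.eps += [start1, start2]
            accept1.eps.append(accept)
            accept2.eps.append(accept)
            stack.append((start, accept))
        elif token == '*':                        # Kleene star
            start1, accept1 = stack.pop()
            start, accept = State(), State()
            start.eps += [start1, accept]
            accept1.eps += [start1, accept]
            stack.append((start, accept))
        else:                                     # literal input symbol
            start, accept = State(), State()
            start.edges.setdefault(token, []).append(accept)
            stack.append((start, accept))
    return stack.pop()

def eps_closure(states):
    """All states reachable from `states` via epsilon transitions."""
    seen, todo = set(states), list(states)
    while todo:
        for nxt in todo.pop().eps:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def matches(postfix, text):
    """Whole-string match of `text` against the postfix regex."""
    start, accept = postfix_to_nfa(postfix)
    current = eps_closure({start})
    for ch in text:
        step = {t for s in current for t in s.edges.get(ch, [])}
        current = eps_closure(step)
    return accept in current

# "a*a." is the postfix form of a*a; "ab|c." is the postfix form of (a|b)c
print(matches("a*a.", "aaa"))   # True
print(matches("ab|c.", "bc"))   # True
print(matches("ab|c.", "cc"))   # False
```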

The history of the concept of the regular expression offers a good opportunity to see how objects of human thought evolve as they travel from one area of human activity to another. These areas can be very distant from each other, and the exchange of concepts between them often looks unexpected. Interestingly, Kleene's deep work, in which regular expressions were introduced, did not make the concept widely known; it was an essentially simple, utilitarian application that raised regular expressions to the peak of their fame.

Source: https://habr.com/ru/post/169765/

