📜 ⬆️ ⬇️

Regular expressions from the inside

Regular expressions (RV) is a very convenient form of writing so-called regular or automaton languages. Therefore, RVs are used as an input language in many chain processing systems. Consider the examples of such systems:



In this article, we first take a look at finite automata and their types (DFA and NFA), and then consider an example of constructing a minimal DFA using a regular expression.

State machines


A finite state machine (AC) is a converter that allows you to match the input to the corresponding output, and this output may depend not only on the current input, but also on what happened before, on the history of the state machine operation. Even human behavior, and not just artificial systems, can be described using spacecraft. For example, your reaction to the fact that your neighbor is listening to loud music at night, will be one after the first such case, and completely different after several such cases. There can be an infinite number of such prehistories, the question arises: what kind of memory does a spacecraft have to have in order to behave differently for each predistroy? It is clear that it is impossible to store an infinite number of prehistories. Therefore, the automaton breaks all possible prehistories into equivalence classes. Two stories are equivalent if they equally influence the behavior of the automaton in the future. The equivalence class to which the automaton assigned its current prehistory is also called the internal state of the automaton.
')
Consider the example of a primitive spacecraft:



This spacecraft consists of:



The reader can move in one direction, usually from left to right, thereby reading the characters in the input chain. For each such movement it can count one symbol. Further, the read character is transferred to the control block. The control unit changes the state of the machine based on the transition rules. If the list of transition rules does not contain rules for the character read, the automaton “dies”.

Now consider how you can set the spacecraft. They can be specified as graphs or as control tables. In the form of a graph, the spacecraft is defined as follows:



In the form of a control table, so:



An example of a spacecraft in the form of a graph and in the form of a control table will be presented below.

DCA and NCA



The main difference between DFA and NKA is that in the process of operation, DFA can only be in one state, and the NKA in several states at the same time. As an example of the work of the NCA, one can cite the idea of ​​the American physicist Hugh Everett that any event breaks the world into several worlds, in each of which this event ended differently. For example, in one world, Hitler won the Second World War, in the other, Newton instead of physics went into business and the discovery of the laws of classical mechanics had to be put off for 50 years. After the entire input chain has been read, we assume that the NCA allows this chain if it has completed the work in a permissible state in at least one of the many “worlds”. Accordingly, the automaton rejects the chain if it has completed the work in the non-admissible state in each "world". DKA accepts a chain, which is obvious if, after reading the entire input chain, it is in a tolerable state.

In most cases, building an NCA is much easier than DCA. But, despite this, using the NCA for modeling is not a good idea. Fortunately, for each NCA you can build a DFA that allows the same input language. In this article, we will not give an algorithm for constructing a DFA on the NCA, but consider this algorithm based on a visual example below.

Constructing the minimum DFA by regular expression



To begin with, we give a list of PB operations used in this article, in order of their priority:



Consider an example, given a regular expression:

xy * (x | y *) | ab (x | y *) | (x | a *) (x | y *)

It is necessary to construct a minimal DFA using a regular expression and demonstrate the recognition of a correct and incorrect chain.

To begin with, we simplify this RT using the right-hand distributive law of concatenation with respect to a union, we get the following RT:

(xy * | ab | (x | a *)) (x | y *)

Now we are building an automaton for this PB:



According to the rule of concatenation conversion (we will not give the rules for converting RV to KA, since they are fairly obvious), we get the following automaton:



According to the union conversion rule:



According to the concatenation rule:



And at the end we apply the closure transformation rule and get ε . It should be noted here that εNKA is an NKA that contains ε transitions. In turn, the ε-transition is a transition in which the automaton does not take into account the input chain or in other words, the transition along an empty symbol.



We get rid of ε transitions (the "asterisk" indicates the final states):





In this NCA, the states s3 and s5 are equivalent, since δ (s3, x) = δ (s5, x) = s1 and δ (s3, y) = δ (s5, y) = s5, s7. Rename the states s6 -> s5 and s7 -> s6:



We build a DCA on NCA:





In this DCA, the states p1 and p5 are equivalent, since
δ (p1, x) = δ (p5, x) = p4 and δ (p1, y) = δ (p5, y) = p5. Rename the states p6 -> p5 and p7 -> p6:



This machine is the minimum DKA.

Let δ be the transition function, then the extended transition function constructed from δ is denoted by δ ', and ω is the input chain.

If we admit the input chain ω = aaax, we expect that the automaton will be in one of the admitting states.

δ '(p0, ε) = p0
δ '(p0, a) = δ (δ' (p0, ε), a) = δ (p0, a) = p3
δ '(p0, aa) = δ (δ' (p0, a), a) = δ (p3, a) = p5
δ '(p0, aaa) = δ (δ' (p0, aa), a) = δ (p5, a) = p5
δ '(p0, aaax) = δ (δ' (p0, aaa), x) = δ (p5, x) = p4

p4 is a valid end state, so the aaax chain is correct for this automaton.

Now suppose that ω = xyyb:

δ '(p0, ε) = p0
δ '(p0, x) = δ (δ' (p0, ε), x) = δ (p0, x) = p1
δ '(p0, xy) = δ (δ' (p0, x), y) = δ (p1, y) = p1
δ '(p0, xyy) = δ (δ' (p0, xy), y) = δ (p1, y) = p1
δ '(p0, xyyb) = δ (δ' (p0, xyy), b) = δ (p1, b) =

Here we see that if the b symbol is input to the automaton when it is in the p1 state, then this automaton will die, hence the xyyb chain is incorrect.

PS In this article, the algorithm for constructing a DCA on RT was considered, but there are more convenient algorithms, in particular for programming, but this is a topic for another article ...

Source: https://habr.com/ru/post/166777/


All Articles