Regular expressions (RV) is a very convenient form of writing so-called regular or automaton languages. Therefore, RVs are used as an input language in many chain processing systems. Consider the examples of such systems:
- Unix operating system grep command or similar commands for searching for chains that can be found in web browsers or text formatting systems. In such systems, PBs are used to describe patterns that the user searches for in a file. Different search engines convert RTs to either a deterministic finite state machine (DFA) or a non-deterministic finite state machine (NKA) and apply this automaton to the file that is being searched.
- Generators lexical analyzers. Lexical analyzers are a component of the compiler, they break the source program into logical units (lexemes), which can consist of one or several characters and have a certain meaning. The generator of lexical analyzers receives formal descriptions of tokens, which are essentially RT, and creates a DFA that recognizes which of the tokens appears at its input.
- RV in programming languages.
In this article, we first take a look at finite automata and their types (DFA and NFA), and then consider an example of constructing a minimal DFA using a regular expression.
State machines
A finite state machine (AC) is a converter that allows you to match the input to the corresponding output, and this output may depend not only on the current input, but also on what happened before, on the history of the state machine operation. Even human behavior, and not just artificial systems, can be described using spacecraft. For example, your reaction to the fact that your neighbor is listening to loud music at night, will be one after the first such case, and completely different after several such cases. There can be an infinite number of such prehistories, the question arises: what kind of memory does a spacecraft have to have in order to behave differently for each predistroy? It is clear that it is impossible to store an infinite number of prehistories. Therefore, the automaton breaks all possible prehistories into equivalence classes. Two stories are equivalent if they equally influence the behavior of the automaton in the future. The equivalence class to which the automaton assigned its current prehistory is also called the internal state of the automaton.
')
Consider the example of a primitive spacecraft:

This spacecraft consists of:
- tape represented by the input chain.
- reading device.
- a control unit that contains a list of transition rules.
The reader can move in one direction, usually from left to right, thereby reading the characters in the input chain. For each such movement it can count one symbol. Further, the read character is transferred to the control block. The control unit changes the state of the machine based on the transition rules. If the list of transition rules does not contain rules for the character read, the automaton “dies”.
Now consider how you can set the spacecraft. They can be specified as graphs or as control tables. In the form of a graph, the spacecraft is defined as follows:
- vertices of the graph correspond to the states of spacecraft.
- directed edges correspond to transition functions (next to each such edge is indicated the symbol on which the transition is performed).
- a vertex with an edge that does not go out of more than one state corresponds to the initial state.
- the final states of the spacecraft are marked with a bold contour.
In the form of a control table, so:
- The states of the spacecraft are located in the rows of the table.
- characters recognized language - in columns.
- at the intersection indicates the state in which you can get from this state for this symbol.
An example of a spacecraft in the form of a graph and in the form of a control table will be presented below.
DCA and NCA
The main difference between DFA and NKA is that in the process of operation, DFA can only be in one state, and the NKA in several states at the same time. As an example of the work of the NCA, one can cite the idea of the American physicist
Hugh Everett that any event breaks the world into several worlds, in each of which this event ended differently. For example, in one world, Hitler won the Second World War, in the other, Newton instead of physics went into business and the discovery of the laws of classical mechanics had to be put off for 50 years. After the entire input chain has been read, we assume that the NCA allows this chain if it has completed the work in a permissible state in at least one of the many “worlds”. Accordingly, the automaton rejects the chain if it has completed the work in the non-admissible state in each "world". DKA accepts a chain, which is obvious if, after reading the entire input chain, it is in a tolerable state.
In most cases, building an NCA is much easier than DCA. But, despite this, using the NCA for modeling is not a good idea. Fortunately, for each NCA you can build a DFA that allows the same input language. In this article, we will not give an algorithm for constructing a DFA on the NCA, but consider this algorithm based on a visual example below.
Constructing the minimum DFA by regular expression
To begin with, we give a list of PB operations used in this article, in order of their priority:
- iteration (wedge closure), using the "*" symbol
- concatenation is specified with a space or an empty string (for example: ab)
- union using the "|"
Consider an example, given a regular expression:
xy * (x | y *)
| ab (x | y *)
| (x | a *) (x | y *)
It is necessary to construct a minimal DFA using a regular expression and demonstrate the recognition of a correct and incorrect chain.
To begin with, we simplify this RT using the right-hand distributive law of concatenation with respect to a union, we get the following RT:
(xy * | ab | (x | a *)) (x | y *)
Now we are building an automaton for this PB:

According to the rule of concatenation conversion (we will not give the rules for converting RV to KA, since they are fairly obvious), we get the following automaton:

According to the union conversion rule:

According to the concatenation rule:

And at the end we apply the closure transformation rule and get ε . It should be noted here that εNKA is an NKA that contains ε transitions. In turn, the ε-transition is a transition in which the automaton does not take into account the input chain or in other words, the transition along an empty symbol.

We get rid of ε transitions (the "asterisk" indicates the final states):


In this NCA, the states s3 and s5 are equivalent, since δ (s3, x) = δ (s5, x) = s1 and δ (s3, y) = δ (s5, y) = s5, s7. Rename the states s6 -> s5 and s7 -> s6:

We build a DCA on NCA:


In this DCA, the states p1 and p5 are equivalent, since
δ (p1, x) = δ (p5, x) = p4 and δ (p1, y) = δ (p5, y) = p5. Rename the states p6 -> p5 and p7 -> p6:

This machine is the minimum DKA.
Let δ be the transition function, then the extended transition function constructed from δ is denoted by δ ', and ω is the input chain.
If we admit the input chain ω = aaax, we expect that the automaton will be in one of the admitting states.
δ '(p0, ε) = p0
δ '(p0, a) = δ (δ' (p0, ε), a) = δ (p0, a) = p3
δ '(p0, aa) = δ (δ' (p0, a), a) = δ (p3, a) = p5
δ '(p0, aaa) = δ (δ' (p0, aa), a) = δ (p5, a) = p5
δ '(p0, aaax) = δ (δ' (p0, aaa), x) = δ (p5, x) = p4
p4 is a valid end state, so the aaax chain is correct for this automaton.
Now suppose that ω = xyyb:
δ '(p0, ε) = p0
δ '(p0, x) = δ (δ' (p0, ε), x) = δ (p0, x) = p1
δ '(p0, xy) = δ (δ' (p0, x), y) = δ (p1, y) = p1
δ '(p0, xyy) = δ (δ' (p0, xy), y) = δ (p1, y) = p1
δ '(p0, xyyb) = δ (δ' (p0, xyy), b) = δ (p1, b) =
Here we see that if the b symbol is input to the automaton when it is in the p1 state, then this automaton will die, hence the xyyb chain is incorrect.
PS In this article, the algorithm for constructing a DCA on RT was considered, but there are more convenient algorithms, in particular for programming, but this is a topic for another article ...