📜 ⬆️ ⬇️

MySQL Turing Machine Emulator

Recently, at one of the interviews I was asked a puzzle to parse a string using only MySQL tools.
After that, I wondered: in general, how much complex tasks of this kind can be solved with the help of only one DBMS? The answer came quickly: using MySQL, you can solve any problem of string recognition at all. And in order to make it more convenient and versatile, it is enough to write a primitive state machine emulator, and even better - Turing machines, of course, using only the constructions kindly provided by MySQL. So let's start the experiment.

We design

Any program starts with a project. So it will be this time. First of all, what is a Turing machine, what does it do, what can it do? She knows how, frankly, a little. With the endless belt and control device (carriage), the machine can:
  1. Move the tape left and right
  2. Read symbol from tape
  3. Tape character
  4. Move to different states

The instruction for the Turing machine, translated into Russian, sounds like this: “Being in state 1 and reading the“ a ”symbol, move to the right and go to state 2”.
Of course, thus giving instructions to a Turing machine is not very convenient, therefore, we formalize our instructions as follows:
'>' - move right
'<' - move left
'#' - stop
Here is an example set of instructions for the machine:
0,1,>, 1
1.0,>, 2
2.0, 3
3,>, 4
4, 1, 4
4.1, #, 4

This rather useless program checks whether the binary number written on the tape is 4, and if it is, it writes the number 1, separated by a space.
This machine implies some assumptions.
First, the initial position of the control device may well be to the left, and not to the right of the original data.
Secondly, the tape is "infinite" only in one direction.
Thirdly, as you may have noticed, the above Turing machine after reading a symbol can do only one action: either write a symbol or move along the tape. We will design the emulator with these assumptions.
')
The whole project will consist of the following parts:

  1. A table designed to simulate a single-field tape.
  2. Error table, also with one field
  3. A table for the instructions themselves, from 4 fields (initial state, readable symbol, action, resulting state).
  4. A stored recursive procedure containing the logic of the emulator.


What is the error output table for? Well, it's still some kind, but an interpreter of a primitive language. Therefore, information about why the program has stumbled would not be superfluous.
So, how will the emulator work? You insert the text of the program into the intended for it table according to the rule one instruction - one line, insert the initial data into the table-tape and start the procedure. Depending on what you write in your program, it will end up on a tape. And if there is something completely unexpected, you can look at the error table.
Well, it's time to get down to the coding part.

We develop

To begin, create a database and a table.

CREATE DATABASE turing; CREATE TABLE program ( in_state INT NOT NULL , sread VARCHAR( 1 ) NOT NULL , actions VARCHAR( 1 ) NOT NULL , out_state INT NOT NULL ) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_unicode_ci; CREATE TABLE output_string ( output TEXT NOT NULL ) ENGINE = MYISAM ; CREATE TABLE ribbon ( sinput TEXT NOT NULL ) ENGINE = MYISAM ; 


In the table for outputting output_string errors, you need to create one empty line and do not touch it anymore.

Now we proceed to the most important thing - the creation of a procedure. But before that, you need to allow recursion in the MySQL config, since we will need it very much.
To do this, write max_sp_recursion_depth = 255 in my.cnf.

I will first give the procedure code in its entirety, and below will give a decoding so as not to spoil the view with comments.

 delimiter // CREATE procedure turing(IN sstate INT(11), IN pos INT(11)) BEGIN turing:BEGIN SET @p=pos; SET @sstate=sstate; SELECT @inread:=SUBSTRING(sinput, @p, 1) FROM ribbon; SELECT @instate:=in_state, @sread:=sread, @actions:=actions, @out_state:=out_state, @numrows:=count(in_state) FROM program WHERE in_state = @sstate AND sread = @inread; IF @numrows > 1 THEN UPDATE output_string SET output= 'Confused :('; LEAVE turing; ELSEIF @numrows = 0 THEN UPDATE output_string SET output='Do not understand what to do next :('; LEAVE turing; ELSE SELECT @lin:= LENGTH(sinput) FROM ribbon; SELECT @input:=sinput FROM ribbon; IF @actions = '>' THEN IF @lin = pos THEN SELECT @new_input:=CONCAT(sinput, ' ') FROM ribbon; UPDATE ribbon SET sinput=@new_input; END IF; SET @pos=pos+1; ELSEIF @actions = '<' THEN IF pos>1 THEN SET @pos=pos-1; ELSE UPDATE output_string SET output='Carriage has left the ribbon'; LEAVE turing; END IF; ELSEIF @actions = '#' THEN LEAVE turing; ELSE SELECT @head:=SUBSTRING(sinput, 1, pos-1) FROM ribbon; SELECT @tail:=SUBSTRING(sinput, pos+1, @lin) FROM ribbon; SELECT @inp:=CONCAT(@head, @actions, @tail); UPDATE ribbon SET sinput=@inp; SET @pos=pos; END IF; CALL turing(@out_state, @pos); END IF; END; END // 

The beginning is clear, we create a procedure and pass two parameters to it - the state of the state machine and the position of the control device pos.

Next, we mark our BEGIN ... END block with a turing label so that we can use it to exit.

After that, we read into the @inread variable the symbol on which the carriage is currently located.

From the program table, we select rows in which the state is equal to that passed in the parameter, and the sread field — the readable symbol is equal to the actual symbol on which our control device is located. These two parameters uniquely define the instruction for the Turing machine, so the line that satisfies the condition must be unique.

We handle the situation when there are no relevant instructions - the emulator completes its work with an error - it just does not know what to do next.

We process a situation when there are more than one instructions satisfying the condition. In this case, the machine also does not know what to do next, and the emulator shuts down with an error, stating that it is confused.

If everything is in order, that is, the number of lines is 1, then we consider the length of the input line and begin to process actions. There are also several nuances. First, we must remember that the tape must be infinite in one direction, and the length of the input data is not infinite. Therefore, if the control device is at the end of the input line, and the program instructs to move to the right, add a few spaces and execute the instruction, increasing the position by 1.

In the case of a movement to the left, we handle the situation when the carriage extends beyond the tape. There's nothing you can do. Sorry, the tape is "endless" in one direction only.

When the program sends a “stop” signal, denoted by the # symbol, we exit the procedure.

If the character does not match any of the above, then it is not>, <or #, it means that you need to write it to tape, wiping the current one. We turn this trick with the help of the simple built-in function CONCAT ().

Further we continue in the same spirit, transferring already modified values ​​of parameters. @out_state in this case is the output state, which is the input for the next instruction.

We are testing

So, we come to the final stage - testing. I wanted to do something significant on this emulator to prove that MySQL is really all-powerful. Therefore, we will solve a rather non-trivial, albeit a common task - to validate Roman numbers.
The program will work quite simply - at the input to accept a string of characters and, if this string is a correct Roman number, type a space on the tape “Ok”, separated by a space.
Despite the apparent complexity, the task is simple. It is necessary for each character used to write a Roman number to determine which ones cannot follow it. For example, L cannot follow C, that is, the entry LC cannot be considered a Roman number. The number 50 is written simply as L. Or, for example, the maximum number of consecutive I - 3, therefore IIII is not a Roman number, etc. I will not dwell on these rules in detail, for this is the topic of a separate article.
Our input alphabet consists of the following symbols: M, D, C, L, X, V, and I. I decided to call the initial state number 47, because I like the number 47 and also to demonstrate that any state can be used and it is not necessary to start from 0 (at least in this emulator the zero state is not necessary).
Paste our program into the database:

insert into program values
(47, '', '#', 47), (48, '', '>', 64), (49, '', '>', 64),
(50, '', '>', 64), (51, '', '>', 64), (52, '', '>', 64),
(53, '', '>', 64), (54, '', '>', 64), (55, '', '>', 64),
(56, '', '>', 64), (57, '', '>', 64), (58, '', '>', 64),
(59, '', '>', 64), (60, '', '>', 64), (61, '', '>', 64),
(62, '', '>', 64), (63, '', '>', 64), (65, '', '>', 64),
(66, '', '>', 64), (64, '', 'O', 64), (64, 'O', '>', 70),
(70, '', 'k', 70), (70, 'k', '>', 71),

(47, 'I', '>', 48), (48, 'V', '>', 51), (48, 'X', '>', 51),
(51, 'V', '#', 51), (51, 'C', '#', 51), (51, 'L', '#', 51),
(51, 'D', '#', 51), (51, 'M', '#', 51), (51, 'X', '#', 51),

(48, 'C', '#', 48), (48, 'L', '#', 48), (48, 'D', '#', 48),
(48, 'M', '#', 48), (48, 'I', '>', 49), (49, 'I', '>', 50),
(50, 'I', '#', 50), (50, 'V', '#', 50), (50, 'C', '#', 50),
(50, 'L', '#', 50), (50, 'D', '#', 50), (50, 'M', '#', 50),
(50, 'X', '#', 50),

(47, 'V', '>', 52), (52, 'I', '>', 48), (52, 'V', '#', 48),
(52, 'X', '#', 48), (52, 'C', '#', 48), (52, 'M', '#', 48),
(52, 'D', '#', 48), (52, 'L', '#', 48), (47, 'X', '>', 53),
(53, 'V', '>', 52), (53, 'I', '>', 48), (53, 'D', '#', 53),

(53, 'M', '#', 53), (53, 'L', '>', 53), (53, 'C', '>', 53),
(53, 'X', '>', 55), (55, 'D', '#', 55), (55, 'M', '#', 55),
(55, 'C', '#', 55), (55, 'L', '#', 55), (55, 'V', '>', 52),
(55, 'I', '>', 48), (55, 'X', '>', 56), (56, 'X', '#', 56),
(56, 'D', '#', 56), (61, 'C', '>', 62), (56, 'M', '#', 56),
(56, 'C', '#', 56), (56, 'L', '#', 56), (56, 'V', '>', 52),
(56, 'I', '>', 48),

(47, 'L', '>', 54), (54, 'X', '>', 53), (54, 'V', '>', 52),
(54, 'I', '>', 48), (54, 'D', '#', 54), (54, 'C', '#', 54),
(54, 'M', '#', 54), (54, 'L', '#', 54),

(47, 'C', '>', 59), (59, 'X', '>', 53), (59, 'I', '>', 48),
(59, 'L', '>', 54), (59, 'V', '>', 52), (59, 'M', '>', 65),
(65, 'M', '#', 65), (65, 'V', '>', 59), (65, 'X', '>', 59),
(65, 'I', '>', 59), (65, 'L', '>', 59), (65, 'C', '#', 65),

(59, 'D', '>', 66), (66, 'D', '#', 66), (66, 'M', '#', 65),
(66, 'V', '>', 59), (66, 'X', '>', 59), (66, 'I', '>', 59),
(66, 'L', '>', 59), (66, 'C', '#', 65), (59, 'C', '>', 61),
(61, 'C', '>', 62), (61, 'I', '>', 47), (62, 'X', '>', 53),
(62, 'I', '>', 48), (62, 'L', '>', 54), (62, 'V', '>', 52),
(62, 'C', '#', 62), (61, 'M', '#', 59), (61, 'D', '#', 59),
(62, 'M', '#', 62), (62, 'D', '#', 62),

(47, 'D', '>', 60), (60, 'M', '#', 60), (60, 'C', '>', 59),
(60, 'D', '#', 60), (60, 'X', '>', 53), (60, 'I', '>', 48),
(60, 'L', '>', 54), (60, 'V', '>', 52),

(47, 'M', '>', 63), (63, 'M', '>', 63), (63, 'C', '>', 59),
(63, 'D', '>', 60), (63, 'X', '>', 53), (63, 'I', '>', 48),
(63, 'V', '>', 52), (63, 'L', '>', 54)

The first block of the program is responsible for the final stage - here we get, when the number has successfully passed the validation, and the carriage has reached a space. From this point on we start typing “Ok”.

Subsequent blocks are instructions from different combinations of machine states and letters of the input alphabet. Most pieces of code are used many times. For example, when we wrote all the rules for symbol I, and we write for symbol V, then we can write (47, 'V', '>', 52), (52, 'I', '>', 48), where state 48 is already described by us earlier.

And what is the result? Insert as an input string, say CCCXXIV (correct number). Now we type in the console:

mysql> call turing (47,1) //

47 - the initial state, 1 - the position of the control device.

We get:

+ -------------------------------------- +

| CCCXXIV Ok |

+ -------------------------------------- +

Now let's try an incorrect number, say, MXXLC.
We get:

+ ---------------- +

| MXXLC |

+ ---------------- +

Everything!

Summing up

This embodiment of the emulator is not ideal and can be continuously refined and improved. This is just a prototype, demonstrating far from obvious possibilities of using MySQL. Well, of course, this implementation was created solely for the sake of sports interest and fun pastime. Hope you also enjoyed reading this article.

More to you abnormal ideas and successful programming!

Source: https://habr.com/ru/post/109611/


All Articles