The algorithm continuously performs the following steps:
Waiting until a new tetrimino is created.
Checks the type of the newly created Tetrimino, the type of the next Tetrimino (a figure in the preview field) and the contents of the playing field.
Investigates all possible ways to add two tetrimino to the playing field and evaluates each probability.
Moves the newly created Tetrimino to match the location of the best detected probability.
Each of these steps is described in detail below. ')
Lock search
Consider a simplified version of Tetris, in which the figures do not fall automatically. The only way to bring the figure down is a soft descent. By removing the timings from the game, we can fully describe the state of the active Tetrimino by its position and orientation. The figure has a known place of initial creation, and the following operations are used to convert from one state to another:
Move one step down
Move one step left
Move one step to the right
Turn one step counterclockwise
Rotate one step clockwise
These operations are applicable only when the squares of the resulting tetrimino correspond to empty cells of the playing field. When it is impossible to move one step down, the state is considered to be blocked. However, since we have simplified Tetris and the blocking wait is essentially infinite, the locked state can be further transformed by other operations, performing slips and scrolling.
A set of blocked states with a minimal sequence of operations creating them can be found using a wide search (breadth-first search, BFS). As mentioned below, it uses a queue to store intermediate results.
Queued state at creation.
We deduce a status from queue.
We receive the subsequent states, applying operations of transformation.
If there is no downward movement among them, the dequeued state is disabled.
We bring in the next state that we have not visited yet.
If the queue is not empty, repeat from step 2.
The program presents each state as an object with the following fields:
{ x, y, rotation, visited, predecessor }
During the preparation process, the program creates a three-dimensional array of state objects (20 rows × 10 columns × 4 turns), initializing x , y and rotation accordingly.
The visited field is marked when the status is enqueued. In BFS, this is permissible, because each subsequent state increases the total path length by 1. That is, by increasing the path length, it is impossible to create a subsequent state that needs to be inserted somewhere other than the end of the queue to maintain order.
The predecessor field points to the state object from which the current state originates. It is set when the status is queued. The state of creation has no preceding states.
The set of blocked states detected during the search is determined by the type of tetrimino and filled blocks on the playing field. The sequence of the moves that generated them can be clarified (in the reverse order) by following the predecessor links to the creation state. When the PLAY_FAST constant is PLAY_FAST to PLAY_FAST at the beginning of the program, it completely skips the preceding states by directly putting tetrimino on the field and blocking it.
A three-dimensional array of state objects, a queue, and a BFS are packed into a class. It has a search method that receives a playing field (two-dimensional array), a type of tetrimino and a listener. Each time a blocked state is detected, the playing field is updated by adding tetrimino to the appropriate place. Then, the modified playing field along with information about the changes is transmitted to the listener for processing. After the listener completes a return, the playing field is restored.
The listener is used to combine several search operations into a chain, which makes it possible to find all possible ways to add two (or more) tetrimino to the playing field. The first search engine in the chain performs BFS only once. However, the second search engine performs BFS every time the first search detects a blocked state. And so on, if there are other search engines in the chain.
The listener of the last search engine evaluates the changed playing field. When he finds the playing field better than what was previously investigated, he writes down the used object of the locked state, which at that time uses the first search engine in the chain. Since the first search engine performs BFS only once, the predecessor fields of its state objects remain valid until the entire search process is completed. That is, the last listener essentially writes down the path that the first tetrimino should take in order to come to the best configuration of the playing field as a result.
Evaluation function
The estimated function assigns a value to a modified game field — a weighted sum of the various influencing parameters. The evaluation function used in this case is based on the function developed by Islam Al-Ashi . It uses the following parameters:
Total number of cleaned rows : this is the total number of rows that will be cleared due to the addition of two tetrimino.
Total height of the block : the height of the block is the height above the floor of the playing field on which the figure is blocked. This is the vertical distance to which a blocked figure would fall, if you remove all other occupied squares of the playing field and preserve the orientation of the figure. The total blocking height is the sum of the blocking heights of two tetriminos.
Total number of “well” cells : a cell-well is an empty cell located above all occupied cells in a column so that its left and right neighbors are occupied cells; in determining wells, the walls of the playing field are considered occupied cells. The idea is that the well is a structure that is open at the top, closed at the bottom and surrounded by walls on both sides. The likelihood of intermittent gaps in the walls of a well means that cell wells do not necessarily occur in a continuous heap within a column.
The total number of holes in the columns : the hole in the column is an empty cell located directly below the occupied cell. The floor of the playing field is not compared with the cell above it. There are no holes in the empty columns.
Total number of transitions in columns : a transition in columns is an empty cell adjacent to the occupied cell (or vice versa) within a single column. The combination of the topmost occupied column block with the empty space above it is not considered a transition. Similarly, the floor of the playing field is also not compared with the cell above it. Therefore, there is no transition in a completely empty column.
Total number of transitions in the rows : the transition in the rows is an empty cell adjacent to the occupied cell (or vice versa) within one row. Empty slots near the walls of the playing field are considered transitions. The total number is calculated for all lines of the playing field. However, completely empty lines are not counted in the total number of transitions.
Al-Ashi suggested that useful weights can be found using the particle swarm optimization (PSO) algorithm, which iteratively improves the set of solutions by imitating the swarm behavior observed in nature. In our case, each solution is a vector of weights, and the fitness of a variant is determined by playing Tetris; this is the total amount of tetrimino during which he survived to the end of the game.
These ideas are applied in the Java version described below; it runs outside of FCEUX and can be configured for a non-graphical, memory-based game that runs at a much higher speed. After preparing the PSO, I was surprised to see that the algorithm does not move further after the initial iteration. After this iteration, several randomly generated solutions have already played quite well. For several days, the size of this set decreased until there was only one option left. Here are the values for this solution:
Parameter
Weight
Total number of cleaned rows
1.000000000000000
Total lock height
12.885008263218383
Total number of well cells
15.842707182438396
The total number of holes in the columns
26.894496507795950
Total number of transitions in columns
27.616914062397015
Total number of transitions in lines
30.185110719279040
The playing field was estimated by multiplying the parameters by their respective weights and adding the results. The lower the value, the better the solution. Since all parameters and weights have positive values, all parameters harm the overall assessment; each must be minimized. It also means that the best score is 0.
Since these weights were chosen randomly, the range of suitable values can be very wide. This particular set of numbers and the perceived relative importance of each parameter may not be significant. Nevertheless, it will be interesting to watch them carefully.
The least damaging parameter is the total number of rows cleared. The very fact that this parameter is harmful is counterintuitive. But the main goal of AI is survival. He does not aim for the most points. Instead, he plays conservatively, usually clearing the rows one by one. To get a Double, Triple or Tetris will have to grow a bunch, which goes against the long-term goal.
Next in the list is the total height of the lock. It can be minimized by dropping tetrimino as close to the floor as possible. This is a simple strategy that contributes to survival in the long run and, in the short run, to quality packaging of the pieces.
The weight assigned to the total number of well cells seems a little surprising, because experienced players usually deliberately build deep wells to collect several Tetris (combinations of four lines) in a row. But as mentioned above, this is a risky game, contrary to the main goal - survival. In addition, the number of wells is an indicator of "bumps" heap. A certain level of unevenness is beneficial when placing certain shapes or combinations of shapes. But high unevenness causes damage to the tight package.
The total number of holes in the columns is approximately half of the total number of transitions in the columns. These parameters can be combined and collapsed into a common related parameter, resulting in a more extensive and most harmful parameter: the total number of transitions.
The dense packing areas have a small number of transitions in all directions. Therefore, the basic strategy that drives artificial intelligence, can be briefly described as follows: pack the figures as close as possible to each other.
Other options
Here is a list of a few more parameters that I experimented with during the development of AI:
Heap height : occupied blocks can hang over empty cells, creating protrusions and holes; however, it is impossible to fix occupied blocks above completely empty lines. Therefore, the height of the heap is the number of rows that contain at least one occupied block.
Total number of columns occupied : this is the number of columns that contain at least one occupied cell.
Total number of occupied cells : the number of occupied cells on the playing field.
Total Number of Connected Areas : Here, the fill algorithm is used to count the number of continuously connected areas. In addition to finding busy "islands", he discovers holes that stretch along both axes.
Variance of column heights : this is a statistical measure of the magnitude of the scatter of column heights. It is an indicator of surface roughness.
Total Adaption Value : Calculates the heap adaptation value for the next unknown figure. It counts the total number of ways in which 7 types of pieces can be added to the playing field without the appearance of new holes. For accurate counting, multiple BFS applications will be required. But for an approximate count, the search tree can be very truncated.
Average score of the next figure : this parameter deepens the search by analyzing all the possibilities for the next unknown figure. It uses other parameters for a separate arrangement of each type of figures, and then returns the average for 7 estimates. BFS is required for each shape placement.
Averaged simulated game : the parameter simulates a series of games in Tetris, selecting figures using its own pseudo-random number generator and using AI to work with them. At the end of each game, the playing field is evaluated using other parameters. Returns the average for all batches.
All parameters can be customized by adding custom factors. For example, instead of simply counting the cleaned rows, you can assign your own weights for Single, Double, Triple and Tetris, by simulating a points system. If the simultaneous cleaning of several rows harms the long-term goal of survival, then a single weight can be assigned a negative weight, while others can get a positive one.
Another useful factor is the bias value. For example, a completely flat surface of a heap has a dispersion of the heights of columns 0. But a perfectly flat surface does not adapt to S and Z, as well as other combinations of figures. Therefore, by subtracting a constant, the variance needs to be centered around the optimal irregularity.
The adjusted and shifted parameters can be raised to any power so that they can be logarithmically or exponentially scaled before calculating the weighted sum. All these probabilities can be considered as additional weights that can potentially be optimized using methods like PSO.
Many of the parameters give an understanding of how well the heap can handle additional figures, for example, those dealing with surface roughness, but “Total Adaptation”, “Average score of the next figure” and “Average simulated game” estimate the modified playing field insertion of figures not included in the two known. In the study of subsequent figures due to the rapid elimination of rows, the amount of additional knowledge gained decreases with depth. This means that the past for a long time the party is not so important, and the course of the party in the distant future is also not very important. In fact, if the short sequence of pieces is randomly set incorrectly, the AI quickly restores the course of the game, using the following few pieces to clear the affected rows. The determination of the optimal value of the analysis of subsequent figures requires further research.
Another aspect of parameter utility is computational overhead. Costs are greatly increased because the evaluation function is called for each possible placement of two figures. Since AI must be able to play Tetris in real time, costly factors providing valuable information can be exchanged for more approximate techniques that run faster.
AI training
There are pathological sequences that can lead to Game Over, regardless of strategy. The simplest example is an infinite sequence of T and S S and Z, which, as shown in the animation, quickly causes the AI to lose.
Since it takes days to run the AI variant until several batches are completed and the average is calculated, it is completely impractical to use the average batch duration as the PSO control metric. Instead, it is possible with controlled speed to increase the difficulty of the game, increasing the frequency of S and Z, which will eventually lead to the alternate creation of only this pair of figures.
I tried using this teaching method, but found that teaching AI to work with frequent S and Z actually harms the ability to cope with evenly distributed random figures.
In an alternative method, which I was inspired by the B-Type game, the PSO control metric is the frequency of cleaning rows. The game field is a scheme of 10 lines of random garbage blocks, and each time a line is cleaned, a new line of garbage appears at the bottom, restoring the height of the heap. Since the playing field is 10 columns wide, and each tetrimino consists of 4 squares, on average, the AI should clear a row every 2.5 tetrimino. And to get rid of garbage, he must do it even faster.
Unfortunately, this technique also did not improve performance. One of the probable reasons is that random holes in the garbage do not exactly correspond to the lines that the AI deals with in this gameplay. In addition, row cleaning is a short-term goal; greedy cleaning of the ranks will not necessarily improve long-term survival. From time to time, rows should not be touched to handle certain combinations of subsequent figures.
On his web page, Colin Fay suggested a different approach. He created a histogram showing the percentage of pieces blocked in each row during trial lots. Interestingly, all histograms look almost identical regardless of the number of created figures. Based on this, he suggested that it is possible to use an approximate image of a function for any trial game when evaluating the statistical waiting for a blockage in the creation line, thus obtaining the time during which the AI will play until the end of the game. I decided to explore this opportunity.
The following is a heat map of the set of test lots, in the amount containing 2,039,900,000 tetrimino. Each cell is colored based on the percentage of figures blocked in it. To enhance the visual contrast is selected non-linear palette. The map was created by normalizing the cell values by dividing by the maximum percentage of cells, and then by announcing to a degree of 0.19 (see “gamma correction” ).
Colour
Percent
■
0.00000000
■
0.00000315
■
0.00024227
■
0.00307038
■
0.01860818
■
0.07527774
■
0.23582574
■
0.61928352
■
1.42923040
■
2.98867416
■
5.78182519
The dark orange and red bars in lines 17 and 18 mean that the vast majority of the figures end up there. The pale green tint below is a consequence of the geometry of the figures: only 4 of the 7 types of tetrimino can appear on the bottom line. The bottom corners are black because it’s impossible to be there.
The color along each line is almost uniform, and this suggests that horizontally the figures are distributed evenly. Minor gaps can be explained by looking at the histograms of certain types of shapes:
T
J
Z
O
S
L
I
T turns out to be the most universal type: its histogram is more uniform than that of all others. Anomalies in the histogram J - the result of the influence of the walls; only Jr and Jl can be in the side columns, which causes the AI to use columns 1 and 9 more often to compensate. The same applies to L. The histograms Z and S look about the same, which underlines the imbalance due to the fact that Zv and Sv not perfect mirror images of each other. Type O is limited to the 19 × 9 playing field, and it seems that the AI tends to use O more often on the sides than in the center. Tetrimino I is shifted to the right, because its starting point is located there; therefore, blocking in column 1 rarely happens.
The table shows the percentage of figures blocked in each row.
Line
Percent
0
0.0000000000
one
0.0000000000
2
0.0000004902
3
0.0000026472
four
0.0000066180
five
0.0000172557
6
0.0000512280
7
0.0001759400
eight
0.0006681210
9
0.0023187901
ten
0.0077928820
eleven
0.0259672043
12
0.0866187068
13
0.2901315751
14
0.9771663807
15
3.3000408353
sixteen
10.6989059268
17
28.5687976371
18
50.0335706162
nineteen
6.0077671454
Here is a graph of values:
If you do not take into account line 19, the graph shows exponential growth.
AI ai = new AI(); PlayfieldUtil playfieldUtil = new PlayfieldUtil(); int[] tetriminos = newint[AI.TETRIMINOS_SEARCHED]; int[][] playfield = playfieldUtil.createPlayfield(); Random random = new Random();
Slide = (Left == PRESSED) or (Right == PRESSED) Rotate = (A == PRESSED) or (B == PRESSED) if Down == RELEASED or Down == PRESSED_FOR_3_FRAMES then if Down == RELEASED then nextDown = PRESSED_FOR_1_FRAME else nextDown = PRESSED_FOR_2_FRAMES end addChild(Down = nextDown) ifnot Rotate then addChild(A = PRESSED, Down = nextDown) addChild(B = PRESSED, Down = nextDown) end if Slide then addChild() ifnot Rotate then addChild(A = PRESSED) addChild(B = PRESSED) end else addChild(Left = PRESSED) addChild(Right = PRESSED) ifnot Rotate then addChild(Left = PRESSED, A = PRESSED) addChild(Left = PRESSED, B = PRESSED) addChild(Right = PRESSED, A = PRESSED) addChild(Right = PRESSED, B = PRESSED) end end elseif Down == PRESSED_FOR_1_FRAME then nextDown = PRESSED_FOR_2_FRAMES else nextDown = PRESSED_FOR_3_FRAMES end addChild(Down = nextDown) ifnot Rotate then addChild(A = PRESSED, Down = nextDown) addChild(B = PRESSED, Down = nextDown) end end
, addChild , (, , ).
nextFallTimer = fallTimer + 1if Left == PRESSED and testPosition(x - 1, y, rotation) then x = x - 1 elseif Right == PRESSED and testPosition(x + 1, y, rotation) then x = x + 1 end if A == PRESSED and testPosition(x, y, nextClockwiseRotation) then rotation = nextClockwiseRotation elseif B == PRESSED and testPosition(x, y, nextCounterclockwiseRotation) then rotation = nextCounterclockwiseRotation end if Down == PRESSED_FOR_3_FRAMES or nextFallTimer >= dropSpeed then if testPosition(x, y + 1, rotation) then y = y + 1 nextFallTimer = 0else lockTetrimino() return end end childState = states[y][x][rotation][Left][Right][Down][A][B] ifnot childState.visited then childState.visited = mark childState.predecessor = state childState.fallTimer = nextFallTimer queue.enqueue(childState) end