
An algorithm for fast word search in the game Balda

Once, on a social network, I came across a Balda game with non-standard rules (large fields and "knots"). Existing helper programs mostly work by the classical rules and on 5x5 fields. So I had a sporting interest in writing my own word finder fully adapted to the non-standard rules, and not just writing a word finder, but implementing the fastest possible word search algorithm.


Among the non-standard rules, the rather interesting "knots" mode stands out: in it a word may use one letter up to 2 times (for example, in the screenshot above you can make the word SHALASH, where the letters SH and A are each used twice). As it turned out later, for the same position the algorithm searches for all words in knots mode almost exactly 1.5 times longer than in the classical mode without knots (which is not surprising, since there are more word variations and the found words are longer). At the end of the article the described algorithm is tested on a test position in knots mode.

Dictionary


To make life harder for the algorithm, one of the largest dictionaries used in such programs is taken; it contains roughly 110,000 words.
An important role in the algorithm is played by how the dictionary is stored in the program's memory. On loading, each word is written into a tree structure. Each node of this tree represents a specific letter of a word and refers to at most 32 subtrees (the number of letters in the alphabet). A tree node in the program looks like this:
typedef struct TREE TREE;
typedef TREE *NEXT;   /* pointer to a subtree */

struct TREE {
    NEXT *next;   /* array of subtrees; next[k] corresponds to the letter str[k]     */
    char *str;    /* letters that lead out of this node; NULL if the node is a leaf  */
    char *wrd;    /* the dictionary word that ends in this node, or NULL             */
};

The str field contains a string with all the letters that lead out of this node; the size of the next[] array equals the length of this string.
The str field is used to find, for a given letter, the corresponding node in the next[] array. For this, an ind() function based on strchr() was introduced:
int k = ind(c, str);   /* index (position) of the letter c in the string str */

If k < 0, this letter is not in str; otherwise next[k] points to the node corresponding to this letter. For example, if str contains the string "…", then exactly 5 subtrees go from this node, and next[2] refers to the letter '…', since str[2] == '…'; the same holds for the other letters.
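The article does not show the body of ind(); a minimal sketch based on strchr(), assuming single-byte letter codes, could look like this:

#include <string.h>

/* returns the position of the letter c in the string str, or -1 if it is absent
   (a sketch; the original implementation is not given in the article) */
static int ind(char c, const char *str)
{
    const char *p = str ? strchr(str, c) : NULL;
    return p ? (int)(p - str) : -1;
}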
To illustrate the above, here is an example of a tree built from a specially prepared dictionary consisting of the words:
…, …, …, …, …


If the field str == NULL, then the node is a leaf and contains the dictionary word wrd. The node corresponding to the letter '…' contains the word "…", but it is not a leaf, because the dictionary has another word that begins with …-. The wrd field contains the string obtained by walking from the root to the corresponding node; if that string is not a dictionary word, wrd contains NULL.
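The article does not show how this tree is filled, so here is a hedged sketch of how a word could be inserted into such a structure; the helpers new_node(), child() and add_word() are my assumptions, not the author's code. It reuses the TREE node and the ind() helper shown above.

#include <stdlib.h>
#include <string.h>

/* allocate an empty node: no letters, no word */
static TREE *new_node(void)
{
    return calloc(1, sizeof(TREE));   /* next = NULL, str = NULL, wrd = NULL */
}

/* return the child of t for letter c, creating it if it does not exist yet */
static TREE *child(TREE *t, char c)
{
    int k = ind(c, t->str);
    size_t n;
    if (k >= 0)
        return t->next[k];
    n = t->str ? strlen(t->str) : 0;            /* extend str and next[] by one entry */
    t->str  = realloc(t->str,  n + 2);
    t->next = realloc(t->next, (n + 1) * sizeof(NEXT));
    t->str[n] = c;
    t->str[n + 1] = '\0';
    t->next[n] = new_node();
    return t->next[n];
}

/* insert one dictionary word into the tree rooted at root */
static void add_word(TREE *root, const char *word)
{
    TREE *t = root;
    const char *p;
    for (p = word; *p; ++p)
        t = child(t, *p);
    if (t->wrd == NULL)
        t->wrd = strdup(word);   /* this node now marks a complete dictionary word */
}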
Let's call this tree the dictionary tree; it contains all the words of the dictionary. Besides the dictionary tree we need to prepare one more tree, which for every word contains all possible reversed prefixes of that word (let's call this tree the inverted tree). For example, for the word "…" the following strings go into the inverted tree: "…", "…", "…", "…", "…". Even for a test dictionary of five words such a tree is cumbersome, so here is a simplified version:

The dictionary word prefixes in this tree are laid out backwards. The node in the red circle corresponds to the last letter of a reversed dictionary prefix, i.e. it matches the first letter of the word. To read the beginning of a dictionary word, you move from the red node towards the root. The only difference between the inverted tree and the dictionary tree is that here wrd serves merely as a flag (NULL / not NULL); the string with the reversed prefix of the word is not stored in it. This is done to save space. For all red nodes wrd != NULL, for the rest wrd == NULL. For example:
root_inv->str == "…";
root_inv->next[0]->str == "…";
root_inv->next[5]->next[0]->wrd == NULL;            /* not yet a complete reversed prefix of a word */
root_inv->next[5]->next[0]->next[1]->wrd != NULL;   /* a red node: some dictionary word starts here */
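How the inverted tree is filled is not shown either; the sketch below is an assumption that reuses child() from the previous sketch: for every prefix of every word its reversal is inserted, and the node of the word's first letter is marked through wrd.

#include <string.h>

/* insert all reversed prefixes of word into the inverted tree
   (a sketch; the non-NULL marker used for wrd is an assumption) */
static void add_inverted_prefixes(TREE *root_inv, const char *word)
{
    size_t len = strlen(word), plen, i;
    for (plen = 1; plen <= len; ++plen) {
        TREE *t = root_inv;
        for (i = plen; i-- > 0; )   /* walk the prefix word[0..plen-1] backwards */
            t = child(t, word[i]);
        t->wrd = "";                /* any non-NULL value: this is a "red" node */
    }
}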


Both trees occupy 33 MB in memory, and building them directly from the dictionary takes about 5 seconds on an i5, so it was decided to save the tree image to a file and load it from there afterwards. With that, the load time dropped to about 300 milliseconds.

Search algorithm


According to the rules, to make a move you choose an empty cell and put any letter in it.
It is usually stated that this cell must be adjacent to at least one occupied cell, but for the search this rule is redundant: even if you put a letter in a non-adjacent cell, you cannot make a word anyway, because the dictionary contains no single-letter words.

Then you have to make a word that uses the placed letter. This letter can be the first letter of the word or not the first (an obvious point, but an important one). If the rules required the placed letter to be the first one, we would simply use the dictionary tree, moving across the field from the placed letter while descending the tree. But since the placed letter may not be the first, the dictionary tree cannot be used directly, because it is not known from which node to start the search. However, we still have the inverted tree.
The idea is this: from the cell with the placed letter we move across the field, descending along the inverted tree. If we reach a node with wrd != NULL (a red node), we have found a valid reversed prefix of some word. Having the prefix of a word, we can find the node of the dictionary tree corresponding to the placed letter, and then simply complete the remaining part of the word in the dictionary tree, starting from that node.
To prevent the algorithm from re-entering cells already used for the current word, each cell keeps a counter. Before the search starts, this counter is initialized to 2 (knots mode) or 1 (classic mode); this is the only difference between the modes as far as the algorithm is concerned.
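The article does not describe the cell structure, so here is a minimal sketch of the bookkeeping assumed above (the CELL type and reset_counters() are my names, not the author's):

/* one cell of the playing field: its letter and the visit counter described above */
typedef struct {
    char letter;   /* letter placed in the cell, or 0 if the cell is empty                 */
    int  count;    /* how many more times the cell may be used in the word being assembled */
} CELL;

/* reset the counters before a search: 2 in knots mode, 1 in classic mode */
static void reset_counters(CELL *cells, int n_cells, int knots_mode)
{
    int i;
    for (i = 0; i < n_cells; ++i)
        cells[i].count = knots_mode ? 2 : 1;
}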
More detailed algorithm:

Naturally, after the loop we release the current adjacent cell again (its counter is restored).
The subtree search function find takes the playing field, a tree node (tree) and the cell coordinates (i, j) from which it must start moving across the field while descending the tree.
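A hedged reconstruction of find, built only from the description above, is sketched below for the dictionary-tree half of the search; in the inverted tree a red node would instead trigger the switch to the dictionary tree described earlier. The board size, the found() collector and the use of the CELL type from the previous sketch are assumptions.

#include <stdio.h>

#define SIZE 9   /* field side; the test position below is 9x9 */

static void found(const char *word)
{
    puts(word);   /* in the real program the word would go into an array of results */
}

/* move across the field from cell (i, j), descending the tree in parallel */
static void find(CELL field[SIZE][SIZE], TREE *tree, int i, int j)
{
    static const int di[4] = { -1, 1, 0, 0 };
    static const int dj[4] = { 0, 0, -1, 1 };
    int d;

    if (tree->wrd != NULL)
        found(tree->wrd);            /* a dictionary word ends in this node      */
    if (tree->str == NULL)
        return;                      /* a leaf: no letter continues the word     */

    for (d = 0; d < 4; ++d) {
        int ni = i + di[d], nj = j + dj[d], k;
        if (ni < 0 || ni >= SIZE || nj < 0 || nj >= SIZE)
            continue;
        if (field[ni][nj].letter == 0 || field[ni][nj].count == 0)
            continue;                /* empty cell, or its counter is exhausted  */
        k = ind(field[ni][nj].letter, tree->str);
        if (k < 0)
            continue;                /* the tree has no branch for this letter   */
        field[ni][nj].count--;       /* occupy the cell for the current word     */
        find(field, tree->next[k], ni, nj);
        field[ni][nj].count++;       /* release it on the way back               */
    }
}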

Let's consider how the algorithm works for the central cell, using the following position as an example:

We put each letter from the string "…" into this cell in turn (the remaining letters of the alphabet simply do not occur in root_inv->str, so for them the algorithm ends quickly without really starting). For each letter we call find. For the letter … the function ends quickly, because the inverted tree has no branches beginning with …-, …- and …-; the same is true for …, …, …, ….

For the remaining letters, valid prefixes will be found in the inverted tree:
… – …, … – …, … – …, … – …, … – …. For each prefix we find the corresponding node in the dictionary tree (as described in item 1.b) and call the find function for it.
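As one possible way of doing that step (a sketch, not the author's code; dict_node_for_prefix is my name), the prefix read in normal order can simply be walked from the dictionary root:

/* descend the dictionary tree along a word prefix; returns NULL if the
   dictionary has no branch for it */
static TREE *dict_node_for_prefix(TREE *root, const char *prefix)
{
    TREE *t = root;
    const char *p;
    for (p = prefix; *p; ++p) {
        int k = ind(*p, t->str);
        if (k < 0)
            return NULL;
        t = t->next[k];
    }
    return t;
}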
The prefix "…" corresponds to the node root->next[0]->next[0]->next[0]:

The next adjacent letter can be … or …, but the str field of this node contains "…", so we immediately finish the find function: the tree has no branches starting with …- or …-. For the prefix "…" the dictionary tree has no branches starting with …-, …-, …-, …-, …-, …-, …- or …-; the same is true for the prefixes "…" and "…".

The prefix "…" corresponds to the node root->next[0]->next[0]->next[2].
Since the wrd field of this node is not NULL, we put the string wrd into the array of found words and continue the search. For … and … there are no branches, so we finish the find function for them; but the string …, which is contained in the tree, is also placed into the array of found words.

Results


The algorithm was tested on the following field (not on the test dictionary, of course, but on the real dictionary of 110,000 words):

This is a position from a real game.
Here the field is 9x9 and it is half filled (this is perhaps one of the worst positions for any algorithm: there are quite a lot of cells you can move through, and the letters are arranged so that many long words can be made). In knots mode the algorithm finds all the words in the following time:


For other positions, the search time is even less.

Application


This algorithm is used in applications for iOS (available in the App Store) and Android. Besides the search for all words, two more kinds of search are implemented:


Another feature: words of the same length are sorted by the rarity of the added letter (for example, it is more profitable to play the letter … than the letter …).
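A hedged sketch of how such a tie-break could look (the freq[] table and better_move() are assumptions; letter frequencies would be computed from the dictionary):

#include <string.h>

extern int freq[256];   /* assumed table: how often each letter occurs in the dictionary */

/* returns non-zero if move 1 is preferable: the longer word first,
   then the rarer added letter among words of equal length */
static int better_move(const char *word1, char added1,
                       const char *word2, char added2)
{
    size_t l1 = strlen(word1), l2 = strlen(word2);
    if (l1 != l2)
        return l1 > l2;
    return freq[(unsigned char)added1] < freq[(unsigned char)added2];
}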

Source: https://habr.com/ru/post/207734/

