📜 ⬆️ ⬇️

About busting on the example of generating crosswords

With the article " Algorithm of Crossword Formation ", several heuristics were proposed, which were implemented in the program of automatic generation of crossword puzzles. Despite the fact that the proposed heuristics are well developed, even they did not allow for a reasonable time to generate a crossword puzzle for the most complex of the grids:

image
This and all subsequent figures are taken from the original article.

In this article, I would like to discuss a much simpler solution that allows you to solve this problem.

The proposed solution is based on two main ideas:
')
  1. Correct brute force
  2. Do not perform the same actions again.

All code is written in Scala.

Data structures


Imagine a crossword box in the form of a two-dimensional list of cells (Cell):

val field: List[List[Cell]] = readField("cross.in") 

In addition, each cell is either a protein and must contain a letter (Letter), or black, and must remain empty (Empty):

 sealed abstract class Cell case object Empty extends Cell //     case class Letter() extends Cell //     -  

A word is a sequence of letters:

 class Word(val letters: Seq[Letter]) 

A dictionary is a set of words grouped by length:

 val dictionary: Map[Int, Set[String]] = readWords("cross.txt", "WINDOWS-1251").groupBy(_.size) 

Word highlighting


Note that there are no words in the field description. We will take the word vertical or horizontal set of (future) letters, bounded by empty cells, or the edges of the board.
Let's learn how to highlight words on one line:

 def extractWords(row: List[Cell]): Seq[Word] = { val rest = row.dropWhile(_ == Empty) //    if (rest.isEmpty) Seq() //    else { //      ,      val word = new Word(rest.takeWhile(_ != Empty).map(_.asInstanceOf[Letter])) //         word +: extractWords(rest.dropWhile(_ != Empty)) } } 

To select all the words, select the words from the rows, transpose the field and select the words from the columns. At the same time, remove single-letter words.

 val words = (field.flatMap(extractWords) ++ field.transpose.flatMap(extractWords)).filter(_.letters.size != 1) 

For a quick search for intersecting words, we will store information in (future) letters, which words pass through them and the number of the letter in this word:

 case class Letter() extends Cell { var words = Array.empty[(Word, Int)] } for (word <- words; (letter, index) <- word.letters.zipWithIndex) { letter.words :+= (word, index) } 

Bust


In the search procedure, we will transfer two mappings (Map): variants and assignment. In words, for each word that has not yet been considered, there is a list of words that can stand in this place (initially all the words of the required length), and in the assignment, the recorded values ​​of the words already considered (initial empty):

 //  def search(variants: Map[Word, List[String]], assignment: Map[Word, String]) //  search(words.map(w => (w, random.shuffle(dictionary(w.letters.size).toList))).toMap, Map.empty) 

Note that when you call, you shuffle the words in the list, so that with different launches there will be different crosswords.

Now we can write the first version of busting:

 def search(variants: Map[Word, List[String]], assignment: Map[Word, String]) { if (variants.isEmpty) { //    -   dump(assignment) } else { //        val (word, vars) = variants.head //    for (variant <- vars) { //      var newVariants = variants - word //   ,   ,     for ((letter, char) <- word.letters.zip(variant); (other, index) <- letter.words; if newVariants.contains(other)) { //    ,      (index)    newVariants = newVariants.updated(other, variants(other).filter(var => var(index) == char)) } //   search(newVariants, assignment.updated(word, variant)) } } } 

Unfortunately, in a reasonable time, this iteration is not able to find a solution even for the simplest of the grids (9x9) mentioned in the original
article:

image

Correct brute force


Fortunately, this problem is easy to fix. To do this, simply select the correct order of consideration. Now the first available word is taken in the search.

Are there more effective strategies? There is, and one of them was proposed back in 1965 in the article SW Golomb, LD Baumert Backtrack Programming // Journal of the ACM Volume 12 Issue 4, 1965 . Unfortunately, I was not able to find this article in the public domain, but the idea itself is very simple: it is necessary to sort through the thing for which there are fewer options.

In our case, this means that we should not take an arbitrary word, but one that has the least suitable options left:

 val (word, vars) = variants.minBy(_._2.size) 

Note that with this we get two nice bonuses for free. First, if for some word there are no options left, then it will be immediately selected and this branch will be cut off without further search. Secondly, if there is only one option left for the word, then it will be immediately put in place, which will reduce brute force in future.

This optimization allows you to instantly find solutions for the 9x9 grid and in more than 50% of cases, find a solution for the “simple” 15x15 grid:

image

However, the 11x11 grid and the 15x15 “complex” grid still require too much time.

Do not perform the same actions again.


As in most programs, in our enumeration most of the time is occupied by the innermost cycle, in which we remove the “inappropriate” words:

 newVariants = newVariants.updated(other, variants(other).filter(_(index) == char)) 

At first glance, there is no loop in this line, but in fact it is hidden in the implementation of the filter method. How can we save time here? And let's cache the results of this cycle!

Note that here lists of words are calculated, in which some of the letters are fixed and some are not known. We will represent such a "partial" word in the form of a mask of the format " ??? ", that is, indicating question marks instead of unknown letters.

Now in the search, we will not operate with lists of "suitable" words, but with masks + cache the results of the calculation of lists:

 def search(masks: Map[Word, String], assignment: Map[Word, String]) { if (masks.isEmpty) { //    -   dump(assignment) } else { //       val (word, mask) = masks.minBy(w => cache(w._2).size) //   for (variant <- cache(mask)) { var newVariants = masks - word //  ""  for ((letter, char) <- word.letters.zip(variant); (other, index) <- letter.words; if newVariants.contains(other)) { val oldMask = masks(other) val newMask = oldMask.updated(index, char) //  ,     cache.getOrElseUpdate(newMask, cache(oldMask).filter(_(index) == char)) newVariants = newVariants.updated(other, newMask) } search(newVariants, assignment.updated(word, variant)) } } } 

Before using the new search function, you need to fill the cache with words from the dictionary and generate initial masks for all words:

 for ((length, variants) <- dictionary) { cache.getOrElseUpdate("?" * length, random.shuffle(variants.toList).toArray) } search(words.map(w => (w, "?" * w.letters.length)).toMap, Map.empty) 

Adding caching allows iteration to find a solution even for the most complex grid:

image

A crossword puzzle, generated in 30 seconds (in fact, in this case, lucky with random, the crossword is quickly generated somewhere in one of 20 cases):

 Akast Ahum Ahum 
 lager aloa lara 
 anami rokk erma 
 radio transmitter 
 young tinder     
    aarne phloem 
 maar ketch actor 
 antidepressants 
 atasu aapa yauk 
 Kartli whole    
     insi clouds 
 addition 
 Raha Alla Anand 
 head loik Kindy 
 anna aedie cake  

findings


The use of a pair of simple optimizations made it possible to easily implement a solution that is better than a heuristic solution based on a deep knowledge of the subject area. Choose the correct brute force and do not repeat the same calculations!

Source: https://habr.com/ru/post/251207/


All Articles