Haskell Japanese Crossword Solving

Japanese crossword puzzle is a puzzle in which you need to recreate the original black and white image by a set of numbers. Each row and each column of pixels corresponds to its own set, each number in which, in turn, corresponds to the length of a block of consecutive black pixels. There must be at least one white pixel between such blocks, but their exact number is unknown. Magazines devoted entirely to these puzzles are in most newsstands, so I think almost everyone has met them at least once, and therefore a more detailed description can be omitted here.

At some point I wanted to “teach a computer” to solve Japanese crosswords in the way I solve them myself. No high end, just for fun. Later, methods were added that I myself cannot use due to the limited capabilities of the human brain, but, in fairness, the program copes with all the crossword puzzles from the journals without them.

So, the task is simple: solve a crossword puzzle, and if there are many solutions, then find them all. The solution is written in Haskell, and, although the code rather substantially complements the verbal description, even without knowledge of the language, the general essence can be understood. If you want to feel the result live, on the project page you can download the source code (I did not post binary builds). Solutions are exported to Binary PBM, and conditions can be extracted from it.
')

Despite the fact that I tried to write as clearly as possible, I did not fully succeed. There are a lot of letters and code under the cut and almost no pictures.

Bitmask

The basis of the entire program is your bike for a bit mask. It is not too fast, but it has a property that was important to me during the debugging process: it drops during operations that have no meaning, namely, during any operation on masks of different lengths. I will give here only the signatures of functions and a picture explaining the principle of their work; the implementation is very primitive and has no direct relation to the solution.

bmCreate :: Int -> BitMask bmLength :: BitMask -> Int bmSize :: BitMask -> Int bmIsEmpty :: BitMask -> Bool bmNot :: BitMask -> BitMask bmAnd :: BitMask -> BitMask -> BitMask bmOr :: BitMask -> BitMask -> BitMask bmIntersection :: [BitMask] -> BitMask bmUnion :: [BitMask] -> BitMask bmSplit :: BitMask -> [BitMask] bmByOne :: BitMask -> [BitMask] bmExpand :: BitMask -> BitMask bmFillGaps :: BitMask -> BitMask bmLeftIncursion :: Int -> BitMask -> BitMask bmRightIncursion :: Int -> BitMask -> BitMask bmTranspose :: [BitMask] -> [BitMask]

I think that such a graphic description is exhaustive for all functions, except, perhaps, bmLeftIncursion and bmRightIncursion . Why they are needed, it will be clear later, the principle of their work is the following: bmLeftIncursion finds the leftmost filled bit and creates a mask in which all the bits before it are filled, as well as so many bits since it, as specified when calling the function; the second function works the same way.

Structure

Since the solution of a crossword puzzle is along lines, the type corresponding to the entire field is a set of all horizontal and vertical lines, although this leads to the duplication of all cells of the crossword puzzle.

 data Field = Field { flHorLines :: [Line], flVerLines :: [Line] } deriving Eq

Each line stores information about cells and blocks (the block corresponds to the number in the condition).

 data Line = Line { lnMask :: LineMask, lnBlocks :: [Block] } deriving Eq

Information about cells is stored in the form of two bit masks of the same length, representing filled and blocked cells.

 data LineMask = LineMask { lmFilledMask :: BitMask, lmBlockedMask :: BitMask } deriving Eq

The block, in addition to the number itself, contains a mask that corresponds to the area of the line in which this block may be located.

 data Block = Block { blScopeMask :: BitMask, blNumber :: Int } deriving Eq

At the beginning of the solution, the masks of painted and blocked cells are empty, and the block mask, on the contrary, is completely filled. This means that all cells are empty, and each block can be in any part of the line. The solution process is to narrow the area of each block to a size equal to its number, and fill the masks accordingly.

Completion and synchronization

All of the above types (except BitMask ) are instances of two classes: Completable and Syncable .

The only function of the Completable class is showing the "completeness" of an object. A field is considered complete if all its lines are completed. A line is completed if all its blocks are completed; the completeness of the mask is unnecessary to demand (it follows from the completeness of the blocks; why, again, it will be clear a little later). To complete a block, as mentioned above, it is necessary that the size of its area coincides with its number.

 class Completable a where clIsCompleted :: a -> Bool instance Completable Field where clIsCompleted fl = all clIsCompleted (flHorLines fl) && all clIsCompleted (flVerLines fl) instance Completable Line where clIsCompleted ln = all clIsCompleted (lnBlocks ln) instance Completable Block where clIsCompleted bl = bmSize (blScopeMask bl) == blNumber bl

The Syncable class provides functions that allow you to bring together different solution branches. snAverage distinguishes from two branches only the general, and snSync - that manifested itself in at least one branch (we can consider them as generalizations of the bmAnd and bmOr respectively). snAverageAll and snSyncAll do exactly the same thing, but they work not with two objects, but with lists of objects.

 class Syncable a where snSync :: a -> a -> Maybe a sn1 `snSync` sn2 = snSyncAll [sn1, sn2] snAverage :: a -> a -> Maybe a sn1 `snAverage` sn2 = snAverageAll [sn1, sn2] snSyncAll :: [a] -> Maybe a snSyncAll [] = Nothing snSyncAll sns = foldr1 (wrap snSync) (map return sns) snAverageAll :: [a] -> Maybe a snAverageAll [] = Nothing snAverageAll sns = foldr1 (wrap snAverage) (map return sns) wrap :: Monad m => (a -> b -> mc) -> ma -> mb -> mc wrap f mx my = do x <- mx y <- my fxy

Consistency

From the description of the functions of the Syncable class Syncable clear that their result is an object wrapped in the Maybe monad. In fact, this is how the important concept of consistency manifests itself, which is also defined for all the above types, but not in a separate class for reasons of encapsulation. As an example, the same cell cannot be simultaneously painted over and blocked; if any operation can lead to such a situation, then it is labeled with the Maybe monad (usually has the type type TransformFunction a = a -> Maybe a ), and if it leads to this situation, then the result is Nothing , because no object in the program can exist in an inconsistent state. Since Nothing , in turn, cannot be a component of other objects, the entire field will become inconsistent, which will mean the absence of solutions.

The consistency of the field is provided by the synchronization of horizontal and vertical lines. Thus, if a cell is in some state (painted over, blocked or empty) in a horizontal line, then it is in exactly the same state in a corresponding vertical line, and vice versa.

 flEnsureConsistency :: TransformFunction Field flEnsureConsistency fl = do let lnsHor = flHorLines fl let lnsVer = flVerLines fl lnsHor' <- zipWithM lnSyncWithLineMask (lmTranspose $ map lnMask lnsVer) lnsHor lnsVer' <- zipWithM lnSyncWithLineMask (lmTranspose $ map lnMask lnsHor) lnsVer return $ Field lnsHor' lnsVer' lnSyncWithLineMask :: LineMask -> TransformFunction Line lnSyncWithLineMask lm ln = do lm' <- lm `snSync` lnMask ln return ln { lnMask = lm' }

We will talk about consistency of the line later, as it is directly related to the decision process.

The consistency of the block is provided nontrivially: for it, it is necessary to exclude those continuous parts that cannot accommodate it from the block area. Thus, if from a block area with the number 3 and the source area

exclude mask

(for example, due to the fact that this cell was blocked), then the final result of this operation will be a block with

and not at all

 blEnsureConsistency :: TransformFunction Block blEnsureConsistency bl = do let bms = filter ((blNumber bl <=) . bmSize) $ bmSplit $ blScopeMask bl guard $ not $ null bms return bl { blScopeMask = bmUnion bms }

For a mask, consistency is obvious and has already been described above: you cannot simultaneously paint over and block the same cell.

 lmEnsureConsistency :: TransformFunction LineMask lmEnsureConsistency lm = do guard $ bmIsEmpty $ lmFilledMask lm `bmAnd` lmBlockedMask lm return lm

Transformations

The transformation operations of masks and blocks are very limited, because in the process of solving the cells you can only paint over and block (you can not change your mind, take an eraser and erase it), and you can only narrow the area of the block.

 lmFill :: BitMask -> TransformFunction LineMask lmFill bm lm = lmEnsureConsistency lm { lmFilledMask = lmFilledMask lm `bmOr` bm } lmBlock :: BitMask -> TransformFunction LineMask lmBlock bm lm = lmEnsureConsistency lm { lmBlockedMask = lmBlockedMask lm `bmOr` bm } blExclude :: BitMask -> TransformFunction Block blExclude bm bl = blEnsureConsistency $ bl { blScopeMask = blScopeMask bl `bmAnd` bmNot bm } blKeep :: BitMask -> TransformFunction Block blKeep bm bl = blEnsureConsistency $ bl { blScopeMask = blScopeMask bl `bmAnd` bm }

Decision

The decision process will be considered in separate parts, until they finally develop into the big picture.

Line consistency

To begin, restore the gap left in the section on consistency, and declare that the line is considered consistent if its mask is filled in accordance with its blocks. Behind this phrase are two points. First, those cells that do not fall into the area of any block should be blocked (if the line does not contain a single block, then all cells are, respectively).

 lnUpdateBlocked :: [Block] -> TransformFunction LineMask lnUpdateBlocked [] lm = lmBlock (bmNot $ lmBlockedMask lm) lm lnUpdateBlocked bls lm = lmBlock (bmNot $ bmUnion $ map blScopeMask bls) lm

Secondly, for each block, using the blToFillMask function, blToFillMask can get a mask that you need to paint over. It is the intersection of two masks, obtained by “driving” the block into the leftmost and rightmost parts of its area.

 blMinimumLeftMask :: Block -> BitMask blMinimumLeftMask bl = bmLeftIncursion (blNumber bl) (blScopeMask bl) blMinimumRightMask :: Block -> BitMask blMinimumRightMask bl = bmRightIncursion (blNumber bl) (blScopeMask bl) blToFillMask :: Block -> BitMask blToFillMask bl = blMinimumLeftMask bl `bmAnd` blMinimumRightMask bl lnUpdateFilled :: [Block] -> TransformFunction LineMask lnUpdateFilled [] = return lnUpdateFilled bls = lmFill (bmUnion $ map blToFillMask bls)

(Note: here we finally used the bmLeftIncursion and bmRightIncursion . Strictly speaking, if they were used only for this purpose, then most likely they would have looked a little different, namely they would not fill the bit mask to the very first bit the original mask.)

Thus, as mentioned earlier, the consistency condition for a line guarantees that its mask will always be completed if all its blocks are completed.

 lnEnsureConsistency :: TransformFunction Line lnEnsureConsistency ln = do let bls = lnBlocks ln lm <- lnUpdateBlocked bls >=> lnUpdateFilled bls $ lnMask ln return $ ln { lnMask = lm }

Simple line conversion

The solution within the line is essentially reduced to two transformations.

The first transformation, in fact, is inverse to the consistency condition: it guarantees that the blocks will be completed if the mask is completed. Three actions are used for this.

All blocked cells must be excluded from the areas of all blocks.

 lnRemoveBlocked :: LineMask -> TransformFunction [Block] lnRemoveBlocked = mapM . blExclude . lmBlockedMask

If the block cannot accommodate any continuous shaded part of the mask (that is, if it crawls out of the block area or has a size larger than its number), then it should be excluded from the block area.

 lnRemoveFilled :: LineMask -> TransformFunction [Block] lnRemoveFilled lm = mapM (\ bl -> foldM f bl $ bmSplit $ lmFilledMask lm) where f bl bm = if blCanContainMask bm bl then return bl else blExclude (bmExpand bm) bl blCanContainMask :: BitMask -> Block -> Bool blCanContainMask bm bl = let bm' = bmFillGaps bm in bmSize bm' <= blNumber bl && bmIsEmpty (bm' `bmAnd` bmNot (blScopeMask bl))

blMinimumLeftMask its left neighbor and blMinimumRightMask right neighbor should be excluded from the area of each block (here they are needed exactly in the form described above). To be precise, these masks are expanded by one cell, since there must be at least one empty cell between the blocks.
```
 lnExcludeNeighbours :: TransformFunction [Block] lnExcludeNeighbours bls = sequence $ scanr1 (flip $ wrap $ blExclude . bmExpand . blMinimumRightMask) $ scanl1 (wrap $ blExclude . bmExpand . blMinimumLeftMask) $ map return bls 
```

Together, these actions form the following function (the slLoop function will be described later):

 lnSimpleTransform :: TransformFunction Line lnSimpleTransform ln = do let lm = lnMask ln bls <- lnRemoveBlocked lm >=> slLoop (lnRemoveFilled lm >=> lnExcludeNeighbours) $ lnBlocks ln lnEnsureConsistency ln { lnBlocks = bls }

Second line transform

If we take the leftmost of all blocks, which in principle can contain some shaded part of the mask, then its extreme right position will be limited to this mask itself, because if it moves further to the right, then there will be no one to “give” this shaded area. The same considerations are true for the rightmost of such blocks.

 lnExtremeOwners :: BitMask -> TransformFunction [Block] lnExtremeOwners bm bls = do bls' <- fmap reverse $ maybe (return bls) (f bmLeftIncursion bls) (h bls) fmap reverse $ maybe (return bls') (f bmRightIncursion bls') (h bls') where fg = varyNth (\ bl -> blKeep (g (blNumber bl) bm) bl) h = findIndex (blCanContainMask bm) varyNth :: Monad m => (a -> ma) -> [a] -> Int -> m [a] varyNth f xs idx = do let (xs1, x : xs2) = splitAt idx xs x' <- fx return $ xs1 ++ x' : xs2

Applying this reasoning to each continuous part of the mask, we obtain the second transformation of the line:

 lnTransformByExtremeOwners :: TransformFunction Line lnTransformByExtremeOwners ln = do bls <- foldM (flip lnExtremeOwners) (lnBlocks ln) $ bmSplit $ lmFilledMask $ lnMask ln lnEnsureConsistency ln { lnBlocks = bls }

Field conversion

The field has no special transformations of its own, the only option for it is to take some ready-made transformation and apply it to all its lines.

 flTransformByLines :: TransformFunction Line -> TransformFunction Field flTransformByLines f fl = do lnsHor <- mapM f (flHorLines fl) fl' <- flEnsureConsistency fl { flHorLines = lnsHor } lnsVer <- mapM f (flVerLines fl') flEnsureConsistency fl' { flVerLines = lnsVer }

Branching

Since the solution of Japanese crosswords is an NP-complete problem, it will not be possible to do without branching. A branch is defined by a function of type type ForkFunction a = a -> [[a]] , where the internal list includes mutually exclusive options, and the external list contains various ways to produce these options.

The simplest way is branching into cells: each empty cell spawns one element of the external list, which in turn is a list of two elements, in one of which this cell is filled and in the other one is blocked.

 lnForkByCells :: ForkFunction Line lnForkByCells ln = do let lm = lnMask ln bm <- bmByOne $ lmEmptyMask lm return $ do lm' <- [fromJust $ lmBlock bm lm, fromJust $ lmFill bm lm] maybeToList $ lnEnsureConsistency ln { lnMask = lm' } flForkByCells :: ForkFunction Field flForkByCells fl = do let lnsHor = flHorLines fl let lnsVer = flVerLines fl idx <- findIndices (not . clIsCompleted) lnsHor let (lns1, ln : lns2) = splitAt idx lnsHor lns <- lnForkByCells ln return $ do ln' <- lns maybeToList $ flEnsureConsistency $ Field (lns1 ++ ln' : lns2) lnsVer

For the line, another branching method is also available: for each continuous shaded part of the mask (external list), we can consider a set of blocks that can contain it (internal list) as options defining branches.

 lnForkByOwners :: ForkFunction Line lnForkByOwners ln = do let bls = lnBlocks ln bm <- bmSplit $ lmFilledMask $ lnMask ln case findIndices (blCanContainMask bm) bls of [_] -> [] idxs -> return $ do idx <- idxs maybeToList $ do bls' <- varyNth (g bm) bls idx lnEnsureConsistency ln { lnBlocks = bls' } where g bm bl = blKeep ((bmAnd `on` ($ bm) . ($ blNumber bl)) bmLeftIncursion bmRightIncursion) bl

Generalized functions

Most of the changes it makes sense to apply iteratively. In this case, you can simply apply the transformation as long as it changes at least something, but it is possible (in the case when an extra application can take a considerable amount of time) to preliminarily check the object for completeness.

 slLoop :: Eq a => TransformFunction a -> TransformFunction a slLoop fx = do x' <- fx if x == x' then return x else slLoop fx' slSmartLoop :: (Completable a, Eq a) => TransformFunction a -> TransformFunction a slSmartLoop fx | clIsCompleted x = return x | otherwise = do x' <- fx if x == x' then return x else slLoop fx'

The results of the branch can be processed regardless of the specific data type and branching method. To do this, applying a certain branching method, and then applying any transformation to each resulting object, for each set of mutually exclusive branches, it is necessary to take an average value, and then synchronize these averaged objects obtained by different branch points. I will not describe in detail, but an optimized version is also available for this operation, which is related to completeness testing.

 slForkAndSyncAll :: (Syncable a) => ForkFunction a -> TransformFunction a -> TransformFunction a slForkAndSyncAll fgx = do xs <- mapM (snAverageAll . mapMaybe g) $ fx snSyncAll (x : xs) slForkAndSmartSync :: (Syncable a, Completable a, Eq a) => ForkFunction a -> TransformFunction a -> TransformFunction a slForkAndSmartSync fgx = foldr h (return x) (fx) where h xs mx = do x' <- mx if clIsCompleted x' then mx else case mapMaybe (snSync x') xs of [] -> Nothing xs' -> case filter (/= x') xs' of [] -> return x' xs'' -> snAverageAll . mapMaybe g $ xs''

Finally, if nothing else helps, you can go into recursion. Only in this way you can get all the solutions, if there are several.

 slAllSolutions :: (Completable a) => ForkFunction a -> TransformFunction a -> a -> [a] slAllSolutions fgx = do x' <- maybeToList $ gx if clIsCompleted x' then return x' else case fx' of (xs : _) -> do x'' <- xs slAllSolutions fg x'' [] -> []

Fina venko

Everything. The tools available are enough to get a solver in a few simple steps.

Combine the two line transformations.

 lineTransform = slSmartLoop $ lnSimpleTransform >=> lnTransformByExtremeOwners

Process line-specific branching.

 lineTransform' = slForkAndSyncAll lnForkByOwners lineTransform

We compose a field transform from these two transformations.

 fieldTransform = slSmartLoop $ slSmartLoop (flTransformByLines lineTransform) >=> flTransformByLines lineTransform'

We process the results of the field branching in the cells.

 fieldTransform' = slForkAndSmartSync flForkByCells fieldTransform

Combine the previous two conversions.

 fieldTransform'' = slSmartLoop $ fieldTransform >=> fieldTransform'

And finally, add recursion.

 solve = slAllSolutions flForkByCells fieldTransform''

Afterword

The program works fairly quickly on crosswords that have the only solution: approximately out of the thousands of crosswords I have on my laptop, only two (including the preface) are solved for more than a minute, almost all fit in 10 seconds, and none required recursion.

Theoretically, with some refinement, the program can be used to automatically assess the complexity of crossword puzzles (since the solution methods are generally similar to those used by humans) and proof of the uniqueness of the solution; There is an export to LaTeX, and it may even appear soon in SVN. So if you wish, you can organize a home issue of magazines :)

Source: https://habr.com/ru/post/151819/

All Articles