Cryptarithm generator

In a recent article, I returned a seine net with mud , I gave a link to the Wikipedia frequency dictionary. The number of downloads on orders exceeded all my expectations. I felt a great spiritual relationship with the readers of Habr. One part of those who downloaded (like me!) Loves to mess around with words and dictionaries in every possible way, and the second part (like me!), Seeing an interesting artifact in the open spaces of the network, immediately grabs it and drags it into its nest, and what to do with it - then we'll figure it out!

For the first part I have a request. If you have found an interesting use of the dictionary or you have an idea of such an application and this is all not a commercial secret, please share in the comments.

And for the second part, for those who downloaded the dictionary, and now painfully think what to do with fallen happiness, I want to write a few articles. Actually with this and start.

Cryptaphysm is a type of math puzzle. In an arithmetic identity, each digit is replaced by a letter (the same numbers - the same letters, different - different). The solution of a cryptariff is such a substitution, in which the identity becomes true. As a rule, cryptographic has only one solution.
')
The most famous cryptariff is “SEND + MORE = MONEY”, published in 1924 by the English mathematician Henry Dudeny. His (only) solution is 9567 + 1085 = 10652.

Solving a cryptariff manually can be quite difficult, depending on the specific puzzle. But it is even harder to come up with cryptariffs in such a way that it consists of several related words or even a meaningful phrase. To invent a puzzle to multiply at least three-digit numbers, the task for a person is extremely difficult.

Much has been written about automatic cryptariff solvers, however, with modern machines, a stupid brute force solver is written in a few lines and decides in quite reasonable time.

In this article I will show how, having a large dictionary at hand, write an assistant program for composing beautiful cryptagrams.

The program will work as follows:
the function gets the beginning of the cryptariff, up to the equal sign, for example, “SEND + MORE”, and prints a list of complete cryptariffs with one solution.

Under the spoiler explanation of the algorithm.

Algorithm.
We iterate over all possible substitutions for the resulting cryptariff beginning. Substitutions in which one or several numbers start from zero are rejected as superfluous. For each substitution, we calculate the result of the arithmetic expression resulting from its use. If there are numbers from the substitution in this result, we replace them with the corresponding letters and remember that those numbers in the substitution could not be replaced with letters from the substitution. We are looking for words in the dictionary that match the resulting pattern, along the way we fill in the dictionary, in which each found word corresponds to a list of substitutions, as a result of which this word can turn out. In the end, we select only those words that can be obtained as a result of only one substitution (the cryptariff has one solution) and derive them. You can display by sorting by frequency in descending order, so that any garbage (and there is a lot of it in Wikipedia) appears at the bottom of the resulting list.

Consider one step of the algorithm:

user entered - habr * habr

the algorithm tries the next substitution, for example - x = 7, a = 6, b = 1, p = 8

first the calculation is 7618 * 7618 = 58033924

in the resulting number there is 8, which corresponds to the letter p, so we need to find in the dictionary all the words that match the pattern 5p033924 (or more simply, the words in which the second letter “p” and the fourth and fifth letters are the same), and Note that the numbers 5, 0, 3, 9, 2, 4 cannot correspond to the letters x, a and b, since these letters are already occupied by the numbers 7, 6 and 1, respectively.

When the algorithm works, we leave only those words that can be obtained only by one substitution and display the corresponding lines, in particular Habr * Habr = trolling, 7618 * 7618 = 58033924 ( ATTENTION! kriftrifmov).

The only non-trivial part of the algorithm is the search for words with certain properties in a huge dictionary. To do this, when loading a dictionary, the program fills in several indices.

0) word_list - a list of all vocabulary words

1) pattern_index - an index of structures. The structure of the word is determined by such a function:

def pattern(st): d={} rv=[] for l in st: if not(l in d): d[l]=len(d) rv.append(d[l]) return tuple(rv)

The index in the dictionary is the structure, and the value is the set of all words (or rather not the words themselves, but their indices in the word_list list) with a similar structure. With the help of this dictionary, you can instantly find all the words in which the number and location of the letters coincides with the given word or pattern:

 for i in pattern_index.get(pattern(u''),set()): print words_list[i],

Caracas Kabakah Cossacks Balaban ordered Bilibin Galagan of income Drum Barash ordered Quail flood of rotors Caracal Cossacks Cilicia Cilicia Orders ordering Belebei rotor argument

2) letters_order_index - the index of the arrangement of letters. The key is the letter and its serial number in the word. Meaning - a set of all words (more precisely ... well, you understand) in which this letter is in a given position.

 for i in letters_order_index.get((1,u''),set()): print words_list[i],

Bykov's sayings of Duc was a trotter to show that the ski looked were quick to breathe a haze a ski rash ousted by release after listening won won a trotting protruding skier Sychev redeemed expressed expulsion to the highest ....

3) letters_existance_index - the index of finding letters in words. The key is the letter. Meaning - a set of all words (more precisely ..., etc.) in which there is this letter.

 for i in letters_existance_index.get((1,u''), set()): print words_list[i],

announced declared united conjugation presented united traveled to travel inedible load-lifting elevated driveways voluminous detour combined inexplicable entry announced volume announced unifying filming filming present entry removed access lenses ...

Since The listed indices contain sets (sets), with them you can easily perform operations on sets - union, intersection, difference, etc.

Now it becomes completely clear how to search for words in the lookup example.

first we find the set of all words whose structure is the same as that of 58033924, then we get its intersection with the set of words whose second letter is “p”, because of what happened we subtract sets of words with letters “x”, "A" and "b". what comes out in the end is the set of words for the substitution x = 7, a = 6, b = 1, p = 8.

And under this is the program code.

 import re, itertools import codecs from collections import defaultdict from copy import copy def pattern(st): d={} rv=[] for l in st: if not(l in d): d[l]=len(d) rv.append(d[l]) return tuple(rv) def substitutions_generator(st): words = re.split('[-*+]',st) letters = list(set(''.join(words))) first_letters = set([w[0] for w in words]) for comb in itertools.combinations(range(10),len(letters)): d = dict(zip(letters,comb)) if not any(d[k] == 0 for k in first_letters): yield d def eval_substitution(st,substitution): reverse_substitution = {} for k in substitution: reverse_substitution[str(substitution[k])] = k st = st.replace(k,str(substitution[k])) result = str(eval(st)) tojd = st + "=" + result forbidden = set([]) #,      substitution for k in reverse_substitution: if not(k in result): forbidden.add(reverse_substitution[k]) else: result = result.replace(k,reverse_substitution[k]) return result,tojd,forbidden def gen_indexes(path, limit = None): freq_dict = {} pattern_index = defaultdict(set) letters_order_index = defaultdict(set) words_list=[] letters_existance_index = defaultdict(set) for i,l in enumerate(codecs.open(path,"r","utf-8-sig")): if limit and i>limit:break w,n=l.split() words_list.append(w) index = len(words_list)-1 freq_dict[index]=int(n) pattern_index[pattern(w)].add(index) for k in list(enumerate(w)): letters_order_index[k].add(index) for l in w: letters_existance_index[l].add(index) return words_list, pattern_index, letters_order_index, letters_existance_index, freq_dict def generate_cryptarithm(st, indexes): words_list, pattern_index, letters_order_index, letters_existance_index, freq_dict = indexes d=defaultdict(list) for sub in substitutions_generator(st): res,tojd,forbidden = eval_substitution(st,sub) cur_indexes=copy(pattern_index.get(pattern(res),set([]))) if not cur_indexes: continue for lk in list(enumerate(res)): if not(lk[1] in '0123456789'): cur_indexes&=letters_order_index.get(lk,set([])) for l in forbidden: cur_indexes-=letters_existance_index[l] if cur_indexes: for w in cur_indexes: d[w].append((sub,tojd)) for k in sorted(d.keys(), key = lambda x:freq_dict[x], reverse = True): if len(d[k]) ==1: tojd=d[k][0][1] print "%s=%s,%s"%(st,words_list[k],tojd)

In order to use this module, you can import two functions from it - gen_indexes and generate_cryptarithm .

Let's generate a decent response to send + more = money:

 # -*- coding: utf-8 -*- from cryptarithm import generate_cryptarithm,gen_indexes indexes = gen_indexes("wiki_freq.txt", 400000) l1=[u'',u'',u'',u''] l2=[u'',u'',u'',u'', u''] for w1 in l1: for w2 in l2: generate_cryptarithm(w1+u'+'+w2,indexes) generate_cryptarithm(w1+u'*'+w2,indexes) generate_cryptarithm(w1+u'-'+w2,indexes)

We get a lot of options, it remains only to view and select.

Personally, I am most sympathetic out-yet = roach, 43198-505 = 42693

The code is distributed under license!

License.
Take who wants. Do what you want - change, distribute, sell. It is not necessary to indicate my authorship. But if you do something interesting with the program or get interesting results as a result of its use, please write a comment.

Source: https://habr.com/ru/post/189050/

All Articles

Cryptarithm generator

More articles: