Using monads in C ++. Part 1: List Monad

Part 1

Part 2

Sometimes C ++ programmers ask for an example of a task that cannot be solved without the use of monads. Let's start with the fact that this question is incorrect in itself - it’s all the same as asking whether there is a task that cannot be solved without cycles. Obviously, if your language has support for the goto statement, you can do without using loop statements. What monads (and loops) can do for you is to simplify your code and help structure it better. Just as the use of loops turns spaghetti code into normal, and using monads can turn your imperative code into declarative. This transformation can help you write, understand, maintain and extend your code more easily.

Well, here's a problem for you that can get caught in an interview. It is not entirely trivial, perhaps several approaches to the solution and the best of them is not immediately obvious - just something that is worth thinking about.

You are invited to the following puzzle:

')

send + more --------- money

Each letter corresponds to a digit from 0 to 9. It is necessary to write a program that selects such matches so that the written addition operation is correct. Before you continue reading the article - think for a moment, how would you solve this problem?

Analysis

It never hurts to impress the interviewer with your knowledge by correctly classifying the problem. This task belongs to the class of " problems of satisfaction of restrictions ". The obvious limitation is that the numbers obtained by substituting digits instead of letters should, when added, give the number corresponding to the given one. There are less obvious limitations, for example, the terms of numbers should not start from scratch.

If you try to solve this problem with a pen and paper, you will quickly find several heuristics. For example, it is immediately clear that m should be equal to 1, because there are no two such four-digit numbers that, if added, would give a number greater than 19998. Then you find out that s should be equal to 8 or 9, because otherwise we will not get transfer of discharge to obtain a five-digit number, etc. Having a sufficient amount of time, you may write an expert system, with a sufficiently large number of rules, able to solve this problem (and maybe others like it). By the way, the mention of the expert system can give you a couple of extra points at the interview.

There is another approach - it is worth noting that the task has a rather small dimension - we work with 4-5 digit integers, which are not so many. The interviewer may ask you to ask for the number of permutations that you will need to sort through - there are 10 of them! / (10-8)! = 1814400 pieces. Just a little for a modern computer. Thus, the solution of the problem is reduced to the generation of these permutations and checking them for compliance with the constraints of the original problem.

The solution "in the forehead"

The classical imperative approach immediately generates a code with 8 nested loops (we have 8 unique letters s, e, n, d, m, o, r, y ). It turns out something like this:

 for (int s = 0; s < 10; ++s) for (int e = 0; e < 10; ++e) for (int n = 0; n < 10; ++n) for (int d = 0; d < 10; ++d) ...

And here we need to check the condition. But not the sum condition, from the original problem - first of all we need to check that all the numbers in our permutation are different, otherwise there is no sense in it.

 e != s n != s && n != e d != s && d != e && d != n

...

and so on, in the last line there will be 7 comparisons, and there will be 28 of them in total. And only after that it is possible to collect from the numbers " send ", " more " and check whether the amount of money has turned out.

In general, the problem is solved and the interviewer will not be able to say what is wrong. But just a minute - can't we think of something better?

Smart decision

Before continuing, I must say that everything that will be written next will be almost a direct translation from Haskell of the program written in the blog Justin Le . I strongly advise everyone to study Haskell and read similar articles in the original, but in this case I will work as a translator.

The problem is in our “frontal” solution in those 28 checks. Yes, our program has earned, but it happened because of its small dimension. There are problems with huge dimensions and a much larger number of conditions, so it makes sense to come up with some general approach to solving them.

The problem can be formulated as a combination of two smaller problems. One of them concerns the depth, and the second - the width of the search for a solution.

Let's first look at the problem of depth. Imagine the task of creating just one substitution of numbers instead of the letters of the original puzzle. This can be described as choosing 8 digits from a list of 0, 1, ... 9, one digit at a time. As soon as the number is selected - we remove it from the list. We do not want to hard-score the list in our code, so we will make it a parameter of our algorithm. Note that with this approach the algorithm will work even for a list with duplicates or for a list in which comparison and equality operators cannot be defined for elements (for example, the list from std :: future ).

Now let's look at the width problem: we need to repeat the above process for all possible permutations. This is exactly what 8 nested loops do in the code above. There is one problem: each step in selecting the next item from the list is destructive - it changes the list. This is a known problem in the search for a solution space and its standard solution is backtracking. Once you have processed a candidate, you put the item back into the list and try the next one. This means that you need to keep track of your current state, either implicitly, storing it in the function call stack, or explicitly in some data structure.

Wait a second! Are we not going to talk about functional programming? So why all this talk about modifications and condition? Well, who said that we cannot have a state in functional programming? Programmers in functional languages have used the state monad since the days when dinosaurs walked the Earth. And modifications are not a problem if you use persistent data structures. So fasten the seat belts and bring the back of the chair into a vertical position, we take off.

List monad

We begin with a brief reference to quantum mechanics. As you can remember from school, quantum processes are not deterministic. You can do the same experiment several times and get different results in each case. There is a very interesting point of view on quantum mechanics, called the " Multi-World Interpretation ", in which each experiment generates several alternative historical lines, in each of which it gives its result, and we just fall into one of these "universes" and observe one specific ( unknown in advance) the outcome of the experiment.

We use the same approach to solving our puzzle. We will create “alternate universes” for each permutation of numbers corresponding to our letters. Thus, we will start with 10 universes for the letter s : in the first it will have the value 0, in the second it will be 1, etc. Then we divide each of these universes by another 10 by the letter e, and so on. Most of these universes will not satisfy our conditions, so we will have to destroy them. Such a simple approach to the destruction of universes may seem wasteful and resource-intensive, but in functional programming this is not a problem: they quickly created an alternate reality and destroyed it just as quickly. This is because new universes are not so different from those on which they were generated and they can share most of the data with their “parents”. This is the idea of persistent data structures. We have an immutable data structure that can generate a new one by cloning. The new data structure shares most of the data with the parent, except for a small “delta”. We will use the persistent lists described here in this post .

Once you become aware of this approach with “multiple universes”, the implementation becomes quite simple. First, we need a function that will create new universes. Since we have agreed that it should be “cheap”, we will only generate parts that are different. What is the difference between all universes generated by variable s ? Only the value of the variable s . There are a total of 10 such universes, corresponding to the values of s from 0 to 9 (the fact that s cannot be equal to 0 will be discussed later). Thus, all we need is a function that generates a list of 10 digits. Well, here they are - our 10 universes in all their pure, pristine beauty.

And here we are in one of these alternative universes (actually in all at once, but right now we are considering getting into one of them). We must somehow live on. In functional programming, your entire remaining life is a call to a function called " continuation ." I know it sounds like a terrible simplification. All your actions, emotions and hopes are reduced to just one function call. Well, let's think that the “continuation” describes only some part of your life in this universe, its “computational” component, and everything else happens somewhere separately.

So, what does our life in this universe come down to and what does it generate? The input is the universe itself, in which we are, in particular for our example, one of the values of s , which we chose when generating this universe. But since we live in the space of quantum experiments, we will have a set of new universes at the output. Thus, the “continuation” receives a number as an input, and generates a list. It is not necessarily a list of numbers, but a list of something that describes the differences of the generated universes from each other.

So what is the point of this new approach? This is a binding choice of the universe to the "continuation". This is where all the action takes place. This binding, again, can be expressed as a function. This is the function that receives the list of universes and the “continuation”, which generates a new list of universes (larger). We will call this function for_each and we will try to write it as generalized as possible. We will not assume anything about the types of universes transferred or generated. We will also make the "continue" type a template parameter and define the type of the return value using auto and decltype :

 template<class A, class F> auto for_each(List<A> lst, F k) -> decltype(k(lst.front())) { using B = decltype(k(lst.front()).front()); // ,    F        , //         ++ static_assert(std::is_convertible< F, std::function<List<B>(A)>>::value, "for_each requires a function type List<B>(A)"); List<List<B>> lstLst = fmap(k, lst); return concatAll(lstLst); }

The fmap function is similar to the standard std :: transform — it applies a “continuation” k to each element of the lst list. Since k itself returns a list, the result is a list of lists, lstLst . The concatAll function combines all the elements of all these lists into one large list.

My congratulations! You just looked at the monad. It is called the list monad and can be used, in particular, for modeling non-deterministic processes. A monad is actually defined by two functions. One of them is the above for_each , and here is the second:

 template<class A> List<A> yield(A a) { return List<A> (a); }

This is the yield function, which returns a list of one item. We use yield at that stage, when we finish “multiplication of the universes” and reach the moment when we want to return one value. We use it to create a “continuation” containing only one value. It is a lonely boring life, fully predetermined and devoid of any choice.

Later I will rename these functions to mbind and mreturn , since they define the monad in general, and not just the list monad.

Names like for_each and yield have a rather “imperative” sound. This is, of course, because in the functional programming of the monad, in a certain sense, they play the role of an imperative code. But neither for_each nor yield are data structures — they are functions. More precisely, for_each , which sounds and works like a loop, is a higher order function, like fmap , which is used in its implementation. Of course, at some level the code will become imperative - the same fmap can be written recursively or using a cycle, but at the top level we have only the function declarations. Well, here it is - declarative programming!

There is a significant difference between the loop and the function, like for_each : for_each takes the entire list as an argument, while the loop can generate the necessary values (in our case, integers) on the fly. This is not a problem in a functional language with the support of lazy calculations, like Haskell - the list will be calculated as needed. Similar behavior can be implemented in C ++ using threads or lazy iterators. I will not do it here because the lists we deal with in this task are relatively short, but you can read more about this approach here in this article .

We are not yet ready to write a complete solution to our puzzle, but I will show you a sketch of how it will look. Imagine that StateL is just a list. Look at the meaning of the overall picture:

 StateL<tuple<int, int, int>> solve() { StateL<int> sel = &select<int>; return for_each(sel, [=](int s) { return for_each(sel, [=](int e) { return for_each(sel, [=](int n) { return for_each(sel, [=](int d) { return for_each(sel, [=](int m) { return for_each(sel, [=](int o) { return for_each(sel, [=](int r) { return for_each(sel, [=](int y) { return yield_if(s != 0 && m != 0, [=]() { int send = asNumber(vector{s, e, n, d}); int more = asNumber(vector{m, o, r, e}); int money = asNumber(vector{m, o, n, e, y}); return yield_if(send + more == money, [=]() { return yield(make_tuple(send, more, money)); }); }); });});});});});});});}); }

The first call for_each takes a sample of integers, sel (until we think about their uniqueness), and “continuation” is a lambda function that accepts one integer, s and generates a list of solutions — tuples of three integers. This “continuation”, in turn, calls for_each with a selection for the next letter, e, etc.

The innermost “continuation” is the conditional version of the yield function, called yield_if . It checks the condition and generates either an empty list or a list consisting of one element - a solution. Inside, it once again calls yield_if , which causes a final yield . In this final yield , when it is called (or will not, if all the higher conditions did not work), a solution will be generated - a triple of numbers that satisfies the condition of the original puzzle. If there is more than one solution, all of them will be added to the result list created inside for_each .

In the second part of this series of articles, I will return to the problem of choosing unique numbers and consider the state monad. You can also take a look at the final github code.

Task for independent work

Implement for_each and yield for a vector instead of a list. Use std :: transform instead of fmap
Using the list monad (or the “vector” modification implemented in the previous step), write a function that will generate the names of all the cells of the chessboard (sets of pairs from 'a' to 'h' are numbers from 1 to 8)
Implement the version of the for_each function (let's call it repeat ), which will take the “continuation” k of the function <List <B> ()> type (note the absence of arguments). The function must call k successively for each item in the lst list.

Source: https://habr.com/ru/post/260809/

All Articles