📜 ⬆️ ⬇️

Catamorphism in F #

Introduction


I’ll mention right away that this article was written based on a whole series of posts on the excellent blog Inside F # . Nevertheless, it is not a translation in its pure form, but rather a free statement, in order to explain in accessible language what kind of animal this catamorphism is, and what it is eaten with. I think this word is not very popular, which is at least the fact that there are no articles on it in the Russian Wikipedia (and in general not in one national, except for some reason the Dutch one. Probably the OP somehow corresponds to the spirit of anger smoking)
So, strictly speaking, catamorphism in functional programming is a generalization of the convolution of lists, which (as I have already said ) are a specific type of labeled union, to arbitrary labeled associations.

Roll-up lists


Let's start in order - with the convolution of lists. Actually, we already know what it is and how to use it, but in order to extend it to other data types, we still need to understand how it is implemented.
Here we have a task - to sum up the elements of the list. And as you know , there is no sex in the USSR . (Of course there is, but we will not tell anyone about this). Variant List.fold_left (+) 0 is officially declared cheat. What comes to mind? Well, like this (in the style of beloved teachers, examples of recursion for Fibonacci or factorial):
let rec sum list =
match list with
|[] -> 0
|head::tail -> head + sum tail

No friends, this is certainly a solution, but let's be honest with ourselves - there is nothing to brag about. Because with a cycle length, say, a million, the compiler will give us a System. StackOverflowException, and it will be a thousand times right — you can't mock it like that. Ok, rewrite this case as tail recursion:
let sum_tail list =
let rec loop list acc =
match list with
|[] -> acc
|head::tail -> loop tail (acc+head)
loop list 0

Here we have all the calculations immediately, so dragging the whole tail along doesn’t have any need for the compiler, which always makes him happy. Well, well, what if we need to say to find the length of the list (again without the List.length method)? No problem.
let rec length list =
let rec loop list acc =
match list with
|[] -> acc
|head::tail -> loop tail (acc+1)
loop list 0

I think that even the most observant reader can see the similarity of these two algorithms. The only difference is in the way the battery value is processed. In the first case, for each element under consideration (the head of the remaining piece of the list) we add its value to the battery, in the second - just one. What is it really? Nothing more than a function of 'a -> ' b -> 'a, where' a is a battery type, 'b is a type of a list item.
fun acc h -> acc + h (or just (+)) for the first, fun acc h -> acc + 1 for the second. Convolution of a list is just a higher-order function that applies such a function to all elements of the list in order to get some kind of atomic value. Here's what it looks like:
// ('a -> 'b -> 'a) -> 'a -> list<'b> -> 'a
let rec fold func acc list =
match list with
|[] -> acc
|head::tail -> fold func (func acc head) tail

And it is obvious that:
let sum_tail = fold (+) 0
let length = fold ( fun acc _ -> acc + 1) 0

By the way, you should not be so formal with the words that convolution should return an atomic value. See what this feature does?
let reverse = fold ( fun acc h -> h::acc) []

I think you guessed it turns the list over. That is, its result is also a list - such is the atomic value itself.
Well, well, sort of dealt with the convolution. Not really, really. After all, this is our so-called left-associative convolution, that is, we look at the elements and perform the coagulation function on them from head to tail. So (f - coagulation function): f (... f (acc i0) i1) i2) ... ik) And how would we make a right associative function, so that it would be like this: f i0 (f ... (f ik acc)) )) (Why? Because it will be very convenient for her to spread our convolution to other data types)
We write the function by analogy with the left associative convolution:
let rec fold_right func acc list =
match list with
|[] -> acc
|head::tail -> func head (fold_right func acc tail)

and note that we now have recursion inside the function calculation, so the compiler will have to drag it to the stack, in short, forgive-bye tail recursion, hello inevitable stack overflow. To avoid this shame, we need to somehow get to the very end of the list, while remembering a sequence of elements in reverse order to collapse them. But not in the stack, of course, to do this, as in the above example, but in some data structure. The simplest way is obvious - expand the list, and then apply a left-associative convolution on it. In this case, our supporting structure will be an inverted list. Everything is simple and obvious.
However, we will go the other way. As an auxiliary structure, we will use the continuation function. What it is? - This is a function that contains all the necessary course of calculations, but does not perform, notice, the calculations themselves, until we specifically point out to her. Here we want to get such a function: cont x = f i0 (f ... (f ik x)))), everything is strictly according to the definition. At the right moment, it will take as a parameter the initial value of the battery and will count everything at once, as ordered. A great feature, isn't it? It remains to receive it:
let fold_right func acc list =
let rec loop list cont = //
match list with
|[] -> cont acc // .
|head::tail -> loop tail ( fun racc -> cont (func head racc))
loop list ( fun x -> x)

Notice that the function has now become tail-recursive again, all calculations are performed immediately, and the next recursion step is passed to their result — the updated continuation function. And yet, throughout the work of the function, until the list is exhausted, acc is equal to the initial value, that is, it is not worthy of being called a battery. Rather, it is init_value. And the continuation function itself changes from step to step like this:
0: x -> x
1: x -> f i0 x
2: x -> f i0 (f i1 x)
3: x -> f i0 (f i1 (f i2 x))

I think further to paint a course of calculations is not necessary. Substituting the initial value instead of x, we get exactly what we wanted.
Well, we figured out the convolutions of the lists. True, it is not clear yet what the garden was for, but soon you will understand everything.

Convolution of trees


So, as mentioned in the introduction, cathamorphism is a generalization of the convolution of lists to any algebraic types, or labeled unions, as they are called in F #. (As you remember, the list is also a marked association)
And now we will consider the marked association:
type Tree<'a> =
| Node of 'a * Tree<'a> * Tree<'a>
| Leaf

It’s not hard to guess, we have a binary tree that has intelligent nodes (the tuple that describes it is a value and two branches), and stub sheets, the use of which in the node tuple simply means that it doesn’t have this branch .
Here is an example of such a tree:
let tree = Node(4, Node(2,Node(1, Leaf,Leaf),Node(3,Leaf,Leaf)),Node(6,Node(5, Leaf,Leaf),Node(7,Leaf,Leaf)))

And now we really want to perform some quite vital operations on this tree: to find the sum of all values, or to find the height, well, it would be useful to stretch the list. The naive recursive solutions of these problems are as indigestible as the ones on the lists.
For example, the simplest way to decompose into a list
let rec to_list tree =
match tree with
|Node(v, ltree, rtree) -> (to_list ltree)@[v]@(to_list rtree)
|Leaf -> []

This method is not only that again can lead to stack overflow, it also uses the concatenation function of lists, which, let's say in secret, processes them very inefficiently. So we do not go.
But it’s not for nothing that we, in the end, suffered for so long until we wrote fold_right using the continuation function. It was no accident. This method is very well applicable to wood.
Note: And now, if you have drugs that expand consciousness in your home medicine cabinet, you do not take it for work, go and drink a pill. May be useful.
What is the difference between a tree and a list? There was simply a second branch for each node. That is, at each step of our convolution we will have not one, but two tails, and therefore the convolution function should have the form: 'b -> ' a -> 'a -> ' a, where the second and third arguments denote it batteries for the left and right tails. Such will be the convolution function for summing and searching for height:

fun x left right -> x + left + right
fun _ left right -> 1 + max left right

Let's try. Since we have two possibilities at each stage to continue viewing, the nested loop should be double. How to do it in relation to our function of continuation? - very simple:

let FoldTree treeF leafV tree =
let rec loop tree cont =
match tree with
|Node (val, left, right) -> loop left ( fun lacc ->
loop right ( fun racc ->
cont (treeF val lacc racc)))
|Leaf -> cont leafV
loop tree ( fun x -> x)

As you can see, here, in fact, at each step there are two continuation functions - one accumulates the value for the left subtree, the second, the inner one for the right, after which all this is folded using the treeF function. When we hit the leaf, we apply the function we have accumulated for the current subtree to the initial value corresponding to the leaves - leafV.
It can be noted that while we are going through the values ​​of the left subtree, the continuation function will accumulate recursive calls to the loop for the right subtrees, but it’s in the function and not in the stack, so from the point of view of the tails, everything is fine.
Here’s how it looks for our tree:
four
2 6
1 3 5 7
4: x -> x
2: x -> loop (6,5,7) (y -> treeF 4 xy)
1: x -> loop (3) (y -> loop (6,5,7) (z -> treeF 4 (treeF 2 xy) z))
Ll: x -> loop Lr (y -> loop (3) (z -> loop (6,5,7) (q -> treeF 4 (treeF 2 (treeF 1 xy) z) q)))

Here Ll is the left leaf, Lr is the right one. Once on the leaf, we must apply the function to the initial value, which means we must also execute the outermost loop.
Lr: y -> loop (3) (z -> loop (6,5,7) (q -> treeF 4 (treeF 2 (treeF 1 leafV y) z) q))

Perform the same operation for the right sheet and we get:
3: z -> loop (6,5,7) (q -> treeF 4 (treeF 2 (treeF 1 leafV leafV) z) q)

Notice that now (treeF 1 leafV leafV) is no longer a function, but a value, i.e. for the leftmost tree (the one that is just 1), the fold has already been produced. Then everything happens in the same way, I think the reader can imagine how.
Now our desired operations are as follows:
let SumTree = FoldTree ( fun x left right -> x + left + right) 0
let HeightTree = FoldTree ( fun _ left right -> 1 + max left right) 0
let Tree2List = FoldTree ( fun x left right -> left @ [x] @ right)

In fact, the problem of concatenation in the last function is preserved, but, although it is solvable, we will not talk about this decision now, so as not to delve even more into the not so friendly jungle of AF.

Convolution on generalized tagged unions


Finally, we consider catamorphism on another curious type of labeled associations, which, relatively speaking, defines a programming language:
type Op =
|Plus
|Minus
override this.ToString() =
match this with
|Plus -> "+"
|Minus -> "-"

type Expr =
|Literal of int
|BinaryOp of Expr*Op*Expr
|IfThenElse of Expr*Expr*Expr

In this case, it is a very simple language for calculating mathematical expressions, provided with an additional conditional operator (for simplicity, we assume that any non-zero value of the expression in the condition corresponds to true). An example of such an expression:
let expr = IfThenElse (Literal 1, BinaryOp (Literal 12, Minus, Literal 10), Literal 32)

What would we like to do with this expression? Well, for example, bring it in a digestible form:
if 1 then (12 - 10) else 32 endif

and also - actually calculate the result, here it will be 2. What will help us to simultaneously solve these two seemingly not similar tasks? - right, katamorfizm. I think, after the example on the trees, it will not be difficult to write a convolution for this type. In addition to the expression itself, we need to have three more argument functions — one for each type of expression. We write the convolution function for it:
let FoldExpr funL funB funIf expr =
let rec loop expr cont =
match expr with
|Literal x -> cont (funL x)
|BinaryOp (left,op,right) -> loop left ( fun lacc ->
loop right ( fun racc ->
cont (funB lacc op racc)))
|IfThenElse (condExp,thenExp,elseExp) -> loop condExp ( fun cacc ->
loop thenExp ( fun tacc ->
loop elseExp ( fun eacc ->
cont (funIf cacc tacc eacc))))
loop e ( fun x -> x)

Look, absolutely nothing new compared to trees, except when considering IfThenElse we already have three nested loops, but this is not surprising, because we need to expand both the condition and the two possible branches of the continuation. Functions funL, funB, funIf are used to process literals, binary operations and conditional operators, respectively.
Now we can calmly write both functions we need. So we will output an expression in a string (by the way, notice that we have not for nothing redefined the ToString () method in the Op type:
let Printer =
FoldExpr ( fun x -> sprintf "%d" x) //
( fun l op r -> sprintf "(%s %s %s)" l (op.ToString()) r) //
( fun cte -> sprintf "if %s then %s else %s endif" cte) //

And now the “compiler” itself:
let Eval =
FoldExpr ( fun x -> x)
( fun l op r -> match op with |Plus -> l + r |Minus -> l - r)
( fun cte -> if c > 0 then t else e)

As you can see, the processing functions are very simple and intuitive.
Of course, the Expr type itself is not very complicated in this example, but in fact with the help of labeled associations you can implement quite sophisticated “YAP”, well, not only “YAP”, of course. And the convolution on such a marked union will most likely be an indispensable tool for all occasions. Such is the catamorphism.
PS I don’t know if it is worth transferring to a collective blog, this is a specific topic.
UPD: I chose where to transfer for a long time, I decided to .Net as the most neutral.

')

Source: https://habr.com/ru/post/57503/


All Articles