Categories

Logging composition

You have seen how to make categories of types and pure functions. I also mentioned that there is a way to model side effects, or impure functions, within the framework of category theory. Let's look at an example: functions that log or record their progress.

Something that in an imperative language is likely to be realized through the mutation of some global state, for example:

string logger; bool negate(bool b) { logger += "Not so! "; return !b; }

You know that this is not a pure function, because its version with caching the results will not be able to write to the log. This feature has side effects.

In modern programming, we try to stay away from the global changeable state, while it is possible - at least, because of the difficulties with parallelism. And you never write similar code to libraries.
')
Fortunately for us, you can make this feature clean. You just need to pass the log explicitly to and from the function. Let's do this by adding a string argument, and returning the pair from the main result and the line containing the updated log:

 pair<bool, string> negate(bool b, string logger) { return make_pair(!b, logger + "Not so! "); }

This function is clean, it has no side effects, it returns the same pair every time it is called with the same arguments, and its results can be cached if necessary. However, given the cumulative nature of the log, you will have to cache all possible stories that may lead to this challenge. There will be a separate cached entry for:

 negate(true, "It was the best of times. ");

and

 negate(true, "It was the worst of times. ");

and so on.

This is not a very good interface to the library function. Users can freely ignore the string in the return value, so there is a small additional complexity, but they have to pass the string as input, which can be inconvenient.

Is there a way to do the same thing less intrusive? Is there a way to share the problems? In this simple example, the main purpose of the negate function is to turn one Boolean into another. Logging is secondary. Of course, a message that is logged is function-specific, but aggregating messages into one continuous log is a separate task. We still want the function to return a string, but we would like to free it from the task of creating a log. Here is a compromise solution:

 pair<bool, string> negate(bool b) { return make_pair(!b, "Not so! "); }

The idea is that the log will be aggregated between function calls.

To see how this can be done, let's move on to a slightly more realistic example. We have one function from string to string that turns lower case letters into upper case:

 string toUpper(string s) { string result; int (*toupperp)(int) = &toupper; // toupper is overloaded transform(begin(s), end(s), back_inserter(result), toupperp); return result; }

and another one, which breaks a line into a vector of lines, breaking it up by spaces:

 vector<string> toWords(string s) { return words(s); }

The main work is done in the auxiliary function words:

 vector<string> words(string s) { vector<string> result{""}; for (auto i = begin(s); i != end(s); ++i) { if (isspace(*i)) result.push_back(""); else result.back() += *i; } return result; }

We want to change the toUpper and toWords functions so that they cling the message string on top of their normal return values.

We will "enrich" the return values of these functions. Let's do it in a general way by defining a Writer template that encapsulates a pair, the first component of which is a value of an arbitrary type A, and the second component is a string:

 template<class A> using Writer = pair<A, string>;

Here are the enriched functions:

 Writer<string> toUpper(string s) { string result; int (*toupperp)(int) = &toupper; transform(begin(s), end(s), back_inserter(result), toupperp); return make_pair(result, "toUpper "); } Writer<vector<string>> toWords(string s) { return make_pair(words(s), "toWords "); }

We want to group these functions into another rich function that raises the string to upper case and breaks it into words, while logging these actions. Here's how we can do it:

 Writer<vector<string>> process(string s) { auto p1 = toUpper(s); auto p2 = toWords(p1.first); return make_pair(p2.first, p1.second + p2.second); }

We have achieved our goal: merging a journal is no longer a concern for individual functions. They produce their own messages, which are then, outside, collected into a whole log.

Now imagine the whole program written in this style. This is a nightmare of repetitive and error-prone code. But we are programmers. We know how to deal with duplicate code: we abstract it! These, however, are not your usual abstractions: we must abstract the very composition of functions. But composition is the essence of category theory, so before writing code further, let's analyze this problem from a categorical point of view.

Category Writer

The idea of enriching return types of functions in order to attach some additional functionality turns out to be very useful. We will see many more such examples. The starting point is our usual category of types and functions. We leave the types as objects, but redefine morphisms as enriched functions.

For example, suppose we want to enrich the isEven function, which converts an int into a bool. We turn it into a morphism, which is represented by an enriched function. It is important that this morphism is still considered an arrow between int and bool objects, although the decorated function returns a pair:

 pair<bool, string> isEven(int n) { return make_pair(n % 2 == 0, "isEven "); }

According to the laws of the category, we must be able to combine this morphism with another morphism, which goes from bool to anything. In particular, we should be able to combine it with our previous negate function:

 pair<bool, string> negate(bool b) { return make_pair(!b, "Not so! "); }

Obviously, we cannot compose these two morphisms in the same way as we compose the usual functions, due to the mismatch of input / output. Their composition should look like this:

 pair<bool, string> isOdd(int n) { pair<bool, string> p1 = isEven(n); pair<bool, string> p2 = negate(p1.first); return make_pair(p2.first, p1.second + p2.second); }

So, here is the recipe for the composition of two morphisms in this new category that we are building:

Perform enriched function corresponding to the first morphism.
Extract the first component of the result pair and pass it to an enriched function corresponding to the second morphism.
Connect the second component (line) of the first result and the second component (also line) of the second result
Return the new pair that combines the first component of the final result with the concatenated string.

If we want to abstract this composition as a higher order function in C ++, we must use a template parameterized by three types, corresponding to the three objects in our category. You should take two enriched functions that are compiled according to our rules, and return the third enriched function:

 template<class A, class B, class C> function<Writer<C>(A)> compose(function<Writer<B>(A)> m1, function<Writer<C>(B)> m2) { return [m1, m2](A x) { auto p1 = m1(x); auto p2 = m2(p1.first); return make_pair(p2.first, p1.second + p2.second); }; }

Now we can return to our example and implement the composition toUpper and toWords using this template:

 Writer<vector<string>> process(string s) { return compose<string, string, vector<string>>(toUpper, toWords)(s); }

There is still a lot of noise with the transfer of types to the compose template. This can be avoided if as long as you have a C ++ 14-compatible compiler that supports generic lambda functions with return type deduction (thanks to Eric Niebler for the code):

 auto const compose = [](auto m1, auto m2) { return [m1, m2](auto x) { auto p1 = m1(x); auto p2 = m2(p1.first); return make_pair(p2.first, p1.second + p2.second); }; };

In this new definition, the implementation process is simplified:

 Writer<vector<string>> process(string s){ return compose(toUpper, toWords)(s); }

But we are not done yet. We defined the composition in our new category, but what are the single morphisms? These are not our regular identity functions! They must be morphisms from type A back to type A, which means they are enriched functions of the form:

 Writer<A> identity(A);

They should behave as units in relation to the composition. If you look at our definition of composition, you will see that the identical morphism, must pass its argument without change, but only add an empty line to the log:

 template<class A> Writer<A> identity(A x) { return make_pair(x, ""); }

You can easily make sure that the category we have just defined is a truly legitimate category. In particular, our composition is trivially associative. If you trace what happens to the first component of each pair, you will see that it is just a simple composition of functions that is associative. The second components are connected, and concatenation is also associative.

An astute reader may notice that one can easily generalize this construction to any monoid, and not just strings. We would like to use mappend inside compose and mempty inside identity (in place of + and ""). Indeed, there is no reason to limit yourself to logging only strings. A good library writer should be able to determine the necessary minimum of restrictions that are necessary for a library to work - here the only requirement of a logging library is that the log has monoidal properties.

Haskell Writer

The same construction on Haskell will be much more concise, and the compiler will help us a lot more. Let's start by defining the type of Writer:

 type Writer a = (a, String)

Here I simply define a type alias, the equivalent of typedef (or using) in C ++. The Writer type is parameterized by a type variable and is equivalent to a pair of a and String. The syntax for pairs is minimal: there are only two names in brackets, separated by commas.

Our morphisms are functions from an arbitrary type to a specific type of Writer:

 a -> Writer b

We will declare the composition as a funny infix operator, sometimes called “fish”:

 (>=>) :: (a -> Writer b) -> (b -> Writer c) -> (a -> Writer c)

This is a function of two arguments, each of which is itself a function, and it returns a function. The first argument is of type (a -> Writer b), the second (b -> Writer c) and result (a -> Writer c).

Here is the definition of this infix operator - two arguments m1 and m2, appear on both sides of the fish symbol:

 m1 >=> m2 = \x -> let (y, s1) = m1 x (z, s2) = m2 y in (z, s1 ++ s2)

The result is a lambda function of one argument x. Lambda is written as a backslash - you can think of it as the Greek letter λ with an amputated leg.

The word let allows you to declare auxiliary variables. Here the result of calling m1 is matched with a pair of variables (y, s1), and the result of calling m2 with the argument y from the first pattern will be matched with (z, s2).

In Haskell, matching pairs is a common substitute for using getters, as we did in C ++. In addition, there is a fairly simple correspondence between the two implementations.

The value of the let expression is contained after in: here is a pair, whose first component is z, and the second component is the union of two strings, s1 ++ s2.

I will also define a unit morphism for our category, but for reasons that will become clear much later, I will call it return.

 return :: a -> Writer a return x = (x, "")

For completeness, let's write the Haskell version, enriched with upCase functions (approx. Translator: meant toUpper from C ++ example, but a function with this name already exists in the standard Prelude module) and toWords:

 upCase :: String -> Writer String upCase s = (map toUpper s, "upCase ") toWords :: String -> Writer [String] toWords s = (words s, "toWords ")

The map function corresponds to the transform function in C ++. It applies the function on the toUpper characters to the string s. The helper function words is defined in the standard Prelude library.

Finally, the composition of these two functions is built using the fish operator:

 process :: String -> Writer [String] process = upCase >=> toWords

Categories

Logging composition

Category Writer

Haskell Writer

Categories

More articles: