Cofree Will Tear Us Apart

Hello.

Recently, I have been working with distributed systems and I often encounter problems working with data, parts of which may be located in different places. Well, since I’ve been writing Haskell for a long time, the description of the problem and the powerful type system have helped a lot in the further development of this idea. It will be a question of how only one carrier algebraic construction allowed to solve the problem of recycling data in general form.

Introduction and Motivation

In the course of the narration, we will meet three main points that will give an idea of the desired result.

Imagine that you need to write a program that works with some data. If there is too much data, then you will have to think about organization and storage - because storing them in RAM is too unreliable, there may not be enough space or you will completely lose them during the program crash.

Use the database? What kind? Relational, document-oriented, key-value storages or something simpler - just write to a file? Well, each of these options has its own advantages and disadvantages.

Wherever we store them, in order to do something with this data that goes beyond the possibilities of our chosen method, in any case we need to load them back into RAM. Of course, only some part, not everything, for the reasons described above.

Thesis 1. We do not need to store all the data in memory, but only some part.

And the truth is, when we work with databases, we write requests for which we can receive this data into memory. Record in a database is after all only a slice from all our general file with information. In the case of relational, these are entries in tables. In the case of key-value stores, the key-value pair.

When you write applications for the real world, you have to adjust the data organization scheme from your subject area to the limitations of the method you choose. This will depend on the degree of connectivity, aspects of performance, and many other factors.

Thesis 2. We want to abstract from the method of storing and processing information.

Based on the two previous properties, we need a way to separate the data that we are working with and that are actually in memory, and data that are conserved somewhere on the hard disk. We need a way to separate them.

How are we going to separate them? We need a structure that allows you to safely share and merge data. Let's think about this topic?

Lists? It is possible, but there are problems. The cost of accessing an arbitrary element is O (n). The cost of combining the two lists is the same.
Any trees? If it is binary, then the cost of access in good case is reduced to O (log n). But we will not always be required to store sorted data. Too special case, does not suit us.
Arrays? Access cost - O (1). But also a special case, just run into other problems - this structure is not algebraic at all.

Thesis 3. We need a supporting structure that can cover many other data structures in its description.

And there is such a structure!

Cofree bearing structure

Many Haskell programmers are familiar with the Free type. But for some reason its duality, Cofree , is not paid much attention. And the difference between them is one detail: Free type is the sum of some and t (Free ta) , and Cofree is the product:

 Free ta = a + t (Free ta) Cofree ta = a × t (Cofree ta)

This means that if we choose Cofree as our supporting structure, the data structure defined through the latter will have several features:

This structure will always be non-empty, it will always have at least one element.
Since Cofree also has an instance of the Comonad type class, we already have many useful methods for free:
- extract - Get the value that is in focus.
- extend - Update values in the whole structure depending on the context.
- unwrap - Get the multiplier of a work, a segment of information.
- coiter - Generate structure from initial value.

So how are we going to collect the various data structures using Cofree? We just need to instantiate type t in Cofree ta , which has an instance of the Functor type class.

Imagine that we need a stack or a non-empty list — a simple data structure. Here Maybe , in this case, the constructors of the latter play the role of a generator - Just allow you to continue to describe the structure, and Nothing is a terminating invariant:

 data Maybe a = Just a | Nothing type Stack = Cofree Maybe example :: Stack Int example = 1 :< Just (2 :< Just (3 :< Nothing))

Auxiliary Shape Design

Well, we figured out how to describe the data structures on the Cofree . We started this conversation to find a way to separate data from the point of view of types located in different places. To do this, we will provide Cofree with another construction:

 data Shape t raw value = Ready (t value) -- ^      | Converted raw -- ^ C  -    data Apart t raw value = Apart (Cofree (Shape t raw) value)

And we get a wonderful type of Apart, which will control which part of the data is located.

Example of using Apart

And now let's create an illustrated example. Imagine that we want to work with a binary tree. How can we describe it through Cofree ? Through the "branch functor". A tree node may have no children, a left child, a right child, or both. Let's just right and encode:

 data Crotch a = End | Less a | Greater a | Crotch aa type Binary = Cofree Crotch

Great, now we can write a value for this type; let's take an example of a binary search tree from a Wikipedia article of the same name:

 example :: Binary Int example = 8 :< Crotch (3:< Crotch (1 :< End) (6 :< Crotch (4 :< End) (7 :< End))) (10 :< Greater (14 :< Less (13 :< End)))

Let's try the first combinator - limit , it will allow us to trim our tree by height:

 limit 3 do_something_with_the_rest example

I deliberately ignored the method of saving, so as not to focus on this - we can store segments that are not in the range in the file and the do_something_with_rest function can return the file name and line number to us. Or even put in Redis / Memcashed / Tarantool and return the connection parameters and key for the saved segment. In general, not so important.

 scattered :: Scattered Binary Int _ scattered = 8 :< Crotch (3 :< Crotch (1 :< {RESTORE_INFO}) (6 :< {RESTORE_INFO})) (10 :< Greater (14 :< {RESTORE_INFO}))

And that's what remains of our tree - it was cut off in height. But the information for recovery remained in place of the missing three segments. The view above actually hides the Ready constructor, and Converted replaced with curly braces (thanks to the tricky instance of the Show class).

With the help of the recover combinator, we can return the entire data structure to memory:

 recover back_to_RAM scattered

Or even go with the effect on the scattered data structure, while restoring the segments in the memory and applying the same function to them as to the values in the memory.

 fluent do_something_whereever_they_are scattered

As a conclusion

This is how algebraic data structures and concepts from category theory allowed to first describe and then solve the problem of recycling data in the most general form.

The ideas described above were implemented in the library , which is not yet available on Hackage, but is in the phase of active development.

At the moment I was able to describe the directed acyclic graph, binary, prefix, pink, AVL-trees and some separate functions for working with them.

The idea of using Cofree as a supporting structure for other data structures was picked up by me from the description to the Control.Comonad.Cofree module in the free package of Edward Kmett.

The idea of algebraic graph description was used here from the work of Andrei Mokhov .

The plans also remain:

Implement finger, Splay-trees and other structures more difficult.
Write more functions to work with them (insert, delete, balance, etc.).
Natural transformations between them (since the functor just determines the features of a separate structure).
Optical interfaces for working with the internal structures.
To study ways of compatibility of combinators with streaming libraries ( pipes , conduit , io-streams , machines ).
Write complete tests of the properties of individual data structures.
Benchmarks, primarily with popular containers .

Write in the comments what data structures you would like to use in this way and which combinators could be useful in everyday practice. I would welcome any comments and criticism.

Thanks for attention.

Source: https://habr.com/ru/post/358976/

All Articles