Scala type classes (with a little overview of the cats library)

When the word "polymorphism" immediately recall object-oriented programming, in which polymorphism is one of the pillars ( Polymorphism for beginners ). (And, apparently, more important than the other pillars.) It turns out that you can achieve a similar effect in a different way, which in some cases is more preferable. For example, using classes of types, you can assign new features to already existing types that cannot change the ancestor, or, using a data type with incompatible classes, "solve" the problem of multiple inheritance.

On Habré there are already several publications that give an idea of the type classes:

@IvanGolovach Developing → "FP on Scala: What is a functor?" - 2015.

Here types of classes are affected when considering functors. During the review, several examples of type classes are given.
Mikhail Potanin @potan Development → "Type Classes in C ++" - 2013.

This publication implements type classes in C ++. Apparently, it is assumed that the reader is to some extent already familiar with type classes, so not much is said about the type classes themselves.
@VoidEx Development → "Classes of types, monads" - 2009.

Classes of types in Haskell are described (with examples of C ++ implementation).
@IvanP Development → "Polymorphia and classes in Haskell" - 2013.

Parametric and ad-hoc polymorphism, type classes and several standard classes are described. All description is made for Haskell.
update: @vuspenskiy Development → Type classes in Scala - 2012.

A brief description of how to use type classes through implicits is exemplified by the Comparator[T] .

In this post I would like to reflect the view on the classes of types as one of the everyday tools of programmers. Moreover, if Haskell doesn’t do without them at all, then in Scala you can ~~perfectly~~ tolerably live, not knowing about their existence. However, upon closer inspection, it turns out that such a tool is very useful when writing reusable programs. In addition, there are a number of universal libraries that are based on this tool, and to use them, you also need to understand the types of classes.

Immutable data types

In functional programming, immutable data types have gained wide popularity. Since the data cannot arbitrarily change, there is no reason to hide them, and instead of hiding the data, now open types are used, where the data is public. (Thus, among the three pillars of OOP — polymorphism, inheritance, and encapsulation — one is somewhat pushed aside.)

Freely available data allows the use of external processing algorithms. There is no need to bind algorithms to the object itself and to overcome artificial barriers between objects if the processing algorithm uses several objects of different types.

It turns out that a significant part of the set of data structures can be modeled using only two mechanisms ( Algebraic data types ). First, the creation of records or tuples ("type-product"). Secondly, the creation of alternative implementations of the same parent type - enum, interface inheritance, sealed trait - "type-sum".

Example:

 // -   : sealed trait Form object Form { // - String X String case class Form1(number: String, title: String) extends Form // - UUID X String X Int case class Form2(id: UUID, documentation: String, count: Int) extends Form // == Unit (    ) case object BadForm extends Form }

As you can see, such algebraic data types do not imply data hiding, and the presence of any built-in methods. All data is directly accessible, and we can process these types of external algorithms, for example, such:

generate an HTML representation for the user
perform validation by business rules
serialize / deserialize
calculate some metrics
...

All these types of processing are implemented separately and do not have to know anything about each other. In such algorithms, pattern-matching is the main way of processing different subtypes with access to real data. Using pattern-matching, we simultaneously check which particular subtype an object is, and extract the values of the fields of interest.

Placing algorithms outside specific types has the following advantages:

The logic of the algorithm is not spread over each subtype, but is localized in a separate module.
The logic of one processing method is not intermingled with other processing methods within each data class. Simplified support for different algorithms by different developers.
There is no need to add a dependency on the DBMS library to the module in which the data model is declared.
Easily add new processing methods to existing data types. They are added "orthogonally" in independent modules.

Type Classes

Suppose we implemented some algorithm outside our data types. If this algorithm directly uses our types, then we will not be able to reuse it for other similar data. This, on the one hand, is not bad, since such an algorithm is easier to write, but, on the other hand, its generality is limited. This means, in general, that the algorithm will be used less frequently, and, apparently, it will be worse tested (with the same total economic costs), or the cost of support will be higher.

Therefore, I would like to have mechanisms that allow us to generalize our algorithm to other data types (existing and prospective). This will allow using the same algorithm in many cases and will recoup the costs of its development and testing.

OOP proposes to allocate a common interface for "similar" data and implement the algorithm in terms of this common interface. In concrete classes that inherit this interface, we just need to implement these common methods. Thus, we obtain to some extent a polymorphic algorithm. However, these operations included in the interface of "similar" data, we need to implement in the data itself.

Type classes represent the next step in isolating code that plays different roles in a program. The operations we want to perform on the data are moved to a separate class that is not an ancestor of the data. An instance of this class for this data type is passed to the algorithm along with the data.

Example:

Suppose that in the algorithm of interest to us we use data comparison in order. Such a comparison can be represented by a function.

 def compare[T](a: T, b: T): Int

Put this function in the class of the Ordering type:

 trait Ordering[T] { def compare(a: T, b: T): Int }

Now let our universal algorithm do the sorting. It must accept the type of data it works with:

 def sort[T](list: List[T]): List[T]

since elements are compared within the algorithm, an algorithm must be passed to the algorithm for our type T :

 def sort[T: Ordering](list: List[T]): List[T]

either, which is the same:

 def sort[T](list: List[T])(implicit o: Ordering[T]): List[T]

The algorithm, if it is necessary to invoke the compare operation, should obtain an instance of the type class using implicitly[Ordering[T]].compare(a,b) .

We only need to provide an instance of the type class for our data type:

 implicit object FormOrdering extends Ordering[Form] { def compare(a: Form, b: Form): Int = (a,b) match { case (Form1(numberA, titleA), Form1(numberB, titleB)) => numberA - numberB case (BadForm, BadForm) => 0 ... case _ => 0 } }

Thus, we achieve a common algorithm without cluttering our data with pieces of code related to a specific algorithm.

Extra convenience

How to make methods accessible directly to the type itself? For example, we would like to compare the objects inside the algorithm using the method a compare b , without explicitly calling a method of the type class.

To do this, use the usual Scala pimp-my-library mechanism:

 implicit class OrderingOps[T:Ordering](a:T){ def compare(b:T): Int = implicitly[Ordering[T]].compare(a,b) }

Thus, for all types for which there is an instance of Ordering , a new "method" will appear.

If such a desire arises every time, then you can use the library simulacrum , which creates such an auxiliary method with all the necessary strapping using macros:

 import simulacrum.typeclass @typeclass trait Ordering[T]{ def compare(a: T, b: T): Int }

Example: Type class for rewriting trees (symbolic equation solving, program optimization)

Consider an example of a type class for a custom data structure. One of the mechanisms used to optimize programs is to rewrite AST with preserving semantics. In this case, all nodes of the tree are traversed (in depth or in width), and for each node, a matching pattern is searched (pattern matching) and in the case of pattern matching, the node is rewritten according to the appropriate rule.

For different tasks (equations, programs), the types that make up the AST tree are different, the matching / optimization patterns are also different. However, the traversal algorithm is the same.

This algorithm is a contender for abstraction using type classes. To an arbitrary type of tree, we must add some operations used in the tree traversal algorithm.

 import simulacrum.typeclass @typeclass trait RewritableTree[T] { def children(node: T): List[T] def replaceChildren(node: T, newChildren: List[T]): T }

proper rewriting algorithm

 object RewritableTree { def rewrite[T: RewritableTree](f: PartialFunction[T, T]): T => T = t => { rewrite0(f)(t).getOrElse(t) } private def rewrite0[T: RewritableTree](f: PartialFunction[T, T])(t: T): Option[T] = { import RewritableTree.ops._ //  "",  simulacrum' val rt = implicitly[RewritableTree[T]] -        "" val children = t.children // rt.children(t) var changed = false //   ,   ,   ,        val updatedChildren = children.map{child => val res = rewrite0(f)(child) changed = changed || res.isDefined res.getOrElse(child) } //       //def rewriteList(lst: List[T], result: mutable.ListBuffer[T], changed: Boolean): (List[T], Boolean) = lst match { // case Nil => (result.toList, changed) // case head :: tail => // val res = rewrite0(f)(head) // rewriteList(tail, result.append(res.getOrElse(head)), changed || res.isDefined) //} //val (updatedChildren, changed) = rewriteList(t.children, mutable.ListBuffer(), false) val updatedTree = if(changed) t.replaceChildren(updatedChildren) else t var changed2 = true val updatedTree2 = f.applyOrElse(t1, (_:T) =>{changed2 = false; updatedTree}) if(changed || changed2) Some(updatedTree2) else None } }

Using the same type class, you can implement the collect method that collects any values as you traverse the tree.

Inductive definition of type classes for derived types

Suppose we have already implemented a class of type Ordering[T] for our type T And we would like to sort the Option[T] list. Can we use the already implemented type class and simply add the missing functionality?

This can be done if we provide an implementation of a type class on the fly, constructing an implementation of the existing type classes.

 implicit def optionOrdering[T:Ordering]: Ordering[Option[T]] = new Ordering[Option[T]] { def compare(a: Option[T], b: Option[T]): Int = (a, b) match { case (Some(c), Some(d)) => implicitly[Ordering[T]].compare(c,d) case (None, None) => 0 case (_, None) => 1 case (None, _) => -1 } }

Such an implementation is automatically inserted into the sorting algorithm for any types for which there is an instance of the class of the type Ordering[T] .

Similarly, you can construct type classes for any generic types, such as, List[T] , Tuple2[A,B] , ...

Standard classes of types (cats)

The operations that are declared inside the type classes can be anything. For our algorithm, we can draw an abstraction boundary arbitrarily enough, for example, put the entire algorithm in a type class, or vice versa, put data access methods directly in a type class. Both of these boundary options are not optimal in terms of reuse. Therefore, it is worthwhile to place the minimum number of operations into types classes that are easy to implement for other types, and our algorithm can be expressed through these operations.

As soon as we begin to look at algorithms and access to data from this point of view, we are more likely to come to some frequently used type classes.

The Scala standard library has several type classes: Ordering[T] , Equiv[T] , Numeric[T] , Integral[T] , ...

The typelevel / cats library (as well as the scalaz library) has declared several additional classes for simple types with frequently used operations ( http://typelevel.org/cats/typeslasses.html ):

A semigroup ( Semigroup ) is a single operation combine .
A monoid - a semigroup with an empty ("zero") element - empty .

For example, for numbers, the operation combine can be defined as the sum of numbers; in this case, the zero element is the usual zero. We get an additive monoid. You can also use a multiplicative monoid if you take multiplication as a combine operation, and empty as a unit. The list of numbers can also be viewed as a monoid. In the role of the operation combine will be gluing lists, and the zero element is an empty list.

Example:

You can use a monoid to implement accumulation. Create a state with an initial value of empty from the monoid. Further, when data is received at the input, we can combine with those that are already in a state. For example, we can take the type Int with the operation "sum". In this case, the sum of the incoming data will be accumulated in one value. Or take a monoid for List[T] . In this case, all data will be available in this list (there should be lists at the entrance, or each number should be wrapped in a list). In both cases, the accumulation algorithm is identical - call the combine method for existing data and new data. And the algorithm does not depend on the specific type with which it works.

Also, if we know about some type that it is a monoid (i.e., there is an instance of the monoid class for this type), then we can use foldLeft , a convolution for the collection of these elements (we do not need to re-implement it).

Types of higher orders

In addition to simple base types, type classes can be used to work with types that themselves have parameters. (Thus, a type class requires support for higher order types in the language.) Higher order types are characterized by a kind of "type type":

a simple type has kind * (for example, Int , String );
a type that takes one argument - * -> * (for example, List[T] , Option[T] , Future[T] );
A type that takes two arguments - * -> * -> * (for example, the Function1[A,B] function Function1[A,B] ). (Please note that the function itself contains one arrow A => B , and at the type level A => B => (A=>B) two arrows (the third arrow is already inside the type itself).)

In the cats library, in addition to the type classes that work with base types, there are type classes that are used when working with type constructors. In particular, for types * -> * :

A functor is a type class that contains one operation - map . An operation accepts an object, for example, of the List[Int] and applies the specified function to each element. For List and Option , this operation, generally speaking, is already implemented in the data type itself, and one could not create a type class. However, if we want to implement universal algorithms using the map operation, then such a type class is necessary.
A monad is a functor containing a flatMap operation, or bind , or >>= (and also flatten , map , pure ). This type of class seems to be the most famous. Its advantage is due to the fact that flatMap ( bind ) is a fairly universal way to glue successive calculations. On the operation flatMap even based "sugar" in Scala - for-comprehension.

Example:

Processing lists Collect all children for a collection of objects - val allChildren = objects.flatMap(_.children)
Handling missing values: val street = personOpt.flatMap(_.addressOpt).flatMap(_.streetOpt)
Delayed execution of requests. Let the result of some query from the database can be represented by the DataTable[T] . Using flatMap we can define a subquery that retrieves data from the results of this query. Such a query can be glued to the original query without performing the first query and not working with the collection of results . We can compile the glued query in SQL and send it to the database for execution on the DBMS side. This approach is implemented, for example, in the Slick library.

For types * -> * -> * in the cats library there is also a type class:

Category - operation compose + "zero" element - identity . The type for which a category type class is defined is called an Arrow. Arrows are like functions. In particular, for normal functions, the compose operation corresponds to the andThen method, and the identity operation corresponds to the identity function.

Examples of categories:

Normal functions.
Model functions (in model language).
Lenses (properties of objects separated from classes) (see the monocle library).
Directed graph in Functional Reactive Programming (for example, SynapseGrid ).

Example:

For categories, the key feature is compose . Those. if our algorithm can be expressed in terms of compose, then we can apply this algorithm for any categories.

Let us simulate a chain of data transformations using our own DSL. Suppose that each transformation can be represented by some type of Transform[A,B] .

phantom types

A and B are not necessarily types from our data model. This may be the so-called phantom types . Using phantom types allows you to define your own rules for allowed combinations of transformations that will be checked by the compiler. Those. we cannot use the compose method for incompatible conversions.

After the user has described his task using this DSL, we can convert our conditional transformations into real actions with data, which are already represented by ordinary functions on real types. The conversion of one category (model functions) into another category (real functions) is called "natural conversion" .

Laws for type classes

The type classes implemented in the cats library model concepts of category theory. Therefore, the methods of these classes of types must satisfy certain properties (which are described in theory). For example, for a monoid:

a combine empty = a = empty combine a - the definition of an empty element
(a combine b) combine c = a combine (b combine c) - combine operation associativity

All properties that are required by category theory for a type class are implemented as "laws" - sets of "properties" of the ScalaCheck library. And it is possible for any instance of a type class to check whether this instance meets the requirements for this type class. Many algorithms rely on these properties, so when you implement type classes for your data, you should check these laws in unit tests.

Once we have verified that our type class implementations satisfy the existing laws, we can be pretty sure that our program is correct, using algorithms from libraries based on these properties of type classes.

Type Class Benefits

Compared to regular interfaces that need to be implemented in descendants, type classes have the following advantages:

You can implement a class for an inaccessible type
You can declare operations that work with zero instances of this type. In particular, the method is empty: T , or the method parse: String => T ;
You can inductively define an instance of a composite type if there are instances for the base types. For example, for Option[T] or for A \/ B

These benefits can be used independently in any program. It is enough to take a different look at the structure of your code.

The cats library (as well as the scalaz library) has a well-organized and tested set of type classes (from category theory), which are used in many algorithms and libraries. , , .

Source: https://habr.com/ru/post/318960/

All Articles