Just about Hindley Milner

Introduction

If you have ever been interested in not too popular languages, then you must have heard of Hindley Milner. This type inference algorithm is used in F # and Haskell and OCaml, as in their predecessor ML. Some researchers even try to use HM to optimize dynamic languages like Ruby, JavaScript and Clojure.

And despite its prevalence, until now there has not been a simple and understandable explanation of what it is. How does this magic work? Are output types always correct? Or is Hindley Milner better, say, Java? And while those who really know what XM will recover from another mental overstrain, we will try to figure it out for ourselves.

First step

Essentially, Hindley-Milner (or "Damas-Milner") is an algorithm for outputting value types based on how they are used. Literally, it formalizes the intuitive notion that a type can be inferred from the operations it supports. Consider the following code on pseudo-Scala ^[4] :

def foo(s: String) = s.length // :    def bar(x, y) = foo(x) + y

Just by looking at the definition of the function bar , we can say that its type should be (String, Int)=>Int . It is not difficult to withdraw. We simply look at the function body and see two ways to use the x and y parameters. x is passed to foo , which expects a String . Therefore, x must be of type String in order for this code to be compiled. Moreover, foo returns an Int . Method + class Int expects a parameter, also Int , therefore y must be of type Int . Finally, we know that + returns Int , which means this is the type returned by bar .

And this is exactly what Hindley-Milner does: by looking at the definition of a function, it calculates the many constraints imposed by how value is used. And that is exactly what we did when we noticed that foo accepts a parameter of type String . Once many constraints have been created, the algorithm completes type recovery using the unification of these constraints. If the expression is well-typed, then ultimately the constraints will result in an unambiguous type. Otherwise, one (or several) constraints will be mutually exclusive or intractable with a given set of types.
')

Informal algorithm

Simple example

In order to make it easier to understand the algorithm, we will execute every step of it manually. First, you need to bring a lot of restrictions. We start by assigning each variable ( x and y ) a new type (that is, “not existing”). If we recorded bar with these types, it would have been something like ^[1] :

 def bar(x: X, y: Y) = foo(x) + y

No matter what the types are called, the main thing is that they do not overlap with the “real” types. They are needed so that the algorithm can explicitly refer to the yet unknown type of each value. Without which it is impossible to create a lot of restrictions.

Next, we dive into the function body in the search for operations that impose some type constraints. And so, the first operation is to call the foo method. ^[2] We know that type foo is String=>Int and this allows us to write the constraint:

X ↦ String

The next operation will be + associated with the value of y . Scala represents all operators as a method call, so this expression actually means " foo(x).+(y) ". We already know that foo(x) is an expression of type Int (from the type returned by foo ), as well as the fact that + defined as a method of class Int with type Int=>Int (of course, it is not entirely correct to use magic inherent only in Scala) . That allows you to add one more constraint to our set:

X ↦ String
Y ↦ Int

The last step of type recovery is the unification of all these restrictions in order to obtain real types, which we substitute for the variables X and Y Unification, literally, is the process of going through all the restrictions in an attempt to find the only type that suits them all. For example, imagine that you were given the following facts:

Masha is tall
Vasya is tall
Masha loves red
Vasya loves blue

Now, look at these limitations:

Someone tall
Someone likes red

Hmm, who could anyone be? This process of combining a set of constraints together with a set of facts can be mathematically formalized in the form of unification. In the case of type recovery, it is enough to rename “types” into “facts”.

In this example, unifying the set of constraints is trivial. We had only one restriction for each value ( x and y ) and both of them are mapped to real types. The only thing required was to substitute " String " instead of " X " and " Int " instead of " Y ".

Difficult example

To feel the power of unification, consider the example more difficult. Suppose we have the following function ^[3] :

 def baz(a, b) = a(b) :: b

Here, the baz function is declared, which takes a function from the parameters and something else, calling the function from the second parameter passed to it and then attaching the result to it itself. It is easy to derive many restrictions for this function. As before, we will start by creating new types for each value.
Notice that in this case we annotate not only the parameters, but also the return value.

 def baz(a: X, b: Y): Z = a(b) :: b

The first constraint that we derive is that the parameter a must be a function, that takes a value of type Y and returns the value of a new type of Y' . Moreover, we know that :: is a function on the List[A] class, which takes a new element A and returns a new List[A] . Thus, we know that Y and Z must be of type List[ Y' ] . Restrictions written in the formal form:

X ↦ ( Y => Y' )
Y ↦ List[ Y' ]
Z ↦ List[ Y' ]

Now unification is not so trivial. It is important that the variable X depends on Y , that is, at least one more additional step is required:

X ↦ (List[ Y' ]=> Y' )
Y ↦ List[ Y' ]
Z ↦ List[ Y' ]

This is the same set of constraints, only this time we substituted the well-known mapping for Y into the mapping for X Using this substitution, we got rid of X , Y and Z from the types we derived, getting the following, typed baz function:

 def baz(a: List[Y`]=>Y`, b: List[Y`]): List[Y`] = a(b) :: b

Of course, this is still not the end result. Even if the name of the type Y' would be syntactically correct, then the error would be the absence of this type. This happens quite often when working with the Hindley-Milner type recovery algorithm. Somehow, after the derivation of all constraints and unification, we have “residuals” of variable types for which there are no longer constraints.
The solution can be considered unlimited variables like type parameters . In the end, if the parameter is not limited by anything, then we can simply substitute any type, including the generic. And so, the final version of the baz function, where we added an unlimited parameter of type " A " and replaced it with all occurrences of Y' in the inferred types:

 def baz[A](a: List[A]=>A, b: List[A]): List[A] = a(b) :: b

Conclusion

And it's all! Hindley Milner is no more difficult than what we have described here. Just imagine how this algorithm can be used to restore the types of expressions that are much more complex than we considered.

I hope this article has helped you better understand how the Hindley-Milner algorithm works. This type of type inference can be very useful, reducing the amount of code needed for static typing to the minimum necessary. The " bar " example was specifically cited in a Ruby-like syntax to show that this information is still sufficient for type checking. This can be useful the next time someone says that all statically typed languages are too expressive.

Footnotes and comments

The return type is omitted for ease of reading. Technically, types are always derived for the expression as a whole.
This is a depth-first search on ASD , which means that we primarily look at operations with high priority. Although the traversal order is technically not important, it is thus easier to think about the algorithm.
" :: " is the so-called cons operator in Scala. He is responsible for attaching the left element to the right list.
About the correct use of Scala for code samples

Source: https://habr.com/ru/post/125250/

All Articles