Introduction to OCaml: Data Types and Mapping [3]

[approx. Lane: continued translation. first part , second part ]

Linked lists

In OCaml, as well as in Perl, there is support for lists that is built in at the language level. All list items must be of the same type. To determine the type of expression is used:

 [one;  2;  3]

Note: semicolon, not comma.
')
[] means an empty list.

The list has a head (the first element) and a tail (the other elements, except the head). Head - element. Tail - list. In the example above, the head is an integer 1, and the tail is a list [2; 3] [2; 3] .

An alternative form of writing is to use the construction statement (cons) in the form head :: tail . The lines below are completely equivalent to each other.

 [one;  2;  3]
 12;  3]
 1 :: 2 :: [3]
 1 :: 2 :: 3 :: []

Why did we mention the construction operator? It is useful when we start pattern matching for lists, we will discuss this a little later.

Data type for linked list

The data type for the linked list of integers will be int list ; The common type for a connected list of objects of type foo will be foo list .

This implies that all list items must be of the same type. But we can use polymorphic lists (for example, 'a list ), which are very useful when writing generic functions that work with a "list of something." (Note that 'a list does not mean that the elements of the list can be of different types, you still cannot, for example, make a list consisting of a mixture of integers and strings. This form of record means "a list of elements of any type, but one the same type ").

A good example is the function length defined in the List module. It doesn't matter if the list contains integers, or strings, or objects, or small furry animals, the List.length function can be used for any type of list. Thus the type List.lenght :

 List.length: 'a list -> int

Structures

In C and C ++ there is the concept of a struct (abbreviation for structure). Java has classes that can be used in a similar way, although using them requires more laborious work.

Let's look at a simple structure in C:

 struct pair_of_ints {
   int a, b;
 };

The simplest equivalent in OCaml is a tuple (touple), such as (3,4) , whose type is int * int . Unlike lists, tuples can contain elements of different types, so, for example, (3, "helo", 'x') is of type int * string * char .

A somewhat more complex implementation of the C's structure is the use of a record. Records, like C structures, allow you to use named elements. Tuples do not have named elements, but in return store the order in which elements enter the tuple. Here is the equivalent of a C structure using entries:

 type pair_of_ints = {a: int;  b: int} ;;

The above line defines the type, and below we actually create an object of this type:

 {a = 3;  b = 5}

Notice that we used ":" in the definition of the type and "=" at the time of creating an object of a given type.

Here are some examples for the top level (toplevel):

 # type pair_of_ints = {a: int;  b: int} ;;
 type pair_of_ints = {a: int;  b: int;  }
 # {a = 3;  b = 5} ;;
 -: pair_of_ints = {a = 3;  b = 5}
 # {a = 3} ;;
 Some record field labels are undefined: b

OCaml prohibits to leave elements of the record undefined.

Options (qualified associations and listings)

The type of objects “qualified union” does not exist in C, although there is some support for it in gcc. Here is an example of how often a qualified union is implemented in C:

 struct foo {
   int type;
 #define TYPE_INT 1
 #define TYPE_PAIR_OF_INTS 2
 #define TYPE_STRING 3
   union {
     int i;  // If type == TYPE_INT.
     int pair [2];  // If type == TYPE_PAIR_OF_INTS.
     char * str;  // If type == TYPE_STRING.
   } u;
 };

As you can see, this is not a very pleasant sight. And in general, not too safe: the programmer can use the value of ui at the moment when the structure contains a string. And the compiler does not help here to verify that all possible values have been checked inside the switch statement (this problem is partially solved using the enum ). The programmer may forget to set the value of the type field, which will lead to all sorts of fun and games with beetles. Yes, and all this is cumbersome.

Here is an elegant and compact version on OCaml:

 type foo = Nothing |  Int of int |  Pair of int * int |  String of string ;;

This type definition. The first part of each parcel, limited | called constructor. It can be called whatever you want, but it must begin with a capital letter. If a constructor can be used to determine a value, it is followed by of part , where type always begins with a small letter. In the example above, Nothing is a constant, and all other constructors are used with values.

For real creation , the following is written:

 Nothing
 Int 3
 Pair (4, 5)
 String hello
      & c.

Each expression from the example above is of type foo .

Note: of used in the type definition, but NOT in the creation of elements of the specified type.

Continuing. A simple C listing is defined as

 enum sign {positive, zero, negative};

It can also be written to OCaml as follows:

 type sign = Positive |  Zero |  Negative ;;

Recursive options (for trees)

Options can be recursive. Most often it is used to define tree structures. This is where the expressiveness of functional languages becomes visible:

 type binary_tree = Leaf of int |  Tree of binary_tree * binary_tree ;;

Here are some binary trees. To practice, try to draw them on paper:

 Leaf 3
 Tree (Leaf 3, Leaf 4)
 Tree (Tree (Leaf 3, Leaf 4), Leaf 5)
 Tree (Tree (Leaf 3, Leaf 4), Tree (Tree (Leaf 3, Leaf 4), Leaf 5))

Parameterized options

The binary tree in the examples above stored an integer in each sheet. But what if we want to describe the form of a binary tree, and clarifying what exactly to store in the leaves we leave for later? We can do this using parameterized (polymorphic) variants, like this:

 type 'a binary_tree = Leaf of' a |  Tree of 'a binary_tree *' a binary_tree ;;

This is a common type. The exact type that stores integers in each leaf is called int binary_tree . Similarly, the exact type that stores strings in each sheet is called string binary_tree . In the following example, we will define the types of some instances in the top level and let the type inference system show their types for us:

 # Leaf "hello" ;;
 -: string binary_tree = Leaf "hello"
 # Leaf 3.0 ;;
 -: float binary_tree = Leaf 3.

Note that the type name is written in reverse order. Compare this with the type names for lists, for example, int list .

In fact, it is not a coincidence that a' list also written in the same "in reverse order". Lists are just parameterized variants with the following (somewhat strange) definition:

  type 'a list = [] |  :: of 'a *' a list (* this is not real OCaml code *)

Although it is not quite a complete definition. Here is a substantially more precise definition:

 # type 'a list = Nil |  :: of 'a *' a list ;;
 type 'a list = Nil |  :: of 'a *' a list
 # Nil ;;
 -: 'a list = Nil
 # 1 :: Nil ;;
 -: int list = :: (1, Nil)
 # 1 :: 2 :: Nil ;;
 -: int list = :: (1, :: (2, Nil))

Recall what we said earlier - lists can be written in two ways: either with a small amount of syntactic sugar [1; 2; 3] [1; 2; 3] [1; 2; 3] , or more formally, as 1 :: 2 :: 3 :: [] . If we look at the definition of a' list above, we will see an explanation of the formal record.

Lists, structures, options - total

OCaml name	Example Type Definition	Use example
list	int list	[one; 2; 3]
tuple	int * string	(3, "hello")
record	type pair = {a: int; b: string}	{a = 3; b = "hello"}
variant	type foo = int of int \| Pair of int * string	Int 3
variant	type sign = Positive \| Zero \| Negative	Positive Zero
parameterized variant	type 'a my_list = Empty \| Cons of 'a *' a my_list	Cons (1, Cons (2, Empty))

Pattern matching (for data types)

One of the really cool features of functional programming languages is the ability to disassemble structures and make pattern matching for data. This is not a completely “functional” feature - we can imagine a kind of variation on C, which will allow you to do this, but in any case, this is a Cool Feature.

Let's start with the real problem for programming: I want to have a recording tool for mathematical expressions, such as n * (x + y), and perform symbolic multiplication in order to get n * x + n * y from the expression above.

Let's define the type of these expressions:

 type expr = Plus of expr * expr (* means a + b *)
           |  Minus of expr * expr (* means a - b *)
           |  Times of expr * expr (* means a * b *)
	   |  Divide of expr * expr (* means a / b *)
           |  Value of string (* "x", "y", "n", etc. *)
	   ;;

An expression of the form n * (x + y) is written as:

 Times (Value "n", Plus (Value "x", Value "y"))

Now we will write a function that displays (Value "n", Plus (Value "x", Value "y")) as something more similar to n * (x+y) . To do this, we will write two functions, one of which converts the expression into a beautiful string and one that outputs it (the reason for the separation is that I may need to write the same string file and I do not want to repeat the whole function for this).

 let rec to_string e =
   match e with
     Plus (left, right) -> "(" ^ (to_string left) ^ "+" ^ (to_string right) ^ ")"
   |  Minus (left, right) -> "(" ^ (to_string left) ^ "-" ^ (to_string right) ^ ")"
   |  Times (left, right) -> "(" ^ (to_string left) ^ "*" ^ (to_string right) ^ ")"
   |  Divide (left, right) -> "(" ^ (to_string left) ^ "/" ^ (to_string right) ^ ")"
   |  Value v -> v
   ;;

 let print_expr e =
   print_endline (to_string e) ;;

(Note: the ^ operator is used to concatenate strings)

But the display function in the work:

 # print_expr (Times (Value "n", Plus (Value "x", Value "y"))) ;;
 (n * (x + y))

General form for pattern matching:

 match object with
   pattern -> result
 |  pattern -> result
     ...

The left-hand side of pattern matching can be simple (as in the case of the to_string in the example above), or complex, with attachments. The next example is our function to multiply expressions in the form of n * (x + y) or in the form of (x + y) * n . For this we will use nested samples:

 let rec multiply_out e =
   match e with
     Times (e1, Plus (e2, e3)) ->
       Plus (Times (multiply_out e1, multiply_out e2),
             Times (multiply_out e1, multiply_out e3))
   |  Times (Plus (e1, e2), e3) ->
       Plus (Times (multiply_out e1, multiply_out e3),
             Times (multiply_out e2, multiply_out e3))
   |  Plus (left, right) -> Plus (multiply_out left, multiply_out right)
   |  Minus (left, right) -> Minus (multiply_out left, multiply_out right)
   |  Times (left, right) -> Times (multiply_out left, multiply_out right)
   |  Divide (left, right) -> Divide (multiply_out left, multiply_out right)
   |  Value v -> Value v
   ;;

Here it is in the work:

 # print_expr (multiply_out (Times (Value "n", Plus (Value "x", Value "y")))) ;;
 ((n * x) + (n * y))

How does multiply_out work? The key point is the first two samples. The first sample is Time (e1, Plus (e2, e3)) , which matches expressions of the form e1 * (e2 + e3) . Now take a look at the right side of the first comparison with the sample and make sure that it is equivalent to (e1 * e2) + (e1 * e3) .

The second sample does the same, except for using expressions of the form (e1 + e2) * e3 .

The remaining patterns do not change the form of the expression, but, in fact, they recursively call multiply_out on subexpressions. This provides the multiplication of all subexpressions (if you want to multiply only the top level of the expression, then you should replace all the remaining patterns with the simple rule e -> e ).

Can we do the opposite? (Factoring the general part of the subexpressions?) Of course! (although it is slightly more difficult). The following version only works for the top level of expressions. Of course, you can expand it for all levels of expression (and for more complex cases):

 let factorize e =
   match e with
     Plus (Times (e1, e2), Times (e3, e4)) when e1 = e3 -> Times (e1, Plus (e2, e4))
   |  Plus (Times (e1, e2), Times (e3, e4)) when e2 = e4 -> Times (Plus (e1, e3), e4)
   |  e -> e
   ;;

 # factorize (Plus (Times (Value "n", Value "x"), Times (Value "n", Value "y"))) ;;
 -: expr = Times (Value "n", Plus (Value "x", Value "y"))

The factorization function shows us a few more possibilities of the language. You can add custodians to each pattern match. The keeper is the condition that follows the match. Pattern matching only works if there is a pattern match. And the condition in when satisfied.

 match object with
   pattern [when condition] -> result
   pattern [when condition] -> result
     ...

The second possibility of the language is the operator = , which checks the “structural correspondence” between expressions. This means that it passes recursively into each expression and checks them to match at all levels.

OCaml can verify at compile time that you have covered all possible cases in your templates. I changed the definition of the type expr adding the Product option to it:

 type expr = Plus of expr * expr (* means a + b *)
           |  Minus of expr * expr (* means a - b *)
           |  Times of expr * expr (* means a * b *)
	   |  Divide of expr * expr (* means a / b *)
	   |  Product of expr list (* means a * b * c * ... *)
           |  Value of string (* "x", "y", "n", etc. *)
	   ;;

Then I tried to accumulate the to_string function without complementing it. OCaml issued the following warning:

 Warning: this pattern-matching is not exhaustive.
 This is not matched:
 Product _

Source: https://habr.com/ru/post/108920/

All Articles