
Dotty - the future of the Scala language

At the end of May I attended the Scala Days conference in Copenhagen. One of the keynote speakers was Martin Odersky, the creator of the Scala language. He talked about the evolution of the language and, in particular, about a compiler called Dotty. The plan is that the compiler for Scala 3.0 will be built on Dotty.

Martin has spoken on this topic repeatedly. Here I would like to gather the latest information about Dotty: its key new features and the elements that were removed as unnecessary.


Martin Odersky. Scala development plan for the next few years
This post should be useful both to experienced Scala developers and to complete newcomers. For newcomers, I accompany the discussion of Dotty with an overview of Scala's features and its mathematical foundations.

Scala is a multi-paradigm programming language, originally developed for the JVM (Java Virtual Machine). Compilers targeting JavaScript (Scala.js) and native code (Scala Native) are now also being developed. The name Scala comes from "SCAlable LAnguage". Scala is indeed convenient for both ends of the scale. It works well for small scripts of a few lines, which can be run in the interpreter (read-eval-print loop, REPL). It also works well for complex systems running on clusters of many machines, in particular systems built with the Akka and Apache Spark frameworks.

Before developing Scala, Martin Odersky worked on generic types (generics) for Java, which appeared in Java 5 in 2004. Around that time, Martin came up with the idea of a new language for the JVM that would not carry the huge burden of backward compatibility that Java had. His idea was to combine Java's object-oriented approach with a functional approach similar to that of Haskell, OCaml and Erlang, while remaining a strongly typed language.

One of the main features of Scala as a strongly typed language is automatic type inference. In many typed languages the type of every declaration must be written out explicitly. Scala, by contrast, can infer the types of variables and the return types of functions. For example, a constant is defined in Java as follows:

final String s = "Hello world"; 

This is equivalent to the following Scala expression:

 val s = "Hello world" 

However, Scala also lets you specify the type explicitly. This is useful, for example, when the variable must have a supertype of the expression's type:

 val cs: CharSequence = "Hello world" 

Martin Odersky considers type inference the main feature that distinguishes Scala from other languages. He is currently leading work on improving this system and on its mathematical foundation, the DOT calculus.

DOT calculus


DOT stands for Dependent Object Types. A dependent type is a type that depends on a value, for example on a particular object. The current version of the language already has a set of rules for inferring such types. Examples are upper and lower bounds in the inheritance hierarchy, and types that depend on a particular value (path-dependent types). Here is a small example:

    trait D

    trait A {
      type B
      def someFunction(b: B): B
    }

    trait C[X <: D] {
      type Y = (X, D)
      def fun1(x: X): Y
      def fun2(a: A): a.B
    }

In this example we define two traits, A and C. Trait A has a type member B and declares an operation someFunction that takes a parameter of type B. The concrete type B is determined by each particular implementation of A. Trait C has a type parameter X, which must be a subtype of D. Trait C also defines the type member Y and two functions, fun1 and fun2. fun1 takes a value of type X and returns a value of type Y. fun2 takes a value a of type A, and its return type is the type member B of that particular argument, a.B.
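The path-dependent return type a.B is easiest to see with two implementations of A that choose different types for B. The sketch below assumes a Scala 3 compiler. IntA, StringA and applyTwice are hypothetical names introduced only for this illustration.

```scala
// A minimal sketch assuming Scala 3; IntA, StringA and applyTwice are hypothetical.
trait A {
  type B
  def someFunction(b: B): B
}

object IntA extends A {
  type B = Int
  def someFunction(b: Int): Int = b + 1
}

object StringA extends A {
  type B = String
  def someFunction(b: String): String = b + "!"
}

// Dependent method type: both the parameter b and the result have the type a.B,
// i.e. the type member B of whichever value is passed as a.
def applyTwice(a: A)(b: a.B): a.B = a.someFunction(a.someFunction(b))

val n: Int = applyTwice(IntA)(1)         // a.B is Int here
val s: String = applyTwice(StringA)("hi") // a.B is String here
```

The compiler accepts 1 only because the argument is IntA. Passing a String to applyTwice(IntA) would be rejected at compile time.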

The DOT calculus is a mathematical formalization of the rules for this kind of type inference. Its basic elements are:

  1. Top type (Any): the type at the very top of the hierarchy; it is a supertype of all types.
  2. Bottom type (Nothing): the type at the very bottom of the hierarchy; it is a subtype of all types.
  3. Type declaration: a type member declared with given lower and upper bounds.
  4. Type selection: a type selected from a variable (for example, x.A), i.e. a type that depends on a value.
  5. Function: a function takes an argument of a given type and returns a value whose type may depend on that argument.

In addition, the DOT calculus defines the following set of valid operations on types:

  1. Subtyping. Types are related by subtyping: every type is a subtype of Any and a supertype of Nothing, and every type is both a subtype and a supertype of itself.
  2. Record types: structured types made up of members (fields and type members), by analogy with objects and structs.
  3. Type union. The resulting type is the disjunction of the original types: a value of it belongs to at least one of them.
  4. Type intersection. The resulting type is the conjunction of the original types: a value of it has the fields and operations of all of them.
  5. Recursive type definition.
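To make a few of these ingredients more concrete, here is a small sketch of how top and bottom types, bounded type declarations and type selection surface in Scala 3 source code. It is our own illustration, not taken from the DOT papers, and it assumes a Scala 3 compiler. Container, IntBox and first are hypothetical names.

```scala
// Sketch assuming Scala 3; Container, IntBox and first are hypothetical names.

// Top and bottom: these lines compile only because the subtyping evidence exists.
val nothingIsBottom = summon[Nothing <:< String] // Nothing is a subtype of String
val anyIsTop        = summon[String <:< Any]     // String is a subtype of Any
val reflexive       = summon[String <:< String]  // every type is a subtype of itself

// Type declaration with explicit lower and upper bounds: Nothing <: Elem <: AnyVal.
trait Container {
  type Elem >: Nothing <: AnyVal
  def elem: Elem
}

object IntBox extends Container {
  type Elem = Int
  def elem: Int = 42
}

// Type selection: c.Elem is the Elem member of one particular value c.
def first(c: Container): c.Elem = c.elem

val fromBox: Int = first(IntBox)
```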

A detailed review of DOT is beyond the scope of this publication. More information about the DOT calculus can be found in the "DOT calculus" entry under Additional materials at the end of this article.

Overview of innovations in Dotty


The DOT calculus is the mathematical basis of the Dotty compiler, as its name suggests.

Dotty is currently an experimental platform for developing new language concepts and compilation technologies. According to Martin Odersky, the goal of Dotty is to strengthen the core constructs of the language and to get rid of unnecessary elements. Dotty is being developed as an independent project, but the plan is for it to merge into the main Scala branch over time.



The complete list of new features can be found on the official Dotty website. In this article I cover only the ones I consider most important.

1. Type Intersections


An intersection type is a type that has all the properties of each of the original types at once. Suppose we have defined types A and B:

    trait A { def fun1(): Int }
    trait B { def fun2(): String }

Type C is defined as the intersection of types A and B:

 type C = A & B 

In this case, we can write the following function:

 def fun3(c: C): String = s"${c.fun1()} - ${c.fun2()}" 

As the example shows, through the parameter c we can call both the fun1() method defined in A and the fun2() method defined in B.

In the current Scala compiler, this is expressed with the with construct, for example:

    type C = A with B
    def fun3(c: C): String = s"${c.fun1()} - ${c.fun2()}"

There is a significant difference between & and with constructions: & is a commutative operation, that is, type A & B is equivalent to type B & A, while A with B is not equivalent to B with A. Let us give an example:

    trait A { type T = Int }
    trait B { type T = String }

For type A with B, the value of type T is equal to Int, since A takes precedence over B. In Dotty, for type A & B, type T will be equal to Int & String.

Dotty still supports with for types, but it is deprecated and planned for removal in the future.
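A runnable version of the intersection example follows. It also shows that a value of type B & A can be passed where A & B is expected. This is a sketch assuming a released Scala 3 compiler. Impl is a hypothetical class that mixes in both traits.

```scala
// Sketch assuming Scala 3; Impl is a hypothetical implementation of both traits.
trait A { def fun1(): Int }
trait B { def fun2(): String }

type C = A & B

def fun3(c: C): String = s"${c.fun1()} - ${c.fun2()}"

class Impl extends A, B {
  def fun1(): Int = 1
  def fun2(): String = "two"
}

// & is commutative: a value typed B & A is accepted where A & B is expected.
val c: A & B = new Impl
val swapped: B & A = c
val out: String = fun3(swapped)
```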

2. Type Unions


A union type is a type whose values belong to one of the original types. Unlike intersection, union types have no counterpart in the current Scala compiler. For values that may have one of two types, the standard library offers Either[A, B]. Suppose we have defined the following types:

    case class Person(name: String, surname: String)
    case class User(nickname: String)

In this case, we can write the following function:

    def greeting(somebody: Person | User) = somebody match {
      case Person(name, surname) => s"Hello, $name $surname"
      case User(nickname) => s"Hello $nickname, (sorry, I actually don't know your real name)"
    }

A union type gives a more concise notation than Either does in the current version of the language:

    def greeting(somebody: Either[Person, User]) = somebody match {
      case Left(Person(name, surname)) => s"Hello, $name $surname"
      case Right(User(nickname)) => s"Hello $nickname, (sorry, I actually don't know your real name)"
    }

A type union, like an intersection, is also a commutative operation.

One intended use of union types is to eliminate null entirely. Currently the alternative to null is Option. Because Option is a wrapper, it adds a slight overhead for wrapping and unwrapping values. With union types, the check is resolved at compile time:

    def methodWithOption(s: Option[String]) = s match {
      case Some(string) => println(string)
      case None => println("There's nothing to print")
    }

    // "String?" is not a legal Scala identifier, so the alias gets a plain name
    type NullableString = String | Null

    def methodWithUnion(s: NullableString) = s match {
      case string: String => println(string)
      case null => println("There's nothing to print") // match the null value, not the type Null
    }
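Here is a runnable variant of the greeting and null-handling examples, assuming a released Scala 3 compiler with default settings. By default Null is still a subtype of String, so the String | Null annotation mainly documents intent. As far as we know, the stricter checking the article describes corresponds to Scala 3's opt-in explicit-nulls mode (-Yexplicit-nulls). The describe function is a hypothetical name.

```scala
// Sketch assuming Scala 3 with default settings (explicit nulls not enabled).
case class Person(name: String, surname: String)
case class User(nickname: String)

def greeting(somebody: Person | User): String = somebody match {
  case Person(name, surname) => s"Hello, $name $surname"
  case User(nickname)        => s"Hello $nickname"
}

// The union is commutative: a User | Person is accepted where Person | User is expected.
val visitor: User | Person = User("neo")

// Handling the "absent" case of a String | Null with ordinary pattern matching.
def describe(s: String | Null): String = s match {
  case null        => "There's nothing to print"
  case str: String => str
}
```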

3. Determination of the closest subtypes and supertypes


With the new operations on composite types (union and intersection), the rules for computing the closest types in the inheritance hierarchy have changed. For any types T and U, Dotty takes T | U as their closest common supertype and T & U as their closest common subtype. Together these form the so-called subtyping lattice, shown in the figure below.



In the current implementation of Scala, the closest supertype is computed as the most specific common supertype of the two types in the inheritance hierarchy. For example, for two unrelated case classes T and U it is Product with Serializable. In Dotty, it is uniquely defined as T | U.

For the closest common subtype, the current implementation of Scala has no unique answer: it can be either T with U or U with T. As mentioned earlier, with is not commutative, so T with U is not equivalent to U with T. Dotty removes this ambiguity by defining the closest subtype as T & U. Since & is commutative, the result is unique.

    val s = "String"
    val i = 10
    val result = if (true) s else i

In Scala 2.12, result is inferred to have type Any. In Dotty, if you do not specify the type of result explicitly, it is also inferred as Any. However, we can specify the type explicitly:

 val result: String | Int = if (true) s else i 

Thus, we have restricted the set of valid values for result to String and Int.
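The lattice relations can be checked by the compiler itself. The sketch below asks for subtyping evidence showing that String | Int is above both types and String & Int is below both. It assumes a released Scala 3 compiler. We deliberately do not rely on what type Scala 3 infers without an annotation. The kind helper is hypothetical.

```scala
// Sketch assuming Scala 3; kind is a hypothetical helper.
val s = "String"
val i = 10

// With an explicit union annotation the compiler keeps the precise type.
val result: String | Int = if (true) s else i

// T | U is a supertype of both T and U ...
val upper1 = summon[String <:< (String | Int)]
val upper2 = summon[Int <:< (String | Int)]
// ... and T & U is a subtype of both.
val lower1 = summon[(String & Int) <:< String]
val lower2 = summon[(String & Int) <:< Int]

// Only String and Int need to be handled when matching on the union.
def kind(x: String | Int): String = x match {
  case _: String => "string"
  case _: Int    => "int"
}

val k: String = kind(result)
```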

4. Lambda expressions for types



One of the most complex features of Scala is its support for so-called higher-kinded types. They let us raise the level of abstraction further in generic programming. Higher-kinded types are described in more detail in the "Higher-Kinded types" entry under Additional materials. We will consider a specific example taken from the book Programming Scala by Dean Wampler and Alex Payne (2nd edition).

    trait Functor[A, +M[_]] {
      def map2[B](f: A => B): M[B]
    }

    implicit class SeqFunctor[A](seq: Seq[A]) extends Functor[A, Seq] {
      override def map2[B](f: (A) => B): Seq[B] = seq map f
    }

    implicit class OptionFunctor[A](opt: Option[A]) extends Functor[A, Option] {
      override def map2[B](f: (A) => B): Option[B] = opt map f
    }

Here we create the type Functor, which is parameterized by two types: the type of value A and the type of some wrapper M. In Scala, the expression M (without parameters) is called a type constructor. By analogy with object constructors, which can take a certain set of parameters in order to create a new object, type constructors can also take parameters in order to define a particular type. Therefore, in order to define a specific type for Functor from our example, several steps must be performed:

  1. The types A and B are determined.
  2. The types M[A] and M[B] are determined.
  3. The type Functor[A, M] is determined.

Thus, the compiler can determine the type of Functor only at the third step, which is why it is considered a higher-kinded type. In general, a higher-kinded type is a type that takes not only simple types but also type constructors as parameters.

The example above has one limitation: among Functor's type parameters, the type constructor M takes exactly one parameter. Suppose we need to write a map2 method that transforms the values of a Map[K, V] while keeping the keys unchanged. Dean Wampler proposes the following solution in his book:

    implicit class MapFunctor[K,V1](mapKV1: Map[K,V1])
        extends Functor[V1,({type λ[α] = Map[K,α]})#λ] {
      def map2[V2](f: V1 => V2): Map[K,V2] = mapKV1 map {
        case (k,v) => (k,f(v))
      }
    }

In this example we create a new type constructor λ that takes one parameter and fixes the first type parameter of Map to K. This implementation is quite confusing. To create the type constructor λ, we first create the structural type { type λ[α] = Map[K, α] }, which defines the type member λ with one parameter. We then extract λ through type projection, a mechanism that Dotty has decided to drop.

For such cases, Dotty has developed a lambda expression mechanism for types. Its syntax is as follows:

 [X] => Map[K, X] 

This expression denotes a type constructor with one parameter X that builds the type Map[K, X]. The key type K is fixed, and the value type is the parameter. Thus, we can write a Functor over the values of a Map as follows:

    implicit class MapFunctor[K,V1](mapKV1: Map[K,V1])
        extends Functor[V1, [X] => Map[K,X]] {
      def map2[V2](f: V1 => V2): Map[K,V2] = mapKV1 map {
        case (k,v) => (k,f(v))
      }
    }

As you can see from this example, the lambda expression syntax for types entered in Dotty allows you to simplify the definition of the MapFunctor class, getting rid of all confusing constructs.

Lambda expressions for types also allow imposing restrictions on covariance and contravariance on arguments, for example:

    [+X, Y] => Map[Y, X]
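For readers who want to try the MapFunctor example today, here is a sketch assuming a released Scala 3 compiler. There, type lambdas are written with =>> rather than the => arrow used in the 2017 Dotty snapshot described above. The Functors object and the prices map are our own scaffolding.

```scala
// Sketch assuming released Scala 3 (type lambdas use =>>); Functors and prices are ours.
object Functors {
  trait Functor[A, +M[_]] {
    def map2[B](f: A => B): M[B]
  }

  // The type lambda [X] =>> Map[K, X] fixes the key type and abstracts over the value type.
  implicit class MapFunctor[K, V1](mapKV1: Map[K, V1])
      extends Functor[V1, [X] =>> Map[K, X]] {
    def map2[V2](f: V1 => V2): Map[K, V2] =
      mapKV1.map { case (k, v) => (k, f(v)) }
  }
}

import Functors.*

val prices = Map("apple" -> 1, "pear" -> 2)
val doubled: Map[String, Int] = prices.map2(_ * 2) // keys unchanged, values doubled
```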

5. Adapting function arity to tuples


This feature is syntactic sugar that simplifies working with collections of tuples. More generally, it applies to any implementation of Product, which includes all case classes.

 val pairsList: List[(Int, Int)] = List((1,2), (3,4)) 

    case class Rectangle(width: Int, height: Int)
    val rectangles: List[Rectangle] = List(Rectangle(1,2), Rectangle(3,4))

Currently, to work with such collections we can use either single-argument functions:

    val sums = pairsList.map(pair => pair._1 + pair._2)
    val areas = rectangles.map(r => r.width * r.height)

Or we can use partial functions:

    val sums = pairsList.map { case (a, b) => a + b }
    val areas = rectangles.map { case Rectangle(w, h) => w * h }

Dotty offers a more compact and convenient option:

    val sums = pairsList.map(_ + _)
    val areas = rectangles.map(_ * _)

Thus, for subclasses of type Product Dotty selects a function whose arity is equal to the arity of the original product.
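The sketch below compares the three styles on a list of pairs, assuming a released Scala 3 compiler, where the feature is known as parameter untupling. As far as we know, released Scala 3 applies it only to tuples, not to arbitrary case classes, so the Rectangle variant is left out.

```scala
// Sketch assuming released Scala 3 ("parameter untupling"); tuples only.
val pairsList: List[(Int, Int)] = List((1, 2), (3, 4))

// A single-argument function that receives the whole tuple
val sums1 = pairsList.map(pair => pair._1 + pair._2)
// A pattern-matching anonymous function
val sums2 = pairsList.map { case (a, b) => a + b }
// A two-parameter lambda, adapted to the arity of the tuple
val sums3 = pairsList.map((a, b) => a + b)
// The same with placeholder syntax
val sums4 = pairsList.map(_ + _)
```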

6. Parameters for traits


Dotty finally allows traits to take parameters. This was not possible before because, in complex inheritance hierarchies, the parameter values could be ambiguous. Dotty therefore imposes additional restrictions on parameterized traits:


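    trait A(x: Int)
    trait B extends A              // OK: a trait may extend A without arguments
    trait B1 extends A(42)         // error: a trait cannot pass arguments to a parent trait
    class C extends A(42)          // OK

    class D extends A              // error: the argument for A is missing
    class D1 extends C             // OK: A is already initialized by C
    class D2 extends C with A(84)  // error: the argument for A is already passed in C

    class E extends B              // error: the argument for A is missing
    class E extends A(42) with B   // OK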
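Here is a compilable subset of these rules containing only the legal definitions, assuming a released Scala 3 compiler. We declare the parameter as val x, an addition not in the original, so that the passed value can be read back.

```scala
// Sketch assuming Scala 3; only the legal definitions from the rules above.
trait A(val x: Int)
trait B extends A             // a trait may extend A without passing arguments

class C extends A(42)         // the class supplies A's argument
class D1 extends C            // A was already initialized by C, so no argument here
class E extends A(42) with B  // a class must supply A's argument when mixing in B

val fromC: Int  = (new C).x
val fromD1: Int = (new D1).x
val fromE: Int  = (new E).x
```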

7. Non-blocking lazy values


In the current version of Scala, lazy initialization of values (lazy val) is implemented by synchronizing on the enclosing object. This can lead to a deadlock, as in the following example:


    object A {
      lazy val a1 = B.b1
      lazy val a2 = 42
    }
    object B {
      lazy val b1 = A.a2
    }

Suppose two threads start initializing a1 and b1 at the same time; they acquire locks on objects A and B, respectively. Initializing b1 requires a2, which has not yet been initialized in object A. So the second thread waits for the lock on A to be released while still holding the lock on B. Meanwhile, the first thread needs b1, which is unavailable because the second thread holds the lock on B. The result is a deadlock. (This example is taken from Dmitry Petrashko's talk.)

Dotty drops thread-safe initialization for lazy values. Where a value must be safely published for use by multiple threads, the lazy val should be annotated with @volatile:

 @volatile lazy val x = {... some initialization code …} 
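This default was later reversed. As far as we know, lazy vals in released Scala 3 are thread-safe again, and the opt-out is the scala.annotation.threadUnsafe annotation. The sketch below, assuming a released Scala 3 compiler, checks that property: several threads read the same lazy val, and a counter records how often the initializer actually ran. Config and readConcurrently are hypothetical names, and the sleep merely widens the window for a race.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger
import scala.jdk.CollectionConverters.*

// Sketch assuming released Scala 3, where lazy vals are thread-safe by default.
object Config {
  // Counts how many times the lazy initializer actually runs.
  val initCount = new AtomicInteger(0)

  lazy val value: Int = {
    initCount.incrementAndGet()
    Thread.sleep(10) // widen the window in which threads could race
    42
  }
}

// Starts `threads` threads that all read Config.value and returns what each one saw.
def readConcurrently(threads: Int): List[Int] = {
  val seen = new ConcurrentLinkedQueue[Int]()
  val workers = List.fill(threads)(new Thread(() => { seen.add(Config.value); () }))
  workers.foreach(_.start())
  workers.foreach(_.join())
  seen.asScala.toList
}
```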

8. Enumerations


Dotty adds support for enumerations, with syntax modeled on Java's.

 enum Color { case Red, Green, Blue } 

Enumerations are implemented at the parsing stage: the enum construct is desugared into the following form:

    sealed class Color extends Enum
    object Color {
      private def $new(tag: Int, name: String) = {
        new Color {
          val enumTag = tag
          override def toString = name
          // ...
        }
      }
      val Red = $new(0, "Red")
      val Green = $new(1, "Green")
      val Blue = $new(2, "Blue")
    }

As in Java, the Dotty enumerated type also supports parameters:

    enum Color(code: Int) {
      case Red extends Color(0xFF0000)
      case Green extends Color(0x00FF00)
      case Blue extends Color(0x0000FF)
    }

Thus, enumerated types possess all the properties of sealed hierarchies of case classes. In addition, enumerated types allow you to get a value by name, by index, or a collection of all valid values.

    val green = Color.enumValue(1)
    val blue = Color.enumValueNamed("Blue")
    val allColors = Color.enumValues
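The method names above belong to the 2017 Dotty snapshot. In released Scala 3, the same lookups are spelled fromOrdinal, valueOf and values. The sketch below assumes a released Scala 3 compiler and declares code as a val so that it can be read.

```scala
// Sketch assuming released Scala 3 (fromOrdinal / valueOf / values).
enum Color(val code: Int) {
  case Red   extends Color(0xFF0000)
  case Green extends Color(0x00FF00)
  case Blue  extends Color(0x0000FF)
}

val green     = Color.fromOrdinal(1)   // lookup by index
val blue      = Color.valueOf("Blue")  // lookup by name
val allColors = Color.values.toList    // all cases, in declaration order
```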

9. Implicit function types


In current Scala, implicit function parameters are the canonical way to pass an execution context.

    def calculate(a: Int, b: Int)(implicit context: Context): Int = {
      val x = context.getInt("some.configuration.parameter")
      a * x + b
    }

In this example, the context is passed implicitly, its value is taken from the so-called implicit scope.

    implicit val context: Context = createContext()
    val result = calculate(1,2)

Thus, each time we call calculate, we pass only the parameters a and b. For each such call, the compiler substitutes the value of context taken from the corresponding implicit scope. The main problem with this approach shows up when many functions take the same set of implicit parameters: those parameters must be declared in every one of them.

In Dotty, a function that accepts implicit parameters can be represented as a type:

 type Contextual[T] = implicit Context => T 

By analogy with the usual functions that are implementations of the Function type, all implementations of the type implicit A => B will be a subtype of the following trait.

 trait ImplicitFunction1[-T0, +R] extends Function1[T0, R] { def apply(implicit x0: T0): R } 

Dotty provides various definitions of the ImplicitFunction trait, depending on the number of arguments, up to and including 22.

Using the Contextual type, we can rewrite the calculate function as follows:

 def context: Contextual[Context] = implicitly[Context] 

    def calculate(a: Int, b: Int): Contextual[Int] = {
      val x = context.getInt("some.configuration.parameter")
      a * x + b
    }

Here we define a helper, def context, that retrieves the required Context from the environment. The body of calculate has barely changed. The difference is that the context no longer appears in the parameter list, so it does not need to be declared in every function.
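In released Scala 3 this feature became "context functions", and the type implicit Context => T is written Context ?=> T. The sketch below assumes a released Scala 3 compiler. Context, its getInt method and the run helper are hypothetical stand-ins for the article's Context and createContext().

```scala
// Sketch assuming released Scala 3; Context and run are hypothetical stand-ins.
final case class Context(settings: Map[String, Int]) {
  def getInt(key: String): Int = settings(key)
}

type Contextual[T] = Context ?=> T

// Returns whichever Context is available at the call site.
def context: Contextual[Context] = summon[Context]

// No Context in the parameter list: the result type makes one available inside the body.
def calculate(a: Int, b: Int): Contextual[Int] = {
  val x = context.getInt("some.configuration.parameter")
  a * x + b
}

def run(): Int = {
  given Context = Context(Map("some.configuration.parameter" -> 10))
  calculate(1, 2) // the given Context is passed implicitly
}
```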

What is not included in Dotty


To conclude this review, I will describe the elements that have been removed from the language. As a rule, they fall into one of two groups. Some became unnecessary once new, more convenient constructs were introduced. Others became problematic to implement and started causing conflicts, so it was easier to remove them.

Over time, Dotty will become the basis of the new version of Scala, and the version number will most likely be 3.x.x. This means that backward compatibility with 2.x.x versions will not be provided. However, the Dotty team promises special tools to ease migration from 2.x.x to 3.x.x.

1. Type projections


Type projections are constructs of the form T#A, where T can be any type and A is a type member of T. For example:

    trait T {
      type A
      val a: A
      def fun1(x: A): Any
      def fun2(x: T#A): Any
    }

Suppose we have two variables defined:

    val t1: T = new T { … }
    val t2: T = new T { … }

In this case, t1.fun1 accepts only a value of type t1.A, such as t1.a, but not t2.a. The argument of fun2 can be either t1.a or t2.a, because its type is defined as "a value of the type member A of any T".

This construct was removed because it is unsound and can lead to conflicts in combination with type intersections. For example, the code below compiles but throws a ClassCastException at runtime (taken from here):

    object Test {
      trait C { type A }
      type T = C { type A >: Any }
      type U = C { type A <: Nothing }
      type X = T & U
      def main(args: Array[String]) = {
        val y: X#A = 1
        val z: String = y
      }
    }

Instead of type projections it is proposed to use dependent types (path-dependent types) or implicit parameters.
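The sketch below shows one possible way to do without T#A, assuming a Scala 3 compiler. callOwn uses a path-dependent type, so only values of t.A are accepted. describeAny abstracts over the member with a type parameter and a refinement (sometimes called the Aux pattern) instead of a projection. makeT, callOwn and describeAny are hypothetical names, not a prescribed replacement.

```scala
// Sketch assuming Scala 3; makeT, callOwn and describeAny are hypothetical.
trait T {
  type A
  val a: A
  def fun1(x: A): String
}

def makeT(v: Int): T { type A = Int } = new T {
  type A = Int
  val a: Int = v
  def fun1(x: Int): String = s"got $x"
}

val t1 = makeT(1)
val t2 = makeT(2)

// Path-dependent: t.fun1 accepts only values of t.A.
def callOwn(t: T): String = t.fun1(t.a)

// Instead of T#A, name the member with a type parameter X via a refinement.
def describeAny[X](t: T { type A = X }, x: X): String = t.fun1(x)
```

Here describeAny(t1, t2.a) compiles because both members are known to equal Int. Nothing resembling the unsound X#A selection is needed.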

2. Existential types


Existential types express that some unknown type exists as a parameter of another type. We do not care what this type is, only that it exists; hence the name. Such types were added to Scala primarily for compatibility with Java wildcard types. For example, every collection in Java is parameterized, and if we do not care about the type parameter, we can use a wildcard:

 Iterable<?> 

If we do not care about the type of the parameter but know that it is bounded, the type is written as:

 Iterable<? extends Comparable> 

In Scala, these types will be defined as follows:

    Iterable[T] forSome { type T }                    // Iterable<?>
    Iterable[T] forSome { type T <: Comparable[_] }   // Iterable<? extends Comparable>

Scala also supports wildcard syntax:

    Iterable[_]                    // Iterable<?>
    Iterable[_ <: Comparable[_]]   // Iterable<? extends Comparable>

In recent versions these two notations are fully equivalent. The X[T] forSome { type T } form was therefore dropped, since it does not fit the principles of DOT and complicates the compiler. In general, forSome never became widely used because it is quite cumbersome. Nowadays, practically wherever integration with Java types is required, the wildcard syntax is used, and Dotty keeps it.
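The sketch below shows the wildcard form that survives, used with a Java collection and with a bounded element type. It assumes a released Scala 3 compiler, which spells wildcards as ? (the _ form shown above is still accepted). javaSize, lengths and demoJava are hypothetical helpers.

```scala
import java.util.{ArrayList => JArrayList, List => JList}

// Sketch assuming released Scala 3; javaSize, lengths and demoJava are hypothetical.

// Unbounded wildcard, like Java's List<?>: the element type is irrelevant.
def javaSize(xs: JList[?]): Int = xs.size()

// Bounded wildcard, like Java's List<? extends CharSequence>.
def lengths(xs: List[? <: CharSequence]): List[Int] = xs.map(_.length)

def demoJava(): Int = {
  val jl = new JArrayList[String]()
  jl.add("a")
  jl.add("b")
  javaSize(jl)
}
```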

3. Pre-initialization


In Scala, traits cannot have parameters. This causes difficulties when a trait has abstract members on which some of its concrete members depend. Consider the following example:

    trait A {
      val x: Int
      val b = x * 2
    }
    class C extends A {
      val x = 10
    }
    val c = new C

In this case, the value of c.b is 0, not 20, because under Scala's initialization rules the trait body is initialized before the class body. When b is initialized, x has not been assigned yet, so the default value for Int, 0, is used.

To solve this problem, Scala introduced pre-initialization (early initializer) syntax. With it, the bug in the previous example can be fixed:

 class C extends {val x = 10} with A 

The disadvantage of this construction is that here one has to resort to an unobvious solution instead of just taking advantage of polymorphism. With the introduction of parameters for traits, the need for preliminary initializers has disappeared, and now our example can be implemented in a simpler and more understandable way:

    trait A(x: Int) {
      val b = x * 2
    }
    class C extends A(10)
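Both versions can be run side by side to confirm the difference, assuming a Scala 3 compiler. The Scala 3 initialization order for the old style is the same, so C0.b is still 0. A0 and C0 are renamed copies of the first example.

```scala
// Sketch assuming Scala 3; A0/C0 are renamed copies of the original example.

// Old style: the trait body runs before the class body, so x is still 0 when b is computed.
trait A0 {
  val x: Int
  val b = x * 2
}
class C0 extends A0 { val x = 10 }

// Trait parameter: x is already available when the trait body runs.
trait A(x: Int) {
  val b = x * 2
}
class C extends A(10)

val before: Int = (new C0).b
val after: Int  = (new C).b
```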

4. Delayed initialization


Scala has a special trait for deferred initialization.

 trait DelayedInit { def delayedInit(body: => Unit): Unit } 

When a class that implements this trait is initialized, its delayedInit method is called with the class initializer passed as the body parameter. delayedInit can then run that initializer:

    class Test extends DelayedInit {
      def delayedInit(body: => Unit): Unit = {
        println("This is delayedInit body")
        body
      }
      println("This is class body")
    }

Thus, when we create a new Test object, we get the following output:

This is delayedInit body
This is class body

The DelayedInit trait is deprecated in Scala. Dotty removes it from the library completely, because traits can now take parameters. With call-by-name semantics, similar behavior can be achieved:

    trait Delayed(body: => Unit) {
      println("This is delayed body")
      body
    }
    class Test extends Delayed(println("This is class"))


Similarly, when creating a new Test, the output will be:

This is delayed body
This is class
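To check the order of the two messages without reading the console, the sketch below records them in a buffer instead of printing them. It assumes a Scala 3 compiler that accepts a by-name trait parameter, as the Dotty example above does. The log buffer and record helper are hypothetical additions.

```scala
import scala.collection.mutable.ListBuffer

// Sketch assuming Scala 3; log and record are hypothetical stand-ins for println.
val log = ListBuffer.empty[String]
def record(msg: String): Unit = { log += msg }

// body is by-name, so the argument is evaluated only when the trait body reaches it.
trait Delayed(body: => Unit) {
  record("This is delayed body")
  body
}

class Test extends Delayed(record("This is class"))
```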

5. Procedural syntax


To unify the declaration of functions, it was decided to abandon the procedural syntax for defining functions that have a return type of Unit. So instead

    def run(args: List[String]) {
      // Method body
    }

now you need to write

    def run(args: List[String]): Unit = {
      // Method body
    }

It is worth noting that many IDEs, IntelliJ IDEA in particular, can automatically replace procedure syntax with the explicit form. Dotty drops it at the compiler level.

Conclusion


In general, Dotty offers fairly simple and elegant solutions to long-standing problems of Scala development. For example, I once needed to write a method that accepted several types of objects not related through the inheritance hierarchy. I had to use Any as the argument type and then pattern-match on it. In Dotty this could be solved with union types. I also miss trait parameters; in some cases they would be very helpful.

Judging by the talks at the conference, the Scala community is also looking forward to Dotty. In particular, one talk about the Akka framework noted that actors could become typed. The parameter type of the receive method would then be the union of all the message types.

Dotty can already be tried: the Dotty website has instructions on how to install and configure it. However, the authors do not recommend using it in production code, as it is still unstable.

About the author

My name is Alexander Tokarev, and I have been developing server software for over 10 years. I started as a PHP developer, then switched to Java and recently to Scala. Since 2015 I have worked at CleverDATA, where Scala is one of the main development languages, along with Java and Python. We use Scala primarily to build large-scale data processing pipelines with Apache Spark. We also use it to build high-load REST services for interacting with external systems, based on Akka Streams.

Additional materials

  1. Martin Odersky's talk at Scala Days Copenhagen, May 2017: video
  2. The official Scala website
  3. The official Dotty project website
  4. Wikipedia article about the Scala language
  5. DOT calculus
  6. Article "The Essence of Scala"
  7. Martin Odersky's talk about DOT at YOW! Nights, February 2017
  8. Talk by Dmitry Petrashko, one of the developers of Dotty
  9. Higher-Kinded types
  10. Implicit function types

Source: https://habr.com/ru/post/334018/

