A new version of the de facto standard Haskell compiler, GHC 8.2.1, has been released! This release is mostly an incremental improvement, but it still brings a number of interesting new features that affect the convenience of writing code, the expressiveness of the language, and the performance of compiled programs. Let's look at the changes I find most interesting.
Compact regions
This is one of the changes that directly affects performance. You can now mark a data set as one large object (a region), run the garbage collector over it once (compacting it), and from then on treat it as alive as long as there is at least one reference into the region, without the GC descending into it and traversing the object graph on subsequent collections.
This is useful, for example, if a program builds a large data set at startup and then uses it for most of its remaining lifetime. The official description cites a dictionary for a spell checker, where garbage collection time drops by a factor of 1.5, and in some of my tests the time spent in the GC was reduced by a factor of 2-3.
The paper describing the formal semantics and the implementation reports (probably on somewhat more synthetic benchmarks) some frankly crazy numbers (p. 9, figures 7-8), where the gain sometimes reaches an order of magnitude and the Haskell GC starts to overtake a production-ready monster like the Oracle JVM with its polished GC.
It's quite simple to use: the function compact :: a -> IO (Compact a) from the Data.Compact module creates a region from some value, after which you can get the original (but now compacted) value via getCompact :: Compact a -> a. Altogether, it might look something like this:
compacted <- getCompact <$> compact someBigHeavyData
Naturally, when a compact region is created, the object is evaluated almost entirely (more precisely, as far as is needed to establish that the region is closed), so compacting an infinite list, for example, is not a good idea.
In addition, the resulting compact region can be serialized and deserialized. There are caveats, though: the deserializing program must in general be exactly the same as the serializing one, right down to the address space layout, so even ASLR will break everything.
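The usage pattern described above can be sketched as a complete program. This is only a minimal sketch under a few assumptions: it imports GHC.Compact from the ghc-compact package bundled with GHC rather than the Data.Compact module mentioned in the text, and the table value is a hypothetical stand-in for a large, long-lived data set.

```haskell
import GHC.Compact (compact, getCompact)

-- Hypothetical stand-in for a large, long-lived data set.
-- It is finite on purpose: compaction evaluates the whole structure.
table :: [(Int, String)]
table = [(i, show i) | i <- [1 .. 100000]]

main :: IO ()
main = do
  -- Copy the fully evaluated structure into its own compact region.
  region <- compact table
  -- Work with the copy that lives inside the region from now on.
  let table' = getCompact region
  print (lookup 42 table')  -- prints: Just "42"
```

The important detail is to keep using the value obtained from getCompact rather than the original one, since only the copy inside the region is skipped by later collections.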
A small reference to Franklin
If you read the paper mentioned above carefully, you will notice that it introduces a Compactable a class, and that the compact function there has the signature Compactable a => a -> IO (Compact a). In the real API this constraint is missing, and a note in the function's documentation says that if the region contains mutable data or similar non-compactable nasties, an exception is thrown. So it seems that in this case the authors sacrificed type safety for the sake of usability.
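The run-time failure mentioned above can be observed with a small experiment. This is a hedged sketch, not a description of the authors' intent: it assumes GHC.Compact from the ghc-compact package and the CompactionFailed exception exported by Control.Exception in recent versions of base, and compactRefFails is a hypothetical helper name.

```haskell
import Control.Exception (CompactionFailed, try)
import Data.IORef (IORef, newIORef)
import GHC.Compact (Compact, compact)

-- Hypothetical helper: try to compact a mutable reference and
-- report whether the attempt was rejected at run time.
compactRefFails :: IO Bool
compactRefFails = do
  ref <- newIORef (0 :: Int)
  result <- try (compact ref)
              :: IO (Either CompactionFailed (Compact (IORef Int)))
  case result of
    Left err -> do
      putStrLn ("compaction failed: " ++ show err)
      return True
    Right _ -> return False

main :: IO ()
main = do
  failed <- compactRefFails
  putStrLn (if failed then "rejected at run time" else "compacted")
```

Note that the program typechecks without complaint; the problem with the mutable reference only shows up when compact actually runs.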
Deriving strategies
GHC has at least three and a half mechanisms for deriving type class instances:
1. Deriving of standard classes (such as Show, Read, and Eq) and of those that GHC knows how to derive itself (Functor and Traversable, as well as Data, Typeable, and Generic).
2. Deriving via default method implementations, enabled by the DeriveAnyClass extension.
Example
In this case, the declaration
{-# LANGUAGE DeriveAnyClass #-}

class Foo a where
  doFoo :: a -> b
  doFoo = defaultImplementation

data Bar = Bar
  deriving (Foo)
expands to
data Bar = Bar
instance Foo Bar
which is useful if Foo can be derived through the Generics mechanism (as with, say, instances for conversion to JSON with Aeson or to CSV with Cassava), or if the minimal definition of Foo does not require any methods at all (which is handy in more academic code, where a type class serves, say, as a witness to the premises of a theorem).
3. For wrapper types created via newtype, you can also reuse the underlying type's class instances directly through the GeneralizedNewtypeDeriving extension:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype WrappedInt = WrappedInt { unwrap :: Int }
  deriving (Unbox)
The problem is that before GHC 8.2 there was no way to specify which mechanism should be used when several of these extensions are enabled at the same time. For example, with both DeriveAnyClass and GeneralizedNewtypeDeriving enabled, the first extension took priority, which is not always desirable and in fact prevented the use of both extensions in the same module.
Now you can write
{-# LANGUAGE DeriveAnyClass, GeneralizedNewtypeDeriving, DerivingStrategies #-}

newtype Baz = Baz Quux
  deriving (Eq, Ord)
  deriving stock (Read, Show)
  deriving newtype (Num, Floating)
  deriving anyclass C
You can also specify a strategy in standalone deriving-declarations:
data Foo = Foo
deriving anyclass instance C Foo
Interestingly, earlier versions of the proposal suggested using pragma syntax ({-# #-}) for this, but in the end the approach shown above was implemented.
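To make the three strategies concrete, here is a small self-contained sketch. The Describe class and the Meters type are hypothetical names invented for this illustration; they do not come from the GHC documentation.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Hypothetical class with a default method, used only for illustration.
class Describe a where
  describe :: a -> String
  describe _ = "no description"

newtype Meters = Meters Double
  deriving stock (Show)            -- the usual Show, constructor included
  deriving newtype (Eq, Ord, Num)  -- reuse the instances of Double
  deriving anyclass (Describe)     -- empty instance, default method only

main :: IO ()
main = do
  print (Meters 1.5 + Meters 2)    -- prints: Meters 3.5
  putStrLn (describe (Meters 0))   -- prints: no description
```

With the strategies spelled out, both DeriveAnyClass and GeneralizedNewtypeDeriving can be enabled in the same module without any ambiguity about which mechanism produces which instance.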
Other improvements to instance deriving
DeriveAnyClass got smarter. First, it is no longer limited to type classes whose parameter has kind * or * -> *. Second, instance constraints are now inferred from the constraints of the default method implementations. So, for example, the following code did not typecheck before:
{-# LANGUAGE DeriveAnyClass, DefaultSignatures #-}

class Foo a where
  bar :: a -> String
  default bar :: Show a => a -> String
  bar = show

  baz :: a -> a -> Bool
  default baz :: Ord a => a -> a -> Bool
  baz x y = compare x y == EQ

data Option a = None | Some a
  deriving (Eq, Ord, Show, Foo)
since the instance for Foo lacked the (Ord a, Show a) constraints, and the compiler suggested adding them by hand. Now the corresponding constraints are added to the derived instance automatically.
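As a quick check of the inferred constraints, the same declarations can be wrapped into a runnable program. This is just a sketch that assumes GHC 8.2 or later; the main function and the sample values are my own additions.

```haskell
{-# LANGUAGE DeriveAnyClass, DefaultSignatures #-}

class Foo a where
  bar :: a -> String
  default bar :: Show a => a -> String
  bar = show

  baz :: a -> a -> Bool
  default baz :: Ord a => a -> a -> Bool
  baz x y = compare x y == EQ

-- The derived Foo instance gets its Show and Ord constraints inferred
-- from the default signatures above.
data Option a = None | Some a
  deriving (Eq, Ord, Show, Foo)

main :: IO ()
main = do
  putStrLn (bar (Some (1 :: Int)))  -- prints: Some 1
  print (baz (Some 'a') None)       -- prints: False
  print (baz (Some 'a') (Some 'a')) -- prints: True
```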
GeneralizedNewtypeDeriving also got smarter. In some cases (in fact, in most of the practically interesting ones) the associated types of a type class are now derived automatically as well. So, for example, for the types
class HasRing a where
  type Ring a

newtype L1Norm a = L1Norm a
  deriving HasRing
the compiler will generate the instance
instance HasRing (L1Norm a) where
  type Ring (L1Norm a) = Ring a
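The effect of the derived associated type can be checked with a tiny program. This is a sketch under assumptions: the HasRing Int instance and the ringValue binding are hypothetical additions made only so that there is something to evaluate.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies #-}

class HasRing a where
  type Ring a

-- Hypothetical base instance, added only for this illustration.
instance HasRing Int where
  type Ring Int = Int

newtype L1Norm a = L1Norm a
  deriving HasRing

-- Ring (L1Norm Int) reduces to Ring Int, which is Int,
-- so an integer literal typechecks here.
ringValue :: Ring (L1Norm Int)
ringValue = 5

main :: IO ()
main = print ringValue  -- prints: 5
```

If the associated type were not derived, the annotation on ringValue would leave the type family application stuck and the literal would not typecheck.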
Backpack
OCaml adherents now have a little less reason to troll Haskellers: GHC 8.2 has gained a much more advanced module system (compared to what it had before), Backpack. This is a large and complex change in its own right and deserves a separate article, so I will simply refer to the implementation author's dissertation, which contains a formal description, and to a shorter example.
Other
Here is a selection of other changes:
- Inside the compiler itself, the concept of join points has been formalized: blocks of code that always execute after a given branch. This gives a small but statistically significant increase in the performance of compiled code and opens up room for further optimizations.
- Improved performance on NUMA systems.
- Added the ability to allocate fewer threads to the garbage collector than to the program's own mutator. Simon Marlow describes how and why this was implemented, in the context of using Haskell at Facebook, in this post.
- Improvements in support for levity polymorphism, which makes it possible to write functions that work both with types that live in * (the more or less ordinary types we love so much) and with those that live in # (non-lazy unboxed types).
- Improvements to type-safe reflection.
- The ability to use ld.gold or ld.lld instead of the standard linker ld.
- Error messages are now colored and point to the position of the error, in the style of clang.