High kind data

Yes, you have not dreamed, and you have not misheard - it is a high kind. A kind is a term of type theory, meaning in essence the type of [data] type.

But first, some lyrics.

Habré published several articles, which described in detail the method of data validation in functional languages.
')
This article - my five cents in this HYIP. We will look at data validation in Haskell.

Type validation

Previously, an example of a validation technique using type validation was considered:

type EmailContactInfo = String type PostalContactInfo = String data ContactInfo = EmailOnly EmailContactInfo | PostOnly PostalContactInfo | EmailAndPost (EmailContactInfo, PostalContactInfo) data Person = Person { pName :: String, , pContactInfo :: ContactInfo, }

Using this method is simply impossible to create incorrect data. However, despite the fact that such validation is very easy to create and read, using it forces you to write a lot of routine and make many changes to the code. So, the use of such a method is limited only for really important data.

Validation of data of a high kind

In this article, we will look at another validation method — using data of a higher kind.

Suppose we have a data type:

 data Person = Person { pName :: String , pAge :: Int }

And we will validate the data only if all the fields in the record are valid.
Since Haskell is more functional than most functional languages, you can easily get rid of most of the routines.

Here it is possible and therefore this method is widely used among library authors on Haskell.

In order to discuss, let's imagine that we want the user to fill in the identity data through a web form or something else. In other words, it is possible that they may spoil the filling of some of the information without necessarily canceling the rest of the data structure. If they successfully complete the entire structure, we would like to get a filled-in Record Person .

One of the modeling methods is to use the second data type:

 data MaybePerson = MaybePerson  { mpName :: Maybe String  , mpAge :: Maybe Int  }

where, recall, the optional type is used:

 -- already in Prelude data Maybe a = Nothing | Just a

From here, the validation function is quite simple:

 validate :: MaybePerson -> Maybe Person validate (MaybePerson name age) =  Person <$> name <*> age

A little more detail about the functions (<$>) and (<*>)

The function (<$>) is only an infix synonym for the functor fmap

 -- already in Prelude fmap :: Functor f => (a -> b) -> fa -> fb (<$>) :: Functor f => (a -> b) -> fa -> fb (<$>) = fmap

And (<*>) is a function of applying an applicative functor

 -- already in Prelude (<*>) :: Applicative f => f (a -> b) -> fa -> fb

And for the optional type, these functions have the following definition

 -- already in Prelude (<$>) :: (a -> b) -> Maybe a -> Maybe b _ <$> Nothing = Nothing f <$> (Just a) = Just (fa) (<*>) :: Maybe (a -> b) -> Maybe a -> Maybe b (Just f) <*> m = f <$> m Nothing <*> _ = Nothing

Our validation works, but it is annoying to write by hand an additional routine code, since this is done completely mechanically. Moreover, the duplication of these efforts means that we will need to use our brains in the future to make sure that all three definitions remain synchronized. Would it be great if the compiler could handle it?

SURPRISE! HE CAN! We will help the high race!

In Haskell there is such a thing as a genus, it is also a kind , and the simplest and most accurate enough explanation is that a genus is a type of [data] type. The most widely used genus is * , which can be called "finite"

 ghci> :k Int Int :: * ghci> :k String String :: * ghci> :k Maybe Int Maybe Int :: * ghci> :k Maybe String Maybe String :: * ghci> :k [Int] [Int] :: *

And what kind of Maybe ?

 ghci> :k Maybe Maybe :: * -> * ghci> :k [] [] :: * -> *

This is an example of a high kind.

Notice that we can describe both Person and MaybePerson with the following unique data of a high gender:

 data Person' f = Person { pName :: f String , pAge :: f Int }

Here we parameterize Person ' over something f (with the genus * -> * ), which allows us to do the following to use the source types:

 type Person = Person' Identity type MaybePerson = Person' Maybe

Here we use a simple wrapper type of Identity.

 -- already in Prelude newtype Identity a = Identity { runIdentity :: a }

Although this works, it is a bit annoying in the case of Person , since now all our data is wrapped inside Identity :

 ghci> :t pName @Identity pName :: Person -> Identity String ghci> :t runIdentity. pName runIdentity. pName :: Person -> String

We can eliminate this annoyance trivially, after which we consider why it is such a definition of Person ' that is really useful. To get rid of identifiers, we can use the family of types (function at the type level), which erases them:

 {-# LANGUAGE TypeFamilies #-} -- "Higher-Kinded Data" type family HKD fa where HKD Identity a = a HKD fa = fa data Person' f = Person  { pName :: HKD f String  , pAge :: HKD f Int  } deriving (Generic)

Conclusion Generic we need for the 2nd part of the article.

Using the family of HKD types means that GHC automatically erases any Identity wrappers in our views:

 ghci> :t pName @Identity pName :: Person -> String ghci> :t pName @Maybe pName :: Person -> Maybe String

and it is precisely such a version of Person of a high kind that can be best used as a replacement for our original one.

The obvious question is that we bought ourselves with all this work done. Let's go back to the validation wording to help us answer this question.

We can now rewrite it using our new technology:

 validate :: Person' Maybe -> Maybe Person validate (Person name age) = Person <$> name <*> age

Not a very interesting change? But the intrigue is how little you need to change. As you can see, only our type and pattern match our initial implementation. What is neat here is that we have now consolidated Person and MaybePerson into the same representation, and therefore they are no longer related only in a nominal sense.

Generics and more general validation function

The current version of the validation function must be written for each new data type, even though the code is quite routine.

We can write a validation version that will work for any higher data type.

You could use Template Haskell, but it generates code and is used only in extreme cases. We will not.

The secret is to contact GHC.Generics . If you are unfamiliar with the library, it provides an isomorphism from the regular Haskell data type into a general representation that can be structurally controlled by a smart programmer (that is, us.) By providing code for changing the constant types, products and coproducts, we can force GHC write code for us independent of the type. This is a very neat technique that will tickle your toes, if you have not seen it before.

We want to end up with something like:

 validate :: _ => d Maybe -> Maybe (d Identity)

From the point of view of Generics, any type can most generally be divided into several structures:

 -- undefined data, lifted version of Empty data V1 p -- Unit: used for constructors without arguments, lifted version of () data U1 p = U1 -- a container for ac, Constants, additional parameters and recursion of kind * newtype K1 icp = K1 { unK1 :: c } -- a wrapper, Meta-information (constructor names, etc.) newtype M1 itfp = M1 { unM1 :: fp } -- Sums: encode choice between constructors, lifted version of Either data (:+:) fgp = L1 (fp) | R1 (gp) -- Products: encode multiple arguments to constructors, lifted version of (,) data (:*:) fgp = (fp) :*: (gp)

That is, there can be non-liberalized structures, non-argument structures, constant structures, meta-informational (constructors, etc.). As well as associations of structures - total or associations of the type of OR-OR and animated, they are also cortex associations or records.

First we need to determine the class that will be the workhorse of our transformation. By experience, this is always the most difficult part - the types of these generalized transformations are extremely abstract and, in my opinion, very difficult to reason. Let's use:

 {-# LANGUAGE MultiParamTypeClasses #-} class GValidate io where gvalidate :: ip -> Maybe (op)

You can use the “soft and slow” rules for reasoning about how your class type should look, but in general you will need both an input parameter and an output parameter. Both of them must be of the kind * -> * , and then transmit this existentialized p , thanks to dark, unholy reasons that are not known to humanity. Then, using a small checklist, we go through to help wrap our head around this nightmarish hellish landscape, which we will go around in succession later.

In any case, our class is already in our hands, now we just need to write out copies of our class for different types of GHC.Generic . We can start with the base case, which we should be able to check, namely, Maybe k :

 {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE TypeOperators #-} instance GValidate (K1 a (Maybe k)) (K1 ak) where -- gvalidate :: K1 a (Maybe k) -> Maybe (K1 ak) gvalidate (K1 k) = K1 <$> k {-# INLINE gvalidate #-}

K1 is a "constant type", which means that this is where our structural recursion ends. In our Person ' s example, this will be pName :: HKD f String .

In most cases, when you have a base case, the rest are simply mechanically determined instances for other types. If you do not need access to metadata about the original type anywhere, these instances will almost always be trivial homomorphisms.

We can start with multiplicative structures — if we have GValidate io and GValidate i 'o' , we should be able to run them in parallel:

 instance (GValidate io, GValidate i' o') => GValidate (i :*: i') (o :*: o') where gvalidate (l :*: r) = (:*:) <$> gvalidate l <*> gvalidate r {-# INLINE gvalidate #-}

If K1 refers directly to our Person ' selectors, (: * :) roughly corresponds to the syntax with a comma, with which we separate our fields in the record.

We can define a similar instance of GValidate for coproducts or summary structures (the corresponding values are separated in the data definition):

 instance (GValidate io, GValidate i' o') => GValidate (i :+: i') (o :+: o') where gvalidate (L1 l) = L1 <$> gvalidate l gvalidate (R1 r) = R1 <$> gvalidate r {-# INLINE gvalidate #-}

In addition, since we don’t care about finding metadata, we can simply define GValidate io on the metadata constructor:

 instance GValidate io => GValidate (M1 _a _b i) (M1 _a' _b' o) where gvalidate (M1 x) = M1 <$> gvalidate x {-# INLINE gvalidate #-}

Now there are uninteresting structures for a complete description. We will provide them with the following trivial copies for non-residential types ( V1 ) and for designers without any parameters ( U1 ):

 instance GValidate V1 V1 where gvalidate = undefined {-# INLINE gvalidate #-} instance GValidate U1 U1 where gvalidate U1 = Just U1 {-# INLINE gvalidate #-}

Using undefined is safe here because it can only be called with a value of V1 . Fortunately for us, V1 is uninhabited and uninitialized, so this can never happen, so we are morally right in our use of undefined .

Without further ado, now that we have this whole mechanism, we can finally write a non-generic version of validation:

 {-# LANGUAGE FlexibleContexts #-} validate :: ( Generic (f Maybe) , Generic (f Identity) , GValidate (Rep (f Maybe)) (Rep (f Identity)) ) => f Maybe -> Maybe (f Identity) validate = fmap to . gvalidate . from

Every time you can get a wide smile, when the signature for the function is longer than the actual implementation; this means that we hired a compiler to write code for us. What is important for validation here is that there is no mention of Person ' ; this function will work for any type defined as high data. Voila!

Results

This is all for today, guys. We got acquainted with the idea of high-order data, saw how this is completely equivalent to the type of data defined in a more traditional way, and also caught a glimpse of what things are possible with this approach.

It allows you to do all sorts of amazing things, such as: generate lenses for arbitrary data types without resorting to Pattern Haskel; sequence by data type; and automatically track dependencies for using record fields.

Happy application of high delivery!

Original: Higher-Kinded Data

Source: https://habr.com/ru/post/429104/

All Articles

High kind data

Type validation

Validation of data of a high kind

Generics and more general validation function

Results

More articles: