Writing our own monad in Scala, using a CSV parser as the example

Lately we have all learned a lot about monads. We have figured out what they are, we even know how to draw them, and we have seen talks explaining what they are for. So I decided to jump on the departing monad train and write about the topic before it finally goes mainstream. But I will come at it from a slightly different angle: there will be no category theory, no snippets in the very best language, and not even the scalaz / shapeless or parser-combinators libraries. As you know, the best way to figure out how something works is to build it yourself. Today we will write our own monad.

Task

Take a trivial task as an example: parsing a CSV file. Suppose we need to parse the lines of the file into case classes, and then send them to a database, serialize them to JSON / protobuf, and so on. Let's forget about escaping and quotes; to keep things even simpler, we assume the delimiter character never occurs inside a field. I think that if someone decides to drag this solution into their project, adding that feature will not be difficult.

Suppose we have the following CSV file:

 1997;Ford;E350;ac, abs, moon;3000.00
 1996; Jeep; Grand Cherokee; MUST SELL! air, moon roof, loaded; 4799.00
 1999;Chevy;Venture "Extended Edition"; ; 4900.00

We need to deserialize it into a set of objects of the following type:

 case class Car(year: Int, mark: String, model: String, comment: String, price: BigDecimal)

Obvious approach

To have something to compare against, I first have to show a real-life example, one that the use of monads should make clearer, more pleasant, more reliable, and so on.

Suppose the contents of the file are already loaded into the content variable:

 val lines = content.split('\n')
 val entities = lines.map { line =>
   line.split(';').map(_.trim) match {
     case Array(year, mark, model, comment, price) =>
       Car(year.toInt, mark, model, comment, BigDecimal(price))
   }
 }.toSeq

Cons of the approach:

Mixing the logic of converting field types and constructing the entity itself.
Boilerplate pattern matching: as the number of fields grows, the code rapidly loses readability.
Cases where the number of fields does not match the expected one, where the line is too long, and so on have to be handled explicitly.

Pros:

Straightforward: no additional layers of abstraction.
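
The third drawback is easy to reproduce. The sketch below wraps the same pattern match in scala.util.Try to show what happens on malformed input. The parseLine helper and the two sample lines are made up for illustration and are not part of the original code.

```scala
import scala.util.Try

object NaiveParsing {
  case class Car(year: Int, mark: String, model: String, comment: String, price: BigDecimal)

  // Same pattern match as in the naive version, extracted into a
  // hypothetical helper so that it can be called on a single line.
  def parseLine(line: String): Car =
    line.split(';').map(_.trim) match {
      case Array(year, mark, model, comment, price) =>
        Car(year.toInt, mark, model, comment, BigDecimal(price))
    }

  // Only four fields: the Array pattern does not match, so a MatchError is thrown.
  val tooShort: Try[Car] = Try(parseLine("1997;Ford;E350;3000.00"))

  // Five fields, but the year is not a number: toInt throws NumberFormatException.
  val badYear: Try[Car] = Try(parseLine("abc;Ford;E350;ac;3000.00"))
}
```

Both values end up as a Failure, and nothing in the signature of the naive code hints that this can happen.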

Monad parser

I suggest looking at the task from the other side.

Imagine that we start with one piece of raw data. In our case it is a line from a file, but that does not really matter: it could be a byte array, a list of words, an iterator, anything we can get data from.
Suppose we parse each record in several stages, each of which parses one specific field. For each stage we can then fix the result: the value of this field (hereinafter the word) plus the rest of the raw data (hereinafter the remainder), from which later stages will extract the following fields. Or will not, if the field was the last one.
For brevity, we will call the function that performs one such stage a "handler".
In the end, all that is left is to combine the results of these stages into the final entity.

Returning to the code, the handler of each stage should have a declaration like:

 def parse[T, Src]: Src => (T, Src)
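
As a rough illustration of that shape, here is one possible handler where Src is a list of already split fields and T is Int. The name intFromList and the choice of List[String] as the source are assumptions made for this sketch only; the article itself uses a String as the source.

```scala
object HandlerSketch {
  // Hypothetical handler of type Src => (T, Src):
  // takes the first field as an Int and returns the remaining fields.
  val intFromList: List[String] => (Int, List[String]) = {
    case head :: tail => (head.trim.toInt, tail)
    case Nil          => throw new NoSuchElementException("no fields left")
  }

  // One stage of parsing: we get the word and the remainder.
  val (word, rest) = intFromList(List("1997", "Ford", "E350"))
}
```

The remainder is exactly what the next handler would receive as its input.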

Now a little about the monads themselves.

In a nutshell, a monad can be described as a container containing a value + some context.
Syntactically, in the case of Scala, this means that the monad must have the flatMap method, generally declared as:

 def flatMap[B](f: A => M[B]): M[B]  // a method of M[A]

If the argument of f is the value stored in the container, then what is the context? Here it is: although f takes only one argument, we can call another flatMap from inside a flatMap, and the inner flatMap has access to every value declared in the outer one, that is, to all the previous words.
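
The same effect can be seen with a monad from the standard library. This small sketch uses Option rather than our parser; the values year and mark are invented for the example.

```scala
object ContextSketch {
  val year: Option[Int] = Some(1997)
  val mark: Option[String] = Some("Ford")

  // The inner function receives only m as its argument,
  // but y from the outer flatMap is still in scope.
  val label: Option[String] =
    year.flatMap { y =>
      mark.flatMap { m =>
        Some(s"$y $m")
      }
    }
}
```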

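Note that a monad is not required to define the map method separately, since it can be derived from the other monad operations. We will define it anyway: a for-comprehension with yield calls map on its last generator, and it will also be useful for building modified parsers from ones already defined.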

You also need an operation that wraps a plain value into the monad. It is not a class method: it can be a constructor call or the apply method of the companion object. There is no strict requirement here, so for convenience I suggest defining apply.

Let's implement a monad that contains a parse function like the one defined above, and see how it lets us combine different parsers.

So we need to write a class that encapsulates the parsing of a field of a particular type. The requirements:

It implements the flatMap method.
It implements the map method.
Its companion object defines the apply operation.
It exposes an interface method that the final client code calls and whose declaration contains no unnecessary details.

 class Parser[T, Src](private val p: Src => (T, Src)) {
   def flatMap[M](f: T => Parser[M, Src]): Parser[M, Src] = Parser { src =>
     val (word, rest) = p(src)
     f(word).p(rest)
   }
   def map[M](f: T => M): Parser[M, Src] = Parser { src =>
     val (word, rest) = p(src)
     (f(word), rest)
   }
   def parse(src: Src): T = p(src)._1
 }

So what happens in the flatMap method?
We apply the current parser's handler to the input and get a word and a remainder. We then pass the word to f, the method's argument, which puts it in scope for all subsequent parsers in the chain, and run the parser that f returns on the remainder.

The map method is much simpler: we apply its argument, the function f, to the current word and leave the remainder unchanged.

And here is the companion object with the point operation, which is the apply method, which in turn is what gets called when you write the object name followed by parentheses:

 object Parser {
   def apply[T, Src](f: Src => (T, Src)) = new Parser[T, Src](f)
 }
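
To trace what flatMap and map do on concrete data, here is a minimal sketch that repeats the class and chains two stages by hand. The firstWord parser, the List[String] source and the pure helper are assumptions added for illustration. Note that apply in the article wraps a handler function; a hypothetical pure shows what wrapping a plain value without consuming any input could look like.

```scala
object ParserTrace {
  class Parser[T, Src](private val p: Src => (T, Src)) {
    def flatMap[M](f: T => Parser[M, Src]): Parser[M, Src] = Parser { src =>
      val (word, rest) = p(src) // run this stage
      f(word).p(rest)           // run the next stage on the remainder
    }
    def map[M](f: T => M): Parser[M, Src] = Parser { src =>
      val (word, rest) = p(src)
      (f(word), rest)           // transform the word, keep the remainder
    }
    def parse(src: Src): T = p(src)._1
  }

  object Parser {
    def apply[T, Src](f: Src => (T, Src)) = new Parser[T, Src](f)

    // Hypothetical helper, not in the article: wraps a plain value
    // and leaves the source untouched.
    def pure[T, Src](value: T): Parser[T, Src] =
      Parser[T, Src](src => (value, src))
  }

  // Hypothetical stage: takes the first element of a list of words.
  val firstWord: Parser[String, List[String]] =
    Parser[String, List[String]](src => (src.head, src.tail))

  // Two stages in a row: the second one sees the remainder of the first,
  // and the inner function sees the word a of the outer one.
  val pair: Parser[(String, String), List[String]] =
    firstWord.flatMap(a => firstWord.map(b => (a, b)))

  val result = pair.parse(List("Ford", "E350", "unused"))
  val constant = Parser.pure[Int, List[String]](42).parse(Nil)
}
```

Here result holds the first two words as a pair, and constant is simply the wrapped value, since pure never looks at its input.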

Application

So what? What advantages does this approach give us, apart from the undoubted increase of your authority among colleagues unfamiliar with monads? Now we will see.

Using the abstraction suggested above, we finally write our innovative, functional, type-safe CSV parser.

Writing the field parsers

To begin with, we implement a parser of one field of type String.

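 val separator = ';' // field delimiter; not defined in the original snippet

 def StringField = Parser[String, String] { str =>
   val idx = str.indexOf(separator)
   // trim the word so that padded fields such as " 4799.00" convert cleanly
   if (idx > -1) (str.substring(0, idx).trim, str.substring(idx + 1))
   else (str.trim, "")
 }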

Nothing complicated, right?

Now let's see how to define an Int type parser based on StringField.
Even easier!

 def IntField = StringField.map(_.toInt)

Similarly for all the rest:

 def BigDecimalField = StringField.map(BigDecimal(_))
 def BooleanField = StringField.map(_.toBoolean)

Putting it all together

So far we have only looked at parsers for individual fields, but how do we assemble these fields into a single entity? This is where the context comes in. Thanks to it, the values obtained by earlier parsers can be used by the parsers that follow.

So, the construction of the final entity parser will look like this:

 val parser = for {
   year <- IntField
   mark <- StringField
   model <- StringField
   comment <- StringField
   price <- BigDecimalField
 } yield Car(year, mark, model, comment, price)

In my opinion it looks very cool.
If you do not feel completely confident with for-comprehension syntactic sugar, this is roughly how it looks as a chain of flatMaps:

 IntField.flatMap { year =>
   StringField.flatMap { mark =>
     StringField.flatMap { model =>
       StringField.flatMap { comment =>
         BigDecimalField.map { price =>
           Car(year, mark, model, comment, price)
         }
       }
     }
   }
 }

It looks a little worse, of course, but it makes obvious which contexts we were talking about: they are the scopes delimited by the curly braces.

We now have the parser parser; all that remains is to feed the source file line by line to its parse method and collect the result. For example:

 val result = content.split('\n').map(parser.parse)

Result:

 Array(Car(1997,Ford,E350,ac, abs, moon,3000.00), Car(1996,Jeep,Grand Cherokee,MUST SELL! air, moon roof, loaded,4799.00), Car(1999,Chevy,Venture "Extended Edition",,4900.00))
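
For anyone who wants to run the whole thing, the pieces can be assembled into one self-contained sketch. The wrapping object CsvDemo and the value names content and cars are assumptions made for this example. Note that StringField here trims each word, which the padded fields in the sample file require.

```scala
object CsvDemo {
  case class Car(year: Int, mark: String, model: String, comment: String, price: BigDecimal)

  class Parser[T, Src](private val p: Src => (T, Src)) {
    def flatMap[M](f: T => Parser[M, Src]): Parser[M, Src] = Parser { src =>
      val (word, rest) = p(src)
      f(word).p(rest)
    }
    def map[M](f: T => M): Parser[M, Src] = Parser { src =>
      val (word, rest) = p(src)
      (f(word), rest)
    }
    def parse(src: Src): T = p(src)._1
  }

  object Parser {
    def apply[T, Src](f: Src => (T, Src)) = new Parser[T, Src](f)
  }

  val separator = ';'

  // Splits off everything up to the first separator and trims it.
  def StringField = Parser[String, String] { str =>
    val idx = str.indexOf(separator)
    if (idx > -1) (str.substring(0, idx).trim, str.substring(idx + 1))
    else (str.trim, "")
  }
  def IntField = StringField.map(_.toInt)
  def BigDecimalField = StringField.map(BigDecimal(_))

  val parser = for {
    year <- IntField
    mark <- StringField
    model <- StringField
    comment <- StringField
    price <- BigDecimalField
  } yield Car(year, mark, model, comment, price)

  // The sample file from the beginning of the article.
  val content =
    """1997;Ford;E350;ac, abs, moon;3000.00
      |1996; Jeep; Grand Cherokee; MUST SELL! air, moon roof, loaded; 4799.00
      |1999;Chevy;Venture "Extended Edition"; ; 4900.00""".stripMargin

  val cars: List[Car] = content.split('\n').map(parser.parse).toList
}
```

This sketch still has no error handling: a malformed number throws an exception, just as in the naive version.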

Pros

The final parser is described beautifully and concisely: its declaration makes the types and order of the fields in the file easy to see, and it is easy to change and test.
You are a cool expert who knows a lot about FP, can handle monads, and are generally the most fashionable person ~~on the block~~ in the open-plan office.

Cons

The presence of a generalized entity with not the most obvious logic, especially for those who are not very comfortable with these monads of yours or who have recently moved over from Java.

Summary

Monads and other category-theory constructs in Scala are not something you cannot live without. Moreover, the language itself hardly imposes them at all. In essence, being a monad in Scala is a small ad-hoc contract: fulfil it and you get to use your classes in a for-comprehension. And that is all.

Nevertheless, the flexibility of the language and the ability to implement rather clever constructs in it fairly easily is an absolute plus, and it frees your hands for experiments.

As to whether such constructions belong in production code: I do not know, that is for each team to decide. I would probably first move them into separate libraries, cover them with tests, and test them in every possible way (although we know, of course, that for real functional programmers everything works without tests). And for logic that is needed here and now, I would rather use a more straightforward implementation.

Source: https://habr.com/ru/post/326002/
