
ZuriHac: practicing functional programming

In June of this year, the small Swiss town of Rapperswil hosted an event called ZuriHac for the tenth time. This time it brought together more than five hundred Haskell enthusiasts, from novices to the founding fathers of the language. Although the organizers call the event a hackathon, it is neither a conference nor a hackathon in the classical sense, and its format differs from what programmers are used to. We learned about ZuriHac by happy coincidence, took part in it, and now consider it our duty to tell you about this unusual find!




About Us


This article was prepared by two third-year students of the “Applied Mathematics and Computer Science” program at HSE St. Petersburg: Vasily Alferov and Elizaveta Vasilenko. For both of us, the passion for functional programming began with a series of lectures by D. N. Moskvin in our second year. Vasily is currently participating in the Google Summer of Code program, implementing algebraic graphs in Haskell under the guidance of the Alga project team. Elizaveta applied her functional programming skills in a term paper on the implementation of an anti-unification algorithm, with a subsequent application in type theory.

Event format


The target audience is the maintainers of open-source projects, programmers who want to participate in their development, functional programming researchers, and people who are simply keen on Haskell. This year the venue, the HSR Hochschule für Technik Rapperswil, gathered developers from over fifty open-source Haskell projects from around the world, who came to talk about their products and get fresh people interested in contributing to them.



Photos from the ZuriHac Twitter account

The scheme is very simple: you write a few sentences about your project in advance and send them to the organizers, who publish the information on the event page. In addition, on the first day the project authors get thirty seconds on stage to say very briefly what they are doing and what needs to be done. Afterwards, interested people find the authors and ask about the tasks in detail.

We do not have open-source projects of our own yet, but we really wanted to contribute to existing ones, so we registered as regular participants. For three days we worked with two groups of developers. It turns out that studying code together and communicating in person makes the interaction between project authors and contributors very productive: at ZuriHac we managed to dig into areas that were new to us and helped two completely different teams, closing a task in each of their projects.

In addition to valuable practice, ZuriHac also offered several lectures and workshops. Two lectures stood out for us. In the first, Andrey Mokhov from Newcastle University spoke about selective applicative functors, a type class that sits between applicative functors and monads. In the other, one of the creators of Haskell, Simon Peyton Jones, explained how type inference works in the GHC compiler.
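We cannot retell the whole talk here, but the core idea is easy to sketch. The class below follows Andrey Mokhov's "Selective Applicative Functors" paper; the Maybe instance and the examples are our own illustration, not code from the talk:

```haskell
-- The Selective class from the "Selective Applicative Functors" paper.
class Applicative f => Selective f where
  -- Run the first effect; apply the function from the second effect
  -- only if the first one produced a Left.
  select :: f (Either a b) -> f (a -> b) -> f b

-- Any monad can be made selective: inspect the first result and
-- skip the second effect entirely when it is not needed.
instance Selective Maybe where
  select x y = x >>= \e -> case e of
    Left a  -> fmap ($ a) y
    Right b -> pure b

main :: IO ()
main = do
  -- Left: the second effect is needed, so (+10) is applied.
  print (select (Just (Left 1)) (Just (+10)))
  -- Right: the second effect is skipped, even though it is Nothing.
  print (select (Just (Right 5)) Nothing :: Maybe Int)
```

Unlike a monadic bind, both arguments of select are given up front, so all possible effects of a computation are known statically, which is what makes the "intermediate between applicative functors and monads" position useful.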



Lecture by Simon Peyton Jones. Photo from the ZuriHac Twitter account

The workshops held during the hackathon were divided into three categories depending on the participants' level of experience. The tasks offered to those who joined a project's development were also annotated with a difficulty level. The small but friendly community of functional programmers gladly welcomes newcomers to its ranks. To understand the lectures by Andrey Mokhov and Simon Peyton Jones, however, the functional programming course we took at university proved very useful.

Registration for the event is free both for regular participants and for project authors. We submitted our applications in early June and were quite quickly moved from the waiting list to the list of confirmed participants.

And now let us talk about the projects whose development we took part in.

Pandoc


Pandoc is a universal text document converter: in effect, from any format to any other. For example, from docx to pdf, or from Markdown to MediaWiki. Its author, John MacFarlane, is a professor of philosophy at the University of California, Berkeley. Pandoc is quite famous, and some of our friends were surprised to learn that it is written in Haskell.



List of document formats supported by Pandoc. The site also has a whole graph, but this image does not fit into the article.

Of course, Pandoc does not implement a direct conversion for each pair of formats. To support such an extensive set of transformations, a standard architectural solution is used: first the entire document is translated into a special internal intermediate representation, and then a document in a different format is generated from that internal representation. The developers call this internal representation the "AST", which stands for Abstract Syntax Tree. Looking at the intermediate representation is very simple: you just need to set the output format to "native":

 $ cat example.html
 <h1>Hello, World!</h1>
 $ pandoc -f html -t native example.html
 [Header 1 ("hello-world",[],[]) [Str "Hello,",Space,Str "World!"]]

Readers who have worked with Haskell even a little can already guess from this short example that Pandoc is written in Haskell: the output of this command is a string representation of Pandoc's internal structures, produced in the same way Haskell values are usually shown, for example, by the derived Show instances in the standard library.

So here you can see that the internal representation is a recursive structure with a list in each internal node. For example, at the top level there is a list of one element: a first-level header with the attributes ("hello-world", [], []). Inside this header sits a list consisting of the string "Hello,", a space, and the string "World!".

As you can see, the internal representation is not that different from HTML. It is a tree where each internal node carries some information about the formatting of its descendants, and the leaves contain the actual content of the document.

If you go down to a concrete implementation level, the data type for the entire document is defined like this:

 data Pandoc = Pandoc Meta [Block] 

Here Block represents the internal nodes mentioned above, and Meta is the document's meta-information, such as its title, creation date, and authors. It differs from format to format, and Pandoc tries to preserve this information whenever possible when converting between formats.

Almost all Block constructors, for example Header or Para (a paragraph), take attributes and a list of lower-level nodes as arguments, as a rule of type Inline. For example, Space and Str are constructors of the Inline type, and an HTML tag is also converted into its own special Inline. We see no point in giving the complete definitions of these types here; note, however, that they can be found in the pandoc-types package.
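To give a flavor, here is a heavily simplified sketch of what these types look like; the real definitions in pandoc-types have many more constructors and a richer attribute type:

```haskell
-- Simplified sketch; the real types live in the pandoc-types package.
type Attr = (String, [String], [(String, String)])  -- id, classes, key-value pairs

data Inline
  = Str String   -- a fragment of text
  | Space        -- a space between words
  deriving (Eq, Show)

data Block
  = Header Int Attr [Inline]  -- level, attributes, contents
  | Para [Inline]             -- a paragraph
  deriving (Eq, Show)

-- The document from the "native" example above:
example :: [Block]
example = [Header 1 ("hello-world", [], []) [Str "Hello,", Space, Str "World!"]]

main :: IO ()
main = print example
```

Printing example yields exactly the "native" output shown earlier, which is simply the derived Show representation of these structures.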

Interestingly, the Pandoc type is a monoid. This means there is an empty document, and that documents can be glued together. This is convenient when writing Readers: you can split a document into parts using arbitrary logic, parse each part separately, and then combine everything into one document. Along the way, the meta-information is collected from all the parts of the document at once.
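Sketched with toy types (the real Semigroup and Monoid instances are defined in pandoc-types), the gluing works roughly like this; Map.union here merely stands in for however pandoc actually merges metadata:

```haskell
import qualified Data.Map as Map

-- Toy stand-ins for the real pandoc-types definitions.
newtype Meta = Meta (Map.Map String String) deriving (Eq, Show)
data Block = Para String deriving (Eq, Show)
data Pandoc = Pandoc Meta [Block] deriving (Eq, Show)

instance Semigroup Meta where
  Meta a <> Meta b = Meta (Map.union a b)  -- merge metadata
instance Monoid Meta where
  mempty = Meta Map.empty

-- Gluing documents concatenates their blocks and merges their metadata.
instance Semigroup Pandoc where
  Pandoc m1 b1 <> Pandoc m2 b2 = Pandoc (m1 <> m2) (b1 ++ b2)
instance Monoid Pandoc where
  mempty = Pandoc mempty []  -- the empty document

part1, part2 :: Pandoc
part1 = Pandoc (Meta (Map.fromList [("title", "Demo")])) [Para "First part"]
part2 = Pandoc mempty [Para "Second part"]

main :: IO ()
main = print (part1 <> part2)
```

A Reader can thus parse document fragments independently and fold the results together with mconcat.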

When converting, say, from LaTeX to HTML, a special module called the LaTeX Reader first converts the input document to the AST, and then another module, the HTML Writer, converts the AST to HTML. Thanks to this architecture there is no need to write a quadratic number of conversions: it is enough to write a Reader and a Writer for each new format, and all possible conversion pairs are supported automatically.
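The scheme can be expressed in a few lines. Everything below is a toy model (the real pandoc functions, such as readLaTeX, take options and run in a PandocMonad), but the shape is the same: n Readers plus n Writers instead of n² converters:

```haskell
-- A toy AST standing in for the real Pandoc type.
newtype Pandoc = Pandoc [String] deriving (Eq, Show)

type Reader = String -> Pandoc  -- input format  -> AST
type Writer = Pandoc -> String  -- AST -> output format

-- Any Reader composed with any Writer gives a converter.
convert :: Reader -> Writer -> String -> String
convert reader writer = writer . reader

-- Toy "formats": whitespace-separated words in, one word per line out.
readWords :: Reader
readWords = Pandoc . words

writeLines :: Writer
writeLines (Pandoc xs) = unlines xs

main :: IO ()
main = putStr (convert readWords writeLines "a b c")
```

Adding a new format means writing one new Reader and one new Writer; every existing format can then be converted to and from it for free.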

It is clear that such an architecture has its drawbacks, long predicted by experts in software architecture. The most significant is the cost of making changes to the syntax tree. If a change is serious enough, the code in all the Readers and Writers has to be modified. For example, one of the tasks facing the Pandoc developers is support for complex table formats. For now, Pandoc can handle only the simplest tables, with a header, columns, and a value in each cell; the colspan attribute in HTML, say, is simply ignored. One reason for this is the lack of a unified scheme for representing tables across all, or at least many, formats; accordingly, it is unclear in what form tables should be stored in the internal representation. But even after a specific representation is chosen, absolutely every Reader and Writer that supports tables will have to be changed.

The Haskell language was chosen not only out of the authors' great love for functional programming. Haskell is known for its extensive text-processing capabilities. One example is parsec, a library that actively uses functional programming concepts (monoids, monads, applicative and alternative functors) for writing arbitrary parsers. The full power of Parsec can be seen in the HaskellWiki example, where a complete parser for a simple imperative programming language is built. Naturally, Parsec is also actively used in Pandoc.

In short, monads are used for sequential parsing, when one thing comes first and then another. For example, in this snippet:

 whileParser :: Parser Stmt
 whileParser = whiteSpace >> statement

first the whitespace must be consumed, and then a statement, which also has the type Parser Stmt.

Alternative functors are used for rollback in case parsing fails. For example,

 statement :: Parser Stmt
 statement = parens statement <|> sequenceOfStmt

means that we should either try to read a statement in parentheses, or try to read several statements in sequence.
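Here is a small self-contained Parsec illustration of the same idea (our own toy example, not Pandoc code): if the left alternative fails without consuming input, the parser falls back to the right one. (When the failing branch may have already consumed input, Parsec additionally requires try to roll back.)

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- An item is either a number or a bare word.
item :: Parser String
item = many1 digit <|> many1 letter

parseItem :: String -> Either ParseError String
parseItem = parse item ""

main :: IO ()
main = do
  print (parseItem "123")  -- Right "123"
  print (parseItem "abc")  -- Right "abc"
```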

Applicative functors are mainly used as shortcuts for monads. For example, suppose the tok function reads some kind of token (this is a real function from the LaTeX Reader). Look at this combination:

 const <$> tok <*> tok 

It reads two tokens in a row and returns the first one.
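The same trick can be tried on a toy Parsec parser; here anyChar plays the role of tok, since the real tok is internal to the LaTeX Reader:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A stand-in for the LaTeX Reader's tok: read any single character.
tok :: Parser Char
tok = anyChar

-- Read two tokens and keep the first, as described in the text above.
firstOfTwo :: Parser Char
firstOfTwo = const <$> tok <*> tok

main :: IO ()
main = print (parse firstOfTwo "" "ab")  -- Right 'a'
```

Both tokens are consumed from the input, but const discards the second result.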

All these classes come with nice symbolic operators in Haskell, which makes programming a Reader look a bit like ASCII art. Just admire this wonderful code.

Our tasks were related to the LaTeX Reader. Vasily's task was to support the \mbox and \hbox commands, which are useful for writing LaTeX packages. Elizaveta's job was to support the \epigraph command, which allows writing epigraphs in LaTeX documents.

Hatrace


UNIX-like operating systems commonly implement the ptrace system call. It is useful for debugging and for simulating program environments, as it lets you monitor the system calls a program makes. For example, the very useful strace utility uses ptrace internally.

Hatrace is a library that provides a Haskell interface to ptrace. The thing is, ptrace itself is extremely convoluted, and using it directly is quite difficult, especially from functional languages.

When run from the command line, hatrace behaves like strace and takes similar arguments. Its difference from strace is that it is also a library, one that provides a simpler interface than raw ptrace.

With hatrace, one nasty bug has already been caught in the Haskell compiler GHC: when killed at the wrong moment, it produces corrupt object files and does not recompile them on restart. Scripting the system calls made it possible to reproduce the bug in a single run, whereas killing the compiler at random moments took about two hours to trigger it.

We added system call interfaces to the library: Elizaveta added brk, and Vasily added mmap. Thanks to this work, the arguments of these system calls can now be accessed more simply and precisely through the library.

Source: https://habr.com/ru/post/460537/

