
About Parboiled

Part 1. Why Parboiled?


Today, with functional programming languages rapidly gaining popularity, parser combinators, tools that make parsing accessible to mere mortals, are being used more and more. Libraries such as Parsec (Haskell) and Planck (OCaml) have already proven themselves in their ecosystems. Their convenience and capabilities once prompted the creator of the Scala language, Martin Odersky, to add an analogue, Scala Parser Combinators (now moved out to scala-modules), to the standard library, and knowing how to use such tools is listed among the mandatory requirements for level A3 Scala developers.

This series of articles is devoted to the Parboiled library, a powerful alternative to, and possible replacement for, Scala Parser Combinators. In it we will look in detail at the current version of the library, Parboiled2, and also pay some attention to Parboiled1, since most existing code still uses it.

Series structure:


Introduction


Parboiled is a library that makes it easy to parse markup languages (such as HTML, XML or JSON), programming languages, configuration files, logs, text protocols, and pretty much any other text. Parboiled will come in handy if you want to develop your own domain-specific language (DSL): with its help you can quickly obtain an abstract syntax tree and, recalling the Interpreter pattern, execute the commands of your domain language.
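To make the "syntax tree plus Interpreter pattern" idea a bit more concrete, here is a minimal sketch in plain Scala. It does not use Parboiled at all: the names Expr, Num, Add, Mul and Interpreter are made up for illustration, and the tree is built by hand where a real parser would produce it from the input text.

```scala
// Hypothetical AST for a tiny arithmetic DSL.
// These names are illustrative only; they are not part of Parboiled.
sealed trait Expr
final case class Num(value: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class Mul(left: Expr, right: Expr) extends Expr

object Interpreter {
  // Interpreter pattern: walk the tree recursively and compute the result.
  def eval(expr: Expr): Int = expr match {
    case Num(value)       => value
    case Add(left, right) => eval(left) + eval(right)
    case Mul(left, right) => eval(left) * eval(right)
  }
}

object InterpreterSketch {
  def main(args: Array[String]): Unit = {
    // The tree a parser might build for the input "1 + 2 * 3"
    // (constructed by hand here, since no parser is involved).
    val ast: Expr = Add(Num(1), Mul(Num(2), Num(3)))
    println(Interpreter.eval(ast)) // prints 7
  }
}
```

The parser's only job in such a setup is to turn text into the Expr tree; everything that gives the DSL its meaning lives in eval, so the two parts can be developed and tested separately.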

At the moment there are several versions of this library:


I wrote this article with an emphasis on Parboiled2 (from here on I will refer to it by name, without adding the word "library"), but now and then I will digress to point out important differences between the first and second versions.

Main features


Brief description of Parboiled2:


In practice, this means:


New in version two


This section will be useful and understandable mainly to those who have already worked with the first version of the library. Beginners should probably return to this list after reading the entire series.

First of all, Parboiled2 successfully eliminates a number of teething problems of the first version:


In addition:


Performance comparisons


Parboiled1 is known for being slow (at least compared with parsers generated by ANTLR), because all rule matching was performed at runtime and the compiler could not apply any significant optimizations to such a parser. In Parboiled2, performance was made a priority and much of the library was reimplemented with macros, so the compiler gained a lot of freedom to optimize, and users got the long-awaited speed. The results the developers achieved are shown below.

Parboiled vs. skillfully hand-written JSON parsers


Parboiled is a general-purpose tool for building parsers, and, as you know, a specialized tool always beats a general-purpose one at its specialized task. In the Java world there are a handful of JSON parsers handcrafted by ancient elven masters, and Alexander Myltsev (one of the Parboiled2 developers) tested how much performance Parboiled loses to these artifacts. The results were quite encouraging, especially for Parboiled2.

Benchmark             │ Time, ms │
──────────────────────┼──────────┼───────────────────────────────
Parboiled1JsonParser  │    85.64 │ ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
Parboiled2JsonParser  │    13.17 │ ▇▇▇▇
Json4SNative          │     8.06 │ ▇▇▇
Argonaut              │     7.01 │ ▇▇
Json4SJackson         │     4.09 │ ▇

Parboiled vs regular expressions


Thanks to static optimizations, Parboiled2 can run much faster than regular expressions (at least those bundled with the Java class library). Here is some evidence from the mailing list:

Benchmark                            │ Time, ms │
─────────────────────────────────────┼──────────┼─────────────────────────────────
Parboiled2 (warmup)                  │  1621.21 │ ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
Parboiled2                           │   409.16 │ ▇▇▇▇▇▇▇▇
Parboiled2 w/ better types (warmup)  │   488.92 │ ▇▇▇▇▇▇▇▇▇▇
Parboiled2 w/ better types           │   134.68 │ ▇▇▇
Regex (warmup)                       │   621.95 │ ▇▇▇▇▇▇▇▇▇▇▇▇
Regex                                │   620.38 │ ▇▇▇▇▇▇▇▇▇▇▇▇

Parboiled vs. Scala Parser Combinators


On the mailing list you can find another performance test, which agrees well with the first one (the JSON benchmark) and also includes data for Scala Parser Combinators. For them, everything is very, very sad.

Benchmark             │ Time, ms │
──────────────────────┼──────────┼───────────────────────────────
Parboiled1JsonParser  │    73.81 │ ▇
Parboiled2JsonParser  │    10.49 │ ▎
ParserCombinators     │  2385.78 │ ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇

What Parboiled can't do


Most articles about parser combinators start with exhausting explanations of what PEG is, how it is used, and why it should be feared. To parse configs you do not need a thorough understanding of all this, but you should still be aware of the limitations of this type of grammar. So, here is what Parboiled fundamentally cannot do:


In the next part, I'll explain how a custom grammar is described in Parboiled, and we will write a simple recognizer for a tree-like configuration file format.

Source: https://habr.com/ru/post/270233/

