Create a parser for ini-files in Haskell

In this article, I will show how to write your own ini-file parser in Haskell. As a basis I will take the context-free grammar built in my previous article. The parser is built with the Parsec library, which lets you create your own parsers by combining ready-made primitive parsers using parser combinators.



Important: This article assumes that the reader is familiar with the basics of Haskell. If that is not the case, I advise you to first read a couple of articles for beginners (they can be found on Habr, among other places).



Grammar



First, let us recall which grammar for ini-files we built in the previous article:

inidata = spaces, {section} .

section = "[", ident, "]", stringSpaces, "\n", {entry} .

entry = ident, stringSpaces, "=", stringSpaces, value, "\n", spaces .

ident = identChar, {identChar} .

identChar = letter | digit | "_" | "." | "," | ":" | "(" | ")" | "{" | "}" | "-" | "#" | "@" | "&" | "*" | "|" .

value = {not "\n"} .

stringSpaces = {" " | "\t"} .

spaces = {" " | "\t" | "\n" | "\r"} .



We will need this description soon.



Haskell and Parsec



Start by installing Parsec (you can get it from the official website or look for ready-made packages for your OS). The installation process differs between systems, so I will not describe it here.

I will try to describe the process of creating a parser in Haskell in detail. Let's start by importing the necessary modules. In addition to the standard System.Environment (for reading the command-line arguments), Data.Char (for the isSpace function) and Data.List (for the find function), you need to import the Parsec module, Text.ParserCombinators.Parsec.

1 module Main where
2
3 import System.Environment
4 import Data.Char
5 import Data.List
6 import Text.ParserCombinators.Parsec



We define the data types: an entry is a key-value pair, a section is a pair of a name and a list of entries, and the whole ini-file is a list of sections.

8 type Entry = (String, String)
9 type Section = (String, [Entry])
10 type IniData = [Section]
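To make these types concrete, here is a small sketch of the value a parsed file might produce. The sample content and the names sampleIni and sectionNames are made up for illustration; they are not part of the parser itself.

```haskell
type Entry   = (String, String)
type Section = (String, [Entry])
type IniData = [Section]

-- Hypothetical value corresponding to an ini-file such as:
--   [App]
--   Name=Firefox
--   ID=42
--   [Gecko]
--   MinVersion=1.9
sampleIni :: IniData
sampleIni =
  [ ("App",   [("Name", "Firefox"), ("ID", "42")])
  , ("Gecko", [("MinVersion", "1.9")])
  ]

-- Names of all sections, in file order (illustrative helper).
sectionNames :: IniData -> [String]
sectionNames = map fst
```

In GHCi, sectionNames sampleIni evaluates to ["App","Gecko"]. Since everything is built from pairs and lists, ordinary list functions work on the parsed data without any extra machinery.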



Now we will translate the grammar from Backus-Naur notation into Haskell. Let's start with inidata.

12 inidata = spaces >> many section >>= return



Let me explain what is written here: inidata consists of spaces (a primitive parser from the Parsec library), followed by (expressed with the monadic operator >>) many sections, whose values we return (>>= return).

What does returning values mean? The task of a parser is not only to check that the data conforms to the grammar, but also to convert the data into some structured form. In our case, this is the IniData type. The many function is a parser combinator that, given a parser for some nonterminal A, builds a parser for {A}.
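To see >> and many in isolation, here is a minimal sketch with a toy parser. The names manyAs and runManyAs are hypothetical and exist only for this example. Note that by the monad laws m >>= return is the same as m, so the trailing >>= return can be dropped without changing the result.

```haskell
import Text.ParserCombinators.Parsec

-- Toy parser: skip leading whitespace, then read zero or more 'a' characters.
-- The result of `spaces` is discarded by >>; the result of `many` is returned.
manyAs :: Parser String
manyAs = spaces >> many (char 'a')

-- Run the toy parser on a string; "example" is only a label for error messages.
runManyAs :: String -> Either ParseError String
runManyAs = parse manyAs "example"
```

In GHCi, runManyAs "  aaab" gives Right "aaa", and runManyAs "b" gives Right "" because many also accepts zero repetitions.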



Now let's translate the nonterminal section into Haskell. section is much more complicated than inidata, so I will write it in do-notation.

14 section = do
15     char '['
16     name <- ident
17     char ']'
18     stringSpaces
19     char '\n'
20     spaces
21     el <- many entry
22     return (name, el)



This code is an almost literal translation of the nonterminal section from Backus-Naur notation. The only addition is spaces in line 20, which skips any whitespace between the header and the first entry. The char function creates a primitive parser that parses one character. Pay attention to lines 16, 21 and 22. In line 16 we save the value of the ident nonterminal (the section name), and in line 21 we save the list of entries that follow the section header. In line 22 we return the section name and the list of entries (this corresponds to the Section type).



Now for the entries.

24 entry = do
25     k <- ident
26     stringSpaces
27     char '='
28     stringSpaces
29     v <- value
30     spaces
31     return (k, v)



If you understood how we built the parser for a section, this one should cause no problems. In short: in lines 25 and 29 we save the parameter name and its value, and in line 31 we return a pair composed of them (which corresponds to the Entry type).



Now let's write the nonterminal for the identifier. We will use the fact that Parsec has the combinator many1, which lets us merge the nonterminals identChar and ident into one (we could not do this in Backus-Naur notation, because it has no such construct).

32 ident = many1 (letter <|> digit <|> oneOf "_.,:(){}-#@&*|" ) >>= return . trim



The combinator many1 means that the identifier consists of at least one character. The <|> operator corresponds to the character "|" in Backus-Naur notation. letter and digit are primitive parsers for letters and digits, respectively. Applied to a string, the oneOf function is equivalent to (char '_' <|> char '.' <|> ...). Note also that the resulting string is trimmed (with the trim function) before it is returned.



We do the same with the nonterminal for the value, but use the noneOf parser, which is the inverse of oneOf.
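The difference between many1 and many is easiest to see on input that does not start with an identifier character. The sketch below reuses the character set from ident but leaves out the trim step; identChars and tryIdent are hypothetical names used only here.

```haskell
import Text.ParserCombinators.Parsec

-- Same character set as ident, without trimming the result.
identChars :: Parser String
identChars = many1 (letter <|> digit <|> oneOf "_.,:(){}-#@&*|")

-- Just the matched prefix on success, Nothing on a parse error.
tryIdent :: String -> Maybe String
tryIdent = either (const Nothing) Just . parse identChars "example"
```

In GHCi, tryIdent "user.name = x" gives Just "user.name" (parsing stops at the first space), while tryIdent "= x" gives Nothing, because many1 requires at least one matching character.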



34 value = many (noneOf "\n") >>= return . trim





One last nonterminal remains: stringSpaces (the nonterminal spaces already exists in Parsec).

36 stringSpaces = many (char ' ' <|> char '\t')
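The grammar pieces can be assembled and tried out in GHCi before any of the command-line code exists. The following is a sketch under a few assumptions: the type signatures and the parseIni wrapper are additions of this example, trim is copied from the helper defined later in the article, and >>= return . trim is written as the equivalent fmap trim.

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.Parsec

type Entry   = (String, String)
type Section = (String, [Entry])
type IniData = [Section]

inidata :: Parser IniData
inidata = spaces >> many section

section :: Parser Section
section = do
  _ <- char '['
  name <- ident
  _ <- char ']'
  _ <- stringSpaces
  _ <- char '\n'
  spaces
  el <- many entry
  return (name, el)

entry :: Parser Entry
entry = do
  k <- ident
  _ <- stringSpaces
  _ <- char '='
  _ <- stringSpaces
  v <- value
  spaces
  return (k, v)

ident :: Parser String
ident = fmap trim (many1 (letter <|> digit <|> oneOf "_.,:(){}-#@&*|"))

value :: Parser String
value = fmap trim (many (noneOf "\n"))

stringSpaces :: Parser String
stringSpaces = many (char ' ' <|> char '\t')

-- Helper from later in the article: strip whitespace at both ends.
trim :: String -> String
trim = f . f
  where f = reverse . dropWhile isSpace

-- Hypothetical convenience wrapper; "ini" is only a label for error messages.
parseIni :: String -> Either ParseError IniData
parseIni = parse inidata "ini"
```

For example, parseIni "[App]\nName = Firefox\nID=42\n\n[Gecko]\nMinVersion=1.9\n" yields Right [("App",[("Name","Firefox"),("ID","42")]),("Gecko",[("MinVersion","1.9")])]. Because inidata does not end with eof, input that follows the last parsable section is silently ignored rather than reported.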



That completes the grammar. It remains to define a few helper functions and, of course, main itself.



The trim function removes extra whitespace at the beginning and end of a string.

38 trim = f . f
39     where f = reverse . dropWhile isSpace
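It may not be obvious why applying f twice works, so here is the same function with a type signature (an addition of this sketch) and a step-by-step trace in the comments.

```haskell
import Data.Char (isSpace)

-- Each application of f drops leading whitespace and reverses the string.
-- The first pass strips the front; the second pass strips what used to be
-- the end and restores the original order.
--
--   "  key \t"  --dropWhile isSpace-->  "key \t"  --reverse-->  "\t yek"
--   "\t yek"    --dropWhile isSpace-->  "yek"     --reverse-->  "key"
trim :: String -> String
trim = f . f
  where f = reverse . dropWhile isSpace
```

In GHCi, trim "  key \t" gives "key", and whitespace inside the string is left alone: trim " a b " gives "a b".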



The split function splits text into lines at the delim separator; the separator itself stays at the end of each line.

41 split delim = foldr f [[]]
42     where
43         f x rest@(r:rs)
44             | x == delim = [delim] : rest
45             | otherwise = (x : r) : rs
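Because foldr works from the right, the behaviour of split is easier to follow on a tiny input. The sketch below adds a type signature and a catch-all clause for the empty accumulator; both are additions of this example, and the extra clause is never reached because the fold starts from [[]].

```haskell
-- Split a list at each delimiter, keeping the delimiter at the end of its chunk.
split :: Eq a => a -> [a] -> [[a]]
split delim = foldr f [[]]
  where
    f x rest@(r:rs)
      | x == delim = [delim] : rest   -- start a new chunk that ends with delim
      | otherwise  = (x : r) : rs     -- prepend x to the current chunk
    f _ [] = []                       -- unreachable: accumulator is never empty

-- Trace for split '\n' "a\nb\n", processing characters right to left:
--   start        [""]
--   '\n'         ["\n", ""]
--   'b'          ["b\n", ""]
--   '\n'         ["\n", "b\n", ""]
--   'a'          ["a\n", "b\n", ""]
```

So split '\n' "a\nb\n" gives ["a\n","b\n",""], and a blank line shows up as a chunk that is just "\n": split '\n' "a\n\nb" gives ["a\n","\n","b"]. That is why a line filter built on top of split can recognise empty lines by their first character.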



The removeComments function removes comments and empty lines: it breaks the text into lines, drops those that begin with ";" or "\n", and then glues the remaining lines back together.

47 removeComments = foldr (++) [] . filter comment . split '\n'
48     where comment [] = False
49           comment (x : _) = (x /= ';') && (x /= '\n')
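For comparison, the same filtering can be sketched with the Prelude functions lines and unlines instead of a hand-written split. This is an alternative, not the code used in this article, and removeComments' is a hypothetical name. One behavioural difference: unlines always terminates the last line with "\n", even if the input did not.

```haskell
-- Alternative sketch: drop empty lines and lines starting with ';'.
-- `lines` strips the newline from each line, so an empty line is just [].
removeComments' :: String -> String
removeComments' = unlines . filter keep . lines
  where
    keep []      = False
    keep (x : _) = x /= ';'
```

In GHCi, removeComments' "; comment\n[App]\n\nID=1\n" gives "[App]\nID=1\n". As with the version above, only lines whose very first character is ";" are treated as comments, so an indented comment or a comment after a value is left in place.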



The findValue function searches IniData for a parameter value by section name and parameter name (the computation runs in the Maybe monad). First we find the section by name, and then among the entries of that section we find the required parameter. If at some point nothing is found, the function simply returns Nothing.

51 findValue ini s p = do
52     el <- find (\x -> fst x == s) ini
53     v <- find (\x -> fst x == p) (snd el)
54     return $ snd $ v
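Since both IniData and the entry list are lists of pairs, the same search can also be sketched with the Prelude function lookup, which returns the value paired with the first matching key. The name findValue' and the sample data are assumptions of this example.

```haskell
type Entry   = (String, String)
type Section = (String, [Entry])
type IniData = [Section]

-- Look up the section first, then the parameter inside it.
-- Either step may produce Nothing, which >>= propagates.
findValue' :: IniData -> String -> String -> Maybe String
findValue' ini s p = lookup s ini >>= lookup p

-- Made-up data for trying the function out.
sample :: IniData
sample = [("App", [("Name", "Firefox"), ("ID", "42")])]
```

In GHCi, findValue' sample "App" "ID" gives Just "42", while findValue' sample "App" "IDD" and findValue' sample "Gecko" "ID" both give Nothing.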





Now for the last step: the main function.



56 main = do
57     args <- getArgs
58     prog <- getProgName
59     if (length args) /= 3
60         then putStrLn $ "Usage: " ++ prog ++ " <file.ini> <section> <parameter>"
61         else do
62             file <- readFile $ head args
63             [s,p] <- return $ tail args
64             lns <- return ( removeComments file )
65             case (parse inidata "some text" lns) of
66                 Left err -> putStr "Parse error: " >> print err
67                 Right x -> case (findValue x s p) of
68                     Just x -> putStrLn x
69                     Nothing -> putStrLn "Can't find requested parameter"
70             return ()



Everything here is as simple as in good old C. In lines 57-58 we get the arguments and the program name. If there are not exactly 3 arguments, we print the usage message. If the arguments are fine, we read the file (62) and remove the comments (64).

Now we need to run the parser. For this there is the parse function (65), which takes the main nonterminal, a name for the text (used in error messages) and the text itself. The parse function returns either an error description (Left, 66) or the parsed data (Right, 67). If parsing succeeds, we search the data for an entry by section name and parameter name (67). The search returns either the value found (Just, 68), which we print, or nothing (Nothing, 69), in which case we print an error message.
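The two outcomes of parse can be tried on a much smaller parser. In this sketch, header and describe are hypothetical names, and "demo.ini" is just the label that parse puts into error messages; no file is read.

```haskell
import Text.ParserCombinators.Parsec

-- Toy parser for a bracketed name such as "[App]".
header :: Parser String
header = do
  _ <- char '['
  name <- many1 letter
  _ <- char ']'
  return name

-- Turn both outcomes of parse into a printable message.
describe :: String -> String
describe input =
  case parse header "demo.ini" input of
    Left err   -> "Parse error: " ++ show err
    Right name -> "Section: " ++ name
```

In GHCi, describe "[App]" gives "Section: App". For input such as "App", the result starts with "Parse error: " followed by Parsec's description of the failure, which mentions the label demo.ini and the position where parsing stopped.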



Now the code is complete. Let's compile it and run it on a test sample.

$ ghc --make ini.hs -o ini_hs
[1 of 1] Compiling Main ( ini.hs, ini.o )
Linking ini_hs ...

$ ./ini_hs /usr/lib/firefox-3.0.5/application.ini App ID
{ec8030f7-c20a-464f-9b0e-13a3a9e97384}

$ ./ini_hs /usr/lib/firefox-3.0.5/application.ini App IDD
Can't find requested parameter





I hope that this article will help you write your own parser =)



An interesting note: you can compare the parser from this article with the C++ parser from the article “Create a parser for ini-files in C++”.



P.S. Thank you for helping move this post to the Haskell blog.

Source: https://habr.com/ru/post/50337/


