
Creating a parser for ini files in C++

In this article I will show how to write your own ini-file parser in C++. As a basis we take the context-free grammar built in my previous article. The parser is built with the Boost Spirit library, which lets you assemble your own parsers from ready-made primitive parsers using parser combinators.

Important: this article assumes that the reader knows the basics of C++, including active use of the STL. If you are not confident in them, I advise reading a couple of articles for C++ and STL beginners first.


Grammar


First, let us recall which grammar for ini-files we built in the previous article:
inidata = spaces, {section} .
section = "[", ident, "]", stringSpaces, "\n", {entry} .
entry = ident, stringSpaces, "=", stringSpaces, value, "\n", spaces .
ident = identChar, {identChar} .
identChar = letter | digit | "_" | "." | "," | ":" | "(" | ")" | "{" | "}" | "-" | "#" | "@" | "&" | "*" | "|" .
value = {not "\n"} .
stringSpaces = {" " | "\t"} .
spaces = {" " | "\t" | "\n" | "\r"} .

We will need this description shortly.

C++ and Boost Spirit



Start by installing Boost (you can get it from the official website or look for ready-made packages for your OS). There is no need to build Boost, since all of Spirit lives in headers. The installation process differs from system to system, so I will not describe it here.

I will try to describe the process of creating a parser in C++ in detail. I will not pay particular attention to performance, since that is not the purpose of this article.

Let's start by including the necessary headers.
1 #include <algorithm>   // find_if, remove_if
2 #include <fstream>
3 #include <functional>
4 #include <iostream>    // cout, endl
5 #include <list>
6 #include <numeric>     // accumulate
7 #include <string>
8 #include <vector>
9
10 #include <boost/spirit.hpp>
11 #include <boost/algorithm/string.hpp>
12 using namespace std;
13 using namespace boost::spirit;

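In addition to the Spirit header itself, I included the Boost string algorithms library (I will use its trim function). A using namespace directive is not always good practice, but I will allow it here for brevity. Note that <boost/spirit.hpp> is the header of the original Spirit API; as far as I know, newer Boost releases provide this API as Spirit Classic in <boost/spirit/include/classic.hpp> with the namespace boost::spirit::classic, so the include and the using directive may need adjusting.

We define the data types: a record is a key-value pair, a section is a pair of a name and a list of records, and the whole ini file is a list of sections.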
14 typedef pair<string, string> Entry;
15 typedef list<Entry > Entries;
16 typedef pair<string, Entries> Section;
17 typedef list<Section> IniData;
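
To make the nesting of these types concrete, here is a small sketch that builds the in-memory form of a two-line file by hand. The sample data and the function name make_sample are invented for the illustration; the typedefs are the same as above, written with explicit std:: qualification.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

typedef std::pair<std::string, std::string> Entry;
typedef std::list<Entry> Entries;
typedef std::pair<std::string, Entries> Section;
typedef std::list<Section> IniData;

// Builds the data for this hypothetical file:
//   [App]
//   Name=Firefox
IniData make_sample()
{
    IniData data;
    data.push_back(Section("App", Entries()));               // new, empty section
    assert(!data.empty());                                   // back() needs a non-empty list
    data.back().second.push_back(Entry("Name", "Firefox"));  // one record in that section
    return data;
}
```

Reading a value back means walking the same path: data.front().first is the section name, and data.front().second.front().second is the value of its first record.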

Besides the data types, we need event handlers that are called each time the parser recognizes the corresponding non-terminal.
19 struct add_section
20 {
21     add_section( IniData & data ) : data_(data) {}
22
23     void operator ()( char const * p, char const * q ) const
24     {
25         string s(p, q);
26         boost::algorithm::trim(s);
27         data_.push_back( Section( s, Entries() ) );
28     }
29
30     IniData & data_;
31 };
32
33 struct add_key
34 {
35     add_key( IniData & data ) : data_(data) {}
36
37     void operator ()( char const * p, char const * q ) const
38     {
39         string s(p, q);
40         boost::algorithm::trim(s);
41         data_.back().second.push_back( Entry( s, string() ) );
42     }
43
44     IniData & data_;
45 };
46
47 struct add_value
48 {
49     add_value( IniData & data ) : data_(data) {}
50
51     void operator ()( char const * p, char const * q ) const
52     {
53         data_.back().second.back().second.assign(p, q);
54     }
55
56     IniData & data_;
57 };


Event handlers are functors that receive a piece of the input string (as a pair of pointers).
The add_section functor is called when the parser recognizes the next section, and it receives the name of that section. The add_key functor is called when the parser recognizes the name of a new parameter. The add_value functor is called when the parser recognizes the value of the parameter. Together these functors fill IniData step by step: first an empty section is added (add_section), then an Entry with an empty value is added to that section (add_key), and finally the value is filled in (add_value).
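
The order of the calls can be tried out without Spirit at all. The sketch below uses simplified stand-ins for the three handlers (trimming is omitted) and calls them by hand with the pointer ranges a parser would report for a hypothetical two-line input. The function name simulate_parse and the input text are assumptions made for the illustration.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

typedef std::pair<std::string, std::string> Entry;
typedef std::list<Entry> Entries;
typedef std::pair<std::string, Entries> Section;
typedef std::list<Section> IniData;

// Simplified stand-ins for the article's handlers (no trimming).
struct add_section
{
    explicit add_section(IniData& data) : data_(data) {}
    void operator()(char const* p, char const* q) const
    {
        data_.push_back(Section(std::string(p, q), Entries()));
    }
    IniData& data_;
};

struct add_key
{
    explicit add_key(IniData& data) : data_(data) {}
    void operator()(char const* p, char const* q) const
    {
        assert(!data_.empty());  // a key must follow a section header
        data_.back().second.push_back(Entry(std::string(p, q), std::string()));
    }
    IniData& data_;
};

struct add_value
{
    explicit add_value(IniData& data) : data_(data) {}
    void operator()(char const* p, char const* q) const
    {
        assert(!data_.empty() && !data_.back().second.empty());  // a value must follow a key
        data_.back().second.back().second.assign(p, q);
    }
    IniData& data_;
};

// Calls the handlers in the order a parser would for the text below.
IniData simulate_parse()
{
    static char const text[] = "[App]\nName=Firefox\n";
    IniData data;
    add_section on_section(data);
    add_key on_key(data);
    add_value on_value(data);
    on_section(text + 1, text + 4);   // "App"
    on_key(text + 6, text + 10);      // "Name"
    on_value(text + 11, text + 18);   // "Firefox"
    return data;
}
```

The asserts make the implicit contract visible: add_key relies on a section already being present, and add_value relies on a key already being present.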

Now we will translate the grammar from Backus-Naur notation into C++. To do this, we create a special class, inidata_parser.
59 struct inidata_parser : public grammar<inidata_parser>
60 {
61     inidata_parser(IniData & data) : data_(data) {}
62
63     template <typename ScannerT>
64     struct definition
65     {
66         rule<ScannerT> inidata, section, entry, ident, value, stringSpaces, spaces;
67
68         rule<ScannerT> const & start() const { return inidata; }
69
70         definition(inidata_parser const & self)
71         {
72             inidata = *section;
73
74             section = ch_p('[')
75                 >> ident[add_section(self.data_)]
76                 >> ch_p(']')
77                 >> stringSpaces
78                 >> ch_p('\n')
79                 >> spaces
80                 >> *(entry);
81
82             entry = ident[add_key(self.data_)]
83                 >> stringSpaces
84                 >> ch_p('=')
85                 >> stringSpaces
86                 >> value[add_value(self.data_)]
87                 >> spaces;
88
89
90             ident = +(alnum_p | chset<>("-_.,:(){}#@&*|"));
91
92             value = *(~ch_p('\n'));
93
94             stringSpaces = *blank_p;
95
96             spaces = *space_p;
97         }
98
99     };
100
101     IniData & data_;
102 };

This class encapsulates the entire grammar. Let's go through it in more detail. Line 59 shows that the parser inherits from the grammar class template using CRTP (the curiously recurring template pattern), which Spirit requires in order to work. The constructor takes a reference to an empty IniData and stores it (61). Inside the parser you must define a nested template struct named definition (63-64). Its data members of type rule are the parsers for the individual non-terminals of our Backus-Naur grammar (66). It must also define a member function start that returns a reference to the main non-terminal, inidata (68).
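
For readers who have not met CRTP before, here is a minimal sketch of the pattern in isolation. It is not Spirit's actual grammar class: the names grammar_base, toy_grammar, start_rule_name and crtp_demo are invented for the illustration, and the only point is to show how a base class can call into the derived class without virtual functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class: it receives the derived type as a template argument.
template <typename DerivedT>
struct grammar_base
{
    // The call into the derived class is resolved at compile time.
    std::string describe() const
    {
        DerivedT const& self = static_cast<DerivedT const&>(*this);
        return "grammar starting at " + self.start_rule_name();
    }
};

// The derived class passes itself to the base, just as
// inidata_parser passes itself to grammar<inidata_parser>.
struct toy_grammar : grammar_base<toy_grammar>
{
    std::string start_rule_name() const { return "inidata"; }
};

// Small self-check of the sketch.
void crtp_demo()
{
    toy_grammar g;
    assert(g.describe() == "grammar starting at inidata");
}
```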

The grammar itself is described in the definition constructor, where it is rewritten in C++ almost verbatim. inidata consists of any number of sections (72); this is expressed by an asterisk (the Kleene star, but written as a prefix operator). A section starts with a square bracket; for this we use the built-in parser ch_p, which matches a single character. Instead of the comma from Backus-Naur notation, use the >> operator. The event handler functor is written in square brackets after the expression it is attached to (75, 82, 86). The "\n" at the end of an entry is not matched explicitly: value stops in front of it, and the trailing spaces rule consumes it. A prefix "+" means "at least one", and "~" means negation. alnum_p is the built-in parser for letters and digits. chset<> matches any character from the given string (it is important that the minus comes first, otherwise it is treated as a range sign, as in "a-z"). blank_p matches a space or a tab, and space_p matches any whitespace character (including line feed and carriage return).

Note that the non-terminals ident and identChar have been merged into one thanks to the "+" operator; in the Backus-Naur notation used above this was not possible, since it has no such operator.
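
To make the meaning of the prefix "+" concrete, here is a hand-written sketch of what the ident rule does, without Spirit. The helper names is_ident_char, match_ident and ident_demo are invented for the illustration, and the character test only approximates alnum_p by using std::isalnum.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// True for the characters allowed by identChar in the grammar.
bool is_ident_char(char c)
{
    if (std::isalnum(static_cast<unsigned char>(c)))
        return true;
    // strchr would also "find" the terminating '\0', so exclude it explicitly.
    return c != '\0' && std::strchr("-_.,:(){}#@&*|", c) != 0;
}

// One-or-more repetition, like +(...) in Spirit: returns the end of the
// matched identifier, or a null pointer if not even one character matches.
char const* match_ident(char const* first, char const* last)
{
    char const* p = first;
    while (p != last && is_ident_char(*p))
        ++p;
    return p == first ? 0 : p;
}

// Small self-check of the sketch.
void ident_demo()
{
    char const s[] = "Max-Size = 10";
    char const* end = s + sizeof(s) - 1;    // exclude the terminator
    assert(match_ident(s, end) == s + 8);   // matches "Max-Size"
    assert(match_ident(s + 8, end) == 0);   // a space is not an identChar
}
```

A zero-or-more repetition (the prefix "*") would differ only in the last line of match_ident: it would return p even when nothing was consumed.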

That completes the grammar. What remains is to learn how to strip comments and how to look up values in IniData.
To strip comments we need a small predicate functor.
104 struct is_comment { bool operator ()( string const & s ) const { return s[0] == '\n' || s[0] == ';'; } };
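
The predicate is meant to be used with the erase-remove idiom. The sketch below shows it on a few hypothetical lines; strip_comments is a helper name invented here, and each line is assumed to be already trimmed and terminated with "\n", which is why an empty line starts with "\n".

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct is_comment
{
    bool operator()(std::string const& s) const
    {
        assert(!s.empty());  // assumption: every stored line ends with '\n'
        return s[0] == '\n' || s[0] == ';';
    }
};

// Returns the lines with blank lines and ';' comments removed.
std::vector<std::string> strip_comments(std::vector<std::string> lines)
{
    // remove_if moves the kept lines to the front; erase cuts off the rest.
    lines.erase(std::remove_if(lines.begin(), lines.end(), is_comment()),
                lines.end());
    return lines;
}
```

For the input lines "; comment\n", "[App]\n", "\n" and "ID=1\n", only "[App]\n" and "ID=1\n" remain.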

Now let's write the search function in IniData.
106 struct first_is
107 {
108     first_is(std::string const & s) : s_(s) {}
109
110     template <class Pair>
111     bool operator ()(Pair const & p) const { return p.first == s_; }
112
113     string const & s_;
114 };
115
116 bool find_value( IniData const & ini, string const & s, string const & p, string & res )
117 {
118     IniData::const_iterator sit = find_if(ini.begin(), ini.end(), first_is(s));
119     if (sit == ini.end())
120         return false;
121
122     Entries::const_iterator it = find_if(sit->second.begin(), sit->second.end(), first_is(p));
123     if (it == sit->second.end())
124         return false;
125
126     res = it->second;
127     return true;
128 }

Instead of the first_is functor you could use boost::bind, but I decided not to mix everything together. The search is simple: first we look for the section by name in the list, then we look for the parameter by name in that section's list of records, and if both are found we return the parameter's value through the reference parameter.
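
If a C++11 compiler is available (an assumption; the article's code itself does not need one), the same two-step search can be written with lambdas instead of first_is or boost::bind. The function name find_value_lambda, the demo function and its sample data are invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <utility>

typedef std::pair<std::string, std::string> Entry;
typedef std::list<Entry> Entries;
typedef std::pair<std::string, Entries> Section;
typedef std::list<Section> IniData;

// Same lookup as find_value, with lambdas in place of first_is.
bool find_value_lambda(IniData const& ini, std::string const& section,
                       std::string const& key, std::string& res)
{
    // Step 1: find the section by name.
    auto sit = std::find_if(ini.begin(), ini.end(),
        [&section](Section const& s) { return s.first == section; });
    if (sit == ini.end())
        return false;

    // Step 2: find the record by key inside that section.
    auto it = std::find_if(sit->second.begin(), sit->second.end(),
        [&key](Entry const& e) { return e.first == key; });
    if (it == sit->second.end())
        return false;

    res = it->second;
    return true;
}

// Usage on hypothetical data.
void lookup_demo()
{
    IniData ini;
    ini.push_back(Section("App", Entries()));
    ini.back().second.push_back(Entry("ID", "42"));

    std::string res;
    assert(find_value_lambda(ini, "App", "ID", res) && res == "42");
    assert(!find_value_lambda(ini, "App", "IDD", res));   // unknown key
    assert(!find_value_lambda(ini, "Gecko", "ID", res));  // unknown section
}
```

Both searches are linear, which matches the list-based data types above; for large files a map keyed by name would be the usual alternative.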

All that remains is to write main.
130 int main( int argc, char ** argv )
131 {
132     if ( argc != 4 )
133     {
134         cout << "Usage: " << argv[0] << " <file.ini> <section> <parameter>" << endl;
135         return 0;
136     }
137
138     ifstream in( argv[1] );
139     if ( !in )
140     {
141         cout << "Can't open file \"" << argv[1] << '\"' << endl;
142         return 1;
143     }
144
145     vector<string> lns;
146
147     std::string s;
148     while ( std::getline( in, s ) ) // stop as soon as a read fails
149     {
150         // strip surrounding whitespace, including a trailing '\r'
151         boost::algorithm::trim( s );
152         lns.push_back( s += '\n' );
153     }
154     lns.erase( remove_if( lns.begin(), lns.end(), is_comment() ), lns.end() );
155     string text = accumulate( lns.begin(), lns.end(), string() );
156
157     IniData data;
158     inidata_parser parser( data ); // Our parser
159     BOOST_SPIRIT_DEBUG_NODE( parser );
160
161     parse_info<> info = parse( text.c_str(), parser, nothing_p );
162     if ( !info.hit )
163     {
164         cout << "Parse error\n";
165         return 1;
166     }
167
168     string res;
169     if ( find_value( data, argv[2], argv[3], res ) )
170         cout << res;
171     else
172         cout << "Can't find requested parameter";
173     cout << endl;
174 }


Lines 132-136 check the program arguments: if there are not 4 of them, we print the usage message. If the arguments are fine, we open the file (138-143). If the file opened successfully, we create an array of strings lns (145) and read the whole file into it (147-153). After that we remove the comments from it using the is_comment functor (154). Finally, we glue all the lines into a single string (155).

In lines 157-159 the parser is created and initialized. Now we run it with the parse function, which takes the text itself, the parser, and a special parser for skipped characters (for example, we might want to skip all whitespace). In our case the skip parser is empty: nothing_p, a parser that never matches anything. The result of parse is a parse_info<> structure. We are interested in its boolean field hit, which is true if the parser matched; the separate field full additionally tells whether the whole input was consumed. Lines 162-166 report an error if one occurred. All that remains is to find the parameter given on the command line and print its value (168-173).
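
The preprocessing step (read, trim, drop comments, join) can be tried separately from the parser. The sketch below is a standard-library-only variant that reads from any std::istream and filters while reading instead of using a second pass. The helpers trim_copy, load_text and load_demo are invented here, and trim_copy only approximates boost::algorithm::trim.

```cpp
#include <cassert>
#include <istream>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical std-only replacement for boost::algorithm::trim.
std::string trim_copy(std::string const& s)
{
    char const* ws = " \t\r\n";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return std::string();           // the line is all whitespace
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Reads all lines, drops blank lines and ';' comments, and joins the rest.
std::string load_text(std::istream& in)
{
    std::vector<std::string> lns;
    std::string s;
    while (std::getline(in, s))         // the loop ends when a read fails
    {
        s = trim_copy(s);
        if (s.empty() || s[0] == ';')
            continue;                   // blank line or comment
        lns.push_back(s + '\n');
    }
    return std::accumulate(lns.begin(), lns.end(), std::string());
}

// Usage on a hypothetical in-memory file.
void load_demo()
{
    std::istringstream in("  [App]  \n; note\n\nID = 1\n");
    assert(load_text(in) == "[App]\nID = 1\n");
}
```

The resulting string is what would be handed to parse: every remaining line is trimmed and ends with exactly one "\n", which is the shape the grammar expects.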

Now the code is completely written. Compile it and run it on a test sample.
$ g++ ini.cpp -o ini_cpp

$ ./ini_cpp /usr/lib/firefox-3.0.5/application.ini App ID
{ec8030f7-c20a-464f-9b0e-13a3a9e97384}

$ ./ini_cpp /usr/lib/firefox-3.0.5/application.ini App IDD
Can't find requested parameter


I hope that this article will help you write your own parser =)

An interesting note: you can compare the parser from this article with the Haskell parser from the article “Create a parser for ini-files in Haskell”.

P.S. Thanks for helping move this article to the C++ blog.

Source: https://habr.com/ru/post/50976/

