EXPR: TERM | EXPR '+' TERM | EXPR '-' TERM; TERM: NUM | TERM '*' NUM | TERM '/' NUM; NUM: DIGIT | NUM DIGIT; DIGIT: '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
flex in the first part : we list the grammar rules, and in front of each rule we write the sish code that will be executed during the rule folding.%{
#include <stdio.h>
int yylex() { return getc( stdin ); }
void yyerror( char *s) {
fprintf ( stderr , " %s \n " , s);
}
%}
%%
EVALUATE: EXPR '\n' { printf( " %d \n " , $$ ) } ;
EXPR: TERM
| EXPR '+' TERM { $$ = $1 + $3 ; }
| EXPR '-' TERM { $$ = $1 - $3 ; }
;
TERM: NUM
| TERM '*' NUM { $$ = $1 * $3 ; }
| TERM '/' NUM { $$ = $1 / $3 ; }
;
NUM: DIGIT
| NUM DIGIT { $$ = $1 * 10 + $2 ; }
;
DIGIT: '0' { $$ = 0 ; } | '1' { $$ = 1 ; } | '2' { $$ = 2 ; } | '3' { $$ = 3 ; }
| '4' { $$ = 4 ; } | '5' { $$ = 5 ; } | '6' { $$ = 6 ; } | '7' { $$ = 7 ; }
| '8' { $$ = 8 ; } | '9' { $$ = 9 ; }
;
%%

lex files, the code in % {%} tags is copied into the parser unchanged. There are two functions that need to be defined: int yylex() returns the next input character, and void yyerror(char *s) prints an error message. Classic yacc included a ready-made implementation of yyerror , which could, if necessary, be redeclared; but its bison GNU-clone requires an explicit implementation from the programmer.lex files, divides the declaration area and the grammar rules area. The rules are listed in the same way as we are used to; at the end of the rule we add the convolution code. In the convolutional code, to refer to the character values ​​on the parser stack, we use $ -tags. $$ refers to a collapsible variable (the left side of the rule), $1 to the leftmost character on the right side of the rule (the deepest in the parser stack), $2 to the second left, and so on. If the right side of the rule is N characters, then you can use values ​​from $1 to $N$$=$1EXPR would be suitable as a root, but then the printing of the calculated expression would have to be put in an EXPR convolution; which means that the value of each subexpression along the way would also be printed. For the convenience of printing we will create a new variable, EVALUATE , which is used only as a root. This at the same time makes it possible to read a line feed at the end of the expression - in addition to the convenience of printing, we get the convenience of input.liby library, which contains the standard main() bison.[tyomitch@home ~]$ bison 2.y
[tyomitch@home ~]$ cc -ly 2.tab.c
[tyomitch@home ~]$ ./a.out
22+3*4-5
=29
[tyomitch@home ~]$ ./a.out
2 + 2
syntax error
EXPR: TERM | EXPR WS '+' WS TERM | EXPR WS '-' WS TERM ;
TERM: NUM | TERM WS '*' WS NUM | TERM WS '/' WS NUM ;
NUM: DIGIT | NUM DIGIT ;
DIGIT: '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
WS: | WS ' ' ;
flex and bison ! It was easy to recognize simple expressions (numbers, spaces), and it was possible to recognize complex expressions.flex parser that successfully recognizes numbers and deletes comments and spaces, then run the input text through it; and what we succeed in, we will drive through the advanced bison parser. In fact, there is no need to store an intermediate result: both steps can be completed in one pass. flex reads the input text character by character, occasionally passing tokens to the bison, the terminal symbols of the grammar. The values ​​for tokens flex itself sets.flex must somehow recognize the terminal characters of the bison grammar. We will run bison with the -d key, then it will generate a .h file listing all terminals. To do this, you need to declare (indicating %token ) the terminals in the grammar file - and all that remains is to refer to the generated .h file in the .lex file.%{
#include <stdio.h>
void yyerror( char *s) {
fprintf ( stderr , " %s \n " , s);
}
%}
%token NUM
%%
EVALUATE: EXPR { printf( "= %d \n " , $$ ); } ;
EXPR: TERM
| EXPR '+' TERM { $$ = $1 + $3 ; }
| EXPR '-' TERM { $$ = $1 - $3 ; }
;
TERM: NUM
| TERM '*' NUM { $$ = $1 * $3 ; }
| TERM '/' NUM { $$ = $1 / $3 ; }
;
%%
yylex is no longer needed: now the input characters will not come from stdin , but from flex .'\n' after EXPR (it will be swallowed by flex ), and removed all the rules about NUM and DIGIT ..lex file:%{
#include "3.tab.h"
%}
%option yylineno
%option noyywrap
%%
[/][/] .*\n ; // comment
[0-9] + { yylval = atoi(yytext);
return NUM;
}
[ \t\r\n] ; // whitespace
. { return * yytext ; }
%%
.tab.h#define NUM 258yylval , and return the token code.noyywrap option indicates that the input text is one, and after reading EOF do not need to try to go to the next text. This option was set automatically while we were using %option main , which reads from stdin . Now main() will be bison, so we do not need to ask flex standard main() , nor write our own.[tyomitch@home ~]$ lex 3.lex
[tyomitch@home ~]$ bison -d 3.y
[tyomitch@home ~]$ cc -ly lex.yy.c 3.tab.c
[tyomitch@home ~]$ ./a.out
22+ // hello
3*4 - 5
=29
[tyomitch@home ~]$ ./a.out
22+x
syntax error
-v , and look at the file with the suffix .output . ...
state 7
4 EXPR: EXPR '-'. TERM
NUM shift, and go to state 1
TERM go to state 11
state 8
6 TERM: TERM '*'. NUM
NUM shift, and go to state 12
state 9
7 TERM: TERM '/'. NUM
NUM shift, and go to state 13
state 10
3 EXPR: EXPR '+' TERM.
6 TERM: TERM. '*' NUM
7 | TERM. '/' NUM
'*' shift, and go to state 8
'/' shift, and go to state 9
$ default reduce using rule 3 (EXPR)
...
-t key, and the global flag yydebug will appear in the parser. It must be set to 1 - for example, in main() .YYPRINT :%{
#include <stdio.h>
void yyerror( char *s) {
fprintf ( stderr , " %s \n " , s);
}
#define YYPRINT(file, type, value) fprintf(file, " %d " , value);
%}
%token NUM
%%
EVALUATE: EXPR { printf( "= %d \n " , $$ ); } ;
EXPR: TERM
| EXPR '+' TERM { $$ = $1 + $3 ; }
| EXPR '-' TERM { $$ = $1 - $3 ; }
;
TERM: NUM
| TERM '*' NUM { $$ = $1 * $3 ; }
| TERM '/' NUM { $$ = $1 / $3 ; }
;
%%
int main () { yydebug= 1 ; return yyparse(); }
main() ourselves, it is no longer necessary to connect liby when compiling:[tyomitch@home ~]$ lex 3.lex
[tyomitch@home ~]$ bison -dt 3.y
[tyomitch@home ~]$ cc lex.yy.c 3.tab.c
[tyomitch@home ~]$ ./a.out
Starting parse
Entering state 0
Reading a token: 22+3*4-5
Next token is token NUM (22)
Shifting token NUM, Entering state 1
Reducing stack by rule 5 (line 20), NUM -> TERM
Stack now 0
Entering state 4
Reading a token: Next token is token '+' (22)
Reducing stack by rule 2 (line 15), TERM -> EXPR
Stack now 0
Entering state 3
Next token is token '+' (22)
Shifting token '+', Entering state 6
Reading a token: Next token is token NUM (3)
Shifting token NUM, Entering state 1
Reducing stack by rule 5 (line 20), NUM -> TERM
Stack now 0 3 6
Entering state 10
Reading a token: Next token is token '*' (3)
Shifting token '*', Entering state 8
Reading a token: Next token is token NUM (4)
Shifting token NUM, Entering state 12
Reducing stack by rule 6 (line 21), TERM '*' NUM -> TERM
Stack now 0 3 6
Entering state 10
Reading a token: Next token is token '-' (4)
Reducing stack by rule 3 (line 16), EXPR '+' TERM -> EXPR
Stack now 0
Entering state 3
Next token is token '-' (4)
Shifting token '-', Entering state 7
Reading a token: Next token is token NUM (5)
Shifting token NUM, Entering state 1
Reducing stack by rule 5 (line 20), NUM -> TERM
Stack now 0 3 7
Entering state 11
Reading a token: Now at end of input.
Reducing stack by rule 4 (line 17), EXPR '-' TERM -> EXPR
Stack now 0
Entering state 3
Now at end of input.
Reducing stack by rule 1 (line 13), EXPR -> EVALUATE
=29
Stack now 0
Entering state 2
Now at end of input.
YYPRINT macro YYPRINT not specified, then it is necessary to guess the values ​​of the read tokens: the bison will print only empty brackets.%{
#include <stdio.h>
void yyerror( char *s) {
fprintf ( stderr , " %s \n " , s);
}
%}
%token NUM
%%
EVALUATE: EXPR { printf( "= %d \n " , $$ ) } ;
EXPR: NUM
| EXPR '+' EXPR { $$ = $1 + $3 ; }
| EXPR '-' EXPR { $$ = $1 - $3 ; }
| EXPR '*' EXPR { $$ = $1 * $3 ; }
| EXPR '/' EXPR { $$ = $1 / $3 ; }
;
%%
[tyomitch@home ~]$ bison 4.y
4.y: conflicts: 16 shift/reduce
%{
#include <stdio.h>
void yyerror( char *s) {
fprintf ( stderr , " %s \n " , s);
}
%}
%token NUM
%left '+' '-'
%left '*' '/'
%%
EVALUATE: EXPR { printf( "= %d \n " , $$ ) } ;
EXPR: NUM
| EXPR '+' EXPR { $$ = $1 + $3 ; }
| EXPR '-' EXPR { $$ = $1 - $3 ; }
| EXPR '*' EXPR { $$ = $1 * $3 ; }
| EXPR '/' EXPR { $$ = $1 / $3 ; }
;
%%
%right associative operators, there is a %right directive.if (1) if (2) foo(); else bar(); if (1) if (2) foo(); else bar();if (1) { if (2) foo(); else bar(); } if (1) { if (2) foo(); else bar(); } if (1) { if (2) foo(); else bar(); } - it is necessary that the priority of else be higher than that of ')'WORD: S1 'a' 'i' 'l' | S2 'a' 'l' 'e'; S1: 's'; S2: 's';- unambiguous, and it corresponds to only two words - sail and sale. When the parser has shifted the first letter of 's' and sees after it of 'a', it cannot make a choice, collapse
S1 or S2 : the correct convolution depends on what letter will be after 'a'; but its machine does not yet see.flex , the core of the parser is a transition table and a small loop. Two parallel stacks are used: the yyssa state yyssa and the yyvsa value yyvsa — all the same, the states and characters enter and leave the stack always in pairs..output file. The static const unsigned char yytranslate[259] array static const unsigned char yytranslate[259] matches all terminals with a class. (From 0 to 255 - ordinary characters; 258 - terminal NUM declared by us). static const unsigned char yytable [] =
{
6, 7, 5, 8, 9, 10, 11, 1, 12, 13
};
It is surprising that the table is not only one-dimensional, but even the elements in it are smaller than our states (there are 14). #define YYPACT_NINF -5
static const yysigned_char yypact [] =
{
4, -5, 2, -4, -3, -5, 4, 4, 5, 6,
-3, -3, -5, -5
};
YYPACT_NINF means that no yytable element yytable ; in other words, the action performed does not depend on the input character. static const unsigned char yydefact [] =
{
0, 6, 0, 2, 3, 1, 0, 0, 0, 0,
4, 5, 7, 8
};
Only the execution of the convolution can be independent of the input character, so the values ​​in yydefact are the numbers of the grammar rules by which to fold.yypact[n] action is stored for state n and for character class 0. The action for character class k is stored in yytable[yypact[n]+k] ; therefore, in yypact can be negative indices - this is just a “base” to which the class number will be added.yytable , there is another table: static const unsigned char yycheck [] =
{
4, 5, 0, 6, 7, 6, 7, 3, 3, 3
};
What do we see? yytable[0] refers to class 4 characters, yytable[1] to class 5 characters, and so on. ...
state 7
4 EXPR: EXPR '-'. TERM
NUM shift, and go to state 1
TERM go to state 11
state 8
6 TERM: TERM '*'. NUM
NUM shift, and go to state 12
...
yytable[yypact[7]+3]==yytable[4+3]==1 (and indeed, yycheck[yypact[7]+3]==3 )yytable[yypact[8]+3]==yytable[5+3]==12 (and indeed, yycheck[yypact[7]+3]==3 )yynewstate :
*++yyssp = yystate; //
yyn = yypact[yystate]; //
if (yyn == YYPACT_NINF)
goto yydefault; //
yychar = yylex(); //
yytoken = yytranslate[yychar]; //
yyn += yytoken; // yytable
if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
goto yydefault; //
yyn = yytable[yyn];
if (yyn <= 0 ) {
yyn = -yyn;
goto yyreduce;
}
*++yyvsp = yylval; //
yystate = yyn; //
goto yynewstate;
yydefault :
yyn = yydefact[yystate];
if (yyn == 0 ) // :
goto yyerrlab; // ,
//
yyreduce :
yylen = yyr2[yyn]; //
yyval = yyvsp[ 1 -yylen]; // : $1
// :
// $- yyval yyvsp[]
switch (yyn) {
case 2 :
{ printf( "= %d \n " , yyval); }
break ;
case 4 :
{ yyval = yyvsp[- 2 ] + yyvsp[ 0 ]; }
break ;
case 5 :
{ yyval = yyvsp[- 2 ] - yyvsp[ 0 ]; }
break ;
case 7 :
{ yyval = yyvsp[- 2 ] * yyvsp[ 0 ]; }
break ;
case 8 :
{ yyval = yyvsp[- 2 ] / yyvsp[ 0 ]; }
break ;
}
yyvsp -= yylen; //
yyssp -= yylen;
*++yyvsp = yyval; //
yyn = yyr1[yyn]; //
//
yystate = yypgoto[yyn - YYNTOKENS] + *yyssp;
if ( 0 <= yystate && yystate <= YYLAST && yycheck[yystate] == *yyssp)
yystate = yytable[yystate];
else
yystate = yydefgoto[yyn - YYNTOKENS];
goto yynewstate;
.y file will grow in the parser.Source: https://habr.com/ru/post/99366/
All Articles