On the forums you can see topics from the category "How do I see my ideal programming language." At the same time, such grammars are created that the analyzer can never convert to code. Under the cut there are several dangers that await the developer of a new clear, elegant, flexible programming language.
Let's start with the definition. Grammar - a description of the structure of the language. For example, grammar for expressions consisting of numbers and plus signs:
list -> list + digit
list -> digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9Example: 9 + 5
Such a grammar is unambiguous. This means that, having received a certain expression (9 + 5 + 7) at the input, we can build only one parse tree of this expression. T e expression 9 + 5 + 7 can be interpreted only as (9 + 5) + 7, and not 9 + (5 + 7) or something else. There are several types of grammar ambiguities.
')
1. Ambiguity of priorities and associativity.
Take a simple example of ambiguous grammar:
E -> E + E | E * E | (E) | idid is just a number. Such a grammar defines expressions of the form: (9 + 5) * 3 or 3 * 4 + 7. But what happens if you apply the expression 1 + 2 * 3 to the input? As a result, we get (1 + 2) * 3 = 3 * 3 = 9. The solution to this problem is to assign priorities:
E -> E + T | T
T -> T * F | F
F -> (E) | idThe same situation with associativity. Take the expression 1 + 2 + 3 + 4.
With grammar with
list -> list + digit, we get parsing ((1 + 2) + 3) + 4, and with
list -> digit + list we get parsing 1 + (2 + (3 + 4)).
2. Ambiguity of “wandering” else
It has grammar:
stmt -> if expr then stmt else stmt
| if expr then stmt
| otherIt simply expands the if-then-else condition. This grammar is ambiguous, because such a record is possible:
if expr1 then if expr2 then smth1 else smth2.
The problem is that the analyzer will not be able to decide whether the else applies to the first if condition or the second one. Of course, this is solved by placing the brackets:
if expr1 then {if expr2 then smth1 else smth2}
but this is not always possible. This ambiguity may appear in other constructions of your language. It is accepted that else refers to the last then. Decision:
stmt -> matched_stmt
| unmathced_stmt
matched_stmt -> if expr then matched_stmt else matched_stmt
| other
unmathced_stmt -> if expr then stmt
| if expr then matched_stmt else unmathced_stmtThe meaning of the grammar is that the instruction between then and else must be “balanced” (matched), i.e. it should not end in then without the corresponding else.
3. Ambiguity of special products of particular cases.
There is an ambiguous grammar:
E -> E func1 E func2 E
E -> E func1 E
E -> E func2 E
E -> (E)
e -> cc is any number. As a result, we have 2 functions that do something with numbers, as well as the ability to place brackets, and the case with sequential evaluation of func1 and func2 is considered as special (perhaps some special rules come into force). The conflict will arise when trying to disassemble the line:
5 func1 7 func2 9
The analyzer cannot choose between
E -> E func1 E func2 E and
E -> E func1 E after reading the number 7. The output is the same as in step 1.
Entering additional variables is not the only way to solve these problems. You can use the analyzer to create an analysis table, and then manually set priorities for resolving ambiguities in it, although with a large table (Pascal had a table that had 100,000 rows), this would require a lot of time and attention. I described a very small group of problems that arise when writing a grammar of my own language, without touching on the topic of LL and LR analyzers, left / right recursion, etc. An example of getting rid of left recursion is
here .
Sources:
Alfred Aho, Ravi Seti, Jeffrey Ullman "Compilers: Principles, Technologies, Tools"