Object.create
method (creates a new object that inherits the members of an existing object) and a lexical analyzer, which, on the input line, creates an array of lexemes objects. Moving along this array, we will build a syntax parsing tree.symbol_table
object. var symbol_table = {};
original_symbol
object is the prototype for all other characters. His methods are usually overwhelmed. The meaning of the nud
and led
methods, as well as the binding power, will be explained below in the “Priorities” section. var original_symbol = { nud: function () { this.error("Undefined."); }, led: function (left) { this.error("Missing operator."); } };
id
) and a binding strength (optional bp
parameter, the default is zero), and returns a symbol object for the given id
. If the symbol is already in symbol_table
, the function returns it. Otherwise, it creates a new character that is inherited from original_symbol
, stores it in the symbol table, and returns. The character object initially contains the id
, value, left binding power ( lbp
), and everything that came from original_symbol
. var symbol = function (id, bp) { var s = symbol_table[id]; bp = bp || 0; if (s) { if (bp >= s.lbp) { s.lbp = bp; } } else { s = Object.create(original_symbol); s.id = s.value = id; s.lbp = bp; symbol_table[id] = s; } return s; };
symbol(":"); symbol(";"); symbol(","); symbol(")"); symbol("]"); symbol("}"); symbol("else");
(end)
indicates the end of the stream of tokens. The symbol (name)
is a prototype for new names, for example, variable names. I included brackets in their ids to avoid possible matches with custom lexemes. symbol("(end)"); symbol("(name)");
tokens
primitive lexemes containing the type
field ( "name"
, "string"
, "number"
, or "operator"
), and the value
field (string or number). The variable token
always refers to the current token. var token;
advance
function creates a new token object from the following primitive token and assigns it to the variable token
. If the optional id
parameter is passed, the function checks that the token has the corresponding identifier. The prototype of the new lexeme object is a symbol (name)
in the current scope or a symbol from the symbol table. The arity
field of the new lexeme will be either "name"
, or "literal"
, or "operator"
. Subsequently, when we learn more about the role of a token in a program, this value may change to "binary"
, "unary"
or "statement"
. var advance = function (id) { var a, o, t, v; if (id && token.id !== id) { token.error("Expected '" + id + "'."); } if (token_nr >= tokens.length) { token = symbol_table["(end)"]; return; } t = tokens[token_nr]; token_nr += 1; v = t.value; a = t.type; if (a === "name") { o = scope.find(v); } else if (a === "operator") { o = symbol_table[v]; if (!o) { t.error("Unknown operator."); } } else if (a === "string" || a === "number") { a = "literal"; o = symbol_table["(literal)"]; } else { t.error("Unexpected token."); } token = Object.create(o); token.value = v; token.arity = a; return token; };
scope
. var scope;
original_scope
object is the prototype for all objects that represent the scope. It contains a define
method that allows you to define new variables. The define
method converts a name token into a token variable. It gives an error if the variable has already been defined in scope or the name is a reserved word. var itself = function () { return this; }; var original_scope = { define: function (n) { var t = this.def[n.value]; if (typeof t === "object") { n.error(t.reserved ? "Already reserved." : "Already defined."); } this.def[n.value] = n; n.reserved = false; n.nud = itself; n.led = null; n.std = null; n.lbp = 0; n.scope = scope; return n; },
find
method is used to find a definition by name. He begins the search from the current scope and goes, if necessary, up the chain, ending with a table of characters. If the definition could not be found, it returns symbol_table["(name)"]
. The method checks that the value with the given name is not equal to undefined
(which would mean referring to an undeclared name) and that this is not a function (which would indicate a conflict with the inherited method). find: function (n) { var e = this, o; while (true) { o = e.def[n]; if (o && typeof o !== 'function') { return e.def[n]; } e = e.parent; if (!e) { o = symbol_table[n]; return o && typeof o !== 'function' ? o : symbol_table["(name)"]; } } },
pop
method closes the scope by replacing it with its parent. pop: function () { scope = this.parent; },
reserve
method is used to indicate that a given name is a reserved word in the current scope. reserve: function (n) { if (n.arity !== "name" || n.reserved) { return; } var t = this.def[n.value]; if (t) { if (t.reserved) { return; } if (t.arity === "name") { n.error("Already defined."); } } this.def[n.value] = n; n.reserved = true; } };
if
) are reserved and cannot be used as variable names. The flexibility of our parser allows you to achieve more. For example, we can say that in any function any given name can be used either as a language operator or as a variable name. We will reserve words locally only after they have been used as reserved. This simplifies the life of the creator of the language, because the addition of new keywords will not break existing programs, and simplifies the life of programmers, because they do not interfere with unnecessary restrictions on the use of names.new_scope
function, which creates a new prototype instance, original_scope
. var new_scope = function () { var s = scope; scope = Object.create(original_scope); scope.def = {}; scope.parent = s; return scope; };
d
A e
B f
e
belong? In other words, we have to choose between (d
A e)
B f
d
A (e
B f)
.nud
(null denotation, null-match) and led
(left denotation, left-match). The nud
does not matter which lexemes are to the left, and the led
method is important. The nud
method nud
used by values (variables and literals) and prefix operators. The led
method is used by infix and postfix operators. A lexeme can have both nud
and led
methods nud
. For example, minus ( -
) can be either prefix (changing the sign of a number) or infix (subtraction), so both methods are defined for it.0 | operators without communication ; etc. |
ten | assignment operators: = , etc. |
20 | ?: |
thirty | || && |
40 | comparison operators: === , etc. |
50 | + - |
60 | * / |
70 | unary operators:! etc. |
80 | . [ ( |
expression
. It takes as input the right binding force, which indicates how actively the expression is associated with the tokens on the right. var expression = function (rbp) { var left; var t = token; advance(); left = t.nud(); while (rbp < token.lbp) { t = token; advance(); left = t.led(left); } return left; }
expression
function calls the nud
method of the current token token
, which processes literals, variables, and prefix operators. Then, as long as the right binding power is less than the left binding power of the next lexeme, the led
method is invoked. This method handles infix and postfix operators. The process can be recursive, since the nud
and led
methods themselves can call expression
.+
operator is an infix operator, so it has a led
method that turns the lexeme object into a tree, whose two branches ( first
and second
) are operands to the left and right of the +
sign. The led
method accepts the left operand as a parameter and finds the right operand by calling expression
. symbol("+", 50).led = function (left) { this.first = left; this.second = expression(50); this.arity = "binary"; return this; };
*
symbol is similar to +
except for the id
value and the binding strength. It is more tightly bound to operands, so its binding power is higher. symbol("*", 60).led = function (left) { this.first = left; this.second = expression(60); this.arity = "binary"; return this; };
infix
function that helps us create infix operators. The infix
function takes id
, binding strength, and optionally led
function. If the function is not specified, infix
creates the default led
function, which is suitable in most cases. var infix = function (id, bp, led) { var s = symbol(id, bp); s.led = led || function (left) { this.first = left; this.second = expression(bp); this.arity = "binary"; return this; }; return s; }
infix("+", 50); infix("-", 50); infix("*", 60); infix("/", 60);
===
is an exact comparison operator in javascript. infix("===", 40); infix("!==", 40); infix("<", 40); infix("<=", 40); infix(">", 40); infix(">=", 40);
?
and :
This is not a regular infix operator, so you have to set the led
function here. infix("?", 20, function (left) { this.first = left; this.second = expression(0); advance(":"); this.third = expression(0); this.arity = "ternary"; return this; });
.
) Is used to refer to a member of an object. To his right there must be a name, but it will be used as a literal. infix(".", 80, function (left) { this.first = left; if (token.arity !== "name") { token.error("Expected a property name."); } token.arity = "literal"; this.second = token; this.arity = "binary"; advance(); return this; });
[
used to access a member of an object or an element of an array dynamically. The expression on the right should be followed by a closing bracket ]
. infix("[", 80, function (left) { this.first = left; this.second = expression(0); this.arity = "binary"; advance("]"); return this; });
var infixr = function (id, bp, led) { var s = symbol(id, bp); s.led = led || function (left) { this.first = left; this.second = expression(bp - 1); this.arity = "binary"; return this; }; return s; }
&&
operator returns the first operand if it is false, otherwise it returns the second. Operator ||
returns the first operand if it is true, otherwise returns the second. False value is 0
, the empty string is ""
, false
or null
. Any other value (including any object) is considered true. infixr("&&", 30); infixr("||", 30);
var prefix = function (id, nud) { var s = symbol(id); s.nud = nud || function () { scope.reserve(this); this.first = expression(70); this.arity = "unary"; return this; }; return s; } prefix("-"); prefix("!"); prefix("typeof");
nud
method for the bracket (
calls advance(")")
to find the matching bracket )
. The lexeme itself (
misses the syntax tree, because nud
returns only the contents of the brackets. prefix("(", function () { var e = expression(0); advance(")"); return e; });
infixr
function. But better we will write a separate assignment
function, because we want to do something else: make sure that the expression on the left is lvalue, that is, we can assign something to it, and set the field of assignment
so that we can easily find all assignments in the future . var assignment = function (id) { return infixr(id, 10, function (left) { if (left.id !== "." && left.id !== "[" && left.arity !== "name") { left.error("Bad lvalue."); } this.first = left; this.second = expression(9); this.assignment = true; this.arity = "binary"; return this; }); }; assignment("="); assignment("+="); assignment("-=");
assignment
function returns the result of the infixr
call, and infixr
result of the symbol
call.constant
function creates language constants. The nud
method turns a name token into a literal token. var constant = function (s, v) { var x = symbol(s); x.nud = function () { scope.reserve(this); this.value = symbol_table[this.id].value; this.arity = "literal"; return this; }; x.value = v; return x; }; constant("true", true); constant("false", false); constant("null", null); constant("pi", 3.141592653589793);
(literal)
is a prototype for all string and numeric literals. The nud
method of a nud
token returns the token itself. symbol("(literal)").nud = itself;
std
(statement denotation, matching the proposal). The std
method is similar to nud
, but it is called only at the beginning of a sentence.statement
function parses one statement
. If the current token contains the std
method, the token is reserved and this method is called. Otherwise, we believe that a sentence is an expression ending in a semicolon. For reliability, we will consider expressions that are not assignments or function calls for error. var statement = function () { var n = token, v; if (n.std) { advance(); scope.reserve(n); return n.std(); } v = expression(0); if (!v.assignment && v.id !== "(") { v.error("Bad expression statement."); } advance(";"); return v; };
statements
function statements
sentences until it encounters a token (end)
or }
, which marks the end of a block. The function returns a sentence, an array of sentences, or null
if no sentences are found. var statements = function () { var a = [], s; while (true) { if (token.id === "}" || token.id === "(end)") { break; } s = statement(); if (s) { a.push(s); } } return a.length === 0 ? null : a.length === 1 ? a[0] : a; };
stmt
function is used to add sentence symbols to the symbol table. It takes the id
and the std
function as parameters. var stmt = function (s, f) { var x = symbol(s); x.std = f; return x; };
stmt("{", function () { new_scope(); var a = statements(); advance("}"); scope.pop(); return a; });
block
function parses a block. var block = function () { var t = token; advance("{"); return t.std(); };
var
clause defines one or more variables in the current block. The variable name can be followed by the equal sign =
and the initial value of the variable. stmt("var", function () { var a = [], n, t; while (true) { n = token; if (n.arity !== "name") { n.error("Expected a new variable name."); } scope.define(n); advance(); if (token.id === "=") { t = token; advance("="); t.first = n; t.second = expression(0); t.arity = "binary"; a.push(t); } if (token.id !== ",") { break; } advance(","); } advance(";"); return a.length === 0 ? null : a.length === 1 ? a[0] : a; });
while
clause defines a loop. It includes an expression in brackets and a block. stmt("while", function () { advance("("); this.first = expression(0); advance(")"); this.second = block(); this.arity = "statement"; return this; });
if
clause creates a conditional construct. If the block is followed by an else
symbol, then we also analyze the next block or the next if
. stmt("if", function () { advance("("); this.first = expression(0); advance(")"); this.second = block(); if (token.id === "else") { scope.reserve(token); advance("else"); this.third = token.id === "if" ? statement() : block(); } else { this.third = null; } this.arity = "statement"; return this; });
break
clause is used to end the loop ahead of time stmt("break", function () { advance(";"); if (token.id !== "}") { token.error("Unreachable statement."); } this.arity = "statement"; return this; });
return
clause is used to exit a function. It may contain an optional expression (the return value of the function). stmt("return", function () { if (token.id !== ";") { this.first = expression(0); } advance(";"); if (token.id !== "}") { token.error("Unreachable statement."); } this.arity = "statement"; return this; });
prefix("function", function () { var a = []; new_scope(); if (token.arity === "name") { scope.define(token); this.name = token.value; advance(); } advance("("); if (token.id !== ")") { while (true) { if (token.arity !== "name") { token.error("Expected a parameter name."); } scope.define(token); a.push(token); advance(); if (token.id !== ",") { break; } advance(","); } } this.first = a; advance(")"); advance("{"); this.second = statements(); advance("}"); this.arity = "function"; scope.pop(); return this; });
(
. When calling, you can specify a number of arguments. We will check the left operand to cut off situations where the value on the left cannot be a function. infix("(", 80, function (left) { var a = []; if (left.id === "." || left.id === "[") { this.arity = "ternary"; this.first = left.first; this.second = left.second; this.third = a; } else { this.arity = "binary"; this.first = left; this.second = a; if ((left.arity !== "unary" || left.id !== "function") && left.arity !== "name" && left.id !== "(" && left.id !== "&&" && left.id !== "||" && left.id !== "?") { left.error("Expected a variable name."); } } if (token.id !== ")") { while (true) { a.push(expression(0)); if (token.id !== ",") { break; } advance(","); } } advance(")"); return this; });
this
symbol is a special variable. When calling a method, it contains a reference to the object. symbol("this").nud = function () { scope.reserve(this); this.arity = "this"; return this; };
prefix("[", function () { var a = []; if (token.id !== "]") { while (true) { a.push(expression(0)); if (token.id !== ",") { break; } advance(","); } } advance("]"); this.first = a; this.arity = "unary"; return this; });
:
. The key is a literal or a name that is interpreted as a literal. prefix("{", function () { var a = []; if (token.id !== "}") { while (true) { var n = token; if (n.arity !== "name" && n.arity !== "literal") { token.error("Bad key."); } advance(); advance(":"); var v = expression(0); v.key = n.value; a.push(v); if (token.id !== ",") { break; } advance(","); } } advance("}"); this.first = a; this.arity = "unary"; return this; });
infix
function to help the code generator. We can also pass in additional methods for convolving constants and generating code.for
, switch
and try
), the label, more code to check for errors, error recovery, and a bunch of new operators. We can add task and type inference.Source: https://habr.com/ru/post/227241/
All Articles