Object.create method (creates a new object that inherits the members of an existing object) and a lexical analyzer, which, on the input line, creates an array of lexemes objects. Moving along this array, we will build a syntax parsing tree.symbol_table object. var symbol_table = {}; original_symbol object is the prototype for all other characters. His methods are usually overwhelmed. The meaning of the nud and led methods, as well as the binding power, will be explained below in the “Priorities” section. var original_symbol = { nud: function () { this.error("Undefined."); }, led: function (left) { this.error("Missing operator."); } }; id ) and a binding strength (optional bp parameter, the default is zero), and returns a symbol object for the given id . If the symbol is already in symbol_table , the function returns it. Otherwise, it creates a new character that is inherited from original_symbol , stores it in the symbol table, and returns. The character object initially contains the id , value, left binding power ( lbp ), and everything that came from original_symbol . var symbol = function (id, bp) { var s = symbol_table[id]; bp = bp || 0; if (s) { if (bp >= s.lbp) { s.lbp = bp; } } else { s = Object.create(original_symbol); s.id = s.value = id; s.lbp = bp; symbol_table[id] = s; } return s; }; symbol(":"); symbol(";"); symbol(","); symbol(")"); symbol("]"); symbol("}"); symbol("else"); (end) indicates the end of the stream of tokens. The symbol (name) is a prototype for new names, for example, variable names. I included brackets in their ids to avoid possible matches with custom lexemes. symbol("(end)"); symbol("(name)"); tokens primitive lexemes containing the type field ( "name" , "string" , "number" , or "operator" ), and the value field (string or number). The variable token always refers to the current token. var token; advance function creates a new token object from the following primitive token and assigns it to the variable token . If the optional id parameter is passed, the function checks that the token has the corresponding identifier. The prototype of the new lexeme object is a symbol (name) in the current scope or a symbol from the symbol table. The arity field of the new lexeme will be either "name" , or "literal" , or "operator" . Subsequently, when we learn more about the role of a token in a program, this value may change to "binary" , "unary" or "statement" . var advance = function (id) { var a, o, t, v; if (id && token.id !== id) { token.error("Expected '" + id + "'."); } if (token_nr >= tokens.length) { token = symbol_table["(end)"]; return; } t = tokens[token_nr]; token_nr += 1; v = t.value; a = t.type; if (a === "name") { o = scope.find(v); } else if (a === "operator") { o = symbol_table[v]; if (!o) { t.error("Unknown operator."); } } else if (a === "string" || a === "number") { a = "literal"; o = symbol_table["(literal)"]; } else { t.error("Unexpected token."); } token = Object.create(o); token.value = v; token.arity = a; return token; }; scope . var scope; original_scope object is the prototype for all objects that represent the scope. It contains a define method that allows you to define new variables. The define method converts a name token into a token variable. It gives an error if the variable has already been defined in scope or the name is a reserved word. var itself = function () { return this; }; var original_scope = { define: function (n) { var t = this.def[n.value]; if (typeof t === "object") { n.error(t.reserved ? "Already reserved." : "Already defined."); } this.def[n.value] = n; n.reserved = false; n.nud = itself; n.led = null; n.std = null; n.lbp = 0; n.scope = scope; return n; }, find method is used to find a definition by name. He begins the search from the current scope and goes, if necessary, up the chain, ending with a table of characters. If the definition could not be found, it returns symbol_table["(name)"] . The method checks that the value with the given name is not equal to undefined (which would mean referring to an undeclared name) and that this is not a function (which would indicate a conflict with the inherited method). find: function (n) { var e = this, o; while (true) { o = e.def[n]; if (o && typeof o !== 'function') { return e.def[n]; } e = e.parent; if (!e) { o = symbol_table[n]; return o && typeof o !== 'function' ? o : symbol_table["(name)"]; } } }, pop method closes the scope by replacing it with its parent. pop: function () { scope = this.parent; }, reserve method is used to indicate that a given name is a reserved word in the current scope. reserve: function (n) { if (n.arity !== "name" || n.reserved) { return; } var t = this.def[n.value]; if (t) { if (t.reserved) { return; } if (t.arity === "name") { n.error("Already defined."); } } this.def[n.value] = n; n.reserved = true; } }; if ) are reserved and cannot be used as variable names. The flexibility of our parser allows you to achieve more. For example, we can say that in any function any given name can be used either as a language operator or as a variable name. We will reserve words locally only after they have been used as reserved. This simplifies the life of the creator of the language, because the addition of new keywords will not break existing programs, and simplifies the life of programmers, because they do not interfere with unnecessary restrictions on the use of names.new_scope function, which creates a new prototype instance, original_scope . var new_scope = function () { var s = scope; scope = Object.create(original_scope); scope.def = {}; scope.parent = s; return scope; }; d A e B fe belong? In other words, we have to choose between (d A e) B fd A (e B f) .nud (null denotation, null-match) and led (left denotation, left-match). The nud does not matter which lexemes are to the left, and the led method is important. The nud method nud used by values (variables and literals) and prefix operators. The led method is used by infix and postfix operators. A lexeme can have both nud and led methods nud . For example, minus ( - ) can be either prefix (changing the sign of a number) or infix (subtraction), so both methods are defined for it.| 0 | operators without communication ; etc. |
| ten | assignment operators: = , etc. |
| 20 | ?: |
| thirty | || && |
| 40 | comparison operators: === , etc. |
| 50 | + - |
| 60 | * / |
| 70 | unary operators:! etc. |
| 80 | . [ ( |
expression . It takes as input the right binding force, which indicates how actively the expression is associated with the tokens on the right. var expression = function (rbp) { var left; var t = token; advance(); left = t.nud(); while (rbp < token.lbp) { t = token; advance(); left = t.led(left); } return left; } expression function calls the nud method of the current token token , which processes literals, variables, and prefix operators. Then, as long as the right binding power is less than the left binding power of the next lexeme, the led method is invoked. This method handles infix and postfix operators. The process can be recursive, since the nud and led methods themselves can call expression .+ operator is an infix operator, so it has a led method that turns the lexeme object into a tree, whose two branches ( first and second ) are operands to the left and right of the + sign. The led method accepts the left operand as a parameter and finds the right operand by calling expression . symbol("+", 50).led = function (left) { this.first = left; this.second = expression(50); this.arity = "binary"; return this; }; * symbol is similar to + except for the id value and the binding strength. It is more tightly bound to operands, so its binding power is higher. symbol("*", 60).led = function (left) { this.first = left; this.second = expression(60); this.arity = "binary"; return this; }; infix function that helps us create infix operators. The infix function takes id , binding strength, and optionally led function. If the function is not specified, infix creates the default led function, which is suitable in most cases. var infix = function (id, bp, led) { var s = symbol(id, bp); s.led = led || function (left) { this.first = left; this.second = expression(bp); this.arity = "binary"; return this; }; return s; } infix("+", 50); infix("-", 50); infix("*", 60); infix("/", 60); === is an exact comparison operator in javascript. infix("===", 40); infix("!==", 40); infix("<", 40); infix("<=", 40); infix(">", 40); infix(">=", 40); ? and : This is not a regular infix operator, so you have to set the led function here. infix("?", 20, function (left) { this.first = left; this.second = expression(0); advance(":"); this.third = expression(0); this.arity = "ternary"; return this; }); . ) Is used to refer to a member of an object. To his right there must be a name, but it will be used as a literal. infix(".", 80, function (left) { this.first = left; if (token.arity !== "name") { token.error("Expected a property name."); } token.arity = "literal"; this.second = token; this.arity = "binary"; advance(); return this; }); [ used to access a member of an object or an element of an array dynamically. The expression on the right should be followed by a closing bracket ] . infix("[", 80, function (left) { this.first = left; this.second = expression(0); this.arity = "binary"; advance("]"); return this; }); var infixr = function (id, bp, led) { var s = symbol(id, bp); s.led = led || function (left) { this.first = left; this.second = expression(bp - 1); this.arity = "binary"; return this; }; return s; } && operator returns the first operand if it is false, otherwise it returns the second. Operator || returns the first operand if it is true, otherwise returns the second. False value is 0 , the empty string is "" , false or null . Any other value (including any object) is considered true. infixr("&&", 30); infixr("||", 30); var prefix = function (id, nud) { var s = symbol(id); s.nud = nud || function () { scope.reserve(this); this.first = expression(70); this.arity = "unary"; return this; }; return s; } prefix("-"); prefix("!"); prefix("typeof"); nud method for the bracket ( calls advance(")") to find the matching bracket ) . The lexeme itself ( misses the syntax tree, because nud returns only the contents of the brackets. prefix("(", function () { var e = expression(0); advance(")"); return e; }); infixr function. But better we will write a separate assignment function, because we want to do something else: make sure that the expression on the left is lvalue, that is, we can assign something to it, and set the field of assignment so that we can easily find all assignments in the future . var assignment = function (id) { return infixr(id, 10, function (left) { if (left.id !== "." && left.id !== "[" && left.arity !== "name") { left.error("Bad lvalue."); } this.first = left; this.second = expression(9); this.assignment = true; this.arity = "binary"; return this; }); }; assignment("="); assignment("+="); assignment("-="); assignment function returns the result of the infixr call, and infixr result of the symbol call.constant function creates language constants. The nud method turns a name token into a literal token. var constant = function (s, v) { var x = symbol(s); x.nud = function () { scope.reserve(this); this.value = symbol_table[this.id].value; this.arity = "literal"; return this; }; x.value = v; return x; }; constant("true", true); constant("false", false); constant("null", null); constant("pi", 3.141592653589793); (literal) is a prototype for all string and numeric literals. The nud method of a nud token returns the token itself. symbol("(literal)").nud = itself; std (statement denotation, matching the proposal). The std method is similar to nud , but it is called only at the beginning of a sentence.statement function parses one statement . If the current token contains the std method, the token is reserved and this method is called. Otherwise, we believe that a sentence is an expression ending in a semicolon. For reliability, we will consider expressions that are not assignments or function calls for error. var statement = function () { var n = token, v; if (n.std) { advance(); scope.reserve(n); return n.std(); } v = expression(0); if (!v.assignment && v.id !== "(") { v.error("Bad expression statement."); } advance(";"); return v; }; statements function statements sentences until it encounters a token (end) or } , which marks the end of a block. The function returns a sentence, an array of sentences, or null if no sentences are found. var statements = function () { var a = [], s; while (true) { if (token.id === "}" || token.id === "(end)") { break; } s = statement(); if (s) { a.push(s); } } return a.length === 0 ? null : a.length === 1 ? a[0] : a; }; stmt function is used to add sentence symbols to the symbol table. It takes the id and the std function as parameters. var stmt = function (s, f) { var x = symbol(s); x.std = f; return x; }; stmt("{", function () { new_scope(); var a = statements(); advance("}"); scope.pop(); return a; }); block function parses a block. var block = function () { var t = token; advance("{"); return t.std(); }; var clause defines one or more variables in the current block. The variable name can be followed by the equal sign = and the initial value of the variable. stmt("var", function () { var a = [], n, t; while (true) { n = token; if (n.arity !== "name") { n.error("Expected a new variable name."); } scope.define(n); advance(); if (token.id === "=") { t = token; advance("="); t.first = n; t.second = expression(0); t.arity = "binary"; a.push(t); } if (token.id !== ",") { break; } advance(","); } advance(";"); return a.length === 0 ? null : a.length === 1 ? a[0] : a; }); while clause defines a loop. It includes an expression in brackets and a block. stmt("while", function () { advance("("); this.first = expression(0); advance(")"); this.second = block(); this.arity = "statement"; return this; }); if clause creates a conditional construct. If the block is followed by an else symbol, then we also analyze the next block or the next if . stmt("if", function () { advance("("); this.first = expression(0); advance(")"); this.second = block(); if (token.id === "else") { scope.reserve(token); advance("else"); this.third = token.id === "if" ? statement() : block(); } else { this.third = null; } this.arity = "statement"; return this; }); break clause is used to end the loop ahead of time stmt("break", function () { advance(";"); if (token.id !== "}") { token.error("Unreachable statement."); } this.arity = "statement"; return this; }); return clause is used to exit a function. It may contain an optional expression (the return value of the function). stmt("return", function () { if (token.id !== ";") { this.first = expression(0); } advance(";"); if (token.id !== "}") { token.error("Unreachable statement."); } this.arity = "statement"; return this; }); prefix("function", function () { var a = []; new_scope(); if (token.arity === "name") { scope.define(token); this.name = token.value; advance(); } advance("("); if (token.id !== ")") { while (true) { if (token.arity !== "name") { token.error("Expected a parameter name."); } scope.define(token); a.push(token); advance(); if (token.id !== ",") { break; } advance(","); } } this.first = a; advance(")"); advance("{"); this.second = statements(); advance("}"); this.arity = "function"; scope.pop(); return this; }); ( . When calling, you can specify a number of arguments. We will check the left operand to cut off situations where the value on the left cannot be a function. infix("(", 80, function (left) { var a = []; if (left.id === "." || left.id === "[") { this.arity = "ternary"; this.first = left.first; this.second = left.second; this.third = a; } else { this.arity = "binary"; this.first = left; this.second = a; if ((left.arity !== "unary" || left.id !== "function") && left.arity !== "name" && left.id !== "(" && left.id !== "&&" && left.id !== "||" && left.id !== "?") { left.error("Expected a variable name."); } } if (token.id !== ")") { while (true) { a.push(expression(0)); if (token.id !== ",") { break; } advance(","); } } advance(")"); return this; }); this symbol is a special variable. When calling a method, it contains a reference to the object. symbol("this").nud = function () { scope.reserve(this); this.arity = "this"; return this; }; prefix("[", function () { var a = []; if (token.id !== "]") { while (true) { a.push(expression(0)); if (token.id !== ",") { break; } advance(","); } } advance("]"); this.first = a; this.arity = "unary"; return this; }); : . The key is a literal or a name that is interpreted as a literal. prefix("{", function () { var a = []; if (token.id !== "}") { while (true) { var n = token; if (n.arity !== "name" && n.arity !== "literal") { token.error("Bad key."); } advance(); advance(":"); var v = expression(0); v.key = n.value; a.push(v); if (token.id !== ",") { break; } advance(","); } } advance("}"); this.first = a; this.arity = "unary"; return this; }); infix function to help the code generator. We can also pass in additional methods for convolving constants and generating code.for, switchand try), the label, more code to check for errors, error recovery, and a bunch of new operators. We can add task and type inference.Source: https://habr.com/ru/post/227241/
All Articles