📜 ⬆️ ⬇️

We explain to the grandmother how to write your own programming language.

This is a playground where I will try to explain how to create a tiny programming language (Mu). You can watch live at open source here or download it here . Tutorial can read right now.

image

We write our programming language (in Swift)


In order to write your own programming language, it is not necessary to have a degree in Computer Science, it’s enough to understand 3 basic steps.
')

Language: Mu (ÎĽ)


Mu is a minimal language that contains a postfix operator, a binary operation and “single-digit” numbers.

Example: (s 2 4) or (s (s 4 5) 4) or (s (s 4 5) (s 3 2)) ...

Steps:



Lexer


In computer science, lexical analysis is the process of analyzing the input sequence of characters (for example, such as the source code in one of the programming languages) in order to obtain at the output a sequence of characters called "tokens" (like grouping letters in words).

The group of characters in the input sequence, identified at the output of the process as a token, is called a token. In the process of lexical analysis, recognition and selection of lexemes from the input sequence of characters is performed.
- Wikipedia

Example:

image

Because Mu so small — only one operator and numbers — you can simply iterate through the input and check each character.

 enum Token { case parensOpen case op(String) case number(Int) case parensClose } struct Lexer { static func tokenize(_ input: String) -> [Token] { return input.characters.flatMap { switch $0 { case "(": return Token.parensOpen case ")": return Token.parensClose case "s": return Token.op($0.description) default: if "0"..."9" ~= $0 { return Token.number(Int($0.description)!) } } return nil } } } let input = "(s (s 4 5) 4)" let tokens = Lexer.tokenize(input) 

Parser


A parser or parser is part of a program that converts input data (usually text) into a structured format. Parser parses text.

Grammar:


 expression: parensOpen operator primaryExpression primaryExpression parensClose primaryExpression: expression | number parensOpen: "(" parensClose: ")" operator: "s" number: [0-9] 

Mu grammar is a contextless grammar, which means that it describes all possible variations of strings in a language. The parser starts from the top (root of the generated tree) and moves to the lowest node.

Tip: The code should be a direct representation of grammar.

 func parseExpression() -> ExpressionNode { ... firstPrimaryExpression = parsePrimaryExpression() secondPrimaryExpression = parsePrimaryExpression() ... } func parseExpression() -> PrimaryExpressionNode { return parseExpression() || parseNumber() } 

image

 indirect enum PrimaryExpressionNode { case number(Int) case expression(ExpressionNode) } struct ExpressionNode { var op: String var firstExpression: PrimaryExpressionNode var secondExpression: PrimaryExpressionNode } struct Parser { var index = 0 let tokens: [Token] init(tokens: [Token]) { self.tokens = tokens } mutating func popToken() -> Token { let token = tokens[index] index += 1 return token } mutating func peekToken() -> Token { return tokens[index] } mutating func parse() throws -> ExpressionNode { return try parseExpression() } mutating func parseExpression() throws -> ExpressionNode { guard case .parensOpen = popToken() else { throw ParsingError.unexpectedToken } guard case let Token.op(_operator) = popToken() else { throw ParsingError.unexpectedToken } let firstExpression = try parsePrimaryExpression() let secondExpression = try parsePrimaryExpression() guard case .parensClose = popToken() else { throw ParsingError.unexpectedToken } return ExpressionNode(op: _operator, firstExpression: firstExpression, secondExpression: secondExpression) } mutating func parsePrimaryExpression() throws -> PrimaryExpressionNode { switch peekToken() { case .number: return try parseNumber() case .parensOpen: let expressionNode = try parseExpression() return PrimaryExpressionNode.expression(expressionNode) default: throw ParsingError.unexpectedToken } } mutating func parseNumber() throws -> PrimaryExpressionNode { guard case let Token.number(n) = popToken() else { throw ParsingError.unexpectedToken } return PrimaryExpressionNode.number(n) } } //MARK: Utils extension ExpressionNode: CustomStringConvertible { public var description: String { return "\(op) -> [\(firstExpression), \(secondExpression)]" } } extension PrimaryExpressionNode: CustomStringConvertible { public var description: String { switch self { case .number(let n): return n.description case .expression(let exp): return exp.description } } } let input = "(s 2 (s 3 5))" let tokens = Lexer.tokenize(input) var parser = Parser(tokens: tokens) var ast = try! parser.parse() 

Interpreter


In computer science, an interpreter is a program that sequentially executes instructions written in a programming language or scripting language without first compiling it into machine code. (Wikipedia)

Example:

The Mu interpreter will pass through its AST and calculate the values, applying the operator to the child nodes.

image

 enum InterpreterError: Error { case unknownOperator } struct Interpreter { static func eval(_ expression: ExpressionNode) throws -> Int { let firstEval = try eval(expression.first) let secEval = try eval(expression.second) if expression.op == "s" { return firstEval + secEval } throw InterpreterError.unknownOperator } static func eval(_ prim: PrimaryExpressionNode) throws -> Int { switch prim { case .expression(let exp): return try eval(exp) case .number(let n): return Int(n) } } } let input = "(s (s 5 2) 4)" let tokens = Lexer.tokenize(input) var parser = Parser(tokens: tokens) let ast = try! parser.parse() try! Interpreter.eval(ast) 

Conclusion


image

 var parser = Parser(tokens: tokens) let ast = try! parser.parse() 


Resources





Publishing support - Edison company, which specializes in asphalt plants automation and the development of payment systems and terminals .

Source: https://habr.com/ru/post/315068/


All Articles