📜 ⬆️ ⬇️

Creating a programming language using LLVM. Part 6: Language Extension: User Defined Operators

Table of contents:
Part 1: Introduction and Lexical Analysis
Part 2: Implementing Parser and AST
Part 3: LLVM IR Code Generation
Part 4: Adding JIT and Optimizer Support
Part 5: Language Expansion: Control Flow
Part 6: Language Extension: User Defined Operators
Part 7: Language Expansion: Variable Variables
Part 8: Compile to Object Code
Part 9: Add Debugging Information
Part 10: Conclusion and other goodies LLVM



6.1. Introduction


Welcome to chapter 6 of the guide “Creating a programming language using LLVM”. At the moment we have a fully functional language, although minimal, but nonetheless useful. But there is still one problem. There are few useful operators in our language (no, for example, division, logical negation, and even comparisons, with the exception of the “less” comparison operator).

Our language, Kaleidoscope, is simple and beautiful, but in this chapter we deviate from these principles. The retreat is that we get a simple, but ugly, in some way, language, but at the same time powerful. One of the great things about creating your own language is that you decide what is good and what is bad. In this tutorial, we decided that this is normal and will demonstrate some interesting parsing techniques.
')
At the end of this chapter, we will look at an example of a Kaleidoscope application that displays the Mandelbrot set. This will be an example of what you can do with the Kaleidoscope and its feature set.

6.2. User-Defined Operators: Basic Thought


“Operator Overloading”, which we introduce into the Kaleidoscope, is more generalized than in languages ​​like C ++. In C ++, you can only override existing operators, you cannot programmatically change their syntax, enter new operators, change priority levels, and so on. In this chapter, we will add all these features to the Kaleidoscope, which will allow the user to determine for himself the set of supported operators.

The purpose for which we consider the user-defined operators in this guide is to show the power and flexibility of using a “self-written” parser. The parser that we did uses recursive descent for most of the grammar, and parsing operators with priorities for expressions (see part 2 ). Using the parsing of operators with priorities, it is very easy to allow the programmer to introduce new operators into the grammar: the grammar is dynamically expanded with JIT.

Two specific features that we add are unary operators (the Kaleidoscope does not currently have unary operators) and binary operators. For example:

#   . def unary!(v) if v then 0 else 1; #  >    ,   <. def binary> 10 (LHS RHS) RHS < LHS; #   " " def binary| 5 (LHS RHS) if LHS then 1 else if RHS then 1 else 0; #  =     ,  . def binary= 9 (LHS RHS) !(LHS < RHS | LHS > RHS); 

Many languages ​​tend to implement the functions of the standard library in the language itself. In the language of Kaleidoscope, we can implement a significant part of the language as a library!

Let us divide the implementation of these possibilities into two parts: the implementation of support for custom binary operators and unary operators.

6.3. User-defined binary operators


Adding support for user-defined binary operators in our framework is very simple. First, add support for the keywords "unary" and "binary":

 enum Token { ... // operators tok_binary = -11, tok_unary = -12 }; ... static int gettok() { ... if (IdentifierStr == "for") return tok_for; if (IdentifierStr == "in") return tok_in; if (IdentifierStr == "binary") return tok_binary; if (IdentifierStr == "unary") return tok_unary; return tok_identifier; 

This code adds support for the keywords "unary" and "binary" to the lexical analyzer, just as we did in the previous parts. One good thing about our AST implementation is that we represent binary operators completely generalized, using their ASCII code as an opcode. For our additional operators, we will use the same representation, and we do not need additional support from the AST or parser.

On the other hand, we can present the definitions of these new operators as a function definition. In our grammar, the “name” in the definition of a function is parsed as the production of a function prototype and falls into an AST node of the PrototypeAST type. To present our new user-defined operators as prototypes, we need to extend the PrototypeAST node in this way:

 /// PrototypeAST -    ""  ///       class PrototypeAST { std::string Name; std::vector<std::string> Args; bool IsOperator; unsigned Precedence; //    public: PrototypeAST(const std::string &name, std::vector<std::string> Args, bool IsOperator = false, unsigned Prec = 0) : Name(name), Args(std::move(Args)), IsOperator(IsOperator), Precedence(Prec) {} Function *codegen(); const std::string &getName() const { return Name; } bool isUnaryOp() const { return IsOperator && Args.size() == 1; } bool isBinaryOp() const { return IsOperator && Args.size() == 2; } char getOperatorName() const { assert(isUnaryOp() || isBinaryOp()); return Name[Name.size() - 1]; } unsigned getBinaryPrecedence() const { return Precedence; } }; 

The bottom line is that, in addition to the well-known name of the prototype, we also keep information about whether it was an operator, and, if so, what level of priority it has. The priority level is used only for binary operators (as we will see later, it does not apply to unary operators). Now we have a way to present a prototype user operator, and we need to be able to parse it:

 ///  /// ::= id '(' id* ')' /// ::= binary LETTER number? (id, id) static std::unique_ptr<PrototypeAST> ParsePrototype() { std::string FnName; unsigned Kind = 0; // 0 = , 1 =  , 2 =   unsigned BinaryPrecedence = 30; switch (CurTok) { default: return LogErrorP("Expected function name in prototype"); case tok_identifier: FnName = IdentifierStr; Kind = 0; getNextToken(); break; case tok_binary: getNextToken(); if (!isascii(CurTok)) return LogErrorP("Expected binary operator"); FnName = "binary"; FnName += (char)CurTok; Kind = 2; getNextToken(); //  ,    if (CurTok == tok_number) { if (NumVal < 1 || NumVal > 100) return LogErrorP("Invalid precedence: must be 1..100"); BinaryPrecedence = (unsigned)NumVal; getNextToken(); } break; } if (CurTok != '(') return LogErrorP("Expected '(' in prototype"); std::vector<std::string> ArgNames; while (getNextToken() == tok_identifier) ArgNames.push_back(IdentifierStr); if (CurTok != ')') return LogErrorP("Expected ')' in prototype"); // . getNextToken(); //  ')'. // ,     if (Kind && ArgNames.size() != Kind) return LogErrorP("Invalid number of operands for operator"); return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames), Kind != 0, BinaryPrecedence); } 

A fairly straightforward parsing code, we have already seen a lot of similar code earlier. One interesting part of this code is a pair of lines, in which FnName is assigned to a binary operator. For the user operator "@", the prototype will be given the name "binary @". Here the advantages of the fact that the names of characters in the LLVM symbol table can have any characters, including zero, are manifested.
The next interesting thing to add is code generation support for binary operators. In our existing structure, it is simply adding a “default” to our existing node of a binary operator:

 Value *BinaryExprAST::codegen() { Value *L = LHS->codegen(); Value *R = RHS->codegen(); if (!L || !R) return nullptr; switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); //  bool 0/1  double 0.0  1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext), "booltmp"); default: break; } //      ,   .  //    . Function *F = getFunction(std::string("binary") + Op); assert(F && "binary operator not found!"); Value *Ops[2] = { L, R }; return Builder.CreateCall(F, Ops, "binop"); } 

As we can see above, the new code is very simple. It simply finds the appropriate operator in the symbol table and generates a function call for it. Since custom operators are implemented simply as normal functions, everything falls into place.

The last piece of code we need is just a piece of high-level magic:

 Function *FunctionAST::codegen() { //     FunctionProtos map,   //    . auto &P = *Proto; FunctionProtos[Proto->getName()] = std::move(Proto); Function *TheFunction = getFunction(P.getName()); if (!TheFunction) return nullptr; //  ,  . if (P.isBinaryOp()) BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence(); //       . BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction); ... 

Before generating the function code, if it is a custom operator, register it in the priority table. This allows the logic of parsing a binary operator, which we already have, to handle the operator. Since we are developing a fully generalized parser with operator priorities, we need to extend the grammar.
We now have user-defined binary operators. They are built mainly on the basis of the framework that we have made for other operators. Adding unary operators will be a little more difficult, because we don't have a framework for this yet. Let's see how to do this.

6.4. Unary operators, user-defined


Since we do not yet support unary operators in the Kaleidoscope language, we need to add their support. Above, we added simple support for the “unary” keyword to the lexical analyzer. Now we need an AST node.

 /// UnaryExprAST -      class UnaryExprAST : public ExprAST { char Opcode; std::unique_ptr<ExprAST> Operand; public: UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand) : Opcode(Opcode), Operand(std::move(Operand)) {} Value *codegen() override; }; 

The AST node is very simple and obvious. It is similar to the AST node of a binary operator, except that it has one descendant. Now we need to add the logic of parsing. Parsing unary operators is very simple; add a function that does this:

 /// unary /// ::= primary /// ::= '!' unary static std::unique_ptr<ExprAST> ParseUnary() { //      ,     if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') return ParsePrimary(); //    ,   int Opc = CurTok; getNextToken(); if (auto Operand = ParseUnary()) return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand)); return nullptr; } 

The grammar we added here is very simple. If we see a unary operator when the primary operator is parsed, we eat this operator as a prefix and parse the remaining piece as another unary operator. This allows us to handle multiple unary operators (such as "!! x"). Note that unary operators cannot have ambiguous parsing, like binary operators, and do not need priority information.
The problem with this function is that we may need to call ParseUnary from anywhere. To do this, we replace the previous call to ParsePrimary with ParseUnary:

 /// binoprhs /// ::= ('+' unary)* static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, std::unique_ptr<ExprAST> LHS) { ... //       auto RHS = ParseUnary(); if (!RHS) return nullptr; ... } ///  /// ::= unary binoprhs /// static std::unique_ptr<ExprAST> ParseExpression() { auto LHS = ParseUnary(); if (!LHS) return nullptr; return ParseBinOpRHS(0, std::move(LHS)); } 

With these two simple changes, you can now parse unary operators and build AST for them. Then you need to add parser support for prototypes to parse the prototypes of the unary operator. Expand the binary operator code above, as follows:

 /// prototype /// ::= id '(' id* ')' /// ::= binary LETTER number? (id, id) /// ::= unary LETTER (id) static std::unique_ptr<PrototypeAST> ParsePrototype() { std::string FnName; unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. unsigned BinaryPrecedence = 30; switch (CurTok) { default: return LogErrorP("Expected function name in prototype"); case tok_identifier: FnName = IdentifierStr; Kind = 0; getNextToken(); break; case tok_unary: getNextToken(); if (!isascii(CurTok)) return LogErrorP("Expected unary operator"); FnName = "unary"; FnName += (char)CurTok; Kind = 1; getNextToken(); break; case tok_binary: ... 

As with binary operators, we call unary operators a name that includes the operator symbol. This will help us in code generation. Now you need to add support for generating code for the unary operator. It looks like this:

 Value *UnaryExprAST::codegen() { Value *OperandV = Operand->codegen(); if (!OperandV) return nullptr; Function *F = getFunction(std::string("unary") + Opcode); if (!F) return LogErrorV("Unknown unary operator"); return Builder.CreateCall(F, OperandV, "unop"); } 

The code is similar to the implementation of a binary operator, but simpler, mainly because it does not need to support priorities.

6.5. Kick tires


It is hard to believe, but by introducing a few simple extensions discussed in the last parts, we have grown a real language. Many interesting things can be done with it, including I / O, math, and many other things. For example, you can enter a sequence operator (the printd function prints the value of the argument and translates the string):

 ready> extern printd(x); Read extern: declare double @printd(double) ready> def binary : 1 (xy) 0; #  ,    ... ready> printd(123) : printd(456) : printd(789); 123.000000 456.000000 789.000000 Evaluated to 0.000000 

We can also define a variety of “simple” operators, for example:

 #    def unary!(v) if v then 0 else 1; #  . def unary-(v) 0-v; #  >    ,   <. def binary> 10 (LHS RHS) RHS < LHS; #   "" def binary| 5 (LHS RHS) if LHS then 1 else if RHS then 1 else 0; #   "" def binary& 6 (LHS RHS) if !LHS then 0 else !!RHS; #  =    ,   def binary = 9 (LHS RHS) !(LHS < RHS | LHS > RHS); #  ':'  :   ,   #    RHS. def binary : 1 (xy) y; 

By taking if / then / else support, we can define interesting I / O functions. For example, the following code displays a character whose "density" reflects the value passed to it: the smaller the value, the darker the character.

 ready> extern putchard(char); ... ready> def printdensity(d) if d > 8 then putchard(32) # ' ' else if d > 4 then putchard(46) # '.' else if d > 2 then putchard(43) # '+' else putchard(42); # '*' ... ready> printdensity(1): printdensity(2): printdensity(3): printdensity(4): printdensity(5): printdensity(9): putchard(10); **++. Evaluated to 0.000000 

Based on these simple operations, we can begin to identify more interesting things. For example, here is a small function that determines the number of iterations for which the function diverges on the complex plane:

 # ,       #   z = z^2 + c    def mandelconverger(real imag iters creal cimag) if iters > 255 | (real*real + imag*imag > 4) then iters else mandelconverger(real*real - imag*imag + creal, 2*real*imag + cimag, iters+1, creal, cimag); #    def mandelconverge(real imag) mandelconverger(real, imag, 0, real, imag); 

This function (z = z2 + c) is a beautiful little creation, the basis of the calculation of the Mandelbrot set . Our mandelconverge function returns the number of iterations that a complex orbit needs to go to infinity. The function “saturates” with a value of 255. By itself, this function is not very useful, but if you plot its values ​​on a two-dimensional plane, you will see a set of Mandelbrot. Since we are limited here using the putchard function, our wonderful graphics output is also limited, but we can use the “density” output function given above.

 #          #    def mandelhelp(xmin xmax xstep ymin ymax ystep) for y = ymin, y < ymax, ystep in ( (for x = xmin, x < xmax, xstep in printdensity(mandelconverge(x,y))) : putchard(10) ) # mandel -        #       def mandel(realstart imagstart realmag imagmag) mandelhelp(realstart, realstart+realmag*78, realmag, imagstart, imagstart+imagmag*40, imagmag); 

Now we can try to build Mandelbrot set:

Expand
ready> mandel(-2.3, -1.3, 0.05, 0.07);
*******************************+++++++++++*************************************
*************************+++++++++++++++++++++++*******************************
**********************+++++++++++++++++++++++++++++****************************
*******************+++++++++++++++++++++.. ...++++++++*************************
*****************++++++++++++++++++++++.... ...+++++++++***********************
***************+++++++++++++++++++++++..... ...+++++++++*********************
**************+++++++++++++++++++++++.... ....+++++++++********************
*************++++++++++++++++++++++...... .....++++++++*******************
************+++++++++++++++++++++....... .......+++++++******************
***********+++++++++++++++++++.... ... .+++++++*****************
**********+++++++++++++++++....... .+++++++****************
*********++++++++++++++........... ...+++++++***************
********++++++++++++............ ...++++++++**************
********++++++++++... .......... .++++++++**************
*******+++++++++..... .+++++++++*************
*******++++++++...... ..+++++++++*************
*******++++++....... ..+++++++++*************
*******+++++...... ..+++++++++*************
*******.... .... ...+++++++++*************
*******.... . ...+++++++++*************
*******+++++...... ...+++++++++*************
*******++++++....... ..+++++++++*************
*******++++++++...... .+++++++++*************
*******+++++++++..... ..+++++++++*************
********++++++++++... .......... .++++++++**************
********++++++++++++............ ...++++++++**************
*********++++++++++++++.......... ...+++++++***************
**********++++++++++++++++........ .+++++++****************
**********++++++++++++++++++++.... ... ..+++++++****************
***********++++++++++++++++++++++....... .......++++++++*****************
************+++++++++++++++++++++++...... ......++++++++******************
**************+++++++++++++++++++++++.... ....++++++++********************
***************+++++++++++++++++++++++..... ...+++++++++*********************
*****************++++++++++++++++++++++.... ...++++++++***********************
*******************+++++++++++++++++++++......++++++++*************************
*********************++++++++++++++++++++++.++++++++***************************
*************************+++++++++++++++++++++++*******************************
******************************+++++++++++++************************************
*******************************************************************************
*******************************************************************************
*******************************************************************************
Evaluated to 0.000000
ready> mandel(-2, -1, 0.02, 0.04);
**************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
*******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
*****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
**************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
**********++++++++++++++++++++++++++++++++++++++++++++++.............
********+++++++++++++++++++++++++++++++++++++++++++..................
*******+++++++++++++++++++++++++++++++++++++++.......................
******+++++++++++++++++++++++++++++++++++...........................
*****++++++++++++++++++++++++++++++++............................
*****++++++++++++++++++++++++++++...............................
****++++++++++++++++++++++++++...... .........................
***++++++++++++++++++++++++......... ...... ...........
***++++++++++++++++++++++............
**+++++++++++++++++++++..............
**+++++++++++++++++++................
*++++++++++++++++++.................
*++++++++++++++++............ ...
*++++++++++++++..............
*+++....++++................
*.......... ...........
*
*.......... ...........
*+++....++++................
*++++++++++++++..............
*++++++++++++++++............ ...
*++++++++++++++++++.................
**+++++++++++++++++++................
**+++++++++++++++++++++..............
***++++++++++++++++++++++............
***++++++++++++++++++++++++......... ...... ...........
****++++++++++++++++++++++++++...... .........................
*****++++++++++++++++++++++++++++...............................
*****++++++++++++++++++++++++++++++++............................
******+++++++++++++++++++++++++++++++++++...........................
*******+++++++++++++++++++++++++++++++++++++++.......................
********+++++++++++++++++++++++++++++++++++++++++++..................
Evaluated to 0.000000
ready> mandel(-0.9, -1.4, 0.02, 0.03);
*******************************************************************************
*******************************************************************************
*******************************************************************************
**********+++++++++++++++++++++************************************************
*+++++++++++++++++++++++++++++++++++++++***************************************
+++++++++++++++++++++++++++++++++++++++++++++**********************************
++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
+++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
+++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
+++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
+++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
++++++++++++++++++++++++............. .......++++++++++++++++++++++******
+++++++++++++++++++++++............. ........+++++++++++++++++++++++****
++++++++++++++++++++++........... ..........++++++++++++++++++++++***
++++++++++++++++++++........... .........++++++++++++++++++++++*
++++++++++++++++++............ ...........++++++++++++++++++++
++++++++++++++++............... .............++++++++++++++++++
++++++++++++++................. ...............++++++++++++++++
++++++++++++.................. .................++++++++++++++
+++++++++.................. .................+++++++++++++
++++++........ . ......... ..++++++++++++
++............ ...... ....++++++++++
.............. ...++++++++++
.............. ....+++++++++
.............. .....++++++++
............. ......++++++++
........... .......++++++++
......... ........+++++++
......... ........+++++++
......... ....+++++++
........ ...+++++++
....... ...+++++++
....+++++++
.....+++++++
....+++++++
....+++++++
....+++++++
Evaluated to 0.000000
ready> ^D


At this stage, we can consider the Kaleidoscope a real, powerful language. It is not self-similar :), but can be used to draw things that are!

This concludes the “adding user-defined operators” part of our guide. We have successfully completed our language by adding the ability to extend the language using the library, and have shown how this can be used to build a simple but interesting user application in the Kaleidoscope language. At this stage, on the Kaleidoscope, you can write various applications that can call functions with side effects, but cannot define and change variables themselves.

Strictly speaking, changeable variables are an important feature of some languages, and it’s not quite obvious how to add their support without putting the SSA design phase into your frontend. In the next section, we describe how to add mutable variables to the frontend without building an SSA.

6.6. Full code listing


Below is a complete listing of the code for our working example, extended by the support of user-defined operators.
Build example:

 #  clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy #  ./toy 

On some platforms, you will need to link with -rdynamic or -Wl, –export-dynamic. Then the characters defined in the executable file will be exported to the dynamic linker and available in runtime. This is not necessary if you compile the support code into a shared library, although this can cause problems under Windows.

Code:

Chapter Code
 #include "llvm/ADT/APFloat.h" #include "llvm/ADT/STLExtras.h" #include "llvm/IR/BasicBlock.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/Function.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/IRBuilder.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/LegacyPassManager.h" #include "llvm/IR/Module.h" #include "llvm/IR/Type.h" #include "llvm/IR/Verifier.h" #include "llvm/Support/TargetSelect.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Transforms/Scalar.h" #include "llvm/Transforms/Scalar/GVN.h" #include "../include/KaleidoscopeJIT.h" #include <algorithm> #include <cassert> #include <cctype> #include <cstdint> #include <cstdio> #include <cstdlib> #include <map> #include <memory> #include <string> #include <vector> using namespace llvm; using namespace llvm::orc; //===----------------------------------------------------------------------===// //   //===----------------------------------------------------------------------===// //     [0-255]    ,    //  enum Token { tok_eof = -1, //  tok_def = -2, tok_extern = -3, //   tok_identifier = -4, tok_number = -5, //  tok_if = -6, tok_then = -7, tok_else = -8, tok_for = -9, tok_in = -10, //  tok_binary = -11, tok_unary = -12 }; static std::string IdentifierStr; //   tok_identifier static double NumVal; //   tok_number /// gettok -       static int gettok() { static int LastChar = ' '; //   while (isspace(LastChar)) LastChar = getchar(); if (isalpha(LastChar)) { // : [a-zA-Z][a-zA-Z0-9]* IdentifierStr = LastChar; while (isalnum((LastChar = getchar()))) IdentifierStr += LastChar; if (IdentifierStr == "def") return tok_def; if (IdentifierStr == "extern") return tok_extern; if (IdentifierStr == "if") return tok_if; if (IdentifierStr == "then") return tok_then; if (IdentifierStr == "else") return tok_else; if (IdentifierStr == "for") return tok_for; if (IdentifierStr == "in") return tok_in; if (IdentifierStr == "binary") return tok_binary; if (IdentifierStr == "unary") return tok_unary; return tok_identifier; } if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ std::string NumStr; do { NumStr += LastChar; LastChar = getchar(); } while (isdigit(LastChar) || LastChar == '.'); NumVal = strtod(NumStr.c_str(), nullptr); return tok_number; } if (LastChar == '#') { //     do LastChar = getchar(); while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); if (LastChar != EOF) return gettok(); } //   .   EOF. if (LastChar == EOF) return tok_eof; // ,     ascii-. int ThisChar = LastChar; LastChar = getchar(); return ThisChar; } //===----------------------------------------------------------------------===// //    ( ) //===----------------------------------------------------------------------===// namespace { /// ExprAST -    . class ExprAST { public: virtual ~ExprAST() = default; virtual Value *codegen() = 0; }; /// NumberExprAST -      "1.0". class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double Val) : Val(Val) {} Value *codegen() override; }; /// VariableExprAST -   , , "a". class VariableExprAST : public ExprAST { std::string Name; public: VariableExprAST(const std::string &Name) : Name(Name) {} Value *codegen() override; }; /// UnaryExprAST -      class UnaryExprAST : public ExprAST { char Opcode; std::unique_ptr<ExprAST> Operand; public: UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand) : Opcode(Opcode), Operand(std::move(Operand)) {} Value *codegen() override; }; /// BinaryExprAST -      class BinaryExprAST : public ExprAST { char Op; std::unique_ptr<ExprAST> LHS, RHS; public: BinaryExprAST(char Op, std::unique_ptr<ExprAST> LHS, std::unique_ptr<ExprAST> RHS) : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {} Value *codegen() override; }; /// CallExprAST -      class CallExprAST : public ExprAST { std::string Callee; std::vector<std::unique_ptr<ExprAST>> Args; public: CallExprAST(const std::string &Callee, std::vector<std::unique_ptr<ExprAST>> Args) : Callee(Callee), Args(std::move(Args)) {} Value *codegen() override; }; /// IfExprAST -    if/then/else. class IfExprAST : public ExprAST { std::unique_ptr<ExprAST> Cond, Then, Else; public: IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then, std::unique_ptr<ExprAST> Else) : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {} Value *codegen() override; }; /// ForExprAST -    for/in. class ForExprAST : public ExprAST { std::string VarName; std::unique_ptr<ExprAST> Start, End, Step, Body; public: ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start, std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step, std::unique_ptr<ExprAST> Body) : VarName(VarName), Start(std::move(Start)), End(std::move(End)), Step(std::move(Step)), Body(std::move(Body)) {} Value *codegen() override; }; /// PrototypeAST -    "" , ///    ,    (, ,  /// ,  ),   . class PrototypeAST { std::string Name; std::vector<std::string> Args; bool IsOperator; unsigned Precedence; // Precedence if a binary op. public: PrototypeAST(const std::string &Name, std::vector<std::string> Args, bool IsOperator = false, unsigned Prec = 0) : Name(Name), Args(std::move(Args)), IsOperator(IsOperator), Precedence(Prec) {} Function *codegen(); const std::string &getName() const { return Name; } bool isUnaryOp() const { return IsOperator && Args.size() == 1; } bool isBinaryOp() const { return IsOperator && Args.size() == 2; } char getOperatorName() const { assert(isUnaryOp() || isBinaryOp()); return Name[Name.size() - 1]; } unsigned getBinaryPrecedence() const { return Precedence; } }; /// FunctionAST -      class FunctionAST { std::unique_ptr<PrototypeAST> Proto; std::unique_ptr<ExprAST> Body; public: FunctionAST(std::unique_ptr<PrototypeAST> Proto, std::unique_ptr<ExprAST> Body) : Proto(std::move(Proto)), Body(std::move(Body)) {} Function *codegen(); }; } //     //===----------------------------------------------------------------------===// //  //===----------------------------------------------------------------------===// /// CurTok/getNextToken -    . CurTok -  /// ,    . getNextToken     ///     CurTok  . static int CurTok; static int getNextToken() { return CurTok = gettok(); } /// BinopPrecedence -     , ///   static std::map<char, int> BinopPrecedence; /// GetTokPrecedence -       . static int GetTokPrecedence() { if (!isascii(CurTok)) return -1; // ,     int TokPrec = BinopPrecedence[CurTok]; if (TokPrec <= 0) return -1; return TokPrec; } /// Error* -     . std::unique_ptr<ExprAST> LogError(const char *Str) { fprintf(stderr, "Error: %s\n", Str); return nullptr; } std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) { LogError(Str); return nullptr; } static std::unique_ptr<ExprAST> ParseExpression(); /// numberexpr ::= number static std::unique_ptr<ExprAST> ParseNumberExpr() { auto Result = llvm::make_unique<NumberExprAST>(NumVal); getNextToken(); //   return std::move(Result); } /// parenexpr ::= '(' expression ')' static std::unique_ptr<ExprAST> ParseParenExpr() { getNextToken(); //  (. auto V = ParseExpression(); if (!V) return nullptr; if (CurTok != ')') return LogError("expected ')'"); getNextToken(); //  ). return V; } /// identifierexpr /// ::= identifier /// ::= identifier '(' expression* ')' static std::unique_ptr<ExprAST> ParseIdentifierExpr() { std::string IdName = IdentifierStr; getNextToken(); //  . if (CurTok != '(') //   return llvm::make_unique<VariableExprAST>(IdName); // Call. getNextToken(); //  ( std::vector<std::unique_ptr<ExprAST>> Args; if (CurTok != ')') { while (true) { if (auto Arg = ParseExpression()) Args.push_back(std::move(Arg)); else return nullptr; if (CurTok == ')') break; if (CurTok != ',') return LogError("Expected ')' or ',' in argument list"); getNextToken(); } } //  ')'. getNextToken(); return llvm::make_unique<CallExprAST>(IdName, std::move(Args)); } /// ifexpr ::= 'if' expression 'then' expression 'else' expression static std::unique_ptr<ExprAST> ParseIfExpr() { getNextToken(); // eat the if. //  auto Cond = ParseExpression(); if (!Cond) return nullptr; if (CurTok != tok_then) return LogError("expected then"); getNextToken(); //  then auto Then = ParseExpression(); if (!Then) return nullptr; if (CurTok != tok_else) return LogError("expected else"); getNextToken(); auto Else = ParseExpression(); if (!Else) return nullptr; return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then), std::move(Else)); } /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression static std::unique_ptr<ExprAST> ParseForExpr() { getNextToken(); //  for. if (CurTok != tok_identifier) return LogError("expected identifier after for"); std::string IdName = IdentifierStr; getNextToken(); //   if (CurTok != '=') return LogError("expected '=' after for"); getNextToken(); //  '='. auto Start = ParseExpression(); if (!Start) return nullptr; if (CurTok != ',') return LogError("expected ',' after for start value"); getNextToken(); auto End = ParseExpression(); if (!End) return nullptr; //    std::unique_ptr<ExprAST> Step; if (CurTok == ',') { getNextToken(); Step = ParseExpression(); if (!Step) return nullptr; } if (CurTok != tok_in) return LogError("expected 'in' after for"); getNextToken(); // eat 'in'. auto Body = ParseExpression(); if (!Body) return nullptr; return llvm::make_unique<ForExprAST>(IdName, std::move(Start), std::move(End), std::move(Step), std::move(Body)); } /// primary /// ::= identifierexpr /// ::= numberexpr /// ::= parenexpr /// ::= ifexpr /// ::= forexpr static std::unique_ptr<ExprAST> ParsePrimary() { switch (CurTok) { default: return LogError("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': return ParseParenExpr(); case tok_if: return ParseIfExpr(); case tok_for: return ParseForExpr(); } } /// unary /// ::= primary /// ::= '!' unary static std::unique_ptr<ExprAST> ParseUnary() { //      ,     . if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') return ParsePrimary(); //    ,   int Opc = CurTok; getNextToken(); if (auto Operand = ParseUnary()) return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand)); return nullptr; } /// binoprhs /// ::= ('+' unary)* static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, std::unique_ptr<ExprAST> LHS) { //    ,    while (true) { int TokPrec = GetTokPrecedence(); //   ,    //  ,   if (TokPrec < ExprPrec) return LHS; //   ,     int BinOp = CurTok; getNextToken(); // eat binop //       auto RHS = ParseUnary(); if (!RHS) return nullptr; //  BinOp    RHS,    RHS,  //    RHS   LHS. int NextPrec = GetTokPrecedence(); if (TokPrec < NextPrec) { RHS = ParseBinOpRHS(TokPrec + 1, std::move(RHS)); if (!RHS) return nullptr; } //  LHS/RHS. LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), std::move(RHS)); } } /// expression /// ::= unary binoprhs /// static std::unique_ptr<ExprAST> ParseExpression() { auto LHS = ParseUnary(); if (!LHS) return nullptr; return ParseBinOpRHS(0, std::move(LHS)); } /// prototype /// ::= id '(' id* ')' /// ::= binary LETTER number? (id, id) /// ::= unary LETTER (id) static std::unique_ptr<PrototypeAST> ParsePrototype() { std::string FnName; unsigned Kind = 0; // 0 = , 1 = , 2 = . unsigned BinaryPrecedence = 30; switch (CurTok) { default: return LogErrorP("Expected function name in prototype"); case tok_identifier: FnName = IdentifierStr; Kind = 0; getNextToken(); break; case tok_unary: getNextToken(); if (!isascii(CurTok)) return LogErrorP("Expected unary operator"); FnName = "unary"; FnName += (char)CurTok; Kind = 1; getNextToken(); break; case tok_binary: getNextToken(); if (!isascii(CurTok)) return LogErrorP("Expected binary operator"); FnName = "binary"; FnName += (char)CurTok; Kind = 2; getNextToken(); //  ,    if (CurTok == tok_number) { if (NumVal < 1 || NumVal > 100) return LogErrorP("Invalid precedence: must be 1..100"); BinaryPrecedence = (unsigned)NumVal; getNextToken(); } break; } if (CurTok != '(') return LogErrorP("Expected '(' in prototype"); std::vector<std::string> ArgNames; while (getNextToken() == tok_identifier) ArgNames.push_back(IdentifierStr); if (CurTok != ')') return LogErrorP("Expected ')' in prototype"); // . getNextToken(); // eat ')'. // ,     if (Kind && ArgNames.size() != Kind) return LogErrorP("Invalid number of operands for operator"); return llvm::make_unique<PrototypeAST>(FnName, ArgNames, Kind != 0, BinaryPrecedence); } /// definition ::= 'def' prototype expression static std::unique_ptr<FunctionAST> ParseDefinition() { getNextToken(); // eat def. auto Proto = ParsePrototype(); if (!Proto) return nullptr; if (auto E = ParseExpression()) return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); return nullptr; } /// toplevelexpr ::= expression static std::unique_ptr<FunctionAST> ParseTopLevelExpr() { if (auto E = ParseExpression()) { //    auto Proto = llvm::make_unique<PrototypeAST>("__anon_expr", std::vector<std::string>()); return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); } return nullptr; } /// external ::= 'extern' prototype static std::unique_ptr<PrototypeAST> ParseExtern() { getNextToken(); // eat extern. return ParsePrototype(); } //===----------------------------------------------------------------------===// //   //===----------------------------------------------------------------------===// static LLVMContext TheContext; static IRBuilder<> Builder(TheContext); static std::unique_ptr<Module> TheModule; static std::map<std::string, Value *> NamedValues; static std::unique_ptr<legacy::FunctionPassManager> TheFPM; static std::unique_ptr<KaleidoscopeJIT> TheJIT; static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos; Value *LogErrorV(const char *Str) { LogError(Str); return nullptr; } Function *getFunction(std::string Name) { // , ,       . if (auto *F = TheModule->getFunction(Name)) return F; // I ,         // . auto FI = FunctionProtos.find(Name); if (FI != FunctionProtos.end()) return FI->second->codegen(); //    ,  null. return nullptr; } Value *NumberExprAST::codegen() { return ConstantFP::get(TheContext, APFloat(Val)); } Value *VariableExprAST::codegen() { // ,       Value *V = NamedValues[Name]; if (!V) return LogErrorV("Unknown variable name"); return V; } Value *UnaryExprAST::codegen() { Value *OperandV = Operand->codegen(); if (!OperandV) return nullptr; Function *F = getFunction(std::string("unary") + Opcode); if (!F) return LogErrorV("Unknown unary operator"); return Builder.CreateCall(F, OperandV, "unop"); } Value *BinaryExprAST::codegen() { Value *L = LHS->codegen(); Value *R = RHS->codegen(); if (!L || !R) return nullptr; switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); //  bool 0/1  double 0.0 or 1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext), "booltmp"); default: break; } //      ,     .  //    . Function *F = getFunction(std::string("binary") + Op); assert(F && "binary operator not found!"); Value *Ops[] = {L, R}; return Builder.CreateCall(F, Ops, "binop"); } Value *CallExprAST::codegen() { //       Function *CalleeF = getFunction(Callee); if (!CalleeF) return LogErrorV("Unknown function referenced"); // ,    . if (CalleeF->arg_size() != Args.size()) return LogErrorV("Incorrect # arguments passed"); std::vector<Value *> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { ArgsV.push_back(Args[i]->codegen()); if (!ArgsV.back()) return nullptr; } return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); } Value *IfExprAST::codegen() { Value *CondV = Cond->codegen(); if (!CondV) return nullptr; //          0.0. CondV = Builder.CreateFCmpONE( CondV, ConstantFP::get(TheContext, APFloat(0.0)), "ifcond"); Function *TheFunction = Builder.GetInsertBlock()->getParent(); //    then  else.   'then'  //   BasicBlock *ThenBB = BasicBlock::Create(TheContext, "then", TheFunction); BasicBlock *ElseBB = BasicBlock::Create(TheContext, "else"); BasicBlock *MergeBB = BasicBlock::Create(TheContext, "ifcont"); Builder.CreateCondBr(CondV, ThenBB, ElseBB); //  . Builder.SetInsertPoint(ThenBB); Value *ThenV = Then->codegen(); if (!ThenV) return nullptr; Builder.CreateBr(MergeBB); //    'Then'    ,  ThenBB  PHI. ThenBB = Builder.GetInsertBlock(); //   "else" TheFunction->getBasicBlockList().push_back(ElseBB); Builder.SetInsertPoint(ElseBB); Value *ElseV = Else->codegen(); if (!ElseV) return nullptr; Builder.CreateBr(MergeBB); //    'Else'    ,  ElseBB  PHI. ElseBB = Builder.GetInsertBlock(); //    TheFunction->getBasicBlockList().push_back(MergeBB); Builder.SetInsertPoint(MergeBB); PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(TheContext), 2, "iftmp"); PN->addIncoming(ThenV, ThenBB); PN->addIncoming(ElseV, ElseBB); return PN; } //  for-loop : // ... // start = startexpr // goto loop // loop: // variable = phi [start, loopheader], [nextvariable, loopend] // ... // bodyexpr // ... // loopend: // step = stepexpr // nextvariable = variable + step // endcond = endexpr // br endcond, loop, endloop // outloop: Value *ForExprAST::codegen() { //    ,  . Value *StartVal = Start->codegen(); if (!StartVal) return nullptr; //        ,     //  Function *TheFunction = Builder.GetInsertBlock()->getParent(); BasicBlock *PreheaderBB = Builder.GetInsertBlock(); BasicBlock *LoopBB = BasicBlock::Create(TheContext, "loop", TheFunction); //          LoopBB. Builder.CreateBr(LoopBB); //    LoopBB. Builder.SetInsertPoint(LoopBB); //   PHI   . PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(TheContext), 2, VarName); Variable->addIncoming(StartVal, PreheaderBB); //    ,   PHI.   //   ,    ,   . Value *OldVal = NamedValues[VarName]; NamedValues[VarName] = Variable; //   . ,    ,   //  BB. ,    ,   ,   //  . if (!Body->codegen()) return nullptr; //    Value *StepVal = nullptr; if (Step) { StepVal = Step->codegen(); if (!StepVal) return nullptr; } else { //   ,  1.0. StepVal = ConstantFP::get(TheContext, APFloat(1.0)); } Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); //    Value *EndCond = End->codegen(); if (!EndCond) return nullptr; //          0.0. EndCond = Builder.CreateFCmpONE( EndCond, ConstantFP::get(TheContext, APFloat(0.0)), "loopcond"); //   " "    BasicBlock *LoopEndBB = Builder.GetInsertBlock(); BasicBlock *AfterBB = BasicBlock::Create(TheContext, "afterloop", TheFunction); //      LoopEndBB. Builder.CreateCondBr(EndCond, LoopBB, AfterBB); //       AfterBB. Builder.SetInsertPoint(AfterBB); //      PHI. Variable->addIncoming(NextVar, LoopEndBB); //    . if (OldVal) NamedValues[VarName] = OldVal; else NamedValues.erase(VarName); //    0.0. return Constant::getNullValue(Type::getDoubleTy(TheContext)); } Function *PrototypeAST::codegen() { //   : double(double,double) etc. std::vector<Type *> Doubles(Args.size(), Type::getDoubleTy(TheContext)); FunctionType *FT = FunctionType::get(Type::getDoubleTy(TheContext), Doubles, false); Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule.get()); //    unsigned Idx = 0; for (auto &Arg : F->args()) Arg.setName(Args[Idx++]); return F; } Function *FunctionAST::codegen() { //      FunctionProtos,   //    . auto &P = *Proto; FunctionProtos[Proto->getName()] = std::move(Proto); Function *TheFunction = getFunction(P.getName()); if (!TheFunction) return nullptr; //   ,   if (P.isBinaryOp()) BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence(); //        BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction); Builder.SetInsertPoint(BB); //      NamedValues. NamedValues.clear(); for (auto &Arg : TheFunction->args()) NamedValues[Arg.getName()] = &Arg; if (Value *RetVal = Body->codegen()) { //   Builder.CreateRet(RetVal); //    verifyFunction(*TheFunction); //     TheFPM->run(*TheFunction); return TheFunction; } //    ,   TheFunction->eraseFromParent(); if (P.isBinaryOp()) BinopPrecedence.erase(P.getOperatorName()); return nullptr; } //===----------------------------------------------------------------------===// //     JIT //===----------------------------------------------------------------------===// static void InitializeModuleAndPassManager() { //    TheModule = llvm::make_unique<Module>("my cool jit", TheContext); TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout()); //       TheFPM = llvm::make_unique<legacy::FunctionPassManager>(TheModule.get()); //   "peephole"-. TheFPM->add(createInstructionCombiningPass()); //   TheFPM->add(createReassociatePass()); //    TheFPM->add(createGVNPass()); //    (    ..). TheFPM->add(createCFGSimplificationPass()); TheFPM->doInitialization(); } static void HandleDefinition() { if (auto FnAST = ParseDefinition()) { if (auto *FnIR = FnAST->codegen()) { fprintf(stderr, "Read function definition:"); FnIR->print(errs()); fprintf(stderr, "\n"); TheJIT->addModule(std::move(TheModule)); InitializeModuleAndPassManager(); } } else { //       getNextToken(); } } static void HandleExtern() { if (auto ProtoAST = ParseExtern()) { if (auto *FnIR = ProtoAST->codegen()) { fprintf(stderr, "Read extern: "); FnIR->print(errs()); fprintf(stderr, "\n"); FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST); } } else { //       getNextToken(); } } static void HandleTopLevelExpression() { //       if (auto FnAST = ParseTopLevelExpr()) { if (FnAST->codegen()) { // JIT  ,    //    auto H = TheJIT->addModule(std::move(TheModule)); InitializeModuleAndPassManager(); //  JIT   __anon_expr auto ExprSymbol = TheJIT->findSymbol("__anon_expr"); assert(ExprSymbol && "Function not found"); //         (  // ,  double)      . double (*FP)() = (double (*)())(intptr_t)cantFail(ExprSymbol.getAddress()); fprintf(stderr, "Evaluated to %f\n", FP()); //     JIT. TheJIT->removeModule(H); } } else { //       getNextToken(); } } /// top ::= definition | external | expression | ';' static void MainLoop() { while (true) { fprintf(stderr, "ready> "); switch (CurTok) { case tok_eof: return; case ';': //       getNextToken(); break; case tok_def: HandleDefinition(); break; case tok_extern: HandleExtern(); break; default: HandleTopLevelExpression(); break; } } } //===----------------------------------------------------------------------===// // "" ,        //===----------------------------------------------------------------------===// #ifdef LLVM_ON_WIN32 #define DLLEXPORT __declspec(dllexport) #else #define DLLEXPORT #endif /// putchard - putchar,  double,  0. extern "C" DLLEXPORT double putchard(double X) { fputc((char)X, stderr); return 0; } /// printd - printf,  double   "%f\n",  0. extern "C" DLLEXPORT double printd(double X) { fprintf(stderr, "%f\n", X); return 0; } //===----------------------------------------------------------------------===// //  main //===----------------------------------------------------------------------===// int main() { InitializeNativeTarget(); InitializeNativeTargetAsmPrinter(); InitializeNativeTargetAsmParser(); //     // 1 -    BinopPrecedence['<'] = 10; BinopPrecedence['+'] = 20; BinopPrecedence['-'] = 20; BinopPrecedence['*'] = 40; // highest. //    fprintf(stderr, "ready> "); getNextToken(); TheJIT = llvm::make_unique<KaleidoscopeJIT>(); InitializeModuleAndPassManager(); //    MainLoop(); return 0; } 

Source: https://habr.com/ru/post/336080/


All Articles