Creating a programming language using LLVM. Part 7: Language Expansion: Variable Variables

Table of contents:

Part 1: Introduction and Lexical Analysis
Part 2: Implementing Parser and AST
Part 3: LLVM IR Code Generation
Part 4: Adding JIT and Optimizer Support
Part 5: Language Expansion: Control Flow
Part 6: Language Extension: User Defined Operators
Part 7: Language Expansion: Variable Variables
Part 8: Compile to Object Code
Part 9: Adding Debugging Information
Part 10: Conclusion and other goodies LLVM

7.1. Introduction

Welcome to chapter 7 of the guide “Creating a programming language using LLVM”. In chapters 1-6, we built a complete, albeit simple, functional programming language. On this path, we learned some parsing techniques, learned how to build and how to represent AST, how to build LLVM IR, and how to optimize the resulting code, and how JIT compiles it.

Although the Kaleidoscope is interesting as a functional language, the fact that it is functional makes it too easy to generate LLVM IR for it. In particular, the language functionality makes it very simple to build an LLVM IR directly in the SSA form. Since LLVM requires the input code to be in SSA form, this is a very good feature, and for beginners it is often unclear how to generate code for imperative languages with variable variables.

A short (and happy) summary of this chapter is that you don't need to build an SSA form in your front end: LLVM provides well-tuned and well-tested support for this, although the way it is done will be unexpected for someone.

7.2. Why is this a difficult task?

To understand why mutable variables cause difficulties with building SSA, consider an extremely simple C example:

int G, H; int test(_Bool Condition) { int X; if (Condition) X = G; else X = H; return X; }

In this example, there is a variable "X", the value of which depends on the program execution path. Since there are two possible values for X before the return command, a PHI node is inserted that combines both values. The LLVM IR for this example will look something like this:

 @G = weak global i32 0 ; type of @G is i32* @H = weak global i32 0 ; type of @H is i32* define i32 @test(i1 %Condition) { entry: br i1 %Condition, label %cond_true, label %cond_false cond_true: %X.0 = load i32* @G br label %cond_next cond_false: %X.1 = load i32* @H br label %cond_next cond_next: %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] ret i32 %X.2 }

In this example, loading from global variables G and H in LLVM IR is explicit, and they are alive in the then / else branches of the if operator (cond_true / cond_false). To combine input values, the X.2 phi node in the cond_next block chooses the correct value depending on where the control flow came from: if from a cond_false block, then X.2 gets the value X.1.
If the control flow came from cond_true, the variable gets the value X.0. In this chapter, we will not explain the details of the SSA form. For more information, contact one of the many online directories.

The question of this chapter is who propagates the phi nodes while descending assignment to a variable variable. The problem is that LLVM requires that IR be in SSA form: there is no “without SSA” mode. However, designing SSA requires nontrivial algorithms and data structures, and it would be inconvenient and costly to reproduce this logic in each fontend.

7.3. Memory in LLVM

The trick is that, although LLVM requires that all register values are in SSA form, it does not require (and does not allow) memory objects to be in SSA form. In the example above, note that the download commands from G and H have direct access to G and H: they are not renamed and not numbered. This is different from some other compiler systems that try to assign version numbers to objects in memory. In LLVM, instead of analyzing the flow of memory data in LLVM IR, this happens in analytic passes that run on demand.

The basic idea is that we can make stack variables (which live in memory, because it is a stack) for each variable object in the function. In order to appreciate the advantage of this approach, we need to talk about how stack variables are represented in LLVM.

In LLVM, all memory access operations occur explicitly using load / store commands, and are carefully designed so that there is no need for an address taking operator. Note that TP of the global variables @ G / @ H is actually “i32 *”, despite the fact that the variables themselves are declared as “i32”. This means that @G determines the location for i32 in the global data area, but the name of this variable actually refers to the address of this space. Stack variables work the same way, but, unlike global variables, they are declared with the alloca command:

 define i32 @example() { entry: %X = alloca i32 ;  %X - i32*. ... %tmp = load i32* %X ;    %X  . %tmp2 = add i32 %tmp, 1 ;  store i32 %tmp2, i32* %X ;  ...

This code shows an example of how you can declare and manipulate stack variables in LLVM IR. The stack memory allocated by the alloca command is completely generalized: you can pass the address of the stack slot to a function, save it in another variable, etc. In the example above, we can rewrite this example to use the alloca technique and avoid using PHI nodes.

 @G = weak global i32 0 ; type of @G is i32* @H = weak global i32 0 ; type of @H is i32* define i32 @test(i1 %Condition) { entry: %X = alloca i32 ; type of %X is i32*. br i1 %Condition, label %cond_true, label %cond_false cond_true: %X.0 = load i32* @G store i32 %X.0, i32* %X ;  X br label %cond_next cond_false: %X.1 = load i32* @H store i32 %X.1, i32* %X ;  X br label %cond_next cond_next: %X.2 = load i32* %X ;  X ret i32 %X.2 }

So, we’ve discovered a way to handle arbitrary mutable variables without having to create PHI nodes:

Each variable being changed becomes a stack area.
Each variable reading becomes a load from the stack.
Each entry in a variable becomes a saving to the stack.
Taking the address of a variable is just a direct use of the value of the address on the stack.

Although we solved our problem, another one appeared: now we are having an intensive exchange with the stack for very simple and common operations, which degrades performance. Fortunately for us, the LLVM optimizer has a well-tuned optimization pass called “mem2reg” that handles such cases, converts the alloca commands into SSA registers and inserts Phi nodes where necessary. If you pass the code from the example above through this pass, you will get:

 $ llvm-as < example.ll | opt -mem2reg | llvm-dis @G = weak global i32 0 @H = weak global i32 0 define i32 @test(i1 %Condition) { entry: br i1 %Condition, label %cond_true, label %cond_false cond_true: %X.0 = load i32* @G br label %cond_next cond_false: %X.1 = load i32* @H br label %cond_next cond_next: %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] ret i32 %X.01 }

The mem2reg pass implements the standard “iterated dominance frontier” algorithm for constructing an SSA form and has a number of optimizations that accelerate (very general) degenerate cases. The optimization of mem2reg is the answer to the question of what to do with variable variables, and we highly recommend relying on it. Note that mem2reg works with variables only under certain conditions:

mem2reg is focused on alloca commands (allocating memory on the stack): it searches for alloca commands and, if it can process them, does so. It does not work with global variables and heap allocation.
mem2reg searches for alloca commands only in the input block of the function. Placing in the input block ensures that alloca is executed only once, which simplifies the analysis.
mem2reg converts only those alloca commands that are used in direct load and store commands. If the address of the stack variable is passed to a function, or some funny pointer arithmetic is used, the alloca command will not be converted.
mem2reg works only with alloca commands for first-class values (such as pointers, scalar and vector values), and only if the size of the array for allocating memory is 1 (or is missing in the .ll file). mem2reg is not able to convert a structure or array into registers. Note that the sroa pass is more powerful and can transform structures, unions, and arrays in many cases.

All these properties are easily achievable in most imperative languages, and we will illustrate them below using the example of a Kaleidoscope. The last question you can ask is: should I worry about all this in my front end? Wouldn't it have been better if I had designed the SSA form directly, avoiding the use of the mem2reg optimizing pass? In short, we strongly recommend that you use this technique to build an SSA form, unless there are very good reasons for not doing this.

This technique:

Proven and well tested: clang uses this technique for local mutable variables. Also, in most common LLVM clients, this technique is used for their variables. You can be sure that bugs are fast and fixed in the early stages.

It works very fast: mem2reg optimizes both special cases and general ones by doing it quickly. For example, there are separate fast optimizations for variables used only in one block, for variables that have a single assignment point, good heuristics to avoid inserting extra phi nodes, etc.

Generates debug information: The debug information in LLVM is based on the addresses of the variables to which it is attached. This technique is very naturally tailored to this style of debug information.

Finally, it is easy to integrate into your frontend and work, and easy to implement. Let us now expand the Kaleidoscope to work with variable variables!

7.4. Changeable in Kaleidoscope

Now we know the essence of the problem we want to solve. Let's see what it looks like in the context of our little language Kaleidoscope. We need to add two possibilities:

The ability to change a variable using the operator "=".
Ability to define new variables.

The first point is what we all said, we have variables as input arguments of the function, and as intermediate variables, and redefine them. Also, the ability to define new variables is useful regardless of whether we can change them. Here is a motivating example showing how we can use them:

 #  ':'  :   ,   #    RHS. def binary : 1 (xy) y; #   fib,     def fib(x) if (x < 3) then 1 else fib(x-1)+fib(x-2); #   fib. def fibi(x) var a = 1, b = 1, c in (for i = 3, i < x in c = a + b : a = b : b = c) : b; #  . fibi(10);

In order to change a variable, we must use a trick with “alloca” for existing variables. When we complete this, we will add our new operator, and expand the Kaleidoscope to support the definition of new variables.

7.5. We convert existing variables into a changeable form¶

The table of characters in the Kaleidoscope during code generation is represented by a table (map) "NamedValues". The table contains pointers to LLVM “Value *” values, which contain double precision values for named variables. In order to maintain variable variability, we need to change them a bit, so that the NamedValues table contains variable memory locations. This change is refactoring: it changes the structure of the code, but (by itself) does not change the behavior of the compiler. All these changes are isolated in the Kaleidoscope code generator.

At this stage in the development of the Kaleidoscope, it maintains variables for only two cases: the input arguments of the function, and the loop variable “for”. For consistency, we allow changes to these variables as well as user-defined variables. This means that they also need addresses in memory.

To begin reworking the Kaleidoscope, we will modify the NamedValues table so that it contains AllocaInst * instead of Value *. When we do this, the C ++ compiler will tell us which parts of the code we need to change:

 static std::map<std::string, AllocaInst*> NamedValues;

Also, since we will need to create alloca commands, we use an auxiliary function that creates alloca commands in the input block of the function:

 /// CreateEntryBlockAlloca -   alloca    /// .       .. static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, const std::string &VarName) { IRBuilder<> TmpB(&TheFunction->getEntryBlock(), TheFunction->getEntryBlock().begin()); return TmpB.CreateAlloca(Type::getDoubleTy(TheContext), 0, VarName.c_str()); }

This fun-looking code creates an IRBuilder object that points to the first command (.begin ()) of the input block. It creates an alloca statement with the desired name and returns it. Since all values in the Cadeidoscope are real numbers with double precision, we do not need to pass a type to use this instruction.

When this is done, the first functional change we want to make refers to variable references. According to our new scheme, variables are alive on the stack, and the code that generates references to them should actually generate a load instruction from the stack slot:

 Value *VariableExprAST::codegen() { //     Value *V = NamedValues[Name]; if (!V) return LogErrorV("Unknown variable name"); //   return Builder.CreateLoad(V, Name.c_str()); }

As you can see, everything is absolutely straightforward. Now we need to update the code points that define the variable to insert the alloca instruction. Let's start with ForExprAST :: codegen () (see full code listing for the full version):

 Function *TheFunction = Builder.GetInsertBlock()->getParent(); //   alloca      AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); //    ,  . Value *StartVal = Start->codegen(); if (!StartVal) return nullptr; //    alloca. Builder.CreateStore(StartVal, Alloca); ... //    Value *EndCond = End->codegen(); if (!EndCond) return nullptr; // , ,   alloca.    //      . Value *CurVar = Builder.CreateLoad(Alloca); Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); Builder.CreateStore(NextVar, Alloca); ...

This code is actually identical to the code given earlier, in which we allow mutable variables. The big difference is that we no longer have to construct a PHI node, and we use load / store to access the variable when we need it.

In order to support variable function arguments, we also need to make alloca instructions for them. The code for this is very simple:

 Function *FunctionAST::codegen() { ... Builder.SetInsertPoint(BB); //      NamedValues. NamedValues.clear(); for (auto &Arg : TheFunction->args()) { //  alloca    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName()); //      alloca. Builder.CreateStore(&Arg, Alloca); //     . NamedValues[Arg.getName()] = Alloca; } if (Value *RetVal = Body->codegen()) { ...

For each argument, we create an alloca instruction, write the input value to alloca, and register alloca as the memory location for the argument. This method is called FunctionAST :: codegen () immediately after generating the input block of the function.

The last part of the mission is to add the mem2reg pass, which allows us to generate good code again:

 //   alloca  . TheFPM->add(createPromoteMemoryToRegisterPass()); //   peephole-    . TheFPM->add(createInstructionCombiningPass()); //   TheFPM->add(createReassociatePass()); ...

It is interesting to see how the code looks before and after the optimization of mem2reg. For example, here is the code for our recursive function, fib, before and after optimization. Before optimization:

 define double @fib(double %x) { entry: %x1 = alloca double store double %x, double* %x1 %x2 = load double, double* %x1 %cmptmp = fcmp ult double %x2, 3.000000e+00 %booltmp = uitofp i1 %cmptmp to double %ifcond = fcmp one double %booltmp, 0.000000e+00 br i1 %ifcond, label %then, label %else then: ; preds = %entry br label %ifcont else: ; preds = %entry %x3 = load double, double* %x1 %subtmp = fsub double %x3, 1.000000e+00 %calltmp = call double @fib(double %subtmp) %x4 = load double, double* %x1 %subtmp5 = fsub double %x4, 2.000000e+00 %calltmp6 = call double @fib(double %subtmp5) %addtmp = fadd double %calltmp, %calltmp6 br label %ifcont ifcont: ; preds = %else, %then %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] ret double %iftmp }

There is only one variable (x, input argument), but you can still see the simple code generation strategy that we use. In the input block, alloca is created, and the initial input value is stored there. Each variable reference causes a read from the stack. Also note that we do not modify the if / then / else expression, and it still inserts PHI nodes. Although we can make alloca in this case, it’s actually easier to create a PHI node, so we’ll just do it.

Here is the code after passing mem2reg:

 define double @fib(double %x) { entry: %cmptmp = fcmp ult double %x, 3.000000e+00 %booltmp = uitofp i1 %cmptmp to double %ifcond = fcmp one double %booltmp, 0.000000e+00 br i1 %ifcond, label %then, label %else then: br label %ifcont else: %subtmp = fsub double %x, 1.000000e+00 %calltmp = call double @fib(double %subtmp) %subtmp5 = fsub double %x, 2.000000e+00 %calltmp6 = call double @fib(double %subtmp5) %addtmp = fadd double %calltmp, %calltmp6 br label %ifcont ifcont: ; preds = %else, %then %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] ret double %iftmp }

This is a trivial case for mem2reg, since there are no variable declarations. The purpose with which we show this is to protect you from the desire to make the code ineffective. After the rest of the optimization is done, we get:

 define double @fib(double %x) { entry: %cmptmp = fcmp ult double %x, 3.000000e+00 %booltmp = uitofp i1 %cmptmp to double %ifcond = fcmp ueq double %booltmp, 0.000000e+00 br i1 %ifcond, label %else, label %ifcont else: %subtmp = fsub double %x, 1.000000e+00 %calltmp = call double @fib(double %subtmp) %subtmp5 = fsub double %x, 2.000000e+00 %calltmp6 = call double @fib(double %subtmp5) %addtmp = fadd double %calltmp, %calltmp6 ret double %addtmp ifcont: ret double 1.000000e+00 }

Here we see that the simplifying passage decided to clone the instruction to return to the end of the “else” block. This allowed to remove some branches and PHI-node.

Now, when the symbol table is updated and contains stack variables, we add an assignment operator.

7.6. New assignment operator

In our framework, adding a new assignment operator is very simple. Parsing it like any other binary operator, but processing it yourself (instead of letting the user do it). First of all, assign priority to it:

 int main() { //     // 1 -    BinopPrecedence['='] = 2; BinopPrecedence['<'] = 10; BinopPrecedence['+'] = 20; BinopPrecedence['-'] = 20;

Now, when the parser knows the priority of the binary operator, it will take care of parsing and generating the AST. We need to implement code generation for the assignment operator. It looks like this:

 Value *BinaryExprAST::codegen() { // '=' -  , ..       LHS if (Op == '=') { //  ,  LHS  . VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS.get()); if (!LHSE) return LogErrorV("destination of '=' must be a variable");

Unlike other binary operators, our assignment operator should not be a model “generate LHS, generate RHS, perform calculations”. It is treated as a special case before other binary operators are processed. Another weird thing is that the LHS must be a variable. It would be incorrect to write "(x + 1) = expr" - only expressions of the type "x = expr" are allowed.

  //    RHS. Value *Val = RHS->codegen(); if (!Val) return nullptr; //   Value *Variable = NamedValues[LHSE->getName()]; if (!Variable) return LogErrorV("Unknown variable name"); Builder.CreateStore(Val, Variable); return Val; } ...

When we have a variable, generating the code for the assignment is very simple: we generate the RHS for the assignment, create the store instruction, and return the calculated value. The return value allows you to make chains of assignments, for example, "X = (Y = Z)".

We now have an assignment operator, and we can change the loop variable and arguments. For example, we can write this code:

 # Function to print a double. extern printd(x); # Define ':' for sequencing: as a low-precedence operator that ignores operands # and just returns the RHS. def binary : 1 (xy) y; def test(x) printd(x) : x = 4 : printd(x); test(123);

At startup, this example will print “123”, then “4”, showing that we actually changed the value of the variable! Well, now we have reached the goal to make it work, in general, we need to construct an SSA form. However, it would be really useful if we could enter our own local variables, let's do it!

7.7. User-Defined Local Variables

Adding “var” and “in” is similar to any other extensions we made in the Kaleidoscope: we expand the lexical analyzer, the parser, the AST, and the code generator. The first step to add our var / in construct is to expand the lexical analyzer. As before, this is trivial, the code will look like this:

 enum Token { ... //  var tok_var = -13 ... } ... static int gettok() { ... if (IdentifierStr == "in") return tok_in; if (IdentifierStr == "binary") return tok_binary; if (IdentifierStr == "unary") return tok_unary; if (IdentifierStr == "var") return tok_var; return tok_identifier; ...

The next step is to define the AST node that we are constructing. For var / in, it looks like this:

 /// VarExprAST -    var/in class VarExprAST : public ExprAST { std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames; std::unique_ptr<ExprAST> Body; public: VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames, std::unique_ptr<ExprAST> Body) : VarNames(std::move(VarNames)), Body(std::move(Body)) {} Value *codegen() override; };

"Var / in" allows you to define a list of names at once, and each name can optionally have an initial value. We save information in the vector VarNames. Also, var / in has has a body, the body can have access to variables defined in var / in.
When this is done, we can define parts of the parser. The first thing we do is add a primary expression:

 /// primary /// ::= identifierexpr /// ::= numberexpr /// ::= parenexpr /// ::= ifexpr /// ::= forexpr /// ::= varexpr static std::unique_ptr<ExprAST> ParsePrimary() { switch (CurTok) { default: return LogError("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': return ParseParenExpr(); case tok_if: return ParseIfExpr(); case tok_for: return ParseForExpr(); case tok_var: return ParseVarExpr(); } }

Then we define ParseVarExpr:

 /// varexpr ::= 'var' identifier ('=' expression)? // (',' identifier ('=' expression)?)* 'in' expression static std::unique_ptr<ExprAST> ParseVarExpr() { getNextToken(); //  var. std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames; //       if (CurTok != tok_identifier) return LogError("expected identifier after var");

The first part of this code parses the list of identifier / expression pairs in the local VarNames vector.

 while (1) { std::string Name = IdentifierStr; getNextToken(); //   //    std::unique_ptr<ExprAST> Init; if (CurTok == '=') { getNextToken(); //  '='. Init = ParseExpression(); if (!Init) return nullptr; } VarNames.push_back(std::make_pair(Name, std::move(Init))); //   ,    if (CurTok != ',') break; getNextToken(); //  ','. if (CurTok != tok_identifier) return LogError("expected identifier list after var"); }

When all variables are parsed, parse the body and create an AST node.

  //      'in'. if (CurTok != tok_in) return LogError("expected 'in' keyword after 'var'"); getNextToken(); // eat 'in'. auto Body = ParseExpression(); if (!Body) return nullptr; return llvm::make_unique<VarExprAST>(std::move(VarNames), std::move(Body)); }

Now we can parse and submit the code, and we need to support the generation of LLVM IR code for it. This is the beginning of the code:

 Value *VarExprAST::codegen() { std::vector<AllocaInst *> OldBindings; Function *TheFunction = Builder.GetInsertBlock()->getParent(); //       for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { const std::string &VarName = VarNames[i].first; ExprAST *Init = VarNames[i].second.get();

In essence, the code loops around all the variables and processes them one at a time. For each variable placed in the symbol table, we memorize the previous value, which we replace with OldBindings.

  //        ,   //      , //     : // var a = 1 in // var a = a in ... #     'a'. Value *InitVal; if (Init) { InitVal = Init->codegen(); if (!InitVal) return nullptr; } else { //   ,  0.0. InitVal = ConstantFP::get(TheContext, APFloat(0.0)); } AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); Builder.CreateStore(InitVal, Alloca); //    ,    //    OldBindings.push_back(NamedValues[VarName]); //   NamedValues[VarName] = Alloca; }

This code has more comments. The basic idea is that we generate an initializer, create the alloca command, then update the character in the table to point to the alloca. When all variables are written to the symbol table, we calculate the body of the var / in expression:

 //    ,       Value *BodyVal = Body->codegen(); if (!BodyVal) return nullptr;

Finally, before returning, we restore the previous bunch of variables:

  //       for (unsigned i = 0, e = VarNames.size(); i != e; ++i) NamedValues[VarNames[i].first] = OldBindings[i]; //    return BodyVal; }

The end result of all this is that we got the variables correctly placed in the scope, and we even (in a trivial way) allowed them to be modified.

Now we have finished what we wanted to do. Our sample iteration function, fib, from the intro is compiled and works great. The mem2reg optimization pass optimizes all our stack variables in SSA registers, inserting PHI nodes where necessary, and our frontend remains simple: there are no complicated algorithms and calculations.

7.8. Full listing

Below is a complete listing of the source code for our working example, expanded by variable variables and var / in support. To build an example, use the commands:

 #  clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy #  ./toy

Source:

Code

 #include "llvm/ADT/APFloat.h" #include "llvm/ADT/STLExtras.h" #include "llvm/IR/BasicBlock.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/Function.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/IRBuilder.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/LegacyPassManager.h" #include "llvm/IR/Module.h" #include "llvm/IR/Type.h" #include "llvm/IR/Verifier.h" #include "llvm/Support/TargetSelect.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Transforms/Scalar.h" #include "llvm/Transforms/Scalar/GVN.h" #include "../include/KaleidoscopeJIT.h" #include <algorithm> #include <cassert> #include <cctype> #include <cstdint> #include <cstdio> #include <cstdlib> #include <map> #include <memory> #include <string> #include <utility> #include <vector> using namespace llvm; using namespace llvm::orc; //===----------------------------------------------------------------------===// //   //===----------------------------------------------------------------------===// //     [0-255]    ,    //  enum Token { tok_eof = -1, //  tok_def = -2, tok_extern = -3, //   tok_identifier = -4, tok_number = -5, //  tok_if = -6, tok_then = -7, tok_else = -8, tok_for = -9, tok_in = -10, //  tok_binary = -11, tok_unary = -12, //   tok_var = -13 }; static std::string IdentifierStr; //   tok_identifier static double NumVal; //   tok_number /// gettok -       static int gettok() { static int LastChar = ' '; //   while (isspace(LastChar)) LastChar = getchar(); if (isalpha(LastChar)) { // : [a-zA-Z][a-zA-Z0-9]* IdentifierStr = LastChar; while (isalnum((LastChar = getchar()))) IdentifierStr += LastChar; if (IdentifierStr == "def") return tok_def; if (IdentifierStr == "extern") return tok_extern; if (IdentifierStr == "if") return tok_if; if (IdentifierStr == "then") return tok_then; if (IdentifierStr == "else") return tok_else; if (IdentifierStr == "for") return tok_for; if (IdentifierStr == "in") return tok_in; if (IdentifierStr == "binary") return tok_binary; if (IdentifierStr == "unary") return tok_unary; if (IdentifierStr == "var") return tok_var; return tok_identifier; } if (isdigit(LastChar) || LastChar == '.') { // : [0-9.]+ std::string NumStr; do { NumStr += LastChar; LastChar = getchar(); } while (isdigit(LastChar) || LastChar == '.'); NumVal = strtod(NumStr.c_str(), nullptr); return tok_number; } if (LastChar == '#') { //     do LastChar = getchar(); while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); if (LastChar != EOF) return gettok(); } //   .   EOF. if (LastChar == EOF) return tok_eof; // ,     ascii-. int ThisChar = LastChar; LastChar = getchar(); return ThisChar; } //===----------------------------------------------------------------------===// //    ( ) //===----------------------------------------------------------------------===// namespace { /// ExprAST -    . class ExprAST { public: virtual ~ExprAST() = default; virtual Value *codegen() = 0; }; /// NumberExprAST -      "1.0". class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double Val) : Val(Val) {} Value *codegen() override; }; /// VariableExprAST -   , , "a". class VariableExprAST : public ExprAST { std::string Name; public: VariableExprAST(const std::string &Name) : Name(Name) {} Value *codegen() override; const std::string &getName() const { return Name; } }; /// UnaryExprAST -      class UnaryExprAST : public ExprAST { char Opcode; std::unique_ptr<ExprAST> Operand; public: UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand) : Opcode(Opcode), Operand(std::move(Operand)) {} Value *codegen() override; }; /// BinaryExprAST -      class BinaryExprAST : public ExprAST { char Op; std::unique_ptr<ExprAST> LHS, RHS; public: BinaryExprAST(char Op, std::unique_ptr<ExprAST> LHS, std::unique_ptr<ExprAST> RHS) : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {} Value *codegen() override; }; /// CallExprAST -      class CallExprAST : public ExprAST { std::string Callee; std::vector<std::unique_ptr<ExprAST>> Args; public: CallExprAST(const std::string &Callee, std::vector<std::unique_ptr<ExprAST>> Args) : Callee(Callee), Args(std::move(Args)) {} Value *codegen() override; }; /// IfExprAST -    if/then/else. class IfExprAST : public ExprAST { std::unique_ptr<ExprAST> Cond, Then, Else; public: IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then, std::unique_ptr<ExprAST> Else) : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {} Value *codegen() override; }; /// ForExprAST -    for/in. class ForExprAST : public ExprAST { std::string VarName; std::unique_ptr<ExprAST> Start, End, Step, Body; public: ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start, std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step, std::unique_ptr<ExprAST> Body) : VarName(VarName), Start(std::move(Start)), End(std::move(End)), Step(std::move(Step)), Body(std::move(Body)) {} Value *codegen() override; }; /// VarExprAST -    var/in class VarExprAST : public ExprAST { std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames; std::unique_ptr<ExprAST> Body; public: VarExprAST( std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames, std::unique_ptr<ExprAST> Body) : VarNames(std::move(VarNames)), Body(std::move(Body)) {} Value *codegen() override; }; /// PrototypeAST -    "" , ///    ,    (, ,  /// ,  ),   . class PrototypeAST { std::string Name; std::vector<std::string> Args; bool IsOperator; unsigned Precedence; // Precedence if a binary op. public: PrototypeAST(const std::string &Name, std::vector<std::string> Args, bool IsOperator = false, unsigned Prec = 0) : Name(Name), Args(std::move(Args)), IsOperator(IsOperator), Precedence(Prec) {} Function *codegen(); const std::string &getName() const { return Name; } bool isUnaryOp() const { return IsOperator && Args.size() == 1; } bool isBinaryOp() const { return IsOperator && Args.size() == 2; } char getOperatorName() const { assert(isUnaryOp() || isBinaryOp()); return Name[Name.size() - 1]; } unsigned getBinaryPrecedence() const { return Precedence; } }; /// FunctionAST -      class FunctionAST { std::unique_ptr<PrototypeAST> Proto; std::unique_ptr<ExprAST> Body; public: FunctionAST(std::unique_ptr<PrototypeAST> Proto, std::unique_ptr<ExprAST> Body) : Proto(std::move(Proto)), Body(std::move(Body)) {} Function *codegen(); }; } //     //===----------------------------------------------------------------------===// //  //===----------------------------------------------------------------------===// /// CurTok/getNextToken -    . CurTok -  /// ,    . getNextToken     ///     CurTok  . static int CurTok; static int getNextToken() { return CurTok = gettok(); } /// BinopPrecedence -     , ///   static std::map<char, int> BinopPrecedence; /// GetTokPrecedence -       . static int GetTokPrecedence() { if (!isascii(CurTok)) return -1; // ,     int TokPrec = BinopPrecedence[CurTok]; if (TokPrec <= 0) return -1; return TokPrec; } /// Error* -     . std::unique_ptr<ExprAST> LogError(const char *Str) { fprintf(stderr, "Error: %s\n", Str); return nullptr; } std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) { LogError(Str); return nullptr; } static std::unique_ptr<ExprAST> ParseExpression(); /// numberexpr ::= number static std::unique_ptr<ExprAST> ParseNumberExpr() { auto Result = llvm::make_unique<NumberExprAST>(NumVal); getNextToken(); //   return std::move(Result); } /// parenexpr ::= '(' expression ')' static std::unique_ptr<ExprAST> ParseParenExpr() { getNextToken(); //  (. auto V = ParseExpression(); if (!V) return nullptr; if (CurTok != ')') return LogError("expected ')'"); getNextToken(); //  ). return V; } /// identifierexpr /// ::= identifier /// ::= identifier '(' expression* ')' static std::unique_ptr<ExprAST> ParseIdentifierExpr() { std::string IdName = IdentifierStr; getNextToken(); //  . if (CurTok != '(') //   return llvm::make_unique<VariableExprAST>(IdName); // Call. getNextToken(); //  ( std::vector<std::unique_ptr<ExprAST>> Args; if (CurTok != ')') { while (true) { if (auto Arg = ParseExpression()) Args.push_back(std::move(Arg)); else return nullptr; if (CurTok == ')') break; if (CurTok != ',') return LogError("Expected ')' or ',' in argument list"); getNextToken(); } } //  ')'. getNextToken(); return llvm::make_unique<CallExprAST>(IdName, std::move(Args)); } /// ifexpr ::= 'if' expression 'then' expression 'else' expression static std::unique_ptr<ExprAST> ParseIfExpr() { getNextToken(); // eat the if. //  auto Cond = ParseExpression(); if (!Cond) return nullptr; if (CurTok != tok_then) return LogError("expected then"); getNextToken(); //  then auto Then = ParseExpression(); if (!Then) return nullptr; if (CurTok != tok_else) return LogError("expected else"); getNextToken(); auto Else = ParseExpression(); if (!Else) return nullptr; return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then), std::move(Else)); } /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression static std::unique_ptr<ExprAST> ParseForExpr() { getNextToken(); // eat the for. if (CurTok != tok_identifier) return LogError("expected identifier after for"); std::string IdName = IdentifierStr; getNextToken(); //   if (CurTok != '=') return LogError("expected '=' after for"); getNextToken(); //  '='. auto Start = ParseExpression(); if (!Start) return nullptr; if (CurTok != ',') return LogError("expected ',' after for start value"); getNextToken(); auto End = ParseExpression(); if (!End) return nullptr; //    std::unique_ptr<ExprAST> Step; if (CurTok == ',') { getNextToken(); Step = ParseExpression(); if (!Step) return nullptr; } if (CurTok != tok_in) return LogError("expected 'in' after for"); getNextToken(); //  'in'. auto Body = ParseExpression(); if (!Body) return nullptr; return llvm::make_unique<ForExprAST>(IdName, std::move(Start), std::move(End), std::move(Step), std::move(Body)); } /// varexpr ::= 'var' identifier ('=' expression)? // (',' identifier ('=' expression)?)* 'in' expression static std::unique_ptr<ExprAST> ParseVarExpr() { getNextToken(); //  var. std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames; //      if (CurTok != tok_identifier) return LogError("expected identifier after var"); while (true) { std::string Name = IdentifierStr; getNextToken(); //   // Read the optional initializer. std::unique_ptr<ExprAST> Init = nullptr; if (CurTok == '=') { getNextToken(); //  '='. Init = ParseExpression(); if (!Init) return nullptr; } VarNames.push_back(std::make_pair(Name, std::move(Init))); // End of var list, exit loop. if (CurTok != ',') break; getNextToken(); //  ','. if (CurTok != tok_identifier) return LogError("expected identifier list after var"); } // At this point, we have to have 'in'. if (CurTok != tok_in) return LogError("expected 'in' keyword after 'var'"); getNextToken(); //  'in'. auto Body = ParseExpression(); if (!Body) return nullptr; return llvm::make_unique<VarExprAST>(std::move(VarNames), std::move(Body)); } /// primary /// ::= identifierexpr /// ::= numberexpr /// ::= parenexpr /// ::= ifexpr /// ::= forexpr /// ::= varexpr static std::unique_ptr<ExprAST> ParsePrimary() { switch (CurTok) { default: return LogError("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': return ParseParenExpr(); case tok_if: return ParseIfExpr(); case tok_for: return ParseForExpr(); case tok_var: return ParseVarExpr(); } } /// unary /// ::= primary /// ::= '!' unary static std::unique_ptr<ExprAST> ParseUnary() { //      ,     . if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') return ParsePrimary(); //    ,   int Opc = CurTok; getNextToken(); if (auto Operand = ParseUnary()) return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand)); return nullptr; } /// binoprhs /// ::= ('+' unary)* static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, std::unique_ptr<ExprAST> LHS) { //    ,    while (true) { int TokPrec = GetTokPrecedence(); //   ,    //  ,   if (TokPrec < ExprPrec) return LHS; //   ,     int BinOp = CurTok; getNextToken(); // eat binop //       auto RHS = ParseUnary(); if (!RHS) return nullptr; //  BinOp    RHS,    RHS,  //    RHS   LHS. int NextPrec = GetTokPrecedence(); if (TokPrec < NextPrec) { RHS = ParseBinOpRHS(TokPrec + 1, std::move(RHS)); if (!RHS) return nullptr; } //  LHS/RHS. LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), std::move(RHS)); } } /// expression /// ::= unary binoprhs /// static std::unique_ptr<ExprAST> ParseExpression() { auto LHS = ParseUnary(); if (!LHS) return nullptr; return ParseBinOpRHS(0, std::move(LHS)); } /// prototype /// ::= id '(' id* ')' /// ::= binary LETTER number? (id, id) /// ::= unary LETTER (id) static std::unique_ptr<PrototypeAST> ParsePrototype() { std::string FnName; unsigned Kind = 0; // 0 = , 1 = , 2 = . unsigned BinaryPrecedence = 30; switch (CurTok) { default: return LogErrorP("Expected function name in prototype"); case tok_identifier: FnName = IdentifierStr; Kind = 0; getNextToken(); break; case tok_unary: getNextToken(); if (!isascii(CurTok)) return LogErrorP("Expected unary operator"); FnName = "unary"; FnName += (char)CurTok; Kind = 1; getNextToken(); break; case tok_binary: getNextToken(); if (!isascii(CurTok)) return LogErrorP("Expected binary operator"); FnName = "binary"; FnName += (char)CurTok; Kind = 2; getNextToken(); //  ,    if (CurTok == tok_number) { if (NumVal < 1 || NumVal > 100) return LogErrorP("Invalid precedence: must be 1..100"); BinaryPrecedence = (unsigned)NumVal; getNextToken(); } break; } if (CurTok != '(') return LogErrorP("Expected '(' in prototype"); std::vector<std::string> ArgNames; while (getNextToken() == tok_identifier) ArgNames.push_back(IdentifierStr); if (CurTok != ')') return LogErrorP("Expected ')' in prototype"); // . getNextToken(); // eat ')'. // ,     if (Kind && ArgNames.size() != Kind) return LogErrorP("Invalid number of operands for operator"); return llvm::make_unique<PrototypeAST>(FnName, ArgNames, Kind != 0, BinaryPrecedence); } /// definition ::= 'def' prototype expression static std::unique_ptr<FunctionAST> ParseDefinition() { getNextToken(); //  def. auto Proto = ParsePrototype(); if (!Proto) return nullptr; if (auto E = ParseExpression()) return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); return nullptr; } /// toplevelexpr ::= expression static std::unique_ptr<FunctionAST> ParseTopLevelExpr() { if (auto E = ParseExpression()) { //    auto Proto = llvm::make_unique<PrototypeAST>("__anon_expr", std::vector<std::string>()); return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); } return nullptr; } /// external ::= 'extern' prototype static std::unique_ptr<PrototypeAST> ParseExtern() { getNextToken(); // eat extern. return ParsePrototype(); } //===----------------------------------------------------------------------===// //   //===----------------------------------------------------------------------===// static LLVMContext TheContext; static IRBuilder<> Builder(TheContext); static std::unique_ptr<Module> TheModule; static std::map<std::string, AllocaInst *> NamedValues; static std::unique_ptr<legacy::FunctionPassManager> TheFPM; static std::unique_ptr<KaleidoscopeJIT> TheJIT; static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos; Value *LogErrorV(const char *Str) { LogError(Str); return nullptr; } Function *getFunction(std::string Name) { //  ,       . if (auto *F = TheModule->getFunction(Name)) return F; //  ,         // . auto FI = FunctionProtos.find(Name); if (FI != FunctionProtos.end()) return FI->second->codegen(); //    ,  null. return nullptr; } /// CreateEntryBlockAlloca -   alloca    /// .     . static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, const std::string &VarName) { IRBuilder<> TmpB(&TheFunction->getEntryBlock(), TheFunction->getEntryBlock().begin()); return TmpB.CreateAlloca(Type::getDoubleTy(TheContext), nullptr, VarName); } Value *NumberExprAST::codegen() { return ConstantFP::get(TheContext, APFloat(Val)); } Value *VariableExprAST::codegen() { // ,       Value *V = NamedValues[Name]; if (!V) return LogErrorV("Unknown variable name"); //   return Builder.CreateLoad(V, Name.c_str()); } Value *UnaryExprAST::codegen() { Value *OperandV = Operand->codegen(); if (!OperandV) return nullptr; Function *F = getFunction(std::string("unary") + Opcode); if (!F) return LogErrorV("Unknown unary operator"); return Builder.CreateCall(F, OperandV, "unop"); } Value *BinaryExprAST::codegen() { //    '=', ..     LHS   if (Op == '=') { //  ,  LHS   // ,     RTTI, .. LLVM   //  .    LLVM  RTTI     // dynamic_cast    . VariableExprAST *LHSE = static_cast<VariableExprAST *>(LHS.get()); if (!LHSE) return LogErrorV("destination of '=' must be a variable"); //   RHS. Value *Val = RHS->codegen(); if (!Val) return nullptr; //   Value *Variable = NamedValues[LHSE->getName()]; if (!Variable) return LogErrorV("Unknown variable name"); Builder.CreateStore(Val, Variable); return Val; } Value *L = LHS->codegen(); Value *R = RHS->codegen(); if (!L || !R) return nullptr; switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); //  bool 0/1  double 0.0 or 1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext), "booltmp"); default: break; } //      ,     .  //    . Function *F = getFunction(std::string("binary") + Op); assert(F && "binary operator not found!"); Value *Ops[] = {L, R}; return Builder.CreateCall(F, Ops, "binop"); } Value *CallExprAST::codegen() { //       Function *CalleeF = getFunction(Callee); if (!CalleeF) return LogErrorV("Unknown function referenced"); // ,    . if (CalleeF->arg_size() != Args.size()) return LogErrorV("Incorrect # arguments passed"); std::vector<Value *> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { ArgsV.push_back(Args[i]->codegen()); if (!ArgsV.back()) return nullptr; } return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); } Value *IfExprAST::codegen() { Value *CondV = Cond->codegen(); if (!CondV) return nullptr; //          0.0. CondV = Builder.CreateFCmpONE( CondV, ConstantFP::get(TheContext, APFloat(0.0)), "ifcond"); Function *TheFunction = Builder.GetInsertBlock()->getParent(); //    then  else.   'then'  //   BasicBlock *ThenBB = BasicBlock::Create(TheContext, "then", TheFunction); BasicBlock *ElseBB = BasicBlock::Create(TheContext, "else"); BasicBlock *MergeBB = BasicBlock::Create(TheContext, "ifcont"); Builder.CreateCondBr(CondV, ThenBB, ElseBB); //  . Builder.SetInsertPoint(ThenBB); Value *ThenV = Then->codegen(); if (!ThenV) return nullptr; Builder.CreateBr(MergeBB); //    'Then'    ,  ThenBB  PHI. ThenBB = Builder.GetInsertBlock(); //   "else" TheFunction->getBasicBlockList().push_back(ElseBB); Builder.SetInsertPoint(ElseBB); Value *ElseV = Else->codegen(); if (!ElseV) return nullptr; Builder.CreateBr(MergeBB); //    'Else'    ,  ElseBB  PHI. ElseBB = Builder.GetInsertBlock(); //    TheFunction->getBasicBlockList().push_back(MergeBB); Builder.SetInsertPoint(MergeBB); PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(TheContext), 2, "iftmp"); PN->addIncoming(ThenV, ThenBB); PN->addIncoming(ElseV, ElseBB); return PN; } //  for-loop : // var = alloca double // ... // start = startexpr // store start -> var // goto loop // loop: // ... // bodyexpr // ... // loopend: // step = stepexpr // endcond = endexpr // // curvar = load var // nextvar = curvar + step // store nextvar -> var // br endcond, loop, endloop // outloop: Value *ForExprAST::codegen() { Function *TheFunction = Builder.GetInsertBlock()->getParent(); //   alloca      AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); //    ,  'variable'    Value *StartVal = Start->codegen(); if (!StartVal) return nullptr; //    alloca. Builder.CreateStore(StartVal, Alloca); //       ,     // . BasicBlock *LoopBB = BasicBlock::Create(TheContext, "loop", TheFunction); //        LoopBB. Builder.CreateBr(LoopBB); //    LoopBB. Builder.SetInsertPoint(LoopBB); //  ,    PHI-.   //   ,    ,   . AllocaInst *OldVal = NamedValues[VarName]; NamedValues[VarName] = Alloca; //   . ,    ,   //  BB. ,    ,   ,   //  . if (!Body->codegen()) return nullptr; //    Value *StepVal = nullptr; if (Step) { StepVal = Step->codegen(); if (!StepVal) return nullptr; } else { //   ,  1.0. StepVal = ConstantFP::get(TheContext, APFloat(1.0)); } //    Value *EndCond = End->codegen(); if (!EndCond) return nullptr; // , ,   alloca.   //        Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str()); Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); Builder.CreateStore(NextVar, Alloca); //          0.0. EndCond = Builder.CreateFCmpONE( EndCond, ConstantFP::get(TheContext, APFloat(0.0)), "loopcond"); //   " "    BasicBlock *AfterBB = BasicBlock::Create(TheContext, "afterloop", TheFunction); //      LoopEndBB. Builder.CreateCondBr(EndCond, LoopBB, AfterBB); //       AfterBB. Builder.SetInsertPoint(AfterBB); //    . if (OldVal) NamedValues[VarName] = OldVal; else NamedValues.erase(VarName); //    0.0. return Constant::getNullValue(Type::getDoubleTy(TheContext)); } Value *VarExprAST::codegen() { std::vector<AllocaInst *> OldBindings; Function *TheFunction = Builder.GetInsertBlock()->getParent(); //       for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { const std::string &VarName = VarNames[i].first; ExprAST *Init = VarNames[i].second.get(); //     ,   //      ,   // l  : // var a = 1 in // var a = a in ... #    'a'. Value *InitVal; if (Init) { InitVal = Init->codegen(); if (!InitVal) return nullptr; } else { // If not specified, use 0.0. InitVal = ConstantFP::get(TheContext, APFloat(0.0)); } AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); Builder.CreateStore(InitVal, Alloca); //    OldBindings.push_back(NamedValues[VarName]); //   NamedValues[VarName] = Alloca; } //       Value *BodyVal = Body->codegen(); if (!BodyVal) return nullptr; //    for (unsigned i = 0, e = VarNames.size(); i != e; ++i) NamedValues[VarNames[i].first] = OldBindings[i]; //     return BodyVal; } Function *PrototypeAST::codegen() { //   : double(double,double) etc. std::vector<Type *> Doubles(Args.size(), Type::getDoubleTy(TheContext)); FunctionType *FT = FunctionType::get(Type::getDoubleTy(TheContext), Doubles, false); Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule.get()); //    unsigned Idx = 0; for (auto &Arg : F->args()) Arg.setName(Args[Idx++]); return F; } Function *FunctionAST::codegen() { //      FunctionProtos,   //    . auto &P = *Proto; FunctionProtos[Proto->getName()] = std::move(Proto); Function *TheFunction = getFunction(P.getName()); if (!TheFunction) return nullptr; //   ,   if (P.isBinaryOp()) BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence(); //        BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction); Builder.SetInsertPoint(BB); //      NamedValues. NamedValues.clear(); for (auto &Arg : TheFunction->args()) { //   alloca   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName()); //     alloca. Builder.CreateStore(&Arg, Alloca); //      NamedValues[Arg.getName()] = Alloca; } if (Value *RetVal = Body->codegen()) { // Finish off the function. Builder.CreateRet(RetVal); //    verifyFunction(*TheFunction); //     TheFPM->run(*TheFunction); return TheFunction; } //    ,   TheFunction->eraseFromParent(); if (P.isBinaryOp()) BinopPrecedence.erase(P.getOperatorName()); return nullptr; } //===----------------------------------------------------------------------===// //     JIT //===----------------------------------------------------------------------===// static void InitializeModuleAndPassManager() { //    TheModule = llvm::make_unique<Module>("my cool jit", TheContext); TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout()); //       TheFPM = llvm::make_unique<legacy::FunctionPassManager>(TheModule.get()); //   alloca   TheFPM->add(createPromoteMemoryToRegisterPass()); //   "peephole"-. TheFPM->add(createInstructionCombiningPass()); //   TheFPM->add(createReassociatePass()); //    TheFPM->add(createGVNPass()); //    (    ..). TheFPM->add(createCFGSimplificationPass()); TheFPM->doInitialization(); } static void HandleDefinition() { if (auto FnAST = ParseDefinition()) { if (auto *FnIR = FnAST->codegen()) { fprintf(stderr, "Read function definition:"); FnIR->print(errs()); fprintf(stderr, "\n"); TheJIT->addModule(std::move(TheModule)); InitializeModuleAndPassManager(); } } else { //       getNextToken(); } } static void HandleExtern() { if (auto ProtoAST = ParseExtern()) { if (auto *FnIR = ProtoAST->codegen()) { fprintf(stderr, "Read extern: "); FnIR->print(errs()); fprintf(stderr, "\n"); FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST); } } else { //       getNextToken(); } } static void HandleTopLevelExpression() { //       if (auto FnAST = ParseTopLevelExpr()) { if (FnAST->codegen()) { // JIT  ,    //    auto H = TheJIT->addModule(std::move(TheModule)); InitializeModuleAndPassManager(); //  JIT   __anon_expr auto ExprSymbol = TheJIT->findSymbol("__anon_expr"); assert(ExprSymbol && "Function not found"); //         (  // ,  double)      . double (*FP)() = (double (*)())(intptr_t)cantFail(ExprSymbol.getAddress()); fprintf(stderr, "Evaluated to %f\n", FP()); //     JIT. TheJIT->removeModule(H); } } else { //       getNextToken(); } } /// top ::= definition | external | expression | ';' static void MainLoop() { while (true) { fprintf(stderr, "ready> "); switch (CurTok) { case tok_eof: return; case ';': //       getNextToken(); break; case tok_def: HandleDefinition(); break; case tok_extern: HandleExtern(); break; default: HandleTopLevelExpression(); break; } } } //===----------------------------------------------------------------------===// // "" ,        //===----------------------------------------------------------------------===// #ifdef LLVM_ON_WIN32 #define DLLEXPORT __declspec(dllexport) #else #define DLLEXPORT #endif /// putchard - putchar,  double,  0. extern "C" DLLEXPORT double putchard(double X) { fputc((char)X, stderr); return 0; } /// printd - printf,  double   "%f\n",  0. extern "C" DLLEXPORT double printd(double X) { fprintf(stderr, "%f\n", X); return 0; } //===----------------------------------------------------------------------===// //  main //===----------------------------------------------------------------------===// int main() { InitializeNativeTarget(); InitializeNativeTargetAsmPrinter(); InitializeNativeTargetAsmParser(); //     // 1 -    BinopPrecedence['='] = 2; BinopPrecedence['<'] = 10; BinopPrecedence['+'] = 20; BinopPrecedence['-'] = 20; BinopPrecedence['*'] = 40; //  . //    fprintf(stderr, "ready> "); getNextToken(); TheJIT = llvm::make_unique<KaleidoscopeJIT>(); InitializeModuleAndPassManager(); //    MainLoop(); return 0; }

Source: https://habr.com/ru/post/336468/

All Articles