📜 ⬆️ ⬇️

Creating a programming language using LLVM. Part 5: Language Expansion: Control Flow

Welcome to Chapter 5 of the tutorial, Creating a Programming Language with LLVM. The previous chapters ( 1st , 2nd, 3rd , and 4th ) described the implementation of a simple Kaleidoscope programming language and the inclusion of support for generating LLVM IR, as well as subsequent optimization and JIT compilation. Unfortunately, the Kaleidoscope is almost useless in its current form: it has no control flow, with the exception of calls and returns. This means that there can be no conditional jumps in the code, which significantly limits the programming language. In this chapter, we extend the Kaleidoscope by adding the expression if/then/else and a simple "for" loop.

If / Then / Else


Extending Kaleidoscope to using if/then/else is a fairly simple task. It is required to add support for this “new” concept to the lexical and syntactic analyzers, AST and code generator. This is a good example because it shows how easy it is to “build up” a language over time, gradually expanding it as new ideas appear.

Before we add this extension, you need to talk about what we want to get in the end. And we want to be able to write something like this:

 def fib(x) if x < 3 then 1 else fib(x-1)+fib(x-2); 

In Kaleidoscope, each construct is an expression. Thus, an if/then/else expression, like any others, should return a value. Since we mainly use the functional form, we must calculate the condition and then return the value of "then" or "else" depending on its value. This is very similar to the C "?:" Expression.
')
The semantics of the if/then/else expression is that it evaluates the condition as a logical value: 0.0 is considered false, and everything else is true. If the condition is true, the first subexpression is calculated and returned; if the condition is false, the second subexpression is calculated and returned. Since Kaleidoscope allows for side effects, it is important to fix this behavior.

Now that we know what we want, we will begin to sort things out in parts.

Refinement of the lexical analyzer to support If / Then / Else

Everything is simple here. First, add new values ​​for the corresponding tokens to the enumeration:

  //  tok_if = -6, tok_then = -7, tok_else = -8, 

Now we recognize new keywords in lexical analysis. It's easy too:

  ... if (IdentifierStr == "def") return tok_def; if (IdentifierStr == "extern") return tok_extern; 
  if (IdentifierStr == "if") return tok_if; if (IdentifierStr == "then") return tok_then; if (IdentifierStr == "else") return tok_else; 
  return tok_identifier; 


Refine AST to support If / Then / Else

To introduce a new kind of expression, add a new AST node:

 /// IfExprAST -     if/then/else. class IfExprAST : public ExprAST { ExprAST *Cond, *Then, *Else; public: IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) : Cond(cond), Then(then), Else(_else) {} virtual Value *Codegen(); }; 

The AST node has pointers to various subexpressions.

Refinement of the parser to support If / Then / Else

Now, when in lexical analysis we have the corresponding tokens and there is an AST node, the parsing logic will be very simple. First, let's define a new parsing function:

 /// ifexpr ::= 'if' expression 'then' expression 'else' expression static ExprAST *ParseIfExpr() { getNextToken(); //  if. // . ExprAST *Cond = ParseExpression(); if (!Cond) return 0; if (CurTok != tok_then) return Error("expected then"); getNextToken(); //  then ExprAST *Then = ParseExpression(); if (Then == 0) return 0; if (CurTok != tok_else) return Error("expected else"); getNextToken(); ExprAST *Else = ParseExpression(); if (!Else) return 0; return new IfExprAST(Cond, Then, Else); } 

Next, connect it as the primary expression:

 static ExprAST *ParsePrimary() { switch (CurTok) { default: return Error("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': return ParseParenExpr(); 
  case tok_if: return ParseIfExpr(); 
  } } 


LLVM IR for If / Then / Else

Now that we have parsing and building an AST, the final part will be adding support for LLVM code generation. This is the most interesting part in adding if/then/else , because here we will introduce new concepts. Everything in the above code was described in detail in previous chapters.

Let's look at a simple example:

 extern foo(); extern bar(); 
 def baz(x) if x then foo() else bar(); 

If you disable optimization, the code you receive (soon) from Kaleidoscope looks like this:

 declare double @foo() declare double @bar() define double @baz(double %x) { entry: %ifcond = fcmp one double %x, 0.000000e+00 br i1 %ifcond, label %then, label %else then: ; preds = %entry %calltmp = call double @foo() br label %ifcont else: ; preds = %entry %calltmp1 = call double @bar() br label %ifcont ifcont: ; preds = %else, %then %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] ret double %iftmp } 

To visualize the flow control graph, you can use the excellent feature of the LLVM " opt " tool. If you put this LLVM IR in "t.ll" and run "llvm-as < t.ll | opt -analyze -view-cfg" , a window will appear in which you will see this graph:

image

Another way to get it is to call "F->viewCFG()" or "F->viewCFGOnly()" (where F is "Function*" ) by actually including the call in the code and recompiling or calling in the debugger. LLVM has many useful features for visualizing various graphs.

Returning to the generated code, it is quite simple: the input block calculates a conditional expression (in our case "x" ) and compares the result with 0.0 using the instruction "fcmp one" "fcmp one" . Based on the result of this expression, the code goes into either the "then" block or the "else" block, which contain expressions for true / false cases.

As soon as the then/else blocks are completed, both of these branches go to the "ifcont" block to execute code that is located after if/then/else . In this case, the only thing left to do is to return to the calling function. Then the question arises: how do you know the code of the expression to return?

The answer to this question includes an important operation SSA: operation Phi Phi . If you are not familiar with SSA, then the Wikipedia article will be a good introduction, as well as there are other introductory articles about it that are easily found by your favorite search engine. The short version: the “execution” of the Phi operation requires the “memorization” of the block from which we came. The operation Phi takes values ​​and the corresponding input control units. In this case, if the control comes from the "then" block, it gets the value "calltmp" . If control comes from an "else" block, it gets the value "calltmp1" .

Now you are probably starting to think, “Oh, no! This means that my simple and elegant front end will have to start generating the SSA form in order to use LLVM! ” Fortunately, this is not the case, and we strongly advise NOT to implement the SSA algorithm in your front end, unless there is a truly important reason for this. In practice, there are two kinds of values ​​in code written for the average imperative programming language that might require Phi nodes:
  1. Code providing custom variables: = 1; = + 1; = 1; = + 1;
  2. Values ​​that are implied in the structure of your AST, such as the Phi node in this case.
In Chapter 7 of this tutorial (“mutable variables”), we will talk about # 1. In the meantime, just trust me that you do not need to build SSA to cope with this case. For # 2, you have a choice: use the methods we describe for # 1, or you can directly insert Phi nodes if it is convenient for you. In this case, it's really very easy to generate Phi nodes, so we decided to do just that.

Okay, time to generate the code!

If / Then / Else Code Generation

For code generation, we implement the IfExprAST method for IfExprAST :

 Value *IfExprAST::Codegen() { Value *CondV = Cond->Codegen(); if (CondV == 0) return 0; //       0.0. CondV = Builder.CreateFCmpONE(CondV, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "ifcond"); 

This code is simple and similar to what we saw before. We get an expression for the condition, and then compare this value with zero to get its truth as a 1-bit (bool) value.

  Function *TheFunction = Builder.GetInsertBlock()->getParent(); //     then  else.   'then'  //  . BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); Builder.CreateCondBr(CondV, ThenBB, ElseBB); 

This code creates base blocks that are related to if/then/else and correspond to the blocks in the example above. The first line gets the current Function object (the function being formed). She gets it by asking Builder about the current base unit and getting its “parent” (current function).

Then he creates three blocks. Notice that it passes "TheFunction" to the constructor of the "then" block. This causes the constructor to automatically insert a new block at the end of the specified function. Two other blocks are also created, but not yet inserted into the function.

After creating the blocks, we can generate a conditional transition that chooses between them. Note that the implicit creation of new blocks does not affect the IRBuilder in IRBuilder , so the insertion still occurs in the same block in which we are included. Also note that it creates a branch for the "then" block and the "else" block, although the "else" block has not yet been inserted into the function. But everything is in order: this is the standard way to support LLVM links "forward".

  //  . Builder.SetInsertPoint(ThenBB); Value *ThenV = Then->Codegen(); if (ThenV == 0) return 0; Builder.CreateBr(MergeBB); // Codegen of 'Then'    ,  ThenBB  PHI. ThenBB = Builder.GetInsertBlock(); 

After inserting a conditional transition, we tell Builder to begin inserting into the "then" block. Strictly speaking, this call moves the cursor to the end of the specified block. However, since the "then" block is empty, it also begins our insertion at the beginning of the block. :)

Once the cursor is set, we recursively call the code generation of the "then" expression from the AST. To complete the block "then" , we create an unconditional transition to the block merging branches. One of the interesting (and very important) aspects of LLVM IR is that it requires all the basic blocks to “complete” with a control flow statement , such as a return or branch. This means that the entire control flow, including the transfer of control to another code ( fall throughs ), must be explicitly present in the LLVM IR. If this rule is violated, the verifier will return an error.

The last line is important here. When creating the Phi node in the merge block of the branches, we must specify the block / value pairs that indicate how Phi will work. It is important to note that the Phi node expects to get an input point for each block predecessor in the CFG. Why, then, do we get the current block when we just set it in ThenBB five lines higher? The problem is that the expression "then" may in fact itself change the block that Builder generates, for example, if it contains nested expressions "if/then/else" . Since the recursive call of Codegen can arbitrarily change the current block, we need to get the current actual values ​​for the code to be created in the Phi node.

  //   else. TheFunction->getBasicBlockList().push_back(ElseBB); Builder.SetInsertPoint(ElseBB); Value *ElseV = Else->Codegen(); if (ElseV == 0) return 0; Builder.CreateBr(MergeBB); //  'Else'    ,  ElseBB  PHI. ElseBB = Builder.GetInsertBlock(); 

Code generation for the "else" block is almost identical to code generation for the "then" block. The only significant difference in the first line is that it adds an "else" block to the function. Recall that earlier the "else" block was created, but not added to the function. Now that the "then" and "else" blocks have been generated, we can end up with the merge code:

  //   . TheFunction->getBasicBlockList().push_back(MergeBB); Builder.SetInsertPoint(MergeBB); PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, "iftmp"); PN->addIncoming(ThenV, ThenBB); PN->addIncoming(ElseV, ElseBB); return PN; } 

The first two lines are already familiar to you: the first one adds the "merge" block for the function object. The second block moves the insertion point so that the newly created code will be inserted into the "merge" block. Then we need to create a PHI node and set the block / value pairs for the PHI .

Finally, the CodeGen function returns the Phi node as the value calculated for the if/then/else expression. In our example above, this return value will be passed to the top level function code, which will create a return instruction.

In general, we now have the ability to execute conditional code in Kaleidoscope. With this extension, Kaleidoscope is a fairly comprehensive programming language that can calculate a wide range of numeric functions. Now we will add another useful expression that is familiar to us from non-functional programming languages ​​...

'For' loop


Now that we know how to add basic control constructs to a language, we have a tool for adding more powerful things. Let's try to add something more aggressive - the expression "for" :

  extern putchard(char) 
  def printstar(n) for i = 1, i < n, 1.0 in putchard(42); # ascii 42 = '*' #  100  '*' printstar(100); 

This expression defines a new variable (in this case, "i" ), which is iterated from the initial value until the condition (in this case, "i < n" ) becomes true, increasing by the step value (in this case, "1.0" ). If the step value is omitted, then by default it is equal to 1.0 . The loop executes its body expression. Since we have nothing better to return, we will simply define the cycle as always returning 0.0 . Later, when we add mutable variables, it will be more useful.

As in the previous case, consider the changes that we need Kaleidoscope in to support the cycles, in parts.

Refinement of the lexical analyzer to support the 'for' cycle

The lexical analyzer is supplemented in the same way as in the case of if/then/else :

  ...   Token ... //  tok_if = -6, tok_then = -7, tok_else = -8, 
  tok_for = -9, tok_in = -10 
 ... in gettok ... if (IdentifierStr == "def") return tok_def; if (IdentifierStr == "extern") return tok_extern; if (IdentifierStr == "if") return tok_if; if (IdentifierStr == "then") return tok_then; if (IdentifierStr == "else") return tok_else; 
  if (IdentifierStr == "for") return tok_for; if (IdentifierStr == "in") return tok_in; 
  return tok_identifier; 


AST revision to support the 'for' loop

The AST node is also simple. It stores in its nodes the name of the variable and the expression of the loop body.

 /// ForExprAST -     for/in. class ForExprAST : public ExprAST { std::string VarName; ExprAST *Start, *End, *Step, *Body; public: ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, ExprAST *step, ExprAST *body) : VarName(varname), Start(start), End(end), Step(step), Body(body) {} virtual Value *Codegen(); }; 


Revision of the parser to support the 'for' loop

Changes in the parser code are also fairly standard. The only interesting point here is the handling of the optional step value. The parser processes it by checking if the second comma is present. If not, it sets the step value to null in the AST node:

 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression static ExprAST *ParseForExpr() { getNextToken(); //  for. if (CurTok != tok_identifier) return Error("expected identifier after for"); std::string IdName = IdentifierStr; getNextToken(); //  . if (CurTok != '=') return Error("expected '=' after for"); getNextToken(); //  '='. ExprAST *Start = ParseExpression(); if (Start == 0) return 0; if (CurTok != ',') return Error("expected ',' after for start value"); getNextToken(); ExprAST *End = ParseExpression(); if (End == 0) return 0; //   . ExprAST *Step = 0; if (CurTok == ',') { getNextToken(); Step = ParseExpression(); if (Step == 0) return 0; } if (CurTok != tok_in) return Error("expected 'in' after for"); getNextToken(); //  'in'. ExprAST *Body = ParseExpression(); if (Body == 0) return 0; return new ForExprAST(IdName, Start, End, Step, Body); } 


LLVM IR for the 'for' loop

Now we come to the interesting part: let's start the generation of LLVM IR. For our example above, we get the following LLVM IR (note that this dump is generated for clarity with optimization turned off):

 declare double @putchard(double) define double @printstar(double %n) { entry: ; initial value = 1.0 (inlined into phi) br label %loop loop: ; preds = %loop, %entry %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ] ;  %calltmp = call double @putchard(double 4.200000e+01) ;  %nextvar = fadd double %i, 1.000000e+00 ;  %cmptmp = fcmp ult double %i, %n %booltmp = uitofp i1 %cmptmp to double %loopcond = fcmp one double %booltmp, 0.000000e+00 br i1 %loopcond, label %loop, label %afterloop afterloop: ; preds = %loop ; loop always returns 0.0 ret double 0.000000e+00 } 

This loop contains all the same constructs that we saw before: the phi node, several expressions, and some basic blocks. Let's see how it fits together.

Code Generation for a 'for' Loop

The first part of Codegen very simple: we just get the expression for the initial value of the loop:

 Value *ForExprAST::Codegen() { // Emit the start code first, without 'variable' in scope. Value *StartVal = Start->Codegen(); if (StartVal == 0) return 0; 

The next step is to create the base block LLVM to start the loop body. In the above case, the whole body of the loop is one block, but remember that the body code itself may consist of several blocks (for example, if it contains if/then/else or for/in expressions).

  //       ,    //  . Function *TheFunction = Builder.GetInsertBlock()->getParent(); BasicBlock *PreheaderBB = Builder.GetInsertBlock(); BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); //        LoopBB. Builder.CreateBr(LoopBB); 

This code is similar to the one we saw for if/then/else . We remember the block from which we got into the loop, since we will need it to create the node Phi . Then we create the actual block that starts the loop, and create an unconditional transition between the two blocks.

  //    LoopBB. Builder.SetInsertPoint(LoopBB); //   PHI    Start. PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); Variable->addIncoming(StartVal, PreheaderBB); 

Now that the "preheader" header for the loop is set, we proceed to generate the loop body code. First, we move the cursor and create a PHI node for the loop variable. Since we already know the initial value, we add it to the Phi node. Please note that Phi will eventually get the second value, but we cannot yet install it (because it does not exist yet!).

  //       PHI.    //  ,     ,  . Value *OldVal = NamedValues[VarName]; NamedValues[VarName] = Variable; //   . ,      ,   //   .  ,    , //  ,     . if (Body->Codegen() == 0) return 0; 

Now the code becomes more interesting. Our "for" loop introduces a new variable into the symbol table. This means that our symbol table can contain either function arguments or loop variables. Before we generate the loop body code, we add the loop variable as the current value for its name. Please note - it is possible that there is already a variable with the same name in the outer scope. You can easily make this an error (return an error and return null if there is already an entry for VarName), but we will allow the overlapping of variables. For proper processing, we memorize the potentially overlapping value in OldVal (which will be null if there is no overlapping variable).

After setting the loop variable in the symbol table, code is recursively generated in the body. This allows the body to use a loop variable: any references to it are naturally found in the symbol table.

  //   . Value *StepVal; if (Step) { StepVal = Step->Codegen(); if (StepVal == 0) return 0; } else { //   ,  1.0. StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); } Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); 

When the body is generated, we calculate the next value of the iteration variable by adding the step value or 1.0, if it is not specified. "NextVar"will be the value of the loop variable in the next loop step.

  //      . Value *EndCond = End->Codegen(); if (EndCond == 0) return EndCond; //       0.0. EndCond = Builder.CreateFCmpONE(EndCond, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "loopcond"); 

Finally, we calculate the loop out value to determine if the loop should be completed. This is almost the same as the condition calculation in if/then/else.

  //   " "   . BasicBlock *LoopEndBB = Builder.GetInsertBlock(); BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); //      LoopEndBB. Builder.CreateCondBr(EndCond, LoopBB, AfterBB); //       AfterBB. Builder.SetInsertPoint(AfterBB); 

With the complete loop body code, we only need to complete the control flow for it. This code remembers the end of a block (for a node phi), and then creates a block to exit the loop ( "afterloop"). Based on the value of the exit condition, it creates a conditional transition that chooses between executing the loop again or exiting the loop. Any subsequent code is added to the block "afterloop", so we move the insertion point exactly there.

  //      PHI. Variable->addIncoming(NextVar, LoopEndBB); //   . if (OldVal) NamedValues[VarName] = OldVal; else NamedValues.erase(VarName); //  for   0.0. return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); } 

The final code does the cleanup: we now have a value "NextVar"and we can add the loop's PHI value to the node. After that, we delete the loop variable from the symbol table, so it is not in scope after the loop. Finally, the code generated for the loop always returns 0.0, so we return from ForExprAST::Codegen.

This concludes the tutorial chapter "Adding a control flow to Kaleidoscope". In this chapter, we looked at a couple of aspects of LLVM IR that are important for front-end developers to know. In the next chapter of our saga, we will be a little crazier and add user-defined operators for our poor innocent programming language.

Full code listing


Here is the full code listing, extended expressions if/then/elseand for... Going as follows:

#
g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
#
./toy

And the code itself:

 #include "llvm/DerivedTypes.h" #include "llvm/ExecutionEngine/ExecutionEngine.h" #include "llvm/ExecutionEngine/JIT.h" #include "llvm/LLVMContext.h" #include "llvm/Module.h" #include "llvm/PassManager.h" #include "llvm/Analysis/Verifier.h" #include "llvm/Analysis/Passes.h" #include "llvm/Target/TargetData.h" #include "llvm/Target/TargetSelect.h" #include "llvm/Transforms/Scalar.h" #include "llvm/Support/IRBuilder.h" #include <cstdio> #include <string> #include <map> #include <vector> using namespace llvm; //===----------------------------------------------------------------------===// // Lexer ( ) //===----------------------------------------------------------------------===// //     [0-255],   , //       enum Token { tok_eof = -1, //  ( ) tok_def = -2, tok_extern = -3, //  ( : , ) tok_identifier = -4, tok_number = -5, //   tok_if = -6, tok_then = -7, tok_else = -8, tok_for = -9, tok_in = -10 }; static std::string IdentifierStr; // ,  tok_identifier static double NumVal; // ,  tok_number /// gettok -       . static int gettok() { static int LastChar = ' '; //  . while (isspace(LastChar)) LastChar = getchar(); if (isdigit(LastChar) || LastChar == '.') { // : [0-9.]+ IdentifierStr = LastChar; while (isalnum((LastChar = getchar()))) IdentifierStr += LastChar; if (IdentifierStr == "def") return tok_def; if (IdentifierStr == "extern") return tok_extern; if (IdentifierStr == "if") return tok_if; if (IdentifierStr == "then") return tok_then; if (IdentifierStr == "else") return tok_else; if (IdentifierStr == "for") return tok_for; if (IdentifierStr == "in") return tok_in; return tok_identifier; } if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ std::string NumStr; do { NumStr += LastChar; LastChar = getchar(); } while (isdigit(LastChar) || LastChar == '.'); NumVal = strtod(NumStr.c_str(), 0); return tok_number; } if (LastChar == '#') { //     do LastChar = getchar(); while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); if (LastChar != EOF) return gettok(); } //   . if (LastChar == EOF) return tok_eof; //         ASCII int ThisChar = LastChar; LastChar = getchar(); return ThisChar; } //===----------------------------------------------------------------------===// // Abstract Syntax Tree (     ) //===----------------------------------------------------------------------===// /// ExprAST -      . class ExprAST { public: virtual ~ExprAST() {} virtual Value *Codegen() = 0; }; /// NumberExprAST -       (, "1.0"). class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double val) : Val(val) {} virtual Value *Codegen(); }; /// VariableExprAST -      (, "a"). class VariableExprAST : public ExprAST { std::string Name; public: VariableExprAST(const std::string &name) : Name(name) {} virtual Value *Codegen(); }; /// BinaryExprAST -      . class BinaryExprAST : public ExprAST { char Op; ExprAST *LHS, *RHS; public: BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) : Op(op), LHS(lhs), RHS(rhs) {} virtual Value *Codegen(); }; /// CallExprAST -      . class CallExprAST : public ExprAST { std::string Callee; std::vector<ExprAST*> Args; public: CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) : Callee(callee), Args(args) {} virtual Value *Codegen(); }; /// IfExprAST -     if/then/else. class IfExprAST : public ExprAST { ExprAST *Cond, *Then, *Else; public: IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) : Cond(cond), Then(then), Else(_else) {} virtual Value *Codegen(); }; /// ForExprAST -     for/in. class ForExprAST : public ExprAST { std::string VarName; ExprAST *Start, *End, *Step, *Body; public: ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, ExprAST *step, ExprAST *body) : VarName(varname), Start(start), End(end), Step(step), Body(body) {} virtual Value *Codegen(); }; /// PrototypeAST -    ""  , ///        (,  , ///    ). class PrototypeAST { std::string Name; std::vector<std::string> Args; public: PrototypeAST(const std::string &name, const std::vector<std::string> &args) : Name(name), Args(args) {} Function *Codegen(); }; /// FunctionAST -     class FunctionAST { PrototypeAST *Proto; ExprAST *Body; public: FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {} Function *Codegen(); }; //===----------------------------------------------------------------------===// // Parser (   ) //===----------------------------------------------------------------------===// /// CurTok/getNextToken -    . CurTok -   /// ,  . getNextToken     ///     CurTok. static int CurTok; static int getNextToken() { return CurTok = gettok(); } /// BinopPrecedence -      static std::map<char, int> BinopPrecedence; /// GetTokPrecedence -     . static int GetTokPrecedence() { if (!isascii(CurTok)) return -1; // ,     . int TokPrec = BinopPrecedence[CurTok]; if (TokPrec <= 0) return -1; return TokPrec; } /// Error* -       . ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } static ExprAST *ParseExpression(); /// identifierexpr /// ::= identifier /// ::= identifier '(' expression* ')' static ExprAST *ParseIdentifierExpr() { std::string IdName = IdentifierStr; getNextToken(); //  . if (CurTok != '(') //  . return new VariableExprAST(IdName); //  . getNextToken(); //  ( std::vector<ExprAST*> Args; if (CurTok != ')') { while (1) { ExprAST *Arg = ParseExpression(); if (!Arg) return 0; Args.push_back(Arg); if (CurTok == ')') break; if (CurTok != ',') return Error("Expected ')' or ',' in argument list"); getNextToken(); } } //  ')'. getNextToken(); return new CallExprAST(IdName, Args); } /// numberexpr ::= number static ExprAST *ParseNumberExpr() { ExprAST *Result = new NumberExprAST(NumVal); getNextToken(); //   return Result; } /// parenexpr ::= '(' expression ')' static ExprAST *ParseParenExpr() { getNextToken(); //  (. ExprAST *V = ParseExpression(); if (!V) return 0; if (CurTok != ')') return Error("expected ')'"); getNextToken(); //  ). return V; } /// ifexpr ::= 'if' expression 'then' expression 'else' expression static ExprAST *ParseIfExpr() { getNextToken(); //  if. // . ExprAST *Cond = ParseExpression(); if (!Cond) return 0; if (CurTok != tok_then) return Error("expected then"); getNextToken(); //  then ExprAST *Then = ParseExpression(); if (Then == 0) return 0; if (CurTok != tok_else) return Error("expected else"); getNextToken(); ExprAST *Else = ParseExpression(); if (!Else) return 0; return new IfExprAST(Cond, Then, Else); } /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression static ExprAST *ParseForExpr() { getNextToken(); //  for. if (CurTok != tok_identifier) return Error("expected identifier after for"); std::string IdName = IdentifierStr; getNextToken(); //  . if (CurTok != '=') return Error("expected '=' after for"); getNextToken(); //  '='. ExprAST *Start = ParseExpression(); if (Start == 0) return 0; if (CurTok != ',') return Error("expected ',' after for start value"); getNextToken(); ExprAST *End = ParseExpression(); if (End == 0) return 0; //   . ExprAST *Step = 0; if (CurTok == ',') { getNextToken(); Step = ParseExpression(); if (Step == 0) return 0; } if (CurTok != tok_in) return Error("expected 'in' after for"); getNextToken(); //  'in'. ExprAST *Body = ParseExpression(); if (Body == 0) return 0; return new ForExprAST(IdName, Start, End, Step, Body); } /// primary /// ::= identifierexpr /// ::= numberexpr /// ::= parenexpr /// ::= ifexpr /// ::= forexpr static ExprAST *ParsePrimary() { switch (CurTok) { default: return Error("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': return ParseParenExpr(); case tok_if: return ParseIfExpr(); case tok_for: return ParseForExpr(); } } /// binoprhs /// ::= ('+' primary)* static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { //    ,    while (1) { int TokPrec = GetTokPrecedence(); //           , //  ,    if (TokPrec < ExprPrec) return LHS; // ,  ,    . int BinOp = CurTok; getNextToken(); //    //       ExprAST *RHS = ParsePrimary(); if (!RHS) return 0; //  BinOp   RHS  ,    RHS, //      RHS  LHS. int NextPrec = GetTokPrecedence(); if (TokPrec < NextPrec) { RHS = ParseBinOpRHS(TokPrec+1, RHS); if (RHS == 0) return 0; } //  LHS/RHS. LHS = new BinaryExprAST(BinOp, LHS, RHS); } } /// expression /// ::= primary binoprhs /// static ExprAST *ParseExpression() { ExprAST *LHS = ParsePrimary(); if (!LHS) return 0; return ParseBinOpRHS(0, LHS); } /// prototype /// ::= id '(' id* ')' static PrototypeAST *ParsePrototype() { if (CurTok != tok_identifier) return ErrorP("Expected function name in prototype"); //    . std::string FnName = IdentifierStr; getNextToken(); if (CurTok != '(') return ErrorP("Expected '(' in prototype"); std::vector<std::string> ArgNames; while (getNextToken() == tok_identifier) ArgNames.push_back(IdentifierStr); if (CurTok != ')') return ErrorP("Expected ')' in prototype"); //  . getNextToken(); //  ')'. return new PrototypeAST(FnName, ArgNames); } /// definition ::= 'def' prototype expression static FunctionAST *ParseDefinition() { getNextToken(); //  def. PrototypeAST *Proto = ParsePrototype(); if (Proto == 0) return 0; if (ExprAST *E = ParseExpression()) return new FunctionAST(Proto, E); return 0; } /// toplevelexpr ::= expression static FunctionAST *ParseTopLevelExpr() { if (ExprAST *E = ParseExpression()) { //   . PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); return new FunctionAST(Proto, E); } return 0; } /// external ::= 'extern' prototype static PrototypeAST *ParseExtern() { getNextToken(); //  extern. return ParsePrototype(); } //===----------------------------------------------------------------------===// // Code Generation () //===----------------------------------------------------------------------===// static Module *TheModule; static IRBuilder<> Builder(getGlobalContext()); static std::map<std::string, Value*> NamedValues; static FunctionPassManager *TheFPM; Value *ErrorV(const char *Str) { Error(Str); return 0; } Value *NumberExprAST::Codegen() { return ConstantFP::get(getGlobalContext(), APFloat(Val)); } Value *VariableExprAST::Codegen() { //      . Value *V = NamedValues[Name]; return V ? V : ErrorV("Unknown variable name"); } Value *BinaryExprAST::Codegen() { Value *L = LHS->Codegen(); Value *R = RHS->Codegen(); if (L == 0 || R == 0) return 0; switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); //   0  1   0.0  1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), "booltmp"); default: return ErrorV("invalid binary operator"); } } Value *CallExprAST::Codegen() { //      . Function *CalleeF = TheModule->getFunction(Callee); if (CalleeF == 0) return ErrorV("Unknown function referenced"); //    . if (CalleeF->arg_size() != Args.size()) return ErrorV("Incorrect # arguments passed"); std::vector<Value*> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { ArgsV.push_back(Args[i]->Codegen()); if (ArgsV.back() == 0) return 0; } return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); } Value *IfExprAST::Codegen() { Value *CondV = Cond->Codegen(); if (CondV == 0) return 0; //       0.0. CondV = Builder.CreateFCmpONE(CondV, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "ifcond"); Function *TheFunction = Builder.GetInsertBlock()->getParent(); //     then  else.   'then'  //  . BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); Builder.CreateCondBr(CondV, ThenBB, ElseBB); //  . Builder.SetInsertPoint(ThenBB); Value *ThenV = Then->Codegen(); if (ThenV == 0) return 0; Builder.CreateBr(MergeBB); //  'Then'    ,  ThenBB  PHI. ThenBB = Builder.GetInsertBlock(); //   else. TheFunction->getBasicBlockList().push_back(ElseBB); Builder.SetInsertPoint(ElseBB); Value *ElseV = Else->Codegen(); if (ElseV == 0) return 0; Builder.CreateBr(MergeBB); //  'Else'    ,  ElseBB  PHI. ElseBB = Builder.GetInsertBlock(); //   . TheFunction->getBasicBlockList().push_back(MergeBB); Builder.SetInsertPoint(MergeBB); PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, "iftmp"); PN->addIncoming(ThenV, ThenBB); PN->addIncoming(ElseV, ElseBB); return PN; } Value *ForExprAST::Codegen() { // Output this as: // ... // start = startexpr // goto loop // loop: // variable = phi [start, loopheader], [nextvariable, loopend] // ... // bodyexpr // ... // loopend: // step = stepexpr // nextvariable = variable + step // endcond = endexpr // br endcond, loop, endloop // outloop: // Emit the start code first, without 'variable' in scope. Value *StartVal = Start->Codegen(); if (StartVal == 0) return 0; //       ,    //  . Function *TheFunction = Builder.GetInsertBlock()->getParent(); BasicBlock *PreheaderBB = Builder.GetInsertBlock(); BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); //         LoopBB. Builder.CreateBr(LoopBB); //    LoopBB. Builder.SetInsertPoint(LoopBB); //   PHI    Start. PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); Variable->addIncoming(StartVal, PreheaderBB); //       PHI.    //  ,     ,  . Value *OldVal = NamedValues[VarName]; NamedValues[VarName] = Variable; //   . ,      ,   //   .  ,    , //  ,     . if (Body->Codegen() == 0) return 0; //   . Value *StepVal; if (Step) { StepVal = Step->Codegen(); if (StepVal == 0) return 0; } else { //   ,  1.0. StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); } Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); //      . Value *EndCond = End->Codegen(); if (EndCond == 0) return EndCond; //       0.0. EndCond = Builder.CreateFCmpONE(EndCond, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "loopcond"); //   " "   . BasicBlock *LoopEndBB = Builder.GetInsertBlock(); BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); //      LoopEndBB. Builder.CreateCondBr(EndCond, LoopBB, AfterBB); //       AfterBB. Builder.SetInsertPoint(AfterBB); //      PHI. Variable->addIncoming(NextVar, LoopEndBB); //   . if (OldVal) NamedValues[VarName] = OldVal; else NamedValues.erase(VarName); //  for   0.0. return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); } Function *PrototypeAST::Codegen() { //   : double(double,double)  .. std::vector<const Type*> Doubles(Args.size(), Type::getDoubleTy(getGlobalContext())); FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false); Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); //   (F)     ,     'Name'. //     ,      . if (F->getName() != Name) { //   ,      . F->eraseFromParent(); F = TheModule->getFunction(Name); //   (F)   , . if (!F->empty()) { ErrorF("redefinition of function"); return 0; } //   (F)    , . if (F->arg_size() != Args.size()) { ErrorF("redefinition of function with different # args"); return 0; } } //     . unsigned Idx = 0; for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); ++AI, ++Idx) { AI->setName(Args[Idx]); //      . NamedValues[Args[Idx]] = AI; } return F; } Function *FunctionAST::Codegen() { NamedValues.clear(); Function *TheFunction = Proto->Codegen(); if (TheFunction == 0) return 0; //      . BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); Builder.SetInsertPoint(BB); if (Value *RetVal = Body->Codegen()) { //  . Builder.CreateRet(RetVal); //   ,    (). verifyFunction(*TheFunction); //  . TheFPM->run(*TheFunction); return TheFunction; } //  ,   . TheFunction->eraseFromParent(); return 0; } //===----------------------------------------------------------------------===// // Top-Level parsing (  )   JIT //===----------------------------------------------------------------------===// static ExecutionEngine *TheExecutionEngine; static void HandleDefinition() { if (FunctionAST *F = ParseDefinition()) { if (Function *LF = F->Codegen()) { fprintf(stderr, "Read function definition:"); LF->dump(); } } else { //      . getNextToken(); } } static void HandleExtern() { if (PrototypeAST *P = ParseExtern()) { if (Function *F = P->Codegen()) { fprintf(stderr, "Read extern: "); F->dump(); } } else { //      . getNextToken(); } } static void HandleTopLevelExpression() { //      . if (FunctionAST *F = ParseTopLevelExpr()) { if (Function *LF = F->Codegen()) { // JIT-,   . void *FPtr = TheExecutionEngine->getPointerToFunction(LF); //     ( ,  double), //       . double (*FP)() = (double (*)())(intptr_t)FPtr; fprintf(stderr, "Evaluated to %f\n", FP()); } } else { //      . getNextToken(); } } /// top ::= definition | external | expression | ';' static void MainLoop() { while (1) { fprintf(stderr, "ready> "); switch (CurTok) { case tok_eof: return; case ';': getNextToken(); break; //     . case tok_def: HandleDefinition(); break; case tok_extern: HandleExtern(); break; default: HandleTopLevelExpression(); break; } } } //===----------------------------------------------------------------------===// // "" ,     //   ("extern")   . //===----------------------------------------------------------------------===// /// putchard -        0. extern "C" double putchard(double X) { putchar((char)X); return 0; } //===----------------------------------------------------------------------===// // Main driver code (  ) //===----------------------------------------------------------------------===// int main() { InitializeNativeTarget(); LLVMContext &Context = getGlobalContext(); //    . // 1 -  . BinopPrecedence['<'] = 10; BinopPrecedence['+'] = 20; BinopPrecedence['-'] = 20; BinopPrecedence['*'] = 40; //  . // Prime the first token. fprintf(stderr, "ready> "); getNextToken(); //  ,     . TheModule = new Module("my cool jit", Context); //  JIT.     . std::string ErrStr; TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); if (!TheExecutionEngine) { fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); exit(1); } FunctionPassManager OurFPM(TheModule); //   .     ,  //       . OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); //   AliasAnalysis  GVN. OurFPM.add(createBasicAliasAnalysisPass()); //   "peephole"  "bit-twiddling". OurFPM.add(createInstructionCombiningPass()); //  . OurFPM.add(createReassociatePass()); //   . OurFPM.add(createGVNPass()); //     (    ..). OurFPM.add(createCFGSimplificationPass()); OurFPM.doInitialization(); //   ,     . TheFPM = &OurFPM; //    " ". MainLoop(); TheFPM = 0; //   . TheModule->dump(); return 0; } 

Source: https://habr.com/ru/post/120881/


All Articles