📜 ⬆️ ⬇️

Creating a programming language using LLVM. Part 9: Add Debugging Information

Table of contents:

Part 1: Introduction and Lexical Analysis
Part 2: Implementing Parser and AST
Part 3: LLVM IR Code Generation
Part 4: Adding JIT and Optimizer Support
Part 5: Language Expansion: Control Flow
Part 6: Language Extension: User Defined Operators
Part 7: Language Expansion: Variable Variables
Part 8: Compile to Object Code
Part 9: Add Debugging Information
Part 10: Conclusion and other goodies LLVM

9.1. Introduction


Welcome to chapter 9 of the guide “Creating a programming language using LLVM”. In chapters 1 through 8, we built a small programming language with functions and variables. What happens if something goes wrong, then how to debug the program?

Debugging at the source code level uses formatting data, which helps the debugger to translate information from the binary code back to the original, which the programmer wrote. LLVM typically uses the DWARF format. DWARF in a compact form encodes types, locations of the source code and locations of variables.

A brief summary of this chapter is that we will look at the various things you need to add to a programming language to support debugging information, and how to convert it to DWARF.
')
Warning: now we can not debug in JIT mode, we will need to compile our program. We will make a few changes regarding both the work of the language and the compilation. Now we will have a compile from the source file in the Kaleidoscope language, and not the JIT execution in an interactive mode. We introduce a restriction in that we will have only one team at the “top level” in order to reduce the number of changes needed.

An example of a program that we will compile:

def fib(x) if x < 3 then 1 else fib(x-1)+fib(x-2); fib(10) 

9.2. Why is this a difficult task?


Turning on debugging information is a difficult task for several reasons - mainly due to code optimization. First, when optimizing, it is more difficult to save information about code locations. In LLVM IR, we store the location in the source code for each IR-level instruction. Optimization passages should maintain locations for newly created commands, but the combination of instructions should save only one location, and this can cause jumps throughout the optimized program. Second, optimization can move variables in such a way that they can share memory with other variables and are difficult to track. For this tutorial, we will not use optimization (as you will see below).

9.3. AOT compile mode (Ahead-of-Time Compilation)


We will cover only the aspects of adding debug information to the source language, and we will not worry about the difficulties of debugging in JIT mode, and we will make several changes in the Kaleidoscope to support the compilation of IR generated by the frontend into an independent program that can be executed, debugged, and see the result.

First, let's make the anonymous function, which contains the top level of the code, called “main”:

 - auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>()); + auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>()); 

this small change gives the function a name.

Then we remove all the command line related code:

 @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() { /// top ::= definition | external | expression | ';' static void MainLoop() { while (1) { - fprintf(stderr, "ready> "); switch (CurTok) { case tok_eof: return; @@ -1184,7 +1183,6 @@ int main() { BinopPrecedence['*'] = 40; //   . //     - fprintf(stderr, "ready> "); getNextToken(); 

And finally, we turn off all optimization passes and JIT, so that only parsing and LLVM IR code generation remain:

 @@ -1108,17 +1108,8 @@ static void HandleExtern() { static void HandleTopLevelExpression() { //       if (auto FnAST = ParseTopLevelExpr()) { - if (auto *FnIR = FnAST->codegen()) { - // ,    - TheExecutionEngine->finalizeObject(); - //    JIT,     - void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR); - - //     (  ,  double),   - //     . - double (*FP)() = (double (*)())(intptr_t)FPtr; - //   . - (void)FP; + if (!F->codegen()) { + fprintf(stderr, "Error generating code for top level expr"); } } else { //       @@ -1439,11 +1459,11 @@ int main() { //    TheModule->setDataLayout(TheExecutionEngine->getDataLayout()); OurFPM.add(new DataLayoutPass()); +#if 0 OurFPM.add(createBasicAliasAnalysisPass()); //  alloca   OurFPM.add(createPromoteMemoryToRegisterPass()); @@ -1218,7 +1210,7 @@ int main() { OurFPM.add(createGVNPass()); //     (  , ). OurFPM.add(createCFGSimplificationPass()); - + #endif OurFPM.doInitialization(); 

This relatively small set of changes takes us to the point where we can compile a piece on the Kaleidoscope into an executable program using the command line:

 Kaleidoscope-Ch9 < fib.ks | & clang -x ir - 

and get a.out / a.exe in the current working directory.

9.4. Compilation unit


The top-level container for the code section in DWARF is the compile unit.
It contains data on types and functions for a separate broadcast module (that is, a single broadcast module file). And the first thing we need is to build such a unit for our fib.ks file.

9.5. Setting DWARF generation


Similar to the IRBuilder class, there is a DIBuilder class that helps to construct metadata for the LLVM IR file. It is 1: 1 compliant with IRBuilder and LLVM IR. Using it requires you to be more familiar with DWARF terminology than IRBuilder needed instruction names, but if you read the metadata format documentation, then everything will become clearer. We will use this class to construct all the descriptions of the IR level. Their construction occurs immediately after we have built the module object. All these descriptions are constructed as global static variables, because it is easier to use them this way.

Then we will create a small container to cache some commonly used objects. The first will be our compilation unit, we will also write the code for our only type, and we will not worry about expressions that have different types:

 static DIBuilder *DBuilder; struct DebugInfo { DICompileUnit *TheCU; DIType *DblTy; DIType *getDoubleTy(); } KSDbgInfo; DIType *DebugInfo::getDoubleTy() { if (DblTy) return DblTy; DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float); return DblTy; } 

And then, further in the main function, we create a module:

 DBuilder = new DIBuilder(*TheModule); KSDbgInfo.TheCU = DBuilder->createCompileUnit( dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0); 

Here you need to note a few things. First, although we create a compilation unit for the language namned by the Kaleidoscope, we use the C language constant. This is because the debugger does not have to understand the calling convention or ABI for languages ​​it does not recognize, and we follow C ABI in our generator code. In this case, we can be sure that we are calling the function from the debugger and it will execute. Secondly, you see “fib.ks” in the call to createCompileUnit. This is the default value, since we use shell redirection to get our source code into the Kaleidoscope compiler. In a normal frontend, you would have the name of the input file in this place.

And the last thing regarding generation of debug information through DIBuilder is that we need to “finalize” the debug information. The reason for this lies in the DIBuilder API, and we need to do this near the end of the main function:

 DBuilder->finalize(); 

before we dump the module.

9.6. Functions


Now we have our compilation unit and our source code locations, and we can add function definitions to debug information. In PrototypeAST :: codegen () we add a few lines of code to describe the context of our subroutine, in this case “File”, and the actual definition of the function itself.

So, the context:

 DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), KSDbgInfo.TheCU.getDirectory()); 

gives us a DIFile, and requesting the compilation unit we created above for the directory and file name where we are at the moment. Then, we use the source code locations for zero (since our AST does not yet have information about the code locations), and construct our function definition:

 DIScope *FContext = Unit; unsigned LineNo = 0; unsigned ScopeLine = 0; DISubprogram *SP = DBuilder->createFunction( FContext, P.getName(), StringRef(), Unit, LineNo, CreateFunctionType(TheFunction->arg_size(), Unit), false /* internal linkage */, true /* definition */, ScopeLine, DINode::FlagPrototyped, false); TheFunction->setSubprogram(SP); 

and we now have a DISubprogram containing a link to all our metadata for the function.

9.7. Source Locations


The most important thing for debugging information is the accuracy of the source code locations, it allows you to find a place in the source code in reverse. Now our problem is that the Kaleidoscope does not have any information about the source locations in the lexical analyzer and in the parser, and we have to add it.

 struct SourceLocation { int Line; int Col; }; static SourceLocation CurLoc; static SourceLocation LexLoc = {1, 0}; static int advance() { int LastChar = getchar(); if (LastChar == '\n' || LastChar == '\r') { LexLoc.Line++; LexLoc.Col = 0; } else LexLoc.Col++; return LastChar; } 

In this piece of code, we add functionality that keeps track of the row and column of the source file. As each tokenam is parsed, we set our “lexical location” into the corresponding row and column for the start of the token. We do this by replacing all getchar () calls with a new advance () function that tracks information and then we add the source text location to all our AST classes.

 class ExprAST { SourceLocation Loc; public: ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {} virtual ~ExprAST() {} virtual Value* codegen() = 0; int getLine() const { return Loc.Line; } int getCol() const { return Loc.Col; } virtual raw_ostream &dump(raw_ostream &out, int ind) { return out << ':' << getLine() << ':' << getCol() << '\n'; } 

and we transfer these locations when we create a new expression:

 LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS), std::move(RHS)); 

and locations are assigned to each expression and variable.

In order to make sure that each instruction received the correct location information, we need to tell the Builder object whether we are in the new source text location. For this we use a small helper function.

 void DebugInfo::emitLocation(ExprAST *AST) { DIScope *Scope; if (LexicalBlocks.empty()) Scope = TheCU; else Scope = LexicalBlocks.back(); Builder.SetCurrentDebugLocation( DebugLoc::get(AST->getLine(), AST->getCol(), Scope)); } 

This tells the IRBuilder object where we are, as well as what area of ​​visibility. The scope can be either a compilation unit level or the nearest surrounding lexical unit, for example, the current function. To present this, create a scopes stack:

 std::vector<DIScope *> LexicalBlocks; 

and put the scope (function) at the top of the stack when we start generating code for each function:

 KSDbgInfo.LexicalBlocks.push_back(SP); 

Also, we should not forget to remove the scope from the stack at the end of the code generation for the function:

 //        KSDbgInfo.LexicalBlocks.pop_back(); 

Then we make sure that we generated a location every time we start generating code for a new AST object:

 KSDbgInfo.emitLocation(this); 

9.8. Variables


We now have functions, and we need the ability to display variables in scope. Let's take a set of arguments to our function, and make it so that we can trace and see how the function was called. Here is a bit of code, and we perform all the actions when we create allocations of arguments in FunctionAST :: codegen.

 //     map NamedValues. NamedValues.clear(); unsigned ArgIdx = 0; for (auto &Arg : TheFunction->args()) { //  alloca   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName()); //      DILocalVariable *D = DBuilder->createParameterVariable( SP, Arg.getName(), ++ArgIdx, Unit, LineNo, KSDbgInfo.getDoubleTy(), true); DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(), DebugLoc::get(LineNo, 0, SP), Builder.GetInsertBlock()); //     alloca. Builder.CreateStore(&Arg, Alloca); //      NamedValues[Arg.getName()] = Alloca; } 

Here we create a variable for the first time, assign it a scope (SP), name, location of source code, type, and, if it is an argument, the index of the argument. Then we create a call to lvm.dbg.declare to show at the IR level that we have a variable in alloca (and set the starting location for the variable), and set the location of the source text to start the scope.

One interesting thing to note here is that various debuggers make various assumptions about how the code and debugging information was generated for them. In this case, we need to make some small hack in order not to generate information for the function prologue, so that the debugger will skip these instructions when setting a breakpoint. We add several lines to FunctionAST :: CodeGen:

 //      (   //     ,   //        KSDbgInfo.emitLocation(nullptr); 

and then generate a new location when we actually start generating code for the function body:

 KSDbgInfo.emitLocation(Body.get()); 

Now we have enough debug information to set a breakpoint in the function, display the function's argument variables and call the function. Not bad for just a few lines of code!

9.9. Full code listing


Below is a complete listing of the code for our working example, extended debug information. To compile an example, run:

 #  clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy #  ./toy 

Full code
 #include "llvm/ADT/STLExtras.h" #include "llvm/Analysis/BasicAliasAnalysis.h" #include "llvm/Analysis/Passes.h" #include "llvm/IR/DIBuilder.h" #include "llvm/IR/IRBuilder.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/LegacyPassManager.h" #include "llvm/IR/Module.h" #include "llvm/IR/Verifier.h" #include "llvm/Support/TargetSelect.h" #include "llvm/Transforms/Scalar.h" #include <cctype> #include <cstdio> #include <map> #include <string> #include <vector> #include "../include/KaleidoscopeJIT.h" using namespace llvm; using namespace llvm::orc; //===----------------------------------------------------------------------===// //   //===----------------------------------------------------------------------===// //     [0-255]    ,    //  enum Token { tok_eof = -1, //  tok_def = -2, tok_extern = -3, //   tok_identifier = -4, tok_number = -5, //  tok_if = -6, tok_then = -7, tok_else = -8, tok_for = -9, tok_in = -10, //  tok_binary = -11, tok_unary = -12, //   tok_var = -13 }; std::string getTokName(int Tok) { switch (Tok) { case tok_eof: return "eof"; case tok_def: return "def"; case tok_extern: return "extern"; case tok_identifier: return "identifier"; case tok_number: return "number"; case tok_if: return "if"; case tok_then: return "then"; case tok_else: return "else"; case tok_for: return "for"; case tok_in: return "in"; case tok_binary: return "binary"; case tok_unary: return "unary"; case tok_var: return "var"; } return std::string(1, (char)Tok); } namespace { class PrototypeAST; class ExprAST; } static LLVMContext TheContext; static IRBuilder<> Builder(TheContext); struct DebugInfo { DICompileUnit *TheCU; DIType *DblTy; std::vector<DIScope *> LexicalBlocks; void emitLocation(ExprAST *AST); DIType *getDoubleTy(); } KSDbgInfo; struct SourceLocation { int Line; int Col; }; static SourceLocation CurLoc; static SourceLocation LexLoc = {1, 0}; static int advance() { int LastChar = getchar(); if (LastChar == '\n' || LastChar == '\r') { LexLoc.Line++; LexLoc.Col = 0; } else LexLoc.Col++; return LastChar; } static std::string IdentifierStr; //   tok_identifier static double NumVal; //   tok_number /// gettok -       static int gettok() { static int LastChar = ' '; //   while (isspace(LastChar)) LastChar = advance(); CurLoc = LexLoc; if (isalpha(LastChar)) { // : [a-zA-Z][a-zA-Z0-9]* IdentifierStr = LastChar; while (isalnum((LastChar = advance()))) IdentifierStr += LastChar; if (IdentifierStr == "def") return tok_def; if (IdentifierStr == "extern") return tok_extern; if (IdentifierStr == "if") return tok_if; if (IdentifierStr == "then") return tok_then; if (IdentifierStr == "else") return tok_else; if (IdentifierStr == "for") return tok_for; if (IdentifierStr == "in") return tok_in; if (IdentifierStr == "binary") return tok_binary; if (IdentifierStr == "unary") return tok_unary; if (IdentifierStr == "var") return tok_var; return tok_identifier; } if (isdigit(LastChar) || LastChar == '.') { // : [0-9.]+ std::string NumStr; do { NumStr += LastChar; LastChar = advance(); } while (isdigit(LastChar) || LastChar == '.'); NumVal = strtod(NumStr.c_str(), nullptr); return tok_number; } if (LastChar == '#') { //     do LastChar = advance(); while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); if (LastChar != EOF) return gettok(); } //   .   EOF. if (LastChar == EOF) return tok_eof; // ,     ascii-. int ThisChar = LastChar; LastChar = advance(); return ThisChar; } //===----------------------------------------------------------------------===// //    ( ) //===----------------------------------------------------------------------===// namespace { raw_ostream &indent(raw_ostream &O, int size) { return O << std::string(size, ' '); } /// ExprAST -    . class ExprAST { SourceLocation Loc; public: ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {} virtual ~ExprAST() {} virtual Value *codegen() = 0; int getLine() const { return Loc.Line; } int getCol() const { return Loc.Col; } virtual raw_ostream &dump(raw_ostream &out, int ind) { return out << ':' << getLine() << ':' << getCol() << '\n'; } }; /// NumberExprAST -      "1.0". class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double Val) : Val(Val) {} raw_ostream &dump(raw_ostream &out, int ind) override { return ExprAST::dump(out << Val, ind); } Value *codegen() override; }; /// VariableExprAST -    , , "a". class VariableExprAST : public ExprAST { std::string Name; public: VariableExprAST(SourceLocation Loc, const std::string &Name) : ExprAST(Loc), Name(Name) {} const std::string &getName() const { return Name; } Value *codegen() override; raw_ostream &dump(raw_ostream &out, int ind) override { return ExprAST::dump(out << Name, ind); } }; /// UnaryExprAST -      class UnaryExprAST : public ExprAST { char Opcode; std::unique_ptr<ExprAST> Operand; public: UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand) : Opcode(Opcode), Operand(std::move(Operand)) {} Value *codegen() override; raw_ostream &dump(raw_ostream &out, int ind) override { ExprAST::dump(out << "unary" << Opcode, ind); Operand->dump(out, ind + 1); return out; } }; /// BinaryExprAST -      class BinaryExprAST : public ExprAST { char Op; std::unique_ptr<ExprAST> LHS, RHS; public: BinaryExprAST(SourceLocation Loc, char Op, std::unique_ptr<ExprAST> LHS, std::unique_ptr<ExprAST> RHS) : ExprAST(Loc), Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {} Value *codegen() override; raw_ostream &dump(raw_ostream &out, int ind) override { ExprAST::dump(out << "binary" << Op, ind); LHS->dump(indent(out, ind) << "LHS:", ind + 1); RHS->dump(indent(out, ind) << "RHS:", ind + 1); return out; } }; /// CallExprAST -      class CallExprAST : public ExprAST { std::string Callee; std::vector<std::unique_ptr<ExprAST>> Args; public: CallExprAST(SourceLocation Loc, const std::string &Callee, std::vector<std::unique_ptr<ExprAST>> Args) : ExprAST(Loc), Callee(Callee), Args(std::move(Args)) {} Value *codegen() override; raw_ostream &dump(raw_ostream &out, int ind) override { ExprAST::dump(out << "call " << Callee, ind); for (const auto &Arg : Args) Arg->dump(indent(out, ind + 1), ind + 1); return out; } }; /// IfExprAST -    if/then/else. class IfExprAST : public ExprAST { std::unique_ptr<ExprAST> Cond, Then, Else; public: IfExprAST(SourceLocation Loc, std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then, std::unique_ptr<ExprAST> Else) : ExprAST(Loc), Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {} Value *codegen() override; raw_ostream &dump(raw_ostream &out, int ind) override { ExprAST::dump(out << "if", ind); Cond->dump(indent(out, ind) << "Cond:", ind + 1); Then->dump(indent(out, ind) << "Then:", ind + 1); Else->dump(indent(out, ind) << "Else:", ind + 1); return out; } }; /// ForExprAST -    for/in. class ForExprAST : public ExprAST { std::string VarName; std::unique_ptr<ExprAST> Start, End, Step, Body; public: ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start, std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step, std::unique_ptr<ExprAST> Body) : VarName(VarName), Start(std::move(Start)), End(std::move(End)), Step(std::move(Step)), Body(std::move(Body)) {} Value *codegen() override; raw_ostream &dump(raw_ostream &out, int ind) override { ExprAST::dump(out << "for", ind); Start->dump(indent(out, ind) << "Cond:", ind + 1); End->dump(indent(out, ind) << "End:", ind + 1); Step->dump(indent(out, ind) << "Step:", ind + 1); Body->dump(indent(out, ind) << "Body:", ind + 1); return out; } }; /// VarExprAST -    var/in class VarExprAST : public ExprAST { std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames; std::unique_ptr<ExprAST> Body; public: VarExprAST( std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames, std::unique_ptr<ExprAST> Body) : VarNames(std::move(VarNames)), Body(std::move(Body)) {} Value *codegen() override; raw_ostream &dump(raw_ostream &out, int ind) override { ExprAST::dump(out << "var", ind); for (const auto &NamedVar : VarNames) NamedVar.second->dump(indent(out, ind) << NamedVar.first << ':', ind + 1); Body->dump(indent(out, ind) << "Body:", ind + 1); return out; } }; /// PrototypeAST -    "" , ///    ,    (, ,  /// ,  ),   . class PrototypeAST { std::string Name; std::vector<std::string> Args; bool IsOperator; unsigned Precedence; //     int Line; public: PrototypeAST(SourceLocation Loc, const std::string &Name, std::vector<std::string> Args, bool IsOperator = false, unsigned Prec = 0) : Name(Name), Args(std::move(Args)), IsOperator(IsOperator), Precedence(Prec), Line(Loc.Line) {} Function *codegen(); const std::string &getName() const { return Name; } bool isUnaryOp() const { return IsOperator && Args.size() == 1; } bool isBinaryOp() const { return IsOperator && Args.size() == 2; } char getOperatorName() const { assert(isUnaryOp() || isBinaryOp()); return Name[Name.size() - 1]; } unsigned getBinaryPrecedence() const { return Precedence; } int getLine() const { return Line; } }; /// FunctionAST -      class FunctionAST { std::unique_ptr<PrototypeAST> Proto; std::unique_ptr<ExprAST> Body; public: FunctionAST(std::unique_ptr<PrototypeAST> Proto, std::unique_ptr<ExprAST> Body) : Proto(std::move(Proto)), Body(std::move(Body)) {} Function *codegen(); raw_ostream &dump(raw_ostream &out, int ind) { indent(out, ind) << "FunctionAST\n"; ++ind; indent(out, ind) << "Body:"; return Body ? Body->dump(out, ind) : out << "null\n"; } }; } //     //===----------------------------------------------------------------------===// //  //===----------------------------------------------------------------------===// /// CurTok/getNextToken -    . CurTok -  /// ,    . getNextToken     ///     CurTok  . static int CurTok; static int getNextToken() { return CurTok = gettok(); } /// BinopPrecedence -     , ///   static std::map<char, int> BinopPrecedence; /// GetTokPrecedence -       . static int GetTokPrecedence() { if (!isascii(CurTok)) return -1; // ,     int TokPrec = BinopPrecedence[CurTok]; if (TokPrec <= 0) return -1; return TokPrec; } /// LogError* -     . std::unique_ptr<ExprAST> LogError(const char *Str) { fprintf(stderr, "Error: %s\n", Str); return nullptr; } std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) { LogError(Str); return nullptr; } static std::unique_ptr<ExprAST> ParseExpression(); /// numberexpr ::= number static std::unique_ptr<ExprAST> ParseNumberExpr() { auto Result = llvm::make_unique<NumberExprAST>(NumVal); getNextToken(); //   return std::move(Result); } /// parenexpr ::= '(' expression ')' static std::unique_ptr<ExprAST> ParseParenExpr() { getNextToken(); //  (. auto V = ParseExpression(); if (!V) return nullptr; if (CurTok != ')') return LogError("expected ')'"); getNextToken(); //  ). return V; } /// identifierexpr /// ::= identifier /// ::= identifier '(' expression* ')' static std::unique_ptr<ExprAST> ParseIdentifierExpr() { std::string IdName = IdentifierStr; SourceLocation LitLoc = CurLoc; getNextToken(); //  . if (CurTok != '(') //    . return llvm::make_unique<VariableExprAST>(LitLoc, IdName); // Call. getNextToken(); //  ( std::vector<std::unique_ptr<ExprAST>> Args; if (CurTok != ')') { while (1) { if (auto Arg = ParseExpression()) Args.push_back(std::move(Arg)); else return nullptr; if (CurTok == ')') break; if (CurTok != ',') return LogError("Expected ')' or ',' in argument list"); getNextToken(); } } //  ')'. getNextToken(); return llvm::make_unique<CallExprAST>(LitLoc, IdName, std::move(Args)); } /// ifexpr ::= 'if' expression 'then' expression 'else' expression static std::unique_ptr<ExprAST> ParseIfExpr() { SourceLocation IfLoc = CurLoc; getNextToken(); //  if. // . auto Cond = ParseExpression(); if (!Cond) return nullptr; if (CurTok != tok_then) return LogError("expected then"); getNextToken(); //  then auto Then = ParseExpression(); if (!Then) return nullptr; if (CurTok != tok_else) return LogError("expected else"); getNextToken(); auto Else = ParseExpression(); if (!Else) return nullptr; return llvm::make_unique<IfExprAST>(IfLoc, std::move(Cond), std::move(Then), std::move(Else)); } /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression static std::unique_ptr<ExprAST> ParseForExpr() { getNextToken(); //  for. if (CurTok != tok_identifier) return LogError("expected identifier after for"); std::string IdName = IdentifierStr; getNextToken(); //  . if (CurTok != '=') return LogError("expected '=' after for"); getNextToken(); //  '='. auto Start = ParseExpression(); if (!Start) return nullptr; if (CurTok != ',') return LogError("expected ',' after for start value"); getNextToken(); auto End = ParseExpression(); if (!End) return nullptr; //   . std::unique_ptr<ExprAST> Step; if (CurTok == ',') { getNextToken(); Step = ParseExpression(); if (!Step) return nullptr; } if (CurTok != tok_in) return LogError("expected 'in' after for"); getNextToken(); //  'in'. auto Body = ParseExpression(); if (!Body) return nullptr; return llvm::make_unique<ForExprAST>(IdName, std::move(Start), std::move(End), std::move(Step), std::move(Body)); } /// varexpr ::= 'var' identifier ('=' expression)? // (',' identifier ('=' expression)?)* 'in' expression static std::unique_ptr<ExprAST> ParseVarExpr() { getNextToken(); //  var. std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames; //      if (CurTok != tok_identifier) return LogError("expected identifier after var"); while (1) { std::string Name = IdentifierStr; getNextToken(); //  .. //    std::unique_ptr<ExprAST> Init = nullptr; if (CurTok == '=') { getNextToken(); //  '='. Init = ParseExpression(); if (!Init) return nullptr; } VarNames.push_back(std::make_pair(Name, std::move(Init))); //   ,   . if (CurTok != ',') break; getNextToken(); //  ','. if (CurTok != tok_identifier) return LogError("expected identifier list after var"); } //        'in'. if (CurTok != tok_in) return LogError("expected 'in' keyword after 'var'"); getNextToken(); //  'in'. auto Body = ParseExpression(); if (!Body) return nullptr; return llvm::make_unique<VarExprAST>(std::move(VarNames), std::move(Body)); } /// primary /// ::= identifierexpr /// ::= numberexpr /// ::= parenexpr /// ::= ifexpr /// ::= forexpr /// ::= varexpr static std::unique_ptr<ExprAST> ParsePrimary() { switch (CurTok) { default: return LogError("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': return ParseParenExpr(); case tok_if: return ParseIfExpr(); case tok_for: return ParseForExpr(); case tok_var: return ParseVarExpr(); } } /// unary /// ::= primary /// ::= '!' unary static std::unique_ptr<ExprAST> ParseUnary() { //      ,     . if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') return ParsePrimary(); //    ,   int Opc = CurTok; getNextToken(); if (auto Operand = ParseUnary()) return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand)); return nullptr; } /// binoprhs /// ::= ('+' unary)* static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, std::unique_ptr<ExprAST> LHS) { //    ,    while (1) { int TokPrec = GetTokPrecedence(); //    ,    //  ,   if (TokPrec < ExprPrec) return LHS; //   ,     int BinOp = CurTok; SourceLocation BinLoc = CurLoc; getNextToken(); //    //       auto RHS = ParseUnary(); if (!RHS) return nullptr; //  BinOp    RHS,    RHS,  //    RHS   LHS. int NextPrec = GetTokPrecedence(); if (TokPrec < NextPrec) { RHS = ParseBinOpRHS(TokPrec + 1, std::move(RHS)); if (!RHS) return nullptr; } //  LHS/RHS. LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS), std::move(RHS)); } } /// expression /// ::= unary binoprhs /// static std::unique_ptr<ExprAST> ParseExpression() { auto LHS = ParseUnary(); if (!LHS) return nullptr; return ParseBinOpRHS(0, std::move(LHS)); } /// prototype /// ::= id '(' id* ')' /// ::= binary LETTER number? (id, id) /// ::= unary LETTER (id) static std::unique_ptr<PrototypeAST> ParsePrototype() { std::string FnName; SourceLocation FnLoc = CurLoc; unsigned Kind = 0; // 0 = , 1 = , 2 = . unsigned BinaryPrecedence = 30; switch (CurTok) { default: return LogErrorP("Expected function name in prototype"); case tok_identifier: FnName = IdentifierStr; Kind = 0; getNextToken(); break; case tok_unary: getNextToken(); if (!isascii(CurTok)) return LogErrorP("Expected unary operator"); FnName = "unary"; FnName += (char)CurTok; Kind = 1; getNextToken(); break; case tok_binary: getNextToken(); if (!isascii(CurTok)) return LogErrorP("Expected binary operator"); FnName = "binary"; FnName += (char)CurTok; Kind = 2; getNextToken(); //  ,    if (CurTok == tok_number) { if (NumVal < 1 || NumVal > 100) return LogErrorP("Invalid precedence: must be 1..100"); BinaryPrecedence = (unsigned)NumVal; getNextToken(); } break; } if (CurTok != '(') return LogErrorP("Expected '(' in prototype"); std::vector<std::string> ArgNames; while (getNextToken() == tok_identifier) ArgNames.push_back(IdentifierStr); if (CurTok != ')') return LogErrorP("Expected ')' in prototype"); // . getNextToken(); //  ')'. // ,     if (Kind && ArgNames.size() != Kind) return LogErrorP("Invalid number of operands for operator"); return llvm::make_unique<PrototypeAST>(FnLoc, FnName, ArgNames, Kind != 0, BinaryPrecedence); } /// definition ::= 'def' prototype expression static std::unique_ptr<FunctionAST> ParseDefinition() { getNextToken(); //  def. auto Proto = ParsePrototype(); if (!Proto) return nullptr; if (auto E = ParseExpression()) return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); return nullptr; } /// toplevelexpr ::= expression static std::unique_ptr<FunctionAST> ParseTopLevelExpr() { SourceLocation FnLoc = CurLoc; if (auto E = ParseExpression()) { //    auto Proto = llvm::make_unique<PrototypeAST>(FnLoc, "__anon_expr", std::vector<std::string>()); return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); } return nullptr; } /// external ::= 'extern' prototype static std::unique_ptr<PrototypeAST> ParseExtern() { getNextToken(); // eat extern. return ParsePrototype(); } //===----------------------------------------------------------------------===// //    //===----------------------------------------------------------------------===// static std::unique_ptr<DIBuilder> DBuilder; DIType *DebugInfo::getDoubleTy() { if (DblTy) return DblTy; DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float); return DblTy; } void DebugInfo::emitLocation(ExprAST *AST) { if (!AST) return Builder.SetCurrentDebugLocation(DebugLoc()); DIScope *Scope; if (LexicalBlocks.empty()) Scope = TheCU; else Scope = LexicalBlocks.back(); Builder.SetCurrentDebugLocation( DebugLoc::get(AST->getLine(), AST->getCol(), Scope)); } static DISubroutineType *CreateFunctionType(unsigned NumArgs, DIFile *Unit) { SmallVector<Metadata *, 8> EltTys; DIType *DblTy = KSDbgInfo.getDoubleTy(); //    EltTys.push_back(DblTy); for (unsigned i = 0, e = NumArgs; i != e; ++i) EltTys.push_back(DblTy); return DBuilder->createSubroutineType(DBuilder->getOrCreateTypeArray(EltTys)); } //===----------------------------------------------------------------------===// //   //===----------------------------------------------------------------------===// static std::unique_ptr<Module> TheModule; static std::map<std::string, AllocaInst *> NamedValues; static std::unique_ptr<KaleidoscopeJIT> TheJIT; static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos; Value *LogErrorV(const char *Str) { LogError(Str); return nullptr; } Function *getFunction(std::string Name) { //  ,       . if (auto *F = TheModule->getFunction(Name)) return F; //  ,         // . auto FI = FunctionProtos.find(Name); if (FI != FunctionProtos.end()) return FI->second->codegen(); //    ,  null. return nullptr; } /// CreateEntryBlockAlloca -   alloca    /// .     . static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, const std::string &VarName) { IRBuilder<> TmpB(&TheFunction->getEntryBlock(), TheFunction->getEntryBlock().begin()); return TmpB.CreateAlloca(Type::getDoubleTy(TheContext), nullptr, VarName.c_str()); } Value *NumberExprAST::codegen() { KSDbgInfo.emitLocation(this); return ConstantFP::get(TheContext, APFloat(Val)); } Value *VariableExprAST::codegen() { // ,       Value *V = NamedValues[Name]; if (!V) return LogErrorV("Unknown variable name"); KSDbgInfo.emitLocation(this); //   return Builder.CreateLoad(V, Name.c_str()); } Value *UnaryExprAST::codegen() { Value *OperandV = Operand->codegen(); if (!OperandV) return nullptr; Function *F = getFunction(std::string("unary") + Opcode); if (!F) return LogErrorV("Unknown unary operator"); KSDbgInfo.emitLocation(this); return Builder.CreateCall(F, OperandV, "unop"); } Value *BinaryExprAST::codegen() { KSDbgInfo.emitLocation(this); //    '=', ..     LHS   if (Op == '=') { //  ,  LHS   // ,     RTTI, .. LLVM   //  .    LLVM  RTTI     // dynamic_cast    . VariableExprAST *LHSE = static_cast<VariableExprAST *>(LHS.get()); if (!LHSE) return LogErrorV("destination of '=' must be a variable"); //   RHS. Value *Val = RHS->codegen(); if (!Val) return nullptr; //   Value *Variable = NamedValues[LHSE->getName()]; if (!Variable) return LogErrorV("Unknown variable name"); Builder.CreateStore(Val, Variable); return Val; } Value *L = LHS->codegen(); Value *R = RHS->codegen(); if (!L || !R) return nullptr; switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); //  bool 0/1  double 0.0 or 1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext), "booltmp"); default: break; } //      ,     .  //    . Function *F = getFunction(std::string("binary") + Op); assert(F && "binary operator not found!"); Value *Ops[] = {L, R}; return Builder.CreateCall(F, Ops, "binop"); } Value *CallExprAST::codegen() { KSDbgInfo.emitLocation(this); //       Function *CalleeF = getFunction(Callee); if (!CalleeF) return LogErrorV("Unknown function referenced"); // ,    . if (CalleeF->arg_size() != Args.size()) return LogErrorV("Incorrect # arguments passed"); std::vector<Value *> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { ArgsV.push_back(Args[i]->codegen()); if (!ArgsV.back()) return nullptr; } return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); } Value *IfExprAST::codegen() { KSDbgInfo.emitLocation(this); Value *CondV = Cond->codegen(); if (!CondV) return nullptr; //          0.0. CondV = Builder.CreateFCmpONE( CondV, ConstantFP::get(TheContext, APFloat(0.0)), "ifcond"); Function *TheFunction = Builder.GetInsertBlock()->getParent(); //    then  else.   'then'  //   BasicBlock *ThenBB = BasicBlock::Create(TheContext, "then", TheFunction); BasicBlock *ElseBB = BasicBlock::Create(TheContext, "else"); BasicBlock *MergeBB = BasicBlock::Create(TheContext, "ifcont"); Builder.CreateCondBr(CondV, ThenBB, ElseBB); //  . Builder.SetInsertPoint(ThenBB); Value *ThenV = Then->codegen(); if (!ThenV) return nullptr; Builder.CreateBr(MergeBB); //    'Then'    ,  ThenBB  PHI. ThenBB = Builder.GetInsertBlock(); //   "else" TheFunction->getBasicBlockList().push_back(ElseBB); Builder.SetInsertPoint(ElseBB); Value *ElseV = Else->codegen(); if (!ElseV) return nullptr; Builder.CreateBr(MergeBB); //    'Else'    ,  ElseBB  PHI. ElseBB = Builder.GetInsertBlock(); //    TheFunction->getBasicBlockList().push_back(MergeBB); Builder.SetInsertPoint(MergeBB); PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(TheContext), 2, "iftmp"); PN->addIncoming(ThenV, ThenBB); PN->addIncoming(ElseV, ElseBB); return PN; } //  for-loop : // var = alloca double // ... // start = startexpr // store start -> var // goto loop // loop: // ... // bodyexpr // ... // loopend: // step = stepexpr // endcond = endexpr // // curvar = load var // nextvar = curvar + step // store nextvar -> var // br endcond, loop, endloop // outloop: Value *ForExprAST::codegen() { Function *TheFunction = Builder.GetInsertBlock()->getParent(); //   alloca      AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); KSDbgInfo.emitLocation(this); //    ,  'variable'    Value *StartVal = Start->codegen(); if (!StartVal) return nullptr; //    alloca. Builder.CreateStore(StartVal, Alloca); //       ,     // . BasicBlock *LoopBB = BasicBlock::Create(TheContext, "loop", TheFunction); //        LoopBB. Builder.CreateBr(LoopBB); //    LoopBB. Builder.SetInsertPoint(LoopBB); //  ,    PHI-.   //   ,    ,   . AllocaInst *OldVal = NamedValues[VarName]; NamedValues[VarName] = Alloca; //   . ,    ,   //  BB. ,    ,   ,   //  . if (!Body->codegen()) return nullptr; //    Value *StepVal = nullptr; if (Step) { StepVal = Step->codegen(); if (!StepVal) return nullptr; } else { //   ,  1.0. StepVal = ConstantFP::get(TheContext, APFloat(1.0)); } //    Value *EndCond = End->codegen(); if (!EndCond) return nullptr; // , ,   alloca.   //        Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str()); Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); Builder.CreateStore(NextVar, Alloca); //          0.0. EndCond = Builder.CreateFCmpONE( EndCond, ConstantFP::get(TheContext, APFloat(0.0)), "loopcond"); //   " "    BasicBlock *AfterBB = BasicBlock::Create(TheContext, "afterloop", TheFunction); //      LoopEndBB. Builder.CreateCondBr(EndCond, LoopBB, AfterBB); //       AfterBB. Builder.SetInsertPoint(AfterBB); //    . if (OldVal) NamedValues[VarName] = OldVal; else NamedValues.erase(VarName); //    0.0. return Constant::getNullValue(Type::getDoubleTy(TheContext)); } Value *VarExprAST::codegen() { std::vector<AllocaInst *> OldBindings; Function *TheFunction = Builder.GetInsertBlock()->getParent(); //       for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { const std::string &VarName = VarNames[i].first; ExprAST *Init = VarNames[i].second.get(); //     ,   //      ,   // l  : // var a = 1 in // var a = a in ... #    'a'. Value *InitVal; if (Init) { InitVal = Init->codegen(); if (!InitVal) return nullptr; } else { //    ,  0.0. InitVal = ConstantFP::get(TheContext, APFloat(0.0)); } AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); Builder.CreateStore(InitVal, Alloca); //    OldBindings.push_back(NamedValues[VarName]); //   NamedValues[VarName] = Alloca; } KSDbgInfo.emitLocation(this); //       Value *BodyVal = Body->codegen(); if (!BodyVal) return nullptr; //    for (unsigned i = 0, e = VarNames.size(); i != e; ++i) NamedValues[VarNames[i].first] = OldBindings[i]; //     return BodyVal; } Function *PrototypeAST::codegen() { //   : double(double,double) etc. std::vector<Type *> Doubles(Args.size(), Type::getDoubleTy(TheContext)); FunctionType *FT = FunctionType::get(Type::getDoubleTy(TheContext), Doubles, false); Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule.get()); //    unsigned Idx = 0; for (auto &Arg : F->args()) Arg.setName(Args[Idx++]); return F; } Function *FunctionAST::codegen() { //      FunctionProtos,   //    . auto &P = *Proto; FunctionProtos[Proto->getName()] = std::move(Proto); Function *TheFunction = getFunction(P.getName()); if (!TheFunction) return nullptr; //   ,   if (P.isBinaryOp()) BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence(); //        BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction); Builder.SetInsertPoint(BB); //   DIE   DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(), KSDbgInfo.TheCU->getDirectory()); DIScope *FContext = Unit; unsigned LineNo = P.getLine(); unsigned ScopeLine = LineNo; DISubprogram *SP = DBuilder->createFunction( FContext, P.getName(), StringRef(), Unit, LineNo, CreateFunctionType(TheFunction->arg_size(), Unit), false /* internal linkage */, true /* definition */, ScopeLine, DINode::FlagPrototyped, false); TheFunction->setSubprogram(SP); //     KSDbgInfo.LexicalBlocks.push_back(SP); //       (    //        //       ) KSDbgInfo.emitLocation(nullptr); //     NamedValues. NamedValues.clear(); unsigned ArgIdx = 0; for (auto &Arg : TheFunction->args()) { //  alloca  . AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName()); //      DILocalVariable *D = DBuilder->createParameterVariable( SP, Arg.getName(), ++ArgIdx, Unit, LineNo, KSDbgInfo.getDoubleTy(), true); DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(), DebugLoc::get(LineNo, 0, SP), Builder.GetInsertBlock()); //     alloca. Builder.CreateStore(&Arg, Alloca); //      NamedValues[Arg.getName()] = Alloca; } KSDbgInfo.emitLocation(Body.get()); if (Value *RetVal = Body->codegen()) { //  . Builder.CreateRet(RetVal); //     . KSDbgInfo.LexicalBlocks.pop_back(); //    verifyFunction(*TheFunction); return TheFunction; } //    ,   TheFunction->eraseFromParent(); if (P.isBinaryOp()) BinopPrecedence.erase(Proto->getOperatorName()); //      KSDbgInfo.LexicalBlocks.pop_back(); return nullptr; } //===----------------------------------------------------------------------===// //     JIT //===----------------------------------------------------------------------===// static void InitializeModule() { //    TheModule = llvm::make_unique<Module>("my cool jit", TheContext); TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout()); } static void HandleDefinition() { if (auto FnAST = ParseDefinition()) { if (!FnAST->codegen()) fprintf(stderr, "Error reading function definition:"); } else { //       getNextToken(); } } static void HandleExtern() { if (auto ProtoAST = ParseExtern()) { if (!ProtoAST->codegen()) fprintf(stderr, "Error reading extern"); else FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST); } else { //       getNextToken(); } } static void HandleTopLevelExpression() { //       if (auto FnAST = ParseTopLevelExpr()) { if (!FnAST->codegen()) { fprintf(stderr, "Error generating code for top level expr"); } } else { //       getNextToken(); } } /// top ::= definition | external | expression | ';' static void MainLoop() { while (1) { switch (CurTok) { case tok_eof: return; case ';': //       . getNextToken(); break; case tok_def: HandleDefinition(); break; case tok_extern: HandleExtern(); break; default: HandleTopLevelExpression(); break; } } } //===----------------------------------------------------------------------===// // "" ,        //===----------------------------------------------------------------------===// #ifdef LLVM_ON_WIN32 #define DLLEXPORT __declspec(dllexport) #else #define DLLEXPORT #endif /// putchard - putchar  double,  0. extern "C" DLLEXPORT double putchard(double X) { fputc((char)X, stderr); return 0; } /// printd -  double   "%f\n",  0. extern "C" DLLEXPORT double printd(double X) { fprintf(stderr, "%f\n", X); return 0; } //===----------------------------------------------------------------------===// //  main //===----------------------------------------------------------------------===// int main() { InitializeNativeTarget(); InitializeNativeTargetAsmPrinter(); InitializeNativeTargetAsmParser(); //     // 1 -    BinopPrecedence['='] = 2; BinopPrecedence['<'] = 10; BinopPrecedence['+'] = 20; BinopPrecedence['-'] = 20; BinopPrecedence['*'] = 40; //  . //    getNextToken(); TheJIT = llvm::make_unique<KaleidoscopeJIT>(); InitializeModule(); //       TheModule->addModuleFlag(Module::Warning, "Debug Info Version", DEBUG_METADATA_VERSION); // Darwin   dwarf2. if (Triple(sys::getProcessTriple()).isOSDarwin()) TheModule->addModuleFlag(llvm::Module::Warning, "Dwarf Version", 2); //  DIBuilder,   , ..    DBuilder = llvm::make_unique<DIBuilder>(*TheModule); //      //  "fib.ks"   ,     stdin //  ,      . KSDbgInfo.TheCU = DBuilder->createCompileUnit( dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0); //    MainLoop(); //    DBuilder->finalize(); //   . TheModule->print(errs(), nullptr); return 0; } 

Source: https://habr.com/ru/post/336980/


All Articles