Codegen
) for each AST class:
/// ExprAST - . class ExprAST { public: virtual ~ExprAST() {}
virtual Value *Codegen() = 0;
}; /// NumberExprAST - "1.0". class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double val) : Val(val) {}
virtual Value *Codegen();
}; ...
Codegen()
method will return the IR for this AST node along with all its dependencies, and they all return an LLVM Value
object. "Value"
is the class used to represent the " Static Single Assigment Register ( SSA )" or "SSA Values" in LLVM. The most important point in SSA is that their value is calculated as the execution of the associated instruction and they cannot get a new value until (and if) the instruction is executed again. In other words, there is no way to “change” the value of SSA. For more information, read about Static Single Assignment - these concepts really seem quite natural as soon as you pay attention to them.ExprAST
class hierarchy, it may make sense to use the Visitor design pattern or some other method. Again, in this tutorial, we will not dwell on good software development methods: our goal is simplicity — and it's easier for us to add a virtual method. Value *ErrorV(const char *Str) { Error(Str); return 0; } static Module *TheModule; static IRBuilder<> Builder(getGlobalContext()); static std::map<std::string, Value*> NamedValues;
TheModule
is an LLVM construct containing all functions and global variables in a piece of code. In most cases, these are top-level structures that the LLVM IR uses for the contained code.Builder
object is an auxiliary object that makes it easy to generate LLVM instructions. Class template instances IRBuilder
IRBuilder
keeps track of the current place to insert instructions and contains methods for creating new instructions.NamedValues
map NamedValues
track of which values ​​are defined in the current scope and what their LLVM representation is. (In other words, this is the symbol table for the code). In the current form in Kaleidoscope, the only thing that can be referenced is the parameters of the functions. Thus, in this map, when generating the code for the function body, the parameters of this function will be located.Builder
has already been created to generate code in “something”. For now, we assume that this has already been done, and we will use it to save the code. Value *NumberExprAST::Codegen() { return ConstantFP::get(getGlobalContext(), APFloat(Val)); }
ConstantFP
, which contains inside a numeric value in the form of APFloat
( APFloat
has the ability to reduce real constants to arbitrary precision numbers). This code creates and returns ConstantFP
. Note that in LLVM IR all constants are unique (uniqued) and shared (shared). It is for this reason that the API uses the idiom "foo::get(...)"
rather than "new foo(..)"
or "foo::Create(..)"
. Value *VariableExprAST::Codegen() { // . Value *V = NamedValues[Name]; return V ? V : ErrorV("Unknown variable name"); }
Value *BinaryExprAST::Codegen() { Value *L = LHS->Codegen(); Value *R = RHS->Codegen(); if (L == 0 || R == 0) return 0; switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); // 0 1 0.0 1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), "booltmp"); default: return ErrorV("invalid binary operator"); } }
Builder
class finally displays its value. IRBuilder
knows where to insert the newly created instructions, and all you need to do is specify which instructions to create (for example, "CreateFAdd"
), which operands to use (in this case L and R) and, if necessary, which one to use name for the generated instruction."addtmp"
several times, LLVM will automatically assign a unique numeric suffix each time. Local value names for instructions are optional, but they make IR dumps much easier to read.fcmp
instruction always returns the value "i1" (one-bit integer). The problem is that we need a value of 0.0 or 1.0 in Kaleidoscope. To do this, we combine the fcmp
instruction with the uitofp
instruction . This instruction converts an unsigned integer to floating point. In contrast, if we used the sitofp
, the operator '<' would return 0.0 and -1.0, depending on the input value. Value *CallExprAST::Codegen() { // . Function *CalleeF = TheModule->getFunction(Callee); if (CalleeF == 0) return ErrorV("Unknown function referenced"); // . if (CalleeF->arg_size() != Args.size()) return ErrorV("Incorrect # arguments passed"); std::vector<Value*> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { ArgsV.push_back(Args[i]->Codegen()); if (ArgsV.back() == 0) return 0; } return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); }
"sin"
and "cos"
without any extra effort. Function *PrototypeAST::Codegen() { // : double(double,double) ... std::vector<const Type*> Doubles(Args.size(), Type::getDoubleTy(getGlobalContext())); FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false); Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
"Function*"
instead of "Value*"
. Since the “prototype” actually speaks of the external interface for a function (and not a calculated expression), it makes sense for it to return the LLVM function corresponding to the code generated for the function."FunctionType::get"
creates FunctionType, which should be used for this prototype. Since all the function arguments in Kaleidoscope are real numbers, the first line creates an LLVM vector of “N” real numbers. The FunctionType::get
method is then used to create a functional type that takes "N" valid arguments and returns one valid argument as the result. Note that the types in LLVM are unique, like constants, so you need not "new"
, but "get"
(in the original wordplay: "... so that you do not "
create "
it, but "
get "
.") ."Name"
defines the name specified by the user, "TheModule"
indicates that the name is registered in the symbol table "TheModule"
. // (F) , 'Name'. // , . if (F->getName() != Name) { // , . F->eraseFromParent(); F = TheModule->getFunction(Name);
extern
-definition of functions more than once, as long as their prototypes are the same (since all arguments are of the same type, we just need to check the coincidence of the number of arguments). Secondly, if we want to use extern
- the definition of functions, and then the definition of the body for them. This is necessary when defining mutually recursive functions."eraseFromParent"
), and then calls "getFunction"
to get an existing function with the given name. It is worth noting that many APIs in LLVM have both the "erase"
form and the "remove"
form. The "remove"
method detaches (unlink) an object from its parent object (for example, a Function from a Module) and returns it. The "erase"
method detaches (unlink) an object and then deletes it. // (F) , . if (!F->empty()) { ErrorF("redefinition of function"); return 0; } // (F) , . if (F->arg_size() != Args.size()) { ErrorF("redefinition of function with different # args"); return 0; } }
"extern"
function, we simply compare its number of arguments with the number of arguments of the current definition. If they do not match, then we display an error. // . unsigned Idx = 0; for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); ++AI, ++Idx) { AI->setName(Args[Idx]); // . NamedValues[Args[Idx]] = AI; } return F; }
NamedValues
map for later use by the AST node VariableExprAST
. It then returns the function object. Note that we do not check conflicting argument names (for example, "extern Foo (aba)"
). But this can be done quite simply and straightforwardly using the method we used above. Function *FunctionAST::Codegen() { NamedValues.clear(); Function *TheFunction = Proto->Codegen(); if (TheFunction == 0) return 0;
Proto
) and make sure that everything is normal. We also clear the NamedValues ​​map to make sure that nothing remains of the last function we process. Generating a prototype code ensures that we have a ready-made functional object LLVM, so that we can move on. // . BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); Builder.SetInsertPoint(BB); if (Value *RetVal = Body->Codegen()) {
"entry"
), which is inserted into TheFunction. The second line tells Builder that new instructions should be inserted at the end of the new base unit. The basic blocks in LLVM are an important part of the function that defines the control flow graph [ en ]. Since we don't have any control flow yet, our functions will contain only one block. But we will fix this in chapter 5 :). if (Value *RetVal = Body->Codegen()) { // . Builder.CreateRet(RetVal); // , (). verifyFunction(*TheFunction); return TheFunction; }
CodeGen()
for the root function expression. If there is no error, then add the code for evaluating the expression to the input block and return the value to be calculated. Then, in the absence of errors, we create an LLVM-instruction "ret"
, which terminates the function. After building the function, we call the "verifyFunction"
procedure built into LLVM. It performs various checks on the consistency of the generated code to determine that our compiler is doing everything correctly. Using this procedure is very important: it can catch a lot of bugs. When the function is completed and tested, we return it. // , . TheFunction->eraseFromParent(); return 0; }
eraseFromParent
method. This will allow the user to override a function that was incorrectly dialed earlier: if we did not delete it, it would be in the symbol table with the body, prohibiting future overrides."PrototypeAST:: Codegen"
can return previously defined declarations, our code can actually delete them. There are several ways to fix this bug, think what you can think of! Here is a small test case: extern foo(ab); # o, "foo". def foo(ab) c; # , 'c'. def bar() foo(1, 2); # , "foo"
Codegen
in the functions “HandleDefinition”, “HandleExtern”, etc., and then dump the LLVM IR. Because of this, we can look at LLVM IR for simple functions. For example:ready> 4 + 5; Read top-level expression: define double @ "" () { entry: ret double 9.000000e + 00 }
IRBuilder
. We will add optimization explicitly in the next chapter.ready> def foo (ab) a * a + 2 * a * b + b * b; Read function definition: define double @foo (double% a, double% b) { entry: % multmp = fmul double% a,% a % multmp1 = fmul double 2.000000e + 00,% a % multmp2 = fmul double% multmp1,% b % addtmp = fadd double% multmp,% multmp2 % multmp3 = fmul double% b,% b % addtmp4 = fadd double% addtmp,% multmp3 ret double% addtmp4 }
Builder'
LLVM calls that we use to create instructions.ready> def bar (a) foo (a, 4.0) + bar (31337); Read function definition: define double @bar (double% a) { entry: % calltmp = call double @foo (double% a, double 4.000000e + 00) % calltmp1 = call double @bar (double 3.133700e + 04) % addtmp = fadd double% calltmp,% calltmp1 ret double% addtmp }
ready> extern cos (x); Read extern: declare double @cos (double)
ready> cos (1.234);Read top-level expression: define double @ "" () { entry: % calltmp = call double @cos (double 1.234000e + 00) ret double% calltmp }
extern
here is the extern
for the library function "cos"
and its call.ready> ^ D ; ModuleID = 'my cool jit' define double @ "" () { entry: % addtmp = fadd double 4.000000e + 00, 5.000000e + 00 ret double% addtmp } define double @foo (double% a, double% b) { entry: % multmp = fmul double% a,% a % multmp1 = fmul double 2.000000e + 00,% a % multmp2 = fmul double% multmp1,% b % addtmp = fadd double% multmp,% multmp2 % multmp3 = fmul double% b,% b % addtmp4 = fadd double% addtmp,% multmp3 ret double% addtmp4 } define double @bar (double% a) { entry: % calltmp = call double @foo (double% a, double 4.000000e + 00) % calltmp1 = call double @bar (double 3.133700e + 04) % addtmp = fadd double% calltmp,% calltmp1 ret double% addtmp } declare double @cos (double) define double @ "" () { entry: % calltmp = call double @cos (double 1.234000e + 00) ret double% calltmp }
#
g++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy
#
./toy
#include "llvm/DerivedTypes.h" #include "llvm/LLVMContext.h" #include "llvm/Module.h" #include "llvm/Analysis/Verifier.h" #include "llvm/Support/IRBuilder.h" #include <cstdio> #include <string> #include <map> #include <vector> using namespace llvm; //===----------------------------------------------------------------------===// // Lexer ( ) //===----------------------------------------------------------------------===// // [0-255], , // enum Token { tok_eof = -1, // ( ) tok_def = -2, tok_extern = -3, // ( : , ) tok_identifier = -4, tok_number = -5 }; static std::string IdentifierStr; // , tok_identifier static double NumVal; // , tok_number /// gettok - . static int gettok() { static int LastChar = ' '; // . while (isspace(LastChar)) LastChar = getchar(); if (isalpha(LastChar)) { // : [a-zA-Z][a-zA-Z0-9]* IdentifierStr = LastChar; while (isalnum((LastChar = getchar()))) IdentifierStr += LastChar; if (IdentifierStr == "def") return tok_def; if (IdentifierStr == "extern") return tok_extern; return tok_identifier; } if (isdigit(LastChar) || LastChar == '.') { // : [0-9.]+ std::string NumStr; do { NumStr += LastChar; LastChar = getchar(); } while (isdigit(LastChar) || LastChar == '.'); NumVal = strtod(NumStr.c_str(), 0); return tok_number; } if (LastChar == '#') { // do LastChar = getchar(); while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); if (LastChar != EOF) return gettok(); } // . if (LastChar == EOF) return tok_eof; // ASCII int ThisChar = LastChar; LastChar = getchar(); return ThisChar; } //===----------------------------------------------------------------------===// // Abstract Syntax Tree ( ) //===----------------------------------------------------------------------===// /// ExprAST - . class ExprAST { public: virtual ~ExprAST() {} virtual Value *Codegen() = 0; }; /// NumberExprAST - (, "1.0"). class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double val) : Val(val) {} virtual Value *Codegen(); }; /// VariableExprAST - (, "a"). class VariableExprAST : public ExprAST { std::string Name; public: VariableExprAST(const std::string &name) : Name(name) {} virtual Value *Codegen(); }; /// BinaryExprAST - . class BinaryExprAST : public ExprAST { char Op; ExprAST *LHS, *RHS; public: BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) : Op(op), LHS(lhs), RHS(rhs) {} virtual Value *Codegen(); }; /// CallExprAST - . class CallExprAST : public ExprAST { std::string Callee; std::vector<ExprAST*> Args; public: CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) : Callee(callee), Args(args) {} virtual Value *Codegen(); }; /// PrototypeAST - "" , /// (, , /// ). class PrototypeAST { std::string Name; std::vector<std::string> Args; public: PrototypeAST(const std::string &name, const std::vector<std::string> &args) : Name(name), Args(args) {} Function *Codegen(); }; /// FunctionAST - class FunctionAST { PrototypeAST *Proto; ExprAST *Body; public: FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {} Function *Codegen(); }; //===----------------------------------------------------------------------===// // Parser ( ) //===----------------------------------------------------------------------===// /// CurTok/getNextToken - . CurTok - /// , . getNextToken /// CurTok. static int CurTok; static int getNextToken() { return CurTok = gettok(); } /// BinopPrecedence - static std::map<char, int> BinopPrecedence; /// GetTokPrecedence - . static int GetTokPrecedence() { if (!isascii(CurTok)) return -1; // , . int TokPrec = BinopPrecedence[CurTok]; if (TokPrec <= 0) return -1; return TokPrec; } /// Error* - . ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } static ExprAST *ParseExpression(); /// identifierexpr /// ::= identifier /// ::= identifier '(' expression* ')' static ExprAST *ParseIdentifierExpr() { std::string IdName = IdentifierStr; getNextToken(); // . if (CurTok != '(') // . return new VariableExprAST(IdName); // . getNextToken(); // ( std::vector<ExprAST*> Args; if (CurTok != ')') { while (1) { ExprAST *Arg = ParseExpression(); if (!Arg) return 0; Args.push_back(Arg); if (CurTok == ')') break; if (CurTok != ',') return Error("Expected ')' or ',' in argument list"); getNextToken(); } } // ')'. getNextToken(); return new CallExprAST(IdName, Args); } /// numberexpr ::= number static ExprAST *ParseNumberExpr() { ExprAST *Result = new NumberExprAST(NumVal); getNextToken(); // return Result; } /// parenexpr ::= '(' expression ')' static ExprAST *ParseParenExpr() { getNextToken(); // (. ExprAST *V = ParseExpression(); if (!V) return 0; if (CurTok != ')') return Error("expected ')'"); getNextToken(); // ). return V; } /// primary /// ::= identifierexpr /// ::= numberexpr /// ::= parenexpr static ExprAST *ParsePrimary() { switch (CurTok) { default: return Error("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': return ParseParenExpr(); } } /// binoprhs /// ::= ('+' primary)* static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { // , while (1) { int TokPrec = GetTokPrecedence(); // , // , if (TokPrec < ExprPrec) return LHS; // , , . int BinOp = CurTok; getNextToken(); // // ExprAST *RHS = ParsePrimary(); if (!RHS) return 0; // BinOp RHS , RHS, // RHS LHS. int NextPrec = GetTokPrecedence(); if (TokPrec < NextPrec) { RHS = ParseBinOpRHS(TokPrec+1, RHS); if (RHS == 0) return 0; } // LHS/RHS. LHS = new BinaryExprAST(BinOp, LHS, RHS); } } /// expression /// ::= primary binoprhs /// static ExprAST *ParseExpression() { ExprAST *LHS = ParsePrimary(); if (!LHS) return 0; return ParseBinOpRHS(0, LHS); } /// prototype /// ::= id '(' id* ')' static PrototypeAST *ParsePrototype() { if (CurTok != tok_identifier) return ErrorP("Expected function name in prototype"); std::string FnName = IdentifierStr; getNextToken(); if (CurTok != '(') return ErrorP("Expected '(' in prototype"); // . std::vector<std::string> ArgNames; while (getNextToken() == tok_identifier) ArgNames.push_back(IdentifierStr); if (CurTok != ')') return ErrorP("Expected ')' in prototype"); // . getNextToken(); // ')'. return new PrototypeAST(FnName, ArgNames); } /// definition ::= 'def' prototype expression static FunctionAST *ParseDefinition() { getNextToken(); // def. PrototypeAST *Proto = ParsePrototype(); if (Proto == 0) return 0; if (ExprAST *E = ParseExpression()) return new FunctionAST(Proto, E); return 0; } /// toplevelexpr ::= expression static FunctionAST *ParseTopLevelExpr() { if (ExprAST *E = ParseExpression()) { // . PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); return new FunctionAST(Proto, E); } return 0; } /// external ::= 'extern' prototype static PrototypeAST *ParseExtern() { getNextToken(); // extern. return ParsePrototype(); } //===----------------------------------------------------------------------===// // Code Generation () //===----------------------------------------------------------------------===// static Module *TheModule; static IRBuilder<> Builder(getGlobalContext()); static std::map<std::string, Value*> NamedValues; Value *ErrorV(const char *Str) { Error(Str); return 0; } Value *NumberExprAST::Codegen() { return ConstantFP::get(getGlobalContext(), APFloat(Val)); } Value *VariableExprAST::Codegen() { // . Value *V = NamedValues[Name]; return V ? V : ErrorV("Unknown variable name"); } Value *BinaryExprAST::Codegen() { Value *L = LHS->Codegen(); Value *R = RHS->Codegen(); if (L == 0 || R == 0) return 0; switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); // 0 1 0.0 1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), "booltmp"); default: return ErrorV("invalid binary operator"); } } Value *CallExprAST::Codegen() { // . Function *CalleeF = TheModule->getFunction(Callee); if (CalleeF == 0) return ErrorV("Unknown function referenced"); // . if (CalleeF->arg_size() != Args.size()) return ErrorV("Incorrect # arguments passed"); std::vector<Value*> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { ArgsV.push_back(Args[i]->Codegen()); if (ArgsV.back() == 0) return 0; } return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); } Function *PrototypeAST::Codegen() { // : double(double,double) .. std::vector<const Type*> Doubles(Args.size(), Type::getDoubleTy(getGlobalContext())); FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false); Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); // (F) , 'Name'. // , . if (F->getName() != Name) { // , . F->eraseFromParent(); F = TheModule->getFunction(Name); // (F) , . if (!F->empty()) { ErrorF("redefinition of function"); return 0; } // (F) , . if (F->arg_size() != Args.size()) { ErrorF("redefinition of function with different # args"); return 0; } } // . unsigned Idx = 0; for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); ++AI, ++Idx) { AI->setName(Args[Idx]); // . NamedValues[Args[Idx]] = AI; } return F; } Function *FunctionAST::Codegen() { NamedValues.clear(); Function *TheFunction = Proto->Codegen(); if (TheFunction == 0) return 0; // . BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); Builder.SetInsertPoint(BB); if (Value *RetVal = Body->Codegen()) { // . Builder.CreateRet(RetVal); // , (). verifyFunction(*TheFunction); return TheFunction; } // , . TheFunction->eraseFromParent(); return 0; } //===----------------------------------------------------------------------===// // Top-Level parsing ( ) JIT //===----------------------------------------------------------------------===// static void HandleDefinition() { if (FunctionAST *F = ParseDefinition()) { if (Function *LF = F->Codegen()) { fprintf(stderr, "Read function definition:"); LF->dump(); } } else { // . getNextToken(); } } static void HandleExtern() { if (PrototypeAST *P = ParseExtern()) { if (Function *F = P->Codegen()) { fprintf(stderr, "Read extern: "); F->dump(); } } else { // . getNextToken(); } } static void HandleTopLevelExpression() { // Evaluate a top-level expression into an anonymous function. if (FunctionAST *F = ParseTopLevelExpr()) { if (Function *LF = F->Codegen()) { fprintf(stderr, "Read top-level expression:"); LF->dump(); } } else { // . getNextToken(); } } /// top ::= definition | external | expression | ';' static void MainLoop() { while (1) { fprintf(stderr, "ready> "); switch (CurTok) { case tok_eof: return; case ';': getNextToken(); break; // . case tok_def: HandleDefinition(); break; case tok_extern: HandleExtern(); break; default: HandleTopLevelExpression(); break; } } } //===----------------------------------------------------------------------===// // "" , // ("extern") . //===----------------------------------------------------------------------===// /// putchard - 0. extern "C" double putchard(double X) { putchar((char)X); return 0; } //===----------------------------------------------------------------------===// // Main driver code ( ) //===----------------------------------------------------------------------===// int main() { LLVMContext &Context = getGlobalContext(); // . // 1 - . BinopPrecedence['<'] = 10; BinopPrecedence['+'] = 20; BinopPrecedence['-'] = 20; BinopPrecedence['*'] = 40; // . fprintf(stderr, "ready> "); getNextToken(); // , . TheModule = new Module("my cool jit", Context); // " ". MainLoop(); // . TheModule->dump(); return 0; }
Source: https://habr.com/ru/post/120424/
All Articles