ready> def test(x) 1+2+x; Read function definition: define double @test(double %x) { entry: %addtmp = fadd double 3.000000e+00, %x ret double %addtmp }
ready> def test (x) 1 + 2 + x; Read function definition: define double @test (double% x) { entry: % addtmp = fadd double 2.000000e + 00, 1.000000e + 00 % addtmp1 = fadd double% addtmp,% x ret double% addtmp1 }
LLVM IR Builder
, which itself checks if it is possible to collapse constants. It simply collapses the constant and returns a new constant instead. to create instruction.IRBuilder
when generating code. It has no “syntax overhead” when used (you don’t have to disfigure the compiler with constant checks everywhere), and in some cases it can significantly reduce the number of generated LLVM IRs (especially for languages ​​with macro preprocessors or those using many constants).IRBuilder
limited by the fact that it does all the analysis of the code "as is", that is, with the already constructed code. If you take a slightly more complicated example:ready> def test (x) (1 + 2 + x) * (x + (1 + 2)); ready> Read function definition: define double @test (double% x) { entry: % addtmp = fadd double 3.000000e + 00,% x % addtmp1 = fadd double% x, 3.000000e + 00 % multmp = fmul double% addtmp,% addtmp1 ret double% multmp }
FunctionPassManager
FunctionPassManager
, which stores and organizes LLVM optimizations that we want to run. Then we can add a set of optimizations to run. The code looks like this: FunctionPassManager OurFPM(TheModule); // . , // . OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); // AliasAnalysis GVN. OurFPM.add(createBasicAliasAnalysisPass()); // "peephole" "bit-twiddling". OurFPM.add(createInstructionCombiningPass()); // . OurFPM.add(createReassociatePass()); // . OurFPM.add(createGVNPass()); // ( ..). OurFPM.add(createCFGSimplificationPass()); OurFPM.doInitialization(); // , . TheFPM = &OurFPM; // " ". MainLoop();
FunctionPassManager
named "OurFPM"
. When creating it, it is necessary to pass a pointer to the module. Once this is done, we use a series of "add"
calls to add LLVM passes. The first pass in most cases is template, it is necessary so that subsequent optimizations know what data structures are used in the program. The variable "TheExecutionEngine"
is associated with the JIT, which we will get in the next section.PassManager
, we have to use it. We run it after generating our new function (in FunctionAST::Codegen
), but before returning to the user:
if (Value *RetVal = Body->Codegen()) { // . Builder.CreateRet(RetVal); // , (). verifyFunction(*TheFunction);
// . TheFPM->run(*TheFunction);
return TheFunction; }
FunctionPassManager
optimizes and immediately updates LLVM Function*
, improving (I really hope so) the function body. Now we will try to test our code again:ready> def test (x) (1 + 2 + x) * (x + (1 + 2)); ready> Read function definition: define double @test (double% x) { entry: % addtmp = fadd double% x, 3.000000e + 00 % multmp = fmul double% addtmp,% addtmp ret double% multmp }
main
:
static ExecutionEngine *TheExecutionEngine;
... int main() { ..
// JIT. . TheExecutionEngine = EngineBuilder(TheModule).create();
.. }
"Execution Engine"
, which can be either a compiler or an LLVM interpreter. LLVM will automatically select the JIT compiler for you if it exists for your platform, otherwise it will return back to the interpreter.ExecutionEngine
, JIT is ready for use. There are various useful APIs, but the simplest of these is the "getPointerToFunction(F)"
method. This JIT method compiles the specified LLVM function and returns a function pointer to the generated machine code. In our case, this means that we can change the code that parses the top-level expression as follows:
static void HandleTopLevelExpression() { // . if (FunctionAST *F = ParseTopLevelExpr()) { if (Function *LF = F->Codegen()) { LF->dump(); // .
// JIT-, . void *FPtr = TheExecutionEngine->getPointerToFunction(LF); // ( , double), // . double (*FP)() = (double (*)())(intptr_t)FPtr; fprintf(stderr, "Evaluated to %f\n", FP());
}
ready> 4 + 5; define double @ "" () { entry: ret double 9.000000e + 00 } Evaluated to 9.000000
ready> def testfunc (xy) x + y * 2; Read function definition: define double @testfunc (double% x, double% y) { entry: % multmp = fmul double% y, 2.000000e + 00 % addtmp = fadd double% multmp,% x ret double% addtmp }
ready> testfunc (4, 10);define double @ "" () { entry: % calltmp = call double @testfunc (double 4.000000e + 00, double 1.000000e + 01) ret double% calltmp } Evaluated to 24.000000
testfunc
, but never call it on testfunc itself. In fact, that the JIT for all non-anonymous functions is transitively called and compiled from an anonymous function before being returned from getPointerToFunction()
.ready> extern sin (x); Read extern: declare double @sin (double)
ready> extern cos (x);Read extern: declare double @cos (double)
ready> sin (1.0);Evaluated to 0.841471
ready> def foo (x) sin (x) * sin (x) + cos (x) * cos (x);Read function definition: define double @foo (double% x) { entry: % calltmp = call double @sin (double% x) % multmp = fmul double% calltmp,% calltmp % calltmp2 = call double @cos (double% x) % multmp4 = fmul double% calltmp2,% calltmp2 % addtmp = fadd double% multmp,% multmp4 ret double% addtmp }
ready> foo (4.0);Evaluated to 1.000000
sin
and cos
? The answer is surprisingly simple: in this example, JIT started the function execution and got to the function call. He realized that the function had not yet been compiled by JIT and called the standard set of procedures for calculating the function. In this case, its body is not defined for the function, so the JIT will end up by calling "dlsym("sin")"
. Since "sin"
defined in the JIT address space, it simply directly calls the library function.ExecutionEngine.h
) to control how the receipt of an undefined function is handled. This allows you to set up an explicit mapping between IR objects and addresses, allowing you to dynamically resolve function names on the fly, and even allow lazy JIT compilation of functions when they are first used. /// putchard - 0. extern "C" double putchard(double X) { putchar((char)X); return 0; }
"extern putchard(); putchard(120);"
which prints lowercase 'x' to the console (120 is the ASCII code for 'x'). Similar code can be used in Kaleidoscope to implement file I / O, console input, and many other features.#
g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
#
./toy
#include "llvm/DerivedTypes.h" #include "llvm/ExecutionEngine/ExecutionEngine.h" #include "llvm/ExecutionEngine/JIT.h" #include "llvm/LLVMContext.h" #include "llvm/Module.h" #include "llvm/PassManager.h" #include "llvm/Analysis/Verifier.h" #include "llvm/Analysis/Passes.h" #include "llvm/Target/TargetData.h" #include "llvm/Target/TargetSelect.h" #include "llvm/Transforms/Scalar.h" #include "llvm/Support/IRBuilder.h" #include <cstdio> #include <string> #include <map> #include <vector> using namespace llvm; //===----------------------------------------------------------------------===// // Lexer ( ) //===----------------------------------------------------------------------===// // [0-255], , // enum Token { tok_eof = -1, // ( ) tok_def = -2, tok_extern = -3, // ( : , ) tok_identifier = -4, tok_number = -5 }; static std::string IdentifierStr; // , tok_identifier static double NumVal; // , tok_number /// gettok - . static int gettok() { static int LastChar = ' '; // . while (isspace(LastChar)) LastChar = getchar(); if (isalpha(LastChar)) { // : [a-zA-Z][a-zA-Z0-9]* IdentifierStr = LastChar; while (isalnum((LastChar = getchar()))) IdentifierStr += LastChar; if (IdentifierStr == "def") return tok_def; if (IdentifierStr == "extern") return tok_extern; return tok_identifier; } if (isdigit(LastChar) || LastChar == '.') { // : [0-9.]+ std::string NumStr; do { NumStr += LastChar; LastChar = getchar(); } while (isdigit(LastChar) || LastChar == '.'); NumVal = strtod(NumStr.c_str(), 0); return tok_number; } if (LastChar == '#') { // do LastChar = getchar(); while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); if (LastChar != EOF) return gettok(); } // . if (LastChar == EOF) return tok_eof; // ASCII int ThisChar = LastChar; LastChar = getchar(); return ThisChar; } //===----------------------------------------------------------------------===// // Abstract Syntax Tree ( ) //===----------------------------------------------------------------------===// /// ExprAST - . class ExprAST { public: virtual ~ExprAST() {} virtual Value *Codegen() = 0; }; /// NumberExprAST - (, "1.0"). class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double val) : Val(val) {} virtual Value *Codegen(); }; /// VariableExprAST - (, "a"). class VariableExprAST : public ExprAST { std::string Name; public: VariableExprAST(const std::string &name) : Name(name) {} virtual Value *Codegen(); }; /// BinaryExprAST - . class BinaryExprAST : public ExprAST { char Op; ExprAST *LHS, *RHS; public: BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) : Op(op), LHS(lhs), RHS(rhs) {} virtual Value *Codegen(); }; /// CallExprAST - . class CallExprAST : public ExprAST { std::string Callee; std::vector<ExprAST*> Args; public: CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) : Callee(callee), Args(args) {} virtual Value *Codegen(); }; /// PrototypeAST - "" , /// (, , /// ). class PrototypeAST { std::string Name; std::vector<std::string> Args; public: PrototypeAST(const std::string &name, const std::vector<std::string> &args) : Name(name), Args(args) {} Function *Codegen(); }; /// FunctionAST - class FunctionAST { PrototypeAST *Proto; ExprAST *Body; public: FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {} Function *Codegen(); }; //===----------------------------------------------------------------------===// // Parser ( ) //===----------------------------------------------------------------------===// /// CurTok/getNextToken - . CurTok - /// , . getNextToken /// CurTok. static int CurTok; static int getNextToken() { return CurTok = gettok(); } /// BinopPrecedence - static std::map<char, int> BinopPrecedence; /// GetTokPrecedence - . static int GetTokPrecedence() { if (!isascii(CurTok)) return -1; // , . int TokPrec = BinopPrecedence[CurTok]; if (TokPrec <= 0) return -1; return TokPrec; } /// Error* - . ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } static ExprAST *ParseExpression(); /// identifierexpr /// ::= identifier /// ::= identifier '(' expression* ')' static ExprAST *ParseIdentifierExpr() { std::string IdName = IdentifierStr; getNextToken(); // . if (CurTok != '(') // . return new VariableExprAST(IdName); // . getNextToken(); // ( std::vector<ExprAST*> Args; if (CurTok != ')') { while (1) { ExprAST *Arg = ParseExpression(); if (!Arg) return 0; Args.push_back(Arg); if (CurTok == ')') break; if (CurTok != ',') return Error("Expected ')' or ',' in argument list"); getNextToken(); } } // ')'. getNextToken(); return new CallExprAST(IdName, Args); } /// numberexpr ::= number static ExprAST *ParseNumberExpr() { ExprAST *Result = new NumberExprAST(NumVal); getNextToken(); // return Result; } /// parenexpr ::= '(' expression ')' static ExprAST *ParseParenExpr() { getNextToken(); // (. ExprAST *V = ParseExpression(); if (!V) return 0; if (CurTok != ')') return Error("expected ')'"); getNextToken(); // ). return V; } /// primary /// ::= identifierexpr /// ::= numberexpr /// ::= parenexpr static ExprAST *ParsePrimary() { switch (CurTok) { default: return Error("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': return ParseParenExpr(); } } /// binoprhs /// ::= ('+' primary)* static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { // , while (1) { int TokPrec = GetTokPrecedence(); // , // , if (TokPrec < ExprPrec) return LHS; // , , . int BinOp = CurTok; getNextToken(); // eat binop // ExprAST *RHS = ParsePrimary(); if (!RHS) return 0; // BinOp RHS , RHS, // RHS LHS. int NextPrec = GetTokPrecedence(); if (TokPrec < NextPrec) { RHS = ParseBinOpRHS(TokPrec+1, RHS); if (RHS == 0) return 0; } // LHS/RHS. LHS = new BinaryExprAST(BinOp, LHS, RHS); } } /// expression /// ::= primary binoprhs /// static ExprAST *ParseExpression() { ExprAST *LHS = ParsePrimary(); if (!LHS) return 0; return ParseBinOpRHS(0, LHS); } /// prototype /// ::= id '(' id* ')' static PrototypeAST *ParsePrototype() { if (CurTok != tok_identifier) return ErrorP("Expected function name in prototype"); std::string FnName = IdentifierStr; getNextToken(); if (CurTok != '(') return ErrorP("Expected '(' in prototype"); // . std::vector<std::string> ArgNames; while (getNextToken() == tok_identifier) ArgNames.push_back(IdentifierStr); if (CurTok != ')') return ErrorP("Expected ')' in prototype"); // . getNextToken(); // ')'. return new PrototypeAST(FnName, ArgNames); } /// definition ::= 'def' prototype expression static FunctionAST *ParseDefinition() { getNextToken(); // def. PrototypeAST *Proto = ParsePrototype(); if (Proto == 0) return 0; if (ExprAST *E = ParseExpression()) return new FunctionAST(Proto, E); return 0; } /// toplevelexpr ::= expression static FunctionAST *ParseTopLevelExpr() { if (ExprAST *E = ParseExpression()) { // . PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); return new FunctionAST(Proto, E); } return 0; } /// external ::= 'extern' prototype static PrototypeAST *ParseExtern() { getNextToken(); // extern. return ParsePrototype(); } //===----------------------------------------------------------------------===// // Code Generation () //===----------------------------------------------------------------------===// static Module *TheModule; static IRBuilder<> Builder(getGlobalContext()); static std::map<std::string, Value*> NamedValues; static FunctionPassManager *TheFPM; Value *ErrorV(const char *Str) { Error(Str); return 0; } Value *NumberExprAST::Codegen() { return ConstantFP::get(getGlobalContext(), APFloat(Val)); } Value *VariableExprAST::Codegen() { // . Value *V = NamedValues[Name]; return V ? V : ErrorV("Unknown variable name"); } Value *BinaryExprAST::Codegen() { Value *L = LHS->Codegen(); Value *R = RHS->Codegen(); if (L == 0 || R == 0) return 0; switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); // 0 1 0.0 1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), "booltmp"); default: return ErrorV("invalid binary operator"); } } Value *CallExprAST::Codegen() { // . Function *CalleeF = TheModule->getFunction(Callee); if (CalleeF == 0) return ErrorV("Unknown function referenced"); // . if (CalleeF->arg_size() != Args.size()) return ErrorV("Incorrect # arguments passed"); std::vector<Value*> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { ArgsV.push_back(Args[i]->Codegen()); if (ArgsV.back() == 0) return 0; } return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); } Function *PrototypeAST::Codegen() { // : double(double,double) .. std::vector<const Type*> Doubles(Args.size(), Type::getDoubleTy(getGlobalContext())); FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false); Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); // (F) , 'Name'. // , . if (F->getName() != Name) { // , . F->eraseFromParent(); F = TheModule->getFunction(Name); // (F) , . if (!F->empty()) { ErrorF("redefinition of function"); return 0; } // (F) , . if (F->arg_size() != Args.size()) { ErrorF("redefinition of function with different # args"); return 0; } } // . unsigned Idx = 0; for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); ++AI, ++Idx) { AI->setName(Args[Idx]); // . NamedValues[Args[Idx]] = AI; } return F; } Function *FunctionAST::Codegen() { NamedValues.clear(); Function *TheFunction = Proto->Codegen(); if (TheFunction == 0) return 0; // . BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); Builder.SetInsertPoint(BB); if (Value *RetVal = Body->Codegen()) { // . Builder.CreateRet(RetVal); // , (). verifyFunction(*TheFunction); // . TheFPM->run(*TheFunction); return TheFunction; } // , . TheFunction->eraseFromParent(); return 0; } //===----------------------------------------------------------------------===// // Top-Level parsing ( ) JIT //===----------------------------------------------------------------------===// static ExecutionEngine *TheExecutionEngine; static void HandleDefinition() { if (FunctionAST *F = ParseDefinition()) { if (Function *LF = F->Codegen()) { fprintf(stderr, "Read function definition:"); LF->dump(); } } else { // . getNextToken(); } } static void HandleExtern() { if (PrototypeAST *P = ParseExtern()) { if (Function *F = P->Codegen()) { fprintf(stderr, "Read extern: "); F->dump(); } } else { // . getNextToken(); } } static void HandleTopLevelExpression() { // Evaluate a top-level expression into an anonymous function. if (FunctionAST *F = ParseTopLevelExpr()) { if (Function *LF = F->Codegen()) { // JIT-, . void *FPtr = TheExecutionEngine->getPointerToFunction(LF); // ( , double), // . double (*FP)() = (double (*)())(intptr_t)FPtr; fprintf(stderr, "Evaluated to %f\n", FP()); } } else { // . getNextToken(); } } /// top ::= definition | external | expression | ';' static void MainLoop() { while (1) { fprintf(stderr, "ready> "); switch (CurTok) { case tok_eof: return; case ';': getNextToken(); break; // . case tok_def: HandleDefinition(); break; case tok_extern: HandleExtern(); break; default: HandleTopLevelExpression(); break; } } } //===----------------------------------------------------------------------===// // "" , // ("extern") . //===----------------------------------------------------------------------===// /// putchard - 0. extern "C" double putchard(double X) { putchar((char)X); return 0; } //===----------------------------------------------------------------------===// // Main driver code ( ) //===----------------------------------------------------------------------===// int main() { InitializeNativeTarget(); LLVMContext &Context = getGlobalContext(); // . // 1 - . BinopPrecedence['<'] = 10; BinopPrecedence['+'] = 20; BinopPrecedence['-'] = 20; BinopPrecedence['*'] = 40; // . fprintf(stderr, "ready> "); getNextToken(); // , . TheModule = new Module("my cool jit", Context); // JIT. . std::string ErrStr; TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); if (!TheExecutionEngine) { fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); exit(1); } FunctionPassManager OurFPM(TheModule); // . , // . OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); // AliasAnalysis GVN. OurFPM.add(createBasicAliasAnalysisPass()); // "peephole" "bit-twiddling". OurFPM.add(createInstructionCombiningPass()); // . OurFPM.add(createReassociatePass()); // . OurFPM.add(createGVNPass()); // ( ..). OurFPM.add(createCFGSimplificationPass()); OurFPM.doInitialization(); // , . TheFPM = &OurFPM; // " ". MainLoop(); TheFPM = 0; // . TheModule->dump(); return 0; }
Source: https://habr.com/ru/post/120516/
All Articles