Creating a programming language using LLVM. Part 3: LLVM IR Code Generation

Welcome to Chapter 3 of the tutorial "Creating a programming language with LLVM." In this chapter, we will look at how to convert AST (Abstract Syntax Tree), built in Chapter 2 , to LLVM IR. She will tell you about some aspects of LLVM, as well as demonstrate how easy it is to use. You will see that much more work was required on lexical and syntactic analysis than on the direct creation of the LLVM IR code.

Please note : the code in this chapter requires LLVM 2.2 or later. With versions for LLVM 2.1 inclusive, this code will not work. Also worth noting is that you should use the version of this tutorial that corresponds to your LLVM release: you can use the documentation that comes with the official releases or visit the releases page at llvm.org .

Configuring for code generation

To generate LLVM IR, we need some simple setup. First, we define virtual code generation methods ( Codegen ) for each AST class:

 /// ExprAST -      . class ExprAST { public: virtual ~ExprAST() {}

  virtual Value *Codegen() = 0;

 }; /// NumberExprAST -       "1.0". class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double val) : Val(val) {}

  virtual Value *Codegen();

 }; ...

The Codegen() method will return the IR for this AST node along with all its dependencies, and they all return an LLVM Value object. "Value" is the class used to represent the " Static Single Assigment Register ( SSA )" or "SSA Values" in LLVM. The most important point in SSA is that their value is calculated as the execution of the associated instruction and they cannot get a new value until (and if) the instruction is executed again. In other words, there is no way to “change” the value of SSA. For more information, read about Static Single Assignment - these concepts really seem quite natural as soon as you pay attention to them.
')
Note that instead of adding virtual methods to the ExprAST class hierarchy, it may make sense to use the Visitor design pattern or some other method. Again, in this tutorial, we will not dwell on good software development methods: our goal is simplicity — and it's easier for us to add a virtual method.

We also need a method for handling errors, similar to the one used for the parser. We will use this method for error messages found during code generation (for example, using undeclared parameters):

 Value *ErrorV(const char *Str) { Error(Str); return 0; } static Module *TheModule; static IRBuilder<> Builder(getGlobalContext()); static std::map<std::string, Value*> NamedValues;

These static variables will be used during code generation. TheModule is an LLVM construct containing all functions and global variables in a piece of code. In most cases, these are top-level structures that the LLVM IR uses for the contained code.

The Builder object is an auxiliary object that makes it easy to generate LLVM instructions. Class template instances IRBuilder IRBuilder keeps track of the current place to insert instructions and contains methods for creating new instructions.

The NamedValues map NamedValues track of which values are defined in the current scope and what their LLVM representation is. (In other words, this is the symbol table for the code). In the current form in Kaleidoscope, the only thing that can be referenced is the parameters of the functions. Thus, in this map, when generating the code for the function body, the parameters of this function will be located.

Knowing all this, we can talk about code generation for any expressions. Note, it is assumed that Builder has already been created to generate code in “something”. For now, we assume that this has already been done, and we will use it to save the code.

Expression code generation

Generating LLVM code for expression nodes is very simple: less than 45 lines of commented code for all our four types of expression nodes. First, let's take numeric literals:

 Value *NumberExprAST::Codegen() { return ConstantFP::get(getGlobalContext(), APFloat(Val)); }

In LLVM IR, numeric constants are represented by the class ConstantFP , which contains inside a numeric value in the form of APFloat ( APFloat has the ability to reduce real constants to arbitrary precision numbers). This code creates and returns ConstantFP . Note that in LLVM IR all constants are unique (uniqued) and shared (shared). It is for this reason that the API uses the idiom "foo::get(...)" rather than "new foo(..)" or "foo::Create(..)" .

 Value *VariableExprAST::Codegen() { //      . Value *V = NamedValues[Name]; return V ? V : ErrorV("Unknown variable name"); }

References to variables are also quite simple when using LLVM. In the simple version of Kaleidoscope, we assume that the variable is already set somewhere and its value is available. In practice, the values on the NamedValues map can only be function arguments. This code simply checks that the specified name is defined on the map (if not, it is a link to an unknown variable) and returns a value for it. In subsequent chapters, we will add support for local variables and loop counter variables.

 Value *BinaryExprAST::Codegen() { Value *L = LHS->Codegen(); Value *R = RHS->Codegen(); if (L == 0 || R == 0) return 0; switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); //   0  1   0.0  1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), "booltmp"); default: return ErrorV("invalid binary operator"); } }

With binary operators all the fun begins. The basic idea here is that we recursively generate code for the left side of the expression, then for the right side, calculating the result of the binary expression. In this code, we made a simple switch with respect to a binary operator to create an LLVM instruction.

In the example above, the LLVM Builder class finally displays its value. IRBuilder knows where to insert the newly created instructions, and all you need to do is specify which instructions to create (for example, "CreateFAdd" ), which operands to use (in this case L and R) and, if necessary, which one to use name for the generated instruction.

Another great feature in LLVM is the assignment of names. For example, if the above uses the variable "addtmp" several times, LLVM will automatically assign a unique numeric suffix each time. Local value names for instructions are optional, but they make IR dumps much easier to read.

LLVM instructions are limited by strict rules: for example, the left and right operands of the addition instruction must be of the same type and the type of the result of the addition will correspond to the type of the operands. Since all values in Kaleidoscope are real numbers, we get very simple code for addition, subtraction and multiplication.

On the other hand, the LLVM specification says that the fcmp instruction always returns the value "i1" (one-bit integer). The problem is that we need a value of 0.0 or 1.0 in Kaleidoscope. To do this, we combine the fcmp instruction with the uitofp instruction . This instruction converts an unsigned integer to floating point. In contrast, if we used the sitofp , the operator '<' would return 0.0 and -1.0, depending on the input value.

 Value *CallExprAST::Codegen() { //      . Function *CalleeF = TheModule->getFunction(Callee); if (CalleeF == 0) return ErrorV("Unknown function referenced"); //    . if (CalleeF->arg_size() != Args.size()) return ErrorV("Incorrect # arguments passed"); std::vector<Value*> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { ArgsV.push_back(Args[i]->Codegen()); if (ArgsV.back() == 0) return 0; } return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); }

Generating an LLVM code to call a function is quite simple. The above code first searches for the function name in the symbol table of the LLVM module. Recall that the LLVM module is a container that contains all the functions that we can use for JIT compilation. By giving each function a name, we can use the LLVM symbol table to allow the search for function names.

After receiving the function to call, we recursively generate the code for each argument that must be passed and create the LLVM call instruction. Note that by default LLVM uses function calling conventions as in C, which allows you to use these calls for standard library functions such as "sin" and "cos" without any extra effort.

This completes the processing of the four main types of expression nodes of the Kaleidoscope language. You can go ahead and add a few more types. For example, when viewing the LLVM language, you will find several other interesting instructions that are really easy to connect to our existing development.

Function Code Generation

Code generation for prototypes and functions should handle a number of details that make their code less beautiful than generating code for expressions, but allows you to illustrate some important points. First of all, let's talk about code generation for prototypes: it is used both for the body of functions and for the declaration of external (external) functions. The code starts with:

 Function *PrototypeAST::Codegen() { //   : double(double,double)  ... std::vector<const Type*> Doubles(Args.size(), Type::getDoubleTy(getGlobalContext())); FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false); Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);

This multi-line code actually shows true power. First, note that this function returns "Function*" instead of "Value*" . Since the “prototype” actually speaks of the external interface for a function (and not a calculated expression), it makes sense for it to return the LLVM function corresponding to the code generated for the function.

Calling "FunctionType::get" creates FunctionType, which should be used for this prototype. Since all the function arguments in Kaleidoscope are real numbers, the first line creates an LLVM vector of “N” real numbers. The FunctionType::get method is then used to create a functional type that takes "N" valid arguments and returns one valid argument as the result. Note that the types in LLVM are unique, like constants, so you need not "new" , but "get" (in the original wordplay: "... so that you do not " create " it, but " get " .") .

The last line actually creates a function corresponding to the prototype. It indicates the type, link and name to use, and the module to insert. “ External linkage ” means that the function can be defined outside the current module and / or that it can be called outside the module. "Name" defines the name specified by the user, "TheModule" indicates that the name is registered in the symbol table "TheModule" .

  //   (F)     ,     'Name'. //     ,      . if (F->getName() != Name) { //   ,      . F->eraseFromParent(); F = TheModule->getFunction(Name);

When it comes to name conflicts, the Module's symbol table works the same way as the symbol table. Functions: if a new function is created with a name that has already been added to the symbol table, then when added to a module, it will be implicitly renamed. The above code uses this fact to determine if the same function was previously declared.

In Kaleidoscope, redefinition of functions can be in two cases. First, if we want to use extern -definition of functions more than once, as long as their prototypes are the same (since all arguments are of the same type, we just need to check the coincidence of the number of arguments). Secondly, if we want to use extern - the definition of functions, and then the definition of the body for them. This is necessary when defining mutually recursive functions.

To implement this, the above code first checks if there is a conflict of function names. If it does, it deletes the newly created function (by calling "eraseFromParent" ), and then calls "getFunction" to get an existing function with the given name. It is worth noting that many APIs in LLVM have both the "erase" form and the "remove" form. The "remove" method detaches (unlink) an object from its parent object (for example, a Function from a Module) and returns it. The "erase" method detaches (unlink) an object and then deletes it.

  //   (F)   , . if (!F->empty()) { ErrorF("redefinition of function"); return 0; } //   (F)    , . if (F->arg_size() != Args.size()) { ErrorF("redefinition of function with different # args"); return 0; } }

In order to confirm the above reasoning, we first check that the “already existing” function is “empty”. In this case, emptiness means that it does not contain basic blocks, that is, the body of the function. If the function has a body, then this is a repeated declaration, so in this case we reject the code. If the previous function is an "extern" function, we simply compare its number of arguments with the number of arguments of the current definition. If they do not match, then we display an error.

  //     . unsigned Idx = 0; for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); ++AI, ++Idx) { AI->setName(Args[Idx]); //      . NamedValues[Args[Idx]] = AI; } return F; }

The last part of the code for processing prototype functions is a loop to the arguments in the function, specifying the names of LLVM objects, the corresponding arguments, and registering the arguments in the NamedValues map for later use by the AST node VariableExprAST . It then returns the function object. Note that we do not check conflicting argument names (for example, "extern Foo (aba)" ). But this can be done quite simply and straightforwardly using the method we used above.

 Function *FunctionAST::Codegen() { NamedValues.clear(); Function *TheFunction = Proto->Codegen(); if (TheFunction == 0) return 0;

Generating code for defining functions begins quite simply: we simply call prototype code generation ( Proto ) and make sure that everything is normal. We also clear the NamedValues map to make sure that nothing remains of the last function we process. Generating a prototype code ensures that we have a ready-made functional object LLVM, so that we can move on.

  //      . BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); Builder.SetInsertPoint(BB); if (Value *RetVal = Body->Codegen()) {

Now we are going to work with Builder. The first line creates a new base unit [ en ] (named "entry" ), which is inserted into TheFunction. The second line tells Builder that new instructions should be inserted at the end of the new base unit. The basic blocks in LLVM are an important part of the function that defines the control flow graph [ en ]. Since we don't have any control flow yet, our functions will contain only one block. But we will fix this in chapter 5 :).

  if (Value *RetVal = Body->Codegen()) { //  . Builder.CreateRet(RetVal); //   ,    (). verifyFunction(*TheFunction); return TheFunction; }

Once the insertion point is set, we call code generation using CodeGen() for the root function expression. If there is no error, then add the code for evaluating the expression to the input block and return the value to be calculated. Then, in the absence of errors, we create an LLVM-instruction "ret" , which terminates the function. After building the function, we call the "verifyFunction" procedure built into LLVM. It performs various checks on the consistency of the generated code to determine that our compiler is doing everything correctly. Using this procedure is very important: it can catch a lot of bugs. When the function is completed and tested, we return it.

  //  ,   . TheFunction->eraseFromParent(); return 0; }

This small part is for error handling. For simplicity, we can handle this simple deletion of the created function using the eraseFromParent method. This will allow the user to override a function that was incorrectly dialed earlier: if we did not delete it, it would be in the symbol table with the body, prohibiting future overrides.

Actually, this code has a bug. "PrototypeAST:: Codegen" can return previously defined declarations, our code can actually delete them. There are several ways to fix this bug, think what you can think of! Here is a small test case:

 extern foo(ab); # o,  "foo". def foo(ab) c; # ,   'c'. def bar() foo(1, 2); # ,   "foo"

Changes in the main program and final thoughts

In fact, so far the code generation in LLVM has given us a little bit, except that we can look at interesting IR calls. Our code will call code generation with Codegen in the functions “HandleDefinition”, “HandleExtern”, etc., and then dump the LLVM IR. Because of this, we can look at LLVM IR for simple functions. For example:

  ready> 4 + 5;
 Read top-level expression:
 define double @ "" () {
 entry:
         ret double 9.000000e + 00
 }

Notice how the parser returns the top-level expression as an anonymous function. This will be useful when we add JIT support in the next chapter. Also note that the code is translated “very literally,” “straightforward,” there are no optimizations except for simply folding the constants made by IRBuilder . We will add optimization explicitly in the next chapter.

  ready> def foo (ab) a * a + 2 * a * b + b * b;
 Read function definition:
 define double @foo (double% a, double% b) {
 entry:
         % multmp = fmul double% a,% a
         % multmp1 = fmul double 2.000000e + 00,% a
         % multmp2 = fmul double% multmp1,% b
         % addtmp = fadd double% multmp,% multmp2
         % multmp3 = fmul double% b,% b
         % addtmp4 = fadd double% addtmp,% multmp3
         ret double% addtmp4
 }

Here are some simple arithmetic operations. Notice the striking resemblance to the Builder' LLVM calls that we use to create instructions.

  ready> def bar (a) foo (a, 4.0) + bar (31337);
 Read function definition:
 define double @bar (double% a) {
 entry:
         % calltmp = call double @foo (double% a, double 4.000000e + 00)
         % calltmp1 = call double @bar (double 3.133700e + 04)
         % addtmp = fadd double% calltmp,% calltmp1
         ret double% addtmp
 }

Here are a few function calls. Note that this function will take a long time to execute if you call it. In the future, we will add conditional flow control, making the recursion really useful :).

  ready> extern cos (x);
 Read extern: 
 declare double @cos (double)

ready> cos (1.234);

 Read top-level expression:
 define double @ "" () {
 entry:
         % calltmp = call double @cos (double 1.234000e + 00)
         ret double% calltmp
 }

extern here is the extern for the library function "cos" and its call.

  ready> ^ D
 ;  ModuleID = 'my cool jit'

 define double @ "" () {
 entry:
         % addtmp = fadd double 4.000000e + 00, 5.000000e + 00
         ret double% addtmp
 }

 define double @foo (double% a, double% b) {
 entry:
         % multmp = fmul double% a,% a
         % multmp1 = fmul double 2.000000e + 00,% a
         % multmp2 = fmul double% multmp1,% b
         % addtmp = fadd double% multmp,% multmp2
         % multmp3 = fmul double% b,% b
         % addtmp4 = fadd double% addtmp,% multmp3
         ret double% addtmp4
 }

 define double @bar (double% a) {
 entry:
         % calltmp = call double @foo (double% a, double 4.000000e + 00)
         % calltmp1 = call double @bar (double 3.133700e + 04)
         % addtmp = fadd double% calltmp,% calltmp1
         ret double% addtmp
 }

 declare double @cos (double)

 define double @ "" () {
 entry:
         % calltmp = call double @cos (double 1.234000e + 00)
         ret double% calltmp
 }

When exiting the current demo, an IR dump for the entire generated module will be returned. .

Kaleidoscope. JIT , !

, , LLVM. LLVM, . llvm-config :

# g++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy # ./toy

, , :

 #include "llvm/DerivedTypes.h" #include "llvm/LLVMContext.h" #include "llvm/Module.h" #include "llvm/Analysis/Verifier.h" #include "llvm/Support/IRBuilder.h" #include <cstdio> #include <string> #include <map> #include <vector> using namespace llvm; //===----------------------------------------------------------------------===// // Lexer ( ) //===----------------------------------------------------------------------===// //     [0-255],   , //       enum Token { tok_eof = -1, //  ( ) tok_def = -2, tok_extern = -3, //  ( : , ) tok_identifier = -4, tok_number = -5 }; static std::string IdentifierStr; // ,  tok_identifier static double NumVal; // ,  tok_number /// gettok -       . static int gettok() { static int LastChar = ' '; //  . while (isspace(LastChar)) LastChar = getchar(); if (isalpha(LastChar)) { // : [a-zA-Z][a-zA-Z0-9]* IdentifierStr = LastChar; while (isalnum((LastChar = getchar()))) IdentifierStr += LastChar; if (IdentifierStr == "def") return tok_def; if (IdentifierStr == "extern") return tok_extern; return tok_identifier; } if (isdigit(LastChar) || LastChar == '.') { // : [0-9.]+ std::string NumStr; do { NumStr += LastChar; LastChar = getchar(); } while (isdigit(LastChar) || LastChar == '.'); NumVal = strtod(NumStr.c_str(), 0); return tok_number; } if (LastChar == '#') { //     do LastChar = getchar(); while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); if (LastChar != EOF) return gettok(); } //   . if (LastChar == EOF) return tok_eof; //         ASCII int ThisChar = LastChar; LastChar = getchar(); return ThisChar; } //===----------------------------------------------------------------------===// // Abstract Syntax Tree (     ) //===----------------------------------------------------------------------===// /// ExprAST -      . class ExprAST { public: virtual ~ExprAST() {} virtual Value *Codegen() = 0; }; /// NumberExprAST -       (, "1.0"). class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double val) : Val(val) {} virtual Value *Codegen(); }; /// VariableExprAST -      (, "a"). class VariableExprAST : public ExprAST { std::string Name; public: VariableExprAST(const std::string &name) : Name(name) {} virtual Value *Codegen(); }; /// BinaryExprAST -      . class BinaryExprAST : public ExprAST { char Op; ExprAST *LHS, *RHS; public: BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) : Op(op), LHS(lhs), RHS(rhs) {} virtual Value *Codegen(); }; /// CallExprAST -      . class CallExprAST : public ExprAST { std::string Callee; std::vector<ExprAST*> Args; public: CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) : Callee(callee), Args(args) {} virtual Value *Codegen(); }; /// PrototypeAST -    ""  , ///        (,  , ///    ). class PrototypeAST { std::string Name; std::vector<std::string> Args; public: PrototypeAST(const std::string &name, const std::vector<std::string> &args) : Name(name), Args(args) {} Function *Codegen(); }; /// FunctionAST -     class FunctionAST { PrototypeAST *Proto; ExprAST *Body; public: FunctionAST(PrototypeAST *proto, ExprAST *body) : Proto(proto), Body(body) {} Function *Codegen(); }; //===----------------------------------------------------------------------===// // Parser (   ) //===----------------------------------------------------------------------===// /// CurTok/getNextToken -    . CurTok -   /// ,  . getNextToken     ///     CurTok. static int CurTok; static int getNextToken() { return CurTok = gettok(); } /// BinopPrecedence -      static std::map<char, int> BinopPrecedence; /// GetTokPrecedence -     . static int GetTokPrecedence() { if (!isascii(CurTok)) return -1; // ,     . int TokPrec = BinopPrecedence[CurTok]; if (TokPrec <= 0) return -1; return TokPrec; } /// Error* -       . ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } static ExprAST *ParseExpression(); /// identifierexpr /// ::= identifier /// ::= identifier '(' expression* ')' static ExprAST *ParseIdentifierExpr() { std::string IdName = IdentifierStr; getNextToken(); //  . if (CurTok != '(') //  . return new VariableExprAST(IdName); //  . getNextToken(); //  ( std::vector<ExprAST*> Args; if (CurTok != ')') { while (1) { ExprAST *Arg = ParseExpression(); if (!Arg) return 0; Args.push_back(Arg); if (CurTok == ')') break; if (CurTok != ',') return Error("Expected ')' or ',' in argument list"); getNextToken(); } } //  ')'. getNextToken(); return new CallExprAST(IdName, Args); } /// numberexpr ::= number static ExprAST *ParseNumberExpr() { ExprAST *Result = new NumberExprAST(NumVal); getNextToken(); //   return Result; } /// parenexpr ::= '(' expression ')' static ExprAST *ParseParenExpr() { getNextToken(); //  (. ExprAST *V = ParseExpression(); if (!V) return 0; if (CurTok != ')') return Error("expected ')'"); getNextToken(); //  ). return V; } /// primary /// ::= identifierexpr /// ::= numberexpr /// ::= parenexpr static ExprAST *ParsePrimary() { switch (CurTok) { default: return Error("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': return ParseParenExpr(); } } /// binoprhs /// ::= ('+' primary)* static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { //    ,    while (1) { int TokPrec = GetTokPrecedence(); //           , //  ,    if (TokPrec < ExprPrec) return LHS; // ,  ,    . int BinOp = CurTok; getNextToken(); //    //       ExprAST *RHS = ParsePrimary(); if (!RHS) return 0; //  BinOp   RHS  ,    RHS, //      RHS  LHS. int NextPrec = GetTokPrecedence(); if (TokPrec < NextPrec) { RHS = ParseBinOpRHS(TokPrec+1, RHS); if (RHS == 0) return 0; } //  LHS/RHS. LHS = new BinaryExprAST(BinOp, LHS, RHS); } } /// expression /// ::= primary binoprhs /// static ExprAST *ParseExpression() { ExprAST *LHS = ParsePrimary(); if (!LHS) return 0; return ParseBinOpRHS(0, LHS); } /// prototype /// ::= id '(' id* ')' static PrototypeAST *ParsePrototype() { if (CurTok != tok_identifier) return ErrorP("Expected function name in prototype"); std::string FnName = IdentifierStr; getNextToken(); if (CurTok != '(') return ErrorP("Expected '(' in prototype"); //    . std::vector<std::string> ArgNames; while (getNextToken() == tok_identifier) ArgNames.push_back(IdentifierStr); if (CurTok != ')') return ErrorP("Expected ')' in prototype"); //  . getNextToken(); //  ')'. return new PrototypeAST(FnName, ArgNames); } /// definition ::= 'def' prototype expression static FunctionAST *ParseDefinition() { getNextToken(); //  def. PrototypeAST *Proto = ParsePrototype(); if (Proto == 0) return 0; if (ExprAST *E = ParseExpression()) return new FunctionAST(Proto, E); return 0; } /// toplevelexpr ::= expression static FunctionAST *ParseTopLevelExpr() { if (ExprAST *E = ParseExpression()) { //   . PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); return new FunctionAST(Proto, E); } return 0; } /// external ::= 'extern' prototype static PrototypeAST *ParseExtern() { getNextToken(); //  extern. return ParsePrototype(); } //===----------------------------------------------------------------------===// // Code Generation () //===----------------------------------------------------------------------===// static Module *TheModule; static IRBuilder<> Builder(getGlobalContext()); static std::map<std::string, Value*> NamedValues; Value *ErrorV(const char *Str) { Error(Str); return 0; } Value *NumberExprAST::Codegen() { return ConstantFP::get(getGlobalContext(), APFloat(Val)); } Value *VariableExprAST::Codegen() { //      . Value *V = NamedValues[Name]; return V ? V : ErrorV("Unknown variable name"); } Value *BinaryExprAST::Codegen() { Value *L = LHS->Codegen(); Value *R = RHS->Codegen(); if (L == 0 || R == 0) return 0; switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); //   0  1   0.0  1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), "booltmp"); default: return ErrorV("invalid binary operator"); } } Value *CallExprAST::Codegen() { //      . Function *CalleeF = TheModule->getFunction(Callee); if (CalleeF == 0) return ErrorV("Unknown function referenced"); //    . if (CalleeF->arg_size() != Args.size()) return ErrorV("Incorrect # arguments passed"); std::vector<Value*> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { ArgsV.push_back(Args[i]->Codegen()); if (ArgsV.back() == 0) return 0; } return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); } Function *PrototypeAST::Codegen() { //   : double(double,double)  .. std::vector<const Type*> Doubles(Args.size(), Type::getDoubleTy(getGlobalContext())); FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false); Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); //   (F)     ,     'Name'. //     ,      . if (F->getName() != Name) { //   ,      . F->eraseFromParent(); F = TheModule->getFunction(Name); //   (F)   , . if (!F->empty()) { ErrorF("redefinition of function"); return 0; } //   (F)    , . if (F->arg_size() != Args.size()) { ErrorF("redefinition of function with different # args"); return 0; } } //     . unsigned Idx = 0; for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); ++AI, ++Idx) { AI->setName(Args[Idx]); //      . NamedValues[Args[Idx]] = AI; } return F; } Function *FunctionAST::Codegen() { NamedValues.clear(); Function *TheFunction = Proto->Codegen(); if (TheFunction == 0) return 0; //      . BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); Builder.SetInsertPoint(BB); if (Value *RetVal = Body->Codegen()) { //  . Builder.CreateRet(RetVal); //   ,    (). verifyFunction(*TheFunction); return TheFunction; } //  ,   . TheFunction->eraseFromParent(); return 0; } //===----------------------------------------------------------------------===// // Top-Level parsing (  )   JIT //===----------------------------------------------------------------------===// static void HandleDefinition() { if (FunctionAST *F = ParseDefinition()) { if (Function *LF = F->Codegen()) { fprintf(stderr, "Read function definition:"); LF->dump(); } } else { //      . getNextToken(); } } static void HandleExtern() { if (PrototypeAST *P = ParseExtern()) { if (Function *F = P->Codegen()) { fprintf(stderr, "Read extern: "); F->dump(); } } else { //      . getNextToken(); } } static void HandleTopLevelExpression() { // Evaluate a top-level expression into an anonymous function. if (FunctionAST *F = ParseTopLevelExpr()) { if (Function *LF = F->Codegen()) { fprintf(stderr, "Read top-level expression:"); LF->dump(); } } else { //      . getNextToken(); } } /// top ::= definition | external | expression | ';' static void MainLoop() { while (1) { fprintf(stderr, "ready> "); switch (CurTok) { case tok_eof: return; case ';': getNextToken(); break; //     . case tok_def: HandleDefinition(); break; case tok_extern: HandleExtern(); break; default: HandleTopLevelExpression(); break; } } } //===----------------------------------------------------------------------===// // "" ,     //   ("extern")   . //===----------------------------------------------------------------------===// /// putchard -        0. extern "C" double putchard(double X) { putchar((char)X); return 0; } //===----------------------------------------------------------------------===// // Main driver code (  ) //===----------------------------------------------------------------------===// int main() { LLVMContext &Context = getGlobalContext(); //    . // 1 -  . BinopPrecedence['<'] = 10; BinopPrecedence['+'] = 20; BinopPrecedence['-'] = 20; BinopPrecedence['*'] = 40; //  . fprintf(stderr, "ready> "); getNextToken(); //  ,     . TheModule = new Module("my cool jit", Context); //    " ". MainLoop(); //   . TheModule->dump(); return 0; }

Source: https://habr.com/ru/post/120424/

All Articles

Creating a programming language using LLVM. Part 3: LLVM IR Code Generation

Configuring for code generation

Expression code generation

Function Code Generation

Changes in the main program and final thoughts

More articles: