LLVM generates code that generates code

Back from last week’s LLVM Developers’ Meeting, I’d like to talk about one of my favorite features of LLVM.

LLVM is the compiler infrastructure that underlies Clang, Vuo, and many other projects. It’s a set of libraries to help you build compilers (and more). Among other things, LLVM provides a C++ API for generating LLVM Intermediate Representation (LLVM IR) code. LLVM IR is an assembly language for a hypothetical computer. LLVM IR code can be either interpreted or compiled down to native code.

So LLVM provides this C++ API for generating LLVM IR code — but it doesn’t stop there. LLVM can also generate C++ code that generates LLVM IR code. In other words, LLVM can literally write part of your compiler for you! Here’s how it works. Suppose you want to generate LLVM IR for a for loop, but you’re not sure how. First, write a for loop in C. Then, feed that to LLVM. (Details below.) LLVM spits out C++ code that calls into LLVM’s code generation API. If you run the C++ code, it generates LLVM IR that’s functionally equivalent to the for loop you wrote in C.
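To make the for-loop case concrete, here is a minimal sketch of the kind of C input you might hand to LLVM. The function name sumBelow and its body are my own illustration, not anything LLVM requires; any small function containing the construct you're curious about would do.

```c
/* Hypothetical input file: a plain for loop whose LLVM IR we want to learn
 * how to generate. Returns the sum of the integers 0, 1, ..., n-1. */
int sumBelow(int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += i;   /* loop body: accumulate the counter */
    }
    return total;
}
```

With optimization turned off, I'd expect the generated C++ to create several basic blocks and the branches between them, which is exactly the part that's tedious to work out by hand.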

Dizzy yet? It makes more sense with an example, so read on.

The web interface: llvm.org/demo

The easiest way to make LLVM generate C++ code is to go to llvm.org/demo and, under Output Options, choose LLVM C++ API code for the target.

Let’s walk through a simple example. Suppose we want to generate LLVM IR that stores an integer on the stack. We’ll fill out the web form as shown in this image:

(For those who can’t see the image, I entered void foo() { int stackVariable = 42; } in the source code field, chose C for the source language, Standard for the optimization level, and LLVM C++ API code for the target.)

Now let’s hit Compile Source Code and… whoops. The C++ code for the foo function makes no mention of stackVariable:

// Function: foo (func_foo)
 {
 
  BasicBlock* label_1 = BasicBlock::Create(mod->getContext(), "",func_foo,0);
 
  // Block  (label_1)
  ReturnInst::Create(mod->getContext(), label_1);
 
 }

Looks like stackVariable got optimized away.

That’s easy to fix. For the optimization level, change from Standard to None¹. Now the C++ code for the foo function is more informative:

// Function: foo (func_foo)
 {
 
  BasicBlock* label_4 = BasicBlock::Create(mod->getContext(), "",func_foo,0);
 
  // Block  (label_4)
  AllocaInst* ptr_stackVariable = new AllocaInst(IntegerType::get(mod->getContext(), 32), "stackVariable", label_4);
  ptr_stackVariable->setAlignment(4);
  StoreInst* void_5 = new StoreInst(const_int32_3, ptr_stackVariable, false, label_4);
  void_5->setAlignment(4);
  ReturnInst::Create(mod->getContext(), label_4);
 
 }

At the top of the generated C++ code, you’ll notice a stern message: Generated by llvm2cpp - DO NOT MODIFY! Well, fasten your safety belts because, in the interest of writing readable C++ code, we’re going to MODIFY it anyway.

Since all we care about is storing an integer on the stack, we can eliminate a lot of cruft. The relevant code² (which I’ll wrap in a function) is:

void generateIntegerStore(Module* mod, BasicBlock* label_4)
{
  ConstantInt* const_int32_3 = ConstantInt::get(mod->getContext(), APInt(32, StringRef("42"), 10));
 
  AllocaInst* ptr_stackVariable = new AllocaInst(IntegerType::get(mod->getContext(), 32), "stackVariable", label_4);
  ptr_stackVariable->setAlignment(4);
  StoreInst* void_5 = new StoreInst(const_int32_3, ptr_stackVariable, false, label_4);
  void_5->setAlignment(4);
}

Let’s remove the calls to setAlignment. Specifying alignment is optional and makes your code target-dependent. (I wish LLVM didn’t generate these calls by default!)

Let’s simplify the APInt by passing an integer instead of a StringRef: APInt(32, 42). In the generated code, this still creates a 32-bit integer literal having value 42.

Finally, let’s give the variables more readable names.

The final code is:

void generateIntegerStore(Module* mod, BasicBlock* block)
{
  ConstantInt* valueToStore = ConstantInt::get(mod->getContext(), APInt(32, 42));
 
  AllocaInst* ptr_stackVariable = new AllocaInst(IntegerType::get(mod->getContext(), 32), "stackVariable", block);
  new StoreInst(valueToStore, ptr_stackVariable, false, block);
}

The command-line interface: llc

The web interface is convenient, but sometimes you need more control. Maybe you’re using a different version of LLVM than the one in the web interface. Maybe you want to #include a custom header file (which, though possible in the web interface, is a little complicated). Or, heck, maybe you’re just away from internet access.

In any case, you can run the same command-line tool that the web interface is using: llc³. If you type man llc, you see that llc “compiles LLVM source inputs into assembly language for a specified architecture.” For the architecture, one of the choices (which you don’t see until you do llc --help) is C++. Which isn’t exactly an architecture, but that’s OK.

To generate C++ code (output.cpp) that generates LLVM IR which is functionally equivalent to some C code (input.c), do this:

clang -emit-llvm -c input.c -o tmp.bc
llc -march=cpp tmp.bc -o output.cpp

Unlike in the web interface, your C code must have a main function (or else llc gives an error).

If you want to turn off optimization, add the -O0 argument:

llc -O0 -march=cpp tmp.bc -o output.cpp

Conclusion

So LLVM generates C++ code that generates LLVM IR. What does this mean for you?

- You don’t have to learn the LLVM IR language before you start using LLVM’s code generation API. Learn as you go.
- In addition to static documentation like the Kaleidoscope tutorial and the LLVM Language Reference Manual, you always have access to your very own dynamic, up-to-date, customized example code.
- LLVM can write part of your compiler for you. As long as you can write C or C++ code that’s equivalent to the LLVM IR you want to generate, then LLVM can write C++ LLVM API code to generate it.
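The same recipe should carry over to other constructs. As a sketch, if you wanted to see how a conditional is built through the API, you could feed LLVM something like the function below. The name clampToZero is hypothetical, chosen only for this illustration.

```c
/* Hypothetical input file: an if/else whose LLVM IR we want to learn
 * how to generate. Negative inputs become 0; everything else passes through. */
int clampToZero(int x)
{
    if (x < 0) {
        return 0;   /* taken when the comparison is true */
    } else {
        return x;   /* taken otherwise */
    }
}
```

As with the stack variable example, it's worth turning optimization off first. Otherwise the optimizer is free to rewrite the branch into something more compact, and the generated C++ won't show the comparison and conditional branch you were hoping to learn from.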

A compiler infrastructure that writes part of your compiler for you? Now that’s handy.

Jaymie Strecker is a software developer at Kosada, Inc. and one of the creators of Vuo.

¹ Don’t worry about generating optimized code yourself. You can always tell LLVM to run optimization passes later.
² In case you’re wondering where const_int32_3 came from: although it was not in the earlier snippet for func_foo, it was elsewhere in LLVM’s output.
³ For older versions of LLVM, the tool was llvm2cpp. It was folded into llc in LLVM 2.3.