Have you tried Clang yet? Clang is an open-source compiler, under active development, that aims to replace GCC for compiling C, C++, and Objective-C. Compared to GCC, Clang generally compiles faster while generating comparably fast code, and it prints more useful error messages.
Clang is also better for developers who want to compile code programmatically. Unlike GCC, Clang is designed to be both a tool and an API. That makes Clang’s source code easier to understand and reuse. And, for those of us working on projects incompatible with the GPL, which GCC uses, it’s good to know that Clang is distributed under a BSD-style license (the University of Illinois/NCSA Open Source License).
Kosada is working on a cool new project that’s built on top of Clang and its underlying framework, LLVM. While using Clang for this project, I’ve been pleased to see how simple it is to write code that builds other code. Simple in retrospect, anyway! The code I wrote turned out to be simple, but it took lots of digging through the Clang source code to figure out what to write. So here’s my first contribution to the Clang community: two examples of using the Clang API to build code programmatically. The program compiled by these examples is one of the libcurl examples, getinmemory.c. I picked it because it demonstrates including and linking a library.
The examples refer to the Clang source code. You can download it here. I’m using version 3.1.
You can download the code for the examples here.
Example: Build a .c file to an executable
You have a .c file. You want to compile and link it to create an executable.
Briefly, here’s how: You create a Driver object. You give it a list of arguments, the same arguments you’d pass if you were to run clang on the command line. You tell the driver to build a Compilation object and execute it. Congratulations, you just compiled and linked your .c file.
That’s basically what is done by the code that gets invoked when you run clang on the command line. In the Clang source code, that’s tools/driver/driver.cpp. The example below (build_executable.cpp) is a super simplified version of that.
This example compiles and links a .c file. The first step is to set up the arguments to the Driver:
    // Path to the C file
    string inputPath = "getinmemory.c";

    // Path to the executable
    string outputPath = "getinmemory";

    // Path to clang (e.g. /usr/local/bin/clang)
    llvm::sys::Path clangPath = llvm::sys::Program::FindProgramByName("clang");

    // Arguments to pass to the clang driver:
    //    clang getinmemory.c -lcurl
    vector<const char *> args;
    args.push_back(clangPath.c_str());
    args.push_back(inputPath.c_str());
    args.push_back("-l");
    args.push_back("curl");
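One thing to watch with this pattern: the vector holds raw pointers obtained from c_str(), so the strings they point into have to outlive the vector. The sketch below shows one way to keep the strings and the pointer list together. ArgList is a hypothetical helper written for this illustration; it is not part of Clang or LLVM.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Clang): owns the argument strings and
// hands out the const char * view that driver-style APIs expect.
class ArgList {
public:
    void add(const std::string &arg) { storage_.push_back(arg); }

    // Rebuilds the pointer view on each call, so the pointers always refer
    // to the strings currently held in storage_. They stay valid only until
    // the next add() or until this ArgList is destroyed.
    std::vector<const char *> argv() const {
        std::vector<const char *> out;
        out.reserve(storage_.size());
        for (const std::string &s : storage_)
            out.push_back(s.c_str());
        return out;
    }

    std::size_t size() const { return storage_.size(); }

private:
    std::vector<std::string> storage_;
};
```

With something like this, you would add each argument as a string, call argv() once all arguments are in place, and pass the result wherever a vector<const char *> is expected.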
The Driver needs a DiagnosticsEngine so it can report problems, so construct one of those:
    clang::TextDiagnosticPrinter *DiagClient = new clang::TextDiagnosticPrinter(llvm::errs(), clang::DiagnosticOptions());
    clang::IntrusiveRefCntPtr<clang::DiagnosticIDs> DiagID(new clang::DiagnosticIDs());
    clang::DiagnosticsEngine Diags(DiagID, DiagClient);
Construct the Driver itself:
    clang::driver::Driver TheDriver(args[0], llvm::sys::getDefaultTargetTriple(), outputPath, true, Diags);
The Driver doesn’t know how to do the grunt work of compiling or linking the code. It’s more of a project manager. It figures out which tasks need to be done and tells other parts of Clang, or other tools like ld, to do them. The list of tasks is encapsulated in a Compilation object. You need to construct a Compilation and then execute it:
    // Create the set of actions to perform
    clang::OwningPtr<clang::driver::Compilation> C(TheDriver.BuildCompilation(args));

    // Carry out the actions
    int Res = 0;
    const clang::driver::Command *FailingCommand = 0;
    if (C)
        Res = TheDriver.ExecuteCompilation(*C, FailingCommand);
If anything went wrong with the execution, you can print the errors:
    if (Res < 0)
        TheDriver.generateCompilationDiagnostics(*C, FailingCommand);
Bonus: Print the tasks of the Compilation
In case you’re wondering what exactly those “tasks” are in the Compilation object, you can print them like this:
TheDriver.PrintActions(*C);
The output is something like this:
    0: input, "getinmemory.c", c
    1: preprocessor, {0}, cpp-output
    2: compiler, {1}, assembler
    3: assembler, {2}, object
    4: input, "curl", object
    5: linker, {3, 4}, image
    6: bind-arch, "x86_64", {5}, image
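One way to read that listing is as a small dependency graph: each numbered step names the earlier steps it consumes (the numbers in braces) and the type of output it produces. The sketch below models the first six steps as plain data and prints them in the same style. ToyAction, describe, and exampleActions are illustrative names made up for this sketch; they are not Clang's own classes or functions.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a driver step: a kind, the indices of the steps it
// consumes, and the type of output it produces. Not a Clang class.
struct ToyAction {
    std::string kind;         // e.g. "input", "compiler", "linker"
    std::string source;       // file or library name; used by inputs only
    std::vector<int> inputs;  // indices of earlier steps this one consumes
    std::string outputType;   // e.g. "c", "object", "image"
};

// Format the list in the same style as the listing above.
std::string describe(const std::vector<ToyAction> &actions) {
    std::ostringstream out;
    for (std::size_t i = 0; i < actions.size(); ++i) {
        const ToyAction &a = actions[i];
        out << i << ": " << a.kind << ", ";
        if (a.inputs.empty()) {
            out << '"' << a.source << '"';
        } else {
            out << '{';
            for (std::size_t j = 0; j < a.inputs.size(); ++j) {
                if (j > 0) out << ", ";
                out << a.inputs[j];
            }
            out << '}';
        }
        out << ", " << a.outputType << '\n';
    }
    return out.str();
}

// The first six steps from the listing, expressed as data.
std::vector<ToyAction> exampleActions() {
    return {
        {"input", "getinmemory.c", {}, "c"},
        {"preprocessor", "", {0}, "cpp-output"},
        {"compiler", "", {1}, "assembler"},
        {"assembler", "", {2}, "object"},
        {"input", "curl", {}, "object"},
        {"linker", "", {3, 4}, "image"},
    };
}
```

Seen this way, the linker step depends on two things: the object file produced by the assembler step and the curl library that was passed in as an input.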
Bonus: Print “verbose” information for debugging
Whether running clang on the command line or through the Clang API, you can print extra information to help you debug by passing the -v flag.
args.push_back("-v"); // verbose
The output is something like this:
    clang version 3.1 (branches/release_31)
    Target: x86_64-apple-darwin10.8.0
    Thread model: posix
    "/usr/local/Cellar/llvm/3.1/bin/clang" -cc1 -triple x86_64-apple-macosx10.6.0 -emit-obj -mrelax-all -disable-free -main-file-name getinmemory.c -pic-level 2 -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu core2 -target-linker-version 97.17 -v -resource-dir /usr/local/Cellar/llvm/3.1/bin/../lib/clang/3.1 -fmodule-cache-path /var/folders/l0/l0JTY1yrHVyI-wLWRDrCW++++TI/-Tmp-/clang-module-cache -fdebug-compilation-dir "/Users/jaymie/kosada/fdiv/Clang API" -ferror-limit 19 -fmessage-length 111 -stack-protector 1 -mstackrealign -fblocks -fobjc-dispatch-method=mixed -fobjc-default-synthesize-properties -fdiagnostics-show-option -fcolor-diagnostics -o /var/folders/l0/l0JTY1yrHVyI-wLWRDrCW++++TI/-Tmp-/getinmemory-azlq7U.o -x c getinmemory.c
    clang -cc1 version 3.1 based upon LLVM 3.1 default target x86_64-apple-darwin10.8.0
    #include "..." search starts here:
    #include <...> search starts here:
    /usr/local/include
    /usr/local/Cellar/llvm/3.1/bin/../lib/clang/3.1/include
    /usr/include
    /System/Library/Frameworks (framework directory)
    /Library/Frameworks (framework directory)
    End of search list.
    "/usr/bin/ld" -dynamic -arch x86_64 -macosx_version_min 10.6.0 -o getinmemory -lcrt1.10.6.o /var/folders/l0/l0JTY1yrHVyI-wLWRDrCW++++TI/-Tmp-/getinmemory-azlq7U.o -lcurl -lSystem
The output shows that the Driver is invoking clang to do the compiling and ld to do the linking. As you can see, the Driver adds arguments of its own to each invocation, in addition to the ones we passed in. The -v flag shows you exactly how the compiler and linker are being invoked.
Bonus: Build a C++ file
If the file you’re compiling is C++ instead of C, you can tell the Driver to act like clang++ instead of clang:
TheDriver.CCCIsCXX = true;
Conclusion
The Driver class lets your program interact with Clang in pretty much the same way that you would interact with it on the command line. Your program could accomplish exactly the same thing by forking/spawning a process that invokes command-line clang. The advantages of using the Clang API instead of command-line clang are:
- You don’t have to fork/spawn a process yourself. That’s one less process, and it’s one less OS-dependent piece of code in your program.
- You get more control when the build fails. You don’t just get a return code and a printout of errors and warnings. You get data structures representing the compilation, the errors and warnings, and the command that failed.
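For comparison, the fork/spawn route usually comes down to assembling a command string and handing it to the shell. Here is a minimal sketch of that string-building step, assuming a POSIX-style shell. shellQuote and buildCommand are hypothetical helpers written for this illustration, not part of any library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Wrap one argument in single quotes so a POSIX shell treats it as a single
// word. An embedded single quote is written as '\'' (close the quoted run,
// emit an escaped quote, reopen the quoted run).
std::string shellQuote(const std::string &arg) {
    std::string quoted = "'";
    for (char c : arg) {
        if (c == '\'')
            quoted += "'\\''";
        else
            quoted += c;
    }
    quoted += "'";
    return quoted;
}

// Join the quoted arguments into one command line.
std::string buildCommand(const std::vector<std::string> &args) {
    std::string command;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i > 0) command += ' ';
        command += shellQuote(args[i]);
    }
    return command;
}
```

A string built this way could be handed to std::system(), which reports back only a status value while the diagnostics go to the terminal as text. That is the limitation the second point above describes, and the quoting itself is the kind of OS-dependent code the first point lets you avoid.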
Example: Compile a .c file to a Module
This example illustrates another advantage of using the Clang API:
- You get access to intermediate, in-memory representations of the program being compiled.
One intermediate, in-memory representation of a program that you’re likely to use is a Module object. A Module is a translation unit of an input program. Basically, you get one Module per .c file that you compile.
In this example, you have a .c file. You want to create a Module object.
You could do it with the Driver class. You’d compile the .c file to a .bc file and then read in the .bc file. That would give you a Module. But it would also give you the overhead of writing and reading the .bc file. You don’t need that file; all you need is the in-memory Module.
There’s a more efficient route from .c file to Module. In the Clang source code, it’s demonstrated in examples/clang-interpreter/main.cpp, an example that uses the Clang API to implement a C interpreter. Instead of using Clang’s “driver” classes, as in our previous example, the C interpreter example uses the “frontend” classes. That’s what we’ll do in the example below (compile_to_module.cpp).
This example compiles a .c file into an in-memory Module, then prints the names of all functions in the Module. Like the example above of compiling and linking an executable, this example begins by building a list of arguments and a DiagnosticsEngine.
    // Path to the C file
    string inputPath = "getinmemory.c";

    // Arguments to pass to the clang frontend
    vector<const char *> args;
    args.push_back(inputPath.c_str());

    // The compiler invocation needs a DiagnosticsEngine so it can report problems
    clang::TextDiagnosticPrinter *DiagClient = new clang::TextDiagnosticPrinter(llvm::errs(), clang::DiagnosticOptions());
    llvm::IntrusiveRefCntPtr<clang::DiagnosticIDs> DiagID(new clang::DiagnosticIDs());
    clang::DiagnosticsEngine Diags(DiagID, DiagClient);
The arguments and DiagnosticsEngine get encapsulated in a CompilerInvocation:
    // Create the compiler invocation
    llvm::OwningPtr<clang::CompilerInvocation> CI(new clang::CompilerInvocation);
    clang::CompilerInvocation::CreateFromArgs(*CI, &args[0], &args[0] + args.size(), Diags);
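The call above passes the arguments as a pair of pointers marking a half-open range: the first argument and one past the last. The sketch below shows that convention in isolation. joinArgs is a hypothetical stand-in, not a Clang function; it only joins the arguments so the range can be inspected.

```cpp
#include <string>
#include <vector>

// Stand-in for an API that takes its arguments as a half-open range
// [begin, end) of C strings. Hypothetical; it just joins them with spaces.
std::string joinArgs(const char *const *begin, const char *const *end) {
    std::string joined;
    for (const char *const *it = begin; it != end; ++it) {
        if (it != begin) joined += ' ';
        joined += *it;
    }
    return joined;
}

// Passing a vector as such a range. data() is well defined even when the
// vector is empty, whereas &args[0] is not.
std::string joinArgs(const std::vector<const char *> &args) {
    return joinArgs(args.data(), args.data() + args.size());
}
```

In the example here the vector always holds at least the input path, so &args[0] is fine. The data() form only matters if the argument list might ever be empty.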
Now you need a CompilerInstance. (Yes, the Clang API has a class called Compilation, a class called CompilerInvocation, and a class called CompilerInstance.) The frontend classes, CompilerInvocation and CompilerInstance, play a role similar to that of the driver classes, Driver and Compilation, used in the above example. Both the frontend classes and the driver classes take some command-line-style arguments and then compile some code. One important difference between them is that the driver classes can invoke other tools like ld, whereas the frontend classes can only handle tasks native to Clang. Returning now to the example code, the next step is to construct the CompilerInstance and associate it with the CompilerInvocation:
    clang::CompilerInstance Clang;
    Clang.setInvocation(CI.take());
Set up diagnostics so the CompilerInstance can report problems:
    Clang.createDiagnostics(args.size(), &args[0]);
    if (!Clang.hasDiagnostics())
        return 1;
Create an action for the compiler to carry out. A frontend “action” is a little like a driver “task”, in that it’s a step to be carried out while building a program. A task is something like “compile”, “assemble”, “link”, whereas an action is something like “dump AST”, “emit assembly”, “emit bitcode”, “print preprocessed input”. In the Clang source code, you can see a list of all actions in lib/FrontendTool/ExecuteCompilerInvocation.cpp. For this example, the action is “emit LLVM only”:
llvm::OwningPtr<clang::CodeGenAction> Act(new clang::EmitLLVMOnlyAction());
Carry out the action:
    if (!Clang.ExecuteAction(*Act))
        return 1;
Grab the resulting Module:
llvm::Module *module = Act->takeModule();
Just to make sure we got the Module correctly, print all functions defined or used in the Module:
    for (llvm::Module::FunctionListType::iterator i = module->getFunctionList().begin(); i != module->getFunctionList().end(); ++i)
        printf("%s\n", i->getName().str().c_str());
Bonus: Return the Module from a function
What if you want to write a function that starts with a .c file and returns a Module? You could just take the example code above and wrap it up in a function, right? Actually, no. This doesn’t work:
    // Bad example! Do not copy!
    llvm::Module * getModule(void)
    {
        ...
        llvm::OwningPtr<clang::CodeGenAction> Act(new clang::EmitLLVMOnlyAction());
        if (!Clang.ExecuteAction(*Act))
            return NULL;
        llvm::Module *module = Act->takeModule();
        return module;
    }
You’ll find that the returned Module doesn’t have any functions in it.
(Edit: Revised this section based on one of the comments.)
What went wrong? It turns out that CodeGenAction, because it’s wrapped in an OwningPtr, gets automatically destroyed when the OwningPtr goes out of scope at the end of getModule(). Everything owned by the CodeGenAction also gets destroyed, including the LLVMContext that was created by the constructor of the CodeGenAction and became the context for the Module. This leaves the Module without a valid context.
One fix is to construct the CodeGenAction with an LLVMContext that will still be around after the CodeGenAction is destroyed. For example:
    llvm::Module * getModule(void)
    {
        ...
        llvm::OwningPtr<clang::CodeGenAction> Act(new clang::EmitLLVMOnlyAction(&llvm::getGlobalContext()));
        if (!Clang.ExecuteAction(*Act))
            return NULL;
        llvm::Module *module = Act->takeModule();
        return module;
    }
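The ownership problem described above can be reproduced without Clang at all. The sketch below uses toy stand-ins (ToyContext, ToyModule, ToyAction, all hypothetical) that model only who owns what, with std::unique_ptr playing the role of OwningPtr. A counter tracks how many contexts are alive, so the effect can be observed without touching a destroyed object.

```cpp
#include <memory>

// Toy stand-ins for LLVMContext, Module, and CodeGenAction. They model only
// the ownership relationships described above, not the real classes.
struct ToyContext {
    static int liveCount;  // number of contexts currently alive
    ToyContext() { ++liveCount; }
    ~ToyContext() { --liveCount; }
};
int ToyContext::liveCount = 0;

struct ToyModule {
    ToyContext *context;  // borrowed; the module does not own its context
};

class ToyAction {
public:
    // With no context supplied, the action creates and owns one.
    explicit ToyAction(ToyContext *external = nullptr)
        : owned_(external ? nullptr : new ToyContext),
          context_(external ? external : owned_.get()) {}

    // Hands the module to the caller; the context stays where it is.
    std::unique_ptr<ToyModule> takeModule() {
        return std::unique_ptr<ToyModule>(new ToyModule{context_});
    }

private:
    std::unique_ptr<ToyContext> owned_;  // destroyed together with the action
    ToyContext *context_;
};

// Mirrors the bad example: the action, and with it the context, is gone by
// the time the caller receives the module.
std::unique_ptr<ToyModule> getModuleBroken() {
    ToyAction action;
    return action.takeModule();
}

// Mirrors the fix: the caller supplies a context that outlives the action.
std::unique_ptr<ToyModule> getModuleFixed(ToyContext &longLived) {
    ToyAction action(&longLived);
    return action.takeModule();
}
```

After getModuleBroken() returns, ToyContext::liveCount is back to zero even though the module still holds a pointer to its context. With getModuleFixed(), the context passed in by the caller is still alive when the module comes back.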
Bonus: Print the arguments of the CompilerInvocation
When you construct a CompilerInvocation, you give it a list of arguments, the same arguments you’d pass on the command line. The CompilerInvocation adds some arguments of its own to that list. You can print the complete list of arguments like this:
    printf("clang ");
    vector<string> argsFromInvocation;
    CI->toArgs(argsFromInvocation);
    for (vector<string>::iterator i = argsFromInvocation.begin(); i != argsFromInvocation.end(); ++i)
        printf("%s ", (*i).c_str());
    printf("\n");
The output is something like this:
clang -fdiagnostics-format=clang getinmemory.c -fsyntax-only -fdollars-in-identifiers -fno-operator-names -triple x86_64-apple-darwin10.8.0
Conclusion
Using the Clang driver classes, as in the previous example, you can interact with the Clang API in pretty much the same way that you would interact with command-line clang. Using the Clang frontend classes, as in this example, you get even more control. You can access the data structures that LLVM uses internally to compile a program. Using the frontend classes, we were able to get a Module from a .c file without the overhead of writing and reading additional files.
Jaymie Strecker is a software developer at Kosada, Inc. and one of the creators of Vuo.