1. LLVM Simone Campanoni simonec@eecs.northwestern.edu

2.Problems with Canvas? Problems with slides? Problems with code? Problems with LLVM?

3.Outline Introduction to LLVM CAT steps Hacking LLVM

4.LLVM LLVM is a great, hackable compiler for C/C++-like languages But it’s also A JIT A compiler for other languages (Java, CIL bytecode, etc …) LLVM is modular and well documented Started from UIUC It’s now the research tool of choice It’s an industrial-strength compiler Apple, AMD, Intel, NVIDIA

5.LLVM internals Each component is composed of pipelines Each stage: reads something as input and generates something as output To develop a stage: specify how to transform the input to generate the output Complexity lies in linking stages In this class: we’ll look at concepts/internals of middle-end Some of them are still valid for front-end/back-end

6.LLVM and other compilers LLVM is designed around it’s IR Multiple forms (human readable, bitcode on-disk, in memory) Front-end (Clang) IR Middle-end IR Back-end Machine code IR Pass Pass IR IR … Pass manager

7.Pass manager The pass manager orchestrates passes It builds the pipeline of passes in the middle-end The pipeline is created by respecting the dependences declared by each pass Pass X depends on Y Y will be invoked before X

8.Pass types Use the “smallest” one for your CAT CallGraphSCCPass ModulePass ImmutablePass FunctionPass LoopPass BasicBlockPass

9.Learning LLVM Download LLVM 3.7 and install it using CMake (Unless you’ve already installed in your system) Read the documentation Read the documentation Read the documentation Get familiar with LLVM documentation Doxygen pages (API docs) Language reference manual (IR) Programmer’s manual (LLVM-specific data structures, tools) Writing an LLVM pass

10.Code analysis and transformation Code normalization Analysis Transformation

11.CAT example: loop hoisting Do { Work( varX ); varY = varZ + 1; varX ++; } while ( varX < 100); varY = varZ + 1 ; Do { Work( varX ); varX ++; } while ( varX < 100); Loop hoisting

12.CAT example: loop hoisting (2) while ( varX < 100) { Work( varX ); varY = varZ + 1; varX ++; } Do { Work( varX ); varY = varZ + 1; varX ++; } while ( varX < 100); And now?

13.Loop normalization What: loop normalization pass When: before running loop hoisting Declare a dependence to your pass manager Advantages? Disadvantages?

14.CAT design Understand the problem Create representative code examples you expect to optimize Optimize them by hand to test the best benefits of your optimization Identify the common case Define the normalize input code Define the information you need to make your transformation safe Design the analyses to automatically generate this information Design the transformation Test, test, test

15.Improving CAT Improve your CAT by better handling your common cases Improve your CAT by improving the normalization passes Handle corner cases Before we just simply ignored them (i.e., no transformation)

16.LLVM IR RISC-based Instructions operate on variables Load and store to access memory Include high level instructions Function calls ( call ) Pointer arithmetics ( getelementptr )

17.LLVM IR (2) Strongly typed No assignments of variables with different types You need to explicitly cast variables Load and store to access memory Variables Global ( @ myVar ) Local ( % myVar ) Function parameter ( define i32 @ myF (i32 % myPar ) )

18.LLVM IR (3) 3 different (but 100% equivalent) formats Assembly: human-readable format ( FILENAME.ll ) Bitcode : machine binary on-disk ( FILENAME.bc ) In memory: in memory binary Generating IR Clang for C-like languages (similar options w.r.t. GCC) Different front-ends available

19.LLVM IR (4) It’s a Static Single Assignment (SSA) representation A variable is set only by one instruction in the function body % myVar = … A static assignment can be executed more than once We’ll study SSA later

20.SSA and not SSA example float myF (float par1, float par2, float par3){ return (par1 * par2) + par3; } define float @ myF (float %par1, float %par2, float %par3) { % 1 = fmul float %par1, %par2 % 2 = fadd float %1, %par3 ret float % 2 } define float @ myF (float %par1, float %par2, float %par3) { % 1 = fmul float %par1, %par2 %1 = fadd float %1, %par3 ret float %1 } NOT SSA SSA

21.SSA and not SSA CATs applied to SSA-based code are faster! Old compilers aren’t SSA-based Transforming IR in its SSA-form takes time When designing your CAT, think carefully about SSA Take advantage of its properties

22.LLVM tools opt to analyze/transform LLVM IR code Read LLVM IR file Load external passes Run specified passes Respect pass order you specify as input opt -pass1 -pass2 FILE.ll Optionally generate transformed IR Useful passes o pt -view- cfg FILE.ll opt -view- dom FILE.ll opt -help

23.LLVM tools c lang to compile/generate LLVM IR code To generate binaries from source code/IR code Check Makefile you have in LLVM.tar.bz2 lli to execute (interpret/JIT) LLVM IR code lli FILE.bc l lc to generate assembly from LLVM IR code llc FILE.bc

24.Conclusion LLVM is an industrial-strength compiler also used in academia Very hard to know in detail every component Focus on what’s important to your goal Become a ninja at jumping around the documentation It’s well organized, documented with a large community behind it Basic C++ skills are required

25.Final tips LLVM includes A LOT of passes Analyses Transformations Normalization Take advantage of existing code Try llc -march= cpp < bitcode >. bc -o APIs.cpp I have a point to something. What is it? getName () works on most things errs() << TheThingYouDon’tKnow ;

26.Let’s start hacking LLVM As Linus Torvalds says … Talk is cheap. Show me the code.