Falcon: An Optimizing Java JIT

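Falcon is an LLVM-based Java JIT (just-in-time) compiler and the default JIT compiler in the Azul Zing JVM. As a veteran JIT developer, Reames strongly advocates building a Java JIT on top of LLVM: its stability has been validated by widespread deployment, it has a very active developer community, new micro-architecture features are added to LLVM quickly, its performance is good, and it is notably friendly to commercial software. Those points are not the main subject of this talk, however. JIT developers hold a number of misconceptions about using LLVM; Reames addresses them one by one and offers his own recommendations on JIT development, covering both technical details and process.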

1. Falcon: An optimizing Java JIT
Philip Reames, Azul Systems

2. Agenda
- Intro to Falcon
- Why you should use LLVM to build a JIT
- Common objections (and why they're mostly wrong)

3. What is Falcon?
- Falcon is an LLVM-based just-in-time compiler for Java bytecode.
- Shipping on by default in the Azul Zing JVM.
- Available for trial download at: www.azul.com/zingtrial/

4. This talk is about lessons learned, both technical and process.

5. Zing VM Background
- A HotSpot-derived JVM with an awesome GC (off topic)
- Multiple tiers of execution:
  - Interpreter: run-once bytecode and rare events
  - Tier1 compiler: rapidly generated code to collect profiling quickly
  - Tier2 compiler: compile hot methods for peak performance
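
The tiering described above can be sketched as a simple counter-driven promotion scheme. This is a minimal illustration, not Zing's actual policy: the thresholds, the `Method` record, and the `invoke` function are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical promotion thresholds, chosen only for illustration.
TIER1_THRESHOLD = 10
TIER2_THRESHOLD = 100


@dataclass
class Method:
    name: str
    invocations: int = 0
    tier: str = "interpreter"


def invoke(method: Method) -> str:
    """Record one call and return the tier the method is in after it."""
    method.invocations += 1
    if method.tier == "interpreter" and method.invocations >= TIER1_THRESHOLD:
        # Quick compile; the generated code is instrumented to collect profiles.
        method.tier = "tier1"
    elif method.tier == "tier1" and method.invocations >= TIER2_THRESHOLD:
        # Optimizing compile that consumes the profile gathered in tier 1.
        method.tier = "tier2"
    return method.tier
```

A method that runs once stays in the interpreter, while a hot method moves through both compilers as its invocation count grows.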

6. Business Need
- The existing C2 compiler is aging poorly:
  - vectorization (a key feature of modern x86_64) is an afterthought
  - very complicated codebase; "unpleasant" bug tails are the norm
  - difficult to test in isolation
- Looking to establish a competitive advantage:
  - long-term goal is to outperform the competition
  - velocity of performance improvement is key

7. Development Timeline
- April 2014: Proof of concept completed (in six months)
- Feb 2015: Mostly functionally complete
- April 2016: Alpha builds shared with selected customers
- Dec 2016: Product GA (off by default)
- April 2017: On by default
Team size: 4-6 developers, ~20 person-years invested

8. Team Effort
Falcon development: Bean Anderson, Philip Reames, Chen Li, Sanjoy Das, Igor Laevsky, Artur Pilipenko, Daniel Neilson, Anna Thomas, Serguei Katkov, Maxim Kazantsev, Daniil Suchkov, Michael Wolf, Leela Venati, Kris Mok, Nina Rinskaya
+ VM development team
+ Zing QA
+ All Azul (E-Staff, Sales, Support, etc.)

9. [Performance chart] Various application benchmarks + SPECjvm + DaCapo. Collected on a mix of Haswell and Skylake machines.

10. Why you should use LLVM to build a JIT
- Proven stability, widespread deployments
- Active developer community, support for new micro-architectures
- Proven performance (for C/C++)
- Welcoming to commercial projects

13. First, a bit of skepticism...
- Is this something you can express in C? If so, LLVM supports it.
- e.g. deoptimization via spill to a captured on-stack buffer:
    buf = alloca(…);
    buf[0] = local_0;
    …
    a->foo(buf, actual_args…)
- The real question is: how well is it supported?

14. Functional Corner-cases
If you can modify your ABI (calling conventions, patching sequences, etc.), your life will be much easier.
You will find a couple of important hard cases. Ours were:
- "anchoring" for mixed stack walks
- red-zone arguments to assembler routines
- support for deoptimization, both checked and async
- GC interop

15. Be wary of over-design
- Common knowledge that quality of safepoint lowering matters.
- gc.statepoint design:
  - major goal: allow in-register updates
  - topic of a 2014 LLVM Dev talk
  - 2+ person-years of effort
- It turns out that hot safepoints are an inliner bug.

16. Get to Functional Correctness First
- Priority 1: Implement all interesting cases, get real code running
- Priority 2: Add tests to show it continues working
- Design against an adversarial optimizer.
- Be wary of over-design, while maintaining code quality standards.
See backup slides for more specifics on this topic.

17. Objection 2 of 6: LLVM is a huge dependency

18. Potentially a real issue, but it depends on your definition of large.
From our shipping product:
- libJVM = ~200 MB
- libLLVM = ~40 MB
So, a 20% code size increase.

19. Objection 3 of 6: We added an LLVM backend; it produced poor code

20. Importance of Profiling
- Tier 1 collects detailed profiles; Tier 2 exploits them
- Profiling accounts for 25% or more of peak application performance

21. Prune Untaken Paths
Handle rare events by returning to a lower tier, reprofiling, and then recompiling.
[Diagram: Interpreter, Tier1 code, Tier2 code, with rare events leading back to a lower tier]
    %ret = call i32 @llvm.experimental.deoptimize() ["deopt"(..)]
    ret i32 %ret
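
The deoptimize, reprofile, recompile loop can be sketched in a few lines. This is a toy model under stated assumptions, not Falcon's mechanism: `Deoptimize`, `interpret`, `compile_tier2`, and `run` are hypothetical names, and the "compiled" function checks a set at run time where a real JIT would simply omit the pruned branch from the generated code.

```python
class Deoptimize(Exception):
    """Raised by 'compiled' code when it reaches a path that was pruned."""


def interpret(x, profile):
    # Generic slow path: handles every case and records which branch ran.
    branch = "negative" if x < 0 else "non_negative"
    profile.add(branch)
    return -x if x < 0 else x


def compile_tier2(profile):
    # Keep only the branches the profile has seen; all others deoptimize.
    seen = frozenset(profile)

    def compiled(x):
        if x < 0:
            if "negative" not in seen:
                raise Deoptimize("negative")
            return -x
        if "non_negative" not in seen:
            raise Deoptimize("non_negative")
        return x

    return compiled


def run(inputs):
    """Compute abs() of each input; return the results and the deopt count."""
    profile, compiled, deopts, out = set(), None, 0, []
    for x in inputs:
        if compiled is None:
            out.append(interpret(x, profile))   # first call: profile it
            compiled = compile_tier2(profile)   # eager compile, for brevity
            continue
        try:
            out.append(compiled(x))
        except Deoptimize:
            deopts += 1
            out.append(interpret(x, profile))   # return to the lower tier
            compiled = compile_tier2(profile)   # recompile with new profile
    return out, deopts
```

In this sketch the first negative input triggers exactly one deoptimization; after recompilation both branches run in the fast path.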

22. Predicated devirtualization
With a virtual-call fallback:
    switch (type(o))
      case A: A::foo();
      case B: B::foo();
      default: o->foo();
Critical for Java, where everything is virtual by default.
With the fallback replaced by deoptimization:
    switch (type(o))
      case A: A::foo();
      case B: B::foo();
      default: @deoptimize() ["deopt"(…)]
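
The guarded-dispatch shape above can be imitated in ordinary code. The sketch below is only an analogy: the `Shape` hierarchy, `Deoptimize`, and both `area_*` functions are hypothetical names, and the assumption is that a profile saw only `Square` and `Circle` receivers at this call site.

```python
import math


class Deoptimize(Exception):
    """Signals a receiver type the compiled code was not specialized for."""


class Shape:
    def area(self):
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Triangle(Shape):  # assumed never observed by the profile
    def __init__(self, base, height):
        self.base, self.height = base, height

    def area(self):
        return 0.5 * self.base * self.height


def area_virtual(o):
    return o.area()  # ordinary virtual dispatch on the receiver


def area_devirtualized(o):
    t = type(o)  # exact type test, like the switch on type(o)
    if t is Square:
        return Square.area(o)  # direct call; a JIT could now inline it
    if t is Circle:
        return Circle.area(o)
    raise Deoptimize(t.__name__)  # unseen type: fall back to a lower tier
```

The benefit in a real compiler comes from the direct calls being inlinable, which the virtual fallback in `area_virtual` would prevent.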

23. Implicit Null Checks
IR:
    %is.null = icmp eq i8* %p, null
    br i1 %is.null, label %handler, label %fallthrough, !make.implicit !{}
Explicit check:
    test rax, rax
    jz <handler>
    rsi = ld [rax+8]
Implicit check:
    rsi = ld [rax+8]    <- fault_pc
    __llvm_faultmaps[fault_pc] -> handler
    handler: call @__llvm_deoptimize
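
A loose analogy for the explicit and implicit forms follows. It assumes a Python `TypeError` can stand in for the hardware fault on a null address and a dictionary for the `__llvm_faultmaps` section; `FAULT_MAPS`, `LOAD_PC`, and the function names are hypothetical.

```python
FAULT_MAPS = {}  # stand-in for __llvm_faultmaps: faulting pc -> handler


def deopt_handler():
    # Stand-in for the handler block that calls @__llvm_deoptimize.
    return "deoptimized"


def load_field_explicit(p):
    if p is None:              # test rax, rax ; jz <handler>
        return deopt_handler()
    return p["field"]          # rsi = ld [rax+8]


# Hypothetical label for the one instruction that is allowed to fault.
LOAD_PC = "load_field_implicit+0"
FAULT_MAPS[LOAD_PC] = deopt_handler


def load_field_implicit(p):
    try:
        return p["field"]      # rsi = ld [rax+8], with no preceding check
    except TypeError:          # models the fault raised on a null address
        return FAULT_MAPS[LOAD_PC]()  # runtime looks up fault_pc -> handler
```

The non-null path in the implicit version executes no comparison or branch at all; the cost moves entirely onto the rare null case.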

24. Local Code Layout
- branch_weights for code layout
- Sources of slow paths:
  - GC barriers, safepoints
  - handlers for builtin exceptions
  - result of code versioning
- 50%+ of total code size
Layout before: hot, cold, hot, cold, hot, cold
Layout after: hot, hot, hot, cold, cold, cold
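
The hot/cold reordering can be illustrated with a stable partition over profile weights. This is a deliberately simplified sketch: LLVM's actual block placement is considerably more involved, and `layout_blocks`, the block names, and the 1% cutoff are hypothetical.

```python
def layout_blocks(blocks, cold_fraction=0.01):
    """Return block names with hot blocks first and cold blocks last.

    `blocks` is a list of (name, weight) pairs in original program order.
    A block is treated as cold when its weight is below `cold_fraction`
    of the total weight. Order within each group is preserved.
    """
    total = sum(weight for _, weight in blocks)
    if total == 0:
        return [name for name, _ in blocks]  # no profile: keep the order
    cutoff = cold_fraction * total
    hot = [name for name, weight in blocks if weight >= cutoff]
    cold = [name for name, weight in blocks if weight < cutoff]
    return hot + cold


# Example with made-up weights: slow paths interleaved with hot code.
example = [
    ("entry", 1000),
    ("gc_barrier_slow", 1),
    ("loop", 5000),
    ("npe_handler", 0),
    ("exit", 1000),
    ("safepoint_poll_slow", 2),
]
```

Calling `layout_blocks(example)` moves the three slow-path blocks to the end, so the hot blocks become contiguous.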

26. Exploiting Semantics
- LLVM supports a huge space of optional annotations
- Both metadata and attributes
- ~6-12 month effort
Cycle: Performance analysis -> Root-cause issue -> Add annotation -> Fix uncovered miscompiles -> (repeat)

28. Expect to become an LLVM developer
- You will uncover bugs, both performance and correctness.
- You will need to fix them.
- You will need a downstream process for incorporating and shipping fixes.
See backup slides for more specifics on this topic.

29. Status Check
- We've got a reasonably good compiler for a C-like subset of our source language.
- We're packaging a modified LLVM library.
- This is further than most projects get.