26 meltdown/spectre


1. EECS 262a Advanced Topics in Computer Systems
Lecture 26: Meltdown/Spectre
November 29th, 2018
John Kubiatowicz
Electrical Engineering and Computer Sciences, University of California, Berkeley
http://www.eecs.berkeley.edu/~kubitron/cs262

Today’s Papers
● Meltdown: Reading Kernel Memory from User Space, Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Appears in Proceedings of the 27th USENIX Security Symposium, 2018
● Spectre Attacks: Exploiting Speculative Execution, Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Will appear in Proceedings of the 40th IEEE Symposium on Security and Privacy, 2019
● Thoughts?
(Most slides from Mark Hill with permission!)

On the Meltdown & Spectre Design Flaws
Mark D. Hill, Computer Sciences Dept., Univ. of Wisconsin-Madison
February 2018
Computer Architect, Not Security Expert
Prepared while on a sabbatical visit to Google with public information only and representing the author’s views only, not necessarily Google’s.

Talk Info (Hidden Slide)
Title: On the Meltdown & Spectre Design Flaws
Speaker: Mark D. Hill, Computer Sciences Department, University of Wisconsin-Madison
Abstract: Two major hardware security design flaws--dubbed Meltdown and Spectre--were broadly revealed to the public in early January 2018 in research papers and blog posts that require considerable expertise and effort to understand. To complement these, this talk seeks to give a general computer science audience the gist of these security flaws and their implications. The goal is that the audience can either stop there or have a framework to learn more. A non-goal is exploring many details of flaw exploitation and patch status, in part because the speaker is a computer architect, not a security expert. In particular, this talk reviews that Computer Architecture 1.0 (the version number is new) specifies the timing-independent functional behavior of a computer, and that micro-architecture is the set of implementation techniques that improve performance by more than 100x. It then asks, “What if a computer that is completely correct by Architecture 1.0 can be made to leak protected information via timing, a.k.a., micro-architecture?” The answer is that this is exactly what is done by the Meltdown and Spectre design flaws. Meltdown leaks kernel memory, but software & hardware fixes exist. Spectre leaks memory outside of sandboxes and bounds checks, and it is scary. An implication is that the definition of Architecture 1.0--the most important interface between software and hardware--is inadequate to protect information. It is time for experts from multiple viewpoints to come together to create Architecture 2.0.
Bio: Mark D. Hill (http://www.cs.wisc.edu/~markhill) is John P. Morgridge Professor and Gene M. Amdahl Professor of Computer Sciences at the University of Wisconsin-Madison. Hill has a PhD in computer science from the University of California, Berkeley. Hill’s research targets computer design and evaluation. He has made contributions to parallel computer system design (e.g., memory consistency models and cache coherence), memory system design (caches and translation buffers), computer simulation (parallel systems and memory systems), software (e.g., page tables and cache-conscious optimizations), deterministic replay and transactional memory. For example, he is the inventor of the widely-used 3C model of cache behavior (compulsory, capacity, and conflict misses) and co-inventor of the cornerstone for the C++ and Java multi-threaded memory specifications (sequential consistency for data-race-free programs). He is a fellow of IEEE and the ACM. He serves as Vice Chair of the Computing Community Consortium (2016-18) and served as Wisconsin Computer Sciences Department Chair 2014-2017.

2. Executive Summary
Architecture 1.0: the timing-independent functional behavior of a computer
Micro-architecture: the implementation techniques to improve performance
Question: What if a computer that is completely correct by Architecture 1.0 can be made to leak protected information via timing, a.k.a., Micro-Architecture?
● Meltdown leaks kernel memory, but software & hardware fixes exist
● Spectre leaks memory outside of bounds checks or sandboxes, and is scary
Implication: The definition of Architecture 1.0 is inadequate to protect information

Outline
● Computer Architecture & Micro-Architecture Background
● Timing Side-Channel Attack
● Meltdown
● Spectre
● Wrap-Up

Computer Architecture 0.0 -- Pre-1964
Each Computer was New
● Implemented machine (has mass) → hardware
● Instructions for hardware (no mass) → software
Software Lagged Hardware
● Each new machine design was different
● Software needed to be rewritten in assembly/machine language
● Unimaginable today
Going forward: Need to separate HW interface from implementation

Computer Architecture 1.0 -- Born 1964
IBM System 360 defined an instruction set architecture
● Stable interface across a family of implementations
● Software did NOT have to be rewritten

    branch (R1 >= bound) goto error
    load R2 ← memory[train+R1]
    and R3 ← R2 & 0xffff
    load R4 ← memory[save+SIZE+R3]

Architecture 1.0: the timing-independent functional behavior of a computer
Micro-architecture: implementation techniques that change timing to go fast
Note: The code is not IBM 360 assembly, but is the example used later.
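For readers who prefer a high-level language, the four-instruction example can be read as the function below. This is only a sketch of its architectural (timing-independent) behavior; the function name `run_example`, the dictionary-based memory, and the concrete addresses are assumptions made for illustration and do not come from the slides.

```python
def run_example(memory, train, save, size, bound, r1):
    """Architectural (timing-independent) result of the four-instruction example."""
    if r1 >= bound:                       # branch (R1 >= bound) goto error
        raise IndexError("R1 out of bounds")
    r2 = memory.get(train + r1, 0)        # load R2 <- memory[train+R1]
    r3 = r2 & 0xFFFF                      # and  R3 <- R2 & 0xffff
    r4 = memory.get(save + size + r3, 0)  # load R4 <- memory[save+SIZE+R3]
    return r4


# Hypothetical layout: train[] at address 0, save[] at address 100, SIZE = 16.
memory = {0: 3, 1: 5, 100 + 16 + 5: 77}
result = run_example(memory, train=0, save=100, size=16, bound=2, r1=1)
print(result)  # 77
```

Architecturally, an out-of-bounds R1 only ever reaches the error path; nothing in this functional view says what the hardware may have done speculatively before the branch resolved.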

3. Micro-architecture Harvested Moore’s Law Bounty
For decades, every ~2 years: 2x transistors, 1.4x faster & 1x chip power possible; 2300 transistors for Intel 4004 → millions per core & billions for caches
(Micro-)architects took this ever doubling budget to make each processor core execute > 100x faster than it otherwise would.
Key techniques w/ tutorial next:
● Instruction Speculation
● Hardware Caching
Hidden by Architecture 1.0: timing-independent functional behavior unchanged

Instruction Speculation Tutorial
Many steps (cycles) to execute one instruction; time flows left to right →
Go Faster: Pipelining, branch prediction, & instruction speculation
[Pipeline diagram, instructions overlapped in time:]
● add
● load
● branch: Predict direction: target or fall thru
● and: Speculate!
● store: Speculate more!
Speculation correct: Commit architectural changes of and (register) & store (memory); go fast!
Mis-speculate: Abort architectural changes (registers, memory); go in other branch direction

Hardware Caching Tutorial
Main Memory (DRAM) 1000x too slow
Add Hardware Cache(s): small, transparent hardware memory
● Like a software cache: speculate near-term reuse (locality) is common
● Like a hash table: an item (block or line) can go in one or few slots
E.g., 4-entry cache w/ slot picked with address (key) modulo 4

    Access   Result   Action                 Slot 0   Slot 1   Slot 2   Slot 3
    (start)                                  --       --       --       --
    12?      Miss     Insert 12              12       --       --       --
    07?      Miss     Insert 07              12       --       --       07
    12?      HIT!     No changes             12       --       --       07
    16?      Miss     Victim 12, Insert 16   16       --       --       07

Note: 12 victimized “early” due to “alias”

Micro-architecture Harvested Moore’s Law Bounty

    branch (R1 >= bound) goto error   ; Speculate branch not taken
    load R2 ← memory[train+R1]        ; Speculate load & speculate cache hit
    and R3 ← R2 & 0xffff              ; Speculate AND
    load R4 ← memory[save+SIZE+R3]    ; Speculate load & speculate cache hit

Hidden by Architecture 1.0: timing-independent functional behavior unchanged
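The 4-entry cache walk-through above can be replayed with a few lines of code. The following is a minimal sketch of a direct-mapped cache, assuming one address per slot and the address-modulo-4 placement from the slide; the class name `DirectMappedCache` is hypothetical.

```python
class DirectMappedCache:
    """Toy cache: each address maps to exactly one slot (address modulo slot count)."""

    def __init__(self, num_slots=4):
        self.slots = [None] * num_slots  # None marks an empty slot

    def access(self, address):
        """Return 'hit' or 'miss'; on a miss, insert the address, evicting any victim."""
        index = address % len(self.slots)
        if self.slots[index] == address:
            return "hit"
        self.slots[index] = address  # overwrites (victimizes) whatever was there
        return "miss"


cache = DirectMappedCache(num_slots=4)
trace = [12, 7, 12, 16]
results = [cache.access(a) for a in trace]
print(results)      # ['miss', 'miss', 'hit', 'miss']
print(cache.slots)  # [16, None, None, 7]
```

The last access shows the alias effect: 16 and 12 both map to slot 0, so inserting 16 evicts 12 even though three other slots are not competing for it.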

4. Whither Computer Architecture 1.0?
Architecture 1.0: timing-independent functional behavior
Question: What if a computer that is completely correct by Architecture 1.0 can be made to leak protected information via timing, a.k.a., micro-architecture?
Implication: The definition of Architecture 1.0 is inadequate to protect information
This is what Meltdown and Spectre do. Let's see why and explore implications.

Side-Channel Attack: SAVE Secret in Micro-Arch
1. Prime micro-architectural state
   a. Repeatedly access array train[] to train branch predictor to expect access < bound
   b. Access all of array save[] to put it completely in a cache of size SIZE
2. Coerce processor into speculatively executing instructions that will be nullified to (a) find a secret & (b) save it in micro-architecture

    branch (R1 >= bound) goto error   ; Speculate not taken even if R1 >= bound
    load R2 ← memory[train+R1]        ; Speculate to find SECRET outside of train[]
    and R3 ← R2 & 0xffff              ; Speculate to convert SECRET bits into index
    load R4 ← memory[save+SIZE+R3]    ; Speculate to save SECRET by victimizing memory[save+R3]
                                      ; since it aliases in cache with new access memory[save+SIZE+R3]

3. HW detects mis-speculation
   ● Undoes architectural changes
   ● Leaves cache (micro-architecture) changes (correct by Architecture 1.0)

Side-Channel Attack: RECALL Secret from Micro-Arch
4: Probe time to access each element of save[]--micro-architectural property; if accessing save[foo] is slow due to a cache miss, then SECRET is foo. A leak!
5: Repeat many times to obtain secret information at some bandwidth. (More shifting/masking needed to get all SECRET bits victimizing 64B cache lines)
Well-known in 1983/85 DoD “Orange Book”:
Covert timing channels include all vehicles that would allow one process to signal information to another process by modulating its own use of system resources in such a way that the change in response time observed by the second process would provide information. --TRUSTED COMPUTER SYSTEM EVALUATION CRITERIA
With roots back to 1974 TENEX password attack
But seemed fanciful
[Image: Spy vs. Spy, Mad Magazine, 1960]

Meltdown (https://meltdownattack.com/meltdown.pdf)
Can leak the contents of kernel memory at up to 500KB/s
[Figure annotations: TRAP!! (not branch); Under mis-speculation]
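Steps 1b through 4 of the SAVE/RECALL attack can be mimicked in software. The sketch below is a toy model under loud assumptions: the cache is a simulated direct-mapped table rather than real hardware, "slow" is modeled as a recorded miss rather than a measured time, branch-predictor training (step 1a) is not modeled, and SIZE is 256 with a 0xff mask instead of the slide's 0xffff to keep the example small. All names (`ToyCache`, `speculative_gadget`, `recall_secret`) are hypothetical.

```python
SIZE = 256          # number of cache slots; also the length of save[] (assumed for the demo)
MASK = SIZE - 1     # the slides mask with 0xffff; 0xff keeps this toy model small


class ToyCache:
    """Direct-mapped cache over element indices: slot = index modulo SIZE."""

    def __init__(self):
        self.slots = [None] * SIZE

    def access(self, index):
        """Touch an element; return True on a hit, False on a miss (and fill the slot)."""
        slot = index % SIZE
        hit = self.slots[slot] == index
        self.slots[slot] = index
        return hit


def speculative_gadget(cache, memory, train_base, r1):
    """Model the nullified instructions: registers are discarded, cache effects remain."""
    r2 = memory[train_base + r1]      # out-of-bounds load finds SECRET
    r3 = r2 & MASK                    # convert SECRET bits into an index
    cache.access(SIZE + r3)           # save[SIZE + r3] aliases with, and evicts, save[r3]
    # No return value: mis-speculation undoes every architectural change.


def recall_secret(cache):
    """Probe every element of save[]; the one that misses reveals the secret."""
    missed = [i for i in range(SIZE) if not cache.access(i)]
    return missed[0] if len(missed) == 1 else None


# Flat toy memory: train[] occupies addresses 0..7, the secret byte sits just past it.
BOUND = 8
memory = [0] * BOUND + [0x5A]
cache = ToyCache()

for i in range(SIZE):                 # step 1b: prime the cache with all of save[]
    cache.access(i)
speculative_gadget(cache, memory, train_base=0, r1=BOUND)   # step 2, with R1 >= bound
leaked = recall_secret(cache)         # step 4
print(hex(leaked))                    # 0x5a
```

Nothing in the gadget returns a value, which mirrors step 3: the architectural state is rolled back, and the only surviving trace of the secret is which save[] element was evicted.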

5. Meltdown & Hardware
Demonstrated for many Intel x86-64 cores; NOT demonstrated for AMD
Key: When to suppress load with protection violation (user load to kernel memory)
● EARLY: AMD appears to suppress early, e.g., at TLB access
● LATE: Intel appears to suppress at end after micro-arch state changes
My SWAG (Scientific Wild A** Guess) Why
● Both are correct by Architecture 1.0
● Performance shouldn’t matter as this case is supposed to be rare
● Do what’s easiest & have luck that is good (AMD) or bad (Intel)

Meltdown & Software
Bad: Meltdown operates with bug-free OS software (by Architecture 1.0)
Good: Major commercial OSs patched for Meltdown ~January 2018
Idea: Don’t map (much of) protected kernel address space in user process
● Offending load now fails address translation & does nothing
● Patches quickly derived from KAISER, developed for side-channel attacks on Kernel Address Space Layout Randomization (KASLR)
● Performance impact 0-30%, depending on syscall frequency & core model
Future hardware can fix Meltdown (like AMD) so maybe we dodged a bullet

Spectre (https://spectreattack.com/spectre.pdf)
Classic side-channel attack w/ deep micro-arch info
● 1: Attacker primes micro-architecture
  ○ E.g., branch predictor or branch target buffer for saving secret
  ○ E.g., cache for recalling secret
● 2: Victim loads secret under mis-speculation
  ○ Load should NOT trap (unlike Meltdown)
  ○ Still inappropriate if managed language or sandbox
● 3: Victim saves secret in micro-arch state, e.g., cache
● 4: Attacker recalls secret from micro-arch state
● 5: Repeat

Spectre Applicability (Paper Sections 4, 5, & 6)
4. Exploit branch mis-prediction to let JavaScript steal from Chrome browser
   ● Demonstrated on Intel Haswell/Skylake, AMD Ryzen, & several ARM cores
   ● Many other existing designs vulnerable
5. Used indirect branches & return-oriented programming to mis-train branch target buffer to obtain information from different hyper-thread on same core
6. Many other known timing channels exist, e.g., register file contention, functional unit occupancy, but what about unknown timing channels?
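The EARLY versus LATE distinction on the Meltdown & Hardware slide can be illustrated with a toy model. This is a sketch only: it does not describe how any real AMD or Intel core is built, and `KERNEL_BASE`, `ProtectionFault`, and the set of "touched lines" standing in for cache state are all assumptions made for the example.

```python
KERNEL_BASE = 1000  # hypothetical: addresses at or above this are kernel-only


class ProtectionFault(Exception):
    """Stands in for the trap raised by a user load to kernel memory."""


def user_load_then_probe_access(memory, address, touched_lines, suppress_early):
    """Model a user-mode load followed by a dependent access to a probe array.

    touched_lines is a set recording which probe-array lines entered the cache.
    """
    is_kernel = address >= KERNEL_BASE
    if is_kernel and suppress_early:
        # EARLY: the load is squashed at the permission check; nothing downstream runs.
        raise ProtectionFault(address)
    value = memory[address]
    touched_lines.add(value)          # dependent access changes micro-architectural state
    if is_kernel:
        # LATE: the trap arrives only after the cache has already been changed.
        raise ProtectionFault(address)
    return value


def attempt(memory, address, suppress_early):
    """Run one attack attempt and return the cache lines left behind."""
    touched = set()
    try:
        user_load_then_probe_access(memory, address, touched, suppress_early)
    except ProtectionFault:
        pass                          # architecturally, the load "did nothing"
    return touched


memory = {KERNEL_BASE: 42}            # 42 is the kernel secret in this toy model
early = attempt(memory, KERNEL_BASE, suppress_early=True)
late = attempt(memory, KERNEL_BASE, suppress_early=False)
print(early)   # set()
print(late)    # {42}
```

Both variants raise the same fault and return no value to the caller, so both are correct by Architecture 1.0; only the late variant leaves a secret-dependent footprint that a timing probe could recover.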

6. Spectre Code Example
Listing 2: Exploiting Speculative Execution via JavaScript

    if (index < simpleByteArray.length) {
      index = simpleByteArray[index | 0];
      index = (((index * TABLE1_STRIDE)|0) & (TABLE1_BYTES-1))|0;
      localJunk ^= probeTable[index|0]|0;
    }

Listing 3: Disassembly of Speculative Execution in Listing 2 JavaScript

    cmpl r15,[rbp-0xe0]          ; Compare index (r15) against simpleByteArray.length
    jnc 0x24dd099bb870           ; If index >= length, branch to instruction after move below
    REX.W leaq rsi,[r12+rdx*1]   ; Set rsi=r12+rdx=addr of first byte in simpleByteArray
    movzxbl rsi,[rsi+r15*1]      ; Read byte from address rsi+r15 (= base address+index)
    shll rsi, 12                 ; Multiply rsi by 4096 by shifting left 12 bits
    andl rsi,0x1ffffff           ; AND reassures JIT that next operation is in-bounds
    movzxbl rsi,[rsi+r8*1]       ; Read from probeTable
    xorl rsi,rdi                 ; XOR the read result onto localJunk
    REX.W movq rdi,rsi           ; Copy localJunk into rdi

Spectre Mitigation (Section 7)
Branch prediction
● SW: Suppress branch prediction “when important” with mfence, etc.
● Not defined to work but appears to work--at a performance cost
● HW could auto-magically suppress branch prediction when appropriate (???)
Branch Target Buffer
● SW: Not clear. Disable hyper-threading, etc.?
● HW: Make micro-architecture state private to thread (not core or processor)
More generally: Hard to mitigate threats NOT YET DEFINED.

Need Computer Architecture 2.0?
With Meltdown & Spectre, Architecture 1.0 is inadequate to protect information
Augment Architecture 1.0 with Architecture 2.0 specification of
● (Abstraction of) time-visible micro-architecture?
● Bandwidth of known (unknown?) timing channels?
● Enforced limits on user software behavior? (cf. KAISER)
Change Microarchitecture to mitigate timing channel bandwidth
● Suppress some speculation
● Undo most changes on mis-speculation
Can this be (formally) solved or must it be managed like crime?

Need Computer Architecture 2.0?
More generally, can we reduce our dependence on SPECULATION?
Accelerators!! GPU, DSP, IPU, TPU, ... [Hennessy & Patterson 2018 Taxonomy]
● Dedicated Memories
● More ALUs
● Easy Parallelism
● Lower precision data
● Domain Specific Language
Speculation NOT a first-order feature!
[Figure: Yavits et al. MultiAmdahl, 2017]
In 2005, Arvind said Speculation (w/ von Neumann model) killed Dataflow
After 2018, Dataflow-like Renaissance w/ Sea of Accelerators?
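Returning to the Spectre code example on this page, the index arithmetic in Listing 3 is easy to check by hand. The sketch below assumes the constants can be read directly off the disassembly (a left shift by 12 and a mask of 0x1ffffff) and that the leaked value is a single byte; the function name `probe_offset` is hypothetical.

```python
SHIFT = 12                 # shll rsi, 12
MASK = 0x1FFFFFF           # andl rsi, 0x1ffffff


def probe_offset(byte_value):
    """Offset into probeTable computed from one leaked byte, as in Listing 3."""
    return (byte_value << SHIFT) & MASK


offsets = [probe_offset(b) for b in range(256)]
unmasked_equal = all(probe_offset(b) == (b << SHIFT) for b in range(256))

print(offsets[1])          # 4096
print(len(set(offsets)))   # 256
print(unmasked_equal)      # True
```

Each of the 256 possible byte values selects its own offset, 4096 bytes apart, and the mask never alters a byte-sized value. The speculative read from probeTable therefore touches a location that identifies the byte uniquely, which is what a later timing probe looks for.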

7. Meltdown v. Spectre
[Figure from Miessler Blog (https://danielmiessler.com/blog/simple-explanation-difference-meltdown-spectre/)]

Executive Summary
Architecture 1.0: the timing-independent functional behavior of a computer
Micro-architecture: the implementation techniques to improve performance
Question: What if a computer that is completely correct by Architecture 1.0 can be made to leak protected information via timing, a.k.a., Micro-Architecture?
● Meltdown leaks kernel memory, but software & hardware fixes exist
● Spectre leaks memory outside of bounds checks or sandboxes, and is scary
Implication: The definition of Architecture 1.0 is inadequate to protect information

Some References
New York Times: https://www.nytimes.com/2018/01/03/business/computer-flaws.html
Meltdown paper: https://meltdownattack.com/meltdown.pdf
Spectre paper: https://spectreattack.com/spectre.pdf
A blog separating the two bugs: https://danielmiessler.com/blog/simple-explanation-difference-meltdown-spectre/
Google Blog: https://security.googleblog.com/2018/01/todays-cpu-vulnerability-what-you-need.html and https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
Industry News Sources: https://arstechnica.com/gadgets/2018/01/whats-behind-the-intel-design-flaw-forcing-numerous-patches/ and https://www.theregister.co.uk/2018/01/02/intel_cpu_design_flaw/

Final Project Timing
● Final abstract/project proposal on the WEBSITE
  ○ Please update your project description and proposal before next week
● I’m available for meetings next week if you would like to talk
  ○ Send me email
● Poster Session:
  ○ Next Friday (12/7) from 9:00-12:00 in 5th-floor atrium. Everyone must be set up by 9:30; if you are late, you may not get a chance to have your poster reviewed
  ○ Plan on staying the whole time, but it might be shorter
  ○ Who needs posters printed???
● Final paper:
  ○ Due Tuesday 12/11 @ AOE (by 5am)
  ○ 10 pages, 2-column, conference format. Bibliography doesn’t have to count toward 10 pages.
  ○ Must have a related work section!
  ○ Also, plan on a future work and/or discussion section
  ○ Make sure that your METRICs of success are clear and available

8. Goodbye All! You’ve been great! See you next Friday!