14 Virtual Machines #2

Virtual Machines #2: Live Migration of Virtual Machines; The Case for VM-Based Cloudlets in Mobile Computing; Transient Customization of Mobile Computing Infrastructure

EECS 262a
Advanced Topics in Computer Systems
Lecture 14
VM Migration/Cloudlets
October 9th, 2018
John Kubiatowicz
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~kubitron/cs262

Today’s Papers
• Live Migration of Virtual Machines
  C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, A. Warfield. Appears in Proceedings of the 2nd Symposium on Networked Systems Design and Implementation (NSDI), 2005
Together, as a group:
• The Case for VM-Based Cloudlets in Mobile Computing
  Mahadev Satyanarayanan, Paramvir Bahl, Ramon Caceres, Nigel Davies. Appears in IEEE Journal on Pervasive Computing, Vol 8, No 4, 2009
• Transient customization of mobile computing infrastructure
  Adam Wolbach, Jan Harkes, Srinivas Chellappa, M. Satyanarayanan. Appears in Proceedings of the First Workshop on Virtualization in Mobile Computing (MobiVirt), 2008
• Today: explore value of leveraging the VMM interface for new properties (migration and edge computing), many others as well including debugging and reliability
• Thoughts?

Why Migration is Useful
• Load balancing for long-lived jobs (why not short lived?)
• Ease of management: controlled maintenance windows
• Fault tolerance: move job away from flaky (but not yet broken) hardware
• Energy efficiency: rearrange loads to reduce A/C needs
• Data center is the right target

Live Migration Approach (I)
• Allocate resources at the destination (to ensure it can receive the domain)
• Iteratively copy memory pages to the destination host
  – Service continues to run at this time on the source host
  – Any page that gets written will have to be moved again
  – Iterate until a) only a small amount remains, or b) not making much forward progress
  – Can increase bandwidth used for later iterations to reduce the time during which pages are dirtied
• Stop and copy the remaining (dirty) state
  – Service is down during this interval
  – At end of the copy, the source and destination domains are identical and either one could be restarted
  – Once copy is acknowledged, the migration is committed transactionally

Live Migration Approach (II)
• Update IP address to MAC address translation using “gratuitous ARP” packet
  – Service packets start coming to the new host
  – May lose some packets, but this could have happened anyway and TCP will recover
• Restart service on the new host
• Delete domain from the source host (no residual dependencies)

Tracking the Writable Working Set
• Xen inserts shadow pages under the guest OS, populated using guest OS's page tables
• The shadow pages are marked read-only
• If OS tries to write to a page, the resulting page fault is trapped by Xen
• Xen checks the OS's original page table and forwards the appropriate write permission
• If the page is not read-only in the OS's PTE, Xen marks the page as dirty
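The iterative pre-copy loop from the Live Migration Approach (I) slide can be sketched as a small simulation. This is a minimal model, not Xen's implementation: the names `precopy` and `stop_and_copy_downtime`, the constant dirty rate, the cap at a fixed writable working set size, and the stopping threshold of 64 pages are all assumptions made for illustration.

```python
def stop_and_copy_downtime(total_pages, bandwidth):
    """Downtime in seconds if the VM is paused while every page is copied."""
    return total_pages / bandwidth


def precopy(total_pages, wws_pages, dirty_rate, bandwidth,
            max_rounds=30, threshold=64):
    """Simulate iterative pre-copy (hypothetical model, not Xen code).

    total_pages: pages of guest memory
    wws_pages:   size of the writable working set, in pages
    dirty_rate:  pages dirtied per second by the running guest
    bandwidth:   pages copied per second
    Returns (pages_sent_per_round, pages_left_for_stop_and_copy, downtime_s).
    """
    to_send = total_pages              # round 1 pushes every page
    sent_per_round = []
    for _ in range(max_rounds):
        round_time = to_send / bandwidth
        sent_per_round.append(to_send)
        # Pages written while this round was in flight must be moved again;
        # the writable working set caps how many distinct pages get dirty.
        dirtied = min(wws_pages, int(dirty_rate * round_time))
        no_progress = dirtied >= to_send   # condition (b): not converging
        to_send = dirtied
        if to_send <= threshold or no_progress:   # condition (a) or (b)
            break
    # Stop-and-copy: the service is down only while the remainder is sent.
    return sent_per_round, to_send, to_send / bandwidth


if __name__ == "__main__":
    # Assumed numbers: 131072 pages (512 MB of 4 KB pages) and
    # 4096 pages/s (about 128 Mbit/sec).
    print(precopy(131072, 8192, 512, 4096))
    print(stop_and_copy_downtime(131072, 4096))
```

With these assumed inputs the rounds shrink from 131072 pages to 8192, 1024, and 128, leaving 16 pages (about 4 ms of copying) for the stop-and-copy phase, against 32 seconds if the whole image were copied while paused. A guest that dirties pages as fast as the link can carry them trips the no-progress check after the first round.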
10/9/2018 Cs262a-F18 Lecture-14

Benefits of Migrating Virtual Machines Instead of Processes
• Avoids “residual dependencies”
• Can transfer in-memory state information
• Allows separation of concern between users and operator of a datacenter or cluster

Background – Process-based Migration
• Typically move the process and leave some support for it back on the original machine
  – E.g., old host handles local disk access, forwards network traffic
  – these are “residual dependencies” – old host must remain up and in use
• Hard to move exactly the right data for a process – which bits of the OS must move?
  – E.g., hard to move TCP state of an active connection for a process

VMM Migration
• Move the whole OS as a unit – don’t need to understand the OS or its state
• Can move apps for which you have no source code (and are not trusted by the owner)
• Can avoid residual dependencies in data center thanks to global names
• Non-live VMM migration is also useful:
  – Migrate your work environment home and back: put the suspended VMM on a USB key or send it over the network
  – Collective project, “Internet suspend and resume”

Writable Working Set
[Figure]

OLTP Database
[Figure]
• Compare with stop-and-copy:
  – 32 seconds (128Mbit/sec) or 16 seconds (256Mbit/sec)

SPECweb
[Figure]
• Compare with stop-and-copy:
  – 32 seconds (128Mbit/sec) or 16 seconds (256Mbit/sec)

Goals / Challenges
• Minimize downtime (maximize availability)
• Keep the total migration time manageable
• Avoid disrupting active services by limiting impact of migration on both migratee and local network

VM Memory Migration Options
• Push phase
• Stop-and-copy phase
• Pull phase
  – Not in Xen VM migration paper, but in SnowFlock

Implementation
• Pre-copy migration
  – Bounded iterative push phase
    » Rounds
    » Writable Working Set
  – Short stop-and-copy phase
• Be careful to avoid service degradation

Design Overview
[Figure]

Handling Local Resources
• Open network connections
  – Migrating VM can keep IP and MAC address.
  – Broadcasts ARP new routing information
    » Some routers might ignore to prevent spoofing
    » A guest OS aware of migration can avoid this problem
• Local storage
  – Network Attached Storage

Types of Live Migration
• Managed migration: move the OS without its participation
• Managed migration with some paravirtualization
  – Stun rogue processes that dirty memory too quickly
  – Move unused pages out of the domain so they don’t need to be copied
• Self migration: OS participates in the migration (paravirtualization)
  – Harder to get a consistent OS snapshot since the OS is running!

Complex Web Workload: SPECweb99
[Figure]

Low-Latency Server: Quake 3
[Figure]

Summary
• Excellent results on all three goals:
  – Minimize downtime/max availability, manageable total migration time, avoid active service disruption
• Downtimes are very short (60ms for Quake 3!)
• Impact on service and network are limited and reasonable
• Total migration time is minutes
• Once migration is complete, source domain is completely free

Is this a good paper?
• What were the authors’ goals?
• What about the evaluation/metrics?
• Did they convince you that this was a good system/approach?
• Were there any red-flags?
• What mistakes did they make?
• Does the system/approach meet the “Test of Time” challenge?
• How would you review this paper today?

Virtualization in the Cloud
• True “Utility Computing”
  – Illusion of infinite machines
  – Many, many users
  – Many, many applications
  – Virtualization is key
• Need to scale bursty, dynamic applications
  – Graphics render
  – DNA search
  – Quant finance
  – …

BREAK

Application Scaling Challenges
• Awkward programming model: “Boot and Push”
  – Not stateful: application state transmitted explicitly
• Slow response times due to big VM swap-in
  – Not swift: Predict load, pre-allocate, keep idle, consolidate, migrate
  – Choices for full VM swap-in: boot from scratch, live migrate, suspend/resume
• Stateful and Swift equivalent for process?
  – Fork!

SnowFlock (Optional Paper): VM Fork
Stateful swift cloning of VMs
[Figure: VM 0 through VM 4 on Host 0 through Host 4, joined by a Virtual Network]
• State inherited up to the point of cloning
• Local modifications are not shared
• Clones make up an impromptu cluster

Fork has Well Understood Semantics
Parallel Computation:
    partition data
    fork N workers
    if child:
        work on ith slice of data
Load-balancing Server:
    if more load:
        fork extra workers
    if load is low:
        dealloc excess workers
Sandboxing:
    trusted code
    fork
    if child:
        untrusted code
Opportunistic Computation:
    if cycles available:
        fork worker
        if child:
            do fraction of long computation

VM Fork Challenge – Same as Migration!
• Transmitting big VM State
  – VMs are big: OS, disk, processes, …
  – Big means slow
  – Big means not scalable
• Same fundamental bottleneck issues as VM Migration – shared I/O resources: host and network
[Figure: suspend/resume latency in seconds (0 to 400) vs. number of VMs (0 to 32)]

SnowFlock Insights
• VMs are BIG: Don’t send all the state!
• Clones need little state of the parent
• Clones exhibit common locality patterns
• Clones generate lots of private state

SnowFlock Secret Sauce
1. Start only with the basics
2. Fetch state on-demand
3. Multicast: exploit net hw parallelism
4. Multicast: exploit locality to prefetch
5. Heuristics: don’t fetch if I’ll overwrite
[Figure labels: Virtual Machine state (disk, OS, processes); VM Descriptor (metadata, “special” pages, page tables, GDT, vcpu; ~1MB for 1GB VM); Multicast; Clone 1 and Clone 2 private state]

Clone Time
[Figure: clone time in milliseconds (0 to 900) for 2, 4, 8, 16, and 32 clones, broken down into Devices, Spawn, Multicast, Start Clones, Xend, and Descriptor; annotations: “Clone 32 VMs in 800 ms” and “Scalable Cloning: Roughly Constant”]

Why SnowFlock is Fast
• Start only with the basics
• Send only what you really need
• Leverage IP Multicast
  – Network hardware parallelism
  – Shared prefetching: exploit locality patterns
• Heuristics
  – Don’t send if it will be overwritten
  – Malloc: exploit clones generating new state

Edge Computing
• Wikipedia:
  Edge computing is a distributed computing paradigm in which computation is largely or completely performed on distributed device nodes known as smart devices or edge devices as opposed to primarily taking place in a centralized cloud environment.
• Why compute on the edge?
  – Latency: Importance of human interactivity
  – Privacy: Keep sensitive data local
  – Reliability: Keep computing during network partitions

The Case for VM-Based Cloudlets
• Well-referenced paper in IoT/Edge Computing world (seems like everyone references it!)
• Basic premise: Need edge computing
  – Mobile hardware too weak
  – Somehow extend cloud to the edge so that there is a device close by to mobile devices that need services
• Just a little latency can completely destroy responsiveness of video interactiveness:
  – Compare even 33ms to “Thick” local computation

What is a Cloudlet?
• “Proximate computing infrastructure” that can be leveraged by mobile devices
• Or: Fixed infrastructure for transient use by mobile devices
• Some key differences with cloud computing:
[Table]

On-the-fly configuration of Cloudlet
• Create overlay “delta” from base VM
• Launch by sending delta to local Cloudlet server
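The overlay idea from the On-the-fly configuration slide (ship only the delta against a base VM that the Cloudlet already holds) can be illustrated with a page-level diff. This is a sketch under stated assumptions, not the format used by the papers: the 4 KB page size, the dictionary representation, the use of `zlib`, and the function names `create_overlay` and `apply_overlay` are all hypothetical.

```python
import zlib

PAGE_SIZE = 4096  # assumed page granularity for the diff


def pages(image, page_size=PAGE_SIZE):
    """Split an image (bytes) into fixed-size pages."""
    return [image[i:i + page_size] for i in range(0, len(image), page_size)]


def create_overlay(base, customized):
    """Return {page_index: compressed_page} for pages that differ from base."""
    if len(base) != len(customized):
        raise ValueError("this sketch assumes images of equal size")
    overlay = {}
    for idx, (b, c) in enumerate(zip(pages(base), pages(customized))):
        if b != c:
            overlay[idx] = zlib.compress(c)
    return overlay


def apply_overlay(base, overlay):
    """Rebuild the customized image on the Cloudlet: decompress and patch."""
    out = bytearray(base)
    for idx, blob in overlay.items():
        page = zlib.decompress(blob)
        start = idx * PAGE_SIZE
        out[start:start + len(page)] = page
    return bytes(out)


if __name__ == "__main__":
    base = bytes(4 * PAGE_SIZE)            # stand-in for a base VM image
    custom = bytearray(base)
    custom[PAGE_SIZE + 10] = 7             # touch page 1
    custom[3 * PAGE_SIZE] = 1              # touch page 3
    overlay = create_overlay(base, bytes(custom))
    print(sorted(overlay))                 # only the changed pages travel
    print(apply_overlay(base, overlay) == bytes(custom))
```

Only the changed pages cross the network, and the receiving side does the decompress-and-apply work.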

Prototype: “Kimberley”
• Workshop paper (which was reading #3 for today): “Transient Customization of Mobile Computing Infrastructure”
  – Start with standard (“base”) VM
  – Compute “residue” on top of base
  – Send ID of base and residue from mobile device → Nearby Cloudlet

Cost of launching VM with 100Mbps net BW
[Figure]
• Largest cost: decompress+apply overlay
• Are these reasonable times? (up to 1.5 minutes…)

Issues with Cloudlet Idea?
• Persistent storage?
  – Send “local storage volume” from mobile device to Cloudlet at startup
  – Return delta back to mobile device at end of computation
  – What is security of information? Is it encrypted? Do you give up encryption keys to Cloudlet?
• How to find and negotiate for computational resources?
  – Need device discovery services. Uses Avahi (or “Zeroconf”). Same protocol as Apple Bonjour.
  – Only works on same subnet or multicast-domain
  – How do you know that edge device has sufficient capabilities?
• Security (continued)
  – Edge devices notoriously insecure
  – Physical security is very tricky – how to prevent someone from walking up to device and dumping its memory to recover keys/data
  – Why do we trust random device in first place?
• What is the economic model? Who would deploy these devices? Why?

Are these good papers?
• What were the authors’ goals?
• What about the evaluation/metrics?
• Did they convince you that this was a good system/approach?
• Were there any red-flags?
• What mistakes did they make?
• Does the system/approach meet the “Test of Time” challenge?
• How would you review these papers today?