Rao Lei_OS2ATC-NVMe VFIO live migration-leirao@intel
1 .NVMe VFIO Live Migration for IPU/DPU Devices
Rao Lei, Cloud Software Development Engineer, Intel
2 .Agenda
• Background and Motivations
• VFIO Live Migration Introduction
• NVMe VFIO Live Migration Design & Implementation
  ▪ NVMe Controller Internal Data Live Migration
  ▪ Dirty Page Tracking
  ▪ Changes to NVMe Specification
• Status & Plan
3 .Background and Motivations
• Intel ASIC IPU View
  ▪ Compute complex
    ➢ 16 ARM Neoverse N1 cores @ 3 GHz
    ➢ PCIe 4.0 x16
    ➢ 3 LPDDR4X channels, up to 48 GB DRAM
    ➢ 2 ARM Cortex-A53 cores for management
    ➢ Tightly coupled to HW accelerators
  ▪ Hardware accelerators
    ➢ Networking: 2x100GbE or 1x200GbE, flexible packet processor, RDMA, traffic shaper and QoS, OVS, virtio-net
    ➢ Storage: NVMe controller (SR-IOV), NVMe over Fabrics, AES-XTS encryption, virtio-blk
    ➢ Compute: live migration engine, address translation engine, DMA engine, inline and lookaside crypto with compression
4 .Background and Motivations
• PCIe SR-IOV
  ▪ Pros
    ➢ Software simplicity
    ➢ IOMMU-based DMA isolation (per BDF)
  ▪ Cons
    ➢ Fixed resource allocation
    ➢ Lack of composability
    ➢ No live migration support
[Figure: VM application, IOMMU, DMA (BDF), and a device exposing a PF and VF1 ... VFn, each with its own BDF and queues (Q), on top of backend resources.]
5 .VFIO Live Migration - Introduction
• VFIO pass-through device live migration
• PCIe config space migration
  ➢ Emulated by the kernel and QEMU
• Device internal data migration
  ➢ Additional DMA commands (piggybacked on the existing DMA descriptor queue)
  ➢ Additional migration registers in the VF MMIO space
• Dirty page tracking
  ➢ Leverage IOMMU dirty bit tracking (available from the Intel SPR platform onward)
  ➢ Use the on-device dirty page tracking engine on legacy platforms
6 .VFIO Live Migration - Introduction • Architecture overview
7 .VFIO Live Migration - Introduction
• VFIO Migration State Transition
  ▪ Source: Running -> Stop -> Stop_Copy -> Stop
    ➢ Running -> Stop (VFIO_DEVICE_STATE_STOP): suspend the device
    ➢ Stop -> Stop_Copy (VFIO_DEVICE_STATE_STOP_COPY): save the device internal data and return the data fd to user space
    ➢ Stop_Copy -> Stop (VFIO_DEVICE_STATE_STOP): close the data fd
  ▪ Target: Stop -> Resume -> Stop -> Running
    ➢ Stop -> Resume (VFIO_DEVICE_STATE_RESUMING): return the data fd to user space
    ➢ Resume -> Stop (VFIO_DEVICE_STATE_STOP): load the device internal data and close the data fd
    ➢ Stop -> Running (VFIO_DEVICE_STATE_RUNNING): restart the device
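The transitions on this slide can be written down as a small lookup table. The following is only an illustrative sketch: the `State` enum, `TRANSITIONS` table, and `walk` helper are hypothetical names, and the enum values use the Linux VFIO uAPI spelling (`VFIO_DEVICE_STATE_RESUMING` for the slide's Resume state). It models the sequence of actions, not the real ioctl interface.

```python
from enum import Enum


class State(Enum):
    # Values follow the Linux VFIO uAPI names for the states on the slide.
    RUNNING = "VFIO_DEVICE_STATE_RUNNING"
    STOP = "VFIO_DEVICE_STATE_STOP"
    STOP_COPY = "VFIO_DEVICE_STATE_STOP_COPY"
    RESUMING = "VFIO_DEVICE_STATE_RESUMING"


# (current state, next state) -> action the device driver performs.
TRANSITIONS = {
    (State.RUNNING, State.STOP): "suspend the device",
    (State.STOP, State.STOP_COPY): "save the device internal data and return the data fd to user space",
    (State.STOP_COPY, State.STOP): "close the data fd",
    (State.STOP, State.RESUMING): "return the data fd to user space",
    (State.RESUMING, State.STOP): "load the device internal data and close the data fd",
    (State.STOP, State.RUNNING): "restart the device",
}

SOURCE_PATH = [State.RUNNING, State.STOP, State.STOP_COPY, State.STOP]
TARGET_PATH = [State.STOP, State.RESUMING, State.STOP, State.RUNNING]


def walk(path):
    """Return the actions for a state path, rejecting arcs not on the slide."""
    actions = []
    for cur, nxt in zip(path, path[1:]):
        if (cur, nxt) not in TRANSITIONS:
            raise ValueError(f"unsupported transition {cur.name} -> {nxt.name}")
        actions.append(TRANSITIONS[(cur, nxt)])
    return actions


if __name__ == "__main__":
    for name, path in (("source", SOURCE_PATH), ("target", TARGET_PATH)):
        for action in walk(path):
            print(f"{name}: {action}")
```

Keeping the arcs in one table makes it easy to reject a path the slide does not show, such as jumping from Running straight to Stop_Copy.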
8 .NVMe VFIO Live Migration Design • NVMe Controller Internal Data Live Migration Design
9 .NVMe VFIO Live Migration Design
• NVMe Commands for Controller Internal Data Live Migration

Suspend/Resume command and Query command (same layout)
Bytes    Description
03:00    The field is common to all commands
39:04    Reserved
41:40    VF Index
63:42    Reserved

Save LM data command and Load LM data command (same layout)
Bytes    Description
03:00    The field is common to all commands
23:04    Reserved
31:24    PRP Entry1
39:32    PRP Entry2
41:40    VF Index
63:42    Reserved
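As a rough illustration of the byte layout on this slide, the sketch below packs a 64-byte command with the PRP entries at bytes 31:24 and 39:32 and the VF index at bytes 41:40. The helper name `build_lm_cmd` is hypothetical, and it assumes that bytes 03:00 follow the standard NVMe Command Dword 0 layout (opcode, flags, command identifier) and that all fields are little-endian. The opcode 0xD2 is the Save_Data value listed on the later "New NVMe Live Migration Commands" slide.

```python
import struct

CMD_SIZE = 64  # size of an NVMe submission queue entry in bytes


def build_lm_cmd(opcode, cid, vf_index, prp1=0, prp2=0):
    """Pack a live-migration command following the slide's byte offsets.

    Assumption: bytes 03:00 are the usual Command Dword 0
    (opcode, flags byte left at 0, 16-bit command identifier).
    """
    cmd = bytearray(CMD_SIZE)                      # reserved bytes stay zero
    struct.pack_into("<BBH", cmd, 0, opcode, 0, cid)
    struct.pack_into("<QQ", cmd, 24, prp1, prp2)   # bytes 31:24 and 39:32
    struct.pack_into("<H", cmd, 40, vf_index)      # bytes 41:40
    return bytes(cmd)


if __name__ == "__main__":
    save = build_lm_cmd(0xD2, cid=1, vf_index=3, prp1=0x1000_0000)
    print(save.hex())
```

For Suspend, Resume, and Query the PRP arguments can simply be left at zero, since those bytes are reserved in that layout.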
10 .NVMe VFIO Live Migration Design • Dirty Page Tracking Design
11 .NVMe VFIO Live Migration Design
• NVMe Live Migration Commands for Dirty Page Tracking

DMA logging start command
Bytes    Description
03:00    The field is common to all commands
23:04    Reserved
31:24    PRP Entry1
39:32    PRP Entry2
41:40    VF index
43:42    Reserved
47:44    size
63:48    Reserved

Buffer referenced by the PRP entries (from the slide figure):
  ➢ iova range count
  ➢ page size
  ➢ iova_ranges[]: each entry holds an iova start addr and an iova length
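The DMA logging start buffer carries a page size and a list of IOVA ranges, which together determine how many pages the device has to track. The following sketch shows that arithmetic only. `IovaRange`, `pages_in_range`, and `bitmap_bytes` are hypothetical names, and the one-bit-per-page bitmap size is an assumption, since the slides do not spell out the bitmap encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IovaRange:
    start: int   # iova start addr
    length: int  # iova length in bytes


def pages_in_range(r, page_size):
    """Number of pages touched by the range, including partial pages."""
    if r.length <= 0 or page_size <= 0:
        raise ValueError("length and page_size must be positive")
    first = r.start // page_size
    last = (r.start + r.length - 1) // page_size
    return last - first + 1


def bitmap_bytes(r, page_size):
    """Bytes needed for a bitmap with one bit per tracked page (assumed)."""
    return (pages_in_range(r, page_size) + 7) // 8


if __name__ == "__main__":
    ranges = [IovaRange(0x1000, 0x3000), IovaRange(0x10_0000, 0x20_0000)]
    for r in ranges:
        print(hex(r.start), pages_in_range(r, 4096), bitmap_bytes(r, 4096))
```

An unaligned range counts every page it touches, so a 4 KiB range that starts in the middle of a page spans two pages.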
12 .NVMe VFIO Live Migration Design
• NVMe Live Migration Commands for Dirty Page Tracking

DMA logging report command
Bytes    Description
03:00    The field is common to all commands
23:04    Reserved
31:24    PRP Entry1
39:32    PRP Entry2
41:40    VF index
43:42    Reserved
51:44    IOVA
59:52    Length
63:60    Reserved

DMA logging stop command
Bytes    Description
03:00    The field is common to all commands
39:04    Reserved
41:40    VF Index
63:42    Reserved
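The report command takes an IOVA and a length and returns a dirty page bitmap through the PRP buffer. A minimal sketch of how a host might decode such a bitmap follows. The bit ordering is an assumption (bit i, least significant bit first within each byte, stands for page i of the queried range), and `dirty_pages` is a hypothetical helper, not code from the RFC patch.

```python
def dirty_pages(bitmap, iova, length, page_size):
    """Return the base addresses of pages marked dirty in the bitmap.

    Assumption: bit i (LSB-first within each byte) corresponds to the
    i-th page of the range [iova, iova + length).
    """
    if length <= 0 or page_size <= 0:
        raise ValueError("length and page_size must be positive")
    base = iova - (iova % page_size)               # align down to a page
    npages = (iova + length - 1) // page_size - iova // page_size + 1
    if len(bitmap) * 8 < npages:
        raise ValueError("bitmap too short for the requested range")
    result = []
    for i in range(npages):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            result.append(base + i * page_size)
    return result


if __name__ == "__main__":
    # Four 4 KiB pages starting at 0x10000; pages 0 and 2 are dirty.
    for addr in dirty_pages(bytes([0b0101]), 0x10000, 0x4000, 0x1000):
        print(hex(addr))
```

The returned addresses are what the migration loop would re-send to the target before the final stop-and-copy phase.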
13 .Changes to NVMe Specification
• New NVMe Live Migration Commands

Group         Command             Opc    Description
Device State  Query               0xC4   Query the NVMe VF internal data size
Device State  Suspend             0xC8   Suspend the NVMe VF controller on the source
Device State  Resume              0xCC   Resume the NVMe VF controller on the target
Device State  Save_Data           0xD2   Save the NVMe VF internal data on the source
Device State  Load_Data           0xD5   Load the NVMe VF internal data on the target
Dirty Page    DMA_Logging_Start   0xD9   Start DMA logging to track dirty pages
Dirty Page    DMA_Logging_Stop    0xDC   Stop the DMA logging to track dirty pages
Dirty Page    DMA_Logging_Report  0xE2   Report the dirty pages bitmap
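For reference, the opcodes on this slide map naturally onto an enumeration. The sketch below is only a convenience for reading the table: the `LmOpcode` class and `group_of` helper are hypothetical names, while the numeric values and the two groups are taken from the slide.

```python
from enum import IntEnum


class LmOpcode(IntEnum):
    QUERY = 0xC4
    SUSPEND = 0xC8
    RESUME = 0xCC
    SAVE_DATA = 0xD2
    LOAD_DATA = 0xD5
    DMA_LOGGING_START = 0xD9
    DMA_LOGGING_STOP = 0xDC
    DMA_LOGGING_REPORT = 0xE2


# Grouping as labelled on the slide.
DIRTY_PAGE = frozenset({
    LmOpcode.DMA_LOGGING_START,
    LmOpcode.DMA_LOGGING_STOP,
    LmOpcode.DMA_LOGGING_REPORT,
})


def group_of(opcode):
    """Return the slide's group label for a live-migration opcode."""
    op = LmOpcode(opcode)  # raises ValueError for an unknown opcode
    return "Dirty Page" if op in DIRTY_PAGE else "Device State"


if __name__ == "__main__":
    for op in LmOpcode:
        print(f"{op.name:<20} 0x{op.value:02X}  {group_of(op)}")
```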
14 .Changes to NVMe Specification
• Extension of the Identify Controller Data Structure for Live Migration and DMA Logging

Bytes       Description
01:00       PCI Vendor ID (VID)
03:02       PCI Subsystem Vendor ID (SSVID)
3071:04     …
3072        Live Migration Support (LME) [Device State]
3073        DMA Logging Support (DLS) [Dirty Page]
4095:3074   Vendor-Specific
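A host driver would read these two bytes from the 4096-byte Identify Controller data to decide whether live migration and DMA logging are available. The following is a minimal sketch of that check. `parse_lm_caps` is a hypothetical helper, and treating any non-zero byte as "supported" is an assumption, since the slide does not define the encoding of the LME and DLS fields.

```python
import struct

IDENTIFY_SIZE = 4096
LME_OFFSET = 3072  # Live Migration Support (LME)
DLS_OFFSET = 3073  # DMA Logging Support (DLS)


def parse_lm_caps(identify):
    """Extract VID, SSVID and the two proposed capability bytes."""
    if len(identify) != IDENTIFY_SIZE:
        raise ValueError("Identify Controller data must be 4096 bytes")
    vid, ssvid = struct.unpack_from("<HH", identify, 0)  # bytes 01:00, 03:02
    return {
        "vid": vid,
        "ssvid": ssvid,
        # Assumption: a non-zero byte means the feature is supported.
        "lme": identify[LME_OFFSET] != 0,
        "dls": identify[DLS_OFFSET] != 0,
    }


if __name__ == "__main__":
    data = bytearray(IDENTIFY_SIZE)
    struct.pack_into("<HH", data, 0, 0x8086, 0x8086)
    data[LME_OFFSET] = 1
    print(parse_lm_caps(bytes(data)))
```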
15 .Status & Plan
• Status
  ▪ Architect
    ➢ Eddie Dong (eddie.dong@intel.com)
  ▪ Key developers
    ➢ Rao Lei (lei.rao@intel.com)
    ➢ Yanfei Xu (yanfei.xu@intel.com)
  ▪ RFC patch progress and GitHub repo
    ➢ RFC patch to support NVMe live migration for IPU/DPU devices: https://lore.kernel.org/lkml/20221206055816.292304-1-lei.rao@intel.com/
    ➢ https://github.com/raolei-intel/linux-nvme-live-migration.git
• Plan
  ▪ Merge the new NVMe live migration commands into the NVMe specification
  ▪ Call for cooperation and discussion
16 .Q&A