DPDK - Accelerate Remote Rendering of Cloud Gaming


1. DPDK: Accelerate Remote Rendering of Cloud Gaming. Jingjing Wu & Owen Zhang, Intel. DPDK Summit China 2019.

2. Agenda
• Cloud Gaming Background
• Data Path for Remote Rendering
• Solution & Work Status
• Future Work

3. Background - Cloud Gaming
• Cloud Gaming: a fast-evolving ecosystem.
• Streamed frames, files or commands from cloud/edge to device.
• $1B business in 2017, projected to grow at 26% [1].
• "Google’s Project Stream is a working preview of the future of game streaming" (The Verge, Oct 8, 2018)
• "Microsoft’s xCloud service streams Xbox games to PCs, consoles, and mobile devices" (The Verge, Oct 8, 2018)
• "Amazon is building a cloud gaming service. Here’s the evidence" (The Verge, Jan 10, 2019)
[1] Zion Market Research, “Cloud Gaming Market by Cloud Type (Public, Private, and Hybrid), by Streaming Type (Video and File), and by Device (Smart Phones, Tablets, Gaming Consoles, and PCs): Global Industry Perspective, Comprehensive Analysis, and Forecast, 2018-2026”

4. Background - VCA 2 Introduction
• Add-in card for Intel® Xeon Processor-based server systems.
• Powered by the Intel® Xeon Processor E3-1500 v5 with Intel® Iris Pro Graphics P580 and Intel® Quick Sync Video.
• Outstanding TCO for media transcoding & rendering applications.
• Learn more: intel.com/accelerators

5. Android Cloud Gaming Overview
• Operator: easy to gain more users.
• End user: easy to play new games.
• Developer: easy to make better games.
• Cloud gaming services are deployed in a data center or edge server.
[Figure: game servers in the data center communicate with game clients. E5 servers with VCA2 (3x E3 SKL) or a future GPU card send the video stream to the clients and receive user input from them.]

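6. Software Stack
[Figure: software stack across three nodes.]
• Client Device: Client App with Input and Media Player (video).
• E5 Server in Data Center or Edge: virtual machine running Android in Container (AIC) with Game App, Android Framework, Mesa (GLX + UMD OpenGL) and the Remote Render frontend, on a Linux kernel with the VCA driver.
• Visual Cloud Acceleration Card (VCA2): Remote Render backend and Video Stream server, on a Linux kernel with the VCA driver, drm_drv and the Intel i915 KMD.
• Flows: command stream from the Remote Render frontend to the backend, video from the Video Stream server to the client, input from the client.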

7. Characteristics of the remote rendering data path
• Game frames flow from the Server to the Accelerator Card.
• Video streams flow from the Accelerator Card to the Server.
• Stream-based, socket-like interface.
• Flow transactions between Server and Accelerator are isolated from data center networking.
• Scales to support multiple VMs.
• Last but not least: performance obsessed.
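A stream-based socket carries bytes without message boundaries, so the sender needs some way to mark where each game frame ends. The slides do not describe the wire format. The sketch below assumes a hypothetical 4-byte big-endian length prefix, and the helper names are illustrative only.

```python
import socket
import struct

# Hypothetical framing: 4-byte big-endian length, then the payload.
# The slides do not specify the actual wire format.
HEADER = struct.Struct("!I")


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one frame as <length><payload> over a stream socket."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, raising if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before a full frame arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed frame and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

The helpers only rely on `sendall` and `recv`, so they should work with any connected stream socket, whatever its address family.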

8. Stream type socket w/o IP
[Figure: Android in Container (AIC) in the virtual machine runs the graphic app and the Remote Render frontend. The VCA2 node runs the Remote Render backend and the video stream server. Each side uses a stream-type socket on top of a device.]

Socket Family   Device           IP
AF_INET         PF passthrough   N/A
AF_INET         virtio_net       Yes
AF_VSOCK        virtio_vsock     No

9. Scale for multiple VMs
[Figure: several devices between the virtual machines and the VCA2 node, with an open question of how they connect.]

Socket Family   Device           IP    Multi-VM
AF_INET         PF passthrough   N/A   No
AF_INET         virtio_net       Yes   Yes (Switch/Router)
AF_VSOCK        virtio_vsock     No    Yes

10. Data Path solution
• AF_VSOCK
• Classic sockets API
• QEMU+KVM compatible (virtio-vsock device)
• Bi-directional between hypervisor and VMs (context id + port)
• Lightweight transport layer
INTEL CONFIDENTIAL Doc #xxxxx
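Because AF_VSOCK uses the classic sockets API, an endpoint is addressed by a (context id, port) pair instead of an IP address and port. The following is a minimal sketch using Python's standard `socket` module on Linux. The port number and function names are assumptions for illustration, and the slides do not say which port the render service listens on.

```python
import socket

# Hypothetical port for illustration; not taken from the slides.
RENDER_PORT = 5000
U32_MAX = 0xFFFFFFFF


def vsock_available() -> bool:
    """True if this Python build exposes the AF_VSOCK address family."""
    return hasattr(socket, "AF_VSOCK")


def vsock_address(cid: int, port: int) -> tuple:
    """Validate and build an AF_VSOCK address; both fields are 32-bit."""
    for name, value in (("cid", cid), ("port", port)):
        if not 0 <= value <= U32_MAX:
            raise ValueError(f"{name} must fit in an unsigned 32-bit integer")
    return (cid, port)


def listen_vsock(port: int = RENDER_PORT) -> socket.socket:
    """Listen on the given port for connections from any context id."""
    srv = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    srv.bind(vsock_address(socket.VMADDR_CID_ANY, port))
    srv.listen(1)
    return srv


def connect_vsock(cid: int, port: int = RENDER_PORT) -> socket.socket:
    """Connect to the peer identified by its context id and port."""
    cli = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    cli.connect(vsock_address(cid, port))
    return cli
```

One side would call `listen_vsock()` and `accept()`, and the other would call `connect_vsock(cid)` with the context ID configured for its peer. After that, both ends use ordinary `send` and `recv` calls.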

11. Data Path Traffic Flow
Game frame -> video stream data path:
1. The IRR client receives a game frame and pushes it to the VM kernel vsock for transmission.
2. A user space driver (Vhost vsock PMD), which emulates the virtio backend ring Rx/Tx for virtio vsock, receives the packet from the VM vsock device.
3. Traffic is forwarded between the vhost user device and the virtio vsock backend driver for the VCA vsock device.
4. A user space driver (Virtio vsock NTB PMD), which uses NTB to emulate the virtio backend ring Rx/Tx for the VCA virtio vsock, sends the packet to the VOP device.
5. The IRR server receives the frame, renders it, and encodes it into video streams using OpenGL, UMD and so on.
Note: VOP vsock controls NTB and maps the remote resource ring according to the designed ring format (virtio-like). Memory mapping is done by the control path.
[Figure: Remote Render Client in an Android container on the VM (AF_VSOCK, virtio vsock driver, VM virtio vsock dev); DPDK on Host-E5 (Vhost vsock PMD, Virtio vsock NTB PMD, VOP vsock in kernel space); Remote Render Server on VCA2 (AF_VSOCK, VOP vsock driver, VOP device). Numbers 1 to 5 mark the steps above.]

12. Workflow
[Figure: Container/Iperf on the VM (AF_VSOCK, virtio vsock driver, VM virtio vsock dev) and Container/Iperf on VCA2 (AF_VSOCK, VOP vsock driver, VOP device), connected through DPDK.]
1. Bring up the VCA2 card and configure the context ID for the node on the card.
2. Set up the DPDK environment as usual.
3. Start the DPDK application with two ports:
   ./examples/vsock_fwd -l 21-24 -n 4 --socket-mem 1024,1024 --vdev="net_vsock0,iface=/tmp/dpdk-vca0.sock,dequeue-zero-copy=1" --vdev="vop_user0,path=/dev/vop_virtio00,iface=vop"
4. Bring up the VM with the vhost-user vsock device (QEMU options):
   -chardev socket,id=vus0,path=/tmp/dpdk-vca.sock -device vhost-user-vsock-pci,chardev=vus0,id=vsock-pci0,guest-cid=8
5. Run applications/Iperf in the VM and on the accelerator.
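The last step runs a throughput test across the data path. As a rough illustration of what such a measurement does, the sketch below pushes a fixed number of bytes through a pair of connected stream sockets and reports the rate. It is not Iperf and not the authors' tool. The function name and default chunk size are assumptions, and a real test would run the sender and receiver on different nodes.

```python
import socket
import threading
import time


def measure_throughput(tx: socket.socket, rx: socket.socket,
                       total_bytes: int, chunk_size: int = 64 * 1024) -> float:
    """Send total_bytes from tx to rx and return the rate in bytes/second."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    chunk = bytes(chunk_size)

    def sender() -> None:
        remaining = total_bytes
        while remaining > 0:
            n = min(remaining, chunk_size)
            tx.sendall(chunk[:n])
            remaining -= n

    worker = threading.Thread(target=sender)
    start = time.perf_counter()
    worker.start()

    received = 0
    while received < total_bytes:
        data = rx.recv(chunk_size)
        if not data:  # peer closed before everything arrived
            break
        received += len(data)
    elapsed = time.perf_counter() - start
    worker.join()

    if received != total_bytes:
        raise ConnectionError("stream ended before all bytes were received")
    return received / elapsed
```

The helper accepts any connected stream socket pair. A quick local check with `socket.socketpair()` exercises the logic without any vsock hardware.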

13. Result
• 15 games ran successfully on one node, as expected.

14. Future Work
• Further Cloud Gaming stack integration and tuning
• Optimization
  • Remote memory access optimization
  • Enlarge buffers to improve efficiency
  • Enable DMA/CBDMA for buffer moving
  • Zero-copy on the receive side

15. Thanks
Jingjing Wu, jingjing.wu@intel.com
Owen Zhang, owen.zhang@intel.com


17. Components (VM <-> Host)
• Qemu: vhost vsock user support.
• DPDK: polling mode driver of the vhost vsock ring.
• Tools: enable AF_VSOCK on Iperf.
• DPDK app: forwarding without dropping.
[Figure: Media IRR applications on the VM use AF_VSOCK in user space; the kernel virtio vsock driver and the VM virtio vsock dev connect to the Vhost vsock PMD in DPDK.]
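Forwarding without dropping means that when the destination ring is full, the forwarder keeps the remaining packets and retries instead of discarding them. The sketch below models that behaviour with a small bounded queue in plain Python. The `Ring` class and `forward_no_drop` function are hypothetical stand-ins, not DPDK APIs and not the authors' application.

```python
from collections import deque


class Ring:
    """Bounded FIFO standing in for a virtio-style ring (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._q = deque()

    def enqueue_burst(self, pkts: list) -> int:
        """Enqueue as many packets as fit; return how many were accepted."""
        free = self.capacity - len(self._q)
        accepted = pkts[:free]
        self._q.extend(accepted)
        return len(accepted)

    def dequeue_burst(self, max_pkts: int) -> list:
        """Remove and return up to max_pkts packets in FIFO order."""
        n = min(max_pkts, len(self._q))
        return [self._q.popleft() for _ in range(n)]

    def __len__(self) -> int:
        return len(self._q)


def forward_no_drop(rx: Ring, tx: Ring, drain_tx, burst: int = 32) -> int:
    """Move one burst from rx to tx, retrying until every packet is queued.

    drain_tx is a callback that stands in for the consumer freeing slots in
    tx; if it never frees space, this loop spins forever, which is the price
    of never dropping.
    """
    pkts = rx.dequeue_burst(burst)
    sent = 0
    while sent < len(pkts):
        n = tx.enqueue_burst(pkts[sent:])
        if n == 0:
            drain_tx()
        sent += n
    return sent
```

Retrying preserves ordering and delivers every packet, which suits a stream transport, at the cost of stalling the receive side while the transmit ring is full.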

18. Components (Host <-> Accelerator)
• DPDK
  • Polling mode driver of the VOP vsock ring, based on NTB.
• VCA kernel driver
  • Virtio vsock driver based on NTB (VCA2 side).
  • Interface provided to user space to map the NTB BAR and trigger events.
[Figure: on Host-E5, DPDK runs the Virtio vsock NTB PMD on top of VOP vsock; NTB over PCIe connects to the VOP vsock driver on VCA2, where the IRR Server uses AF_VSOCK.]