Practice of ovsdpdk in Baidu


1. Baidu vSwitch Hot Upgrade: a new way to upgrade the vswitch with nearly zero downtime
Yuan Linsi, Baidu AI Cloud

2. Agenda
• Evolution of Virtual Network Data Plane
• Challenge
• Optional Solutions
• Our Solutions
• The requirement and design goals
• Design
• Benefits
• Further work

3. Evolution of Virtual Network Data Plane
[Figure: four data-plane architectures, each connecting VMs/containers through virtio-net to a physical NIC (phy-nic). Labels in the figure: ovs-kernel, ovs-dpdk, ctrlpath, datapath, traffic-relay, virtio-emulation.]

4. Evolution of Virtual Network Data Plane
Key advantages:
• High performance
• Low latency
• Lower CPU overhead, higher efficiency
[Chart: throughput in pps (Mpps), axis from 0 to 12, for ovs-kernel, ovs-dpdk and hw-offload; the individual values are not recoverable.]
* In cooperation with Mellanox

5. Challenge: how to upgrade?
• The upgrade needs to work in different scenarios, especially with SmartNICs
• The upgrade must not affect customers' services
• The larger the cluster, the more complex the problem becomes

6. Optional Solutions
Solution 1: restart process
Upgrade procedure:
• Saving flows
• Exiting ovsdb-server
• Starting ovsdb-server
• stop forwarding
• flow restore wait
• start_forwarding
• restore flow
• flow restore complete
[Diagram label alongside the procedure: downtime]
Advantage:
• Works for both ovs-kernel and ovs-dpdk
• No extra resource required
Problem:
• Break time is too long to be acceptable
• Break time is unpredictable

7. Optional Solutions
Solution 2: two-process backup
Upgrade procedure:
• The primary process holds the resources, the secondary process handles the traffic
• Directly restart the secondary vswitch
• Skip the initialization
Advantage:
• Break time is predictable
• No extra resource required
Problem:
• Only works for ovs-dpdk
• Break time is still in seconds
[Diagram: primary vswitch and secondary vswitch connected through shared memory, with phy-nic and VM.]

8. Optional Solutions
Solution 3: dual main-process
Upgrade procedure:
• The vswitch runs on top of a VF
• Start a new process and restore the memory status
• Switch traffic to the new process
Advantage:
• Break time is predictable
• Millisecond break time
Problem:
• Only works for ovs-dpdk
• Requires extra resources
[Diagram: primary vswitch and new vswitch, each attached to its own VF, with phy-nic and VM.]

9. Requirement and Design Goals
• The solution needs to work in multiple different scenarios
• No extra resource required
• The break time is predictable and minimal

10. Summary of three Solutions
All of the solutions share something in common:
• All operations are process-based
• The essence of the restore operations is trying to restore the memory status
[Figure: three panels, each showing the memory status held by a process. Panel labels: original process, restart, new process; primary process, secondary process, restart; primary process, switch, new process.]

11. Hot upgrade Design Overview
Key points:
• Restart threads instead of processes
• Hot upgrade via dynamic library hot replace
• Memory status sync up

12. Hot upgrade Solutions
• Key point 1: restart threads instead of processes
[Figure: thread life cycle. The thread is stopped at a quiescent state, liba.so.0.0.1 and its runtime memory status are hot replaced by liba.so.0.0.2 and its runtime memory status, then the thread is started again.]
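The thread life cycle above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the implementation from the talk: `process_v1` and `process_v2` are hypothetical stand-ins for the same entry point in the old and new library, and the "hot replace" is reduced to swapping one function pointer while the worker thread is stopped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*process_fn)(int);

/* Hypothetical stand-ins for one entry point of the old and new library. */
static int process_v1(int pkt) { return pkt + 1; }
static int process_v2(int pkt) { return pkt + 2; }

/* Written only while the worker is stopped; create/join order the accesses. */
static process_fn current_impl = process_v1;
static atomic_bool stop_requested;
static atomic_long iterations;
static atomic_int last_result;

static void *worker_main(void *arg)
{
    (void)arg;
    /* The loop boundary is the quiescent state: no library code is on the
     * stack at the moment the stop flag is checked. */
    while (!atomic_load(&stop_requested)) {
        atomic_store(&last_result, current_impl(0));
        atomic_fetch_add(&iterations, 1);
    }
    return NULL;
}

static int start_worker(pthread_t *tid)
{
    assert(tid != NULL);
    atomic_store(&stop_requested, false);
    return pthread_create(tid, NULL, worker_main, NULL);
}

static int stop_worker(pthread_t tid)
{
    atomic_store(&stop_requested, true);
    return pthread_join(tid, NULL); /* returns once the thread has exited */
}

/* stop thread -> hot replace -> start thread, inside one living process */
static int hot_upgrade(pthread_t *tid, process_fn new_impl)
{
    assert(tid != NULL && new_impl != NULL);
    if (stop_worker(*tid) != 0)
        return -1;
    current_impl = new_impl;
    return start_worker(tid);
}
```

A caller would run `start_worker(&tid)`, later `hot_upgrade(&tid, process_v2)`, and finally `stop_worker(tid)`. The process, its heap and its open resources stay alive throughout; only the thread is restarted.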

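13. Hot upgrade Solutions
• Key point 2: dynamic library hot replace
[Figure: ELF lookup chain. An ElfW(Rela) entry carries r_offset, which locates the .got slot, and r_info; ELF64_R_SYM(r_info) selects an ElfW(Sym) entry, whose st_name indexes dynstr to give the symbol name. The .got/.plt reference into liba.so.0.0.1 is crossed out and re-pointed to the same symbol in liba.so.0.0.2.]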
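The lookup chain on this slide can be modeled with the real `<elf.h>` types over a miniature, hand-built "image". Everything in this sketch is an assumption made for illustration: the arrays `dynstr`, `dynsym`, `rela_plt` and `got`, and the functions `foo_v1` and `foo_v2`, are toy stand-ins. In a real process the tables would be found through the DT_STRTAB, DT_SYMTAB and DT_JMPREL entries of the dynamic section, and r_offset would be an address relative to the load base rather than an offset into a local array.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(void);

/* Toy stand-ins for the same function in liba.so.0.0.1 and liba.so.0.0.2. */
static int foo_v1(void) { return 1; }
static int foo_v2(void) { return 2; }

/* Miniature .dynstr: offset 1 is "foo", offset 5 is "bar". */
static const char dynstr[] = "\0" "foo\0" "bar";

/* Miniature .dynsym: index 0 is the reserved null symbol. */
static const Elf64_Sym dynsym[] = {
    { .st_name = 0 },
    { .st_name = 1 },   /* foo */
    { .st_name = 5 },   /* bar */
};

/* Miniature .got: slot 0 currently points into the "old library". */
static fn_t got[2] = { foo_v1, NULL };

/* Miniature .rela.plt: r_offset is an offset into got[] in this model. */
static const Elf64_Rela rela_plt[] = {
    { .r_offset = 0 * sizeof(fn_t), .r_info = ELF64_R_INFO(1, R_X86_64_JUMP_SLOT) },
    { .r_offset = 1 * sizeof(fn_t), .r_info = ELF64_R_INFO(2, R_X86_64_JUMP_SLOT) },
};

/* r_info -> ELF64_R_SYM -> Elf64_Sym.st_name -> dynstr gives the name;
 * r_offset gives the .got slot that belongs to that name. */
static fn_t *find_got_slot(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(rela_plt) / sizeof(rela_plt[0]); i++) {
        const Elf64_Sym *sym = &dynsym[ELF64_R_SYM(rela_plt[i].r_info)];
        if (strcmp(dynstr + sym->st_name, name) == 0)
            return (fn_t *)((char *)got + rela_plt[i].r_offset);
    }
    return NULL;
}

/* Hot replace: overwrite the slot so later calls reach the new library. */
static int hot_replace(const char *name, fn_t new_impl)
{
    fn_t *slot = find_got_slot(name);
    if (slot == NULL)
        return -1;
    *slot = new_impl;
    return 0;
}

/* What a PLT stub effectively does: an indirect jump through the slot. */
static int call_foo(void) { return got[0](); }
```

Before the replace, `call_foo()` lands in `foo_v1`; after `hot_replace("foo", foo_v2)` the same call site lands in `foo_v2` without being recompiled, which is the property the hot upgrade relies on.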

14. Hot upgrade Solutions
• Key point 3: memory status sync up
• What kind of memory? Only statically allocated memory
• Why?
    0000000000000683 <goo>:
     683: 55                      push %rbp
     684: 48 89 e5                mov  %rsp,%rbp
     687: 8b 05 d3 03 20 00       mov  0x2003d3(%rip),%eax   # 200a60 <xyz.2057>
[Figure: liba.0.0.so and liba.0.1.so each carry their own .data and .bss sections; the figure shows int xyz = 100 in one library and int xyz = 0 in the other.]
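The disassembly shows why static memory is the part that needs syncing: `goo` reads `xyz` at a fixed offset from its own instruction pointer, so code in the new library reads the new library's copy of `xyz`, which starts at its initial value rather than the value the old library had reached at run time. Heap allocations, by contrast, stay where they are because the process keeps running. A minimal sketch of the sync step follows, assuming a hypothetical `struct static_sym` descriptor (name, address, size) for each static object in the old and new library; how such descriptors are extracted from the ELF symbol tables is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one statically allocated object (.data/.bss). */
struct static_sym {
    const char *name;  /* symbol name, e.g. "xyz" */
    void *addr;        /* run-time address of the object */
    size_t size;       /* object size in bytes */
};

/* Copy the run-time value of every static object that exists in both
 * libraries under the same name and size. Objects that are new, or whose
 * size changed, keep the new library's initial value.
 * Returns the number of objects copied. */
static int sync_static_memory(const struct static_sym *old_syms, size_t n_old,
                              const struct static_sym *new_syms, size_t n_new)
{
    int copied = 0;

    assert(old_syms != NULL && new_syms != NULL);
    for (size_t i = 0; i < n_new; i++) {
        for (size_t j = 0; j < n_old; j++) {
            if (strcmp(new_syms[i].name, old_syms[j].name) == 0 &&
                new_syms[i].size == old_syms[j].size) {
                memcpy(new_syms[i].addr, old_syms[j].addr, new_syms[i].size);
                copied++;
                break;
            }
        }
    }
    return copied;
}
```

Skipping objects whose size differs is a conservative choice made for this sketch: a changed layout cannot be carried over byte for byte and would need a dedicated conversion step.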

15. Hot upgrade Break Time
Prepare:
• Parse executable file ELF sym info
• Load new library
• Parse library file ELF sym info
Hot upgrade (break time):
• Stop thread
• Memory status sync up (700us)
• Hot replace (600us)
• Start thread
Cleanup:
• Clean old ELF sym

16. Hot upgrade Break time in different scenarios
[Figure: three deployment scenarios, each with VMs/containers above a phy-nic. Labels in the figure: ovs-kernel, ovs-dpdk, traffic-relay, virtio-net, datapath, VF. Captions under the panels: "Best case: can be nearly zero", "Optimize the pmd reload", "Best case: can be nearly zero".]

17. Hot upgrade Further Work
We can do even better:
• Sync up the memory status at compile time
Prepare:
• Parse executable file ELF sym info
• Load new library
• Parse library file ELF sym info
Hot upgrade (break time):
• Stop thread
• Memory status sync up
• Hot replace (600us)
• Start thread
Cleanup:
• Clean old ELF sym

18. Hot upgrade Advantage
• Works for both ovs-kernel and ovs-dpdk
• No extra resource required
• Break time nearly zero

19. Acknowledgement
• Zhang Yu
• Mao YingMing
• Wang Li
Welcome to join Baidu AI Cloud! yuanlinsi01@baidu.com

20. Computing, unlimited possibilities. CLOUD.BAIDU.COM