XDP Acceleration Using NIC Metadata
This talk is a continuation of the initial XDP hardware-hints work presented at Netdev 2.1 in Seoul, South Korea.
It starts by showcasing a new prototype that allows an XDP program to request the HW-generated metadata hints it needs from the NIC. The talk shows how the NIC generates hints and the performance characteristics of various XDP applications. We also want to demonstrate how such metadata can benefit applications using AF_XDP sockets.
It then discusses the planned upstreaming ideas and looks forward to further discussion with the community around implementation details, programming flow, and more.
1. Neerav Parikh, PJ Waskiewicz (Intel Corporation, Networking Division); Saeed Mahameed (Mellanox)
Linux Plumbers Conference, Nov. 2018, Vancouver, BC, Canada
2. Overview
• XDP Acceleration – Netdev 2.1 Recap
• XDP Performance Results
  – L4 Load Balancer
  – xdp_tx_ip_tunnel
• XDP NIC Rx Metadata Requirements
• XDP NIC Rx Metadata Programming Model
• Next Steps
3. XDP Acceleration – Netdev 2.1 Recap
Questions posed:
• How do you dynamically program the hardware to get the XDP program the right kind of packet parsing help?
• How do you pass the packet parsing/map lookup hints that the HW provides with every packet into the XDP program so that it can benefit from them?
What can present-day NIC HW do to help?
• Accelerate what is being done in XDP programs in terms of packet processing
• Offset some of the CPU cycles used for packet processing
• Keep it consistent with the XDP philosophy:
  – Avoid kernel changes as much as possible
  – Keep it HW agnostic as much as possible
  – Best-effort acceleration
  – A framework that can change with the changing needs of packet processing
• Expose the flexibility provided by a programmable packet processing pipeline to adapt to XDP program needs
• Help design the next generation of hardware to take full advantage of XDP and the kernel framework
4. Netdev 2.1 Recap – Performance Data
• XDP1: Linux kernel sample; parses the packet to identify the protocol, counts and drops
• XDP3: zero packet parsing (best-case scenario), just drops all packets
• XDP_HINTS: uses the packet type (IPv4/v6, TCP/UDP, etc.) provided by the driver as metadata; no packet parsing, counts and drops
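To make the XDP_HINTS case concrete, here is a minimal sketch of a count-and-drop program that classifies from a driver-provided hint instead of parsing packet bytes. The struct xdp_hints layout, its ptype field, and the ptype_cnt map are assumptions for illustration; the real driver-provided layout is exactly what the rest of this talk negotiates.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Assumed layout: the driver prepends this struct at ctx->data_meta.
 * Both the struct and its field are illustrative, not a kernel ABI. */
struct xdp_hints {
	__u16 ptype;    /* packet type: IPv4/v6 + TCP/UDP, etc. */
};

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 256);
	__type(key, __u32);
	__type(value, __u64);
} ptype_cnt SEC(".maps");

SEC("xdp")
int xdp_hints_count_drop(struct xdp_md *ctx)
{
	void *data_meta = (void *)(long)ctx->data_meta;
	void *data = (void *)(long)ctx->data;
	struct xdp_hints *hints = data_meta;
	__u32 key;
	__u64 *cnt;

	/* Verifier-mandated bounds check: the metadata area ends where
	 * the packet data begins. */
	if ((void *)(hints + 1) > data)
		return XDP_ABORTED;

	/* Classify from the hint instead of touching packet bytes. */
	key = hints->ptype & 0xff;
	cnt = bpf_map_lookup_elem(&ptype_cnt, &key);
	if (cnt)
		(*cnt)++;

	return XDP_DROP;
}

char _license[] SEC("license") = "GPL";
```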
5. L4 Load Balancer Performance
• L4 LB: L4 load balancer sample application with multiple virtual IP tunnels, forwarding packets to a destination based on hash calculation and lookup
• Hints Type 1: protocol type (IPv4/v6, TCP or UDP, etc.)
• Hints Type 2: Hints Type 1 plus additional hints drawn from packet data, such as source/destination IP addresses, source/destination ports, and the packet hash index (RSS) generated by hardware
[Chart: "XDP L4 LB – with no state tracking", packets/s (0–16,000,000) for XDP LB No Hints, Hints Type 1, and Hints Type 2, each with 1 and 4 Rx queues]
6. L4 Load Balancer Performance
No visible advantage in performance from packet-parsing hints alone when the XDP application is doing state tracking and connection management.
[Chart: "XDP L4 LB – with state tracking", packets/s (0–10,000,000) for XDP LB No Hints, Hints Type 1, and Hints Type 2, each with 1 and 4 Rx queues]
https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git/log/?h=XDP-hints-EXPERIMENTAL
7. L4 Load Balancer Performance Analysis (Projected)
[Charts: "XDP L4 LB – with no state tracking" and "XDP L4 LB – with state tracking" (1Q), each comparing PPS without any hints, % improvement in PPS with HW (driver) generated hints, and % change in PPS with SW inline generated hints; labeled changes range from -8% to +77%]
8. xdp_tx_ip_tunnel with HW Flow Mark
• Modified xdp_tx_iptunnel kernel sample
• Needs an extra map, flow2tnl, similar to vip2tnl
• Set up a TC rule to mark packets matching the well-known VIP (dst IP, protocol and dst port) with a unique flow mark
• The XDP Rx metadata includes a flow_mark used to fetch the tunnel from the flow2tnl map
* Saeed Mahameed (Mellanox)
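A sketch of what the modified lookup path could look like. The rx_meta layout carrying the flow mark is an assumption, and struct iptnl_info is a simplified stand-in for the one in the kernel's xdp_tx_iptunnel sample; flow2tnl mirrors the sample's vip2tnl map but is keyed by the HW mark.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Assumed Rx metadata layout carrying the flow mark set per the TC rule. */
struct rx_meta {
	__u32 flow_mark;
};

/* Simplified stand-in for the sample's struct iptnl_info. */
struct iptnl_info {
	__u32 saddr;
	__u32 daddr;
	__u8  dmac[6];
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 256);
	__type(key, __u32);               /* flow mark */
	__type(value, struct iptnl_info);
} flow2tnl SEC(".maps");

SEC("xdp")
int xdp_tx_iptunnel_hints(struct xdp_md *ctx)
{
	void *data_meta = (void *)(long)ctx->data_meta;
	void *data = (void *)(long)ctx->data;
	struct rx_meta *meta = data_meta;
	struct iptnl_info *tnl;

	/* No metadata from the driver: fall back to parsing the packet. */
	if ((void *)(meta + 1) > data)
		return XDP_PASS;

	/* One map lookup on the HW mark replaces VIP parsing + hashing. */
	tnl = bpf_map_lookup_elem(&flow2tnl, &meta->flow_mark);
	if (!tnl)
		return XDP_PASS;

	/* ... build the encap header from *tnl and transmit, as in the
	 * original xdp_tx_iptunnel sample ... */
	return XDP_TX;
}

char _license[] SEC("license") = "GPL";
```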
9. XDP and Rx Metadata Requirements
XDP program to Rx metadata type selection:
• Legacy NICs: fixed, vendor-specific metadata structures provided in Rx descriptors or completions – Intel 82599 (ixgbe), 7xx Series (i40e)
• Programmable NICs: flexible Rx descriptors allow customization of the Rx metadata based on the use case – Intel E800 Series (ice)
Association of Rx metadata type with Rx queues:
• XDP programs should run regardless of whether Rx metadata is enabled – legacy programs should run without requiring metadata
• Granularity of configuration (sketched below):
  – All Rx queues: the same fixed or flexible-format metadata
  – Per Rx queue: fixed or flexible metadata per queue; for example, an XDP program and an AF_XDP-based application on a given Rx queue may each need different information in their Rx metadata
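A strawman of what such a configuration request could look like, purely to make the granularity options concrete. None of these names exist in the kernel; they are invented for illustration.

```c
#include <linux/types.h>

/* Hypothetical: scope at which a metadata format is applied. */
enum xdp_meta_scope {
	XDP_META_ALL_QUEUES,   /* same metadata format on every Rx queue */
	XDP_META_PER_QUEUE,    /* format chosen per Rx queue             */
};

/* Hypothetical: a request handed to the driver when attaching a
 * program (XDP) or binding a socket (AF_XDP) to an Rx queue. */
struct xdp_meta_cfg {
	enum xdp_meta_scope scope;
	__u32 queue_id;        /* used only when scope == PER_QUEUE      */
	__u64 fields;          /* bitmask of requested well-known fields */
};
```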
10. XDP Metadata Programming Model
• Need a mechanism to exchange metadata types or generic type information between the SW driver and XDP programs
• Supported XDP metadata is configured either at XDP program load time or at compile time
(Netdev 2.1 proposal)
11. XDP Metadata Programming Model – Solution Options

Option #1 (Fields Offset Array)
• Well-known XDP metadata types, defined by the kernel
• A program can request any subset of the well-known metadata fields from the driver
• The driver fills the metadata buffer in a pre-defined order according to the requested metadata fields (ascending order by the field enum)
• The user program accesses a specific field via the pre-defined (calculated) offset array:
  flow_mark = xdp->data_meta + offset_array[XDP_META_FLOW_MARK];

Option #2 (BTF)
• BTF support was added in 4.15+ by Facebook to provide eBPF program and map metadata descriptions
• Extend that to NIC metadata programming: describe metadata formats through the driver's ndo_bpf() callback to determine whether the HW can offload/provide a given piece of metadata or not
• Optionally, driver + firmware keep the layout of the metadata in BTF format; a user can query the driver and generate a normal C header file based on the BTF in the given NIC
• During sys_bpf(prog_load) the kernel checks the layout (via the supplied BTF)
• Every NIC can have its own metadata layout and its own meaning for the fields; standardize at least a few common fields, like hash
*Inputs from Saeed Mahameed (Mellanox)
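A sketch of the Option #1 access pattern from the XDP program side, assuming a hypothetical field enum and a loader that fills offset_array at load time after negotiating with the driver (const volatile globals are a libbpf convention used here for illustration):

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical kernel-defined enum of well-known metadata fields. */
enum xdp_meta_field {
	XDP_META_FLOW_MARK,
	XDP_META_HASH,
	XDP_META_NUM_FIELDS,
};

/* Byte offsets of the requested fields inside the metadata area,
 * assumed to be filled in by the loader at program load time
 * (fields land in ascending enum order, per Option #1). */
const volatile __u16 offset_array[XDP_META_NUM_FIELDS];

SEC("xdp")
int use_flow_mark(struct xdp_md *ctx)
{
	void *meta = (void *)(long)ctx->data_meta;
	void *data = (void *)(long)ctx->data;
	__u16 off = offset_array[XDP_META_FLOW_MARK];
	__u32 flow_mark;

	/* Bound the field access before reading it. */
	if (meta + off + sizeof(__u32) > data)
		return XDP_PASS;
	flow_mark = *(__u32 *)(meta + off);

	/* With Option #2 (BTF) this indirection disappears: the program
	 * would instead include a vendor header generated from the
	 * driver's BTF and read a fixed struct field directly. */
	return flow_mark ? XDP_PASS : XDP_DROP;
}

char _license[] SEC("license") = "GPL";
```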
12. XDP Metadata Programming Model – Pros vs. Cons of Option #2 (BTF) Compared to Option #1 (Fields Offset Array)
Pros:
• Allows vendor-defined or vendor-specific offloads to be enabled without requiring kernel support
• The metadata layout is known to the BPF program at load time, so it doesn't need to use offsets at run time
Cons:
• The XDP program has to be compiled/recompiled with the correct metadata type for a given SW+FW+HW combination
• Standardizing some fields comes down to naming conventions between different NIC vendors, and overlap of these fields across vendors may create issues
*Input from Saeed Mahameed (Mellanox)
13. XDP Acceleration Using NIC HW: Current Status
• Rx metadata WIP/RFC-level patches:
  – Intel (WIP): https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git/commit/?h=XDP-hints-EXPERIMENTAL
  – Mellanox:
    • [RFC bpf-next 0/6] XDP RX device meta data acceleration (WIP): https://www.spinics.net/lists/netdev/msg509814.html
    • [RFC bpf-next 2/6] net: xdp: RX meta data infrastructure: https://www.spinics.net/lists/netdev/msg509820.html
    • https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=topic/xdp_metadata&id=5f2908515bf64d72684b2bf902acb1a8d9af2d44
• Alexei and Daniel's proposal on the netdev mailing list: https://www.spinics.net/lists/netdev/msg509820.html
14. XDP Acceleration Using NIC HW: Next Steps
• The community needs to agree on an approach to the Rx metadata programming model that provides flexibility for users across various use cases and applications
• Chaining and metadata placement in the xdp buffer (see the sketch below):
  – Chaining can be achieved easily by calling the bpf_xdp_adjust_meta helper from the chained programs
  – Having the metadata fields sit immediately before the actual packet buffer (xdp->data) is OK, BUT:
    • When bpf_xdp_adjust_head is required (header rewrite) and the metadata buffer is filled, a memmove() of the metadata is required (performance hit)
    • Invalidating the metadata once consumed would break chaining
    • Placing the metadata at xdp_buff.data_hard_start is complicated
*Input from Saeed Mahameed (Mellanox)
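bpf_xdp_adjust_meta and bpf_tail_call are existing helpers; the chain_meta layout and the progs map wiring below are illustrative of the chaining pattern referenced above.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Illustrative metadata passed from one program in the chain to the next. */
struct chain_meta {
	__u32 verdict_hint;
};

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} progs SEC(".maps");

SEC("xdp")
int first_prog(struct xdp_md *ctx)
{
	struct chain_meta *m;
	void *data;

	/* Grow the metadata area in front of xdp->data; a negative delta
	 * makes room. Note: a later bpf_xdp_adjust_head() (header
	 * rewrite) would have to memmove this area -- the cost noted
	 * on the slide. */
	if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*m)))
		return XDP_PASS;

	m = (void *)(long)ctx->data_meta;
	data = (void *)(long)ctx->data;
	if ((void *)(m + 1) > data)
		return XDP_PASS;
	m->verdict_hint = 1;

	/* The chained program re-reads ctx->data_meta to pick this up. */
	bpf_tail_call(ctx, &progs, 0);
	return XDP_PASS;   /* reached only if the tail call fails */
}

char _license[] SEC("license") = "GPL";
```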
15. XDP Acceleration Using NIC HW: Next Steps
• Tx metadata and processing hints:
  – Same as Rx: need a way to configure/consume Tx metadata from applications to the HW via SW drivers
  – Provide hints to take advantage of HW offloads/accelerations like checksums, packet processing/forwarding, QoS, etc.
• Programming rules in NIC HW to accelerate flow lookups and actions:
  – Advantage of taking actions prior to Rx in software (e.g. drop, or forwarding to a Rx queue)
  – Currently a tc u32/flower or ethtool-based model for enabling HW offloads and match-action rules; that programming model is not suitable for XDP
  – Not all NICs have eBPF map-table-like semantics
*Input from Saeed Mahameed (Mellanox)
16. Questions?
17. Backup
18. Performance Improvements
• Internal testing yielded promising results
• Test setup:
  – Target: Intel Xeon E5-2697 v2 (Ivy Bridge)
  – Kernel: 4.14.0-rc1+ (net-next)
  – Network device: XXV710 25GbE NIC, driver version 2.1.14-k
  – Configuration: single Rx queue, pinned interrupt
• XDP3: zero packet parsing (best-case scenario)
• XDP_HINTS: uses the ptype provided by the driver, no packet parsing
19. HW Hints

| Category | Type of HW hint | Size | Description |
| --- | --- | --- | --- |
| Parsing hints | Packet Type | U16 | A unique numeric value that identifies an ordered chain of headers discovered by the HW in a given packet |
| Parsing hints | Header Offset | U16 | Location of the start of a particular header in a given packet, e.g. the start of the innermost L3 header |
| Parsing hints | Extracted Field | variable | Example: the innermost IPv6 address value |
| Map offload | Match | U32 | Match a packet on certain fields and values; provide a SW marker as a hint if the packet matches the rule |
| Processing hints | Packet Checksum | U32 | A total packet checksum |
| Processing hints | Packet Hash | U32 | Hash value calculated over specified fields and a given key for a given packet type |
| Processing hints | Ingress Timestamp | U64 | Packet timestamp as it arrives |
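As one way to read the table, here is a strawman C layout of these hints. Field names, packing, and ordering are assumptions, not an ABI; negotiating the real layout is the subject of the programming-model slides.

```c
#include <linux/types.h>

/* Strawman rendering of the HW hints table; illustrative only. */
struct nic_rx_hints {
	__u16 ptype;        /* Packet Type: id of the ordered header chain */
	__u16 hdr_off;      /* Header Offset, e.g. innermost L3 start      */
	__u32 flow_mark;    /* Match (map offload): SW marker on rule hit  */
	__u32 csum;         /* Packet Checksum: total packet checksum      */
	__u32 hash;         /* Packet Hash over specified fields and a key */
	__u64 ingress_ts;   /* Ingress Timestamp: packet arrival time      */
	/* "Extracted Field" is variable-sized and would follow the fixed
	 * part, e.g. the innermost IPv6 address. */
} __attribute__((packed));
```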