Building socket-aware BPF programs

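Over the past several years, BPF has steadily become more powerful in several ways: by building more intelligence into the verifier, which allows more complex programs to be loaded, and by extending the API, for example with new map types and new native BPF function calls. Although BPF has its roots in applying filters at the socket layer, the ability to introspect the sockets related to the traffic being filtered has been limited. To build this awareness into BPF helpers, the verifier needs to be able to track the safety of the calls, including their handling of the underlying socket. This talk walks through extensions to the verifier that perform reference tracking in BPF programs. This allows BPF developers to use resources within the execution lifetime of a BPF program, and the verifier validates that each resource is released exactly once before the program completes. Using this new reference tracking ability in the verifier, we add socket lookup and release function calls to the BPF API, allowing BPF programs to safely look up a socket and build logic on the presence or attributes of that socket. This can be used to load-balance traffic based on the presence of a listening application, or to implement stateful firewalling primitives that determine whether traffic for a connection has been seen before. With this new functionality, BPF programs can integrate more closely with the networking stack's understanding of the traffic passing through the kernel.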

1. Building socket-aware BPF programs Joe Stringer Cilium.io Linux Plumbers 2018, Vancouver, BC Joe Stringer BPF Socket Lookup Nov 13, 2018 1 / 32

3. Background: Network Policy
"Endpoint A can talk to endpoint B" ⟹ "Endpoint B can reply to endpoint A"

4. Background: How have we built these before?

5. Background: Let's do this with BPF
- Attach BPF to packet hook ✓
- "Connection Tracking" BPF map ✓
  - Key by 5-tuple
  - Associate counters, NAT state, etc.
  - Handle tuple flipping
- "Policy" map ✓
- Deploy! ✓
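One common way to approach "key by 5-tuple" and "handle tuple flipping" is to normalize the key so that both directions of a connection hash to the same map entry. The sketch below is only an illustration in plain userspace C: `struct ct_key`, `ct_flip`, `ct_normalize` and `ct_key_equal` are hypothetical names that do not come from the talk, and a real BPF implementation could equally well perform two lookups (forward and reversed) instead of normalizing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical connection-tracking key. The explicit padding keeps the
 * struct free of uninitialized bytes so that memcmp() is meaningful. */
struct ct_key {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t  proto;
    uint8_t  pad[3];
};

/* Swap source and destination, turning a reply tuple into the
 * corresponding original-direction tuple (and vice versa). */
static struct ct_key ct_flip(struct ct_key k)
{
    struct ct_key r = k;

    r.saddr = k.daddr;
    r.daddr = k.saddr;
    r.sport = k.dport;
    r.dport = k.sport;
    return r;
}

/* Choose one canonical orientation: the numerically smaller
 * (address, port) pair is always stored as the source. */
static struct ct_key ct_normalize(struct ct_key k)
{
    if (k.saddr > k.daddr || (k.saddr == k.daddr && k.sport > k.dport))
        return ct_flip(k);
    return k;
}

/* Byte-wise comparison, as a hash map would do on its keys. */
static int ct_key_equal(const struct ct_key *a, const struct ct_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

With this scheme a packet from A to B and the reply from B to A both normalize to the same key, so counters and NAT state live in one entry. The direction of an individual packet then has to be recorded separately if the policy needs it.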

6. Background: Let's do this with BPF
- Attach BPF to packet hook ✓
- "Connection Tracking" BPF map ✓
  - Key by 5-tuple
  - Associate counters, NAT state, etc.
  - Handle tuple flipping
- "Policy" map ✓
- Deploy! ✗
nf_conntrack: table full, dropping packet
- Hmm, how big should this map be again?
- How do we clean this up...

7. Background: Why model it like this?
- Firewalls might not be co-located with the workload
- Firewalls should drop packets as quickly as possible
- Network stacks may be delicate flowers
Solution? Build up state on-demand while processing packets

8. Background: Recent trends

9. Socket-based firewalling
If we're co-located with the sockets... why build our own connection table?

10. Socket-based firewalling: Socket table as a connection tracker

11. Socket-based firewalling: Socket safety
- Sockets are reference-counted internally
  - Some memory-management under RCU rules
- BPF_PROG_TYPE_CGROUP_SOCK
  - Access safety via reference held across BPF execution
  - Bounds safety provided via bounds access checker
- Packet hooks may execute before associated socket is known
  - Need to handle reference counting

12. Extending the BPF verifier

13. Extending the BPF verifier

14. Extending the BPF verifier: BPF verifier recap
- At load time, loop over all instructions
  - Validate pointer access
  - Ensure no loops
  - ...
- Access memory out of bounds? ✗
- Loops forever? ✗
- Everything safe? ✓

15. Extending the BPF verifier: Socket reference counting

Implicit:

    struct bpf_sock *sk;

    sk = bpf_sk_lookup(...);
    if (sk) {
        ...
    }
    /* Kernel will free 'sk' */

Explicit (mainline):

    struct bpf_sock *sk;

    sk = bpf_sk_lookup(...);
    if (sk) {
        ...
        bpf_sk_release(sk);
    }

16. Extending the BPF verifier: Reference counting in the BPF verifier
  1. Resource acquisition
  2. Execution paths while resource is held
  3. Resource release

17. Extending the BPF verifier: Reference acquisition
- Resource values are not known!
  - This is the verifier, not the runtime
- Generate an identifier
- Store the identifier in the verifier state
- Associate the register with the identifier

18. Extending the BPF verifier: Reference misuse
- Mangle and release
- bpf_tail_call()
- BPF_LD_ABS, BPF_LD_IND

19. Extending the BPF verifier: Reference release
- Validation of pointers
- Remove identifier reference from state
- Unassociate register identifier associations
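The acquisition and release steps on these slides can be modelled with a small bookkeeping structure. The following is a toy model in userspace C, not the kernel verifier's code: `struct toy_state` and the `toy_*` functions are hypothetical names, and the real verifier tracks far more per-register and per-path state. It only shows the idea of generating an identifier on acquire, tying registers to it, clearing every associated register on release, and refusing to accept a program that exits with references outstanding.

```c
#include <stdbool.h>

#define TOY_MAX_REFS 8
#define TOY_NUM_REGS 11 /* r0..r10 */

/* Hypothetical, heavily simplified stand-in for verifier state. */
struct toy_state {
    int refs[TOY_MAX_REFS];    /* identifiers of references still held */
    int nrefs;
    int reg_ref[TOY_NUM_REGS]; /* identifier tied to each register, 0 = none */
    int next_id;
};

static void toy_init(struct toy_state *st)
{
    *st = (struct toy_state){0};
}

/* Acquisition: generate an identifier, store it in the state and
 * associate the destination register with it. Returns -1 if full. */
static int toy_acquire(struct toy_state *st, int reg)
{
    if (st->nrefs == TOY_MAX_REFS)
        return -1;
    int id = ++st->next_id;
    st->refs[st->nrefs++] = id;
    st->reg_ref[reg] = id;
    return id;
}

/* Register copy: the association follows the value. */
static void toy_mov(struct toy_state *st, int dst, int src)
{
    st->reg_ref[dst] = st->reg_ref[src];
}

/* Release: validate that the register carries a live reference, remove
 * the identifier from the state, and clear it from every register. */
static bool toy_release(struct toy_state *st, int reg)
{
    int id = st->reg_ref[reg];

    if (id == 0)
        return false; /* not a tracked pointer, or already released */
    for (int i = 0; i < st->nrefs; i++) {
        if (st->refs[i] == id) {
            st->nrefs--;
            st->refs[i] = st->refs[st->nrefs];
            break;
        }
    }
    for (int r = 0; r < TOY_NUM_REGS; r++)
        if (st->reg_ref[r] == id)
            st->reg_ref[r] = 0;
    return true;
}

/* Program exit is only acceptable once every reference is released. */
static bool toy_exit_ok(const struct toy_state *st)
{
    return st->nrefs == 0;
}
```

In this model, acquiring into r0, copying to r6 and releasing through r6 also invalidates r0, so a second release through r0 is rejected, which corresponds to the "released exactly once" property described in the abstract.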

20. Extending the BPF API

21. Extending the BPF API: Simplest form

    struct bpf_sock *bpf_sk_lookup(struct sk_buff *);
    void bpf_sk_release(struct bpf_sock *);

22. Extending the BPF API: Namespaces

23. Extending the BPF API: Arbitrary socket lookup
- Use any tuple for lookup
- Ease API across clsact, XDP
- Simplify packet mangle and lookup

24. Extending the BPF API: Extensibility
- Allow influencing lookup behaviour
  - SO_REUSEPORT
- Determine socket type support at load time
  - Socket type supported? Load the program
  - Not supported? Reject the program

25. Extending the BPF API: Optimizations
- Avoid reference counting
- Allow lookup using direct packet pointers

26. Extending the BPF API: Socket lookup API

    struct bpf_sock *
    bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple,
                      u32 tuple_size, u32 netns, u64 flags);

    struct bpf_sock *
    bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple,
                      u32 tuple_size, u32 netns, u64 flags);

    void bpf_sk_release(struct bpf_sock *sk);
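The calling pattern implied by these signatures (look up, test for NULL, act, release on every path that holds the socket) can be exercised outside the kernel with stand-in helpers. The sketch below is a userspace mock: `mock_sk_lookup_tcp`, `mock_sk_release`, `struct mock_sock`, `mock_refs_held` and `allow_if_listening` are all hypothetical names invented for illustration. The mock helpers merely mirror the parameter lists shown on the slide; their behaviour (a single pretend listener on port 80) is an assumption and says nothing about how the kernel performs the lookup.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the tuple layout shown in the talk. */
struct bpf_sock_tuple {
    union {
        struct { uint32_t saddr, daddr; uint16_t sport, dport; } ipv4;
        struct { uint32_t saddr[4], daddr[4]; uint16_t sport, dport; } ipv6;
    };
};

/* Hypothetical stand-in for a socket handle. */
struct mock_sock { uint32_t protocol; };

static struct mock_sock mock_listener = { .protocol = 6 /* TCP */ };
static int mock_refs_held; /* counts references handed out and not returned */

/* Mock lookup: pretends one TCP listener exists on port 80 (assumption). */
static struct mock_sock *mock_sk_lookup_tcp(void *ctx,
                                            struct bpf_sock_tuple *tuple,
                                            uint32_t tuple_size,
                                            uint32_t netns, uint64_t flags)
{
    (void)ctx; (void)netns; (void)flags;
    if (tuple_size != sizeof(tuple->ipv4))
        return NULL;
    if (tuple->ipv4.dport != htons(80))
        return NULL;
    mock_refs_held++;
    return &mock_listener;
}

static void mock_sk_release(struct mock_sock *sk)
{
    (void)sk;
    mock_refs_held--;
}

enum verdict { VERDICT_DROP = 0, VERDICT_PASS = 1 };

/* Pass traffic only if a socket exists for the tuple. */
static enum verdict allow_if_listening(void *ctx, struct bpf_sock_tuple *tuple)
{
    struct mock_sock *sk;

    sk = mock_sk_lookup_tcp(ctx, tuple, sizeof(tuple->ipv4), 0, 0);
    if (!sk)
        return VERDICT_DROP; /* nothing acquired, nothing to release */

    /* Logic based on socket attributes would go here. */

    mock_sk_release(sk);     /* exactly one release on this path */
    return VERDICT_PASS;
}
```

After any call to `allow_if_listening`, `mock_refs_held` is back at zero, which is the runtime counterpart of the property the verifier is described as checking at load time.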

27. Extending the BPF API: Socket lookup structures

    struct bpf_sock_tuple {
        union {
            struct {
                __be32 saddr;
                __be32 daddr;
                __be16 sport;
                __be16 dport;
            } ipv4;
            struct {
                __be32 saddr[4];
                __be32 daddr[4];
                __be16 sport;
                __be16 dport;
            } ipv6;
        };
    };
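Because the tuple is a union, the `tuple_size` argument of the lookup helpers presumably tells the callee which arm is in use. The sketch below, in userspace C with `<stdint.h>` types standing in for `__be32` and `__be16`, fills either arm and returns the size a caller would pass. `tuple_fill_v4` and `tuple_fill_v6` are hypothetical helper names, and taking host-order inputs is a choice made for this example only.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace copy of the layout on the slide; fields hold network byte order. */
struct bpf_sock_tuple {
    union {
        struct { uint32_t saddr, daddr; uint16_t sport, dport; } ipv4;
        struct { uint32_t saddr[4], daddr[4]; uint16_t sport, dport; } ipv6;
    };
};

/* Fill the IPv4 arm from host-order values; returns the size of that arm. */
static size_t tuple_fill_v4(struct bpf_sock_tuple *t,
                            uint32_t saddr, uint32_t daddr,
                            uint16_t sport, uint16_t dport)
{
    memset(t, 0, sizeof(*t));
    t->ipv4.saddr = htonl(saddr);
    t->ipv4.daddr = htonl(daddr);
    t->ipv4.sport = htons(sport);
    t->ipv4.dport = htons(dport);
    return sizeof(t->ipv4);
}

/* Fill the IPv6 arm from 16-byte addresses already in wire order. */
static size_t tuple_fill_v6(struct bpf_sock_tuple *t,
                            const uint8_t saddr[16], const uint8_t daddr[16],
                            uint16_t sport, uint16_t dport)
{
    memset(t, 0, sizeof(*t));
    memcpy(t->ipv6.saddr, saddr, sizeof(t->ipv6.saddr));
    memcpy(t->ipv6.daddr, daddr, sizeof(t->ipv6.daddr));
    t->ipv6.sport = htons(sport);
    t->ipv6.dport = htons(dport);
    return sizeof(t->ipv6);
}
```

On common ABIs the IPv4 arm occupies 12 bytes and the IPv6 arm 36 bytes, so the two sizes cannot be confused. Deriving the value from `sizeof` on the arm, rather than hard-coding it, keeps the caller correct if the layout ever changes.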

28. Extending the BPF API: Socket structure

    struct bpf_sock {
        __u32 bound_dev_if;
        __u32 family;
        __u32 type;
        __u32 protocol;
        __u32 mark;
        __u32 priority;
        __u32 src_ip4;    /* NBO */
        __u32 src_ip6[4]; /* NBO */
        __u32 src_port;   /* NBO */
    };

29. Epilogue