When eBPF meets FUSE: Improving Performance of User File Systems


1. When eBPF Meets FUSE: Improving the Performance of User File Systems
Ashish Bijlani, PhD Student, Georgia Tech
@ashishbijlani
https://www.linkedin.com/in/ashishbijlani/

2. In-Kernel vs User File Systems
“People who think that userspace filesystems are realistic for anything but toys are just misguided.” - Linus Torvalds
“A lot of people once thought Linux and the machines it ran on were toys… Apparently I’m misguided.” - Jeff Darcy

3. Kernel vs User File Systems
User file systems
• Examples
  – EncFS, Gluster, etc.
• Pros
  – Improved security/reliability
  – Easy to develop/debug/maintain
• Cons
  – Poor performance!
In-kernel file systems
• Examples
  – Ext4, OverlayFS, etc.
• Pros
  – Native performance
• Cons
  – Poor security/reliability
  – Not easy to develop/debug/maintain

4. File Systems in User Space (FUSE)
• State-of-the-art framework
  – All file system handlers implemented in user space
• Over 100 FUSE file systems
  – Stackable: Android SDCardFS, EncFS, etc.
  – Network: GlusterFS, Ceph, Amazon S3FS, etc.

    struct fuse_lowlevel_ops ops = {
        .lookup   = handle_lookup,
        .access   = NULL,            // not implemented
        .getattr  = handle_getattr,
        .setattr  = handle_setattr,
        .open     = handle_open,
        .read     = handle_read,
        .readdir  = handle_readdir,
        .write    = handle_write,
        // more handlers …
        .getxattr = handle_getxattr,
        .rename   = handle_rename,
        .symlink  = handle_symlink,
        .flush    = NULL,            // not implemented
    };
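
The handler table above can be modelled in plain C to show what a NULL entry means for a request. This is a minimal user-space sketch, not LibFUSE code: every `demo_*` name and opcode is hypothetical, and it assumes that a request for an unset handler is answered with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical request opcodes; real FUSE opcodes differ. */
enum demo_op { DEMO_LOOKUP, DEMO_ACCESS, DEMO_GETATTR, DEMO_OP_COUNT };

/* A handler returns 0 on success or a negative errno value. */
typedef int (*demo_handler)(const char *name);

struct demo_ops {
    demo_handler handlers[DEMO_OP_COUNT];
};

/* Toy handlers: lookup fails on an empty name, getattr always succeeds. */
static int demo_lookup(const char *name)  { return name[0] ? 0 : -ENOENT; }
static int demo_getattr(const char *name) { (void)name; return 0; }

/* Mirrors the slide: lookup and getattr are set, access is left NULL. */
const struct demo_ops demo_table = {
    .handlers = {
        [DEMO_LOOKUP]  = demo_lookup,
        [DEMO_ACCESS]  = NULL,
        [DEMO_GETATTR] = demo_getattr,
    },
};

/* Dispatch one request; an unset handler is answered with -ENOSYS. */
int demo_dispatch(const struct demo_ops *ops, enum demo_op op, const char *name)
{
    assert(ops != NULL && op < DEMO_OP_COUNT);
    demo_handler h = ops->handlers[op];
    return h ? h(name) : -ENOSYS;
}
```

For example, `demo_dispatch(&demo_table, DEMO_ACCESS, "foo")` yields `-ENOSYS`, while the same call with `DEMO_GETATTR` yields 0.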

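5. FUSE Architecture
[Diagram: in user space, the application and the FUSE daemon (linked against LibFUSE); in the kernel, VFS, the FUSE driver with its request queue, and the lower FS (e.g., EXT4). Numbered arrows 1 to 6 trace a request from the application through VFS and the FUSE driver queue up to the daemon, which serves it over the network (4’) for network file systems or through the lower FS for stackable ones.]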

6. FUSE Performance
• “cd linux-4.17; make tinyconfig; make -j4”
  – Intel i5-3350 quad core, SSD, Ubuntu 16.04.4 LTS
  – Linux 4.11.0, LibFUSE commit # 386b1b
• 17.54% overhead
[Bar chart: time (sec), Native vs FUSE]

7. FUSE Performance
[Diagram: the application calls open(“/mnt/foo/bar”); VFS issues lookup(“foo”) to the FUSE driver, which queues it for the FUSE daemon in user space. Every request type (lookup(), getattr(), setattr(), open(), read(), readdir(), write(), rename(), symlink(), close(), getxattr(), setxattr(), …) incurs a context switch between kernel and user space.]

8. # Req received by FUSE
• “cd linux-4.17; make tinyconfig; make -j4”
[Bar chart: # requests (0K to 400K) per request type: Lookup, Getattr, Rename, Setattr, Create, Open, Release, Getxattr, Mkdir, Unlink, Opendir, Readdir, Releasedir, Read, Write]

9. FUSE Optimizations
• Big 128K writes
  – “-o max_write=131072”
• Zero data copying for data I/O
  – “-o splice_read, splice_write, splice_move”
• Leveraging VFS caches
  – Page cache for data I/O
    • “-o writeback_cache”
  – Dentry and inode caches for lookup() and getattr()
    • “entry_timeout”, “attr_timeout”

10. FUSE Performance
• “cd linux-4.17; make tinyconfig; make -j4”
• Intel i5-3350 quad core, Ubuntu 16.04.4 LTS
• Linux 4.11.0, LibFUSE commit # 386b1b
• Opts enabled: -o max_write=128K, -o splice_read, -o splice_write, -o splice_move, entry_timeout > 0, attr_timeout > 0
• Opts do not help much!
[Bar chart: time (sec) for Native, Regular, Optimized]

11. # Req received by FUSE
• “cd linux-4.17; make tinyconfig; make -j4”
• 4x fewer lookup()s
• atime changes during read() invalidate cached attributes
• VFS issues getxattr() for each write() for reading security labels
[Bar chart: # requests (0K to 400K) per request type, Regular vs Optimized, with numbered callouts (1 to 4, 1’, 2’); request types: Lookup, Getattr, Rename, Setattr, Create, Open, Release, Getxattr, Mkdir, Unlink, Opendir, Readdir, Releasedir, Read, Write]

12. eBPF
• Berkeley Packet Filter (BPF)
  – Pseudo machine architecture for packet filtering
• eBPF extends BPF
  – Evolved as a generic kernel extension framework
  – Used by tracing, perf, and network subsystems

13. eBPF Overview
• Extensions written in C
• Compiled into BPF code
• Code is verified and loaded into kernel
• Execution under virtual machine runtime
• Shared BPF maps with user space
[Diagram: in user space, Clang/LLVM compiles the BPF C program to bytecode, which is loaded via syscall(); in the kernel, the verifier checks it before it runs in the BPF virtual machine sandbox, with access to kernel functions and to a BPF map (key-value data structure).]

14. eBPF Example

    struct bpf_map_def map = {
        .type        = BPF_MAP_TYPE_ARRAY,
        .key_size    = sizeof(u32),
        .value_size  = sizeof(u64),
        .max_entries = 1, // single element
    };

    // tracepoint/syscalls/sys_enter_open
    int count_open(struct syscall *args)
    {
        u32 key = 0;
        u64 *val = bpf_map_lookup_elem(&map, &key); // helper takes a pointer to the map
        if (val)
            __sync_fetch_and_add(val, 1); // atomically count this open()
        return 0;
    }
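
Loading the program above needs a BPF toolchain and privileges, so the counting logic can be tried in user space first. The sketch below models the one-element array map with a plain C array. `demo_array_map`, `demo_map_lookup`, and `demo_count_open` are hypothetical stand-ins, not kernel or libbpf APIs; only `__sync_fetch_and_add` is the same GCC/Clang builtin used on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ENTRIES 1 /* single element, as on the slide */

/* Hypothetical stand-in for a BPF array map; zero-initialize before use. */
struct demo_array_map {
    uint64_t values[DEMO_MAX_ENTRIES];
};

/* Like an array-map lookup: pointer to the slot, or NULL if out of range. */
uint64_t *demo_map_lookup(struct demo_array_map *map, const uint32_t *key)
{
    assert(map != NULL && key != NULL);
    if (*key >= DEMO_MAX_ENTRIES)
        return NULL;
    return &map->values[*key];
}

/* Model of the handler: bump the counter in slot 0 once per open(). */
int demo_count_open(struct demo_array_map *map)
{
    uint32_t key = 0;
    uint64_t *val = demo_map_lookup(map, &key);
    if (val)
        __sync_fetch_and_add(val, 1); /* atomic increment */
    return 0;
}
```

The NULL check mirrors the slide's `if (val)`; the BPF verifier rejects programs that dereference a map lookup result without such a check.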

15. ExtFUSE
• Extension framework for File systems in User space
  – Register “thin” extensions that handle requests in the kernel
    • Avoid user space context switch!
  – Share data between FUSE daemon and extensions using BPF maps
    • Cache metadata in the kernel

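16. ExtFUSE Architecture
[Diagram: the FUSE architecture extended with LibExtFUSE and BPF handlers on the FUSE daemon side, and with a BPF VM and a BPF map next to the FUSE driver in the kernel. The daemon loads the BPF code (0’) and caches metadata in the BPF map (7). The FUSE driver delivers a request to the extension (3’), which serves it from the cache (4’); otherwise the request follows the regular FUSE path through the queue to the daemon and the lower FS (e.g., EXT4).]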

17. ExtFUSE Example

    struct bpf_map_def map = {
        .type        = BPF_MAP_TYPE_HASH,
        .key_size    = sizeof(u64), // ino (param 0)
        .value_size  = sizeof(struct fuse_attr_out),
        .max_entries = MAX_NUM_ATTRS, // 2 << 16
    };

    // getattr() kernel extension - cache attrs
    int getattr(struct extfuse_args *args)
    {
        u64 key = bpf_extfuse_read(args, PARAM0); // ino; u64 to match key_size
        struct fuse_attr_out *val = bpf_map_lookup_elem(&map, &key);
        if (val)
            bpf_extfuse_write(args, PARAM0, val); // serve cached attrs
    }

18. ExtFUSE Example
• Invalidate cached attrs from kernel extensions. E.g.,

    // setattr() kernel extension - invalidate attrs
    int setattr(struct extfuse_args *args)
    {
        u64 key = bpf_extfuse_read(args, PARAM0); // ino; u64 to match key_size
        bpf_map_delete_elem(&map, &key);          // drop the stale entry, if any
    }

• Cache attrs from FUSE daemon
  – Insert into map on atime change
• Similarly, cache lookup()s, xattr()s, and symlink()s
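
The interplay between the getattr() extension, the setattr() extension, and the daemon can be traced with a user-space model. This is a minimal sketch, not ExtFUSE code: the `demo_*` names, the fixed-size table, and the made-up attribute values are all hypothetical, and it assumes the daemon inserts attributes into the map whenever it serves a getattr() miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SLOTS 64 /* tiny fixed-size table, for illustration only */

struct demo_attr { uint64_t size; };

/* Hypothetical stand-in for the shared BPF hash map, keyed by inode number. */
struct demo_attr_map {
    bool             used[DEMO_SLOTS];
    uint64_t         ino[DEMO_SLOTS];
    struct demo_attr attr[DEMO_SLOTS];
    unsigned         upcalls; /* requests that had to reach the daemon */
};

void demo_map_init(struct demo_attr_map *m) { memset(m, 0, sizeof *m); }

/* Linear scan; returns the slot holding ino, or -1 if it is not cached. */
static int demo_find(const struct demo_attr_map *m, uint64_t ino)
{
    for (int i = 0; i < DEMO_SLOTS; i++)
        if (m->used[i] && m->ino[i] == ino)
            return i;
    return -1;
}

/* Daemon side: the (made-up) authoritative attributes for an inode. */
static struct demo_attr demo_daemon_getattr(uint64_t ino)
{
    struct demo_attr a = { .size = ino * 100 };
    return a;
}

/* Daemon side: insert or refresh an entry; returns false if the map is full. */
bool demo_daemon_cache(struct demo_attr_map *m, uint64_t ino, struct demo_attr a)
{
    int s = demo_find(m, ino);
    for (int i = 0; s < 0 && i < DEMO_SLOTS; i++)
        if (!m->used[i])
            s = i; /* first free slot */
    if (s < 0)
        return false;
    m->used[s] = true;
    m->ino[s] = ino;
    m->attr[s] = a;
    return true;
}

/* getattr(): answer from the map when possible, else make an upcall. */
struct demo_attr demo_getattr(struct demo_attr_map *m, uint64_t ino)
{
    assert(m != NULL);
    int s = demo_find(m, ino);
    if (s >= 0)
        return m->attr[s];        /* served without reaching the daemon */
    m->upcalls++;                 /* miss: context switch to the daemon */
    struct demo_attr a = demo_daemon_getattr(ino);
    demo_daemon_cache(m, ino, a); /* daemon populates the shared map */
    return a;
}

/* setattr(): drop the cached entry so the next getattr() refetches it. */
void demo_setattr(struct demo_attr_map *m, uint64_t ino)
{
    assert(m != NULL);
    int s = demo_find(m, ino);
    if (s >= 0)
        m->used[s] = false;
}
```

Two consecutive `demo_getattr()` calls for the same inode cost one upcall; a `demo_setattr()` in between forces a second one.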

19. ExtFUSE Performance
• “cd linux-4.17; make tinyconfig; make -j4”
• Intel i5-3350 quad core, SSD, Ubuntu 16.04.4 LTS
• Linux 4.11.0, LibFUSE commit # 386b1b
• Overhead
  – Regular latency: 17.54%
  – ExtFUSE latency: 5.71%
  – ExtFUSE memory: 50MB (worst case)
• Cached: lookup, attr, xattr
[Bar chart: time (sec) for Native, Regular, Optimized, ExtFUSE]
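
The two latency overheads reported above (17.54% for regular FUSE, 5.71% for ExtFUSE) imply that ExtFUSE removes roughly two-thirds of the overhead on this benchmark. The helper below only does that arithmetic; its name is hypothetical.

```c
#include <assert.h>

/* Fraction of the baseline overhead that is eliminated (0.0 to 1.0). */
double demo_overhead_removed(double baseline_pct, double improved_pct)
{
    assert(baseline_pct > 0.0);
    return (baseline_pct - improved_pct) / baseline_pct;
}
```

With the slide's numbers, `demo_overhead_removed(17.54, 5.71)` is (17.54 - 5.71) / 17.54, about 0.67.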

20. # Req received by FUSE
• “cd linux-4.17; make tinyconfig; make -j4”
• Very few getattr()s
• Very few getxattr()s
[Bar chart: # requests (0K to 400K) per request type, Regular vs Optimized vs ExtFUSE; request types: Lookup, Getattr, Rename, Setattr, Create, Open, Release, Getxattr, Mkdir, Unlink, Opendir, Readdir, Releasedir, Read, Write]

21. ExtFUSE Applications
• BPF code to cache/invalidate meta-data in kernel
  – Applies potentially to all FUSE file systems
  – e.g., Gluster readdir ahead results could be cached
• BPF code to perform custom filtering or perm checks
  – e.g., Android SDCardFS uid checks in lookup(), open()
• BPF code to forward I/O requests to lower FS in kernel
  – e.g., install/remove target file descriptor in BPF map

22. Project Status
• Work in progress at Georgia Tech
  – Applying to Gluster, EncFS, etc.
  – Project page: https://extfuse.github.io
  – Academic paper submitted
• References
  – IOVisor eBPF Project
  – BPF Compiler Collection (BCC) Toolchain

23. Thank You!
@ashishbijlani
ashish.bijlani@gatech.edu
www.linkedin.com/in/ashishbijlani