MySQL and ZFS

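MySQL, as a database, lives on a file system, but not all file systems are equal! On Linux, ZFS is getting more and more attention, and for good reasons, especially if you also happen to run MySQL. In this presentation, I will describe the main features and characteristics of ZFS and compare them with the architecture of InnoDB. From easy backups to compression and improved caching, you will see that MySQL benefits greatly from ZFS. I will discuss how to configure MySQL and ZFS so that they work well together and deliver the best performance. Finally, cost-saving MySQL/ZFS reference architectures using bare metal and cloud servers will be presented and reviewed.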

1.MySQL and ZFS Yves Trudeau Percona

2.Who am I?
• Principal architect at Percona since 2009 (10 years already…)
• With Sun Microsystems and MySQL before Percona
• Physicist by training
• I like to understand how things work

3.Why a talk on MySQL and ZFS?
• I like both and I couldn’t decide…
• They go well together
• They have many points in common

4.Plan
• A quick tour of ZFS
• Configuration guidelines for MySQL/ZFS
• A real-world example

5.A tour of ZFS

6.ZFS Highlights
● Developed by Sun for Solaris
● Now available on many platforms
● B-tree file storage, not just for the directories
● 128-bit pointers!
● Files are split into records (B-tree leaves)
● Records can be compressed
● Copy-On-Write
● Native encryption
● Checksums and self-healing

7.ZPOOL
● Base unit of storage
● Made of block devices or even just files
● Disks, files, LVs, mirrors of disks, striping, raidz, raidz2, raidz3…
● Filesystems are created from a zpool
● A server → many zpools
● SLOG: separate log device
● Cache devices: L2ARC

8.ZFS Filesystems
● A filesystem is:
  1. a profile of settings
  2. a mount point
  3. a snapshotable entity
● Settings are adapted to the expected workload
● Can be nested
● Can be based on a snapshot (clone)

9.ZVols
● A block device from ZFS
● Uber cool for virtual images
● Steps for a 3-node cluster:
  1. Create a base image on a ZVol
  2. Snapshot the ZVol
  3. Clone the snapshot 3 times (yields 3 new ZVols)
  4. Start 3 VMs using the new ZVols

Excerpt of the libvirt disk definition for one of the VMs:
  <disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/zvol/data/vms/kvm_PXC2'/>
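The four steps above can be sketched as a small Python helper that only builds the `zfs` command lines, without running them. The `data/vms` path and the `kvm_PXC2` name come from the slide; the base volume name `kvm_base`, the snapshot name `gold`, the other clone names, and the 20G size are assumptions made for illustration.

```python
def zvol_clone_commands(parent, base, snap, clones, size="20G"):
    """Build (not run) the zfs commands for the snapshot-and-clone workflow.

    parent, base, snap, clones and size are illustrative values, not taken
    from the slide (except data/vms and kvm_PXC2).
    """
    base_vol = f"{parent}/{base}"
    snapshot = f"{base_vol}@{snap}"
    cmds = [
        # Step 1: create the ZVol; the base OS image is installed on it
        # before the snapshot is taken.
        ["zfs", "create", "-V", size, base_vol],
        # Step 2: snapshot the ZVol.
        ["zfs", "snapshot", snapshot],
    ]
    # Step 3: one clone per VM; each clone is a new ZVol under /dev/zvol/.
    for name in clones:
        cmds.append(["zfs", "clone", snapshot, f"{parent}/{name}"])
    return cmds


if __name__ == "__main__":
    vms = ["kvm_PXC1", "kvm_PXC2", "kvm_PXC3"]
    for cmd in zvol_clone_commands("data/vms", "kvm_base", "gold", vms):
        print(" ".join(cmd))
```

Each clone initially shares all of its blocks with the snapshot, so the three VMs only consume extra space for the records they modify.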

10.The COW Magic
● ZFS never overwrites directly
● How does ZFS overwrite a record?
  1. Writes the new version somewhere else
  2. Moves the reference from the old record to the new record
  3. The GC frees up the old record
• Easy snapshots (think InnoDB MVCC)
• Easy cloning
• Wonderful for backups
• Transactional!
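The overwrite sequence above can be illustrated with a toy in-memory model. This is not how ZFS is implemented (ZFS keeps a tree of block pointers and a snapshot simply keeps an old tree root); the class and method names are hypothetical and only show why never overwriting in place makes snapshots cheap.

```python
class CowStore:
    """Toy copy-on-write store: blocks are written once and never modified."""

    def __init__(self):
        self.blocks = {}      # block id -> data
        self.live = {}        # record name -> block id (current version)
        self.snapshots = {}   # snapshot name -> frozen record-name map
        self._next_id = 0

    def write(self, name, data):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data   # 1. write the new version elsewhere
        self.live[name] = block_id     # 2. repoint the reference

    def snapshot(self, snap_name):
        # A snapshot only freezes references; no data is copied.
        self.snapshots[snap_name] = dict(self.live)

    def read(self, name, snap_name=None):
        table = self.live if snap_name is None else self.snapshots[snap_name]
        return self.blocks[table[name]]

    def gc(self):
        # 3. free blocks that neither the live map nor any snapshot references
        referenced = set(self.live.values())
        for table in self.snapshots.values():
            referenced.update(table.values())
        unreferenced = [b for b in self.blocks if b not in referenced]
        for block_id in unreferenced:
            del self.blocks[block_id]
        return len(unreferenced)


if __name__ == "__main__":
    store = CowStore()
    store.write("page1", b"v1")
    store.snapshot("before")
    store.write("page1", b"v2")
    print(store.read("page1"), store.read("page1", "before"))
```

As long as the snapshot exists, the old block stays referenced and the GC leaves it alone, which is the same idea as InnoDB keeping old row versions for open read views.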

11.ARC for Adaptive Replacement Cache
● Sophisticated file cache
● Configurable
● Can store compressed data
● Can be layered to disk (SSD/Flash) → L2ARC

12.Kernel Modules
● Many configuration parameters (ls /sys/module/zfs/parameters/)
● Version 0.7.5 has 169…
● Examples:
  ➔ zfs_arc_max: maximum size of the ARC
  ➔ zfs_arc_meta_limit: caps the amount of metadata in the ARC
  ➔ zfs_free_max_blocks: how fast the GC works (comparable to the InnoDB purge batch)
  ➔ l2arc_write_max: how fast you allow writes to the L2ARC
  ➔ zfs_txg_timeout: maximum time span of a transaction group (think async writes)
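A minimal sketch for inspecting these parameters from Python, assuming ZFS on Linux exposes them as one file per parameter under `/sys/module/zfs/parameters`. The function name is hypothetical; on a host without the module loaded it simply returns an empty dictionary.

```python
from pathlib import Path


def read_zfs_parameters(param_dir="/sys/module/zfs/parameters"):
    """Return {parameter name: current value as text} for the ZFS module."""
    params = {}
    base = Path(param_dir)
    if not base.is_dir():
        # ZFS module not loaded, or not a Linux host.
        return params
    for entry in sorted(base.iterdir()):
        if not entry.is_file():
            continue
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by the current user.
            continue
    return params


if __name__ == "__main__":
    params = read_zfs_parameters()
    print(f"{len(params)} parameters found")
    for name in ("zfs_arc_max", "zfs_txg_timeout", "l2arc_write_max"):
        print(name, "=", params.get(name, "<not available>"))
```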

13.Configuration Guidelines for MySQL/ZFS

14.When Should You Use MySQL/ZFS?
● For large compressible datasets
● When backups are challenging (mix of engines)
● When there is spare CPU capacity (for compression)
● When the workload is not IO bound
● When the active dataset fits in the L2ARC (compressed)
● To save your flash devices...

15.ZFS Configuration
● 2 file systems for easy snapshots
  ➔ /var/lib/mysql → the parent, configured for sequential operations
    ✔ recordsize = 128KB
    ✔ compression can be more aggressive (gzip)
  ➔ /var/lib/mysql/data → the dataset
    ✔ recordsize = InnoDB page size (likely 16KB)
    ✔ fast compressor like lz4
● Cache devices (L2ARC) are great
● SLOG devices help with high durability requirements
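The two-filesystem layout above could be created with commands along these lines. The sketch only builds the `zfs create` command lines; the pool name `data` and the dataset names `data/mysql` and `data/mysql/data` are assumptions, while the mount point, record sizes and compressors come from the slide.

```python
def mysql_dataset_commands(pool="data", innodb_page_size_kb=16):
    """Build (not run) zfs create commands for the parent and the datadir.

    The pool and dataset names are hypothetical; adjust them to your pool.
    """
    parent = f"{pool}/mysql"
    child = f"{parent}/data"
    return [
        # Parent: logs and other sequential files, large records, gzip.
        ["zfs", "create",
         "-o", "mountpoint=/var/lib/mysql",
         "-o", "recordsize=128k",
         "-o", "compression=gzip",
         parent],
        # Child: InnoDB data files, record size matched to the page size.
        # It inherits the mount point and lands on /var/lib/mysql/data.
        ["zfs", "create",
         "-o", f"recordsize={innodb_page_size_kb}k",
         "-o", "compression=lz4",
         child],
    ]


if __name__ == "__main__":
    for cmd in mysql_dataset_commands():
        print(" ".join(cmd))
```

Matching the record size to the InnoDB page size avoids reading and rewriting a 128KB record for every 16KB page change.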

16.MySQL Configuration
● innodb_doublewrite = 0
● O_DIRECT?
● InnoDB buffer pool? Leave some RAM for the ARC
  ➔ no L2ARC → target an ARC of 0.5% of the data set
  ➔ 1TB of data ~ 5GB ARC
  ➔ Not a hard rule
● datadir = /var/lib/mysql/data
● innodb_log_group_home_dir, log-bin, slow-log, relay-log to /var/lib/mysql
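The ARC sizing rule of thumb above is simple arithmetic. The sketch below applies the 0.5% figure from the slide and formats the result as a module option line; the helper names are hypothetical, and placing the line in a file such as `/etc/modprobe.d/zfs.conf` is an assumption about how the host is configured.

```python
def target_arc_bytes(dataset_bytes, per_mille=5):
    """ARC target as a share of the data set (5 per mille = 0.5%).

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return dataset_bytes * per_mille // 1000


def arc_max_option(dataset_bytes):
    """Format the target as a zfs module option line."""
    return f"options zfs zfs_arc_max={target_arc_bytes(dataset_bytes)}"


if __name__ == "__main__":
    one_tb = 10**12  # decimal terabyte, as in "1TB of data ~ 5GB ARC"
    print(target_arc_bytes(one_tb) / 10**9, "GB")
    print(arc_max_option(one_tb))
```

As the slide says, this is not a hard rule; whatever the ARC gets has to be subtracted from what is given to the InnoDB buffer pool.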

17.Real World Examples

18.A DR MySQL Replica in Google Cloud
Dataset: 700GB (2.5x compressible), fair replication traffic, the whole dataset is active (random primary keys)

XFS
● n1-standard-2 (~$68/month)
● 1TB SSD (~$175/month)
● Total: $243/month

ZFS
● n1-standard-2 (~$68/month)
● Local 375GB NVMe ($30/month)
● 500GB standard disk ($20/month)
● Total: $118/month

ZFS saves $125/month
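The totals on this slide are plain sums of the monthly prices listed above. A short check, using only the figures from the slide (the dictionary keys are just labels):

```python
# Monthly prices in USD, as listed on the slide.
xfs_setup = {"n1-standard-2": 68, "1TB SSD": 175}
zfs_setup = {"n1-standard-2": 68, "local 375GB NVMe": 30, "500GB standard disk": 20}


def monthly_total(setup):
    """Sum the monthly cost of every component of a setup."""
    return sum(setup.values())


if __name__ == "__main__":
    xfs_total = monthly_total(xfs_setup)
    zfs_total = monthly_total(zfs_setup)
    print("XFS:", xfs_total, "ZFS:", zfs_total, "saving:", xfs_total - zfs_total)
```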

19.A PXC Cluster in AWS
Dataset: 2TB (2.5x compressible), needs more than 20k IOPS

XFS/i3
● 3x i3.4xlarge: $2700/month

XFS/EBS/io1
● 3x r5.2xlarge: $1080/month
● 3x 3TB 20k piops: $3900/month

ZFS/i3
● 3x i3.2xlarge: $1350/month
● 2TB SC1: $50/month

ZFS saves $1300/month

20.Will ZFS Really Perform Well?
Sysbench TPC-C workload emulation, GCE n1-standard-2 with local 375GB, scale 300, 2 threads

XFS
● 110 Trx/s
● 3100 Qps
● 284 GB on disk
● 76% used

ZFS/Lz4
● 69 Trx/s
● 1954 Qps
● 102 GB on disk
● 39% used

ZFS/Gzip
● 59 Trx/s
● 1551 Qps
● 85 GB on disk
● 26% used
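To put the trade-off on this slide in relative terms, the space saving and throughput of each ZFS setup can be expressed against the XFS baseline. The sketch uses only the numbers from the slide; the structure and function names are illustrative.

```python
# (transactions per second, GB on disk) as reported on the slide.
results = {
    "XFS": (110, 284),
    "ZFS/Lz4": (69, 102),
    "ZFS/Gzip": (59, 85),
}


def relative_to_baseline(results, baseline="XFS"):
    """Return {setup: (throughput ratio, space reduction factor)}."""
    base_tps, base_gb = results[baseline]
    return {
        name: (tps / base_tps, base_gb / gb)
        for name, (tps, gb) in results.items()
    }


if __name__ == "__main__":
    for name, (tps_ratio, space_factor) in relative_to_baseline(results).items():
        print(f"{name}: {tps_ratio:.0%} of XFS throughput, "
              f"{space_factor:.2f}x smaller on disk")
```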

21.Will ZFS Really Perform Well With L2ARC?
Sysbench TPC-C workload emulation, GCE n1-standard-2 with 500GB normal disk, 375GB local disk, scale 300, 2 threads

XFS
● 3 TRX/s
● 87 QPS
● 284 GB on disk
● 70% used

ZFS/Lz4/L2ARC
● 29 TRX/s (L2ARC warm)
● 830 QPS
● 102 GB on disk
● 21% used

22.Conclusion
● MySQL and ZFS are great together
● Try it, it is pretty easy
● Careful, you’ll get addicted
