MySQL and ZFS
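MySQL, as a database, runs on top of a file system, but not all file systems are equal! On Linux, ZFS is getting more and more attention, and for good reasons, especially if you happen to run MySQL as well. In this talk, I will describe the main features and characteristics of ZFS and compare them with the architecture of InnoDB. From easy backups to compression and improved caching, you will see that MySQL benefits a lot from ZFS. I will discuss how to configure MySQL and ZFS so that they work well together and perform at their best. Finally, cost-saving MySQL/ZFS reference architectures using bare metal and cloud servers will be presented and reviewed.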
1. MySQL and ZFS
Yves Trudeau, Percona
2. Who am I?
• Principal architect at Percona since 2009 (10 years already…)
• With Sun Microsystems and MySQL before Percona
• Physicist by training
• I like to understand how things work
3. Why a talk on MySQL and ZFS?
• I like both and I couldn't decide…
• They go well together
• They have a lot in common
4. Plan
• A quick tour of ZFS
• Configuration guidelines for MySQL/ZFS
• A real world example
5. A Tour of ZFS
6. ZFS Highlights
● Developed by Sun for Solaris
● Now available on many platforms
● B-tree file storage, not just for the directories
● 128-bit pointers!
● Files are split into records (b-tree leaves)
● Records can be compressed
● Copy-On-Write
● Native encryption
● Checksums and self-healing
7. ZPOOL
● Base unit of storage
● Made of block devices or even just files
● Disks, files, LVs, mirrors of disks, striping, raidz, raidz2, raidz3…
● Filesystems are created from a zpool
● A server can have many zpools
● SLOG: separate log device
● Cache devices: L2ARC
8. ZFS Filesystems
● A filesystem is:
  1. a profile of settings
  2. a mount point
  3. a snapshotable entity
● Settings are adapted to the expected workload
● Can be nested
● Can be based on a snapshot (clone)
9. ZVols
● A block device provided by ZFS
● Uber cool for virtual machine images
● Steps for a 3-node cluster:
  1. Create a base image on a ZVol
  2. Snapshot the ZVol
  3. Clone the snapshot 3 times (yields 3 new ZVols)
  4. Start 3 VMs using the new ZVols
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/zvol/data/vms/kvm_PXC2'/>
    </disk>
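
The four steps above could be scripted roughly as follows. This is a minimal sketch that only builds the `zfs` command lines; it does not run them. The base ZVol name `data/vms/base`, the snapshot name `gold`, the 20G size, and the clone names `kvm_PXC1` and `kvm_PXC3` are assumptions for illustration; only `data/vms/kvm_PXC2` appears on the slide.

```python
# Sketch: build the zfs commands for "base image -> snapshot -> 3 clones".
# All names and the volume size below are hypothetical defaults.
def zvol_cluster_commands(
    base="data/vms/base",
    snap="gold",
    size="20G",
    clones=("data/vms/kvm_PXC1", "data/vms/kvm_PXC2", "data/vms/kvm_PXC3"),
):
    snapshot = f"{base}@{snap}"
    commands = [
        ["zfs", "create", "-V", size, base],  # 1. ZVol for the base image
        ["zfs", "snapshot", snapshot],        # 2. snapshot the ZVol
    ]
    # 3. one clone per node; each clone is a new ZVol under /dev/zvol/
    commands += [["zfs", "clone", snapshot, clone] for clone in clones]
    return commands


if __name__ == "__main__":
    for command in zvol_cluster_commands():
        print(" ".join(command))
```

Each VM definition would then point its `<source dev=.../>` at the matching `/dev/zvol/...` path (step 4).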
10. The COW Magic
● ZFS never overwrites data in place
● How does ZFS overwrite a record?
  1. Writes the new version somewhere else
  2. Re-points the reference from the old record to the new record
  3. GC frees up the old record
• Easy snapshots (think InnoDB MVCC)
• Easy cloning
• Wonderful for backups
• Transactional!
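
To make the copy-on-write idea concrete, here is a toy in-memory model. It is purely illustrative and is not how ZFS is implemented; the class `CowStore` and its methods are hypothetical names. It shows why a snapshot is cheap: it copies pointers, not data, and the garbage collector can only free blocks that nothing references anymore.

```python
class CowStore:
    """Toy copy-on-write record store (illustration only, not ZFS internals)."""

    def __init__(self):
        self.blocks = {}     # block id -> data; a block is never modified in place
        self.live = {}       # record name -> block id (current state)
        self.snapshots = {}  # snapshot name -> frozen {record name: block id}
        self._next_id = 0

    def write(self, name, data):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data  # 1. write the new version somewhere else
        self.live[name] = block_id    # 2. re-point the reference to the new block

    def read(self, name, snapshot=None):
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return self.blocks[table[name]]

    def snapshot(self, snap_name):
        # Copies only the pointer table, so the cost does not depend on data size.
        self.snapshots[snap_name] = dict(self.live)

    def gc(self):
        # 3. free blocks that neither the live state nor any snapshot references
        referenced = set(self.live.values())
        for table in self.snapshots.values():
            referenced.update(table.values())
        unreferenced = [b for b in self.blocks if b not in referenced]
        for block_id in unreferenced:
            del self.blocks[block_id]
        return len(unreferenced)


if __name__ == "__main__":
    store = CowStore()
    store.write("page1", b"v1")
    store.snapshot("before")
    store.write("page1", b"v2")
    print(store.read("page1"), store.read("page1", snapshot="before"))
```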
11. ARC: Adaptive Replacement Cache
● Sophisticated file cache
● Configurable
● Can store compressed data
● Can be extended onto disk (SSD/flash): the L2ARC
12. Kernel Modules
● Many configuration parameters (ls /sys/module/zfs/parameters/)
● Version 0.7.5 has 169…
● Examples:
  ➔ zfs_arc_max: maximum size of the ARC
  ➔ zfs_arc_meta_limit: caps the amount of metadata in the ARC
  ➔ zfs_free_max_blocks: how fast the GC frees blocks (think InnoDB purge batch size)
  ➔ l2arc_write_max: how fast you allow writes to the L2ARC
  ➔ zfs_txg_timeout: maximum time span of a transaction group (think async writes)
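
A small sketch for inspecting these parameters from a script, assuming ZFS on Linux exposes them as one file per parameter under `/sys/module/zfs/parameters`. The helper name `read_zfs_parameters` is hypothetical, and the function returns an empty dict when the module is not loaded.

```python
from pathlib import Path


def read_zfs_parameters(param_dir="/sys/module/zfs/parameters"):
    """Return {parameter name: current value} for the loaded zfs module."""
    root = Path(param_dir)
    if not root.is_dir():
        return {}  # zfs module not loaded, or not a Linux host
    params = {}
    for entry in sorted(root.iterdir()):
        if not entry.is_file():
            continue
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # skip anything unreadable with the current privileges
    return params


if __name__ == "__main__":
    params = read_zfs_parameters()
    print(f"{len(params)} parameters found")
    for name in ("zfs_arc_max", "zfs_txg_timeout", "l2arc_write_max"):
        print(name, "=", params.get(name, "n/a"))
```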
13. Configuration Guidelines for MySQL/ZFS
14. When Should You Use MySQL/ZFS?
● For large compressible datasets
● When backups are a challenge (mix of storage engines)
● When you have spare CPU capacity (for compression)
● When you are not IO bound
● When the active dataset fits in the L2ARC (compressed)
● To save your flash devices...
15. ZFS Configuration
● 2 filesystems for easy snapshots
  ➔ /var/lib/mysql → the parent, configured for sequential operations
    ✔ recordsize = 128KB
    ✔ compression can be more aggressive (gzip)
  ➔ /var/lib/mysql/data → the dataset
    ✔ recordsize = InnoDB page size (likely 16KB)
    ✔ fast compressor like lz4
● Cache devices (L2ARC) are great
● SLOG devices help with high durability requirements
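
The two-filesystem layout above might translate into `zfs create` invocations like the ones this sketch builds (it prints the commands and does not run them). The pool name `mysqlpool` and the dataset names are assumptions; the property values are the ones from the slide. The child dataset inherits its mount point from the parent, so it ends up at /var/lib/mysql/data.

```python
# Sketch: build the zfs create commands for the parent and data filesystems.
# "mysqlpool" is a hypothetical pool name; adjust to the real pool.
def zfs_create_commands(pool="mysqlpool", innodb_page_kb=16):
    parent = f"{pool}/mysql"
    return [
        # Parent: logs and other sequential files, larger records, heavier compression
        ["zfs", "create",
         "-o", "mountpoint=/var/lib/mysql",
         "-o", "recordsize=128k",
         "-o", "compression=gzip",
         parent],
        # Child: InnoDB data files, recordsize matched to the InnoDB page size
        ["zfs", "create",
         "-o", f"recordsize={innodb_page_kb}k",
         "-o", "compression=lz4",
         f"{parent}/data"],
    ]


if __name__ == "__main__":
    for command in zfs_create_commands():
        print(" ".join(command))
```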
16. MySQL Configuration
● innodb_doublewrite = 0
● O_DIRECT?
● InnoDB buffer pool? Leave some RAM for the ARC
  ➔ Without L2ARC, target an ARC of about 0.5% of the data set
  ➔ 1TB of data ~ 5GB ARC
  ➔ Not a hard rule
● datadir = /var/lib/mysql/data
● Point innodb_log_group_home_dir, log-bin, slow-log and relay-log to /var/lib/mysql
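
The ARC sizing rule of thumb above is simple arithmetic. A minimal sketch, using decimal units and a hypothetical helper name `target_arc_bytes`; as the slide says, this is not a hard rule.

```python
TB = 10**12
GB = 10**9


def target_arc_bytes(dataset_bytes, per_mille=5):
    """ARC target without L2ARC: about 0.5% (5 per mille) of the data set."""
    return dataset_bytes * per_mille // 1000


if __name__ == "__main__":
    arc = target_arc_bytes(1 * TB)
    print(f"1TB of data -> about {arc // GB}GB of ARC")
```

The result, in bytes, is the kind of value one would consider for `zfs_arc_max`, with the rest of the RAM budgeted for the InnoDB buffer pool.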
17. Real World Examples
18. A DR MySQL Replica in Google Cloud
Dataset: 700GB (2.5x compressible), fair replication traffic, the whole dataset is active (random primary keys)
XFS
● n1-standard-2 (~$68/month)
● 1TB SSD (~$175/month)
● Total: $243/month
ZFS
● n1-standard-2 (~$68/month)
● Local 375GB NVMe ($30/month)
● 500GB standard disk ($20/month)
● Total: $118/month
ZFS saves $125/month
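
The totals on this slide can be checked with a few lines of arithmetic. The prices are the approximate monthly figures quoted above; the dictionary keys are just labels.

```python
# Monthly prices in USD, as quoted on the slide (approximate).
xfs_setup = {"n1-standard-2": 68, "1TB SSD": 175}
zfs_setup = {"n1-standard-2": 68, "local 375GB NVMe": 30, "500GB standard disk": 20}


def monthly_total(items):
    return sum(items.values())


if __name__ == "__main__":
    xfs_total = monthly_total(xfs_setup)
    zfs_total = monthly_total(zfs_setup)
    print(f"XFS: ${xfs_total}/month, ZFS: ${zfs_total}/month, "
          f"saving: ${xfs_total - zfs_total}/month")
```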
19. A PXC Cluster in AWS
Dataset: 2TB (2.5x compressible), needs more than 20k IOPS
XFS/i3
● 3x i3.4xlarge: $2700/month
ZFS/i3
● 3x i3.2xlarge: $1350/month
● 2TB SC1: $50/month
XFS/EBS/io1
● 3x r5.2xlarge: $1080/month
● 3x 3TB 20k provisioned IOPS: $3900/month
ZFS saves $1300/month
20. Will ZFS Really Perform Well?
Sysbench TPC-C workload emulation, GCE n1-standard-2 with local 375GB disk, scale 300, 2 threads
XFS
● 110 TRX/s
● 3100 QPS
● 284 GB on disk
● 76% used
ZFS/lz4
● 69 TRX/s
● 1954 QPS
● 102 GB on disk
● 39% used
ZFS/gzip
● 59 TRX/s
● 1551 QPS
● 85 GB on disk
● 26% used
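
The trade-off in these numbers is easier to see as ratios. A short sketch that derives the on-disk compression ratio and the throughput relative to XFS from the figures above; the dictionary layout and function names are just for illustration.

```python
# Figures from the benchmark slide: (TRX/s, GB on disk).
results = {
    "XFS": (110, 284),
    "ZFS/lz4": (69, 102),
    "ZFS/gzip": (59, 85),
}


def compression_ratio(name, baseline="XFS"):
    """How many times smaller the data is on disk compared to the baseline."""
    return results[baseline][1] / results[name][1]


def relative_throughput(name, baseline="XFS"):
    """TRX/s as a fraction of the baseline."""
    return results[name][0] / results[baseline][0]


if __name__ == "__main__":
    for name in ("ZFS/lz4", "ZFS/gzip"):
        print(f"{name}: {compression_ratio(name):.2f}x smaller on disk, "
              f"{relative_throughput(name):.0%} of XFS throughput")
```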
21. Will ZFS Really Perform Well With L2ARC?
Sysbench TPC-C workload emulation, GCE n1-standard-2 with 500GB standard disk and 375GB local disk, scale 300, 2 threads
XFS
● 3 TRX/s
● 87 QPS
● 284 GB on disk
● 70% used
ZFS/lz4/L2ARC
● 29 TRX/s (L2ARC warm)
● 830 QPS
● 102 GB on disk
● 21% used
22. Conclusion
● MySQL and ZFS are great together
● Try it, it is pretty easy
● Careful, you'll get addicted