Too Much Data

Are you struggling with too much data? Can your MySQL database absorb the inflow of new data? Are you considering an alternative database, but modifying the application would be challenging? Once the data is ingested, can you query it efficiently?
With the widespread deployment of connected devices, these questions come up more and more often. This talk focuses on large metric-oriented datasets, but the ideas presented apply to a much broader audience. Drawing on my experience helping customers as principal architect at Percona, I will cover all stages of the data life cycle: initial ingestion, staging, compression, analysis, aggregation, and final archival.
Real-world examples and solutions will be presented to quickly improve performance and reduce costs.

1.Too Much Data Yves Trudeau Percona

2.Who Am I? • Principal architect at Percona since 2009 (10 years already…) • With Sun Microsystems and MySQL before Percona • Physicist by training • I like to understand how things work

3.Why This Talk? ● I often see customers struggling with high inflow of data and large datasets ● A lot can be done to improve the situation ● It is a nice case to study, with many options ● Allows us to uncover the inner behavior of MySQL

4.Plan ● The issue ● Benchmarking procedure ● Iterations/results/investigation ● Data compression options

5.The Issue

6.Typical Metric Oriented Schema Columns: ● A device identifier (deviceId) ● A metric identifier (metricId) ● A metric value (value) ● A timestamp (TS)

7.Insert Rate Versus Report Queries? ● High insert rate, likely many thousands of rows/s ● Queries will need to retrieve and aggregate “value” by (deviceId,metricId) ● Size >> memory ● Wide versus narrow (wide is 1 column/metric; see the sketch below) ● This talk = only narrow
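
For illustration only (not from the original slides), here is a minimal sketch of the two schema shapes; apart from deviceId, metricId, value and TS, the table and metric column names are hypothetical:

-- Narrow: one row per (device, metric, timestamp); new metrics need no schema change
CREATE TABLE MetricsNarrow (
  deviceId INT UNSIGNED NOT NULL,
  metricId SMALLINT UNSIGNED NOT NULL,
  value FLOAT NOT NULL,
  TS INT UNSIGNED NOT NULL
);

-- Wide: one row per (device, timestamp), one column per metric;
-- adding a metric means an ALTER TABLE
CREATE TABLE MetricsWide (
  deviceId INT UNSIGNED NOT NULL,
  TS INT UNSIGNED NOT NULL,
  temperature FLOAT,
  humidity FLOAT,
  voltage FLOAT
);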

8.The Benchmarking Procedure

9.A Synthetic Data Set ● A Python script creates 14k devices, each with 1 to 10 metrics ● On average, 5.5 metrics/device ● 77000 metrics ● 5000 insert periods ● 385M rows ● Goals: insert rate AND decent query time ● VM capped at 500 IOPS, 2 GB RAM (1 GB buffer pool), 2 vCPUs, Percona Server 5.7.25

10.The Report Query ● A very simple query:
select deviceId, avg(value), min(value), max(value), sum(value), count(*)
from Metrics
where deviceId in (102,103,104,105,106)
  and metricId = 1
group by deviceId;
● Looking at the execution time

11.Results/Investigation

12.Iteration #1: auto-increment id PK
CREATE TABLE `CollectorMetrics` (
  `ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `deviceId` int(10) unsigned NOT NULL,
  `metricId` smallint(5) unsigned NOT NULL,
  `Value` float NOT NULL,
  `TS` int(10) unsigned NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `idx_nat` (`deviceId`,`metricId`,`TS`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

13.Iteration #1: Results Query time: 48.5 s (24235 read IOPS) Size: 23 GB

14.Iteration #2: Natural PK
CREATE TABLE `CollectorMetrics` (
  `deviceId` int(10) unsigned NOT NULL,
  `metricId` smallint(5) unsigned NOT NULL,
  `Value` float NOT NULL,
  `TS` int(10) unsigned NOT NULL,
  PRIMARY KEY (`deviceId`,`metricId`,`TS`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

15.Iteration #2: Results Not an option on its own: inserts scatter across the whole primary key, so once the table outgrew the buffer pool the insert rate collapsed; the run was stopped at 67M rows (2.76 GB), but…

16.Iteration #3: Natural PK + Partitions
CREATE TABLE `CollectorMetrics` (
  `deviceId` int(10) unsigned NOT NULL,
  `metricId` smallint(5) unsigned NOT NULL,
  `Value` float NOT NULL,
  `TS` int(10) unsigned NOT NULL,
  PRIMARY KEY (`deviceId`,`metricId`,`TS`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (TS)
(PARTITION p1555691218 VALUES LESS THAN (1555691218) ENGINE = InnoDB,
 PARTITION p1555692118 VALUES LESS THAN (1555692118) ENGINE = InnoDB,
 PARTITION p1555693018 VALUES LESS THAN (1555693018) ENGINE = InnoDB,
 …
 PARTITION p1555719118 VALUES LESS THAN (1555719118) ENGINE = InnoDB,
 PARTITION pMax VALUES LESS THAN (4294967295) ENGINE = InnoDB) */
32 partitions

17.Iteration #3: Results Query time: 0.57 s (320 read IOPS) Size: 21.7 GB

18.Can This Really Work? ● Partition sizes tuned for ~ the buffer pool size ● Partitions are managed on the fly (see the sketch below) ● Only one partition really used at a time ● Up to 8192 partitions… ● 100 GB BP → up to 800 TB
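
A minimal sketch of the on-the-fly partition maintenance implied here (not from the original slides; the partition names and timestamps are only examples): new time ranges are split off the catch-all pMax partition as time advances, and the oldest ranges are dropped once archived, which keeps the active partition roughly buffer-pool sized.

-- Split a new time range off the catch-all partition
ALTER TABLE CollectorMetrics REORGANIZE PARTITION pMax INTO (
  PARTITION p1555720018 VALUES LESS THAN (1555720018),
  PARTITION pMax VALUES LESS THAN (4294967295)
);

-- Drop (or archive, then drop) the oldest partition
ALTER TABLE CollectorMetrics DROP PARTITION p1555691218;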

19.Compression Options

20.InnoDB Compression (8KB) Query time: 0.79 s (416 read IOPS) Size: 11.5 GB
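
The DDL for this step is not shown in the slides; a minimal sketch of how InnoDB table compression with an 8 KB key block size is typically enabled, assuming the same CollectorMetrics table as iteration #3:

-- Requires innodb_file_per_table=ON (the default in MySQL/Percona Server 5.7)
ALTER TABLE CollectorMetrics ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;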

21.ZFS (lz4) Query time: 0.97 s (316 read IOPS) Size: 9.4 GB

22.ZFS (gzip) Query time: 0.66 s (192 read IOPS) Size: 7.0 GB
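
The filesystem setup is not shown in the slides; a minimal sketch of the ZFS dataset properties usually involved (the pool/dataset name is hypothetical, and the exact values used in this benchmark are not given in the deck):

# Match the record size to the InnoDB page size and enable on-disk compression
zfs set recordsize=16k tank/mysql-data
zfs set compression=lz4 tank/mysql-data    # or compression=gzip for the result above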

23.Summary (figures from the iterations above) ● Iteration #1, auto-increment PK: 48.5 s query, 23 GB ● Iteration #3, natural PK + partitions: 0.57 s, 21.7 GB ● + InnoDB compression (8KB): 0.79 s, 11.5 GB ● + ZFS lz4: 0.97 s, 9.4 GB ● + ZFS gzip: 0.66 s, 7.0 GB

24.Thank You to Our Sponsors

25.Rate My Session