- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Billion Goods in Few Categories:how Histograms Save a Life
我们存储数据的目的是使用它:搜索,检索,分组,排序…为了有效地执行这些操作,MySQL存储引擎在编译查询执行计划时索引数据并与优化器通信统计信息。这种方法非常有效,除非您的数据分布不均匀。
去年,我研究了几张门票,其中的数据遵循相同的模式:数以百万计的流行产品分为两类,其余的则使用其余的。我们很难找到快速取回货物的解决方案。提供了5.7版的解决方案。然而,新的MySQL8.0特性:柱状图,-将工作得更好,更干净,更快。谈话的想法就是这样产生的。
我会讨论的
-如何物理存储索引统计信息
-与优化器交换哪些数据
-为什么做正确的索引选择是不够的
最后,我将解释哪些问题可以解决柱状图,以及为什么使用索引统计数据不足以快速检索不均匀分布的数据。
展开查看详情
1 .Billion Goods in Few Categories: how Histograms Save a Life? November, 7, 2018 Sveta Smirnova
2 . Table of Contents •The Case •The Cardinality: Two Levels •ANALYZE TABLE Limitations •Solutions in Percona Server 5.7 •Histograms •Conclusion 2
3 . Sveta Smirnova • MySQL Support engineer • Author of • MySQL Troubleshooting • JSON UDF functions • FILTER clause for MySQL • Speaker • Percona Live, OOW, Fosdem, DevConf, HighLoad... 3
4 . Everything can be Resolved! • Hardware • Wise options • Optimized queries • Brain 4
5 . Not Everything • This talk is about • How I spent last two years • Resolving the same issue • For different customers 5
6 . Not Everything • This talk is about • How I spent last two years • Resolving the same issue • For different customers • Task was to speed up the query 5
7 . Not All the Queries can be Optimized • Specific data distribution • Access on different fields • ON clause • WHERE clause • GROUP BY • ORDER BY • Index cannot be used effectively 6
8 . Disclaimer • Topic based on real Support cases • Couple of them are still in progress 7
9 . Disclaimer • Topic based on real Support cases • All examples are 100% fake • They created such that • No customer can be identified • Everything generated Table names Column names Data • Use case itself is fictional 7
10 . Disclaimer • Topic based on real Support cases • All examples are 100% fake • All examples are simplified • Only columns, required to show the issue • Everything extra removed • Real tables usually store much more data 7
11 . Disclaimer • Topic based on real Support cases • All examples are 100% fake • All examples are simplified • All disasters happened with version 5.7 7
12 .The Case
13 . Two tables • categories • Less than 20 rows 9
14 . Two tables • categories • Less than 20 rows • goods • More than 1M rows • 20 unique cat id values • Many other fields Price Date: added, last updated, etc. Characteristics Store ... 9
15 . JOIN select * from goods join categories on (categories.id=goods.cat_id) where date_added between ’2018-07-01’ and ’2018-08-01’ and cat_id in (16,11) and price >= 1000 and <=10000 [ and ... ] [ GROUP BY ... [ORDER BY ... [ LIMIT ...]]] ; 10
16 . Option 1: Select from the Small Table First • Select from the Small Table 11
17 . Option 1: Select from the Small Table First • Select from the Small Table • For each cat id select from the large table 11
18 . Option 1: Select from the Small Table First • Select from the Small Table • For each cat id select from the large table • Filter result on date added[ and price[...]] 11
19 . Option 1: Select from the Small Table First • Select from the Small Table • For each cat id select from the large table • Filter result on date added[ and price[...]] • Slow with many items in the category 11
20 . Option 2: Select from the Large Table First • Filter rows by date added[ and price[...]] 12
21 . Option 2: Select from the Large Table First • Filter rows by date added[ and price[...]] • Get cat id values 12
22 . Option 2: Select from the Large Table First • Filter rows by date added[ and price[...]] • Get cat id values • Retrieve rows from the small table 12
23 . Option 2: Select from the Large Table First • Filter rows by date added[ and price[...]] • Get cat id values • Retrieve rows from the small table • Slow if number of rows, filtered by date added, is larger than number of goods in the selected categories 12
24 . What if use Combined Indexes? • CREATE INDEX index everything (cat id, date added[, price[, ...]]) • It resolves the issue 13
25 . What if use Combined Indexes? • CREATE INDEX index everything (cat id, date added[, price[, ...]]) • It resolves the issue • But not in all cases 13
26 . The Problem • Maintenance cost • Slower INSERT/UPDATE/DELETE • Disk space 14
27 . The Problem • Maintenance cost • Slower INSERT/UPDATE/DELETE • Disk space • Tables may have wrong cardinality 14
28 .The Cardinality: Two Levels
29 . MySQL is Layered Architecture • Optimizer • Engine • TokuDB • InnoDB • Any 16