Billion Goods in Few Categories:how Histograms Save a Life

下载 0

快召唤伙伴们来围观吧
微博 QQ QQ空间 贴吧
文档嵌入链接
<iframe src="https://www.slidestalk.com/u9368/Billion_Goods_in_Few_Categories_how_Histograms_Save_a_Life?embed" frame border="0" width="640" height="360" scrolling="no" allowfullscreen="true">复制
微信扫一扫分享
已成功复制到剪贴板

寒冰

发布于

5年前

1596

人观看

#信息技术

我们存储数据的目的是使用它：搜索，检索，分组，排序…为了有效地执行这些操作，MySQL存储引擎在编译查询执行计划时索引数据并与优化器通信统计信息。这种方法非常有效，除非您的数据分布不均匀。
去年，我研究了几张门票，其中的数据遵循相同的模式：数以百万计的流行产品分为两类，其余的则使用其余的。我们很难找到快速取回货物的解决方案。提供了5.7版的解决方案。然而，新的MySQL8.0特性：柱状图，-将工作得更好，更干净，更快。谈话的想法就是这样产生的。
我会讨论的
-如何物理存储索引统计信息
-与优化器交换哪些数据
-为什么做正确的索引选择是不够的
最后，我将解释哪些问题可以解决柱状图，以及为什么使用索引统计数据不足以快速检索不均匀分布的数据。

展开查看详情

1 .Billion Goods in Few Categories: how Histograms Save a Life? November, 7, 2018 Sveta Smirnova

2 . Table of Contents •The Case •The Cardinality: Two Levels •ANALYZE TABLE Limitations •Solutions in Percona Server 5.7 •Histograms •Conclusion 2

3 . Sveta Smirnova • MySQL Support engineer • Author of • MySQL Troubleshooting • JSON UDF functions • FILTER clause for MySQL • Speaker • Percona Live, OOW, Fosdem, DevConf, HighLoad... 3

4 . Everything can be Resolved! • Hardware • Wise options • Optimized queries • Brain 4

5 . Not Everything • This talk is about • How I spent last two years • Resolving the same issue • For different customers 5

6 . Not Everything • This talk is about • How I spent last two years • Resolving the same issue • For different customers • Task was to speed up the query 5

7 . Not All the Queries can be Optimized • Specific data distribution • Access on different fields • ON clause • WHERE clause • GROUP BY • ORDER BY • Index cannot be used effectively 6

8 . Disclaimer • Topic based on real Support cases • Couple of them are still in progress 7

9 . Disclaimer • Topic based on real Support cases • All examples are 100% fake • They created such that • No customer can be identified • Everything generated Table names Column names Data • Use case itself is fictional 7

10 . Disclaimer • Topic based on real Support cases • All examples are 100% fake • All examples are simplified • Only columns, required to show the issue • Everything extra removed • Real tables usually store much more data 7

11 . Disclaimer • Topic based on real Support cases • All examples are 100% fake • All examples are simplified • All disasters happened with version 5.7 7

12 .The Case

13 . Two tables • categories • Less than 20 rows 9

14 . Two tables • categories • Less than 20 rows • goods • More than 1M rows • 20 unique cat id values • Many other fields Price Date: added, last updated, etc. Characteristics Store ... 9

15 . JOIN select * from goods join categories on (categories.id=goods.cat_id) where date_added between ’2018-07-01’ and ’2018-08-01’ and cat_id in (16,11) and price >= 1000 and <=10000 [ and ... ] [ GROUP BY ... [ORDER BY ... [ LIMIT ...]]] ; 10

16 . Option 1: Select from the Small Table First • Select from the Small Table 11

17 . Option 1: Select from the Small Table First • Select from the Small Table • For each cat id select from the large table 11

18 . Option 1: Select from the Small Table First • Select from the Small Table • For each cat id select from the large table • Filter result on date added[ and price[...]] 11

19 . Option 1: Select from the Small Table First • Select from the Small Table • For each cat id select from the large table • Filter result on date added[ and price[...]] • Slow with many items in the category 11

20 . Option 2: Select from the Large Table First • Filter rows by date added[ and price[...]] 12

21 . Option 2: Select from the Large Table First • Filter rows by date added[ and price[...]] • Get cat id values 12

22 . Option 2: Select from the Large Table First • Filter rows by date added[ and price[...]] • Get cat id values • Retrieve rows from the small table 12

23 . Option 2: Select from the Large Table First • Filter rows by date added[ and price[...]] • Get cat id values • Retrieve rows from the small table • Slow if number of rows, filtered by date added, is larger than number of goods in the selected categories 12

24 . What if use Combined Indexes? • CREATE INDEX index everything (cat id, date added[, price[, ...]]) • It resolves the issue 13

25 . What if use Combined Indexes? • CREATE INDEX index everything (cat id, date added[, price[, ...]]) • It resolves the issue • But not in all cases 13

26 . The Problem • Maintenance cost • Slower INSERT/UPDATE/DELETE • Disk space 14

27 . The Problem • Maintenance cost • Slower INSERT/UPDATE/DELETE • Disk space • Tables may have wrong cardinality 14

28 .The Cardinality: Two Levels

29 . MySQL is Layered Architecture • Optimizer • Engine • TokuDB • InnoDB • Any 16

0点赞

0收藏

0下载