申请试用
HOT
登录
注册
 
Billion Goods in Few Categories
寒冰
/
发布于
/
1498
人观看

我们存储数据的目的是使用它:搜索,检索,分组,排序…为了有效地执行这些操作,MySQL存储引擎在编译查询执行计划时索引数据并与优化器通信统计信息。这种方法非常有效,除非数据分布不均匀。
去年,我研究了几张门票,其中的数据遵循相同的模式:数以百万计的流行产品分为两类,其余的则使用其余的。我们很难找到快速取回货物的解决方案。提供了5.7版的解决方案。然而,我们了解到一个新的MySQL8.0特性——柱状图——会更好、更干净、更快地工作。因此,我们谈话的想法诞生了。
在本次网络研讨会中,我们将讨论:
-如何物理存储索引统计信息
-与优化器交换哪些数据
-为什么不足以做出正确的索引选择
最后,我将解释哪些问题是通过柱状图解决的,以及为什么使用索引统计数据不足以快速检索不均匀分布的数据。

展开查看详情

1 .Billion Goods in Few Categories how Histograms Save a Life? April, 11, 2019 Sveta Smirnova

2 .Table of Contents •Introduction •The Use Case The Cardinality: Two Levels Example •Even Worse Use Case ANALYZE TABLE Limitations Example •Why the Difference? •How Histograms Work? •Left Overs 2

3 .Optimizer Statistics aka Histograms The column statistics data dictionary table stores histogram statistics about column values, for use by the optimizer in constructing query execution plans MySQL User Reference Manual 3

4 .Introduction

5 .Latest Support Tickets • Data distribution vary • Big difference between number of values • Costantly changing 5

6 .Latest Support Tickets • Data distribution vary • Cardinality is not correct • Was not updated in time • Updates too often • Calculated wrongly 5

7 .Latest Support Tickets • Data distribution vary • Cardinality is not correct • Index maintenance costs a lot • Hardware resources • Slow updates • Window to run CREATE INDEX 5

8 .Latest Support Tickets • Data distribution vary • Cardinality is not correct • Index maintenance costs a lot • Optimizer does not work as we wish to Examples in my talk @Percona Live 5

9 .Disclaimer • Topic based on real Support cases • Couple of them are still in progress 6

10 .Disclaimer • Topic based on real Support cases • All examples are 100% fake • They created such that • No customer can be identified • Everything generated Table names Column names Data • Use case itself is fictional 6

11 .Disclaimer • Topic based on real Support cases • All examples are 100% fake • All examples are simplified • Only columns, required to show the issue • Everything extra removed • Real tables usually store much more data 6

12 .Disclaimer • Topic based on real Support cases • All examples are 100% fake • All examples are simplified • All disasters happened with version 5.7 6

13 .The Use Case

14 .Two tables • categories • Less than 20 rows 8

15 .Two tables • categories • Less than 20 rows • goods • More than 1M rows • 20 unique cat id values • Many other fields Price Date: added, last updated, etc. Characteristics Store ... 8

16 .JOIN select * from goods join categories on (categories.id=goods.cat_id) where date_added between ’2018-07-01’ and ’2018-08-01’ and cat_id in (16,11) and price >= 1000 and <=10000 [ and ... ] [ GROUP BY ... [ORDER BY ... [ LIMIT ...]]] ; 9

17 .Option 1: Select from the Small Table First • Select from the Small Table 10

18 .Option 1: Select from the Small Table First • Select from the Small Table • For each cat id select from the large table 10

19 .Option 1: Select from the Small Table First • Select from the Small Table • For each cat id select from the large table • Filter result on date added[ and price[...]] 10

20 .Option 1: Select from the Small Table First • Select from the Small Table • For each cat id select from the large table • Filter result on date added[ and price[...]] • Slow with many items in the category 10

21 .Option 1: Illustration 11

22 .Option 1: Illustration 11

23 .Option 1: Illustration 11

24 .Option 1: Illustration 11

25 .Option 1: Illustration 11

26 .Option 1: Illustration 11

27 .Option 1: Illustration 11

28 .Option 1: Illustration 11

29 .Option 2: Select from the Large Table First • Filter rows by date added[ and price[...]] 12

0 点赞
0 收藏
0下载
相关文档