The MySQL Query Optimizer Explained Through Optimizer Trace

mysql查询优化器做出的所有主要考虑和决策都可能记录在优化器跟踪中。虽然explain将显示查询优化器决定使用的查询执行计划,但mysql的优化器跟踪将说明优化器选择此计划的原因。
本演示将向您介绍MySQL查询优化器的内部工作,通过向您展示优化器跟踪示例。我们将介绍MySQL优化器的主要阶段及其优化策略,包括查询转换、数据访问策略、范围优化器、连接优化器和子查询优化。我们还将展示优化器跟踪如何让您深入了解成本模型以及优化器如何进行成本估算。
通过参加此演示,您将了解如何使用来自优化器跟踪的信息来编写性能更好的查询。本演示还将介绍一些工具,这些工具可用于帮助处理优化器跟踪中的大量信息。

展开查看详情

1.The MySQL Query Optimizer Explained Through Optimizer Trace Øystein Grøvlen Senior Staff Engineer Alibaba Cloud 1

2.MySQL Query Optimizer JOIN SELECT a, b FROM t1, t2, t3 Query Table JOIN WHERE t1.a = t2.b scan AND t2.b = t3.c Optimizer AND t2.d > 20 Range Ref t1 scan access AND t2.d < 30; t2 t3 Table/index info Statistics (data dictionary) (storage engines) 2

3.MySQL Architecture SQL query Parser Resolver (Prepare): Semantic check,name resolution Optimizer Logical transformations Cost-based optimizer: Join order and access methods Plan refinement Query execution plan Query result Query execution Storage Engine InnoDB MyISAM 3

4.Optimizer Trace How to generate it • EXPLAIN shows the selected plan • Optimizer trace shows WHY the plan was selected SET optimizer_trace= "enabled=on"; SELECT * FROM t1, t2 WHERE f1=1 AND f1=f2 AND f2>0; SELECT trace FROM information_schema.optimizer_trace INTO OUTFILE filename LINES TERMINATED BY ’’; SET optimizer_trace="enabled=off"; QUERY SELECT * FROM t1, t2 WHERE f1=1 AND f1=f2 AND f2>0; TRACE { "steps": [ { "join_preparation": { "select#": 1,… } … } …] } MISSING_BYTES_BEYOND_MAX_MEM_SIZE 0 INSUFFICIENT_PRIVILEGES 0 4

5.Optimizer Trace Example { "steps": [ { "join_preparation": { "select#": 1, "steps": [ { "expanded_query": "/* select#1 */ select `t1`.`f1` AS `f1`,`t2`.`f2` AS `f2` from `t1` join `t2` where ((`t1`.`f1` = 1) and (`t1`.`f1` = `t2`.`f2`) and (`t2`.`f2` > 0))" } ] } }, { "join_optimization": { "select#": 1, "steps": [ { "condition_processing": { "condition": "WHERE", "original_condition": "((`t1`.`f1` = 1) and (`t1`.`f1` = `t2`.`f2`) and (`t2`.`f2` > 0))", "steps": [ { … 5

6.Optimizer Trace JSON browser plugin 6

7.Browser Plugin Collapse to see main trace objects/phases 7

8.Browser Plugin Expand JSON objects 8

9.Prepare Phase • Name resolving • Map names to database objects (tables, columns, …) • Semantic checks • Permanent transformations: • Conversion of outer join to inner join Simpler query to optimize and • Merging of views and derived tables execute • Subquery transformations • IN to EXISTS • IN to Semijoin (5.6) Prepare for later • EXISTS to IN (8.0.16) optimizations • Etc. 9

10.Conversion of outer join to inner join SELECT o_orderkey FROM orders LEFT JOIN lineitem ON o_orderkey = l_orderkey WHERE l_discount > 0.10; "join_preparation": { "select#": 1, "steps": [ { "expanded_query": "/* select#1 */ select `orders`.`o_orderkey` AS `o_orderkey` from (`orders` left join `lineitem` on ((`orders`.`o_orderkey` = `lineitem`.`l_orderkey`))) where (`lineitem`.`l_discount` > 0.10)" }, { "transformations_to_nested_joins": { "transformations": [ "outer_join_to_inner_join", "JOIN_condition_to_WHERE", "parenthesis_removal" ], "expanded_query": "/* select#1 */ select `orders`.`o_orderkey` AS `o_orderkey` from `orders` join `lineitem` where ((`lineitem`.`l_discount` > 0.10) and(`orders`.`o_orderkey` = `lineitem`.`l_orderkey`))" }… 10

11.Conversion from EXISTS-subquery to IN-subquery New in 8.0.16 SELECT o_orderpriority, COUNT(*) AS order_count FROM orders WHERE EXISTS (SELECT * FROM lineitem WHERE l_orderkey = o_orderkey AND l_commitdate < l_receiptdate) GROUP BY o_orderpriority ORDER BY o_orderpriority; SELECT o_orderpriority, COUNT(*) AS order_count FROM orders WHERE o_orderkey IN (SELECT l_orderkey FROM lineitem WHERE l_commitdate < l_receiptdate GROUP BY o_orderpriority ORDER BY o_orderpriority; SELECT o_orderpriority, COUNT(*) AS order_count FROM orders SEMIJOIN lineitem ON l_orderkey = o_orderkey WHERE l_commitdate < l_receiptdate GROUP BY o_orderpriority ORDER BY o_orderpriority; 11

12.Conversion from EXISTS-subquery to IN-subquery Optimizer trace "join_preparation": { "select#": 1, "steps": [ { "join_preparation": { "select#": 2, "steps": [ { "expanded_query": "/* select#2 */ select 1 from `lineitem` where ((`lineitem`.`l_orderkey` = `orders`.`o_orderkey`) and (`lineitem`.`l_commitDATE` < `lineitem`.`l_receiptDATE`))" }, { "transformation": { "select#": 2, "from": "EXISTS (SELECT)", "to": "semijoin", "chosen": true } } ] … 12

13.Conversion from EXISTS-subquery to IN-subquery Optimizer trace, cont. 13

14.Query Optimization Main phases Parser Resolver: Semantic check,name resolution Negation elimination Optimizer Equality and constant propagation Ref access analysis Evaluation of constant expressions Logical transformations Range access analysis Substitution of generated columns Estimation of condition fan out Prepare for cost-based Constant table detection optimization Cost-based optimizer: Join order and access methods Access method selection Table condition pushdown Join order Plan refinement Access method adjustments Sort avoidance Query execution Index condition pushdown plan Prepare temporary tables Query execution Storage engine InnoDB MyISAM 14

15.Query Optimization Main phases Parser Resolver: Semantic check,name resolution Negation elimination Optimizer Equality and constant propagation Ref access analysis Evaluation of constant expressions Logical transformations Range access analysis Substitution of generated columns Estimation of condition fan out Prepare for cost-based Constant table detection optimization Cost-based optimizer: Join order and access methods Access method selection Table condition pushdown Join order Plan refinement Access method adjustments Sort avoidance Query execution Index condition pushdown plan Prepare temporary tables Query execution Storage engine InnoDB MyISAM 15

16.Logical Transformations Condition processing SELECT * FROM t1, t2 WHERE t1.a = t2.a AND t2.a = 9 AND (NOT (t1.a > 10 OR t2.b > 3) OR (t1.b = t2.b + 7 AND t2.b = 5)); Negation elimination t1.a = t2.a AND t2.a = 9 AND (t1.a <= 10 AND t2.b <= 3 OR (t1.b = t2.b + 7 AND t2.b = 5)); Equality/const propagation t1.a = 9 AND t2.a = 9 AND (9 <= 10 AND t2.b <= 3 OR (t1.b = 5 + 7 AND t2.b = 5)); Evaluate const expressions t1.a = 9 AND t2.a = 9 AND (9 <= 10 AND t2.b <= 3 OR (t1.b = 12 AND t2.b = 5)); Trivial condition =TRUE removal t1.a = 9 AND t2.a = 9 AND (t2.b <= 3 OR (t1.b = 12 AND t2.b = 5)); 16

17.Logical Transformations Optimizer Trace "join_optimization": { "select#": 1, "steps": [ { "condition_processing": { "condition": "WHERE", "original_condition": "((`t1`.`a` = `t2`.`a`) and (`t2`.`a` = 9) and (((`t1`.`a` <= 10) and (`t2`.`b` <= 3)) or ((`t1`.`b` =(`t2`.`b` + 7)) and (`t2`.`b` = 5))))", "steps": [ { "transformation": "equality_propagation", "resulting_condition": "((((9 <= 10) and (`t2`.`b` <= 3)) or ((`t1`.`b` = (5 + 7)) and multiple equal(5, `t2`.`b`))) and multiple equal(9, `t1`.`a`, `t2`.`a`))" }, { "transformation": "constant_propagation", "resulting_condition": "((((9 <= 10) and (`t2`.`b` <= 3)) or ((`t1`.`b` = 12) and multiple equal(5, `t2`.`b`))) and multiple equal(9, `t1`.`a` , `t2`.`a`))" }, { "transformation": "trivial_condition_removal", "resulting_condition": "(((`t2`.`b` <= 3) or ((`t1`.`b` = 12) and multiple equal(5, `t2`.`b`))) and multiple equal(9, `t1`.`a`, `t2`.`a`))" }

18.Query Optimization Main phases Parser Resolver: Semantic check,name resolution Negation elimination Optimizer Equality and constant propagation Ref access analysis Evaluation of constant expressions Logical transformations Range access analysis Substitution of generated columns Estimation of condition fan out Prepare for cost-based Constant table detection optimization Cost-based optimizer: Join order and access methods Access method selection Table condition pushdown Join order Plan refinement Access method adjustments Sort avoidance Query execution Index condition pushdown plan Prepare temporary tables Query execution Storage engine InnoDB MyISAM 18

19.Ref Access Analysis Determine which indexes that can be used for index lookup in a join SELECT l_orderkey, sum(l_extendedprice * (1 - l_discount)) AS revenue, o_orderdate, o_shippriority FROM customer JOIN orders ON c_custkey = o_custkey JOIN lineitem ON l_orderkey = o_orderkey WHERE c_mktsegment = 'FURNITURE’ AND o_orderdate < '1997-04-15' AND l_shipdate > '1997-04-15' GROUP by l_orderkey, o_orderdate, o_shippriority ORDER by revenue desc, o_orderdate LIMIT 10; orders o_orderkey o_custkey lineitem customer l_orderkey c_custkey l_linenumber 19

20.Ref Access Analysis Optimizer trace { { "ref_optimizer_key_uses": [ "table": "`lineitem`", { "field": "l_orderkey", "table": "`customer`", "equals": "`orders`.`o_orderkey`", "field": "c_custkey", "null_rejecting": false "equals": "`orders`.`o_custkey`", }, "null_rejecting": true { }, "table": "`lineitem`", { "field": "l_orderkey", "table": "`orders`", "equals": "`orders`.`o_orderkey`", "field": "o_orderkey", "null_rejecting": false "equals": "`lineitem`.`l_orderkey`", } "null_rejecting": false ] }, }, 20

21.Cost-based Query Optimization General idea: JOIN • Assign cost to operations • Assign cost to partial or alternative plans JOIN Table scan • Search for plan with lowest cost Range Ref t1 scan access t2 t3 21

22.Cost-based Query Optimizations The main cost-based optimizations: • Index and access method: JOIN • Table scan • Index scan Table JOIN • Range scan scan • Index lookup (ref access) Range Ref t1 • Join order scan access • Join buffering strategy • Subquery strategy t2 t3 22

23.Optimizer Cost Model Range Cost Model scan Cost formulas Access t1 methods Join Subquery Cost estimate Cost constants JOIN Row estimate CPU IO Metadata: Statistics: - Record and index size - Table size Cost model - Index information - Cardinality configuration - Uniqueness - Range estimates - Histograms 23

24.Cost Estimates • The cost for executing a query Main cost constants • Cost unit: Cost MySQL MySQL • “read a random data page from disk” 5.7 8.0 • Main cost factors: Read a random disk page 1.0 1.0 • IO cost: • #pages read from table Read a data page from memory 1.0 0.25 • #pages read from index buffer • CPU cost: Evaluate query condition 0.2 0.1 • Evaluating query conditions Compare keys/records 0.1 0.05 • Comparing keys/records • Sorting keys 24

25.Range analysis Compare cost of table scan to index scan • Cost of table scan • Based on number of pages to read • Cost of index scan alternatives • Index scan (covering) • Index range scan • Index merge • Index for grouping • Skip scan (new in 8.0) • Loose-index scan 25

26.Range Analysis Example Table scan vs Index range scan SELECT SUM(o_totalprice) FROM orders WHERE o_orderdate BETWEEN '1994-01-01' AND '1994-12-31'; Table scan: • IO-cost: #pages in table * IO_BLOCK_READ_COST • CPU cost: #rows * ROW_EVALUATE_COST Range scan (on secondary index): • IO-cost: #rows_in_range * IO_BLOCK_READ_COST • CPU cost: #rows_in_range * ROW_EVALUATE_COST 26

27.Range Analysis Example Table scan vs index range scan cnt. EXPLAIN SELECT SUM(o_totalprice) FROM orders WHERE o_orderdate BETWEEN '1994-01-01' AND '1994-12-31'; select key id table type possible keys key rows filtered Extra type len 1 SIMPLE orders ALL i_o_orderdate NULL NULL 1480845 31.13 Using where EXPLAIN SELECT SUM(o_totalprice) FROM orders WHERE o_orderdate BETWEEN '1994-01-01' AND '1994-06-30'; select key Id table type possible keys key rows filtered Extra type len Using index 1 SIMPLE orders range i_o_orderdate i_o_orderdate 4 222102 100.00 condition 27

28.Range Analysis Example Optimizer trace "rows_estimation": [ "setup_range_conditions": [ "analyzing_range_alternatives": { { ], "range_scan_alternatives": [ "table": "`orders`", "group_index_range": { { "range_analysis": { "chosen": false, "index": "i_o_orderdate", "table_scan": { "cause": "not_group_by_or_distinct" "ranges": [ "rows": 1480845, }, "0x21940f <= o_orderDATE <= 0x9f950f" "cost": 151138 "skip_scan_range": { ], }, "potential_skip_scan_indexes": [ "index_dives_for_eq_ranges": true, "potential_range_indexes": [ { "rowid_ordered": false, { "index": "i_o_orderdate", "using_mrr": false, "index": "PRIMARY", "usable": false, "index_only": false, "usable": false, "cause": "query_references_nonkey_column" "rows": 460938, "cause": "not_applicable" } "cost": 161329, }, ] "chosen": false, { }, "cause": "cost" "index": "i_o_orderdate", } "usable": true, ], "key_parts": [ "analyzing_roworder_intersect": { "o_orderDATE", "usable": false, "o_orderkey" "cause": "too_few_roworder_scans" ] } ], } } } }] 28

29.Query Optimization Main phases Parser Resolver: Semantic check,name resolution Negation elimination Optimizer Equality and constant propagation Ref access analysis Evaluation of constant expressions Logical transformations Range access analysis Substitution of generated columns Estimation of condition fan out Prepare for cost-based Constant table detection optimization Cost-based optimizer: Join order and access methods Join order Table condition pushdown Access method selection Plan refinement Access method adjustments Sort avoidance Query execution Index condition pushdown plan Prepare temporary tables Query execution Storage engine InnoDB MyISAM 29