MySQL Indexing - Best Practices



1. Percona MySQL Indexing; University Best Practices Ульяновск, россия 10 November 2016 | Peter Zaitsev | Percona

2. You’ve Made a Great Choice ! Understanding indexing is crucial both for Developers and DBAs Poor index choices are responsible for large portion of production problems Indexing is not a rocket science 2

3.MySQL Indexing: Agenda Understanding Indexing Setting up best indexes for your applications Working around common MySQL limitations 3

4.Indexing in the Nutshell • Speed up access in the database What are • Help to enforce constraints (UNIQUE, FOREIGN KEY) indexes • Queries can be ran without any for ? indexes • But it can take a really long time 4

5. Types of Indexes you might heard about BTREE Indexes • Majority of indexes you deal in MySQL is this type RTREE Indexes • MyISAM only, for GIS HASH Indexes • MEMORY, NDB FULLTEXT Indexes • MyISAM, Innodb starting 5.6 5

6. Family of BTREE like Indexes A lot of different • Share same properties in what operations they can speed up implementations • Memory vs Disk is life changer B+ Trees are typically • Data stored in leaf nodes used for Disk storage TokuDB Fractal Trees • But physically a lot different are logically similar 6

7.B+Tree Example Branch/Root Node Less than 3 Data Pointers Leaf Node 7

8. Indexes in MyISAM vs Innodb In MyISAM data pointers point to physical offset in the In Innodb data file • All indexes are essentially • PRIMARY KEY (Explicit equivalent or Implicit) - stores data in the leaf pages of the index, not pointer • Secondary Indexes – store primary key as data pointer 8

9. What Operations can BTREE Index do ? Find all rows with KEY=5 (point lookup) Find all rows with KEY>5 (open range) Find all rows with 5<KEY<10 (closed range) NOT find all rows with last digit of the KEY is Zero • This can’t be defined as a “range” operation 9

10.String Indexes There is no • Sort order is defined for strings (collation) difference… • “AAAA” < “AAAB” really Prefix LIKE is • LIKE “ABC%” means • “ABC[LOWEST]”<KEY<“ABC[HIGHEST]” a special type • LIKE “%ABC” can’t be optimized by use of the of Range index 10

11.Multiple Column Indexes Sort Order is defined, comparing • KEY(col1,col2,col3) leading column, • (1,2,3) < (1,3,1) then second etc It is still one • not a separate BTREE BTREE Index index for each level 11

12. Overhead of The Indexing Indexes are costly; Do not add more than you need • In most cases extending index is better than adding new one Writes - Updating indexes is often major cost of database writes Reads - Wasted space on disk and in memory; additional overhead during query optimization 12

13.Indexing Innodb Tables • Pick PRIMARY KEY what suites you best Data is clustered by • For comments – (POST_ID,COMMENT_ID) can be good PRIMARY KEY storing all comments for single post close Primary Key together • Alternatively “pack” to single BIGINT PRIMARY KEY is • KEY (A) is really KEY (A,ID) internally implicitly appended • Useful for sorting, Covering Index. to all indexes 13

14.How MySQL Uses Indexes Data Lookups Sorting Avoiding reading “data” Special Optimizations 14

15.Using Indexes for Data Lookups SELECT * FROM EMPLOYEES WHERE • The classical use of index on (LAST_NAME) LAST_NAME=“Smith” • SELECT * FROM EMPLOYEES WHERE Can use Multiple column LAST_NAME=“Smith” AND indexes DEPT=“Accounting” • Will use index on (DEPT,LAST_NAME) 15

16. It Gets Tricky With Multiple Columns Index (A,B,C) • - order of columns matters Will use Index for lookup • • A>5 A=5 AND B>6 (all listed keyparts) • • A=5 AND B=6 AND C=7 A=5 AND B IN (2,3) AND C>5 Will NOT use Index • B>5 – Leading column is not referenced • B=6 AND C=7 - Leading column is not referenced Will use Part of the index • A>5 AND B=2 - range on first column; only use this key part • A=5 AND B>6 AND C=2 - range on second column, use 2 parts 16

17.The First Rule of MySQL Optimizer MySQL will stop using key parts in multi part index as soon as it met the real range (<,>, BETWEEN), it however is able to continue using key parts further to the right if IN(…) range is used 17

18.Using Index for Sorting SELECT * FROM PLAYERS ORDER • Will use index on SCORE column BY SCORE DESC • Without index MySQL will do “filesort” (external sort) which is very expensive LIMIT 10 Often Combined • SELECT * FROM PLAYERS WHERE COUNTRY=“US” ORDER BY SCORE with using Index for DESC LIMIT 10 lookup • Best served by Index on (COUNTRY,SCORE) 18

19.Multi Column indexes for efficient sorting It becomes even more restricted! KEY(A,B) Will use Index for Sorting • ORDER BY A - sorting by leading column • A=5 ORDER BY B - EQ filtering by 1st and sorting by 2nd • ORDER BY A DESC, B DESC - Sorting by 2 columns in same order • A>5 ORDER BY A - Range on the column, sorting on the same Will NOT use Index for Sorting • ORDER BY B - Sorting by second column in the index • A>5 ORDER BY B – Range on first column, sorting by second • A IN(1,2) ORDER BY B - In-Range on first column • ORDER BY A ASC, B DESC - Sorting in the different order 19

20.MySQL Using Index for Sorting Rules You can’t sort in different order by 2 columns You can only have Equality comparison (=) for columns which are not part of ORDER BY • Not even IN() works in this case 20

21.Avoiding Reading The data “Covering Index” • Applies to index use for specific query, not type of index. Reading Index ONLY and not accessing the “data” • Index is typically smaller than data SELECT STATUS FROM ORDERS WHERE • KEY(CUSTOMER_ID,STATUS) CUSTOMER_ID=123 Access is a lot more sequential • Access through data pointers is often quite “random” 21

22.Min/Max Optimizations Index help MIN()/MAX() aggregate functions • But only these SELECT MAX(ID) FROM TBL; SELECT MAX(SALARY) FROM EMPLOYEE GROUP BY DEPT_ID • Will benefit from (DEPT_ID,SALARY) index • “Using index for group-by” 22

23. Indexes and Joins • SELECT * FROM POSTS,COMMENTS WHERE AUTHOR=“Peter” AND MySQL Performs Joins COMMENTS.POST_ID=POSTS.ID as “Nested Loops” • Scan table POSTS finding all posts which have Peter as an Author • For every such post go to COMMENTS table to fetch all comments Index is only needed on table which is being • The index on POSTS.ID is not needed for this query performance looked up Very important to have • Re-Design JOIN queries which can’t be well indexed all JOINs Indexed 23

24. Using Multiple Indexes for the table MySQL Can use More • “Index Merge” than one index SELECT * FROM • Can often use Indexes on (A) and (B) separately TBL WHERE A=5 • Index on (A,B) is much better AND B=6 SELECT * FROM • 2 separate indexes is as good as it gets TBL WHERE A=5 OR • Index (A,B) can’t be used for this query B=6 24

25.Prefix Indexes You can • ALTER TABLE TITLE ADD build Index KEY(TITLE(20)); • Needed to index BLOB/TEXT columns on the • Can be significantly smaller leftmost • Can’t be used as covering index prefix of the • Choosing prefix length becomes the question column 25

26. What is new with MySQL 5.6 ? • Most of them will make your queries better Many Optimizer automatically improvements • join_buffer_size variable has whole new meaning • Values if 32MB+ can make sense Focus on Index • Most important one: ICP (Index Condition Design Practices for Pushdown) this presentation 26

27.Understanding ICP Push where clause • Think NAME LIKE “%ill%” (will “Conditions” for Storage engine to filter not convert to range) “Much more flexible • Plus filtering done on the engine level – covering Index” efficient • All or none. All is resolved through the Before MySQL 5.5 index or “row” is read if within range 27

28.ICP Examples SELECT A … WHERE B=2 AND C LIKE “%ill%’ • MySQL 5.5 and below - KEY (B) – Traditional. Using index for range only - KEY (B,C,A) - Covering. All involved columns included • MySQL 5.6 - KEY(B,C) • Range access by B; Filter clause on C only read full row if match More cases • SELECT * …WHERE A=5 and C=6 ; KEY (A,B,C) - Will scan all index entries with A=5 not all rows 28

29. How MySQL Picks which Index to Use ? Performs dynamic • The constants in query texts matter a picking for every query execution lot Estimates number of rows it needs to access • by doing “dive” in the table for given index Uses “Cardinality” • This is what ANALYZE TABLE statistics if impossible updates 29