Percona XtraDB

Percona XtraDB :带字典的压缩列–InnoDB表压缩的替代方案

展开查看详情

1.Percona XtraDB: Compressed Columns with Dictionaries – an alternative to InnoDB table compression Yura Sorokin, Senior Software Engineer at Percona

2.Existing compression methods Overview

3. Existing compression methods for MySQL • InnoDB Table Compression (starting from 5.1) https://dev.mysql.com/doc/refman/5.7/en/innodb-table-compression.html • InnoDB Page Compression (starting from 5.7) https://dev.mysql.com/doc/refman/5.7/en/innodb-page-compression.html • Filesystems with transparent compression (BTRFS, ZFS, NTFS, etc.) • COMPRESS() / UNCOMPRESS() MySQL functions https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_compress https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_uncompress • Something intriguing from Percona Server :) something you might have already noticed in the title of this talk 3

4.Real world data JSON documents

5. Real world data JSON data sets for JSON Studio tutorials http://jsonstudio.com/resources/ • JSON data set of projects funded by the World Bank (436K compressed). • JSON data set for US zip (postal) codes (656K compressed). • JSON data set of listed stocks (1.6M compressed). • JSON data set for Enron emails (3.9M compressed). • JSON data set of startup company information (14.8M compressed). ‘companies.json’ repeated 8 times 5

6. Real world data { "_id": { "$oid": "52cdef7c4bab8bd675297d8b“ }, "name": "AdventNet", "permalink": "abc3", "crunchbase_url": "http://www.crunchbase.com/company/adventnet", "homepage_url": "http://adventnet.com", "blog_url": "", "blog_feed_url": "", "twitter_username": "manageengine", "category_code": "enterprise", "number_of_employees": 600, "founded_year": 1996, "deadpooled_year": 2, "tag_list": "", "alias_list": "Zoho ManageEngine ", "email_address": "pr@adventnet.com", "phone_number": "925-924-9500", "description": "Server Management Software", "created_at": { "$date": 1180121062000 }, "updated_at": "Wed Oct 31 18:26:09 UTC 2012", "overview": "<p>AdventNet is now <a href=\"/company/zoho-manageengine\" title=\"Zoho ManageEngine\" rel=\"nofollow\">Zoho ManageEngine</a>.</p>\n\n<p>Founded in 1996, AdventNet has served a diverse range of enterprise IT, networking and telecom customers.</p>\n\n<p>AdventNet supplies server and network management software.</p>", … 6

7. Real world data Loading data from the file SET max_heap_table_size = 1024 * 1048576; CREATE TABLE companies_src ( id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, value LONGTEXT NOT NULL, PRIMARY KEY(id) ) ENGINE=Memory; LOAD DATA INFILE ‘companies.json' INTO TABLE companies_src FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '' (value); 7

8. Real world data Parsing JSON documents CREATE TABLE companies_src_parsed( id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, value JSON NOT NULL, PRIMARY KEY(id) ) ENGINE=Memory AS SELECT * FROM companies_src; 8

9. Real world data Statistics: SELECT COUNT(*) FROM companies_src_parsed; 150408 SELECT COUNT(*) FROM companies_src_parsed WHERE JSON_VALID(value) <> 1; 0 SELECT MIN(LENGTH(value)), ROUND(AVG(LENGTH(value))), MAX(LENGTH(value)) FROM companies_src_parsed; 1027 4160 278016 9

10. Real world data More statistics: SELECT JSON_KEYS(value) AS keys, JSON_LENGTH(keys) FROM companies_src_parsed ORDER BY id LIMIT 1; ["_id", "name", "image", "offices", "blog_url", "overview", "partners", "products", "tag_list", "permalink", "alias_list", "created_at", "milestones", "updated_at", "acquisition", "description", "founded_day", "investments", "screenshots", "acquisitions", "competitions", "founded_year", "homepage_url", "phone_number", "video_embeds", "blog_feed_url", "category_code", "email_address", "founded_month", "providerships", "relationships", "crunchbase_url", "external_links", "funding_rounds", "deadpooled_year", "twitter_username", "total_money_raised", "number_of_employees"] 38 SELECT MIN(JSON_DEPTH(value)), MAX(JSON_DEPTH(value)) FROM companies_src_parsed; 3 7 10

11.XtraDB Compressed Columns COLUMN_FORMAT COMPRESSED

12. XtraDB Compressed Columns Availability: • Percona Server 5.6.33-79.0 https://www.percona.com/doc/percona-server/5.6/flexibility/compressed_columns.html • Percona Server 5.7.17-11 https://www.percona.com/doc/percona-server/5.7/flexibility/compressed_columns.html 12

13. XtraDB Compressed Columns Key characteristics: • It is a data type modifier, independent from user-level SQL and InnoDB data compression, that causes the data stored in the column to be compressed on writing to storage and decompressed on reading. • For all other purposes, the data type is identical to the one without the modifier, i.e. no new data types are created. • Compression is done by using the zlib library. 13

14. XtraDB Compressed Columns CREATE TABLE tbl ( id SERIAL PRIMARY KEY, value TEXT COLUMN_FORMAT COMPRESSED NOT NULL ) ENGINE=InnoDB; INSERT INTO tbl VALUES (DEFAULT, REPEAT(‘a’, 2048)); SELECT id, value FROM tbl; 14

15. XtraDB Compressed Columns Supported data types: • BLOB (including TINYBLOB, MEDIUMBLOB, LONGBLOG) • TEXT (including TINYTEXT, MEDIUMTEXT, LONGTEXT) • VARCHAR (including NATIONAL VARCHAR) • VARBINARY 15

16. XtraDB Compressed Columns Compressing/uncompressing existing data: ALTER TABLE tbl CHANGE value value TEXT COLUMN_FORMAT DEFAULT NOT NULL; ALTER TABLE tbl MODIFY value TEXT COLUMN_FORMAT DEFAULT NOT NULL; ALTER TABLE tbl CHANGE value value TEXT COLUMN_FORMAT COMPRESSED NOT NULL; ALTER TABLE tbl MODIFY value TEXT COLUMN_FORMAT COMPRESSED NOT NULL; 16

17. XtraDB Compressed Columns 5.7-specific features: • Support for compressed JSON data type • Support for compressed generated stored columns CREATE TABLE tbl( id SERIAL PRIMARY KEY, value JSON COLUMN_FORMAT COMPRESSED NOT NULL gen TEXT GENERATED ALWAYS AS (value->”$.bio”) STORED COLUMN_FORMAT COMPRESSED ) ENGINE=InnoDB; 17

18. XtraDB Compressed Columns Tuning: • innodb_compressed_columns_zip_level = (0..9) https://www.percona.com/doc/percona- server/5.7/flexibility/compressed_columns.html#innodb_compressed_columns_zip_level • innodb_compressed_columns_threshold https://www.percona.com/doc/percona- server/5.7/flexibility/compressed_columns.html#innodb_compressed_columns_threshold 18

19. XtraDB Compressed Columns Limitations (both for 5.6 and 5.7): • Compressed columns cannot be used in indices (neither on their own nor as parts of composite keys). • ALTER TABLE ... DISCARD/IMPORT TABLESPACE is not supported for tables with compressed columns. 19

20. XtraDB Compressed Columns 2-bytes compressed blob header: R W ALGORITHM LEN-LEN C UNUSED 0 1 2 6 7 9 10 11 15 • R – reserved for future versions (currently must always be 0) • ALGORITHM - identifies which algoritm was used to compress this BLOB. Currently, the only value 0 (meaning zlib) is supported • W – 'wrap' identifies if compression algorithm calculated a checksum (adler32 in case of zlib) and appended it to the compressed data • LEN-LEN – identifies the length of the column length data portion followed by this header 20

21. XtraDB Compressed Columns 2-bytes compressed blob header: R W ALGORITHM LEN-LEN C UNUSED 0 1 2 6 7 9 10 11 15 • C – 'compressed’. If set to 1, then this header is immediately followed by 1..8 bytes (depending on the value of LEN-LEN bitfield) which determine original (uncompressed) block size. These LEN-LEN bytes are followed by compressed representation of the original data. • If 'compressed' bit is set to 0, every other bitfield must be ignored. In this case the header is immediately followed by uncompressed (original) data. 21

22.XtraDB Compressed Columns Experiments Compression ratio and data load time

23. XtraDB Compressed Columns Experiments Copying data to a new table: CREATE TABLE companies_compressed( id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, value [ LONGTEXT | JSON ] COLUMN_FORMAT COMPRESSED NOT NULL, PRIMARY KEY(id) ) ENGINE=InnoDB; INSERT INTO companies_compressed SELECT * FROM companies_src; 23

24. XtraDB Compressed Columns Experiments Calculating compressed data size: SELECT data_length FROM information_schema.tables WHERE table_schema = DATABASE() AND table_name = 'companies_compressed’; 24

25. XtraDB Compressed Columns Experiments Data size, MB 1 000 898 898 898 898 898 900 880 880 880 880 880 800 700 600 500 387 377 377 400 304 284 283 300 200 100 0 LONGTEXT JSON COMPRESSED TEXT COMPRESSED JSON data_length(LVL0), MB data_length(LVL1), MB data_length(LVL6), MB data_length(LVL9), MB 25

26. XtraDB Compressed Columns Experiments Data load time, sec 60 53 52 50 48 40 36 36 36 36 36 33 33 33 33 30 30 28 22 20 17 10 0 LONGTEXT JSON COMPRESSED TEXT COMPRESSED JSON query_time(LVL0), sec query_time(LVL1), sec query_time(LVL6), sec query_time(LVL9), sec 26

27. XtraDB Compressed Columns Experiments Data size, % 120,00% 100,00% 100,00% 100,00% 100,00% 100,00% 98,00% 98,00% 98,00% 98,00% 98,00% 100,00% 80,00% 60,00% 43,08% 41,97% 41,97% 33,83% 31,60% 31,49% 40,00% 20,00% 0,00% LONGTEXT JSON COMPRESSED TEXT COMPRESSED JSON data_length_relative(LVL0), % data_length_relative(LVL1), % data_length_relative(LVL6), % data_length_relative(LVL9), % 27

28. XtraDB Compressed Columns Experiments Data load time, % 160,61% 180,00% 157,58% 145,45% 160,00% 140,00% 109,09% 109,09% 109,09% 109,09% 109,09% 120,00% 100,00% 100,00% 100,00% 100,00% 90,91% 84,85% 100,00% 66,67% 80,00% 51,52% 60,00% 40,00% 20,00% 0,00% LONGTEXT JSON COMPRESSED TEXT COMPRESSED JSON query_time_relative(LVL0), % query_time_relative(LVL1), % query_time_relative(LVL6), % query_time_relative(LVL9), % 28

29.zlib Compression Algorithm Overview