ClickHouse 2018 How to stop waiting for your queries to complete

ClickHouse 2018 How to stop waiting for your queries to complete and start having fun.

展开查看详情

1. ClickHouse 2018 How to stop waiting for your queries to complete and start having fun Alexander Zaitsev Altinity

2.Who am I M.Sc. In mathematics from Moscow State University Software engineer since 1997 Developed distributed systems since 2002 Focused on high performance analytics since 2007 Director of Engineering in LifeStreet Co-founder of Altinity – ClickHouse Service Provider 2

3... and I am not Peter’s brother :) 3

4.What Is ClickHouse? 4

5. 5 © http://mattturck.com/

6.ClickHouse DBMS is Column Store MPP Realtime SQL Open Source 6

7. http://clickhouse.yandex • Developed by Yandex for Yandex.Metrica - Yandex (NASDAQ: YNDX) – “Russian Google” (50% market share in search, 50+ b2b and b2c products) - Yandex.Metrica – world 2nd largest web analytics platform • Open Source since June 2016 (Apache 2.0 license) • 200+ companies using in production today • Several hundred experimenting, doing POC etc. • Dozens of contributors to the source code 7

8.Why Yet Another DBMS? 8

9.Flexible SQL 9

10.Commercial OpenSource Analytical Analytical DBMS DBMS 11

11.ClickHouse Fast! Flexible! Free! Fun! 12

12.How Fast? 13

13.1+ trillion rows table :) select count(*) from dw.ad8_fact_event; SELECT count(*) FROM dw.ad8_fact_event ┌───────count()─┐ │ 1261705085657 │ └───────────────┘ 1 rows in set. Elapsed: 3.552 sec. Processed 1.26 trillion rows, 1.26 TB (355.22 billion rows/s., 355.22 GB/s.) 14 Altinity Ltd. www.altinity.com

14.1+ trillion rows table :) select sum(price_cpm) from dw.ad8_fact_event where access_day=today()-1 and event_key=-2; SELECT sum(price_cpm) FROM dw.ad8_fact_event WHERE (access_day = (today() - 1)) AND (event_key = -2) ┌────sum(price_cpm)─┐ │ 87579.09035192338 │ └───────────────────┘ 1 rows in set. Elapsed: 0.168 sec. Processed 161.89 million rows, 2.91 GB (961.83 million rows/s., 17.31 GB/s.) 15 Altinity Ltd. www.altinity.com

15.WikiStat data, 28B rows. https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/ 16

16.“1.1 Billion Taxi Rides Benchmarks” http://tech.marksblogg.com/benchmarks.html Query 1 Query 2 Query 3 Query 4 Setup 0.034 0.061 0.178 0.498 MapD & 2-node p2.8xlarge cluster 0.051 0.146 0.047 0.794 kdb+/q & 4 Intel Xeon Phi 7210 CPUs - 2.415 3.599 4.962 ClickHouse at Altinity demo server 0.762 2.472 4.131 6.041 BrytlytDB 1.0 & 2-node p2.16xlarge cluster 1.034 3.058 5.354 12.748 ClickHouse, Intel Core i5 4670K 1.56 1.25 2.25 2.97 Redshift, 6-node ds2.8xlarge cluster 2 2 1 3 BigQuery 6.41 6.19 6.09 6.63 Amazon Athena 8.1 18.18 n/a n/a Elasticsearch (heavily tuned) 14.389 32.148 33.448 67.312 Vertica, Intel Core i5 4670K 22 25 27 65 Spark 2.3.0 & single i3.8xlarge w/ HDFS 35 39 64 81 Presto, 5-node m3.xlarge cluster w/ HDFS 152 175 235 368 PostgreSQL 9.5 & cstore_fdw 17

17. 2016 LifeStreet benchmark (unpublished) • 19 queries, 1200M rows table, 3-node clusters 18

18.Time Series benchmarks (first time today!) https://github.com/timescale/tsbs Benchmark suite to automate testing Loads 103M rows, 10 metrics per row Runs 15 queries, 1000 runs each in 8 parallel threads Supports TimescaleDB, InfluxDB, Cassandra, MongoDB and ClickHouse (Altinity PR is submitted) 19

19. Load time (s) 900 800 700 600 500 400 300 200 100 0 ClickHouse TimescaleDB InfluxDB 20

20.80 “Light” queries, time in ms 70 60 50 40 30 ClickHouse TimescaleDB 20 InfluxDB 10 0 21

21.90 “Heavy” queries, time in sec 80 70 60 50 40 ClickHouse 30 TimescaleDB 20 InfluxDB 10 0 22

22.How flexible? 23

23.ClickHouse runs at Bare metal (any Linux) Amazon Azure VMware, VirtualBox Docker, K8s 24

24.ClickHouse solves business problems at: Mobile App and Web analytics AdTech bidding analytics Operational Logs analytics DNS queries analysis Stock correlation analytics Telecom Security audit Fintech SaaS Manufactoring process control BlockChain transactions analysis 25

25. Worldwide * www.altinity.com visits in 2018 26

26. Size does not matter Yandex: 500+ servers, 25B rec/day LifeStreet: 60 servers, 75B rec/day CloudFlare: 36 servers, 200B rec/day Bloomberg: 102 servers, 1000B rec/day Toutiao: 400 servers, moving to 1000 this month 27

27.How fun ☺ life←{↑1 ω∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂ω} 28

28.with (select groupArray(C) from C) as Ca select id, groupArray(S) Sa, groupArray(V) Va, groupArray(D) Da, groupArray(P) Pa, arrayMap(c -> arrayFirstIndex(s -> s > c, Sa)-1, Ca) Ka, arrayMap((c,k) -> Va[k] + (Va[k+1] - Va[k])/(Sa[k+1] - Sa[k])*(c-Sa[k]),Ca,Ka) Ta, arrayMap(s -> arrayFirstIndex(c -> c>s, Ca)>0 ? arrayFirstIndex(c -> c>s, Ca)-1 : toInt32(length(Ca)), Sa) Ja, arrayMap(i -> Ta[i], Ja) Ra, arrayMap((v,r) -> v - r, Va, Ra) ARa, arraySum((x,y,z) -> x*y*z, ARa, Da, Pa) result from T group by id 29

29.What’s new in 2018 • Table functions mysql/odbc/file/http • clickhouse-copier • Predicate pushdown for views/subselects • LowCardinality datatype • Decimal datatype • JOIN enhancements • ALTER TABLE UPDATE/DELETE • WITH ROLLUP … and tons of performance improvements and small features 30