- What is special about time series
- What is ClickHouse
- How ClickHouse can be used for time series
1.ClickHouse for Time-Series Alexander Zaitsev
2.Agenda
● What is special about time series
● What is ClickHouse
● How ClickHouse can be used for time series
3.Altinity Background
● Premier provider of software and services for ClickHouse
● Incorporated in the UK with a distributed team in US/Canada/Europe
● Main US/Europe sponsor of the ClickHouse community
● Offerings:
  ○ Enterprise support for ClickHouse and ecosystem projects
  ○ Software (Kubernetes, cluster manager, tools & utilities)
  ○ POCs/Training
4.What is time series? Time-ordered events representing how a process changes over time
● Monitoring
● Finance
● Internet of Things
5.What is time series analytics? Measure the change:
● How something has changed compared to the past
● What changes are going on right now
● Predict changes in the future
6.Dedicated time series DBMSs grow!
• InfluxDB
• Prometheus
• Kdb+
• TimescaleDB
• Amazon Timestream
• DolphinDB
7.What is special about time series DBMS?
● Optimized for very fast INSERT
● Efficient data storage, retention
● Aggregates, downsampling
● Fast queries
Looks like ClickHouse!
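To make the "aggregates, downsampling" item concrete, here is a minimal sketch in plain Python (not ClickHouse itself). It averages raw points into fixed-width time buckets. The function name `downsample` and the sample data are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket start (epoch s) -> [sum, count]
    for ts, value in points:
        epoch = int(ts.timestamp())
        bucket = epoch - epoch % bucket_seconds  # round down to the bucket start
        acc = sums[bucket]
        acc[0] += value
        acc[1] += 1
    return [
        (datetime.fromtimestamp(bucket, tz=timezone.utc), total / count)
        for bucket, (total, count) in sorted(sums.items())
    ]

# Hypothetical input: one point every 20 seconds, downsampled to 1-minute averages.
start = datetime(2018, 1, 1, tzinfo=timezone.utc)
points = [(start + timedelta(seconds=20 * i), float(i)) for i in range(6)]
per_minute = downsample(points, 60)
```

Six raw points collapse into two one-minute rows here; a time series DBMS applies the same idea at much larger scale to keep long retention periods cheap.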
9. ClickHouse is a powerful data warehouse that handles many use cases
● Understands SQL
● Runs on bare metal to cloud
● Stores data in columns
● Parallel and vectorized execution
● Scales to many petabytes
● Is open source (Apache 2.0)
● Is WAY fast!
http://clickhouse.yandex
10. ClickHouse is FAST! https://tech.marksblogg.com/benchmarks.html
11.Tables are split into indexed, sorted parts for fast queries
[Diagram: a table consists of parts; each part holds an index and sorted, compressed columns]
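Why sorted, indexed parts make queries fast can be sketched with a sparse index: keep only the first key of every block of rows, then use binary search to decide which blocks a range query has to read. This is a simplified model in plain Python, not ClickHouse's actual implementation. The names `build_sparse_index`, `rows_to_scan`, and the block size `GRANULE` are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # rows per index entry (hypothetical; tiny on purpose)

def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep the first key of every block of `granule` rows of a sorted column."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]

def rows_to_scan(index, n_rows, lo, hi, granule=GRANULE):
    """Return the [start, stop) row range that may contain keys in [lo, hi]."""
    # The block before the first index entry >= lo may still end with lo.
    first = max(bisect_left(index, lo) - 1, 0)
    # Blocks whose first key is greater than hi cannot match.
    last = bisect_right(index, hi)
    return first * granule, min(last * granule, n_rows)

keys = list(range(0, 40, 2))          # a sorted key column with 20 rows
index = build_sparse_index(keys)      # one entry per 4 rows
start, stop = rows_to_scan(index, len(keys), lo=10, hi=18)
```

The query touches only rows `start` to `stop` instead of the whole column, and the index itself stays small enough to keep in memory.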
12. Merge process re-sorts data in the background
[Diagram: each INSERT creates a new part; over time, background merge sorts combine parts into larger parts]
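The insert-then-merge cycle can be sketched in a few lines of Python. This is an illustration of the general technique (a k-way merge of sorted runs), not ClickHouse code. The helper names `insert_part` and `merge_parts` and the sample rows are hypothetical.

```python
import heapq

def insert_part(parts, rows):
    """Each INSERT becomes a new part whose rows are sorted by key."""
    parts.append(sorted(rows))

def merge_parts(parts):
    """Replace all parts with one part produced by a k-way merge sort."""
    merged = list(heapq.merge(*parts))  # inputs are already sorted
    parts[:] = [merged]

parts = []
insert_part(parts, [(2, "b"), (1, "a")])   # rows are (key, value) tuples
insert_part(parts, [(3, "c"), (0, "z")])
merge_parts(parts)                         # done in the background in a real system
```

Because every part is already sorted, the merge is a single sequential pass, which is why inserts can stay cheap while reads still see mostly large, sorted parts.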
13. Now we can follow how a query works on a single server
SELECT DevId, Type, avg(Value) FROM sdata WHERE MDate = '2018-01-01' GROUP BY DevId, Type
● Identify parts to search
● Query in parallel
● Aggregate results
● Result set
14.If one server is not enough -- ClickHouse can scale out easily
[Diagram: SELECT ... FROM sdata_dist is sent to one ClickHouse server; every server has a Distributed table sdata_dist on top of a local MergeTree table sdata, and the query returns a single result set]
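One way to picture a scaled-out `avg(Value) ... GROUP BY DevId, Type` is as a scatter-gather: each server aggregates its local rows into partial states, and the partial states are merged into the final result. The Python below is a simplified model under that assumption, not a description of ClickHouse internals. The function names and the sample rows are hypothetical.

```python
from collections import defaultdict

def partial_avg(rows):
    """Per-server step: (sum, count) of Value for each (DevId, Type) group."""
    state = defaultdict(lambda: [0.0, 0])
    for dev_id, type_, value in rows:
        acc = state[(dev_id, type_)]
        acc[0] += value
        acc[1] += 1
    return state

def merge_avg(partials):
    """Final step: merge partial states, then compute the averages."""
    total = defaultdict(lambda: [0.0, 0])
    for state in partials:
        for key, (s, n) in state.items():
            total[key][0] += s
            total[key][1] += n
    return {key: s / n for key, (s, n) in total.items()}

# Hypothetical local data on two servers: (DevId, Type, Value)
shard1 = [(1, "temp", 10.0), (1, "temp", 20.0)]
shard2 = [(1, "temp", 60.0), (2, "hum", 5.0)]
result = merge_avg([partial_avg(shard1), partial_avg(shard2)])
```

Shipping (sum, count) pairs rather than per-server averages matters: averaging the averages would give the wrong answer whenever servers hold different numbers of rows per group.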
15. Built-in replication and failover provide high availability
[Diagram: ClickHouse servers with sdata_dist and sdata tables are paired as replicas using the ReplicatedMergeTree engine and coordinated through ZooKeeper; SELECT ... FROM sdata_dist returns a single result set]
16.What are the main ClickHouse use patterns?
● Fast, scalable data warehouse for online services (SaaS and in-house apps)
● Built-in data warehouse for installed analytic applications
● Monitoring and log storage in-house solutions
● Exploration -- throw in a bunch of data and go crazy!
17.ClickHouse’s Four “F”-s: Fast! Flexible! Free! Fun!
18.ClickHouse for Time Series
19.Does ClickHouse fit for time series?
20.Does ClickHouse fit for time series? “One size does not fit all!” Michael Stonebraker, 2005
21.Does ClickHouse fit for time series? “ClickHouse does not slow down!” (“ClickHouse не тормозит!”) Alexey Milovidov, 2016
22.Does ClickHouse fit for time series? “One size does not fit all!” (Michael Stonebraker) or “ClickHouse does not slow down!” (Alexey Milovidov)?
23. November 2018 benchmark. TSBS
● https://github.com/timescale/tsbs
● ClickHouse vs TimescaleDB vs InfluxDB (vs Cassandra)
● Amazon r5.2xlarge instance, 8 vCPUs, 64GB RAM, EBS storage
● 100M rows, 10 metrics (columns) + metadata
● 15 test queries common for time series use cases, 8 threads
https://www.altinity.com/blog/clickhouse-for-time-series
25.November 2018 benchmark. TSBS Source raw data: 22.5GB
28.What have we learned?
● ClickHouse load performance is outstanding! *
● Compression is efficient, but not as good as InfluxDB’s
● Queries are fast, but can be even faster
* It later turned out that load performance was limited by storage performance when reading the source data
29.ClickHouse as a time series DBMS: time series performance with the flexibility of a feature-rich analytical SQL DBMS