ClickHouse for Time Series

  • What is special about time series
  • What is ClickHouse
  • How ClickHouse can be used for time series

1.ClickHouse for Time-Series Alexander Zaitsev

2.Agenda ● What is special about time series ● What is ClickHouse ● How ClickHouse can be used for time series

3.Altinity Background ● Premier provider of software and services for ClickHouse ● Incorporated in UK with distributed team in US/Canada/Europe ● Main US/Europe sponsor of ClickHouse community ● Offerings: ○ Enterprise support for ClickHouse and ecosystem projects ○ Software (Kubernetes, cluster manager, tools & utilities) ○ POCs/Training

4.What is time series? Time-ordered events representing how a process changes over time ● Monitoring ● Finance ● Internet of Things

5.What is time series analytics? Measure the change: ● How something has changed compared to the past ● What changes are going on right now ● Predict changes in the future

6.Dedicated time series DBMSs grow! • InfluxDB • Prometheus • Kdb+ • TimescaleDB • Amazon Timestream • DolphinDB

7.What is special about time series DBMS? ● Optimized for very fast INSERT ● Efficient data storage, retention ● Aggregates, downsampling ● Fast queries Looks like ClickHouse!
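To make "aggregates, downsampling" concrete, here is a minimal Python sketch of downsampling: raw points are averaged into fixed-width time buckets. The function name, the sample data, and the 60-second bucket width are assumptions chosen for illustration; they are not taken from the talk and do not show how any particular DBMS implements it.

```python
from collections import defaultdict
from statistics import mean

def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - ts % bucket_seconds
        buckets[bucket_start].append(value)
    # One averaged value per bucket, ordered by bucket start time.
    return {start: mean(vals) for start, vals in sorted(buckets.items())}

# Hypothetical raw readings: (seconds since start, measured value).
raw = [(0, 1.0), (20, 3.0), (59, 5.0), (60, 10.0), (119, 20.0)]
per_minute = downsample(raw, 60)
print(per_minute)
```

Five raw points collapse into two per-minute averages, which is the storage and query saving that downsampling is meant to provide.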

8.ClickHouse Overview

9. ClickHouse is a powerful data warehouse that handles many use cases ● Understands SQL ● Runs on bare metal to cloud ● Stores data in columns ● Parallel and vectorized execution ● Scales to many petabytes ● Is open source (Apache 2.0) ● Is WAY fast! http://clickhouse.yandex

10. ClickHouse is FAST! https://tech.marksblogg.com/benchmarks.html

11.Tables are split into indexed, sorted parts for fast queries [Diagram: a table consists of several parts; each part holds an index and sorted, compressed columns]
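The idea of an indexed, sorted part can be sketched in a few lines of Python. Because rows in a part are sorted by key, it is enough to index the first key of every block of rows (a sparse index) and then scan only the blocks that may contain the key. This is a simplified illustration under stated assumptions, not ClickHouse's actual on-disk format: the names `build_sparse_index` and `candidate_rows` and the block size of 4 rows are hypothetical.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # rows per index mark (hypothetical; tiny so the example is readable)

def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep only the first key of every granule, not one entry per row."""
    return sorted_keys[::granule]

def candidate_rows(index, key, n_rows, granule=GRANULE):
    """Return the [start, stop) row range that may contain `key`."""
    # The granule before the first mark >= key may still end with `key`.
    first = max(bisect_left(index, key) - 1, 0)
    # Granules whose first key is > key cannot contain it.
    last = bisect_right(index, key)
    return first * granule, min(last * granule, n_rows)

keys = [1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 9]  # sorted key column of one part
index = build_sparse_index(keys)
start, stop = candidate_rows(index, 3, len(keys))
matches = [i for i in range(start, stop) if keys[i] == 3]
print(index, (start, stop), matches)
```

Only rows 0 to 7 are scanned to find the three matching rows; the last block is skipped without being read.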

12. Merge process re-sorts data in the background [Diagram: each INSERT creates a new part; over time, background merge sorts combine parts into larger sorted parts]
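The merge process can be illustrated with a small Python sketch: every insert produces its own sorted part, and a later merge combines the parts into one sorted part with a k-way merge. This is a toy model, assuming in-memory lists of `(key, value)` rows; the function names are hypothetical and nothing here reflects how ClickHouse schedules merges.

```python
import heapq

def insert_rows(parts, rows):
    """Each INSERT becomes a new part, sorted by key on write."""
    parts.append(sorted(rows))

def merge_parts(parts):
    """Background merge: k-way merge of already-sorted parts into one part."""
    merged = list(heapq.merge(*parts))
    return [merged]

parts = []
insert_rows(parts, [(3, "c"), (1, "a")])
insert_rows(parts, [(2, "b"), (5, "e")])
insert_rows(parts, [(4, "d")])
print(len(parts))   # three small parts after three inserts

parts = merge_parts(parts)
print(parts)        # one larger part, still sorted by key
```

Because each input part is already sorted, the merge is a single sequential pass over the inputs rather than a full re-sort.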

13. Now we can follow how a query works on a single server. Query: SELECT DevId, Type, avg(Value) FROM sdata WHERE MDate = '2018-01-01' GROUP BY DevId, Type ● Identify parts to search ● Query in parallel ● Aggregate results ● Return the result set
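The query steps on this slide (identify parts, query in parallel, aggregate results) can be mimicked in Python. The sketch below assumes each part records the minimum and maximum MDate it contains, scans the selected parts in a thread pool, and merges per-group partial states of (sum, count) into the final averages. The part layout, the sample rows, and the function names are hypothetical; only the column names come from the slide's query.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parts; rows are (DevId, Type, MDate, Value).
parts = [
    {"min_date": "2018-01-01", "max_date": "2018-01-01",
     "rows": [(1, "temp", "2018-01-01", 10.0), (2, "temp", "2018-01-01", 30.0)]},
    {"min_date": "2018-01-01", "max_date": "2018-01-02",
     "rows": [(1, "temp", "2018-01-01", 20.0), (1, "temp", "2018-01-02", 99.0)]},
    {"min_date": "2018-01-03", "max_date": "2018-01-03",
     "rows": [(2, "temp", "2018-01-03", 50.0)]},
]

def scan_part(part, mdate):
    """Partial aggregation for one part: per-group [sum, count]."""
    state = defaultdict(lambda: [0.0, 0])
    for dev_id, typ, row_date, value in part["rows"]:
        if row_date == mdate:
            state[(dev_id, typ)][0] += value
            state[(dev_id, typ)][1] += 1
    return state

def query_avg(parts, mdate):
    # 1. Identify parts to search using part metadata (ISO dates compare as strings).
    selected = [p for p in parts if p["min_date"] <= mdate <= p["max_date"]]
    # 2. Query the selected parts in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: scan_part(p, mdate), selected))
    # 3. Aggregate results: merge partial states, then finalize avg = sum / count.
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in total.items()}

result = query_avg(parts, "2018-01-01")
print(result)
```

Keeping a (sum, count) pair per group, rather than a per-part average, is what allows partial results from different parts to be combined correctly.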

14.If one server is not enough, ClickHouse can scale out easily [Diagram: SELECT ... FROM sdata_dist is sent to one of three ClickHouse servers; each server holds an sdata_dist (Distributed) table and a local sdata (MergeTree) table; the partial results are combined into one result set]

15. Built-in Replication and Failover provide high availability [Diagram: SELECT ... FROM sdata_dist is served by multiple ClickHouse servers; each server holds sdata_dist and a local sdata table that uses the ReplicatedMergeTree engine; replicas are coordinated through ZooKeeper and return a single result set]

16.What are the main ClickHouse use patterns? ● Fast, scalable data warehouse for online services (SaaS and in-house apps) ● Built-in data warehouse for installed analytic applications ● Monitoring and Log Storage in-house solutions ● Exploration -- throw in a bunch of data and go crazy!

17.ClickHouse’s Four “F”-s: Fast! Flexible! Free! Fun!

18.ClickHouse for Time Series

19.Is ClickHouse a good fit for time series?

20.Is ClickHouse a good fit for time series? “One size does not fit all!” Michael Stonebraker, 2005

21.Is ClickHouse a good fit for time series? “ClickHouse does not slow down!” (“ClickHouse не тормозит!”) Alexey Milovidov, 2016

22.Is ClickHouse a good fit for time series? “One size does not fit all!” (Michael Stonebraker) or “ClickHouse does not slow down!” (Alexey Milovidov)?

23. November 2018 benchmark. TSBS ● https://github.com/timescale/tsbs ● ClickHouse vs TimescaleDB vs InfluxDB (vs Cassandra) ● Amazon r5.2xlarge instance, 8 vCPUs, 64GB RAM, EBS storage ● 100M rows, 10 metrics (columns) + metadata ● 15 test queries common for time series use cases, 8 threads https://www.altinity.com/blog/clickhouse-for-time-series

24.November 2018 benchmark. TSBS

25.November 2018 benchmark. TSBS Source raw data: 22.5GB

26.November 2018 benchmark. TSBS

27.November 2018 benchmark. TSBS

28.What have we learned? ● ClickHouse load performance is outstanding! * ● Compression is efficient, but not as good as InfluxDB’s ● Queries are fast, but can be even faster * It later turned out that load performance was limited by the storage performance of reading the source data

29.ClickHouse as a time series DBMS: time series performance with the flexibility of a feature-rich analytical SQL DBMS