ClickHouse at Messagebird - analysing billions of events in real-time

在messagebird,我们每个月向全世界发送数亿条消息,这将产生数十亿个事件。提供实时*分析是能够快速向客户传递这些消息的关键。从衡量和优化我们平台的性能,到为我们的客户提供洞察,Clickhouse在我们的组织内部被广泛使用。
在本文中,我们将研究Clickhouse如何允许我们摄取大量数据并运行复杂的交互式分析查询。我们还介绍了使Clickhouse引起我们注意的业务需求,并详细介绍了其部署过程。我们涵盖我们所面临的问题,以及我们如何处理这些问题。我们将讨论当前的云生产设置以及如何部署和使用它。最后但同样重要的是,我们讨论了我们在这一过程中所犯的错误,以及我们自己运行和维护一个Clickhouse集群所学到的东西。

展开查看详情

1.Clickhouse at MessageBird Analysing billions of events in real-time* Aleksandar Aleksandrov & Félix Mattrat NOVEMBER 2018

2. About us Data engineers & Team leads Aleksandar Félix Aleksandrov Mattrat 2

3.ABOUT 225+ Agreements Introducing We have 225+ direct-to-carrier agreements with operators worldwide. MessageBird MessageBird is a cloud communications platform that 15,000+ Customers empowers consumers to communicate with your Customers in over 60+ countries, across a business in the same way they communicate with their great variety of industries. friends - seamlessly, on their own timeline and with the context of previous conversations. 180+ Employees More than 180 employees speaking over 20 languages based in the Americas, For additional information visit: www.messagebird.com Europe & Asia. 3

4.What’s on the menu? 01. Data at MessageBird 02. The past - Age Of Darkness 03. Enlightenment - ClickHouse use case 04. What’s next? - Nirvana 4

5.DATA AT MESSAGEBIRD Needs Mostly about statistics and reporting Internal needs External needs • State of the system • Customer dashboard • Routing SMS • Reporting API • Training algorithms • ML Models 5

6.DATA AT MESSAGEBIRD The landscape • Multiple carriers is messy - no uniformity of the data • SMS messages go through many state changes up to months into the past • Pricing (both carrier and customer) changes retro-actively 6

7.Age of Darkness 7

8.AGE OF DARKNESS Hello CRON my old friend • MySQL based • Aggregates re-computed every X period of time • Served us well for +5 years 8

9.AGE OF DARKNESS Scaling problems • The system had difficulty scaling and was often lagging • Loss of granularity with pre-aggregation • Performed poorly while doing analytical queries • Inaccuracies 9

10.ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE Re-thinking data collection • Able to keep up with continuously changing SMS message states • In real time* • Scalable to handle MessageBird’s global growth • More flexible to accommodate wider use of data 10

11.ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE Introducing event sourcing • Event sourcing, fairly common technique • An immutable stream of events from which all states can be derivate 11

12.12

13.ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE Introducing event sourcing • Problem: now we have increased our data by an order of magnitude. • How can we query this efficiently? 13

14.14

15.ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE What is our unicorn database? • Able to ingest large amount of data • Data available immediately after ingestion • No loss of granularity • Flexible querying capabilities • Sub-second response time • Horizontally scalable 15

16.ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE Vitess • Let’s shard the data • Now we have N shards of problems • Still has the limitations of MySQL • Poor analytical support (at the time) 16

17.ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE Kudu/Impala • Promising, very clean and well defined SQL interface • Compatible with HDFS & Parquets • Column oriented • But unable to reach sub-second querying time over billions of rows 17

18.ENLIGHTENMENT - QUEST FOR AN ALTERNATIVE Google BigQuery • Scale well, millions or billions doesn’t matter • Fully managed: it’s someone else problem • Standard SQL support • Not open source • Not made for sub-second querying 18

19.ENLIGHTENMENT - CLICKHOUSE USE CASE ClickHouse 19

20.20

21.ENLIGHTENMENT - CLICKHOUSE USE CASE ClickHouse • Able to ingest a huge amount of data • Sub-second on large dataset of non-aggregated data • Flexible query capabilities: SQLish dialect • Column oriented • Scales very well vertically • Horizontally scalable • Open source 21

22.ENLIGHTENMENT - CLICKHOUSE USE CASE ClickHouse SELECT toStartOfQuarter(created_at) AS Quarter, 30 rows in set. mcc AS Country, Elapsed: 0.33sec. floor(sum(sign * rate)) AS Total, Processed 497.91 million rows, sum(sign) AS MessageCount 4.95 GB FROM messages (1.42 billions rows/s., 14.39 GB/s.) WHERE created_at >= '2018-01-01' AND customer = 666 GROUP BY Quarter, Country 22

23.ENLIGHTENMENT - CLICKHOUSE USE CASE ClickHouse what’s the trick? • Column oriented, you only pay for what you select • Each column can potentially be processed in parallel • Carefully crafted code makes use of vectorisation instructions • Different table engines fit for different needs • Horizontally scalable 23

24.So, how to ingest ever changing data into ClickHouse 24

25.ENLIGHTENMENT - CLICKHOUSE USE CASE CollapsingMergeTree • You write twice the amount of data, but eventually end up with a single row per PK • Based on the idea of log compaction • Excels at analytical queries on a large amount of data 25

26. Primary key style ENLIGHTENMENT - CLICKHOUSE USE CASE Collapsing what? sign date id status price 1 2018-10-08 666 ACCEPTED 0.01 26

27.ENLIGHTENMENT - CLICKHOUSE USE CASE Collapsing what? sign date id status price 1 2018-10-08 666 ACCEPTED 0.01 -1 2018-10-08 666 ACCEPTED 0.01 27

28.ENLIGHTENMENT - CLICKHOUSE USE CASE Collapsing what? sign date id status price 1 2018-10-08 666 ACCEPTED 0.01 -1 2018-10-08 666 ACCEPTED 0.01 1 2018-10-08 666 DELIVERED 0.05 28

29.ENLIGHTENMENT - CLICKHOUSE USE CASE Collapsing what? SELECT sum(sign * price) AS total FROM dataset sign date id status price 1 2018-10-08 666 ACCEPTED 0.01 -1 2018-10-08 666 ACCEPTED 0.01 1 2018-10-08 666 DELIVERED 0.05 29