19_10 - Cassandra To Infinity And Beyond @ Teads


China Cassandra Technical Community

Published 5 years ago


Cassandra @ Teads
About Teads
Workload
Infrastructure
Ops
Monitoring & Alerting
Data model
Why a fork?


1. The Global Media Platform
Cassandra To Infinity And Beyond
Romain Hardouin, Senior Cloud Infrastructure Engineer

2. Cassandra @ Teads
About Teads
Workload
Infrastructure
Ops
Monitoring & Alerting
Data model
Why a fork?

3. About Teads

4.

5. Ad Slot Available
Ad Available
Video Start
User click
Video Complete

6. French AdTech
AWS / GCP
Scala / JS / Go
Machine learning
Docker / CoreOS
Terraform / Chef / Debian
Cassandra / Kafka / MySQL / Redis
Spark / Flink

7. Workload

8. Workload
Different kinds of workloads, but all of them are latency sensitive
Internet scale
Massive amount of data ingested from partners
We also create lots of data ourselves
No more analytics
● Tons of business-critical counters
● Time series
● TTL, TTL, TTL

9. 3 years ago
1 million qps
Write heavy
Analytics stack

10. Now
2+ million qps
Read heavy…

11. Now
...but also lots of streamed SSTables, which are not counted as writes

12. Infrastructure

13. Gimme some figures!
250 nodes
3 regions
Mostly ephemeral data
Regional vs Worldwide
28 TB
21 DCs
100 billion keys

14. Does it scale?
[Figure: Requests vs Nodes over the past year, with linear regression (linreg)]
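The "linreg" on this slide refers to a linear regression of request volume against node count. The slide's data points are not available in this transcript, so the following is only a minimal sketch of an ordinary least-squares fit; the function name `linear_fit` and the choice of nodes on the x axis and requests on the y axis are assumptions made for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    xs and ys are equal-length sequences of numbers, for example node
    counts and request rates sampled over time (hypothetical inputs).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A slope that stays stable as nodes are added is one way to read "it scales": each extra node keeps buying roughly the same amount of request capacity.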

15. Our Stack

16. AWS instances
c5d.4xlarge
16 vCPU @ 3.00 GHz
32 GB RAM
400 GB NVMe

17. Why not i3?
[Figure: Production P99 read latency, i3.2xlarge]

18. Why not i3?
[Figure: Production P99 write latency, i3.2xlarge]

19. Why not EBS?
No more EBS
Pros:
● Cheap storage, great for STCS
● Snapshots (S3 backup)
● No coupling between disks and CPU/RAM
Cons:
● High latency, high I/O wait
● Throughput: 160 MB/s
● Unsteady performance

20. Ops

21. Ops Workflow
Scale out
Scale in
Any changes
More on https://medium.com/teads-engineering/easy-cassandra-scaling-with-terraform-chef-rundeck-9443e0375aa7

22. Ops Workflow
apply

23. Ops Workflow
Unreachable instance
Replace
replace
apply

24. Repair
No incremental repair
Scheduled with Reaper
http://cassandra-reaper.io

25. Monitoring & Alerting

26. Monitoring
Overview dashboards
Advanced dashboards for troubleshooting
Alerting dashboards
Outliers
Compare a node to average
Compare 3 DCs (multi regions)
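The "compare a node to average" idea can be illustrated with a small outlier check. This is a sketch under assumptions, not the dashboards Teads uses: the function name `find_outliers`, the relative `tolerance` threshold, and the sample node names are all hypothetical.

```python
from statistics import mean


def find_outliers(metric_by_node, tolerance=0.5):
    """Return the nodes whose metric deviates from the cluster average
    by more than `tolerance`, expressed as a fraction of that average.

    metric_by_node maps a node name to one metric value, for example
    a P99 read latency in milliseconds (hypothetical input).
    """
    avg = mean(metric_by_node.values())
    if avg == 0:
        return []
    return sorted(
        node
        for node, value in metric_by_node.items()
        if abs(value - avg) / avg > tolerance
    )


if __name__ == "__main__":
    # Placeholder values, not real measurements.
    sample = {"node-a": 4.0, "node-b": 4.0, "node-c": 4.0,
              "node-d": 4.0, "node-e": 9.0}
    print(find_outliers(sample))
```

Comparing against the mean is the simplest choice; a median-based comparison would be less sensitive to the outlier itself pulling the average up.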

27. Monitoring
Ratios across DCs/clusters to grasp workloads
Examples:
● R/W spread: ((maxqps - minqps) / maxqps) * 100
● P99/P95 jitter factor: (P99 - P95) / P99
● Memory cached / disk ratio
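The first two ratios above translate directly into code. The sketch below follows the formulas as written on the slide; the function names are made up for illustration, and the slide does not say exactly how maxqps and minqps are chosen (a reasonable reading is the larger and smaller of the read and write rates), so that interpretation is an assumption.

```python
def rw_spread(max_qps, min_qps):
    """R/W spread in percent: ((maxqps - minqps) / maxqps) * 100."""
    if max_qps <= 0:
        raise ValueError("max_qps must be positive")
    return (max_qps - min_qps) / max_qps * 100


def jitter_factor(p99, p95):
    """P99/P95 jitter factor: (P99 - P95) / P99."""
    if p99 <= 0:
        raise ValueError("p99 must be positive")
    return (p99 - p95) / p99


if __name__ == "__main__":
    # Placeholder numbers, not Teads measurements.
    print(rw_spread(max_qps=1000.0, min_qps=250.0))  # 75.0
    print(jitter_factor(p99=8.0, p95=6.0))           # 0.25
```

Both values are dimensionless, which is what makes them usable for comparing clusters or DCs of different sizes on the same dashboard.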

28. Monitoring
YAML Configuration

    - include:
        bean_regex: org.apache.cassandra.metrics:type=ReadRepair,name=.*
        attribute:
          - Count
    - include:
        bean: org.apache.cassandra.metrics:type=CommitLog,name=TotalCommitLogSize

29. Alerting
Down node
Exceptions
Commitlog size
High latency
High pending tasks
Many hints
Clock out of sync
IO Wait
Disk space
...
