20_10 How Netflix Manages Petabyte-Scale Apache Cassandra in the Cloud

1. How Netflix manages petabyte-scale Apache Cassandra in the cloud. Joey Lynch and Vinay Chella, Distributed Database Engineers at Netflix

2. Who are we?
- Vinay Chella: Distributed Systems Engineer, Cloud Data Engineering, Netflix. Focusing on Apache Cassandra and data abstractions.
- Joey Lynch: Distributed Systems Engineer, Cloud Data Engineering, Netflix. Distributed systems addict and data wrangler.

3. Agenda
- Why use Cassandra?
- Scale of Cassandra
- Life of a Cassandra cluster
  - Where does it start?
  - Provisioning
  - Keeping it running
  - Migration / retiring
- Murphy's law applied

4. Why Apache Cassandra?
- Millions of operations per second
- Global data replication
- Failure isolation at the rack level
- Chaos-ready database
- Tunable consistency
- Log-structured storage engine
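Tunable consistency means each request chooses how many replicas must acknowledge it. The following Python sketch shows the standard overlap rule: a read is guaranteed to see the latest successful write when the read replica count R plus the write replica count W exceeds the replication factor RF. It assumes a single datacenter and the usual QUORUM = floor(RF/2) + 1; multi-datacenter levels such as LOCAL_QUORUM and EACH_QUORUM are deliberately left out, and the helper names are hypothetical.

```python
"""Sketch: when is a Cassandra read guaranteed to see the latest write?

Simplification: single datacenter, so QUORUM = floor(RF / 2) + 1.
A read overlaps the most recent successful write when R + W > RF.
"""


def replicas_required(level: str, rf: int) -> int:
    """Replicas that must acknowledge a request at this consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    required = levels[level]
    if required > rf:
        raise ValueError(f"{level} needs {required} replicas but RF is {rf}")
    return required


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True if every possible read replica set intersects every write replica set."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


if __name__ == "__main__":
    rf = 3  # a common production replication factor
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, is_strongly_consistent(read, write, rf))
```

With RF = 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) always overlap, while ONE/ONE (1 + 1) does not; that latency-versus-freshness choice is what each application gets to tune per request.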

5. Scale
- Tens of thousands of instances
- Hundreds of global C* clusters
- More than 6 PB of data
- Millions of requests per second
- Replicating several GiB/s of data across the globe
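A quick back-of-the-envelope calculation makes these figures more concrete. The inputs below are assumptions standing in for the slide's ranges: 20,000 instances for "tens of thousands" and 3 GiB/s for "several GiB/sec". The sketch also treats the 6 PB as spread evenly across instances, which real clusters are not.

```python
# Back-of-the-envelope arithmetic for the scale figures.
# ASSUMPTIONS (not from the slide): 20,000 instances stands in for
# "tens of thousands", 3 GiB/s stands in for "several GiB/sec".
# Decimal units for PB/GB, binary units for GiB/TiB.

PB = 10**15
GB = 10**9
GiB = 2**30
TiB = 2**40
SECONDS_PER_DAY = 24 * 60 * 60


def avg_bytes_per_instance(total_bytes: float, instances: int) -> float:
    """Average data per instance if data were spread perfectly evenly."""
    return total_bytes / instances


def bytes_per_day(rate_bytes_per_sec: float) -> float:
    """Total bytes moved in one day at a constant sustained rate."""
    return rate_bytes_per_sec * SECONDS_PER_DAY


if __name__ == "__main__":
    per_node = avg_bytes_per_instance(6 * PB, 20_000)
    print(f"~{per_node / GB:.0f} GB per instance")
    daily = bytes_per_day(3 * GiB)
    print(f"~{daily / TiB:.0f} TiB replicated per day")
```

Under those assumptions each instance holds roughly 300 GB on average, and a sustained 3 GiB/s of cross-region replication adds up to about 253 TiB per day.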

6. Story of Apache Cassandra and Netflix: Inception, Provision, Keep it running, Migrations

7. Inception: Where does it all start?

8. Inception

9. Service philosophy
- Context, not control
  - Education
  - Tooling
- SLOs are key
  - Size, rate, latency, availability
- Every party must be responsible
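One way to make "SLOs are key" and "every party must be responsible" concrete is to write the contract down as data. The following is a hypothetical model, not Netflix's tooling: the class names, field names, the example thresholds, and the choice to treat size and rate as client-side promises and latency and availability as datastore-side promises are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLOContract:
    """Hypothetical contract covering the four SLO dimensions on the slide."""
    max_size_gb: float       # size: data the client may store
    max_ops_per_sec: float   # rate: request rate the client may send
    p99_latency_ms: float    # latency: 99th-percentile target for the datastore
    availability_pct: float  # availability target, e.g. 99.99


@dataclass(frozen=True)
class Observed:
    """Measured values over the same window as the contract."""
    size_gb: float
    ops_per_sec: float
    p99_latency_ms: float
    availability_pct: float


def check(contract: SLOContract, obs: Observed) -> dict:
    """Evaluate each dimension; True means that side kept its promise."""
    return {
        "client_size": obs.size_gb <= contract.max_size_gb,
        "client_rate": obs.ops_per_sec <= contract.max_ops_per_sec,
        "db_latency": obs.p99_latency_ms <= contract.p99_latency_ms,
        "db_availability": obs.availability_pct >= contract.availability_pct,
    }


def whom_to_page(results: dict) -> str:
    """Hypothetical routing rule; a client-side breach takes precedence."""
    client_ok = results["client_size"] and results["client_rate"]
    db_ok = results["db_latency"] and results["db_availability"]
    if not client_ok:
        return "application team"
    if not db_ok:
        return "database team"
    return "nobody"


if __name__ == "__main__":
    contract = SLOContract(max_size_gb=500, max_ops_per_sec=10_000,
                           p99_latency_ms=10, availability_pct=99.99)
    # The client sends more traffic than agreed and latency degrades.
    obs = Observed(size_gb=400, ops_per_sec=25_000,
                   p99_latency_ms=40, availability_pct=99.995)
    results = check(contract, obs)
    print(results)
    print(whom_to_page(results))
```

Giving client-side breaches precedence reflects the idea that a datastore cannot be held to its latency promise when it receives more load than was agreed; a real routing policy could weigh this differently.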

10. Inception

11. Invest in DevEd

12. Inception

13. Better tooling

14. Cost insights

15. Maintenance

16. Maintenance - Repair Insights

19. Maintenance - Backup Insights

22. Maintenance - Node Insights

24. SLOs are key
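An availability SLO translates directly into an error budget: the amount of downtime the contract tolerates over a window. The sketch below computes it; the targets and the 30-day default window are illustrative assumptions rather than figures from the talk.

```python
# Error-budget arithmetic for an availability SLO.
# The targets and 30-day window are illustrative assumptions.


def allowed_downtime_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Minutes of unavailability the SLO permits within the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}%: {allowed_downtime_minutes(target):.2f} min per 30 days")
```

Over 30 days, 99.9% allows 43.2 minutes of downtime, 99.99% about 4.3 minutes, and 99.999% under half a minute, which is why each extra "nine" changes how a cluster must be operated.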

26. Whom to page?

27. Good contracts make good partners!

28. Story of C*: Inception, Provision, Keep it running, Migrations

29. Provision: Get up and running fast!

Initiated by the Apache Cassandra PMC and committers, dedicated to publishing and promoting Apache Cassandra technology, its ecosystem, best practices, and the latest news.