16/11 Introduction to Apache Cassandra by DataStax

(1) Apache Cassandra™ use-cases (2) Why do I need Apache Cassandra™? (3) Distribution, replication & consistency model (4) Features summary

1.Getting started with Apache Cassandra™ DuyHai DOAN, Apache Cassandra™ Evangelist

2.(1) Apache Cassandra™ use-cases (2) Why do I need Apache Cassandra™? (3) Distribution, replication & consistency model (4) Features summary © DataStax, All Rights Reserved.

3.Apache Cassandra™ use-cases

4.Before (< 2016) •  Collections/Playlists •  Recommendation/Personalization •  Fraud detection •  Internet of things/Sensor data •  Messaging © 2016 DataStax, All Rights Reserved.

7.Today (≥ 2016)

12.Why do I need Apache Cassandra™?

13.Linear Scalability [Diagram: a scale running from NetcoSports (3 nodes, ≈3GB) up to 1k+ nodes, PB+, with YOU somewhere in between]

14.Continuous availability •  thanks to the Dynamo architecture

15.Multi data-centers/cloud native •  out-of-the-box (config only) •  AWS/GCE/Azure/CloudStack support •  Cloud/Bare-metal

16.Multi-DC usages Data locality, disaster recovery [Diagram: two rings of C* nodes, New York (DC1) and London (DC2), linked by async replication]

17.Multi-DC usages Virtual DC for workload segregation [Diagram: two rings of C* nodes in the same room, Production (LIVE) and Analytics (Spark), linked by async replication]

18.Multi-DC usages Prod data copy for back-up/benchmark •  Use LOCAL_XXX Consistency Levels •  My tiny test DC: READ-ONLY!!! [Diagram: production ring of C* nodes linked by async replication to a small test DC]
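The advice to use LOCAL_XXX consistency levels can be illustrated with a little arithmetic. The sketch below assumes hypothetical replication factors (3 in the production DC, 1 in the tiny test DC; neither number comes from the slides) and uses the usual majority rule, floor(n / 2) + 1, to compare how many replicas a quorum needs when it is counted locally versus across all data centers.

```python
def quorum(replicas: int) -> int:
    """Majority of a replica set: floor(n / 2) + 1."""
    return replicas // 2 + 1


# Hypothetical replication factors per data center (not from the slides).
replication = {"prod": 3, "tiny_test": 1}

# LOCAL_QUORUM only counts replicas in the coordinator's own data center.
local_quorum_prod = quorum(replication["prod"])

# QUORUM counts replicas across every data center.
global_quorum = quorum(sum(replication.values()))

print("LOCAL_QUORUM in prod needs", local_quorum_prod, "replica(s)")
print("QUORUM across all DCs needs", global_quorum, "replica(s)")
```

Under these assumptions a LOCAL_QUORUM request in production is satisfied by production replicas alone, whereas a plain QUORUM needs more acknowledgements and may have to wait on the test DC if a production replica is slow or down.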

19.Operational simplicity •  1 node = 1 process + 2 config files (cassandra.yaml + cassandra-rackdc.properties) •  deployment automation (Ansible …) •  no special roles among nodes, perfect symmetry

20.Ecosystem •  Apache Spark – Apache Cassandra integration •  analytics •  joins, aggregation •  SparkSQL/DataFrame integration with CQL (predicate push-down) •  Apache Zeppelin – Apache Cassandra integration •  web-based notebook •  tabular/graph display

21.Q&A

22.Apache Cassandra™ Architecture

23.The Tokens Random hash of #partition → token = hash(#p) •  Hash: $]-x,\ x]$ •  hash range: $2^{64}$ values •  $x = 2^{64}/2$ [Diagram: ring of 8 C* nodes]
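The mapping token = hash(#p) can be sketched in a few lines. This is only an illustration: Cassandra's default partitioner uses a Murmur3 hash, which is not in the Python standard library, so the sketch substitutes MD5 truncated to 64 bits. The function name `token` is hypothetical.

```python
import hashlib

X = 2 ** 63  # x = 2^64 / 2, as on the slide


def token(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    Assumption: MD5 truncated to 8 bytes stands in for the real
    partitioner hash; only the shape of the mapping is illustrated.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


for key in ("user_id1", "user_id2", "user_id3"):
    print(key, "->", token(key))
```

A signed 64-bit integer covers [-x, x - 1] rather than exactly ]-x, x], which is close enough for illustration. The important property is that the same key always yields the same token and that tokens spread roughly uniformly over the range.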

24.Token Ranges [Diagram: ring of nodes A to H, each owning one range]
A: $\left]-x,\ -\frac{3x}{4}\right]$
B: $\left]-\frac{3x}{4},\ -\frac{2x}{4}\right]$
C: $\left]-\frac{2x}{4},\ -\frac{x}{4}\right]$
D: $\left]-\frac{x}{4},\ 0\right]$
E: $\left]0,\ \frac{x}{4}\right]$
F: $\left]\frac{x}{4},\ \frac{2x}{4}\right]$
G: $\left]\frac{2x}{4},\ \frac{3x}{4}\right]$
H: $\left]\frac{3x}{4},\ x\right]$
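Looking up which node owns a token is a search over the upper bounds of these ranges. A minimal sketch, assuming x = 2^63 and the eight equal ranges A to H from the slide; the names `owner` and `UPPER_BOUNDS` are illustrative.

```python
from bisect import bisect_left

X = 2 ** 63
NODES = "ABCDEFGH"

# Upper bound of each node's range: -3x/4, -2x/4, ..., 3x/4, x
UPPER_BOUNDS = [-X + (i + 1) * (X // 4) for i in range(len(NODES))]


def owner(tok: int) -> str:
    """Return the node whose range ]lower, upper] contains the token."""
    # bisect_left finds the first upper bound >= tok, so a token that is
    # exactly equal to a bound belongs to the range that ends there.
    return NODES[bisect_left(UPPER_BOUNDS, tok)]


print(owner(0))   # D, because D owns ]-x/4, 0]
print(owner(1))   # E, because E owns ]0, x/4]
print(owner(X))   # H, because H owns ]3x/4, x]
```

Because the ranges are half-open on the left, a token sitting exactly on a boundary goes to the node whose range ends at that boundary.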

25.Distributed Tables CREATE TABLE users( user_id int, …, PRIMARY KEY(user_id) ); [Diagram: ring of nodes A to H, with rows user_id1 to user_id5 waiting to be placed]

26.Distributed Tables [Diagram: rows user_id1 to user_id5 spread around the ring of nodes A to H according to their tokens]

27.Linear Scalability Today = high load, production in danger [Diagram: ring of 8 nodes, A to H]

28.Scaling Out +2 nodes to lower the pressure [Diagram: ring of 10 nodes, A to J]
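The effect of going from 8 to 10 nodes can be estimated with a small simulation. This is an idealized sketch: it assumes the ring is re-split into equal ranges after the new nodes join, and it again uses truncated MD5 as a stand-in for the real partitioner hash. The key names and the helper `load_per_node` are hypothetical.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

X = 2 ** 63


def token(key: str) -> int:
    """Signed 64-bit token (MD5 stand-in for the real partitioner hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def load_per_node(keys, node_count):
    """Count keys per node when the ring is split into equal ranges."""
    uppers = [-X + (i + 1) * (2 * X) // node_count for i in range(node_count)]
    counts = Counter(bisect_left(uppers, token(k)) for k in keys)
    return [counts[i] for i in range(node_count)]


keys = [f"user_id{i}" for i in range(10_000)]
before = load_per_node(keys, 8)
after = load_per_node(keys, 10)
print("8 nodes :", max(before), "keys on the busiest node")
print("10 nodes:", max(after), "keys on the busiest node")
```

With uniformly spread tokens each node holds about 1/8 of the keys before and about 1/10 after, which is the sense in which adding nodes lowers the pressure on each one.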

29.Q&A