16/11 Introduction to Apache Cassandra by DataStax
1.Getting started with Apache Cassandra™ DuyHai DOAN, Apache Cassandra™ Evangelist
2.
1 Apache Cassandra™ use-cases
2 Why do I need Apache Cassandra™?
3 Distribution, replication & consistency model
4 Features Summary
© DataStax, All Rights Reserved.
3.Apache Cassandra™ use-cases
4.Before (< 2016)
• Collections/Playlists
• Recommendation/Personalization
• Fraud detection
• Internet of things/Sensor data
• Messaging
© 2016 DataStax, All Rights Reserved.
7.Today (≥ 2016)
12.Why do I need Apache Cassandra™?
13.Linear Scalability [Figure: a scale running from NetcoSports (3 nodes, ≈3GB) through YOU to 1k+ nodes, PB+]
14.Continuous availability
• thanks to the Dynamo architecture
15.Multi data-centers/cloud native
• out-of-the-box (config only)
• AWS/GCE/Azure/CloudStack support
• Cloud/Bare-metal
16.Multi-DC usages: data locality, disaster recovery [Figure: two rings of C* nodes, New York (DC1) and London (DC2), linked by async replication]
17.Multi-DC usages: virtual DC for workload segregation [Figure: two rings of C* nodes in the same room, Production (LIVE) and Analytics (Spark), linked by async replication]
18.Multi-DC usages: prod data copy for back-up/benchmark [Figure: production ring and "My tiny test DC" (READ-ONLY!!!), linked by async replication. Use LOCAL_XXX consistency levels]
19.Operational simplicity
• 1 node = 1 process + 2 config files (cassandra.yaml + cassandra-rackdc.properties)
• deployment automation (Ansible …)
• No role between nodes, perfect symmetry
20.Ecosystem
• Apache Spark – Apache Cassandra integration
  • analytics
  • joins, aggregation
  • SparkSQL/DataFrame integration with CQL (predicate pushdown)
• Apache Zeppelin – Apache Cassandra integration
  • web-based notebook
  • tabular/graph display
21.Q&A
22.Apache Cassandra™ Architecture
23.The Tokens
Random hash of #partition → token = hash(#p)
Hash: ]−x, x]
Hash range: 2^64 values
x = 2^64/2
[Figure: ring of 8 C* nodes]
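The token computation on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `token` helper is hypothetical, and the hash used here (8 bytes of BLAKE2b from the standard library) is a stand-in chosen because it is readily available, not the hash Cassandra itself uses. Only the range ]−x, x] with x = 2^64/2 comes from the slide.

```python
import hashlib

X = 2**64 // 2  # x = 2^64 / 2, as on the slide


def token(partition_key: str) -> int:
    """Map a partition key to a token in ]-x, x] (hypothetical helper)."""
    # Stand-in hash: 8 bytes of BLAKE2b. This is an assumption made for the
    # sketch, not the partitioner hash used by a real cluster.
    digest = hashlib.blake2b(partition_key.encode("utf-8"), digest_size=8).digest()
    unsigned = int.from_bytes(digest, "big")  # 0 .. 2^64 - 1
    # Shift so that the 2^64 possible values cover -x + 1 .. x, i.e. ]-x, x].
    return unsigned - (X - 1)


if __name__ == "__main__":
    for key in ("user_id1", "user_id2", "user_id3"):
        print(key, token(key))
```

The same key always yields the same token, which is what lets any node work out where a partition lives without a central lookup.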
24.Token Ranges
A: ]−x, −3x/4]      E: ]0, x/4]
B: ]−3x/4, −2x/4]   F: ]x/4, 2x/4]
C: ]−2x/4, −x/4]    G: ]2x/4, 3x/4]
D: ]−x/4, 0]        H: ]3x/4, x]
[Figure: ring of 8 nodes, A to H]
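The eight ranges above follow from cutting ]−x, x] into equal slices. A minimal sketch, assuming an evenly split ring; the function name `token_ranges` is made up for this example:

```python
X = 2**64 // 2  # x = 2^64 / 2


def token_ranges(nodes: str, x: int = X) -> dict:
    """Split ]-x, x] into equal ranges, one per node.

    Each value is a (start, end) pair meaning the interval ]start, end].
    """
    width = 2 * x // len(nodes)
    ranges = {}
    for i, name in enumerate(nodes):
        start = -x + i * width
        # The last range is pinned to x so the whole ring is covered.
        end = x if i == len(nodes) - 1 else start + width
        ranges[name] = (start, end)
    return ranges


if __name__ == "__main__":
    for name, (start, end) in token_ranges("ABCDEFGH").items():
        print(f"{name}: ]{start}, {end}]")
```

With 8 nodes the slice width is 2^64/8 = x/4, which matches the bounds listed for A through H.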
25.Distributed Tables
CREATE TABLE users(
    user_id int,
    …,
    PRIMARY KEY(user_id)
);
[Figure: ring of 8 nodes, A to H, with rows user_id1 to user_id5 waiting to be placed]
26.Distributed Tables [Figure: rows user_id1 to user_id5 spread across the ring of nodes A to H]
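How rows end up on different nodes can be sketched by combining the token hash with the token ranges. Everything below is illustrative: `token`, `owner_of_token` and `owner` are hypothetical helpers, the BLAKE2b hash is a stand-in rather than Cassandra's own, and the ring is assumed to be split evenly between nodes A to H. The placement it prints will therefore not match the figure.

```python
import hashlib
from bisect import bisect_left

X = 2**64 // 2            # x = 2^64 / 2
NODES = "ABCDEFGH"        # the 8 nodes of the ring
WIDTH = 2 * X // len(NODES)
# Inclusive upper bound of each node's range ]start, end].
UPPER_BOUNDS = [-X + (i + 1) * WIDTH for i in range(len(NODES))]


def token(user_id: int) -> int:
    """Hash a partition key into ]-x, x] (stand-in hash, an assumption)."""
    digest = hashlib.blake2b(str(user_id).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") - (X - 1)


def owner_of_token(t: int) -> str:
    """Return the node whose range ]start, end] contains token t."""
    return NODES[bisect_left(UPPER_BOUNDS, t)]


def owner(user_id: int) -> str:
    """Return the node that would store the row with this user_id."""
    return owner_of_token(token(user_id))


if __name__ == "__main__":
    for uid in range(1, 6):
        print(f"user_id{uid} -> node {owner(uid)}")
```

Because the hash scatters keys over the whole token space, consecutive user_id values generally land on unrelated nodes, which is the effect the figure illustrates.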
27.Linear Scalability Today = high load, production in danger [Figure: ring of 8 nodes, A to H]
28.Scaling Out +2 nodes to lower the pressure [Figure: ring of 10 nodes, A to J]
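A rough way to quantify "lower the pressure" is to compare each node's share of the token space before and after the two nodes join. The sketch below assumes a perfectly balanced ring and uniformly hashed data, and it ignores replication and the data movement needed to rebalance. The helper names are invented for this example.

```python
from fractions import Fraction


def share_per_node(node_count: int) -> Fraction:
    """Fraction of the token space owned by each node on a balanced ring."""
    return Fraction(1, node_count)


def relief(nodes_before: int, nodes_after: int) -> Fraction:
    """Relative drop in per-node share when the ring grows."""
    return 1 - share_per_node(nodes_after) / share_per_node(nodes_before)


if __name__ == "__main__":
    print("share with 8 nodes: ", share_per_node(8))    # 1/8
    print("share with 10 nodes:", share_per_node(10))   # 1/10
    print("relief per node:    ", relief(8, 10))        # 1/5
```

Under these assumptions, going from 8 to 10 nodes cuts each node's share from 1/8 to 1/10 of the data, a 20% reduction per node.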
29.Q&A