18/02 - Introduction to Cassandra


1. INTRODUCTION TO APACHE CASSANDRA
Gökhan Atıl

2. GÖKHAN ATIL
➤ Database Administrator
➤ Oracle ACE Director (2016), ACE (2011)
➤ 10g/11g and R12 Oracle Certified Professional (OCP)
➤ Co-author of Expert Oracle Enterprise Manager 12c
➤ Founding Member and Vice President of TROUG
➤ Blogger (since 2008): gokhanatil.com
➤ Twitter: @gokhanatil

3. INTRODUCTION TO APACHE CASSANDRA
➤ What is Apache Cassandra? Why use it?
➤ Cassandra Architecture
➤ Cassandra Query Language (CQL)
➤ Cassandra Data Modeling
➤ How to install and run Cassandra?
➤ Cassandra nodetool
➤ Backup and Recovery

4. WHAT IS APACHE CASSANDRA? WHY USE IT?

5. WHAT IS APACHE CASSANDRA? WHY USE IT?
➤ Fast, distributed (column-family NoSQL) database: high availability, linear scalability, high performance
➤ Fault tolerant on commodity hardware
➤ Multi-data center support
➤ Easy to operate
➤ Proven in production: CERN, Netflix, eBay, GitHub, Instagram, Reddit

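6. HIGH AVAILABILITY: CAP THEOREM AND CASSANDRA
[Figure: the CAP triangle (Consistency, Availability, Partition Tolerance), contrasting an RDBMS, which provides ACID guarantees (Atomicity, Consistency, Isolation, Durability), with Cassandra.]
➤ Cassandra favors availability and partition tolerance; consistency is tunable per request (see Consistency Levels).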

7. HIGH AVAILABILITY: THE RING
➤ No master, no slave: every node is a peer (peer to peer)
➤ Nodes share their state with each other through gossip ("I'm online!")
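Gossip is easier to picture with a toy model. The sketch below is a simplified illustration and not Cassandra's actual gossiper, which exchanges much richer endpoint state. In this model, each node tracks only a heartbeat version for every peer it has heard of. Every round, each node exchanges its map with one randomly chosen peer, and both sides keep the highest version seen. The class and function names are hypothetical.

```python
import random


class Node:
    """One peer in the ring; it only tracks heartbeat versions of other peers."""

    def __init__(self, name):
        self.name = name
        self.heartbeat = 0
        self.state = {name: 0}  # node name -> highest heartbeat version seen

    def beat(self):
        # Bump our own heartbeat: the "I'm online!" message
        self.heartbeat += 1
        self.state[self.name] = self.heartbeat

    def merge(self, other_state):
        # Keep the newest version we have seen for every node
        for node, version in other_state.items():
            if version > self.state.get(node, -1):
                self.state[node] = version


def gossip_round(nodes, rng):
    for node in nodes:
        node.beat()
    for node in nodes:
        # No master: every node picks a random peer and both exchange state
        peer = rng.choice([n for n in nodes if n is not node])
        node.merge(peer.state)
        peer.merge(node.state)


def simulate(num_nodes=5, rounds=50, seed=1):
    rng = random.Random(seed)
    nodes = [Node(f"node{i}") for i in range(num_nodes)]
    for _ in range(rounds):
        gossip_round(nodes, rng)
    return nodes
```

After `simulate()`, every node's `state` should list all peers, and no coordinator was involved. Information spreads only through pairwise exchanges.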

8. LINEAR SCALABILITY

9. CASSANDRA ARCHITECTURE

10. CASSANDRA PARTITIONS
EMAIL     NAME     PHONE
gokhan@   Gokhan   542xxxxxxx
aylin@    Aylin    532xxxxxxx
ilayda@   Ilayda   532xxxxxxx
➤ PRIMARY KEY = PARTITION KEY, CLUSTERING KEY
➤ The partitioner hashes the partition key to decide which node stores the partition.
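The split between partition key and clustering key can be modeled in a few lines. This minimal sketch assumes `PRIMARY KEY (email, name)`, as in the CQL examples later in the deck. Rows that share an email land in one partition, and rows inside a partition are kept sorted by name. The extra row for `gokhan@` is hypothetical; it is added so that one partition holds two rows. The function name `build_partitions` is also hypothetical.

```python
from collections import defaultdict

# Rows from the slide (phones masked as on the slide), plus one hypothetical
# extra row so that one partition holds two rows.
rows = [
    {"email": "gokhan@", "name": "Gokhan", "phone": "542xxxxxxx"},
    {"email": "aylin@", "name": "Aylin", "phone": "532xxxxxxx"},
    {"email": "ilayda@", "name": "Ilayda", "phone": "532xxxxxxx"},
    {"email": "gokhan@", "name": "Gokhan Atil", "phone": "535"},  # hypothetical
]


def build_partitions(rows, partition_key, clustering_key):
    """PRIMARY KEY (partition_key, clustering_key): group, then sort each group."""
    partitions = defaultdict(list)
    for row in rows:
        # All rows sharing a partition key live together (on the same replicas)
        partitions[row[partition_key]].append(row)
    for partition in partitions.values():
        # Inside a partition, rows are kept ordered by the clustering key
        partition.sort(key=lambda r: r[clustering_key])
    return dict(partitions)


partitions = build_partitions(rows, "email", "name")
```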

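11. REPLICATION FACTOR
[Figure: the partition key (EMAIL = gokhan@) is hashed by Murmur3Partitioner to token #60; the node owning that token range stores the partition, and further nodes around the ring hold copies according to the replication factor.]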
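Replica placement on the ring can be sketched as follows. The model assumes a toy token space of 0 to 99, mirroring the slide's "#60", and four hypothetical nodes at tokens 0, 25, 50 and 75. Murmur3 is not in Python's standard library, so `token_for` substitutes MD5 from `hashlib` as a stand-in hash. The placement walk resembles `SimpleStrategy`: the first node whose token is at or past the partition's token owns it, and the next nodes clockwise hold the other copies.

```python
import bisect
import hashlib

RING_SIZE = 100  # toy token space; Murmur3Partitioner really uses -2**63 .. 2**63 - 1


def token_for(partition_key: str) -> int:
    # Stand-in hash (MD5), not Murmur3: only the idea "key -> token" matters here
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


def replicas_for(token: int, ring, replication_factor: int):
    """ring: list of (token, node) sorted by token. Walk clockwise from the owner."""
    tokens = [t for t, _ in ring]
    # First node whose token >= partition token; wrap around past the last node
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]  # hypothetical nodes
```

For example, with a replication factor of 3, `replicas_for(60, ring, 3)` walks from node D and wraps around, giving D, A and B.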

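12. WRITE PATH (CLUSTER)
[Figure: the client sends a write to a coordinator node, which forwards it to the replica nodes; if a replica is down, the coordinator keeps a hint and delivers it when the replica comes back (hinted handoff).]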
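Hinted handoff can be illustrated with a minimal coordinator model. This is an assumption-laden sketch, not Cassandra's implementation, and the class names are hypothetical. Live replicas apply the write immediately. For a down replica, the coordinator stores a hint and replays it later. In this sketch, stored hints do not count as acknowledgements; in Cassandra, only consistency level ANY accepts a stored hint as success.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value) waiting for the replica to return

    def write(self, key, value, required_acks):
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.apply(key, value)
                acks += 1
            else:
                # Replica is down: remember the write as a hint
                self.hints.append((replica, key, value))
        # Hints are not counted as acks in this sketch
        return acks >= required_acks

    def replay_hints(self):
        remaining = []
        for replica, key, value in self.hints:
            if replica.up:
                replica.apply(key, value)  # hand the missed write off
            else:
                remaining.append((replica, key, value))
        self.hints = remaining
```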

13. WRITE PATH (NODE)
[Figure: the memtable in memory; the commit log and several SSTables on disk; memtables are flushed to SSTables, and SSTables are merged by compaction.]
➤ Logging data in the commit log
➤ Writing data to the memtable
➤ Flushing to (immutable) SSTables (Sorted Strings Tables)
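The node-level write path follows the order commit log, then memtable, then flush to immutable sorted SSTables, with compaction merging SSTables later. The single-node model below is a simplification. The flush threshold counts keys rather than bytes, and newer data wins by SSTable order, whereas Cassandra reconciles cells by write timestamp. The class name and threshold are hypothetical.

```python
class StorageNode:
    def __init__(self, flush_threshold=3):
        self.commit_log = []  # append-only, replayed after a crash
        self.memtable = {}    # in-memory, mutable
        self.sstables = []    # immutable, sorted tuples of (key, value), oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log for durability
        self.memtable[key] = value            # 2. write to the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush when the memtable is "full"

    def flush(self):
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs its log entries

    def compact(self):
        merged = {}
        for sstable in self.sstables:  # oldest first, so newer values overwrite
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))]

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None
```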

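14. READ PATH (CLUSTER)
[Figure: the client sends a read to a coordinator node, which requests the full data from one replica and digests from the others.]
➤ Read repair: repair during the read path; mismatching digests reveal stale replicas, and the value with the newest timestamp is written back to them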
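A digest-based read with read repair might look like the sketch below. It assumes each replica is a plain dict mapping a key to a `(value, write_timestamp)` pair. The digest is a stand-in built with MD5 from `hashlib`. The coordinator reads full data from the first replica and digests from the rest. On a mismatch, it picks the newest timestamp and writes that value back to every replica. Function names are hypothetical.

```python
import hashlib


def digest(value, timestamp):
    # Stand-in digest of a (value, timestamp) pair
    return hashlib.md5(f"{value}|{timestamp}".encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (value, write_timestamp)."""
    data = replicas[0].get(key)  # full data from one replica
    expected = digest(*data) if data else None
    digests = [digest(*r[key]) if key in r else None for r in replicas[1:]]
    if all(d == expected for d in digests):
        return data[0] if data else None  # everyone agrees
    # Mismatch: collect full versions, newest timestamp wins, then repair
    versions = [r[key] for r in replicas if key in r]
    newest = max(versions, key=lambda v: v[1])
    for r in replicas:
        r[key] = newest
    return newest[0]
```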

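15. READ PATH (NODE)
[Figure: read-path flowchart spanning memory and disk.]
➤ Check the memtable
➤ Check the row cache (if enabled); if the row is found, return it
➤ Check each SSTable's bloom filter: "no" means the partition is not in that SSTable, "maybe" means keep looking
➤ Check the partition key cache; if the key is found, go directly to the partition in the SSTable
➤ Otherwise, use the partition summary (in memory) to find the right spot in the partition index (on disk)
➤ Read the partition from the SSTable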
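The bloom filter step is what lets a node skip SSTables entirely. The sketch below is a tiny bloom filter plus a read helper that only opens SSTables whose filter answers "maybe". The caches, the partition summary and the partition index are not modeled. The bit size, the number of hashes, and the use of SHA-256 from `hashlib` are arbitrary choices for illustration, and all names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Answers False ("no": definitely absent) or True ("maybe": possibly present)."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits & (1 << pos) for pos in self._positions(key))


def read_partition(key, memtable, sstables):
    """sstables: list of (BloomFilter, dict) pairs, newest first.
    Returns (value, number of SSTables actually opened)."""
    if key in memtable:
        return memtable[key], 0
    opened = 0
    for bloom, table in sstables:
        if not bloom.might_contain(key):
            continue  # "no": skip this SSTable without touching disk
        opened += 1  # "maybe": we must look inside
        if key in table:
            return table[key], opened
    return None, opened
```

A bloom filter can return false positives but never false negatives. A "maybe" can therefore still cost a wasted SSTable lookup, while a "no" is always safe to trust.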

16. CONSISTENCY LEVELS
ANY (write only)   at least one node
ONE, TWO, THREE    at least one/two/three replica nodes
QUORUM             a quorum (N/2 rounded down, plus 1) of replica nodes across all datacenters
LOCAL_QUORUM       a quorum (N/2 rounded down, plus 1) of replica nodes in the same datacenter
ALL                all replica nodes
➤ Formula for strong consistency: R + W > N (R: replicas read, W: replicas written, N: replication factor)
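The quorum size and the R + W > N rule are simple arithmetic. The helper below assumes a single datacenter, so QUORUM and LOCAL_QUORUM coincide, and the function names are hypothetical. With N = 3, a QUORUM read plus a QUORUM write gives 2 + 2 > 3, so every read overlaps at least one replica that saw the latest write. ONE plus ONE gives 1 + 1, which is not greater than 3, so that combination offers no such guarantee.

```python
def quorum(replication_factor: int) -> int:
    # N/2 rounded down, plus 1
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    # Single-datacenter assumption: QUORUM == LOCAL_QUORUM here
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "LOCAL_QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def is_strongly_consistent(read_level: str, write_level: str, replication_factor: int) -> bool:
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor  # read and write replica sets must overlap
```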

17. CASSANDRA QUERY LANGUAGE (CQL)

18. CASSANDRA QUERY LANGUAGE (CQL)
➤ Create a keyspace (database):
 create keyspace demo with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
➤ Remove a keyspace:
 drop keyspace demo;
➤ Select a keyspace to operate on:
 use demo;

19. CASSANDRA QUERY LANGUAGE (CQL)
➤ Create a table (EMAIL: partition key, NAME: clustering key):
 create table demo.democlients ( email text, name text, phone text, primary key (email, name));
➤ Alter a table:
 alter table democlients add money int;
➤ Remove a table:
 drop table democlients;
➤ Remove all rows in a table:
 truncate table democlients;

20. CASSANDRA QUERY LANGUAGE (CQL)
➤ Retrieve rows (this query does not restrict the partition key, so it needs ALLOW FILTERING):
 select * from democlients where name='Gokhan Atil' ALLOW FILTERING; -- or create a secondary index
➤ Retrieve distinct values (DISTINCT applies to partition key columns; EMAIL is the partition key):
 select DISTINCT email from democlients;
➤ Limit the number of rows returned:
 select * from democlients LIMIT 1;
➤ Sort the result (ORDER BY works on the clustering key within a partition; NAME is the clustering key):
 select * from democlients where email='gokhan at gokhanatil.com' ORDER by name DESC;

21. CASSANDRA QUERY LANGUAGE (CQL)
➤ Retrieve the results in JSON format:
 select JSON * from democlients;
➤ Insert a row (only if it does not already exist):
 insert into democlients (email, name, phone) values ('gokhan at gokhanatil.com','Gokhan Atil','542' ) IF NOT EXISTS;
➤ Insert a row with a TTL (time to live, in seconds):
 insert into democlients (email, name, phone) values ('info at gokhanatil.com','Information','542' ) USING TTL 10;
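The meaning of `USING TTL 10` can be modeled as a cell that carries its write time and a time to live. The cell stays visible only while the current time is before `written_at + ttl`. This is a simplified model: Cassandra tracks expiry per cell and later purges expired data during compaction. The `Cell` class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: str
    written_at: float           # write time, in seconds
    ttl: Optional[int] = None   # time to live in seconds; None means it never expires

    def is_live(self, now: float) -> bool:
        # Visible until written_at + ttl; without a TTL it never expires
        return self.ttl is None or now < self.written_at + self.ttl


info = Cell("542", written_at=100.0, ttl=10)  # like USING TTL 10
```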

22. CASSANDRA QUERY LANGUAGE (CQL)
➤ Update records (only if the row exists):
 update democlients set phone='535' where email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF EXISTS;
➤ Update records with a condition:
 update democlients set money=20 where email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF phone='542';
➤ Delete rows:
 delete from democlients where email='gokhan at gokhanatil.com' IF EXISTS;

23. CASSANDRA QUERY LANGUAGE (CQL)
➤ Delete a row with a condition:
 delete from democlients where email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF money > 10;
➤ Delete columns in a row:
 delete money from democlients where email='gokhan at gokhanatil.com' and name='Gokhan Atil';
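The `IF NOT EXISTS`, `IF EXISTS` and `IF <condition>` clauses make a statement conditional. It is applied only if the check passes, and otherwise the current values come back, roughly like cqlsh's `[applied]` column. The single-node sketch below models only that check-then-apply behavior. Cassandra runs these lightweight transactions with Paxos across replicas, which is not modeled here. The table is a plain dict keyed by `(email, name)`, and the function names are hypothetical.

```python
def insert_if_not_exists(table, key, row):
    if key in table:
        return False, dict(table[key])  # not applied; current row returned
    table[key] = dict(row)
    return True, None


def update_if(table, key, updates, condition=lambda row: True):
    """Default condition behaves like IF EXISTS; pass a predicate for IF <condition>."""
    row = table.get(key)
    if row is None or not condition(row):
        return False, (dict(row) if row is not None else None)
    row.update(updates)
    return True, None


table = {}
key = ("gokhan at gokhanatil.com", "Gokhan Atil")
```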

24. CASSANDRA DATA MODELING
➤ Query-Driven Data Modeling
➤ Spread data evenly across the cluster
➤ Use Denormalization
➤ Be careful about using secondary indexes

25. HOW TO INSTALL AND RUN CASSANDRA?

26. HOW TO INSTALL AND RUN A CASSANDRA CLUSTER?
➤ Make sure you have a JDK (8u40 or newer) installed
➤ Download apache-cassandra-VERSION-bin.tar.gz
➤ Extract the file to a folder
➤ Create data and logs directories in the cassandra folder
➤ Edit the configuration file (conf/cassandra.yaml): give the cluster a name, change the listen address and the data and logs directory locations, and enable authentication and authorization
➤ Run bin/cassandra

27. HOW TO INSTALL AND RUN A CASSANDRA CLUSTER?
➤ Use Docker to pull the latest image:
 docker pull cassandra
➤ Run it as a standalone node:
 docker run --name cas1 -p 9042:9042 -e CASSANDRA_CLUSTER_NAME=MyCluster -d cassandra
➤ Connect using cqlsh:
 docker exec -it cas1 cqlsh
➤ Run nodetool (e.g. to check status):
 docker exec -it cas1 nodetool status

28. CASSANDRA NODETOOL

29. CASSANDRA NODETOOL
➤ Get a quick summary of the node:
 nodetool info
➤ Get the version of Cassandra:
 nodetool version