Cassandra Day - Data Modeling

来源于:https://github.com/DataStax-Academy/cassandra-day-2019
详细介绍了Cassandra data model

展开查看详情

1. Cassandra Day DataStax Solution Days Introduction Cassandra Introduction to Data Modeling © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Data Modeling

2. SQL (relational) vs CQL (NoSQL Cassandra) © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Data Modeling

3. Structuring Your Database Normalization: To reduce data redundancy and increase data integrity. Denormalization: Must be done in read heavy workloads to increase performance © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

4. Normalization The process of structuring a relational database in accordance with a series of normal forms in order to reduce data redundancy and increase data integrity. © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

5. Relational Data Models Employees deptId First Last 1 Edgar Codd • Multiple normal forms 2 Raymond Boyce • most do not go beyond 3NF • Foreign Keys Department • Joins id Dept 1 Engineering 2 Math © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

6. Relational Modeling CREATE TABLE users ( id number(12) NOT NULL , firstname nvarchar2(25) NOT NULL , lastname nvarchar2(25) NOT NULL, email nvarchar2(50) NOT NULL, password nvarchar2(255) NOT NULL, created_date timestamp(6), • Create entity table PRIMARY KEY (id), CONSTRAINT email_uq UNIQUE (email) • Add constraints ); • Index fields -- Users by email address index CREATE INDEX idx_users_email ON users (email); • Foreign Key relationships CREATE TABLE videos ( id number(12), userid number(12) NOT NULL, name nvarchar2(255), description nvarchar2(500), location nvarchar2(255), location_type int, added_date timestamp, CONSTRAINT users_userid_fk FOREIGN KEY (userid) REFERENCES users (Id) ON DELETE CASCADE, PRIMARY KEY (id) ); © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

7.© 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

8. Relational Modeling Data Models Application © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

9. Cassandra Modeling Application Models Data © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

10. Denormalization “The process of trying to improve the read performance of a database at the expense of losing some write performance by adding redundant copies of data.” © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

11. CQL vs SQL Employees deptId First Last 1 Edgar Codd • No joins 2 Raymond Boyce • Limited aggregations Department SELECT e.First, e.Last, d.Dept id Dept FROM Department d, Employees e WHERE ‘Codd’ = e.Last 1 Engineering AND e.deptId = d.id 2 Math © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

12. Denormalization • Combine table columns into a single view • Eliminate the need for joins • Queries are concise and easy to understand Employees id First Last Dept SELECT First, Last, Dept FROM employees 1 Edgar Codd Engineering WHERE id = ‘1’ 2 Raymond Boyce Math © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

13. Denormalization in Apache Cassandra Denormalization of tables in Apache Cassandra is absolutely critical. The biggest take away is to think about your queries first. There are no JOINS in Apache Cassandra. © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

14. Relational Databases NoSQL Databases Query Query Query Table 1 Table 2 Table 1 Table 2 In a relational database, one query can In Apache Cassandra, you cannot join data, access and join data from multiple tables queries can only access data from one table © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

15. Modeling Queries • What are your application’s workflows? • Knowing your queries in advance is CRITICAL • Different from RDBMS because I can’t just JOIN or create a new indexes to support new queries • One table per one query © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

16. Some Application Workflows in KillrVideo Show latest User Logs into Search for a videos added site video by tag to the site Show Show video Show ratings comments for and its details for a video a video Show Show basic Show videos comments information added by a posted by a about user user user © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

17. Some Queries in KillrVideo to Support Workflows Users User Logs into Find user by email Show basic site address information Find user by id about user Comments Show Find comments by Show comments for comments Find comments by user a video video (latest first) posted by a (latest first) user Ratings Show ratings for a video Find ratings by video © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

18. Cassandra data modeling © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

19. The Cassandra Mind-shift Change the way you think about data modeling • Two Main concepts for successful modeling: • Begin with the query in mind • Denormalization is good © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

20. Cassandra Data Modeling Partition – the fundamental unit of access • A Partition has Clustering Columns Data Columns • A partition key Parition Partition 42 • Associated rows with Last Name First Name Address Email • Clustering columns Flintstone Dino 3 Stone St dino@gmail.com • Data columns Flintstone Fred 3 Stone St fred@gmail.com Flintstone Wilma 3 Stone St wilm@gmail.com Partition Key: Hash Rubble Barney 4 Rock Cir brub@gmail.com “Bedrock” 42 42 Function Rubble Betty 4 Rock Cir betr@gmail.com © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

21. Cassandra Data Modeling Cartoon Characters by City Table Tables hold many partitions 42 Last Name First Name Address Email ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- “Frostbite “Smurfville” “Orbit “Bedrock” Falls” City” Partition ---- ---- ---- ---- Last Name First Name Address Email Azrael Flintstone Jetson Moose Cat Fred George Bullwinkle 5 Tree 3 2 1 Blue St Stone Star StSt acat@gmail.com fred@gmail.com gjet@gmail.com moo@gmail.com 17 ---- ---- ---- ---- ---- ---- ---- ---- Smurf Flintstone Jetson Squirrel Brainy Dino Jane Rocky 7 Sky 2 3 Azure Stone Star Blvd StSt ln brain@gmail.com dino@gmail.com jane@gmail.com fly@gmail.com ---- ---- ---- ---- Smurf Flintstone Jetson Clumsy Wilma Judy 4 StarSt 3 2 Teal Cir Stone St clu@gmail.com wilm@gmail.com judy@gmail.com Smurf Rubble Greedy Barney 9 Rock 4 RoyalCir St gime@gmail.com brub@gmail.com 83 ---- ---- ---- ---- ---- ---- ---- ---- Smurf Rubble Papa Betty 2 Rock 4 Navy Cir Ct papa@gmail.com betr@gmail.com ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 92 ---- ---- ---- ---- ---- ---- ---- ---- © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

22. Cassandra Data Modeling Keyspaces contain many tables Keyspace © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

23. Cassandra Data Modeling CQL – Cassandra’s Query Language Keyspace Name Here’s how you create a keyspace in CQL: CREATE KEYSPACE cartoons Replication Strategy WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 }; Replication Factor by Datacenter © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

24. Cassandra Data Modeling CQL – Create table example Keyspace Table CREATE TABLE cartoons.characters_by_city ( city text, last_name text, Column first_name text, Definitions address text, email text, PRIMARY KEY ((city), last_name, first_name)); Partition Primary Clustering Key Key Columns © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

25. Cassandra Data Modeling CQLSH – Investigating our schema To list your keyspaces: DESCRIBE KEYSPACES; To look at a specific keyspace definition: DESCRIBE KEYSPACE cartoon; To look at a table definition: DESCRIBE TABLE cartoon.characters_by_city; © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

26. Enough Talk Already!!!! It’s time to play! • You each have a cluster • You will access your cluster using DSE Studio • Open a browser and go to <Your node’s IP Address>:9091 • Click on notebook “Data Modeling: Data Modeling Intro” • Work your way through the steps of this Intro © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

27. Data Modeling Intro Quick review • Key concepts: • Keyspace – contains tables • Table – contains partitions • Row – has a primary key and data columns • Partition – basic unit of storage/retrieval • Identified by partition key embedded within primary key • Contains one or more rows • Primary key – intra-table row identifier • Consists of partition key and clustering columns • Partition key – partition identifier, hashes to partition token • Clustering column – intra-partition key for sorting rows within partition © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics

28. Let’s do a Data Modeling Project Together! © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Data Modeling

29. Welcome to Cassandra-Land The Theme Park Where You Can Find… • Distributed & Fault-Tolerant Rides • Amazing Throughput p p ! • And Fast Response Times n A d a e e e N u t W B © 2019 DataStax. Use only with permission. • Cassandra Day • Cassandra Basics