DataStax Top 10 Best Practices

DataStax Professional Services

1. DataStax Top 10 Best Practices 1 © DataStax, All Rights Reserved. academy.datastax.com | @jscarp

2. DataStax Top 10 Best Practices
• Thanks to DataStax Professional Services
• Related post: https://academy.datastax.com/top10best practices

3. Why we ❤️ Apache Cassandra
• Distributed, decentralized
• Elastic scalability
  – Add/remove nodes with no downtime
• High performance
• High availability / fault tolerant
  – No single point of failure
• How do we realize these benefits?

4. 1. Know your access patterns

5. Relational vs. Cassandra Data Modeling
• Relational Approach: Data → Models → Application
• Cassandra Approach: Application → Models → Data

6. KillrVideo Reference Application

7. Application Workflow in KillrVideo
• Show latest videos added to the site
• User logs into site
• Search for a video by tag
• Show video and its details
• Show comments for a video
• Show ratings for a video
• Show comments posted by a user
• Show basic information about user
• Show videos added by a user

8. Take the KillrVideo Tour

9. 2. Get your data model right


11. Relational Modeling
• Create entity table
• Add constraints
• Index fields
• Foreign Key relationships
• SQL != CQL

    CREATE TABLE users (
      id number(12) NOT NULL,
      firstname nvarchar2(25) NOT NULL,
      lastname nvarchar2(25) NOT NULL,
      email nvarchar2(50) NOT NULL,
      password nvarchar2(255) NOT NULL,
      created_date timestamp(6),
      PRIMARY KEY (id),
      CONSTRAINT email_uq UNIQUE (email)
    );

    -- Users by email address index
    CREATE INDEX idx_users_email ON users (email);

    CREATE TABLE videos (
      id number(12),
      userid number(12) NOT NULL,
      name nvarchar2(255),
      description nvarchar2(500),
      location nvarchar2(255),
      location_type int,
      added_date timestamp,
      CONSTRAINT users_userid_fk FOREIGN KEY (userid)
        REFERENCES users (Id) ON DELETE CASCADE,
      PRIMARY KEY (id)
    );

12. Queries in KillrVideo to Support Workflows
• Users
  – User logs into site → Find user by email address
  – Show basic information about user → Find user by id
• Comments
  – Show comments for a video → Find comments by video (latest first)
  – Show comments posted by a user → Find comments by user (latest first)
• Ratings
  – Show ratings for a video → Find ratings by video

13. Designing Tables Based on Queries
• Show video and its details → Find video by id

    CREATE TABLE videos (
      videoid uuid,
      userid uuid,
      name text,
      description text,
      location text,
      location_type int,
      preview_image_location text,
      tags set<text>,
      added_date timestamp,
      PRIMARY KEY (videoid)
    );

• Show videos added by a user → Find videos by user (latest first)

    CREATE TABLE user_videos (
      userid uuid,
      added_date timestamp,
      videoid uuid,
      name text,
      preview_image_location text,
      PRIMARY KEY (userid, added_date, videoid)
    ) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);

14. Designing for fast access

    CREATE TABLE user_videos (
      userid uuid,
      added_date timestamp,
      videoid uuid,
      name text,
      preview_image_location text,
      PRIMARY KEY (userid, added_date, videoid)
    ) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);

• Partition key (userid) – which node(s)
• Clustering columns (added_date, videoid) – uniqueness, layout on disk
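
The roles of the partition key and clustering columns can be illustrated with a toy in-memory model. This is only a sketch, not how Cassandra is implemented: it assumes plain string ids instead of uuid values, ignores token hashing and node placement, and the function name layout is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def layout(rows):
    """Toy model: group rows by partition key, then order by clustering columns."""
    partitions = defaultdict(list)
    for row in rows:
        # The partition key (userid) decides which partition a row belongs to.
        partitions[row["userid"]].append(row)
    for rows_in_partition in partitions.values():
        # Clustering order: added_date DESC, videoid ASC.
        # Two stable sorts: secondary key first, then the primary key.
        rows_in_partition.sort(key=lambda r: r["videoid"])
        rows_in_partition.sort(key=lambda r: r["added_date"], reverse=True)
    return dict(partitions)


rows = [
    {"userid": "u1", "added_date": datetime(2017, 1, 1), "videoid": "v2"},
    {"userid": "u2", "added_date": datetime(2017, 2, 1), "videoid": "v9"},
    {"userid": "u1", "added_date": datetime(2017, 3, 1), "videoid": "v3"},
    {"userid": "u1", "added_date": datetime(2017, 1, 1), "videoid": "v1"},
]
by_user = layout(rows)
latest_first = [r["videoid"] for r in by_user["u1"]]
print(latest_first)  # ['v3', 'v1', 'v2']
```

Because the rows of one user sit together and are already sorted latest first, the "find videos by user (latest first)" query reads a single partition in order.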

15. Data Modeling Best Practices
• Table per query
• Use denormalization to minimize number of queries required
• Make sure primary key guarantees uniqueness
• Use bucketing to break up large partitions
• Highly recommended: DS220: Practical Application Data Modeling with Apache Cassandra
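
Bucketing can be sketched as deriving an extra partition key component from a timestamp, so that one user's rows spread over several smaller partitions. The month granularity, the function names, and the string user id below are assumptions for illustration; the right bucket size depends on your write volume.

```python
from datetime import datetime, timezone


def month_bucket(ts):
    """Return a 'YYYY-MM' bucket label for a timezone-aware datetime (in UTC)."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return ts.astimezone(timezone.utc).strftime("%Y-%m")


def bucketed_partition_key(userid, added_date):
    """Hypothetical composite partition key: (userid, month bucket)."""
    return (userid, month_bucket(added_date))


key = bucketed_partition_key(
    "u1", datetime(2017, 5, 17, 8, 30, tzinfo=timezone.utc)
)
print(key)  # ('u1', '2017-05')
```

The trade-off is that a query spanning several months has to read one partition per bucket.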

16. 3. Avoid tombstones

17. Deletion and Tombstones
• Append-only storage model
  – SSTables are immutable
• Tombstones used to explicitly indicate deleted data
  – Prevent accidental restoration of deleted data
• Data actually cleaned up during compaction
• Large numbers of tombstones can affect reads
  – Example log output:
    Read 1 live and 123780 tombstoned cells | 19:48:36,710 | 127.0.0.1 | 128631
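
A log line like the example above can be checked automatically. The following is a minimal sketch that assumes the message format shown on the slide; the function names and the max_tombstones threshold are hypothetical, not Cassandra settings.

```python
import re

# Matches the "Read N live and M tombstoned cells" message from the slide.
TOMBSTONE_RE = re.compile(r"Read (\d+) live and (\d+) tombstoned cells")


def tombstone_stats(line):
    """Return (live, tombstoned) cell counts, or None if the line does not match."""
    match = TOMBSTONE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_tombstone_heavy(line, max_tombstones=1000):
    """Flag reads that scanned more tombstones than a chosen threshold."""
    stats = tombstone_stats(line)
    return stats is not None and stats[1] > max_tombstones


line = "Read 1 live and 123780 tombstoned cells | 19:48:36,710 | 127.0.0.1 | 128631"
print(tombstone_stats(line))      # (1, 123780)
print(is_tombstone_heavy(line))   # True
```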

18. Avoid Deletes and Writing Nulls

    INSERT INTO myTable (primary_key, clustering_key)
    VALUES ('pk1', 'ck1');

VS

    INSERT INTO myTable (primary_key, clustering_key, regular_col)
    VALUES ('pk1', 'ck1', null);

⇒ Second version writes a tombstone
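
One way to avoid writing nulls from application code is to leave unset columns out of the statement entirely. The helper below is a sketch under assumptions: build_insert is a hypothetical name, the "?" placeholders assume the statement will be prepared by a driver, and table and column names are expected to come from trusted code rather than user input.

```python
def build_insert(table, values):
    """Build an INSERT that names only the columns with a non-None value."""
    present = {col: val for col, val in values.items() if val is not None}
    if not present:
        raise ValueError("no non-null columns to insert")
    columns = ", ".join(present)
    placeholders = ", ".join("?" for _ in present)
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return statement, list(present.values())


statement, params = build_insert(
    "myTable",
    {"primary_key": "pk1", "clustering_key": "ck1", "regular_col": None},
)
print(statement)  # INSERT INTO myTable (primary_key, clustering_key) VALUES (?, ?)
print(params)     # ['pk1', 'ck1']
```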

19. Tombstones - Mitigating
• Use a journal-style data model
• Set time to live (TTL)
• Delete the largest possible amount of data at once
  – Range delete > Partition delete > row delete > cell delete > collection item delete
• http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
• https://academy.datastax.com/support-blog/cleaning-tombstones-datastax-dse-and-apache-cassandra
• https://academy.datastax.com/units/compaction-compaction-and-tombstones

20. 4. Know your drivers

21. DataStax Drivers
• OSS Cassandra Drivers
  – CQL Support
  – Sync / Async API
  – Load Balancing
  – Auto Node Discovery
  – Object Mapper
• DataStax Enterprise Drivers
  – OSS Driver features plus...
  – Unified Authentication
  – Graph Fluent API
  – Geometric Types
• ODBC
• JDBC
© DataStax, All Rights Reserved. @DataStaxAcademy #DataStaxDeveloperDay

22. Driver Documentation
Apache Cassandra Drivers (Open Source) | DataStax Drivers (DataStax Enterprise)
DataStax Java Driver                   | DataStax Enterprise Java Driver
DataStax Python Driver                 | DataStax Enterprise Python Driver
DataStax Node.js Driver                | DataStax Enterprise Node.js Driver
DataStax Ruby Driver                   | DataStax Enterprise Ruby Driver
DataStax C# Driver                     | DataStax Enterprise C# Driver
DataStax C/C++ Driver                  | DataStax Enterprise C/C++ Driver
DataStax PHP Driver                    | DataStax Enterprise PHP Driver
© DataStax, All Rights Reserved. Confidential

23. DataStax Driver Tips and Tricks
• Common features:
  – Connection management
  – Creating and executing statements, and accessing the results
  – Synchronous and asynchronous execution
  – Object mapping
  – Policy management
  – Threading, networking and resource management
  – Schema access / management
• Tips
  – Load balancing policy
  – Retry policy
  – Connection pool settings
  – Asynchronous operations
  – Logging and metrics
• Coming soon: Getting Started with Drivers quick courses on DataStax Academy

24. 5. Plan for and practice operations

25. Do you have a Run Book?
• Installation
• Upgrade
• Scaling up
• Scaling down
• Node replacement
• Repairs
• Backup
• Restore
• Monitoring
• Tuning

26. OpsCenter
• Browser-based DSE cluster tool for:
  – Configuring
  – Monitoring
  – Managing
• Two major components tied together:
  – OpsCenter Monitoring - monitoring and management
  – Life Cycle Manager (LCM) - mostly configuration and deployment

27. Start with Security
Bad security results in data breaches. DSE has a number of features that can be used to secure your data at all stages.
• End to end encryption
• Data auditing
• LDAP integration
• Kerberos integration
• Role based access control
• Row-level access control

28. 6. Do performance testing

29. Performance Testing Tips
• Use your actual data model and realistic test data
• Measure against SLAs
  – 99th percentile
• Happy path and error / load conditions
• Automate using Cassandra Stress, Gatling
• Incorporate performance monitoring into operations
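
Measuring against a 99th percentile SLA can be sketched as follows. This assumes latency samples have already been collected in milliseconds by your load test; it uses the nearest-rank percentile definition, which is one of several in common use, and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_sla(latencies_ms, sla_ms, pct=99):
    """True when the chosen percentile latency is within the SLA."""
    return percentile(latencies_ms, pct) <= sla_ms


latencies_ms = list(range(1, 101))  # synthetic samples: 1 ms .. 100 ms
print(percentile(latencies_ms, 99))      # 99
print(meets_sla(latencies_ms, sla_ms=99))  # True
```

Checking a high percentile rather than the mean keeps a small number of very slow requests from being averaged away.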