20_07 Using The TLP ToolChain As A Crystal Ball For Your Cluster

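Using the TLP tool chain on a cluster for performance testing and tuning, investigating settings, confirming code behaviour, and comparing or testing C* versions.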

1. USING THE TLP TOOLCHAIN AS A CRYSTAL BALL FOR YOUR CLUSTER
ANTHONY GRASSO
APACHECON 2019

2. ABOUT ME

3. ABOUT THE LAST PICKLE
• Specialise in Apache Cassandra
• Help teams with
  • Delivery
  • Improving
• We want you to be successful

4. TLP TOOL CHAIN
• Performance testing/tuning
• Investigate settings
• Confirm code behaviour
• Compare/test C* versions

5. REAL WORLD™ PROBLEM
• Are we confident that changing a setting will help the cluster?

6. WE NEED A STRESS TOOL
• Apply production workloads to the cluster
• What about cassandra-stress?

7. PROBLEMS WITH CASSANDRA-STRESS
• Infrequent updates
• Simple workloads
• Hard to configure and use

8. NEW STRESS TOOL
• Ships with common workloads
• Workloads configurable
• Easy to add new workloads

9. TLP-STRESS
• Uses the DataStax Java driver
• Metrics use driver instrumentation

10. TLP-STRESS: WORKLOADS
• BasicTimeSeries
• CountersWide
• KeyValue
• LWT
• Locking
• Maps
• MaterializedViews
• RandomPartitionAccess
• UdtTimeSeries

11. TLP-STRESS: COMMANDS
• run - Run a workload
• info - Information about a workload
• list - List all known workloads

12. TLP-STRESS: KEY VALUE WORKLOAD EXAMPLE
• KeyValue workload
• 70% reads
• 100,000 partitions
• 10,000,000 iterations
• 50,000 rows pre-populated

13. TLP-STRESS: KEY VALUE WORKLOAD EXAMPLE
    $ tlp-stress run KeyValue \
        --reads 0.7 \
        --partitions 100k \
        --iterations 10M \
        --populate 50k
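The command uses shorthand counts (100k, 10M, 50k) for the figures listed on the previous slide. As a rough illustration of that mapping, here is a small Python sketch. The `parse_count` helper is hypothetical and is not part of tlp-stress; the multipliers are an assumption inferred from the slide pairing 100k with 100,000 and 10M with 10,000,000.

```python
def parse_count(text: str) -> int:
    """Convert a shorthand count such as '100k' or '10M' to an integer."""
    # Assumed multipliers, inferred from the slides (100k = 100,000; 10M = 10,000,000).
    multipliers = {"k": 1_000, "m": 1_000_000}
    text = text.strip()
    suffix = text[-1].lower()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)


# Values taken from the KeyValue example command.
options = {"partitions": "100k", "iterations": "10M", "populate": "50k"}
expanded = {name: parse_count(value) for name, value in options.items()}
print(expanded)
```

The expanded values line up with the bullet points on the workload slide: 100,000 partitions, 10,000,000 iterations, and 50,000 pre-populated rows.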

14.
    Creating tlp_stress:
    CREATE KEYSPACE
     IF NOT EXISTS tlp_stress
     WITH replication = {'class': 'SimpleStrategy', 'replication_factor':3 }
    Creating schema
    Executing 10000000 operations with consistency level LOCAL_ONE
    Connected
    Creating Tables
    CREATE TABLE IF NOT EXISTS keyvalue (
        key text PRIMARY KEY,
        value text
    ) WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND default_time_to_live = 0

15.
    Preparing queries
    Initializing metrics
    Connecting
    Creating generator random
    Preparing statements.
    1 threads prepared.
    Populate Progress 100% [==============================================] 50000/50000 (0:00:04 / 0:00:00)
    Pre-populate complete.

16.
    Starting main runner
    Running
    [Thread 0]: Running the profile for 10000000 iterations...
                   Writes                |                Reads                 |         Errors
      Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) | Count  1min (errors/s)
      32999           9.39             0 |   76637          12.04             0 |     0                0
      74482          13.95        883.73 |  173507          11.14       2049.26 |     0                0
     129948          15.76        883.73 |  302804           6.88       2049.26 |     0                0
     173410          15.83       2245.13 |  403228           5.23       5227.09 |     0                0
     230556          15.69       3390.66 |  537861           5.69       7909.98 |     0                0
     281688           7.99       3390.66 |  657753           6.29       7909.98 |     0                0
     334738           8.75       4546.51 |  781929           6.82      10622.34 |     0                0
     366948           8.75       4546.51 |  857862           7.44      10622.34 |     0                0
     420986           8.75       5222.95 |  982964           7.11      12205.07 |     0                0
     473449           8.75       6244.76 | 1105734            5.3      14588.35 |     0                0
    ...
    Stress complete, 1.
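One quick sanity check on this output is whether the observed mix of operations matches the requested 70% reads. The Python sketch below computes the read fraction from the write and read counts in the first and last rows shown; `read_fraction` is a helper introduced here purely for illustration.

```python
def read_fraction(writes: int, reads: int) -> float:
    """Share of all completed operations that were reads."""
    return reads / (writes + reads)


# (write count, read count) copied from the first and last rows of the run output.
samples = [(32999, 76637), (473449, 1105734)]

for writes, reads in samples:
    print(f"writes={writes} reads={reads} read fraction={read_fraction(writes, reads):.3f}")
```

Both samples come out within one percentage point of the 0.7 passed to `--reads`.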

17. TLP-STRESS: TIME SERIES WORKLOAD EXAMPLE
• BasicTimeSeries workload
• 50% reads
• 1,000,000 partitions
• Run for 4 days, 13 hours, 21 minutes
• TimeWindowCompactionStrategy
• 6 hour TTL

18. TLP-STRESS: TIME SERIES WORKLOAD EXAMPLE
    $ tlp-stress run BasicTimeSeries \
        --reads 0.5 \
        --partitions 1M \
        --duration 4d 13h 21m \
        --compaction "{ 'class':'TimeWindowCompactionStrategy', 'compaction_window_unit':'HOURS', 'compaction_window_size':6 }" \
        --ttl 21600
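The duration and TTL values in this command are simple unit conversions. The Python sketch below works them through; `duration_to_minutes` is a hypothetical helper written for this illustration and does not claim to reproduce how tlp-stress itself parses `--duration`.

```python
import re

# Minutes per unit for the d/h/m suffixes used in the example command.
UNIT_MINUTES = {"d": 24 * 60, "h": 60, "m": 1}


def duration_to_minutes(text: str) -> int:
    """Sum a duration such as '4d 13h 21m' into whole minutes."""
    total = 0
    for amount, unit in re.findall(r"(\d+)\s*([dhm])", text):
        total += int(amount) * UNIT_MINUTES[unit]
    return total


run_minutes = duration_to_minutes("4d 13h 21m")
ttl_seconds = 6 * 60 * 60  # the 6 hour TTL from the workload slide, in seconds

print(run_minutes)  # 4*1440 + 13*60 + 21 = 6561
print(ttl_seconds)  # 21600
```

The 6561 minutes agree with the "Running the profile for 6561min..." line that the run prints when it starts, and 21600 seconds is the value passed to `--ttl`.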

19.
    Creating tlp_stress:
    CREATE KEYSPACE
     IF NOT EXISTS tlp_stress
     WITH replication = {'class': 'SimpleStrategy', 'replication_factor':3 }
    Creating schema
    Executing 0 operations with consistency level LOCAL_ONE
    Connected
    Creating Tables
    CREATE TABLE IF NOT EXISTS sensor_data (
        sensor_id text,
        timestamp timeuuid,
        data text,
        primary key(sensor_id, timestamp))
    WITH CLUSTERING ORDER BY (timestamp DESC)
    AND compaction = { 'class':'TimeWindowCompactionStrategy', 'compaction_window_unit':'HOURS', 'compaction_window_size':6 }
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND default_time_to_live = 21600

20.
    Preparing queries
    Initializing metrics
    Connecting
    Creating generator random
    Preparing statements.
    1 threads prepared.
    Starting main runner
    Running
    [Thread 0]: Running the profile for 6561min...
                   Writes                |                Reads                 |         Errors
      Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) | Count  1min (errors/s)
      24384          10.25             0 |   24217           7.91             0 |     0                0
      75336          12.89         11407 |   75039          10.43         11323 |     0                0
     138179          13.87         11407 |  137140          16.65         11323 |     0                0
     189391           3.78      12062.73 |  188213           8.74      11977.24 |     0                0
     257294           8.95      12719.65 |  255883           8.37      12633.45 |     0                0
     322926           4.88      12719.65 |  321679            2.5      12633.45 |     0                0
     382906          28.59      13356.85 |  381638           2.66      13280.61 |     0                0
     444251             12      13356.85 |  442828           2.67      13280.61 |     0                0
     511648           4.88      13975.87 |  510521           6.79      13898.84 |     0                0
     569022          14.78      14502.66 |  567793           5.74      14437.87 |     0                0
    ...
    Stress complete, 1.

21. TLP-STRESS: RANDOM PARTITION ACCESS EXAMPLE
• RandomPartitionAccess workload
• 98% reads
• 1,000,000 partitions
• 10,000,000 iterations
• 1,000,000 rows pre-populated
• Lower index interval

22. TLP-STRESS: INSPECT WORKLOAD
    $ tlp-stress info RandomPartitionAccess
    CREATE TABLE IF NOT EXISTS random_access (
        partition_id text,
        row_id int,
        value text,
        primary key (partition_id, row_id)
    )
    Default read rate: 0.01 (override with -r)
    Dynamic workload parameters (override with --workload.name=X)
    Name   | Description                                   | Type
    rows   | Number of rows per partition, defaults to 100 | kotlin.Int
    select | Select random row or the entire partition.    | kotlin.String
           | Acceptable values: row, partition             |

23. TLP-STRESS: RANDOM PARTITION ACCESS EXAMPLE
    $ tlp-stress run RandomPartitionAccess \
        --reads 0.98 \
        --partitions 1M \
        --iterations 10M \
        --populate 1M \
        --cql "ALTER TABLE tlp_stress.random_access WITH max_index_interval = 1024 AND min_index_interval = 64" \
        --workload.rows="50" \
        --workload.select="row"
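When comparing several settings, it can help to generate commands like this one from data instead of editing them by hand. The Python sketch below assembles the same invocation from two dictionaries and quotes it for the shell. `build_run_command` is a hypothetical helper, not part of tlp-stress, and it uses only the flags that appear on this slide; it builds the argument list but does not run anything.

```python
import shlex


def build_run_command(workload, options, workload_params):
    """Build an argument list for `tlp-stress run` from plain dictionaries."""
    args = ["tlp-stress", "run", workload]
    # Regular options are passed as "--flag value" pairs.
    for flag, value in options.items():
        args += [f"--{flag}", str(value)]
    # Dynamic workload parameters use the --workload.name=X form shown by `info`.
    for name, value in workload_params.items():
        args.append(f"--workload.{name}={value}")
    return args


command = build_run_command(
    "RandomPartitionAccess",
    {
        "reads": 0.98,
        "partitions": "1M",
        "iterations": "10M",
        "populate": "1M",
        "cql": "ALTER TABLE tlp_stress.random_access WITH "
               "max_index_interval = 1024 AND min_index_interval = 64",
    },
    {"rows": 50, "select": "row"},
)

# shlex.join quotes the CQL statement so the line can be pasted into a shell.
print(shlex.join(command))
```

Keeping the options in a dictionary makes it straightforward to vary a single setting, such as the index interval, between otherwise identical runs.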

24.
    Creating tlp_stress:
    CREATE KEYSPACE
     IF NOT EXISTS tlp_stress
     WITH replication = {'class': 'SimpleStrategy', 'replication_factor':3 }
    Creating schema
    Executing 10000000 operations with consistency level LOCAL_ONE
    Connected
    Creating Tables
    CREATE TABLE IF NOT EXISTS random_access (
        partition_id text,
        row_id int,
        value text,
        primary key (partition_id, row_id)
    ) WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND default_time_to_live = 0
    ALTER TABLE tlp_stress.random_access WITH max_index_interval = 1024 AND min_index_interval = 64

25.
    Preparing queries
    Preparing single row reads
    Initializing metrics
    Connecting
    Creating generator random
    Preparing statements.
    Preparing single row reads
    1 threads prepared.
    Using 50 rows per partition
    Populate Progress 100% [======================================================>] 1000000/1000000 (0:00:30 / 0:00:00)
    Pre-populate complete.

26. TLP-STRESS: RUN OPTIONS - REPORTING
• Reporting:
  • csv - compressed CSV
  • Prometheus
    • Exposes metrics

27. TLP-STRESS: OTHER RUN OPTIONS - CLIENT
• client:
  • cl - consistency level
  • concurrency - concurrent requests
  • coordinatoronly - connect to coordinator node
  • dc - datacenter

28. TLP-STRESS: OTHER RUN OPTIONS - TABLE
• table:
  • compression - set compression
  • keycache - set keycache
  • rowcache - set rowcache
  • replication - set data replication factor

29. TLP-STRESS: OTHER RUN OPTIONS - WORKLOAD
• workload:
  • partitiongenerator - partition generation method
    • random, sequential, gaussian
  • rate - rate limiter
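To make the three partition generation methods more concrete, here is a conceptual Python sketch of how random, sequential, and gaussian selection over a fixed number of partitions could differ. It is not tlp-stress's implementation: the function names, the wrap-around behaviour of the sequential generator, and the mean and spread of the gaussian generator are all assumptions chosen for illustration.

```python
import random
from itertools import islice


def random_keys(partitions, rng):
    """Uniformly random partition index on every request."""
    while True:
        yield rng.randrange(partitions)


def sequential_keys(partitions):
    """Walk the partitions in order, wrapping around at the end (assumed)."""
    index = 0
    while True:
        yield index
        index = (index + 1) % partitions


def gaussian_keys(partitions, rng):
    """Cluster requests around the middle of the key range (assumed parameters)."""
    mean = partitions / 2
    stddev = partitions / 6  # hypothetical spread, not taken from tlp-stress
    while True:
        value = int(rng.gauss(mean, stddev))
        # Clamp to the valid index range.
        yield min(max(value, 0), partitions - 1)


rng = random.Random(42)  # fixed seed so repeated runs pick the same keys
print(list(islice(sequential_keys(3), 5)))
print(list(islice(random_keys(1_000_000, rng), 3)))
print(list(islice(gaussian_keys(1_000_000, rng), 3)))
```

The practical difference is in which partitions receive traffic: a uniform generator spreads load evenly, a sequential one touches each partition in turn, and a gaussian one concentrates requests on a subset of partitions.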