20_01 Apache Cassandra Sidecar



1. Apache C* Sidecar: let’s make C* attractive and easy to operate (Vinay Chella, Dinesh Joshi)

2.Agenda ● Operating C* ● Operating C* with a sidecar ● State of community sidecars ● Lessons learned from operating with sidecars ● Goals of C* management sidecar

3.Operating C* ● Bootstrap and data movement ● Configuration (files, jmx) ● Maintenance ● Monitoring/Metrics ● Backup/Restore ● Repair

4.Operating C*: Bootstrapping Create a New Cluster Add/Remove/Replace ● Seeds ● Serial or parallel? ● Token assignment ● Streaming?
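The token assignment question can be made concrete with a minimal sketch. It assumes the Murmur3Partitioner (token range -2^63 to 2^63 - 1) and one token per node, that is, no vnodes; the function name `initial_tokens` is made up for illustration and is not part of any Cassandra tool.

```python
def initial_tokens(node_count: int) -> list[int]:
    """Return evenly spaced Murmur3 tokens, one per node (assumes no vnodes)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2**64      # Murmur3 tokens span [-2**63, 2**63 - 1]
    lowest = -(2**63)
    # Spread the nodes evenly around the ring, starting at the lowest token.
    return [lowest + (i * ring_size) // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(initial_tokens(4), start=1):
        print(f"node {node}: initial_token: {token}")
```

Each value would go into that node's `initial_token` setting in cassandra.yaml. Adding or removing a node changes the even spacing, which is one reason data movement (streaming) shows up on the same slide.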

5.Operating C*: Configuration ● Probably Have to Tune a. cassandra.yaml b. topology props c. JVM options ● May Have to Tune a. Logging b. Incremental Backup c. More JVM options

6. Operating C*: Lifecycle Rolling Restarts (Upgrades) ● Semi-complex single node procedure ● One at a time is too slow ● Token range aware restarts? What happens when Cassandra dies? Ring source @ https://v2.overleaf.com/read/zchtrzskkyjb
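One way to read "token range aware restarts" is to restart several nodes at once as long as no two of them hold replicas of the same range. The sketch below is a simplified model, not how any particular tool does it: it assumes one token per node, a single datacenter, and SimpleStrategy-style placement where a range lives on its primary node and the next `rf - 1` nodes on the ring. The function name `restart_batches` is hypothetical.

```python
def restart_batches(ring: list[str], rf: int) -> list[list[str]]:
    """Group nodes into batches whose members share no replicated range.

    `ring` lists node names in token order. Under the assumed placement,
    two nodes share a range exactly when their ring distance is below `rf`.
    """
    n = len(ring)

    def conflicts(i: int, j: int) -> bool:
        distance = abs(i - j)
        return min(distance, n - distance) < rf

    batches: list[list[int]] = []
    for i in range(n):
        # Greedy: put the node in the first batch where it conflicts with nobody.
        for batch in batches:
            if not any(conflicts(i, j) for j in batch):
                batch.append(i)
                break
        else:
            batches.append([i])
    return [[ring[i] for i in batch] for batch in batches]


if __name__ == "__main__":
    print(restart_batches(["N1", "N2", "N3", "N4", "N5", "N6"], rf=3))
```

With six nodes and a replication factor of 3, this gives three batches of two nodes instead of six one-at-a-time restarts. With vnodes, most node pairs share some range, so this kind of batching becomes much less effective.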

7.Operating C*: Maintenance ● All the Power of JMX ● … So many possibilities a. Many work with jmxterm/jmxsh b. Many only work with Java code ● What if you want to do it on all nodes?

8.Operating C*: Monitoring ● Many Metrics (good!) ● How to Collect Them? ○ JMX … no ○ Agent! ● Which agent ...

9.Operating C*: Ring Health ● Cassandra ring health depends on replication ● Strategies ○ Monitor replication of keyspaces ○ Topology Aware ○ Maintenance Aware
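A topology-aware health check might look like the following sketch. It uses the same simplified assumptions as a single-token, single-datacenter ring (each range is stored on its primary node and the next `rf - 1` nodes), and it reports the ranges that can no longer reach quorum given a set of down nodes. The function name and the node names are illustrative only.

```python
def ranges_below_quorum(ring: list[str], down: set[str], rf: int) -> list[str]:
    """Return the primary node of every range that has lost quorum.

    `ring` lists node names in token order; `down` holds nodes that are
    unavailable (failed or under maintenance). Assumes rf <= len(ring).
    """
    quorum = rf // 2 + 1
    n = len(ring)
    unhealthy = []
    for primary in range(n):
        replicas = [ring[(primary + k) % n] for k in range(rf)]
        live = [node for node in replicas if node not in down]
        if len(live) < quorum:
            unhealthy.append(ring[primary])
    return unhealthy


if __name__ == "__main__":
    ring = ["N1", "N2", "N3", "N4", "N5", "N6"]
    print(ranges_below_quorum(ring, down={"N2"}, rf=3))
    print(ranges_below_quorum(ring, down={"N2", "N3"}, rf=3))
```

A maintenance-aware variant would add nodes that are about to be restarted to `down` before acting, and refuse to proceed if the result is non-empty.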

10. Operating C*: Backup/Restore (diagram label: The Cloud) ● What do I even need to back up!? ● Restore is legitimately tricky; do you practice it?

11. Operating C*: Repair ● “Eventually” Consistent ● Replica state after each step (nodes N1-N6 span Datacenter 1 and Datacenter 2; 1 = replica has the write, 0 = it does not):

Step                  N1  N2  N3  N4  N5  N6
1. Partial Write       0   1   0   0   0   0
2. Read Repair         0   1   1   0   0   0
3. Hints play          0   1   1   0   1   0
… Nope not enough      0   1   1   0   1   0
4. Repair              1   1   1   1   1   1
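The step table on the repair slide can be replayed as a toy model. This is only an illustration of the slide's numbers, not of Cassandra's actual repair implementation: each replica is a 0 or 1, the nodes touched by each step are taken from the table, and `repair` simply copies the newest value everywhere. All function names are made up.

```python
NODES = ["N1", "N2", "N3", "N4", "N5", "N6"]


def deliver(state: dict[str, int], reached: set[str]) -> dict[str, int]:
    """Return a new state where the write has also landed on `reached`."""
    return {node: 1 if node in reached else value for node, value in state.items()}


def repair(state: dict[str, int]) -> dict[str, int]:
    """Toy full repair: every replica ends up with the newest value."""
    newest = max(state.values())
    return {node: newest for node in state}


def row(state: dict[str, int]) -> list[int]:
    return [state[node] for node in NODES]


if __name__ == "__main__":
    state = {node: 0 for node in NODES}
    state = deliver(state, {"N2"})   # 1. partial write
    state = deliver(state, {"N3"})   # 2. read repair
    state = deliver(state, {"N5"})   # 3. hints play
    print("before repair:", row(state))
    print("after repair: ", row(repair(state)))
```

The point of the slide survives in the model: read repair and hints each fix only the replicas they happen to touch, so only a full repair guarantees that every replica converges.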

12. Sidecar: Bootstrapping ● Automatic Seed Management using ASGs/db ● Automatic Instance Replacement ● Equation + graph from “Cassandra Availability with Virtual Nodes” by Joey Lynch and Josh Snyder

13.Operating C* In General

14.Operating C* In General

15.What is needed to Operate C*? Separate solutions for ... ● Bootstrap and data movement ● Maintenance ● Configuration (files, jmx) ● Monitoring/Metrics ● Backup/Restore ● Repair

16.We need better tools!

17.Community needs

18.Current state of the art?



21. Operating C* with Sidecar(s) (diagram label: Sidecar)

22. What’s a Sidecar? Sidecars Live Outside Main Daemon Scope ● Often built for a specific purpose ● Typically a different OS process (diagram labels: sidecar, Cassandra metrics-agent, ...)

23.Sidecar: Configuration ● Hierarchy: Environment -> Cluster -> Node ● Flat namespace that is merged to provide Priam config

24. Sidecar: Configuration ● Hierarchy ● Flat namespace that is merged to provide Priam config ● Functions for defaults (e.g. based on cpu) (diagram labels: prod, cass_nflx, i-08da5d...)
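The Environment -> Cluster -> Node hierarchy with a flat namespace can be sketched as a layered dictionary merge. This is an assumption-laden illustration, not Priam's actual code: the property names, the layer contents, and the `8 * cpus` default are all hypothetical. Defaults may be functions of the CPU count, and more specific layers override less specific ones.

```python
from typing import Any, Callable, Union

Value = Union[Any, Callable[[int], Any]]

# Hypothetical defaults; callables are evaluated against the CPU count.
DEFAULTS: dict[str, Value] = {
    "concurrent_writes": lambda cpus: 8 * cpus,
    "incremental_backups": False,
}


def merge_config(environment: dict[str, Value],
                 cluster: dict[str, Value],
                 node: dict[str, Value],
                 cpu_count: int) -> dict[str, Any]:
    """Merge flat key/value layers; later (more specific) layers win."""
    merged: dict[str, Value] = dict(DEFAULTS)
    for layer in (environment, cluster, node):
        merged.update(layer)
    # Resolve function-valued entries last, so overrides can replace them.
    return {key: value(cpu_count) if callable(value) else value
            for key, value in merged.items()}


if __name__ == "__main__":
    config = merge_config(
        environment={"incremental_backups": True},   # e.g. the "prod" level
        cluster={"compaction_throughput_mb": 64},    # e.g. one cluster
        node={"compaction_throughput_mb": 16},       # e.g. one instance
        cpu_count=4,
    )
    print(config)
```

Keeping the namespace flat means an override at any level is just the same key set again, so there is no nested structure to reconcile during the merge.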

25. Sidecar: Lifecycle (diagram boxes: Execute Stop Script, Fail Healthcheck, Drain with timeout (systemd))

26. Sidecar: Lifecycle (diagram boxes: Execute Start Script, Ensure Health, Pass Healthcheck (systemd)) Rolling Restarts (Upgrades) ● Cluster automation is now much easier What happens when Cassandra dies? ● Continuous health monitoring and supervision (OOM) ● Priam + systemd + jvmkill [1] == pretty good [1] https://github.com/airlift/jvmkill
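Once every node exposes stop, start, and health-check operations through its sidecar, a rolling restart reduces to a small loop. The sketch below assumes those three operations are supplied as plain callables (for example, thin wrappers around the sidecar's scripts or HTTP calls); the function name, parameters, and retry policy are hypothetical.

```python
import time
from typing import Callable, Iterable


def rolling_restart(nodes: Iterable[str],
                    stop: Callable[[str], None],
                    start: Callable[[str], None],
                    is_healthy: Callable[[str], bool],
                    retries: int = 30,
                    delay_seconds: float = 10.0,
                    wait: Callable[[float], None] = time.sleep) -> list[str]:
    """Restart nodes one at a time, moving on only after a health check passes."""
    restarted = []
    for node in nodes:
        stop(node)     # e.g. drain with a timeout, then stop the daemon
        start(node)
        for _ in range(retries):
            if is_healthy(node):
                break
            wait(delay_seconds)
        else:
            # Halt the rollout rather than take down a second replica.
            raise RuntimeError(f"{node} did not become healthy; aborting")
        restarted.append(node)
    return restarted
```

Stopping at the first unhealthy node is a deliberate choice in this sketch: continuing would risk having two replicas of the same range down at once.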

27.Sidecar: Maintenance ● JMX methods on cron ● Can add arbitrary tasks like compactions, flushes, etc

28.Sidecar: Maintenance ● Sidecar provides JMX over HTTP ○ Cleanup ○ Invoke complex JMX methods using curl ○ Many of these are better done scheduled (e.g. repair, compaction, flushes, etc)
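To make "JMX over HTTP" concrete, here is a minimal sketch of an HTTP endpoint that exposes a fixed allowlist of maintenance operations. Two caveats: the URL layout (`/ops/<name>`) and port are invented for illustration and are not Priam's API, and the sketch shells out to `nodetool` (which itself talks to Cassandra over JMX) instead of invoking MBeans directly.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Allowlist: only these operations can be triggered over HTTP.
ALLOWED_OPS = {
    "cleanup": ["nodetool", "cleanup"],
    "flush": ["nodetool", "flush"],
    "compact": ["nodetool", "compact"],
}


def command_for(path: str) -> Optional[list[str]]:
    """Map a request path such as /ops/flush to a nodetool command line."""
    prefix = "/ops/"
    if not path.startswith(prefix):
        return None
    argv = ALLOWED_OPS.get(path[len(prefix):])
    return list(argv) if argv is not None else None


class OpsHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        argv = command_for(self.path)
        if argv is None:
            self.send_error(404, "unknown operation")
            return
        result = subprocess.run(argv, capture_output=True, text=True)
        body = (result.stdout + result.stderr).encode("utf-8")
        self.send_response(200 if result.returncode == 0 else 500)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the sketch on localhost only; call this explicitly to start it."""
    HTTPServer(("127.0.0.1", port), OpsHandler).serve_forever()
```

With `serve()` running on a node, `curl -X POST http://127.0.0.1:8080/ops/flush` would trigger a flush. Running the same request against every node's sidecar answers the earlier "what if you want to do it on all nodes?" question, and a scheduler can call the same endpoints for the tasks that are better done on a timer.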

29.Sidecar: Monitoring

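Initiated by Apache Cassandra PMC members and committers. Dedicated to publishing and sharing Apache Cassandra technology, ecosystem news, best practices, and cutting-edge information.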