1.Apache C* Sidecar: let’s make C* attractive and easy to operate. Vinay Chella, Dinesh Joshi
2.Agenda ● Operating C* ● Operating C* with a sidecar ● State of community sidecars ● Lessons learned from operating with sidecars ● Goals of C* management sidecar
3.Operating C* ● Bootstrap and data movement ● Configuration (files, jmx) ● Maintenance ● Monitoring/Metrics ● Backup/Restore ● Repair
4.Operating C*: Bootstrapping Create a New Cluster Add/Remove/Replace ● Seeds ● Serial or parallel? ● Token assignment ● Streaming?
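The token assignment question above can be made concrete with a small sketch. It assumes one token per node (no vnodes) and the Murmur3Partitioner token range of -2^63 to 2^63 - 1; the function name `even_tokens` is hypothetical and not part of any Cassandra tool.

```python
def even_tokens(node_count):
    """Return evenly spaced tokens over the Murmur3 range, one per node.

    Assumption: single-token nodes and a ring of size 2**64 starting at -2**63.
    """
    if node_count < 1:
        raise ValueError("node_count must be positive")
    ring_size = 2 ** 64
    start = -(2 ** 63)
    # Integer arithmetic keeps the tokens exact for any node count.
    return [start + (i * ring_size) // node_count for i in range(node_count)]


if __name__ == "__main__":
    for i, token in enumerate(even_tokens(4)):
        # Each value could be used as initial_token in that node's cassandra.yaml.
        print(f"node{i}: initial_token={token}")
```

Adding or removing a node changes the even spacing, which is one reason data movement (streaming) follows token changes.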
5.Operating C*: Configuration ● Probably Have to Tune a. cassandra.yaml b. topology props c. JVM options ● May Have to Tune a. Logging b. Incremental Backup c. More JVM options
6. Operating C*: Lifecycle Rolling Restarts (Upgrades) ● Semi-complex single node procedure ● One at a time is too slow ● Token range aware restarts? What happens when Cassandra dies? Ring source @ https://v2.overleaf.com/read/zchtrzskkyjb
7.Operating C*: Maintenance ● All the Power of JMX ● … So many possibilities a. Many work with jmxterm/jmxsh b. Many only work with Java code ● What if you want to do it on all nodes?
8.Operating C*: Monitoring ● Many Metrics (good!) ● How to Collect Them? ○ JMX … no ○ Agent! ● Which agent ...
9.Operating C*: Ring Health ● Cassandra ring health depends on replication ● Strategies ○ Monitor replication of keyspaces ○ Topology Aware ○ Maintenance Aware
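A topology-aware health check can be sketched as follows. This is a simplified model, not how any existing sidecar does it: it assumes a single datacenter, one token per node, SimpleStrategy-style placement (each range is replicated to its owner and the next RF - 1 nodes clockwise), and quorum as the availability target. The names `replicas_for` and `quorum_ok` are hypothetical.

```python
def replicas_for(index, node_count, rf):
    """Indices of the nodes holding the range owned by node `index`."""
    return [(index + k) % node_count for k in range(min(rf, node_count))]


def quorum_ok(nodes, rf, down):
    """True if every token range keeps a quorum of replicas with `down` nodes offline."""
    quorum = rf // 2 + 1
    for i in range(len(nodes)):
        up = sum(1 for r in replicas_for(i, len(nodes), rf) if nodes[r] not in down)
        if up < quorum:
            return False
    return True


if __name__ == "__main__":
    ring = ["n1", "n2", "n3", "n4", "n5", "n6"]
    # Neighbours share ranges, so taking both down breaks quorum.
    print(quorum_ok(ring, rf=3, down={"n1", "n2"}))
    # Nodes far apart on the ring share no range at RF=3.
    print(quorum_ok(ring, rf=3, down={"n1", "n4"}))
```

The same check answers the "token range aware restarts" question: nodes that share no range can be restarted in parallel without dropping below quorum.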
10. Operating C*: Backup/Restore The Cloud ● What do I even need to back up!? ● Restore is legitimately tricky, do you practice?
11. Operating C*: Repair “Eventually” Consistent (nodes N1 to N6 span Datacenter 1 and Datacenter 2)
Step                 N1 N2 N3 N4 N5 N6
1. Partial Write      0  1  0  0  0  0
2. Read Repair        0  1  1  0  0  0
3. Hints play         0  1  1  0  1  0
… Nope not enough     0  1  1  0  1  0
4. Repair             1  1  1  1  1  1
12. Sidecar: Bootstrapping Automatic Seed Management using ASGs/db Automatic Instance Replacement Equation+Graph from “Cassandra Availability with Virtual Nodes” by Joey Lynch and Josh Snyder
13.Operating C* In General
14.Operating C* In General
15.What is needed to Operate C*? Separate solutions for ... ● Bootstrap and data movement ● Maintenance ● Configuration (files, jmx) ● Monitoring/Metrics ● Backup/Restore ● Repair
16.We need better tools!
18.Current state of the art?
21.Operating C* with Sidecar(s)
22.What’s a Sidecar? Sidecars Live Outside Main Daemon Scope ● Often built for a specific purpose ● Typically a different OS process (Diagram: Cassandra alongside sidecar, metrics-agent, ...)
23.Sidecar: Configuration ● Hierarchy: Environment -> Cluster -> Node ● Flat namespace that is merged to provide Priam config
24.Sidecar: Configuration ● Hierarchy: prod -> cass_nflx -> i-08da5d... ● Flat namespace that is merged to provide Priam config ● Functions for defaults (e.g. based on cpu)
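The Environment -> Cluster -> Node merge can be sketched as below. This is an illustration of the idea only, not Priam's implementation: the function name `merge_config` and every property key are hypothetical, and defaults are modelled as zero-argument functions so they can depend on the host (for example its CPU count).

```python
import os


def merge_config(environment, cluster, node, defaults=None):
    """Merge flat key/value layers; more specific layers override broader ones."""
    merged = {}
    for layer in (environment, cluster, node):
        merged.update(layer)
    # Computed defaults only fill keys that no layer set explicitly.
    for key, compute in (defaults or {}).items():
        if key not in merged:
            merged[key] = compute()
    return merged


if __name__ == "__main__":
    config = merge_config(
        environment={"backup.enabled": "true", "compaction.throughput_mb": "16"},
        cluster={"compaction.throughput_mb": "64"},
        node={"heap.size": "8G"},
        # Hypothetical default derived from the number of CPUs on this host.
        defaults={"concurrent.compactors": lambda: str(max(1, (os.cpu_count() or 2) // 2))},
    )
    for key in sorted(config):
        print(f"{key}={config[key]}")
```

Keeping the namespace flat means each layer is a plain dictionary and precedence is just the order of the updates.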
25.Sidecar: Lifecycle Stop flow (diagram): Execute Stop Script (systemd), Fail Healthcheck, Drain with timeout
26. Sidecar: Lifecycle Start flow (diagram): Execute Start Script (systemd), Ensure Health, Pass Healthcheck Rolling Restarts (Upgrades) ● Cluster automation is now much easier What happens when Cassandra dies? ● Continuous health monitoring and supervision (OOM) ● Priam + systemd + jvmkill [1] == pretty good [1] https://github.com/airlift/jvmkill
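Continuous health monitoring and supervision can be sketched as a loop that restarts the process after several consecutive failed health checks. In the deck this role is played by Priam plus systemd; the code below is only a model of the idea, and `supervise`, its parameters, and the injected callables are all hypothetical.

```python
import time


def supervise(check_health, restart, max_failures=3, rounds=10,
              interval=5.0, sleep=time.sleep):
    """Run `rounds` health checks; call `restart` after `max_failures` failures in a row.

    Returns the number of restarts triggered.
    """
    failures = 0
    restarts = 0
    for _ in range(rounds):
        if check_health():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0  # Give the restarted process a fresh budget.
        sleep(interval)
    return restarts


if __name__ == "__main__":
    results = iter([True, False, False, False, True, True])
    count = supervise(
        check_health=lambda: next(results),
        restart=lambda: print("restarting cassandra (placeholder)"),
        rounds=6,
        sleep=lambda seconds: None,  # Skip real waiting in this demo.
    )
    print(f"restarts: {count}")
```

Requiring consecutive failures avoids restarting on a single slow health check, for example during a long GC pause.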
27.Sidecar: Maintenance ● JMX methods on cron ● Can add arbitrary tasks like compactions, flushes, etc
28.Sidecar: Maintenance ● Sidecar provides JMX over HTTP ○ Cleanup ○ Invoke complex JMX methods using curl ○ Many of these are better done scheduled (e.g. repair, compaction, flushes, etc)
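One way to picture maintenance operations exposed over HTTP is a small router that turns a request path into a `nodetool` command line (nodetool itself talks to Cassandra over JMX). This is a sketch under stated assumptions, not the API of Priam or of any existing sidecar: the `/ops/<operation>[/<keyspace>]` path layout, the allow-list, and the function name `nodetool_argv` are all hypothetical.

```python
import re

# Hypothetical allow-list; these are existing nodetool subcommands.
ALLOWED_OPERATIONS = {"cleanup", "flush", "compact"}
KEYSPACE_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def nodetool_argv(path):
    """Map a path like /ops/cleanup/my_keyspace to a nodetool argument list."""
    parts = [p for p in path.split("/") if p]
    if len(parts) not in (2, 3) or parts[0] != "ops":
        raise ValueError(f"unsupported path: {path}")
    operation = parts[1]
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"operation not allowed: {operation}")
    argv = ["nodetool", operation]
    if len(parts) == 3:
        keyspace = parts[2]
        # Reject anything that does not look like a plain keyspace name.
        if not KEYSPACE_PATTERN.match(keyspace):
            raise ValueError(f"invalid keyspace: {keyspace}")
        argv.append(keyspace)
    return argv


if __name__ == "__main__":
    # An HTTP handler would pass the result to subprocess.run(argv, check=True).
    print(nodetool_argv("/ops/cleanup"))
    print(nodetool_argv("/ops/flush/my_keyspace"))
```

An allow-list keeps the HTTP surface small, and the same mapping can be driven by a scheduler for operations that are better run on a timetable.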