15/10 - Cisco - Cassandra adoption on Cisco UCS & OpenStack


1. TOMORROW starts here

2. Cassandra Adoption on Cisco UCS & OpenStack
Nayeem Khaja, Program Manager, Cisco IT
Bidhu Das, Database Domain Architect, Cisco IT

3. Agenda
• About Cisco
• Why Cassandra at Cisco
• Use Cases
• Architecture & Implementation
• Key Callouts
• Q&A

4. About Cisco
• Locations in 32 data centers and server rooms
• Countries, offices, employees, data center space, UPS power to raised floors
• Servers virtualized in new DCs
• 2000+ applications overall
• 1500+ databases (Prod & Non-Prod)
• Virtualization goal: HANA, legacy EDW, Hadoop
• Supporting mission-critical environments

5. Cisco Products
http://www.cisco.com/c/en/us/products/index.html

6. Cisco UCS Models

7. Why Cassandra at Cisco
• Active-active, i.e. fully distributed, architecture
• Linear scalability
• High availability with zero downtime
• Better throughput with a multi-DC architecture
• Alignment with the industry's cloud-native application strategy

8. Use Cases
Use-case areas, with Cassandra at the center: Mobility; Personalization and Recommendation; IoT; Security & Fraud Detection; Cloud and Operations.
Applications:
• Cisco Network Management
• Cisco Video
• Cisco Supply Chain
• Cisco Project Polaris
• Cisco Webex
• Cisco Voicemail Player
• Cisco Security
• Cisco Finance Fraud Analytics
• Cisco Commerce
• Cisco Collaboration
• Quote & Validate system
• AutoTest scanning for Cisco Services Group

9. Cisco Commerce Renewals Cloud
• Clients: Cisco UCE, browser, app, partner app, iOS, Android
• Applications: Notifications, Pricing, Search, Upload, Quoting, Validation (Drools), Data Loader, Conversion, Ordering, Web
• Platform, build & test automation: Puppet, Nagios, Jenkins
• Platform services: Nginx (web server), Tomcat (Java app server), Elasticsearch (search engine, log mining), RabbitMQ (messaging), Memcached (in-memory cache), HAProxy (load balancer), Logstash (log forwarder), Kibana (log visualizer), Quartz (scheduler)
• Database: Cassandra
• IaaS: Compute, Networking, Storage

10. Commerce Analytics & Reporting
Transactional (Oracle)

11. eStore: Database Provisioning Tool
DB categories: RDBMS (MariaDB), NoSQL, BigData (Hadoop)
Capabilities compared across the categories:
• Open source
• Huge datasets, structured as well as unstructured
• Very high data volume
• High transactional
• Batch oriented
• Distributed across geo-locations
• Immediate consistency
• Multi-master replication
• Active-active multi-master replication
• Large database support
• Master-slave architecture
• In-memory capability
• Highly scalable
• Better security
• Schema-less architecture
• Columnar
• Search
• Key-value pair: in-memory, small reads/small writes, large objects
• Document oriented
• Graph database: for complex, highly connected hierarchical data
• Cost & support: vendor supported for P1 apps; community support for apps below P1
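The slide does not show how eStore chooses a category. As a hypothetical illustration only, a provisioning request could be routed from a few of the capability flags listed above, with the support model following the P1 rule. The `Requirements` fields, their precedence order, and `suggest_category` are assumptions, not eStore's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical request flags loosely mirroring the capability list.
    batch_oriented: bool = False                 # large batch workloads
    geo_distributed_active_active: bool = False  # multi-DC, multi-master writes
    priority: int = 3                            # 1 = P1 (most critical)


def suggest_category(req: Requirements) -> tuple[str, str]:
    """Return (DB category, support model); the precedence order is an assumption."""
    if req.batch_oriented:
        category = "BigData - Hadoop"
    elif req.geo_distributed_active_active:
        category = "NoSQL"
    else:
        # Default: transactional workloads needing immediate consistency.
        category = "RDBMS"
    support = "Vendor supported" if req.priority == 1 else "Community support"
    return category, support
```

For example, a P1 request that needs active-active writes across data centers would map to `("NoSQL", "Vendor supported")`.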

12. Architecture & Implementation

13. What is OpenStack?
OpenStack is an open source platform, consisting of a set of software tools, for building and managing cloud computing platforms for both public and private clouds.

14. OpenStack Components
• Swift (object storage)
• Glance (image service)
• Cinder (block storage)
• Nova (compute)
• Neutron (networking)
• Heat (orchestration)
• Ceilometer (telemetry)
• Keystone (identity)
• Underlying layer: KVM, libvirt, OVS, HAProxy (load balancer)

15. OpenStack on Cisco UCS
• Compute nodes: RHEL hosts running Red Hat OpenStack infrastructure; each guest VM runs its own OS and application
• Storage nodes: RHEL hosts running Red Hat OpenStack infrastructure and forming a Ceph cluster
• Fabric: FEX-A/FEX-B (Fabric Extenders) and FI-A/FI-B (Fabric Interconnects)

16. Cassandra on OpenStack Architecture
Cassandra cluster config:
• 9 nodes; 8 CPUs & 64 GB memory per node
• Ceph storage
• OS: RHEL 6.4
• Apache Cassandra 2.1
• The client driver connects to a coordinator node
• Replication factor = 3
• Consistency level = LOCAL_QUORUM
• Replication strategy = NetworkTopologyStrategy
OpenStack infrastructure:
• OpenStack management nodes and Nova compute nodes (compute cluster on UCS B-Series)
• Storage cluster: Ceph OSD/MON nodes on UCS C-Series, with RADOS Gateway (Swift)
• Network fabric: Nexus 5000 Series
• Service layer for users: Prime Service Catalog, Process Orchestrator, automation packs
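A minimal sketch of what RF = 3 with LOCAL_QUORUM implies inside one data center, plus the keyspace definition that NetworkTopologyStrategy expects. The keyspace name `commerce` and data-center name `DC1` are hypothetical placeholders; real DC names come from the cluster's snitch configuration.

```python
# Sketch only: keyspace and data-center names are hypothetical placeholders.


def local_quorum(rf: int) -> int:
    """Replicas in the local DC that must respond at LOCAL_QUORUM."""
    return rf // 2 + 1


def tolerable_replica_failures(rf: int) -> int:
    """Replicas per DC that can be down while LOCAL_QUORUM still succeeds."""
    return rf - local_quorum(rf)


def reads_see_latest_write(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when every read replica set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


def create_keyspace_cql(keyspace: str, rf_per_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dcs}}};"
    )


if __name__ == "__main__":
    rf = 3
    q = local_quorum(rf)
    print(q, tolerable_replica_failures(rf), reads_see_latest_write(q, q, rf))
    print(create_keyspace_cql("commerce", {"DC1": rf}))
```

With RF = 3, LOCAL_QUORUM needs 2 replicas, so one replica per DC can be down, and quorum reads overlap quorum writes (2 + 2 > 3). That guarantee holds only within the local data center; writes in another DC reach it asynchronously.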

17. Cassandra & Spark on Physical Hosts & SSD
Two clusters: ETL/Spark (physical/SSD) and Transactional (physical/SSD)
• Cisco UCS C220 M4 servers
• 256 GB memory each
• 8 SSD drives, 960 GB each
• RHEL 6.5, 64-bit
• JBOD configuration
• DataStax Enterprise 4.7

18. Ephemeral storage is storage attached directly to the compute (Nova) layer, i.e. local to the hypervisor. The data is not persistent: it is lost when the VM is terminated, but not when the VM is shut down.
Diagram: hypervisors with local storage (dedicated per hypervisor) vs. hypervisors with shared Ceph storage.

19. Ephemeral Solution for Cassandra
Dedicated cluster at the DB level for each application, but multi-tenant at the OpenStack level, with a common storage pool per hypervisor.
• Each Cassandra VM runs on its own OS under Nova, on a Red Hat OpenStack UCS C-Series host
• Host storage is locally attached or external (connected through FC)
• Example layout: Cluster 1 and Cluster 2, six nodes each, spread across the hosts
This configuration provides more consistent and better I/O throughput than OpenStack with shared Ceph storage.

20. Platform Migration & Upgrade with Zero Downtime
Two production data centers, each with a 9-node ring: DC1-PROD on OpenStack (shared Ceph storage) and DC2-PROD on physical/SSD hosts.
Backup policy:
Backup type | Frequency | Cluster retention | VTL retention | Comments
Metadata | Daily | 2 weeks | 30 days |
Full snapshot | Daily | 2 weeks | 30 days | VTL/Data Domain
Incremental snapshot | 6 hrs | 1 week | 30 days | Only for highly critical applications, as an exception
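The backup policy can be expressed as data plus two checks: is a new backup due, and which backups have aged out of the cluster tier or the VTL tier. The frequencies and retention windows come from the slide's backup table; the `Backup` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Values from the backup table; the data model around them is hypothetical.
FREQUENCY = {
    "metadata": timedelta(days=1),
    "full": timedelta(days=1),
    "incremental": timedelta(hours=6),
}
CLUSTER_RETENTION = {
    "metadata": timedelta(weeks=2),
    "full": timedelta(weeks=2),
    "incremental": timedelta(weeks=1),
}
VTL_RETENTION = timedelta(days=30)  # same for every backup type


@dataclass(frozen=True)
class Backup:
    name: str
    kind: str  # "metadata", "full" or "incremental"
    taken_at: datetime


def is_due(kind: str, last_taken: datetime, now: datetime) -> bool:
    """A new backup of this kind is due once its frequency interval has elapsed."""
    return now - last_taken >= FREQUENCY[kind]


def to_prune(backups: list[Backup], now: datetime) -> tuple[list[str], list[str]]:
    """Names past their cluster retention, and names past the VTL retention."""
    cluster = [b.name for b in backups if now - b.taken_at > CLUSTER_RETENTION[b.kind]]
    vtl = [b.name for b in backups if now - b.taken_at > VTL_RETENTION]
    return cluster, vtl
```

For example, an 8-day-old incremental snapshot is removed from the cluster but kept on VTL, while a 31-day-old full snapshot is removed from both.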

21. Monitoring
Cassandra:
• Compaction status
• nodetool tpstats for pending requests or dropped mutations
• nodetool cfstats / cfhistograms for latency distribution
• Recent restarts, dead nodes
• Node health check with respect to gossip, Thrift and native transport
• Event-driven / metrics-driven alerts
System:
• CPU, memory, I/O status
• Load average
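A metrics-driven alert on `nodetool tpstats` output might look like the sketch below. It assumes the Cassandra 2.x layout (a thread-pool table with Active, Pending, Completed, Blocked and All time blocked columns, followed by a "Message type / Dropped" table); other versions add columns, so the parser would need adjusting. The sample numbers are invented, and the `max_pending` threshold is a hypothetical value.

```python
# Assumes the Cassandra 2.x column layout of `nodetool tpstats`; the sample
# numbers are invented for illustration, and the threshold is hypothetical.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     2       120         904211         0                 0
ReadStage                         1         3         511097         0                 0

Message type           Dropped
MUTATION                    17
READ                         0
"""


def parse_tpstats(text: str) -> tuple[dict[str, dict[str, int]], dict[str, int]]:
    """Split tpstats output into thread-pool counters and dropped-message counts."""
    pools: dict[str, dict[str, int]] = {}
    dropped: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Pool rows: name, Active, Pending, Completed, Blocked, All time blocked.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            pools[fields[0]] = {
                "active": int(fields[1]),
                "pending": int(fields[2]),
                "blocked": int(fields[4]),
            }
        # Dropped-message rows: message type and count.
        elif len(fields) == 2 and fields[1].isdigit():
            dropped[fields[0]] = int(fields[1])
    return pools, dropped


def alerts(pools, dropped, max_pending: int = 50) -> list[str]:
    """Flag pending backlogs, blocked pools and any dropped messages."""
    found = []
    for name, stats in pools.items():
        if stats["pending"] > max_pending:
            found.append(f"{name}: {stats['pending']} pending")
        if stats["blocked"] > 0:
            found.append(f"{name}: {stats['blocked']} blocked")
    for msg_type, count in dropped.items():
        if count > 0:
            found.append(f"{msg_type}: {count} dropped")
    return found
```

On the sample, `alerts(*parse_tpstats(SAMPLE))` flags the MutationStage backlog and the dropped mutations.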

22. Centralized Operations Dashboard
Dashboard screenshot listing the CCW Cassandra nodes 1 to 9 (P3 and P2), each with host name (e.g. cass-prd-08.cisco.com, cass-prd-08:8900) and IP address.

23. Integrated Real-time Database Dashboard
Dashboard screenshot combining database instances and listeners (e.g. XYZPRD.CISCO.COM_XYZPRD1 and LISTENER_XYZPRD1_xyz-prd-03 on xyz-prd-03.cisco.com, ABCPRD on abc-prd-01.cisco.com) with Cassandra nodes (e.g. cass-prd-07.cisco.com).

24. Monitoring Using DataStax OpsCenter

25. Lessons Learned & Key Callouts
• Do not run nodetool commands with high concurrency.
• Enable incremental backups only if required.
• Define a snapshot retention policy.
• Run nodetool repair through OpsCenter.
• Choose the compaction strategy: size-tiered vs. leveled.
• Disable replication at the storage layer, since Cassandra already replicates the data (RF = 3).
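The reason for disabling storage-layer replication is simple arithmetic: each Cassandra replica is stored again by the storage layer, so the copies multiply. The sketch below assumes a replicated Ceph pool size of 3 (a common default, not a figure from the slides) and ignores compaction and snapshot headroom.

```python
# Illustrative arithmetic; a Ceph pool size of 3 is an assumption, not a
# figure from the slides. Compaction and snapshot headroom are ignored.


def stored_copies(cassandra_rf: int, storage_replicas: int) -> int:
    """Each Cassandra replica is itself replicated by the storage layer."""
    return cassandra_rf * storage_replicas


def raw_capacity_gb(logical_gb: float, cassandra_rf: int, storage_replicas: int) -> float:
    """Raw disk consumed by a data set of `logical_gb` unique data."""
    return logical_gb * stored_copies(cassandra_rf, storage_replicas)
```

With RF = 3 on a 3-way replicated pool, 1,000 GB of unique data consumes 9,000 GB of raw disk; with storage replication disabled (a single storage copy) it consumes 3,000 GB.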


27. Thank you