The Ontario Institute for Cancer Research (OICR), Canada's largest cancer research institute, supports more than 1,700 researchers, clinician scientists, and other staff. Funded by the Government of Canada, a cloud computing service with 3,000 compute cores and 15 PB of storage was built to support OICR's biomedical research. It was built entirely on commodity hardware and 100% open-source software to keep costs down. How do you build such a complex cloud environment with OpenStack and Dockerized services? How do you monitor its performance and capacity? And how do you do all of this with a team of just two cloud infrastructure engineers?

1. In-depth monitoring for OpenStack services
George Mihaiescu, Senior Cloud Architect; Jared Baker, Cloud Specialist

2. The infrastructure team
George Mihaiescu – Cloud architect for the Cancer Genome Collaboratory; 7 years of OpenStack experience; first deployment: Cactus; first conference: Boston 2011; OpenStack speaker at the Barcelona, Boston, and Vancouver conferences.
Jared Baker – Cloud specialist for the Cancer Genome Collaboratory; 2 years of OpenStack experience; 10 years of MSP experience; first deployment: Liberty; first conference (and first talk): Boston 2017.

3. Ontario Institute for Cancer Research (OICR)
The largest cancer research institute in Canada, funded by the government of Ontario. Together with its collaborators and partners, it supports more than 1,700 researchers, clinician scientists, research staff, and trainees.

4. Cancer Genome Collaboratory – project goals and motivation
- A cloud computing environment built for biomedical research by OICR, funded by Government of Canada grants
- Enables large-scale cancer research on the world's largest cancer genome dataset, currently produced by the International Cancer Genome Consortium (ICGC)
- Built entirely with open-source software such as OpenStack and Ceph
- Compute infrastructure goal: 3,000 cores and 15 PB of storage
- A system for cost recovery

5. No-frills design
- Use high-density commodity hardware to reduce the physical footprint and related overhead
- Use open-source software and tools
- Prefer copper over fiber for network connectivity
- Spend 100% of the hardware budget on the infrastructure that supports cancer research, not on licenses or "nice to have" features

6. Hardware architecture – Compute nodes

7. Hardware architecture – Ceph storage nodes

8. OpenStack controllers
- Three controllers in HA configuration (2 x 6-core CPUs, 128 GB RAM, 6 x 200 GB Intel S3700 SSD drives)
- Separate partitions for the OS, Ceph mon, and MySQL
- HAProxy (SSL termination with ECC certificates) and Keepalived
- 4 x 10 GbE bonded interfaces, 802.3ad, layer 3+4 hash
- Neutron + GRE, HA routers, no DVR
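As a sketch of the load-balancing layer described above, here is a minimal HAProxy frontend/backend pair terminating SSL for one OpenStack API. The VIP, certificate path, and backend addresses are hypothetical, not the Collaboratory's actual values; Keepalived would float the VIP between the three controllers.

```
# Hypothetical haproxy.cfg fragment (illustrative addresses and paths)
frontend keystone_public
    bind 10.0.0.10:5000 ssl crt /etc/haproxy/certs/cloud.pem
    default_backend keystone_api

backend keystone_api
    balance source
    option httpchk GET /v3
    server controller1 192.168.10.11:5000 check inter 2000 rise 2 fall 3
    server controller2 192.168.10.12:5000 check inter 2000 rise 2 fall 3
    server controller3 192.168.10.13:5000 check inter 2000 rise 2 fall 3
```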

9. Networking
- Ruckus ICX 7750-48C top-of-rack switches configured in a stacked ring topology
- 6 x 40 Gb Twinax cables between the racks, providing 240 Gbps of non-blocking, redundant connectivity (2:1 oversubscription ratio)
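The bandwidth figures above can be sanity-checked with simple arithmetic: the 48 x 10 GbE edge ports implied by the switch model against the 6 x 40 Gb stacking links from the slide.

```python
# Back-of-the-envelope check of the inter-rack bandwidth figures.
# Port count (48 x 10 GbE) is inferred from the ICX 7750-48C model name;
# the stacking-link count (6 x 40 Gb) is from the slide.

uplink_gbps = 6 * 40      # stacking-ring bandwidth between racks
downlink_gbps = 48 * 10   # edge ports on one top-of-rack switch

print(uplink_gbps)                    # 240 Gbps
print(downlink_gbps / uplink_gbps)    # 2.0, i.e. the 2:1 oversubscription
```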

10. Capacity vs. extreme performance

11. Upgrades

12. Software – entirely open source

13. Rally – end-to-end tests
A Rally test runs every hour and performs an end-to-end check:
- Starts a VM
- Assigns a floating IP
- Connects over SSH
- Pings an external host five times
It alerts if the check fails, takes too long to complete, or packet loss is greater than 40%. It also sends runtime info to Graphite for long-term graphing, and Grafana alerts us if the average runtime is above a threshold.
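The final ping step and its alerting rule can be sketched as follows; this is a minimal illustration, not the actual Rally plugin, and the function names are hypothetical.

```python
import re
import subprocess

PACKET_LOSS_THRESHOLD = 40.0  # percent, per the alerting rule above
PING_COUNT = 5

def parse_packet_loss(ping_output: str) -> float:
    """Extract the packet-loss percentage from ping's summary line
    (e.g. '... 5 received, 0% packet loss ...'); treat missing output
    as total loss."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else 100.0

def ping_check_passes(host: str) -> bool:
    """Ping `host` five times and apply the >40% loss alerting rule."""
    result = subprocess.run(
        ["ping", "-c", str(PING_COUNT), host],
        capture_output=True, text=True, check=False,
    )
    return parse_packet_loss(result.stdout) <= PACKET_LOSS_THRESHOLD
```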

14. Rally – RBD volume performance test
Another Rally check monitors RBD volume (Ceph-based) write performance over time:
- It boots an instance from a volume
- It assigns a floating IP
- It connects over SSH
- It runs a script that writes a 10 GB file three times
- It captures the average I/O throughput at the end
- It sends throughput info to Graphite for long-term graphing
- It alerts if the average runtime is above the threshold

15. Rally – custom checks

16. Rally smoke tests & load tests

17. Zabbix and Grafana

18. Zabbix and Grafana

19. Dockerized monitoring stack
We run a number of tools in containers: sFlowTrend, Prometheus, Graphite, collectd, Grafana, ceph_exporter, Elasticsearch, Logstash, and Kibana.
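A containerized stack like this is commonly wired together with Docker Compose; a fragment for three of the tools might look like the sketch below. The image tags, ports, and volume paths are illustrative assumptions, not the Collaboratory's actual deployment.

```yaml
# Hypothetical docker-compose.yml fragment (illustrative images and ports)
version: "3"
services:
  prometheus:
    image: prom/prometheus
    ports: ["9090:9090"]
    volumes: ["./prometheus.yml:/etc/prometheus/prometheus.yml"]
  grafana:
    image: grafana/grafana
    ports: ["3000:3000"]
  ceph_exporter:
    image: digitalocean/ceph_exporter
    ports: ["9128:9128"]
```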

20. Ceph monitoring – IOPS

21. Ceph monitoring – performance & integrity

22. OpenStack capacity usage

23. sFlowTrend

24. Zabbix
200+ hosts, 38,000+ items, 15,000+ triggers. Performant, reliable, customizable.

25. Zabbix – the Zabbix agent (client)
- CPU
- Disk I/O
- Memory
- Filesystem
- Security
- Running services
- HW RAID card
- Fans, temperature, power-supply status
- PDU power usage
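Hardware items like the RAID card and power supplies are typically exposed to the agent through `UserParameter` entries in `zabbix_agentd.conf`. The entries below are illustrative only: the item keys are made up, and the exact vendor commands depend on the controller hardware.

```
# Hypothetical zabbix_agentd.conf UserParameter entries (illustrative)
UserParameter=hw.raid.degraded,sudo /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL | grep -c Degraded
UserParameter=hw.psu.ok,sudo ipmitool sdr type "Power Supply" | grep -c "| ok"
```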

26. Zabbix – custom checks
- When security updates are available
- When new cloud images are released
- Number of IPs banned by fail2ban
- Whether iptables rules are in sync across all controllers
- Open vSwitch ports tagged with VLAN 4095 (bad)
- Number of Cinder volumes != number of RBD volumes
- Aggregate memory use per process type (e.g. nova-api, radosgw, etc.)
- Compute nodes have the "neutron-openvswi-sg-chain"

The Cinder/RBD volume comparison:
openstack volume list --all -f value -c ID >> /tmp/rbdcindervolcompare
rbd -p volumes ls | sed "s/volume-//" >> /tmp/rbdcindervolcompare
sort /tmp/rbdcindervolcompare | uniq -u
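The `sort | uniq -u` pipeline above prints IDs that appear in only one of the two lists, i.e. the symmetric difference. The same comparison in Python, as a small hypothetical helper that assumes the two ID lists have already been collected:

```python
def unmatched_volumes(cinder_ids, rbd_ids):
    """Return IDs present in exactly one of the two lists -- the same
    output as `sort | uniq -u` on the concatenated lists. An empty
    result means Cinder and RBD agree."""
    return sorted(set(cinder_ids) ^ set(rbd_ids))
```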

27. Zabbix – OpenStack APIs
Multiple checks per API:
- Is the process running?
- Is the port listening?
- Internal checks (from each controller)
- External checks (from the monitoring server)
- Memory usage aggregated per process type
- Response time, and the number and type of API calls
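The "is the port listening?" check, and a crude response-time measurement, can be sketched with plain TCP connects. This is an illustration with hypothetical function names, not the actual Zabbix items.

```python
import socket
import time

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """The 'is the port listening?' half of the per-API checks above."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def connect_time(host: str, port: int) -> float:
    """TCP connect latency in seconds -- a crude stand-in for API
    response time (a real check would time a full HTTP request)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=5.0):
        return time.monotonic() - start
```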

28. Zabbix – OpenStack services memory usage

29. Zabbix – Neutron router traffic