Monitoring Kubernetes with Prometheus

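Prometheus has become the go-to system for monitoring cloud-native infrastructure such as Kubernetes, with many integration points and options.
In this talk, we will look at all the pieces involved in setting up Prometheus to monitor your Kubernetes cluster, primarily from a user's perspective, but also under the hood to see how things work.
On the Prometheus side, we will cover the different ways of using service discovery and relabelling, and how they make Prometheus so flexible.
On the Kubernetes side, we will go through the alphabet soup of available metric exporters (kube-state-metrics, cAdvisor, node exporter and friends).
Finally, we will take a step back and consider how Prometheus's distinctive design and feature set fit into the broader landscape of monitoring systems, and why it is particularly well suited to cloud-native environments like Kubernetes.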


1.Monitoring Kubernetes with Prometheus Henri Dubois-Ferriere @henridf Percona Live, 2018-11-06

2.Hello. Henri Dubois-Ferriere Technical Director, Sysdig Doing “observability” for many many years, from network to web apps via many startups. PhD in CS from EPFL Repatriate from San Francisco to Switzerland

3. Outline ● Kubernetes ● Prometheus ● Kubernetes metrics & sources ● Deployment

4. Monitor why? ● Know about outages before users tell me ● Understand my production environment (or try…) ● Plan/trend/forecast

5.Kubernetes

6.Kubernetes - Container orchestration system - aka “OS for your cluster” - Abstracts away the underlying infra - declarative APIs with control loops

7.https://commons.wikimedia.org/wiki/File:Kubernetes.png

8.Prometheus

9.Prometheus ❏ Started at SoundCloud in 2012 ❏ Motivated by challenges with monitoring dynamic environments ❏ Made public 2015, now second CNCF “graduate”

10. More than a TSDB https://prometheus.io/assets/architecture.png

11.It’s all about the pull - Prom scrapes targets to get metrics - Nice side effect: know when target down - Needs to know what to scrape
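
The pull model on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Prometheus's actual scrape loop: `scrape`, `parse_samples` and the `fetch` callable are hypothetical names, and the parser handles only simple `name value` lines. The one detail taken from real Prometheus behaviour is the synthetic `up` series, which is 1 after a successful scrape and 0 after a failed one.

```python
from typing import Callable, Dict


def parse_samples(body: str) -> Dict[str, float]:
    """Parse a tiny subset of the text exposition format: 'name value' lines."""
    samples = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def scrape(target: str, fetch: Callable[[str], str]) -> Dict[str, float]:
    """Pull metrics from one target; record up=0 if the pull fails."""
    try:
        samples = parse_samples(fetch(target))
        samples["up"] = 1.0
    except Exception:
        # A failed pull is itself a signal: the target is down or unreachable.
        samples = {"up": 0.0}
    return samples
```

Passing `fetch` in as a parameter keeps the sketch free of network code; in practice it would be an HTTP GET against the target's metrics endpoint.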

12.What should Prometheus scrape? - Service discovery provides answer - Azure, Consul, GCE, K8S, EC2, ... - Can also watch a file containing target list
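
For the file-based option, a target list is a JSON (or YAML) file holding a list of groups, each with `targets` and optional `labels`. The sketch below writes such a file from Python; the function name, the address and the label values are hypothetical, and it assumes Prometheus is already configured to watch the output path.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_target_file(path: str, targets: List[str], labels: Dict[str, str]) -> None:
    """Write a one-group target list in the file-based service discovery format."""
    groups = [{"targets": targets, "labels": labels}]
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file first, then rename, so a watcher never reads a partial file.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        json.dump(groups, tmp, indent=2)
        tmp_name = tmp.name
    os.replace(tmp_name, path)
```

A call such as `write_target_file("targets.json", ["10.0.0.1:9100"], {"env": "demo"})` would produce a single group that Prometheus picks up on its next file reload.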

13.Dimensional data model
Query: http_requests_total{code="200", method="get"}
Metric name: http_requests_total
Selector (aka filter): {code="200", method="get"}

14.Dimensional data model
Query: http_requests_total{code="200", method="get"}
Response:
  http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741
  http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920
The parts in braces are label/value pairs (aka dimensions)

15.Dimensional data model
Query: http_requests_total{code="200", method="get"}
Response:
  http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741
  http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920
The two trailing fields are the timestamp and the value
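
To make the selector semantics concrete, here is a small Python sketch of equality matching over labelled series. It is an illustration only: `select` and the `Series` tuple are hypothetical, the third series (code 500) is made up so that the filter has something to exclude, and real PromQL selectors also support regex and negative matchers.

```python
from typing import Dict, List, Tuple

Series = Tuple[str, Dict[str, str], float]  # (metric name, labels, latest value)


def select(series: List[Series], name: str, **matchers: str) -> List[Series]:
    """Return series whose name matches and whose labels equal every matcher."""
    return [
        s for s in series
        if s[0] == name and all(s[1].get(k) == v for k, v in matchers.items())
    ]


series = [
    ("http_requests_total", {"code": "200", "method": "get", "route": "/api/users"}, 1741.0),
    ("http_requests_total", {"code": "200", "method": "get", "route": "/api/objects"}, 1920.0),
    # Invented series, present only so the selector filters something out.
    ("http_requests_total", {"code": "500", "method": "get", "route": "/api/users"}, 3.0),
]

# Equivalent in spirit to: http_requests_total{code="200", method="get"}
matched = select(series, "http_requests_total", code="200", method="get")
```

Labels not named in the selector (here `route`) do not restrict the match, which is why both routes come back.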

16.Metadata discovery - SD also provides metadata - Metadata can be mixed in with metrics - Powerful relabelling feature for label manipulation at ingest
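
The sketch below mimics a relabelling rule of the "replace" kind in Python, to show how discovery metadata can end up as ordinary labels. The function names and sample values are hypothetical; `__meta_kubernetes_namespace` and `__meta_kubernetes_pod_name` are meta labels that Kubernetes pod discovery provides. Two simplifications are assumed: the replacement uses Python's `\1` group syntax rather than Prometheus's `$1`, and the handling of `__address__` (which Prometheus turns into the `instance` label) is left out.

```python
import re
from typing import Dict, List


def relabel_replace(labels: Dict[str, str], source_labels: List[str], regex: str,
                    target_label: str, replacement: str = r"\1",
                    separator: str = ";") -> Dict[str, str]:
    """Apply one 'replace'-style rule: join sources, match, write target."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # the regex must match the whole joined value
    if match is None:
        return dict(labels)  # no match: labels pass through unchanged
    out = dict(labels)
    out[target_label] = match.expand(replacement)
    return out


def drop_internal(labels: Dict[str, str]) -> Dict[str, str]:
    """Drop labels starting with '__', as happens once relabelling is finished."""
    return {k: v for k, v in labels.items() if not k.startswith("__")}


# Hypothetical metadata for one discovered pod.
discovered = {
    "__address__": "10.0.0.7:8080",
    "__meta_kubernetes_namespace": "shop",
    "__meta_kubernetes_pod_name": "cart-5d9f",
}
labels = relabel_replace(discovered, ["__meta_kubernetes_namespace"], "(.*)", "namespace")
labels = relabel_replace(labels, ["__meta_kubernetes_pod_name"], "(.*)", "pod_name")
final = drop_internal(labels)
```

After both rules run, only the copied `namespace` and `pod_name` labels survive, so every metric scraped from this pod carries them.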

17.Instrumentation

18.Off-the-shelf or write your own

19.Kubernetes metrics

20.Monitoring resources and methods - For resources like memory, queues, CPUs, disks… - USE Method: Utilization, Saturation, Errors - http://www.brendangregg.com/usemethod.html - For services - “RED” Method: Request rate, Error rate, Duration - https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/

21. node_exporter: node metrics - Host metrics - CPU - Memory - Disk - Network - ... - Not K8S specific, but useful as referential and for totals

22. cAdvisor: container metrics - Runs in kubelet (usually, for now..) - Resource stats about running containers - Mostly container and node-level labels… - (k8s: plus namespace and pod_name)

23.Sample cAdvisor metric queries
Percent of total cluster memory used:
  sum(container_memory_rss) / sum(machine_memory_bytes)
Memory used by kubernetes namespace:
  sum(container_memory_rss) by (namespace)
Top 5 pods by network I/O:
  topk(5, sum by (pod_name) (rate(container_network_transmit_bytes_total[5m])))
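
As a rough picture of what `sum(...) by (namespace)` and `topk` do in the queries above, the following Python sketch groups samples by one label and then ranks the totals. The helper names and all sample values are made up for illustration; PromQL evaluates these over time series rather than plain lists.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value)


def sum_by(samples: Iterable[Sample], label: str) -> Dict[str, float]:
    """Add up values, keeping only the given label as the grouping key."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


def top_k(totals: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k largest groups, biggest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Invented memory samples (in MiB) for three pods across two namespaces.
samples = [
    ({"namespace": "shop", "pod_name": "cart-1"}, 100.0),
    ({"namespace": "shop", "pod_name": "cart-2"}, 50.0),
    ({"namespace": "kube-system", "pod_name": "dns-1"}, 30.0),
]

by_namespace = sum_by(samples, "namespace")
biggest = top_k(by_namespace, 1)
```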

24. Kube-state metrics
$ kubectl get deploy my-app -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app
  ...
spec:
  replicas: 4
  ...
status:
  replicas: 4
  ...

25. Kube-state metrics
$ kubectl get deploy my-app -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app
  ...
spec:
  replicas: 4    -> kube_deployment_spec_replicas{deployment="my-app", ...}
  ...
status:
  replicas: 4    -> kube_deployment_status_replicas{deployment="my-app", ...}
  ...
Metrics created by kube-state-metrics, with label set from this deployment

26.Sample kube-state-metrics queries
Deployments with issues:
  kube_deployment_spec_replicas != kube_deployment_status_replicas_available
Top 10 longest-running pods (“reverse uptime”):
  topk(10, sort_desc(time() - kube_pod_created))
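
The "deployments with issues" query compares two metrics series by series and keeps only the entries where they differ. A minimal Python sketch of that comparison, with a hypothetical `mismatched` helper and invented replica counts, assuming both metrics are keyed by deployment name:

```python
from typing import Dict, Tuple


def mismatched(spec: Dict[str, int], available: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return deployments whose desired replica count differs from the available count."""
    return {
        name: (want, available[name])
        for name, want in spec.items()
        # Like a vector comparison, only names present on both sides are compared.
        if name in available and want != available[name]
    }


# Invented values: my-app wants 4 replicas but only 3 are available.
spec_replicas = {"my-app": 4, "other-app": 2}
available_replicas = {"my-app": 3, "other-app": 2}

issues = mismatched(spec_replicas, available_replicas)
```

An empty result means every deployment has as many available replicas as it asked for.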

27. Kube core service metrics - API Server - etcd3 - kube-dns - scheduler, controller-manager

28. Metrics recap
Source                                    | Deployment mode | How many              | Metrics about
node_exporter                             | daemonset       | 1 per node            | node resources
cAdvisor                                  | inside kubelet  | 1 per node            | container resources
kube-state-metrics                        | deployment      | singleton             | k8s object state
etcd, API Server, controller manager, ... | core service    | singleton or HA group | itself

29.Deploying