For the complete table of contents of the Kubernetes實錄 series, see: Kubernetes實錄 - Table of Contents
Related posts in this series:
- Part 15: Kubernetes monitoring: full-stack monitoring with Prometheus (1) - monitoring scenarios and architecture
- Part 16: Kubernetes monitoring: full-stack monitoring with Prometheus (2) - deploying the in-cluster metrics endpoint services
- Part 17: Kubernetes monitoring: full-stack monitoring with Prometheus (3) - deploying Prometheus and configuring it to scrape the metrics
- Part 18: Kubernetes monitoring: full-stack monitoring with Prometheus (4) - configuring the Grafana visualization tool
The previous post covered the monitoring requirements of my environment and the overall solution architecture. This post records the deployment and configuration of the various metrics endpoint services inside the Kubernetes cluster from which the Prometheus subsystem collects monitoring data.
1. Collecting metrics via cAdvisor
cAdvisor is the container-monitoring data-collection agent of the Kubernetes ecosystem. It is built into Kubernetes, so no separate deployment is needed.
Before Kubernetes 1.7.3, cAdvisor's metrics were merged into the kubelet's own metrics and could be fetched via port 4194 exposed on each node.
Since Kubernetes 1.7.3, cAdvisor's metrics have been split out of the kubelet's metrics, so Prometheus scrapes them as two separate jobs. Many articles online still claim that each node exposes port 4194 for cAdvisor metrics, but recent kubelet versions no longer expose that port; the metrics can only be fetched through the proxy API provided by the apiserver.
- Metric types collected by cAdvisor
cAdvisor reports the resource usage of every container running on the current node. The metric keys share the container_* prefix:
container_cpu_*, container_fs_*, container_memory_*, container_network_*, container_spec_*, container_last_seen, container_scrape_error, container_start_time_seconds, container_tasks_state
- API

| Item | API | Prometheus config | Notes |
| --- | --- | --- | --- |
| cAdvisor metrics | /api/v1/nodes/{node}/proxy/metrics/cadvisor | replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor | - |

- Testing the cAdvisor metrics endpoint
```shell
kubectl get --raw "/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics/cadvisor"
```

or, through a local API proxy:

```shell
kubectl proxy --port=6080
curl http://localhost:6080/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics/cadvisor
```

Sample output:

```
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="18.06.1-ce",kernelVersion="3.10.0-862.el7.x86_64",osVersion="CentOS Linux 7 (Core)"} 1
# HELP container_cpu_cfs_periods_total Number of elapsed enforcement period intervals.
# TYPE container_cpu_cfs_periods_total counter
container_cpu_cfs_periods_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 849686
container_cpu_cfs_periods_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 849710
# HELP container_cpu_cfs_throttled_periods_total Number of throttled period intervals.
# TYPE container_cpu_cfs_throttled_periods_total counter
container_cpu_cfs_throttled_periods_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10576
container_cpu_cfs_throttled_periods_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10266
# HELP container_cpu_cfs_throttled_seconds_total Total time duration the container has been throttled.
# TYPE container_cpu_cfs_throttled_seconds_total counter
container_cpu_cfs_throttled_seconds_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 16523.995575912
container_cpu_cfs_throttled_seconds_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10673.627579073
... (many lines omitted)
```
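The replacement path from the table above plugs into a Prometheus scrape job that discovers nodes and rewrites each target to go through the apiserver proxy. The full Prometheus configuration is covered in the next post; as a minimal sketch (the job name and the in-pod service-account TLS/token paths are standard defaults I am assuming here, not taken from this post):

```yaml
# Sketch: scrape cAdvisor metrics for every node through the apiserver proxy.
- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Keep the node labels as Prometheus labels.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Send every scrape to the apiserver instead of the node itself.
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Rewrite the metrics path to the per-node cAdvisor proxy endpoint.
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```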
2. Collecting metrics via the kubelet
- Metric types collected by the kubelet
To be added.
- API

| Item | API | Prometheus config | Notes |
| --- | --- | --- | --- |
| kubelet metrics | /api/v1/nodes/{node}/proxy/metrics | replacement: /api/v1/nodes/${1}/proxy/metrics | - |

- Testing the kubelet metrics endpoint

```shell
kubectl proxy --port=6080
curl http://localhost:6080/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics
```

Sample output:

```
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
... (many lines omitted)
# HELP apiserver_storage_data_key_generation_latencies_microseconds Latencies in microseconds of data encryption key(DEK) generation operations.
# TYPE apiserver_storage_data_key_generation_latencies_microseconds histogram
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="5"} 0
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="10"} 0
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="20"} 0
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="40"} 0
... (many lines omitted)
```
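The kubelet scrape job mirrors the cAdvisor one: same node discovery and apiserver proxying, only the rewritten path differs. A minimal sketch (job name and service-account paths are assumed defaults, not from this post):

```yaml
# Sketch: scrape kubelet metrics for every node through the apiserver proxy.
- job_name: 'kubernetes-kubelet'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Same rewrite as the cAdvisor job, minus the /cadvisor suffix.
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
```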
3. node_exporter
Prometheus' NodeExporter project exposes the key metrics of a host node. By deploying one NodeExporter instance on each host via a Kubernetes DaemonSet, we get host performance metrics for the whole cluster.
- Definition file: prometheus-node-exporter-daemonset.yaml
```yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: kube-system
  labels:
    app: prometheus-node-exporter
spec:
  template:
    metadata:
      name: prometheus-node-exporter
      labels:
        app: prometheus-node-exporter
    spec:
      containers:
      - image: prom/node-exporter:v0.17.0
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - name: prom-node-exp  # must be an IANA_SVC_NAME (at most 15 characters, ..)
          containerPort: 9100
          hostPort: 9100
      tolerations:
      - key: "node-role.kubernetes.io/master"
        effect: "NoSchedule"
      hostNetwork: true
      hostPID: true
      hostIPC: true
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/app-metrics: 'true'
    prometheus.io/app-metrics-path: '/metrics'
  name: prometheus-node-exporter
  namespace: kube-system
  labels:
    app: prometheus-node-exporter
spec:
  clusterIP: None
  ports:
  - name: prometheus-node-exporter
    port: 9100
    protocol: TCP
  selector:
    app: prometheus-node-exporter
  type: ClusterIP
```
- Deployment commands

```shell
kubectl apply -f prometheus-node-exporter-daemonset.yaml
daemonset.extensions/prometheus-node-exporter created
service/prometheus-node-exporter created

kubectl get -f prometheus-node-exporter-daemonset.yaml
NAME                                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.extensions/prometheus-node-exporter   6       6         6       6            6           <none>          5m

NAME                               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
service/prometheus-node-exporter   ClusterIP   None         <none>        9100/TCP   5m
```
- Verification

```shell
# On any host node:
netstat -pltn | grep 9100
tcp6       0      0 :::9100      :::*       LISTEN      104168/node_exporte

curl {nodeIP}:9100/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000117217
go_gc_duration_seconds{quantile="0.25"} 0.000159431
go_gc_duration_seconds{quantile="0.5"} 0.000200323
... (many lines omitted)
```
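Because the Service above carries the prometheus.io/scrape: 'true' annotation, Prometheus can pick up the node-exporter endpoints with a generic annotation-driven endpoints job rather than a hard-coded target list. A minimal sketch (the job name and target label names are my choices, not from this post):

```yaml
# Sketch: scrape any Service endpoint annotated with prometheus.io/scrape=true,
# which includes the prometheus-node-exporter headless Service defined above.
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Only keep endpoints whose Service opts in via the annotation.
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # Record namespace and service name as labels for later filtering.
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
```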
4. kube-state-metrics
kube-state-metrics collects metrics about the state of Kubernetes API objects, including daemonsets, deployments, jobs, namespaces, nodes, PVCs, pod containers, pods, replicasets, services, and statefulsets.
- Deploying kube-state-metrics
- Downloading the deployment files
The deployment manifests for kube-state-metrics can be downloaded from GitHub: kube-state-metrics; the latest version at the time of writing is 1.5.0.

```shell
mkdir kube-state-metrics
cd kube-state-metrics
wget https://github.com/kubernetes/kube-state-metrics/archive/v1.5.0.zip
unzip v1.5.0.zip
cd kube-state-metrics-1.5.0/kubernetes/
tree
├── kube-state-metrics-cluster-role-binding.yaml
├── kube-state-metrics-cluster-role.yaml
├── kube-state-metrics-deployment.yaml
├── kube-state-metrics-role-binding.yaml
├── kube-state-metrics-role.yaml
├── kube-state-metrics-service-account.yaml
└── kube-state-metrics-service.yaml
```
- Modifying the deployment files
kube-state-metrics is deployed into the kube-system namespace by default; adjust the manifests if you need a different namespace.
The two docker images referenced in kube-state-metrics-deployment.yaml are not reachable from mainland China without a proxy, so replace them with mirrors that are:

```
quay.io/coreos/kube-state-metrics:v1.5.0  ->  mirrorgooglecontainers/kube-state-metrics:v1.5.0
k8s.gcr.io/addon-resizer:1.8.3            ->  mirrorgooglecontainers/addon-resizer:1.8.4  (newer version)
```
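The two substitutions can also be scripted with sed. A small sketch of mine (not from the original post): it pipes sample image lines through the substitutions to demonstrate them; to patch the real file, point the same sed expressions at kube-state-metrics-deployment.yaml instead.

```shell
# Demonstrate the two image substitutions on sample lines.
printf '%s\n' \
  '        image: quay.io/coreos/kube-state-metrics:v1.5.0' \
  '        image: k8s.gcr.io/addon-resizer:1.8.3' |
sed -e 's#quay.io/coreos/kube-state-metrics:v1.5.0#mirrorgooglecontainers/kube-state-metrics:v1.5.0#' \
    -e 's#k8s.gcr.io/addon-resizer:1.8.3#mirrorgooglecontainers/addon-resizer:1.8.4#'
```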
- Deployment

```shell
kubectl apply -f ./
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics unchanged
clusterrole.rbac.authorization.k8s.io/kube-state-metrics unchanged
deployment.apps/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics unchanged
role.rbac.authorization.k8s.io/kube-state-metrics-resizer unchanged
serviceaccount/kube-state-metrics unchanged
service/kube-state-metrics unchanged

kubectl get -f ./
NAME                                                              AGE
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics   70s

NAME                                                       AGE
clusterrole.rbac.authorization.k8s.io/kube-state-metrics   70s

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kube-state-metrics   1/1     1            1           50s

NAME                                                       AGE
rolebinding.rbac.authorization.k8s.io/kube-state-metrics   70s

NAME                                                        AGE
role.rbac.authorization.k8s.io/kube-state-metrics-resizer   70s

NAME                                SECRETS   AGE
serviceaccount/kube-state-metrics   1         70s

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/kube-state-metrics   ClusterIP   10.106.107.13   <none>        8080/TCP,8081/TCP   70s
```
- Testing the kube-state-metrics metrics endpoint

```shell
kubectl get svc kube-state-metrics -n kube-system
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
kube-state-metrics   ClusterIP   10.106.107.13   <none>        8080/TCP,8081/TCP   5m33s

curl 10.106.107.13:8080/metrics
# HELP kube_configmap_info Information about configmap.
# TYPE kube_configmap_info gauge
kube_configmap_info{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
kube_configmap_info{namespace="kube-system",configmap="prometheus-config"} 1
kube_configmap_info{namespace="kube-system",configmap="kubeadm-config"} 1
kube_configmap_info{namespace="kube-public",configmap="cluster-info"} 1
kube_configmap_info{namespace="kube-system",configmap="calico-config"} 1
kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
kube_configmap_info{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-proxy"} 1
kube_configmap_info{namespace="kube-system",configmap="kubelet-config-1.13"} 1
... (many lines omitted)

curl -I 10.106.107.13:8081/healthz
HTTP/1.1 200 OK
Date: Wed, 20 Feb 2019 02:20:10 GMT
Content-Length: 264
Content-Type: text/html; charset=utf-8
```
5. blackbox-exporter
blackbox-exporter is a black-box probing tool that can probe services over HTTP, TCP, ICMP, and other protocols. GitHub: blackbox-exporter; the latest version at the time of writing is v0.13.0.
- Deployment definition files
blackbox-exporter-configmap.yaml:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          valid_status_codes: []
          method: GET
          preferred_ip_protocol: "ip4"
      http_post_2xx:
        prober: http
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          method: POST
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
        timeout: 10s
      icmp:
        prober: icmp
        timeout: 10s
        icmp:
          preferred_ip_protocol: "ip4"
```
blackbox-exporter-deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: blackbox-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      restartPolicy: Always
      containers:
      - name: blackbox-exporter
        image: prom/blackbox-exporter:v0.13.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: blackbox-port
          containerPort: 9115
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          requests:
            memory: 50Mi
            cpu: 100m
          limits:
            memory: 60Mi
            cpu: 200m
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --log.level=debug
        - --web.listen-address=:9115
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Equal"
        value: ""
        effect: "NoSchedule"
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
  annotations:
    prometheus.io/scrape: 'true'
spec:
  type: ClusterIP
  selector:
    app: blackbox-exporter
  ports:
  - name: blackbox
    port: 9115
    targetPort: 9115
    protocol: TCP
```
- Deployment

```shell
kubectl apply -f blackbox-exporter-configmap.yaml
configmap/blackbox-exporter created

kubectl apply -f blackbox-exporter-deployment.yaml
deployment.apps/blackbox-exporter created
service/blackbox-exporter created

kubectl get -f blackbox-exporter-configmap.yaml
NAME                DATA   AGE
blackbox-exporter   1      3m12s

kubectl get -f blackbox-exporter-deployment.yaml
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/blackbox-exporter   1/1     1            1           69s

NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/blackbox-exporter   ClusterIP   10.109.48.146   <none>        9115/TCP   69s
```
- Testing and verification

```shell
# Check (1): verify the service is up
curl 10.109.48.146:9115
<html>
<head><title>Blackbox Exporter</title></head>
<body>
<h1>Blackbox Exporter</h1>
<p><a href="/probe?target=prometheus.io&module=http_2xx">Probe prometheus.io for http_2xx</a></p>
<p><a href="/probe?target=prometheus.io&module=http_2xx&debug=true">Debug probe prometheus.io for http_2xx</a></p>
<p><a href="/metrics">Metrics</a></p>
<p><a href="/config">Configuration</a></p>
<h2>Recent Probes</h2>
<table border='1'><tr><th>Module</th><th>Target</th><th>Result</th><th>Debug</th></table></body>
</html>

# Check (2): verify a TCP probe, using grafana as the example
kubectl get svc -n kube-system -l app=blackbox-exporter
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
blackbox-exporter   ClusterIP   10.109.48.146   <none>        9115/TCP   27h

kubectl describe svc monitoring-grafana -n kube-system
Name:              monitoring-grafana
Namespace:         kube-system
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true","prometheus.io/tcp-probe":"true","prometheus....
                   prometheus.io/scrape: true
                   prometheus.io/tcp-probe: true
                   prometheus.io/tcp-probe-port: 80
Selector:          k8s-app=grafana
Type:              ClusterIP
IP:                10.99.65.209
Port:              grafana  80/TCP
TargetPort:        3000/TCP
Endpoints:         192.168.1.6:3000
Session Affinity:  None
Events:            <none>

curl '10.109.48.146:9115/probe?module=tcp_connect&target=monitoring-grafana.kube-system:80'
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.002059111
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.002815779
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
```
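The prometheus.io/tcp-probe and prometheus.io/tcp-probe-port annotations on the grafana Service suggest how Prometheus can drive these probes automatically: discover Services, keep those that opt in, and relabel the target so blackbox-exporter probes them. A minimal sketch (the job name and the exact relabel rules are my assumptions; the annotation names and the blackbox-exporter address come from this post):

```yaml
# Sketch: probe every Service annotated prometheus.io/tcp-probe=true via
# the in-cluster blackbox-exporter's tcp_connect module.
- job_name: 'kubernetes-service-tcp-probes'
  metrics_path: /probe
  params:
    module: [tcp_connect]
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  # Only probe Services that opt in via the annotation.
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_tcp_probe]
    action: keep
    regex: true
  # Build the probe target "<service>.<namespace>:<port>", e.g.
  # monitoring-grafana.kube-system:80 as tested above.
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_namespace, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe_port]
    regex: (.+);(.+);(.+)
    target_label: __param_target
    replacement: ${1}.${2}:${3}
  - source_labels: [__param_target]
    target_label: instance
  # Send the scrape itself to blackbox-exporter.
  - target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
```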
At this point all the exporters for monitoring the Kubernetes cluster are configured; the next step is to deploy Prometheus and configure it to scrape these exporters' metrics.