Monitoring Kubernetes Applications with Prometheus

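As introduced earlier, Prometheus collects metrics through a publicly exposed HTTP(S) endpoint. There is no separate monitoring agent to install: we only need to expose a metrics endpoint, and Prometheus pulls data from it at a regular interval. For an ordinary HTTP service we can simply reuse the service itself and add a /metrics endpoint for Prometheus to scrape. The metrics format is also easy to read, so the learning cost is low.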

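Many services now ship with a built-in /metrics endpoint from the start; the Kubernetes components and the istio service mesh, for example, expose metrics directly. Services without a native endpoint can still be covered by an exporter, such as mysqld_exporter or node_exporter. These exporters are somewhat like the agents of traditional monitoring systems: each runs as a standalone service, collects metrics from its target, and exposes them to Prometheus.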
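To make the idea of "adding a /metrics endpoint to an existing HTTP service" concrete, here is a minimal sketch using only the Python standard library. The metric name demo_http_requests_total, the handler, and the port are assumptions made up for illustration; a real service would more likely use an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real service would update it in its own handlers.
REQUEST_COUNT = {"value": 0}


def render_metrics(request_count: int) -> str:
    """Render a single counter in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_http_requests_total Total HTTP requests handled.",
        "# TYPE demo_http_requests_total counter",
        f"demo_http_requests_total {request_count}",
    ]
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Count every request, including the scrapes themselves.
        REQUEST_COUNT["value"] += 1
        if self.path == "/metrics":
            body = render_metrics(REQUEST_COUNT["value"]).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        else:
            body = b"not found\n"
            self.send_response(404)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics on the given (assumed) port."""
    HTTPServer(("", port), Handler).serve_forever()
```

Calling serve() starts the server; curl localhost:8000/metrics should then return the HELP, TYPE, and sample lines in the same shape as the outputs shown later in this article.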

Monitoring ordinary applications

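We have already covered how to use ingress, with Traefik as our ingress-controller, the bridge between the services inside the Kubernetes cluster and external users. Traefik has a built-in /metrics endpoint, but it has to be enabled in the configuration: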

[metrics]
 [metrics.prometheus]
 entryPoint = "traefik"
 buckets = [0.1, 0.3, 1.2, 5.0]

Earlier versions enabled this through the two flags --web and --web.metrics.prometheus, so be sure to check the documentation for the version you are running.
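The buckets setting above defines histogram boundaries. As a rough illustration of how Prometheus-style histograms use them, the sketch below counts observations into cumulative buckets, where each bucket holds every observation less than or equal to its upper bound, plus a final +Inf bucket. It assumes the values are request latencies in seconds; the function name and sample data are made up for the example.

```python
def cumulative_bucket_counts(observations, buckets):
    """Count observations into cumulative 'le' buckets, Prometheus-histogram style."""
    bounds = sorted(buckets)
    counts = {str(bound): 0 for bound in bounds}
    counts["+Inf"] = 0
    for value in observations:
        for bound in bounds:
            # Cumulative: a fast request is counted in every bucket it fits under.
            if value <= bound:
                counts[str(bound)] += 1
        counts["+Inf"] += 1  # the +Inf bucket always equals the total count
    return counts


# Hypothetical latencies (seconds) against the bucket bounds from the config above.
counts = cumulative_bucket_counts([0.05, 0.2, 0.2, 1.0, 7.0], [0.1, 0.3, 1.2, 5.0])
print(counts)
```

Because the buckets are cumulative, the choice of bounds decides how precisely quantiles can later be estimated: a latency of 7.0 seconds only shows up in +Inf here.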

Add the configuration above to the traefik.toml configuration file, then update the ConfigMap and the Pod. Once the Traefik Pod is running again, we can look up the Service IP:

$ kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
......
traefik-ingress-service NodePort 10.101.33.56 <none> 80:31692/TCP,8080:32115/TCP 63d

We can then use curl to check whether the Prometheus metrics endpoint is enabled (accessing it through the NodePort works as well):

$ curl 10.101.33.56:8080/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000121036
go_gc_duration_seconds{quantile="0.25"} 0.000210328
go_gc_duration_seconds{quantile="0.5"} 0.000279974
go_gc_duration_seconds{quantile="0.75"} 0.000420738
go_gc_duration_seconds{quantile="1"} 0.001191494
go_gc_duration_seconds_sum 0.004353914
go_gc_duration_seconds_count 12
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 63
......
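The output above follows the Prometheus text format: comment lines starting with # HELP and # TYPE describe a metric, and every other line is a sample. As a sketch of how such output can be read programmatically, the helper functions below (names are hypothetical) extract the samples and the HELP texts. It assumes samples carry no trailing timestamp, which matches the output shown here.

```python
def parse_samples(text: str) -> dict:
    """Map each sample (metric name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # Assumption: the value is the last space-separated field (no timestamp).
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


def parse_help(text: str) -> dict:
    """Map each metric name to its HELP description."""
    helps = {}
    for line in text.splitlines():
        parts = line.strip().split(" ", 3)
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "HELP":
            helps[parts[2]] = parts[3]
    return helps


sample_text = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 63
go_gc_duration_seconds{quantile="0.5"} 0.000279974
"""
print(parse_samples(sample_text))
print(parse_help(sample_text))
```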

This shows that Traefik's metrics endpoint has been enabled successfully. We can now add this /metrics endpoint to prometheus.yml, right below the default prometheus job: (prome-cm.yaml)

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-ops
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 30s

    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']

    - job_name: 'traefik'
      static_configs:
      - targets: ['traefik-ingress-service.kube-system.svc.cluster.local:8080']

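This is only a very simple configuration, of course. Each entry under scrape_configs supports many more parameters, for example:

basic_auth and bearer_token: if the /metrics endpoint requires authentication, both a traditional username/password and a token added to the request header are supported
kubernetes_sd_configs or consul_sd_configs: can be used to discover the monitoring endpoints of applications automatically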
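To illustrate what basic_auth and bearer_token amount to on the wire, the sketch below builds the Authorization header that a scraper would send in each case. The function names and credentials are hypothetical; Prometheus constructs these headers itself from the configuration, so this is only a way to reproduce the request by hand when testing a protected endpoint.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """HTTP Basic auth: base64 of 'username:password' after the 'Basic ' prefix."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_token_header(token: str) -> dict:
    """Bearer auth: the token is sent verbatim after the 'Bearer ' prefix."""
    return {"Authorization": f"Bearer {token}"}


# Placeholder credentials, for illustration only.
print(basic_auth_header("user", "pass"))
print(bearer_token_header("example-token"))
```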

The Service for Traefik is named traefik-ingress-service and lives in the kube-system namespace, while Prometheus runs in a different namespace, so the targets entry has to use the FQDN form: traefik-ingress-service.kube-system.svc.cluster.local. If Traefik and Prometheus were deployed in the same namespace, servicename:serviceport would be enough. Next, we update the ConfigMap:

$ kubectl delete -f prome-cm.yaml
configmap "prometheus-config" deleted
$ kubectl create -f prome-cm.yaml
configmap "prometheus-config" created
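The naming rule for targets described above can be sketched as a small helper. The function name is made up for illustration, and cluster.local is assumed to be the cluster domain (the common default, and the one used in this article); a cluster configured with a different domain would need a different value.

```python
from typing import Optional


def service_target(service: str, port: int, namespace: Optional[str] = None,
                   cluster_domain: str = "cluster.local") -> str:
    """Build a scrape target for a Kubernetes Service.

    Without a namespace, return the short form that resolves only from
    inside the Service's own namespace; with one, return the FQDN form.
    """
    if namespace is None:
        return f"{service}:{port}"
    return f"{service}.{namespace}.svc.{cluster_domain}:{port}"


print(service_target("traefik-ingress-service", 8080, "kube-system"))
print(service_target("redis", 9121))
```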

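The content of the Prometheus configuration has now changed, and after a short while the prometheus.yml file mounted into the Pod is updated as well. Because we added the --web.enable-lifecycle flag to the Prometheus startup arguments earlier, a single reload request is enough to apply the new configuration: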

$ kubectl get svc -n kube-ops
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus NodePort 10.102.74.90 <none> 9090:30358/TCP 3d
$ curl -X POST "http://10.102.74.90:9090/-/reload"

A ConfigMap mounted into a Pod as a Volume takes some time to hot-update, so wait a little before triggering the reload.
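The same reload call can be issued from a script. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes Prometheus was started with --web.enable-lifecycle as described above.

```python
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build the POST request for the Prometheus /-/reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    # An empty body plus an explicit method makes this a POST, like curl -X POST.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url: str, timeout: float = 10.0) -> int:
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

For example, reload_prometheus("http://10.102.74.90:9090") targets the Service address shown above; urlopen raises an HTTPError if Prometheus rejects the request.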

The reload URL expects a POST request, which is why we call it with curl -X POST through the Service's CLUSTER-IP:PORT. After the reload, open the Prometheus Dashboard and look at the scrape targets:

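The traefik job we just added now appears in the list. Switching to the Graph page, we can find a number of Traefik metrics. To learn what a particular metric means, check the /metrics endpoint itself, which usually includes a descriptive comment for each metric.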

With that, we have configured our first Kubernetes application in Prometheus.

Monitoring applications with an exporter

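As mentioned above, some applications do not come with a /metrics endpoint for Prometheus. In that case we need an exporter service to provide the metrics. The Prometheus project maintains official exporters for many applications, and there are many third-party implementations as well; the list is available on the official website:

exporters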

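Here we use redis-exporter to monitor a redis service. For this kind of application, the exporter is usually deployed as a sidecar in the same Pod as the main application. Below we deploy a redis application and use redis-exporter to collect metrics for Prometheus, with the following manifest: (prome-redis.yaml)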

# Note: extensions/v1beta1 Deployments were removed in Kubernetes 1.16;
# newer clusters need apps/v1 plus a spec.selector matching the Pod labels.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: redis
  namespace: kube-ops
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9121"
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:4
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121
---
kind: Service
apiVersion: v1
metadata:
  name: redis
  namespace: kube-ops
spec:
  selector:
    app: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
  - name: prom
    port: 9121
    targetPort: 9121

The redis Pod above contains two containers: the main redis application and redis_exporter. Create the application directly:

$ kubectl create -f prome-redis.yaml
deployment.extensions "redis" created
service "redis" created
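The manifest carries the annotations prometheus.io/scrape and prometheus.io/port. Prometheus does not act on these by itself; they are a common convention that only takes effect when relabeling rules under kubernetes_sd_configs are written to read them, and the static configuration used in this article does not depend on them. As a rough sketch of what such rules typically decide, assuming a hypothetical helper that receives a Pod IP and its annotations:

```python
from typing import Optional


def annotated_target(pod_ip: str, annotations: dict) -> Optional[str]:
    """Return 'ip:port' if the Pod opts in to scraping via annotations, else None.

    This mimics the usual convention only; the real decision is made by
    relabel rules in the Prometheus configuration.
    """
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    port = annotations.get("prometheus.io/port")
    if port is None:
        return None
    return f"{pod_ip}:{port}"


# Annotations from the manifest above; the Pod IP is a placeholder.
print(annotated_target("10.244.1.23", {
    "prometheus.io/scrape": "true",
    "prometheus.io/port": "9121",
}))
```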

Once it has been created, we can see that the redis Pod contains two containers:

$ kubectl get pods -n kube-ops
NAME READY STATUS RESTARTS AGE
prometheus-8566cd9699-gt9wh 1/1 Running 0 3d
redis-544b6c8c54-8xd2g 2/2 Running 0 3m
$ kubectl get svc -n kube-ops
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus NodePort 10.102.74.90 <none> 9090:30358/TCP 3d
redis ClusterIP 10.104.131.44 <none> 6379/TCP,9121/TCP 5m

We can verify through port 9121 that metrics are being collected:

$ curl 10.104.131.44:9121/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
......
# HELP redis_used_cpu_user_children used_cpu_user_childrenmetric
# TYPE redis_used_cpu_user_children gauge
redis_used_cpu_user_children{addr="redis://localhost:6379",alias=""} 0

As before, all that is left is to update the Prometheus configuration file:

- job_name: 'redis'
  static_configs:
  - targets: ['redis:9121']

Because the redis Service and Prometheus are in the same namespace, the Service name alone is enough here.

After updating the configuration file, reload it:

$ kubectl delete -f prome-cm.yaml
configmap "prometheus-config" deleted
$ kubectl create -f prome-cm.yaml
configmap "prometheus-config" created
# wait a moment, then trigger the reload
$ curl -X POST "http://10.102.74.90:9090/-/reload"

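Now look at the scrape targets in the Prometheus Dashboard again:

The redis job we configured is now active. Switching to the Graph page shows many redis metrics:

Pick any metric, for example redis_exporter_scrapes_total, and click Execute to see the corresponding graph:

Note: if the time range looks wrong, adjust it manually on the Graph page.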
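The same metric can also be read without the Graph page, through the Prometheus HTTP API's instant-query endpoint /api/v1/query. The sketch below builds the query URL and fetches the result with the standard library; the function names are hypothetical, and the base URL is the Service address used earlier in this article.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": expr})


def run_query(base_url: str, expr: str, timeout: float = 10.0) -> dict:
    """Run the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(build_query_url("http://10.102.74.90:9090", "redis_exporter_scrapes_total"))
```

The URL-encoding step matters once the expression contains brackets or parentheses, as in rate(redis_exporter_scrapes_total[5m]).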

So far we have monitored services deployed in the cluster. In the next lesson we will look at how to monitor the Kubernetes cluster itself.

This article is reposted from Juejin: Monitoring Kubernetes Applications with Prometheus
