天天看點

Prometheus Operator 進階配置 Prometheus Operator 自動發現以及資料持久化

​​​​​​上節課我們一起學習了如何在 Prometheus Operator 下面自定義一個監控選項​​​,以及​​自定義報警規則​​的使用。那麼我們還能夠直接使用前面課程中的自動發現功能嗎?如果在我們的 Kubernetes 叢集中有了很多的 Service/Pod,那麼我們都需要一個一個的去建立一個對應的 ServiceMonitor 對象來進行監控嗎?這樣豈不是又變得麻煩起來了?

自動發現配置

為解決上面的問題,Prometheus Operator 為我們提供了一個額外的抓取配置的來解決這個問題,我們可以通過添加額外的配置來進行服務發現進行自動監控。和前面自定義的方式一樣,我們想要在 Prometheus Operator 當中去自動發現并監控具有​

​prometheus.io/scrape=true​

​這個 annotations 的 Service,之前我們定義的 Prometheus 的配置如下:

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name      

如果你對上面這個配置還不是很熟悉的話,建議去檢視下前面關于 Kubernetes常用資源對象監控章節的介紹,要想自動發現叢集中的 Service,就需要我們在 Service 的​

​annotation​

​​區域添加​

​prometheus.io/scrape=true​

​的聲明,将上面檔案直接儲存為 prometheus-additional.yaml,然後通過這個檔案建立一個對應的 Secret 對象:

$ kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
secret "additional-configs" created      
注意我們所有的操作都在 Prometheus Operator 源碼​

​contrib/kube-prometheus/manifests/​

​目錄下面。

建立完成後,會将上面配置資訊進行 base64 編碼後作為 prometheus-additional.yaml 這個 key 對應的值存在:

$ kubectl get secret additional-configs -n monitoring -o yaml
apiVersion: v1
data:
  prometheus-additional.yaml: LSBqb2JfbmFtZTogJ2t1YmVybmV0ZXMtc2VydmljZS1lbmRwb2ludHMnCiAga3ViZXJuZXRlc19zZF9jb25maWdzOgogIC0gcm9sZTogZW5kcG9pbnRzCiAgcmVsYWJlbF9jb25maWdzOgogIC0gc291cmNlX2xhYmVsczogW19fbWV0YV9rdWJlcm5ldGVzX3NlcnZpY2VfYW5ub3RhdGlvbl9wcm9tZXRoZXVzX2lvX3NjcmFwZV0KICAgIGFjdGlvbjoga2VlcAogICAgcmVnZXg6IHRydWUKICAtIHNvdXJjZV9sYWJlbHM6IFtfX21ldGFfa3ViZXJuZXRlc19zZXJ2aWNlX2Fubm90YXRpb25fcHJvbWV0aGV1c19pb19zY2hlbWVdCiAgICBhY3Rpb246IHJlcGxhY2UKICAgIHRhcmdldF9sYWJlbDogX19zY2hlbWVfXwogICAgcmVnZXg6IChodHRwcz8pCiAgLSBzb3VyY2VfbGFiZWxzOiBbX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9hbm5vdGF0aW9uX3Byb21ldGhldXNfaW9fcGF0aF0KICAgIGFjdGlvbjogcmVwbGFjZQogICAgdGFyZ2V0X2xhYmVsOiBfX21ldHJpY3NfcGF0aF9fCiAgICByZWdleDogKC4rKQogIC0gc291cmNlX2xhYmVsczogW19fYWRkcmVzc19fLCBfX21ldGFfa3ViZXJuZXRlc19zZXJ2aWNlX2Fubm90YXRpb25fcHJvbWV0aGV1c19pb19wb3J0XQogICAgYWN0aW9uOiByZXBsYWNlCiAgICB0YXJnZXRfbGFiZWw6IF9fYWRkcmVzc19fCiAgICByZWdleDogKFteOl0rKSg/OjpcZCspPzsoXGQrKQogICAgcmVwbGFjZW1lbnQ6ICQxOiQyCiAgLSBhY3Rpb246IGxhYmVsbWFwCiAgICByZWdleDogX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9sYWJlbF8oLispCiAgLSBzb3VyY2VfbGFiZWxzOiBbX19tZXRhX2t1YmVybmV0ZXNfbmFtZXNwYWNlXQogICAgYWN0aW9uOiByZXBsYWNlCiAgICB0YXJnZXRfbGFiZWw6IGt1YmVybmV0ZXNfbmFtZXNwYWNlCiAgLSBzb3VyY2VfbGFiZWxzOiBbX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9uYW1lXQogICAgYWN0aW9uOiByZXBsYWNlCiAgICB0YXJnZXRfbGFiZWw6IGt1YmVybmV0ZXNfbmFtZQo=
kind: Secret
metadata:
  creationTimestamp: 2018-12-20T14:50:35Z
  name: additional-configs
  namespace: monitoring
  resourceVersion: "41814998"
  selfLink: /api/v1/namespaces/monitoring/secrets/additional-configs
  uid: 9bbe22c5-0466-11e9-a777-525400db4df7
type: Opaque      

然後我們隻需要在聲明 prometheus 的資源對象檔案中添加上這個額外的配置:(prometheus-prometheus.yaml)

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.5.0      

添加完成後,直接更新 prometheus 這個 CRD 資源對象:

$ kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com "k8s" configured      

隔一小會兒,可以前往 Prometheus 的 Dashboard 中檢視配置是否生效:

在 Prometheus Dashboard 的配置頁面下面我們可以看到已經有了對應的的配置資訊了,但是我們切換到 targets 頁面下面卻并沒有發現對應的監控任務,檢視 Prometheus 的 Pod 日志:

$ kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
level=error ts=2018-12-20T15:14:06.772903214Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list pods at the cluster scope"
level=error ts=2018-12-20T15:14:06.773096875Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list services at the cluster scope"
level=error ts=2018-12-20T15:14:06.773212629Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list endpoints at the cluster scope"
......      

可以看到有很多錯誤日志出現,都是​

​xxx is forbidden​

​,這說明是 RBAC 權限的問題,通過 prometheus 資源對象的配置可以知道 Prometheus 綁定了一個名為 prometheus-k8s 的 ServiceAccount 對象,而這個對象綁定的是一個名為 prometheus-k8s 的 ClusterRole:(prometheus-clusterRole.yaml)

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get      

上面的權限規則中我們可以看到明顯沒有對 Service 或者 Pod 的 list 權限,是以報錯了,要解決這個問題,我們隻需要添加上需要的權限即可:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get      

更新上面的 ClusterRole 這個資源對象,然後重建下 Prometheus 的所有 Pod,正常就可以看到 targets 頁面下面有 kubernetes-service-endpoints 這個監控任務了:

我們這裡自動監控了兩個 Service,第一個就是我們之前建立的 Redis 的服務,我們在 Redis Service 中有兩個特殊的 annotations:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9121"      

是以被自動發現了,當然我們也可以用同樣的方式去配置 Pod、Ingress 這些資源對象的自動發現。

資料持久化

上面我們在修改完權限的時候,重新開機了 Prometheus 的 Pod,如果我們仔細觀察的話會發現我們之前采集的資料已經沒有了,這是因為我們通過 prometheus 這個 CRD 建立的 Prometheus 并沒有做資料的持久化,我們可以直接檢視生成的 Prometheus Pod 的挂載情況就清楚了:

$ kubectl get pod prometheus-k8s-0 -n monitoring -o yaml
......
    volumeMounts:
    - mountPath: /etc/prometheus/config_out
      name: config-out
      readOnly: true
    - mountPath: /prometheus
      name: prometheus-k8s-db
......
  volumes:
......
  - emptyDir: {}
    name: prometheus-k8s-db
......      

我們可以看到 Prometheus 的資料目錄 /prometheus 實際上是通過 emptyDir 進行挂載的,我們知道 emptyDir 挂載的資料的生命周期和 Pod 生命周期一緻的,是以如果 Pod 挂掉了,資料也就丢失了,這也就是為什麼我們重建 Pod 後之前的資料就沒有了的原因,對應線上的監控資料肯定需要做資料的持久化的,同樣的 prometheus 這個 CRD 資源也為我們提供了資料持久化的配置方法,由于我們的 Prometheus 最終是通過 Statefulset 控制器進行部署的,是以我們這裡需要通過 storageclass 來做資料持久化,首先建立一個 StorageClass 對象:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-data-db
provisioner: fuseim.pri/ifs      

這裡我們聲明一個 StorageClass 對象,其中 provisioner=fuseim.pri/ifs,則是因為我們叢集中使用的是 nfs 作為存儲後端,而前面我們課程中建立的 nfs-client-provisioner 中指定的 PROVISIONER_NAME 就為 fuseim.pri/ifs,這個名字不能随便更改,将該檔案儲存為 prometheus-storageclass.yaml:

$ kubectl create -f prometheus-storageclass.yaml
storageclass.storage.k8s.io "prometheus-data-db" created      

然後在 prometheus 的 CRD 資源對象中添加如下配置:

storage:
  volumeClaimTemplate:
    spec:
      storageClassName: prometheus-data-db
      resources:
        requests:
          storage: 10Gi      

注意這裡的 storageClassName 名字為上面我們建立的 StorageClass 對象名稱,然後更新 prometheus 這個 CRD 資源。更新完成後會自動生成兩個 PVC 和 PV 資源對象:

$ kubectl get pvc -n monitoring
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
prometheus-k8s-db-prometheus-k8s-0   Bound     pvc-0cc03d41-047a-11e9-a777-525400db4df7   10Gi       RWO            prometheus-data-db   8m
prometheus-k8s-db-prometheus-k8s-1   Bound     pvc-1938de6b-047b-11e9-a777-525400db4df7   10Gi       RWO            prometheus-data-db   1m
$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                           STORAGECLASS         REASON    AGE
pvc-0cc03d41-047a-11e9-a777-525400db4df7   10Gi       RWO            Delete           Bound       monitoring/prometheus-k8s-db-prometheus-k8s-0   prometheus-data-db             2m
pvc-1938de6b-047b-11e9-a777-525400db4df7   10Gi       RWO            Delete           Bound       monitoring/prometheus-k8s-db-prometheus-k8s-1   prometheus-data-db             1m      
$ kubectl get pod prometheus-k8s-0 -n monitoring -o yaml
......
    volumeMounts:
    - mountPath: /etc/prometheus/config_out
      name: config-out
      readOnly: true
    - mountPath: /prometheus
      name: prometheus-k8s-db
......
  volumes:
......
  - name: prometheus-k8s-db
    persistentVolumeClaim:
      claimName: prometheus-k8s-db-prometheus-k8s-0
......      
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-data-db
        resources:
          requests:
            storage: 10Gi
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  secrets:
  - etcd-certs
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.5.0