
Notes on Pod Health Checks and Service Availability Checks in Kubernetes (LivenessProbe + ReadinessProbe)

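Preface

  • I'm learning k8s, and I'm organizing these notes to help me remember.
  • This post covers:
    • Basic theory of the two probe types, LivenessProbe and ReadinessProbe
    • Demos of the three check methods: ExecAction, TCPSocketAction, and HTTPGetAction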

The bright Mid-Autumn moon belongs to rich households and poor ones alike. A great comfort to the heart. (Fenghuo Xi Zhuhou, 《劍來》)


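Pod Health Checks and Service Availability Checks

Why health checks are needed

Purpose of probing: to keep Pods robust. When a Pod dies, its Deployment creates a new Pod. But if the Pod is still running while something inside it has gone wrong, the Deployment cannot detect that. Probes are therefore needed to check whether the Pod is actually serving correctly.

Probe types

The health of a Kubernetes Pod can be checked with two kinds of probes, LivenessProbe and ReadinessProbe, which the kubelet runs periodically to diagnose the health of each container. Probes are declared per container in the Pod spec and executed by the kubelet on the Pod's node, not by the Deployment, so they apply equally to Pods that no Deployment manages.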

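Probe type       Description
LivenessProbe    Determines whether the container is alive (Running state). If the LivenessProbe finds the container unhealthy, the kubelet kills the container and handles it according to the container's restart policy. If a container defines no LivenessProbe, the kubelet treats its LivenessProbe result as always Success.
ReadinessProbe   Determines whether the container's service is available (Ready state). Only Pods in the Ready state receive requests. For Pods selected by a Service, the association between the Service and the Pod's Endpoint is also based on whether the Pod is Ready. If a Pod's Ready state becomes False at runtime, the system automatically removes it from the Service's backend Endpoint list, and adds it back once it returns to Ready. This ensures that clients accessing the Service are not forwarded to a Pod instance whose service is unavailable.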
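As a minimal sketch (the Pod name probe-demo, the container name web, and the probe targets are illustrative assumptions, not from the original), both probes are declared side by side on the same container entry: the liveness probe decides restarts, the readiness probe decides whether the Pod receives Service traffic.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo            # hypothetical name
spec:
  containers:
  - name: web                 # hypothetical container name
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:            # on failure: kubelet restarts the container (per restartPolicy)
      tcpSocket:
        port: 80
      periodSeconds: 10
    readinessProbe:           # on failure: Pod marked not Ready, removed from Service endpoints
      httpGet:
        path: /index.html
        port: 80
      periodSeconds: 5
```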

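Check methods and parameter configuration

Both LivenessProbe and ReadinessProbe can be configured with any of the following three methods.

Method                    Description
ExecAction (exec)         Runs a command inside the container; if the command exits with code 0, the container is considered healthy.
TCPSocketAction (tcpSocket)   Performs a TCP check against the container's IP address and port; if a TCP connection can be established, the container is considered healthy.
HTTPGetAction (httpGet)   Sends an HTTP GET request to the container's IP address, port, and path; if the response status code is greater than or equal to 200 and less than 400, the container is considered healthy.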

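For each check method you can also set parameters such as initialDelaySeconds and timeoutSeconds. Their meanings are as follows.

Parameter             Description
initialDelaySeconds   How long to wait after the container starts before running the first probe, in seconds. Default 0.
timeoutSeconds        How long to wait for a response after a probe is sent, in seconds. Default 1, minimum 1. A timeout counts as one probe failure; the kubelet does not restart the container on a single timeout, only after failureThreshold consecutive failures of a liveness probe.
periodSeconds         How often the probe runs. Default 10 seconds, minimum 1 second.
successThreshold      Minimum number of consecutive successes, after a failure, for the probe to be considered successful again. Default 1, minimum 1; must be 1 for liveness probes.
failureThreshold      How many consecutive failures Kubernetes tolerates before giving up. For a liveness probe, giving up means restarting the container; for a readiness probe, the Pod is marked as not ready. Default 3, minimum 1.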
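A sketch of a probe with every timing parameter written out at its default value (the httpGet target is an illustrative assumption), plus the rough arithmetic for how long a broken container can keep running before the kubelet restarts it:

```yaml
# Hypothetical liveness probe with every timing field set to its default.
livenessProbe:
  httpGet:
    path: /index.html      # illustrative target (nginx welcome page)
    port: 80
  initialDelaySeconds: 0   # default: probing starts right after the container starts
  periodSeconds: 10        # default 10, minimum 1
  timeoutSeconds: 1        # default 1, minimum 1; a timeout counts as one failure
  successThreshold: 1      # default 1; must stay 1 for a liveness probe
  failureThreshold: 3      # default 3, minimum 1
# Rough restart delay once the app breaks (after the initial delay):
#   failureThreshold * periodSeconds = 3 * 10 = about 30 seconds
```

With the exec demo below (periodSeconds: 5, failureThreshold left at 3), the same arithmetic gives roughly 15 seconds.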

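The ReadinessProbe mechanism may not be enough for some complex applications to decide whether the services inside a container are available.

Kubernetes therefore introduced the Pod Ready++ feature in version 1.11 to extend the readiness mechanism; it reached GA (stable) in version 1.14 under the name Pod Readiness Gates.

With Pod Readiness Gates, users can attach custom readiness conditions to a Pod to help Kubernetes decide when the Pod becomes available (Ready). For a custom condition to take effect, the user must provide an external controller that sets the corresponding Condition status.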

Readiness gates are configured in the readinessGates field of the Pod spec. For example, a Pod can declare a new readiness gate of type www.example.com/feature-1.
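A minimal sketch of such a Pod, modeled on the example in the Kubernetes Pod lifecycle documentation; the Pod name and the nginx container are placeholders. The user only declares the gate in spec; the matching condition in status is written by the external controller, so it is shown here only as a comment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-gate-demo        # placeholder name
spec:
  readinessGates:
  - conditionType: "www.example.com/feature-1"
  containers:
  - name: web                      # placeholder container
    image: nginx
# Patched into status by the external controller, not by kubectl apply:
# status:
#   conditions:
#   - type: "www.example.com/feature-1"
#     status: "True"
```

Until that condition is True, the Pod stays not Ready even if all of its containers' readiness probes pass.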

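The status of each new custom condition is set by the user's own external controller and defaults to False. Kubernetes sets the Pod to Ready (Ready=True) only when all readinessGates conditions are True.
I don't fully understand this part yet; I need to study it further later.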

Preparing the lab environment

┌──[[email protected]]-[~/ansible]
└─$mkdir liveness-probe
┌──[[email protected]]-[~/ansible]
└─$cd liveness-probe/
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  create ns liveness-probe
namespace/liveness-probe created
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl config current-context
kubernetes-admin@kubernetes
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl config set-context $(kubectl config current-context) --namespace=liveness-probe
Context "kubernetes-admin@kubernetes" modified.           

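LivenessProbe

Determines whether the container is alive (Running state). If the LivenessProbe finds the container unhealthy, the kubelet kills the container and handles it according to the container's restart policy.

ExecAction method (exec command)

Resource file definition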

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$cat liveness-probe.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod-liveness
  name: pod-liveness
spec:
  containers:
  - args:
    - /bin/sh
    - -c
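    # the final sleep must outlast the probe's failure window (about 15s here);
    # otherwise the shell exits on its own and the restart is not caused by the probe
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600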
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5 # do not probe during the first 5s after the container starts
      periodSeconds: 5 # probe every 5s
    image: busybox
    imagePullPolicy: IfNotPresent
    name: pod-liveness
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always 
status: {}           

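Apply this Pod. After the Pod is created, the container creates the file, sleeps for 30s, deletes the file, and then sleeps again. The liveness probe checks whether the file exists.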

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  apply  -f liveness-probe.yaml
pod/pod-liveness created
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  get pods
NAME           READY   STATUS    RESTARTS     AGE
pod-liveness   1/1     Running   1 (8s ago)   41s   # already restarted once

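Once more than 30s have passed, the file is deleted and the probe starts failing; after failureThreshold (default 3) consecutive failures, the kubelet kills the container and restarts it according to the restart policy.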

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  get pods
NAME           READY   STATUS    RESTARTS      AGE
pod-liveness   1/1     Running   2 (34s ago)   99s           

After 99s, the container has already been restarted a second time.

┌──[[email protected]]-[~/ansible]
└─$ansible 192.168.26.83 -m shell -a "docker ps | grep pod-liveness"
192.168.26.83 | CHANGED | rc=0 >>
00f4182c014e   7138284460ff                                        "/bin/sh -c 'touch /…"   6 seconds ago   Up 5 seconds             k8s_pod-liveness_pod-liveness_liveness-probe_81b4b086-fb28-4657-93d0-bd23e67f980a_0
01c5cfa02d8c   registry.aliyuncs.com/google_containers/pause:3.5   "/pause"                 7 seconds ago   Up 6 seconds             k8s_POD_pod-liveness_liveness-probe_81b4b086-fb28-4657-93d0-bd23e67f980a_0
┌──[[email protected]]-[~/ansible]
└─$kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
pod-liveness   1/1     Running   0          25s
┌──[[email protected]]-[~/ansible]
└─$kubectl get pods
NAME           READY   STATUS    RESTARTS      AGE
pod-liveness   1/1     Running   1 (12s ago)   44s
┌──[[email protected]]-[~/ansible]
└─$ansible 192.168.26.83 -m shell -a "docker ps | grep pod-liveness"
192.168.26.83 | CHANGED | rc=0 >>
1eafd7e8a12a   7138284460ff                                        "/bin/sh -c 'touch /…"   15 seconds ago   Up 14 seconds             k8s_pod-liveness_pod-liveness_liveness-probe_81b4b086-fb28-4657-93d0-bd23e67f980a_1
01c5cfa02d8c   registry.aliyuncs.com/google_containers/pause:3.5   "/pause"                 47 seconds ago   Up 47 seconds             k8s_POD_pod-liveness_liveness-probe_81b4b086-fb28-4657-93d0-bd23e67f980a_0
┌──[[email protected]]-[~/ansible]
└─$           

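Checking the container IDs in Docker on the node: the application container's ID differs before and after (00f4182c014e vs 1eafd7e8a12a, with its name suffix going from _0 to _1), while the pause container 01c5cfa02d8c stays the same. So it is the container that is killed and restarted inside the same Pod; the Pod itself is not recreated.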

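HTTPGetAction method

Sends an HTTP GET request to the container's IP address, port, and path; if the response status code is greater than or equal to 200 and less than 400, the container is considered healthy.

Create the resource file, with the related parameters annotated: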

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$cat liveness-probe-http.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod-livenss-probe
  name: pod-livenss-probe
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod-livenss-probe
    livenessProbe:
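      failureThreshold: 3 # consecutive failures tolerated before the container is restarted
      httpGet:
        path: /index.html
        port: 80
        scheme: HTTP
      initialDelaySeconds: 10  # seconds to wait after the container starts before the first probe
      periodSeconds: 10   # how often to probe; default 10s, minimum 1s
      successThreshold: 1 # consecutive successes needed after a failure to count as success again
      timeoutSeconds: 10 # probe timeout; default 1s, minimum 1s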
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}           

Apply the Pod. This probe requests Nginx's default welcome page.

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$vim liveness-probe-http.yaml
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl apply  -f liveness-probe-http.yaml
pod/pod-livenss-probe created
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  get pods
NAME                READY   STATUS    RESTARTS   AGE
pod-livenss-probe   1/1     Running   0          15s
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl exec -it pod-livenss-probe -- rm /usr/share/nginx/html/index.html
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  get pods
NAME                READY   STATUS    RESTARTS     AGE
pod-livenss-probe   1/1     Running   1 (1s ago)   2m31s
           

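Once the welcome page is deleted, the request fails (Nginx returns 404, outside the 200-399 range), the probe fails, and the container is restarted. The restarted container starts fresh from the image, so index.html is back afterwards.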

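TCPSocketAction method

Performs a TCP check against the container's IP address and port; if a TCP connection can be established, the container is considered healthy.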

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$cat liveness-probe-tcp.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod-livenss-probe
  name: pod-livenss-probe
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod-livenss-probe
    livenessProbe:
      failureThreshold: 3
      tcpSocket:
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}           

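The probe connects to port 8080, but nothing in the nginx container listens on 8080 (nginx listens on 80), so the TCP connection cannot be established. The probe fails, and after failureThreshold consecutive failures the container is restarted.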
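For comparison, a sketch of the same livenessProbe block pointed at port 80, where the nginx image listens by default. Assuming nothing else in liveness-probe-tcp.yaml changes, the TCP connection should succeed and RESTARTS should stay at 0:

```yaml
# Hypothetical fix: probe the port nginx actually listens on.
livenessProbe:
  failureThreshold: 3
  tcpSocket:
    port: 80          # nginx's default listening port, so the TCP connection succeeds
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 10
```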

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  apply  -f liveness-probe-tcp.yaml
pod/pod-livenss-probe created
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  get pods
NAME                READY   STATUS    RESTARTS   AGE
pod-livenss-probe   1/1     Running   0          8s
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  get pods
NAME                READY   STATUS    RESTARTS     AGE
pod-livenss-probe   1/1     Running   1 (4s ago)   44s
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$           

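ReadinessProbe

Determines whether the container's service is available (Ready state). Only Pods in the Ready state receive requests; Pods that are not Ready cannot be reached through a Service.

Resource file definition. A postStart lifecycle hook creates the file that the probe checks: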

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$cat readiness-probe.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod-liveness
  name: pod-liveness
spec:
  containers:
  - readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5 # do not probe during the first 5s after the container starts
      periodSeconds: 5 # probe every 5s
    image: nginx
    imagePullPolicy: IfNotPresent
    name: pod-liveness
    resources: {}
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c","touch /tmp/healthy"]
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}           

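Create three Pods running Nginx (pod-liveness from readiness-probe.yaml itself, plus two copies renamed with sed), then create a Service from them for testing.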

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$sed 's/pod-liveness/pod-liveness-1/' readiness-probe.yaml | kubectl apply  -f -
pod/pod-liveness-1 created
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$sed 's/pod-liveness/pod-liveness-2/' readiness-probe.yaml | kubectl apply  -f -
pod/pod-liveness-2 created
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  get pods -o wide
NAME             READY   STATUS    RESTARTS   AGE    IP             NODE                         NOMINATED NODE   READINESS GATES
pod-liveness     1/1     Running   0          3m1s   10.244.70.50   vms83.liruilongs.github.io   <none>           <none>
pod-liveness-1   1/1     Running   0          2m     10.244.70.51   vms83.liruilongs.github.io   <none>           <none>
pod-liveness-2   1/1     Running   0          111s   10.244.70.52   vms83.liruilongs.github.io   <none>           <none>           

Change each Pod's home page text to its own name

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$serve=pod-liveness
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl exec  -it $serve -- sh -c "echo $serve > /usr/share/nginx/html/index.html"
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl exec  -it $serve -- sh -c "cat /usr/share/nginx/html/index.html"
pod-liveness
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$serve=pod-liveness-1
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl exec  -it $serve -- sh -c "echo $serve > /usr/share/nginx/html/index.html"
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$serve=pod-liveness-2
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl exec  -it $serve -- sh -c "echo $serve > /usr/share/nginx/html/index.html"
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$           

Change the labels: edit pod-liveness-1 and pod-liveness-2 so that all three Pods carry run=pod-liveness

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl get pods --show-labels
NAME             READY   STATUS    RESTARTS   AGE   LABELS
pod-liveness     1/1     Running   0          15m   run=pod-liveness
pod-liveness-1   1/1     Running   0          14m   run=pod-liveness-1
pod-liveness-2   1/1     Running   0          14m   run=pod-liveness-2
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl edit pods pod-liveness-1
pod/pod-liveness-1 edited
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl edit pods pod-liveness-2
pod/pod-liveness-2 edited
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl get pods --show-labels
NAME             READY   STATUS    RESTARTS   AGE   LABELS
pod-liveness     1/1     Running   0          17m   run=pod-liveness
pod-liveness-1   1/1     Running   0          16m   run=pod-liveness
pod-liveness-2   1/1     Running   0          16m   run=pod-liveness
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$           

Check the file the readiness probe looks for (it will be deleted later)

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  exec -it pod-liveness -- ls /tmp/
healthy
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  exec -it pod-liveness-1 -- ls /tmp/
healthy           

Create a Service named svc from the Pod

┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl expose  --name=svc pod pod-liveness --port=80
service/svc exposed
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl  get ep
NAME   ENDPOINTS                                         AGE
svc    10.244.70.50:80,10.244.70.51:80,10.244.70.52:80   16s
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl get svc
NAME   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
svc    ClusterIP   10.104.246.121   <none>        80/TCP    36s
┌──[[email protected]]-[~/ansible/liveness-probe]
└─$kubectl get pods -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP             NODE                         NOMINATED NODE   READINESS GATES
pod-liveness     1/1     Running   0          24m   10.244.70.50   vms83.liruilongs.github.io   <none>           <none>
pod-liveness-1   1/1     Running   0          23m   10.244.70.51   vms83.liruilongs.github.io   <none>           <none>
pod-liveness-2   1/1     Running   0          23m   10.244.70.52   vms83.liruilongs.github.io   <none>           <none>           
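The kubectl expose command above is roughly equivalent to applying a Service manifest like the following sketch. kubectl expose copies the Pod's labels into the Service selector, which is why relabeling pod-liveness-1 and pod-liveness-2 to run=pod-liveness puts all three Pods behind svc:

```yaml
# Sketch of a Service roughly equivalent to:
#   kubectl expose --name=svc pod pod-liveness --port=80
apiVersion: v1
kind: Service
metadata:
  name: svc
  namespace: liveness-probe
spec:
  type: ClusterIP
  selector:
    run: pod-liveness     # taken from the labels of Pod pod-liveness
  ports:
  - protocol: TCP
    port: 80              # Service port
    targetPort: 80        # defaults to port when --target-port is not given
```

Only Pods that match the selector and are Ready are listed as endpoints, which is what the readiness test that follows relies on.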

The Service works: requests are load-balanced across all three Pods.

┌──[[email protected]]-[~/ansible]
└─$while true; do curl 10.104.246.121 ; sleep 1
> done
pod-liveness
pod-liveness-2
pod-liveness
pod-liveness-1
pod-liveness-2
^C           

Delete the file to test

┌──[[email protected]]-[~/ansible]
└─$kubectl exec -it pod-liveness -- rm  -rf /tmp/
┌──[[email protected]]-[~/ansible]
└─$kubectl exec -it pod-liveness -- ls /tmp/
ls: cannot access '/tmp/': No such file or directory
command terminated with exit code 2
┌──[[email protected]]-[~/ansible]
└─$while true; do curl 10.104.246.121 ; sleep 1; done
pod-liveness-2
pod-liveness-2
pod-liveness-2
pod-liveness-1
pod-liveness-2
pod-liveness-2
pod-liveness-1
^C
           

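pod-liveness no longer serves traffic: its readiness probe now fails, so it is marked not Ready and removed from the Service's endpoints. Unlike a liveness failure, the container itself is not restarted.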

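Some health checks in kubeadm

kube-apiserver.yaml (the static Pod manifest kubeadm generates) uses both probes together: readiness checks /readyz and liveness checks /livez, both over HTTPS on port 6443.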

┌──[[email protected]]-[~/ansible]
└─$cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep -A 8 readi
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: 192.168.26.81
        path: /readyz
        port: 6443
        scheme: HTTPS
      periodSeconds: 1
      timeoutSeconds: 15
┌──[[email protected]]-[~/ansible]
└─$cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep -A 9 liveness
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 192.168.26.81
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
┌──[[email protected]]-[~/ansible]
└─$