Kubernetes Administration Experience

Cluster management commands

kubectl get cs

# List nodes
kubectl get nodes

kubectl get ing pdd -n java
# Taint the node so that pods without a matching toleration are not scheduled onto it
kubectl taint nodes node1 key=value:NoSchedule
kubectl cluster-info dump


kubectl get svc --sort-by=.metadata.creationTimestamp
kubectl get no --sort-by=.metadata.creationTimestamp
kubectl get po --field-selector spec.nodeName=xxxx
kubectl get events  --field-selector involvedObject.kind=Service --sort-by='.metadata.creationTimestamp'

Reference:

Kubernetes node maintenance: cordon, drain, uncordon

Application management

kubectl top pod
kubectl delete deployment,services -l app=nginx 
kubectl scale deployment/nginx-deployment --replicas=2
kubectl get svc --all-namespaces=true

Force deletion

Deleting a PV/PVC sometimes hangs. In that case add these two flags to the delete command:

--grace-period=0 --force

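Delete all failed pods

# List failed pods across all namespaces
kubectl get po --all-namespaces --field-selector 'status.phase==Failed'
# Delete failed pods in the current namespace (add --all-namespaces to cover every namespace)
kubectl delete po --field-selector 'status.phase==Failed'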

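Tips

Kubernetes currently has no startup-dependency mechanism comparable to depends_on in docker-compose. A workaround is to override the image's command with wait-for-it, so that the container waits for its dependencies before starting.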
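The idea behind wait-for-it is simply to block until a dependency accepts TCP connections and only then start the real process. Here is a minimal Python sketch of that idea. The function name wait_for_port and its parameters are hypothetical, and this is not the wait-for-it script itself.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the dependency is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up after the deadline, otherwise retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A container command could call something like this first and start the real entrypoint only when it returns True.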

Cluster management experience (and lessons learned)

Node issues

Do not use taints carelessly

kubectl taint nodes xx  elasticsearch-test-ready=true:NoSchedule
kubectl taint nodes xx  elasticsearch-test-ready:NoSchedule-

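Master nodes carry a taint out of the box, which is why the workloads we deploy do not run on them. Custom taints need care, though. Every DaemonSet and every kube-system component must carry the matching tolerations. Otherwise a NoExecute taint evicts every pod on the node that lacks the toleration, including the network plugin and kube-proxy, and the consequences are severe. Taints and tolerations come in matching pairs, and the operator has to be chosen carefully as well.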

NoExecute

tolerations:
        - key: "elasticsearch-exclusive"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"

kubectl taint node cn-shenzhen.xxxx elasticsearch-exclusive=true:NoExecute

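NoExecute immediately evicts every pod that does not tolerate the taint. This is a very dangerous operation, so first make sure the system components are configured with the matching tolerations.

Note in particular that, in my experience, the Exists operator had no effect here and I had to use Equal. The Kubernetes documentation describes Exists as matching any value of the key, so verify this on your own cluster version.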

NoSchedule

tolerations:
        - key: "elasticsearch-exclusive"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "elasticsearch-exclusive"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"

kubectl taint node cn-shenzhen.xxxx elasticsearch-exclusive=true:NoSchedule

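NoSchedule keeps new pods without a matching toleration from being scheduled onto the node, but pods that are already running there stay, so you will still see pods on it.

Exists and Equal can be used freely here; the choice makes little difference.

It is worth noting that the same key can carry several effects at the same time: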

Taints:             elasticsearch-exclusive=true:NoExecute
                    elasticsearch-exclusive=true:NoSchedule
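To see why a taint and its toleration have to line up, it helps to model the matching rule. The sketch below follows the rules as described in the Kubernetes documentation (same key, same effect, then Exists or an equal value). The Taint and Toleration classes and the function names are hypothetical, and this is an illustration rather than the scheduler's actual code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str            # "NoSchedule", "PreferNoSchedule" or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""          # an empty key with Exists matches every key
    operator: str = "Equal"
    value: str = ""
    effect: str = ""       # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True when this toleration matches the given taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return False


def untolerated(taints: Iterable[Taint], tolerations: Iterable[Toleration]) -> List[Taint]:
    """List the taints that none of the tolerations match."""
    tolerations = list(tolerations)
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]
```

Feeding it the two taints shown above together with the tolerations from the NoSchedule example leaves no untolerated taint, while a pod without tolerations is left with both.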


The correct steps for isolating a node

# Evict every pod except those managed by a DaemonSet
kubectl drain <node name>   --ignore-daemonsets
kubectl cordon <node name>

Running kubectl get node at this point shows the changed status:

node.xx   Ready,SchedulingDisabled   <none>   189d   v1.11.5

Finally:

kubectl delete node <node name>

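The correct steps for maintaining a node

# Evict the pods before starting maintenance
kubectl drain <node name> --ignore-daemonsets
# Once maintenance is done, allow scheduling again
kubectl uncordon <node name>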

Node reports disk pressure (DiskPressure)

--eviction-hard=imagefs.available<15%,memory.available<300Mi,nodefs.available<10%,nodefs.inodesFree<5%

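kubelet is started with hard eviction thresholds for disk pressure. Taking the Alibaba Cloud settings above as an example, imagefs.available<15% means that once less than 15% of the filesystem holding images and container writable layers is free, kubelet starts evicting pods. The node then reports the DiskPressure condition and cannot run any new workloads until the disk problem is resolved. If containers on the node use host directories, this can be fatal: you cannot simply delete those directories, yet it is precisely the data piling up in them that triggered the eviction.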
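To make the threshold string easier to read, here is a small Python sketch that parses a value in the same format as the flag above and checks whether a signal has crossed its threshold. The function names are hypothetical, only the % and Mi suffixes that appear in this flag are handled, and kubelet's real parser supports more.

```python
from typing import Dict, Tuple


def parse_eviction_hard(spec: str) -> Dict[str, Tuple[float, str]]:
    """Parse 'signal<quantity,...' into {signal: (number, unit)}."""
    thresholds: Dict[str, Tuple[float, str]] = {}
    for item in spec.split(","):
        signal, _, quantity = item.strip().partition("<")
        if quantity.endswith("%"):
            thresholds[signal] = (float(quantity[:-1]), "%")
        elif quantity.endswith("Mi"):
            thresholds[signal] = (float(quantity[:-2]), "Mi")
        else:
            raise ValueError(f"unsupported quantity: {quantity!r}")
    return thresholds


def breached(thresholds: Dict[str, Tuple[float, str]], signal: str,
             available: float, capacity: float) -> bool:
    """True when the available amount has dropped below the threshold.

    For byte-based signals, pass available and capacity in MiB.
    """
    number, unit = thresholds[signal]
    if unit == "%":
        return available / capacity * 100 < number
    return available < number
```

With the flag from this section, an image filesystem with 10% free counts as breached, while one with 20% free does not.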

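So build good habits: do not write files carelessly inside containers (writes inside a container consume ephemeral-storage, and a pod that uses too much of it is evicted), prefer stateless containers, choose storage carefully, and avoid hostPath volumes where possible.

When it does happen, you really do feel helpless.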

Events:
  Type     Reason                 Age                   From                                            Message
  ----     ------                 ----                  ----                                            -------
  Warning  FreeDiskSpaceFailed    23m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 5182058496 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    18m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 6089891840 bytes, but freed 0 bytes
  Warning  ImageGCFailed          18m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 6089891840 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    13m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 4953321472 bytes, but freed 0 bytes
  Warning  ImageGCFailed          13m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 4953321472 bytes, but freed 0 bytes
  Normal   NodeHasNoDiskPressure  10m (x5 over 47d)     kubelet, node.xxxx1     Node node.xxxx1 status is now: NodeHasNoDiskPressure
  Normal   Starting               10m                   kube-proxy, node.xxxx1  Starting kube-proxy.
  Normal   NodeHasDiskPressure    10m (x4 over 42m)     kubelet, node.xxxx1     Node node.xxxx1 status is now: NodeHasDiskPressure
  Warning  EvictionThresholdMet   8m29s (x19 over 42m)  kubelet, node.xxxx1     Attempting to reclaim ephemeral-storage
  Warning  ImageGCFailed          3m4s                  kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 4920913920 bytes, but freed 0 bytes

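Node CPU spikes

The node may be running garbage collection (container GC or image GC). Check with kubectl describe node. I ran into this once, and afterwards far fewer containers were left on the node, which was rather frustrating.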

Events:
  Type     Reason                 Age                 From                                         Message
  ----     ------                 ----                ----
  Warning  ImageGCFailed          45m                 kubelet, cn-shenzhen.xxxx  failed to get image stats: rpc error: code = DeadlineExceeded desc = context deadline exceeded

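Reference:

kubelet source code analysis: Garbage Collect

Object issues

pod

Pods restart frequently

There are several possible causes, and they cannot be lumped together.

Resource usage reaches the configured limit

Raise the limit or inspect the application.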

Readiness/Liveness connection refused

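A failed liveness probe restarts the container, whereas a failed readiness probe only removes the pod from the Service endpoints. Either way, a failed probe is not necessarily the application's fault: if the node itself is overloaded, probes also fail with connection refused or timeout.

This has to be investigated on the node itself.

Pods are evicted (Evicted)

A taint added to the node caused the eviction
ephemeral-storage exceeded its limit, which happens in three cases:
1. The usage of an emptyDir exceeds its sizeLimit, so the pod is evicted.
2. The usage of a container (logs, plus imagefs if there is no separate overlay partition) exceeds its limit, so the pod is evicted.
3. The pod's total local ephemeral storage usage (all emptyDir volumes and containers) exceeds the sum of the limits of all its containers, so the pod is evicted.

ephemeral-storage is the temporary local storage a pod uses.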

resources:
       requests: 
           ephemeral-storage: "2Gi"
       limits:
           ephemeral-storage: "3Gi"
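The three ephemeral-storage rules listed above can be expressed as a small check. The following Python sketch is only a model of those rules: the EmptyDir and Container classes are hypothetical, and as a simplifying assumption the pod-level rule is applied only when every container declares a limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EmptyDir:
    used: int                          # bytes currently used
    size_limit: Optional[int] = None   # sizeLimit in bytes, if set


@dataclass
class Container:
    used: int                          # bytes used by logs and the writable layer
    limit: Optional[int] = None        # ephemeral-storage limit in bytes, if set


def eviction_reasons(empty_dirs: List[EmptyDir], containers: List[Container]) -> List[str]:
    """Return the rules (1 to 3) that this pod currently violates."""
    reasons = []
    # Rule 1: an emptyDir grew past its own sizeLimit.
    for i, vol in enumerate(empty_dirs):
        if vol.size_limit is not None and vol.used > vol.size_limit:
            reasons.append(f"emptyDir[{i}] exceeds its sizeLimit")
    # Rule 2: a container grew past its own limit.
    for i, c in enumerate(containers):
        if c.limit is not None and c.used > c.limit:
            reasons.append(f"container[{i}] exceeds its limit")
    # Rule 3: the pod as a whole grew past the sum of the container limits.
    if containers and all(c.limit is not None for c in containers):
        total_used = sum(v.used for v in empty_dirs) + sum(c.used for c in containers)
        total_limit = sum(c.limit for c in containers)
        if total_used > total_limit:
            reasons.append("pod total exceeds the sum of container limits")
    return reasons
```

For example, a pod whose only container stays under its 3Gi limit can still be evicted under rule 3 if an emptyDir without a sizeLimit pushes the total above 3Gi.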

An evicted pod still shows up in kubectl get po, and kubectl describe shows why it was evicted:

Message: The node was low on resource: ephemeral-storage. Container codis-proxy was using 10619440Ki, which exceeds its request of 0.

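kubectl exec fails to enter the container

I hit this while setting up codis-server. No readiness or liveness probes were configured at the time, and describing the pod showed Running, but the container was in fact already broken.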

~ kex codis-server-3 sh
rpc error: code = 2 desc = containerd: container not found
command terminated with exit code 126

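Fix: delete the pod and configure a livenessProbe.

Virtual host name of a pod

For a pod created by a Deployment, the virtual host name is simply the pod name.

For a pod in a StatefulSet, the virtual host name is <pod name>.<svc name>.<namespace>.svc.cluster.local. Compared with a Deployment this is more regular, and other pods can reach the pod through that name.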

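Pods go into CrashLoopBackOff one after another

CrashLoopBackOff has several possible causes.

Sandbox creation failed (FailedCreateSandBox): this is mostly a CNI network plugin problem.

Image pulls: there are China-specific network issues, and the image may simply be too large, so pulls are slow.

Another possibility is an avalanche caused by too much concurrent traffic hitting the containers.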

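Say there are three containers a, b and c. If a is hit by a traffic spike, crashes internally and goes into CrashLoopBackOff, the service removes a from its backends. The remaining b and c cannot carry that much traffic either, so they crash one after another and the site ends up unreachable. This is mostly seen with high-concurrency sites running on inefficient web containers.

Without changing the code, the best fix is to increase the replica count and add an HPA so the workload scales in and out dynamically.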
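To get a feel for how an HPA would react to such a spike, here is a Python sketch of the scaling rule described in the Kubernetes documentation, desired = ceil(current replicas x current metric / target metric), with the documented default tolerance of 0.1. The function name and the min/max defaults are hypothetical, and the real controller also handles details such as unready pods and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Estimate the replica count an HPA would ask for."""
    ratio = current_metric / target_metric
    # Within the tolerance band the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

With 3 replicas at 90% CPU against a 50% target, the sketch asks for 6 replicas, which is the kind of headroom that keeps b and c from following a.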

deploy

MinimumReplicasUnavailable

This shows up when a Deployment sets a SecurityContext but the API server rejects it. In the API server's startup arguments, remove SecurityContextDeny from the admission plugins.

For details see Using Admission Controllers.

service

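What happens if a Service is created but has no matching pods?

Requests get no response until they hit the request timeout.

Reference:

Configure Out Of Resource Handling

Service connection refused

Possible causes:

The pod has no readinessProbe, so requests reach a pod that is not ready yet
kube-proxy is down (kube-proxy is responsible for forwarding requests)
The network is overloaded

Service does not load-balance

Check whether a headless service is in use. A headless service does not load-balance automatically.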

kind: Service
spec:
# clusterIP: None makes this a `headless service`
  type: ClusterIP
  clusterIP: None

In practice the service has no virtual IP of its own: nslookup returns the IPs of all pods, but ping only ever hits the first pod's IP.

/ # nslookup consul
nslookup: can't resolve '(null)': Name does not resolve

Name:      consul
Address 1: 172.31.10.94 172-31-10-94.consul.default.svc.cluster.local
Address 2: 172.31.10.95 172-31-10-95.consul.default.svc.cluster.local
Address 3: 172.31.11.176 172-31-11-176.consul.default.svc.cluster.local

/ # ping consul
PING consul (172.31.10.94): 56 data bytes
64 bytes from 172.31.10.94: seq=0 ttl=62 time=0.973 ms
64 bytes from 172.31.10.94: seq=1 ttl=62 time=0.170 ms
^C
--- consul ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.170/0.571/0.973 ms

/ # ping consul
PING consul (172.31.10.94): 56 data bytes
64 bytes from 172.31.10.94: seq=0 ttl=62 time=0.206 ms
64 bytes from 172.31.10.94: seq=1 ttl=62 time=0.178 ms
^C
--- consul ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.178/0.192/0.206 ms

For a regular type: ClusterIP service, nslookup returns the service's own IP:

/ # nslookup consul
nslookup: can't resolve '(null)': Name does not resolve

Name:      consul
Address 1: 172.30.15.52 consul.default.svc.cluster.local
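Because a headless service hands back every pod IP and leaves the choice to the client, the client has to spread requests itself. The Python sketch below shows one way to do that, assuming plain A-record lookups through socket.getaddrinfo. The function names are hypothetical, and the resolver parameter exists only so the logic can be exercised without a cluster.

```python
import itertools
import socket
from typing import Callable, Iterator, List


def resolve_all(name: str, port: int = 80,
                resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return every distinct IPv4 address the name resolves to, in order."""
    infos = resolver(name, port, socket.AF_INET, socket.SOCK_STREAM)
    ips: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in ips:
            ips.append(ip)
    return ips


def round_robin(ips: List[str]) -> Iterator[str]:
    """Cycle through the resolved addresses instead of always using the first."""
    return itertools.cycle(ips)
```

Inside the cluster, resolve_all("consul") would be expected to return several pod IPs for the headless service and a single cluster IP for the regular one. The list should be refreshed periodically, since pod IPs change.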

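ReplicationController does not update

A ReplicationController is not updated with apply but with kubectl rolling-update. That command has since been retired in favour of kubectl rollout, which works on Deployments, DaemonSets and StatefulSets. So either move to one of those and update with kubectl rollout or, for a lazier approach, apply the file and then delete the pods.

Prefer Deployments where possible.

StatefulSet update fails

A StatefulSet updates its pods one at a time. Check whether any pod is in CrashLoopBackOff; that pod may be what is blocking the update, and deleting it is enough.

Advanced scheduling

Use node affinity to make sure pods run on the target nodes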

nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: elasticsearch-test-ready
                operator: Exists

Use pod anti-affinity to make sure each node runs at most one pod of the same application

affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: 'app'
                operator: In
                values:
                - nginx-test2
            topologyKey: "kubernetes.io/hostname"
            namespaces: 
            - test

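Running with tolerations

Master nodes do not accept ordinary workloads because they carry taints. To force a workload onto a master, it has to tolerate the corresponding taints.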

tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists
        - effect: NoSchedule
          key: node.cloudprovider.kubernetes.io/uninitialized
          operator: Exists

Alibaba Cloud Kubernetes issues

Changing the default ingress

Create a LoadBalancer-type svc that points at the ingress, then change the startup arguments of nginx-ingress-controller in kube-system.

- args:
            - /nginx-ingress-controller
            - '--configmap=$(POD_NAMESPACE)/nginx-configuration'
            - '--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services'
            - '--udp-services-configmap=$(POD_NAMESPACE)/udp-services'
            - '--annotations-prefix=nginx.ingress.kubernetes.io'
            - '--publish-service=$(POD_NAMESPACE)/<custom svc>'
            - '--v=2'

LoadBalancer service never gets an IP

The symptom is that EXTERNAL-IP stays at pending.

~ kg svc consul-web
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
consul-web   LoadBalancer   172.30.13.122   <pending>     443:32082/TCP   5m

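This problem is related to the Alibaba Cloud Provider component. cloud-controller-manager runs three replicas that elect a leader among themselves, and something had probably gone wrong there. I deleted the one pod that was misbehaving and the problem went away.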

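Cleaning up the dynamic PVCs of a StatefulSet

On Alibaba Cloud, dynamic PVCs for a StatefulSet are currently backed by NAS.

For this kind of storage:
1. Scale the replicas down to 0 first, or delete the whole StatefulSet.
2. Delete the PVC.
3. Mount the NAS on any server and delete the NAS directory that corresponds to the PVC.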

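Allocatable node memory shrinks after upgrading to v1.12.6-aliyun.1

This version reserves 1Gi on every node, so the cluster as a whole has N GiB less memory (N being the number of nodes) available for pods.

On a 4G node, a pod that requests 3G is very likely to be evicted.

Consider moving to larger node instance types.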

Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.6-aliyun.1", GitCommit:"8cb561c", GitTreeState:"", BuildDate:"2019-04-22T11:34:20Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
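The arithmetic behind this is short enough to write down. The Python sketch below assumes a node with exactly 4096 MiB of capacity and a pod requesting 3072 MiB (reading the 4G and 3G above as GiB), and it ignores any other reservations or eviction thresholds. The function names are hypothetical.

```python
def allocatable_mib(capacity_mib: int, reserved_mib: int = 1024) -> int:
    """Memory left for pods after the per-node reservation, in MiB."""
    return capacity_mib - reserved_mib


def headroom_mib(capacity_mib: int, request_mib: int, reserved_mib: int = 1024) -> int:
    """Spare allocatable memory once the pod's request is placed, in MiB."""
    return allocatable_mib(capacity_mib, reserved_mib) - request_mib


def cluster_reserved_gib(node_count: int, reserved_mib: int = 1024) -> float:
    """Total memory withheld from pods across the cluster, in GiB."""
    return node_count * reserved_mib / 1024
```

Under these assumptions the 4G node has no headroom left once the 3G pod is placed, so any additional memory pressure on the node lands on that pod.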

Newly added node reports NetworkUnavailable

RouteController failed to create a route

Check the Kubernetes events for a message like this:

timed out waiting for the condition -> WaitCreate: ceate route for table vtb-wz9cpnsbt11hlelpoq2zh error, Aliyun API Error: RequestId: 7006BF4E-000B-4E12-89F2-F0149D6688E4 Status Code: 400 Code: QuotaExceeded Message: Route entry quota exceeded in this route table

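This happens because the VPC's limit on custom route entries has been reached. The default is 48, so the vpc_quota_route_entrys_num quota has to be raised.

References (application scheduling):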