
Kubernetes Cluster in Practice — Pod Scheduling, Affinity and Anti-Affinity, Node Taints, Pod Tolerations

1. The scheduler's role

The scheduler uses the Kubernetes watch mechanism to discover newly created Pods in the cluster that have not yet been bound to a Node. For each unscheduled Pod it finds, the scheduler picks a suitable Node for it to run on.

kube-scheduler is the default scheduler of a Kubernetes cluster and part of the cluster control plane. It is designed so that, if you really want or need to, you can write your own scheduling component and use it in place of kube-scheduler.

Factors taken into account in scheduling decisions include: individual and collective resource requests, hardware/software/policy constraints, affinity and anti-affinity requirements, data locality, interference between workloads, and so on.

Default policies: https://kubernetes.io/zh/docs/concepts/scheduling/kube-scheduler/

Scheduling framework: https://kubernetes.io/zh/docs/concepts/configuration/scheduling-framework/

2. nodeName

nodeName is the simplest form of node selection constraint, but it is generally not recommended.

If nodeName is specified in the PodSpec, it takes precedence over all other node selection methods.

Some limitations of using nodeName to select nodes:

If the named node does not exist, the Pod will not run (and in some cases may be automatically deleted).

If the named node does not have enough resources to accommodate the Pod, scheduling of the Pod fails.

Node names in cloud environments are not always predictable or stable.

Example:

[[email protected] ~]# cd sduler/
[[email protected] sduler]# vim pod.yml 
[[email protected] sduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: server3	## schedule this Pod onto node server3
[[email protected] sduler]# kubectl apply -f pod.yml 
pod/nginx created
[[email protected] sduler]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          13s   10.244.1.30   server3   <none>           <none>
[[email protected] sduler]# 
           

Test: if the specified node does not have enough resources to accommodate the Pod, scheduling of the Pod fails.


3. nodeSelector

nodeSelector is the simplest recommended form of node selection constraint.

Add a label to the chosen node:

kubectl label nodes server2 disktype=ssd

Add a nodeSelector field to the Pod configuration:

[[email protected] sduler]# kubectl label nodes server3 disktype=ssd	## add the label to node server3
node/server3 labeled
[[email protected] sduler]# kubectl get node --show-labels
NAME      STATUS   ROLES    AGE     VERSION   LABELS
server2   Ready    master   5d15h   v1.18.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=server2,kubernetes.io/os=linux,node-role.kubernetes.io/master=
server3   Ready    <none>   5d15h   v1.18.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=server3,kubernetes.io/os=linux
server4   Ready    <none>   5d15h   v1.18.5   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=server4,kubernetes.io/os=linux
[[email protected] sduler]# vim pod.yml 
[[email protected] sduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector: 	## select a node by the label added above
    disktype: ssd
[[email protected] sduler]# kubectl apply -f pod.yml
pod/nginx created
[[email protected] sduler]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          19s   10.244.1.31   server3   <none>           <none>
[[email protected] sduler]# 
           

If no node matches the label, the Pod stays in the Pending state.
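A minimal sketch of that situation (disktype: sas is a hypothetical label that no node above carries):

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pending
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: sas    # no node has this label, so the Pod stays Pending

kubectl describe pod nginx-pending can then be used to see why the scheduler could not place it.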


4. Affinity and anti-affinity

nodeSelector provides a very simple way to constrain Pods to nodes with particular labels.

The affinity/anti-affinity feature greatly expands the types of constraints you can express.

Rules can be expressed as "soft"/"preference" rather than hard requirements, so if the scheduler cannot satisfy such a rule, the Pod is still scheduled.

You can constrain against labels on the Pods already running on a node, rather than against labels on the node itself, which lets you control which Pods may or may not be placed together.

Reference: https://kubernetes.io/zh/docs/concepts/configuration/assign-pod-node/

4.1 Node affinity

requiredDuringSchedulingIgnoredDuringExecution: must be satisfied

preferredDuringSchedulingIgnoredDuringExecution: preferred, satisfied when possible

IgnoredDuringExecution means that if a Node's labels change while the Pod is running, so that the affinity rule is no longer satisfied, the Pod keeps running where it is.
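A small way to observe this behaviour (a sketch, assuming the node-affinity Pod from the example below is already running on server3):

kubectl label nodes server3 disktype-        # remove the disktype label from server3
kubectl get pod node-affinity -o wide        # the already-running Pod stays on server3

Only Pods created after the label change are affected by the rule.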

nodeAffinity also supports several operators in match expressions (a short snippet showing Gt and Exists follows this list):

In: the label's value is in the given list

NotIn: the label's value is not in the given list

Gt: the label's value is greater than the given value (not supported for Pod affinity)

Lt: the label's value is less than the given value (not supported for Pod affinity)

Exists: the label exists on the node

DoesNotExist: the label does not exist on the node
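For reference, a matchExpressions sketch using the operators not shown in the examples below (gpu-count is a hypothetical label used only for illustration):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu-count           # hypothetical label; Gt compares its value numerically with "2"
          operator: Gt
          values:
          - "2"
        - key: disktype
          operator: Exists         # only requires the disktype label to be present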

Node affinity Pod example 1:

[[email protected] sduler]# vim pod.yml 
[[email protected] sduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
           - matchExpressions:
             - key: disktype
               operator: In
               values:
                 - ssd
[[email protected] sduler]# kubectl apply -f pod.yml 
pod/node-affinity created
[[email protected] sduler]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
node-affinity   1/1     Running   0          10s   10.244.1.32   server3   <none>           <none>
[[email protected] sduler]# 
           

Example 2:

[[email protected] sduler]# vim pod.yml 
[[email protected] sduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  containers:
  - name: nginx
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:	## must be satisfied
           nodeSelectorTerms:
           - matchExpressions:
             - key: kubernetes.io/hostname
               operator: NotIn
               values:
               - server1
      preferredDuringSchedulingIgnoredDuringExecution:	## preferred
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd     
[[email protected] sduler]# kubectl apply -f pod.yml 
pod/node-affinity created
[[email protected] sduler]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
node-affinity   1/1     Running   0          5s    10.244.1.33   server3   <none>           <none>
[[email protected] sduler]# 
           

4.2 Pod affinity and anti-affinity

podAffinity decides which Pods a Pod may be deployed with in the same topology domain (a topology domain is defined by node labels; it can be a single node or a group of nodes forming a cluster, a zone, and so on).

podAntiAffinity decides which Pods a Pod must not be deployed with in the same topology domain. Both deal with relationships between Pods inside the Kubernetes cluster.

Inter-Pod affinity and anti-affinity can be even more useful when combined with higher-level collections such as ReplicaSets, StatefulSets, and Deployments: they make it easy to configure a set of workloads that should sit in the same defined topology, for example on the same node.

Pod affinity example:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx

---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql
    env:
     - name: "MYSQL_ROOT_PASSWORD"
       value: "westos"
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
[[email protected] sduler]# vim pod.yml
[[email protected] sduler]# kubectl apply -f pod.yml 
pod/nginx created
pod/mysql created
[[email protected] sduler]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
mysql   1/1     Running   0          25s   10.244.1.36   server3   <none>           <none>
nginx   1/1     Running   0          25s   10.244.1.35   server3   <none>           <none>
[[email protected] sduler]# 
           

With this scheduling rule, the mysql Pod follows the nginx Pod: it is created on the node that already runs a Pod carrying the matching label.


Pod anti-affinity example:

[[email protected] sduler]# vim pod.yml 
[[email protected] sduler]# cat pod.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: server3
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
     - name: "MYSQL_ROOT_PASSWORD"
       value: "westos"
  affinity:
    podAntiAffinity:		## anti-affinity
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: "kubernetes.io/hostname"
[[email protected] sduler]# kubectl apply -f pod.yml 
pod/nginx created
pod/mysql created
[[email protected] sduler]# kubectl get pod -o wide	## mysql and nginx end up on different nodes
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
mysql   1/1     Running   0          14s   10.244.2.32   server4   <none>           <none>
nginx   1/1     Running   0          14s   10.244.1.37   server3   <none>           <none>
[[email protected] sduler]# 
           

5. Node Taints and Pod Tolerations

5.1 Overview of taints and tolerations

NodeAffinity is a property defined on the Pod; it lets a Pod be scheduled onto a Node according to our requirements. Taints do exactly the opposite: they let a Node refuse to run Pods, and can even evict Pods.

Taints are a property of the Node. Once a taint is set, Kubernetes will not schedule Pods onto that Node. To balance this, Pods have a Tolerations property: as long as a Pod tolerates the taints on a Node, Kubernetes ignores those taints and may (but is not required to) schedule the Pod there.

Use the kubectl taint command to add a taint to a node:

kubectl taint nodes node1 key=value:NoSchedule	// create
kubectl describe nodes  server1 |grep Taints		// query
kubectl taint nodes node1 key:NoSchedule-		// remove
           

The [effect] can be one of: [ NoSchedule | PreferNoSchedule | NoExecute ]

NoSchedule: Pods are not scheduled onto the tainted node.

PreferNoSchedule: the soft-policy version of NoSchedule.

NoExecute: once the taint takes effect, Pods already running on the node that have no matching toleration are evicted immediately.
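A minimal sketch of how NoExecute interacts with tolerationSeconds (key1=value1 and server4 are illustrative choices, not a taint used elsewhere in this article):

kubectl taint nodes server4 key1=value1:NoExecute

# matching toleration in the PodSpec:
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600    # the Pod is evicted 3600s after the taint appears; omit the field to keep it as long as the toleration matches

Without any matching toleration, the Pod would be evicted as soon as the taint is applied.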

[[email protected] sduler]# kubectl describe nodes server2 | grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule
[[email protected] sduler]# kubectl describe nodes server3 | grep Taints
Taints:             <none>
[[email protected] sduler]# kubectl describe nodes server4 | grep Taints
Taints:             <none>
[[email protected] sduler]# 
           

Example: deploying a myapp Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - myapp
            topologyKey: kubernetes.io/hostname
[[email protected] sduler]# vim deployment.yml 
[[email protected] sduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 created
[[email protected] sduler]# kubectl get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
deployment-v1-6498765b4b-59ncg   1/1     Running   0          22s   10.244.1.39   server3   <none>           <none>
deployment-v1-6498765b4b-rkpc5   1/1     Running   0          22s   10.244.2.34   server4   <none>           <none>
mysql                            1/1     Running   0          82m   10.244.2.32   server4   <none>           <none>
nginx                            1/1     Running   0          82m   10.244.1.37   server3   <none>           <none>
[[email protected] sduler]# 
           

5.2 Adding taints and configuring tolerations

The key, value, and effect defined in tolerations must match the taint set on the node:

If operator is Exists, value can be omitted.

If operator is Equal, the key and the value must both match exactly (a short Equal example is sketched after these notes).

If operator is not specified, it defaults to Equal.

There are also two special cases:

An empty key combined with operator Exists matches every key and value, i.e. it tolerates all taints.

An empty effect matches all effects.
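A minimal sketch of the Equal case (the example-key=example-value taint is hypothetical and only illustrates which fields must line up):

kubectl taint nodes server4 example-key=example-value:NoSchedule

# matching toleration in the PodSpec (key, value and effect all match the taint):
tolerations:
- key: "example-key"
  operator: "Equal"
  value: "example-value"
  effect: "NoSchedule"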

Adding a taint

[[email protected] sduler]# kubectl taint node server3 node-role.kubernetes.io/master:NoSchedule	## add a taint to node server3
node/server3 tainted
[[email protected] sduler]# kubectl describe nodes server3 |grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule
[[email protected] sduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 created
[[email protected] sduler]# kubectl get pod -o wide	## server3 is tainted, so one replica cannot be scheduled and stays Pending
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
deployment-v1-6498765b4b-ds7s7   1/1     Running   0          8s    10.244.2.36   server4   <none>           <none>
deployment-v1-6498765b4b-vqcld   0/1     Pending   0          8s    <none>        <none>    <none>           <none>
           

Because the taint's effect is NoSchedule, new Pods can no longer be scheduled onto server3; combined with the Deployment's anti-affinity rule, the second replica stays in Pending.


Set a toleration for the Pod in the PodSpec:

tolerations:
- operator: "Exists"
  effect: "NoSchedule"
           

After the toleration is added, server3 can run Pods again.
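For clarity, a sketch of where this toleration sits in the Deployment used above, inside the Pod template's spec alongside containers (the anti-affinity block is omitted here for brevity):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
      tolerations:          # tolerates any NoSchedule taint, so server3 is schedulable again
      - operator: "Exists"
        effect: "NoSchedule"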


6. Commands that affect Pod scheduling

Other commands that affect Pod scheduling are cordon, drain, and delete. After any of them, newly created Pods will not be scheduled onto the node, but the three differ in how disruptive they are.

cordon — stop scheduling:

The least disruptive: it only marks the node SchedulingDisabled. Newly created Pods are not scheduled to it, while the Pods already on the node are unaffected and keep serving traffic normally.

[[email protected] sduler]# kubectl cordon server3
node/server3 cordoned
[[email protected] sduler]# kubectl get no
NAME      STATUS                     ROLES    AGE     VERSION
server2   Ready                      master   5d18h   v1.18.5
server3   Ready,SchedulingDisabled   <none>   5d18h   v1.18.5
server4   Ready                      <none>   5d18h   v1.18.5
[[email protected] sduler]# 
           
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v1
[[email protected] sduler]# vim deployment.yml 
[[email protected] sduler]# kubectl apply -f deployment.yml 
deployment.apps/deployment-v1 created
[[email protected] sduler]# kubectl get pod -o wide	## nothing is scheduled onto server3
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
deployment-v1-7449b5b68f-5zvj6   1/1     Running   0          12s   10.244.2.38   server4   <none>           <none>
deployment-v1-7449b5b68f-89bn5   1/1     Running   0          12s   10.244.2.40   server4   <none>           <none>
deployment-v1-7449b5b68f-rpqb4   1/1     Running   0          12s   10.244.2.39   server4   <none>           <none>
[[email protected] sduler]# 
           

After applying the YAML file, no Pods in the cluster are scheduled onto server3.


恢複server3節點的工作狀态

[[email protected] sduler]# kubectl uncordon server3
node/server3 uncordoned
[[email protected] sduler]# kubectl get no
NAME      STATUS   ROLES    AGE     VERSION
server2   Ready    master   5d18h   v1.18.5
server3   Ready    <none>   5d18h   v1.18.5
server4   Ready    <none>   5d18h   v1.18.5
[[email protected] sduler]# 
           

drain — evict the node's Pods:

First evicts the Pods on the node (they are recreated on other nodes), then marks the node SchedulingDisabled.

[[email protected] sduler]# kubectl  drain server3 --ignore-daemonsets
node/server3 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-amd64-zx97k, kube-system/kube-proxy-l2cz5
evicting pod kube-system/coredns-bd97f9cd9-vzw6w
pod/coredns-bd97f9cd9-vzw6w evicted
node/server3 evicted
[[email protected] sduler]# kubectl get nodes 
NAME      STATUS                     ROLES    AGE     VERSION
server2   Ready                      master   5d18h   v1.18.5
server3   Ready,SchedulingDisabled   <none>   5d18h   v1.18.5
server4   Ready                      <none>   5d18h   v1.18.5
[[email protected] sduler]# kubectl get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP            NODE      NOMINATED NODE   READINESS GATES
deployment-v1-7449b5b68f-5zvj6   1/1     Running   0          7m31s   10.244.2.38   server4   <none>           <none>
deployment-v1-7449b5b68f-89bn5   1/1     Running   0          7m31s   10.244.2.40   server4   <none>           <none>
deployment-v1-7449b5b68f-rpqb4   1/1     Running   0          7m31s   10.244.2.39   server4   <none>           <none>
[[email protected] sduler]# 
           

恢複server3節點的工作狀态


delete — remove the node:

The most drastic option: the Pods on the node are evicted first and recreated on other nodes, then the node is deleted from the master and the master loses control over it. To bring the node back into scheduling, log in to the node and restart the kubelet service so it re-registers with the cluster.
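A minimal sketch of that sequence (assuming the node was joined with kubeadm and its kubelet bootstrap configuration is still valid, so restarting kubelet is enough to re-register):

kubectl drain server3 --ignore-daemonsets    # evict the Pods first
kubectl delete node server3                  # remove the node object from the cluster

# on server3 itself, to let the node re-register with the cluster:
systemctl restart kubelet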

