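Background
A customer's firewall captured requests addressed to a Service that has no Endpoints. From the Kubernetes side this should not normally happen, because requests to a Service without Endpoints are supposed to be rejected by iptables rules on the node.
Analysis
First, reproduce the problem in a local environment by creating a Service with no backend, for example grafana-service111: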
[root@node01 ~]# kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2d
kube-system grafana-service ClusterIP 10.96.78.163 <none> 3000/TCP 2d
kube-system grafana-service111 ClusterIP 10.96.52.101 <none> 3000/TCP 13s
[root@node01 ~]# kubectl get ep -A
NAMESPACE NAME ENDPOINTS AGE
default kubernetes 10.10.72.15:6443 2d
kube-system grafana-service 10.78.104.6:3000,10.78.135.5:3000 2d
kube-system grafana-service111 <none> 18s
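One way to create such a Service is to give it a selector that matches no Pod. The following is a minimal sketch, not the manifest used in this post: the helper name no_backend_service_manifest and the label app: no-such-backend are hypothetical, and the namespace is assumed to be kube-system as in the listing above.

```shell
# Print a ClusterIP Service manifest whose selector matches no Pod,
# so the Service ends up with no Endpoints.
# $1 = service name, $2 = port (both supplied by the caller).
no_backend_service_manifest() {
  name=$1
  port=$2
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  namespace: kube-system
spec:
  selector:
    app: no-such-backend   # hypothetical label that no Pod carries
  ports:
  - port: ${port}
    targetPort: ${port}
EOF
}

# Usage (needs access to a cluster):
#   no_backend_service_manifest grafana-service111 3000 | kubectl apply -f -
```

Printing the manifest instead of applying it directly makes it easy to review before it is piped to kubectl.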
Enter an application Pod and send a request to grafana-service111. The request hangs and finally times out:
[root@node01 ~]# kubectl exec -it -n kube-system influxdb-rs1-5bdc67f4cb-lnfgt bash
root@influxdb-rs1-5bdc67f4cb-lnfgt:/# time curl http://10.96.52.101:3000
curl: (7) Failed to connect to 10.96.52.101 port 3000: Connection timed out
real 2m7.307s
user 0m0.006s
sys 0m0.008s
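The 2m7s figure is consistent with the SYN being silently dropped or ignored rather than rejected. Assuming the Linux default of net.ipv4.tcp_syn_retries=6 and an initial retransmission timeout of 1 second that doubles on every retry (both are assumptions about this Pod's kernel settings, not values taken from the environment), the connect attempt gives up after 1+2+4+8+16+32+64 seconds. A small sketch of that arithmetic, with syn_timeout as a hypothetical helper name:

```shell
# Total time in seconds before connect() gives up when no reply ever arrives.
# $1 = number of SYN retries (assumed default: 6).
# Assumes a 1 s initial timeout that doubles after each transmission.
syn_timeout() {
  retries=$1
  rto=1
  total=0
  i=0
  # One wait after the initial SYN plus one after each retransmission.
  while [ "$i" -le "$retries" ]; do
    total=$((total + rto))
    rto=$((rto * 2))
    i=$((i + 1))
  done
  echo "$total"
}

syn_timeout 6   # prints 127, i.e. 2m7s
```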
Check the iptables rules for grafana-service111. A REJECT rule is present, but judging from the test above it is not taking effect:
[root@node01 ~]# iptables-save |grep 10.96.52.101
-A KUBE-SERVICES -d 10.96.52.101/32 -p tcp -m comment --comment "kube-system/grafana-service111: has no endpoints" -m tcp --dport 3000 -j REJECT --reject-with icmp-port-unreachable
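To find every Service in this state on a node, the same iptables-save output can be filtered for the "has no endpoints" comment. The sketch below assumes the rule format shown above (kube-proxy in iptables mode); list_no_endpoint_rejects is a hypothetical helper name, and it reads rules from standard input so it can be tried on saved output.

```shell
# Read iptables-save output on stdin and print one line per
# "has no endpoints" REJECT rule: <namespace/service> <cluster-ip> <port>
list_no_endpoint_rejects() {
  awk '/has no endpoints/ && /-j REJECT/ {
    ip = ""; port = ""; svc = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "-d") ip = $(i + 1)
      if ($i == "--dport") port = $(i + 1)
      if ($i == "--comment") {
        svc = $(i + 1)
        sub(/^"/, "", svc)   # drop the opening quote of the comment
        sub(/:.*/, "", svc)  # drop the colon and any port name after it
      }
    }
    sub(/\/32$/, "", ip)     # strip the /32 mask
    print svc, ip, port
  }'
}

# Usage on a node (needs root):
#   iptables-save | list_no_endpoint_rejects
```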
Capturing on the application Pod's container interface shows no response packet (not as expected):
[root@node01 ~]# tcpdump -n -i calie2568ca85e4 host 10.96.52.101
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on calie2568ca85e4, link-type EN10MB (Ethernet), capture size 262144 bytes
20:31:34.647286 IP 10.78.166.136.39230 > 10.96.52.101.hbci: Flags [S], seq 1890821953, win 29200, options [mss 1460,sackOK,TS val 792301056 ecr 0,nop,wscale 7], length 0
Capturing on the node's NIC shows the Service request leaving the node (not as expected):
[root@node01 ~]# tcpdump -n -i eth0 host 10.96.52.101
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:33:36.994881 IP 10.10.72.10.41234 > 10.96.52.101.hbci: Flags [S], seq 3530065013, win 29200, options [mss 1460,sackOK,TS val 792423403 ecr 0,nop,wscale 7], length 0
20:33:37.995298 IP 10.10.72.10.41234 > 10.96.52.101.hbci: Flags [S], seq 3530065013, win 29200, options [mss 1460,sackOK,TS val 792424404 ecr 0,nop,wscale 7], length 0
20:33:39.999285 IP 10.10.72.10.41234 > 10.96.52.101.hbci: Flags [S], seq 3530065013, win 29200, options [mss 1460,sackOK,TS val 792426408 ecr 0,nop,wscale 7], length 0
Since the REJECT rule exists, the initial suspicion falls on the two components that could interfere with it:
- kube-proxy
- calico-node
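Using the cluster from the previous post, "One-click deployment of a K8S cluster with Kubeasz", I ran the same test on an up-to-date K8S cluster and could not reproduce the problem, which indicates it has been fixed in newer versions. Searching the K8S and Calico issue trackers showed that this is a Calico bug; see references [1, 2] for the related issues and [3] for the fix.
Below is how the old and new Calico versions differ in the cali-FORWARD chain. In the affected version the chain ends with an ACCEPT rule for packets that policy has already accepted. Because cali-FORWARD is normally hooked into the filter FORWARD chain ahead of kube-proxy's chains, that ACCEPT ends processing before kube-proxy's REJECT rule is evaluated.
Affected environment: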
[root@node4 ~]# iptables -t filter -S cali-FORWARD
-N cali-FORWARD
-A cali-FORWARD -m comment --comment "cali:vjrMJCRpqwy5oRoX" -j MARK --set-xmark 0x0/0xe0000
-A cali-FORWARD -m comment --comment "cali:A_sPAO0mcxbT9mOV" -m mark --mark 0x0/0x10000 -j cali-from-hep-forward
-A cali-FORWARD -i cali+ -m comment --comment "cali:8ZoYfO5HKXWbB3pk" -j cali-from-wl-dispatch
-A cali-FORWARD -o cali+ -m comment --comment "cali:jdEuaPBe14V2hutn" -j cali-to-wl-dispatch
-A cali-FORWARD -m comment --comment "cali:12bc6HljsMKsmfr-" -j cali-to-hep-forward
-A cali-FORWARD -m comment --comment "cali:MH9kMp5aNICL-Olv" -m comment --comment "Policy explicitly accepted packet." -m mark --mark 0x10000/0x10000 -j ACCEPT
// The problem is the last rule above; newer Calico versions move this rule to the FORWARD chain
Healthy environment:
[root@node01 ~]# iptables -t filter -S cali-FORWARD
-N cali-FORWARD
-A cali-FORWARD -m comment --comment "cali:vjrMJCRpqwy5oRoX" -j MARK --set-xmark 0x0/0xe0000
-A cali-FORWARD -m comment --comment "cali:A_sPAO0mcxbT9mOV" -m mark --mark 0x0/0x10000 -j cali-from-hep-forward
-A cali-FORWARD -i cali+ -m comment --comment "cali:8ZoYfO5HKXWbB3pk" -j cali-from-wl-dispatch
-A cali-FORWARD -o cali+ -m comment --comment "cali:jdEuaPBe14V2hutn" -j cali-to-wl-dispatch
-A cali-FORWARD -m comment --comment "cali:12bc6HljsMKsmfr-" -j cali-to-hep-forward
-A cali-FORWARD -m comment --comment "cali:NOSxoaGx8OIstr1z" -j cali-cidr-block
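Based on the two dumps above, a node can be checked for the problematic layout by looking for the "Policy explicitly accepted packet." ACCEPT rule inside cali-FORWARD. This is a heuristic sketch derived only from the output shown in this post, not an official Calico diagnostic; has_early_accept is a hypothetical helper name.

```shell
# Read the output of "iptables -t filter -S cali-FORWARD" on stdin.
# Exit status 0: the policy-accept ACCEPT rule is still inside cali-FORWARD
#                (the layout seen in the affected environment).
# Exit status 1: the rule is absent from cali-FORWARD.
has_early_accept() {
  grep -q 'Policy explicitly accepted packet.*-j ACCEPT'
}

# Usage on a node (needs root):
#   if iptables -t filter -S cali-FORWARD | has_early_accept; then
#     echo "cali-FORWARD accepts packets before kube-proxy can reject them"
#   fi
```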
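Below is the record of the same test on the up-to-date K8S cluster, for comparison with the affected environment.
Create a pod to act as the client: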
[root@node01 home]# kubectl run busybox --image=busybox-curl:v1.0 --image-pull-policy=IfNotPresent -- sleep 300000
pod/busybox created
[root@node01 home]# kubectl get pod -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default busybox 1/1 Running 0 14h 10.78.153.73 10.10.11.49
Create a target Service, metrics-server111, that has no backend endpoints:
[root@node01 home]# kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.68.0.1 <none> 443/TCP 18h
kube-system dashboard-metrics-scraper ClusterIP 10.68.174.38 <none> 8000/TCP 17h
kube-system kube-dns ClusterIP 10.68.0.2 <none> 53/UDP,53/TCP,9153/TCP 17h
kube-system kube-dns-upstream ClusterIP 10.68.41.41 <none> 53/UDP,53/TCP 17h
kube-system kubernetes-dashboard NodePort 10.68.160.45 <none> 443:30861/TCP 17h
kube-system metrics-server ClusterIP 10.68.65.249 <none> 443/TCP 17h
kube-system metrics-server111 ClusterIP 10.68.224.53 <none> 443/TCP 14h
kube-system node-local-dns ClusterIP None <none> 9253/TCP 17h
[root@node01 ~]# kubectl get ep -A
NAMESPACE NAME ENDPOINTS AGE
default kubernetes 172.28.11.49:6443 18h
kube-system dashboard-metrics-scraper 10.78.153.68:8000 18h
kube-system kube-dns 10.78.153.67:53,10.78.153.67:53,10.78.153.67:9153 18h
kube-system kube-dns-upstream 10.78.153.67:53,10.78.153.67:53 18h
kube-system kubernetes-dashboard 10.78.153.66:8443 18h
kube-system metrics-server 10.78.153.65:4443 18h
kube-system metrics-server111 <none> 15h
kube-system node-local-dns 172.28.11.49:9253 18h
Enter the client pod and run a curl test. The request is refused immediately (as expected):
[root@node01 02-k8s]# kubectl exec -it busybox bash
/ # curl -i -k https://10.68.224.53:443
curl: (7) Failed to connect to 10.68.224.53 port 443 after 2 ms: Connection refused
tcpdump on the container interface shows "tcp port https unreachable" (as expected):
tcpdump -n -i cali12d4a061371
21:54:42.697437 IP 10.78.153.73.41606 > 10.68.224.53.https: Flags [S], seq 3510100476, win 29200, options [mss 1460,sackOK,TS val 2134372616 ecr 0,nop,wscale 7], length 0
21:54:42.698804 IP 10.10.11.49 > 10.78.153.73: ICMP 10.68.224.53 tcp port https unreachable, length 68
tcpdump on the node NIC shows that no request from the test container leaves the cluster (as expected):
[root@node01 bin]# tcpdump -n -i eth0 host 10.68.224.53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel
Solution
Upgrade Calico to v3.16.0 or later.
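Whether a given cluster already meets this requirement can be checked by comparing version strings. The sketch below assumes a sort that supports -V (GNU coreutils and recent BSDs do); calico_version_at_least is a hypothetical helper name, and the kubectl command in the comment assumes Calico runs as a DaemonSet named calico-node in kube-system, which may differ per installation.

```shell
# Succeed (exit 0) if version $1 is greater than or equal to version $2.
# A leading "v" is ignored on both arguments.
calico_version_at_least() {
  have=${1#v}
  want=${2#v}
  # With version sort, the required version sorts first only when have >= want.
  lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}

# One possible way to read the running version (assumed names, adjust as needed):
#   kubectl -n kube-system get ds calico-node \
#     -o jsonpath='{.spec.template.spec.containers[0].image}'
# then pass the image tag to the helper:
#   calico_version_at_least v3.15.1 v3.16.0 || echo "upgrade needed"
```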
References
[1] https://github.com/projectcalico/calico/issues/1055
[2] https://github.com/projectcalico/calico/issues/3901
[3] https://github.com/projectcalico/felix/pull/2424