I have troubleshot pod network problems many times, but never quite figured out how to capture packets inside a pod. Today I ran into a failing liveness health check. The liveness check is performed by kubelet sending a GET request to the pod, so the corresponding access log is visible inside the pod, which makes packet capture a reasonable way to investigate. The catch is that you cannot simply run tcpdump inside the pod, and copying a tcpdump binary in with docker cp / kubectl cp does not work well either. So how do you capture a pod's packets?
Container network isolation is implemented with Linux network namespaces, so we can capture packets by entering the pod's namespace. Here is a walkthrough.
1. Find out which node the pod is running on
Get the node from the output, then log in to it: cn-shenzhen.192.168.0.178
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
busybox-5fc755d7d9-nc8bz 1/1 Running 2 5d5h 172.20.2.21 cn-shenzhen.192.168.0.178 <none>
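If only the node name is needed, a jsonpath query can fetch it in one step (a sketch using the same pod name):
# kubectl get pod busybox-5fc755d7d9-nc8bz -o jsonpath='{.spec.nodeName}'
cn-shenzhen.192.168.0.178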
2. Get the container's PID
Log in to that node and find the PID of the corresponding container:
# docker ps|grep busy
d5ae39bad811 busybox "sleep 360000" 29 minutes ago Up 29 minutes k8s_busybox_busybox-5fc755d7d9-nc8bz_default_b9a845f1-f09b-11e9-a7ea-00163e0e34c8_2
cdab20715cd9 registry-vpc.cn-shenzhen.aliyuncs.com/acs/pause-amd64:3.0 "/pause" 29 minutes ago Up 29 minutes k8s_POD_busybox-5fc755d7d9-nc8bz_default_b9a845f1-f09b-11e9-a7ea-00163e0e34c8_4
[root@izwz9314kt10onbwuw6odez ~]# docker inspect -f {{.State.Pid}} d5ae39bad811
6875
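The ps and inspect steps can also be combined into a one-liner; a sketch, assuming the k8s_busybox_... container name that kubelet generates for this pod:
# docker inspect -f '{{.State.Pid}}' $(docker ps -q --filter "name=k8s_busybox_busybox-5fc755d7d9-nc8bz")
6875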
3. Enter the container's network namespace
nsenter is provided by util-linux, which is already installed on default k8s cluster nodes; if it is missing, install it:
# yum -y install util-linux.x86_64
Enter the container's network namespace and run ip a to check the IP:
# nsenter --target 6875 -n
[root@izwz9314kt10onbwuw6odez ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
3: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether d2:cc:e9:b1:f6:9d brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.20.2.21/25 brd 172.20.2.127 scope global eth0
valid_lft forever preferred_lft forever
The IP matches the pod IP we got in step 1.
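nsenter can also run a single command inside the target namespace without keeping an interactive shell, which is handy for quick one-off checks; a sketch using the PID from step 2:
# nsenter --target 6875 -n ip addr show eth0
# nsenter --target 6875 -n tcpdump -vvv -i eth0 tcp and port 80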
4. Capture with tcpdump on the eth0 interface
Try capturing directly:
# tcpdump -i eth0 tcp and port 80 -vvv
Open another terminal, exec into the pod, and generate some test traffic:
# kubectl exec -it busybox-5fc755d7d9-nc8bz sh
/ # nc -vz www.baidu.com 80
www.baidu.com (14.215.177.38:80) open
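When the traffic needs offline analysis (in Wireshark, for example), write the capture to a file; note that nsenter -n only switches the network namespace, so the file lands on the node's own filesystem (the path below is just an example):
# tcpdump -i eth0 -w /tmp/pod-eth0.pcap tcp and port 80
# ls -lh /tmp/pod-eth0.pcap   # still the node's /tmp; copy it off (e.g. with scp) for analysis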
5. Exit the network namespace
Just run exit:
# exit
logout
[root@izwz9314kt10onbwuw6odez ~]# ifconfig
cali4f23c4d62c6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.20.2.1 netmask 255.255.255.255 broadcast 172.20.2.1
ether 3e:65:de:6b:77:69 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
......
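To double-check that the shell is back in the host network namespace, compare its namespace inode with that of PID 1; identical values mean you are on the host side again:
# readlink /proc/$$/ns/net
# readlink /proc/1/ns/net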
Hands-on: simulate a failing kubelet liveness check and capture it
1. Create an nginx deployment with a liveness probe pointing at a path that does not exist (a full manifest sketch follows the snippet below):
livenessProbe:
  failureThreshold: 5
  httpGet:
    path: /muyuan.html
    port: 80
    scheme: HTTP
  initialDelaySeconds: 5
  periodSeconds: 20
  successThreshold: 1
  timeoutSeconds: 1
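For reference, a minimal manifest the probe snippet could live in (a sketch; the deployment name, the app=nginx label and the nginx:1.7.9 image are inferred from the pod name and the Server header seen later in the capture):
# kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-basic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /muyuan.html   # deliberately missing page, forces the probe to fail
            port: 80
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 1
EOF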
2. Confirm the node, log in, and get the PID
# kubectl get pods -o wide -l app=nginx
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
nginx-deployment-basic-58d68cc69d-cg9xr 1/1 Running 0 10s 172.20.2.233 cn-shenzhen.192.168.0.131 <none>
Get the PID:
# docker inspect -f {{.State.Pid}} ca32f2599302
7030
Enter the namespace:
# nsenter --target 7030 -n
3. The kubelet probe packets are captured:
# tcpdump -i any port 80 -nnvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
18:33:15.996389 IP (tos 0x0, ttl 64, id 9563, offset 0, flags [DF], proto TCP (6), length 60)
172.20.2.129.34098 > 172.20.2.233.80: Flags [S], cksum 0x5dc1 (incorrect -> 0x3346), seq 2310580335, win 29200, options [mss 1460,sackOK,TS val 3575092 ecr 0,nop,wscale 9], length 0
18:33:15.996426 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
172.20.2.233.80 > 172.20.2.129.34098: Flags [S.], cksum 0x5dc1 (incorrect -> 0xdef8), seq 2455778657, ack 2310580336, win 28960, options [mss 1460,sackOK,TS val 3575092 ecr 3575092,nop,wscale 9], length 0
18:33:15.996448 IP (tos 0x0, ttl 64, id 9564, offset 0, flags [DF], proto TCP (6), length 52)
172.20.2.129.34098 > 172.20.2.233.80: Flags [.], cksum 0x5db9 (incorrect -> 0x7ead), seq 1, ack 1, win 58, options [nop,nop,TS val 3575092 ecr 3575092], length 0
18:33:15.996591 IP (tos 0x0, ttl 64, id 9565, offset 0, flags [DF], proto TCP (6), length 176)
172.20.2.129.34098 > 172.20.2.233.80: Flags [P.], cksum 0x5e35 (incorrect -> 0x3eab), seq 1:125, ack 1, win 58, options [nop,nop,TS val 3575092 ecr 3575092], length 124: HTTP, length: 124
GET /muyuan.html HTTP/1.1
Host: 172.20.2.233:80
User-Agent: kube-probe/1.12+
Accept-Encoding: gzip
Connection: close
18:33:15.996611 IP (tos 0x0, ttl 64, id 47190, offset 0, flags [DF], proto TCP (6), length 52)
172.20.2.233.80 > 172.20.2.129.34098: Flags [.], cksum 0x5db9 (incorrect -> 0x7e32), seq 1, ack 125, win 57, options [nop,nop,TS val 3575092 ecr 3575092], length 0
18:33:15.996721 IP (tos 0x0, ttl 64, id 47191, offset 0, flags [DF], proto TCP (6), length 369)
172.20.2.233.80 > 172.20.2.129.34098: Flags [P.], cksum 0x5ef6 (incorrect -> 0xe1af), seq 1:318, ack 125, win 57, options [nop,nop,TS val 3575092 ecr 3575092], length 317: HTTP, length: 317
HTTP/1.1 404 Not Found
Server: nginx/1.7.9
Date: Tue, 22 Oct 2019 10:33:15 GMT
Content-Type: text/html
Content-Length: 168
Connection: close
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.7.9</center>
</body>
</html>
18:33:15.996747 IP (tos 0x0, ttl 64, id 9566, offset 0, flags [DF], proto TCP (6), length 52)
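The same failure shows up on the Kubernetes side as well; the pod's events record each probe result and, once failureThreshold is reached, the restart (a sketch):
# kubectl describe pod nginx-deployment-basic-58d68cc69d-cg9xr
Look for "Liveness probe failed" warnings in the Events section near the bottom of the output.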