Integrating Calico with Kubernetes + Problems Encountered

1. Background

My Kubernetes (v1.6.2) cluster had always used flannel for networking. Today I tried deploying Calico (v2.5.1) in its layer-3 routed mode instead.

2. Installation

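Following the manual setup described in the official Integration Guide is enough to get a working installation. The officially recommended hosted method is an alternative and is simpler. Official guide: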

https://docs.projectcalico.or...

3. Troubleshooting

(1) kubelet configuration

The first hiccup was a kubelet configuration issue:

--cni-bin-dir=/opt/cni/bin           

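The official docs say that from v1.4.0 onward the cni-conf-dir and cni-bin-dir flags are no longer supported, and that --network-plugin-dir=/etc/cni/net.d should be used instead. In practice, however, kubelet reported the following error when cni-bin-dir was not set: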

Sep 25 12:28:42 datanode10 kubelet: E0925 12:28:42.979143 176648 remote_runtime.go:109] StopPodSandbox "d00d2e5c3579775ded32f9dc0099510b879c4facedafce462a7e07ce1700090f" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kubernetes-dashboard-2445590520-xd1gv_kube-system" network: failed to find plugin "calico" in path [/etc/cni/net.d /opt/calico/bin]
           

By default (probably; this still needs investigating) Kubernetes looks for the plugin in /opt/calico/bin. Once cni-bin-dir was set, the error went away.
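The error above names the directories that were searched for the calico binary. A minimal Python sketch of that kind of lookup, handy for checking a node before restarting kubelet, could look like the following. It is not kubelet's actual code: the function name find_plugin is hypothetical, and the directories come from the log line and the --cni-bin-dir flag shown above.

```python
import os


def find_plugin(name, search_dirs):
    """Return the first executable file called `name` in search_dirs, else None.

    Illustrative only: this mimics the lookup implied by the kubelet error
    message, it is not how kubelet itself is implemented.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Directories taken from the error message plus the --cni-bin-dir value.
    dirs = ["/etc/cni/net.d", "/opt/calico/bin", "/opt/cni/bin"]
    print(find_plugin("calico", dirs))
```

If this prints None on a node, the CNI binary is not in any of the listed directories, which matches the "failed to find plugin" symptom.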

(2) Calico startup parameters

The calicoctl command used to start the node is:

calicoctl node run           

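The most useful options are --node-image and --ip. On the first install I followed the official docs and specified --node-image=quay.io/calico/node:v2.5.1 but did not specify --ip, and Calico never managed to establish its BGP connections. I ran calicoctl node diags and looked at the log /tmp/calico230225863/diagnostics/logs/bird/current (the diagnostics bundle is written under /tmp and has to be unpacked by hand):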

2017-09-25_05:09:34.42961 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 37890)
2017-09-25_05:09:39.43958 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 51894)
2017-09-25_05:09:44.45164 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 33414)
2017-09-25_05:09:48.46058 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 57101)
2017-09-25_05:09:52.46944 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 46725)
2017-09-25_05:09:57.47937 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 55515)
2017-09-25_05:10:02.49236 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 40148)
2017-09-25_05:10:06.50045 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 44207)
2017-09-25_05:10:10.51034 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 56880)
2017-09-25_05:10:15.52110 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 33334)
2017-09-25_05:10:19.53213 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 41658)
2017-09-25_05:10:24.54219 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 38064)
2017-09-25_05:10:29.55189 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 60790)
2017-09-25_05:10:33.56258 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 50308)
2017-09-25_05:10:37.57183 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 38529)
2017-09-25_05:10:42.58203 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 52686)
2017-09-25_05:10:47.59242 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 38432)
2017-09-25_05:10:52.60446 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 36509)
2017-09-25_05:10:56.61362 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 41643)           
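When the bird log is long, it helps to see at a glance which addresses are being rejected. The sketch below counts the "Unexpected connect from unknown address" lines per source address. The regular expression is an assumption based only on the log format shown above, and unexpected_peers is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern derived from the bird log lines quoted in this post (an assumption;
# other bird versions may word the message differently).
UNEXPECTED = re.compile(
    r"BGP: Unexpected connect from unknown address (\S+) \(port \d+\)"
)


def unexpected_peers(log_text):
    """Count rejected BGP connection attempts per source address."""
    counts = Counter()
    for line in log_text.splitlines():
        match = UNEXPECTED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = """\
2017-09-25_05:09:34.42961 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 37890)
2017-09-25_05:09:39.43958 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 51894)
2017-09-25_05:09:44.45164 bird: BGP: Unexpected connect from unknown address 10.1.8.103 (port 33414)
"""

if __name__ == "__main__":
    for address, count in unexpected_peers(SAMPLE).items():
        print(address, count)
```

A single address that shows up repeatedly, as here, suggests the peer is connecting from an IP that does not match the one registered for it, which is what the missing --ip option turned out to cause.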

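Running calicoctl node status showed the peer in the Connect or Active state rather than the normal Established. Looking at the output more carefully, I saw that PEER ADDRESS had defaulted to a different IP, so I added the --ip option and that fixed it: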

[root@datanode10 ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 10.1.8.103 | node-to-node mesh | up | 10:26:38 | Established |
+--------------+-------------------+-------+----------+-------------+           
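To check this state from a script rather than by eye, the table can be parsed. The following is a sketch that assumes the five-column layout printed above; the helper names parse_bgp_peers and all_established are hypothetical, and other calicoctl versions may print a different table.

```python
def parse_bgp_peers(status_text):
    """Parse the peer rows of a `calicoctl node status` table into dicts.

    Assumes the five-column layout shown in this post.
    """
    peers = []
    for line in status_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders ("+---+") and free text
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 5 or cells[0] == "PEER ADDRESS":
            continue  # skip the header row and anything unexpected
        keys = ("address", "type", "state", "since", "info")
        peers.append(dict(zip(keys, cells)))
    return peers


def all_established(peers):
    """True only if there is at least one peer and every peer is Established."""
    return bool(peers) and all(p["info"] == "Established" for p in peers)


STATUS = """\
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 10.1.8.103 | node-to-node mesh | up | 10:26:38 | Established |
+--------------+-------------------+-------+----------+-------------+
"""

if __name__ == "__main__":
    peers = parse_bgp_peers(STATUS)
    print(peers[0]["address"], all_established(peers))
```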

(3) ping fails

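With the deployment finished, I started pinging, and nothing was reachable. Nothing at all... so had installing Calico been a waste of time? Capturing packets step by step showed that even the local host could not be pinged (container to host), although the ICMP packets were visible on the veth pair.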

Next I checked IP forwarding. It was configured correctly, so that was not the problem.

After that I suspected iptables. Sure enough, there were several DROP rules in iptables... this had to be the cause:

:KUBE-MARK-DROP - [0:0]
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A DOCKER-ISOLATION -i docker0 -o br-dcd8c151afdd -j DROP
-A DOCKER-ISOLATION -i br-dcd8c151afdd -o docker0 -j DROP
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A cali-from-wl-dispatch -m comment --comment "cali:0CzWdDTPLgBBz6Ll" -m comment --comment "Unknown interface" -j DROP
-A cali-fw-cali2feb0dad56a -m comment --comment "cali:A3N5_YNzOIpK7HpM" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-cali2feb0dad56a -m comment --comment "cali:80T7Du_D0q6hU5kt" -m comment --comment "Drop if no profiles matched" -j DROP
-A cali-pri-k8s_ns.default -m comment --comment "cali:yVw1jR_AsGFo9Rpe" -j DROP
-A cali-pro-k8s_ns.default -m comment --comment "cali:YnsVE8VOF5QMzGPk" -j DROP
-A cali-to-wl-dispatch -m comment --comment "cali:qpABzRADIk3YxZik" -m comment --comment "Unknown interface" -j DROP
-A cali-tw-cali2feb0dad56a -m comment --comment "cali:MGvDdzDcopAPnoXN" -m conntrack --ctstate INVALID -j DROP
-A cali-tw-cali2feb0dad56a -m comment --comment "cali:WCfHewdVqzxx80YL" -m comment --comment "Drop if no profiles matched" -j DROP           
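The listing mixes Kubernetes, Docker and Calico rules. To narrow it down to the Calico chains, the rule dump can be filtered. This is a sketch that assumes the rules are in the "-A chain ... -j DROP" form shown above; calico_drop_chains is a hypothetical helper name.

```python
def calico_drop_chains(rules_text):
    """Return the chain name of every DROP rule that lives in a cali-* chain.

    Assumes one rule per line in the "-A <chain> ... -j DROP" form.
    """
    chains = []
    for line in rules_text.splitlines():
        parts = line.split()
        if (
            len(parts) >= 4
            and parts[0] == "-A"
            and parts[1].startswith("cali-")
            and parts[-2:] == ["-j", "DROP"]
        ):
            chains.append(parts[1])
    return chains


RULES = """\
-A DOCKER-ISOLATION -i docker0 -o br-dcd8c151afdd -j DROP
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A cali-fw-cali2feb0dad56a -m comment --comment "cali:80T7Du_D0q6hU5kt" -m comment --comment "Drop if no profiles matched" -j DROP
-A cali-pri-k8s_ns.default -m comment --comment "cali:yVw1jR_AsGFo9Rpe" -j DROP
"""

if __name__ == "__main__":
    for chain in calico_drop_chains(RULES):
        print(chain)
```

The filter is only a diagnostic aid. It shows which Calico chains are dropping traffic, here the per-interface chain and the profile chain for k8s_ns.default.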

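After I deleted Calico's DROP rules, cross-host traffic worked. But that was not the end of it: while I was still wondering why those rules existed and planning to add allow rules instead, the DROP rules came back and the network was broken again...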

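No matter. I went on to look at the Calico configuration, and along the way noticed that calicoctl closely mimics the way kubectl works, which is convenient. Checking Calico's various resources, I found that the profile list was empty, yet the rules contained a cali-pri-k8s_ns.default entry (pri is profile inbound, pro is profile outbound). The profile had not been created automatically when Kubernetes and Calico were combined (I do not understand why).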

# calicoctl get profile
NAME

The Calico workloadEndpoint information confirms that the workload does belong to the k8s_ns.default profile:

- apiVersion: v1
  kind: workloadEndpoint
  metadata:
    activeInstanceID: e903eae04edbe9152d924c610121d62f02242c0791ec2e7d849975a2ea919922
    labels:
      calico/k8s_ns: default
    name: eth0
    node: datanode10
    orchestrator: k8s
    workload: default.nginx
  spec:
    interfaceName: calic440f455693
    ipNetworks:
    - 192.168.250.192/32
    mac: 22:4b:61:ee:e3:54
    profiles:
    - k8s_ns.default
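The mismatch described here, an endpoint that references a profile which does not exist, can be expressed as a simple set difference. The sketch below is illustrative only: missing_profiles is a hypothetical helper, and its inputs are assumed to have been collected by hand from the calicoctl output shown above.

```python
def missing_profiles(endpoint_profiles, existing_profiles):
    """Return the profiles an endpoint references that are not defined.

    Both arguments are plain lists of profile names, assumed to have been
    copied from `calicoctl get` output.
    """
    existing = set(existing_profiles)
    return sorted({name for name in endpoint_profiles if name not in existing})


if __name__ == "__main__":
    # The endpoint above lists k8s_ns.default; the profile list was empty.
    print(missing_profiles(["k8s_ns.default"], []))
```

Any name this returns is a profile whose rules cannot be applied to the endpoint, which is consistent with the "Drop if no profiles matched" rules seen in iptables.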

Since the profile did not exist, I added one by hand with everything allowed... and then traffic got through:

apiVersion: v1
kind: profile
metadata:
  name: k8s_ns.default
  labels:
    calico/k8s_ns: default
spec:
  ingress:
  - action: allow
  egress:
  - action: allow

4. Summary

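These are the real problems I ran into while installing and deploying Calico today. I am not yet sure whether they were caused by something I did wrong :(. Next I plan to study how Calico networking works and spend more time on Calico configuration and usage.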

This article was reposted from SegmentFault: Integrating Calico with Kubernetes + Problems Encountered
