
panel.sh: an nginx + docker cloud-function and online IDE panel, to invent your own PaaS (1)

Keywords: Cannot connect to the Docker daemon at, containerd cannot properly do "clean-up" with shim process during start up, a Synology-like PaaS built with standard methods, with debuggable appliance built inside

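In the earlier post "Deploying a function serverless panel on your cloud host with openfaas faasd", we installed openfaas with a script extracted from

https://github.com/openfaas/faasd/tree/0.9.2/cloud-config.txt

(we later moved to 0.9.5) and showed how to use it on a cloud host; see "Installing onemanager on the openfaas panel, parts 1 and 2". Those three posts covered basic installation, troubleshooting, and debugging. Starting with this post, the focus shifts to hardening the script and improving the experience of using it. The results of the first three posts still hold.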

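The first problem: the script must install on a clean ubuntu1804 machine, ideally succeeding on the first run. If it fails, it must be safe to run again over the previous attempt without damaging the system. That means the components the script installs have to be kept separate, each with its configuration placed standalone, so that on reinstall a component can be unplugged, replaced, or overwritten.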
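The "standalone, replaceable component" idea can be sketched as a small helper. This is only an illustration under assumptions: the function name reinstall_component and the directory layout are hypothetical and do not appear in panel.sh, which instead removes and rewrites each config file inline.

```shell
# reinstall_component PREFIX NAME SRC_DIR   (hypothetical helper, not part of panel.sh)
# Replace PREFIX/NAME with a fresh copy of SRC_DIR. Nothing outside
# PREFIX/NAME is touched, so the step can be repeated safely.
reinstall_component() {
    local prefix="$1" name="$2" src="$3"
    # refuse empty arguments so the rm below can never point at "/"
    [ -n "$prefix" ] && [ -n "$name" ] || return 1
    local target="${prefix}/${name}"
    rm -rf "$target"             # unplug the previous copy, config included
    mkdir -p "$target"
    cp -R "$src"/. "$target"/    # lay down the new copy
}

# Demo in a throwaway directory: running the install twice leaves one clean copy.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/src"
echo "v1" > "$demo_root/src/app.conf"
reinstall_component "$demo_root/opt" demo "$demo_root/src"
reinstall_component "$demo_root/opt" demo "$demo_root/src"
cat "$demo_root/opt/demo/app.conf"
rm -rf "$demo_root"
```

Because each component owns exactly one directory, a failed or partial run can be repeated without leaving stale files behind.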

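The second problem is one I never hit in the first three posts. In later attempts on ubuntu1804, with the same install method and component versions (containerd v1.3.3 + cni 0.4.0 + cni plugins 0.8.5 + faasd 0.9.5), the gateway container would stop a short while after starting, leaving port 8080 unreachable.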

Enough talk; here is the new script:

Prerequisites

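The install notes are updated and the global variables are gathered in one place. Note the deps prepare part: bridge-utils is there to control the openfaas0 virtual interface that cni creates. docker.io is installed because on ubuntu it brings in containerd 1.3.3 and runc at the same time.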

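Since Docker 1.11, a Docker container is no longer started by the Docker daemon alone; Docker integrates several components such as containerd and runC. A quick search shows that docker-containerd is containerd and docker-runc is runc. containerd is the daemon that actually manages containers, and it uses runc to run them.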
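After installing docker.io, a quick check can confirm that the separate binaries really arrived. The helper below is a minimal sketch; the name missing_tools is hypothetical, and it only looks on PATH, it does not check versions.

```shell
# missing_tools NAME...: print each name that is not found on PATH.
# (hypothetical helper for illustration, not part of panel.sh)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing="$(missing_tools containerd runc ctr)"
if [ -n "$missing" ]; then
    echo "not on PATH:" $missing
else
    echo "containerd, runc and ctr are all present"
fi
```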

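Why split things up like this? To keep docker from dominating, docker's original implementation was broken into several standardized modules. Standardization means a module can be replaced by another implementation, and it also gives an llvm-like componentized development model (in software abstraction, things that start out separate can be combined, while something that starts out monolithic is hard to split). It also serves distribution: like git, docker has been split into distributed parts, which means two parts, a cli and a server, so the same architecture works both locally and remotely.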

And why docker.io rather than docker-ce:

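I also found that on some systems, installing docker-ce and then containerd breaks things with "Cannot connect to the Docker daemon at". docker and containerd turn out to be incompatible there, so I install docker.io, which ubuntu maintains and which resolves the containerd dependency: by default it depends on containerd and runc (we replace containerd with a newer version later). After adding the latest cn ubuntu deb sources and running apt-get update, we use sudo apt install docker.io=19.03.6-0ubuntu1~18.04.1, a version listed by sudo apt-cache madison docker.io.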
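Before pinning a version it can help to confirm that the configured sources actually offer it. The sketch below parses apt-cache madison output, whose lines have the form "package | version | source"; the function names list_versions and has_version are hypothetical helpers, not part of panel.sh.

```shell
# list_versions: read `apt-cache madison <pkg>` output on stdin and print the
# version column (second "|"-separated field), one per line, without duplicates.
list_versions() {
    awk -F'|' 'NF >= 2 { gsub(/^[ \t]+|[ \t]+$/, "", $2); if (!seen[$2]++) print $2 }'
}

# has_version VERSION: succeed if VERSION appears in the madison output on stdin.
has_version() {
    local wanted="$1"
    list_versions | grep -Fxq -- "$wanted"
}

# Usage on a real host (left as a comment so the sketch has no side effects):
#   apt-cache madison docker.io | has_version '19.03.6-0ubuntu1~18.04.1' \
#       && apt-get install -y docker.io=19.03.6-0ubuntu1~18.04.1
```

If has_version fails, the pinned install would fail as well, so the check gives an earlier and clearer signal that the deb sources need attention.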

#!/bin/bash

## currently tested under ubuntu1804 64-bit; easy to port to centos (can be tested by replacing apt-get and /etc/systemd/system)

## How to use this script on a cloud host
## su root and then: ./panel.sh -d 'domain to bind' -m 'email to pass to certbot' -p 'your initial password' (email and password are not necessary; supply an email only if you encounter the "toomanyrequestofagivetype" error)
## (no https/http prefix needed; the domain should already resolve to the right ip so that certbot works later)


export DOMAIN_NAME=''
export EMAIL_NAME='[email protected]'
export PANEL_TYPE='0'
export PASS_INIT='5cTWUsD75ZgL3VJHdzpHLfcvJyOrUnza1jr6KXry5pXUUNmGtqmCZU4yGoc9yW4'

MIRROR_PATH="http://default-8g95m46n2bd18f80.service.tcloudbase.com/d/demos"
# the pai backend
SERVER_PATH=${MIRROR_PATH}/pai/pai-agent/stable/pai_agent_framework
PAI_MATE_SERVER_PATH=${MIRROR_PATH}/pai/pai-mate/stable/install
# the openfaas backend
OPENFAAS_PATH=${MIRROR_PATH}/faasd
# the code-server web ide
CODE_SERVER_PATH=${MIRROR_PATH}/codeserver

#install dir
INSTALL_DIR="/root/.local"
CONFIG_DIR="/root/.config"
# datadir only for pai and common data
DATA_DIR="/data"


while [[ $# -ge 1 ]]; do
    case $1 in
      -d|--domain)
        shift
        DOMAIN_NAME="$1"
        shift
        ;;
      -m|--mail)
        shift
        EMAIL_NAME="$1"
        shift
        ;;
      -t|--paneltype)
        shift
        PANEL_TYPE="$1"
        shift
        ;;
      -p|--passinit)
        shift
        PASS_INIT="$1"
        shift
        ;;
      *)
        if [[ "$1" != 'error' ]]; then echo -ne "\nInvalid option: '$1'\n\n"; fi
        echo -ne " Usage(args are self explained):\n\tbash $(basename "$0")\t-d/--domain\n\t\t\t\t-m/--mail\n\t\t\t\t-t/--paneltype\n\t\t\t\t-p/--passinit\n\t\t\t\t\n"
        exit 1;
        ;;
      esac
    done

[[ "$EUID" -ne '0' ]] && echo "Error:This script must be run as root!" && exit 1;

beginTime=$(date +%s)

# write log with time
writeProgressLog() {
    local line
    line="[$(date '+%Y-%m-%d %H:%M:%S')][$1][$2]"
    echo "$line"
    echo "$line" >> "${DATA_DIR}/h5/access.log"
}

# update install progress
updateProgress() {
    progress=$1
    message=$2
    status=$3
    installType=$4

    # echo "=====================$installType progress======================="
    echo "=======================$installType progress=======================" >> "${DATA_DIR}/h5/access.log"
    # quote the values: installType contains spaces (e.g. "deps prepare")
    writeProgressLog "installType" "$installType"
    writeProgressLog "progress" "$progress"
    writeProgressLog "status" "$status"
    echo "$message" >> "${DATA_DIR}/h5/access.log"

    if [ "$status" == "0" ]; then
      code=0
      message="success"
    else
      code=1
      message="$installType error"
      # exit 1
    fi

    cat << EOF > ${DATA_DIR}/h5/progress.json
{
    "code": $code,
    "message": "$message",
    "data": {
        "installType": "$installType",
        "progress": $progress
    }
}
EOF

    # record failures in a separate log as well
    if [ "$status" != "0" ]; then
      echo "$message" >> "${DATA_DIR}/h5/installErr.log"
    fi
}

echo "=====================begin .....====================="
echo "PANEL_TYPE: ${PANEL_TYPE}"
echo "DOMAIN_NAME: ${DOMAIN_NAME}"
echo "SERVER_PATH: ${SERVER_PATH}"
echo "OPENFAAS_PATH: ${OPENFAAS_PATH}"
echo "PAI_MATE_SERVER_PATH: ${PAI_MATE_SERVER_PATH}"
echo "CODE_SERVER_PATH: ${CODE_SERVER_PATH}"
echo "INSTALL_DIR: ${INSTALL_DIR}"


rm -rf ${DATA_DIR}/h5
mkdir -p ${DATA_DIR}/h5
rm -rf ${DATA_DIR}/h5/index.json

rm -rf ${DATA_DIR}/logs
mkdir -p ${DATA_DIR}/logs

mkdir -p ${INSTALL_DIR}/bin
mkdir -p ${CONFIG_DIR}

echo "=====================deps prepare progress(this may take long...)======================="
msg=$( #begin
    if [ "$PANEL_TYPE" == "0" ]; then

        apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 3B4FE6ACC0B21F32
        # add the cn mirror entries only once, so repeated installs do not pile up duplicates
        for suite in bionic bionic-security bionic-updates bionic-proposed bionic-backports; do
            entry="deb http://cn.archive.ubuntu.com/ubuntu/ ${suite} main restricted universe multiverse"
            grep -qxF "$entry" /etc/apt/sources.list || echo "$entry" >> /etc/apt/sources.list
        done
        apt-get update
        apt-get install docker.io=19.03.6-0ubuntu1~18.04.1 --no-install-recommends bridge-utils -y
        apt-get install nginx python git python-certbot-nginx -y
        # sed '1{:a;N;5!b a};$d;N;P;D' -i /etc/apt/sources.list
        # apt-get update

    else
        apt-get update && apt-get install git nginx gcc python3.6 python3-pip python3-virtualenv python-certbot-nginx golang -y
    fi 2>&1)
status=$?
updateProgress 30 "$msg" "$status" "deps prepare"

Base component code: nginx front and docker backend

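This part hard-codes each forwarding rule, but the point is how to lay out the config for a given forwarding need. The rule is: if the proxied server address (what follows proxy_pass) carries a URI, that URI replaces the part of the request URI matched by the location. If the proxied server address carries no URI, the full request URI is forwarded to the proxied server.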
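The rule can be modelled in a few lines of shell. This is a sketch for reasoning about the config, not something nginx runs: the function name upstream_uri is hypothetical, and it only covers plain prefix locations like the ones in this script (no regex locations, no rewrite directives).

```shell
# upstream_uri LOCATION PROXY_URI REQUEST_URI
# PROXY_URI is whatever follows host:port in proxy_pass ("" when there is none).
# Prints the URI that the backend would receive.
upstream_uri() {
    local location="$1" proxy_uri="$2" request_uri="$3"
    if [ -z "$proxy_uri" ]; then
        # no URI in proxy_pass: the request URI is forwarded unchanged
        printf '%s\n' "$request_uri"
    else
        # URI in proxy_pass: the part matched by the location is replaced by it
        printf '%s\n' "${proxy_uri}${request_uri#"$location"}"
    fi
}

upstream_uri /pai/ "" /pai/status                 # proxy_pass http://localhost:5523;
upstream_uri /faasd/ /ui/ /faasd/index.html       # proxy_pass http://localhost:8080/ui/;
upstream_uri /codeserver/ / /codeserver/login     # proxy_pass http://localhost:5000/;
upstream_uri / /functions/ /hello                 # proxy_pass http://localhost:8080/functions/;
```

Read against the config in the script, this shows why /pai/ keeps its prefix on the backend side while /faasd/ and /codeserver/ have theirs swapped out.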

confignginx() {

    echo "=====================certbot renew+start+init progress======================="
    systemctl enable nginx.service
    systemctl start nginx

    #cp -f /lib/systemd/system/certbot.service /etc/systemd/system/certbot-renew.service
    #echo '[Install]' >> /etc/systemd/system/certbot-renew.service
    #echo 'WantedBy=multi-user.target' >> /etc/systemd/system/certbot-renew.service
    #cp -f /lib/systemd/system/certbot.timer /etc/systemd/system/certbot-renew.timer

    # sed -i "s/renew/renew --nginx/g" /etc/systemd/system/certbot-renew.service


    rm -rf /etc/systemd/system/certbot-renew.service
    cat << 'EOF' > /etc/systemd/system/certbot-renew.service

[Unit]
Description=Certbot
Documentation=file:///usr/share/doc/python-certbot-doc/html/index.html
Documentation=https://letsencrypt.readthedocs.io/en/latest/
[Service]
Type=oneshot
ExecStart=/usr/bin/certbot -q renew
PrivateTmp=true
[Install]
WantedBy=multi-user.target
EOF

    rm -rf /etc/systemd/system/certbot-renew.timer
    cat << 'EOF' > /etc/systemd/system/certbot-renew.timer

[Unit]
Description=Run certbot twice daily

[Timer]
OnCalendar=*-*-* 00,12:00:00
RandomizedDelaySec=43200
Persistent=true

[Install]
WantedBy=timers.target
EOF

    msg=$(
    #first time renew
    certbot certonly --quiet --standalone --agree-tos --non-interactive -m ${EMAIL_NAME} -d ${DOMAIN_NAME} --pre-hook "systemctl stop nginx"

    systemctl daemon-reload 
    systemctl enable certbot-renew.service
    systemctl start certbot-renew.service
    systemctl start certbot-renew.timer 2>&1)
    status=$?
    updateProgress 40 "$msg" "$status" "certbot renew+start+init"


    echo "=====================nginx reconfig progress======================="
    # add nginx conf
    rm -rf /etc/nginx/conf.d/default.conf
    cat << 'EOF' > /etc/nginx/conf.d/default.conf

server {
    listen 443 http2 ssl;
    listen [::]:443 http2 ssl;

    server_name DOMAIN_NAME;

    ssl on;
    ssl_certificate /etc/letsencrypt/live/DOMAIN_NAME/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/DOMAIN_NAME/privkey.pem;
    ssl_session_timeout 5m;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:HIGH:!aNULL:!MD5:!RC4:!DHE;
    ssl_prefer_server_ciphers on;

    location / {
      proxy_pass http://localhost:PORT;

      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection upgrade;
      proxy_set_header Accept-Encoding gzip;
    }

    location /pai/ {
      proxy_pass http://localhost:5523;

      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection upgrade;
      proxy_set_header Accept-Encoding gzip;
    }

    location /faasd/ {
      proxy_pass http://localhost:8080/ui/;

      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection upgrade;
      proxy_set_header Accept-Encoding gzip;
    }

    location /codeserver/ {
      proxy_pass http://localhost:5000/;
      proxy_redirect http:// https://;
      proxy_set_header Host $host:443/codeserver;

      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection upgrade;
      proxy_set_header Accept-Encoding gzip;
    }

}

server {
    listen 80;
    server_name DOMAIN_NAME;

    if ($host = DOMAIN_NAME) {
        return 301 https://$host$request_uri;
    }

    return 404;
}
EOF

    sed -i "s#DOMAIN_NAME#${DOMAIN_NAME}#g" /etc/nginx/conf.d/default.conf

    if [ $PANEL_TYPE == "0" ]; then
        sed -i "s#PORT#8080/functions/#g" /etc/nginx/conf.d/default.conf
    else
        sed -i "s#PORT#3000#g" /etc/nginx/conf.d/default.conf
    fi



    # restart nginx
    msg=$( #begin
    systemctl is-active --quiet nginx.service && systemctl reload nginx.service
    systemctl restart nginx 2>&1)
    status=$?
    updateProgress 50 "$msg" "$status" "nginx reconfig"

}

confignginx

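To let docker be installed over a previous run, the next function starts by clearing the existing configuration. The key issue here is the tangled relationship between containerd, cni, and openfaasd: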

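Container Network Interface (CNI) is a container networking specification originally started by CoreOS and is the basis of Kubernetes network plugins. The basic idea: when creating a container, the container runtime first creates the network namespace, then calls a CNI plugin to configure networking for that netns, and only then starts the process inside the container. CNI has since joined the CNCF and is the network model the CNCF promotes. CNI handles all network-related operations while a container is created or deleted, and sets up all the rules needed for traffic into and out of the container to work. It is not responsible for setting up the network medium, for example creating bridges or distributing routes to connect containers on different hosts.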

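That work is done by openfaasd and the like. These docker components (containerd + cni + ctr + runc) are configured and run by faasd. Starting a freshly installed containerd + cni + ctr + runc on its own does not bring up cni or create the interface (containerd started alone reports cni conf not found, which is harmless, and it still starts). The actions in openfaasd are what give it the cni and interface configuration. The coupling is tight, though, which makes the complete container cleanup that follows difficult.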

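To remove the containers, run ctr tasks kill && ctr tasks delete && ctr container delete. Afterwards ps aux|grep manual shows that the shim tasks in the host namespace and their /proc/<pid>/ns entries are gone, but some residue remains. The two are hard to separate: the tasks started by the shim are tied to the containers and to /var/run/containerd, which cannot be cleaned up, so containerd is hard to unplug or unconfigure on its own, and hard to bring back to a clean slate for the next overwrite install.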
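The three-step removal can be wrapped in a loop over the five faasd containers. The sketch below defaults to a dry run that only prints the commands, so it is safe on a machine without containerd; the helper names run and cleanup_containers and the DRY_RUN switch are assumptions made for illustration, while the ctr commands and container names are the ones used in the script.

```shell
# DRY_RUN=1 (default): print the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@" || true   # keep going: a task or container may already be gone
    fi
}

# cleanup_containers NAME...: kill the task, delete the task, delete the container.
cleanup_containers() {
    local name
    for name in "$@"; do
        run ctr tasks kill -s SIGKILL "$name"
        run ctr tasks delete "$name"
        run ctr container delete "$name"
    done
}

cleanup_containers basic-auth-plugin nats prometheus gateway queue-worker
```

Setting DRY_RUN=0 would execute the same sequence for real; each step is allowed to fail so that one missing task does not stop the rest of the cleanup.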

This is actually caused by a bug, containerd cannot properly do "clean-up" with shim process during start up? #3971 (

https://github.com/containerd/containerd/issues/3971

), which was only fixed in 1.4.0 beta (

https://github.com/containerd/containerd/pull/4100/commits/488d6194f2080709d9667e00ff244fbdc7ff95b2

), but when I tested it (cd /var/lib/faasd/ && faasd up) the result was only somewhat better: 1.3.3 reports id exists and cannot recreate the container, while 1.4.0 reports files exists under /run/container. Neither meets the need to clean up completely and overwrite-install containerd from scratch. That is why the script prints "containerd install+start progress(this may hang long and if you over install the script you may encount /run/containerd device busy error,for this case you need to reboot to fix after scripts finished". In short, if you hit an error saying var/run cannot be deleted, let the installer finish and then reboot.
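When the device busy error appears, it may help to see what is still mounted under /run/containerd before deciding to reboot. This is a sketch under an assumption: that leftover mounts are what keeps the directory busy, which is a common cause of that error but is not confirmed by the issue above. The function name mounts_under is hypothetical.

```shell
# mounts_under PREFIX [MOUNTS_FILE]: print mount points at or below PREFIX.
# MOUNTS_FILE defaults to /proc/mounts, whose second field is the mount point.
mounts_under() {
    local prefix="${1%/}" file="${2:-/proc/mounts}"
    awk -v p="$prefix" '$2 == p || index($2, p "/") == 1 { print $2 }' "$file"
}

# List whatever is still mounted below /run/containerd (prints nothing if clean).
if [ -r /proc/mounts ]; then
    mounts_under /run/containerd
fi
```

An empty result with the error still present would suggest the cause lies elsewhere, in which case the reboot remains the practical fix.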

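So I chose containerd 1.4.0, which also fixes the gateway failure mentioned at the start. The cni plugins are still 0.8.5. I had wanted to use cri-containerd-cni-1.4.0-linux-amd64.tar.gz, but the cni inside it is 0.7.1, which does not match the 0.4.0 that faasd requires.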

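Uninstalling and cleaning up cni is outside what ctr controls. cni has no control point on the host; you would have to restore the process network namespace into a host directory, or run ip commands inside the container network namespace to check whether the interfaces are set up correctly, and both are a hassle. The ctr tasks kill && ctr tasks delete && ctr container delete trio used above for removing containers also removes the virtual interfaces of the five tasks from the ifconfig output, so I did not dig further into the cni uninstall logic.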

configdocker() {



    # is-active reports "active" for a running unit; stop each service only when it is up
    systemctl is-active --quiet faasd-provider && systemctl stop faasd-provider
    systemctl is-active --quiet faasd && systemctl stop faasd
    if systemctl is-active --quiet containerd; then
        ctr image remove docker.io/openfaas/basic-auth-plugin:0.18.18 docker.io/library/nats-streaming:0.11.2 docker.io/prom/prometheus:v2.14.0 docker.io/openfaas/gateway:0.18.18 docker.io/openfaas/queue-worker:0.11.2
        for i in basic-auth-plugin nats prometheus gateway queue-worker; do ctr tasks kill -s SIGKILL "$i"; ctr tasks delete "$i"; ctr container delete "$i"; done
        systemctl stop containerd && sleep 10
    fi
    # kill any containerd or shim process that is still around (pkill never matches itself)
    pkill -9 -f containerd
    rm -rf /var/run/containerd /run/containerd

    [[ -n "$(brctl show|grep openfaas0)" ]] && ifconfig openfaas0 down && brctl delbr openfaas0
    rm -rf /etc/cni

    echo "===============================cniplugins installonly================================="
    msg=$( #begin

    if [ ! -f "/tmp/cni-plugins-linux-amd64-v0.8.5.tar.gz" ]; then
        wget --no-check-certificate -qO- ${MIRROR_PATH}/docker/containernetworking/plugins/v0.8.5/cni-plugins-linux-amd64-v0.8.5.tar.gz > /tmp/cni-plugins-linux-amd64-v0.8.5.tar.gz
    fi

    mkdir -p /opt/cni/bin
    tar -xf /tmp/cni-plugins-linux-amd64-v0.8.5.tar.gz -C /opt/cni/bin

    /sbin/sysctl -w net.ipv4.conf.all.forwarding=1 2>&1)
    status=$?
    updateProgress 50 "$msg" "$status" "cniplugins installonly"


    echo "======containerd install+start progress(this may hang long and if you over install the script you may encount /run/containerd device busy error,for this case you need to reboot to fix after scripts finished)====="
    msg=$( #begin
    # del original deb by docker.io
    rm -rf /usr/bin/containerd* /usr/bin/ctr

    # replace with new bins
    if [ ! -f "/tmp/containerd-1.4.0-linux-amd64.tar.gz" ]; then
        wget --no-check-certificate -qO- ${MIRROR_PATH}/docker/containerd/v1.4.0/containerd-1.4.0-linux-amd64.tar.gz > /tmp/containerd-1.4.0-linux-amd64.tar.gz
    fi

    tar -xf /tmp/containerd-1.4.0-linux-amd64.tar.gz -C ${INSTALL_DIR}/bin/ --strip-components=1 && ln -sf ${INSTALL_DIR}/bin/containerd* /usr/local/bin/ && ln -sf ${INSTALL_DIR}/bin/ctr /usr/local/bin/ctr

    rm -rf /etc/systemd/system/containerd.service
    cat << 'EOF' > /etc/systemd/system/containerd.service

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target
#After=network.target containerd.socket containerd.service
#Requires=containerd.socket containerd.service

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd

Type=notify
Delegate=yes
KillMode=process 
#changed to mixed to let systemctl stop containerd kill shims
#KillMode=mixed
Restart=always
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=1048576
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity

[Install]
WantedBy=multi-user.target
EOF

    systemctl daemon-reload && systemctl enable containerd 
    systemctl start containerd --no-pager 2>&1)
    status=$?
    updateProgress 50 "$msg" "$status" "containerd install+start"



}

configdocker

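Perhaps once this containerd bug is fully fixed, the shim tasks can be stopped and removed completely. On to something else. Remember my idea in "enginx" that openresty can script the forwarding that links a game server's multi-component cluster, giving demo based programming (similar to the five containers that make up openfaas, which are the typical units of responsibility for building a single-node distributed cluster: there is authentication, there is a gateway, there is business logic)? And the jupyter-based engitor? We now implement them with openfaas + vscodeonline. openfaas builds a "city" of distributed functions, and a world made of such cities calls distributed APIs at the binary level, combining demos into applications. This is real building-block programming with demos, because it can Turn Any CLI into a Function, even a local native cli. For example, it can turn shell into a fully distributed language, programming directly on binaries.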
