Ceph Deployment Manual

This manual walks through deploying, operating, and using Ceph.

Deployment: Ceph resource planning, component installation and configuration, status checks, etc., resulting in a high-performance, highly reliable, multi-purpose storage cluster;

Operations: scaling out, decommissioning nodes, common problems and failures, troubleshooting, etc.;

Usage: detailed demonstrations of the block device, object storage, and file system interfaces, plus usage as Kubernetes persistent storage (PV, PVC, StorageClass), etc.

I. Deployment

This part covers deploying a luminous-release Ceph cluster with the ceph-deploy tool.

Host plan:

IP Hostname Roles
172.27.132.65 kube-node1 mgr, mon, osd
172.27.132.66 kube-node2 osd
172.27.132.67 kube-node3 mds, osd

1. Node initialization

Node initialization

Configure the package repositories

sudo yum install -y epel-release

cat << EOM > /etc/yum.repos.d/ceph.repo

[ceph-noarch]

name=Ceph noarch packages

baseurl=https://download.ceph.com/rpm-luminous/el7/noarch

enabled=1

gpgcheck=1

type=rpm-md

gpgkey=https://download.ceph.com/keys/release.asc

EOM

Install dependent packages

sudo yum install -y ntp ntpdate ntp-doc openssh-server

Create and configure the ceph account

Create a dedicated account for running Ceph on every Ceph node:

sudo useradd -d /home/ceph -m ceph

sudo passwd ceph # set the password to ceph here

Grant the ceph user sudo privileges:

echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph

sudo chmod 0440 /etc/sudoers.d/ceph

Configure host names

Set up /etc/hosts on all nodes so that each Ceph node can be reached by hostname. Note that a node's own hostname must not resolve to 127.0.0.1:

$ grep node /etc/hosts

172.27.132.65 kube-node1 kube-node1

172.27.132.66 kube-node2 kube-node2

172.27.132.67 kube-node3 kube-node3
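A quick way to confirm the resolution rule above (the node's own hostname must not map to 127.0.0.1); a minimal check run on each node:

# the answer should be the node's LAN IP, never 127.0.0.1
getent hosts $(hostname)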

Disable SELinux

Disable SELinux; otherwise later Kubernetes volume mounts may fail with Permission denied:

$ sudo setenforce 0

$ grep SELINUX /etc/selinux/config

SELINUX=disabled

Edit the config file so the change persists across reboots;

Other

Disable requiretty: edit /etc/sudoers and comment out Defaults requiretty, or set it to: Defaults:ceph !requiretty

Initialize the Ceph deploy node

Per the plan above, node 172.27.132.65 (kube-node1) will serve as the deploy node.

Configure the ceph account on kube-node1 for passwordless SSH to all nodes (including itself):

su -l ceph

ssh-keygen -t rsa

ssh-copy-id [email protected]

ssh-copy-id [email protected]

ssh-copy-id [email protected]

Configure the ceph account on kube-node1 to log in to the other nodes as user ceph by default:

cat >>/home/ceph/.ssh/config <<EOF

Host kube-node1

Hostname kube-node1

User ceph

Host kube-node2

Hostname kube-node2

User ceph

Host kube-node3

Hostname kube-node3

User ceph

EOF

chmod 600 ~/.ssh/config
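Before installing ceph-deploy it is worth confirming that passwordless login actually works from the ceph account; a minimal check:

# each command should print the remote hostname without prompting for a password
for h in kube-node1 kube-node2 kube-node3; do ssh "$h" hostname; done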

Install the ceph-deploy tool:

sudo yum update

sudo yum install ceph-deploy

2. Deploy the monitor node

Create the Ceph cluster and deploy the monitor node

Unless stated otherwise, all operations in this document are performed on the deploy node, in the ceph user's home directory (/home/ceph).

Create a deploy working directory to hold the files generated during installation:

su -l ceph

mkdir my-cluster

cd my-cluster

Create the ceph cluster

Create a cluster named ceph:

[[email protected] my-cluster]$ ceph-deploy new kube-node1 # the argument is the initial monitor node (in fact this only generates ceph.conf and ceph.mon.keyring in the current directory)

Output:

[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf

[ceph_deploy.cli][INFO ] Invoked (2.0.0): /bin/ceph-deploy new kube-node1

[ceph_deploy.new][DEBUG ] Creating new cluster named ceph

[ceph_deploy.new][INFO ] making sure passwordless SSH succeeds

[kube-node1][DEBUG ] connection detected need for sudo

[kube-node1][DEBUG ] connected to host: kube-node1

[kube-node1][DEBUG ] detect platform information from remote host

[kube-node1][DEBUG ] detect machine type

[kube-node1][DEBUG ] find the location of an executable

[kube-node1][INFO ] Running command: sudo /usr/sbin/ip link show

[kube-node1][INFO ] Running command: sudo /usr/sbin/ip addr show

[kube-node1][DEBUG ] IP addresses found: [u’172.30.53.0’, u’172.30.53.1’, u’172.27.132.65’]

[ceph_deploy.new][DEBUG ] Resolving host kube-node1

[ceph_deploy.new][DEBUG ] Monitor kube-node1 at 172.27.132.65

[ceph_deploy.new][DEBUG ] Monitor initial members are [‘kube-node1’]

[ceph_deploy.new][DEBUG ] Monitor addrs are [‘172.27.132.65’]

[ceph_deploy.new][DEBUG ] Creating a random mon key…

[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring…

[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf…

After the command finishes, the cluster config file ceph.conf, a log file, and the ceph.mon.keyring file used to bootstrap monitor nodes have been generated in the current working directory:

[[email protected] my-cluster]$ ls .

ceph.conf ceph-deploy-ceph.log ceph.mon.keyring

Modify the default settings in ceph.conf; the final result is as follows:

[[email protected] my-cluster]$ cat ceph.conf

[global]
fsid = 0dca8efc-5444-4fa0-88a8-2c0751b47d28

# initial monitor nodes
mon_initial_members = kube-node1
mon_host = 172.27.132.65

# cephx authentication
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

# replica count; should be <= the number of OSDs
osd pool default size = 3
# minimum replica count
osd pool default min size = 1

# default PG and PGP values
osd pool default pg num = 128
osd pool default pgp num = 128

# only enable the layering feature, which the CentOS kernel supports
rbd_default_features = 1

osd crush chooseleaf type = 1
max mds = 5
mds max file size = 100000000000000
mds cache size = 1000000

# filesystem tuning
osd_mkfs_type = xfs
osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog
osd_mkfs_options_xfs = -f -i size=2048

# journal tuning
journal_max_write_entries = 1000
journal_queue_max_ops = 3000
journal_max_write_bytes = 1048576000
journal_queue_max_bytes = 1048576000

# op tracker
osd_enable_op_tracker = false

# OSD client
osd_client_message_size_cap = 0
osd_client_message_cap = 0

# objecter
objecter_inflight_ops = 102400
objecter_inflight_op_bytes = 1048576000

# throttles
ms_dispatch_throttle_bytes = 1048576000

# OSD threads
osd_op_threads = 32
osd_op_num_shards = 5
osd_op_num_threads_per_shard = 2

# network settings, for hosts with multiple NICs
public network = 10.0.0.0/8 # traffic between ceph clients and the cluster
cluster network = 10.0.0.0/8 # data replication and traffic between OSDs
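Once the monitor is up (a later step), the values the daemon actually runs with can be checked through its admin socket; a minimal sketch, run on kube-node1:

# show the effective pool defaults on the running monitor
sudo ceph daemon mon.kube-node1 config show | grep -E 'osd_pool_default_(size|min_size|pg_num|pgp_num)'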

Install the ceph packages (ceph and ceph-radosgw) on all nodes.

The --release flag selects the luminous release (the default is jewel if omitted):

[[email protected] my-cluster]$ ceph-deploy install --release luminous kube-node1 kube-node2 kube-node3

Deploy the monitor node

Initialize the initial monitor node that was specified by the ceph-deploy new kube-node1 command:

[[email protected] my-cluster]$ ceph-deploy mon create-initial # subcommands: create-initial/stat/remove

Output:

kube-node1][INFO ] monitor: mon.kube-node1 is running

[kube-node1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kube-node1.asok mon_status

[ceph_deploy.mon][INFO ] processing monitor mon.kube-node1

[kube-node1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kube-node1.asok mon_status

[ceph_deploy.mon][INFO ] mon.kube-node1 monitor has reached quorum!

[ceph_deploy.mon][INFO ] all initial monitors are running and have formed quorum

[ceph_deploy.mon][INFO ] Running gatherkeys…

[kube-node1][INFO ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.kube-node1.asok mon_status

[kube-node1][INFO ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-kube-node1/keyring auth get client.admin

[kube-node1][INFO ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-kube-node1/keyring auth get client.bootstrap-mds

[kube-node1][INFO ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-kube-node1/keyring auth get client.bootstrap-mgr

[kube-node1][INFO ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-kube-node1/keyring auth get-or-create client.bootstrap-mgr mon allow profile bootstrap-mgr

[kube-node1][INFO ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-kube-node1/keyring auth get client.bootstrap-osd

[kube-node1][INFO ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-kube-node1/keyring auth get client.bootstrap-rgw

[ceph_deploy.gatherkeys][INFO ] Storing ceph.client.admin.keyring

[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-mds.keyring

[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-mgr.keyring

[ceph_deploy.gatherkeys][INFO ] keyring ‘ceph.mon.keyring’ already exists

[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-osd.keyring

[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-rgw.keyring

[ceph_deploy.gatherkeys][INFO ] Destroy temp directory /tmp/tmpw0Skr7

After the command finishes, the keyring files used to bootstrap mds, osd, and rgw nodes ({cluster-name}.bootstrap-{type}.keyring) have been generated in the current directory, and the client.admin user and its keyring have been created. These keyrings are also stored in the cluster for use when deploying the corresponding node types later:

[[email protected] my-cluster]$ ls -l

total 476

-rw------- 1 ceph ceph 113 Jul 5 16:08 ceph.bootstrap-mds.keyring

-rw------- 1 ceph ceph 71 Jul 5 16:08 ceph.bootstrap-mgr.keyring

-rw------- 1 ceph ceph 113 Jul 5 16:08 ceph.bootstrap-osd.keyring

-rw------- 1 ceph ceph 113 Jul 5 16:08 ceph.bootstrap-rgw.keyring

-rw------- 1 ceph ceph 129 Jul 5 16:08 ceph.client.admin.keyring

-rw-rw-r-- 1 ceph ceph 201 Jul 5 14:15 ceph.conf

-rw-rw-r-- 1 ceph ceph 456148 Jul 5 16:08 ceph-deploy-ceph.log

-rw------- 1 ceph ceph 73 Jul 5 14:15 ceph.mon.keyring

[[email protected] my-cluster]$ ls -l /var/lib/ceph/

total 0

drwxr-x— 2 ceph ceph 26 Apr 24 00:59 bootstrap-mds

drwxr-x— 2 ceph ceph 26 Apr 24 00:59 bootstrap-mgr

drwxr-x— 2 ceph ceph 26 Apr 24 00:59 bootstrap-osd

drwxr-x— 2 ceph ceph 6 Apr 24 00:59 bootstrap-rbd

drwxr-x— 2 ceph ceph 26 Apr 24 00:59 bootstrap-rgw

drwxr-x— 2 ceph ceph 6 Apr 24 00:59 mds

drwxr-x— 3 ceph ceph 29 Apr 24 00:59 mgr

drwxr-x— 3 ceph ceph 29 Apr 24 00:59 mon

drwxr-x— 2 ceph ceph 6 Apr 24 00:59 osd

drwxr-xr-x 2 root root 6 Apr 24 00:59 radosgw

drwxr-x— 2 ceph ceph 6 Apr 24 00:59 tmp

[[email protected] my-cluster]$ ls /var/lib/ceph/*/*

/var/lib/ceph/bootstrap-mds/ceph.keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring /var/lib/ceph/bootstrap-rgw/ceph.keyring

/var/lib/ceph/mon/ceph-kube-node1:

done keyring kv_backend store.db systemd

Push the admin keyring and the ceph.conf cluster config file to all nodes (into /etc/ceph/), so that subsequent ceph commands do not need to specify the monitor address or the path to ceph.client.admin.keyring:

[[email protected] my-cluster]$ ceph-deploy admin kube-node1 kube-node2 kube-node3

Deploy the manager node (only luminous and later need a manager node)

[[email protected] my-cluster]$ ceph-deploy mgr create kube-node1

Output:

ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf

[ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts kube-node1:kube-node1

[kube-node1][DEBUG ] connection detected need for sudo

[kube-node1][DEBUG ] connected to host: kube-node1

[kube-node1][DEBUG ] detect platform information from remote host

[kube-node1][DEBUG ] detect machine type

[ceph_deploy.mgr][INFO ] Distro info: CentOS Linux 7.5.1804 Core

[ceph_deploy.mgr][DEBUG ] remote host will use systemd

[ceph_deploy.mgr][DEBUG ] deploying mgr bootstrap to kube-node1

[kube-node1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf

[kube-node1][INFO ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring auth get-or-create mgr.kube-node1 mon allow profile mgr osd allow * mds allow * -o /var/lib/ceph/mgr/ceph-kube-node1/keyring

[kube-node1][INFO ] Running command: sudo systemctl enable [email protected]

[kube-node1][WARNIN] Created symlink from /etc/systemd/system/ceph-mgr.target.wants/[email protected] to /usr/lib/systemd/system/[email protected]

[kube-node1][INFO ] Running command: sudo systemctl start [email protected]

[kube-node1][INFO ] Running command: sudo systemctl enable ceph.target

檢視叢集狀态

切換到 montor 節點 kube-node1,修改 keyring 檔案的權限,使非 root 可讀:

$ ssh [email protected]

[[email protected] ~]$ ls /etc/ceph/

ceph.client.admin.keyring ceph.conf rbdmap tmp018nTi

[[email protected] ~]$ ls -l /etc/ceph/ceph.client.admin.keyring

-rw------- 1 root root 129 Mar 11 23:43 /etc/ceph/ceph.client.admin.keyring

[[email protected] ~]$ sudo chmod +r /etc/ceph/ceph.client.admin.keyring

[[email protected] ~]$ ls -l /etc/ceph/ceph.client.admin.keyring

-rw-r–r-- 1 root root 129 Mar 11 23:43 /etc/ceph/ceph.client.admin.keyring

檢視目前叢集狀态:

[[email protected] my-cluster]$ ceph -s

cluster:

id: b7b9e370-ea9b-4cc0-8b09-17167c876c24

health: HEALTH_ERR

64 pgs are stuck inactive for more than 60 seconds

64 pgs stuck inactive

64 pgs stuck unclean

no osds

services:

mon: 1 daemons, quorum kube-node1

mgr: kube-node1(active)

osd: 0 osds: 0 up, 0 in

data:

pools: 1 pools, 64 pgs

objects: 0 objects, 0 bytes

usage: 0 kB used, 0 kB / 0 kB avail

pgs: 100.000% pgs not active

64 creating

View the monitor node information:

[[email protected] my-cluster]$ ceph mon dump

dumped monmap epoch 2

epoch 2

fsid b7b9e370-ea9b-4cc0-8b09-17167c876c24

last_changed 2018-07-05 16:34:09.194222

created 2018-07-05 16:07:57.975307

0: 172.27.132.65:6789/0 mon.kube-node1

Add monitor nodes

Install the luminous ceph packages on the new node:

[[email protected] my-cluster]$ ceph-deploy install --release luminous kube-node4

Create the new mon node:

[[email protected] my-cluster]$ ceph-deploy mon create kube-node4

Edit ceph.conf to add the new kube-node4 node:

[[email protected] my-cluster]$ cat ceph.conf

mon_initial_members = kube-node1,kube-node4

mon_host = 172.27.132.65,172.27.132.68

Push the updated ceph.conf to all nodes:

[[email protected] my-cluster]$ ceph-deploy config push kube-node1 kube-node2 kube-node3 kube-node4

Log in to every monitor node and restart the ceph-mon service:

[[email protected] my-cluster]$ sudo systemctl restart ‘[email protected]’

Remove a monitor node

Stop the monitor service:

[[email protected] ~]$ sudo systemctl stop ‘[email protected]’

将 monitor kube-node4 從叢集删除:

[[email protected] ~]$ ceph mon remove kube-node4

Edit ceph.conf and delete kube-node4's monitor entry:

$ cat ceph.conf

mon_initial_members = kube-node1

mon_host = 172.27.132.65

Push the updated ceph.conf to all nodes:

[[email protected] my-cluster]$ ceph-deploy config push kube-node1 kube-node2 kube-node3 kube-node4

Log in to every monitor node and restart the ceph-mon service:

[[email protected] my-cluster]$ sudo systemctl restart ‘[email protected]’
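After the restart, quorum can be verified from any node with admin access; a minimal check:

ceph mon stat                                      # lists the monitors and the current quorum
ceph quorum_status --format json-pretty | grep -A3 quorum_names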

3. Deploy OSD nodes

Deploy OSD nodes

Install the ceph packages (ceph and ceph-radosgw); --release selects the version:

[[email protected] my-cluster]$ ceph-deploy install --release luminous kube-node1 kube-node2 kube-node3

[[email protected] my-cluster]$ ceph-deploy config push kube-node1 kube-node2 kube-node3

Prepare the OSD data disks

An OSD uses a whole disk or a partition to store data; if the data disk is already mounted, unmount it first:

[[email protected] my-cluster]$ df -h |grep /mnt

/dev/vda3 923G 33M 923G 1% /mnt/disk01

[[email protected] my-cluster]$ sudo umount /dev/vda3 # unmount the data partition

Check and unmount the data partitions on all OSD nodes;

Deploy the OSD nodes

[[email protected] my-cluster]$ ceph-deploy osd create --data /dev/vda3 kube-node1 kube-node2 kube-node3 # --data specifies the data partition

Output:

[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf

[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/vda3

[kube-node2][DEBUG ] connection detected need for sudo

[kube-node2][DEBUG ] connected to host: kube-node2

[kube-node2][DEBUG ] detect platform information from remote host

[kube-node2][DEBUG ] detect machine type

[kube-node2][DEBUG ] find the location of an executable

[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.3.1611 Core

[ceph_deploy.osd][DEBUG ] Deploying osd to kube-node2

[kube-node2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf

[kube-node2][WARNIN] osd keyring does not exist yet, creating one

[kube-node2][DEBUG ] create a keyring file

[kube-node2][DEBUG ] find the location of an executable

[kube-node2][INFO ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vda3

[kube-node2][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key

[kube-node2][DEBUG ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 149b2781-077f-4146-82f3-4d8061d24043

[kube-node2][DEBUG ] Running command: vgcreate --force --yes ceph-4f6edf77-637a-4752-888e-df74d102cd4e /dev/vda3

[kube-node2][DEBUG ] stdout: Wiping xfs signature on /dev/vda3.

[kube-node2][DEBUG ] stdout: Physical volume “/dev/vda3” successfully created.

[kube-node2][DEBUG ] stdout: Volume group “ceph-4f6edf77-637a-4752-888e-df74d102cd4e” successfully created

[kube-node2][DEBUG ] Running command: lvcreate --yes -l 100%FREE -n osd-block-149b2781-077f-4146-82f3-4d8061d24043 ceph-4f6edf77-637a-4752-888e-df74d102cd4e

[kube-node2][DEBUG ] stdout: Logical volume “osd-block-149b2781-077f-4146-82f3-4d8061d24043” created.

[kube-node2][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key

[kube-node2][DEBUG ] Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1

[kube-node2][DEBUG ] Running command: chown -R ceph:ceph /dev/dm-0

[kube-node2][DEBUG ] Running command: ln -s /dev/ceph-4f6edf77-637a-4752-888e-df74d102cd4e/osd-block-149b2781-077f-4146-82f3-4d8061d24043 /var/lib/ceph/osd/ceph-1/block

[kube-node2][DEBUG ] Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-1/activate.monmap

[kube-node2][DEBUG ] stderr: got monmap epoch 2

[kube-node2][DEBUG ] Running command: ceph-authtool /var/lib/ceph/osd/ceph-1/keyring --create-keyring --name osd.1 --add-key AQD24T1btVJ+MhAAu2BxGrSrv5uhSMvRIzGf3A==

[kube-node2][DEBUG ] stdout: creating /var/lib/ceph/osd/ceph-1/keyring

[kube-node2][DEBUG ] stdout: added entity osd.1 auth auth(auid = 18446744073709551615 key=AQD24T1btVJ+MhAAu2BxGrSrv5uhSMvRIzGf3A== with 0 caps)

[kube-node2][DEBUG ] Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1/keyring

[kube-node2][DEBUG ] Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1/

[kube-node2][DEBUG ] Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 1 --monmap /var/lib/ceph/osd/ceph-1/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-1/ --osd-uuid 149b2781-077f-4146-82f3-4d8061d24043 --setuser ceph --setgroup ceph

[kube-node2][DEBUG ] --> ceph-volume lvm prepare successful for: /dev/vda3

[kube-node2][DEBUG ] Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-4f6edf77-637a-4752-888e-df74d102cd4e/osd-block-149b2781-077f-4146-82f3-4d8061d24043 --path /var/lib/ceph/osd/ceph-1

[kube-node2][DEBUG ] Running command: ln -snf /dev/ceph-4f6edf77-637a-4752-888e-df74d102cd4e/osd-block-149b2781-077f-4146-82f3-4d8061d24043 /var/lib/ceph/osd/ceph-1/block

[kube-node2][DEBUG ] Running command: chown -R ceph:ceph /dev/dm-0

[kube-node2][DEBUG ] Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-1

[kube-node2][DEBUG ] Running command: systemctl enable [email protected]

[kube-node2][DEBUG ] stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/[email protected] to /usr/lib/systemd/system/[email protected]

[kube-node2][DEBUG ] Running command: systemctl start [email protected]

[kube-node2][DEBUG ] --> ceph-volume lvm activate successful for osd ID: 1

[kube-node2][DEBUG ] --> ceph-volume lvm create successful for: /dev/vda3

[kube-node2][INFO ] checking OSD status…

[kube-node2][DEBUG ] find the location of an executable

[kube-node2][INFO ] Running command: sudo /bin/ceph --cluster=ceph osd stat --format=json

[ceph_deploy.osd][DEBUG ] Host kube-node2 is now ready for osd use.

The command invokes ceph-volume --cluster ceph lvm create --bluestore --data /dev/vda3 to create the LVM VG and LV;

then it starts the OSD service;

List the OSDs

[[email protected] my-cluster]$ ceph-deploy osd list kube-node1

Output:

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf

[ceph_deploy.cli][INFO ] Invoked (2.0.0): /bin/ceph-deploy osd list kube-node1

[ceph_deploy.cli][INFO ] ceph-deploy options:

[ceph_deploy.cli][INFO ] username : None

[ceph_deploy.cli][INFO ] verbose : False

[ceph_deploy.cli][INFO ] debug : False

[ceph_deploy.cli][INFO ] overwrite_conf : False

[ceph_deploy.cli][INFO ] subcommand : list

[ceph_deploy.cli][INFO ] quiet : False

[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f1b37400ab8>

[ceph_deploy.cli][INFO ] cluster : ceph

[ceph_deploy.cli][INFO ] host : [‘kube-node1’]

[ceph_deploy.cli][INFO ] func : <function osd at 0x7f1b37438230>

[ceph_deploy.cli][INFO ] ceph_conf : None

[ceph_deploy.cli][INFO ] default_release : False

[kube-node1][DEBUG ] connected to host: kube-node1

[kube-node1][DEBUG ] detect platform information from remote host

[kube-node1][DEBUG ] detect machine type

[kube-node1][DEBUG ] find the location of an executable

[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.5.1804 Core

[ceph_deploy.osd][DEBUG ] Listing disks on kube-node1…

[kube-node1][DEBUG ] find the location of an executable

[kube-node1][INFO ] Running command: /usr/sbin/ceph-volume lvm list

[kube-node1][DEBUG ]

[kube-node1][DEBUG ]

[kube-node1][DEBUG ] ====== osd.0 =======

[kube-node1][DEBUG ]

[kube-node1][DEBUG ] [block] /dev/ceph-7856263d-442b-4e33-9737-b94faa68621b/osd-block-9f987e76-8640-4c3b-a9fd-06f701b63903

[kube-node1][DEBUG ]

[kube-node1][DEBUG ] type block

[kube-node1][DEBUG ] osd id 0

[kube-node1][DEBUG ] cluster fsid b7b9e370-ea9b-4cc0-8b09-17167c876c24

[kube-node1][DEBUG ] cluster name ceph

[kube-node1][DEBUG ] osd fsid 9f987e76-8640-4c3b-a9fd-06f701b63903

[kube-node1][DEBUG ] encrypted 0

[kube-node1][DEBUG ] cephx lockbox secret

[kube-node1][DEBUG ] block uuid bjGqMD-HRZe-vfXP-p2ma-Mofx-j4lK-Gcu0hA

[kube-node1][DEBUG ] block device /dev/ceph-7856263d-442b-4e33-9737-b94faa68621b/osd-block-9f987e76-8640-4c3b-a9fd-06f701b63903

[kube-node1][DEBUG ] vdo 0

[kube-node1][DEBUG ] crush device class None

檢視叢集 OSD 節點狀态

切換到 OSD 節點:

$ ssh kube-node2

檢視 OSD 程序和指令行參數:

[[email protected] ~]$ ps -elf|grep ceph-osd|grep -v grep

4 S ceph 23498 1 0 80 0 - 351992 futex_ Jul09 ? 00:08:43 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph

Adjust the keyring file permissions to allow non-root accounts to read it:

$ ssh [email protected]

[[email protected] ~]$ ls /etc/ceph/

ceph.client.admin.keyring ceph.conf rbdmap tmp018nTi

[[email protected] ~]$ ls -l /etc/ceph/ceph.client.admin.keyring

-rw------- 1 root root 129 Mar 11 23:43 /etc/ceph/ceph.client.admin.keyring

[[email protected] ~]$ sudo chmod +r /etc/ceph/ceph.client.admin.keyring

[[email protected] ~]$ ls -l /etc/ceph/ceph.client.admin.keyring

-rw-r–r-- 1 root root 129 Mar 11 23:43 /etc/ceph/ceph.client.admin.keyring

檢視 Ceph 叢集狀态:

[[email protected] my-cluster]$ ceph -s

cluster:

id: b7b9e370-ea9b-4cc0-8b09-17167c876c24

health: HEALTH_OK

services:

mon: 1 daemons, quorum kube-node1

mgr: kube-node1(active)

osd: 3 osds: 3 up, 3 in

data:

pools: 1 pools, 64 pgs

objects: 0 objects, 0 bytes

usage: 3080 MB used, 2765 GB / 2768 GB avail

pgs: 64 active+clean

檢視 OSD 狀态:

[[email protected] ~]$ ceph osd stat

3 osds: 3 up, 3 in

View the OSD tree:

[[email protected] ~]$ ceph osd tree

ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF

-1 2.70419 root default

-9 0 host ambari

-2 0.90140 host kube-node1

0 hdd 0.90140 osd.0 up 1.00000 1.00000

-3 0.90140 host kube-node2

1 hdd 0.90140 osd.1 up 1.00000 1.00000

-4 0.90140 host kube-node3

2 hdd 0.90140 osd.2 up 1.00000 1.00000

Dump detailed OSD information:

[[email protected] ~]$ ceph osd dump

epoch 180

fsid b7b9e370-ea9b-4cc0-8b09-17167c876c24

created 2018-07-05 16:07:58.315940

modified 2018-07-10 15:45:17.050188

flags sortbitwise,recovery_deletes,purged_snapdirs

crush_version 10

full_ratio 0.95

backfillfull_ratio 0.9

nearfull_ratio 0.85

require_min_compat_client firefly

min_compat_client firefly

require_osd_release luminous

pool 0 ‘rbd’ replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 39 flags hashpspool stripe_width 0 application rbd

removed_snaps [1~3]

max_osd 4

osd.0 up in weight 1 up_from 151 up_thru 177 down_at 148 last_clean_interval [121,147) 172.27.132.65:6800/41937 172.27.132.65:6801/41937 172.27.132.65:6802/41937 172.27.132.65:6803/41937 exists,up 9f987e76-8640-4c3b-a9fd-06f701b63903

osd.1 up in weight 1 up_from 139 up_thru 177 down_at 136 last_clean_interval [129,135) 172.27.132.66:6800/23498 172.27.132.66:6801/23498 172.27.132.66:6802/23498 172.27.132.66:6803/23498 exists,up 149b2781-077f-4146-82f3-4d8061d24043

osd.2 up in weight 1 up_from 175 up_thru 177 down_at 172 last_clean_interval [124,171) 172.27.132.67:6801/28744 172.27.132.67:6802/28744 172.27.132.67:6803/28744 172.27.132.67:6804/28744 exists,up 9aa9379e-a0a9-472f-9c5d-0c36d02c9ebf

Test object storage on the OSDs

[[email protected] ~]$ echo {Test-data} > testfile.txt # create a test file

[[email protected] ~]$ ceph osd pool create mytest 8 # create a pool

[[email protected] ~]$ rbd pool init mytest # initialize the pool

[[email protected] ~]$ rados put test-object-1 testfile.txt --pool=mytest # store the file in the pool as an object

[[email protected] ~]$ rados -p mytest ls # list the objects in the pool

test-object-1

[[email protected] ~]$ ceph osd map mytest test-object-1 # show which PG and OSDs the object maps to

osdmap e25 pool 'mytest' (1) object 'test-object-1' -> pg 1.74dc35e2 (1.2) -> up ([1,2,0], p1) acting ([1,2,0], p1)

[[email protected] ~]$ rados rm test-object-1 --pool=mytest # delete the object

[[email protected] ~]$ ceph osd pool rm mytest # delete the pool (see the note below)
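Note: on luminous, deleting a pool requires the pool name twice plus a confirmation flag, and the monitors must allow pool deletion; a minimal sketch:

# allow pool deletion (can also be set as mon_allow_pool_delete = true in ceph.conf under [mon])
ceph tell 'mon.*' injectargs '--mon-allow-pool-delete=true'
# the pool name must be given twice, plus the confirmation flag
ceph osd pool rm mytest mytest --yes-i-really-really-mean-it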

Add OSD nodes

$ ceph-deploy install kube-node4 # install the packages

$ ceph-deploy osd create --data /dev/vda3 kube-node4 # --data specifies the data partition, which must be unmounted first

$ ceph-deploy config push kube-node4 # push the updated ceph config file to the listed hosts

Remove an OSD node

$ sudo systemctl stop '[email protected]' # the instance name is the ID of the OSD on this node

$ ceph-disk deactivate /dev/vda3

$ ceph-volume lvm zap /dev/vda3

$ ceph-disk destroy /dev/sdb # remove the osd from ceph

$ ceph osd out <osd-id> # tell the mon this OSD is out of service, so its data is recovered onto other OSDs

$ ceph osd crush remove osd.<osd-id> # remove it from the CRUSH map so placement is recalculated; otherwise the stale entry keeps its crush weight and skews the host's crush weight

$ ceph auth del osd.<osd-id> # delete this OSD's authentication entry

$ ceph osd rm <osd-id> # delete this OSD's record from the cluster

$ cd my-cluster # enter the ceph-deploy working directory

$ # edit ceph.conf, then push it to all other nodes

$ ceph-deploy --overwrite-conf config push kube-node1 kube-node2 kube-node3

Restart the OSD service

Find the OSD process ID:

[[email protected] ~]$ ps -elf|grep osd|grep -v grep

4 S ceph 23498 1 0 80 0 - 351992 futex_ Jul09 ? 00:08:47 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph

Restart the OSD service by ID:

[[email protected] ~]$ sudo systemctl restart '[email protected]' # the instance is @N, where N is the OSD ID

4. Using RBD from a client

Using RBD from a client

Install the ceph packages on the client node and copy the client.admin keyring file:

[[email protected] my-cluster]$ ceph-deploy install --release luminous kube-node2

[[email protected] my-cluster]$ ceph-deploy config push kube-node2

Adjust the keyring file permissions to allow non-root accounts to read it:

$ ssh [email protected]

[[email protected] ~]$ ls /etc/ceph/

ceph.client.admin.keyring ceph.conf rbdmap tmp018nTi

[[email protected] ~]$ ls -l /etc/ceph/ceph.client.admin.keyring

-rw------- 1 root root 129 Mar 11 23:43 /etc/ceph/ceph.client.admin.keyring

[[email protected] ~]$ sudo chmod +r /etc/ceph/ceph.client.admin.keyring

[[email protected] ~]$ ls -l /etc/ceph/ceph.client.admin.keyring

-rw-r–r-- 1 root root 129 Mar 11 23:43 /etc/ceph/ceph.client.admin.keyring

Create a pool and an image

Create a pool:

[[email protected] ~]$ ceph osd pool create mytest 8

[[email protected] ~]$ rbd pool init mytest

[[email protected] ~]$ ceph osd lspools

0 rbd,1 mytest,

Create a 4 GB image in the pool:

[[email protected] ~]$ rbd create foo --size 4096 -p mytest

[[email protected] ~]$ rbd list -p mytest

foo

[[email protected] ~]$ rbd info foo -p mytest

rbd image ‘foo’:

size 4096 MB in 1024 objects

order 22 (4096 kB objects)

block_name_prefix: rbd_data.374974b0dc51

format: 2

features: layering

flags:

create_timestamp: Thu Jul 5 17:45:09 2018

Use the RBD image

Map the RBD image foo to a local block device:

[[email protected] ~]$ sudo rbd map mytest/foo

/dev/rbd0

Check the RBD device's queue parameters (values shown below):

[[email protected] ~]$ cat /sys/block/rbd0/queue/optimal_io_size

4194304

[[email protected] ~]$ cat /sys/block/rbd0/alignment_offset

[[email protected] ~]$ cat /sys/block/rbd0/queue/physical_block_size

512

Format the block device and mount it:

[[email protected] ~]$ sudo mkfs.ext4 -m0 /dev/rbd0

[[email protected] ~]$ sudo mkdir /mnt/ceph-block-device

[[email protected] ~]$ sudo mount /dev/rbd0 /mnt/ceph-block-device

[[email protected] ~]$ cd /mnt/ceph-block-device

View the local mappings:

[[email protected] ~]$ rbd showmapped

id pool image snap device

0 mytest foo - /dev/rbd0

Delete the RBD image

[[email protected] ~]$ umount /dev/rbd0 # unmount the filesystem

[[email protected] ~]$ rbd unmap mytest/foo # remove the device mapping

[[email protected] ~]$ rbd rm mytest/foo # delete the image

Removing image: 100% complete…done.

Use rbdmap to map and unmap RBD devices automatically

After a client maps and mounts an RBD image, if the image is not unmapped before shutdown, the shutdown hangs while unmounting the RBD device.

rbdmap solves this. rbdmap is a shell script; its config file is /etc/ceph/rbdmap, in the following format (the service can then be enabled, see the sketch below):

[[email protected] ~]$ cat /etc/ceph/rbdmap 
# RbdDevice        Parameters
#poolname/imagename id=client,keyring=/etc/ceph/ceph.client.keyring
mytest/foo --id admin --keyring /etc/ceph/ceph.client.admin.keyring
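With the image listed in /etc/ceph/rbdmap, the rbdmap systemd unit shipped with the ceph packages can be enabled so the device is mapped at boot and unmapped at shutdown; a minimal sketch (the fstab mount point is an assumption, and rbdmap's handling of noauto entries should be verified on your distribution):

sudo systemctl enable rbdmap          # map the listed images at boot, unmap at shutdown
# optional: have the mapped device mounted via fstab (udev creates /dev/rbd/<pool>/<image>)
echo '/dev/rbd/mytest/foo /mnt/ceph-block-device ext4 noauto 0 0' | sudo tee -a /etc/fstab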
           

5. Deploy the RGW node

Deploy the RGW node

Install the ceph-radosgw package; --release selects the version:

[[email protected] ~]$ ceph-deploy install --release luminous --rgw kube-node2

Output:

[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf

[ceph_deploy.cli][INFO ] Invoked (2.0.0): /bin/ceph-deploy install --rgw kube-node2

[kube-node2][WARNIN] altered ceph.repo priorities to contain: priority=1

[kube-node2][INFO ] Running command: sudo yum -y install ceph-radosgw

[kube-node2][DEBUG ] Loaded plugins: fastestmirror, priorities

[kube-node2][WARNIN] Not using downloaded repomd.xml because it is older than what we have:

[kube-node2][WARNIN] Current : Thu Apr 26 21:06:11 2018

[kube-node2][WARNIN] Downloaded: Fri Oct 6 01:41:59 2017

[kube-node2][WARNIN] Not using downloaded repomd.xml because it is older than what we have:

[kube-node2][WARNIN] Current : Thu Apr 26 21:02:10 2018

[kube-node2][WARNIN] Downloaded: Fri Oct 6 01:38:33 2017

[kube-node2][WARNIN] Not using downloaded repomd.xml because it is older than what we have:

[kube-node2][WARNIN] Current : Thu Apr 26 21:02:30 2018

[kube-node2][WARNIN] Downloaded: Fri Oct 6 01:38:39 2017

[kube-node2][DEBUG ] Loading mirror speeds from cached hostfile

[kube-node2][DEBUG ] * epel: mirrors.huaweicloud.com

[kube-node2][DEBUG ] 8 packages excluded due to repository priority protections

[kube-node2][DEBUG ] Package 2:ceph-radosgw-12.2.5-0.el7.x86_64 already installed and latest version

[kube-node2][DEBUG ] Nothing to do

[kube-node2][INFO ] Running command: sudo ceph --version

[kube-node2][DEBUG ] ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)

Configure and enable the RGW node

[[email protected] my-cluster]$ ceph-deploy rgw create kube-node2 # rgw listens on port 7480 by default

Output:

[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf

[ceph_deploy.rgw][DEBUG ] Deploying rgw, cluster ceph hosts kube-node2:rgw.kube-node2

[kube-node2][DEBUG ] connection detected need for sudo

[kube-node2][DEBUG ] connected to host: kube-node2

[kube-node2][DEBUG ] detect platform information from remote host

[kube-node2][DEBUG ] detect machine type

[ceph_deploy.rgw][INFO ] Distro info: CentOS Linux 7.3.1611 Core

[ceph_deploy.rgw][DEBUG ] remote host will use systemd

[ceph_deploy.rgw][DEBUG ] deploying rgw bootstrap to kube-node2

[kube-node2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf

[kube-node2][WARNIN] rgw keyring does not exist yet, creating one

[kube-node2][DEBUG ] create a keyring file

[kube-node2][INFO ] Running command: sudo ceph --cluster ceph --name client.bootstrap-rgw --keyring /var/lib/ceph/bootstrap-rgw/ceph.keyring auth get-or-create client.rgw.kube-node2 osd allow rwx mon allow rw -o /var/lib/ceph/radosgw/ceph-rgw.kube-node2/keyring

[kube-node2][INFO ] Running command: sudo systemctl enable [email protected]

[kube-node2][WARNIN] Created symlink from /etc/systemd/system/ceph-radosgw.target.wants/[email protected] to /usr/lib/systemd/system/[email protected]

[kube-node2][INFO ] Running command: sudo systemctl start [email protected]

[kube-node2][INFO ] Running command: sudo systemctl enable ceph.target

[ceph_deploy.rgw][INFO ] The Ceph Object Gateway (RGW) is now running on host kube-node2 and default port 7480

The command creates and starts the ceph-radosgw service;

Check the listening port

$ ssh [email protected]

[[email protected] ~]# netstat -lnpt|grep radosgw

tcp 0 0 0.0.0.0:7480 0.0.0.0:* LISTEN 33110/radosgw

Note: the address RGW listens on can be changed in ceph.conf; restart the rgw process for it to take effect:

[client]

rgw frontends = civetweb port=80

$ sudo systemctl restart ceph-radosgw.service

Test object storage

Test port connectivity:

[[email protected] ~]$ curl http://172.27.132.66:7480

<?xml version="1.0" encoding="UTF-8"?>anonymous

Create an object storage account:

[[email protected] ~]$ radosgw-admin user create --uid=demo --display-name="ceph sgw demo user"

Output:

{

“user_id”: “demo”,

“display_name”: “ceph sgw demo user”,

“email”: “”,

“suspended”: 0,

“max_buckets”: 1000,

“auid”: 0,

“subusers”: [],

“keys”: [

{

“user”: “demo”,

“access_key”: “BY5C4TRTAH755NH8B8K8”,

“secret_key”: “bdsqOAntwrMJAWTVGngxDPAMXrx7zalSQk8YUwIq”

}

],

“swift_keys”: [],

“caps”: [],

“op_mask”: “read, write, delete”,

“default_placement”: “”,

“placement_tags”: [],

“bucket_quota”: {

“enabled”: false,

“check_on_raw”: false,

“max_size”: -1,

“max_size_kb”: 0,

“max_objects”: -1

},

“user_quota”: {

“enabled”: false,

“check_on_raw”: false,

“max_size”: -1,

“max_size_kb”: 0,

“max_objects”: -1

},

“temp_url_keys”: [],

“type”: “rgw”

}

Create a subuser:

[[email protected] ~]$ radosgw-admin subuser create --uid demo --subuser=demo:swift --access=full --secret=secretkey --key-type=swift

Output:

{

“user_id”: “demo”,

“display_name”: “ceph sgw demo user”,

“email”: “”,

“suspended”: 0,

“max_buckets”: 1000,

“auid”: 0,

“subusers”: [

{

“id”: “demo:swift”,

“permissions”: “full-control”

}

],

“keys”: [

{

“user”: “demo”,

“access_key”: “BY5C4TRTAH755NH8B8K8”,

“secret_key”: “bdsqOAntwrMJAWTVGngxDPAMXrx7zalSQk8YUwIq”

}

],

“swift_keys”: [

{

“user”: “demo:swift”,

“secret_key”: “secretkey”

}

],

“caps”: [],

“op_mask”: “read, write, delete”,

“default_placement”: “”,

“placement_tags”: [],

“bucket_quota”: {

“enabled”: false,

“check_on_raw”: false,

“max_size”: -1,

“max_size_kb”: 0,

“max_objects”: -1

},

“user_quota”: {

“enabled”: false,

“check_on_raw”: false,

“max_size”: -1,

“max_size_kb”: 0,

“max_objects”: -1

},

“temp_url_keys”: [],

“type”: “rgw”

}

Generate a new key for the subuser:

[[email protected] ~]$ radosgw-admin key create --subuser=demo:swift --key-type=swift --gen-secret

Output:

{

“user_id”: “demo”,

“display_name”: “ceph sgw demo user”,

“email”: “”,

“suspended”: 0,

“max_buckets”: 1000,

“auid”: 0,

“subusers”: [

{

“id”: “demo:swift”,

“permissions”: “full-control”

}

],

“keys”: [

{

“user”: “demo”,

“access_key”: “BY5C4TRTAH755NH8B8K8”,

“secret_key”: “bdsqOAntwrMJAWTVGngxDPAMXrx7zalSQk8YUwIq”

}

],

“swift_keys”: [

{

“user”: “demo:swift”,

“secret_key”: “ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb”

}

],

“caps”: [],

“op_mask”: “read, write, delete”,

“default_placement”: “”,

“placement_tags”: [],

“bucket_quota”: {

“enabled”: false,

“check_on_raw”: false,

“max_size”: -1,

“max_size_kb”: 0,

“max_objects”: -1

},

“user_quota”: {

“enabled”: false,

“check_on_raw”: false,

“max_size”: -1,

“max_size_kb”: 0,

“max_objects”: -1

},

“temp_url_keys”: [],

“type”: “rgw”

}

Test an S3 bucket

Install the Python library:

[[email protected] ~]$ sudo yum install python-boto

Create an S3 test script (s3test.py):

import boto.s3.connection

access_key = 'BY5C4TRTAH755NH8B8K8'
secret_key = 'bdsqOAntwrMJAWTVGngxDPAMXrx7zalSQk8YUwIq'

conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host='172.27.132.66', port=7480,
    is_secure=False, calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('my-new-bucket')
for bucket in conn.get_all_buckets():
    print "{name} {created}".format(
        name=bucket.name,
        created=bucket.creation_date,
    )

Run the S3 test script:

[[email protected] ~]$ python s3test.py

my-new-bucket 2018-07-11T12:26:05.329Z

Test Swift

Install the Swift client:

[[email protected] ~]$ sudo yum install python2-pip

[[email protected] ~]$ sudo pip install python-swiftclient

List the buckets:

[[email protected] ~]$ swift -V 1.0 -A http://kube-node2:7480/auth -U demo:swift -K ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb list

my-new-bucket
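To round-trip an object through the gateway, an upload and a stat can be attempted with the same credentials; a minimal sketch reusing testfile.txt from the OSD test above (an assumption, any local file works):

# upload a local file into the bucket
swift -V 1.0 -A http://kube-node2:7480/auth -U demo:swift \
  -K ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb upload my-new-bucket testfile.txt
# show the container's object count and byte usage
swift -V 1.0 -A http://kube-node2:7480/auth -U demo:swift \
  -K ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb stat my-new-bucket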

檢視 bucket 狀态和統計資訊:

[[email protected] ~]$ radosgw-admin bucket stats --bucket my-new-bucket

{

“bucket”: “my-new-bucket”,

“zonegroup”: “b71dde3d-8d8d-4240-ad99-1c85182d3e9b”,

“placement_rule”: “default-placement”,

“explicit_placement”: {

“data_pool”: “”,

“data_extra_pool”: “”,

“index_pool”: “”

},

“id”: “1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.64249.1”,

“marker”: “1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.64249.1”,

“index_type”: “Normal”,

“owner”: “demo”,

“ver”: “0#1”,

“master_ver”: “0#0”,

“mtime”: “2018-07-11 20:26:05.332943”,

“max_marker”: “0#”,

“usage”: {},

“bucket_quota”: {

“enabled”: false,

“check_on_raw”: false,

“max_size”: -1,

“max_size_kb”: 0,

“max_objects”: -1

}

}

6. Deploy the MDS node

Deploy the metadata node

Install the ceph packages on the node and copy the client.admin keyring file:

[[email protected] my-cluster]$ ceph-deploy install --release luminous kube-node3

[[email protected] my-cluster]$ ceph-deploy config push kube-node3

Deploy the metadata server

[[email protected] my-cluster]$ ceph-deploy mds create kube-node3

Check the MDS service

Log in to the deployed kube-node3 node and check the service and port status:

$ ssh [email protected]

[[email protected] ~]# systemctl status ‘[email protected]’|grep ‘Active:’

Active: active (running) since 一 2018-07-09 15:45:08 CST; 2 days ago

[[email protected] ~]# netstat -lnpt|grep ceph-mds

tcp 0 0 0.0.0.0:6800 0.0.0.0:* LISTEN 1251/ceph-mds

確定狀态為 running,監聽端口 6800;

建立 cephfs

建立兩個池,最後的數字是 PG 的數量

[[email protected] my-cluster]# ceph osd pool create cephfs_data 100

pool ‘cephfs_data’ created

[[email protected] my-cluster]# ceph osd pool create cephfs_metadata 20

pool ‘cephfs_metadata’ created

Create the cephfs filesystem; note that one Ceph cluster can only have a single CephFS filesystem:

[[email protected] my-cluster]# ceph fs new cephfs cephfs_metadata cephfs_data

new fs with metadata pool 3 and data pool 2

Create the secret file

Look up the client.admin key:

[[email protected] my-cluster]$ cd ~/my-cluster/

[[email protected] my-cluster]$ grep key ceph.client.admin.keyring

key = AQDe0T1babvPNhAApxQdjXNU20vYqkDG+YWACw==

将上面的秘鑰儲存到一個檔案中,如 admin.secret:

$ cat admin.secret

[client.admin]

key = AQDe0T1babvPNhAApxQdjXNU20vYqkDG+YWACw==

Mount and use CephFS

There are two ways to mount CephFS (an fstab sketch follows after the two methods):

Kernel driver method:

[[email protected] ~]$ sudo mkdir /mnt/mycephfs

[[email protected] ~]$ vi admin.secret

[[email protected] ~]$ sudo mount -t ceph kube-node1:6789:/ /mnt/mycephfs -o name=admin,secretfile=admin.secret

FUSE method:

[[email protected] ~]$ sudo yum install ceph-fuse
[[email protected] ~]$ sudo mkdir ~/mycephfs
[[email protected] ~]$ sudo ceph-fuse -k ./ceph.client.admin.keyring -m kube-node1:6789 ~/mycephfs
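To remount the kernel-driver mount automatically at boot, an /etc/fstab entry can be used; a minimal sketch, assuming admin.secret was saved under /home/ceph:

# /etc/fstab entry for the kernel CephFS mount (secretfile path is an assumption)
kube-node1:6789:/  /mnt/mycephfs  ceph  name=admin,secretfile=/home/ceph/admin.secret,noatime,_netdev  0  2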
           

II. Operations

Troubleshooting

1. ceph-deploy new fails with ImportError: No module named pkg_resources

Symptom:

[[email protected] my-cluster]$ ceph-deploy new kube-node1

Traceback (most recent call last):

File “/bin/ceph-deploy”, line 18, in

from ceph_deploy.cli import main

File “/usr/lib/python2.7/site-packages/ceph_deploy/cli.py”, line 1, in

import pkg_resources

ImportError: No module named pkg_resources

Cause:

The python2-pip package is missing from the system.

Fix:

[[email protected] my-cluster]$ sudo yum install python2-pip

2. ceph-deploy disk zap fails with AttributeError: 'Namespace' object has no attribute 'debug'

Symptom:

[[email protected] my-cluster]$ ceph-deploy disk zap kube-node3 /dev/vda3

[kube-node3][DEBUG ] find the location of an executable

[ceph_deploy][ERROR ] Traceback (most recent call last):

[ceph_deploy][ERROR ] File “/usr/lib/python2.7/site-packages/ceph_deploy/util/decorators.py”, line 69, in newfunc

[ceph_deploy][ERROR ] return f(*a, **kw)

[ceph_deploy][ERROR ] File “/usr/lib/python2.7/site-packages/ceph_deploy/cli.py”, line 164, in _main

[ceph_deploy][ERROR ] return args.func(args)

[ceph_deploy][ERROR ] File “/usr/lib/python2.7/site-packages/ceph_deploy/osd.py”, line 438, in disk

[ceph_deploy][ERROR ] disk_zap(args)

[ceph_deploy][ERROR ] File “/usr/lib/python2.7/site-packages/ceph_deploy/osd.py”, line 336, in disk_zap

[ceph_deploy][ERROR ] if args.debug:

[ceph_deploy][ERROR ] AttributeError: ‘Namespace’ object has no attribute ‘debug’

Cause:

A bug in the ceph-deploy code;

Fix:

sudo vim /usr/lib/python2.7/site-packages/ceph_deploy/osd.py

Change line 336 from:

if args.debug:

to:

if False:

3. A ceph command fails with ERROR: missing keyring, cannot use cephx for authentication

Symptom:

[[email protected] ~]# sudo ceph-volume lvm create --data /dev/vda3

Running command: /bin/ceph-authtool --gen-print-key

Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 987317ae-38c1-4236-a621-387d30ff9d36

stderr: 2018-07-05 17:08:49.355424 7fcec57d1700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory

stderr: 2018-07-05 17:08:49.355441 7fcec57d1700 -1 monclient: ERROR: missing keyring, cannot use cephx for authentication

stderr: 2018-07-05 17:08:49.355443 7fcec57d1700 0 librados: client.bootstrap-osd initialization error (2) No such file or directory

stderr: [errno 2] error connecting to the cluster

–> RuntimeError: Unable to create a new OSD id

Cause:

The /var/lib/ceph/bootstrap-XX/*.keyring files are missing; by default they live on the deploy node.

Fix:

Run the ceph-deploy osd create --data /dev/vda3 kube-node2 command from the deploy node instead.

4. all OSDs are running luminous or later but require_osd_release < luminous

Symptom:

[[email protected] my-cluster]$ sudo ceph health

HEALTH_WARN all OSDs are running luminous or later but require_osd_release < luminous

[[email protected] my-cluster]$ ceph -s

cluster:

id: b7b9e370-ea9b-4cc0-8b09-17167c876c24

health: HEALTH_WARN

all OSDs are running luminous or later but require_osd_release < luminous

services:

mon: 1 daemons, quorum kube-node1

mgr: kube-node1(active)

osd: 3 osds: 3 up, 3 in

data:

pools: 1 pools, 64 pgs

objects: 0 objects, 0 bytes

usage: 3079 MB used, 2765 GB / 2768 GB avail

pgs: 64 active+clean

Cause:

The cluster's require_osd_release flag does not match the running OSD versions;

Fix:

[[email protected] my-cluster]$ ceph osd require-osd-release luminous

recovery_deletes is set

5. Creating a pool fails because pg_num is too large

Symptom:

[[email protected] ~]$ ceph osd pool create k8s 128 128

Error ERANGE: pg_num 128 size 3 would mean 960 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)

Cause:

Too few OSDs, or the mon_max_pg_per_osd value is too small;

Fix:

Add more OSDs (see the OSD scale-out example above);

or raise mon_max_pg_per_osd:

[[email protected] my-cluster]$ ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd=350'

injectargs:mon_max_pg_per_osd = '350' (not observed, change may require restart)

References:

https://www.wanglf.net/ceph-pg-num-is-too-large.html http://blog.51cto.com/michaelkang/1727667

6. application not enabled on 1 pool(s)

Symptom:

After creating a pool, the cluster health is degraded:

[[email protected] my-cluster]# ceph -s

cluster:

id: b7b9e370-ea9b-4cc0-8b09-17167c876c24

health: HEALTH_WARN

application not enabled on 1 pool(s)

services:

mon: 1 daemons, quorum kube-node1

mgr: kube-node1(active), standbys: kube-node2

osd: 3 osds: 3 up, 3 in

data:

pools: 2 pools, 72 pgs

objects: 1 objects, 12 bytes

usage: 3081 MB used, 2765 GB / 2768 GB avail

pgs: 72 active+clean

Cause:

The new pool was not automatically tagged with an application;

or the pool was not initialized;

Fix:

Tag the new pool:

[[email protected] my-cluster]# ceph osd pool application enable mytest rbd # mytest is the pool name

enabled application ‘rbd’ on pool ‘mytest’

Or initialize the pool right after creating it:

ceph osd pool create my-pool 8

rbd pool init my-pool

Reference:

https://ceph.com/community/new-luminous-pool-tags/

7. Mapping an RBD image locally fails: rbd: sysfs write failed

Symptom:

Mapping the RBD locally fails:

[[email protected] my-cluster]# sudo rbd map foo -p mytest

rbd: sysfs write failed

RBD image feature set mismatch. Try disabling features unsupported by the kernel with “rbd feature disable”.

In some cases useful info is found in syslog - try “dmesg | tail”.

rbd: map failed: (6) No such device or address

Cause:

The CentOS kernel does not support the full feature set: [layering, striping, exclusive-lock, object-map, fast-diff, deep-flatten, journaling, data-pool]

Fix:

Disable the unsupported RBD features:

[[email protected] my-cluster]# rbd feature disable mytest/foo fast-diff,object-map,exclusive-lock,deep-flatten

[[email protected] my-cluster]# rbd info mytest/foo |grep features

features: layering

Alternatively, add rbd_default_features = 3 to /etc/ceph/ceph.conf so that newly created RBD images only enable features the kernel supports (layering and striping v2):

features is the sum of the following values:

+1 for layering, +2 for striping v2, +4 for exclusive lock, +8 for object map, +16 for fast-diff, +32 for deep-flatten, +64 for journaling

As of Linux kernel v4.6, only layering and striping v2 are supported.
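For example, the value for layering plus striping v2 works out as follows; a minimal sketch (the image name bar is an assumption):

echo $(( 1 + 2 ))   # layering(+1) + striping v2(+2) = 3 -> rbd_default_features = 3
# create an image with only those features and verify
rbd create bar --size 1024 -p mytest --image-feature layering,striping
rbd info mytest/bar | grep features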

Reference:

http://www.zphj1987.com/2016/06/07/rbd%E6%97%A0%E6%B3%95map-rbd-feature-disable/

8. Kubernetes fails to map the RBD image behind a PV: rbd: map failed exit status 6 rbd: sysfs write failed

Symptom:

Kubernetes cannot mount the PV's RBD image and reports rbd: map failed exit status 6 rbd: sysfs write failed:

[[email protected] ~]$ kubectl get pods|grep prometheus-server

wishful-ladybird-prometheus-server-f744b8794-mxr5l 0/2 Init:0/1 0 55s

[[email protected] ~]$ kubectl describe pods wishful-ladybird-prometheus-server-f744b8794-mxr5l|tail -10

Normal Scheduled 2m default-scheduler Successfully assigned wishful-ladybird-prometheus-server-f744b8794-mxr5l to kube-node2

Normal SuccessfulMountVolume 2m kubelet, kube-node2 MountVolume.SetUp succeeded for volume “config-volume”

Normal SuccessfulMountVolume 2m kubelet, kube-node2 MountVolume.SetUp succeeded for volume “wishful-ladybird-prometheus-server-token-8grpr”

Warning FailedMount 59s (x8 over 2m) kubelet, kube-node2 MountVolume.SetUp failed for volume “ceph-pv-8g” : rbd: map failed exit status 6 rbd: sysfs write failed

RBD image feature set mismatch. Try disabling features unsupported by the kernel with “rbd feature disable”.

In some cases useful info is found in syslog - try “dmesg | tail”.

rbd: map failed: (6) No such device or address

Warning FailedMount 2s kubelet, kube-node2 Unable to mount volumes for pod “wishful-ladybird-prometheus-server-f744b8794-mxr5l_default(7cdc39e7-80f0-11e8-9331-525400ce676d)”: timeout expired waiting for volumes to attach/mount for pod “default”/“wishful-ladybird-prometheus-server-f744b8794-mxr5l”. list of unattached/unmounted volumes=[storage-volume]

Warning FailedSync 2s kubelet, kube-node2 Error syncing pod

[[email protected] ~]$ kubectl get pv ceph-pv-8g -o yaml|grep image

image: prometheus-server

Disabling the features directly fails with Read-only file system:

[[email protected] ~]$ rbd feature disable prometheus-server fast-diff,object-map,exclusive-lock,deep-flatten

rbd: failed to update image features: (30) Read-only file system

Deleting the RBD image also fails:

[[email protected] ~]$ rbd rm prometheus-server

2018-07-06 18:05:04.968455 7f9ce1ffb700 -1 librbd::image::RemoveRequest: 0x7f9d05abc0f0 handle_exclusive_lock: cannot obtain exclusive lock - not removing

Removing image: 0% complete…failed.

rbd: error: image still has watchers

This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

[[email protected] ~]$ rbd status prometheus-server

Watchers: none

Cause:

The CentOS kernel does not support the full feature set: [layering, striping, exclusive-lock, object-map, fast-diff, deep-flatten, journaling, data-pool]

Fix:

Remove the lock held on the RBD image:

[[email protected] ~]$ rbd lock list prometheus-server

There is 1 exclusive lock on this image.

Locker ID Address

client.44142 kubelet_lock_magic_ambari.hadoop 172.27.132.67:0/710596833

[[email protected] ~]$ rbd lock rm prometheus-server kubelet_lock_magic_ambari.hadoop client.44142

[[email protected] ~]$ rbd rm prometheus-server

Removing image: 100% complete…done.

[[email protected] ~]$

9. rbd unmap fails

Symptom:

[[email protected] ~]# sudo rbd unmap foo

rbd: sysfs write failed

rbd: unmap failed: (16) Device or resource busy

Cause:

Unknown;

Fix:

Force the unmap with -o force:

[[email protected] ~]# sudo rbd unmap -o force foo

Reference:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019160.html

10. Mounting the same RBD from a second client and reading/writing concurrently corrupts the filesystem: Input/output error

Symptom:

[[email protected] ~]# sudo rbd map foo # map the same RBD a second time, from another client

[[email protected] ~]# sudo mkdir /mnt/ceph-block-device

[[email protected] ~]# sudo mount /dev/rbd0 /mnt/ceph-block-device

[[email protected] ~]# ls /mnt/ceph-block-device/

ls: cannot access /mnt/ceph-block-device/test: Input/output error

lost+found test

[[email protected] ~]# dmesg|tail

[598628.843913] rbd: loaded (major 251)

[598628.850614] libceph: mon0 172.27.132.65:6789 session established

[598628.851312] libceph: client14187 fsid b7b9e370-ea9b-4cc0-8b09-17167c876c24

[598628.860645] rbd: rbd0: capacity 4294967296 features 0x1

[598639.147675] EXT4-fs (rbd0): recovery complete

[598639.147682] EXT4-fs (rbd0): mounted filesystem with ordered data mode. Opts: (null)

[598675.693950] EXT4-fs (rbd0): mounted filesystem with ordered data mode. Opts: (null)

[598678.177275] EXT4-fs error (device rbd0): ext4_lookup:1441: inode #2: comm ls: deleted inode referenced: 12

[598680.603020] EXT4-fs error (device rbd0): ext4_lookup:1441: inode #2: comm ls: deleted inode referenced: 12

[598701.607815] EXT4-fs error (device rbd0): ext4_lookup:1441: inode #2: comm ls: deleted inode referenced: 12

Cause:

RBD does not support concurrent access through multiple mounts.

Fix:

Repair the filesystem with fsck:

[[email protected] ~]# umount /dev/rbd1

[[email protected] ~]# fsck -y /dev/rbd1

fsck from util-linux 2.23.2

e2fsck 1.42.9 (28-Dec-2013)

/dev/rbd1 contains a file system with errors, check forced.

Pass 1: Checking inodes, blocks, and sizes

Pass 2: Checking directory structure

Entry ‘test’ in / (2) has deleted/unused inode 12. Clear? yes

Pass 3: Checking directory connectivity

Pass 4: Checking reference counts

Pass 5: Checking group summary information

Inode bitmap differences: -12

Fix? yes

Free inodes count wrong for group #0 (8180, counted=8181).

Fix? yes

Free inodes count wrong (262132, counted=262133).

Fix? yes

/dev/rbd1: ***** FILE SYSTEM WAS MODIFIED *****

/dev/rbd1: 11/262144 files (0.0% non-contiguous), 53326/1048576 blocks

[[email protected] ~]# mount /dev/rbd1 /mnt/ceph-block-device/

[[email protected] ~]# ls -l /mnt/ceph-block-device/

total 16

drwx------ 2 root root 16384 Jul 5 18:07 lost+found

[[email protected] ~]#

11. too few PGs per OSD (16 < min 30)

Symptom:

$ ceph -s

cluster 85510587-14c6-4526-9636-83179bda2751

health HEALTH_WARN

too few PGs per OSD (16 < min 30)

monmap e3: 3 mons at {controller-01=10.90.3.7:6789/0,controller-02=10.90.3.2:6789/0,controller-03=10.90.3.5:6789/0}

election epoch 8, quorum 0,1,2 controller-02,controller-03,controller-01

osdmap e74: 12 osds: 12 up, 12 in

pgmap v38670: 1408 pgs, 10 pools, 18592 MB data, 4304 objects

56379 MB used, 20760 GB / 20815 GB avail

1408 active+clean

client io 5127 B/s wr, 2 op/s

Cause:

The cluster's minimum threshold for PGs per OSD is 30; once any OSD holds fewer than 30 PGs, the "too few PGs per OSD" warning is raised:

$ ceph --show-config | grep mon_pg_warn_min_per_osd

mon_pg_warn_min_per_osd = 30

Fix:

List the Ceph pools:

ceph osd lspools
0 rbd,

The cluster has only the single rbd pool.

Check the rbd pool's PG and PGP values:

ceph osd pool get rbd pg_num
pg_num: 64

ceph osd pool get rbd pgp_num
pgp_num: 64

Check the pool's replica count:

ceph osd dump | grep size

pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0

pg_num is 64 with 3 replicas, so with 12 OSDs each OSD carries 64 / 12 * 3 = 16 PGs on average, below the minimum of 30.

Adjust the rbd pool's PG count

pg_num cannot be chosen arbitrarily; both too large and too small are problems. If it is too large, backfill and recovery put too much load on the cluster; if it is too small, data is not distributed evenly. Total PGs = (number of OSDs x 100) / max replica count,

rounded to the nearest power of 2 (see the sketch below).
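A minimal sketch of that calculation, assuming 12 OSDs and 3 replicas as in the output above (rounding up to the next power of two):

osds=12; replicas=3
target=$(( osds * 100 / replicas ))                      # 400
pg=1; while [ "$pg" -lt "$target" ]; do pg=$(( pg * 2 )); done
echo "$pg"                                               # -> 512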

Modify the pool's PG and PGP values:

ceph osd pool set rbd pg_num 512
ceph osd pool set rbd pgp_num 512

Do not raise it too far, or you get the opposite error: too many PGs per OSD (352 > max 300).

12. too many PGs per OSD (352 > max 300)

Symptom:

$ ceph -s

cluster 85510587-14c6-4526-9636-83179bda2751

health HEALTH_WARN

too many PGs per OSD (352 > max 300)

monmap e3: 3 mons at {controller-01=10.90.3.7:6789/0,controller-02=10.90.3.2:6789/0,controller-03=10.90.3.5:6789/0}

election epoch 8, quorum 0,1,2 controller-02,controller-03,controller-01

osdmap e74: 12 osds: 12 up, 12 in

pgmap v38670: 1408 pgs, 10 pools, 18592 MB data, 4304 objects

56379 MB used, 20760 GB / 20815 GB avail

1408 active+clean

client io 5127 B/s wr, 2 op/s

Cause:

The cluster's maximum threshold for PGs per OSD is 300; once any OSD holds more than 300 PGs, the "too many PGs per OSD" warning is raised:

$ ceph --show-config | grep mon_pg_warn_max_per_osd

mon_pg_warn_max_per_osd = 300

Fix:

Check the current PG distribution:

PGs per OSD: sum the PG counts of all pools and divide by the number of OSDs to get the average number of PGs per OSD.

Edit ceph.conf on all monitor nodes:

$ vim /etc/ceph/ceph.conf

[global]

mon_pg_warn_max_per_osd = 0 # add this line at the end of the [global] section

Restart the monitors on all nodes:

$ sudo systemctl restart ceph-mon.target

If editing the config file feels like too much trouble, a single command also works:

ceph tell 'mon.*' injectargs '--mon_pg_warn_max_per_osd 0'

Settings changed with the tell command are temporary and are lost when the mon service restarts. The permanent fix is to add the setting to ceph.conf on the mon nodes and then restart the mon service.

13. Starting rgw fails

Symptom:

[[email protected] ~]# journalctl -u [email protected]|tail

7月 05 20:04:04 kube-node2 radosgw[29818]: 2018-07-05 20:04:04.799739 7fe6b8908e80 -1 ERROR: failed to initialize watch: (34) Numerical result out of range

7月 05 20:04:04 kube-node2 radosgw[29818]: 2018-07-05 20:04:04.802725 7fe6b8908e80 -1 Couldn’t init storage provider (RADOS)

7月 05 20:04:04 kube-node2 systemd[1]: [email protected]: main process exited, code=exited, status=5/NOTINSTALLED

7月 05 20:04:04 kube-node2 systemd[1]: Unit [email protected] entered failed state.

7月 05 20:04:04 kube-node2 systemd[1]: [email protected] failed.

7月 05 20:04:05 kube-node2 systemd[1]: [email protected] holdoff time over, scheduling restart.

7月 05 20:04:05 kube-node2 systemd[1]: start request repeated too quickly for [email protected]

7月 05 20:04:05 kube-node2 systemd[1]: Failed to start Ceph rados gateway.

7月 05 20:04:05 kube-node2 systemd[1]: Unit [email protected] entered failed state.

7月 05 20:04:05 kube-node2 systemd[1]: [email protected] failed.

Cause:

pg_num, pgp_num, mon_max_pg_per_osd or similar parameters are set incorrectly or too low.

Fix:

[[email protected] my-cluster]$ ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd=350' # also add the setting to ceph.conf to make it permanent

Reference:

http://tracker.ceph.com/issues/22351#note-11

14. s3test.py returns a 416 error

Running the s3test.py program from http://docs.ceph.com/docs/master/install/install-ceph-gateway/ fails:

[[email protected] cert]$ python s3test.py

Traceback (most recent call last):

File “s3test.py”, line 12, in

bucket = conn.create_bucket(‘my-new-bucket’)

File “/usr/lib/python2.7/site-packages/boto/s3/connection.py”, line 625, in create_bucket

response.status, response.reason, body)

boto.exception.S3ResponseError: S3ResponseError: 416 Requested Range Not Satisfiable

Cause:

pg_num, pgp_num, mon_max_pg_per_osd or similar parameters are set incorrectly or too low.

Fix:

[[email protected] my-cluster]$ ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd=350' # also add the setting to ceph.conf to make it permanent

Reference:

https://tracker.ceph.com/issues/21497

15. rbd map fails with a timeout

Symptom:

$ rbd feature disable foo fast-diff,object-map,exclusive-lock,deep-flatten

$ rbd map foo

rbd: sysfs write failed

In some cases useful info is found in syslog - try “dmesg | tail”.

rbd: map failed: (110) Connection timed out

Kernel log:

$ dmesg|tail

[2937489.640621] libceph: mon1 172.27.128.101:6789 feature set mismatch, my 106b84a842a42 < server’s 40106b84a842a42, missing 400000000000000

[2937489.643198] libceph: mon1 172.27.128.101:6789 missing required protocol features

[2937742.427929] libceph: mon2 172.27.128.102:6789 feature set mismatch, my 106b84a842a42 < server’s 40106b84a842a42, missing 400000000000000

[2937742.430234] libceph: mon2 172.27.128.102:6789 missing required protocol features

[2937752.725957] libceph: mon2 172.27.128.102:6789 feature set mismatch, my 106b84a842a42 < server’s 40106b84a842a42, missing 400000000000000

[2937752.728805] libceph: mon2 172.27.128.102:6789 missing required protocol features

[2937762.737960] libceph: mon2 172.27.128.102:6789 feature set mismatch, my 106b84a842a42 < server’s 40106b84a842a42, missing 400000000000000

[2937762.740282] libceph: mon2 172.27.128.102:6789 missing required protocol features

[2937772.722343] libceph: mon2 172.27.128.102:6789 feature set mismatch, my 106b84a842a42 < server’s 40106b84a842a42, missing 400000000000000

[2937772.724659] libceph: mon2 172.27.128.102:6789 missing required protocol features

Cause:

CRUSH_TUNABLES5, the feature behind bit 400000000000000, is only supported by Linux kernel v4.5 and later;

Fix:

Upgrade the kernel to 4.5 or newer;

or lower the cluster's feature requirements so that 400000000000000 is no longer needed;

For the latter, run ceph osd crush tunables hammer;

References:

http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019387.html

16. A Kubernetes PVC fails to create a PV through its StorageClass

Symptom:

$ kubectl get pv # no PV was created automatically

No resources found.

$ kubectl describe pvc | tail

Warning ProvisioningFailed 25m persistentvolume-controller Failed to provision volume with StorageClass “ceph”: failed to create rbd image: exit status 1, command output: 2018-07-26 17:35:05.749211 7f4e037687c0 -1 did not load config file, using default settings.

rbd: extraneous parameter --image-feature

$ kubectl get endpoints -n kube-system kube-controller-manager -o yaml|grep leader

control-plane.alpha.kubernetes.io/leader: ‘{“holderIdentity”:“m7-devops-128123”,“leaseDurationSeconds”:15,“acquireTime”:“2018-07-25T14:00:31Z”,“renewTime”:“2018-07-26T10:02:01Z”,“leaderTransitions”:3}’

$ journalctl -u kube-controller-manager |tail -4 # check the logs of the leader m7-devops-128123

7月 26 18:01:20 m7-devops-128123 kube-controller-manager[27948]: E0726 18:01:20.798621 27948 rbd.go:367] rbd: create volume failed, err: failed to create rbd image: exit status 1, command output: 2018-07-26 18:01:20.765861 7f912376db00 -1 did not load config file, using default settings.

7月 26 18:01:20 m7-devops-128123 kube-controller-manager[27948]: rbd: extraneous parameter --image-feature

7月 26 18:01:20 m7-devops-128123 kube-controller-manager[27948]: I0726 18:01:20.798671 27948 pv_controller.go:1317] failed to provision volume for claim “default/pvc-test-claim” with StorageClass “ceph”: failed to create rbd image: exit status 1, command output: 2018-07-26 18:01:20.765861 7f912376db00 -1 did not load config file, using default settings.

7月 26 18:01:20 m7-devops-128123 kube-controller-manager[27948]: rbd: extraneous parameter --image-feature

Cause:

The ceph-common version installed on the Kubernetes nodes is too old and is incompatible with the ceph cluster.

Fix:

Point the nodes' YUM repos at a source matching the ceph cluster's version, then update ceph-common:

$ cat /etc/yum.repos.d/ceph.repo

[Ceph]

name=Ceph packages for $basearch

baseurl=http://download.ceph.com/rpm-luminous/el7/$basearch

enabled=1

gpgcheck=1

type=rpm-md

gpgkey=https://download.ceph.com/keys/release.asc

priority=1

[ceph-noarch]

name=Ceph noarch packages

baseurl=https://download.ceph.com/rpm-luminous/el7/noarch

enabled=1

gpgcheck=1

type=rpm-md

gpgkey=https://download.ceph.com/keys/release.asc

priority=1

[ceph-source]

name=Ceph source packages

baseurl=http://download.ceph.com/rpm-luminous/el7/SRPMS

enabled=1

gpgcheck=1

type=rpm-md

gpgkey=https://download.ceph.com/keys/release.asc

priority=1

$ yum clean all && yum update

17. Pods using a PVC hang in ContainerCreating or Init:0/1 (multi-container case)

Symptom:

[[email protected] prometheus]# kubectl get pods -n devops-monitoring -o wide|grep -v Running

NAME READY STATUS RESTARTS AGE IP NODE

monitoring-prometheus-alertmanager-67674fb84b-pxfbz 0/2 ContainerCreating 0 16m m7-devops-128107

monitoring-prometheus-server-5777d76b75-l5bdv 0/2 Init:0/1 0 16m m7-devops-128107

Both prometheus-alertmanager and prometheus-server use PVCs backed by the StorageClass named ceph;

The pods never reach Running;

Inspect the StorageClass ceph:

[[email protected] ~]# kubectl get storageclass ceph -o yaml

apiVersion: storage.k8s.io/v1

kind: StorageClass

metadata:

creationTimestamp: 2018-07-26T09:49:30Z

name: ceph

resourceVersion: “462349”

selfLink: /apis/storage.k8s.io/v1/storageclasses/ceph

uid: 3116e9a4-90b9-11e8-b43c-0cc47a2af650

parameters:

adminId: admin

adminSecretName: ceph-secret-admin

adminSecretNamespace: default

imageFeatures: layering

imageFormat: “2”

monitors: 172.27.128.100:6789,172.27.128.101:6789,172.27.128.102:6789

pool: rbd

userId: admin

userSecretName: ceph-secret-admin

provisioner: kubernetes.io/rbd

reclaimPolicy: Delete

The adminSecret ceph-secret-admin used by this StorageClass lives in the default namespace given by adminSecretNamespace;

The kubelet log on the node in question:

[[email protected] ~]# journalctl -u kubelet -f

– Logs begin at 一 2018-07-30 18:42:12 CST. –

8月 01 17:19:41 m7-devops-128107 kubelet[6120]: E0801 17:19:41.578011 6120 rbd.go:504] failed to get secret from [“devops-monitoring”/“ceph-secret-admin”]

8月 01 17:19:41 m7-devops-128107 kubelet[6120]: E0801 17:19:41.578092 6120 rbd.go:126] Couldn’t get secret from devops-monitoring/&LocalObjectReference{Name:ceph-secret-admin,}

Cause:

The userSecretName referenced by the PVC's StorageClass must exist in the PVC's own namespace. ceph-secret-admin is not defined in the devops-monitoring namespace, hence the failure.

Fix:

Create the ceph-secret-admin secret in the PVC's namespace devops-monitoring; note that its type must be kubernetes.io/rbd;

[[email protected] k8s]# Secret=$(awk '/key = / {print $3}' /etc/ceph/ceph.client.admin.keyring | base64)

[[email protected] k8s]# cat > ceph-secret-admin.yaml <<EOF

apiVersion: v1

kind: Secret

type: kubernetes.io/rbd

metadata:

name: ceph-secret-admin

namespace: devops-monitoring

data:

key: $Secret

EOF

[[email protected] k8s]# kubectl create -f ceph-secret-admin.yaml
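With the secret in place, a PVC in that namespace referencing the StorageClass should get a PV provisioned automatically; a minimal sketch (the claim name and size are assumptions):

cat > pvc-test.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test-claim
  namespace: devops-monitoring
spec:
  storageClassName: ceph
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
kubectl create -f pvc-test.yaml
kubectl get pvc -n devops-monitoring pvc-test-claim   # should become Bound once the PV is provisioned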

Reference: https://kubernetes.io/docs/concepts/storage/storage-classes/#ceph-rbd