NetApp storage fault leaves the database on the minicomputer unreachable
1. Use sysconfig -r to check system and disk status
SBJYJ-02> sysconfig -r
Aggregate aggr0 (online, raid_dp, degraded, hybrid_enabled) (block checksums)
Plex /aggr0/plex0 (online, normal, active)
RAID group /aggr0/plex0/rg0 (degraded, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0b.10.1 0b 10 1 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
parity 0b.20.1 0b 20 1 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.17 0b 20 17 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.3 0b 10 3 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.3 0b 20 3 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 4c.30.3 4c 30 3 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.5 0b 10 5 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.5 0b 20 5 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 4c.30.5 4c 30 5 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.7 0b 10 7 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data FAILED N/A 560000/ -
data 4c.30.7 4c 30 7 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.9 0b 10 9 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.9 0b 20 9 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 4c.30.9 4c 30 9 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.11 0b 10 11 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.11 0b 20 11 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
RAID group /aggr0/plex0/rg1 (double degraded, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 4c.30.11 4c 30 11 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
parity 0b.20.13 0b 20 13 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.13 0b 10 13 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 4c.30.13 4c 30 13 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.15 0b 20 15 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data FAILED N/A 560000/ -
data 4c.30.15 4c 30 15 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.23 0b 10 23 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 4c.30.1 4c 30 1 SA:A 0 SAS 15000 560000/1146880000 560879/1148681096
data 4c.30.17 4c 30 17 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.19 0b 20 19 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data FAILED N/A 560000/ -
data 4c.30.19 4c 30 19 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.20.21 0b 20 21 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.21 0b 10 21 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data 0b.10.15 0b 10 15 SA:B 0 SAS 15000 560000/1146880000 560879/1148681096
data 0b.20.23 0b 20 23 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.1 0a 0 1 SA:B 0 SSD N/A 190532/390209536 190782/390721968
spare 0a.00.3 0a 0 3 SA:B 0 SSD N/A 190532/390209536 190782/390721968
Broken disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
failed 0b.10.17 0b 10 17 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 0b.10.19 0b 10 19 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 0b.20.7 0b 20 7 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 4c.30.23 4c 30 23 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
Partner disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
partner 0b.10.22 0b 10 22 SA:B 0 SAS 15000 560000/1146880000 560879/1148681096
partner 4c.30.21 4c 30 21 SA:A 0 SAS 15000 0/0 560879/1148681096
partner 4c.30.22 4c 30 22 SA:A 0 SAS 15000 0/0 560879/1148681096
partner 4c.20.10 4c 20 10 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.2 0b 10 2 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.16 0b 10 16 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.20 0b 10 20 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.0 0b 10 0 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.18 0b 10 18 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.8 0b 10 8 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.14 4c 30 14 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.12 4c 20 12 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.6 4c 30 6 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.6 4c 20 6 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.20 4c 30 20 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.2 4c 20 2 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.16 4c 20 16 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.0 4c 30 0 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.16 4c 30 16 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.8 4c 20 8 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.2 4c 30 2 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.12 0b 10 12 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.14 0b 10 14 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.10 0b 10 10 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.4 0b 10 4 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 0b.10.6 0b 10 6 SA:B 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.14 4c 20 14 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.22 4c 20 22 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.12 4c 30 12 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.20 4c 20 20 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.18 4c 20 18 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.10 4c 30 10 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.18 4c 30 18 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.0 4c 20 0 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.20.4 4c 20 4 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.4 4c 30 4 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4c.30.8 4c 30 8 SA:A 0 SAS 15000 0/0 560208/1147307688
partner 4a.00.2 4a 0 2 SA:A 0 SSD N/A 0/0 190782/390721968
partner 4a.00.0 4a 0 0 SA:A 0 SSD N/A 0/0 190782/390721968
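The aggregate is configured as RAID-DP, so the initial diagnosis is failed disks, with the data itself not affected.
The output above shows four failed disks in total, spread across two RAID groups. vol status -r shows that the hot spares are used up (the only spares left are two SSDs, while the failed disks are SAS), so the failed disks cannot be replaced automatically. The disks in the lower part of the listing are partner disks that controller 2 has not taken over, so their detailed status is not visible from here.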
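The reading of the output above can be sketched as a small parser. This is a minimal sketch, not a NetApp tool: it assumes the whitespace-separated column layout shown in the listing, the names summarize, SAMPLE and TYPE_COLUMN are made up for illustration, and comparing the Type column is only a rough proxy for whether a spare can stand in for a failed disk.

```python
import re
from collections import Counter

# Shortened excerpt of the sysconfig -r output shown above.
SAMPLE = """\
RAID group /aggr0/plex0/rg0 (degraded, block checksums)
data 0b.10.7 0b 10 7 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
data FAILED N/A 560000/ -
RAID group /aggr0/plex0/rg1 (double degraded, block checksums)
data FAILED N/A 560000/ -
data 4c.30.15 4c 30 15 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
data FAILED N/A 560000/ -
data FAILED N/A 560000/ -
Pool0 spare disks
spare 0a.00.1 0a 0 1 SA:B 0 SSD N/A 190532/390209536 190782/390721968
spare 0a.00.3 0a 0 3 SA:B 0 SSD N/A 190532/390209536 190782/390721968
Broken disks
failed 0b.10.17 0b 10 17 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 0b.10.19 0b 10 19 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 0b.20.7 0b 20 7 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 4c.30.23 4c 30 23 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
"""

MEMBER_ROLES = {"dparity", "parity", "data"}
TYPE_COLUMN = 7  # assumed position of the Type column (SAS, SSD) after splitting


def summarize(text):
    """Return (failed slots per RAID group, spare types, broken disk types)."""
    failed_slots = Counter()
    spares, broken = {}, {}
    group = None
    for line in text.splitlines():
        header = re.match(r"\s*RAID group (\S+) \(", line)
        if header:
            group = header.group(1)
            failed_slots[group] = 0
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        role, device = fields[0], fields[1]
        if role in MEMBER_ROLES and device == "FAILED" and group is not None:
            failed_slots[group] += 1
        elif role in ("spare", "failed") and len(fields) > TYPE_COLUMN:
            target = spares if role == "spare" else broken
            target[device] = fields[TYPE_COLUMN]
    return dict(failed_slots), spares, broken


failed_slots, spares, broken = summarize(SAMPLE)
needed_types = set(broken.values())
usable = [disk for disk, kind in spares.items() if kind in needed_types]
print("failed slots per RAID group:", failed_slots)
print("broken disks:", sorted(broken))
print("spares matching a broken disk type:", usable)
```

On the excerpt, no spare matches the type of the broken disks, which agrees with the observation that the failed disks cannot be replaced from the spare pool.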
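2. Replace the disks owned by controller 2
① Pull the failed disk and wait 30 seconds (the disk may still be spinning after it loses power; waiting avoids physical damage that would make data recovery harder), then insert the new disk (confirm that the amber LED is on and the green LED is not blinking).
② After inserting the disk, run disk show -v to check whether the new disk has been assigned an owner (controller).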
DISK OWNER POOL SERIAL NUMBER DR HOME CHKSUM
------------ ------------- ----- ------------- ------------- -------
4a.00.0 esad (22312421) Pool0 S142NEAD806058 SBJYJ-01 (2017242430) Block
4a.00.2 SBJYJ-01 (2017242430) Pool0 S142NEAD803850 SBJYJ-01 (2017242430) Block
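The newly inserted disk does not belong to this controller; it still carries ownership or RAID information from another controller.
Use disk assign -f <disk_id> -s <owner_id> to force-assign it to a controller (*use with caution), then run disk show -v again to confirm that the assignment succeeded.
Use aggr destroy <aggr_name> to delete an aggregate (*use with caution).
③ Run vol status -r to check the disk state. If the disk shows bad label, follow step (1); if the disk is already listed under spare disks and its row ends with not zeroed, follow step (2).
(1) vol status -r shows a disk tagged bad label even though the new disk is already installed.
Run priv set advanced to enter advanced mode, clear the label with disk unfail -s <disk_id, e.g. 0b.**.**>, then leave advanced mode with priv set.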
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
failed 0b.10.19 0b 10 19 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 0b.20.7 0b 20 7 SA:B 0 SAS 15000 560000/1146880000 560208/1147307688
failed 4c.30.23 4c 30 23 SA:A 0 SAS 15000 560000/1146880000 560208/1147307688
bad label 0b.10.17 0b 10 17 SA:B 0 SAS 15000 560000/1146880000 560879/1148681096
(2) A disk in the spare list is followed by not zeroed:
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.1 0a 0 1 SA:B 0 SSD N/A 190532/390209536 190782/390721968 not zeroed
spare 0a.00.3 0a 0 3 SA:B 0 SSD N/A 190532/390209536 190782/390721968
Run disk zero spares to zero all spare disks.
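The checks in steps ② and ③ can be sketched as two small helpers. This is only an illustration under assumptions: the function names foreign_disks and next_action are hypothetical, the parsing relies on the flattened column layout shown above, and the code only suggests commands; it never runs them against the filer.

```python
import re

# Two rows copied from the disk show -v output above.
DISK_SHOW = """\
4a.00.0 esad (22312421) Pool0 S142NEAD806058 SBJYJ-01 (2017242430) Block
4a.00.2 SBJYJ-01 (2017242430) Pool0 S142NEAD803850 SBJYJ-01 (2017242430) Block
"""

# disk, owner name (sysid), pool, serial number, home name (sysid)
OWNER_ROW = re.compile(
    r"^(\S+)\s+(\S+)\s*\((\d+)\)\s+(Pool\d+)\s+(\S+)\s+(\S+)\s*\((\d+)\)"
)


def foreign_disks(disk_show_text):
    """Return (disk, owner, home) for rows whose owner sysid differs from home."""
    result = []
    for line in disk_show_text.splitlines():
        m = OWNER_ROW.match(line.strip())
        if m and m.group(3) != m.group(7):
            result.append((m.group(1), m.group(2), m.group(6)))
    return result


def next_action(row):
    """Map one disk row from vol status -r to the commands described above."""
    fields = row.split()
    if fields[:2] == ["bad", "label"] and len(fields) > 2:
        # Step (1): clear the bad label in advanced mode, then drop back.
        return ["priv set advanced", "disk unfail -s " + fields[2], "priv set"]
    if fields[:1] == ["spare"] and "not zero" in row:
        # Step (2): the spare still has to be zeroed.
        return ["disk zero spares"]
    return []


ROWS = [
    "bad label 0b.10.17 0b 10 17 SA:B 0 SAS 15000 560000/1146880000 560879/1148681096",
    "spare 0a.00.1 0a 0 1 SA:B 0 SSD N/A 190532/390209536 190782/390721968 not zeroed",
    "spare 0a.00.3 0a 0 3 SA:B 0 SSD N/A 190532/390209536 190782/390721968",
]

print("disks owned by another system:", foreign_disks(DISK_SHOW))
for row in ROWS:
    print(row.split()[:3], "->", next_action(row))
```

Matching on the substring "not zero" is a deliberate choice so that small spelling differences in the captured output do not break the check.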
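3. Bring up controller 1
Because of the disk failures, controller 1 shut itself down to protect the disks, so that continued user access could not damage more disks and cause data loss.
Connect to the storage directly with a console cable; it drops to the LOADER> prompt. Run help to list the available commands, then run boot_ontap to boot the controller (if it does not boot, the controller itself may be faulty). Once the controller is up, log in to it and replace its disks in the same way.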
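Summary
Reconstruction and zeroing take some time. Once the disks have been rebuilt and zeroed, the RAID groups leave the degraded state automatically and the storage runs normally again.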
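One way to confirm that the rebuild has finished is to look at the status in parentheses on each RAID group line of sysconfig -r. The sketch below assumes that line format; degraded_groups is a hypothetical helper name, and the normal status in the AFTER sample is an assumption about what a healthy group reports, not output captured from this system.

```python
import re

GROUP_LINE = re.compile(r"RAID group (\S+) \(([^)]*)\)")

# RAID group lines as they appear in the output captured during the fault.
BEFORE = """\
RAID group /aggr0/plex0/rg0 (degraded, block checksums)
RAID group /aggr0/plex0/rg1 (double degraded, block checksums)
"""

# Assumed appearance once reconstruction has completed.
AFTER = """\
RAID group /aggr0/plex0/rg0 (normal, block checksums)
RAID group /aggr0/plex0/rg1 (normal, block checksums)
"""


def degraded_groups(text):
    """Return the RAID groups whose status list still mentions 'degraded'."""
    groups = []
    for name, status in GROUP_LINE.findall(text):
        states = [s.strip() for s in status.split(",")]
        if any("degraded" in s for s in states):
            groups.append(name)
    return groups


print("still degraded (before):", degraded_groups(BEFORE))
print("still degraded (after):", degraded_groups(AFTER))
```

An empty list means no RAID group line reports a degraded state any more.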
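Reference commands
sysconfig -v          View storage status (verbose hardware configuration)
sysconfig -r          View disk and RAID status
sysconfig -a          View detailed system information
vol status -v         View volume status
vol status -f         Check for failed disks
disk show             View disk ownership assignments
disk show -v          View which controller owns each disk
storage show disk -p  View disk location (paths, shelf, and bay)
disk zero spares      Zero all spare disks
environment status    Check environment status (power supplies, fans)
rdfile /etc/messages  Check the most recent log messages
cf status             Check controller (failover) status
df                    Check volume space usage
license               View license information
ifconfig -a           View network configuration
aggr status           View aggregate and RAID group information
aggr status -r        View aggregate and RAID group details
df -Vh                View volume space
df -Ah                View aggregate space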
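disk assign -f <disk_id> -s <owner_id>        Force-assign a disk to a controller *use with caution
disk replace start disk_name spare_disk_name  Replace a disk with a spare disk *use with caution
disk replace stop disk_name                   Stop a disk replacement *use with caution
disk sanitize start disk_name                 Erase all data on the disk *use with caution
disk sanitize abort disk_name                 Abort the sanitize operation *use with caution
aggr destroy aggrname                         Delete an aggregate *use with caution
disk remove_ownership disk_name               Remove the disk's owner *use with caution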
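Because the list mixes harmless status commands with destructive ones, a script that collects health information could refuse anything outside a read-only allowlist. The following is a hedged sketch: it assumes SSH access to the controller is enabled, filer01 is a made-up host name, ssh_argv is a hypothetical helper, and the allowlist is simply the status commands from the list above.

```python
# Status commands from the reference list that only read state.
READ_ONLY = {
    "sysconfig -v", "sysconfig -r", "sysconfig -a",
    "vol status -v", "vol status -f",
    "disk show", "disk show -v", "storage show disk -p",
    "environment status", "rdfile /etc/messages", "cf status",
    "df", "df -Vh", "df -Ah", "license", "ifconfig -a",
    "aggr status", "aggr status -r",
}


def ssh_argv(host, command):
    """Build an ssh argument list, rejecting commands outside the allowlist."""
    if command not in READ_ONLY:
        raise ValueError("refusing non-read-only command: " + command)
    return ["ssh", host, command]


# The argument list could be passed to subprocess.run(); it is only printed here.
print(ssh_argv("filer01", "sysconfig -r"))

try:
    ssh_argv("filer01", "aggr destroy aggr0")
except ValueError as err:
    print(err)
```

Commands marked *use with caution stay out of the allowlist on purpose, so they can only be run by hand.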