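After logging in to the user's UCS Manager, I found that the fan in slot 6 of Chassis 2 was raising an alarm and its indicator LED was off. I swapped that fan with one from a healthy slot, and it worked normally again. However, the previously good fan, once inserted into slot 6, still raised the alarm with its LED off. My preliminary conclusion was that the problem lay in the chassis itself.
So I opened a case with Cisco. A Cisco TAC engineer connected remotely to the 6248 and confirmed with a command that the fan was indeed not working properly. The output was as follows: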
magic: 0x486f7403 # OK
valid: 1
pid: 1554
interval: 15 # seconds
write_ts: 1451028436 # Fri Dec 25 07:27:16 2015
stale_ts: 1451028458 # Fri Dec 25 07:27:38 2015 OK
now: 1451028443 # Fri Dec 25 07:27:23 2015
status: 1 # ACTIVE
policy_state: 1 # COOL
xreading: 1 # DEVELOPER_MODE: FALSE
hwconf_valid: 1
maxfans: 8
fan[1].fault/read/req: 0/30/30 # OK
fan[2].fault/read/req: 0/30/30 # OK
fan[3].fault/read/req: 0/30/30 # OK
fan[4].fault/read/req: 0/30/30 # OK
fan[5].fault/read/req: 0/30/30 # OK
fan[6].fault/read/req: 1/0/30 # MISSING
fan[7].fault/read/req: 0/30/30 # OK
fan[8].fault/read/req: 0/30/30 # OK
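The output above shows that the fan in slot 6 is reported as missing (fault 1, read 0, req 30).
After reading the logs and related information, the Cisco TAC engineer suspected that the problem was on the internal bus of the UCS5108, and gave the following recommendations: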
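The fan lines in the dump above are regular enough to check with a script instead of by eye. The following is a minimal sketch, assuming the text has already been captured into a string; the function names `parse_fans` and `faulty_slots` are hypothetical, and the only rule applied is "fault flag is non-zero", which is an assumption based on the `fault/read/req` labels rather than on Cisco documentation.

```python
import re

# Assumed line shape, copied from the dump above:
#   fan[6].fault/read/req: 1/0/30 # MISSING
FAN_LINE = re.compile(
    r"fan\[(\d+)\]\.fault/read/req:\s*(\d+)/(\d+)/(\d+)\s*#\s*(\S+)"
)

def parse_fans(dump):
    """Return {slot: {"fault", "read", "req", "note"}} for every fan line found."""
    fans = {}
    for m in FAN_LINE.finditer(dump):
        slot, fault, read, req = (int(g) for g in m.groups()[:4])
        fans[slot] = {"fault": fault, "read": read, "req": req, "note": m.group(5)}
    return fans

def faulty_slots(fans):
    """Slots whose fault flag is non-zero (assumed meaning of the first field)."""
    return sorted(slot for slot, f in fans.items() if f["fault"] != 0)

# Sample taken verbatim from the output in this article.
SAMPLE = """\
fan[1].fault/read/req: 0/30/30 # OK
fan[2].fault/read/req: 0/30/30 # OK
fan[3].fault/read/req: 0/30/30 # OK
fan[4].fault/read/req: 0/30/30 # OK
fan[5].fault/read/req: 0/30/30 # OK
fan[6].fault/read/req: 1/0/30 # MISSING
fan[7].fault/read/req: 0/30/30 # OK
fan[8].fault/read/req: 0/30/30 # OK
"""

if __name__ == "__main__":
    fans = parse_fans(SAMPLE)
    for slot in faulty_slots(fans):
        print(f"slot {slot}: {fans[slot]}")
```

Run against the sample, the only slot it flags is slot 6, matching the manual reading of the dump.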
Step 1:
Remove PSU1, let it sit for 2 minutes, reinsert it, wait 10 seconds, confirm PSU1 has power, then move to PSU2.
Remove PSU2, let it sit for 2 minutes, reinsert it, wait 10 seconds, confirm PSU2 has power, then move to PSU3.
Remove PSU3, let it sit for 2 minutes, reinsert it, wait 10 seconds, confirm PSU3 has power, then move to PSU4.
Remove PSU4, let it sit for 2 minutes, reinsert it, wait 10 seconds, confirm PSU4 has power, then move to Fan1.
Step 2:
The power supply reseat has no business impact, so you can do it now.
If the issue still exists, please take the action below in a maintenance window:
Remove Fan1, let it sit for 30 seconds, reinsert it, wait 10 seconds, confirm Fan1 has power, then move to Fan2.
Remove Fan2, let it sit for 30 seconds, reinsert it, wait 10 seconds, confirm Fan2 has power, then move to Fan3.
Remove Fan3, let it sit for 30 seconds, reinsert it, wait 10 seconds, confirm Fan3 has power, then move to Fan4.
Remove Fan4, let it sit for 30 seconds, reinsert it, wait 10 seconds, confirm Fan4 has power, then move to Fan5.
Remove Fan5, let it sit for 30 seconds, reinsert it, wait 10 seconds, confirm Fan5 has power, then move to Fan6.
Remove Fan6, let it sit for 30 seconds, reinsert it, wait 10 seconds, confirm Fan6 has power, then move to Fan7.
Remove Fan7, let it sit for 30 seconds, reinsert it, wait 10 seconds, confirm Fan7 has power, then move to Fan8.
Remove Fan8, let it sit for 30 seconds, reinsert it, wait 10 seconds, confirm Fan8 has power.
Step 3:
Remove the right IOM, let it sit for 5 minutes, reinsert it, and confirm that the IO module is up and running before you reseat the left IOM.
Once the right IOM is up and running, reseat the left IOM: let it sit for 5 minutes, then place it back into the chassis.
Final step:
If none of the above fixes the issue, then you need to power-cycle the whole chassis 2.
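By the time I had carried out step 3, the *** alarm box on the fan had disappeared, but the command still showed that the fan was not working properly, so I carried out the final step. Once the chassis had finished rebooting, the command showed the fan working normally: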
pid: 1511
write_ts: 1451289000 # Mon Dec 28 07:50:00 2015
stale_ts: 1451289022 # Mon Dec 28 07:50:22 2015 OK
now: 1451289010 # Mon Dec 28 07:50:10 2015
fan[6].fault/read/req: 0/30/30 # OK
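Both dumps also carry `write_ts`, `stale_ts` and `now` as Unix epoch values, and the annotated dates match those values when rendered in UTC. The sketch below shows one way to check that a captured dump is still fresh before trusting its fan readings. It assumes, based only on the `OK` annotation next to `stale_ts`, that a dump counts as fresh while `now` is earlier than `stale_ts`; the helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches lines such as: "write_ts: 1451028436 # Fri Dec 25 07:27:16 2015"
FIELD = re.compile(r"^[ \t]*(write_ts|stale_ts|now):\s*(\d+)", re.MULTILINE)

def parse_timestamps(dump):
    """Return the three epoch fields as integers, keyed by field name."""
    return {name: int(value) for name, value in FIELD.findall(dump)}

def is_fresh(ts):
    """Assumed rule: data is fresh while write_ts <= now < stale_ts."""
    return ts["write_ts"] <= ts["now"] < ts["stale_ts"]

def as_utc(epoch):
    """Render an epoch value in the same style as the dump's comments (UTC)."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(
        "%a %b %d %H:%M:%S %Y"
    )

# Timestamp lines copied from the dump taken after the chassis power cycle.
AFTER = """\
write_ts: 1451289000 # Mon Dec 28 07:50:00 2015
stale_ts: 1451289022 # Mon Dec 28 07:50:22 2015 OK
now: 1451289010 # Mon Dec 28 07:50:10 2015
"""

if __name__ == "__main__":
    ts = parse_timestamps(AFTER)
    print("fresh:", is_fresh(ts))
    print("written at:", as_utc(ts["write_ts"]))
```

In both dumps `stale_ts` is 22 seconds after `write_ts`, so a reading older than that would be worth re-collecting before drawing conclusions from it.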
At this point the fault was resolved.
*************************************************************************************
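Postscript: although swapping parts can quickly localize a fault, it is not always accurate. The problem is not necessarily in the hardware; it may also be in the software.
Thanks to Engineer Zhao of Cisco TAC!
This article is reposted from 彎月樓主's 51CTO blog. Original link: http://blog.51cto.com/05wylz/1729934. To repost it, please contact the original author yourself.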