
Troubleshooting a Fan Fault on a Cisco UCS 5108 Blade Chassis

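After logging in to the customer's UCS Manager, I found that the fan in slot 6 of Chassis 2 was raising an alarm and its indicator LED was off. I swapped that fan with one from a healthy slot: the suspect fan worked normally in its new slot, while the previously good fan, once inserted into slot 6, still raised the alarm with its LED off. My initial conclusion was that the problem lay in the chassis itself.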

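So I opened a case with Cisco. A Cisco TAC engineer connected remotely to the 6248 fabric interconnect and confirmed from the command line that the fan was indeed not working properly. The output was as follows: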

magic:                0x486f7403        # OK

valid:                1

pid:                1554

interval:        15                # seconds

write_ts:        1451028436        # Fri Dec 25 07:27:16 2015

stale_ts:        1451028458        # Fri Dec 25 07:27:38 2015 OK

now:                1451028443        # Fri Dec 25 07:27:23 2015

status:                1                # ACTIVE

policy_state:        1                # COOL

xreading:        1                # DEVELOPER_MODE: FALSE

hwconf_valid:        1

maxfans:                8

fan[1].fault/read/req:        0/30/30        # OK

fan[2].fault/read/req:        0/30/30        # OK

fan[3].fault/read/req:        0/30/30        # OK

fan[4].fault/read/req:        0/30/30        # OK

fan[5].fault/read/req:        0/30/30        # OK

fan[6].fault/read/req:        1/0/30        # MISSING

fan[7].fault/read/req:        0/30/30        # OK

fan[8].fault/read/req:        0/30/30        # OK

The output above shows that the fan in slot 6 is reported as missing.
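When a dump like this has to be checked across several chassis, the fan lines can be scanned automatically. The following is a minimal Python sketch, not a Cisco tool: the function names `parse_fans` and `faulty_slots` are hypothetical, and reading the three numbers as fault flag, measured value, and requested value is an assumption based only on the `fault/read/req` label in the output.

```python
import re

# Matches lines such as "fan[6].fault/read/req:  1/0/30  # MISSING".
# Field meanings (fault flag / reading / requested value) are assumed
# from the label in the dump, not from Cisco documentation.
FAN_LINE = re.compile(
    r"fan\[(\d+)\]\.fault/read/req:\s*(\d+)/(\d+)/(\d+)\s*#\s*(\S+)"
)


def parse_fans(dump):
    """Return one dict per fan line found in the dump text."""
    fans = []
    for m in FAN_LINE.finditer(dump):
        slot, fault, read, req = (int(g) for g in m.groups()[:4])
        fans.append(
            {"slot": slot, "fault": fault, "read": read,
             "req": req, "status": m.group(5)}
        )
    return fans


def faulty_slots(fans):
    """Slots whose fault flag is set or whose status comment is not OK."""
    return [f["slot"] for f in fans if f["fault"] != 0 or f["status"] != "OK"]


# Three lines copied from the dump above.
sample = """\
fan[5].fault/read/req:        0/30/30        # OK
fan[6].fault/read/req:        1/0/30        # MISSING
fan[7].fault/read/req:        0/30/30        # OK
"""

print(faulty_slots(parse_fans(sample)))  # [6]
```

Run against the three sample lines, the sketch flags only slot 6, which matches what the TAC engineer read from the full output.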

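After reviewing the logs and related information, the Cisco TAC engineer suspected that the problem was on the internal bus of the UCS 5108 and recommended the following: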

Step 1:

Remove PSU1, let sit for 2 minutes, replace, wait 10 seconds, confirm PSU1 has power, move to PSU2

Remove PSU2, let sit for 2 minutes, replace, wait 10 seconds, confirm PSU2 has power, move to PSU3

Remove PSU3, let sit for 2 minutes, replace, wait 10 seconds, confirm PSU3 has power, move to PSU4

Remove PSU4, let sit for 2 minutes, replace, wait 10 seconds, confirm PSU4 has power, move to Fan1

Step 2:

Basically, the power supply reseat does not have any business impact, so you can do it now.

If the issue still exists, please take the action below in a maintenance window:

Remove Fan1, let sit for 30 seconds, replace, wait 10 seconds, confirm Fan1 has power, move to Fan2

Remove Fan2, let sit for 30 seconds, replace, wait 10 seconds, confirm Fan2 has power, move to Fan3

Remove Fan3, let sit for 30 seconds, replace, wait 10 seconds, confirm Fan3 has power, move to Fan4

Remove Fan4, let sit for 30 seconds, replace, wait 10 seconds, confirm Fan4 has power, move to Fan5

Remove Fan5, let sit for 30 seconds, replace, wait 10 seconds, confirm Fan5 has power, move to Fan6

Remove Fan6, let sit for 30 seconds, replace, wait 10 seconds, confirm Fan6 has power, move to Fan7

Remove Fan7, let sit for 30 seconds, replace, wait 10 seconds, confirm Fan7 has power, move to Fan8

Remove Fan8, let sit for 30 seconds, replace, wait 10 seconds, confirm Fan8 has power

Step 3:

Remove right IOM, let sit for 5 minutes, replace, confirm that IO MOD is Up and Running before you reseat left IOM

Once right IOM is Up and Running, finally reseat left IOM: let sit for 5 minutes, and place it back into the chassis.

Final step:

If all the above did not fix the issue, then you need to power-cycle the whole chassis 2.

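By the time I reached step 3, the *** alert box on the fan had disappeared, but the command output still showed the fan as not working properly. I therefore carried out the final step, and once the chassis had finished restarting, the command output showed the fan working normally.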

pid:                1511

write_ts:        1451289000        # Mon Dec 28 07:50:00 2015

stale_ts:        1451289022        # Mon Dec 28 07:50:22 2015 OK

now:                1451289010        # Mon Dec 28 07:50:10 2015

fan[6].fault/read/req:        0/30/30        # OK

With that, the fault was resolved.
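Both dumps carry `write_ts`, `stale_ts`, and `now` as Unix epoch seconds, and in both the `stale_ts` line is marked OK while `now` is still earlier than `stale_ts`. The sketch below decodes those timestamps and applies that comparison. It rests on two assumptions of mine, not on anything Cisco documents: that OK means `now < stale_ts`, and that the dates in the comments are UTC (they happen to match a UTC rendering). The helper names `fmt` and `is_fresh` are hypothetical.

```python
from datetime import datetime, timezone


def fmt(ts):
    """Render epoch seconds in the same style as the dump comments (UTC assumed)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%a %b %d %H:%M:%S %Y")


def is_fresh(now, stale_ts):
    """Assumed meaning of the OK marker: the data has not yet gone stale."""
    return now < stale_ts


# Values copied from the two dumps in this post.
dumps = {
    "before": {"write_ts": 1451028436, "stale_ts": 1451028458, "now": 1451028443},
    "after": {"write_ts": 1451289000, "stale_ts": 1451289022, "now": 1451289010},
}

for name, d in dumps.items():
    window = d["stale_ts"] - d["write_ts"]  # seconds until the reading counts as stale
    print(name, fmt(d["write_ts"]), "window:", window, "fresh:", is_fresh(d["now"], d["stale_ts"]))
```

In both dumps the window between `write_ts` and `stale_ts` is 22 seconds, a little longer than the 15-second `interval`, so a check like this mainly confirms that the fan readings being compared were current when they were captured.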

*************************************************************************************

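Postscript: Although swapping parts can locate a fault quickly, it is not always accurate. The problem is not necessarily in the hardware; it can also be a software problem.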

Thanks to Engineer Zhao of Cisco TAC!

This article is reposted from 彎月樓主's 51CTO blog. Original link: http://blog.51cto.com/05wylz/1729934. To republish it, please contact the original author.
