2 memory errors found. Application name: kernel:. Log content:
hangzhou-jishuan-DDS0248 kernel: sbridge: HANDLING MCE MEMORY ERROR
hangzhou-jishuan-DDS0248 kernel: EDAC MC0: CE row 5, channel 0, label CPU_SrcID#0_Channel#3_DIMM#1:1 Unknown error(s): memory scrubbing on FATAL area : cpu=0 Err=0008:00c1 (ch=1), addr = 0x1c9bea000 = socket=0, Channel=3(mask=8), rank=5
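When the label has to be fished out of a long log, a small filter can help. The sketch below is a POSIX shell function (the name parse_edac_label is hypothetical) that reads log text on stdin and prints the CPU_SrcID, Channel and DIMM numbers from the first matching label. It assumes the label has exactly the CPU_SrcID#N_Channel#N_DIMM#N form seen in the log above; other kernel or driver versions may format the label differently and will not match.

```shell
# Hypothetical helper: extract "CPU_SrcID Channel DIMM" from an EDAC label
# such as CPU_SrcID#0_Channel#3_DIMM#1. Log text is read from stdin.
# Assumption: the label uses exactly this CPU_SrcID#N_Channel#N_DIMM#N format.
parse_edac_label() {
  # -o prints only the matched label; head keeps the first occurrence.
  grep -oE 'CPU_SrcID#[0-9]+_Channel#[0-9]+_DIMM#[0-9]+' \
    | head -n 1 \
    | sed -E 's/CPU_SrcID#([0-9]+)_Channel#([0-9]+)_DIMM#([0-9]+)/\1 \2 \3/'
}
```

Usage: `dmesg | parse_edac_label`, or point it at the system log (for example `grep 'EDAC MC' /var/log/messages | parse_edac_label`; that path is an assumption and differs between distributions). For the EDAC line above it prints `0 3 1`.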
How do you tell which DIMM is the faulty one?
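Collect the server's memory information (you can give this to the hardware vendor's engineer when filing the repair request, but remember to tell them it is for reference only).
Shell command (run as root): dmidecode | grep -A 9 -B 6 DIMM | grep Bank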
Bank Locator: BRANCH 0 CHANNEL 1 DIMM 0
Bank Locator: BRANCH 0 CHANNEL 1 DIMM 1
Bank Locator: BRANCH 0 CHANNEL 2 DIMM 0
Bank Locator: BRANCH 0 CHANNEL 2 DIMM 1
Bank Locator: BRANCH 0 CHANNEL 3 DIMM 0
Bank Locator: BRANCH 0 CHANNEL 3 DIMM 1
Bank Locator: BRANCH 1 CHANNEL 1 DIMM 0
Bank Locator: BRANCH 1 CHANNEL 1 DIMM 1
Bank Locator: BRANCH 1 CHANNEL 2 DIMM 0
Bank Locator: BRANCH 1 CHANNEL 2 DIMM 1
Bank Locator: BRANCH 1 CHANNEL 3 DIMM 0
Bank Locator: BRANCH 1 CHANNEL 3 DIMM 1
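Number the entries above 1 to 12 from top to bottom. The error label CPU_SrcID#0_Channel#3_DIMM#1 gives CPU_SrcID 0, CHANNEL 3, DIMM 1; taking CPU_SrcID 0 as BRANCH 0, this is the entry BRANCH 0 CHANNEL 3 DIMM 1.
That entry is 6th in the list, so the 6th DIMM is the faulty one. Put another way, it is in the memory region controlled by the first CPU, on CHANNEL 3, with DIMM ID 1.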
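The counting step can also be scripted. The hypothetical function find_dimm_slot below reads dmidecode output on stdin and prints the 1-based position of the requested BRANCH/CHANNEL/DIMM entry among the "Bank Locator" lines, which is the same top-to-bottom count used above. It assumes CPU_SrcID N maps to BRANCH N and that the locator strings look exactly like the ones listed here; both are vendor-specific, so treat the result as a hint rather than a confirmed physical slot.

```shell
# Hypothetical helper: print the 1-based index of "BRANCH $1 CHANNEL $2 DIMM $3"
# among the Bank Locator lines read from stdin. Exits non-zero if not found.
# Assumption: locator strings follow the "BRANCH n CHANNEL n DIMM n" pattern.
find_dimm_slot() {
  awk -v want="BRANCH $1 CHANNEL $2 DIMM $3" '
    /Bank Locator:/ {
      n++                                     # count entries top to bottom
      loc = $0
      sub(/^.*Bank Locator:[ \t]*/, "", loc)  # keep only the locator value
      sub(/[ \t\r]+$/, "", loc)               # drop trailing whitespace
      if (loc == want) { print n; found = 1; exit }
    }
    END { if (!found) exit 1 }                # no matching entry
  '
}
```

Usage: `dmidecode -t memory | find_dimm_slot 0 3 1` (as root). For the 12-entry listing above it prints `6`; if no entry matches, it prints nothing and returns a non-zero status.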