
RAC Node Eviction Caused by Lost Heartbeat

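We were in the middle of a wave of database migrations when the host of one of our production business databases, which was still being built, rebooted. What caused it?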

[root@nqzeyddb2 ~]# uptime

15:42:46 up 5 days, 22:29,  4 users,  load average: 0.08, 0.09, 0.10

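The database alert log showed no obvious warnings, but the CRS log contained clear heartbeat timeout warnings. I suspected that the heartbeat NIC had lost connectivity, causing a split-brain that ended in a node eviction. The CSSD entries report that node nqzeyddb1 "has a disk HB, but no network HB", which fits that suspicion.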

2021-10-09 00:47:55.743: [    CSSD][580302592]clssnmPollingThread: node nqzeyddb1 (1) at 50% heartbeat fatal, removal in 14.910 seconds

2021-10-09 00:47:55.743: [    CSSD][580302592]clssnmPollingThread: node nqzeyddb1 (1) is impending reconfig, flag 2491406, misstime 15090

2021-10-09 00:47:55.743: [    CSSD][580302592]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)

2021-10-09 00:47:55.743: [    CSSD][586610432]clssnmvDHBValidateNcopy: node 1, nqzeyddb1, has a disk HB, but no network HB, DHB has rcfg 528920094, wrtcnt, 880951, LATS 286157624, lastSeqNo 880924, uniqueness 1633579205, timestamp 1633711660/286147104

2021-10-09 00:47:55.863: [    CSSD][589764352]clssnmvDiskPing: Writing with status 0x3, timestamp 1633711675/286157744

2021-10-09 00:47:56.144: [    CSSD][594495232]clssnmvDiskPing: Writing with status 0x3, timestamp 1633711676/286158024

2021-10-09 00:47:56.306: [    CSSD][943818496]clssgmpcBuildNodeList: nodename for node 0 is NULL

2021-10-09 00:48:07.745: [    CSSD][580302592]clssnmPollingThread: node nqzeyddb1 (1) at 90% heartbeat fatal, removal in 2.910 seconds,

2021-10-09 00:48:10.656: [    CSSD][580302592]clssnmMarkNodeForRemoval: node 1, nqzeyddb1 marked for removal
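The timings in the excerpt above are internally consistent, and a short script can check them. This is a minimal sketch, not an Oracle tool: the function names (`analyze`, `implied_timeout_s`, `drift_ms`) are made up for illustration, and the regular expressions assume the exact line layout shown in this excerpt. It adds the reported `misstime` to the remaining time at the 50% warning to get the implied network heartbeat timeout, then compares each predicted removal time with the moment the node was actually marked for removal.

```python
import re
from datetime import datetime, timedelta

# Lines copied from the CSSD log excerpt above (only those needed for the check).
LOG = """\
2021-10-09 00:47:55.743: [    CSSD][580302592]clssnmPollingThread: node nqzeyddb1 (1) at 50% heartbeat fatal, removal in 14.910 seconds
2021-10-09 00:47:55.743: [    CSSD][580302592]clssnmPollingThread: node nqzeyddb1 (1) is impending reconfig, flag 2491406, misstime 15090
2021-10-09 00:48:07.745: [    CSSD][580302592]clssnmPollingThread: node nqzeyddb1 (1) at 90% heartbeat fatal, removal in 2.910 seconds,
2021-10-09 00:48:10.656: [    CSSD][580302592]clssnmMarkNodeForRemoval: node 1, nqzeyddb1 marked for removal
"""

# Assumed line layout: "YYYY-MM-DD HH:MM:SS.mmm: [ CSSD][thread]message"
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"
WARN = re.compile(TS + r": .*at (\d+)% heartbeat fatal, removal in ([\d.]+) seconds")
MISS = re.compile(TS + r": .*misstime (\d+)")
MARK = re.compile(TS + r": .*clssnmMarkNodeForRemoval: .* marked for removal")


def parse_ts(text):
    """Parse the millisecond-resolution timestamp at the start of a log line."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def analyze(log):
    """Return (warnings, misstime_ms, marked_at) extracted from the log text."""
    warnings = []        # tuples of (percent, logged_at, seconds_remaining)
    misstime_ms = None   # milliseconds without a network heartbeat
    marked_at = None     # when the node was marked for removal
    for line in log.splitlines():
        m = WARN.search(line)
        if m:
            warnings.append((int(m.group(2)), parse_ts(m.group(1)), float(m.group(3))))
            continue
        m = MISS.search(line)
        if m:
            misstime_ms = int(m.group(2))
            continue
        m = MARK.search(line)
        if m:
            marked_at = parse_ts(m.group(1))
    return warnings, misstime_ms, marked_at


def implied_timeout_s(warnings, misstime_ms):
    """Time already missed plus time remaining at the first warning, in seconds."""
    _, _, remaining = warnings[0]
    return round(misstime_ms / 1000 + remaining, 3)


def drift_ms(warnings, marked_at):
    """Milliseconds between each predicted removal time and the actual one."""
    result = {}
    for percent, logged_at, remaining in warnings:
        predicted = logged_at + timedelta(seconds=remaining)
        result[percent] = round((marked_at - predicted).total_seconds() * 1000)
    return result


if __name__ == "__main__":
    warnings, misstime_ms, marked_at = analyze(LOG)
    print(implied_timeout_s(warnings, misstime_ms))  # 30.0
    print(drift_ms(warnings, marked_at))             # {50: 3, 90: 1}
```

The implied timeout of 30 seconds is what a CSS misscount of 30 seconds would produce. That this cluster runs with that setting is an assumption, not something the log states. Both warnings predict the removal to within a few milliseconds of the actual `clssnmMarkNodeForRemoval` entry, so the eviction followed the network heartbeat countdown and was not triggered by the voting disk.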

Afterwards, I asked our network colleagues to check whether the switch had shown any anomalies.

The result was plain to see.
