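I. Problem description
Environment:
Node | sid | db_name | software_version | Notes |
172.16.2.22 | hdls1 | HDLS | 11.2.0.4 | RAC node |
172.16.2.23 | hdls2 | HDLS | 11.2.0.4 | RAC node |
Cause of the incident:
The heartbeat (private interconnect) network between the two nodes failed, causing a RAC split-brain. The Oracle instance processes running on the nodes were terminated and the database service went down.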
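II. Timeline
2.1 Time: 16:45, fault reported and handling started
Inspection showed that the Oracle instance processes on both nodes had terminated and the database could not be connected to.
2.2 Time: 17:25, node 23 restored
Node 23 was restored first so that business jobs could keep running, and the fault on node 22 was then investigated. Further work waited until the jobs had finished.
- After node 22 was rebooted, the database service on node 23 returned to normal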
reboot -f
- Check the database service status on node 23
crs_stat -t
2.3 Analysis of node 22
1. EVMD log
2022-09-06 22:37:17.970: [GIPCHTHR][3844073216]gipchaWorkerCreateInterface: created remote interface for node 'hdls02', haName 'fe0a-b4a2-f838-ac00', inf 'udp://11.0.0.23:19879'
2022-09-06 22:37:17.970: [GIPCHGEN][3844073216]gipchaWorkerAttachInterface: Interface attached inf 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 0, flags 0x6 }
2022-09-06 22:37:17.970: [GIPCXCPT][3844073216]gipchaLowerRecv: message from unrecognized node 'udp://11.0.0.23:19879', hdr 0x7f21c002bf68 { len 80, seq 0, type gipchaHdrTypeAck (3), lastSeq 1, lastAck 0, minAck 2, flags 0x1, srcLuid 24d64699-7050de6f, dstLuid 6678805d-500d8712, msgId 1 }, ret gipcretFail (1)
2022-09-06 22:37:17.970: [GIPCHALO][3844073216]gipchaLowerCallback: EXCEPTION[ ret gipcretFail (1) ] error while processing req 0x7f21e51fbe60 { type gipcreqtypeRecv, endp 0000000000001950, ret gipcretSuccess, local 'udp://11.0.0.22:18417', peer 'udp://11.0.0.23:19879', buf 0x7f21c002bf68, len 10240, olen 80 }, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }
2022-09-06 22:37:17.971: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:17.971: [GIPCHALO][3844073216]gipchaLowerSend: deffering startup of hdr 0x7f21c001f2d8 { len 232, seq 0, type gipchaHdrTypeSend (1), lastSeq 0, lastAck 0, minAck 0, flags 0x0, srcLuid 00000000-00000000, dstLuid 00000000-00000000, msgId 0 }, node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 00000000-00000000 numInf 1, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [1 : 1], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x4 }
2022-09-06 22:37:17.981: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:17.991: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.001: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.011: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.021: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.031: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.035: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.045: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.055: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.060: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.070: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.075: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.079: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.087: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.097: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.107: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.117: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.127: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.972: [GIPCHALO][3844073216]gipchaLowerProcessAcks: ESTABLISH finished for node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 24d64699-7050de6f numInf 1, contigSeq 2, lastAck 2, lastValidAck 0, sendSeq [1 : 1], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x20c }
2022-09-06 22:37:18.972: [GIPCHALO][3844073216]gipchaLowerProcessWaitQ: triggering deffered startup of msg 0x7f21c001f2d8 { len 232, seq 0, type gipchaHdrTypeSend (1), lastSeq 0, lastAck 0, minAck 0, flags 0x0, srcLuid 00000000-00000000, dstLuid 00000000-00000000, msgId 0 }, node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 24d64699-7050de6f numInf 1, contigSeq 2, lastAck 2, lastValidAck 0, sendSeq [2 : 2], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x208 }
2022-09-06 22:37:18.973: [GIPCXCPT][3844073216]gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'hdls01', port '13b5-9956-d0b9-0552', hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, ret gipcretKeyNotFound (36)
2022-09-06 22:37:18.973: [GIPCHGEN][3844073216]gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 815]: EXCEPTION[ ret gipcretKeyNotFound (36) ] failed to resolve ctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, host 'hdls01', port '13b5-9956-d0b9-0552', flags 0x0
2022-09-06 22:37:18.973: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:18.973: [GIPCXCPT][3844073216]gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'hdls01', port '0277-7bff-f7af-8073', hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, ret gipcretKeyNotFound (36)
2022-09-06 22:37:18.973: [GIPCHGEN][3844073216]gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 815]: EXCEPTION[ ret gipcretKeyNotFound (36) ] failed to resolve ctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, host 'hdls01', port '0277-7bff-f7af-8073', flags 0x0
2022-09-06 22:37:18.973: [ CRSCCL][3837388544]clsCclNewConn: added new conn to tempConList: newPeerCon = bc007ba0
2022-09-06 22:37:18.973: [ CRSCCL][3837388544]PNC: Disconnecting conn from node (2,18699304).
2022-09-06 22:37:18.973: [ CRSCCL][3837388544]PNC: Keeping our connection to node (2,18699304).
2022-09-06 22:37:18.973: [GIPCHAUP][3844073216]gipchaUpperDisconnect: initiated discconnect umsg 0x7f21c0010c20 { msg 0x7f21c002dc88, ret gipcretRequestPending (15), flags 0x2 }, msg 0x7f21c002dc88 { type gipchaMsgTypeDisconnect (5), srcCid 00000000-00001a62, dstCid 00000000-000005da }, endp 0x7f21c0016c00 [0000000000001a62] { gipchaEndpoint : port 'EVMDMAIN2_1/2b61-5a3a-b3b0-633e', peer 'hdls02:d60b-3fd7-4897-3466', srcCid 00000000-00001a62, dstCid 00000000-000005da, numSend 0, maxSend 100, groupListType 2, hagroup 0x21d35a0, usrFlags 0x4000, flags 0x21c }
2022-09-06 22:37:18.973: [ CRSCCL][3837388544]ConnAccepted from Peer:msgTag= 0xcccccccc version= 0 msgType= 4 msgId= 0 msglen = 0 clschdr.size_clscmsgh= 88 src= (2, 18699304) dest= (1, 4294793640)
2022-09-06 22:37:18.973: [GIPCXCPT][3844073216]gipchaUpperProcessDisconnect: dropping Disconnect to unknown msg 0x7f21c0036a68 { type gipchaMsgTypeDisconnect (5), srcCid 00000000-000005da, dstCid 00000000-00001a62 }, node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 24d64699-7050de6f numInf 1, contigSeq 7, lastAck 5, lastValidAck 6, sendSeq [6 : 6], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x208 }, ret gipcretFail (1)
2022-09-06 22:37:18.973: [GIPCHAUP][3844073216]gipchaUpperProcessDisconnect: EXCEPTION[ ret gipcretFail (1) ] error during DISCONNECT processing for node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 24d64699-7050de6f numInf 1, contigSeq 7, lastAck 5, lastValidAck 6, sendSeq [6 : 6], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x208 }
2022-09-06 22:37:18.973: [GIPCHAUP][3844073216]gipchaUpperCallbackDisconnect: completed DISCONNECT ret gipcretSuccess (0), umsg 0x7f21c0010c20 { msg 0x7f21c002dc88, ret gipcretSuccess (0), flags 0x2 }, msg 0x7f21c002dc88 { type gipchaMsgTypeDisconnect (5), srcCid 00000000-00001a62, dstCid 00000000-000005da }, hendp 0x7f21c0016c00 [0000000000001a62] { gipchaEndpoint : port 'EVMDMAIN2_1/2b61-5a3a-b3b0-633e', peer 'hdls02:d60b-3fd7-4897-3466', srcCid 00000000-00001a62, dstCid 00000000-000005da, numSend 0, maxSend 100, groupListType 2, hagroup 0x21d35a0, usrFlags 0x4000, flags 0x21c }
2022-09-06 22:37:18.984: [ EVMD][3964278592] Authorization database built successfully.
2022-09-06 22:37:19.042: [ CLSE][3964278592]clse_get_auth_loc: Returning default authloc: /oracle/grid/crs_1/auth/evm/hdls01
2022-09-06 22:37:23.254: [ GIPCNET][3844073216]gipcmodNetworkProcessSend: [network] failed send attempt endp 0x7f21c001ecd0 [0000000000001950] { gipcEndpoint : localAddr 'udp://11.0.0.22:18417', remoteAddr '', numPend 5, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7f21c001eae0, sendp 0x7f21c0011550flags 0x3, usrFlags 0x4000 }, req 0x7f21c0011250 [0000000000001c19] { gipcSendRequest : addr 'udp://11.0.0.23:19879', data 0x7f21b0013448, len 1384, olen 0, parentEndp 0x7f21c001ecd0, ret gipcretEndpointNotAvailable (40), objFlags 0x0, reqFlags 0x2 }
2022-09-06 22:37:23.254: [ GIPCNET][3844073216]gipcmodNetworkProcessSend: slos op : sgipcnValidateSocket
2022-09-06 22:37:23.255: [ GIPCNET][3844073216]gipcmodNetworkProcessSend: slos dep : Invalid argument (22)
2022-09-06 22:37:23.255: [ GIPCNET][3844073216]gipcmodNetworkProcessSend: slos loc : address not
2022-09-06 22:37:23.255: [ GIPCNET][3844073216]gipcmodNetworkProcessSend: slos info: addr '11.0.0.22:18417', len 1384, buf 0x7f21b0013448, cookie 0x7f21c0011250
2022-09-06 22:37:23.255: [GIPCXCPT][3844073216]gipcInternalSendSync: failed sync request, ret gipcretEndpointNotAvailable (40)
2022-09-06 22:37:23.255: [GIPCXCPT][3844073216]gipcSendSyncF [gipchaLowerInternalSend : gipchaLower.c : 846]: EXCEPTION[ ret gipcretEndpointNotAvailable (40) ] failed to send on endp 0x7f21c001ecd0 [0000000000001950] { gipcEndpoint : localAddr 'udp://11.0.0.22:18417', remoteAddr '', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7f21c001eae0, sendp 0x7f21c0011550flags 0x3, usrFlags 0x4000 }, addr 0x7f21c00183a0 [0000000000001a10] { gipcAddress : name 'udp://11.0.0.23:19879', objFlags 0x0, addrFlags 0x1 }, buf 0x7f21b0013448, len 1384, flags 0x0
2022-09-06 22:37:23.255: [GIPCHGEN][3844073216]gipchaInterfaceFail: marking interface failing 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x6 }
2022-09-06 22:37:23.255: [GIPCHALO][3844073216]gipchaLowerInternalSend: failed to initiate send on interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x86 }, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }
2022-09-06 22:37:23.255: [GIPCHGEN][3844073216]gipchaInterfaceDisable: disabling interface 0x7f21c402b1b0 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22:18417', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 0, numFail 1, idxBoot 0, flags 0x10d }
2022-09-06 22:37:23.255: [GIPCHGEN][3844073216]gipchaInterfaceDisable: disabling interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x86 }
2022-09-06 22:37:23.255: [GIPCHALO][3844073216]gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0xa6 }
2022-09-06 22:37:23.255: [GIPCHGEN][3844073216]gipchaInterfaceReset: resetting interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0xa6 }
2022-09-06 22:37:23.372: [GIPCHDEM][3844073216]gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x7f21c402b1b0 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22:18417', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 0, numFail 0, idxBoot 0, flags 0x12d }
2022-09-06 22:37:23.372: [GIPCHTHR][3844073216]gipchaWorkerCreateInterface: created remote interface for node 'hdls02', haName 'fe0a-b4a2-f838-ac00', inf 'udp://11.0.0.23:19879'
2022-09-06 22:37:23.373: [GIPCXCPT][3841971968]gipchaDaemonProcessRecv: dropping unrecognized daemon request 17, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x5 }, ret gipcretFail (1)
2022-09-06 22:37:23.373: [GIPCHDEM][3841971968]gipchaDaemonProcessRecv: EXCEPTION[ ret gipcretFail (1) ] exception processing requset type 17, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x5 }
2022-09-06 22:37:27.377: [GIPCHDEM][3841971968]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x1 } to gipcd
2022-09-06 22:37:28.916: [GIPCHALO][3844073216]gipchaLowerProcessNode: no valid interfaces found to node for 5660 ms, node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 24d64699-7050de6f numInf 1, contigSeq 9, lastAck 20, lastValidAck 9, sendSeq [21 : 27], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x8 }
2022-09-06 22:37:32.598: [GIPCHDEM][3841971968]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x1 } to gipcd
2022-09-06 22:37:32.612: [GIPCHGEN][3841971968]gipchaNodeAddInterface: adding interface information for inf 0x7f21c4024d60 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 0, numFail 0, idxBoot 0, flags 0x1 }
2022-09-06 22:37:33.409: [GIPCHTHR][3844073216]gipchaWorkerCreateInterface: created local interface for node 'hdls01', haName '5e52-0b6f-5d73-b878', inf 'udp://11.0.0.22:23405'
2022-09-06 22:37:33.409: [GIPCHGEN][3844073216]gipchaWorkerAttachInterface: Interface attached inf 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c4024d60, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x6 }
2022-09-07 00:25:13.978: [GIPCHGEN][3841971968]gipchaInterfaceFail: marking interface failing 0x7f21c4024d60 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22:23405', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 1, numFail 0, idxBoot 0, flags 0xd }
2022-09-07 00:25:14.397: [GIPCHGEN][3844073216]gipchaInterfaceFail: marking interface failing 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c4024d60, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x6 }
2022-09-07 00:25:15.398: [GIPCHGEN][3844073216]gipchaInterfaceDisable: disabling interface 0x7f21c4024d60 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22:23405', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 0, numFail 1, idxBoot 0, flags 0x18d }
2022-09-07 00:25:15.398: [GIPCHGEN][3844073216]gipchaInterfaceDisable: disabling interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c4024d60, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x86 }
2022-09-07 00:25:15.398: [GIPCHALO][3844073216]gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c4024d60, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0xa6 }
2022-09-07 00:25:15.398: [GIPCHGEN][3844073216]gipchaInterfaceReset: resetting interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c4024d60, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0xa6 }
2022-09-07 00:25:16.399: [GIPCHDEM][3844073216]gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x7f21c4024d60 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22:23405', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 0, numFail 0, idxBoot 0, flags 0x1ad }
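The log above shows that heartbeat network communication between the two nodes was failing: neither node could obtain information about its peer, which led to the Oracle instance processes being terminated.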
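A log of this size is easier to read once it has been reduced to counts. The following is a minimal sketch, not part of the original troubleshooting, that tallies the two message types most relevant here: the CRSCCL "No connection to peer" retries and the `gipchaInterfaceFail` events. The sample lines are taken from the EVMD log above, with the long `{ ... }` bodies shortened for readability. The function name `summarize` and the regular expression are illustrative assumptions, not an Oracle-provided format definition.

```python
import re
from collections import Counter

# Sample lines from the EVMD log above; the "{ ... }" bodies are shortened here.
SAMPLE_LOG = """\
2022-09-06 22:37:17.971: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:17.981: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
2022-09-06 22:37:23.255: [GIPCHGEN][3844073216]gipchaInterfaceFail: marking interface failing 0x7f21c4021fa0 { ... }
2022-09-07 00:25:13.978: [GIPCHGEN][3841971968]gipchaInterfaceFail: marking interface failing 0x7f21c4024d60 { ... }
"""

# Assumed line layout: "<timestamp>: [<component>][<thread id>]<message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): \[\s*(\w+)\]\[\d+\](.*)$"
)


def summarize(log_text):
    """Count peer-connection retries and interface-failure events."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip truncated or non-log lines
        _timestamp, _component, message = match.groups()
        if message.startswith("No connection to peer"):
            counts["peer_retry"] += 1
        elif message.startswith("gipchaInterfaceFail:"):
            counts["interface_fail"] += 1
    return counts


summary = summarize(SAMPLE_LOG)
print(summary["peer_retry"], summary["interface_fail"])
```

Run against the full log file, a large number of retries combined with repeated interface failures on the same `ifname` would point at the interconnect rather than at the database itself.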
2. System log
The system log shows that the eno3 heartbeat interface kept cycling between DOWN and UP and never stabilized.
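Link flapping of this kind can be detected mechanically by counting state transitions. The sketch below assumes the interface states have already been extracted from the system log into an ordered list. The sample sequence is hypothetical and is not taken from the actual log, and the threshold of 3 transitions is an arbitrary illustrative value.

```python
def count_transitions(states):
    """Count how many times consecutive link states differ."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def is_flapping(states, threshold=3):
    """Treat the link as flapping once it changes state `threshold` times or more."""
    return count_transitions(states) >= threshold


# Hypothetical sequence of eno3 link states, in chronological order.
observed = ["UP", "DOWN", "UP", "DOWN", "UP"]
print(count_transitions(observed), is_flapping(observed))
```

A stable link produces zero transitions over the same window, so even a small threshold separates a healthy interface from one with a bad cable or port.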
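2.4 Time: 22:30, listener hung on node 23 while it was running as a single node
Because of the heartbeat network fault, the two nodes could not communicate normally. At 22:30 the instance on node 23 was interrupted; at 23:38 the database service on node 23 recovered.
2.5 Time: 00:30, after the jobs finished, the heartbeat Cat 6 cable was replaced
- Once the business jobs had finished running, the heartbeat cable was replaced with a new Cat 6 cable.
- The database service on node 22 was then started, and it came up successfully.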
- srvctl start instance -d HDLS -i hdls1
2.6 Time: 00:30, database back to normal
- Listener status
[grid@hdls01 ~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): hdls01,hdls02
- Database instance status
[grid@hdls01 ~]$ ps -ef |grep -Ei "ora_"
oracle 13648 1 0 15:05 ? 00:00:00 ora_w002_hdls1
oracle 20745 1 0 00:48 ? 00:00:08 ora_pmon_hdls1
oracle 20747 1 0 00:48 ? 00:00:02 ora_psp0_hdls1
oracle 20749 1 0 00:48 ? 00:01:46 ora_vktm_hdls1
oracle 20754 1 0 00:48 ? 00:00:00 ora_gen0_hdls1
oracle 20756 1 0 00:48 ? 00:00:08 ora_diag_hdls1
oracle 20758 1 0 00:48 ? 00:00:03 ora_dbrm_hdls1
oracle 20760 1 0 00:48 ? 00:00:01 ora_ping_hdls1
oracle 20762 1 0 00:48 ? 00:00:00 ora_acms_hdls1
oracle 20764 1 0 00:48 ? 00:02:48 ora_dia0_hdls1
oracle 20766 1 0 00:48 ? 00:01:36 ora_lmon_hdls1
oracle 20768 1 0 00:48 ? 00:00:17 ora_lmd0_hdls1
oracle 20770 1 0 00:48 ? 00:01:17 ora_lms0_hdls1
oracle 20774 1 0 00:48 ? 00:01:18 ora_lms1_hdls1
oracle 20778 1 0 00:48 ? 00:01:14 ora_lms2_hdls1
oracle 20782 1 0 00:48 ? 00:01:14 ora_lms3_hdls1
oracle 20786 1 0 00:48 ? 00:01:14 ora_lms4_hdls1
oracle 20790 1 0 00:48 ? 00:00:00 ora_rms0_hdls1
oracle 20792 1 0 00:48 ? 00:00:01 ora_lmhb_hdls1
oracle 20794 1 0 00:48 ? 00:00:14 ora_mman_hdls1
oracle 20796 1 0 00:48 ? 00:00:03 ora_dbw0_hdls1
oracle 20798 1 0 00:48 ? 00:00:03 ora_dbw1_hdls1
oracle 20800 1 0 00:48 ? 00:00:03 ora_dbw2_hdls1
oracle 20802 1 0 00:48 ? 00:00:03 ora_dbw3_hdls1
oracle 20804 1 0 00:48 ? 00:00:03 ora_dbw4_hdls1
oracle 20806 1 0 00:48 ? 00:00:03 ora_dbw5_hdls1
oracle 20808 1 0 00:48 ? 00:00:03 ora_dbw6_hdls1
oracle 20810 1 0 00:48 ? 00:00:03 ora_dbw7_hdls1
oracle 20812 1 0 00:48 ? 00:00:03 ora_dbw8_hdls1
oracle 20814 1 0 00:48 ? 00:00:03 ora_dbw9_hdls1
oracle 20816 1 0 00:48 ? 00:00:03 ora_dbwa_hdls1
oracle 20818 1 0 00:48 ? 00:00:03 ora_dbwb_hdls1
oracle 20820 1 0 00:48 ? 00:01:30 ora_lgwr_hdls1
oracle 20822 1 0 00:48 ? 00:00:34 ora_ckpt_hdls1
oracle 20824 1 0 00:48 ? 00:00:15 ora_smon_hdls1
oracle 20826 1 0 00:48 ? 00:00:00 ora_reco_hdls1
oracle 20828 1 0 00:48 ? 00:00:00 ora_rbal_hdls1
oracle 20830 1 0 00:48 ? 00:00:00 ora_asmb_hdls1
oracle 20832 1 0 00:48 ? 00:00:45 ora_mmon_hdls1
oracle 20834 1 0 00:48 ? 00:01:01 ora_mmnl_hdls1
oracle 20838 1 0 00:48 ? 00:00:00 ora_d000_hdls1
oracle 20840 1 0 00:48 ? 00:00:00 ora_mark_hdls1
oracle 20842 1 0 00:48 ? 00:00:00 ora_s000_hdls1
oracle 20899 1 0 00:48 ? 00:00:27 ora_lck0_hdls1
oracle 20901 1 0 00:48 ? 00:00:01 ora_rsmn_hdls1
oracle 20916 1 0 00:48 ? 00:00:06 ora_o000_hdls1
oracle 21003 1 0 00:48 ? 00:00:00 ora_arc0_hdls1
oracle 21005 1 0 00:48 ? 00:00:00 ora_arc1_hdls1
oracle 21007 1 0 00:48 ? 00:00:01 ora_arc2_hdls1
oracle 21009 1 0 00:48 ? 00:00:00 ora_arc3_hdls1
oracle 21053 1 0 00:48 ? 00:00:06 ora_o001_hdls1
oracle 21251 1 0 00:48 ? 00:00:25 ora_nsa2_hdls1
oracle 21253 1 0 00:48 ? 00:00:06 ora_o002_hdls1
oracle 21264 1 0 00:48 ? 00:00:00 ora_gtx0_hdls1
oracle 21268 1 0 00:48 ? 00:00:01 ora_rcbg_hdls1
oracle 21274 1 0 00:48 ? 00:00:00 ora_qmnc_hdls1
oracle 21296 1 0 00:48 ? 00:00:00 ora_q000_hdls1
oracle 21349 1 0 00:48 ? 00:00:04 ora_cjq0_hdls1
oracle 22074 1 0 00:49 ? 00:00:00 ora_q002_hdls1
oracle 26225 1 0 00:53 ? 00:00:00 ora_smco_hdls1
oracle 54283 1 0 15:56 ? 00:00:00 ora_j000_hdls1
oracle 71251 1 0 16:17 ? 00:00:00 ora_w001_hdls1
oracle 72429 1 0 16:18 ? 00:00:00 ora_pz99_hdls1
oracle 72500 1 0 16:18 ? 00:00:00 ora_j001_hdls1
grid 74000 71958 0 16:19 pts/1 00:00:00 grep --color=auto -Ei ora_
[grid@hdls01 ~]$
2.7 RAC status check after the fix
- Check the RAC cluster services
[grid@hdls01 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.ARCHLOG.dg ora....up.type ONLINE ONLINE hdls01
ora.DATA.dg ora....up.type ONLINE ONLINE hdls01
ora....ER.lsnr ora....er.type ONLINE ONLINE hdls01
ora....N1.lsnr ora....er.type ONLINE ONLINE hdls02
ora.OCRVT.dg ora....up.type ONLINE ONLINE hdls01
ora.asm ora.asm.type ONLINE ONLINE hdls01
ora.cvu ora.cvu.type ONLINE ONLINE hdls02
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora.hdls.db ora....se.type ONLINE ONLINE hdls01
ora....SM1.asm application ONLINE ONLINE hdls01
ora....01.lsnr application ONLINE ONLINE hdls01
ora.hdls01.gsd application OFFLINE OFFLINE
ora.hdls01.ons application ONLINE ONLINE hdls01
ora.hdls01.vip ora....t1.type ONLINE ONLINE hdls01
ora....SM2.asm application ONLINE ONLINE hdls02
ora....02.lsnr application ONLINE ONLINE hdls02
ora.hdls02.gsd application OFFLINE OFFLINE
ora.hdls02.ons application ONLINE ONLINE hdls02
ora.hdls02.vip ora....t1.type ONLINE ONLINE hdls02
ora....network ora....rk.type ONLINE ONLINE hdls01
ora.oc4j ora.oc4j.type ONLINE ONLINE hdls01
ora.ons ora.ons.type ONLINE ONLINE hdls01
ora.scan1.vip ora....ip.type ONLINE ONLINE hdls02
- Check the database
[grid@hdls01 ~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): hdls01,hdls02
SQL> select name,status from v$datafile;
NAME STATUS
-------------------------------------------------------------------------------- -------
+DATA/hdls/datafile/system.306.1100288753 SYSTEM
+DATA/hdls/datafile/sysaux.264.1100288753 ONLINE
+DATA/hdls/datafile/undotbs1.263.1100288753 ONLINE
+DATA/hdls/datafile/users.260.1100288753 ONLINE
+DATA/hdls/datafile/undotbs2.277.1100288833 ONLINE
+DATA/hdls/oauser01.dbf ONLINE
+DATA/hdls/hdls2001.dbf ONLINE
+DATA/hdls/oa01.dbf ONLINE
+DATA/hdls/hdls01.dbf ONLINE
+DATA/hdls/hdls02.dbf ONLINE
+DATA/hdls/hdls03.dbf ONLINE
+DATA/hdls/hdls04.dbf ONLINE
+DATA/hdls/hdls05.dbf ONLINE
+DATA/hdls/hdls06.dbf ONLINE
+DATA/hdls/hdls07.dbf ONLINE
+DATA/hdls/hdls08.dbf ONLINE
+DATA/hdls/hdls09.dbf ONLINE
+DATA/hdls/hdls10.dbf ONLINE
+DATA/hdls/others01.dbf ONLINE
+DATA/hdls/others02.dbf ONLINE
+DATA/hdls/others03.dbf ONLINE
+DATA/hdls/indx01.dbf ONLINE
+DATA/hdls/indx02.dbf ONLINE
+DATA/hdls/indx03.dbf ONLINE
+DATA/hdls/indx04.dbf ONLINE
+DATA/hdls/indx05.dbf ONLINE
+DATA/hdls/indx06.dbf ONLINE
+DATA/hdls/indx07.dbf ONLINE
+DATA/hdls/hdls131701.dbf ONLINE
+DATA/hdls/hdls131702.dbf ONLINE
+DATA/hdls/hdls131703.dbf ONLINE
+DATA/hdls/hdls131704.dbf ONLINE
+DATA/hdls/hdls131705.dbf ONLINE
+DATA/hdls/hdls131706.dbf ONLINE
+DATA/hdls/cdc01.dbf ONLINE
35 rows selected.
SQL>
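The cluster check in this section can also be scripted so that problems are flagged automatically. The sketch below parses `crs_stat -t` style output and reports resources whose target is ONLINE but whose state is not. The sample rows are copied from the output above. The function name `find_problem_resources` and the whitespace-splitting approach are assumptions made for illustration. Resources whose target is OFFLINE, such as `ora.gsd` in the output above, are ignored.

```python
# A few rows copied from the crs_stat -t output above.
CRS_STAT_SAMPLE = """\
Name Type Target State Host
------------------------------------------------------------
ora.DATA.dg ora....up.type ONLINE ONLINE hdls01
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora.hdls.db ora....se.type ONLINE ONLINE hdls01
ora.scan1.vip ora....ip.type ONLINE ONLINE hdls02
"""


def find_problem_resources(output):
    """Return names of resources that should be ONLINE but are not."""
    problems = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, the separator, and anything that is not a resource row.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _type, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            problems.append(name)
    return problems


print(find_problem_resources(CRS_STAT_SAMPLE))
```

For the healthy output shown in this section the list is empty. A row such as `ora.hdls.db ora....se.type ONLINE OFFLINE` would be reported.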
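III. Summary
1. The network cable carrying the heartbeat network between the nodes was faulty, so the heartbeat network became unstable, the RAC nodes could not communicate normally, a split-brain occurred, and the Oracle service was terminated. To protect data consistency and integrity, a RAC cluster that detects a split-brain under a heartbeat network failure forcibly terminates Oracle instances.
2. After the heartbeat cable was replaced with a new Cat 6 cable, the database returned to normal.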
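To make the split-brain behaviour in point 1 more concrete, here is a deliberately simplified model of how a surviving sub-cluster is chosen. It assumes the rule commonly described for 11.2 Clusterware: the sub-cluster with the most nodes survives, and on a tie the sub-cluster containing the lowest node number survives. This is an illustrative sketch under that assumption, not Oracle's implementation, and it ignores voting-disk access entirely.

```python
def surviving_subcluster(subclusters):
    """Pick the survivor: most nodes first, then the lowest node number.

    Simplified model only; real Clusterware also takes voting-disk
    access into account.
    """
    return max(subclusters, key=lambda nodes: (len(nodes), -min(nodes)))


# Two-node cluster split down the middle, as in this incident:
# node 1 (hdls01) and node 2 (hdls02) can no longer see each other.
print(surviving_subcluster([{1}, {2}]))
```

Under this model a two-node cluster that loses its interconnect always evicts one node, so a flapping heartbeat link can trigger repeated evictions and restarts even though both servers are otherwise healthy.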