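The customer's environment runs Oracle Database 12.2.0.1 on Red Hat 7.9 in single-instance mode. The database instance crashed and returned to normal after a manual restart. The record follows:
Error log: the relevant entries from the alert log.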
2023-06-11T22:15:10.128319+08:00
Archived Log entry 66582 added for T-1.S-66573 ID 0x8bb76fc1 LAD:1
2023-06-11T22:15:11.809567+08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x8] [PC:0x10815F0C, kglrfcl()+492] [flags: 0x0, count: 1]
Errors in file /data1/oradata/diag/rdbms/oradb/oradb/trace/oradb_ora_18036.trc (incident=240700):
ORA-07445: exception encountered: core dump [kglrfcl()+492] [SIGSEGV] [ADDR:0x8] [PC:0x10815F0C] [Address not mapped to object] []
Incident details in: /data1/oradata/diag/rdbms/oradb/oradb/incident/incdir_240700/oradb_ora_18036_i240700.trc
Dumping diagnostic data in directory=[cdmp_20230611221512], requested by (instance=1, osid=18036), summary=[incident=240700].
2023-06-11T22:15:18.386955+08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x8] [PC:0x10815F0C, kglrfcl()+492] [flags: 0x0, count: 1]
Errors in file /data1/oradata/diag/rdbms/oradb/oradb/trace/oradb_clmn_7090.trc (incident=240028):
ORA-07445: exception encountered: core dump [kglrfcl()+492] [SIGSEGV] [ADDR:0x8] [PC:0x10815F0C] [Address not mapped to object] []
Incident details in: /data1/oradata/diag/rdbms/oradb/oradb/incident/incdir_240028/oradb_clmn_7090_i240028.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2023-06-11T22:15:19.270397+08:00
Errors in file /data1/oradata/diag/rdbms/oradb/oradb/trace/oradb_clmn_7090.trc:
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kglrfcl()+492] [SIGSEGV] [ADDR:0x8] [PC:0x10815F0C] [Address not mapped to object] []
Errors in file /data1/oradata/diag/rdbms/oradb/oradb/trace/oradb_clmn_7090.trc (incident=240029):
ORA-602 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /data1/oradata/diag/rdbms/oradb/oradb/incident/incdir_240029/oradb_clmn_7090_i240029.trc
2023-06-11T22:15:19.596187+08:00
Dumping diagnostic data in directory=[cdmp_20230611221519], requested by (instance=1, osid=7090 (CLMN)), summary=[incident=240028].
2023-06-11T22:15:19.693430+08:00
USER (ospid: 7090): terminating the instance due to error 602
2023-06-11T22:15:19.771330+08:00
opiodr aborting process unknown ospid (23847) as a result of ORA-1092
2023-06-11T22:15:20.945003+08:00
System state dump requested by (instance=1, osid=7090 (CLMN)), summary=[abnormal instance termination].
System State dumped to trace file /data1/oradata/diag/rdbms/oradb/oradb/trace/oradb_diag_7109_20230611221520.trc
2023-06-11T22:15:25.917215+08:00
Instance terminated by USER, pid = 7090
The foreground process 18036 crashed and raised ORA-07445. When CLMN (7090) then tried to clean up that process,
CLMN crashed as well, and the instance ultimately went down.
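The sequence described above (a foreground process fails, then CLMN fails with the same error) can be pulled out of an alert log mechanically. The following is a minimal sketch, not an Oracle tool: the function name `find_ora7445` and both regular expressions are assumptions written against the line shapes in the excerpt above, and trace file names are assumed to follow the `<sid>_<role>_<pid>.trc` pattern seen there.

```python
import re

# Both patterns are assumptions based only on the alert log excerpt above.
INCIDENT_RE = re.compile(r"/trace/(\S+)\.trc\s+\(incident=(\d+)\)")
ORA7445_RE = re.compile(r"ORA-07445: .*?\[(\w+)\(\)\+\d+\]")


def find_ora7445(lines):
    """Return (role, pid, incident, function) for each ORA-07445 line
    that directly follows an 'Errors in file ... (incident=N)' header."""
    events = []
    pending = None  # most recent incident header, if the previous line was one
    for line in lines:
        header = INCIDENT_RE.search(line)
        if header:
            pending = header
            continue
        error = ORA7445_RE.search(line)
        if error and pending:
            # Trace name assumed to look like <sid>_<role>_<pid>
            _sid, role, pid = pending.group(1).rsplit("_", 2)
            events.append((role, int(pid), int(pending.group(2)), error.group(1)))
        pending = None
    return events
```

Fed the alert log lines above, this would report one event for the `ora` process 18036 (incident 240700) and one for the `clmn` process 7090 (incident 240028), both in `kglrfcl`. The repeated ORA-07445 line under the ORA-00602 entry is skipped because it does not directly follow an incident header.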
The incident trace oradb_ora_18036_i240700.trc contains the following call stack.
2023-06-11 22:15:12.775 :kjzduptcctx(): Notifying DIAG for crash event (incident=240700)
----- Abridged Call Stack Trace -----
ksedsts()+346<-kjzduptcctx()+868<-kjzdpcrshnfy()+380<-dbkedKstDump()+27<-dbgdaExecuteAction()+354<-dbgerRunAction()+108<-dbgerRunActions()+3719<-dbgexPhaseII()+1688<-dbgexExplicitEndInc()+602<-dbgeEndDDEInvocationImpl()+658<-ssexhd()+3274<-sslssSynchHdlr()+399
<-sslsshandler()+118<-__sighandler()<-kglrfcl()+492<-kglUnsetHandleReference()+120<-kglhdda()+777<-kghfreup()+172<-kghfrunp()+1505<-kghfnd()+984
----- End of Abridged Call Stack Trace -----
----- END DDE Action: 'dumpKSTBuffers' (SUCCESS, 3 csec) -----
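In the abridged call stack above, everything to the left of `__sighandler()` is Oracle's own error handling and dump code. The frames to its right are what was running when the SIGSEGV arrived. A small sketch of how that split could be done, assuming the `name()+offset<-name()+offset` layout shown above (the helper names `parse_abridged_stack` and `faulting_frames` are hypothetical):

```python
import re

# One frame looks like "kglrfcl()+492"; the offset is optional ("__sighandler()").
FRAME_RE = re.compile(r"(\w+)\(\)(?:\+(\d+))?")


def parse_abridged_stack(text):
    """Split 'a()+1<-b()+2...' into (function, offset) frames, innermost first."""
    frames = []
    for part in text.replace("\n", "").split("<-"):
        match = FRAME_RE.fullmatch(part.strip())
        if match:
            offset = int(match.group(2)) if match.group(2) else None
            frames.append((match.group(1), offset))
    return frames


def faulting_frames(frames, handler="__sighandler"):
    """Return the frames listed after the signal handler: the code that was
    executing when the signal was delivered, faulting function first."""
    names = [name for name, _ in frames]
    if handler not in names:
        return []
    return frames[names.index(handler) + 1:]
```

Applied to the stack above, the first three faulting frames would be `kglrfcl`, `kglUnsetHandleReference`, and `kglhdda`, which is the part worth comparing against known bug signatures.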
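Fault analysis
At 2023-06-11T22:15:11.809567+08:00 the alert log reports ORA-07445: the foreground (server) process 18036, which was serving a client session, failed.
Details of process 18036: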
Unix process pid: 18036, image: oracle@TEST01
*** 2023-06-11T22:15:11.808756+08:00
*** SESSION ID:(1370.29059) 2023-06-11T22:15:11.808772+08:00
*** CLIENT ID:() 2023-06-11T22:15:11.808775+08:00
*** SERVICE NAME:(SYS$USERS) 2023-06-11T22:15:11.808778+08:00
*** MODULE NAME:(JDBC Thin Client) 2023-06-11T22:15:11.808781+08:00
*** ACTION NAME:() 2023-06-11T22:15:11.808784+08:00
*** CLIENT DRIVER:(jdbcthin) 2023-06-11T22:15:11.808786+08:00
While cleaning up the failed process, CLMN hit the bug and crashed as well.
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kglrfcl()+492] [SIGSEGV] [ADDR:0x8] [PC:0x10815F0C] [Address not mapped to object] []
2023-06-11T22:15:19.291357+08:00
Incident 240029 created, dump file: /data1/oradata/diag/rdbms/oradb/oradb/incident/incdir_240029/oradb_clmn_7090_i240029.trc
ORA-602 [] [] [] [] [] [] [] [] [] [] [] []
2023-06-11 22:15:19.685 :kjzduptcctx(): Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+346<-kjzduptcctx()+868<-kjzdicrshnfy()+1113<-ksuitm_opt()+1678<-ksbrdp()+4494<-opirip()+609<-opidrv()+602<-sou2o()+145<-opimai_real()+202<-ssthrdmain()+417<-main()+262<-__libc_start_main()+245
----- End of Abridged Call Stack Trace -----
*** 2023-06-11T22:15:19.693369+08:00
USER (ospid: 7090): terminating the instance due to error 602
ksuitm: waiting up to [5] seconds before killing DIAG(7109)
Finally, the CLMN process (7090) terminated the instance.
System State dumped to trace file /data1/oradata/diag/rdbms/oradb/oradb/trace/oradb_diag_7109_20230611221520.trc
2023-06-11T22:15:25.917215+08:00
Instance terminated by USER, pid = 7090
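After the instance was restarted, it returned to normal.
The background processes involved are PMON, the Cleanup Main Process (CLMN), and the Cleanup Helper Processes (CLnn). They are mainly responsible for monitoring and cleaning up other processes. In some situations the PMON process group also quarantines corrupted, unrecoverable resources so that the database instance does not have to shut down. PMON detects when background processes terminate and attempts recovery for the processes that need it. PMON delegates cleanup work to CLMN, which acts as the main cleanup process. CLMN periodically cleans up terminated processes, sessions, transactions, and network connections, as well as idle sessions and network connections and transactions that have timed out. The CLnn processes help CLMN clean up terminated processes and sessions. There can be several CLnn processes; their number is proportional to the amount of cleanup work pending and to the current cleanup efficiency.
A search of My Oracle Support (MOS) shows that the errors match this document: Bug 29458132 Instance Might Crash With ORA-07445 [kglrfcl]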
Affects:
Product (Component)
Oracle Server (Rdbms)
Range of versions believed to be affected
Versions BELOW 21.3
Versions confirmed as being affected
- 19.9.0
- 19.8.0
- 19.7.0
- 19.6.0
- 19.1.0
- 18.13.0
- 18.12.0
- 18.11.0
- 18.4.0
- 12.2.0.1 (Base Release)
- 12.1.0.2 (Server Patch Set)
Platforms affected
Generic (all / most platforms affected)
Fixed:
The fix for 29458132 is first included in
- 19.10.0.0.210119 (Jan 2021) Database Release Update (DB RU)
- 12.1.0.2.211019 (OCT 2021) Database Patch Set Update
- 12.1.0.2.211019 (OCT 2021) Database Proactive Bundle Patch
- 12.1.0.2.211019 (Oct 2021) Bundle Patch for Windows Platforms
Interim patches may be available for earlier versions.
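The Affects and Fixed sections above can be read as a simple rule: a version below 21.3 is in the affected range unless it is at or beyond the first fixed bundle listed for its release family. The sketch below encodes only what the note lists. The function names and the `FIRST_FIXED` table are assumptions made for illustration, and interim (one-off) patches are not taken into account.

```python
# Versions below this are in the range believed to be affected.
UPPER_BOUND = (21, 3)

# First fixed bundle per release family, as listed in the note above.
FIRST_FIXED = {
    (19,): (19, 10),
    (12, 1, 0, 2): (12, 1, 0, 2, 211019),
}


def parse_version(text):
    """'12.1.0.2.211019' -> (12, 1, 0, 2, 211019)"""
    return tuple(int(part) for part in text.split("."))


def may_be_affected(text):
    """True if the version is below 21.3 and not at or past a listed fix."""
    version = parse_version(text)
    if version >= UPPER_BOUND:
        return False
    for family, fixed in FIRST_FIXED.items():
        if version[:len(family)] == family:
            return version < fixed
    # No fixed bundle is listed for this family in the note.
    return True
```

For the 12.2.0.1 instance in this case, `may_be_affected("12.2.0.1")` returns `True`: the version is below 21.3 and the note lists no fixed bundle for 12.2, which is why the interim patch remark matters here.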
Related To:
- Instance May Crash
- Process May Dump (ORA-7445) / Abend / Abort
- Memory Corruption
- Dump in or under kglrfcl
- Stack is likely to include kglrfcl
- (None Specified)
Description
A foreground process might crash with ORA-7445 [kglrfcl]. (This matches the failure observed here.)
CLMN tries to cleanup the dead process, but fails with the
same error, hence instance crashes.
Call stack might include:
... kglrfcl kglUnsetHandleReference kglhdda ... (This matches the call stack from this failure.)
REDISCOVERY INFORMATION:
If a process fails with ORA-7445 [kglrfcl] and CLMN crashes
while cleaning the dead process, this bug might be hit
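
The rediscovery rule has two parts: some process fails with ORA-7445 [kglrfcl], and CLMN then fails with the same error while cleaning up. A minimal sketch of that check follows, assuming the incidents have already been reduced to (process role, failing function) pairs in log order. The function name `matches_rediscovery` is hypothetical, and a match is only a hint that the bug may have been hit, as the note itself says.

```python
def matches_rediscovery(events):
    """events: iterable of (process_role, failing_function) pairs for
    ORA-7445 incidents, in the order they appear in the alert log.

    True when a non-CLMN process fails in kglrfcl and CLMN later fails
    in kglrfcl as well.
    """
    seen_other_process = False
    for role, function in events:
        if function != "kglrfcl":
            continue
        if role.lower() == "clmn":
            if seen_other_process:
                return True
        else:
            seen_other_process = True
    return False
```

For this incident the pairs are `("ora", "kglrfcl")` followed by `("clmn", "kglrfcl")`, so the check returns `True`.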