客戶現場,電信相關7*24h業務,某省份資料庫二節點alert不斷産生ORA-7445報錯,并抛出Trace dump檔案、Trc檔案。一節點alert并無7445相關報錯,僅有core檔案産生。crs、syslog等日志無明顯告警。沒有引起重新開機,沒有引起業務方面的影響。隻是不斷報錯,由于Oracle目錄空間40G,為了防止資料庫執行個體夯死,客戶還是決定解決該報錯。
客戶主機HP-UX B.11.31 ia64,Oracle Release 10.2.0.5.0。
節點二alert:
Thu Nov 24 18:27:38 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_17889.trc:
ORA-07445: exception encountered: core dump [kghsrch()+128] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFFFFFFFFF0] [] []
Thu Nov 24 18:27:40 EAT 2016
Trace dumping is performing id=[cdmp_20161124182740]
Thu Nov 24 18:28:29 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_18076.trc:
Thu Nov 24 18:29:03 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_19496.trc:
Thu Nov 24 18:43:22 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_12403.trc:
Thu Nov 24 18:44:37 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_14293.trc:
Thu Nov 24 18:46:41 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_17336.trc:
Thu Nov 24 18:50:20 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_20339.trc:
Thu Nov 24 18:50:22 EAT 2016
Trace dumping is performing id=[cdmp_20161124185022]
Thu Nov 24 18:53:06 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_23424.trc:
Thu Nov 24 19:06:40 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_18689.trc:
Thu Nov 24 19:38:31 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_6449.trc:
Thu Nov 24 19:38:33 EAT 2016
Trace dumping is performing id=[cdmp_20161124193833]
Thu Nov 24 19:39:49 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_10964.trc:
Thu Nov 24 19:39:53 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_9299.trc:
Thu Nov 24 19:41:19 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_12404.trc:
Thu Nov 24 19:42:47 EAT 2016
Errors in file /oracle/admin/essbj/udump/essbj2_ora_13863.trc:
節點一alert:
Fri Nov 25 11:16:05 EAT 2016
Trace dumping is performing id=[cdmp_20161125111604]
Fri Nov 25 12:00:05 EAT 2016
Thread 1 advanced to log sequence 34397 (LGWR switch)
Current log# 8 seq# 34397 mem# 0: /vgbj03/oradata/essbj/vgbj03_1_rd81.log
Current log# 8 seq# 34397 mem# 1: /vgbj04/oradata/essbj/vgbj04_1_rd82.log
Fri Nov 25 12:00:51 EAT 2016
Errors in file /oracle/admin/essbj/bdump/essbj1_j000_11783.trc:
ORA-12012: error on auto execute of job 281
ORA-00031: session marked for kill
ORA-06512: at "SYS.KILL_LONG_SQL", line 67
ORA-06512: at line 1
Fri Nov 25 12:17:15 EAT 2016
Trace dumping is performing id=[cdmp_20161125121714]
Fri Nov 25 12:37:21 EAT 2016
Errors in file /oracle/admin/essbj/bdump/essbj1_j000_5909.trc:
Fri Nov 25 12:50:53 EAT 2016
Trace dumping is performing id=[cdmp_20161125125053]
Fri Nov 25 13:33:53 EAT 2016
Trace dumping is performing id=[cdmp_20161125133352]
節點二中産生的trc檔案,進行了檢視,指出了一條sql
ksedmp: internal or fatal error
Current SQL statement for this session:
SELECT OCCUPY_STAFF_ID, OCCUPY_DEPART_ID FROM TF_R_TEMPOCCUPY WHERE RES_NO = :1 AND RES_TYPE_CODE = :2 AND RSRV_TAG1 = :3
打算将這個sql單獨執行,并且tail -f alert日志檢視是否是該SQL問題引起的7445相關bug報錯。因為該SQL有綁定變量。我們需要解析一下變量值,然後在執行該sql語句。
alter session set nls_date_format = 'yyyy-mm-dd,hh24:mi:ss';
set linesize 400
col sql_Id format a20
col name format a20
col datatype_string format a14
col value_string format a20
select sql_id,name, datatype_string, last_captured,value_string
from v$sql_bind_capture where sql_id='&sql_id' order by LAST_CAPTURED,POSITION;
将變量帶入SQL單獨執行後,并未發生報錯。在mos上查到跟trace檔案中内容最相近的文檔如下:
SMON Terminates With ORA-7445 [Kghsrch()+128] (文檔 ID 1189894.1)
Bug 10036960 : ORA-07445[KGHSRCH] AND ORA-07445[KGHLKREMF] FOLLOWED BY INSTANCE CRASH.
ORA-07445 [kghsrch] Associated With Session Kills (文檔 ID 787914.1)
Instance Termination with ORA-07445 [kghsrch()+144], ORA-00600 [kghfrh:ds] (文檔 ID 2128933.1)
但由于目前資料庫隻是報錯,并未造成執行個體down的後果,是以排除跟mos文檔bug完全一緻。且該系統本身資料庫版本已經是10.2.0.5,bug中描述的fix版本是該版本。
<a href="http://s1.51cto.com/wyfs02/M02/8A/EE/wKiom1g_g6KT6gDVAACakxBte8E147.jpg-wh_500x0-wm_3-wmp_4-s_1950165466.jpg" target="_blank"></a>
<a href="http://s5.51cto.com/wyfs02/M02/8A/EA/wKioL1g_g6Pjb1ynAACnkDpgYzM670.jpg-wh_500x0-wm_3-wmp_4-s_725318753.jpg" target="_blank"></a>
<a href="http://s5.51cto.com/wyfs02/M02/8A/EA/wKioL1g_g6ODO1s8AADYRGhnKkU297.jpg-wh_500x0-wm_3-wmp_4-s_3752148690.jpg" target="_blank"></a>
<a href="http://s3.51cto.com/wyfs02/M00/8A/EE/wKiom1g_g6TAROGjAAD2vJete_c663.jpg-wh_500x0-wm_3-wmp_4-s_1833915713.jpg" target="_blank"></a>
<a href="http://s3.51cto.com/wyfs02/M00/8A/EE/wKiom1g_g6XgHUf9AACUtcIum08307.jpg-wh_500x0-wm_3-wmp_4-s_2078979112.jpg" target="_blank"></a>
1.系統現有遇到的07445的問題與oracle官方提供的bug相關資訊隻有部分吻合;
2.其提供的解決方案與資料庫目前環境并不吻合(如建議由10gR2更新至10.2.0.4或10.2.0.5,但現時生産資料庫本身即為10.2.0.5);
3.資料庫沒有出現如bug提到的執行個體crash等,不影響業務的正常進行。
是以該異常的抛出可能與業務通路對象在記憶體中結構出現意外情況有關,且因為其不斷生成trace檔案以及cdump檔案,建議客戶重新開機抛出問題執行個體(執行個體2),再繼續執行業務。然後檢視是否有對應的7445報錯産生。客戶給出時間視窗後對該執行個體進行重新開機。
Tue Nov 29 21:38:37 EAT 2016
ALTER DATABASE OPEN
Block change tracking file is current.
Picked broadcast on commit scheme to generate SCNs
Tue Nov 29 21:38:38 EAT 2016
Sending CIC to internal enable redo thread
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=33, OS id=14097
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=34, OS id=14099
Thread 2 opened at log sequence 32257
Current log# 12 seq# 32257 mem# 0: /vgbj03/oradata/essbj/vgbj03_1_rd121.log
Current log# 12 seq# 32257 mem# 1: /vgbj04/oradata/essbj/vgbj04_1_rd122.log
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Successful open of redo thread 2
ARC1: Becoming the heartbeat ARCH
Starting background process CTWR
CTWR started with pid=35, OS id=14101
Block change tracking service is active.
SMON: enabling cache recovery
Tue Nov 29 21:38:39 EAT 2016
Successfully onlined Undo Tablespace 4.
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=36, OS id=14127
Tue Nov 29 21:38:43 EAT 2016
Completed: ALTER DATABASE OPEN
Tue Nov 29 21:38:52 EAT 2016
ALTER SYSTEM SET service_names='essbj_db' SCOPE=MEMORY SID='essbj2';
Tue Nov 29 21:48:47 EAT 2016
ALTER SYSTEM SET service_names='essbj_db','essbj' SCOPE=MEMORY SID='essbj2';
Wed Nov 30 00:16:10 EAT 2016
ALTER SYSTEM ARCHIVE LOG
Thread 2 advanced to log sequence 32258 (LGWR switch)
Current log# 4 seq# 32258 mem# 0: /vgbj04/oradata/essbj/vgbj04_1_rd41.log
Current log# 4 seq# 32258 mem# 1: /vgbj03/oradata/essbj/vgbj03_1_rd42.log
Wed Nov 30 00:16:15 EAT 2016
Wed Nov 30 00:16:17 EAT 2016
Thread 2 advanced to log sequence 32259 (LGWR switch)
Current log# 3 seq# 32259 mem# 0: /vgbj04/oradata/essbj/vgbj04_1_rd31.log
Current log# 3 seq# 32259 mem# 1: /vgbj03/oradata/essbj/vgbj03_1_rd32.log
Wed Nov 30 02:00:27 EAT 2016
Thread 2 advanced to log sequence 32260 (LGWR switch)
Current log# 5 seq# 32260 mem# 0: /vgbj03/oradata/essbj/vgbj03_1_rd51.log
Current log# 5 seq# 32260 mem# 1: /vgbj04/oradata/essbj/vgbj04_1_rd52.log
Wed Nov 30 06:13:11 EAT 2016
Thread 2 advanced to log sequence 32261 (LGWR switch)
Current log# 10 seq# 32261 mem# 0: /vgbj03/oradata/essbj/vgbj03_1_rd101.log
Current log# 10 seq# 32261 mem# 1: /vgbj04/oradata/essbj/vgbj04_1_rd102.log
再無任何相關報錯。
本文轉自yangjunfeng 51CTO部落格,原文連結:http://blog.51cto.com/yangjunfeng/1878374