天天看點

ORA-04031: unable to allocate 13840 bytes of shared memory (“shared pool“,“unknown object“,“sga heap

還是年前的時候,績效系統一直告警,檢視了下,庫上一直報ora-4031,很常見的錯誤,當時下意識的認為是記憶體不足了。

應該績效手上在exadata一體機上,該一體機上還裝了其他幾套系統,記憶體資源比較緊張。

當時flush了下shared_pool,但沒啥效果,因為是rac環境,便依次重新開機了執行個體,故障就消失了。

沒過多久,績效資料庫又出現了這樣的情況,共享池不足,就感覺不對勁了,但當時是年底了,馬上就過年了,沒啥心思探究。

結果過年期間,同屬于exadata上crm系統也出現這樣的情況,基本确定确實有問題了。

年後回來事情一直比較多,今天下午得空排查了一下,發現share pool一直在增長。大概可以推算出,由于共享池無節制的增長,導緻buffer cache一直在被調縮,

直至share pool占據了絕大多數的sga,無法再增長,繼而報錯,導緻資料庫無法通路。

資料庫版本及更新檔

12.2.0.1181016

在MOS上搜了下,相關bug還挺多,到處瞧到處看,找到如下文檔:

Bug 27824540 - ORA-4031 Error In Shared Pool Due To Leakage Of 'ges resource dynamic' Chunk In RAC Env (Doc ID 27824540.8)

Too many objects "ges resource dynamic" were allocated in the shared pool eventually failing with errors like: ORA-04031 

Monitoring "ges resource  dynamic" growth via the following shows a general upward trend: 

select inst_id, name, round(bytes/(1024*1024*1024),1) in_gb from gv$sgastat where name = 'ges resource dynamic'; 

- example after an instance restart: 

   INST_ID NAME                            IN_GB 

---------- -------------------------- ---------- 

         1 ges resource dynamic               .2 

         2 ges resource dynamic               .4 

         3 ges resource dynamic               .7 

- example after several days of uptime shows upward trend : 

   INST_ID NAME                            IN_GB 

---------- -------------------------- ---------- 

         1 ges resource dynamic              8.6  

         2 ges resource dynamic             13.2 

         3 ges resource dynamic             16.1 

Note: 

When not getting 4031, but proactive monitoring DB performance using AWR diff reports, just check in section : "SGA Breakdown Difference", 

and notice the %Diff values that will show the memory leak

Rediscovery Notes

-- instance alert log shows :

Errors in file /<path>/diag/rdbms/<db_name>/<oracle_sid>/trace/<oracle_sid>_lmd1_<pid>.trc  (incident=nnnnnnn):

ORA-04031: unable to allocate 13840 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges resource dynamic")

Incident details in: /<path>/diag/rdbms/<db_name>/<oracle_sid>/incident/incdir_NNNNNN/<oracle_sid>_lmd1_<pid>_<iNNNNNN>.trc

-- incident trc file shows :

ORA-04031: unable to allocate 13840 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges resource dynamic") 

============================================= 

TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 1 

---------------------------------------------- 

"ges resource dynamic       "      10 GB 55% 

"free memory                    "  4426 MB 25% 

"ges enqueues                   "  1204 MB  7% 

"gcs resources                   "   371  MB  2% 

============================================ 

TOP 10 MAXIMUM MEMORY USES FOR SGA HEAP SUB POOL 1 

---------------------------------------------- 

"ges resource dynamic           "    10 GB 

"SQLA                           "  7254 MB 

"free memory                    "  5676 MB 

-- If a problem fulfills all of the conditions below, it is a duplicate of this problem.

 1. Many objects "ges resource dynamic" are allocated in the shared pool.

 2. Trace file of LMHB process traced Action 11 (kjgcr_GrowResourceCache)  was executed, and was not resetted.

 ============================================================

 kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth

 ============================================================

 3. Trace file of LMHB process keep tracing failure of metric 7 (check lck heartbeat).

 ============================================================

 kjgcr_ChkGlobalMetric: metric 7 (check lck heartbeat) failed for dbname ISPP01, inst 2, node 2 

 ============================================================

Workaround

  There are 2 possible workarounds.

 1) Disable the action 11.

 SQL> oradebug setorapname LMHB

 SQL> oradebug dyn_gcr -a 11 -disable

 Note: 

 - This oradebug command is available on 12.2 and later.

 - To keep it persistent across instance restarts one can do as below :

   $ sqlplus "/ as sysdba"

      SQL> alter system set "_oradebug_cmds_at_startup"='dyn_gcr -a 11 -disable' scope=spfile sid='*';

   Then restart database to make change take effect.

 2) Disable the GES resource cache

 Set the initialization parameter "_ges_direct_free" to TRUE.

 Note: 

 - Completely disabling GES resource cache may lead to some other side affects like contention on TM lock for insert statements, so use this workaround with

caution if there is still a need.

-----------------

在庫上執行了這條sql

select inst_id, name, round(bytes/(1024*1024*1024),1) in_gb from gv$sgastat where name = 'ges resource dynamic'; 

發現ges resource dynamic已經占了5,6g的記憶體,但其他幾套12.2環境,更新檔集是200414,不存在這樣的問題。

打算采用workaround的方式來處理下,即

alter system set "_oradebug_cmds_at_startup"='dyn_gcr -a 11 -disable' scope=spfile sid='*';

--------------------------------------

年後回來,遇到了好幾個oracle rac方面的bug,腦殼疼,後面有時間就整理記錄下來。

繼續閱讀