
[Troubleshooting] Queue wait event enq: IV - contention: a case study


Dear technical readers: after reading this article you will have picked up the following skills, and perhaps some other knowledge you did not know before ~O(∩_∩)O~:

① The queue wait enq: IV - contention case (the key topic)

Tips:

③ If the code formatting in the web version of this article appears garbled, please download the PDF version to read instead.

④ In this blog post, code output is generally placed in a one-row, one-column table.

If there are any errors or omissions in this article, please point them out, either by leaving a comment on ITPUB or via QQ; your criticism and corrections are my greatest motivation to keep writing.

Item                       source db
db type                    RAC
db version                 12.1.0.2.0
db storage                 ASM
OS and kernel version      SuSE Linux Enterprise Server (SLES 11) 64-bit


Let me briefly explain "Up Time (hrs)" here; the other metrics should already be familiar. In the table, "Up Time (hrs)" is the number of hours from database instance startup to the end of this snapshot. For example, in this AWR report instance 1 was started at 2016-08-11 20:51 and the snapshot ended at 2016-12-14 21:00, so the uptime is about 125.006 days, which converts to roughly 3000.14 hours, as shown below:

SYS@lhrdb> SELECT trunc(UP_TIME_D,3),  trunc(trunc(UP_TIME_D,3)*24,2) UP_TIME_HRS FROM (SELECT (TO_DATE('2016-12-14 21:00', 'YYYY-MM-DD HH24:MI') - TO_DATE('2016-08-11 20:51', 'YYYY-MM-DD HH24:MI'))  UP_TIME_D FROM DUAL);

TRUNC(UP_TIME_D,3) UP_TIME_HRS

------------------ -----------

           125.006     3000.14

We can see that node 1 is under a noticeably heavier load, and ADDM reports quite a few unusual wait events. Next, let's look at the wait event section.


We can see that the enq: IV - contention and DFS lock handle waits are quite severe. One thing worth noting about the event name: in the AWR report there is a single space on each side of the "-" in "enq: IV - contention", whereas the name recorded in the database has two spaces after the "-" ("enq: IV -  contention").

So take care with the spacing when searching for this event.


From the value of the p1 parameter recorded in ASH, the lock type and mode can be decoded:

SYS@lhrdb> SELECT CHR(BITAND(1230372867, -16777216) / 16777215) ||

  2         CHR(BITAND(1230372867, 16711680) / 65535) "LOCK",

  3         BITAND(1230372867, 65535) "MODE"

  4    FROM DUAL;

LO       MODE

-- ----------

IV          3

    SELECT *

      FROM V$EVENT_NAME A

     WHERE A.NAME LIKE '%enq: IV -  contention%';
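
For reference, the P1 value decoded above can be pulled straight out of ASH together with the event name. This is only a minimal sketch against v$active_session_history (note the two spaces after the "-" in the predicate, and that querying ASH requires the Diagnostics Pack):

    SELECT event, p1, p1text, COUNT(*) AS samples
      FROM v$active_session_history
     WHERE event = 'enq: IV -  contention'
     GROUP BY event, p1, p1text
     ORDER BY samples DESC;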


This wait event is specific to 12c; compared with 11g, 12c introduces roughly 500 additional wait events. For this issue, refer to the MOS note 12c RAC DDL Performance Issue: High "enq: IV - contention" etc if CPU Count is Different (Doc ID 2028503.1), which gives the following fix and workaround:


The fix will be included in a future PSU; a patch exists for certain platforms/versions.

The workaround is to set the following parameter to the highest value in the cluster and restart:

_ges_server_processes

To find out the highest value, run the following grep on each node:

ps -ef| grep lmd
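
As a concrete illustration of this workaround, the parameter could be set as follows. This is only a sketch: the value 3 is hypothetical and must be replaced by the highest LMD process count actually observed across all nodes, and a hidden (underscore) parameter like this should only be changed under the guidance of Oracle Support.

    -- Illustrative only: assume the node with the most LMD processes shows lmd0, lmd1 and lmd2 (3 processes)
    ALTER SYSTEM SET "_ges_server_processes"=3 SCOPE=SPFILE SID='*';
    -- The instances then need to be restarted for the new value to take effect.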

This wait event is mainly caused by the cpu_count parameter being inconsistent between the two nodes of the 12c RAC.

From the AWR report we can see that node 1 has 48 CPUs while node 2 has 96.
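
The current values can also be compared across the instances directly with the standard gv$parameter view, as a quick sanity check:

    SELECT inst_id, name, value
      FROM gv$parameter
     WHERE name = 'cpu_count'
     ORDER BY inst_id;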


The change history of the cpu_count parameter can be seen in dba_hist_parameter:

SQL> SHOW PARAMETER CPU  

NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cpu_count                            integer     96

parallel_threads_per_cpu             integer     2

resource_manager_cpu_allocation      integer     96

SQL>  select snap_id, INSTANCE_NUMBER,PARAMETER_NAME,VALUE from dba_hist_parameter where PARAMETER_NAME='cpu_count' order by snap_id;

   SNAP_ID INSTANCE_NUMBER PARAMETER_NAME                                                   VALUE

---------- --------------- ---------------------------------------------------------------- ------

......

      3368               1 cpu_count                                                        48

      3369               1 cpu_count                                                        48

      3369               2 cpu_count                                                        48

      3370               1 cpu_count                                                        48

      3371               1 cpu_count                                                        48

      3372               1 cpu_count                                                        48

      3373               1 cpu_count                                                        48

      3374               1 cpu_count                                                        48

      3375               2 cpu_count                                                        96

      3375               1 cpu_count                                                        48

      3376               1 cpu_count                                                        48

      3376               2 cpu_count                                                        96

      3377               1 cpu_count                                                        48

      3377               2 cpu_count                                                        96

      3378               2 cpu_count                                                        96

      3378               1 cpu_count                                                        48

      3379               1 cpu_count                                                        48

      3379               2 cpu_count                                                        96

......

The CPU count changes can also be found by searching the alert log: more alert*|grep -i Cpu

After checking with the customer, it turned out that they had adjusted the system's CPU resources, which caused CPU_COUNT on node 2 to change automatically and triggered the enq: IV - contention waits.

If the number of CPUs on the host changes, the value of the database's cpu_count parameter changes accordingly after the instance is restarted, because this parameter is operating-system dependent.

After the number of CPUs on the hosts was adjusted, the wait event disappeared.
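
To double-check after such a change, one can confirm that the event no longer accumulates on either instance. The query below is only an illustrative sketch (in this case the verification was simply that the waits no longer appeared in AWR):

    SELECT inst_id, event, total_waits, time_waited_micro
      FROM gv$system_event
     WHERE event LIKE 'enq: IV%'
     ORDER BY inst_id;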


In this Document
  Symptoms
  Cause
  Solution
  References

Information in this document applies to any platform.

12c RAC database seeing high "enq: IV - contention":

From the AWR report:

Top 10 Foreground Events by Total Wait Time

===================================

Event                  Waits    Total Wait Time (sec)  Wait Avg(ms)  % DB time  Wait Class
enq: IV - contention   52,914   1688.4                 31.91         42.8       Other
row cache lock         44,865   896.8                  19.99         22.7       Concurrency

tkprof output of a 10046 trace of the SQL statement shows the same event at the top:

Event waited on                         Times Waited   Max. Wait   Total Waited
--------------------------------------  ------------  ----------  ------------
enq: IV - contention                            6017        0.32        287.68
row cache lock                                   957        0.20          7.48
library cache lock                               631        0.13         15.10
library cache pin                                616        0.11         14.54

Cluster nodes have different CPU counts, resulting in a different number of LMD processes: one node has two while the other has three.

The issue is due to the following bug:

Which is closed as duplicate of:

BUG 21395269 - ASYMMETRICAL LMDS CONFIGURATION IN A CLUSTER LEADS TO POOR MESSAGE TRANSFER

This article is reposted from lhrbest's 51CTO blog. Original link: http://blog.51cto.com/lhrbest/1885815. Please contact the original author if you wish to republish it.