集團某線上系統使用OGG做了同步複制用于二期業務生産使用。有同僚過來說複制程序有點異常
-bash-3.2$ ogg
Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.6 16211226 OGGCORE_11.2.1.0.6_PLATFORMS_130418.1829_FBO
Solaris, sparc, 64bit (optimized), Oracle 11g on Apr 22 2013 15:23:39
Copyright (C) 1995, 2013, Oracle and/or its affiliates. All rights reserved.
GGSCI (ODSDB) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
JAGENT STOPPED
REPLICAT RUNNING RP10 00:00:00 00:00:03
REPLICAT RUNNING RP7 00:00:06 00:00:03
REPLICAT RUNNING RPS1 42:41:14 00:00:01
REPLICAT RUNNING RPS2 00:00:00 00:00:10
REPLICAT RUNNING RPS2A 00:00:00 00:00:00
REPLICAT RUNNING RPS2B 00:00:00 00:00:00
REPLICAT RUNNING RPS3 00:00:00 00:00:02
REPLICAT RUNNING RPS3A 00:00:00 00:00:03
REPLICAT RUNNING RPS3B 00:00:00 00:00:02
REPLICAT RUNNING RPS4 00:00:00 00:00:07
REPLICAT RUNNING RPS4A 00:00:00 00:00:07
REPLICAT RUNNING RPS4B 00:00:00 00:00:07
REPLICAT RUNNING RPS4C 00:00:00 00:00:07
REPLICAT RUNNING RPS4D 00:00:00 00:00:08
REPLICAT RUNNING RPS4E 00:00:00 00:00:07
REPLICAT RUNNING RPS4F 00:00:00 00:00:07
REPLICAT RUNNING RPS4G 00:00:00 00:00:07
REPLICAT RUNNING RPS4H 00:00:00 00:00:07
REPLICAT RUNNING RPS4I 00:00:00 00:00:08
REPLICAT RUNNING RPS4J 00:00:00 00:00:07
跟他去了終端旁邊,發現确實有個rps1程序延遲有點高。看着程序狀态是RUNNING,但是根據我維護六年ogg來看,有可能是假象,那麼怎麼判斷呢?
1,檢視ggserr.log
2,檢視dirrpt下面該程序的dsc檔案
3,info 程序名多次,看看RBA是否有變化
經過以上幾步rps1程序狀态是正常的,延遲高問題出在哪裡?view params rps1看到該程序隻有一張表。那我們看看這個程序到底卡在哪裡了
SQL> !ps -ef |grep rps1 |grep -v grep
orao 12169 11012 0 Jun 07 ? 614:58 /odogg/baseogg/replicat PARAMFILE /odsogg/baseogg/dirprm/rps1.prm REPORTFILE /
SQL> !ps -ef |grep 12169 |grep -v grep
orao 12170 12169 1 Jun 07 ? 2315:30 oracleodb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
SQL> select s.sid,s.SERIAL#,sql_id from v$process p,v$session s where p.addr=s.paddr and p.spid=12170;
SID SERIAL# SQL_ID
---------- ---------- -------------
5146 113 5099dwmx3s4mf
SQL> /
5146 113 dzubcf8jm69yx
詢問了一下維護過該系統的同僚,說是不是統計資訊太舊。順着思路往下走走
SQL> select OWNER,TABLE_NAME,LAST_ANALYZED from dba_tables where table_name='TF_F_USER_REVGRPTAG';
OWNER TABLE_NAME LAST_ANALYZED
-------------------------------------------------------------------------------
TTT TF_F_USER_REXXX 2017-06-26 00:20:56
SQL> select table_name,num_rows from dba_tables where owner='TTT' and table_name='TF_F_USER_REXXX';
TABLE_NAME NUM_ROWS
------------------------------ ----------
TF_F_USER_REXXX 409269832
SQL> select count(*) from TTT.TF_F_USER_REXXX;
COUNT(*)
----------
409955510
接下來我們看看該sql執行計劃是不是走錯了?
兩條執行計劃,很明顯137632316運作了5400次但是執行時間卻遠遠高于第二個執行計劃。
<a href="https://s4.51cto.com/wyfs02/M02/9A/D7/wKioL1lbPKKhgkhLAAHDW7ez5Xk109.jpg" target="_blank"></a>
<a href="https://s4.51cto.com/wyfs02/M02/9A/D7/wKiom1lbPKORHZUwAAGw01z_87w545.jpg" target="_blank"></a>
用sql profile綁定一下執行計劃
SQL> @coe_xfr_sql_profile.sql
Parameter 1:
SQL_ID (required)
Enter value for 1: dzubcf8jm69yx
PLAN_HASH_VALUE AVG_ET_SECS
--------------- -----------
2320424968 .004
137632316 21.119
Parameter 2:
PLAN_HASH_VALUE (required)
Enter value for 2: 2320424968
Values passed to coe_xfr_sql_profile:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SQL_ID : "dzubcf8jm69yx"
PLAN_HASH_VALUE: "2320424968"
檢視是否綁定成功
SQL>select name from dba_sql_profiles;
NAME
------------------------------
coe_dzubcf8jm69yx_2320424968
SYS_SQLPROF_015d075ccdd40000
coe_a5hgtqfq09tcu_229988255
在将記憶體中的該sql語句清除出去
SQL>select address,hash_value,executions,parse_calls from v$sql where sql_TEXT like 'UPDATE "TTT"."TF_F_USER_RExxxx" SET "PARTITION_ID" = :a27,"USER_ID" = :a28%';
ADDRESS HASH_VALUE EXECUTIONS PARSE_CALLS
---------------- ---------- ---------- -----------
00000017673EA5E8 4198240878 100 0
00000017CF3E0000 897328139 673 0
00000017982D40D0 1588244011 8 0
000000049D04B1A8 590555101 10 0
00000017AA525230 4271852353 2 0
0000001777FC6A20 3666847391 735 0
00000017B9154A58 1180290247 1 0
7 rows selected.
SQL>alter session set events '5614566 trace name context forever';
Session altered.
SQL>exec dbms_shared_pool.purge('00000017673EA5E8,4198240878','C');
PL/SQL procedure successfully completed.
SQL>exec dbms_shared_pool.purge('00000017CF3E0000,897328139','C');
exec db
SQL>ms_shared_pool.purge('00000017982D40D0,1588244011','C');
SQL>exec dbms_shared_pool.purge('000000049D04B1A8,590555101','C');
SQL>exec dbms_shared_pool.purge('00000017AA525230,4271852353','C');
SQL>exec dbms_shared_pool.purge('0000001777FC6A20,3666847391','C');
SQL>exec dbms_shared_pool.purge('00000017B9154A58,1180290247','C');
觀察ogg複制程序延遲已經開始減小。該隐患順利解決!
GGSCI (ODSDB) 8> info all
REPLICAT RUNNING RP10 00:00:00 00:00:01
REPLICAT RUNNING RP7 00:00:00 00:00:07
REPLICAT RUNNING RPS1 31:10:29 00:00:50
REPLICAT RUNNING RPS2 00:00:00 00:00:00
REPLICAT RUNNING RPS3 00:00:00 00:00:01
REPLICAT RUNNING RPS3A 00:00:00 00:00:02
REPLICAT RUNNING RPS3B 00:00:00 00:00:01
REPLICAT RUNNING RPS4 00:00:00 00:00:05
REPLICAT RUNNING RPS4A 00:00:00 00:00:04
REPLICAT RUNNING RPS4B 00:00:00 00:00:04
REPLICAT RUNNING RPS4C 00:00:00 00:00:04
REPLICAT RUNNING RPS4D 00:00:00 00:00:04
REPLICAT RUNNING RPS4E 00:00:00 00:00:04
REPLICAT RUNNING RPS4F 00:00:00 00:00:04
REPLICAT RUNNING RPS4G 00:00:00 00:00:05
REPLICAT RUNNING RPS4H 00:00:00 00:00:04
REPLICAT RUNNING RPS4I 00:00:00 00:00:04
REPLICAT RUNNING RPS4J 00:00:00 00:00:04
本文轉自yangjunfeng 51CTO部落格,原文連結:http://blog.51cto.com/yangjunfeng/1944421