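Database version: 11.2.0.3 RAC
GoldenGate version: 11.1.1.1.2
This morning I found that data synchronization had stopped working properly. The status on the source side was as follows: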
GGSCI (ulecardrac1) 3> info all
Program     Status      Group       Lag           Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     EXT232      00:00:00      06:32:33
EXTRACT     RUNNING     PUMP232     00:00:00      00:00:03
The status of EXT232 is still RUNNING, but its checkpoint has not been updated for six and a half hours. The process has in fact hung.
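The symptom above (status RUNNING, but a large Time Since Chkpt) is easy to miss by eye. A minimal Python sketch of a check for it follows. It assumes the `info all` output has already been captured as text; the function names and the 30-minute threshold are illustrative choices of mine, not GoldenGate defaults.

```python
from datetime import timedelta


def parse_hms(text):
    """Convert an HH:MM:SS string from `info all` into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def find_stalled(info_all_output, threshold=timedelta(minutes=30)):
    """Return (group, checkpoint age) for RUNNING groups with a stale checkpoint.

    The threshold is a hypothetical value; pick one that suits your workload.
    """
    stalled = []
    for line in info_all_output.splitlines():
        fields = line.split()
        # Process rows have exactly five fields: program, status, group,
        # lag, time since checkpoint. Header and MANAGER rows are skipped.
        if len(fields) != 5 or fields[0] not in ("EXTRACT", "REPLICAT"):
            continue
        _program, status, group, _lag, since_chkpt = fields
        age = parse_hms(since_chkpt)
        if status == "RUNNING" and age > threshold:
            stalled.append((group, age))
    return stalled


SAMPLE = """\
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT232 00:00:00 06:32:33
EXTRACT RUNNING PUMP232 00:00:00 00:00:03
"""

if __name__ == "__main__":
    for group, age in find_stalled(SAMPLE):
        print(f"{group}: no checkpoint for {age}")
```

Run against the output shown above, this flags EXT232 and leaves PUMP232 alone.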
I checked the alert log, ggserr.log,
and found OGG-01738 messages in it:
2015-01-15 21:12:37 INFO OGG-01517 Position of first record processed Sequence 30, RBA 170907152, SCN 0.2262790, Jan 15, 2015 7:49:50 PM.
2015-01-16 01:12:41 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p6427_extr: start=SeqNo: 35, RBA: 45666320, SCN: 0.2287580 (2287580), Timestamp: 2015-01-16 01:12:39.000000, Thread: 1, end=SeqNo: 35, RBA: 45667328, SCN: 0.2287580 (2287580), Timestamp: 2015-01-16 01:12:39.000000, Thread: 1.
2015-01-16 05:12:43 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p6427_extr: start=SeqNo: 35, RBA: 58063376, SCN: 0.2298655 (2298655), Timestamp: 2015-01-16 05:12:26.000000, Thread: 1, end=SeqNo: 35, RBA: 58063872, SCN: 0.2298655 (2298655), Timestamp: 2015-01-16 05:12:26.000000, Thread: 1.
2015-01-16 09:12:52 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p6427_extr: start=SeqNo: 35, RBA: 101322256, SCN: 0.2310699 (2310699), Timestamp: 2015-01-16 09:12:44.000000, Thread: 1, end=SeqNo: 35, RBA: 101322752, SCN: 0.2310699 (2310699), Timestamp: 2015-01-16 09:12:44.000000, Thread: 1.
2015-01-16 10:17:23 INFO OGG-06508 Wildcard MAPTABLE resolved (entry scott.*): table "SCOTT"."EMP".
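To see how regularly the bounded recovery checkpoints were written, the OGG-01738 entries can be pulled out of ggserr.log. The following is a rough Python sketch, assuming each log entry sits on a single line in the layout shown above; the regular expression and the function name are my own and may need adjusting for other log layouts.

```python
import re
from datetime import datetime

# Matches an OGG-01738 bounded recovery checkpoint entry and captures the
# log timestamp plus the start position (sequence, RBA, decimal SCN).
BR_CHECKPOINT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+INFO\s+OGG-01738\b.*?"
    r"start=SeqNo: (?P<seq>\d+), RBA: (?P<rba>\d+), SCN: \S+ \((?P<scn>\d+)\)"
)


def br_checkpoints(log_text):
    """Return a list of (timestamp, seqno, rba, scn) for each BR checkpoint."""
    rows = []
    for line in log_text.splitlines():
        match = BR_CHECKPOINT.search(line)
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
            rows.append(
                (ts, int(match["seq"]), int(match["rba"]), int(match["scn"]))
            )
    return rows


SAMPLE = """\
2015-01-15 21:12:37 INFO OGG-01517 Position of first record processed Sequence 30, RBA 170907152, SCN 0.2262790, Jan 15, 2015 7:49:50 PM.
2015-01-16 01:12:41 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p6427_extr: start=SeqNo: 35, RBA: 45666320, SCN: 0.2287580 (2287580), Timestamp: 2015-01-16 01:12:39.000000, Thread: 1, end=SeqNo: 35, RBA: 45667328, SCN: 0.2287580 (2287580), Timestamp: 2015-01-16 01:12:39.000000, Thread: 1.
2015-01-16 05:12:43 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p6427_extr: start=SeqNo: 35, RBA: 58063376, SCN: 0.2298655 (2298655), Timestamp: 2015-01-16 05:12:26.000000, Thread: 1, end=SeqNo: 35, RBA: 58063872, SCN: 0.2298655 (2298655), Timestamp: 2015-01-16 05:12:26.000000, Thread: 1.
2015-01-16 09:12:52 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p6427_extr: start=SeqNo: 35, RBA: 101322256, SCN: 0.2310699 (2310699), Timestamp: 2015-01-16 09:12:44.000000, Thread: 1, end=SeqNo: 35, RBA: 101322752, SCN: 0.2310699 (2310699), Timestamp: 2015-01-16 09:12:44.000000, Thread: 1.
"""

if __name__ == "__main__":
    previous = None
    for ts, seq, rba, scn in br_checkpoints(SAMPLE):
        gap = "" if previous is None else f" (+{ts - previous})"
        print(f"{ts} seq={seq} rba={rba} scn={scn}{gap}")
        previous = ts
```

For the three OGG-01738 entries in this log, the gaps between checkpoints come out at roughly four hours each.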
There is a MOS article about this message, note 1293772.1, which suggests restarting the extract with BRRESET:
GGSCI> start <extract_name> BRRESET
Because the extract process ext232 had already hung, it could not be stopped. Even 'send ext232 forcestop' and 'stop mgr' failed to stop it.
In the end I had to kill the process from the shell and then run:
GGSCI> start ext232 BRRESET
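After the restart, the status was back to normal and synchronization was running with essentially no lag.
This bug occurs only on RAC, or on a single instance configured with multiple threads, and it has been fixed in later versions. For a permanent fix, consider upgrading OGG to 11.2.1.0.1.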
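Note: an article I found on Google says this error is a bug in OGG 11. In my case, though, the problem was resolved in the end by starting the extract with this command:
GGSCI> START EXTR_1 BRRESET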