一.Trails 說明
To support thecontinuous extraction and replication of database changes, Oracle GoldenGatestores the captured changes temporarily on disk in a series of files called a trail.A trail can exist on the source or target system, or on an intermediary system,depending on how you configure Oracle GoldenGate. On the local system it isknown as an extract trail (or local trail). On a remote system it is known as aremote trail.
--為了支援連續的extraction 和 replication 資料的改變,GG抓取這些改變的資料臨時存放在一系列的磁盤檔案裡, 這些檔案就叫trail。 Trail 檔案可以放在Source或者Target system裡,甚至可以放到轉換的系統上,這主要取決與GG 的配置,如果存放在local system上,就叫extract trail 或者 localtrail,如果存放在remote system上就叫作remote trail。
By using a trailfor storage, Oracle GoldenGate supports data accuracy and fault tolerance. Theuse of a trail also allows extraction and replication activities to occurindependently of each other. With these processes separated, you have morechoices for how data is delivered. For example, instead of extracting and replicatingchanges continuously, you could extract changes continuously but store them inthe trail for replication to the target later, whenever the target applicationneeds them.
通過使用trail,GG 支援精确的資料同步和故障處理。
1.1 Processes that write to, and read, a trail
The primaryExtract process writes to a trail. Only one Extract process can write to atrail.
Extract 程序可以将capture data 寫入trail。隻允許一個Extract程序寫入一個trail。
Processes that read the trail are:
以下2個程序可以讀取trail 檔案:
(1) Data-pump Extract: Extractsdata from a local trail for further processing, if needed, and transfers it tothe target system or to the next Oracle GoldenGate process downstream in theOracle GoldenGate configuration.
(2) Replicat: Reads a trail toapply change data to the target database.
1.2 Trail maintenance
Trail files arecreated as needed during processing, and they are aged automatically to allowprocessing to continue without interruption for file maintenance. By default,trails are stored in the dirdat sub-directory of the Oracle GoldenGatedirectory.
--Trail 檔案在程序運作時建立,trail 在使用時允許程序在不中斷的情況下進行維護。 預設情況下,trails存放在GG 安裝目錄的dirdat 子目錄下。
By default, eachfile in a trail is 10 MB in size. All file names in a trail begin with the sametwo characters, which you assign when you create the trail. As files arecreated, each name is appended with a unique, six-digit serial (sequence)number from 000000 through 999999, for example c:\ggs\dirdat\tr000001. When thetrail sequence number reaches 999999, the numbering starts over at 000000.
--預設情況下,每個trails 檔案為10MB,所有的trails由2個相同的字母開頭,當建立trail時,每個trail 檔案名有我們指定的2個字母加6個數字組成,這樣來進行唯一性限制。 6個數字從000000 到999999. 當數字到達999999時,又繼續從000000開始。
You can createmore than one trail to separate the data from different objects or applications.You link the objects that are specified in a TABLE or SEQUENCE parameter to a trailthat is specified with an EXTTRAIL or RMTTRAIL parameter in the Extractparameter file.
--可以建立多個trail 檔案已分别存放不同對象的trail。
Aged trail filescan be purged by using the Manager parameter PURGEOLDEXTRACTS.
--使用中的trail 檔案可以使用Manager 參數:PURGEOLDEXTRACTS進行清空。
1.3 How processes write to a trail
To maximizethroughput, and to minimize I/O load on the system, extracted data is sent intoand out of a trail in large blocks. Transactional order is preserved. Bydefault, Oracle GoldenGate writes data to the trail in canonical format, aproprietary format which allows it to be exchanged rapidly and accurately amongheterogeneous databases. However, data can be written in other formats that arecompatible with different applications.
--為了最大的吞吐量和最小的系統I/O, trail 檔案的寫入和讀取都是按large block 進行。 預設情況下,GG 寫入trail 的檔案的内容都是特定的格式,特定的格式可以實作快速的exchange 和 不同資料庫之間實作精确的同步。當然也可以根據不同的應用,按照不同的格式進行寫入。
By default,Extract operates in append mode, where if there is a process failure, arecovery marker is written to the trail and Extract appends recovery data tothe file so that a history of all prior data is retained for recovery purposes.
預設情況下,Extract是append 模式,如果一個process 失敗,在下次啟動Extract 時,recovermarkder 可以寫入這個trail,進行Extract recovery操作,但是這個啟動Extract的過程會比較長。關于這個Extract recovery 的監控,在以下連結的1.1 小結有說明。
In append mode,the Extract initialization determines the identity of the last complete transactionthat was written to the trail at startup time. With that information, Extract endsrecovery when the commit record for that transaction is encountered in the datasource; then it begins new data capture with the next committed transactionthat qualifies for extraction and begins appending the new data to the trail. Adata pump or Replicat starts reading again from that recovery point.
--在append 模式下,在啟動Extract 初始化時,會在trail中找到last completetransaction的位置。然後從這個位置開始進行recovery,直到在data source裡遇到commitrecord,就結束Extract Recovery。Extract 正常啟動。 Extract 程序正常啟動之後,就可以正常的以append 方式寫入新的capturedata。
Data pump 程序或者Replicat 程序從Recover point 開始讀取data。
這個last complete transaction 可以了解為recover point,因為GG 也是根據歸檔和onlineredo 來的,是以這個recovery 過程也是從DB 的log裡進行判斷,找到了commit record,即GG和 DB 的log 一緻,結束recovery 過程。
這段recovery 出來的tail data 還沒有發送到target 上或者還沒有應用,是以Data pump 或 replicat 程序從這個recovery point 開始恢複。
在恢複階段,我們檢視Extract 程序的狀态,就顯示為APPEND。
這一段ExtractRecovery 過程是根據自己的了解來寫的,可能不完全正确,有其他了解的可以互相讨論。
Overwrite modeis another version of Extract recovery that was used in versions of Oracle GoldenGateprior to version 10.0. In these versions, Extract overwrites the existing transactiondata in the trail after the last write-checkpoint position, instead ofappending the new data. The first transaction that is written is the first onethat qualifies for extraction after the last read checkpoint position in thedata source.
--Overwrite 模式的Extract Recovery 是GG 10.0版本之前使用的。 Extract 重寫trail中已經存在,并在lastwrite-checkpoint position之後的transaction data,代替了append new data。
If the versionof Oracle GoldenGate on the target is older than version 10, Extract will automaticallyrevert to overwrite mode to support backward compatibility. This behavior canbe controlled manually with the RECOVERYOPTIONS parameter.
--如果Target GG的版本小于10,ExtractGG的版本大于10,為了相容性, Extract 将自動轉換成overwrite 模式。 可以通過RECOVERYOPTIONS參數手工的控制這種自動轉換的功能。
1.4 Trail format
As of OracleGoldenGate version 10.0, each file of a trail contains a file header recordthat is stored at the beginning of the file. The file header containsinformation about the trail file itself. Previous versions of Oracle GoldenGatedo not contain this header.
在GG 10.0版本裡,每個trailfile 包含一個file header,其用來存儲beginning of file。 10.0 版本之前的GG 不包含header。
Each data recordin a trail file also contains a header area, as well as a data area. The recordheader contains information about the transaction environment, and the dataarea contains the actual data values that were extracted.
每條trail file的record中包含一個headerarea和data area。 Record 的headerarea 包含transaction environment,data area 包含實際的的extracteddata。
Trail 的格式在第二部分具體進行說明。
二.Trails 中record format
我們可以通過GG 自帶的logdump 工具來檢視trails 檔案的具體格式。
2.1 logdump 工具使用說明
MOS 上有關logdump的2篇文檔:
How To Save A Part Of A GoldenGate Trail ToA New Trail [ID 966188.1]
Oracle GoldenGate - Using the GoldenGateLogdump Utility to Manually Load Balance across Multiple Processes [ID1301300.1]
2.1.1 進入GG 安裝目錄,運作logdump指令。
cgg1:/home/oracle> cd /u01/ggate/
gg1:/u01/ggate> logdump
Oracle GoldenGate Log File Dump Utility
Version 11.1.1.1OGGCORE_11.1.1_PLATFORMS_110421.2040
Copyright (C) 1995, 2011, Oracle and/or itsaffiliates. All rights reserved.
Logdump 1 >
2.1.2 set up the view –建立logdump環境
(1)To view the record header with thedata:
Logdump 1 >ghdr on
(2)To add column information
Logdump 2 >detail on
-- Column informationincludes the number and length in hex and ASCII.
(3)To add hex and ASCII data values to the column information:
Logdump 3 >detail data
(4)To view user tokens:
Logdump 4 >usertoken on
(5)To control how much record data is displayed:
Logdump 5> RECLEN <length>
Logdump 5 >reclen 1000
Reclen set to 1000
2.1.3 To open a trail file 打開檔案
(1)指定trail 檔案
Logdump 6 >open /u01/ggate/dirdat/lt000008
Current LogTrail is/u01/ggate/dirdat/lt000008
(2)To go to the first record and thenmove through records in sequence:
Logdump 7>NEXT or N
示例:
Logdump 7 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 45 (x002d) IO Time : 2011/11/1713:11:28.000.000
IOType : 5 (x05) OrigNode : 255 (xff)
TransInd : . (x00) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 23 AuditPos : 16783888
Continued : N (x00) RecCount : 1 (x01)
2011/11/17 13:11:28.000.000 Insert Len 45 RBA 1009
Name: DAVE.BL
After Image: Partition 4 G b
00000006 0000 0002 424c 0001 0006 0000 0002 3839 | ........BL........89
00020015 0000 3230 3131 2d31 312d 3137 3a31 333a | ......2011-11-17:13:
30393a32 34 | 09:24
Column 0 (x0000), Len 6 (x0006)
00000002 424c | ....BL
Column 1 (x0001), Len 6 (x0006)
00000002 3839 | ....89
Column 2 (x0002), Len 21 (x0015)
00003230 3131 2d31 312d 3137 3a31 333a 3039 3a32 | ..2011-11-17:13:09:2
34 | 4
2.1.4 To go to a specific RBA in the file定位
(1)To go to an RBA anywhere in the file:
Logdump 9> POS <rba>
Logdump 10> N
GGSCI (gg1) 8> info ext1
EXTRACT EXT1 Last Started 2011-11-1910:02 Status RUNNING
Checkpoint Lag 11:49:46 (updated 00:00:06 ago)
Log Read Checkpoint Oracle Redo Logs
2011-11-18 22:12:21 Seqno 26, RBA 22834688
Logdump 10 >pos 22834688
Reading forward from RBA 22834688
Logdump 11 >n
--因為目前的extract使用的trail 檔案不是lt000008,是以這裡沒有顯示結果。
(2)To go to the first record in the file:
Logdump 11> POS FIRST
或
Logdump 11> POS 0
Logdump 12 >pos 0
Reading forward from RBA 0
Logdump 13 >n
2011/11/17 13:08:42.353.436 FileHeader Len 942 RBA 0
Name: *FileHeader*
30000199 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1...
00023200 0004 2000 0000 3300 0008 02f1 d81a ab92 | ..2... ...3.........
1f1c3400 0014 0012 7572 693a 6767 313a 3a75 3031 | ..4.....uri:gg1::u01
3a676761 7465 3600 001c 001a 2f75 3031 2f67 6761 | :ggate6...../u01/gga
74652f64 6972 6461 742f 6c74 3030 3030 3038 3700 | te/dirdat/lt0000087.
00010138 0000 0400 0000 0839 0000 0800 0000 0000 | ...8.......9........
001b493a 0000 8107 3133 3633 3532 3200 0000 0000 | ..I:....1363522.....
00000000 0000 0000 0000 0000 0000 0000 0000 0000 | ....................
2.1.5 To filter based on a table name
(1)To filter out everything exceptrecords containing a specific table name:
Logdump 14> FILTER INCLUDE/EXCLUDEFILENAME <schema>.<table>
Now, when you use the N command, you willonly see records that satisfy this filter.
(2)To remove the current filtercriteria
Logdump14> FILTER CLEAR
(3)To filter on multiple conditions
Logdump 14> FILTER INCLUDEFILENAME <schema>.<table>; FILTER RECTYPE <record_type>;FILTER MATCH ALL
2.1.6 To count the records in a trail file
Logdump 14> COUNT
Logdump 14 >count
LogTrail /u01/ggate/dirdat/lt000008 has 37records
Total Data Bytes 3834
AvgBytes/Record 103
Insert 34
RestartOK 1
DDL 1
Others 1
After Images 36
Average of 4 Transactions
Bytes/Trans ..... 1402
Records/Trans ... 9
Files/Trans ..... 1
Partition 0
Total Data Bytes 1153
AvgBytes/Record 576
DDL 1
After Images 2
*FileHeader* Partition 0
Total Data Bytes 942
AvgBytes/Record 942
DAVE.BL Partition 4
Total Data Bytes 1739
AvgBytes/Record 51
After Images 34
2.1.7 To save records to a new trail file
(1)To save the whole file:
Logdump 15> SAVE <file>
Where: <file> is the name of the newfile.
(2)To save a subset of records:
Logdump15> SAVE <file> <n>RECORDS
To close the current file and open the nextone in the trail
Logdump 15> NEXTTRAIL
2.1.8 To keep a log of your session
(1) To start logging:
Logdump 15> LOG TO <filename>.txt
(2)To write text to the log:
Logdump 15> WRITELOG “<text>”
(3)To stop logging:
Logdump 15> LOG STOP
2.1.9 To see the current Logdumpenvironment
Logdump 15> ENV
This shows whichfeatures are enabled, such as filtering and header views, and it showsenvironment information such as the current trail and position.
Logdump 15 >env
Version : Linux, x64, 64bit (optimized) onApr 21 2011 22:44:06
Current Directory : /u01/ggate
LogTrail : /u01/ggate/dirdat/lt000008
Trail Format : New
End of File : 6985
Current Position : 0 Forward
Next Position : 0
Last Modtime : 2011/11/17 17:16:22.000.000
Display RecLen : 200
Logtrail Filter : On
Show Ghdr : On
Detail : Data
UserToken : On
Trans History : 0 Transactions, Records 100, Bytes100000
LargeBlock I/O : On, Blocksize 57344
Local System : LittleEndian
Logtrail Data : BigEndian/ASCII
Logtrail Headers : ASCII
Dump : ASCII
Savefile comments : Off
Timeoffset : LOCAL
Scan Notify Interval: 10000 records,Scrolling On
2.1.10 To get online command help
Logdump 17> HELP
2.1.11 To exit Logdump
Logdump 17> EXIT
Logdump 17> QUIT
2.2 Trails 格式說明
這個是官方文檔上的一個圖。我們現在來看一下,
2.2.1 Record header area
The OracleGoldenGate record header provides metadata of the data that is contained in therecord and includes the following information.
--GG record header 包含以下内容:
(1) The operation type, such as aninsert, update, or delete
(2) The before or after indicatorfor updates
(3) Transaction information, suchas the transaction group and commit timestamp
2.2.1.1 Description of header fields
The following describes the fields of the Oracle GoldenGate record header. Some fields applyonly to certain platforms.
2.2.1.2 Using header data
Some of the dataavailable in the Oracle GoldenGate record header can be used for mapping byusing the GGHEADER option of the @GETENV function or by using any of the followingtransaction elements as the source expression in a COLMAP statement in the TABLEor MAP parameter.
(1) GGS_TRANS_TIMESTAMP
(2) GGS_TRANS_RBA
(3) GGS_OP_TYPE
(4) GGS_BEFORE_AFTER_IND
2.2.2 Record data area
The data area of the Oracle GoldenGatetrail record contains the following:
Trail 的data area 記錄了如下内容:
(1) The time that the change waswritten to the Oracle GoldenGate file
(2) The type of database operation
(3) The length of the record
(4) The relative byte addresswithin the trail file
(5) The table name
(6) The data changes in hex format
The followingexplains the differences in record image formats used by Oracle GoldenGate onWindows, UNIX, Linux, and NonStop systems. The terms “full” and “compressed”image format are used in the descriptions. These terms are used in a differentcontext here than when they are used in other parts of the documentation inreference to how Extract writes column data to the trail, meaning whether onlythe key and changed columns are written (“compressed”) versus whether allcolumns are written to the trail (“uncompressed” or “full image”).
2.2.2.1 Full record image format
Full record imageformat is only generated in the trail when the source system is HP NonStop, andonly when the IOType specified in the record header is one of the following:
--Full record image 格式僅在source system 是HPNonStop,并且在IOTye 指定如下類型時生成:
3 — Delete
5 — Insert
10 — Update
Each full recordimage has the same format as if retrieved from a program reading the originalfile or table directly. For SQL tables, datetime fields, nulls, and other datais written exactly as a program would select it into an application buffer.Although datetime fields are represented internally as an eight-byte timestamp,their external form can be up to 26 bytes expressed as a string. Enscriberecords are retrieved as they exist in the original file.
When theoperation type is Insert or Update, the image contains the contents of therecord after the operation (the after image). When the operation type is Delete,the image contains the contents of the record before the operation (the beforeimage).
--如果是insert 或 update 操作,那麼image包含的是after image,即修改之後的data,如果是delete,則儲存的beforeimage。
For recordsgenerated from an Enscribe database, full record images are output unless the originalfile has the AUDITCOMPRESS attribute set to ON. When AUDITCOMPRESS is ON, compressedupdate records are generated whenever the original file receives an update operation.(A full image can be retrieved by the Extract process by using the FETCHCOMPS parameter.)
2.2.2.2 Compressed record format
By default,trail records written by processes on Windows and UNIX systems are always compressed.The format of a compressed record is as follows:
<columnindex><column length><column data>[...]
--預設情況下,trail record是compress的,為什麼要compress在下一節說明。
Where:
(1) <column index> is the ordinal index ofthe column within the source table (2 bytes).
(2) <column length> is thelength of the data (2 bytes).
(3) <column data> is thedata, including NULL or VARCHAR length indicators.
Enscribe recordswritten from the NonStop platform may be compressed. The format of a compressedEnscribe record is as follows:
<fieldoffset><field length><field value>[...]
(1) <field offset> is the offset within theoriginal record of the changed value (2 bytes).
(2) <field length> is thelength of the data (2 bytes).
(3) <field data> is the data,including NULL or VARCHAR length indicators.
The first fieldin a compressed Enscribe record is the primary or system key.
2.2.2.3 Tokens area
The trail recordalso can contain two areas for tokens. One is for internal use and is not documentedhere, and the other is the user tokens area. User tokens are environment valuesthat are captured and stored in the trail record for replication to targetcolumns or other purposes. If used, these tokens follow the data portion of therecord and appear similar to the following when viewed with Logdump:
--trail record 的token 包含2個areas,其中一個供内部使用,文檔就不做說明,另一個是user token。
2.2.2.4 Oracle GoldenGate operation types
The followingare some of the Oracle GoldenGate operation types. Types may be added as newfunctionality is added to Oracle GoldenGate. For a more updated list, use the SHOWRECTYPE command in the Logdump utility.
2.2.3 Oracle GoldenGate trail header record
In addition tothe transaction-related records that are in the Oracle GoldenGate trail, each trailfile contains a file header.
--在每個trail file中除了之前講的trail header,trailarea,還有一個file header。
The file headeris stored as a record at the beginning of a trail file preceding the data records.The information that is stored in the trail header provides enough information aboutthe records to enable an Oracle GoldenGate process to determine whether the recordsare in a format that the current version of Oracle GoldenGate supports.
The trail headerfields are stored as tokens, where the token format remains the same across allversions of Oracle GoldenGate. If a version of Oracle GoldenGate does not supportany given token, that token is ignored. Depracated tokens are assigned a defaultvalue to preserve compatibility with previous versions of Oracle GoldenGate.
You can view thetrail header with the FILEHEADER command in the Logdump utility.
三.trail再壓縮
在上面一節看到,預設情況下,trails 是compress的。當資料進行異地傳輸時,網絡負載是影響資料傳輸效率的關鍵因素。GoldenGate的其中一個優勢是,它極大地壓縮了傳輸的trail檔案。
在預設壓縮(redo=>trail)的基礎上,GoldenGate提供了再壓縮方案:在傳輸trail之前對其進行進一步的壓縮,然後在寫入遠端trail之前再進行解壓。官方文檔中說明這次再壓縮的壓縮比至少在4:1,實際中壓縮比率會更高,有網友測試達10:1。
這裡注意GoldenGate的再壓縮過程,壓縮之後發送到Target 端後,在寫入遠端trail前又解壓了回來,是以單純比較兩端的trail檔案大小沒有變化。但是在傳輸過程中确實是壓縮了。
這樣的設計,可以減少對Replicat程序提取trail的影響,進而也避免了加重原本負載就相對較大的Replicat程序。
GoldenGate的再壓縮,是在負責向遠端發送trail的Extract程序中配置的,是以一般情況下都是配置在datapump中。配置很簡單,在RMTHOST參數中加入COMPRESS子參數,如下:
GGSCI (gg1) 42> view params dpump
extract dpump
userid ggate@gg1, password ggate
rmthost gg2, mgrport7809,compress,compressthreshold 0
rmttrail /u01/ggate/dirdat/lt
passthru
table dave.pdba;
-- COMPRESS啟用再壓縮,COMPRESSTHRESHOLD表示對大于該值的trail記錄塊進行壓縮,取值0到28000,預設1000,設定為0表示全部進行壓縮
啟用了壓縮後,會消耗額外的CPU,在同步事務量較大的情況下,需要考慮CPU負載。但實際上在大部分的GoldenGate應用場景中,網絡負載都遠遠大于CPU負載,啟用trail再壓縮的可能性還是很大的。