gc current block busy and redo log flush in RAC

This post is adapted from my article: gc current block busy and redo log flush in RAC

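Does log file sync (which at bottom means slow redo log writes) cause the gc buffer busy acquire/release cluster wait events? Oracle answered that question long ago, when it built the ADDM performance tuning component: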

RECOMMENDATION 2: Host Configuration, 12% benefit (507182 seconds)
   ACTION: Investigate the possibility of improving the performance of I/O
      to the online redo log files.
   RATIONALE: The average size of writes to the online redo log files was
      40 K and the average time per write was 10 milliseconds.
   ADDITIONAL INFORMATION:
      Waits on event "log file sync" were the cause of significant database
      wait on "gc buffer busy" when releasing a data block. Waits on event
      "log file sync" in this instance can cause global cache contention on
      remote instances.

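If you find the text above in an ADDM report (?/rdbms/admin/addmrpt), you can be fairly sure that the gc buffer busy waits trace back to log file sync (although, strictly speaking, log file sync is not itself the root cause), so fix the log file sync problem first. A few bugs do cause log file sync waits, but far more often the cause is hardware: storage, adapter cards, or the links between them. Once log file sync is fixed, the gc buffer busy waits usually go away as well.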
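
The check described above can be automated when there are many reports to go through. The following is a minimal Python sketch, assuming the ADDM report has already been saved as plain text; the function name log_sync_blamed_for_gc_busy is made up for this illustration, and the phrase it searches for is copied from the excerpt above rather than from any Oracle specification, so other versions may word it differently.

```python
import re

# Phrase taken from the ADDM excerpt in this post (assumption: the
# wording is stable enough to search for as a literal string).
TELLTALE = ('Waits on event "log file sync" were the cause of '
            'significant database wait on "gc buffer busy"')


def log_sync_blamed_for_gc_busy(addm_text: str) -> bool:
    """Return True if the report text links log file sync to gc buffer busy."""
    # Turn curly quotes into straight ones, since copied reports often
    # carry typographic quotes.
    normalized = addm_text.replace("\u201c", '"').replace("\u201d", '"')
    # Collapse line wrapping and blank lines into single spaces so the
    # phrase matches even when the report wraps it.
    normalized = re.sub(r"\s+", " ", normalized)
    return TELLTALE in normalized
```

A True result only says that ADDM made this attribution; the redo log I/O itself still has to be investigated.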

The gc current block busy wait is a global cache contention wait event on a current block in RAC. The wait time is made up of three parts:

Time to process current block request in the cache= (pin time + flush time + send time)

gc current block flush time

The current block flush time is part of the service (or processing) time for a current block. The pending redo needs to be flushed to the log file by LGWR before LMS sends it. The operation is asynchronous in that LMS queues the request, posts LGWR, and continues processing. The LMS would check its log flush queue for completions and then send the block, or go to sleep and be posted by LGWR. The redo log write time and redo log sync time can influence the overall service time significantly.

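Flush time exists because Oracle has to protect instance recovery: after a current block has been modified on the local instance, the redo for that change must be written to the log file (LGWR has to complete the write and return) before the LMS process can ship the block to another node.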
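
The rule above can be pictured with a toy model. This is only a sketch of the ordering constraint, not Oracle's implementation: PendingSend, redo_position, and sendable_blocks are hypothetical names, and representing "redo written so far" as a single integer is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PendingSend:
    block_id: int
    # Hypothetical: position in the redo stream of the last change
    # made to this block on the local instance.
    redo_position: int


def sendable_blocks(queue: List[PendingSend],
                    flushed_up_to: int) -> Tuple[List[PendingSend], List[PendingSend]]:
    """Split the LMS flush queue into blocks that may be sent now and
    blocks that must keep waiting for LGWR."""
    # A block may be shipped only once all of its redo is on disk.
    ready = [p for p in queue if p.redo_position <= flushed_up_to]
    waiting = [p for p in queue if p.redo_position > flushed_up_to]
    return ready, waiting


queue = [PendingSend(block_id=1, redo_position=100),
         PendingSend(block_id=2, redo_position=250)]
ready, waiting = sendable_blocks(queue, flushed_up_to=200)
print([p.block_id for p in ready], [p.block_id for p in waiting])
```

In this model, the slower LGWR advances flushed_up_to, the longer blocks stay in the waiting list, and that delay is what shows up as flush time.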

gc buffer busy acquire/release is usually a by-product of gc current block busy. When several processes in the same instance access the same data block concurrently, the first one enters the gc current block busy wait, and the processes queued behind it on the buffer waiter list fall into gc buffer busy acquire/release waits (A user on the same instance has started a remote operation on the same resource and the request has not completed yet or the block was requested by another node and the block has not been released by the local instance when the new local access was made). There is a queueing effect here: if gc current block busy is slow, the gc buffer busy acquire/release waits queued behind it are slower still:

Pin time = (time to read the block into cache) + (time to modify/process the buffer)

Busy time = (average pin time) * (number of interested users waiting ahead of me)
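
To make the queueing effect concrete, here is a small worked example of the formulas in this post. The function names and all the millisecond values are invented for illustration, apart from the 10 ms flush, which simply reuses the average redo write time from the ADDM excerpt.

```python
def service_time_ms(pin_ms: float, flush_ms: float, send_ms: float) -> float:
    """Time to process a current block request: pin + flush + send."""
    return pin_ms + flush_ms + send_ms


def busy_time_ms(avg_pin_ms: float, waiters_ahead: int) -> float:
    """Busy time: average pin time multiplied by the number of
    interested users waiting ahead of this one."""
    return avg_pin_ms * waiters_ahead


# Illustrative values only: with a 10 ms redo write, the flush
# dominates the service time of a single request.
print(service_time_ms(pin_ms=1.0, flush_ms=10.0, send_ms=0.5))  # 11.5

# Five sessions queued ahead, each holding the buffer for 2 ms on average.
print(busy_time_ms(avg_pin_ms=2.0, waiters_ahead=5))  # 10.0
```

Every extra waiter adds another average pin time to the sessions behind it, so a slow first request stretches the wait of the whole queue.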

Flush time is not limited to current blocks (see "Avg global cache current block flush time (ms)" in AWR); CR blocks have a flush time too ("Avg global cache cr block flush time (ms)").

Setting _cr_server_log_flush to false stops the CR server from flushing the redo log (LMS are/is waiting for LGWR to flush the pending redo during CR fabrication. Without going too much in to details, you can turn off the behaviour by setting _cr_server_log_flush to false.), and _gc_log_flush (if TRUE, flush redo log before a current block transfer) can be used to let a current block transfer skip the redo flush. Both parameters have side effects, though. In most cases do not even consider setting them; using them is a bad idea.

_gc_split_flush (if TRUE, flush index split redo before rejecting bast) controls the index split redo flush; it defaults to FALSE.

All of this shows how much I/O matters in RAC, especially log file write performance: it is no less important than CPU and the interconnect network.

For index block splits and enq: TX - index contention, a slow redo flush can also act as a catalyst.

Log file sync delays due to slow log IO can impact the block shipping

LGWR does not post LMS when the redo is on disk

Can be considered an indicator of contention for "hot" blocks between instances competing for write or read-after-write access