PostgreSQL 12 preview - Reliability improvement - data_sync_retry eliminates the problem of unreliable write-back failure status at the OS layer

Tags

PostgreSQL , data_sync_retry , write back , retry , failed status

Background

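On some operating systems, a second call to fsync cannot be trusted. The OS keeps its own cache, and if buffered writes are used and a write-back fails, the next fsync may not correctly report whether the data actually reached storage. (The previous write-back of that block may already have failed without the failure state being tracked correctly, so there is no way to know whether a later fsync really succeeded.)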

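The processes that handle PostgreSQL's data files, WAL files, CLOG files, and other critical files (bgwriter, wal writer, and backend processes) all use buffered writes. If the OS layer lets us down (that is, fsync retry is unreliable), then after an earlier write-back failure the fsync issued at checkpoint time may still return success. The data files may then contain damaged blocks that need to be repaired from WAL. However, because the database received a successful return from the OS fsync, it considers the checkpoint successful and does not use WAL to repair them.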
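The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyPageCache` and `checkpoint_with_retry` are hypothetical names invented for illustration, and the "kernel" here is assumed to drop dirty pages on write-back failure and to report the error only once. Real kernels differ in the details.

```python
class ToyPageCache:
    """Toy model of a kernel page cache that forgets dirty pages on write-back failure."""

    def __init__(self):
        self.dirty = {}                  # block number -> bytes not yet on disk
        self.disk = {}                   # block number -> bytes durably stored
        self.pending_error = False       # an unreported write-back error exists
        self.fail_next_writeback = False

    def write(self, block, data):
        # Buffered write: the data lands in the cache only.
        self.dirty[block] = data

    def writeback(self):
        # Background write-back. On failure this toy kernel drops the dirty
        # pages and remembers only that an error happened.
        if self.fail_next_writeback:
            self.fail_next_writeback = False
            self.dirty.clear()
            self.pending_error = True
            return
        self.disk.update(self.dirty)
        self.dirty.clear()

    def fsync(self):
        # Flush what is still dirty, then report a pending error exactly once.
        self.writeback()
        if self.pending_error:
            self.pending_error = False
            raise OSError("simulated EIO during write-back")


def checkpoint_with_retry(cache):
    """Naive checkpoint: keep retrying fsync until it reports success."""
    attempts = 0
    while True:
        attempts += 1
        try:
            cache.fsync()
            return attempts
        except OSError:
            continue


cache = ToyPageCache()
cache.write(7, b"new tuple")
cache.fail_next_writeback = True
cache.writeback()                        # fails; the dirty page is dropped
attempts = checkpoint_with_retry(cache)  # second fsync "succeeds"
block_on_disk = cache.disk.get(7)        # the block never reached disk
```

In this model the retry loop reports success on the second attempt even though block 7 was never written, which is the situation a checkpoint must not accept.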

PG 12 fixes this problem, and the fix has been back-patched to all supported releases.

PANIC on fsync() failure.  
  
On some operating systems, it doesn't make sense to retry fsync(),  
because dirty data cached by the kernel may have been dropped on  
write-back failure.  In that case the only remaining copy of the  
data is in the WAL.  A subsequent fsync() could appear to succeed,  
but not have flushed the data.  That means that a future checkpoint  
could apparently complete successfully but have lost data.  
  
Therefore, violently prevent any future checkpoint attempts by  
panicking on the first fsync() failure.  Note that we already  
did the same for WAL data; this change extends that behavior to  
non-temporary data files.  
  
Provide a GUC data_sync_retry to control this new behavior, for  
users of operating systems that don't eject dirty data, and possibly  
forensic/testing uses.  If it is set to on and the write-back error  
was transient, a later checkpoint might genuinely succeed (on a  
system that does not throw away buffers on failure); if the error is  
permanent, later checkpoints will continue to fail.  The GUC defaults  
to off, meaning that we panic.  
  
Back-patch to all supported releases.  
  
There is still a narrow window for error-loss on some operating  
systems: if the file is closed and later reopened and a write-back  
error occurs in the intervening time, but the inode has the bad  
luck to be evicted due to memory pressure before we reopen, we could  
miss the error.  A later patch will address that with a scheme  
for keeping files with dirty data open at all times, but we judge  
that to be too complicated to back-patch.  
  
Author: Craig Ringer, with some adjustments by Thomas Munro  
Reported-by: Craig Ringer  
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund  
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de

User-settable parameter

data_sync_retry (boolean)

When set to off, which is the default, PostgreSQL will raise a PANIC-level error on failure to flush modified data files to the filesystem. This causes the database server to crash. This parameter can only be set at server start.

On some operating systems, the status of data in the kernel's page cache is unknown after a write-back failure. In some cases it might have been entirely forgotten, making it unsafe to retry; the second attempt may be reported as successful, when in fact the data has been lost. In these circumstances, the only way to avoid data loss is to recover from the WAL after any failure is reported, preferably after investigating the root cause of the failure and replacing any faulty hardware.

If set to on, PostgreSQL will instead report an error but continue to run so that the data flushing operation can be retried in a later checkpoint. Only set it to on after investigating the operating system's treatment of buffered data in case of write-back failure.

The default value is safe.

If you want to set it to on, make sure that fsync at the OS layer can be retried and that the retry is reliable.
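The two behaviors of the parameter can be sketched as a small decision function. This is an illustrative sketch, not PostgreSQL's actual C implementation: `sync_data_file`, `DataSyncPanic`, and `failing_fsync` are hypothetical names, and the injected `fsync` argument exists only so a failure can be simulated.

```python
import os
import tempfile


class DataSyncPanic(Exception):
    """Stands in for a PANIC-level error that would crash the server."""


def sync_data_file(fd, data_sync_retry=False, fsync=os.fsync):
    """Flush fd; return True on success, False if a later retry is allowed."""
    try:
        fsync(fd)
        return True
    except OSError as exc:
        if data_sync_retry:
            # Report the error but keep running; a later checkpoint retries.
            print(f"ERROR: could not fsync file: {exc}")
            return False
        # Default: do not trust any later fsync; force recovery from WAL.
        raise DataSyncPanic(f"could not fsync file: {exc}") from exc


def failing_fsync(fd):
    # Hypothetical stand-in that simulates a write-back failure (EIO).
    raise OSError(5, "Input/output error")


with tempfile.TemporaryFile() as f:
    f.write(b"page contents")
    f.flush()
    ok_real = sync_data_file(f.fileno())
    ok_retry = sync_data_file(f.fileno(), data_sync_retry=True, fsync=failing_fsync)
    try:
        sync_data_file(f.fileno(), fsync=failing_fsync)
        panicked = False
    except DataSyncPanic:
        panicked = True
```

With the default (`data_sync_retry=False`) the first failure is fatal. With it enabled, the caller only learns that the flush failed and has to rely on the OS having kept the dirty data.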

Further thoughts

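1. The database's current approach is to leave data_sync_retry disabled, so an fsync failure raises an error (PANIC) directly. In practice, it could instead try to extract the failed block's FPW (full-page write) and the subsequent changes from the WAL and repair the block, avoiding the poor user experience of an outright crash.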
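The repair idea in the point above can be sketched with a toy WAL. Everything here is hypothetical: the record layout (`block`, `kind`, `image`, `offset`, `data`) and the function `rebuild_block` are invented for illustration and do not correspond to PostgreSQL's real WAL format or to any existing repair feature.

```python
def rebuild_block(wal, block):
    """Rebuild one block from its latest full-page image plus later deltas."""
    page = None
    for rec in wal:                          # records are scanned in log order
        if rec["block"] != block:
            continue
        if rec["kind"] == "fpw":
            page = bytearray(rec["image"])   # a full image resets the page
        elif rec["kind"] == "delta" and page is not None:
            off = rec["offset"]
            page[off:off + len(rec["data"])] = rec["data"]
    if page is None:
        raise LookupError(f"no full-page image for block {block} in WAL")
    return bytes(page)


# Toy WAL: one full-page image for block 7, followed by two small changes.
wal = [
    {"block": 7, "kind": "fpw", "image": b"aaaaaaaa"},
    {"block": 3, "kind": "fpw", "image": b"zzzzzzzz"},
    {"block": 7, "kind": "delta", "offset": 2, "data": b"BB"},
    {"block": 7, "kind": "delta", "offset": 6, "data": b"CC"},
]
repaired = rebuild_block(wal, 7)
```

The sketch only works if a full-page image for the block is still available in the retained WAL, which is the main practical constraint on this kind of targeted repair.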

References

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9ccdd7f66e3324d2b6d3dec282cfa9ff084083f1

This patch has been applied to all supported releases, so PG 11 includes it as well.
