本文主要分析一下pg_clog是在什麼時候需要調用fsync的?
引用wiki裡的一段pg_clog的介紹。
https://wiki.postgresql.org/wiki/hint_bits
下面從代碼中分析一下pg_clog是如何調用pg_fsync刷髒頁的。
每次申請新的事務id時,都需要調用extendclog,如果通過事務id計算得到的clog page頁不存在,則需要擴充,但是并不是每次擴充都需要調用pg_fsync,因為checkpoint會将clog buffer刷到磁盤,除非在申請新的clog page時所有的clog buffer都沒有刷出髒頁,才需要主動選擇一個page并調用pg_fsync刷出對應的pg_clog/file。
src/backend/access/transam/varsup.c
extendclog(xid);擴充clog page,調用transactionidtopgindex計算xid和clog_xacts_per_page的餘數,如果不為0,則不需要擴充。
src/backend/access/transam/clog.c
zeroclogpage(pageno, true);,調用simplelruzeropage,擴充并初始化clog page,寫xlog日志。
simplelruzeropage(clogctl, pageno);,調用slruselectlrupage(ctl, pageno);,從clog shared buffer中選擇slot。
src/backend/access/transam/slru.c
slruselectlrupage(slructl ctl, int pageno),從clog buffer選擇一個空的slot,如果沒有空的slot,則需要調用slruinternalwritepage(ctl, bestvalidslot, null);,寫shared buffer page。
slruinternalwritepage(slructl ctl, int slotno, slruflush fdata),調用slruphysicalwritepage,執行write。
slru page狀态
slruphysicalwritepage(ctl, pageno, slotno, fdata);,這裡涉及pg_clog相關的slructldata結構,do_fsync=true。
ctl->do_fsync && pg_fsync(fd)涉及的代碼:
src/include/access/slru.h
以下是clog初始化lru的調用,可以看到它沒有修改do_fsync,是以是ture。
以下是subtrans初始化lru的調用,看到它修改了do_fsync=false。是以subtrans擴充page時不需要調用pg_fsync。
src/backend/access/transam/subtrans.c
multixact.c也沒有修改do_fsync,是以也是需要fsync的。
multixactshmeminit(void)@src/backend/access/transam/multixact.c
pg_fsync代碼:
src/backend/storage/file/fd.c
從上面的代碼分析,擴充clog page時,如果在clog buffer中沒有empty slot,則需要backend process主動刷clog page,是以會有調用pg_fsync的動作。
clog page和資料庫blocksz (database block size)一樣大,預設是8k(如果編譯資料庫軟體時沒有修改的話,預設是8kb),最大可以設定為32kb。
每個事務在pg_clog中需要2個比特位來存儲事務資訊(xmin commit/abort,xmax commit/abort)。
是以8k的clog page可以存儲32k個事務資訊,換句話說,每32k個事務,需要擴充一次clog page。
下面的代碼是clog的一些常用宏。
檢視資料庫的block size:
我們可以使用stap來跟蹤是否調用pg_fsync,如果你要觀察backend process主動刷clog 髒頁,可以把checkpoint間隔開大,同時把clog shared buffer pages。你就會觀察到backend process主動刷clog 髒頁。
跟蹤
opentransientfile
pg_fsync(fd)
[root@digoal ~]# cat trc.stp
global f_start[999999]
probe process("/opt/pgsql/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c").call {
f_start[execname(), pid(), tid(), cpu()] = gettimeofday_ms()
printf("%s <- time:%d, pp:%s, par:%sn", thread_indent(-1), gettimeofday_ms(), pp(), $$parms$$)
# printf("%s -> time:%d, pp:%sn", thread_indent(1), f_start[execname(), pid(), tid(), cpu()], pp() )
}
probe process("/opt/pgsql/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c").return {
t=gettimeofday_ms()
a=execname()
b=cpu()
c=pid()
d=pp()
e=tid()
if (f_start[a,c,e,b]) {
printf("%s <- time:%d, pp:%s, par:%sn", thread_indent(-1), t - f_start[a,c,e,b], d, $return$$)
# printf("%s <- time:%d, pp:%sn", thread_indent(-1), t - f_start[a,c,e,b], d)
probe process("/opt/pgsql/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c").call {
probe process("/opt/pgsql/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c").return {
probe process("/opt/pgsql/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c").call {
probe process("/opt/pgsql/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c").return {
postgres@digoal-> cat 7.sql
select txid_current();
postgres@digoal-> pgbench -m prepared -n -r -p 1 -f ./7.sql -c 1 -j 1 -t 100000
progress: 240.0 s, 31164.4 tps, lat 0.031 ms stddev 0.183
progress: 241.0 s, 33243.3 tps, lat 0.029 ms stddev 0.127
progress: 242.0 s, 32567.3 tps, lat 0.030 ms stddev 0.179
progress: 243.0 s, 33656.6 tps, lat 0.029 ms stddev 0.038
progress: 244.0 s, 33948.1 tps, lat 0.029 ms stddev 0.021
progress: 245.0 s, 32996.8 tps, lat 0.030 ms stddev 0.046
progress: 246.0 s, 34156.7 tps, lat 0.029 ms stddev 0.015
progress: 247.0 s, 33259.5 tps, lat 0.029 ms stddev 0.074
progress: 248.0 s, 32979.6 tps, lat 0.030 ms stddev 0.043
progress: 249.0 s, 32892.6 tps, lat 0.030 ms stddev 0.039
progress: 250.0 s, 33090.7 tps, lat 0.029 ms stddev 0.020
progress: 251.0 s, 33238.3 tps, lat 0.029 ms stddev 0.017
progress: 252.0 s, 32341.3 tps, lat 0.030 ms stddev 0.045
progress: 253.0 s, 31999.0 tps, lat 0.030 ms stddev 0.167
progress: 254.0 s, 33332.6 tps, lat 0.029 ms stddev 0.056
progress: 255.0 s, 30394.6 tps, lat 0.032 ms stddev 0.027
progress: 256.0 s, 31862.7 tps, lat 0.031 ms stddev 0.023
progress: 257.0 s, 31574.0 tps, lat 0.031 ms stddev 0.112
postgres@digoal-> ps -ewf|grep postgres
postgres 2921 1883 29 09:37 pts/1 00:00:05 pgbench -m prepared -n -r -p 1 -f ./7.sql -c 1 -j 1 -t 100000
postgres 2924 1841 66 09:37 ? 00:00:13 postgres: postgres postgres [local] select
[root@digoal ~]# stap -vp 5 -dmaxskipped=9999999 -dstp_no_overload -dmaxtrylock=100 ./trc.stp -x 2924 >./stap.log 2>&1
102 postgres(2924): -> time:1441503927731, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").call, par:fd=14
1096 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").return, par:0
1113 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c:699").return, par:'001'
1105302 postgres(2924): -> time:1441503928836, pp:process("/opt/pgsql9.4.4/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c:699").call, par:ctl={.shared=0x7f74a9fe39c0, .do_fsync='001', .pageprecedes=0x4b1960, .dir="pg_clog"} pageno=12351 slotno=11 fdata=error
1105329 postgres(2924): -> time:1441503928836, pp:process("/opt/pgsql9.4.4/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:1710").call, par:filename="pg_clog/0181" fileflags=66 filemode=384
1105348 postgres(2924): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:1710").return, par:14
1105405 postgres(2924): -> time:1441503928836, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").call, par:fd=14
1106440 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").return, par:0
1106452 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c:699").return, par:'001'
2087891 postgres(2924): -> time:1441503929819, pp:process("/opt/pgsql9.4.4/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c:699").call, par:ctl={.shared=0x7f74a9fe39c0, .do_fsync='001', .pageprecedes=0x4b1960, .dir="pg_clog"} pageno=12352 slotno=12 fdata=error
2087917 postgres(2924): -> time:1441503929819, pp:process("/opt/pgsql9.4.4/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:1710").call, par:filename="pg_clog/0182" fileflags=66 filemode=384
2087958 postgres(2924): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:1710").return, par:14
2088013 postgres(2924): -> time:1441503929819, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").call, par:fd=14
2089250 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").return, par:0
2089265 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c:699").return, par:'001'
postgres=# select 1441503928836-1441503927731;
(1 row)
postgres=# select 1441503929819-1441503928836;
size
clogshmembuffers(void)
{