天天看點

PostgreSQL pg_clog fsync 頻率分析

本文主要分析一下pg_clog是在什麼時候需要調用fsync的?

引用wiki裡的一段pg_clog的介紹。

https://wiki.postgresql.org/wiki/hint_bits

下面從代碼中分析一下pg_clog是如何調用pg_fsync刷髒頁的。

每次申請新的事務id時,都需要調用extendclog,如果通過事務id計算得到的clog page頁不存在,則需要擴充,但是并不是每次擴充都需要調用pg_fsync,因為checkpoint會将clog buffer刷到磁盤,除非在申請新的clog page時所有的clog buffer都沒有刷出髒頁,才需要主動選擇一個page并調用pg_fsync刷出對應的pg_clog/file。

src/backend/access/transam/varsup.c

extendclog(xid);擴充clog page,調用transactionidtopgindex計算xid和clog_xacts_per_page的餘數,如果不為0,則不需要擴充。

src/backend/access/transam/clog.c

zeroclogpage(pageno, true);,調用simplelruzeropage,擴充并初始化clog page,寫xlog日志。

simplelruzeropage(clogctl, pageno);,調用slruselectlrupage(ctl, pageno);,從clog shared buffer中選擇slot。

src/backend/access/transam/slru.c

slruselectlrupage(slructl ctl, int pageno),從clog buffer選擇一個空的slot,如果沒有空的slot,則需要調用slruinternalwritepage(ctl, bestvalidslot, null);,寫shared buffer page。

slruinternalwritepage(slructl ctl, int slotno, slruflush fdata),調用slruphysicalwritepage,執行write。

slru page狀态

slruphysicalwritepage(ctl, pageno, slotno, fdata);,這裡涉及pg_clog相關的slructldata結構,do_fsync=true。

ctl->do_fsync && pg_fsync(fd)涉及的代碼:

src/include/access/slru.h

以下是clog初始化lru的調用,可以看到它沒有修改do_fsync,是以是ture。

以下是subtrans初始化lru的調用,看到它修改了do_fsync=false。是以subtrans擴充page時不需要調用pg_fsync。

src/backend/access/transam/subtrans.c

multixact.c也沒有修改do_fsync,是以也是需要fsync的。

multixactshmeminit(void)@src/backend/access/transam/multixact.c

pg_fsync代碼:

src/backend/storage/file/fd.c

從上面的代碼分析,擴充clog page時,如果在clog buffer中沒有empty slot,則需要backend process主動刷clog page,是以會有調用pg_fsync的動作。

clog page和資料庫blocksz (database block size)一樣大,預設是8k(如果編譯資料庫軟體時沒有修改的話,預設是8kb),最大可以設定為32kb。

每個事務在pg_clog中需要2個比特位來存儲事務資訊(xmin commit/abort,xmax commit/abort)。

是以8k的clog page可以存儲32k個事務資訊,換句話說,每32k個事務,需要擴充一次clog page。

下面的代碼是clog的一些常用宏。

檢視資料庫的block size:

我們可以使用stap來跟蹤是否調用pg_fsync,如果你要觀察backend process主動刷clog 髒頁,可以把checkpoint間隔開大,同時把clog shared buffer pages。你就會觀察到backend process主動刷clog 髒頁。

跟蹤

opentransientfile

pg_fsync(fd)

[root@digoal ~]# cat trc.stp

global f_start[999999]

probe process("/opt/pgsql/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c").call {

f_start[execname(), pid(), tid(), cpu()] = gettimeofday_ms()

printf("%s <- time:%d, pp:%s, par:%sn", thread_indent(-1), gettimeofday_ms(), pp(), $$parms$$)

# printf("%s -> time:%d, pp:%sn", thread_indent(1), f_start[execname(), pid(), tid(), cpu()], pp() )

}

probe process("/opt/pgsql/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c").return {

t=gettimeofday_ms()

a=execname()

b=cpu()

c=pid()

d=pp()

e=tid()

if (f_start[a,c,e,b]) {

printf("%s <- time:%d, pp:%s, par:%sn", thread_indent(-1), t - f_start[a,c,e,b], d, $return$$)

# printf("%s <- time:%d, pp:%sn", thread_indent(-1), t - f_start[a,c,e,b], d)

probe process("/opt/pgsql/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c").call {

probe process("/opt/pgsql/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c").return {

probe process("/opt/pgsql/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c").call {

probe process("/opt/pgsql/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c").return {

postgres@digoal-> cat 7.sql

select txid_current();

postgres@digoal-> pgbench -m prepared -n -r -p 1 -f ./7.sql -c 1 -j 1 -t 100000

progress: 240.0 s, 31164.4 tps, lat 0.031 ms stddev 0.183

progress: 241.0 s, 33243.3 tps, lat 0.029 ms stddev 0.127

progress: 242.0 s, 32567.3 tps, lat 0.030 ms stddev 0.179

progress: 243.0 s, 33656.6 tps, lat 0.029 ms stddev 0.038

progress: 244.0 s, 33948.1 tps, lat 0.029 ms stddev 0.021

progress: 245.0 s, 32996.8 tps, lat 0.030 ms stddev 0.046

progress: 246.0 s, 34156.7 tps, lat 0.029 ms stddev 0.015

progress: 247.0 s, 33259.5 tps, lat 0.029 ms stddev 0.074

progress: 248.0 s, 32979.6 tps, lat 0.030 ms stddev 0.043

progress: 249.0 s, 32892.6 tps, lat 0.030 ms stddev 0.039

progress: 250.0 s, 33090.7 tps, lat 0.029 ms stddev 0.020

progress: 251.0 s, 33238.3 tps, lat 0.029 ms stddev 0.017

progress: 252.0 s, 32341.3 tps, lat 0.030 ms stddev 0.045

progress: 253.0 s, 31999.0 tps, lat 0.030 ms stddev 0.167

progress: 254.0 s, 33332.6 tps, lat 0.029 ms stddev 0.056

progress: 255.0 s, 30394.6 tps, lat 0.032 ms stddev 0.027

progress: 256.0 s, 31862.7 tps, lat 0.031 ms stddev 0.023

progress: 257.0 s, 31574.0 tps, lat 0.031 ms stddev 0.112

postgres@digoal-> ps -ewf|grep postgres

postgres 2921 1883 29 09:37 pts/1 00:00:05 pgbench -m prepared -n -r -p 1 -f ./7.sql -c 1 -j 1 -t 100000

postgres 2924 1841 66 09:37 ? 00:00:13 postgres: postgres postgres [local] select

[root@digoal ~]# stap -vp 5 -dmaxskipped=9999999 -dstp_no_overload -dmaxtrylock=100 ./trc.stp -x 2924 >./stap.log 2>&1

102 postgres(2924): -> time:1441503927731, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").call, par:fd=14

1096 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").return, par:0

1113 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c:699").return, par:'001'

1105302 postgres(2924): -> time:1441503928836, pp:process("/opt/pgsql9.4.4/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c:699").call, par:ctl={.shared=0x7f74a9fe39c0, .do_fsync='001', .pageprecedes=0x4b1960, .dir="pg_clog"} pageno=12351 slotno=11 fdata=error

1105329 postgres(2924): -> time:1441503928836, pp:process("/opt/pgsql9.4.4/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:1710").call, par:filename="pg_clog/0181" fileflags=66 filemode=384

1105348 postgres(2924): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:1710").return, par:14

1105405 postgres(2924): -> time:1441503928836, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").call, par:fd=14

1106440 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").return, par:0

1106452 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c:699").return, par:'001'

2087891 postgres(2924): -> time:1441503929819, pp:process("/opt/pgsql9.4.4/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c:699").call, par:ctl={.shared=0x7f74a9fe39c0, .do_fsync='001', .pageprecedes=0x4b1960, .dir="pg_clog"} pageno=12352 slotno=12 fdata=error

2087917 postgres(2924): -> time:1441503929819, pp:process("/opt/pgsql9.4.4/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:1710").call, par:filename="pg_clog/0182" fileflags=66 filemode=384

2087958 postgres(2924): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("opentransientfile@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:1710").return, par:14

2088013 postgres(2924): -> time:1441503929819, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").call, par:fd=14

2089250 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("pg_fsync@/opt/soft_bak/postgresql-9.4.4/src/backend/storage/file/fd.c:315").return, par:0

2089265 postgres(2924): <- time:1, pp:process("/opt/pgsql9.4.4/bin/postgres").function("slruphysicalwritepage@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/slru.c:699").return, par:'001'

postgres=# select 1441503928836-1441503927731;

(1 row)

postgres=# select 1441503929819-1441503928836;

size

clogshmembuffers(void)

{

繼續閱讀