背景
夜談PostgreSQL 垃圾回收參數優化之 - maintenance_work_mem , autovacuum_work_mem。
http://www.postgres.cn/v2/news/viewone/1/398 https://rhaas.blogspot.com/2019/01/how-much-maintenanceworkmem-do-i-need.html9.4以前的版本,垃圾回收相關的記憶體參數maintenance_work_mem,9.4以及以後的版本為autovacuum_work_mem,如果沒有設定autovacuum_work_mem,則使用maintenance_work_mem的設定。
這個參數設定的是記憶體大小有什麼用呢?
這部分記憶體被用于記錄垃圾tupleid,vacuum程序在進行表掃描時,當掃描到的垃圾記錄ID占滿了整個記憶體(autovacuum_work_mem或maintenance_work_mem),那麼會停止掃描表,開始INDEX的掃描。
掃描INDEX時,清理索引中的哪些tuple,實際上是從剛才記憶體中記錄的這些tupleid來進行比對。
當所有索引都掃描并清理了一遍後,繼續從剛才的位點開始掃描表。
過程如下:
1、palloc autovacuum_work_mem memory
2、scan table,
3、dead tuple's tupleid write to autovacuum_work_mem
4、when autovacuum_work_mem full (with dead tuples can vacuum)
5、record table scan offset.
6、scan indexs
7、vacuum index's dead tuple (these: index item's ctid in autovacuum_work_mem)
8、scan indexs end
9、continue scan table with prev's offset
...
顯然,如果垃圾回收時autovacuum_work_mem太小,INDEX會被多次掃描,浪費資源,時間。
palloc autovacuum_work_mem memory 這部分記憶體是使用時配置設定,并不是直接全部使用掉maintenance_work_mem或autovacuum_work_mem設定的記憶體,PG代碼中做了優化限制:
對于小表,可能申請少量記憶體,算法請參考如下代碼(對于小表,申請的記憶體數會是保障可記錄下整表的tupleid的記憶體數(當maintenance_work_mem或autovacuum_work_mem設定的記憶體大于這個值時))。
我已經在如下代碼中進行了标注:
/*
* MaxHeapTuplesPerPage is an upper bound on the number of tuples that can
* fit on one heap page. (Note that indexes could have more, because they
* use a smaller tuple header.) We arrive at the divisor because each tuple
* must be maxaligned, and it must have an associated item pointer.
*
* Note: with HOT, there could theoretically be more line pointers (not actual
* tuples) than this on a heap page. However we constrain the number of line
* pointers to this anyway, to avoid excessive line-pointer bloat and not
* require increases in the size of work arrays.
*/
#define MaxHeapTuplesPerPage \
((int) ((BLCKSZ - SizeOfPageHeaderData) / \
(MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))
/*
* Guesstimation of number of dead tuples per page. This is used to
* provide an upper limit to memory allocated when vacuuming small
* tables.
*/
#define LAZY_ALLOC_TUPLES MaxHeapTuplesPerPage
/*
* lazy_space_alloc - space allocation decisions for lazy vacuum
*
* See the comments at the head of this file for rationale.
*/
static void
lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
{
long maxtuples;
int vac_work_mem = IsAutoVacuumWorkerProcess() &&
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
if (vacrelstats->hasindex)
{
maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
maxtuples = Min(maxtuples, INT_MAX);
maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
/* curious coding here to ensure the multiplication can't overflow */
這裡保證了maintenance_work_mem或autovacuum_work_mem不會直接被使用光,
如果是小表,會palloc少量memory。
if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
maxtuples = relblocks * LAZY_ALLOC_TUPLES;
/* stay sane if small maintenance_work_mem */
maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
}
else
{
maxtuples = MaxHeapTuplesPerPage;
}
vacrelstats->num_dead_tuples = 0;
vacrelstats->max_dead_tuples = (int) maxtuples;
vacrelstats->dead_tuples = (ItemPointer)
palloc(maxtuples * sizeof(ItemPointerData));
}
maintenance_work_mem這個記憶體還有一個用途,建立索引時,maintenance_work_mem控制系統在建構索引時将使用的最大記憶體量。為了建構一個B樹索引,必須對輸入的資料進行排序,如果要排序的資料在maintenance_work_mem設定的記憶體中放置不下,它将會溢出到磁盤中。
例子
如何計算适合的記憶體大小
postgres=# show autovacuum_work_mem ;
autovacuum_work_mem
---------------------
1GB
(1 row)
postgres=# show maintenance_work_mem ;
maintenance_work_mem
----------------------
1GB
(1 row)
也就是說,最多有1GB的記憶體,用于記錄一次vacuum時,一次性可存儲的垃圾tuple的tupleid。
tupleid為6位元組長度。
/*
* ItemPointer:
*
* This is a pointer to an item within a disk page of a known file
* (for example, a cross-link from an index to its parent table).
* blkid tells us which block, posid tells us which entry in the linp
* (ItemIdData) array we want.
*
* Note: because there is an item pointer in each tuple header and index
* tuple header on disk, it's very important not to waste space with
* structure padding bytes. The struct is designed to be six bytes long
* (it contains three int16 fields) but a few compilers will pad it to
* eight bytes unless coerced. We apply appropriate persuasion where
* possible. If your compiler can't be made to play along, you'll waste
* lots of space.
*/
typedef struct ItemPointerData
{
BlockIdData ip_blkid;
OffsetNumber ip_posid;
}
1G可存儲1.7億條dead tuple的tupleid。
postgres=# select 1024*1024*1024/6;
?column?
-----------
178956970
(1 row)
而自動垃圾回收是在什麼條件下觸發的呢?
src/backend/postmaster/autovacuum.c
* A table needs to be vacuumed if the number of dead tuples exceeds a
* threshold. This threshold is calculated as
*
* threshold = vac_base_thresh + vac_scale_factor * reltuples
vac_base_thresh: autovacuum_vacuum_threshold
vac_scale_factor: autovacuum_vacuum_scale_factor
postgres=# show autovacuum_vacuum_threshold ;
autovacuum_vacuum_threshold
-----------------------------
50
(1 row)
postgres=# show autovacuum_vacuum_scale_factor ;
autovacuum_vacuum_scale_factor
--------------------------------
0.2
(1 row)
以上設定,表示當垃圾記錄數達到50+表大小乘以0.2時,會觸發垃圾回收。
可以看成,垃圾記錄約等于表大小的20%,觸發垃圾回收。
那麼1G能存下多大表的垃圾呢?約8.9億條記錄的表。
postgres=# select 1024*1024*1024/6/0.2;
?column?
--------------------
894784850
(1 row)
壓力測試例子
postgres=# show log_autovacuum_min_duration ;
log_autovacuum_min_duration
-----------------------------
0
(1 row)
create table test(id int primary key, c1 int, c2 int, c3 int);
create index idx_test_1 on test (c1);
create index idx_test_2 on test (c2);
create index idx_test_3 on test (c3);
vi test.sql
\set id random(1,10000000)
insert into test values (:id,random()*100, random()*100,random()*100) on conflict (id) do update set c1=excluded.c1, c2=excluded.c2,c3=excluded.c3;
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 32 -j 32 -T 1200
垃圾回收記錄
2019-02-26 22:51:50.323 CST,,,35632,,5c755284.8b30,1,,2019-02-26 22:51:48 CST,36/22,0,LOG,00000,"automatic vacuum of table ""postgres.public.test"": index scans: 1
pages: 0 removed, 6312 remain, 2 skipped due to pins, 0 skipped frozen
tuples: 4631 removed, 1158251 remain, 1523 are dead but not yet removable, oldest xmin: 1262982800
buffer usage: 39523 hits, 1 misses, 1 dirtied
avg read rate: 0.004 MB/s, avg write rate: 0.004 MB/s
system usage: CPU: user: 1.66 s, system: 0.10 s, elapsed: 1.86 s",,,,,,,,"lazy_vacuum_rel, vacuumlazy.c:407",""
2019-02-26 22:51:50.566 CST,,,35632,,5c755284.8b30,2,,2019-02-26 22:51:48 CST,36/23,1263417553,LOG,00000,"automatic analyze of table ""postgres.public.test"" system usage: CPU: user: 0.16 s, system: 0.04 s, elapsed: 0.24 s",,,,,,,,"do_analyze_rel, analyze.c:722",""
index scans:1 表示垃圾回收的表有索引,并且索引隻掃描了一次。
說明autovacuum_work_mem足夠大,沒有出現vacuum時裝不下垃圾dead tuple tupleid的情況。
小結
建議:
1、log_autovacuum_min_duration=0,表示記錄所有autovacuum的統計資訊。
2、autovacuum_vacuum_scale_factor=0.01,表示1%的垃圾時,觸發自動垃圾回收。
3、autovacuum_work_mem,視情況定,確定不出現垃圾回收時多次INDEX SCAN.
4、如果發現垃圾回收統計資訊中出現了index scans: 超過1的情況,說明:
4.1、需要增加autovacuum_work_mem,增加多少呢?增加到目前autovacuum_work_mem乘以index scans即可。
4.2、或者調低autovacuum_vacuum_scale_factor到目前值除以index scans即可,讓autovacuum盡可能早的進行垃圾回收。
參考
《PostgreSQL 11 參數模闆 - 珍藏級》PostgreSQL 許願連結
您的願望将傳達給PG kernel hacker、資料庫廠商等, 幫助提高資料庫産品品質和功能, 說不定下一個PG版本就有您提出的功能點. 針對非常好的提議,獎勵限量版PG文化衫、紀念品、貼紙、PG熱門書籍等,獎品豐富,快來許願。
開不開森.