天天看點

PostgreSQL 垃圾回收參數優化之 - maintenance_work_mem , autovacuum_work_mem

背景

夜談PostgreSQL 垃圾回收參數優化之 - maintenance_work_mem , autovacuum_work_mem。

http://www.postgres.cn/v2/news/viewone/1/398 https://rhaas.blogspot.com/2019/01/how-much-maintenanceworkmem-do-i-need.html

9.4以前的版本,垃圾回收相關的記憶體參數maintenance_work_mem,9.4以及以後的版本為autovacuum_work_mem,如果沒有設定autovacuum_work_mem,則使用maintenance_work_mem的設定。

這個參數設定的是記憶體大小有什麼用呢?

這部分記憶體被用于記錄垃圾tupleid,vacuum程序在進行表掃描時,當掃描到的垃圾記錄ID占滿了整個記憶體(autovacuum_work_mem或maintenance_work_mem),那麼會停止掃描表,開始INDEX的掃描。

掃描INDEX時,清理索引中的哪些tuple,實際上是從剛才記憶體中記錄的這些tupleid來進行比對。

當所有索引都掃描并清理了一遍後,繼續從剛才的位點開始掃描表。

過程如下:

1、palloc autovacuum_work_mem memory  
  
2、scan table,   
  
3、dead tuple's tupleid write to autovacuum_work_mem  
  
4、when autovacuum_work_mem full (with dead tuples can vacuum)  
  
5、record table scan offset.  
  
6、scan indexs  
  
7、vacuum index's dead tuple (these: index item's ctid in autovacuum_work_mem)  
  
8、scan indexs end  
  
9、continue scan table with prev's offset  
  
...             

顯然,如果垃圾回收時autovacuum_work_mem太小,INDEX會被多次掃描,浪費資源,時間。

palloc autovacuum_work_mem memory 這部分記憶體是使用時配置設定,并不是直接全部使用掉maintenance_work_mem或autovacuum_work_mem設定的記憶體,PG代碼中做了優化限制:

對于小表,可能申請少量記憶體,算法請參考如下代碼(對于小表,申請的記憶體數會是保障可記錄下整表的tupleid的記憶體數(當maintenance_work_mem或autovacuum_work_mem設定的記憶體大于這個值時))。

我已經在如下代碼中進行了标注:

/*  
 * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can  
 * fit on one heap page.  (Note that indexes could have more, because they  
 * use a smaller tuple header.)  We arrive at the divisor because each tuple  
 * must be maxaligned, and it must have an associated item pointer.  
 *  
 * Note: with HOT, there could theoretically be more line pointers (not actual  
 * tuples) than this on a heap page.  However we constrain the number of line  
 * pointers to this anyway, to avoid excessive line-pointer bloat and not  
 * require increases in the size of work arrays.  
 */  
#define MaxHeapTuplesPerPage    \  
        ((int) ((BLCKSZ - SizeOfPageHeaderData) / \  
                        (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))  
  
  
/*  
 * Guesstimation of number of dead tuples per page.  This is used to  
 * provide an upper limit to memory allocated when vacuuming small  
 * tables.  
 */  
#define LAZY_ALLOC_TUPLES               MaxHeapTuplesPerPage  
  
  
/*  
 * lazy_space_alloc - space allocation decisions for lazy vacuum  
 *  
 * See the comments at the head of this file for rationale.  
 */  
static void  
lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)  
{  
        long            maxtuples;  
        int                     vac_work_mem = IsAutoVacuumWorkerProcess() &&  
        autovacuum_work_mem != -1 ?  
        autovacuum_work_mem : maintenance_work_mem;  
  
        if (vacrelstats->hasindex)  
        {  
                maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);  
                maxtuples = Min(maxtuples, INT_MAX);  
                maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));  
  
                /* curious coding here to ensure the multiplication can't overflow */  
  
                這裡保證了maintenance_work_mem或autovacuum_work_mem不會直接被使用光,  
                如果是小表,會palloc少量memory。  
  
                if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)  
                        maxtuples = relblocks * LAZY_ALLOC_TUPLES;  
  
                /* stay sane if small maintenance_work_mem */  
                maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);  
        }  
        else  
        {  
                maxtuples = MaxHeapTuplesPerPage;  
        }  
  
        vacrelstats->num_dead_tuples = 0;  
        vacrelstats->max_dead_tuples = (int) maxtuples;  
        vacrelstats->dead_tuples = (ItemPointer)  
                palloc(maxtuples * sizeof(ItemPointerData));  
}             

maintenance_work_mem這個記憶體還有一個用途,建立索引時,maintenance_work_mem控制系統在建構索引時将使用的最大記憶體量。為了建構一個B樹索引,必須對輸入的資料進行排序,如果要排序的資料在maintenance_work_mem設定的記憶體中放置不下,它将會溢出到磁盤中。

例子

如何計算适合的記憶體大小

postgres=# show autovacuum_work_mem ;  
 autovacuum_work_mem   
---------------------  
 1GB  
(1 row)  
  
postgres=# show maintenance_work_mem ;  
 maintenance_work_mem   
----------------------  
 1GB  
(1 row)             

也就是說,最多有1GB的記憶體,用于記錄一次vacuum時,一次性可存儲的垃圾tuple的tupleid。

tupleid為6位元組長度。

/*  
 * ItemPointer:  
 *  
 * This is a pointer to an item within a disk page of a known file  
 * (for example, a cross-link from an index to its parent table).  
 * blkid tells us which block, posid tells us which entry in the linp  
 * (ItemIdData) array we want.  
 *  
 * Note: because there is an item pointer in each tuple header and index  
 * tuple header on disk, it's very important not to waste space with  
 * structure padding bytes.  The struct is designed to be six bytes long  
 * (it contains three int16 fields) but a few compilers will pad it to  
 * eight bytes unless coerced.  We apply appropriate persuasion where  
 * possible.  If your compiler can't be made to play along, you'll waste  
 * lots of space.  
 */  
typedef struct ItemPointerData  
{  
        BlockIdData ip_blkid;  
        OffsetNumber ip_posid;  
}             

1G可存儲1.7億條dead tuple的tupleid。

postgres=# select 1024*1024*1024/6;  
 ?column?    
-----------  
 178956970  
(1 row)             

而自動垃圾回收是在什麼條件下觸發的呢?

src/backend/postmaster/autovacuum.c

* A table needs to be vacuumed if the number of dead tuples exceeds a  
 * threshold.  This threshold is calculated as  
 *  
 * threshold = vac_base_thresh + vac_scale_factor * reltuples  
  
vac_base_thresh:   autovacuum_vacuum_threshold    
  
vac_scale_factor:  autovacuum_vacuum_scale_factor    
  
  
postgres=# show autovacuum_vacuum_threshold ;  
 autovacuum_vacuum_threshold   
-----------------------------  
 50  
(1 row)  
  
postgres=# show autovacuum_vacuum_scale_factor ;  
 autovacuum_vacuum_scale_factor   
--------------------------------  
 0.2  
(1 row)             

以上設定,表示當垃圾記錄數達到50+表大小乘以0.2時,會觸發垃圾回收。

可以看成,垃圾記錄約等于表大小的20%,觸發垃圾回收。

那麼1G能存下多大表的垃圾呢?約8.9億條記錄的表。

postgres=# select 1024*1024*1024/6/0.2;  
      ?column?        
--------------------  
 894784850  
(1 row)             

壓力測試例子

postgres=# show log_autovacuum_min_duration ;  
 log_autovacuum_min_duration   
-----------------------------  
 0  
(1 row)             
create table test(id int primary key, c1 int, c2 int, c3 int);  
create index idx_test_1 on test (c1);  
create index idx_test_2 on test (c2);  
create index idx_test_3 on test (c3);             
vi test.sql  
\set id random(1,10000000)  
insert into test values (:id,random()*100, random()*100,random()*100) on conflict (id) do update set c1=excluded.c1, c2=excluded.c2,c3=excluded.c3;             
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 32 -j 32 -T 1200             

垃圾回收記錄

2019-02-26 22:51:50.323 CST,,,35632,,5c755284.8b30,1,,2019-02-26 22:51:48 CST,36/22,0,LOG,00000,"automatic vacuum of table ""postgres.public.test"": index scans: 1  
pages: 0 removed, 6312 remain, 2 skipped due to pins, 0 skipped frozen  
tuples: 4631 removed, 1158251 remain, 1523 are dead but not yet removable, oldest xmin: 1262982800  
buffer usage: 39523 hits, 1 misses, 1 dirtied  
avg read rate: 0.004 MB/s, avg write rate: 0.004 MB/s  
system usage: CPU: user: 1.66 s, system: 0.10 s, elapsed: 1.86 s",,,,,,,,"lazy_vacuum_rel, vacuumlazy.c:407",""  
2019-02-26 22:51:50.566 CST,,,35632,,5c755284.8b30,2,,2019-02-26 22:51:48 CST,36/23,1263417553,LOG,00000,"automatic analyze of table ""postgres.public.test"" system usage: CPU: user: 0.16 s, system: 0.04 s, elapsed: 0.24 s",,,,,,,,"do_analyze_rel, analyze.c:722",""             

index scans:1 表示垃圾回收的表有索引,并且索引隻掃描了一次。

說明autovacuum_work_mem足夠大,沒有出現vacuum時裝不下垃圾dead tuple tupleid的情況。

小結

建議:

1、log_autovacuum_min_duration=0,表示記錄所有autovacuum的統計資訊。

2、autovacuum_vacuum_scale_factor=0.01,表示1%的垃圾時,觸發自動垃圾回收。

3、autovacuum_work_mem,視情況定,確定不出現垃圾回收時多次INDEX SCAN.

4、如果發現垃圾回收統計資訊中出現了index scans: 超過1的情況,說明:

4.1、需要增加autovacuum_work_mem,增加多少呢?增加到目前autovacuum_work_mem乘以index scans即可。

4.2、或者調低autovacuum_vacuum_scale_factor到目前值除以index scans即可,讓autovacuum盡可能早的進行垃圾回收。

參考

《PostgreSQL 11 參數模闆 - 珍藏級》

PostgreSQL 許願連結

您的願望将傳達給PG kernel hacker、資料庫廠商等, 幫助提高資料庫産品品質和功能, 說不定下一個PG版本就有您提出的功能點. 針對非常好的提議,獎勵限量版PG文化衫、紀念品、貼紙、PG熱門書籍等,獎品豐富,快來許願。

開不開森

.

9.9元購買3個月阿裡雲RDS PostgreSQL執行個體

PostgreSQL 解決方案集合