PostgreSQL 垃圾回收參數優化之 - maintenance_work_mem , autovacuum_work_mem

背景

夜談PostgreSQL 垃圾回收參數優化之 - maintenance_work_mem , autovacuum_work_mem。

http://www.postgres.cn/v2/news/viewone/1/398 https://rhaas.blogspot.com/2019/01/how-much-maintenanceworkmem-do-i-need.html

9.4以前的版本，垃圾回收相關的記憶體參數maintenance_work_mem，9.4以及以後的版本為autovacuum_work_mem，如果沒有設定autovacuum_work_mem，則使用maintenance_work_mem的設定。

這個參數設定的是記憶體大小有什麼用呢？

這部分記憶體被用于記錄垃圾tupleid，vacuum程序在進行表掃描時，當掃描到的垃圾記錄ID占滿了整個記憶體（autovacuum_work_mem或maintenance_work_mem），那麼會停止掃描表，開始INDEX的掃描。

掃描INDEX時，清理索引中的哪些tuple，實際上是從剛才記憶體中記錄的這些tupleid來進行比對。

當所有索引都掃描并清理了一遍後，繼續從剛才的位點開始掃描表。

過程如下：

1、palloc autovacuum_work_mem memory  
  
2、scan table,   
  
3、dead tuple's tupleid write to autovacuum_work_mem  
  
4、when autovacuum_work_mem full (with dead tuples can vacuum)  
  
5、record table scan offset.  
  
6、scan indexs  
  
7、vacuum index's dead tuple (these: index item's ctid in autovacuum_work_mem)  
  
8、scan indexs end  
  
9、continue scan table with prev's offset  
  
...

顯然，如果垃圾回收時autovacuum_work_mem太小，INDEX會被多次掃描，浪費資源，時間。

palloc autovacuum_work_mem memory 這部分記憶體是使用時配置設定，并不是直接全部使用掉maintenance_work_mem或autovacuum_work_mem設定的記憶體，PG代碼中做了優化限制:

對于小表，可能申請少量記憶體，算法請參考如下代碼（對于小表，申請的記憶體數會是保障可記錄下整表的tupleid的記憶體數（當maintenance_work_mem或autovacuum_work_mem設定的記憶體大于這個值時））。

我已經在如下代碼中進行了标注：

/*  
 * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can  
 * fit on one heap page.  (Note that indexes could have more, because they  
 * use a smaller tuple header.)  We arrive at the divisor because each tuple  
 * must be maxaligned, and it must have an associated item pointer.  
 *  
 * Note: with HOT, there could theoretically be more line pointers (not actual  
 * tuples) than this on a heap page.  However we constrain the number of line  
 * pointers to this anyway, to avoid excessive line-pointer bloat and not  
 * require increases in the size of work arrays.  
 */  
#define MaxHeapTuplesPerPage    \  
        ((int) ((BLCKSZ - SizeOfPageHeaderData) / \  
                        (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))  
  
  
/*  
 * Guesstimation of number of dead tuples per page.  This is used to  
 * provide an upper limit to memory allocated when vacuuming small  
 * tables.  
 */  
#define LAZY_ALLOC_TUPLES               MaxHeapTuplesPerPage  
  
  
/*  
 * lazy_space_alloc - space allocation decisions for lazy vacuum  
 *  
 * See the comments at the head of this file for rationale.  
 */  
static void  
lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)  
{  
        long            maxtuples;  
        int                     vac_work_mem = IsAutoVacuumWorkerProcess() &&  
        autovacuum_work_mem != -1 ?  
        autovacuum_work_mem : maintenance_work_mem;  
  
        if (vacrelstats->hasindex)  
        {  
                maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);  
                maxtuples = Min(maxtuples, INT_MAX);  
                maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));  
  
                /* curious coding here to ensure the multiplication can't overflow */  
  
                這裡保證了maintenance_work_mem或autovacuum_work_mem不會直接被使用光，  
                如果是小表，會palloc少量memory。  
  
                if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)  
                        maxtuples = relblocks * LAZY_ALLOC_TUPLES;  
  
                /* stay sane if small maintenance_work_mem */  
                maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);  
        }  
        else  
        {  
                maxtuples = MaxHeapTuplesPerPage;  
        }  
  
        vacrelstats->num_dead_tuples = 0;  
        vacrelstats->max_dead_tuples = (int) maxtuples;  
        vacrelstats->dead_tuples = (ItemPointer)  
                palloc(maxtuples * sizeof(ItemPointerData));  
}

maintenance_work_mem這個記憶體還有一個用途，建立索引時，maintenance_work_mem控制系統在建構索引時将使用的最大記憶體量。為了建構一個B樹索引，必須對輸入的資料進行排序，如果要排序的資料在maintenance_work_mem設定的記憶體中放置不下，它将會溢出到磁盤中。

例子

如何計算适合的記憶體大小

postgres=# show autovacuum_work_mem ;  
 autovacuum_work_mem   
---------------------  
 1GB  
(1 row)  
  
postgres=# show maintenance_work_mem ;  
 maintenance_work_mem   
----------------------  
 1GB  
(1 row)

也就是說，最多有1GB的記憶體，用于記錄一次vacuum時，一次性可存儲的垃圾tuple的tupleid。

tupleid為6位元組長度。

/*  
 * ItemPointer:  
 *  
 * This is a pointer to an item within a disk page of a known file  
 * (for example, a cross-link from an index to its parent table).  
 * blkid tells us which block, posid tells us which entry in the linp  
 * (ItemIdData) array we want.  
 *  
 * Note: because there is an item pointer in each tuple header and index  
 * tuple header on disk, it's very important not to waste space with  
 * structure padding bytes.  The struct is designed to be six bytes long  
 * (it contains three int16 fields) but a few compilers will pad it to  
 * eight bytes unless coerced.  We apply appropriate persuasion where  
 * possible.  If your compiler can't be made to play along, you'll waste  
 * lots of space.  
 */  
typedef struct ItemPointerData  
{  
        BlockIdData ip_blkid;  
        OffsetNumber ip_posid;  
}

1G可存儲1.7億條dead tuple的tupleid。

postgres=# select 1024*1024*1024/6;  
 ?column?    
-----------  
 178956970  
(1 row)

而自動垃圾回收是在什麼條件下觸發的呢？

src/backend/postmaster/autovacuum.c

* A table needs to be vacuumed if the number of dead tuples exceeds a  
 * threshold.  This threshold is calculated as  
 *  
 * threshold = vac_base_thresh + vac_scale_factor * reltuples  
  
vac_base_thresh:   autovacuum_vacuum_threshold    
  
vac_scale_factor:  autovacuum_vacuum_scale_factor    
  
  
postgres=# show autovacuum_vacuum_threshold ;  
 autovacuum_vacuum_threshold   
-----------------------------  
 50  
(1 row)  
  
postgres=# show autovacuum_vacuum_scale_factor ;  
 autovacuum_vacuum_scale_factor   
--------------------------------  
 0.2  
(1 row)

以上設定，表示當垃圾記錄數達到50+表大小乘以0.2時，會觸發垃圾回收。

可以看成，垃圾記錄約等于表大小的20%，觸發垃圾回收。

那麼1G能存下多大表的垃圾呢？約8.9億條記錄的表。

postgres=# select 1024*1024*1024/6/0.2;  
      ?column?        
--------------------  
 894784850  
(1 row)

壓力測試例子

postgres=# show log_autovacuum_min_duration ;  
 log_autovacuum_min_duration   
-----------------------------  
 0  
(1 row)

create table test(id int primary key, c1 int, c2 int, c3 int);  
create index idx_test_1 on test (c1);  
create index idx_test_2 on test (c2);  
create index idx_test_3 on test (c3);

vi test.sql  
\set id random(1,10000000)  
insert into test values (:id,random()*100, random()*100,random()*100) on conflict (id) do update set c1=excluded.c1, c2=excluded.c2,c3=excluded.c3;

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 32 -j 32 -T 1200

垃圾回收記錄

2019-02-26 22:51:50.323 CST,,,35632,,5c755284.8b30,1,,2019-02-26 22:51:48 CST,36/22,0,LOG,00000,"automatic vacuum of table ""postgres.public.test"": index scans: 1  
pages: 0 removed, 6312 remain, 2 skipped due to pins, 0 skipped frozen  
tuples: 4631 removed, 1158251 remain, 1523 are dead but not yet removable, oldest xmin: 1262982800  
buffer usage: 39523 hits, 1 misses, 1 dirtied  
avg read rate: 0.004 MB/s, avg write rate: 0.004 MB/s  
system usage: CPU: user: 1.66 s, system: 0.10 s, elapsed: 1.86 s",,,,,,,,"lazy_vacuum_rel, vacuumlazy.c:407",""  
2019-02-26 22:51:50.566 CST,,,35632,,5c755284.8b30,2,,2019-02-26 22:51:48 CST,36/23,1263417553,LOG,00000,"automatic analyze of table ""postgres.public.test"" system usage: CPU: user: 0.16 s, system: 0.04 s, elapsed: 0.24 s",,,,,,,,"do_analyze_rel, analyze.c:722",""

index scans:1 表示垃圾回收的表有索引，并且索引隻掃描了一次。

說明autovacuum_work_mem足夠大，沒有出現vacuum時裝不下垃圾dead tuple tupleid的情況。

小結

建議：

1、log_autovacuum_min_duration=0，表示記錄所有autovacuum的統計資訊。

2、autovacuum_vacuum_scale_factor=0.01，表示1%的垃圾時，觸發自動垃圾回收。

3、autovacuum_work_mem，視情況定，確定不出現垃圾回收時多次INDEX SCAN.

4、如果發現垃圾回收統計資訊中出現了index scans: 超過1的情況，說明：

4.1、需要增加autovacuum_work_mem，增加多少呢？增加到目前autovacuum_work_mem乘以index scans即可。

4.2、或者調低autovacuum_vacuum_scale_factor到目前值除以index scans即可，讓autovacuum盡可能早的進行垃圾回收。

參考

《PostgreSQL 11 參數模闆 - 珍藏級》

PostgreSQL 垃圾回收參數優化之 - maintenance_work_mem , autovacuum_work_mem

背景

例子

如何計算适合的記憶體大小

小結

參考

PostgreSQL 許願連結

9.9元購買3個月阿裡雲RDS PostgreSQL執行個體

PostgreSQL 解決方案集合

繼續閱讀

neo4j之cypher使用文檔

Dijkstra--簡易版（最短路徑）

GitHub連夜封殺！這份阿裡 10W 字内部 Java 字面試手冊到底有多強？

spark/scala關于【資源檔案】加載方法概述外部檔案加載方案測試資源檔案打包入jar包中小結

NOSQL安全攻擊

mybatis_入門程式Mybatis入門

AOP程式設計_Android優雅權限架構(1)概念基礎，2021金三銀四前言正文大綱正文

登入plsql 報錯 the account is locked --使用者被鎖

Effective Java 8:通用程式設計

SequoiaDB巨杉資料庫C++驅動概述

OOM三種類型

工廠模式-三種類型

【遞歸】高效率求2的n次幂

win10本地scala和spark安裝安裝scala安裝spark

scala (3) Function 和 Method

hdu7108哈希