标簽
PostgreSQL , brin索引 , gin索引 , 合并延遲 , gin_pending_list_limit , 查詢性能下降
https://github.com/digoal/blog/blob/master/201809/20180919_02.md#%E8%83%8C%E6%99%AF 背景
GIN索引為PostgreSQL資料庫多值類型的反向索引,一條記錄可能涉及到多個GIN索引中的KEY,是以如果寫入時實時合并索引,會導緻IO急劇增加,寫入RT必然增加。為了提高寫入吞吐,PG允許使用者開啟GIN索引的延遲合并技術,開啟後,資料會先寫入pending list,并不是直接寫入索引頁,當pending list達到一定大小,或者autovacuum 對應表時,會觸發pending list合并到索引的動作。
查詢時,如果有未合并到索引中的PENDING LIST,那麼會查詢pending list,同時查詢索引也的資訊。
如果寫入量很多,pending list非常巨大,合并(autovacuum worker做的)速度跟不上時,會導緻通過GIN索引查詢時查詢性能下降。
知道問題的根源,就知道如何解決,以及如何排查。
https://github.com/digoal/blog/blob/master/201809/20180919_02.md#%E8%83%8C%E6%99%AF%E5%8E%9F%E7%90%86 背景原理
https://www.postgresql.org/docs/11/static/sql-createindex.htmlhttps://github.com/digoal/blog/blob/master/201809/20180919_02.md#gin-indexes-accept-different-parameters GIN indexes accept different parameters:
1、fastupdate
This setting controls usage of the fast update technique described in Section 66.4.1. It is a Boolean parameter: ON enables fast update, OFF disables it. (Alternative spellings of ON and OFF are allowed as described in Section 19.1.) The default is ON.
Note
Turning fastupdate off via ALTER INDEX prevents future insertions from going into the list of pending index entries, but does not in itself flush previous entries. You might want to VACUUM the table or call gin_clean_pending_list function afterward to ensure the pending list is emptied.
2、gin_pending_list_limit
Custom gin_pending_list_limit parameter. This value is specified in kilobytes.
目前設定
postgres=# show gin_pending_list_limit ;
gin_pending_list_limit
------------------------
4MB
(1 row)
https://github.com/digoal/blog/blob/master/201809/20180919_02.md#brin-indexes-accept-different-parameters BRIN indexes accept different parameters:
1、pages_per_range
Defines the number of table blocks that make up one block range for each entry of a BRIN index (see Section 67.1 for more details). The default is 128.
2、autosummarize
Defines whether a summarization run is invoked for the previous page range whenever an insertion is detected on the next one.
https://github.com/digoal/blog/blob/master/201809/20180919_02.md#%E9%80%9A%E8%BF%87pageinspect%E5%8F%AF%E8%A7%82%E5%AF%9F%E7%B4%A2%E5%BC%95%E7%9A%84pending-list%E7%AD%89%E5%86%85%E5%AE%B9 通過pageinspect可觀察索引的pending list等内容。
https://www.postgresql.org/docs/11/static/pageinspect.htmlpostgres=# create extension pageinspect ;
CREATE EXTENSION
https://github.com/digoal/blog/blob/master/201809/20180919_02.md#%E4%BE%8B%E5%AD%90 例子
1、建表
postgres=# create table t(id int, arr int[]);
CREATE TABLE
2、建立反向索引
postgres=# create index idx_t_1 on t using gin (arr);
CREATE INDEX
3、建立生成随機數組的函數
postgres=# create or replace function gen_rand_arr() returns int[] as $$
select array(select (100*random())::int from generate_series(1,64));
$$ language sql strict;
CREATE FUNCTION
4、寫入測試資料
postgres=# insert into t select generate_series(1,100000), gen_rand_arr();
INSERT 0 100000
postgres=# insert into t select generate_series(1,1000000), gen_rand_arr();
INSERT 0 1000000
5、通過pageinspect插件,觀察目前GIN索引的pendinglist大小,可以看到pending page有356個,涉及2484條記錄。
如果很多條記錄在pending list中,查詢性能會下降明顯。
postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_t_1', 0));
pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------
2 | 369 | 3640 | 356 | 2848 | 2 | 1 | 0 | 0 | 2
(1 row)
6、查詢測試1,(pending list大于0)
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where arr @> array[1,2,3];
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.t (cost=82.38..262.28 rows=11373 width=284) (actual time=82.444..141.559 rows=114906 loops=1)
Output: id, arr
Recheck Cond: (t.arr @> '{1,2,3}'::integer[])
Heap Blocks: exact=41304
Buffers: shared hit=42043
-> Bitmap Index Scan on idx_t_1 (cost=0.00..79.92 rows=11373 width=0) (actual time=75.902..75.902 rows=114906 loops=1)
Index Cond: (t.arr @> '{1,2,3}'::integer[])
Buffers: shared hit=739
Planning Time: 0.092 ms
Execution Time: 152.260 ms
(10 rows)
7、vacuum table,強制合并pending list
set vacuum_cost_delay=0;
postgres=# vacuum t;
VACUUM
8、觀察pendign list合并後,n_pending_tuples等于0.
postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_t_1', 0));
pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------
4294967295 | 4294967295 | 0 | 0 | 0 | 9978 | 41 | 9421 | 101 | 2
(1 row)
9、查詢測試2,(pending list = 0)
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where arr @> array[1,2,3];
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.t (cost=792.36..1699.10 rows=117244 width=284) (actual time=79.861..139.603 rows=114906 loops=1)
Output: id, arr
Recheck Cond: (t.arr @> '{1,2,3}'::integer[])
Heap Blocks: exact=41304
Buffers: shared hit=41687
-> Bitmap Index Scan on idx_t_1 (cost=0.00..766.95 rows=117244 width=0) (actual time=73.360..73.360 rows=114906 loops=1)
Index Cond: (t.arr @> '{1,2,3}'::integer[])
Buffers: shared hit=383 -- 大幅減少
Planning Time: 0.135 ms
Execution Time: 150.656 ms
(10 rows)
https://github.com/digoal/blog/blob/master/201809/20180919_02.md#%E4%B8%8E%E6%AD%A4%E7%B1%BB%E4%BC%BCbrin%E4%B9%9F%E6%9C%89%E7%B1%BB%E4%BC%BC%E7%9A%84%E6%83%85%E5%86%B5 與此類似,brin也有類似的情況。
處理方法類似。
https://github.com/digoal/blog/blob/master/201809/20180919_02.md#%E5%B0%8F%E7%BB%93 小結
資料庫為了降低索引引入的寫RT升高,采用了延遲合并的方法。如果資料庫長期寫壓力巨大,可能導緻未合并的LIST很大,導緻查詢性能受到影響。
使用pageinspect插件可以觀察未合并的pending list有多大。
使用vacuum可以強制合并pending list,提高查詢性能。