天天看點

【重新發現PostgreSQL之美】- 22 黃帝内經

背景

場景:

電商、社交、SAAS軟體行業.

使用者表、租戶表、使用者行為表.

挑戰:

按企業、appid等hash分區可能出現傾斜, 例如某些APPID的資料量可能非常大, 熱資料可能在同一個分區, saas的某些大企業可能被分到同一分區, 使得資料傾斜.

PG 解決方案:

非對稱分區方法.

例子

1、建立1個分區表, 按appid hash分區

appid int,      
c1 int,      
c2 int,      
info text,      
ts timestamp      
) partition by hash (appid);      
create unlogged table t0 partition of t for values with (modulus 4, REMAINDER 0);      
create unlogged table t1 partition of t for values with (modulus 4, REMAINDER 1);      
create unlogged table t2 partition of t for values with (modulus 4, REMAINDER 2);      
create unlogged table t3 partition of t for values with (modulus 4, REMAINDER 3);      

2、寫入2000萬記錄, 其中1000萬appid=1, 制造傾斜

insert into t select 1, random()*100, random()*1000, md5(random()::text), clock_timestamp() from generate_series(1,10000000);      
List of relations      
Schema |   Name   |       Type        |  Owner   | Persistence | Access method |  Size   | Description      
--------+----------+-------------------+----------+-------------+---------------+---------+-------------      
public | t        | partitioned table | postgres | unlogged    |               | 0 bytes |      
public | t0       | table             | postgres | unlogged    | heap          | 1014 MB |      
public | t1       | table             | postgres | unlogged    | heap          | 189 MB  |      
public | t2       | table             | postgres | unlogged    | heap          | 222 MB  |      
public | t3       | table             | postgres | unlogged    | heap          | 186 MB  |      

3、解決辦法, 對于appid=1的分區, 再次按c1 hash分區, 其他分區保持不變.

create unlogged table tt0(like t) PARTITION BY hash (c1);      
create unlogged table tt0_0 partition of tt0 for values with (modulus 4, REMAINDER 0);      
create unlogged table tt0_1 partition of tt0 for values with (modulus 4, REMAINDER 1);      
create unlogged table tt0_2 partition of tt0 for values with (modulus 4, REMAINDER 2);      
create unlogged table tt0_3 partition of tt0 for values with (modulus 4, REMAINDER 3);      
alter table t DETACH PARTITION t0;      
insert into tt0 select * from t0;      
alter table t attach partition tt0 FOR VALUES WITH ( MODULUS 4, REMAINDER 0 );      
end;      
Unlogged partitioned table "public.t"      
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description      
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------      
appid  | integer                     |           |          |         | plain    |             |              |      
c1     | integer                     |           |          |         | plain    |             |              |      
c2     | integer                     |           |          |         | plain    |             |              |      
info   | text                        |           |          |         | extended |             |              |      
ts     | timestamp without time zone |           |          |         | plain    |             |              |      
Partition key: HASH (appid)      
Partitions: t1 FOR VALUES WITH (modulus 4, remainder 1),      
t2 FOR VALUES WITH (modulus 4, remainder 2),      
t3 FOR VALUES WITH (modulus 4, remainder 3),      
tt0 FOR VALUES WITH (modulus 4, remainder 0), PARTITIONED      
postgres=# \d+ tt0      
Unlogged partitioned table "public.tt0"      
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description      
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------      
appid  | integer                     |           |          |         | plain    |             |              |      
c1     | integer                     |           |          |         | plain    |             |              |      
c2     | integer                     |           |          |         | plain    |             |              |      
info   | text                        |           |          |         | extended |             |              |      
ts     | timestamp without time zone |           |          |         | plain    |             |              |      
Partition of: t FOR VALUES WITH (modulus 4, remainder 0)      
Partition constraint: satisfies_hash_partition('16552'::oid, 4, 0, appid)      
Partition key: HASH (c1)      
Partitions: tt0_0 FOR VALUES WITH (modulus 4, remainder 0),      
tt0_1 FOR VALUES WITH (modulus 4, remainder 1),      
tt0_2 FOR VALUES WITH (modulus 4, remainder 2),      
tt0_3 FOR VALUES WITH (modulus 4, remainder 3)      

可以看到appid=1的資料已經打散

List of relations      
Schema |   Name   |       Type        |  Owner   | Persistence | Access method |  Size   | Description      
--------+----------+-------------------+----------+-------------+---------------+---------+-------------      
public | t        | partitioned table | postgres | unlogged    |               | 0 bytes |      
public | t1       | table             | postgres | unlogged    | heap          | 189 MB  |      
public | t2       | table             | postgres | unlogged    | heap          | 222 MB  |      
public | t3       | table             | postgres | unlogged    | heap          | 186 MB  |      
public | tt0      | partitioned table | postgres | unlogged    |               | 0 bytes |      
public | tt0_0    | table             | postgres | unlogged    | heap          | 279 MB  |      
public | tt0_1    | table             | postgres | unlogged    | heap          | 309 MB  |      
public | tt0_2    | table             | postgres | unlogged    | heap          | 254 MB  |      
public | tt0_3    | table             | postgres | unlogged    | heap          | 172 MB  |      

4、檢查執行計劃, 分區被正常路由

QUERY PLAN      
------------------------------------------------------------      
Seq Scan on t2 t  (cost=0.00..62947.58 rows=9965 width=53)      
Filter: (appid = 2)      
(2 rows)      
postgres=# explain select * from t where appid=1;      
QUERY PLAN      
--------------------------------------------------------------------------      
Append  (cost=0.00..337327.48 rows=10031536 width=53)      
->  Seq Scan on tt0_0 t_1  (cost=0.00..78943.27 rows=2771208 width=53)      
Filter: (appid = 1)      
->  Seq Scan on tt0_1 t_2  (cost=0.00..87582.51 rows=3057186 width=53)      
Filter: (appid = 1)      
->  Seq Scan on tt0_2 t_3  (cost=0.00..71825.61 rows=2499200 width=53)      
Filter: (appid = 1)      
->  Seq Scan on tt0_3 t_4  (cost=0.00..48818.40 rows=1703942 width=53)      
Filter: (appid = 1)      
(9 rows)      
postgres=# explain select * from t where appid=1 and c1=1;      
QUERY PLAN      
----------------------------------------------------------------      
Seq Scan on tt0_0 t  (cost=0.00..87595.73 rows=99763 width=53)      
Filter: ((appid = 1) AND (c1 = 1))      
(2 rows)      

層次不限、分區方法不限

1、如果t1也有問題, 我們可以繼續對t1分區進行二次分區, 選擇不同的分區個數

create unlogged table tt1(like t) PARTITION BY hash (c1);      
create unlogged table tt1_0 partition of tt1 for values with (modulus 5, REMAINDER 0);      
create unlogged table tt1_1 partition of tt1 for values with (modulus 5, REMAINDER 1);      
create unlogged table tt1_2 partition of tt1 for values with (modulus 5, REMAINDER 2);      
create unlogged table tt1_3 partition of tt1 for values with (modulus 5, REMAINDER 3);      
create unlogged table tt1_4 partition of tt1 for values with (modulus 5, REMAINDER 4);      
alter table t DETACH PARTITION t1;      
insert into tt1 select * from t1;      
alter table t attach partition tt1 FOR VALUES WITH ( MODULUS 4, REMAINDER 1 );      
end;      
Unlogged partitioned table "public.t"      
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description      
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------      
appid  | integer                     |           |          |         | plain    |             |              |      
c1     | integer                     |           |          |         | plain    |             |              |      
c2     | integer                     |           |          |         | plain    |             |              |      
info   | text                        |           |          |         | extended |             |              |      
ts     | timestamp without time zone |           |          |         | plain    |             |              |      
Partition key: HASH (appid)      
Partitions: t2 FOR VALUES WITH (modulus 4, remainder 2),      
t3 FOR VALUES WITH (modulus 4, remainder 3),      
tt0 FOR VALUES WITH (modulus 4, remainder 0), PARTITIONED,      
tt1 FOR VALUES WITH (modulus 4, remainder 1), PARTITIONED      
postgres=# \d+ tt1      
Unlogged partitioned table "public.tt1"      
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description      
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------      
appid  | integer                     |           |          |         | plain    |             |              |      
c1     | integer                     |           |          |         | plain    |             |              |      
c2     | integer                     |           |          |         | plain    |             |              |      
info   | text                        |           |          |         | extended |             |              |      
ts     | timestamp without time zone |           |          |         | plain    |             |              |      
Partition of: t FOR VALUES WITH (modulus 4, remainder 1)      
Partition constraint: satisfies_hash_partition('16552'::oid, 4, 1, appid)      
Partition key: HASH (c1)      
Partitions: tt1_0 FOR VALUES WITH (modulus 5, remainder 0),      
tt1_1 FOR VALUES WITH (modulus 5, remainder 1),      
tt1_2 FOR VALUES WITH (modulus 5, remainder 2),      
tt1_3 FOR VALUES WITH (modulus 5, remainder 3),      
tt1_4 FOR VALUES WITH (modulus 5, remainder 4)      

路由正常

QUERY PLAN      
-----------------------------------------------------------------------      
Append  (cost=0.00..53418.63 rows=9951 width=53)      
->  Seq Scan on tt1_0 t_1  (cost=0.00..12278.30 rows=2289 width=53)      
Filter: (appid = 3)      
->  Seq Scan on tt1_1 t_2  (cost=0.00..8540.11 rows=1593 width=53)      
Filter: (appid = 3)      
->  Seq Scan on tt1_2 t_3  (cost=0.00..10130.17 rows=1886 width=53)      
Filter: (appid = 3)      
->  Seq Scan on tt1_3 t_4  (cost=0.00..9849.98 rows=1837 width=53)      
Filter: (appid = 3)      
->  Seq Scan on tt1_4 t_5  (cost=0.00..12570.31 rows=2346 width=53)      
Filter: (appid = 3)      
(11 rows)      
postgres=# explain select * from t where appid=3 and c1=2;      
QUERY PLAN      
-------------------------------------------------------------      
Seq Scan on tt1_1 t  (cost=0.00..9476.14 rows=103 width=53)      
Filter: ((appid = 3) AND (c1 = 2))      
(2 rows)      

2、我們也可以對t2使用range或list分區

create unlogged table tt2(like t) PARTITION BY range (ts);      
create unlogged table tt2_2021 partition of tt2 for values from ('2021-01-01') to ('2022-01-01');      
create unlogged table tt2_2022 partition of tt2 for values from ('2022-01-01') to ('2023-01-01');      
alter table t DETACH PARTITION t2;      
insert into tt2 select * from t2;      
alter table t attach partition tt2 FOR VALUES WITH ( MODULUS 4, REMAINDER 2 );      
end;      
Unlogged partitioned table "public.t"      
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description      
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------      
appid  | integer                     |           |          |         | plain    |             |              |      
c1     | integer                     |           |          |         | plain    |             |              |      
c2     | integer                     |           |          |         | plain    |             |              |      
info   | text                        |           |          |         | extended |             |              |      
ts     | timestamp without time zone |           |          |         | plain    |             |              |      
Partition key: HASH (appid)      
Partitions: t3 FOR VALUES WITH (modulus 4, remainder 3),      
tt0 FOR VALUES WITH (modulus 4, remainder 0), PARTITIONED,      
tt1 FOR VALUES WITH (modulus 4, remainder 1), PARTITIONED,      
tt2 FOR VALUES WITH (modulus 4, remainder 2), PARTITIONED      
postgres=# \d+ tt2      
Unlogged partitioned table "public.tt2"      
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description      
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------      
appid  | integer                     |           |          |         | plain    |             |              |      
c1     | integer                     |           |          |         | plain    |             |              |      
c2     | integer                     |           |          |         | plain    |             |              |      
info   | text                        |           |          |         | extended |             |              |      
ts     | timestamp without time zone |           |          |         | plain    |             |              |      
Partition of: t FOR VALUES WITH (modulus 4, remainder 2)      
Partition constraint: satisfies_hash_partition('16552'::oid, 4, 2, appid)      
Partition key: RANGE (ts)      
Partitions: tt2_2021 FOR VALUES FROM ('2021-01-01 00:00:00') TO ('2022-01-01 00:00:00'),      
tt2_2022 FOR VALUES FROM ('2022-01-01 00:00:00') TO ('2023-01-01 00:00:00')      

使用PG, 支援非對稱分區(深度不對稱、分區方法不對稱、分區數不對稱), 徹底解決資料傾斜問題.

除了分, 還能合, 對于某些比較小的分區可以合起來減少分區數. 和黃帝内經的人與自然,五行平衡學說類似, 實則瀉之,虛則補之.

參考

《如何修改PostgreSQL分區表分區範圍- detach attach - 拆分、合并、非平衡分區表、深度不一緻分區表》 《PostgreSQL hash分區表擴容、縮容(增加分區、減少分區、分區重分布、拆分區、合并分區), hash算法hash_any, 混合hash MODULUS 分區- attach , detach》 https://www.postgresql.org/docs/14/sql-createtable.html

繼續閱讀