标簽
PostgreSQL , cpu 并行 , smp 并行 , 并行計算 , gpu 并行 , 并行過程支援
https://github.com/digoal/blog/blob/master/201903/20190317_13.md#%E8%83%8C%E6%99%AF 背景
PostgreSQL 11 優化器已經支援了非常多場合的并行。簡單估計,已支援27餘種場景的并行計算。
parallel seq scan
parallel index scan
parallel index only scan
parallel bitmap scan
parallel filter
parallel hash agg
parallel group agg
parallel cte
parallel subquery
parallel create table
parallel create index
parallel select into
parallel CREATE MATERIALIZED VIEW
parallel 排序 : gather merge
parallel nestloop join
parallel hash join
parallel merge join
parallel 自定義并行聚合
parallel 自定義并行UDF
parallel append
parallel union
parallel fdw table scan
parallel partition join
parallel partition agg
parallel gather
parallel gather merge
parallel rc 并行
parallel rr 并行
parallel GPU 并行
parallel unlogged table
lead parallel
接下來進行一一介紹。
關鍵知識請先自行了解:
1、優化器自動并行度算法 CBO
《PostgreSQL 9.6 并行計算 優化器算法淺析》 《PostgreSQL 11 并行計算算法,參數,強制并行度設定》https://github.com/digoal/blog/blob/master/201903/20190317_13.md#parallel-partition-table-wise-agg parallel partition table wise agg
并行分區表智能聚合(類似MPP)
當分區表聚合操作的分組字段為分區字段時,PostgreSQL優化器會選擇并行分區智能聚合。
《PostgreSQL 11 preview - 分區表智能并行聚合、分組計算(已類似MPP架構,性能暴增)》 《PostgreSQL 11 preview - 分區表智能并行JOIN (已類似MPP架構,性能暴增)》資料量:10億
場景 | 資料量 | 關閉并行 | 開啟并行 | 并行度 | 開啟并行性能提升倍數 |
---|---|---|---|---|---|
10億 | 191 秒 | 8 秒 | 24 | 23.9 倍 |
例子,24個分區的HASH分區表。
CREATE unlogged TABLE ccc (
order_id bigint not null,
cust_id bigint not null,
status text
) PARTITION BY HASH (cust_id);
do language plpgsql $$
declare
begin
for i in 0..23 loop
execute format('CREATE unlogged TABLE %s%s PARTITION OF %s FOR VALUES WITH (MODULUS %s, REMAINDER %s)', 'ccc', i, 'ccc', 24, i);
execute format('alter table %s%s set(parallel_workers =64)', 'ccc',i);
end loop;
end;
$$;
postgres=# \d ccc
Unlogged table "public.ccc"
Column | Type | Collation | Nullable | Default
----------+--------+-----------+----------+---------
order_id | bigint | | not null |
cust_id | bigint | | not null |
status | text | | |
Partition key: HASH (cust_id)
Number of partitions: 24 (Use \d+ to list them.)
寫入10億資料
insert into ccc select i, random()*960 from generate_series(1,1000000000) t(i);
vacuum (analyze,verbose) ccc;
postgres=# show max_worker_processes ;
max_worker_processes
----------------------
128
(1 row)
postgres=# set min_parallel_table_scan_size =0;
postgres=# set min_parallel_index_scan_size =0;
postgres=# set parallel_tuple_cost =0;
postgres=# set parallel_setup_cost =0;
postgres=# set max_parallel_workers=128;
postgres=# set max_parallel_workers_per_gather =24;
postgres=# set enable_parallel_hash =on;
postgres=# set enable_parallel_append =off;
postgres=# set enable_partitionwise_aggregate =on;
postgres=# set work_mem ='128MB';
https://github.com/digoal/blog/blob/master/201903/20190317_13.md#1%E5%85%B3%E9%97%AD%E5%B9%B6%E8%A1%8C%E8%80%97%E6%97%B6-191-%E7%A7%92 1、關閉并行,耗時: 191 秒。
postgres=# set max_parallel_workers_per_gather =0;
postgres=# set enable_parallel_append =off;
postgres=# set enable_partitionwise_aggregate =off;
postgres=# explain select cust_id,count(*) from ccc group by cust_id;
QUERY PLAN
-----------------------------------------------------------------------------
HashAggregate (cost=25405409.32..25405418.93 rows=961 width=16)
Group Key: ccc0.cust_id
-> Append (cost=0.00..20405411.99 rows=999999466 width=8)
-> Seq Scan on ccc0 (cost=0.00..641839.96 rows=41663296 width=8)
-> Seq Scan on ccc1 (cost=0.00..625842.88 rows=40624888 width=8)
-> Seq Scan on ccc2 (cost=0.00..722092.24 rows=46872124 width=8)
-> Seq Scan on ccc3 (cost=0.00..545581.20 rows=35414920 width=8)
-> Seq Scan on ccc4 (cost=0.00..657705.92 rows=42693192 width=8)
-> Seq Scan on ccc5 (cost=0.00..609843.00 rows=39586300 width=8)
-> Seq Scan on ccc6 (cost=0.00..625934.32 rows=40630732 width=8)
-> Seq Scan on ccc7 (cost=0.00..673876.80 rows=43742880 width=8)
-> Seq Scan on ccc8 (cost=0.00..601729.04 rows=39059604 width=8)
-> Seq Scan on ccc9 (cost=0.00..609919.96 rows=39591296 width=8)
-> Seq Scan on ccc10 (cost=0.00..674124.76 rows=43758976 width=8)
-> Seq Scan on ccc11 (cost=0.00..529544.24 rows=34373924 width=8)
-> Seq Scan on ccc12 (cost=0.00..818443.04 rows=53127004 width=8)
-> Seq Scan on ccc13 (cost=0.00..674104.80 rows=43757680 width=8)
-> Seq Scan on ccc14 (cost=0.00..786195.28 rows=51033728 width=8)
-> Seq Scan on ccc15 (cost=0.00..609709.04 rows=39577604 width=8)
-> Seq Scan on ccc16 (cost=0.00..633745.96 rows=41137896 width=8)
-> Seq Scan on ccc17 (cost=0.00..673962.32 rows=43748432 width=8)
-> Seq Scan on ccc18 (cost=0.00..802380.08 rows=52083808 width=8)
-> Seq Scan on ccc19 (cost=0.00..529621.20 rows=34378920 width=8)
-> Seq Scan on ccc20 (cost=0.00..642042.32 rows=41676432 width=8)
-> Seq Scan on ccc21 (cost=0.00..401251.50 rows=26046150 width=8)
-> Seq Scan on ccc22 (cost=0.00..673891.04 rows=43743804 width=8)
-> Seq Scan on ccc23 (cost=0.00..642033.76 rows=41675876 width=8)
(27 rows)
postgres=# select cust_id,count(*) from ccc group by cust_id;
cust_id | count
---------+---------
652 | 1041702
273 | 1041730
51 | 1043333
951 | 1041738
70 | 1042693
839 | 1042059
350 | 1040677
539 | 1042402
758 | 1042199
874 | 1043176
278 | 1041664
946 | 1041442
176 | 1041523
576 | 1041131
292 | 1041236
929 | 1041554
663 | 1041569
770 | 1041953
22 | 1041484
..........................
790 | 1041575
828 | 1042024
(961 rows)
Time: 191475.978 ms (03:11.476)
https://github.com/digoal/blog/blob/master/201903/20190317_13.md#2%E5%BC%80%E5%90%AF%E5%B9%B6%E8%A1%8C%E8%80%97%E6%97%B6-8-%E7%A7%92 2、開啟并行,耗時: 8 秒。
postgres=# set max_parallel_workers_per_gather =24;
postgres=# set enable_parallel_append =off;
postgres=# set enable_partitionwise_aggregate =on;
postgres=# explain select cust_id,count(*) from ccc group by cust_id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Append (cost=251248.61..6031194.59 rows=961 width=16)
-> Finalize GroupAggregate (cost=251248.61..251278.60 rows=40 width=16)
Group Key: ccc0.cust_id
-> Gather Merge (cost=251248.61..251273.40 rows=960 width=16)
Workers Planned: 24
-> Sort (cost=251248.03..251248.13 rows=40 width=16)
Sort Key: ccc0.cust_id
-> Partial HashAggregate (cost=251246.56..251246.96 rows=40 width=16)
Group Key: ccc0.cust_id
-> Parallel Seq Scan on ccc0 (cost=0.00..242566.71 rows=1735971 width=8)
-> Finalize GroupAggregate (cost=244986.56..245015.80 rows=39 width=16)
Group Key: ccc1.cust_id
-> Gather Merge (cost=244986.56..245010.73 rows=936 width=16)
Workers Planned: 24
-> Sort (cost=244985.98..244986.07 rows=39 width=16)
Sort Key: ccc1.cust_id
-> Partial HashAggregate (cost=244984.56..244984.95 rows=39 width=16)
Group Key: ccc1.cust_id
-> Parallel Seq Scan on ccc1 (cost=0.00..236521.04 rows=1692704 width=8)
-> Finalize GroupAggregate (cost=282669.29..282703.03 rows=45 width=16)
Group Key: ccc2.cust_id
-> Gather Merge (cost=282669.29..282697.18 rows=1080 width=16)
Workers Planned: 24
-> Sort (cost=282668.71..282668.82 rows=45 width=16)
Sort Key: ccc2.cust_id
-> Partial HashAggregate (cost=282667.02..282667.47 rows=45 width=16)
Group Key: ccc2.cust_id
-> Parallel Seq Scan on ccc2 (cost=0.00..272901.68 rows=1953068 width=8)
-> Finalize GroupAggregate (cost=213567.74..213593.23 rows=34 width=16)
Group Key: ccc3.cust_id
-> Gather Merge (cost=213567.74..213588.81 rows=816 width=16)
Workers Planned: 24
-> Sort (cost=213567.16..213567.25 rows=34 width=16)
Sort Key: ccc3.cust_id
-> Partial HashAggregate (cost=213565.96..213566.30 rows=34 width=16)
Group Key: ccc3.cust_id
-> Parallel Seq Scan on ccc3 (cost=0.00..206187.97 rows=1475597 width=8)
-> Finalize GroupAggregate (cost=257459.33..257490.07 rows=41 width=16)
Group Key: ccc4.cust_id
-> Gather Merge (cost=257459.33..257484.74 rows=984 width=16)
Workers Planned: 24
-> Sort (cost=257458.75..257458.86 rows=41 width=16)
Sort Key: ccc4.cust_id
-> Partial HashAggregate (cost=257457.24..257457.65 rows=41 width=16)
Group Key: ccc4.cust_id
-> Parallel Seq Scan on ccc4 (cost=0.00..248562.83 rows=1778883 width=8)
-> Finalize GroupAggregate (cost=238722.97..238751.46 rows=38 width=16)
Group Key: ccc5.cust_id
-> Gather Merge (cost=238722.97..238746.52 rows=912 width=16)
Workers Planned: 24
-> Sort (cost=238722.39..238722.48 rows=38 width=16)
Sort Key: ccc5.cust_id
-> Partial HashAggregate (cost=238721.01..238721.39 rows=38 width=16)
Group Key: ccc5.cust_id
-> Parallel Seq Scan on ccc5 (cost=0.00..230474.01 rows=1649401 width=8)
-> Finalize GroupAggregate (cost=245023.21..245052.45 rows=39 width=16)
Group Key: ccc6.cust_id
-> Gather Merge (cost=245023.21..245047.38 rows=936 width=16)
Workers Planned: 24
-> Sort (cost=245022.63..245022.72 rows=39 width=16)
Sort Key: ccc6.cust_id
-> Partial HashAggregate (cost=245021.21..245021.60 rows=39 width=16)
Group Key: ccc6.cust_id
-> Parallel Seq Scan on ccc6 (cost=0.00..236556.47 rows=1692947 width=8)
-> Finalize GroupAggregate (cost=263789.43..263820.92 rows=42 width=16)
Group Key: ccc7.cust_id
-> Gather Merge (cost=263789.43..263815.46 rows=1008 width=16)
Workers Planned: 24
-> Sort (cost=263788.85..263788.96 rows=42 width=16)
Sort Key: ccc7.cust_id
-> Partial HashAggregate (cost=263787.30..263787.72 rows=42 width=16)
Group Key: ccc7.cust_id
-> Parallel Seq Scan on ccc7 (cost=0.00..254674.20 rows=1822620 width=8)
-> Finalize GroupAggregate (cost=235547.21..235575.70 rows=38 width=16)
Group Key: ccc8.cust_id
-> Gather Merge (cost=235547.21..235570.76 rows=912 width=16)
Workers Planned: 24
-> Sort (cost=235546.63..235546.73 rows=38 width=16)
Sort Key: ccc8.cust_id
-> Partial HashAggregate (cost=235545.25..235545.63 rows=38 width=16)
Group Key: ccc8.cust_id
-> Parallel Seq Scan on ccc8 (cost=0.00..227407.83 rows=1627484 width=8)
-> Finalize GroupAggregate (cost=238753.52..238782.01 rows=38 width=16)
Group Key: ccc9.cust_id
-> Gather Merge (cost=238753.52..238777.07 rows=912 width=16)
Workers Planned: 24
-> Sort (cost=238752.94..238753.03 rows=38 width=16)
Sort Key: ccc9.cust_id
-> Partial HashAggregate (cost=238751.56..238751.94 rows=38 width=16)
Group Key: ccc9.cust_id
-> Parallel Seq Scan on ccc9 (cost=0.00..230503.37 rows=1649637 width=8)
-> Finalize GroupAggregate (cost=263886.49..263917.98 rows=42 width=16)
Group Key: ccc10.cust_id
-> Gather Merge (cost=263886.49..263912.52 rows=1008 width=16)
Workers Planned: 24
-> Sort (cost=263885.91..263886.02 rows=42 width=16)
Sort Key: ccc10.cust_id
-> Partial HashAggregate (cost=263884.36..263884.78 rows=42 width=16)
Group Key: ccc10.cust_id
-> Parallel Seq Scan on ccc10 (cost=0.00..254767.91 rows=1823291 width=8)
-> Finalize GroupAggregate (cost=207290.45..207315.19 rows=33 width=16)
Group Key: ccc11.cust_id
-> Gather Merge (cost=207290.45..207310.90 rows=792 width=16)
Workers Planned: 24
-> Sort (cost=207289.87..207289.95 rows=33 width=16)
Sort Key: ccc11.cust_id
-> Partial HashAggregate (cost=207288.70..207289.03 rows=33 width=16)
Group Key: ccc11.cust_id
-> Parallel Seq Scan on ccc11 (cost=0.00..200127.47 rows=1432247 width=8)
-> Finalize GroupAggregate (cost=320379.91..320418.15 rows=51 width=16)
Group Key: ccc12.cust_id
-> Gather Merge (cost=320379.91..320411.52 rows=1224 width=16)
Workers Planned: 24
-> Sort (cost=320379.33..320379.46 rows=51 width=16)
Sort Key: ccc12.cust_id
-> Partial HashAggregate (cost=320377.38..320377.89 rows=51 width=16)
Group Key: ccc12.cust_id
-> Parallel Seq Scan on ccc12 (cost=0.00..309309.25 rows=2213625 width=8)
-> Finalize GroupAggregate (cost=263878.68..263910.17 rows=42 width=16)
Group Key: ccc13.cust_id
-> Gather Merge (cost=263878.68..263904.71 rows=1008 width=16)
Workers Planned: 24
-> Sort (cost=263878.10..263878.21 rows=42 width=16)
Sort Key: ccc13.cust_id
-> Partial HashAggregate (cost=263876.55..263876.97 rows=42 width=16)
Group Key: ccc13.cust_id
-> Parallel Seq Scan on ccc13 (cost=0.00..254760.37 rows=1823237 width=8)
-> Finalize GroupAggregate (cost=307756.52..307793.26 rows=49 width=16)
Group Key: ccc14.cust_id
-> Gather Merge (cost=307756.52..307786.89 rows=1176 width=16)
Workers Planned: 24
-> Sort (cost=307755.94..307756.07 rows=49 width=16)
Sort Key: ccc14.cust_id
-> Partial HashAggregate (cost=307754.08..307754.57 rows=49 width=16)
Group Key: ccc14.cust_id
-> Parallel Seq Scan on ccc14 (cost=0.00..297122.05 rows=2126405 width=8)
-> Finalize GroupAggregate (cost=238670.96..238699.45 rows=38 width=16)
Group Key: ccc15.cust_id
-> Gather Merge (cost=238670.96..238694.51 rows=912 width=16)
Workers Planned: 24
-> Sort (cost=238670.38..238670.48 rows=38 width=16)
Sort Key: ccc15.cust_id
-> Partial HashAggregate (cost=238669.00..238669.38 rows=38 width=16)
Group Key: ccc15.cust_id
-> Parallel Seq Scan on ccc15 (cost=0.00..230423.67 rows=1649067 width=8)
-> Finalize GroupAggregate (cost=248080.23..248110.22 rows=40 width=16)
Group Key: ccc16.cust_id
-> Gather Merge (cost=248080.23..248105.02 rows=960 width=16)
Workers Planned: 24
-> Sort (cost=248079.65..248079.75 rows=40 width=16)
Sort Key: ccc16.cust_id
-> Partial HashAggregate (cost=248078.19..248078.59 rows=40 width=16)
Group Key: ccc16.cust_id
-> Parallel Seq Scan on ccc16 (cost=0.00..239507.79 rows=1714079 width=8)
-> Finalize GroupAggregate (cost=263822.24..263853.73 rows=42 width=16)
Group Key: ccc17.cust_id
-> Gather Merge (cost=263822.24..263848.27 rows=1008 width=16)
Workers Planned: 24
-> Sort (cost=263821.66..263821.77 rows=42 width=16)
Sort Key: ccc17.cust_id
-> Partial HashAggregate (cost=263820.11..263820.53 rows=42 width=16)
Group Key: ccc17.cust_id
-> Parallel Seq Scan on ccc17 (cost=0.00..254706.07 rows=1822807 width=8)
-> Finalize GroupAggregate (cost=314097.79..314135.28 rows=50 width=16)
Group Key: ccc18.cust_id
-> Gather Merge (cost=314097.79..314128.78 rows=1200 width=16)
Workers Planned: 24
-> Sort (cost=314097.21..314097.33 rows=50 width=16)
Sort Key: ccc18.cust_id
-> Partial HashAggregate (cost=314095.30..314095.80 rows=50 width=16)
Group Key: ccc18.cust_id
-> Parallel Seq Scan on ccc18 (cost=0.00..303244.20 rows=2170220 width=8)
-> Finalize GroupAggregate (cost=207320.57..207345.31 rows=33 width=16)
Group Key: ccc19.cust_id
-> Gather Merge (cost=207320.57..207341.02 rows=792 width=16)
Workers Planned: 24
-> Sort (cost=207319.99..207320.07 rows=33 width=16)
Sort Key: ccc19.cust_id
-> Partial HashAggregate (cost=207318.83..207319.15 rows=33 width=16)
Group Key: ccc19.cust_id
-> Parallel Seq Scan on ccc19 (cost=0.00..200156.55 rows=1432455 width=8)
-> Finalize GroupAggregate (cost=251327.81..251357.81 rows=40 width=16)
Group Key: ccc20.cust_id
-> Gather Merge (cost=251327.81..251352.61 rows=960 width=16)
Workers Planned: 24
-> Sort (cost=251327.23..251327.33 rows=40 width=16)
Sort Key: ccc20.cust_id
-> Partial HashAggregate (cost=251325.77..251326.17 rows=40 width=16)
Group Key: ccc20.cust_id
-> Parallel Seq Scan on ccc20 (cost=0.00..242643.18 rows=1736518 width=8)
-> Finalize GroupAggregate (cost=157070.25..157089.00 rows=25 width=16)
Group Key: ccc21.cust_id
-> Gather Merge (cost=157070.25..157085.75 rows=600 width=16)
Workers Planned: 24
-> Sort (cost=157069.67..157069.74 rows=25 width=16)
Sort Key: ccc21.cust_id
-> Partial HashAggregate (cost=157068.84..157069.09 rows=25 width=16)
Group Key: ccc21.cust_id
-> Parallel Seq Scan on ccc21 (cost=0.00..151642.56 rows=1085256 width=8)
-> Finalize GroupAggregate (cost=263795.01..263826.50 rows=42 width=16)
Group Key: ccc22.cust_id
-> Gather Merge (cost=263795.01..263821.04 rows=1008 width=16)
Workers Planned: 24
-> Sort (cost=263794.43..263794.53 rows=42 width=16)
Sort Key: ccc22.cust_id
-> Partial HashAggregate (cost=263792.88..263793.29 rows=42 width=16)
Group Key: ccc22.cust_id
-> Parallel Seq Scan on ccc22 (cost=0.00..254679.58 rows=1822658 width=8)
-> Finalize GroupAggregate (cost=251324.47..251354.46 rows=40 width=16)
Group Key: ccc23.cust_id
-> Gather Merge (cost=251324.47..251349.26 rows=960 width=16)
Workers Planned: 24
-> Sort (cost=251323.89..251323.99 rows=40 width=16)
Sort Key: ccc23.cust_id
-> Partial HashAggregate (cost=251322.42..251322.82 rows=40 width=16)
Group Key: ccc23.cust_id
-> Parallel Seq Scan on ccc23 (cost=0.00..242639.95 rows=1736495 width=8)
(217 rows)
postgres=# select cust_id,count(*) from ccc group by cust_id;
cust_id | count
---------+---------
16 | 1041453
34 | 1042275
54 | 1039714
60 | 1040715
81 | 1041507
108 | 1042387
119 | 1041137
151 | 1041746
153 | 1042255
186 | 1041705
..................................
588 | 1040386
629 | 1043353
686 | 1041974
690 | 1040047
693 | 1042870
725 | 1042487
765 | 1040199
770 | 1041953
795 | 1042219
801 | 1042153
807 | 1041840
821 | 1041773
870 | 1041345
874 | 1043176
895 | 1042988
915 | 1040430
934 | 1043436
951 | 1041738
(961 rows)
Time: 8142.401 ms (00:08.142)
https://github.com/digoal/blog/blob/master/201903/20190317_13.md#%E5%85%B6%E4%BB%96%E7%9F%A5%E8%AF%86 其他知識
2、function, op 識别是否支援parallel
postgres=# select proparallel,proname from pg_proc;
proparallel | proname
-------------+----------------------------------------------
s | boolin
s | boolout
s | byteain
s | byteaout
3、subquery mapreduce unlogged table
對于一些情況,如果期望簡化優化器對非常非常複雜的SQL并行優化的負擔,可以自己将SQL拆成幾段,中間結果使用unlogged table儲存,類似mapreduce的思想。unlogged table同樣支援parallel 計算。
4、vacuum,垃圾回收并行。
5、dblink 異步調用并行
《PostgreSQL VOPS 向量計算 + DBLINK異步并行 - 單執行個體 10億 聚合計算跑進2秒》 《PostgreSQL 相似搜尋分布式架構設計與實踐 - dblink異步調用與多機并行(遠端 遊标+記錄 UDF執行個體)》 《PostgreSQL dblink異步調用實作 并行hash分片JOIN - 含資料交、并、差 提速案例 - 含dblink VS pg 11 parallel hash join VS pg 11 智能分區JOIN》暫時不允許并行的場景(将來PG會繼續擴大支援範圍):
1、修改行,鎖行,除了create table as , select into, create mview這幾個可以使用并行。
2、query 會被中斷時,例如cursor , loop in PL/SQL ,因為涉及到中間處理,是以不建議開啟并行。
3、paralle unsafe udf ,這種UDF不會并行
4、嵌套并行(udf (内部query并行)),外部調用這個UDF的SQL不會并行。(主要是防止large parallel workers )
5、SSI 隔離級别
https://github.com/digoal/blog/blob/master/201903/20190317_13.md#%E5%8F%82%E8%80%83 參考
https://www.postgresql.org/docs/11/parallel-plans.html 《PostgreSQL 11 preview - 并行計算 增強 彙總》 《PostgreSQL 10 自定義并行計算聚合函數的原理與實踐 - (含array_agg合并多個數組為單個一進制數組的例子)》https://github.com/digoal/blog/blob/master/201903/20190317_13.md#%E5%85%8D%E8%B4%B9%E9%A2%86%E5%8F%96%E9%98%BF%E9%87%8C%E4%BA%91rds-postgresql%E5%AE%9E%E4%BE%8Becs%E8%99%9A%E6%8B%9F%E6%9C%BA 免費領取阿裡雲RDS PostgreSQL執行個體、ECS虛拟機
