天天看點

PostgreSQL 9.x, 10, 11 hash分區表 用法舉例

标簽

PostgreSQL , 分區表 , 優化器 , 分區過濾 , hash 分區

https://github.com/digoal/blog/blob/master/201805/20180524_05.md#%E8%83%8C%E6%99%AF 背景

PostgreSQL 10開始内置分區表文法,當時隻支援了range,list兩種分區,實際上可以通過LIST實作HASH分區。

https://github.com/digoal/blog/blob/master/201805/20180524_05.md#postgresql-10-hash-%E5%88%86%E5%8C%BA%E8%A1%A8 PostgreSQL 10 hash 分區表

使用list支援hash分區

postgres=# create table p (id int , info text, crt_time timestamp) partition by list (abs(mod(id,4)));  
CREATE TABLE  
  
postgres=# create table p0 partition of p for values in (0);  
CREATE TABLE  
postgres=# create table p1 partition of p for values in (1);  
CREATE TABLE  
postgres=# create table p2 partition of p for values in (2);  
CREATE TABLE  
postgres=# create table p3 partition of p for values in (3);  
CREATE TABLE  
           

分區表如下

postgres=# \d+ p  
                                                Table "public.p"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           |          |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Partition key: LIST (abs(mod(id, 4)))  
Partitions: p0 FOR VALUES IN (0),  
            p1 FOR VALUES IN (1),  
            p2 FOR VALUES IN (2),  
            p3 FOR VALUES IN (3)  
           

寫入資料

postgres=# insert into p select generate_series(1,1000),md5(random()::text),now();  
INSERT 0 1000  
postgres=# select tableoid::regclass,id from p limit 10;  
 tableoid | id   
----------+----  
 p0       |  4  
 p0       |  8  
 p0       | 12  
 p0       | 16  
 p0       | 20  
 p0       | 24  
 p0       | 28  
 p0       | 32  
 p0       | 36  
 p0       | 40  
(10 rows)  
           

普通的查詢,無法做到分區的過濾

postgres=# explain select * from p where id=1 ;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..96.50 rows=24 width=44)  
   ->  Seq Scan on p0  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
   ->  Seq Scan on p1  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
   ->  Seq Scan on p2  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
   ->  Seq Scan on p3  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
(9 rows)  
           

一定要帶上分區條件,才可以做到分區過濾

postgres=# explain select * from p where id=1 and abs(mod(id, 4))=abs(mod(1, 4));  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..32.60 rows=1 width=44)  
   ->  Seq Scan on p1  (cost=0.00..32.60 rows=1 width=44)  
         Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))  
(3 rows)  
           

https://github.com/digoal/blog/blob/master/201805/20180524_05.md#postgresql-11-hash-%E5%88%86%E5%8C%BA%E8%A1%A8 PostgreSQL 11 hash 分區表

PostgreSQL 11同樣可以使用與10一樣的方法,LIST來實作HASH分區,但是有一個更加優雅的方法,直接使用HASH分區。

postgres=# create table p (id int , info text, crt_time timestamp) partition by hash (id);  
CREATE TABLE  
  
postgres=# create table p0 partition of p  for values WITH (MODULUS 4, REMAINDER 0);  
CREATE TABLE  
postgres=# create table p1 partition of p  for values WITH (MODULUS 4, REMAINDER 1);  
CREATE TABLE  
postgres=# create table p2 partition of p  for values WITH (MODULUS 4, REMAINDER 2);  
CREATE TABLE  
postgres=# create table p3 partition of p  for values WITH (MODULUS 4, REMAINDER 3);  
CREATE TABLE  
           

表結構如下

postgres=# \d+ p  
                                                Table "public.p"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           |          |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Partition key: HASH (id)  
Partitions: p0 FOR VALUES WITH (modulus 4, remainder 0),  
            p1 FOR VALUES WITH (modulus 4, remainder 1),  
            p2 FOR VALUES WITH (modulus 4, remainder 2),  
            p3 FOR VALUES WITH (modulus 4, remainder 3)  
           

表分區定義,内置的限制是一個HASH函數的傳回值

postgres=# \d+ p0  
                                                Table "public.p0"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           |          |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Partition of: p FOR VALUES WITH (modulus 4, remainder 0)  
Partition constraint: satisfies_hash_partition('180289'::oid, 4, 0, id)  
  
  
postgres=# \d+ p1  
                                                Table "public.p1"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           |          |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Partition of: p FOR VALUES WITH (modulus 4, remainder 1)  
Partition constraint: satisfies_hash_partition('180289'::oid, 4, 1, id)  
           

這個hash函數的定義如下,他一定是一個immutable 函數,是以可以用于分區過濾

postgres=# \x  
Expanded display is on.  
postgres=# \df+ satisfies_hash_partition  
List of functions  
-[ RECORD 1 ]-------+--------------------------------------  
Schema              | pg_catalog  
Name                | satisfies_hash_partition  
Result data type    | boolean  
Argument data types | oid, integer, integer, VARIADIC "any"  
Type                | func  
Volatility          | immutable  
Parallel            | safe  
Owner               | postgres  
Security            | invoker  
Access privileges   |   
Language            | internal  
Source code         | satisfies_hash_partition  
Description         | hash partition CHECK constraint  
           

PostgreSQL 11終于可以隻輸入分區字段值就可以做到分區過濾了

postgres=# explain select * from p where id=1;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..24.16 rows=6 width=44)  
   ->  Seq Scan on p0  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
(3 rows)  
  
postgres=# explain select * from p where id=0;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..24.16 rows=6 width=44)  
   ->  Seq Scan on p0  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 0)  
(3 rows)  
  
postgres=# explain select * from p where id=2;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..24.16 rows=6 width=44)  
   ->  Seq Scan on p2  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 2)  
(3 rows)  
  
postgres=# explain select * from p where id=3;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..24.16 rows=6 width=44)  
   ->  Seq Scan on p1  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 3)  
(3 rows)  
           

它受控于一個開關,當關閉後,就無法隻通過分區值來過濾分區。

postgres=# set enable_partition_pruning =off;  
SET  
postgres=# explain select * from p where id=0;  
                        QUERY PLAN                          
----------------------------------------------------------  
 Append  (cost=0.00..96.62 rows=24 width=44)  
   ->  Seq Scan on p0  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 0)  
   ->  Seq Scan on p1  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 0)  
   ->  Seq Scan on p2  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 0)  
   ->  Seq Scan on p3  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 0)  
(9 rows)  
           

https://github.com/digoal/blog/blob/master/201805/20180524_05.md#postgresql-%E7%BB%A7%E6%89%BF%E8%A1%A8-hash-%E5%88%86%E5%8C%BA%E8%A1%A8%E5%AE%9E%E7%8E%B0 PostgreSQL 繼承表 hash 分區表實作

《PostgreSQL 傳統 hash 分區方法和性能》
postgres=# explain select * from tbl where abs(mod(id,4)) = abs(mod(1,4)) and id=1;    
                                QUERY PLAN                                    
--------------------------------------------------------------------------    
 Append  (cost=0.00..979127.84 rows=3 width=45)    
   ->  Seq Scan on tbl  (cost=0.00..840377.67 rows=2 width=45)    
         Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))    
   ->  Seq Scan on tbl1  (cost=0.00..138750.17 rows=1 width=45)    
         Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))    
(5 rows)    
           

https://github.com/digoal/blog/blob/master/201805/20180524_05.md#pg_pathman%E5%88%86%E5%8C%BA%E6%96%B9%E6%B3%95 pg_pathman分區方法

支援9.5以上的版本

《PostgreSQL 9.5+ 高效分區表實作 - pg_pathman》

https://github.com/digoal/blog/blob/master/201805/20180524_05.md#%E5%B0%8F%E7%BB%93 小結

1、PostgreSQL 10内置分區表,為了HASH分區,可以使用LIST分區的方法,但是為了讓資料庫可以自動過濾分區,一定要帶上HASH分區條件表達式到SQL中。

2、PostgreSQL 11内置分區表,内置了HASH分區,并且支援隻按照HASH分區條件,自動過濾分區。

3、繼承表的方法,同樣可以實作HASH分區,需要建立觸發器,同時主表在查詢時依舊會被查詢到。

以上三種方法,必須保證

constraint_exclusion

參數設定為partition或者on, 否則無法做到分區自動過濾。

對于PostgreSQL 11,為了實作隻輸入分區字段的值就能夠滿足分區自動過濾,還需要設定

enable_partition_pruning

為on.

索性這些參數預設都是OK的。

4、pg_pathman是通過custom scan接口實作的分區,是目前為止,性能最好的,鎖粒度最低的方法。

https://github.com/digoal/blog/blob/master/201805/20180524_05.md#%E5%8F%82%E8%80%83 參考

《PostgreSQL 11 preview - 分區表 增強 彙總》 《PostgreSQL 自動建立分區實踐 - 寫入觸發器》 《PostgreSQL 11 preview - 分區過濾控制參數 - enable_partition_pruning》 《Greenplum 計算能力估算 - 暨多大表需要分區,單個分區多大适宜》 《PostgreSQL 11 preview - 分區表智能并行聚合、分組計算(已類似MPP架構,性能暴增)》 《PostgreSQL 并行vacuum patch - 暨為什麼需要并行vacuum或分區表》 《分區表鎖粒度差異 - pg_pathman VS native partition table》 《PostgreSQL 11 preview - 分區表用法及增強 - 增加HASH分區支援 (hash, range, list)》 《PostgreSQL 11 preview - Parallel Append(包括 union all\分區查詢) (多表并行計算) sharding架構并行計算核心功能之一》 《PostgreSQL 11 preview - 分區表智能并行JOIN (已類似MPP架構,性能暴增)》 《PostgreSQL 查詢涉及分區表過多導緻的性能問題 - 性能診斷與優化(大量BIND, spin lock, SLEEP程序)》 《PostgreSQL 商用版本EPAS(阿裡雲ppas(Oracle 相容版)) - 分區表性能優化 (堪比pg_pathman)》 《HTAP資料庫 PostgreSQL 場景與性能測試之 45 - (OLTP) 資料量與性能的線性關系(10億+無衰減), 暨單表多大需要分區》 《PostgreSQL 10 内置分區 vs pg_pathman perf profiling》 《PostgreSQL 10.0 preview 功能增強 - 内置分區表》

繼續閱讀