天天看點

PostgreSQL 大寬表,全列索引,高并發合并寫入(insert into on conflict, upsert, merge insert) - 實時adhoc query

标簽

PostgreSQL , 全列索引 , 大寬表 , 寫測試 , insert on conflict , upsert , merge insert , adhoc query

https://github.com/digoal/blog/blob/master/201901/20190112_01.md#%E8%83%8C%E6%99%AF 背景

OLAP系統中,adhoc query非常場景(任意次元查詢分析)。

adhoc query,通常來說,可以加GIN倒排,或者每一列都加一個索引來實作。

《PostgreSQL 設計優化case - 大寬表任意字段組合查詢索引如何選擇(btree, gin, rum) - (含單個索引列數超過32列的方法)》

加索引和不加索引,實時的upsert(有則更新,無則寫入),性能差别如何呢?每列都有索引時是否可以支撐實時的合并寫入呢?

https://github.com/digoal/blog/blob/master/201901/20190112_01.md#%E6%B5%8B%E8%AF%95 測試

測試資料upsert合并寫入

https://github.com/digoal/blog/blob/master/201901/20190112_01.md#100%E4%B8%AA%E5%AD%97%E6%AE%B5%E4%B8%8D%E5%90%AB%E7%B4%A2%E5%BC%95 100個字段不含索引

do language plpgsql $$     
declare    
  sql text := 'create table tbl_test1( id int primary key,';    
begin    
  for i in 1..100 loop    
    sql := sql||' c'||i||' int default random()*10000000,';    
  end loop;    
  sql := rtrim(sql,',');    
  sql := sql || ')';    
  execute sql;    
end;    
$$;    
           
vi test1.sql    
\set id random(1,100000000)    
\set c1 random(1,20000)    
insert into tbl_test1 (id, c1) values (:id,:c1) on conflict(id) do update set c1=excluded.c1;    
           
pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 32 -j 32 -T 120    
    
transaction type: ./test1.sql    
scaling factor: 1    
query mode: prepared    
number of clients: 32    
number of threads: 32    
duration: 120 s    
number of transactions actually processed: 14433112    
latency average = 0.266 ms    
latency stddev = 3.919 ms    
tps = 120275.287837 (including connections establishing)    
tps = 120303.256409 (excluding connections establishing)    
statement latencies in milliseconds:    
         0.001  \set id random(1,100000000)    
         0.000  \set c1 random(1,20000)    
         0.265  insert into tbl_test1 (id, c1) values (:id,:c1) on conflict(id) do update set c1=excluded.c1;    
           

https://github.com/digoal/blog/blob/master/201901/20190112_01.md#100%E4%B8%AA%E5%AD%97%E6%AE%B5%E5%90%AB%E7%B4%A2%E5%BC%95 100個字段含索引

do language plpgsql $$     
declare    
  sql text := 'create table tbl_test2( id int primary key,';    
begin    
  for i in 1..100 loop    
    sql := sql||' c'||i||' int default random()*10000000,';    
  end loop;    
  sql := rtrim(sql,',');    
  sql := sql || ')';    
  execute sql;    
  for i in 1..100 loop    
    execute 'create index idx_tbl_test2_'||i||' on tbl_test2 (c'||i||')';    
  end loop;    
end;    
$$;    
           
vi test2.sql    
\set id random(1,100000000)    
\set c1 random(1,20000)    
insert into tbl_test2 (id, c1) values (:id,:c1) on conflict(id) do update set c1=excluded.c1;    
           
pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 32 -j 32 -T 120    
    
    
           

https://github.com/digoal/blog/blob/master/201901/20190112_01.md#%E5%88%86%E5%8C%BA%E8%A1%A8%E5%A4%A7%E5%AE%BD%E8%A1%A8%E5%85%A8%E7%B4%A2%E5%BC%95%E6%B5%8B%E8%AF%95 分區表大寬表,全索引測試

do language plpgsql $$     
declare    
  sql text := 'create table tbl_test3(sid int, id int,';    
begin    
  for i in 1..100 loop    
    sql := sql||' c'||i||' int default random()*10000000,';    
  end loop;    
  sql := rtrim(sql,',');    
  sql := sql || ') partition by list (sid)';    
  execute sql;    
end;    
$$;    

建立索引和唯一限制

alter table tbl_test3 add constraint uk_tbl_test3 unique (sid,id);

do language plpgsql $$
declare
begin
  for i in 1..100 loop
    execute 'create index idx_tbl_test3_'||i||' on tbl_test3 (c'||i||')';
  end loop;
end;
$$;

建立分區

do language plpgsql $$
declare
begin
  for sid in 1..60 loop
    execute format('create table tbl_test3_%s partition of tbl_test3 for values in (%s)', sid, sid);  
  end loop;
end;
$$;

合并寫的文法

insert into tbl_test3 values (1,1) 
on conflict (sid,id) 
do update 
set c1=excluded.c1;

create or replace function upsert_tbl_test3 (v_sid int, v_id int) returns void as $$
declare
  
begin
  execute format('execute p_%s (%s, %s)', v_sid, v_sid, v_id);
  exception when others then 
    execute format('prepare p_%s (int,int) as insert into tbl_test3_%s(sid,id) values ($1,$2) on conflict (sid,id) do update set c1=excluded.c1', v_sid, v_sid);
    execute format('execute p_%s (%s, %s)', v_sid, v_sid, v_id);
end;
$$ language plpgsql strict;


vi test.sql
\set id random(1,2000000000)
\set sid random(1,60)
select upsert_tbl_test3(:sid,:id);

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 28 -j 28 -T 120
           

優化

《PostgreSQL 分區表如何支援多列唯一 - 枚舉、hash哈希 分區, 多列唯一, insert into on conflict, update, upsert, merge insert》

https://github.com/digoal/blog/blob/master/201901/20190112_01.md#%E6%80%A7%E8%83%BD%E5%AF%B9%E6%AF%94 性能對比

場景 合并寫入QPS
1個索引 120303
101個索引 19485

小結:

1、瓶頸分析,多個索引時,注意瓶頸在索引的RT上面,為提升性能,可以使用分區表,将RT打平,提升整體寫入吞吐。

2、頻繁更新可能導緻索引膨脹,如果發現索引膨脹嚴重,在空閑期間需要對索引進行rebuild操作(是一樣concurrently并發建立,不影響DML)。

《PostgreSQL CREATE INDEX CONCURRENTLY 的原理以及哪些操作可能堵塞索引的建立》 《PostgreSQL 垃圾回收原理以及如何預防膨脹 - How to prevent object bloat in PostgreSQL》

https://github.com/digoal/blog/blob/master/201901/20190112_01.md#%E5%8F%82%E8%80%83 參考

《PostgreSQL 快速給指定表每個字段建立索引 - 2 (近乎完美)》 《PostgreSQL 快速給指定表每個字段建立索引 - 1 (不夠完美)》

https://github.com/digoal/blog/blob/master/201901/20190112_01.md#%E5%85%8D%E8%B4%B9%E9%A2%86%E5%8F%96%E9%98%BF%E9%87%8C%E4%BA%91rds-postgresql%E5%AE%9E%E4%BE%8Becs%E8%99%9A%E6%8B%9F%E6%9C%BA 免費領取阿裡雲RDS PostgreSQL執行個體、ECS虛拟機

PostgreSQL 大寬表,全列索引,高并發合并寫入(insert into on conflict, upsert, merge insert) - 實時adhoc query