PostgreSQL 批量SQL before/after for each row trigger的觸發時機、性能差異分析、建議 - 含9.4 , 10版本

标簽

PostgreSQL , trigger , row , statement , before , after , s_lock

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E8%83%8C%E6%99%AF 背景

資料庫觸發器的觸發時機，性能，高并發批量導入時，觸發器的性能如何？

批量導入時，before, after觸發器在for each row模式下，觸發機制如何，什麼時候開始條到觸發器指定的function中進行運算？

1、before for each row，在資料落目标表前，被觸發，同時傳回的内容（TUPLE）被REPLACE到對應的資料檔案存儲。觸發器必須明确傳回

NEW

。

以insert為例    
    
insert request to HEAP table -> 每一row立即generate NEW -> before trigger(s) -> return NEW -> write tuple to HEAP table

2、after for each row，在資料落到目标表之後，再被觸發(如果是批量寫入，那麼會等批量寫入結束後，才開始觸發after trigger procedure)。after tirgger procedure傳回什麼值都無所謂，因為用不上。after for each row建議觸發器傳回null。

以insert為例    
    
insert request to HEAP table -> write tuple to HEAP table -> 所有row一次性generate NEW -> after trigger(s) -> return NULL

到底哪個性能好？

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E6%B5%8B%E8%AF%95 測試

測試場景參考

《PostgreSQL 流式處理應用實踐 - 二手商品實時歸類(異步消息notify/listen、閱後即焚)》《PostgreSQL 批量SQL before/after for each row trigger的觸發時機、性能差異》

1、建表

create table a (          
  id int8 primary key,   -- 商品ID          
  att jsonb   -- 商品屬性          
);

2、建結果表

create table t_result(id serial8 primary key, class text, content text);

3、建merge json函數

create or replace function merge_json(jsonb, jsonb) returns jsonb as $$        
  -- select json_object_agg(key,value)::jsonb from (     -- 9.4   
  select jsonb_object_agg(key,value) from (        
  select         
    coalesce(a.key, b.key) as key,         
    case         
    when         
    coalesce(jsonb_array_element(a.value,1)::text::timestamp, '1970-01-01'::timestamp)         
    >         
    coalesce(jsonb_array_element(b.value,1)::text::timestamp, '1970-01-01'::timestamp)         
    then a.value        
    else b.value        
    end        
  from jsonb_each($1) a full outer join jsonb_each($2) b using (key)        
  ) t;          
$$ language sql strict ;

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E6%89%B9%E9%87%8F%E5%B9%B6%E5%8F%91%E6%95%B0%E6%8D%AE%E5%86%99%E5%85%A5%E6%80%A7%E8%83%BD%E5%AF%B9%E6%AF%94before-after-no-trigger 批量，并發資料寫入性能對比(before, after, no trigger)

1、建立dblink插件

create extension dblink;

2、建立斷開連接配接的函數，目的是不抛異常。

create or replace function dis_conn(name) returns void as $$    
declare    
begin    
  perform dblink_disconnect($1);    
  return;    
exception when others then    
  return;    
end;    
$$ language plpgsql strict;

3、建立連接配接函數接口

CREATE OR REPLACE FUNCTION public.conn(name, text)    
 RETURNS void    
 LANGUAGE plpgsql    
 STRICT    
AS $function$                
declare                
begin                
  perform dis_conn($1);      
  perform dblink_connect($1, $2);               
  return;                
exception when others then                
  return;                
end;                
$function$;

4、建立并行，批量加載函數。 56個并行，每一批寫入200萬條資料。總共寫入1.12億行。

CREATE OR REPLACE FUNCTION public.get_res()    
 RETURNS SETOF record    
 LANGUAGE plpgsql    
 STRICT    
AS $function$          
declare          
  start_time timestamptz := clock_timestamp();    
  loops int := 55;    
  batchs int := 2000000;    
  -- 總資料量1.12億    
begin          
  for i in 0..loops loop                 
    perform conn('link'||i,  'hostaddr=127.0.0.1 port='||current_setting('port')||' user=postgres dbname=postgres application_name=digoal_loader');               
    perform '1' from dblink_get_result('link'||i) as t(res text);                
    perform dblink_send_query('link'||i, format($_$    
    insert into a select         
    id, '{"price":[10000, "2018-01-01 10:10:11"]}'    
    from generate_series(%s,%s) t(id)    
    on conflict (id)        -- 9.4 注釋掉 這行  
    do update set           -- 9.4 注釋掉 這行  
    att = merge_json(a.att, excluded.att)      -- 9.4 注釋掉 這行  
    $_$, i*batchs, (i+1)*batchs-1));                
  end loop;             
  for i in 0..loops loop          
    return query select extract(epoch from clock_timestamp()-start_time)::text from dblink_get_result('link'||i) as t(res text);    
  end loop;          
end;          
$function$;

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#after-trigger-for-each-row after trigger for each row

當一條SQL寫入a完成後，觸發after觸發器，開始處理每行。

1、建觸發器函數，用于處理每一行原始資料，包括50個處理邏輯.

CREATE OR REPLACE FUNCTION notify1() returns trigger          
AS $function$          
declare          
begin          
  if jsonb_array_element(NEW.att->'price', 0)::text::float8 > 100 then   -- 規則1， 價格大于100，寫入結果表          
     insert into t_result(class,content) values (        
       'a',    -- 歸類        
       format('CLASS:high price, ID:%s, ATT:%s', NEW.id, NEW.att)   -- 消息内容          
     );          
  end if;         
  -- 模拟多輪判斷    
  for i in 1..49 loop    
    if jsonb_array_element(NEW.att->'price', 0)::text::float8 > 100 then   -- 規則xx        
      null;         
    end if;        
  end loop;    
  return null;    -- aster 觸發器    
  -- return NEW;  -- BEFORE 觸發器    
end;          
$function$ language plpgsql strict;

2、建立after insert or update觸發器

create trigger tg1 after insert or update on a for each row execute procedure notify1();

3、寫入單條，測試

insert into a values           
  (1, '{"price":[10000, "2018-01-01 10:10:11"]}')           
  on conflict (id)           
  do update set           
  att = merge_json(a.att, excluded.att)     -- 合并新屬性，保留老屬性，需要使用一個UDF來合并          
;

4、調用并行接口，批量并發寫入

select * from get_res() as t(id text);

5、你會發現，資料是在寫入完成後，才開始逐行處理觸發器内部邏輯。

目标表在寫入，但是trigger并沒有處理，是以結果表還沒有看到任何記錄

以insert為例    
    
insert request to HEAP table -> write tuple to HEAP table -> 所有row一次性generate NEW -> after trigger(s) -> return NULL

postgres=# \dt+ a|t_result    
                    List of relations    
 Schema | Name | Type  |  Owner   |  Size   | Description     
--------+------+-------+----------+---------+-------------    
 public | a    | table | postgres | 3560 MB |     
 public | t_result | table | postgres | 8192 bytes |     
    
postgres=# \dt+ a    
                    List of relations    
 Schema | Name | Type  |  Owner   |  Size   | Description     
--------+------+-------+----------+---------+-------------    
 public | a    | table | postgres | 3603 MB |     
 public | t_result | table | postgres | 8192 bytes |

6、資料量：1.12億條

總耗時：

(主要慢在trigger内部的邏輯處理)

1367 秒。

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#before-trigger-for-each-row before trigger for each row

before觸發器，在資料落盤前，觸發before trigger function

CREATE OR REPLACE FUNCTION notify1() returns trigger          
AS $function$          
declare          
begin          
  if jsonb_array_element(NEW.att->'price', 0)::text::float8 > 100 then   -- 規則1， 價格大于100，寫入結果表          
     insert into t_result(class,content) values (        
       'a',    -- 歸類        
       format('CLASS:high price, ID:%s, ATT:%s', NEW.id, NEW.att)   -- 消息内容          
     );          
  end if;         
  -- 模拟多輪判斷    
  for i in 1..49 loop    
    if jsonb_array_element(NEW.att->'price', 0)::text::float8 > 100 then   -- 規則xx        
      null;         
    end if;        
  end loop;    
  -- return null;    -- aster 觸發器    
  return NEW;  -- BEFORE 觸發器    
end;          
$function$ language plpgsql strict;

2、建立before insert or update觸發器

drop trigger tg1 on a;    
    
create trigger tg1 before insert or update on a for each row execute procedure notify1();

3、調用并行接口，批量并發寫入

truncate a;  
truncate t_result;  
select * from get_res() as t(id text);

4、寫入過程中檢視

你會發現，目标表和結果表同時在增長，因為

以insert為例    
    
insert request to HEAP table -> 每一row立即generate NEW -> before trigger(s) -> return NEW -> write tuple to HEAP table

postgres=# \dt+ a|t_res*    
                      List of relations    
 Schema |   Name   | Type  |  Owner   |  Size  | Description     
--------+----------+-------+----------+--------+-------------    
 public | a        | table | postgres | 335 MB |     
 public | t_result | table | postgres | 387 MB |     
(2 rows)

1207 秒。

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E6%97%A0trigger%E5%AF%BC%E5%85%A5%E9%80%9F%E5%BA%A6 無trigger導入速度：

1、删除觸發器

postgres=# drop trigger tg1 on a;    
DROP TRIGGER

2、調用并行接口，批量并發寫入

truncate a;  
truncate t_result;  
select * from get_res() as t(id text);

3、資料量：1.12億條

706 秒。

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E6%80%A7%E8%83%BD%E5%AF%B9%E6%AF%94 性能對比

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#postgresql-10-on-centos-7x PostgreSQL 10 on CentOS 7.x

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#postgresql-10-logged-table-%E6%B5%8B%E8%AF%95%E7%BB%93%E6%9E%9C PostgreSQL 10 logged table 測試結果

case	并發數	寫入量	耗時
無觸發器	56	1.12億	103 秒
before for each row觸發器			1165 秒
after for each row觸發器			1247 秒

性能瓶頸，在寫wal日志上面，如果使用unlogged table，就可以發揮出CPU所有能力了。

postgres=# select wait_event_type,wait_event,count(*) from pg_stat_activity group by 1,2 order by count(*) desc;  
 wait_event_type |     wait_event      | count   
-----------------+---------------------+-------  
 LWLock          | wal_insert          |    40  
                 |                     |    19  
 Activity        | BgWriterMain        |     1  
 Activity        | AutoVacuumMain      |     1  
 IO              | DataFileWrite       |     1  
 Activity        | LogicalApplyMain    |     1  
 Activity        | LogicalLauncherMain |     1  
(7 rows)

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#postgresql-10-unlogged-table-%E6%B5%8B%E8%AF%95%E7%BB%93%E6%9E%9C PostgreSQL 10 unlogged table 測試結果

truncate a;  
truncate t_result;  
alter table a set unlogged;  
alter table t_result set unlogged;


			61 秒
			1113 秒
			1158 秒

現在“無觸發器”模式的瓶頸變成了EXTEND BLOCK，也就是擴充資料檔案。觸發器的情況下，CPU計算為瓶頸，沒有其他瓶頸，是以unlogged與logged table性能差異不大）。

postgres=# select wait_event_type,wait_event,count(*) from pg_stat_activity group by 1,2 order by count(*) desc;  
 wait_event_type |     wait_event      | count   
-----------------+---------------------+-------  
 Lock            | extend              |    41

《HTAP資料庫 PostgreSQL 場景與性能測試之 43 - (OLTP+OLAP) unlogged table 含索引多表批量寫入》《HTAP資料庫 PostgreSQL 場景與性能測試之 42 - (OLTP+OLAP) unlogged table 不含索引多表批量寫入》《HTAP資料庫 PostgreSQL 場景與性能測試之 41 - (OLTP+OLAP) 含索引多表批量寫入》《HTAP資料庫 PostgreSQL 場景與性能測試之 40 - (OLTP+OLAP) 不含索引多表批量寫入》《HTAP資料庫 PostgreSQL 場景與性能測試之 39 - (OLTP+OLAP) 含索引多表單點寫入》《HTAP資料庫 PostgreSQL 場景與性能測試之 38 - (OLTP+OLAP) 不含索引多表單點寫入》《HTAP資料庫 PostgreSQL 場景與性能測試之 37 - (OLTP+OLAP) 含索引單表批量寫入》《HTAP資料庫 PostgreSQL 場景與性能測試之 36 - (OLTP+OLAP) 不含索引單表批量寫入》《HTAP資料庫 PostgreSQL 場景與性能測試之 35 - (OLTP+OLAP) 含索引單表單點寫入》《HTAP資料庫 PostgreSQL 場景與性能測試之 34 - (OLTP+OLAP) 不含索引單表單點寫入》

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#postgresql-94-on-centos-7x PostgreSQL 9.4 on CentOS 7.x

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#postgresql-94-logged-table-%E6%B5%8B%E8%AF%95%E7%BB%93%E6%9E%9C PostgreSQL 9.4 logged table 測試結果

PostgreSQL 9.4，當批量導入的TABLE加了trigger，并且trigger function裡面有query處理時，很卡很卡，資料庫幾乎不可用。

卡在哪裡？

Samples: 655K of event 'cpu-clock', Event count (approx.): 143038981880  
Overhead  Shared Object          Symbol                                  
  76.93%  postgres               [.] s_lock                              
   3.60%  postgres               [.] LWLockAcquire                       
   3.34%  postgres               [.] LWLockRelease                       
   1.55%  [kernel]               [k] run_timer_softirq                   
   0.84%  postgres               [.] GetSnapshotData                     
   0.73%  postgres               [.] AllocSetAlloc                       
   0.64%  postgres               [.] PushActiveSnapshot                  
   0.59%  [kernel]               [k] __do_softirq                        
   0.54%  [kernel]               [k] _raw_spin_unlock_irqrestore         
   0.40%  [kernel]               [k] finish_task_switch                  
   0.35%  libc-2.17.so           [.] __GI_____strtod_l_internal          
   0.32%  [kernel]               [k] rcu_process_callbacks               
   0.26%  postgres               [.] ExecMakeFunctionResultNoSets        
   0.25%  libc-2.17.so           [.] __memcpy_ssse3_back                 
   0.24%  postgres               [.] palloc                              
   0.21%  plpgsql.so             [.] exec_eval_expr                      
   0.21%  [kernel]               [k] tick_nohz_idle_exit

lwlockacquire到release的過程可能過長。

PostgreSQL 10在各方面都有優化，比如典型的GIN索引場景，9.4在高并發更新下面也是存在性能問題。

《PostgreSQL 10 GIN索引鎖優化》

建議本文提到的場景，不要使用9.4的版本。(并發控制到8以下，s_lock問題才不是那麼明顯)，以下是并發8的測試結果

下面測試隻寫入1400萬，耗時乘以7，可以對比postgresql 10


8	1400萬	21 秒 , 147 秒
		210 秒 , 1470 秒
		206 秒 , 1442 秒

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E5%85%B6%E4%BB%96 其他

可以随時殺掉導入程序

select pg_terminate_backend(pid) from pg_stat_activity where application_name='digoal_loader';

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E5%8F%82%E8%80%83 參考

《PostgreSQL 批量、單步寫入 - row, statement 觸發器(中間表)、CTE 幾種用法性能對比》《PostgreSQL 自動建立分區實踐 - 寫入觸發器》《PostgreSQL Oracle 相容性之 - ALTER TRIGGER ENABLE|DISABLE》《PostgreSQL rotate table 自動清理排程 - 限制，觸發器》《PostgreSQL 事件觸發器應用 - DDL審計記錄 + 異步通知(notify)》《資料入庫實時轉換 - trigger , rule》《(流式、lambda、觸發器)實時處理大比拼 - 物聯網(IoT)\金融,時序處理最佳實踐》《快速入門PostgreSQL應用開發與管理 - 7 函數、存儲過程和觸發器》《PostgreSQL 10.0 preview 功能增強 - 觸發器函數内置中間表》《PostgreSQL 安全陷阱 - 利用觸發器或規則，結合security invoker函數制造反噬陷阱》《use PostgreSQL trigger manage stock & offer infomation》《PostgreSQL trigger/rule based replication configure, DISABLE/ENABLE [ REPLICA | ALWAYS ] TRIGGER | RULE》《PostgreSQL Oracle 相容性之 - 事件觸發器實作類似Oracle的資源回收筒功能》《PostgreSQL 觸發器應用 - use trigger audit record which column modified, insert, delete.》《use event trigger function record user who alter table's SQL》《PostgreSQL 事件觸發器 - DDL審計 , DDL邏輯複制 , 打造DDL統一管理入》《PostgreSQL 觸發器應用 - (觸發器WHEN)前置條件過濾跟蹤目标記錄》《PostgreSQL 閃回 - flash back query emulate by trigger》《PostgreSQL 事件觸發器 - PostgreSQL 9.3 Event Trigger》《表級複制(base on trigger) -- PostgreSQL general sync and async multi-master replication trigger function》《PostgreSQL 觸發器用法詳解 2》《PostgreSQL 觸發器用法詳解 1》《遞歸優化CASE - performance tuning case :use cursor\trigger\recursive replace (group by and order by) REDUCE needed blockes scan》《PostgreSQL general public partition table trigger》《表級複制(base on trigger) -- multi master replication & performance tuning》《表級複制(base on trigger) -- one(rw) to many(ro|rw)》《PostgreSQL 跟蹤DDL時間 - cann't use pg_class's trigger trace user_table's create,modify,delete Time》

PostgreSQL 批量SQL before/after for each row trigger的觸發時機、性能差異分析、建議 - 含9.4 , 10版本

标簽

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E8%83%8C%E6%99%AF 背景

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E6%B5%8B%E8%AF%95 測試

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E6%89%B9%E9%87%8F%E5%B9%B6%E5%8F%91%E6%95%B0%E6%8D%AE%E5%86%99%E5%85%A5%E6%80%A7%E8%83%BD%E5%AF%B9%E6%AF%94before-after-no-trigger 批量，并發資料寫入性能對比(before, after, no trigger)

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#after-trigger-for-each-row after trigger for each row

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#before-trigger-for-each-row before trigger for each row

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E6%97%A0trigger%E5%AF%BC%E5%85%A5%E9%80%9F%E5%BA%A6 無trigger導入速度：

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E6%80%A7%E8%83%BD%E5%AF%B9%E6%AF%94 性能對比

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#postgresql-10-on-centos-7x PostgreSQL 10 on CentOS 7.x

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#postgresql-10-logged-table-%E6%B5%8B%E8%AF%95%E7%BB%93%E6%9E%9C PostgreSQL 10 logged table 測試結果

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#postgresql-10-unlogged-table-%E6%B5%8B%E8%AF%95%E7%BB%93%E6%9E%9C PostgreSQL 10 unlogged table 測試結果

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#postgresql-94-on-centos-7x PostgreSQL 9.4 on CentOS 7.x

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#postgresql-94-logged-table-%E6%B5%8B%E8%AF%95%E7%BB%93%E6%9E%9C PostgreSQL 9.4 logged table 測試結果

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E5%85%B6%E4%BB%96 其他

https://github.com/digoal/blog/blob/master/201807/20180721_01.md#%E5%8F%82%E8%80%83 參考

繼續閱讀

set define off關閉替代變量功能

報錯：'mysql' 不是内部或外部指令，也不是可運作的程式或批處理檔案。

Linxu常用指令技巧彙總

ERROR 1 (HY000): Can't create/write to file '/tmp/#sql_4188_1.MYI' (Errcode: 28)

艱難安裝LDAP,SSL認證

《Linux指令行與Shell腳本程式設計大全第2版.布盧姆》pdf

MySQL的4種隔離級别？出現問題

XX系統實施過程問題總結

無元件上傳圖檔到資料庫中，最完整解決方案

【MySQL資料庫】資料庫索引事務1.索引2.事務

neo4j之cypher使用文檔

NOSQL安全攻擊

mybatis_入門程式Mybatis入門

登入plsql 報錯 the account is locked --使用者被鎖

sqlServer根據經緯查距離

SequoiaDB巨杉資料庫C++驅動概述