标簽
PostgreSQL , 鎖等待 , ddl , 大鎖 , 雪崩 , lock_timeout , deadlock_timeout , AB表切換 , 分區表
https://github.com/digoal/blog/blob/master/201806/20180622_02.md#%E8%83%8C%E6%99%AF 背景
當SQL請求鎖等待超過deadlock_timeout指定的時間時,報類似如下日志:
LOG: process xxx1 acquired RowExclusiveLock on relation xxx2 of database xxx3 after xxx4 ms at xxx
STATEMENT: INSERT ...........
解釋:
xxx1程序請求位于資料庫xxx3中的xxx2對象的RowExclusiveLock鎖,已等待xxx4秒。
同學們可能會納悶,怎麼insert也會等待?
其實不管什麼操作,都有可能出現等待,隻要請求的鎖與已有或已經在隊列中的LOCK級别有沖突就會出現等待。

src/include/storage/lockdefs.h
/* NoLock is not a lock mode, but a flag value meaning "don't get a lock" */
#define NoLock 0
#define AccessShareLock 1 /* SELECT */
#define RowShareLock 2 /* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock 3 /* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4 /* VACUUM (non-FULL),ANALYZE, CREATE INDEX
* CONCURRENTLY */
#define ShareLock 5 /* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock 6 /* like EXCLUSIVE MODE, but allows ROW
* SHARE */
#define ExclusiveLock 7 /* blocks ROW SHARE/SELECT...FOR UPDATE */
#define AccessExclusiveLock 8 /* ALTER TABLE, DROP TABLE, VACUUM FULL,
* and unqualified LOCK TABLE */
https://github.com/digoal/blog/blob/master/201806/20180622_02.md#%E5%A6%82%E4%BD%95%E5%88%86%E6%9E%90---%E5%AE%9E%E4%BE%8B%E8%AE%B2%E8%A7%A3 如何分析? - 執行個體講解
請參考萬能文章:
《PostgreSQL 鎖等待監控 珍藏級SQL - 誰堵塞了誰》1、開啟審計日志
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_statement = 'all'
2、psql 挂一個列印鎖等待的視窗
psql
with
t_wait as
(
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath,
b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted
),
t_run as
(
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath,
b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted
),
t_overlap as
(
select r.* from t_wait w join t_run r on
(
r.locktype is not distinct from w.locktype and
r.database is not distinct from w.database and
r.relation is not distinct from w.relation and
r.page is not distinct from w.page and
r.tuple is not distinct from w.tuple and
r.virtualxid is not distinct from w.virtualxid and
r.transactionid is not distinct from w.transactionid and
r.classid is not distinct from w.classid and
r.objid is not distinct from w.objid and
r.objsubid is not distinct from w.objsubid and
r.pid <> w.pid
)
),
t_unionall as
(
select r.* from t_overlap r
union all
select w.* from t_wait w
)
select locktype,datname,relation::regclass,page,tuple,virtualxid,transactionid::text,classid::regclass,objid,objsubid,
string_agg(
'Pid: '||case when pid is null then 'NULL' else pid::text end||chr(10)||
'Lock_Granted: '||case when granted is null then 'NULL' else granted::text end||' , Mode: '||case when mode is null then 'NULL' else mode::text end||' , FastPath: '||case when fastpath is null then 'NULL' else fastpath::text end||' , VirtualTransaction: '||case when virtualtransaction is null then 'NULL' else virtualtransaction::text end||' , Session_State: '||case when state is null then 'NULL' else state::text end||chr(10)||
'Username: '||case when usename is null then 'NULL' else usename::text end||' , Database: '||case when datname is null then 'NULL' else datname::text end||' , Client_Addr: '||case when client_addr is null then 'NULL' else client_addr::text end||' , Client_Port: '||case when client_port is null then 'NULL' else client_port::text end||' , Application_Name: '||case when application_name is null then 'NULL' else application_name::text end||chr(10)||
'Xact_Start: '||case when xact_start is null then 'NULL' else xact_start::text end||' , Query_Start: '||case when query_start is null then 'NULL' else query_start::text end||' , Xact_Elapse: '||case when (now()-xact_start) is null then 'NULL' else (now()-xact_start)::text end||' , Query_Elapse: '||case when (now()-query_start) is null then 'NULL' else (now()-query_start)::text end||chr(10)||
'SQL (Current SQL in Transaction): '||chr(10)||
case when query is null then 'NULL' else query::text end,
chr(10)||'--------'||chr(10)
order by
( case mode
when 'INVALID' then 0
when 'AccessShareLock' then 1
when 'RowShareLock' then 2
when 'RowExclusiveLock' then 3
when 'ShareUpdateExclusiveLock' then 4
when 'ShareLock' then 5
when 'ShareRowExclusiveLock' then 6
when 'ExclusiveLock' then 7
when 'AccessExclusiveLock' then 8
else 0
end ) desc,
(case when granted then 0 else 1 end)
) as lock_conflict
from t_unionall
group by
locktype,datname,relation,page,tuple,virtualxid,transactionid::text,classid,objid,objsubid ;
\watch 0.2
3、tail 挂一個日志觀測視窗
for ((i=1;i>0;i=1)); do grep RowExclusiveLock *.csv ; sleep 0.2; done
或
for ((i=1;i>0;i=1)); do grep acquired *.csv ; sleep 0.2; done
4、發現問題
在業務運作過程中,如果問題複現,一定能觀測到日志。
通過鎖等待的視窗,觀測到其中一個會話對表xxx持有了
accessExclusiveLock
,也就是排他鎖。通過前面的鎖沖突表,你會發現這個鎖和所有鎖都沖突(實際上
DDL,VACUUM FULL
等操作都會持有排它鎖,或者人為的發出
lock table xxx in access exclusive mode;
)
是以,當然會堵塞其他對該表的INSERT操作了。
4.1、根據鎖等待的視窗拿到持有鎖的PID,到審計日志裡面檢視這個PID在目前事務中,前面都發起了什麼SQL。
4.2、找到問題根源,原來這個事務發起了
ALTER TABLE XXX RENAME TO XXXXX;
的動作。
這個事件也引起了INSERT變慢(實際上是鎖等待,實踐都花在了等待上面)的問題。
很顯然,業務上應該經常會有一些觸發改表名的動作,比如為了防止一個表太大,經常做AB表切換的動作。
雖然切換表名隻是修改中繼資料,但是這個瞬間的鎖,在高并發的業務場景中,也會帶來堵塞危害。
建議使用者采用分區表,而不要在高并發業務中頻繁使用DDL這樣的大鎖高危操作。
https://github.com/digoal/blog/blob/master/201806/20180622_02.md#%E5%B0%8F%E7%BB%93 小結
本文講解了鎖等待問題的排查方法。原因實際上是使用者在業務中使用了AB表切換,雖然切換表名隻是修改中繼資料,但是這個瞬間的鎖,在高并發的業務場景中,也會帶來堵塞危害。
《PostgreSQL 9.x, 10, 11 hash分區表 用法舉例》 《PostgreSQL 11 preview - 分區表 增強 彙總》 《PostgreSQL 查詢涉及分區表過多導緻的性能問題 - 性能診斷與優化(大量BIND, spin lock, SLEEP程序)》 《PostgreSQL 商用版本EPAS(阿裡雲ppas(Oracle 相容版)) - 分區表性能優化 (堪比pg_pathman)》 《PostgreSQL 傳統 hash 分區方法和性能》 《PostgreSQL 10 内置分區 vs pg_pathman perf profiling》 《PostgreSQL 10.0 preview 功能增強 - 内置分區表》 《PostgreSQL 9.5+ 高效分區表實作 - pg_pathman》另外。大鎖操作,一定要注意防止雪崩。
最後,遇到問題要冷靜思考,不要輕易認為資料庫有問題。