天天看點

【重新發現PostgreSQL之美】- 10 内卷 & 大禹治水

背景

場景:

内卷現象, 供不應求(高峰期打車、電商秒殺), 熱點資料更新

社會現象: 資源有限而需求無限的情況(春運時期的火車票、學生報補習班、企業裡面的資源地盤争奪等)

挑戰:

當系統中出現熱點row時, 意味着大量的并發請求更新同一行資料, 因為資料庫最小粒度的鎖為行鎖,是以這些并發請求隻能串行執行,

一個會話在更新的時候其他所有會話都處于等待狀态, 可能導緻連接配接打爆, 其他會話連不進來引起雪崩.

如果被秒殺的商品庫存隻有10個, 那麼實際上隻有10個請求能達成交易, 其他等待中的會話都屬于無用功.浪費大量的連接配接和等待時間.

PG 解決方案:

大禹治水(疏導、消滅無用等待):

  • SKIP LOCKED,
  • advisory lock

例子

測試表, 1條熱點記錄, 庫存1000萬.

id int primary key ,  -- 商品ID      
cnt int  ,  -- 庫存      
ts timestamp  -- 修改時間      
);      
insert into a values (1, 10000000, now());      

扣減庫存并傳回

id |   cnt   |             ts      
----+---------+----------------------------      
1 | 9999993 | 2021-06-01 14:41:14.775177      
(1 row)      
UPDATE 1      
postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;      
id |   cnt   |             ts      
----+---------+----------------------------      
1 | 9999992 | 2021-06-01 14:41:17.747961      
(1 row)      
UPDATE 1      

并發能力測試

1、傳統方法

update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;      
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 12 -j 12 -T 120      
pgbench (PostgreSQL) 14.0      
transaction type: ./test.sql      
scaling factor: 1      
query mode: prepared      
number of clients: 12      
number of threads: 12      
duration: 120 s      
number of transactions actually processed: 2301279      
latency average = 0.625 ms      
latency stddev = 0.562 ms      
initial connection time = 8.466 ms      
tps = 19177.578464 (without initial connection time)      
statement latencies in milliseconds:      
0.625  update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;      

2、skip locked 跳過被鎖的行

ctid =      
(select ctid from a where id=1 and cnt>=1 for update skip locked)      
returning *;      
QUERY PLAN      
-----------------------------------------------------------------------------------      
Update on a  (cost=2.36..3.48 rows=1 width=18)      
InitPlan 1 (returns $1)      
->  LockRows  (cost=0.12..2.36 rows=1 width=12)      
->  Index Scan using a_pkey on a a_1  (cost=0.12..2.35 rows=1 width=12)      
Index Cond: (id = 1)      
Filter: (cnt >= 1)      
->  Tid Scan on a  (cost=0.00..1.12 rows=1 width=18)      
TID Cond: (ctid = $1)      
(8 rows)      
pgbench (PostgreSQL) 14.0      
transaction type: ./test.sql      
scaling factor: 1      
query mode: prepared      
number of clients: 12      
number of threads: 12      
duration: 120 s      
number of transactions actually processed: 7165617      
latency average = 0.201 ms      
latency stddev = 0.150 ms      
initial connection time = 11.126 ms      
tps = 59717.700525 (without initial connection time)      
statement latencies in milliseconds:      
0.202  update a set cnt=cnt-1 , ts=clock_timestamp() where      

3、advisory lock, 徹底消除行鎖

QUERY PLAN      
-----------------------------------------------------------------------      
Update on a  (cost=0.12..2.36 rows=1 width=18)      
->  Index Scan using a_pkey on a  (cost=0.12..2.36 rows=1 width=18)      
Index Cond: (id = 1)      
Filter: pg_try_advisory_xact_lock((id)::bigint)      
(4 rows)      
postgres=# begin;      
BEGIN      
postgres=*# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;      
id |   cnt   |             ts      
----+---------+----------------------------      
1 | 6839129 | 2021-06-01 14:47:54.232782      
(1 row)      
UPDATE 1      
其他會話, 探測同一個商品ID的advisory鎖, 未擷取則不會進行更新      
postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;      
id | cnt | ts      
----+-----+----      
(0 rows)      
UPDATE 0      
transaction type: ./test.sql      
scaling factor: 1      
query mode: prepared      
number of clients: 12      
number of threads: 12      
duration: 120 s      
number of transactions actually processed: 10701637      
latency average = 0.134 ms      
latency stddev = 0.705 ms      
initial connection time = 10.577 ms      
tps = 89184.703653 (without initial connection time)      
statement latencies in milliseconds:      
0.136  update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;      

tps 性能提升

12個并發:

19177(傳統方法) -> 59717(skip locked) -> 89184(advisory lock)

800個并發:

374(傳統方法) -> 34495(skip locked) -> 70444(advisory lock)

知識點

1、skip locked

https://www.postgresql.org/docs/14/sql-select.html

2、advisory lock (database->session|xact level)

https://www.postgresql.org/docs/14/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS https://www.postgresql.org/docs/14/explicit-locking.html#ADVISORY-LOCKS

3、tid scan

https://www.postgresql.org/docs/14/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE

4、ctid

https://www.postgresql.org/docs/14/ddl-system-columns.html

5、update delete returning

https://www.postgresql.org/docs/14/dml-returning.html
201801/20180105_03.md  《PostgreSQL秒殺4種方法- 增加批量流式加減庫存方法》
201711/20171107_31.md  《HTAP資料庫PostgreSQL 場景與性能測試之30 - (OLTP) 秒殺- 高并發單點更新》
201611/20161117_01.md  《聊一聊雙十一背後的技術- 不一樣的秒殺技術, 裸秒》
201509/20150914_01.md  《PostgreSQL秒殺場景優化》