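[Rediscovering the Beauty of PostgreSQL] - 10 Involution & Yu the Great Taming the Flood

Background

Scenario:

Involution, demand outstripping supply (hailing a ride at rush hour, e-commerce flash sales), hot-row updates

Social phenomenon: limited resources facing unlimited demand (train tickets during the Spring Festival travel rush, students signing up for cram schools, turf wars over resources inside companies, and so on)

Challenge:

When a hot row appears in the system, a large number of concurrent requests update the same row. Because the finest-grained lock in the database is the row lock, these concurrent requests can only execute serially.

While one session is updating, all the other sessions wait. This can exhaust the available connections, so that new sessions cannot connect, which triggers an avalanche.

If the flash-sale item has only 10 units in stock, then only 10 requests can actually complete a purchase. All the other waiting sessions do useless work and waste a large number of connections and a lot of waiting time.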
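To put a rough number on the wasted waiting described above, here is a minimal back-of-the-envelope sketch. It assumes requests queue on one row lock and are served strictly one at a time; the function name `serialized_wait`, the 1000 requests, and the 1 ms service time are illustrative assumptions, not measurements from this article.

```python
def serialized_wait(requests: int, stock: int, service_ms: float) -> dict:
    """Model requests that queue on a single row lock and run one at a time."""
    # Only as many requests as there are units in stock can complete a purchase.
    successful = min(requests, stock)
    # The k-th request in the queue (0-based) waits for the k requests ahead of it.
    total_wait_ms = sum(k * service_ms for k in range(requests))
    # Requests queued behind the last successful one wait for nothing.
    wasted_wait_ms = sum(k * service_ms for k in range(successful, requests))
    return {
        "successful": successful,
        "wasted": requests - successful,
        "total_wait_ms": total_wait_ms,
        "wasted_wait_ms": wasted_wait_ms,
    }


# Hypothetical flash sale: 1000 requests, 10 units in stock, 1 ms per update.
result = serialized_wait(requests=1000, stock=10, service_ms=1.0)
print(result)
```

Under these assumptions almost all of the accumulated waiting belongs to requests that could never have succeeded, which is the waste the two techniques below try to remove.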

PG solution:

Yu the Great taming the flood (channel the traffic, eliminate useless waiting):

  • SKIP LOCKED
  • advisory lock

Example

Test table with one hot row and a stock of 10 million.

create table a (
id int primary key ,  -- product ID
cnt int  ,  -- stock
ts timestamp  -- last modified time
);
insert into a values (1, 10000000, now());

Deduct stock and return the updated row

postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;
 id |   cnt   |             ts
----+---------+----------------------------
  1 | 9999993 | 2021-06-01 14:41:14.775177
(1 row)
UPDATE 1
postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;
 id |   cnt   |             ts
----+---------+----------------------------
  1 | 9999992 | 2021-06-01 14:41:17.747961
(1 row)
UPDATE 1

Concurrency test

1. Traditional method

update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;      
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 12 -j 12 -T 120      
pgbench (PostgreSQL) 14.0      
transaction type: ./test.sql      
scaling factor: 1      
query mode: prepared      
number of clients: 12      
number of threads: 12      
duration: 120 s      
number of transactions actually processed: 2301279      
latency average = 0.625 ms      
latency stddev = 0.562 ms      
initial connection time = 8.466 ms      
tps = 19177.578464 (without initial connection time)      
statement latencies in milliseconds:      
0.625  update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;      
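As a sanity check on the pgbench report above: when every client issues its next transaction as soon as the previous one returns, throughput is roughly the number of clients divided by the average latency. The sketch below applies that relation to the reported figures; the helper name `expected_tps` is an illustrative assumption.

```python
def expected_tps(clients: int, latency_ms: float) -> float:
    """Closed-loop throughput estimate: clients divided by average latency in seconds."""
    return clients / (latency_ms / 1000.0)


# Figures taken from the report above: 12 clients, 0.625 ms average latency.
tps = expected_tps(clients=12, latency_ms=0.625)
print(round(tps))
```

The estimate lands close to the reported 19177 tps, which suggests that here the row-lock wait is counted inside the per-statement latency rather than showing up as a separate cost.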

2. skip locked: skip rows that are already locked

update a set cnt=cnt-1 , ts=clock_timestamp() where
ctid =
(select ctid from a where id=1 and cnt>=1 for update skip locked)
returning *;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Update on a  (cost=2.36..3.48 rows=1 width=18)
   InitPlan 1 (returns $1)
     ->  LockRows  (cost=0.12..2.36 rows=1 width=12)
           ->  Index Scan using a_pkey on a a_1  (cost=0.12..2.35 rows=1 width=12)
                 Index Cond: (id = 1)
                 Filter: (cnt >= 1)
   ->  Tid Scan on a  (cost=0.00..1.12 rows=1 width=18)
         TID Cond: (ctid = $1)
(8 rows)
pgbench (PostgreSQL) 14.0      
transaction type: ./test.sql      
scaling factor: 1      
query mode: prepared      
number of clients: 12      
number of threads: 12      
duration: 120 s      
number of transactions actually processed: 7165617      
latency average = 0.201 ms      
latency stddev = 0.150 ms      
initial connection time = 11.126 ms      
tps = 59717.700525 (without initial connection time)      
statement latencies in milliseconds:      
0.202  update a set cnt=cnt-1 , ts=clock_timestamp() where      
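The control flow of the skip locked statement can be mimicked with a small in-memory model. This is only a sketch under simplifying assumptions: the `rows` dict stands in for the table, the `locked` set stands in for row locks held by other sessions, and the row id stands in for ctid. The function names are hypothetical.

```python
from typing import Optional


def select_skip_locked(rows: dict, locked: set, row_id: int) -> Optional[int]:
    """Mimic: select ctid from a where id=? and cnt>=1 for update skip locked."""
    row = rows.get(row_id)
    if row is None or row["cnt"] < 1:
        return None  # no matching row, or no stock left
    if row_id in locked:
        return None  # row is locked by another session: skip it instead of waiting
    return row_id


def deduct(rows: dict, locked: set, row_id: int) -> int:
    """Mimic the outer UPDATE; returns the number of rows updated."""
    target = select_skip_locked(rows, locked, row_id)
    if target is None:
        return 0  # the subquery yields no row, so the ctid comparison matches nothing
    rows[target]["cnt"] -= 1
    return 1


rows = {1: {"cnt": 10}}
free = deduct(rows, locked=set(), row_id=1)  # nobody holds the row
busy = deduct(rows, locked={1}, row_id=1)    # another session holds the row
print(free, busy, rows[1]["cnt"])
```

The point of the model is that a contended request returns immediately with zero rows instead of queueing, so the client can retry or give up without tying up a connection.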

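3. advisory lock: eliminate row-lock waiting entirely

update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;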

                              QUERY PLAN
-----------------------------------------------------------------------
 Update on a  (cost=0.12..2.36 rows=1 width=18)
   ->  Index Scan using a_pkey on a  (cost=0.12..2.36 rows=1 width=18)
         Index Cond: (id = 1)
         Filter: pg_try_advisory_xact_lock((id)::bigint)
(4 rows)
postgres=# begin;
BEGIN
postgres=*# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;
 id |   cnt   |             ts
----+---------+----------------------------
  1 | 6839129 | 2021-06-01 14:47:54.232782
(1 row)
UPDATE 1
Another session probes the advisory lock for the same product ID. Because it cannot acquire the lock, it performs no update.
postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;
 id | cnt | ts
----+-----+----
(0 rows)
UPDATE 0
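The two-session behaviour above can be imitated with non-blocking locks keyed by product ID. This is only an analogy, assuming one `threading.Lock` per key in place of the server's advisory lock table and a list of held locks in place of a transaction. Unlike this model, real advisory locks can be re-acquired by the session that already holds them, so it covers the cross-session case only. All names are hypothetical.

```python
import threading
from collections import defaultdict

# One lock per key, standing in for the server-side advisory lock table.
advisory_locks = defaultdict(threading.Lock)


def try_update(stock: dict, item_id: int, held: list) -> int:
    """Mimic: update ... where id=? and pg_try_advisory_xact_lock(id)."""
    lock = advisory_locks[item_id]
    if not lock.acquire(blocking=False):
        return 0  # lock is held by another session: return at once, zero rows updated
    held.append(lock)  # a transaction-level lock is kept until the transaction ends
    stock[item_id] -= 1
    return 1


def commit(held: list) -> None:
    """End the transaction and release every lock it acquired."""
    for lock in held:
        lock.release()
    held.clear()


stock = {1: 10}
session_a, session_b = [], []
first = try_update(stock, 1, session_a)   # session A gets the lock and updates
second = try_update(stock, 1, session_b)  # session B finds the lock busy
commit(session_a)                         # session A commits, lock is released
third = try_update(stock, 1, session_b)   # session B now succeeds
commit(session_b)
print(first, second, third, stock[1])
```

A session that fails the probe never reaches the row at all, which is why this variant avoids row-lock queueing.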
transaction type: ./test.sql      
scaling factor: 1      
query mode: prepared      
number of clients: 12      
number of threads: 12      
duration: 120 s      
number of transactions actually processed: 10701637      
latency average = 0.134 ms      
latency stddev = 0.705 ms      
initial connection time = 10.577 ms      
tps = 89184.703653 (without initial connection time)      
statement latencies in milliseconds:      
0.136  update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;      

TPS improvement

12 concurrent clients:

19177 (traditional method) -> 59717 (skip locked) -> 89184 (advisory lock)

800 concurrent clients:

374 (traditional method) -> 34495 (skip locked) -> 70444 (advisory lock)
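The figures above are easier to compare as ratios. The short sketch below computes, from the reported TPS values only, the speedup of each method over the traditional one and how much throughput each method retains when going from 12 to 800 clients. The helper names are illustrative.

```python
# TPS values as reported above, keyed by number of concurrent clients.
tps = {
    12: {"traditional": 19177, "skip locked": 59717, "advisory lock": 89184},
    800: {"traditional": 374, "skip locked": 34495, "advisory lock": 70444},
}


def speedups(results: dict) -> dict:
    """Speedup of each method relative to the traditional method."""
    base = results["traditional"]
    return {method: round(value / base, 2) for method, value in results.items()}


def retained_percent(low: dict, high: dict) -> dict:
    """Share of low-concurrency TPS that survives at high concurrency, in percent."""
    return {method: round(100 * high[method] / low[method], 2) for method in low}


speedup_12 = speedups(tps[12])
speedup_800 = speedups(tps[800])
retained = retained_percent(tps[12], tps[800])
print(speedup_12)
print(speedup_800)
print(retained)
```

Read this way, the gap is moderate at 12 clients and becomes far larger at 800 clients, where the traditional method keeps only a small fraction of its low-concurrency throughput.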

Key points

1. skip locked

https://www.postgresql.org/docs/14/sql-select.html

2. advisory lock (database -> session | xact level)

https://www.postgresql.org/docs/14/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
https://www.postgresql.org/docs/14/explicit-locking.html#ADVISORY-LOCKS

3. tid scan

https://www.postgresql.org/docs/14/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE

4. ctid

https://www.postgresql.org/docs/14/ddl-system-columns.html

5. update/delete returning

https://www.postgresql.org/docs/14/dml-returning.html

201801/20180105_03.md  "Four Flash-Sale Methods in PostgreSQL - Adding a Batched, Streaming Stock Increment/Decrement Method"
201711/20171107_31.md  "HTAP Database PostgreSQL Scenarios and Performance Tests, Part 30 - (OLTP) Flash Sale - Highly Concurrent Single-Row Updates"
201611/20161117_01.md  "A Chat About the Technology Behind Double 11 - A Different Flash-Sale Technique, the Bare Flash Sale"
201509/20150914_01.md  "Optimizing Flash-Sale Scenarios in PostgreSQL"