有些業務中需要求解最短路徑,PostgreSQL中有個pgrouting插件内置了和計算最短路徑相關的算法。
下面看下示例
表定義
postgres=# \d testpath
Table "public.testpath"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
source | integer | | |
target | integer | | |
cost | integer | | |
這是張業務表,每一行代表一條邊及其代價,總共1000多條記錄(實際對應的是按業務條件篩選後的結果集大小)。其餘業務相關的屬性全部隐去。
求解2點間的最短路徑
postgres=# SELECT * FROM pgr_dijkstra(
'SELECT id,source,target,cost FROM testpath',
10524, 10379, directed:=true);
seq | path_seq | node | edge | cost | agg_cost
-----+----------+-------+---------+------+----------
1 | 1 | 10524 | 1971852 | 1 | 0
2 | 2 | 7952 | 32256 | 1 | 1
3 | 3 | 7622 | 76615 | 2 | 2
4 | 4 | 44964 | 76616 | 1 | 4
5 | 5 | 7861 | 19582 | 1 | 5
6 | 6 | 7629 | 14948 | 2 | 6
7 | 7 | 17135 | 14949 | 1 | 8
8 | 8 | 10379 | -1 | 0 | 9
(8 rows)
Time: 22.979 ms
求解2點間最短的N條路徑
postgres=# SELECT * FROM pgr_ksp(
'SELECT id,source,target,cost FROM testpath',
10524, 10379, 1000,directed:=true);
seq | path_id | path_seq | node | edge | cost | agg_cost
-----+---------+----------+-------+---------+------+----------
1 | 1 | 1 | 10524 | 1971852 | 1 | 0
2 | 1 | 2 | 7952 | 32256 | 1 | 1
3 | 1 | 3 | 7622 | 54740 | 2 | 2
4 | 1 | 4 | 35389 | 54741 | 1 | 4
5 | 1 | 5 | 7861 | 19582 | 1 | 5
6 | 1 | 6 | 7629 | 14948 | 2 | 6
7 | 1 | 7 | 17135 | 14949 | 1 | 8
8 | 1 | 8 | 10379 | -1 | 0 | 9
...(略)
100 | 12 | 4 | 53179 | 95137 | 1 | 4
101 | 12 | 5 | 7625 | 90682 | 2 | 5
102 | 12 | 6 | 51211 | 90683 | 1 | 7
103 | 12 | 7 | 7861 | 19582 | 1 | 8
104 | 12 | 8 | 7629 | 1173911 | 2 | 9
105 | 12 | 9 | 59579 | 1173917 | 1 | 11
106 | 12 | 10 | 10379 | -1 | 0 | 12
(106 rows)
Time: 201.223 ms
純SQL求解最短路徑
前面的最短路徑是通過pgrouting插件計算的,能不能單純利用PG自身的SQL完成最短路徑的計算呢?
真實的業務場景下是可以限制路徑的長度的,比如,如果我們舍棄所有邊數大于7的路徑。
那麼完全可以用簡單的深度周遊計算最短路徑。計算速度還提高了5倍。
postgres=# WITH RECURSIVE line AS(
SELECT source,target,cost from testpath
),
path(fullpath,pathseq,node,total_cost) AS (
select ARRAY[10524],1,10524,0
UNION ALL
select array_append(fullpath,target),pathseq+1,target,total_cost+cost from path join line on(source=node) where node!=10379 and pathseq<=8
)
SELECT * FROM path where fullpath @> ARRAY[10379] order by total_cost limit 1;
fullpath | pathseq | node | total_cost
-----------------------------------------------+---------+-------+------------
{10524,7952,7622,80465,7861,7629,17135,10379} | 8 | 10379 | 9
(1 row)
Time: 4.334 ms
如果每條邊的cost相同,可以去掉上面的
order by total_cost
,在大資料集上性能會有很大的提升。
純SQL求解最短的N條路徑
沿用前面的SQL,隻是修改了一下
limit
值,相比pgrouting的
pgr_ksp
函數性能提升的更多。性能提升了50倍。
postgres=# WITH RECURSIVE line AS(
SELECT source,target,cost from testpath
),
path(fullpath,pathseq,node,total_cost) AS (
select ARRAY[10524],1,10524,0
UNION ALL
select array_append(fullpath,target),pathseq+1,target,total_cost+cost from path join line on(source=node) where node!=10379 and pathseq<=8
)
SELECT * FROM path where fullpath @> ARRAY[10379] order by total_cost limit 1000;
fullpath | pathseq | node | total_cost
----------------------------------------------------+---------+-------+------------
{10524,7952,7622,80465,7861,7629,17135,10379} | 8 | 10379 | 9
{10524,7952,7622,35389,7861,7629,17135,10379} | 8 | 10379 | 9
{10524,7952,7622,44964,7861,7629,17135,10379} | 8 | 10379 | 9
{10524,7952,7622,80465,7861,7629,59579,10379} | 8 | 10379 | 9
{10524,7952,7622,35389,7861,7629,59579,10379} | 8 | 10379 | 9
{10524,7952,7622,44964,7861,7629,59579,10379} | 8 | 10379 | 9
{10524,7952,7622,53179,7625,7861,7629,17135,10379} | 9 | 10379 | 10
{10524,7952,7622,53179,7625,7861,7629,59579,10379} | 9 | 10379 | 10
(8 rows)
Time: 4.425 ms
下面看下執行計劃
postgres=# explain analyze
WITH RECURSIVE line AS(
SELECT source,target,cost from testpath
),
path(fullpath,pathseq,node,total_cost) AS (
select ARRAY[10524],1,10524,0
UNION ALL
select array_append(fullpath,target),pathseq+1,target,total_cost+cost from path join line on(source=node) where node!=10379 and pathseq<=8
)
SELECT * FROM path where fullpath @> ARRAY[10379] order by total_cost limit 1000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=277.18..277.18 rows=1 width=44) (actual time=9.992..10.001 rows=8 loops=1)
CTE line
-> Seq Scan on testpath (cost=0.00..16.45 rows=1045 width=12) (actual time=0.017..0.624 rows=1045 loops=1)
CTE path
-> Recursive Union (cost=0.00..257.09 rows=161 width=44) (actual time=0.003..9.889 rows=42 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=44) (actual time=0.001..0.002 rows=1 loops=1)
-> Hash Join (cost=0.29..25.39 rows=16 width=44) (actual time=0.451..1.090 rows=5 loops=9)
Hash Cond: (line.source = path_1.node)
-> CTE Scan on line (cost=0.00..20.90 rows=1045 width=12) (actual time=0.003..0.678 rows=1045 loops=8)
-> Hash (cost=0.25..0.25 rows=3 width=44) (actual time=0.007..0.007 rows=3 loops=9)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> WorkTable Scan on path path_1 (cost=0.00..0.25 rows=3 width=44) (actual time=0.001..0.004 rows=3 loops=9)
Filter: ((node <> 10379) AND (pathseq <= 8))
Rows Removed by Filter: 1
-> Sort (cost=3.63..3.64 rows=1 width=44) (actual time=9.991..9.994 rows=8 loops=1)
Sort Key: path.total_cost
Sort Method: quicksort Memory: 26kB
-> CTE Scan on path (cost=0.00..3.62 rows=1 width=44) (actual time=7.851..9.979 rows=8 loops=1)
Filter: (fullpath @> '{10379}'::integer[])
Rows Removed by Filter: 34
Planning time: 0.234 ms
Execution time: 10.111 ms
(22 rows)
Time: 10.973 ms
大資料集的對比
以上測試的資料集比較小,隻有1000多個邊,如果在100w的資料集下,結果如何呢?
計算最短路徑 | 時間(秒) |
---|---|
pgr_dijkstra() | 52秒 |
遞歸CTE(最大深度2邊) | 2秒 |
遞歸CTE(最大深度3邊) | 5秒 |
遞歸CTE(最大深度4邊) | 105秒 |
遞歸CTE(最大深度5邊) | 算不出來,放棄 |
遞歸CTE(最大深度7邊,假設每個邊cost相等,不排序,結果最短路徑為3個邊) | 1.6秒 |
小結
簡單的深度周遊求解可以适用于小資料集或深度比較小的場景。
在滿足這些條件的場景下,效果還是不錯的。