PostgreSQL求解最短路徑

有些業務中需要求解最短路徑，PostgreSQL中有個pgrouting插件内置了和計算最短路徑相關的算法。

下面看下示例

表定義

postgres=# \d testpath
              Table "public.testpath"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 id     | integer |           |          | 
 source | integer |           |          | 
 target | integer |           |          | 
 cost   | integer |           |          |

這是張業務表，每一行代表一條邊及其代價，總共1000多條記錄（實際對應的是按業務條件篩選後的結果集大小）。其餘業務相關的屬性全部隐去。

求解2點間的最短路徑

postgres=# SELECT * FROM pgr_dijkstra(
  'SELECT id,source,target,cost FROM testpath',
  10524, 10379, directed:=true);
 seq | path_seq | node  |  edge   | cost | agg_cost 
-----+----------+-------+---------+------+----------
   1 |        1 | 10524 | 1971852 |    1 |        0
   2 |        2 |  7952 |   32256 |    1 |        1
   3 |        3 |  7622 |   76615 |    2 |        2
   4 |        4 | 44964 |   76616 |    1 |        4
   5 |        5 |  7861 |   19582 |    1 |        5
   6 |        6 |  7629 |   14948 |    2 |        6
   7 |        7 | 17135 |   14949 |    1 |        8
   8 |        8 | 10379 |      -1 |    0 |        9
(8 rows)

Time: 22.979 ms

求解2點間最短的N條路徑

postgres=# SELECT * FROM pgr_ksp(
  'SELECT id,source,target,cost FROM testpath',
  10524, 10379, 1000,directed:=true);
 seq | path_id | path_seq | node  |  edge   | cost | agg_cost 
-----+---------+----------+-------+---------+------+----------
   1 |       1 |        1 | 10524 | 1971852 |    1 |        0
   2 |       1 |        2 |  7952 |   32256 |    1 |        1
   3 |       1 |        3 |  7622 |   54740 |    2 |        2
   4 |       1 |        4 | 35389 |   54741 |    1 |        4
   5 |       1 |        5 |  7861 |   19582 |    1 |        5
   6 |       1 |        6 |  7629 |   14948 |    2 |        6
   7 |       1 |        7 | 17135 |   14949 |    1 |        8
   8 |       1 |        8 | 10379 |      -1 |    0 |        9
...(略)
 100 |      12 |        4 | 53179 |   95137 |    1 |        4
 101 |      12 |        5 |  7625 |   90682 |    2 |        5
 102 |      12 |        6 | 51211 |   90683 |    1 |        7
 103 |      12 |        7 |  7861 |   19582 |    1 |        8
 104 |      12 |        8 |  7629 | 1173911 |    2 |        9
 105 |      12 |        9 | 59579 | 1173917 |    1 |       11
 106 |      12 |       10 | 10379 |      -1 |    0 |       12
(106 rows)

Time: 201.223 ms

純SQL求解最短路徑

前面的最短路徑是通過pgrouting插件計算的，能不能單純利用PG自身的SQL完成最短路徑的計算呢？

真實的業務場景下是可以限制路徑的長度的，比如，如果我們舍棄所有邊數大于7的路徑。

那麼完全可以用簡單的深度周遊計算最短路徑。計算速度還提高了5倍。

postgres=# WITH RECURSIVE line AS(
SELECT source,target,cost from testpath
),
path(fullpath,pathseq,node,total_cost) AS (
    select ARRAY[10524],1,10524,0
  UNION ALL
    select array_append(fullpath,target),pathseq+1,target,total_cost+cost from path join line on(source=node) where node!=10379 and pathseq<=8
)
SELECT * FROM path where fullpath @> ARRAY[10379] order by total_cost limit 1;
                   fullpath                    | pathseq | node  | total_cost 
-----------------------------------------------+---------+-------+------------
 {10524,7952,7622,80465,7861,7629,17135,10379} |       8 | 10379 |          9
(1 row)

Time: 4.334 ms

如果每條邊的cost相同，可以去掉上面的

order by total_cost

，在大資料集上性能會有很大的提升。

純SQL求解最短的N條路徑

沿用前面的SQL，隻是修改了一下

limit

值，相比pgrouting的

pgr_ksp

函數性能提升的更多。性能提升了50倍。

postgres=# WITH RECURSIVE line AS(
SELECT source,target,cost from testpath
),
path(fullpath,pathseq,node,total_cost) AS (
    select ARRAY[10524],1,10524,0
  UNION ALL
    select array_append(fullpath,target),pathseq+1,target,total_cost+cost from path join line on(source=node) where node!=10379 and pathseq<=8
)
SELECT * FROM path where fullpath @> ARRAY[10379] order by total_cost limit 1000;
                      fullpath                      | pathseq | node  | total_cost 
----------------------------------------------------+---------+-------+------------
 {10524,7952,7622,80465,7861,7629,17135,10379}      |       8 | 10379 |          9
 {10524,7952,7622,35389,7861,7629,17135,10379}      |       8 | 10379 |          9
 {10524,7952,7622,44964,7861,7629,17135,10379}      |       8 | 10379 |          9
 {10524,7952,7622,80465,7861,7629,59579,10379}      |       8 | 10379 |          9
 {10524,7952,7622,35389,7861,7629,59579,10379}      |       8 | 10379 |          9
 {10524,7952,7622,44964,7861,7629,59579,10379}      |       8 | 10379 |          9
 {10524,7952,7622,53179,7625,7861,7629,17135,10379} |       9 | 10379 |         10
 {10524,7952,7622,53179,7625,7861,7629,59579,10379} |       9 | 10379 |         10
(8 rows)

Time: 4.425 ms

下面看下執行計劃

postgres=# explain analyze
WITH RECURSIVE line AS(
SELECT source,target,cost from testpath
),
path(fullpath,pathseq,node,total_cost) AS (
    select ARRAY[10524],1,10524,0
  UNION ALL
    select array_append(fullpath,target),pathseq+1,target,total_cost+cost from path join line on(source=node) where node!=10379 and pathseq<=8
)
SELECT * FROM path where fullpath @> ARRAY[10379] order by total_cost limit 1000;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=277.18..277.18 rows=1 width=44) (actual time=9.992..10.001 rows=8 loops=1)
   CTE line
     ->  Seq Scan on testpath  (cost=0.00..16.45 rows=1045 width=12) (actual time=0.017..0.624 rows=1045 loops=1)
   CTE path
     ->  Recursive Union  (cost=0.00..257.09 rows=161 width=44) (actual time=0.003..9.889 rows=42 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=44) (actual time=0.001..0.002 rows=1 loops=1)
           ->  Hash Join  (cost=0.29..25.39 rows=16 width=44) (actual time=0.451..1.090 rows=5 loops=9)
                 Hash Cond: (line.source = path_1.node)
                 ->  CTE Scan on line  (cost=0.00..20.90 rows=1045 width=12) (actual time=0.003..0.678 rows=1045 loops=8)
                 ->  Hash  (cost=0.25..0.25 rows=3 width=44) (actual time=0.007..0.007 rows=3 loops=9)
                       Buckets: 1024  Batches: 1  Memory Usage: 8kB
                       ->  WorkTable Scan on path path_1  (cost=0.00..0.25 rows=3 width=44) (actual time=0.001..0.004 rows=3 loops=9)
                             Filter: ((node <> 10379) AND (pathseq <= 8))
                             Rows Removed by Filter: 1
   ->  Sort  (cost=3.63..3.64 rows=1 width=44) (actual time=9.991..9.994 rows=8 loops=1)
         Sort Key: path.total_cost
         Sort Method: quicksort  Memory: 26kB
         ->  CTE Scan on path  (cost=0.00..3.62 rows=1 width=44) (actual time=7.851..9.979 rows=8 loops=1)
               Filter: (fullpath @> '{10379}'::integer[])
               Rows Removed by Filter: 34
 Planning time: 0.234 ms
 Execution time: 10.111 ms
(22 rows)

Time: 10.973 ms

大資料集的對比

以上測試的資料集比較小，隻有1000多個邊，如果在100w的資料集下，結果如何呢？

計算最短路徑	時間(秒)
pgr_dijkstra()	52秒
遞歸CTE(最大深度2邊)	2秒
遞歸CTE(最大深度3邊)	5秒
遞歸CTE(最大深度4邊)	105秒
遞歸CTE(最大深度5邊)	算不出來，放棄
遞歸CTE(最大深度7邊，假設每個邊cost相等，不排序，結果最短路徑為3個邊)	1.6秒

小結

簡單的深度周遊求解可以适用于小資料集或深度比較小的場景。

在滿足這些條件的場景下，效果還是不錯的。

PostgreSQL求解最短路徑

表定義

求解2點間的最短路徑

求解2點間最短的N條路徑

純SQL求解最短路徑

純SQL求解最短的N條路徑

大資料集的對比

小結

繼續閱讀

CSMA/CD1． CSMA/CD的概述2． CSMA 的工作原理3． CSMA/CD控制規程及特點4． CSMA/CD協定5． CSMA/CD的優點6．結束語

極大似然法(ML)與最大期望法(EM)

MapReduce的幾個企業級經典面試案例MapReduce的幾個企業級經典面試案例

9.spark Core 進階2--Cashe

C++ 第十五周報告1--《冒泡法排序》

淺談企業活動中進行資料分析的重要性

筆試面試題目：滑動視窗(二)

資料結構與算法（27）——排序（二）

neo4j之cypher使用文檔

Dijkstra--簡易版（最短路徑）

Ambari介紹和架構原理

GitHub連夜封殺！這份阿裡 10W 字内部 Java 字面試手冊到底有多強？

NOSQL安全攻擊

sqlServer根據經緯查距離

win10本地scala和spark安裝安裝scala安裝spark

hdu7108哈希