标簽
PostgreSQL , 空間類型 , 統計資訊 , 包含查詢 , 相交查詢 , BOUND , index , x , y
https://github.com/digoal/blog/blob/master/201807/20180711_02.md#%E8%83%8C%E6%99%AF 背景
PostgreSQL 通過幾何類型以及PostGIS插件,支援豐富的空間類型,空間資料類型包括KNN檢索,空間包含,空間相交,方位搜尋,空間計算等操作。
為了得到正确的執行計劃,在資料庫優化器中,評估滿足某個操作符條件的行數,是一個重要的CBO依據。
而評估選擇性又依賴統計資訊,同時依賴每種操作符的選擇性算法。
一些優化器相關文檔:
《PostgreSQL 自定義函數表達式選擇性評估算法 - Statistics, Cardinality, Selectivity, Estimate》 《PostgreSQL 多值列的選擇性 - Statistics, Cardinality, Selectivity, Estimate》 《PostgreSQL 11 preview - 表達式索引柱狀圖buckets\STATISTICS\default_statistics_target可設定》 《PostgreSQL 11 preview - 優化器 增強 彙總》 《PostgreSQL 11 preview - 強制auto prepared statment開關(自動化plan cache)(類似Oracle cursor_sharing force)》 《[未完待續] PostgreSQL PRO 特性 - AQO(機器學習執行計劃優化器)》 《PostgreSQL pg_stat_reset清除track_counts的隐患》 《PostgreSQL 統計資訊pg_statistic格式及導入導出dump_stat - 相容Oracle》 《優化器裡的機率學 - 性能抖動原理分析》 《懶人推動社會進步 - 多列聚合, gin與資料分布(選擇性)》 《資料庫優化器原理(含動态規劃、機器學習模組化優化器aqo) - 如何治療選擇綜合症》 《PostgreSQL 10.0 preview 功能增強 - SQL執行剩餘時間 - 垃圾回收過程可視pg_stat_progress_vacuum》 《PostgreSQL 10.0 preview 性能增強 - hash,nestloop join優化(聰明的優化器是這樣的)》 《PostgreSQL 10.0 preview 功能增強 - 動态視圖pg_stat_activity新增資料庫管理程序資訊》 《PostgreSQL 10.0 preview 功能增強 - 流複制統計視圖pg_stat_replication增強, 新增時間次元延遲評估》 《PostgreSQL 10.0 preview 優化器改進 - 不完整索引支援複合排序》 《聊一下PostgreSQL優化器 - in裡面有重複值時PostgreSQL如何處理?》 《官人要杯咖啡嗎? - PostgreSQL實時監測PLAN tree的執行進度 - pg_query_state》 《PostgreSQL pg_stat_ pg_statio_ 統計資訊(scan,read,fetch,hit)源碼解讀》 《PostgreSQL 9.6 并行計算 優化器算法淺析 - 以及如何強制并行度》 《PostgreSQL 9.6 并行計算 優化器算法淺析》 《關鍵時刻HINT出彩 - PG優化器的參數優化、執行計劃固化CASE》 《Greenplum ORCA 優化器的編譯安裝與使用》 《PostgreSQL prepared statement和simple query的profile及性能差異》 《PostgreSQL 優化器邏輯推理能力 源碼解析》 《PostgreSQL 操作符與優化器詳解 - 包含(選擇性、JOIN方法、等效)等内容》 《PostgreSQL pg_stat_replication sent_location, write_location, flush_location, replay_location的差别》 《為什麼PostgreSQL啟動後有個UDP監聽localhost - pgstat統計資訊程序間通信》 《pg_stat_statements fwrite not save》 《PostgreSQL 9.5 new feature - use FSM fast evaluate bloat and other table level statics(pgstattuple_approx)》 《PostgreSQL aggregate function 2 : Aggregate Functions for Statistics》 《優化器成本因子校對(disk,ssd,memory IO開銷精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》 《優化器成本因子校對 - PostgreSQL explain cost constants alignment to timestamp》 《PostgreSQL pg_stats used to estimate top N freps values and explain rows》 《執行計劃選擇算法 與 綁定變量 - PostgreSQL prepared statement: SPI_prepare, prepare|execute COMMAND, PL/pgsql STYLE: custom & generic plan cache》 《執行計劃次元統計\判斷執行計劃翻轉\統計每種執行計劃的統計資訊 use pg_stat_plans in PostgreSQL 9.0, 9.1 and 9.2》 《PostgreSQL 9.2 improve prepared statements plan's selectivity》 《PostgreSQL 9.2 add array elements statistics》 《WHY prepared Statement running slower in some situation CASE》 《PostgreSQL 優化器行評估算法》那麼,空間類型有合理的統計資訊嗎?
“目前空間資料類型,還沒有柱狀圖,高頻詞等内容”,是以給操作符提供的選擇性依據非常有限。
是以目前存在一個這樣的問題,選擇性函數内使用百分比寫死,在代碼中寫死了選擇性比例,那麼空間包含查詢(不管範圍多大,評估出來的記錄數是一樣的),是以基本上都會選擇空間索引掃描( 然而實際上當真實空間包含查詢包含了大量記錄時,全表掃描性能更好 )。
https://github.com/digoal/blog/blob/master/201807/20180711_02.md#%E4%BE%8B%E5%AD%90 例子
1、建表(pos存儲point資料),寫入1000萬記錄資料,建立空間索引
create extension postgis;
create table test (id int , info text, pos geometry);
insert into test select generate_series(1,10000000), 'test', st_setsrid(st_makepoint(110+random()*20-10, 70+random()*10-5), 4326);
create index idx_test_1 on test using gist(pos);
2、收集統計資訊
vacuum analyze test;
你會發現空間類型的統計資訊非常有限(沒有高頻詞,柱狀圖,相關性等内容):
postgres=# select * from pg_stats where tablename='test' and attname='pos';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------
public | test | pos | f | 0 | 32 | -1 | | | | | | |
(1 row)
3、檢視執行計劃,檢視評估出來的記錄數。
空間類型選擇性使用如下寫死0.001。
src/backend/utils/adt/geo_selfuncs.c
/*
* contsel -- How likely is a box to contain (be contained by) a given box?
*
* This is a tighter constraint than "overlap", so produce a smaller
* estimate than areasel does.
*/
Datum
contsel(PG_FUNCTION_ARGS)
{
PG_RETURN_FLOAT8(0.001);
}
不管輸入什麼空間條件,評估得到的記錄數都是千分之一:10000條
條件1:
postgres=# explain select * from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=149.95..12834.93 rows=3333 width=41)
Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)
Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)
-> Bitmap Index Scan on idx_test_1 (cost=0.00..149.12 rows=10000 width=0)
Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)
(5 rows)
條件2:
postgres=# explain select * from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=149.95..12834.93 rows=3333 width=41)
Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)
Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)
-> Bitmap Index Scan on idx_test_1 (cost=0.00..149.12 rows=10000 width=0)
Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)
(5 rows)
4、那麼實際上的記錄數是多少呢?
條件1:0條。
postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=12843.26..12843.27 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)
-> Bitmap Heap Scan on test (cost=149.95..12834.93 rows=3333 width=0) (actual time=0.012..0.012 rows=0 loops=1)
Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)
Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)
-> Bitmap Index Scan on idx_test_1 (cost=0.00..149.12 rows=10000 width=0) (actual time=0.010..0.010 rows=0 loops=1)
Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~
pos)
Planning time: 0.170 ms
Execution time: 0.061 ms
(8 rows)
條件2:5001107條。
postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=12843.26..12843.27 rows=1 width=8) (actual time=5057.822..5057.822 rows=1 loops=1)
-> Bitmap Heap Scan on test (cost=149.95..12834.93 rows=3333 width=0) (actual time=507.744..4537.564 rows=5001107 loops=1)
Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)
Rows Removed by Index Recheck: 1981986
Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry,
pos)
Heap Blocks: exact=50167 lossy=33167
-> Bitmap Index Scan on idx_test_1 (cost=0.00..149.12 rows=10000 width=0) (actual time=499.627..499.627 rows=5001107 loops=1)
Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~
pos)
Planning time: 0.179 ms
Execution time: 5057.882 ms
(10 rows)
5、空間類型的選擇性不準确,導緻的問題,不管什麼情況都走空間索引。
5.1 大範圍搜尋,使用索引反而更慢,如下。
postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=10539.13..10539.14 rows=1 width=8) (actual time=1079.698..1079.699 rows=1 loops=1)
-> Gather (cost=10539.10..10539.11 rows=8 width=8) (actual time=1079.655..1079.693 rows=9 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Partial Aggregate (cost=10539.10..10539.11 rows=1 width=8) (actual time=1049.984..1049.984 rows=1 loops=9)
-> Parallel Bitmap Heap Scan on test (cost=149.95..10538.06 rows=417 width=0) (actual time=483.733..991.774 rows=555679 loops=9)
Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geo metry ~ pos)
Rows Removed by Index Recheck: 220221
Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)
Heap Blocks: exact=5908 lossy=3936
-> Bitmap Index Scan on idx_test_1 (cost=0.00..149.12 rows=10000 width=0) (actual time=502.790..502.790 rows=5001107 loops=1)
Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)
Planning time: 0.171 ms
Execution time: 1087.890 ms
(14 rows)
postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=411456.73..411456.74 rows=1 width=8) (actual time=674.853..674.853 rows=1 loops=1)
-> Gather (cost=411456.70..411456.71 rows=8 width=8) (actual time=674.793..674.848 rows=9 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Partial Aggregate (cost=411456.70..411456.71 rows=1 width=8) (actual time=644.627..644.627 rows=1 loops=9)
-> Parallel Seq Scan on test (cost=0.00..411455.65 rows=417 width=0) (actual time=0.045..586.545 rows=555679 loops=9)
Filter: (('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos) AND _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos) )
Rows Removed by Filter: 555433
Planning time: 0.130 ms
Execution time: 683.011 ms
(10 rows)
https://github.com/digoal/blog/blob/master/201807/20180711_02.md#%E4%BC%98%E5%8C%96%E6%96%B9%E6%B3%95 優化方法
首先我們知道空間包含,實際上是通過類似r-tree來從大到小進行收斂的,索引結構如下:
《PostgreSQL 黑科技 - 空間聚集存儲, 内窺GIN, GiST, SP-GiST索引》 《通過空間思想了解GiST索引的構造》是以,對于“平面或3D空間點”類型,我們可以利用x,y,z來收集BOUND BOX的邊界統計資訊,使用表達式索引即可得到。
下面的方法适合POINT geometry類型。
1、對point類型,建立x,y表達式索引
create index idx_test_2 on test (st_x(pos));
create index idx_test_3 on test (st_y(pos));
vacuum analyze test;
3、現在,表達式索引的統計資訊有了(邊界,柱狀圖,高頻詞等).
postgres=# select * from pg_stats where tablename='idx_test_2' ;
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |
histogram_bounds
| correlation | most_common_elems | most_common_elem_freqs | e
lem_count_histogram
------------+------------+---------+-----------+-----------+-----------+------------+--------------------+-------------------+---------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+------------------------+--
--------------------
public | idx_test_2 | st_x | f | 0 | 8 | -0.978292 | {100.210125753656} | {6.66667e-05} | {100.001683048904,100.198292508721,100.394482128322,100.614011939615,100.833150474355,101.039264379069,101.233
114106581,101.457025809214,101.663421457633,101.865276163444,102.060301089659,102.248072810471,102.453578868881,102.656150059775,102.860928429291,103.057968365029,103.260449869558,103.457426037639,103.632223745808,103.85178135708,104.049
411769956,104.26340050064,104.474113518372,104.658453594893,104.85337683931,105.054327072576,105.277718435973,105.484521845356,105.703673372045,105.907790735364,106.115896645933,106.314396839589,106.519666947424,106.724081514403,106.9380
68913296,107.144253859296,107.326071513817,107.518919138238,107.721077715978,107.916374253109,108.123016441241,108.360300129279,108.551258808002,108.714984534308,108.903268156573,109.091228945181,109.277053307742,109.48098176159,109.6832
09387586,109.853478074074,110.044501451775,110.241627292708,110.45505293645,110.652585113421,110.851763477549,111.036027139053,111.231312695891,111.423934968188,111.620921371505,111.847442938015,112.031054906547,112.234003236517,112.4332
3944509,112.63793184422,112.834744984284,113.041293732822,113.23037719354,113.432592023164,113.619993180037,113.809020379558,114.014157289639,114.208164261654,114.409972019494,114.612494371831,114.827286116779,115.010247323662,115.208249
371499,115.397510435432,115.607831152156,115.814645215869,116.007154844701,116.198594048619,116.39090036042,116.600214680657,116.819634130225,117.008039858192,117.199446037412,117.389995325357,117.60263632983,117.81475706026,118.02150100
4696,118.219726895913,118.425922412425,118.623819025233,118.801201758906,118.997462140396,119.18357164599,119.394188923761,119.607428628951,119.789487998933,119.999485854059} | 0.0060246 | | |
(1 row)
postgres=# select * from pg_stats where tablename='idx_test_3' ;
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |
histogram_bounds
| correlation | most_common_elems | most_common_elem_freqs | elem_cou
nt_histogram
------------+------------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-----------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+------------------------+---------
-------------
public | idx_test_3 | st_y | f | 0 | 8 | -1 | | | {65.0000545242801,65.0990079157054,65.2072919439524,65.301208673045,65.3929488640279,65.4877291806042,65.5846159
020439,65.6880339607596,65.7856895681471,65.8818093780428,65.9888388821855,66.0896951518953,66.1984446132556,66.304198782891,66.3969474285841,66.496685273014,66.6056828852743,66.7012783605605,66.7941406555474,66.8967378046364,66.98895718
90235,67.0855156239122,67.191197341308,67.2994636883959,67.3969272850081,67.4947946099564,67.5993846775964,67.7099830284715,67.8120812214911,67.9088703868911,67.9981751134619,68.099527056329,68.1973828841001,68.2960116351023,68.398535256
274,68.4888928988948,68.5882083000615,68.6907122470438,68.800939405337,68.9010692993179,69.0081828692928,69.11032628268,69.19522870332,69.2957087606192,69.3950105085969,69.5086302934214,69.6040614042431,69.7045981185511,69.8040427314118,
69.906362099573,70.0155263161287,70.1146272942424,70.2022811910138,70.2994426619262,70.4040612280369,70.5112449126318,70.6194959674031,70.722281485796,70.822925157845,70.918908463791,71.0138623649254,71.1134587181732,71.2168908445165,71.
3034919975325,71.4052765490487,71.5016098646447,71.6014923620969,71.6971560986713,71.8042277451605,71.8970008520409,71.9877171749249,72.0928315026686,72.1963384188712,72.2967635607347,72.3947894759476,72.4977402808145,72.5967703945935,72
.6985206454992,72.7883456554264,72.885032473132,72.9875696543604,73.0857636081055,73.1845043040812,73.280454329215,73.3783990284428,73.4819683339447,73.5874625109136,73.6941803013906,73.7924668611959,73.8872256595641,73.9888873603195,74.
0806347271428,74.1819418873638,74.2877806117758,74.3848745618016,74.4918680656701,74.5988712925464,74.7014277381822,74.8038458405063,74.9038722459227,74.9999207118526} | -0.00307583 | | |
(1 row)
https://github.com/digoal/blog/blob/master/201807/20180711_02.md#%E7%8E%B0%E5%9C%A8%E6%88%91%E4%BB%AC%E6%94%B9%E4%B8%80%E4%B8%8Bsql%E6%8A%8Aboundbox%E6%94%BE%E8%BF%9Bsql 現在我們改一下SQL,把boundbox放進SQL
select count(*) from test
where
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);
改成:
select count(*) from test
where
st_x(pos)
between
st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))
and
st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))
and
st_y(pos)
between
st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))
and
st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))
and
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);
執行計劃如下:
在大範圍輸入時,通過表達式索引條件評估得到的行數比空間編碼大于空間評估(寫死 千分之一)得到的記錄數,是以依舊使用了空間索引。
postgres=# explain select count(*) from test
where
st_x(pos)
between
st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))
and
st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))
and
st_y(pos)
between
st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))
and
st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))
and
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=12380.60..12380.61 rows=1 width=8)
-> Index Scan using idx_test_1 on test (cost=0.42..12376.42 rows=1674 width=0)
Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)
Filter: ((st_x(pos) >= '110'::double precision) AND (st_x(pos) <= '120'::double precision) AND (st_y(pos) >= '1'::double precision) AND (st_y(pos) <= '100'::double precision) AND _st_contains('0103000020E610000001000000050000000 000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos))
(4 rows)
或
在小範圍輸入時,評估得到的行數比空間編碼更少(千分之一),是以使用了非空間索引。
explain select count(*) from test
where
st_x(pos)
between
st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))
and
st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))
and
st_y(pos)
between
st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))
and
st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))
and
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);
QUERY PLAN
----------------------------------------------------------------------------------------------
Aggregate (cost=2.92..2.93 rows=1 width=8)
-> Index Scan using idx_test_3 on test (cost=0.43..2.92 rows=1 width=0)
Index Cond: ((st_y(pos) >= '1'::double precision) AND (st_y(pos) <= '2'::double precision))
Filter: (('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos) AND
(st_x(pos) >= '110'::double precision) AND (st_x(pos) <= '112'::double precision) AND _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000
000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos))
(4 rows)
https://github.com/digoal/blog/blob/master/201807/20180711_02.md#%E4%B8%80%E4%BA%9B%E5%87%BD%E6%95%B0 一些函數
檢視geometry的bound box
postgres=# select st_astext(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) ));
st_astext
------------------------------------------
POLYGON((110 1,110 2,112 2,112 1,110 1))
(1 row)
postgres=# select st_astext(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));
st_astext
--------------------------------------------------------------------------------------------------------------------------------------------------------
POLYGON((220186.995121892 150406,220186.995121892 150506.126829327,220288.248780547 150506.126829327,220288.248780547 150406,220186.995121892 150406))
(1 row)
2、檢視BOUND BOX的邊界點x,y.
postgres=# select st_xmin(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));
st_xmin
------------------
220186.995121892
(1 row)
postgres=# select st_xmax(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));
st_xmax
------------------
220288.248780547
(1 row)
postgres=# select st_ymin(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));
st_ymin
---------
150406
(1 row)
postgres=# select st_ymax(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));
st_ymax
------------------
150506.126829327
(1 row)
https://github.com/digoal/blog/blob/master/201807/20180711_02.md#%E5%B0%8F%E7%BB%93 小結
目前空間類型的bound box沒有統計資訊,是以空間查詢時,選擇性使用了寫死寫死,固定了選擇性百分比,乘以pg_class.reltuples即可得到評估記錄數,但是不準确。
使得PostgreSQL的空間搜尋(包含搜尋),不管範圍多大,都使用索引,而實際上大範圍可能使用全表掃描性能更好。
本文增加boundbox的條件,使用xyz邊界條件,使得行數評估更加的準确,在适當的時候選擇适合的索引。
當然,最好的方法,還是PG核心層面優化空間類型的統計資訊,以及選擇性代碼的支援。
https://github.com/digoal/blog/blob/master/201807/20180711_02.md#%E5%8F%82%E8%80%83 參考
建議,PostgreSQL核心層面改進空間類型的統計資訊,點類型,分别統計X,Y軸的統計資訊。