标簽
PostgreSQL , 資料離散性 , 掃描性能 , 重複掃 , bitmap index scan , 排序掃描
https://github.com/digoal/blog/blob/master/201804/20180402_01.md#%E8%83%8C%E6%99%AF Background
PostgreSQL offers several ways to scan data; the common ones are:
1. Full-table sequential scan (seqscan)
2. Index scan with heap fetches (index scan)
3. Index-only scan (index only scan)
4. Bitmap scan (bitmap index scan + block-sorted heap scan)
So for the same table, returning the same set of rows, how does efficiency differ from one index to another?
The answer: it depends on the linear correlation between the index order and the physical storage order of the data. (PostgreSQL's bitmap scan exists precisely to address this problem.)
https://github.com/digoal/blog/blob/master/201804/20180402_01.md#%E4%BE%8B%E5%AD%90 Example
1. Build a data set of 10 million rows: one column is linearly correlated with storage order (time-series-like), the other is in random order.
postgres=# create table corr_test(c1 int, c2 int);
CREATE TABLE
postgres=# insert into corr_test select generate_series(1,10000000), random()*10000000;
INSERT 0 10000000
2. Create the indexes
postgres=# create index idx_corr_test_1 on corr_test (c1);
CREATE INDEX
postgres=# create index idx_corr_test_2 on corr_test (c2);
CREATE INDEX
3. The row counts are as follows
postgres=# select count(distinct c1), count(distinct c2) from corr_test;
-[ RECORD 1 ]---
count | 10000000
count | 6321273
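As an aside (not in the original post), the distinct count for c2 is close to what the classic occupancy formula predicts when drawing 10 million uniform values from 10 million possible integers; a quick sanity check in Python:

```python
n = 10_000_000
# Drawing n uniform values from n possible integers, the expected number of
# distinct values is n * (1 - (1 - 1/n)**n), which tends to n * (1 - 1/e).
expected_distinct = n * (1 - (1 - 1 / n) ** n)
print(round(expected_distinct))   # ~6321206, close to the observed 6321273
```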
4. Check the physical storage correlation (correlation) of the two columns
"PostgreSQL: computing the linear correlation between columns of arbitrary types"
"PostgreSQL statistics: linear correlation between logical and physical storage"
"The statistics behind heap-scan IO amplification caused by index-order scans, and how to fix it - PostgreSQL index scan enlarge heap page scans when index and column correlation small."

postgres=# analyze corr_test;
ANALYZE
postgres=# select * from pg_stats where tablename='corr_test';
-[ RECORD 1 ]----------+---------------------------------------------------------------------
schemaname             | public
tablename              | corr_test
attname                | c1
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -1
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {264,92701,193390,294012,388058,485346,593684,685077,789274,894145,997995,1089782,1200355,1283219,1378625,1483703,1579768,1686405,1784210,1881611,1986283,2081148,2174498,2283021,2374024,2488115,2595505,2692236,2786642,2885331,2986908,3084688,3189282,3286800,3391763,3498828,3590785,3680583,3783899,3899023,3986142,4086260,4182787,4279303,4376024,4474895,4557993,4658238,4769987,4860761,4952944,5060715,5167068,5268284,5368289,5478782,5576872,5682785,5777336,5882893,5979896,6092909,6192493,6283722,6386922,6481272,6590049,6684505,6784638,6877767,6977433,7073017,7169473,7267987,7374703,7467565,7572386,7674582,7770014,7866422,7977399,8086125,8193108,8289676,8391245,8493604,8600614,8697894,8800583,8899375,9003507,9112105,9205179,9296198,9397346,9498380,9594476,9690568,9788951,9895128,9999848}
correlation            | 1            # linearly correlated
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
-[ RECORD 2 ]----------+---------------------------------------------------------------------
schemaname             | public
tablename              | corr_test
attname                | c2
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -0.51138
most_common_vals       | {426318,766194,855444,1149657,1200975,1220163,1365550,2088174,2140281,2763150,3056426,3112825,3121883,3155120,3215926,3301333,3560657,4426775,4594877,4777035,5252068,5677480,5854408,6189177,6218589,6244509,6302874,6739846,6791877,7114618,7339405,7868148,8246024,8401824,8581341,8804281,9126954,9441793,9478704,9596435,9695680,9878570,9971294}
most_common_freqs      | {6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05,6.66667e-05}
histogram_bounds       | {271,106225,201567,294497,392824,494367,588776,678500,770927,868980,969487,1070023,1163902,1263888,1370143,1469104,1560941,1663822,1774042,1878441,1968698,2062914,2165452,2269587,2379207,2488012,2588110,2687086,2794150,2901621,3008186,3110180,3209342,3310782,3405087,3496885,3595592,3703924,3809056,3912808,4005481,4106138,4198143,4299460,4389301,4488847,4586914,4678114,4783071,4874577,4967846,5068838,5157910,5258764,5360550,5461419,5549937,5652323,5748370,5850884,5956837,6062955,6164560,6260734,6359909,6464077,6563959,6656376,6757534,6846335,6939362,7033698,7137829,7252307,7356135,7453579,7553301,7656348,7766475,7860057,7960997,8067532,8176088,8278098,8378888,8473638,8589748,8697667,8797619,8894596,8999418,9101559,9200533,9305406,9394744,9497708,9598103,9699825,9803509,9901242,9999667}
correlation            | 0.00410469   # essentially uncorrelated
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
The contrast between the two columns is stark: c1's storage is perfectly linearly correlated with the index order, while c2's is completely uncorrelated.
If we scan the table in index order using the c1 index, the heap blocks are visited almost linearly, thanks to the storage correlation.
With the c2 index, however, storage order is unrelated to index order, so an index-order scan re-reads the same heap blocks over and over: IO amplification.
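This repeated-read effect can be illustrated with a small simulation (a Python sketch, not PostgreSQL's actual executor; the 140-rows-per-block figure is taken from the test below, and the table is scaled down to 100,000 rows to run quickly):

```python
import random

N_ROWS = 100_000        # scaled-down table
ROWS_PER_BLOCK = 140    # matches the ~140 rows per heap block in the test

def heap_page_fetches(index_order):
    """Count heap page fetches for an index-order scan. The scan keeps the
    current heap page pinned, so consecutive tuples on the same page cost
    only one fetch -- roughly what `Buffers: shared hit` reflects."""
    fetches, last_page = 0, None
    for row in index_order:
        page = row // ROWS_PER_BLOCK
        if page != last_page:
            fetches += 1
            last_page = page
    return fetches

correlated = list(range(N_ROWS))   # like c1: index order == storage order
scattered = correlated[:]          # like c2: index order unrelated to storage
random.shuffle(scattered)

print(heap_page_fetches(correlated))   # ~ N_ROWS / ROWS_PER_BLOCK pages
print(heap_page_fetches(scattered))    # ~ N_ROWS: nearly one fetch per row
```

The correlated order touches each page once; the shuffled order changes page on almost every fetch, mirroring the 71573 vs. 10027095 buffer counts measured below.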
5. Force an index scan by disabling the alternatives:
postgres=# set enable_seqscan=off;
SET
postgres=# set enable_bitmapscan=off;
SET

Linear scan: 71573 data blocks read.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from corr_test where c1 between 1 and 10000000;
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_corr_test_1 on public.corr_test  (cost=0.43..274413.40 rows=10000033 width=8) (actual time=0.017..1919.720 rows=10000000 loops=1)
   Output: c1, c2
   Index Cond: ((corr_test.c1 >= 1) AND (corr_test.c1 <= 10000000))
   Buffers: shared hit=71573
 Planning time: 0.142 ms
 Execution time: 2771.570 ms
(6 rows)

Discrete scan: nearly every block is read about 140 times. One block holds exactly 140 rows, which means those 140 rows are completely scattered relative to index order.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from corr_test where c2 between 1 and 10000000;
                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_corr_test_2 on public.corr_test  (cost=0.43..36296.14 rows=50000 width=8) (actual time=0.029..6563.525 rows=9999999 loops=1)
   Output: c1, c2
   Index Cond: ((corr_test.c2 >= 1) AND (corr_test.c2 <= 10000000))
   Buffers: shared hit=10027095
 Planning time: 0.089 ms
 Execution time: 7421.801 ms
(6 rows)
6. For data like this, PostgreSQL can use a bitmap scan. The principle: collect the heap block IDs of matching rows from the index, sort them by heap block ID, then scan the heap blocks sequentially.
With the sort in place, scans driven by either index read the same number of blocks, and no block is read twice.
postgres=# set enable_bitmapscan=on;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from corr_test where c1 between 1 and 10000000;
                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.corr_test  (cost=2844700.77..3038949.27 rows=10000033 width=8) (actual time=440.230..1691.523 rows=10000000 loops=1)
   Output: c1, c2
   Recheck Cond: ((corr_test.c1 >= 1) AND (corr_test.c1 <= 10000000))
   Heap Blocks: exact=44248
   Buffers: shared hit=71573
   ->  Bitmap Index Scan on idx_corr_test_1  (cost=0.00..2842200.77 rows=10000033 width=0) (actual time=433.548..433.548 rows=10000000 loops=1)
         Index Cond: ((corr_test.c1 >= 1) AND (corr_test.c1 <= 10000000))
         Buffers: shared hit=27325
 Planning time: 0.144 ms
 Execution time: 2510.658 ms
(10 rows)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from corr_test where c2 between 1 and 10000000;
                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.corr_test  (cost=2844700.76..3038949.24 rows=10000032 width=8) (actual time=688.150..1939.259 rows=9999998 loops=1)
   Output: c1, c2
   Recheck Cond: ((corr_test.c2 >= 1) AND (corr_test.c2 <= 10000000))
   Heap Blocks: exact=44248
   Buffers: shared hit=71573
   ->  Bitmap Index Scan on idx_corr_test_2  (cost=0.00..2842200.75 rows=10000032 width=0) (actual time=681.488..681.488 rows=9999998 loops=1)
         Index Cond: ((corr_test.c2 >= 1) AND (corr_test.c2 <= 10000000))
         Buffers: shared hit=27325
 Planning time: 0.147 ms
 Execution time: 2758.621 ms
(10 rows)
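The sort-then-scan idea behind these plans can be sketched as follows (illustrative Python, not the real executor; a TID here is simply a (block, offset) pair):

```python
import random

def bitmap_heap_scan(matching_tids):
    """Sketch of a bitmap scan: collect matching TIDs from the index into a
    per-block bitmap, then visit each heap block exactly once, in block order."""
    bitmap = {}
    for block, offset in matching_tids:
        bitmap.setdefault(block, set()).add(offset)
    pages_read = 0
    rows = []
    for block in sorted(bitmap):              # heap visited in physical order
        pages_read += 1
        for offset in sorted(bitmap[block]):
            rows.append((block, offset))
    return rows, pages_read

# 10,000 matching rows packed 140 to a block, delivered in random index order:
tids = [(b, o) for b in range(72) for o in range(140)][:10_000]
random.shuffle(tids)
rows, pages = bitmap_heap_scan(tids)
print(pages)   # 72: each block read once, whatever the index order was
```

This is why both indexes above show the same `Heap Blocks: exact=44248` and `Buffers: shared hit=71573` despite their opposite correlations.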
Note, however, that a bitmap scan introduces a recheck step: if the bitmap degrades to per-block granularity (goes lossy, e.g. when work_mem is too small to hold every TID), only the block ID is kept, so every row in such a block must be rechecked against the condition.
Bitmap scan thus reduces IO amplification, but at the price of a possible recheck.
Two cost factors drive the planner's estimate here:
1. random_page_cost: the cost of a random page fetch, multiplied by the number of blocks to scan.
2. cpu_operator_cost: the base cost of a function or operator call; the estimated row count is multiplied by this value and by the function's or operator's cost multiplier (pg_proc.procost).
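As a rough illustration of how these two factors combine (simplified arithmetic using the default planner settings; PostgreSQL's real cost model in costsize.c has many more terms):

```python
random_page_cost = 4.0        # planner default for a random page fetch
cpu_operator_cost = 0.0025    # planner default per operator evaluation

def rough_scan_cost(blocks_to_scan, estimated_rows, procost=1.0):
    """Toy estimate: IO term plus CPU term, per the two factors above."""
    io_cost = random_page_cost * blocks_to_scan
    cpu_cost = cpu_operator_cost * procost * estimated_rows
    return io_cost + cpu_cost

# correlated index scan: ~71,573 block reads for 10M rows
print(rough_scan_cost(71_573, 10_000_000))
# uncorrelated index scan: ~10M block reads for the same rows
print(rough_scan_cost(10_027_095, 10_000_000))
```

Plugging in the buffer counts measured above, the IO term dominates the uncorrelated scan by two orders of magnitude, which is exactly the gap the bitmap scan closes.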
https://github.com/digoal/blog/blob/master/201804/20180402_01.md#%E5%B0%8F%E7%BB%93 Summary
When PostgreSQL uses an index scan and the index order correlates poorly with the physical storage order, heap block reads are amplified: the same block is read many times because matching rows are scattered across blocks.
A bitmap scan eliminates this amplification (heap blocks are sorted by block ID and each is read once), but introduces the possibility of a recheck.
Bitmap scan therefore pays off when the estimated number of matching rows is large relative to the total rows held by the blocks scanned; when that ratio is small, it is only worthwhile if the operator (recheck) cost is smaller than the IO it saves.
(For example, with 1000 estimated matching rows spread over 100 blocks of 50 rows each, the ratio is 0.2 = 1000/(100*50).)
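Worked out in code, the example's ratio (illustrative arithmetic only; the threshold at which the planner actually switches depends on the cost factors above):

```python
estimated_matching_rows = 1_000
blocks_scanned = 100
rows_per_block = 50

# fraction of rows in the scanned blocks that actually match
ratio = estimated_matching_rows / (blocks_scanned * rows_per_block)
print(ratio)   # 0.2
```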
Beyond the scatter of the heap data itself, the organization of the index pages can also be discrete; see:
"A deep dive into the PostgreSQL B-Tree index structure"
"PostgreSQL tricks: spatially clustered storage, and peeking inside GIN, GiST, SP-GiST indexes"
https://www.postgresql.org/docs/10/static/pageinspect.html