天天看點

PostgreSQL 優化器案例之 - order by limit 索引選擇問題

标簽

PostgreSQL , limit , order by , 優化器 , 選擇性 , 相關性 , 資料存儲順序 , 目标資料存儲順序

https://github.com/digoal/blog/blob/master/201807/20180712_01.md#%E8%83%8C%E6%99%AF 背景

當我們在執行一個這樣的SQL時,假如有這樣幾個索引(c1,c2) (id),資料庫到底該用哪個索引呢?

explain select * from tbl where c1=200 and c2=200 order by id limit 10;  
或  
explain select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  
           

實際上PG會通過計算成本得到應該使用哪個索引。

但是實際上計算公式本身的通用性,使得當C1,C2資料分布傾斜時,可能導緻某些C1,C2輸入值的執行計劃不準确。

走c1,c2索引,當滿足c1,c2條件的記錄很少時。如果資料比較多,則SORT耗費的成本就較大,導緻整個耗時變大。

走id索引,當滿足c1,c2條件的記錄很多時,(并且c1,c2滿足條件的資料分布在id偏小的端),如果分布在大端,則需要掃描更多的記錄才能找到滿足條件的記錄。

如果資料庫統計資訊中可以評估出來滿足條件的c1,c2的記錄與ID順序分布的關系,也可以作為優化器計算成本的一種參考輸入(當然要做到這個可能比較複雜,同時也會增加優化器計算成本的開銷)。

下面通過執行個體來說明。

https://github.com/digoal/blog/blob/master/201807/20180712_01.md#%E4%BE%8B%E5%AD%90 例子

1、建測試表

postgres=# create table tbl (id int, c1 int, c2 int, c3 int, c4 int);  
CREATE TABLE  
           

2、寫入一批随機資料,ID從1到1000萬。

postgres=# insert into tbl select generate_series(1,10000000), random()*100, random()*100, random()*100, random()*100;  
INSERT 0 10000000  
           

3、寫入另一批100萬條資料,c1,c2 與前面1000萬的值不一樣。

postgres=# insert into tbl select generate_series(10000001,11000000), 200,200,200,200;  
INSERT 0 1000000  
           

4、建立兩個索引,也就是本文需要重點關注的,到底走哪個索引更劃算

postgres=# create index idx_tbl_1 on tbl(id);  
CREATE INDEX  
postgres=# create index idx_tbl_2 on tbl(c1,c2,c3,c4);  
CREATE INDEX  
           

5、收集統計資訊

postgres=# vacuum analyze tbl;  
VACUUM  
           

6、檢視下面SQL的執行計劃,走了id索引

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 10;  
                                      QUERY PLAN                                        
--------------------------------------------------------------------------------------  
 Limit  (cost=0.43..32.59 rows=10 width=20)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20)  
         Filter: ((c1 = 200) AND (c2 = 200))  
(3 rows)  
           

還是走了id索引

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 1000;  
                                      QUERY PLAN                                        
--------------------------------------------------------------------------------------  
 Limit  (cost=0.43..3215.74 rows=1000 width=20)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20)  
         Filter: ((c1 = 200) AND (c2 = 200))  
(3 rows)  
           

當LIMIT達到50000時,走了c1,c2的索引。為什麼呢?

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 50000;  
                                         QUERY PLAN                                           
--------------------------------------------------------------------------------------------  
 Limit  (cost=70355.06..70480.06 rows=50000 width=20)  
   ->  Sort  (cost=70355.06..70606.39 rows=100533 width=20)  
         Sort Key: id  
         ->  Bitmap Heap Scan on tbl  (cost=1457.82..62005.97 rows=100533 width=20)  
               Recheck Cond: ((c1 = 200) AND (c2 = 200))  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0)  
                     Index Cond: ((c1 = 200) AND (c2 = 200))  
(7 rows)  
           

7、分析以上執行計劃的含義

首先,表的記錄數(1100萬)除以"滿足c1=200 and c2=200 條件的記錄數"(100533),得到平均需要掃描多少條記錄,可以得到一條滿足c1=200 and c2=200條件的記錄.

postgres=# select 11000000/100533.0;  
       ?column?         
----------------------  
 109.4168084111684720  
(1 row)  
           

也就是說每掃描109.4條記錄,可以得到一條滿足條件的記錄。(優化器這麼算,是認為資料分布是均勻的。)

但是,實際上,資料分布是不均勻的,c1=200 and c2=200的記錄在表的末端(1000萬條記錄後面),也就是說需要掃描1000萬條記錄後,才能得到1條滿足c1=200 and c2=200的記錄。

并不是估算的每掃描109.4條記錄,可以得到一條滿足條件的記錄。

問題就出在這裡。

8、我們再來分析一下為什麼limit 50000時,選擇了c1,c2的索引。而不是id的索引

使用ID索引時,需要掃描100533條記錄,同時需要排序,直到排序完成,總成約70606.39。然後就是GET HEAP TUPLE的成本。

當使用id的索引掃描時,傳回多少條記錄能達到70606.39的成本呢?

以limit 1000的3215.74成本為例  
  
postgres=# select 70606.39/3215.74;  
      ?column?         
---------------------  
 21.9564983487471002  
(1 row)  
  
postgres=# select 21.956*1000;  
 ?column?    
-----------  
 21956.000  
(1 row)  
           

https://github.com/digoal/blog/blob/master/201807/20180712_01.md#%E5%88%86%E6%B0%B4%E5%B2%AD 分水嶺

經過以上分析,也就是說,LIMIT 21956時,走ID索引掃描的執行計劃,成本可達到70606.39。

是以limit 21956是一個分水嶺,大于這個值時,可能使用c1,c2的索引掃描,而小于它,則會使用ID索引掃描.

如下

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 22000;  
                                         QUERY PLAN                                           
--------------------------------------------------------------------------------------------  
 Limit  (cost=69759.69..69814.69 rows=22000 width=20)  
   ->  Sort  (cost=69759.69..70011.02 rows=100533 width=20)  
         Sort Key: id  
         ->  Bitmap Heap Scan on tbl  (cost=1457.82..62005.97 rows=100533 width=20)  
               Recheck Cond: ((c1 = 200) AND (c2 = 200))  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0)  
                     Index Cond: ((c1 = 200) AND (c2 = 200))  
(7 rows)  
           
postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 21000;  
                                      QUERY PLAN                                        
--------------------------------------------------------------------------------------  
 Limit  (cost=0.43..67521.75 rows=21000 width=20)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20)  
         Filter: ((c1 = 200) AND (c2 = 200))  
(3 rows)  
           

https://github.com/digoal/blog/blob/master/201807/20180712_01.md#%E7%9C%9F%E5%AE%9E%E7%9A%84%E6%89%A7%E8%A1%8C%E8%80%97%E6%97%B6 真實的執行耗時

很顯然,使用id掃描,一定會慢,因為滿足條件的資料都分布在1000萬行後面。

1、c1,c2索引掃描,直接命中資料,加排序(100萬條),略快。

postgres=# explain  analyze select * from tbl where c1=200 and c2=200 order by id limit 22000;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=69759.69..69814.69 rows=22000 width=20) (actual time=293.961..299.054 rows=22000 loops=1)  
   ->  Sort  (cost=69759.69..70011.02 rows=100533 width=20) (actual time=293.960..296.006 rows=22000 loops=1)  
         Sort Key: id  
         Sort Method: top-N heapsort  Memory: 3255kB  
         ->  Bitmap Heap Scan on tbl  (cost=1457.82..62005.97 rows=100533 width=20) (actual time=47.919..175.698 rows=1000000 loops=1)  
               Recheck Cond: ((c1 = 200) AND (c2 = 200))  
               Heap Blocks: exact=6370  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0) (actual time=47.160..47.160 rows=1000000 loops=1)  
                     Index Cond: ((c1 = 200) AND (c2 = 200))  
 Planning time: 0.152 ms  
 Execution time: 300.664 ms  
(11 rows)  
           

2、id 索引掃描,慢。

postgres=# explain  analyze select * from tbl where c1=200 and c2=200 order by id limit 21000;  
                                                                QUERY PLAN                                                                  
------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..67521.75 rows=21000 width=20) (actual time=1404.932..1412.594 rows=21000 loops=1)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20) (actual time=1404.930..1409.639 rows=21000 loops=1)  
         Filter: ((c1 = 200) AND (c2 = 200))  
         Rows Removed by Filter: 10000000  
 Planning time: 0.139 ms  
 Execution time: 1414.142 ms  
(6 rows)  
           

3、limit 10同樣,id 索引掃描,慢。

postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2=200 order by id limit 10;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..32.59 rows=10 width=20) (actual time=1403.861..1403.865 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=91020  
   ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.43..323244.26 rows=100533 width=20) (actual time=1403.859..1403.861 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Filter: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
         Rows Removed by Filter: 10000000  
         Buffers: shared hit=91020  
 Planning time: 0.127 ms  
 Execution time: 1403.893 ms  
(10 rows)  
           

https://github.com/digoal/blog/blob/master/201807/20180712_01.md#%E4%BC%98%E5%8C%96%E6%96%B9%E6%B3%95 優化方法

1、使用HINT或者改SQL,強制不走ID掃描。

并不适用于所有場景,比如資料分布均勻時,那麼PG的這周成本計算方法就對口,那麼什麼時候使用ID,什麼時候使用C1,C2索引掃描就合乎常理了。

postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2=200 order by id+0 limit 10;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=64429.79..64429.81 rows=10 width=24) (actual time=409.622..409.626 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4, ((id + 0))  
   Buffers: shared hit=10205  
   ->  Sort  (cost=64429.79..64681.12 rows=100533 width=24) (actual time=409.620..409.621 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4, ((id + 0))  
         Sort Key: ((tbl.id + 0))  
         Sort Method: top-N heapsort  Memory: 25kB  
         Buffers: shared hit=10205  
         ->  Bitmap Heap Scan on public.tbl  (cost=1457.82..62257.30 rows=100533 width=24) (actual time=47.347..237.455 rows=1000000 loops=1)  
               Output: id, c1, c2, c3, c4, (id + 0)  
               Recheck Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
               Heap Blocks: exact=6370  
               Buffers: shared hit=10205  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0) (actual time=46.577..46.577 rows=1000000 loops=1)  
                     Index Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
                     Buffers: shared hit=3835  
 Planning time: 0.133 ms  
 Execution time: 409.670 ms  
(18 rows)  
           

2、如果c1,c2是等值查詢的話,可以用以下索引,那麼效率是最高的。

postgres=# create index idx_tbl_3 on tbl(c1,c2,id);  
CREATE INDEX  
  
  
postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 =200 order by id limit 10;  
                                                              QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.56..6.93 rows=10 width=20) (actual time=0.102..0.106 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=1 read=4  
   I/O Timings: read=0.047  
   ->  Index Scan using idx_tbl_3 on public.tbl  (cost=0.56..64086.79 rows=100533 width=20) (actual time=0.101..0.103 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Index Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
         Buffers: shared hit=1 read=4  
         I/O Timings: read=0.047  
 Planning time: 0.142 ms  
 Execution time: 0.131 ms  
(11 rows)  
           

3、建議方法:

注意方法2 不适合非等值查詢,

postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..35.32 rows=10 width=20) (actual time=1371.094..1371.099 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=91020  
   ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.43..350743.84 rows=100533 width=20) (actual time=1371.092..1371.095 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Filter: ((tbl.c2 >= 100) AND (tbl.c2 <= 300) AND (tbl.c1 = 200))  
         Rows Removed by Filter: 10000000  
         Buffers: shared hit=91020  
 Planning time: 0.278 ms  
 Execution time: 1371.128 ms  
(10 rows)  
           

但是不用擔心,我們依舊可以使用其他等值查詢列,加上排序列組成複合索引,在INDEX SCAN中使用FILTER來加速。

新增如下索引,加速如下

postgres=# create index idx_tbl_4 on tbl(c1,id);  
CREATE INDEX  
  
postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  
                                                               QUERY PLAN                                                                 
----------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..10.47 rows=10 width=20) (actual time=0.105..0.110 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=1 read=3  
   I/O Timings: read=0.051  
   ->  Index Scan using idx_tbl_4 on public.tbl  (cost=0.43..100877.50 rows=100533 width=20) (actual time=0.104..0.107 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Index Cond: (tbl.c1 = 200)  
         Filter: ((tbl.c2 >= 100) AND (tbl.c2 <= 300))  
         Buffers: shared hit=1 read=3  
         I/O Timings: read=0.051  
 Planning time: 0.172 ms  
 Execution time: 0.134 ms  
(12 rows)  
           

https://github.com/digoal/blog/blob/master/201807/20180712_01.md#%E9%99%84oracle 附Oracle

《Oracle migration to Greenplum - (含 Ora2pg)》 http://www.dba-oracle.com/t_OracleAutotrace.htm
SQL> create table tbl(id int, c1 int, c2 int, c3 int, c4 int);

Table created.

SQL> insert into tbl select rownum,trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)) from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> insert into tbl select rownum+10000000, 200,200,200,200 from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index idx_tbl_1 on tbl(id);

Index created.

SQL> create index idx_tbl_2 on tbl(c1,c2,c3,c4);

Index created.

SQL> set linesize 512
SQL> set pagesize 50000

SQL> set autotrace on;

SQL> exec DBMS_STATS.GATHER_TABLE_STATS('JIUDU','TBL'); 

PL/SQL procedure successfully completed.

SQL> select * from (select * from tbl where c1=200 and c2=200 order by id) t where rownum<10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
  10000001        200        200        200        200
  10000002        200        200        200        200
  10000003        200        200        200        200
  10000004        200        200        200        200
  10000005        200        200        200        200
  10000006        200        200        200        200
  10000007        200        200        200        200
  10000008        200        200        200        200
  10000009        200        200        200        200

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 745043579

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     9 |   585 |       | 10253   (2)| 00:02:04 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 84875 |  5387K|       | 10253   (2)| 00:02:04 |
|*  3 |    SORT ORDER BY STOPKEY|      | 84875 |  1491K|  2672K| 10253   (2)| 00:02:04 |
|*  4 |     TABLE ACCESS FULL   | TBL  | 84875 |  1491K|       |  9767   (2)| 00:01:58 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   4 - filter("C1"=200 AND "C2"=200)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      34868  consistent gets
          0  physical reads
          0  redo size
        937  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed

SQL> select * from (select * from tbl where c1=1 and c2=1 order by id) t where rownum<10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
      9697          1          1         78         39
     20586          1          1         81         71
     27820          1          1         33         64
     44324          1          1         26         27
     47079          1          1          3          5
     64669          1          1         13         49
     73715          1          1         20         74
     80903          1          1         96         25
     98368          1          1         59          9

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 447312937

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |     9 |   585 |   641   (1)| 00:00:08 |
|*  1 |  COUNT STOPKEY                 |           |       |       |            |          |
|   2 |   VIEW                         |           |   704 | 45760 |   641   (1)| 00:00:08 |
|*  3 |    SORT ORDER BY STOPKEY       |           |   704 | 12672 |   641   (1)| 00:00:08 |
|   4 |     TABLE ACCESS BY INDEX ROWID| TBL       |   704 | 12672 |   640   (0)| 00:00:08 |
|*  5 |      INDEX RANGE SCAN          | IDX_TBL_2 |   704 |       |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   5 - access("C1"=1 AND "C2"=1)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1072  consistent gets
         11  physical reads
          0  redo size
        969  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed


SQL> create index idx_tbl_3 on tbl(c1,c2,id);
Index created.

SQL> select * from (select * from tbl where c1=200 and c2 between 100 and 300 order by id) t where rownum < 10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
  10000001        200        200        200        200
  10000002        200        200        200        200
  10000003        200        200        200        200
  10000004        200        200        200        200
  10000005        200        200        200        200
  10000006        200        200        200        200
  10000007        200        200        200        200
  10000008        200        200        200        200
  10000009        200        200        200        200

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 745043579

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     9 |   585 |       | 10253   (2)| 00:02:04 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 84875 |  5387K|       | 10253   (2)| 00:02:04 |
|*  3 |    SORT ORDER BY STOPKEY|      | 84875 |  1491K|  2672K| 10253   (2)| 00:02:04 |
|*  4 |     TABLE ACCESS FULL   | TBL  | 84875 |  1491K|       |  9767   (2)| 00:01:58 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   4 - filter("C1"=200 AND "C2">=100 AND "C2"<=300)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      34868  consistent gets
          0  physical reads
          0  redo size
        937  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed

SQL> select * from (select * from tbl where c1=200 and c2 =200 order by id) t where rownum < 10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
  10000001        200        200        200        200
  10000002        200        200        200        200
  10000003        200        200        200        200
  10000004        200        200        200        200
  10000005        200        200        200        200
  10000006        200        200        200        200
  10000007        200        200        200        200
  10000008        200        200        200        200
  10000009        200        200        200        200

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1825274432

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     9 |   585 |    12   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |           |       |       |            |          |
|   2 |   VIEW                        |           |    10 |   650 |    12   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TBL       |    10 |   180 |    12   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IDX_TBL_3 |       |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   4 - access("C1"=200 AND "C2"=200)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          2  physical reads
          0  redo size
        937  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          9  rows processed

           

同時也發現一個問題,Oracle可能無法使用index filter來優化,例如将index2,index3删除後,留下ID索引,Oracle無法走索引,而PG可以。

SQL> select * from (select * from tbl where c1=1 and c2 =1 order by id) t where rownum < 10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
      9697          1          1         78         39
     20586          1          1         81         71
     27820          1          1         33         64
     44324          1          1         26         27
     47079          1          1          3          5
     64669          1          1         13         49
     73715          1          1         20         74
     80903          1          1         96         25
     98368          1          1         59          9

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 745043579

--------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     9 |   585 |  9766   (2)| 00:01:58 |
|*  1 |  COUNT STOPKEY          |      |       |       |            |          |
|   2 |   VIEW                  |      |   704 | 45760 |  9766   (2)| 00:01:58 |
|*  3 |    SORT ORDER BY STOPKEY|      |   704 | 12672 |  9766   (2)| 00:01:58 |
|*  4 |     TABLE ACCESS FULL   | TBL  |   704 | 12672 |  9765   (2)| 00:01:58 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   4 - filter("C1"=1 AND "C2"=1)


Statistics
----------------------------------------------------------
        186  recursive calls
          0  db block gets
      34893  consistent gets
          0  physical reads
          0  redo size
        969  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
          9  rows processed
  

PG


postgres=# explain analyze select * from tbl where c1=1 and c2 =1 order by id limit 10;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..3703.11 rows=10 width=20) (actual time=7.199..23.926 rows=10 loops=1)
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323243.84 rows=873 width=20) (actual time=7.198..23.921 rows=10 loops=1)
         Filter: ((c1 = 1) AND (c2 = 1))
         Rows Removed by Filter: 142814
 Planning time: 0.119 ms
 Execution time: 23.950 ms
(6 rows)
           

https://github.com/digoal/blog/blob/master/201807/20180712_01.md#%E5%B0%8F%E7%BB%93 小結

當SQL查詢中包括排序,以及其他字段的過濾條件,并使用LIMIT快速傳回少量資料時,如果滿足條件的資料分布在排序鍵的末端,那麼優化器給出的執行計劃可能是不好的,導緻通過排序索引掃描更多的資料後才能命中需要的記錄。

然而,資料庫目前使用的評估走排序鍵時,LIMIT需要掃描多少條記錄,使用了資料均勻分布的假設,是以在資料(滿足條件的資料與排序鍵本身的相關性不均勻)分布不均勻時,導緻成本估算不準(oracle幹脆走全表掃描)。

建議優化方法:

增加索引,建立等值查詢條件列(s)加排序列(s)組成的複合索引,降低掃描量。

select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  
  
增加索引  
  
(c1,id)  -- 索引掃描, filter c2  
  
已有  
(c1,c2)  -- 索引掃描, sort id  
(id)     -- 索引掃描, filter c1,c2  
           
select * from tbl where c1=200 and c2 =200 order by id limit 10;  
  
增加索引  
  
(c1,c2,id)  -- 索引掃描  
  
已有  
(c1,c2)  -- 索引掃描, sort id  
(id)     -- 索引掃描, filter c1,c2  
           

https://github.com/digoal/blog/blob/master/201807/20180712_01.md#%E5%8F%82%E8%80%83 參考

《PostgreSQL 10 黑科技 - 自定義統計資訊》

繼續閱讀