天天看點

PostgreSQL Oracle 相容性之 - INDEX SKIP SCAN (遞歸查詢變态優化) 非驅動列索引掃描優化

标簽

PostgreSQL , Oracle , index skip scan , 非驅動列條件 , 遞歸查詢 , 子樹

https://github.com/digoal/blog/blob/master/201803/20180323_03.md#%E8%83%8C%E6%99%AF 背景

對于輸入條件在複合索引中為非驅動列的,如何高效的利用索引掃描?

在Oracle中可以使用index skip scan來實作這類CASE的高效掃描:

INDEX跳躍掃描一般用在WHERE條件裡面沒有使用到引導列,但是用到了引導列以外的其他列,并且引導列的DISTINCT值較少的情況。

在這種情況下,資料庫把這個複合索引邏輯上拆散為多個子索引,依次搜尋子索引中非引導列的WHERE條件裡面的值。

使用方法如下:

/*+ INDEX_SS ( [ @ qb_name ] tablespec [ indexspec [ indexspec ]... ] ) */           

The INDEX_SS hint instructs the optimizer to perform an index skip scan for the specified table. If the statement uses an index range scan, then Oracle scans the index entries in ascending order of their indexed values. In a partitioned index, the results are in ascending order within each partition.Each parameter serves the same purpose as in "INDEX Hint". For example:

SELECT /*+ INDEX_SS(e emp_name_ix) */ last_name FROM employees e WHERE first_name = 'Steven';           

下面是來自ORACLE PERFORMANCE TUNING裡的原文:

Index skip scans improve index scans by nonprefix columns. Often, scanning index blocks is faster than scanning table data blocks.

Skip scanning lets a composite index be split logically into smaller subindexes. In skip scanning, the initial column of the composite index is not specified in the query. In other words, it is skipped.

The number of logical subindexes is determined by the number of distinct values in the initial column. Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index.

Example 13-5 Index Skip Scan

Consider, for example, a table

employees(       sex,        employee_id,       address       )           

with a composite index on

(sex, employee_id).           

Splitting this composite index would result in two logical subindexes, one for M and one for F.

For this example, suppose you have the following index data:

('F',98)('F',100)('F',102)('F',104)('M',101)('M',103)('M',105)           

The index is split logically into the following two subindexes:

The first subindex has the keys with the value F.

The second subindex has the keys with the value M

PostgreSQL Oracle 相容性之 - INDEX SKIP SCAN (遞歸查詢變态優化) 非驅動列索引掃描優化

The column sex is skipped in the following query:

SELECT * FROM employeesWHERE employee_id = 101;           

A complete scan of the index is not performed, but the subindex with the value F is searched first, followed by a search of the subindex with the value M.

https://github.com/digoal/blog/blob/master/201803/20180323_03.md#postgresql-%E9%9D%9Eskip-scan PostgreSQL 非skip scan

PostgreSQL支援非驅動列的索引掃描,但是需要掃描整個索引。

例子

1、建立測試表

postgres=# create table t(id int, c1 int);       CREATE TABLE           

2、寫入1000萬測試資料

postgres=# insert into t select random()*1 , id from generate_series(1,10000000) id;       INSERT 0 10000000           

3、建立多列索引

postgres=# create index idx_t on t(id,c1);       CREATE INDEX           

4、非驅動列查詢測試如下

https://github.com/digoal/blog/blob/master/201803/20180323_03.md#index-only-scan index only scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;                                                                       QUERY PLAN                                                                        -------------------------------------------------------------------------------------------------------------------------------------------        Index Only Scan using idx_t on public.t  (cost=10000000000.43..10000105164.89 rows=1 width=8) (actual time=0.043..152.288 rows=1 loops=1)          Output: id, c1          Index Cond: (t.c1 = 1)          Heap Fetches: 0          Buffers: shared hit=27326        Execution time: 152.328 ms       (6 rows)           

https://github.com/digoal/blog/blob/master/201803/20180323_03.md#index-scan index scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;                                                             QUERY PLAN                                                              -----------------------------------------------------------------------------------------------------------------------        Index Scan using idx_t on public.t  (cost=0.43..105165.99 rows=1 width=8) (actual time=0.022..151.845 rows=1 loops=1)          Output: id, c1          Index Cond: (t.c1 = 1)          Buffers: shared hit=27326        Execution time: 151.881 ms       (5 rows)           

https://github.com/digoal/blog/blob/master/201803/20180323_03.md#bitmap-scan bitmap scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;                                                              QUERY PLAN                                                              ------------------------------------------------------------------------------------------------------------------------        Bitmap Heap Scan on public.t  (cost=105164.88..105166.00 rows=1 width=8) (actual time=151.731..151.732 rows=1 loops=1)          Output: id, c1          Recheck Cond: (t.c1 = 1)          Heap Blocks: exact=1          Buffers: shared hit=27326          ->  Bitmap Index Scan on idx_t  (cost=0.00..105164.88 rows=1 width=0) (actual time=151.721..151.721 rows=1 loops=1)                Index Cond: (t.c1 = 1)                Buffers: shared hit=27325        Execution time: 151.777 ms       (9 rows)           

https://github.com/digoal/blog/blob/master/201803/20180323_03.md#seq-scan%E5%85%A8%E8%A1%A8%E6%89%AB%E6%8F%8F seq scan(全表掃描)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;                                                      QUERY PLAN                                                       ---------------------------------------------------------------------------------------------------------        Seq Scan on public.t  (cost=0.00..169248.41 rows=1 width=8) (actual time=0.014..594.535 rows=1 loops=1)          Output: id, c1          Filter: (t.c1 = 1)          Rows Removed by Filter: 9999999          Buffers: shared hit=44248        Execution time: 594.568 ms       (6 rows)           

使用索引掃,因為不需要FILTER,同時掃描的BLOCK更少,是以性能比全表掃略好。但是還是掃了整個索引的PAGE,是以并不能算skip scan。

那麼如何讓PostgreSQL支援index skip scan呢?

https://github.com/digoal/blog/blob/master/201803/20180323_03.md#postgresql-skip-scan PostgreSQL skip scan

實際上原理和Oracle類似,可以輸入驅動列條件,然後按多個條件掃描,這樣就能達到SKIP SCAN的效果。(即多顆子樹掃描)。

同樣也更加适合于驅動列DISTINCT值較少的情況。

用PostgreSQL的遞歸查詢文法可以實作這樣的加速效果。這種方法也被用于擷取count(distinct), distinct值等。

《distinct xx和count(distinct xx)的變态遞歸優化方法 - 索引收斂(skip scan)掃描》

例如,我們通過這個方法,可以快速的得到驅動列的唯一值

with recursive skip as (           (             select min(t.id) as id from t where t.id is not null           )           union all           (             select (select min(t.id) as id from t where t.id > s.id and t.id is not null)                from skip s where s.id is not null           )  -- 這裡的where s.id is not null 一定要加,否則就死循環了.         )          select id from skip ;           

然後封裝到如下SQL,實作skip scan的效果

explain (analyze,verbose,timing,costs,buffers) select * from t where id in       (       with recursive skip as (           (             select min(t.id) as id from t where t.id is not null           )           union all           (             select (select min(t.id) as id from t where t.id > s.id and t.id is not null)                from skip s where s.id is not null           )  -- 這裡的where s.id is not null 一定要加,否則就死循環了.         )          select id from skip        ) and c1=1       union all        select * from t where id is null and c1=1;           

或者

explain (analyze,verbose,timing,costs,buffers) select * from t where id = any(array       (       with recursive skip as (           (             select min(t.id) as id from t where t.id is not null           )           union all           (             select (select min(t.id) as id from t where t.id > s.id and t.id is not null)                from skip s where s.id is not null           )  -- 這裡的where s.id is not null 一定要加,否則就死循環了.         )          select id from skip        )) and c1=1       union all        select * from t where id is null and c1=1;           

看執行計劃:

效果好多了

QUERY PLAN                                                                                               -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------        Append  (cost=55.00..215.22 rows=2 width=8) (actual time=0.127..0.138 rows=1 loops=1)          Buffers: shared hit=21          ->  Nested Loop  (cost=55.00..213.64 rows=1 width=8) (actual time=0.126..0.127 rows=1 loops=1)                Output: t.id, t.c1                Buffers: shared hit=18                ->  HashAggregate  (cost=54.57..55.58 rows=101 width=4) (actual time=0.108..0.109 rows=3 loops=1)                      Output: skip.id                      Group Key: skip.id                      Buffers: shared hit=11                      ->  CTE Scan on skip  (cost=51.29..53.31 rows=101 width=4) (actual time=0.052..0.102 rows=3 loops=1)                            Output: skip.id                            Buffers: shared hit=11                            CTE skip                              ->  Recursive Union  (cost=0.46..51.29 rows=101 width=4) (actual time=0.050..0.099 rows=3 loops=1)                                    Buffers: shared hit=11                                    ->  Result  (cost=0.46..0.47 rows=1 width=4) (actual time=0.049..0.049 rows=1 loops=1)                                          Output: $1                                          Buffers: shared hit=4                                          InitPlan 3 (returns $1)                                            ->  Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.045..0.046 rows=1 loops=1)                                                  Output: t_3.id                                                  Buffers: shared hit=4                                                  ->  Index Only Scan using idx_t on public.t t_3  (cost=0.43..205165.21 rows=10000033 width=4) (actual time=0.045..0.045 rows=1 loops=1)                                                        Output: t_3.id                                                        Index Cond: (t_3.id IS NOT NULL)                                                        Heap Fetches: 0                                                        Buffers: shared hit=4                                    ->  WorkTable Scan on skip s  (cost=0.00..4.88 rows=10 width=4) (actual time=0.015..0.015 rows=1 loops=3)                                          Output: (SubPlan 2)                                          Filter: (s.id IS NOT NULL)                                          Rows Removed by Filter: 0                                          Buffers: shared hit=7                                          SubPlan 2                                            ->  Result  (cost=0.46..0.47 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=2)                                                  Output: $3                                                  Buffers: shared hit=7                                                  InitPlan 1 (returns $3)                                                    ->  Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=2)                                                          Output: t_2.id                                                          Buffers: shared hit=7                                                          ->  Index Only Scan using idx_t on public.t t_2  (cost=0.43..76722.42 rows=3333344 width=4) (actual time=0.017..0.017 rows=0 loops=2)                                                                Output: t_2.id                                                                Index Cond: ((t_2.id > s.id) AND (t_2.id IS NOT NULL))                                                                Heap Fetches: 0                                                                Buffers: shared hit=7                ->  Index Only Scan using idx_t on public.t  (cost=0.43..1.56 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=3)                      Output: t.id, t.c1                      Index Cond: ((t.id = skip.id) AND (t.c1 = 1))                      Heap Fetches: 0                      Buffers: shared hit=7          ->  Index Only Scan using idx_t on public.t t_1  (cost=0.43..1.56 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)                Output: t_1.id, t_1.c1                Index Cond: ((t_1.id IS NULL) AND (t_1.c1 = 1))                Heap Fetches: 0                Buffers: shared hit=3        Execution time: 0.256 ms       (56 rows)           

從150多毫秒,降低到了0.256毫秒

https://github.com/digoal/blog/blob/master/201803/20180323_03.md#%E5%86%85%E6%A0%B8%E5%B1%82%E9%9D%A2%E4%BC%98%E5%8C%96 核心層面優化

與Oracle做法類似,或者說與遞歸的做法類似。

使用這種方法來改進優化器,可以達到index skip scan的效果,而且不用改寫SQL。

https://github.com/digoal/blog/blob/master/201803/20180323_03.md#%E5%8F%82%E8%80%83 參考

繼續閱讀