關于PostgreSQL中的組合索引之一(b-tree篇)

有時候查詢中會帶有多個字段的查詢條件，但是其中任何單個字段的選擇率都不高，但是多個字段組合起來卻有比較好的選擇率。這種場景是bitmap索引大顯身手的地方，但是bitmap索引對更新性能的影響相當大，不适合OLTP場景。PG不支援bitmap索引，但是有一個臨時的記憶體中的類似bitmap索引的東西，叫“Bitmap Index Scan”。

除了Bitmap Index Scan，多字段索引也是一種選擇，但是兩種方法的性能有沒有差異呢？下面作個測試看看。

測試環境在一個PC的虛拟機上

主控端

CPU:AMD Athlon II X4 640 3.0GHz

MEM:6G

OS:Win7 64bit

虛拟機所在存儲:Apacer A S510S 128GB

虛拟機

CPU：4 core

MEM: 2G

OS:CentOS release 6.5 (Final)

PostgreSQL:9.3.4(shared_buffers = 128MB,其它都是預設值)

postgres=# create table tb1(c1 int,c2 int);

CREATE TABLE

postgres=# insert into tb1 select round(random()*100),round(random()*1000) from generate_series(1,10000000);

INSERT 0 10000000

postgres=# select pg_size_pretty(pg_table_size('tb1'));

pg_size_pretty

----------------

346 MB

(1 row)

postgres=# \timing

Timing is on.

postgres=# select count(*) from tb1 where c1=99 and c2=999;

count

-------

Time: 1376.393 ms

postgres=# create index tb1_idx1 on tb1(c1);

CREATE INDEX

Time: 65308.131 ms

postgres=# select pg_size_pretty(pg_relation_size('tb1_idx1'));

214 MB

Time: 2.659 ms

postgres=# explain (analyze,buffers) select count(*) from tb1 where c1=99 and c2=999;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=47675.20..47675.21 rows=1 width=0) (actual time=1267.921..1267.921 rows=1 loops=1)

Buffers: shared read=39978

-> Bitmap Heap Scan on tb1 (cost=1891.96..47674.95 rows=99 width=0) (actual time=40.726..1267.631 rows=92 loops=1)

Recheck Cond: (c1 = 99)

Rows Removed by Index Recheck: 7363157

Filter: (c2 = 999)

Rows Removed by Filter: 99794

Buffers: shared read=39978

-> Bitmap Index Scan on tb1_idx1 (cost=0.00..1891.93 rows=102333 width=0) (actual time=30.829..30.829 rows=99886 loops=1)

Index Cond: (c1 = 99)

Buffers: shared read=276

Total runtime: 1267.961 ms

(12 rows)

Time: 1268.916 ms

c1單索引的選擇率隻有1/100，對查詢性能提升毫無作用。

postgres=# drop index tb1_idx1 ;

DROP INDEX

Time: 41.062 ms

postgres=# create index tb1_idx2 on tb1(c2);

Time: 50313.915 ms

postgres=# select pg_size_pretty(pg_relation_size('tb1_idx2'));

Time: 0.915 ms

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=23558.69..23558.70 rows=1 width=0) (actual time=74.215..74.215 rows=1 loops=1)

Buffers: shared hit=1948 read=6954

-> Bitmap Heap Scan on tb1 (cost=180.85..23558.45 rows=99 width=0) (actual time=5.462..73.883 rows=92 loops=1)

Recheck Cond: (c2 = 999)

Filter: (c1 = 99)

Rows Removed by Filter: 9800

Buffers: shared hit=1948 read=6954

-> Bitmap Index Scan on tb1_idx2 (cost=0.00..180.82 rows=9652 width=0) (actual time=2.414..2.414 rows=9892 loops=1)

Index Cond: (c2 = 999)

Buffers: shared read=30

Total runtime: 74.298 ms

(11 rows)

Time: 76.732 ms

c1單索引的選擇率有1/1000，性能提升很明顯。

Time: 56792.281 ms

QUERY PLAN

------

Aggregate (cost=2456.74..2456.75 rows=1 width=0) (actual time=50.347..50.348 rows=1 loops=1)

Buffers: shared hit=6587 read=343

-> Bitmap Heap Scan on tb1 (cost=2073.06..2456.49 rows=99 width=0) (actual time=39.969..50.312 rows=92 loops=1)

Recheck Cond: ((c2 = 999) AND (c1 = 99))

Rows Removed by Index Recheck: 7291

Buffers: shared hit=6587 read=343

-> BitmapAnd (cost=2073.06..2073.06 rows=99 width=0) (actual time=37.864..37.864 rows=0 loops=1)

Buffers: shared hit=30 read=276

-> Bitmap Index Scan on tb1_idx2 (cost=0.00..180.82 rows=9652 width=0) (actual time=3.091..3.091 rows=9892 loops=1)

Index Cond: (c2 = 999)

Buffers: shared hit=30

-> Bitmap Index Scan on tb1_idx1 (cost=0.00..1891.93 rows=102333 width=0) (actual time=33.030..33.030 rows=99886 loo

ps=1)

Index Cond: (c1 = 99)

Buffers: shared read=276

Total runtime: 50.484 ms

(15 rows)

Time: 52.287 ms

c1+c2索引bitmap掃描性能進一步提升。（這個性能提升也可能僅僅是因為buffers的hit命中率提升）

postgres=# create index tb1_idx3 on tb1(c1,c2);

Time: 67824.333 ms

postgres=# select pg_size_pretty(pg_relation_size('tb1_idx3'));

Time: 0.835 ms

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=389.13..389.14 rows=1 width=0) (actual time=0.580..0.580 rows=1 loops=1)

Buffers: shared hit=92 read=3

-> Bitmap Heap Scan on tb1 (cost=5.45..388.89 rows=99 width=0) (actual time=0.298..0.566 rows=92 loops=1)

Recheck Cond: ((c1 = 99) AND (c2 = 999))

Buffers: shared hit=92 read=3

-> Bitmap Index Scan on tb1_idx3 (cost=0.00..5.42 rows=99 width=0) (actual time=0.279..0.279 rows=92 loops=1)

Index Cond: ((c1 = 99) AND (c2 = 999))

Buffers: shared read=3

Total runtime: 0.651 ms

(9 rows)

Time: 3.912 ms

由此可見，本例中，多字段索引的效率相當高。

http://postgres.cn/docs/9.3/indexes-multicolumn.html

------------------------------------------------------------------

一個多字段的 B-tree 索引可以用在包含索引字段子集的查詢條件裡，不過，如果在前導字段(最左邊)上有限制條件，那麼效率最高。準确的規則是前導字段上的等于限制，加上第一個沒有等于限制的非等于限制字段，将用于限制所掃描的索引範圍。将檢查這兩個字段右邊字段上的索引以減少對表的通路，但是并不減少需要掃描的索引。比如，假如我們有一個在(a, b, c)上的索引，查詢條件是WHERE a = 5 AND b >= 42 AND c = 77的索引條目将被忽略，但是他們仍然會被掃描。這個索引原則上仍然會被用于那些在b和/或c上有限制，但是在a上沒有限制的查詢，但是就必須掃描整個索引了。是以，在大多數這種情況下，優化器會選擇順序掃描表，而不使用索引。

b-tree多字段索引的原理很簡單，就是把多個字段，按定義索引時的先後順序排序，是以越靠前的字段越重要。

下面是b-tree多字段索引條目内容的一個例子。

postgres=# \d tb1

Table "public.tb1"

Column | Type | Modifiers

--------+---------+-----------

id | integer |

name | text |

Indexes:

"tb1_idx2" btree (id, name)

postgres=# insert into tb1 values(1,'aaaa');

INSERT 0 1

postgres=# create extension pageinspect;

CREATE EXTENSION

postgres=# SELECT * FROM bt_page_items('tb1_idx2', 1);

------------+-------+---------+-------+------+-------------------------------------------------

1 | (0,5) | 24 | f | t | 01 00 00 00 0b 61 61 61 61 00 00 00 00 00 00 00

postgres=# select 'a'::bytea;

bytea

\x61

關于PostgreSQL中的組合索引之一(b-tree篇)

繼續閱讀

Apache (You don't have permission to access / on this server.）

debian9更新4.9.0核心到4.19.2核心過程

centOS7 配置 vsftpd 虛拟使用者及權限Vsftpd配置虛拟使用者及權限

linux-svn解除安裝與安裝

vsftp虛拟多使用者多權限一鍵部署腳本

Ubuntu14.04 LTS下安裝mongodb

httpd服務的部署、啟動、配置和簡單優化一、部署二、啟動三、配置檔案

配置網頁内容通路

手動安裝Intel network I217-LM網卡的Linux驅動

禁止ubuntu系統彈出報錯界面

Ubuntu Linux下Apache的配置檔案

samba伺服器的功能

【Linux】UDP廣播封包接收速率問題

Linux裝置模型（中）之上層容器

PowerPC平台 Linux移植三