Postgres支援B-tree, hash, GiST, and GIN，也支援使用者通過Gist自定義索引方法，比如時空資料庫的R-Tree索引。

為了支援索引架構，在建立索引時會查找和操作一系列Catalog中繼資料，另外為了加速B-Tree索引的建構，會先對待建立索引的資料進行排序，然後再按照B-Tree的頁面格式直接寫B-Tree的page，避免page的split。

例子

create table t001(id int, i int, j int, k int, msg text);
create table t002(id int, i int, j int, k int, msg text);
insert into t001 select i, i+1, i+2, i+3, md5(random()::text) from generate_series(1,1000) as i;
insert into t002 select i, i+1, i+2, i+3, md5(random()::text) from generate_series(1,1000) as i;
create index t001_i_j on t001(i, j);

建立B-Tree索引分成2個部分:

catalog系統中生成新索引的相關中繼資料（索引檔案也是一個表，是以相關的中繼資料需要建立和關聯起來）；
對索引列進行排序并生成BTree的page；

校驗新索引的Catalog中繼資料

文法解析

create index t001_i_j on t001 using btree (i, j);

通過parse.y将建立索引的sql解析成IndexStmt結構，其中：

IndexStmt
{
    type = T_IndexStmt
    idxname = "t001_i_j"
    accessMethod = "btree"
}

校驗B-Tree的handler

using btree執行了索引的類型是btree，是以需要校驗核心是否支援該類型的索引。

pg_am

tuple = SearchSysCache1(AMNAME, PointerGetDatum("btree"));

在pg_am中查找"btree"對應的handler

postgres=# select oid,* from pg_am;
 oid  | amname |  amhandler  | amtype
------+--------+-------------+--------
  403 | btree  | bthandler   | i
  405 | hash   | hashhandler | i
  783 | gist   | gisthandler | i
 2742 | gin    | ginhandler  | i
 4000 | spgist | spghandler  | i
 3580 | brin   | brinhandler | i
(6 行記錄)

BTree的handler是bthandler，是一個regproc類型，對應pg_proc一個函數。對于内置的BTree索引，相關的函數是在bootstrap過程中插入(pg_proc.dat檔案）。

pg_proc

pg_proc.dat檔案（下面代碼會被重新格式化成postgres.bki檔案，并且在inintdb時被bootstrap.y解析插入）：

# Index access method handlers
{ oid => '330', descr => 'btree index access method handler',
  proname => 'bthandler', provolatile => 'v', prorettype => 'index_am_handler',
  proargtypes => 'internal', prosrc => 'bthandler' },

postgres=# select oid,* from pg_proc where proname='bthandler';
-[ RECORD 1 ]---+----------
oid             | 330
proname         | bthandler
pronamespace    | 11
proowner        | 10
prolang         | 12
procost         | 1
prorows         | 0
provariadic     | 0
protransform    | -
prokind         | f
prosecdef       | f
proleakproof    | f
proisstrict     | t
proretset       | f
provolatile     | v
proparallel     | s
pronargs        | 1
pronargdefaults | 0
prorettype      | 325
proargtypes     | 2281
proallargtypes  |
proargmodes     |
proargnames     |
proargdefaults  |
protrftypes     |
prosrc          | bthandler
probin          |
proconfig       |
proacl          |

校驗索引列及比較函數

查找pg_attribute，校驗create index中指定的索引列是否存在，如果存在記錄attno，并且根據atttypid查找對應的比較函數；

校驗pg_attribute

postgres=# select a.*  from pg_class c, pg_attribute a  where c.relname='t001' and c.oid = a.attrelid;
-[ RECORD 8 ]-+---------
attrelid      | 16384
attname       | i
atttypid      | 23
attstattarget | -1
attlen        | 4
attnum        | 2
attndims      | 0
attcacheoff   | -1
atttypmod     | -1
attbyval      | t
attstorage    | p
attalign      | i
attnotnull    | f
atthasdef     | f
atthasmissing | f
attidentity   |
attisdropped  | f
attislocal    | t
attinhcount   | 0
attcollation  | 0
attacl        |
attoptions    |
attfdwoptions |
attmissingval |
-[ RECORD 9 ]-+---------
attrelid      | 16384
attname       | j
atttypid      | 23
attstattarget | -1
attlen        | 4
attnum        | 3
attndims      | 0
attcacheoff   | -1
atttypmod     | -1
attbyval      | t
attstorage    | p
attalign      | i
attnotnull    | f
atthasdef     | f
atthasmissing | f
attidentity   |
attisdropped  | f
attislocal    | t
attinhcount   | 0
attcollation  | 0
attacl        |
attoptions    |
attfdwoptions |
attmissingval |

查找比較函數

其中403是pg_am中btree的handler的oid

postgres=# select * from pg_opclass where  opcmethod = 403 and opcintype  = 23;
-[ RECORD 1 ]+---------
opcmethod    | 403
opcname      | int4_ops
opcnamespace | 11
opcowner     | 10
opcfamily    | 1976
opcintype    | 23
opcdefault   | t
opckeytype   | 0

在經過上述中繼資料校驗之後，可以開始為新的索引檔案生成相關中繼資料了。

建立新索引的中繼資料

在檔案系統中建立索引檔案

生成Oid

DefineIndex -> index_create -> GetNewRelFileNode

為新的索引檔案生成唯一oid，過程是：生成一個新的oid，然後查找pg_class的索引，如果不存在就傳回這個oid。

跟新本地的relcache

把正在建立的relation添加到relcache系統中

DefineIndex -> index_create -> heap_create -> RelationBuildLocalRelation

建立檔案

DefineIndex -> index_create -> heap_create -> RelationCreateStorage

檔案系統中生成新的檔案；

路徑：
 "file-dio:///home/postgres/tmp_datadir_polardb_pg_1100_bld/base/13881/16391"

生成wal日志，類型是

XLogInsert(RM_SMGR_ID, XLOG_SMGR_CREATE | XLR_SPECIAL_REL_UPDATE)

建立新索引的Catalog中繼資料

pg_class

新的索引檔案也是一個表檔案，是以需要插入到pg_class中：

DefineIndex -> index_create -> InsertPgClassTuple

postgres=# select * from pg_class where relname = 't001_i_j';
-[ RECORD 1 ]-------+---------
relname             | t001_i_j
relnamespace        | 2200
reltype             | 0
reloftype           | 0
relowner            | 10
relam               | 403
relfilenode         | 16396
reltablespace       | 0
relpages            | 6
reltuples           | 1100
relallvisible       | 0
reltoastrelid       | 0
relhasindex         | f
relisshared         | f
relpersistence      | p
relkind             | i
relnatts            | 2
relchecks           | 0
relhasoids          | f
relhasrules         | f
relhastriggers      | f
relhassubclass      | f
relrowsecurity      | f
relforcerowsecurity | f
relispopulated      | t
relreplident        | n
relispartition      | f
relrewrite          | 0
relfrozenxid        | 0
relminmxid          | 0
relacl              |
reloptions          |

pg_attribute

把索引檔案引用的列插入pg_attr中：

DefineIndex -> index_create -> AppendAttributeTuples

postgres=# select a.*  from pg_class c, pg_attribute a  where c.relname='t001_i_j' and c.oid = a.attrelid;
-[ RECORD 1 ]-+------
attrelid      | 16396
attname       | i
atttypid      | 23
attstattarget | -1
attlen        | 4
attnum        | 1
attndims      | 0
attcacheoff   | -1
atttypmod     | -1
attbyval      | t
attstorage    | p
attalign      | i
attnotnull    | f
atthasdef     | f
atthasmissing | f
attidentity   |
attisdropped  | f
attislocal    | t
attinhcount   | 0
attcollation  | 0
attacl        |
attoptions    |
attfdwoptions |
attmissingval |
-[ RECORD 2 ]-+------
attrelid      | 16396
attname       | j
atttypid      | 23
attstattarget | -1
attlen        | 4
attnum        | 2
attndims      | 0
attcacheoff   | -1
atttypmod     | -1
attbyval      | t
attstorage    | p
attalign      | i
attnotnull    | f
atthasdef     | f
atthasmissing | f
attidentity   |
attisdropped  | f
attislocal    | t
attinhcount   | 0
attcollation  | 0
attacl        |
attoptions    |
attfdwoptions |
attmissingval |
postgres=#

pg_index

把索引本身相關資訊插入pg_index中，如果索引包含了表達式，則把表達式通過nodeToString序列化成字元串：

DefineIndex -> index_create -> UpdateIndexRelation

postgres=# select * from pg_indexes where indexname = 't001_i_j';
-[ RECORD 1 ]-------------------------------------------------------
schemaname | public
tablename  | t001
indexname  | t001_i_j
tablespace |
indexdef   | CREATE INDEX t001_i_j ON public.t001 USING btree (i, j)
postgres=#

relcache生效

為了使catalog中繼資料的變更對所有程序生效，把該heap相關的中繼資料invalid掉，此處僅僅是注冊失效函數，在CommandCounterIncrement，開啟執行下一個command時才真正執行invalid。

DefineIndex -> index_create -> CacheInvalidateRelcache

pg_depend

記錄該索引對heap的依賴，對collations的依賴，對opclass的依賴，

比如：
t001_i_j依賴表 t001的i；
t001_i_j依賴表 t001的j；

postgres=# select * from pg_depend where objid = 16396;
-[ RECORD 1 ]------
classid     | 1259
objid       | 16396
objsubid    | 0
refclassid  | 1259
refobjid    | 16384
refobjsubid | 2
deptype     | a
-[ RECORD 2 ]------
classid     | 1259
objid       | 16396
objsubid    | 0
refclassid  | 1259
refobjid    | 16384
refobjsubid | 3
deptype     | a

CommandCounterIncrement

使得新索引檔案相關的relcache生效

建構B-Tree索引

create index時通過"btree" ，找到bthandler函數，找到btbuild

DefineIndex -> index_create -> index_build -> btbuild

排序

建構sortkey

通過index的索引列建構排序時需要用到的sortkey：

btbuild -> _bt_spools_heapscan -> tuplesort_begin_index_btree

掃描heap表

在非concurrent模式下create index時，為了掃描所有的tpule，snapshot使用SnapshotAny，

btbuild -> _bt_spools_heapscan -> IndexBuildHeapScan -> IndexBuildHeapRangeScan -> heap_getnext

自下向上建構B-Tree索引page

逐個讀取排好序的tuple，填充到B-Tree的葉子節點上，自下向上插入B-Tree，插入葉子節點可能會遞歸的觸發起父節點也插入。

page layout

索引頁面的記憶體布局整體上是遵循堆表的布局：pageheader，行指針數組構成。

不同點是：

1. 頁面的尾巴上多了BTree自定義的BTPageOpaqueData結構，和左右鄰居頁面組織成連結清單；

2. 每個行指針指向的内容由IndexTupleData和key組成；

PostgreSQL 建立B-Tree索引的過程例子校驗新索引的Catalog中繼資料建立新索引的中繼資料建構B-Tree索引

填充率

向剛剛建立出來的索引插入新的資料時，為了避免split，在第一次建構索引時，每個頁面上都預留了一些空間：

1. 葉子節點90%；

2. 中間節點70%；

自下向上建構

填充葉子page：leaf-0;
當leaf-0滿了，leaf-0落盤前需要把它的鄰居和父節點相關指針更新；
配置設定一個paret0，leaf-0插入parent0中；
配置設定一個新的頁面leaf-1，leaf-0的右指針指向leaf-1；
leaft-0可以落盤；
leaft-1也滿了後，過程和2到6一樣；
當leaf-2滿了後，要把leaf-2插入父節點，此時父節點也滿了，是以要遞歸的把父節點落盤，過程和2到6一樣；

leaf-0

leaf-0滿了

leaf-1滿了

leaf-2滿了，先遞歸處理parent0滿的情況

leaf-2滿了，再處理leaf-2自身滿的情況

indextuple和scankey的語義

對于葉子節點，indextuple指向heap表的行指針；
對于中間節點，indextuple指向大于scankey的頁面，是一個左閉右開區間[scankey, )；

索引頁面中真正存放的是indextuple和scankey的組合，在page這個層面中把這兩個當做一個item，長度是兩者之和；

在實際查找時，同時linp定位到indextuple，由于indextuple是定長的，隻需要再往前移動sizeof(indextuple)就能找到scankey的datum和null區域；

根據index的attribute desc來解析datum和null；

key space

對于-Tree中間節點的任意一層，它的所有的indextuple和scankey并集在一起是[負無窮，正無窮]。

由于一對indextuple和scankey組合表達的是左閉右開區間[scankey, )：

1. 對于最右側的scankey (k4)，它的語義本身就是[scankey，正無窮)；

2. 對于最左側的scankey (k1)，這個key本身代表的是負無窮的語義：

使用該層出現的第一個key内容(記錄在pagesate->minkey)；

indextuple中隻記錄blkno，使用offset來記錄attr為0；

不存儲scankey的datum和null；

if (last_off == P_HIKEY)
    {
        Assert(state->btps_minkey == NULL);
        state->btps_minkey = CopyIndexTuple(itup);
        BTreeTupleSetNAtts(state->btps_minkey, 0);
    }
    
    ...
    
    if (!P_ISLEAF(opaque) && itup_off == P_FIRSTKEY)
    {
        trunctuple = *itup;
        trunctuple.t_info = sizeof(IndexTupleData);
        BTreeTupleSetNAtts(&trunctuple, 0);
        itup = &trunctuple;
        itemsize = sizeof(IndexTupleData); //大小僅僅是indextuple，去除了datum和null
    }

high key和first key

每個非最右側頁面都有highkey，highkey的語義是改頁面所有key的上界（不包含），是以最右側頁面沒有highkey，它的上界是正無窮。

頁面1的highkey是在頁面1滿了之後，生成頁面2時确定下來的：

1. 把頁面1的highkey内容拷貝到頁面2，做為頁面2的FIRSTKEY；

2. 設定頁面1的P_HIHEY指向最後一個行指針last_off，并且标記last_off為無效指針；

是以，頁面1的highkey是頁面2的FIRSTKEY。

未滿頁面的落盤

當所有的heap表都被插入了之後，還沒有達到滿的頁面需要落盤，在自下向上的過程中，插入父節點時是一個遞歸的過程，為了維護每次遞歸的狀态資訊，每層的頁面用一個BTPageState結構來描述：

記錄level；
記錄目前層的最右側的page；
記錄minkey；

BTPageState結構是一個單向連結清單，是以在插入了所有heap表之後，隻需要周遊這個連結清單，從葉子節點開始落盤。

注意：過程中會往往父節點插入(indextuple,scankey)，可能會觸發父節點滿了而落盤，因而産生一個新的父節點，所屬層的BTPageState會被更新，沿着BTPageState連結清單會最終把這個新産生的頁面也落盤了。

metapage

metapage記錄B-Tree：level，root，fastroot等資訊。

metapge始終在索引檔案的第0個page，root的位置是不固定的。

PostgreSQL 建立B-Tree索引的過程例子校驗新索引的Catalog中繼資料建立新索引的中繼資料建構B-Tree索引

例子