天天看點

PostgreSQL 操作符與優化器詳解

postgresql 支援自定義操作符,本質上是調用函數來實作的。

文法如下:

例如建立一個求兩個值的平均值的操作符:

首選要建立函數

postgres=# create function f_avg(numeric,numeric) returns numeric as $$

postgres$#   select ($1+$2)/2;

postgres$# $$ language sql strict;

create function

驗證函數

postgres=# select f_avg(1,null);

 f_avg 

-------

(1 row)

postgres=# select f_avg(1,2);

       f_avg        

--------------------

 1.5000000000000000

建立操作符,指定左右參數類型,調用的函數名,commutator是一個和優化器相關的選項,我後面會重點介紹:

postgres=# create operator ## (procedure=f_avg, leftarg=numeric, rightarg=numeric, commutator='##');

create operator

postgres=# select 1 ## 2;

      ?column?      

注意到在建立操作符的文法中有6個和優化器有關的關鍵字:

    [, commutator = com_op ] [, negator = neg_op ]

    [, restrict = res_proc ] [, join = join_proc ]

    [, hashes ] [, merges ]

介紹如下:

假設x表示操作符左側的參數,y表示操作符右側的參數

1. commutator,指明x op1 y等效于y op2 x,即操作數調換,傳回的值一樣。例如2>1 和1<2結果是一緻的。那麼>就是<的commutator或者反之。又例如1+2和2+1是等價的,那麼+就是+的commutator。commutator隻需要在建立其中一個操作符時指定,建立另一個對應的操作符時可以不需要指定,postgresql會自動建立這個關系。例如建立>操作符時指定了它的commutator是<,那麼在建立<操作符時可以不需要指定>是它的commutator。

另外需要注意,有commutator操作符的操作符的左右兩側的參數類型必須一緻,這樣才能滿足x op1 y等價于y op2 x。

優化器如何利用commutator呢?例如索引掃描,必須列在操作符的左側才能使用索引。1 > tbl.c這個條件,如果>沒有commutator的話,是不能使用索引的。

例子,以int4的>和<操作符為例,實驗一下:

>和<在postgresql中是一對commutator

postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;

    oprcom    

--------------

 pg_catalog.<

postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;

 pg_catalog.>

記錄他們的oprcom對應的oid

postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;

 oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprc

ode |   oprrest   |     oprjoin     

---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----

----+-------------+-----------------

 >       |           11 |       10 | b       | f           | f          |      23 |       23 |        16 |     97 |       523 | int4

gt  | scalargtsel | scalargtjoinsel

postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;

 <       |           11 |       10 | b       | f           | f          |      23 |       23 |        16 |    521 |       525 | int4

lt  | scalarltsel | scalarltjoinsel

接下來我要通過更新pg_operator解除他們的commutator關系,設定為0即可。

postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc;

update 1

postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc;

建立測試表,插入測試資料,建立索引:

postgres=# create table tbl(id int);

create table

postgres=# insert into tbl select generate_series(1,100000);

insert 0 100000

postgres=# create index idx_tbl_id on tbl(id);

create index

将列放在條件的左邊可以走索引,但是放在右邊不走索引。因為優化器不能決定>,<是否為commutator

postgres=# explain select * from tbl where id<10;

                                query plan                                 

---------------------------------------------------------------------------

 index only scan using idx_tbl_id on tbl  (cost=0.29..8.45 rows=9 width=4)

   index cond: (id < 10)

(2 rows)

postgres=# explain select * from tbl where 10>id;

                        query plan                        

----------------------------------------------------------

 seq scan on tbl  (cost=0.00..1361.00 rows=33333 width=4)

   filter: (10 > id)

重建立立這兩個 operator的commutator關系後,優化器會自動将10>id轉換為id<10,并且走索引了:

postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;

postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;

2. negator,指x op1 y 等價于 not(y op2 x),或者x op1等價于not( y op2),或者op1 x 等價于not(op2 y),是以negator支援一進制和二進制操作符。

例子:

如果=和<>是一對negator操作符,not (x = y) 可以簡化為 x <> y。

postgres=# explain select * from tbl where 10=id;

 index only scan using idx_tbl_id on tbl  (cost=0.29..8.31 rows=1 width=4)

   index cond: (id = 10)

postgres=# explain select * from tbl where not(10<>id);

同樣,操作符兩側參數x,y的類型必須一緻。并且僅适用于傳回布爾邏輯類型的操作符。

3. restrict,是用于評估選擇性的函數,僅适用于二進制操作符,例如where col>100,這個查詢條件,如何評估選擇性呢?是通過操作符的restrict來指定的,選擇性乘以pg_class.reltuples就可以評估得到這個查詢條件的行數。

選擇性函數的代碼在 src/backend/utils/adt/ 

包括

-rw-r--r--. 1 1107 1107  33191 jun 10 03:29 array_selfuncs.c

-rw-r--r--. 1 1107 1107   2316 jun 10 03:29 geo_selfuncs.c

-rw-r--r--. 1 1107 1107    720 jun 10 03:29 network_selfuncs.c

-rw-r--r--. 1 1107 1107  33895 jun 10 03:29 rangetypes_selfuncs.c

-rw-r--r--. 1 1107 1107 218809 jun 10 03:29 selfuncs.c

選擇性函數,還需要依賴資料庫的統計資訊,進而計算選擇性,常見的選擇性計算函數有:

postgres=# select distinct oprrest from pg_operator order by 1;

   oprrest    

 -

 eqsel  相等

 neqsel  不相等

 scalarltsel  小于等于

 scalargtsel  大于等于

 areasel

 positionsel

 contsel

 iclikesel

 icnlikesel

 regexeqsel

 likesel

 icregexeqsel

 regexnesel

 nlikesel

 icregexnesel

 rangesel

 networksel

 tsmatchsel

 arraycontsel

(20 rows)

當然,使用者如果自定義資料類型的話,也可以自定義選擇性函數,或者使用以上标準的選擇性函數,隻是可能需要實作一下類型轉換。

源碼中的介紹:

src/backend/utils/adt/selfuncs.c

/*----------

 * operator selectivity estimation functions are called to estimate the

 * selectivity of where clauses whose top-level operator is their operator.

 * we divide the problem into two cases:

 *              restriction clause estimation: the clause involves vars of just

 *                      one relation.  一種是符合where條件的選擇性(百分比)。

 *              join clause estimation: the clause involves vars of multiple rels.

 * join selectivity estimation is far more difficult and usually less accurate

 * than restriction estimation.  -- join的選擇性評估通常沒有where條件的選擇性準确。

 *

 * when dealing with the inner scan of a nestloop join, we consider the

 * join's joinclauses as restriction clauses for the inner relation, and

 * treat vars of the outer relation as parameters (a/k/a constants of unknown

 * values).  so, restriction estimators need to be able to accept an argument

 * telling which relation is to be treated as the variable.

在使用nestloop join時,一個表的字段将作為變量,另一個表的字段(及其統計資訊)與操作符作為join評估子句。

 * the call convention for a restriction estimator (oprrest function) is

 *              selectivity oprrest (plannerinfo *root,

 *                                                       oid operator,

 *                                                       list *args,

 *                                                       int varrelid);

 * 評估選擇性需要4個參數:

 * root: general information about the query (rtable and reloptinfo lists

 * are particularly important for the estimator).   plannerinfo資訊。

 * operator: oid of the specific operator in question. 操作符的oid

 * args: argument list from the operator clause.  操作符子句中的參數清單

 * varrelid: if not zero, the relid (rtable index) of the relation to

 * be treated as the variable relation.  may be zero if the args list

 * is known to contain vars of only one relation.   表示where條件所包含的參數來自哪些relation。

 * this is represented at the sql level (in pg_proc) as

 *              float8 oprrest (internal, oid, internal, int4);   在pg_proc資料字典中表示為oprrest指定的函數。

 * the result is a selectivity, that is, a fraction (0 to 1) of the rows

 * of the relation that are expected to produce a true result for the

 * given operator.  選擇性函數的評估結果就是一個百分比。乘以pg_class.reltuples就可以得到記錄數。

 * the call convention for a join estimator (oprjoin function) is similar

 * except that varrelid is not needed, and instead join information is

 * supplied:

 * join選擇性的計算函數與where選擇性的計算函數參數有輕微差别,麼有varrelid, 增加了join資訊的參數。

 *              selectivity oprjoin (plannerinfo *root,

 *                                                       jointype jointype,

 *                                                       specialjoininfo *sjinfo);

 *              float8 oprjoin (internal, oid, internal, int2, internal);

 * (before postgres 8.4, join estimators had only the first four of these

 * parameters.  that signature is still allowed, but deprecated.)  the

 * relationship between jointype and sjinfo is explained in the comments for

 * clause_selectivity() --- the short version is that jointype is usually

 * best ignored in favor of examining sjinfo.

 * join selectivity for regular inner and outer joins is defined as the

 * fraction (0 to 1) of the cross product of the relations that is expected

 * to produce a true result for the given operator.  for both semi and anti  (半連接配接與預連接配接)

 * joins, however, the selectivity is defined as the fraction of the left-hand

 * side relation's rows that are expected to have a match (ie, at least one

 * row with a true result) in the right-hand side.

 * for both oprrest and oprjoin functions, the operator's input collation oid

 * (if any) is passed using the standard fmgr mechanism, so that the estimator

 * function can fetch it with pg_get_collation().  note, however, that all

 * statistics in pg_statistic are currently built using the database's default

 * collation.  thus, in most cases where we are looking at statistics, we

 * should ignore the actual operator collation and use default_collation_oid.

 * we expect that the error induced by doing this is usually not large enough

 * to justify complicating matters.

 *----------

4. join,是joinsel即join的選擇性計算函數。

對應pg_operator.oprjoin

postgres=# select distinct oprjoin from pg_operator order by 1;

     oprjoin      

------------------

 eqjoinsel

 neqjoinsel

 scalarltjoinsel

 scalargtjoinsel

 areajoinsel

 positionjoinsel

 contjoinsel

 iclikejoinsel

 icnlikejoinsel

 regexeqjoinsel

 likejoinsel

 icregexeqjoinsel

 regexnejoinsel

 nlikejoinsel

 icregexnejoinsel

 networkjoinsel

 tsmatchjoinsel

 arraycontjoinsel

(19 rows)

5. hashes

6. merges

hashes和merges表示該操作符是否允許hash join和merge join, 隻有傳回布爾邏輯值的二進制操作符滿足這個要求。

我們在pg_operator這個catalog中也可以檢視到對應的介紹:

name

type

references

description

oid

row identifier (hidden attribute; must be explicitly selected)

oprname

name of the operator

oprnamespace

pg_namespace.oid

the oid of the namespace that contains this operator

oprowner

pg_authid.oid

owner of the operator

oprkind

char

b = infix ("between"), l = prefix ("left"), r = postfix ("right")

指定操作符在什麼位置,例如中間,左側,右側

oprcanmerge

bool

this operator supports merge joins

此操作符是否支援merge join

oprcanhash

this operator supports hash joins

此操作符是否支援hash join

oprleft

pg_type.oid

type of the left operand

操作符左側的資料類型

oprright

type of the right operand

操作符右側的資料類型

oprresult

type of the result

傳回結果的資料類型

oprcom

pg_operator.oid

commutator of this operator, if any

oprnegate

negator of this operator, if any

oprcode

regproc

pg_proc.oid

function that implements this operator

oprrest

restriction selectivity estimation function for this operator

oprjoin

join selectivity estimation function for this operator

[參考]1. http://www.postgresql.org/docs/9.4/static/sql-createoperator.html

2. http://www.postgresql.org/docs/9.4/static/xoper-optimization.html

繼續閱讀