postgresql 支援自定義操作符,本質上是調用函數來實作的。
文法如下:
例如建立一個求兩個值的平均值的操作符:
首選要建立函數
postgres=# create function f_avg(numeric,numeric) returns numeric as $$
postgres$# select ($1+$2)/2;
postgres$# $$ language sql strict;
create function
驗證函數
postgres=# select f_avg(1,null);
f_avg
-------
(1 row)
postgres=# select f_avg(1,2);
f_avg
--------------------
1.5000000000000000
建立操作符,指定左右參數類型,調用的函數名,commutator是一個和優化器相關的選項,我後面會重點介紹:
postgres=# create operator ## (procedure=f_avg, leftarg=numeric, rightarg=numeric, commutator='##');
create operator
postgres=# select 1 ## 2;
?column?
注意到在建立操作符的文法中有6個和優化器有關的關鍵字:
[, commutator = com_op ] [, negator = neg_op ]
[, restrict = res_proc ] [, join = join_proc ]
[, hashes ] [, merges ]
介紹如下:
假設x表示操作符左側的參數,y表示操作符右側的參數
1. commutator,指明x op1 y等效于y op2 x,即操作數調換,傳回的值一樣。例如2>1 和1<2結果是一緻的。那麼>就是<的commutator或者反之。又例如1+2和2+1是等價的,那麼+就是+的commutator。commutator隻需要在建立其中一個操作符時指定,建立另一個對應的操作符時可以不需要指定,postgresql會自動建立這個關系。例如建立>操作符時指定了它的commutator是<,那麼在建立<操作符時可以不需要指定>是它的commutator。
另外需要注意,有commutator操作符的操作符的左右兩側的參數類型必須一緻,這樣才能滿足x op1 y等價于y op2 x。
優化器如何利用commutator呢?例如索引掃描,必須列在操作符的左側才能使用索引。1 > tbl.c這個條件,如果>沒有commutator的話,是不能使用索引的。
例子,以int4的>和<操作符為例,實驗一下:
>和<在postgresql中是一對commutator
postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;
oprcom
--------------
pg_catalog.<
postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;
pg_catalog.>
記錄他們的oprcom對應的oid
postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;
oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprc
ode | oprrest | oprjoin
---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----
----+-------------+-----------------
> | 11 | 10 | b | f | f | 23 | 23 | 16 | 97 | 523 | int4
gt | scalargtsel | scalargtjoinsel
postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;
< | 11 | 10 | b | f | f | 23 | 23 | 16 | 521 | 525 | int4
lt | scalarltsel | scalarltjoinsel
接下來我要通過更新pg_operator解除他們的commutator關系,設定為0即可。
postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc;
update 1
postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc;
建立測試表,插入測試資料,建立索引:
postgres=# create table tbl(id int);
create table
postgres=# insert into tbl select generate_series(1,100000);
insert 0 100000
postgres=# create index idx_tbl_id on tbl(id);
create index
将列放在條件的左邊可以走索引,但是放在右邊不走索引。因為優化器不能決定>,<是否為commutator
postgres=# explain select * from tbl where id<10;
query plan
---------------------------------------------------------------------------
index only scan using idx_tbl_id on tbl (cost=0.29..8.45 rows=9 width=4)
index cond: (id < 10)
(2 rows)
postgres=# explain select * from tbl where 10>id;
query plan
----------------------------------------------------------
seq scan on tbl (cost=0.00..1361.00 rows=33333 width=4)
filter: (10 > id)
重建立立這兩個 operator的commutator關系後,優化器會自動将10>id轉換為id<10,并且走索引了:
postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;
postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;
2. negator,指x op1 y 等價于 not(y op2 x),或者x op1等價于not( y op2),或者op1 x 等價于not(op2 y),是以negator支援一進制和二進制操作符。
例子:
如果=和<>是一對negator操作符,not (x = y) 可以簡化為 x <> y。
postgres=# explain select * from tbl where 10=id;
index only scan using idx_tbl_id on tbl (cost=0.29..8.31 rows=1 width=4)
index cond: (id = 10)
postgres=# explain select * from tbl where not(10<>id);
同樣,操作符兩側參數x,y的類型必須一緻。并且僅适用于傳回布爾邏輯類型的操作符。
3. restrict,是用于評估選擇性的函數,僅适用于二進制操作符,例如where col>100,這個查詢條件,如何評估選擇性呢?是通過操作符的restrict來指定的,選擇性乘以pg_class.reltuples就可以評估得到這個查詢條件的行數。
選擇性函數的代碼在 src/backend/utils/adt/
包括
-rw-r--r--. 1 1107 1107 33191 jun 10 03:29 array_selfuncs.c
-rw-r--r--. 1 1107 1107 2316 jun 10 03:29 geo_selfuncs.c
-rw-r--r--. 1 1107 1107 720 jun 10 03:29 network_selfuncs.c
-rw-r--r--. 1 1107 1107 33895 jun 10 03:29 rangetypes_selfuncs.c
-rw-r--r--. 1 1107 1107 218809 jun 10 03:29 selfuncs.c
選擇性函數,還需要依賴資料庫的統計資訊,進而計算選擇性,常見的選擇性計算函數有:
postgres=# select distinct oprrest from pg_operator order by 1;
oprrest
-
eqsel 相等
neqsel 不相等
scalarltsel 小于等于
scalargtsel 大于等于
areasel
positionsel
contsel
iclikesel
icnlikesel
regexeqsel
likesel
icregexeqsel
regexnesel
nlikesel
icregexnesel
rangesel
networksel
tsmatchsel
arraycontsel
(20 rows)
當然,使用者如果自定義資料類型的話,也可以自定義選擇性函數,或者使用以上标準的選擇性函數,隻是可能需要實作一下類型轉換。
源碼中的介紹:
src/backend/utils/adt/selfuncs.c
/*----------
* operator selectivity estimation functions are called to estimate the
* selectivity of where clauses whose top-level operator is their operator.
* we divide the problem into two cases:
* restriction clause estimation: the clause involves vars of just
* one relation. 一種是符合where條件的選擇性(百分比)。
* join clause estimation: the clause involves vars of multiple rels.
* join selectivity estimation is far more difficult and usually less accurate
* than restriction estimation. -- join的選擇性評估通常沒有where條件的選擇性準确。
*
* when dealing with the inner scan of a nestloop join, we consider the
* join's joinclauses as restriction clauses for the inner relation, and
* treat vars of the outer relation as parameters (a/k/a constants of unknown
* values). so, restriction estimators need to be able to accept an argument
* telling which relation is to be treated as the variable.
在使用nestloop join時,一個表的字段将作為變量,另一個表的字段(及其統計資訊)與操作符作為join評估子句。
* the call convention for a restriction estimator (oprrest function) is
* selectivity oprrest (plannerinfo *root,
* oid operator,
* list *args,
* int varrelid);
* 評估選擇性需要4個參數:
* root: general information about the query (rtable and reloptinfo lists
* are particularly important for the estimator). plannerinfo資訊。
* operator: oid of the specific operator in question. 操作符的oid
* args: argument list from the operator clause. 操作符子句中的參數清單
* varrelid: if not zero, the relid (rtable index) of the relation to
* be treated as the variable relation. may be zero if the args list
* is known to contain vars of only one relation. 表示where條件所包含的參數來自哪些relation。
* this is represented at the sql level (in pg_proc) as
* float8 oprrest (internal, oid, internal, int4); 在pg_proc資料字典中表示為oprrest指定的函數。
* the result is a selectivity, that is, a fraction (0 to 1) of the rows
* of the relation that are expected to produce a true result for the
* given operator. 選擇性函數的評估結果就是一個百分比。乘以pg_class.reltuples就可以得到記錄數。
* the call convention for a join estimator (oprjoin function) is similar
* except that varrelid is not needed, and instead join information is
* supplied:
* join選擇性的計算函數與where選擇性的計算函數參數有輕微差别,麼有varrelid, 增加了join資訊的參數。
* selectivity oprjoin (plannerinfo *root,
* jointype jointype,
* specialjoininfo *sjinfo);
* float8 oprjoin (internal, oid, internal, int2, internal);
* (before postgres 8.4, join estimators had only the first four of these
* parameters. that signature is still allowed, but deprecated.) the
* relationship between jointype and sjinfo is explained in the comments for
* clause_selectivity() --- the short version is that jointype is usually
* best ignored in favor of examining sjinfo.
* join selectivity for regular inner and outer joins is defined as the
* fraction (0 to 1) of the cross product of the relations that is expected
* to produce a true result for the given operator. for both semi and anti (半連接配接與預連接配接)
* joins, however, the selectivity is defined as the fraction of the left-hand
* side relation's rows that are expected to have a match (ie, at least one
* row with a true result) in the right-hand side.
* for both oprrest and oprjoin functions, the operator's input collation oid
* (if any) is passed using the standard fmgr mechanism, so that the estimator
* function can fetch it with pg_get_collation(). note, however, that all
* statistics in pg_statistic are currently built using the database's default
* collation. thus, in most cases where we are looking at statistics, we
* should ignore the actual operator collation and use default_collation_oid.
* we expect that the error induced by doing this is usually not large enough
* to justify complicating matters.
*----------
4. join,是joinsel即join的選擇性計算函數。
對應pg_operator.oprjoin
postgres=# select distinct oprjoin from pg_operator order by 1;
oprjoin
------------------
eqjoinsel
neqjoinsel
scalarltjoinsel
scalargtjoinsel
areajoinsel
positionjoinsel
contjoinsel
iclikejoinsel
icnlikejoinsel
regexeqjoinsel
likejoinsel
icregexeqjoinsel
regexnejoinsel
nlikejoinsel
icregexnejoinsel
networkjoinsel
tsmatchjoinsel
arraycontjoinsel
(19 rows)
5. hashes
6. merges
hashes和merges表示該操作符是否允許hash join和merge join, 隻有傳回布爾邏輯值的二進制操作符滿足這個要求。
我們在pg_operator這個catalog中也可以檢視到對應的介紹:
name
type
references
description
oid
row identifier (hidden attribute; must be explicitly selected)
oprname
name of the operator
oprnamespace
pg_namespace.oid
the oid of the namespace that contains this operator
oprowner
pg_authid.oid
owner of the operator
oprkind
char
b = infix ("between"), l = prefix ("left"), r = postfix ("right")
指定操作符在什麼位置,例如中間,左側,右側
oprcanmerge
bool
this operator supports merge joins
此操作符是否支援merge join
oprcanhash
this operator supports hash joins
此操作符是否支援hash join
oprleft
pg_type.oid
type of the left operand
操作符左側的資料類型
oprright
type of the right operand
操作符右側的資料類型
oprresult
type of the result
傳回結果的資料類型
oprcom
pg_operator.oid
commutator of this operator, if any
oprnegate
negator of this operator, if any
oprcode
regproc
pg_proc.oid
function that implements this operator
oprrest
restriction selectivity estimation function for this operator
oprjoin
join selectivity estimation function for this operator
[參考]1. http://www.postgresql.org/docs/9.4/static/sql-createoperator.html
2. http://www.postgresql.org/docs/9.4/static/xoper-optimization.html