PostgreSQL 操作符與優化器詳解

postgresql 支援自定義操作符，本質上是調用函數來實作的。

文法如下：

例如建立一個求兩個值的平均值的操作符：

首選要建立函數

postgres=# create function f_avg(numeric,numeric) returns numeric as $$

postgres$# select ($1+$2)/2;

postgres$# $$ language sql strict;

create function

驗證函數

postgres=# select f_avg(1,null);

f_avg

-------

(1 row)

postgres=# select f_avg(1,2);

f_avg

--------------------

1.5000000000000000

建立操作符，指定左右參數類型，調用的函數名，commutator是一個和優化器相關的選項，我後面會重點介紹：

postgres=# create operator ## (procedure=f_avg, leftarg=numeric, rightarg=numeric, commutator='##');

create operator

postgres=# select 1 ## 2;

?column?

注意到在建立操作符的文法中有6個和優化器有關的關鍵字：

[, commutator = com_op ] [, negator = neg_op ]

[, restrict = res_proc ] [, join = join_proc ]

[, hashes ] [, merges ]

介紹如下：

假設x表示操作符左側的參數，y表示操作符右側的參數

1. commutator，指明x op1 y等效于y op2 x，即操作數調換，傳回的值一樣。例如2>1 和1<2結果是一緻的。那麼>就是<的commutator或者反之。又例如1+2和2+1是等價的，那麼+就是+的commutator。commutator隻需要在建立其中一個操作符時指定，建立另一個對應的操作符時可以不需要指定，postgresql會自動建立這個關系。例如建立>操作符時指定了它的commutator是<，那麼在建立<操作符時可以不需要指定>是它的commutator。

另外需要注意，有commutator操作符的操作符的左右兩側的參數類型必須一緻，這樣才能滿足x op1 y等價于y op2 x。

優化器如何利用commutator呢？例如索引掃描，必須列在操作符的左側才能使用索引。1 > tbl.c這個條件，如果>沒有commutator的話，是不能使用索引的。

例子，以int4的>和<操作符為例，實驗一下：

>和<在postgresql中是一對commutator

postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;

oprcom

--------------

pg_catalog.<

postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;

pg_catalog.>

記錄他們的oprcom對應的oid

postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;

ode | oprrest | oprjoin

---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----

----+-------------+-----------------

> | 11 | 10 | b | f | f | 23 | 23 | 16 | 97 | 523 | int4

gt | scalargtsel | scalargtjoinsel

postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;

< | 11 | 10 | b | f | f | 23 | 23 | 16 | 521 | 525 | int4

lt | scalarltsel | scalarltjoinsel

接下來我要通過更新pg_operator解除他們的commutator關系，設定為0即可。

postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc;

update 1

postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc;

建立測試表，插入測試資料，建立索引：

postgres=# create table tbl(id int);

create table

postgres=# insert into tbl select generate_series(1,100000);

insert 0 100000

postgres=# create index idx_tbl_id on tbl(id);

create index

将列放在條件的左邊可以走索引，但是放在右邊不走索引。因為優化器不能決定>,<是否為commutator

postgres=# explain select * from tbl where id<10;

query plan

---------------------------------------------------------------------------

index only scan using idx_tbl_id on tbl (cost=0.29..8.45 rows=9 width=4)

index cond: (id < 10)

(2 rows)

postgres=# explain select * from tbl where 10>id;

query plan

----------------------------------------------------------

seq scan on tbl (cost=0.00..1361.00 rows=33333 width=4)

filter: (10 > id)

重建立立這兩個 operator的commutator關系後，優化器會自動将10>id轉換為id<10，并且走索引了：

postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;

postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;

2. negator，指x op1 y 等價于 not(y op2 x)，或者x op1等價于not( y op2)，或者op1 x 等價于not(op2 y)，是以negator支援一進制和二進制操作符。

例子:

如果=和<>是一對negator操作符，not (x = y) 可以簡化為 x <> y。

postgres=# explain select * from tbl where 10=id;

index only scan using idx_tbl_id on tbl (cost=0.29..8.31 rows=1 width=4)

index cond: (id = 10)

postgres=# explain select * from tbl where not(10<>id);

同樣，操作符兩側參數x,y的類型必須一緻。并且僅适用于傳回布爾邏輯類型的操作符。

3. restrict，是用于評估選擇性的函數，僅适用于二進制操作符，例如where col>100，這個查詢條件，如何評估選擇性呢？是通過操作符的restrict來指定的，選擇性乘以pg_class.reltuples就可以評估得到這個查詢條件的行數。

選擇性函數的代碼在 src/backend/utils/adt/

包括

-rw-r--r--. 1 1107 1107 33191 jun 10 03:29 array_selfuncs.c

-rw-r--r--. 1 1107 1107 2316 jun 10 03:29 geo_selfuncs.c

-rw-r--r--. 1 1107 1107 720 jun 10 03:29 network_selfuncs.c

-rw-r--r--. 1 1107 1107 33895 jun 10 03:29 rangetypes_selfuncs.c

-rw-r--r--. 1 1107 1107 218809 jun 10 03:29 selfuncs.c

選擇性函數，還需要依賴資料庫的統計資訊，進而計算選擇性，常見的選擇性計算函數有：

postgres=# select distinct oprrest from pg_operator order by 1;

oprrest

eqsel 相等

neqsel 不相等

scalarltsel 小于等于

scalargtsel 大于等于

areasel

positionsel

contsel

iclikesel

icnlikesel

regexeqsel

likesel

icregexeqsel

regexnesel

nlikesel

icregexnesel

rangesel

networksel

tsmatchsel

arraycontsel

(20 rows)

當然，使用者如果自定義資料類型的話，也可以自定義選擇性函數，或者使用以上标準的選擇性函數，隻是可能需要實作一下類型轉換。

源碼中的介紹：

src/backend/utils/adt/selfuncs.c

/*----------

* operator selectivity estimation functions are called to estimate the

* selectivity of where clauses whose top-level operator is their operator.

* we divide the problem into two cases:

* restriction clause estimation: the clause involves vars of just

* one relation. 一種是符合where條件的選擇性（百分比）。

* join clause estimation: the clause involves vars of multiple rels.

* join selectivity estimation is far more difficult and usually less accurate

* than restriction estimation. -- join的選擇性評估通常沒有where條件的選擇性準确。

* when dealing with the inner scan of a nestloop join, we consider the

* join's joinclauses as restriction clauses for the inner relation, and

* treat vars of the outer relation as parameters (a/k/a constants of unknown

* values). so, restriction estimators need to be able to accept an argument

* telling which relation is to be treated as the variable.

在使用nestloop join時，一個表的字段将作為變量，另一個表的字段（及其統計資訊）與操作符作為join評估子句。

* the call convention for a restriction estimator (oprrest function) is

* selectivity oprrest (plannerinfo *root,

* oid operator,

* list *args,

* int varrelid);

* 評估選擇性需要4個參數:

* root: general information about the query (rtable and reloptinfo lists

* are particularly important for the estimator). plannerinfo資訊。

* operator: oid of the specific operator in question. 操作符的oid

* args: argument list from the operator clause. 操作符子句中的參數清單

* varrelid: if not zero, the relid (rtable index) of the relation to

* be treated as the variable relation. may be zero if the args list

* is known to contain vars of only one relation. 表示where條件所包含的參數來自哪些relation。

* this is represented at the sql level (in pg_proc) as

* float8 oprrest (internal, oid, internal, int4); 在pg_proc資料字典中表示為oprrest指定的函數。

* the result is a selectivity, that is, a fraction (0 to 1) of the rows

* of the relation that are expected to produce a true result for the

* given operator. 選擇性函數的評估結果就是一個百分比。乘以pg_class.reltuples就可以得到記錄數。

* the call convention for a join estimator (oprjoin function) is similar

* except that varrelid is not needed, and instead join information is

* supplied:

* join選擇性的計算函數與where選擇性的計算函數參數有輕微差别，麼有varrelid, 增加了join資訊的參數。

* selectivity oprjoin (plannerinfo *root,

* jointype jointype,

* specialjoininfo *sjinfo);

* float8 oprjoin (internal, oid, internal, int2, internal);

* (before postgres 8.4, join estimators had only the first four of these

* parameters. that signature is still allowed, but deprecated.) the

* relationship between jointype and sjinfo is explained in the comments for

* clause_selectivity() --- the short version is that jointype is usually

* best ignored in favor of examining sjinfo.

* join selectivity for regular inner and outer joins is defined as the

* fraction (0 to 1) of the cross product of the relations that is expected

* to produce a true result for the given operator. for both semi and anti (半連接配接與預連接配接)

* joins, however, the selectivity is defined as the fraction of the left-hand

* side relation's rows that are expected to have a match (ie, at least one

* row with a true result) in the right-hand side.

* for both oprrest and oprjoin functions, the operator's input collation oid

* (if any) is passed using the standard fmgr mechanism, so that the estimator

* function can fetch it with pg_get_collation(). note, however, that all

* statistics in pg_statistic are currently built using the database's default

* collation. thus, in most cases where we are looking at statistics, we

* should ignore the actual operator collation and use default_collation_oid.

* we expect that the error induced by doing this is usually not large enough

* to justify complicating matters.

*----------

4. join，是joinsel即join的選擇性計算函數。

對應pg_operator.oprjoin

postgres=# select distinct oprjoin from pg_operator order by 1;

oprjoin

------------------

eqjoinsel

neqjoinsel

scalarltjoinsel

scalargtjoinsel

areajoinsel

positionjoinsel

contjoinsel

iclikejoinsel

icnlikejoinsel

regexeqjoinsel

likejoinsel

icregexeqjoinsel

regexnejoinsel

nlikejoinsel

icregexnejoinsel

networkjoinsel

tsmatchjoinsel

arraycontjoinsel

(19 rows)

5. hashes

6. merges

hashes和merges表示該操作符是否允許hash join和merge join, 隻有傳回布爾邏輯值的二進制操作符滿足這個要求。

我們在pg_operator這個catalog中也可以檢視到對應的介紹：

name

type

references

description

oid

row identifier (hidden attribute; must be explicitly selected)

oprname

name of the operator

oprnamespace

pg_namespace.oid

the oid of the namespace that contains this operator

oprowner

pg_authid.oid

owner of the operator

oprkind

char

b = infix ("between"), l = prefix ("left"), r = postfix ("right")

指定操作符在什麼位置，例如中間，左側，右側

oprcanmerge

bool

this operator supports merge joins

此操作符是否支援merge join

oprcanhash

this operator supports hash joins

此操作符是否支援hash join

oprleft

pg_type.oid

type of the left operand

操作符左側的資料類型

oprright

type of the right operand

操作符右側的資料類型

oprresult

type of the result

傳回結果的資料類型

oprcom

pg_operator.oid

commutator of this operator, if any

oprnegate

negator of this operator, if any

oprcode

regproc

pg_proc.oid

function that implements this operator

oprrest

restriction selectivity estimation function for this operator

oprjoin

join selectivity estimation function for this operator

[參考]1. http://www.postgresql.org/docs/9.4/static/sql-createoperator.html

2. http://www.postgresql.org/docs/9.4/static/xoper-optimization.html

PostgreSQL 操作符與優化器詳解

繼續閱讀

MySQL - order by排序優化

C#連接配接postgresql資料庫

2021-08-08 mysql索引

關于 java 轉碼 GBK -> UTF-8

postgresql替換換行符

查詢資料庫中存在的回車換行符

Django入門：第一個Django應用

SQLServer 提升查詢速度

postgresql 關于GBK和utf-8字元集問題

詳解SQL中幾種常用的表連接配接方式

oracle 中不使用已有的索引解決辦法

十四、MySQL備份和恢複資料庫1、備份和恢複的方法2、使用mysqldump導出3、恢複轉儲檔案4、字元編碼問題5、鎖表系列結語

對first_name建立唯一索引uniq_idx_firstname問題描述Sql語句

記一次因MySQL編碼問題導緻的慢查詢排查

PostgreSQL t_bits計算方法

為什麼要選擇UniDAC