PostgreSQL 電商業務(任意次元商品圈選應用) - json包range數組的命中優化 - 展開+索引優化

标簽

PostgreSQL , range , jsonb , gist , btree_gist , 展開 , array

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E8%83%8C%E6%99%AF 背景

電商，任意次元商品圈選應用，其中一個查詢請求是這樣的：

求 "某個國家、某個時間點、調價+折扣後的價格" 落在某個價格範圍的商品。

首先需要有的要素包括：

1、商品ID

2、不同國家的商品價格

3、商品原價

4、商品日常價

5、不同時間段的價格折扣

6、調價系數

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E4%BE%8B%E5%AD%90 例子

1、表結構設計

create table t_item (  
  id int8 primary key,   -- 商品ID  
  country jsonb,         -- 每個國家的價格取值範圍  
  price jsonb,           -- 每個時間段的折扣，（時間可能重疊，根據優先級LIMIT 1個折扣）  
  ratio float4           -- 調價比例  
  -- 其他屬性scalar類型, 使用rum或gin索引，本文末尾有案例  
);

2、資料樣本

insert into t_item values (  
  1,  
  jsonb '{"global":{"min": 100, "max":200}, "china":{"min": 120, "max":260}, "us":{"min": 170, "max":300}}',  
  jsonb '{"100|[1514764800,1515542400)":0.4, "200|[1514764800,1515542400)":0.9, "0|[-62135596800,253402214400)":1}',  
  0.1  
);

其中時間區間可以使用epoch表示

postgres=# select extract(epoch from date '2018-01-01');  
 date_part    
------------  
 1514764800  
(1 row)  
  
postgres=# select extract(epoch from date '2018-01-10');  
 date_part    
------------  
 1515542400  
(1 row)  
  
postgres=# select extract(epoch from date '0001-01-01');  
  date_part     
--------------  
 -62135596800  
(1 row)  
  
postgres=# select extract(epoch from date '9999-12-31');  
  date_part     
--------------  
 253402214400  
(1 row)

3、由于不同時間段的折扣不一樣，并且優先級也不一樣，是以，使用一個函數來擷取某個時間點的這塊。

當輸入的時間點有多個時間區間包括它時，取優先級最高的那個折扣，并傳回，如果沒有任何比對的時間區間，則傳回1。

create or replace function get_discount(  
  jsonb,  -- 每個時間段的折扣字段  
  int8    -- epoch 時間值  
) returns float4 as $$  
declare  
  res float4;   
begin  
  -- select split_part(key,'|',1) as priority, split_part(key,'|',2) as ts, value from jsonb_each_text($1);  
  select value into res from jsonb_each_text($1) where split_part(key,'|',2)::int8range @> $2 order by split_part(key,'|',1)::numeric desc limit 1;  
  if found then  
    return res;  
  end if;  
  return 1;  
end;  
$$ language plpgsql strict parallel safe;

postgres=# select get_discount(jsonb '{"100|[1514764800,1515542400)":0.4, "200|[1514764800,1515542400)":0.9, "0|[-62135596800,253402214400)":1}', 100000);  
 get_discount   
--------------  
            1  
(1 row)  
  
postgres=# select get_discount(jsonb '{"100|[1514764800,1515542400)":0.4, "200|[1514764800,1515542400)":0.9, "0|[-62135596800,253402214400)":1}', 1515542200);  
 get_discount   
--------------  
          0.9  
(1 row)

4、不同的國家，價格不一樣，輸入國家編碼，傳回對應國家的價格，如果輸入的編碼在JSONB中沒有，則傳回global的價格。

create or replace function get_price(  
  jsonb,  -- 國家價格區間  
  text    -- 國家編碼  
) returns float8 as $$  
  select case when ($1->$2->>'max')::float8 is not null then ($1->$2->>'max')::float8 else ($1->'global'->>'max')::float8 end;  
$$ language sql strict parallel safe;

postgres=# select get_price(jsonb '{"global":{"min": 100, "max":200}, "china":{"min": 120, "max":260}, "us":{"min": 170, "max":300}}', 'hello');  
 get_price   
-----------  
       200  
(1 row)  
  
  
  
postgres=# select get_price(jsonb '{"global":{"min": 100, "max":200}, "china":{"min": 120, "max":260}, "us":{"min": 170, "max":300}}', 'china');  
 get_price   
-----------  
       260  
(1 row)

5、求 "某個國家、某個時間點、調價+折扣後的價格" 落在某個價格範圍的商品。

SQL

postgres=# select * from t_item where get_price(country, 'china') * get_discount(price, 1515542200) * (1+ratio) < 100;    
 id | country | price | ratio   
----+---------+-------+-------  
(0 rows)  
  
  
postgres=# select * from t_item where get_price(country, 'china') * get_discount(price, 1515542200) * (1+ratio) < 1000;    
 id |                                                 country                                                 |                                                    price                                                     | ratio   
----+---------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+-------  
  1 | {"us": {"max": 300, "min": 170}, "china": {"max": 260, "min": 120}, "global": {"max": 200, "min": 100}} | {"100|[1514764800,1515542400)": 0.4, "200|[1514764800,1515542400)": 0.9, "0|[-62135596800,253402214400)": 1} |   0.1  
(1 row)

6、壓測

寫入5.3億資料

insert into t_item select * from t_item ;  
.....  
insert into t_item select * from t_item ;

單表約 186 GB

postgres=# \dt+ t_item  
                     List of relations  
 Schema |  Name  | Type  |  Owner   |  Size  | Description   
--------+--------+-------+----------+--------+-------------  
 public | t_item | table | postgres | 186 GB |   
(1 row)

7、使用并行計算

postgres=# alter function get_price  ;  
ALTER FUNCTION  
postgres=# alter function get_discount  parallel safe;  
ALTER FUNCTION  
  
postgres=# set max_parallel_workers_per_gather =56;  
SET  
postgres=# alter table t_item set (parallel_workers =56);  
ALTER TABLE  
  
postgres=# set min_parallel_table_scan_size =0;  
SET  
postgres=# set min_parallel_index_scan_size =0;  
SET  
postgres=# set parallel_setup_cost =0;  
SET  
postgres=# set parallel_tuple_cost =0;  
SET

8、最差的情況，沒有一條命中的資料，耗時為處理完5.3億條記錄的耗時

postgres=# explain select * from t_item where get_price(country, 'china') * get_discount(price, 1515542200) * (1+ratio) < 100 ;  
                                                                           QUERY PLAN                                                                              
-----------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather  (cost=1000.00..51024073.42 rows=178956971 width=332)  
   Workers Planned: 32  
   ->  Parallel Seq Scan on t_item  (cost=0.00..33127376.32 rows=5592405 width=332)  
         Filter: (((get_price(country, 'china'::text) * get_discount(price, '1515542200'::bigint)) * ('1'::double precision + ratio)) < '100'::double precision)  
(4 rows)  
  
  
  
postgres=# explain analyze select * from t_item where get_price(country, 'china') * get_discount(price, 1515542200) * (1+ratio) < 100 ;  
                                                                           QUERY PLAN                                                                              
-----------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather  (cost=1000.00..47285151.00 rows=178956971 width=332) (actual time=444448.106..444448.106 rows=0 loops=1)  
   Workers Planned: 56  
   Workers Launched: 56  
   ->  Parallel Seq Scan on t_item  (cost=0.00..29388453.90 rows=3195660 width=332) (actual time=444292.055..444292.055 rows=0 loops=57)  
         Filter: (((get_price(country, 'china'::text) * get_discount(price, '1515542200'::bigint)) * ('1'::double precision + ratio)) < '100'::double precision)  
         Rows Removed by Filter: 9418788  
 Planning Time: 0.072 ms  
 Execution Time: 462253.627 ms  
(8 rows)

56 core 虛拟機，耗時462秒。

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E7%B4%A2%E5%BC%95%E4%BC%98%E5%8C%96 索引優化

将資料展開為兩張表(其中一張可以使用原始表，不需要建立)

其中表1的資料，需要業務方維護，當原價、折扣、調價系數發生變化時，需要實時的更新這裡的記錄。

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E8%A1%A81 表1

折扣區間展開表：

商品ID  
國家  
時間區間  
折後價

create table t_item1 (  
  id int8,  
  country text,  
  ts int8range,  
  price float8,  
  exclude using gist (id with =, country with =, ts with &&)  -- 排他限制，同一個商品ID同一個國家不允許有TS相交的折扣資料
);  
  
create extension IF NOT EXISTS btree_gist;  
  
create index idx_t_item1_1 on t_item1 using gist (country,price,ts);

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E8%A1%A82 表2

正常價格查原始表：

商品ID  
國家  
日常價  
原價  
調價比例

create table t_item2 (  
  id int8,  
  country text,  
  price1 float8,  
  price2 float8,  
  ratio float4,  
  primary key (country, id)  
);  
  
create index idx_t_item2_1 on t_item2 (country, (least(price1*ratio,price2*ratio)));

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#sql

select id from t_item1 where country ='china' and price < 50::float8 and ts @> 10000000::int8  
union  
select id from t_item2  where country='china' and (least(price1*ratio,price2*ratio)) < 50;  

  
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=5.91..5.94 rows=3 width=8)
   Group Key: t_item1.id
   ->  Append  (cost=0.14..5.90 rows=3 width=8)
         ->  Index Scan using idx_t_item1_1 on t_item1  (cost=0.14..2.37 rows=1 width=8)
               Index Cond: ((country = 'china'::text) AND (price < '50'::double precision) AND (ts @> '10000000'::bigint))
         ->  Index Scan using idx_t_item2_1 on t_item2  (cost=0.15..3.49 rows=2 width=8)
               Index Cond: ((country = 'china'::text) AND (LEAST((price1 * ratio), (price2 * ratio)) < '50'::double precision))
(7 rows)

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E5%B0%8F%E7%BB%93 小結

第一種設計，簡化了程式開發，但是無法使用索引掃描，性能會比較差。

第二種設計，當調價比例、原價、折扣資料發生變化時，程式需要維護價格的變更到t_item1表，程式開發上會增加一定的負擔，（當然也可以使用資料庫觸發器來更新，程式偷一下懶，但是不推薦這麼做）。

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E5%8F%82%E8%80%83 參考

《PostgreSQL 函數式索引使用注意 - 暨非immutable函數不适合索引的原因》《PostgreSQL ADHoc(任意字段組合)查詢(rums索引加速) - 非字典化，普通、數組等組合字段生成新數組》《PostgreSQL ADHoc(任意字段組合)查詢與字典化 (rum索引加速) - 實踐與方案1》《PostgreSQL 店鋪營運實踐 - JSON[]數組内部标簽資料等值、範圍檢索100倍+加速示例 (含，單值+多值列合成)》《PostgreSQL UDF實作tsvector(全文檢索), array(數組)多值字段與scalar(單值字段)類型的整合索引(類分區索引) - 單值與多值類型複合查詢性能提速100倍+ 案例 (含，單值+多值列合成)》《會議室預定系統實踐(解放開發) - PostgreSQL tsrange(時間範圍類型) + 排他限制》 https://www.postgresql.org/docs/10/static/datatype-json.html#JSON-INDEXING

PostgreSQL 電商業務(任意次元商品圈選應用) - json包range數組的命中優化 - 展開+索引優化

标簽

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E8%83%8C%E6%99%AF 背景

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E4%BE%8B%E5%AD%90 例子

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E7%B4%A2%E5%BC%95%E4%BC%98%E5%8C%96 索引優化

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E8%A1%A81 表1

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E8%A1%A82 表2

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#sql

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E5%B0%8F%E7%BB%93 小結

https://github.com/digoal/blog/blob/master/201807/20180703_02.md#%E5%8F%82%E8%80%83 參考

繼續閱讀

MySQL - order by排序優化

C#連接配接postgresql資料庫

2021-08-08 mysql索引

關于 java 轉碼 GBK -> UTF-8

postgresql替換換行符

查詢資料庫中存在的回車換行符

Django入門：第一個Django應用

SQLServer 提升查詢速度

postgresql 關于GBK和utf-8字元集問題

詳解SQL中幾種常用的表連接配接方式

oracle 中不使用已有的索引解決辦法

十四、MySQL備份和恢複資料庫1、備份和恢複的方法2、使用mysqldump導出3、恢複轉儲檔案4、字元編碼問題5、鎖表系列結語

對first_name建立唯一索引uniq_idx_firstname問題描述Sql語句

記一次因MySQL編碼問題導緻的慢查詢排查

PostgreSQL t_bits計算方法

為什麼要選擇UniDAC