1. 概述
zedstore是開發中的一個PostgreSQL的行列混合存儲引擎,
其設計目标偏OLAP場景,但是又能支援所有OLTP的操作,包括MVCC,索引等。
在設計上當OLAP和OLTP的目标發生沖突時,會優先OLAP,是以OLTP的性能會差一點。
zedstore的設計目标參考下面的說明
https://www.postgresql.org/message-id/CALfoeiuF-m5jg51mJUPm5GN8u396o5sA2AF5N97vTRAEDYac7w%40mail.gmail.comMotivations / Objectives
* Performance improvement for queries selecting subset of columns
(reduced IO).
* Reduced on-disk footprint compared to heap table. Shorter tuple
headers and also leveraging compression of similar type data
* Be first-class citizen in the Postgres architecture (tables data can
just independently live in columnar storage)
* Fully MVCC compliant
* All Indexes supported
* Hybrid row-column store, where some columns are stored together, and
others separately. Provide flexibility of granularity on how to
divide the columns. Columns accessed together can be stored
together.
* Provide better control over bloat (similar to zheap)
* Eliminate need for separate toast tables
* Faster add / drop column or changing data type of column by avoiding
full rewrite of the table.
zedstore内部page中可以存儲未壓縮的單個tuple,也可以存儲壓縮過的多個tuple的集合。
每個tuple用TID辨別,TID是個邏輯辨別,不同于heap中TID代表實體位置。
在zedstore的整個資料檔案中,表被切分成很多列族(類似hbase,目前的開發版本固定每列都是一個列族),
每個列族都是一個btree,按TID的順序組織,整個zedstore資料檔案就是一個btree的森林(和gin類似)。
+-----------------------------
| Fixed-size page header:
|
| LSN
| TID low and hi key (for Lehman & Yao B-tree operations)
| left and right page pointers
|
| Items:
|
| TID | size | flags | uncompressed size | lastTID | payload (containeritem)
| TID | size | flags | uncompressed size | lastTID | payload (containeritem)
| TID | size | flags | undo pointer | payload (plain item)
| TID | size | flags | undo pointer | payload (plain item)
| ...
|
+----------------------------
zedstore雖然在OLTP場景下的性能不是最優,由于zedstore支援資料壓縮,将來可以用來存放OLTP庫的冷資料。
下面做個簡單的測試體驗一下。
2. 測試環境
- CentOS 7.3(16核128G SSD)
- zedstore
3. 編譯安裝
3.1 下載下傳zedstore源碼
3.2 安裝lz4
yum install lz4,lz4-devel
也可以下載下傳lz4源碼安裝,但源碼安裝後要執行一次ldconfig。否則編譯時configure可能出錯。
3.3 編譯
cd postgres-zedstore/
./configure --prefix=/usr/pgzedstore --with-lz4
make -j 16
make install
cd contrib/
make -j 16
make install
編譯debug版,可以在configure上添加
CFLAGS="-O0 -DOPTIMIZER_DEBUG -g3"
參數
3.4 初始化執行個體
su - postgres
/usr/pgzedstore/bin/initdb /pgsql/datazedstore -E UTF8 --no-locale
/usr/pgzedstore/bin/pg_ctl -D /pgsql/datazedstore -l logfile restart -o'-p 5444'
4 測試
4.1 初始化測試庫
/usr/pgzedstore/bin/pgbench -i -s 100 -p5444
4.2 heap表測試
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 61833
latency average = 0.162 ms
tps = 6183.241453 (including connections establishing)
tps = 6184.485276 (excluding connections establishing)
statement latencies in milliseconds:
0.000 \set aid random(1, 100000 * :scale)
0.161 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S -M prepared
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 158597
latency average = 0.063 ms
tps = 15859.550657 (including connections establishing)
tps = 15862.665007 (excluding connections establishing)
statement latencies in milliseconds:
0.000 \set aid random(1, 100000 * :scale)
0.062 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -M prepared
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 18910
latency average = 0.529 ms
tps = 1890.901809 (including connections establishing)
tps = 1891.305759 (excluding connections establishing)
statement latencies in milliseconds:
0.000 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.021 BEGIN;
0.113 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.053 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.104 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.091 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.049 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.096 END;
4.2 建立zedstore表
create table pgbench_accounts2(like pgbench_accounts including all) using zedstore;
insert into pgbench_accounts2 select * from pgbench_accounts;
alter table pgbench_accounts rename to pgbench_accounts_old;
alter table pgbench_accounts2 rename to pgbench_accounts;
改造後,可以發現zedstore表的size小了很多。
postgres=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+----------------------+-------+----------+-------------+---------+-------------
public | pgbench_accounts | table | postgres | permanent | 61 MB |
public | pgbench_accounts_old | table | postgres | permanent | 1283 MB |
public | pgbench_branches | table | postgres | permanent | 40 kB |
public | pgbench_history | table | postgres | permanent | 992 kB |
public | pgbench_tellers | table | postgres | permanent | 104 kB |
(5 rows)
壓縮效果這麼好和
pgbench_accounts
表中重複值非常多有關。
我們再構造一些随機的資料對比zedstore的壓縮效果。
create table tb1(id int,c1 text);
insert into tb1 select id,md5(id::text) from generate_series(1,1000000)id;
create table tb2(id int,c1 text) using zedstore;
insert into tb2 select id,md5(id::text) from generate_series(1,1000000)id;
這個的壓縮效果就差了很多,lz4壓縮速度比較快,但其本身的壓縮率比較低。
postgres=# select * from tb2 limit 5;
id | c1
----+----------------------------------
1 | c4ca4238a0b923820dcc509a6f75849b
2 | c81e728d9d4c2f636f067f89cc14862c
3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
4 | a87ff679a2f3e71d9181a67b7542122c
5 | e4da3b7fbbce2345d7772b0674a318d5
(5 rows)
postgres=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+----------------------+-------+----------+-------------+---------+-------------
...
public | tb1 | table | postgres | permanent | 65 MB |
public | tb2 | table | postgres | permanent | 38 MB |
(7 rows)
4.3 zedstore表測試
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 3663
latency average = 2.730 ms
tps = 366.280735 (including connections establishing)
tps = 366.360837 (excluding connections establishing)
statement latencies in milliseconds:
0.000 \set aid random(1, 100000 * :scale)
2.729 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S -M prepared
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 3907
latency average = 2.560 ms
tps = 390.614280 (including connections establishing)
tps = 390.692250 (excluding connections establishing)
statement latencies in milliseconds:
0.000 \set aid random(1, 100000 * :scale)
2.559 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -M prepared
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 811
latency average = 12.340 ms
tps = 81.036743 (including connections establishing)
tps = 81.053603 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.001 \set delta random(-5000, 5000)
0.023 BEGIN;
11.888 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.084 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.081 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.095 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.052 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.112 END;
zedstore表和heap表的測試結果彙總如下(機關tps)
測試模式 | 預編譯模式 | heap表 | zedstore表 |
---|---|---|---|
selectonly | simple | 6184 | 366 |
prepared | 15862 | 390 | |
normal | 1891 | 81 |
4.4 聚合查詢的性能對比
對前面建立的有100萬記錄的tb1和tb2表執行聚合查詢,比對執行時間,機關毫秒。
postgres=# \d+ tb1
Table "public.tb1"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
c1 | text | | | | extended | |
Access method: heap
postgres=# \d+ tb2
Table "public.tb2"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
c1 | text | | | | extended | |
Access method: zedstore
SQL | ||
---|---|---|
count(*) | 77 | 356 |
count(1) | 82 | |
count(id) | 98 | 294 |
max(id) | 95 | 285 |
avg(id) | 99 | 307 |
小結
從測試可以看出,目前的zedstore還沒發揮出其行列混合存儲應有的潛質,可能zedstore目前主要工作重心是確定邏輯正确性上,還沒開始做性能上的優化。