zedstore開發版初體驗

1. 概述

zedstore是開發中的一個PostgreSQL的行列混合存儲引擎，

其設計目标偏OLAP場景，但是又能支援所有OLTP的操作，包括MVCC，索引等。

在設計上當OLAP和OLTP的目标發生沖突時，會優先OLAP，是以OLTP的性能會差一點。

zedstore的設計目标參考下面的說明

https://www.postgresql.org/message-id/CALfoeiuF-m5jg51mJUPm5GN8u396o5sA2AF5N97vTRAEDYac7w%40mail.gmail.com

Motivations / Objectives

* Performance improvement for queries selecting subset of columns
  (reduced IO).
* Reduced on-disk footprint compared to heap table. Shorter tuple
  headers and also leveraging compression of similar type data
* Be first-class citizen in the Postgres architecture (tables data can
  just independently live in columnar storage)
* Fully MVCC compliant
* All Indexes supported
* Hybrid row-column store, where some columns are stored together, and
  others separately. Provide flexibility of granularity on how to
  divide the columns. Columns accessed together can be stored
  together.
* Provide better control over bloat (similar to zheap)
* Eliminate need for separate toast tables
* Faster add / drop column or changing data type of column by avoiding
  full rewrite of the table.

zedstore内部page中可以存儲未壓縮的單個tuple，也可以存儲壓縮過的多個tuple的集合。

每個tuple用TID辨別，TID是個邏輯辨別，不同于heap中TID代表實體位置。

在zedstore的整個資料檔案中，表被切分成很多列族（類似hbase，目前的開發版本固定每列都是一個列族），

每個列族都是一個btree，按TID的順序組織，整個zedstore資料檔案就是一個btree的森林（和gin類似）。

+-----------------------------
| Fixed-size page header:
|
|   LSN
|   TID low and hi key (for Lehman & Yao B-tree operations)
|   left and right page pointers
|
| Items:
|
|   TID | size | flags | uncompressed size | lastTID | payload (containeritem)
|   TID | size | flags | uncompressed size | lastTID | payload (containeritem)
|   TID | size | flags | undo pointer | payload (plain item)
|   TID | size | flags | undo pointer | payload (plain item)
|   ...
|
+----------------------------

zedstore雖然在OLTP場景下的性能不是最優，由于zedstore支援資料壓縮，将來可以用來存放OLTP庫的冷資料。

下面做個簡單的測試體驗一下。

2. 測試環境

CentOS 7.3(16核128G SSD)
zedstore

3. 編譯安裝

3.1 下載下傳zedstore源碼

https://github.com/greenplum-db/postgres/tree/zedstore

3.2 安裝lz4

yum install lz4,lz4-devel

也可以下載下傳lz4源碼安裝，但源碼安裝後要執行一次ldconfig。否則編譯時configure可能出錯。

3.3 編譯

cd postgres-zedstore/
./configure --prefix=/usr/pgzedstore --with-lz4
make -j 16
make install
cd contrib/
make -j 16
make install

編譯debug版，可以在configure上添加

CFLAGS="-O0 -DOPTIMIZER_DEBUG -g3"

參數

3.4 初始化執行個體

su - postgres
/usr/pgzedstore/bin/initdb /pgsql/datazedstore -E UTF8 --no-locale
/usr/pgzedstore/bin/pg_ctl -D /pgsql/datazedstore -l logfile restart -o'-p 5444'

4 測試

4.1 初始化測試庫

/usr/pgzedstore/bin/pgbench -i -s 100 -p5444

4.2 heap表測試

[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 61833
latency average = 0.162 ms
tps = 6183.241453 (including connections establishing)
tps = 6184.485276 (excluding connections establishing)
statement latencies in milliseconds:
         0.000  \set aid random(1, 100000 * :scale)
         0.161  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S -M prepared
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 158597
latency average = 0.063 ms
tps = 15859.550657 (including connections establishing)
tps = 15862.665007 (excluding connections establishing)
statement latencies in milliseconds:
         0.000  \set aid random(1, 100000 * :scale)
         0.062  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r  -M prepared
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 18910
latency average = 0.529 ms
tps = 1890.901809 (including connections establishing)
tps = 1891.305759 (excluding connections establishing)
statement latencies in milliseconds:
         0.000  \set aid random(1, 100000 * :scale)
         0.000  \set bid random(1, 1 * :scale)
         0.000  \set tid random(1, 10 * :scale)
         0.000  \set delta random(-5000, 5000)
         0.021  BEGIN;
         0.113  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.053  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.104  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.091  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.049  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         0.096  END;

4.2 建立zedstore表

create table pgbench_accounts2(like pgbench_accounts including all) using zedstore;
insert into pgbench_accounts2 select * from pgbench_accounts;
alter table pgbench_accounts rename to pgbench_accounts_old;
alter table pgbench_accounts2 rename to pgbench_accounts;

改造後，可以發現zedstore表的size小了很多。

postgres=# \d+
                                   List of relations
 Schema |         Name         | Type  |  Owner   | Persistence |  Size   | Description 
--------+----------------------+-------+----------+-------------+---------+-------------
 public | pgbench_accounts     | table | postgres | permanent   | 61 MB   | 
 public | pgbench_accounts_old | table | postgres | permanent   | 1283 MB | 
 public | pgbench_branches     | table | postgres | permanent   | 40 kB   | 
 public | pgbench_history      | table | postgres | permanent   | 992 kB  | 
 public | pgbench_tellers      | table | postgres | permanent   | 104 kB  | 
(5 rows)

壓縮效果這麼好和

pgbench_accounts

表中重複值非常多有關。

我們再構造一些随機的資料對比zedstore的壓縮效果。

create table tb1(id int,c1 text);
insert into tb1 select id,md5(id::text) from generate_series(1,1000000)id;
create table tb2(id int,c1 text) using zedstore;
insert into tb2 select id,md5(id::text) from generate_series(1,1000000)id;

這個的壓縮效果就差了很多，lz4壓縮速度比較快，但其本身的壓縮率比較低。

postgres=# select * from tb2 limit 5;
 id |                c1                
----+----------------------------------
  1 | c4ca4238a0b923820dcc509a6f75849b
  2 | c81e728d9d4c2f636f067f89cc14862c
  3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
  4 | a87ff679a2f3e71d9181a67b7542122c
  5 | e4da3b7fbbce2345d7772b0674a318d5
(5 rows)
postgres=# \d+
                                   List of relations
 Schema |         Name         | Type  |  Owner   | Persistence |  Size   | Description 
--------+----------------------+-------+----------+-------------+---------+-------------
...
 public | tb1                  | table | postgres | permanent   | 65 MB   | 
 public | tb2                  | table | postgres | permanent   | 38 MB   | 
(7 rows)

4.3 zedstore表測試

[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 3663
latency average = 2.730 ms
tps = 366.280735 (including connections establishing)
tps = 366.360837 (excluding connections establishing)
statement latencies in milliseconds:
         0.000  \set aid random(1, 100000 * :scale)
         2.729  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S -M prepared
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 3907
latency average = 2.560 ms
tps = 390.614280 (including connections establishing)
tps = 390.692250 (excluding connections establishing)
statement latencies in milliseconds:
         0.000  \set aid random(1, 100000 * :scale)
         2.559  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r  -M prepared
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 811
latency average = 12.340 ms
tps = 81.036743 (including connections establishing)
tps = 81.053603 (excluding connections establishing)
statement latencies in milliseconds:
         0.001  \set aid random(1, 100000 * :scale)
         0.000  \set bid random(1, 1 * :scale)
         0.000  \set tid random(1, 10 * :scale)
         0.001  \set delta random(-5000, 5000)
         0.023  BEGIN;
        11.888  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.084  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.081  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.095  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.052  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         0.112  END;

zedstore表和heap表的測試結果彙總如下（機關tps）

測試模式	預編譯模式	heap表	zedstore表
selectonly	simple	6184	366
	prepared	15862	390
normal		1891	81

4.4 聚合查詢的性能對比

對前面建立的有100萬記錄的tb1和tb2表執行聚合查詢，比對執行時間,機關毫秒。

postgres=# \d+ tb1
                                    Table "public.tb1"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              | 
 c1     | text    |           |          |         | extended |              | 
Access method: heap

postgres=# \d+ tb2
                                    Table "public.tb2"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              | 
 c1     | text    |           |          |         | extended |              | 
Access method: zedstore

SQL
count(*)	77	356
count(1)	82
count(id)	98	294
max(id)	95	285
avg(id)	99	307

小結

從測試可以看出，目前的zedstore還沒發揮出其行列混合存儲應有的潛質，可能zedstore目前主要工作重心是確定邏輯正确性上，還沒開始做性能上的優化。

zedstore開發版初體驗

1. 概述

2. 測試環境

3. 編譯安裝

3.1 下載下傳zedstore源碼

3.2 安裝lz4

3.3 編譯

3.4 初始化執行個體

4 測試

4.1 初始化測試庫

4.2 heap表測試

4.2 建立zedstore表

4.3 zedstore表測試

4.4 聚合查詢的性能對比

小結

繼續閱讀

NYOJ 269--VF

二叉樹三種周遊(先序，中序，後序)----超詳細引入先序周遊後序周遊中序周遊總結

轉詳解C#資料庫存取圖檔三大方式

UVA 110 Meta-Loopless Sorts

為什麼要選擇UniDAC

BMP檔案結構及圖像每行位元組計算方法

磁盤結構及在Linux中的命名

CRC32和CRC8校驗代碼，C語言版

241 Different Ways to Add Parentheses（C代碼版）

C語言：初學者必定看懂的注釋！！！猴子吃桃問題。猴子第一天摘下若幹個桃子，每天都吃了前一天剩下的一半零一個，到第10天早上想再吃的時候，就剩下一個桃子. 求第一天共摘多少個桃子。

ubuntu14.04下安裝hbse1.0.1.1

[轉]九大排序算法——C語言實作及詳解

while 循環、do- while 循環和 for 循環之間的那點事C語言自學之三種循環比較

結構體：typedef與struct的差別

hdu7108哈希