AntDB讀寫分離探索
本次測試過程将AntDB叢集的datanode slave備機節點作為可讀節點來實作。這樣做的主要目的是合理利用資源,提高系統吞吐量。
适用場景: 針對同一時間有大量的讀操作和較少的寫操作,并且讀操作對資料的實時性要求并沒有那麼高。在此前提下,可以這麼設計解決方案。
一、環境準備
本次測試過程隻考慮一主一從的場景,datanode 一主兩從的場景原理類似,感興趣的朋友可以動手試試。
搭建一個2C2D(datanode 一主一從同步)的antdb叢集:
postgres=# monitor all;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.175 | 14000
db1 | datanode slave | t | running | 10.21.20.176 | 14001
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
postgres=# list node;
name | host | type | mastername | port | sync_state | path | initialized | incluster
--------+------------+-----------------+------------+-------+------------+----------------------------+-------------+-----------
coord1 | localhost1 | coordinator | | 6603 | | /data/zgy/pgxc_data/coord1 | t | t
coord2 | localhost2 | coordinator | | 6603 | | /data/zgy/pgxc_data/coord2 | t | t
db1 | localhost1 | datanode master | | 14000 | | /data/zgy/pgxc_data/db1 | t | t
db1 | localhost2 | datanode slave | db1 | 14001 | sync | /data/zgy/pgxc_data/db1s | t | t
db2 | localhost2 | datanode master | | 15000 | | /data/zgy/pgxc_data/db2 | t | t
db2 | localhost1 | datanode slave | db2 | 15001 | sync | /data/zgy/pgxc_data/db2s | t | t
gtm | localhost1 | gtm master | | 7663 | | /data/zgy/pgxc_data/gtm | t | t
(7 rows)
二、資料準備
連接配接coord1,建立table并寫入資料:
postgres=# create table a (id int ,name varchar(),score int) distribute by hash(id);
CREATE TABLE
postgres=# insert into a select generate_series(,),'jay'||generate_series(,),ceil(random()*);
INSERT
postgres=# select count(*) from a;
count
-------
100
(1 row)
三、修改coord2節點的pgxc_node關于datanode節點資訊
這裡也可以通過adbmgr,添加一個coord3,部署在datanode slave節點所在的伺服器。本次實驗由于伺服器隻有2台,就隻基于原叢集進行改造。
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+--------------+----------------+------------------+-------------
coord1 | C | 6603 | 10.21.20.175 | f | f | 1885696643
coord2 | C | 6603 | 10.21.20.176 | f | f | -1197102633
db1 | D | 14000 | 10.21.20.175 | t | f | -2885965
db2 | D | 15000 | 10.21.20.176 | f | t | -79866771
(4 rows)
postgres=# update pgxc_node set node_port=,node_host='10.21.20.176' where node_name='db1';
UPDATE
postgres=# update pgxc_node set node_port=,node_host='10.21.20.175' where node_name='db2';
UPDATE
postgres=# select pgxc_pool_reload();
pgxc_pool_reload
------------------
t
(1 row)
到此,讀寫分離配置完成。此刻,coord1提供讀寫datanode master節點,coord2提供讀datanode slave節點。
我們來看下效果:
連接配接coord2,執行簡單查詢:
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select count(*) from a;
count
-------
100
(1 row)
postgres=# explain verbose select count(*) from a;
QUERY PLAN
---------------------------------------------------------------------------------------
Aggregate (cost=2.50..2.51 rows=1 width=0)
Output: pg_catalog.count(*)
-> Data Node Scan on "__REMOTE_GROUP_QUERY__" (cost=0.00..0.00 rows=1000 width=0)
Output: (count(*))
Node/s: db1, db2
Remote query: SELECT count(*) FROM ONLY public.a a WHERE true
( rows)
此時檢視程序,發現dbslave節點産生backend程序:
424707 ? S 0:00 /data/zgy/app/bin/postgres --datanode -D /data/zgy/pgxc_data/db1s -i
424709 ? Ss 0:00 \_ postgres: logger process
424710 ? Ss 0:00 \_ postgres: startup process recovering 000000010000000000000003
424711 ? Ss 0:00 \_ postgres: checkpointer process
424712 ? Ss 0:00 \_ postgres: writer process
424713 ? Ss 0:00 \_ postgres: stats collector process
424715 ? Ss 0:00 \_ postgres: wal receiver process streaming 0/3045C90
425072 ? Ss 0:00 \_ postgres: zgy postgres 10.21.20.176(20063) idle
連接配接coord1進行讀寫操作:
[[email protected] highavailable]$ coord1
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# explain verbose select count(*) from a;
QUERY PLAN
---------------------------------------------------------------------------------------
Aggregate (cost=2.50..2.51 rows=1 width=0)
Output: pg_catalog.count(*)
-> Data Node Scan on "__REMOTE_GROUP_QUERY__" (cost=0.00..0.00 rows=1000 width=0)
Output: (count(*))
Node/s: db1, db2
Remote query: SELECT count(*) FROM ONLY public.a a WHERE true
( rows)
postgres=# insert into a select generate_series(,),'jay'||generate_series(,),ceil(random()*);
INSERT
postgres=# select count(*) from a;
count
-------
110
(1 row)
由于antdb 2.2版本,datanode 主備nodename必須相同,讀寫分離看起來不是很明顯,我們可以這樣操作:
- 通過adbmgr子產品,将datanode master 節點都手動stop,來驗證coord2是否仍可讀:
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.175 | 14000
db1 | datanode slave | t | running | 10.21.20.176 | 14001
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
postgres=# stop datanode master db1 MODE F;
NOTICE: 10.21.20.175, pg_ctl stop -D /data/zgy/pgxc_data/db1 -Z datanode -m fast -o -i -w -c
nodename | status | description
----------+--------+-------------
db1 | t | success
(1 row)
postgres=# stop datanode master db2 MODE F;
NOTICE: 10.21.20.176, pg_ctl stop -D /data/zgy/pgxc_data/db2 -Z datanode -m fast -o -i -w -c
nodename | status | description
----------+--------+-------------
db2 | t | success
(1 row)
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | f | not running | 10.21.20.175 | 14000
db1 | datanode slave | t | running | 10.21.20.176 | 14001
db2 | datanode master | f | not running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
分别連接配接coord1 和coord2 讀表a的資料:
[[email protected] highavailable]$ coord1
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select count(*) from a;
ERROR: error message from poolmgr:reconnect three thimes , could not connect to server: Connection refused
Is the server running on host "10.21.20.175" and accepting
TCP/IP connections on port 14000?
[[email protected] ~]$ coord2
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select count(*) from a;
count
-------
110
(1 row)
四、異常場景
有以下幾個場景需要考慮:
- 1、datanode 備機節點異常(當機,網絡異常等等),業務查詢将出錯
- 2、datanode matser發生一次切換後,需要人工介入添加備機,再将備機的資訊更新到讀的coord清單
- 3、datanode 備機流複制延時(異步wal日志檔案延時&同步節點recover資料延時),都将會導緻查詢的資料不準确
- 4、提供讀的coord節點 ,不能作為寫操作的提供者
4.1 datanode master 節點發生異常,觸發failover 操作後:
模拟過程:
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.175 | 14000
db1 | datanode slave | t | running | 10.21.20.176 | 14001
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
postgres=# failover datanode db1;
NOTICE: lock cluster: SELECT PG_PAUSE_CLUSTER();
NOTICE: 10.21.20.175, pg_ctl stop -D /data/zgy/pgxc_data/db1 -Z datanode -m immediate -o -i -w -c
NOTICE: 10.21.20.176, pg_ctl promote -w -D /data/zgy/pgxc_data/db1s
NOTICE: unlock cluster: SELECT PG_UNPAUSE_CLUSTER();
WARNING: the datanode master "db1" has no slave or extra, it is better to append a new datanode slave node
nodename | status | description
----------+--------+-------------
db1 | t | success
(1 row)
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.176 | 14001
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(6 rows)
分别連接配接coord1和coord2,執行操作:
[[email protected] highavailable]$ coord1
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+--------------+----------------+------------------+-------------
coord1 | C | 6603 | 10.21.20.175 | f | f | 1885696643
coord2 | C | 6603 | 10.21.20.176 | f | f | -1197102633
db1 | D | 14001 | 10.21.20.176 | t | f | -2885965
db2 | D | 15000 | 10.21.20.176 | f | t | -79866771
(4 rows)
postgres=# select count(*) from a;
count
-------
110
(1 row)
postgres=# insert into a select generate_series(,),'jay'||generate_series(,),ceil(random()*)
postgres-# ;
INSERT
postgres=# explain verbose select * from a;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Output: a.id, a.name, a.score
Node/s: db1, db2
Remote query: SELECT id, name, score FROM public.a a
( rows)
postgres=#
[[email protected] highavailable]$ coord2
psql (PGXC devel, based on PG ADB devel c858e353)
Type "help" for help.
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+--------------+----------------+------------------+-------------
coord1 | C | 6603 | 10.21.20.175 | f | f | 1885696643
coord2 | C | 6603 | 10.21.20.176 | f | f | -1197102633
db1 | D | 14001 | 10.21.20.176 | t | t | -2885965
db2 | D | 15000 | 10.21.20.176 | f | f | -79866771
(4 rows)
postgres=# select count(*) from a;
count
-------
120
(1 row)
postgres=# explain verbose select * from a;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Output: a.id, a.name, a.score
Node/s: db1, db2
Remote query: SELECT id, name, score FROM public.a a
( rows)
通過adbmgr rewind将宕掉的原master節點作為db1備節點再次加到叢集中來:
備注:antdb2.2版本使用rewind的前提是在初始化叢集時帶上參數 –datachecksums , 否則會報錯:ERROR: rewind datanode slave “db1” fail, target master need to use data checksums. 如果初始化沒帶參數,可以通過append datanode slave nodename的方式添加備機。
[zgy@INTEL175 highavailable]$ adbmgr
psql (PGXC devel, based on PG ADB devel c858e353)
Type "help" for help.
postgres=# list node;
name | host | type | mastername | port | sync_state | path | initialized | incluster
--------+------------+-----------------+------------+-------+------------+----------------------------+-------------+-----------
coord1 | localhost1 | coordinator | | | | /data/zgy/pgxc_data/coord1 | t | t
coord2 | localhost2 | coordinator | | | | /data/zgy/pgxc_data/coord2 | t | t
db1 | localhost2 | datanode master | | | | /data/zgy/pgxc_data/db1s | t | t
db2 | localhost2 | datanode master | | | | /data/zgy/pgxc_data/db2 | t | t
db2 | localhost1 | datanode slave | db2 | | sync | /data/zgy/pgxc_data/db2s | t | t
gtm | localhost1 | gtm master | | | | /data/zgy/pgxc_data/gtm | t | t
( rows)
postgres=# add datanode slave db1(host='localhost1',port=,path='/data/zgy/pgxc_data/db1');
ADD NODE
postgres=# rewind datanode slave db1;
NOTICE: pg_ctl restart datanode slave "db1"
NOTICE: , pg_ctl restart -D /data/zgy/pgxc_data/db1 -Z datanode -m fast -o -i -w -c -l /data/zgy/pgxc_data/db1/logfile
NOTICE: wait max seconds to check datanode slave "db1" running normal
NOTICE: pg_ctl stop datanode slave "db1" with fast mode
NOTICE: , pg_ctl stop -D /data/zgy/pgxc_data/db1 -Z datanode -m fast -o -i -w -c
NOTICE: wait max seconds to check datanode slave "db1" stop complete
NOTICE: update datanode master "db1" pg_hba.conf for the rewind node db1
NOTICE: update gtm master "gtm" pg_hba.conf for the rewind node db1
NOTICE: on datanode master "db1" execute "checkpoint"
NOTICE: , /data/zgy/app/bin/pg_controldata '/data/zgy/pgxc_data/db1s' | grep 'Minimum recovery ending location:' |awk '{print $5}'
NOTICE: receive msg: {"result":"0/0"}
NOTICE: , /data/zgy/app/bin/pg_controldata '/data/zgy/pgxc_data/db1s' |grep 'Min recovery ending loc' |awk '{print $6}'
NOTICE: receive msg: {"result":"0"}
NOTICE: , pg_rewind --target-pgdata /data/zgy/pgxc_data/db1 --source-server='host=10.21.20.176 port=14001 user=zgy dbname=postgres' -N db1
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | |
coord2 | coordinator | t | running | |
db1 | datanode master | t | running | |
db1 | datanode slave | t | running | |
db2 | datanode master | t | running | |
db2 | datanode slave | t | running | |
gtm | gtm master | t | running | |
( rows)
添加db1後,需要手動修改coord2的pgxc_node 關于datanode 節點資訊:
[[email protected] highavailable]$ coord2
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=#
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+--------------+----------------+------------------+-------------
coord1 | C | 6603 | 10.21.20.175 | f | f | 1885696643
coord2 | C | 6603 | 10.21.20.176 | f | f | -1197102633
db1 | D | 14001 | 10.21.20.176 | t | t | -2885965
db2 | D | 15000 | 10.21.20.176 | f | f | -79866771
(4 rows)
postgres=# update pgxc_node set node_port=,node_host='10.21.20.175' where node_name='db1';
UPDATE
postgres=# update pgxc_node set node_port=,node_host='10.21.20.175' where node_name='db2';
UPDATE
postgres=# select pgxc_pool_reload();
pgxc_pool_reload
------------------
t
(1 row)
此時,coord1 讀寫datanode master節點,而coord2又重新讀取datanode slave節點。
4.2 datanode slave 節點發生異常:
antdb監控腳本,針對datanode slave節點狀态異常後,主動嘗試重新拉起,重試間隔和次數均無法拉起。此時就需要将slave節點移出叢集。提供讀的coord的pgxc_node 的datanode節點資訊需要同步更新。
模拟過程:
連接配接adbmgr,手動停止datanode slave db1:
postgres=# stop datanode slave db1 mode f;
NOTICE: 10.21.20.175, pg_ctl stop -D /data/zgy/pgxc_data/db1 -Z datanode -m fast -o -i -w -c
nodename | status | description
----------+--------+-------------
db1 | t | success
(1 row)
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.176 | 14001
db1 | datanode slave | f | not running | 10.21.20.175 | 14000
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
此時連接配接coord2,查詢table a,報錯:
[[email protected] highavailable]$ coord2
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select count(*) from a;
ERROR: error message from poolmgr:reconnect three thimes , could not connect to server: Connection refused
Is the server running on host "10.21.20.175" and accepting
TCP/IP connections on port 14000?
postgres=#
更新coord2的pgxc_node datanode db1的資訊:
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+--------------+----------------+------------------+-------------
coord1 | C | 6603 | 10.21.20.175 | f | f | 1885696643
coord2 | C | 6603 | 10.21.20.176 | f | f | -1197102633
db1 | D | 14000 | 10.21.20.175 | t | t | -2885965
db2 | D | 15001 | 10.21.20.175 | f | f | -79866771
(4 rows)
postgres=# update pgxc_node set node_port=,node_host='10.21.20.176' where node_name='db1';
UPDATE
postgres=# select pgxc_pool_reload();
pgxc_pool_reload
------------------
t
(1 row)
postgres=# select count(*) from a;
count
-------
120
(1 row)
由于之前添加一主一從為同步流複制方式,是以此時coord1寫會hang住,一直等待db1的同步從節點傳回結果,如下:
[[email protected] ~]$ coord1
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# select count(*) from a;
count
-------
120
(1 row)
postgres=# insert into a select generate_series(,),'jay'||generate_series(,),ceil(random()*)
;
對于這樣的場景,可以連接配接adbmgr 修改db1的流複制方式:
[[email protected] highavailable]$ adbmgr
psql (PGXC 1.2devel, based on PG 9.3.13 ADB 2.2devel 69c858e353)
Type "help" for help.
postgres=# alter datanode slave db1 (sync_state='async');
ALTER NODE
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.176 | 14001
db1 | datanode slave | f | not running | 10.21.20.175 | 14000
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(7 rows)
宕掉的datanode slave節點,可以通過adbmgr remove叢集後drop掉。
postgres=# remove datanode slave db1;
WARNING: the datanode master "db1" has no synchronous slave or extra node
REMOVE NODE
postgres=# drop datanode slave db1;
DROP NODE
postgres=# monitor all ;
nodename | nodetype | status | description | host | port
----------+-----------------+--------+-------------+--------------+-------
coord1 | coordinator | t | running | 10.21.20.175 | 6603
coord2 | coordinator | t | running | 10.21.20.176 | 6603
db1 | datanode master | t | running | 10.21.20.176 | 14001
db2 | datanode master | t | running | 10.21.20.176 | 15000
db2 | datanode slave | t | running | 10.21.20.175 | 15001
gtm | gtm master | t | running | 10.21.20.175 | 7663
(6 rows)
4.3 讀slave節點的資料可能不準确
antdb datanode節點也是基于postgresql 流複制的方式實作的。異步wal日志檔案延時&同步節點recover資料延時,極端情況下都将會導緻查詢的資料不準确。這一點暫時沒有想到什麼好的解決方案,期待社群大牛們有好的建議。
AntDB:
開源url:https://github.com/ADBSQL/AntDB
QQ交流群: