标簽
PostgreSQL , citus , pgbouncer , 網絡優化 , worker節點
https://github.com/digoal/blog/blob/master/201809/20180905_02.md#%E8%83%8C%E6%99%AF 背景
citus 節點間的網絡需求:
1、cn節點通路所有worker節點。oltp業務的通路較頻繁。
2、重分布資料時,worker節點間互相通路。通路頻度不大,OLAP業務常見,一旦有可能資料交換吞吐較大。
citus的cn節點連worker節點為有兩種模式,
一種為事務級保持連接配接模式(每條SQL發起時建立連接配接,SQL結束後釋放連接配接(除非在事務中,否則SQL結束立即釋放連接配接)。),
另一種為會話級保持連接配接模式(會話發起時建立連接配接,會話結束後釋放連接配接。)。
1、跑OLAP類的SQL時,使用的是第一種即時連接配接模式(OLAP場景并發不高,建立連接配接帶來的額外開銷不明顯)
可以在worker節點打開參數進行跟蹤
postgres=# show log_connections ;
log_connections
-----------------
on
(1 row)
postgres=# show log_disconnections ;
log_disconnections
--------------------
on
(1 row)
例子,
以下兩條SQL均為即時短連接配接模式(
Custom Scan (Citus Task-Tracker) Custom Scan (Citus Real-Time)
)。
postgres=# set citus.task_executor_type =task;
ERROR: invalid value for parameter "citus.task_executor_type": "task"
HINT: Available values: real-time, task-tracker.
postgres=# set citus.task_executor_type ='task-tracker';
SET
postgres=# explain select count(*) from pgbench_accounts ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Aggregate (cost=0.00..0.00 rows=0 width=0)
-> Custom Scan (Citus Task-Tracker) (cost=0.00..0.00 rows=0 width=0)
Task Count: 128
Tasks Shown: One of 128
-> Task
Node: host=172.24.211.224 port=1921 dbname=postgres
-> Aggregate (cost=231.85..231.86 rows=1 width=8)
-> Seq Scan on pgbench_accounts_106812 pgbench_accounts (cost=0.00..212.48 rows=7748 width=0)
(8 rows)
postgres=# set citus.task_executor_type ='real-time';
postgres=# explain select count(*) from pgbench_accounts ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Aggregate (cost=0.00..0.00 rows=0 width=0)
-> Custom Scan (Citus Real-Time) (cost=0.00..0.00 rows=0 width=0)
Task Count: 128
Tasks Shown: One of 128
-> Task
Node: host=172.24.211.224 port=1921 dbname=postgres
-> Aggregate (cost=231.85..231.86 rows=1 width=8)
-> Seq Scan on pgbench_accounts_106812 pgbench_accounts (cost=0.00..212.48 rows=7748 width=0)
(8 rows)
2、跑OLTP查詢時(通常并發很高,前端有連接配接池(保持會話)),為會話級保持連接配接模式(
Custom Scan (Citus Router)
)。
以下SQL為長連接配接模式(不會立即釋放,而是等會再釋放,以降低高并發時連接配接帶來的開銷)
postgres=# explain select * from pgbench_accounts where aid=5;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Custom Scan (Citus Router) (cost=0.00..0.00 rows=0 width=0)
Task Count: 1
Tasks Shown: All
-> Task
Node: host=172.24.211.224 port=1921 dbname=postgres
-> Index Scan using pgbench_accounts_pkey_106836 on pgbench_accounts_106836 pgbench_accounts (cost=0.28..2.50 rows=1 width=97)
Index Cond: (aid = 5)
(7 rows)
看以上兩種場景,CITUS應該說設計得已經很不錯了。既能滿足TP也能滿足AP。
但是前面也說了,連接配接保持是在事務或會話層面的,如果查詢量大,或者使用者使用了短連接配接,建立連接配接的開銷就會很凸顯。
newdb=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# select * from pgbench_accounts where aid=1;
aid | bid | abalance | filler
-----+-----+----------+--------------------------------------------------------------------------------------
1 | 1 | 7214 |
(1 row)
Time: 11.264 ms -- 包括建立連接配接的開銷
postgres=# select * from pgbench_accounts where aid=1;
aid | bid | abalance | filler
-----+-----+----------+--------------------------------------------------------------------------------------
1 | 1 | 7214 |
(1 row)
Time: 0.905 ms -- 已建立連接配接
https://github.com/digoal/blog/blob/master/201809/20180905_02.md#%E4%BD%BF%E7%94%A8pgbouncer%E8%A7%A3%E5%86%B3%E5%BB%BA%E7%AB%8B%E8%BF%9E%E6%8E%A5%E7%9A%84%E5%BC%80%E9%94%80 使用pgbouncer,解決建立連接配接的開銷

在worker節點上,部署pgbouncer,所有與worker節點建立的連接配接都通過pgbouncer連接配接池,以此來保持住連接配接,降低worker節點頻繁建立連接配接的開銷。
部署方法
https://github.com/digoal/blog/blob/master/201809/20180905_02.md#1%E6%89%80%E6%9C%89worker%E8%8A%82%E7%82%B9 1、所有worker節點
pgbouncer
yum install -y pgbouncer
配置
vi /etc/pgbouncer/pgb.ini
[databases]
newdb = host=/tmp dbname=newdb port=1921 user=digoal pool_size=128 reserve_pool=10
[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = 0.0.0.0
listen_port = 8001
auth_type = any
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
server_reset_query = DISCARD ALL
max_client_conn = 5000
default_pool_size = 128
; 最大不要超過4倍CPU數
啟動
pgbouncer -d -u pgbouncer /etc/pgbouncer/pgb.ini
在一個citus叢集中,可以同時存在直連worker或者通過pgbouncer連接配接worker。
不同的database可以有不同的配置。
以下例子,建立一個database,使用pgbouncer連接配接worker.
https://github.com/digoal/blog/blob/master/201809/20180905_02.md#2%E6%89%80%E6%9C%89%E8%8A%82%E7%82%B9%E5%8C%85%E6%8B%ACcn-worker 2、所有節點(包括cn, worker)
建立資料庫,插件
su - postgres
psql -c "create role digoal login;"
psql -c "create database newdb;"
psql -c "grant all on database newdb to digoal;"
psql -U postgres newdb -c "create extension citus;"
https://github.com/digoal/blog/blob/master/201809/20180905_02.md#cn%E8%8A%82%E7%82%B9 cn節點
将worker添加到叢集配置,使用pgbouncer的連接配接端口
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.224', 8001);\""
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.230', 8001);\""
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.231', 8001);\""
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.225', 8001);\""
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.227', 8001);\""
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.232', 8001);\""
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.226', 8001);\""
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.229', 8001);\""
MX配置,同樣,将worker節點添加到中繼資料同步中。
psql newdb postgres
select * from master_add_node('xxx.xxx.xxx.224',8001);
select * from master_add_node('xxx.xxx.xxx.230',8001);
開啟同步到中繼資料。
select start_metadata_sync_to_node('xxx.xxx.xxx.224',8001);
select start_metadata_sync_to_node('xxx.xxx.xxx.230',8001);
https://github.com/digoal/blog/blob/master/201809/20180905_02.md#%E6%B5%8B%E8%AF%95 測試
1、tpc-b 長連接配接測試
pgbench -i -s -U digoal newdb
psql -U digoal newdb
select create_distributed_table('pgbench_accounts','aid');
select create_distributed_table('pgbench_branches','bid');
select create_distributed_table('pgbench_tellers','tid');
select create_distributed_table('pgbench_history','aid');
pgbench -M prepared -v -r -P 1 -c 64 -j 64 -T 120 -U digoal newdb -S
性能與不使用pgbouncer差不多,因為使用了長連接配接測試簡單SQL(本身citus就使用了會話級連接配接保持,沒有短連接配接問題)。
https://github.com/digoal/blog/blob/master/201809/20180905_02.md#citus%E7%AC%AC%E4%B8%80%E6%AC%A1query%E7%9A%84%E9%9A%90%E8%97%8F%E8%80%97%E6%97%B6 citus第一次QUERY的隐藏耗時
在一個建立的會話中,第一次查詢總是需要需要耗費更多的時間(如果沒有建立連接配接,那麼包括建立連接配接的時間。即使已建立連接配接,也需要耗費額外的一些時間)(具體原因可以跟蹤分析一下code)。
以下使用的是pgbouncer連接配接worker,是以第一次QUERY不包含建立連接配接的時間。
newdb=> \q
postgres@digoal-citus-gpdb-test001-> psql newdb digoal
psql (10.5)
Type "help" for help.
\timing
newdb=> select * from pgbench_accounts where aid=5;
aid | bid | abalance | filler
-----+-----+----------+--------------------------------------------------------------------------------------
5 | 1 | 0 |
(1 row)
Time: 6.016 ms -- 不包括建立連接配接(已使用pgbouncer建立),但是也多了幾毫秒
-- 但是相比未使用pgbouncer,已經降低了5毫秒左右的延遲。
newdb=> select * from pgbench_accounts where aid=5;
aid | bid | abalance | filler
-----+-----+----------+--------------------------------------------------------------------------------------
5 | 1 | 0 |
(1 row)
Time: 0.989 ms
多出的幾毫秒,我們社群的小夥伴鄧彪、王健給出了原因如下,很多地方需要檢查安裝的citus版本與citus.control控制檔案的版本是否相容(比如加載分布式TABLE的RELCACHE時,第一次通路就是這個問題),不相容報錯:
詳見代碼
https://github.com/citusdata/citus/blob/3fa04d8f2c8f27b1377fe4c1468ee47358117e3c/src/backend/distributed/utils/metadata_cache.c/*
* CheckAvailableVersion compares CITUS_EXTENSIONVERSION and the currently
* available version from the citus.control file. If they are not compatible,
* this function logs an error with the specified elevel and returns false,
* otherwise it returns true.
*/
bool
CheckAvailableVersion(int elevel)
{
char *availableVersion = NULL;
if (!EnableVersionChecks)
{
return true;
}
availableVersion = AvailableExtensionVersion();
if (!MajorVersionsCompatible(availableVersion, CITUS_EXTENSIONVERSION))
{
ereport(elevel, (errmsg("loaded Citus library version differs from latest "
"available extension version"),
errdetail("Loaded library requires %s, but the latest control "
"file specifies %s.", CITUS_MAJORVERSION,
availableVersion),
errhint("Restart the database to load the latest Citus "
"library.")));
return false;
}
return true;
}
/*
* AvailableExtensionVersion returns the Citus version from citus.control file. It also
* saves the result, thus consecutive calls to CitusExtensionAvailableVersion will
* not read the citus.control file again.
*/
static char *
AvailableExtensionVersion(void)
{
ReturnSetInfo *extensionsResultSet = NULL;
TupleTableSlot *tupleTableSlot = NULL;
FunctionCallInfoData *fcinfo = NULL;
FmgrInfo *flinfo = NULL;
int argumentCount = 0;
EState *estate = NULL;
bool hasTuple = false;
bool goForward = true;
bool doCopy = false;
char *availableExtensionVersion;
InitializeCaches();
estate = CreateExecutorState();
extensionsResultSet = makeNode(ReturnSetInfo);
extensionsResultSet->econtext = GetPerTupleExprContext(estate);
extensionsResultSet->allowedModes = SFRM_Materialize;
fcinfo = palloc0(sizeof(FunctionCallInfoData));
flinfo = palloc0(sizeof(FmgrInfo));
fmgr_info(F_PG_AVAILABLE_EXTENSIONS, flinfo);
InitFunctionCallInfoData(*fcinfo, flinfo, argumentCount, InvalidOid, NULL,
(Node *) extensionsResultSet);
/* pg_available_extensions returns result set containing all available extensions */
(*pg_available_extensions)(fcinfo);
tupleTableSlot = MakeSingleTupleTableSlot(extensionsResultSet->setDesc);
hasTuple = tuplestore_gettupleslot(extensionsResultSet->setResult, goForward, doCopy,
tupleTableSlot);
while (hasTuple)
{
Datum extensionNameDatum = 0;
char *extensionName = NULL;
bool isNull = false;
extensionNameDatum = slot_getattr(tupleTableSlot, 1, &isNull);
extensionName = NameStr(*DatumGetName(extensionNameDatum));
if (strcmp(extensionName, "citus") == 0)
{
MemoryContext oldMemoryContext = NULL;
Datum availableVersion = slot_getattr(tupleTableSlot, 2, &isNull);
/* we will cache the result of citus version to prevent catalog access */
oldMemoryContext = MemoryContextSwitchTo(CacheMemoryContext);
availableExtensionVersion = text_to_cstring(DatumGetTextPP(availableVersion));
MemoryContextSwitchTo(oldMemoryContext);
ExecClearTuple(tupleTableSlot);
ExecDropSingleTupleTableSlot(tupleTableSlot);
return availableExtensionVersion;
}
ExecClearTuple(tupleTableSlot);
hasTuple = tuplestore_gettupleslot(extensionsResultSet->setResult, goForward,
doCopy, tupleTableSlot);
}
ExecDropSingleTupleTableSlot(tupleTableSlot);
ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("citus extension is not found")));
return NULL;
}
https://github.com/digoal/blog/blob/master/201809/20180905_02.md#pgbouncer%E8%BF%9E%E6%8E%A5%E6%B1%A0%E8%BF%9E%E6%8E%A5worker%E7%9A%84%E5%A5%BD%E5%A4%84 pgbouncer連接配接池連接配接worker的好處
1、對于業務層短連接配接會有比較好的效果。可以降低至少5毫秒左右的延遲。
2、對于大量複雜查詢(需要motion的查詢),可以減少節點間的連接配接數。
https://github.com/digoal/blog/blob/master/201809/20180905_02.md#%E5%90%8C%E6%97%B6%E4%B9%9F%E9%9C%80%E8%A6%81%E6%B3%A8%E6%84%8Fpgbouncer%E7%93%B6%E9%A2%88%E5%B8%A6%E5%AE%BDqps%E5%8D%95%E6%A0%B8%E5%8D%95%E4%B8%AApgbouncer%E8%BF%9B%E7%A8%8B%E5%B3%B0%E5%80%BCqps%E7%BA%A65%E4%B8%87 同時也需要注意pgbouncer瓶頸:帶寬,QPS(單核,單個pgbouncer程序峰值QPS約5萬)
https://github.com/digoal/blog/blob/master/201809/20180905_02.md#%E5%8F%82%E8%80%83 參考
man 1 pgbouncer
man 5 pgbouncer
《PostgreSQL 連接配接池 pgbouncer 使用》