Implementing PostgreSQL failover and auto failover with repmgrd

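The previous article covered repmgr-based high availability and switchover for PostgreSQL. This article walks through manual failover with repmgr and automatic failover with repmgrd.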

Prerequisite: a PostgreSQL primary/standby pair is already deployed, and repmgr is already set up on both nodes.

[postgres@node1 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 3        | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 100      | 3        | host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2
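When scripting around this status table, it can help to parse the output into records. The following is a minimal Python sketch and not part of repmgr: the function names `parse_cluster_show` and `running_primaries` are hypothetical, and it assumes the pipe-delimited layout shown above.

```python
def parse_cluster_show(text):
    """Parse the pipe-delimited table printed by `repmgr cluster show`.

    Returns one dict per node, keyed by the column headers.
    Lines without a "|" (separator row, warnings, blanks) are skipped.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if "|" not in line:
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells          # first pipe-delimited line is the header
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def running_primaries(rows):
    """Names of nodes whose role is primary and whose status ends in 'running'."""
    return [
        row["Name"]
        for row in rows
        if row.get("Role") == "primary" and row.get("Status", "").endswith("running")
    ]


SAMPLE = """\
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+------------------
 1  | node1 | primary | * running |          | default  | 100      | 3        | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 100      | 3        | host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2
"""

if __name__ == "__main__":
    print(running_primaries(parse_cluster_show(SAMPLE)))
```

A healthy cluster should report exactly one running primary. A status such as `? unreachable` or `- failed` does not end in `running`, so those nodes are excluded.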

failover

Stop the primary to simulate a primary failure.

[postgres@node1 ~]$ pg_ctl stop -D /pgdata/
waiting for server to shut down..... done
server stopped

Viewed from the standby, the primary now shows as unreachable.

[postgres@node2 .ssh]$ repmgr cluster show
 ID | Name  | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+---------------------------------------------------------------
 1  | node1 | primary | ? unreachable |          | default  | 100      | ?        | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running     | ? node1  | default  | 100      | 3        | host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2

Promote the standby to primary.

[postgres@node2 ~]$ repmgr standby promote
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "pg_ctl  -w -D '/pgdata' promote"
waiting for server to promote.... done
server promoted
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary

Check the cluster status from the new primary.

[postgres@node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
 1  | node1 | primary | - failed  |          | default  | 100      | ?        | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 4        | host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "node1" (ID: 1)

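On the old primary, run rejoin to bring it back into the cluster as a standby. Run the command with --dry-run first, then run it for real.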

[postgres@node1 pgdata]$ repmgr node rejoin -d 'host=192.168.1.2 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run
[postgres@node1 pgdata]$ repmgr node rejoin -d 'host=192.168.1.2 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose
INFO: looking for configuration file in /etc
INFO: configuration file found at: "/etc/repmgr.conf"
INFO: prerequisites for using pg_rewind are met
INFO: 2 files copied to "/tmp/repmgr-config-archive-node1"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "pg_rewind -D '/pgdata' --source-server='host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2'"
NOTICE: 2 files copied to /pgdata
INFO: directory "/tmp/repmgr-config-archive-node1" deleted
INFO: deleting "recovery.done"
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "pg_ctl  -w -D '/pgdata' start"
INFO: demoted primary is pingable
INFO: node 1 has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
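The "dry run first, then the real run" pattern used above can be wrapped in a small script. This is only a sketch under assumptions: the helper names `rejoin_command` and `rejoin_with_dry_run` are hypothetical, the flags are exactly those used in the session above, and it assumes `repmgr` exits with a non-zero status when the dry run finds a problem.

```python
import subprocess


def rejoin_command(upstream_conninfo, config_files, dry_run):
    """Build the argv for `repmgr node rejoin`, mirroring the command above."""
    cmd = [
        "repmgr", "node", "rejoin",
        "-d", upstream_conninfo,
        "--force-rewind",
        "--config-files=" + ",".join(config_files),
        "--verbose",
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def rejoin_with_dry_run(upstream_conninfo, config_files, run=subprocess.run):
    """Run the dry run, and only if it succeeds, run the real rejoin.

    `run` is injectable so the flow can be exercised without repmgr installed.
    Returns True only if both invocations exit with status 0 (assumed to
    mean success).
    """
    for dry_run in (True, False):
        result = run(rejoin_command(upstream_conninfo, config_files, dry_run))
        if result.returncode != 0:
            return False
    return True


if __name__ == "__main__":
    # Printing only; calling rejoin_with_dry_run() here would really run repmgr.
    print(" ".join(rejoin_command(
        "host=192.168.1.2 dbname=repmgr user=repmgr",
        ["postgresql.conf", "postgresql.auto.conf"],
        dry_run=True,
    )))
```

Passing the arguments as a list avoids shell quoting problems with the connection string, which contains spaces.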

Check the cluster status.

[postgres@node1 pgdata]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 3        | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 4        | host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2

auto failover

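The repmgrd daemon can perform failover automatically. First, the location parameter in repmgr.conf must be the same on all nodes; if it is not set, every node gets the same default value, so this condition holds anyway. In addition, repmgrd can only be started if shared_preload_libraries='repmgr' is set in the postgresql.conf configuration file.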
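Before starting repmgrd, it may be worth verifying that the preload setting is actually present. The following is a rough Python sketch, with the hypothetical helper name `preload_libraries`. It only inspects the text of one configuration file and does not account for values overridden elsewhere, for example in postgresql.auto.conf.

```python
import re

# Matches: shared_preload_libraries = 'a, b'   (the "=" is optional in postgresql.conf)
_SETTING = re.compile(r"^shared_preload_libraries\s*=?\s*'([^']*)'")


def preload_libraries(conf_text):
    """Return the libraries listed in shared_preload_libraries.

    Commented-out lines are ignored. If the setting appears more than once,
    the last occurrence wins, as it does in postgresql.conf.
    """
    libs = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        match = _SETTING.match(line)
        if match:
            libs = [name.strip() for name in match.group(1).split(",") if name.strip()]
    return libs


if __name__ == "__main__":
    sample = "shared_preload_libraries = 'repmgr'   # required by repmgrd\n"
    print("repmgr" in preload_libraries(sample))
```

A change to shared_preload_libraries takes effect only after PostgreSQL is restarted.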

Edit repmgr.conf on both the primary and the standby.

failover=automatic
promote_command='/pgsql/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/pgsql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file=/home/postgres/repmgrd.log
# enable monitoring history
monitoring_history=true
# interval, in seconds, at which monitoring data is written
monitor_interval_secs=5
# number of attempts to reconnect to the primary before failover (default is 6)
reconnect_attempts=10
# wait 5 seconds between reconnection attempts
reconnect_interval=5
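With these settings, repmgrd makes up to reconnect_attempts reconnection attempts, reconnect_interval seconds apart, before it starts a failover, so the primary must be unreachable for roughly 10 × 5 = 50 seconds first. The sketch below computes that window from the configuration text. The helper names are hypothetical, the parser handles only simple `key=value` lines, and the result is an approximation that ignores connection timeouts and the monitoring interval.

```python
def parse_repmgr_conf(text):
    """Parse simple key=value lines; blank lines and # comment lines are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)        # split on the first "=" only
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def reconnect_window_secs(settings):
    """Approximate seconds spent retrying the primary before failover begins."""
    # Fallback of 6 attempts is the default stated in the article;
    # the fallback of 10 seconds for the interval is an assumption.
    attempts = int(settings.get("reconnect_attempts", 6))
    interval = int(settings.get("reconnect_interval", 10))
    return attempts * interval


CONF = """\
failover=automatic
follow_command='/pgsql/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
reconnect_attempts=10
reconnect_interval=5
"""

if __name__ == "__main__":
    print(reconnect_window_secs(parse_repmgr_conf(CONF)))  # 10 * 5 = 50
```

Lowering either value makes failover faster, at the cost of reacting to short network interruptions.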

Restart PostgreSQL on the primary and the standby so that the changes take effect.

[postgres@node1 ~]$ repmgr node service --action=restart
DETAIL: executing server command "pg_ctl  -w -D '/pgdata' restart"

Start repmgrd on both the primary and the standby.

[postgres@node1 ~]$ repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid
[2019-09-20 11:51:23] [NOTICE] redirecting logging output to "/home/postgres/repmgrd.log"
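Because repmgrd detaches and logs to a file, a quick way to confirm that it is still alive is to check the PID recorded in the PID file. This is a minimal POSIX-only sketch with the hypothetical helper name `repmgrd_running`; the default path is the one passed to `--pid-file` above. A live PID does not prove the process is repmgrd, since PIDs can be reused.

```python
import os


def repmgrd_running(pid_file="/tmp/repmgrd.pid"):
    """Return True if the PID stored in pid_file belongs to a live process."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False                 # no PID file, or it does not hold a number
    try:
        os.kill(pid, 0)              # signal 0 checks existence, sends nothing
    except ProcessLookupError:
        return False                 # stale PID file
    except PermissionError:
        return True                  # process exists but is owned by another user
    return True


if __name__ == "__main__":
    print("repmgrd running:", repmgrd_running())
```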

Simulate a primary failure.

[postgres@node1 ~]$ pg_ctl stop -D /pgdata/
waiting for server to shut down..... done
server stopped

Check the repmgrd log on the standby: it has been promoted to primary.

[2019-09-20 12:02:52] [NOTICE] promoting standby to primary
[2019-09-20 12:02:52] [DETAIL] promoting server "node2" (ID: 2) using "pg_ctl  -w -D '/pgdata' promote"
[2019-09-20 12:02:52] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2019-09-20 12:02:52] [NOTICE] STANDBY PROMOTE successful
[2019-09-20 12:02:52] [DETAIL] server "node2" (ID: 2) was successfully promoted to primary
[2019-09-20 12:02:52] [INFO] 0 followers to notify
[2019-09-20 12:02:52] [INFO] switching to primary monitoring mode
[2019-09-20 12:02:52] [NOTICE] monitoring cluster primary "node2" (ID: 2)
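To find out when an automatic promotion happened without reading the log by hand, the log can be scanned for the success message. The sketch below assumes the `[timestamp] [LEVEL] message` layout seen in the log above; `parse_repmgrd_log` and `promotion_time` are hypothetical helper names.

```python
import re
from datetime import datetime

# [2019-09-20 12:02:52] [NOTICE] message text
_LOG_LINE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[([A-Z]+)\] (.*)$")


def parse_repmgrd_log(text):
    """Return (timestamp, level, message) tuples for lines in the expected layout."""
    entries = []
    for line in text.splitlines():
        match = _LOG_LINE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((stamp, match.group(2), match.group(3)))
    return entries


def promotion_time(entries):
    """Timestamp of the first successful standby promotion, or None."""
    for stamp, level, message in entries:
        if level == "NOTICE" and message == "STANDBY PROMOTE successful":
            return stamp
    return None


LOG = """\
[2019-09-20 12:02:52] [NOTICE] promoting standby to primary
[2019-09-20 12:02:52] [NOTICE] STANDBY PROMOTE successful
[2019-09-20 12:02:52] [INFO] switching to primary monitoring mode
"""

if __name__ == "__main__":
    print(promotion_time(parse_repmgrd_log(LOG)))
```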

Check the cluster status: the standby is now the primary.

[postgres@node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
 1  | node1 | primary | - failed  |          | default  | 100      | ?        | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 5        | host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "node1" (ID: 1)

On the old primary, run rejoin to add it back to the cluster.

[postgres@node1 ~]$ repmgr node rejoin -d 'host=192.168.1.2 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run
[postgres@node1 ~]$ repmgr node rejoin -d 'host=192.168.1.2 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose
INFO: looking for configuration file in /etc
INFO: configuration file found at: "/etc/repmgr.conf"
INFO: prerequisites for using pg_rewind are met
INFO: 2 files copied to "/tmp/repmgr-config-archive-node1"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "pg_rewind -D '/pgdata' --source-server='host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2'"
NOTICE: 2 files copied to /pgdata
INFO: directory "/tmp/repmgr-config-archive-node1" deleted
INFO: deleting "recovery.done"
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "pg_ctl  -w -D '/pgdata' start"
INFO: demoted primary is pingable
INFO: node 1 has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2

[postgres@node1 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 5        | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 6        |
