目錄
1.下載下傳
2.搭建mha
2.1 系統配置
2.2 架構
2.3 添加ssh公鑰信任
2.4 安裝mha節點
2.5 manager配置檔案
2.6 檢查
2.7 啟動manager程序
2.8 碰到的問題
3.測試切換
3.1 正常切換測試
3.2 回切測試
3.3 雪崩測試
3.4 主從不一緻切換測試
下載下傳
mha連結位址:http://pan.baidu.com/s/1pJkDGX9#dir/path=%2Fmysql%2FHA%2Fmha
或者:https://code.google.com/p/mysql-master-ha/
添加一個yum源:wget http://dl.fedoraproject.org/pub/ ... ease-5-4.noarch.rpm
rpm -ivh epel-release-5-4.noarch.rpm
Mha分manager節點和node節點。可以下載下傳源碼包,或者rpm包,随個人喜好,注意,版本與系統核心要比對。
搭建mha
1.系統配置
配置如下:
系統 | 核心 | Mysql版本 | 記憶體 |
CentOS release 5.8 | Linux 2.6.18-308.el5xen | Mysql 5.5.35 | 2G |
2.架構
伺服器清單:
IP | 機器名 | 角色 |
192.168.2.7 | haproxy001 | manager |
192.168.1.241 | Hd-dm-test01 | Node |
192.168.1.242 | Hd-dm-test02 | Node |
192.168.1.243 | Hd-dm-test03 | node |
架構圖:
<ignore_js_op>

3.添加ssh公鑰信任
(1)node節點之間配置公鑰信任,Manager節點向node節點添加公鑰信任
(2)建立公鑰密鑰,并互傳公鑰
[[email protected] ~]$ ssh-keygen -t rsa
結果如下:
[[email protected] ~]$ ls -l ~/.ssh/
-rw------- 1 leiche leiche 1675 07-04 10:18 id_rsa
-rw-r--r-- 1 leiche leiche 399 07-04 10:18 id_rsa.pub
-rw-r--r-- 1 leiche leiche 1182 07-04 10:27 known_hosts
配置manager公鑰信任:
[[email protected] ~] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[[email protected] ~] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[[email protected] ~] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
配置node節點間的公鑰信任:
[[email protected]~] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
[[email protected]~] ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
Ssh-copy-id加節點信任是追加的方式,在~/.ssh/目錄下生成authorized_keys檔案,可以打開檢視:
[[email protected] .ssh]# sudo cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA35i+lWMDeWJYvamobI3OdmSFXLMt4ngsI6aiJCkv2ZMaeoKEzdeK2GWP9P/F7dU2j1DCgKh8yzF9o4E713k5KW92RmeyglDt590R0PPNvGDFAauQ5wEkufylYLbaJ0fwsJPjeM4ZBrT7RqdYGn3lGQFW32Cih9LcV2A+8NkQbrh/kG2wcTiaejFQFgSKn87nsokhg5L/zY6qkeuV0dPfqfpeSI8uhTI+VUC83/5odeunXrPE30o6fZZDpp2oszt2TEElldZ6pS9mL3ZhGuP/o/IPe+w5/cK9J4C815y2mr9Agr/UUQoj1K4WWRn5uZ5y/pyvAAlbPteNVVlx4djoIw== [email protected]
...
(3)檢查
[[email protected] ~]# ssh 172.16.1.243 "/sbin/ifconfig |grep 'inet addr' |head -1"
inet addr:172.16.1.243 Bcast:172.16.1.255 Mask:255.255.255.0
4.安裝mha節點
在manager伺服器2.7上安裝manager節點:
rpm -ivh mha4mysql-manager-0.53-0.noarch.rpm
在manager和node伺服器安裝node節點:
sudo rpm -ivh mha4mysql-node-0.53-0.noarch.rpm
依賴包:
sudo yum -y install perl-DBD-MySQL.x86_64
sudo yum -y install perl-Log-Dispatch
sudo yum -y install perl-Config-Tiny
sudo yum -y install perl-Parallel-ForkManager
5.配置檔案
在manager伺服器上建立目錄
mkdir /etc/masterha
mkdir -p /masterha/app1
生成配置檔案/etc/masterha/app1.cnf
[server default]
#manager dir
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
remote_workdir=/masterha/app1
#mysql manager user
user=root
password=123456
#node server user
ssh_user=root
#replication_user
repl_user=repluser
repl_password=hoodong
#checking master every second
ping_interval=1
#promote script
#shutdown_script=""
#master_ip_failover_script="/usr/local/bin/master_ip_failover"
#master_ip_online_change_script=""
#report_script=""
[server1]
hostname=172.16.1.241
master_binlog_dir="/data/mysqllog/3306"
ssh_port=22
candidate_master=1
[server2]
hostname=172.16.1.242
master_binlog_dir="/data/mysqllog/3306"
ssh_port=22
candidate_master=1
[server3]
hostname=172.16.1.243
master_binlog_dir="/data/mysqllog/3306"
ssh_port=22
candidate_master=1
6.檢查:
檢查ssh
[[email protected] app1]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sat Jul 5 12:57:24 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 5 12:57:24 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Jul 5 12:57:24 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sat Jul 5 12:57:24 2014 - [info] Starting SSH connection tests..
Sat Jul 5 12:57:26 2014 - [debug]
Sat Jul 5 12:57:24 2014 - [debug] Connecting via SSH from [email protected](172.16.1.241:22) to [email protected](172.16.1.242:22)..
Sat Jul 5 12:57:24 2014 - [debug] ok.
Sat Jul 5 12:57:24 2014 - [debug] Connecting via SSH from [email protected](172.16.1.241:22) to [email protected](172.16.1.243:22)..
Sat Jul 5 12:57:26 2014 - [debug] ok.
Sat Jul 5 12:57:26 2014 - [debug]
Sat Jul 5 12:57:24 2014 - [debug] Connecting via SSH from [email protected](172.16.1.242:22) to [email protected](172.16.1.241:22)..
Sat Jul 5 12:57:25 2014 - [debug] ok.
Sat Jul 5 12:57:25 2014 - [debug] Connecting via SSH from [email protected](172.16.1.242:22) to [email protected](172.16.1.243:22)..
Sat Jul 5 12:57:26 2014 - [debug] ok.
Sat Jul 5 12:57:27 2014 - [debug]
Sat Jul 5 12:57:25 2014 - [debug] Connecting via SSH from [email protected](172.16.1.243:22) to [email protected](172.16.1.241:22)..
Sat Jul 5 12:57:26 2014 - [debug] ok.
Sat Jul 5 12:57:26 2014 - [debug] Connecting via SSH from [email protected](172.16.1.243:22) to [email protected](172.16.1.242:22)..
Sat Jul 5 12:57:27 2014 - [debug] ok.
Sat Jul 5 12:57:27 2014 - [info] All SSH connection tests passed successfully.
檢查複制
[[email protected] app1]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sat Jul 5 12:57:05 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 5 12:57:05 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Jul 5 12:57:05 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sat Jul 5 12:57:05 2014 - [info] MHA::MasterMonitor version 0.53.
Sat Jul 5 12:57:06 2014 - [info] Dead Servers:
Sat Jul 5 12:57:06 2014 - [info] Alive Servers:
Sat Jul 5 12:57:06 2014 - [info] 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 12:57:06 2014 - [info] 172.16.1.242(172.16.1.242:3306)
Sat Jul 5 12:57:06 2014 - [info] 172.16.1.243(172.16.1.243:3306)
Sat Jul 5 12:57:06 2014 - [info] Alive Slaves:
Sat Jul 5 12:57:06 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 12:57:06 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 12:57:06 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 12:57:06 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 12:57:06 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 12:57:06 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 12:57:06 2014 - [info] Current Alive Master: 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 12:57:06 2014 - [info] Checking slave configurations..
Sat Jul 5 12:57:06 2014 - [info] Checking replication filtering settings..
Sat Jul 5 12:57:06 2014 - [info] binlog_do_db= , binlog_ignore_db=
Sat Jul 5 12:57:06 2014 - [info] Replication filtering check ok.
Sat Jul 5 12:57:06 2014 - [info] Starting SSH connection tests..
Sat Jul 5 12:57:09 2014 - [info] All SSH connection tests passed successfully.
Sat Jul 5 12:57:09 2014 - [info] Checking MHA Node version..
Sat Jul 5 12:57:09 2014 - [info] Version check ok.
Sat Jul 5 12:57:09 2014 - [info] Checking SSH publickey authentication settings on the current master..
Sat Jul 5 12:57:10 2014 - [info] HealthCheck: SSH to 172.16.1.241 is reachable.
Sat Jul 5 12:57:10 2014 - [info] Master MHA Node version is 0.53.
Sat Jul 5 12:57:10 2014 - [info] Checking recovery script configurations on the current master..
Sat Jul 5 12:57:10 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql3306.000008
Sat Jul 5 12:57:10 2014 - [info] Connecting to [email protected](172.16.1.241)..
Creating /masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysqllog/3306, up to mysql3306.000008
Sat Jul 5 12:57:11 2014 - [info] Master setting check done.
Sat Jul 5 12:57:11 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Jul 5 12:57:11 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=xxx
Sat Jul 5 12:57:11 2014 - [info] Connecting to [email protected](172.16.1.242:22)..
Checking slave recovery environment settings..
Opening /data/mysqldata/3306/relay-log.info ... ok.
Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000005
Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000005
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Jul 5 12:57:11 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.243 --slave_ip=172.16.1.243 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=xxx
Sat Jul 5 12:57:11 2014 - [info] Connecting to [email protected](172.16.1.243:22)..
Checking slave recovery environment settings..
Opening /data/mysqldata/3306/relay-log.info ... ok.
Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000016
Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000016
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Jul 5 12:57:12 2014 - [info] Slaves settings check done.
Sat Jul 5 12:57:12 2014 - [info]
172.16.1.241 (current master)
+--172.16.1.242
+--172.16.1.243
Sat Jul 5 12:57:12 2014 - [info] Checking replication health on 172.16.1.242..
Sat Jul 5 12:57:12 2014 - [info] ok.
Sat Jul 5 12:57:12 2014 - [info] Checking replication health on 172.16.1.243..
Sat Jul 5 12:57:12 2014 - [info] ok.
Sat Jul 5 12:57:12 2014 - [warning] master_ip_failover_script is not defined.
Sat Jul 5 12:57:12 2014 - [warning] shutdown_script is not defined.
Sat Jul 5 12:57:12 2014 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
7.啟動manager節點程序
啟動manager節點程序,日志被計入/data/masterha/manager.log
masterha_manager --conf=/etc/masterha/app1.cnf &
檢視日志
Sat Jul 5 17:41:15 2014 - [info] MHA::MasterMonitor version 0.53.
Sat Jul 5 17:41:15 2014 - [info] Dead Servers:
Sat Jul 5 17:41:15 2014 - [info] Alive Servers:
Sat Jul 5 17:41:15 2014 - [info] 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 17:41:15 2014 - [info] 172.16.1.242(172.16.1.242:3306)
Sat Jul 5 17:41:15 2014 - [info] 172.16.1.243(172.16.1.243:3306)
Sat Jul 5 17:41:15 2014 - [info] Alive Slaves:
Sat Jul 5 17:41:15 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 17:41:15 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 17:41:15 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 17:41:15 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 17:41:15 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 17:41:15 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 17:41:15 2014 - [info] Current Alive Master: 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 17:41:15 2014 - [info] Checking slave configurations..
Sat Jul 5 17:41:15 2014 - [info] Checking replication filtering settings..
Sat Jul 5 17:41:15 2014 - [info] binlog_do_db= , binlog_ignore_db=
Sat Jul 5 17:41:15 2014 - [info] Replication filtering check ok.
Sat Jul 5 17:41:15 2014 - [info] Starting SSH connection tests..
Sat Jul 5 17:41:18 2014 - [info] All SSH connection tests passed successfully.
Sat Jul 5 17:41:18 2014 - [info] Checking MHA Node version..
Sat Jul 5 17:41:19 2014 - [info] Version check ok.
Sat Jul 5 17:41:19 2014 - [info] Checking SSH publickey authentication settings on the current master..
Sat Jul 5 17:41:20 2014 - [info] HealthCheck: SSH to 172.16.1.241 is reachable.
Sat Jul 5 17:41:20 2014 - [info] Master MHA Node version is 0.53.
Sat Jul 5 17:41:20 2014 - [info] Checking recovery script configurations on the current master..
Sat Jul 5 17:41:20 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql3306.000008
Sat Jul 5 17:41:20 2014 - [info] Connecting to [email protected](172.16.1.241)..
Creating /masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysqllog/3306, up to mysql3306.000008
Sat Jul 5 17:41:20 2014 - [info] Master setting check done.
Sat Jul 5 17:41:20 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Jul 5 17:41:20 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=xxx
Sat Jul 5 17:41:20 2014 - [info] Connecting to [email protected](172.16.1.242:22)..
Checking slave recovery environment settings..
Opening /data/mysqldata/3306/relay-log.info ... ok.
Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000005
Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000005
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Jul 5 17:41:21 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.243 --slave_ip=172.16.1.243 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=xxx
Sat Jul 5 17:41:21 2014 - [info] Connecting to [email protected](172.16.1.243:22)..
Checking slave recovery environment settings..
Opening /data/mysqldata/3306/relay-log.info ... ok.
Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000016
Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000016
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Jul 5 17:41:21 2014 - [info] Slaves settings check done.
Sat Jul 5 17:41:21 2014 - [info]
172.16.1.241 (current master)
+--172.16.1.242
+--172.16.1.243
Sat Jul 5 17:41:21 2014 - [warning] master_ip_failover_script is not defined.
Sat Jul 5 17:41:21 2014 - [warning] shutdown_script is not defined.
Sat Jul 5 17:41:21 2014 - [info] Set master ping interval 1 seconds.
Sat Jul 5 17:41:21 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sat Jul 5 17:41:21 2014 - [info] Starting ping health check on 172.16.1.241(172.16.1.241:3306)..
Sat Jul 5 17:41:21 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
檢視狀态
[[email protected] app1]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:21576) is running(0:PING_OK), master:172.16.1.241
8.碰到的問題:
問題1:執行ssh檢查時
Address 172.16.1.241 maps to localhost, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
解決:在manager伺服器/etc/hosts加上其他伺服器的解析
問題2:檢查複制時
Fri Jul 4 12:29:25 2014 - [info] Starting SSH connection tests..
cat: /masterha/app1/172.16.1.241_22_ssh_check.log: No such file or directory
Fri Jul 4 12:29:25 2014 - [error][/usr/lib/perl5/vendor_perl/5.8.8/Parallel/ForkManager.pm, ln354]
cat: /masterha/app1/172.16.1.242_22_ssh_check.log: No such file or directory
Fri Jul 4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/5.8.8/Parallel/ForkManager.pm, ln354]
cat: /masterha/app1/172.16.1.243_22_ssh_check.log: No such file or directory
Fri Jul 4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/5.8.8/Parallel/ForkManager.pm, ln354]
Fri Jul 4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations. SSH Configuration Check Failed!
at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 339
Fri Jul 4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Fri Jul 4 12:29:26 2014 - [info] Got exit code 1 (Not master dead).
解決:剛開始用一個sudo權限使用者執行,是以報錯,換成root使用者,就沒問題了。
問題3:檢查複制時
Sat Jul 5 10:38:33 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=xxx
Sat Jul 5 10:38:33 2014 - [info] Connecting to [email protected](172.16.1.242:22)..
mysqlbinlog version is 3.2 (included in MySQL Client 5.0 or lower). This is not recommended. Consider upgrading MySQL Client to 5.1 or higher.
mysqlbinlog is 3.2 (included in MySQL Client 5.0 or lower), but MySQL server version is 5.5.35-log. mysqlbinlog can not parse row based events. Terminating script for safety reasons.
at /usr/bin/apply_diff_relay_logs line 463
Sat Jul 5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln371] Slaves settings check failed!
Sat Jul 5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln248] Slave configuration failed.
Sat Jul 5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48
Sat Jul 5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Sat Jul 5 10:38:33 2014 - [info] Got exit code 1 (Not master dead).
解決:/usr/bin/mysqlbinlog版本過低。
[[email protected] ~]# apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=hoodong
mysqlbinlog version is 3.2 (included in MySQL Client 5.0 or lower). This is not recommended. Consider upgrading MySQL Client to 5.1 or higher.
mysqlbinlog is 3.2 (included in MySQL Client 5.0 or lower), but MySQL server version is 5.5.35-log. mysqlbinlog can not parse row based events. Terminating script for safety reasons.
at /usr/bin/apply_diff_relay_logs line 463
[[email protected] ~]# /usr/local/mysql/bin/mysqlbinlog -V
/usr/local/mysql/bin/mysqlbinlog Ver 3.3 for Linux at x86_64
[[email protected] ~]# which mysqlbinlog
/usr/bin/mysqlbinlog
[[email protected] ~]# /usr/bin/mysqlbinlog -V
/usr/bin/mysqlbinlog Ver 3.2 for redhat-linux-gnu at x86_64
[[email protected] ~]# sudo rm -f /usr/bin/mysqlbinlog
[[email protected] ~]# sudo ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
測試切換
未配置以下參數,進行切換測試:
#promote script
#shutdown_script=""
#master_ip_failover_script="/usr/local/bin/master_ip_failover"
#master_ip_online_change_script=""
#report_script=""
3.1.正常切換測試
測試内容:手動停止寫庫241mysql服務,觀察主從狀況,觀察manager程序狀況。
3.1.1停止主寫241mysql端口
[[email protected] ~]# sudo /etc/init.d/mysqld3306 stop
Stopping MySQL: [ OK ]
3.1.2觀察manager日志,記錄切換時間
[[email protected] ~]# sudo tail -f /masterha/app1/manager.log
172.16.1.241 (current master)
+--172.16.1.242
+--172.16.1.243
Sat Jul 5 17:41:21 2014 - [warning] master_ip_failover_script is not defined.
Sat Jul 5 17:41:21 2014 - [warning] shutdown_script is not defined.
Sat Jul 5 17:41:21 2014 - [info] Set master ping interval 1 seconds.
Sat Jul 5 17:41:21 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sat Jul 5 17:41:21 2014 - [info] Starting ping health check on 172.16.1.241(172.16.1.241:3306)..
Sat Jul 5 17:41:21 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Sat Jul 5 18:40:21 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Sat Jul 5 18:40:21 2014 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.53 --binlog_prefix=mysql3306
Sat Jul 5 18:40:21 2014 - [info] HealthCheck: SSH to 172.16.1.241 is reachable.
Sat Jul 5 18:40:22 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Jul 5 18:40:22 2014 - [warning] Connection failed 1 time(s)..
Sat Jul 5 18:40:23 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Jul 5 18:40:23 2014 - [warning] Connection failed 2 time(s)..
Sat Jul 5 18:40:24 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Jul 5 18:40:24 2014 - [warning] Connection failed 3 time(s)..
Sat Jul 5 18:40:24 2014 - [warning] Master is not reachable from health checker!
Sat Jul 5 18:40:24 2014 - [warning] Master 172.16.1.241(172.16.1.241:3306) is not reachable!
Sat Jul 5 18:40:24 2014 - [warning] SSH is reachable.
Sat Jul 5 18:40:24 2014 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Sat Jul 5 18:40:24 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 5 18:40:24 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Jul 5 18:40:24 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sat Jul 5 18:40:24 2014 - [info] Dead Servers:
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:24 2014 - [info] Alive Servers:
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306)
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306)
Sat Jul 5 18:40:24 2014 - [info] Alive Slaves:
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:40:24 2014 - [info] Checking slave configurations..
Sat Jul 5 18:40:24 2014 - [info] Checking replication filtering settings..
Sat Jul 5 18:40:24 2014 - [info] Replication filtering check ok.
Sat Jul 5 18:40:24 2014 - [info] Master is down!
Sat Jul 5 18:40:24 2014 - [info] Terminating monitoring script.
Sat Jul 5 18:40:24 2014 - [info] Got exit code 20 (Master dead).
Sat Jul 5 18:40:24 2014 - [info] MHA::MasterFailover version 0.53.
Sat Jul 5 18:40:24 2014 - [info] Starting master failover.
Sat Jul 5 18:40:24 2014 - [info]
Sat Jul 5 18:40:24 2014 - [info] * Phase 1: Configuration Check Phase..
Sat Jul 5 18:40:24 2014 - [info]
Sat Jul 5 18:40:24 2014 - [info] Dead Servers:
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:24 2014 - [info] Checking master reachability via mysql(double check)..
Sat Jul 5 18:40:24 2014 - [info] ok.
Sat Jul 5 18:40:24 2014 - [info] Alive Servers:
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306)
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306)
Sat Jul 5 18:40:24 2014 - [info] Alive Slaves:
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:40:24 2014 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Jul 5 18:40:24 2014 - [info]
Sat Jul 5 18:40:24 2014 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Jul 5 18:40:24 2014 - [info]
Sat Jul 5 18:40:24 2014 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Jul 5 18:40:24 2014 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Sat Jul 5 18:40:24 2014 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Jul 5 18:40:24 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Jul 5 18:40:24 2014 - [info]
Sat Jul 5 18:40:24 2014 - [info] * Phase 3: Master Recovery Phase..
Sat Jul 5 18:40:24 2014 - [info]
Sat Jul 5 18:40:24 2014 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Jul 5 18:40:24 2014 - [info]
Sat Jul 5 18:40:24 2014 - [info] The latest binary log file/position on all slaves is mysql3306.000008:157085
Sat Jul 5 18:40:24 2014 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:40:24 2014 - [info] The oldest binary log file/position on all slaves is mysql3306.000008:157085
Sat Jul 5 18:40:24 2014 - [info] Oldest slaves:
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:40:24 2014 - [info]
Sat Jul 5 18:40:24 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Sat Jul 5 18:40:24 2014 - [info]
Sat Jul 5 18:40:24 2014 - [info] Fetching dead master's binary logs..
Sat Jul 5 18:40:24 2014 - [info] Executing command on the dead master 172.16.1.241(172.16.1.241:3306): save_binary_logs --command=save --start_file=mysql3306.000008 --start_pos=157085 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
Creating /masterha/app1 if not exists.. ok.
Concat binary/relay logs from mysql3306.000008 pos 157085 to mysql3306.000008 EOF into /masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog ..
Dumping binlog format description event, from position 0 to 107.. ok.
Dumping effective binlog data from /data/mysqllog/3306/mysql3306.000008 position 157085 to tail(157104).. ok.
Concat succeeded.
Sat Jul 5 18:40:26 2014 - [info] scp from [email protected]:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog to local:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog succeeded.
Sat Jul 5 18:40:26 2014 - [info] HealthCheck: SSH to 172.16.1.242 is reachable.
Sat Jul 5 18:40:27 2014 - [info] HealthCheck: SSH to 172.16.1.243 is reachable.
Sat Jul 5 18:40:27 2014 - [info]
Sat Jul 5 18:40:27 2014 - [info] * Phase 3.3: Determining New Master Phase..
Sat Jul 5 18:40:27 2014 - [info]
Sat Jul 5 18:40:27 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sat Jul 5 18:40:27 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
Sat Jul 5 18:40:27 2014 - [info] Searching new master from slaves..
Sat Jul 5 18:40:27 2014 - [info] Candidate masters from the configuration file:
Sat Jul 5 18:40:27 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:40:27 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:27 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:40:27 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:40:27 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:40:27 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:40:27 2014 - [info] Non-candidate masters:
Sat Jul 5 18:40:27 2014 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Sat Jul 5 18:40:27 2014 - [info] New master is 172.16.1.242(172.16.1.242:3306)
Sat Jul 5 18:40:27 2014 - [info] Starting master failover..
Sat Jul 5 18:40:27 2014 - [info]
From:
172.16.1.241 (current master)
+--172.16.1.242
+--172.16.1.243
To:
172.16.1.242 (new master)
+--172.16.1.243
Sat Jul 5 18:40:27 2014 - [info]
Sat Jul 5 18:40:27 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Sat Jul 5 18:40:27 2014 - [info]
Sat Jul 5 18:40:27 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Sat Jul 5 18:40:27 2014 - [info] Sending binlog..
Sat Jul 5 18:40:28 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog to[email protected]:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog succeeded.
Sat Jul 5 18:40:28 2014 - [info]
Sat Jul 5 18:40:28 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Sat Jul 5 18:40:28 2014 - [info]
Sat Jul 5 18:40:28 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Sat Jul 5 18:40:28 2014 - [info] Starting recovery on 172.16.1.242(172.16.1.242:3306)..
Sat Jul 5 18:40:28 2014 - [info] Generating diffs succeeded.
Sat Jul 5 18:40:28 2014 - [info] Waiting until all relay logs are applied.
Sat Jul 5 18:40:28 2014 - [info] done.
Sat Jul 5 18:40:28 2014 - [info] Getting slave status..
Sat Jul 5 18:40:28 2014 - [info] This slave(172.16.1.242)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql3306.000008:157085). No need to recover from Exec_Master_Log_Pos.
Sat Jul 5 18:40:28 2014 - [info] Connecting to the target slave host 172.16.1.242, running recover script..
Sat Jul 5 18:40:28 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --apply_files=/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog --workdir=/masterha/app1 --target_version=5.5.35-log --timestamp=20140705184024 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Sat Jul 5 18:40:28 2014 - [info]
Applying differential binary/relay log files /masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog on 172.16.1.242:3306. This may take long time...
Applying log files succeeded.
Sat Jul 5 18:40:28 2014 - [info] All relay logs were successfully applied.
Sat Jul 5 18:40:28 2014 - [info] Getting new master's binlog name and position..
Sat Jul 5 18:40:28 2014 - [info] mysql3306.000006:107
Sat Jul 5 18:40:28 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.242', MASTER_PORT=3306, MASTER_LOG_FILE='mysql3306.000006', MASTER_LOG_POS=107, MASTER_USER='repluser', MASTER_PASSWORD='xxx';
Sat Jul 5 18:40:28 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Sat Jul 5 18:40:28 2014 - [info] Setting read_only=0 on 172.16.1.242(172.16.1.242:3306)..
Sat Jul 5 18:40:28 2014 - [info] ok.
Sat Jul 5 18:40:28 2014 - [info] ** Finished master recovery successfully.
Sat Jul 5 18:40:28 2014 - [info] * Phase 3: Master Recovery Phase completed.
Sat Jul 5 18:40:28 2014 - [info]
Sat Jul 5 18:40:28 2014 - [info] * Phase 4: Slaves Recovery Phase..
Sat Jul 5 18:40:28 2014 - [info]
Sat Jul 5 18:40:28 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Sat Jul 5 18:40:28 2014 - [info]
Sat Jul 5 18:40:28 2014 - [info] -- Slave diff file generation on host 172.16.1.243(172.16.1.243:3306) started, pid: 25375. Check tmp log /masterha/app1/172.16.1.243_3306_20140705184024.log if it takes time..
Sat Jul 5 18:40:28 2014 - [info]
Sat Jul 5 18:40:28 2014 - [info] Log messages from 172.16.1.243 ...
Sat Jul 5 18:40:28 2014 - [info]
Sat Jul 5 18:40:28 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Sat Jul 5 18:40:28 2014 - [info] End of log messages from 172.16.1.243.
Sat Jul 5 18:40:28 2014 - [info] -- 172.16.1.243(172.16.1.243:3306) has the latest relay log events.
Sat Jul 5 18:40:28 2014 - [info] Generating relay diff files from the latest slave succeeded.
Sat Jul 5 18:40:28 2014 - [info]
Sat Jul 5 18:40:28 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Sat Jul 5 18:40:28 2014 - [info]
Sat Jul 5 18:40:28 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) started, pid: 25377. Check tmp log /masterha/app1/172.16.1.243_3306_20140705184024.log if it takes time..
Sat Jul 5 18:40:30 2014 - [info]
Sat Jul 5 18:40:30 2014 - [info] Log messages from 172.16.1.243 ...
Sat Jul 5 18:40:30 2014 - [info]
Sat Jul 5 18:40:28 2014 - [info] Sending binlog..
Sat Jul 5 18:40:29 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog to[email protected]:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog succeeded.
Sat Jul 5 18:40:29 2014 - [info] Starting recovery on 172.16.1.243(172.16.1.243:3306)..
Sat Jul 5 18:40:29 2014 - [info] Generating diffs succeeded.
Sat Jul 5 18:40:29 2014 - [info] Waiting until all relay logs are applied.
Sat Jul 5 18:40:29 2014 - [info] done.
Sat Jul 5 18:40:29 2014 - [info] Getting slave status..
Sat Jul 5 18:40:29 2014 - [info] This slave(172.16.1.243)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql3306.000008:157085). No need to recover from Exec_Master_Log_Pos.
Sat Jul 5 18:40:29 2014 - [info] Connecting to the target slave host 172.16.1.243, running recover script..
Sat Jul 5 18:40:29 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=172.16.1.243 --slave_ip=172.16.1.243 --slave_port=3306 --apply_files=/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog --workdir=/masterha/app1 --target_version=5.5.35-log --timestamp=20140705184024 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Sat Jul 5 18:40:30 2014 - [info]
Applying differential binary/relay log files /masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog on 172.16.1.243:3306. This may take long time...
Applying log files succeeded.
Sat Jul 5 18:40:30 2014 - [info] All relay logs were successfully applied.
Sat Jul 5 18:40:30 2014 - [info] Resetting slave 172.16.1.243(172.16.1.243:3306) and starting replication from the new master 172.16.1.242(172.16.1.242:3306)..
Sat Jul 5 18:40:30 2014 - [info] Executed CHANGE MASTER.
Sat Jul 5 18:40:30 2014 - [info] Slave started.
Sat Jul 5 18:40:30 2014 - [info] End of log messages from 172.16.1.243.
Sat Jul 5 18:40:30 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) succeeded.
Sat Jul 5 18:40:30 2014 - [info] All new slave servers recovered successfully.
Sat Jul 5 18:40:30 2014 - [info]
Sat Jul 5 18:40:30 2014 - [info] * Phase 5: New master cleanup phease..
Sat Jul 5 18:40:30 2014 - [info]
Sat Jul 5 18:40:30 2014 - [info] Resetting slave info on the new master..
Sat Jul 5 18:40:30 2014 - [info] 172.16.1.242: Resetting slave info succeeded.
Sat Jul 5 18:40:30 2014 - [info] Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
Sat Jul 5 18:40:30 2014 - [info]
----- Failover Report -----
app1: MySQL Master failover 172.16.1.241 to 172.16.1.242 succeeded
Master 172.16.1.241 is down!
Check MHA Manager logs at haproxy001:/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 172.16.1.242(172.16.1.242:3306) has all relay logs for recovery.
Selected 172.16.1.242 as a new master.
172.16.1.242: OK: Applying all logs succeeded.
172.16.1.243: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.16.1.243: OK: Applying all logs succeeded. Slave started, replicating from 172.16.1.242.
172.16.1.242: Resetting slave info succeeded.
Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
耗時10s。242提升為主,檢視242主從狀态
mysql> show slave status\G;
Empty set (0.00 sec)
ERROR:
No query specified
mysql> show master status\G;
*************************** 1. row ***************************
File: mysql3306.000009
Position: 107
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
ERROR:
No query specified
242主從資訊被清空。檢視243主從狀态。
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.1.242
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql3306.000006
Read_Master_Log_Pos: 107
Relay_Log_File: mysql-relay-bin.000002
Relay_Log_Pos: 253
Relay_Master_Log_File: mysql3306.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 107
Relay_Log_Space: 409
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 12423306
1 row in set (0.00 sec)
ERROR:
No query specified
3.1.3檢視manager狀态
masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
啟動manager程序,發現啟不來
[[email protected] app1]# masterha_manager --conf=/etc/masterha/app1.cnf
Sat Jul 5 18:51:34 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 5 18:51:34 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Jul 5 18:51:34 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
[[email protected] app1]# masterha_check_status --conf=/etc/masterha/app1.cnf
日志如下:
Sat Jul 5 18:51:34 2014 - [info] MHA::MasterMonitor version 0.53.
Sat Jul 5 18:51:34 2014 - [info] Dead Servers:
Sat Jul 5 18:51:34 2014 - [info] 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 18:51:34 2014 - [info] Alive Servers:
Sat Jul 5 18:51:34 2014 - [info] 172.16.1.242(172.16.1.242:3306)
Sat Jul 5 18:51:34 2014 - [info] 172.16.1.243(172.16.1.243:3306)
Sat Jul 5 18:51:34 2014 - [info] Alive Slaves:
Sat Jul 5 18:51:34 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 18:51:34 2014 - [info] Replicating from 172.16.1.242(172.16.1.242:3306)
Sat Jul 5 18:51:34 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 18:51:34 2014 - [info] Current Alive Master: 172.16.1.242(172.16.1.242:3306)
Sat Jul 5 18:51:34 2014 - [info] Checking slave configurations..
Sat Jul 5 18:51:34 2014 - [info] Checking replication filtering settings..
Sat Jul 5 18:51:34 2014 - [info] binlog_do_db= , binlog_ignore_db=
Sat Jul 5 18:51:34 2014 - [info] Replication filtering check ok.
Sat Jul 5 18:51:34 2014 - [info] Starting SSH connection tests..
Sat Jul 5 18:51:35 2014 - [info] All SSH connection tests passed successfully.
Sat Jul 5 18:51:35 2014 - [info] Checking MHA Node version..
Sat Jul 5 18:51:36 2014 - [info] Version check ok.
Sat Jul 5 18:51:36 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln361] Server 172.16.1.241(172.16.1.241:3306) is dead, but must be alive! Check server settings.
Sat Jul 5 18:51:36 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations. at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 361
Sat Jul 5 18:51:36 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Sat Jul 5 18:51:36 2014 - [info] Got exit code 1 (Not master dead).
測試結果:(1)無寫負載壓力下,切換正常,耗時10s。(2)manager程序停止,無法控制接下來的主從問題。(3)被提升為寫的執行個體,原主從狀态被清空。
預測:因為主從資訊被清空,回切肯定失敗。
3.2 回切測試1;
測試内容:(1)保留241宕,242主,243從的狀态,未配置master_ip_failover_script。(2)啟動241的mysql服務,觀察主從。(3)啟動manager服務,觀察主從。(4)觀察日志。
3.3.1 啟動241的mysql服務。此時manager服務還是起不來,觀察主從,觀察manager日志:
242的show slave status:
mysql> show slave status\G;
Empty set (0.00 sec)
ERROR:
No query specified
mysql> show master status\G;
*************************** 1. row ***************************
File: mysql3306.000009
Position: 107
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
ERROR:
No query specified
243的show slave status:
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.1.242
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql3306.000007
Read_Master_Log_Pos: 107
Relay_Log_File: mysql-relay-bin.000004
Relay_Log_Pos: 253
Relay_Master_Log_File: mysql3306.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 107
Relay_Log_Space: 1013
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 12423306
1 row in set (0.00 sec)
ERROR:
No query specified
3.3.2 啟動manager。觀察主從,觀察manager日志。
242,243主從照舊,manager還是起不來。
Sat Jul 5 19:27:16 2014 - [info] MHA::MasterMonitor version 0.53.
Sat Jul 5 19:27:17 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/ServerManager.pm, ln274] There are 2 non-slave servers! MHA manages at most one non-slave server. Check configurations.
Sat Jul 5 19:27:17 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations. at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 298
Sat Jul 5 19:27:17 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Sat Jul 5 19:27:17 2014 - [info] Got exit code 1 (Not master dead).
測試結論:未配置master_ip_failover_script等參數狀态下,回切失敗。
3.3 雪崩測試
少一台伺服器,在243再加一個執行個體3307。
測試内容:(1)寫庫241停止通路後,242提升為主庫;(2)再停止242mysql服務;(3)觀察是否會産生新的主庫。
結論:沒戲,manager起不來。
3.4 資料缺失測試
測試内容:判斷是否按照server1,server2,server3的順序提升新主庫。
(1)還原241為主,242,243為從的狀态;(2)通過事務送出,讓242資料同步比243慢,宕掉241寫庫,觀查哪個執行個體被提升為寫庫。(3)通過鎖,讓243比242資料同步慢,宕掉241寫庫,觀察哪個從庫會被提升為寫庫。
3.4.1 恢複主從,啟動manager。
3.4.2 建立表
CREATE TABLE `t_mha` (
`id` int(8) NOT NULL AUTO_INCREMENT,
`name` varchar(32) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
鎖242從庫表t_mha為隻讀。
lock table t_mha read;
寫資料。
mysql> insert into t_mha(name) select 'abf';
mysql> insert into t_mha(name) select 'abg';
mysql> insert into t_mha(name) select 'abe';
宕掉主寫241,觀察錯誤日志,發現一直在等待242的relay log執行,并未發生切換。
Mon Jul 7 17:08:38 2014 - [info]
Mon Jul 7 17:08:38 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Mon Jul 7 17:08:38 2014 - [info]
Mon Jul 7 17:08:38 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Mon Jul 7 17:08:38 2014 - [info]
Mon Jul 7 17:08:38 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Mon Jul 7 17:08:38 2014 - [info]
Mon Jul 7 17:08:38 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Mon Jul 7 17:08:38 2014 - [info] Starting recovery on 172.16.1.242(172.16.1.242:3306)..
Mon Jul 7 17:08:38 2014 - [info] This server has all relay logs. Waiting all logs to be applied..
解鎖242表,觀察錯誤日志,觀察主從。
Mon Jul 7 17:10:20 2014 - [info] done.
Mon Jul 7 17:10:20 2014 - [info] All relay logs were successfully applied.
Mon Jul 7 17:10:20 2014 - [info] Getting new master's binlog name and position..
Mon Jul 7 17:10:20 2014 - [info] mysql3306.000008:620
Mon Jul 7 17:10:20 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.242', MASTER_PORT=3306, MASTER_LOG_FILE='mysql3306.000008', MASTER_LOG_POS=620, MASTER_USER='repluser', MASTER_PASSWORD='xxx';
Mon Jul 7 17:10:20 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Mon Jul 7 17:10:20 2014 - [info] Setting read_only=0 on 172.16.1.242(172.16.1.242:3306)..
Mon Jul 7 17:10:20 2014 - [info] ok.
Mon Jul 7 17:10:20 2014 - [info] ** Finished master recovery successfully.
Mon Jul 7 17:10:20 2014 - [info] * Phase 3: Master Recovery Phase completed.
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] * Phase 4: Slaves Recovery Phase..
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] -- Slave diff file generation on host 172.16.1.243(172.16.1.243:3306) started, pid: 9489. Check tmp log /masterha/app1/172.16.1.243_3306_20140707170837.log if it takes time..
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] Log messages from 172.16.1.243 ...
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Mon Jul 7 17:10:20 2014 - [info] End of log messages from 172.16.1.243.
Mon Jul 7 17:10:20 2014 - [info] -- 172.16.1.243(172.16.1.243:3306) has the latest relay log events.
Mon Jul 7 17:10:20 2014 - [info] Generating relay diff files from the latest slave succeeded.
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) started, pid: 9491. Check tmp log /masterha/app1/172.16.1.243_3306_20140707170837.log if it takes time..
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] Log messages from 172.16.1.243 ...
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] Starting recovery on 172.16.1.243(172.16.1.243:3306)..
Mon Jul 7 17:10:20 2014 - [info] This server has all relay logs. Waiting all logs to be applied..
Mon Jul 7 17:10:20 2014 - [info] done.
Mon Jul 7 17:10:20 2014 - [info] All relay logs were successfully applied.
Mon Jul 7 17:10:20 2014 - [info] Resetting slave 172.16.1.243(172.16.1.243:3306) and starting replication from the new master 172.16.1.242(172.16.1.242:3306)..
Mon Jul 7 17:10:20 2014 - [info] Executed CHANGE MASTER.
Mon Jul 7 17:10:20 2014 - [info] Slave started.
Mon Jul 7 17:10:20 2014 - [info] End of log messages from 172.16.1.243.
Mon Jul 7 17:10:20 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) succeeded.
Mon Jul 7 17:10:20 2014 - [info] All new slave servers recovered successfully.
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] * Phase 5: New master cleanup phease..
Mon Jul 7 17:10:20 2014 - [info]
Mon Jul 7 17:10:20 2014 - [info] Resetting slave info on the new master..
Mon Jul 7 17:10:20 2014 - [info] 172.16.1.242: Resetting slave info succeeded.
Mon Jul 7 17:10:20 2014 - [info] Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
Mon Jul 7 17:10:20 2014 - [info]
----- Failover Report -----
app1: MySQL Master failover 172.16.1.241 to 172.16.1.242 succeeded
Master 172.16.1.241 is down!
Check MHA Manager logs at haproxy001:/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 172.16.1.242(172.16.1.242:3306) has all relay logs for recovery.
Selected 172.16.1.242 as a new master.
172.16.1.242: OK: Applying all logs succeeded.
172.16.1.243: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.16.1.243: OK: Applying all logs succeeded. Slave started, replicating from 172.16.1.242.
172.16.1.242: Resetting slave info succeeded.
Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
3.4.3 這次鎖243的t_mha表。
日志如下:
f it takes time..
Mon Jul 7 17:35:39 2014 - [info]
Mon Jul 7 17:35:39 2014 - [info] Log messages from 172.16.1.243 ...
Mon Jul 7 17:35:39 2014 - [info]
Mon Jul 7 17:35:38 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Mon Jul 7 17:35:39 2014 - [info] End of log messages from 172.16.1.243.
Mon Jul 7 17:35:39 2014 - [info] -- 172.16.1.243(172.16.1.243:3306) has the latest relay log events.
Mon Jul 7 17:35:39 2014 - [info] Generating relay diff files from the latest slave succeeded.
Mon Jul 7 17:35:39 2014 - [info]
Mon Jul 7 17:35:39 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Jul 7 17:35:39 2014 - [info]
Mon Jul 7 17:35:39 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) started, pid: 9886. Check tmp log /masterha/app1/172.16.1.243_3306_20140707173538.log if it takes time..
Mon Jul 7 17:37:10 2014 - [info]
Mon Jul 7 17:37:10 2014 - [info] Log messages from 172.16.1.243 ...
Mon Jul 7 17:37:10 2014 - [info]
Mon Jul 7 17:35:39 2014 - [info] Starting recovery on 172.16.1.243(172.16.1.243:3306)..
Mon Jul 7 17:35:39 2014 - [info] This server has all relay logs. Waiting all logs to be applied..
Mon Jul 7 17:37:10 2014 - [info] done.
Mon Jul 7 17:37:10 2014 - [info] All relay logs were successfully applied.
Mon Jul 7 17:37:10 2014 - [info] Resetting slave 172.16.1.243(172.16.1.243:3306) and starting replication from the new master 172.16.1.242(172.16.1.242:3306)..
Mon Jul 7 17:37:10 2014 - [info] Executed CHANGE MASTER.
Mon Jul 7 17:37:10 2014 - [info] Slave started.
Mon Jul 7 17:37:10 2014 - [info] End of log messages from 172.16.1.243.
Mon Jul 7 17:37:10 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) succeeded.
Mon Jul 7 17:37:10 2014 - [info] All new slave servers recovered successfully.
Mon Jul 7 17:37:10 2014 - [info]
Mon Jul 7 17:37:10 2014 - [info] * Phase 5: New master cleanup phease..
Mon Jul 7 17:37:10 2014 - [info]
Mon Jul 7 17:37:10 2014 - [info] Resetting slave info on the new master..
Mon Jul 7 17:37:10 2014 - [info] 172.16.1.242: Resetting slave info succeeded.
Mon Jul 7 17:37:10 2014 - [info] Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
Mon Jul 7 17:37:10 2014 - [info]
----- Failover Report -----
app1: MySQL Master failover 172.16.1.241 to 172.16.1.242 succeeded
Master 172.16.1.241 is down!
Check MHA Manager logs at haproxy001:/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 172.16.1.242(172.16.1.242:3306) has all relay logs for recovery.
Selected 172.16.1.242 as a new master.
172.16.1.242: OK: Applying all logs succeeded.
172.16.1.243: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.16.1.243: OK: Applying all logs succeeded. Slave started, replicating from 172.16.1.242.
172.16.1.242: Resetting slave info succeeded.
Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
結論:(1)因為sql線程引起的資料延遲問題,mha會等待relay-log全部寫入盤中後,才進行切換。(2)由于IO線程引起的資料不一緻,目前還未測試。
小結:(1)無負載切換越需10s。
(2)預設按照server子產品順序進行切換。
(3)無腳本無法回切。
(4)主寫down後,manager程序會停止。
(5)無法處理雪崩情況。
(6)需要等待relay-log全部追上之後,才會切換主從。
根據這些缺陷,來确認,相關腳本需要實作什麼樣的功能?
下面配置切換腳本參數,再測試一遍上面案例,綁定vip。
#promote script
#shutdown_script=""
#master_ip_failover_script="/usr/local/bin/master_ip_failover"
#master_ip_online_change_script=""
#report_script=""
配置做如下調整:
(1) 配置vip
(2) 配置promote script,master_ip_failover_script,master_ip_online_change_script