天天看点

Mysql半同步复制+MHA+Keepalived部署

目前公司mysql服务器只是部署了简单的mysql主从,并没有自动故障转移,故此部署Mysql半同步复制+MHA+Keepalived进行测试,理论就不过多作说明了,网上资料很多,此次部署也有参考网上资料。

目前MHA主流架构是一主多从,此次部署采用一主二从(一台充当master,一台充当备用master,一台充当从库)

拓扑图:

Mysql半同步复制+MHA+Keepalived部署

环境:

4台centos6.8 私有云主机,VIP为192.168.1.251

主机名     IP              职能                  部署软件
vm01       MHA-Manager         MHA-Manager,MHA-node
vm02       mysql-master        mysql5,MHA-node
vm03       mysql-master-backup mysql5,MHA-node,keepalived
vm04       mysql-slave         mysql5,MHA-node,keepalived
           

所有主机/etc/hosts文件添加

192.168.1.211   vm01
192.168.1.212   vm02
192.168.1.221   vm03
192.168.1.229   vm04
           

所有服务器防火墙和SELINUX关闭

一、服务器mysql环境部署

1、mysql源码安装(本次部署使用mysql-5.6.35)

安装编译源码所需要工具及库

[root@vm02 ~]# cd /usr/local/src/
[root@vm02 src]# yum install gcc gcc-c++ ncurses-devel perl  -y
           

下载并安装cmake(mysql5.5以后源码安装需要使用cmake)

[root@vm02 src]# wget https://cmake.org/files/v3.8/cmake-3.8.1.tar.gz
[root@vm02 src]# tar zxf cmake-3.8.1.tar.gz 
[root@vm02 src]# cd cmake-3.8.1
[root@vm02 cmake-.]# ./configure 
[root@vm02 cmake-.]# make -j `cat /proc/cpuinfo |grep processor|wc -l` && make install
           

创建mysql用户及组

[root@vm03 src]# groupadd mysql
[root@vm03 src]# useradd -g mysql -s /bin/false -M mysql
           

创建mysql安装目录及数据库存放目录

[[email protected] src]# mkdir /data/mysql
[[email protected] src]# mkdir /data/mysql/data
           

下载mysql源码(如果嫌麻烦的话也可用二进制包方式安装,便于维护)

[[email protected] src]# wget https://cdn.mysql.com/archives/mysql-5.6/mysql-5.6.35.tar.gz
[root@vm02 src]# cd mysql-5.6.35
[[email protected] mysql-5.6.35]# cmake ./
-DCMAKE_INSTALL_PREFIX=/data/mysql \
-DMYSQL_DATADIR=/data/mysql/data \
-DMYSQL_UNIX_ADDR=/data/mysql/mysql.sock \
-DDEFAULT_CHARSET=utf8 \
-DDEFAULT_COLLATION=utf8_general_ci \
-DWITH_INNOBASE_STORAGE_ENGINE= \
-DWITH_ARCHIVE_STORAGE_ENGINE= \
-DWITH_BLACKHOLE_STORAGE_ENGINE= \
-DMYSQL_TCP_PORT= \
-DEXTRA_CHARSETS=all \
-DENABLED_LOCAL_INFILE=
           

在此处,我碰到一个很怪异的情况,cmake生成的CMackCache.txt文件中mysql及数据库目录的安装目录仍是默认的/usr/local/mysql,而并非我指定的目录,手动修改掉

[root@vm02 mysql-.]# sed -i 's#/usr/local/mysql#/data/mysql#' CMakeCache.txt 
[root@vm02 mysql-.]# make -j `cat /proc/cpuinfo|grep processor|wc -l`
[root@vm02 mysql-.]# make install
           

修改mysql目录属主及组

初始化mysql数据库

[root@vm02 mysql-.]# cd /data/mysql/
[root@vm02 mysql]# scripts/mysql_install_db --user=mysql --datadir=/data/mysql/data/
           

将mysql bin及lib目录加入PATH环境变量

[root@vm02 mysql]# echo "export PATH=/data/mysql/bin:/data/mysql/lib:$PATH" >>/etc/profile
[root@vm02 mysql]# . /etc/profile
           

复制mysql启动配置文件

修改为:

[client]
port        = 3306
socket      = /data/mysql/mysql.sock

[mysql]
no-auto-rehash

[mysqld]
user=mysql
character_set_server=utf8mb4
basedir = /data/mysql
datadir = /data/mysql/data
port = 3306
# 此次测试部署中 server_id取ip的最后一段,server_id可以随意取,只要互不相同
server_id = 212   
socket = /data/mysql/mysql.sock

lower_case_table_names=1
#mysql回收空闲连接的时间(7天)
wait_timeout=604800
symbolic-links=0
#二进制日志相关参数
log-bin=/data/mysql/data/mysql-bin
expire_logs_days=7
binlog_cache_size = 1M
binlog_format=mixed
binlog_checksum=none

#slave相关参数
skip-slave-start
relay_log_purge=0
relay_log=mysql-relay-bin
relay_log_index=mysql-relay-bin.index

back_log = 100
max_connect_errors = 2000
table_open_cache = 1024
skip-external-locking
skip-name-resolve
key_buffer_size = 128M
bulk_insert_buffer_size = 32M
max_allowed_packet = 100M
sort_buffer_size = 16M
read_buffer_size = 16M
join_buffer_size = 16M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 128M
thread_cache_size = 8
query_cache_size = 256M
thread_stack = 192K
max_connections = 500

tmp_table_size= 256M
max_heap_table_size= 128M
explicit_defaults_for_timestamp=true
slow_query_log
log_output='Table,File'
slow_query_log_file=/data/mysql/slow.log
long_query_time = 2
innodb_file_format = Barracuda
innodb_file_per_table = 1
innodb_large_prefix = on
innodb_data_file_path              = ibdata1:10M:autoextend
innodb_log_file_size               = 256M
innodb_log_files_in_group = 3
innodb_buffer_pool_size = 2G
innodb_log_buffer_size = 4M
innodb_flush_log_at_trx_commit = 2
innodb_file_per_table
innodb_lock_wait_timeout           = 120
innodb_thread_concurrency = 8
innodb_autoextend_increment = 128M

[mysqldump]
quick
max_allowed_packet = 100M

[mysqld_safe]
open-files-limit = 8192
log_error =/data/mysql/mysqld.log
pid-file=/data/mysql/mysqld.pid

[mysqlhotcopy]
interactive-timeout
           

复制mysql启动脚本文件并创建为服务

[root@vm02 mysql]# cp support-files/mysql.server /etc/rc.d/init.d/mysqld
[root@vm02 mysql]# chkconfig --add mysqld
           

手动创建mysql日志文件,否则启动mysql时报错

[root@vm02 mysql]# touch /data/mysql/mysqld.log
[root@vm02 mysql]# chown mysql.mysql /data/mysql/mysqld.log
           

启动mysql

设置mysql root账号初始密码为123456

mysql命令行中删除匿名账户

[[email protected] mysql]# mysql -uroot -p
mysql> delete  from mysql.user where user="";
mysql> update mysql.user set password=password("123456") where user='root'; 
mysql> flush privileges;
           

vm03、vm04服务器上mysql安装配置方法完全一样,仅/etc/my.cnf中server_id的值不一样,各自取服务器ip最后一段作为自己的server_id

二、配置主从半同步复制

在mysql master(vm02)上创建主从复制账号

[root@vm02 ~]# mysql -uroot -p
mysql> grant replication slave on *.* to 'repl'@'192.168.1.%' identified by '654321';
           

备份mysql master上数据库(或者直接物理备份到slave上也行)

其中–master-data=2代表备份时刻记录master的Binlog位置和Position,–single-transaction意思是获取一致性快照,-R意思是备份存储过程和函数,–triggres的意思是备份触发器,-A代表备份所有的库。

将备份的数据传送到vm03,vm04服务器

[root@vm02 ~]# hosts="vm03 vm04"
[root@vm02 ~]# for host in ${hosts[*]};do rsync -avzP master_data.sql $host:/usr/local/src;done;
           

在vm03,vm04上恢复数据

[root@vm03 src]# mysql -uroot -p123456 <master_data.sql
[root@vm04 src]# mysql -uroot -p123456 <master_data.sql 
           

在vm03(mysql-master-backup)上配置主从同步,此master备机上也需开启bin-log

[[email protected] src]# mysql -uroot -p123456
mysql> change master to master_host='192.168.1.212',master_port=,master_user='repl',master_password='654321',master_log_file='mysql-bin.000006',master_log_pos=;
Query OK,  rows affected,  warnings ( sec)

mysql> start slave;
Query OK,  rows affected ( sec)

mysql> show slave status\G;
*************************** . row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: .
                  Master_User: repl
                  Master_Port: 
                Connect_Retry: 
              Master_Log_File: mysql-bin.
          Read_Master_Log_Pos: 
               Relay_Log_File: mysql-relay-bin.
                Relay_Log_Pos: 
        Relay_Master_Log_File: mysql-bin.
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 
                   Last_Error: 
                 Skip_Counter: 
          Exec_Master_Log_Pos: 
              Relay_Log_Space: 
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 
                Last_IO_Error: 
               Last_SQL_Errno: 
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 
                  Master_UUID: cfc4e2--e7-b46d-
             Master_Info_File: /data/mysql/data/master.info
                    SQL_Delay: 
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 
 row in set ( sec)
           

vm04(slave)操作与vm02操作相同,由于这台只做从库,只需要修改server id即可,不需要开启log-bin。

MySQL复制默认是异步复制,Master将事件写入binlog,但并不知道Slave是否或何时已经接收且已处理。在异步复制的机制的情况下,如果Master宕机,事务在Master上已提交,但很可能这些事务没有传到任何的Slave上。假设有Master->Salve故障转移的机制,此时Slave也可能会丢失事务。

半同步复制的功能要在Master,Slave都开启,半同步复制才会起作用;否则,只开启一边,它依然为异步复制。

mysql5.5及更高版本才有半同步复制功能。在MySQL上安装插件需要数据库支持动态载入。检查是否支持,用如下检测:

mysql> show global variables like 'have_dynamic_loading';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| have_dynamic_loading | YES   |
+----------------------+-------+
1 row in set (0.01 sec)
           

半同步复制是基于复制的环境。也就是说配置半同步复制前,已有复制的环境。

在所有mysql节点(主,从)安装半同步插件

mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK,  rows affected ( sec)
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK,  rows affected ( sec)
           

如果不清楚Plugin的目录,用如下查找:

mysql>  show global variables like 'plugin_dir';
+---------------+-------------------------+
| Variable_name | Value                   |
+---------------+-------------------------+
| plugin_dir    | /data/mysql/lib/plugin/ |
+---------------+-------------------------+

[[email protected] ~]# ls /data/mysql/lib/plugin/
adt_null.so          connection_control.so  mypluglib.so          qa_auth_server.so     validate_password.so
auth.so              daemon_example.ini     mysql_no_login.so     semisync_master.so
auth_socket.so       debug                  qa_auth_client.so     semisync_slave.so
auth_test_plugin.so  libdaemon_example.so   qa_auth_interface.so  test_udf_services.so
           

检查Plugin是否已正确安装

mysql> show plugins;                            
+----------------------------+----------+--------------------+--------------------+---------+
| Name                       | Status   | Type               | Library            | License |
+----------------------------+----------+--------------------+--------------------+---------+
| binlog                     | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| mysql_native_password      | ACTIVE   | AUTHENTICATION     | NULL               | GPL     |
| mysql_old_password         | ACTIVE   | AUTHENTICATION     | NULL               | GPL     |
| sha256_password            | ACTIVE   | AUTHENTICATION     | NULL               | GPL     |
| CSV                        | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| MEMORY                     | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| MyISAM                     | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| MRG_MYISAM                 | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| ARCHIVE                    | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| BLACKHOLE                  | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| FEDERATED                  | DISABLED | STORAGE ENGINE     | NULL               | GPL     |
| InnoDB                     | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| INNODB_TRX                 | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_LOCKS               | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_LOCK_WAITS          | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_CMP                 | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_CMP_RESET           | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_CMPMEM              | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_CMPMEM_RESET        | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_CMP_PER_INDEX       | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_CMP_PER_INDEX_RESET | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_BUFFER_PAGE         | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_BUFFER_PAGE_LRU     | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_BUFFER_POOL_STATS   | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_METRICS             | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_FT_DEFAULT_STOPWORD | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_FT_DELETED          | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_FT_BEING_DELETED    | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_FT_CONFIG           | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_FT_INDEX_CACHE      | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_FT_INDEX_TABLE      | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_SYS_TABLES          | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_SYS_TABLESTATS      | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_SYS_INDEXES         | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_SYS_COLUMNS         | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_SYS_FIELDS          | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_SYS_FOREIGN         | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_SYS_FOREIGN_COLS    | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_SYS_TABLESPACES     | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| INNODB_SYS_DATAFILES       | ACTIVE   | INFORMATION SCHEMA | NULL               | GPL     |
| PERFORMANCE_SCHEMA         | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| partition                  | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| rpl_semi_sync_master       |             | ACTIVE        | REPLICATION        |                  | semisync_master.so |                     | He Zhenxing                                | Semi-synchronous replication master                                       | GPL            | ON          |
| rpl_semi_sync_slave        |             | ACTIVE        | REPLICATION        |                  | semisync_slave.so  |                     | He Zhenxing                                | Semi-synchronous replication slave                                        | GPL            | ON          |
+----------------------------+----------+--------------------+--------------------+---------+
           

或者

mysql> select * from information_schema.plugins where PLUGIN_NAME like "rpl_semi_sync%";
+----------------------+----------------+---------------+-------------+---------------------+--------------------+------------------------+---------------+-------------------------------------+----------------+-------------+
| PLUGIN_NAME          | PLUGIN_VERSION | PLUGIN_STATUS | PLUGIN_TYPE | PLUGIN_TYPE_VERSION | PLUGIN_LIBRARY     | PLUGIN_LIBRARY_VERSION | PLUGIN_AUTHOR | PLUGIN_DESCRIPTION                  | PLUGIN_LICENSE | LOAD_OPTION |
+----------------------+----------------+---------------+-------------+---------------------+--------------------+------------------------+---------------+-------------------------------------+----------------+-------------+
| rpl_semi_sync_master | 1.0            | ACTIVE        | REPLICATION | 2.0                 | semisync_master.so | 1.4                    | He Zhenxing   | Semi-synchronous replication master | GPL            | ON          |
| rpl_semi_sync_slave  | 1.0            | ACTIVE        | REPLICATION | 2.0                 | semisync_slave.so  | 1.4                    | He Zhenxing   | Semi-synchronous replication slave  | GPL            | ON          |
+----------------------+----------------+---------------+-------------+---------------------+--------------------+------------------------+---------------+-------------------------------------+----------------+-------------+
2 rows in set (0.00 sec)
           

在Master上执行:

mysql> SET GLOBAL rpl_semi_sync_master_enabled = ;
           

在所有Slave上执行(包括master备机): SET GLOBAL rpl_semi_sync_slave_enabled = 1;

mysql> show variables like "rpl_semi_sync_slave_enabled";
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| rpl_semi_sync_slave_enabled | OFF   |
+-----------------------------+-------+
1 row in set (0.00 sec)

mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like "rpl_semi_sync_slave_enabled";
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| rpl_semi_sync_slave_enabled | ON    |
+-----------------------------+-------+
1 row in set (0.00 sec)
           

如果在一个正在运行的Slave上开启半同步复制的功能,必须先停止Slave I/O,将其启用半同步后,再开启Slave I/O.

mysql> STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;
           

如果不这样做,Slave还是会以异步的方式进行复制。

正如大家所知,如果不将变量的设置写到配置文件,下次重启数据库,将失效。写入配置文件:

Master上:

[mysqld]
rpl_semi_sync_master_enabled=
rpl_semi_sync_master_timeout=  # 秒(默认)
           

所有Slave上:

[mysqld]
rpl_semi_sync_slave_enabled=
           

三、配置ssh免密码登录

配置MHA-Manager(vm01)到所有MHA-node(vm02,vm03,vm04)

[root@vm01 ~]# ssh-keygen -t rsa         #一直回车
[root@vm01 ~]# hosts="vm02 vm03 vm04"
[root@vm01 ~]# for host in ${hosts[*]};do ssh-copy-id -i /root/.ssh/id_rsa.pub $host;done;
           

配置mysql master(vm02)到所有MHA-node(vm01,vm03,vm04)

[root@vm02 ~]# ssh-keygen -t rsa         #一直回车
[root@vm02 ~]# hosts="vm01 vm03 vm04"
[root@vm02 ~]# for host in ${hosts[*]};do ssh-copy-id -i /root/.ssh/id_rsa.pub $host;done;
           

配置mysql slave(vm03,vm04)到所有MHA-node

[root@vm03 ~]# ssh-keygen -t rsa         #一直回车
[root@vm03 ~]# hosts="vm01 vm02 vm04"       
[root@vm03 ~]# for host in ${hosts[*]};do ssh-copy-id -i /root/.ssh/id_rsa.pub $host;done;

[root@vm04 ~]# ssh-keygen -t rsa         #一直回车
[root@vm04 ~]# hosts="vm01 vm02 vm03"       
[root@vm04 ~]# for host in ${hosts[*]};do ssh-copy-id -i /root/.ssh/id_rsa.pub $host;done;
           

四、在mysql master数据库中创建mha管理用户

mysql> grant all privileges on *.* to 'mha'@'192.168.1.%' identified by '[email protected]';
Query OK,  rows affected ( sec)

mysql> flush privileges;
Query OK,  rows affected ( sec)
           

与此同时,在从库上检察是否同步

mysql> select host,user,password from mysql.user; 
+------------------+------+-------------------------------------------+
| host             | user | password                                  |
+------------------+------+-------------------------------------------+
| localhost        | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| vm02.localdomain | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| 127.0.0.1        | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| ::1              | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| 192.168.1.%      | repl | *2A032F7C5BA932872F0F045E0CF6B53CF702F2C5 |
| 192.168.1.%      | mha  | *792262F3BDF0EAA8F94DA7C5FA717730EF732632 |
+------------------+------+-------------------------------------------+
           

五、安装MHA

MHA特点:

MHA监控复制架构的主服务器,一旦检测到主服务器故障,就会自动进行故障转移。即使有些从服务器没有收到最新的relay log,MHA自动从最新的从服务器上识别差异的relay log并把这些日志应用到其他从服务器上,因此所有的从服务器保持一致性了。MHA通常在几秒内完成故障转移,9-12秒可以检测出主服务器故障,7-10秒内关闭故障的主服务器以避免脑裂,几秒中内应用差异的relay log到新的主服务器上,整个过程可以在10-30s内完成。还可以设置优先级指定其中的一台slave作为master的候选人。由于MHA在slaves之间修复一致性,因此可以将任何slave变成新的master,而不会发生一致性的问题,从而导致复制失败。

mha下载及文档地址: https://code.google.com/p/mysql-master-ha/

国内访问谷歌需要翻墙(如果没有vpn账号的话,可以到http://vpn.disa8.com下载一个免费的vpn)

所有服务器配置epel的yum源,安装相关依赖包

rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-
sed -i 's/^#baseurl/baseurl/g' /etc/yum.repos.d/epel.repo
sed -i 's/^mirrorlist/#mirrorlist/g' /etc/yum.repos.d/epel.repo

sed -i 's/^#baseurl/baseurl/g' /etc/yum.repos.d/epel-testing.repo
sed -i 's/^mirrorlist/#mirrorlist/g' /etc/yum.repos.d/epel-testing.repo
           

在所有运行MySQL服务的服务器上安装运行MHA-Node,无论是master还是slave。由于MHA-Manager需要MHA-Node,因此在运行MHA-Manager的服务器上也需要安装MHA-Node

安装mha-manager依赖

安装mha-manager(可以使用rpm包安装,更方便)

[root@vm01 src]# tar zxf mha4mysql-manager-0.55.tar.gz
[root@vm01 src]# cd mha4mysql-manager-0.55
# 检查mha manager所需依赖包是否安装
[root@vm01 mha4mysql-manager-.]# perl Makefile.PL 
*** Module::AutoInstall version 
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. ()
- DBD::mysql            ...loaded. ()
- Time::HiRes           ...loaded. ()
- Config::Tiny          ...missing.
- Log::Dispatch         ...loaded. ()
- Parallel::ForkManager ...loaded. (.)
- MHA::NodeConst        ...missing.
==> Auto-install the  mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type 'make'.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Warning: prerequisite Config::Tiny  not found.
Warning: prerequisite MHA::NodeConst  not found.
Writing Makefile for mha4mysql::manager
# 安装mha-manager
[root@vm01 mha4mysql-manager-.]# make
[root@vm01 mha4mysql-manager-.]# make install
           

安装完成后,会在/usr/local/bin目录下生成以下脚本文件

[root@vm01 ~]# ls /usr/local/bin/masterha_*
/usr/local/bin/masterha_check_repl  /usr/local/bin/masterha_check_status  /usr/local/bin/masterha_manager         /usr/local/bin/masterha_master_switch    /usr/local/bin/masterha_stop
/usr/local/bin/masterha_check_ssh   /usr/local/bin/masterha_conf_host     /usr/local/bin/masterha_master_monitor  /usr/local/bin/masterha_secondary_check
           

此外,mha-manager源码包解压出来,会有一些脚本模板可供参考(因为这些脚本不完整,需要自己修改,这是软件开发着留给我们自己发挥的,如果开启下面的任何一个脚本对应的参数,而对应这里的脚本又没有修改,则会抛错,可能会被坑的很惨)

1、master_ip_failover是自动切换时vip管理的脚本,不是必须,如果我们使用keepalived的,我们可以自己编写脚本完成对vip的管理,比如监控mysql,如果mysql异常,我们停止keepalived就行,这样vip就会自动漂移

[root@vm01 scripts]# ls -l /usr/local/src/mha4mysql-manager-0.55/samples/scripts/master_ip_failover 
-rwxr-xr-x  root root  Dec    /usr/local/src/mha4mysql-manager-./samples/scripts/master_ip_failover
           

2、master_ip_online_change是在线切换时vip的管理,不是必须,同样可以可以自行编写简单的shell完成

[root@vm01 scripts]# ls -l /usr/local/src/mha4mysql-manager-0.55/samples/scripts/master_ip_online_change
-rwxr-xr-x  root root  Dec    /usr/local/src/mha4mysql-manager-./samples/scripts/master_ip_online_change
           

3、power_manager故障发生后关闭主机的脚本,不是必须

[root@vm01 scripts]# ls -l /usr/local/src/mha4mysql-manager-0.55/samples/scripts/power_manager
-rwxr-xr-x  root root  Dec    /usr/local/src/mha4mysql-manager-./samples/scripts/power_manager
           

4、send_report因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成

[root@vm01 scripts]# ls -l /usr/local/src/mha4mysql-manager-0.55/samples/scripts/send_report
-rwxr-xr-x  root root  Dec    /usr/local/src/mha4mysql-manager-./samples/scripts/send_report
           

安装mha-node依赖

安装mha-node(源码或rpm包安装均可)

[root@vm01 src]# rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm 
Preparing...                ########################################### [100%]
   :mha4mysql-node         ########################################### [100%]
           

或者

其它服务器(vm02,vm03,vm04)安装mha-node及其依赖和vm01安装完全一样。

六、配置MHA

源码包里有配置文件模板,可参考: mha4mysql-manager-0.55源码压缩包里samples/conf目录下的模板配置文件。

创建mha配置文件目录

mha配置文件app1.conf

(在软件包解压后的目录里面有样例配置文件)

[[email protected] app1]# cat /etc/mha/app1/app1.conf 
[server default]
# default 全局配置
manager_workdir=/etc/mha/app1   #设置manager的工作目录
manager_log=/etc/mha/app1/manager.log  #设置manager的日志
master_binlog_dir= /data/mysql/data #设置master保存binlog的位置,以便MHA可以找到master的日志,我这里的也就是mysql的数据目录
ssh_user=root  # 设置ssh的登录用户名
user=mha   # 设置mha监控用户
password[email protected]  #mha监控用户的密码
repl_user=repl  # 主从复制账号
repl_password=654321  # 主从复制账号的密码
secondary_check_script= /usr/local/bin/masterha_secondary_check -s 192.168.1.221 -s 192.168.1.229  # 一旦MHA到master的监控之间出现问题,MHA Manager将会尝试从192.168.1.221登录到master
ping_interval=3  # 设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行failover

master_ip_failover_script= /etc/mha/app1/master_ip_failover #设置自动failover时候的切换脚本
#shutdown_script= /etc/mha/app1/masterha/power_manager
#report_script= /etc/mha/app1/send_report
#master_ip_online_change_script= /etc/mha/app1/master_ip_online_change   #设置手动切换时候的切换脚本

[server1]
# vm02 master
hostname=192.168.1.212
port=3306
candidate_master=1 

[server2]
# vm03 master-backup
hostname=192.168.1.221
port=3306
candidate_master=1 # 设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中事件最新的slave
check_repl_delay=0  # 默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master

[server3]
# vm04 mysql slave
hostname=192.168.1.229
port=3306
no_master=1
           

所有slave服务器上 设置定时任务清理relay log

注意:

MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF(已将relay_log_purge=0写入到了my.cnf配置文件),采用手动清除relay log的方式。在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。定期清除中继日志需要考虑到复制延时的问题。在ext3的文件系统下,删除大的文件需要一定的时间,会导致严重的复制延时。为了避免复制延时,需要暂时为中继日志创建硬链接,因为在linux系统中通过硬链接删除大文件速度会很快。(在mysql数据库中,删除大表时,通常也采用建立硬链接的方式)

[root@vm03 ~]# cat /etc/cron.d/purge_relay_logs
  * * * /usr/bin/purge_relay_logs --user=root --password=123456 --disable_relay_log_purge  --port=3306 --workdir=/data/mysql/data/ >>/etc/mha/app1/purge_relay_logs.log >&
           

参数说明:

--user=root              # mysql用户名
--password=        # mysql用户密码
--port=              # mysql端口号
--workdir=/data/mysql/data    # 指定创建relay log的硬链接的位置,默认是/var/tmp,由于系统不同分区创建硬链接文件会失败,故需要执行硬链接具体位置,成功执行脚本后,硬链接的中继日志文件被删除
--disable_relay_log_purge     # 默认情况下,如果relay_log_purge=1,脚本会什么都不清理,自动退出,通过设定这个参数,当relay_log_purge=1的情况下会将relay_log_purge设置为0。清理relay log之后,最后将参数设置为OFF。
           

故障转移脚本,虚拟ip(vip)配置为自己的:192.168.1.251

[root@vm01 app1]# cat /etc/mha/app1/master_ip_failover 
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.1.251/24';  # Virtual IP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
$ssh_user = "root";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = ;

        #eval {
        #    print "Disabling the VIP on old master: $orig_master_host \n";
        #    &stop_vip();
        #    $exit_code = 0;
        #};


        eval {
                print "Disabling the VIP on old master: $orig_master_host \n";
                #my $ping=`ping -c 1 10.0.0.13 | grep "packet loss" | awk -F',' '{print $3}' | awk '{print $1}'`;
                #if ( $ping le "90.0%" && $ping gt "0.0%" ){
                #$exit_code = 0;
                #}
                #else {

                &stop_vip();

                # updating global catalog, etc
                $exit_code = ;

                #}
        };


        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = ;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = ;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_ip \" $ssh_start_vip \"`;
        exit ;
    }
    else {
        &usage();
        exit ;
    }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

# the end.
           

backup master & slave 设置read_only防止被误写

master备机和slave都要设置read_only=1,如果master自动切换后,会由mha设置原master备机为read_only=0

从库对外提供读服务,之所以没有写进配置文件,是因为随时slave会提升为master

mysql> select @@read_only;
+-------------+
| @@read_only |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)

mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select @@read_only;
+-------------+
| @@read_only |
+-------------+
|           1 |
+-------------+
1 row in set (0.00 sec)
           

MHA维护

创建软连接(在所有mysql上都执行)

ln -s /data/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /data/mysql/bin/mysql /usr/bin/mysql
           

检查主从复制情况 masterha_check_repl –conf=/etc/mha/app1/app1.conf

[[email protected] app1]# masterha_check_repl --conf=/etc/mha/app1/app1.conf 
Thu May  ::  - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May  ::  - [info] Reading application default configurations from /etc/mha/app1/app1.conf..
Thu May  ::  - [info] Reading server configurations from /etc/mha/app1/app1.conf..
Thu May  ::  - [info] MHA::MasterMonitor version .
Thu May  ::  - [info] Dead Servers:
Thu May  ::  - [info] Alive Servers:
Thu May  ::  - [info]   (:)
Thu May  ::  - [info]   (:)
Thu May  ::  - [info]   (:)
Thu May  ::  - [info] Alive Slaves:
Thu May  ::  - [info]   (:)  Version=-log (oldest major version between slaves) log-bin:enabled
Thu May  ::  - [info]     Replicating from (:)
Thu May  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
Thu May  ::  - [info]   (:)  Version=-log (oldest major version between slaves) log-bin:disabled
Thu May  ::  - [info]     Replicating from (:)
Thu May  ::  - [info]     Not candidate for the new Master (no_master is set)
Thu May  ::  - [info] Current Alive Master: (:)
Thu May  ::  - [info] Checking slave configurations..
Thu May  ::  - [warning]  log-bin is not set on slave (:). This host can not be a master.
Thu May  ::  - [info] Checking replication filtering settings..
Thu May  ::  - [info]  binlog_do_db= , binlog_ignore_db= 
Thu May  ::  - [info]  Replication filtering check ok.
Thu May  ::  - [info] Starting SSH connection tests..
Thu May  ::  - [info] All SSH connection tests passed successfully.
Thu May  ::  - [info] Checking MHA Node version..
Thu May  ::  - [info]  Version check ok.
Thu May  ::  - [info] Checking SSH publickey authentication settings on the current master..
Thu May  ::  - [info] HealthCheck: SSH to  is reachable.
Thu May  ::  - [info] Master MHA Node version is .
Thu May  ::  - [info] Checking recovery script configurations on the current master..
Thu May  ::  - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000008 
Thu May  ::  - [info]   Connecting to root@().. 
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql/data, up to mysql-bin
Thu May  ::  - [info] Master setting check done.
Thu May  ::  - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu May  ::  - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.1.221 --slave_ip=192.168.1.221 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.35-log --manager_version=0.55 --relay_log_info=/data/mysql/data/relay-log.info  --relay_dir=/data/mysql/data/  --slave_pass=xxx
Thu May  ::  - [info]   Connecting to root@(:).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/data/relay-log.info ... ok.
    Relay log found at /data/mysql/data, up to mysql-relay-bin
    Temporary relay log file is /data/mysql/data/mysql-relay-bin
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu May  ::  - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.1.229 --slave_ip=192.168.1.229 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.35-log --manager_version=0.55 --relay_log_info=/data/mysql/data/relay-log.info  --relay_dir=/data/mysql/data/  --slave_pass=xxx
Thu May  ::  - [info]   Connecting to root@(:).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/data/relay-log.info ... ok.
    Relay log found at /data/mysql/data, up to mysql-relay-bin
    Temporary relay log file is /data/mysql/data/mysql-relay-bin
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu May  ::  - [info] Slaves settings check done.
Thu May  ::  - [info] 
 (current master)
 +--192.168.1.221
 +--192.168.1.229

Thu May  ::  - [info] Checking replication health on ..
Thu May  ::  - [info]  ok.
Thu May  ::  - [info] Checking replication health on ..
Thu May  ::  - [info]  ok.
Thu May  ::  - [info] Checking master_ip_failover_script status:
Thu May  ::  - [info]   /etc/mha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.212 --orig_master_ip=192.168.1.212 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eth0: down==/sbin/ifconfig eth0: /===

Checking the Status of the script.. OK 
Thu May  ::  - [info]  OK.
Thu May  ::  - [warning] shutdown_script is not defined.
Thu May  ::  - [info] Got exit code  (Not master dead).

MySQL Replication Health is OK.
           

检查ssh连接情况 masterha_check_ssh –conf=/etc/mha/app1/app1.conf

[[email protected] app1]# masterha_check_ssh --conf=/etc/mha/app1/app1.conf 
Thu May  ::  - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu May  ::  - [info] Reading application default configurations from /etc/mha/app1/app1.conf..
Thu May  ::  - [info] Reading server configurations from /etc/mha/app1/app1.conf..
Thu May  ::  - [info] Starting SSH connection tests..
Thu May  ::  - [debug] 
Thu May  ::  - [debug]  Connecting via SSH from root@(:) to root@(:)..
Thu May  ::  - [debug]   ok.
Thu May  ::  - [debug]  Connecting via SSH from root@(:) to root@(:)..
Thu May  ::  - [debug]   ok.
Thu May  ::  - [debug] 
Thu May  ::  - [debug]  Connecting via SSH from root@(:) to root@(:)..
Thu May  ::  - [debug]   ok.
Thu May  ::  - [debug]  Connecting via SSH from root@(:) to root@(:)..
Thu May  ::  - [debug]   ok.
Thu May  ::  - [debug] 
Thu May  ::  - [debug]  Connecting via SSH from root@(:) to root@(:)..
Thu May  ::  - [debug]   ok.
Thu May  ::  - [debug]  Connecting via SSH from root@(:) to root@(:)..
Thu May  ::  - [debug]   ok.
Thu May  ::  - [info] All SSH connection tests passed successfully.
           

七、启动mha

vm01服务器上启动mha

[root@vm01 app1]# nohup /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start > /etc/mha/app1/mha_manager.log < /dev/null 2>&1 &
[] 
[root@vm01 app1]# ps -ef|grep masterha_manager
root         : pts/    :: perl /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start
root         : pts/    :: grep masterha_manager
           

此时,检查mysql master服务器上(vm02)的虚拟ip(vip)

[root@vm02 ~]# ip a
: lo: <LOOPBACK,UP,LOWER_UP> mtu  qdisc noqueue state UNKNOWN 
    link/loopback ::::: brd :::::
    inet ../ scope host lo
    inet6 ::/ scope host 
       valid_lft forever preferred_lft forever
: eth: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu  qdisc pfifo_fast state UP qlen 
    link/ether ::::: brd ff:ff:ff:ff:ff:ff
    inet ./ brd . scope global eth
    inet ./ brd . scope global secondary eth:
    inet6 fe8:::ff:fe78:/ scope link 
       valid_lft forever preferred_lft forever
           

在mysql master数据库(vm02)上创建应用账号app,密码为[email protected]

mysql> grant alter,create,select,update,insert,delete  on *.* to 'app'@'192.168.1.%' identified by '[email protected]'; 
Query OK,  rows affected ( sec)

mysql> flush privileges;
Query OK,  rows affected ( sec)
           

在vm01服务器上模拟应用程序远程使用vip连接数据库操作

[[email protected] app1]# mysql -uapp -p -h192.168.1.251
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 
Server version: -log Source distribution

Copyright (c) , , Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
 rows in set ( sec)

mysql> create database zabbix;
Query OK,  row affected ( sec)

mysql> use zabbix
Database changed
mysql> create table test_table(id int,name varchar());
Query OK,  rows affected ( sec)

mysql> insert into test_table values(,'zhangjian');
Query OK,  row affected ( sec)
           

模拟master宕机故障,mha自动故障转移,切换master

[root@vm02 ~]# ps -ef|grep mysql
root             : ?        :: /bin/sh /data/mysql/bin/mysqld_safe --datadir=/data/mysql/data --pid-file=/data/mysql/data/vm02.localdomain.pid
mysql    15951 15125  0 17:32 ?        00:00:11 /data/mysql/bin/mysqld --basedir=/data/mysql --datadir=/data/mysql/data --plugin-dir=/data/mysql/lib/plugin --user=mysql --log-error=/data/mysql/mysqld.log --open-files-limit=8192 --pid-file=/data/mysql/data/vm02.localdomain.pid --socket=/data/mysql/mysql.sock --port=
root         : pts/    :: grep mysql
[root@vm02 ~]# pkill mysqld
[root@vm02 ~]# ip a
: lo: <LOOPBACK,UP,LOWER_UP> mtu  qdisc noqueue state UNKNOWN 
    link/loopback ::::: brd :::::
    inet ../ scope host lo
    inet6 ::/ scope host 
       valid_lft forever preferred_lft forever
: eth: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu  qdisc pfifo_fast state UP qlen 
    link/ether ::::: brd ff:ff:ff:ff:ff:ff
    inet ./ brd . scope global eth
    inet6 fe8:::ff:fe78:/ scope link 
       valid_lft forever preferred_lft forever
[root@vm02 ~]# !ps
ps -ef|grep mysql
root         : pts/    :: grep mysql
           

此时,mysql-backup(vm03)服务器上

[[email protected] ~]# ip a
: lo: <LOOPBACK,UP,LOWER_UP> mtu  qdisc noqueue state UNKNOWN 
    link/loopback ::::: brd :::::
    inet / scope host lo
    inet6 ::/ scope host 
       valid_lft forever preferred_lft forever
: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu  qdisc pfifo_fast state UP qlen 
    link/ether fa:a::e9:a: brd ff:ff:ff:ff:ff:ff
    inet / brd  scope global eth0
    inet / brd  scope global secondary eth0:
    inet6 fe80::f89a:ff:fee9:a00/ scope link 
       valid_lft forever preferred_lft forever
[[email protected] ~]# mysql -uroot -p123456
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 
Server version: -log Source distribution

Copyright (c) , , Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show master status\G
***************************  row ***************************
             File: mysql-bin
         Position: 
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
 row in set ( sec)

mysql> show slave status\G
Empty set ( sec)
           

此时要注意:

因为前面我们是将半同步复制的配置写死在my.cnf配置文件中了,此时master切换后,需要将新master和原master中关于半同步复制的内容对调,同时在mysql命令行中执行set global操作,如下:

新Master:

[mysqld] 
rpl_semi_sync_master_enabled= 
rpl_semi_sync_master_timeout= # 默认10秒


mysql> set global rpl_semi_sync_master_enabled=;
Query OK,  rows affected ( sec)

mysql> set global rpl_semi_sync_slave_enabled=;
Query OK,  rows affected ( sec)
           

原Master:

[mysqld] 
rpl_semi_sync_slave_enabled= 

mysql> set global rpl_semi_sync_master_enabled=;
Query OK,  rows affected ( sec)

mysql> set global rpl_semi_sync_slave_enabled=;      
Query OK,  rows affected ( sec)
           

所有slave上重启io_thread:

mysql> stop slave io_thread;
Query OK,  rows affected ( sec)

mysql> start slave io_thread;
Query OK,  rows affected ( sec)
发现有一个问题,在原master上重启io_thread后,Rpl_semi_sync_slave_status状态仍是OFF.
           

检查半同步复制配置及状态:

show variables like ‘%rpl%’; 
show status like ‘%rpl%’;
           

此时,vm04(mysql slave)服务器上

[[email protected] ~]# mysql -uroot -p123456
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 
Server version: -log Source distribution

Copyright (c) , , Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show master  status;
Empty set ( sec)

mysql> show slave status\G
***************************  row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 
                  Master_User: repl
                  Master_Port: 
                Connect_Retry: 
              Master_Log_File: mysql-bin
          Read_Master_Log_Pos: 
               Relay_Log_File: mysql-relay-bin
                Relay_Log_Pos: 
        Relay_Master_Log_File: mysql-bin
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 
                   Last_Error: 
                 Skip_Counter: 
          Exec_Master_Log_Pos: 
              Relay_Log_Space: 
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 
                Last_IO_Error: 
               Last_SQL_Errno: 
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 
                  Master_UUID: c520243d-a--b483-fa9a21e94a00
             Master_Info_File: /data/mysql/data/master.info
                    SQL_Delay: 
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 
 row in set ( sec)
           

此时,从vm01服务器模拟应用程序继续远程使用vip连接数据库操作

mysql> insert into test_table values(,'zhangjian');
ERROR  (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    
Current database: zabbix

Query OK,  row affected ( sec)
           

八、模拟原宕机master恢复

发现,原宕机master恢复后,没有自动变为新master的从库,但vip也没有重新设置回来。新的master仍保持其master地位不变。

[[email protected] ~]# ps -ef|grep mysql
root         : pts/    :: grep mysql
[[email protected] ~]# service mysqld restart
 ERROR! MySQL server PID file could not be found!
Starting MySQL.. SUCCESS! 
[[email protected] ~]# !ps
ps -ef|grep mysql
root             : pts/    :: /bin/sh /data/mysql/bin/mysqld_safe --datadir=/data/mysql/data --pid-file=/data/mysql/data/vm02.localdomain.pid
mysql       : pts/    :: /data/mysql/bin/mysqld --basedir=/data/mysql --datadir=/data/mysql/data --plugin-dir=/data/mysql/lib/plugin --user=mysql --log-error=/data/mysql/mysqld.log --open-files-limit=8192 --pid-file=/data/mysql/data/vm02.localdomain.pid --socket=/data/mysql/mysql.sock --port=3306
root         : pts/    :: grep mysql
[[email protected] ~]# mysql -uroot -p123456
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 
Server version: -log Source distribution

Copyright (c) , , Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show master status\G
***************************  row ***************************
             File: mysql-bin
         Position: 
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
 row in set ( sec)

mysql> show slave status\G           
Empty set ( sec)
           

原宕机的master开启后,开启mha会报错,原因是检测到非slave服务器(即两个master,尽管只有一个生效)导致启动失败

mha是一次性的,主库发生切换后,切换完成过,mha进程就自动退了。

[[email protected] scripts]# nohup /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start > /etc/mha/app1/mha_manager.log < /dev/null 2>&1 &
[[email protected] scripts]# nohup /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start > /etc/mha/app1/mha_manager.log < /dev/null 2>&1 &
[1] 14621
[[email protected] scripts]# ps -ef|grep manager
root     14627 29444  0 15:12 pts/0    00:00:00 grep manager
[1]+  Exit 1                  nohup /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start > /etc/mha/app1/mha_manager.log < /dev/null 2>&1
[[email protected] ~]# cat /etc/mha/app1/manager.log
Fri May 12 15:06:21 2017 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln604] There are 2 non-slave servers! MHA manages at most one non-slave server. Check configurations.
Fri May 12 15:06:21 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations.  at /usr/local/share/perl5/MHA/MasterMonitor.pm line 300
Fri May 12 15:06:21 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Fri May 12 15:06:21 2017 - [info] Got exit code 1 (Not master dead).
           

此时查看MHA Manager监控是不正常的

[root@vm01 ~]# /usr/local/bin/masterha_check_status --conf=/etc/mha/app1/app1.conf 
app1 is stopped(:NOT_RUNNING).
           

此时,需要修改mha manager配置文件(app1.conf),将原宕机后恢复的master重做为新master的slave。

在新master上全库备份,用于宕机后恢复的原master用来做新master的slave主从同步用,以下得到的备份文件中会有用于主从同步的change master的相关信息。

将新master上的全库备份文件复制到原宕机Master(vm02)上,恢复数据库,并从数据库备份文件中得到change master所需要的信息。

[root@vm03 ~]# rsync -avzP new_master_data.sql vm02:/root
# 恢复新主库执行全备时完整数据
[root@vm02 ~]# mysql -uroot -p123456 <new_master_data.sql 
# 从备份文件中得到change master所需要信息
[root@vm02 ~]# grep -i "change master to" new_master_data.sql |head -n1
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=;
           

原宕机master(vm02)变为新master(vm03)的从库,同时在vm02上设置 只读set global read_only=1;

在执行change master前我在新master上插入了几条测试数据用于验证同步

# vm02服务器上
mysql> CHANGE MASTER TO master_host='192.168.1.221' , master_user='repl' , master_password='654321' , master_port=3306 , MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=344;     
Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.221
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 594
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 529
        Relay_Master_Log_File: mysql-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 594
              Relay_Log_Space: 698
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 221
                  Master_UUID: c520243d-356a-11e7-b483-fa9a21e94a00
             Master_Info_File: /data/mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

ERROR: 
No query specified

mysql> select * from zabbix.test_table;
+------+-----------+
| id   | name      |
+------+-----------+
|    1 | zhangjian |
|    1 | zhangjian |
|    2 | yy        |
|    3 | fangfang  |
|    8 | hm        |
|    6 | mm        |
+------+-----------+
6 rows in set (0.00 sec)
           

此时,再在MHA manager(vm01)管理节点上开启MHA,正常

注意:

在开启MHA前,要修改MHA manager配置文件secondary_check_script那部分

[root@vm01 ~]# vim /etc/mha/app1/app1.conf
secondary_check_script= /usr/local/bin/masterha_secondary_check -s . -s .
           

开启mha

[root@vm01 ~]# nohup /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start > /etc/mha/app1/mha_manager.log < /dev/null 2>&1 &
[] 
[root@vm01 ~]# ps -ef|grep manager
root         : pts/    :: perl /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start
root         : pts/    :: grep manager
[root@vm01 ~]# ps -ef|grep manager
root         : pts/    :: perl /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start
root         : pts/    :: grep manager
[root@vm01 ~]# ps -ef|grep manager
root         : pts/    :: perl /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start
root         : pts/    :: grep manager
[root@vm01 ~]# ps -ef|grep manager
root         : pts/    :: perl /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start
root         : pts/    :: grep manager
           

检查 MHA状态正常

[root@vm01 ~]# /usr/local/bin/masterha_check_status --conf=/etc/mha/app1/app1.conf 
app1 (pid:) is running(:PING_OK), master:.
           

查看mha启动日志

[[email protected] ~]# cat /etc/mha/app1/manager.log 

Fri May  ::  - [info] MHA::MasterMonitor version .
Fri May  ::  - [info] Dead Servers:
Fri May  ::  - [info] Alive Servers:
Fri May  ::  - [info]   (:)
Fri May  ::  - [info]   (:)
Fri May  ::  - [info]   (:)
Fri May  ::  - [info] Alive Slaves:
Fri May  ::  - [info]   (:)  Version=-log (oldest major version between slaves) log-bin:enabled
Fri May  ::  - [info]     Replicating from (:)
Fri May  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
Fri May  ::  - [info]   (:)  Version=-log (oldest major version between slaves) log-bin:disabled
Fri May  ::  - [info]     Replicating from (:)
Fri May  ::  - [info]     Not candidate for the new Master (no_master is set)
Fri May  ::  - [info] Current Alive Master: (:)
Fri May  ::  - [info] Checking slave configurations..
Fri May  ::  - [warning]  log-bin is not set on slave (:). This host can not be a master.
Fri May  ::  - [info] Checking replication filtering settings..
Fri May  ::  - [info]  binlog_do_db= , binlog_ignore_db= 
Fri May  ::  - [info]  Replication filtering check ok.
Fri May  ::  - [info] Starting SSH connection tests..
Fri May  ::  - [info] All SSH connection tests passed successfully.
Fri May  ::  - [info] Checking MHA Node version..
Fri May  ::  - [info]  Version check ok.
Fri May  ::  - [info] Checking SSH publickey authentication settings on the current master..
Fri May  ::  - [info] HealthCheck: SSH to  is reachable.
Fri May  ::  - [info] Master MHA Node version is .
Fri May  ::  - [info] Checking recovery script configurations on the current master..
Fri May  ::  - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000005 
Fri May  ::  - [info]   Connecting to root@().. 
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql/data, up to mysql-bin
Fri May  ::  - [info] Master setting check done.
Fri May  ::  - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri May  ::  - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.1.212 --slave_ip=192.168.1.212 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.35-log --manager_version=0.55 --relay_log_info=/data/mysql/data/relay-log.info  --relay_dir=/data/mysql/data/  --slave_pass=xxx
Fri May  ::  - [info]   Connecting to root@(:).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/data/relay-log.info ... ok.
    Relay log found at /data/mysql/data, up to mysql-relay-bin
    Temporary relay log file is /data/mysql/data/mysql-relay-bin
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri May  ::  - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.1.229 --slave_ip=192.168.1.229 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.35-log --manager_version=0.55 --relay_log_info=/data/mysql/data/relay-log.info  --relay_dir=/data/mysql/data/  --slave_pass=xxx
Fri May  ::  - [info]   Connecting to root@(:).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/data/relay-log.info ... ok.
    Relay log found at /data/mysql/data, up to mysql-relay-bin
    Temporary relay log file is /data/mysql/data/mysql-relay-bin
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri May  ::  - [info] Slaves settings check done.
Fri May  ::  - [info] 
 (current master)
 +--192.168.1.212
 +--192.168.1.229

Fri May  ::  - [info] Checking master_ip_failover_script status:
Fri May  ::  - [info]   /etc/mha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.221 --orig_master_ip=192.168.1.221 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eth0: down==/sbin/ifconfig eth0: /===

Checking the Status of the script.. OK 
Fri May  ::  - [info]  OK.
Fri May  ::  - [warning] shutdown_script is not defined.
Fri May  ::  - [info] Set master ping interval  seconds.
Fri May  ::  - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s  -s 
Fri May  ::  - [info] Starting ping health check on (:)..
Fri May  ::  - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
           

其中”[info] Ping(SELECT) succeeded, waiting until MySQL doesn’t respond..”说明整个系统已经开始监控了。

关闭MHA Manage监控 masterha_stop –conf=/etc/mha/app1/app1.conf

[root@vm01 scripts]# /usr/local/bin/masterha_stop --conf=/etc/mha/app1/app1.conf 
Stopped app1 successfully.
[root@vm01 scripts]# ps -ef|grep manager
root         :09 pts/    :: grep manager
# 再次开启,让mha时刻监控mysql集群状态
[root@vm01 scripts]# nohup /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start > /etc/mha/app1/mha_manager.log < /dev/null 2>&1 &
[] 
[root@vm01 scripts]# ps -ef|grep manager
root         : pts/    :: perl /usr/local/bin/masterha_manager --conf=/etc/mha/app1/app1.conf --ignore_fail_on_start
           

十、配置VIP

vip配置可以采用两种方式,一种通过keepalived的方式管理虚拟ip的浮动;另外一种通过脚本方式启动虚拟ip的方式(即不需要keepalived或者heartbeat类似的软件)。

前面测试是通过脚本方式启动的虚拟ip。

初步测试完毕,暂时先写这么多了,lvs+keepalived部署后续再写。

继续阅读