基于keepalived设计MySQL高可用方案,测试方案的可行性。
1.实现在以下两层上检查状态,服务不可用时切换
a)网络不可用或服务器宕机。
b)mysql服务不可用。
2.原主恢复后不自动升级为master,防止因网络问题导致频繁切换
3.保障keepalived切换时,从机可用(复制延迟在可接受范围),否则不提供服务。
本方案实现双机热备(HA)功能。基本设计思路:通过设置 keepalived ,使两个mysql节点属于一个集群(共享一个VIP与VRID,VIP绑定在主机的网卡上)。外部通过VIP访问mysql,当前主机不可用时,从机会抢占VIP。因此外部可以通过VIP来访问mysql以实现高可用。网络层的服务不可用的判定与切换由keepalived实现(基于VRRP协议)。
应用层的服务不可用的判定由自定义脚本实现。判定服务不可用后通过关闭本节点keepalived,释放VIP触发重新选主。此外,还需实现原主恢复后不自动升级为master功能。keepalived自带nopreempt参数实现不抢占功能,但当新主服务再挂掉后由于原主带nopreempt参数,即使原主优先级高仍无法完成切换。故现在结合自定义脚本实现类似功能,备机节点脚本只有当自身mysql可用且主机mysql不可用时才触发切换。
MySql复制有可能出现延迟、复制出错等情况,造成从机MySql不可用,因此在切换时首先判定从机MySql复制状态。初始时从机设成readonly,当从机要升级成主时,先判断复制状态、延迟时间(延迟时间由平时运行的check_run脚本收集),符合条件后再关闭readonly。如果从不符合升级条件就直接把从的vip也关闭,整个服务设为不可用。
1.在主备两台服务器上安装keepalived
2.设置keepalived的配置文件、添加相应的监控与切换检测脚本。
3.设置两台服务器上mysql主主复制。
node-1的keepalived.conf配置:
! Configuration File for keepalived
global_defs {
router_id MySQL-ha
}
vrrp_script check_run {
script "/home/ddb/keep_check/keepalived_check_mysql.sh"
interval 10 # check every 10 seconds
weight -10 # if failed, decrease 10 of the priority
fall 3 # require 2 failures for failures
rise 1 # require 1 sucesses for ok
}
vrrp_instance VI_1 {
state BACKUP #两台配置此处均是BACKUP
interface eth1
virtual_router_id 51
priority 100 #优先级
advert_int 1
notify_backup "/home/ddb/keep_check/keepalived_to_backup.sh"
notify_master "/home/ddb/keep_check/keepalived_to_master.sh"
nopreempt #不抢占,只在优先级高的机器上设置
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
check_run
}
node-2的keepalived.conf配置
! Configuration File for keepalived
global_defs {
router_id MySQL-ha
}
vrrp_script check_run {
script "/home/ddb/keep_check/keepalived_check_mysql.sh"
interval 10 # check every 10 seconds
weight -10 # if failed, decrease 40 of the priority
fall 3 # require 2 failures for failures
rise 1 # require 1 sucesses for ok
}
vrrp_instance VI_1 {
state BACKUP
interface eth1
virtual_router_id 51
priority 50
advert_int 1
notify_backup "/home/ddb/keep_check/keepalived_to_backup.sh"
notify_master "/home/ddb/keep_check/keepalived_to_master.sh"
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
check_run
}
virtual_ipaddress {
10.130.10.43/23
}
}
node-1/node-2 mysql基本配置
[mysqld]
bind-address=0.0.0.0
port=5331
vi /home/pw/mysql/keepalived_check_mysql.sh
chmod +x /home/pw/mysql/keepalived_check_mysql.sh
keepalived_check_mysql.sh:
#!/bin/bash
#监测mysql状态,主mysql不可用时才触发切换。监测当前主从复制状态,如果复制失败输出标志到本地,供keepalived_to_master脚本使用
VIP=10.130.10.43
MYSQL=/home/ddb/mysql/xtradb-5.1.47_netease/bin/mysql
MYSQL_HOST=127.0.0.1
MYSQL_PORT=5331
MYSQL_USER=root
MYSQL_PASSWORD=123
PDIR=(dirname0)
Seconds_Behind=30 #复制延迟时间报警阀值
Email_interval=30 #发送报警的时间间隔
echo>/tmp/mysql_status.txt
MYSQL−hMYSQL_HOST -PMYSQLPORT−uMYSQL_USER -p${MYSQL_PASSWORD} -e "show slave status\G" >/tmp/mysql_status.txt 2>&1
MYSQL_STATUS=$?
#master:0 backup:1
ip a|grep $VIP
IS_MASTER=$?
#判断mysql实例是否可用,不可用时释放VIP,确定VIP被别人抢占后再重新开启keepalived
if [ IS_MASTER
-eq 0 ] && [MYSQL_STATUS -eq 1 ] ;then
sudo /etc/init.d/keepalived stop
while ip a|grep $VIP ; do
sleep 1
done
echo "1">>$PDIR/keepalived.log
if ping $VIP -w 30 >/dev/null ;then
sudo /etc/init.d/keepalived start
fi
echo "2">>$PDIR/keepalived.log
msg="[`date +\"%Y-%m-%d %T \"`] MySql on `hostname -i`:$MYSQL_PORT HALT ! "
echo msg>>PDIR/keepalived.log
cd PDIR && python sendmail.py
"msg"
cd PDIR && python message.py
"msg"
exit 0
#判断复制是否可用,供keepalived_to_master脚本使用
isOK=1
while read line ; do
if echo $line|grep -i 'Seconds_Behind_Master' ;then
status=(echoline|cut -d':' -f2)
if ! echo status|grep
"NULL" && [status -gt $Seconds_Behind ] ; then
isOK=0
fi
fi
if echo $line|grep -i 'Slave_SQL_Running' ;then
status=(echoline|cut -d':' -f2)
if echo $status|grep "No"; then
isOK=0
fi
fi
if echo $line|grep -i 'Master_Host' ;then
Master_Host=(echoline|cut -d':' -f2)
fi
if echo $line|grep -i 'Master_Port' ;then
Master_Port=(echoline|cut -d':' -f2)
fi
if echo $line|grep -i 'Empty set' ;then
isOK=0
fi
done</tmp/mysql_status.txt
if [ $isOK -eq 0 ] ; then
if [ -f PDIR/repl ] && grep "fail"PDIR/repl ; then
lasttime=(ls−lPDIR/repl --time-|awk '{print $6}')
curtime=$(date "+%s")
cur_interval=`expr curtime−lasttime`
echo curinterval>>PDIR/keepalived.log
if [ curinterval−ltEmail_interval ] ;then
exit 0
fi
fi
echo "fail">$PDIR/repl
msg="[`date +\"%Y-%m-%d %T \"`] MySql on `hostname -i`:MYSQLPORT==>MasterHost:Master_Port replication failed"
echo msg>>PDIR/keepalived.log
cd PDIR && python sendmail.py
"msg"
else
相关阅读:MySQL高可用方案设计与测试(下篇)
本文来自网易实践者社区,经作者潘威授权发布。