keepalived_to_master.sh
升级为master时的检测脚本(主要检查备机复制状态是否可用,并发送报警)。
#!/bin/bash
#检查从mysql复制状态,确认是否可升级为主
VIP=10.130.10.43
MYSQL=/home/ddb/mysql/xtradb-5.1.47_netease/bin/mysql
MYSQL_HOST=127.0.0.1
MYSQL_PORT=5331
MYSQL_USER=root
MYSQL_PASSWORD=123
Seconds_Behind=5
PDIR=(dirname0)
isOK=1
if [ -f PDIR/repl ] && grep
"fail"PDIR/repl ; then
isOK=0
fi
msg="[`date +\"%Y-%m-%d %T \"`] change master to `hostname -i`:$MYSQL_PORT "
if [ $isOK -eq 1 ] ; then
MYSQL−hMYSQL_HOST -PMYSQLPORT−uMYSQL_USER -pMYSQLPASSWORD−e"SETGLOBALreadonly=OFF;">>PDIR/keepalived.log 2>&1
msg="$msg succeed"
else
sudo /etc/init.d/keepalived stop >>$PDIR/keepalived.log 2>&1
msg="$msg failed"
fi
echo msg>>PDIR/keepalived.log
cd PDIR && python sendmail.py
"msg"
keepalived_to_backup.sh
降级为backup时的检测脚本(设置备机mysql为readonly)。
#!/bin/bash
#备机状态打开read_only
MYSQL=/home/ddb/mysql/xtradb-5.1.47_netease/bin/mysql
MYSQL_HOST=127.0.0.1
MYSQL_PORT=5331
MYSQL_USER=root
MYSQL_PASSWORD=123
PDIR=(dirname0)
MYSQL−hMYSQL_HOST -PMYSQLPORT−uMYSQL_USER -pMYSQLPASSWORD−e"SETGLOBALreadonly=ON;">>PDIR/keepalived.log
在主备两台服务器上执行service keepalived start,启动keepalived。即可实现对mysql的监测与切换。
测试项a-c虚拟机环境,测试项d-f物理机环境
a)按上述配置启动keepalived后,先测试vip是否绑定成功。
node-1网卡配置
node-2网卡配置
如图所示,vip(192.168.194.133)与node-1绑定成功。
b)测试通过vip访问mysql
mysql -h 192.168.194.133 -P 5336 -utest -p123 -e "show variables like 'server_id';"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 20 |
+---------------+-------+
可以看出通过vip访问mysql成功,且访问的是node-1(node-1 server_id=20,node-2 server_id=221 )
c)模拟网络不可用后的切换
1)关闭node-1网卡
192.168.194.130[root@pw-desktop mysql]# ifconfig eth0 down
2)检查vip是否已经切换到node-2
3)测试mysql是否可用
mysql -h 192.168.194.133 -P 5336 -utest -p123 -e "show variables like '%server_id%'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 221 |
+---------------+-------+
可以看出服务仍可用,切换到了node-2。
4)恢复node-1,测试当前主从。
192.168.194.130[root@pw-desktop mysql]# ifconfig eth0 up
mysql -h 192.168.194.133 -P 5336 -utest -p123 -e "show variables like '%server_id%'"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 221 |
+---------------+-------+
可以看到node-1的网卡恢复后,node-1并未升级为主,从而防止因网络问题导致频繁切换。
d) 模拟mysql服务不可用后的切换
物理机环境
1) kill掉node-1的mysql
mysql -h 10.130.10.43 -P 5331 -uroot -p123 -e "show variables like '%server_id%'";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 1 |
+---------------+-------+
ddb@buzz137:~/keep_check$ ps aux|grep mysqld
ddb 10236 0.0 0.0 400672 44556 pts/0 Sl 10:41 0:00 /home/ddb/mysql/xtradb-5.1.47_netease/libexec/mysqld --defaults-file=/home/ddb/mysql/my5331.cnf
ddb 30531 0.0 0.0 10192 880 pts/0 S+ 11:39 0:00 grep mysqld
ddb@buzz137:~/keep_check$ kill -9 10236
2)测试服务
mysql -h 10.130.10.43 -P 5331 -uroot -p123 -e "show variables like '%server_id%'";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 2 |
+---------------+-------+
服务仍可用,自动切换到node-2
并接受报警邮件:
3)恢复node-1的mysql服务
ddb@buzz137:~/keep_check$ /home/ddb/mysql/xtradb-5.1.47_netease/libexec/mysqld --defaults-file=/home/ddb/mysql/my5331.cnf &
[1] 31571
ddb@buzz137:~/keep_check$ 130106 11:43:41 [Warning] '--default-character-set' is deprecated and will be removed in a future release. Please use '--character-set-server' instead.
ddb@buzz137:~/keep_check$ mysql -h 10.130.10.43 -P 5331 -uroot -p123 -e "show variables like '%server_id%'";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 2 |
+---------------+-------+
可用看到,服务仍在node-2上,从而防止因mysq不稳定导致频繁切换。
e)切换对客户端的影响
使用客户机通过vip连接mysql,中途kill掉绑定vip的node-1的mysqld,可以看到mysql会出现重连,重连后又可正常使用。
mysql> select * from TB_server_ip limit 5;
+----------+----------------+--------------------+
| serverid | ip | hostname |
+----------+----------------+--------------------+
| 453 | 115.236.113.63 | fs-3.photo.163.org |
| 453 | 172.17.2.63 | fs-3.photo.163.org |
| 454 | 115.236.113.64 | fs-4.photo.163.org |
| 454 | 172.17.2.64 | fs-4.photo.163.org |
+----------+----------------+--------------------+
4 rows in set (0.00 sec)
mysql> select * from TB_server_ip limit 5;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2003 (HY000): Can't connect to MySQL server on '10.130.10.43' (110)
ERROR:
Can't connect to the server
mysql> select * from TB_server_ip limit 5;
No connection. Trying to reconnect...
Connection id: 47
Current database: DMP
+----------+----------------+--------------------+
| serverid | ip | hostname |
+----------+----------------+--------------------+
| 453 | 115.236.113.63 | fs-3.photo.163.org |
| 453 | 172.17.2.63 | fs-3.photo.163.org |
| 454 | 115.236.113.64 | fs-4.photo.163.org |
| 454 | 172.17.2.64 | fs-4.photo.163.org |
+----------+----------------+--------------------+
4 rows in set (0.00 sec)
f)复制延迟时不执行切换
1人为制造复制出错环境
buzz138:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.130.10.137
Master_User: repl
Master_Port: 5331
Connect_Retry: 60
Master_Log_File: mysql-bin.000025
Read_Master_Log_Pos: 907
Relay_Log_File: mysqld-relay-bin.000003
Relay_Log_Pos: 251
Relay_Master_Log_File: mysql-bin.000025
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
…….
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Error 'Duplicate entry '2-2' for key 'PRIMARY'' on query. Default database: 'test_dbmang'. Query: 'insert into test_dbmang.dbrelation(source_dbid,target_dbid,relation_type,isuse) values(2,2,1,1)'
此时buzz138的check_run脚本会输出复制错误消息到本地
ddb@buzz138:~/keep_check$ cat repl
fail
并发邮件报警:
2关闭主节点buzz137的mysql
ddb@buzz137:~/keep_check/conf$ ps axu|grep mysqld
ddb 7365 0.0 0.0 10192 876 pts/0 S+ 10:06 0:00 grep mysqld
ddb 30235 0.0 0.0 449724 44784 ? Sl Jan05 0:03 /home/ddb/mysql/xtradb-5.1.47_netease/libexec/mysqld --defaults-file=/home/ddb/mysql/my5331.cnf
ddb@buzz137:~/keep_check/conf$ kill -9 30235
buzz137的check_run脚本检测到本机mysql不可用,释放VIP:
ddb@buzz137:~/keep_check/conf$ ip a
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether d4:ae:52:66:22:20 brd ff:ff:ff:ff:ff:ff
inet 10.130.10.137/23 brd 10.130.11.255 scope global eth1
inet6 fe80::d6ae:52ff:fe66:2220/64 scope link
valid_lft forever preferred_lft forever
并发邮件报警:
3观察复制出错的buzz138能否正常切换
buzz138的keepalived_to_master脚本检测到复制出错后,保持备机mysql read_only,关闭本机keepalived,不抢占VIP。与预期相同
ddb@buzz138:~/keep_check$ ps axu|grep keep
ddb 25026 0.0 0.0 10192 880 pts/1 S+ 10:30 0:00 grep keep
ddb@buzz138:~/keep_check$ ip a
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether d4:ae:52:66:81:cd brd ff:ff:ff:ff:ff:ff
inet 10.130.10.138/23 brd 10.130.11.255 scope global eth1
并发邮件报警:
此时buzz137的服务也是不可用的,整个服务不可用。
方案设计与测试在虚拟机上完成,使用keepalived实现mysql高可用,特点是简单实用,切换利用虚拟ip重绑定实现,过程对应用层透明。实现应用层的服务状态判定与切换需要通过编写脚本实现。缺点是keepalived对上层开放接口不多(如无法读取当前节点状态、权值),给实现复杂切换业务带来难度。
相关阅读:MySQL高可用方案设计与测试(上篇)
本文来自网易实践者社区,经作者潘威授权发布。