MySQL高可用方案设计与测试(下篇)

阿凡达2018-07-13 09:31


Mysql切换检测脚本

keepalived_to_master.sh

升级为master时的检测脚本(主要检查备机复制状态是否可用,并发送报警)。

#!/bin/bash

#检查从mysql复制状态,确认是否可升级为主

VIP=10.130.10.43

MYSQL=/home/ddb/mysql/xtradb-5.1.47_netease/bin/mysql

MYSQL_HOST=127.0.0.1

MYSQL_PORT=5331

MYSQL_USER=root

MYSQL_PASSWORD=123

Seconds_Behind=5

PDIR=(dirname0)

isOK=1

if [ -f PDIR/repl ] && grep
"fail"
PDIR/repl ; then

        isOK=0

fi

msg="[`date +\"%Y-%m-%d %T \"`] change master to `hostname -i`:$MYSQL_PORT "

if [ $isOK -eq 1 ] ; then

        MYSQLhMYSQL_HOST -PMYSQLPORTuMYSQL_USER -pMYSQLPASSWORDe"SETGLOBALreadonly=OFF;">>PDIR/keepalived.log 2>&1

        msg="$msg succeed"

else

        sudo /etc/init.d/keepalived stop >>$PDIR/keepalived.log 2>&1

        msg="$msg failed"

fi

echo msg>>PDIR/keepalived.log

cd PDIR && python sendmail.py
"
msg"

keepalived_to_backup.sh

降级为backup时的检测脚本(设置备机mysqlreadonly)。

#!/bin/bash

#备机状态打开read_only

MYSQL=/home/ddb/mysql/xtradb-5.1.47_netease/bin/mysql

MYSQL_HOST=127.0.0.1

MYSQL_PORT=5331

MYSQL_USER=root

MYSQL_PASSWORD=123

PDIR=(dirname0)

MYSQLhMYSQL_HOST -PMYSQLPORTuMYSQL_USER -pMYSQLPASSWORDe"SETGLOBALreadonly=ON;">>PDIR/keepalived.log

启动keepalived

在主备两台服务器上执行service keepalived start,启动keepalived。即可实现对mysql的监测与切换。

五、方案测试

测试项a-c虚拟机环境,测试项d-f物理机环境

a)按上述配置启动keepalived后,先测试vip是否绑定成功。

node-1网卡配置

node-2网卡配置

如图所示,vip192.168.194.133)与node-1绑定成功。

b)测试通过vip访问mysql

mysql -h 192.168.194.133 -P 5336 -utest -p123 -e "show variables like 'server_id';"

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| server_id     | 20    |

+---------------+-------+

可以看出通过vip访问mysql成功,且访问的是node-1node-1 server_id=20,node-2 server_id=221

c)模拟网络不可用后的切换

1)关闭node-1网卡

192.168.194.130[root@pw-desktop mysql]# ifconfig eth0 down

2)检查vip是否已经切换到node-2

3)测试mysql是否可用

mysql -h 192.168.194.133 -P 5336 -utest -p123 -e "show variables like '%server_id%'"

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| server_id     | 221   |

+---------------+-------+

可以看出服务仍可用,切换到了node-2

4)恢复node-1,测试当前主从。

192.168.194.130[root@pw-desktop mysql]# ifconfig eth0 up

mysql -h 192.168.194.133 -P 5336 -utest -p123 -e "show variables like '%server_id%'"

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| server_id     | 221   |

+---------------+-------+

可以看到node-1的网卡恢复后,node-1并未升级为主,从而防止因网络问题导致频繁切换。

d) 模拟mysql服务不可用后的切换

物理机环境

 

1) killnode-1mysql

mysql -h 10.130.10.43 -P 5331 -uroot -p123 -e "show variables like '%server_id%'";

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| server_id     | 1     |

+---------------+-------+

ddb@buzz137:~/keep_check$ ps aux|grep mysqld

ddb      10236  0.0  0.0 400672 44556 pts/0    Sl   10:41   0:00 /home/ddb/mysql/xtradb-5.1.47_netease/libexec/mysqld --defaults-file=/home/ddb/mysql/my5331.cnf

ddb      30531  0.0  0.0  10192   880 pts/0    S+   11:39   0:00 grep mysqld

ddb@buzz137:~/keep_check$ kill -9 10236

2)测试服务

mysql -h 10.130.10.43 -P 5331 -uroot -p123 -e "show variables like '%server_id%'";

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| server_id     | 2     |

+---------------+-------+

服务仍可用,自动切换到node-2

并接受报警邮件:

3)恢复node-1mysql服务

ddb@buzz137:~/keep_check$ /home/ddb/mysql/xtradb-5.1.47_netease/libexec/mysqld --defaults-file=/home/ddb/mysql/my5331.cnf &

[1] 31571

ddb@buzz137:~/keep_check$ 130106 11:43:41 [Warning] '--default-character-set' is deprecated and will be removed in a future release. Please use '--character-set-server' instead.

ddb@buzz137:~/keep_check$ mysql -h 10.130.10.43 -P 5331 -uroot -p123 -e "show variables like '%server_id%'";

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| server_id     | 2     |

+---------------+-------+

可用看到,服务仍在node-2上,从而防止因mysq不稳定导致频繁切换。

e)切换对客户端的影响

使用客户机通过vip连接mysql,中途kill掉绑定vipnode-1mysqld,可以看到mysql会出现重连,重连后又可正常使用。

mysql> select * from TB_server_ip  limit 5;

+----------+----------------+--------------------+

| serverid | ip             | hostname           |

+----------+----------------+--------------------+

|      453 | 115.236.113.63 | fs-3.photo.163.org |

|      453 | 172.17.2.63    | fs-3.photo.163.org |

|      454 | 115.236.113.64 | fs-4.photo.163.org |

|      454 | 172.17.2.64    | fs-4.photo.163.org |

+----------+----------------+--------------------+

4 rows in set (0.00 sec)

mysql> select * from TB_server_ip  limit 5;

ERROR 2006 (HY000): MySQL server has gone away

No connection. Trying to reconnect...

ERROR 2003 (HY000): Can't connect to MySQL server on '10.130.10.43' (110)

ERROR:

Can't connect to the server

mysql> select * from TB_server_ip  limit 5;

No connection. Trying to reconnect...

Connection id:    47

Current database: DMP

+----------+----------------+--------------------+

| serverid | ip             | hostname           |

+----------+----------------+--------------------+

|      453 | 115.236.113.63 | fs-3.photo.163.org |

|      453 | 172.17.2.63    | fs-3.photo.163.org |

|      454 | 115.236.113.64 | fs-4.photo.163.org |

|      454 | 172.17.2.64    | fs-4.photo.163.org |

+----------+----------------+--------------------+

4 rows in set (0.00 sec)

f)复制延迟时不执行切换

1人为制造复制出错环境

buzz138

mysql> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 10.130.10.137

                  Master_User: repl

                  Master_Port: 5331

                Connect_Retry: 60

              Master_Log_File: mysql-bin.000025

          Read_Master_Log_Pos: 907

               Relay_Log_File: mysqld-relay-bin.000003

                Relay_Log_Pos: 251

        Relay_Master_Log_File: mysql-bin.000025

             Slave_IO_Running: Yes

            Slave_SQL_Running: No

              Replicate_Do_DB:

         …….

        Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 1062

               Last_SQL_Error: Error 'Duplicate entry '2-2' for key 'PRIMARY'' on query. Default database: 'test_dbmang'. Query: 'insert into test_dbmang.dbrelation(source_dbid,target_dbid,relation_type,isuse) values(2,2,1,1)'

此时buzz138check_run脚本会输出复制错误消息到本地

ddb@buzz138:~/keep_check$ cat repl

fail

并发邮件报警:

2关闭主节点buzz137mysql

ddb@buzz137:~/keep_check/conf$ ps axu|grep mysqld

ddb       7365  0.0  0.0  10192   876 pts/0    S+   10:06   0:00 grep mysqld

ddb      30235  0.0  0.0 449724 44784 ?        Sl   Jan05   0:03 /home/ddb/mysql/xtradb-5.1.47_netease/libexec/mysqld --defaults-file=/home/ddb/mysql/my5331.cnf

ddb@buzz137:~/keep_check/conf$ kill -9 30235

buzz137check_run脚本检测到本机mysql不可用,释放VIP

ddb@buzz137:~/keep_check/conf$ ip a

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000

    link/ether d4:ae:52:66:22:20 brd ff:ff:ff:ff:ff:ff

    inet 10.130.10.137/23 brd 10.130.11.255 scope global eth1

    inet6 fe80::d6ae:52ff:fe66:2220/64 scope link

       valid_lft forever preferred_lft forever

并发邮件报警:

3观察复制出错的buzz138能否正常切换

buzz138keepalived_to_master脚本检测到复制出错后,保持备机mysql read_only,关闭本机keepalived,不抢占VIP。与预期相同

ddb@buzz138:~/keep_check$ ps axu|grep keep

ddb      25026  0.0  0.0  10192   880 pts/1    S+   10:30   0:00 grep keep

ddb@buzz138:~/keep_check$ ip a

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000

    link/ether d4:ae:52:66:81:cd brd ff:ff:ff:ff:ff:ff

    inet 10.130.10.138/23 brd 10.130.11.255 scope global eth1

并发邮件报警:

此时buzz137的服务也是不可用的,整个服务不可用。

六、结论

方案设计与测试在虚拟机上完成,使用keepalived实现mysql高可用,特点是简单实用,切换利用虚拟ip重绑定实现,过程对应用层透明。实现应用层的服务状态判定与切换需要通过编写脚本实现。缺点是keepalived对上层开放接口不多(如无法读取当前节点状态、权值),给实现复杂切换业务带来难度。

相关阅读:MySQL高可用方案设计与测试(上篇)

本文来自网易实践者社区,经作者潘威授权发布。