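Summary
This article describes how to troubleshoot and fix abnormal relay-log files in Kundb.
Details
The problem typically appears after an unexpected event such as a cluster power outage, and leaves the kundb service unhealthy.
Confirm the kundb role status
Connect to the kundb role and run: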
select * from performance_schema.replication_group_members;
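As shown in the figure below, only one PRIMARY member is available; the remaining nodes are all in the RECOVERING state and drop out of the group after a while.
Check the logs
Check the log on each of the two problem nodes: /mnt/disk1/kundb11/kundbdata/error.log
Both contain the following error: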
[ERROR] [MY-013122] [Repl] Slave SQL for channel 'group_replication_applier': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, the server was unable to fetch a keyring key required to open an encrypted relay log file, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: MY-013122
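Key phrases: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted
If this error is present, the relay log files were corrupted when the power was lost, and the corrupted relay logs need to be cleared.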
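The log check above can be scripted. The following is a minimal sketch, not part of Kundb: the function name has_relay_log_read_failure is hypothetical, and it only looks for the error code and message text quoted above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the given error log
# contains the MY-013122 "Relay log read failure" message quoted above.
has_relay_log_read_failure() {
    log_file="$1"
    # First filter on the error code, then on the message text.
    grep -F 'MY-013122' "$log_file" | grep -F -q 'Relay log read failure'
}

# Example call, using the log path from this article:
# has_relay_log_read_failure /mnt/disk1/kundb11/kundbdata/error.log \
#     && echo "relay log corruption suspected"
```

A match only indicates that the error was logged at some point; the timestamps in the log still need to be compared with the time of the outage.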
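Cleanup procedure
On each problem node, move the relay-log directory aside and create a new, empty directory in its place.
Enter the kundb pod and locate the mysqld process: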
ps -ef|grep mysqld
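Take the data path from the process output (the path below is an example), then run the following inside the corresponding pod: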
cd /vdir/mnt/disk1/kundb11/kundbdata/
mv relay-logs relay-logs-bak
mkdir relay-logs
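If the procedure has to be repeated, note that a second `mv relay-logs relay-logs-bak` would move the directory inside the existing relay-logs-bak instead of renaming it. The sketch below is one way to avoid that, using a timestamped backup name. The function name rotate_relay_logs and the backup naming scheme are assumptions for illustration, and the function is assumed to run inside the pod as the same user as the manual steps above.

```shell
#!/bin/sh
# Hypothetical helper: move <data_dir>/relay-logs aside under a
# timestamped name and create an empty relay-logs directory.
# The body runs in a subshell so the caller's working directory is unchanged.
rotate_relay_logs() (
    data_dir="$1"
    backup="relay-logs-bak-$(date +%Y%m%d%H%M%S)"

    cd "$data_dir" || exit 1

    if [ ! -d relay-logs ]; then
        echo "relay-logs not found in $data_dir" >&2
        exit 1
    fi
    if [ -e "$backup" ]; then
        echo "$backup already exists, not overwriting" >&2
        exit 1
    fi

    mv relay-logs "$backup" && mkdir relay-logs
)

# Example call, using the in-pod path from this article:
# rotate_relay_logs /vdir/mnt/disk1/kundb11/kundbdata
```

The backup directory can be deleted once the node has rejoined the group and stayed online.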
Exit the pod, then restart the problem pod with kubectl delete pod xxxx.
After the restart, connect to kundb and verify again:
select * from performance_schema.replication_group_members;
Make sure all three members are in the ONLINE state.
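The final check can also be automated. This is a minimal sketch under stated assumptions: the function name all_members_online is hypothetical, it expects tab-separated "host, state" lines on standard input, and the mysql connection options (host, port, user, password) are omitted because they depend on the deployment.

```shell
#!/bin/sh
# Hypothetical helper: read "MEMBER_HOST<TAB>MEMBER_STATE" lines on stdin
# and succeed only if exactly <expected> members are listed and every
# one of them is ONLINE.
all_members_online() {
    expected="$1"
    awk -F '\t' -v expected="$expected" '
        NF >= 2 { total++; if ($2 == "ONLINE") online++ }
        END { exit !(total + 0 == expected + 0 && online + 0 == expected + 0) }
    '
}

# Example call (connection options omitted; -N drops the header row,
# -B prints tab-separated output):
# mysql -N -B -e "SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members" \
#     | all_members_online 3 && echo "all members ONLINE"
```

Because a recovering node may take some time to catch up, the check may need to be repeated before it succeeds.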