Summary
The YARN service fails to start, and the log of the role that failed to start shows the following errors:
java.io.IOException: Packet len9470260 is out of range
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore-yarn1/ZKRMStateRoot/application_XXXX
Details
Error explanation
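1. The errors above show that at startup YARN reads the znode of the corresponding application from ZooKeeper (this path belongs to the ResourceManager's ZooKeeper state store). The response packet was 9470260 bytes, larger than the client's maximum packet size, so the client dropped the connection (ConnectionLoss) and the service failed to start;
2. Connecting to ZooKeeper manually (with zkCli.sh) and running ls or rmr on this znode fails with the same error;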
Cause
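1. /rmstore-yarn1/ZKRMStateRoot/application_XXXX is a znode created when a job is submitted to YARN; once the job has finished, it can be deleted;
2. The job submitted to YARN was too large, so the znode created for it in ZooKeeper exceeded the default 1 MB limit (jute.maxbuffer). ZooKeeper keeps its data in memory to serve reads quickly, which is why znode data should be kept small;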
Solution
1. Enter the ZooKeeper pod:
[root@mll01 config]# kubectl get pod -owide | grep zookeeper1
zookeeper-server-zookeeper1-69644d7868-48m5n 1/1 Running 0 47d 172.22.33.1 mll01
zookeeper-server-zookeeper1-69644d7868-rn2x6 1/1 Running 0 61d 172.22.33.2 mll02
zookeeper-server-zookeeper1-69644d7868-vrqgt 1/1 Running 0 61d 172.22.33.3 mll03
[root@mll01 config]# kubectl exec -it zookeeper-server-zookeeper1-69644d7868-48m5n -- bash
[root@mll01 ~]#
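2. Edit the following two scripts to raise the jute.maxbuffer limit (the maximum data size of a single znode or request) in their Java startup commands: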
/usr/lib/zookeeper/bin/zkCli.sh
/usr/lib/zookeeper/bin/zkServer.sh
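In each file, append the parameter "-Djute.maxbuffer=10485760" to the startup command. This sets the limit to 10 MB (10 × 1024 × 1024 bytes), which is larger than the 9470260-byte packet in the error. The setting is needed on both sides: zkCli.sh is the client and zkServer.sh is the server.
3. Enter ZooKeeper again, delete the znodes of applications that have finished running, and restart the YARN service; it then starts normally.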
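If a different packet length shows up in the log, the limit can be derived from the error message instead of guessed. The sketch below is a hypothetical helper (suggest_maxbuffer is not part of ZooKeeper). It assumes the message format shown in the log above, and it rounds up to the next whole MiB, a convention chosen here for readability rather than a ZooKeeper rule. The only hard requirement is that the limit be larger than the reported packet length.

```shell
#!/bin/sh
# Hypothetical helper: suggest a jute.maxbuffer value from a
# "Packet len<N> is out of range" ZooKeeper client error line.
suggest_maxbuffer() {
  local line="$1" len
  # Extract the digits after "Packet len" (with or without a space)
  len=$(printf '%s\n' "$line" | sed -n 's/.*Packet len *\([0-9][0-9]*\).*/\1/p')
  if [ -z "$len" ]; then
    echo "no packet length found" >&2
    return 1
  fi
  local mib=$((1024 * 1024))
  # Next MiB boundary strictly above the packet length
  echo $(( (len / mib + 1) * mib ))
}
```

For the log line in this note, suggest_maxbuffer 'java.io.IOException: Packet len9470260 is out of range' prints 10485760, the same value used above.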
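The deletion in step 3 can also be run non-interactively, because zkCli.sh executes a single command when one is given after its options. The sketch below assumes that the ZooKeeper server is reachable at localhost:2181 from inside the pod. The delete_app_znode function, its path guard, and the DRY_RUN switch are hypothetical additions. Replace application_XXXX with the real application ID. Newer ZooKeeper releases replace rmr with deleteall.

```shell
#!/bin/sh
# Hypothetical helper: delete the YARN state znode of one finished application.
# Assumptions: zkCli.sh path from step 2, server at localhost:2181.
ZK_CLI="${ZK_CLI:-/usr/lib/zookeeper/bin/zkCli.sh}"
ZK_SERVER="${ZK_SERVER:-localhost:2181}"

delete_app_znode() {
  local znode="$1"
  # Only allow child paths under the state root used in this note
  case "$znode" in
    /rmstore-yarn1/ZKRMStateRoot/?*) ;;
    *) echo "refusing to delete: $znode" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of running it
    echo "$ZK_CLI -server $ZK_SERVER rmr $znode"
  else
    # Single-command mode; use deleteall instead of rmr on newer releases
    "$ZK_CLI" -server "$ZK_SERVER" rmr "$znode"
  fi
}
```

Running DRY_RUN=1 delete_app_znode /rmstore-yarn1/ZKRMStateRoot/application_XXXX first prints the command, so it can be checked before anything is deleted.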