Upgrading docker-monitor to Fix Processes Being Killed by Mistake


Overview


This case walks through the docker-monitor upgrade procedure and the points to watch for, and briefly explains what the upgrade is for.
Environment for this case: TDH 5.2.2, docker-monitor 1.0.2

Details


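docker-monitor addresses a problem in which Docker anomalies cause healthy services to be killed by mistake. The usual symptom is that the Manager or Agent process disappears for no apparent reason while its pid file is still present. Another symptom is that individual nodes frequently report a port as already in use, which prevents the service from starting normally: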

$ /etc/init.d/transwarp-manager status
transwarp-manager is Not running but pid file found
Caused by:java.net.BindException:Address already in use
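
The "Not running but pid file found" symptom can be confirmed by checking whether the pid recorded in the pid file still belongs to a live process. The following is a minimal sketch, not part of the product: the function name pid_file_state is hypothetical, and the pid file path must be supplied by the caller because this article does not state where transwarp-manager keeps it.

```shell
#!/bin/sh
# Hypothetical helper: classify a pid file as no-pid-file, invalid, running, or stale.
# The function body runs in a subshell, so its variables do not leak to the caller.
pid_file_state() (
    pid_file="$1"
    if [ ! -f "$pid_file" ]; then
        echo "no-pid-file"
    else
        pid="$(tr -d '[:space:]' < "$pid_file")"
        case "$pid" in
            ''|*[!0-9]*)
                # Empty or non-numeric content.
                echo "invalid"
                ;;
            *)
                # kill -0 sends no signal; it only tests whether the process can be
                # signalled. Run as root, otherwise a live process owned by another
                # user is also reported as stale.
                if kill -0 "$pid" 2>/dev/null; then
                    echo "running"
                else
                    echo "stale"
                fi
                ;;
        esac
    fi
)

# Example call (the path is a placeholder, not a documented location):
#   pid_file_state /path/to/transwarp-manager.pid
```

A result of "stale" matches the state shown above: the pid file exists but the process it names is gone.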

Procedure

  1. Place the docker-monitor rpm package in the repo
  2. Update the repo index and clear the yum cache
  3. Install docker-monitor and verify that it works

Place the docker-monitor rpm package in the repo

For versions earlier than 5.2, copy the docker-monitor rpm package to the following directory on the Manager node:

$ pwd
/var/lib/transwarp-manager/master/pub/transwarp/hadoop_related/common/docker

For 5.2 and later versions, the path changes as follows:
CentOS:

$ pwd
/var/lib/transwarp-manager/master/pub/native/RHEL7/transwarp/hadoop_related/common/docker/

SUSE:

$ pwd
/var/lib/transwarp-manager/master/pub/native/SLES12/transwarp/hadoop_related/common/docker/
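
The three target directories above can be selected by a small helper so that the copy step does not depend on typing the path by hand. This is only a sketch: the function name docker_repo_dir and the platform keywords pre-5.2, centos, and suse are hypothetical labels introduced here, while the paths are the ones listed in this article.

```shell
#!/bin/sh
# Hypothetical helper: print the repo directory for the docker-monitor rpm.
# Keywords (pre-5.2, centos, suse) are illustrative labels, not product terms.
docker_repo_dir() (
    base=/var/lib/transwarp-manager/master/pub
    case "$1" in
        pre-5.2) echo "$base/transwarp/hadoop_related/common/docker" ;;
        centos)  echo "$base/native/RHEL7/transwarp/hadoop_related/common/docker" ;;
        suse)    echo "$base/native/SLES12/transwarp/hadoop_related/common/docker" ;;
        *)       echo "unknown platform: $1" >&2; exit 1 ;;
    esac
)

# Example (the rpm file name is a placeholder; use the file you were given):
#   cp ./docker-monitor-package.rpm "$(docker_repo_dir centos)/"
```

An unrecognized keyword makes the function fail with a message on stderr, so a typo does not silently copy the package to the wrong place.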

Update the repo index and clear the yum cache

$ cd /var/lib/transwarp-manager/master/pub/transwarp
# Do not forget the trailing . after createrepo
$ createrepo .
Spawning worker 0 with 4 pkgs
Spawning worker 1 with 4 pkgs
Spawning worker 2 with 4 pkgs
Spawning worker 3 with 4 pkgs
Spawning worker 4 with 4 pkgs
Spawning worker 5 with 4 pkgs
Spawning worker 6 with 4 pkgs
Spawning worker 7 with 4 pkgs
Workers Finished
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete

Clear the yum cache
CentOS:

$ yum clean all
Loaded plugins: fastestmirror
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Cleaning repos: os transwarp
Cleaning up everything
Cleaning up list of fastest mirrors

SUSE:

$ zypper clean -a
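
When the same procedure is scripted for both CentOS and SUSE nodes, the cache-cleaning command can be chosen according to which package manager is installed. A minimal sketch, assuming that the presence of yum or zypper in PATH is enough to tell the two platforms apart; the function name clean_pkg_cache is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run the cache-clean command shown above for whichever
# package manager is found in PATH (yum is checked first, then zypper).
clean_pkg_cache() (
    if command -v yum >/dev/null 2>&1; then
        yum clean all
    elif command -v zypper >/dev/null 2>&1; then
        zypper clean -a
    else
        echo "neither yum nor zypper found in PATH" >&2
        exit 1
    fi
)

# Example: call it after the repo index has been rebuilt with createrepo.
#   clean_pkg_cache
```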

Install docker-monitor and verify that it works

$ yum remove docker-monitor-tos -y
$ yum -y install docker-monitor-tos
Loaded plugins: fastestmirror
os                                                                               | 2.9 kB  00:00:00     
transwarp                                                                        | 2.9 kB  00:00:00     
(1/2): transwarp/primary_db                                                      |  27 kB  00:00:00     
(2/2): os/primary_db                                                             | 2.8 MB  00:00:00     
Determining fastest mirrors
Package docker-monitor-tos-1.0-1.el7.centos.x86_64 already installed and latest version
Nothing to do

Enable docker-monitor, then start it

$ systemctl enable docker-monitor
$ systemctl start docker-monitor

Verify the docker-monitor version and functionality

$ md5sum /usr/sbin/docker-monitor
2c656f64f82a00c9535fa65bf73e0c1e  /usr/sbin/docker-monitor
$ docker-monitor version
docker_monitor version 1.0.2 (amd64)
$ journalctl -f -u docker-monitor
-- Logs begin at Wed 2019-10-09 09:39:02 CST. --
Oct 10 15:40:52 tdh-01 systemd[1]: Started Docker Monitor Service.
Oct 10 15:40:52 tdh-01 systemd[1]: Starting Docker Monitor Service...
Oct 10 15:40:52 tdh-01 docker-monitor[4005]: I1010 15:40:52.279824    4005 docker_client.go:241] Connecting to docker on unix:///var/run/docker.sock
Oct 10 15:40:52 tdh-01 docker-monitor[4005]: I1010 15:40:52.279918    4005 docker_client.go:41] Start docker client with request timeout=1m59s
Oct 10 15:40:52 tdh-01 docker-monitor[4005]: I1010 15:40:52.281213    4005 metrics.go:106] Starting metrics server: :9323
Oct 10 15:40:52 tdh-01 docker-monitor[4005]: I1010 15:40:52.281759    4005 monitor.go:187] [docker version] func work normally, docker version: 1.13.1
Oct 10 15:40:53 tdh-01 docker-monitor[4005]: I1010 15:40:53.375403    4005 containerd_client.go:53] containerd currently has 67 running containers
Oct 10 15:40:53 tdh-01 docker-monitor[4005]: I1010 15:40:53.375446    4005 monitor.go:224] containerd component is healthy.
Oct 10 15:40:53 tdh-01 docker-monitor[4005]: I1010 15:40:53.396667    4005 monitor.go:238] [containers status] total: 112, running: 67, paused: 0, stopped: 45
Oct 10 15:40:53 tdh-01 docker-monitor[4005]: I1010 15:40:53.400259    4005 monitor.go:252] [docker exec] func work normally!
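
The version check above can be automated by parsing the line printed by docker-monitor version. The sketch below assumes the output keeps the format shown in this article (docker_monitor version 1.0.2 (amd64)); the function names parse_monitor_version and version_matches are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the version field from a line such as
#   docker_monitor version 1.0.2 (amd64)
# Assumes the third whitespace-separated field is the version number.
parse_monitor_version() (
    printf '%s\n' "$1" | awk '$2 == "version" { print $3; exit }'
)

# Hypothetical helper: succeed only when the reported version equals the expected one.
version_matches() (
    [ "$(parse_monitor_version "$1")" = "$2" ]
)

# Example: check the installed binary and the service state together.
#   version_matches "$(docker-monitor version)" 1.0.2 \
#       && systemctl is-active --quiet docker-monitor \
#       && echo "docker-monitor 1.0.2 is installed and running"
```

Comparing the parsed version rather than the whole line keeps the check independent of the architecture suffix.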

To obtain the download link for the docker-monitor rpm package, contact a Transwarp after-sales engineer.
