Overview
The kubelet log may contain an error like the following:
summary_sys_containers.go:47] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Later, the messages logged by kubelet may change to: CPUAccounting not enabled for pid
7月 31 22:23:43 vqa41 kubelet[13757]: W0731 22:23:43.530498 13757 container_manager_linux.go:842] CPUAccounting not enabled for pid: 13757
7月 31 22:28:43 vqa41 kubelet[13757]: W0731 22:28:43.531162 13757 container_manager_linux.go:842] CPUAccounting not enabled for pid: 13757
7月 31 22:33:43 vqa41 kubelet[13757]: W0731 22:33:43.532605 13757 container_manager_linux.go:842] CPUAccounting not enabled for pid: 13757
7月 31 22:38:43 vqa41 kubelet[13757]: W0731 22:38:43.533253 13757 container_manager_linux.go:842] CPUAccounting not enabled for pid: 13757
7月 31 22:43:43 vqa41 kubelet[13757]: W0731 22:43:43.536079 13757 container_manager_linux.go:842] CPUAccounting not enabled for pid: 13757
7月 31 22:48:43 vqa41 kubelet[13757]: W0731 22:48:43.536546 13757 container_manager_linux.go:842] CPUAccounting not enabled for pid: 13757
7月 31 22:53:43 vqa41 kubelet[13757]: W0731 22:53:43.537916 13757 container_manager_linux.go:842] CPUAccounting not enabled for pid: 13757
7月 31 22:58:43 vqa41 kubelet[13757]: W0731 22:58:43.538449 13757 container_manager_linux.go:842] CPUAccounting not enabled for pid: 13757
7月 31 23:03:43 vqa41 kubelet[13757]: W0731 23:03:43.539382 13757 container_manager_linux.go:842] CPUAccounting not enabled for pid: 13757
7月 31 23:08:43 vqa41 kubelet[13757]: W0731 23:08:43.544078 13757 container_manager_linux.go:842] CPUAccounting not enabled for pid: 13757
as well as: MemoryAccounting not enabled for pid
7月 31 22:53:43 vqa41 kubelet[13757]: W0731 22:53:43.537925 13757 container_manager_linux.go:845] MemoryAccounting not enabled for pid: 13757
7月 31 22:58:43 vqa41 kubelet[13757]: W0731 22:58:43.538459 13757 container_manager_linux.go:845] MemoryAccounting not enabled for pid: 13757
7月 31 23:03:43 vqa41 kubelet[13757]: W0731 23:03:43.539393 13757 container_manager_linux.go:845] MemoryAccounting not enabled for pid: 13757
7月 31 23:08:43 vqa41 kubelet[13757]: W0731 23:08:43.544087 13757 container_manager_linux.go:845] MemoryAccounting not enabled for pid: 13757
7月 31 23:13:43 vqa41 kubelet[13757]: W0731 23:13:43.544813 13757 container_manager_linux.go:845] MemoryAccounting not enabled for pid: 13757
7月 31 23:18:43 vqa41 kubelet[13757]: W0731 23:18:43.546129 13757 container_manager_linux.go:845] MemoryAccounting not enabled for pid: 13757
7月 31 23:23:43 vqa41 kubelet[13757]: W0731 23:23:43.546597 13757 container_manager_linux.go:845] MemoryAccounting not enabled for pid: 13757
7月 31 23:28:43 vqa41 kubelet[13757]: W0731 23:28:43.550039 13757 container_manager_linux.go:845] MemoryAccounting not enabled for pid: 13757
7月 31 23:33:43 vqa41 kubelet[13757]: W0731 23:33:43.552043 13757 container_manager_linux.go:845] MemoryAccounting not enabled for pid: 13757
Analysis and Solution
Problem Analysis
See the following related issues and pull request on GitHub:
https://github.com/kubernetes/kubernetes/issues/56850
https://github.com/kubermatic/machine-controller/pull/476
https://github.com/kubernetes/kubernetes/issues/56850#issuecomment-406241077
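According to the issues above, this problem occurs only on CentOS. The cause is that kubelet collects node resource statistics at startup, which requires the corresponding accounting options to be enabled for the unit in systemd:
CPUAccounting: whether CPU usage accounting is enabled for this unit. Boolean; set to true or false.
MemoryAccounting: whether memory usage accounting is enabled for this unit. Boolean; set to true or false.
If these two options are not set, kubelet cannot collect these statistics and keeps logging the messages shown above.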
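To confirm whether the two options are currently enabled for the kubelet unit, the output of systemctl show can be inspected. The following is a minimal sketch, not part of the original procedure; missing_accounting is a hypothetical helper name, and the unit name kubelet.service is taken from the error message above.

```shell
#!/bin/sh
# missing_accounting (hypothetical helper): reads KEY=VALUE lines in the
# format printed by `systemctl show` on stdin and prints the name of each
# accounting option whose value is not "yes".
missing_accounting() {
    awk -F= '
        $1 == "CPUAccounting" || $1 == "MemoryAccounting" {
            if ($2 != "yes") print $1
        }
    '
}

# Usage on a node running systemd:
#   systemctl show kubelet.service -p CPUAccounting -p MemoryAccounting | missing_accounting
# No output means both options are already enabled.
```

The helper reads from stdin rather than calling systemctl itself, so it can also be run against saved output from another node.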
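Solution
The fix is straightforward: edit the kubelet service unit file in systemd and add the CPU and memory accounting options, as described in the steps below.
1. Edit the unit file and add the options
Edit /usr/lib/systemd/system/kubelet.service and add the following settings (they are not present by default):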
CPUAccounting=true
MemoryAccounting=true
Resulting unit file:
[Unit]
Description=Kubernetes Kubelet
After=docker.service docker.socket
Wants=docker.socket
[Service]
CPUAccounting=true
MemoryAccounting=true
ExecStart=/opt/kubernetes/bin/kubelet \
--logtostderr=false \
--v=2 \
--hostname-override=vqa45 \
--log-dir=/var/log/kubernetes/kubelet \
--log-file=/var/log/kubernetes/kubelet/kubelet.log \
--node-labels=master=true,worker=true \
--node-ip=172.22.7.45 \
--pod-infra-container-image=transwarp/pause:tos-2.1.2 \
--network-plugin=cni \
--eviction-hard= \
--bootstrap-kubeconfig=/srv/kubernetes/bootstrap.kubeconfig \
--feature-gates=SupportPodPidsLimit=false,SupportNodePidsLimit=false \
--kubeconfig=/srv/kubernetes/kubeconfig \
--config=/opt/kubernetes/kubelet-config.yaml \
--tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
KillMode=process
Restart=always
RestartSec=15
[Install]
WantedBy=multi-user.target
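As a variation on step 1, the same two settings can be placed in a systemd drop-in file instead of the unit file itself, so that they are kept if /usr/lib/systemd/system/kubelet.service is later replaced. This is a sketch under assumptions: write_accounting_dropin and the file name 10-accounting.conf are hypothetical names chosen for illustration, and the drop-in directory follows the standard systemd convention of <unit name>.d.

```shell
#!/bin/sh
# write_accounting_dropin (hypothetical helper): creates the given drop-in
# directory if needed and writes a drop-in file that enables CPU and memory
# accounting for the unit the directory belongs to.
# The file name 10-accounting.conf is an arbitrary choice for this sketch.
write_accounting_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/10-accounting.conf" <<'EOF'
[Service]
CPUAccounting=true
MemoryAccounting=true
EOF
}

# Usage (as root):
#   write_accounting_dropin /etc/systemd/system/kubelet.service.d
```

Whichever of the two approaches is used, systemd still has to reload its configuration and kubelet has to be restarted before the settings take effect.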
2. Restart the kubelet service
Reload the systemd configuration and restart kubelet so that the new settings take effect.
# systemctl daemon-reload && systemctl restart kubelet
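3. Check the kubelet logs
After restarting kubelet, wait a while and check the kubelet logs again. The warnings shown in the overview were logged every five minutes, so wait at least that long before concluding that they are gone.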
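One way to check the logs is to count the matching messages over a recent time window. The sketch below assumes the kubelet output is available through journalctl, as in the excerpts in the overview; count_accounting_warnings is a hypothetical helper name.

```shell
#!/bin/sh
# count_accounting_warnings (hypothetical helper): reads log lines on stdin
# and prints how many of them contain one of the messages described in the
# overview. Prints 0 when none are found.
count_accounting_warnings() {
    awk '
        /Accounting not enabled for pid|Failed to get system container stats/ { n++ }
        END { print n + 0 }
    '
}

# Usage on the node, some minutes after the restart:
#   journalctl -u kubelet --since "10 minutes ago" | count_accounting_warnings
```

A result of 0 over a window longer than five minutes suggests the change has taken effect. The same function can be fed a kubelet log file instead of journalctl output.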
Additional Information
manager 933 fixes this issue.