解决rancher-v2.6.3报helm-operator更新rancher-webhook异常问题

问题描述

namespace Cattle-system中 频繁创建podhelm-operation-xxxxx, 并且状态为Error

排查解决

查看helm-operator相关pod

kubectl get pod -n cattle-system

NAME                               READY   STATUS      RESTARTS       AGE
helm-operation-2mchw               1/2     Error       0              32m
helm-operation-54bq4               1/2     Error       0              22m
helm-operation-5zv78               1/2     Error       0              32m
helm-operation-6tbkg               1/2     Error       0              32m
helm-operation-7z6jn               1/2     Error       0              12m
helm-operation-9tps9               1/2     Error       0              37m

查看日志

kubectl logs helm-operation-9tps9 -n cattle-system -c helm

# 显示如下
helm upgrade --history-max=5 --install=true --namespace=cattle-system --reset-values=true --timeout=5m0s --values=/home/shell/helm/values-rancher-webhook-1.0.2-up0.2.2.yaml --version=1.0.2+up0.2.2 --wait=true rancher-webhook /home/shell/helm/rancher-webhook-1.0.2-up0.2.2.tgz
Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

根据日志得知 helm 更新rancher-webhook 时, 有已经正在 安装/更新/回滚 的 进程

尝试查看helm 列表,没有看到已经部署的app

helm ls -n cattle-system

NAME           	NAMESPACE    	REVISION	UPDATED                               	STATUS         	CHART  

不显示pending-install 状态服务

helm 增加 -a 参数后显示

helm ls -a -n cattle-system

NAME           	NAMESPACE    	REVISION	UPDATED                               	STATUS         	CHART                        	APP VERSION
rancher-webhook	cattle-system	1       	2021-12-31 04:07:38.73169718 +0000 UTC	pending-install	rancher-webhook-1.0.2+up0.2.2	0.2.2

卸载部署异常的helm app rancher-webhook

helm uninstall rancher-webhook -n cattle-system

等待helm-operation-xxxxx 自动再次触发,并查看

[root@master1 rancher]# kubectl get pod -n cattle-system
NAME                           READY   STATUS              RESTARTS   AGE
helm-operation-ghlwj           1/2     Error               0          5m
helm-operation-h5rfs           1/2     Error               0          44m
helm-operation-hv4ht           1/2     Error               0          39m
helm-operation-jch7c           0/2     ContainerCreating   0          0s

查看日志为正常

kubectl logs helm-operation-jch7c -n cattle-system -c helm

# 显示如下
helm upgrade --history-max=5 --install=true --namespace=cattle-system --reset-values=true --timeout=5m0s --values=/home/shell/helm/values-rancher-webhook-1.0.2-up0.2.2.yaml --version=1.0.2+up0.2.2 --wait=true rancher-webhook /home/shell/helm/rancher-webhook-1.0.2-up0.2.2.tgz
Release "rancher-webhook" does not exist. Installing it now.
creating 6 resource(s)
beginning wait for 6 resources with timeout of 5m0s
Deployment is not ready: cattle-system/rancher-webhook. 0 out of 1 expected pods are ready
Deployment is not ready: cattle-system/rancher-webhook. 0 out of 1 expected pods are ready
Deployment is not ready: cattle-system/rancher-webhook. 0 out of 1 expected pods are ready
Deployment is not ready: cattle-system/rancher-webhook. 0 out of 1 expected pods are ready
Deployment is not ready: cattle-system/rancher-webhook. 0 out of 1 expected pods are ready
NAME: rancher-webhook
LAST DEPLOYED: Thu Oct 13 10:11:09 2022
NAMESPACE: cattle-system
STATUS: deployed
REVISION: 1
TEST SUITE: None

---------------------------------------------------------------------
SUCCESS: helm upgrade --history-max=5 --install=true --namespace=cattle-system --reset-values=true --timeout=5m0s --values=/home/shell/helm/values-rancher-webhook-1.0.2-up0.2.2.yaml --version=1.0.2+up0.2.2 --wait=true rancher-webhook /home/shell/helm/rancher-webhook-1.0.2-up0.2.2.tgz
---------------------------------------------------------------------

查看pod 状态为正常

kubectl get pod helm-operation-jch7c -n cattle-system

NAME                   READY   STATUS      RESTARTS   AGE
helm-operation-jch7c   0/2     Completed   0          2m32s

参考

https://github.com/fluxcd/helm-controller/issues/149

https://forums.rancher.com/t/rancher-in-docker-helm-operation-error/37787

Last updated