Fixing the helm-operation error when updating rancher-webhook on Rancher v2.6.3

Problem description

In the cattle-system namespace, helm-operation-xxxxx pods are created repeatedly, and each one ends up in the Error state.

Troubleshooting and fix

List the helm-operation pods

kubectl get pod -n cattle-system

NAME                               READY   STATUS      RESTARTS       AGE
helm-operation-2mchw               1/2     Error       0              32m
helm-operation-54bq4               1/2     Error       0              22m
helm-operation-5zv78               1/2     Error       0              32m
helm-operation-6tbkg               1/2     Error       0              32m
helm-operation-7z6jn               1/2     Error       0              12m
helm-operation-9tps9               1/2     Error       0              37m

Check the logs

kubectl logs helm-operation-9tps9 -n cattle-system -c helm

# Output:
helm upgrade --history-max=5 --install=true --namespace=cattle-system --reset-values=true --timeout=5m0s --values=/home/shell/helm/values-rancher-webhook-1.0.2-up0.2.2.yaml --version=1.0.2+up0.2.2 --wait=true rancher-webhook /home/shell/helm/rancher-webhook-1.0.2-up0.2.2.tgz
Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

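The log shows that when helm tried to upgrade rancher-webhook, another install, upgrade, or rollback operation on that release was already recorded as in progress.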
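To see where this "in progress" state is recorded, here is a hedged sketch. Helm 3 by default stores each release revision as a Secret in the release namespace, labelled with owner=helm, the release name, and its status. Assuming Rancher uses that default storage driver here, the function below lists the records for rancher-webhook along with their status. The helper name show_release_records is hypothetical and not part of Rancher or Helm.

```shell
# Hypothetical helper: list Helm's release records (Secrets) for one release.
# Assumes Helm's default Secret storage driver is in use.
# $1 = release name (default: rancher-webhook)
# $2 = namespace    (default: cattle-system)
show_release_records() {
  kubectl get secret --namespace "${2:-cattle-system}" \
    --selector "owner=helm,name=${1:-rancher-webhook}" \
    --label-columns status,version
}
```

Calling show_release_records on the affected cluster prints one row per stored revision. A revision whose status column shows a pending state is the kind of record that makes helm refuse to start a new operation.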

Try listing the helm releases: the deployed app does not appear.

By default, helm list does not show releases in the pending-install state.

Adding the -a flag to helm list makes the release appear.
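The original commands for these two listings are not preserved, so the following is only a sketch of what they might look like. The function names are hypothetical. The namespace is taken from the log above, and -a is spelled out in its long form, --all.

```shell
# Default listing: releases still in a pending state are not shown.
# $1 = namespace (default: cattle-system)
list_visible_releases() {
  helm list --namespace "${1:-cattle-system}"
}

# Same listing with --all (-a): releases in every status are included.
list_all_releases() {
  helm list --namespace "${1:-cattle-system}" --all
}
```

Running list_visible_releases and then list_all_releases should reproduce the difference described above, with rancher-webhook showing up only in the second listing. helm list also accepts --pending to show pending releases only.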

Uninstall the broken helm release rancher-webhook
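A minimal sketch of the uninstall step, assuming the release name and namespace from the log above. The wrapper name uninstall_stuck_release is hypothetical.

```shell
# Remove the stuck release so that the next helm-operation run can install it from scratch.
# $1 = release name (default: rancher-webhook)
# $2 = namespace    (default: cattle-system)
uninstall_stuck_release() {
  helm uninstall "${1:-rancher-webhook}" --namespace "${2:-cattle-system}"
}
```

Calling uninstall_stuck_release with no arguments targets rancher-webhook in cattle-system. Check the release name in the helm list -a output before running it.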

Wait for a new helm-operation-xxxxx pod to be triggered automatically, then check it

The logs are now normal

The pod status is now normal
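These verification steps can be sketched by reusing the two kubectl commands from earlier in this note. The function names are hypothetical, and the pod name must be replaced with the name of the newly created helm-operation pod.

```shell
# Print the helm container's log for one helm-operation pod.
# $1 = pod name (required), $2 = namespace (default: cattle-system)
check_operation_log() {
  kubectl logs "$1" --namespace "${2:-cattle-system}" --container helm
}

# List the pods in the namespace to confirm that new helm-operation pods no longer end in Error.
# $1 = namespace (default: cattle-system)
check_pods() {
  kubectl get pod --namespace "${1:-cattle-system}"
}
```

For example, check_operation_log helm-operation-xxxxx followed by check_pods, where helm-operation-xxxxx stands for the real pod name. The log should no longer end with the UPGRADE FAILED line.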

References

https://github.com/fluxcd/helm-controller/issues/149

https://forums.rancher.com/t/rancher-in-docker-helm-operation-error/37787
