Helm: UPGRADE FAILED: No resource with the name "" found

Created on 13 Sep 2016 · 110 Comments · Source: helm/helm

Repro

Create a simple Chart.yaml:

name: upgrade-repro
version: 0.1.0

With a single K8S resource in the templates/ dir:

kind: ConfigMap
apiVersion: v1
metadata:
  name: cm1
data:
  example.property.1: hello

Install the chart:

helm install .
exasperated-op
Last Deployed: Tue Sep 13 12:43:23 2016
Namespace: default
Status: DEPLOYED

Resources:
==> v1/ConfigMap
NAME      DATA      AGE
cm1       1         0s

Verify the release exists:

helm status exasperated-op
Last Deployed: Tue Sep 13 12:43:23 2016
Namespace: default
Status: DEPLOYED

Resources:
==> v1/ConfigMap
NAME      DATA      AGE
cm1       1         1m

Now add a 2nd K8S resource in templates/ dir:

kind: ConfigMap
apiVersion: v1
metadata:
  name: cm2
data:
  example.property.2: hello

Upgrade the chart:

helm upgrade exasperated-op .
Error: UPGRADE FAILED: Looks like there are no changes for cm1

That's weird. Bump the version in Chart.yaml:

name: upgrade-repro
version: 0.2.0

Try upgrade again:

helm upgrade exasperated-op .
Error: UPGRADE FAILED: No resource with the name cm2 found.

Expected

helm upgrade should create the cm2 resource instead of erroring that it doesn't exist.

Edit: to be clear: helm _is_ creating the cm2 ConfigMap, but helm fails regardless.

Current state after performing steps

helm status exasperated-op
Last Deployed: Tue Sep 13 12:43:23 2016
Namespace: default
Status: DEPLOYED

Resources:
==> v1/ConfigMap
NAME      DATA      AGE
cm1       1         6m

kubectl get configmap --namespace default
NAME           DATA      AGE
cm1            1         6m
cm2            1         4m
Labels: bug

Most helpful comment

This is a process I use to recover from this problem (so far it has worked every time without any incident... but be careful anyway):

  1. Run helm list and find out latest revision for affected chart

    NAME        REVISION UPDATED                  STATUS  CHART              NAMESPACE
    fetlife-web 381      Thu Mar 15 19:46:00 2018 FAILED  fetlife-web-0.1.0  default
    
  2. Go from there and find latest revision with DEPLOYED state
    kubectl -n kube-system edit cm fetlife-web.v381
    kubectl -n kube-system edit cm fetlife-web.v380
    kubectl -n kube-system edit cm fetlife-web.v379
    kubectl -n kube-system edit cm fetlife-web.v378
  3. Once you find last DEPLOYED revision, change its state from DEPLOYED to SUPERSEDED and save the file
  4. Try to do helm upgrade again, if it's successful then you are done!
  5. If you encounter upgrade error like this:
    Error: UPGRADE FAILED: "fetlife-web" has no deployed releases
    then edit the status for very last revision from FAILED to DEPLOYED
    kubectl -n kube-system edit cm fetlife-web.v381
  6. Try to do helm upgrade again, if it fails again just flip the table...

All 110 comments

I'm running into a similar issue where I have a chart with bundled dependencies. If I add a new dependency and run a helm upgrade the result is the same as described. The resources are properly created however helm returns an error.

So, if this is installed: helm install -n my-release

my-thing/
  Chart.yaml
  charts/
    depended-upon-thing/

And then a new chart is added as a dependency:

my-thing/
  Chart.yaml
  charts/
    depended-upon-thing/
    new-dependency/

When the release is upgraded with: helm upgrade my-release my-thing helm produces the following error:

Error: UPGRADE FAILED: No resource with the name new-dependency found.

@devth I'm not able to reproduce this issue on master. Are you still seeing this problem? What version of helm/tiller are you running?

Thanks!

@elementalvoid I was also unable to reproduce the new dependency error on master. Are you still seeing this problem? What version of helm/tiller are you running?

Thank you.

At the time I was on alpha 4. Using alpha 5 and @devth's example I was also unable to reproduce the issue.

Alright. I'll close this for now. Feel free to file an issue if you see either of these problems again.

Thanks again.

@michelleN thanks! Sorry I haven't had time this week to attempt a repro on master. Looking forward to upgrading soon!

Same for me when moving a hostPath Deployment/Volume spec to PVC.
The bug seems to occur when an upgraded manifest depends on a new one ("missing" from the old release?).
version: 2.7.2

Strange, I am seeing the same behavior trying to upgrade a chart in version 2.7.2 with a new role. Tiller complains that it can't find the role and fails the deployments, even though it really created the role.

My situation was that I had a new resource, and I deployed the new version of the helm chart with that resource. The deployment failed because I fat-fingered some YAML, but the new objects were still created in Kubernetes. I fixed the YAML and ran the upgrade on my chart again, and voilà, the error message that the resource is not found appeared. I had to go into Kubernetes and remove the new resources (in my case a Role and RoleBinding) that were created by the failed deployment. After that, the check Helm does to see whether the object already exists (https://github.com/kubernetes/helm/blob/7432bdd716c4bc34ad95a85a761c7cee50a74ca3/pkg/kube/client.go#L257) no longer finds it, and the resources are created again. Seems like a bug; maybe new resources from a failed chart should be accounted for?
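A minimal clean-up for that scenario, assuming the leftovers are a Role and RoleBinding (all names below are hypothetical), would look like:

kubectl -n my-namespace delete rolebinding my-new-rolebinding   # objects created by the failed deploy
kubectl -n my-namespace delete role my-new-role
helm upgrade my-release ./my-chart                              # retry once the leftovers are gone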

Getting similar error while upgrading:

$ helm upgrade --install bunny ./app --namespace=staging --reuse-values --debug
[debug] Created tunnel using local port: '53859'

[debug] SERVER: "127.0.0.1:53859"

Error: UPGRADE FAILED: no ConfigMap with the name "bunny-proxy-config" found

Configmap is created

$ k get configmap
NAME                 DATA      AGE
bunny-proxy-config   1         7m

My configmap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "proxy.fullname" . }}-config
  labels:
    app: {{ template "proxy.name" . }}
    chart: {{ template "proxy.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
data:
  asd: qwe

We have the same issue.

I deleted the whole release and then installed again. Currently it seems to be working.

$ helm del --purge bunny

I am also having this issue on

Client: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.8.0", GitCommit:"14af25f1de6832228539259b821949d20069a222", GitTreeState:"clean"}

This happens frequently with our usage of helm and requires a full --purge. That is _not_ a solution.

That approach is not workable if you use CI/CD.
What happens if an upgrade fails and you use a rolling update strategy? Must I delete my still-working release?

I see the same issue as well when there is a Deployment problem or similar: the Secret/ConfigMap gets created, but then Helm loses track of it, refusing to let you do much.

I've seen it happen, rarely though, even on a non-broken release (i.e. one that is seen as having gone through), but I have yet to figure out what could cause that.

We're also able to reproduce this issue (server v2.8.2) when adding resources to existing helm deployments. Having to delete the deployment and redeploy each time a new resource has to be added will be a big problem in production.

In our case we were adding a configmap to a chart and the chart fails to be upgraded with:

Error: UPGRADE FAILED: no resource with the name "" found

Note: We're using 2.7.2; on later versions this message has changed to include the type of the resource that can't be found.

I believe this happens because when helm is determining what has changed it looks for the new configmap resource in the old release, and fails to find it. See https://github.com/kubernetes/helm/blob/master/pkg/kube/client.go#L276-L280 for the code where this error comes from.

Tiller logs for the failing upgrade:

[tiller] 2018/05/03 19:09:14 preparing update for staging-collector
[storage] 2018/05/03 19:09:14 getting deployed release from "staging-collector" history
[tiller] 2018/05/03 19:10:39 getting history for release staging-collector
[storage] 2018/05/03 19:10:39 getting release history for "staging-collector"
[tiller] 2018/05/03 19:10:41 preparing update for staging-collector
[storage] 2018/05/03 19:10:41 getting deployed release from "staging-collector" history
[storage] 2018/05/03 19:10:42 getting last revision of "staging-collector"
[storage] 2018/05/03 19:10:42 getting release history for "staging-collector"
[tiller] 2018/05/03 19:10:44 rendering collector chart using values
[tiller] 2018/05/03 19:10:44 creating updated release for staging-collector
[storage] 2018/05/03 19:10:44 creating release "staging-collector.v858"
[tiller] 2018/05/03 19:10:44 performing update for staging-collector
[tiller] 2018/05/03 19:10:44 executing 0 pre-upgrade hooks for staging-collector
[tiller] 2018/05/03 19:10:44 hooks complete for pre-upgrade staging-collector
[kube] 2018/05/03 19:10:44 building resources from updated manifest
[kube] 2018/05/03 19:10:44 checking 3 resources for changes
[tiller] 2018/05/03 19:10:44 warning: Upgrade "staging-collector" failed: no resource with the name "collector-config" found 
[storage] 2018/05/03 19:10:44 updating release "staging-collector.v857"
[storage] 2018/05/03 19:10:44 updating release "staging-collector.v858" 

This problem also arises when changing the name label of a deployed Service, perhaps other things as well.

I'm changing the name of a Service in a release and it fails to upgrade with:

Error: UPGRADE FAILED: no Service with the name "new-service-name" found

I'd be willing to create a PR to fix this behavior, but I'd like to know what the intended or suggested way of handling this is. Even a CLI flag that allows --force to take precedence would be great.

Agree on the importance.

This problem can be weird when you cannot simply delete a deployment.

I found our issue was because of a failed deploy.

Helm doesn't attempt to clean up after a failed deploy, which means things like the new ConfigMap I added above get created but without a reference in the 'prior' deploy. That means when the next deploy occurs, helm finds the resource in k8s but expects it to be referenced in the latest deployed revision (or something like that; I'm not sure of the exact logic it uses to find the 'prior' release) in order to check what has changed. It's not in that release, so it cannot find the resource, and fails.

This is mainly an issue when developing a chart as a failed deploy puts k8s in a state helm does not properly track. When I figured out this is what was happening I knew I just needed to delete the ConfigMap from k8s and try the deploy again.

@krishicks Yes, this is one way to repro it. A failed deploy + a never-created resource (e.g. an invalid configmap) can also cause this, I've noticed, which then leads to an unrecoverable state.

We are hitting this one, too. It's the same issue that @krishicks and @jaredallard mention:

  1. We have a failed deployment:
    UPGRADE FAILED: the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps)
  2. Any subsequent changes, also to other releases, fail with a warning like
    Error: UPGRADE FAILED: no Service with the name "…" found

I'll try to use the helm upgrade --timeout … flag to mitigate the first issue, but a failed deployment blocking everything is quite an issue for us. Also, using helm rollback … did not resolve this.

As helm upgrade … runs automatically in our use-case, an --auto-rollback flag for helm upgrade would be very helpful, which reverts the failed changes.
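For reference, --timeout in Helm 2 takes a number of seconds (default 300), so a mitigation along those lines would be something like the following (release and chart names are placeholders):

helm upgrade my-release ./my-chart --wait --timeout 600   # wait up to 10 minutes before marking the upgrade FAILED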

This is happening for me with v2.7.2 when adding new resources to the chart.
Is there any estimate on when a fix will be in for this?

This should be fixed with #4146.

EDIT: see below

So I found a few bugs with #4146 that makes it an undesirable PR to move forward with. I reported my findings between master, #4146, and #4223 here: https://github.com/kubernetes/helm/pull/4223#issuecomment-397413568

@adamreese and I managed to identify the underlying bug that causes this particular error, and go through the different scenarios and edge cases with each of the proposed PRs. If anyone else could confirm my findings or find other edge cases, that would be much appreciated!

Oh, and something I failed to mention: because the cluster's in an inconsistent state, this can easily be worked around by manually intervening and deleting the resource that the error reports as "not found". Following the example I demonstrated in https://github.com/kubernetes/helm/pull/4223#issuecomment-397413568:

><> helm fetch --untar https://github.com/kubernetes/helm/files/2103643/foo-0.1.0.tar.gz
><> helm install ./foo/
...
><> vim foo/templates/service.yaml
><> kubectl create -f foo/templates/service.yaml
service "foo-bar" created
><> helm upgrade $(helm last) ./foo/
Error: UPGRADE FAILED: no Service with the name "foo-bar" found
><> kubectl delete svc foo-bar
service "foo-bar" deleted
><> helm upgrade $(helm last) ./foo/
Release "riotous-echidna" has been upgraded. Happy Helming!
...
><> kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
foo-bar      ClusterIP   10.104.143.52   <none>        80/TCP    3s
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   1h

@bacongobbler That doesn't always work. We've had situations where deleting it does work and some where it still doesn't work after doing that.

this can easily be worked around by manually intervening and deleting the resource

@bacongobbler Please, understand that this resource might be a production namespace's Service or Deployment object, which might (and already did) heavily disrupt our service guarantees.

Yes I know. I'm just explaining and observing the bug's behaviour so others know what is involved. :)

Just run into this issue twice on different clusters. Each time with configmaps. Deleting the resources didn't solve the issue, so we had to remove the entire release.

Same here. Not only Configmaps I have one with ServiceAccount.

PVC here. :)

Basically every kind of prioritized object is subject to this bug.

Is there someone assigned to fix this? Is there a PR for this already? Can I help with anything?

I've been bitten by this issue more than once since it's an easy situation to get yourself into but apparently there's no easy way to get out of. I suppose the "good" part in my case is that resources are updated even with the error on the release (not sure if that makes me happy or worried)

I think helm should either forbid the user from getting into this wrong state or correctly handle it.
Are there any real fixes to this outside of deleting everything (that is only viable for non-production uses)?

If anyone else can determine the other edge case where deleting the resource didn't solve the issue, that would be very helpful in determining the root cause to solve that particular issue. There may be multiple paths that can end up with the same error.

@Draiken no, we've attempted multiple solutions to the problem, and none of them seem reasonable as they either

a) don't perform the upgrade as intended, or
b) introduce new bugs

I wrote up about those solutions here and why they won't work. If you can figure out an alternative solution, we'd be happy to take a look at it.

We can't gatekeep users from getting into this wrong state either; we've looked at solutions but again they all introduce a different set of problems. Once the install's in an inconsistent state, it's hard to "fix" it without manual intervention. 😢

A workaround that worked for me is to do a helm rollback ... to right before the failure occurred. I then validate that the chart works on a new release with helm install -n new-test-release ..

Once everything works, I clean up the test release and run the helm upgrade ... against the old release; everything worked. This is an annoying workaround, but it seems to work.
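Spelled out as commands, that workaround looks roughly like this (release names, chart path and revision number are placeholders):

helm history my-release                      # find the revision just before the failure
helm rollback my-release 41                  # roll back to it
helm install -n my-release-test ./my-chart   # sanity-check the chart on a throwaway release
helm delete --purge my-release-test          # clean up the test release
helm upgrade my-release ./my-chart           # upgrade the original release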

I don't know if this helps at all, but I just ran into this issue on both my test and production clusters.

The changes I made to my helm files were pretty simple:
I had 1 existing deployment, with an associated service and poddisruptionbudget, which was left unchanged.
I added a 2nd deployment, with its own service and poddisruptionbudget.
I incremented the chart version number.

When running helm I got this error the first time:

KUBECONFIG=**** helm upgrade --kube-context=mycluster -f helm/project/mycluster.yaml project ./helm/project --install --wait --timeout 1200

Error: UPGRADE FAILED: Deployment.apps "blah-blah" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"app":"blah-blah", "chart":"blah-3", "name":"blah", "release":"project"}: `selector` does not match template `labels`

When running helm again, I now get this error over and over again:

KUBECONFIG=**** helm upgrade --kube-context=mycluster -f helm/project/mycluster.yaml project ./helm/project --install --wait --timeout 1200

Error: UPGRADE FAILED: no Service with the name "blah-blah" found

The helm chart of course worked in test when I deleted everything and redeployed. That is not really an option for prod though.

@veqryn I've run into this a lot, I basically use my build in #4146 whenever I run into this issue and then swap back to the mainline one. Works every time. I know the maintainers might not recommend it, but _it works_ which is better than nothing.

EDIT: If anyone is interested in trying it, I can push it to a public docker repo and include a quick snippet of how to use it.

@jaredallard we are interested. Thanks!

I know the maintainers might not recommend it, but it works which is better than nothing.

@jaredallard we couldn't recommend that patch simply because it doesn't work on its own (ref: https://github.com/helm/helm/pull/4223#issuecomment-397413568). It bypasses the error, but it doesn't upgrade the resources, so the patch doesn't do what the user originally intended to do. It fixes one problem but introduces another without fixing the original issue that users intend to perform: upgrade a resource.

However, this is intriguing:

I basically use my build in #4146 whenever I run into this issue and then swap back to the mainline one.

If I'm reading this right, you're suggesting that you've found a workaround that

a) bypasses the error
b) allows one to upgrade resources as they originally intended

by doing 2 helm upgrades: one with the patch, and one without? That could help us better identify the root cause and how to fix this error.

@bacongobbler I'd have to revisit this to 100% verify that that is the behaviour. I'll update this comment or post another when I have.

I know the maintainers might not recommend it, but it works which is better than nothing.

Also, to clarify, I'm not trying to throw shade there! It's a little badly worded looking back at it now, sorry.

I'm still confused on why my helm failed the first time to begin with.
It did not get the no X with the name Y error until the second time I tried to apply it.

@veqryn I wrote up on how this issue comes up in the first place in the issue I linked above. Please read through the comment; happy to help clarify the issue in more detail if it's unclear.

For the lazy: https://github.com/helm/helm/pull/4223#issuecomment-397413568

I actually did read that, and my understanding was that the issue happened to you because you changed your service's name.

However, at no point did any of my services or any resources change names.

And after re-re-reading your comment and talking with my crew, we figured out the cause of our error:
I had bumped my helm Chart's version.
That chart version was referenced as a label by my deployments and services.
Kube/helm does not like it when your labels change, and this is what caused the original error.

The solution (for me) was to use helm to revert to the last successful deploy, then to revert the chart version change so that the chart version stayed the same, which was then successful.
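A common way to avoid that particular failure is to keep version-carrying labels such as chart out of the Deployment selector, since selectors cannot change between upgrades. A sketch (not the actual templates from this report):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: blah
  labels:
    app: blah
    release: "{{ .Release.Name }}"
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blah                        # only labels that never change between releases
      release: "{{ .Release.Name }}"
  template:
    metadata:
      labels:
        app: blah
        release: "{{ .Release.Name }}"
        chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"   # fine here, since it is not part of the selector
    spec:
      containers:
        - name: main
          image: nginx:1.15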

this (ugly) fix works for me:

  1. I'm getting this error:
helm upgrade az-test-2-prom ./prometheus --namespace monitor --set cluster_name="az-test-2" -f values.yaml
Error: UPGRADE FAILED: no ConfigMap with the name "az-test-2-prom-prometheus-grafana-config" found

2. Find the last DEPLOYED revision:

export TEMPLATE='{{range .items}}{{.metadata.name}}{{"\t"}}{{.metadata.labels.STATUS}}{{"\n"}}{{end}}'
kubectl -nkube-system get cm -l 'OWNER=TILLER' -ogo-template="$TEMPLATE"
az-test-2-prom.v1   SUPERSEDED
az-test-2-prom.v10  SUPERSEDED
az-test-2-prom.v11  SUPERSEDED
az-test-2-prom.v12  SUPERSEDED
az-test-2-prom.v13  SUPERSEDED
az-test-2-prom.v14  SUPERSEDED
az-test-2-prom.v15  SUPERSEDED
az-test-2-prom.v16  SUPERSEDED
az-test-2-prom.v17  DEPLOYED
az-test-2-prom.v18  FAILED
az-test-2-prom.v19  FAILED
az-test-2-prom.v2   SUPERSEDED
az-test-2-prom.v20  FAILED
az-test-2-prom.v21  FAILED
az-test-2-prom.v22  FAILED
az-test-2-prom.v23  FAILED
az-test-2-prom.v24  FAILED
az-test-2-prom.v25  FAILED
az-test-2-prom.v26  FAILED
az-test-2-prom.v27  FAILED
az-test-2-prom.v28  FAILED
az-test-2-prom.v29  FAILED
az-test-2-prom.v3   SUPERSEDED
az-test-2-prom.v30  FAILED
az-test-2-prom.v4   SUPERSEDED
az-test-2-prom.v5   FAILED
az-test-2-prom.v6   SUPERSEDED
az-test-2-prom.v7   SUPERSEDED
az-test-2-prom.v8   SUPERSEDED
az-test-2-prom.v9   FAILED



md5-6d9e4edff5e9111525fecb734bfec15a



for ii in {17..30}
> do
>   kubectl -nkube-system delete cm az-test-2-prom.v${ii}
> done

4. Mark the most recent remaining revision (v16) as DEPLOYED:
kubectl -nkube-system patch cm az-test-2-prom.v16 -p '{"metadata": {"labels": {"STATUS": "DEPLOYED"}}}'

5. (Important) Find all new resources that were added since the last deployed revision (v16) and delete them, for example:
kubectl -nmonitor delete cm az-test-2-prom-prometheus-grafana-config
kubectl -nmonitor delete svc ...

Run helm upgrade ... and see Happy Helming

As @kosta709 said, reverting to the last deployed release, fixing (manually) the chart or the current status (whatever is wrong) and doing a new upgrade, usually works.

Helm is a great piece of software, but it risks being discarded from some automated workflows (CI/CD) if the outcome of a command is not stable.

Is there any chance that the known solutions will eventually be implemented in helm to (try to) solve this well-known (and rather annoying) problem? Thanks.

So, recently I'm hitting this often as well, enough to get me working on this issue on my own. For starters, I've created a workaround (https://github.com/Nopik/helm/commit/afe6451cc2c6295e71ea2213ccce654ec3f5b686) which basically causes Tiller to take the existing resource as the starting point instead of the resource from the old manifest. It works like a charm for me, though I believe the core devs will not want to merge it, as it contains hard-coded behaviour.

There might be two bugs hidden under the same behaviour: at least once when this bug bit me I had to delete lots (>40) of resources, including some which had been present for >20 successful release versions already. But in 99% of cases, just deleting the freshly created (and yet unknown to helm) resources will do.

So, I've been thinking about how to solve it the proper way. I'm describing it below. Core devs please correct me here and weigh in if you agree with me. If yes, I'm willing to lead this effort and provide PR to fix it.

Generally helm seems to operate in 'patch' mode: if the user modifies a resource somehow, and a new release version changes some other parameters, helm calculates the patch between the two revisions and applies it - I believe it is trying to keep user changes intact.

That sadly leaves us with a 3-way merge problem, as we have the resource version taken from the old manifest, another version from the new manifest, and yet another version from the currently living resource. And helm is apparently bad at resolving conflicts when they arise.

I think that the proper way would be either to choose better defaults (basically, merging my branch would go a long way) or to provide a flag for the user. E.g. like this:

  • --ignore-old-manifest=never (default, current behavior)
  • --ignore-old-manifest=create-only (applies to this case: when the old manifest has no notion of the resource, but the resource already exists, we can take it as the new base and just patch it if necessary) - I would recommend this as the new default. This would also allow helm to start taking ownership of manually created resources.
  • --ignore-old-manifest=always - just for the sake of completeness, probably not strictly necessary. It would always create a patch between the current resource and the newest manifest, basically discarding all user modifications.

Of course you can rename the flag to use reversed logic: --use-current-resources=never(currently default)/create-only/always or something like that.

Later on, this flag could be taken from resource annotations, something like:

annotations:
  helm.io/ignore-old-manifest: always

which helm could recognize and apply this strategy per-resource. I'm not sure if helm devs want to go there, though ;)

So, what do you think about this proposal?

See also issue #3805 where Helm devs are considering 3 way merge patch.

Same issue here.
Trying to setup a CD/CI environment with google cloud build.
Error: UPGRADE FAILED: no Deployment with the name "baobab-sidekiq" found

The funny thing is that the deployment exists:

kc get deployments
NAME             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
baobab           3         3         3            2           2d
baobab-ermis     1         1         1            1           23h
baobab-sidekiq   1         1         1            1           23h

This is the first chart I've created, and I was expecting helm to be the solution for handling the complexity of deploying complex applications in a CI/CD environment.

Is the intent of helm to be able to work in a CI/CD pipeline?

Thanks

Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}

I am also running into this, trying to upgrade helm 0.8.0 to helm 1.0.0.

helm version --tls
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}

I've been running into this as well when upgrading requirements of a chart. I see that Istio is running into this bug as well, their install doc uses helm template. Is that a workaround for this issue?

Check the recent discussions in https://github.com/helm/helm/issues/3805

having this as well, still happen for latest helm 2.10

It's time for this issue to be seriously considered: it's been 2 years now, and a looooot of people report the same exact problems, which makes helm quite unusable in production and is a real pain when your deployment depends on helm.

With a lot of GitHub stars come great responsibilities

@brendan-rius You're welcome to contribute code to fix this issue, or think of ideas. See #3805 and #4146 for some pointers.

@brendan-rius, #3805 in particular has the most recent up-to-date discussions surrounding this bug. I highly suggest giving that thread a read to get an idea on what we're up against.

Reposting my comment here since it's more related to this issue than 3-way merge strategies:

If a three way merge is not viable for new deployments in Helm 2.y.z, when will #1193 be fixed? The bug has been open for nearly two years with no clear resolution planned for Helm 2.0.

At this point, we're stumped on how to proceed. We've discussed the bug for weeks and none of the proposed solutions will work in all cases, either by introducing new bugs or significantly changing tiller's upgrade behaviour.

For example, @michelleN and I brainstormed earlier this week and thought of two possible solutions, neither of which are particularly fantastic:

  1. When an upgrade fails, we automatically roll back and delete resources that were created during this release.

This is very risky as the cluster may be in an unknown state after a failed upgrade, so Helm may be unable to proceed in a clean fashion, potentially causing application downtime.

  2. During an upgrade, if we are creating a new resource and we see that it already exists, we instead apply those changes to the existing resource, or delete/re-create it.

This is extremely risky as Helm may delete objects that were installed via other packages or through kubectl create, neither of which users may want.

The safest option so far has been to ask users to manually intervene in the case of this conflict, which I'll demonstrate below.

If anyone has suggestions/feedback/alternative proposals, we'd love to hear your thoughts.

@bacongobbler, if no support for the 3-way merge feature is planned, we need an alternative or a workaround. Otherwise, #1193 is a seriously painful blocker.

To re-iterate the issue as well as the workaround:

When an upgrade that installs new resources fails, the release goes into a FAILED state and stops the upgrade process. The next time you call helm upgrade, Helm does a diff against the last DEPLOYED release. In the last DEPLOYED release, this object did not exist, so it tries to create the new resource, but fails because it already exists. The error message is completely misleading as @arturictus points out.

This can easily be worked around by manually intervening and deleting the resource that the error reports as "not found". Following the example I demonstrated in https://github.com/helm/helm/pull/4223#issuecomment-397413568:

><> helm fetch --untar https://github.com/helm/helm/files/2103643/foo-0.1.0.tar.gz
><> helm install ./foo/
...
><> vim foo/templates/service.yaml                    # change the service name from "foo" to "foo-bar"
><> kubectl create -f foo/templates/service.yaml      # create the service
service "foo-bar" created
><> helm upgrade $(helm last) ./foo/
Error: UPGRADE FAILED: no Service with the name "foo-bar" found
><> kubectl delete svc foo-bar
service "foo-bar" deleted
><> helm upgrade $(helm last) ./foo/
Release "riotous-echidna" has been upgraded. Happy Helming!
...
><> kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
foo-bar      ClusterIP   10.104.143.52   <none>        80/TCP    3s
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   1h

In other words, deleting resources created during the FAILED release works around the issue.

@bacongobbler first off, I want to thank you for taking a look at this issue in more detail over the last few weeks. I'm not quite sure exactly what the issue is in the Istio 0.8.0 to Istio 1.0.0 upgrade that causes the problem, or whether it completely matches your issue statement. My speculation is that about 3 days prior to release, some objects which were previously unmanaged (i.e. added in a post-install job) were migrated so they are no longer installed in a post-install job.

In speaking with the Istio operator community, which has a lot of prior experience with Helm, a few operators have told us that unmanaged resources in Helm are bad news and often lead to upgrade failures. If the Istio charts implementation has a flaw that makes them incompatible with upgrades under Helm 2.y.z, fixing those incompatibilities would be great - so we don't have upgrade failures in the future.

We are willing to take the one-time hit from the 0.8.0 to 1.0.0 upgrade. If upgrade is constantly flaky - that's a different problem.

I executed a bisect on Istio - as upgrade was working until July 27th (3 days prior to the Istio 1.0.0 release) and found this commit to be problematic: https://github.com/istio/istio/commit/301612af08128b15069d27ff6d03cdb87420e15b

This PR essentially removed object registration from post-install jobs. I believe, but am not certain, that we removed all instances of post-install jobs in the 3-day run-up to Istio 1.0.0.

Can you offer advice on Istio's specific Helm chart as it relates to upgrade? Will keeping object registration out of the post install jobs solve our upgrade problems permanently?

I can't really offer up any strong advice nor proposal for fixing this problem helm-wide as people with more recent experience with Helm have been unable to find a generalized solution (and not for lack of taking a look at the problem). I don't think I could do better.

Cheers
-steve

Updated the title to better reflect the error.

We're also affected by the issue. We use the latest helm 2.10 with GKE 10.6.
When can I expect this to be fixed?
Do we have any reasonable workaround for the issue? Removing the whole deployment with the --purge option is a poor one.

Please feel free to weigh in on my last comment. We really need feedback on how to best proceed here.

A workaround has been repeated multiple times in this thread. Please read through https://github.com/helm/helm/issues/1193#issuecomment-419555433.

I like the idea of a helm auto-rollback feature (option 1) to solve this issue. We know that the last DEPLOYED Helm release was working and the cluster was in a good state so it should be safe to revert to it. If it's risky for some use cases, it could be opt-in via a flag to helm upgrade.

This is very risky as the cluster may be in an unknown state after a failed upgrade, so Helm may be unable to proceed in a clean fashion, potentially causing application downtime.

I think a lot of helm users use helm in an automated fashion via a CD or CM tool and it is more risky to leave the helm release in a FAILED state. Those incomplete resources in a failed release could affect other resources unexpectedly and could themselves cause downtime. For example, if the pod spec contained a missing image version that somehow made it to production then your workload would be in a non-working ImagePullBackOff state. For our company, it's even worse since we have some on-premise customers that can upgrade themselves via our UI and if it fails we have to gain access to their system to debug.

Even disregarding the fact that it could fix this issue, auto-rollback would be a useful feature and would help ensure Helm releases are more transactional in nature. It pivots Helm from best effort deployments to prioritizing stability and successful deployments.

@bacongobbler would the following approach be viable for new deployments:

  • Add a flag --three-way-merge
  • Only permit that flag to be consumed in a helm install (new deployment)
  • Once this flag is enabled, upgrade would always use a 3 way merge
  • Existing deployments would be stuck without a migration path - the standard workaround folks seem to be using at this point is helm delete --purge followed by a helm reinstall so this may not be as unpalatable as it first appears.

Would this actually solve the problem?

Some individuals are considering implementation of an Operator to work around this Helm limitation. That would be a serious shame. See https://github.com/istio/istio/issues/8841#issue-361871147

Cheers
-steve

Going back to @bacongobbler's earlier comment:

  2. During an upgrade, if we are creating a new resource and we see that it already exists, we instead apply those changes to the existing resource, or delete/re-create it.
    This is extremely risky as Helm may delete objects that were installed via other packages or through kubectl create, neither of which users may want.

I wonder if we can mitigate this risk by making the new behaviour opt-in? Within a given namespace I generally use helm exclusively, and I suspect this is the case for many. If I could give Helm install/upgrade a flag to tell it that anything in the given namespace that isn't part of an existing release is fine to delete/overwrite, would that help?

Since you also said "via other packages", I presume you don't want Helm to have to examine other releases as part of performing a release, so my suggestion wouldn't work except in the single-release-per-namespace model. To reply to that objection, I would say: if you want to manage multiple packages in a namespace and still get this behaviour, create an umbrella chart whose sole purpose is to specify the chart dependencies you want. Then use the new flag ("--exclusive"?) when deploying that umbrella chart.

Obviously this doesn't solve the problem for all use cases, but perhaps it's enough of a workaround.
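In Helm 2 terms that umbrella chart is just a Chart.yaml plus a requirements.yaml pinning the charts you want managed together (all names and the repository URL below are made up):

# my-umbrella/Chart.yaml
name: my-umbrella
version: 0.1.0
description: Umbrella chart that only pins dependencies

# my-umbrella/requirements.yaml
dependencies:
  - name: frontend
    version: 1.2.3
    repository: https://charts.example.com
  - name: backend
    version: 4.5.6
    repository: https://charts.example.com

Then helm dependency update my-umbrella followed by helm upgrade --install my-stack ./my-umbrella deploys everything as a single release.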

@bacongobbler I could be way off here. I am facing similar issues with upgrade. Judging by how difficult it is to solve this problem, I wonder if something more fundamental needs to be reconsidered. Part of the complexity appears to be due to the fact that Helm maintains its own version of the known configuration, separate from the actual source of truth, which is kubernetes. Would the system be more reliable if Helm only kept a copy of previously deployed helm charts for the purposes of history and rollback, but didn't use it at all during upgrade? Instead, Helm would get the truth from kubectl itself, and then always have a 2-way diff to perform.

If a helm chart says it should have resource X, and kubectl sees an existing resource X, then:

  • If the existing resource is tagged as being controlled by Helm, then Helm performs the required upgrade
  • If the existing resource is not tagged as being controlled by Helm, then Helm fails with a clean error message (or some command line flag can be used to --force the upgrade and cause Helm to take ownership of existing Resource X).

If the helm chart says it should have resource X and there isn't one according to kubectl, then Helm creates it.

If kubectl reports that it has a resource Y tagged as being controlled by this helm chart, and there is no resource Y in this helm chart, then helm deletes the resource.

Any resources not tagged as being controlled by this helm chart are always ignored by helm when performing the upgrade, except in the case mentioned above where the helm chart says it needs resource X and X exists but isn't tagged.

If for some reason the roll-out of a helm chart happens and fails, and only half the resources were rolled out, then during a rollback helm would use the stored config files from the previous successful deployment and run the exact same algorithm, or things could be left in a broken state relative to some helm command line flag. If the user attempts to upgrade again, since kubernetes is used as the source of truth and not the last-known successful deployment, it should still be a simple 2-way diff between the new helm chart and the existing state of the system.

We are seeing this problem, too. Our reproduce steps:

  1. helm install a chart that successfully installs a deployment
  2. Update the chart to include a custom resource in addition to existing deployment
  3. Change the image of the deployment podspec to mimick a deployment failure
  4. helm install new chart. This will cause a rolling update of the deployment, which we've intentionally set up to fail.
  5. The helm install should fail, but the custom resource will be left in k8s etcd (verify using kubectl).
  6. (At this point, helm is in a bad state)
  7. Fix the chart -- put a good image in the deployment podspec.
  8. helm install. We expect this to work, but it doesn't. Reports the "No resource with name ___". The name is that of the custom resource.
  9. Recovery: delete the residual custom resource using kubectl. Now helm install will work.

Note that first attempt at helm install with a newly introduced custom resource in the chart must fail to get into this state.

@rbair23 we tried that earlier and that didn't work. There's the Apply Working Group which is looking to improve the state of declarative object management by fixing kubectl apply, moving the logic from the client to the server, but that is still in its infancy stages. Kubectl (and tiller) both need to retain a copy of the last applied configuration to perform a diff. You cannot diff against the live state of the cluster without performing a three-way merge patch strategy, circling back to this ticket.

Since #3275 was closed as a duplicate of this one: we have a similar situation to the one in #3275.

There is already a running job my-long-running-job, and we are trying to upgrade the release:

>helm upgrade --install my-release --namespace my-environment my-chart --timeout 60
Error: UPGRADE FAILED: no Job with the name "my-long-running-job" found

The job exists:

>kubectl -n=my-environment get jobs
NAME                                DESIRED   SUCCESSFUL   AGE
my-long-running-job                 1         0            16m

Deleting that job:

>kubectl -n=my-environment delete job my-long-running-job
job.batch "my-long-running-job" deleted

Resolves that impediment:

>helm upgrade --install my-release --namespace my-environment my-chart --timeout 60
Error: UPGRADE FAILED: timed out waiting for the condition

At least the message no Job with the name "my-long-running-job" found is misleading; my expectation was that the job would be updated, too.

Still seeing this in v2.9.1 (currently released stable version)

I disagree that it's "very dangerous" to back out of an upgrade. I think doing so is the correct solution.

Kubernetes is declarative. Snapshot what the cluster state was before attempting to upgrade.
If there's an error partway through, then roll back to the snapshot.
If someone has script hooks that would leave the cluster in a bad state when doing this, then that's their own fault. (Maybe that could be solved with rollback hooks, too)

Of course, it would be great if an upgrade was pre-flighted and didn't fail in the first place as much as possible.
Errors in dependency charts generated by values or --set arguments should be possible to check before trying to change anything, for example. Things like forgetting to bump the version number could also be pre-flighted to avoid making changes when it won't work.

Hi,

Had the same issue with:

Client: v2.10.0+g9ad53aa
Server: v2.10.0+g9ad53aa

Deleting the serviceAccount, configMap and service was the only way to make Helm upgrade the release.

Hi,

we have the same issue as @dilumr described... with version 2.11.0:

Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}

Error: UPGRADE FAILED: no ConfigMap with the name "xxx" found

Ran into this on v2.9.1.

The chart upgrade I was running was jumping a few major versions on a private chart with lots of changes so I'm not sure what exactly triggered the error moving forward, but the reason the deployment originally ended up in FAILED state was that I had a --wait flag and it timed out.

We ended up with several duplicate FAILED deployments in helm list but the last working deployment was DEPLOYED. Creating new deployments threw No resource with the name x found.

Was able to fix by running a helm rollback to the last version that was in the DEPLOYED state in helm list. After that, the upgrade was able to run without errors.
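In commands, that recovery is roughly (release name, chart path and revision number are placeholders):

helm history my-release              # lists every revision with its status
helm rollback my-release 197         # 197 = the last revision that shows DEPLOYED
helm upgrade my-release ./my-chart   # runs against a sane release record again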

Like others, from the looks of things, this error happens most frequently (I'm not sure about always) when my last deployment failed and new assets from that deployment were left installed.

I understand how it could be tricky and/or undesirable to uninstall components from a failed Helm deploy, but what is the ideal Helm behavior for this sort of situation?

First, I think Helm should be OK with namespaces and other resources already existing if it's trying to (re-)install them. Kubernetes is all about "make the configuration right, and let kube figure out how to make the world match the config."
Second I think Helm should be all-or-nothing. If a deploy fails, the cluster should be in the state it was before the deploy started.
If there are two releases that both want to create namespace X, then there's a reference counting problem. If there is a release that wants to create namespace X, but it already exists, then there's a provenance problem. However, helm can record this using annotations on the objects, and do the right thing.

I am hitting this issue as well with the latest helm 2.12.0 and kubernetes 1.10.11. Even rolling back to the latest good release as @aguilarm suggested did not work, and deleting the resources that helm complains about does not help either; after the upgrade command fails, it leaves those same resources partially re-created. Very annoying for a prod env...

I have 2 clusters with very similar environments, the main difference between the two being the total number of nodes. In one case a helm delete --purge followed by a fresh helm install worked, but in the other it did not, and I have yet to figure out a way to bring it up to the latest template changes.

Is there any ETA on this?

I was able to workaround this with helm rollback and specifying the most recent revision (the one that failed)

Had the same issue today,
Error: UPGRADE FAILED: no Service with the name "xxx-api" found
kubectl get svc -n stage | grep xxx-api
xxx-api ClusterIP 172.20.149.229 <none> 8300/TCP 19h
helm rollback worked.

We're experiencing this on a fairly regular basis while doing helm upgrades - this happens after successful deployments not just failed ones. We cannot helm delete --purge as these are production systems with non-trivial components that 1) cannot be unavailable and 2) would take too long to fully recover from scratch to be acceptable.

See the diagnosis and workaround I posted above. Let me know if that works for you.

@bacongobbler Thanks for the response. That is in fact the workaround I came up with as well, and it'll have to suffice for now, but we're using helm upgrades dozens of times a day via our CI/CD, so this crops up often enough to be a headache. I'm not sure the above is the whole story though, as the resources named are often already existing, part of a successful deployment and not being changed in the current deployment - though interestingly enough it's nearly always the most recent set of resources from the last successful deployment.

+1 to @ajcann and thanks @bacongobbler
I am in the exact same situation.
Our CI/CD is automated, and deployments to lower environments are often done by a Slack bot.
When it fails, I have to manually do helm rollback and deploy again.
The issue is not consistent at all, but it is frequent.
For me, it happens only during the second deployment of the chart/resource so far.
The resource always exists.

we observe the same problem. It happens if you have a template, which is either:

  • in a {{if $condition -}} statement
  • or in a {{ range $index, $value := $array-}}

@jkroepke just pointed out to me that PR #5143 provides a good workaround for this. When the --atomic flag is released in the next minor version, you should be able to use it to automatically purge or rollback when there is an error.

@bacongobbler given you have been involved with most of the back and forth on this one, is there something else that can be done to fully fix this, or would the --atomic flag be sufficient?

I think @distorhead might want to take a look at that one and see if it also resolves his concerns he raised in https://github.com/helm/helm/pull/4871. Other than that, it looks like --atomic should address the concern assuming you always use the --atomic flag.

I don't believe there's been any proposed solutions to address the issue when you get into this particular state, but I could be wrong. If the mitigation strategy for this issue is

  • manually go through the cluster's live state and fix it as per the workaround
  • upgrade to Helm 2.13.0 and use helm upgrade --atomic going forward

Then I think this is safe to close.
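For the second point, the usage would simply be (release and chart names are placeholders):

helm upgrade --install --atomic my-release ./my-chart   # --atomic rolls the release back automatically if the upgrade fails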

Hopefully Helm 2.13.0 is not so far away.
This bug breaks CI if an error occurred somewhere on a release.

Atomic will not resolve the issue

Example chart: https://github.com/distorhead/ex-helm-upgrade-failure

  1. Check out master, run deploy.
git clone https://github.com/distorhead/ex-helm-upgrade-failure
cd ex-helm-upgrade-failure
helm upgrade --atomic --install --namespace myns myrelease .

Chart contains 2 deployments -- myserver1 and myserver2:

Release "myrelease" does not exist. Installing it now.
NAME:   myrelease
LAST DEPLOYED: Tue Feb  5 23:48:57 2019
NAMESPACE: myns
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/Deployment
NAME       READY  UP-TO-DATE  AVAILABLE  AGE
myserver1  1/1    1           1          5s
myserver2  1/1    1           1          5s
  2. Make a breaking change: delete deployment myserver1 from the chart and modify deployment myserver2 with a user error (delete the image field, for example):
git checkout break-atomic
git diff master
diff --git a/templates/deploy.yaml b/templates/deploy.yaml
index 198516e..64be153 100644
--- a/templates/deploy.yaml
+++ b/templates/deploy.yaml
@@ -1,21 +1,5 @@
 apiVersion: apps/v1beta1
 kind: Deployment
-metadata:
-  name: myserver1
-spec:
-  replicas: 1
-  template:
-    metadata:
-      labels:
-        service: myserver1
-    spec:
-      containers:
-      - name: main
-        command: ["/bin/bash", "-c", "while true ; do date ; sleep 1 ; done"]
-        image: ubuntu:16.04
----
-apiVersion: apps/v1beta1
-kind: Deployment
 metadata:
   name: myserver2
 spec:
@@ -28,4 +12,3 @@ spec:
       containers:
       - name: main
         command: ["/bin/bash", "-c", "while true ; do date ; sleep 1 ; done"]
-        image: ubuntu:16.04
  3. Run deploy:
git checkout break-atomic
helm upgrade --atomic --install --namespace myns myrelease .

Say hello to our friend again:

UPGRADE FAILED
ROLLING BACK
Error: Deployment.apps "myserver2" is invalid: spec.template.spec.containers[0].image: Required value
Error: no Deployment with the name "myserver1" found

@bacongobbler @thomastaylor312 @jkroepke

@distorhead what was your expected behavior for this scenario?

Slightly off-topic regarding rollbacks, but anyway.

For those people who want to use rollback but do not want the rollback to occur immediately after deploy, as it does with --atomic - for example because there is no way for the user to manually inspect the bad cluster state after a failure, and because the --wait flag does not cause helm to log any info about failures in the resources being deployed - there is another way: roll back on the next run, before the upgrade (more info: https://github.com/helm/helm/issues/3149#issuecomment-462271103).
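A rough CI-side sketch of that "rollback on the next run, before upgrade" approach, assuming Helm 2 and placeholder names:

RELEASE=my-release
# if the latest revision is FAILED, roll back to the last revision that reached DEPLOYED
if helm history "$RELEASE" | tail -1 | grep -q FAILED; then
  LAST_DEPLOYED=$(helm history "$RELEASE" | grep DEPLOYED | tail -1 | awk '{print $1}')
  helm rollback "$RELEASE" "$LAST_DEPLOYED"
fi
helm upgrade --install "$RELEASE" ./my-chart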

To re-iterate the issue as well as the workaround:

When an upgrade that installs new resources fails, the release goes into a FAILED state and stops the upgrade process. The next time you call helm upgrade, Helm does a diff against the last DEPLOYED release. In the last DEPLOYED release, this object did not exist, so it tries to create the new resource, but fails because it already exists. The error message is completely misleading as @arturictus points out.

This can easily be worked around by manually intervening and deleting the resource that the error reports as "not found". Following the example I demonstrated in #4223 (comment):

><> helm fetch --untar https://github.com/helm/helm/files/2103643/foo-0.1.0.tar.gz
><> helm install ./foo/
...
><> vim foo/templates/service.yaml                    # change the service name from "foo" to "foo-bar"
><> kubectl create -f foo/templates/service.yaml      # create the service
service "foo-bar" created
><> helm upgrade $(helm last) ./foo/
Error: UPGRADE FAILED: no Service with the name "foo-bar" found
><> kubectl delete svc foo-bar
service "foo-bar" deleted
><> helm upgrade $(helm last) ./foo/
Release "riotous-echidna" has been upgraded. Happy Helming!
...
><> kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
foo-bar      ClusterIP   10.104.143.52   <none>        80/TCP    3s
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   1h

In other words, deleting resources created during the FAILED release works around the issue.

Thanks for putting this workaround together @bacongobbler - it's essentially the process we arrived at as well. One painful issue here is that during complex upgrades many new resources - at times a few dependency levels deep - may find themselves in this state. I haven't yet found a way to fully enumerate these resources automatically, which leads to situations where one needs to repeatedly fail an upgrade to "search" for all the relevant resources. For example, recently a newly added dependency itself had a dependency on a postgresql chart. In order to resolve this issue it was necessary to delete a secret, configmap, service, deployment and pvc - each found the long way 'round.

You could write a plugin similar to helm diff that would enumerate the templates created in the last release. You could even consume pkg/kube directly from Helm. client.Update has some business logic written for helm's resource tracking/deletion which could be reused by fetching the two releases from Tiller and reversing the comparison order. target.Difference(original) should give you a result of all the resources that were introduced since the previous release.
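Short of writing a plugin, the same comparison can be approximated from the CLI by diffing the rendered manifests of the two revisions (release name and revision numbers are placeholders):

helm get manifest my-release --revision 380 > /tmp/deployed.yaml   # last DEPLOYED revision
helm get manifest my-release --revision 381 > /tmp/failed.yaml     # the FAILED revision
diff /tmp/deployed.yaml /tmp/failed.yaml                           # resources only present in failed.yaml are the ones Helm lost track of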

@bacongobbler What solution would you recommend to take an application that's deployed as part of release A (for example a larger release made up of several applications) and break it out of release A into its own release (or vice versa) without incurring any downtime (the workaround to delete resources would cause some downtime)... Trying to update a resource via a different release results in the error that's described by this Github issue.

It sounds like the new chart gets installed and replaces the old one even before a successful deploy. The same thing happens with a failing upgrade --install. It should not install if the chart is wrong.
Just roll back to the previous state on error, or only update tiller's record of the charts on success.

This is a process I use to recover from this problem (so far it has worked every time without any incident... but be careful anyway):

  1. Run helm list and find out latest revision for affected chart

    NAME        REVISION UPDATED                  STATUS  CHART              NAMESPACE
    fetlife-web 381      Thu Mar 15 19:46:00 2018 FAILED  fetlife-web-0.1.0  default
    
  2. Go from there and find latest revision with DEPLOYED state
    kubectl -n kube-system edit cm fetlife-web.v381
    kubectl -n kube-system edit cm fetlife-web.v380
    kubectl -n kube-system edit cm fetlife-web.v379
    kubectl -n kube-system edit cm fetlife-web.v378
  3. Once you find last DEPLOYED revision, change its state from DEPLOYED to SUPERSEDED and save the file
  4. Try to do helm upgrade again, if it's successful then you are done!
  5. If you encounter upgrade error like this:
    Error: UPGRADE FAILED: "fetlife-web" has no deployed releases
    then edit the status for very last revision from FAILED to DEPLOYED
    kubectl -n kube-system edit cm fetlife-web.v381
  6. Try to do helm upgrade again, if it fails again just flip the table...
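For reference, the status flips in steps 3 and 5 can also be done non-interactively by patching the STATUS label on Tiller's release ConfigMaps, as in the az-test-2-prom example earlier in the thread (v380 stands for whichever revision turns out to be the last DEPLOYED one):

kubectl -n kube-system patch cm fetlife-web.v380 -p '{"metadata":{"labels":{"STATUS":"SUPERSEDED"}}}'
kubectl -n kube-system patch cm fetlife-web.v381 -p '{"metadata":{"labels":{"STATUS":"DEPLOYED"}}}'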

@bacongobbler @michelleN
Is there anything that makes it hard to improve the error message for this issue?

I believe the error message should state that "there is a conflict because the resource wasn't created by helm and manual intervention is required" rather than "not found". Just this small change to the error would improve the user experience by a good margin.

@selslack I would be very much in favor of improving the error message 👍

@michelleN I've prepared a PR to change the error text: #5460.

I'm experiencing this issue and I'm in a situation where I'm not sure how to resolve it.

I tried all the steps listed by @reneklacan here: https://github.com/helm/helm/issues/1193#issuecomment-470208910

Unfortunately that didn't work. The only thing that resolves the issue is to delete the resource generating the error message, then helm upgrade again, which will be successful.

However, the next helm upgrade will fail with the same error, and I have to delete the resource again and reupgrade... this isn't sustainable or good.

I have two environments I use helm to deploy to as part of our CI process: a QA and a production environment.

The QA environment had the same issue so I used helm delete --purge, and then helm upgrade - that resolved the issue permanently.

However I can't do this for the production environment - I can't just wipe it out and re-upgrade, so currently I'm stuck deleting the resource before each deploy. I'm just lucky it's not an important resource.

@zacharyw what error are you facing at the moment? No resource with the name ... or "fetlife-web" has no deployed releases?

Can you share any additional info that would help with debugging this?

Maybe output of kubectl -n kube-system describe cm -l NAME=YOUR_RELEASE_NAME | grep -A1 STATUS= (replace YOUR_RELEASE_NAME)

Feel free to send me an email with more info if you don't want to spam this issue with potentially unrelated data (rene (at) klacan (dot) sk).

Please see https://github.com/helm/helm/issues/1193#issuecomment-419555433 for a possible diagnosis and workaround, @zacharyw.

@reneklacan It's the no resource with the name ... error. In our case we added an ingress, it seemingly worked, but then subsequent upgrades started failing with this error... even though the ingress already existed.

The status of my most recent release (after deleting the offending ingress and allowing helm upgrade to recreate it) is DEPLOYED:

STATUS=DEPLOYED
VERSION=197

However, if I were to try to upgrade again, it would fail.

@bacongobbler Unless I'm misunderstanding I think I already am doing the workaround in that comment: I delete the resource and let it get recreated... the issue is I have to do this every time.

@reneklacan in https://github.com/helm/helm/issues/1193#issuecomment-470208910 saved my life.

It's a disappointment that Helm fails this way. Deleting things in pretty much any environment is far from ideal.

It would be great if helm updated its own database when this kind of error appears, and then retried.

I believe that with the --cleanup-on-fail flag enabled, this error case should go away. Closing as resolved via #4871 and #5143.

If there are further issues arising without those flags, please re-open a new issue. Thanks!

The issue is closed, but I thought I'd add a comment about how to deal with the issue without having to delete the helm release or the running deployments.

So, I reproduced the issue with the following steps:

  1. Install chart test-chart-failure with a service template.
  2. Add a subchart with a service template that has a string (e.g. "test") in the service's port
  3. Upgrade the chart. It will fail with error Service in version "v1" cannot be handled as a Service: v1.Service.Spec: v1.ServiceSpec.Ports: []v1.ServicePort: v1.ServicePort.Port: readUint32: unexpected character: ...

I was able to upgrade after correcting the port to a number, without running helm delete, by applying the suggestion at http://centosquestions.com/helm-how-to-delete-bad-deployment:

  1. Found the failed revision with helm history test-chart-failure
  2. Deleted the config map of the specific revision with kubectl delete cm -n kube-system test-chart-failure.v2
  3. Executed helm upgrade with the corrected chart
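Put together as commands (the release name and revision come from the steps above; the chart path is an assumption):

helm history test-chart-failure                          # locate the FAILED revision (here: 2)
kubectl -n kube-system delete cm test-chart-failure.v2   # remove Tiller's record of that revision
helm upgrade test-chart-failure ./test-chart-failure     # retry with the corrected chart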
