Helm: `helm upgrade --install` doesn't perform an install/upgrade if the first ever install fails

Created on 17 Jan 2018  ·  33 Comments  ·  Source: helm/helm

Using helm upgrade --install is a nice way to install or upgrade depending on whether the release exists. But it looks like there's a bug in the logic; it's not handling failed installs. In my case the first install failed; then a subsequent attempt wasn't even made, as it crashes out immediately.

Maybe if the last release failed then helm upgrade --install should delete it and install again?

$ helm list
NAME            REVISION    UPDATED                     STATUS      CHART                           NAMESPACE
foo             2           Wed Jan 17 11:48:08 2018    FAILED  something-0.0.1                         default

$ helm upgrade "foo" . --install 
Error: UPGRADE FAILED: "foo" has no deployed releases
question/support

Most helpful comment

The suggested fix seems completely untenable in an automated system. I definitely don't want everything invoking helm to have to know about "if first release fails, delete and retry". For one, most of my tooling isn't aware if it's an install or upgrade, or if it's the first time or 100th time, it's almost always just running helm upgrade --install.

All 33 comments

This was intentional, per the design of https://github.com/kubernetes/helm/pull/3097. Basically, diffing against a failed deployment caused undesirable behaviour, most notably this long list of bugs:

If your initial release ends up in a failed state, we recommend purging the release via helm delete --purge foo and trying again. After a successful initial release, any subsequent failed releases will be ignored, and helm will do a diff against the last known successful release.

Now that being said, it might be valuable to not perform a diff when no successful releases have been deployed. The experience would be the same as if the user ran helm install for the very first time in the sense that there would be no "current" release to diff against. I'd be a little concerned about certain edge cases though. @adamreese do you have any opinions on this one?

The suggested fix seems completely untenable in an automated system. I definitely don't want everything invoking helm to have to know about "if first release fails, delete and retry". For one, most of my tooling isn't aware if it's an install or upgrade, or if it's the first time or 100th time, it's almost always just running helm upgrade --install.

I'd also like to call out that I commented on the original PR https://github.com/kubernetes/helm/pull/3097#discussion_r151808599 asking specifically about this case.

The old behavior was better for this case.
I agree with @chancez. This makes upgrade --install non-idempotent for a common occurrence.

@bacongobbler
If we're worried about releases failing and leaving shrapnel due to failed hooks, I'd say that's a design issue with the chart. (Hooks work better when they are idempotent.)
Users are free to build error handling and non-idempotent behavior around helm.

What other edge-cases are we concerned about?
Seems #3097 takes care of a lot 👍

My local development would go much smoother if I could make helm upgrade -i idempotent even against FAILED releases, at least for some combination of arguments. My use case is a script of many releases that I want brought up to start a local development environment.

This might be analogous to the --replace flag for helm install. Note that --replace is one of only two flags from helm install that is missing in helm upgrade, the other being --name-template.

To be absolutely clear, yes this would be a good thing to fix. Anyone wanna take a crack at it while we've got our hands full with other work?

Hi,
I've created a PR https://github.com/kubernetes/helm/pull/3437 that should fix this issue

I am not sure why we need the install and upgrade commands; I only ever use the upgrade --install command, and it seems like a lot of people do the same. I just need one command that does upgrade --install and doesn't trip over a failed run. Can we just rename upgrade --install to deploy, make it truly idempotent, and ditch the other two?

(I'm struggling with a new variant of this problem in 2.8.0. Since upgrading from 2.7.2, if I have a failed install, then delete --purge it, and then upgrade --install it, I can still get the Error: UPGRADE FAILED: "xyz" has no deployed releases error. Seems like --purge isn't fully effective in 2.8.0 and tiller has some stuck state not showing in list --all. I then have to do an install to get tiller back to a state where I can do the usual upgrade --install again.)

I agree with @whereisaaron; it would be nice to have a deploy command that worked more like kubectl apply. It would make automation of Helm much easier too, since you wouldn't have to check for existing releases in some shell-script madness :)

Perhaps the solution is to have helm automatically run helm delete --purge?
Something like:
1) User executes helm upgrade --install
2) First release fails
3) User makes some changes to chart and executes again helm upgrade --install
4) Helm tries to run the command
5) It fails and there is precisely one prior release in failed state
6) Helm silently executes helm delete --purge
7) After purge, Helm auto-retries helm upgrade --install and shows output from that

Perhaps this behavior could be triggered via the --force flag which already has similar behavior for other scenarios
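
For anyone needing something like this today, the flow can be approximated outside helm with a small wrapper. A rough sketch only, assuming Helm 2 and using a placeholder release name and chart path:

    if ! helm upgrade --install "$release" ./chart; then
        # Only purge when the release's entire history is a single FAILED revision,
        # i.e. the very first install is what failed.
        history=$(helm history "$release" 2>/dev/null | tail -n +2)
        if [ "$(echo "$history" | wc -l | tr -d ' ')" -eq 1 ] && echo "$history" | grep -q FAILED; then
            helm delete --purge "$release"
            helm upgrade --install "$release" ./chart
        fi
    fi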

Good idea, but I don't think we should ever delete the release ledger without the user explicitly asking to remove that data. Operators of Helm will want to learn why the service failed to upgrade from previously failed releases, or deduce failures by collecting that data from the ledger.

I provided a comment earlier in the thread that describes a solution to the issue. It's similar to your solution in execution, but without the need to delete the entire release ledger. I believe #3437 is attempting to apply that solution as a patch.

@rchernobelskiy happens to me as well. Exactly as you describe.

I run into this issue maybe once per day when deploying new apps.
It's a pain!

@gmanolache We're still on helm 2.7.0 for this reason.
It's unclear to me whether upgrading to use the --force flag is safe: comment

If you need to downgrade, here's a good way to do it: downgrade to 2.7.0

What is this useful sounding 'helm ledger' diagnostic info and how do we get to it? :smile:

I'm worried the below might be read as moody; it is genuinely just an invitation for pointers on how we can get diagnostic info when you have a failed deploy. Because it really sounds like I'm missing something. It sounds like the failed state is supposed to have some utility for operators? I trawled through the helm manual site again; will something like 'helm get manifest' work in a failed state to extract useful diagnostic info?

My user experience when I get a failed deployment is that you get no useful info. Helm disowns all the partially created/remaining resources, such that 'helm status' doesn't show anything. All you can do is 'rollback' or 'delete --purge' (you can't just 'delete' or your CI 'upgrade --install' will keep failing). The failed state only seems to serve to break the idempotency of 'upgrade --install' that we all crave for our CI deployments.

Would it be reasonable to have an '--auto-rollback' option for CI situations, e.g. 'upgrade --install --auto-rollback'? I'd usually rather roll back than have to get out of bed to deal with a failed state 😆 😴 💤

What is this useful sounding 'helm ledger' diagnostic info and how do we get to it? 😄

helm help history

Thanks @bacongobbler. Ok, I understand that the history list is what is meant by the ledger. And if you still have the ledger, you can use helm get manifest --revision 123 to see what was deployed that failed? That is certainly useful to preserve. And if we rollback we don't lose that information.

History prints historical revisions for a given release.

A default maximum of 256 revisions will be returned. Setting '--max'
configures the maximum length of the revision list returned.

The historical release set is printed as a formatted table, e.g:

    $ helm history angry-bird --max=4
    REVISION   UPDATED                      STATUS           CHART        DESCRIPTION
    1           Mon Oct 3 10:15:13 2016     SUPERSEDED      alpine-0.1.0  Initial install
    2           Mon Oct 3 10:15:13 2016     SUPERSEDED      alpine-0.1.0  Upgraded successfully
    3           Mon Oct 3 10:15:13 2016     SUPERSEDED      alpine-0.1.0  Rolled back to 2
    4           Mon Oct 3 10:15:13 2016     DEPLOYED        alpine-0.1.0  Upgraded successfully

If we had helm upgrade --install --auto-rollback then both the failed deployment and the rollback would be recorded in the ledger and available to operators. And that would go a long way to preventing CI deployments getting into the intractable 'failed' state where 'helm upgrade --install' stops working. Failed CI deployments are usually developers injecting typos/mistakes into the deployment system. With '--auto-rollback' they can inspect the helm error message retained in the deployment server log, fix the problem, and deploy corrected values.

I guess even without the '--auto-rollback' option we could use a wrapper to automatically run helm rollback any time helm upgrade --install returns a 'FAILED' error. And maybe detect when it is the initial install, and helm delete --purge instead in those cases.

That is, we could fashion a wrapper script to ensure the result of a CI 'helm upgrade --install' is always a state where the next CI 'helm upgrade --install' will be possible, whilst retaining the ledger information for any failed attempts (at least for releases whose initial install worked).

helm deploy =

  • helm upgrade --install
  • if FAIL then
    • if revision=1 then helm delete --purge
    • else helm rollback
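
A rough bash sketch of that wrapper, assuming Helm 2's CLI and that helm history marks only the last successful revision as DEPLOYED (the release name and chart path are just positional placeholders):

    #!/usr/bin/env bash
    # Hypothetical "helm deploy" wrapper: upgrade --install, and on failure either
    # purge (first revision) or roll back to the last good revision (later revisions).
    set -u
    release="$1"
    chart="$2"

    if helm upgrade --install --wait "$release" "$chart"; then
        exit 0
    fi

    count=$(helm history "$release" 2>/dev/null | tail -n +2 | wc -l | tr -d ' ')
    if [ "$count" -le 1 ]; then
        # The very first install failed: purge so the next attempt can install cleanly.
        helm delete --purge "$release"
    else
        # A later upgrade failed: roll back to the last revision with status DEPLOYED.
        last_good=$(helm history "$release" | awk '/DEPLOYED/ {rev=$1} END {print rev}')
        helm rollback "$release" "$last_good"
    fi
    exit 1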

@whereisaaron that would be elegant 👍

Is there an easy way to get the latest working release other than something like helm history ${name} | tail -2 | head -1 | awk '{print $1}', to be used by helm rollback?
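
One alternative might be to key off the status column instead of the position, assuming Helm 2 marks only the last successful revision as DEPLOYED in helm history:

    helm history "$name" | awk '/DEPLOYED/ {rev=$1} END {print rev}'

The printed revision could then be passed straight to helm rollback "$name".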

Hello there,

I'm using Helm 2.12.2 and still see the issue that helm fails when the first deployment failed. Is this a regression, maybe?

I'm not sure it's a regression; I think it was never actually "fixed".

@RickS-C137 I think this is supposed to be fixed by using helm upgrade --install --force which will 'delete' then 'install --replace' a failed release.

Still trying to work around this issue in a Jenkins pipeline I am using.
I am trying to deploy a new image of my application and I couldn't care less if the deployment already exists or not.
I want to run one command that either replaces the current deployment or just installs it if it does not exist.
When I try helm install --replace I often get Error: a released named xyz is in use, cannot re-use a name that is still in use, which obviously kills my pipeline and the build fails.

@bacongobbler What do you think about https://github.com/helm/helm/issues/3353#issuecomment-385222233?

I do not see how there would be downtime or data loss if we destroy and recreate the initial release if the initial release fails.

I implemented this in our build:

# Purge only when the most recent (and only) revision is revision 1 in FAILED state,
# i.e. the first install failed
if helm history --max 1 "$release" 2>/dev/null | grep FAILED | cut -f1 | grep -q 1; then
    helm delete --purge "$release"
fi

helm upgrade --install --wait "$release" chart/

With helm currently, you don't know which helm command+options combination to use without inspecting the current state. And for a given helm command you don't know what you are going to get, because it depends on what the current state is. That's not really the declarative desired state dream ☁️ 💤 😄

In helm 3 we can potentially deprecate install / upgrade / --replace / --upgrade / --force and replace them all with an idempotent helm deploy that either achieves the desired state, or leaves the state unchanged. Maybe using an algorithm similar to the above, which, if helm deploy fails, rolls back (revision > 1) or deletes+purges (revision = 1), to leave the state as it was before. The failed manifest would still be available via helm history/get. And there could even be a '--no-rollback' option for people who want to preserve the deployment in a failed state for investigation.

The option of helm upgrade --install --force is getting close, except that rather than rolling back and upgrading, it deletes and replaces failed releases (even for revisions >1), which makes some people angry over on #3208... 😮 ⚡️ 💥

For right now we can use wrapper scripts or meta-tools like helmsman whose feature list is in part to employ helm but mitigate this issue:

  • Idempotency: As long as your desired state file does not change, you can execute Helmsman several times and get the same result. _[...regardless of the current state]_
  • Continue from failures: In the case of partial deployment due to a specific chart deployment failure, fix your helm chart and execute Helmsman again without needing to rollback the partial successes first.

replace them all with an idempotent helm deploy that either achieves the desired state, or leaves the state unchanged

In retrospect, this is a breathtakingly obvious design goal.

Hi,
In our case the initial release did not really fail... It's just that either our application was not completely up when the install timeout elapsed, or there was some other strange issue that has since been fixed. In any case, the application is running perfectly fine, and so having to delete it would be a problem for us (we have some persistent storage attached that would also be removed!!).

Is there any workaround to deploy a chart when the initial release 'apparently failed' but it's actually ok?

So is the conclusion that upgrade --force is too forceful, i.e. there are times when delete+replace+retry is not the correct remedy for a failed upgrade?

Is there a separate issue tracking the idea of merging install & upgrade into a deploy command?

Not that I know of @dcow. What is the use case over the helm upgrade --install command?

https://github.com/helm/helm/issues/3353#issuecomment-362497951

I am not sure why we need the install and upgrade commands; I only ever use the upgrade --install command, and it seems like a lot of people do the same. I just need one command that does upgrade --install and doesn't trip over a failed run. Can we just rename upgrade --install to deploy, make it truly idempotent, and ditch the other two?
...

and

https://github.com/helm/helm/issues/3353#issuecomment-469109854

With helm currently, you don't know which helm command+options combination to use without inspecting the current state. And for a given helm command you don't know what you are going to get, because it depends on what the current state is. That's not really the declarative desired state dream ☁️ 💤 😄

In helm 3 we can potentially deprecate install / upgrade / --replace / --upgrade / --force and replace them all with an idempotent helm deploy that either achieves the desired state, or leaves the state unchanged.
...

I generally agree that helm should work like kubectl apply and attempt to achieve the desired state rather than needing to run different types of commands depending on the state of your cluster. I was hoping to add my support to a dedicated issue if one existed, or at least figure out what the resolution was, since deploy is not currently implemented and we're on helm 3.2.

@dcow Ok, do you want to create an issue then with your proposal?
