Packer: --no-destroy-on-error like Vagrant

Created on 10 Sep 2013  ·  86 comments  ·  Source: hashicorp/packer

It would appear that an error exit code from postinstall.sh is enough to totally wipe out the generated boxes.

It would be useful to keep them around to manually manipulate while working on them. The -debug switch can be used for this, but it's not really ideal since you basically have to know the appropriate step (stepCreateVM) to wait at.

See also: https://github.com/mitchellh/vagrant/issues/2011

Labels: +1, core, debug-mode, enhancement


This sounds reasonable. I think the -debug flag is indeed the right approach here, but maybe the -debug flag should allow options such as:

  • Step through every step
  • Continue until an error
  • Continue until cleanup steps begin
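A minimal sketch, in Go since Packer is written in Go, of how the three proposed -debug behaviours might be modelled as a single decision. The `DebugMode` type, its constants, and `shouldPause` are hypothetical names for illustration, not part of Packer's code.

```go
package main

import "fmt"

// DebugMode is a hypothetical enumeration of the -debug behaviours
// proposed above; it is not a real Packer type.
type DebugMode int

const (
	StepThrough          DebugMode = iota // pause before every step
	ContinueUntilError                    // pause only when a step fails
	ContinueUntilCleanup                  // pause once, just before cleanup begins
)

// shouldPause reports whether the runner should wait for the user.
// failed is true right after a step has failed; cleanupStarting is true
// when the runner is about to start running cleanup steps.
func shouldPause(mode DebugMode, failed, cleanupStarting bool) bool {
	switch mode {
	case StepThrough:
		return true
	case ContinueUntilError:
		return failed
	case ContinueUntilCleanup:
		return cleanupStarting
	default:
		return false
	}
}

func main() {
	fmt.Println(shouldPause(ContinueUntilError, false, false))  // false: step succeeded
	fmt.Println(shouldPause(ContinueUntilError, true, false))   // true: step failed
	fmt.Println(shouldPause(ContinueUntilCleanup, false, true)) // true: cleanup is next
}
```

Keeping the choice in one function means a runner only needs to call it at two points: after each step and before cleanup.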

I would find the option to continue until an error, and not destroy the VM, extremely useful.

If someone can give me some pointers on where to start looking to implement this, I may be able to put some time into adding it as an option.

This would be very useful for me, too.

@timmow you may need to modify each builder's instance creation cleanup step to do nothing if a certain flag is set (for example https://github.com/mitchellh/packer/blob/master/builder/amazon/common/step_run_source_instance.go#L122)

It would be a certain amount of work to comb through all of the steps and figure out where it would be appropriate to take no action.

An idea I just had would be to add a flag that waits for user input before processing any cleanup step. That way you could perform your debugging, hit Enter for example, and packer would take care of the cleanup.

Feel free to ping me here if I can offer any help.

FYI, that's done in a FILO manner here: https://github.com/mitchellh/multistep/blob/master/basic_runner.go#L71

You may need to extend the basic runner (a debuggable_runner?).

It'd be great to add some sort of step "skipping" functionality lower down, which would basically skip cleanup steps for this --no-destroy-on-error type configuration. It would also enable some cool stuff in the -debug mode, like pressing s to skip, interactively.

Similar to debug "pausing", I think an option like -pause-on-error would be beneficial.

Hi, I see that this issue is fixed by this commit https://github.com/mitchellh/packer/commit/2ed7c3f65cc2e0a14d39d8934ef1168f8192bb08 but I don't see the change in HEAD of master branch. Where and why did it disappear?

I really need this as well.

Is there any hope to have this feature? What needs to be done to have 2ed7c3f or some variation of it merged?

Yeah, I could also use this option. I see it was committed but then disappeared.

Is there any update on this?

I would really love this too. I can't tell you how much time I've wasted trying to debug problems and have to go through a lengthy VM creation process to get to the error again and again. Being able to keep the VM around would be a huge win.

Is there an ETA when this (or similar functionality) will be merged into main? Trying to use Packer to build a VM with Visual Studio installed as part of the base Vagrant box, and I really need it to not destroy the VM before I've had a chance to look at why the steps are failing. Having to acknowledge each step via --debug is not acceptable.

Another vote for this one, as the -debug option suppresses the failure I'm trying to analyze.

Blowing so much time trying to debug the final state of the machine before it fails. The -debug switch doesn't cut it: I want it to run through the normal process, then leave the working folder intact after a failure so I can diagnose with logs and state. Really looking forward to some sort of preserve-working-state switch.

Another +1 for this feature, it would be immensely helpful.

+1. Running into similar issues where it would be nice to debug the final state, tweak some provisioning scripts, and then run the build again to see if that fixed the process, rather than manually hitting Enter on every debug step.

Another +1 for this feature. It would be nice to know what happened to this; no one from the team has answered. Go ahead, step up to the plate, it doesn't hurt. LOL! I am totally new to Packer. I was at the tail end of a 1.5-hour ISO build when this happened. Testing and debugging should be paramount to bringing a totally sweet application full stream.

+1 here as well, we create our images headless, so having --debug require manual stepping through is no good to us, but being able to inspect the faulty image would be great.

:+1: I'd like to have this feature too.

+1 This feature would be great!

Related to or maybe duplicated by #1687

+1. Just being able to leave the VM as it is, without deleting it on error, would be very useful. Our install scripts are quite long, which means lots and lots of stepping with the current system. :(

+1 this one will be very useful

+1 to help debug provisioning when failing only with Packer

+1 I'm in the same boat. I've spent untold hours recreating windows VMs, only to have a Chef error in the provisioning step, and no way to debug the VM when it is deleted. Just please let there be an option to not delete everything during a failure.

After seeing this issue has been alive for two years, I don't have any hope this will get fixed. I'm really trying to like Packer, but I end up spending more time waiting for the build step than actually using the results.

Pleeeeeaseeeeeee +1

+1

+1

+1

Wondered what the argument was for this, found this issue via Google. Got sad when feature didn't exist.

Hi devs, I revisited this thread, and it's safe to say that while I've managed to continue using packer effectively, this one bug seriously slows the development of our systems. We can make do, but it would sincerely be nice if a staffer could provide some guidance on this issue, @mitchellh. I may even have time to contribute a solution if I can be pointed in the right direction, but I'll wait for a response from you or, hopefully, someone on your team. Thanks for the amazing tool, though. I definitely want you and your team to know how awesome I think this product is.

Since I got tired of all the +1 e-mail notifications for a feature that I also wanted ;-), I started digging into the codebase and added an initial implementation. NOTE: this is not tested yet, and I don't even know if it'll work properly. If you try to build it from source, be aware that I ran into an interesting issue with Packer referencing itself from GitHub, which will cause this code to not build properly. You'll need to temporarily link the packer source folder in your GOPATH to the folder you download this repo to (or wait for me to test it and submit a pull request).

https://github.com/jcoutch/packer

The fact that this isn't the default behavior, if you'll pardon my French, is completely fucking insane. Made a typo in your install script? Well, let's just conveniently _destroy all of your work and never give you back that hour of your life. Over. And over. Again._

I imagine that _literally every single person_ who uses this tool for anything beyond the simplest examples is running into this issue _every single time they use_ it. Clearly, there is a massive demand for this feature, and yet it is still not implemented, 2 years later.

Absolutely staggering.

+1

Man, that was a snarky comment. Bad mood yesterday, but all this time wasting is starting to cost a lot of time and money.

@jcoutch - do you have a build you can share?

I have an OSX build on my machine, just haven't had a chance to test if it works yet. Working on this in my spare time...which I haven't had much to spare lately. Not to mention, this is my first experience with Go (quite an interesting language). I'll try testing it out by the end of this week, and if everything looks good, I'll submit a pull request. I'll also try to post OSX and Windows builds for others to test once I know it's stable.

(In reply to Rich Jones's comment above, Wed, Sep 23, 2015, 5:14 PM.)

Pleeeease!! :-D

I'm trying to run it with Ansible but, it doesn't work and the KVM guest is gone after the error, so, it is not possible to go there to see what is wrong...

Cheers!

Much needed. Thanks.

Here is @jcoutch patch with proper line endings for easier review: https://github.com/orivej/packer/commit/23bbd4d8fd2d3971eb40eb9348204e3c6c086cca

This patch prevents deletion only if preprocessors fail; it does not keep artifacts when a builder (with its provisioners) fails.

EDIT: That seems to be the intention, but actually it does nothing, although it could be easily fixed to meet it.

Yeah, I hadn't had a chance to reply back to this thread. I finally tried out my changes with a failing provisioner...it doesn't work like I intended. Looking deeper into the code, it looks like the builder handles the deletion of artifacts on a provisioning failure...instead of the code I modified.

(In reply to Orivej Desh's comment above, Sat, Oct 3, 2015, 9:37 AM.)

Here https://github.com/orivej/multistep/commit/e02bce9811c65138ea2e84c7162cd8769f35858f is a proof of concept that redefines --debug to stop just once, after the first failure. It requires https://github.com/mitchellh/multistep/pull/5 to stop before the first cleanup rather than before the second cleanup. This behaviour was proposed in #1687. (This is not a proof of concept but a solution if redefining --debug as proposed in #1687 is OK.)

+1 to preserving artifacts on a failed build in -debug mode.

I have been running packer for a while with the patch, and never had a reason to start it without -debug. I wonder if I should publish binaries for wider testing.

+1

I've just noticed that the link I posted was a patch for multistep, not for packer. The fix that makes packer pause on error when running with -debug is at https://github.com/orivej/packer/tree/debug-on-error

@orivej which patch should I start with if I want to test your no-destroy behavior patch ? https://github.com/orivej/packer/commit/a713a4698831a8dfcd48484dc4675631779b6840 ?

Yes, there is one commit, orivej@a713a46. It still can be cleanly rebased onto master.
You also need a patch for github.com/mitchellh/multistep from https://github.com/mitchellh/multistep/pull/5, or packer will pause after destroying the last step.

@orivej do you have a patched binary for OSX? Restarting the whole build process due to a small error when building a Gentoo Linux box is incredibly painful (time consuming). Having the possibility to load the box after the failure and find out what's wrong is a must to me.

I added an option to retry the failed step instead of aborting, although even if this succeeds the build overall may fail; and, if I did not err, packer does not reliably process input, and the user may have to respond multiple times.

This change does not depend on patched multistep and lives in branch, commit.

I uploaded binaries here: https://orivej-packer.s3.amazonaws.com/index.html (subtree debug-on-error-2).

Having the possibility to load the box after the failure and find out what's wrong is a must to me.

My patch does not preserve the box that can be loaded, but instead it leaves the current box alive until you manually terminate the build, so that you can SSH into it and perform debugging (when calling packer with -debug option).

Thanks for the feedback, @orivej.

My patch does not preserve the box that can be loaded, but instead it leaves the current box alive until you manually terminate the build, so that you can SSH into it and perform debugging (when calling packer with -debug option).

Noticed the default packer build, with --debug, pauses before the environment gets destroyed giving you the option to debug it as you described. In order to do that I use "headless": false. How different is the process with your patch?

  • It makes packer pause only after a step fails, instead of pausing after every step.
  • It pauses before packer cleans up after the failed step. (Although I do not remember why I needed this, since the most problematic provisioning step does not do any cleanup.)
  • The second edition of the patch allows retrying the failed step. (When provisioning fails, this reruns all the provisioners.)

I just noticed that #2608 made an unfortunate decision to prioritize plugins from older versions of packer over newer built-in plugins, so to use my build of packer (or future releases of packer, unless the authors reconsider this behaviour), you need to remove all binaries whose names start with packer-.

Unreliable input handling is also an artefact of #2608, I'll see if I can fix it.

Unreliable input handling is caused by extra initialization of built-in plugins, in particular by setupStdin() in main.go. Since this call seems to be unable to serve its declared purpose anyway, I was able to disable it without repercussions, and rebuilt my binaries.

Simply being able to exit packer without stopping or destroying the VM on an error would be very useful. This is particularly important in the provisioning components, which usually contain the most custom logic. Being able to SSH into a box and re-run the original script, or try a modified script or recipe, can provide quick and valuable insights into what actually caused the error and what the fix is. A whole packer build is much too time-consuming to require it for even the simplest troubleshooting.

The -debug flag is useful, but it makes the process much too manual. Very often, it is useful to run an unattended build, but have it exit when it encounters an error and leave the system in a state that allows for investigation into the cause and fix.

:+1: Regardless of whether -debug passes or fails, there should be an option to keep the instance running so you can replay scripts, debug on the instance, etc., unless this somehow interferes with capturing the AMI.

+1

+1. I'm surprised this is still open around 2.5 years later, as it would be so useful. This would make troubleshooting my Packer build so much easier.

I was able to work around this on AWS by enabling termination protection on the instance before chef-client starts. It is not an elegant option, but hey, it works. Any other options? :)

+500 - why isn't this feature in yet?

Maybe we, as developers, could try to get our hands dirty instead of complaining?

The feature request couldn't be simpler.

  • Read a new command line option (--no-destroy-on-error)
  • Add a humble if in the right place. Pseudocode:
        unless no_destroy_on_error # add this conditional <<<<<<<<<
          perform_cleanup
        end

I'll give it a shot. And if it works I won't share it (mostly for avoiding hypothetical requests/complaints). Effort is a good thing.

@vemv, I already essentially solved this issue with two commits at https://github.com/orivej/packer/commits/debug-on-error-2.

@orivej That is awesome! I have been planning to add a --pause-on-error, which I think is the best way to go (when a step fails, wait for a keystroke before cleaning up, allowing the user to log in and troubleshoot).

Could you open a PR with your code? We can discuss the details there.
CC @cbednarski

@vemv I've been following this issue for a few years. I can only speak for myself, but I don't really know Go at all, at least any more than to muddle through and figure out what code might do. I wouldn't be comfortable writing code for something as widely-used as Packer, let alone testing it properly.

@orivej and @rickard-von-essen, anything that requires user input doesn't really work for me, as I only use Packer in automated tooling (i.e. Jenkins or TravisCI); I know there are a lot of other people in my position as well. I think what I'd really want is something that (1) perhaps increases the verbosity of the output, and (2) just leaves the source machine (whether it's EC2, VMWare, whatever) running so that a human can inspect it after the job has failed.

Currently debug will pause between steps, requiring you to hit enter to continue, so as long as you know which step you're about to fail on, you can merely 'hold' the VM there for debug purposes but obviously that's not as good. You really want the template to go through every step so you can examine the full failed state.

Just adding my :+1:. I could really use this feature.

@jantman I am going to make packer -debug skip cleanup when the process fails and cannot get input (e.g. with input from /dev/null). Note that packer's run sequence is built around the idea that every step can and will be cleaned up afterwards, so abrupt termination will leave the system in a state that packer may not be able to deal with on its own (e.g. it will complain that the output directory already exists). You should therefore expect to have to figure out how to make your process repeatable, but this is likely easy.

@rickard-von-essen I will update my patch (add new providers) and make a pull request later today.

From @DarwinJS in https://github.com/mitchellh/packer/issues/3445#issue-148713866

I am building Windows boxes on AWS and have the EBS volume "delete_on_termination" set to false so that after a failed build I can [a] attach the volume, [b] boot an instance, [c] look at its logs, [d] shut down the instance, [e] detach the volume, and [f] manually delete the volume.

I noticed the c:\windows\temp<guid>.out files contain the console output of powershell provisioners I run.

Getting this output is the only reason I have to take all these extra steps to get this information.

Would be great if Packer supported something like PACKER_CONSOLE_LOGS_COPY=$env:temp so that those logs could always be brought back (especially the last one that failed) and I could avoid the extra steps.

For those who share my goal of compiling the latest packer dev release while also integrating orivej's earlier fix that pauses on the first failure of packer build, here are the steps that worked for me.

Complete steps 1-4 of "Setting up Go to work on Packer" (see https://github.com/mitchellh/packer/blob/master/CONTRIBUTING.md), then run:

    git checkout master
    git remote add fork https://github.com/orivej/packer
    git fetch fork
    git checkout -b debug-on-error fork/debug-on-error
    git merge debug-on-error
    make
    ./bin/packer build -debug template.json

I can confirm that this worked for me and provisioning only paused when there was an error.

I was not able to successfully merge https://github.com/orivej/packer/tree/debug-on-error-2.

I'm curious, since I'm fairly new to packer, git, and this issue: is there some other way people have been applying orivej's fixes than the one I have described? I may be missing something very obvious, so please clue me in if that is the case.

Just checking on the state of this issue.

Is it that @orivej's changes address this issue and a pull request needs to be made? Or does this still need to be addressed?

+1

It would be really useful; right now I'm using an inline shell with sleep 1800 to keep the VM alive.
Please implement ASAP :)

IMHO -debug is doing what we all need. After every command you need to press Enter to proceed to the next one. No Enter = VM alive :)

@noose - I don't sit and watch the build - there are some very long running sections (like installing SQL server) that I wouldn't want it to hold up on for user input. I would like to kick off a test build and when I come back to it, have something I can debug with minimal effort.

IMHO the -debug is totally useless. I'm running complicated builds, and I really don't have the patience to press Enter a thousand times until I get to the issue.
I really don't get why a no-brainer feature like this is so hard to get implemented.

@henris42 while I agree with you on the uselessness of -debug in this context, if it seems like such a no-brainer, why don't you give a go at a pull request?

@noose, I automate the packer build in a Jenkins job (which pulls the config/scripts and Ansible playbooks from Git). When packer is used this way, an interactive mode is not useful; a post-failure analysis is much more useful.
I think this is a common scenario in the DevOps world :)

Seems like everyone needs this. Building these AMIs is error-prone, and this feature would make them less time-consuming to troubleshoot.

I agree with @worstadmin. In the case of building Vagrant boxes, you can tackle the problem from multiple angles (e.g. keep the virtual machine around, try things with the null provisioner, etc.), whereas Amazon images are a special breed and very tiresome to debug when there is an issue.

Combined with https://github.com/mitchellh/packer/issues/1687 this would be great.

Additionally, it is often helpful to ignore errors from the provisioners and let the build continue, especially during the early stages of developing an image.

Almost 3 years later... and still almost nothing. I've spent the last few days smashing my head on a keyboard trying to do complex Windows builds that arbitrarily and randomly fail to execute PowerShell scripts, with no output, and because of the auto-cleanup I can't jump onto the instance. When I run with -debug enabled, the extra "pauses" introduced by requiring manual entry seem to prevent the problem from occurring. You'd think I could simulate this by adding a ton of sleeps to my PowerShell scripts, but that does not help.

Not even lying, I'll PayPal a $100 bounty to anyone who can seriously make a --no-destroy-on-error feature ASAP and get the ball rolling on a PR for this. I (and, it seems, hundreds of others) need this feature, especially considering that packer is usually used with automation in mind (via CI/CD/etc). So here's my long +1 and plea.

Hey there could be a workaround for a shell provisioner, I have no idea about other provisioners though. :crying_cat_face:

I had it almost working today, but being new to Go, I didn't expect to land in metaprogramming hell again, chasing an interface through several files to find its implementation :(

Check out my current proposal at #3885 that already looks good to me!

@tmartinx:

I'm trying to run it with Ansible but, it doesn't work and the KVM guest is gone after the error, so, it is not possible to go there to see what is wrong...

As a workaround until there's a new packer release which contains #3885:

    {
      "type": "shell",
      "inline": [
...
        "ansible-playbook ... || (echo \"*** FAILED WITH CODE $? *** Hit Ctrl-C to terminate\"; sleep 14400; exit 1)"
        ]
    }

You then have 4 hours to ssh into the still-running VM and poke around.

What the hell is going on here?

  • Packer detected a VMware 'Unknown Error'.
  • _Packer_ told me to check VMware's log file for more information. The log is supposed to be in the output directory.
  • But _Packer itself_ deletes the output directory, so I can't check the log. Haha! Good one, Packer, you rascal you!
  • Shitloads of other people have run into a similar situation, as they obviously would.
  • People have kept requesting a seemingly very simple, no-brainer fix to this problem _for years_ now.
  • A couple of them even decided to try and fix this themselves. It seems their patches have been rejected by HashiCorp, or maybe they were just unsuccessful.
  • Either way, HashiCorp has maintained radio silence. It looks like they're just not going to fix this, ever.

Are we to conclude that the US government has gag-ordered HashiCorp and told them not to fix this, or something?

I'm having a hard time coming up with alternative explanations.

I've had the impression that HashiCorp's tools are a good choice for DevOpsy stuff overall, but now I'm having second thoughts. Seriously. Are we all missing something obvious here, or is HashiCorp just being super shady?

This ticket is closed because the problem has already been fixed.

Add flag -on-error=ask to the command line, and then if there's an error you'll be prompted whether you want to delete the build artefacts or not.

Furthermore, before answering this question, you can ssh into the VM and poke around.

@peterlindstrom234, this has already been implemented. You can use "-on-error=abort" and packer shouldn't perform any cleanup when an error occurs.
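For reference, the -on-error values mentioned above (plus cleanup, which Packer documents as the default) can be modelled as a small policy type. This is an illustrative sketch, not Packer's implementation; `onErrorPolicy`, `parsePolicy`, `shouldCleanup`, and the `ask` callback are hypothetical names, and the real ask prompt may offer more choices than yes/no.

```go
package main

import "fmt"

// onErrorPolicy models the values of Packer's -on-error flag.
type onErrorPolicy string

const (
	policyCleanup onErrorPolicy = "cleanup" // default: destroy build resources
	policyAbort   onErrorPolicy = "abort"   // exit without any cleanup
	policyAsk     onErrorPolicy = "ask"     // prompt the user what to do
)

// parsePolicy validates a raw flag value.
func parsePolicy(s string) (onErrorPolicy, error) {
	switch p := onErrorPolicy(s); p {
	case policyCleanup, policyAbort, policyAsk:
		return p, nil
	}
	return "", fmt.Errorf("invalid -on-error value %q", s)
}

// shouldCleanup decides whether to destroy resources after a failure.
// ask is only called for the "ask" policy and stands in for the prompt.
func shouldCleanup(p onErrorPolicy, ask func() bool) bool {
	switch p {
	case policyAbort:
		return false
	case policyAsk:
		return ask()
	default:
		return true
	}
}

func main() {
	p, err := parsePolicy("abort")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("cleanup after failure:", shouldCleanup(p, nil)) // cleanup after failure: false
}
```

With abort, nothing is torn down, so the VM or instance stays available for SSH, which is what this thread was asking for.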

Alright, my bad. It sure took strangely long though.

@peterlindstrom234 it took long because of the US-gov't gag order
