Restic: Delete Files from Existing Snapshot

Created on 15 Nov 2014  ·  57 Comments  ·  Source: restic/restic

In cases of accidental backup of, e.g., overly large files, I would like to be able to delete specific files or directories (including recursively) from existing snapshots.

user interface feature suggestion

Most helpful comment

So, thank you all for your feedback, we have a clear way forward: implement a command which allows removing files from an existing snapshot. The name of the command will be decided when we get there. We can revisit the other use cases when the need arises.

I don't think we need more discussion here, thanks for participating!

All 57 comments

That'd be really nice.

It would also allow removing sensitive data that got included unwittingly.

This would be a great feature!

Any feedback from the devs on this idea? It would be very nice. For example, I just discovered that a program I build from git checkouts has been creating enormous binaries (almost 100 MB), and these have been getting backed up in my Restic backups unnecessarily. I haven't been using Restic for very long, as I'm still in a testing phase, so it's not a problem to delete the old snapshots in question. But this issue can happen quite easily, and it would be good to have long-term solutions for it, other than forgetting every snapshot.

I suppose it would be possible to write a script to restore every snapshot, delete undesired files, and re-backup the snapshot by setting the date manually, but obviously that would take a very long time. It would be great if Restic could do this natively.
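The restore / delete / re-backup workaround described above could be sketched roughly as follows. This is a minimal sketch, not a tested procedure: the helper names `plan_rebackup` and `delete_unwanted` are hypothetical, the script only builds the restic command lines instead of running them, and it assumes that `restic restore --target` and `restic backup --time` behave as documented in current restic releases. A snapshot re-created this way would also record the scratch directory path rather than the original path.

```python
import fnmatch
import os
import tempfile


def plan_rebackup(repo, snapshot_id, snapshot_time, scratch_dir):
    """Return the restic invocations for one restore / re-backup round trip.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    return [
        # 1. Restore the old snapshot into a scratch directory.
        ["restic", "-r", repo, "restore", snapshot_id, "--target", scratch_dir],
        # 2. Back up the cleaned tree, keeping the original timestamp.
        #    (Unwanted files are deleted from scratch_dir before this step.)
        ["restic", "-r", repo, "backup", "--time", snapshot_time, scratch_dir],
        # 3. Drop the original snapshot and reclaim the space.
        ["restic", "-r", repo, "forget", snapshot_id],
        ["restic", "-r", repo, "prune"],
    ]


def delete_unwanted(root, patterns):
    """Delete files under root whose base name matches any glob pattern."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, p) for p in patterns):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(os.path.relpath(path, root))
    return sorted(removed)
```

Repeating this for every affected snapshot means restoring and re-reading all of the data each time, which is why the approach is slow compared with an operation that only touches snapshot metadata.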

Thanks.

I think there are multiple valid use cases for this. Seems like a really good feature to have. I would probably use it myself at some point.

It probably doesn't really change the implementation effort, but from a UX viewpoint, this might be done with a rather low profile by extending the backup command instead of adding an entirely new command:

restic backup [flags] FILE/DIR/SNAPSHOT [FILE/DIR/SNAPSHOT] ...

So instead of offering a command that modifies snapshots, this would allow making a new backup based on an existing snapshot ID. Deleting a file would be achieved with exclude rules.
All the documentation on restic backup could basically be "reused" (that is, almost nothing would need to be added for this new feature).

@dnnr See https://github.com/restic/restic/issues/1550#issuecomment-358536554

However, I don't follow you here. Removing data from old snapshots is definitely a distinct operation and should have its own command. Something like:

restic purge --snapshots abcd1234 deadbeef --paths /path/to/file1 /path/to/file2

And --snapshots should probably accept an all keyword to operate on all snapshots (or all snapshots with the specified --tag). And the command should probably require confirmation by typing yes.

It would also be good for it to have a --patterns option, which would delete paths matching the given patterns.

purge is one possibility for the command's name. erase might also be a good choice, as well as delete. Whatever is chosen, it should make it clear that the operation permanently deletes data. This is backup software we're talking about, and any dangerous operations should be distinct, explicit, and require confirmation.

Well, I left out the step where you'd delete the source snapshot afterwards (using forget, then maybe prune) , because I thought that was obvious.

In my opinion, doing it like this would keep the command set more orthogonal compared to adding a new command that overlaps with the functionality of existing commands. Right now, there is backup, forget and prune and they all do completely separate things. Adding a purge like you describe it, changes that. My suggestion doesn't.

Since we are proposing single-file operations, it would be nice to be able to rename files as well.

I agree with @alphapapa that there should be a distinct command for this type of operation. It might be purge, that's not a bad name, then again there might be other similar operations in the future, e.g. @alvarolm already suggested being able to rename files.

For that reason I think perhaps adding a rewrite command is the best alternative in this case, and make that command have e.g. --purge and --rename options, assuming the latter is relevant to implement. So the final commands would be e.g. restic -r foo rewrite --purge snap1,snap2 path1 path2 ... and restic -r foo rewrite --rename snap1,snap2 pathFrom pathTo.

That said I'm not entirely sure renaming is something that's reasonable to implement - it goes quite a long way from what a backup program is about. But sure, why not.

I don't think it's wise to have the purge stuff be part of the backup command. In one perspective, you could argue that it's fine - you are doing an operation on your backup. But with that rationale the prune and unlock and forget actions should also be part of the backup command, as they too are about maintaining stuff in your backup. I don't think that makes sense, so I think it should indeed be a separate operation/command, e.g. rewrite or purge.

@dnnr

Well, I left out the step where you'd delete the source snapshot afterwards (using forget, then maybe prune) , because I thought that was obvious.

It's definitely not obvious. It's also better if Restic handles that for the user, rather than the user having to keep track of which snapshot IDs have changed and need to be forgotten--which would be quite a burden if the user were rewriting all snapshots in the repo.

In my opinion, doing it like this would keep the command set more orthogonal compared to adding a new command that overlaps with the functionality of existing commands.

I don't understand what you mean. The opposite is the case. This proposed purge/delete/rewrite command does not overlap with backup at all--it deletes data from existing snapshots. It is orthogonal to existing commands.

Right now, there is backup, forget and prune and they all do completely separate things. Adding a purge like you describe it, changes that. My suggestion doesn't.

Again, no idea what you're thinking here. purge is completely separate from backup, forget, and prune:

  • backup: Creates a new snapshot of given paths.
  • forget: Removes existing snapshots.
  • prune: Garbage-collects unused blobs from forgotten snapshots.
  • purge/rewrite/whatever: Deletes files from existing snapshots.

You are proposing making the backup command operate in two modes, one of which backs up data, and the other of which would delete data.

@rawtaz Yes, rewrite is a good suggestion, because it literally rewrites existing snapshots. I'd suggest a UI like:

restic --repo REPO rewrite --snapshots abcd1234 deadbeef --delete /path/to/file1 "*.unwanted-file-extension-glob"

I recommend against using commas as separators, because it makes constructing command lines in scripts much more complicated.

backup: Creates a new snapshot of given paths.

Well, in a sense, modifying the contents of a snapshot is creating a new snapshot (because it's not the same snapshot as before). Think git commit --amend, which creates a new commit based on an existing commit. The analogy is actually pretty fitting, since this ticket seems to move rapidly towards reinventing Git.
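The amend analogy can be illustrated with a toy content-addressed model. This is only a sketch under the assumption that a snapshot's ID is derived from a hash of its content; the `snapshot_id` helper and the JSON layout are hypothetical and are not restic's actual on-disk format.

```python
import hashlib
import json


def snapshot_id(snapshot):
    """Derive a short ID from the snapshot's content (toy stand-in for a repo hash)."""
    encoded = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:8]


original = {
    "time": "2018-01-01T12:00:00Z",
    "paths": ["/home/user/doc.txt", "/home/user/big.iso"],
}
# "Amending" the snapshot: same metadata, one path removed.
amended = dict(original, paths=["/home/user/doc.txt"])

# Removing a path changes the content, so the derived ID changes too.
print(snapshot_id(original) != snapshot_id(amended))  # True
```

In such a model there is no way to edit a snapshot while keeping its ID, so "modifying" always means writing a new snapshot and dropping the old one.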

You are proposing making the backup command operate in two modes, one of which backs up data, and the other of which would delete data.

I didn't say that. Why would it? There is forget and prune, which are perfectly fine for removing things.

Well, in a sense, modifying the contents of a snapshot is creating a new snapshot (because it's not the same snapshot as before). Think git commit --amend, which creates a new commit based on an existing commit. The analogy is actually pretty fitting, since this ticket seems to move rapidly towards reinventing Git.

You're right. But at the same time, Restic is not git, and it's not designed to require knowledge of content-based addressing to work. Regardless of how it works under the hood, I think that, to users, the command we are proposing should be considered to modify an existing snapshot, not create a new one, therefore it should be a distinct command.

I didn't say that. Why would it?

Well, you said:

from a UX viewpoint, this might be done with a rather low profile by extending the backup command instead of adding an entirely new command

Maybe you should explain in more detail.

There is forget and prune, which are perfectly fine for removing things.

Let's be specific. forget removes snapshots, and prune removes blobs. We're proposing a command to remove files within snapshots. It should be a distinct command.
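The distinction between the three levels (snapshots, files within snapshots, blobs) can be made concrete with a small in-memory model. This is a toy sketch, not restic's data model: the `ToyRepo` class, its method names and the `blob:` naming are all hypothetical.

```python
class ToyRepo:
    """In-memory model: snapshots reference blobs, blobs hold file data."""

    def __init__(self):
        self.blobs = {}      # blob name -> data
        self.snapshots = {}  # snapshot name -> {path: blob name}

    def backup(self, name, files):
        """Create a snapshot; each file's data is stored as a blob keyed by content."""
        tree = {}
        for path, data in files.items():
            blob = "blob:" + data
            self.blobs[blob] = data
            tree[path] = blob
        self.snapshots[name] = tree

    def forget(self, name):
        """Remove a snapshot; its blobs stay until prune runs."""
        del self.snapshots[name]

    def prune(self):
        """Drop blobs that no remaining snapshot references."""
        used = {b for tree in self.snapshots.values() for b in tree.values()}
        removed = sorted(set(self.blobs) - used)
        for blob in removed:
            del self.blobs[blob]
        return removed

    def remove_paths(self, name, new_name, unwanted):
        """Proposed operation: write a copy of a snapshot without some paths."""
        tree = self.snapshots[name]
        self.snapshots[new_name] = {p: b for p, b in tree.items() if p not in unwanted}


repo = ToyRepo()
repo.backup("snap1", {"/doc.txt": "report", "/big.iso": "image"})
repo.remove_paths("snap1", "snap1-clean", {"/big.iso"})
repo.forget("snap1")
print(repo.prune())  # ['blob:image']
```

In this model the new operation only touches snapshot metadata; the space is reclaimed by the existing forget and prune steps.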

I'd like to add my opinion:

I think having a way to modify snapshots in the repo is valuable, judging by how many people have said they would like something like this.

The command should be independent of the backup command, not only for orthogonality reasons (which is quite Go-like), but also out of practical consideration: The backup command is already complex enough so I'd like to separate the other command from it.

I don't like the name purge, because of the similarity to prune. What about change? Then we have restic backup, restic restore and restic change.

For the supported operations of the command, I've seen requests for:

  • Delete files, e.g. --delete
  • Rename files, e.g. --rename

The former is exactly what this issue (originally) is about, but are there really use cases for renaming files?

I think change sounds more like taking something out and putting something in, rather than modifying the contents of something.

Imagine the repo/backup/snapshot is a bucket. Change is more like swapping the bucket itself for something else, or taking something out of it and putting another thing in, rather than picking something in the bucket up, modifying it a bit, and putting it back.

Perhaps a native English speaker knows which is more fitting :) It boils down to linguistics, I think.

Hm, modify then?

modify is definitely better than change. So either rewrite or modify out of what's been proposed so far. Curious what others think :)

If this is only about deleting files, would it make sense to enhance the forget command to work with snapshots and files? Or would this be too complex?

If this new feature is about deleting and renaming (or something else) I'd vote for modify.

Thanks for your input @dimejo 👍

I think that when you're renaming and/or deleting, you are not forgetting (at least not in the former case).

IMHO "rewrite" conveys the meaning the best.

The forget command is also very complex, we won't add anything to that if we can help it ;)

If it's going to be a separate command, calling it modify would be my favorite as well (I'd also like modify-snapshot, even though it is rather long). It's also generic enough to be an appropriate place for all kinds of modifying file operations (renaming, maybe even adding). However, I still think that anything beyond removing files smells strongly of feature creep.

By the way, I feel that restic would benefit from command categories, similar to what Git has with its plumbing commands. Right now, restic -h lists all commands in lexical order, mixing low-level commands (e.g., cat, list, which will never be needed by "normal" users) with the primary high-level commands.

You might also consider update.

+1 for rewrite, it has a nice Orwellian ring to it. :-)

alter
discard
evict
expel
expunge
extrude
oust
...
nuke? 😄

I'd like to propose a new edit command. Based on all the feedback re this issue it appears to me that we might end up with multiple actions to edit one or multiple snapshots.

For the time being it could be just something like:

$ restic edit 40dc1520 remove dir/file

In the future we could implement deletion of one file from multiple snapshots (input list of snapshot ID or date range).

Other commands under the edit context might be

  • rename to rename files and folders
  • move to correct file/dir structures that may have changed

I believe it is important that we allow these actions to be executed on one or multiple snapshots (by ID or possibly a number of dates or a range).

I'm still not sure about how much restic should be able to do with backed up data. I mean, it's meant to back up data to preserve what things looked like at a certain point in time. It's not meant to be a NAS.

I especially don't see the validity in the use case of renaming and removing files. I mean, why would you change files on your local disk and then go fiddle with your backups to keep their file tree in sync with your current data? It doesn't make sense to me. Can you elaborate on that use case?

@rawtaz
My thoughts (almost) exactly.

I'd argue the validity of removing files lies in the scenario where you discover a mistake in your exclude rules after already having made backups with those rules. So removing files basically serves as the retroactive application of exclude rules. It seems that regardless of the controversy in this thread, everybody agrees on that particular use case.

Concerning operations beyond that (i.e., renaming, adding), I share your doubts. It's feature creep and not in the scope of a backup tool, IMHO.

I agree: deleting files from snapshots is important, as it's very easy to accidentally back up files that one didn't intend to. This is often necessary for both security and disk-usage reasons. Having this feature could mean the difference between being able to keep old backup data or having to "throw out the baby with the bathwater."

But renaming or moving files within a snapshot is probably not a good idea. To be frank I've never heard of backup software that can do this, and it seems like a weird feature. If a user absolutely needed this, it could be implemented outside of Restic by restoring the snapshot, rearranging the files, and backing it up again with the date set explicitly (although this might become more complicated in the future when Restic starts using absolute paths).

Granted, the remove-paths-from-snapshots feature could also be implemented this way, but since it seems much more likely to be needed, I think it's reasonable for it to be included in Restic.

So, thank you all for your feedback, we have a clear way forward: implement a command which allows removing files from an existing snapshot. The name of the command will be decided when we get there. We can revisit the other use cases when the need arises.

I don't think we need more discussion here, thanks for participating!

Suggestion for the command name: restic purge.

I'm looking forward to this feature. Thanks

@fd0
Any update on this feature? Would love to use it :)
We're using restic in a government environment and deletion of a single file from a backup is required for them. We could fund some of the work if needed!

I'm looking forward to this too ! I propose using something like the base structure for restic find.

restic purge [flags] PATTERN

Where you could limit the purge to a host (-H), snapshots (-s) or paths (--path).

Then maybe a restic prune would do the actual deletion afterwards.

This would be so helpful when an unforeseen file gets backed up by mistake (a large video in a document folder, or maybe some confidential file). Right now, I run restic find and then delete every snapshot containing the file... This is less than desirable if the file goes far back in the repo's history.

Thanks !

No update, sorry. You'll get notified by subscribing to this issue when something happens.

It sounds like most people want to be able to clone a backup's metadata, but exclude offending files - without having to restore them all in a scratch location. The idea of cloning a backup would copy metadata with the ability to remove certain pointers.

Is this the use case?

  • restic backup --exclude <something> --clone <original backup id> [new feature]
  • restic forget <original backup id>
  • restic prune

rewrite and modify could be macros to the above process.

For me, that would indeed suffice @nullcake

Not too bad, @nullcake.

Though, based on my past experience, I usually only discover that I have been backing up loads of worthless stuff days or weeks later, when I have some time to investigate. What this means is that by the time I understand I need some specific --exclude, there are probably a dozen or more backups affected.

Of course, even if any kind of cleanup is implemented based on a single backup, like you suggest, it would still be a great step forward. We, of course, know how to script. ;)

So, thumbs up. :)

While this is an interesting idea, I fear that the backup command is way too complex already, and adding another "source" for a backup will complicate it even more. Also, this function would only operate on data already in the repo (only on metadata, to be precise). A separate command (e.g. purge or so) could encapsulate the functionality nicely.

CrashPlan had an interesting behavior that when a file is excluded, it is purged from all existing snapshots. That could be something to consider.

This would be a great feature. Has it been added?

Nope.

@fd0 has there been any progress on this? I just discovered that I wasn't excluding caches like I thought and would love to remove them.

Yet another suggestion for a command name: scrub (though I'm fine with purge as well, the --rename flag would seem odd to me). My thoughts:

restic scrub [--dry-run] [--replace=<clean|prune>] [--diff] [--all | snapshot-ID...]

Where --replace=clean removes any modified snapshot, prune cleans and runs prune afterwards. --diff shows a diff for any modified snapshots. --dry-run avoids committing any changes to the repo.

Also valid are all of the --exclude flags from restic backup. I guess --host and --time also make sense (each replacing the values of the preexisting snapshot); --tag editing is already handled by restic tag.

Do any developers have guidance on how this could be implemented? It seems to me (from a cursory look) that most of the code can go in a cmd_scrub.go file; maybe just a few API additions to the internal library necessary since it seems to be mainly index operations, but maybe that's naive. Any estimated difficulty (I assume testing will be the bulk of it in any case)?

Since this is a very, very old issue: is there any chance of getting this feature?

For all who monitor this for updates or land here from Google: there's no need to wait for this issue to never come to fruition. Just use duplicati in the meantime; it has first-class support for removing files from snapshots after the fact.

For all who monitor this for updates or land here from Google: there's no need to wait for this issue to never come to fruition. Just use duplicati in the meantime; it has first-class support for removing files from snapshots after the fact.

I've been using restic for about a year now and I have stopped waiting for features to be implemented. I don't mean that everything should be added to restic, but there are basic things that should be there. I'm considering moving away from restic: the repository is very fragile and can get broken very easily.

Yesterday I deleted a snapshot because it included files that should not have been in the backup (I forgot to add an exclude). Since then I have had errors in my repository and I haven't been able to repair it yet. I should not have to delete a whole snapshot because some files were included by mistake.

@MorgothSauron I usually just removed the snapshots that contained it too, which seems to be the only solution in restic. But again, duplicati has been able to do it with a single command for a while now, so I have since switched and had no issues.

I wish to thank everyone for their input on this matter. As we've seen, many people have wanted in particular the ability to remove files from a snapshot. I guess we all make mistakes once in a while when backing up ;)

At this point in time the available maintainer and developer time is needed on other parts of restic, so I do not foresee this issue being implemented in the foreseeable future. I'm also going to release a new rest-server as soon as I can, and will then start to look into some other issues.

That said, if someone makes a solid PR that is nicely and clearly written, well tested and bug free, and produced in coordination with maintainers, it will definitely be considered for inclusion. This specific issue is one where @fd0 has already given his blessing on the direction, so focus can be mainly on producing a solid implementation (that we know won't corrupt repos) rather than "should we add this feature", which is good.

Such a PR should be basic and act as a starting point which if needed can be built upon. An example of what I mean by that is it should for starters:

  • Just be one new command (e.g. rewrite since that's the most voted for in this issue).
  • Take a list of snapshots as its primary argument(s) (including support for all), e.g. all or 098db9d5 or 098db9d5 af92db33.
  • Take a list of one or more --exclude <pattern> to list the paths that should be excluded/removed from the snapshot (in other words, here's the --exclude that was missing when backing up), e.g. --exclude="*.o", --exclude=*.unwanted, --exclude="*.o" --exclude=*.unwanted --exclude=.DS_Store.

The rationale here is to get a minimal start as a proof of concept and minimum viable product. Once being tested we can adjust it as needed, e.g. by adding the other --exclude-* arguments from the backup command. If we make a rewrite command like this, it will have pretty much the same interface as the backup command that it's meant to "correct":

restic -r /some/repo rewrite all --exclude="*.o" --exclude=*.unwanted --exclude=.DS_Store
restic -r /some/repo rewrite 098db9d5 af92db33 --exclude="*.o" --exclude=*.unwanted --exclude=.DS_Store
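To preview what a set of exclude patterns would drop from a snapshot's file list, something like the following could be used. This is a rough sketch: Python's `fnmatch` is a stand-in and does not reproduce restic's own pattern-matching rules, and `apply_excludes` is a hypothetical helper operating on a plain list of paths.

```python
import fnmatch


def apply_excludes(paths, patterns):
    """Split snapshot paths into (kept, removed) using glob patterns.

    A path is removed when any of its components matches a pattern, so
    excluding a directory name also drops everything beneath it.
    """
    kept, removed = [], []
    for path in paths:
        parts = [p for p in path.split("/") if p]
        hit = any(fnmatch.fnmatchcase(part, pat) for part in parts for pat in patterns)
        (removed if hit else kept).append(path)
    return kept, removed


paths = ["/src/main.c", "/src/main.o", "/src/.DS_Store", "/tmp/data.unwanted"]
kept, removed = apply_excludes(paths, ["*.o", "*.unwanted", ".DS_Store"])
print(kept)     # ['/src/main.c']
print(removed)  # ['/src/main.o', '/src/.DS_Store', '/tmp/data.unwanted']
```

Checking the removed list before touching any snapshot is a cheap guard against an overly broad pattern.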

On a related note, perhaps the work done by @middelink in https://github.com/restic/restic/issues/323 could be used as inspiration or a basis for the implementation, as it does some processing of existing snapshots. I'm going to see if we can get moving with this one too soon.

@rawtaz

Thanks for the thoughtful feedback!

Hi there.

I've added a draft rewrite implementation that stays close to the comment by @rawtaz.

It works here with a test repo and passes restic check --read-data without errors, but I have not tested it much. So I strongly suggest not using it with important data.

I've tried to keep the syntax very close to the backup command, so --exclude, --iexclude and --exclude-file are supported (but not tested). Ideally I would also like an --exclude-if-present option (the ideal workflow for me is something like 'oops, this didn't need to be backed up; add CACHEDIR.TAG and run restic rewrite'). But that is pretty complex, because in that case we would need to run rewrite on the same host where the backup was made and access the filesystem to collect these files (plus tons of magic with relative paths). So not right now...

Also, I don't like the idea of replacing snapshots by default, so currently the default behavior is to just create a new snapshot with a rewrite tag. Replacing is also possible with the --inplace argument.
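The difference between the default behavior and --inplace can be modelled in a few lines. This is a toy sketch over a plain dictionary, not the draft's actual code: the `rewrite` function, the `-rewritten` ID suffix and the data layout are hypothetical, and applying the tag in both modes is an assumption, since the thread does not say whether --inplace also tags the result.

```python
def rewrite(snapshots, snapshot_id, excluded, inplace=False):
    """Toy rewrite over {id: {"paths": [...], "tags": [...]}}.

    By default the source snapshot is kept and a new one tagged "rewrite"
    is added; with inplace=True the source snapshot is dropped.
    """
    source = snapshots[snapshot_id]
    new_id = snapshot_id + "-rewritten"  # hypothetical ID scheme for this sketch
    result = dict(snapshots)
    result[new_id] = {
        "paths": [p for p in source["paths"] if p not in excluded],
        "tags": source["tags"] + ["rewrite"],
    }
    if inplace:
        del result[snapshot_id]
    return result


snaps = {"40dc1520": {"paths": ["/a.txt", "/cache/x"], "tags": []}}
safe = rewrite(snaps, "40dc1520", {"/cache/x"})
print(sorted(safe))  # ['40dc1520', '40dc1520-rewritten']
```

Keeping the source by default leaves room to compare old and new snapshots before anything is forgotten.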

Any feedback would be greatly appreciated.

Hey Dmitry,

Thanks for this implementation, great work !

So far it works perfectly on Linux with a small test repo of 600 files and several test snapshots. Restore works, and diff correctly shows the excluded folders. I will be doing more intensive tests on a clone of a real repo with many GB of data and hundreds of snapshots. I will also try repos sourced from Windows.

One proposal: have an option to specify a tag for the snapshots that contained the exclusions on a rewrite pass (keeping the "rewrite" tag on newly created snapshots).

restic rewrite --add-tag mytag -i thisfileshouldberemoved.txt all

This would help identify those snapshots that still contain "thisfileshouldberemoved.txt". On the other hand, the more direct --inplace works as expected.

Again very good work.

@NovacomExperts Yes, my initial motivation was to keep 'history editing' as safe as possible. It's very easy to exclude something important with --exclude * and there is almost no way to recover from that (with backup it's just a matter of starting a new backup again). I wanted something like --dry-run, but with the ability to get an actual snapshot and explicitly delete the source snapshot after checking that it's OK.

I fully agree that this is currently not fully achieved. It's easy to 'observe' new snapshots, but too difficult to delete the old ones. Plus I don't like the hardcoded rewrite tag. Maybe it's better to have --inplace by default and --keep-source-tagged before-rewrite --tag-destination after-rewrite or something like this (with --add-tag it is a bit unclear whether it applies to the old or the new snapshot).

In any case I'll wait for feedback from the maintainers. I don't want to spend much time on this if it's moving in the wrong direction.

PS. My primary restic repo is around ~2TB now. Will try on it later after making LVM snapshot.

@dionorgua Your initial motivation is fully correct. I'll cast my vote to keep it like that, with the "dangerous" option --inplace as far as possible from the user (definitely not by default). I would prefer a missing-argument error on --keep-source-tagged / --tag-destination over --inplace by default.

But I agree, let's wait for feedback on this.

Yesterday, I forgot the cloned test repo (65 GB) inside a folder that was backed up by restic overnight. I could have simply forgotten yesterday's snapshot, but I went "all in" and tried your implementation. After forget + prune, I successfully removed the 65 GB from a 400 GB repo. All good, no errors found.

I will test more intensively with data that spans multiple snapshots.

Cheers

I've replaced the erroneous pull request #2720 with a new one, because the old one was created from the master branch. I just added one missing error check. Sorry for the extra noise.

Hm, modify then?

Very late for this, but rectify is my suggestion for the delete-specific-file-from-backup command.

2731 is exciting, thanks a bunch!

Very late for this, but rectify is my suggestion for the delete-specific-file-from-backup command.

I have to say that's not a great name for it. Rectify implies there's something wrong that needs correcting/rectifying. While this may be true in one of the use cases, it's not always the case. A user may want to just remove some data from existing snapshots to free up space for all we know, while keeping the rest of the snapshot. The wording has to be more neutral than rectify, I think.
