Restic: Backup option to remove a leading path prefix

Created on 16 Nov 2018  ·  24 Comments  ·  Source: restic/restic

Output of restic version

restic 0.9.3 compiled with go1.11.1 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

It would be helpful to have an option to restic backup that says "strip this leading path from the paths of all backed-up files." For example, --backup-root /some/path. This would have the following effects:

  • A file /some/path/to/file would be stored in the snapshot as /to/file.
  • This file would also have its metadata check performed against /to/file in the parent snapshot.
  • You are not permitted to specify files/directories to restic backup that do not begin with this prefix.
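
As a rough illustration of the path mapping described in the list above, here is a minimal shell sketch. The --backup-root flag does not exist in restic; strip_prefix is a hypothetical helper that only shows the intended semantics, including the rejection of paths outside the prefix.

```shell
# Hypothetical helper: map an on-disk path to the path it would get in the
# snapshot if a leading prefix (the proposed --backup-root) were stripped.
# Prints the stored path, or fails if the path lies outside the prefix.
strip_prefix() {
    root=${1%/}   # backup root without a trailing slash, e.g. /some/path
    path=$2       # path passed to the backup command
    case $path in
        "$root")
            printf '/\n' ;;                     # the root itself becomes /
        "$root"/*)
            printf '%s\n' "${path#"$root"}" ;;  # /some/path/to/file -> /to/file
        *)
            printf 'error: %s is outside %s\n' "$path" "$root" >&2
            return 1 ;;
    esac
}
```

For example, strip_prefix /some/path /some/path/to/file prints /to/file, while a path such as /other/file is rejected with a non-zero exit status.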

(I think this may be related to #1376.)

What are you trying to do?

One of our backup scripts is run on a system with many running services that we cannot stop. These services guarantee that recovery is possible from a specific point in time (e.g. they do enough journaling to get their data back in a consistent state following a power cut). However, restic backups are not atomic; therefore, restic backups break the recovery guarantee from the service.

To fix this, we:

  1. Take an LVM snapshot of /. The snapshot is an atomic block-level copy of the entire volume.
  2. Mount the LVM snapshot under /mnt/backup-snapshot.
  3. Run the restic backup against /mnt/backup-snapshot.
  4. Unmount the LVM snapshot.
  5. Delete the LVM snapshot.
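
The five steps above could be scripted roughly as follows. This is only a sketch: the volume group (vg0), logical volume (root), snapshot name and snapshot size are placeholders that do not come from the issue, and the restic repository and password are assumed to be configured through the environment.

```shell
# Sketch of the snapshot workflow. All names below are placeholders.
snapshot_backup() (
    vg=vg0
    lv=root
    snap=backup-snap
    mount_point=/mnt/backup-snapshot

    # 1. Take an atomic block-level snapshot of the volume.
    lvcreate --snapshot --size 5G --name "$snap" "/dev/$vg/$lv" || exit 1

    # 2. Mount the snapshot read-only; drop it again if mounting fails.
    if ! mount -o ro "/dev/$vg/$snap" "$mount_point"; then
        lvremove --force "/dev/$vg/$snap"
        exit 1
    fi

    # 3. Back up the mounted snapshot; remember the result but keep going.
    status=0
    restic backup "$mount_point" || status=$?

    # 4. and 5. Unmount and delete the snapshot even if the backup failed.
    umount "$mount_point" && lvremove --force "/dev/$vg/$snap" || status=1
    exit "$status"
)
```

Calling snapshot_backup as root runs the whole sequence and returns restic's exit status, so a failed backup is still reported after the cleanup.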

This makes the backup truly point-in-time and guarantees that the restored backup is effectively in a consistent state.

Unfortunately, this also causes files to be stored in our restic repository with the (useless) prefix /mnt/backup-snapshot. This can complicate restore efforts, and it's also a bit confusing if you don't know the details of how the backup was created.

The only feasible workaround I can think of is to run the backup within a chroot. While not the end of the world, it might be nicer for restic to provide an option to remove some leading prefix from files.

Labels: backup, need direction, feature suggestion

All 24 comments

Here's an older incarnation of this request I found: #555

+1

I also think this would be a really useful feature.

So, let me summarize: you're running restic backup /mnt/backup-snapshot, so the file /mnt/backup-snapshot/foo is /mnt/backup-snapshot/foo in the snapshot, but you'd like it to be /foo. Is that correct?

You can achieve that with restic > 0.9.0 by changing the current directory: just run cd /mnt/backup-snapshot and then restic backup . (the trailing dot is the current directory).

Does that work for you?
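
A small sketch of that suggestion, wrapped in a subshell so the caller's working directory is left alone. The function name backup_from is made up for illustration.

```shell
# Run the backup from inside the given directory so that the stored paths
# are relative to it. The ( ... ) body runs in a subshell, so the cd does
# not leak into the calling shell.
backup_from() (
    cd "$1" || exit 1
    restic backup .
)
```

Usage: backup_from /mnt/backup-snapshot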

Changing the cwd works, but I've noticed that there is an unpleasant side-effect if using files for include/exclude. It seems that if absolute paths are placed in there, then they will be skipped when changing the cwd. I'd much rather use absolute paths as well - for now I'll probably head down the chroot path, but I agree it would be nicer to have something similar to the -C flag in tar.
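
One possible way around this, sketched under the assumption (suggested by the observation above, not verified here) that restic compares absolute exclude patterns with the on-disk path of each file: rewrite the exclude file so that every absolute pattern is rooted under the snapshot mount point. The helper name prefix_excludes and the file names in the usage line are illustrative.

```shell
# Read exclude patterns on stdin and print them with every absolute
# pattern (one starting with "/") re-rooted under the given mount point.
# Relative patterns such as "*.tmp" are passed through unchanged.
prefix_excludes() {
    mount_point=${1%/}   # e.g. /mnt/backup-snapshot
    # The extra test keeps a final line that lacks a trailing newline.
    while IFS= read -r pattern || [ -n "$pattern" ]; do
        case $pattern in
            /*) printf '%s%s\n' "$mount_point" "$pattern" ;;
            *)  printf '%s\n' "$pattern" ;;
        esac
    done
}
```

Usage: prefix_excludes /mnt/backup-snapshot < excludes.txt > excludes.snapshot.txt, then pass the rewritten file to restic backup with --exclude-file.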

I think this fake root option would be a useful feature. I would love to do the same as cdhowie, but with APFS snapshots on macOS. To access the read-only APFS snapshots, they need to be mounted somewhere. But when restoring, I would like the "original" path to be the canonical path stored in the snapshot.

The cd trick is unfortunately not optimal, as I have lots (125) of absolute paths collected from StdExclusions.plist (the macOS standard backup exclusion list), plus all files and folders that mdfind can find with the com_apple_backup_excludeItem attribute set.

That just leaves the problem that if you put /mnt into the ignore file and start the backup from /mnt/fs-snapshot, it will exclude itself.

Plus, cd $path && restic backup . still shows $path in the snapshot overview, while the paths inside the snapshot are /-based.

I found a workaround with proot.

I also wanted to find a way to remove the path prefix. My use case is slightly different - I'm creating a zfs snapshot (fs@$(date +%s)) and wanted to back this up without having to mount it (/path/to/mount/.zfs/snapshots/${TS}) - this way hopefully I don't have to worry about the snapshot not unmounting and then hanging around forever in the case of something crashing.

The restic forget output for this makes me think that snapshots with different paths won't be forgotten as per the schedule (daily / weekly / etc.).

The proot comment from @blurayne was a nice starting point, and I think I've come to the same conclusion:

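# Shell assignments take no leading "$" on the variable name.
snap_path="/path/to/where/snapshot/is/accessible"
orig_fs="/path/to/filesystem"
# Bind the snapshot over the original path, then back up that path.
proot -b "${snap_path}":"${orig_fs}" restic backup "${orig_fs}"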

This works nicely, and now all the snapshots have the same path, with no cd or pushd required. Also, proot is available in user-space, so if backups aren't done as root, it's still possible.

My use case: dumping database data into a temporary directory, like /tmp/tmpzmn28r02 (obtained via mktemp or python's mkdtemp()) and then backing it up.
This method will mark all files between 2 snapshots as being different. So I need a way to tell restic to totally ignore the temporary directory prefix.
Another possible use case: today I have all my pictures in '/mnt/something/pictures', but tomorrow the same content will be under '/mnt/external/pictures-from-home' (different partitioning scheme or whatever).

Also, if you want to use restic and backup multiple directories in the same run, in order to use the same snapshot, this gets even more complicated.

Until a fix is done, I'm going to use the 'proot' proposal - thank you @blurayne and @whi-tw

Hi! I have a similar case. For example, I have these folders:

/srv/my/long/server1/path/data (with many subfolders and dozen of files)
/path/to/dump.sql
/path/certbot.tar.gz

so I want to get a backup like

/data
/dump.sql
/certbot.tar.gz

and be able to restore on another server (where I don't know the previous folder structure) to a different (relative) path.

I have no idea how to solve this trivial task. Restic is an amazing tool, but... why is it so difficult for end users?

I'm copying everything I need into a predefined backup folder (/backup) and running the restic backup there (via cd). But this solution works only for small amounts of data.

It would be great to be able to restore with --include so that the subfolder lands directly under the target (or including this mask), e.g.:
restic restore --include data --target /my/new/path
and get as result
/my/new/path/data


Thank you @whi-tw for the solution with proot -b /path/i/wanted:./path_in_repo restic backup . - it works for me.

My use case is migrating snapshots from other backup solutions to restic (Time Machine and disk images in my case).

I migrate them from where I mount the image or the subdirectory of the snapshot created by TM, which can get very long, e.g. /Volumes/TimeMachine-Backups/Backups.backupdb/MacBook Pro/2019-05-22-185113/Macintosh SSD/.

The cd solution works when using restic mount and restic restore, but the absolute path of the original snapshot is listed when I run restic snapshots.

Since it's a migrated snapshot, I'd like that to be the path from where the original snapshot was taken too. Apart from that, with the long paths, it also makes the output of restic snapshots a bit noisy.

A flag to set an alternative prefix would be ideal for me too.

This works nicely, and now all the snapshots have the same path, with no cd or pushd required. Also, proot is available in user-space, so if backups aren't done as root, it's still possible.

This would have been a nice workaround, but proot isn't available on macOS and doesn't seem to be coming anytime soon (most of the code is written specifically against Linux); see "Does PRoot work on MacOSX?".

Is there another workaround that comes to mind?

My use case: dumping database data into a temporary directory, like /tmp/tmpzmn28r02 (obtained via mktemp or python's mkdtemp()) and then backing it up.
This method will mark all files between 2 snapshots as being different.

Note that the files probably _are_ different anyway; database backups usually include a timestamp in the first few lines.

You can tune database dump commands to exclude dynamic comments and to sort rows by primary key, though, which makes slowly changing data really dedupable.
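
As a sketch of what such tuning might look like, assuming a MySQL database (the thread does not say which database is involved): mysqldump can omit the "Dump completed on" timestamp and order rows by primary key, and the dump can be piped straight into restic so that no temporary directory ends up in the snapshot path. The function name and the .sql file name are illustrative.

```shell
# Dump one database in a stable form and stream it into restic.
#   --skip-dump-date    omit the "Dump completed on ..." comment
#   --order-by-primary  sort each table's rows by primary key
# restic stores the stream under the name given by --stdin-filename.
dump_and_backup() {
    db=$1
    mysqldump --skip-dump-date --order-by-primary "$db" |
        restic backup --stdin --stdin-filename "$db.sql"
}
```

Usage: dump_and_backup mydb. Note that in plain sh a failing mysqldump is not detected by this pipeline; in bash, set -o pipefail would surface it.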

An update: 'proot' worked on only one machine for me; on another it segfaults.
An alternative to it (newer): bubblewrap.
I added a wrapper around it (attached) which should work with the same '-b' parameters. It seems to work so far. Note that depending on your needs and directory locations, you might have to change the wrapper a bit.
I hope it helps you guys, but I'm looking forward to support for this inside restic itself.

proot.sh.txt
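
The contents of the attached wrapper are not reproduced here. Purely as an illustration of the idea, and not as the attached script, a bubblewrap call equivalent to the earlier proot example might look like the sketch below; bwrap_backup and its two arguments are made-up names.

```shell
# Bind the snapshot over the original path inside a bubblewrap sandbox,
# then back up the original path from within it.
#   --dev-bind / /       expose the whole host filesystem, including devices
#   --ro-bind SRC DEST   mount SRC read-only on top of DEST
bwrap_backup() {
    snap_path=$1   # where the snapshot is accessible
    orig_fs=$2     # path the data should appear under in the snapshot
    bwrap --dev-bind / / --ro-bind "$snap_path" "$orig_fs" \
        restic backup "$orig_fs"
}
```

Usage: bwrap_backup /path/to/where/snapshot/is/accessible /path/to/filesystem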

I tried proot. It seems to break the ability to run restic as non-root with additional capabilities (https://restic.readthedocs.io/en/stable/080_examples.html#full-backup-without-root); at least I got scan: Open: open /.pulse: permission denied errors that I didn't get when running restic without proot.

Same problem with bwrap.

So to me, stripping a path prefix in restic itself still seems useful.

This missing feature makes it harder than it needs to be to back up VMs.
My VM snapshots end up in a temporary folder and are then backed up by restic.
This results in the following:

ID        Time                 Host         Tags        Paths
--------------------------------------------------------------------------------------
02c536db  2020-04-10 14:28:27  resolver-02              /tmp/tmp.vOFFxxly9O/config.xml
c5709aed  2020-04-10 14:28:29  resolver-02              /tmp/tmp.vOFFxxly9O/sdb.img
a88cc1e7  2020-04-10 14:36:22  resolver-02              /tmp/tmp.FoY1j5JPIZ/config.xml
7c44e6ee  2020-04-10 14:36:24  resolver-02              /tmp/tmp.FoY1j5JPIZ/sdb.img
65456111  2020-04-10 14:37:48  resolver-02              /tmp/tmp.vjtI9JE3Iz/config.xml
eaced756  2020-04-10 14:37:49  resolver-02              /tmp/tmp.vjtI9JE3Iz/sdb.img
8eccec2c  2020-04-10 16:04:30  resolver-02              /tmp/tmp.YtLYRd0rNI/config.xml
34c897e1  2020-04-10 16:04:31  resolver-02              /tmp/tmp.YtLYRd0rNI/sdb.img
99b67b97  2020-04-10 16:07:53  resolver-02              /tmp/tmp.aWaEDqAaTq/config.xml
cad2c9d8  2020-04-10 16:07:54  resolver-02              /tmp/tmp.aWaEDqAaTq/sdb.img
--------------------------------------------------------------------------------------

This breaks restic forget because it doesn't recognize it's the same file and keeps a snapshot for every instance. I'd prefer if there was a way to either remove a known prefix or store only a relative path, not an absolute one.
I'm already calling restic with a relative path and cd-ing into the temporary folder. Sadly that doesn't help, and I'd prefer not to have to use bind mounts for this.

This breaks restic forget because it doesn't recognize it's the same file and keeps a snapshot for every instance.

We run into this too but the solution is pretty straightforward: tag each backup based on the file(s) being backed up.

For example, you could use the tags config.xml and sdb.img here. Then add --group-by host,tags when running restic forget.
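
A minimal sketch of that tagging approach. The function names are made up, and the retention policy in the usage line (--keep-daily 7) is only an example value.

```shell
# Back up one file and tag the snapshot with the file's base name, so
# that snapshots of "the same" file share a tag even when the temporary
# directory in their path differs.
backup_tagged() {
    restic backup --tag "$(basename "$1")" "$1"
}

# Apply a retention policy per host and tag instead of per path.
# Any extra arguments (the keep rules) are passed through to restic.
forget_grouped() {
    restic forget --group-by host,tags "$@"
}
```

Usage: backup_tagged /tmp/tmp.vOFFxxly9O/sdb.img, followed later by forget_grouped --keep-daily 7.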

What makes this feature so hard to implement? Isn't it just some basic string filtering on snapshot metadata? The value it would bring is enormous. Yeah, you can work around it with tagging, but there's a path field and it could be usable...

What makes this feature so hard to implement?

Speaking as a developer myself (not of restic, but other open source projects): it's often not the complexity of a feature that prevents implementing it, but rather mundane things like lack of time, motivation or simply "real life"...

Of course, my aim was not to be critical; I'm genuinely looking to map out the complexity for potential contributors.

Hi all, I've started working on implementing this "custom root" function. The implementation itself was seemingly simple, though I've had to learn golang having previously only known C#... Anyways, I'm trying to gauge what sort of support this issue still has, seeing as this stems from 2018, 2 years ago. I'll be committing to https://github.com/TheRealVincentVanGogh/restic/tree/2092-feature-custom-path-prefix soon, should anyone want to help me out with golang 😅. Hopefully soon after, I'll be putting up a pull request here.

@TheRealVincentVanGogh I'm not going to learn Go, but I'm still eager for this feature and have a ton of backups I still want to port to restic but for this issue. Open a PR once you have something that looks like it's working and post the link here, I'll lend some heavy testing

@TheRealVincentVanGogh How does your planned implementation relate to PR #2010?

@TheRealVincentVanGogh How does your planned implementation relate to PR #2010?

@MichaelEischer Oh shoot! Looks like someone beat me to it already. Yeah, PR #2010 is exactly what I'm in the midst of... re-accomplishing... Darn. Perhaps @cdhowie could link PR #2010 to this issue to help avoid future confusion? Thanks!

@themightychris Here's a link to that PR. Looks like dev dropped out in 2018 as well... curious.

Edit:

There seems to be some ambiguity between removing a path prefix from the snapshot file VS. removing a path prefix from every file structure + snapshot file. Looks like PR #2010 only addresses the former. Since OP was looking for "strip this leading path from the paths of all backed-up files" (AKA, file-structure level path fixing) I have to take back what I said about linking PR #2010 to this issue. Sorry for the mention cdhowie!

Nevertheless! @MichaelEischer My intention has always been to get a file-structure + snapshot level path prefix slicing implementation into restic (man, that's a long feature/sentence). So I'll most likely begin working on that off of PR #2010's existing code, which should speed up implementation.

P.S. I'm pretty busy nowadays so work might be slow for a while; of course, I'll post a PR when I think I have something worth sharing with all you folks! Stay Safe Everyone! 😄
