Restic: Unable to backup/restore files/dirs with same name

Created on 26 Jul 2016  ·  28 comments  ·  Source: restic/restic

Output of restic version

restic 0.1.0
compiled at 2016-07-20 12:42:43 with go1.6.3

Expected behavior

After the restore, all directories should be restored.

Actual behavior

Only one directory is restored.

Steps to reproduce the behavior

  1. Create files

/tmp/restic/FILESNEW01/Dir01/Test01.txt
/tmp/restic/FILESNEW01/Dir01/Test02.txt
/tmp/restic/FILESNEW01/Dir01/Test03.txt

/tmp/restic/FILESNEW02/Dir01/Test01.txt
/tmp/restic/FILESNEW02/Dir01/Test02.txt
/tmp/restic/FILESNEW02/Dir01/Test03.txt

content of files:
cat /tmp/restic/FILESNEW01/Dir01/Test0*
Content file. /tmp/restic/FILESNEW01/Dir01/Test01.txt
Content file. /tmp/restic/FILESNEW01/Dir01/Test02.txt
Content file. /tmp/restic/FILESNEW01/Dir01/Test03.txt

cat /tmp/restic/FILESNEW02/Dir01/Test0*
Content file. /tmp/restic/FILESNEW02/Dir01/Test01.txt
Content file. /tmp/restic/FILESNEW02/Dir01/Test02.txt
Content file. /tmp/restic/FILESNEW02/Dir01/Test03.txt

I want to back up:

  • /tmp/restic/FILESNEW01/Dir01/
  • /tmp/restic/FILESNEW02/Dir01/

Commands:
Initialize the repository in the /tmp/restic/BACKUP directory:

  • restic -r /tmp/restic/BACKUP/ init

Make the backup:

  • restic backup /tmp/restic/FILESNEW01/Dir01 /tmp/restic/FILESNEW02/Dir01 -r /tmp/restic/BACKUP/

scan [/tmp/restic/FILESNEW01/Dir01 /tmp/restic/FILESNEW02/Dir01]
scanned 2 directories, 6 files in 0:00
[0:00] 16.67% 0B/s 51B / 306B 0 / 8 items 0 errors ETA 0:00 duration: 0:00, 0.01MiB/s
snapshot 4d197b90 saved

Check that the backup exists in the repository:

  • restic -r /tmp/restic/BACKUP/ snapshots

ID Date Host Directory

4d197b90 2016-07-26 14:14:43 nebss /tmp/restic/FILESNEW01/Dir01 /tmp/restic/FILESNEW02/Dir01

Restore the backup:

  • restic -r /tmp/restic/BACKUP/ restore 4d197b90 -t /tmp/restic/RESTORE/

restoring <Snapshot 4d197b90 of [/tmp/restic/FILESNEW01/Dir01 /tmp/restic/FILESNEW02/Dir01] at 2016-07-26 14:14:43.208840145 +0300 EEST> to /tmp/restic/RESTORE/

Check whether the directories/files exist:

  • ls /tmp/restic/RESTORE/
    Dir01
  • cat /tmp/restic/RESTORE/Dir01/Test0*
    Content file. /tmp/restic/FILESNEW01/Dir01/Test01.txt
    Content file. /tmp/restic/FILESNEW01/Dir01/Test02.txt
    Content file. /tmp/restic/FILESNEW01/Dir01/Test03.txt
Labels: backup, restore, bug

All 28 comments

Thanks for reporting this issue, I think this is a bug.

This will probably happen whenever top-level directories have the same name, because only the top-level directory name is stored, not the full path.

The solution is to reconstruct the full path upon restore, and restore each tree into that full path. The resulting path would then be something like /tmp/restic/tmp/restic/FILESNEW0{1,2}/Dir01/. I think that's acceptable.
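The mapping proposed above can be sketched in a few lines of Python (a hypothetical helper, not restic's actual code): the restore target is joined with the entire original path, so two sources whose last component is identical can never collide.

```python
import os.path

def restore_path(target, source):
    """Re-root the full original source path under the restore target.
    Hypothetical sketch of the proposed behavior, not restic's code."""
    return os.path.join(target, source.lstrip("/"))

# Two sources with the same basename map to distinct restore paths:
a = restore_path("/tmp/restic/RESTORE", "/tmp/restic/FILESNEW01/Dir01")
b = restore_path("/tmp/restic/RESTORE", "/tmp/restic/FILESNEW02/Dir01")
assert a != b
```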

Does the patch need to be implemented as part of the restore?
Or, maybe it has to be done during backup by building a different top-level tree that includes the full path components?

I also suspect this is the case. At the moment, restic works like this:

When called as restic backup A/foo B/foo it creates a tree structure in the repository that looks like this:

├── foo
└── foo

So only the last path component of the arguments to the backup command is taken, which leads to a problem when restoring such a snapshot.

In order to correct this, I propose implementing the same behavior as tar, which would in this case create the following tree:

.
├── A
│   └── foo
└── B
    └── foo

This will require some work in the archiver part of restic. I don't think we'll need to touch restore at all.
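The difference between the two strategies can be illustrated with a small sketch (hypothetical helper name, not restic's code): keying the snapshot tree by basename loses information and produces duplicate entries, while keeping the full relative path, tar-style, keeps the two directories distinct.

```python
import os.path

def top_level_names(paths, full_path=False):
    """Return the names a snapshot's tree would contain for the given
    backup arguments. Hypothetical sketch of the two strategies."""
    if full_path:
        # tar-like behavior: keep every path component
        return [p.lstrip("/") for p in paths]
    # old behavior: only the last component survives
    return [os.path.basename(p) for p in paths]

args = ["A/foo", "B/foo"]
print(top_level_names(args))                  # ['foo', 'foo'] -- collision
print(top_level_names(args, full_path=True))  # ['A/foo', 'B/foo']
```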

In #588 I reported the same thing, but that one has a test case you can use.

@fd0 I propose to also include an option (--store-full-path) to backup where it explicitly stores the full 'real' path of the backup target.

The reasoning is that in the tar case and with several other backup tools you can get a little convoluted restore tree. While this is a good sane default I personally would like it if my restores resemble the entire layout of the original filesystem for host backups. (Even better if restore could also prefix the hostname to the restore location)

@trbs I think the default needs to be to store full paths, with a switch for the special case of using relative paths. Reason being that relative paths can produce unexpected or undefined behaviour, but absolute paths can't. If you want to request prefixes or some other form of path mangling, I'd suggest that's an entirely separate issue.

I've thought about this and I think we need to change the backup behavior so that the full path (as given on the command line) is always saved. That's what tar does, and it works very well. This is unfortunately a relic of a bad design decision early in restic's development.

+1 for --store-full-path

Hate to just +1, but I'm also very interested in a solution for this bug. I have several pending installations of restic where this bug is unfortunately a showstopper.

Thanks @fd0 for your work on this, I understand it's not easy to unwind now.

-1 for --store-full-path. I would much rather see the full path always going in the backup and then having a --strip-components <N> to take parts away if you don't need them at restore time. This means the full data is always available in the backup and if the user strips too many components from the path at restore time and therefore combines subdirs, it becomes a recoverable user error.
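The --strip-components idea works like tar's option of the same name; the path transformation can be sketched like this (hypothetical helper, assuming '/'-separated archive paths):

```python
def strip_components(path, n):
    """Drop the first n components from an archive path, as
    tar --strip-components does. Hypothetical sketch."""
    parts = [p for p in path.split("/") if p]
    if n >= len(parts):
        return ""  # everything stripped away
    return "/".join(parts[n:])

# The example path from this issue with two components stripped:
print(strip_components("tmp/restic/FILESNEW01/Dir01/Test01.txt", 2))
# FILESNEW01/Dir01/Test01.txt
```

Because stripping happens at restore time, stripping too many components merely merges subdirectories in the restore target, which is a recoverable user error rather than data loss.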

As for prefixing the hostname to the backup location, this seems like it can easily be done from the command line, as most people know beforehand which host they are going to restore from :)

Given that you're not 1.0 yet, I vote that, if a breaking change has to be made in order for the ideal fix, go ahead and do it sooner rather than later.

@mholt I agree, I'm already working on this. As I said, this is caused by a bad design decision early on and needs to be corrected.

Hey @fd0 -- just saw that 0.7 was released. Is this (and #910 and #909) on the map for 0.7.1?

Maybe not for 0.7.1, but for 0.8.0 or so. I've already started working on it though. Maybe a bit of background: This is caused by the archiver code, which is the oldest code present in restic. Unfortunately (as I was just beginning to learn Go back in 2013/2014) the archiver code is very complex and I made a lot of beginner mistakes (too much concurrency, too many channels). I also worried about things that turned out to be not a problem at all, and overlooked things that became a problem (e.g. the index handling).

So, I've already started on reimplementing the archiver code completely, using concurrency only when it makes sense (i.e. processing individual chunks) and not reading 20 files from the disk in parallel. This code also includes proper directory walking and will insert the complete paths into the repo.

Fortunately, this is really just the archiver that needs to be touched, the rest of the code will (thanks to the design of restic and the repo) just continue to work fine.

will this change affect existing repositories and if so, how?

"affecting" in terms of "new backups will have a slightly different structure", yes, but that's about it. No migrate or anything needed.

So, #1209 has been merged and it improves the situation by detecting name conflicts and resolving them (by renaming), but this issue is still not fully resolved. I'm working on it :)

@fd0 Any idea when we might expect snapshots that contain the full original path? We are currently working on automating backups and restores using restic.

When automating the restore, having the source path intact is essential.

If I have a server with two 'data' directories being backed up (and this is not theoretical, we have a number of servers with Confluence and JIRA 'data' directories that need to be backed up), the restore process needs to know which data directory belongs to Confluence and which belongs to JIRA. Names like 'data' and 'data-1' obviously don't cut it here.

I think the best workaround for now is backing up the data directories in separate snapshots and tagging them with 'JIRA' or 'Confluence'?

There's no timeline, sorry.

I think the best workaround for now is backing up the data directories in separate snapshots and tagging them with 'JIRA' or 'Confluence'?

Yes, but per #1225 you won't be able to easily merge them into one repo later.

Regarding option --store-full-path: rsync has this option: -R, --relative.
Maybe use the same option name for restic?

For full-system backups I've described a workaround here: https://forum.restic.net/t/full-system-restore/126/8 It's not pretty but will do the job until #1494 is done.

This bug worried me a bit, but I can't reproduce it in 0.8.3 with the steps provided. Is this still an open issue?

Yes, unfortunately this is still an issue.

Hm, I somehow can't replicate the issue, so I'm not sure what I'm doing differently. I attached my test script.

test_restic_549.zip

You can reproduce it like this:

$ mkdir dir1/subdir
$ echo foo > dir1/subdir/foo

$ mkdir dir2/subdir
$ echo bar > dir2/subdir/bar

$ restic backup dir1/subdir dir2/subdir
password is correct
scan [/home/user/dir1/subdir /home/user/dir2/subdir]
scanned 2 directories, 2 files in 0:00
/home/user/dir2: name collision for "subdir", renaming to "subdir-1"
[...]
snapshot f6138d06 saved

For the two subdirs, restic uses the basename of the subdir as the top-level dir in the repo, so both dir1/subdir and dir2/subdir become subdir; that's what causes the collision.

Listing the latest snapshot shows it:

$ restic ls latest
password is correct
snapshot f6138d06 of [/home/user/dir1/subdir /home/user/dir2/subdir] at 2018-03-21 20:38:33.58232292 +0100 CET):
/subdir
/subdir/foo
/subdir-1
/subdir-1/bar

In your test case, the basenames of $TESTDIR/dir1 and $TESTDIR/dir2 are different (dir1 vs. dir2) so the bug does not occur.
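The renaming behavior shown in the output above can be sketched roughly like this (a hypothetical helper; restic's actual implementation in #1209 differs): the first "subdir" keeps its name, and each subsequent collision gets a numeric suffix.

```python
def unique_name(name, taken):
    """Append -1, -2, ... until the name no longer collides.
    Hypothetical sketch of the collision-renaming behavior."""
    candidate, i = name, 0
    while candidate in taken:
        i += 1
        candidate = f"{name}-{i}"
    taken.add(candidate)
    return candidate

taken = set()
print(unique_name("subdir", taken))  # subdir
print(unique_name("subdir", taken))  # subdir-1
```

This avoids silently overwriting one tree with the other, but as noted above it still loses the original path information, which is why the issue remains open.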

From the release notes of version 0.9:

The first backup with this release of restic will likely result in all files being re-read locally, so it will take a lot longer. The next backup after that will be fast again.

I just want to give you some statistics:

first backup:

-------------------------------------------------------------
Start: Do 24. Mai 05:15:01 CEST 2018
437 snapshots

Files:           0 new,     0 changed, 40524 unmodified
Dirs:            0 new,     0 changed,     2 unmodified
Added:      0 B

processed 40524 files, 14.805 GiB in 1:38
snapshot f724ff21 saved

Files:         556 new,     0 changed,     0 unmodified
Dirs:            2 new,     0 changed,     0 unmodified
Added:      719 B

processed 556 files, 914.493 GiB in 2:15:29
snapshot 3c0e0f1b saved

Files:       11570 new,     0 changed,     0 unmodified
Dirs:            2 new,     0 changed,     0 unmodified
Added:      719 B

processed 11570 files, 66.044 GiB in 16:21
snapshot 312fd29c saved

Files:        2309 new,     0 changed,     0 unmodified
Dirs:            2 new,     0 changed,     0 unmodified
Added:      719 B

processed 2309 files, 163.332 GiB in 24:13
snapshot 2baab573 saved

Files:         312 new,     0 changed,     0 unmodified
Dirs:            2 new,     0 changed,     0 unmodified
Added:      719 B

processed 312 files, 1.503 TiB in 4:48:23
snapshot 02dfe40c saved

Files:       743172 new,     0 changed,     0 unmodified
Dirs:            2 new,     0 changed,     0 unmodified
Added:      84.927 MiB

processed 743172 files, 89.131 GiB in 2:48:59
snapshot dcee3e70 saved

Files:         441 new,     0 changed,     0 unmodified
Dirs:            2 new,     0 changed,     0 unmodified
Added:      719 B

processed 441 files, 727.575 GiB in 1:56:36
snapshot 676adc45 saved
End:   Do 24. Mai 17:46:46 CEST 2018
Duration: 12h:31m:45s
-------------------------------------------------------------

second one:

-------------------------------------------------------------
Start: Fr 25. Mai 05:15:01 CEST 2018
444 snapshots

Files:           0 new,     0 changed, 40524 unmodified
Dirs:            0 new,     0 changed,     2 unmodified
Added:      0 B

processed 40524 files, 14.805 GiB in 1:42
snapshot 9c7cf320 saved

Files:           0 new,     0 changed,   556 unmodified
Dirs:            0 new,     0 changed,     2 unmodified
Added:      0 B

processed 556 files, 914.493 GiB in 0:15
snapshot 533e2155 saved

Files:           0 new,     0 changed, 11570 unmodified
Dirs:            0 new,     0 changed,     2 unmodified
Added:      0 B

processed 11570 files, 66.044 GiB in 0:17
snapshot 1c1235c3 saved

Files:           0 new,     0 changed,  2309 unmodified
Dirs:            0 new,     0 changed,     2 unmodified
Added:      0 B

processed 2309 files, 163.332 GiB in 0:13
snapshot d5ef168d saved

Files:           0 new,     0 changed,   312 unmodified
Dirs:            0 new,     0 changed,     2 unmodified
Added:      0 B

processed 312 files, 1.503 TiB in 0:16
snapshot 76e94946 saved

Files:         292 new,     0 changed, 743172 unmodified
Dirs:            0 new,     2 changed,     0 unmodified
Added:      32.790 MiB

processed 743464 files, 89.163 GiB in 1:06
snapshot 12fa66e8 saved

Files:           0 new,     0 changed,   441 unmodified
Dirs:            0 new,     0 changed,     2 unmodified
Added:      0 B

processed 441 files, 727.575 GiB in 0:15
snapshot ab2d29bb saved
End:   Fr 25. Mai 05:19:12 CEST 2018
Duration: 0h:4m:11s
-------------------------------------------------------------

so "a lot longer" really means a lot longer :-)
Keep up the great work! 👍

@fd0, awesome work! Thanks so much! Your backup tool has become my favorite for all my off-site backups (using b2) :-)
