Borg: Tagging archives for "prune"?

Created on 6 Apr 2016  ·  5 Comments  ·  Source: borgbackup/borg

Currently borg prune can only restrict the archives to be pruned by a common prefix. This works for naming schemes where the "prune-relevant" part of the archive name comes first, e.g. system-<hostname>-<date> and userdata-<hostname>-<date>, but doesn't really work for anything else.
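To make the limitation concrete, here is a minimal sketch in plain Python. It is not borg code: the `select_by_prefix` helper and all archive names are made up for illustration, and it only mimics the idea of selecting archives by a common prefix.

```python
def select_by_prefix(names, prefix):
    """Return the archive names that start with the given prefix."""
    return [name for name in names if name.startswith(prefix)]


# Hypothetical archive names with the "prune-relevant" part in front.
front = ["system-web1-2016-04-05", "userdata-web1-2016-04-05", "system-web1-2016-04-06"]
# Hypothetical archive names with the date in front and the relevant part last.
back = ["2016-04-05-web1-system", "2016-04-05-web1-userdata", "2016-04-06-web1-system"]

print(select_by_prefix(front, "system-web1-"))  # both "system" archives are selected
print(select_by_prefix(back, "system"))         # nothing is selected
```

With the second naming scheme no single prefix picks out all "system" archives, because the part that distinguishes them sits at the end of the name.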

Adding tags, i.e. a list of arbitrary strings (excluding "," which would be the tag separator) would help. "prune" and other commands using "--prefix" would get a "--tags" option, and only archives which have _all_ (or _any_, discuss) tags listed would be affected (and they should be immutable for this reason).


EDIT: Different approach maybe, no extra metadata fields, backwards-applicable.

The names are already there. Most people probably already have some kind of "tag-gy" names, like those above or yyyy-mm-dd-hostname-part. We could just add something like --tags some,tags (always use , as delimiter here?) and --tag-delim - (what delim as default?). Then in stuff like prune:

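    # --tags is always comma-separated; --tag-delim splits the archive name
    wanted = set(args.tags.split(","))
    for archive in ...:
        # "all" semantics: every requested tag must appear in the name
        if wanted <= set(archive.name.split(args.tag_delim)):
            ...  # prune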
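A slightly fuller, self-contained sketch of the same idea, covering both the "all" and the "any" variant. The function names, the `mode` parameter and the sample archive names are illustrative assumptions, not existing borg code or options.

```python
def name_tags(name, delim="-"):
    """Treat each delimiter-separated part of an archive name as a tag."""
    return set(name.split(delim))


def matches(name, wanted, delim="-", mode="all"):
    """Check whether an archive name carries all (or any) of the wanted tags."""
    tags = name_tags(name, delim)
    if mode == "all":
        return wanted <= tags      # every wanted tag is present
    return bool(wanted & tags)     # at least one wanted tag is present


# Hypothetical value of a "--tags" option, always comma-separated.
wanted = set("web1,system".split(","))
names = ["system-web1-20160405", "20160405-web1-userdata", "20160406-db1-system"]

print([n for n in names if matches(n, wanted)])              # "all": first archive only
print([n for n in names if matches(n, wanted, mode="any")])  # "any": all three archives
```

One consequence of splitting names this way: if the delimiter also appears inside a field, as in a yyyy-mm-dd date with "-" as delimiter, each piece of that field becomes its own tag.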
enhancement

All 5 comments

Mixing names and tags feels unclean. Tags could be separate archive metadata.

Good point, but I'm not sure it's actually a problem here (as a design decision). #866 made me think "Hm, what _is_ the archive name really for?". "Recycling" it for tagging isn't really clean, but it seems quite practical to me (if it's 100% explicit opt-in). In a way, "tags" would just be a different way of looking at the "name" field.

I'd like to bump this feature request for tags/aliases.

After spending so much time in the git universe, I find myself wishing I could apply additional tags to specific borg archives.

Embedding tags in the archive name is currently possible, but it's quite unruly when you want to use multiple tags for an archive. For example, I already use the archive name to embed hostname, timestamp, and one or two other fields. I also want to add additional tags such as "@latest" and "@release-1". This gets messy quickly. Worse, I sometimes want to move a tag such as @latest from one archive to another.

If you're just using borg to back up files (granted, its original mission), there probably isn't a lot of need for tags. But if, like me, you have found borg's deduplication to be massively useful in other situations, like archiving very large files used in a data analysis pipeline :-) then the ability to assign multiple tags to an existing archive becomes really important.

Currently, my work-around is to create the original archive with the naming scheme I've devised, and then to immediately create multiple additional archives with names that begin with "@" -- @latest, @v1.0, @beta2, etc. Each one of those additional archives takes a couple minutes to scan/create, and adds just a few hundred bytes to the repository since the contents are completely identical to the original archive. (Well, as long as the files haven't changed in those couple minutes.)

It would be really nice to eliminate that slowdown by adding tag metadata.

I envision the UI being something like this:

  • Create a new tag and point it to an existing archive:
    borg tag [repo::archive-name] [tag1] [tag2] ...

  • List all tags and the archives they point to:
    borg tag --list [repo]

  • Deleting tags could re-use the existing borg delete command or could also be a command option:
    borg tag -d [repo] [tagname]

Thanks for considering this!
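The behaviour behind the proposed commands can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: `TagRegistry` and its methods are hypothetical names, nothing here is part of borg, and a real implementation would have to store the tags as repository metadata.

```python
class TagRegistry:
    """Toy in-memory model: each tag points at exactly one archive."""

    def __init__(self):
        self._tags = {}  # tag name -> archive name

    def tag(self, archive, *tags):
        """Create the tags, or move existing ones, so they point at `archive`."""
        for t in tags:
            self._tags[t] = archive

    def list_tags(self):
        """Return (tag, archive) pairs sorted by tag name."""
        return sorted(self._tags.items())

    def delete(self, tag):
        """Remove a tag; the archive it pointed to is left untouched."""
        del self._tags[tag]


reg = TagRegistry()
reg.tag("web1-2016-04-05", "@latest", "@v1.0")
reg.tag("web1-2016-04-06", "@latest")  # moves @latest to the newer archive
reg.delete("@v1.0")
print(reg.list_tags())
```

Because a tag is only a pointer, moving @latest costs a dictionary update here rather than a second scan of the data, which is the slowdown the workaround described above suffers from.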

I only started trying out borg recently, but wanted to +1 the tagging idea. I can see a use case relevant to backups in which tags define which of multiple cloud services an archive is backed up to. I imagine (based on other discussions) that the cloud backup would most likely be handled by a separate tool which picks up on the tags and, for example, creates a *.tgz file to be uploaded. (You could even add backup frequency as a separate detectable tag, but that sort of thing would be within the scope of the backup tool rather than borg itself.)

See issue #2300 for a possible tag implementation. It's currently more like git tag than like Gmail labels -- in other words, additional aliases can exist for an archive, but they need to be unique. It might not be hard to merge that idea with what's discussed here -- labels applied to multiple archives.
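The difference between the two models can be sketched in a few lines of Python. This is an assumption-laden illustration, not the implementation from #2300: the names `aliases`, `labels`, `add_alias` and `add_label` are hypothetical, and "unique" is interpreted here as "an alias name may be used only once".

```python
from collections import defaultdict

aliases = {}                # git-tag style: alias -> exactly one archive
labels = defaultdict(set)   # Gmail style: label -> any number of archives


def add_alias(alias, archive):
    """Refuse to reuse an alias that already points at some archive."""
    if alias in aliases:
        raise ValueError(f"alias {alias!r} already exists")
    aliases[alias] = archive


def add_label(label, archive):
    """Attach a label; many archives may share the same label."""
    labels[label].add(archive)


add_alias("@release-1", "web1-2016-04-05")
add_label("offsite", "web1-2016-04-05")
add_label("offsite", "web1-2016-04-06")

try:
    add_alias("@release-1", "web1-2016-04-06")  # second use of the same alias
except ValueError as err:
    print(err)

print(sorted(labels["offsite"]))
```

Merging the two ideas would mostly mean dropping the uniqueness check, so that one name can map to a set of archives instead of a single one.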
