Borg: size placeholders for borg list <repo>

Created on 24 Jul 2017  ·  5 Comments  ·  Source: borgbackup/borg

borg list only supports listing archive, barchive, time and id for an archive. But when I see, for example, that I need space and want to delete big archives (e.g. because of https://github.com/borgbackup/borg/issues/2870), I'd like to see an overview of all archive sizes to find the largest ones and potentially delete them.
I know I could run borg info on each archive, but that takes some time and is inconvenient for many archives.
Of course, I am only interested in the "deduplicated size" of "this archive".

But if you want, you could also sum up the values (or however you compute it) and display a line like "size of all archives: XY GB" at the bottom.

Maybe you need to cache this information somehow/somewhere, but that should be possible: usually, the size of an archive does not change, so the cache would never have to expire (unless an archive is deleted).
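As a stopgap today, the per-archive borg info approach looks roughly like this (a minimal, hedged sketch: /path/to/repo is a placeholder, and parsing the human-readable borg info output is more fragile than using --json):

REPO=/path/to/repo
# List all archive names, then print each archive's deduplicated size.
for archive in $(borg list --short "$REPO"); do
    printf '%s\t' "$archive"
    # The "This archive:" row shows original, compressed and deduplicated size;
    # its last two fields are the deduplicated size and its unit.
    borg info "$REPO::$archive" | grep '^This archive' | awk '{print $(NF-1), $NF}'
done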

All 5 comments

Until recently, the infrastructure needed for this was missing; I added it (to the master branch) when I implemented the "comment" placeholder for borg list <repo>. It can now compute values on demand instead of only showing data from the manifest entry, as before.

But be aware that computing anything that needs to read the whole archive metadata will be slow, especially if the listing shows many archives and/or the repo is accessed over a slow connection.

borg info -a '*', borg info --last / --first x
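For example (a sketch with a placeholder repo path and glob, assuming a borg version whose borg info accepts these archive filters):

# Stats for all archives matching a name glob (reads each archive's metadata, so it is slow).
borg info -a 'hostname-*' /path/to/repo

# Stats for only the three most recently created archives.
borg info --last 3 /path/to/repo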

borg info -a '*'

Interesting that this is possible, but it is rather slow.

But be aware that computing anything that needs to read the whole archive metadata will be slow, especially if the listing shows many archives and/or the repo is accessed over a slow connection.

Yeah, that's why I said: Can't you just cache this fact?

Interesting that this is possible, but it is rather slow.

_Deduplicated size_ has to be calculated, and cannot be cached — so this will always be kinda slow, though #2764 makes it a fair amount faster.

Wishing for this feature myself, I slapped together this (technically :wink:) one-liner to get an archive list that looks the way I'd like it to:

printf 'Archive\t\t\t\t\tOrig\tComp\tDedup\n'; printf '%-32.32s\t%s\t%s\t%s\n' $(borg info --json --sort-by name --glob-archives '*' REPO | jq '.archives[] | "\(.name) \(.stats.original_size) \(.stats.compressed_size) \(.stats.deduplicated_size)"' | sed --expression='s/^"//;s/"$//' | numfmt --field='2-4' --to=iec)

It uses jq to format the JSON and numfmt from coreutils to make the sizes human-readable. The result looks like this (trimmed to a representative set of lines):

Archive                             Orig    Comp    Dedup
hostname-home-20190613-090600       288G    92G     12K
hostname-home-20190613-091013       288G    92G     117K
hostname-home-20190617-005337       220G    61G     6.9M
hostname-home-20190617-022904       288G    92G     16M
hostname-home-20190617-225658       288G    92G     40K
hostname-sysconfig-20190617-023108  12M     3.2M    40K
hostname-sysconfig-20190617-225820  12M     3.2M    32K
hostname-sysconfig-20190618-144623  12M     3.2M    105K
hostname-sysconfig-20190621-224259  13M     3.3M    110K
hostname-system-20190613-081754     300G    97G     20M
hostname-system-20190613-091212     300G    97G     14M
hostname-system-20190618-144635     300G    97G     37M
hostname-system-20190621-224311     308G    98G     4.6M
hostname-system-20190621-230350     308G    98G     617K

With only 39 archives the speed is OK, but I guess running it with --last 1 as part of the backup run and storing the result in a separate log to consult on demand is going to be the way to use it in practice.
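For instance, along those lines (an untested sketch with placeholder paths), a post-backup step could append the newest archive's stats to a log:

REPO=/path/to/repo
LOG=/var/log/borg-archive-sizes.log
# Append "name original compressed deduplicated" (sizes in IEC units) for the most recent archive.
borg info --json --last 1 "$REPO" \
  | jq -r '.archives[] | "\(.name) \(.stats.original_size) \(.stats.compressed_size) \(.stats.deduplicated_size)"' \
  | numfmt --field=2-4 --to=iec >> "$LOG"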
