Right now borg diff shows something like this:
+27.4 kB -27.4 kB var/vmail/example.com/foobar/Auto/Steam und Co/dovecot-uidlist
+490.4 kB -489.3 kB var/vmail/example.com/foobar/Auto/Steam und Co/dovecot.index.cache
+29.6 kB -29.3 kB var/vmail/example.com/foobar/Auto/Steam und Co/dovecot.index.log
added 62.77 kB var/vmail/example.com/foobar/Auto/Steam und Co/new/1520542202.M174402P24533.turtle,S=62768,W=63599
+4.5 kB -4.5 kB var/vmail/example.com/foobar/maildirsize
And that's great, except that these human-readable sizes are a bit tough for a machine to parse. Ideally there would be an option to output machine-readable sizes (i.e. just the bytes, without any unit prefix).
Speaking of prefixes: are these kB as in 1000 bytes or KiB as in 1024 bytes? While we are at it, maybe this could be clarified, too. Thanks!
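To illustrate why the rounded strings are hard to consume, here is a minimal sketch of the parsing a script would have to do today. The helper parse_size and its unit table are hypothetical and not part of borg; the base is left as a parameter because the thread has not settled whether kB means 1000 or 1024 bytes.

```python
from decimal import Decimal

# Unit exponents; the base (1000 or 1024) is passed in by the caller
# because the meaning of "kB" is exactly the open question here.
_EXPONENTS = {"B": 0, "kB": 1, "MB": 2, "GB": 3, "TB": 4}


def parse_size(text, base):
    """Convert a string such as '27.4 kB' into an approximate byte count."""
    number, unit = text.split()
    # Decimal avoids binary floating-point surprises; int() truncates.
    return int(Decimal(number) * base ** _EXPONENTS[unit])


print(parse_size("62.77 kB", base=1000))  # 62770
print(parse_size("62.77 kB", base=1024))  # 64276
```

Either way the result is only an approximation: the Maildir file name in the sample output carries S=62768, which the rounded "62.77 kB" cannot reproduce exactly. Likewise, "+27.4 kB -27.4 kB" hides whatever small difference there was between the two sizes.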
Did you check whether we have json output support for that?
I did indeed check borg help diff for that. The only thing related to JSON I found is this:
--log-json Output one JSON object per log line instead of
formatted text.
So no luck there I am afraid.
OK, if we do not have that yet, adding json output seems to be a good idea.
In that case I would also like to request including both the uncompressed and compressed size of each change. As far as I can understand the Borg source code, the current output refers to the uncompressed size.
I propose adding a --json-output option for diff.
The diff output would look something like this:
{
  "added": [
    {"path": "/path/to/file", "change": "27.4 kB"}
  ],
  "modified": [
    {"path": "/path/to/file", "change": "+490.4 kB -489.3 kB"}
  ],
  "deleted": [
    {"path": "/path/to/file", "change": "directory"}
  ]
}
This way, users get an easy way to parse the output, because the changes come in separate groups and fields.
So I'm going to add an inner function print_output_json to do_diff, which will produce output like the above from the diffs generator.
Also, for a non-human-readable format, we could add a --bytes option that shows file sizes without units. I think it would be nice to have this option for the standard output, too.
Currently, the diff function uses the overridden ItemDiff.__repr__(), which in turn uses ItemDiff._content_string() to get the difference representation. As we can't add arguments to __repr__, in my opinion it would be good to add a separate function and move the current __repr__ content there. __repr__ will invoke that function to get the standard output. The function will take an argument (e.g. in_bytes) and pass it to Item.get_size(). Frankly, I haven't dived into Item.get_size() yet, but I think I can get the item size in bytes.
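The refactoring described above could be sketched roughly as follows. SizeChange, its format() method and the simplified _human() helper are hypothetical stand-ins for illustration only; they are not borg's actual ItemDiff or Item.get_size(), and the decimal kB in _human() is an assumption.

```python
class SizeChange:
    """Hypothetical stand-in for a diff entry; not borg's actual ItemDiff."""

    def __init__(self, path, added, removed):
        self.path = path
        self.added = added      # bytes added
        self.removed = removed  # bytes removed

    @staticmethod
    def _human(size):
        # Simplified formatter for this sketch (assumes decimal kB).
        return f"{size / 1000:.1f} kB"

    def format(self, in_bytes=False):
        # All formatting lives here, so it can take arguments.
        fmt = str if in_bytes else self._human
        return f"+{fmt(self.added)} -{fmt(self.removed)} {self.path}"

    def __repr__(self):
        # __repr__ cannot take arguments, so it delegates with defaults.
        return self.format()


change = SizeChange("maildirsize", added=4500, removed=4480)
print(repr(change))                  # +4.5 kB -4.5 kB maildirsize
print(change.format(in_bytes=True))  # +4500 -4480 maildirsize
```

The default output stays unchanged, while a caller that knows about the new function can ask for raw byte counts.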
Any thoughts, suggestions?
I propose to add an option --json-output for diff.
That should either be --json or --json-lines (depending on which it outputs) for consistency with other commands.
And a diff output should be something like that:
Given this way, users will get easy way to parse the output
No, that looks rather awkward to parse and recombine into something useful. The whole change part is no better than parsing the existing non-JSON output, I'm afraid.
I'd rather suggest one entry per path, having a type field for file/directory/hardlink/softlink etc., a change list that lists change types, e.g. added, deleted, modified (or content :thinking:), owner, mode, etc. Sizes should be given in bytes, of course, and probably rather like sizes { old: 12345, new: 12346 }.
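As a rough sketch of that suggestion, the following emits one JSON object per path, one per line. The helper diff_entry and the exact field names (path, type, changes, sizes) are assumptions made for illustration; the thread does not fix a schema, and this is not borg's actual output format.

```python
import json


def diff_entry(path, item_type, changes, old_size=None, new_size=None):
    """Build one per-path record; field names are hypothetical."""
    entry = {"path": path, "type": item_type, "changes": changes}
    # Sizes are plain byte counts, only present when known.
    if old_size is not None or new_size is not None:
        entry["sizes"] = {"old": old_size, "new": new_size}
    return entry


entries = [
    diff_entry("var/vmail/example.com/foobar/maildirsize", "file",
               ["modified"], old_size=12345, new_size=12346),
    diff_entry("var/vmail/example.com/foobar/old", "directory", ["deleted"]),
]

# One JSON object per line, so a consumer can process entries as a stream.
for entry in entries:
    print(json.dumps(entry, sort_keys=True))
```

With one record per path, a consumer can filter on type or changes without regrouping anything, and the sizes need no unit parsing.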