mc mirror overwrite currently broken

Created on 29 Jan 2020 · 12 Comments · Source: minio/mc

Expected behavior

mc mirror --overwrite should detect changed files

Actual behavior

It seems it currently doesn't.

Steps to reproduce the behavior

$ mc mb myminio/mybucket 
Bucket created successfully `myminio/mybucket`.

$ echo one > testdir/testfile.txt

$ cat testdir/testfile.txt 
one

$ mc mirror --overwrite testdir myminio/mybucket 
...estfile.txt:  4 B / 4 B ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 227 B/s 0s

$ mc cat myminio/mybucket/testfile.txt
one

$ echo two > testdir/testfile.txt

$ cat testdir/testfile.txt 
two

$ mc mirror --overwrite testdir myminio/mybucket 
 0 B / ? ┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓┃ 0s

$ mc cat myminio/mybucket/testfile.txt
one

mc --version

mc version RELEASE.2020-01-25T03-02-19Z

System information

Client and Server: Fedora 31 with XFS as filesystem
minio version 2020-01-25T02:50:51Z

Labels: community, medium, stale

All 12 comments

With --overwrite and --preserve:

$ mc mb myminio/mybucket
Bucket created successfully `myminio/mybucket`.

$ echo one > testdir/testfile.txt

$ cat testdir/testfile.txt 
one

$ mc mirror --overwrite --preserve testdir myminio/mybucket 
...estfile.txt:  4 B / 4 B ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 283 B/s 0s

$ mc cat myminio/mybucket/testfile.txt
one

$ echo two > testdir/testfile.txt

$ cat testdir/testfile.txt 
two

$ mc mirror --overwrite --preserve testdir myminio/mybucket 
 0 B / ? ┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓┃ 0s

$ mc cat myminio/mybucket/testfile.txt
one

@sebschlue This is actually known and expected. mc mirror does not detect changes in a file if its size does not change; here, "one" and "two" have the same length.
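To make the point above concrete, here is a minimal sketch of a size-only change check. It only illustrates the behavior described in the comment and is not mc's actual code; the function name `changed_by_size` is hypothetical.

```python
# Illustration only: a size-only change check, as described in the comment above.
# This is NOT mc's actual implementation; the function name is hypothetical.

def changed_by_size(local: bytes, remote: bytes) -> bool:
    """Report a change only when the two payloads differ in length."""
    return len(local) != len(remote)


old = b"one\n"  # what `echo one` wrote (4 bytes)
new = b"two\n"  # what `echo two` wrote (4 bytes)

print(changed_by_size(new, old))         # False: same length, the change is missed
print(changed_by_size(b"three\n", old))  # True: the length differs, the change is seen
```

With a check like this, any edit that keeps the byte count the same is invisible, which matches the transcript in the report.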

@vadmeste What is the limitation that causes this? It seems inconvenient at best.

> @vadmeste What is the limitation that causes this? It seems inconvenient at best.

No checksum is stored on the server side (in some cases the ETag is not equal to the md5sum of the object).
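One commonly observed case where the ETag differs from the object's md5sum is a multipart upload. The sketch below assumes the widely seen S3-style convention (MD5 over the concatenated per-part MD5 digests, followed by `-<part count>`); the thread does not confirm that this is exactly what the server here does, and the function names are hypothetical.

```python
import hashlib


def plain_md5(data: bytes) -> str:
    """Hex MD5 of the whole payload, i.e. what `md5sum` would print."""
    return hashlib.md5(data).hexdigest()


def multipart_style_etag(data: bytes, part_size: int) -> str:
    """ASSUMED convention: MD5 of the concatenated part digests, plus '-<parts>'."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)] or [b""]
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


data = b"a" * 10
print(plain_md5(data))                          # md5sum of the content
print(multipart_style_etag(data, part_size=4))  # different hex value, ends with "-3"
```

Under this assumption a client cannot compare a local md5sum against the ETag unless it also knows the part size used for the upload.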

On the Slack channel, some people confirmed that it should work when using --preserve.

> > @vadmeste What is the limitation that causes this? It seems inconvenient at best.
>
> No checksum is stored on the server side (in some cases the ETag is not equal to the md5sum of the object).

Ouch. That means that for snapshotting certain things we'd need to rely on rsync.
Is there a way to append or change some harmless metadata that is checked, to force this? Or to ensure the ETag is equal to the hash?

> > No checksum is stored on the server side (in some cases the ETag is not equal to the md5sum of the object).
>
> Ouch. That means that for snapshotting certain things we'd need to rely on rsync.
> Is there a way to append or change some harmless metadata that is checked, to force this? Or to ensure the ETag is equal to the hash?

For that, use rclone, @seqizz, which calculates a checksum of the entire content. The ETag is not always the md5sum (see SSE-C, multipart uploads, etc.), and md5sum is not reliable either: many objects out there can simply match the same md5sum (https://www.mscs.dal.ca/~selinger/md5collision/), and this is apparently quite common at scale.

Unless, of course, we calculate a checksum of the entire object using something like blake2b. But we would need to calculate it before uploading the content, which slows the upload down significantly.

rsync is meant for local-disk-to-remote-disk copies and uses a delta protocol that reads both ends to compute checksums. That would be unexpected for object storage because of cloud costs.
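The full-content checksum idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the digest recorded "at upload time" is simply held in a variable, and nothing here reflects how mc or rclone actually store or compare checksums.

```python
import hashlib
import os
import tempfile


def blake2b_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def content_changed(path: str, stored_digest: str) -> bool:
    """Compare the current file content against a digest recorded earlier."""
    return blake2b_of_file(path) != stored_digest


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testfile.txt")
    with open(path, "wb") as f:
        f.write(b"one\n")
    digest_at_upload = blake2b_of_file(path)  # hypothetical: kept alongside the object

    with open(path, "wb") as f:
        f.write(b"two\n")                     # same size, different content
    print(content_changed(path, digest_at_upload))  # True
```

The cost noted in the comment is visible here: every file has to be read in full before the decision to upload can be made.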

Ah, of course. I'm just spitballing, since I'm currently not bound by "cloud traffic costs" :) I'll check out rclone. Thanks.

Just curious, would it even be possible for minio to add another header like the ETag, but containing a hash (set on create/modify), without breaking compatibility?

> Just curious, would it even be possible for minio to add another header like the ETag, but containing a hash (set on create/modify), without breaking compatibility?

It is definitely possible, @seqizz, but it is going to be very mc-specific: we have no control over your storage backend anyway, so any state change made there wouldn't be properly understood by mc.

This can lead to issues such as double copies. It was left out on purpose, as we couldn't figure out a cost-effective way to do it properly for all general use cases.

Can this issue be closed, then?

IMHO this needs to be documented more clearly, preferably directly in the mirror section of the mc documentation.
But yeah, if this is how minio works, it doesn't sound like a bug. 👍

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
