mc mirror should always mirror files with changes
As described in the original issue (https://github.com/minio/mc/issues/2187), I'm still able to reproduce this behavior: files with the same size are not synced even if their content has changed.
mc version RELEASE.2020-07-11T05-18-52Z
Ubuntu 18.04
@xoxys can you share your mc mirror command line?
@harshavardhana sure:
mc mirror --no-color --overwrite --remove downloads/testing /local/path/testing
Hi,
I can also confirm I have the exact same issue. Using --md5 doesn't do anything (by the way, this option is missing from the MinIO Client guide).
I also removed the file locally, created a new one with the same name, and filled it with random content of exactly the same size, but the file wasn't uploaded.
I'm running the latest versions of MinIO (RELEASE.2020-07-31T03-39-05Z) and mc (RELEASE.2020-07-31T23-34-13Z).
As a "workaround", the only option is mc rm --recursive --force followed by mc cp --recursive, but this is not efficient at all.
mc mirror --json --overwrite --remove --preserve --md5 "${APTLY_DIR}/apt/db" "${APTLY_MINIO_PATH}/apt/db"
{
"status": "success",
"total": 0,
"transferred": 0,
"speed": 0
}
Are there any updates on this, please?
This is actually incredibly easy to reproduce.
mc mb minio/test
mkdir tmp
cd tmp
echo "this is some random content" > content.txt
mc mirror --json --overwrite --remove --preserve --md5 ./ minio/test/
# {
# "status": "success",
# "source": "/root/tmp/content.txt",
# "target": "minio/test/content.txt",
# "size": 28,
# "totalCount": 1,
# "totalSize": 28
# }
# {
# "status": "success",
# "total": 0,
# "transferred": 56,
# "speed": 2485.540810183332
# }
echo "this is SOME rand0m c0ntent" > content.txt
# Note the file size is the same but the content is different.
mc mirror --json --overwrite --remove --preserve --md5 ./ minio/test/
# {
# "status": "success",
# "total": 0,
# "transferred": 0,
# "speed": 0
# }
rm content.txt
mc mirror --json --overwrite --remove --preserve --md5 minio/test/ ./
# {
# "status": "success",
# "source": "minio/test/content.txt",
# "target": "content.txt",
# "size": 28,
# "totalCount": 1,
# "totalSize": 28
# }
# {
# "status": "success",
# "total": 0,
# "transferred": 56,
# "speed": 9440.131109935215
# }
cat content.txt # the content is still the original text from the first echo command
# this is some random content
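The reproducer works because both echo strings have exactly the same length. To make the failure mode concrete, here is a minimal Python sketch (an illustration only, not mc's actual diff code) showing that a size-only comparison cannot tell the two versions of content.txt apart, while an MD5 checksum comparison can. The helper names size_differs and md5_differs are hypothetical.

```python
import hashlib

# The two versions of content.txt from the reproducer (echo appends a newline).
old = b"this is some random content\n"
new = b"this is SOME rand0m c0ntent\n"


def size_differs(a: bytes, b: bytes) -> bool:
    """Hypothetical size-only check: reports a change only if lengths differ."""
    return len(a) != len(b)


def md5_differs(a: bytes, b: bytes) -> bool:
    """Hypothetical checksum check: reports a change if the MD5 digests differ."""
    return hashlib.md5(a).hexdigest() != hashlib.md5(b).hexdigest()


if __name__ == "__main__":
    print(len(old), len(new))      # 28 28, matching "size": 28 in the mc output
    print(size_differs(old, new))  # False: a size-only check skips the file
    print(md5_differs(old, new))   # True: a checksum check would copy it
```

The trade-off is cost: a checksum check has to read every byte on both sides, whereas a size check only needs metadata.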
Hi @aureq,
I'll look into this as soon as I'm done with the issue I'm currently working on.
Tested with the reproducer from @aureq:
Great!
Very helpful information.
Thank you.
@ebozduman I understand there is logic that checks whether the "same file" already exists in the destination and, if so, skips it to speed up the whole process. In our use case, it would be perfectly fine to always mirror all files without this optimization, as we rarely have 100% identical files. That is, I'd like to err on the safe side: be slightly less efficient, but with no chance of "forgetting" a file.
A way to disable this logic would be great, e.g. a --force or --force-overwrite flag.
@jnweiger in that case you would wait a lot longer between syncs. Personally, I don't want a global ignore flag or anything like that; I would like a sync that is both working AND efficient.
Another option could be to compare the files' last-modified timestamps.
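As a rough sketch of the timestamp idea suggested above, assuming a hypothetical needs_copy() helper (this is not how mc decides), a "copy if the size differs or the source is newer" rule can be tried on two local files. The timestamps are pinned with os.utime so the outcome is deterministic.

```python
import os
import tempfile


def needs_copy(src: str, dst: str) -> bool:
    """Hypothetical size + mtime policy (not mc's actual logic).

    Copy if the destination is missing, the sizes differ, or the source
    has a strictly newer modification time than the destination.
    """
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True
    return s.st_mtime_ns > d.st_mtime_ns


def demo(src_mtime_ns: int, dst_mtime_ns: int) -> bool:
    """Write two same-size files with different content, pin their mtimes,
    and report whether needs_copy() would copy the source."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.txt")
        dst = os.path.join(tmp, "dst.txt")
        with open(dst, "wb") as f:
            f.write(b"this is some random content\n")
        with open(src, "wb") as f:
            f.write(b"this is SOME rand0m c0ntent\n")
        # Whole-second timestamps keep the result independent of the
        # filesystem's timestamp resolution.
        os.utime(dst, ns=(dst_mtime_ns, dst_mtime_ns))
        os.utime(src, ns=(src_mtime_ns, src_mtime_ns))
        return needs_copy(src, dst)


print(demo(20_000_000_000, 10_000_000_000))  # True: same size, newer source
print(demo(10_000_000_000, 10_000_000_000))  # False: same size, same mtime
```

The second call shows the limit of this approach: if the mtimes match (for example, because they were preserved on copy), a same-size content change is still missed.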
@xoxys, @aureq, @jnweiger, @i0x71,
One last question: none of you are using a multi-site setup, right?
That is, there is a single MinIO server running in your setup, correct?
@ebozduman correct, single server setup here
I've been having the same issue too, and I'm using a single MinIO server setup, if it helps.
@ebozduman Same here, it's a single-site setup as well, so a single MinIO server is running.
@xoxys, @aureq, @jnweiger, @i0x71, @dpgarrick,
As @harshavardhana explained in his comment in PR#3353, to answer my question:
I asked:
@harshavardhana, please check PR#3226: "diff: Disable comparing modtimes when multimaster context is not found" and your comments there. It looks like what this PR attempts to fix was deliberately removed back then. So Issue #3331 doesn't look like a regression.
@harshavardhana answered:
The reason for this, @ebozduman, is that we have no way to know which copy is the latest, because last-modified is not a reliable way to tell; that's why active/active is necessary if mtime-based mirroring is required.
So, here is an example of how you can use mc mirror's --active-active flag as a solution:
- Start your minio server
- Using @aureq's easy reproduction steps, create your bucket and source files:
$ mc mb myminio/test
$ mkdir tmp
$ echo "this is some random content" > ./tmp/content.txt
$ mkdir tmp1
$ echo "this is SOME rand0m c0ntent" > ./tmp1/content.txt
"./tmp1/content.txt" will have a later/newer "mtime" than "./tmp/content.txt"
- Start an active/active session that watches for changes in the "./tmp/" directory and instantly copies changed files to the MinIO server, including modified files/objects that happen to keep the same size but have a newer mtime:
$ mc mirror --active-active -a ./tmp/ myminio/test/
The content of your "./tmp/" directory will be mirrored into "myminio/test/" as soon as the active/active mirror session starts syncing, and the original file attributes, including mtime, are preserved.
- Copy the "./tmp1/content.txt" file into "./tmp/", preserving the file attributes. This triggers an automatic copy of the new content.txt file, which has a newer mtime.
$ mc copy -a ./tmp1/content.txt ./tmp/
If you try to copy the same file with an older mtime, the mirroring process will not sync it to the destination.
I hope the --active-active flag and the above example help to resolve the issue in your scenarios.
Please let us know what you think, as our intention is to close this issue.
@ebozduman Thanks for the details and the example. What exactly is --active-active doing? I don't really get it, and it looks like this feature is not documented.
@ebozduman Thanks for the information.
The help for the mc mirror command says: "--active-active enable active-active multi-site setup". My use case is backing up a local directory to a single-site MinIO deployment, so I agree with @xoxys that the documentation perhaps needs updating.
From a quick test, mc mirror --active-active seems to behave similarly to mc mirror --watch. If the mc mirror --active-active -a ./tmp my-minio/tmp-bak command is left running at all times in one terminal, it successfully captures my changes to files in the ./tmp directory, including changes where the file size stays the same.
But my use case is that I want to periodically (manually) back up a directory to MinIO by running the mc mirror -a /path/to/mydir my-minio/path/to/backup-of-mydir command, i.e. without something like mc mirror --watch or mc mirror --active-active running continuously in the background. For that use case, the original "actual" vs. "expected" behaviour described in this issue still exists. Also, if for whatever reason the mc mirror --active-active -a command is interrupted and restarted, any changes made while it was stopped to files whose size stayed the same will again not be captured.
So is my manual backup use case not supported, or is it only possible by running mc mirror --active-active continuously as a daemon?
More configurable options to make mc mirror compare files by size + modtime (even if modtime were only allowed with the -a flag) or by MD5 checksums would be helpful, to me at least, similar to how rclone behaves; e.g. see https://rclone.org/commands/rclone_sync/ and the --checksum flag here: https://rclone.org/flags/
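The one-shot, checksum-based mirror requested above can be sketched as a planning step over two listings. This is a hypothetical illustration in the spirit of a checksum sync with a --remove-like option, not mc's or rclone's implementation; plan_mirror() and the in-memory name-to-bytes maps are assumptions standing in for a local directory and a bucket.

```python
import hashlib
from typing import Dict, List, Tuple


def md5_hex(data: bytes) -> str:
    """MD5 digest of a blob, used here as the content fingerprint."""
    return hashlib.md5(data).hexdigest()


def plan_mirror(
    src: Dict[str, bytes], dst: Dict[str, bytes], remove: bool = True
) -> Tuple[List[str], List[str]]:
    """Hypothetical one-shot mirror plan that compares checksums, not sizes.

    Returns (to_copy, to_remove): names that are missing from the destination
    or whose content differs, and names that exist only in the destination
    (reported only when remove=True).
    """
    to_copy = sorted(
        name
        for name, data in src.items()
        if name not in dst or md5_hex(data) != md5_hex(dst[name])
    )
    to_remove = sorted(name for name in dst if name not in src) if remove else []
    return to_copy, to_remove


if __name__ == "__main__":
    source = {"content.txt": b"this is SOME rand0m c0ntent\n", "new.txt": b"hello\n"}
    target = {"content.txt": b"this is some random content\n", "stale.txt": b"old\n"}
    print(plan_mirror(source, target))
    # (['content.txt', 'new.txt'], ['stale.txt'])
```

Because it needs no running watcher, a plan like this suits a cron job. The cost is that every file has to be hashed on each run, and a real tool would also need checksums from the remote side.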
@xoxys,
--active-active does a full sync, and this special care is needed especially for more complex scenarios like multi-site setups, as the mc mirror --help page says:
--active-active enable active-active multi-site setup
Since --active-active does a full sync, it also honors mtime changes.
Another reason we've stopped checking mtime changes during regular mirroring is that an mtime change does not necessarily indicate a real change. For this reason, some users don't want to sync on mtime changes, because some apps out there, like jekyll build, modify mtime for no good reason at all, which extends the sync process drastically.
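The jekyll point above can be illustrated with a small Python sketch: a file is rewritten with identical bytes and given a later mtime, as a build tool might do. The file name index.html and the touch_demo() helper are hypothetical. An mtime-based check would re-upload the file, while a checksum-based check would correctly skip it.

```python
import hashlib
import os
import tempfile
from typing import Tuple


def snapshot(path: str) -> Tuple[int, str]:
    """Return (mtime in nanoseconds, MD5 hex digest) for a file."""
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return os.stat(path).st_mtime_ns, digest


def touch_demo() -> Tuple[Tuple[int, str], Tuple[int, str]]:
    """Rewrite a file with identical bytes and a later mtime."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.html")
        with open(path, "wb") as f:
            f.write(b"<h1>unchanged</h1>\n")
        os.utime(path, ns=(10_000_000_000, 10_000_000_000))
        before = snapshot(path)
        with open(path, "wb") as f:  # same content written again
            f.write(b"<h1>unchanged</h1>\n")
        os.utime(path, ns=(20_000_000_000, 20_000_000_000))
        after = snapshot(path)
        return before, after


before, after = touch_demo()
print(before[0] != after[0])  # True: an mtime-based check would re-upload
print(before[1] == after[1])  # True: a checksum-based check would skip it
```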
You are welcome to open an issue about the missing information/documentation for the --active-active flag.
I'll also raise the issue in our team meeting.
@dpgarrick,
Thank you for your input.
One clarification on your comment:
Also if for whatever reason the mc mirror --active-active -a command is interrupted and restarted again, any changes while the command was stopped that occurred to files which result in the same filesize will again not be captured.
Actually, when the mc mirror --active-active process is restarted after an interruption, it starts by syncing all changes found between the source and the destination. So it'll pick up everything.
Unfortunately manual backup/mirroring is not supported as far as mtime changes go.
Yes, you can always daemonize a process in shell.
"Actually, when the mc mirror --active-active process is restarted after an interruption, it starts by syncing all changes found between the source and the destination. So it'll pick up everything."
That didn't happen when I tested it as follows:
mkdir tmpdir
echo "1234" > tmpdir/tmp
mc mirror --active-active -a ./tmpdir my-minio/tmpdir-mirror
Wait 5 seconds and then interrupt the mc mirror
mc cat my-minio/tmpdir-mirror/tmp
should show 1234
Now do
echo "0000" > tmpdir/tmp
mc mirror --active-active -a ./tmpdir my-minio/tmpdir-mirror
Wait 5 seconds and then interrupt the mc mirror
mc cat my-minio/tmpdir-mirror/tmp
should now show 0000, but instead it still shows 1234
My versions
mc version RELEASE.2020-08-20T00-23-01Z
my-minio Version: 2020-08-27T05:16:20Z
Unfortunately manual backup/mirroring is not supported as far as mtime changes go.
In that case, which mc commands should be used to reliably back up data to MinIO via, for example, an automated nightly cron job? I run both manual and automated backups, but all of them use mc mirror in that fashion, and for various reasons I do not want to run mc mirror as a daemon for backups.
@ebozduman
Just realised this issue is a duplicate of #3060.
From all the discussion here and there, I understand why the default behavior is what it is, but it would be nice if this behavior were documented in the help for mc mirror.
It seems the only workaround at this stage is to use rclone.
@dpgarrick,
You are right. The only workaround at this point is to use rclone.
There is a MinIO document on how to use rclone: Rclone with MinIO Server.
I'm still unable to get it to work. Here's what I've tried:
I've started the mirror with:
mc mirror --active-active --no-color --overwrite --remove --quiet upload/mirror /path/to/local/dir/mirror
What I need is a way to periodically or continuously mirror from a central upload.example.com MinIO server to local directories on multiple web servers.
Maybe that's because I'm trying to mirror from a remote MinIO server to the local filesystem? Does the active-active/watch mirror only work from a local filesystem to a remote MinIO? If so, it's not a solution for me either, and I would also need to check out rclone.
@xoxys, @aureq, @jnweiger, @i0x71, @dpgarrick,
Finally, we do have a fix for this issue, PR#3402, which will be available in the next mc release.
I'll close this issue for now.
If you'd like, you can pick up the fix, try it in your setups, and let us know how it goes.