Mc: Minio fails to mirror file after changes if file's size matches

Created on 30 Jul 2020  ·  25 Comments  ·  Source: minio/mc

Expected behavior

mc mirror should always mirror files with changes

Actual

As described in the original issue (https://github.com/minio/mc/issues/2187), I'm still able to reproduce this behavior. Files with the same size are not synced even if the content has changed.

mc --version

mc version RELEASE.2020-07-11T05-18-52Z

System information

Ubuntu 18.08

community

All 25 comments

@xoxys can you share your mc mirror command line?

@harshavardhana sure:

mc mirror --no-color --overwrite --remove downloads/testing /local/path/testing

Hi,

I can also confirm I have the exact same issue. Using --md5 has no effect (by the way, this option is missing from the MinIO client guide).

I also removed the file locally, created a new one with the same name, and filled it with random content of the exact same size, but the file wasn't uploaded.

I'm running the latest versions of MinIO (RELEASE.2020-07-31T03-39-05Z) and mc (RELEASE.2020-07-31T23-34-13Z).

As a "workaround", the only way is to run mc rm --recursive --force and then mc cp --recursive, but this is not efficient at all.

mc mirror --json --overwrite --remove --preserve --md5 "${APTLY_DIR}/apt/db" "${APTLY_MINIO_PATH}/apt/db"
{
 "status": "success",
 "total": 0,
 "transferred": 0,
 "speed": 0
}

Are there any updates on this, please?

This is actually incredibly easy to reproduce.

mc mb minio/test
mkdir tmp
cd tmp
echo "this is some random content" > content.txt
mc mirror --json --overwrite --remove --preserve --md5 ./ minio/test/
# {
#  "status": "success",
#  "source": "/root/tmp/content.txt",
#  "target": "minio/test/content.txt",
#  "size": 28,
#  "totalCount": 1,
#  "totalSize": 28
# }
# {
#  "status": "success",
#  "total": 0,
#  "transferred": 56,
#  "speed": 2485.540810183332
# }

echo "this is SOME rand0m c0ntent" > content.txt
# Note the file size is the same but the content is different.

mc mirror --json --overwrite --remove --preserve --md5 ./ minio/test/
# {
#  "status": "success",
#  "total": 0,
#  "transferred": 0,
#  "speed": 0
# }

rm content.txt
mc mirror --json --overwrite --remove --preserve --md5 minio/test/ ./
# {
#  "status": "success",
#  "source": "minio/test/content.txt",
#  "target": "content.txt",
#  "size": 28,
#  "totalCount": 1,
#  "totalSize": 28
# }
# {
#  "status": "success",
#  "total": 0,
#  "transferred": 56,
#  "speed": 9440.131109935215
# }

cat content.txt # the file content is the original content from the first 'echo command'
# this is some random content
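
For anyone who wants to check the premise of the reproducer without a MinIO server, here is a minimal local sketch. It only shows that the two payloads above have the same byte count while their contents differ, which is why a size-only comparison cannot tell them apart. The file names a.txt and b.txt are placeholders, not part of the original steps.

```shell
# Create the two payloads from the reproducer in a scratch directory.
tmpdir=$(mktemp -d)
echo "this is some random content" > "$tmpdir/a.txt"
echo "this is SOME rand0m c0ntent" > "$tmpdir/b.txt"

# Byte counts (tr strips the padding some wc implementations add).
size_a=$(wc -c < "$tmpdir/a.txt" | tr -d ' ')
size_b=$(wc -c < "$tmpdir/b.txt" | tr -d ' ')

# cmp -s exits 0 only when the files are byte-for-byte identical.
if cmp -s "$tmpdir/a.txt" "$tmpdir/b.txt"; then
    identical=yes
else
    identical=no
fi

echo "size of a.txt: $size_a"
echo "size of b.txt: $size_b"
echo "identical content: $identical"

rm -rf "$tmpdir"
```

Both sizes come out as 28, matching the "size": 28 field in the JSON output above, while the content comparison reports a difference.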

Hi @aureq ,
I'll be looking into this as soon as I am done with the issue I am currently working on.

Tested with the reproducer from @aureq:

Great!
Very helpful information.
Thank you.

@ebozduman I understand there is logic that checks whether the "same file" already exists at the destination and then skips the file to speed up the entire process. In our use case, it would be perfectly fine to always mirror all files without such an optimization, as we rarely have 100% identical files. That is, I'd like to err on the safe side: be slightly less efficient, but with no chance of "forgetting" a file.
A way to disable the logic would be great, e.g. something with a --force or --force-overwrite effect.

@jnweiger in this case you will wait a lot longer between syncs... Personally I don't want a global ignore flag or whatever... I would like to have a working AND efficient sync.

Another way may be to compare the last-modified timestamps of the files.

@xoxys, @aureq, @jnweiger, @i0x71,

One last question: none of you were using a multi-site setup, right?
That is, there is a single minio server running in your setup, correct?

@ebozduman correct, single server setup here

I've been having this same issue also and I am using a single minio server set up, if it helps.

@ebozduman Same here, it's a single site setup as well. So single minio server running.

@xoxys, @aureq, @jnweiger, @i0x71, @dpgarrick,

As @harshavardhana explained in his comment in PR#3353, to answer my question:

I asked:

@harshavardhana, please check PR#3226: "diff: Disable comparing modtimes when multimaster context is not found" and your comments there. It looks like what this PR attempts to fix was actually removed deliberately back then. So, Issue #3331 doesn't look like a regression.

@harshavardhana answered:

The reason for this is we have no way to know which is latest @ebozduman because last-modified is not the correct way to know - that's why active/active is necessary if "mtime" based mirroring is required.

So, here is an example of how you can use mc mirror's --active-active flag as a solution:

- Start your minio server

- Following @aureq's easy reproduction steps, create your bucket and source files:
     $ mc mb myminio/test
     $ mkdir tmp
     $ echo "this is some random content" > ./tmp/content.txt
     $ mkdir tmp1
     $ echo "this is SOME rand0m c0ntent" > ./tmp1/content.txt
"./tmp1/content.txt" will have a later/newer "mtime" than "./tmp/content.txt"

- Start your active/active session that watches changes in the "./tmp/" directory and copies
over those changed files instantly to the minio server side, including modified files/objects
that happen to stay the same size and have a newer "mtime":
     $ mc mirror --active-active -a ./tmp/ myminio/test/
Your "./tmp/" directory content will be mirrored/copied over into "myminio/test/" as soon as
mirror active/active session starts to sync the contents and it also preserves original file
attributes, including "mtime".

- Copy "./tmp1/content.txt" file into "./tmp/" and preserve the file attributes. This will trigger
an automatic copy of the new content.txt file, which has a newer "mtime".
     $ mc cp -a ./tmp1/content.txt ./tmp/
If you try to copy the same file with an older "mtime", the mirroring process will not sync this file
to the destination.

I hope --active-active flag and the above example helps to resolve the issue in your scenarios.
Please let us know what you think as our intention is to close this issue.

@ebozduman Thanks for the details and the example. What exactly is --active-active doing? I don't really get it and it looks like this feature is not documented.

@ebozduman Thanks for the information.

The help for the mc mirror command says: --active-active - enable active-active multi-site setup. My use case is backing up a local directory onto a single-site minio deployment, so I agree with @xoxys that perhaps the documentation needs updating?

From a quick test mc mirror --active-active seems to function similarly to mc mirror --watch. If the mc mirror --active-active -a ./tmp my-minio/tmp-bak command is left running at all times in one terminal then it is able to successfully capture my changes to files in the ./tmp directory including changes where the filesize is the same.

But my use case is I want to (manually) periodically back up a directory onto minio by running the mc mirror -a /path/to/mydir my-minio/path/to/backup-of-mydir command, i.e. without something like mc mirror --watch or mc mirror --active-active continuously running in the background. For that use case the original "actual" vs "expected" behaviour described in this issue still exists. Also if for whatever reason the mc mirror --active-active -a command is interrupted and restarted again, any changes while the command was stopped that occurred to files which result in the same filesize will again not be captured.

So is my manual backup use case not supported or is it only possible by continuously running mc mirror --active-active as a daemon?

Some more configurable options to force mc mirror to operate on files by checking size + modtime (even if only allowing modtime when -a flag is used) or using md5 checksums would be helpful to me at least, similar to how rclone behaves, e.g. see https://rclone.org/commands/rclone_sync/ and --checksum flag here https://rclone.org/flags/
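
To make the requested options concrete, here is a minimal sketch of the three comparison strategies discussed in this thread: size only, size plus modtime, and full content comparison. It is not mc's implementation. The function name needs_sync and its mode names are hypothetical, and cmp stands in for a checksum comparison because both files are local here.

```shell
# needs_sync SRC DST MODE
# Exit status 0 means "copy SRC over DST", non-zero means "skip".
# MODE is one of: size, mtime, content (hypothetical names).
needs_sync() {
    src=$1
    dst=$2
    mode=$3

    # A missing destination always needs a copy.
    [ -e "$dst" ] || return 0

    # A size difference always needs a copy, whatever the mode.
    src_size=$(wc -c < "$src" | tr -d ' ')
    dst_size=$(wc -c < "$dst" | tr -d ' ')
    [ "$src_size" = "$dst_size" ] || return 0

    case "$mode" in
        size)    return 1 ;;                 # same size: treated as unchanged
        mtime)   [ "$src" -nt "$dst" ] ;;    # copy if the source is newer
        content) ! cmp -s "$src" "$dst" ;;   # copy if any byte differs
        *)       echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

# Demo: same-size files with different content; the destination is older.
tmpdir=$(mktemp -d)
echo "this is some random content" > "$tmpdir/dst.txt"
touch -t 202001010000 "$tmpdir/dst.txt"
echo "this is SOME rand0m c0ntent" > "$tmpdir/src.txt"

by_size=skip
by_mtime=skip
by_content=skip
needs_sync "$tmpdir/src.txt" "$tmpdir/dst.txt" size    && by_size=copy
needs_sync "$tmpdir/src.txt" "$tmpdir/dst.txt" mtime   && by_mtime=copy
needs_sync "$tmpdir/src.txt" "$tmpdir/dst.txt" content && by_content=copy

echo "size only:     $by_size"
echo "size + mtime:  $by_mtime"
echo "size + content: $by_content"

rm -rf "$tmpdir"
```

In this demo the size-only mode skips the file, which is the behaviour reported in this issue, while the other two modes would copy it.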

@xoxys,

--active-active does a full sync. This special care is needed especially for more complex scenarios like multi-site setups, as the mc mirror --help page says:

  --active-active                    enable active-active multi-site setup

Since --active-active does full sync, it also honors mtime changes.

Another reason we stopped checking mtime changes during regular mirroring is that an mtime change does not necessarily indicate a real change. For this reason, some users don't want to sync on mtime changes, because some apps out there, like jekyll build, modify mtime for no good reason at all and extend the sync process drastically.
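
The point about mtime can be illustrated locally with a small sketch. It assumes nothing about mc or jekyll: touch simply stands in for a tool that rewrites a file's timestamp without changing its bytes, and the file names are placeholders.

```shell
tmpdir=$(mktemp -d)
echo "unchanged content" > "$tmpdir/page.html"

# Keep a copy and push its mtime into the past, as if it had been synced earlier.
cp "$tmpdir/page.html" "$tmpdir/snapshot.html"
touch -t 202001010000 "$tmpdir/snapshot.html"

# Simulate a rebuild that refreshes the timestamp but writes the same bytes.
touch "$tmpdir/page.html"

if [ "$tmpdir/page.html" -nt "$tmpdir/snapshot.html" ]; then
    mtime_newer=yes
else
    mtime_newer=no
fi

if cmp -s "$tmpdir/page.html" "$tmpdir/snapshot.html"; then
    content_same=yes
else
    content_same=no
fi

echo "mtime newer:  $mtime_newer"
echo "content same: $content_same"

rm -rf "$tmpdir"
```

A mirror that relied on mtime alone would re-upload page.html here even though nothing in it changed.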

You are welcome to open up an issue for missing information/documentation about --active-active flag.
I'll also raise the issue in our team meeting.

@dpgarrick,

Thank you for your input.
One clarification on your comment:

Also if for whatever reason the mc mirror --active-active -a command is interrupted and restarted again, any changes while the command was stopped that occurred to files which result in the same filesize will again not be captured.

Actually, when the mc mirror --active-active process is restarted after an interruption, it starts by syncing all changes found between the source and the destination. So, it'll pick up everything.

Unfortunately, manual backup/mirroring is not supported as far as mtime changes go.

Yes, you can always daemonize a process in shell.

@ebozduman wrote:

"Actually, when mc mirror --active-active process is restarted after some interruption, it starts by syncing all changes found between the source and the destination. So, it'll pick up everything."

That didn't happen when I tested it as follows:

mkdir tmpdir
echo "1234" > tmpdir/tmp
mc mirror --active-active -a ./tmpdir my-minio/tmpdir-mirror

Wait 5 seconds and then interrupt the mc mirror
mc cat my-minio/tmpdir-mirror/tmp should show 1234
Now do

echo "0000" > tmpdir/tmp
mc mirror --active-active -a ./tmpdir my-minio/tmpdir-mirror

Wait 5 seconds and then interrupt the mc mirror
mc cat my-minio/tmpdir-mirror/tmp should now show 0000 but instead still shows "1234"

My versions

mc version RELEASE.2020-08-20T00-23-01Z
my-minio    Version: 2020-08-27T05:16:20Z

Regarding "Unfortunately manual backup/mirroring is not supported as far as mtime changes go.":

In that case, which mc commands should be used to reliably back up data to minio, for example via an automated nightly cron job? I run both manual and automated backups, but all of them use mc mirror in that fashion, and for various reasons I do not want to run mc mirror as a daemon for backups.

@ebozduman
Just realised this issue is a duplicate of #3060

From all the discussion here and there, I understand why the default behavior is what it is, but it would be nice if this issue could be documented in the help for mc mirror.

Seems the only workaround at this stage is to use rclone

@dpgarrick,

You are right. The only workaround at this point is to use rclone.

There is a MinIO document on how to use rclone: Rclone with MinIO Server

I'm still unable to get it to work... That's what I've tried now:

I've started the mirror with:

mc mirror --active-active --no-color --overwrite --remove --quiet upload/mirror /path/to/local/dir/mirror
  • on the initial start of the mirror diffs are synced
  • after the mirror command was started (still running!) changes on upload/mirror are not synced to the local directory (waiting > 5min now)

What I need is a way to periodically or continuously mirror from a central upload.example.com minio server to local directories on multiple web servers...

Maybe that's because I'm trying to mirror from a remote minio server to the local fs? Does the active-active/watch mirror only work from local fs to remote minio? In that case it's not a solution for me either, and I would also need to check out rclone.

@xoxys, @aureq, @jnweiger, @i0x71, @dpgarrick,

Finally, we do have a fix for this issue, PR#3402, which will be available in the next mc release.

I'll close this issue for now.
If you'd like you can pick up the fix and try it in your setups and let us know how it goes.
