Aws-cli: aws s3 sync does not synchronize s3 folder structure locally

Created on 12 Sep 2014  ·  100 Comments  ·  Source: aws/aws-cli

The aws s3 sync command does not fully synchronize the S3 folder structure locally, even when I use it with the --delete or --recursive arguments:

$ aws --version
aws-cli/1.4.3 Python/2.7.6 Linux/3.13.0-35-generic

$ aws s3 ls s3://s3.testbucket
$ aws s3 ls s3://s3.testbucket/
$ mkdir s3.testfolder
$ mkdir s3.testfolder/test1
$ aws s3 sync ./s3.testfolder s3://s3.testbucket/
$ aws s3 ls s3://s3.testbucket/
$ touch s3.testfolder/test1/1
$ aws s3 sync ./s3.testfolder/ s3://s3.testbucket/
upload: s3.testfolder/test1/1 to s3://s3.testbucket/test1/1
$ aws s3 sync ./s3.testfolder s3://s3.testbucket/
$ mkdir ./s3.testfolder/test-to-delete
$ aws s3 sync s3://s3.testbucket/ ./s3.testfolder/ --delete --recursive
$ aws s3 sync s3://s3.testbucket/ ./s3.testfolder/ --delete
$ ls -lah ./s3.testfolder/
total 60K
drwxrwxr-x 4 tobi tobi 4,0K szept 12 15:24 .
drwx------ 71 tobi tobi 44K szept 12 15:22 ..
drwxrwxr-x 2 tobi tobi 4,0K szept 12 15:23 test1
drwxrwxr-x 2 tobi tobi 4,0K szept 12 15:24 test-to-delete

$ aws s3 ls s3://s3.testbucket/
PRE test1/

feature-request s3 s3sync

All 100 comments

This behavior is known. The reason why the sync command behaves this way is that s3 does not physically use directories. There are only buckets and objects. Objects have prefixes that act like directories, but s3 does not designate a specific physical object to be a directory.

Therefore, when the syncing occurs, only files are transferred to s3 because s3 does not have physical directories. So when you try to sync up empty directories, nothing is uploaded because there are no files in them. Once you put items in the directory, then the file (with the prefix representing the directory) will be uploaded.
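
To make the point above concrete, here is a minimal sketch (plain Python, no AWS calls) of how "directories" in S3 exist only as prefixes implied by object keys. The function name `implied_prefixes` is hypothetical and is not part of aws-cli. The sample key mirrors the tree from the original report.

```python
def implied_prefixes(keys):
    """Return every "directory" prefix implied by a list of S3 object keys."""
    prefixes = set()
    for key in keys:
        # Everything before the last "/" is a chain of prefixes; the last
        # component is the object name itself.
        parts = key.split("/")[:-1]
        for depth in range(1, len(parts) + 1):
            prefixes.add("/".join(parts[:depth]) + "/")
    return sorted(prefixes)


# Only files become objects, so the empty local directory test-to-delete/
# contributes no key and therefore no prefix.
uploaded_keys = ["test1/1"]
print(implied_prefixes(uploaded_keys))  # ['test1/']
```

This matches the `PRE test1/` line in the listing from the report: the prefix shows up only because an object key starts with it.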

Thank you Kyle, that is clear. I know how S3 stores files, but sometimes we need the same directory structure in several places, including empty directories, and we need to be able to remove directories we no longer need.
A good example: you have a complex local directory structure with a lot of content, which you sync to S3. An automated mechanism then syncs this structure periodically to several running instances. Later you update S3, deleting most of the content, and the automation re-syncs to the places where the content was used before. Unfortunately, the original complex directory structure remains forever on the sync targets, which can cause confusion when you inspect it, or when your program tries to use these empty folders because it expects the same layout everywhere. Moreover, people who use the --delete option may have used the rsync equivalent on Linux before, which keeps folders in sync, so they count on the same behavior.
I think it would not be hard to implement a switch or option for the aws tool to detect whether an S3 object is a file or a folder (list, size, etc.) and create or delete it locally or in an S3 bucket (e.g. list(bucket.list("", "/")))?
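
Until such an option exists, the local side of this scenario can be handled with a post-processing step after `aws s3 sync ... --delete`. The following is a minimal sketch under that assumption: `prune_empty_dirs` is a hypothetical helper, not an aws-cli feature, and it removes every empty directory below the root, including ones that were left empty on purpose.

```python
import os
import tempfile


def prune_empty_dirs(root):
    """Remove empty directories below root, deepest first; return their relative paths."""
    removed = []
    # topdown=False visits children before parents, so a directory that held
    # only empty directories is itself empty by the time it is visited.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # never remove the sync target itself
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(os.path.relpath(dirpath, root))
    return sorted(removed)


# Demo on a throwaway tree shaped like the one in the report.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "test1"))
    open(os.path.join(root, "test1", "1"), "w").close()
    os.makedirs(os.path.join(root, "test-to-delete"))
    print(prune_empty_dirs(root))
```

This only covers the S3-to-local direction. It does nothing about empty directories that should be created locally, since S3 has no record of them.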

That makes sense. Will look into adding a feature for it.

This would be very useful for our situation as well. If it were added as an option (--sync-empty-directories) people could choose to use it when needed.

+1 Need this feature very badly

+1. Would like to use it.

+1

I also was surprised by this behavior, given that it is called "sync".
I can work around this in my particular use case, but future users could be spared the pain :)

+1 on being able to sync directory structure! If you delete a folder it only removes the content, but it leaves the folder behind...

+1. I have the same needs.

+1 - surprised that hasn't been implemented yet. Sure, in my case it doesn't matter too much, and I can work around it (or just use placeholder files when creating structures), but it would be a benefit to just have it supported by either s3 sync or s3 cp.

+1

s3cmd sync does keep the folder structure, but it has some issues with granting access while syncing, so one needs to run s3cmd setacl --recursive afterwards…

+1

+1

+1

Thanks for the feedback everyone. I think the best option I've seen is to add a --sync-empty-directories option. Let's do that.

@jamesls I'm expecting something like rsync functionality, though S3 as an object store is definitely not the same.

+1

+1

Any timeline for this feature?

As a temporary workaround I added an empty .s3keep file to each empty directory, and it works for me. It is the same hack I usually use to make git keep otherwise empty directories :)
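
The placeholder workaround can be automated before each upload. This is a minimal sketch: the helper name `add_placeholders` is hypothetical, and `.s3keep` is simply the file name used in the comment above, with no special meaning to S3 or aws-cli.

```python
import os
import tempfile


def add_placeholders(root, name=".s3keep"):
    """Create an empty placeholder file in every directory that has no entries."""
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames and not filenames:
            path = os.path.join(dirpath, name)
            open(path, "w").close()  # zero-byte file, enough for sync to upload
            created.append(path)
    return created


# Demo on a throwaway tree: only the empty directory gets a placeholder.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "test1"))
    open(os.path.join(root, "test1", "1"), "w").close()
    os.makedirs(os.path.join(root, "test-to-delete"))
    for path in add_placeholders(root):
        print(os.path.relpath(path, root))
```

After this runs, `aws s3 sync` has a file to upload under each formerly empty directory, so the prefix appears in the bucket. The trade-off is that the placeholder files also appear on every sync target.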

Will this also allow us to "remove/delete" empty directories on S3?

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

Makes a lot of sense during data migrations to S3.

+1

+1 Just got smashed by this... Arg....

+1

+10
It's possible to work around this with dummy files, but it would be cleaner if there were an option to force an empty prefix to synchronize.

+1. Use case: backing up an svn repository.

More generally:
aws s3 sync thing
aws s3 sync thing_copy

I expected thing_copy to match thing exactly.

+1

+1

+1

+1 need to delete empty directories

How is progress on adding the --sync-empty-directories option?
Any feedback from the AWS team?
Thanks.

+1 would be a very useful feature for a very useful tool

+1

+1 (I too wish that this feature was implemented and wish that Github.com had a StackOverflow.com like interface for "voting" on issues/features).

+1

+1

+1

+1

+1

2+ years later and it still hasn't happened.. ? will it ever? =/

+1

+1

+1

+1

+1000

+1

I did some digging on how this could be implemented. All s3 commands eventually end up using TransferManager from the s3transfer library. (referenced here)

To support adding a folder with PutObject we can send an empty string in the Body param. I don't know if this is officially supported though. I implemented this here:
https://github.com/svleeuwen/s3transfer/commit/b7d3745a995a75c5262950bb798c8c57e481c2b3

I'd like some feedback on this from a maintainer before continuing.
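
The idea in the comment above, a PutObject call with an empty body, can be sketched without touching s3transfer. This is not the linked commit and not how aws-cli behaves. It assumes a client object exposing boto3's `put_object(Bucket=..., Key=..., Body=...)` call and follows the convention of a zero-byte object whose key ends in "/", which is how the S3 console's "Create folder" represents a folder. The names `put_folder_marker` and `RecordingClient` are hypothetical.

```python
def put_folder_marker(s3_client, bucket, prefix):
    """Upload a zero-byte object whose key ends in "/" to stand in for an empty directory."""
    if not prefix.strip("/"):
        raise ValueError("prefix must name a directory")
    key = prefix.rstrip("/") + "/"
    # With a real client this would be boto3.client("s3").put_object(...).
    s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
    return key


class RecordingClient:
    """Stand-in for a boto3 S3 client so the sketch runs without credentials."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)
        return {}


client = RecordingClient()
print(put_folder_marker(client, "s3.testbucket", "test-to-delete"))
print(client.calls)
```

A sync implementation built on such markers would also need to recognize them on download and create a local directory instead of a file, which this sketch does not attempt.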

+1

my solution was to mount my bucket with s3fs and rsync from the s3 mount to a directory in my home directory.

+1

+1 really need this ...

+1

Open since 2014? Really? :unamused:

+1

+1

+1

+1

+1

+1

+1

+1

+1

@thenetimp This solution is fine for small buckets. We are using a bucket with more than 15TB. S3FS gets horribly slow with bigger buckets.

+1

Good Morning!

We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.

This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.

As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.

We’ve imported existing feature requests from GitHub - Search for this issue there!

And don't worry, this issue will still exist on GitHub for posterity's sake. As it’s a text-only import of the original post into UserVoice, we’ll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.

GitHub will remain the channel for reporting bugs.

Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface

-The AWS SDKs & Tools Team

This entry can specifically be found on UserVoice at: https://aws.uservoice.com/forums/598381-aws-command-line-interface/suggestions/33168436-aws-s3-sync-does-not-synchronize-s3-folder-structu

great job Andre, close an issue and give us a link that isn't related to the issue. Of all the useless posts

The generic boilerplate is disappointing. I think the line between feature request and a bug report can be pretty blurry. To save people some searching the UserVoice post for this feature request is available at https://aws.uservoice.com/forums/598381-aws-command-line-interface/suggestions/33168436-aws-s3-sync-does-not-synchronize-s3-folder-structu

Based on community feedback, we have decided to return feature requests to GitHub issues.

+1

+1

+1

+1

+1

+1

+1. Would be a nice feature to add.

+1

+1

Same issue
awscli==1.16.74

+1

-1

The aws s3 sync command is already recursive, so there is no need for a recursive option. In addition, the sync command only copies things that are missing or have changed on the destination. If you point it at a folder, it will recursively sync everything inside that is not already up to date on your target destination. This is different from the aws s3 cp command, which copies whatever you tell it to, regardless of whether it already exists on the target. The cp, mv, and rm commands take a --recursive option for recursively copying, moving, or deleting folders and files. Thanks

@3ggaurav the issue is originally from 2014 when, as I recall, sync had a --recursive option.

Additionally, if you're going to quote a Stack Overflow answer verbatim, it's generally good practice to reference it and give credit.

The stack overflow answer is here.

Still no progress on this ?

+1
