gsutil cp -R dir gs://my-bucket doesn't copy subdirectories

Created on 13 Sep 2017  ·  10 Comments  ·  Source: GoogleCloudPlatform/gsutil

I just tried copying a directory containing several files and directories using:

gsutil -m cp -r dir gs://my-bucket

It copied only the top-level files from dir to the bucket.

I'm using:

gsutil version: 4.27
checksum: 522455e2d24593ff3a2d3d237eefde57 (OK)
boto version: 2.47.0
python version: 2.7.6 (default, Oct 26 2016, 20:30:19) [GCC 4.8.4]
OS: Linux 4.4.0-83-generic
multiprocessing available: True
using cloud sdk: False
pass cloud sdk credentials to gsutil: False
config path(s): /usr/local/google/home/mfschwartz/.boto_prod_oauth
gsutil path: /usr/local/google/home/mfschwartz/gsutil/gsutil
compiled crcmod: True
installed via package manager: False
editable install: False

All 10 comments

If all of those subdirectories satisfied either of these two conditions:

  • contained no files (aside from more nested directories or symlinks)
  • were symlinks, or somehow otherwise not seen as a regular file

then that's working as intended. But if the subdirectories were non-symlink directories with regular files in them, that seems like a bug (although I'm not sure why this would happen).

Could you provide a file tree for which this is reproducible?

After some digging, I found that this happened because the top-level directory I tried to copy contained an invalid symlink. The core problem is that gsutil gives up when it encounters one. So the problem is actually unrelated to subdirectories; in the case I originally reported, the symlink just happened to be encountered before the first subdirectory.

I'd point out that if you create a directory on Unix containing several files with an invalid symlink lexicographically earlier than some of the files and use the Unix cp command to try to copy them all, it will complain about the invalid symlink but finish copying the other files:

% mkdir repro
% touch repro/{1,3,4}
% ln -s /broken repro/2
% mkdir new
% cp repro/* new
cp: cannot stat ‘repro/2’: No such file or directory
% ls new
1 3 4

I think gsutil should similarly keep going after it encounters a broken symlink, given our guiding principle of making gsutil behave as similarly as possible to its Unix command ancestors.
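
For anyone who wants to check whether a dangling symlink is what stops the copy, here is a minimal sketch that lists broken symlinks under a directory before uploading. The function name find_broken_symlinks is made up for illustration; it only relies on the standard find and test utilities.

```shell
# List symlinks under directory $1 whose targets do not exist.
# `test -e` follows the link, so it fails for a dangling symlink,
# and the negated -exec then lets -print fire for that path.
find_broken_symlinks() {
    find "$1" -type l ! -exec test -e {} \; -print
}
```

Running find_broken_symlinks repro on the tree above should list repro/2, which can then be fixed or removed before retrying the copy.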

I'm having a similar issue. gsutil is not copying some of my subdirectories. Specifying -c doesn't appear to help (I'm using -m anyway, so I'm not even sure specifying -c is necessary). The subdirectories are not symlinks. cp just seems to stop when it reaches a bad file. Any workaround?

I should also mention if I point directly to one of the subdirectories to have it upload, it works. So I'm at a loss for what's going on.

Running gsutil rsync -D reveals this stack trace while syncing:

DEBUG: Exception stack trace:
    Traceback (most recent call last):
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 590, in _RunNamedCommandAndHandleExceptions
        user_project=user_project)
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 372, in RunNamedCommand
        return_code = command_inst.RunCommand()
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 1536, in RunCommand
        diff_iterator = _DiffIterator(self, src_url, dst_url)
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 939, in __init__
        raise CommandException('Caught non-retryable exception - aborting rsync')
    CommandException: CommandException: Caught non-retryable exception - aborting rsync

CommandException: Caught non-retryable exception - aborting rsync

By specifying -U and -e I was able to work around this issue. I still do not know which file caused cp and rsync to error out, but it seems like -c should cause cp to continue despite the error. More useful feedback when this issue occurs would also help users identify the file causing the problem and work around it.
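
To make the workaround easier to reuse, here is a small sketch of a wrapper with the flags spelled out. The name copy_tree is hypothetical, and the flag descriptions in the comments are my reading of the gsutil cp help text; as noted in this thread, -c alone did not help in at least one case, so treat this as a starting point rather than a guaranteed fix.

```shell
# Hypothetical wrapper around the workaround discussed in this thread.
#   -m  run the copy in parallel
#   -r  copy directories recursively
#   -c  continue with the remaining files after an error
#   -e  exclude symbolic links from the copy
#   -U  skip objects with unsupported object types instead of failing
copy_tree() {
    gsutil -m cp -r -c -e -U "$1" "$2"
}
```

Usage would look like copy_tree dir gs://my-bucket, with the bucket name taken from the original report.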

Had the same issue with

gsutil cp -r gs://some loc_dir

after updating all gsutil installations to the latest version.
I can confirm that using the flags

gsutil cp -r -U -e -c

solved the issue.

Me too - gsutil skips over files in directories under /src
gsutil -m cp -U -e -r /src gs://bucket/prefix/

gsutil version: 4.46
EDIT: gsutil cp / rsync silently skips any directory it does not have permission to enter, with no warning or error message.
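
Since gsutil gives no hint about which directories it skipped, one way to check beforehand is to list the directories the current user cannot read or enter. This is only a sketch; find_unreadable_dirs is a made-up name, and the result depends on which user runs it (root can typically enter everything).

```shell
# List directories under $1 that the current user cannot read or enter.
# find's own "Permission denied" messages are discarded; the directory
# itself is still tested and printed before find tries to descend into it.
find_unreadable_dirs() {
    find "$1" -type d ! -exec sh -c '[ -r "$1" ] && [ -x "$1" ]' sh {} \; -print 2>/dev/null
}
```

Running find_unreadable_dirs /src as the same user that runs gsutil should show the directories that would be left out of the upload.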

How can I copy empty subdirectories? gsutil only copies subdirectories that contain files; empty ones are ignored and not created in the target bucket. Please help.

@prabhat-diwaker gsutil skips empty directories; this is the intended behavior.

Does gsutil have an option which would allow you to upload empty folders or, at least, folders which contain another folder but don't contain files?
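
I am not aware of such an option. One possible workaround, sketched here under the assumption that an extra marker file in the bucket is acceptable, is to drop a placeholder file into every empty directory before uploading, so that each one contains at least one object to copy. Both the function name add_placeholders and the file name .keep are made up for illustration, and -empty is a find extension (available in GNU and BSD find) rather than POSIX.

```shell
# Create a placeholder file in every empty directory under $1 so that
# a recursive copy has something to upload for each of them.
# Directories that only contain other directories are covered
# indirectly, because their empty leaf directories get a placeholder.
add_placeholders() {
    find "$1" -type d -empty -exec sh -c 'touch "$1/.keep"' sh {} \;
}
```

After add_placeholders dir, a normal gsutil cp -r dir gs://my-bucket should then include the formerly empty directories, at the cost of the extra .keep objects.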
