Gsutil: CommandException: No URLs matched when passing URLs to rm from stdin

Created on 21 Dec 2017  ·  11 comments  ·  Source: GoogleCloudPlatform/gsutil

I'm trying to use gsutil rm -I and pass a list of URLs to delete through stdin.

For an existing directory at, say, gs://test-bucket/test-dir, here are some of the commands I've tried:

# verify directory exists
$ gsutil ls -d gs://test-bucket/test-dir
gs://test-bucket/test-dir/

$ echo "gs://test-bucket/test-dir" | gsutil -m rm -r -I
CommandException: No URLs matched

$ echo gs://test-bucket/test-dir | gsutil -m rm -r -I
CommandException: No URLs matched

$ gsutil -m rm -r -I <<< "gs://test-bucket/test-dir"
CommandException: No URLs matched

$ gsutil ls -d gs://test-bucket/test-dir | gsutil -m rm -r -I
CommandException: No URLs matched

Am I missing something here?

All 11 comments

Nope, I don't think you're missing anything -- I can reproduce this as well. Running the gsutil rm command above with the -DD flag shows that gsutil isn't even making an API call to check for the object in question.

Looking in name_expansion.py, we're creating a PluralityCheckableIterator, which wraps a NameExpansionIterator, which wraps another PluralityCheckableIterator object that wraps the generator that's supposed to read lines from stdin (_phew_). Anyway, I threw a few debugging print statements into the _PopulateHead() method in plurality_checkable_iterator.py, and found that the underlying generator is throwing a StopIteration exception. Not quite sure why yet -- I'll continue to investigate soon.

Notes to self:

  • Go back and do a binary search through recent commits and see at what point this started happening.
  • Why didn't tests catch this?
  • This same approach works for the cp command, but doesn't work for rm.

I ran into this issue too today (gsutil version 4.28). Are there any known workarounds?

Off the top of my head, I can only think of one: you could write a thin wrapper script that passes the arguments to gsutil yourself (using something like xargs), making sure each gsutil invocation receives a batch of URLs small enough to stay under the system's ARG_MAX limit.
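
For reference, here is a minimal Python sketch of such a wrapper (the batch size and the assumption that URLs arrive one per line on stdin are mine, not anything gsutil prescribes):

#!/usr/bin/env python3
# A sketch of the wrapper-script workaround: read one GCS URL per line from
# stdin and delete them with gsutil in batches, so that no single invocation
# exceeds the system's argument-length limit. Error handling is omitted.
import subprocess
import sys

BATCH_SIZE = 1000  # assumed; pick something comfortably under ARG_MAX


def batches(lines, size):
    # Yield lists of at most `size` stripped, non-empty URLs.
    batch = []
    for line in lines:
        url = line.strip()
        if url:
            batch.append(url)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


for batch in batches(sys.stdin, BATCH_SIZE):
    # Pass the URLs as ordinary command-line arguments, sidestepping -I entirely.
    subprocess.run(['gsutil', '-m', 'rm', '-r'] + batch, check=True)

You could then pipe into it with something like gsutil ls -d gs://test-bucket/test-dir | python delete_batches.py (delete_batches.py being whatever you name the script).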

I'm seeing the same thing too...
CommandException: No URLs matched

I'm seeing the same exception while copying a tar file to my bucket. CommandException: No URLs matched

Seeing the same problem. Any suggested solutions?

Not sure if there is a resolution. I am running gsutil version 4.34 and get this:

CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.

when trying to execute:

gsutil cp -n -R gs://hail-common/vep/vep/GRCh37/loftee_data /vep/loftee_data_grch37

Please let me know if I should post this elsewhere.
Thanks.

I had the same problem; I changed the name of the file and it worked :).

Any news on this one?

Same issue on gsutil v4.59. Trying to remove the bucket and getting the same error, even though the bucket clearly exists when I look at it in the console.

Sorry for the delay in response. Our team is currently occupied with other priorities and does not have the bandwidth to address this issue at the moment. However, I did some investigation for future reference.

This seems to be happening because url_strs gets iterated twice: once here https://github.com/GoogleCloudPlatform/gsutil/blob/d8626ae0ec4b4dc9fd729f115cdeefced4680cb5/gslib/commands/rm.py#L269 when recursion is requested, and then again when it is passed to the NameExpansionIterator here https://github.com/GoogleCloudPlatform/gsutil/blob/d8626ae0ec4b4dc9fd729f115cdeefced4680cb5/gslib/commands/rm.py#L288

So essentially, we try to iterate over the same iterator twice, and on the second pass the iterator is already exhausted and yields nothing.
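
This is ordinary Python generator behaviour, as the standalone sketch below illustrates (the stdin_urls function is a made-up stand-in, not gsutil's actual stdin reader):

def stdin_urls():
    # Stand-in for the generator that reads URLs from stdin.
    for url in ['gs://test-bucket/test-dir']:
        yield url

urls = stdin_urls()
print(list(urls))  # first pass (the recursion check): ['gs://test-bucket/test-dir']
print(list(urls))  # second pass (name expansion): [] -- the generator is exhausted

urls = list(stdin_urls())  # materializing it up front avoids the problem
print(urls, urls)          # both uses now see the URL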

The easy fix would be to convert the iterator to a list, i.e. changing https://github.com/GoogleCloudPlatform/gsutil/blob/d8626ae0ec4b4dc9fd729f115cdeefced4680cb5/gslib/commands/rm.py#L252 to

url_strs = [url for url in StdinIterator()]

But this could affect users who pipe a really long list through stdin, as well as users who are already using this feature in a pipeline without combining -r and -I. Note that the bug itself only affects you if you use -r and -I together.

The ideal fix would be to remove the recursion special case and instead handle the bucket deletion based on the NameExpansionIterator result itself.

One workaround is the approach suggested here: https://github.com/GoogleCloudPlatform/gsutil/issues/490#issuecomment-364611242

Alternatively, you can avoid recursion (the -r option) and pass in the full object list:

gsutil ls gs://my_bucket/** | gsutil -m rm -I

Note that the above command will empty the bucket but will not remove it; you will have to run a separate command (for example, gsutil rb) to delete the bucket itself.
