Zenodo: Support of "HTTP/1.1 byte range request" in file retrieval

Created on 9 Sep 2018  ·  10 Comments  ·  Source: zenodo/zenodo

I have a feature request for Zenodo: can the Zenodo server support HTTP/1.1 byte range requests (https://tools.ietf.org/html/rfc7233)?

The Zenodo platform is already incredible, and support for byte range requests would further increase the value of deposited data, since some applications rely on them, in particular when dealing with large files.

I'd like to add an example of how byte range requests work, to make my point clear. GitHub (raw.githubusercontent.com), for example, supports them as shown below:

###
### The entire README file is retrieved, then processed locally
###
$ curl  https://raw.githubusercontent.com/zenodo/zenodo/master/README.rst |head -5 | tail -1
    Zenodo is free software; you can redistribute it

###
### Only the specified bytes of the file are retrieved, which requires no local processing
###
$ curl -H "range: bytes=72-125"  https://raw.githubusercontent.com/zenodo/zenodo/master/README.rst 
    Zenodo is free software; you can redistribute it

However, byte range requests are ignored on zenodo.org:

###
### The entire file is retrieved
###
$ curl   https://zenodo.org/record/1407145/files/DOI_Test.txt
This is a test of the Zenodo DOI functionality for GitLab. 

###
### Only two bytes are requested, but the entire file is retrieved
###
$ curl -H "range: bytes=6-7"  https://zenodo.org/record/1407145/files/DOI_Test.txt
This is a test of the Zenodo DOI functionality for GitLab.
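To make the contrast concrete, here is a minimal sketch of the single-range handling RFC 7233 describes (illustration only: the function name and return shape are invented for this example, not Zenodo's code). A server that ignores the header, as zenodo.org does above, would always take the 200 branch:

```python
def serve_range(body, range_header):
    """Sketch of RFC 7233 single-range handling.

    Returns (status, headers, payload) for a request on `body`.
    """
    total = len(body)
    if not range_header or not range_header.lower().startswith("bytes="):
        # No (or unsupported) Range header: send the whole representation.
        return 200, {"Content-Length": str(total)}, body
    start_s, _, end_s = range_header.split("=", 1)[1].partition("-")
    if start_s:                      # "bytes=6-7" or open-ended "bytes=6-"
        start = int(start_s)
        end = int(end_s) if end_s else total - 1
    else:                            # suffix form "bytes=-5": the last 5 bytes
        start = max(total - int(end_s), 0)
        end = total - 1
    if start >= total:
        # Range not satisfiable for this representation.
        return 416, {"Content-Range": "bytes */%d" % total}, b""
    end = min(end, total - 1)
    headers = {"Content-Range": "bytes %d-%d/%d" % (start, end, total)}
    return 206, headers, body[start:end + 1]
```

With the file above, `serve_range(data, "bytes=6-7")` answers 206 Partial Content with just two bytes instead of the whole document.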
Labels: Enhancement, Needs investigation, Accepted

All 10 comments

I'll second this. It would be very useful, e.g., for genomics datasets to be accessed directly with tabix. It seems to require only a configuration change in the Zenodo web server, setting `max_ranges` to a positive number.
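`max_ranges` is an nginx directive (by default nginx serves ranges without limit; `max_ranges 0;` disables them). Assuming Zenodo's frontend is nginx, which the comment implies but does not confirm, the change would be a one-liner in the relevant `location` block:

```nginx
location /record/ {
    # A positive value caps the number of ranges honored per request;
    # 1 is enough for the single-range curl examples above.
    # (max_ranges 0 is what would make the server ignore Range entirely.)
    max_ranges 1;
}
```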

Is there some technical reason not to do that?

Our file storage backend at the moment is not optimized to serve HTTP range requests (meaning that enabling this feature would potentially lead to significant slowdowns for the file upload/download API). Of course, there are people working on making it possible, though we can't give an accurate ETA on it...

I just wanted to add my :+1: to state that enabling range requests would be very useful for geospatial data formats. Cloud Optimized GeoTIFF in particular would benefit a lot from this. Allowing range requests could really reduce the bandwidth needed from zenodo.

> Our file storage backend at the moment is not optimized to serve HTTP range requests (meaning that enabling this feature would potentially lead to significant slowdowns for the file upload/download API). Of course, there are people working on making it possible, though we can't give an accurate ETA on it...

Many people cannot download large genetics files (several GB), e.g.:
https://github.com/zenodo/zenodo/issues/460#issuecomment-546623751

Some have to retry many times, and that actually wastes your bandwidth...

For our project it is also important that we can use Cloud-Optimized GeoTIFFs (see e.g. https://zenodo.org/record/4483227) directly from Zenodo. Figshare apparently works with COGs; why doesn't Zenodo? We wrote a tutorial showing users how to fetch small chunks of data from COG files.

Could you please support this?

We need it to serve large image files (in Zarr format) in chunks, which allows us to visualize the files in the browser instantly. It is not feasible for the browser to download a whole file of, e.g., 10GB and then display it.

Just noting the value for the Zarr use case. Thanks all for your work on Zenodo!

For Zarr, we could hypothetically get Zenodo working today, without any changes. Zenodo does not support directories, but if we could map a regular Zarr directory store to some sort of flat hierarchy via a special character, we could make it work. For example, if the special character is `__`:

.zgroup
foo__.zarray
foo__.zattrs
foo__0.0
foo__0.1

etc.
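The mapping sketched above can be written down in a few lines (the `__` separator and function names here are the hypothetical ones from this comment, not an agreed spec):

```python
SEP = "__"  # hypothetical separator; must never occur in real store keys

def flatten_key(key):
    """Map a nested Zarr store key, e.g. 'foo/0.0', to a flat Zenodo filename."""
    return key.replace("/", SEP)

def unflatten_name(name):
    """Inverse mapping: recover the store key from the flat filename."""
    return name.replace(SEP, "/")
```

For instance, `flatten_key("foo/0.1")` gives `"foo__0.1"`, matching the listing above; the scheme only round-trips if `__` is reserved and never appears in genuine key names.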

Could you please raise an issue here ( https://github.com/zarr-developers/zarr-specs/issues )?

@rabernat I'm afraid that won't scale, because Zenodo only allows 100 files at maximum.

> Total files size limit per record is 50GB (max 100 files). One-time 100GB quota can be requested and granted on a case-by-case basis.

source: https://www.openaire.eu/technical-requirements
