Zenodo: Fail to upload larger file via API to sandbox

Created on 26 Sep 2016  ·  6 comments  ·  Source: zenodo/zenodo

Hello there,

When I try to upload a test file larger than 100 MB via Python requests and the Zenodo API (sandbox), using the following code

import requests

data = {'filename': 'test.zip'}
files = {'file': open(filename, 'rb')}
r = requests.post("https://sandbox.zenodo.org/api/deposit/depositions/%s/files?access_token=TOKEN" % deposition_id, data=data, files=files)

it returns HTTP error code 413 ("413 Request Entity Too Large", which, by the way, is not documented in the Zenodo API documentation).

The same code works for files smaller than 100 MB and returns 201.

Have I reached the file size limit? (If so, it would be good to add this to the documentation.) Or is this due to the "requests" package - do I have to do a multipart PUT of the data?

Enhancement


All 6 comments

You will have to use our new file upload API (which we haven't published yet) to upload files bigger than 100 MB. This is because the current API uses multipart/form-data to upload the file, which is not very efficient. In the new API you stream the binary content of the file in a PUT request, which is much faster and doesn't require any encoding/decoding at either end. I'll send you an example a bit later today.

Thank you for looking into this! I'm looking forward to the example of the new API.
Maybe you want to consider the Python package "requests-toolbelt", which also allows streaming multipart form-data objects. But I guess any solution without encoding requirements is preferable.

Apologies for the long delay in replying to this one:

1) Find your bucket URL:

$ curl -H "Accept: application/json" -H "Authorization: Bearer <access token>" "https://www.zenodo.org/api/deposit/depositions/<deposit id>"
{
  "links": {
    "bucket": "https://www.zenodo.org/api/files/<bucket id>",
    ...
  },
...

2) Upload a file into the bucket

$ curl -X PUT -H "Accept: application/json" -H "Content-Type: application/octet-stream" -H "Authorization: Bearer <access_token>" -d @<path to local file> https://www.zenodo.org/api/files/<bucket id>/<filename>

Note the bucket is versioned, so in order to completely remove a file again you must use the version link. Find it from listing the bucket:

$ curl -H "Accept: application/json" -H "Authorization: Bearer <access token>" "https://www.zenodo.org/api/files/<bucket id>"
{
  "contents": [
    {
      "links": {
        "version": "https://zenodo.org/api/files/<bucket id>/<filename>?versionId=<versionId>", 
        ...
      }, 
      "key": "<filename>", 
      ...
    }, 
   ...

Deleting the file:

$ curl -X DELETE -H "Accept: application/json" -H "Authorization: Bearer <access_token>" "https://www.zenodo.org/api/files/<bucket id>/<filename>?versionId=<versionId>"
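The list-then-delete sequence above can also be sketched in Python with requests. This is an illustrative translation of the two curl calls, assuming the JSON shapes shown above; the function and helper names are my own.

```python
import requests

def find_version_link(listing, filename):
    """Pick the versioned link for `filename` out of a bucket listing."""
    for entry in listing['contents']:
        if entry['key'] == filename:
            return entry['links']['version']
    raise KeyError(filename)

def delete_file_version(bucket_url, filename, token):
    # List the bucket, locate the file's "version" link, then DELETE it.
    headers = {'Accept': 'application/json',
               'Authorization': 'Bearer %s' % token}
    listing = requests.get(bucket_url, headers=headers).json()
    return requests.delete(find_version_link(listing, filename),
                           headers=headers)
```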

Just in case anyone comes across this, here's some Python to do the file upload part using the new API. It is equivalent to the curl calls outlined above for uploading a single file. I've tested it with a 160 MB file that failed with the documented API. It also uses requests (as the documented API does) and simply creates a new deposition and uploads a file. Note that the URL used is the sandbox one.

import requests

# Create a new, empty deposition
r = requests.post('https://sandbox.zenodo.org/api/deposit/depositions',
                  params={'access_token': ACCESS_TOKEN}, json={},
                  headers={'Content-Type': 'application/json'})

print(r.status_code)

bucket_url = r.json()['links']['bucket']

# Stream the file into the deposition's bucket
filename = 'bigfile.txt'
r = requests.put('%s/%s' % (bucket_url, filename),
                 data=open(filename, 'rb'),
                 headers={'Accept': 'application/json',
                          'Authorization': 'Bearer %s' % ACCESS_TOKEN,
                          'Content-Type': 'application/octet-stream'})

print(r.status_code)

@lnielsen @jakelever
Do you think it is possible to stream a chunked file into the bucket using requests.put?
Context:
I have a set of large files that I want to zip-stream without creating a zip file in memory or on disk beforehand. I would like to pass a generator object as the octet stream of the request method.
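This should work in principle: when `data` is a generator, requests sends the body with chunked transfer encoding, so nothing needs to be buffered in memory or on disk. A sketch, where the file-reading generator stands in for whatever generator (e.g. one produced by a zip-streaming library) you want to feed the request; BUCKET_URL and ACCESS_TOKEN are placeholders:

```python
import requests

def file_chunks(path, chunk_size=1024 * 1024):
    """Yield a file's bytes in fixed-size chunks; any generator of
    bytes can be passed to requests.put the same way."""
    with open(path, 'rb') as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk

def upload_stream(bucket_url, filename, token):
    # Passing a generator as `data` makes requests use chunked
    # transfer encoding for the PUT body.
    return requests.put('%s/%s' % (bucket_url, filename),
                        data=file_chunks(filename),
                        headers={'Authorization': 'Bearer %s' % token,
                                 'Content-Type': 'application/octet-stream'})
```

Whether the bucket endpoint accepts chunked transfer encoding is a question for the Zenodo side; the client mechanics above are standard requests behavior.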

Stumbled across this while trying to upload files using the API. For me the streaming API failed: only about half of a 986 KB file was uploaded, resulting in a corrupt PDF. Based on http://killtheradio.net/tricks-hacks/curl-cli-not-sending-full-file-data-when-using-data-binary/ I then substituted -d @<path to file> with -T <path to file> in the curl command line, and the entire file uploaded.
