Zenodo: [Feature request] "Open in Binder" option for appropriate GitHub repos

Created on 6 Feb 2018  ·  21 Comments  ·  Source: zenodo/zenodo

Background info


Request: Add a button to relevant Zenodo records for opening the GitHub repo in Binder (for interactive notebooks etc.), using the GitHub ↔ Zenodo link, to encourage reproducibility in research.

  1. If the source GitHub repo's README has a Launch Binder badge, offer a similar badge on Zenodo below the "Available in GitHub" badge.
  2. Work with the Binder team so that the contents of a zipped repo can be launched in Binder directly from the Zenodo archive if the original GitHub repo disappears (preservation).

cc: @betatim from Binder

Labels: Feature request, Needs investigation, Accepted, GitHub

All 21 comments

This would be very cool in general.

I'd be even more interested in (2), so that Binder can resolve Zenodo DOIs and launch directly from those instead of from git repositories. Most of the work (on the Binder side) would probably be in repo2docker, which is the tool we use to actually build the containers. Right now it uses git to fetch contents or uses a local directory.

In https://github.com/jupyterhub/binderhub/issues/216 we discussed this idea a bit for launching from OSF.io.

Seconding Tim's comment - let the Binder team know if there's anything we can do to help out!

I think this is related to the WIP feature of previewing .tar.gz and other compressed formats: https://github.com/zenodo/zenodo/issues/557.

Agreed, it would be a very cool feature. Technically, it looks like Binder uses repo2docker, which as far as I can tell needs a git repository in order to work. This, I think, is the main obstacle, as Zenodo only archives a Zip-ball of the specific release. The work-around would be to simply point to the GitHub repository (because we do have the SHA of the release we archived), but then we essentially just bypass Zenodo, and there's no real added value over just having the badge on the GitHub repo.

Thanks for getting back to us on this issue, @lnielsen! Some thoughts:

The work-around would be to simply point to the GitHub repository (because we do have the SHA of the release we archived) […]

Rather than point to the GitHub repository directly, it would make sense to have a "Binder" badge on Zenodo pointing to the specific commit/tag that was archived on Zenodo (since Binder can handle branches, tags or commits). This means that you're able to directly jump to the same version of the code/repo that is linked from the DOI.
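For illustration, a badge link pinned to the archived release could be built as below. This is a sketch that assumes mybinder.org's `/v2/gh/<owner>/<repo>/<ref>` URL pattern; the function name and the example repository are hypothetical.

```python
from urllib.parse import quote


def binder_url_for_github_ref(owner: str, repo: str, ref: str,
                              base: str = "https://mybinder.org") -> str:
    """Return a Binder launch URL pinned to one branch, tag, or commit SHA.

    Assumes the mybinder.org path scheme /v2/gh/<owner>/<repo>/<ref>.
    """
    # Percent-encode the ref so a tag such as "release/1.0" stays a single
    # path segment ("release%2F1.0") instead of being split at the slash.
    safe_ref = quote(ref, safe="")
    return f"{base.rstrip('/')}/v2/gh/{owner}/{repo}/{safe_ref}"


# Hypothetical repository and tag, standing in for the release Zenodo archived.
print(binder_url_for_github_ref("example-org", "analysis-notebooks", "v1.0.0"))
```

Passing the commit SHA that Zenodo recorded, rather than the tag name, would pin the launch even more tightly, since tags can be moved.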

[…] then we essentially just bypass Zenodo, and there's no real added value over just having the badge on the GitHub repo.

Well, if you point to the specific commit/tag, there is still value, since the badges in GitHub typically point to the latest commit in master. However, from the point of "preservation" and "persistence" that a DOI is supposed to provide, it would make sense if we could indeed bypass GitHub and render the repo directly from Zenodo, so that the content is still "interactive" even if the original GitHub repo gets taken down.


@choldgraf, @betatim: Is there a way to "fake" a Git repo from the Zip-ball? By adding an essentially purposeless git init step of some kind to the repo2docker workflow? So:

  • repo2docker unpacks Zip-ball → repo2docker runs git init → Binder points to contents/notebook(s).
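A rough sketch of that unpack-then-`git init` idea, assuming a GitHub-style Zip-ball whose files sit under a single top-level folder. `unpack_zipball` is a hypothetical helper, not part of repo2docker.

```python
import subprocess
import zipfile
from pathlib import Path


def unpack_zipball(zip_path: str, dest: str, init_git: bool = False) -> Path:
    """Unpack a Zip-ball into dest and return the directory to treat as the repo root.

    Assumes dest is empty (or new) and that the archive, like GitHub's
    Zip-balls, wraps everything in one folder such as "owner-repo-<sha>/".
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # ZipFile.extractall drops absolute paths and ".." components.
        zf.extractall(dest_path)
    # If everything landed in a single top-level folder, use that as the root.
    entries = list(dest_path.iterdir())
    root = entries[0] if len(entries) == 1 and entries[0].is_dir() else dest_path
    if init_git:
        # The "purposeless" git init: it only gives git-based tooling a repo to open.
        subprocess.run(["git", "init", "--quiet"], cwd=root, check=True)
    return root
```

Note that `git init` alone creates an empty repository with no commits; whether that is enough depends on what the downstream tool expects.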

@choldgraf, @betatim: Is there a way to "fake" a Git repo from the Zip-ball? By adding an essentially git init of some kind in the repo2docker workflow?

That's a great question. I think this would be feasible, probably as a buildpack for repo2docker (that could either be done within r2d, or as an "extension" that lives in a separate repository). That buildpack would insert the lines that do the unzipping etc. into a Dockerfile.

I just opened this issue to discuss within r2d: https://github.com/jupyter/repo2docker/issues/234

This would be awesome!

I think there are two parts for this:

  1. Adding the ability to read from a ZIP file at a given URL to repo2docker
  2. Adding ability to read a versioned zonodo identifier to the appropriate zip file + caching semantics to binderhub.
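For part 2, one hedged sketch: Zenodo DOIs of the form `10.5281/zenodo.<id>` embed the record ID, and the record's JSON metadata (assumed here to come from `https://zenodo.org/api/records/<id>`) lists its files. The helper names and the exact JSON shape are assumptions for illustration. Because a versioned DOI always refers to the same files, it could also double as a cache key for built images.

```python
import re

# Zenodo-minted DOIs look like 10.5281/zenodo.<record id>; this sketch relies on that.
ZENODO_DOI = re.compile(r"^(?:https?://doi\.org/)?10\.5281/zenodo\.(\d+)$")


def zenodo_record_id(doi: str) -> str:
    """Return the numeric record ID embedded in a Zenodo DOI."""
    match = ZENODO_DOI.match(doi.strip())
    if match is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return match.group(1)


def zip_download_urls(record: dict) -> list[str]:
    """Collect download links for .zip files from a record's JSON metadata.

    Assumes a shape like {"files": [{"key": "x.zip", "links": {"self": url}}]}.
    """
    return [
        entry["links"]["self"]
        for entry in record.get("files", [])
        if entry.get("key", "").endswith(".zip")
    ]
```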

In the meantime, I think adding a link to the tagged version on GitHub is the simplest thing to do!

Hey, @yuvipanda. At the moment, yeah, looking for a Binder badge and then linking to the appropriate version on GitHub is an interim solution (depending on how @lnielsen and co. prioritise this, of course!) :)

Concerning:

  2. Adding ability to read a versioned zonodo [sic] identifier to the appropriate zip file + caching semantics to binderhub.

Zenodo grabs repos only when a new release is issued and I think the GitHub badge on the Zenodo entry itself points to the appropriate tree on GitHub. Does this help at all?

The badge would be pretty easy to add if we already knew that the GitHub repo supports Binder, but it's not easy for us to detect whether Binder is supported. What we could do is allow adding links in the "related identifiers" field, which would then render a logo (like the GitHub one) that lets you launch it in Binder.

@lnielsen a few thoughts that come to mind:

  1. Check if a repo has a binder badge in their README
  2. Check if a repo has a tag of some kind (e.g. "binder-ready", "binder")
  3. Check if a repo has one or more of the config files and, if so, try and build it via the Binder build API...if it returns as successful, then proceed.

Just spitballing here :-)
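Heuristics 1 and 3 could be sketched roughly as follows. The regex only recognises mybinder.org launch links, so badges for other BinderHub deployments are missed, and the file set is a partial, assumed subset of what repo2docker looks for. The trial build through Binder's build API is not covered.

```python
import re
from pathlib import PurePosixPath

# Launch links on mybinder.org use paths under /v2/. Badges that point at
# other BinderHub deployments are deliberately not detected here.
BINDER_LINK = re.compile(r"https?://mybinder\.org/v2/[^\s)\"'<>]+")

# A partial, assumed subset of the configuration files repo2docker looks for.
R2D_CONFIG_FILES = {
    "environment.yml", "requirements.txt", "Pipfile", "setup.py",
    "Project.toml", "install.R", "apt.txt", "postBuild", "runtime.txt",
    "Dockerfile",
}


def find_binder_links(readme_text: str) -> list[str]:
    """Heuristic 1: mybinder.org launch URLs mentioned in a README."""
    return BINDER_LINK.findall(readme_text)


def has_r2d_config(paths: list[str]) -> bool:
    """Heuristic 3 (first half): a config file at the root or in binder/ or .binder/."""
    for path in map(PurePosixPath, paths):
        if path.name in R2D_CONFIG_FILES and str(path.parent) in {".", "binder", ".binder"}:
            return True
    return False
```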

I think knowing whether a repo will do something useful when you launch it on a BinderHub is very hard for a computer. Many repositories will build and launch, but most of those don't work :-( So I would look for the Binder badge in the README, but that is also a crude heuristic (how would you find, at scale, repositories whose Binder badge points to an instance other than mybinder.org?). Making the 'binder-ready' status a human opt-in is probably best, and then it can be machine-readable as well.

Is there a format/file that Zenodo looks at to extract extra information for a repository? Similar to a .travis.yml or some such?

I was trying to avoid having to parse the files in the repository :-)

I would say the best way would be via CodeMeta somehow (https://codemeta.github.io), since we're planning to enable reading metadata from the codemeta file.

BinderHub and repo2docker now support launching from Zenodo DOIs: https://twitter.com/mybinderteam/status/1139136841792315392
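A minimal sketch of turning a Zenodo DOI into such a launch link, assuming BinderHub's Zenodo provider is reached at `/v2/zenodo/<DOI>/` on mybinder.org. The helper name and the DOI in the example are hypothetical.

```python
def binder_url_for_zenodo_doi(doi: str, base: str = "https://mybinder.org") -> str:
    """Return a Binder launch URL for a Zenodo DOI such as 10.5281/zenodo.1234567.

    Assumes BinderHub's Zenodo provider path /v2/zenodo/<DOI>/.
    """
    doi = doi.strip()
    # Accept either a bare DOI or a doi.org resolver link.
    for prefix in ("https://doi.org/", "http://doi.org/"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return f"{base.rstrip('/')}/v2/zenodo/{doi}/"


# Hypothetical DOI, used only to show the shape of the resulting URL.
print(binder_url_for_zenodo_doi("https://doi.org/10.5281/zenodo.1234567"))
```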

As mentioned in https://github.com/zenodo/zenodo/issues/1416#issuecomment-398732740, I think a sensible solution would be to display a Binder logo with the proper mybinder link (similar to the GitHub one), when there is a link to https://mybinder.org in the "related identifiers" (example record: https://zenodo.org/record/3402938)
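The display rule proposed here could be checked roughly as follows. This sketch assumes each related identifier is a dict with an `identifier` URL (as in Zenodo's deposit metadata) and compares the parsed host exactly, so look-alike domains are not matched; `binder_links` is a hypothetical name.

```python
from urllib.parse import urlparse

BINDER_HOSTS = {"mybinder.org", "www.mybinder.org"}


def binder_links(related_identifiers: list[dict]) -> list[str]:
    """Return the related identifiers that point at mybinder.org.

    Assumes entries shaped like {"identifier": url, "relation": ...}.
    """
    links = []
    for rel in related_identifiers:
        url = rel.get("identifier", "")
        # Exact host comparison: "mybinder.org.example.com" is not a match.
        host = urlparse(url).hostname or ""
        if host in BINDER_HOSTS:
            links.append(url)
    return links
```

A record page would then show the Binder logo only when this list is non-empty.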

My only concern, and probably the Binder team's (cc @betatim, @yuvipanda, @choldgraf), is that this would greatly increase the exposure of the MyBinder service and could end up DoS-ing it, which would send users to a "broken" page. Imagine that users who land on a Zenodo software record with a Binder logo might click it just out of curiosity.

I've read the Reliability docs, and the rate-limiting mechanisms that are in-place look good, so I guess it's just a question if the MyBinder service maintainers are ok with that :)

As a general rule, spikes in traffic shouldn't be too big of a deal so long as they aren't gigantic spikes. What kind of traffic do you all imagine sending? :-)

As a reference, you can get an idea for the load and "spikiness" of repositories for the public binderhub deployment (the one at mybinder.org) here:

https://grafana.mybinder.org/d/fZWsQmnmz/pod-activity?refresh=1m&orgId=1&var-cluster=default
We've had folks launch ~100-200 binders at once when using it to teach courses and such. Sometimes the launch takes longer if we need to scale up to a new node, but in general it should be OK.

The hard limit is 100 simultaneous sessions for a single repository.

... when there is a link to https://mybinder.org in the "related identifiers" (example record: https://zenodo.org/record/3402938)

As a related issue, would it be possible to have more specific metadata in the "related identifiers" for this use case? The metadata values associated with that URL in the "related/alternate identifiers" section are pretty uninformative ("Supplementary material" & "Other"). Would it be possible to add new metadata values like "Executes this upload" and "Live computing environment" to make it clear that the link allows the reader to execute the software? I think this will become a relatively common use case. Thanks

:+1: for the relation type. My suggestion would be "Interactive (computing) environment", as Binders are for humans to use, and not a one time execution (which "Executes this upload" could mean).

The available relation type vocabulary for the related identifiers is based on the DataCite v4.1 schema, so I would avoid adding a new "custom" relation type.

IMHO, the most fitting relation type would be isSourceOf (i.e. "has this upload as its source" in the upload form), in the sense that the Zenodo record is the source that Binder uses to execute it.

If we have a general consensus on that, I believe we can ship this in the next release :)
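On the uploader's side, this convention would amount to a related-identifier entry like the one sketched below. The `identifier`/`relation` keys mirror Zenodo's deposit metadata and are an assumption here, as is the helper name.

```python
def binder_related_identifier(binder_url: str) -> dict:
    """Related-identifier entry linking a record to its Binder launch URL.

    Uses the DataCite relation type isSourceOf, as proposed in this thread.
    """
    if not binder_url.startswith("https://mybinder.org/"):
        raise ValueError(f"expected a mybinder.org URL, got {binder_url!r}")
    return {"identifier": binder_url, "relation": "isSourceOf"}


# Hypothetical record metadata fragment with a single Binder link.
metadata = {
    "related_identifiers": [
        binder_related_identifier("https://mybinder.org/v2/zenodo/10.5281/zenodo.1234567/")
    ]
}
```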

PS (@choldgraf): Today's silly question: copyright-wise, is it OK for us to use the Binder logo from your repo?

@slint Yes, you can use the logo. Without us doing anything extra, it is licensed like this, which is probably not ideal for artwork.

If you are going to make a "button" for people to press there is also https://static.mybinder.org/badge_logo.svg which we recommend as the "button to launch a binder"
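Rendering that recommended button from a launch URL could look like this: a sketch for both Markdown (for READMEs) and HTML (for a record page), with hypothetical helper names.

```python
from html import escape

BADGE_SVG = "https://static.mybinder.org/badge_logo.svg"


def binder_badge_markdown(launch_url: str) -> str:
    """Markdown for the recommended 'launch binder' button."""
    return f"[![Binder]({BADGE_SVG})]({launch_url})"


def binder_badge_html(launch_url: str) -> str:
    """HTML equivalent; the URL is escaped so characters like '&' stay valid."""
    return f'<a href="{escape(launch_url)}"><img src="{BADGE_SVG}" alt="Launch Binder"></a>'
```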

@slint I hadn't realised the relation type for related/alternate identifiers was taken from the DataCite v4.1 schema. Perhaps that could be stated in the introductory text of the section, after the text stating the range of identifiers that are accepted.

I agree that of the available relation types, isSourceOf is the most appropriate and I have updated my Zenodo record that is being used as an example.

Is the resource type field based on resourceTypeGeneral in DataCite 4.1 (Table 7)? If so, interactiveResource ("A resource requiring interaction from the user to be understood, executed, or experienced") seems to me to be the most appropriate value. Unfortunately, this isn't available in the drop-down list, so I opted for "Other".
