Registry: Create DiSSCo Network Entity

Created on 18 Jan 2021  ·  17 Comments  ·  Source: gbif/registry

DiSSCo would like a network entity containing the datasets originating from the relevant institutions.
Wouter A has prepared a spreadsheet with the GBIF keys.

  • Wouter asks to view it in UAT beforehand. We should create it there, but since UAT is not sized to crawl all data, I am not sure of the benefits. Creating a repeatable SQL script to use on both UAT and prod seems sensible.
  • I propose that we also add the ROR and GRID IDs as additional identifiers to the relevant entries in a separate SQL script

All 17 comments

How would you add the ROR and GRID IDs: as "tags" or as a DwC field (institutionID)? What about the (often different) institution name in the EML profile? And what would be the process for registering as part of the network and registering these IDs, for new datasets added by DiSSCo partners or for new partners becoming GBIF data providers?

How would you add the ROR and GRID IDs, as "tags" or as DwC field (institutionID)

Tags would be an option, but I'd propose to just add an identifier to the entities where it makes sense. We support multiple identifiers on all entities in the registry. This has no effect on occurrence records; it simply allows the organisation to be found in the registry using the ID.

what about the (often different) institution name in the EML profile

It wouldn't be affected in GBIF. All it does is say "this entry in the registry is also known by a different ID"; it won't change the name under which the organisation was registered in GBIF. The name can be changed at any time, though, if desired.

and what would be the process regarding registering as part of the network and registering these IDs, for new datasets added by DiSSCo partners or new partners becoming a GBIF data provider?

Registering datasets and institutions in GBIF will work as it always has. Authorization to curate the membership of the network (i.e. adding GBIF datasets to, or removing them from, the DiSSCo entry) can be given to one or more accounts as needed. In time we'll probably want to automate membership somehow.
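To make the "additional identifier" idea in this reply concrete, here is a minimal sketch in Python. It is not the registry's implementation: the type names `ROR` and `GRID`, the `type`/`identifier` field names, and the example values are assumptions for illustration and should be checked against the registry's identifier vocabulary. It only shows that appending identifiers leaves the registered name untouched.

```python
# Assumed identifier type names; verify against the registry's vocabulary.
ASSUMED_TYPES = {"ROR", "GRID"}


def identifier_payload(id_type, value):
    """Build one identifier entry (field names are assumptions)."""
    if id_type not in ASSUMED_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    value = value.strip()
    if not value:
        raise ValueError("identifier value must not be empty")
    return {"type": id_type, "identifier": value}


def add_identifiers(entity, new_identifiers):
    """Return a copy of the entity with extra identifiers appended.

    Existing fields such as the title are copied unchanged, and an
    identifier that is already present is not added twice.
    """
    updated = dict(entity)
    identifiers = list(entity.get("identifiers", []))
    for ident in new_identifiers:
        if ident not in identifiers:
            identifiers.append(ident)
    updated["identifiers"] = identifiers
    return updated


# Hypothetical organisation entry with placeholder values.
org = {"title": "Example Museum", "identifiers": []}
org_with_ids = add_identifiers(
    org,
    [
        identifier_payload("ROR", "ROR-PLACEHOLDER"),
        identifier_payload("GRID", "GRID-PLACEHOLDER"),
    ],
)
```

The original entry is not mutated, so the same helper could be run repeatedly (for example on UAT first and then on prod) without duplicating identifiers.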

Concerning the Network:

For testing, I created a network in UAT: https://registry.gbif-uat.org/network/9400230d-de38-4e0e-b44d-fcdb661f0519
I wrote a script using the API for that so it can be reproduced in prod.

The constituents of the network are all the datasets that are published by the GBIF organisations listed in the spreadsheet that are DiSSCo members (disscoMember == "y").
NB: In UAT, this includes all kinds of test datasets (but not all the datasets available in prod).
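The selection rule described above can be sketched as follows. This is not the script mentioned in the comment, which is not shown in the thread. The column name `gbifKey`, the API base URL, and the two endpoint paths are assumptions that should be verified against the GBIF registry API documentation; only `disscoMember == "y"` comes from the comment itself.

```python
import csv
import io

# Assumed UAT API base URL; swap in the prod base to reproduce there.
API = "https://api.gbif-uat.org/v1"


def dissco_org_keys(csv_text):
    """Return organisation keys of rows where disscoMember == "y".

    The key column name "gbifKey" is hypothetical. Duplicate keys are
    dropped while preserving the spreadsheet order.
    """
    keys = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get("disscoMember") or "").strip() == "y":
            key = (row.get("gbifKey") or "").strip()
            if key and key not in keys:
                keys.append(key)
    return keys


def published_dataset_url(org_key, api=API):
    """URL listing the datasets published by an organisation (assumed path)."""
    return f"{api}/organization/{org_key}/publishedDataset"


def constituent_url(network_key, dataset_key, api=API):
    """URL for adding a dataset to a network (assumed path)."""
    return f"{api}/network/{network_key}/constituents/{dataset_key}"


# Made-up spreadsheet export with placeholder keys.
sample = "gbifKey,disscoMember\norg-1,y\norg-2,n\norg-1,y\norg-3,y\n"
member_keys = dissco_org_keys(sample)
```

A full run would page through each organisation's published datasets and then issue one authenticated request per dataset against the constituent URL; that network part is left out here so the selection logic can be checked offline.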

Not meaning to hijack this thread, but doesn't it make more sense to link ROR and GRID ids to GRSciColl institutions rather than to GBIF organizations?

Not meaning to hijack this thread, but doesn't it make more sense to link ROR and GRID ids to GRSciColl institutions rather than to GBIF organizations?

Thanks @rukayaj. Yes, though both make sense, as GRSciColl will only ever contain a subset of the publishing organisations in GBIF.

Since GrSciColl institutions and GBIF organisations are completely separate at the moment, as far as I know, you would ideally do it in both.

Ok, I had forgotten that GRSciColl was for institutes with physical collections... So I think that you're saying some research institutions do not fit into GRSciColl (as they do not hold physical collections), but these institutions would have ROR and GRID ids? That makes sense then, and in that case I think it'd be better to just have GRIDs/RORs in one place.

@wouteraddink They're kind of being linked in the portal UI with the fuzzy matching e.g. https://www.gbif.org/occurrence/2579432371?

GRID and ROR discussion related to this other issue: https://github.com/gbif/registry/issues/274

I'd love to see ROR/GRID/ISNI used per occurrence record with dwc:institutionID (to override institution IDs in the EML, because these can apparently be distinct even within the same Darwin Core Archive).

(the occurrence record is about the occurrence; while the GRSciColl record is about the institution -- the institutionID property on the occurrence record would link/bridge the two)

I think in principle you could use a ROR/GRID/ISNI in dwc:institutionID without problems, but it is against the current recommendation in the DwC documentation. I think as a community we need to change this recommendation.

Thanks Marie, I see the network now in UAT. However, it would be nice to have it filtered by default for specimen datasets only. Also, https://www.gbif-uat.org/network/9400230d-de38-4e0e-b44d-fcdb661f0519 is still empty?

Also, https://www.gbif-uat.org/network/9400230d-de38-4e0e-b44d-fcdb661f0519 is still empty?

All datasets need to be reprocessed to pick up the networkKey in the index

+ the summary page has to be edited in another system (we can do that in production).

Should I include the datasets that have some preserved specimens or only preserved specimens?

I would also include datasets that have some preserved specimens. I'm not sure how that would influence the counts on the overview page: are these record-based or dataset-based?

The metrics are generated based on the records of the datasets belonging to the network. This means that if I tag a dataset containing observations, these observations will be included in the metrics.
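The trade-off between "some" and "only" preserved specimens can be illustrated with a small sketch. The per-dataset counts below are made up, and the functions are hypothetical helpers rather than anything in the GBIF codebase. They show that record-based metrics pick up observation records whenever a mixed dataset is included.

```python
def select_datasets(counts_by_dataset, mode="some"):
    """Pick datasets with some, or only, preserved-specimen records."""
    if mode not in ("some", "only"):
        raise ValueError("mode must be 'some' or 'only'")
    selected = []
    for key, counts in counts_by_dataset.items():
        specimens = counts.get("PRESERVED_SPECIMEN", 0)
        total = sum(counts.values())
        if specimens == 0:
            continue
        if mode == "some" or specimens == total:
            selected.append(key)
    return selected


def network_metrics(counts_by_dataset, dataset_keys):
    """Sum record counts per basis of record over the chosen datasets."""
    totals = {}
    for key in dataset_keys:
        for basis, n in counts_by_dataset[key].items():
            totals[basis] = totals.get(basis, 0) + n
    return totals


# Made-up record counts per dataset, keyed by basis of record.
counts = {
    "a": {"PRESERVED_SPECIMEN": 100},
    "b": {"PRESERVED_SPECIMEN": 10, "HUMAN_OBSERVATION": 90},
    "c": {"HUMAN_OBSERVATION": 50},
}

some = select_datasets(counts, "some")   # ["a", "b"]
only = select_datasets(counts, "only")   # ["a"]
metrics_some = network_metrics(counts, some)
# {"PRESERVED_SPECIMEN": 110, "HUMAN_OBSERVATION": 90}
```

With the "some" rule, the 90 observation records of dataset "b" end up in the network totals; with the "only" rule they do not, but the 10 specimens in "b" are lost as well.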

@wouteraddink At GBIF Norway we have now moved all the university museum GBIF data publishers (not eligible for ROR and GRID) to the university level (with ROR and GRID), and merged them (moving the respective datasets) with any GBIF data publishers that had been created for university departments of biology and geology.

We aim to follow the principle that Norwegian GBIF data publishers should be entities that are eligible for a ROR and GRID ID. (We have also started suggesting that data publishers that are eligible but do not yet have a ROR register for one.)

I have updated your "CETAF+DiSSCo institutions" spreadsheet using "comments" (where rows 121-122 would be merged).

Thanks for the info @dagendresen. I have been talking with both GRID and ROR. GRID is tightening its policies and no longer allows separate identifiers for institutions embedded in universities. ROR is still synchronised 1:1 with GRID, but that may change later this year, and they will likely have a more relaxed policy. A ROR WG is also working on an extension for departments, but that is in the early stages of development, and it is not yet decided whether these will be minted through ROR directly or through Wikidata or GitHub. For DiSSCo we can now work with ROR, as it has a fully implemented metadata schema including parent organisation relations. If institutions cannot get a ROR, we can use CETAF passport identifiers and link them to their university's ROR if needed. ORCID has not yet implemented ROR but is planning to.
