Registry: GBIF citation string/object on all datasets

Created on 11 Jan 2017  ·  24 comments  ·  Source: gbif/registry

We have decided to use the GBIF-generated citation on all datasets. Since that is how a dataset should be cited, I believe that information should be in the API response. I know @kbraak and @ahahn-gbif have talked about this citation format extensively, so I include you here.

All 24 comments

We discussed this a bit with Andrea this morning. I am all in favor of a single recommended citation, i.e. a default citation, assembled from the metadata and respecting the roles, order of names, etc. We need to make sure that GBIF.org, maybe the publication date of the latest version, and the DOI are parts of the citation that cannot be modified unless the dataset is republished. Andrea mentioned a possible issue with non-IPT publishers, such as ABCD and BioCASE publishers.

I continue to recommend that we reuse DataCite’s preferred citation format, which satisfies the Joint Declaration of Data Citation Principles. This is the format the IPT uses to auto-generate citations. More information about this format can be found here.

I believe that as long as we clearly communicate how we derive our citation from EML, ABCD, etc., publishers will have the chance to adapt their metadata accordingly, should it matter to them.

@fmendezh In case it helps, this is the method used in the IPT to auto-generate the resource's citation. Of course it uses IPT-specific model objects, but fortunately they are quite similar to our Registry model objects.

@dschigel you have previously talked about downloadable formats for citation management software. If this is something we really do want to do in the future, this is your chance to describe the required atomized components that the API needs to expose for that to happen.

Yes, here is what I wrote on 23 Sep, still valid:

Hi Morten,

You asked about the bibliographic formats we could be exporting the GBIF recommended citation to (sorry that the citation itself did not get final shape yesterday).

Here is my suggestion:

Use three formats Mendeley is exporting to, see attached image
Nature P group is using only .ris.
Plus we will need the free text (but which system – to check with comms after we decide what goes into the GBIF citation – sorry for the delay on our end, as you probably know from discussion with Andrea, this is not a trivial decision).

These three export formats should be enough to bring key citation export functionality to the planned HowToCite box / popup etc.
The citation itself will get shape as we progress with KC, KB and AH.
If you want to have more, another image has the export formats Science is using.

Check also CrossMark and Share buttons in PLOS One on the right side of any paper:
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0159184
I hope Siro could comment on this, too.

Dmitry

It might be a nice option to provide multiple formats on the website, but for many purposes we need one standard only, e.g. citations inside downloads or in the API. We could consider dealing with a complex reference object in the API instead of a string to allow various formatting, but atomizing references well in a generic way is a challenge.
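To make the "complex reference object" idea concrete, here is a minimal sketch of what an atomized citation could look like and how one export format (RIS) might be rendered from it. The `CitationRecord` class, its field names, and the choice of RIS tags are assumptions for illustration only; nothing here reflects an existing Registry model or a format decided in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CitationRecord:
    # Hypothetical atomized citation; field names are illustrative only.
    authors: List[str]   # one entry per author, e.g. "Smirnov, I"
    year: int            # publication year of the latest version
    title: str
    publisher: str
    doi: str


def to_ris(rec: CitationRecord) -> str:
    """Render the record as a RIS entry: tag, two spaces, hyphen, space, value."""
    lines = ["TY  - DATA"]                       # RIS type for a data file
    lines += [f"AU  - {a}" for a in rec.authors]  # one AU line per author
    lines += [
        f"PY  - {rec.year}",
        f"TI  - {rec.title}",
        f"PB  - {rec.publisher}",
        f"DO  - {rec.doi}",
        "ER  - ",                                # end-of-record marker
    ]
    return "\n".join(lines)


record = CitationRecord(
    authors=["Smirnov, I", "Golikov, A", "Khalikov, R"],
    year=2017,
    title="Ophiuroidea collections of the Zoological Institute Russian Academy of Sciences",
    publisher="Zoological Institute, Russian Academy of Sciences, St. Petersburg",
    doi="10.15468/ej3i4f",
)
print(to_ris(record))
```

With the components kept separate like this, the plain-text citation and any export format could be produced from the same object, which is the trade-off against returning a single string.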

Timely feedback from Rod gbif/portal-feedback#33 about adopting CiteProc

There are also Java and JavaScript implementations:
https://michel-kraemer.github.io/citeproc-java/
https://github.com/juris-m/citeproc-js

Based on the example for Dataset from the Joint Declaration of Data Citation Principles (https://www.force11.org/node/4771) and a modified version of the IPT method.

For the dataset 98333cb6-6c15-4add-aa0e-b322bf1500ba we would generate a default citation like:

Smirnov I, Golikov A, Khalikov R (2017): Ophiuroidea collections of the Zoological Institute Russian Academy of Sciences. Zoological Institute, Russian Academy of Sciences, St. Petersburg. Dataset/Occurrence, accessed via GBIF.org on [date]. http://doi.org/10.15468/ej3i4f

from @timhirsch: That approach looks fine. I think the only issue that would cause for [aggregators is that they] would come out as the attributed institution because that is taken from the designated publisher, right? In which case we could/should encourage them to register each of their contributing institutions, for which perhaps we could help with a ‘bulk endorsement’?

I would replace "Dataset/Occurrence," with the more human-readable "Occurrence dataset", no comma.

"Accessed via" is a little redundant. The role of a data publishing platform, analogous to a publishing house, is probably as important as access; note that in book references you would just say Taylor & Francis, not "published by". Most journal copy editors will strike out "Accessed via".

Should the way GBIF.org is mentioned be the same for the citations on the dataset pages and on the download screens & how-to-cite?

The problem with excluding 'accessed via' is that you then remove any mention of GBIF in the recommended citation!

No, I did not mean to remove GBIF.org, on the contrary. Just the words "Accessed via" are a bit redundant, as the whole point of a citation is to indicate the "access via". I meant:

GBIF.org, [date of latest version or of the first publication]. Occurrence dataset: http://doi.org/10.15468/ej3i4f

or similar

After compiling the feedback we have:

Smirnov I, Golikov A, Khalikov R (2017): Ophiuroidea collections of the Zoological Institute Russian Academy of Sciences. Zoological Institute, Russian Academy of Sciences, St. Petersburg. GBIF.org, [2017-01-27]. Occurrence Dataset. http://doi.org/10.15468/ej3i4f

@dschigel the date [2017-01-27] is the access date, is it ok?

For me this seems fine. It will flush out issues where people will feel the publisher is not the appropriate institution to get the main credit in the citation, but that can then act as an incentive for e.g. networks to register their providing institutions separately as data publishers.

Thanks to a very useful discussion with @kbraak and @siro1, and in accordance with the DataCite recommendations https://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.1.pdf (see section 2.2), I suggest we use the following hybrid that would work for both the bibliography and the data worlds. Here is an example that would follow the data citation principles https://www.force11.org/group/joint-declaration-data-citation-principles-final. "Accessed via" is back because the DOI refers to a dataset, while the access date plus portal name refer to web access (that is why it is a hybrid citation). The version number is added to explain the use of the year in the citation. Note the lack of brackets and punctuation:

Smirnov I, Golikov A, Khalikov R (2017): Ophiuroidea collections of the Zoological Institute Russian Academy of Sciences. Version 1.25. Zoological Institute, Russian Academy of Sciences, St. Petersburg. Occurrence dataset http://doi.org/10.15468/ej3i4f, accessed via GBIF.org on 2017-01-27.

I hope that the roles and their order in the autogenerated citation and the dataset page heading are the same: ORIGINATOR, METADATA AUTHOR,

Once we fix the citation for the datasets (last version as published + dataset page) we might need to edit or add a section here: https://demo.gbif.org/citation-guidelines @kcopas

Do we need the colon after the brackets? I would exclude it and simply have a space.

Smirnov I, Golikov A, Khalikov R (2017) Ophiuroidea collections of the Zoological Institute Russian Academy of Sciences. Version 1.25. Zoological Institute, Russian Academy of Sciences, St. Petersburg. Occurrence dataset http://doi.org/10.15468/ej3i4f, accessed via GBIF.org on 2017-01-27.
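As a rough illustration of how a string in this shape could be assembled from atomized metadata, here is a minimal sketch. The `DatasetMetadata` class and `build_citation` function are hypothetical stand-ins, not the IPT method or the Registry model; the sketch only reproduces the punctuation of the example directly above (no colon after the year, version included).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DatasetMetadata:
    # Hypothetical, simplified stand-in for the Registry dataset model.
    authors: List[str]             # already formatted as "Surname Initials"
    year: int                      # publication year of the latest version
    title: str
    publisher: str
    dataset_type: str              # e.g. "Occurrence"
    doi_url: str
    version: Optional[str] = None  # omitted from the citation when missing


def build_citation(meta: DatasetMetadata, accessed: date) -> str:
    """Assemble the hybrid citation: bibliographic part, then web access part."""
    parts = [f"{', '.join(meta.authors)} ({meta.year}) {meta.title}."]
    if meta.version:
        parts.append(f"Version {meta.version}.")
    parts.append(f"{meta.publisher}.")
    parts.append(
        f"{meta.dataset_type} dataset {meta.doi_url}, "
        f"accessed via GBIF.org on {accessed.isoformat()}."
    )
    return " ".join(parts)


example = DatasetMetadata(
    authors=["Smirnov I", "Golikov A", "Khalikov R"],
    year=2017,
    title="Ophiuroidea collections of the Zoological Institute Russian Academy of Sciences",
    publisher="Zoological Institute, Russian Academy of Sciences, St. Petersburg",
    dataset_type="Occurrence",
    doi_url="http://doi.org/10.15468/ej3i4f",
    version="1.25",
)
print(build_citation(example, date(2017, 1, 27)))
```

The access date is passed in rather than read from the clock so that the same metadata always yields a reproducible string for a given day.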

Also, for the last part, how about simply a comma after GBIF.org: "GBIF.org, 2017-01-27"

@siro1, I would keep "on". This hybrid citation has both the publication year of the last version (2017) and the date of access. Replacing "on" with a comma assumes knowledge of which date is which, while "on" makes a clear statement about access.

Well, I still find "on" superfluous, because once you talk about the web and access it follows that the date after that will be the date of access. A minor issue, but it can save space and characters.

I'd suggest to close this discussion here. We are going into really tiny details now. It is still possible that we will need some kind of change later on, but I feel that space is not the biggest saving that can be made at this point.

The API response will use the generated citation to replace the current "citation.text":

"citation": {
  "text": "Creuwels J (2016). Naturalis Biodiversity Center (NL) - Hemiptera. Naturalis Biodiversity Center. Occurrence Dataset http://doi.org/10.5072/eu7dj2 accessed via GBIF.org on 2017-02-17."
}
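For API consumers, reading the generated citation from a response shaped like the fragment above could look like the following sketch. The `citation_text` helper is a hypothetical name, and the sample body contains only the `citation` object shown here; the other fields of a real dataset response are omitted.

```python
import json
from typing import Optional

# Sample body containing only the "citation" object from this thread.
response_body = """
{
  "citation": {
    "text": "Creuwels J (2016). Naturalis Biodiversity Center (NL) - Hemiptera. Naturalis Biodiversity Center. Occurrence Dataset http://doi.org/10.5072/eu7dj2 accessed via GBIF.org on 2017-02-17."
  }
}
"""


def citation_text(dataset: dict) -> Optional[str]:
    """Return citation.text, or None if the dataset carries no citation."""
    return dataset.get("citation", {}).get("text")


dataset = json.loads(response_body)
print(citation_text(dataset))
```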

For the record:
This is now documented on the GBIF.org FAQ.
