How to cite data

As an approach towards developing best practice for data citation, I recently wrote a Data Citation Best Practice Discussion Document that is available on Google Docs, and that I have now slightly revised to Version 2 [1].

In that document, I first compared what is recommended by DataCite [2] and by Altman and King [3] with what is currently practised by the Dryad Data Repository and with what presently occurs ‘in the wild’ in a handful of journal articles that reference Dryad datasets.  I then proposed some ‘internal’ recommendations for Dryad to adopt, and concluded with draft Data Citation Best Practice Recommendations.  As I say in the preface to the document:

“Since Dryad is pioneering data management in terms of data resources that are linked to journal articles, it is to be hoped that by first developing citation best practice in the Dryad context we can thereby catalyse its wider spread.  If we can thus agree what such best practice should be among the Dryad community and implement such best practice proposals, we can then promote such practices within the wider scholarly community.”

I realized that much of the confusion and disagreement concerning the best method of citing data resources within earlier e-mail threads resulted from a conflation of ideas about two entities which in the conventional citation of journal articles are quite distinct:

  • the in-text citation containing an in-text reference pointer, e.g. “this paper builds upon the work of Jones et al. [15]”; and
  • the actual reference to Jones et al. within the article’s reference list, e.g. “[15] Jones A, Bloggs B and Smith C (2008). Title. JournalName 14:132-134. doi:*****.”
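The distinction between the two entities can be made concrete with a small sketch.  The Python below is purely illustrative: the function names, the bracketed-number pointer style and the second sample reference are assumptions made for the example, not part of any citation standard.

```python
import re

# Assumed convention for this sketch: in-text reference pointers are
# bracketed integers such as "[15]".
POINTER_PATTERN = re.compile(r"\[(\d+)\]")


def resolve_pointers(body_text, reference_list):
    """Map each in-text reference pointer to its reference-list entry.

    reference_list is a dict keyed by reference number. A pointer with
    no matching entry maps to None, so dangling pointers are easy to spot.
    """
    resolved = {}
    for match in POINTER_PATTERN.finditer(body_text):
        number = int(match.group(1))
        resolved[number] = reference_list.get(number)
    return resolved


def unpointed_references(body_text, reference_list):
    """Return the numbers of references that no in-text pointer denotes."""
    pointed = {int(m.group(1)) for m in POINTER_PATTERN.finditer(body_text)}
    return sorted(set(reference_list) - pointed)


# The in-text citation (body text) and the reference list are separate things.
body = "This paper builds upon the work of Jones et al. [15]."
references = {
    15: "Jones A, Bloggs B and Smith C (2008). Title. JournalName 14:132-134.",
    16: "A hypothetical dataset reference that the body text never points to.",
}

print(resolve_pointers(body, references))
print(unpointed_references(body, references))
```

Here reference 16 sits in the reference list without any in-text reference pointer denoting it, which is exactly the situation that arises when a data reference is added to the list but never cited in the text.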

Thus, in an e-mail I wrote on 27 April, where I said

“Excellent, but what we really want is for the data citations to be included in the reference list along with the bibliographic citations, following the DataCite model: Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier”

. . . I should also have stressed the need for explicit in-text citations that denote such references.
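As a minimal sketch of that DataCite model, the Python below assembles a reference string in the order Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier.  The class and function names are hypothetical, every sample value is made up for illustration (including the DOI), and skipping empty fields is a choice made for this sketch rather than something taken from the DataCite document.

```python
from dataclasses import dataclass


@dataclass
class DataReference:
    """Fields named after the elements of the DataCite citation model."""
    creator: str
    publication_year: int
    title: str
    version: str
    publisher: str
    resource_type: str
    identifier: str


def format_datacite_reference(ref):
    """Build 'Creator (PublicationYear): Title. Version. Publisher.
    ResourceType. Identifier', leaving out any field that is empty."""
    head = f"{ref.creator} ({ref.publication_year}): {ref.title}"
    tail = [ref.version, ref.publisher, ref.resource_type, ref.identifier]
    return ". ".join([head] + [part for part in tail if part])


# Entirely made-up example values; the DOI does not resolve to anything.
example = DataReference(
    creator="Doe J",
    publication_year=2011,
    title="Example dataset",
    version="Version 1",
    publisher="Example Repository",
    resource_type="Dataset",
    identifier="doi:10.1234/example",
)

# The reference-list entry, prefixed by the number that the in-text
# reference pointer "[16]" would use to denote it.
print(f"[16] {format_datacite_reference(example)}")
```

The numbered prefix is what ties the two halves together: the formatted string belongs in the reference list, while “[16]” is what appears in the body text wherever the dataset is cited.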

All of this is explained in the Google Docs document.  There I also proposed a separate Data Resources section within the body text of a journal article, in which data resource citations can be gathered.  That does not preclude these resources also being cited, where appropriate, within the Methods and Materials or Results sections of the paper, but it is designed to put data resource citations “on the map”, so to speak, as important new performative acts of publication.

In my view it is not appropriate to include data citations in the Acknowledgements section of a paper, which is meant for acknowledging contributions to the work from people and funding agencies.  That holds even though Thomson Reuters has developed methods to parse such entries, since it also has well-established mechanisms for harvesting proper (data) references from the reference list.

All the ontological terms required to mark up in-text reference pointers and their textual contexts, references, reference lists, etc., to permit automated detection and harvesting of data citations and references, are available as RDF within the SPAR (Semantic Publishing and Referencing) Ontologies (http://purl.org/spar/), which were designed precisely to facilitate such work.
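To give a flavour of such markup, the sketch below uses plain Python (no RDF library) to write a few Turtle statements linking an in-text reference pointer, the reference it denotes, and the dataset being cited.  The term names (c4o:InTextReferencePointer, c4o:denotes, biro:BibliographicReference, biro:references, cito:citesAsDataSource) are quoted from memory and should be checked against the published ontologies before use; the ex: namespace and all resource names are hypothetical.

```python
# Namespace prefixes. The SPAR ones follow the http://purl.org/spar/ pattern;
# "ex" is a hypothetical namespace used only for this illustration.
PREFIXES = {
    "c4o": "http://purl.org/spar/c4o/",
    "biro": "http://purl.org/spar/biro/",
    "cito": "http://purl.org/spar/cito/",
    "ex": "http://example.org/",
}


def data_citation_triples(article, pointer, reference, dataset):
    """Return (subject, predicate, object) triples describing one data citation.

    Term names are assumptions to be verified against the SPAR ontologies.
    """
    return [
        (pointer, "a", "c4o:InTextReferencePointer"),
        (pointer, "c4o:denotes", reference),
        (reference, "a", "biro:BibliographicReference"),
        (reference, "biro:references", dataset),
        (article, "cito:citesAsDataSource", dataset),
    ]


def to_turtle(triples):
    """Serialise the triples as Turtle, one statement per line."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines)


triples = data_citation_triples(
    article="ex:article",
    pointer="ex:pointer-15",
    reference="ex:reference-15",
    dataset="ex:dataset",
)
print(to_turtle(triples))
```

With statements of this kind attached to an article, a harvester can follow the chain from pointer to reference to dataset without having to parse the citation text itself.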

Since writing my Data Citation Best Practice Discussion Document, I was invited (on a purely voluntary non-commercial basis, I should add!) to work with Pensoft Journals, a publisher that specialises in biodiversity and biological systematics papers and that has taken the lead in promoting the publication of datasets with DOIs.  The invitation was to contribute to and help revise their now-published Data Publishing Policies and Guidelines for Biodiversity Data [4].  This 34-page paper has a three-page section on how to cite data in Pensoft Journals, which I discuss in the next blog post, and which I am pleased to say includes all the recommendations discussed above.

[1] David Shotton (2011). Data Citation Best Practice Discussion Document. Google Docs. https://docs.google.com/document/d/1kF8-faB72l4dKTLEyx6Z5cIabk68GrJ9GraCtWnK0qQ/edit?hl=en_GB&authkey=CPPW46wL#.

[2] The DataCite Metadata Kernel version 2.0 (2011). http://datacite.org/schema/DataCite-MetadataKernel_v2.0.pdf.

[3] Micah Altman and Gary King (2007). A proposed standard for the scholarly citation of quantitative data. D-Lib Magazine. 13. http://www.dlib.org/dlib/march07/altman/03altman.html.

[4] Penev L, Mietchen D, Chavan V, Hagedorn G, Remsen D, Smith V, Shotton D (2011). Pensoft Data Publishing Policies and Guidelines for Biodiversity Data. Pensoft Publishers. http://www.pensoft.net/J_FILES/Pensoft_Data_Publishing_Policies_and_Guidelines.pdf.


