DataCite2RDF – Mapping DataCite Metadata Scheme Terms to ontologies

The DataCite Metadata Kernel version 2.0 [1] specifies the minimal metadata, and the optional metadata, that should accompany a DataCite DOI for the identification of a published data entity. The Metadata Kernel document includes an XML mapping of these metadata terms, using DCMI Metadata Terms, and an example encoded in XML.

Silvio Peroni and I recently published a mapping of the DataCite metadata elements to RDF using ontology terms [2]. Its purpose is to enable data repositories to publish DataCite metadata in RDF as Open Linked Data, so that these metadata can be understood programmatically and integrated automatically with similar data from elsewhere.

Our mapping covers all the main terms, and the Relation Type sub-properties that describe the relationship of the related resource to the resource being registered, but does not address DataCite sub-terms, e.g. 2.2.1 nameIdentifierScheme.

Wherever possible, commonly used Dublin Core Elements, DCMI (Dublin Core Metadata Initiative) Metadata Terms, FOAF (Friend of a Friend Vocabulary) and PRISM (Publishing Requirements for Industry Standard Metadata) terms have been used.

These have been supplemented, as appropriate, by terms:

from FRBR (Functional Requirements for Bibliographic Records),

from the following SPAR (Semantic Publishing and Referencing) Ontologies:

                CiTO, Citation Typing Ontology

                FaBiO, FRBR-aligned Bibliographic Ontology, and

                CiTO4Data, an extension of CiTO for datasets that provides the properties cito4data:compiles and cito4data:isCompiledBy that the DataCite Metadata Kernel requires;

and from a new DataCite Ontology (http://purl.org/spar/datacite/) that we created to provide the following four object properties lacking in other ontologies:

                    datacite:hasPrimaryIdentifier

                    datacite:hasAlternateIdentifier

                    datacite:hasRelatedIdentifier

                    datacite:hasPersonalIdentifier
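As a rough illustration of how these prefixed names relate to full URIs, the following Python sketch expands them against the DataCite Ontology namespace given above. The helper names expand and turtle_prefix_header are hypothetical, and the cito and fabio namespace URIs are the commonly published SPAR ones rather than values taken from this post.

```python
# Namespace URIs keyed by prefix. Only the datacite namespace is quoted in
# this post; the cito and fabio entries are assumptions added for context.
PREFIXES = {
    "datacite": "http://purl.org/spar/datacite/",
    "cito": "http://purl.org/spar/cito/",
    "fabio": "http://purl.org/spar/fabio/",
}

# The four object properties introduced by the DataCite Ontology.
DATACITE_OBJECT_PROPERTIES = (
    "datacite:hasPrimaryIdentifier",
    "datacite:hasAlternateIdentifier",
    "datacite:hasRelatedIdentifier",
    "datacite:hasPersonalIdentifier",
)


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'datacite:hasPrimaryIdentifier' to a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


def turtle_prefix_header(prefixes: dict) -> str:
    """Render Turtle @prefix declarations, one per line, sorted by prefix."""
    return "\n".join(
        f"@prefix {prefix}: <{namespace}> ." for prefix, namespace in sorted(prefixes.items())
    )
```

Calling expand on each entry of DATACITE_OBJECT_PROPERTIES yields the full property URIs, and turtle_prefix_header(PREFIXES) gives the declarations that would precede any Turtle statements using them.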

Use of DCMI Metadata Terms in RDF

An object property has a class or a URI as its object, while a data property has a literal (e.g. text, number, date) as its object, and may have a W3C XML Schema Definition Language (XSD) datatype qualifier, e.g. ^^xsd:date (see http://www.w3.org/TR/xmlschema11-2/).
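The distinction can be sketched in a few lines of Python that write single Turtle statements. This is a minimal illustration only, not taken from the mapping document: the classes IRI and Literal, the function statement, the example.org URIs, and the choice of dcterms:publisher and dcterms:issued as example properties are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IRI:
    """A resource identified by a URI: the kind of object an object property takes."""
    value: str

    def to_turtle(self) -> str:
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    """A literal value: the kind of object a data property takes."""
    lexical: str
    datatype: Optional[str] = None  # prefixed XSD datatype, e.g. "xsd:date"

    def to_turtle(self) -> str:
        # Escape backslashes first, then double quotes, as Turtle strings require.
        escaped = self.lexical.replace("\\", "\\\\").replace('"', '\\"')
        suffix = f"^^{self.datatype}" if self.datatype else ""
        return f'"{escaped}"{suffix}'


def statement(subject: IRI, predicate: str, obj: Union[IRI, Literal]) -> str:
    """Serialise one triple as a Turtle statement; the predicate is a prefixed name."""
    return f"{subject.to_turtle()} {predicate} {obj.to_turtle()} ."
```

For a hypothetical dataset IRI("http://example.org/dataset/1"), passing an IRI as the object of dcterms:publisher produces an object-property statement, while passing Literal("2011-06-01", "xsd:date") as the object of dcterms:issued produces a data-property statement carrying the ^^xsd:date qualifier.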

Many Dublin Core properties are not formally specified to be one or the other, leading to potential confusion. In the following mapping, Dublin Core Elements are always used as data properties, while Dublin Core Metadata Initiative Metadata Terms are used either as data properties or as object properties, as helpfully specified by the Max Planck Digital Library in their document entitled "How to use DCMI Metadata as linked data".

In our DataCite2RDF mapping document [2], alternative mappings are given where appropriate, separated by semi-colons. Both dc: and dcterms: properties are listed. Preferred terms are shown in bold. RDF statements are given in Turtle notation.

To accompany this DataCite2RDF mapping document, we have also published as Google Docs both an RDF mapping of the DataCite XML example and an RDF mapping of the metadata for a Dryad repository holding, showing how DataCite2RDF can be used for real data.

We welcome feedback on these documents: <david.shotton@zoo.ox.ac.uk> and <speroni@cs.unibo.it>.

[1] The DataCite Metadata Kernel version 2.0 (2011). http://datacite.org/schema/DataCite-MetadataKernel_v2.0.pdf.

[2] David Shotton and Silvio Peroni (2011). Mapping DataCite Metadata Scheme Terms (v2.0) to ontologies (DataCite2RDF). Google Docs. https://docs.google.com/document/d/1paJgvmCMu3pbM4in6PjWAKO0gP-6ultii3DWQslygq4/edit?authkey=CMeV3tgF&hl=en_GB.

