Nomenclature for data publications and citations

The meaning of the word “dataset” is ambiguous, changing with context.  In FaBiO, the FRBR-aligned bibliographic ontology, we define “dataset” at the conceptual level (i.e. as a frbr:Work) thus:

fabio:Dataset: “A collection of related facts, often expressed in numerical form and encoded in a defined structure.”

In situations demanding precision in terminology, such as data repository metadata and data citation, I think it is wiser to avoid the term “dataset” and instead to use the terms “data file” and “data package”, since these are the entities that data repositories handle and to which identifiers pertain.

In FaBiO, a data file is defined as follows:

fabio:DataFile: “A realisation of a fabio:Dataset (a frbr:Work) containing a defined collection of data with specific content and possibly with a specific version number, that can be embodied as a fabio:DigitalManifestation (a frbr:Manifestation with a specific format) and represented by a specific fabio:ComputerFile (a frbr:Item) on someone’s hard drive.”
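
The chain of FRBR layers in this definition can be sketched as a handful of RDF triples. The following is only an illustration in plain Python, not part of FaBiO itself: the example.org resource URIs and the helper names frbr_chain and to_ntriples are invented here, and the linking properties (frbr:realization, frbr:embodiment, frbr:exemplar) are assumed to be the appropriate ones from the FRBR core vocabulary.

```python
# Vocabulary namespaces. The example.org resources below are hypothetical.
FABIO = "http://purl.org/spar/fabio/"
FRBR = "http://purl.org/vocab/frbr/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"


def frbr_chain(stem):
    """Return triples linking one dataset to one file through the FRBR layers."""
    work = EX + stem + "/dataset"                  # frbr:Work
    expression = EX + stem + "/datafile-v1"        # frbr:Expression
    manifestation = EX + stem + "/datafile-v1-csv" # frbr:Manifestation
    item = EX + stem + "/datafile-v1.csv"          # frbr:Item
    return [
        (work, RDF_TYPE, FABIO + "Dataset"),
        (expression, RDF_TYPE, FABIO + "DataFile"),
        (manifestation, RDF_TYPE, FABIO + "DigitalManifestation"),
        (item, RDF_TYPE, FABIO + "ComputerFile"),
        # Assumed linking properties from the FRBR core vocabulary.
        (work, FRBR + "realization", expression),
        (expression, FRBR + "embodiment", manifestation),
        (manifestation, FRBR + "exemplar", item),
    ]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join("<%s> <%s> <%s> ." % t for t in triples)


triples = frbr_chain("rainfall")
print(to_ntriples(triples))
```

Read from top to bottom, the last three triples walk from the conceptual dataset down to a particular file on disk, which is why an identifier attached to the data file is more precise than one attached to the “dataset”.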

FaBiO also has the class

fabio:MetadataDocument: “A document that contains metadata information describing one or more characteristics of an entity”, which is a realization of fabio:Metadata: “A separate work that provides information describing one or more characteristics of a resource or entity”.

Using ORE, a data package containing data files and a metadata manifest can be defined as a type of ore:Aggregation.  However, ORE is not sufficiently expressive to meet the needs of repositories that may wish to refer specifically, in their RDF metadata, to data packages, to manifests and to the landing pages displaying metadata relating to data packages and individual data files. For this reason, we have developed two tiny new ontologies, DaPO, the Data Package Ontology, and LaPO, the Landing Page Ontology, that provide the missing nomenclature, as will be detailed in a future blog post.
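
To make the limitation concrete, here is a minimal sketch in plain Python of a data package described with ORE terms alone. The example.org URIs, the file names and the helper data_package are hypothetical; only ore:Aggregation and ore:aggregates are taken from the ORE vocabulary, and no DaPO or LaPO terms are shown.

```python
# ORE vocabulary namespace. All example.org resources are hypothetical.
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/package/1/"


def data_package(package, data_files, manifest):
    """Describe a data package as an ore:Aggregation of its files and manifest."""
    triples = [(package, RDF_TYPE, ORE + "Aggregation")]
    for member in list(data_files) + [manifest]:
        # ORE offers a single membership relation for every aggregated resource.
        triples.append((package, ORE + "aggregates", member))
    return triples


package = EX + "aggregation"
files = [EX + "observations.csv", EX + "sites.csv"]
manifest = EX + "manifest.xml"
triples = data_package(package, files, manifest)

# Everything in the package is related to it in exactly the same way.
aggregated = [o for (s, p, o) in triples if p == ORE + "aggregates"]
print(aggregated)
```

In this sketch the manifest is attached to the package by the same ore:aggregates relation as the data files, so nothing in the ORE description itself says which member is the manifest or that the aggregation is a data package rather than some other kind of collection.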
