The meaning of the word “dataset” is ambiguous and changes with context. In FaBiO, the FRBR-aligned bibliographic ontology, we define “dataset” at the conceptual level (i.e. as a frbr:Work) thus:
fabio:Dataset: “A collection of related facts, often expressed in numerical form and encoded in a defined structure.”
In situations demanding precise terminology, such as data repository metadata and data citation, I think it is wiser to avoid the term “dataset” and to use the terms “data file” and “data package” instead, since these are the entities that data repositories handle and to which identifiers pertain.
In FaBiO, a data file is defined as follows:
fabio:DataFile: “A realisation of a fabio:Dataset (a frbr:Work) containing a defined collection of data with specific content and possibly with a specific version number, that can be embodied as a fabio:DigitalManifestation (a frbr:Manifestation with a specific format) and represented by a specific fabio:ComputerFile (a frbr:Item) on someone’s hard drive.”
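The chain of FRBR levels in this definition can be sketched as a handful of RDF triples. The following is only an illustrative sketch in plain Python, not part of FaBiO itself: the `http://example.org/` resources and the `frbr_chain` helper are hypothetical, and the linking properties (`frbr:realizationOf`, `frbr:embodimentOf`, `frbr:exemplarOf`) are assumed to be the FRBR Core properties appropriate for connecting the four levels.

```python
# Published namespaces for FaBiO, FRBR Core and RDF.
FABIO = "http://purl.org/spar/fabio/"
FRBR = "http://purl.org/vocab/frbr/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the example resources


def frbr_chain(work, expression, manifestation, item):
    """Return triples linking one dataset through the four FRBR levels."""
    return [
        # Type each resource at its FRBR level.
        (work, RDF_TYPE, FABIO + "Dataset"),                        # frbr:Work
        (expression, RDF_TYPE, FABIO + "DataFile"),                 # realisation
        (manifestation, RDF_TYPE, FABIO + "DigitalManifestation"),  # frbr:Manifestation
        (item, RDF_TYPE, FABIO + "ComputerFile"),                   # frbr:Item
        # Link each level to the one above it.
        (expression, FRBR + "realizationOf", work),
        (manifestation, FRBR + "embodimentOf", expression),
        (item, FRBR + "exemplarOf", manifestation),
    ]


# Hypothetical example: one rainfall dataset, its second version,
# that version in CSV format, and one copy of the CSV on a disk.
triples = frbr_chain(
    EX + "rainfall-dataset",
    EX + "rainfall-v2",
    EX + "rainfall-v2-csv",
    EX + "rainfall-v2-csv-copy",
)

# Print the triples in N-Triples syntax.
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

The point of the sketch is that the identifier a repository assigns normally attaches to the data file (the second resource), not to the abstract dataset above it or to a particular copy on disk below it.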
FaBiO also has the class
fabio:MetadataDocument: “A document that contains metadata information describing one or more characteristics of an entity”, which is a realisation of
fabio:Metadata: “A separate work that provides information describing one or more characteristics of a resource or entity”.
Using ORE, a data package containing data files and a metadata manifest can be defined as a type of ore:Aggregation. However, ORE is not sufficiently expressive to meet the needs of repositories that may wish to refer specifically, in their RDF metadata, to data packages, to manifests, and to the landing pages that display metadata about data packages and individual data files. For this reason, we have developed two tiny new ontologies, DaPO, the Data Package Ontology, and LaPO, the Landing Page Ontology, that provide the missing nomenclature, as will be detailed in a future blog post.
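To make the limitation concrete, here is a minimal sketch of a data package described with ORE and FaBiO terms alone. It is an illustration under stated assumptions rather than a recommended encoding: the `http://example.org/` resources and the `data_package` helper are hypothetical, and no DaPO or LaPO terms are used.

```python
# Published namespaces for ORE, FaBiO and RDF.
ORE = "http://www.openarchives.org/ore/terms/"
FABIO = "http://purl.org/spar/fabio/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the example resources


def data_package(package, data_files, manifest):
    """Return triples describing a package as an ore:Aggregation."""
    triples = [
        (package, RDF_TYPE, ORE + "Aggregation"),
        (manifest, RDF_TYPE, FABIO + "MetadataDocument"),
        # The manifest is aggregated in exactly the same way as the data files.
        (package, ORE + "aggregates", manifest),
    ]
    for data_file in data_files:
        triples.append((data_file, RDF_TYPE, FABIO + "DataFile"))
        triples.append((package, ORE + "aggregates", data_file))
    return triples


# Hypothetical package with two data files and one metadata manifest.
triples = data_package(
    EX + "package-1",
    [EX + "package-1/a.csv", EX + "package-1/b.csv"],
    EX + "package-1/manifest.rdf",
)

# Everything the package contains, as seen through ORE alone.
aggregated = [o for s, p, o in triples if p == ORE + "aggregates"]

for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

In this sketch the package is typed only as a generic ore:Aggregation, and the manifest is linked by the same ore:aggregates property as the data files, so nothing in the ORE terms themselves marks the resource as a data package or singles out the manifest's role within it.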