Requirements for citations to be treated as First-Class Data Entities
In my introductory blog post, I listed five requirements for the treatment of citations as first-class data entities. The first of these requirements is that they must be definable in a machine-readable manner as a member of the class “Citation”, and describable using appropriate ontology terms.
Changes to the OpenCitations Data Model
In the OpenCitations Data Model (OCDM), itself described in the following blog post, we have created new classes and properties that permit citations to be described in richer ways appropriate for bibliometric research. These changes have been inspired by the publications of Vincent Larivière, Ludo Waltman and their colleagues [1-3].
These new classes and properties and their definitions are described below:
- Citation: a permanent conceptual directional link from the citing bibliographic resource to a cited bibliographic resource, created by the performative act of an author citing a published work that is relevant to the current work, typically made by including a bibliographic reference in the reference list of the citing work, or by the inclusion within the citing work of a link, in the form of an HTTP Uniform Resource Locator (URL), to the cited bibliographic resource on the World Wide Web.
The class Citation has sub-classes, each defining a particular type of citation.
- Self-citation: a citation in which the citing and the cited entities have something significant in common with one another. Sub-classes include:
  - Affiliation self-citation: a citation in which at least one author from each of the citing and the cited entities is affiliated with the same academic institution.
  - Author network self-citation: a citation in which at least one author of the citing entity has direct or indirect co-authorship links with one of the authors of the cited entity.
  - Author self-citation: a citation in which the citing and the cited entities have at least one author in common.
  - Funder self-citation: a citation in which the works reported in the citing and the cited entities were funded by the same funding agency.
  - Journal self-citation: a citation in which the citing and the cited entities are published in the same journal.
- Journal cartel citation: a citation from one journal to another journal which forms one of a very large number of citations from the citing journal to recent articles in the cited journal.
- Distant citation: a citation in which the citing and the cited entities have nothing significant in common with one another over and beyond their subject matter.
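The definitions above lend themselves to a simple rule-based check. The following Python sketch is illustrative only and is not part of the OCDM or of any OpenCitations software: the `Work` record and the `citation_types` function are hypothetical names, and the metadata fields are assumed to be already normalised (for instance, author names already disambiguated). Author network self-citations and journal cartel citations are left out because they need data beyond the two works being compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    """Hypothetical minimal metadata record for a bibliographic resource."""
    authors: frozenset = frozenset()
    affiliations: frozenset = frozenset()
    funders: frozenset = frozenset()
    journal: str = ""


def citation_types(citing: Work, cited: Work) -> set:
    """Return the labels of the citation sub-classes that apply to this pair."""
    types = set()
    if citing.authors & cited.authors:
        types.add("author self-citation")
    if citing.affiliations & cited.affiliations:
        types.add("affiliation self-citation")
    if citing.funders & cited.funders:
        types.add("funder self-citation")
    if citing.journal and citing.journal == cited.journal:
        types.add("journal self-citation")
    # Assumption: with nothing shared in the available metadata, the
    # citation is labelled distant. Co-author networks are not consulted.
    return types or {"distant citation"}
```

A single citation can fall into several self-citation sub-classes at once, which is why the function returns a set rather than a single label.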
New object properties
- has citing document: The bibliographic resource which acts as source for the citation.
- has cited document: The bibliographic resource which acts as target for the citation.
New data properties
- has citation creation date: The date on which the citation was created. This has the same numerical value as the publication date of the citing bibliographic resource, but is a property of the citation itself. When combined with the citation time span, it permits that citation to be located in history.
- has citation time span: The temporal characteristic of a citation, namely the interval between the publication date of the cited entity and the publication date of the citing entity.
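As an illustration of how the time span relates to the two publication dates, the Python sketch below derives an xsd:duration string from dates given at year or year-month precision. The function names and the choice of input format are assumptions made for this example, not something prescribed by the OCDM. A leading minus sign, which xsd:duration permits, is used when the cited work was published after the citing one.

```python
def parse_year_month(value: str):
    """Split 'YYYY' or 'YYYY-MM' into (year, month); month is None if absent."""
    parts = value.split("-")
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else None
    return year, month


def citation_time_span(citing_date: str, cited_date: str) -> str:
    """Interval from the cited to the citing publication date, as xsd:duration."""
    citing_year, citing_month = parse_year_month(citing_date)
    cited_year, cited_month = parse_year_month(cited_date)

    if citing_month is None or cited_month is None:
        # Only year precision is available for at least one of the dates.
        years = citing_year - cited_year
        sign = "-" if years < 0 else ""
        return f"{sign}P{abs(years)}Y"

    total_months = (citing_year - cited_year) * 12 + (citing_month - cited_month)
    sign = "-" if total_months < 0 else ""
    years, months = divmod(abs(total_months), 12)
    return f"{sign}P{years}Y{months}M"
```

For a citing work published in 2016 and a cited work published in 2006, `citation_time_span("2016", "2006")` gives `"P10Y"`.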
Changes to CiTO, the Citation Typing Ontology
To complement these additions to the OpenCitations Data Model, and to permit these richer characteristics of citations to be encoded in RDF, we have additionally made the following changes to CiTO, the Citation Typing Ontology.
The class cito:SelfCitation has been renamed cito:AuthorSelfCitation, with an unchanged definition (“a citation in which the citing and the cited entities have at least one author in common”).
A new class cito:SelfCitation has been created, with the same, more general definition as for this class in the OCDM (“a citation in which the citing and the cited entities have something significant in common with one another”). In CiTO, this now includes five new sub-classes:
with the definitions given above for these sub-classes in the OCDM.
New object properties
To complement the OCDM properties, we have within CiTO the following object properties:
- cito:hasCitedEntity (“A property that relates a citation to the cited entity”) and
- cito:hasCitingEntity (“A property that relates a citation to the citing entity”).
CiTO also has the following relevant object property:
with the sub-property cito:sharesJournalWith.
New data properties
To match the additions in the OCDM, we have added these new data properties to CiTO, which have the same definitions as those in the OCDM:
In addition, the class cito:AuthorNetworkSelfCitation is accompanied by the new data property:
which specifies the minimal distance, within the co-author network, between any author of the citing entity and any author of the cited entity. For instance, a citation has a co-authorship citation level of 1 if at least one author of the citing entity has previously published as co-author with one of the authors of the cited entity. Similarly, a citation has a co-authorship citation level of 2 if at least one author of the citing entity has previously published as co-author with someone who has in turn previously published as co-author with one of the authors of the cited entity, and so on.
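The co-authorship citation level described above is a shortest-path distance in the co-author network, so it can be computed with a breadth-first search. The Python sketch below is one way to do this under stated assumptions: the function name and the `coauthors` mapping (author name to the set of that author's previous co-authors) are hypothetical, and author names are assumed to be unambiguous. A result of 0 means the two works share an author, and `None` means no co-authorship path was found.

```python
from collections import deque


def coauthorship_citation_level(citing_authors, cited_authors, coauthors):
    """Minimal co-author distance between any citing and any cited author.

    coauthors: dict mapping an author to the set of their past co-authors.
    """
    targets = set(cited_authors)
    seen = set(citing_authors)
    # Multi-source breadth-first search starting from all citing authors.
    queue = deque((author, 0) for author in seen)
    while queue:
        author, distance = queue.popleft()
        if author in targets:
            return distance
        for neighbour in coauthors.get(author, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # the two author groups are not connected
```

Because breadth-first search visits authors in order of increasing distance, the first cited author reached gives the minimal level.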
Describing a citation in RDF
Describing a citation between two articles in RDF as a simple link is straightforward but relatively uninformative:
<https://w3id.org/oc/corpus/br/1> cito:cites <https://w3id.org/oc/corpus/br/18> .
The alternative RDF description of a citation as a first-class data entity could include the following triples (omitting any provenance information in this example), where br/1 and br/18 are the internal identifiers for the citing bibliographic resource and the cited bibliographic resource within the OpenCitations Corpus:
<https://w3id.org/oc/virtual/ci/1-18> a cito:Citation ;
    cito:hasCitingEntity <https://w3id.org/oc/corpus/br/1> ;
    cito:hasCitedEntity <https://w3id.org/oc/corpus/br/18> ;
    cito:hasCitationCreationDate "2016"^^xsd:gYear ;
    cito:hasCitationTimeSpan "P10Y"^^xsd:duration ;
    datacite:hasIdentifier <https://w3id.org/oc/virtual/id/ci-1-18> .
The meaning of “virtual” in the URI of this citation is explained in the following blog post about the OpenCitations Data Model.
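For readers who want to produce such descriptions programmatically, the following Python sketch assembles the same Turtle from the two internal identifiers, the creation year and the time span. It is a plain string template, not an OpenCitations tool: the function name `citation_turtle` is hypothetical, the URI patterns are simply copied from the example above, and the prefix declarations use the standard namespaces for CiTO, DataCite and XML Schema.

```python
PREFIXES = (
    "@prefix cito: <http://purl.org/spar/cito/> .\n"
    "@prefix datacite: <http://purl.org/spar/datacite/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)

BASE = "https://w3id.org/oc"  # taken from the URIs in the example above


def citation_turtle(citing_id, cited_id, creation_year, time_span):
    """Return a Turtle description of one citation as a first-class entity."""
    key = f"{citing_id}-{cited_id}"
    lines = [
        f"<{BASE}/virtual/ci/{key}> a cito:Citation ;",
        f"    cito:hasCitingEntity <{BASE}/corpus/br/{citing_id}> ;",
        f"    cito:hasCitedEntity <{BASE}/corpus/br/{cited_id}> ;",
        f'    cito:hasCitationCreationDate "{creation_year}"^^xsd:gYear ;',
        f'    cito:hasCitationTimeSpan "{time_span}"^^xsd:duration ;',
        f"    datacite:hasIdentifier <{BASE}/virtual/id/ci-{key}> .",
    ]
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the example citation from br/1 to br/18.
    print(citation_turtle(1, 18, 2016, "P10Y"))
```

In practice an RDF library would be a safer choice than string formatting, since it handles escaping and serialisation details; the template is used here only to keep the example free of dependencies.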
The following diagram prepared by Silvio Peroni shows the semantic relationships for a citation currently handled by the OpenCitations Corpus (omitting the sub-classes of the class cito:Citation). Explanation of OCI, the Open Citation Identifier, is given in a subsequent post.
[1] Matthew L. Wallace, Vincent Larivière and Yves Gingras (2012). A Small World of Citations? The Influence of Collaboration Networks on Citation Practices. PLoS ONE 7(3): e33339. https://doi.org/10.1371/journal.pone.0033339
[2] Philippe Mongeon, Ludo Waltman and Sarah de Rijcke (2016). What do we know about journal citation cartels? A call for information. CWTS blog post. Available at https://www.cwts.nl/blog?article=n-q2w2b4
[3] Ludo Waltman and Caspar Chorus (2016). Journal self-citations are increasingly biased toward impact factor years. CWTS blog post. Available at https://www.cwts.nl/blog?article=n-q2x264