Citations as First-Class Data Entities: Citation Descriptions

Requirements for citations to be treated as First-Class Data Entities

In my introductory blog post, I listed five requirements for the treatment of citations as first-class data entities.  The first of these requirements is that they must be definable in a machine-readable manner as a member of the class “Citation”, and describable using appropriate ontology terms.

This blog post describes recent additions to the OpenCitations Data Model, and to CiTO, the Citation Typing Ontology, that permit the required richer description of citations.

Changes to the OpenCitations Data Model

In the OpenCitations Data Model (OCDM), which is itself described in the next blog post in this series, we have created new classes and properties that permit citations to be described in the richer ways needed for bibliometric research.  These changes were inspired by the publications of Vincent Larivière, Ludo Waltman and their colleagues [1-3].

These new classes and properties and their definitions are described below:

New classes

  • Citation: a permanent conceptual directional link from the citing bibliographic resource to a cited bibliographic resource, created by the performative act of an author citing a published work that is relevant to the current work, typically made by including a bibliographic reference in the reference list of the citing work, or by the inclusion within the citing work of a link, in the form of an HTTP Uniform Resource Locator (URL), to the cited bibliographic resource on the World Wide Web.

The class Citation has sub-classes, each defining a particular type of citation:

  • Self-citation: a citation in which the citing and the cited entities have something significant in common with one another. Sub-classes include:
    • Affiliation self-citation: a citation in which at least one author from each of the citing and the cited entities is affiliated with the same academic institution.
    • Author network self-citation: a citation in which at least one author of the citing entity has direct or indirect co-authorship links with one of the authors of the cited entity.
    • Author self-citation: a citation in which the citing and the cited entities have at least one author in common.
    • Funder self-citation: a citation in which the works reported in the citing and the cited entities were funded by the same funding agency.
    • Journal self-citation: a citation in which the citing and the cited entities are published in the same journal.
  • Journal cartel citation: a citation from one journal to another journal which forms one of a very large number of citations from the citing journal to recent articles in the cited journal.
  • Distant citation: a citation in which the citing and the cited entities have nothing significant in common with one another over and above their subject matter.
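As a minimal sketch, assuming each work is reduced to sets of author names, author affiliations and funders plus a journal title, most of the definitions above become simple set intersections. The `Work` record and the `classify_citation` function are hypothetical illustrations, not part of the OCDM or CiTO. Author network self-citations and journal cartel citations are deliberately left out, since they depend on a co-authorship network and on aggregate citation counts respectively, and treating "no overlap in the attributes checked here" as a distant citation is a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Work:
    """Hypothetical minimal record of a bibliographic resource."""
    authors: set[str] = field(default_factory=set)
    affiliations: set[str] = field(default_factory=set)  # institutions of its authors
    funders: set[str] = field(default_factory=set)
    journal: Optional[str] = None


def classify_citation(citing: Work, cited: Work) -> set[str]:
    """Return the OCDM citation types (as plain labels) that apply to citing -> cited."""
    types: set[str] = set()
    if citing.authors & cited.authors:
        types.add("author self-citation")
    if citing.affiliations & cited.affiliations:
        types.add("affiliation self-citation")
    if citing.funders & cited.funders:
        types.add("funder self-citation")
    if citing.journal is not None and citing.journal == cited.journal:
        types.add("journal self-citation")
    if types:
        # Every specific kind is also a self-citation in the general sense.
        types.add("self-citation")
    else:
        # Simplification: nothing in common among the attributes checked here.
        # Author-network links are not checked, so this may over-report distant citations.
        types.add("distant citation")
    return types
```

Because the sub-classes are not mutually exclusive, the function returns a set: a citation between two papers sharing both an author and a journal is at once an author self-citation and a journal self-citation.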

New object properties

  • has citing document: The bibliographic resource which acts as source for the citation.
  • has cited document: The bibliographic resource which acts as target for the citation.

New data properties

  • has citation creation date: The date on which the citation was created. This has the same numerical value as the publication date of the citing bibliographic resource, but is a property of the citation itself. When combined with the citation time span, it permits that citation to be located in history.
  • has citation time span: The temporal characteristic of a citation, namely the interval between the publication date of the cited entity and the publication date of the citing entity.
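The time span can be recorded as an xsd:duration, as in the RDF example later in this post. The Python sketch below assumes both publication dates are known to the day; it counts whole months first (clamping to month ends) and then the remaining days, which is one common calendar convention among several, since the definition does not fix one. The function names are hypothetical. When only publication years are known (xsd:gYear values), the span reduces to a whole number of years.

```python
import calendar
from datetime import date


def _add_months(d: date, months: int) -> date:
    """Move d forward by whole months, clamping the day to the length of the target month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def citation_time_span(cited: date, citing: date) -> str:
    """Interval from the cited to the citing publication date, as an xsd:duration string."""
    if citing < cited:
        raise ValueError("the citing entity cannot predate the cited entity")
    # Whole months elapsed; a final, incomplete month is not counted.
    months = (citing.year - cited.year) * 12 + (citing.month - cited.month)
    if citing.day < cited.day:
        months -= 1
    # Days left over after advancing the cited date by those whole months.
    days = (citing - _add_months(cited, months)).days
    years, months = divmod(months, 12)
    parts = [f"{n}{unit}" for n, unit in ((years, "Y"), (months, "M"), (days, "D")) if n]
    return "P" + ("".join(parts) or "0D")


def citation_time_span_years(cited_year: int, citing_year: int) -> str:
    """Year-only variant for when publication dates are known only as xsd:gYear values."""
    if citing_year < cited_year:
        raise ValueError("the citing entity cannot predate the cited entity")
    return f"P{citing_year - cited_year}Y"
```

For example, a citation created in 2016 with a time span of "P10Y" corresponds to a cited work published in 2006, and `citation_time_span_years(2006, 2016)` produces exactly that string.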

Changes to CiTO, the Citation Typing Ontology

To complement these additions to the OpenCitations Data Model, and to permit these richer characteristics of citations to be encoded in RDF, we have additionally made the following changes to CiTO, the Citation Typing Ontology.

New classes

The class cito:SelfCitation has been renamed cito:AuthorSelfCitation, with an unchanged definition (“a citation in which the citing and the cited entities have at least one author in common”).

A new class cito:SelfCitation has been created, with the same, more general, definition as the Self-citation class in the OCDM (“a citation in which the citing and the cited entities have something significant in common with one another”). In CiTO, this class now has five sub-classes:

  • cito:AuthorSelfCitation
  • cito:JournalSelfCitation
  • cito:FunderSelfCitation
  • cito:AffiliationSelfCitation
  • cito:AuthorNetworkSelfCitation

with the definitions given above for these sub-classes in the OCDM.

New object properties

To complement the OCDM properties, we have within CiTO the following object properties:

  • cito:hasCitedEntity (“A property that relates a citation to the cited entity”) and
  • cito:hasCitingEntity (“A property that relates a citation to the citing entity”).

CiTO also has the following relevant object property:

  • cito:sharesPublicationVenueWith

with the sub-property cito:sharesJournalWith.

New data properties

To match the additions in the OCDM, we have added these new data properties to CiTO, which have the same definitions as those in the OCDM:

  • cito:hasCitationCreationDate
  • cito:hasCitationTimeSpan.

In addition, the class cito:AuthorNetworkSelfCitation is accompanied by the new data property:

  • cito:hasCoAuthorshipCitationLevel

which specifies the minimal distance, within the co-authorship network, between any author of the citing entity and any author of the cited entity. For instance, a citation has a co-authorship citation level equal to 1 if at least one author of the citing entity has previously published as a co-author with one of the authors of the cited entity. Similarly, a citation has a co-authorship citation level equal to 2 if at least one author of the citing entity has previously published as a co-author with someone who has in turn previously published as a co-author with one of the authors of the cited entity, and so on.
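Since the co-authorship citation level is a shortest-path distance in an unweighted graph, it can be computed with a breadth-first search started from all the citing authors at once. This is a minimal sketch: the `coauthors` mapping is a hypothetical input, assumed to be symmetric and built only from works published before the citing work, and returning 0 when the two works share an author (that is, an author self-citation) is a convention of this sketch rather than part of CiTO.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set


def co_authorship_citation_level(
    citing_authors: Iterable[str],
    cited_authors: Iterable[str],
    coauthors: Dict[str, Set[str]],
) -> Optional[int]:
    """Minimal co-authorship distance from any citing author to any cited author.

    Returns 0 if the works share an author, or None if the authors are not connected.
    """
    targets = set(cited_authors)
    # Every citing author starts at distance 0 (duplicates collapse in the dict).
    distance = {author: 0 for author in citing_authors}
    if any(author in targets for author in distance):
        return 0
    queue = deque(distance)
    # Breadth-first search visits authors in order of increasing distance,
    # so the first cited author reached is reached by a shortest path.
    while queue:
        author = queue.popleft()
        for neighbour in coauthors.get(author, ()):
            if neighbour in distance:
                continue
            distance[neighbour] = distance[author] + 1
            if neighbour in targets:
                return distance[neighbour]
            queue.append(neighbour)
    return None
```

With a network in which A has co-authored with B, and B with C, a citation from a work by A to a work by C has level 2, matching the second example above.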

Describing a citation in RDF

Describing a citation between two articles in RDF as a simple link is straightforward but relatively uninformative:

@prefix cito: <http://purl.org/spar/cito/> .

<https://w3id.org/oc/corpus/br/1>
      cito:cites
          <https://w3id.org/oc/corpus/br/18> .

The alternative RDF description of a citation as a first-class data entity could include the following triples (omitting any provenance information in this example), where br/1 and br/18 are the internal identifiers for the citing bibliographic resource and the cited bibliographic resource within the OpenCitations Corpus:

@prefix cito: <http://purl.org/spar/cito/> .
@prefix datacite: <http://purl.org/spar/datacite/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://w3id.org/oc/virtual/ci/1-18> a cito:Citation ;
     cito:hasCitingEntity <https://w3id.org/oc/corpus/br/1> ;
     cito:hasCitedEntity <https://w3id.org/oc/corpus/br/18> ;
     cito:hasCitationCreationDate "2016"^^xsd:gYear ;
     cito:hasCitationTimeSpan "P10Y"^^xsd:duration ;
     datacite:hasIdentifier <https://w3id.org/oc/virtual/id/ci-1-18> .

The meaning of “virtual” in the URI of this citation is explained in the following blog post about the OpenCitations Data Model.
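For a corpus holding millions of citations, descriptions like the one above would be generated rather than written by hand. The following minimal Python sketch assumes that the `ci/` and `id/ci-` URI patterns seen in the single example generalise to any pair of `br/` identifiers (an assumption, not a documented rule), and the function name `citation_to_turtle` is hypothetical. It uses plain string formatting to stay dependency-free.

```python
PREFIXES = (
    "@prefix cito: <http://purl.org/spar/cito/> .\n"
    "@prefix datacite: <http://purl.org/spar/datacite/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def citation_to_turtle(citing: int, cited: int, creation_year: int, time_span: str) -> str:
    """Serialise one citation as a first-class entity in Turtle.

    Assumes (hypothetically) that the URI pattern of the br/1 -> br/18 example
    applies to every pair of corpus identifiers.
    """
    corpus = "https://w3id.org/oc/corpus/br"
    virtual = "https://w3id.org/oc/virtual"
    pair = f"{citing}-{cited}"
    return PREFIXES + (
        f"\n<{virtual}/ci/{pair}> a cito:Citation ;\n"
        f"    cito:hasCitingEntity <{corpus}/{citing}> ;\n"
        f"    cito:hasCitedEntity <{corpus}/{cited}> ;\n"
        f'    cito:hasCitationCreationDate "{creation_year}"^^xsd:gYear ;\n'
        f'    cito:hasCitationTimeSpan "{time_span}"^^xsd:duration ;\n'
        f"    datacite:hasIdentifier <{virtual}/id/ci-{pair}> .\n"
    )


if __name__ == "__main__":
    # Reproduces the br/1 -> br/18 example.
    print(citation_to_turtle(1, 18, 2016, "P10Y"))
```

A production pipeline would more likely build the graph with an RDF library such as rdflib, which validates IRIs and escapes literals; string formatting is used here only to keep the sketch self-contained.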

The following diagram prepared by Silvio Peroni shows the semantic relationships for a citation currently handled by the OpenCitations Corpus (omitting the sub-classes of the class cito:Citation).  Explanation of OCI, the Open Citation Identifier, is given in a subsequent post.

References

[1] Matthew L. Wallace, Vincent Larivière and Yves Gingras (2012). A Small World of Citations? The Influence of Collaboration Networks on Citation Practices. PLoS ONE 7(3): e33339. https://doi.org/10.1371/journal.pone.0033339

[2] Philippe Mongeon, Ludo Waltman and Sarah de Rijcke (2016). What do we know about journal citation cartels? A call for information. CWTS blog post. Available at https://www.cwts.nl/blog?article=n-q2w2b4

[3] Ludo Waltman and Caspar Chorus (2016). Journal self-citations are increasingly biased toward impact factor years. CWTS blog post. Available at https://www.cwts.nl/blog?article=n-q2x264

Posted in Bibliographic references, Citations as First-Class Data Entities, Ontologies, Open Citations, Semantic Publishing

Citations as First-Class Data Entities: Introduction

Citations are now centre stage

As a result of the Initiative for Open Citations (I4OC), launched on 6 April 2017, almost all the major scholarly publishers now open the reference lists they submit to Crossref, resulting in more than half a billion references being openly available via the Crossref API.

It is therefore time to think carefully about how citations are treated, and how they might be better handled as part of the Linked Open Data Web.

Citations are normally treated simply as the links between published entities.

[Figure: Conventional citation]

However, an alternative richer view is to regard a citation as a data entity in its own right.

[Figure: First-class citation]

This permits us to endow a citation with descriptive properties, such as:

has citation creation date:   3rd March 2015
has citation time span:       6 years, 5 months and 23 days
has type:                     Self-citation
has identifier:               oci:7295288-3962641

[Note: a later blog post entitled “Open Citation Identifiers” will include an explanation of the identifier shown here.]

Advantages of treating citations as First-Class Data Entities

  • All the information regarding each citation is available in one place.
  • Citations become easier to describe, distinguish, count and process.
  • If available in aggregate, citations described in this manner are easier to analyze using bibliometric methods, for example to determine how citation time spans vary by discipline.

Requirements for citations to be treated as First-Class Data Entities

  • They must be definable in a machine-readable manner as a member of the class “Citation”, and describable using appropriate ontology terms.
  • They must have metadata structured using a generic yet appropriately detailed data model.
  • They must be storable, searchable and retrievable in an open database designed for bibliographic citations.
  • They must be identifiable using a global persistent identifier scheme.
  • There must be a Web-based identifier resolution service that takes the citation identifier as input and returns a description of the citation.

Blog posts detailing how these requirements are met

Subsequent blog posts will describe how we at OpenCitations have satisfied these requirements, permitting citations to indeed be treated as First-Class Data Entities:

  1. Citations as First-Class Data Entities: Citation Descriptions
  2. Citations as First-Class Data Entities: The OpenCitations Data Model
  3. Citations as First-Class Data Entities: The OpenCitations Corpus
  4. Citations as First-Class Data Entities: Open Citation Identifiers
  5. Citations as First-Class Data Entities: The Open Citation Identifier Resolution Service
Posted in Bibliographic references, Citations as First-Class Data Entities, Ontologies, Open Citation Identifiers, Open Citations, Semantic Publishing, Uncategorized

OpenCitations and the Initiative for Open Citations: A Clarification

Some folk are confused, but OpenCitations and the Initiative for Open Citations, despite the similarity of their names, are two distinct organizations.

OpenCitations (http://opencitations.net) is an open scholarly infrastructure organization directed by Silvio Peroni and myself, and its primary purpose is to host and build the OpenCitations Corpus (OCC), an RDF database of scholarly citation data that now contains almost 13 million citation links.

In contrast, the Initiative for Open Citations (I4OC; https://i4oc.org) is a separate and independent organization, whose founding was spearheaded by Dario Taraborelli of the Wikimedia Foundation.  OpenCitations was just one of several organizations that founded the Initiative for Open Citations, as documented at https://i4oc.org/#founders.

I4OC is a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data, but it does not itself host citation data.

Within a short space of time, I4OC has persuaded most of the major scholarly publishers to open their reference lists submitted to Crossref, so that the proportion of all references submitted to Crossref that are now open has risen from 1% to over 50%.

These references are now available for OpenCitations to harvest into the OpenCitations Corpus and publish in RDF, as well as for others to harvest and use as they wish.

All clear now?

Posted in Bibliographic references, open access, Open Citations, Open scholarship

Oxford University Press opens its references!

Good news!  Today, January 16th 2018, Oxford University Press (OUP) announced its participation in the Initiative for Open Citations and asked Crossref to turn on reference sharing for all the references OUP has deposited, covering more than half a million publications.  Oxford University Press is the largest university press in the world, publishing in 70 languages and 190 countries.

Their announcement is at https://academic.oup.com/journals/pages/announcements_from_oup/oup_joins_I4OC.

OUP now joins the elite band of four university presses that have already made their references open at Crossref in response to the I4OC call (https://i4oc.org/#publishers).

This decision by OUP has been a long time in gestation – see my 2012 post “Oxford University Press to support Open Citations” – but is no less welcome for that!

Posted in Bibliographic references, open access, Open Citations, Open scholarship

Funders should mandate open citations

On 9th January 2018, I published a World View article in Nature entitled “Funders should mandate open citations” [1], in which I argue that open references from scholarly publications are so important that, when publishers ignore encouragement from organisations such as the Initiative for Open Citations (I4OC) to open their references, sterner measures are required.

Specifically, major funders should extend their open access mandates and require grant recipients to publish only in journals whose publishers ensure their references are open.

This suggestion was originally made by Catriona MacCallum, Director of Open Science at Hindawi, during a recent I4OC conference call, and she deserves the credit for it.

My article is freely available from Nature:

online at http://go.nature.com/2midnzx; PDF at http://rdcu.be/Eqsv.

[1] David Shotton (2018). Funders should mandate open citations. Nature 553: 129. doi:10.1038/d41586-018-00104-7

Posted in Bibliographic references, open access, Open Citations, Open scholarship

Barriers to comprehensive reference availability

Two significant barriers prevent comprehensive reference availability through Crossref.

The first barrier

First, two-thirds of Crossref’s publisher-members, in particular the smaller ones, do not submit references along with the other details of their publications. Many of these published works are of types (e.g. abstracts, editorials and news items) that lack references altogether.  However, these publishers also have other publications that do contain references, and although the number of such unsubmitted references is not known, it is likely to be substantial.

Ironically, quite a number of publishers have their Crossref reference status option set to ‘Open’, and yet fail to submit any references!

All publishers who use Crossref DOIs and submit metadata describing their works to Crossref should be strongly encouraged to start submitting associated reference lists if these exist.  Crossref have confirmed that it is easy to do, with or without membership of Crossref’s free and beneficial Cited-by Service that provides publishers with statistics on the citations of their own publications.  Help can be provided by Crossref Support (support@crossref.org).

The second barrier

The second barrier to full reference availability is created by publishers that submit references to Crossref, but do not presently make them open. Elsevier is by far the largest member of this group, which also includes the American Chemical Society, IEEE and Wolters Kluwer Health.

It is both quick and easy for a publisher to change its preference setting and request that all the references associated with its DOI prefixes are made open – all it requires is an email request to support@crossref.org.  But without such a request, the references will remain in the default ‘Limited’ status.

References that are not associated with Crossref

There are, of course, many scholarly publications, for example preprints in repositories such as arXiv, and journal articles and monographs from small academic publishers in the Humanities, that do not have Digital Object Identifiers issued by Crossref.  There are also an increasing number of datasets in repositories such as Dryad that have associated references to the scholarly literature, but whose DOIs are issued by DataCite.  Since the references of none of these works are submitted to Crossref, they cannot be made available via the Crossref API, and separate measures will be required to capture them and share them with the community.

Posted in Bibliographic references, Data publication, open access, Open Citations, Open scholarship

The new Crossref reference distribution policy

Since 1st January 2018, Crossref has had a new reference distribution policy, described at https://www.crossref.org/reference-distribution/.

A publisher can choose one of three reference distribution preferences: ‘Closed’, ‘Limited’ and ‘Open’.

If the ‘Closed’ option is chosen, the references will only be used for the Crossref Cited-by service, and are not distributed via any of the other Crossref interfaces.

If the ‘Limited’ option is chosen, the references will additionally be made available to Crossref metadata subscribers who have signed an agreement for the new Crossref Metadata APIs ‘Plus’ service which also came into effect on 1st January 2018.  This is the new Crossref default option.

If the ‘Open’ option is chosen, the references will be openly available to anyone using the Crossref APIs.

Publishers will no longer be able to select the reference distribution preference for individual publications on a case-by-case basis, but rather the preference will be set for all publications appearing under a particular DOI prefix, with the new default option being ‘Limited’.
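To make these rules concrete, here is a small Python model of the policy as described above. It is not a Crossref API: the audience labels (`cited-by`, `metadata-plus`, `public`) and the DOI prefixes used in the usage note are hypothetical. It captures only two points from the policy: each DOI prefix carries a single preference, and a prefix with no explicit setting falls back to ‘Limited’.

```python
from enum import Enum
from typing import Dict


class ReferenceDistribution(Enum):
    CLOSED = "closed"
    LIMITED = "limited"
    OPEN = "open"


# Who may receive deposited references under each preference (hypothetical labels).
AUDIENCES = {
    ReferenceDistribution.CLOSED: {"cited-by"},
    ReferenceDistribution.LIMITED: {"cited-by", "metadata-plus"},
    ReferenceDistribution.OPEN: {"cited-by", "metadata-plus", "public"},
}


def doi_prefix(doi: str) -> str:
    """Return the DOI prefix, i.e. everything before the first '/'."""
    return doi.split("/", 1)[0]


def references_visible(
    doi: str, audience: str, preferences: Dict[str, ReferenceDistribution]
) -> bool:
    """Check whether `audience` may receive the references deposited for `doi`.

    The preference is looked up per DOI prefix, never per publication;
    prefixes without an explicit setting default to LIMITED.
    """
    preference = preferences.get(doi_prefix(doi), ReferenceDistribution.LIMITED)
    return audience in AUDIENCES[preference]
```

For instance, with `{"10.1234": ReferenceDistribution.OPEN}` as the only setting, references for any DOI under 10.1234 are public, while those under any other prefix reach only Cited-by and Metadata Plus users until the publisher asks for them to be opened.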

To date, more than 60 scholarly publishers, including most of the major ones, have responded to the Initiative for Open Citations and have instructed Crossref to set their references to open, as described in a previous post.

Posted in open access, Open Citations, Open scholarship