Citations as First-Class Data Entities: Introduction

Citations are now centre stage

As a result of the Initiative for Open Citations (I4OC), launched on April 6 last year, almost all the major scholarly publishers now open the reference lists they submit to Crossref, resulting in more than half a billion references being openly available via the Crossref API.

It is therefore time to think carefully about how citations are treated, and how they might be better handled as part of the Linked Open Data Web.

Citations are normally treated simply as the links between published entities.

[Figure: Conventional citation]

However, an alternative richer view is to regard a citation as a data entity in its own right.

[Figure: First-class citation]

This permits us to endow a citation with descriptive properties, such as

has citation creation date:   3rd March 2015
has citation time span:       6 years, 5 months and 23 days
has type:                     Self-citation
has identifier:               oci:7295288-3962641

[Note: a later blog post entitled “Open Citation Identifiers” will include an explanation of the identifier shown here.]
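As a minimal sketch of what such a description might look like in code, the Python below models a citation as a record carrying the four properties listed above. The class and field names are illustrative assumptions, not the OpenCitations data model. It also assumes that the citation creation date is the publication date of the citing work and that the time span runs from the cited work's publication date to that date. The cited work's publication date used here (8th September 2008) is a hypothetical value, chosen only so that the computed span reproduces the example.

```python
import calendar
from dataclasses import dataclass
from datetime import date


def calendar_span(start: date, end: date) -> tuple:
    """Return (years, months, days) from start to end, assuming start <= end.

    Uses a simple borrow rule; awkward month-end cases are not handled.
    """
    years = end.year - start.year
    months = end.month - start.month
    days = end.day - start.day
    if days < 0:
        # Borrow the length of the month that precedes `end`.
        months -= 1
        if end.month > 1:
            borrow_year, borrow_month = end.year, end.month - 1
        else:
            borrow_year, borrow_month = end.year - 1, 12
        days += calendar.monthrange(borrow_year, borrow_month)[1]
    if months < 0:
        years -= 1
        months += 12
    return years, months, days


@dataclass(frozen=True)
class Citation:
    """A citation described as a data entity in its own right."""
    identifier: str
    creation_date: date           # assumed: publication date of the citing work
    cited_publication_date: date  # hypothetical value, not given in the post
    citation_type: str

    @property
    def time_span(self) -> tuple:
        return calendar_span(self.cited_publication_date, self.creation_date)


example = Citation(
    identifier="oci:7295288-3962641",
    creation_date=date(2015, 3, 3),
    cited_publication_date=date(2008, 9, 8),  # hypothetical
    citation_type="Self-citation",
)
years, months, days = example.time_span
print(f"{years} years, {months} months and {days} days")
# prints "6 years, 5 months and 23 days"
```

Because every property lives on the one record, the citation can be passed around, counted or filtered without consulting either of the two publications it links.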

Advantages of treating citations as First-Class Data Entities

  • All the information regarding each citation is available in one place.
  • Citations become easier to describe, distinguish, count and process.
  • If available in aggregate, citations described in this manner are easier to analyze using bibliometric methods, for example to determine how citation time spans vary by discipline.

Requirements for citations to be treated as First-Class Data Entities

  • They must be definable in a machine-readable manner as a member of the class “Citation”, and describable using appropriate ontology terms.
  • They must have metadata structured using a generic yet appropriately detailed data model.
  • They must be storable, searchable and retrievable in an open database designed for bibliographic citations.
  • They must be identifiable using a global persistent identifier scheme.
  • There must be a Web-based identifier resolution service that takes the citation identifier as input and returns a description of the citation.
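The last three requirements can be illustrated with a toy example. The sketch below is an in-memory stand-in, not the OpenCitations Corpus or its resolution service: the class name, method names and description keys are all hypothetical, and the identifier syntax is inferred only from the single example identifier shown earlier in this post.

```python
import re
from typing import Dict, List, Optional

# Assumed syntax, inferred only from the single example "oci:7295288-3962641".
OCI_PATTERN = re.compile(r"oci:\d+-\d+")


class CitationStore:
    """Toy in-memory stand-in for an open citation database plus resolver."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def add(self, oci: str, description: dict) -> None:
        """Store a citation description under its identifier."""
        if not OCI_PATTERN.fullmatch(oci):
            raise ValueError(f"not a recognised citation identifier: {oci!r}")
        self._records[oci] = dict(description)

    def resolve(self, oci: str) -> Optional[dict]:
        """Identifier in, description out (None if the identifier is unknown)."""
        return self._records.get(oci)

    def search(self, **criteria) -> List[str]:
        """Return identifiers whose descriptions match every given property."""
        return [
            oci for oci, desc in self._records.items()
            if all(desc.get(key) == value for key, value in criteria.items())
        ]


store = CitationStore()
store.add("oci:7295288-3962641",
          {"type": "Self-citation", "creation_date": "2015-03-03"})
print(store.resolve("oci:7295288-3962641"))
print(store.search(type="Self-citation"))
```

A real service would sit behind a Web address and return the description in a machine-readable format, but the contract is the same: a citation identifier goes in and a description of that citation comes out.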

Blog posts detailing how these requirements are met

Subsequent blog posts will describe how we at OpenCitations have satisfied these requirements, permitting citations to indeed be treated as First-Class Data Entities:

  1. Citations as First-Class Data Entities: Citation Descriptions
  2. Citations as First-Class Data Entities: The OpenCitations Data Model
  3. Citations as First-Class Data Entities: The OpenCitations Corpus
  4. Citations as First-Class Data Entities: Open Citation Identifiers
  5. Citations as First-Class Data Entities: The Open Citation Identifier Resolution Service
Posted in Bibliographic references, Citations as First-Class Data Entities, Ontologies, Open Citation Identifiers, Open Citations, Semantic Publishing, Uncategorized

OpenCitations and the Initiative for Open Citations: A Clarification

Some folk are confused, but OpenCitations and the Initiative for Open Citations, despite the similarity of their names, are two distinct organizations.

OpenCitations (http://opencitations.net) is an open scholarly infrastructure organization directed by Silvio Peroni and myself, and its primary purpose is to host and build the OpenCitations Corpus (OCC), an RDF database of scholarly citation data that now contains almost 13 million citation links.

In contrast, the Initiative for Open Citations (I4OC; https://i4oc.org) is a separate and independent organization, whose founding was spearheaded by Dario Taraborelli of the Wikimedia Foundation. OpenCitations was just one of several organizations that founded the Initiative for Open Citations, as documented at https://i4oc.org/#founders.

I4OC is a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data, but it does not itself host citation data.

Within a short space of time, I4OC has persuaded most of the major scholarly publishers to open their reference lists submitted to Crossref, so that the proportion of all references submitted to Crossref that are now open has risen from 1% to over 50%.

These references are now available for OpenCitations to harvest into the OpenCitations Corpus and publish in RDF, as well as for others to harvest and use as they wish.
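For readers who want to see what harvesting might involve, here is a minimal sketch using only the Python standard library. It assumes the public Crossref REST API route https://api.crossref.org/works/{DOI}, whose JSON response carries a work's reference list in a "reference" array under "message" when those references are open, with a "DOI" key on each reference that Crossref has matched to a DOI. The function names are hypothetical, the sample record is made up for illustration, and this is not the OpenCitations ingestion code.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works/"


def work_url(doi: str) -> str:
    """Build the REST API address for one work."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def fetch_work(doi: str, timeout: float = 30.0) -> dict:
    """Download the JSON record for a DOI (requires network access)."""
    with urlopen(work_url(doi), timeout=timeout) as response:
        return json.load(response)


def cited_dois(work_record: dict) -> list:
    """Return the DOIs of cited works; empty if no references are exposed."""
    references = work_record.get("message", {}).get("reference", [])
    return [ref["DOI"] for ref in references if "DOI" in ref]


# Made-up record in the shape described above, so the example runs offline.
sample_record = {
    "message": {
        "DOI": "10.5555/citing.example",
        "reference": [
            {"key": "ref1", "DOI": "10.5555/cited.example"},
            {"key": "ref2", "unstructured": "A reference without a DOI."},
        ],
    }
}
print(cited_dois(sample_record))
```

Each citing DOI paired with a cited DOI is one citation link; references that lack a DOI need further matching before they can be turned into links.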

All clear now?

Posted in Bibliographic references, open access, Open Citations, Open scholarship

Oxford University Press opens its references!

Good news! Today, January 16th 2018, Oxford University Press (OUP) announced its participation in the Initiative for Open Citations, and asked Crossref to turn on reference sharing for all OUP-deposited references from more than half a million publications. Oxford University Press is the largest university press in the world, publishing in 70 languages and 190 countries.


Their announcement is at https://academic.oup.com/journals/pages/announcements_from_oup/oup_joins_I4OC.

OUP now joins the elite band of four university presses that have already made their references open at Crossref in response to the I4OC call (https://i4oc.org/#publishers).

This decision by OUP has been a long time in gestation – see my 2012 post Oxford University Press to support Open Citations – but is no less welcome for that!

Posted in Bibliographic references, open access, Open Citations, Open scholarship

Funders should mandate open citations

On 9th January 2018, I published a World View article in Nature entitled Funders should mandate open citations [1], in which I argue that access to open references from scholarly publications is so important that, when encouragements from organisations such as the Initiative for Open Citations (I4OC) to publishers to open their references fall on deaf ears, then sterner measures are required.

[Figure: Where sterner measures . .]

Specifically, major funders should extend their open access mandates and require grant recipients to publish only in journals whose publishers ensure their references are open.

This suggestion was originally made by Catriona MacCallum, Director of Open Science at Hindawi, during a recent I4OC conference call, and she deserves the credit for it.

My article is freely available from Nature:

online at http://go.nature.com/2midnzx; PDF at http://rdcu.be/Eqsv.

[1] David Shotton (2018). Funders should mandate open citations. Nature 553: 129. doi:10.1038/d41586-018-00104-7

Posted in Bibliographic references, open access, Open Citations, Open scholarship

Barriers to comprehensive reference availability

Two significant barriers prevent comprehensive reference availability through Crossref.

The first barrier

First, two-thirds of Crossref’s publisher-members, in particular the smaller ones, do not submit references along with the other details of their publications. Many of these published works are of types (e.g. abstracts, editorials and news items) that lack any references.  However, while the number of non-submitted references associated with other publications from these publishers is not known, it is likely to be substantial.

Ironically, quite a number of publishers have their Crossref reference status option set to ‘Open’, and yet fail to submit any references!

All publishers who use Crossref DOIs and submit metadata describing their works to Crossref should be strongly encouraged to start submitting associated reference lists if these exist.  Crossref have confirmed that it is easy to do, with or without membership of Crossref’s free and beneficial Cited-by Service that provides publishers with statistics on the citations of their own publications.  Help can be provided by Crossref Support (support@crossref.org).

The second barrier

The second barrier to full reference availability is created by publishers that submit references to Crossref, but do not presently make them open. Elsevier is by far the largest member of this group, which also includes the American Chemical Society, IEEE and Wolters Kluwer Health.

It is both quick and easy for a publisher to change its preference setting and request that all the references associated with its DOI prefixes are made open – all it requires is an email request to support@crossref.org. But without such a request, the references will remain in the default ‘Limited’ status.

References that are not associated with Crossref

There are, of course, many scholarly publications, for example preprints in repositories such as arXiv, and journal articles and monographs from small academic publishers in the Humanities, that do not have Digital Object Identifiers issued by Crossref. There is also an increasing number of datasets in repositories such as Dryad that have associated references to the scholarly literature, but whose DOIs are issued by DataCite. None of these submit references to Crossref, where they could be made available via the Crossref API, and separate additional measures will be required to capture their references and share them with the community.

Posted in Bibliographic references, Data publication, open access, Open Citations, Open scholarship

The new Crossref reference distribution policy

Since 1st January 2018, Crossref has had a new reference distribution policy, described at https://www.crossref.org/reference-distribution/.

A publisher can choose between three options when setting its reference distribution preference: ‘Closed’, ‘Limited’ and ‘Open’.

If the ‘Closed’ option is chosen, the references will only be used for the Crossref Cited-by service, and are not distributed via any of the other Crossref interfaces.

If the ‘Limited’ option is chosen, the references will additionally be made available to Crossref metadata subscribers who have signed an agreement for the new Crossref Metadata APIs ‘Plus’ service, which also came into effect on 1st January 2018. This is the new Crossref default option.

If the ‘Open’ option is chosen, the references will be openly available to anyone using the Crossref APIs.

Publishers will no longer be able to select the reference distribution preference for individual publications on a case-by-case basis, but rather the preference will be set for all publications appearing under a particular DOI prefix, with the new default option being ‘Limited’.
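The policy described above can be summarised in a few lines of Python. This is only a sketch of the rules as stated in this post, not Crossref code: the names are hypothetical, the prefix table is invented for illustration, and the access check covers distribution through the Crossref APIs only (the Cited-by service, which receives references under all three settings, is left out).

```python
from enum import Enum


class ReferencePolicy(Enum):
    CLOSED = "Closed"    # used for the Cited-by service only
    LIMITED = "Limited"  # additionally available to Metadata Plus subscribers
    OPEN = "Open"        # available to anyone using the Crossref APIs


DEFAULT_POLICY = ReferencePolicy.LIMITED


def policy_for(doi: str, prefix_policies: dict) -> ReferencePolicy:
    """Look up the preference by DOI prefix; unset prefixes get the default."""
    prefix = doi.split("/", 1)[0]
    return prefix_policies.get(prefix, DEFAULT_POLICY)


def can_read_references(policy: ReferencePolicy, is_plus_subscriber: bool) -> bool:
    """Would an API user see the references under this policy?"""
    if policy is ReferencePolicy.OPEN:
        return True
    if policy is ReferencePolicy.LIMITED:
        return is_plus_subscriber
    return False


# Hypothetical prefix table: one publisher has asked for 'Open'.
prefix_policies = {"10.5555": ReferencePolicy.OPEN}

print(policy_for("10.5555/example.1", prefix_policies).value)
print(policy_for("10.9999/example.2", prefix_policies).value)
```

Keying the lookup on the DOI prefix rather than on the individual DOI mirrors the change described above: one setting now governs every publication under a prefix.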

To date, more than 60 scholarly publishers, including most of the major ones, have responded to the Initiative for Open Citations and have instructed Crossref to set their references to open, as described in a previous post.

Posted in open access, Open Citations, Open scholarship

Openness of non-Elsevier references

For completeness, this post complements the two preceding posts by detailing the openness of references from scholarly publishers other than Elsevier. Like them, it is based on analyses performed by Daniel Ecer of eLife (d.ecer@elifesciences.org) on data he downloaded from Crossref in September 2017 (Ecer, 2017).

 The main conclusion is that, of the 650,093,489 references stored in Crossref from journal articles published by publishers other than Elsevier, 486,041,671 (74.76%) are open.

The detailed statistics derived from the Crossref data at the time of sampling relating to all publishers except Elsevier are as follows:

Number of works recorded at Crossref from publishers other than Elsevier

Crossref has records of 93,184,372 works with DOIs, of which 69,699,633 (74.80%) are journal articles and 23,484,739 (25.20%) are works that are not journal articles (i.e. book chapters, proceedings articles, datasets, etc.).

Of the 93,184,372 works, 76,795,932 (82.41%) were published by publishers other than Elsevier.

Of the 69,699,633 journal articles, 54,440,761 (78.11%) were in journals with publishers other than Elsevier.

Of the 23,484,739 works that are not journal articles, 22,355,171 (95.19%) were published by publishers other than Elsevier.

Numbers of non-Elsevier works with references

Of all 76,795,932 works with DOIs recorded in Crossref from publishers other than Elsevier, 27,609,963 (35.95%) have accompanying references and 49,185,969 (64.05%) lack references.

Of the 54,440,761 journal articles recorded in Crossref from publishers other than Elsevier, 23,459,805 (43.09%) have accompanying references, and 30,980,956 (56.91%) lack references.

Of the 22,355,171 works that are not journal articles recorded in Crossref from publishers other than Elsevier, 4,150,158 (18.56%) have accompanying references, and 18,205,013 (81.44%) lack references.

Number of non-Elsevier references at Crossref

Of the 1,075,133,743 references stored in Crossref from all works, 732,513,350 (68.13%) are from works published by publishers other than Elsevier.

Of the 956,050,193 references stored in Crossref from journal articles, 650,093,489 (68.00%) are from journals published by publishers other than Elsevier.

Of the 119,083,550 references stored in Crossref from works that are not journal articles, 82,419,861 (69.21%) are from works published by publishers other than Elsevier.

Average numbers of references per non-Elsevier work

The 732,513,350 non-Elsevier references stored in Crossref come from 27,609,963 works of all types with accompanying references, giving an average of 26.53 references per work.

650,093,489 non-Elsevier references come from 23,459,805 non-Elsevier journal articles with accompanying references, giving an average of 27.71 references per journal article.

82,419,861 non-Elsevier references come from 4,150,158 non-Elsevier works with accompanying references that are not journal articles, averaging 19.86 references per work.
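The averages above are simple quotients of the reference and work counts already given, and can be rechecked with a short Python sketch. The dictionary layout and function name are illustrative only; the numbers are copied from this post.

```python
# (references, works with accompanying references), non-Elsevier only
counts = {
    "all works": (732_513_350, 27_609_963),
    "journal articles": (650_093_489, 23_459_805),
    "other works": (82_419_861, 4_150_158),
}


def average_references(references: int, works: int) -> float:
    """Mean number of references per work, rounded to two decimal places."""
    return round(references / works, 2)


for label, (references, works) in counts.items():
    print(f"{label}: {average_references(references, works)}")
# prints 26.53, 27.71 and 19.86 respectively
```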

Proportion of non-Elsevier works that have open references

Of the 27,598,963 non-Elsevier works of all types documented in Crossref that have accompanying references, 18,228,221 (66.05%) have open references.

Of the 23,459,805 non-Elsevier journal articles documented in Crossref that have accompanying references, 17,072,801 (72.77%) have open references.

Of the 4,139,158 non-Elsevier works documented in Crossref that are not journal articles and that have accompanying references, 1,155,420 (27.91%) have open references.

Proportion of non-Elsevier references that are open

Of the 732,513,350 references stored in Crossref from all works published by publishers other than Elsevier, 523,186,205 (71.42%) are open, and 209,327,145 (28.58%) are not open.

Of the 650,093,489 references stored in Crossref from journal articles published by publishers other than Elsevier, 486,041,671 (74.76%) are open, and 164,051,818 (25.24%) are not open.

Of the 82,419,861 references stored in Crossref from works published by publishers other than Elsevier that are not journal articles, 37,144,534 (45.07%) are open, and 45,275,327 (54.93%) are not open.
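The three breakdowns above can be cross-checked in the same way: in each category the open and non-open counts should sum to the total, and the journal and non-journal rows should sum to the all-works row. The sketch below, with illustrative names and the figures copied from this post, performs those checks and recomputes the open percentages.

```python
# (open, not open) reference counts for non-Elsevier publishers
references = {
    "all works": (523_186_205, 209_327_145),
    "journal articles": (486_041_671, 164_051_818),
    "other works": (37_144_534, 45_275_327),
}


def percent_open(open_count: int, closed_count: int) -> float:
    """Open references as a percentage of all references in the category."""
    return round(100 * open_count / (open_count + closed_count), 2)


# The journal and non-journal rows should add up to the all-works row.
for column in (0, 1):
    parts = references["journal articles"][column] + references["other works"][column]
    assert parts == references["all works"][column]

for label, (open_count, closed_count) in references.items():
    total = open_count + closed_count
    print(f"{label}: {total:,} references, {percent_open(open_count, closed_count)}% open")
```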

Proportion of references which are not open that are published by publishers other than Elsevier

Of the 551,932,682 references from all works stored at Crossref that are not open, 209,327,145 (37.93%) are from works published by publishers other than Elsevier.

Of the 470,008,522 references from journal articles stored at Crossref that are not open, 164,051,818 (34.90%) are from journal articles published by publishers other than Elsevier.

Of the 81,924,160 references from works that are not journal articles stored at Crossref that are not open, 45,275,327 (55.26%) are from works published by publishers other than Elsevier.

 

 Details for all publishers combined, and for Elsevier separately, are given in the two previous posts.

 

Reference

Ecer, D. (2017). Crossref Data Notebook. Available at https://elifesci.org/crossref-data-notebook

 

Posted in open access, Open Citations, Open scholarship