COCI has surpassed 700M citations

We are excited to share that COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations, was on 12 May 2020 extended with more than 47 million additional citations, and has reached a total number of more than 702 million DOI-to-DOI citation links between more than 58 million bibliographic entities.

The citations added in this fifth release of COCI came from the most recent Crossref dump, downloaded on 22 April 2020, which includes the references of the articles deposited in Crossref between 4 October 2019 and 4 April 2020. Such updates to COCI will now occur regularly at bimonthly intervals.

As a consequence, COCI now contains 702,772,530 citations, and also includes publications about the COVID-19 pandemic. We will use this new release of COCI to update the Coronavirus Open Citations Dataset, the second release of which will include details about these additional references and publications.

COCI, which is fully described in our open-access article, was one of the subjects of a multidisciplinary comparison between the major citation indexes recently published on arXiv. In addition, it has been recently mentioned on the Scholix web site as one of the implementors of the Scholix citation data format.

Finally, we wish to remind you that all the bibliographic and citation data in COCI:

Additional 31 million citations in COCI

We are proud to announce that COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations, has just been extended with more than 31 million additional citations.

As introduced in an earlier blog post and an open-access article recently published in Scientometrics, COCI is our first OpenCitations Index of open citations. In COCI, we have applied the concept of citations as first-class data entities, each identified using a unique persistent Open Citation Identifier (OCI). COCI indexes the contents of one of the major databases of open scholarly citation information, namely Crossref, and renders and makes available this information in machine-readable RDF and in other formats.

The fourth release of COCI contains more than 655 million DOI-to-DOI citation links between more than 55 million bibliographic entities. The additional 31 million citations added in the new release come from the reprocessing of previous dumps of Crossref data. In particular, we retrieved all the citations that involve references in citing articles that were in the Crossref ‘Limited’ set when we downloaded it in October 2018. Such citing articles currently appear in the Crossref ‘Closed’ dataset due to more recent restrictive policy decisions taken by their publishers.

Finally, we wish to remind you that all the bibliographic and citation data in COCI:

The French National Fund for Open Science supports OpenCitations

The French National Fund for Open Science (FNSO) has decided to support OpenCitations, PKP, and DOAB as part of SCOSS, the Global Sustainability Coalition for Open Science Services.

FNSO has identified OpenCitations as an infrastructure disseminating bibliographic and citation metadata in open access with a level of quality and coverage that provides a workable, free and open alternative to the academic community’s current dependency on proprietary tools, therefore freeing up possibilities for citation analysis, promoting the evolution of bibliometric indicators and broadening knowledge of science.

The FNSO is contributing €250,000, which is 16.3% of the amount that was requested under SCOSS, and is committing to a political and technical partnership with OpenCitations.

OpenCitations is deeply honoured and delighted that the French Open Science Committee has chosen to award such a substantial portion of its open science budget to support our work. These funds will be spent (a) on strengthening our computational infrastructure, (b) on employing software engineers to develop new data sources and services, and data curators to ensure the highest possible quality of our data, and (c) on community engagement through workshops and publications.

OpenCitations described

OpenCitations is an infrastructure organization for open scholarship dedicated to the publication of open bibliographic and citation data. We at OpenCitations are proud to announce the publication, in the first issue of Quantitative Science Studies, of a canonical paper in which we introduce and describe OpenCitations and outline its achievements and goals [1].

Here, I outline the contents of our paper, and provide definitive links on the topics described. Many of these topics have been the subjects of earlier blog posts.

This paper appears in the first Special Issue of QSS, which is dedicated to the description of the bibliometric data sources that lie at the heart of scientometric research. The special issue aims to characterize the most important data sources currently available and to show how they differ in various dimensions, for instance in the data they provide, their level of openness, and their support for making research reproducible. The first three papers in this special issue cover the most important commercial bibliographic data sources: Web of Science (Clarivate Analytics), Scopus (Elsevier), and Dimensions (Digital Science), while the remaining three articles describe open data sources: Microsoft Academic, Crossref and OpenCitations.

In the introduction to our own paper, we describe the origins of OpenCitations, discuss the growth and benefits of open science, and introduce the Semantic Web techniques used at OpenCitations for recording and publishing our data. We then go on to describe OpenCitations’ services and data, namely Open Citation Identifiers, the OpenCitations Data Model, the SPAR (Semantic Publishing and Referencing) Ontologies, the OpenCitations Corpus, and the OpenCitations Indexes of citation data, of which the first and largest is COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations, which currently holds information on over 624 million citations. We conclude our survey of OpenCitations’ services and data by outlining the generic open source software developed at OpenCitations, including OSCAR, the OpenCitations RDF Search Application for searching over RDF datasets, LUCINDA, OSCAR’s associated OpenCitations RDF Resource Browser, and RAMOSE, OpenCitations’ application for creating REST APIs over SPARQL endpoints, thus opening Semantic Web datasets to those not familiar with SPARQL, the RDF query language.

In the second half of the paper, we describe OpenCitations as an organization in terms of its compliance with the principles for the sustainability of open infrastructures proposed by Bilder, Lin and Neylon (2015) [2], and report the selection of OpenCitations by the Global Sustainability Coalition for Open Science Services (SCOSS) as an open infrastructure organization worthy of crowd-funding support by the stakeholder community. We then provide usage statistics for our datasets and web site, and describe the adoption of OpenCitations data and services by the community, before concluding with a forward look at our proposed developments of OpenCitations activities.


The first issue of Quantitative Science Studies

The memorable date 20/02/2020 saw the publication by MIT Press of the first issue of Volume One of a new journal, Quantitative Science Studies (QSS), the official open access journal of the International Society for Scientometrics and Informetrics (ISSI). QSS’s Editor in Chief is Ludo Waltman (CWTS, University of Leiden, Netherlands); Vincent Larivière (Université de Montréal, Montreal, Quebec, Canada) and Staša Milojević (Indiana University Bloomington, Bloomington, Indiana, USA) are its Associate Editors; and it has a large and distinguished editorial board.

What makes the launch of this new journal remarkable is the story of how it came into being. In 2019, the entire editorial team of the Journal of Informetrics (JOI), a leading journal in this field published by Elsevier, resigned en masse and decided to start an alternative journal, QSS, both because of Elsevier’s position on open citations, and because, in their opinion, the financial model used by Elsevier violates the scientific ethos.

Reproducibility in the field of scientometrics requires scientific metadata that are both of high quality and open, particularly those relating to bibliographic citations. The JOI editorial board was deeply concerned by the refusal of Elsevier to join almost all other large scholarly publishers in supporting the Initiative for Open Citations (I4OC). As we have previously reported on this blog, Elsevier is the largest contributor of bibliographic references to Crossref, but insists that these data should be kept closed.

Elsevier’s position, driven by commercial interests (since it sells access to citation data through Scopus), flies in the face of the scientific community’s clear move towards open science, with hundreds of scientometricians having signed an ISSI open letter urging scholarly publishers to support I4OC.

Science is a self-governing system, and the editorial team held the view that the ultimate responsibility for a scholarly journal should rest with the scientific community, who serve as the gatekeepers, producers, and consumers of scientific content.

The editorial team also believed Elsevier’s subscription fees to be excessive, and its article processing charges (APCs) for open access publishing to be unfairly high, thus limiting both those who can afford to read Elsevier journals and those who can afford to publish in them, so that publishing with Elsevier inevitably places major limits on scholarship, harming both science and society. It was for all these reasons that they forsook JOI and started QSS.

We at OpenCitations congratulate the editorial team for their courage in deciding to make this journal flip, and wish them, together with the ISSI and MIT Press, every success for this important new journal. We also commend the Technische Informationsbibliothek (TIB) – Leibniz Information Centre for Science and Technology and the Communication, Information, Media Centre (KIM) of the University of Konstanz, who, in collaboration with the Fair Open Access Alliance (FOAA), have generously agreed to cover APCs for the first three years of the QSS journal.

Introducing InTRePIDs – In-Text Reference Pointer Identifiers


Readers of this blog will be familiar with Open Citation Identifiers (OCIs), described in an earlier post and formally defined in [1]. OCIs enable bibliographic citations, treated as first class information entities, to be uniquely identified and referenced, and are used to identify the >624 million individual citations indexed in the latest release of COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations, as described in a recent post.

However, COCI and similar citation indexes do not provide any information about where within the citing paper a citation is generated, the textual contexts of the in-text reference pointers, or the reasons for including different in-text reference pointers denoting the same reference at different points within the text.

As explained in the preceding post describing the Open Biomedical Citations in Context Corpus funded by the Wellcome Trust and under development by OpenCitations, deep citation analysis requires a more nuanced approach to citations, which acknowledges that each in-text reference pointer that denotes a bibliographic reference in the reference list of a citing publication instantiates its own citation, as shown in Figure 1.

Figure 1. Citations between a citing paper and a cited paper instantiated both by the inclusion of a bibliographic reference within the reference list of the citing paper and by the inclusion within the text of the citing paper of one or more in-text reference pointers denoting that reference.

The pointer citations clearly involve the same cited publication as does the reference citation itself, but each has its own unique characteristics: the location and textual context of its in-text reference pointer within the text of the citing publication, and its particular rhetorical function which is determined by that context.

If the reference citation is open (as defined in [2]) and identified by an OCI, each in-text reference pointer related to that citation can be identified uniquely using an In-Text Reference Pointer Identifier (InTRePID).

InTRePIDs facilitate in-depth scholarship on in-text reference pointer locations and citation functions, and fine-grained analysis of the relationships between publications, by making it possible

  • to identify each in-text reference pointer with a unique PID,
  • to distinguish references that are cited only once from those that are cited multiple times,
  • to see which references are cited together (e.g. in the same sentence or within an in-text reference pointer list),
  • to determine from which section(s) of the article references are cited (e.g. Introduction, Methods, Discussion), and, potentially,
  • to determine the rhetorical function of the citations from analysis of their textual contexts, by the application of natural language processing, machine learning and artificial intelligence techniques to conduct sentiment analysis on the citation contexts.
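
As a rough illustration of the first four kinds of analysis listed above, the following Python sketch works over a handful of made-up pointer records. The Pointer fields, the reference labels R1 to R3 and the sentence numbers are all hypothetical: they are not taken from any OpenCitations dataset, nor do they reflect the OpenCitations Data Model.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Pointer(NamedTuple):
    """One in-text reference pointer (hypothetical record layout)."""
    reference: str  # label of the bibliographic reference it denotes
    section: str    # section of the citing article in which it occurs
    sentence: int   # index of the embedding sentence


# Illustrative data only, invented for this sketch.
POINTERS = [
    Pointer("R1", "Introduction", 3),
    Pointer("R2", "Introduction", 3),
    Pointer("R1", "Methods", 17),
    Pointer("R3", "Discussion", 41),
    Pointer("R1", "Discussion", 41),
]


def split_by_frequency(pointers: List[Pointer]) -> Tuple[List[str], List[str]]:
    """Return (references cited once, references cited several times)."""
    counts = Counter(p.reference for p in pointers)
    once = sorted(ref for ref, n in counts.items() if n == 1)
    multiple = sorted(ref for ref, n in counts.items() if n > 1)
    return once, multiple


def cited_together(pointers: List[Pointer]) -> List[List[str]]:
    """Return groups of references whose pointers share a sentence."""
    by_sentence = defaultdict(set)
    for p in pointers:
        by_sentence[p.sentence].add(p.reference)
    return [sorted(refs) for _, refs in sorted(by_sentence.items()) if len(refs) > 1]


def sections_by_reference(pointers: List[Pointer]) -> Dict[str, List[str]]:
    """Return, for each reference, the sections it is cited from, in text order."""
    sections = defaultdict(list)
    for p in pointers:
        if p.section not in sections[p.reference]:
            sections[p.reference].append(p.section)
    return dict(sections)


print(split_by_frequency(POINTERS))
print(cited_together(POINTERS))
print(sections_by_reference(POINTERS))
```

Determining rhetorical function, the last item in the list, needs the text of the embedding sentences and a language-processing step, so it is left out of this sketch.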

Definition of an InTRePID

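An InTRePID has the prefix intrepid: followed by two parts separated by an oblique stroke, the second part being itself a hyphenated pair:

intrepid:<oci-numerals>/<ordinal>-<total>

where:
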
  • <oci-numerals> is the numerical part of the OCI uniquely identifying the particular open citation to which the in-text reference pointer and its denoted bibliographic reference relate. Thus an InTRePID can be assigned for any in-text reference pointer that relates to an open citation for which a valid OCI has been assigned;
  • <ordinal> identifies the nth occurrence of an in-text reference pointer within the text of the citing paper relating to that citation; and
  • <total> defines the total number of in-text reference pointers denoting that bibliographic reference within the citing paper.

For example, intrepid:070433-070475/4-6 is a valid InTRePID for an in-text reference pointer defined within the OpenCitations Citations in Context Corpus.

A formal definition document for the InTRePID is given in [3].
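
To make the structure concrete, here is a minimal Python sketch that splits an InTRePID string into its three components. It assumes the syntax intrepid:<oci-numerals>/<ordinal>-<total>, as seen in the example pointers in this post, with <oci-numerals> being two digit strings joined by a hyphen. The names parse_intrepid and Intrepid are hypothetical, the check that the ordinal lies between 1 and the total is our own reading of the component descriptions, and the formal definition in [3] remains the authority.

```python
import re
from typing import NamedTuple

# Assumed syntax, inferred from the examples in this post:
#   intrepid:<oci-numerals>/<ordinal>-<total>
INTREPID_PATTERN = re.compile(r"^intrepid:(\d+-\d+)/(\d+)-(\d+)$")


class Intrepid(NamedTuple):
    """Components of an InTRePID (hypothetical helper type)."""
    oci_numerals: str  # numerical part of the OCI of the related citation
    ordinal: int       # nth occurrence of the pointer in the citing paper
    total: int         # total number of pointers denoting that reference


def parse_intrepid(value: str) -> Intrepid:
    """Split an InTRePID string into its parts, or raise ValueError."""
    match = INTREPID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a well-formed InTRePID: {value!r}")
    ordinal = int(match.group(2))
    total = int(match.group(3))
    # Assumption: the ordinal counts from 1 and cannot exceed the total.
    if not 1 <= ordinal <= total:
        raise ValueError(f"ordinal {ordinal} is outside the range 1..{total}")
    return Intrepid(match.group(1), ordinal, total)


parsed = parse_intrepid("intrepid:070433-070475/4-6")
print(parsed.oci_numerals, parsed.ordinal, parsed.total)
```

Keeping the OCI numerals as a string rather than converting them to integers preserves any leading zeros, which are significant in the identifier.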

Exemplar in-text reference pointers

Consider the following citing paper:

Zou, J. et al. (2020). Phenotypic and genotypic correlates of penicillin susceptibility in nontoxigenic Corynebacterium diphtheriae, British Columbia, Canada, 2015–2018. Emerging Infectious Diseases, 26: 97-103.

This paper contains six in-text reference pointers denoting Reference 13 in the reference list:

13. Lowe, C. et al. (2011). Cutaneous diphtheria in the urban poor population of Vancouver, British Columbia, Canada: a 10-year review. J. Clinical Microbiology 49: 2664-2666.

The InTRePIDs for these pointers are recorded within the OpenCitations Biomedical Citations in Context Corpus, together with the corpus identifiers and DOIs of the citing and cited papers, as shown in the excerpt presented in Figure 2.

Figure 2. An excerpt from the OpenCitations Biomedical Citations in Context Corpus, showing highlighted the InTRePIDs for the six in-text reference pointers within Zou, J. et al. (2020) denoting Reference 13, the reference to Lowe, C. et al. (2011), together with the internal corpus identifiers for each in-text reference pointer, and the corpus identifiers and DOIs for the citing and cited papers.

These six in-text reference pointers have the InTRePIDs intrepid:070433-070475/1-6 to intrepid:070433-070475/6-6. The first and the fourth are given below as examples, each with its document location, its embedding sentence (including the in-text reference pointer list) and its InTRePID:

Introduction. “Nontoxigenic strains have been shown to have epidemic potential, causing infections in persons afflicted by homelessness, alcohol abuse, and injection drug use (9,13–15).” (intrepid:070433-070475/1-6)

Discussion. “We also noted ST5 and ST32 in our review from downtown Vancouver during 1998–2007 (13).” (intrepid:070433-070475/4-6)

The first of these discusses those people most susceptible to diphtheria infection, while the other discusses which multilocus sequence types (STs) of C. diphtheriae were found, thus relating to the organism causing the infection rather than to the infected individuals. The rhetorical function of these two in-text reference pointers is quite distinct.
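
Since all the pointers for one citation share the same OCI numerals and the same total, the full set of InTRePIDs for this example can be enumerated mechanically. The short Python sketch below does so, again assuming the syntax intrepid:<oci-numerals>/<ordinal>-<total>. The function name intrepids_for_citation is hypothetical and is not part of any OpenCitations software.

```python
from typing import List


def intrepids_for_citation(oci_numerals: str, total: int) -> List[str]:
    """List the InTRePIDs of all pointers instantiating one open citation.

    Hypothetical helper: `oci_numerals` is the numerical part of the OCI,
    and `total` is the number of in-text reference pointers that denote
    the cited reference within the citing paper.
    """
    if total < 1:
        raise ValueError("a citation needs at least one in-text reference pointer")
    return [
        f"intrepid:{oci_numerals}/{ordinal}-{total}"
        for ordinal in range(1, total + 1)
    ]


pointers = intrepids_for_citation("070433-070475", 6)
print(pointers[0])   # intrepid:070433-070475/1-6
print(pointers[3])   # intrepid:070433-070475/4-6
print(pointers[-1])  # intrepid:070433-070475/6-6
```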

To permit this information to be recorded within the OpenCitations Citations in Context Corpus, extensions were required to the OpenCitations Data Model, a new extended version of which was recently published [4], as described in a related blog post.

The OpenCitations InTRePID Resolution Service

To support the use of InTRePIDs to identify in-text reference pointers, OpenCitations has recently developed an InTRePID Resolution Service (currently in ‘beta’ in its development cycle). A screenshot of this service is shown in Figure 3.

Figure 3. A screenshot of the user interface of the InTRePID Resolution Service.

In addition to using the Web user interface shown in Figure 3, InTRePIDs can be entered into this resolution service in the form of resolvable URIs, e.g.

As shown in Figure 4, the OpenCitations InTRePID Resolution service returns metadata concerning the in-text reference pointer identified by the InTRePID, and the bibliographic reference that it denotes, from which further information about the citation and the citing and cited publications may be obtained by following the links provided.

Figure 4. A screenshot of the Web page displaying metadata returned by the InTRePID Resolution Service.

Note that as well as rendering this information in HTML on a web page, the resolution service can also provide it in a variety of machine-readable formats.


InTRePIDs, which enable the identification of individual in-text reference pointers, and the InTRePID Resolution Service, are new services from OpenCitations that will facilitate scholarship on the textual contexts and rhetorical functions of such in-text reference pointers, and of the citations that they instantiate.

InTRePIDs were first announced on 30th January 2020 at PIDapalooza 2020, the Open Festival of Persistent Identifiers, held in Lisbon.


