Introducing InTRePIDs – In-Text Reference Pointer Identifiers

Rationale

Readers of this blog will be familiar with Open Citation Identifiers (OCIs), described in an earlier post and formally defined in [1]. OCIs enable bibliographic citations, treated as first-class information entities, to be uniquely identified and referenced. They are used to identify the more than 624 million individual citations indexed in the latest release of COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations, as described in a recent post.

However, COCI and similar citation indexes do not provide any information about where within the citing paper a citation is generated, the textual contexts of the in-text reference pointers, or the reasons for including different in-text reference pointers denoting the same reference at different points within the text.

As explained in the preceding post describing the Open Biomedical Citations in Context Corpus funded by the Wellcome Trust and under development by OpenCitations, deep citation analysis requires a more nuanced approach to citations, which acknowledges that each in-text reference pointer that denotes a bibliographic reference in the reference list of a citing publication instantiates its own citation, as shown in Figure 1.

Figure 1. Citations between a citing paper and a cited paper instantiated both by the inclusion of a bibliographic reference within the reference list of the citing paper and by the inclusion within the text of the citing paper of one or more in-text reference pointers denoting that reference.

The pointer citations clearly involve the same cited publication as does the reference citation itself, but each has its own unique characteristics: the location and textual context of its in-text reference pointer within the text of the citing publication, and its particular rhetorical function which is determined by that context.

If the reference citation is open (as defined in [2]) and identified by an OCI, each in-text reference pointer related to that citation can be identified uniquely using an In-Text Reference Pointer Identifier (InTRePID).

InTRePIDs facilitate in-depth scholarship on in-text reference pointer locations and citation functions, and fine-grained analysis of the relationships between publications, by making it possible

  • to identify each in-text reference pointer with a unique PID,
  • to distinguish references that are cited only once from those that are cited multiple times,
  • to see which references are cited together (e.g. in the same sentence or within an in-text reference pointer list),
  • to determine from which section(s) of the article references are cited (e.g. Introduction, Methods, Discussion), and, potentially,
  • to determine the rhetorical function of the citations from analysis of their textual contexts, by the application of natural language processing, machine learning and artificial intelligence techniques to conduct sentiment analysis on the citation contexts.

Definition of an InTRePID

An InTRePID is composed of two parts separated by an oblique stroke, the second part being a pair of numbers joined by a hyphen:

intrepid:<oci-numerals>/<ordinal>-<total>

where

  • <oci-numerals> is the numerical part of the OCI uniquely identifying the particular open citation to which the in-text reference pointer and its denoted bibliographic reference relate. Thus an InTRePID can be assigned for any in-text reference pointer that relates to an open citation for which a valid OCI has been assigned;
  • <ordinal> identifies the nth occurrence of an in-text reference pointer within the text of the citing paper relating to that citation; and
  • <total> defines the total number of in-text reference pointers denoting that bibliographic reference within the citing paper.

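For example, intrepid:070433-070475/4-6 is a valid InTRePID for an in-text reference pointer defined within the OpenCitations Citations in Context Corpus. It identifies the fourth of six in-text reference pointers relating to the citation whose OCI numerals are 070433-070475.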

A formal definition document for the InTRePID is given in [3].
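
The structure described above can be checked mechanically. The following is a minimal sketch in Python, not an official OpenCitations tool: the regular expression, the InTRePID class and the parse_intrepid function are illustrative assumptions, based only on the syntax and the hyphenated examples (such as /4-6) given in this post. The formal definition in [3] remains the authority on what counts as valid.

```python
import re
from typing import NamedTuple

# Assumed pattern: the "intrepid:" prefix, the OCI numerals (two runs of
# digits joined by a hyphen), an oblique stroke, then <ordinal>-<total>.
_INTREPID_RE = re.compile(r"^intrepid:(\d+-\d+)/(\d+)-(\d+)$")


class InTRePID(NamedTuple):
    """Hypothetical container for the three components of an InTRePID."""
    oci_numerals: str  # kept as a string so that leading zeros survive
    ordinal: int       # position of this pointer among those for the citation
    total: int         # number of pointers denoting the same reference


def parse_intrepid(value: str) -> InTRePID:
    """Split an InTRePID string into its components, or raise ValueError."""
    match = _INTREPID_RE.match(value)
    if match is None:
        raise ValueError(f"not a well-formed InTRePID: {value!r}")
    ordinal, total = int(match.group(2)), int(match.group(3))
    # The nth occurrence cannot exceed the total number of occurrences.
    if not 1 <= ordinal <= total:
        raise ValueError(f"ordinal {ordinal} is outside the range 1..{total}")
    return InTRePID(match.group(1), ordinal, total)


print(parse_intrepid("intrepid:070433-070475/4-6"))
```

Keeping the OCI numerals as a string rather than converting them to integers is deliberate, since the leading zeros are part of the identifier.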

Exemplar in-text reference pointers

Consider the following citing paper:

Zou, J. et al. (2020). Phenotypic and genotypic correlates of penicillin susceptibility in nontoxigenic Corynebacterium diphtheriae, British Columbia, Canada, 2015–2018. Emerging Infectious Diseases, 26: 97-103. https://doi.org/10.3201/eid2601.191241

This paper contains six in-text reference pointers denoting Reference 13 in the reference list:

13. Lowe, C. et al. (2011). Cutaneous diphtheria in the urban poor population of Vancouver, British Columbia, Canada: a 10-year review. J. Clinical Microbiology 49: 2664-2666. https://doi.org/10.1128/JCM.00362-11

The InTRePIDs for these pointers are recorded within the OpenCitations Biomedical Citations in Context Corpus, together with the corpus identifiers and DOIs of the citing and cited papers, as shown in the excerpt presented in Figure 2.

Figure 2. An excerpt from the OpenCitations Biomedical Citations in Context Corpus, with highlighting on the InTRePIDs for the six in-text reference pointers within Zou, J. et al. (2020) that denote Reference 13, the reference to Lowe, C. et al. (2011). Also shown are the internal corpus identifiers for each in-text reference pointer, and the corpus identifiers and DOIs for the citing and cited papers.

These six in-text reference pointers have the InTRePIDs intrepid:070433-070475/1-6 to intrepid:070433-070475/6-6. As examples, the first and the fourth are given here, each with its document location, its embedding sentence (including the in-text reference pointer list) and its InTRePID:

Introduction. “Nontoxigenic strains have been shown to have epidemic potential, causing infections in persons afflicted by homelessness, alcohol abuse, and injection drug use (9,13–15).” (intrepid:070433-070475/1-6)

Discussion. “We also noted ST5 and ST32 in our review from downtown Vancouver during 1998–2007 (13).” (intrepid:070433-070475/4-6)

The first of these concerns the people most susceptible to diphtheria infection, while the second concerns which multilocus sequence types (STs) of C. diphtheriae were found, and thus relates to the organism causing the infection rather than to the infected individuals. The rhetorical functions of these two in-text reference pointers are quite distinct.
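
Because the ordinal simply counts from one up to the total, the full set of InTRePIDs for a citation follows from its OCI numerals and the number of pointers. The sketch below illustrates this for the Zou et al. example; the function name intrepids_for_citation is a hypothetical helper, not part of any OpenCitations software.

```python
def intrepids_for_citation(oci_numerals: str, total: int) -> list[str]:
    """List the InTRePIDs of all pointers relating to one citation.

    Hypothetical helper: assumes ordinals run from 1 to `total` with no
    gaps, as in the six-pointer example discussed in this post.
    """
    if total < 1:
        raise ValueError("a citation needs at least one in-text reference pointer")
    return [f"intrepid:{oci_numerals}/{n}-{total}" for n in range(1, total + 1)]


# The six pointers in Zou et al. (2020) that denote Reference 13.
for identifier in intrepids_for_citation("070433-070475", 6):
    print(identifier)
```

A total of one marks a reference that is cited only once in the text, which is one of the distinctions listed in the Rationale.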

To permit this information to be recorded within the OpenCitations Citations in Context Corpus, extensions were required to the OpenCitations Data Model, a new extended version of which was recently published [4], as described in a related blog post.

The OpenCitations InTRePID Resolution Service

To support the use of InTRePIDs to identify in-text reference pointers, OpenCitations has recently developed an InTRePID Resolution Service (currently in ‘beta’ in its development cycle), which is running at http://opencitations.net/intrepid. A screenshot of this service is shown in Figure 3.

Figure 3. A screenshot of the user interface of the InTRePID Resolution Service.

In addition to using the Web user interface shown in Figure 3, InTRePIDs can be entered into this resolution service in the form of resolvable URIs, e.g.

http://opencitations.net/intrepid/070433-070475/4-6
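
Judging from this example, the resolvable URI is the service address followed by the part of the InTRePID after the intrepid: prefix. A minimal sketch of the conversion in both directions is given below. It builds strings only and makes no network request; the function names are illustrative assumptions, and the mapping is inferred from the single example above rather than from a published specification.

```python
# Base address of the resolution service, as given in this post.
RESOLVER_BASE = "http://opencitations.net/intrepid"
PREFIX = "intrepid:"


def intrepid_to_url(intrepid: str) -> str:
    """Turn 'intrepid:<body>' into a resolvable URI (assumed mapping)."""
    if not intrepid.startswith(PREFIX):
        raise ValueError(f"missing {PREFIX!r} prefix: {intrepid!r}")
    return f"{RESOLVER_BASE}/{intrepid[len(PREFIX):]}"


def url_to_intrepid(url: str) -> str:
    """Recover the InTRePID from a resolver URI built as above."""
    base = RESOLVER_BASE + "/"
    if not url.startswith(base):
        raise ValueError(f"not an InTRePID resolver URI: {url!r}")
    return PREFIX + url[len(base):]


print(intrepid_to_url("intrepid:070433-070475/4-6"))
```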

As shown in Figure 4, the OpenCitations InTRePID Resolution service returns metadata concerning the in-text reference pointer identified by the InTRePID, and the bibliographic reference that it denotes, from which further information about the citation and the citing and cited publications may be obtained by following the links provided.

Figure 4. A screenshot of the Web page displaying metadata returned by the InTRePID Resolution Service.

Note that as well as rendering this information in HTML on a web page, the resolution service can also provide it in a variety of machine-readable formats.

Conclusion

InTRePIDs, which enable the identification of individual in-text reference pointers, and the InTRePID Resolution Service, which resolves them, are new offerings from OpenCitations that will facilitate scholarship on the textual contexts and rhetorical functions of such in-text reference pointers, and of the citations that they instantiate.

InTRePIDs were first announced on 30th January 2020 in Lisbon at PIDapalooza 2020, the Open Festival of Persistent Identifiers.

References

[1] Silvio Peroni and David Shotton (2019). Open Citation Identifier: Definition. Figshare. https://doi.org/10.6084/m9.figshare.7127816.v2

[2] Silvio Peroni and David Shotton (2018). Open Citation: Definition. Figshare. https://doi.org/10.6084/m9.figshare.6683855

[3] David Shotton, Marilena Daquino and Silvio Peroni (2020). In-Text Reference Pointer Identifier: Definition. Figshare. https://doi.org/10.6084/m9.figshare.11674032

[4] Marilena Daquino, Silvio Peroni and David Shotton (2019). The OpenCitations Data Model. Version 2.0. Figshare. https://doi.org/10.6084/m9.figshare.3443876
