Citations as First-Class Data Entities: Open Citation Identifiers

Requirements for citations to be treated as First-Class Data Entities

In my introductory blog post, I listed five requirements for the treatment of citations as first-class data entities.  The fourth of these requirements is that they must be identifiable using a global persistent identifier scheme.

At the recent PIDapalooza Conference on persistent identifiers, held in Girona, Spain, I launched the Open Citation Identifier (abbreviated OCI, in line with DOI), the new persistent identifier for citations [1].

In this post, I describe the Open Citation Identifier scheme, created and operated by OpenCitations, which supports the assignment of Open Citation Identifiers not only to the citations present in the OpenCitations Corpus (OCC) but also to open citations present in other bibliographic databases.

Structure and syntax of the Open Citation Identifier

Each OCI has a simple structure: oci:number-number, where “oci:” is the identifier prefix.

OCIs for citations stored within the OpenCitations Corpus are constructed by combining the OpenCitations Corpus local identifiers for the citing and cited bibliographic resources, separating them with a dash.  (For the definition of OCC local identifiers, see the OpenCitations Data Model.)

For example, oci:2544384-7295288 is a valid OCI for the citation between two papers stored within the OpenCitations Corpus.  The first number is the OCC local identifier for the citing bibliographic resource [2], and the second is the OCC local identifier for the cited bibliographic resource [3]; these local identifiers are unique within the OCC.  [Note: Supplier prefixes are omitted from OCC local identifiers of bibliographic resources ingested into the OpenCitations Corpus prior to February 2018, but will be included within all OCC local identifiers of bibliographic resources ingested into the Corpus after that date.]

OCIs for external resources identified by numerical identifiers

OCIs can also be created for citations between bibliographic resources described in an external bibliographic database, if they are similarly identified there by identifiers having a unique numerical part.  For example, the OCI for the citation that exists between Wikidata resources Q27931310 (the citing resource, [4]) and Q22252312 (the cited resource, [5]) is oci:01027931310-01022252312, where “010” is the assigned OCC supplier prefix for Wikidata.

The OCC supplier prefix consists of a positive number (following the pattern “nnn”, where “nnn” is a string of numerals of variable length which includes no zeros), enclosed between two zeros (e.g. “0420”).  The list of all assigned OCC supplier prefixes is given at https://github.com/opencitations/oci/blob/master/suppliers.csv.
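The structure described so far can be sketched in Python. This is an illustration only, not the OpenCitations implementation: the names parse_local_id and parse_oci are hypothetical, and the regular expression is an assumption derived solely from the prefix pattern stated above. Identifiers without a supplier prefix (those ingested into the OCC before February 2018) are returned with a prefix of None.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Optional supplier prefix (a zero, one or more non-zero numerals, a zero),
# followed by the numerical body of the local identifier.
_LOCAL_ID = re.compile(r"(0[1-9]+0)?([0-9]+)")


class LocalId(NamedTuple):
    supplier_prefix: Optional[str]  # None for unprefixed OCC identifiers
    body: str


def parse_local_id(text: str) -> LocalId:
    """Split a local identifier into its supplier prefix and body."""
    match = _LOCAL_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a numerical local identifier: {text!r}")
    return LocalId(match.group(1), match.group(2))


def parse_oci(oci: str) -> Tuple[LocalId, LocalId]:
    """Return the (citing, cited) local identifiers of an OCI."""
    if not oci.startswith("oci:"):
        raise ValueError(f"missing 'oci:' prefix: {oci!r}")
    citing, dash, cited = oci[len("oci:"):].partition("-")
    if not dash:
        raise ValueError(f"missing dash between identifiers: {oci!r}")
    return parse_local_id(citing), parse_local_id(cited)


citing, cited = parse_oci("oci:01027931310-01022252312")
print(citing.supplier_prefix, citing.body)  # 010 27931310
print(cited.supplier_prefix, cited.body)    # 010 22252312
```

Because a supplier prefix contains no zeros between its enclosing zeros, the first zero after the leading one marks the end of the prefix, which is what lets the body be separated without a delimiter.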

OCIs for citations between resources identified by DOIs

OCIs can also be created for citations between bibliographic resources described in external bibliographic databases such as Crossref or DataCite, where they are identified by alphanumeric Digital Object Identifiers (DOIs) rather than purely numerical strings.

To achieve this, each case-insensitive DOI is first normalized to lower case letters. Then, after omitting the initial “doi:10.” prefix, the alphanumeric string of the DOI is converted reversibly to a pure numerical string using the simple two-numeral lookup table for numerals, lower case letters and other characters presented at https://github.com/opencitations/oci/blob/master/lookup.csv. For example, using this lookup table, “1” becomes “01”, “2” becomes “02”, “a” becomes “10”, “b” becomes “11”, and “/” becomes “36”.  The appropriate OCC supplier prefix is then prepended to the resulting number, to clearly identify its provenance.

A citation documented in Crossref exists between the two publications [3] and [6], which are there identified by the DOIs doi:10.1108/jd-12-2013-0166 and doi:10.1371/journal.pcbi.1000361.  We can thus create an OCI for this Crossref citation by using numerical representations of the two DOIs. These numerical representations are:

0200101000836191363010263020001036300010606

and

02001030701361924302723102137251211183701000000030601

where the initial “020” in each case is the assigned OCC supplier prefix for Crossref.

From these two numerical representations of DOIs, the OCI for the Crossref citation between these two papers is easily constructed, and is:

oci:0200101000836191363010263020001036300010606-02001030701361924302723102137251211183701000000030601

While this is long for an identifier, it should be remembered that it will be processed computationally, and is not intended for human readability.
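The conversion can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the OpenCitations code: the function names are hypothetical, and the lookup table is deliberately partial. It covers numerals and lower case letters, plus only the three other characters whose codes can be read off the worked example above (“/” is 36, “.” is 37, “-” is 63). The authoritative table is the lookup.csv file linked above.

```python
import string

# Partial lookup table (assumption: numerals 00-09 and letters 10-35,
# consistent with the examples in the text; other characters limited to
# those verifiable from the worked example).
LOOKUP = {ch: f"{i:02d}" for i, ch in enumerate(string.digits + string.ascii_lowercase)}
LOOKUP.update({"/": "36", ".": "37", "-": "63"})
REVERSE = {code: ch for ch, code in LOOKUP.items()}

CROSSREF_PREFIX = "020"  # assigned OCC supplier prefix for Crossref


def doi_to_numeric(doi: str, supplier_prefix: str = CROSSREF_PREFIX) -> str:
    """Convert a DOI to its supplier-prefixed numerical representation."""
    doi = doi.lower()
    if doi.startswith("doi:"):
        doi = doi[len("doi:"):]
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    body = doi[len("10."):]
    for ch in body:
        if ch not in LOOKUP:
            raise ValueError(f"character {ch!r} is not in this partial table")
    return supplier_prefix + "".join(LOOKUP[ch] for ch in body)


def numeric_to_doi(numeric: str, supplier_prefix: str = CROSSREF_PREFIX) -> str:
    """Reverse the conversion, two numerals at a time."""
    if not numeric.startswith(supplier_prefix):
        raise ValueError("unexpected supplier prefix")
    digits = numeric[len(supplier_prefix):]
    if len(digits) % 2:
        raise ValueError("numerical body must have an even number of numerals")
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return "doi:10." + "".join(REVERSE[pair] for pair in pairs)


def crossref_oci(citing_doi: str, cited_doi: str) -> str:
    """Build the Crossref OCI for a citation between two DOIs."""
    return f"oci:{doi_to_numeric(citing_doi)}-{doi_to_numeric(cited_doi)}"


print(crossref_oci("doi:10.1108/jd-12-2013-0166", "doi:10.1371/journal.pcbi.1000361"))
```

Run on the two DOIs above, this prints the same OCI as the one constructed by hand, and numeric_to_doi recovers the original lower-case DOI from each half.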

In this way, Crossref OCIs can be assigned to all ~350 million open references within Crossref in which the cited paper as well as the citing paper has a DOI [7].

OCIs for the same citation recorded within different databases

If a citation is recorded in more than one bibliographic database, a separate OCI can be created for each instance, each OCI having a distinct supplier prefix and being specific to that database.

Thus, in addition to the Crossref OCI created from DOIs and described above for the citation from [3] to [6], a Wikidata OCI exists for the same citation recorded within Wikidata, having the form oci:01024260641-01021092566.

Upon resolution of an OCI, the Open Citation Identifier Resolution Service will pull metadata only from the database specified by the supplier prefix of the OCI.  Details of the Open Citation Identifier Resolution Service are given in the next blog post.

It is important to note that an OCI can only be used to specify a citation between a citing and a cited publication which is actually recorded within a bibliographic database.  For this reason, the OCI “oci:7295288-3962641” shown below the second diagram in the introductory blog post to this series is presently invalid.  While the OpenCitations Corpus has metadata describing both bibliographic resources [3] and [6], it has not yet ingested the reference list for the first bibliographic resource [3] (which has the OCC local identifier 7295288), having information about it only from a reference within a third paper, with no information about the references [3] itself contains.  As a result, at present OCC has no record that a citation actually exists between [3] and the second bibliographic resource [6] (which has the OCC local identifier 3962641).

Representing OCIs in RDF

To permit the description of OCIs in RDF, “oci” has been added as a new member of the class datacite:ResourceIdentifierScheme within the DataCite Ontology.

The resolvable URL for any citation identified by an OCI has the form “https://w3id.org/oc/virtual/ci/nnn-mmm”, where nnn-mmm represents the OCI with its “oci:” prefix removed. Currently, we are able to return the RDF description of all the citations contained in the OpenCitations Corpus and Wikidata. We are working to extend the coverage so as to include other datasets, e.g. Crossref.
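As a small sketch of that mapping (the function names are hypothetical; only the URL pattern comes from the text above), an OCI and its resolvable URL can be converted into one another like this:

```python
VIRTUAL_CITATION_BASE = "https://w3id.org/oc/virtual/ci/"
OCI_PREFIX = "oci:"


def oci_to_url(oci: str) -> str:
    """Drop the 'oci:' prefix and append the rest to the base URL."""
    if not oci.startswith(OCI_PREFIX):
        raise ValueError(f"missing 'oci:' prefix: {oci!r}")
    return VIRTUAL_CITATION_BASE + oci[len(OCI_PREFIX):]


def url_to_oci(url: str) -> str:
    """Recover the OCI from a resolvable citation URL."""
    if not url.startswith(VIRTUAL_CITATION_BASE):
        raise ValueError(f"not a virtual citation URL: {url!r}")
    return OCI_PREFIX + url[len(VIRTUAL_CITATION_BASE):]


print(oci_to_url("oci:2544384-7295288"))
# https://w3id.org/oc/virtual/ci/2544384-7295288
```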

References

[1]     David Shotton (2018). Citations as first-class data entities. Open Citation Identifiers.  Conference presentation. PIDapalooza 2018, Girona, 23-24 January 2018. https://doi.org/10.6084/m9.figshare.5844972

[2]     Armen Yuri Gasparyan, Marlen Yessirkepov et al. (2015). Preserving the integrity of citations and references by all stakeholders of science communication.  J. Korean Med. Sci. 30:1545-1552. (English.)  https://doi.org/10.3346/jkms.2015.30.11.1545

[3]     Silvio Peroni, Alexander Dutton, Tanya Gray and David Shotton (2015). Setting our bibliographic references free: towards open citation data. Journal of Documentation, 71 (2): 253-277.  https://doi.org/10.1108/jd-12-2013-0166

[4]     Daniel K. Bricker, Eric B. Taylor et al. (2012). A Mitochondrial Pyruvate Carrier Required for Pyruvate Uptake in Yeast, Drosophila, and Humans. Science 337: 96-100.  https://doi.org/10.1126/science.1218099

[5]     Douglas Hanahan and Robert A. Weinberg (2011). Hallmarks of cancer: the next generation.  Cell 144: 646–674.  https://doi.org/10.1016/j.cell.2011.02.013

[6]     David Shotton, Katie Portwin, Graham Klyne and Alistair Miles (2009).  Adventures in semantic publishing: exemplar semantic enhancement of a research article. PLoS Computational Biology 5: e1000361.  https://doi.org/10.1371/journal.pcbi.1000361

[7]     Daniel Ecer (2017). Crossref Data Notebook (updated). Available at https://elifesci.org/crossref-data-notebook

Citations as First-Class Data Entities: The OpenCitations Corpus

Requirements for citations to be treated as First-Class Data Entities

In my introductory blog post, I listed five requirements for the treatment of citations as first-class data entities.  The third of these requirements is that they must be storable, searchable and retrievable in an open database designed for bibliographic citations.

In this post, I describe the current status of the OpenCitations Corpus, a well-structured open database specifically developed by OpenCitations and designed to store information about bibliographic citations as Linked Open Data, encoded in RDF (specifically JSON-LD).

What is OpenCitations?

OpenCitations (http://opencitations.net) is a scholarly infrastructure organization that has created and is currently expanding the coverage of the OpenCitations Corpus (OCC), an open repository of scholarly citation data made available under a Creative Commons CC0 public domain dedication, which provides in RDF accurate citation information (bibliographic references) harvested from the scholarly literature.

The Co-Directors of OpenCitations are David Shotton, Oxford e-Research Centre, University of Oxford (david.shotton@opencitations.net) and Silvio Peroni, Department of Computer Science and Engineering, University of Bologna (silvio.peroni@opencitations.net).

We are committed to open scholarship, open data, open access publication, and open source software.  We espouse the FAIR data principles developed by Force11, of which David Shotton was a founding member, and the aim of the Initiative for OpenCitations (I4OC), of which David Shotton and Silvio Peroni were both founding members, to promote the availability of citation data that is structured, separable, and open.

The principal activity of OpenCitations to date has been the establishment and population of the OpenCitations Corpus.

Holdings of the OpenCitations Corpus

We have so far concentrated on ingesting into the OpenCitations Corpus bibliographic references from open access papers available at PubMed Central, the encoding of these data in RDF, and high-quality curation of the citation links they represent, involving metadata enrichment from the Crossref API and (for authors) the ORCID API.

To date (19th February 2018), the OCC has ingested the references from 302,758 citing bibliographic resources, and contains information about 12,830,347 citation links to 6,549,665 cited resources. Plans to expand the coverage of the OCC are outlined below.

User interfaces

The information within the OCC can be accessed via OSCAR, our new generic OpenCitations RDF Search Application (http://opencitations.net/search) [1], which can be used for textual searches over any triplestore presenting a SPARQL endpoint.  Users can employ OSCAR to search the OCC for publication titles, author names, publication years, and identifiers (DOIs, PubMed IDs, PubMed Central IDs, ORCIDs, and OCC corpus identifiers). Such a search returns details of all bibliographic resources within the OCC matching the search term, from which their references can be obtained, if known. In the near future, we will complement OSCAR with a browse interface named LUCINDA.

We also provide a SPARQL endpoint for directly querying the Blazegraph triplestore in which we store the OCC RDF, and we plan in the near future to supplement such programmatic access with a REST API.  In addition, the contents of the entire triplestore, and of the various sub-databases within the Corpus, together with their provenance information, are downloadable from Figshare as monthly dumps.  Once the REST API has been developed, we will turn our attention to developing user interfaces for the interactive visualization of citation graphs.

The OpenCitations Data Model

As described in the previous blog post, we have just completed a comprehensive revision of the OpenCitations Data Model (OCDM, available at https://doi.org/10.6084/m9.figshare.3443876), which we use to capture descriptions of all aspects of the OCC citations and their provenance. This model makes extensive use of our SPAR (Semantic Publishing and Referencing) Ontologies (http://www.sparontologies.net/), which we developed to describe all aspects of the scholarly publishing domain in RDF.

The OpenCitations Data Model is freely available for third parties to use when recording their own bibliographic and citation information in RDF, with the advantage that data so modelled will be immediately compatible with those within the OpenCitations Corpus, which can act as a publishing venue for such third-party data.

Future ingest rate and data sources

Since July 2016, the instantiation of the OpenCitations Corpus currently running at the University of Bologna has been ingesting reference lists from biomedical journal articles at the relatively slow rate of about 200,000 citing bibliographic resources per year. During February 2018, ingestion into the Corpus is suspended, while we move the system to a completely new and more powerful server, supplemented by thirty Raspberry Pi ingest engines that will work in parallel feeding ingested data to the server.

This will increase our ingestion rate ~30-fold to about six million citing bibliographic resources per year, equivalent to ~240 million citations per year at 40 references per paper (the current OCC value is 42.4 references per paper).  We should then be able to complete ingestion of the ~1.4 million remaining OA resources at PubMed Central within about three months.

At that stage, we plan to start ingesting references from the ~17 million journal articles whose deposited references are now open at Crossref as a consequence of the Initiative for Open Citations.  The scholarly world currently publishes about 2.5 million new journal articles each year, of which about half will probably be open at Crossref (assuming Elsevier has not by then opened its references).  So, by the end of 2020, Crossref will have ~650 million open references.  In addition to ingesting new open Crossref references as they are made available, we will be able to eat into the backlog of existing Crossref open references at a catch-up rate of ~190 million per year.  By the end of 2020, we anticipate that the OCC should contain ~650 million citations harvested from PMC and Crossref, roughly half the coverage of Web of Science.  We are currently also considering ingest of references from other major bibliographic databases.
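The arithmetic behind these projections can be checked with a few lines of Python. The input figures are those quoted in this post; the variable names are illustrative, and the derivation of the catch-up rate (total capacity minus newly opened references, assuming 40 references per new article) is an assumption that happens to reproduce the ~190 million figure.

```python
# Figures quoted in the text.
current_rate = 200_000             # citing resources ingested per year
speedup = 30                       # expected increase in ingestion rate
refs_per_paper = 40                # working assumption used in the text
remaining_pmc = 1_400_000          # OA resources at PubMed Central still to ingest
new_articles_per_year = 2_500_000  # new journal articles published per year
open_fraction = 0.5                # share expected to have open references

new_rate = current_rate * speedup                    # citing resources per year
citations_per_year = new_rate * refs_per_paper       # citations per year
months_for_pmc = remaining_pmc / new_rate * 12       # months to finish PMC

# Assumed derivation of the catch-up rate: capacity left over after
# keeping pace with newly opened Crossref references.
new_open_refs_per_year = new_articles_per_year * open_fraction * refs_per_paper
catch_up_rate = citations_per_year - new_open_refs_per_year

# Current OCC average, from the holdings quoted earlier in this post.
current_refs_per_paper = 12_830_347 / 302_758

print(f"{new_rate:,} citing resources per year")          # 6,000,000
print(f"{citations_per_year:,} citations per year")       # 240,000,000
print(f"{months_for_pmc:.1f} months for remaining PMC")   # 2.8
print(f"{catch_up_rate:,.0f} catch-up citations per year")  # 190,000,000
print(f"{current_refs_per_paper:.1f} references per paper")  # 42.4
```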

Our vision for OpenCitations

Our vision is that OpenCitations should become a comprehensive source of open citation information from all disciplines of scholarly endeavour encoded as Linked Open Data, a key component of the academic open infrastructure used on a daily basis without charge by scholars worldwide.

To be of maximum utility, it requires effective graphical user interfaces and analytical tools to interrogate and quantify the data contained within the OCC.  Since these data are all open, we anticipate that such interface and tool development will best be undertaken collaboratively within the open scholarly community, and we invite developers interested in such collaboration to contact us at contact@opencitations.net.

Reference

[1]     Ivan Heibi, Silvio Peroni and David Shotton (2018).  OSCAR: A customisable tool for free-text search over SPARQL endpoints. Accepted to the 2018 International Workshop on Semantics, Analytics, Visualisation: Enhancing Scholarly Dissemination Workshop (https://save-sd.github.io/2018/, co-located with The Web Conference), 24 April 2018 – Lyon, France.  Preprint available at https://w3id.org/people/essepuntato/papers/oscar-savesd2018.html

Citations as First-Class Data Entities: The OpenCitations Data Model

Requirements for citations to be treated as First-Class Data Entities

In my introductory blog post, I listed five requirements for the treatment of citations as first-class data entities.  The second of these requirements is that they must have metadata structured using a generic yet appropriately detailed data model.

To fulfil that requirement, OpenCitations is pleased to announce the publication on 13 February 2018 of the OpenCitations Data Model, v1.6 [1].  This replaces the previous version, v1.5.3, published on 13 July 2016.

The data model has been expanded and enhanced to improve the recording of publication dates, to include the treatment of citations as first-class data entities, and to permit the model’s adoption by third parties who may wish to use it to model their own citation data, or to prepare their citation data for publication in the OpenCitations Corpus (OCC).  To facilitate this, the document describing this data model is published under a Creative Commons Attribution 4.0 International license.

In addition to a change in the title from “Metadata for the OpenCitations Corpus” to “The OpenCitations Data Model”, and the use of the name “OpenCitations” (one token with two words in camel case) in place of “Open Citations” (with the space separating the two words), the substantive changes in the model from the previous version are as follows:

New class

A new class, Archival document, has been added as a subclass of bibliographic resource, to permit the model to be used for work on ancient manuscripts.

Publication dates

The mechanism for recording the publication dates of bibliographic resources has been improved, and now accepts the full date of publication (yyyy-mm-dd, if available), or the year plus the month of publication (yyyy-mm, if the full date is not available), or failing that just the year of publication (yyyy, as in the previous version of the data model).   In order to support this modification in the OWL mapping, prism:publicationDate is now used instead of fabio:hasPublicationYear.
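A minimal sketch of how the three accepted granularities might be distinguished is given below. The function name is hypothetical, and the mapping to the XSD datatypes xsd:date, xsd:gYearMonth and xsd:gYear is an assumption, chosen to be consistent with the xsd:gYear literal used in the RDF example in the previous blog post; the Data Model document remains the authority.

```python
import re
from datetime import date

# Most specific pattern first: yyyy-mm-dd, then yyyy-mm, then yyyy.
_GRANULARITIES = [
    (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"), "xsd:date"),
    (re.compile(r"[0-9]{4}-[0-9]{2}"), "xsd:gYearMonth"),
    (re.compile(r"[0-9]{4}"), "xsd:gYear"),
]


def publication_date_literal(value: str) -> str:
    """Return a typed literal for a yyyy-mm-dd, yyyy-mm or yyyy string."""
    for pattern, datatype in _GRANULARITIES:
        if pattern.fullmatch(value):
            # Fill in missing month/day with 1 and let datetime reject
            # impossible values such as month 13 or 30 February.
            parts = [int(part) for part in value.split("-")]
            parts += [1] * (3 - len(parts))
            date(*parts)
            return f'"{value}"^^{datatype}'
    raise ValueError(f"unsupported publication date: {value!r}")


print(publication_date_literal("2015-03-03"))  # "2015-03-03"^^xsd:date
print(publication_date_literal("2015-03"))     # "2015-03"^^xsd:gYearMonth
print(publication_date_literal("2016"))        # "2016"^^xsd:gYear
```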

Citations as first-class data entities

A new class of bibliographic entity, Citation, has been added to permit the description of citations as first-class data entities.  This class has been assigned sub-classes (e.g. Author self-citation) and properties (e.g. citation time span) to permit the description of citations in a manner helpful for bibliometric analysis.  These, and associated changes to CiTO, the Citation Typing Ontology, are described more fully in the previous blog post.

Virtual entities

The OpenCitations Data Model now permits the definition of virtual entities, i.e. bibliographic entities that are defined on-the-fly, only when they are requested (for example, by accessing their URLs). These are defined either by using data relating to non-virtual bibliographic entities that are already available within the OCC, or by using data that are themselves obtained on-the-fly from an external supplier (e.g. Wikidata).

This approach of using virtual RDF resources is optional, and is simply employed for storage efficiency, to avoid duplication of information within the OCC triplestore. As of January 2018, only one type of bibliographic entity is defined as a virtual entity, namely a citation (a member of the class Citation).

Such a virtual entity does not have the full provenance information normally associated with other bibliographic entities within the OCC, but it does have associated with it the date of its creation and direct links both to the agent responsible for such creation and to the source data used in its construction.

Because we do not separately store these virtual entities within the Corpus triplestore, they cannot be directly queried by means of the OCC SPARQL endpoint, nor are they included in its data dumps. However, the data associated with an OCC virtual entity can be obtained by accessing its URL, which has the form “https://w3id.org/oc/virtual/xyz”, clearly distinguishable from the URIs used for other (non-virtual) OCC bibliographic entities, which have the form “https://w3id.org/oc/corpus/xyz”.  More details and examples are given in the Data Model document itself.

Additionally, for citations defined using Open Citation Identifiers (OCIs, described in a subsequent blog post), details of the cited and citing publications may be readily obtained by using the Open Citation Identifier Resolution Service at http://opencitations.net/oci.

Supplier prefixes

To enable citation data created by third parties to be incorporated within the OpenCitations Corpus, from February 2018 the OCC local identifiers for bibliographic resources now include a supplier prefix which clearly identifies the provenance of the data.  The prefix consists of a positive number (following the pattern “nnn”, where “nnn” is a string of numerals of variable length which includes no zeros), enclosed between two zeros (e.g. “0420”).

To ensure uniqueness of prefixes used by different suppliers, all organizations wishing to adopt the OpenCitations Data Model and to use it to create publicly available citation data, whether these are published in the OpenCitations Corpus or independently, must apply to OpenCitations for a unique supplier prefix, by sending an email to support@opencitations.net.  A list of already assigned supplier prefixes is available at https://github.com/opencitations/oci/blob/master/suppliers.csv.

The appropriate supplier prefix is combined with a unique numerical string that forms the ‘body’ of the identifier to create the local identifier used in OCC to identify an individual bibliographic resource.  OCC local identifiers for citations (as opposed to bibliographic resources) are constructed by combining the local identifiers for the citing and cited bibliographic resources, separating them with a dash.  The same construction applies to a citation between two bibliographic resources described in an external bibliographic database, provided that each is identified there by an identifier having a unique numerical part: the supplier-prefixed local identifiers for the citing and cited bibliographic resources are joined with a dash.

For example, the citation between citing Wikidata resource Q27931310 and cited Wikidata resource Q22252312 is given the OCC local citation identifier “01027931310-01022252312”, where “010” is the OCC supplier prefix (defined above) for Wikidata.  How these OCC local identifiers for citations are used to create Open Citation Identifiers is described in a separate blog post.
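The Wikidata example can be reproduced with a short Python sketch. The function names are hypothetical and the check on the form of a Wikidata identifier (a “Q” followed by a number) is an assumption; only the “010” prefix and the dash-separated construction come from the text.

```python
import re

WIKIDATA_PREFIX = "010"  # OCC supplier prefix for Wikidata


def wikidata_local_id(qid: str) -> str:
    """Prepend the supplier prefix to the numerical part of a Wikidata ID."""
    if re.fullmatch(r"Q[1-9][0-9]*", qid) is None:
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return WIKIDATA_PREFIX + qid[1:]


def local_citation_id(citing_local_id: str, cited_local_id: str) -> str:
    """Join the citing and cited local identifiers with a dash."""
    return f"{citing_local_id}-{cited_local_id}"


citation_id = local_citation_id(
    wikidata_local_id("Q27931310"),  # citing resource
    wikidata_local_id("Q22252312"),  # cited resource
)
print(citation_id)  # 01027931310-01022252312
```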

We commend the OpenCitations Data Model to anyone considering the storage of citation information, particularly if it is to be encoded in RDF, and we welcome contributions of citation data encoded using this model for publication within the OpenCitations Corpus.

Reference

[1]     Silvio Peroni, David Shotton (2018). The OpenCitations Data Model. Version 1.6. figshare. https://doi.org/10.6084/m9.figshare.3443876

Citations as First-Class Data Entities: Citation Descriptions

Requirements for citations to be treated as First-Class Data Entities

In my introductory blog post, I listed five requirements for the treatment of citations as first-class data entities.  The first of these requirements is that they must be definable in a machine-readable manner as a member of the class “Citation”, and describable using appropriate ontology terms.

This blog post describes recent additions to the OpenCitations Data Model, and to CiTO, the Citation Typing Ontology, that permit the required richer description of citations.

Changes to the OpenCitations Data Model

In the OpenCitations Data Model (OCDM), itself described in the following blog post, we have created the following new classes and properties that permit the descriptions of citations in richer ways that are appropriate for bibliometric research.  These changes have been inspired by the publications of Vincent Larivière, Ludo Waltman and their colleagues [1-3].

These new classes and properties and their definitions are described below:

New classes

  • Citation: a permanent conceptual directional link from the citing bibliographic resource to a cited bibliographic resource, created by the performative act of an author citing a published work that is relevant to the current work, typically made by including a bibliographic reference in the reference list of the citing work, or by the inclusion within the citing work of a link, in the form of an HTTP Uniform Resource Locator (URL), to the cited bibliographic resource on the World Wide Web.

The class Citation has sub-classes defining a particular type of citation.

  • Self-citation: a citation in which the citing and the cited entities have something significant in common with one another. Sub-classes include:
    • Affiliation self-citation: a citation in which at least one author from each of the citing and the cited entities is affiliated with the same academic institution.
    • Author network self-citation: a citation in which at least one author of the citing entity has direct or indirect co-authorship links with one of the authors of the cited entity.
    • Author self-citation: a citation in which the citing and the cited entities have at least one author in common.
    • Funder self-citation: a citation in which the works reported in the citing and the cited entities were funded by the same funding agency.
    • Journal self-citation: a citation in which the citing and the cited entities are published in the same journal.
  • Journal cartel citation: a citation from one journal to another journal which forms one of a very large number of citations from the citing journal to recent articles in the cited journal.
  • Distant citation: a citation in which the citing and the cited entities have nothing significant in common with one another over and beyond their subject matter.
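To make the definitions above concrete, the following Python sketch tests a citation against four of the self-citation sub-classes. It is illustrative only: the Work record and its field names are hypothetical, exact string matching of names is a simplification, and author network self-citations and journal cartel citations are omitted because they require a co-authorship network and aggregate journal statistics respectively.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Work:
    """Hypothetical minimal metadata record for a bibliographic resource."""
    authors: FrozenSet[str] = frozenset()
    affiliations: FrozenSet[str] = frozenset()
    funders: FrozenSet[str] = frozenset()
    journal: Optional[str] = None


def self_citation_types(citing: Work, cited: Work) -> List[str]:
    """List the self-citation sub-classes that apply to a citation."""
    types = []
    if citing.authors & cited.authors:
        types.append("Author self-citation")
    if citing.affiliations & cited.affiliations:
        types.append("Affiliation self-citation")
    if citing.funders & cited.funders:
        types.append("Funder self-citation")
    if citing.journal is not None and citing.journal == cited.journal:
        types.append("Journal self-citation")
    return types


citing = Work(authors=frozenset({"A. Author", "B. Writer"}),
              affiliations=frozenset({"University X"}),
              journal="Journal One")
cited = Work(authors=frozenset({"B. Writer", "C. Scholar"}),
             affiliations=frozenset({"University Y"}),
             journal="Journal Two")
print(self_citation_types(citing, cited))  # ['Author self-citation']
```

A single citation can belong to several sub-classes at once, which is why the function returns a list rather than a single type.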

New object properties

  • has citing document: The bibliographic resource which acts as source for the citation.
  • has cited document: The bibliographic resource which acts as target for the citation.

New data properties

  • has citation creation date: The date on which the citation was created. This has the same numerical value as the publication date of the citing bibliographic resource, but is a property of the citation itself. When combined with the citation time span, it permits that citation to be located in history.
  • has citation time span: The temporal characteristic of a citation, namely the interval between the publication date of the cited entity and the publication date of the citing entity.
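As a sketch of how a citation time span could be computed from two full publication dates, the following Python function returns an ISO 8601 duration string of the kind used for xsd:duration values. The function name is hypothetical, the dates in the usage example are invented for illustration, and the handling of month ends (clamping to the last day of a shorter month) and the refusal of negative spans are simplifying assumptions rather than rules taken from the Data Model.

```python
import calendar
from datetime import date


def _add_months(start: date, months: int) -> date:
    """Move forward by whole months, clamping the day to the month's end."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def citation_time_span(cited_published: date, citing_published: date) -> str:
    """Interval from the cited to the citing publication date, e.g. 'P10Y'."""
    if citing_published < cited_published:
        raise ValueError("this sketch only handles non-negative time spans")
    total_months = ((citing_published.year - cited_published.year) * 12
                    + citing_published.month - cited_published.month)
    # Step back one month if whole months would overshoot the citing date.
    if _add_months(cited_published, total_months) > citing_published:
        total_months -= 1
    anchor = _add_months(cited_published, total_months)
    days = (citing_published - anchor).days
    years, months = divmod(total_months, 12)
    parts = [f"{n}{unit}" for n, unit in ((years, "Y"), (months, "M"), (days, "D")) if n]
    return "P" + ("".join(parts) or "0D")


print(citation_time_span(date(2006, 5, 1), date(2016, 5, 1)))    # P10Y
print(citation_time_span(date(2009, 4, 17), date(2015, 3, 3)))   # P5Y10M14D
```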

Changes to CiTO, the Citation Typing Ontology

To complement these additions to the OpenCitations Data Model, and to permit these richer characteristics of citations to be encoded in RDF, we have additionally made the following changes to CiTO, the Citation Typing Ontology.

New classes

The class cito:SelfCitation has been renamed cito:AuthorSelfCitation, with an unchanged definition (“a citation in which the citing and the cited entities have at least one author in common”).

A new class cito:SelfCitation has been created, with the same more general definition as for this class in the OCDM (“a citation in which the citing and the cited entities have something significant in common with one another”). In CiTO, this now includes five new sub-classes:

  • cito:AuthorSelfCitation
  • cito:JournalSelfCitation
  • cito:FunderSelfCitation
  • cito:AffiliationSelfCitation
  • cito:AuthorNetworkSelfCitation

with the definitions given above for these sub-classes in the OCDM.

New object properties

To complement the OCDM properties, we have within CiTO the following object properties:

  • cito:hasCitedEntity (“A property that relates a citation to the cited entity”) and
  • cito:hasCitingEntity (“A property that relates a citation to the citing entity”).

CiTO also has the following relevant object property:

  • cito:sharesPublicationVenueWith

with the sub-property cito:sharesJournalWith.

New data properties

To match the additions in the OCDM, we have added these new data properties to CiTO, which have the same definitions as those in the OCDM:

  • cito:hasCitationCreationDate
  • cito:hasCitationTimeSpan.

In addition, the class cito:AuthorNetworkSelfCitation is accompanied by the new data property:

  • cito:hasCoAuthorshipCitationLevel

which specifies the minimal distance that one of the authors of a citing entity has with regards to one of the authors of a cited entity according to their co-author network. For instance, a citation has a co-authorship citation level equal to 1 if at least one author of the citing entity has previously published as co-author with one of the authors of the cited entity. Similarly, we say that a citation has a co-authorship citation level equal to 2 if at least one author of the citing entity has previously published as co-author with someone who him/herself has previously published as co-author with one of the authors of the cited entity. And so on.
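The co-authorship citation level is a shortest-path distance in the co-author network, so it can be sketched as a breadth-first search. The function name and the dictionary representation of the network are hypothetical. Two simplifications are assumptions of this sketch: a shared author is reported as level 0 (the text does not assign a level to that case), and the network is treated as a static graph, ignoring the requirement that the co-authored publications be previous ones.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set


def coauthorship_citation_level(
    citing_authors: Iterable[str],
    cited_authors: Iterable[str],
    coauthors: Dict[str, Set[str]],
) -> Optional[int]:
    """Minimal co-author distance between any citing and any cited author.

    Returns None when no chain of co-authorship links connects them.
    """
    targets = set(cited_authors)
    seen = set(citing_authors)
    frontier = deque((author, 0) for author in seen)
    while frontier:
        author, level = frontier.popleft()
        if author in targets:
            return level
        for other in coauthors.get(author, ()):
            if other not in seen:
                seen.add(other)
                frontier.append((other, level + 1))
    return None


# Hypothetical network: alice has published with bob, bob with carol.
network = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob"},
}
print(coauthorship_citation_level(["alice"], ["bob"], network))    # 1
print(coauthorship_citation_level(["alice"], ["carol"], network))  # 2
print(coauthorship_citation_level(["alice"], ["dave"], network))   # None
```

Because breadth-first search visits authors in order of increasing distance, the first cited author reached gives the minimal level.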

Describing a citation in RDF

Describing a citation between two articles in RDF as a simple link is straightforward but relatively uninformative:

@prefix cito: <http://purl.org/spar/cito/> .

<https://w3id.org/oc/corpus/br/1>
      cito:cites
          <https://w3id.org/oc/corpus/br/18> .

The alternative RDF description of a citation as a first-class data entity could include the following triples (omitting any provenance information in this example), where br/1 and br/18 are the internal identifiers for the citing bibliographic resource and the cited bibliographic resource within the OpenCitations Corpus:

@prefix cito: <http://purl.org/spar/cito/> .
@prefix datacite: <http://purl.org/spar/datacite/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://w3id.org/oc/virtual/ci/1-18> a cito:Citation ;
     cito:hasCitingEntity <https://w3id.org/oc/corpus/br/1> ;
     cito:hasCitedEntity <https://w3id.org/oc/corpus/br/18> ;
     cito:hasCitationCreationDate "2016"^^xsd:gYear ;
     cito:hasCitationTimeSpan "P10Y"^^xsd:duration ;
     datacite:hasIdentifier <https://w3id.org/oc/virtual/id/ci-1-18> .

The meaning of “virtual” in the URI of this citation is explained in the following blog post about the OpenCitations Data Model.

The following diagram prepared by Silvio Peroni shows the semantic relationships for a citation currently handled by the OpenCitations Corpus (omitting the sub-classes of the class cito:Citation).  Explanation of OCI, the Open Citation Identifier, is given in a subsequent post.

References

[1]     Matthew L. Wallace, Vincent Larivière and Yves Gingras (2012). A Small World of Citations? The Influence of Collaboration Networks on Citation Practices.  PLoS ONE 7(3): e33339. https://doi.org/10.1371/journal.pone.0033339

[2]     Philippe Mongeon, Ludo Waltman and Sarah de Rijcke (2016). What do we know about journal citation cartels? A call for information.  CWTS blog post. Available at https://www.cwts.nl/blog?article=n-q2w2b4

[3]     Ludo Waltman and Caspar Chorus (2016). Journal self-citations are increasingly biased toward impact factor years. CWTS blog post. Available at https://www.cwts.nl/blog?article=n-q2x264

Citations as First-Class Data Entities: Introduction

Citations are now centre stage

As a result of the Initiative for Open Citations (I4OC), launched on April 6, 2017, almost all the major scholarly publishers now open the reference lists they submit to Crossref, resulting in more than half a billion references being openly available via the Crossref API.

It is therefore time to think carefully about how citations are treated, and how they might be better handled as part of the Linked Open Data Web.

Citations are normally treated simply as the links between published entities.

Conventional citation

However, an alternative richer view is to regard a citation as a data entity in its own right.

First class citation

This permits us to endow a citation with descriptive properties, such as

has citation creation date:   3rd March 2015
has citation time span:       6 years, 5 months and 23 days
has type:                     Self-citation
has identifier:               oci:7295288-3962641

[Note: a later blog post entitled “Open Citation Identifiers” will include an explanation of the identifier shown here.]
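To make the "citation time span" property concrete, here is a minimal Python sketch that derives it as the interval between the publication date of the cited work and that of the citing work, formatted as an xsd:duration string. The function names are hypothetical, and the cited work's publication date (8 September 2008) is an assumption chosen so that the result matches the time span listed above. The post does not state how OpenCitations itself computes this value.

```python
import calendar
from datetime import date


def add_months(d, n):
    """Shift a date by n months, clamping the day to the target month's length."""
    year, month0 = divmod(d.year * 12 + d.month - 1 + n, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def citation_time_span(cited, citing):
    """Return the interval from cited to citing date as an xsd:duration string."""
    if citing < cited:
        raise ValueError("negative time spans are outside this sketch")
    months = (citing.year - cited.year) * 12 + citing.month - cited.month
    if add_months(cited, months) > citing:
        months -= 1  # the final month is not yet complete
    days = (citing - add_months(cited, months)).days
    years, months = divmod(months, 12)
    parts = [
        f"{value}{unit}"
        for value, unit in ((years, "Y"), (months, "M"), (days, "D"))
        if value  # omit zero components, as in "P10Y"
    ]
    return "P" + ("".join(parts) or "0D")


# Assumed cited date; the citing date is the creation date listed above.
print(citation_time_span(date(2008, 9, 8), date(2015, 3, 3)))  # P6Y5M23D
```

Whole years and months are counted first and the remaining days last, which is one common convention. Other conventions can give slightly different day counts.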

Advantages of treating citations as First-Class Data Entities

  • All the information regarding each citation is available in one place.
  • Citations become easier to describe, distinguish, count and process.
  • If available in aggregate, citations described in this manner are easier to analyze using bibliometric methods, for example to determine how citation time spans vary by discipline.
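
As a small sketch of the last point, once each citation is a record with its own properties, a per-discipline summary of time spans needs only a few lines of Python. The Citation class, its field names and the records themselves are invented purely for illustration. In particular, attaching a discipline to a citation is an assumption of this example, not part of the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Citation:
    """One citation record; field names are illustrative only."""
    oci: str
    discipline: str
    time_span_years: float


def median_time_span_by_discipline(citations):
    """Group citation records by discipline and take the median time span."""
    spans = defaultdict(list)
    for c in citations:
        spans[c.discipline].append(c.time_span_years)
    return {discipline: median(values) for discipline, values in spans.items()}


# Made-up records, used only to exercise the function.
toy = [
    Citation("oci:1-2", "physics", 2.0),
    Citation("oci:3-4", "physics", 4.0),
    Citation("oci:5-6", "history", 10.0),
    Citation("oci:7-8", "history", 12.0),
    Citation("oci:9-10", "history", 30.0),
]
print(median_time_span_by_discipline(toy))
```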

Requirements for citations to be treated as First-Class Data Entities

  • They must be definable in a machine-readable manner as a member of the class “Citation”, and describable using appropriate ontology terms.
  • They must have metadata structured using a generic yet appropriately detailed data model.
  • They must be storable, searchable and retrievable in an open database designed for bibliographic citations.
  • They must be identifiable using a global persistent identifier scheme.
  • There must be a Web-based identifier resolution service that takes the citation identifier as input and returns a description of the citation.

Blog posts detailing how these requirements are met

Subsequent blog posts will describe how we at OpenCitations have satisfied these requirements, permitting citations to indeed be treated as First-Class Data Entities:

  1. Citations as First-Class Data Entities: Citation Descriptions
  2. Citations as First-Class Data Entities: The OpenCitations Data Model
  3. Citations as First-Class Data Entities: The OpenCitations Corpus
  4. Citations as First-Class Data Entities: Open Citation Identifiers
  5. Citations as First-Class Data Entities: The Open Citation Identifier Resolution Service

OpenCitations and the Initiative for Open Citations: A Clarification

Some folk are confused, but OpenCitations and the Initiative for Open Citations, despite the similarity of their names, are two distinct organizations.

OpenCitations (http://opencitations.net) is an open scholarly infrastructure organization directed by Silvio Peroni and myself, and its primary purpose is to host and build the OpenCitations Corpus (OCC), an RDF database of scholarly citation data that now contains almost 13 million citation links.

In contrast, the Initiative for Open Citations (I4OC; https://i4oc.org) is a separate and independent organization, whose founding was spearheaded by Dario Taraborelli of the Wikimedia Foundation.  OpenCitations was just one of several organizations that founded the Initiative for Open Citations, as documented at https://i4oc.org/#founders.

I4OC is a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data, but it does not itself host citation data.

Within a short space of time, I4OC has persuaded most of the major scholarly publishers to open their reference lists submitted to Crossref, so that the proportion of all references submitted to Crossref that are now open has risen from 1% to over 50%.

These references are now available for OpenCitations to harvest into the OpenCitations Corpus and publish in RDF, as well as for others to harvest and use as they wish.

All clear now?


Oxford University Press opens its references!

Good news!  Today, on January 16th 2018, Oxford University Press (OUP) announced its participation in the Initiative for Open Citations, and requested Crossref to turn on reference sharing for all OUP deposited references from more than half a million publications.  Oxford University Press is the largest university press in the world, publishing in 70 languages and 190 countries.


Their announcement is at https://academic.oup.com/journals/pages/announcements_from_oup/oup_joins_I4OC.

OUP now joins the elite band of four university presses that have already made their references open at Crossref in response to the I4OC call (https://i4oc.org/#publishers).

This decision by OUP has been a long time in gestation – see my 2012 post Oxford University Press to support Open Citations – but is no less welcome for that!
