Open Citations is dead. Long live OpenCitations.

In October 2015, I asked Silvio Peroni, my long-term colleague in the development of the SPAR Ontologies, to become Co-Director of the Open Citations Project. The plan was for us to take forward the prototype Open Citations Corpus (OCC), originally developed at the University of Oxford with the support of Jisc, and to develop it into a production service of real use to scholars.

The result is OpenCitations, a new instantiation of the OCC hosted by the Department of Computer Science and Engineering of the University of Bologna, based on a new metadata schema and employing several new technologies to automate the ingestion of fresh citation metadata from authoritative sources.

Since the beginning of July 2016, OpenCitations has been ingesting and processing accurate bibliographic references harvested from the reference lists of scholarly papers available in Europe PubMed Central, enriched by metadata from Crossref. These scholarly citation data are described using the SPAR Ontologies according to the new OpenCitations metadata document [1], and are published under a Creative Commons public domain dedication (CC0), so that others may freely build upon, enhance and reuse them for any purpose, without restriction under copyright or database law. We have described the new OpenCitations Corpus, and the new software developed by Silvio to create it, in [2].

OpenCitations is being continuously populated from the scholarly literature, and, as of 30th March 2017, has ingested the references from 123,989 citing bibliographic resources, and contains information about 5,307,857 citation links to 3,469,648 cited resources.

The whole OCC is now available for querying (via SPARQL) and for browsing by means of a very simple Web interface that shows only the data about bibliographic entities (e.g. https://w3id.org/oc/corpus/br/1). Additional, more user-friendly interfaces will be available in the coming months. The entire contents of the OCC are also archived every month as data dumps that are made available online through Figshare. Each dump comprises several zip archives, each containing either the data or the provenance information of a particular sub-dataset of the OCC.
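As a rough illustration of what querying via SPARQL could look like, here is a minimal Python sketch that builds a query for the resources citing a given bibliographic entity, and the corresponding GET request URL. Several things here are assumptions rather than details given in this post: the endpoint address (SPARQL_ENDPOINT), the use of the cito:cites property from CiTO (one of the SPAR Ontologies) to express citation links, and the helper names build_citing_query and build_request_url, which are purely hypothetical. Check the OpenCitations metadata document [1] for the actual modelling before relying on it.

```python
from urllib.parse import urlencode

# Assumption: the post does not state the endpoint address; this value is
# a placeholder for illustration and should be checked before use.
SPARQL_ENDPOINT = "https://w3id.org/oc/sparql"


def build_citing_query(resource_iri: str, limit: int = 10) -> str:
    """Build a SPARQL query listing resources that cite `resource_iri`.

    Assumes citation links are expressed with cito:cites (CiTO).
    """
    return (
        "PREFIX cito: <http://purl.org/spar/cito/>\n"
        "SELECT ?citing WHERE {\n"
        f"  ?citing cito:cites <{resource_iri}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = build_citing_query("https://w3id.org/oc/corpus/br/1", limit=5)
    print(query)
    print(build_request_url(query))
```

The sketch only constructs the query and URL; it does not contact any server. Sending the URL with an HTTP client and an Accept header such as application/sparql-results+json would be the usual next step, assuming the endpoint supports that result format.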

Although OpenCitations presently contains only a small proportion of global citation data, the very nature of scholarly citation means that even this partial coverage includes citations of the most important papers in every biomedical field, since these critical papers are characterized by their high number of inward citation links.

[1] Silvio Peroni, David Shotton (2016). Metadata for the OpenCitations Corpus. figshare. https://dx.doi.org/10.6084/m9.figshare.3443876

[2] Silvio Peroni, David Shotton, Fabio Vitali (2016). Freedom for bibliographic references: OpenCitations arise. Proceedings of 2016 International Workshop on Linked Data for Information Extraction (LD4IE 2016): 32-43. https://w3id.org/oc/paper/occ-lisc2016.html
