As introduced in a previous blog post, COCI is the OpenCitations Index of Crossref open DOI-to-DOI references, all released as CC0 material. It is our first OpenCitations Index of open citations, in which we have applied the concept of citations as first-class data entities to index the contents of one of the major databases of open scholarly citation information, namely Crossref, and to render and make available this information in machine-readable RDF.
We are now proud to announce the second release of COCI, which contains more than 445 million DOI-to-DOI citation links coming from both the ‘Open’ and the ‘Limited’ sets of Crossref reference data. This represents an increase of approximately 41% in the number of indexed citations, compared with the initial release of COCI on 4th June 2018, which indexed 316,243,802 citations involving 45,145,889 bibliographic resources. In addition, the data model for COCI has been extended so as to state directly the presence of journal self-citations and author self-citations.
Extended data model
The previous data model used for storing the citation data in COCI – which is itself a subset of the OpenCitations Data Model – has been extended so as to keep track of two particular types of self-citation, as shown in the following figure.
Generally speaking, a self-citation is a citation in which the citing and the cited entities have something significant in common with one another, over and beyond their subject matter. The two kinds of self-citation we are now tracking are:
- journal self-citation (class cito:JournalSelfCitation), i.e. a citation in which the citing and the cited entities are published in the same journal. This information has been obtained by comparing the ISSNs, as provided by Crossref, of the journals in which the two journal articles related by a citation have been published. If they share the same ISSN, then the citation is described as a journal self-citation;
- author self-citation (class cito:AuthorSelfCitation), i.e. a citation in which the citing and the cited entities have at least one author in common. This information has been obtained by comparing the ORCIDs associated with the authors of the citing bibliographic entity with the ORCIDs of the authors of the cited entity. In this case, if any ORCID is shared, then the citation is described as an author self-citation. This categorization excludes authors bearing the same name where the ORCIDs are not known, since, while these instances may be author self-citations, they may alternatively merely represent name coincidences of distinct individuals.
It is worth mentioning that, while ISSN information is usually present in the data returned by Crossref, ORCID data associated with the authors of the papers represented in Crossref is presently very limited, so that the number of author self-citations recorded in COCI is likely to be a considerable underestimate.
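The two comparisons described above can be sketched in a few lines of Python. This is only an illustration of the matching rule, not the code used to build COCI: the function names are hypothetical, the identifiers in the usage lines are made up for the example, and treating any shared ISSN as a match (for journals with both print and electronic ISSNs) is an assumption.

```python
from typing import Iterable, Optional, Set


def _known(identifiers: Optional[Iterable[Optional[str]]]) -> Set[str]:
    """Return the set of usable identifiers, dropping missing or empty values.

    Upper-casing makes a trailing 'x' check digit compare equal to 'X'.
    """
    if identifiers is None:
        return set()
    return {i.strip().upper() for i in identifiers if i and i.strip()}


def is_journal_self_citation(citing_issns, cited_issns) -> bool:
    """True if the citing and cited articles share at least one ISSN (assumed rule)."""
    return bool(_known(citing_issns) & _known(cited_issns))


def is_author_self_citation(citing_orcids, cited_orcids) -> bool:
    """True if at least one ORCID appears on both sides.

    Authors without a known ORCID never match, so name coincidences
    are not counted as self-citations.
    """
    return bool(_known(citing_orcids) & _known(cited_orcids))


# Illustrative identifiers only.
print(is_journal_self_citation(["1234-567X"], ["1234-567x", "7654-3210"]))
print(is_author_self_citation(["0000-0002-1825-0097"], [None]))
```

Because a missing ORCID can never produce a match, a check of this kind undercounts author self-citations whenever ORCID coverage is sparse, which is the limitation noted above.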
In this new release, COCI contains 445,826,118 citations, of which 30,114,696 are recorded as journal self-citations and 251,699 are recorded as author self-citations.
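To put these counts in proportion, the share of each kind of self-citation can be computed directly from the figures above. The snippet below is a small worked calculation; the variable and function names are chosen for illustration only.

```python
# Counts taken from the release figures quoted above.
TOTAL_CITATIONS = 445_826_118
JOURNAL_SELF_CITATIONS = 30_114_696
AUTHOR_SELF_CITATIONS = 251_699


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


journal_share = share(JOURNAL_SELF_CITATIONS, TOTAL_CITATIONS)
author_share = share(AUTHOR_SELF_CITATIONS, TOTAL_CITATIONS)

print(f"journal self-citations: {journal_share:.2f}% of all citations")
print(f"author self-citations:  {author_share:.3f}% of all citations")
```

Journal self-citations come to a little under 7% of all indexed citations, while recorded author self-citations are well below 0.1%, consistent with the limited ORCID coverage.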
Extended REST API
The REST API for querying COCI has been extended so as to return information about the aforementioned self-citations. In particular, the response to the operations “references” and “citations” now has two more fields, i.e. “journal_sc” and “author_sc”, that are set to “yes” if the citation returned is a journal self-citation or an author self-citation respectively, or “no” otherwise.
Using the capabilities of the REST API, it is also possible to keep in or exclude from the result set those citations that are (or are not) one of the aforementioned types of self-citation. For instance, the following call
returns all the citations having the article with DOI “10.1002/pol.1987.140251103” that are journal self-citations.
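The same kind of selection can also be approximated on the client side once a response has been retrieved. The sketch below assumes the response has already been parsed into a list of dictionaries (for instance from JSON) carrying the “journal_sc” and “author_sc” fields described above; the `select` helper, the “citing” and “cited” keys, and the sample DOIs are hypothetical, and this is not the filter syntax of the REST API itself.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]

SELF_CITATION_FIELDS = ("journal_sc", "author_sc")


def select(records: Iterable[Record], field: str, keep: bool = True) -> List[Record]:
    """Keep (keep=True) or exclude (keep=False) records whose `field` is "yes"."""
    if field not in SELF_CITATION_FIELDS:
        raise ValueError(f"unknown self-citation field: {field!r}")
    return [r for r in records if (r.get(field) == "yes") == keep]


# Made-up records standing in for a parsed API response.
sample = [
    {"citing": "10.1234/a", "cited": "10.1234/b", "journal_sc": "yes", "author_sc": "no"},
    {"citing": "10.1234/c", "cited": "10.1234/b", "journal_sc": "no", "author_sc": "no"},
    {"citing": "10.1234/d", "cited": "10.1234/b", "journal_sc": "yes", "author_sc": "yes"},
]

journal_self = select(sample, "journal_sc")               # only journal self-citations
no_author_self = select(sample, "author_sc", keep=False)  # drop author self-citations

print([r["citing"] for r in journal_self])
print([r["citing"] for r in no_author_self])
```

Filtering on the server, as in the call above, avoids transferring records that would only be discarded; a local helper like this one is mainly useful when several different selections are needed from a single response.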
In this blog post we have introduced the second release of COCI, the OpenCitations Index of Crossref open DOI-to-DOI references, a citation index which now contains almost 450 million open citations created from the ‘Open’ and ‘Limited’ references included within Crossref.
As a reminder, all the data in COCI:
- can be queried by means of the COCI SPARQL endpoint;
- can be retrieved by using the COCI REST API;
- can be searched by using the COCI Search Interface;
- are available as dumps on Figshare in CSV and N-Triples formats, the most recent of which is dated November 2018, while the whole triplestore is available on The Internet Archive.