Why publishers should open their references

Why should the publishers of subscription-access journals, who presently generate income from the sale of access to peer-reviewed full-text scholarly articles, be willing to open the reference lists of these articles and contribute them to the Open Citations Corpus for publication as open linked data? I would like to suggest the following reasons:

1. There is a general move towards open data, which is widely regarded as a common good. This includes citation data, i.e. bibliographic references from one article to another (in RDF Turtle format: A cito:cites B . ).
2. The reference lists at the end of journal articles are works of scholarship by the authors, who have chosen to include certain references and exclude other potentially citable papers from the reference list. However, the references themselves are simply items of bibliographic data, formatted according to the journal style, and do not benefit from the author’s creative input.
3. The reference list, together with the front matter (including the bibliographic information about the article itself) and the abstract, has traditionally been included within the copyright protection enjoyed by the article as a whole. However, the bibliographic information about the article and the article’s abstract are commonly made freely available, for example through PubMed. This same openness should now be afforded to the reference list within each article.
4. There is a home for such reference citation data: the Open Citations Corpus has been specifically created to house and publish scholarly bibliographic citation data, and is now preparing to welcome article reference lists from subscription-access journals, to supplement those already contributed from open-access journals.
5. For those publishers who already contribute their reference information to CrossRef as part of its Cited-By Linking service, this can be accomplished without any change to the publisher’s own publishing workflows, just by giving permission for CrossRef to flag the articles of certain journals as having open references. Open Citations intends to collaborate with CrossRef by harvesting the reference lists from such flagged articles, parsing them into RDF, and adding them to the Open Citations Corpus. Provided that the references are already being submitted to CrossRef, no work will have to be done by the publisher, and no changes in publishing procedure will be involved.
6. Open Citations will publish each reference list as an independent RDF Named Graph, with a unique URI, thereby protecting the integrity of the article reference list as a unit of scholarship, the source of which will be explicitly acknowledged.
7. The open citations data will then be offered back to publishers to use as they wish, e.g. for visualization of citation networks, calculation of metrics, etc., providing easier and more usable access to their own citation data than is currently afforded by commercial providers, who do not provide such data in linked data format.
8. Publishers will also be free to host their own open citations data, should they wish to do so.
9. For the majority of publishers, who would still receive subscriptions on the full articles themselves, opening their article reference lists in this way will cost nothing in terms of lost revenue.
10. Indeed, participation in the Open Citations Corpus will bring the following benefits to subscription-access publishers:
– Access to services to be built over the aggregated open citations data, for example an automated reference correction service that editors could use upon receipt of a manuscript to correct errors in its reference list before publication.
– Increased exposure to users of the references to the publisher’s own journal articles – a form of advertising. While at first coverage among subscription-access publishers will be incomplete, this expanding Open Citations Corpus will, in true Web 2.0 style, become more useful the more publishers participate.
– Even while coverage is incomplete, the Open Citations Corpus by its very nature contains reference citations to all the key papers published in every field covered (currently, every biomedical field), enabling readers more easily to identify and find the most highly cited papers of each contributing publisher.
– Opening citations data will result in white-listing and general good-will from funding agencies, government and other advocates of open data, who might otherwise mandate publication by grantees in alternative open-access journals.
– Opening citations data will lead to support from scholars and researchers themselves, who will be more inclined to publish in that publisher’s journals, feeling that at last the publishers are giving back to them some of their own data, rather than selling it back to them as at present.
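
To make points 5 and 6 a little more concrete, here is a minimal Python sketch of how one article's reference list might be expressed as cito:cites statements inside a single named graph, serialized in TriG syntax. The function name, the graph URI and the example.org article URIs are all hypothetical, chosen purely for illustration; this is not the actual Open Citations or CrossRef pipeline. Only the CiTO namespace and its cito:cites property are real.

```python
# Sketch only: expresses one reference list as a TriG named graph.
# All URIs below other than the CiTO namespace are hypothetical placeholders.

CITO_NAMESPACE = "http://purl.org/spar/cito/"


def reference_list_to_trig(graph_uri, citing_uri, cited_uris):
    """Return a TriG string holding one 'citing cito:cites cited' triple
    per reference, all wrapped in a single named graph."""
    lines = [
        f"@prefix cito: <{CITO_NAMESPACE}> .",
        "",
        f"<{graph_uri}> {{",  # the named graph identifies the reference list as a unit
    ]
    for cited_uri in cited_uris:
        lines.append(f"    <{citing_uri}> cito:cites <{cited_uri}> .")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical article A citing hypothetical articles B and C.
example_trig = reference_list_to_trig(
    graph_uri="https://example.org/reference-list/article-a",
    citing_uri="https://example.org/article/a",
    cited_uris=[
        "https://example.org/article/b",
        "https://example.org/article/c",
    ],
)
print(example_trig)
```

Because every statement from a given reference list sits inside its own named graph, the list can be attributed to its source article as a whole. The sketch does no validation or escaping of the URIs it is given, which a real service would need.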

As my next blog post shows, one leading subscription-access publisher is now willing to open its journal article references in the way I have suggested. Others who would like to do the same should contact me at <david.shotton@zoo.ox.ac.uk>.

