Last September, I attended the Fifth Annual Conference on Open Access Scholarly Publishing, held in Riga, at which I had been invited to give a paper entitled "The Open Citations Corpus – freeing scholarly citation data". A recording of my talk is available here, and my PowerPoint presentation is separately available here. My own reflections on the major themes of the conference are given in a separate Semantic Publishing Blog post.
While in Riga preparing to give that talk about the importance of open citation data, I received an invitation from Sara Abdulla, Chief Commissioning Editor at Nature, to write a Comment piece for their forthcoming special issue on Impact. My immediate reaction was that this should be on the same theme, an idea to which Sara readily agreed. The deadline for delivery of the article was 10 days later!
As soon as the Riga conference was over, I assembled all the material I had to hand that could be relevant to describing the Open Citations Corpus (OCC) in the context of conventional access to academic citation data from commercial sources. That gave me a raw manuscript of some five thousand words, from which I had to distil an article of fewer than 1,300 words. I then started editing, and asked my colleagues Silvio Peroni and Tanya Gray for their comments.
The end result, enriched by some imaginative artwork by the Nature team, was published a couple of weeks later, on 16th October. It presents both the intellectual argument for open citation data and the practical obstacles to be overcome in building a substantial corpus of such data. It also gives a general description of the Open Citations Corpus itself and of the development work we have planned for it.
Because of the drastic editing required to reduce the original draft to about a quarter of its size, all material not crucial to the central theme had to be cut. I thus had the idea of subsequently developing the original draft into a full journal article that would include these additional themes, particularly Silvio’s work on the SPAR ontologies described in this Semantic Publishing Blog post, Tanya’s work on the CiTO Reference Annotation Tools described in this Semantic Publishing Blog post, and a wonderful analogy between the scholarly citation network and Venice devised by Silvio. I also wanted to give authorship credit to Alex Dutton, who had undertaken almost all of the original software development work for the OCC. For this reason, instead of assigning copyright to Nature for the Comment piece, I gave them a license to publish and retained copyright myself so that I could re-use the text. I am pleased to say that they accepted this without comment.
Silvio and I then set to work to develop the draft into a proper article. The result was a ten-thousand-word paper submitted to the Journal of Documentation a week before Christmas. We await the referees’ comments!
Shotton D (2013). Open citations. Nature 502: 295–297. http://www.nature.com/news/publishing-open-citations-1.13937. doi:10.1038/502295a.
Peroni S and Shotton D (2012). FaBiO and CiTO: ontologies for describing bibliographic resources and citations. Web Semantics: Science, Services and Agents on the World Wide Web 17: 33–43. doi:10.1016/j.websem.2012.08.001.
Peroni S, Dutton A, Gray T and Shotton D (2015). Setting our bibliographic references free: towards open citation data. Journal of Documentation 71 (2): 253–277. http://dx.doi.org/10.1108/JD-12-2013-0166; OA at http://speroni.web.cs.unibo.it/publications/peroni-2015-setting-bibliographic-references.pdf
This is the main article about OpenCitations. It includes background information, the main ideas and work supporting the whole project, a description of the Corpus, and some possible future developments in terms of new kinds of data to be included, e.g. citation functions.