What is Related Work?
Related Work is a user-friendly Web application developed to provide a means of browsing citation links, as the basis for a planned recommendation service for the best articles to read. The references underpinning Related Work are those contained within the preprints held by arXiv, the widely used preprint server hosted by Cornell, comprising some 750,000 articles in the fields of mathematics, physics, astronomy and the other ‘hard sciences’.
Related Work was initiated in March 2012 by Dr Heinrich Hartmann and René Pickhardt, two mathematicians with strong software development skills and interests in social media, after brainstorming about what new services over the arXiv papers they would like to use. It was the lack of basic features, such as related work recommendations and a discussion system, that led them to start their own website around the arXiv content. It quickly became apparent that access to reference data was crucial for building a sensible recommendation service.
Following bulk download of the open-access arXiv articles, the bibliographic references were extracted from each article and placed in a Neo4j database. (Neo4j is a leading NoSQL graph database, well suited to storing document nodes and the links between them.)
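To illustrate why a graph structure suits this kind of data, here is a minimal in-memory sketch in Python. It is not the Related Work code and does not use Neo4j: the class name, method names and article identifiers are all hypothetical, and the ranking by shared references (bibliographic coupling) is just one simple assumption about how related articles might be found.

```python
from collections import defaultdict


class CitationGraph:
    """Toy citation graph: nodes are article IDs, edges are 'cites' links."""

    def __init__(self):
        self.cites = defaultdict(set)     # article -> articles it cites
        self.cited_by = defaultdict(set)  # article -> articles that cite it

    def add_citation(self, source, target):
        """Record that `source` cites `target`, indexing both directions."""
        self.cites[source].add(target)
        self.cited_by[target].add(source)

    def related(self, article):
        """Rank other articles by how many references they share with `article`."""
        shared = defaultdict(int)
        for ref in self.cites.get(article, ()):
            for other in self.cited_by.get(ref, ()):
                if other != article:
                    shared[other] += 1
        # Most shared references first; ties broken alphabetically.
        return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical identifiers, for illustration only.
graph = CitationGraph()
graph.add_citation("paper-a", "ref-x")
graph.add_citation("paper-a", "ref-y")
graph.add_citation("paper-b", "ref-x")
graph.add_citation("paper-b", "ref-y")
graph.add_citation("paper-c", "ref-x")

print(graph.related("paper-a"))
```

Because each link is indexed in both directions, questions such as "what cites this article?" or "which articles share references with it?" reduce to short traversals, which is the kind of query a graph database is designed to answer at scale.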
On top of these data, a demonstration Related Work web interface was developed, permitting users to search by title, keyword or author name. Enhancements to this interface are already in prototype. That work was completed in October 2012, and was coupled with full release of the underlying data and the source code under open licenses. Related Work does not presently expose its content in RDF as open linked data.
Collaboration between Related Work and the Open Citations Project
The collaboration between Related Work and the Open Citations Project started in October 2012, when Heinrich’s search for new sources for citation data led him to the Open Citations Corpus. We started talking about how we could support each other’s work, and soon discovered that we were facing very similar problems and had a shared interest in the semantic enhancement of citation data.
In November we decided that, because our aims of making bibliographic reference data openly available were so well aligned, and because our underlying data were so similar, we should immediately merge the ongoing developments of the Open Citations Corpus and the Related Work project. This would maximize the effectiveness of the limited developer effort that we had available to take this work forward.
This decision immediately promised to bring the Open Citations Corpus a doubling in the volume of our citation data, the input of the two experienced and committed web developers from Related Work (namely Heinrich and René), and a close link to a leading semantic web research group within the Institute for Web Science and Technologies at the University of Koblenz-Landau, at which they both work. That relationship also offers the potential to involve computer science students in Koblenz in the project. Indeed, an undergraduate research project on personalized auto-completion for Related Work is currently being supervised by René.
Reciprocally, the development effort and prior experience of Richard Jones, Ben O’Steen, Mark MacGillivray and their colleagues at Cottage Labs, gained during the Open Bibliography Project and currently being applied within the Open Citations Extension Project, provided solutions to problems being faced by Related Work. These relate particularly to BibJSON and BibServer for handling bibliographic records, and to Cottage Labs’ front-end faceted search and browse software Edjo, which works over both Apache Solr and ElasticSearch indexes.
For branding, at least in the short term, we proposed to keep the name “Open Citations Corpus” for the underlying data corpus and its infrastructure, the Open Citations Corpus Datastore (OCCD), and the name “Related Work” for the user-oriented services built on top of these citation data.
The following blog post explains how this joint project is now being taken forward.
David Shotton writes: With the return of Heinrich Hartmann and René Pickhardt to Germany and their involvement in other things, the potential collaboration between the Open Citations Project and their Related Work project came to nothing. Separately, the corpus has recently been renamed OpenCitations and given a new lease of life, described here, with Silvio Peroni as Co-Director. For this reason, this blog was renamed “OpenCitations” on 30th March 2017.