Reflections on the global citation graph

In his call for open citations, Dario Taraborelli hailed the scholarly citation graph (in which the nodes (vertices) are individual academic publications and the links (edges) represent bibliographic citations from one publication to another) as one of humankind’s most important intellectual achievements.

We all understand that the inclusion within our own academic publications of bibliographic references to the works of others is one of the most explicit ways of acknowledging the thoughts, discoveries, achievements and influences of other scholars, and their contributions to our own work. Not only does what we gain from their publications enable us to make intellectual progress, by “standing on the shoulders of giants” as Newton once famously observed [1], but the influence of these publications extends forward in time across the entire intellectual landscape, like gigantic shadows cast at sunset, whether or not those influenced by these publications have occasion to reference them in their own works.

A bibliographic citation is not only “a conceptual directional link from a citing entity to a cited entity, created by a human performative act of making a citation”, but it is additionally both enduring and retrospective. Enduring, because once made it persists for ever within the global corpus of scholarly literature, and retrospective because (with the exception of occasional contemporaneous citations) the cited publication predates the citing publication.

At the anterior margin of a crawling cell, cellular protrusive extension (for example of a pseudopodium) is achieved by the catalysed polymerization of new filaments of the cytoskeletal protein actin from attachment sites on an existing stationary actin filament network, pushing the cell margin forward [2]. The scholarly citation network (or citation graph, the two terms here being used interchangeably) is similarly dynamic and temporally directional, being extended forward as new works of scholarship are published. Extension of knowledge is achieved by the catalytic inspiration provided by existing academic publications, themselves temporally stationary within the expanding citation network, leading to the publication of new works of scholarship that cite these previous publications and thus extend the citation network further into the future. The citation graph is thus not just an acyclic directed graph, but an acyclic temporally directed graph. Indeed, it is this temporal aspect of the citation network that is one of its most important features.
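The two properties named above, acyclic and temporally directed, can be checked separately on a small example. The following is only a minimal sketch: the publication identifiers, years and citation links are hypothetical toy data, and the function names are illustrative rather than part of any OpenCitations software. Contemporaneous citations (same year) are allowed, which is why acyclicity is tested on its own.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical toy data: publication id -> year of publication.
years = {"A": 1990, "B": 1995, "C": 2001, "D": 2010}

# Hypothetical citations: citing publication -> set of cited publications.
cites = {"B": {"A"}, "C": {"A", "B"}, "D": {"B", "C"}}


def is_temporally_directed(cites, years):
    """True if no citation points to a publication newer than the citing one."""
    return all(
        years[cited] <= years[citing]
        for citing, cited_set in cites.items()
        for cited in cited_set
    )


def is_acyclic(cites):
    """True if the citation graph contains no directed cycle."""
    try:
        # TopologicalSorter expects node -> predecessors; the cited works act
        # as predecessors because they must exist before the citing work.
        tuple(TopologicalSorter(cites).static_order())
        return True
    except CycleError:
        return False


print(is_temporally_directed(cites, years), is_acyclic(cites))
```

A real citation graph would carry full publication dates rather than years, but the same two checks apply.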

To use another analogy, the human genealogical tree is inherently multidimensional and difficult to represent pictorially in its entirety, because each new birth brings together the family trees of the child’s two parents. However, unless the parents are seriously promiscuous, the resulting genealogical tree is not impossibly complex. In contrast, the scholarly citation network is much more highly interlinked, since each new publication cites not just two but many preceding (‘parent’) publications, which themselves may beget many other citations.

Visualization of the global scholarly citation graph, or portions of it, is thus inherently difficult, and the important temporal aspect of the graph is the one ignored by almost every method used for visualizing aspects of that graph. Existing methods may take the broad view, showing the links, and the strength of those links, between one scholarly domain and another, thus visualizing the ‘structure of science’. Alternatively, they may take a more detailed view of a small section of the graph, visualizing the proximity of individual publications to one another. Often a radial display is chosen for this, one that shows in closest proximity those papers directly referenced by the selected publication in the centre, then at a greater radius those papers referenced by the cited papers shown in the inner circle, and so on. Because of the graph’s complexity, such displays quickly lose intelligibility after two citation links.
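The rings of such a radial display correspond to a breadth-first walk outward from the selected publication. As a rough sketch, assuming a hypothetical toy citation map and an illustrative function name (neither is taken from any existing tool):

```python
# Hypothetical citations: citing publication -> set of cited publications.
cites = {"D": {"B", "C"}, "C": {"A", "B"}, "B": {"A"}}


def citation_rings(cites, centre, max_depth=2):
    """Group cited publications by their citation distance from `centre`.

    rings[0] holds the works cited directly by `centre`, rings[1] the works
    cited by those, and so on, each work appearing only in its nearest ring.
    """
    rings = []
    seen = {centre}
    frontier = [centre]
    for _ in range(max_depth):
        next_ring = set()
        for paper in frontier:
            for cited in cites.get(paper, ()):
                if cited not in seen:
                    seen.add(cited)
                    next_ring.add(cited)
        if not next_ring:
            break
        rings.append(next_ring)
        frontier = sorted(next_ring)
    return rings


print(citation_rings(cites, "D"))
```

The default depth of two mirrors the observation that such displays become hard to read beyond two citation links; note that the rings carry no information about publication date.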

Among a small number of visualization applications that do not ignore the temporal aspect of the graph is Citeology, a temporally based citation network visualization tool developed some years ago by Justin Matejka and colleagues at the design software company Autodesk [3]. Unfortunately, this innovative software prototype was not central to that company’s mission, development ceased, and the Citeology Java app is no longer available. However, in his last email to me, Justin Matejka kindly offered to help others re-create this application.

There is thus an urgent need for innovative new open-source visualization tools that will clearly and dynamically display portions of the global citation graph, for example the direct and indirect citation connections between any two publications or any two individuals, along the temporal axis of publication date. Developers within the open science community please step forward!
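As a starting point for such a tool, the direct and indirect citation connections between two publications can be enumerated as chains of citations and then ordered by publication date. This is only an illustrative sketch on hypothetical toy data; the function name and data layout are assumptions, and a real tool would need something far more scalable.

```python
# Hypothetical toy data: publication id -> year, and citing -> cited sets.
years = {"A": 1990, "B": 1995, "C": 2001, "D": 2010}
cites = {"D": {"B", "C"}, "C": {"A", "B"}, "B": {"A"}}


def citation_paths(cites, citing, cited, _path=None):
    """Return every chain of citations leading from `citing` back to `cited`."""
    path = (_path or []) + [citing]
    if citing == cited:
        return [path]
    paths = []
    for nxt in sorted(cites.get(citing, ())):
        if nxt not in path:  # guard against malformed (cyclic) input
            paths.extend(citation_paths(cites, nxt, cited, path))
    return paths


# Reverse each chain so that it reads forward along the temporal axis.
for chain in citation_paths(cites, "D", "A"):
    print(" -> ".join(f"{p} ({years[p]})" for p in reversed(chain)))
```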

References

[1] Isaac Newton, in a 1675 letter to Robert Hooke, wrote “If I have seen further it is by standing on the shoulders of Giants.” https://discover.hsp.org/Record/dc-9792/

[2] Bruce Alberts et al. (2014). Molecular Biology of the Cell. 6th Edition. Garland Science. Chapter 16, The Cytoskeleton.

[3] Justin Matejka, Tovi Grossman, George Fitzmaurice (2012). Citeology: Visualizing Paper Genealogy. ACM Extended Abstracts on Human Factors in Computing Systems. https://www.autodesk.com/research/publications/citeology https://d2f99xq7vri1nk.cloudfront.net/CiteologyVideo.mp4


OpenCitations’ compliance with the Principles of Open Scholarly Infrastructure

What should an open scholarly infrastructure look like? 

An answer to this tough question can be found in the original February 2015 blog post by Geoffrey Bilder, Jennifer Lin and Cameron Neylon

Bilder G., Lin J., Neylon C. (2015) Principles for Open Scholarly Infrastructure, http://dx.doi.org/10.6084/m9.figshare.1314859

and in the summary of the principles to be found as:  

Bilder G., Lin J., Neylon C. (2020) The Principles of Open Scholarly Infrastructure, https://doi.org/10.24343/C34W2H:

“Infrastructure at its best is invisible. We tend to only notice it when it fails. If successful, it is stable and sustainable. Above all, it is trusted and relied on by the broad community it serves. Trust must run strongly across each of the following areas: running the infrastructure (governance), funding it (sustainability), and preserving community ownership of it (insurance)”.

These areas fully define the Principles of Open Scholarly Infrastructure (POSI), which provide a set of guidelines by which open scholarly infrastructure organizations and initiatives that support the research community can be run and sustained.

As far as we are aware, Crossref was the first infrastructure to publish its compliance with POSI, detailed in Geoffrey Bilder’s December 2020 blog post

Crossref’s Board votes to adopt the Principles of Open Scholarly Infrastructure.

OpenCitations too espouses POSI and, in January 2021, we assessed the extent of our own compliance with POSI, the results of which are summarized below.

Governance

Coverage across the research enterprise: We gather citations from global scholarship
Stakeholder governed: Advisory board currently lacks executive power and is not elected
Non-discriminatory membership: Membership open to all those espousing open science
Transparent operations: Everything is open
Cannot lobby: OpenCitations lobbies to achieve open scholarly citations and bibliographic metadata; it does not engage in political or financial lobbying
Living will: Since all our data are open, others can recreate our service
Formal incentives to fulfill mission & wind-down: No formal plan for wind-down has yet been drawn up

Sustainability

Time-limited funds used only for time-limited activities: Grant income should be used solely for grant projects
Goal to generate surplus: Goal not yet realized; income so far too limited
Goal to create contingency fund to support operations for 12 months: Goal not yet realized; income so far too limited
Mission-consistent revenue generation: Membership fees and solicited donations
Revenue based on services, not data: All data and services freely given to community, and thus do not generate income

Insurance

Open source: All software under open source licenses
Open data: All data available under CC0 waiver
Available data: All data available via REST APIs, SPARQL endpoints, query interfaces and data dumps
Patent non-assertion: We will not patent anything; OpenCitations’ infrastructure is free to replicate

 
We at OpenCitations are proud of the results reached in the Insurance area, but realise that we still have some way to go in the other areas. Although the general situation is already satisfactory, we are working to strengthen our weak points.


Swiss Institutions pledge 89,250 Euros to OpenCitations

We want to express our gratitude to the 18 institutional members and customers of the Consortium of Swiss Academic Libraries which have now pledged 89,250 euros to support OpenCitations over the next three years. This generous donation is part of a total funding of 320,250 euros destined for the three services currently being promoted by SCOSS: DOAB and OAPEN, PKP, and OpenCitations.

The Consortium of Swiss Academic Libraries involves all cantonal universities, the ETH Domain, the Swiss National Library and other institutions from the fields of education and research as well as from the public sector, with the core task of licensing e-resources (electronic journals, databases, eBooks) for its members and customers.

As can be read in this post, Susanne Aerni, Head of Consortial Services, commented on the pledge: “This pledge exemplifies the broad Swiss commitment to vital infrastructure for Open Access and Open Science. All Swiss Universities, all institutions of the ETH-domain, some Universities of Applied Science, CERN, and the Swiss National Science Foundation support these three vital services through the Consortium of Swiss Academic Libraries.”

Thank you, Switzerland, for your support of OpenCitations!


Crossing a significant threshold: more than one billion citations now available in COCI!

“The competitive benefits of closing access to citation data diminish with each new citation released to the public domain, but the benefits of open data remain. Going forward, citation data is almost completely public domain”.

With these words, from the article “A tipping point for open citations data” (July 15, 2021), Ian Hutchins celebrated the crossing, in February 2021, of the threshold of one billion citations in public-domain databases.

Now, a new significant milestone has been reached. We are excited to announce that COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations, has just been extended with 334 million additional citations. Its most recent release, the COCI July 2021 release, now contains a total of 1.09 billion DOI-to-DOI citation links derived from open references within Crossref, which includes the references of articles deposited or opened in Crossref between November 2020 and January 2021.

These numbers make us proud, and confirm the essential value of the Initiative for Open Citations (I4OC). Since 2017, the mission of I4OC has been to persuade publishers to provide open citation data by means of the Crossref platform. I4OC’s untiring commitment has led the major academic publishers to a progressive change of heart regarding open citations, and the scholarly community to a deeper interest in this openness.

These factors contributed to the creation of COCI in 2018, the first open citation index created by OpenCitations, in which we applied the concept of citations as first-class data entities (Heibi I., Peroni S., Shotton D., 2019). Over the last three years, COCI has been extended in a series of releases, by harvesting citations mostly from Crossref data dumps, starting from an initial coverage of 300 million citations (First release).

A crucial event that preceded (and delayed!) this latest COCI release was Elsevier’s endorsement of the San Francisco Declaration on Research Assessment (DORA) in December 2020, thereby making “reference lists for all articles published in Elsevier journals openly available via Crossref so they can be available for reuse. This means other important initiatives like I4OC can draw on this metadata”. As described in our previous post, Elsevier’s welcome commitment led to the opening of many previously closed references from its numerous academic journals submitted to Crossref. Now, after an extended period of data ingestion and processing, all these newly opened Elsevier references are available at OpenCitations within COCI.

Elsevier’s involvement has both practical and symbolic value. Although publishing more than one billion citations is a thrilling achievement and, as Hutchins wrote, we are now at a tipping point with regard to open citations data, this milestone is not the last stop. Together with the other organizations and projects that participate in the Initiative for Open Citations, we will keep urging the remaining academic publishers to join our cause, and keep sharing our values with the whole academic community, so that all existing citation data become freely open and accessible. Recalling what Dario Taraborelli wrote in the conclusion of his article “The citation graph is one of humankind’s most important intellectual achievements”, “the world is waiting for the citation graph to become a public good”.


OpenCitations at LIBER Annual Conference 2021: ‘How Can Open Infrastructures Support the Role of Research Libraries?’

For the second year, OpenCitations has taken part in the LIBER annual conference.  LIBER (Ligue des Bibliothèques Européennes de Recherche – Association of European Research Libraries) is a network that gathers 440 research libraries, based in more than 40 countries all over the world, with the mission of supporting Europe’s research libraries by highlighting their value to policymakers, providing resources and training, and forming valuable partnerships. 

Since 1971, the LIBER Annual Conference has been a key event for the entire network, a keenly anticipated meeting for research library professionals whose mission is “to identify the most pressing needs for research libraries, and to share information and ideas for addressing those needs”. Due to the ongoing pandemic restrictions, the 50th LIBER meeting (23-25 June 2021) was held online, as was the 2020 meeting, with digital co-hosting by the University of Belgrade Library in Serbia. The online-showcase format, however, did not prevent the creation of a lively virtual meeting place, animated by the voices of 70 speakers. The main theme of the conference, “Libraries and Open Knowledge: from vision to implementation”, was explored in 12 parallel sessions.

Professor Silvio Peroni, Director of OpenCitations, participated in Session #5 ‘How Can Open Infrastructures Support the Role of Research Libraries?’ with a presentation dedicated to the benefits of Open Infrastructures for libraries, in dialogue with James MacGregor (interim Managing Director of the Public Knowledge Project), Joanna Ball (Head of Roskilde University Library), and Niels Stern (director of OAPEN and co-Director of DOAB).

The session, chaired by Maaike Napolitano (National Library of the Netherlands), opened with a presentation by Fidan Limani (Research assistant at ZBW – Leibniz Information Centre for Economics) about the integration of scholarly artifacts from the domain of economics using Knowledge Graphs (KG), and the creation of a network of entities describing objects of interest and connections, while keeping a library perspective. The use of citation links connecting datasets and citations, and the adoption of ontologies and data export in RDF, would facilitate a possible beneficial collaboration between ZBW and Open Infrastructures such as OpenCitations (whose data is itself in the form of a Knowledge Graph).

OpenCitations also shares some common features with the other Open Infrastructures described in the second presentation: financial support from the SCOSS project; a community-based approach; and promising value for libraries and the entire scholarly community.

OpenCitations is an independent not-for-profit infrastructure organization dedicated to open scholarship and the publication of open bibliographic and citation data by the use of Semantic Web (Linked Open Data) technologies. It is engaged in advocacy for open citations and open bibliographic metadata as a founding member of both the Initiative for Open Citations (I4OC) and the Initiative for Open Abstracts (I4OA). It provides data containing more than seven hundred million citations that the community can use for any purpose. Such data can be crucial in national and international research evaluation exercises, making such activities more transparent and reproducible than those that rely on proprietary services. Librarians can use OpenCitations citation data (e.g., via our REST API) to enhance or develop tools to support their authors, researchers, students and institutional administrators in different kinds of contexts, for instance by providing metrics to monitor research at their institution and by improving the discoverability of research products such as publications and data.
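To give a flavour of the kind of library tool mentioned above, the following sketch counts incoming citations per year for one DOI. It rests on several assumptions that should be checked against the current OpenCitations API documentation: the endpoint pattern in `API_BASE`, the JSON response format, and the `creation` field holding the citation's creation date. The function names and the sample records in the usage note are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed endpoint pattern for the COCI REST API; verify before relying on it.
API_BASE = "https://opencitations.net/index/coci/api/v1"


def citations_url(doi):
    """Build the (assumed) URL listing the citations received by `doi`."""
    return f"{API_BASE}/citations/{quote(doi, safe='/')}"


def fetch_citations(doi, timeout=30):
    """Download incoming citation records as a list of dicts (needs network)."""
    request = Request(citations_url(doi), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def citations_per_year(records):
    """Count records by the year at the start of their 'creation' date."""
    return Counter(r["creation"][:4] for r in records if r.get("creation"))
```

With hypothetical records such as `[{"creation": "2019-05"}, {"creation": "2019"}, {"creation": "2021-01-12"}]`, `citations_per_year` would report two citations for 2019 and one for 2021; in practice the records would come from `fetch_citations`.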

OAPEN is a not-for-profit foundation dedicated to increasing the discoverability of open access books and trust in them. It runs three open-source platforms enabling open access to books: the Directory of Open Access Books (DOAB), a freely available basic indexing service that is easily integrated into library catalogues; the OAPEN Library, a publication platform dedicated to hosting, preserving and distributing books; and the OAPEN OA Books Toolkit, a public information resource for authors that builds trust in open-access books.

PKP (Public Knowledge Project) is a software and library project consisting of three applications (Open Journal Systems, Open Preprint Systems and Open Monograph Press).

The dialogue during this LIBER session wasn’t a mere presentation of these projects and their technical properties: the speakers emphasized the importance of ensuring the participation and the engagement of the stakeholder community, pointed out the crucial value of the support received – not only financial – from Research Libraries, and discussed how such Open Infrastructures can be beneficial for libraries. 

How can libraries support Open Infrastructures? And what role do they play in a long-term solution? According to Joanna Ball, from a librarian perspective, it’s not only a who-benefits-whom problem, but it’s more about finding a “third way, about developing mutually beneficial partnerships, and going beyond the traditional way of approaching things so that we can really play to each other’s strengths.” 

This approach is fully aligned with OpenCitations’ intentions. As Silvio Peroni underlined, in most cases the active collaboration between Open Infrastructures and libraries is not only about financial support, but about cooperatively reaching a common goal. In particular, “if infrastructures like OpenCitations provide appropriate and easy-to-use interfaces and tools that allow librarians to contribute appropriate bibliographic metadata, and if librarians are willing to enter such metadata from their own records, libraries may become a significant reliable source of this kind of information”. The result of such a ‘crowd-sourced’ entry of bibliographic metadata by libraries would be an enrichment of the overall global open knowledge graph made available through citational links.

In the last presentation, dedicated to two services provided by OPERAS, Emilie Blotière (CNRS) and Tiziana Lombardo (Net7) reiterated the value of scholarly communication. COESO and GO TRIPLE, funded by the European Commission, aim to create a persistent dialogue in the Social Sciences and Humanities community by tackling fragmentation and becoming a meeting point among different communities.

What emerged from the session is the importance of communication, cooperation and networking between Open Infrastructures and Libraries, a message that perfectly matches the core values of LIBER: collaboration and inclusivity. The next LIBER annual conference is scheduled for June 2022 in Odense, hopefully recreating the enthusiastic in-person gatherings of previous meetings.

You can find the recording of the full session here: LIBER 2021 Session #5: How Can Open Infrastructures Support the Role of Research Libraries? 

You can find the slides of the session on Zenodo.


New research fellowship position to work on the EOSC

OpenAIRE-Nexus is an H2020 project funded by the European Commission which aims at bringing together, within the European Open Science Cloud (EOSC), fourteen new services focused on the development and promotion of Open Science. OpenCitations is directly involved in this project through the Department of Classical Philology and Italian Studies at the University of Bologna.

In the context of this OpenAIRE-Nexus project, our goal is to make all services offered by OpenCitations compatible with OpenAIRE, so as to guarantee semantic and technical interoperability with all the other Open Science services available in the EOSC. For this purpose, we now seek applicants for a new one-year research fellowship to be held from May 2021 (renewable for an additional year), for which the application closing deadline is 31 March 2021.

The goal of the Research Fellowship is to study the current limitations of the OpenCitations infrastructure, and possible improvements to introduce into it, in order to integrate it with OpenAIRE and the EOSC. The Research Fellow, who will work in collaboration with Silvio Peroni, Director of OpenCitations, is expected to address issues relating to the provision of Web services, the management of distributed and heterogeneous databases, and data ingestion and conversion processes.

The Call for Applications (in Italian and in English) is available online on the website of the University of Bologna. It also includes an attachment with a description of OpenCitations and of the activities related to the position. The position has a net salary (exempt from income tax, after deduction of social security contributions) in excess of 20K euros per year. As indicated in the Call for Applications, candidates need to apply exclusively through the University of Bologna web portal.

For further information, please contact Silvio Peroni (email: silvio dot peroni at unibo dot it).


Seeking applicants for three-year research fellowship position

A year ago, at the end of 2019, OpenCitations was selected by the Global Sustainability Coalition for Open Science Services (SCOSS, https://scoss.org) for its second round of crowdfunding support, since SCOSS believes that OpenCitations aligns well with Open Science goals and is an innovative service. The goal of such support is to enable OpenCitations’ operations over the next three years as it transitions into a global scholarly infrastructure organization with a secure financial footing. As part of this work, we now plan to strengthen the current technical and computational infrastructure (server, parallel processing, backup, etc.) used by OpenCitations, which is currently hosted at the University of Bologna.

For this purpose, we now seek applicants for a new three-year research fellowship to be held from March 2021, for which the application closing deadline is 7 February 2021. The principal goals of this research fellowship are:

  1. to study the current limitations of the OpenCitations infrastructure and introduce improvements, and
  2. to design and implement new software control tools that will enable us to manage the infrastructure more efficiently.

Additionally, the selected research fellow will be expected to address issues relating to the provision of Web services, the management of distributed and heterogeneous databases, OpenCitations’ data conversion and ingestion processes involving parallel computing, and the overall security of the infrastructure. Particular attention will need to be given to data preservation and to the long-term maintenance and updating of the infrastructure.

The Call for Applications (in Italian and in English) is available online on the website of the University of Bologna. It also includes an attachment with a description of OpenCitations and of the activities related to the position. The position has a net salary (exempt from income tax, after deduction of social security contributions) in excess of 23K euros per year. As indicated in the Call for Applications, candidates need to apply exclusively through the University of Bologna web portal.
