Like a kid with a new train set! Exploring citation networks

As part of the Open Citations Project, Alex Dutton recently completed a graphing plug-in for the Open Citations web site.  It lets users generate different kinds of citation network graphs by querying the Open Citations Corpus for a particular article, and display the network of papers citing that article (input citations), the papers cited by that article (output citations), or both.  These can be displayed on screen in the web browser in a variety of layouts, or conveniently downloaded in a number of useful formats.

THIS IS SOOOOO COOL!

Having survived the preparation and posting of the JISC Open Citations Project Final Blog Post last night, minutes before the midnight deadline, I’m now like a kid with a new train set, playing with this display tool and exploring the citation networks present in the Open Citations Corpus, something I have dreamed of doing for two years now.

Remember first that in the Open Citations Corpus we have some 200,000 citing articles – those within the Open Access Subset (OASS) of PubMed Central – citing ~3.4 million papers out there in the big wide world, which are only recipients of citations.  The consequence of this limited corpus is that the majority of citation chains are of length one – from a paper in the OASS to a paper outside the OASS.  Not very interesting.  Add to this the fact that PubMed Central is new – over 90% of the papers in the Open Access Subset were published in the 21st century, and 77% of them in the last 5 years.  Thus there are only a very few citations from articles within the OASS to other articles within the OASS.  That means that the maximum length of our citation chains is, at present, limited to three or four links on the input side (a selected article may be cited by a chain of three or four other OASS articles) and three or four links on the output side (the selected article may cite other OASS articles in addition to non-OASS articles, and these in turn will cite others).  However, in most cases, the citation chains are much shorter.
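To make the point about chain lengths concrete, here is a minimal Python sketch.  It is not how the Open Citations web site computes anything; the paper identifiers and the tiny `CITES` table are invented for illustration, and the function assumes the citation graph has no cycles.  Papers that never appear as a key play the part of non-OASS papers, which only receive citations and therefore end every chain.

```python
from functools import lru_cache

# Hypothetical toy corpus (all identifiers invented for illustration).
# Keys are citing articles inside the OASS; values are the papers they cite.
# The "ext-*" papers never appear as keys, so they only receive citations.
CITES = {
    "oass-A": ["oass-B", "ext-1"],
    "oass-B": ["oass-C", "ext-2"],
    "oass-C": ["ext-3"],
    "oass-D": ["ext-1", "ext-4"],
}


@lru_cache(maxsize=None)
def longest_output_chain(paper: str) -> int:
    """Return the number of links in the longest citation chain starting at `paper`.

    Assumes the citation graph is acyclic.
    """
    cited = CITES.get(paper, [])
    if not cited:
        # A paper with no recorded outgoing citations ends the chain.
        return 0
    return 1 + max(longest_output_chain(p) for p in cited)
```

In this toy corpus `oass-D` cites only outside papers, so its longest output chain is a single link, which is the common case described above.  Only `oass-A`, which cites another OASS article that itself cites a third, reaches a chain of three links.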

Figure 1. A simple citation network with an input citation chain length of 2 links within the Open Citations Corpus, and an output chain length of 1 link – the selected article (red) receives citations from other OASS articles (green), and itself cites only articles outside the OASS (white).

Let’s start with something familiar – the article in PLoS Neglected Tropical Diseases by Reis et al. (2008) [1] that I used for our semantic publishing exemplar [2].  Its inward citation graph, limited to a citation chain length of two links, created by and copied from the Open Citations Project web site, looks like this:

Figure 2. The input citation network of Reis et al. (2008), limited to a citation chain length of 2 links.

I, of course, cited the Reis et al. (2008) paper [1] in our 2009 Adventures paper [2] that we based upon it, and also in my first paper on CiTO in 2010 [3], which also cites the Adventures paper.  Reis et al. (2008) is also cited by Fink et al. (2010) [4], who also cited our Adventures paper, and by Bourhy et al. (2010) [5], another PLoS Neglected Tropical Diseases paper in the OASS, which in turn is cited by Galloway and Levett (2010) [6], while our Adventures paper is also cited by Gerner and Nenadic (2010) [7].
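The citations just described can be written down as a small edge list and walked backwards from Reis et al. (2008).  The following Python sketch does that with a breadth-first search limited to a given number of links.  The short labels such as `Reis2008` are my own shorthand for references [1] to [7], and the code is only an illustration of the idea, not the implementation used by the Open Citations web site.

```python
from collections import deque

# (citing paper, cited paper) pairs taken from the paragraph above.
# The labels are shorthand invented for this sketch.
CITATIONS = [
    ("Shotton2009", "Reis2008"),     # Adventures paper cites Reis et al.
    ("Shotton2010", "Reis2008"),     # CiTO paper cites Reis et al.
    ("Shotton2010", "Shotton2009"),  # CiTO paper cites the Adventures paper
    ("Fink2010", "Reis2008"),
    ("Fink2010", "Shotton2009"),
    ("Bourhy2010", "Reis2008"),
    ("Galloway2010", "Bourhy2010"),
    ("Gerner2010", "Shotton2009"),
]


def input_network(target, max_links):
    """Map each paper that reaches `target` within `max_links` citation
    links to the length of its shortest chain of citations to `target`."""
    # Reverse index: cited paper -> list of papers citing it.
    cited_by = {}
    for citing, cited in CITATIONS:
        cited_by.setdefault(cited, []).append(citing)

    distance = {target: 0}
    queue = deque([target])
    while queue:
        paper = queue.popleft()
        if distance[paper] == max_links:
            continue  # do not follow chains beyond the limit
        for citing in cited_by.get(paper, []):
            if citing not in distance:
                distance[citing] = distance[paper] + 1
                queue.append(citing)

    del distance[target]  # report only the citing papers
    return distance
```

Calling `input_network("Reis2008", 2)` places the four direct citers at one link, and Gerner and Nenadic (2010) and Galloway and Levett (2010) at two links.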

The following image shows this graph as it was originally created within the Open Citations web page:

Figure 3. The same input citation network of Reis et al. (2008), as shown in the Open Citations web site.

Because Reis et al. (2008) has a reference list containing 52 references, and several of its cited papers are also members of the Open Access Subset, its output citation graph is much more complex, even when limited to a citation chain length of 2.  The following figure shows the whole output citation network to a citation chain length of 2, at a scale too small for the labels to be legible.

Figure 4. The outward citation network of Reis et al. (2008), limited to a citation chain length of 2 links.

The next figure shows a close-up of part of the previous diagram – the output citation network of Reis et al. (2008), again showing the Reis et al. (2008) paper in red, and one of the key papers it cites, Maciel et al. (2008) [8], a slightly earlier paper from the same research group, forming a second key node in the top right of the diagram.

Figure 5. A close-up of a central portion of the outward citation network of Reis et al. (2008), limited to a citation chain length of 2 links.

Clearly, there is lots of information that can be extracted from these graphs, particularly when we display them in a tool like GraphViz that permits interactions with the data.  While the Open Citations web site simply displays such citation graphs using one of several layout algorithms selected by the user, the raw data can also be downloaded in a variety of formats including GraphViz, GraphML and SVG, the resulting network images can be downloaded as PNG, JPEG and PDF files, and the underlying RDF metadata can be downloaded as RDF/XML, N-Triples, Notation3 and Turtle.
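For readers who have not met the GraphViz format before, here is a small Python sketch that writes a citation network as GraphViz DOT text, using the colour scheme of Figure 1 (selected article red, other OASS articles green, papers outside the OASS white).  The function `to_dot` and its arguments are hypothetical names of my own, and the output is not claimed to match what the Open Citations web site actually exports.

```python
def to_dot(selected, oass, edges):
    """Serialise a citation network as GraphViz DOT text.

    selected: identifier of the selected article (drawn red)
    oass:     set of identifiers of articles inside the OASS (drawn green)
    edges:    iterable of (citing, cited) pairs
    """
    edges = list(edges)
    papers = sorted({p for edge in edges for p in edge} | {selected})

    lines = ["digraph citations {"]
    for paper in papers:
        if paper == selected:
            colour = "red"
        elif paper in oass:
            colour = "green"
        else:
            colour = "white"  # only a recipient of citations
        lines.append(f'  "{paper}" [style=filled, fillcolor={colour}];')
    for citing, cited in edges:
        lines.append(f'  "{citing}" -> "{cited}";')
    lines.append("}")
    return "\n".join(lines)


# Illustrative identifiers only: Q cites the selected paper P, which cites X.
print(to_dot("P", {"P", "Q"}, [("Q", "P"), ("P", "X")]))
```

The printed text can be saved to a file and opened in any tool that reads DOT, where the layout algorithm can then be changed at will.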

Having used our new Open Citations web site and its network display interface for a short while, I am already aware of many shortcomings and limitations that we will attempt to improve upon in the next few days.  However, we would very much like to hear from you, as a user of the Open Citations web site: what you like about what we have done, what you find to be shortcomings of the functionality, and what new features you would like to see implemented.  We will record these as user stories to input into our next round of development.  Feedback can either be recorded as comments on this blog post, or e-mailed with the subject line “Open Citations web site” either to me <david.shotton@zoo.ox.ac.uk> or to Alex Dutton <Alexander.dutton@zoo.ox.ac.uk>, who is the person who deserves all the credit for the present system.  We look forward to hearing from you.

[1] Reis RB, Ribeiro GS, Felzemburgh RDM, Santana FS, Mohr S, Melendez SXTO, Queiroz A, Santos AC, Ravines RR, Tassinari WS, Carvalho MS, Reis MG, Ko AI (2008). Impact of environment and social gradient on Leptospira infection in urban slums. PLoS Negl Trop Dis 2(4): e228. doi:10.1371/journal.pntd.0000228.

[2] Shotton D, Portwin K, Klyne G, Miles A (2009). Adventures in semantic publishing: exemplar semantic enhancements of a research article. PLoS Comput Biol 5: e1000361. doi:10.1371/journal.pcbi.1000361.

[3] Shotton D (2010). CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics 1 (Suppl. 1): S6. doi:10.1186/2041-1480-1-S1-S6.

[4] Fink JL, Fernicola P, Chandran R, Parastatidis S, Wade A, Naim O, Quinn GB, Bourne PE (2010). Word add-in for ontology recognition: semantic enrichment of scientific literature. BMC Bioinformatics 11: 103. doi:10.1186/1471-2105-11-103.

[5] Bourhy P, Collet L, Clément S, Huerre M, Ave P, Giry C, Pettinelli F, Picardeau M (2010). Isolation and Characterization of New Leptospira Genotypes from Patients in Mayotte (Indian Ocean). PLoS Negl Trop Dis 4(6): e724. doi:10.1371/journal.pntd.0000724.

[6] Galloway RL, Levett PN (2010). Application and Validation of PFGE for Serovar Identification of Leptospira Clinical Isolates. PLoS Negl Trop Dis 4(9): e824. doi:10.1371/journal.pntd.0000824.

[7] Gerner M, Nenadic G (2010). LINNAEUS: A species name identification system for biomedical literature. BMC Bioinformatics 11: 85. doi:10.1186/1471-2105-11-85.

[8] Maciel EAP, Carvalho ALF, Nascimento SF, Matos RB, Gouveia EL, Reis MG, Ko AI (2008). Household transmission of Leptospira infection in urban slum communities. PLoS Negl Trop Dis 2: e154. doi:10.1371/journal.pntd.0000154.
