Last night I watched the Netflix documentary The Social Dilemma (https://www.netflix.com/title/81254224), in which former employees of the big Silicon Valley social media companies expose the serious and sometimes tragic or even fatal consequences that social media may have on individual lives. These social media services are run by commercial companies under pressure from shareholders to make ever-increasing profits. In this situation, the ultimate consumers of these services become not the individuals using them but the advertisers, while the users of these services (ourselves) become the commodities whose user profiles and personal preferences are sold by the social media companies to the advertisers for use in targeting adverts.
The Social Dilemma is a compelling documentary because it is told by those who know, having helped build and run the systems themselves. It is particularly relevant to those who have pre-teen and teenage children, whose lives and personal interactions are increasingly being shaped, and to a large extent controlled, by social media, particularly during the current Covid-19 lockdowns. As recent events in the United States have highlighted, social media also pose fundamental issues around the definition of “facts” and “beliefs”, moving the debate from epistemology to politics and affecting the future of our societies.
From social media to academic analytics
Jason Priem’s self-portrait as a phrenology illustration.
From https://www.flickr.com/photos/26158205@N04/4307548673. (CC BY-SA 2.0)
Jason Priem is co-founder of ImpactStory, Depsy, Unpaywall and other open analytic and open science infrastructures and services (https://our-research.org/projects) that deserve ongoing support from the academic community.
Academic analytics is the application of statistical, predictive modelling, data mining and artificial intelligence (AI) techniques to analyse, evaluate and summarize various types of organizational, educational and bibliographic data derived from higher education and research institutions, in order to provide numerical results that can be used to guide strategic planning and decision-making practices in these contexts. It is increasingly used for student and faculty assessment, for deciding the allocation of funding, and for evaluating the standing and productivity both of individual academic departments and of entire universities.
Examples of such analyses include the degree of cross-institutional and international authorship of scholarly publications, and their citation counts excluding self-citations, used as indicators of the importance of research project outputs; the correlation of student grades with their interactions with university services such as libraries and virtual learning environments, used to improve the learning performance of individual students; and the drop-out rates and degree distributions of different universities, employed to evaluate the quality of teaching. Those using such analyses include not only university administrators and individual academics, but also, in the case of learning analytics, increasingly the students themselves and their parents.
The relevance of The Social Dilemma to academic analytics is that academic analytics services, like social media, are increasingly controlled by commercial companies under similar pressures to turn a profit. Here it is the universities and their academic data that become the consumed commodities, while the commercial suppliers of academic analytical services are the financial beneficiaries of these data.
There are, of course, differences between these two situations. While social media companies and academic analytics companies both have shareholders that expect profits and users to whom they provide services, the social media companies have advertisers that bring in revenue, while academic analytics companies get most of their revenue directly from the academic community itself. There is thus a relatively close connection between those who provide the raw data and those who pay for the analytical services built over these data. Because the academic community is both the data provider and the one who pays the piper, the social dilemma around research analytics should be easier to resolve than the one surrounding social media.
A further important difference is the following: while participation in social media is strictly voluntary, most members of the academic community are evaluated through data analytics and AI without their express consent. Information on faculty members is being collected and used with little or no recourse for the individuals affected, since there are few, if any, rights to disclosure, rights to opt out of data analytics and AI-powered reviews and decisions, rights to review the data for errors, rights to correct errors, or rights to appeal decisions based on such analytics. Academic positions carry with them the expectation of academic freedom, the principles of which are hard to reconcile with the intense individual scrutiny built into the deployment of academic analytics and AI.
The dangers of commercial analytic platforms in academia
In May 2020, Amy Brand and Claudio Aspesi published their seminal article In pursuit of open science, open access is not enough (Science 368: 574-577. https://doi.org/10.1126/science.aba3763), in which they argued cogently about the dangers of commercial dominance of academic data analytics and knowledge infrastructures, and the need for open alternatives. Details of this growing commercial dominance of academic analytics, among other platforms and services, are given in the excellent analysis by Penny C. S. Andrews in her chapter The Platformization of Open (https://doi.org/10.7551/mitpress/11885.003.0027), in the book Reassembling Scholarly Communications: Histories, Infrastructures, and Global Politics of Open Access edited by Martin Paul Eve and Jonathan Gray (The MIT Press, 2020: https://doi.org/10.7551/mitpress/11885.001.0001).
Take, for example, the major university rankings, such as the Times Higher Education ranking. These rankings are extremely powerful. They rely on proprietary data, much of which, ironically, is supplied free of charge to the ranking producers by the universities themselves, and these data are then used to define how the performance of universities should be assessed. Times Higher Education, for instance, presents its World University Rankings as “the definitive list of the top universities globally” (https://www.timeshighereducation.com/world-university-rankings). The performance criteria used by the Times Higher Education ranking, and a few other major university rankings, now play an important role in the decision-making processes of universities all over the world. However, because the underlying data are proprietary, it is hard to challenge the rankings or to use the data to provide alternative perspectives on university performance. It has become increasingly difficult for universities to develop strategic priorities that do not align with the performance criteria used by the major university rankings. For example, at one major European university, discussions about the development of an open science strategy explicitly take into account the possible negative effects of open science practices on its position in the major university rankings.
Applying the message of The Social Dilemma to the realm of scholarly information shows how the rise of commercially controlled academic analytics might fundamentally threaten academic freedom and access to truth itself. As Penny Andrews points out, several of the big players in academic publishing and scholarly communication are now building suites of products based around scholarly data and analytics, whose platforms rarely have open and transparent governance. They are encouraging universities to subscribe to such suites, sometimes in deals that, in the name of open science, bundle access to the institution’s scholarly data and the provision of analytics based upon them with open access publication of that institution’s scholarly outputs, as in the Dutch universities’ recent deal with Elsevier (https://tinyurl.com/y5v7ua7u). By gaining proprietary control of such data, and by providing the default means of information transfer and workflows between a university’s administrative current research information systems (CRIS), its academic libraries and its individual researchers, such commercial companies lock universities and national consortia into non-interoperable situations. Their academic data, whether relating to their own standing, to the sources and distribution of their external research funding, or to the publication records and relative academic merits of their faculty members, are then no longer fully under their own control.
The issues posed by the commercial deployment of data analytics are clearly compounded when these services are performed by companies which conduct other business with the academic community. A researcher who is faced with the question of where to publish her next article can be forgiven for deciding that, at the margin, it cannot hurt to submit it to a journal owned by the company tasked with assessing her research performance. There is a massive conflict of interest when companies that derive significant parts of their profits from publishing research also assess it and offer guidance on what projects should be funded next.
The urgent need for open community-governed infrastructures
For the reasons discussed above, the present situation in academia is dire. The academic community should take control of the data analytics infrastructures it uses, which need to be kept open, with transparent governance, to ensure the healthy functioning of the academic community. While the existing scholarly publishing infrastructure is well-established and hard to change quickly, the use of data analytics and AI in academia is still nascent and in flux. Hence it should be relatively easy to avoid ceding complete control of these activities to commercial vendors, who, of course, are merely doing what they exist to do, namely to maximize profits for their owners and shareholders.
Resolving this situation is within the grasp of the academic community, and is its clear responsibility, although this will not be without difficulties. It may be much easier for a university administrator to authorize payment for a subscription to academic analytical services from a commercial supplier “that knows what it is doing” than it is to collaborate with colleagues from other academic institutions – often seen as competitors – to develop or fund alternative services that are independent, open and transparently managed, with all the implications that has in terms of the creation of salaried posts, recruitment or retraining of staff, premises, administration, etc. However, now is the time to act, even during the current pandemic-induced economic recession, before commercial lock-in becomes a reality. Given the huge sums that universities already spend on subscription services of various types, the primary problem is clearly not the redeployment of existing financial resources but a more fundamental, philosophical one: whether or not academia wishes to be in control of its own data, or beholden to commercial interests. The development of community-controlled platforms providing open academic analytical services should now be made a priority, and appropriate sustained financial support for these platforms should be provided by the academic community, including governmental and charitable funders of research.
This week’s online OPERA conference “The Future of Open Research Analytics” (18-19 November 2020; https://deffopera.dk/opera-conference-november-2020/), hosted by the Danish OPERA Project (https://deffopera.dk/), provides a timely forum in which to discuss these issues.
I would like to acknowledge and thank Ludo Waltman and Claudio Aspesi for reviewing drafts of this blog post, and for their important and insightful suggestions for its improvement and expansion, which I have incorporated with their permission.