The human costs of the research-assessment culture
The term ‘REF-able’ is now in common usage in UK universities. “Everyone’s constantly thinking of research in terms of ‘REF-able’ outputs, in terms of ‘REF-able’ impact,” says Richard Watermeyer, a sociologist at the University of Bristol, UK. He is referring to the UK Research Excellence Framework (REF), which is meant to happen every seven years and is one of the most intensive systems of academic evaluation in any country. “Its influence is ubiquitous — you can’t escape it,” says Watermeyer. But he and other scholars around the world are concerned about the effects of an extreme audit culture in higher education, one in which researchers’ productivity is continually measured and, in the case of the REF, directly tied to research funding for institutions. Critics say that such systems are having a detrimental effect on staff and, in some cases, are damaging researchers’ mental health and departmental collegiality.
Unlike other research-benchmarking systems, the REF directly affects the distribution of around £2 billion (US$2.6 billion) annually, creating high stakes for institutions. UK universities receive a significant proportion of their government funding in this way (in addition to the research grants awarded to individual academics).
Research assessment toolkit
Since its inception, the REF methodology has been through several iterations. The rules about which individuals’ work must be highlighted have changed, but there has always been a focus on peer-review panels to assess outputs. Since 2014, a team in each university department has been tasked with selecting a dossier of research outputs and case studies that must demonstrate societal impact. These submissions can receive anything from a four-star rating (for the most important, world-leading research) to just one star (the least significant work, of only national interest). Most departments aim to include three- or four-star submissions, often described as ‘REF-able’.
But the process is time-consuming and does not come cheap. The most recent REF, in 2021, was estimated to have cost £471 million. Tanita Casci, director of the Research Strategy & Policy Unit at the University of Oxford, UK, acknowledges that it’s resource-intensive, but says that it’s still a very efficient way of distributing funds, compared with the cost of allocating money through individual grant proposals. “I don’t think the alternative is better,” she concludes. The next exercise has been pushed back a year, until 2029, with planned changes to include a larger emphasis on assessment of institutional research culture.
Many UK academics see the REF as adding to an already highly competitive and stressful environment. A 2021 survey of more than 3,000 researchers (see go.nature.com/47umnjd) found that they generally felt that the burdens of the REF outweighed the benefits. They also thought that it had decreased academics’ ability to follow their own intellectual interests and disincentivized the pursuit of riskier, more-speculative work with unpredictable outcomes.
Some other countries have joined the assessment train — with the notable exception of the United States, where the federal government does not typically award universities general-purpose research funding. But no nation has chosen to copy the REF exactly. Some, such as the Netherlands, have instead developed a model that challenges departments to set their own strategic goals and provide evidence that they have achieved them.
Whatever the system, few assessments loom as large in the academic consciousness as the REF. “You will encounter some institutions where, if you mention the REF, there’s a sort of groan and people talk about how stressed it’s making them,” says Petra Boynton, a research consultant and former health-care researcher at University College London.
Strain on team spirit
Staff collating a department’s REF submission, selecting the research outputs and case studies to illustrate impact, can find themselves in an uncomfortable position, says Watermeyer. He was involved in his own department’s 2014 submission and has published a study of the REF’s emotional toll [1]. It’s a job that most academics take on “with trepidation”, he says. It can change how they interact with colleagues and how colleagues view and interact with them.
“You’re trying to make robust, dispassionate, critical determinations of the quality of research. Yet at the back of your mind, you are inescapably aware of the implications of the judgements that you’re making in terms of people’s research identities, their careers,” says Watermeyer. In his experience, people can get quite defensive. That scrutiny of close colleagues’ work “can be really disruptive and damaging to relationships”.
Watermeyer often found himself not only adjudicating on work but also acting as a counsellor. “You have to attend to the emotional labour that’s involved; you’re responsible for people’s welfare and well-being,” and no training is provided, he says. A colleague might think that their work has met expectations, only to find that assessors disagree. “I’ve been in situations where there are tears,” Watermeyer recalls. “People break down.”
For university support staff, the REF also looms large. Sometimes, more staff must be hired near the submission deadline to cope with the workload. “It is an unbelievable pressure cooker,” particularly at small institutions, says Julie Bayley, former director of research-impact development at the University of Lincoln, UK. Bayley was responsible for overseeing 50 case studies to demonstrate the impact of Lincoln’s research, and describes this as akin to preparing evidence for a legal case. “You are having to prove, to a good level of scrutiny, that this claim is true,” Bayley says. This usually involves collecting testimonial letters from organizations or individuals who can vouch for the research impact, something she sometimes did on behalf of researchers who feared straining the external relationships they had developed.
Boynton says there can be an upside. “There’s something really exciting about putting together [a case study] that shows you did something amazing,” she says. But she also acknowledges that those whose research is not put forward can feel as if their work doesn’t matter or is not respected, and that can be demoralizing.
The clamour about achieving four stars can skew attitudes about research achievements. Bayley recounts a senior academic tearfully showing her an e-mail from his supervisor that read, “It’s all well and good that you’ve changed national UK policy, but unless you change European policy, it doesn’t count.” She says her own previous research on teenage pregnancy met with similar responses because it involved meeting real needs at the grass-roots level, rather than focusing on national policy. “That’s the bit I find most heartbreaking. Four-star is glory for the university, but four-star is not impact for society,” says Bayley.
The picking and choosing between individual researchers has implications for departments. “That places some people on the ‘star player competition winner’ side and, particularly where resources are limited, that means those people get more support” from their departments, explains Bayley. She has witnessed others being asked to pick up the teaching workload of researchers who are selected to produce impact case studies for a REF submission. Boynton agrees: “It’s not a collegiate, collective thing — it’s divisive.”
Hidden contributions
Research assessment can also affect work that universities often consider ‘non-REF-able’. Simon Hettrick, a research software engineer at the University of Southampton, UK, was in this position in 2021. He collaborates with researchers to produce crucial software for their work. But, he says, universities find it hard to look beyond academic papers as the metric for success even though there are 21 categories of research output that can be considered, including software, patents, conference proceedings and digital and visual media.
In the 2021 REF, publications made up about 98.5% of submissions. Hettrick says that although other submissions are encouraged, universities tend not to select the alternatives, presumably out of habit or for fear they might not be judged as favourably.
The result is that those in roles similar to Hettrick’s feel demotivated. “You’re working really hard, without the recognition for that input you’re making,” he says. To counter this, Hettrick and others launched an initiative called The Hidden REF that ran a 2021 competition to spotlight important work unrecognized by the REF, garnering 120 submissions from more than 60 universities. The competition is being run again this year.
In April, Hettrick and his colleagues wrote a manifesto asking universities to ensure that at least 5% of their submissions for the 2029 REF are ‘non-traditional outputs’. “That has been met with some consternation,” he says.
Regarding career advancement, REF submissions should not feed into someone’s prospects, according to Casci, who says that universities make strong efforts to separate REF assessments from decisions about individuals’ career progression. But “it’s a grey area” in Watermeyer’s experience; “it might not be reflected within formal promotional criteria, but I think it’s the accepted unspoken reality”. He thinks that academic researchers lacking ‘REF-able’ three- or four-star outputs are unlikely to be hired by any “serious research institution” — severely limiting their career prospects and mobility.
Watermeyer says the consequences for these individuals will vary. Some institutions try to boost the ratings of early-career academics by putting them on capacity-building programmes, including buddying schemes to foster collaborations with more ‘REF-able’ colleagues. But, for more senior staff, the downside could be a performance review. “People might be ‘encouraged’ to reconsider their research role, if they find themselves unable to satisfy the three-star criteria,” he says.
There’s a similar imperative for a researcher’s work to be used as an impact case study. “If your work is not selected for that competition, you lose the currency for your own progression,” says Bayley.
The REF also exacerbates inequalities that already exist in research, says Emily Yarrow, an organizational-behaviour researcher at Newcastle University Business School, UK. “There are still gendered impacts and gendered effects of the REF, and still a disproportionate negative impact on those who take time out of their careers, for example, for caring responsibilities, maternity leave.” A 2014 analysis she co-authored of REF impact case studies in the fields of business and management showed that women were under-represented: just 25% of studies with an identifiable lead author were led by women [2]. Boynton also points out that there are clear inequalities in the resources available to institutions to prepare for the REF, causing many researchers to feel that the system is unfair.
Although not all the problems researchers face can be attributed to the REF, it certainly contributes to what some have called an epidemic of poor mental health among UK higher-education staff. A 2019 report (see go.nature.com/3xsb78x) highlighted the REF as causing administrative overload for some and evoking a heightened, ever-present fear of ‘failure’ for others.
UK research councils have acknowledged the criticisms and have promised changes to the 2029 REF. Steven Hill, chair of the 2021 REF Steering Group at Research England in Bristol, UK, which manages the REF exercise, says these changes will “rebalance the exercise’s definition of research excellence, to focus more on the environment needed for all talented people to thrive”. Hill also says they will implement changes to break “the link between individuals and submissions” because there will no longer be a minimum or maximum number of submissions for each researcher. The steering group aims to provide more support in terms of how REF guidance is applied by institutions, to dispel misconceptions about requirements. “Some institutions frame their performance criteria in REF terms and place greater requirements on staff than are actually required by REF,” Hill says.
Other ways forward
Similar to the REF, the China Discipline Evaluation (CDE) occurs every four to five years. Yiran Zhou, a higher-education researcher at the University of Cambridge, UK, has studied attitudes to the CDE [3] and says there are pressures in China to produce the equivalent of ‘REF-able’ research and similar concerns about the impact on academics. China relies much more on conventional quantitative publication metrics, but researchers Zhou interviewed criticized the time wasted in producing CDE impact case studies. Those tasked with organizing this often had to bargain with colleagues to collect the evidence they needed. “Then, they owe personal favours to them, like teaching for one or two hours,” says Zhou.
Increased competition has become a concern among Chinese universities, and Zhou says the government has decided not to publicize the results of the most recent CDE, only informing the individual universities. And, Zhou says, some of those she spoke to favoured dropping the assessment altogether.
In 2022, Australia did just that. Ahead of the country’s 2023 Excellence in Research for Australia (ERA) assessment, the government announced that it would stop the time-consuming process and start a transition to examine other “modern data-driven approaches, informed by expert review”. In October 2023, the Australian Research Council revealed a blueprint for a new assessment system and was investigating methods for smarter harvesting of evaluation data. It also noted that any data used would be “curated”, possibly with the help of artificial intelligence.
Some European countries are moving away from the type of competitive process exemplified by the REF. “For the Netherlands, we hope to move from evaluation to development” of careers and departmental strategies, says Kim Huijpen, programme manager for Recognition and Reward for the Universities of the Netherlands, based in The Hague, and a former chair of the working group of the Strategy Evaluation Protocol (SEP), the research evaluation process for Dutch universities. In the SEP, institutions organize subject-based research-unit evaluations every six years, but the outcome is not linked to government funding.
The SEP is a benchmarking process. Each research group selects indicators and other types of evidence related to its strategy and these, along with a site visit, provide the basis for review by a committee of peers and stakeholders. The protocol for 2021–27 has removed the previous system of grading. “We wanted to get away from this kind of ranking exercise,” explains Huijpen. “There’s a lot of freedom to deepen the conversation on quality, the societal relevance and the impact of the work — and it’s not very strict in how you should do this.”
The Research Council of Norway also runs subject-based assessments every decade, including institutional-level metrics and case studies, to broadly survey a field. “From what I hear from colleagues, the Norwegian assessment is much milder than the REF. Although it’s similar in what is looked at, it doesn’t feel the same,” says Alexander Refsum Jensenius, a music researcher at the University of Oslo. That’s probably because there is no direct link between the assessment and funding.
Refsum Jensenius has been involved in the Norwegian Career Assessment Matrix, a toolbox developed in 2021 by Universities Norway, the cooperative body of 32 accredited universities. It isn’t used to assess departments, but it demonstrates a fresh, broader approach.
What differentiates it from many other assessments is that in addition to providing evidence, there is scope for a researcher to outline the motivations for their research directions and make their own value judgements on achievements. “You cannot only have endless lists of whatever you have been doing, but you also need to reflect on it and perhaps suggest that some of these things have more value to you,” says Refsum Jensenius. For example, researchers might add context to their publication list by highlighting that opportunities to publish their work are limited by its interdisciplinary nature. There is also an element of continuing professional development to identify a researcher’s skills that need strengthening. Refsum Jensenius says this approach has been welcomed in the Norwegian system. “The toolbox is starting to be adopted by many institutions, including the University of Oslo, for hiring and promoting people.”
For many UK researchers, this more nurturing, reflective method of assessment might feel a million miles away from the REF, but that’s not to say that the REF process does not address ways to improve an institution’s research environment. Currently, one of the three pillars of assessment involves ‘people, culture and environment’, which includes open science, research integrity, career development and equity, diversity and inclusion (EDI) concerns. Since 2022, there have been discussions on how to better measure and incentivize good practice in these areas for the next REF.
Bayley thinks the REF can already take some credit for an increased emphasis on EDI issues at UK universities. “I will not pretend for a second it’s sorted, but EDI is now so commonly a standing item on agendas that it’s far more present than it ever was.”
But she is less sure that the REF has improved research culture overall. For example, she says that after the 2014 REF, when the rules changed to require that contributions from all permanent research staff be submitted, she saw indications that some universities were gaming the system in a way that disadvantaged early-career researchers. Junior staff members were left on precarious temporary contracts, and she has seen examples of institutions freezing staff numbers to avoid the need to submit more impact case studies. “I’ve seen that many times across many universities, which means the early-career entry points for research roles are reduced.”
“The REF is a double-edged sword,” concludes Bayley. The administrative burden and pressures it brings are much too high, but it does provide a way to allocate money that gives smaller institutions more of a chance, she says. After the 2021 REF, even though top universities still dominated, many received less of the pot than previously, whereas some newer, less prestigious universities performed strongly. The biggest increase was at Northumbria University in Newcastle, where ‘quality-related’ funding rose from £7 million to £18 million.
For Watermeyer, the whole process is counterproductive, wasting precious resources and creating a competitive, rather than a collaborative, culture that might not tolerate the most creative thinkers. He would like to see it abolished. Hettrick is in two minds, because “the realist in me says it is necessary to explain to the taxpayer what we’re doing with their money”. He says the task now is to do the assessment more cheaply and more effectively.
Other research communities might not agree. As Huijpen points out, “there’s quite a lot of assessments in academic life, there are a lot of moments within a career where you are assessed, when you apply for funding, when you apply for a job”. From her perspective, it’s time to opt for less ranking and more reflection.