June 2009

SCIENCE PUBLISHING:

Fair Research Assessments Require Extensive Deliberation

How does one fairly assess the quality of research performed by scientists? This question is of interest to those who perform, publish, and provide funding for scientific research.

The answer is often a critical factor in decisions regarding continuing employment, manuscript acceptance into technical science journals, allocation of limited financial resources, issuance of scientific awards, and perceived "credibility" and "prestige." Put simply, a research scientist's career is determined largely, if not entirely, by assessments of the quality of his or her research.

The challenge of performing a fair scientific research assessment.

Most scientists agree that this scientific assessment should be made in large part on the basis of the articles the scientist has published in technical journals. This is where agreement ends; opinions diverge widely on how scientific articles should be evaluated.

The current trend is to address this assessment challenge in a quantitative fashion. For example, a scientific article may be viewed as "better" than another if the former accumulates more citations in technical articles than the latter.

Some people view this kind of quantitative, bibliographic assessment as being optimally fair and impartial. A further appeal is the ease of obtaining such bibliographic information these days, via internet-based resources.

However, there are many reasons why an assessment that is not based strictly on bibliographic analysis of a scientist's research impact may be useful as well. For example, a technical article may be lightly cited, yet provide the critical breakthrough that makes a later, highly cited article possible.

Towards addressing such controversies, Liz Allen (Wellcome Trust, United Kingdom) and coworkers have delved into the question of the relative utility of bibliographic versus expert assessment of technical scientific articles. Unsurprisingly, they have found both to be useful, suggesting that reducing a scientist to a number via strict citation analysis is an unfair method of evaluation.

The publications.

The scientists evaluated technical science articles indexed by PubMed, and funded in whole or part by the Wellcome Trust. All of them were published between May and September 2005.

Each publication was screened to ensure that it had a focus on biomedical research (and was in fact funded by the Wellcome Trust), to arrive at 979 articles. Eighty-four percent of them were original research articles, and 16% were review articles.

Most of the original research articles listed multiple sources of financial support. Only 23% were entirely based on funding from the Wellcome Trust.

Publication characterization.

The scientists noted the publisher and journal title for all of the evaluated articles. For the original research articles, several further bits of information were compiled.

These include the journal impact factor at the time of publication (roughly, the average number of citations received in a year by each of the journal's recently published articles), additional funding sources (besides the Wellcome Trust), collaborations with other research groups, and the number of authors. Additionally, the publications were classified into general scientific fields, e.g., genetics and physiological sciences.

Publication analysis: "Expert review" and bibliographic records.

The scientists gathered 16 scientific reviewers, of relevant expertise, to evaluate these articles. Each article was read by at least one reviewer (87% of the time by two reviewers), each of whom categorized its importance on a scale from 1 ("for the record"), the lowest category, to 4 ("landmark"), the highest category.

The two reviewers' ratings were summed. In other words, for an article read by two reviewers, the possible score ranged from 2 (both reviewers categorized the article as "for the record") to 8 (both reviewers categorized the article as "landmark").

Ninety-six percent of the articles read by two reviewers were placed no more than one category apart. The small number of remaining articles were designated as "unresolved."
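
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the study's actual procedure: the function name combined_score is hypothetical, and returning None for an "unresolved" article is an assumption made here for convenience.

```python
from typing import Optional


def combined_score(first: int, second: int) -> Optional[int]:
    """Sum two reviewer ratings on the 1-4 scale.

    Returns None when the ratings are more than one category apart,
    which this sketch treats as "unresolved" (an assumption about how
    such articles would be handled in code).
    """
    for rating in (first, second):
        if rating not in (1, 2, 3, 4):
            raise ValueError(f"rating must be between 1 and 4, got {rating}")
    if abs(first - second) > 1:
        return None  # reviewers disagree by more than one category
    return first + second


print(combined_score(1, 1))  # 2: both chose "for the record"
print(combined_score(4, 4))  # 8: both chose "landmark"
print(combined_score(2, 3))  # 5: one category apart, still scored
print(combined_score(1, 3))  # None: two categories apart, "unresolved"
```

Note that adjacent ratings (such as 2 and 3) still produce a score, which is why odd totals like 5 are possible.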

Each article was evaluated sometime in December 2005, which is no more than a few months after it had been published. The journal name, authors, and institutional affiliations were not hidden from the reviewers, because biases based on such information are not readily countered, even if an attempt is made to hide the information.

The scientists gathered bibliographic data (number of citations gathered by each article) on the articles after three years. They utilized standard databases for this analysis.

Research funding type.

The scientists also evaluated the published research funded by the Wellcome Trust in terms of the type of funding that was issued, e.g., a large grant or a small fellowship. This was a highly labor-intensive process, because many of the articles did not provide such funding details.

This question of funding type is very important, as many scientists debate the merits of issuing many small grants versus a smaller number of large grants. Armed with all of this information, the scientists were now able to evaluate the quality of scientific research funded by the Wellcome Trust from multiple viewpoints.

Journals chosen for publication.

Which journals published the research funded by the Wellcome Trust? The analyzed articles appeared in 432 different scientific technical journals, issued by 98 different publishers.

The original research articles were not concentrated in any particular journal. For example, the Journal of Biological Chemistry was the most common journal, at less than 3% of the publications, and only a handful of journals published more than 1.2% of the articles.

Seventy percent of the original research articles appeared in a journal with a journal impact factor of 3. Roughly speaking, this means that a recently published article in such a journal is cited about three times per year, on average.
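
More precisely, the commonly used two-year journal impact factor for a given year is the number of citations received that year by the articles the journal published in the previous two years, divided by the number of those articles. The short Python sketch below works through the arithmetic. The function name and the example counts (600 citations, 200 articles) are hypothetical, chosen only to produce an impact factor of 3; they are not taken from the study.

```python
def two_year_impact_factor(citations_this_year: int,
                           articles_prev_two_years: int) -> float:
    """Two-year journal impact factor.

    citations_this_year: citations received this year by articles the
        journal published in the previous two years.
    articles_prev_two_years: number of citable articles the journal
        published in those two years.
    """
    if articles_prev_two_years <= 0:
        raise ValueError("the journal must have published at least one article")
    return citations_this_year / articles_prev_two_years


# Hypothetical journal: 200 articles over two years, cited 600 times this year.
print(two_year_impact_factor(600, 200))  # 3.0
```

Because the figure is an average over a whole journal, it says little about how often any single article is cited.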

Interdisciplinarity of the published research.

Quality science is often thought to arise from multiple lead scientists at multiple institutions. Based on this metric, how interdisciplinary is the research funded by the Wellcome Trust?

Sixty-four percent of the articles were authored by five or more scientists. Forty-seven percent of the original research articles were written by research groups solely in the United Kingdom, although many were collaborative efforts based in multiple countries.

Comparing "expert review" to bibliographic records.

The scientists found that 33% of the published articles were classified as a "useful step forward" by the "expert reviewers," i.e., they received a combined rating of 4 out of a possible 8; this was the most common classification. The journal impact factor was strongly and positively correlated with the "expert review" rating.

The scientists furthermore found a positive, though somewhat weaker, correlation between the "expert review" rating and bibliographic citations. The three most highly cited articles were on a genomics topic, and had more than 80 listed authors each (that is, they were large-scale efforts).
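
For readers who want to see how such a correlation might be measured, here is a minimal Python sketch using the Spearman rank correlation, a common choice for ordinal ratings and skewed citation counts. This summary does not state which statistic the authors used, so the choice of Spearman is an assumption, and the six data points are invented purely for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented example: combined expert scores (2-8) and three-year citation counts.
expert_scores = [2, 4, 4, 5, 6, 8]
citations = [1, 3, 10, 7, 15, 40]
print(round(spearman(expert_scores, citations), 2))
```

A value near 1 indicates that articles rated highly by reviewers also tend to be cited more, while still leaving room for individual articles that buck the trend.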

Based on these results, one might be tempted to conclude that, overall, "expert review" correlates well with bibliographic records, and consequently the latter (easier) approach is all that is needed. However, on the level of the individual article, there were exceptions to the observed trends, e.g., a paper that was not highly cited may have received a high "expert review" evaluation.

Effect of research funding type.

The scientists found significantly higher "expert review" evaluations and citation counts for articles based on research funded by a large award over a long period of time, in comparison to those funded by a small award over a short period of time. However, the issue of grant size in relation to scientific productivity is still up in the air.

This is because scientific research assessments were made on a per-publication basis, not a per-grant basis. Further analysis is needed to address the issue in a conclusive manner.

Implications.

Research scientists want their scientific productivity to be judged in a fair and impartial manner. This research shows that a strict bibliographic analysis of scientific productivity, based on tabulating the number of citations a technical science article has received, allows many quality articles to fall through the cracks of the evaluation process.

No one has developed an evaluation process that is fast, yet also fair and impartial. Lengthy deliberation and critical analysis may inherently require a great deal of time, but for the sake of fairness, they should play an important role in determining the quality of a body of scientific research.

For more information:
Allen, L., Jones, C., Dolby, K., Lynn, D., & Walport, M. (2009). Looking for Landmarks: The Role of Expert Review and Bibliometric Analysis in Evaluating Scientific Publication Outputs. PLoS ONE, 4(6). DOI: 10.1371/journal.pone.0005910