The virtue of ‘science by numbers’

Measuring scientific productivity by tracking the publication record of researchers is widely acknowledged to be hazardous and imperfect. But, like the peer review process, nothing better has yet been devised.

What’s the best way to measure a scientist’s productivity? Certainly not by the currently favoured system of counting the number of scientific papers that he or she has published, adjusting the weight given to each according to the profile of the journal in which it appears, and then adding up the total. Reducing science to this type of numbers game ignores the many other ways in which the value of a researcher’s work can be assessed. It is also vulnerable to the various devices that journals can use to boost the weight given to them in such calculations, while actively discriminating against those researchers who may – for example for reasons of language – find it more difficult to get published in them.

But if it is not the best way of measuring productivity, it might, overall, still be the fairest. The issue is one of the most hotly debated by the scientific community. Does a focus on quantitative measures (such as journal citations) provide a relatively objective and transparent measure of productivity which, while not being perfect, nevertheless acts as a useful way of measuring one scientist’s performance against another? Or does it merely encourage an "audit" culture that stifles genuine scientific creativity?

There are certainly plenty of researchers who would argue the second case. They will cite colleagues who insist on publishing their results in large numbers of relatively short papers – a technique known as 'salami slicing' – or who devise strategies designed primarily to attract the attention of journal editors (for example, by overstating the importance of their work, or by linking it to scientifically fashionable topics).

Critics in the developing world are particularly fierce. They argue that a heavy reliance on citation statistics and impact factors when making academic appointments or decisions on allocating research funding only exacerbates the problems caused by their difficulty in getting research published in high profile journals at all.

But there is another side to the argument. As Adam Lomnicki argues in the current issue of Nature, the fact that peer-reviewed publication lies at the heart of the quality-control process in science means that, even though it may not be perfect, a system which judges scientists according to their publication record can still be a highly effective way of promoting good science (see How 'impact factors' promote scientific excellence). Indeed, Lomnicki goes further, suggesting that this has particular relevance to developing countries, where too much second-rate science is carried out not because the scientists involved are second-rate, but because the structures in which they operate do not make sufficient effort to reward scientific quality.

Reward systems

Citation statistics – and the impact analysis on which they are based – started life in the 1950s. This was when Eugene Garfield set up the Institute for Scientific Information in Philadelphia, United States, to gather information about scientific publications not by assessing their content, but by adding up the number of times that they were cited by other scientists. The logic is straightforward. A citation can be viewed as a statement by one scientist about the significance of another's work; the more citations a paper receives, therefore, the more important it can reasonably be judged to be by the rest of the scientific community.
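The counting logic that Garfield mechanised can be illustrated in a few lines of code. This is only a sketch of the principle described above; the paper names and reference lists are invented for illustration:

```python
from collections import Counter

# Each entry maps an (invented) paper to the papers its reference list cites.
references = {
    "paper_A": ["paper_B", "paper_C"],
    "paper_B": ["paper_C"],
    "paper_D": ["paper_C", "paper_B"],
}

# A paper's citation count is simply how often it appears
# in the reference lists of other papers.
citation_counts = Counter(
    cited for refs in references.values() for cited in refs
)

print(citation_counts["paper_C"])  # cited by A, B and D, so 3
```

On this view, "paper_C" would be judged the most significant of the set, purely because three other papers cite it.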

Whatever aversion scientists may have to seeing their work evaluated in this relatively mechanistic way, the idea has a solid academic pedigree. Much of this is based on the pioneering work of the historian of science Derek de Solla Price. By charting the history of science in books such as Science Since Babylon (1961) and Little Science, Big Science (1963), largely through the history of scientific publication, Price was able to track the growth and decline of both scientific fields and scientific ideas with a precision that frequently escaped those who adopted a more narrative or biographical approach. Indeed, Garfield himself acknowledges the inspiration he owes to Price, whom he once described as the "father of scientometrics" – a word devised to describe the measurement of science.

Furthermore, the techniques involved now play a key role in what many countries would argue are successful attempts to place the assessment of scientific productivity on a secure, objective footing. In such countries, for example, both appointment and promotion decisions rest heavily on publication records that use standard international measures of, for example, the 'impact factor' of particular journals (which relates to the frequency with which the "average article" in a journal has been cited in a particular year). Similar calculations are made in choosing between the demands of different university departments when assessing how limited research or teaching funds should be allocated.
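The standard two-year impact factor behind such decisions is simple arithmetic: citations received in a given year to items a journal published in the previous two years, divided by the number of citable items it published in those years. The figures in this sketch are invented for illustration:

```python
def impact_factor(citations_to_prev_two_years: int,
                  items_published_prev_two_years: int) -> float:
    """Two-year impact factor: citations in year Y to items from
    years Y-1 and Y-2, divided by the citable items from those years."""
    return citations_to_prev_two_years / items_published_prev_two_years

# e.g. a journal whose 200 articles from the previous two years
# attracted 600 citations this year has an impact factor of 3.0
print(impact_factor(600, 200))  # 3.0
```

The simplicity of the division is part of the appeal – and part of the problem, since a handful of very highly cited papers can lift the "average article" figure for an entire journal.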

Legitimate criticisms

Inevitably, the more that these techniques become embedded in the incentive and reward system for science, both in developed and developing countries, the more closely they are scrutinised, and the more the result of this scrutiny becomes the basis for criticism, often legitimate and almost always strongly felt.

Some of this criticism comes from those who are upset at the way that citation statistics can be used by non-scientists to make judgements about which individual scientists are more productive. Their argument is that only a scientist's peers are in a position to make such an assessment, which needs to be based on the totality of an individual’s scientific contributions, not just a numerical summary of his or her publications.

A different criticism has a more social basis, and originates with those who are concerned about the relatively narrow set of criteria that are used to judge whether – and where – a particular scientific result is published. The concern here is that much research that has an important social impact (for example, in devising new forms of medical treatment) may not make it into the top scientific journals because of a lack of purely scientific novelty. As a result, a reward system that is based largely on publication in such journals will fail to give adequate recognition to research that seeks to solve important problems, not just to advance knowledge.

Both sets of criticisms can be heard from researchers in the developing world, who add their own complaints about how the international publication system tends, in practice, to discriminate against them. As indicated above, a key issue is language: despite the best endeavours of editors to remain fair, it is clear that non-English speakers face higher hurdles in getting published in an English-language science journal than their English-speaking academic colleagues. Another is the lack of library resources needed to keep abreast of the latest publications, itself essential to compete successfully for space in prominent scientific journals.

Answering the critics

The list of complaints about the use of citations and impact factors is a long one. And there appears to be widespread agreement about some of the inherent flaws in the system (for example, a paper may become highly cited just because its results are subsequently disproved, or may even have been shown to be fraudulent).

Part of the problem, however, is that no one has yet been able to come up with a better way of assessing scientific productivity. Alternative approaches each offer advantages, but each has problems of its own that are likely to be just as large, if not larger. In principle, for example, citation analysis could be replaced by a detailed assessment of the content of papers by scientific peers. But the amount of work required would leave many scientists spending most of their working days sitting on review panels!

Furthermore, as Lomnicki acknowledges in his article, if appropriately applied, citation analyses can act as powerful drivers of scientific quality. Making promotion dependent on measurable scientific output is not such an outrageous concept, particularly in countries where overall scientific productivity may not have been given the priority it deserves. This has, for example, often been the case in government-funded laboratories, whose income may depend as much on the political priorities of those in power as on the quality of results that researchers in these laboratories generate.

Of course, it is essential that such systems are not abused. This means, firstly, that the application of such measurement techniques must not be so rigid that they exclude the genuinely inventive individual, even if his or her research does not fit into an easily measurable category. It also means that the process must be fully transparent (with the opportunity to appeal against judgements that are felt to be unfair). And it means that the weaknesses exposed by critics must be addressed, not brushed under the carpet.

It also means that, where necessary and appropriate, local and regional efforts to enhance citation performance should be encouraged. The organisation BIREME in Brazil is already exploring some possibilities in this field (for example, using local-language electronic publishing to boost the 'impact factors' of Latin American journals). Similar efforts are being made in Africa through schemes such as African Journals On-Line. There is much to be done to improve the way that the current system works. But the fact that the current system – like democracy itself – fails to meet the ideal is not a sufficient reason for abandoning it.