It is not a real crisis, but perhaps not far from it. People have looked upon science as producing ‘reliable knowledge’, and now it seems as though much science is not very reliable at all. If it isn’t about truth, why should we consider it special? Well, a good question for an interested medical student to think about, but a hard one to answer. Part of the answer lies with statistical paradigms (or at least the way we like to play within those paradigms), part with the sociology and economics of careers in science, and part with the means by which modern societies seek to control and fund ‘legitimate’ science. Let me start with a few quotes to illustrate some of the issues.
A series of simple experiments were published in June 1947 in the Proceedings of the Royal Society by Lord Rayleigh, a distinguished Fellow of the Society, purporting to show that hydrogen atoms striking a metal wire transmit to it energies up to a hundred electron volts. This, if true, would have been far more revolutionary than the discovery of atomic fission by Otto Hahn. Yet, when I asked physicists what they thought about it, they only shrugged their shoulders. They could not find fault with the experiment yet not one believed in its results, nor thought it worth while to repeat it. They just ignored it. [and they were right to do so]
The Republic of Science, Michael Polanyi
[talking about our understanding of obesity] Here’s another possibility: The 600,000 articles — along with several tens of thousands of diet books — are the noise generated by a dysfunctional research establishment. Gary Taubes.
“We could hardly get excited about an effect so feeble as to require statistics for its demonstration.” David Hubel, Nobel Laureate (quoted in Brain and Visual Perception)
The value of academics’ work is now judged on publication rates, “indicators of esteem,” “impact,” and other allegedly quantitative measures. Every few years in the UK, hundreds of thousands of pieces of academic work, stored in an unused aircraft hangar, are sifted and scored by panels of “experts.” The flow of government funds to academic departments depends on their degree of success in meeting the prescribed KPIs [key performance indicators]. Robert Skidelsky
So, what is going on, and why is this of interest to a good medical student? Well, let me be provocative. All of the above is critical to the health of modern medicine. People trust doctors, and most of this trust lies not with recent attempts to define professionalism, but with the fact that doctors make use of knowledge that is judged to be reliable. Or at least more reliable than what we peddled five hundred years ago. The scientific revolution has meant that there is now clear water between doctors and priests. We no longer read the entrails of a sacrificed animal, but examine tissue under a microscope.
On the other hand, if we think about statistics and medical students we should parse a few concepts. Much as a critical understanding of statistics and how we interpret data as evidence is essential to medicine ‘in the round’, this does not mean all medical students need to be statistically very adept. Indeed, many of the best clinicians I have seen or worked with are largely statistically illiterate. So, statistics is important for the ‘society’ of doctors, but we have to think ecology, not individuals. We need brain surgeons, pathologists, dermatologists, etc., but we do not need all doctors to be adept in each of these domains. Do not think of the average medical student, nor of the average candidate for medical school; think variances, distributions and the medical ecosystem.
Second, we need to distinguish between the sorts of statistics needed to do experimental science (well, many, if not most sciences) and those needed to practise medicine. Somebody broadly fluent in the former to a high level will have little problem with the latter, but the two topics need to be considered separately. Much communication with patients is communication about risk: numbers, probabilities, and, to use a cliché, the unknown unknowns. Communication skills are not just about being nice and making adjustments in the light of a patient’s vocabulary, but about statistical competence. You cannot possess good patient skills without knowledge of statistics and the literature about how people interpret data, and how people justify and act on the basis of what they have been told. Classical statistical insights help, but they are far from sufficient. And perhaps my statement ‘you cannot possess good patient skills’ is too dogmatic. I see exceptions, so again, maybe we have to think a little about the ecology of how medical and clinical care is organised. What I do believe is that we should be aiming for high-level competence in all our students.
So, to summarise, I would draw distinctions between the statistics needed to do science, the statistics needed to explain science, and the sorts of applied psychology of how people interpret numerical and non-numerical data. I would argue that a large part of ‘communication skills’ teaching at undergraduate level should be given over to such issues, particularly the latter two. Now let me get back to those quotes, and what some might refer to as the crisis in medical research.
At this stage, it is also important to point out that I am a big fan of statistics. RA Fisher is one of my heroes, and I think the GLM is one of the crowning intellectual achievements of the twentieth century. Statistics is fun, hard, humbling and at times wonderfully counterintuitive. It is also one of those areas of intellectual enquiry where, with a few ‘simple and deep’ insights, you can demolish torrents of laziness that pass for everyday wisdom. Yes, lots of great natural science was done before statistics became a mature discipline, but lots can only be done now because of it.
There are lots of ways of trying to explain what has gone wrong with modern statistics (or more precisely, the misuse of statistics) and modern science. I see the problem as being to do with an attempt to formalise knowledge and professional activity that ‘does violence’ both to the natural world and to the sociology of what we know allows us to produce reliable knowledge. Readers of this blog will know that I think we have denigrated the role of judgment, experience and tacit knowledge in far too many areas. But let me be specific.
When Fisher developed p values, they were a guide to action. They were a guide to further enquiry and experiment. They were not a ‘truth machine’. Subsequently, frequentist statistics in the Neyman–Pearson tradition sought to become a branch of ‘quality control’ with type 1 and type 2 errors. But whereas this might make sense in a factory producing widgets, it does not describe science. In science all sorts of hard-to-formalise activities influence what you and I believe (see the Polanyi quote above). It is for this reason that ‘systematic reviews’ of evidence are not possible. You can systematically review a series of trials, but you cannot systematically review a world view about the natural world, and give it a p value at the end.
Fisher, of course, was using p values to enable him to construct theories about the natural world, with the idea of performing better and better experiments. A better experiment is one where you pin down ever more tightly the phenomenon you are interested in. This is not health technology assessment, but an attempt to order the world. Cleaving nature at the joints etc.
There are a number of things implicit in this approach. You repeat experiments. You build upon the results of your previous experiments. And most likely you intervene in the world (i.e. experiments, rather than just observations). In one sense the p values are not for everybody else’s benefit, but for your own. They are there to guide you in a search for truth, and the experimental design reflects how you think the world works. Equipoise has no place here. Our problem is that virtually every one of those statements no longer describes the way much, though not all, science is practised.
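Fisher’s point about repetition is easy to demonstrate. Here is a minimal sketch, pure Python with made-up numbers (a true effect of 0.4 standard deviations, thirty subjects per arm — assumptions for illustration, not data from any real study), showing how the p value of an honestly repeated, identical experiment bounces around:

```python
import math
import random

random.seed(42)  # reproducible illustration

def two_sided_p(xs, ys):
    """Two-sample z-test p value (normal approximation to the t-test)."""
    n, m = len(xs), len(ys)
    mx, my = sum(xs) / n, sum(ys) / m
    vx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    vy = sum((y - my) ** 2 for y in ys) / (m - 1)
    z = (mx - my) / math.sqrt(vx / n + vy / m)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Ten exact replications of the same modestly powered experiment:
# a genuine effect of 0.4 SD, n = 30 per group, nothing else changes.
ps = []
for i in range(10):
    treated = [random.gauss(0.4, 1) for _ in range(30)]
    control = [random.gauss(0.0, 1) for _ in range(30)]
    ps.append(two_sided_p(treated, control))
    print(f"replication {i + 1:2d}: p = {ps[-1]:.3f}")
```

Some replications clear p < 0.05 and some do not, even though the underlying effect never changes. The p value is a noisy summary of one experiment, a prompt to repeat and refine, not a verdict.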
As a long series of articles attests, people now want to use p values not so much as heuristic guides (let alone markers of truth), but as administrative markers of what gets published, and of who gets promoted or funded. All too often experiments are not repeated, and nor are subsequent experiments tightly dovetailed with the earlier results. The discipline of seeking to avoid wasting your own time, by building securely brick by brick on solid foundations, has been replaced by selling (shoddy) bricks to others to use in their experiments. But the moral hazard in the fragmented, complicated world of modern science means that the bricks are often suspect. Publishing papers becomes like printing money: without reputational censure, it is too easy to game the system. Then we end up with KPIs, because people have deluded themselves that there really is a ‘currency’ with defined units of value in science. So, you end up with those who know the price of everything, and the value of nothing.
The above account of real science implies self-discipline, reputation, and a deep sense of values. You cannot express any of these in a p value. The tragedy is that naive interpretations of p values, and confusion between a statistical and a scientific hypothesis, have allowed the parasitic growth of a pseudoscientific excel-spreadsheet approach to evidence. In this framework, ‘p-hacking’, selective publication, hiding data, ignorance of effect sizes, and a failure to confirm experimental results (or admit their limitations) have flourished. Similarly, the evidence-based-medicine (EBM) idea that all you need to understand the world is more and more RCTs misses the mark, by not understanding the nature of scientific evidence.
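The mechanics of p-hacking are depressingly simple to simulate. A hedged sketch, again pure Python with invented numbers (twenty outcomes per ‘study’, thirty subjects per arm, no real effects anywhere): measure many outcomes, report only the most favourable p value, and the nominal 5% false-positive rate balloons.

```python
import math
import random

random.seed(7)  # reproducible illustration

def two_sided_p(xs, ys):
    """Two-sample z-test p value (normal approximation to the t-test)."""
    n, m = len(xs), len(ys)
    mx, my = sum(xs) / n, sum(ys) / m
    vx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    vy = sum((y - my) ** 2 for y in ys) / (m - 1)
    z = (mx - my) / math.sqrt(vx / n + vy / m)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def hacked_study(n_outcomes=20, n=30):
    """One 'study': test twenty outcomes where NO effect exists,
    then report only the most favourable (smallest) p value."""
    ps = []
    for _ in range(n_outcomes):
        treated = [random.gauss(0, 1) for _ in range(n)]
        control = [random.gauss(0, 1) for _ in range(n)]
        ps.append(two_sided_p(treated, control))
    return min(ps)

studies = 400
false_hits = sum(hacked_study() < 0.05 for _ in range(studies))
rate = false_hits / studies
print(f"'significant' null studies: {rate:.0%}")
# With 20 independent null tests per study, roughly 1 - 0.95**20,
# i.e. around 64% of purely null studies come out 'positive'.
```

No single test is dishonest; the dishonesty is in the selection. This is why unreported flexibility in analysis, not any one p value, is the problem.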
Students: if you want to read two clear but desirably difficult (to use Robert Bjork’s phrase) accounts of what is wrong with some common measures of probability, see the article in Nature by Regina Nuzzo, and David Colquhoun’s web page article here. Steve Goodman, quoted in the Nature piece, has written well on these topics for a medical audience for many years.