I like statistics and spent most of my intercalated degree ‘using’ medical stats (essentially, writing programs on an IBM 360 mainframe to handle a large dataset, which I could then interrogate using the GLIM package from NAG, the Numerical Algorithms Group). Yes, the days of batch processing and punch cards. I found — and still find — statistics remarkably hard.
I am always very wary of people who say they understand statistics. Let me rephrase that. I am very suspicious of non-professional statisticians who claim that they find statistics intuitive. I remember it being said that even the great Paul Erdős got the Monty Hall problem wrong.
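The Monty Hall problem is a nice illustration of just how counter-intuitive probability can be: simulating it takes a few lines, whereas convincing yourself by argument is notoriously hard. A minimal sketch (my own, not from any of the sources discussed here):

```python
import random

def monty_hall(trials=100_000, switch=True):
    """Simulate the Monty Hall game and return the contestant's win rate."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # door hiding the car
        pick = random.randrange(3)   # contestant's first pick
        # Host opens a door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=True))   # ≈ 2/3
print(monty_hall(switch=False))  # ≈ 1/3
```

Switching wins about two thirds of the time, not the intuitive half, which is exactly the kind of result that makes even able mathematicians stumble.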
The following is from a recent article in Nature:
What will retiring statistical significance look like? We hope that methods sections and data tabulation will be more detailed and nuanced. Authors will emphasize their estimates and the uncertainty in them — for example, by explicitly discussing the lower and upper limits of their intervals. They will not rely on significance tests. When P values are reported, they will be given with sensible precision (for example, P = 0.021 or P = 0.13) — without adornments such as stars or letters to denote statistical significance and not as binary inequalities (P < 0.05 or P > 0.05). Decisions to interpret or to publish results will not be based on statistical thresholds. People will spend less time with statistical software, and more time thinking.
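To make the Nature authors’ recommendation concrete, here is a minimal sketch (my own, with made-up trial numbers) of reporting an estimate, its 95% interval, and an exact P value, rather than a bare ‘significant / not significant’ verdict. It uses a simple normal approximation for a difference in proportions:

```python
import math

def report_difference(events_a, n_a, events_b, n_b):
    """Report a risk difference with a 95% CI and a two-sided P value,
    instead of a binary significance verdict."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    lo, hi = diff - 1.96 * se, diff + 1.96 * se
    z = diff / se
    # Two-sided P from the normal CDF (math.erf is in the stdlib).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return diff, (lo, hi), p_value

# Hypothetical trial: 30/100 responders vs 18/100 on control.
diff, ci, p = report_difference(30, 100, 18, 100)
print(f"risk difference = {diff:.2f}, "
      f"95% CI {ci[0]:.2f} to {ci[1]:.2f}, P = {p:.3f}")
```

The output states the size of the effect and the uncertainty around it; whether P lands at 0.045 or 0.055 is then visibly a minor detail rather than a verdict.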
There is lots of blame to go around here. Bad teaching and bad supervision are easy targets (too easy). I think there are (at least) three more fundamental problems.
Science has been thought of as a form of ‘reliable knowledge’. This form of words always sounded almost too modest to me, especially when you think how powerful science has been shown to be. But in medicine we are increasingly aware that much modern science is not a basis for honest action at all. Blake’s words were to the effect that ‘every honest man is a prophet’. I once miswrote this in an article as ‘every honest man is for profit’. Many an error….
An article in PNAS highlighting the poor reproducibility of many published findings links to a Johns Hopkins ‘MOOC’ on ‘Data Science’. The course includes modules on R programming, experimental design and analysis. It all looks nicely laid out. Initially I thought you had to pay to run through the course, but that is not the case: you only pay if you want certification, via the Coursera platform. I know there are other courses covering some of the same topics, and it strikes me that teaching statistics without small-group interaction is a hard task. But surely we must see this approach spread to undergraduate level in subjects like medicine, where the best students will be able to demonstrate that they can acquire skills without the limitations of course structures designed for those who are less capable. More importantly, it really is not hard to imagine that this approach will be superior to our current attempts to cover topics in which medical schools appear unable to invest in teaching staff or high-level materials. I am quite optimistic about the latter, but less so about such courses eradicating dodgy science.
It is not a real crisis, but perhaps not far from it. People have looked upon science as producing ‘reliable knowledge’, and now it seems as though much science is not very reliable at all. If it isn’t about truth, why should we consider it special? Well, a good question for an interested medical student to think about, but a hard one. Part of the answer lies with statistical paradigms (or at least the way we like to play within those paradigms), part with the sociology and economics of careers in science, and part with the means by which modern societies seek to control and fund ‘legitimate’ science. Let me start with a few quotes to illustrate some of the issues.
A series of simple experiments were published in June 1947 in the Proceedings of the Royal Society by Lord Rayleigh, a distinguished Fellow of the Society, purporting to show that hydrogen atoms striking a metal wire transmit to it energies up to a hundred electron volts. This, if true, would have been far more revolutionary than the discovery of atomic fission by Otto Hahn. Yet, when I asked physicists what they thought about it, they only shrugged their shoulders. They could not find fault with the experiment yet not one believed in its results, nor thought it worth while to repeat it. They just ignored it. [and they were right to do so]
The Republic of Science, Michael Polanyi
[talking about our understanding of obesity] Here’s another possibility: The 600,000 articles — along with several tens of thousands of diet books — are the noise generated by a dysfunctional research establishment. Gary Taubes.
“We could hardly get excited about an effect so feeble as to require statistics for its demonstration.” David Hubel, Nobel Laureate (quoted in Brain and Visual Perception)
The value of academics’ work is now judged on publication rates, “indicators of esteem,” “impact,” and other allegedly quantitative measures. Every few years in the UK, hundreds of thousands of pieces of academic work, stored in an unused aircraft hangar, are sifted and scored by panels of “experts.” The flow of government funds to academic departments depends on their degree of success in meeting the prescribed KPIs [key performance indicators]. Robert Skidelsky
I was musing over this article, partly because it touches a longstanding interest of mine (how do we acquire useful new knowledge?), but also, in the context of this blog on medical education, because of the question of how we get across to students how medical advances have occurred. Without getting into the ‘what can we learn from history’ subroutine, I think the topic important, and one we cannot assume students will learn to think deeply about without some guidance or prompting. To my mind, the role of education here is the classic one: as a detergent to propaganda.
The editorial describes changes at the mental-health arm of the NIH (NIMH), where the new director has made clear that, to be funded, clinical trials must include some test of the underlying biological mechanism. The line is that too many trials are black-box tests in which, if the results are negative, nothing is learned. (It is suggested that 50% of the studies currently funded would not be funded if submitted as new grant proposals at some future time.) I think they are targeting the sort of pragmatic NHS-style RCT that I find so depressing. The reason is simple: without a construct, or a genuine scientific hypothesis (and I do not mean a statistical one), we have no idea whether the conclusions of any study will apply at any future time, or in any other population. A cognate fact is that we know trials are noisy and often unreliable guides to what is going on. Fisher warned of this nearly a century ago: we tend to rely on statistics when we know little about what is really going on, when what we need is more thinking, and much more repetition.
As it is, we often capture very little of the routine clinical encounter in many clinical trials. They are guides to what is going on, not rules to tell us how to behave. If the effects are very large, we can perhaps ignore much of this. But so often we only conduct large studies because the effect sizes are so small. In simple terms, the R² values are far too low: most of the variance is random and unexplained. These are not good experiments, and it is no surprise that we are finding out that so many published papers are wrong. Most RCTs do not present the information this way, because it would then be apparent how little we know about what will happen to our patients. Not always, however. Systemic retinoids for acne have, I suspect, an NNT close to 1. But even there, we had a clear demonstration of efficacy before we had much insight into mechanism; researchers then went searching to complete the circle, not on to the next RCT in a different domain.
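The contrast between a large and a small effect is easy to make concrete with the number needed to treat, which is just the reciprocal of the absolute risk reduction. A small sketch with illustrative numbers (mine, not real trial data; the retinoid-like figures are chosen only to echo the point above):

```python
def nnt(p_treated, p_control):
    """Number needed to treat = 1 / absolute risk reduction."""
    return 1 / (p_treated - p_control)

# A drug with a very large effect, as in the retinoid example
# (illustrative response rates, not real data):
print(nnt(0.95, 0.10))   # ≈ 1.2: nearly every patient treated benefits

# A typical 'large trial needed, small effect' result:
print(nnt(0.32, 0.27))   # ≈ 20: treat 20 patients for one extra responder
```

When the NNT is near 1 you barely need statistics to see the effect; when it is 20 or more, almost all of what happens to an individual patient lies outside what the trial explains.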
However, what the editorial dismisses is what I would most want to get across to students. The article (discussing psychiatric drugs) states that:
By the early 1990s, the pharmaceutical industry had discovered — mostly through luck — a handful of drug classes that today account for most mental-health prescriptions.
This is a real travesty, and again supports my adage that Nature doesn’t really understand medical research or medicine. Many of the leads were not luck, but the results of astute clinicians and pharmacologists interested in what happened to their patients: not so much those immersed in the use of rating scales, or obsessed by assuming anything interesting must be due to chance, but acute observers who provided insights worth following up by pharma. Calling this luck is like saying Charles Darwin was just lucky (although I would accept Wallace was deeply unlucky). For me this is just another representation of the master clinician, one whose expertise is built around a knowledge of patients, and who thinks about what happens to them. It is a style of medicine we are in danger of losing. Students should know that this obsession with thinking about patients is what underpins and drives clinical advance. This is not at the expense of sensible clinical experiments, or wet-bench work, just an acknowledgement that medicine has its own intellectual heartlands, and we need to communicate this to the next generation, because it is in danger of being killed off by a pincer movement of ‘protocols’ on the one hand, and a confusion of biology with medicine on the other. As far as discovery in psychiatry is concerned, few can equal David Healy for explaining how we got where we did. For some other areas, see what I wrote in Science over 10 years ago. The problem for the undergraduate teacher is how to integrate real knowledge of statistics and experimental design with a knowledge of how genuine clinical advance occurs.
The situation was a familiar one. Some time back, I was gossiping with a medical student, and he began to talk about some research he had done, supervised by another member of staff. I asked what he had found out: what did his data show? What followed, I have seen if not hundreds of times, then at least on several score occasions: a look of trouble and consternation, a shrug of embarrassment, and the predictable word-salad of ‘significance’, t values, p values, statistics and ‘dunno’. Such is the norm. There are exceptions, but even amongst postgraduates who have undertaken research the picture is not wildly different. Rarely, without directed questioning, can I get the student to tell me about averages or proportions, using simple arithmetic. A reasonable starting point, surely. ‘What does it look like if you draw it?’ is met with a puzzled look. And yet, if I ask the same student how they would manage psoriasis, or why skin cancers are more common in some people than others, I get, to varying degrees, a reasoned response. I asked the student how much tuition in statistics they had received. A few lectures was the response, followed by a silence, and then: “They told us to buy a book”. More silence. So this is what you pay >30K a year for? The student just smiled in agreement. This was a good student.
Statistics is difficult. Much of statistics is counter-intuitive and, like certain other domains of expertise, learning the correct basics often results in a temporary (or in some cases permanent) drop in objective performance.** That is, you can make people’s ability to interpret numerical data worse by trying to teach them statistics. On the other hand, statistics is beautiful, hard, and full of wonderful insights that debunk the often sloppy thinking that passes for everyday ‘common sense’. I am a big fan, but have always found the subject anything but easy. Like a lot of formal disciplines, though, the pleasure comes from the struggle to achieve mastery. I also think the subject important, and for the medical ecosystem at least, it is critical that there is high-level expertise within the community. On the other hand, in my experience many of the very best clinicians are (relatively) statistically illiterate. The converse is also seen.