For the SA Branch September 2022 meeting, Prof Adrian Barnett (QUT) gave a presentation over Zoom from Queensland; his talk was a damning exposition on Bad Statistics in Medical Research.
We began with an example of a Covid-19 study published in The Lancet, where 75% of subjects were excluded because they were still hospitalised or not confirmed as infected. Despite this major statistical flaw, the paper has been cited over 25,000 times. During the pandemic there has been pressure to publish Covid-19 research quickly; however, the danger of scientists lowering the standard of their research is plain to see, at a time when the need for trust in science has never been greater.
Adrian presented a number of examples of statistical misunderstandings common in medical research, some baffling and even humorous. These included the assumption that continuous raw data must be normally distributed for statistical analysis to be valid; excluding outliers (when they can be the most interesting part of the data!); and an over-reliance on p-values to provide all the information about an analysis. The interpretation of p-values can also be faulty, but faulty interpretations are often accepted with the attitude: “It's okay, since everyone else treats them this way too”.
Next, we considered the distribution of z-values extracted from confidence intervals reported in a medical journal, and noted that it had the shape of a normal distribution with a chunk missing in the middle, indicating that studies with non-significant (negative) results are often excluded from publication. Adrian went as far as to call this research misconduct.
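For readers who would like to try this kind of extraction themselves, the basic step can be sketched in a few lines of Python. The function name and example numbers below are hypothetical, and the sketch assumes a 95% confidence interval that is symmetric on the analysis scale (ratio measures are log-transformed first); it illustrates the idea only and is not Adrian's actual code.

```python
import math

def z_from_ci(estimate, lower, upper, ratio_measure=False):
    """Recover an approximate z-value from a point estimate and its 95% CI.

    Assumes the interval is symmetric on the analysis scale; ratio measures
    (e.g. odds or hazard ratios) are log-transformed first.
    """
    if ratio_measure:
        estimate, lower, upper = math.log(estimate), math.log(lower), math.log(upper)
    se = (upper - lower) / (2 * 1.96)  # a 95% CI spans roughly 2 * 1.96 standard errors
    return estimate / se

# Hypothetical example: an odds ratio of 1.8 with 95% CI (1.1, 2.9)
print(round(z_from_ci(1.8, 1.1, 2.9, ratio_measure=True), 2))  # -> 2.38
```

Collecting z-values like this across many published intervals and plotting their histogram is what reveals the tell-tale gap in the non-significant region (|z| < 1.96).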
Adrian also offered some suggestions for fixing the problem.
One is the introduction of a statistical audit, in which 100 randomly selected papers per year are reviewed by a statistician, who checks the research and whether the results can be reproduced.
Another is a statistical robot: an algorithm that flags potential statistical problems in a paper and could even detect fraudulent results.
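As a rough illustration of what such a robot might check, the sketch below recomputes a two-sided p-value from a reported z-statistic and flags inconsistencies, in the spirit of tools like statcheck; the function names, tolerance and example values are all hypothetical, and this is not the algorithm Adrian described.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def flag_inconsistent(reported_z, reported_p, tol=0.01):
    """Flag a result whose reported p-value does not match its reported z-value.

    A deliberately crude check: real tools parse whole papers and handle
    many different test statistics.
    """
    return abs(two_sided_p_from_z(reported_z) - reported_p) > tol

print(flag_inconsistent(1.96, 0.05))  # False: consistent
print(flag_inconsistent(1.20, 0.03))  # True: p looks too small for the statistic
```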
The presentation also included a number of lamentable figures from real publications, such as pie charts that distort the data, and a 3D bar chart that resembled pieces of fudge.
The presentation ended with a lively discussion, with many weighing in on the problem of bad stats and how we might fix it. In response to the question “If we don't rely on p-values, then what do we use?”, Adrian conceded that sometimes results are complicated, but ultimately science is hard, and we should celebrate that.
By Annie Conway