A recent Guardian article has the title "Statistical illiteracy isn't a niche problem. During a pandemic, it can be fatal"
https://www.theguardian.com/commentisfree/2020/oct/26/statistical-illiteracy-pandemic-numbers-interpret
It starts thus:
<<<
In the institute where I used to work a few years ago, a rare non-infectious illness hit five colleagues in quick succession. There was a sense of alarm, and a hunt for the cause of the problem. In the past the building had been used as a biology lab, so we thought that there might be some sort of chemical contamination, but nothing was found. The level of apprehension grew. Some looked for work elsewhere.
One evening, at a dinner party, I mentioned these events to a friend who is a mathematician, and he burst out laughing. “There are 400 tiles on the floor of this room; if I throw 100 grains of rice into the air, will I find,” he asked us, “five grains on any one tile?” We replied in the negative: there was only one grain for every four tiles: not enough to have five on a single tile.
We were wrong. We tried numerous times, actually throwing the rice, and there was always a tile with two, three, four, even five or more grains on it.
. . . We, know-all professors, had fallen into a gross statistical error. We had become convinced that the “above average” number of sick people required an explanation. Some had even gone elsewhere, changing jobs for no good reason.
>>>
Assuming rice grains fall independently, each equally likely to land on any of the 400 tiles, the probabilities of 0, 1, ..., 5 grains on one given tile are, with the R code used:
setNames(round(dbinom(0:5,100,.0025), 5), 0:5)
      0       1       2       3       4       5
0.77856 0.19513 0.02421 0.00198 0.00012 0.00001
Thus, the probability of 5 grains on any one given tile is vanishingly small. (The author does not give this calculation.)
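The friend's question was about any one of the 400 tiles, not one particular tile. A rough upper bound on that probability can be sketched by multiplying the single-tile tail probability by the number of tiles (a union bound). This assumes the same model as above: grains fall independently, and every tile is equally likely. The variable names are illustrative only.

```r
n_tiles  <- 400   # tiles on the floor
n_grains <- 100   # grains thrown

# P(a given tile receives 5 or more grains), grains falling independently
p_tile_5plus <- pbinom(4, size = n_grains, prob = 1 / n_tiles,
                       lower.tail = FALSE)

# Union bound: P(some tile has 5+ grains) <= n_tiles * P(a given tile has 5+)
bound_any_tile <- n_tiles * p_tile_5plus

print(signif(c(one_tile = p_tile_5plus, any_tile_bound = bound_any_tile), 2))
```

Under these assumptions the bound comes out well under one per cent, so allowing for all 400 tiles does not make five grains on a tile a likely outcome.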
What, then, is the author's point with this example? That rice grains thrown randomly by a bunch of mathematicians are unlikely to fall independently? If so, what is the relevance to the story of the five colleagues who had a rare non-infectious illness (probability of one such illness, maybe 1 in 400, or 0.0025)? Surely the point is then that the five illnesses are very unlikely to be independent events, and that it makes sense to look for a common cause. It appears to me that the author is himself guilty of a "gross statistical(?) error". Have I missed something?
For anyone who wants to simulate the probabilities, the following code may be used:
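n <- numeric(7)   # tile counts for 0, 1, ..., 5, and 6+ grains
for (i in 1:100000) {
  # Throw 100 grains, each landing on one of 400 equally likely tiles
  sam <- sample(1:400, 100, replace = TRUE)
  tab <- table(sam)                                  # grains per occupied tile
  for (j in 2:6) n[j] <- n[j] + sum(tab == j - 1)    # tiles with 1 to 5 grains
  n[1] <- n[1] + 400 - length(tab)                   # tiles with no grains
  n[7] <- n[7] + sum(tab > 5)                        # tiles with 6 or more grains
}
setNames(round(n / sum(n), 5), c(0:5, "6+"))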
      0       1       2       3       4       5      6+
0.77854 0.19516 0.02421 0.00197 0.00012 0.00001 0.00000
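The article's claim concerns the fullest tile in each throw ("there was always a tile with two, three, four, even five or more grains on it"). A minimal sketch of that check follows, again assuming independent grains and equally likely tiles. It records the largest count on any tile in each simulated throw. The seed, the number of throws and the variable names are arbitrary choices for illustration.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
n_throws <- 20000    # number of simulated throws of 100 grains

# For each throw, the largest number of grains found on any single tile
max_per_tile <- replicate(
  n_throws,
  max(tabulate(sample.int(400, 100, replace = TRUE), nbins = 400))
)

# Proportion of throws in which the fullest tile holds at least k grains
prop_at_least <- sapply(2:5, function(k) mean(max_per_tile >= k))
print(setNames(round(prop_at_least, 4), paste0(2:5, "+")))
```

Under this model almost every throw puts two or more grains on some tile, but a tile with five or more grains should turn up only rarely.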