Monday, 2 July 2012

When Statistics Met Poohsticks

This blog follows a slightly different format to previous entries. Rather than review an interesting psychology study I offer something a little different, but still strongly themed by my day job of doing psychology research…

Chapter 6 of AA Milne’s “The House at Pooh Corner” introduces the now famous game of Poohsticks. The story goes that Winnie the Pooh invents the game after accidentally dropping a pine cone off a bridge into a flowing river. Having chanced upon the observation that a cone dropped over one side of the bridge will be carried by the current of the water passing beneath to the other side of the bridge, Pooh’s first thought is whether this might be what scientists would call a replicable phenomenon, or in everyday speak, something that can be repeated. He says: “That’s funny… I dropped it on the other side… and it came out on this side! I wonder if it would do it again?” For a character based on a soft toy, Pooh Bear has a remarkably scientific outlook on life!

He tries this several times and then by and by develops the game into a race by dropping two cones at once and trying to guess which will emerge on the other side of the bridge first as the race winner. When it was time to leave to go home for tea Pooh had won, by guessing the first cone correctly, 36 times, but lost 28 times. Milne suggests that this means that Pooh was – well, actually, the narrative stops short of making any kind of judgement as to Pooh’s predictive abilities. Milne explains it by saying that Pooh was: “well, you take twenty-eight from thirty-six, and that’s what he was”.

I was reading this story to my daughter at bedtime, and upon seeing this set of scores, the psychologist-statistician in me rather came to the fore. There is a statistical test of whether that distribution of scores – 36 correct and 28 incorrect – is likely to be due to chance, or not. If performance were at chance level, then this would suggest that Pooh was really only guessing which cone emerged from under the bridge first. The alternative possibility is that this number of correct predictions would be unlikely to be due to chance. In that case one could argue that Winnie the Pooh was applying some logic or skill of judgement in order to make consistently correct, above chance level predictions on the outcomes of Poohstick races. So, the first thing I did this morning was run the test and see!

The statistical test in question is called the “Chi-Square Goodness of Fit Test” (“chi” is pronounced like “sky” without the “s”). You can look up a technical description of it on Wikipedia, but I’ll try and provide a more straightforward description here.

It works by comparing real-life scores or data (here it is Pooh’s tally of Poohstick race results) with a perfect 50-50 chance level of performance. In the case of Poohsticks, for a score of 36 vs. 28 there must have been 64 races in total (36 + 28 = 64). For 64 Poohstick races, the perfect 50-50 chance level of performance is 32 guessed correctly and 32 guessed incorrectly. The chi-square test helps us to decide what we should make of a departure of 4 either side of that (32 – 4 = 28 and 32 + 4 = 36). There are two possible outcomes – probably chance level of performance (consistent with guessing) or probably non-chance level of performance (consistent with applying skill and judgement).
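For anyone who likes to see the arithmetic spelled out, here is a minimal sketch in Python of how the comparison is turned into a single number. The function name `chi_square_statistic` is my own choice for illustration; the formula is the standard one of summing (observed – expected)² / expected over the two categories.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [36, 28]           # Pooh's correct and incorrect guesses
total = sum(observed)         # 64 races altogether
expected = [total / 2] * 2    # 32 and 32 if Pooh were purely guessing

statistic = chi_square_statistic(observed, expected)
print(total, expected, statistic)  # 64 [32.0, 32.0] 1.0
```

Each category contributes 16 / 32 = 0.5, so the statistic comes to 1.0.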

The crucial thing that enables a decision to be made about chance or non-chance performance is that someone, somewhere made very many observations of what happens over a series of chance level 50-50 calls. Perhaps they tossed a coin very many times, each time guessing first whether heads or tails would come up, and keeping a tally of whether they were right or wrong each time. In doing this they were mapping and defining the chance level of performance. Knowing what happens by chance helps us to decide whether a new set of scores resembles chance, or something else. The decision rests on the size of the difference between the number of correct and incorrect calls. Differences so large that they occur no more than 5% of the time under chance conditions are deemed to be “statistically significant”. Such differences are usually understood to be unlikely to be due to chance, and so likely to be due to some kind of phenomenon, such as, in this example, skill at Poohsticks.
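These days a computer can do the coin tossing for us. The sketch below is only an illustration of the idea, not how the chi-square test itself is calculated: it plays many pretend series of 64 races in which every guess is a pure 50-50 call, and counts how often the gap between correct and incorrect guesses is at least as big as Pooh's gap of 8. The number of series (20,000) and the random seed are arbitrary choices of mine.

```python
import random


def simulate_guessing(n_races=64, n_series=20_000, seed=2012):
    """Return the |correct - incorrect| gap for each simulated series of pure guesses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    gaps = []
    for _ in range(n_series):
        # Each race is guessed correctly with probability 0.5
        correct = sum(rng.random() < 0.5 for _ in range(n_races))
        incorrect = n_races - correct
        gaps.append(abs(correct - incorrect))
    return gaps


gaps = simulate_guessing()
pooh_gap = 36 - 28
share = sum(gap >= pooh_gap for gap in gaps) / len(gaps)
print(f"Share of guessing-only series with a gap of {pooh_gap} or more: {share:.3f}")
```

The share printed is far above the 5% threshold, which is another way of seeing that a gap of 8 over 64 races is nothing unusual for a guesser.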

So which was it for Winnie the Pooh’s first ever set of Poohsticks scores? I ran the analysis using the computer software SPSS©. The chi-square goodness of fit test showed that there was no significant effect, chi-square = 1.000, df = 1, p = 0.317. (NB In that last sentence I have reported the chi-square test statistics in the same way that scientists would do in a research paper; “df” stands for degrees of freedom, and “p” stands for “probability”.) A non-significant result means there is no evidence that Pooh was performing at anything other than chance level. In other words, anyone could quite plausibly obtain a score of 36 correct predictions and 28 incorrect predictions just by guessing the outcome of a series of Poohsticks races without using any skill or judgement.
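If you don't have SPSS to hand, the p value can be checked with a few lines of Python. This is a sketch that only works for the df = 1 case reported above, where the chi-square tail probability equals erfc(√(χ²/2)); the function name `p_value_df1` is my own.

```python
import math


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    # With df = 1, chi-square is the square of a standard normal variable,
    # so the tail probability is the two-sided normal tail: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))


p = p_value_df1(1.0)  # the chi-square value reported above
print(round(p, 3))    # 0.317
```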

Based on his original Poohsticks predictions you could argue that there is no evidence of Winnie the Pooh being anything other than a Bear of Very Little Brain. But let’s cut him some slack – he did invent the still-popular pursuit of dropping cones and twigs into water on one side of a bridge before dashing across to watch them emerge on the other side. Poohsticks is a wonderful pastime in itself regardless of whether one can guess in what order they will appear. But just for the competitively minded out there – you would need to be correct in at least 40 out of 64 Poohsticks races in order for the chi-square test to return a significant result (p below 0.05). Better get practising!
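The 40-out-of-64 target can be checked by trying each possible score in turn. The sketch below assumes the usual 5% significance threshold and the same 50-50 expected split as before; the function names are mine, and the p value again uses the df = 1 shortcut via `math.erfc`.

```python
import math


def goodness_of_fit_p(correct, total):
    """p value of the chi-square goodness of fit test against a 50-50 split (df = 1)."""
    expected = total / 2
    incorrect = total - correct
    statistic = ((correct - expected) ** 2 + (incorrect - expected) ** 2) / expected
    return math.erfc(math.sqrt(statistic / 2))


def minimum_significant_wins(total=64, alpha=0.05):
    """Smallest number of correct guesses (at or above half) giving p < alpha."""
    for correct in range(total // 2, total + 1):
        if goodness_of_fit_p(correct, total) < alpha:
            return correct
    return None


print(minimum_significant_wins())  # 40
```

A score of 39 gives a chi-square of about 3.06, which falls short; 40 gives exactly 4.0, which just clears the bar.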

Post script
I posted the above on 2 July 2012. Today, 6 June 2013, a contributor, Eric, points out that I was not the first person to whom the idea of performing a chi-square test on Winnie the Pooh's performance at Poohsticks occurred! There is an article in the journal "Teaching Statistics" by Eric D. Nordmoe which precedes my effort by 8 years! Apologies, Eric.