#6
05-30-2007, 02:32 AM
peterchi
Senior Member

Join Date: Apr 2005
Location: passed 1st year comps
Posts: 1,823

Re: Statistical significance test for missing sets?

[ QUOTE ]
[ QUOTE ]
The probability of getting s or fewer sets is exactly what you want (anyway, there isn't much more to infer from it than that you've been unlucky in terms of flopping sets). Any statistical test with a fancy name would just be an approximate method for calculating Pr(S <= s_observed), and if you can calculate it exactly from the binomial distribution, an approximation is not needed.

[/ QUOTE ]

I'm not sure I understand. If I take one coinflip and get tails, then use the binomial distribution to calculate the probability of hitting 0 heads in 1 trial, I'll get - surprise surprise - 0.5. Now I'm having a hard time interpreting 0.5 as the probability of my coin being random. Any reasonable statistical significance indicator should give me a number very close to 1, shouldn't it?

[/ QUOTE ]
Not with a sample size of 1. You aren't going to get any test to give you a good answer with that.

Though, true, it's not exactly the "probability of your coin being random" as you sort of suspected.

What it really is, is the probability of observing data at least as extreme as what you observed, given that your null hypothesis (in this case, that your coin is fair) is true. (To be even more precise, I should say that it's the long-run probability of observing data at least as extreme as what we saw, under repeated sampling from the model of our null hypothesis.) This is what all Frequentist analyses (the most common approach) will give you. If this probability is very small, then we claim that there is evidence to reject the null hypothesis: if our null hypothesis were true, then what we actually observed would be quite unlikely, and that is evidence that our null hypothesis actually isn't true. It's kind of backwards thinking, but unless you are a Bayesian, it's all you can do.
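
To make that concrete, here is a minimal sketch (Python, using SciPy's binomial CDF; the numbers are just the coin example from above) of how that one-sided p-value is computed:

[ code ]
from scipy.stats import binom

# Null hypothesis: the coin is fair, so P(heads) = 0.5.
n_flips = 1   # sample size: a single coin flip
n_heads = 0   # observed: zero heads (one tails)

# One-sided p-value: probability of seeing this many heads
# or fewer if the null hypothesis is true.
p_value = binom.cdf(n_heads, n_flips, 0.5)
print(p_value)  # 0.5
[/ code ]

Note that with one flip, even the most extreme possible outcome has probability 0.5 under the null, so the p-value can never get below 0.5. That's exactly why no test can tell you anything at a sample size of 1.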

So, calculating the probability of flopping as many sets as you did or fewer, as was suggested, is exactly what you want. If it's less than 0.05 (a somewhat arbitrary, but the most commonly used, cut-off), then you would conclude that you ran bad.
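
As a rough sketch of that calculation (the per-flop set probability of about 0.118, i.e. roughly 1 in 8.5 for a pocket pair that sees a flop, and the sample counts here are illustrative assumptions, not anyone's actual numbers):

[ code ]
from scipy.stats import binom

p_set = 0.118       # approx. chance of flopping a set (or better)
                    # when a pocket pair sees the flop (~1 in 8.5)
n_flops = 500       # hypothetical: flops seen with a pocket pair
sets_observed = 40  # hypothetical: sets actually flopped

# Probability of flopping this many sets or fewer, under the
# null hypothesis that you're hitting at the expected rate:
p_value = binom.cdf(sets_observed, n_flops, p_set)
print(p_value)      # compare against the 0.05 cut-off
[/ code ]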