Two Plus Two Newer Archives - Latest cliffsnotes on Absolute soulreading.

flight2q · #80 09-24-2007, 07:00 AM

Josem, here is how you test whether there is a correlation between wolf howling and full moons. I will describe likelihood ratios. There are other ways, but this is a very standard one. What knappis was doing was close to a likelihood ratio, which is why I said he was on the right track.

You plan to observe the wolves on Df days with a full moon and Dn days without one. You will end up recording Hf days with howling among the full-moon days and Hn days with howling among the other days. You assume that on each day the wolves howl with some probability, either Pf or Pn depending on whether there is a full moon, and that the days are independent of one another. You don't know the values of Pf and Pn.

If howling is uncorrelated with full moons, then Pf=Pn. You set this as the null hypothesis. For the test, you calculate the likelihood (the probability, so to speak) of the observed data having occurred if the null hypothesis were true, and also if it were false. Since there are many possibilities for the pair <Pf,Pn> either way, you pick the pair that makes the likelihood maximal: under the null hypothesis that is Pf=Pn=(Hf+Hn)/(Df+Dn), and otherwise it is Pf=Hf/Df and Pn=Hn/Dn. So the uncorrelated and correlated likelihoods used will be:

Lu = [(Hf+Hn)/(Df+Dn)]^(Hf+Hn) * [1-(Hf+Hn)/(Df+Dn)]^(Df+Dn-Hf-Hn)
Lc = [Hf/Df]^(Hf) * [1-Hf/Df]^(Df-Hf) * [Hn/Dn]^(Hn) * [1-Hn/Dn]^(Dn-Hn)
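A minimal Python sketch of these two formulas, working with logarithms so that long observation periods do not underflow to zero. The function names are hypothetical, and reading 0^0 as 1 (so a zero count contributes nothing) is an assumption the plain-text formulas leave implicit.

```python
import math


def xlogy(x, y):
    """Return x * ln(y), treating 0 * ln(0) as 0 (the 0^0 = 1 convention)."""
    return 0.0 if x == 0 else x * math.log(y)


def max_log_lik(h, d):
    """ln of p^h * (1-p)^(d-h) at its maximum p = h/d (h howling days out of d)."""
    p = h / d
    return xlogy(h, p) + xlogy(d - h, 1 - p)


def log_likelihoods(hf, df, hn, dn):
    """Return (ln Lu, ln Lc) for the wolf-howling counts."""
    log_lu = max_log_lik(hf + hn, df + dn)              # pooled: Pf = Pn = (Hf+Hn)/(Df+Dn)
    log_lc = max_log_lik(hf, df) + max_log_lik(hn, dn)  # separate: Pf = Hf/Df, Pn = Hn/Dn
    return log_lu, log_lc


# Hypothetical counts, for illustration only.
log_lu, log_lc = log_likelihoods(hf=12, df=30, hn=60, dn=300)
print("likelihood ratio Lc/Lu =", math.exp(log_lc - log_lu))
```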

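And we call Lc/Lu the likelihood ratio. (These formulas are the probabilities of the exact day-by-day record observed, so no binomial coefficients appear; such coefficients would be the same in both and cancel in the ratio anyway.) The way this experiment is set up, the likelihood ratio is guaranteed to be at least 1: the null hypothesis is just the special case Pf=Pn of the correlated model, so maximizing over all pairs <Pf,Pn> can never give a lower likelihood than maximizing over only the pairs with Pf=Pn.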
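The guarantee that the ratio is at least 1 can also be checked algebraically. Arrange the counts as a 2-by-2 table (this notation is introduced here, not in the post): rows are full-moon days and other days, columns are howl and no howl, so O_11 = Hf, O_12 = Df-Hf, O_21 = Hn, O_22 = Dn-Hn, with row totals R_1 = Df and R_2 = Dn, column totals C_1 = Hf+Hn and C_2 = Df+Dn-Hf-Hn, N = Df+Dn, and "expected" counts E_ij = R_i C_j / N. Taking logs of the formulas for Lc and Lu (with 0 ln 0 read as 0) gives:

```latex
\begin{aligned}
\ln L_c &= \sum_{i,j} O_{ij}\,\ln\frac{O_{ij}}{R_i},
\qquad
\ln L_u = \sum_{i,j} O_{ij}\,\ln\frac{C_j}{N},\\
\ln\frac{L_c}{L_u} &= \sum_{i,j} O_{ij}\,\ln\frac{O_{ij}\,N}{R_i\,C_j}
  = \sum_{i,j} O_{ij}\,\ln\frac{O_{ij}}{E_{ij}}
  = N\,D_{\mathrm{KL}}\!\left(\tfrac{O}{N}\,\middle\|\,\tfrac{E}{N}\right) \;\ge\; 0 .
\end{aligned}
```

The last expression is N times a Kullback-Leibler divergence, which is never negative and is zero exactly when Hf/Df = Hn/Dn. Twice the same sum is the usual G statistic for a 2-by-2 table.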

We pick a value in advance of the experiment, call it L. We reject the null hypothesis if the likelihood ratio is greater than L. (This is a two-sided test. For a one-sided test, we would reject the null hypothesis only if both Lc/Lu>L and Hf/Df>Hn/Dn.) The choice of L depends on various things, but especially on how we will use the results of the experiment. If the purpose of the experiment is merely to decide whether to invest time and resources in a full-scale investigation, then we can set L to a moderate value. If the purpose is to take legal action, then we would want a rather high value. In any case, L>>1. We also need to consider effects that might arise if our data are not so good: for example, if our data came from going over records of people reporting wolf howlings, who might mention a full moon if there was one but not mention it otherwise.
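A sketch of the decision rule as code, with the one-sided variant as an option. The calibration function is an addition not in the post: it relies on the standard large-sample result (Wilks' theorem) that, under the null hypothesis, 2 ln(Lc/Lu) is approximately chi-square with one degree of freedom, which turns a threshold L into an approximate Type I error rate. That approximation is only reasonable when none of the four counts is small, and all function names here are hypothetical.

```python
import math


def _xlogy(x, y):
    # x * ln(y), with 0 * ln(0) taken as 0
    return 0.0 if x == 0 else x * math.log(y)


def log_likelihood_ratio(hf, df, hn, dn):
    """ln(Lc/Lu), using the maximum-likelihood <Pf, Pn> under each hypothesis."""
    def loglik(h, d):
        p = h / d
        return _xlogy(h, p) + _xlogy(d - h, 1 - p)
    return loglik(hf, df) + loglik(hn, dn) - loglik(hf + hn, df + dn)


def reject_null(hf, df, hn, dn, L, one_sided=False):
    """Two-sided: reject Pf = Pn if Lc/Lu > L. One-sided: also require Hf/Df > Hn/Dn."""
    exceeds = log_likelihood_ratio(hf, df, hn, dn) > math.log(L)
    if one_sided:
        # Compare Hf/Df > Hn/Dn by cross-multiplying, to stay in exact integers.
        return exceeds and hf * dn > hn * df
    return exceeds


def approx_false_rejection_rate(L, one_sided=False):
    """Wilks approximation: P(chi2_1 > 2 ln L) = erfc(sqrt(ln L)); one-sided is about half."""
    rate = math.erfc(math.sqrt(math.log(L)))
    return rate / 2 if one_sided else rate
```

For example, L of about 6.8 (so that 2 ln L is about 3.84) corresponds approximately to the familiar 5% two-sided level.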

How good is our experiment? To evaluate this, it would be useful to have an a priori probability distribution on the pair <Pf,Pn>. Then we could crunch a bunch of numbers and determine two probabilities of likely interest. The first is the probability that we reject the null hypothesis even though it is true. The second is the probability that we fail to reject the null hypothesis even though there is significant correlation. These are often called Type I and Type II errors, respectively. We can do these calculations for different values of L, our threshold for rejecting the null hypothesis, and trade off these risks against each other.
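For one assumed pair <Pf,Pn>, the two error probabilities can be computed exactly: (Hf, Hn) has only (Df+1)(Dn+1) possible values, each with a binomial probability under the independent-days model. A minimal sketch for the two-sided rule follows; the function names and the example design are hypothetical. With a prior on <Pf,Pn>, one would average these quantities over the prior, a step not shown here.

```python
import math


def _xlogy(x, y):
    # x * ln(y), with 0 * ln(0) taken as 0
    return 0.0 if x == 0 else x * math.log(y)


def _max_log_lik(h, d):
    p = h / d
    return _xlogy(h, p) + _xlogy(d - h, 1 - p)


def _binom_pmf(k, n, p):
    # Direct form; adequate for a few hundred days per group before floats overflow.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def rejection_probability(pf, pn, df, dn, L):
    """Exact P(Lc/Lu > L) when the true howling probabilities are pf and pn."""
    log_L = math.log(L)
    log_lik_n = [_max_log_lik(hn, dn) for hn in range(dn + 1)]
    pmf_n = [_binom_pmf(hn, dn, pn) for hn in range(dn + 1)]
    total = 0.0
    for hf in range(df + 1):
        prob_f = _binom_pmf(hf, df, pf)
        log_lik_f = _max_log_lik(hf, df)
        for hn in range(dn + 1):
            llr = log_lik_f + log_lik_n[hn] - _max_log_lik(hf + hn, df + dn)
            if llr > log_L:
                total += prob_f * pmf_n[hn]
    return total


def error_rates(pf, pn, df, dn, L):
    """Type I rate if pf == pn (null true); otherwise the Type II rate."""
    reject = rejection_probability(pf, pn, df, dn, L)
    return {"type_I": reject} if pf == pn else {"type_II": 1 - reject}


# Hypothetical design: 30 full-moon days, 300 other days, threshold L = 10.
print(error_rates(0.25, 0.25, 30, 300, L=10))
print(error_rates(0.40, 0.25, 30, 300, L=10))
```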

Generally, we don't have an a priori distribution, but we can make these calculations for a few different values of the pair <Pf,Pn> and ask ourselves what we think of the effectiveness of the experiment. If there is a significant probability of failing to reject the null hypothesis even though there is correlation between howling and full moons, then we say that the power of the experiment is low. A typical way to increase the power is to increase the sample sizes Df and Dn.
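To see how power grows with sample size, a quick Monte Carlo sketch can simulate many repetitions of the experiment for one assumed pair <Pf,Pn> and several hypothetical designs. The pair (0.40, 0.25), the 10-to-1 ratio of ordinary to full-moon days, and L = 10 are illustrative assumptions, not values from the thread. Simulation is used here instead of exact enumeration only because it extends easily to messier models, such as ones with serial correlation.

```python
import math
import random


def _xlogy(x, y):
    # x * ln(y), with 0 * ln(0) taken as 0
    return 0.0 if x == 0 else x * math.log(y)


def _llr(hf, df, hn, dn):
    """ln(Lc/Lu) for the observed counts."""
    def loglik(h, d):
        p = h / d
        return _xlogy(h, p) + _xlogy(d - h, 1 - p)
    return loglik(hf, df) + loglik(hn, dn) - loglik(hf + hn, df + dn)


def simulated_power(pf, pn, df, dn, L, trials=2000, seed=0):
    """Monte Carlo estimate of P(reject Pf = Pn) when the true pair is <pf, pn>."""
    rng = random.Random(seed)
    log_L = math.log(L)
    rejections = 0
    for _ in range(trials):
        # Each day is an independent Bernoulli trial, as the model assumes.
        hf = sum(rng.random() < pf for _ in range(df))
        hn = sum(rng.random() < pn for _ in range(dn))
        if _llr(hf, df, hn, dn) > log_L:
            rejections += 1
    return rejections / trials


# Hypothetical alternative, with ten ordinary days observed per full-moon day.
for df in (10, 30, 100):
    power = simulated_power(0.40, 0.25, df, 10 * df, L=10)
    print(f"Df={df:4d}  Dn={10 * df:5d}  estimated power={power:.2f}")
```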

If our use of the data is such that we will use a rather high value for L, we must take special care to ask ourselves a lot of questions about whether our model is correct and whether our data are valid. For example, our model might be bad if there is serial correlation in the observations (our calculations assumed the days were independent of one another). Our collected data can be tainted in various ways, and we have to question our procedures if we discarded some of the data (e.g., because we assumed the wolves had heard about our experiment and started deliberately howling when there was no full moon). For some of our concerns we may be able to devise tests to check our assumptions, e.g., a test for whether there is serial correlation.
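One possible check for serial correlation (a standard choice, not one the post specifies) is the Wald-Wolfowitz runs test on the day-by-day record of howling (1) and not howling (0), kept in calendar order. If howling days cluster, there are fewer runs than independence predicts. The z score below uses the normal approximation to the run count, so it needs a reasonably long record. One caveat: full-moon days themselves come in clusters, so if the moon does matter it can produce clustering on its own; applying the test separately to the full-moon days and to the other days, each in order, is a rough way to reduce that confound. The function name and example records are hypothetical.

```python
import math


def runs_test(days):
    """Wald-Wolfowitz runs test for a 0/1 sequence of daily howl indicators.

    Returns (runs, z). A strongly negative z (too few runs) suggests howling days
    cluster, i.e. positive serial correlation; a strongly positive z suggests alternation.
    """
    n1 = sum(days)          # howling days
    n0 = len(days) - n1     # quiet days
    n = n1 + n0
    if n1 == 0 or n0 == 0 or n < 3:
        raise ValueError("need both howling and quiet days, and at least 3 days")
    # A new run starts wherever consecutive days differ.
    runs = 1 + sum(1 for a, b in zip(days, days[1:]) if a != b)
    mean = 2 * n1 * n0 / n + 1
    var = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n**2 * (n - 1))
    return runs, (runs - mean) / math.sqrt(var)


# Hypothetical records: strongly clustered versus strictly alternating.
print(runs_test([1] * 10 + [0] * 10))
print(runs_test([0, 1] * 10))
```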