Two Plus Two Newer Archives - View Single Post - The envelope problem, and a possible solution

AaronBrown · #64 07-01-2006, 09:40 AM

[ QUOTE ]
For #2. "If you have a 50% chance of losing some amount and a 50% chance of winning twice that amount, is it a positive expected value bet?"

If "the amount" is fixed the answer is yes. However, if you have a 50% chance of winning but "the amount" is $1 when you win and "the amount" is $2 when you lose then your EV is 0.

How's that? Does that agree with common sense?

[/ QUOTE ]
So many different points are flying around that I think it's clearer if I try to answer one per post.

This is a very good statement of the frequentist position. You're not allowed to compute the expected value and have a variable in the result. So you can't say having 0.5 chance of getting $X/2 and 0.5 chance of getting 2*$X has an expected value of 1.25*$X.
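For anyone who wants to check the arithmetic, here is a minimal Python sketch of the three calculations in play. The helper name `expected_value` and the trick of setting the fixed amount (and X) to 1 are my own illustrative assumptions, not part of the original exchange.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)

half = Fraction(1, 2)

# Quoted bet with a fixed amount of 1: lose 1 or win 2, each with chance 1/2.
fixed_amount = expected_value([(half, -1), (half, 2)])

# Quoted bet where "the amount" is $1 when you win (so you win $2)
# and $2 when you lose (so you lose $2).
shifting_amount = expected_value([(half, 2), (half, -2)])

# The calculation the frequentist rule forbids: X/2 or 2X with equal
# chance, expressed as a multiple of X by setting X = 1.
multiple_of_x = expected_value([(half, Fraction(1, 2)), (half, 2)])

print(fixed_amount, shifting_amount, multiple_of_x)
```

The three results are 1/2, 0 and 5/4: positive for the fixed amount, zero when the amount shifts, and 1.25*$X for the computation carried out in terms of a variable.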

The trouble is that for any problem outside a textbook, you have to say this. Even inside a textbook you'll find loads of formulae showing expected value and other parameters computed in terms of variables. A statistical test shows you how to plug your measurements into a formula to get a significance level; under this rule, that formula is not allowed.

It leads to impossible hair-splitting, which is why Bayesians reject the rule. For a simple example, three baseball fans are arguing over who has the best true underlying batting average in the majors. One says Twin Joe Mauer (102 hits, 260 at bats, 299 plate appearances for a 0.392 average); one says Marlin Miguel Cabrera (94 hits, 274 at bats, 337 plate appearances for a 0.343 average); one says Red Sox (Sock?) Mark Loretta (100 hits, 312 at bats, 339 plate appearances, for a 0.321 average).

Mauer's fan says the other two are clearly wrong: the probability of two hitters with the same true underlying batting average showing this much difference over this many at-bats is less than 5%. This has nothing to do with baseball; it's a pure mathematical question. All agree to accept the standard assumptions that for one player each trial has a fixed probability of success, and that all trials for all players are independent. All agree that the data above are correct.

A frequentist statistician from Florida is hired to settle the question. He starts by computing the single batting average that is most likely to produce both Mauer's and Cabrera's results: 0.367. He computes that the probability a true 0.367 hitter will get 102 or more hits out of 260 at bats is 0.181, and the probability of 94 or fewer hits out of 274 at bats is 0.224. Two times the product of these probabilities (since either one could have had the higher actual average) is 0.081. This is more than 0.05, so he cannot reject at the 5% level the hypothesis that Mauer and Cabrera have the same true underlying batting average. The same calculation for Loretta shows a 0.021 significance, so we can reject at the 5% level the hypothesis that Loretta has the same true batting average as Mauer. This is more accurate than the standard t-test because it uses the exact binomial distribution, but the results are similar: the t-test gives 0.119 significance for Cabrera and 0.037 for Loretta.
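The Florida procedure can be sketched in a few lines of Python for the Mauer and Cabrera comparison. This is my reading of the description above, not the statistician's actual worksheet: the function names are hypothetical, and I am assuming "most likely single average" means the pooled hits over pooled at bats. The exact figures depend on rounding and on whether each tail includes the observed count, so the output may not reproduce the quoted numbers exactly.

```python
from math import comb

def binom_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

mauer_hits, mauer_ab = 102, 260
cabrera_hits, cabrera_ab = 94, 274

# Assumed: the shared average is pooled hits divided by pooled at bats.
pooled = (mauer_hits + cabrera_hits) / (mauer_ab + cabrera_ab)

p_mauer_high = upper_tail(mauer_hits, mauer_ab, pooled)
p_cabrera_low = lower_tail(cabrera_hits, cabrera_ab, pooled)

# Doubled because either player could have been the one with the higher average.
significance = 2 * p_mauer_high * p_cabrera_low

print(f"pooled average         {pooled:.3f}")
print(f"P(Mauer >= 102 hits)   {p_mauer_high:.3f}")
print(f"P(Cabrera <= 94 hits)  {p_cabrera_low:.3f}")
print(f"significance           {significance:.3f}")
```

Swapping in Loretta's 100 hits and 312 at bats for Cabrera's line repeats the calculation for the second comparison.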

The Boston fan doesn't like the result so he hires a frequentist statistician from Harvard. "Aha," he says in a thick Harvard accent, "that Florida idiot was computing in terms of a variable. Some plate appearances do not count as at bats (walks, errors, hit by pitch and sacrifices) so the number of at bats for a player was not fixed. We have to do a trinomial computation, assuming the most likely values for joint probability of hit per plate appearance and most likely individual values for probability of no at bat per plate appearance. Now the significance level for Cabrera is 0.026, so we can reject him as Mauer's equal, but Loretta gets 0.070, so his batting cannot be distinguished at the 5% level from Mauer's."

The problem is that the frequentist approach is only rigorous before the experiment is performed, and only if every eventuality is defined (including the significance level; if you wait until after the experiment is performed to select your level, you have erred). Once you have real data, you have to invent a hypothetical repeatable experiment. The Florida statistician assumes each batter got the actual number of at-bats; the Harvard guy assumes each batter got the actual number of plate appearances. We could equally well assume that each batter hit until he got the actual number of hits, or that each batter played in the actual number of games with plate appearances drawn from some distribution, or lots of other things. Each assumption gives a different answer.
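To see how the choice of hypothetical experiment moves the number, here is a small Python sketch using Cabrera's line and the pooled 0.367 average. It compares just two of the possible designs, fixed at-bats versus "hit until he got the actual number of hits". The variable names and the way I define "as bad or worse" under each design are my own assumptions.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

hits, at_bats = 94, 274            # Cabrera's line from the post
p = (102 + 94) / (260 + 274)       # pooled average with Mauer, about 0.367

# Design A: the number of at-bats was fixed at 274 in advance.
# "As bad or worse" means 94 or fewer hits in 274 at-bats.
fixed_at_bats = binom_cdf(hits, at_bats, p)

# Design B: the batter kept hitting until his 94th hit.
# "As bad or worse" means needing 274 or more at-bats,
# which is the same as at most 93 hits in the first 273 at-bats.
stop_at_hits = binom_cdf(hits - 1, at_bats - 1, p)

print(f"fixed at-bats design:   {fixed_at_bats:.4f}")
print(f"stop-at-94-hits design: {stop_at_hits:.4f}")
```

The data are identical in both cases, yet the two tail probabilities are not equal, because each design counts a different set of unobserved outcomes as "more extreme".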

Some people devote a lot of energy to defining rules so that there is one "natural" answer to common questions. But most people have given up. The outcome of a frequentist significance test on existing data is a matter of opinion.

Bayesians throw all this away and use only confidence intervals, not significance tests. So they have no hair-splitting or inconsistencies. Unfortunately, the probability is still a matter of opinion. The only difference is that the two statisticians don't have to criticize each other's work; they just reach opposite conclusions because they have different priors.
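A rough Python sketch of how two priors pull the same data apart is below. Everything about the priors is my own assumption for illustration: a flat Beta(1, 1) for one statistician and a hypothetical Beta(80, 220) (centered near a 0.267 average) for the other. The function name, the draw count and the seed are likewise made up, and the probability is estimated by simulation rather than computed exactly.

```python
import random

def prob_first_better(hits_a, ab_a, hits_b, ab_b, prior_hits, prior_outs,
                      draws=50_000, seed=0):
    """Monte Carlo estimate of P(true average of A > true average of B),
    giving each batter the same Beta(prior_hits, prior_outs) prior."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Beta prior + binomial data gives a Beta posterior for each batter.
        a = rng.betavariate(prior_hits + hits_a, prior_outs + ab_a - hits_a)
        b = rng.betavariate(prior_hits + hits_b, prior_outs + ab_b - hits_b)
        wins += a > b
    return wins / draws

# Mauer (102 for 260) versus Cabrera (94 for 274).
flat = prob_first_better(102, 260, 94, 274, 1, 1)         # assumed flat prior
skeptical = prob_first_better(102, 260, 94, 274, 80, 220) # assumed informative prior

print(f"P(Mauer better), flat prior:      {flat:.3f}")
print(f"P(Mauer better), skeptical prior: {skeptical:.3f}")
```

The skeptical prior shrinks both batters toward an ordinary average, so it gives Mauer a smaller edge than the flat prior does. Neither calculation is wrong; they just start from different opinions.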