Two Plus Two Newer Archives  

General Gambling > Probability
#1 - 12-31-2006, 01:51 AM
Daisydog (Member; joined May 2006; 60 posts)
Statistics Question (hard, I think)

Let X be a random variable with an unknown distribution

Let X_bar(n) be a sample mean calculated based on n samples from the distribution of X

Now, suppose your goal is to estimate Var(X) and the only thing you are allowed to observe is X_bar(n) 50 times (but not the underlying 50*n individual samples used to calculate the 50 sample means). You are allowed to choose n.

The estimator you are using is n*(sample variance of X_bar(n)), which should be an unbiased estimator of Var(X), I think.

The question is: what value of n will minimize the variance of the estimator?

Based on some simulations I've done in Excel, I'm pretty sure the answer depends on the distribution of X, but on exactly what I am not sure.
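For concreteness, here is a minimal simulation sketch of the setup (my own, not from the thread), using Uniform(0,1) as a stand-in for X since its distribution is unspecified:

```python
# Sketch of the setup: observe 50 sample means of size n, and estimate
# Var(X) as n * (sample variance of the 50 means).
import random
import statistics

def estimate_var(n, num_means=50, draw=random.random):
    """Observe num_means sample means of size n; return n * their sample variance."""
    means = [statistics.fmean(draw() for _ in range(n)) for _ in range(num_means)]
    # statistics.variance uses the unbiased (N-1) form, so because
    # E[sample variance of the means] = Var(X)/n, this estimator is
    # unbiased for Var(X).
    return n * statistics.variance(means)

random.seed(42)
print(estimate_var(1))    # true Var of Uniform(0,1) is 1/12 ~ 0.0833
print(estimate_var(100))  # unbiased for any choice of n
```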
#2 - 12-31-2006, 05:02 AM
Siegmund (Senior Member; joined Feb 2005; 1,850 posts)
Re: Statistics Question (hard, I think)

Sure looks to me like for large n it ought to be independent of n. You are taking the variance of 50 observations, either way.

Do you have some particular reason to believe it *isn't* independent of n (other than some "n vs. n-1" effects if n is extremely small)?
#3 - 12-31-2006, 12:13 PM
AaronBrown (Senior Member; New York; joined May 2005; 2,260 posts)
Re: Statistics Question (hard, I think)

In theory, the larger the n, the smaller the variance of your estimator. However, there are practical considerations that might lead you to reduce n. For example, if you cannot measure with infinite precision, if n gets too large, all your means will be the same and you will get an estimate of zero.
#4 - 12-31-2006, 05:57 PM
Daisydog (Member; joined May 2006; 60 posts)
Re: Statistics Question (hard, I think)

Based on simulations, I have noticed the following (which I haven't proven theoretically):

1. When X is distributed normally, the variance of the estimator appears to be independent of n, so you get just as good an estimate with n=1 as with n=1,000,000.

2. When X is not normally distributed, the variance of the estimator appears to be independent of n if n is large. I assume this is because the sample mean becomes approximately normally distributed for large n.

3. For some distributions of X, the estimator appears to perform best when n=1. For example, this seems to occur when X is distributed uniformly and when X is distributed Bernoulli with p=.5. Note that skewness = 0 for both of these.

4. For distributions of X that are highly skewed, like Bernoulli with p=.01, the estimator appears to perform best when n is large. Using n=1 appears to give a very poor estimate.

I'd be interested if anyone can back this up with any theory.
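A sketch (mine, not from the thread) that reproduces observations 3 and 4 empirically, by measuring the spread of the estimator itself for symmetric vs. highly skewed Bernoulli distributions:

```python
# Compare the spread (sd) of the estimator n * var(50 sample means)
# for small vs large n, for X ~ Bernoulli(p).
import random
import statistics

def estimator_sd(n, p, trials=200, num_means=50):
    """Empirical standard deviation of the estimator over many trials."""
    ests = []
    for _ in range(trials):
        means = [sum(random.random() < p for _ in range(n)) / n
                 for _ in range(num_means)]
        ests.append(n * statistics.variance(means))
    return statistics.stdev(ests)

random.seed(0)
# Symmetric Bernoulli(0.5) (skewness 0): n=1 is already good.
print(estimator_sd(1, 0.5), estimator_sd(100, 0.5))
# Highly skewed Bernoulli(0.01): n=1 is much noisier than large n.
print(estimator_sd(1, 0.01), estimator_sd(100, 0.01))
```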
#5 - 12-31-2006, 06:51 PM
AaronBrown (Senior Member; New York; joined May 2005; 2,260 posts)
Re: Statistics Question (hard, I think)

You are correct, I was not.

The variance of the sample variance of a Normal distribution is 2*s^4*(N-1)/N^2, where N = 50 in this case. That means 0.0392*s^4. If the standard deviation of X is s, then the standard deviation of X_bar is s/n^0.5. That makes the variance of the sample variance of X_bar 0.0392*s^4/n^2. But when you multiply the sample variance of X_bar by n to get an unbiased estimate of the variance of X, its variance is multiplied by n^2, so you end up with the same variance for the estimator, for all n.

For non-Normal distributions, the result depends on the kurtosis (the central fourth moment divided by the variance squared). The variance of the sample variance is:

s^4*[(k - 1)*N^2 - 2*(k - 2)*N + k - 3]/N^3

where k is the kurtosis. k = 3 for a Normal distribution, so this reduces to the formula above. The s^4 term makes no difference, as shown above for the Normal case. So the only thing that changes with n is the kurtosis of X_bar: if X has a finite fourth moment, the kurtosis of X_bar(n) is 3 + (k - 3)/n, which tends toward 3 as n grows.

Therefore, if the distribution has "fat tails" (k > 3), increasing n will pull the kurtosis of X_bar down toward 3 and reduce the variance of the estimator. Thin-tailed distributions, with k < 3, will see the opposite effect.
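As a numeric check (my own sketch, using the biased-sample-variance convention of this post and the fact that the kurtosis of X_bar(n) is 3 + (k - 3)/n for i.i.d. samples with finite fourth moment):

```python
# Variance of the estimator n * (sample variance of N means of size n),
# as a function of the kurtosis k of X. Note n^2 * (s/sqrt(n))^4 = s^4,
# so only the kurtosis term actually depends on n.
def var_of_estimator(k, n, N=50, s4=1.0):
    kbar = 3 + (k - 3) / n  # kurtosis of the sample mean X_bar(n)
    return s4 * ((kbar - 1) * N**2 - 2 * (kbar - 2) * N + (kbar - 3)) / N**3

for n in (1, 10, 100):
    print(n, var_of_estimator(k=9.0, n=n))   # fat tails: shrinks as n grows
for n in (1, 10, 100):
    print(n, var_of_estimator(k=1.8, n=n))   # uniform X (k = 1.8): grows with n
```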
#6 - 01-02-2007, 01:13 AM
Daisydog (Member; joined May 2006; 60 posts)
Re: Statistics Question (hard, I think)

Thank you. Your posts are always enlightening. I had been trying to derive a theoretical formula for the variance of the sample variance of the sample mean and it was getting too complicated. It looked like it involved E(X^4), so the formula with the kurtosis is not a surprise. I was guessing it had something to do with skewness too, but I guess not. Just curious: how were you able to come up with that formula so quickly? I'm guessing an old textbook, or did you just have it committed to memory?

Follow-up question: if X is the distribution of win/loss on individual hands of poker, would you agree that X would have a high kurtosis, and thus that the best estimator would use a large n, say greater than 1,000? When I talk about poker hands here I mean hands simulated from something like Turbo Texas Hold'em, where I think individual hands should be independent.
#7 - 01-03-2007, 08:36 PM
AaronBrown (Senior Member; New York; joined May 2005; 2,260 posts)
Re: Statistics Question (hard, I think)

Thanks for the kind words, especially since I was dead wrong in my first response.

I do remember the relations of the first four cumulants, and it's just a little algebra to derive the variance of the sample variance.

Yes, I agree that the more hands you use per simulation, the more accurate your sample variance will be. But you're better off putting all the hands in one big pool.
#8 - 01-03-2007, 10:10 PM
Daisydog (Member; joined May 2006; 60 posts)
Re: Statistics Question (hard, I think)

[ QUOTE ]
Yes, I agree that the more hands you use per simulation, the more accurate your sample variance will be. But you're better off putting all the hands in one big pool.

[/ QUOTE ]

With the simulation program you can't put all the hands in one big pool. All it tells you is the total win/loss after n hands. Given this, I think the best way to estimate the variance is to run the simulation, say, 50 times with n=10,000 or more and use that to derive a per-hand sample variance.

These simulation programs are not the best, but I think you can use them to get a reasonable approximation of the variance and of how certain game conditions affect it.
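A sketch of that batch approach (mine, not from the thread; the simulator is replaced by a Normal stand-in with a known per-hand variance): each reported total is a sum of n i.i.d. per-hand results, so Var(total) = n * Var(per-hand), and dividing the sample variance of the totals by n recovers a per-hand variance estimate.

```python
# Convert the variance of batch totals into a per-hand variance estimate.
import random
import statistics

def per_hand_variance(batch_totals, hands_per_batch):
    """Var(total of n i.i.d. hands) = n * Var(per-hand), so divide by n."""
    return statistics.variance(batch_totals) / hands_per_batch

# Stand-in for the simulator: 50 runs of 10,000 hands each,
# per-hand results with true variance 4.0 (sd of 2 units per hand).
random.seed(0)
n = 10_000
totals = [sum(random.gauss(0.05, 2.0) for _ in range(n)) for _ in range(50)]
print(per_hand_variance(totals, n))  # should be near 4.0
```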