(Long & Stats heavy)- Calculating confidence levels for SNGs.

fraz8000 · #1 06-26-2007, 10:38 AM

I hope this isn’t too esoteric/long/confused but I really need help on this. If anyone knows a bit about stats and could help fill me in it would be greatly appreciated. I know a little bit about stats so you won’t have to dumb it down :)

Say I have a sample of 2000 sngs for a Party $22 buyin. Say the ROI is about 10%, with

12%- 1st
12%- 2nd
12%- 3rd
64%- OOTM

finishes (I know the maths doesn’t add up exactly but I made these up for simplicity).

So the mean of the tournaments is $2.20 per tourney, with standard deviation ‘sd’.

I want to create confidence intervals for my sng profitability ($ per tourney). Up until a while ago I used the formula-

[( X – 2.2) / sd]
(Where X is the hypothesized population mean)

and found values for X which give sample statistics relating to the 5% two-tailed test for the standard normal distribution.

Now I realize this is wrong, because the results for a SNG are obviously not normally distributed, and have only 4 possible values (-$22, $18, $38, $78) with nothing in between.

Is there a way to create confidence-intervals (or some equivalent) for non-normally distributed populations?

I thought about dividing the sample into groups of ‘n’ SNGs, with 2000/n groups. Obviously the larger the value of n, the more the group totals will converge to a normal distribution, but the fewer groups I will have overall. The mean of the group profits would naturally be [n*($2.20)]. From what I understand, the standard deviation of a group's total profit will be [(sd)*root(n)]. So, using the standard normal process for groups of, say, n = 50, I could then divide the resulting confidence interval by 50 to get the confidence interval for $ per tourney.

This is all I can think of and seems a bit tedious. I’m not even sure if it solves the problem. Are there any other ways?
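A minimal sketch of the normal approximation for $ per tourney, assuming the 2000 results are independent and using the made-up 12/12/12/64 finish frequencies from the post. With those rounded frequencies the mean works out to $2.00 rather than $2.20. The standard error of the sample mean is sd/sqrt(2000), which is also what the grouping idea reduces to once the group interval is divided by n. The names `OUTCOMES`, `mean_and_sd` and `normal_ci` are illustrative only.

```python
import math

# Net result per finish (prize minus the $22 buy-in) with the made-up
# finish frequencies from the post: (profit in $, probability).
OUTCOMES = [(78.0, 0.12), (38.0, 0.12), (18.0, 0.12), (-22.0, 0.64)]

def mean_and_sd(outcomes):
    """Mean and standard deviation of a single tourney's profit."""
    mean = sum(x * p for x, p in outcomes)
    var = sum(p * (x - mean) ** 2 for x, p in outcomes)
    return mean, math.sqrt(var)

def normal_ci(mean, sd, n, z=1.96):
    """Normal-approximation interval for the mean profit over n independent tourneys."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

mean, sd = mean_and_sd(OUTCOMES)
low, high = normal_ci(mean, sd, n=2000)
print(f"mean ${mean:.2f}, sd ${sd:.2f}, 95% CI (${low:.2f}, ${high:.2f})")
```

With real data, the sample mean and sample standard deviation of the 2000 results would replace the values computed from the assumed frequencies.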

rufus · #2 06-26-2007, 01:17 PM

There are ways to create confidence intervals for non-normal distributions, but I don't think that's really what you want to do here.

Really, what you're looking at is a process with four possible results, and you've run it a bunch of times. If we make the questionable assumption that the trials are independent, then the traditional normal distribution approach applies.

Basically, the width of your 95% confidence interval is going to be roughly 2 divided by the square root of the number of trials, i.e. plus/minus 1 over the square root.

So, with 2000 trials your 95% confidence interval is roughly plus/minus 1/sqrt(2000) = 2.24%, call it 2.3%. (There's some fudging possible to make all of this more precise, but you really don't care about it.)

So, based on the assumptions and the data, you can say that you are 95% confident that you've got a 9.7-14.3% chance of taking first, and so on.

You can use this to calculate an expected return and so on, relatively easily.

(Mathematically, it's possible to trade off a stronger confidence for a larger interval, and to adjust the confidence interval based on the apparent probability, but both of those are really splitting hairs for something like this.)
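As a rough illustration of the two options mentioned above, the sketch below compares the 1/sqrt(n) rule of thumb (the worst case, p = 0.5) with a half-width that is adjusted for the observed frequency. It assumes independent trials and the usual normal approximation to a proportion; the function names are made up for this example.

```python
import math

def conservative_half_width(n):
    """Worst case (p = 0.5): 1.96 * sqrt(0.25 / n), roughly 1 / sqrt(n)."""
    return 1.0 / math.sqrt(n)

def proportion_half_width(p, n, z=1.96):
    """Half-width that uses the observed frequency p instead of the worst case."""
    return z * math.sqrt(p * (1 - p) / n)

n = 2000
print(f"rule of thumb +/- {conservative_half_width(n):.4f}")
for label, p in (("1st/2nd/3rd", 0.12), ("OOTM", 0.64)):
    print(f"{label:12s} +/- {proportion_half_width(p, n):.4f}")
```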

Now, if you're concerned that the SNG's are not independent, you can test that hypothesis by, for example, splitting the 2000 trials into two sets of 1000 trials and seeing how close the averages are between the two subsets. (Of course, to do this with good confidence, you'd have to get a lot of trials, which could be very expensive.)

AaronBrown · #3 06-26-2007, 09:21 PM

rufus is correct, but you can do this exactly if you want. If the outcomes are independent, you have a multinomial distribution with three parameters:

p1 = the chance of finishing first
p2 = the chance of finishing second
p3 = the chance of finishing third

Your ROI is (100*p1 + 60*p2 + 40*p3)/22 - 1. The chance of observing exactly 240 each of firsts, seconds and thirds out of 2,000 trials is 2,000!*p1^240*p2^240*p3^240*(1-p1-p2-p3)^1,280/(240!*240!*240!*1,280!). The problem is to solve for the lowest ROI such that this probability equals 0.025 (also for the highest ROI).
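The factorials in that expression are far too large to evaluate directly, but the probability can be computed on a log scale. The sketch below is one way to do it, using the standard library's `math.lgamma` (log of the gamma function, so `lgamma(k + 1)` is log k!). The helper name `log_multinomial_pmf` and the candidate values of p1, p2, p3 are assumptions for illustration.

```python
import math

def log_multinomial_pmf(counts, probs):
    """Log of the multinomial probability of seeing exactly `counts`."""
    n = sum(counts)
    # log( n! / (c1! * c2! * ...) ) without ever forming the factorials
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coeff + sum(c * math.log(p) for c, p in zip(counts, probs))

counts = (240, 240, 240, 1280)   # 1st, 2nd, 3rd, OOTM
p1, p2, p3 = 0.12, 0.12, 0.12    # candidate finish probabilities
probs = (p1, p2, p3, 1 - p1 - p2 - p3)

log_p = log_multinomial_pmf(counts, probs)
roi = (100 * p1 + 60 * p2 + 40 * p3) / 22 - 1
print(f"log-probability {log_p:.3f}, probability {math.exp(log_p):.3e}, ROI {roi:.1%}")
```

Trying other candidate values of p1, p2 and p3 shows how the probability and the ROI move together.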

If you want to solve this problem, I can show you how to do it. It's a bit tedious to type it in, however, so if you're satisfied with the Normal approximation, I won't bother.

fraz8000 · #4 06-28-2007, 03:08 AM

rufus and aaron thanks a lot for your replies.

I think I am satisfied with the normal approximation, but aaron could you please briefly explain how I could get the ROI figures the multinomial way? I am assuming the multinomial confidence level would be slightly more accurate because it is not based on the false assumption of normality, is this correct? I am interested in seeing how big the discrepancy is.

If it's a lot of trouble to type you could just give me the general overview and I can look it up. I've gone over my stats notes on multinomials, and I can't think of any way to work out confidence limits except for just plugging in values which wouldn't be practical given such big factorials.

Thanks

jogsxyz · #5 06-28-2007, 11:05 AM

Settle for the approximation.

For 10-player SnGs the aggregate table standard deviation is 1.67 buy-ins for one SnG.
That ought to be close enough and easy to solve.
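A quick sketch of that shortcut, taking the 1.67 buy-in figure from this post at face value and assuming independent results. The 10% ROI and the 2000-game sample come from the original question; `roi_interval` is a made-up helper name.

```python
import math

def roi_interval(roi, sd_buyins, n, z=1.96):
    """Normal-approximation interval for ROI, with the SD measured in buy-ins."""
    half_width = z * sd_buyins / math.sqrt(n)
    return roi - half_width, roi + half_width

low, high = roi_interval(roi=0.10, sd_buyins=1.67, n=2000)
print(f"95% interval for ROI: {low:.1%} to {high:.1%}")
```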

rufus · #6 06-28-2007, 02:23 PM

[ QUOTE ]

If it's a lot of trouble to type you could just give me the general overview and I can look it up. I've gone over my stats notes on multinomials, and I can't think of any way to work out confidence limits except for just plugging in values which wouldn't be practical given such big factorials.

[/ QUOTE ]

I think Aaron was talking about more precise ROI approximations using the assumption that SnG's are independent (i.e. assuming a Normal distribution).

Since the system is really rather simple, it's easy to come up with an EROI range:
Start with the 95% confidence half-width for 2000 trials, 1/sqrt(2000) = .0224, rounded up to .023 below.

To find the 'best case', assume maximal probabilities for the best finishes and minimal for the worst:
.12+.023=.143 (1st and 2nd)
.12-.023=.097 (3rd)
.64-.023=.617 (OOTM)
This gives an upper limit:
(.143*100+.143*60+.097*40)-22=(14.3+8.58+3.88)-22=4.76
And reverse it for the worst case:
(.097*100+.097*60+.143*40)-22=(9.7+5.82+5.72)-22=-0.76

So, you have 95% confidence that the EROI is within (-0.76,4.76).

Taking a marginally more sophisticated approach, the confidence intervals are actually smaller for frequencies that are far from .5 (in this case about +/- 1.6%), so there is 95% confidence that the EROI is within (0.00,4.00).

Another improvement: since the extremes are both unlikely, it's possible to use intervals with weaker confidence for the EROI. That is more involved, so I'm too lazy to do it, but it will probably produce an even smaller 95% EROI confidence interval. I'd guess around (0.5,3.5).
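The best-case/worst-case arithmetic in this post can be sketched as below. It follows the worked numbers (14.3 + 8.58 + 3.88): 1st and 2nd are shifted one way and 3rd the other, with OOTM absorbing the remainder. The function names are illustrative, and the two half-widths are the .023 and 1.6% figures quoted above.

```python
BUY_IN = 22.0
PRIZES = (100.0, 60.0, 40.0)  # 1st, 2nd, 3rd

def expected_profit(p1, p2, p3):
    """Expected $ per tourney for the given finish probabilities."""
    return p1 * PRIZES[0] + p2 * PRIZES[1] + p3 * PRIZES[2] - BUY_IN

def crude_range(p, h):
    """Shift 1st/2nd up and 3rd down by h for the best case, and the reverse for the worst."""
    best = expected_profit(p + h, p + h, p - h)
    worst = expected_profit(p - h, p - h, p + h)
    return worst, best

for h in (0.023, 0.016):
    worst, best = crude_range(0.12, h)
    print(f"+/- {h:.3f}: EROI between {worst:.2f} and {best:.2f}")
```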

AaronBrown · #7 06-28-2007, 08:44 PM

Let me show the method using smaller numbers, and you can see if it's worth the trouble for the full problem. Before you work through the whole thing step-by-step, you should know I made an early mistake, so the answer is not correct but the method is.

Your tournament costs $1 to enter and pays $7 for first and $3 for second. You've entered three times, and won 1st place once, 2nd place once and out of the money the third time. Let p1 be your chance of finishing first and p2 be your chance of finishing second. Your ROI is:

7*p1 + 3*p2 - 1

and the chance of observing the three finishes is:

6*p1*p2*(1 - p1 - p2)

We want to maximize and minimize (two separate calculations) the ROI, subject to making the probability equal 0.025. We could solve for p2 in terms of p1 using the constraint and substitute that into the ROI, but that would get unwieldy for more variables (as in your problem with three places). Instead, we'll use Lagrange multipliers.

7*p1 + 3*p2 - 1 - v*[6*p1*p2*(1 - p1 - p2) - 0.025]

Take the derivative with respect to p1:

7 - v*[6*p2*(1 - p2) - 12*p1*p2] = 0
7 - 6*v*p2*[1 - p2 - 2*p1] = 0
7/(6*v*p2) = 1 - p2 - 2*p1
p1 = 1/2 - p2/2 - 7/(12*v*p2)

Let's put that back into the original equation:

3/4 - 19*p2/4 - 49/(24*v*p2) - v*(3 - 3*p2 - 0.025)

Take the derivative with respect to p2:

-19/4 + 49/(24*v*p2^2) + 3*v = 0
49/(24*v*p2^2) = 19/4 - 3*v
24*v*p2^2 = 196/(19 - 12*v)
p2^2 = 49/[6*v*(19 - 12*v)]
p2 = 7/SQRT[6*v*(19 - 12*v)]

Now the problem is to find a v such that 6*p1*p2*(1- p1 - p2) = 0.025. You can either do this numerically or using Goal Seek in Excel. You'll find v = 1.379 works, but unfortunately results in p1 = -0.549 and p2 = 1.554. I forgot to add additional constraints to make p1> 0, p2 > 0 and p1 + p2 < 1. So you see how much work even a simple problem is.

Going back and doing it right was too much trouble to type in, but you get an interval of (-36%, 526%). Remember, it's based on only three results. -36% corresponds to p1 = 0.0476, p2 = 0.1030 and 526% corresponds to p1 = 0.856, p2 = 0.0902. It's interesting that the chance of finishing second is pretty much constant; the range is determined almost completely by different chances of first.
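For the small three-tourney example, the constrained problem can also be checked by brute force instead of Lagrange multipliers. The sketch below is not the method used in this post: it scans p1, solves the constraint 6*p1*p2*(1 - p1 - p2) = 0.025 for p2 as a quadratic, and keeps the smallest and largest ROI found. The function names and the grid size are arbitrary choices. The results should land close to the (-36%, 526%) interval quoted above.

```python
import math

TARGET = 0.025  # probability level used in the post

def roi(p1, p2):
    return 7 * p1 + 3 * p2 - 1

def roi_range(target=TARGET, steps=20000):
    """Return (min, max) as (roi, p1, p2) tuples along the constraint curve."""
    best_low = best_high = None
    for i in range(1, steps):
        p1 = i / steps
        # The constraint is a quadratic in p2:
        #   p2^2 - (1 - p1)*p2 + target / (6*p1) = 0
        b = 1 - p1
        disc = b * b - 4 * target / (6 * p1)
        if disc < 0:
            continue  # no p2 satisfies the constraint for this p1
        for sign in (-1, 1):
            # Both roots are positive and sum to 1 - p1, so p1 + p2 < 1 holds.
            p2 = (b + sign * math.sqrt(disc)) / 2
            point = (roi(p1, p2), p1, p2)
            if best_low is None or point < best_low:
                best_low = point
            if best_high is None or point > best_high:
                best_high = point
    return best_low, best_high

low, high = roi_range()
print(f"lowest ROI  {low[0]:.1%} at p1={low[1]:.4f}, p2={low[2]:.4f}")
print(f"highest ROI {high[0]:.1%} at p1={high[1]:.4f}, p2={high[2]:.4f}")
```

A grid scan like this only works comfortably with two free probabilities; the three-place problem would need a proper constrained optimizer.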