For those who don't think 30+ BI swings are a reality... - Page 12

AssFrister · #**111** 01-10-2007, 09:40 PM

[ QUOTE ]
The moment pokey started posting I was lost

[/ QUOTE ]

But incredibly turned on though. Don't deny it.

MyTurn2Raise · #**112** 01-11-2007, 01:00 AM

[ QUOTE ]
What seems to be forgotten in this discussion is that according to the Central Limit Theorem, the SUM of the 1000 samples will be approximately normally distributed, no matter the distribution of each individual sample. If the approximation is bad, we simply need to use more samples.

Correspondingly, if our 100 hand samples are not close to normally distributed, we should increase the number of hands in each chunk, and the accuracy of the model will increase.

This seems to be an easier approach than trying to come up with a precise hand distribution, and then simulating 100k of those.

[/ QUOTE ]

yeah, i've been meaning to post. I went through my old class notes and this is exactly right.

BadMongo · #**113** 01-11-2007, 02:20 PM

[ QUOTE ]
BadMongo, can you explain the formula you're using in the simulation?

=SQRT(-2*LN(RAND()))*SIN(2*PI()*RAND())

I'm good at other kinds of math, but I suck at statistics. So don't hesitate to get technical, but also don't assume that I already know every theorem in statistics.

[/ QUOTE ]

Yeah, I probably should have explained this in my original post. It's not too complicated.

The formula is known as the Box-Muller Transformation. Basically, it allows us to transform a uniform distribution into a normal distribution. If we start with x1 and x2, which are independently and uniformly distributed between 0 and 1, we can use Box-Muller to transform them into z1 and z2, which are independent and normally distributed with mean = 0 and variance = 1. If you're interested in a more technical discussion of Box-Muller, you can check this out.

In Excel, you can generate x1 and x2 easily by using the RAND() function, which returns a random value between 0 and 1 and is recalculated each time you press F9. So the two RAND() calls in the formula above are just our x1 and x2. The formula then takes these values and returns z2, which we know is normally distributed. So, this gives us a method to generate a random sample from a normal distribution, instead of a uniform one. The other columns are used to scale the mean and variance, sum the samples together, and convert the result into buy-ins, respectively. Thus, each time F9 is pressed, a new set of 1000 random samples from a normal distribution will be generated, each one telling us how much was won/lost in buy-ins over its respective chunk of 100 hands. We can then determine how much was won/lost over several chunks of 100 hands by adding the chunks together.
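For anyone who would rather read code than a spreadsheet, here is a rough Python sketch of the same idea. The function names and the winrate, SD and buy-in values are made up for illustration (they are not the numbers from the spreadsheet), and the units are simply whatever units you quote your winrate in.

```python
import math
import random

def box_muller(rng):
    """One standard-normal draw, mirroring the spreadsheet formula
    SQRT(-2*LN(RAND()))*SIN(2*PI()*RAND())."""
    x1 = 1.0 - rng.random()  # shifted into (0, 1] so log(x1) is always defined
    x2 = rng.random()        # uniform on [0, 1)
    return math.sqrt(-2.0 * math.log(x1)) * math.sin(2.0 * math.pi * x2)

def simulate_path(n_chunks, winrate, sd, buyin, seed=None):
    """Running total, in buy-ins, after each 100-hand chunk.

    winrate and sd are per 100 hands; buyin is in the same units.
    All three are hypothetical inputs chosen by the caller.
    """
    rng = random.Random(seed)
    total = 0.0
    path = []
    for _ in range(n_chunks):
        z = box_muller(rng)
        total += winrate + sd * z   # scale the draw to the chosen mean and SD
        path.append(total / buyin)  # convert the running total to buy-ins
    return path

if __name__ == "__main__":
    # Illustrative numbers only: 1000 chunks of 100 hands.
    path = simulate_path(1000, winrate=5.0, sd=40.0, buyin=100.0, seed=1)
    print(f"final: {path[-1]:.1f} BI, low: {min(path):.1f} BI, high: {max(path):.1f} BI")
```

Changing the seed plays the role of pressing F9: each seed gives a fresh set of 1000 chunks.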

Pokey · #**114** 01-13-2007, 01:35 AM

[ QUOTE ]

What seems to be forgotten in this discussion is that according to the Central Limit Theorem, the SUM of the 1000 samples will be approximately normally distributed, no matter the distribution of each individual sample. If the approximation is bad, we simply need to use more samples.

[/ QUOTE ]

This is clearly true. Furthermore, while I have not looked closely at the distribution, I believe it to be close enough to normal that the sum of results should closely approximate normal fairly quickly. A potential problem is the amazingly large outliers that are maddeningly frequent -- stackings (positive and negative both) will usually read as a 10+ sigma event, but happen every couple hundred hands on average. You'll need to include a hefty number of hands in your sample before it really starts to look normal. Perhaps samples of 1000 (played) hands would do the job; maybe it would require more, or maybe it would require less. I really haven't looked into this question very closely.
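To get a feel for how sample size interacts with those outliers, here is a small Python sketch. The per-hand distribution is a made-up toy, not fitted to any real database: a stacking of 100 big blinds in either direction about once every 200 played hands, and small pots otherwise. It also assumes hands are independent and identically distributed. Under that assumption the skewness of a sum of n hands shrinks like 1/sqrt(n) and the excess kurtosis like 1/n, and both are zero for a true normal.

```python
import math

# Hypothetical result of one played hand, in big blinds -> probability.
TOY_HAND = {
    -100.0: 0.0025,  # get stacked
    100.0: 0.0025,   # stack someone
    -5.0: 0.395,     # lose a small pot
    5.0: 0.40,       # win a small pot
    0.0: 0.20,       # break even
}

def central_moments(dist):
    """Mean and 2nd, 3rd, 4th central moments of a discrete distribution."""
    mean = sum(p * x for x, p in dist.items())
    m2 = sum(p * (x - mean) ** 2 for x, p in dist.items())
    m3 = sum(p * (x - mean) ** 3 for x, p in dist.items())
    m4 = sum(p * (x - mean) ** 4 for x, p in dist.items())
    return mean, m2, m3, m4

def shape_of_sum(dist, n):
    """Skewness and excess kurtosis of the sum of n independent hands."""
    _, m2, m3, m4 = central_moments(dist)
    skew_one_hand = m3 / m2 ** 1.5
    kurt_one_hand = m4 / m2 ** 2 - 3.0
    return skew_one_hand / math.sqrt(n), kurt_one_hand / n

if __name__ == "__main__":
    _, m2, _, _ = central_moments(TOY_HAND)
    print(f"a stacking is {100.0 / math.sqrt(m2):.1f} per-hand SDs")
    for n in (1, 100, 1000, 10000):
        skew, kurt = shape_of_sum(TOY_HAND, n)
        print(f"n={n:>5}: skewness={skew:+.4f}, excess kurtosis={kurt:.4f}")
```

How small the excess kurtosis has to be before the sample "looks normal" is a judgment call, and a real hand distribution will have different tails from this toy.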

I want to reiterate that my biggest exceptions to the original model were in using the SD/100 number that Poker Tracker spits out and in looking at total hands dealt, rather than hands played. I believe that if you were to eliminate non-played hands from the distribution and calculate the standard deviation the correct way, the 100-hand samples would be close enough to normal that the original methodology would be acceptable.

Note well: the resulting distributions would look nothing like those in the original post from this thread. When you calculate the numbers correctly, you'll find that the winrate-to-standard-deviation ratio is VASTLY different than what most people have reported to them by Poker Tracker. Specifically, if we compare the standard deviation to the winrate, we will find that for the vast majority of players the standard deviation divided by winrate is MUCH smaller when these numbers are calculated correctly than when these numbers are simply taken from Poker Tracker. As a result, the simulated paths should trend upward more sharply and have far less volatility than the Poker Tracker numbers would have us believe.

StarRain · #**115** 01-13-2007, 02:13 AM

I guess I am very new to these stats, but can someone please tell me what SD means?

Rotterdaum · #**116** 01-13-2007, 02:21 AM

[ QUOTE ]
[ QUOTE ]
FWIW the highest reported SD in the SSNL survey was just under 75, but it's very likely that this is over a really small sample.

[/ QUOTE ]

I have a bld DB with a 200k hand sample, 90 bb/100 SD.

[/ QUOTE ]

the thing with this statement is that although you have him as a 5 ptbb winner over this stretch, it doesn't represent his winrate, because his SD is so high. His true winrate can be anywhere between -x ptbb/100 and 15+ ptbb/100
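For reference, the usual textbook way to put numbers on a range like this is a standard-error calculation, sketched below in Python. It leans on assumptions that are contested in this thread: that the 100-hand chunks are independent, that the SD figure is trustworthy, and that the winrate and SD are in the same units (the quoted posts mix bb and ptbb). The function name is made up for illustration.

```python
import math

def winrate_interval(winrate, sd_per_100, hands, z=1.96):
    """Approximate confidence interval for the true winrate per 100 hands.

    Treats the sample as hands/100 independent chunks, so the standard
    error of the mean is sd_per_100 / sqrt(number of chunks).
    z=1.96 corresponds to roughly 95% under a normal approximation.
    """
    chunks = hands / 100.0
    standard_error = sd_per_100 / math.sqrt(chunks)
    return winrate - z * standard_error, winrate + z * standard_error

if __name__ == "__main__":
    # Numbers quoted above: observed 5 per 100, SD 90 per 100, 200k hands.
    low, high = winrate_interval(5.0, 90.0, 200_000)
    print(f"roughly {low:.2f} to {high:.2f} per 100 hands")
```

The width of the interval shrinks with the square root of the number of hands, so quadrupling the sample only halves the uncertainty.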

Kristian · #**117** 01-13-2007, 09:28 AM

I don't really see the point of separating played and non-played hands, because I don't think it will affect the model significantly. If we are dealing with a 25% VPIP player, analyzing samples of 100 played hands and samples of 400 dealt hands should give us the same level of convergence towards a normal distribution, since the 300 non-played hands just add 300 zeroes. To use the played-hand model we would still need to add those 300 zeroes at some point in order to compare it to our own results.
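A quick Python sketch of the zero-padding argument, using made-up results for the played hands. The helper name and the numbers are hypothetical, and it fixes the folded hands at exactly 300, whereas in real play the count per 100 played hands would vary.

```python
import random
import statistics

def pad_with_folds(played, n_folded):
    """Append one zero result for every hand that was not played."""
    return list(played) + [0.0] * n_folded

rng = random.Random(7)
# Hypothetical results, in big blinds, for 100 played hands.
played = [rng.gauss(0.5, 8.0) for _ in range(100)]
dealt = pad_with_folds(played, 300)  # 25% VPIP -> 400 hands dealt

print("sum over played hands:", round(sum(played), 2))
print("sum over dealt hands: ", round(sum(dealt), 2))
print("SD per played hand:   ", round(statistics.pstdev(played), 2))
print("SD per dealt hand:    ", round(statistics.pstdev(dealt), 2))
```

The chunk total is identical either way, which is the point about convergence. What does change is the per-hand mean and SD, so any per-hand figure has to be quoted against the matching hand count.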

What remains is a proper analysis of how many hands are necessary to get a good approximation to a normal distribution. What is considered 'good' is up for debate obviously. I do not have any experience at this sort of analysis, but if no one else steps up, I will get around to it at some point.

Kristian · #**118** 01-13-2007, 09:30 AM

[ QUOTE ]
[ QUOTE ]
[ QUOTE ]
FWIW the highest reported SD in the SSNL survey was just under 75, but it's very likely that this is over a really small sample.

[/ QUOTE ]

I have a bld DB with a 200k hand sample, 90 bb/100 SD.

[/ QUOTE ]

the thing with this statement is that although you have him as a 5 ptbb winner over this stretch, it doesn't represent his winrate, because his SD is so high. His true winrate can be anywhere between -x ptbb/100 and 15+ ptbb/100

[/ QUOTE ]

Also, as has been explained by Pokey, the Poker Tracker SD number is almost totally useless.

Kristian · #**119** 01-13-2007, 09:35 AM

[ QUOTE ]
I guess I am very new to these stats, but can someone please tell me what SD means?

[/ QUOTE ]

Standard Deviation

nextgenneo · #**120** 01-13-2007, 10:22 AM

Yo, ridic post. Awesome work