For those who don't think 30+ BI swings are a reality...

Pokey · #**121** 01-13-2007, 11:41 AM

[ QUOTE ]

I don't really see the point of separating played and non-played hands, because I don't think it will affect the model significantly. If we are dealing with a 25% VPIP player, analyzing samples of 100 played hands and samples of 400 total hands should give us the same level of convergence towards a normal distribution, since the 300 non-played hands just add 300 zeroes. To use the played hand model we would still need to add those 300 zeroes at some point in order to compare it to our own results.

[/ QUOTE ]

The more "normal-like" a population is, the faster its sample statistics converge to normal. Adding in non-played hands makes the population DECIDEDLY non-normal.

Relative to our purposes, there is one powerful reason for removing the non-played hands: adding in the non-played hands will change your measured variance. Take an absurd example: every time I play a hand, I win exactly 1 BB. Every time I fold preflop, I break even. If I play 25% of my hands then my mean is 0.25 BB per hand and my variance is 0.25*(1 - 0.25)^2 + 0.75*(0 - 0.25)^2 = 0.1875, which is a standard deviation of about 0.43. A normal approximation of this person's results would imply that he would have roughly a 28% chance of losing money on any given hand, which is dramatically incorrect. Looking at only the played hands we would see the truth, which is that this person will never lose money by playing poker: there is no risk of any downswing and no risk of ruin.
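To make the arithmetic in the toy example easy to check, here is a minimal Python sketch. It computes the per-hand mean, variance and SD over all dealt hands directly from the definitions (the mean over all dealt hands is 0.25 BB), then asks what a normal approximation with those parameters would say about the chance of a losing hand. The function names are hypothetical and only for illustration.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and SD."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def toy_player_stats(vpip=0.25, win_when_played=1.0):
    """Per-hand mean, variance and SD over ALL dealt hands for the toy player.

    The player wins `win_when_played` BB on every played hand and
    exactly 0 BB on every folded hand.
    """
    mean = vpip * win_when_played
    variance = (vpip * (win_when_played - mean) ** 2
                + (1.0 - vpip) * (0.0 - mean) ** 2)
    return mean, variance, math.sqrt(variance)

mean, variance, sd = toy_player_stats()
# What a normal model fitted to these parameters claims about one hand.
p_loss_normal = normal_cdf(0.0, mean, sd)

print(f"mean={mean:.4f} BB, variance={variance:.4f}, sd={sd:.4f}")
print(f"normal-model P(result < 0) = {p_loss_normal:.3f}; true value is 0")
```

The gap between the normal model's answer and the true answer (zero) is the whole point of the example: the two-point distribution created by the folded hands is badly described by a normal curve at the single-hand level.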

Given that we only have a non-trivial distribution when we're PLAYING a hand, eliminating non-played hands should help our model perform more realistically. It will also dramatically speed up convergence to normality, allowing our simplifying assumption of a normal distribution to be more accurate with a smaller sample.

BadMongo · #**122** 01-13-2007, 01:30 PM

Pokey, glad you're still posting in this thread.

[ QUOTE ]
This is clearly true. Furthermore, while I have not looked closely at the distribution I believe it to be close enough to normal that the sum of results should closely approximate normal fairly quickly. A potential problem is the amazingly large outliers that are maddeningly frequent -- stackings (positive and negative both) will usually read as a 10+ sigma event, but happen every couple hundred hands on average. You'll need to include a hefty number of hands in your sample before it really starts to look normal. Perhaps samples of 1000 (played) hands would do the job; maybe it would require more, or maybe it would require less. I really haven't looked into this question very closely.

[/ QUOTE ]

This is exactly what I was trying to explain earlier in the thread, but you seemed adamant that I was wrong. Most likely I was just doing a poor job of explaining myself. I of course agree with you that 100 hands may not be enough to "normalize" the data. At the same time, we don't want to smooth things out too much, and I think 1000 hands may be excessive. This is something that needs to be addressed though. I'll see what I can do about it.

[ QUOTE ]

I want to reiterate that my biggest exceptions to the original model were in using the SD/100 number that Poker Tracker spits out and in looking at total hands dealt, rather than hands played. I believe that if you were to eliminate non-played hands from the distribution and calculate the standard deviation the correct way, the 100-hand samples would be close enough to normal that the original methodology would be acceptable.

[/ QUOTE ]

Hmm, this is an interesting idea. It should work. Everyone would have to calculate their own SD (and convert it to the proper units), however.
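For anyone doing that unit conversion by hand, a minimal sketch follows. It assumes hand results are independent, so the SD of a block of n hands is the per-hand SD times sqrt(n), and it assumes a 100-big-blind buy-in. Both are assumptions made for illustration, the function names are hypothetical, and the 8 BB figure is made up.

```python
import math

def sd_per_block(sd_per_hand, hands_per_block=100):
    """Scale a per-hand SD to a block of hands, ASSUMING independent hands."""
    return sd_per_hand * math.sqrt(hands_per_block)

def bb_to_buy_ins(amount_bb, bb_per_buy_in=100.0):
    """Convert big blinds to buy-ins (100 BB per buy-in is an assumption)."""
    return amount_bb / bb_per_buy_in

# Hypothetical player: SD of 8 BB per played hand (made-up number).
sd_100 = sd_per_block(8.0, 100)
print(f"SD per 100 played hands: {sd_100:.1f} BB "
      f"= {bb_to_buy_ins(sd_100):.2f} buy-ins")
```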

Thanks for your input so far Pokey.

Kristian · #**123** 01-13-2007, 02:19 PM

[ QUOTE ]
Given that we only have a non-trivial distribution when we're PLAYING a hand, eliminating non-played hands should help our model perform more realistically. It will also dramatically speed up convergence to normality, allowing our simplifying assumption of a normal distribution to be more accurate with a smaller sample.

[/ QUOTE ]

I still respectfully think you are wrong. Your example involves the variance internally in the 100 (for instance) hand sample, where we are in fact only interested in the sum of the hands, and the variance of that sum, which will not change if you add a lot of zeroes.

A more relevant example: Consider the sum of 100 normally distributed random variables. This sum is clearly also normally distributed with some mean and variance. But this sum will have the exact same distribution, mean and variance as the sum of those 100 normally distributed variables plus 300 variables that are always zero.

I agree that the convergence towards normality only improves with each extra played hand. But all the non-played hands don't ruin anything, since they contribute nothing. So it does not matter whether you use samples of 100 played hands or samples of 400 total hands; it is almost the same model (assuming 25% hands played, randomly placed). There may be some small difference since the 400 total hands will sometimes have a slightly different number of played hands, but I don't think this is a huge problem. The major difference is that the played hand model is more trouble.
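The size of that "small difference" can be written down. As a sketch under stated assumptions: if each dealt hand is played independently with probability p, and played hands have mean mu and variance sigma^2, then a single dealt hand has variance p*sigma^2 + p*(1 - p)*mu^2. The Python below compares the variance of a block with exactly 100 played hands against a block of 400 dealt hands with a random number of played hands. The 0.4 BB mean and 10 BB SD per played hand are made-up illustrative numbers, and the function names are hypothetical.

```python
def var_sum_fixed_played(n_played, sd_played):
    """Variance of the total when the block has exactly n_played played hands."""
    return n_played * sd_played ** 2

def var_sum_random_played(n_dealt, vpip, mean_played, sd_played):
    """Variance of the total over n_dealt hands, each played with prob. vpip.

    Per dealt hand: Var = vpip*sd^2 + vpip*(1 - vpip)*mean^2
    (folded hands contribute exactly zero).
    """
    per_hand = vpip * sd_played ** 2 + vpip * (1.0 - vpip) * mean_played ** 2
    return n_dealt * per_hand

# Made-up illustrative parameters for a played hand, in BB.
mu, sigma = 0.4, 10.0
fixed = var_sum_fixed_played(100, sigma)
randomized = var_sum_random_played(400, 0.25, mu, sigma)
print(f"exactly 100 played hands: variance of total = {fixed:.1f}")
print(f"400 dealt hands, 25% VPIP: variance of total = {randomized:.1f}")
```

With a winrate that is small relative to the per-hand SD, the extra term from the varying number of played hands is tiny, which is consistent with the claim that the two block definitions give almost the same model for the block totals.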

Pokey · #**124** 01-13-2007, 02:35 PM

[ QUOTE ]

Your example involves the variance internally in the 100 (for instance) hand sample, where we are in fact only interested in the sum of the hands, and the variance of that sum, which will not change if you add a lot of zeroes.

[/ QUOTE ]

For practical purposes, we would calculate the variance by looking at all hands (either all hands dealt as you suggest, or all hands played as I suggest) and then calculate the variance as the average of the squared deviations from the mean of the distribution. If we calculate the variance this way, then adding in zeroes WILL change the measured variance of the sample. For players with a nearly-zero winrate, the added zeroes will "stabilize" the sample, and make the perceived variance smaller. However, for a sample with a large (or very negative) winrate, the addition of these zeroes would serve to INCREASE the measured variance of the sample.

You are correct that -- IF we include exactly 300 zeroes in every sample -- the mean and variance of the sum of 400 dealt hands will be identical to the mean and variance of the sum of 100 played hands. However, that's something we would get as a result of the model. I'm talking about how to correctly parameterize the model, and to do so we'll need a sample mean and sample standard deviation of winrate per hand (or per 100, or per 1000, or whatever it is that we decide to concentrate on). My point is not that including a bunch of zeroes would screw up the simulations, but rather that it would screw up the parameterization of the model that is crucial for creating realistic simulations.
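To see how the measured per-hand parameters depend on whether folded hands are counted, here is a small sketch using Python's standard `statistics` module. The hand history is entirely made up, and the `(played, result)` tuple layout is a hypothetical format chosen for the example, not anything Poker Tracker exports.

```python
import statistics

# Made-up hand history: (played, result in BB). Folded hands net zero.
hands = [(True, 12.0), (False, 0.0), (False, 0.0), (False, 0.0),
         (True, -8.0), (False, 0.0), (False, 0.0), (False, 0.0)]

def per_hand_stats(results):
    """Return (mean, population variance) of a list of per-hand results."""
    return statistics.mean(results), statistics.pvariance(results)

# Parameterize from played hands only, then from every dealt hand.
played_only = [result for played, result in hands if played]
all_dealt = [result for _, result in hands]

played_mean, played_var = per_hand_stats(played_only)
dealt_mean, dealt_var = per_hand_stats(all_dealt)

print(f"played only: mean={played_mean:.2f} BB, variance={played_var:.2f}")
print(f"all dealt:   mean={dealt_mean:.2f} BB, variance={dealt_var:.2f}")
```

The two parameter sets describe the same results but are not interchangeable, so whichever convention is chosen has to be used consistently when the model is later fed a mean and SD.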

If you were hell-bent on including the non-played hands, I suggest a bootstrap method for calculating the sample mean and deviation: draw several thousand 100-hand samples (either with or without replacement -- it should be mostly irrelevant with a large enough pool of hands) and calculate the mean and standard deviation of those samples. Note: this will wind up being much more work than my original proposal of simply truncating the zeroes and going from there.
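A minimal sketch of that bootstrap, assuming the hand results are available as a flat list of per-hand BB amounts with folded hands recorded as zero. It resamples with replacement and reports the mean and SD of the 100-hand block totals, which is one reading of "the mean and standard deviation of those samples". The pool below is made up, and the function name and defaults are hypothetical.

```python
import random
import statistics

def bootstrap_block_stats(results, block_size=100, n_samples=5000, seed=0):
    """Resample blocks with replacement; return (mean, SD) of block totals."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    totals = [sum(rng.choices(results, k=block_size))
              for _ in range(n_samples)]
    return statistics.mean(totals), statistics.stdev(totals)

# Made-up pool of dealt hands in BB: mostly folds, some small pots,
# and one stacking in each direction.
pool = [0.0] * 75 + [1.5] * 15 + [-3.0] * 8 + [40.0, -40.0]

mean_100, sd_100 = bootstrap_block_stats(pool)
print(f"100-hand blocks: mean total = {mean_100:.2f} BB, SD = {sd_100:.2f} BB")
```

Collecting the `totals` list instead of only its summary would also allow a histogram of block results, which is a direct way to eyeball how close to normal a given block size gets.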

One final point: adding in the zeroes gains us nothing. You haven't really suggested a reason why having the zeroes would make the parameterization BETTER. Given that there are potential drawbacks but no identifiable benefits, it seems cleaner to just eliminate them. Again, if you want to add them back in when you're creating your simulations, that would be easy and perfectly acceptable. Say you find the distribution of your played hands, and you have a VPIP of 25%. In that case, your simulation would spit out a zero 75% of the time and draw from the created distribution the other 25% of the time. Chunk them into 1000-hand blocks and you've got a sample path.
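The simulation described above can be sketched in a few lines. This version assumes the "created distribution" is simply the empirical list of played-hand results, drawn from uniformly at random; that is a simplification for illustration, as are the made-up results and the hypothetical function name.

```python
import random
from itertools import accumulate

def simulate_blocks(played_results, vpip=0.25, n_hands=10000,
                    block_size=1000, seed=0):
    """Simulate dealt hands and return the total result of each block.

    Each dealt hand is played with probability `vpip`; a played hand draws
    a result from `played_results`, a folded hand contributes zero.
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_hands // block_size):
        total = 0.0
        for _ in range(block_size):
            if rng.random() < vpip:
                total += rng.choice(played_results)
        blocks.append(total)
    return blocks

# Made-up played-hand results in BB.
played_results = [1.5, 1.5, 1.5, -3.0, -3.0, 6.0, -10.0, 100.0, -100.0]

blocks = simulate_blocks(played_results)
path = list(accumulate(blocks))  # cumulative result after each block
print("block totals:", [round(b, 1) for b in blocks])
print("sample path: ", [round(p, 1) for p in path])
```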

Piece of Cake · #**125** 01-13-2007, 02:54 PM

[ QUOTE ]

The last one took several tries to produce, but it's possible nonetheless. As you can see, 100K hands is nothing with those stats.

[/ QUOTE ]

Pokey's issues aside (and of course they shouldn't be pushed aside since they are questioning the very underlying assumptions for the test) but...

Am I to understand you're running this over and over until you get an outlier trial to show us? What's the hypothesis being tested? That it's merely possible given your assumptions? Heck, if I simulate anything with a nonzero probability enough times, I will eventually reach a number of trials N where even the unluckiest of events is likely to have occurred. Even once you guys iron out your assumptions with Pokey, you simply cannot just run it over and over and say look... it still happens. A 1-in-a-thousand event is rare, and so is a 1-in-a-billion event, but the latter is much, much less likely. Running your simulation an unlimited number of times until you hit your big downswing doesn't prove anything.
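The effect of rerunning until something shows up is easy to quantify, assuming the runs are independent: the chance of seeing an event with per-run probability p at least once in N runs is 1 - (1 - p)^N. A short sketch, with a hypothetical function name and made-up probabilities:

```python
def p_at_least_once(p_event, n_trials):
    """Chance an event with per-trial probability p_event occurs at least
    once in n_trials independent trials."""
    return 1.0 - (1.0 - p_event) ** n_trials

# A 1-in-1000 downswing versus a 1-in-a-billion one, over repeated reruns.
for p_event in (1e-3, 1e-9):
    for n_trials in (10, 1000, 3000):
        print(f"p={p_event:g}, runs={n_trials}: "
              f"P(at least once) = {p_at_least_once(p_event, n_trials):.6f}")
```

So a picture of a huge downswing says little on its own. What matters is the number of runs it took to produce it, or better, the fraction of runs in which it appears.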

All that said, I appreciate the work and just wanted to bump it and add something constructive.

Kristian · #**126** 01-18-2007, 10:32 AM

[ QUOTE ]
I'm talking about how to correctly parameterize the model, and to do so we'll need a sample mean and sample standard deviation of winrate per hand (or per 100, or per 1000, or whatever it is that we decide to concentrate on). My point is not that including a bunch of zeroes would screw up the simulations, but rather that it would screw up the parameterization of the model that is crucial for creating realistic simulations.

[/ QUOTE ]

I must confess I don't quite understand how you plan to use the standard deviation of an individual hand to deduce the optimal 'number of hands in a sample' parameter, without also doing a full distribution analysis of individual hands. This seems like a lot of work to me.

[ QUOTE ]

If you were hell-bent on including the non-played hands, I suggest a bootstrap method for calculating the sample mean and deviation: draw several thousand 100-hand samples (either with or without replacement -- it should be mostly irrelevant with a large enough pool of hands) and calculate the mean and standard deviation of those samples. Note: this will wind up being much more work than my original proposal of simply truncating the zeroes and going from there.

[/ QUOTE ]

This is exactly what I thought was the plan: To analyze how large a sample we need to get an approximate normal distribution, and in that process figure out the variance and mean of this sample size. It needs some work, but seems manageable. Once again, I am not sure what the 'going from there' part implies.

[ QUOTE ]

One final point: adding in the zeroes gains us nothing. You haven't really suggested a reason why having the zeroes would make the parameterization BETTER.

[/ QUOTE ]

Generally speaking, I don't need to justify this, since what we are looking for are paths of all poker hands, including the ones we just fold. You need to have a good reason (which you may have) to focus on subsets of hands IMO, not the other way around.

I think separating played and non-played hands is a very player-specific action, since people play different amounts of hands. You would get separate hand distributions for 18%, 24% and 30% VPIP players. Not just different mean and variance mind you, completely separate distributions.

Of course the other method would also generate different results for different kinds of players, but in this case we are dealing with a normal distribution, and the difference will be reflected in only two parameters, mean and variance, making the model easier to use for different kinds of players.

Paxinor · #**127** 01-27-2007, 07:11 AM

mogo: if you need data to figure out the distribution, i could offer you a database of 1 million NL 50 hands...

i am also able to collect quite fast, so if you need more, say 2 or 3 million, it could be done in a month or two...

Matt Flynn · #**128** 01-27-2007, 11:15 AM

very nice thread, and great explanations Pokey.

one of the problems we'll have for PNLHE is taking a stand on bankroll. prior books largely ignored tilt and assumed mean and sigma for limit.

in nl stackoff events skew distributions severely; variance varies more based on # of players, LAG vs TAG, and how your opponents play; and tilt is intensely magnified by being able to 10-table and by not recognizing when opponents have your number.

so what do you recommend for bankroll? very difficult problem since players are all over the map in mean and sigma, we are sampling from a skewed non-normal distribution, and 10K hands can break most players.

copoka · #**129** 02-06-2007, 06:40 PM

Great stuff, really.
One question...
What should be changed in your spreadsheet to make it work for Limit holdem (win rate in big bets/100, SD/100 hands and graph in big bets instead of buy ins)?

Thanks

#**130** 04-29-2007, 09:44 PM

bump