Using Small Samples Cautiously (mathy)

Christian_Peters · #1 12-17-2006, 01:09 AM

I have been getting a lot of good advice from this forum for a little while now, and thought it was time to contribute something. I hope this is helpful to at least a few people; I would have put this in the math forum, but, like I said, ML is my home.

A nOOb thought he was clever by iso-3-betting a stranger from the CO with Q9o based on a statistical read, and got himself into trouble. “Villain is playing back at me on Q-high ragged flop. WTF am I supposed to do? Q9o isn’t the ideal hand to iso-raise w/ but villain was 45/41.7. I know I only had 30 hands on him, and he hasn’t shown down one of his PFRs - but I don’t need to have many hands to know that 45/41.7 is getting out of line. 45/41 ZOMG!”

The most objective read one can make in the world of MLHE or SSHE is a statistical read. Despite not being able to use software and datamining at the B&M, good players will still be making statistical estimates based on observation and will be constantly revising these estimates with every hand, while bad players’ thoughts will be limited to the previous three hands (at most) or what they will eat for lunch. Despite the objective power of statistical reads, the average online player probably stays at a table for less than an hour, which will frequently limit the number of hands the software can work with. Here, I will try to provide a feel for how the accuracy and precision of a statistic change as a function of the number of hands.

First, whenever we base a statistic on some number of hands, n, for a villain, we are treating those n hands as an SRS (simple random sample) of the population (all hands villain has played). The assumption is that because the cards are dealt randomly, villain’s proclivities in any given sample of n hands are an unbiased estimate of his true proclivities over all his hands. If we are interested in, say, his VPIP, we assume that as the sample size (n) increases, the calculated VPIP gets closer and closer to his true VPIP. It is therefore natural to ask “how does our sample estimate of his true VPIP (population parameter) change as a function of sample size (n number of hands)?”
Most of the statistics we want on players are binomial in nature – they are proportions. In other words: villain either plays a hand or does not play a hand (VPIP), raises a hand or does not raise a hand (PFR), etc.
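
To get a feel for that question before any formulas, here is a minimal simulation sketch. It assumes a made-up villain whose true VPIP is 25% (TRUE_VPIP is a hypothetical number picked only for illustration, not something from a real database) and that every hand is an independent yes/no trial:

```python
import random

TRUE_VPIP = 0.25           # hypothetical "true" VPIP, chosen only for illustration
rng = random.Random(2006)  # fixed seed so the run is repeatable

def sample_vpip(n: int) -> float:
    """Deal n hands; each one is played voluntarily with probability TRUE_VPIP."""
    played = sum(rng.random() < TRUE_VPIP for _ in range(n))
    return played / n      # proportion of hands played in this sample

for n in (25, 100, 1000, 100_000):
    print(f"{n:>7} hands: estimated VPIP = {sample_vpip(n):.3f}")
```

Small samples can land well away from 25%, while the very large one should sit close to it.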

I will denote the estimate of the population parameter p with "e"; its formula is e = X / n, where X is the number of “successful” observations and n is the total number of observations. In our VPIP example, the proportion that PT or other software calculates is (X = number of hands villain played voluntarily) / (n = total number of hands). Simple enough. While the fact that e is an unbiased estimator of p gives us an intuitive sense for the accuracy of e, the estimate’s margin of error depends critically on something else: the standard deviation of e from sample to sample, which we represent with "s" and which equals SQRT[(p * (1 – p)) / n]. Of course, we do not know p, so we do not know s either, and just as with the population proportion, we must estimate it. We estimate s with the sample “standard error,” which we denote with SEp, by plugging e in for p. The sample standard error follows:
SEp = SQRT[(e * (1 – e)) / n]. Now for the more difficult part.
One very common method of describing the accuracy of an estimated statistic is to calculate a confidence interval. We can represent the upper and lower bounds of any confidence interval with e ± m, where m is our margin of error. We already have e, so we now need m. The margin of error is calculated from two things: the SEp we met above and a chosen critical value, Zc, from the density curve for the distribution of the statistic we are estimating. The margin of error follows: m = Zc * SEp. When the number of observations is not small (n > ~ 25), we can use the standard normal density curve by the CLT (central limit theorem; its discussion is beyond the scope of this post). When the number of observations is very small (n < 25), we must use the critical value from the binomial distribution. This is quite complicated and will not be dealt with here; the point is that statistics calculated from samples of fewer than about 25 hands must be interpreted very cautiously. We now will consider how the interval around the sample proportion changes as a function of sample size n, using the critical values from the standard normal distribution (remember, n must be at least 25; some books say 30).
Zc (z critical values) for the most common confidence levels are:
80% - 1.282
90% - 1.645
95% - 1.960
96% - 2.054
98% - 2.326
99% - 2.576
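
For anyone wondering where numbers like these come from: a two-sided critical value is just the point on the standard normal curve that leaves half of the leftover probability in each tail. A small sketch using Python's standard library (the function name z_critical is my own, not anything PT provides):

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided standard normal critical value for a confidence level in (0, 1)."""
    # Split the leftover probability evenly between the two tails.
    upper_tail_start = 1 - (1 - confidence) / 2
    return NormalDist().inv_cdf(upper_tail_start)

for level in (0.80, 0.90, 0.95, 0.96, 0.98, 0.99):
    print(f"{level:.0%} - {z_critical(level):.3f}")
```

Any other confidence level can be passed in the same way if none of the tabulated ones fits.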

Here’s an example (real from my PT database) putting it all together:
PT tells me that villain’s VPIP is 79.55 for 44 hands. Let’s construct a 95% confidence interval to get an idea of the accuracy of this estimate.
e = X / n = 0.2143
SEp = SQRT[(e * (1 – e)) / n] = SQRT[(0.2143 * (0.7857)) / 44] = 0.062
Zc = 1.960

95% CI is 0.2143 ± 1.960*0.062 <-> 0.2143 ± 0.12
<-> (0.3355, 0.0931) or (33.55%, 9.31%)
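
The same arithmetic as a short sketch, in case you want to plug in your own numbers. The function name proportion_ci is made up for this post, and it only covers the large-sample (normal approximation) case discussed above:

```python
from math import sqrt

def proportion_ci(e: float, n: int, zc: float = 1.960) -> tuple[float, float]:
    """Large-sample interval e ± Zc * SEp, returned as (lower, upper)."""
    se = sqrt(e * (1 - e) / n)  # SEp, the estimated standard error
    m = zc * se                 # margin of error
    return e - m, e + m

low, high = proportion_ci(0.2143, 44)
print(f"({high:.2%}, {low:.2%})")  # upper bound first, as written in the post
```

Carrying full precision through the calculation gives the same bounds, to two decimals, as the hand calculation above.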

To give you a palpable sense of the starting hands corresponding to the upper and lower bounds of the 95% CI:
Disregarding our further deductions on villain’s hands based on his PF raising, if the only thing we know is that he has voluntarily put money in the pot, then intervals built this way would contain his true VPIP in about 95% of samples, and the two ends of this particular interval correspond to the following ranges:
(upper-bound: 55+, A2s+, K4s+, Q6s+, J7s+, T7s+, 98s, A5o+, K8o+, Q9o+, J9o+, T9o)
and
(lower-bound: 88+, ATs+, KTs+, QJs, AJo+, KQo)
Quite a difference, no?

So, what does increasing sample size do to this?

50 hands. SEp = SQRT[ (0.2143 * (0.7857)) / 50] = 0.058;
0.2143 ± 0.11 <-> (32.43%, 10.43%)

75 hands. SEp = SQRT[ (0.2143 * (0.7857)) / 75] = 0.047;
0.2143 ± .092 <-> (30.64%, 12.22%)

100 hands. SEp = SQRT[ (0.2143 * (0.7857)) / 100] = 0.041;
0.2143 ± 0.080 <-> (29.47%, 13.39%)

150 hands. SEp = SQRT[ (0.2143 * (0.7857)) / 150] = 0.033;
0.2143 ± 0.065 <-> (27.90%, 14.96%)

200 hands. SEp = SQRT[ (0.2143 * (0.7857)) / 200] = 0.029;
0.2143 ± 0.057 <-> (27.13%, 15.73%)

1000 hands. SEp = SQRT[ (0.2143 * (0.7857)) / 1000] = 0.013;
0.2143 ± 0.025 <-> (23.93%, 18.93%)

At 1000 hands, disregarding our further deductions on villain’s hands based on his PF raising, if the only thing we know is that he has voluntarily put money in the pot, then intervals built this way would contain his true VPIP in about 95% of samples, and the two ends of this interval correspond to the following ranges:
(upper-bound: 66+, A2s+, K6s+, Q8s+, J8s+, T9s, A8o+, K9o+, QTo+, JTo)
and
(lower-bound: 66+, A5s+, K9s+, Q9s+, J9s+, T9s, A9o+, KTo+, QTo+)
Much better, no?
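
The sample-size sweep above can be reproduced with a few lines. This is only a sketch under the same simplification used in the post (the estimate e is held at 0.2143 while n grows, which would not happen exactly in practice). It carries full precision through, so the last digit can differ slightly from the hand-rounded figures:

```python
from math import sqrt

E = 0.2143   # estimate held fixed, as in the post
ZC = 1.960   # 95% critical value

def margin(e: float, n: int, zc: float) -> float:
    """Margin of error m = Zc * SQRT[e * (1 - e) / n]."""
    return zc * sqrt(e * (1 - e) / n)

for n in (44, 50, 75, 100, 150, 200, 1000):
    m = margin(E, n, ZC)
    print(f"{n:>5} hands: ± {m:.3f} -> ({E + m:.2%}, {E - m:.2%})")
```

Because n sits under a square root, cutting the margin of error in half takes four times as many hands.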

But there’s more. We can reverse the idea – that is, we can start with a given margin of error (whatever we arbitrarily desire) and determine the number of hands we would need to observe to estimate at any fixed level of confidence with our pre-chosen margin of error.

Sample size for a desired margin of error follows:
n = [(Zc / m)^2 * e (1 – e)]

For example, if we wanted a margin of error of 4% (0.04) at a confidence level of 90% for the above villain’s VPIP: n = [(1.645 / 0.04)^2 * 0.2143 (0.7857)] = 284.77
We would need to observe 285 hands.
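
A sketch of the same sample-size formula in code. The name hands_needed is hypothetical, and rounding up is my choice, since you cannot observe a fraction of a hand:

```python
from math import ceil

def hands_needed(e: float, m: float, zc: float) -> int:
    """Hands required so that Zc * SQRT[e * (1 - e) / n] is at most m."""
    return ceil((zc / m) ** 2 * e * (1 - e))

# The example above: 4% margin of error at 90% confidence.
print(hands_needed(0.2143, 0.04, 1.645))

# With no estimate yet, e = 0.5 is the worst case because e * (1 - e) peaks there.
print(hands_needed(0.5, 0.04, 1.645))
```

The second call shows the conservative figure you would plan for before having any read on the villain at all.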

bennyhana · #2 12-17-2006, 04:44 AM

ltdr,thesse are ;poset for week nitse, not weekenet kninght

fabadam · #3 12-17-2006, 07:17 AM

I kind of like it. I will keep it on my reminder list for a while, because it's been too long since I did this statistical math stuff.

FWIW, I think realistically a 95% confidence interval is asking too much in typical micro-limits environments. The reality of the matter is that you have between 30 and 100 hands of data for most villains, if that.

I'm looking for ways to corroborate stats based on small samples with other stats. This is one reason why I keep people's BB/100 winrate in my PAHUD popups. Yes it's meaningless by itself, but if I see someone with 74/50 stats after 25 hands, I look at the winrate. If it's 250 BB/100, they've simply had a hot run for a few orbits. If it's -98 BB/100, he's most likely a donk.
None of these are certainties, and nowhere near 95% confidence, but it's the best I've got, so it will have to do.

What would be very interesting, is to see which stats converge quickly, and which don't. My gut feeling is that VPIP/PFR and overall AF are fairly good after 100 hands, but nearly everything else is entirely meaningless by itself.

fretelöo · #4 12-17-2006, 11:43 AM

Thanks for this post!

two things I got lost on: [ QUOTE ]
The margin of error follows: m = Zc * SEp.

[/ QUOTE ]

Those Zc values - where do they come from? Calculated by means "which are beyond the scope of this post"? ;) If so, I'm scared and very content. :D

[ QUOTE ]
e = X / n = 0.2143

[/ QUOTE ]

This I don't get. If he has a vpip of ~80 over ~40 hands, how do you come up with .2? Shouldn't it be 40/80 = .5?

Paxosmotic · #5 12-17-2006, 12:25 PM

[ QUOTE ]
Those Zc values - where do they come from? Calculated by means "which are beyond the scope of this post"? ;) If so, I'm scared and very content. :D

[/ QUOTE ]
The Z values probably are a little beyond the scope of the thread, but he is correct on their values so the math is solid there.

Christian_Peters · #6 12-17-2006, 06:41 PM

Yeah, when I first got into poker, I had this fantasy that I could represent any decision with math, and then I quickly gave that idea up and just started working on hands and pattern recognition, practice, etc. But I've got time off from school and I think I'm going to start working out some kind of perfect preflop game based totally on stats. We'll see. I probably should have just left out all the CI discussion and just given examples of what sample size does to accuracy of reads. Oh well.

Xhad · #7 12-17-2006, 07:41 PM

Good post. Another related point is that it's good to know which kinds of poker decisions are close and which ones are not. For example, Q9o should generally be folded to a raise, by a very wide margin, so you need to be very sure of your read to do something else. On the other hand, if you made it, say, AJs in UTG+1 facing a UTG open-raise, all three options are so close in value that even a small read can sway it.

Another thing to realize is that not all read-based mistakes are created equal. Generally, if you're not sure of a read, you do best by making the smallest-variance decision because those tend to make the smallest mistakes even if you're wrong. It takes a stronger read for me to 3bet a normal "folding" hand, than to fold a normal 3betting hand. On the other hand if it's the turn and I get checkraised, it takes a stronger read for me to fold a hand that should normally be called down, than to call down a hand that would normally be folded.

Sigurd · #8 12-17-2006, 08:49 PM

Can anybody provide a link to a page that explains the general math used here?

I'm pretty sure that I would get all this, if I just knew the Danish equivalent of all the "math words".

Christian_Peters · #9 12-17-2006, 09:05 PM

[ QUOTE ]
Can anybody provide a link to a page that explains the general math used here?

I'm pretty sure that I would get all this, if I just knew the Danish equivalent of all the "math words".

[/ QUOTE ]

I don't know what the Danish would be, but if you do a Wiki search for the following, you should be close:
"inference for proportions"
"large-sample confidence interval for a population proportion"
or "sample size for desired margin of error."

Tell me if none of this helps and I'll do a little searching.

-c

bozlax · #10 12-17-2006, 09:22 PM