HUSNG Statistics Project - Page 2

marv · #11 11-29-2006, 10:17 AM

[ QUOTE ]

I have the results of over 900k HUSNGs and 132k players that span several months of this year. Rather than sit on this information I'd like to contribute to the forum by compiling useful statistics on... everything.

The things I do have:
- Buyin
- Time of Day
- Result from 2 and 4 man games

Things I do not have:
- A lot of data from the turbos and non-increasing blinds
- Durations

I need 2+2's help to determine what data to extract and how to do it. For example, you want to compare $50s and $100s? How would you do it? You want to know what winrates are achievable? How would you do it? Winning percentage quartiles or what? Want to know what time of day the most fish play? Come up with the rules to do it and I'll get the results.

All ideas are welcome, even if you can't think of how to do it.

Nichomacheo

[/ QUOTE ]

Another thing you might try to do is rank the players (or even compute Elo-type ratings as in chess) rather than just look at a simple statistic like 1) ROI or 2) dollars/day. The problem with 1 is that it will favour people who avoid the strong players, while 2 will favour the grinders. This may only be possible with HU games.

Working out who is the 'strongest' SnG player and seeing where you rank sounds like much more fun.

(I'm trying to do this for heads-up cash games.)
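For anyone who hasn't seen Elo-type ratings, here is a minimal sketch of the idea applied to heads-up results. The K factor of 16, the 1500 starting rating, and the (winner, loser) input format are all assumptions made for illustration; nothing here accounts for buyins or rake.

```python
# Hypothetical sketch: Elo-style ratings from heads-up results.
# K, START and the (winner, loser) input format are assumed for illustration.
from collections import defaultdict

K = 16            # update step size (assumed)
START = 1500.0    # rating given to a player on first appearance (assumed)

def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def rate(results):
    """results: iterable of (winner, loser) pairs in chronological order."""
    ratings = defaultdict(lambda: START)
    for winner, loser in results:
        e_win = expected_score(ratings[winner], ratings[loser])
        delta = K * (1.0 - e_win)   # larger gain for beating a higher-rated player
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)

# Toy results, invented for the example.
games = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
ratings = rate(games)
for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")
```

Because a win over a highly rated opponent moves the rating more than a win over a weak one, a scheme like this does not reward avoiding strong players the way raw ROI can.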

Marv

Indiana · #12 11-29-2006, 11:37 AM

OK, here are my thoughts. Not having duration is huge, because the big question with HU matches is whether a fast, aggro style with a lower ROI is better $EV than a cautious style with a higher ROI that takes longer per match. Because you have so few variables, we will have to view this as a tabular/observational task. My interest here would be to evaluate the "learning curve" effect. Basically, within each limit of play I would look at the overall win rate, then look at the win rate after removing the first 100 games played per player, then after removing the first 200 games played, and so on. Naturally, this analysis should be restricted to players with at least 400 or so games played.
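A rough sketch of that learning-curve check, assuming each player's results are available as a chronological list of wins and losses (that layout, the function name, and the toy data are hypothetical):

```python
# Hypothetical sketch of the "learning curve" check: pooled win rate after
# dropping each player's first `skip` games. The input layout is assumed.
def win_rate_after_skip(histories, skip, min_games=400):
    """histories: {player: [1, 0, 1, ...]} in chronological order (1 = win)."""
    wins = games = 0
    for results in histories.values():
        if len(results) < min_games:
            continue                 # only players with enough games
        tail = results[skip:]        # drop the first `skip` games
        wins += sum(tail)
        games += len(tail)
    return wins / games if games else None

# Toy data: one player who improves, one who is flat, one with too few games.
histories = {
    "improver": [0] * 60 + [1] * 40 + [1, 0] * 150,   # 400 games
    "flat": [1, 0] * 200,                             # 400 games
    "tourist": [1] * 10,                              # excluded by min_games
}
for skip in (0, 100, 200, 300):
    print(skip, round(win_rate_after_skip(histories, skip), 3))
```

If the pooled rate climbs as `skip` grows, that would be consistent with a learning effect, though it only means something when the game histories have no large gaps.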

Another analysis we should do is a prediction of winrate in a logistic model. I'd set it up to regress win rates against buyin (mandatory), time of day (perhaps categorized into morning, evening, night), and number of games played by this player. From this we would get very useful info, like the win/loss % we can expect when playing under different conditions.
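To make the logistic model concrete, here is a bare-bones sketch using only the standard library. It fits one row per game by plain gradient descent; the two 0/1 features, the learning rate, and the toy data are all made up for illustration, and a real analysis would more likely use a stats package and add the games-played variable.

```python
# Hypothetical sketch: logistic regression of win (1) / loss (0) on two
# dummy variables, fitted by batch gradient descent. All data is invented.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Win probability for feature row x; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Minimise the average log-loss; returns weights with the intercept first."""
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Toy data, 10 games per cell: (is_100_buyin, is_evening) -> number of wins
cells = {(0, 0): 5, (0, 1): 7, (1, 0): 4, (1, 1): 6}
X, y = [], []
for features, wins in cells.items():
    for i in range(10):
        X.append(list(features))
        y.append(1 if i < wins else 0)

w = fit_logistic(X, y)
for features in cells:
    print(features, round(predict(w, features), 3))
```

The fitted weights are on the log-odds scale: a positive weight on a feature means games with that feature are won more often, holding the other feature fixed.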

Indy

Nichomacheo · #13 11-29-2006, 03:54 PM

[ QUOTE ]
Must be some kind of sharkscope manager.

[/ QUOTE ]

Not Sharkscope at all. This is not what I want to talk about, so let's stick to the project.

[ QUOTE ]
then calculate an average winrate for the different buyins.

[/ QUOTE ]

This is a popular request, but, eh, it's 50% across the entire population. Think about it: for every 1 win, there is 1 loss. Four-man games will be different though, but that's not what you asked.
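A tiny illustration of the "1 win, 1 loss" point, with invented results. The pooled heads-up win rate is 50% by construction, although a plain average of per-player win rates does not have to be:

```python
# Hypothetical toy example: pooled HU win rate versus the unweighted
# average of per-player win rates. The results below are invented.
from collections import Counter

games = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]   # (winner, loser)

wins, played = Counter(), Counter()
for winner, loser in games:
    wins[winner] += 1
    played[winner] += 1
    played[loser] += 1

pooled = sum(wins.values()) / sum(played.values())          # wins / player-games
per_player = {p: wins[p] / played[p] for p in played}
mean_of_players = sum(per_player.values()) / len(per_player)

print("pooled:", pooled)
print("mean of per-player rates:", mean_of_players)
```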

[ QUOTE ]
PS if u want me to analyze the data in SAS

[/ QUOTE ]

Don't know what SAS is, but the data is not in a standard Access database. I had to create my own directory hashing scheme to make it fast.

[ QUOTE ]
I don't know HOW he finds the time while still playing a dozen games or so a day to record thousands of games

[/ QUOTE ]

It's called multitasking, jackass. ;)

[ QUOTE ]
I think the best way to analyse these is by stakes, then by player, then by winrate.

[/ QUOTE ]

Penguin, great response as always. I was thinking about doing something like this and I'm glad you've put some thought into it. I'm going to spend my operations research class today thinking about it, and with everyone's help I want to do something like what you were saying. At a high level, it would be breaking the winrates up for each buyin (i.e., what % of players at the $50s have a winrate of 45-50%, 50-55%, etc.). Just what Penguin said.

[ QUOTE ]
Another thing you might try to do is rank the players (or even compute Elo-type ratings as in chess) rather than just look at a simple statistic like 1) ROI or 2) dollars/day.

[/ QUOTE ]

I don't know what Elo is, but I understand what you're saying. You want to come up with a number that represents skill level. I've tried long and hard to do this, but it's amazingly complex. If you can come up with a smart way to do it that'll factor in the different buyins etc., then I'd do it.

[ QUOTE ]
My interest here would be to evaluate the "learning curve" effect. Basically, within each limit of play I would look at the overall win rate, then look at the win rate after removing the first 100 games played per player, then after removing the first 200 games played, and so on. Naturally, this analysis should be restricted to players with at least 400 or so games played.

[/ QUOTE ]

I don't think this would be informative. My data is not 100% complete for any period of time. It's a sampling of 900k games out of... who knows. I probably have 60-75% of the game results. Any analysis that involves profit would be pointless (ROI wouldn't). I don't think games 0-100 of a sample would be any different from 200-300 other than variance. Unless you follow a player's ENTIRE results (without blanks), you couldn't do the analysis you're talking about.

[ QUOTE ]
Another analysis we should do is a prediction of winrate in a logistic model. I'd set it up to regress win rates against buyin (mandatory), time of day (perhaps categorized into morning, evening, night), and number of games played by this player. From this we would get very useful info, like the win/loss % we can expect when playing under different conditions.

[/ QUOTE ]

I like this, but you'd have to explain how to do it.

Goldmund · #14 11-29-2006, 04:19 PM

Not sure if any of the previous posters have suggested this, but tracking the win% of sets of players with 100+ game samples at different levels seems interesting. That way you get to see how a 63% win-player at the $50 buy-in level performs at the $100 level, etc.

Goldmund

Nichomacheo · #15 11-29-2006, 05:06 PM

[ QUOTE ]
Not sure if any of the previous posters have suggested this, but tracking the win% of sets of players with 100+ game samples at different levels seems interesting. That way you get to see how a 63% win-player at the $50 buy-in level performs at the $100 level, etc.

Goldmund

[/ QUOTE ]

I could do that. I'm not sure how many people have 100+ games at multiple levels. I'm sure there are plenty, but how many do I need to have a good result?
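Roughly, the cross-level comparison could look like the sketch below. The record layout (per-player wins and games keyed by buyin) and the function name are assumptions, and the numbers are invented; it also reports how many players qualify.

```python
# Hypothetical sketch: compare each player's win% at two buy-in levels,
# keeping only players with at least `min_games` at both levels.
def cross_level(records, low, high, min_games=100):
    """records: {player: {buyin: (wins, games)}}; returns [(player, low%, high%)]."""
    out = []
    for player, by_buyin in records.items():
        if low not in by_buyin or high not in by_buyin:
            continue
        (w_lo, n_lo), (w_hi, n_hi) = by_buyin[low], by_buyin[high]
        if n_lo >= min_games and n_hi >= min_games:
            out.append((player, 100.0 * w_lo / n_lo, 100.0 * w_hi / n_hi))
    return out

# Toy records, invented for the example.
records = {
    "p1": {50: (63, 100), 100: (110, 200)},
    "p2": {50: (80, 150), 100: (20, 40)},    # too few $100 games
    "p3": {50: (55, 100)},                   # never played the $100s
}
pairs = cross_level(records, 50, 100)
print(len(pairs), "players qualify")
for player, lo, hi in pairs:
    print(f"{player}: {lo:.1f}% at $50 -> {hi:.1f}% at $100")
```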

PrayingMantis · #16 11-29-2006, 06:32 PM

I'm thinking about something like this:

For each buy-in level, create this graph, in which each unique player is represented by a point on a 2D graph: X axis for how many games he played, and Y axis for his % win rate (0-100). Now on a Z axis we can see how many players share the same X AND Y (i.e., cases where more than one point is in the same place on the X,Y graph).

It is a 3D graph basically. I hope this is doable and that it makes sense.
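The Z values are just counts of players landing on the same (X, Y) point, so they could be tallied along these lines. Rounding the win rate to a whole percent is an assumption, and the player data is invented:

```python
# Hypothetical sketch: count how many players share each (games, win%) point
# for one buy-in level. Win% is rounded to a whole percent (an assumption).
from collections import Counter

def point_counts(players):
    """players: iterable of (games, wins); returns Counter keyed by (games, win%)."""
    counts = Counter()
    for games, wins in players:
        if games == 0:
            continue                      # no games, no win rate
        counts[(games, round(100 * wins / games))] += 1
    return counts

# Toy data: several players with 1-2 games, one with a real sample.
players = [(1, 1), (1, 1), (1, 0), (2, 1), (2, 1), (150, 84)]
counts = point_counts(players)
for (games, pct), z in sorted(counts.items()):
    print(f"X={games:4d}  Y={pct:3d}%  Z={z}")
```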

Nichomacheo · #17 11-29-2006, 09:01 PM

[ QUOTE ]
I'm thinking about something like this:

For each buy-in level, create this graph, in which each unique player is represented by a point on a 2D graph: X axis for how many games he played, and Y axis for his % win rate (0-100). Now on a Z axis we can see how many players share the same X AND Y (i.e., cases where more than one point is in the same place on the X,Y graph).

It is a 3D graph basically. I hope this is doable and that it makes sense.

[/ QUOTE ]

I understand what you're saying, but I think it's overkill. You can't look at something like this and easily understand the results. Two-dimensional graphs are much more user-friendly and what I think we should shoot for.

PrayingMantis · #18 11-29-2006, 09:24 PM

[ QUOTE ]

I understand what you're saying, but I think it's overkill. You can't look at something like this and easily understand the results. Two-dimensional graphs are much more user-friendly and what I think we should shoot for.

[/ QUOTE ]

Perhaps this is true, but I believe that the vast majority of interesting information will be found on the x,y graph layer, also considering the fact that for players who played more than a few dozen games you will rarely have the exact same X AND Y. Most of the information on the Z axis will therefore "exist" only in the area where you'll find players who played a very small number of games, which is the least interesting area anyway. Think how improbable it is to have 2 players who played _exactly_ n games (say n>100), and who at the same time have _exactly_ the same winrate.

Nichomacheo · #19 11-29-2006, 10:07 PM

How's this sound to everybody?

For every player in the database, check to see which of the buyins he has more than 100 games for. For each buyin, determine the winrate and increment a variable that corresponds to that buyin/winrate. Simply:

$50s,45%-50% = $50s,45%-50% + 1

At the end, I'll have data on every buyin and what winrates are most likely and how they are skewed.
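A minimal sketch of that tally, assuming per-player (wins, games) totals keyed by buyin. The 5-point bucket width comes from the 45%-50% example; the record layout, the names, and the choice to put an exact 50% into the 50%-55% bucket are assumptions.

```python
# Hypothetical sketch: count players per (buyin, winrate bucket), using only
# buyins where the player has more than `min_games` games.
from collections import defaultdict

BUCKET = 5  # bucket width in percentage points, as in "45%-50%"

def bucket_label(win_pct):
    """Half-open buckets [lo, lo+5); a 100% win rate falls into 95%-100%."""
    lo = min(int(win_pct // BUCKET) * BUCKET, 100 - BUCKET)
    return f"{lo}%-{lo + BUCKET}%"

def winrate_distribution(records, min_games=100):
    """records: {player: {buyin: (wins, games)}} -> {buyin: {bucket: players}}"""
    table = defaultdict(lambda: defaultdict(int))
    for by_buyin in records.values():
        for buyin, (wins, games) in by_buyin.items():
            if games > min_games:
                table[buyin][bucket_label(100.0 * wins / games)] += 1
    return {buyin: dict(buckets) for buyin, buckets in table.items()}

# Toy records, invented for the example.
records = {
    "p1": {50: (96, 200)},                    # 48%
    "p2": {50: (101, 202)},                   # exactly 50%
    "p3": {50: (47, 101), 100: (30, 50)},     # $100 sample too small
}
dist = winrate_distribution(records)
for buyin in sorted(dist):
    print(f"${buyin}s:", dist[buyin])
```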

How's this sound?

PrayingMantis · #20 11-29-2006, 10:18 PM

[ QUOTE ]
what winrates are most likely and how they are skewed.

[/ QUOTE ]

I think that this is the most interesting data for every buy-in. The question should be how to weight different "sample sizes" coming from different players.
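One possible way to think about the sample-size question, offered as a sketch rather than a recommendation: attach a rough uncertainty to each player's win rate using the binomial standard error. The normal approximation and the 1.96 multiplier (a 95% interval) are assumptions, and it treats games as independent.

```python
# Hypothetical sketch: rough 95% interval for an observed win rate, using the
# normal approximation to the binomial. It is crude for very small samples.
import math

def win_rate_interval(wins, games, z=1.96):
    """Returns (observed rate, lower bound, upper bound)."""
    p = wins / games
    se = math.sqrt(p * (1 - p) / games)      # binomial standard error
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# The same 55% win rate over two different sample sizes.
for wins, games in [(55, 100), (1100, 2000)]:
    p, lo, hi = win_rate_interval(wins, games)
    print(f"{games:5d} games: {100 * p:.1f}% (about {100 * lo:.1f}% to {100 * hi:.1f}%)")
```

A player could then be weighted by the inverse of the squared standard error, or simply by games played, so that a 55% over 2,000 games counts for more than a 55% over 100.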