Two Plus Two Newer Archives  

#1
04-27-2007, 03:10 AM
MarkGritter
Senior Member
Join Date: Jan 2005
Location: Eagan, MN
Posts: 1,376
Results of a CP2-7 experiment

I wrote some code which, given a list of hands and a list of opponent Chinese Poker/2-7 settings, optimizes the hands to maximize value against the specified opponent strategy. It does this by picking, for each hand, the setting with the highest average value against all the relevant settings in the list.

I've started by running a small experiment with just 1000 possible hands for each player. (In reality there are about 635 billion if we distinguish suits.) Player 'A' gets 1000 hands and player 'B' gets 1000 hands, initially set up randomly and possibly illegally.

Player 'A' goes first, and based on B's random settings, constructs his hands to win the most. Call this strategy A1.

Then player B constructs a strategy, B1, that tries to beat A1. B doesn't know which cards A has in any given confrontation, but he knows how A1 will set the possible opposing hands. (Because A and B are dealt from the same deck, not all the settings are relevant to a particular hand--- in fact, given a specific A hand, only 1% of the B hands are possible, which gives us about 10 possible opponent hands on average.)

Then go on to make strategies A2, B2, etc.
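
In rough pseudocode (Python-style), one step of this iteration looks something like the sketch below. The helpers legal_settings() and score() are placeholders for the game-specific pieces (enumerating a hand's arrangements and scoring one arrangement against another); this is an illustration of the idea, not my actual code.

Code:
def shares_card(hand_a, hand_b):
    # Two hands dealt from the same deck cannot have a card in common.
    return bool(set(hand_a) & set(hand_b))

def best_response(my_hands, opp_hands, opp_strategy, legal_settings, score):
    # For each of my hands, pick the setting with the best total (equivalently,
    # average) result against the opponent's fixed settings for the hands he
    # could actually hold against it.
    new_strategy = {}
    for hand in my_hands:
        relevant = [h for h in opp_hands if not shares_card(hand, h)]
        new_strategy[hand] = max(
            legal_settings(hand),
            key=lambda s: sum(score(s, opp_strategy[h]) for h in relevant),
        )
    return new_strategy

# A1 = best_response(hands_a, hands_b, random_b_settings, ...), then
# B1 = best_response(hands_b, hands_a, A1, ...), and so on.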

Now, if there is a 'pure' solution for CP2-7, we should expect A and B to converge to an expected value close to 0. One of them might have more good hands, of course, but they should at least agree on how much this "subset" of CP is worth. It should be slightly positive for one of them and slightly negative for the other.

This does not appear to be the case. Here are the estimated values for each strategy when competing against the previous one:

A1 estimated EV: $3.834 (at $1/point) vs. random B
B1 estimated EV: $1.522 vs. A1 (1000 changed hands)
A2: $0.300 vs. B1 (884 changed hands)
B2: $0.262 vs. A2 (635 changed hands)
A3: $0.125 vs. B2 (564 changed hands)
B3: $0.277 vs. A3 (543 changed hands)
A4: $0.136 vs. B3 (562 changed hands)
B4: $0.258 vs. A4 (539 changed hands)
A5: $0.135 vs. B4 (581 changed hands)
B5: $0.295 vs. A5 (557 changed hands)
A6: $0.138 vs. B5 (592 changed hands)
B6: $0.273 vs. A6 (566 changed hands)
A7: $0.147 vs. B6 (594 changed hands)
B7: $0.298 vs. A7 (573 changed hands)
A8: $0.145 vs. B7 (592 changed hands)

Note that whichever player knows the other's pure strategy can pick a counter-strategy that provides him with a substantial positive expectation. In fact, these results suggest that hands with a single fixed best setting are in the minority.

Here are some hands I traced that switch back and forth:
Code:
A1: 6dJcKcKhKs 2c4s5d6s8c QcQhQs
A2: 6d6sQcQhQs 2c4s5d8cJc KcKhKs
A3: same
A4: QcQhQsKcKh 2c4s5d6d8c 6sJcKs
A5: 6d6sQcQhQs 2c4s5d8cJc KcKhKs (= A2)

B1: 2h3d4s5s6d 3s6h7h9sTd QdKsAh
B2: 3d3s6d9sTd 2h4s5s6h7h QdKsAh
B3: 3s4s5s9sKs 2h3d6d7hTd 6hQdAh
B4: 3d3s6d9sTd 2h4s5s6h7h QdKsAh (= B2)
B5: 3s4s5s9sKs 2h3d6d7hTd 6hQdAh (= B3)

A2: 2s5s8sTsJs 2c5c6d8h9c ThKhAh
A3: 2c2s5c8h8s 5s6d9cJsKh ThTsAh
A4: 2s5s8sTsJs 2c5c6d8h9c ThKhAh (= A2)
A5: 2c2s5c8h8s 5s6d9cJsKh ThTsAh (= A3)
A6: 2s5s8sTsJs 2c5c6d8h9c ThKhAh (= A2)

Caveats:

1. My code could have a bug. I've eyeballed its decisions and they seem to make sense. I am willing to make it available for review.

2. Things might change between 2000 hands and 635 billion. I didn't construct the sample of hands using any special method, so it is unlikely that I happened to just "get lucky", but it is possible that my sample is not large enough to exhibit enough smoothness. (Or small enough to contain many degenerate cases.) I will try to run larger samples now that I have some confidence that it works.

3. A good pure strategy might exist but not be reachable using just local optimization. You might be able to globally optimize using game theory, making sub-maximal choices for some hands to prevent being exploited by your opponent. (However, note that most game-theoretic answers involve non-pure strategies.) I am willing to provide my data set to anybody who wants to try constructing a superior pure strategy.

However, I think that this result strongly argues that a "consider all the possibilities for these 13 cards and select the best" strategy is exploitable.

4. I may not be patient enough. Perhaps the process will arrive at a pure strategy after many, many iterations. I think that is unlikely given that most hands seem to be running in relatively short cycles, and there does not seem to be any progress toward convergence--- but it's a possibility.
#2
04-27-2007, 06:41 PM
MarkGritter
Senior Member
Join Date: Jan 2005
Location: Eagan, MN
Posts: 1,376
Update #1

Andrew Prock thinks I need to run on bigger samples, because each individual hand is directly compared against such a small number of other hands. I agree.

I made a modification to the code to prefer re-using the previous strategy's arrangement for a hand if it is just as good as any other that has been found. This somewhat reduces the number of hands that get swapped back and forth.
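
Roughly, the tie-break now works like this (ev() stands for whatever scores an arrangement against the current opponent strategy; again just a sketch, not the actual code):

Code:
def choose_setting(candidates, prev_setting, ev):
    # Keep the previous arrangement when it ties the best new one, so hands
    # stop switching back and forth on exact ties.
    best = max(candidates, key=ev)
    if prev_setting is not None and ev(prev_setting) >= ev(best):
        return prev_setting
    return best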

I am running a 10,000-hand experiment and rerunning the 1,000-hand experiment for longer to see whether I can identify a cycle or see evidence of convergence. (However, if there is a cycle it may be fairly long.)
#3
04-28-2007, 11:08 AM
Phat Mack
Senior Member
Join Date: Sep 2002
Location: People's Republic of Texas
Posts: 2,663
Re: Update #1

I've been following this with great interest. What a great project!

[ QUOTE ]
Then player B constructs a strategy, B1, that tries to beat A1. B doesn't know which cards A has in any given confrontation, but he knows how A1 will set the possible opposing hands.
[/ QUOTE ]

How does this work? Does player B iterate all the possible settings for Player A's hands, and then index an arbitrary "balance" for the setting?

[ QUOTE ]
(Because A and B are dealt from the same deck, not all the settings are relevant to a particular hand--- in fact, given a specific A hand, only 1% of the B hands are possible, which gives us about 10 possible opponent hands on average.)
[/ QUOTE ]

By the "same" deck, do you mean that both hands are coming from a 52-card deck, or is A's from a 52-card deck and B's from a 39-card deck?
#4
04-28-2007, 01:42 PM
MarkGritter
Senior Member
Join Date: Jan 2005
Location: Eagan, MN
Posts: 1,376
Re: Update #1

[ QUOTE ]

Then player B constructs a strategy, B1, that tries to beat A1. B doesn't know which cards A has in any given confrontation, but he knows how A1 will set the possible opposing hands.

How does this work? Does player B iterate all the possible settings for Player A's hands, and then index an arbitrary "balance" for the setting?


[/ QUOTE ]

B examines each of his hands one by one. He looks at all possible arrangements of his hand, and how they do against the fixed layouts described in A1. He chooses the arrangement which has the highest expected value against that hand distribution. (For example, if A1 has very strong middles, B might choose to abandon the middle entirely.)
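
For reference, enumerating every arrangement of a 13-card hand is just nested combinations: 5 cards in back, 5 of the remaining 8 in the middle, and the last 3 in front, giving 1287 * 56 = 72,072 candidates per hand. Deciding which of those are legal settings is a separate, game-specific step. Something like:

Code:
from itertools import combinations

def all_arrangements(hand):
    # hand: a sequence of 13 distinct card strings such as '6d'.
    for back in combinations(hand, 5):
        rest = [c for c in hand if c not in back]
        for middle in combinations(rest, 5):
            front = tuple(c for c in rest if c not in middle)
            yield back, middle, front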

Here's an example from the current run: B's hand is 2s3c3d5h6h6s7c7s8d9cJcJhAs. The hands and arrangements in A's latest strategy that don't contain one of these cards are:

9h9sKcKhKs 2d3h4h6d7h ThQdAh
9d9hAcAdAh 2d4s5d6c9s TsJsKh
5c5dTcThTs 2h3s4c6c8h KdKsAc
8h9dTdJsQd 2d4d5d8sTs 4hAdAh
TdThQcQhQs 2h3h4c5c9d 3sKhKs
4d6dQdKdAd 2c4h5s6c7h 9sTsJs
5c5d5s9d9s 2c3h4c6d8h ThKhAc
QcQdKcKdKs 2d4d5dJsAc 9d9h9s
2c4c6cTcKc 2d3h4s5d8c 3sThKd
6cJdJsKcKh 2h4d5d6d9s QdAcAh
TcThQdQhQs 2c3s5d6c9d 6d9hKd
TcJdQcKhAc 2d3h4d7h8h 9dThTs
4d6dJdQdAd 2h5c6c8s9h 4h4sTh
TcThTsKcKd 3s5d8hJdQh 4d4h4s
4dTcTdThTs 2c2d4h8h9s QdQhAc
4c4d4sTdTh 2d3s8c9dQh 9sAcAh
4dJdQhQsKh 2d3s4s5c9h 2h8c8s
8hTsJdKcKd 2c3h5c6d7d QcQhAc
7dJdQdKdAd 3h5c8h9hTs 5sKcKh
etc. (113 possible hands out of 10,000). All are treated as equally likely.

Here are some of B's possibilities, listed with their EV (in points) against these 113 hands.

-1.51327 3c3d6s7c7s 2s5h6h8d9c JcJhAs
-1.48673 3c7c9cJcJh 2s3d5h7s8d 6h6sAs
-1.38053 6h6s7c7sAs 2s3c5h8d9c 3dJcJh
-1.35398 3c6h6s7c7s 2s3d5h8d9c JcJhAs
-1.32743 3c6h9cJcJh 2s3d5h6s8d 7c7sAs

So, B picks 367JJ 24568 77A as his arrangement. (However, in the previous iteration he picked 36677 23489 JJA.)

[ QUOTE ]

(Because A and B are dealt from the same deck, not all the settings are relevant to a particular hand--- in fact, given a specific A hand, only 1% of the B hands are possible, which gives us about 10 possible opponent hands on average.)

By the "same" deck, do you mean that both hands are coming from a 52-card deck, or is A's from a 52-card deck and B's from a 39-card deck?

[/ QUOTE ]

The sets A and B both come from a 52-card deck; I generated them separately. So, when A is considering his options, only a fraction of B's hands are immediately relevant, and similarly for B. The rest of the hands have at least one card in common and so could not occur.

It would be possible to generate them at the same time to ensure that every A hand had at least one relevant B hand, but this is not necessary in practice.
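
For what it's worth, drawing the two samples independently is straightforward, and the "only 1%" figure above is just C(39,13)/C(52,13), about 1.3%: given a fixed 13-card hand, that is the chance a second random 13-card hand shares none of its cards. A sketch:

Code:
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]   # 52 distinct cards

def random_hands(n, rng=random):
    # n independent 13-card hands, each drawn from the full deck.
    return [tuple(sorted(rng.sample(DECK, 13))) for _ in range(n)]

hands_a = random_hands(1000)
hands_b = random_hands(1000)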
#5
04-29-2007, 04:00 AM
donger
Senior Member
Join Date: Apr 2004
Location: Portland, OR
Posts: 2,531
Re: Results of a CP2-7 experiment

This is really interesting, thanks for doing this. I had considered trying to script something, but I'm really rusty and was never good to start with.

In this program, how much information does A have about B's range of hands and vice versa? If he knows a lot, like tendencies (strong middles, etc), then this can go back to exploiting additional information and probably doesn't have a lot to do with the game itself. Like, for instance, roshambo can be played 'perfectly' (even money) by randomizing what you throw, but if you know your opponent heavily favors rock, you can win way more often than your fair share.

I don't know a ton about game theory, etc, just what I've learned from poker, so take everything I'm saying with a grain of salt. I could easily be wrong about all this, but it's generated a lot of interesting discussion.
#6
04-29-2007, 04:02 AM
Murakawa
Senior Member
Join Date: Jun 2006
Posts: 632
Re: Results of a CP2-7 experiment

i wish i were high so that this would blow my mind... but wait.. it still does.
#7
04-29-2007, 04:23 AM
donger
Senior Member
Join Date: Apr 2004
Location: Portland, OR
Posts: 2,531
Re: Results of a CP2-7 experiment

PS.. Another thing I thought of that I forgot to mention. If you use the set-your-best-hand strategy, you shouldn't really have any tendencies because you're letting the cards you're dealt determine how you set each hand.

Like, the way I'm seeing this is.. you have a random hand, and so does your opponent.. how do you get an edge (or more like not lose one) by doing anything but setting the strongest, most balanced hand?
#8
04-30-2007, 12:27 AM
MarkGritter
Senior Member
Join Date: Jan 2005
Location: Eagan, MN
Posts: 1,376
Re: Results of a CP2-7 experiment

Here's an example I found in a run using 10K sets (instead of just 1000 hands each.) I observed that on each iteration player A's strategy for this particular hand switched back and forth:

3d3h6c6hJs 2s4h5c7c8c KcAcAs (50%)
5c6c8cKcAc 2s3d4h6h7c 3hJsAs (50%)
5c6c7c8cKc 2s3h4h6hJs 3dAcAs (not used)

Obviously, something that B does is changing which setting (two pair in back or the flush in back) is more profitable. Here are the EV's for the various settings.

Code:
Opponent strategy EV(A-high flush) EV(K-high flush) EV(two pair)
b.4 $0.442 $0.639 $0.878
b.5 $1.170 $0.259 $0.361
b.6 $0.483 $0.612 $0.878
b.7 $1.088 $0.245 $0.388
b.8 $0.551 $0.667 $0.830
b.9 $1.116 $0.245 $0.415
b.10 $0.564 $0.667 $0.837
b.11 $1.061 $0.231 $0.415

So, looking at what B does is not so straightforward because B changes lots of relevant hands. But many of his hands were switching back and forth as well:

Code:
Even strategies Odd strategies
5s6s7s9sKs 5d6d7d9hJh 4c4dAh 5s6s7s9sKs 4c5d6d7d9h 4dJhAh
6d6sJcJdJh 4c5s8h9cKd 9sTcTs TcTsJcJdJh 4c5s6d8h9c 6s9sKd
2cTsJdJhQs 2h3s4d6s7d 5d5sKd 2c5d5sJdJh 2h3s4d6s7d TsQsKd
7d7h9d9h9s 3c4s5dTcQc JcJdKd 9d9h9sJcJd 3c4s5d7dTc 7hQcKd
5d9c9d9h9s 2c4d5s6sQd ThTsKd 5d9c9d9h9s 2c4d5s6sTh TsQdKd
4c4d4s5d5h 2c5s7d8s9h QcKhAh 5d5h5sKhAh 2c7d8s9hQc 4c4d4s
TdJdQhKsAd 3s4s5h6s8d 7d7h9s 7d8dTdJdAd 3s4s5h6s9s 7hQhKs

What appears to be happening here is that on the "even" iterations, A puts the best possible pair of aces in front. It is then not worth B putting anything less than trips in front, so he abandons the front and strengthens his back or middle instead.

But once B does so, it's attractive for A to weaken his front as well, to merely a strong ace-high, which can still beat most of the trashy fronts.

But then B can respond by putting small pairs in front, which leads to A putting aces in front again.

Now, I don't know what other A hands compete against all those B hands which switched from a pair in front to junk in front. I'm guessing that either they also have strong fronts/weak fronts in sync with A's strategy for the example hand, or that they are weak enough or strong enough that B's choice doesn't make a large difference.

Perhaps running a larger sample will eliminate this effect--- it's certainly possible. (10K shows more convergence toward a zero value but still about 2000 hands switching strategies every iteration.)

I'm somewhat surprised that the switch in strength is mainly between front and back, not middle and back as I had expected. I'm trying to build tools to better understand what is going on here (but it's only relevant if larger experiments also show cyclic behavior--- if they converge the matter is moot.)

The reason, in this limited experiment, that there is no "strongest, most balanced hand" is that your opponent has sufficient flexibility to alter the value of your hands' arrangements. Putting aces in back might seem like the strongest choice against "typical" choices by your opponent. But if your opponent knows you are likely to have aces (because he doesn't) and that you will often put them in back, he can take advantage of that. The value of an arrangement is not fixed, after all, any more than the value of AA is fixed in Hold'em.
#9
04-30-2007, 12:38 AM
MarkGritter
Senior Member
Join Date: Jan 2005
Location: Eagan, MN
Posts: 1,376
Re: Results of a CP2-7 experiment

[ QUOTE ]

In this program, how much information does A have about B's range of hands and vice versa? If he knows a lot, like tendencies (strong middles, etc), then this can go back to exploiting additional information and probably doesn't have a lot to do with the game itself. Like, for instance, roshambo can be played 'perfectly' (even money) by randomizing what you throw, but if you know your opponent heavily favors rock, you can win way more often than your fair share.

[/ QUOTE ]

In this experiment, each player has perfect information about the other's range of hands and strategy. That is how they calculate the best option for each of their hands--- by evaluating what the opponent may have and how he will play it.

There's no question that CP2-7 does have a game-theoretic solution. What I'm aiming at here is a somewhat smaller goal, to see if CP2-7 actually has a "fixed" best arrangement for every hand, or if the game-theoretic solution must involve randomization (or at least nonlocal thinking.)

In the smaller games explored in these experiments, either player could arrive at a stable game-theoretic strategy by randomly picking among their top choices for a hand each time (using suitable weights--- not necessarily 50/50.) Then the other player would find no common tendencies to exploit.
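
For concreteness, mixing for a single hand might look like the sketch below; the two settings and the 60/40 weights are made up for illustration, since the real weights would have to come out of the game-theoretic calculation.

Code:
import random

def mixed_choice(weighted_settings, rng=random):
    # weighted_settings: list of (setting, weight) pairs for one hand.
    settings, weights = zip(*weighted_settings)
    return rng.choices(settings, weights=weights, k=1)[0]

# e.g. mixed_choice([(two_pair_in_back, 0.6), (flush_in_back, 0.4)])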
#10
04-30-2007, 02:17 AM
donger
Senior Member
Join Date: Apr 2004
Location: Portland, OR
Posts: 2,531
Re: Results of a CP2-7 experiment

I think we're talking apples and oranges here. Everything I'm saying has to do with selecting a strategy facing a closed, totally random hand.

I've been trying to figure out how to determine the median 2-7 hand for this game. Any idea how to do this?