#1
Results of a CP2-7 experiment
I wrote some code that, given a list of hands and a list of opponent Chinese Poker/2-7 settings, optimizes the hands to maximize value against the specified opponent strategy. It does this by picking, for each hand, the setting with the highest average value against all the relevant settings in the list.
I've started by running a small experiment with just 1000 possible hands for each player. (In reality there are about 635 billion if we distinguish suits.) Player 'A' gets 1000 hands and player 'B' gets 1000 hands, initially set up randomly and possibly illegally. Player 'A' goes first and, based on B's random settings, constructs his hands to win the most. Call this strategy A1. Then player B constructs a strategy, B1, that tries to beat A1. B doesn't know which cards A has in any given confrontation, but he knows how A1 will set the possible opposing hands. (Because A and B are dealt from the same deck, not all the settings are relevant to a particular hand; in fact, given a specific A hand, only 1% of the B hands are possible, which gives us about 10 possible opponent hands on average.) Then go on to make strategies A2, B2, etc.

Now, if there is a 'pure' solution for CP2-7, we should expect A and B to converge to an expected value close to 0. One of them might have more good hands, of course, but they should at least agree on how much this "subset" of CP is worth. It should be slightly positive for one of them and slightly negative for the other. This does not appear to be the case. Here are the estimated values for each strategy when competing against the previous one:

A1 estimated EV: $3.834 (at $1/point) vs. random B
B1 estimated EV: $1.522 vs. A1 (1000 changed hands)
A2: $0.300 vs. B1 (884 changed hands)
B2: $0.262 vs. A2 (635 changed hands)
A3: $0.125 vs. B2 (564 changed hands)
B3: $0.277 vs. A3 (543 changed hands)
A4: $0.136 vs. B3 (562 changed hands)
B4: $0.258 vs. A4 (539 changed hands)
A5: $0.135 vs. B4 (581 changed hands)
B5: $0.295 vs. A5 (557 changed hands)
A6: $0.138 vs. B5 (592 changed hands)
B6: $0.273 vs. A6 (566 changed hands)
A7: $0.147 vs. B6 (594 changed hands)
B7: $0.298 vs. A7 (573 changed hands)
A8: $0.145 vs. B7 (592 changed hands)

Note that whichever player knows the other's pure strategy can pick a counter-strategy that provides him with a substantial positive expectation. In fact, this experiment shows that hands which have a fixed best setting appear to be the minority. Here are some hands I traced that switch back and forth:

Code:
<pre>
A1: 6dJcKcKhKs 2c4s5d6s8c QcQhQs
A2: 6d6sQcQhQs 2c4s5d8cJc KcKhKs
A3: same
A4: QcQhQsKcKh 2c4s5d6d8c 6sJcKs
A5: 6d6sQcQhQs 2c4s5d8cJc KcKhKs (= A2)

B1: 2h3d4s5s6d 3s6h7h9sTd QdKsAh
B2: 3d3s6d9sTd 2h4s5s6h7h QdKsAh
B3: 3s4s5s9sKs 2h3d6d7hTd 6hQdAh
B4: 3d3s6d9sTd 2h4s5s6h7h QdKsAh (= B2)
B5: 3s4s5s9sKs 2h3d6d7hTd 6hQdAh (= B3)

A2: 2s5s8sTsJs 2c5c6d8h9c ThKhAh
A3: 2c2s5c8h8s 5s6d9cJsKh ThTsAh
A4: 2s5s8sTsJs 2c5c6d8h9c ThKhAh (= A2)
A5: 2c2s5c8h8s 5s6d9cJsKh ThTsAh (= A3)
A6: 2s5s8sTsJs 2c5c6d8h9c ThKhAh (= A2)
</pre>

Caveats:

1. My code could have a bug. I've eyeballed its decisions and they seem to make sense. I am willing to make it available for review.

2. Things might change between 2000 hands and 635 billion. I didn't construct the sample of hands using any special method, so it is unlikely that I happened to just "get lucky", but it is possible that my sample is not large enough to exhibit enough smoothness. (Or small enough to contain many degenerate cases.) I will try to run larger samples now that I have some confidence that it works.

3. A good pure strategy might exist but not be reachable using just local optimization. You might be able to globally optimize using game theory, making sub-maximal choices for some hands to prevent being exploited by your opponent. (However, note that most game-theoretic answers involve non-pure strategies.) I am willing to provide my data set to anybody who wants to try constructing a superior pure strategy. However, I think that this result strongly argues that a "consider all the possibilities for these 13 cards and select the best" strategy is exploitable.

4. I may not be patient enough. Perhaps the process will arrive at a pure strategy after many, many iterations. I think that is unlikely, given that most hands seem to be running in relatively short cycles and there does not seem to be any progress toward convergence, but it's a possibility.
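The A1, B1, A2, ... procedure described in this post can be sketched roughly as below. This is only an illustration under stated assumptions, not the code that was actually run: the names `best_response` and `alternate`, the `relevant` callback (do two hands share no cards?) and the `payoff` callback (points for A's setting against B's setting) are all hypothetical, and the toy game at the bottom is matching pennies rather than CP2-7, chosen because it has no pure solution and so shows the same back-and-forth switching.

```python
def best_response(options, opp_strategy, relevant, payoff):
    """For each hand, pick the setting with the best average payoff.

    options:      hand id -> list of candidate settings (hypothetical layout)
    opp_strategy: opponent hand id -> the setting the opponent currently uses
    Returns the new strategy and its average EV over all hands.
    """
    strategy = {}
    total = 0.0
    for hand, settings in options.items():
        # Only opposing hands that can coexist with this hand matter.
        opposing = [s for h, s in opp_strategy.items() if relevant(hand, h)]

        def average(setting):
            if not opposing:
                return 0.0  # no possible opponent in the sample
            return sum(payoff(setting, o) for o in opposing) / len(opposing)

        strategy[hand] = max(settings, key=average)
        total += average(strategy[hand])
    return strategy, total / len(options)


def alternate(a_options, b_options, b_start, relevant, payoff, rounds):
    """Alternate best responses; log (name, EV, number of changed hands)."""
    log = []
    a, b = {}, dict(b_start)
    for n in range(1, rounds + 1):
        new_a, ev = best_response(a_options, b, relevant, payoff)
        log.append((f"A{n}", ev, sum(new_a[h] != a.get(h) for h in new_a)))
        a = new_a
        # B sees the game from the other side: arguments swap, payoff negates.
        new_b, ev = best_response(
            b_options, a,
            lambda hb, ha: relevant(ha, hb),
            lambda sb, sa: -payoff(sa, sb),
        )
        log.append((f"B{n}", ev, sum(new_b[h] != b.get(h) for h in new_b)))
        b = new_b
    return log


# Toy stand-in game (an assumption, not CP2-7): A wins a point by matching.
log = alternate(
    a_options={"a-hand": ["H", "T"]},
    b_options={"b-hand": ["H", "T"]},
    b_start={"b-hand": "H"},
    relevant=lambda ha, hb: True,
    payoff=lambda sa, sb: 1.0 if sa == sb else -1.0,
    rounds=3,
)
for name, ev, changed in log:
    print(name, ev, changed)
```

In the toy game every responder finds a reply worth +1 and the single hand changes on every step, which is the small-scale version of the pattern in the EV list: whoever moves second against a known pure strategy comes out ahead.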
#2
Update #1
Andrew Prock thinks I need to run on bigger samples, because each individual hand is directly compared against such a small number of other hands. I agree.
I made a modification to the code to prefer re-using the previous strategy's hand setting if it is just as good as any other that has been found. This somewhat reduces the number of hands that get swapped back and forth. I am running a 10,000-hand experiment and rerunning the 1,000-hand experiment for longer to see whether I can identify a cycle or see evidence of convergence. (However, if there is a cycle it may be fairly long.)
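One way the "prefer the previous setting on ties" rule could look is sketched below. The function name `pick_setting`, the `ev_of` callback and the tolerance `tol` are assumptions made for illustration; the post does not say how ties are actually detected.

```python
def pick_setting(options, ev_of, previous=None, tol=1e-9):
    """Return the best setting, keeping `previous` if it ties for best.

    options:  candidate settings for one hand
    ev_of:    function mapping a setting to its estimated EV
    previous: the setting used in the prior iteration, if any
    tol:      how close two EVs must be to count as a tie (an assumption)
    """
    best_ev = max(ev_of(o) for o in options)
    if previous in options and ev_of(previous) >= best_ev - tol:
        return previous
    return max(options, key=ev_of)


# Hypothetical EVs for three settings of one hand; "x" and "y" tie for best.
evs = {"x": 1.0, "y": 1.0, "z": 0.5}
print(pick_setting(list(evs), evs.get))                # x
print(pick_setting(list(evs), evs.get, previous="y"))  # y
print(pick_setting(list(evs), evs.get, previous="z"))  # x
```

Without the tie rule, a hand whose top two settings are exactly equal can be counted as "changed" on every iteration even though nothing about its value has moved.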
#3
Re: Update #1
I've been following this with great interest. What a great project!
[ QUOTE ]
Then player B constructs a strategy, B1, that tries to beat A1. B doesn't know which cards A has in any given confrontation, but he knows how A1 will set the possible opposing hands.
[/ QUOTE ]
How does this work? Does player B iterate over all the possible settings for Player A's hands, and then index an arbitrary "balance" for the setting?

[ QUOTE ]
(Because A and B are dealt from the same deck, not all the settings are relevant to a particular hand; in fact, given a specific A hand, only 1% of the B hands are possible, which gives us about 10 possible opponent hands on average.)
[/ QUOTE ]
By the "same" deck, do you mean that both hands are coming from a 52-card deck, or is A's from a 52-card deck and B's from a 39-card deck?
#4
Re: Update #1
[ QUOTE ]
Then player B constructs a strategy, B1, that tries to beat A1. B doesn't know which cards A has in any given confrontation, but he knows how A1 will set the possible opposing hands. How does this work? Does player B iterate over all the possible settings for Player A's hands, and then index an arbitrary "balance" for the setting?
[/ QUOTE ]
B examines each of his hands one by one. He looks at all possible arrangements of his hand, and how they do against the fixed layouts described in A1. He chooses the arrangement which has the highest expected value against that hand distribution. (For example, if A1 has very strong middles, B might choose to abandon the middle entirely.)

Here's an example from the current run: B's hand is 2s3c3d5h6h6s7c7s8d9cJcJhAs. The hands and arrangements in A's latest strategy that don't contain any of these cards are:

9h9sKcKhKs 2d3h4h6d7h ThQdAh
9d9hAcAdAh 2d4s5d6c9s TsJsKh
5c5dTcThTs 2h3s4c6c8h KdKsAc
8h9dTdJsQd 2d4d5d8sTs 4hAdAh
TdThQcQhQs 2h3h4c5c9d 3sKhKs
4d6dQdKdAd 2c4h5s6c7h 9sTsJs
5c5d5s9d9s 2c3h4c6d8h ThKhAc
QcQdKcKdKs 2d4d5dJsAc 9d9h9s
2c4c6cTcKc 2d3h4s5d8c 3sThKd
6cJdJsKcKh 2h4d5d6d9s QdAcAh
TcThQdQhQs 2c3s5d6c9d 6d9hKd
TcJdQcKhAc 2d3h4d7h8h 9dThTs
4d6dJdQdAd 2h5c6c8s9h 4h4sTh
TcThTsKcKd 3s5d8hJdQh 4d4h4s
4dTcTdThTs 2c2d4h8h9s QdQhAc
4c4d4sTdTh 2d3s8c9dQh 9sAcAh
4dJdQhQsKh 2d3s4s5c9h 2h8c8s
8hTsJdKcKd 2c3h5c6d7d QcQhAc
7dJdQdKdAd 3h5c8h9hTs 5sKcKh
etc. (113 possible hands out of 10000)

All are treated as equally likely. Here are some of B's possibilities, with their EV (in points) against these 113 hands:

-1.51327 3c3d6s7c7s 2s5h6h8d9c JcJhAs
-1.48673 3c7c9cJcJh 2s3d5h7s8d 6h6sAs
-1.38053 6h6s7c7sAs 2s3c5h8d9c 3dJcJh
-1.35398 3c6h6s7c7s 2s3d5h8d9c JcJhAs
-1.32743 3c6h9cJcJh 2s3d5h6s8d 7c7sAs

So, B picks 369JJ 23568 77A as his arrangement. (However, in the previous iteration he picked 36677 23589 JJA.)
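The filtering and selection steps in this example can be sketched as follows. The helper names `cards` and `relevant_settings` are hypothetical, and the CP2-7 scoring itself is not reproduced here; the EVs are simply copied from the list above. The third A setting is borrowed from the 1000-hand traces in the first post purely because it shares a card (6s) with B's hand, so it shows what gets filtered out.

```python
def cards(text):
    """Split a hand or arrangement string into a set of two-character cards."""
    compact = text.replace(" ", "")
    return {compact[i:i + 2] for i in range(0, len(compact), 2)}


def relevant_settings(hand, opposing_settings):
    """Keep only opposing settings that share no card with `hand`."""
    mine = cards(hand)
    return [s for s in opposing_settings if mine.isdisjoint(cards(s))]


b_hand = "2s3c3d5h6h6s7c7s8d9cJcJhAs"
a_settings = [
    "9h9sKcKhKs 2d3h4h6d7h ThQdAh",
    "9d9hAcAdAh 2d4s5d6c9s TsJsKh",
    "6dJcKcKhKs 2c4s5d6s8c QcQhQs",  # contains 6s, so it cannot face b_hand
]
kept = relevant_settings(b_hand, a_settings)
print(len(kept))  # 2

# EVs as listed in the post (points, averaged over the 113 relevant hands).
listed = [
    (-1.51327, "3c3d6s7c7s 2s5h6h8d9c JcJhAs"),
    (-1.48673, "3c7c9cJcJh 2s3d5h7s8d 6h6sAs"),
    (-1.38053, "6h6s7c7sAs 2s3c5h8d9c 3dJcJh"),
    (-1.35398, "3c6h6s7c7s 2s3d5h8d9c JcJhAs"),
    (-1.32743, "3c6h9cJcJh 2s3d5h6s8d 7c7sAs"),
]
best_ev, best_setting = max(listed)
print(best_ev, best_setting)
```

Note that even the best listed arrangement has a negative EV: this particular hand is a loser against the sample, and the optimization only decides how to lose the least.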
[ QUOTE ]
(Because A and B are dealt from the same deck, not all the settings are relevant to a particular hand; in fact, given a specific A hand, only 1% of the B hands are possible, which gives us about 10 possible opponent hands on average.) By the "same" deck, do you mean that both hands are coming from a 52-card deck, or is A's from a 52-card deck and B's from a 39-card deck?
[/ QUOTE ]
The A hands and the B hands both come from a full 52-card deck. I generated them separately. So, when A is considering his options, only a fraction of B's hands are immediately relevant, and similarly for B. The rest of the hands have at least one card in common with his and so could not occur. It would be possible to generate them at the same time to ensure that every A hand had at least one relevant B hand, but this is not necessary in practice.
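The "about 635 billion" and "about 1%" figures can be checked with a little counting, assuming both samples are drawn uniformly from all 13-card hands (the variable names below are just for illustration). A B hand is compatible with a given A hand only if all 13 of its cards come from the other 39.

```python
from math import comb

total_hands = comb(52, 13)         # all 13-card hands: 635013559600
compatible = comb(39, 13)          # hands avoiding 13 given cards: 8122425444
fraction = compatible / total_hands

print(total_hands)
print(compatible)
print(round(fraction, 4))          # roughly 0.0128, i.e. a bit over 1%
print(round(1000 * fraction, 1))   # expected relevant hands in a 1000-hand sample
print(round(10000 * fraction, 1))  # expected relevant hands in a 10000-hand sample
```

So a 1000-hand sample gives each hand roughly a dozen possible opponents, and a 10,000-hand sample roughly ten times that, which is in line with the 113 reported for the example hand.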
#5
Re: Update #1
[ QUOTE ]
Here's an example from the current run: B's hand is 2s3c3d5h6h6s7c7s8d9cJcJhAs. The hands and arrangements in A's latest strategy that don't contain one of these cards are: 9h9sKcKhKs 2d3h4h6d7h ThQdAh 9d9hAcAdAh 2d4s5d6c9s TsJsKh 5c5dTcThTs 2h3s4c6c8h KdKsAc 8h9dTdJsQd 2d4d5d8sTs 4hAdAh TdThQcQhQs 2h3h4c5c9d 3sKhKs 4d6dQdKdAd 2c4h5s6c7h 9sTsJs 5c5d5s9d9s 2c3h4c6d8h ThKhAc QcQdKcKdKs 2d4d5dJsAc 9d9h9s 2c4c6cTcKc 2d3h4s5d8c 3sThKd 6cJdJsKcKh 2h4d5d6d9s QdAcAh TcThQdQhQs 2c3s5d6c9d 6d9hKd TcJdQcKhAc 2d3h4d7h8h 9dThTs 4d6dJdQdAd 2h5c6c8s9h 4h4sTh TcThTsKcKd 3s5d8hJdQh 4d4h4s 4dTcTdThTs 2c2d4h8h9s QdQhAc 4c4d4sTdTh 2d3s8c9dQh 9sAcAh 4dJdQhQsKh 2d3s4s5c9h 2h8c8s 8hTsJdKcKd 2c3h5c6d7d QcQhAc 7dJdQdKdAd 3h5c8h9hTs 5sKcKh etc. (113 possible hands out of 10000) All are treated as equally likely. [/ QUOTE ] As we proceed through iterations of A1,A2,A3... and B1,B2,B3..., does B keep playing this same hand against the same 113 hands being played by A? |
#6
Re: Update #1
[ QUOTE ]
[ QUOTE ] Here's an example from the current run: B's hand is 2s3c3d5h6h6s7c7s8d9cJcJhAs. The hands and arrangements in A's latest strategy that don't contain one of these cards are: 9h9sKcKhKs 2d3h4h6d7h ThQdAh 9d9hAcAdAh 2d4s5d6c9s TsJsKh 5c5dTcThTs 2h3s4c6c8h KdKsAc 8h9dTdJsQd 2d4d5d8sTs 4hAdAh TdThQcQhQs 2h3h4c5c9d 3sKhKs 4d6dQdKdAd 2c4h5s6c7h 9sTsJs 5c5d5s9d9s 2c3h4c6d8h ThKhAc QcQdKcKdKs 2d4d5dJsAc 9d9h9s 2c4c6cTcKc 2d3h4s5d8c 3sThKd 6cJdJsKcKh 2h4d5d6d9s QdAcAh TcThQdQhQs 2c3s5d6c9d 6d9hKd TcJdQcKhAc 2d3h4d7h8h 9dThTs 4d6dJdQdAd 2h5c6c8s9h 4h4sTh TcThTsKcKd 3s5d8hJdQh 4d4h4s 4dTcTdThTs 2c2d4h8h9s QdQhAc 4c4d4sTdTh 2d3s8c9dQh 9sAcAh 4dJdQhQsKh 2d3s4s5c9h 2h8c8s 8hTsJdKcKd 2c3h5c6d7d QcQhAc 7dJdQdKdAd 3h5c8h9hTs 5sKcKh etc. (113 possible hands out of 10000) All are treated as equally likely. [/ QUOTE ]
As we proceed through iterations of A1,A2,A3... and B1,B2,B3..., does B keep playing this same hand against the same 113 hands being played by A?
[/ QUOTE ]
Yes, the set of 13-card hands being played does not change. Whenever B considers how to play 2s3c3d5h6h6s7c7s8d9cJcJhAs, he is up against the same 113 opposing hands. Only how they have been set up may have changed. (I've actually thought about changing the experiment a bit to add a few new hands on every iteration, but I concluded it wouldn't really add anything compared with just doing a larger run.)
#7
Re: Results of a CP2-7 experiment
This is really interesting, thanks for doing this. I had considered trying to script something, but I'm really rusty and was never good to start with.
In this program, how much information does A have about B's range of hands and vice versa? If he knows a lot, like tendencies (strong middles, etc), then this can go back to exploiting additional information and probably doesn't have a lot to do with the game itself. Like, for instance roshambo can be played 'perfectly' (even money) by randomizing what you throw, but if you know your opponent heavily favors rock, you can win way more often than your fair share. I don't know a ton about game theory, etc, just what I've learned from poker, so take everything I'm saying with a grain of salt. I could easily be wrong about all this, but it's generated a lot of interesting discussion. |
#8
Re: Results of a CP2-7 experiment
i wish i were high so that this would blow my mind... but wait.. it still does.
#9
Re: Results of a CP2-7 experiment
[ QUOTE ]
In this program, how much information does A have about B's range of hands and vice versa? If he knows a lot, like tendencies (strong middles, etc), then this can go back to exploiting additional information and probably doesn't have a lot to do with the game itself. Like, for instance roshambo can be played 'perfectly' (even money) by randomizing what you throw, but if you know your opponent heavily favors rock, you can win way more often than your fair share.
[/ QUOTE ]
In this experiment, each player has perfect information about the other's range of hands and strategy. That is how they calculate the best option for each of their hands: by evaluating what the opponent may have and how he will play it.

There's no question that CP2-7 does have a game-theoretic solution. What I'm aiming at here is a somewhat smaller goal: to see if CP2-7 actually has a "fixed" best arrangement for every hand, or if the game-theoretic solution must involve randomization (or at least nonlocal thinking).

In the smaller games explored in these experiments, either player could arrive at a stable game-theoretic strategy by randomly picking among their top choices for a hand each time (using suitable weights, not necessarily 50/50). Then the other player would find no common tendencies to exploit.
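The roshambo comparison from the quote makes the pure-versus-randomized point concrete. The sketch below is only an illustration with hypothetical names (`ev`, `best_reply_value`) and a made-up "rock-heavy" mix; it measures how much an opponent who knows your strategy can win against it.

```python
MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def ev(my_mix, their_mix):
    """Expected payoff when both sides throw according to probability mixes."""
    return sum(p * q * payoff(m, t)
               for m, p in my_mix.items()
               for t, q in their_mix.items())


def best_reply_value(their_mix):
    """What an opponent who knows `their_mix` can win with the best pure reply."""
    return max(ev({m: 1.0}, their_mix) for m in MOVES)


uniform = {m: 1 / 3 for m in MOVES}
pure_rock = {"rock": 1.0}
rock_heavy = {"rock": 0.5, "paper": 0.25, "scissors": 0.25}  # illustrative mix

print(best_reply_value(pure_rock))   # fully exploitable
print(best_reply_value(rock_heavy))  # partly exploitable
print(best_reply_value(uniform))     # nothing to exploit
```

A fixed throw gives up a full point per game to an informed opponent, a lopsided mix gives up a fraction of a point, and the uniform mix gives up nothing. The experiment in this thread is asking whether CP2-7 settings behave like the fixed throw.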
#10
Re: Results of a CP2-7 experiment
I think we're talking apples and oranges here. Everything I'm saying has to do with selecting a strategy facing a closed, totally random hand.
I've been trying to figure out how to determine the median 2-7 hand for this game. Any idea how to do this? |