Fictitious play for multi-player games

jukofyork · #1 11-15-2007, 12:41 AM

If fictitious play is used to compute a NE in a multi-player game where it is possible for a player to "spite" another (such as in SNGs), then is it correct to assume that each player will attempt to only rationally maximise their own EV for the update rule?

This would mean that you would roughly use this update algorithm:

1. Init strategy to something arbitrary (A).
2. Find the maximally exploitative strategy against A (B).
3. Find the maximally exploitative strategy against a player who plays A or B, each with a 50% chance of being selected (C).
4. Find the maximally exploitative strategy against a player who plays A, B or C, each with a 33.3% chance of being selected (D).
5. Find the maximally exploitative strategy against a player who plays A, B, C or D, each with a 25% chance of being selected (E).
...
N. Stop when no further exploitative strategy can be found against the strategy collection (or when some reasonable exploitative EV threshold is reached).

This is the basic idea that the ICM Nash Calculator is using.
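A minimal Python sketch of steps 1 to N above, under some loud assumptions: it uses a two-player zero-sum toy (rock-paper-scissors) rather than an SNG, and the names `best_response` and `fictitious_play` are made up for illustration. Each iteration best-responds to the opponent's equally weighted history of past strategies, which is the 50% / 33.3% / 25% weighting from the list. Two-player zero-sum games are the case where this averaging is known to converge; nothing here says the same holds for a multi-player game.

```python
def best_response(payoffs, opponent_counts):
    """Index of the pure strategy with the highest EV against the opponent's
    empirical mixture. payoffs[i][j] is my payoff for playing i against j."""
    values = [sum(p * n for p, n in zip(row, opponent_counts)) for row in payoffs]
    return max(range(len(values)), key=values.__getitem__)


def fictitious_play(payoffs_1, payoffs_2, iterations=20000):
    """Return each player's average strategy after the given number of updates."""
    counts_1 = [0] * len(payoffs_1)
    counts_2 = [0] * len(payoffs_2)
    counts_1[0] = counts_2[0] = 1  # step 1: arbitrary initial strategy (A)
    for _ in range(iterations):
        # Steps 2..N: maximally exploit the opponent's history so far, where
        # every past strategy is equally likely to be selected.
        br_1 = best_response(payoffs_1, counts_2)
        br_2 = best_response(payoffs_2, counts_1)
        counts_1[br_1] += 1
        counts_2[br_2] += 1
    total = iterations + 1
    return [c / total for c in counts_1], [c / total for c in counts_2]


# Rows/columns: rock, paper, scissors. The game is symmetric, so both
# players can use the same payoff table.
ROCK_PAPER_SCISSORS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
avg_1, avg_2 = fictitious_play(ROCK_PAPER_SCISSORS, ROCK_PAPER_SCISSORS)
print(avg_1, avg_2)
```

Both averages should end up close to 1/3 for each action. The stopping rule here is a fixed iteration count; step N's exploitability threshold would replace it in a real solver.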

But just assuming that each player will attempt to rationally maximise their own EV seems to ignore the fact that they could also gain EV by another method: you could give up some of your own EV to cost an opponent even more, which in turn would force the opponent to change their strategy. This could mean that you gain EV by threatening to give some up (ie: by partially minimizing your opponent's EV).

This seems to fly in the face of what I know about NE states though, as it shouldn't be possible for a player to deviate profitably from a NE (assuming the above algorithm really does converge to a NE).

Is the reasoning flawed here somewhere? If so, can you think of a simple game as an example?

Is the above algorithm flawed? If so, then what alteration would be required for the update rule?

This post is related to a post in the STT forum which discusses taking a -EV play now to force your opponent to alter his strategy so as to possibly gain more EV in the future. The reason this might be correct is that the play only costs you ~$30, yet costs your opponent ~$300:

http://forumserver.twoplustwo.com/sh...age=0&vc=1

This led me to wondering what is the correct method to compute a NE where it's possible to "spite" an opponent like this:

http://forumserver.twoplustwo.com/sh...age=0&vc=1

Juk :)

plexiq · #2 11-15-2007, 05:43 AM

[ QUOTE ]
This seems to fly in the face of what I know about NE states though, as it shouldn't be possible for a player to deviate profitably from a NE (assuming the above algorithm really does converge to a NE).

[/ QUOTE ]

Well, the definition of a NE is still satisfied. No player can unilaterally deviate from the NE to gain value. That's true in your example: if only 1 player deviates, he can't improve.

If you assume bot-like players who won't deviate from the NE no matter what happens, then your best choice is to play the NE as well.

That said, I think the problem you describe is inherent in the definition of the NE. It doesn't really matter what algorithm we would use to find (or approximate) the NE.

Some kind of "raw" first idea, didn't really think it through yet:
Instead of optimizing the "current" equity (ie, playing maximally exploitative), each player tries to "drag" the strategies in a direction that will give him better equity than the current state, but only as long as his deviation from maximally exploitative play costs the respective opponents more EV than it costs him.

This should converge to a more "robust" set of strategies. But then, these strategies will be easily exploitable by opponents who simply skip their "spite calls".

This gets pretty interesting if you think about it. If we draw random players from a population of 50% NE players and 50% "spite callers" and put them into a game, the spite caller population would have a higher expectation in this game, I think.

Need to think it through before posting any more. I hope the above makes sense, lol.

The 13th 4postle · #3 11-15-2007, 12:25 PM

Since poker is a mixed-strategy game, there will be multiple NE. No single set of decisions is the right play; mixing up your strategy is more profitable because it is a repeated game.

Anything that makes your opponent play differently in a way you can take advantage of in the future is optimal, and if you are able to do it you should. However, that's harder to do online than live.

trojanrabbit · #4 11-15-2007, 01:19 PM

I think the difference lies in the definition of the "game." Fictitious play will work (I've used it) if you assume the current hand is a one-shot deal. There are no more interactions after the current hand. However if you extend the definition of the game to cover multiple hands then it gets a lot more complicated.

A perfect example is when there is a big stack bullying the table near the bubble. Nash says the big stack should raise almost every hand and the small stacks should almost always fold. However, just being in this situation is -EV for the small stacks. It would be in a small stack's long-term interest to call more liberally and punish the raiser, in an attempt to get the big stack to stop his bullying. But you have to take a -EV move now in order to try and stop being in continually -EV situations in the future.

But that would be way too complicated to figure out with a computer...

Tysen

jukofyork · #5 11-15-2007, 01:39 PM

[ QUOTE ]
Some kind of "raw" first idea, didn't really think it through yet:
Instead of optimizing the "current" equity (ie, playing maximally exploitative), each player tries to "drag" the strategies in a direction that will give him better equity than the current state, but only as long as his deviation from maximally exploitative play costs the respective opponents more EV than it costs him.

[/ QUOTE ]
Yep, this is what I was thinking, but "dragging" the values could be very computationally expensive to try. The basic idea would be to somehow "drag" your own strategy into the space where it is -EV for you and see how that affects your opponent's maximally exploitative strategy. The current update rule never considers these -EV calls.

Perhaps rather than "dragging", this could be accomplished by some kind of recursive update rule which is roughly O(n) times more complex? One idea would be to find the gradient of EV change for you for each variable of the strategy and then update your strategy variable by moving in the direction which increases EV for you (as opposed to updating it based on whether it is +EV or -EV for you to play against the current opposing strategy).

I've still not thought about this much yet so the idea might be flawed or there might be a much simpler way to combine the maximally exploitative strategy with the maximally spiteful strategy and update the rules based on both.

[ QUOTE ]
This should converge to a more "robust" set of strategies. But then, these strategies will be easily exploitable by opponents who simply skip their "spite calls".

[/ QUOTE ]
I don't think it could really be exploited, as it's the threat of the spite calls more than the calls themselves that's important. The equilibrium should mean that if player A deviates by not spite calling player B anymore, then player B won't be making the pushes that are punished by the spite calls anyway, so nothing has changed. If player B decides to push these anyway, knowing that he'll be spite called, then he's just made his strategy -EV compared to if he respected player A's spite calls.

[ QUOTE ]
This gets pretty interesting if you think about it. If we draw random players from a population of 50% NE players and 50% "spite callers" and put them into a game, the spite caller population would have a higher expectation in this game, I think.

[/ QUOTE ]
That's quite interesting and would make an interesting experiment. What would happen if you tried to train up a maximally exploitative strategy to play against this mixed NE/spite player? Perhaps this would be a more robust strategy than NE alone?

[ QUOTE ]
Need to think it through before posting any more. I hope the above makes sense, lol.

[/ QUOTE ]
Yep, some of my ideas might be totally off here too. I've just woken up and not really thought too carefully about all this yet, but overall it makes for some interesting thinking!

Juk :)

jukofyork · #6 11-15-2007, 02:08 PM

[ QUOTE ]
I think the difference lies in the definition of the "game." Fictitious play will work (I've used it) if you assume the current hand is a one-shot deal. There are no more interactions after the current hand. However if you extend the definition of the game to cover multiple hands then it gets a lot more complicated.

A perfect example is when there is a big stack bullying the table near the bubble. Nash says the big stack should raise almost every hand and the small stacks should almost always fold. However, just being in this situation is -EV for the small stacks. It would be in a small stack's long-term interest to call more liberally and punish the raiser, in an attempt to get the big stack to stop his bullying. But you have to take a -EV move now in order to try and stop being in continually -EV situations in the future.

But that would be way too complicated to figure out with a computer...

[/ QUOTE ]
Yep, I guess this would require expanding the game tree out to be able to see the blinds moving and the big stack getting into more and more +EV bullying situations. Perhaps it could be expanded into the next hand (or even next few hands) and still be computationally tractable? Not sure how much better the solutions would be though.

Juk :)

plexiq · #7 11-20-2007, 03:24 PM

Kind of forgot about this thread, sorry :)

[ QUOTE ]
A perfect example is when there is a big stack bullying the table near the bubble. Nash says the big stack should raise almost every hand and the small stacks should almost always fold. However, just being in this situation is -EV for the small stacks. It would be in a small stack's long-term interest to call more liberally and punish the raiser, in an attempt to get the big stack to stop his bullying. But you have to take a -EV move now in order to try and stop being in continually -EV situations in the future.

But that would be way too complicated to figure out with a computer...

Tysen

[/ QUOTE ]

I think the example is actually mixing in a different problem.

One part of the problem you describe boils down to flaws of ICM. ICM overestimates midstack-equities at the bubble, and underestimates bigstack equity. If we had access to a better EQ-estimation, midstacks would automatically call wider, because relative equities of folding/busting/doubling up would change.

With ICM we have lots of scenarios where players are expected to win/lose equity during the next orbit. This should never be the case with an accurate EQ model.

As I understand it, that should be seen as separate from our original problem: that the NE is usually a bad state for the caller, because he is actually in a position to "force" the pusher into a more favorable state. I think this is a problem with the NE altogether. Maybe I can think of some toy game to better demonstrate my thought... (hopefully I won't forget about the thread again :D)

plexiq · #8 11-21-2007, 10:01 AM

Ok, here is a toy game featuring a "spite-calling" situation:

Basic game is the same as in Math of Poker, pg 127.

*) Every player is dealt a hand in [0...1].
*) SB ("Pusher") can push or fold
*) If SB pushes, BB ("Caller") can call or fold.
*) If there is a showdown, the player with the higher hand has 2/3 pot equity.

We use stacks of 5BB (SB=0.5, BB=1).

So far, that's just "normal" heads-up, and there's no possible spite-calling. After all, we are still in a zero-sum game at the moment. The NE for this "base game" is: Pusher: 70%, Caller: 56%.
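For anyone who wants to check the base-game numbers, here is a small Python sketch that solves the two indifference conditions directly. It assumes threshold strategies (push the top `push` fraction of hands, call with the top `call` fraction) and that `call < push <= 1`, which holds for these stacks. The function and variable names are made up for illustration.

```python
def base_game_equilibrium(stack=5.0, sb=0.5, bb=1.0, edge=2.0 / 3.0):
    """Push/call frequencies for the chip-EV base game (threshold strategies)."""
    pot = 2.0 * stack
    sb_after_fold = stack - sb    # SB folds: keeps 4.5
    sb_after_steal = stack + bb   # SB pushes, BB folds: 6
    bb_after_fold = stack - bb    # BB folds to a push: keeps 4
    ahead = edge * pot            # chip equity of the higher hand when called
    behind = (1.0 - edge) * pot   # chip equity of the lower hand when called

    # Pusher's worst pushing hand is always behind when called, so it is
    # indifferent when: call * behind + (1 - call) * sb_after_steal = sb_after_fold
    call = (sb_after_steal - sb_after_fold) / (sb_after_steal - behind)

    # Caller's worst calling hand is ahead of a fraction x of the pushing range:
    # x * ahead + (1 - x) * behind = bb_after_fold, with x = (push - call) / push
    x = (bb_after_fold - behind) / (ahead - behind)
    push = call / (1.0 - x)
    return push, call


push, call = base_game_equilibrium()
print(f"Pusher: {push:.3f}, Caller: {call:.3f}")
```

The result agrees with the 70% / 56% quoted above. This only covers the chip-EV game; it does not attempt the non-linear payout version.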

Now let's add the possibility to "spite call":
We now change the game such that the players convert their stacks to money after the game, and each player's goal is to optimize their $EV. However, the conversion is non-linear: a stack is converted to money by payout(chips)=sqrt(chips). (Any strictly increasing function will do, as long as it grows "slower than linear". Sqrt is an arbitrary choice.)

This models to some degree the situation of an SNG, because doubling up in chips will now be worth less than double $.
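A quick Python check of what the sqrt conversion does, using the 5BB stacks and blinds from the toy game. Only the two no-showdown outcomes are computed here, since those follow directly from the rules; the variable names are illustrative.

```python
import math


def payout(chips):
    """Non-linear chips-to-money conversion from the toy game."""
    return math.sqrt(chips)


stack, sb, bb = 5.0, 0.5, 1.0

# Doubling up in chips is worth less than double the money.
double_up_ratio = payout(2 * stack) / payout(stack)

# Combined $ of SB and BB after each no-showdown outcome.
total_after_fold = payout(stack - sb) + payout(stack + sb)   # SB folds
total_after_steal = payout(stack + bb) + payout(stack - bb)  # SB pushes, BB folds

print(double_up_ratio, total_after_fold, total_after_steal)
```

The two totals differ, so the players' combined $ depends on the outcome and the modified game is no longer constant-sum. That is what leaves room for one player to hurt the other by more than he hurts himself.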

In this modified game, the NE would be:
Pusher: Top 100%, Caller: 8.6%.

We are only 5BB deep, and NE suggests that BB is only calling 8.6% against an ATC push. Alright so far :)

In the plot you can see that the Caller can deal huge "EV-damage" to the pusher, by sacrificing very little EV himself. I think that the NE is unsuitable in this situation, because the caller could clearly "force" the pusher into a more favorable state.

Paxinor · #9 11-21-2007, 10:41 AM

To simulate a sit n go properly, wouldn't it be suitable to create a game where the sum of $EV is always the same? I mean, this is the crucial point, because it needs to be a zero-sum game! And ICM is a zero-sum game too...

plexiq · #10 11-21-2007, 10:54 AM

What we want to simulate here is the SB-vs-BB "subgame", after n other players folded. ICM isn't zero-sum in this situation (if we only look at the involved players).
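To illustrate the point with numbers, here is a Python sketch that assumes the usual Malmuth-Harville form of ICM. The three-handed stacks and the 65/35 payout structure are made up for the example, and `icm_equities` is a hypothetical helper name. Player 0 has already folded; players 1 and 2 are SB and BB.

```python
def icm_equities(stacks, payouts):
    """ICM prize equity per player. The chance to take the next place is
    proportional to stack size among the players not yet placed."""
    equities = [0.0] * len(stacks)

    def assign(remaining, place, prob):
        if place == len(payouts):
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            p = prob * stacks[i] / total
            equities[i] += p * payouts[place]
            assign([j for j in remaining if j != i], place + 1, p)

    assign(list(range(len(stacks))), 0, 1.0)
    return equities


PAYOUTS = [0.65, 0.35]  # assumed prize structure, as fractions of the pool

after_fold = icm_equities([5.0, 4.5, 5.5], PAYOUTS)   # SB folds
after_steal = icm_equities([5.0, 6.0, 4.0], PAYOUTS)  # SB pushes, BB folds

blinds_after_fold = after_fold[1] + after_fold[2]
blinds_after_steal = after_steal[1] + after_steal[2]
print(sum(after_fold), sum(after_steal))
print(blinds_after_fold, blinds_after_steal)
```

Over all three players the equities always add up to the full prize pool, but the combined equity of SB and BB is different in the two outcomes, with the difference going to the player who folded. So the game is constant-sum for the whole table, but not for the two players in the subgame.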
