Fictitious play for multi-player games - Page 2

Paxinor · #11 11-21-2007, 12:16 PM

Ah ok, now I get it...

Well, this is why multiplayer games suck!

Basically the other guy is acting irrationally... he is punishing you (and himself) and rewarding the folders, so the NE actually still holds, because he is giving up an edge...

So I think this is pretty much a case of implicit collusion, because he has the ability to punish you...

You can deviate from the NE to gain the upper hand again, but then it turns into that "he thinks that I think" game...

The point, though, is that he cannot use it to gain more EV! He is giving up EV no matter what you do, but it's costing you more than it costs him. He basically has the power to move your EV to the other players.

I think this is an important lesson: you can end up in a position where you cannot win just because another player wants it that way.

Maybe one should calculate the case where you tighten up a bit and the other player maximally exploits you, and compare your EV to the case where he is spite-calling you.

My intuition says that your EV when you adjust and get maximally exploited is lower than when you get spite-called.

jukofyork · #12 11-21-2007, 12:27 PM

[ QUOTE ]
Ok, here is a toy game featuring a "spite-calling" situation:

Basic game is the same as in Math of Poker, pg 127.

*) Every player is dealt a hand in [0...1].
*) SB ("Pusher") can push or fold
*) If SB pushes, BB ("Caller") can call or fold.
*) If there is a showdown, the player with the higher hand has 2/3 pot equity.

We use stacks of 5BB (SB=0.5, BB=1).

So far, that's just "normal" heads-up, and there's no possible spite-calling. After all, we are still in a zero-sum game at the moment. The NE for this "base game" is: Pusher: 70%, Caller: 56%.

Now let's add the possibility to "spite call":
We now change the game such that the players convert their stacks to money after the game, and each player's goal is to optimize their $EV. However, the conversion is non-linear: a stack is converted to money by payout(chips)=sqrt(chips). (Any strictly increasing function will do, as long as it grows "slower than linear". Sqrt is an arbitrary choice.)

This models to some degree the situation of an SNG, because doubling up in chips will now be worth less than double $.

In this modified game, the NE would be:
Pusher: Top 100%, Caller: 8.6%.

We are only 5BB deep, and the NE suggests that BB is only calling 8.6% against an ATC push. Alright so far :)

In the plot you can see that the Caller can deal huge "EV-damage" to the pusher, by sacrificing very little EV himself. I think that the NE is unsuitable in this situation, because the caller could clearly "force" the pusher into a more favorable state.

[/ QUOTE ]
Great post! If you have the code at hand and it's easy to edit, could you try something:

Iterate through each call range from 0.0 to 1.0 in 0.01 increments and find the best-response strategy for the pusher, along with the EV for the caller vs this best-response strategy. Then plot the EVs for each call range and also find the optimal "spite calling equilibrium" range for the caller.

How do the optimal "spite calling equilibrium" calling/pushing ranges compare to the NE calling/pushing ranges? How do the EVs compare for both players?

Juk :)
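A minimal sketch of the sweep requested above, restating the model from the R code posted in this thread in compact form. The grid sizes, the names `pushGrid`, `callGrid` and `res`, and the reading of the "spite calling equilibrium" as the call range that maximizes the caller's EV against a best-responding pusher are assumptions of this sketch, not something the thread fixes.

```r
# Model functions restated from the code posted in this thread (sqrt payout, 5BB stacks).
chipTrans <- function(chips) sqrt(chips)
stacks <- 5

# Equity of a top-r1 range against a top-r2 range; 0.66/0.34 approximate 2/3 and 1/3.
winPct <- function(r1, r2) {
  (0.5 * min(r1, r2) + 0.66 * max(r2 - r1, 0) + 0.34 * max(r1 - r2, 0)) / max(r1, r2)
}

# chipTrans(0) is 0, so the "lose the all-in" term drops out of both EVs.
EVPusher <- function(p, c) {
  (1 - p) * chipTrans(stacks - 0.5) +
    p * (c * winPct(p, c) * chipTrans(2 * stacks) + (1 - c) * chipTrans(stacks + 1))
}
EVCaller <- function(p, c) {
  (1 - p) * chipTrans(stacks + 0.5) +
    p * (c * winPct(c, p) * chipTrans(2 * stacks) + (1 - c) * chipTrans(stacks - 1))
}

pushGrid <- (1:1000) / 1000        # candidate push ranges (assumed grid)
callGrid <- seq(0, 1, by = 0.01)   # committed call ranges in 0.01 steps

# For each committed call range: the pusher's best response and both EVs.
res <- t(sapply(callGrid, function(cr) {
  pev <- sapply(pushGrid, EVPusher, c = cr)
  bestPush <- pushGrid[which.max(pev)]
  c(call = cr, push = bestPush, pusherEV = max(pev), callerEV = EVCaller(bestPush, cr))
}))

# Call range that maximizes the caller's EV against a best-responding pusher.
best <- res[which.max(res[, "callerEV"]), ]
print(best)

# Plot both EV curves when run interactively (caller dashed, pusher solid).
if (interactive()) {
  plot(res[, "call"], res[, "callerEV"], type = "l", lty = 2,
       xlab = "call range", ylab = "$EV",
       ylim = range(res[, c("pusherEV", "callerEV")]))
  lines(res[, "call"], res[, "pusherEV"])
}
```

Each row of `res` holds one committed call range, so `res[26, ]` is the 25% call range discussed in the thread. Note that this treats the caller as committing first and the pusher as reacting, which is a different solution concept from the simultaneous-move NE.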

plexiq · #13 11-21-2007, 12:29 PM

@Paxinor
Well, the Caller can do more than just punish the loose Pusher. If we assume a non-robotic Pusher who will adjust to the loose calling, the Caller's equity will in fact increase significantly.

In the example, if BB is calling with 25% instead of 8.6%, he is giving up $EV of 0.013, while the Pusher loses almost 15 times as much.

If the Pusher correctly adjusts to this wider call range, he can only push 23.5%. In this new state, the Caller's $EV is 0.25 higher than in the original NE state.
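The figures above can be re-derived with a short script. This is a sketch that restates the thread's toy-game model compactly; the variable names (`neCall`, `spiteCall`, `adjPush` and so on) and the 0.001 search grid are assumptions, and the results depend on the 0.66 equity constant used in the posted code.

```r
# Toy-game model (sqrt payout, 5BB stacks), restated compactly.
chipTrans <- function(chips) sqrt(chips)
stacks <- 5
winPct <- function(r1, r2) {
  (0.5 * min(r1, r2) + 0.66 * max(r2 - r1, 0) + 0.34 * max(r1 - r2, 0)) / max(r1, r2)
}
EVPusher <- function(p, c) {
  (1 - p) * chipTrans(stacks - 0.5) +
    p * (c * winPct(p, c) * chipTrans(2 * stacks) + (1 - c) * chipTrans(stacks + 1))
}
EVCaller <- function(p, c) {
  (1 - p) * chipTrans(stacks + 0.5) +
    p * (c * winPct(c, p) * chipTrans(2 * stacks) + (1 - c) * chipTrans(stacks - 1))
}

neCall    <- 0.086   # NE call range quoted in the thread
spiteCall <- 0.25    # wider "spite" call range from the example

# Step 1: the pusher keeps pushing 100%; what does the wider call cost each side?
callerCost <- EVCaller(1, neCall) - EVCaller(1, spiteCall)
pusherLoss <- EVPusher(1, neCall) - EVPusher(1, spiteCall)

# Step 2: the pusher best-responds to the 25% call range (grid search).
pushGrid <- (1:1000) / 1000
adjPush  <- pushGrid[which.max(sapply(pushGrid, EVPusher, c = spiteCall))]

# Step 3: caller's EV in the adjusted state relative to the NE state.
callerGain <- EVCaller(adjPush, spiteCall) - EVCaller(1, neCall)

print(c(callerCost = callerCost, pusherLoss = pusherLoss,
        ratio = pusherLoss / callerCost,
        adjustedPush = adjPush, callerGain = callerGain))
```

Changing `spiteCall` shows how the cost ratio and the pusher's adjusted range move as the call range widens.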

@juk: Posted before seeing your reply, but the above kind of answers some of your question. I can share that code if you guys are interested, although it's pretty messy ;)

plexiq · #14 11-21-2007, 01:42 PM

Here's the code. Possibly quite buggy; I was coding this while trying to construct a toy game. (You need R to run this: www.r-project.org/ )

<font class="small">Code:</font><hr /><pre>
#our "equity transformation"
chipTrans<-function(chips=0){return(chips^0.5)}

#stack sizes
stacks = 5

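#winning percentage of range1 vs range2
#(0.66/0.34 approximate the 2/3 and 1/3 pot equity of the higher/lower hand;
# where the two ranges overlap both players hold the same hands, hence 0.5 there)
winPct<-function(r1=1,r2=1)
{
return((0.5*min(r1,r2)+0.66*max(r2-r1,0)+0.34*max(r1-r2,0))/max(r1,r2))
}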

#EV of pusher, p..pushrange, c..callrange
EVPusher <- function(p=1,c=1)
{
return (
(1-p)*chipTrans(stacks-0.5)+
p*(
c*(
winPct(p,c)*chipTrans(2*stacks) +
(1-winPct(p,c))*chipTrans(0)
)+
(1-c)*chipTrans(stacks+1)
)
)
}

#EV of caller, p..pushrange, c..callrange
EVCaller <- function(p=1,c=1)
{
return (
(1-p)*chipTrans(stacks+0.5)+
p*(
c*(
winPct(c,p)*chipTrans(2*stacks) +
(1-winPct(c,p))*chipTrans(0)
)+
(1-c)*chipTrans(stacks-1)
)
)
}

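#One iteration of fictitious play
#s = c(pushrange, callrange) is the current average strategy; weight is the
#step size used to mix the new best responses into that average
nextIter<-function(s=c(1,1), weight=1)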
{
bestcev=0
bestpev=0

bestp=-1
bestc=-1

pos = (1:1000)/1000
for(i in c(1:length(pos)))
{
thisc = EVCaller(s[1],pos[i])
thisp = EVPusher(pos[i],s[2])

if(thisc>bestcev){
bestcev=thisc
bestc=pos[i]
}

if(thisp>bestpev){
bestpev=thisp
bestp=pos[i]
}

}

sn = c(bestp,bestc)
return(weight*sn+(1-weight)*s)
}

#300 iters of fictitious play
runFictitiousPlay<-function()
{
# r[1]: push range, r[2]: call range (R vectors are 1-indexed)
r=c(1,0)
for(w in 1/(1:300))
{
r=nextIter(r,w)
print(r)
}
return(r)
}

#the plotting snippets below need the converged ranges stored in r first:
#r = runFictitiousPlay()
#cr = (1:1000)/1000

###plotting EQ vs variable callrange
#pev = apply(X=t(cr),FUN=EVPusher,p=r[1],MARGIN=2)
#cev = apply(X=t(cr),FUN=EVCaller,p=r[1],MARGIN=2)
#plot(cr,cev,ty='l', lty=2,ylim=c(2.2,3.5))
#lines(cr,pev)

###plotting EQ vs variable pushrange
#pev = apply(X=t(cr),FUN=EVPusher,c=r[2],MARGIN=2)
#cev = apply(X=t(cr),FUN=EVCaller,c=r[2],MARGIN=2)
#plot(cr,cev,ty='l', lty=2)
#lines(cr,pev)
</pre><hr />

trojanrabbit · #15 11-21-2007, 02:22 PM

[ QUOTE ]
If the Pusher correctly adjusts to this wider call range, he can only push 23.5%. In this new state, the Caller's $EV is 0.25 higher than in the original NE state.

[/ QUOTE ]
Ah, but this is deceptive. Most of the Caller's EV gain comes from the fact that he gets a walk 76.5% of the time. But those times when the Pusher does push, it's a different situation. Once the push is made, it's not optimal for the Caller to call so often. In fact, he should be calling LESS often than if the Pusher pushes 100%. It's like the prisoner's dilemma: both would be better off cooperating, but no matter how your opponent plays, you're better off defecting.
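A rough derivation of that point, using the 0.66 equity constant and the sqrt payout from the code posted earlier in the thread. It assumes the Caller's range c is no wider than the Pusher's range p, and the symbols e, V and c* are introduced here only for illustration.

```latex
% Caller's equity when calling the top c against a push of the top p (c <= p)
e(c,p) = \frac{0.5\,c + 0.66\,(p-c)}{p} = 0.66 - 0.16\,\frac{c}{p}

% Caller's $EV once the push has been made (busting pays sqrt(0) = 0)
V(c) = c\,e(c,p)\,\sqrt{10} + (1-c)\,\sqrt{4}

% First-order condition for the best-response call range
V'(c) = \sqrt{10}\left(0.66 - 0.32\,\frac{c}{p}\right) - 2 = 0
\quad\Longrightarrow\quad
\frac{c^{*}}{p} = \frac{0.66\sqrt{10} - 2}{0.32\sqrt{10}} \approx 0.086
```

Under these assumptions the best-response call range scales with the push range: about 8.6% against a 100% push, but only about 2% against a 23.5% push.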

Tysen

pzhon · #16 11-21-2007, 02:41 PM

In case I take up playing SNGs again, I may use a +EV screen name like, "SpiteCa11er." Then I probably won't do it.

I would use a weighted average, not an equal average, but I don't think it makes much of a difference since the algorithm is so fast and can compute hundreds of iterations.

Given a 50-50 population of spite callers and NE players, it is not clear that the spite callers would win on average. The spite callers will make spite calls against each other, too.

plexiq · #17 11-21-2007, 03:03 PM

@trojanrabbit
After the push is made, there is no value in calling looser (if we assume that looser calling will not affect the Pusher's future range) - right.

I understand the problem you describe, and it's perfectly correct of course. But I still think the NE is not a good solution for "practical use" here, as it's so vulnerable to deviations in the Caller's strategy, while varying the calling strategy a bit is basically EV-neutral.

Considering Spite-Caller vs NE population:
I think spite callers would only perform better if they also adjust their push range to be tighter. They give up some EV against the NE players, but perform vastly better against other spite callers. But I guess that has to be tested; maybe I'm missing something.
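One way this could be tested is a small payoff table between strategy types, sketched here. The three profiles and their labels (`NE`, `Spite`, `TightSpite`) are hypothetical, built from the ranges quoted in this thread (100%/8.6%, a 25% spite call, and the 23.5% push that best-responds to it), and each player is assumed to sit in each position half of the time.

```r
# Toy-game model (sqrt payout, 5BB stacks), restated compactly.
chipTrans <- function(chips) sqrt(chips)
stacks <- 5
winPct <- function(r1, r2) {
  (0.5 * min(r1, r2) + 0.66 * max(r2 - r1, 0) + 0.34 * max(r1 - r2, 0)) / max(r1, r2)
}
EVPusher <- function(p, c) {
  (1 - p) * chipTrans(stacks - 0.5) +
    p * (c * winPct(p, c) * chipTrans(2 * stacks) + (1 - c) * chipTrans(stacks + 1))
}
EVCaller <- function(p, c) {
  (1 - p) * chipTrans(stacks + 0.5) +
    p * (c * winPct(c, p) * chipTrans(2 * stacks) + (1 - c) * chipTrans(stacks - 1))
}

# Hypothetical strategy types: a push range and a call range each.
strategies <- list(
  NE         = c(push = 1.000, call = 0.086),
  Spite      = c(push = 1.000, call = 0.250),
  TightSpite = c(push = 0.235, call = 0.250)
)

# Average $EV of "me" against "opp", half the time as pusher, half as caller.
payoff <- function(me, opp) {
  0.5 * (EVPusher(me[["push"]], opp[["call"]]) +
         EVCaller(opp[["push"]], me[["call"]]))
}

# M[i, j]: payoff of row type i when facing column type j.
M <- sapply(strategies, function(opp) sapply(strategies, function(me) payoff(me, opp)))
print(round(M, 4))

# Average payoff of each type in a 50-50 population of NE players and spite callers.
mix <- c(NE = 0.5, Spite = 0.5, TightSpite = 0)
popEV <- drop(M %*% mix[colnames(M)])
print(round(popEV, 4))
```

Changing `mix` gives the average payoffs for other population shares. This only compares fixed strategies and does not model players adapting over time.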

[ QUOTE ]
I would use a weighted average, not an equal average, but I don't think it makes much of a difference since the algorithm is so fast and can compute hundreds of iterations.

[/ QUOTE ]

Yeah, I'm using a weighted average for the Nash Calculator page. But in this case I preferred to keep things as straightforward as possible :)
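To illustrate the difference between the two averaging schemes: the posted code mixes in each new best response with step size 1/n, which yields a plain mean of all best responses so far. The thread does not say which weighting the Nash Calculator uses, so the linear scheme below (step size 2/(n+1)) is only one assumed possibility, and the input sequence is made up.

```r
# Running average as used in fictitious play:
# newAverage = w * bestResponse + (1 - w) * oldAverage
runningAverage <- function(x, weightFn) {
  avg <- x[1]
  for (n in seq_along(x)[-1]) {
    w <- weightFn(n)
    avg <- w * x[n] + (1 - w) * avg
  }
  avg
}

equalWeight  <- function(n) 1 / n        # plain mean, like 1/(1:300) in the posted code
linearWeight <- function(n) 2 / (n + 1)  # assumed scheme: iteration n counts n times as much as iteration 1

# Hypothetical sequence of best-response ranges from successive iterations
x <- c(1.0, 0.2, 0.9, 0.3, 0.8, 0.4)

eqAvg  <- runningAverage(x, equalWeight)
linAvg <- runningAverage(x, linearWeight)

# The running updates reproduce the closed-form averages.
print(c(equal = eqAvg, mean = mean(x)))
print(c(linear = linAvg, weighted = weighted.mean(x, seq_along(x))))
```

The linear scheme discounts early iterations, which were best responses to poor strategies, so it tends to forget the starting point faster.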

Paxinor · #18 11-22-2007, 05:12 AM

@plexiq: True, if the pusher adjusted to his call range then his EV would be higher, but:

Does the pusher actually want to adjust? In game theory you usually assume that strategies are common knowledge and are therefore always exploited, which is clearly not true in poker.

But assume that as soon as you tighten up your range, he is basically gonna tighten up his calling range too. So you give him a walk more often and he also adjusts to your new range.

My question is: might this be a better option for the pusher? If so, he has a motivation to deviate.

Otherwise he is gonna be punished but can basically do nothing about it.

(this is strict game theory thinking)

Now, if you go to practice, where strategies are actually hidden, it might actually be good to deviate, because the other player cannot adjust perfectly to your new strategy...

But I just wonder about it from a strictly theoretical standpoint, and I really doubt that he should deviate if the caller knows his strategy...

plexiq · #19 11-22-2007, 05:31 AM

I guess the challenge here is:
Can we model the "strategy-negotiation process" in such a way that the Caller eventually "realizes" that it's profitable for him to stay at a wider call range, given that the Pusher is adapting his ranges?

Even when the Pusher is now pushing tighter and switching to a tighter call range would be immediately profitable for the Caller, the idea is simply to "keep" the Pusher from loosening up again, because tightening up will eventually lead to a worse state for the Caller.

jukofyork · #20 11-22-2007, 11:48 PM

[ QUOTE ]
I guess the challenge here is:
Can we model the "strategy-negotiation process" in such a way that the Caller eventually "realizes" that it's profitable for him to stay at a wider call range, given that the Pusher is adapting his ranges?

Even when the Pusher is now pushing tighter and switching to a tighter call range would be immediately profitable for the Caller, the idea is simply to "keep" the Pusher from loosening up again, because tightening up will eventually lead to a worse state for the Caller.

[/ QUOTE ]
Ah, I'd forgotten that the wider calling, leading to tighter pushing, would end up with the caller being able to exploit by calling thinner than the original NE (I was thinking that it would just be like NE+SpiteCall for some reason). So sadly this means that there will be no "spite-call" equilibrium possible.

Juk :)