My attempt at sabermetrics research: Analyzing the sac bunt.

kyleb · #1 08-01-2007, 12:00 AM

http://baseballdelusions.com/blog/2007/0...es-basic-model/

I wrote a post on my blog about sacrifice bunts and their expected value in amateur leagues. I have long thought that if there were enough botched plays on sac bunts, sacrificing with runners on 1st/2nd with 0 outs would be the right play.

Well, I finally got off my ass and did the math, which I think is right. It has a lot of assumptions in the formula, but I think it's a good start.

(if you want a formatted version of the post, check out my blog)

------------------------------------------------

Anyone who follows Baseball Prospectus or other sabermetrically inspired baseball websites can tell you that the sacrifice bunt is usually the wrong play in games that are not close and not in the later innings. However, these arguments rest on the assumption that the pitcher, catcher, and corner infielders are agile and skilled enough to convert the sacrifice bunt into a routine out. What if amateur players mishandle a bunt more frequently than their MLB counterparts? What I’d like to do is see how much higher the error rate would have to be on bunted balls to make sacrifice bunting the right play.

For the purposes of this discussion, I will only be focusing on sacrificing with runners on 1st/2nd with 0 outs.

First, let’s figure out what the values of various gamestates are. We can see the amount of “expected runs” scored year-by-year, but we’ll use the years 1999-2002 as found here on TangoTiger.

Runners on 1st/2nd with 0 outs is worth: 1.573
Runners on 2nd/3rd with 1 out is worth: 1.467

If the sacrifice is executed 100% of the time with no errors, the loss of expected runs can be easily calculated like this:

Ex2nd_3rd_1 - Ex1st_2nd_0 = Value
1.467 - 1.573 = -0.106

Therefore, if the fielders can always turn the sacrifice bunt into an out at first, this play has a value of -0.106 runs.

However, what happens when the throw is botched?

Runners on 2nd/3rd with 0 outs is worth: 2.052 + 1 = 3.052

This is the most likely scenario when a sacrifice bunt is thrown away - the runners advance to 2nd/3rd on a bunt, the throw goes wild, the runner going to third scores, the runner going to second goes to third, and the guy who bunted goes to second.

Also, for the purposes of this discussion, we’re going to assume that the manager will always call for a sacrifice bunt with runners on 1st/2nd with 0 outs and not worry about other types of batted balls. Other assumptions include that the error is always a throwing error that results in 2nd/3rd + a run with 0 outs, that the sacrifice is never recorded as a base hit, and that the sacrifice never leads to a double play. That’ll be for another article - this is just the groundwork.

Let’s add in some errors, shall we? Our Error Rate modifier formula looks like this:

ExpTotRuns = Ex2nd_3rd_1(SuccessRate) - Ex1st_2nd_0 + [(ErrorRate)(Ex2nd_3rd_0 + 1)]

ErrorRate is the % of times an error is made on the play. SuccessRate is the % of times the play is successfully executed.

Expected Total Runs equals Expected runs with runners on 2nd/3rd with 1 out (this is multiplied by the success rate) minus Expected runs with runners on 1st/2nd with 0 outs, plus Error Rate times the sum of Expected runs scored with runners on 2nd/3rd and 0 outs plus 1 run.

Make sense? Try reading it a few more times before commenting for clarification.

Here are some sample calculations:

Error rate of 0%: 1.467(1.0) - 1.573 + [(0)(2.052 + 1)] = -0.106 runs (that third variable is 0)
Error rate of 1%: 1.467(0.99) - 1.573 + [(0.01)(2.052 + 1)] = -0.09015 runs
Error rate of 5%: 1.467(0.95) - 1.573 + [(0.05)(2.052 + 1)] = -0.02675 runs
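
For anyone who wants to check these by machine, here is a minimal Python sketch of the formula above. The function and constant names are made up for illustration, and it carries the same assumptions as the post: every bunt is either a clean sacrifice or a throwing error that scores one run and leaves runners on 2nd/3rd with 0 outs.

```python
# Run-expectancy values quoted in the post (1999-2002 MLB table).
RE_1ST_2ND_0_OUT = 1.573
RE_2ND_3RD_1_OUT = 1.467
RE_2ND_3RD_0_OUT = 2.052


def bunt_value(error_rate):
    """Expected runs gained (or lost) by bunting, per the post's formula.

    Assumes SuccessRate = 1 - ErrorRate, i.e. the only two outcomes are
    a clean sacrifice or an error worth 2nd/3rd, 0 outs, plus one run.
    """
    success_rate = 1.0 - error_rate
    return (RE_2ND_3RD_1_OUT * success_rate
            - RE_1ST_2ND_0_OUT
            + error_rate * (RE_2ND_3RD_0_OUT + 1.0))


# Print the value of the bunt at a few error rates.
for rate in (0.0, 0.01, 0.05, 0.10):
    print(f"error rate {rate:.0%}: {bunt_value(rate):+.5f} runs")
```

Because the formula is linear in the error rate, plugging in any other rate is just a matter of changing the tuple in the loop.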

Ah-hah, getting closer! Of course, we can simply change the formula around to solve for the end result we want…

Break Even Point (careful, 9th grade algebra ahead!):
ExpTotRuns = Ex2nd_3rd_1(SuccessRate) - Ex1st_2nd_0 + [(ErrorRate)(Ex2nd_3rd_0 + 1)]
0 runs = 1.467(1.0-x) - 1.573 + [(x)(2.052 + 1)]
0 = 1.467 - 1.467x - 1.573 + 2.052x + 1x
0 = 1.585x - 0.106
0.106 = 1.585x
x = 0.06688

It seems as though the breakeven error rate is about 6.69%.
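
The same algebra can be written as a small Python sketch. The function name and argument names are hypothetical; it simply rearranges the formula above for the error rate x where the value of the bunt is zero.

```python
def break_even_error_rate(re_start, re_sacrifice, re_error_plus_run):
    """Error rate x at which bunting neither gains nor loses runs.

    Solves  re_sacrifice * (1 - x) - re_start + x * re_error_plus_run = 0
    for x, which gives
        x = (re_start - re_sacrifice) / (re_error_plus_run - re_sacrifice)
    """
    return (re_start - re_sacrifice) / (re_error_plus_run - re_sacrifice)


# Values from the post: 1st/2nd 0 out, 2nd/3rd 1 out, 2nd/3rd 0 out + 1 run.
x = break_even_error_rate(1.573, 1.467, 2.052 + 1.0)
print(f"break-even error rate: {x:.2%}")
```

Keeping the three gamestate values as arguments makes it easy to rerun the break-even calculation with a different run-expectancy table.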

Do I think that amateur players in the leagues I play in commit overthrowing errors on sacrifice bunts at least 7% of the time? Yes, I do.

Conclusions:

First of all, I think my math is right, though I am not 100% sure on using ErrorRate and SuccessRate. I ran it by a few people and they all thought it was necessary, but it could be counting the error rate twice unnecessarily.

The work above has a lot of assumptions which make the formula simple. We assume that the error is always an overthrow that causes a specific gamestate, we assume that the sacrifice bunt is never a hit, we assume that the runner is smart enough to go home on small overthrows, we assume that the hitter is always going to sacrifice, and that the hitter never fails to make contact on the bunt. Since there are all these assumptions, the model’s not perfect (obviously). However, it’s a step in the right direction for this type of research.

Vyse · #2 08-01-2007, 12:01 AM

I hate math. But I like sabermetrics.

[censored].

Wyrm2 · #3 08-01-2007, 12:17 AM

Stupid question, but is the run expectancy table from amateur leagues? I don't think it's safe to assume that there is the same expectation as in the majors (although it doesn't change the concept, just the numbers)

kyleb · #4 08-01-2007, 12:21 AM

No, it's from 1999-2002 MLB. That's another assumption used in the process. That kind of data is not available in amateur leagues, so I have to use major league data. I used 1999-2002 because it was readily available and even if it changed some for amateur leagues, I would imagine that most of the values would change in a similar fashion (i.e. all of the values are depressed, not just some of them).
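
One rough way to probe that assumption is to scale all three run-expectancy values by the same factor and recompute the break-even error rate. This is only a sketch: the scale factors below are hypothetical, not measured amateur-league figures, and the run that scores on the error is kept at exactly 1 rather than scaled.

```python
# Values quoted in the thread (1999-2002 MLB run expectancy).
RE_1ST_2ND_0_OUT = 1.573
RE_2ND_3RD_1_OUT = 1.467
RE_2ND_3RD_0_OUT = 2.052


def scaled_break_even(scale):
    """Break-even error rate when all three values are multiplied by `scale`.

    `scale` is a hypothetical knob, not a measured amateur-league figure.
    The run that scores on the error stays at exactly 1.
    """
    start = scale * RE_1ST_2ND_0_OUT
    sacrifice = scale * RE_2ND_3RD_1_OUT
    error = scale * RE_2ND_3RD_0_OUT + 1.0
    return (start - sacrifice) / (error - sacrifice)


for scale in (0.8, 1.0, 1.2):
    print(f"scale {scale:.1f}: break-even {scaled_break_even(scale):.2%}")
```

Under this setup a uniform shift in the table still moves the break-even point somewhat, because the one run from the error does not scale with the other values.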

zgall1 · #5 08-01-2007, 01:12 AM

I like the start and I'm interested to see if you can remove some of the simplifying assumptions without the model becoming incredibly complicated.

JaredL · #6 08-01-2007, 02:35 AM

[ QUOTE ]
No, it's from 1999-2002 MLB. That's another assumption used in the process. That kind of data is not available in amateur leagues, so I have to use major league data. I used 1999-2002 because it was readily available and even if it changed some for amateur leagues, I would imagine that most of the values would change in a similar fashion (i.e. all of the values are depressed, not just some of them).

[/ QUOTE ]

That seems potentially quite problematic.

Say you have a runner on second. I would think that in amateur baseball that runner will score on a base hit to the outfield a much higher percentage of the time than in the big leagues due to the relative suckiness of the outfielders being greater than the relative suckiness of the baserunner. Similarly, a lot more sac flies will score a guy from third with 1 out.

Similarly, there are probably going to be fewer double plays in the amateur game. Not sure on stolen bases, I would guess that's about the same or more if anything. This will raise the ER value of runner on first - no out.

So basically all of them go up. The problem is that the effect could be stronger in the runner on second - 1 out than the runner on first no out which would tip it so that bunting looks worse in your figures than in real life. Similarly, it could be the other way around and your results could be biased in favor of bunting.

Also, in MLB this isn't much of an issue because for the most part teams don't bunt but isn't there an endogeneity problem? What I mean is that teams often bunt when it's runner on first (or second) nobody out. The results of bunting are then built into the equation which determines the EV of bunting.

So say bunting has a slightly negative effect and everybody does it. Say for example, the true ER of runner on first no outs without bunting is 1.2 and if you always bunted in that spot then the average ER of runner on first no outs is 1.0. Say that runner on second 1 out is 1.1. In this situation if you have a data set where everybody bunts you will say that the team that bunts gains .1 ER by sacrificing because they moved from 1.0 to 1.1.

So the point is that to do this analysis perfectly, you would need to compare the ER of man on first (2nd) - no outs if the team doesn't bunt to man on second (3rd) - one out (or man on first - no outs always going to bunt with next man would be even better). You can't really get this info as bunting is always legal and considering only situations where the cleanup hitter is up or there are two strikes on the batter will bias the sample. Maybe use teams that basically never bunt as the data points? This would be problematic as well.

JaredL · #7 08-01-2007, 02:46 AM

BTW in case it wasn't clear it could go the other way. Say bunting is good and all teams do it. If you had an ER of 1.0 in the never bunt situation, 1.2 if you always bunt, 1.1 with a man on second - 1 out then you will conclude that bunting is bad because you lose .1.

This type of thing could easily happen if bunting itself leads to things like errors as you describe and that outweighs strikeouts and double plays.
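
Both toy examples can be put into a short Python sketch. The names are hypothetical, and `bunt_share` (the fraction of the time teams bunt in the starting state) is an added parameter; the examples above correspond to `bunt_share = 1.0`, where everybody bunts.

```python
def observed_er(er_no_bunt, er_bunt, bunt_share):
    """ER a run-expectancy table would show for the starting state.

    The table averages over whatever teams actually did, so it is a mix
    of the bunt and no-bunt strategies weighted by how often each is used.
    """
    return bunt_share * er_bunt + (1.0 - bunt_share) * er_no_bunt


def naive_and_true_gain(er_no_bunt, er_bunt, er_after_sac, bunt_share):
    """Return (gain measured against the table, gain against never bunting)."""
    naive = er_after_sac - observed_er(er_no_bunt, er_bunt, bunt_share)
    true = er_after_sac - er_no_bunt
    return naive, true


# Bunting is bad but everybody does it: the table makes it look good.
print(naive_and_true_gain(1.2, 1.0, 1.1, bunt_share=1.0))

# Bunting is good and everybody does it: the table makes it look bad.
print(naive_and_true_gain(1.0, 1.2, 1.1, bunt_share=1.0))
```

Lowering `bunt_share` toward 0 pulls the table value toward the never-bunt ER, which is why the issue matters less when teams rarely bunt in that spot.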

kyleb · #8 08-01-2007, 02:49 AM

[ QUOTE ]
So basically all of them go up. The problem is that the effect could be stronger in the runner on second - 1 out than the runner on first no out which would tip it so that bunting looks worse in your figures than in real life. Similarly, it could be the other way around and your results could be biased in favor of bunting.

[/ QUOTE ]

Entirely possible, but I think the values will be strongly correlated for the most part, i.e. they all go up, like you said.

As for the rest of this:

[ QUOTE ]
Also, in MLB this isn't much of an issue because for the most part teams don't bunt but isn't there an endogeneity problem? What I mean is that teams often bunt when it's runner on first (or second) nobody out. The results of bunting are then built into the equation which determines the EV of bunting.

So say bunting has a slightly negative effect and everybody does it. Say for example, the true ER of runner on first no outs without bunting is 1.2 and if you always bunted in that spot then the average ER of runner on first no outs is 1.0. Say that runner on second 1 out is 1.1. In this situation if you have a data set where everybody bunts you will say that the team that bunts gains .1 ER by sacrificing because they moved from 1.0 to 1.1.

So the point is that to do this analysis perfectly, you would need to compare the ER of man on first (2nd) - no outs if the team doesn't bunt to man on second (3rd) - one out (or man on first - no outs always going to bunt with next man would be even better). You can't really get this info as bunting is always legal and considering only situations where the cleanup hitter is up or there are two strikes on the batter will bias the sample. Maybe use teams that basically never bunt as the data points? This would be problematic as well.

[/ QUOTE ]

I actually understand none of it. I've read it several times over and it makes no sense to me. Can you clarify?

If you're trying to say the dataset of the TangoTiger RunEx chart somehow already factors in bunts, that's not really relevant. How they reach those gamestates is not important. Again, I'm not sure if that's what you're saying, because I really don't understand anything you said.

JaredL · #9 08-01-2007, 03:27 AM

[ QUOTE ]

If you're trying to say the dataset of the TangoTiger RunEx chart somehow already factors in bunts, that's not really relevant.

[/ QUOTE ]

Sorry for not explaining it clearly. I'm going to bed soon as I'm moving tomorrow but it potentially matters a lot. Hopefully I can get my point across.

The best way to explain my point is (hopefully) to consider the question. Actually, I'll go to the simpler question and assume we're not thinking about errors and so forth. What you are asking is "what effect will having this guy make a perfect sacrifice or swing away have on my team's expected runs this inning?" The way you do that is to compare the number of expected runs a team has from having a dude on first and no outs with having a dude on second with one out.

The problem here is that a bunch of your guy on first no out data consists of times when the batter made a perfect sacrifice bunt. Therefore, it is not representative of the don't go for the sacrifice strategy. It is, in fact, representative of the "do what the average guy does in this spot" strategy.

Look at the formula. You use
ER runner on first no out - ER runner on 2nd 1 out = ER lost to sacrificing. However, you, me and everyone who's ever read anything about bunting in MLB notes that bunting is -ER. Since teams sometimes bunt "ER runner on first no out" is going to be lower than it would be if teams never bunted. Since we want to be comparing what happens if we don't bunt to what happens if we make a perfect sacrifice this difference will be off from what we actually want. In this case it will be too small.

kyleb · #10 08-01-2007, 03:55 AM

[ QUOTE ]
ER runner on first no out - ER runner on 2nd 1 out = ER lost to sacrificing.

[/ QUOTE ]

This isn't the formula I use, fwiw. It's ExpRuns 2nd/3rd 1 out - ExpRuns 1st/2nd 0 outs. A lot of managers will bunt in this situation.

I think I'm seeing your point, though - that ExpRuns with a runner on 1st/2nd with 0 outs embodies the sacrifice bunt already based on the chart, so comparing the two situations is not analogous. That being said, I disagree with what you're saying - we're just comparing two gamestates and I really don't see how that affects my analysis at all (or at least in any meaningful sense).