It’s Time to Stop Using BABIP

I originally wrote this on Amazin’ Avenue, an analytics-friendly (to say the least) Mets blog and community. It was well received, so I am submitting it for cross-posting here.

* * *

A week or so ago, the Mets’ award-winning television team (well, the Gary and Ron parts) started talking sabermetrics, specifically BABIP. They tore it a new one, and for the most part, it’s because they didn’t understand what BABIP meant, or did, or… whatever. It doesn’t matter.

What matters is that they talked about BABIP.  Which is horrible, because they’re going to botch it 100% of the time.  And that’s our fault, not theirs.  It’s time to stop using it.

* * *

By itself, batting average on balls in play means nothing. It tells us how often a player gets a hit in the at-bats where he doesn’t homer or strike out, which in and of itself is worthless. We know better. Gary and Ron know better. BABIP doesn’t differentiate between line-outs and pop-outs. It treats a double in the gap the same as a bloop single. Gary and Ron know it, and they laugh at our geekiness. We don’t care how hard a guy hits a ball. We’re nerds, and the numbers don’t tell us that. Literally:

Gary: Conversely, if a pitcher has a particularly low batting average on balls in play, they like to tell you it’s going to rise eventually. Well, to me that doesn’t make any sense. Certain guys hit the ball harder than other guys hit it. Certain pitchers induce more groundballs or more weakly hit balls than others. That’s part of what you’re trying to do. Am I totally off base with that?

Ron: No I totally agree with you, I think that for the average hitter, to have a high average putting balls in play, it’s probably because they do have some lucky hits. But certain hitters, like [David] Wright, hit the ball hard almost all the time.
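For reference, the BABIP everyone is arguing about is usually computed as (H − HR) / (AB − K − HR + SF), the form FanGraphs and most stat sites publish. Here is a small Python sketch of that formula; the stat line in the usage example is made up, not any real player’s.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Homers and strikeouts come out of both sides because the ball never
    stays in the field of play; sacrifice flies are added back to the
    denominator because they are balls in play that were turned into outs.
    Note that a bloop single and a gap double count exactly the same.
    """
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


if __name__ == "__main__":
    # Hypothetical line: 500 AB, 150 H, 25 HR, 100 K, 5 SF -> 125 / 380
    print(f"{babip(150, 25, 500, 100, 5):.3f}")  # 0.329
```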

Of course, we know it too.  We measure line drive rates and stuff like that.  We have xBABIP!   Yeah, go us!  And no, we don’t differentiate between the bloop single and the gap double — well, not independent of line drive percentage, etc.  But that’s the whole point.  We’re trying to measure how lucky the batter has been.  We want to know what the batter’s expected batting average is.

So let’s just say that.  Stop with the BABIP.  Stop with the esoteric number which only means something in relation to another number (BA) and even then really needs to incorporate other numbers (e.g. LD%) to truly say what we want to say.   Let’s do this instead.

1) Call it “Expected Batting Average.”

Obviously, BABIP isn’t a player’s expected batting average. BABIP is a tool we use to try and figure out a player’s xBA (ooh! I acronymified it!), but that’s OK. Let’s figure out the xBA and call it xBA.

2) Explain it in words.

Start with this:

Know what the difference between hitting .250 and .300 is? It’s 25 hits. 25 hits in 500 at bats is 50 points, okay? There’s 6 months in a season, that’s about 25 weeks. That means if you get just one extra flare a week – just one – a gorp… you get a ground ball, you get a ground ball with eyes… you get a dying quail, just one more dying quail a week… and you’re in Yankee Stadium.

That makes a ton of sense.  It has to.  It’s from Bull Durham.

But you know what?  Dying quails are fluky.  They’re luck.  Ground balls with eyes, same thing.  Flares, gorps, whatever.  Luck. That’s what Crash is saying there. The difference between a .250 hitter and a .300 hitter is a little bit of luck each week.
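Crash’s arithmetic checks out, using the quote’s own assumptions (500 at-bats and roughly 25 weeks in a season). A quick sketch in Python:

```python
AT_BATS = 500   # at-bat total assumed in the quote
WEEKS = 25      # "about 25 weeks" in a six-month season, per the quote


def batting_average(hits: int, at_bats: int = AT_BATS) -> float:
    """Plain batting average: hits divided by at-bats."""
    return hits / at_bats


extra_hits = 25
# .300 hitter (150 hits) minus .250 hitter (125 hits) over the same 500 AB
gain = batting_average(150) - batting_average(125)
per_week = extra_hits / WEEKS  # extra dying quails needed each week

print(f"gain: {gain:.3f}, extra hits per week: {per_week:.1f}")
# gain: 0.050, extra hits per week: 1.0
```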

Guys who hit the ball hard, they don’t need as much luck.  Turn those grounders into line drives and those dying quails into warning track doubles and they’re hits — to hell with luck.  Luck is for guys like Alex Cora and Gary Matthews Jr. and that guy Rick Evans or something.

We say, screw that. Let’s look at each at bat. If a guy hits a frozen rope that’s caught, we know that’s not his fault. Over time, that’ll even out, and he’ll get more hits. If a guy strikes out, that’s an out every time. Same with a pop up. That won’t even out. Homers? Always a hit. Grounders with eyes? Well, that’s usually an out, and that’ll even out over time too. We look at every single at bat and ask if the guy hit the ball hard enough to “make his own luck.” That’s xBA.

(And you know what? At the end of the day, that’s what BABIP turns into, too. Except that BABIP sucks, because it doesn’t actually start there, either in its name or in its equation.)
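To make the at-bat-by-at-bat idea concrete, here is a minimal Python sketch. It is not an existing xBA model: the outcome categories and every hit probability below are illustrative assumptions, except that strikeouts and pop-ups are treated as sure outs and homers as sure hits, as described above. A real model would estimate these rates from data. The point is that a caught line drive still earns the hitter a line drive’s worth of expected hits.

```python
# Hypothetical per-outcome hit probabilities. Strikeouts and pop-ups are
# sure outs and homers are sure hits, as in the text; the other rates are
# placeholders for illustration, not measured league values.
HIT_PROB = {
    "strikeout": 0.0,
    "popup": 0.0,
    "home_run": 1.0,
    "line_drive": 0.70,
    "ground_ball": 0.24,
    "fly_ball": 0.15,
}


def expected_batting_average(at_bats: list[str]) -> float:
    """Credit each at-bat with its chance of being a hit, based only on how
    the ball was hit (not on whether it actually fell in), then average."""
    if not at_bats:
        raise ValueError("need at least one at-bat")
    return sum(HIT_PROB[outcome] for outcome in at_bats) / len(at_bats)


if __name__ == "__main__":
    # Ten made-up at-bats: the line drives count the same whether caught or not.
    sample = (["line_drive"] * 3 + ["ground_ball"] * 3
              + ["strikeout"] * 2 + ["popup", "home_run"])
    print(f"xBA: {expected_batting_average(sample):.3f}")  # xBA: 0.382
```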

3) Drop the arrogance of specificity.  Use ranges when possible.

We’re measuring luck.   Luck isn’t exact.   So we’ll never be right on the money.  You’ll never be able to find a season where a significant number of players have an xBA equal to their actual batting average.  That makes us look stupid, when in fact, we’re just being arrogant — by being so exact.

We should use ranges.  xBA should be the 50% confidence interval, not the midpoint thereof.  More made up numbers: If a guy’s xBA is .285, it’s probably better expressed by saying that it’s between .279 and .291, or whatever.  It makes that .290 BA not seem “lucky” (it really isn’t) but tells us that a .274 is really unlucky.   In other words, it does the job — without the excruciatingly nerdy exactitude we are (wrongly) associated with.
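One simple way to turn a point xBA into a 50% range is to treat each at-bat as an independent coin flip that comes up “hit” with probability xBA, and take the middle 50% of the resulting spread using a normal approximation. That model is an assumption for illustration. It only captures the luck in the at-bats themselves, not any error in the xBA estimate, so real ranges would be at least this wide.

```python
import math
from statistics import NormalDist


def xba_interval(xba: float, at_bats: int, coverage: float = 0.50) -> tuple[float, float]:
    """Central interval for a batting average, assuming independent at-bats
    that each become a hit with probability xba (a simplifying assumption)."""
    se = math.sqrt(xba * (1 - xba) / at_bats)       # binomial standard error
    z = NormalDist().inv_cdf(0.5 + coverage / 2)    # about 0.674 for a 50% range
    return xba - z * se, xba + z * se


if __name__ == "__main__":
    lo, hi = xba_interval(0.285, 500)
    print(f"{lo:.3f} to {hi:.3f}")
```

Under this model, a .285 xBA over 500 at-bats gives a 50% range of roughly .271 to .299.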

It’s our job to communicate this stuff. It’s not their job to get smarter (they’re not dumb) or to figure it out themselves (they’re busy), nor is the problem that they don’t respect us (true, but fixable). The problem is semantic, not logical, and semantic problems can, and indeed must, be fixed by revising our language. It’s time to stop using BABIP.

Dan writes a daily email newsletter, “Now I Know,” which shares something interesting to learn each day.





29 Comments
Rudy
13 years ago

About time. This chicken and egg approach to analyzing luck is useless imo. Looking at contact % (inside and especially outside the strike zone) over a period of years is a better way imo. Baseball is an inexact science. We can’t measure everything.

Schu
13 years ago

I was listening to that broadcast and it made me facepalm too 😛

Aaron Murray
13 years ago

The idea of ranges revolutionized the study of poker, and I can see how it could really help iron out a lot of the stubborn conflicts in baseball analysis as well. I think we all know that when a pitcher has an ERA of 4.00 and an xFIP of 3.90, that doesn’t really mean he’s getting lucky, so in that situation we lose credibility that we need when we talk about a pitcher like Dan Haren this year. Even traditionalists are willing to agree that luck plays a part in baseball, but we haven’t been doing a good job of communicating what that luck factor actually could mean. What luck COULD mean is always a range, and that’s the best way to discuss it. If I were discussing Dan Haren with a die-hard and said that Haren should have an ERA this year between 3.2 and 3.5 (or whatever it is), I’d sound a whole lot more reasonable, and be harder to disagree with, than if I told the same person that Dan Haren SHOULD have an ERA of 3.32 this year.

We want to give these people every opportunity to say, “OK, explain how you get those numbers,” instead of “No.” And the fact is that it’s a lot easier to say “no” to a single, specific, inflexible number than it is to a range of values.

DonCoburleone
13 years ago

Great point about including a range when expressing BABIP or xFIP scenarios. BABIP specifically, which can vary wildly over the course of just one season, shouldn’t be cited as a huge reason to trade a guy (when it comes to fantasy baseball). David Wright is a great example: last year everyone was screaming to trade him cuz his BABIP was unsustainable. But in the end, he finished with a .394 BABIP. Guys CAN and HAVE sustained crazy high BABIPs over the course of a full season in the past, so why do we assume it can’t happen in the future?

Awesome article. This is what the community forum should be about, IMO: challenging the analysis and “en vogue” stats that the regular writers on this site subscribe to. In the end, it’s how advanced statistical analysis... advances. It’s the reason why nobody uses Runs Created or VORP anymore!

lester bangs
13 years ago

Cohen and Darling are two of the brightest and most open-minded announcers around. They might embrace a good case eventually. Don’t dismiss them out of hand. We’re not talking about Joe Morgan here.

lester bangs
13 years ago

I’m also eagerly awaiting the day when xBABIP is a staple on sites like this (if I missed it coming, my apologies).

MV
13 years ago

Don’t know why you wrote only about hitters’ BABIP and didn’t even mention the dumbest part of Gary’s comment, the one about pitchers’ BABIP: “Conversely, if a pitcher has a particularly low batting average on balls in play, they like to tell you it’s going to rise eventually. Well, to me that doesn’t make any sense.” and “Certain pitchers induce more groundballs or more weakly hit balls than others.” (?!)

Big Oil
13 years ago

THIS. If FanGraphs Overlord Appelman can place this as a featured community research article, it would certainly add to the discourse. I’m all for metrics that incorporate xBABIP as a means of determining xBA/xOBP/xSLG. Currently I do it with the xBABIP calculator and then longhand to see how a particular player’s line changes.

Matt
13 years ago

The language and semantics of stats is something I’m really glad you brought up. Nothing drives me crazier than everything being chalked up to luck when what people really mean is that something is unsustainable. There is a big difference between the two.

When a guy goes 3 weeks in a row with a BABIP of .430, very seldom does luck play a large part in that. Think about it this way: put me up to bat and I guarantee you that you could give me a million ABs and I couldn’t string together 3 weeks of a .430 BABIP. If it were really just a matter of luck, I should be able to. The player in actuality is probably “seeing the ball well” those 3 weeks and has a good, mechanically sound swing going that he’s repeating. Now, can he maintain that over a full season? No, but unless he’s collected 15 seeing-eye singles and 12 flares that drop in no man’s land, he’s earning those hits.

Just like pitchers’ mechanics can change from game to game and influence their ability to hit or miss the strike zone, hitters’ swing mechanics can change as well and influence their ability to make solid contact.

ARF
13 years ago

I think you are missing the point, DonCoburleone.

You are basically saying that you should bank on luck because sometimes people get lucky for a whole season... David Wright can sustain a high BABIP for a whole season, but if his “true” BABIP is under .394 (which it obviously is), then we want to trade him, because most of the time he will regress.

Sanderson13
13 years ago

Baseball HQ uses expected batting average (xBA).

Isn’t measuring current BABIP against a player’s career average useful in determining if a player is growing in skills or just getting fluky?

Bobby Mueller (member)
13 years ago

@DonCoburleone:

Your David Wright example is flawed. Last season, Wright had a .426 BABIP in the first half and a .344 BABIP in the second half. His career BABIP is .349. So, he didn’t maintain his high BABIP for the whole season. His extremely high first-half BABIP came back down to his career level for the second half.

Regarding fantasy baseball, if you had traded David Wright at the All-Star break, when he was hitting .324/.410/.462 (.426 BABIP), you would have avoided his .279/.358/.423 (.344 BABIP) second half.

Neil
13 years ago

The issue I had with the broadcast was that they put words in “our” mouths.

They felt that Wright hits the ball really hard, so he should have a high BABIP, which I essentially agree with. However, what we preach is regression to the mean and comparing a player to himself (and not to a league BABIP). Wright will have a high BABIP due to the power, line drives and speed, but if his career BABIP is about .350 and it was .400 at the time of the broadcast, then he is basically either hitting more line drives and fewer pop-ups, running harder, or getting luckier. The broadcasters essentially said BABIP is silly because some guys hit the ball harder than others (which everyone knows!), but we know that you need to look at a guy’s BABIP compared to what it usually is, followed by a glance at the batted ball profile.

I’m sure Gary and Ron understand better than they let on – it’s hard to talk about this kind of stuff between pitches, especially when they get interrupted by a play.

Rich
13 years ago

“Over time, that’ll even out, and he’ll get more hits. ”

Why do we keep insisting on this, when for a lot of players it’s absolutely not true?

PL
13 years ago

Let’s make a deal: if we can make the newer metrics easier to learn and understand, can MLB stop counting pitcher wins as an official stat?

Deal? Deal.

chris
13 years ago

Wait, who ever said we should look at BABIP in a vacuum? It is best used, IMO, alongside things like LD% and history (history more so). If a six-year player with a .310 career BABIP suddenly has a month with a .400 BABIP, chances are it will come down.

Lebron Janes!
13 years ago

Hmm.. Interesting!

Garison
13 years ago

I like the idea of xBA, as it seems to be the goal of BABIP/xBABIP analysis anyways. But I have a problem with the call for using ranges rather than predicting a specific value.

I recall an article at SI.com by Derek Carty of The Hardball Times (http://sportsillustrated.cnn.com/2010/fantasy/05/13/hardball.times/index.html) that looks at how we make projections for batters. The example Carty uses is projecting Miguel Cabrera’s home run output. We have the option of projecting a specific number (say, 35) or projecting a range of outcomes (say, in the 30s).

Every projection has a margin of error on either side, an “error bar”. When projecting a specific number there is one error bar. But when projecting a range of outcomes, there is an error bar for every number within that range, the most important ones being at the extremes. Indeed, the larger the range is the wider the margin for error becomes. You can follow the link to the article to get the details. One of Carty’s points is that anytime one predicts a range of results for a player, one implies a specific value that falls at the middle of that range, no matter how humble one is trying to be.

I think the same principle applies to projecting batting average. You can say that an xBA of .285 “is better expressed as between .279 and .291, or whatever”, but by trying to remove precision you are implying precision, i.e. the middle of whatever range you predict. By expanding the projection into a range, you also expand the error bars stemming from either end of that projection. The result is that the assertion in your example that “a .274 is really unlucky” would probably be wrong according to that hypothetical range projection. Since the lower end of your projected range (.279) has its own error bar (which would need to be calculated), .274 may very well be within that margin of error. Now, if what you’re trying to accomplish by projecting the range of .279 to .291 is to establish an error bar around a .285 xBA, THEN SAY SO. That’s totally respectable, and even desirable. It’s an EXPECTED batting average, after all, and everyone knows things don’t always go as expected. That doesn’t mean our expectations shouldn’t be precise.

No one who makes a specific projection assumes that it must turn out to be exactly correct for it to be REASONABLE. People should understand the idea that results usually fall within a range of possible outcomes. They may forget that, but then it is the writer’s job to kindly explain it rather than paint formal projections with a broad brush in the name of humility. By necessity we make (or imply) precise predictions, but we can hope to be accurate without being perfect. The bottom line is that specificity is not necessarily equivalent to arrogance, because all specific projections have a margin for error.

brentgriffin
13 years ago

I agree that BABIP by itself is worthless; I have made that mistake far too many times. However, when you bring in other components, BABIP can be a useful tool for seeing the whole picture of how good, or how bad, a player really is. Like the article said, BABIP does not tell the difference between a sharp liner to the gap and a bloop single, but that’s what other stats such as LD%, GB%, FB%, etc. are for. For example, Jeff Niemann has about a .235 BABIP, which at first glance convinced me it would go way up and he would be a much worse pitcher. Then I saw his LD% was 13.5%, and his BABIP was understandable. However, a pitcher with a .235 BABIP, a 21% LD rate and a 39% GB rate, I would expect to regress.

Aaron Murray
13 years ago

Garison,

There are ways to define a range so that the amount of expected error is quite easy to understand and work with. One standard deviation from the mean, for instance, would give us a range that would be expected to encompass the “true talent level” about two thirds of the time. About a sixth of the time a player will outperform that range, and about a sixth of the time he will underperform it. I feel like that’s fairly easy to grasp for the interested novice and a useful measure for thumbnailing a player’s variance.
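Those two-thirds and one-sixth figures follow from assuming the spread is roughly normal (bell-shaped); the normal shape is an assumption here, not something measured for any particular baseball stat. A quick check with Python’s standard library:

```python
from statistics import NormalDist

z = NormalDist()                    # standard normal: mean 0, SD 1
inside = z.cdf(1) - z.cdf(-1)       # share within one SD of the mean
above = 1 - z.cdf(1)                # share above the band (same as below, by symmetry)

print(f"{inside:.3f} inside, {above:.3f} above, {above:.3f} below")
# 0.683 inside, 0.159 above, 0.159 below
```

So “about two thirds” and “about a sixth” are fair round numbers for a one-SD band.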

I also think it would be much easier to convince skeptics when offering them a range of expected values instead of one definite number. It’s just easier to argue with a definite number and become close-minded about the whole process. If I’m discussing pitching with a stats dinosaur, and he says that Joe Blow’s ERA is 3.20 so he’s good, and I say wait a minute, his xFIP is 4.00, it’s easy for the dinosaur to get defensive and start attacking the things that xFIP misses, like heart and grit and fashion sense. If instead I say that yeah, Joe’s had some good results this year, but his underlying stats suggest a number between 3.80 and 4.20, then I can see the conversation going much more amicably as we discuss the defense and bullpen behind Joe and the fact that variance is going to affect every pitcher somewhat. The bottom line for me is that ranges could change the conversation from a pissing match between two magical numbers that supposedly describe a player to a discussion of what these numbers really try to do: ERA (and other traditional stats like BA) provides a record of what has actually happened, while most deeper stats try to predict what WILL happen.

I think many of us want to see BA and ERA and the like banished forever, but perhaps a better outcome would be to keep these old stats and simply add on an expected range. The announcers would say things like… “And here comes Aaron Murray Jr. to the plate. He’s hitting .324 so far this year, which is actually a bit on the lower end of his expected batting average range of .320-.350. He sure has taken the baseball world by storm, as he’s the first two-way superstar since the Babe, posting an ERA of 1.24 that’s only slightly better than his expected range of 1.32-1.71.”

Garison
13 years ago

Good points, Aaron.

I’m not even sure I said anything worthwhile in the previous post, though I was trying to. Just to be clear, I’m not opposed to talking about expected results within a range. That would be helpful for all of us to do, not just in conversations with non-sabermetric folks.

I’m certain that people who point to FIP, xFIP, and other ERA predictors understand that those numbers are merely the output of a formula designed to show us what peripheral stats suggest. It just so happens that the formulas give us one number. It’s not that the single numbers are rigid and absolute, if for no other reason than that there are many ERA predictors, and their numbers vary. And of course those formulas are only as good as the assumptions behind them (and how well they do at predicting ERA).

I suppose the degree to which your conversation partner is familiar with advanced stats should determine how you frame your side of the discussion. If they don’t understand xFIP, you’re right; it wouldn’t do any good to just throw that number at them. So really it’s a question of “How can we be responsible with advanced stats?”

What happens if, in your hypothetical conversation, Mr. Skeptic asks, “What do you mean his underlying stats suggest a number between 3.80 and 4.20?” Underlying stats are plugged into formulas, and formulas (as far as I know) spit out precise numbers. So you would need to refer to some specific numbers to justify your argument, even though you’ll need to take those numbers with a grain of salt. That’s part of humility in this context: realizing that the advanced stats are helpful, but they’re not perfect, and goodness knows we are not perfect, because we sometimes use the stats, like BABIP, to come to wrong conclusions.

philosofool (member)
13 years ago

1. The study of BABIP arises out of an analysis of pitchers, not hitters. It’s a very useful stat for analyzing pitchers. We shouldn’t throw out the baby with the bath water.

2. People have been working on finding a BABIP model for hitters that works for a long time and we still don’t have it.

3. It’s very, very important to emphasize that the fact that we don’t have a good BABIP model for hitters does not license assuming that every hitter’s BABIP is a good reflection of his true talent. BABIP for hitters does not reach r-squared of .5 for under 650 PA, and Pizza Cutter estimated that you would need about 1000 PA before you could confidently think your BABIP measurements were nearing an estimate of true talent.

4. A good xBA model must be predictive as well as descriptive. I don’t give a rat’s ass if someone can tell me what his BABIP should have been over a stretch if that doesn’t allow me to predict his BABIP in the future.
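One common way to act on point 3 is to shrink an observed BABIP toward a baseline (league or career), trusting the observed number more as plate appearances pile up. The helper below is hypothetical: the weight PA / (PA + k) is a standard shrinkage choice, and k = 650 is an assumption loosely borrowed from the 650 PA figure above, picked so a hitter at 650 PA gets half weight. Since that figure is stated as an r-squared, this is an approximation, not a fitted constant.

```python
def regressed_babip(observed: float, baseline: float, pa: int, k: float = 650.0) -> float:
    """Shrink an observed BABIP toward a baseline (e.g. league or career mark).

    The observed value gets weight pa / (pa + k); with the assumed k = 650,
    a hitter with 650 PA is trusted exactly halfway.
    """
    if pa < 0 or k <= 0:
        raise ValueError("pa must be >= 0 and k must be > 0")
    weight = pa / (pa + k)
    return baseline + weight * (observed - baseline)


if __name__ == "__main__":
    # A .310 career hitter who posts .400 over a ~100 PA month
    print(f"{regressed_babip(0.400, 0.310, 100):.3f}")  # 0.322
```

The hot month moves the estimate only a dozen points, which matches the intuition in the comments that a .400 month from a .310 career hitter will probably come down.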

Aaron Murray
13 years ago

Garison,

I guess I would say that the formulas should be tweaked to produce a range, like one standard deviation, although the mathematicians can tell me if that is the most appropriate way to go. That way we’re not saying, “it should be .375 but I’ll just make it a little fuzzier for conversation’s sake.” That would also force us to be more aware of the actual ranges that one can expect in different aspects of playing baseball. Perhaps the range on xFIP is around .5 but for wOBA it’s more like .03. Since we all know that the numbers have variation in them anyway, why not do a better job of defining what that typical variation is, instead of just trying to get a feel for it? If the range covers the middle two thirds, so that the player outproduces it 1/6 of the time and underproduces it 1/6 of the time, then I think we’d all understand the game better.

Nathan
13 years ago

While I get the point of the post, I don’t really agree with the (implied) premise that a lot of sabermetrically inclined people treat BABIP as gospel. I know that personally, I never look at a guy’s BABIP and think, “oh, his BABIP is .265, he’s been really unlucky and obviously that will get better in the future.” My thought is usually more like, “oh, he has a .265 BABIP [checks LD%], it’s likely that number will regress toward his career mark.”

I guess my point here is that I (and it seems like most people here) implicitly understand that there is a “range” that is caused by luck, SSS, an unsustainable streak, etc. It seems to me that the problem is more with perception than with the stat itself because, as brentgriffin said, it can be very useful when taken in conjunction with things like the batted-ball profile.

Aaron Murray
13 years ago

Nathan,

If there’s a problem with perception then that is a problem with the stat, at least if you want to spread an understanding of the deeper, more useful, stats. What we want (or at least what I want) is to make it easier and easier for interested old schoolers to make the conversion.

Also, even if we all agree that there is a range whenever we talk about a specific number then why not just use the range? Especially since when we “understand” a range based on a specific number there’s no telling whether the range we’re thinking of is reasonable, inflated, or deflated.

Nathan
13 years ago

I’m fine with adding some sort of range (i.e. taking it from the realm of the implicit to the realm of the explicit) to make it more understandable to people not familiar with the stat, but I’m not comfortable with saying that we need to stop using BABIP, which seems to be what some here are arguing. I’m definitely in agreement with both of the points you made.

YG
13 years ago

I like what Brent Griffin said above, but I’ll just say this: people outside of this site overanalyze BABIP more often than not. When you see people on other sports sites like ESPN using it to predict better or worse days for certain players, it can definitely be misleading at times. It wasn’t until I found this site that I started looking at the other stats like line drives, pop flies, etc., which IMO are more important to look at than BABIP itself. The problem with BABIP is that it can misrepresent a player who is slumping or just playing poorly as a player who is just getting unlucky. People look at BABIP because THEY CAN’T watch every game, but most people who don’t look at FanGraphs don’t know about LD% and things like that. When those people look at BABIP, I think they tend to assume that most hits are line drives, in other words balls that were hit hard. When you look at BABIP like that, that’s when it can start to really look dumb, because when players are slumping it’s likely that they are grounding out and popping out way too much. BABIP could make it look like he’s getting unlucky when he’s really just not playing well at all.

BABIP doesn’t need to go; it just needs to be de-emphasized. Bring line drives, ground balls, and the other percentages to the forefront. Those are the more important, more telling stats.

Joey B
13 years ago

Like others have said, very few stats in isolation have much value. Sometimes the BABIP is bad because someone is hurting or aging. We know that Beltre’s BABIP went from .301 to .358. We know that’s high, but what proportion of the .057 is due to less foul territory at Fenway, or a better hitting background? Some would say he’s having an unsustainable run, but he has a .734/.835 home/away career split. If the .835 is sustainable, and if Fenway adds 20 points to an away split, is it conceivable that a .855/.835 split at Fenway can be attained? I would certainly think so. So a career .787 is now a .845, with no other change but ballpark.
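The ballpark arithmetic above can be sketched by blending a home and an away figure by share of plate appearances. Two assumptions: the split numbers are treated as OPS (the comment doesn’t name the stat, but the values look like OPS), and the default is an even 50/50 home/away split. Blending OPS this way is only approximate, since OBP and SLG have different denominators. The helper name is hypothetical.

```python
def combined_ops(home: float, away: float, home_share: float = 0.5) -> float:
    """Approximate overall OPS as a PA-weighted blend of home and away OPS.

    home_share is the fraction of plate appearances at home; 0.5 is an
    assumption, and real seasons won't split exactly evenly.
    """
    if not 0.0 <= home_share <= 1.0:
        raise ValueError("home_share must be between 0 and 1")
    return home_share * home + (1 - home_share) * away


if __name__ == "__main__":
    print(f"BABIP jump: {0.358 - 0.301:.3f}")                   # 0.057
    print(f"career split blend: {combined_ops(0.734, 0.835):.4f}")  # 0.7845
    print(f"Fenway scenario: {combined_ops(0.855, 0.835):.3f}")     # 0.845
```

The simple average of .734 and .835 comes out to about .785, close to the comment’s career .787, and the .855/.835 scenario blends to the .845 quoted.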