xHitting: Going beyond xBABIP (part I)
For a few years, it’s struck me as unusual that pitching and hitting metrics are asymmetric. If the metrics we use to evaluate one group (FIP or wRC+) are so good, why don’t we use them for the other?
One issue is that we’re not used to evaluating pitchers on an OPS-type basis, and similarly we’re not used to evaluating hitters on an ERA basis. Fine. But there’s a bigger issue: Why do pitching metrics put so much more emphasis on the removal of luck?
While most sabermetricians are aware of BABIP, and recognize the pervasive impacts it can have on a batting line, attempts to (precisely) adjust hitter stats for BABIP are surprisingly uncommon. While there do exist a few xBABIP calculators, these haven’t yet caught on en masse like FIP. And xBABIP doesn’t appear on player pages in either FanGraphs or Baseball Prospectus.
xBABIP itself isn’t even the end goal. What you probably really want is xAVG/xOBP/xSLG, etc. Obtaining these is a bit cumbersome when you need to do the conversions yourself.
Moreover, it strikes me that xBABIP cannot be converted to xSLG without some ad hoc assumptions. Let’s say you conclude a player would have gained or lost 4 hits under neutral BABIP luck. What type of hits are those? All singles? 2 singles and 2 doubles? 1 single, 2 doubles, 1 triple? The exact composition of hits gained/lost affects SLG. Or maybe you assume ISO is unaffected by BABIP, but this too is ad hoc.
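To make the ambiguity concrete, here is a minimal sketch. The baseline season (500 AB, 130 hits, 210 total bases) is a made-up assumption for illustration; only the three hit mixes come from the paragraph above.

```python
# Hypothetical baseline season (illustrative numbers, not a real player).
AB = 500
HITS = 130
TOTAL_BASES = 210

# Bases credited per hit type.
BASES = {"1B": 1, "2B": 2, "3B": 3}

# The three ways of allocating 4 extra hits mentioned in the text.
scenarios = {
    "all singles": {"1B": 4},
    "2 singles, 2 doubles": {"1B": 2, "2B": 2},
    "1 single, 2 doubles, 1 triple": {"1B": 1, "2B": 2, "3B": 1},
}

def adjusted_line(extra_hits):
    """Return (AVG, SLG) after adding the extra hits.

    AB stays fixed because each gained hit replaces an out on a ball in play.
    """
    n_hits = sum(extra_hits.values())
    n_bases = sum(BASES[kind] * n for kind, n in extra_hits.items())
    return (HITS + n_hits) / AB, (TOTAL_BASES + n_bases) / AB

results = {name: adjusted_line(mix) for name, mix in scenarios.items()}
for name, (avg, slg) in results.items():
    print(f"{name:32s} AVG {avg:.3f}  SLG {slg:.3f}")
```

AVG is the same in all three scenarios, while SLG differs by several points depending purely on which hit types you assume were gained.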
At least to me, whenever a hitter performs better/worse than expected, we really care to know two things:
1. Is it driven by BABIP?
2. If so, what is the luck-neutral level of performance?
As I’ve attempted to illustrate, answering #2 is not so easy under existing methods. (Nor do people always even attempt to answer it, really.) Even answering #1 correctly takes a little bit of effort. (“True talent” BABIP changes with hitting style, so it isn’t always enough just to compare current vs. career BABIP. And then there are players with insufficient track record for career BABIP to be taken at face value.)
Compare this to pitchers. When a pitcher posts a surprisingly good/bad ERA, we readily consult FIP/xFIP/SIERA. Specific values, readily provided on the site. So why not for hitters?
Here I attempt to help fill this gap. The approach is to map a hitter’s peripheral performance to an entire distribution of hit outcomes. These “expected” values of singles, doubles, triples, home runs, and outs can then be used to compute “expected” versions of AVG, OBP, SLG, OPS, wOBA, etc.
Recovering xAVG and xOBP isn’t that different from current xBABIP-based approaches. The main extension is that, unlike xBABIP, this provides an empirical basis to recover xSLG, and also xWOBA.
Steps:
- Calculate players’ rates of singles, doubles, triples, home runs, and outs among balls in play. (Unlike some other BABIP settings, I count home runs as “balls in play” to estimate an expected number.)
- Regress each rate separately on a common set of peripherals. You’ll now have predicted rates of each outcome for each player. (Keeping the explanatory variables common throughout, with a constant in every regression, ensures the predicted rates sum to 100%.)
- Multiply by the number of balls in play (again counting home runs) to get expected counts of singles, doubles, triples, home runs, and outs.
- Use these to compute expected versions of your preferred statistics.
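The first three steps can be sketched roughly as follows. This is a toy version under loud assumptions: the data are synthetic, a single made-up peripheral (`ld_pct`) stands in for the full set of regressors, and `fit_ols` is a hand-rolled helper, not the code actually used for this piece.

```python
import random

random.seed(0)
OUTCOMES = ["1B", "2B", "3B", "HR", "Out"]

# Synthetic player-seasons; every number below is made up for illustration.
players = []
for _ in range(200):
    ld_pct = random.uniform(15, 25)      # stand-in for a single peripheral
    bip = random.randint(150, 500)       # balls in play, counting home runs
    p_single = 0.12 + 0.005 * ld_pct     # assumed: more line drives, more singles
    weights = [p_single, 0.07, 0.01, 0.04, 0.88 - p_single]
    draws = random.choices(OUTCOMES, weights=weights, k=bip)
    # Step 1: observed rate of each outcome among balls in play.
    rates = {o: draws.count(o) / bip for o in OUTCOMES}
    players.append({"ld_pct": ld_pct, "bip": bip, "rates": rates})

def fit_ols(xs, ys):
    """One-regressor least squares with an intercept: returns (intercept, slope)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Step 2: regress each rate separately on the same regressor.
xs = [p["ld_pct"] for p in players]
models = {o: fit_ols(xs, [p["rates"][o] for p in players]) for o in OUTCOMES}

# Step 3: predicted rates, then expected counts = predicted rate * balls in play.
for p in players:
    p["pred"] = {o: a + b * p["ld_pct"] for o, (a, b) in models.items()}
    p["expected"] = {o: r * p["bip"] for o, r in p["pred"].items()}

# The predicted rates sum to 1 for every player because all five regressions
# share the same regressor and each includes an intercept.
worst_gap = max(abs(sum(p["pred"].values()) - 1.0) for p in players)
print(f"largest deviation of summed rates from 1: {worst_gap:.2e}")
```

The adding-up property is the reason for the common right-hand side: since the observed rates sum to one for each player, the fitted values from equation-by-equation least squares do too, so the expected counts sum back to the player's balls in play.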
What explanatory peripherals are appropriate? Initially I’ve used:
- Line drive rate, ground ball rate, flyball rate, popup rate
- Speed score
- Flyball distance (from BaseballHeatMaps.com), to approximate power
- Speed * ground ball rate
- Flyball distance * flyball rate
These explanatory variables differ somewhat from those in the xBABIP formula linked earlier. The main distinctions are adding flyball distance (think Miguel Cabrera vs. Ben Revere) and using Speed score instead of IFH%. (IFH% already embeds whether the ball went for a hit. Certainly in-sample this will improve model fit, but it might not be good for out-of-sample use.)
Regression results:
Outcome | Spd | FB Dist/1000 | FB Dist missing | (Spd*GB%)/1000 | (FB Dist*FB%)/10000 | LD% | GB% | FB% | IFFB%/100 | Pitcher dummy | Constant |
Singles rate | -0.0177 | 0.0608 | 0.0111 | 0.4882 | 0.0090 | -0.0019 | -0.0063 | -0.0066 | -0.0417 | -0.6833 | 0.7296 |
Doubles rate | 0.0076 | 0.6044 | 0.1457 | -0.1059 | -0.0152 | -0.0058 | -0.0066 | -0.0061 | -0.0070 | -0.6700 | 0.5235 |
Triples rate | 0.0040 | 0.0193 | 0.0057 | -0.0279 | -0.0019 | -0.0077 | -0.0077 | -0.0077 | -0.0010 | -0.7695 | 0.7634 |
HR rate | 0.0018 | 0.9392 | 0.2764 | -0.0295 | 0.0283 | 0.0081 | 0.0080 | 0.0085 | -0.0127 | 0.8020 | -1.0790 |
Outs rate | 0.0043 | -1.6238 | -0.4389 | -0.3249 | -0.0202 | 0.0073 | 0.0125 | 0.0118 | 0.0624 | 1.3205 | 0.0625 |
Technical notes:
- These are rates among balls in play (including home runs)
- Each observation is a player-year (e.g. 2012 Mike Trout)
- I’ve used 2010-2012 data for these regressions
- Currently I’ve only grabbed flyball distance for players on the leaderboard at BaseballHeatMaps. This is usually about 300 players per year, or most of the “everyday regulars.” (Fear not, Ben Revere/Juan Pierre/etc. are included.) The remaining cases get an indicator for ‘FB Dist missing.’
- LD%, GB%, FB%, and IFFB% are coded so that 50% = 50, not 0.50.
- Pitcher proxy = 1 if LD% + GB% + FB% = 0. Initially I haven’t thrown out cases of pitcher hitting, nor other instances of limited PA.
- Notice the interaction terms. The full impact of GB% depends both on GB% and Speed; the full impact of FB% depends on both FB% and FB distance; etc. So don’t just look at Speed, GB%, FB%, or FB Distance in isolation.
- Don’t worry that the coefficients on pitcher proxy “look” a bit funny for HR rate and Outs rate. (Remember that these cases also have LD%=0, GB%=0, and FB%=0.) In total the average predicted HR rate for pitchers is 0.01% and their predicted outs rate is 94%.
- Strictly speaking, these are backwards-looking estimators (as are FIP and its variants), but they might well prove useful in forecasting.
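As a rough illustration of how the table and the scaling notes fit together, the sketch below applies the published coefficients to one hitter. Several things here are assumptions rather than statements from the analysis: the example hitter is invented, flyball distance is taken to be in feet, and a missing distance is entered as zero alongside the "FB Dist missing" indicator. The helper names (`features`, `predict_rates`, `gb_marginal_effect`) are hypothetical.

```python
# Coefficients transcribed from the regression table. Column order is read as:
# Spd, FB Dist/1000, FB Dist missing, (Spd*GB%)/1000, (FB Dist*FB%)/10000,
# LD%, GB%, FB%, IFFB%/100, Pitcher dummy, Constant.
COEFS = {
    "1B":  [-0.0177,  0.0608,  0.0111,  0.4882,  0.0090, -0.0019, -0.0063, -0.0066, -0.0417, -0.6833,  0.7296],
    "2B":  [ 0.0076,  0.6044,  0.1457, -0.1059, -0.0152, -0.0058, -0.0066, -0.0061, -0.0070, -0.6700,  0.5235],
    "3B":  [ 0.0040,  0.0193,  0.0057, -0.0279, -0.0019, -0.0077, -0.0077, -0.0077, -0.0010, -0.7695,  0.7634],
    "HR":  [ 0.0018,  0.9392,  0.2764, -0.0295,  0.0283,  0.0081,  0.0080,  0.0085, -0.0127,  0.8020, -1.0790],
    "Out": [ 0.0043, -1.6238, -0.4389, -0.3249, -0.0202,  0.0073,  0.0125,  0.0118,  0.0624,  1.3205,  0.0625],
}

def features(spd, fb_dist, ld, gb, fb, iffb, fb_dist_missing=False, pitcher=False):
    """Build the regressor vector in the table's column order.

    Percentages are on a 0-100 scale (50% = 50). fb_dist is assumed to be in
    feet; when it is missing it is entered as 0 (an assumption) and the
    missing-distance indicator is switched on.
    """
    dist = 0.0 if fb_dist_missing else fb_dist
    return [
        spd,
        dist / 1000,
        1.0 if fb_dist_missing else 0.0,
        spd * gb / 1000,
        dist * fb / 10000,
        ld,
        gb,
        fb,
        iffb / 100,
        1.0 if pitcher else 0.0,
        1.0,  # constant
    ]

def predict_rates(x):
    """Predicted rate of each outcome among balls in play (home runs included)."""
    return {o: sum(c * v for c, v in zip(coefs, x)) for o, coefs in COEFS.items()}

def gb_marginal_effect(outcome, spd):
    """Change in the predicted rate per extra point of GB%, other inputs held
    fixed. It depends on Speed score through the Spd*GB% interaction."""
    return COEFS[outcome][6] + COEFS[outcome][3] * spd / 1000

# Invented hitter: Spd 5.0, 290 ft average flyball, 20/45/35 LD/GB/FB, 10% IFFB.
rates = predict_rates(features(spd=5.0, fb_dist=290, ld=20, gb=45, fb=35, iffb=10))
for outcome, rate in rates.items():
    print(f"{outcome:>3s}: {rate:.3f}")
print(f"sum: {sum(rates.values()):.3f}")
print(gb_marginal_effect("1B", 2.0), gb_marginal_effect("1B", 8.0))
```

For this invented hitter the rates come out near 22% singles and 65% outs, and they sum to roughly 0.99 rather than exactly 1 because the published coefficients are rounded to four decimals. The last line shows the interaction at work: the singles-rate effect of an extra point of GB% differs between a slow and a fast runner.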
I next calculate xAVG, xOBP, xSLG, xOPS, and xWOBA. For now, I’ve simply taken BB and K rates as given. (xBABIP-based approaches seem to do the same, often.)
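A minimal sketch of this step, taking walks and strikeouts as given, might look like the following. It simplifies by assuming no HBP, sacrifice flies, or sacrifice bunts (so every ball in play and every strikeout counts as an at-bat); that assumption and the sample inputs are mine, for illustration only.

```python
def expected_slash_line(x1b, x2b, x3b, xhr, xout, bb, k):
    """Compute (xAVG, xOBP, xSLG, xOPS) from expected ball-in-play outcomes.

    x1b..xout are expected counts among balls in play (home runs included);
    bb and k are the player's actual walks and strikeouts, taken as given.
    Simplifying assumption: no HBP, sacrifice flies, or sacrifice bunts.
    """
    hits = x1b + x2b + x3b + xhr
    at_bats = hits + xout + k
    plate_appearances = at_bats + bb
    total_bases = x1b + 2 * x2b + 3 * x3b + 4 * xhr
    xavg = hits / at_bats
    xobp = (hits + bb) / plate_appearances
    xslg = total_bases / at_bats
    return xavg, xobp, xslg, xobp + xslg

# Invented inputs: 400 balls in play split 88/28/4/16/264, plus 50 BB and 100 K.
xavg, xobp, xslg, xops = expected_slash_line(88, 28, 4, 16, 264, bb=50, k=100)
print(f"xAVG {xavg:.3f}  xOBP {xobp:.3f}  xSLG {xslg:.3f}  xOPS {xops:.3f}")
```

xWOBA follows the same pattern, with the season's linear weights applied to the expected counts of each event; the weights are omitted here because they vary by year.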
Early results are promising, as “expected” versions of AVG, OBP, SLG, OPS, and wOBA all outperform their unadjusted versions in predicting next-year performance. (At least for the years currently covered.)
Which players deviated most from their xWOBA? Here are the leaders/laggards for 2012, along with their 2013 performance:
Leaders | | | | | Laggards | | | | |
Name | 2012 wOBA | 2012 xWOBA | Difference | 2013 wOBA | Name | 2012 wOBA | 2012 xWOBA | Difference | 2013 wOBA |
Brandon Moss | 0.402 | 0.311 | 0.091 | 0.369 | Josh Harrison | 0.274 | 0.355 | -0.081 | 0.307 |
Giancarlo Stanton | 0.405 | 0.332 | 0.073 | 0.368 | Ryan Raburn | 0.216 | 0.290 | -0.074 | 0.389 |
Will Middlebrooks | 0.357 | 0.285 | 0.072 | 0.300 | Nick Hundley | 0.205 | 0.265 | -0.060 | 0.295 |
Chris Carter | 0.369 | 0.298 | 0.071 | 0.337 | Jason Bay | 0.240 | 0.299 | -0.059 | 0.306 |
John Mayberry | 0.303 | 0.238 | 0.065 | 0.298 | Eric Hosmer | 0.291 | 0.349 | -0.058 | 0.350 |
Torii Hunter | 0.356 | 0.293 | 0.063 | 0.346 | Gerardo Parra | 0.317 | 0.369 | -0.052 | 0.326 |
Jamey Carroll | 0.299 | 0.244 | 0.055 | 0.237 | Daniel Descalso | 0.278 | 0.328 | -0.050 | 0.284 |
Cody Ross | 0.345 | 0.291 | 0.054 | 0.326 | Jason Kipnis | 0.315 | 0.365 | -0.050 | 0.357 |
Melky Cabrera | 0.387 | 0.333 | 0.054 | 0.303 | Rod Barajas | 0.272 | 0.322 | -0.050 | – |
Kendrys Morales | 0.339 | 0.286 | 0.053 | 0.342 | Cameron Maybin | 0.290 | 0.339 | -0.049 | 0.209 |
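One quick way to read this table is to ask which 2012 number landed closer to 2013 wOBA. The sketch below does that using only the rows above (Rod Barajas is left out because no 2013 wOBA is listed). Mean absolute error is my choice of yardstick here, not necessarily the measure behind the broader claim, and these players were picked for having extreme gaps, so this is not a test of the model as a whole.

```python
# (name, 2012 wOBA, 2012 xWOBA, 2013 wOBA), copied from the table above.
ROWS = [
    ("Brandon Moss", 0.402, 0.311, 0.369),
    ("Giancarlo Stanton", 0.405, 0.332, 0.368),
    ("Will Middlebrooks", 0.357, 0.285, 0.300),
    ("Chris Carter", 0.369, 0.298, 0.337),
    ("John Mayberry", 0.303, 0.238, 0.298),
    ("Torii Hunter", 0.356, 0.293, 0.346),
    ("Jamey Carroll", 0.299, 0.244, 0.237),
    ("Cody Ross", 0.345, 0.291, 0.326),
    ("Melky Cabrera", 0.387, 0.333, 0.303),
    ("Kendrys Morales", 0.339, 0.286, 0.342),
    ("Josh Harrison", 0.274, 0.355, 0.307),
    ("Ryan Raburn", 0.216, 0.290, 0.389),
    ("Nick Hundley", 0.205, 0.265, 0.295),
    ("Jason Bay", 0.240, 0.299, 0.306),
    ("Eric Hosmer", 0.291, 0.349, 0.350),
    ("Gerardo Parra", 0.317, 0.369, 0.326),
    ("Daniel Descalso", 0.278, 0.328, 0.284),
    ("Jason Kipnis", 0.315, 0.365, 0.357),
    ("Cameron Maybin", 0.290, 0.339, 0.209),
]

def mean_abs_error(predictor_index):
    """Average |predictor - 2013 wOBA| across the listed players."""
    errors = [abs(row[predictor_index] - row[3]) for row in ROWS]
    return sum(errors) / len(errors)

mae_woba = mean_abs_error(1)    # 2012 wOBA as the prediction
mae_xwoba = mean_abs_error(2)   # 2012 xWOBA as the prediction
print(f"MAE using 2012 wOBA:  {mae_woba:.4f}")
print(f"MAE using 2012 xWOBA: {mae_xwoba:.4f}")
```

Across these 19 players, xWOBA misses 2013 wOBA by about 42 points on average versus about 47 points for raw wOBA, though the edge comes mostly from the laggards column.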
Is performance perfect? Obviously not. The model does quite well for some players, moderately well for others, and poorly for a few. This is not the end-all solution for xHitting.
Some future work that I have in mind:
- A still more complete set of hitting peripherals. I’m thinking of park factors, batted ball direction, and possibly others.
- Testing partial-season performance
- Comparing results against projection systems like ZiPS and Steamer
Otherwise, my main hope from this piece is to stimulate greater discussion of evaluating hitters on a luck-neutral basis. Simply identifying certain players’ stats as being driven by BABIP is not enough; we really should give precise estimates of the underlying level of performance based on peripherals. We do this for pitchers, after all, with good success.
Above I’ve contributed my two cents for a concrete method to do this. A major extension to xBABIP-based approaches is that this offers an empirical basis to recover xSLG and xWOBA. While the model is far from perfect, even in its current form it generates “expected” versions of AVG, OBP, SLG, OPS, and wOBA that outperform their unadjusted versions in predicting subsequent-year performance. (Not just for leaders/laggards.)
Comments and suggestions are obviously welcome!
Sam is an Oakland A's fan and economist who received his Ph.D. from UC San Diego in 2017.
Great read, are you on twitter?
Thank you. I’m not on Twitter; sorry!
well the biggest reason why pitching metrics tend to be luck-neutral or fielding independent and advanced hitting metrics are not is that a pitcher has much, much, much less control over his BABIP.
A hitter can have an approach of slap it on the ground and run. An approach mostly used by smaller, speedy guys.
Or a big, slow slugger would try to hit the ball into the air to hit it out or to have more time to run, since he is slow to begin with and when the ball is caught in flight speed really doesn’t matter at all. You could theoretically round all the bases and arrive at home plate before the ball is caught… but you are still called out. So it makes sense for slower power guys to hit it into the air.
This approach reduces their BABIP and xBABIP.
A pitcher sees a large sample which should even out rather quickly and all the DIPS theory suggests that a pitcher does not have all that much if any control over balls in play.
This is a good point. What it suggests to me is that luck-neutral hitter evaluation has many more things that need to be taken into account. So it might be harder to do, but I believe we can do it.
Thank you for the comments.
no problem 🙂
another issue is that FIP does not really take “luck” out of the equation. It really just takes all the events that a pitcher has the most control over and takes out the sequencing… just like wOBA.
ERA is liable because of the sequencing. A single followed by a HR and then 3 K’s is not the same as a HR followed by a single and then 3K’s.
From a FIP-standpoint there is no difference between the first and the second sequence. ERA penalizes the pitcher for giving up the HR after the single, which doesn’t really make sense when you try to evaluate performance.
also, the HR, BB, K and HBP values used in the FIP-formula are all based on linear weights. just like wOBA.
wOBA is a stat that also takes out the sequencing. It is context neutral, as we say. So a HR is a HR is a HR. Doesn’t matter if the bases are loaded or empty. Same with FIP.
Another point is that in xFIP the only thing that is normalized is the HR/FB rate. Not BABIP.
Agreed. FIP just takes out sequencing and fielding, while xFIP also takes out HR/FB luck.
Awesome! Great read, and definitely a step in the right direction.
Very thorough and insightful read. Thanks for sharing, Sam!
Interesting analysis. I just wanted to let you know that MLB teams already have access to xStats, calculated in basically the same way.
Actually, calculated using hit f/x data instead of batted ball rates.
Good to know. Are there any pieces on it on Fangraphs, BP, THT, etc.? Thanks!
We have limited amounts of HitF/x data that’s released to the public. Here’s a link to the best study I know of on it.
http://www.baseballamerica.com/minors/sabr-analytics-conference-hitters-quickly-show-batted-ball-profiles/
I did something very much like this on my own a few months ago, except I went one step further and calculated xRAA for batters instead of xWoba. The basic idea I had was to use batted ball data to calculate expected results (which is exactly the basic idea you had). It’s very good to know that there are other individuals who think that there are the same inefficiencies in amateur (read: our) evaluation of player talent as me.
You used a few more factors than me though in calculating expected results, so your calculations are likely to be slightly more accurate. The best way to improve upon our findings would be to use HitF/x data, which we have no real hope of attaining. Using average results of, say, groundballs hit between 40-45 mph to the left side, would be the best possible FIB calculations we could do.
I really do wonder how much major league front offices value this stuff. My guess would be (for the most part) not as much as they should.
Cool! Thank you for the link.
And comments in general
Why are we just throwing each variable into a regression calculator without thinking? What are the R^2 values for each variable and each outcome?
I know you said having a common set of variables gets the rates to be 100% but I don’t think this is the way it should be approached.
Should Spd be included in xHR%? No, it doesn’t make sense. But it should be included in all the others. Should GB% be in HR%? No.
Should FB distance be in 1B%? Probably not either. Should FB% be in 1B? No.
It is very good work, and one of the best we have at the moment, but not near as good as pitching estimators.
Coming up with xwOBA is very hard. I’ve attempted to do it by using xBABIP and determining the wOBA weight of a BABIP-induced hit. But again it doesn’t work as well as just using wRC+.
(re:RHS variables) Agreed for the most part. I definitely think some can be dropped for certain outcomes, and that there are other interactions that should be added for some. Indeed the only real reason I’ve done it this way currently is to get the rates to sum to 100%.
I wonder: if we let the RHS be outcome-specific, where the rates might add to more or less than 100%, what if we scale everything up/down proportionally? It doesn’t feel especially kosher on its own, but to the extent it might allow better prediction of individual outcomes, I wonder if it would still improve performance overall. TBD.
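For what it's worth, the proportional rescaling itself would be simple. Here is a bare-bones sketch with invented numbers; `rescale_rates` is a hypothetical helper, and whether this actually improves prediction is exactly the open question.

```python
def rescale_rates(rates):
    """Scale predicted outcome rates proportionally so they sum to 1."""
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("rates must have a positive sum")
    return {outcome: rate / total for outcome, rate in rates.items()}

# Invented predictions from outcome-specific models; they sum to 1.04.
raw = {"1B": 0.23, "2B": 0.07, "3B": 0.01, "HR": 0.05, "Out": 0.68}
scaled = rescale_rates(raw)
print(scaled, sum(scaled.values()))
```

Relative rates between any two outcomes are unchanged by this step; only the overall level moves.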
For now I omitted p-values and R2 just to keep the table simple. But maybe I’ll include them in a follow-up piece with (hopefully) other improvements.
Thank you for the comments!
Remarkably successful at predicting 2013, it seems. Just looking at the top few players on each list, Raburn had a big breakout year, while Middlebrooks, Moss, and Stanton all had huge down years. Very impressive.
Awesome piece, really impressive how the 2012 differentials translated to 2013. I’m sure major league clubs have their own versions of this stat and wouldn’t be shocked if yours could compete. I’d love to see xWOBA for 2013 players.