xHitting (Part 4): 2014 Fantasy Edition!
Welcome to the fourth installment of xHitting! As always, reader comments and feedback are super encouraged and appreciated. (Links to parts one, two, and three)
Briefly recapping the method: the gist is to estimate the expected rate of each individual hit type from a player’s underlying peripherals, and in turn recover all the components needed to compute expected versions of wOBA, OPS, etc. The only real change to the model since last time is that I now use a “hybrid” predicted home run rate, which averages actual and (raw) predicted home run rate, with the weight on actual HR rate increasing in the number of plate appearances. (This is explained in part three, for those curious.)
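For readers who want to see the mechanics, here is a minimal Python sketch of the recap above: blend actual and raw predicted HR rate, swap expected hit counts into the standard wOBA formula, and read off xwOBA. It is not the model’s actual code. The linear weights are approximate 2014 values, the pa / (pa + k) blending rule and k = 300 are hypothetical (the post only says the weight on actual HR rate rises with PA), and the expected hit rates are assumed to be per PA and supplied from elsewhere.

```python
# Linear weights: roughly the 2014 FanGraphs values (an assumption here;
# check the FanGraphs "Guts!" page for the exact figures for a given season).
WEIGHTS = {"ubb": 0.689, "hbp": 0.722, "1b": 0.892, "2b": 1.283, "3b": 1.635, "hr": 2.135}


def hybrid_hr_rate(actual_rate, raw_pred_rate, pa, k=300.0):
    """Blend actual and raw predicted HR rate (HR per PA).

    The weight on the actual rate, pa / (pa + k), rises with plate
    appearances. The functional form and k=300 are hypothetical; the
    post only says the weight increases in PA.
    """
    w = pa / (pa + k)
    return w * actual_rate + (1.0 - w) * raw_pred_rate


def woba(counts, ab, bb, ibb, sf, hbp):
    """Standard wOBA: weighted events divided by (AB + BB - IBB + SF + HBP)."""
    numerator = sum(WEIGHTS[event] * counts[event] for event in WEIGHTS)
    return numerator / (ab + bb - ibb + sf + hbp)


def x_woba(pa, ab, bb, ibb, sf, hbp, pred_rates, actual_hr, raw_pred_hr_rate):
    """Swap actual hits for expected ones, then reuse the wOBA formula.

    pred_rates holds expected 1B/2B/3B per PA (the per-PA denominator is
    an assumption). Walks, HBP, and the denominator are left as observed.
    """
    hr_rate = hybrid_hr_rate(actual_hr / pa, raw_pred_hr_rate, pa)
    expected = {
        "ubb": bb - ibb,
        "hbp": hbp,
        "1b": pred_rates["1b"] * pa,
        "2b": pred_rates["2b"] * pa,
        "3b": pred_rates["3b"] * pa,
        "hr": hr_rate * pa,
    }
    return woba(expected, ab, bb, ibb, sf, hbp)


# Made-up stat line, purely for illustration.
print(round(x_woba(pa=400, ab=355, bb=35, ibb=3, sf=3, hbp=7,
                   pred_rates={"1b": 0.16, "2b": 0.05, "3b": 0.004},
                   actual_hr=18, raw_pred_hr_rate=0.03), 3))
```

Holding walks, HBP, and the denominator at their observed values is a simplification, but it keeps the comparison with actual wOBA apples to apples: only the hit outcomes change.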
Perhaps the more exciting change, though, is that this time I actually have results for an ongoing season, which potentially can help for fantasy purposes. (Not that most readers need my help necessarily.) Related to fantasy usage, there were a few requests to see a full spreadsheet of past results (2010-2013 seasons), which I have posted here. Again feel free to take it or leave it at your leisure.
Note: I collected most of these data at the All-Star Break, so the numbers may be a few weeks behind, but they’re still mostly representative. Also, for time considerations I only fetched 2014 stats for qualified leaders. This leaves out even a few big names, but I couldn’t justify the time to fetch every player.
So far, I’ve typically posted the biggest “over-” and “under”-achievers for a given season, and I suppose I’ll continue that tradition today. But while these lists are useful for highlighting which players seem most likely to regress, they overlook another main use of the model: assessing how real a player’s apparent “breakout” or “decline” is, at least in-sample. (In some cases, the model may think that a player’s breakout is entirely justified given his peripherals, while in others it may view the breakout more skeptically.) Thus, today I’ll also post a second list, of players who seem to have taken a pronounced step forward or back this season, along with what the model thinks of their season-to-date performance.
Okay, time for results! I’ll start with the list of “over-” and “underachievers.”
2014 Underachievers (1st half) | | | | 2014 Overachievers (1st half) | | |
Name | wOBA | xWOBA | Diff | Name | wOBA | xWOBA | Diff
Jean Segura | 0.256 | 0.305 | -0.049 | Casey McGehee | 0.345 | 0.277 | 0.068
Chris Davis | 0.306 | 0.353 | -0.047 | Yasiel Puig | 0.398 | 0.340 | 0.058
Mark Teixeira | 0.352 | 0.397 | -0.045 | Matt Adams | 0.376 | 0.324 | 0.052
Gerardo Parra | 0.289 | 0.327 | -0.038 | Mike Trout | 0.428 | 0.381 | 0.047
Brian McCann | 0.298 | 0.330 | -0.032 | Marcell Ozuna | 0.343 | 0.300 | 0.043
Torii Hunter | 0.323 | 0.355 | -0.032 | Lonnie Chisenhall | 0.396 | 0.359 | 0.037
Joe Mauer | 0.308 | 0.340 | -0.032 | Scooter Gennett | 0.355 | 0.320 | 0.035
Jimmy Rollins | 0.320 | 0.352 | -0.032 | Marlon Byrd | 0.344 | 0.309 | 0.035
Brian Roberts | 0.304 | 0.334 | -0.030 | Giancarlo Stanton | 0.397 | 0.363 | 0.034
Buster Posey | 0.326 | 0.352 | -0.026 | Hunter Pence | 0.359 | 0.325 | 0.034
Having worked with this model for a while now, I’ve noticed that some players seem to give it trouble and show up on these lists disproportionately often from year to year. A few of those players appear here… more on that later.
Partly for that reason, I wouldn’t necessarily say to “buy low” on the guys on the left or “sell high” on the guys on the right, although you can if you want. I won’t address every player, but I have some scattered comments:
- For readers who prefer OPS, .020 wOBA translates to about .050 OPS, on the margin.
- .397 predicted for Teixeira? Not sure where that came from…
- Poor Segura. All things considered, I think nobody deserves a big second half more than he does.
- Whatever happened to Casey McGehee’s power? The guy once hit 23 home runs in a season, but now has an ISO of .073, with surprisingly low fly ball distance.
- Although Chisenhall’s breakout is less impressive once you take out what the model thinks is luck, it’s still a substantial improvement.
- Chris Davis is sort of the reverse of Chisenhall. Adding back in what the model thinks has been bad luck, he’s still way down from what he did last year, but not nearly as disappointing as he probably has been to many owners thus far.
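The Diff columns are simply wOBA minus xWOBA, and the OPS bullet is a fixed multiplier (.050 / .020 = 2.5). Below is a small sketch that ranks a few rows from the table above and converts each gap with that rule of thumb; the 2.5 factor is only the post’s marginal approximation, not an exact conversion.

```python
# First-half 2014 (name, wOBA, xWOBA) rows copied from the table above.
ROWS = [
    ("Jean Segura", 0.256, 0.305),
    ("Chris Davis", 0.306, 0.353),
    ("Mark Teixeira", 0.352, 0.397),
    ("Casey McGehee", 0.345, 0.277),
    ("Yasiel Puig", 0.398, 0.340),
    ("Lonnie Chisenhall", 0.396, 0.359),
]

# The post's rule of thumb: .020 of wOBA is worth about .050 of OPS on the margin.
OPS_PER_WOBA = 0.050 / 0.020  # about 2.5


def ops_equivalent(woba_gap):
    """Convert a wOBA gap into a rough OPS gap using the rule of thumb above."""
    return woba_gap * OPS_PER_WOBA


def split_over_under(rows):
    """Return (underachievers, overachievers), each sorted by the size of the gap.

    Diff = wOBA - xWOBA, so a negative value means underachieving the model.
    """
    scored = [(name, woba - xwoba) for name, woba, xwoba in rows]
    under = sorted((r for r in scored if r[1] < 0), key=lambda r: r[1])
    over = sorted((r for r in scored if r[1] > 0), key=lambda r: r[1], reverse=True)
    return under, over


under, over = split_over_under(ROWS)
for name, gap in under + over:
    print(f"{name:18s} {gap:+.3f} wOBA  (~{ops_equivalent(gap):+.3f} OPS)")
```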
As mentioned, certain players do seem able to over- or underperform the model fairly consistently, the same way we think some pitchers are usually better or worse than their FIP. With 4.5 years of data now available, I think I can make educated guesses about which players systematically deviate from the model’s predictions. I’ll call this deviation the “player fixed effect.”
(Requiring at least 1000 PA from 2010 through 2014 first half)
Model loves too much | | Model loves too little |
Name | Player FE estimate (wOBA) | Name | Player FE estimate (wOBA)
Brian Roberts | -0.033 | Wilson Betemit | 0.032
Todd Helton | -0.026 | Brandon Moss | 0.032
Jean Segura | -0.026 | Ryan Sweeney | 0.028
Jose Lopez | -0.025 | Mike Trout | 0.027
Mark Teixeira | -0.025 | Peter Bourjos | 0.026
Russell Martin | -0.024 | Matt Carpenter | 0.025
Darwin Barney | -0.023 | Brandon Belt | 0.025
Chris Getz | -0.023 | Melky Cabrera | 0.025
Jimmy Rollins | -0.021 | Carlos Ruiz | 0.024
Jason Bay | -0.020 | Chris Johnson | 0.024
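The post doesn’t spell out how the fixed effects in the table above are estimated. One simple way to get at the same idea, sketched here as an assumption rather than the method actually used, is to average each player’s wOBA - xWOBA residual across seasons, weighting by PA and applying the 1,000 PA floor. The player-seasons below are made up.

```python
from collections import defaultdict


def player_fixed_effects(seasons, min_pa=1000):
    """Estimate each player's fixed effect as his PA-weighted mean residual.

    seasons: iterable of (player, pa, woba, xwoba) tuples, one per season.
    Holding the model's other coefficients fixed, a weighted-least-squares
    player dummy (weights = PA) equals exactly this weighted mean of
    wOBA - xWOBA. Players with fewer than min_pa total PA are dropped.
    """
    pa_total = defaultdict(float)
    weighted_resid = defaultdict(float)
    for player, pa, woba, xwoba in seasons:
        pa_total[player] += pa
        weighted_resid[player] += pa * (woba - xwoba)
    return {
        player: weighted_resid[player] / pa_total[player]
        for player in pa_total
        if pa_total[player] >= min_pa
    }


# Hypothetical player-seasons, not real data.
seasons = [
    ("Player A", 600, 0.330, 0.350),
    ("Player A", 500, 0.320, 0.345),
    ("Player B", 800, 0.360, 0.330),  # only 800 PA total, so excluded by default
]
print(player_fixed_effects(seasons))
```

A full regression that re-estimates the other coefficients jointly with player dummies would give somewhat different numbers; the mean-residual version is just the easiest to reason about.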
Comments:
- Again, .020 wOBA is equivalent to about .050 OPS, on the margin.
- Taking out their apparent fixed effect, Teixeira is only underperforming his xWOBA by about .020, and Brian Roberts is actually doing about par.
- On the reverse side, Mike Trout’s “adjusted” xWOBA jumps up to .408, and it probably doesn’t surprise anyone that he’s outperforming even that, since he’s Mike Trout. And although Giancarlo Stanton misses the Top 10 cutoff above, his apparent fixed effect of +.022 would rank 11th, so his “adjusted” xWOBA is more like .385.
- Yasiel Puig (.058) would also be on the list of “positive fixed effects” if we relaxed the PA requirement (he has 826 PA over that span). And Matt Adams (~.040) may be well on his way to that list, although he has even fewer plate appearances than Puig.
- I don’t have good explanations for, or see any common themes among, the players with negative fixed effects. Maybe readers can help?
- For Trout, home runs are pretty clearly the area where the model underestimates him. In any given season (2010-2014), he hits about twice as many HR as the model thinks he should in the “raw” prediction.
- And Trout’s not the only “HR rate defier,” either, just the most salient one. In general, the model has never done as well with home runs as it does with singles, doubles, and triples. It seems there are other important determinants of home run hitting that really should be in the model, but currently are not. Intuitively, I would like to include the velocity and angle of the ball off the bat, but so far I have not found a good data source for them. (Maybe that will change in the coming years as MLBAM releases “Hit F/X” style data?) Until then, reader suggestions are also super welcome here.
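The “adjusted” numbers in these comments are additive: adjusted xWOBA = xWOBA + fixed effect (Teixeira: .397 - .025 = .372, leaving him .020 under). A minimal sketch reproducing that arithmetic with figures taken from this post’s tables and comments:

```python
# (wOBA, xWOBA, player fixed effect) from this post's tables and comments.
PLAYERS = {
    "Mark Teixeira": (0.352, 0.397, -0.025),
    "Brian Roberts": (0.304, 0.334, -0.033),
    "Mike Trout": (0.428, 0.381, 0.027),
    "Giancarlo Stanton": (0.397, 0.363, 0.022),
}


def adjusted_xwoba(xwoba, fixed_effect):
    """Shift the model's prediction by the player's estimated fixed effect."""
    return xwoba + fixed_effect


for name, (woba, xwoba, fe) in PLAYERS.items():
    adj = adjusted_xwoba(xwoba, fe)
    print(f"{name:18s} adjusted xWOBA {adj:.3f}  gap {woba - adj:+.3f}")
```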
And now, finally, for the other usage: here’s a partial list of players who have taken either a pronounced step forward or back this season, relative to established norms.
2014 “Decliners” | | | | 2014 “Improvers” | | |
Name | Career wOBA | 2014 wOBA | 2014 xWOBA | Name | Career wOBA | 2014 wOBA | 2014 xWOBA
Nick Swisher | 0.352 | 0.285 | 0.305 | Michael Brantley | 0.324 | 0.394 | 0.404
Joe Mauer | 0.373 | 0.308 | 0.340 | Lonnie Chisenhall | 0.328 | 0.396 | 0.359
Allen Craig | 0.350 | 0.289 | 0.309 | Seth Smith* | 0.334 | 0.389 | 0.356
Billy Butler | 0.352 | 0.300 | 0.309 | Victor Martinez | 0.362 | 0.416 | 0.422
Evan Longoria | 0.365 | 0.315 | 0.323 | Jonathan Lucroy | 0.342 | 0.383 | 0.354
Domonic Brown | 0.315 | 0.267 | 0.267 | Anthony Rizzo | 0.342 | 0.382 | 0.382
Chris Davis | 0.351 | 0.306 | 0.353 | Nelson Cruz | 0.356 | 0.393 | 0.380
Matt Holliday* | 0.385 | 0.342 | 0.318 | Jose Altuve | 0.319 | 0.356 | 0.325
Jean Segura | 0.299 | 0.256 | 0.305 | Brian Dozier | 0.311 | 0.344 | 0.362
David Wright | 0.377 | 0.335 | 0.305 | Kyle Seager | 0.334 | 0.367 | 0.344
Buster Posey | 0.366 | 0.326 | 0.352 | Dee Gordon | 0.297 | 0.329 | 0.318
Shin-Soo Choo | 0.369 | 0.333 | 0.346 | Alcides Escobar | 0.284 | 0.312 | 0.300
Dustin Pedroia | 0.356 | 0.325 | 0.337 | Casey McGehee | 0.321 | 0.345 | 0.277
Jed Lowrie | 0.327 | 0.297 | 0.305 | | | |
Jay Bruce | 0.343 | 0.315 | 0.326 | | | |
* To avoid inflation from Coors Field, for these players I’ve used totals from the 2011-13 seasons only.
Comments:
- At least in-sample, Brantley’s breakout seems to be pretty much entirely justified. Of course, this doesn’t mean that he won’t regress somewhat, but if I had to guess, I’m a little more optimistic than ZiPS and Steamer (which currently project .341 and .333 RoS, respectively). The same goes for a few others on the list.
- “Yikes” for Billy Butler and Domonic Brown, whose declines this season seem (at least in-sample) to be entirely justified.
- I’m not sure why the model dislikes Casey McGehee so much. Obviously his fly ball distance (mentioned earlier) isn’t doing him any favors, and his .369 first-half BABIP is probably unsustainable. Still, .277 xWOBA? Seems harsh.
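One way to put a number on “entirely justified” is to ask what fraction of a player’s change from his career wOBA the 2014 xWOBA also shows. That ratio is a hypothetical summary introduced here for illustration, not something the model itself reports; the inputs are copied from the decliners/improvers table above.

```python
# (career wOBA, 2014 wOBA, 2014 xWOBA) copied from the table above.
TRENDS = {
    "Michael Brantley": (0.324, 0.394, 0.404),
    "Lonnie Chisenhall": (0.328, 0.396, 0.359),
    "Billy Butler": (0.352, 0.300, 0.309),
    "Domonic Brown": (0.315, 0.267, 0.267),
    "Chris Davis": (0.351, 0.306, 0.353),
    "Casey McGehee": (0.321, 0.345, 0.277),
}


def justified_share(career, woba_2014, xwoba_2014):
    """Share of the change from career wOBA that xWOBA also shows.

    Hypothetical summary number: about 1.0 means the model fully backs the
    breakout or decline, about 0 means it looks like luck, above 1 means the
    model sees an even bigger change, and below 0 means it points the other way.
    """
    change = woba_2014 - career
    if change == 0:
        raise ValueError("no change from career wOBA to explain")
    return (xwoba_2014 - career) / change


for name, row in TRENDS.items():
    print(f"{name:18s} {justified_share(*row):+.2f}")
```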
As with any fantasy advice, don’t take any of this too literally… Take it or leave it as you see fit.
Lastly, although I hyped this piece from a fantasy perspective, my overall goal remains the same: I would love to see more work done to de-luck hitter stats, the way people so often do for pitchers. (FIP for pitchers, and xWOBA or xWRC+ for hitters: that’s the dream!)
Reader thoughts on how to improve the model, or requests for players not already mentioned?
Sam is an Oakland A's fan and economist who received his Ph.D. from UC San Diego in 2017.
Very nice job.
Heh, and within the same day there’s a piece saying to sell Casey McGehee and Matt Adams, and another saying to buy Seth Smith.
http://www.fangraphs.com/fantasy/the-xbabip-sell-list/
http://www.fangraphs.com/fantasy/seth-smith-is-a-must-own-in-daily-formats/
I’m not entirely sure where Adams’ “true talent” BABIP lies, but I agree with Podhorzer that it’s probably below .360 territory. (Also the Cardinals team offense has been disappointing this year, even if they may be improving lately.) Casey McGehee, meanwhile… seemingly universally hated by the “expectation” models currently.
I’m also probably not as bullish on Seth Smith as Duronio is. It’s true he has played pretty well — especially taking Petco into account, and the plate discipline is encouraging — but this model would say he’s played more like a .356 wOBA than his actual .393. But the Padres team offense is where I’m most skeptical. 3.2 runs per game…
I’m all about xHitting.
This is terrific. I’ve been trying to come up with a WAR calculation that uses fielding-independent batting as a component, somewhat akin to fWAR using FIP rather than ERA for pitchers. A few questions:
1. Do you have a spreadsheet that includes your calculations and source data (not just the results)? I’m having trouble locating all the data that would be needed to replicate this for 2014 for the entire league. I’ve read the 8/8 version of your paper and see the estimated relationships between the variables, so I think I could put together the formula, but I’m not quite sure where to locate LF% and RF%, for example. I know how to get the HR&FB distance from baseballheatmaps.
2. I was trying to think of other variables that might warrant investigation. Infield hits, maybe? There might be a lot of overlap between that and speed score, but I suppose there could be slower players who still have a talent for that. Have you analyzed the population of “fixed effect” players for similarities that might uncover new variables?
3. One interesting new angle on hit quality is Robert Arthur’s work for BP on the analytic value of the crack of the bat. If he can gather some league-wide data, you could have a decent proxy for hit quality to add to your existing variables.
Hey John, thank you so much for the comments. I’ll respond to each point below.
1. LF% and RF% come from individual player pages, within splits. I couldn’t program a parser to do this, so I had to fetch things by hand, meaning there are probably some occasional entry errors. I can send you the spreadsheets, if you’d like.
2. I did think about using infield hits (I know some xBABIP calculators include this). I was somewhat reluctant to include it, since IFH status already conditions on the hit outcome, and in this sense endogenizes some of the “luck” we hope to remove.
So I was reluctant to use it for in-sample purposes. But it might improve out-of-sample performance (forecasting), so I’ll try to take a look.
I haven’t thought as hard about similarities between “fixed effect” players as I’d like. Plate discipline did come to mind, but I’m unsure whether plate discipline stats are appropriate to include. Patient and impatient hitters probably hit the ball differently, but then the key is to include those differences, and not plate discipline itself.
I definitely appreciate suggestions, if other specific variables come to mind.
3. Thank you for directing me to these pieces. Yes, I would love some sort of more detailed measure of hit quality. Velocity off the bat would be nice, if there’s a realistic way to get it. Do you know if there is?
Thank you again! Feel free to e-mail me, too, if that’s more convenient.
As far as I know, ESPN’s Home Run Tracker (hittrackeronline.com) is the only place to find velocity off the bat, but it’s only for homers unless there’s more data I’m not seeing on their site. I imagine that the sample size would be an issue if you just used a player’s HRs, although I’m not sure.
If there’s not a good way to update LF% and RF% through an export from the FG leaderboards (or something comparably simple), I’m probably not going to try to update that info myself. I think I see where you are getting it, presumably the PA total in the Location section under Standard (“as L to Left” etc). Maybe adding split data to custom leaderboards is worth a feature request.
I’ll send you an e-mail to the UCSD address because I’d still be interested in seeing what you have in spreadsheet form.