xHitting (Part 4): 2014 Fantasy Edition!

Welcome to the fourth installment of xHitting!  As always, reader comments and feedback are super encouraged and appreciated.  (Links to parts one, two, and three)

To recap the method briefly: the idea is to estimate the expected rate of each individual hit type from a player’s underlying peripherals, and from those rates recover all the components needed to compute expected versions of wOBA, OPS, etc.  The only real change to the model since last time is that I now use a “hybrid” predicted home run rate, which is a weighted average of the actual and (raw) predicted home run rates, with the weight on the actual HR rate increasing in the number of plate appearances.  (This is explained in part three, for those curious.)
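To make the hybrid idea concrete, here is a minimal Python sketch.  The actual weighting function is the one described in part three and is not reproduced here; the shrinkage form w = PA / (PA + k), the constant k = 300, and the function name are all illustrative assumptions.

```python
def hybrid_hr_rate(actual_rate, predicted_rate, pa, k=300.0):
    """Blend actual and raw predicted HR rates (both per PA).

    ASSUMPTION: the weight on the actual rate is w = pa / (pa + k), so it
    rises toward 1 as plate appearances accumulate. Both this form and
    k = 300 are hypothetical stand-ins, not the model's real weighting.
    """
    if pa < 0 or k <= 0:
        raise ValueError("pa must be >= 0 and k must be > 0")
    w = pa / (pa + k)  # 0 with no PA, approaches 1 for large PA
    return w * actual_rate + (1.0 - w) * predicted_rate
```

With few plate appearances the result stays close to the raw prediction, and with many it converges on what the hitter has actually done, which is the behavior described above.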

Perhaps the more exciting change, though, is that this time I actually have results for an ongoing season, which potentially can help for fantasy purposes.  (Not that most readers need my help necessarily.)  Related to fantasy usage, there were a few requests to see a full spreadsheet of past results (2010-2013 seasons), which I have posted here.  Again feel free to take it or leave it at your leisure.

Note: I collected most of these data at the All-Star Break, so the numbers may be a few weeks behind, but they should still be mostly accurate.  Also, in the interest of time I only fetched 2014 stats for qualified hitters.  This leaves out even a few big names, but I couldn’t justify the time to fetch every player.

So far, I’ve typically posted the biggest “over-” and “underachievers” for a given season, and I suppose I’ll continue that tradition today.  But while these lists are useful for highlighting which players seem most likely to regress, they overlook another main use of the model, which is to assess the realness of a player’s apparent “breakout” or “decline,” at least in-sample.  (In some cases the model may think that a player’s breakout is entirely justified given his peripherals, while in others it may be more skeptical.)  Thus, today I’ll also post a second list, of players who seem to have taken a pronounced step forward or back this season, along with what the model thinks of their season-to-date performance.

Okay, time for results!  I’ll start with the list of “over-” and “underachievers.”

2014 Underachievers (1st half)         2014 Overachievers (1st half)
Name           wOBA   xWOBA  Diff      Name               wOBA   xWOBA  Diff
Jean Segura    0.256  0.305  -0.049    Casey McGehee      0.345  0.277  0.068
Chris Davis    0.306  0.353  -0.047    Yasiel Puig        0.398  0.340  0.058
Mark Teixeira  0.352  0.397  -0.045    Matt Adams         0.376  0.324  0.052
Gerardo Parra  0.289  0.327  -0.038    Mike Trout         0.428  0.381  0.047
Brian McCann   0.298  0.330  -0.032    Marcell Ozuna      0.343  0.300  0.043
Torii Hunter   0.323  0.355  -0.032    Lonnie Chisenhall  0.396  0.359  0.037
Joe Mauer      0.308  0.340  -0.032    Scooter Gennett    0.355  0.320  0.035
Jimmy Rollins  0.320  0.352  -0.032    Marlon Byrd        0.344  0.309  0.035
Brian Roberts  0.304  0.334  -0.030    Giancarlo Stanton  0.397  0.363  0.034
Buster Posey   0.326  0.352  -0.026    Hunter Pence       0.359  0.325  0.034

Having worked with this model for a while now, I notice a general pattern: some players give the model trouble and show up on these lists disproportionately often from year to year.  A few of them appear above… more on that later.

Partly for that reason, I wouldn’t necessarily say to “buy low” the guys on the left, nor “sell high” the guys on the right; although you can if you want.  I won’t address every player, but I have some scattered comments:

  • For readers who prefer OPS, .020 wOBA translates to about .050 OPS, on the margin.
  • .397 predicted for Teixeira?  Not sure where that came from…
  • Poor Segura.  All things considered, I think nobody deserves a big second half more than he does.
  • Whatever happened to Casey McGehee’s power?  The guy once hit 23 home runs in a season, but now has an ISO of .073, with surprisingly low fly ball distance.
  • Although Chisenhall’s breakout is not as impressive if you take out what the model thinks is luck, it’s still a pretty impressive improvement.
  • Chris Davis is sort of the reverse of Chisenhall.  Adding back in what the model thinks has been bad luck, he’s still way down from what he did last year, but not nearly as disappointing as he probably has been to many owners thus far.
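For anyone who wants to redo the arithmetic, here is a small Python sketch of the Diff column and the wOBA-to-OPS rule of thumb from the first bullet.  The function and constant names are my own, and the 2.5 multiplier is just the .050/.020 ratio quoted above, so treat the OPS figures as rough.

```python
# Rule of thumb from the text: .020 wOBA is about .050 OPS on the margin.
OPS_PER_WOBA = 2.5

# (name, wOBA, xWOBA) copied from the first-half table above.
FIRST_HALF = [
    ("Jean Segura", 0.256, 0.305),
    ("Chris Davis", 0.306, 0.353),
    ("Casey McGehee", 0.345, 0.277),
    ("Lonnie Chisenhall", 0.396, 0.359),
]

def woba_gap(woba, xwoba):
    """Actual minus expected wOBA; negative means underachieving."""
    return round(woba - xwoba, 3)

def gap_as_ops(gap):
    """Approximate OPS equivalent of a wOBA gap (marginal rule of thumb)."""
    return gap * OPS_PER_WOBA

for name, woba, xwoba in FIRST_HALF:
    gap = woba_gap(woba, xwoba)
    print(f"{name:<18} {gap:+.3f} wOBA  ~ {gap_as_ops(gap):+.3f} OPS")
```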

As mentioned, certain players do seem able to over- or underperform the model somewhat consistently, much as we think some pitchers are usually better or worse than their FIP.  Now that I have 4.5 years of data to work with, I think I can make educated guesses about which players systematically deviate from the model’s predictions.  I’ll term this deviation the “player fixed effect.”

(Requiring at least 1000 PA from 2010 through 2014 first half)

Model loves too much                         Model loves too little
Name            Player FE estimate (wOBA)    Name            Player FE estimate (wOBA)
Brian Roberts   -0.033                       Wilson Betemit  0.032
Todd Helton     -0.026                       Brandon Moss    0.032
Jean Segura     -0.026                       Ryan Sweeney    0.028
Jose Lopez      -0.025                       Mike Trout      0.027
Mark Teixeira   -0.025                       Peter Bourjos   0.026
Russell Martin  -0.024                       Matt Carpenter  0.025
Darwin Barney   -0.023                       Brandon Belt    0.025
Chris Getz      -0.023                       Melky Cabrera   0.025
Jimmy Rollins   -0.021                       Carlos Ruiz     0.024
Jason Bay       -0.020                       Chris Johnson   0.024

Comments:

  • Again, .020 wOBA is equivalent to about .050 OPS, on the margin.
  • Taking out their apparent fixed effect, Teixeira is only underperforming his xWOBA by about .020, and Brian Roberts is actually doing about par.
  • On the reverse side, Mike Trout’s “adjusted” xWOBA jumps up to .408, and it probably doesn’t surprise us that he’s outperforming even that, since he’s Mike Trout.  And although Giancarlo Stanton misses the Top 10 cutoff above, his apparent fixed effect of +.022 would rank 11th, so his “adjusted” xWOBA is more like .385.
  • Yasiel Puig (.058) would also be on the list of “positive fixed effects” if we relaxed the PA requirement (he has 826 during this time).  And Matt Adams (~.040) might also be well on his way to that list, although he has even fewer plate appearances than Puig.
  • I don’t really have a good explanation for, or know of any common theme among, the players with negative fixed effects.  Maybe readers can help?
  • For Trout, home runs are pretty clearly the area where the model underestimates him.  In any given season (2010-2014), he hits about twice as many HR as the model thinks he should in the “raw” prediction.
  • And Trout’s not the only “HR rate defier,” either; just the most salient.  In general, the model has never done as well with home runs as it does with singles, doubles, and triples.  It seems there are other important determinants of home run hitting that really should be in the model, but currently are not.  Intuitively, I sort of would like velocity and angle of the ball off the bat, but so far have not found a good data source to actually include these.  (Maybe that will change in the coming years as MLBAM releases “Hit F/X” style data?)  Until then, reader suggestions are also super welcome here.
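The fixed-effect adjustments quoted in the comments above are simple arithmetic, sketched here in Python.  The function names are hypothetical, and the sketch only applies the published fixed-effect estimates; it does not show how those estimates were fit.

```python
def adjusted_xwoba(xwoba, fixed_effect):
    """Shift the model's xWOBA by the player's estimated fixed effect."""
    return round(xwoba + fixed_effect, 3)

def gap_net_of_fixed_effect(woba, xwoba, fixed_effect):
    """Over/underperformance left over after removing the fixed effect."""
    return round((woba - xwoba) - fixed_effect, 3)

# Inputs are taken from the two tables and the comments above.
print(f"Trout adjusted xWOBA:   {adjusted_xwoba(0.381, 0.027):.3f}")
print(f"Stanton adjusted xWOBA: {adjusted_xwoba(0.363, 0.022):.3f}")
print(f"Teixeira net gap:       {gap_net_of_fixed_effect(0.352, 0.397, -0.025):+.3f}")
print(f"Roberts net gap:        {gap_net_of_fixed_effect(0.304, 0.334, -0.033):+.3f}")
```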

And now, finally, for the other usage: here’s a partial list of players who have taken either a pronounced step forward or back this season, relative to established norms.

2014 “Decliners”                                      2014 “Improvers”
Name            Career wOBA  2014 wOBA  2014 xWOBA    Name               Career wOBA  2014 wOBA  2014 xWOBA
Nick Swisher    0.352        0.285      0.305         Michael Brantley   0.324        0.394      0.404
Joe Mauer       0.373        0.308      0.340         Lonnie Chisenhall  0.328        0.396      0.359
Allen Craig     0.350        0.289      0.309         Seth Smith*        0.334        0.389      0.356
Billy Butler    0.352        0.300      0.309         Victor Martinez    0.362        0.416      0.422
Evan Longoria   0.365        0.315      0.323         Jonathan Lucroy    0.342        0.383      0.354
Domonic Brown   0.315        0.267      0.267         Anthony Rizzo      0.342        0.382      0.382
Chris Davis     0.351        0.306      0.353         Nelson Cruz        0.356        0.393      0.380
Matt Holliday*  0.385        0.342      0.318         Jose Altuve        0.319        0.356      0.325
Jean Segura     0.299        0.256      0.305         Brian Dozier       0.311        0.344      0.362
David Wright    0.377        0.335      0.305         Kyle Seager        0.334        0.367      0.344
Buster Posey    0.366        0.326      0.352         Dee Gordon         0.297        0.329      0.318
Shin-Soo Choo   0.369        0.333      0.346         Alcides Escobar    0.284        0.312      0.300
Dustin Pedroia  0.356        0.325      0.337         Casey McGehee      0.321        0.345      0.277
Jed Lowrie      0.327        0.297      0.305
Jay Bruce       0.343        0.315      0.326

* – To avoid inflation from Coors Field, for these players the Career wOBA column uses the total from the 2011-13 seasons only.

Comments:

  • At least in-sample, Brantley’s breakout seems to be pretty much entirely justified.  Of course this doesn’t mean that he won’t regress somewhat, but if I were to guess, I’m a little more optimistic than ZiPS and Steamer (which currently project .341 and .333 RoS, respectively).  Similar deal for some others.
  • “Yikes” for Billy Butler and Domonic Brown, whose declines this season seem (at least in-sample) to be entirely justified.
  • I’m not sure why the model dislikes Casey McGehee so much.  Obviously his fly ball distance (mentioned earlier) isn’t doing him any favors, and his .369 first-half BABIP is probably unsustainable.  Still, .277 xWOBA?  Seems harsh.
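One rough way to summarize how “justified” a breakout or decline looks is to ask what share of the move away from career wOBA also shows up in xWOBA.  The ratio below is only an illustrative summary of the table and is not part of the model; the function name is hypothetical.

```python
def supported_share(career, woba, xwoba):
    """Share of the change from career wOBA that xWOBA also shows.

    ASSUMPTION: this ratio is an illustrative summary, not part of the model.
    Near 1 means the change is fully backed by peripherals, near 0 means
    not at all, and a negative value means xWOBA moved the opposite way.
    """
    change = woba - career
    if change == 0:
        raise ValueError("no change from career wOBA")
    return (xwoba - career) / change

# (name, career wOBA, 2014 wOBA, 2014 xWOBA) from the table above.
for name, career, woba, xwoba in [
    ("Michael Brantley", 0.324, 0.394, 0.404),
    ("Billy Butler", 0.352, 0.300, 0.309),
    ("Domonic Brown", 0.315, 0.267, 0.267),
    ("Casey McGehee", 0.321, 0.345, 0.277),
]:
    print(f"{name:<17} {supported_share(career, woba, xwoba):+.2f}")
```

Brantley and Brown come out at or above 1 (fully supported), Butler a bit below 1, and McGehee negative, which matches the comments above.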

As with any fantasy advice, don’t take any of this too literally…  Take it or leave it as you see fit.

Lastly, although I hyped this piece from a fantasy perspective, the overall goal remains the same: I would love to see more work done to de-luck hitter stats, the way people so often do for pitchers.  (FIP for pitchers, and xWOBA or xWRC+ for hitters: that is the dream.)

Reader thoughts on how to improve the model, or requests for players not already mentioned?





Sam is an Oakland A's fan and economist who received his Ph.D. from UC San Diego in 2017.

6 Comments
Jim S.
9 years ago

Very nice job.

Chuck Knoblauch
9 years ago

I’m all about xHitting.

John Wright
9 years ago

This is terrific. I’ve been trying to come up with a WAR calculation that uses fielding-independent batting as a component, somewhat akin to fWAR using FIP rather than ERA for pitchers. A few questions:

1. Do you have a spreadsheet that includes your calculations and source data (not just the results)? I’m having trouble locating all the data that would be needed to replicate this for 2014 for the entire league. I’ve read the 8/8 version of your paper and see the estimated relationships between the variables, so I think I could put together the formula, but I’m not quite sure where to locate LF% and RF%, for example. I know how to get the HR&FB distance from baseballheatmaps.

2. I was trying to think of other variables that might warrant investigation. Infield hits, maybe? There might be a lot of overlap between that and speed score, but I suppose there could be slower players who still have a talent for that. Have you analyzed the population of “fixed effect” players for similarities that might uncover new variables?

3. One interesting new angle on hit quality is Robert Arthur’s work for BP on the analytic value of the crack of the bat. If he can gather some league-wide data, you could have a decent proxy for hit quality to add to your existing variables.

John Wright
9 years ago
Reply to  samyoung

As far as I know, ESPN’s Home Run Tracker (hittrackeronline.com) is the only place to find velocity off the bat, but it’s only for homers unless there’s more data I’m not seeing on their site. I imagine that the sample size would be an issue if you just used a player’s HRs, although I’m not sure.

If there’s not a good way to update LF% and RF% through an export from the FG leaderboards (or something comparably simple), I’m probably not going to try to update that info myself. I think I see where you are getting it, presumably the PA total in the Location section under Standard (“as L to Left” etc). Maybe adding split data to custom leaderboards is worth a feature request.

I’ll send you an e-mail to the UCSD address because I’d still be interested in seeing what you have in spreadsheet form.