xHR: A Speedy and Mandatory Revision

The Community Research section of FanGraphs serves as an excellent sounding board for aspiring amateurs (yes, those aspiring to rise to the level of amateur). After someone posts a new statistical model or a detailed analysis of player performance, fellow Community Researchers get a chance to chime in with helpful comments, sometimes leading to revision of previously drawn conclusions. The names that grace the upper sections of the website comment more rarely, but when they do, it always leads to revision.

Last week I published a new iteration of xHR, one that was drawn from xHR/BBE. It used four variables: FBLDEV, wFB/C, SLAVG, and FB%. In my naiveté, I neglected to properly analyze the variables I included in the regression model. As Mike Podhorzer helpfully pointed out, neither wFB/C nor SLAVG quite works as a variable in the proper sense. Because both are heavily results-based and depend on home runs for their values, they skew the math quite a bit when calculating how many home runs a player ought to have hit. It’s helpful to think of it in terms of calculating an xSLG. As Mr. Podhorzer put it, “It’s like coming up with an xSLG that utilizes doubles, triples, and home-run rates! Obviously they are all correlated, because they are part of the equation of SLG.” They make for a sort of statistical circular logic.

For that reason, I came up with a different model, with the same basic objectives and two of the same variables, but getting rid of the improper variables. In this one, I used:

  • AVG FBLDEV – Average fly ball/line-drive exit velocity. The idea is that the higher this value is, the harder the player is hitting the ball, and so he will hit more home runs.
  • AVG FBDST – Average fly-ball distance. It’s rather intuitive because the farther a player hits fly balls, the more likely he is to hit home runs. If anything, like FBLDEV, it’s a clear demonstration of power. Obviously it has a decent correlation with FB%, but it isn’t necessarily tangled up with home-run results.
  • K% – The classic profile of a home-run hitter is one who walks a lot, strikes out quite a bit, and hits balls that leave the yard. I suppose that a common conception is that the harder a player swings, the less control he has.
  • FB% – Fly-ball percentage obviously figures pretty heavily into a power hitter’s profile. It’s awfully difficult to hit a lot of home runs without hitting a plethora of fly balls.

Without further ado, here’s the new xHR:

Note: To be clear, the end goal is not necessarily xHR/BBE, but rather xHR. xHR/BBE is just the best path to xHR because HR/BBE is a rate stat, meaning that it will have a better year-to-year correlation than home runs, which are a counting stat. So if a player gets injured and only plays half a season, his HR/BBE would probably be similar to his career values, but his home-run total would not be. With that in mind, remember that the model was made for HR/BBE, not HR, so you will necessarily have “better” results if you’re looking for xHR/BBE.

Pretty good results, to be sure, even if it’s a bit worse than the prior version. A .7989 R-squared value is nothing to scoff at, especially if you think of it as the model explaining 80% of the variance. Clearly it still underestimates the better hitters, and that’s an issue, but there are really so few data points at the top that it’s hard to take it completely seriously up there. If there were a lot more data and it still did that, then I’d be inclined to either add a handicap or to think it ought to be a quadratic regression.
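For readers who want to check an R-squared figure like this one against their own data, here is a minimal sketch of the standard calculation (one minus the ratio of residual variation to total variation). The function name `r_squared` and its arguments are my own illustrative choices, and no real player data is included; you would pass in actual and predicted HR/BBE values yourself.

```python
def r_squared(actual, predicted):
    """Coefficient of determination for paired actual/predicted values.

    Illustrative helper, not part of the original model: `actual` and
    `predicted` are equal-length sequences of numbers (for example,
    observed HR/BBE and model xHR/BBE for the same players).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    mean_actual = sum(actual) / len(actual)
    # Variation the model fails to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation around the mean of the actual values.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

A value near 1 means the predictions track the actual values closely, while a value near 0 means the model does no better than guessing the average for everyone.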

As always, the formula:

xHR = (.170102188*FB% - .014640853*K% + .0000269758*AVGDST + .005672306*FBLDEV - .541845681)*BBE
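If you would rather not punch that into a calculator, the formula can be sketched in a few lines of Python. The coefficients are copied from the formula above; everything else is my assumption. In particular, the function and parameter names are made up for illustration, AVGDST is taken to be the average fly-ball distance in feet, FBLDEV is taken to be in miles per hour, and FB% and K% are assumed to be entered as decimals (0.40 rather than 40), since that is the scale on which the coefficients appear to give sensible home-run rates.

```python
def xhr_per_bbe(fb_pct, k_pct, avg_fb_dist, fbld_ev):
    """Expected home runs per batted-ball event, per the formula above.

    Assumptions (not stated in the article): fb_pct and k_pct are
    decimals (0.40 means 40%), avg_fb_dist is in feet, and fbld_ev is
    in miles per hour.
    """
    return (0.170102188 * fb_pct
            - 0.014640853 * k_pct
            + 0.0000269758 * avg_fb_dist
            + 0.005672306 * fbld_ev
            - 0.541845681)


def xhr(fb_pct, k_pct, avg_fb_dist, fbld_ev, bbe):
    """Expected home runs: the per-BBE rate scaled by batted-ball events."""
    return xhr_per_bbe(fb_pct, k_pct, avg_fb_dist, fbld_ev) * bbe


if __name__ == "__main__":
    # Made-up inputs for a hypothetical hitter, not a real player's line.
    print(round(xhr(0.40, 0.20, 320.0, 95.0, 400), 1))
```

Keeping the rate and the counting total in separate functions mirrors the point of the note earlier: the regression was fit on HR/BBE, and BBE only comes in at the end as a multiplier.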


Even more than the previous version, this model is easily accessible to all fans because the variables are comprehensible. Moreover, it isn’t terribly difficult to head over to Statcast or Baseball Savant to obtain the relevant information and make the calculation. Anyway, I hope you enjoy and use this information to the fullest extent.

A busy person, but one who spends his free time in front of a computer screen, fiddling with statistics. And yes, that describes everyone who regularly visits this website.

Francis C.

I wonder if maybe Pull% could be factored into the equation.