RotoGraphs, along with several Community writers, has been posting about an “x” category of metrics for quite some time. These include Andrew Dominijanni’s xISO, Andrew Perpetua’s xBABIP, and more. The clear purpose of developing these statistical indicators is to measure and predict fantasy-baseball success, something we all aspire to in our hopefully low-priced leagues. (You have probably found, though, that leaning on x-stats is a lot like overstudying for a test: the extra effort yields diminishing returns, and you end up having “over-Xed” the players.)
One of the most prominent of the x-stats trotted out at the beginning of every season is xHR/FB, developed by Mike Podhorzer, and always accompanied by an amusing “leaders and laggards” piece. His version of xHR/FB is quite good, with a .649 R-squared value. In his regression analysis, Mr. Podhorzer utilizes somewhat exclusive metrics (hopefully public at some point), such as average absolute angle. Overall, it’s a pretty good predictor, and it becomes doubly understandable to the layman when it gets multiplied by fly balls to produce an expected home-run value.
The only real issue I have with HR/FB (and its prediction) is that it is HR/FB. While it is more stable for hitters than for pitchers, it still isn’t as stable as I’d like a fantasy-baseball stat to be. For my sample of 1,000 player-seasons from 2009-2015, HR/FB had a year-to-year R-squared value of .49. It isn’t terribly difficult to figure out why: weather changes, team changes, opponent changes, player development, and more all play a part. Moreover, it doesn’t paint a very good picture of a hitter’s overall profile because it only looks at how many home runs a player hits per fly ball. A player might have a high HR/FB, but he may not hit enough fly balls for the metric to accurately describe his power (i.e. whether he actually hit a lot of home runs). That said, it’s important to note that a high HR/FB generally goes with a higher FB%.
Perhaps a better metric for evaluating a player in the greater context of his hitting profile is HR/BBE. Home runs per batted-ball event is just HR/(AB+SF+SH-SO). It has a slightly higher year-to-year R-squared of .56 (from my sample), in large part because it takes into account more variables than does HR/FB. Under the umbrella of BBE fall not only fly balls, but line drives (and there can be line-drive home runs), and ground balls. In case you’re wondering why I included sacrifice hits, it’s because they tell a little bit about what kind of hitter a player is. Most modern managers are far more likely to ask a Ben Revere to lay down a sacrifice bunt than they are a Kris Bryant.
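To make the HR/BBE definition concrete, here is a minimal Python sketch of the formula above. The function name and the stat line are made up for illustration and do not describe any real player.

```python
def hr_per_bbe(hr, ab, sf, sh, so):
    """Home runs per batted-ball event, with BBE = AB + SF + SH - SO."""
    bbe = ab + sf + sh - so
    if bbe <= 0:
        raise ValueError("player has no batted-ball events")
    return hr / bbe


# Hypothetical stat line (not a real player): 400 batted-ball events
rate = hr_per_bbe(hr=30, ab=550, sf=5, sh=0, so=155)
print(round(rate, 3))  # 0.075
```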
And so I thought it might be useful to run a linear regression analysis to develop an xHR/BBE (and from there, xHR). I’m a statistical autodidact, so I tried to keep things simple. Additionally, I thought it would be best if I utilized accessible variables like FB% so that a moderately literate sabermetrician could use it. After testing myriad variables, I came up with four that I’d use — average FBLDEV (Statcast), wFB/C, SLAVG, and FB%.
- AVG FBLDEV – Average fly ball/line-drive exit velocity. The idea is that the higher this value is, the harder the player is hitting the ball, and so he will hit more home runs.
- wFB/C – A rather obscure metric buried in the FanGraphs glossary, wFB/C is weighted fastball run values per 100 pitches. I use it because most home runs come off some form of a fastball, and home-run-hitter types are typically good fastball hitters.
- SLAVG – “Slap” average, a metric of my own invention (although someone else has probably thought of it – I just haven’t seen it before), is singles divided by at-bats. It’s a bit like ISO in that it tells you about a player’s power distribution (or lack thereof). I figure that this is inversely correlated with power because the more singles a player hits, the fewer home runs he’s likely to hit.
- FB% – Fly ball percentage obviously figures pretty heavily into a power hitter’s profile. It’s awfully difficult to hit a lot of home runs without hitting a plethora of fly balls.
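Of the four inputs, SLAVG is the only one that has to be computed by hand. A small Python sketch, assuming singles are derived the usual way as hits minus doubles, triples, and home runs (the function name and the numbers are illustrative only):

```python
def slavg(hits, doubles, triples, home_runs, at_bats):
    """'Slap' average: singles divided by at-bats."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    singles = hits - doubles - triples - home_runs
    return singles / at_bats


# Hypothetical stat line (not a real player): 90 singles in 550 at-bats
print(round(slavg(hits=150, doubles=30, triples=2, home_runs=28, at_bats=550), 3))  # 0.164
```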
It seems like a decent list of predictors in that they are understandable and accessible to the average fan, in addition to having a good relation to home-run hitters. I used all players that had at least 100 batted-ball events in 2015 and 2016 (Statcast only has data going back to 2015), which turns out to be close to 500 player-seasons. So let’s throw them into the Microsoft Excel Regression grinder and see what it spits out:
Note: To be clear, the end goal is not necessarily xHR/BBE, but rather xHR. xHR/BBE is just the best path to xHR because HR/BBE is a rate stat, so it will have a better year-to-year correlation than home runs, which are a counting stat. If a player gets injured and only plays half a season, his HR/BBE would probably be similar to his career values, but his home-run total would not be.
The primary thing to recognize here is the R-squared value: a pretty good .78272. For the uninitiated, this simply means that the model explains about 78% of the variance in home-run output. If you’re interested (and you really ought to be), here are the coefficients for the variables and the overall formula:
xHR = (0.114557524*FB% - 0.183885205*SLAVG + 0.006658976*wFB/C + 0.004075449*FBLDEV - 0.343193723) * BBE
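The formula can be sketched as a short Python function. The coefficients are copied from the formula above; the units are an assumption, since the text does not state them: FB% is entered as a decimal fraction (0.40 rather than 40) and FBLDEV in miles per hour, which is the scale on which the output looks like a plausible HR/BBE. The function names and the sample inputs are hypothetical.

```python
# Coefficients copied from the xHR formula in the article
COEF_FB_PCT = 0.114557524
COEF_SLAVG = -0.183885205
COEF_WFB_C = 0.006658976
COEF_FBLDEV = 0.004075449
INTERCEPT = -0.343193723


def x_hr_per_bbe(fb_pct, slavg, wfb_c, fbldev):
    """Expected HR/BBE. Assumes fb_pct is a fraction (0.40) and fbldev is in mph."""
    return (COEF_FB_PCT * fb_pct
            + COEF_SLAVG * slavg
            + COEF_WFB_C * wfb_c
            + COEF_FBLDEV * fbldev
            + INTERCEPT)


def x_hr(fb_pct, slavg, wfb_c, fbldev, bbe):
    """Expected home runs: the expected rate times batted-ball events."""
    return x_hr_per_bbe(fb_pct, slavg, wfb_c, fbldev) * bbe


# Made-up inputs, not a real player
print(round(x_hr(fb_pct=0.40, slavg=0.15, wfb_c=1.0, fbldev=93.0, bbe=400), 1))  # 24.3
```

Because the model is linear, nothing stops it from returning a negative rate for extreme inputs, so a real tool would probably want to floor the result at zero.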
With this information, it isn’t terribly difficult to look up a few pieces of data on FanGraphs and Statcast to see how many home runs a player “should” have hit. In case you’re wondering about its predictive value relative to that of HR/BBE, xHR/BBE has an R-squared value of .61, compared with .56 for HR/BBE. Nevertheless, it’s important to note that, based on the graph, the model struggles to predict home-run numbers for the players on the extremes, the Jose Bautistas of the world. Because the linear regression tends to underestimate rather than overestimate at the top, it’s likely that a quadratic regression would fit better. It’s something to look into, but this’ll do for now. Moreover, while there are some really crazy outliers, like Jose Bautista being predicted to hit 12 fewer home runs (Steamer does have him on pace for only 26 this year!), the model does work reasonably well for more average players.
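For anyone who wants to check R-squared figures like the ones quoted here without Excel, the sketch below computes the squared Pearson correlation between two series, for example one season’s xHR/BBE against the next season’s HR/BBE. For a linear regression with an intercept, this is the same number as the R-squared of the fit when applied to actual versus fitted values. The function name and the toy numbers are illustrative, not data from the article’s sample.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with no variance has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Toy numbers only, chosen so the answer is easy to verify by hand
print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.64
```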
Keep in mind that numerous improvements will be made. If anyone wants access to data or has a question, then just let me know. If not, then enjoy the tool and use it for fantasy, even though it’s getting a bit late for that. Maybe next year.
A busy person, but one who spends his free time in front of a computer screen, fiddling with statistics. And yes, that describes everyone who regularly visits this website.