Fantasy Metrics and xHR

August 5, 2016

RotoGraphs and several Community writers have been posting about an “x” category of metrics for quite some time, including Andrew Dominijanni’s xISO, Andrew Perpetua’s xBABIP, and more. The clear purpose of these statistical indicators is to measure and predict fantasy-baseball success, something we all aspire to in our hopefully low-priced leagues (although you may have found that leaning on x-stats is a lot like overstudying for a test: the extra effort yields diminishing returns, and you end up “over-Xing” the players).

One of the most prominent of the x-stats trotted out at the beginning of every season is xHR/FB, developed by Mike Podhorzer, and always accompanied by an amusing “leaders and laggards” piece. His version of xHR/FB is quite good, with a .649 R-squared value. In his regression analysis, Mr. Podhorzer utilizes somewhat exclusive metrics (hopefully public at some point), such as average absolute angle. Overall, it’s a pretty good predictor, and it becomes doubly understandable to the layman when it gets multiplied by fly balls to produce an expected home-run value.

The only real issue I have with HR/FB (and its prediction) is that it is HR/FB. While it is more stable for hitters than for pitchers, it still isn’t quite as stable as I’d like a fantasy-baseball stat to be. In my sample of 1,000 player-seasons from 2009-2015, HR/FB had a year-to-year R-squared of .49. It isn’t terribly difficult to figure out why: weather changes, team changes, opponent changes, player development, and more all play a part. Moreover, it doesn’t paint a very good picture of a hitter’s overall profile because it only looks at how many home runs a player hits per fly ball. A player might have a high HR/FB, but he may not hit enough fly balls for the metric to accurately describe his power (i.e. whether he actually hit a lot of home runs). On the other hand, it’s important to note that a high HR/FB generally goes with a higher FB%.
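For anyone curious how a year-to-year R-squared like that can be computed, here is a minimal Python sketch. It assumes the data sits in a dictionary keyed by (player, season), which is my own hypothetical layout, and it uses the squared Pearson correlation between a season and the one that follows it. The numbers in the example are made up, not real player data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


def year_to_year_r2(rates):
    """rates maps (player, season) -> rate stat; pairs season N with N+1."""
    this_year, next_year = [], []
    for (player, season), value in rates.items():
        following = rates.get((player, season + 1))
        if following is not None:
            this_year.append(value)
            next_year.append(following)
    return r_squared(this_year, next_year)


# Made-up HR/FB rates, purely for illustration.
toy_rates = {
    ("A", 2014): 0.10, ("A", 2015): 0.14,
    ("B", 2014): 0.15, ("B", 2015): 0.13,
    ("C", 2014): 0.20, ("C", 2015): 0.19,
}
print(year_to_year_r2(toy_rates))
```

Only players with back-to-back seasons in the dictionary contribute a pair, which is one reasonable way to build the sample but not necessarily how the .49 figure was produced.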

Perhaps a better metric for evaluating a player in the greater context of his hitting profile is HR/BBE. Home runs per batted-ball event is just HR/(AB+SF+SH-SO). It has a slightly higher year-to-year R-squared of .56 (from my sample), in large part because it covers more kinds of contact than HR/FB does. Under the umbrella of BBE fall not only fly balls but also line drives (and there can be line-drive home runs) and ground balls. In case you’re wondering why I included sacrifice hits, it’s because they tell a little bit about what kind of hitter a player is. Most modern managers are far more likely to ask a Ben Revere to lay down a sacrifice bunt than they are a Kris Bryant.
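The HR/BBE definition above translates directly into a few lines of Python. This is just a sketch; the function names and the stat line in the example are hypothetical, not pulled from any real player.

```python
def batted_ball_events(ab, sf, sh, so):
    """BBE as defined above: AB + SF + SH - SO."""
    return ab + sf + sh - so


def hr_per_bbe(hr, ab, sf, sh, so):
    """Home runs per batted-ball event."""
    bbe = batted_ball_events(ab, sf, sh, so)
    if bbe <= 0:
        raise ValueError("no batted-ball events to divide by")
    return hr / bbe


# Made-up stat line: 30 HR, 550 AB, 5 SF, 0 SH, 155 SO -> 400 BBE.
print(hr_per_bbe(hr=30, ab=550, sf=5, sh=0, so=155))
```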

And so I thought it might be useful to run a linear regression analysis to develop an xHR/BBE (and from there, xHR). I’m a statistical autodidact, so I tried to keep things simple. Additionally, I thought it would be best if I utilized accessible variables like FB% so that a moderately literate sabermetrician could use it. After testing myriad variables, I came up with four that I’d use — average FBLDEV (Statcast), wFB/C, SLAVG, and FB%.

AVG FBLDEV – Average fly ball/line-drive exit velocity. The idea is that the higher this value is, the harder the player is hitting the ball, and so he will hit more home runs.
wFB/C – A rather obscure metric buried in the FanGraphs glossary, wFB/C is weighted fastball run values per 100 pitches. I use it because most home runs come off some form of a fastball, and home-run-hitter types are typically good fastball hitters.
SLAVG – “Slap” average, a metric of my own invention (although someone else has probably thought of it – I just haven’t seen it before), is singles divided by at-bats. It’s a bit like ISO in that it tells you about a player’s power distribution (or lack thereof). I figure that this is inversely correlated with power because the more singles a player hits, the fewer home runs he’s likely to hit.
FB% – Fly ball percentage obviously figures pretty heavily into a power hitter’s profile. It’s awfully difficult to hit a lot of home runs without hitting a plethora of fly balls.
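Of the four predictors, SLAVG is the only one you can’t look up directly, so here is a small Python sketch of it. It assumes singles are derived the usual way, as hits minus doubles, triples, and home runs; the function name and the example stat line are hypothetical.

```python
def slavg(hits, doubles, triples, hr, ab):
    """'Slap' average: singles divided by at-bats."""
    if ab <= 0:
        raise ValueError("at-bats must be positive")
    singles = hits - doubles - triples - hr
    return singles / ab


# Made-up stat line: 150 H, 30 2B, 2 3B, 25 HR in 550 AB -> 93 singles.
print(round(slavg(hits=150, doubles=30, triples=2, hr=25, ab=550), 3))
```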

It seems like a decent list of predictors in that they are understandable and accessible to the average fan, in addition to having a good relation to home-run hitters. I used all players that had at least 100 batted-ball events in 2015 and 2016 (Statcast only has data going back to 2015), which turns out to be close to 500 player-seasons. So let’s throw them into the Microsoft Excel Regression grinder and see what it spits out:

Note: To be clear, the end goal is not necessarily xHR/BBE, but rather xHR. xHR/BBE is just the best path to xHR because HR/BBE is a rate stat, so it will have a better year-to-year correlation than raw home runs, which are a counting stat. So if a player gets injured and only plays half a season, his HR/BBE would probably be similar to his career values, but his home-run total would not be.

The primary thing to recognize here is the R-squared value: a pretty good .78272. For the uninitiated, this simply means that the model explains about 78% of the variance in HR/BBE. If you’re interested (and you really ought to be), here are the coefficients for the variables and the overall formula:

xHR = (0.114557524*FB% - 0.183885205*SLAVG + 0.006658976*wFB/C + 0.004075449*FBLDEV - 0.343193723) * BBE
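If you’d rather not do that arithmetic by hand, the formula can be sketched in Python. Two things here are my assumptions rather than anything stated above: FB% is entered as a decimal fraction (0.40 rather than 40), which is what the size of the coefficients suggests, and FBLDEV is in miles per hour. The example hitter is entirely made up.

```python
# Coefficients copied from the formula above.
COEF_FB_PCT = 0.114557524
COEF_SLAVG = -0.183885205
COEF_WFB_C = 0.006658976
COEF_FBLDEV = 0.004075449
INTERCEPT = -0.343193723


def expected_hr_per_bbe(fb_pct, slavg, wfb_c, fbldev):
    """xHR/BBE. Assumes fb_pct is a fraction (0.40) and fbldev is in mph."""
    return (
        COEF_FB_PCT * fb_pct
        + COEF_SLAVG * slavg
        + COEF_WFB_C * wfb_c
        + COEF_FBLDEV * fbldev
        + INTERCEPT
    )


def expected_hr(fb_pct, slavg, wfb_c, fbldev, bbe):
    """xHR: the expected rate scaled by batted-ball events."""
    return expected_hr_per_bbe(fb_pct, slavg, wfb_c, fbldev) * bbe


# Made-up hitter: 40% fly balls, .150 SLAVG, 1.0 wFB/C, 93 mph FBLDEV, 400 BBE.
print(round(expected_hr(fb_pct=0.40, slavg=0.15, wfb_c=1.0, fbldev=93.0, bbe=400), 1))
```

For that hypothetical hitter the formula works out to roughly 24 home runs, which at least looks like a sensible total for those inputs.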

With this information, it isn’t terribly difficult to look up a few pieces of data on FanGraphs and Statcast to see how many home runs a player “should” have hit. In case you’re wondering about its predictive value relative to that of HR/BBE, xHR/BBE has an R-squared value that’s about five points higher (.61 versus .56). Nevertheless, it’s important to note that, based on the graph, the model struggles to predict home-run numbers for the players on the extremes – the Jose Bautistas of the world. Because the linear regression tends to underestimate rather than overestimate at the top, it’s likely that a quadratic regression would fit better. It’s something to look into, but this’ll do for now. Moreover, while there are some really crazy outliers, like Jose Bautista being predicted to hit 12 fewer home runs than he actually did (Steamer does have him on pace for only 26 this year!), the model does work reasonably well for more average players.
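For anyone who wants to rerun this kind of fit outside Excel, here is a rough pure-Python sketch of a multiple linear regression and its R-squared. It solves the normal equations with Gauss-Jordan elimination, which is fine for a handful of predictors but is not what Excel does internally as far as I know. The function names are my own, and the toy numbers are invented so the answer is easy to check; they are not baseball data.

```python
def fit_ols(rows, targets):
    """Least-squares fit; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot solve")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(beta, row):
    """Intercept plus the weighted sum of the predictors."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def r_squared_of_fit(beta, rows, targets):
    """1 - SS_residual / SS_total for the fitted model."""
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - predict(beta, row)) ** 2 for row, y in zip(rows, targets))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# Toy data built from y = 2 + 3*x1 - x2, so the fit should recover it.
toy_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
toy_targets = [3, 7, 7, 11, 12]
toy_beta = fit_ols(toy_rows, toy_targets)
print([round(b, 6) for b in toy_beta], round(r_squared_of_fit(toy_beta, toy_rows, toy_targets), 6))
```

One simple way to try the quadratic idea with the same code would be to add a squared copy of a predictor (say, FBLDEV squared) as an extra column in each row, though I haven’t tested whether that actually helps at the extremes.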

Keep in mind that numerous improvements will be made. If anyone wants access to data or has a question, then just let me know. If not, then enjoy the tool and use it for fantasy, even though it’s getting a bit late for that. Maybe next year.

7 Comments


eph_unit (Member since 2016)

I’d love to have a look at the data!

Jackson Mejia

Reply to eph_unit

Happy to. Just give me your email and I’ll send you the workbook. I should note that I reported the R-squared for xHR/BBE rather than xHR. The R-squared for xHR is actually .84

Reply to Jackson Mejia

eph_unit@yahoo.com Thanks!

Mike Podhorzer (Member)

Appreciate the effort here as I’m always looking to improve upon the current xHR/FB rate! But I don’t think the wFB/C or SLAVG metrics should be included because they are results-based, whereas FBLDEV is the actual process, or underlying skill.

It’s like coming up with an xSLG that utilizes doubles, triples, and home run rates! Obviously they are all correlated, because they are part of the equation of SLG. Technically, wFB/C isn’t part of HR/FB rate or raw home run totals, of course, but the higher the number, the higher the slugging, likely driven by home runs. So of course it’s going to have a high correlation.

Reply to Mike Podhorzer

Yeah, I was thinking about that after I submitted it. I thought of SLAVG as a way to get around ISO, but it pretty much IS ISO. It’s a sort of circular statistical logic because if you have a high ISO (and a correspondingly low SLAVG), then that means by definition that there will be a high number of home runs because that’s how the ISO gets powered. I have a different model where I use GB% instead of SLAVG (I think its R-squared is like six points lower) and I didn’t post that one due to correlation temptation. You’re definitely right on that point.

I’m not totally convinced about wFB/C on the other hand. Being a good fastball hitter goes hand in hand with a high FBLDEV. I think it’s more of an underlying skill because some hitters are objectively good fastball hitters, while others are not. I haven’t done the math, but if I had to guess, I’d say that wFB/C is relatively stable. Maybe it would be better not to use “raw” wFB/C, but to use a wFB+. Hitting fastballs well doesn’t make one a good home run hitter, it only serves as an indication of home run potentiality in the same way FB% or FBLDEV does.

But remember that wFB/C is results-based and directly fueled by what kind of hit the batter records. It will rise the most from a home run, so a higher wFB/C will most likely be due to more home runs. It’s basically wOBA by pitch type.

Thanks for the input. I hadn’t thought of it quite like that. I’ll put out a revised edition next week.
