xHR%: Questing for a Formula (Part 5)

May 15, 2016

This is the long-delayed fifth part in the xHR series. If you really want to read the first four parts, they can be located here, here, here, and here.

More than a month late, the highly anticipated follow-up to the first iteration of xHR has arrived. Once more, that increasingly trivial metric will grace the pages of FanGraphs, wallowing in the mostly prestigious Community Research section (on the other hand, this section is most definitely the best section on the World Wide Web for experimental metrics and amateur analyses).

Unless the reader has an impeccable memory for breezily scanned, frivolous articles, he or she likely needs a reminder as to what xHR% is and aims to be. xHR% is a metric that describes the rate at which a player should have hit home runs over a given season. From this, expected home runs, a more understandable counting statistic, can be found by multiplying plate appearances by xHR%. It cannot be emphasized enough that the metric is not predictive; it only aims to describe. Without further ado, the formula is here:

I know that’s a lot to look at, and it isn’t exactly self-evident what all of the variables mean. As such, an explication of each part is necessary and provided below. (For logical rather than chronological purposes, the Kn variable will be analyzed last.)

AeHRD – One of the biggest differences between this formula and the last one is that this one does not use actual home run distance. Instead, this iteration uses expected distance, rendering it a combination of simple math, sabermetric theory, and physics. As such, expected home run distance strips out one of the biggest factors in luck: the weather.

Expected home run distance is found by applying the projectile-motion equations from Newtonian mechanics. By using ESPN’s HitTracker website, I was able to obtain launch angles and velocities for nearly every home run hit in 2015. From this, I was able to resolve velocity into its respective parts, velocity in the x-direction (Vx) and velocity in the y-direction (Vy). After that, I calculated the amount of time the ball would be in the air with the formula vf=vi+gt, where vf is final velocity (0 m/s), vi is initial velocity (Vy), and g is simply the gravitational acceleration constant. Finally, I multiplied Vx by time in order to get the total expected distance.

I repeated that process for every home run hit by a given player in order to find his average expected home run distance. By doing this, I was able to strip out all weather-related components.
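The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact spreadsheet behind the metric: the function names are made up, the inputs are assumed to be speed off the bat in mph and launch angle in degrees, and the result is converted to feet. Note also that setting vf to 0 in vf=vi+gt gives the time to the top of the arc, so this sketch doubles it to cover the trip back down to launch height; that doubling is an assumption about the intended calculation. Because air resistance is ignored, the distances come out far longer than anything seen in a real ballpark.

```python
import math

G = 9.80665           # standard gravitational acceleration, m/s^2
MPH_TO_MS = 0.44704   # miles per hour -> metres per second
M_TO_FT = 1 / 0.3048  # metres -> feet


def expected_distance_ft(speed_mph, launch_angle_deg):
    """Drag-free (vacuum) distance of a batted ball, in feet.

    Hypothetical helper: assumes speed in mph and launch angle in degrees.
    """
    v = speed_mph * MPH_TO_MS
    theta = math.radians(launch_angle_deg)

    # Resolve the velocity into horizontal and vertical components.
    vx = v * math.cos(theta)
    vy = v * math.sin(theta)

    # vf = vi + g*t with vf = 0 and gravity acting downward gives the
    # time to the apex; the full flight back to launch height is twice that
    # (assumption: the ball lands at the height it was hit from).
    t_apex = vy / G
    t_flight = 2 * t_apex

    return vx * t_flight * M_TO_FT


def average_expected_distance_ft(home_runs):
    """Average expected distance over (speed_mph, launch_angle_deg) pairs."""
    distances = [expected_distance_ft(s, a) for s, a in home_runs]
    return sum(distances) / len(distances)
```

Feeding `average_expected_distance_ft` a player's home runs gives something like AeHRD; feeding it every home run hit at one stadium, or in the whole league, gives the analogues of the stadium and league averages.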

AeHRDH – Utilizing the same process as above, I found the average expected home run distance for every stadium. This is the average expected distance of the home runs hit at the player’s home stadium, regardless of which team hit them.

AeHRDL – The same as above, but done for every home run hit in the majors last season.

When put together in the numerator and the denominator, the above variables serve as a “distance constant” of sorts that will at most adjust the resulting expected home runs by plus or minus two. Occasionally, the impact is negligible because the average expected distance is very close to that of the player’s home stadium and the league. Averaging the mean expected home run distance of the league and of the home stadium allows the metric to paint a more accurate picture of where the player hit his home runs and whether or not they should have left the park. Nevertheless, it’s important to note that this formula still fails to account for fly balls that fell just short of the wall due to the wind and other factors, meaning that there are still expected home runs unaccounted for.

FB% – If you remember correctly, or took the time to briefly review the previous posts, then you will recall that in the prior iteration of the formula there was a section very similar to this one. The only differences are the primary statistic used and the weights on each year of data (the weights are still somewhat arbitrary, but I am working on getting them to reflect holdover talent from past years more precisely).

Previously, HR/PA was used, but it had to be abandoned because the results were too closely correlated with reality. This time, I looked at how similarly descriptive formulas were quantified. Oftentimes, those metrics did not use the target expected metric in their formulas. Rather, they utilized other metrics that correlated moderately well or strongly with their expected metric. In this case, I decided to use FB% because it’s a relatively stable metric (especially in comparison with HR/FB), and it has a strong correlation with HR% (about .6).

As a clarification, the subscripts Y3, Y2, and Y1 indicate how far back each season is from the one being examined, where Y1 is really Y0 because it’s zero years away. In other words, Y1 is the in-season data from the year being examined. For the data examined here, Y1 is 2015, Y2 is 2014, and Y3 is 2013.
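For readers who think better in code, here is a minimal sketch of how three seasons of FB% might be blended. The function name is hypothetical, the weights are deliberately left as an input because the actual values are not listed here, and dividing by the sum of the weights is an assumption about how the blend is normalized.

```python
def weighted_fb_pct(fb_y1, fb_y2, fb_y3, weights):
    """Blend three seasons of FB% into one number.

    fb_y1 is the season being examined, fb_y2 the season before it, and
    fb_y3 the season before that. `weights` is a (w1, w2, w3) tuple supplied
    by the caller; no default is given because the real weights are not
    stated in the text.
    """
    w1, w2, w3 = weights
    total = w1 + w2 + w3
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Assumption: the blend is a normalized weighted average.
    return (w1 * fb_y1 + w2 * fb_y2 + w3 * fb_y3) / total


# Usage with made-up numbers and placeholder weights (illustration only).
blended = weighted_fb_pct(45.0, 42.0, 40.0, weights=(3, 2, 1))
```

Weighting the most recent season most heavily matches the idea that in-season data should matter most while older seasons capture holdover talent.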

Kn – As you can well imagine, FB% numbers are always far greater than HR% numbers*, resulting in some truly ridiculous results if a constant isn’t applied that relates HR% to FB%. For instance, without a constant to modify the results, Jose Bautista would have been expected to hit 304 home runs last season. That’s a lot of home runs. Just two and a half seasons of playing at that level and he’d have the home run record in the bag. Luckily, I’m not stupid enough to think that that’s actually possible, and so I initially related FB% and xHR% with a constant, called KCon.

Unfortunately, KCon didn’t work as well as I’d hoped because it skewed expected home run results way up for terrible home run hitters and way down for the best home run hitters. By skewed, I mean off by more than six home runs. And so I, in my infinite (and infantile) amateur mathematical wisdom, made it into a seven-part piecewise** function. By this, I mean that there’s a different constant for each piece of the formula, with the pieces defined by HR% at somewhat arbitrary, though round, cutoff points. For clarity, here they are:

K1 = HR%<1

K2 = 1≤HR%<2

K3 = 2≤HR%<3

K4 = 3≤HR%<4

K5 = 4≤HR%<5

K6 = 5≤HR%<6

K7 = 6≤HR%
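The lookup above amounts to finding which bucket a player’s HR% falls into. Here is a small sketch of that selection step. The function name is made up, it returns only the index n of Kn because the values of the constants themselves are not listed here, and it assumes an HR% of exactly 6 belongs to K7.

```python
from bisect import bisect_right

# Lower bounds (in percent) at which the next constant takes over.
HR_PCT_CUTOFFS = [1, 2, 3, 4, 5, 6]


def k_index(hr_pct):
    """Return n such that K_n applies to the given HR% (in percent).

    Hypothetical helper: HR% below 1 maps to 1, 1 <= HR% < 2 maps to 2,
    and so on, with anything at or above 6 mapping to 7 (assumption:
    exactly 6 is treated as part of the top bucket).
    """
    if hr_pct < 0:
        raise ValueError("HR% cannot be negative")
    # bisect_right counts how many cutoffs are <= hr_pct.
    return bisect_right(HR_PCT_CUTOFFS, hr_pct) + 1
```

A dictionary keyed by this index could then hold the seven constants once their values are filled in.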

It works quite well. I am very excited about the current iteration of xHR%, its implications, and all it has to offer. Of course, it is not finished, but I think I’m getting closer. Please comment if you have any questions, an error to point out, or anything of that nature. There will be a results piece published soon on the 2015 season, so keep an eye out.

*It wouldn’t be surprising if Ben Revere became the first player to have a HR% equal to FB% (both at 0%, naturally).

**It is neither continuous nor differentiable.
