It’s always nice when things mostly work out. More often than not, when someone devotes countless hours to some pet project, whether it’s a scrapbook of some variety or an amateur statistical endeavor, it doesn’t work out terribly well. From there, one often ends up spending nearly as many hours fixing the project as one spent putting it together in the first place. The experience is incredibly frustrating, and it’s something we’ve all gone through at one time or another.
Luckily, my “quest” went much better than that of Juan Ponce de León. While I didn’t find the fountain of youth, I did find a formula that works moderately well, even though I can only back it up with one year of data at this point. The only thing Señor Ponce de León has to brag about is being arguably the second most important explorer in colonial history. Somehow those things don’t compare particularly well.
Nonetheless, things do look quite good for xHR% v2. I culled data from a variety of sources, but mainly from FanGraphs and ESPN’s selectively responsive HitTracker. I used FanGraphs for FB%, HR, AB, and strikeout numbers (in order to find BIP, I subtracted strikeouts from at-bats). On the other hand, HitTracker was used just for home run distance numbers and launch angle data. I studied all players with at least 1200 plate appearances between 2012 and 2014 in order to ensure some level of stability for the first sample taken.
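For anyone who wants to reproduce the sample selection, here is a minimal sketch of the two steps described above: the 1200 plate appearance cutoff and the balls-in-play subtraction. The `PlayerLine` class, the function names, and the two stat lines are made up for illustration; they are not rows from FanGraphs or HitTracker.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    """Hypothetical container for one player's 2012-2014 totals."""
    name: str
    pa: int  # plate appearances
    ab: int  # at-bats
    so: int  # strikeouts
    hr: int  # home runs


def balls_in_play(ab: int, so: int) -> int:
    """BIP as defined in the post: at-bats minus strikeouts."""
    return ab - so


def qualifies(player: PlayerLine, min_pa: int = 1200) -> bool:
    """Keep only players who reach the plate appearance threshold."""
    return player.pa >= min_pa


# Made-up stat lines, not real players.
sample = [
    PlayerLine("Player A", pa=1850, ab=1650, so=400, hr=75),
    PlayerLine("Player B", pa=900, ab=800, so=150, hr=20),
]

qualified = [p for p in sample if qualifies(p)]
bip = {p.name: balls_in_play(p.ab, p.so) for p in qualified}
print(bip)
```

Note that this version of BIP keeps home runs in the total, since only strikeouts are subtracted from at-bats.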
And so, without further ado, take a few seconds to look at some relatively interesting graphs (I forgot to title the first one, but it’s xHR vs HR).
Here, it’s fairly easy to discern that there’s a strong relationship between expected home runs and home runs. It doesn’t take John Nash to figure that out. What is fairly interesting, however, is that the average residual is quite high (close to 2.5), indicating that the average player in the sample hit roughly 2.5 home runs more or fewer than the formula expected. That difference comes from a number of factors which the formula attempts to account for. They include home ballpark, prior performance vs. current performance, and weather. One of the issues, and this was bound to be a problem because of the sample size, is that there aren’t enough data points for players who hit 40+ home runs, so it’s hard to say how accurate the formula actually is as a player approaches that skill level.
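To make the "average residual" concrete, here is a small sketch of how such a figure could be computed. It assumes the average is taken over absolute differences between actual and expected home runs, which is my reading of the number rather than a confirmed detail, and the home run totals below are invented for the example.

```python
def mean_absolute_residual(actual, expected):
    """Average size of the gap between actual and expected values.

    Assumption: the residual is averaged in absolute terms, so that
    overshoots and undershoots do not cancel each other out.
    """
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    if not actual:
        raise ValueError("need at least one player")
    return sum(abs(a - e) for a, e in zip(actual, expected)) / len(actual)


# Invented numbers for four hypothetical players.
actual_hr = [30, 12, 22, 8]
expected_hr = [27.5, 14.0, 25.0, 8.5]

# Signed residuals show the direction of each miss.
residuals = [a - e for a, e in zip(actual_hr, expected_hr)]
mar = mean_absolute_residual(actual_hr, expected_hr)
print(residuals, mar)
```

Averaging the signed residuals instead would let the misses cancel out and land near zero for any reasonable fit, which is why the absolute version is the more useful reading here.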
This is a slightly zoomed-in version of expected home run percentage vs home run percentage. Clearly, there’s a much stronger relationship between HR% and xHR% than between HR and xHR, due in large part to how small the values are and because the formula was written to come out with a percentage, not a raw count. But I won’t waste too much time on xHR% because, quite frankly, it’s far less interesting and understandable than actual home run numbers.
For the interested and worldly reader, here are the equations for each:
If either of these equations gets used at all, I expect it will be xHR because home run numbers are far more accessible than home run percentage numbers. Frankly, I regret writing the formula for xHR% for that very reason. This is supposed to be a layman’s formula, so its end result should be something understandable to the average baseball fan. It should be self-evident and easy to comprehend.
Thank you for following along as the formula developed over time. Obviously, it isn’t done yet and it requires some changes, but it’s close enough to where it needs to be. It’s very similar to getting to the door of the room where the Holy Grail is, shrugging, and turning around with the intention of coming back in a few weeks (although in this case it must be noted that the Holy Grail isn’t the real one, but a plastic one covered in lead paint). Expect a return under a different name and a better data set.
You’ll notice that I didn’t include very much statistical analysis at all. I figured that was rather boring to write about, but you can feel free to contact me for the information if you would like a nice nap.
A busy person, but one who spends his free time in front of a computer screen, fiddling with statistics. And yes, that describes everyone who regularly visits this website.