# Using Statcast Data to Estimate Minor League Home Run Distance

For a couple of years now, baseball fans have enjoyed publicly available Statcast data for the MLB level. This data allows us to examine the exit velocity, launch angle, estimated distance, and countless other aspects of every batted ball. This data has also resulted in “expected” stats, very useful additions to the toolbox of any baseball fanalyst. While this data is collected at the minor league level as well, it is not made publicly available, leaving us with a more limited toolbox when evaluating prospects via statistics.

Fortunately, one piece of Statcast-adjacent MiLB data *is* publicly available. MLB’s Prospect site includes a search engine for MiLB statistics. For each batted ball, the site reports two “hit coordinates”: hc_x and hc_y. These coordinates appear to tell us the point on the field where the batted ball hit the ground or was caught. Using these hit coordinates, we can estimate (with some accuracy) the distance of home runs hit at the MiLB level.

Here are the hit coordinates of every batted ball by the Toronto Blue Jays in 2018 at the MLB level. The picture of a baseball diamond becomes even clearer if we multiply each hc_y by -1, flipping the image about the horizontal axis.
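As a rough sketch of that flip, the snippet below negates hc_y for a list of points so that the outfield ends up above home plate when plotted. The function name `flip_vertical` is made up for illustration, and the two sample points are values quoted later in this post (the home plate estimate and a Justin Smoak home run).

```python
def flip_vertical(points):
    """Negate hc_y so the outfield is drawn above home plate."""
    return [(hc_x, -hc_y) for hc_x, hc_y in points]


# Two points quoted later in the post: the home plate estimate
# and the hit coordinates of a Justin Smoak home run.
sample = [(126.0, 204.0), (89.47, 27.86)]
flipped = flip_vertical(sample)

for original, mirrored in zip(sample, flipped):
    print(original, "->", mirrored)
```

After the flip, home plate has the lowest vertical value, which is why the scatter starts to look like a field viewed from behind the plate.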

The first step in estimating a home run’s distance is to establish the coordinates of home plate. I was unable to find any official coordinates, so I worked on establishing them myself. The most accurate method I came up with was finding a batted ball that landed on (or very, very close to) home plate. Kevin Pillar was the lucky winner. On September 4th, at home against the Rays, Pillar hit a ball that landed just beside home plate. Its hit coordinates were (126.57, 204.1). I found a few other similar examples, and in each case, the hc_x was just above or below 126 and the hc_y was just above or below 204, so I opted to use the coordinates (126, 204) for home plate.
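One way to mimic this step in code is to take the median of a handful of near-plate batted balls and round to whole coordinate units. This is only a sketch: the function name `estimate_home_plate` is hypothetical, and apart from the Pillar ball, the sample points are invented placeholders rather than real batted balls.

```python
from statistics import median


def estimate_home_plate(near_plate_hits):
    """Round the median hc_x and hc_y of balls that landed near home plate."""
    xs = [hc_x for hc_x, _ in near_plate_hits]
    ys = [hc_y for _, hc_y in near_plate_hits]
    return round(median(xs)), round(median(ys))


near_plate_hits = [
    (126.57, 204.10),  # the Pillar ball described above
    (125.60, 203.80),  # hypothetical placeholder
    (126.20, 204.30),  # hypothetical placeholder
]

home_plate = estimate_home_plate(near_plate_hits)
print(home_plate)
```

The median is used here instead of the mean so that one oddly recorded ball would not drag the estimate around; that choice is an assumption of the sketch, not something the data dictates.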

The second step in estimating a home run’s distance is to calculate the distance (in as-yet-meaningless coordinate units) between home plate and the hit coordinates of a given home run. This requires using the Pythagorean Theorem (which is a lot more fun to use, unforced, as an adult). The equation for estimated home run distance (in terms of coordinate units) is:

$$\text{Calculated HR Distance (coordinate units)} = \sqrt{(\text{hc\_x} - 126)^{2} + (204 - \text{hc\_y})^{2}}$$

The image here, capturing Justin Smoak’s grand slam off of the Yankees’ David Robertson, helps illustrate the equation above. There were a total of 5,319 over-the-fence home runs hit in 2018 for which Statcast had hit coordinates and hit distance (out of 5,571 over-the-fence homers altogether). For each of these home runs, I inputted the hit coordinates into the equation above. The calculated distances of these home runs ranged from 140 to 210. Smoak’s grand slam had hit coordinates of (89.47, 27.86). Inputting these into the equation above gives us a calculated distance (in terms of coordinate units) of 179.9.
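The Smoak calculation can be reproduced with a few lines of Python. This is a minimal sketch that assumes home plate sits at (126, 204) as estimated above; the names `HOME_PLATE_X`, `HOME_PLATE_Y`, and `hr_distance_units` are invented for illustration.

```python
import math

# Home plate location in hit-coordinate units (the estimate from this post).
HOME_PLATE_X = 126.0
HOME_PLATE_Y = 204.0


def hr_distance_units(hc_x, hc_y):
    """Straight-line distance from home plate, in coordinate units."""
    return math.hypot(hc_x - HOME_PLATE_X, HOME_PLATE_Y - hc_y)


smoak_units = hr_distance_units(89.47, 27.86)
print(round(smoak_units, 1))
```

`math.hypot` is simply the Pythagorean Theorem in library form: it squares both legs, adds them, and takes the square root.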

The third and final step in estimating a home run’s distance is to convert the distances calculated above into feet. To do so, I plotted the calculated distances for each home run against the distances given by Statcast. The resulting graph suggests that there is a very strong relationship between the two, with the calculated distances explaining about 86% of the variation in Statcast’s distances.

The trendline equation can be used to convert the calculated distances into feet:

$$\text{Calculated HR Distance (feet)} = 2.29 \times \text{Calculated HR Distance (coordinate units)}$$
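For anyone who wants to repeat the fit on another season, here is one way a trendline like this could be computed. The conversion above has no intercept term, so the sketch assumes a least-squares line forced through the origin; that is an assumption, as is the function name `fit_through_origin`. The four data points are invented purely to make the code runnable and are not real home runs.

```python
def fit_through_origin(units, feet):
    """Least-squares slope for feet = slope * units, plus R-squared."""
    slope = sum(u * f for u, f in zip(units, feet)) / sum(u * u for u in units)
    mean_feet = sum(feet) / len(feet)
    ss_res = sum((f - slope * u) ** 2 for u, f in zip(units, feet))
    ss_tot = sum((f - mean_feet) ** 2 for f in feet)
    return slope, 1 - ss_res / ss_tot


# Hypothetical (coordinate units, Statcast feet) pairs for illustration only.
units = [150.0, 165.0, 180.0, 195.0]
feet = [345.0, 376.0, 413.0, 446.0]

slope, r_squared = fit_through_origin(units, feet)
print(round(slope, 2), round(r_squared, 3))
```

With real data, the slope plays the role of the 2.29 multiplier and the R-squared corresponds to the share of variation explained.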

The resulting distances seem quite similar to the Statcast distances. The calculated distances have a mean of 396 feet, a median of 397 feet, a minimum of 322 feet, and a maximum of 481 feet. The Statcast distances have a mean of 397 feet, a median of 398 feet, a minimum of 324 feet, and a maximum of 481 feet.

In general, there is no more than a small difference between a given home run’s calculated and Statcast distances. For 79.8% of home runs, the difference is less than one foot. The difference is less than five feet for 83.5% of the home runs, less than ten feet for 87.8% of them and less than 25 feet for 95.9% of them. While these estimates are imperfect, they seem fairly reliable in the vast majority of cases.
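Shares like these can be tallied with a small helper that counts how many home runs fall within a given tolerance. The sketch below uses a hypothetical function name, `share_within`, and five invented distance pairs, so its output illustrates the mechanics only and does not reproduce the percentages above.

```python
def share_within(calculated, statcast, tolerance_feet):
    """Fraction of home runs whose two distances differ by less than the tolerance."""
    diffs = [abs(c - s) for c, s in zip(calculated, statcast)]
    return sum(d < tolerance_feet for d in diffs) / len(diffs)


# Hypothetical distances in feet, for illustration only.
calculated = [412.0, 398.4, 371.2, 430.9, 405.0]
statcast = [412.0, 398.0, 375.0, 419.0, 440.0]

for tolerance in (1, 5, 10, 25):
    print(tolerance, share_within(calculated, statcast, tolerance))
```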

Smoak’s grand slam is a particularly good case to highlight. Inputting 179.9 coordinate units into the equation above gives us an estimated HR distance of 412 feet, exactly the same as the Statcast estimate.

Combining the two equations above gives us a handy tool that can be used to convert hit coordinates of home runs in the MiLB dataset into estimated distances (in feet):

$$\text{Calculated HR Distance (feet)} = 2.29 \times \sqrt{(\text{hc\_x} - 126)^{2} + (204 - \text{hc\_y})^{2}}$$
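In code, the combined equation fits in a single small function. This is a sketch using the 2018 constants from this post; the names `HOME_PLATE`, `FEET_PER_UNIT`, and `estimate_hr_distance_feet` are invented for illustration.

```python
import math

HOME_PLATE = (126.0, 204.0)  # estimated home plate, in coordinate units
FEET_PER_UNIT = 2.29         # 2018 trendline slope from this post


def estimate_hr_distance_feet(hc_x, hc_y):
    """Estimated home run distance in feet from a pair of hit coordinates."""
    units = math.hypot(hc_x - HOME_PLATE[0], HOME_PLATE[1] - hc_y)
    return FEET_PER_UNIT * units


# Smoak's grand slam, using the hit coordinates quoted earlier.
smoak_feet = estimate_hr_distance_feet(89.47, 27.86)
print(round(smoak_feet))
```

Keeping the two constants at the top makes it easy to swap in a different home plate estimate or multiplier if the fit is redone for another season.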

It’s necessary to highlight some important caveats. The first is a simple reminder of what I’ve said throughout this post: the estimated home run distances come with a margin of error. Be conscious of that when using the equation above. The main purpose I have for these estimates is to examine the number of 400-plus-foot home runs hit by various minor leaguers. Without more data on batted balls at the MiLB level, this seems like a good proxy for barrels. Plus, with a high threshold, I can be confident that a home run estimated to travel 400 feet did indeed go quite far: 95% of the MLB home runs with a calculated distance of 400-plus feet had a Statcast distance of 400-plus feet, while 99.5% had a Statcast distance of 375-plus feet.

The second caveat is that, given the lack of Statcast data on home run distances at the MiLB level, I am unable to test for the accuracy of the MiLB estimates (at least in the exact way that I tested the MLB estimates). That said, the calculated distances at the MiLB and MLB levels seem to jibe well. For example, in terms of both mean and median, the MiLB estimates were about ten feet shorter than the MLB estimates, which makes sense given the shorter fences in some MiLB parks.

The MiLB estimates also seem reliable because they describe the world as we know it. Vladimir Guerrero Jr. ranked among the MiLB leaders in 2018 with 11 homers estimated to travel at least 400 feet, representing 2.8% of his plate appearances across the MiLB levels he played at. Ditto for other big power prospects like Kyle Tucker (3.2%), Dylan Cozens (3.2%), Bobby Dalbec (2.9%), and Eloy Jimenez (2.0%).

Former MLB slugger Chris Carter is also a good example of the reliability of this metric. He spent all of 2018 at the Triple-A level, between the Angels’ and Twins’ systems. Over 312 PA, Carter hit 13 homers that have an estimated distance of 400-plus feet. His 4.2% long homer rate was good for third across the minors. Back in 2016, his last full season in the majors, he hit 28 homers that traveled at least 400 feet, 4.3% of his plate appearances that year.
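The long homer rate used in these comparisons is just a count divided by plate appearances. The sketch below shows the arithmetic with Carter's 2018 totals (13 long homers in 312 PA). The function name `long_homer_rate` is hypothetical, and the individual distances are placeholders chosen only so that 13 of them clear 400 feet; they are not his actual home runs.

```python
def long_homer_rate(distances_feet, plate_appearances, threshold_feet=400.0):
    """Share of plate appearances ending in a homer at or beyond the threshold."""
    long_count = sum(d >= threshold_feet for d in distances_feet)
    return long_count / plate_appearances


# Placeholder distances: 13 homers at 400-plus feet and a few shorter ones.
carter_distances = [405.0] * 13 + [380.0] * 5

carter_rate = long_homer_rate(carter_distances, plate_appearances=312)
print(f"{carter_rate:.1%}")
```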

Ideally, the estimated distances of MiLB home runs could be checked against the actual distances found using Statcast. Obviously, however, if we had those Statcast distances, this entire post would be unnecessary. When an educated guess is one’s only option, a leap of faith of some degree is a necessary cost.

The third caveat is that this equation is based on 2018 data. When this exercise is replicated using MLB data from 2015, 2016, and 2017 (separately), the calculated HR distances aren’t as tightly correlated with the Statcast HR distances. The correlation is particularly weak in 2015 (R² of 0.30) and 2016 (R² of 0.41). In 2017, the calculated HR distances explain about 68% of the variation in Statcast’s HR distances (75% if five particularly weird cases, out of 5,855, are excluded). In each of these three past seasons, the equation is a bit different from the one given in this post. As such, for best results, it seems wise to limit use of the equation above to 2018 MiLB data.

Let’s end by doing exactly that, highlighting minor leaguers who excelled at mashing (what seem likely to be) particularly long dingers in 2018. For context, hitting a 400-plus foot homer in 0.5% of one’s plate appearances usually puts a player around the 50th percentile for their level.

At Triple-A, Jabari Blash led the way with 22 homers estimated to have traveled at least 400 feet (6.4% of his total PA). Blash also led all of MiLB. Among prospects of note at the level, with at least 200 PA, Tyler O’Neill (4.4%), Franmil Reyes (3.6%), and Tucker (3.2%) each acquitted themselves very well.

At Double-A, Vlad Jr. hit nine 400-plus foot homers (3.4%) and was narrowly edged out by Peter O’Brien (3.5%) for tops at the level (min. 200 PA). Fellow uber-prospect Jimenez wasn’t far behind, producing a long dinger in 2.6% of his PA. Also among the leaders were prospects Austin Hays (2.1%), Peter Alonso (1.8%), Brendan Rodgers (1.7%), and Monte Harrison (1.7%). While he wasn’t among the league leaders, Cavan Biggio (1.1%) hit 400-plus foot homers at a well-above-average rate.

At High-A, Roberto Ramos and Ibandel Isabel share top honours. Ramos produced the highest rate of big homers (5.5%), while Isabel produced the highest number (22). Jo Adell stood out by hitting a long bomb in 2.3% of his PA as a 19-year-old. Only one other teenager, Cristian Pache (1.0%), cracked 1% at the level. Kevin Smith, a big riser on top prospect lists in 2018, also stood out, producing a 400-plus foot homer in 1.6% of his PA.

At Low-A, Seuly Matias (2.7%) was a standout masher, hitting nine 400-plus foot homers at 19 years old. The level’s leader was Casey Golden, who cracked 400 feet on 13 occasions (2.5%). The Blue Jays system stands out at this level, with trade deadline acquisitions Chad Spanberger (1.7%) and Demi Orimoloye (1.6%) joining Ryan Noda (1.1%) and Brock Lundquist (1.0%) in the top 15% of batters.

At the Short Season level, Sean Reynolds was the runaway leader, with 13 bombs of 400-plus feet (4.1%). Behind him was recent Blue Jays draftee Griffin Conine, who hit six (2.6%). Joey Bart, a 2018 first-round pick, also flashed his power with four long homers (2.0%).

At the Advanced Rookie level, 18-year-old Jeremiah Jackson led by hitting long homers in 5% of his PA. Ronny Brito was another young standout, with eight 400-plus foot homers this season (3.3%). First-rounder Nolan Gorman hit a bomb in 2.4% of his PA, while former Brave Kevin Maitan did so in 2.1% of his PA. Wander Franco, 17-year-old wunderkind, also impressed with five 400-plus foot homers (1.8%).

Ultimately, this sort of tool can be applied in a number of ways. One potential use is to find prospects with more power potential than their top-line stats suggest. The recently traded Jeter Downs seems like a good example. At 19, he was a bit young for his level (Low-A) but produced well overall (118 wRC+). He walked (9.9%) and struck out (19.7%) at slightly better-than-average rates and ran an average BABIP (.306). His ISO (.145) was solid too, ranking in the 66th percentile among batters with 200-plus PA at the level in 2018. However, he hit an impressive seven home runs with an estimated distance of 400-plus feet, accounting for 1.3% of his plate appearances (94th percentile), suggesting that his power ceiling might be much higher than just above-average.

Hopefully, in the very near future, some Statcast data for the MiLB level will be made available to the public. Until then, this approach seems like a useful workaround.

*This post was originally published on Jays from the Couch.*
