# Did the Baseballs Carry More in 2019?

As much as baseball fans would like a simple explanation for the astronomical increase in home runs in 2019, it is becoming clearer that many factors have played into the surge. Among the possible reasons are batters prioritizing hitting homers more than ever before, pitchers having difficulty gripping the seams of the baseball, and of course the famous “juiced balls.” Last month, a committee released initial results of a comprehensive study attempting to determine the driving forces behind the home run rate growth.

I am particularly interested in the idea that fly balls were supposedly carrying more in 2019. On multiple occasions throughout the year, I listened to announcers observe that outfielders seemed to be severely misjudging fly balls. For instance, the center fielder would be drifting back toward the wall, as if he had a bead on it, and the ball would end up 15 rows deep. Although this may seem like evidence for increased carry of the baseball, such observations can easily be driven by confirmation bias. There was a tendency this year to believe that every ball in the air would be a homer, so when a ball would carry a lot, it fit with expectations and the belief continued to grow. It may simply have been the case that the wind was blowing out that day, or that the batter struck the ball in a particular way, and the carry had nothing to do with the ball itself. To determine if the perception was in fact reality, I focus on the following question: Did similarly struck balls travel farther in 2019 than in previous years?

The data I used consists of fly balls and line drives from 2015-19. To remove popups and low line drives, I further subset the batted balls to include only those with an exit velocity of at least 70 mph and a launch angle between 10 and 45 degrees. To start, consider the following table that presents average metrics for the past five regular seasons:

Year | Observations | Exit Velocity (mph) | Launch Angle (°) | Distance (ft) |
---|---|---|---|---|
2015 | 47,795 | 92.8 | 24.8 | 307 |
2016 | 48,285 | 93.2 | 24.9 | 306 |
2017 | 48,243 | 93.2 | 25.0 | 307 |
2018 | 49,073 | 93.3 | 25.2 | 305 |
2019 | 49,859 | 93.7 | 25.2 | 308 |
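The subsetting and per-season averaging described above can be sketched in a few lines. This is only an illustration on made-up records, not the author's actual code: the tuple layout, the function names, and the choice to treat the 10 and 45 degree bounds as inclusive are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (year, exit velocity in mph, launch angle in degrees, distance in feet).
batted_balls = [
    (2018, 95.0, 25.0, 310.0),
    (2018, 65.0, 30.0, 180.0),   # dropped: exit velocity below 70 mph
    (2018, 101.0, 55.0, 240.0),  # dropped: popup, launch angle above 45 degrees
    (2019, 99.0, 20.0, 330.0),
    (2019, 91.0, 40.0, 300.0),
    (2019, 88.0, 5.0, 150.0),    # dropped: low line drive, launch angle below 10 degrees
]

def keep(exit_velocity, launch_angle):
    """Filter from the article: at least 70 mph, launch angle from 10 to 45 degrees.

    Treating both angle bounds as inclusive is an assumption.
    """
    return exit_velocity >= 70 and 10 <= launch_angle <= 45

def yearly_averages(rows):
    """Return {year: (count, mean exit velocity, mean launch angle, mean distance)}."""
    by_year = defaultdict(list)
    for year, ev, la, dist in rows:
        if keep(ev, la):
            by_year[year].append((ev, la, dist))
    return {
        year: (
            len(vals),
            mean(v[0] for v in vals),
            mean(v[1] for v in vals),
            mean(v[2] for v in vals),
        )
        for year, vals in by_year.items()
    }

summary = yearly_averages(batted_balls)
for year in sorted(summary):
    print(year, summary[year])
```

Each row of the table corresponds to one entry of `summary`, computed on the real data rather than on these toy records.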

The number of batted balls that fit the parameters has generally increased from year to year, reaching its highest point in 2019. There was an increase in the average distance from 2018 to 2019, but there was also an increase in average exit velocity. In general, distance seems to increase with exit velocity and decrease with launch angle. There does not seem to be a clear unexplained jump in distance in 2019, as the relationship between contact and distance is roughly consistent across years. Next, let’s examine fly balls (as classified by Statcast) only:

Year | Observations | Exit Velocity (mph) | Launch Angle (°) | Distance (ft) |
---|---|---|---|---|
2015 | 20,368 | 91.6 | 33.5 | 333 |
2016 | 21,616 | 92.3 | 33.3 | 336 |
2017 | 22,295 | 92.8 | 33.3 | 341 |
2018 | 23,154 | 93.1 | 33.3 | 338 |
2019 | 23,675 | 93.6 | 33.4 | 343 |

The number of fly balls has increased every year, falling in line with the idea that hitters are trying to lift the ball more. Within the fly ball category, the average launch angle has stayed essentially the same each year, while the average exit velocity has steadily increased. Interestingly, the inconsistency in distance actually appears in 2018, where an increase in exit velocity was met with a decrease in distance. Now consider home runs:

Year | Observations | Exit Velocity (mph) | Launch Angle (°) | Distance (ft) |
---|---|---|---|---|
2015 | 4,899 | 103 | 27.8 | 400 |
2016 | 5,602 | 103 | 28.1 | 399 |
2017 | 6,096 | 103 | 28.0 | 400 |
2018 | 5,568 | 104 | 28.2 | 398 |
2019 | 6,765 | 104 | 28.2 | 400 |

Again, 2019 does not clearly stand out from other seasons.

To get a more precise idea of the relationship between 2019 batted balls and those from previous years, I built a random forest model on the data from 2015-18. The model predicts distance using exit velocity and launch angle, as well as ballpark and month of the year to account for atmosphere and seasonal effects, respectively. The model is trained on 80% of the data and tested on the remaining 20%. The test set R-squared value is 0.854, which is quite high but not all that surprising since we might expect the combination of exit velocity and launch angle to be a very strong predictor. The inclusion of ballparks is to account for significantly different conditions (such as the thin air at Coors Field) and the month captures when there tends to be cooler air that would result in less carry or vice versa. The model reveals that exit velocity is the most important predictor, with launch angle following closely behind. The ballpark is less important but still instructive and the month provides the least amount of information.
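For readers unfamiliar with the evaluation step, here is a minimal sketch of an 80/20 split and a test-set R-squared calculation. It does not implement the random forest itself. The helper names are hypothetical, and computing R-squared as one minus the ratio of residual to total sum of squares is an assumption, since the article does not say how the value was calculated.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows; the first n_test shuffled rows become the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# 100 placeholder row ids split 80/20.
train, test = train_test_split(range(100))

# Made-up actual and predicted distances (feet) for four test-set batted balls.
actual = [300.0, 350.0, 400.0, 250.0]
predicted = [310.0, 340.0, 390.0, 260.0]
score = r_squared(actual, predicted)
print(len(train), len(test), round(score, 3))
```

Under this definition, a value of 0.854 means the model's squared prediction error on the held-out balls is about 15% of the variance in their actual distances.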

Next, we take this model, which performed well on the test set, and apply it to the 2019 data. If the model consistently underestimates the 2019 distance, it would suggest that there was an unexplained increase in distance in 2019. The model predicts the 2019 distances even more successfully, resulting in an R-squared of 0.860. For each observation, let the “differential” equal the actual distance minus the predicted distance, so that a positive differential would mean the ball traveled farther than the model predicted. Consider the distribution of the differentials:

The distribution is centered very close to zero and is roughly symmetric. The summary statistics ultimately reveal that the model does not consistently over- or underestimate the 2019 distance. In other words, based on a model that fits the data well, the 2019 regular season ball flight was not discernibly different from previous years. This is an interesting result, as it was basically accepted that the balls were flying farther in 2019.
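The differential and its summary statistics can be illustrated as follows. The numbers are invented for the example and the function names are hypothetical; the article does not list which summary statistics were inspected, so the mean, median, quartiles, and share of positive differentials shown here are an assumption.

```python
from statistics import mean, median, quantiles

def differentials(actual, predicted):
    """Actual minus predicted distance; positive means the ball out-carried the model."""
    return [a - p for a, p in zip(actual, predicted)]

def summarize(diffs):
    """Basic location and spread statistics for a list of differentials."""
    q1, _, q3 = quantiles(diffs, n=4)  # quartile cut points
    return {
        "mean": mean(diffs),
        "median": median(diffs),
        "q1": q1,
        "q3": q3,
        "share_positive": sum(d > 0 for d in diffs) / len(diffs),
    }

# Made-up actual and model-predicted distances in feet.
actual = [310.0, 295.0, 402.0, 351.0, 288.0]
predicted = [305.0, 300.0, 398.0, 355.0, 290.0]

diffs = differentials(actual, predicted)
stats = summarize(diffs)
print(stats)
```

A mean and median near zero, with quartiles roughly equidistant from zero, is the pattern that indicates the model neither over- nor underestimates systematically.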

This perception also played a role in the postseason, where all of a sudden it seemed that many of the fly balls that had been carrying were landing in gloves instead of the seats. Was this actually happening or was it another misconception?

Following a similar process as before, let’s compare the average metrics in the 2019 postseason with those in the regular season:

All qualifying fly balls and line drives:

Season | Observations | Exit Velocity (mph) | Launch Angle (°) | Distance (ft) |
---|---|---|---|---|
Regular | 49,859 | 93.7 | 25.2 | 308 |
Post | 752 | 94.6 | 25.2 | 306 |

Fly balls:

Season | Observations | Exit Velocity (mph) | Launch Angle (°) | Distance (ft) |
---|---|---|---|---|
Regular | 23,675 | 93.6 | 33.4 | 343 |
Post | 358 | 94.3 | 33.5 | 342 |

Home runs:

Season | Observations | Exit Velocity (mph) | Launch Angle (°) | Distance (ft) |
---|---|---|---|---|
Regular | 6,765 | 104 | 28.2 | 400 |
Post | 95 | 105 | 27.8 | 397 |

Of course, the postseason has significantly fewer batted balls to draw from, but we do find an intriguing result from these initial calculations. For all three subsets of batted balls (which are determined in the same way as above), the average exit velocity in the postseason was higher, by roughly 0.7 to 1 mph. The average launch angle did not change much, except for the home runs where it decreased from 28.2 to 27.8, which would also tend to increase distance. Yet the average distance decreased in every case.

Next, I trained a new random forest model on 80% of the 2019 regular season data, and the R-squared of the model applied to the remaining 20% is 0.855. The importance of each predictor was similar to the first model. Using the model to predict the 2019 postseason data reveals a relationship similar to the one in the tables above. The R-squared is 0.817 and the distribution of differentials (defined above) is as follows:

The model clearly tends to overestimate the distance, which suggests that the balls were carrying less in the postseason. In this case, it appears that viewers’ eyes were not deceiving them, and that similarly struck balls were not seeing the same results. There are a number of possibilities for why this was the case, including the fun option of MLB “de-juicing” the balls, but I will investigate one of these options. It is possible that the October air tends to result in less carry in general, and that this difference happens in every postseason. To examine this possibility, let’s compare the year and regular/postseason in one table:

Year | Season | Exit Velocity (mph) | Launch Angle (°) | Distance (ft) |
---|---|---|---|---|
2015 | Regular | 92.8 | 24.8 | 307 |
2015 | Post | 93.9 | 25.1 | 311 |
2016 | Regular | 93.2 | 24.9 | 306 |
2016 | Post | 93.7 | 24.9 | 305 |
2017 | Regular | 93.2 | 25.0 | 307 |
2017 | Post | 93.5 | 24.9 | 308 |
2018 | Regular | 93.3 | 25.2 | 305 |
2018 | Post | 93.9 | 25.8 | 303 |
2019 | Regular | 93.7 | 25.2 | 308 |
2019 | Post | 94.6 | 25.2 | 306 |

Notice that in 2016, we also see an increase in average exit velocity, no change in launch angle, and a (slight) decrease in distance. On the other hand, in 2017 the slight increase in exit velocity is met with a slight increase in distance. There ultimately seems to be an inconsistent relationship when including the full subset of batted balls, but considering just home runs suggests there may in fact be a precedent:

Year | Season | Exit Velocity (mph) | Launch Angle (°) | Distance (ft) |
---|---|---|---|---|
2015 | Regular | 103 | 27.8 | 400 |
2015 | Post | 105 | 27.7 | 400 |
2016 | Regular | 103 | 28.1 | 399 |
2016 | Post | 104 | 28.0 | 392 |
2017 | Regular | 103 | 28.0 | 400 |
2017 | Post | 103 | 28.1 | 393 |
2018 | Regular | 104 | 28.2 | 398 |
2018 | Post | 104 | 28.2 | 396 |
2019 | Regular | 104 | 28.2 | 400 |
2019 | Post | 105 | 27.8 | 397 |

From 2016-19, the average distance for home runs decreased in the postseason, despite no change or an increase in average exit velocity and minimal change in launch angle. In 2016 and 2017 there was a large decrease of 7 feet in average distance. 2015 was the only year where the average distance remained the same, but the average exit velocity was a full 2 mph greater in the playoffs.
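The postseason-minus-regular-season changes quoted above follow directly from the home run table. The short calculation below copies the table's rounded averages and takes the differences; only the variable names are introduced here.

```python
# Home run averages from the table: (exit velocity mph, launch angle degrees, distance ft).
home_runs = {
    2015: {"regular": (103, 27.8, 400), "post": (105, 27.7, 400)},
    2016: {"regular": (103, 28.1, 399), "post": (104, 28.0, 392)},
    2017: {"regular": (103, 28.0, 400), "post": (103, 28.1, 393)},
    2018: {"regular": (104, 28.2, 398), "post": (104, 28.2, 396)},
    2019: {"regular": (104, 28.2, 400), "post": (105, 27.8, 397)},
}

# Postseason minus regular season for each metric, rounded to one decimal place.
changes = {
    year: tuple(round(p - r, 1) for r, p in zip(v["regular"], v["post"]))
    for year, v in home_runs.items()
}

# Years in which average home run distance fell in the postseason.
shorter_in_october = [year for year, (_, _, d_dist) in changes.items() if d_dist < 0]

for year, (d_ev, d_la, d_dist) in changes.items():
    print(year, d_ev, d_la, d_dist)
```

Because the inputs are already rounded to the precision shown in the table, these differences are approximate.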

As the goal is to determine if there was a difference between the 2019 playoffs and previous postseasons, I trained another random forest model on 2015-18 playoff data, using the same 80-20 split. The 2015-18 test set R-squared is 0.557 and the 2019 test set R-squared is 0.541. The dropoff makes sense due to the significantly lower training set sample size. Note the 2019 differential distribution:

Although the distribution is not quite symmetric, the first and third quartiles are about evenly spaced from zero. There does not seem to be a difference between previous postseasons and 2019. This, coupled with the earlier result that the regular seasons were similar, would suggest that the decrease in carry in the playoffs happens in other years as well. To test this possibility, consider new random forest models that are trained on regular season data from each year separately and then tested on that year’s postseason. These models resulted in the following statistics and differentials:

Year | Regular Season Test Set R-squared | Playoff Test Set R-squared |
---|---|---|
2015 | 0.834 | 0.828 |
2016 | 0.812 | 0.757 |
2017 | 0.841 | 0.818 |
2018 | 0.805 | 0.843 |

The decrease in distance has indeed happened before, as recently as last year. The 2018 model overestimated the distance almost as much as the 2019 model. The same was true, to a lesser degree, in 2016. The 2015 and 2017 models actually tended to underestimate the distance, but not as severely as the overestimates of 2018 and 2019. The ultimate takeaway here is that while there certainly seemed to be less carry in the postseason in 2019, it was not an unprecedented occurrence. The same thing happened last season and no one seemed to notice, which circles back to the idea that we were analyzing every fly ball much more closely than in the past.
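The train-on-regular-season, evaluate-on-postseason workflow can be sketched as below. To stay dependency-free, this example swaps the random forest for a much cruder stand-in that simply averages distance within exit velocity and launch angle bins, and it ignores ballpark and month. The data, bin widths, and function names are all hypothetical, so it shows the shape of the comparison and not the author's model.

```python
from collections import defaultdict
from statistics import mean

def fit_binned_mean(rows, ev_width=5.0, la_width=5.0):
    """Stand-in for the random forest: mean distance per (exit velocity, launch angle) bin."""
    bins = defaultdict(list)
    for ev, la, dist in rows:
        bins[(int(ev // ev_width), int(la // la_width))].append(dist)
    table = {key: mean(vals) for key, vals in bins.items()}
    overall = mean(dist for _, _, dist in rows)  # fallback for bins never seen in training

    def predict(ev, la):
        return table.get((int(ev // ev_width), int(la // la_width)), overall)

    return predict

def mean_differential(predict, rows):
    """Average of actual minus predicted distance; negative means the model overestimates."""
    return mean(dist - predict(ev, la) for ev, la, dist in rows)

# Made-up rows: (exit velocity mph, launch angle degrees, distance ft).
regular = [
    (95.0, 25.0, 320.0),
    (96.0, 26.0, 330.0),
    (104.0, 28.0, 400.0),
    (103.0, 27.0, 396.0),
]
post = [
    (95.5, 25.5, 318.0),
    (103.5, 27.5, 392.0),
]

predict = fit_binned_mean(regular)
bias = mean_differential(predict, post)
print(bias)
```

Repeating this once per year, with that year's regular season as the training data, gives one postseason bias figure per season to compare.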

None of the results presented here are proof that the balls were the same in every year – there are other ways that a change in the ball itself could have led to more home runs. The data does suggest, however, that the ball was not carrying farther this year and that the postseason dropoff in distance was not unique to 2019.

One last observation is that we consistently saw the average exit velocity increase along with the number of balls hit in the air. The key to the increase in home runs might thus be a change in conditions that allows hitters to strike the ball better in general. The simple explanation for this would be that hitters are improving. The flip side would be that pitchers are providing more pitches to hit – possibly due to a change in the seams of the baseball that many pitchers have voiced concerns over. One other interesting possibility, which would need to be tested, is that the flattened seams might make for more optimal contact. If the baseball is smoother, it might be reasonable to expect that the bat-to-ball contact is purer, and thus the quality of contact improves on average.