Did the Baseballs Carry More in 2019?

by Chris Russo

January 22, 2020

As much as baseball fans would like a simple explanation for the astronomical increase in home runs in 2019, it is becoming clearer that many factors have played into the surge. Among the possible reasons are batters prioritizing hitting homers more than ever before, pitchers having difficulty gripping the seams of the baseball, and of course the famous “juiced balls.” Last month, a committee released initial results of a comprehensive study attempting to determine the driving forces behind the home run rate growth.

I am particularly interested in the idea that fly balls were supposedly carrying more in 2019. On multiple occasions throughout the year, I listened to announcers observe that outfielders seemed to be severely misjudging fly balls. For instance, the center fielder would be drifting back toward the wall, as if he had a bead on it, and the ball would end up 15 rows deep. Although this may seem like evidence for increased carry of the baseball, such observations can easily be driven by confirmation bias. There was a tendency this year to believe that every ball in the air would be a homer, so when a ball carried a long way, it fit with expectations and the belief continued to grow. It may simply have been the case that the wind was blowing out that day, or that the batter struck the ball in a particular way, and the carry had nothing to do with the ball itself. To determine if the perception was in fact reality, I focus on the following question: Did similarly struck balls travel farther in 2019 than in previous years?

The data I used consists of fly balls and line drives from 2015-19. To remove popups and low line drives, I further subset the batted balls to include only those with an exit velocity of at least 70 mph and a launch angle between 10 and 45 degrees. To start, consider the following table that presents average metrics for the past five regular seasons:

Year	Observations	Exit Velocity	Launch Angle	Distance
2015	47,795	92.8	24.8	307
2016	48,285	93.2	24.9	306
2017	48,243	93.2	25.0	307
2018	49,073	93.3	25.2	305
2019	49,859	93.7	25.2	308
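The filtering and yearly averaging behind a table like the one above can be sketched in a few lines of Python. This is only an illustration: the rows are made up rather than taken from Statcast, the record layout and the names `batted_balls`, `keep`, and `summary` are my own assumptions, and I am assuming the launch angle bounds are inclusive.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (year, exit velocity in mph, launch angle in degrees, distance in feet).
# These rows are made up for illustration and are NOT the article's Statcast data.
batted_balls = [
    (2018, 95.0, 25.0, 320.0),
    (2018, 65.0, 30.0, 180.0),   # dropped: exit velocity below 70 mph
    (2018, 101.0, 55.0, 240.0),  # dropped: popup, launch angle above 45 degrees
    (2019, 99.0, 28.0, 370.0),
    (2019, 91.0, 5.0, 150.0),    # dropped: low liner, launch angle below 10 degrees
    (2019, 89.0, 32.0, 310.0),
]


def keep(exit_velocity, launch_angle):
    """Cutoffs described in the text; treating the angle bounds as inclusive is an assumption."""
    return exit_velocity >= 70 and 10 <= launch_angle <= 45


# Group the surviving batted balls by season.
by_year = defaultdict(list)
for year, ev, la, dist in batted_balls:
    if keep(ev, la):
        by_year[year].append((ev, la, dist))

# One summary row per season: count plus the three averages shown in the table.
summary = {
    year: {
        "observations": len(rows),
        "exit_velocity": mean(r[0] for r in rows),
        "launch_angle": mean(r[1] for r in rows),
        "distance": mean(r[2] for r in rows),
    }
    for year, rows in sorted(by_year.items())
}

for year, row in summary.items():
    print(year, row)
```

With real data, the same grouping could be repeated on the fly ball and home run subsets.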

The number of batted balls that fit the parameters has generally increased from year to year, reaching its highest point in 2019. There was an increase in the average distance from 2018 to 2019, but there was also an increase in average exit velocity. In general, distance seems to increase with exit velocity and decrease with launch angle. There does not seem to be a clear unexplained jump in distance in 2019, as the relationship between contact and distance is roughly consistent across years. Next, let’s examine fly balls (as classified by Statcast) only:

Year	Observations	Exit Velocity	Launch Angle	Distance
2015	20,368	91.6	33.5	333
2016	21,616	92.3	33.3	336
2017	22,295	92.8	33.3	341
2018	23,154	93.1	33.3	338
2019	23,675	93.6	33.4	343

The number of fly balls has increased every year, falling in line with the idea that hitters are trying to lift the ball more. Within the fly ball category, the average launch angle has been nearly identical each year, while the average exit velocity has steadily increased. Interestingly, the inconsistency in distance actually appears in 2018, when an increase in exit velocity was met with a decrease in distance. Now consider home runs:

Year	Observations	Exit Velocity	Launch Angle	Distance
2015	4,899	103	27.8	400
2016	5,602	103	28.1	399
2017	6,096	103	28.0	400
2018	5,568	104	28.2	398
2019	6,765	104	28.2	400

Again, 2019 does not clearly stand out from other seasons.

To get a more precise idea of the relationship between 2019 batted balls and those from previous years, I built a random forest model on the data from 2015-18. The model predicts distance using exit velocity and launch angle, as well as ballpark and month of the year to account for atmosphere and seasonal effects, respectively. The model is trained on 80% of the data and tested on the remaining 20%. The test set R-squared value is 0.854, which is quite high but not all that surprising since we might expect the combination of exit velocity and launch angle to be a very strong predictor. The inclusion of ballparks is to account for significantly different conditions (such as the thin air at Coors Field) and the month captures when there tends to be cooler air that would result in less carry or vice versa. The model reveals that exit velocity is the most important predictor, with launch angle following closely behind. The ballpark is less important but still instructive and the month provides the least amount of information.
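For readers who want to try this kind of setup, here is a minimal sketch using scikit-learn. It is not the exact model from this article: the data is synthetic, the distance formula is invented purely so the forest has something to fit, the ballpark is encoded as a plain integer code, and the number of trees is an arbitrary choice. Only the overall shape (four predictors, an 80/20 split, a test-set R-squared, and feature importances) mirrors the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for the 2015-18 batted balls (NOT real Statcast data).
exit_velocity = rng.uniform(70, 115, n)   # mph
launch_angle = rng.uniform(10, 45, n)     # degrees
park = rng.integers(0, 30, n)             # ballpark as an integer code (assumed encoding)
month = rng.integers(4, 11, n)            # April (4) through October (10)

# Invented relationship, used only so the example has a signal to learn.
distance = (
    4.0 * exit_velocity
    - 0.15 * (launch_angle - 28.0) ** 2
    + rng.normal(0, 15, n)
)

feature_names = ["exit_velocity", "launch_angle", "park", "month"]
X = np.column_stack([exit_velocity, launch_angle, park, month])

# Hold out 20% of the rows for testing, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, distance, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# R-squared on the held-out 20%.
test_r2 = r2_score(y_test, model.predict(X_test))

# Impurity-based importances; they sum to 1 across the predictors.
importances = dict(zip(feature_names, model.feature_importances_))

print(f"test R-squared: {test_r2:.3f}")
print(importances)
```

Because park and month play no role in the invented distance formula, their importances here will be near zero. With real data they would pick up effects such as altitude and temperature.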

Next, we take this model, which performed well on the test set, and apply it to the 2019 data. If the model consistently underestimates the 2019 distance, it would suggest that there was an unexplained increase in distance in 2019. The model predicts the 2019 distances even more successfully, resulting in an R-squared of 0.860. For each observation, let the “differential” equal the actual distance minus the predicted distance, so that a positive differential would mean the ball traveled farther than the model predicted. Consider the distribution of the differentials:

The distribution is centered very close to zero and is roughly symmetric. The summary statistics ultimately reveal that the model does not consistently over- or underestimate the 2019 distance. In other words, based on a model that fits the data well, the 2019 regular season ball flight was not discernibly different from previous years. This is an interesting result, as it was basically accepted that the balls were flying farther in 2019.
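To make the "differential" concrete, the following sketch computes it along with the summary statistics and R-squared for a handful of made-up distances. The numbers and variable names are hypothetical, and the `r_squared` helper is the standard coefficient of determination, written out by hand rather than taken from any particular library.

```python
from statistics import mean, quantiles

# Hypothetical actual and model-predicted distances in feet (made-up values).
actual = [310.0, 355.0, 402.0, 288.0, 341.0, 376.0, 299.0, 330.0]
predicted = [305.0, 360.0, 398.0, 292.0, 339.0, 380.0, 301.0, 326.0]

# Differential as defined in the text: actual minus predicted.
# A positive value means the ball traveled farther than the model expected.
differentials = [a - p for a, p in zip(actual, predicted)]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Quartiles of the differentials (statistics.quantiles with n=4 returns Q1, median, Q3).
q1, median, q3 = quantiles(differentials, n=4)

summary = {
    "mean": mean(differentials),
    "q1": q1,
    "median": median,
    "q3": q3,
    "r_squared": r_squared(actual, predicted),
}
print(summary)
```

A mean and median near zero, with quartiles roughly evenly spaced around zero, is the pattern that would indicate no systematic over- or underestimation.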

This perception also played a role in the postseason, where all of a sudden it seemed that many of the fly balls that had been carrying were landing in gloves instead of the seats. Was this actually happening or was it another misconception?

Following a similar process as before, let’s compare the average metrics in the 2019 postseason with those in the regular season:

2019 Regular Season vs. Playoffs

Season	Observations	Exit Velocity	Launch Angle	Distance
Regular	49,859	93.7	25.2	308
Post	752	94.6	25.2	306

Fly Balls

Season	Observations	Exit Velocity	Launch Angle	Distance
Regular	23,675	93.6	33.4	343
Post	358	94.3	33.5	342

Home Runs

Season	Observations	Exit Velocity	Launch Angle	Distance
Regular	6,765	104	28.2	400
Post	95	105	27.8	397

Of course, the postseason has significantly fewer batted balls to draw from, but we do find an intriguing result from these initial calculations. For all three subsets of batted balls (which are determined in the same way as above), the average exit velocity in the postseason was higher, by roughly 0.7 to 1 mph. The average launch angle did not change much, except for the home runs where it decreased from 28.2 to 27.8, which would also tend to increase distance. Yet the average distance decreased in every case.

Next, I trained a new random forest model on 80% of the 2019 regular season data, and the R-squared of the model applied to the remaining 20% is 0.855. The importance of each predictor was similar to the first model. Using the model to predict the 2019 postseason data reveals a similar relationship as the tables above. The R-squared is 0.817 and the distribution of differentials (defined above) is as follows:

The model clearly tends to overestimate the distance, which suggests that the balls were carrying less in the postseason. In this case, it appears that viewers’ eyes were not deceiving them, and that similarly struck balls were not seeing the same results. There are a number of possibilities for why this was the case, including the fun option of MLB “de-juicing” the balls, but I will investigate one of these options. It is possible that the October air tends to result in less carry in general, and that this difference happens in every postseason. To examine this possibility, let’s compare the year and regular/postseason in one table:

Year	Season	Exit Velocity	Launch Angle	Distance
2015	Regular	92.8	24.8	307
2015	Post	93.9	25.1	311
2016	Regular	93.2	24.9	306
2016	Post	93.7	24.9	305
2017	Regular	93.2	25.0	307
2017	Post	93.5	24.9	308
2018	Regular	93.3	25.2	305
2018	Post	93.9	25.8	303
2019	Regular	93.7	25.2	308
2019	Post	94.6	25.2	306

Notice that in 2016, we also see an increase in average exit velocity, no change in launch angle, and a (slight) decrease in distance. On the other hand, in 2017 the slight increase in exit velocity is met with a slight increase in distance. There ultimately seems to be an inconsistent relationship when including the full subset of batted balls, but considering just home runs suggests there may in fact be a precedent:

Year	Season	Exit Velocity	Launch Angle	Distance
2015	Regular	103	27.8	400
2015	Post	105	27.7	400
2016	Regular	103	28.1	399
2016	Post	104	28.0	392
2017	Regular	103	28.0	400
2017	Post	103	28.1	393
2018	Regular	104	28.2	398
2018	Post	104	28.2	396
2019	Regular	104	28.2	400
2019	Post	105	27.8	397

From 2016-19, the average distance for home runs decreased in the postseason, despite no change or an increase in average exit velocity and minimal change in launch angle. In 2016 and 2017 there was a large decrease of 7 feet in average distance. 2015 was the only year where the average distance remained the same, but the average exit velocity was a full 2 mph greater in the playoffs.
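The postseason changes quoted above can be checked directly from the home run table. The short sketch below just subtracts the regular season averages from the postseason averages; the values are copied from the table, and the dictionary layout and names are my own.

```python
# Home run averages copied from the table above:
# (exit velocity in mph, launch angle in degrees, distance in feet).
home_runs = {
    2015: {"regular": (103, 27.8, 400), "post": (105, 27.7, 400)},
    2016: {"regular": (103, 28.1, 399), "post": (104, 28.0, 392)},
    2017: {"regular": (103, 28.0, 400), "post": (103, 28.1, 393)},
    2018: {"regular": (104, 28.2, 398), "post": (104, 28.2, 396)},
    2019: {"regular": (104, 28.2, 400), "post": (105, 27.8, 397)},
}

# Postseason minus regular season, so a negative distance change means less carry in October.
changes = {
    year: {
        "exit_velocity": rows["post"][0] - rows["regular"][0],
        "distance": rows["post"][2] - rows["regular"][2],
    }
    for year, rows in home_runs.items()
}

# Years in which the average home run was shorter in the postseason.
shorter_in_post = [year for year, c in changes.items() if c["distance"] < 0]

for year, c in changes.items():
    print(year, c)
print("shorter in postseason:", shorter_in_post)
```

Since the table reports rounded averages, these differences are only accurate to about a foot or a mile per hour.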

As the goal is to determine if there was a difference between the 2019 playoffs and previous postseasons, I trained another random forest model on 2015-18 playoff data, using the same 80-20 split. The 2015-18 test set R-squared is 0.557 and the 2019 test set R-squared is 0.541. The dropoff makes sense due to the significantly lower training set sample size. Note the 2019 differential distribution:

Although the distribution is not quite symmetric, the first and third quartiles are about evenly spaced from zero. There does not seem to be a difference between previous postseasons and 2019. This, coupled with the earlier result that the regular seasons were similar, would suggest that the decrease in carry in the playoffs happens in other years as well. To test this possibility, consider new random forest models that are trained on regular season data from each year separately and then tested on that year’s postseason. These models resulted in the following statistics and differentials:

Year	Regular Season Test Set R-squared	Playoff Test Set R-squared
2015	0.834	0.828
2016	0.812	0.757
2017	0.841	0.818
2018	0.805	0.843

The decrease in distance has indeed happened before, as recently as 2018. The 2018 model overestimated the distance almost as much as the 2019 model. The same was true, to a lesser degree, in 2016. The 2015 and 2017 models actually tended to underestimate the distance, but not as severely as the overestimates of 2018 and 2019. The ultimate takeaway here is that while there certainly seemed to be less carry in the postseason in 2019, it was not an unprecedented occurrence. The same thing happened in 2018 and no one seemed to notice, which circles back to the idea that we were analyzing every fly ball much more closely than in the past.

None of the results presented here are proof that the balls were the same in every year; there are other ways that a change in the ball itself could have led to more home runs. The data does suggest, however, that the ball was not carrying farther in 2019 and that the postseason dropoff in distance was not unique to 2019.

One last observation is that we consistently saw the average exit velocity increase along with the number of balls hit in the air. The key to the increase in home runs might thus be a change in conditions that allows hitters to strike the ball better in general. The simple explanation for this would be that hitters are improving. The flip side would be that pitchers are providing more pitches to hit, possibly due to a change in the seams of the baseball that many pitchers have voiced concerns over. One other interesting possibility, which would need to be tested, is that the flattened seams might make for more optimal contact. If the baseball is smoother, it might be reasonable to expect that the bat-to-ball contact is cleaner, and thus the quality of contact improves on average.
