## Home Runs and Temperature: Can We Test a Simple Physical Relationship With Historical Data?

Unlike most home-run-related articles written this year, this one has nothing to do with the recent home run surge, juiced balls, or the fly-ball revolution. Instead, this one’s about the influence of temperature on home-run rates.

Now, if you’re thinking *here comes another readily disproven theory about home runs and global warming* (**à la Tim McCarver in 2012**), don’t worry: that’s not where I’m going with this. In his 2012 Baseball Prospectus piece, Alan Nathan nicely settled the issue by demonstrating that temperature can’t come close to accounting for the large changes in home-run rates throughout MLB history.

In this article, I want to revisit Nathan’s conclusion because it presents a potentially testable hypothesis given a large enough data set. If you haven’t read his article or thought about the relationship between temperature and home runs, it comes down to simple physics. Warmer air is less dense. The drag force on a moving baseball is proportional to air density. Therefore (all else being equal), a well-hit ball headed for the stands will experience less drag in warmer air and thus have a greater chance of clearing the fence. Nathan took HitTracker and HITf/x data for all 2009 and 2010 home runs and, using a model, estimated how far they would have gone if the air temperature were 72.7°F rather than the actual game-time temperature. From the difference between estimated 72.7°F distances and actual distances, Nathan found a linear relationship between game-time temperature and distance. (No surprise, given that drag depends linearly on air density and, over the range of game-time temperatures, air density depends almost linearly on temperature.) Based on his model, he suggests that a warming of 1°F leads to a 0.6% increase in home runs.
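To put a rough number on the air-density part of that argument, here is a minimal sketch. It assumes dry air behaving as an ideal gas at a fixed sea-level pressure; those assumptions, and the function names, are mine rather than Nathan’s. It covers only density, not his full trajectory model.

```python
# Sketch: how much does dry-air density change per degree Fahrenheit?
# Assumptions (mine, not Nathan's): ideal gas, dry air, fixed sea-level pressure.

R_DRY_AIR = 287.05      # specific gas constant for dry air, J/(kg*K)
P_SEA_LEVEL = 101325.0  # standard sea-level pressure, Pa


def fahrenheit_to_kelvin(temp_f):
    """Convert a temperature from degrees Fahrenheit to kelvin."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


def air_density(temp_f, pressure_pa=P_SEA_LEVEL):
    """Dry-air density in kg/m^3 from the ideal gas law, rho = P / (R * T)."""
    return pressure_pa / (R_DRY_AIR * fahrenheit_to_kelvin(temp_f))


def density_change_pct_per_degf(temp_f):
    """Percent change in density when the air warms by 1 degree F from temp_f."""
    rho0 = air_density(temp_f)
    rho1 = air_density(temp_f + 1.0)
    return 100.0 * (rho1 - rho0) / rho0


if __name__ == "__main__":
    for t in (50.0, 72.7, 95.0):
        print(f"{t:5.1f} F: rho = {air_density(t):.4f} kg/m^3, "
              f"change = {density_change_pct_per_degf(t):+.3f} % per F")
```

Under these assumptions, density falls by a bit under 0.2% per °F across typical game-time temperatures, and that rate barely changes between 50°F and 95°F, which is why treating the relationship as linear is reasonable. The 0.6% figure for home runs is larger because it comes out of the full trajectory model, not from density alone.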

This should in principle be a testable hypothesis based on historical data: that the sensitivity of home runs per game to game-time temperature is roughly 0.6% per °F. The issue, of course, is that the temperature dependence of home-run rates is a tiny signal drowned out by much bigger controls on home-run production [e.g. changes in batting approach, pitching approach, PED usage, juiced balls (maybe?), field dimensions, park elevation, etc.]. To try to actually find this hypothesized temperature sensitivity we’ll need to (1) look at a massive number of realizations (i.e. we need a really long record), and (2) control for as many of these variables as possible. With that in mind, here’s the best approach I could come up with.

I used data (from Retrosheet) to find game-time temperature and home runs per game for every game played from 1952 to 2016. I excluded games for which game-time temperature was unavailable (not a big issue after 1995 but there are some big gaps before) and games played in domed stadiums where the temperature was constant (e.g. every game played at the Astrodome was listed as 72°F). I was left with 72,594 games, which I hoped was a big enough sample size. I then performed two exercises with the data, one qualitatively and one quantitatively informative. Let’s start with the qualitative one.

In this exercise, I crudely controlled for park effects by converting the whole data set from raw game-time temperatures ($T$) and home runs per game ($HR$) to what I’ll call $T^{*}$ and $HR^{*}$, differences from the long-term median $T$ and $HR$ values at each ballpark over the whole record. Formally, for any game, $T^{*}$ and $HR^{*}$ are defined such that

$$T^{*} = T - T_{\mathrm{med,park}} \qquad \text{and} \qquad HR^{*} = HR - HR_{\mathrm{med,park}},$$

where $T_{\mathrm{med,park}}$ and $HR_{\mathrm{med,park}}$ are median temperature and HR/game, respectively, at a given ballpark over the whole data set. A positive value of $HR^{*}$ for a given game means that more home runs were hit than in a typical ball game at that ballpark. A positive value for $T^{*}$ means that the game was warmer than is typical at that ballpark. Next, I defined “warm” games as those for which $T^{*} > 0$ and “cold” games as those for which $T^{*} < 0$. I then generated three probability distributions of $HR^{*}$ for: 1) all games, 2) warm games and 3) cold games. Here’s what those look like:

The tiny shifts of the warm-game distribution toward more home runs and of the cold-game distribution toward fewer home runs suggest that the influence of temperature on home runs is indeed detectable. It’s encouraging, but only useful in a qualitative sense. That is, we can’t test for Nathan’s 0.6% HR increase per °F based on this exercise. So, I tried a second, more quantitative approach.
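For readers who want to reproduce the first exercise, the park-relative anomalies could be computed along these lines. This is only a sketch: the tuple layout, the function names, and the toy numbers are made up for illustration, and it summarizes each group by its mean rather than plotting the full distributions.

```python
from collections import defaultdict
from statistics import mean, median


def park_anomalies(games):
    """Turn (park, temp_f, hr) records into (park, T*, HR*) records.

    T* and HR* are differences from that park's median temperature and
    median home runs per game over the whole input.
    """
    games = list(games)  # the records are iterated twice
    temps, hrs = defaultdict(list), defaultdict(list)
    for park, temp_f, hr in games:
        temps[park].append(temp_f)
        hrs[park].append(hr)
    med_t = {park: median(vals) for park, vals in temps.items()}
    med_hr = {park: median(vals) for park, vals in hrs.items()}
    return [(park, t - med_t[park], hr - med_hr[park]) for park, t, hr in games]


def mean_hr_anomaly_by_group(anomalies):
    """Mean HR* for all, warm (T* > 0), and cold (T* < 0) games."""
    groups = {
        "all": [hr for _, _, hr in anomalies],
        "warm": [hr for _, t, hr in anomalies if t > 0],
        "cold": [hr for _, t, hr in anomalies if t < 0],
    }
    # An empty group has no mean, so report None instead of raising.
    return {name: (mean(vals) if vals else None) for name, vals in groups.items()}


# Toy records, invented purely to exercise the functions.
toy_games = [
    ("A", 60, 1), ("A", 70, 2), ("A", 80, 3),
    ("B", 50, 0), ("B", 90, 4),
]
toy_anomalies = park_anomalies(toy_games)
toy_summary = mean_hr_anomaly_by_group(toy_anomalies)
```

Using the median rather than the mean follows the definition given above; games with a temperature exactly at the park median fall into neither the warm nor the cold group.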

The idea behind this second exercise was to look at the sensitivity of home runs per game to game-time temperature over a single season at a single ballpark, then repeat this for every season (since 1952) at every ballpark and average all the regression coefficients (sensitivities). My thinking was that by only looking at one season at a time, significant changes in the game were unlikely to unfold (i.e. it’s possible but doubtful that there could be a sudden mid-season shift in PED usage, hitting approach, etc.) but changes in temperature would be large (from cold April night games to warm July and August matinees). In other words, this seemed like the best way to isolate the signal of interest (temperature) from all other major variables affecting home run production.

Let’s call a single season of games at a single ballpark a “ballpark-season.” I included only ballpark-seasons for which there were at least 30 games with both temperature and home run data, leading to a total of 930 ballpark-seasons. Here’s what the regression coefficients for these ballpark-seasons look like, with units of % change in HR (per game) per °F:

A few things are worth noting right away. First, there’s quite a bit of scatter, but 75.1% of these 930 values are positive, meaning that in roughly three out of four ballpark-seasons, higher home-run rates were associated with warmer game-time temperatures, as expected. Second, unlike a time series of HR/game over the past 65 years, there’s no trend in these regression coefficients over time. That’s reasonably good evidence that we’ve controlled for major changes in the game at least to some extent, since the (linear) temperature *dependence* of home-run production should not have changed over time even though temperature itself has gradually increased (in the U.S.) by 1-2 °F since the early ’50s. (Third, and not particularly important here, I’m not sure why so few game-time temperatures were recorded in the mid-’80s Retrosheet data.)

Now, with these 930 realizations, we can calculate the mean sensitivity of HR/game to temperature, resulting in **0.76% per °F**. [Note that the scatter is large and the distribution doesn’t look very Gaussian (see below), but more Dirac-delta-like (1 std dev ~ 1.66%, but middle 33% clustered within ~0.4% of mean).]

Nonetheless, the mean value is remarkably similar to Alan Nathan’s 0.6% per °F.
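The second exercise can be sketched in a few lines. The record layout, function names, and toy numbers below are illustrative only, and one detail is my assumption: I convert each slope (HR per game per °F) to a percentage by dividing by that ballpark-season’s mean HR/game, which may differ from how the values above were normalized.

```python
from collections import defaultdict
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x; None if x never varies."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def ballpark_season_sensitivities(games, min_games=30):
    """Percent change in HR/game per degree F for each (park, season).

    games: iterable of (park, season, temp_f, hr) records.
    Ballpark-seasons with fewer than min_games records are skipped.
    """
    groups = defaultdict(list)
    for park, season, temp_f, hr in games:
        groups[(park, season)].append((temp_f, hr))

    sensitivities = {}
    for key, rows in groups.items():
        if len(rows) < min_games:
            continue
        temps = [t for t, _ in rows]
        hrs = [h for _, h in rows]
        slope = ols_slope(temps, hrs)
        mean_hr = mean(hrs)
        if slope is None or mean_hr == 0:
            continue  # no temperature spread, or no home runs to scale by
        # Assumed normalization: slope relative to mean HR/game, in percent.
        sensitivities[key] = 100.0 * slope / mean_hr
    return sensitivities


# Toy records, invented purely to exercise the functions (min_games lowered to 3).
toy_games = [
    ("A", 2000, 60, 1), ("A", 2000, 70, 2), ("A", 2000, 80, 3),
    ("B", 2000, 50, 2), ("B", 2000, 70, 2), ("B", 2000, 90, 2),
    ("C", 2000, 65, 1), ("C", 2000, 75, 3),  # too few games, skipped
]
toy_sens = ballpark_season_sensitivities(toy_games, min_games=3)
toy_mean_sens = mean(toy_sens.values())
```

Averaging the per-ballpark-season values at the end weights every ballpark-season equally, regardless of how many games it contains, which matches the simple mean described above.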

Although the data are pretty noisy, the fact that the mean is consistent with Nathan’s physical model-based result is somewhat satisfying. Now, just for fun, let’s crudely estimate how much of the league-wide trend in home runs can be explained by temperature. We’ll assume that the temperature change across all MLB ballparks uniformly follows the mean U.S. temperature change from 1952-2016 using NOAA data. In the top panel below, I’ve plotted total MLB-wide home runs per complete season (30 teams, 162 games), upscaling the totals from 154-game seasons (before 1961 in the AL, 1962 in the NL), strike-shortened seasons, and years with fewer than 30 teams accordingly. In blue is the expected MLB-wide HR total if the only influence on home runs is temperature and assuming the true sensitivity to be 0.6% per °F. No surprise, the temperature effect pales in comparison to everything else. Shown in the bottom plot is the estimated difference due to temperature alone in MLB-wide season home run totals from the 1952 value of 3,079 (again, after scaling to account for differences in number of games and teams). You can think of this plot as telling you how many of the total home runs hit in a season wouldn’t have made it over the fence if air temperatures had remained constant at 1952 levels.

While these anomalies comprise a tiny fraction of the thousands of home runs hit per year, one could make the case (with considerable uncertainty admitted) that as many as 59 of these extra temperature-driven home runs were hit in 2016 (or about two per team!).
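The temperature-only estimate is simple arithmetic, sketched below. The 1952 baseline of 3,079 and the 0.6% per °F sensitivity come from the text above; the purely linear scaling and the warming values in the loop are my assumptions for illustration, not the NOAA series used for the plots.

```python
BASE_HR_1952 = 3079   # scaled 1952 MLB-wide home run total quoted above
SENSITIVITY = 0.006   # Nathan's estimate: 0.6% more home runs per +1 degree F


def temperature_only_hr(delta_t_f, base_hr=BASE_HR_1952, sensitivity=SENSITIVITY):
    """Expected season HR total if temperature were the only thing that changed.

    delta_t_f is the warming relative to 1952 in degrees F. Linear scaling
    is an assumption of this sketch.
    """
    return base_hr * (1.0 + sensitivity * delta_t_f)


def extra_hr(delta_t_f, base_hr=BASE_HR_1952, sensitivity=SENSITIVITY):
    """Home runs attributable to warming alone, relative to the baseline."""
    return temperature_only_hr(delta_t_f, base_hr, sensitivity) - base_hr


if __name__ == "__main__":
    for dt in (0.5, 1.0, 2.0):  # hypothetical warming values, in degrees F
        print(f"+{dt:.1f} F -> {extra_hr(dt):.1f} extra home runs")
```

Under these assumptions, each 1°F of warming relative to 1952 adds roughly 18 home runs to the league-wide season total.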