Author Archive

Is the Baseball Actually Juiced?

Home runs are on the rise. We all know this. The number of homers per game is at an all-time high in 2019, and has increased by about 36% just since 2015:

Home Run Rate
Year HR/game
2015 1.01
2016 1.16
2017 1.26
2018 1.15
2019 1.37
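
The "about 36%" figure can be checked directly from the table. A minimal sketch follows; the dictionary and the `pct_change` helper are illustrative names, not part of the original analysis:

```python
# HR/game values copied from the table above.
hr_per_game = {2015: 1.01, 2016: 1.16, 2017: 1.26, 2018: 1.15, 2019: 1.37}


def pct_change(old, new):
    """Percent change going from `old` to `new`."""
    return (new - old) / old * 100.0


increase = pct_change(hr_per_game[2015], hr_per_game[2019])
print(f"2015 -> 2019: {increase:.1f}%")  # prints "2015 -> 2019: 35.6%"
```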

What we do not know is exactly why.

Commissioner Manfred recently suggested that the current baseballs have less drag through the air, caused by better “centering of the pill” (the innermost part of the ball). It is now widely accepted that something is going on with the baseballs. Manfred’s explanation implies that the flight of the baseball is the key difference.

To look at this more closely, I considered the distance traveled by balls in the air as a function of the exit velocity and launch angle at contact. If the average distance on similarly struck balls has increased over time, it would suggest that the ball itself has become more aerodynamically efficient.

Pitch-by-pitch data for the 2015-2019 seasons was collected from Baseball Savant via the Statcast Search page. Two random forest models were built for each year, one using all fly balls and one using only home runs. To account for a possible difference in flight due to the warm air in the summer months, only data through June of each year was used. (At the end of the season, the analysis can be applied to the full data set.) In both cases, the distance the ball traveled is the response variable, and the exit velocity and launch angle are the explanatory variables. The models are applied to a test data set of various exit velocity/launch angle combinations.
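
The core comparison (did similarly struck balls carry farther in a later season?) can be illustrated without a full model. The sketch below is a simplified stand-in for the random forests, not the method used in the post: it groups batted balls into exit velocity/launch angle bins and compares the mean distance per bin between two seasons. The function names, the 2 mph by 2 degree bin size, and the sample rows are all assumptions made up for illustration, not Statcast data.

```python
from collections import defaultdict
from statistics import mean


def bin_key(exit_velo, launch_angle, velo_step=2.0, angle_step=2.0):
    """Map a batted ball to a coarse (exit velocity, launch angle) bin."""
    return (int(exit_velo // velo_step), int(launch_angle // angle_step))


def mean_distance_by_bin(batted_balls):
    """batted_balls: iterable of (exit_velo_mph, launch_angle_deg, distance_ft)."""
    groups = defaultdict(list)
    for exit_velo, launch_angle, distance in batted_balls:
        groups[bin_key(exit_velo, launch_angle)].append(distance)
    return {key: mean(values) for key, values in groups.items()}


def carry_change(earlier, later):
    """Average change in distance (ft) over bins that appear in both seasons."""
    before = mean_distance_by_bin(earlier)
    after = mean_distance_by_bin(later)
    shared = before.keys() & after.keys()
    if not shared:
        return None  # no comparable batted balls
    return mean(after[key] - before[key] for key in shared)


# Made-up rows for illustration only.
season_a = [(100.3, 28.5, 395.0), (101.1, 29.0, 401.0), (95.2, 32.1, 360.0)]
season_b = [(100.9, 28.1, 404.0), (95.8, 33.0, 366.0), (108.0, 20.0, 420.0)]

print(carry_change(season_a, season_b))
```

A positive result means that balls in the shared bins traveled farther in the later season. A fitted model such as a random forest plays the same role as the bin averages here, but it smooths across neighboring exit velocity/launch angle combinations instead of requiring exact bin matches.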

Ballpark Attendance and Starting Pitchers

When I am thinking about buying a ticket to a baseball game, often my first question is “Who’s pitching?” I have always felt that the most enjoyable type of game is one in which a great starter is on the mound. Is this feeling common among fans or do they buy tickets regardless of the starting pitcher?

To answer this question, I trained random forest models to predict attendance for games based on situational factors (not including the starting pitcher). Then I considered how the quality of starting pitchers relates to whether the models overestimate or underestimate the attendance. If the models consistently underestimate attendance when star pitchers are on the mound, it would suggest more tickets are sold because of the starter.
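
The residual check described above can be sketched as follows. This assumes the attendance predictions are already available from some model. The tier labels, the function name, and the attendance numbers are hypothetical and chosen only to show the calculation.

```python
from collections import defaultdict
from statistics import mean


def mean_residual_by_tier(games):
    """games: iterable of (starter_tier, actual_attendance, predicted_attendance).

    A positive mean residual means the model underestimated attendance
    for games in that tier.
    """
    residuals = defaultdict(list)
    for tier, actual, predicted in games:
        residuals[tier].append(actual - predicted)
    return {tier: mean(values) for tier, values in residuals.items()}


# Made-up games for illustration only.
games = [
    ("star", 41000, 38500),
    ("star", 36500, 35000),
    ("average", 30000, 30500),
    ("average", 28000, 27500),
]

result = mean_residual_by_tier(games)
print(result)
```

In this toy example the "star" tier has a mean residual of 2,000 and the "average" tier has a mean residual of 0, which is the pattern that would point to the starter selling extra tickets.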


Information about each game was collected from Retrosheet’s game logs. In accordance with Retrosheet’s terms of use, please note the following statement: “The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at” Pitcher performance data was gathered from FanGraphs. In addition, the people.csv data set found here was used to match player ids from Retrosheet to FanGraphs.

Peter Alonso Has Adjusted — And Fast

Peter Alonso began 2019 by pummeling a belt-high fastball over the center field wall on the first pitch he saw in Spring Training. He has not stopped hitting since. Over his first 66 games in the regular season, he has slashed .254/.337/.596 with 22 homers and a .382 wOBA. The stats are impressive, but perhaps the most notable aspect of his success has been his ability to modify his approach in short order.

Over the first few weeks of the season, Alonso built an early reputation as a low-ball hitter. Even pitches well below the strike zone were getting sent over the fence. His slugging percentage by pitch zone reflects this low-ball dominance:

Luckily for Alonso, pitchers had not yet caught on to his affinity for the low pitch. The pitch distribution chart below reveals that he was seeing a plurality of pitches at or below the middle of the zone.

This proved a lethal combination, as Alonso steamrolled his way through April.