Why Are so Many Runs Scored in the Bottom of the First Inning?
by gregstoll
June 17, 2021

After starting to look at some inning-by-inning data from my baseball win expectancy finder for another project, I stumbled across something weird that I can’t explain. Here’s a graph of expected runs scored per inning:

Check out how high the bottom of the first inning is. On average, 0.6 runs are scored then, compared to 0.5 runs in the top of the first. That’s a huge difference! Let’s look closer:

Holy outlier, Batman! So what’s going on? Here are some ideas:

Teams score more in the first inning because the top of the lineup is at bat. This is true! You can see in the top graph that the expected runs scored in the first inning is the highest for both the home and visiting teams (see this Beyond the Box Score article that discusses this). But that doesn’t nearly explain why the home team does so much better than the visiting team!

Starting pitchers are more likely to have a terrible first inning. This might be true, but I can’t think of any reason why this would affect visiting starting pitchers more than home starting pitchers. I also made a graph of the home advantage for each number of runs scored for the first and third inning (I picked the third inning because that’s the second-greatest difference between home and visitor):

To me, these look almost exactly the same shape, so it’s not like the first inning has way more six-run frames or anything.

This is just random chance. I guess that’s possible, but the effect seems large given that the data has more than 130,000 games.

There’s a bug in my code. Maybe! I’ve been writing code for 20 years, and let me tell you, this is certainly possible! In fact, I found a bug in handling walk-off innings in the existing runs per inning code after seeing some weird results in this investigation. But it would be weird to have a bug that just affects the bottom of the first inning since it isn’t at the start or end of the game.
I also implemented it in both Rust and Python, and the results match. But feel free to check: the Rust version is here and the Python version is here.

This is different between baseball eras. I don’t know why this would be true, but it was easy enough to test out, and the difference is pretty consistent (see the raw data).

The fact that home teams are usually better in the playoffs adds bias to this. I think this is a tiny bit true, but I reran the numbers with only regular season games (where being the better team is uncorrelated with being the home or visiting team) and the difference looks almost exactly the same.

In conclusion, I don’t know! But a few people have suggested that the visiting pitcher has to wait a while between warming up and pitching in the bottom of the first. Tom Tango made a similar observation a while ago. I dive into this more in the follow-up post here.

Odds and ends:

That top “expected runs per inning” graph has some other neat properties. For example, you can see that the second inning is the lowest-scoring inning, presumably because players near the bottom of the lineup are usually up. Another thing you can see is how robust the home field advantage is. On average, the home team scores a little more than the visiting team in every inning!

The graph only shows eight innings because in the ninth, things get complicated. For one thing, the bottom of the ninth inning only happens if the home team is behind or tied, which biases the sample somewhat. Also, if the game is tied and the home team hits a leadoff home run, they win the game but lose the opportunity to score any more runs.

You can also notice the strangeness of the bottom of the first inning another way. If you look at the chance that the home team will win when the game is tied, their chances are better at the beginning of the bottom of the ninth than the bottom of the eighth because they have an extra chance to bat.
That advantage gets lower the earlier in the game you go, with one exception. With the game tied at the beginning of the bottom of the first, the home team has a ~59% chance to win, but at the beginning of the bottom of the second, that goes down to ~58%! The reason is that if the home team misses their chance to score runs in the bottom of the first, they’ve apparently missed a big opportunity!

The raw report data can be found here in the GitHub repo.

This piece originally ran on my blog.
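
For anyone who wants to poke at the “is this just random chance?” question themselves, here is a minimal Python sketch of the kind of calculation behind the per-inning averages. It is not the Rust or Python code linked above. The input format, the `GAMES` toy data, and both function names are assumptions invented for this illustration, and the numbers are made up rather than real results. It averages runs per half-inning for innings one through eight, then computes the mean home-minus-visitor difference for a single inning along with a rough standard error.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical input format: one (visitor_runs, home_runs) pair per game,
# where each list holds runs scored in innings 1, 2, 3, ...
# Toy data for illustration only; these are not real games.
GAMES = [
    ([0, 0, 1, 0, 0, 0, 1, 0, 0], [2, 0, 0, 0, 1, 0, 0, 0]),
    ([1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 2]),
    ([0, 0, 0, 2, 0, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0, 1, 0]),
    ([0, 1, 0, 0, 0, 0, 0, 0, 3], [1, 0, 0, 0, 0, 0, 0, 0, 0]),
]


def mean_runs_by_half_inning(games, max_inning=8):
    """Return {inning: (visitor_mean, home_mean)} for innings 1..max_inning.

    The ninth inning is skipped by default because the bottom of the ninth
    is only played when the home team is tied or behind.
    """
    result = {}
    for inning in range(1, max_inning + 1):
        idx = inning - 1
        visitor = [v[idx] for v, h in games if len(v) > idx]
        home = [h[idx] for v, h in games if len(h) > idx]
        if visitor and home:
            result[inning] = (mean(visitor), mean(home))
    return result


def home_edge_with_stderr(games, inning):
    """Mean of (home - visitor) runs in one inning and its standard error.

    Uses the per-game paired difference, so it needs at least two games
    in which both halves of the inning were played.
    """
    idx = inning - 1
    diffs = [h[idx] - v[idx] for v, h in games if len(v) > idx and len(h) > idx]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))


by_inning = mean_runs_by_half_inning(GAMES)
edge, stderr = home_edge_with_stderr(GAMES, inning=1)
```

With real data, a home edge that is many standard errors away from zero would be hard to attribute to chance alone. With four toy games, the standard error is of course enormous.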