The Leadoff Walk

We’ve all heard a broadcaster lament the impending doom of a leadoff walk, yet they never seem to predict the same fate after a leadoff single. I thought it would be interesting to look at each of the ways a batter can lead off an inning by reaching first base and see whether the method affects how often that runner goes on to score. I took the Retrosheet data since 1952 (not including this year), which I have as a MySQL database, and wrote a quick Python script to determine the results. I then took it further and examined whether the breakdown is any different in late-game situations, as I’m always hearing “You never want to walk the leadoff batter, but especially late in close ball games”. I was also curious whether, in general, more solitary runs get manufactured once a leadoff runner gets on base in late-game situations.
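For readers who want to try something similar, here is a minimal sketch of the kind of tally described above. It is not the original script: the records are made-up toy data, and the field names (`inning`, `leadoff_event`, `reached_first`, `leadoff_scored`) are hypothetical rather than the Retrosheet or MySQL schema.

```python
from collections import defaultdict

# Toy, made-up records: one dict per half-inning. The field names are
# hypothetical and are NOT the Retrosheet or MySQL schema used in the post.
half_innings = [
    {"inning": 1, "leadoff_event": "Single", "reached_first": True,  "leadoff_scored": True},
    {"inning": 3, "leadoff_event": "Walk",   "reached_first": True,  "leadoff_scored": False},
    {"inning": 5, "leadoff_event": "Double", "reached_first": False, "leadoff_scored": True},
    {"inning": 7, "leadoff_event": "Walk",   "reached_first": True,  "leadoff_scored": True},
    {"inning": 8, "leadoff_event": "HBP",    "reached_first": True,  "leadoff_scored": False},
    {"inning": 9, "leadoff_event": "Single", "reached_first": True,  "leadoff_scored": False},
]


def tally(records, first_inning=1, last_inning=None):
    """Count (times reached first, times scored) per leadoff event type."""
    counts = defaultdict(lambda: [0, 0])
    for rec in records:
        if not rec["reached_first"]:
            continue  # only leadoff batters who ended up on first base
        if rec["inning"] < first_inning:
            continue
        if last_inning is not None and rec["inning"] > last_inning:
            continue
        counts[rec["leadoff_event"]][0] += 1
        counts[rec["leadoff_event"]][1] += int(rec["leadoff_scored"])
    return {event: tuple(pair) for event, pair in counts.items()}


early = tally(half_innings, last_inning=6)   # 6th inning or earlier
late = tally(half_innings, first_inning=7)   # 7th inning or later
print(tally(half_innings), early, late)
```

The inning bounds are only there so that the same function can produce an "any inning" tally and the early/late split.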

Total times a batter led off an inning by getting to first: 508312
Total times runner scored: 192150

So a leadoff batter who starts on first base scores 37.80% of the time. Here is the breakdown by how they got aboard:

Any inning

Single      325455 Scored 122662   37.69%
Walk        150570 Scored  57189   37.98%
HBP          11865 Scored   4600   38.77%
Error        19260 Scored   7270   37.74%
Strikeout     1007 Scored    375   37.24%
Catcher's Int. 155 Scored     54   34.84%
Totals      508312 Scored 192150   37.80%
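The percentages can be rechecked directly from the counts. A small sketch, with the counts copied from the "Any inning" table (the helper name `scoring_rate` is just for illustration, and the last digit may differ from the table depending on rounding):

```python
# (reached first, scored) counts copied from the "Any inning" table.
any_inning = {
    "Single": (325455, 122662),
    "Walk": (150570, 57189),
    "HBP": (11865, 4600),
    "Error": (19260, 7270),
    "Strikeout": (1007, 375),
    "Catcher's Int.": (155, 54),
}


def scoring_rate(reached, scored):
    """Percentage of leadoff runners on first who went on to score."""
    return 100.0 * scored / reached


total_reached = sum(reached for reached, _ in any_inning.values())
total_scored = sum(scored for _, scored in any_inning.values())

for name, (reached, scored) in any_inning.items():
    print(f"{name:<15}{reached:>7} Scored {scored:>7} {scoring_rate(reached, scored):6.2f}%")
print(f"{'Totals':<15}{total_reached:>7} Scored {total_scored:>7} "
      f"{scoring_rate(total_reached, total_scored):6.2f}%")
```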

So there doesn’t appear to be much of a statistically significant difference between the walk and the single. The HBP number seems to be a bit of an outlier; I’m wondering whether that is just sample size or whether such an outcome rattles the pitcher to the point that more runs are produced.

Let’s now examine the breakdown based upon the stage of the game.

6th inning or earlier

Single      217421 Scored  83243 38.29%
Walk        100587 Scored  38798 38.57%
HBP           7879 Scored   3070 38.96%
Error        12778 Scored   4880 38.19%
Strikeout      645 Scored    244 37.83%
Catcher's Int. 107 Scored     36 33.64%
Totals      339417 Scored 130271 38.38%

7th inning or later

Single     108034 Scored 39419 36.49%
Walk        49983 Scored 18391 36.79%
HBP          3986 Scored  1530 38.38%
Error        6482 Scored  2390 36.87%
Strikeout     362 Scored   131 36.19%
Catcher's Int. 48 Scored    18 37.50%
Totals     168895 Scored 61879 36.64%

Interesting that the scoring rate for leadoff runners reaching first is 1.74 percentage points higher in the earlier innings. Is this a comment on the failure of manufacturing runs, or on pitching being different in the later stages of the game? Perhaps a deeper look based upon “close game situations” is in order for that.

29 Comments
12 years ago

Couple of things:

The only reasonable explanation I can think of for the slight elevation in runs scored off of a lead-off walk versus a lead-off single is that when a lead-off walk is issued, it’s more likely that you’re dealing with a pitcher who has control problems. It’s even more likely when the leadoff runner reaches via HBP. Based on this explanation you might expect to see more lead-off HBP’s score than lead-off walks than lead-off hits, which you apparently do.

Regarding the lower likelihood of lead-off runners scoring in later innings I think that definitely has to do with the likelihood of the pitcher being changed in response to bad pitching outcomes.

12 years ago

Nice! I was always curious about that stat. But I also always thought that a leadoff walk (or, in this case, a leadoff HBP) would usually score, because it’s early evidence that the pitcher is losing his control.
In other words: Obviously, no pitcher wants to walk the leadoff batter, nor hit him with a pitch, but when that does happen, that guy usually scores, because the pitcher is losing his control over the outcome of pitches. He starts to miss his targets, and when that happens, he usually gets hit.
I don’t know if there is a way to do this, but if there were, it would be interesting to see if leadoff walks score as often (37.8%) when the pitcher who walks them gets replaced by a reliever. I would bet that inherited leadoff walks wouldn’t score as much. But that’s just me…

12 years ago

All of those numbers are within the sampling error, which should be listed explicitly with data like this.

1 SD for N trials with an expected success rate p = sqrt(p*(1-p)/N)

So for all innings:

Singles = 38.29 +/- 0.09%
Walks = 37.98 +/- 0.13%
HBP = 38.77 +/- 0.44%
Error = 37.74 +/- 0.35%

That’s all within the margin of error, and there’s no reason to reject the null hypothesis that they all have the same true scoring percentage.
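The 1 SD formula above is easy to apply to any row. A minimal sketch, using the any-inning counts from the post (the function name `binomial_sd` is just a label chosen here):

```python
import math


def binomial_sd(p, n):
    """One standard deviation of an observed rate p over n trials."""
    return math.sqrt(p * (1.0 - p) / n)


# (times reached first, times scored), copied from the "Any inning" table.
rows = {
    "Singles": (325455, 122662),
    "Walks": (150570, 57189),
    "HBP": (11865, 4600),
    "Error": (19260, 7270),
}

for name, (n, scored) in rows.items():
    p = scored / n
    print(f"{name} = {100 * p:.2f} +/- {100 * binomial_sd(p, n):.2f}%")
```

This treats every leadoff plate appearance as an independent trial with the same true rate, which is a simplification.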

The differences between the early and late innings are statistically significant though, it would seem.
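One way to check the early versus late claim is a pooled two-proportion z-test on the totals from the post. This is a standard textbook test chosen here for illustration, not necessarily what the commenter had in mind, and it assumes the leadoff events are independent:

```python
import math


def two_proportion_z(k1, n1, k2, n2):
    """z statistic for the difference between two observed rates (pooled SE)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Totals rows from the post: (scored, reached first).
early_scored, early_reached = 130271, 339417   # 6th inning or earlier
late_scored, late_reached = 61879, 168895      # 7th inning or later

z = two_proportion_z(early_scored, early_reached, late_scored, late_reached)
print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.3g}")
```

The same two functions can be pointed at any pair of rows, for example walks versus singles.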

12 years ago

a good collection of info, but the leadoff walk is so dreaded because it’s your own fault. If they get a single they’ve earned it.

It’s not that the walk is so bad, it’s just that the walk is a strategic error.

Also, look at it this way, how often does a leadoff walk score (37%) vs how often a ball put in play scores? If you make them put it in play it’s more like a 10% chance to score. (the odds of getting on x the odds of scoring after you get on)

So indeed, a lead off walk is far worse than an average ball put in play.
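The multiplication in that comment can be written out as below. The 0.30 chance of reaching base on a ball in play is an assumed, illustrative figure that does not come from the post, and the sketch pretends that every batter who reaches ends up on first, so it ignores extra-base hits and home runs:

```python
# From the post: share of leadoff runners on first who go on to score.
P_SCORE_GIVEN_ON_FIRST = 0.3780

# ASSUMPTION for illustration only, not from the post: chance that a
# ball put in play results in the batter reaching base.
P_REACH_ON_BALL_IN_PLAY = 0.30


def chance_to_score(p_reach, p_score_given_on):
    """Odds of getting on times the odds of scoring once on."""
    return p_reach * p_score_given_on


walk = chance_to_score(1.0, P_SCORE_GIVEN_ON_FIRST)  # a walk always reaches
in_play = chance_to_score(P_REACH_ON_BALL_IN_PLAY, P_SCORE_GIVEN_ON_FIRST)
print(f"leadoff walk: {100 * walk:.1f}%  ball in play: {100 * in_play:.1f}%")
```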

12 years ago

I would think that leadoff hitters who reach in the 7th inning and later are less likely to score because the team is more likely to pull the ineffective pitcher and put in a more effective one.

12 years ago

What Joe IQ said.

12 years ago

First of all, this Community Blog thing is great. I have loved the two that I have read so far. Great work, Plen.

Second, I have to agree with Evan that the difference between the late-inning and early-inning numbers is likely due to pitching changes. For instance, in a close game (7th inning or later), you are MUCH more likely to see a platoon matchup for a replacement pitcher.

Lastly, as LarryinLA points out, there are definitely some statistically significant differences in these numbers (I think the sample size being half a million is cool). It would seem to me that figuring out the potential causes of the difference would require looking at the follow-up moves/events (pitcher change, follow-up BB, sac, hit, etc.) and the situation (close game, etc.).

For instance, if we say that late game BBs are followed by pitching changes but not early game BBs that would tell us something. Likewise, if we know that after a HBP the next hitter is more likely to see the same pitcher than after a BB, it could tell us something.

Great stuff. Thanks.

12 years ago

I think the likelihood that the runner gets bunted to 2nd is higher in the later innings. Does a run score more often or less often when a runner is bunted over? I know that the expected runs after a bunt are lower, but is the chance that the one specified runner scores higher?

12 years ago

Ditto the above posters. wOBA is about .015 lower in innings 7-9 than it is in innings 1-6.

Bobby Boden
12 years ago

Peak scoring happens between the 5th and 7th innings. I have no evidence to prove why, but I suspect it has to do with the fact that your starting pitchers are tired by the 5th/6th/7th inning, and hitters have gotten a better look at them. If your starters are yanked, then your bottom-tier bullpen arms are brought in. In the 8th and 9th, you’ve typically got your better bullpen arms, who are fresh and only facing batters once, so those 2 innings tend to be lower scoring than the rest (particularly the 9th, where your closer (best relief pitcher) usually resides).

The Duder
12 years ago

Selection bias.

Ted Hoppe
12 years ago

The recent trend is for pitchers to throw only 6 or 7 innings and then have a middle reliever take over. This wasn’t always the case. So measuring the 4th and 5th innings, where starters have been through the order once, might perhaps be a better indicator, and good and mediocre pitchers will average out. The ninth inning, where a closer is called in, but not always, may also lower the 37% rate. Blown saves may have higher rates than the 37% rate. I would be interested in knowing if there are any significant differences by decade, but where does one stop: by team, by pitcher?
Great stats! Thanks, I would welcome more.

12 years ago

It would be interesting to see if the likelihood of scoring has changed over the years. In particular, it would be interesting if you see that only in the bullpen era does the difference between early game and late game scoring exist.

While I’d assume that the relief pitching suggestion is the cause of the difference between early and late game scoring, I’ll offer another possible explanation as well. The early part of the game includes the first inning. Generally, over the period covered by the data, the leadoff hitter in the first inning is usually (1) fast and therefore more likely to steal bases, score from second on a single, etc. and (2) good enough at getting on base to be batting leadoff (even if OBP wasn’t considered as important in eras past). Plus, the leadoff man is always followed by the better hitters in the lineup. I imagine that creates a (probably very) small bias, as a player reaching base to lead off the first inning might be slightly more likely to score than a player reaching base to lead off other innings. It might be interesting to see these numbers broken out by inning and/or by position in the batting order. In the first inning, the leadoff man is the only batter who could lead off. In other innings, there’s a distribution of who could be leading off, and it’s possible that could matter.

Nathaniel Dawson
12 years ago

An interesting follow-up on that would be checking the difference in late game scoring to early game scoring throughout the years. Relief pitchers are used much more these days, possibly widening that variance over time.

12 years ago

Great stuff, Plen. I don’t really have much to add, but I wanted to thank you for putting the work in to research this.

12 years ago

Not sure if this was already asked, but what about IBB? Were those looked at? Because the next batter will most likely be getting challenged more and have a greater opportunity to drive a pitch.

12 years ago

plen, you’ve shown that the percentage is significantly higher in the first inning. Does that explain the difference between early and late in games? If you removed the first inning, does the percentage look about the same for innings 2-6 as for innings 7+?

The common explanation was that a fresh reliever better prevented the run from scoring in the late innings. If that was true, you’d expect to see the percentages decline as the years progressed and relievers became more prevalent. Your data suggests the opposite. From 1952 through 1976, the percentage never climbed as high as 38%. It hasn’t been below 38% since 1993. Obviously, this doesn’t look at just the late-inning cases, and the addition of the DH in 1973 should be responsible for some of the increase, but it still looks like there’s been a trend that runs counter to what our initial belief was.

12 years ago

I’ve got a couple of thoughts (with the caveat that, as Larry pointed out, your differences are still within the margin of error; note that Larry’s singles number was the one for innings 1-6).

1) If you can, and continuing on with badenjr’s idea, break down the difference by lineup position. Of course, maybe this will result in too few samples, especially for the eighth and ninth hitters.

2) I’d also break it down by n’th time seeing that pitcher/batter matchup in that game. I think this will equalize for bullpen usage over the years, as there would be more 3rd and 4th time matchups in the earlier years.

12 years ago

The leadoff hitter tends to be a faster runner. They ALWAYS lead off the first inning, hence the increase.
The second inning, the 4-5-6 (slower) hitters tend to start the inning, hence it’s harder to score. (Especially since they have to rely on the 6-7-8-9 hitters to drive them in).
That seems to clearly explain the increase % in the first inning and the decrease in the 2nd inning.
9th inning, your closer is in the game, hence the lower %.

I’d like to think innings 3-6 vs 7-8 would be fairly equal.

my $0.02

10 years ago

Just read this post, it’s awesome. I’m curious though, do you know where I can find data on the events that transpire on the next at bat (after a lead-off walk)? I have a few theories, like increased errors (possibly catching his defense a little bit more off-guard from the lack of engagement on the last at bat) and increased hits (since the pitcher doesn’t want to fall behind, giving the batter a chance to tee off on a first pitch over the plate).

10 years ago

Same as kbertling353 says:

10 years ago

really nice article and good posts…ONE thing I’d like to see is pct of times the leadoff baserunner scores if he is on 2nd base vs 1st base. And the difference between whether they got to first on a double or stole while the 2nd batter was still batting. Just curious since this is considered scoring position where they can score on a single.

Lastly…the other time a leadoff runner might get on first is if they get on via an error. Can’t imagine the pct would be much different other than due to small sample size similar to IBB

9 years ago

Am I correct that the data shows the percent of the time that the actual player who led off with a walk, single, etc. scores? Does anyone know the percent of the time a team scores in an inning when the leadoff man reaches first? For example, a leadoff walk followed by a fielder’s choice force out at second; now there is a man on 1st with one out, and he eventually scores.

7 years ago

The differences between the ways a player can reach are likely noise. They might be showing some sort of proxy for command, or the hit types might somehow select for different parts of the batting order, but I highly doubt that can be teased out from here.

With respect to innings, it’s already well-known that the later innings have a lower run environment so that’s what you’re seeing. This is caused by cooler weather and bullpens.

Matt Murray
7 years ago

Is there data to compare four-pitch leadoff walks vs. non-four-pitch walks?