The Leadoff Walk
We’ve all heard a broadcaster comment on the impending doom of a leadoff walk, yet they never seem to predict the same fate after a leadoff single. I thought it would be interesting to look at each of the ways a player can lead off an inning by reaching first base and see whether the method affects how often that runner goes on to score. I took the Retrosheet data since 1952 (not including this year), which I have as a MySQL database, and wrote a quick Python script to determine the results. I took it further and examined whether the breakdown was any different in late-game situations, as I’m always hearing “You never want to walk the leadoff batter, but especially late in close ball games”. I was also curious whether, in general, more solitary runs get manufactured once a leadoff runner gets on base in late-game situations.
Total times a batter led off an inning by getting to first: 508312
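The tallying step described above might look roughly like the sketch below. It is only an illustration: the records are hand-made, the field names (`inning`, `reached_by`, `scored`) are hypothetical rather than Retrosheet column names, and the original script is not shown in this post.

```python
from collections import defaultdict

# HYPOTHETICAL records: one per inning whose leadoff batter reached first.
# Field names are illustrative only, not Retrosheet column names.
leadoffs = [
    {"inning": 1, "reached_by": "Walk", "scored": True},
    {"inning": 3, "reached_by": "Single", "scored": False},
    {"inning": 8, "reached_by": "Single", "scored": True},
    {"inning": 9, "reached_by": "HBP", "scored": False},
    {"inning": 7, "reached_by": "Walk", "scored": False},
]

def tally(records, late_from=7):
    """Count (reached, scored) per method, split into early and late innings."""
    counts = defaultdict(lambda: [0, 0])
    for rec in records:
        stage = "late" if rec["inning"] >= late_from else "early"
        cell = counts[(stage, rec["reached_by"])]
        cell[0] += 1                    # leadoff batter reached first
        cell[1] += int(rec["scored"])   # and that same runner later scored
    return {key: tuple(val) for key, val in counts.items()}

table = tally(leadoffs)
for (stage, method), (reached, scored) in sorted(table.items()):
    print(f"{stage:5s} {method:6s} {reached} Scored {scored} {100 * scored / reached:.2f}%")
```

The `late_from=7` default mirrors the "7th inning or later" split used in the tables.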
Total times runner scored: 192150
So a leadoff batter who starts on first base scores 37.80% of the time. Here is the breakdown by how they got aboard:
Any inning
Single 325455 Scored 122662 37.69%
Walk 150570 Scored 57189 37.98%
HBP 11865 Scored 4600 38.77%
Error 19260 Scored 7270 37.74%
Strikeout 1007 Scored 375 37.24%
Catcher's Int. 155 Scored 54 34.84%
Totals 508312 Scored 192150 37.80%
So there doesn’t appear to be much of a statistically significant difference between the walk and the single. The HBP number seems to be a bit of an outlier; I’m wondering if that is just sample size or if such an outcome rattles the pitcher to the point that more runs are produced.
Let’s now examine the breakdown based upon the stage of the game.
6th inning or earlier
Single 217421 Scored 83243 38.29%
Walk 100587 Scored 38798 38.57%
HBP 7879 Scored 3070 38.96%
Error 12778 Scored 4880 38.19%
Strikeout 645 Scored 244 37.83%
Catcher's Int. 107 Scored 36 33.64%
Totals 339417 Scored 130271 38.38%
7th inning or later
Single 108034 Scored 39419 36.49%
Walk 49983 Scored 18391 36.79%
HBP 3986 Scored 1530 38.38%
Error 6482 Scored 2390 36.97%
Strikeout 362 Scored 131 36.19%
Catcher's Int. 48 Scored 18 37.50%
Totals 168895 Scored 61879 36.64%
Interesting that leadoff runners reaching first score 1.74 percentage points more often in the earlier innings (38.38% vs. 36.64%). Is this a comment on the failure of manufacturing runs, or on pitching being different in the later stages of the game? Perhaps a deeper look based upon “close game situations” is in order for that.
Couple of things:
The only reasonable explanation I can think of for the slight elevation in runs scored off of a lead-off walk versus a lead-off single is that when a lead-off walk is issued, it’s more likely that you’re dealing with a pitcher who has control problems. It’s even more likely when the leadoff runner reaches via HBP. Based on this explanation, you would expect lead-off HBPs to score more often than lead-off walks, and lead-off walks more often than lead-off hits, which is apparently what you see.
Regarding the lower likelihood of lead-off runners scoring in later innings, I think that definitely has to do with the likelihood of the pitcher being changed in response to bad pitching outcomes.
Nice! I was always curious about that stat. But I also always thought that a leadoff walk (or, in this case, a leadoff HBP) would usually score, because it’s early evidence that the pitcher is losing his control.
In other words: Obviously, no pitcher wants to walk the leadoff batter, nor hit him with a pitch, but when that does happen, that guy usually scores, because the pitcher is losing his control over the outcome of pitches. He starts to miss his targets, and when that happens, he usually gets hit.
I don’t know if there is a way to do this, but if there were, it would be interesting to see if leadoff walks score as often (37.8%) when the pitcher who walks them gets replaced by a reliever. I would bet that inherited leadoff walks wouldn’t score as much. But that’s just me…
All of those numbers are within the sampling error, which should be listed explicitly with data like this.
1 SD for N trials with an expected success rate p is sqrt(p*(1-p)/N).
So for all innings:
Singles = 38.29 +/- 0.09%
Walks = 37.98 +/- 0.13%
HBP = 38.77 +/- 0.44%
Error = 37.74 +/- 0.35%
That’s all within the margin of error, and there’s no reason to rule out the null hypothesis that they all have the same true scoring percentage.
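For anyone who wants to reproduce those error bars, here is a minimal Python sketch of the 1 SD formula applied to the counts in the "Any inning" table. It assumes the observed scoring rate is a fair stand-in for the true rate p, which is the usual shortcut and not something the data can confirm.

```python
import math

# (times reached, times scored), copied from the "Any inning" table
any_inning = {
    "Single": (325455, 122662),
    "Walk": (150570, 57189),
    "HBP": (11865, 4600),
    "Error": (19260, 7270),
}

def rate_and_sd(reached, scored):
    """Return the observed scoring rate and its 1 SD sampling error."""
    p = scored / reached
    return p, math.sqrt(p * (1 - p) / reached)

results = {name: rate_and_sd(n, k) for name, (n, k) in any_inning.items()}
for name, (p, sd) in results.items():
    print(f"{name:7s} {100 * p:.2f} +/- {100 * sd:.2f}%")
```

The error bar shrinks with the square root of the sample size, which is why the HBP and Error rows are so much fuzzier than the Single and Walk rows.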
The differences between the early and late innings are statistically significant though, it would seem.
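One rough way to check the early vs. late claim is a pooled two-proportion z-test on the totals from the two stage tables. The sketch below is my own choice of test, not something from the original script, and it treats every leadoff appearance as an independent trial, which is an assumption.

```python
import math

def two_proportion_z(scored_a, reached_a, scored_b, reached_b):
    """Pooled two-proportion z statistic for rate A minus rate B."""
    p_a = scored_a / reached_a
    p_b = scored_b / reached_b
    pooled = (scored_a + scored_b) / (reached_a + reached_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / reached_a + 1 / reached_b))
    return (p_a - p_b) / se

# Totals rows: "6th inning or earlier" vs. "7th inning or later"
z_stage = two_proportion_z(130271, 339417, 61879, 168895)
print(f"early vs. late: z = {z_stage:.1f}")
```

With these totals the gap works out to roughly 12 standard errors, far beyond anything sampling noise would explain. The same helper can be pointed at any two rows of the tables.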
Thanks for the refresher on the sampling error, Larry. I meant to ask for those exact details from the community and probably should have before even posting.
I think Evan is right with regard to the significance of the game stage breakdown difference, that being pitching changes. I’d also think teams giving away outs via sacs might be a small cause as well. I think I’ll look a bit more into these results.
A good collection of info, but the leadoff walk is so dreaded because it’s your own fault. If they get a single they’ve earned it.
It’s not that the walk is so bad, it’s just that the walk is a strategic error.
Also, look at it this way, how often does a leadoff walk score (37%) vs how often a ball put in play scores? If you make them put it in play it’s more like a 10% chance to score. (the odds of getting on x the odds of scoring after you get on)
So indeed, a lead off walk is far worse than an average ball put in play.
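The arithmetic in that comment can be sketched as below. The 0.30 chance of reaching first on a ball in play is a HYPOTHETICAL placeholder, not a figure from the data in this post, and the calculation ignores extra-base hits and home runs, so treat it as a back-of-the-envelope check only.

```python
# Share of leadoff runners on first who went on to score (post totals)
p_score_given_on_first = 192150 / 508312

# HYPOTHETICAL: assumed chance a ball in play leaves the batter on first base
p_reach_on_ball_in_play = 0.30

# Odds of getting on x odds of scoring after you get on
p_score_ball_in_play = p_reach_on_ball_in_play * p_score_given_on_first
print(f"{100 * p_score_ball_in_play:.1f}% vs. {100 * p_score_given_on_first:.1f}% after a walk")
```

Under that assumed reach rate the product lands a little above 11%, in the same neighborhood as the "more like 10%" estimate.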
I would think that leadoff hitters who reach in the 7th inning and later are less likely to score because the team is more likely to pull the ineffective pitcher and put in a more effective one.
What Joe IQ said.
First of all, this Community Blog thing is great. I have loved the two that I have read so far. Great work, Plen.
Second, I have to agree with Evan that the difference between the late inning and early inning numbers is likely pitching changes. For instance, in a close game (7th inning or later), you are MUCH more likely to see a platoon matchup for a replacement pitcher.
Lastly, as LarryinLA points out, there are definitely some statistically significant differences in these numbers (I think the sample size being half a million is cool). It would seem to me that figuring out the potential causes of the difference would require looking at the follow-up moves/events (pitcher change, follow-up BB, sac, hit, etc.) and the situation (close game, etc.).
For instance, if we say that late game BBs are followed by pitching changes but not early game BBs that would tell us something. Likewise, if we know that after a HBP the next hitter is more likely to see the same pitcher than after a BB, it could tell us something.
Great stuff. Thanks.
I think the likelihood that the runner gets bunted to 2nd is higher in the later innings. Does a run score more often or less often when a runner is bunted over? I know that the expected runs after a bunt is lower, but is the chance that the one specified runner scores higher?
Ditto the above posters. wOBA is about .015 lower in innings 7-9 than it is in innings 1-6.
Peak scoring happens between the 5th and 7th innings. I have no evidence to prove why, but I suspect it has to do with the fact that your starting pitchers are tired by the 5th/6th/7th inning, and hitters have gotten a better look at them. If your starters are yanked, then your bottom-tier bullpen arms are brought in. In the 8th and 9th, you’ve typically got your better bullpen arms, who are fresh and only facing batters once, so those 2 innings tend to be lower scoring than the rest (particularly the 9th, where your closer (best relief pitcher) usually resides).
Selection bias.
The recent trend is for pitchers to throw only 6 or 7 innings and then have a middle reliever take over. This wasn’t always the case. So measuring the 4th and 5th innings, where starters have been through the order once, might be a better indicator, and good and mediocre pitchers will average out. The ninth inning, where a closer is called in (but not always), may also lower the 37% rate. Blown saves may have higher rates than the 37% rate. I would be interested in knowing if there are any significant differences by decade, but where does one stop: by team, by pitcher?
Great stats! Thanks, I would welcome more.
It would be interesting to see if the likelihood of scoring has changed over the years. In particular, it would be interesting if you see that only in the bullpen era does the difference between early game and late game scoring exist.
While I’d assume that the relief pitching suggestion is the cause of the difference between early and late game scoring, I’ll offer another possible explanation as well. The early part of the game includes the first inning. Generally, over the period covered by the data, the leadoff hitter in the first inning is usually (1) fast and therefore more likely to steal bases, score from second on a single, etc. and (2) good enough at getting on base to be batting leadoff (even if OBP wasn’t considered as important in eras past). Plus, the leadoff man is always followed by the better hitters in the lineup. I imagine that creates a (probably very) small bias, as a player reaching base to lead off the first inning might be slightly more likely to score than a player reaching base to lead off other innings. It might be interesting to see these numbers broken out by inning and/or by position in the batting order. In the first inning, the leadoff man is the only batter who could lead off. In other innings, there’s a distribution of who could be leading off, and it’s possible that could matter.
Great point about the leadoff hitter in the first inning, badenjr, and your suspicions are confirmed in the data. Here is the distribution by inning:
1st 43.23%
2nd 34.22%
3rd 38.45%
4th 38.05%
5th 37.57%
6th 38.24%
7th 37.19%
8th 37.07%
9th 35.52%
Extras 36.04%
Here is the distribution by year (1950 and 1951 show 00.00% because the data starts in 1952).
1950 00.00% 1960 36.26% 1970 36.59%
1951 00.00% 1961 37.44% 1971 35.10%
1952 35.30% 1962 36.83% 1972 34.81%
1953 36.79% 1963 34.41% 1973 37.03%
1954 36.63% 1964 34.46% 1974 36.33%
1955 36.80% 1965 35.02% 1975 36.56%
1956 36.70% 1966 35.03% 1976 36.87%
1957 36.60% 1967 34.20% 1977 38.31%
1958 35.60% 1968 34.02% 1978 37.62%
1959 37.02% 1969 36.12% 1979 37.52%
1980 37.49% 1990 38.54% 2000 40.15%
1981 36.53% 1991 37.93% 2001 38.42%
1982 38.53% 1992 37.45% 2002 38.53%
1983 38.32% 1993 40.02% 2003 39.35%
1984 38.34% 1994 40.02% 2004 39.42%
1985 37.69% 1995 39.95% 2005 38.42%
1986 38.35% 1996 40.62% 2006 39.30%
1987 39.25% 1997 39.81% 2007 39.85%
1988 38.02% 1998 39.40% 2008 38.50%
1989 38.46% 1999 40.92% 2009 38.30%
An interesting follow-up on that would be checking the difference in late game scoring to early game scoring throughout the years. Relief pitchers are used much more these days, possibly widening that variance over time.
Great stuff, Plen. I don’t really have much to add, but I wanted to thank you for putting the work in to research this.
Not sure if this was already asked, but what about IBB? Were those looked at? Because the next batter will most likely be getting challenged more and have a greater opportunity to drive a pitch.
An IBB to lead off the inning? It was included in the BB totals but has only happened nine times since 1952:
AUG 01 1953 – NY Giants at Cincinnati Reds
Daryl Spencer in the top of the 8th, Giants ahead 8-6
SEP 09 1970 – Washington Senators at Cleveland Indians
Frank Howard in the bottom of the 3rd, no score
Frank Howard in the bottom of the 5th, no score
(Howard was intentionally walked 3 times in the game)
AUG 06 1996 – Cincinnati Reds at San Francisco Giants
Barry Bonds in the bottom of the 9th, Reds ahead 3-2
MAY 09 2004 – San Francisco Giants at Cincinnati Reds
Barry Bonds in the top of the 10th, 6-6 tie
AUG 11 2004 – San Francisco Giants at Pittsburgh Pirates
Barry Bonds in the top of the 10th, 6-6 tie
* First two pitches were unintentional balls
JUN 17 2005 – Toronto Blue Jays at San Francisco Giants
Barry Bonds in the bottom of the 8th, 5-5 tie
* First three pitches were unintentional balls
SEP 04 2006 – Houston Astros at Philadelphia Phillies
Ryan Howard in the bottom of the 9th, 2-2 tie
JUN 17 2007 – Toronto Blue Jays at San Francisco Giants
Barry Bonds in the bottom of the 6th, Giants ahead 4-3
* First three pitches were unintentional balls
plen, you’ve shown that the percentage is significantly higher in the first inning. Does that explain the difference between early and late in games? If you removed the first inning, does the percentage look about the same for innings 2-6 as for innings 7+?
The common explanation was that a fresh reliever better prevented the run from scoring in the late innings. If that were true, you’d expect to see the percentages decline as the years progressed and relievers became more prevalent. Your data suggests the opposite. From 1952 through 1976, the percentage never climbed as high as 38%. It hasn’t been below 38% since 1993. Obviously, this doesn’t look at just the late-inning cases, and the addition of the DH in 1973 should be responsible for some of the increase, but it still looks like there’s been a trend that runs counter to what our initial belief was.
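Those two claims about the yearly numbers are easy to check against plen’s table. The sketch below copies the listed percentages for 1952 through 2009 and also computes a per-decade mean. That mean is unweighted (every season counts equally), which is an assumption forced by not having the per-year counts.

```python
# Yearly scoring percentages copied from plen's table, 1952 through 2009
pcts = [
    35.30, 36.79, 36.63, 36.80, 36.70, 36.60, 35.60, 37.02,                # 1952-59
    36.26, 37.44, 36.83, 34.41, 34.46, 35.02, 35.03, 34.20, 34.02, 36.12,  # 1960s
    36.59, 35.10, 34.81, 37.03, 36.33, 36.56, 36.87, 38.31, 37.62, 37.52,  # 1970s
    37.49, 36.53, 38.53, 38.32, 38.34, 37.69, 38.35, 39.25, 38.02, 38.46,  # 1980s
    38.54, 37.93, 37.45, 40.02, 40.02, 39.95, 40.62, 39.81, 39.40, 40.92,  # 1990s
    40.15, 38.42, 38.53, 39.35, 39.42, 38.42, 39.30, 39.85, 38.50, 38.30,  # 2000s
]
years = range(1952, 2010)
assert len(pcts) == len(years)  # guard against a copying slip
by_year = dict(zip(years, pcts))

max_1952_1976 = max(p for y, p in by_year.items() if 1952 <= y <= 1976)
min_since_1993 = min(p for y, p in by_year.items() if y >= 1993)
print(f"1952-1976 high: {max_1952_1976:.2f}%  since-1993 low: {min_since_1993:.2f}%")

# Unweighted mean per decade (seasons are not weighted by opportunities)
grouped = {}
for y, p in by_year.items():
    grouped.setdefault(y // 10 * 10, []).append(p)
decade_means = {d: sum(v) / len(v) for d, v in grouped.items()}
for d, m in sorted(decade_means.items()):
    print(f"{d}s {m:.2f}%")
```

Both claims hold for the listed figures: the highest value from 1952 through 1976 is below 38%, and the lowest value from 1993 on is above it.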
I’ve got a couple of thoughts, with the caveat that, as Larry pointed out, your differences are still within the margin of error (note that Larry’s singles number was the one for innings 1-6).
1) If you can, and continuing on with badenjr’s idea, break down the difference by lineup position. Of course, maybe this will result in too few samples, especially for the eighth and ninth hitters.
2) I’d also break it down by n’th time seeing that pitcher/batter matchup in that game. I think this will equalize for bullpen usage over the years, as there would be more 3rd and 4th time matchups in the earlier years.
The leadoff hitter tends to be a faster runner. They ALWAYS lead off the first inning, hence the increase.
In the second inning, the 4-5-6 (slower) hitters tend to start the inning, hence it’s harder to score. (Especially since they have to rely on the 6-7-8-9 hitters to drive them in).
That seems to clearly explain the increased % in the first inning and the decrease in the 2nd inning.
9th inning, your closer is in the game, hence the lower %.
I’d like to think innings 3-6 vs 7-8 would be fairly equal.
my $0.02
Just read this post, it’s awesome. I’m curious though, do you know where I can find data on the events that transpire in the next at bat (after a lead-off walk)? I have a few theories, like increased errors (possibly catching his defense a little bit more off-guard from the lack of engagement on the last at bat) and increased hits (since the pitcher doesn’t want to fall behind, giving the batter a chance to tee off on a first pitch over the plate).
Same as kbertling353 says:
thanks
really nice article and good posts…ONE thing I’d like to see is pct of times the leadoff baserunner scores if he is on 2nd base vs 1st base. And the difference between whether they got to first on a double or stole while the 2nd batter was still batting. Just curious since this is considered scoring position where they can score on a single.
Lastly…the other time a leadoff runner might get on first is if they get on via an error. Can’t imagine the pct would be much different other than due to small sample size similar to IBB
Am I correct that the data shows the percent of the time that the actual player who led off with a walk, single, etc. scores? Does anyone know the percent of the time a team scores in an inning when the leadoff man reaches first? For example, a leadoff walk followed by a fielder’s choice force out at second; now there is a man on 1st with one out, and he eventually scores.
The differences between the ways a player can reach are likely noise. They might be showing some sort of proxy for command, or the hit types might somehow select for different parts of the batting order, but I highly doubt that can be teased out from here.
With respect to innings, it’s already well-known that the later innings have a lower run environment so that’s what you’re seeing. This is caused by cooler weather and bullpens.
Is there data to compare four-pitch leadoff walks vs. non-four-pitch walks?