Note: This is a piece I have blogged about previously for a British baseball site located here, and this is a slightly updated version.
Jeff Sullivan does pieces on the worst called balls and strikes at the halfway mark and end of each season. These are usually quite bizarre calls that have some unusual circumstances behind them, but for the most part they don’t have too much influence on the game. However, in this postseason, there was a poor “strike” call which had a huge impact on a game.
In the bottom of the second inning of Game Three of the NLDS between the Braves and the Dodgers, Walker Buehler was in a difficult situation with two outs and runners on second and third after an error from Cody Bellinger. The Dodgers decided to intentionally walk Charlie Culberson, loading the bases, to get to Braves pitcher Sean Newcomb, a fairly standard approach in the NL. But Buehler fired four balls to Newcomb and walked in the first run of the game, bringing up Ronald Acuna. Buehler then threw three more balls to fall behind 3-0 in the count.
Then came “ball four,” but it wasn’t called a ball despite being two inches above the top of the zone, as home plate umpire Gary Cederstrom called a strike. That meant Buehler threw another pitch to Acuna, who launched it for a grand slam, resulting in a score of 5-0 and not 2-0. The potentially “pitcher friendly” call by the umpire cost the Dodgers three runs in a game they ended up losing by just one.
To go to a hyperbolic extent, this meant they lost the game, they then had to play a further game in the series against Atlanta, they were then more tired than the Brewers in what became a seven-game series, they were then more tired than the Red Sox, and they therefore lost the World Series. Certainly a stretch, but it’s not hard to see the effect in the game considering the Braves managed just three runs in the other 35 innings of their four-game series.
Not every mistake made by an umpire has an easily identifiable ramification like that, but mistakes do happen in most games, and it is no surprise that MLB and the WUA (World Umpires Association) want to have the smallest number of mistakes possible. Nowadays they can do this by looking at how many calls an umpire got right or wrong thanks to systems that track the speeds and trajectories of pitched baseballs.
Measuring umpire performance
In the 2006 playoffs, MLB debuted a system called PITCHf/x, which tracked the baseball during each pitch and stored data points such as pitch velocity, release point, location as the ball crossed the plate, and where the ball landed after being hit. It also tracked each batter’s stance to calculate the top and bottom of the zone for that individual. The system became standard across all stadiums in the 2008 season, which means MLB and the WUA could compare the call an umpire gave with the location of the pitch and decide whether the call was correct.
This system was replaced by TrackMan in 2017. All of this information, going back to 2008, is publicly available via Statcast Search on the Baseball Savant site. One can grab all this data and determine the accuracy of the umpires in MLB by looking at all the pitches the umpires called, working out if the ball went through the strike zone or not, and if that matched the call.
Note: The data given by PITCHf/x and TrackMan gives the coordinates for the middle of the ball, so since just a piece of the baseball needs to cross home plate in order for the pitch to be considered a strike, the diameter of the ball (about 2.9 inches) has to be added to the plate size, making the range [−0.95; 0.95] in feet, with 0 being the center of the plate. The height of the strike zone varies from batter to batter, and these values are used for calculations, but for any graphical representations of the zone, I have chosen to use average values for the top and the bottom of the zones (3.5 and 1.6 feet, respectively).
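The zone test described in this note can be sketched in a few lines. This is a minimal Python sketch rather than the R code used for the article. The function names are hypothetical, the argument names mirror what I understand the Statcast export columns to be called (treat that as an assumption), and the vertical edges are used exactly as the tracking data reports them, since the note only describes widening the horizontal edges.

```python
# Half-width of the zone in feet, as given in the note above
# (tracking data reports the centre of the ball).
ZONE_HALF_WIDTH_FT = 0.95


def in_zone(plate_x, plate_z, sz_top, sz_bot, half_width=ZONE_HALF_WIDTH_FT):
    """Return True if the pitch location falls inside the batter's zone.

    plate_x: horizontal location in feet, 0 is the centre of the plate.
    plate_z: height in feet as the ball crosses the plate.
    sz_top, sz_bot: top and bottom of this batter's zone in feet.
    """
    return abs(plate_x) <= half_width and sz_bot <= plate_z <= sz_top


def call_is_correct(called_strike, plate_x, plate_z, sz_top, sz_bot):
    """A call is correct when it matches the location-based verdict."""
    return called_strike == in_zone(plate_x, plate_z, sz_top, sz_bot)


# Illustrative pitch (made-up numbers): over the middle of the plate but
# two inches above a 3.5 ft zone top, and called a strike.
print(call_is_correct(True, 0.0, 3.5 + 2 / 12, 3.5, 1.6))  # False
```

The overall correct call rate is then the share of called pitches for which `call_is_correct` returns True.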
Doing this for every called pitch since the start of the 2008 season gives us the above graph. The umpires have improved every single year since the system’s introduction in 2008, reaching a correct call rate of 91.6% in 2018 compared to 86.6% in 2008. To me, this shows that whatever MLB and the WUA are doing with this data and their umpire training is working. But how does it compare for balls and strikes?
Umpires get the calls right for balls better than for strikes, and that makes sense when you think that there will be a number of balls that are not even close and are easy decisions. What stands out from this graph is that most of the improvement seen in the overall performance of umpires is them calling strikes better, improving by 10 percentage points over the 11-year period.
If you have watched baseball for a while, you may have heard people saying that the zone changes depending on the count, i.e. the strike zone decreases in size on pitcher’s counts and increases on hitter’s counts. To see if this is true, I looked at the correct call rate of balls and strikes for each count.
The graph definitely agrees with the hypothesis above, as the lowest correct call rates for strikes are in pitcher’s counts (0-2, 67% & 1-2, 73%) while the lowest correct call rates for balls are in hitter’s counts (2-0, 88% & 3-0, 82%). This clearly shows that umpires are affected by the count, but how does it affect the strike zone?
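The per-count breakdown can be sketched as a simple tally. This Python sketch assumes that "strikes" and "balls" here mean what the pitch should have been called based on location, which is my reading of the figures rather than something the article states. The dictionary keys and the toy pitches are hypothetical.

```python
from collections import defaultdict


def correct_rate_by_count(pitches):
    """Correct-call rate per (count, location-based verdict).

    Each pitch is a dict with hypothetical keys:
      balls, strikes : the count before the pitch
      in_zone        : True if the location was inside the zone
      called_strike  : True if the umpire called a strike
    """
    tally = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for p in pitches:
        verdict = "strike" if p["in_zone"] else "ball"
        key = (f'{p["balls"]}-{p["strikes"]}', verdict)
        tally[key][0] += p["called_strike"] == p["in_zone"]
        tally[key][1] += 1
    return {key: correct / total for key, (correct, total) in tally.items()}


# Toy data, not real pitches: two 0-2 pitches in the zone (one missed),
# four 3-0 pitches out of the zone (one missed).
toy = [
    {"balls": 0, "strikes": 2, "in_zone": True, "called_strike": True},
    {"balls": 0, "strikes": 2, "in_zone": True, "called_strike": False},
    {"balls": 3, "strikes": 0, "in_zone": False, "called_strike": True},
    {"balls": 3, "strikes": 0, "in_zone": False, "called_strike": False},
    {"balls": 3, "strikes": 0, "in_zone": False, "called_strike": False},
    {"balls": 3, "strikes": 0, "in_zone": False, "called_strike": False},
]
rates = correct_rate_by_count(toy)
print(rates[("0-2", "strike")], rates[("3-0", "ball")])  # 0.5 0.75
```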
Modelling the bias
To determine the impact of the count on the size and shape of the zone, I had to build a model that predicts the chance a pitch is called a strike by an umpire based on its location. This is where R and RStudio are any budding baseball analyst’s best friend. R has pre-built modelling functions, so you can use your input data to predict how likely a pitch is to be called a strike based on its location. With those predictions, you can then identify the line where the chance of a strike call is 50% (for the more analytically inclined, a contour generated by a loess function). This gives you a good idea of where the zone is, i.e. inside the line a pitch has an above-50% chance of being called a strike and outside it is below 50%. The graph below shows the 50% lines for left-handers and right-handers in 2008 and 2018.
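To make the idea of a 50% line concrete, here is a rough Python sketch. It is not the loess model used for the article: a plain Gaussian kernel smoother stands in for it, the bandwidth is an arbitrary choice, the function names are hypothetical, and the data is a made-up grid on which an idealised umpire calls a perfect rectangular zone.

```python
import math


def strike_probability(x, z, pitches, bandwidth=0.25):
    """Kernel-weighted share of nearby called pitches that were strikes.

    pitches: iterable of (plate_x, plate_z, called_strike) tuples, in feet.
    bandwidth: Gaussian kernel width in feet (an arbitrary choice here).
    """
    weighted_strikes = 0.0
    total_weight = 0.0
    for px, pz, called_strike in pitches:
        dist_sq = (px - x) ** 2 + (pz - z) ** 2
        weight = math.exp(-dist_sq / (2 * bandwidth ** 2))
        weighted_strikes += weight * called_strike
        total_weight += weight
    return weighted_strikes / total_weight


def right_edge(z, pitches, step=0.01):
    """Walk right from the middle of the plate until P(strike) drops below 50%."""
    x = 0.0
    while strike_probability(x, z, pitches) >= 0.5:
        x += step
    return x


# Toy data: one pitch every 0.1 ft, called by an idealised umpire who
# gives a strike exactly when the pitch is inside the average zone.
toy_pitches = [
    (i / 10, j / 10, abs(i / 10) <= 0.95 and 1.6 <= j / 10 <= 3.5)
    for i in range(-20, 21)
    for j in range(5, 46)
]
print(round(right_edge(2.5, toy_pitches), 2))
```

On this toy grid the edge lands close to 0.95 feet, as it should. Repeating the walk in every direction, once per count, would trace out the contours compared in the graphs.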
As you can see for all of these splits, the strike zone for an 0-2 pitch is considerably smaller than the one for an 0-0 pitch, while the 3-0 pitch has a larger strike zone, but not by that much. You can also see the differences between 2008 and 2018: the side edges of the zone have constricted much closer to the actual zone size, while the bottom part of the zone has become larger, dropping outside the actual zone. According to my model, even with the enlarged zone expected for the 3-0 count, the pitch that Buehler threw had a 2.2% chance of being called a strike.
There is a clear difference between the biggest hitter (3-0) and pitcher (0-2) counts compared to 0-0. What about all 12 counts? What do their strike zones look like? The graph below is for 2018 right-handers only.
If we were to order these strike zones by area, they match up well with run potential for each of those counts, which to me means that the umpires are trying to — perhaps subconsciously — even the game out by being compassionate to whoever is struggling in the count.
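Ordering zones by area needs an area for each 50% contour. One way to get it, sketched here in Python under the assumption that each contour is available as an ordered list of points, is the shoelace formula. The function name is hypothetical, and the only zone shown is the reference rectangle built from the average values quoted earlier, not a modelled contour.

```python
def polygon_area(vertices):
    """Area enclosed by a closed contour, via the shoelace formula.

    vertices: list of (x, z) points in order around the contour, in feet.
    """
    twice_area = 0.0
    for (x1, z1), (x2, z2) in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += x1 * z2 - x2 * z1
    return abs(twice_area) / 2


# Reference rectangle built from the article's average zone:
# 0.95 ft either side of centre, 1.6 ft to 3.5 ft high.
rulebook = [(-0.95, 1.6), (0.95, 1.6), (0.95, 3.5), (-0.95, 3.5)]
print(round(polygon_area(rulebook), 2))  # 3.61
```

Sorting the twelve counts by this area, in square feet, gives the ordering described above.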
So if the umpires are impacted by the count, are they influenced by other factors, like the players involved or home bias?
Additional umpire bias
Thankfully it took only a little digging to show no bias between home team and away team, but there is some potential bias for batters and pitchers. I split 2018’s pitchers and batters into thirds based on their wOBA for 2018 and ran the models again to see if there was any visual difference between their zones for the 0-0, 0-2, and 3-0 count. This produced the graphs below, again for right-handed batters only.
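Splitting players into thirds by wOBA might look like the Python sketch below. The function name, the player labels, and the wOBA values are all made up for illustration, and ties are broken by sort order. Note that a low wOBA is good for a pitcher (wOBA allowed) and bad for a batter, so the groups are labelled by wOBA value rather than by quality.

```python
def split_into_thirds(woba_by_player):
    """Assign each player to the low, middle or high third by wOBA.

    woba_by_player: dict mapping player name -> wOBA.
    Returns a dict mapping player name -> "low" | "middle" | "high".
    """
    ranked = sorted(woba_by_player, key=woba_by_player.get)
    n = len(ranked)
    labels = {}
    for rank, player in enumerate(ranked):
        if rank < n / 3:
            labels[player] = "low"
        elif rank < 2 * n / 3:
            labels[player] = "middle"
        else:
            labels[player] = "high"
    return labels


# Hypothetical players and wOBA values.
toy_woba = {"A": 0.250, "B": 0.280, "C": 0.300,
            "D": 0.320, "E": 0.350, "F": 0.400}
groups = split_into_thirds(toy_woba)
print(groups["A"], groups["C"], groups["F"])  # low middle high
```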
The top three are the pitcher splits. While there isn’t much difference between the middle third and the top third (the top third’s zone is 1-3% bigger in all counts), there is quite a difference for the bottom-third pitchers in the 0-2 count. Their strike zone is significantly smaller (25%) in that count, and while just over 2,000 pitches were thrown in that scenario, fewer than for the other two groups (both just over 5,000), that is still enough for the difference to be significant. It appears that the poorer pitchers are getting a bad deal off the umpires when they are doing well in the count.
For batters, in all counts, the umpire strike zones are lower for the middle third than the top third, and the bottom third is even lower than the middle third. This would be a worrying trend if the umpires were lowering the zone, but on further inspection of the average top and bottom of the zone, it doesn’t look like there is an issue here.
In the table above (values given in feet), the batters in the bottom third have a lower zone than the other two groups. If we take this into account, the ratio of how much smaller or larger the zone has gotten dependent on the count is similar (within 2%) across all three groups, suggesting no bias from the umpires here. While there is no observed bias here, it is interesting to note that there is maybe some correlation between strike zone height, and therefore batter height, and overall wOBA.
(The 3-0 strike zone for the bottom-third batters is slightly skewed due to lower volume of occurrences in 2018: only 449)
An Automated Zone
We have seen that umpires definitely have some bias, intentionally or not. What would happen if MLB switched to an automated strike zone? Beyond the fact that we wouldn’t see the 32,000 incorrect calls across a season, the best benefit for MLB right now might be improved pace of play. In 2018, there were 2,605 at-bats ended early by incorrect calls, but there were 4,039 that were extended by incorrect calls. That’s 1,434 net at-bats being longer than they should be.
The 4,039 at-bats which were extended by the bad calls had on average a further five pitches after the incorrect call. Using the net of 1,434 at-bats extended, we saw just over 7,000 extra pitches thrown, which accounted for 1% of the pitches thrown in 2018. So just looking at the specific scenario of pitches which could end at-bats, the time for a baseball game could be reduced by 1% by an automatic strike zone.
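The arithmetic behind these figures can be checked with a short Python sketch using only the numbers quoted above. Applying the five-pitch average to the net figure assumes the at-bats that ended early would have run about as long as the extended ones did, which is an assumption of the estimate rather than something measured here.

```python
extended = 4039      # at-bats extended by an incorrect call (2018)
shortened = 2605     # at-bats ended early by an incorrect call (2018)
pitches_after = 5    # average further pitches after the incorrect call

net_extended = extended - shortened
extra_pitches = net_extended * pitches_after
print(net_extended, extra_pitches)  # 1434 7170

# If those extra pitches are about 1% of the season, the implied season
# total is roughly extra_pitches / 0.01.
implied_total = extra_pitches / 0.01
print(round(implied_total))  # 717000
```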
I am relatively certain that most of the MLB teams have done similar analysis and know that the umpires struggle most at calling pitches in the corners of the zone and have the hardest time calling strikes in 0-2 counts, which to me means that pitchers probably are not targeting those spots as they expect a bad call. I believe that with an automated strike zone, pitchers would be more aggressive towards pitching in the zone, especially the pitchers with better command.
One thing I haven’t talked about so far is pitch framing by catchers. With an automated zone, that would become a defunct skill, which would severely impact the defensive output of catchers. According to the catcher stats by Baseball Prospectus in 2018, for all but three of the top 20 defensive catchers, framing accounted for more than 80% of their Fielding Runs Above Average (FRAA). For example, Yasmani Grandal led MLB with 16.3 FRAA, of which 15.7 were down to his pitch framing ability.
If we were to remove pitch framing, the highest defensive contribution above average would be just 3.1 runs by Tucker Barnhart, which would be 27th overall for catchers if we compared that to the current metric. Barnhart is an extreme case of the switch that would happen if we removed framing, as BP gives him -11.5 defensive runs from framing and he was the ninth-worst of 117 catchers in 2018.
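The framing adjustment is simple subtraction, sketched here in Python on the assumption that FRAA is the sum of framing runs and everything else, which is how the figures in this piece are treated. The function name is hypothetical. Barnhart’s total of -8.4 comes from the table of catchers by FRAA including framing.

```python
def fraa_without_framing(fraa, framing_runs):
    """Defensive runs left once the framing component is removed."""
    return fraa - framing_runs


# Figures quoted in the article (Baseball Prospectus, 2018).
grandal = fraa_without_framing(16.3, 15.7)
barnhart = fraa_without_framing(-8.4, -11.5)
print(round(grandal, 1), round(barnhart, 1))  # 0.6 3.1

# Share of Grandal's FRAA that came from framing, in percent.
print(round(15.7 / 16.3 * 100))  # 96
```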
Top and Bottom 10 Catchers by FRAA including Framing

| Top 10 | FRAA | Bottom 10 | FRAA |
|---|---|---|---|
| Yasmani Grandal | 16.3 | Omar Narvaez | -15.7 |
| Jeff Mathis | 14.1 | Willson Contreras | -15 |
| Max Stassi | 14 | Nick Hundley | -14.1 |
| Tyler Flowers | 13 | Robinson Chirinos | -11 |
| Austin Hedges | 12.6 | Isiah Kiner-Falefa | -10.9 |
| Roberto Perez | 12.1 | A.J. Ellis | -9.7 |
| Sandy Leon | 11.7 | Salvador Perez | -9.5 |
| Erik Kratz | 11.1 | Mitch Garver | -8.5 |
| Jorge Alfaro | 10.2 | Tucker Barnhart | -8.4 |
| John Ryan Murphy | 9.9 | Drew Butera | -7.7 |
Top and Bottom 10 Catchers by FRAA excluding Framing

| Top 10 | FRAA | Bottom 10 | FRAA |
|---|---|---|---|
| Tucker Barnhart | 3.1 | Omar Narvaez | -4.9 |
| Willson Contreras | 2.8 | Gary Sanchez | -4.2 |
| Jeff Mathis | 2.3 | Jonathan Lucroy | -3.6 |
| Austin Romine | 2.1 | Jorge Alfaro | -2.1 |
| Yan Gomes | 2.1 | Elias Diaz | -2 |
| Kevin Plawecki | 1.9 | Brian McCann | -1.5 |
| Rocky Gale | 1.7 | Josh Phegley | -1.4 |
| Jacob Stallings | 1.6 | Rene Rivera | -1.4 |
| Dustin Garneau | 1.5 | Francisco Pena | -1.3 |
| Taylor Davis | 1.5 | Francisco Arcia | -1.2 |
Many other catchers would benefit from this, like Willson Contreras, who would go from second-worst to second-best. There are others who would be impacted by this badly, such as Jorge Alfaro, who would go from ninth-best to fourth-worst. There are also players who don’t change their ranking much, such as Omar Narvaez, who stays the worst ranked catcher, while Jeff Mathis is second-best with and third-best without framing.
Removing framing shrinks the defensive output from catchers, with the range of FRAA going from [+16.3 : -15.7] to [+3.1 : -4.9]. With this decreased impact of their defensive contribution, catchers’ offense would have to improve for their value to stay the same, which wouldn’t be possible for most of the current backstops: MLB catchers in 2018 averaged a wRC+ of 84. Although probably unlikely, this may lead to some good offensive players being converted into catchers, as their negative defensive runs would be countered by their high offensive production and still make them better overall for a team than the current catchers.
I have honestly been a long-time fan of introducing the automatic strike zone because it would make the game more accurate, but in doing this research, I have become even more of a fan given the added likelihood of reducing pitch counts and increasing hits. I genuinely believe this is a change MLB should be looking into, but I do see there being resistance from the MLB Players Association if they envision an impact on the catchers.