Visualizing and Quantifying Strike Zone Changes Over Time

The strike zone has been getting a lot of attention this week. If you’ve been paying any attention to baseball (and I’m sure you have, since fantasy baseball leagues are starting to open up), you’ve seen a few articles and releases suggesting that MLB may be considering raising the bottom of the strike zone from the hollow beneath the kneecap to the top of the kneecap. It seems like a good idea since strikeout rates are on the rise, but is that rise a result of (1) pitchers getting better, (2) hitters getting worse, or (3) strikes being called differently? I’ll give you a hint: it’s neither of the first two, at least not directly. So let’s focus on the strike zone, and more specifically on two things: (1) visualizing the strike zone from 2008 to 2015 and (2) using a standardized set of pitches to look at how those pitches have been called over time.

Let’s go through the methods I used before we get to the plots. I used the pitchRx package in R to gather and store the data, and I relied on many of the functions included in the package. Next I subset the PITCHf/x data by year, since I was interested in annual changes. Due to a combination of time constraints and a lack of computing power, I didn’t run every pitch thrown in each year; I worked with a sample instead. I downloaded a CSV from the FanGraphs leaderboards of all qualified pitchers from 2008 to 2015. For each year I randomly selected 20 pitchers from the list of qualified starters to represent how the strike zone was called that year. Finally, I ran the data through a generalized additive model (seen here), which was used to create the “heat maps” of called-strike probability in the plots below. I also tested the probability of five standard pitches being called strikes, but that is addressed a bit later on, so I won’t bore you with the details twice. Added note: if anyone actually wants a copy of the R code, leave a comment below and I’ll get in contact with you.
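The original analysis was done in R and that code isn’t reproduced here, so the following is only a minimal Python sketch of the sampling step: drawing 20 qualified pitchers per season. The function name, the seed, and the placeholder pitcher names are all assumptions made for illustration; in practice the per-year lists would come from the FanGraphs leaderboard CSV.

```python
import random


def sample_pitchers_by_year(qualified, n=20, seed=42):
    """Pick n pitchers at random (without replacement) for each season.

    qualified: dict mapping year -> list of qualified pitcher names.
    The seed is an arbitrary choice so the draw can be repeated.
    """
    rng = random.Random(seed)
    sampled = {}
    for year in sorted(qualified):
        # De-duplicate and sort so the draw does not depend on input order.
        pool = sorted(set(qualified[year]))
        sampled[year] = rng.sample(pool, min(n, len(pool)))
    return sampled


# Placeholder names only; real lists would be read from the leaderboard CSV.
qualified = {
    year: [f"pitcher_{year}_{i}" for i in range(80)]
    for year in range(2008, 2016)
}
sample = sample_pitchers_by_year(qualified)
for year, names in sample.items():
    print(year, len(names), names[:3])
```

Fixing the seed is a design choice rather than anything the post describes: it simply means the same 20 pitchers come back each time the script is rerun.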

Below I’ve included a GIF of the strike zone from 2008 to 2015. If you watch it a few times you’ll begin to notice the gradual changes to the bottom of the strike zone, and when it flips from 2015 back to 2008 you can really see the difference. It’s not surprising that there are inter-annual differences between the zones, since I’m sure MLB makes a few minor tweaks every off-season and there may be a changing of the guard among the umps over time. I also need to apologize for the 2010 plot: the left (L) and right (R) panels are reversed and I can’t seem to switch them. We will just have to deal with that one plot being different. In all plots the label “L” refers to left-handed batters and “R” to right-handed batters.

Next I wanted to find a way to quantify changes in how pitches were being called, and I decided to use a set of standardized pitches. Below is a plot showing the locations I chose for my test pitches. I went with five different locations. The pitch right down the middle was my control of sorts, just to make sure things were getting called consistently over time. The remaining locations were the ones I was really interested in: three of those pitches were located on the lower edge of the strike zone, and the final pitch was located 0.2 feet or 2.4″ (the metric system would be more useful here, just sayin’) below the bottom edge of the strike zone. When I began this simulation I expected the lowest pitch to be a second control pitch that would consistently be called a ball, but the results were pretty surprising. I’d also like to note that the strike zone to lefties is slightly shifted, so that more outside pitches are called strikes.
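For reference, the five test locations and the unit conversion can be written out in a few lines. This is a sketch in Python rather than the R used for the analysis. The coordinates 0, 2.5; 0, 1.7; 1, 1.7; and 0, 1.5 are the ones given in this post; the horizontal value of -1 for the third low pitch is an assumption, since the post does not state it, and the labels are illustrative.

```python
FEET_TO_INCHES = 12.0
FEET_TO_CM = 30.48

# (horizontal, vertical) location in feet.
TEST_PITCHES = {
    "middle (control)": (0.0, 2.5),
    "bottom edge, middle": (0.0, 1.7),
    "bottom edge, px = +1": (1.0, 1.7),
    "bottom edge, px = -1 (assumed)": (-1.0, 1.7),  # not stated in the post
    "below zone": (0.0, 1.5),
}

# Height used here for the bottom edge of the zone, matching the low test pitches.
ZONE_BOTTOM_FT = 1.7


def gap_below_zone(pz_ft):
    """Return how far a pitch height sits below the bottom edge, as (inches, cm)."""
    gap_ft = ZONE_BOTTOM_FT - pz_ft
    return gap_ft * FEET_TO_INCHES, gap_ft * FEET_TO_CM


gap_in, gap_cm = gap_below_zone(TEST_PITCHES["below zone"][1])
print(f"lowest pitch is {gap_in:.1f} in ({gap_cm:.1f} cm) below the bottom edge")
```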

OK, so we are almost at the exciting conclusion. Using the standardized pitches from the plot above, I used the generalized additive model to predict the probability of each pitch being called a strike in a given year. The results are summarized in the plot below. For the pitch thrown at coordinates 0, 2.5 (the one down the middle), the probability of a called strike is basically 100% every year. Well, that’s a good thing; at least that call is consistent. The low pitch down the middle on the bottom edge of the strike zone, coordinates 0, 1.7 (green line), has been called a strike more and more often since 2008 to both right- and left-handed batters. For pitches down and in to righties (red lines), the probability rose noticeably this past season and crept above 50%; to lefties that pitch is down and away, and it has been called pretty consistently since 2011. Pitches thrown down and away to righties or down and in to lefties (coordinates 1, 1.7; purple lines) haven’t changed all that much over the time period.
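To make the idea of “probability of a called strike at a fixed location” concrete, here is a deliberately crude Python stand-in. It is not the additive model used for the plots: it just takes the share of called strikes among taken pitches within a small radius of a test location. The function name, the 0.25-foot radius, and the handful of rows are all made up for illustration.

```python
import math


def local_strike_rate(pitches, px, pz, radius=0.25):
    """Share of called strikes among taken pitches within `radius` feet of (px, pz).

    pitches: iterable of (px, pz, called_strike) with called_strike True/False.
    Returns None when no pitch falls inside the radius.
    """
    nearby = [
        called
        for (x, z, called) in pitches
        if math.hypot(x - px, z - pz) <= radius
    ]
    if not nearby:
        return None
    return sum(nearby) / len(nearby)


# Invented rows, only to show the call; real input would be one season's taken pitches.
taken = [
    (0.0, 1.5, True),
    (0.1, 1.5, False),
    (0.0, 1.6, False),
    (0.0, 1.4, False),
    (2.0, 2.0, True),
]
rate = local_strike_rate(taken, px=0.0, pz=1.5)
print(rate)
```

A smooth model borrows strength from the whole zone, which is why it is the better tool for the heat maps; a local average like this one gets noisy fast when few pitches land near the test location.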

Now we get to what I think is the most interesting pitch: the low fastball down the middle (coordinates 0, 1.5), the one that should be out of the strike zone. This pitch is represented by the gold/yellow lines on the plots. In 2008 these pitches were called strikes ~10% of the time to both righties and lefties. Over the eight seasons studied that number has trended upwards, and in the 2015 season it settles in somewhere around 36-40%, which is not an insignificant proportion.
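To put that jump in perspective, the arithmetic can be spelled out. The sketch below uses only the rounded figures quoted above (~10% in 2008 and 36-40% in 2015), so the results are approximate; the function name is illustrative.

```python
def change_summary(start, end):
    """Percentage-point change and ratio between two probabilities."""
    return {
        "point_change": (end - start) * 100,  # in percentage points
        "ratio": end / start,                 # how many times as likely
    }


# Rounded values read off the text: ~10% in 2008, 36% to 40% in 2015.
low_end = change_summary(0.10, 0.36)
high_end = change_summary(0.10, 0.40)

for label, result in (("36%", low_end), ("40%", high_end)):
    print(
        f"2015 at {label}: +{result['point_change']:.0f} points, "
        f"{result['ratio']:.1f}x the 2008 rate"
    )
```

In other words, a pitch that sits below the bottom edge went from being called a strike about one time in ten to roughly three and a half to four times that often.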

Based on this data, it certainly appears that MLB is justified in looking at raising the strike zone. Pitchers who live down in the zone have been given an increasing advantage in a relatively short amount of time. Hopefully this sheds some light on the debate over whether to raise the strike zone in the coming seasons, or maybe the umps will be able to make some adjustments for the upcoming season.

1 Comment
6 years ago

I recently picked up Baseball Data with R and have been going through it. I’d love to see the code used.