Archive for Research

What if a Team Bullpens an Entire Season?

We saw the Yankees basically bullpen the AL wild-card game. Sure, it was by accident, but their bullpen pitched 8.2 innings. And they did it well. This made me think about whether a team could put together a pitching staff that is almost completely used for bullpenning for the entire season.

To see if this would be possible, we will look at the Yankees since they are the team most closely equipped for it already. In the wild-card game, they essentially used four relief pitchers (let’s not count the one out Luis Severino had). Chad Green, David Robertson, Tommy Kahnle, and Aroldis Chapman combined for 8.2 innings and one earned run. Clearly, if a team could do this all the time, they would. In that game they did not use other relievers Dellin Betances and Adam Warren, as well as regular starting pitchers Jordan Montgomery and Jaime Garcia, who would have been available that night.

Since we now know what happened in that bullpen game, can we find out if it is possible to do it over a full season? First off, an MLB roster comprises 25 men for any given game and an additional 15 that can be called up if needed. An AL team can get by with 12 position players: one for every starting position (including DH) plus a fourth outfielder, utility infielder, and backup catcher. Let’s say a team’s backups can field multiple positions, like many can. We can get rid of the everyday DH and use one of the backups or starters in that role for a needed day off. That leaves us with 11 position players and room for 14 pitchers.

Many of the Yankees’ own relievers can go multiple innings. Among those pitchers are Chad Green, David Robertson, Tommy Kahnle, Adam Warren, and occasionally Aroldis Chapman and Dellin Betances. Each is effective in his own right. The problem we have to face is the amount of rest needed for these pitchers. The four from the wild-card game each pitched with two days of rest, so we’ll set that as a benchmark. I also don’t want to assume a team needs five pitchers each game like the Yankees did in the wild-card game.

I don’t want to completely get rid of the starting pitcher. It would be dumb to just throw away what Luis Severino and other starters bring to that team. Instead, I want to put a hard limit on how much they pitch each game and how often they pitch. Theoretically, a team could go with a three-game cycle of pitchers. Games are played almost every day during the season, so the two days of rest benchmark will be used here. If we are using four pitchers per game every three games, we need 12 pitchers.

Game 1      | Game 2      | Game 3
L. Severino | M. Tanaka   | S. Gray
C. Green    | A. Warren   | D. Robertson
T. Kahnle   | D. Betances | C. Shreve
A. Chapman  | J. Holder   | G. Gallegos

I didn’t make this with any set reason, just the best options the Yankees would have in my view. There are many other options available for them and some may be even better. But, if this is the set of pitchers being used, that leaves two extra spots for our 14 available pitchers. Those two extra spots can be utilized for guys needed for extra innings that can pitch multiple innings, or a guy needed for an inning or two in case one of the above gets into trouble.

If a team were to go by this set of pitchers, the regular starting pitchers would be throwing 162 innings over a season. That would be seen as pretty normal for a starting pitcher over the course of a season and in some cases much less. Severino pitched 193 innings himself. The relievers, however, would see a pretty big bump in action. They would pitch 108 innings in a season, more than any of the pitchers above did last year. However, some of those pitchers were starters to begin their careers. Green, Warren, Betances, and Holder have each pitched more than 108 innings in a season. Now, that could be a reason for their increased effectiveness as relievers, but they would still only be pitching two innings in a game, not five or six.
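The arithmetic behind those totals can be checked with a short sketch. It assumes every game lasts exactly nine innings, the "starter" always throws three innings, and each of the three pitchers behind him throws two; the constant names are just labels for this illustration.

```python
# Back-of-the-envelope workload for the three-game cycle described above.
# Assumptions: nine-inning games, no extra innings, a strict 3/2/2/2 split.
GAMES_PER_SEASON = 162
CYCLE_LENGTH = 3          # every pitcher appears once per three games
STARTER_INNINGS = 3       # the three-inning "starter"
RELIEVER_INNINGS = 2      # each pitcher who follows him
RELIEVERS_PER_GAME = 3
ROSTER_SIZE = 25
POSITION_PLAYERS = 11

pitchers_per_game = 1 + RELIEVERS_PER_GAME
pitchers_in_cycle = pitchers_per_game * CYCLE_LENGTH
spare_pitchers = ROSTER_SIZE - POSITION_PLAYERS - pitchers_in_cycle

appearances = GAMES_PER_SEASON // CYCLE_LENGTH
starter_season_innings = appearances * STARTER_INNINGS
reliever_season_innings = appearances * RELIEVER_INNINGS
innings_covered_per_game = STARTER_INNINGS + RELIEVERS_PER_GAME * RELIEVER_INNINGS

print(f"pitchers in the cycle: {pitchers_in_cycle}, spare arms: {spare_pitchers}")
print(f"appearances per pitcher: {appearances}")
print(f"starter innings: {starter_season_innings}, reliever innings: {reliever_season_innings}")
print(f"innings covered per game: {innings_covered_per_game}")
```

Any game that goes to extra innings, or any outing cut short, has to be absorbed by the two spare arms, which this sketch does not model.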

It is possible to ask these relievers to stretch their arms out to be able to throw that many innings in a season. Relievers do transition to starting and this wouldn’t be quite the workload necessary. If a pitcher needs a break during a cycle through this set of pitchers, that could be what the additional two pitchers on the roster are for, or some of the 40-man pitchers could be called up to give a guy a break. They could also call up an actual starter from the minors to take over for four or five innings after the three-inning “starter” in this example. My point here is that if the relievers get tired over the course of a season, there are ways to give them breaks. Plus, the Yankees have so many resources and available pitchers that they have the capability to give those breaks.

If the Yankees wanted to, they could keep Severino, Tanaka, Gray, Green, Warren, Robertson, Kahnle, Betances, and Chapman all on the roster for the whole season. That makes up 3/4 of the necessary pitchers. Shreve, Holder, and Gallegos could each be cycled up and down from AAA with other pitchers like Ben Heller, Domingo German, etc. in order to give breaks to the core nine pitchers. Another solution is to go out and get more relievers who can pitch multiple innings on a regular basis. They certainly have the prospects to do that. Pitchers like Brad Hand, Yusmeiro Petit, and Mike Minor each pitched over 77 innings and were very effective doing so.

Clearly there is much more that would be needed to make this a reality, and I don’t have the resources to know if it is even possible. Maybe these guys simply couldn’t pitch that many innings over a full season or they would lose too much velocity or break on their pitches from fatigue. But I saw David Robertson pitch 3.1 masterful innings in the wild-card game and pitch another 1.2 innings three days later. Obviously that is only two outings, but he was nevertheless effective in doing it, and I believe if any team could make this happen, it would be the Yankees.


The Effect of Rest Days on Starting Pitcher Performance

Since the dawn of baseball, fans and coaches alike have debated whether or not pitch count and days of rest affect a pitcher’s health status and performance. This ongoing discussion has led to a close examination of how to best manage the health status of a pitcher. Should you give your starting pitcher that extra day of rest or can you pitch him in the big game today? The question of how to manage your starting pitcher can make or break a season, and, therefore, certainly merits the amount of attention and debate it has received.

Major League Baseball’s adjustment to the age of big data has reshaped the way in which we view these age-old debates. Nowadays, there are public databases that allow hobbyists and students of the game to query their own data and investigate their own theories. Baseball Savant and Baseball Reference are the two main public databases in use, and are the two databases that will be utilized for this study. The data being queried is rest days and runs scored per inning pitched for starting pitchers in Major League Baseball in the last five full seasons.

Problem Definition

In this study, I will look at the effect that the number of days of rest has on the performance and health of a starting pitcher in Major League Baseball. More specifically, I will investigate whether or not fewer rest days are correlated with poor performance and poor health status. Not only does this study have the potential to save millions of dollars for the baseball industry, but it could also provide starting pitchers with more knowledge on how rest days between starts affect their health and performance. The predictor “Runs Scored per Inning Pitched” will be evaluated to determine performance. Although there is a significant amount of noise (i.e. many factors contribute to the outcome) in the runs scored predictor, it seems like the best way to determine a pitcher’s performance on a game-by-game basis. Ultimately, the number of runs scored is the difference between winning and losing, and therefore should be the main criterion used to judge the performance of a starting pitcher.
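For readers who want to reproduce the metric, here is a minimal sketch of computing runs scored per inning pitched for a single start. It assumes innings are recorded in the usual box-score notation, where the digit after the decimal point counts outs ("6.2" means six innings plus two outs); the function names are hypothetical and not taken from the study.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings notation ("6.2" = 6 innings + 2 outs) to outs."""
    whole, _, partial = ip.partition(".")
    extra_outs = int(partial) if partial else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) * 3 + extra_outs


def runs_per_inning(runs: int, ip: str) -> float:
    """Runs allowed divided by true innings pitched (outs / 3)."""
    outs = innings_to_outs(ip)
    if outs == 0:
        raise ValueError("no outs recorded; the rate is undefined")
    return runs / (outs / 3)


# Hypothetical start: 3 runs allowed over 6.2 innings.
print(round(runs_per_inning(3, "6.2"), 3))
```

Passing the innings as a string avoids treating "6.2" as the decimal number 6.2, which would understate the innings actually pitched.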

Results

I determined that there is a significant difference between a pitcher’s performances on a specific number of rest days versus the others. However, there is no significant difference in starting a pitcher on “short rest” (1-3 days) versus “normal rest” (4-6 days) versus “extended rest” (7+ days).

This is an extremely important result considering that starting pitchers are usually employed on three, four, or five days of rest. Currently, starting pitchers are believed to perform at the highest level without the added possibility of injury with this amount of “normal rest.” However, while this study finds a relationship between the specific number of rest days and a pitcher’s performance, it shows no significant difference in starting your pitcher on short rest vs. normal rest vs. extended rest.

This study shows that each of those extra off days could not only make a significant difference in pitching performance but also could make a difference in health status for pitchers. There is a fine line between getting the most out of your starting pitcher, and overusing him.

Data Analysis and Tests

In order to determine if there is a significant difference between runs scored per inning pitched and the number of rest days, a non-parametric ANOVA test is needed. The results are as follows:

Reject H0 at alpha = 0.05: the Runs Scored per Inning Pitched rate is significantly different for at least one of the numbers of days of rest.
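The article does not name the specific test. The usual "non-parametric ANOVA" is the Kruskal-Wallis test, so the sketch below assumes that is what was run. It computes the H statistic from tie-averaged ranks and estimates a p-value by permutation rather than from a chi-square table. The data are made up for illustration.

```python
import random
from itertools import chain


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Share of random relabelings whose H is at least the observed H."""
    rng = random.Random(seed)
    observed = kruskal_h(groups)
    pooled = list(chain.from_iterable(groups))
    sizes = [len(group) for group in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if kruskal_h(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical runs-per-inning values grouped by exact days of rest.
by_rest_days = {
    3: [0.61, 0.72, 0.55, 0.80],
    4: [0.40, 0.52, 0.47, 0.38, 0.50],
    5: [0.45, 0.58, 0.41, 0.49],
}
groups = list(by_rest_days.values())
print("H =", round(kruskal_h(groups), 3))
print("permutation p =", round(permutation_p_value(groups), 4))
```

Rejecting the null here only says that at least one rest-day group differs; it does not say which one, which is why the grouped comparisons that follow are needed.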

However, we want to know if having your starting pitcher pitch on “short rest” is significantly different than having your starting pitcher on “normal rest.” In order to do this, the data was split into number of days of rest 1-3 and days of rest 4-6. Appearances on zero days of rest were eliminated, as they typically apply only to relief pitchers. Then, a non-parametric rank sum test was conducted to determine if performance on “short rest” is significantly different than performance on “normal rest.” The results are as follows:

Do not reject H0 at alpha = 0.05: the Runs Scored per Inning Pitched rate is not significantly different between pitchers on “short rest” and “normal rest.”
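A rank sum comparison of two groups can be sketched as follows. This assumes the two-sided Wilcoxon rank-sum (Mann-Whitney) test with a normal approximation and no tie or continuity correction, which may differ in detail from the software used in the study. The sample values are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for the rank-sum test, normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2          # expected rank sum under H0
    var_w = n1 * n2 * (n1 + n2 + 1) / 12     # variance under H0, ignoring ties
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided tail probability
    return z, p


# Hypothetical runs-per-inning samples.
short_rest = [0.55, 0.62, 0.48, 0.70, 0.51]
normal_rest = [0.50, 0.58, 0.46, 0.66, 0.53, 0.49]
z, p = rank_sum_test(short_rest, normal_rest)
print(f"z = {z:.3f}, p = {p:.3f}")
```

With roughly 1,300 starts the normal approximation should be reasonable; for very small groups an exact test would be the safer choice.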

Last, “extended rest” was looked at to determine if runs scored per inning pitched was significantly different than “short rest” and “normal rest.” “Extended rest” includes all rest days of 7 and over. The results are as follows:

Do not reject H0 at alpha = 0.05: the Runs Scored per Inning Pitched rate is not significantly different among short rest, normal rest, and extended rest.
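The grouping used in these comparisons can be expressed as a small helper. This is only a sketch of the binning rule stated in the text (1-3, 4-6, and 7 or more days, with zero days excluded); the function names and the sample records are hypothetical.

```python
from collections import defaultdict


def rest_category(days_rest: int):
    """Map days of rest to the article's categories; None means excluded."""
    if days_rest <= 0:
        return None            # zero days of rest was dropped from the analysis
    if days_rest <= 3:
        return "short"
    if days_rest <= 6:
        return "normal"
    return "extended"


def group_by_rest(starts):
    """Group (days_rest, runs_per_inning) pairs by rest category."""
    groups = defaultdict(list)
    for days_rest, rate in starts:
        category = rest_category(days_rest)
        if category is not None:
            groups[category].append(rate)
    return dict(groups)


# Hypothetical records: (days of rest, runs scored per inning pitched).
starts = [(0, 1.00), (3, 0.60), (4, 0.45), (5, 0.50), (9, 0.40)]
print(group_by_rest(starts))
```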

Recommendations

The first recommendation I would make would be to look separately at pitchers making a start after coming off the disabled list. Starting pitchers can also be skipped in a rotation when a team has an off day, which leaves much more time between starts.

If possible, data that tracks rest time between a pitcher’s starts down to the hour as a continuous variable would be ideal. This could provide more insight into the effect of rest on performance of starting pitchers, and it would provide more of a continuous variable for analysis instead of treating all rest days equally.

Another recommendation for the study would be to use a different predictor for performance. Finding a public database that included days of rest data for each start was tough, and finding one that had days of rest data for each start along with the predictors that were sought after was even tougher. Ideally, an advanced statistic like FIP or weighted On-Base Average would be used, but these predictors are very difficult to calculate for over 1300 data points.

As long as there are starting rotations in baseball, the question of how off-days affect the performance and health of starting pitchers will be studied. Another potential study would be to look at the pitch count of starting pitchers. This could have a similar effect as rest days when looking at performance. With the recommendations made in this study, a future study to determine if performance is affected by pitch count and days of rest would be extremely beneficial.


Dryness in Paradise: On Humidors in Spring Training

Spring-training games in the Cactus League are a unique joy, especially for baseball fans (like me) who hail from colder climes. Unlike the Grapefruit League, which features stadiums separated by hundreds of miles of humid Florida air, the Cactus League consists of a compact cluster of stadiums bathed in sunshine and desert-dry air. Spectators and players alike can enjoy the spring conditions (and for some, including myself and Carson Cistulli, Barrio Queen guacamole and sangria) in the Valley of the Sun for weeks before teams return to their home stadiums across the country in late March.

Figure 0: Your author enjoying the 82-degree sunshine (and probably a juicy IPA, not pictured) at Hohokam Stadium, March 2017

Some teams will return to relatively warm and dry climates (Arizona Diamondbacks, who have to trudge the 20 freeway miles to Chase Field), but others will return to retractable domes (Seattle Mariners) or cold conditions where snowed-out games are certainly not out of the question (Cleveland). Given that the point of spring training is to get players ready for 81 games at their home ballpark, are two months of baseball in dry, sunny paradise the best way to prepare players for opening day at home? Short of building exact climate-controlled replicas of Kauffman Stadium and Wrigley Field in the Phoenix Metro, how could teams better prepare their players for the start of the season at their own home ballpark? Enter an unlikely hero, the great “Rocky Mountain equalizer”: the humidor.

Figure 1: Climatology of Phoenix, AZ (Feb-Mar) and the home locations (ICAO Airport codes) of the 15 Cactus League teams (Apr-May)

Just by eyeballing the graphs in Figure 1, without wading into the different lines and the specific airports (some lines switch to larger airports with RH), no stadium’s meteorological conditions are close to those in the Phoenix area. With the exception of the Rangers, no team plays in a stadium with an average May high temperature greater than the average March high temperature in Arizona. And only the “high desert” of Colorado comes close in RH to the dry air in Arizona in March. Clearly, the opening day meteorological conditions will be significantly different from those Cactus League players see during spring training (Figure 2).

Figure 2: Changes in climate between April (major airport nearest home stadium) and March (PHX), with larger markers indicating larger temperature differences (dotted markers indicate increased T) and blue markers indicating more humid conditions (orange being drier)

This drastic change in temperature and humidity (Figure 2) is likely to have a major impact on how the ball plays once teams leave Arizona. Like many baseball physics researchers before me, I will once again heavily rely on the work previously done by Dr. Alan Nathan to inform my physical exploration herein. As shown in Nathan, et al. (2011), the two crucial meteorological factors of temperature (T) and relative humidity (RH) have a strong impact on both aerodynamic factors (such as drag) AND contact factors (such as coefficient of restitution, COR) that determine how far a batted ball travels. Rather than run afoul of the copyright of the American Journal of Physics by reproducing the figures here, I highly encourage you to check out Figures 2-4 in Nathan, et al. (2011) to see these relationships.

Equation Block 1: Calculating the effect of COR changes on “effective” exit velocity of a batted ball

The eternally relevant Baseball Trajectory Calculator developed by Alan Nathan has the ability to adjust aerodynamic factors associated with stadium altitude, barometric pressure, temperature, and relative humidity. Combined with the equations from Block 1 above, the changes in COR as a result of meteorological changes can be simply approximated in the Nathan Calculator as a manual change in the rebound (exit) velocity of the ball off the bat.
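The equations of Block 1 are not reproduced in this text, so the following is only a hedged sketch of how a COR change might be turned into an "effective" exit velocity. It uses the standard collision-efficiency relation from the bat-ball collision literature, q = (e - r) / (1 + r) with exit velocity = q * v_pitch + (1 + q) * v_bat, which may or may not match the author's Block 1. The recoil factor r = 0.25 and the COR values are illustrative assumptions.

```python
def collision_efficiency(cor: float, recoil: float) -> float:
    """Collision efficiency q for ball-bat COR `cor` and bat recoil factor `recoil`."""
    return (cor - recoil) / (1.0 + recoil)


def adjusted_exit_velocity(v_exit_ref, v_pitch, cor_ref, cor_new, recoil=0.25):
    """Exit velocity after a COR change, holding pitch speed and bat speed fixed.

    v_exit_ref is the exit velocity (mph) observed at the reference COR.
    v_pitch is treated as the pitch speed at contact, a simplification.
    recoil=0.25 is an assumed, illustrative value, not a measured one.
    """
    q_ref = collision_efficiency(cor_ref, recoil)
    # Infer the bat speed implied by the reference exit velocity.
    v_bat = (v_exit_ref - q_ref * v_pitch) / (1.0 + q_ref)
    q_new = collision_efficiency(cor_new, recoil)
    return q_new * v_pitch + (1.0 + q_new) * v_bat


# Hypothetical case: a humidor lowers the COR from 0.50 to 0.49.
print(round(adjusted_exit_velocity(100.0, 100.0, 0.50, 0.49), 2))
```

The adjusted value is the number one would type into the Calculator's exit-velocity field in place of the default.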

Great, simply smash aerodynamic and COR changes together and we’re in business, right? Well, almost…it seems every baseball physics article could have all the baseball-specific details stripped out and what would remain is a meditation on linearity and covariance. This example is no different. While we might expect meteorologically-induced aerodynamic and contact factors to vary independently, in real on-the-field situations, balls will be affected by not only their current conditions but also their recent history of past conditions. Absent experimental data on the time scale of such internal ball changes, we can still get a general sense of what could happen when multiple changes overlap. Let’s dive into some colorful 3-D contour plots of results using the default batted ball parameters of the Trajectory Calculator (100 mph pitch, 100 mph exit velocity, 30 degree launch angle) and see what happens!

Figure 3: Effects of meteorological T and RH on fly ball distance, including COR effects equal to ambient conditions (as if balls were kept in the same conditions)

 

We aren’t too far afield from the basic variables one can change in the Nathan Calculator, so the results from Figure 3 aren’t terribly surprising. Baseballs travel further through warm and dry air. In addition, dry/warm baseballs are bouncier than cold/wet baseballs. It’s unlikely that equipment managers are keeping baseballs outside, so they probably aren’t going to actually experience changes in COR associated with extreme conditions due to the time necessary for water vapor to diffuse into the guts of the baseballs and soften them. But absent a sense of how equipment managers store baseballs, let’s explore the possible impact that a spring training humidor could have.

Figure 4: Effects of humidor-like T and RH on fly-ball distance, with aerodynamic effects equal to PHX March average but COR changing with humidor conditions

Figure 4 shows what would happen if we changed the internal ball T and RH but continued to play in the average Phoenix-area meteorological conditions in March. The weakness of the temperature effect compared to the strength of the humidity effect can be predicted with the slope of each experiment in Nathan, et al. (2011). It’s unlikely, though, that T and RH both have, when combined, a linear effect on COR. For example, it’s unclear whether this linear model captures the hot/wet and cold/dry combinations correctly. This indicates the need to inspect the covarying relationship between T and RH on COR (and therefore, fly-ball distance) more deeply than the simple linear combination I used in this model.

Table 1: Monthly climate, elevation, default fly ball distance using the Nathan Calculator and monthly climate, and scale factors for conversion of March fly ball distance (at PHX) to April fly ball distance (at home).

With the data from Figures 3-4, we can figure out an appropriate scaling factor (Table 1) to translate the dimensions of each team’s spring training stadium and compare them to the dimensions of their home stadium (Figure 5).
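One way to read the scaling step is as a simple ratio of fly-ball distances. The sketch below assumes the scale factor is the April distance at home divided by the March distance in Phoenix, and that a fence distance is scaled by multiplying by that factor. The text does not spell out how the factors in Table 1 were applied, and the numbers here are placeholders rather than values from Table 1.

```python
def scale_factor(distance_home_ft: float, distance_phx_ft: float) -> float:
    """Ratio of the same batted ball's carry at home (April) to Phoenix (March)."""
    return distance_home_ft / distance_phx_ft


def effective_dimension(fence_ft: float, factor: float) -> float:
    """Home-equivalent distance of a spring training fence.

    A ball that just reaches `fence_ft` in Phoenix would carry only
    `fence_ft * factor` in the home climate, so the Arizona fence
    "plays like" a fence at that shorter distance.
    """
    return fence_ft * factor


# Placeholder distances for one reference fly ball (not values from Table 1).
factor = scale_factor(distance_home_ft=380.0, distance_phx_ft=400.0)
print(factor, effective_dimension(400.0, factor))
```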

Figure 5: Surprise Stadium (KC) and Scottsdale Stadium (SF) scaled to April climatology in KC and SF (no humidor)

After comparing the “effective dimensions” of the Cactus League stadiums to the home stadiums of each team, one can’t help but wonder if the teams had a hand in the way the stadiums in Arizona were constructed. Some teams, such as the Royals, share a stadium with another team (Texas Rangers); therefore, this clearly can’t explain all of the similarities between stadium shapes.

Figure 5 shows that in Arizona during the month of March, the spring training stadiums play much “smaller” compared to other stadiums than their physical dimensions might indicate. By slightly lowering the COR of the ball by using a humidor, teams could cause their spring training stadiums to play with effective dimensions approximately equal to those of their home stadiums. If the Royals were to store their spring training baseballs in a humidor at approximately 70% RH, the differences between the distance up the lines (longer at Surprise than Kauffman) and the distance to straightaway center (shorter at Surprise than Kauffman) would yield around the same “effective surface area” of the scaled outfield.

This analysis, much like my earlier piece on fly-ball precession, neglects many physical variables that would impact the actual games being played. In this example, I have neglected the effects of wind and day-to-day changes in barometric pressure. Prevailing winds due to stadium orientation and location would make this experiment much more realistic. For variations in pressure due to synoptic weather systems (cold fronts, warm fronts, etc.), however, “averages” over an entire month inform us less in terms of the baseline environments of each stadium than monthly averages of temperature and relative humidity. The model also assumes that the balls are essentially stored in temperatures and humidities equal to the ambient conditions in the home stadiums; equipment managers likely store them in some indoor location, but it’s unclear whether they are treated to the exquisite RH control seen with the humidor at Coors Field. Such confounding factors will be explored in future follow-ups to this piece.

In addition to physical assumptions made here, it’s quite possible that baseball operations departments in teams have goals in spring training other than closely approximating the hitting conditions in their home stadiums. But if they want to see who will have power that plays well in their home stadium, the humble humidor could play a key role in moderating the enhanced fly-ball distance that comes naturally with the warm, dry spring air of paradise (Cactus League baseball, that is).


Can Wobble Rob(ble) Hitters? Fly Ball Distance and Baseball Precession

In the chase to break the story of the “smoking gun” behind the recent surge in MLB home runs, many a gallon of digital ink hath been spilt exploring possible modifications to the MLB balls, home-run-optimized swing paths, and even climate change. In my field of Earth Science (atmospheric chemistry, to be more exact), it’s rare that a trend in observations can be easily attributed to a single causal factor. Air quality in a city is driven by emissions of pollutants, wind conditions, humidity, solar radiation, and more; this typically leads to a jumble of coupled differential equations, each with a different capacity to impact overall air quality. To my untrained eye, agnostic to the contents of the confidential research commissioned by MLB and others, this problem is no different: a complex mixture of factors, some compounding each other and some canceling others, is likely fueling the recent home-run spike.

This article will examine the potential for a change in the MLB ball minimally explored thus far: reduction of precession due to decreased internal mass anisotropy. What a mouthful! “Precession” and “anisotropy” don’t have the same ring as “juiced ball” or “seam height” (though they may be on par with “coefficient of restitution”). But these words can be replaced with a more familiar (though funny-sounding) word: wobble. This wobble can occur for many reasons, but the most probable explanation in baseball is that the internal baseball guts are slightly shifted from the center of the ball. This could be due to manufacturing imperfection, or in the course of a game, contact-induced deformation of the ball.

Precession, in general, occurs when the rotational axis of an object changes its own orientation, whether due to an external torque (such as gravity) or due to changes in the moment of inertia of the rotating object (torque-free). Consider a spinning top: the top spins about its own axis (symmetrically spinning about the “stem” of the top) while the rotational axis itself (as visualized by the movement of the stem) can trace out a coherent pattern. If imparted with the same initial “amount” of spin in different ways, the total angular momentum (from both rotation and precession) of the top will be the same whether it’s spinning straight-up or precessing (wobbling) in an elliptical path.

Figure 0: Perhaps the most hotly debated spinning top in the world

As with other potential explanations relating to a physical change in the ball, a change in mass distribution could have occurred unintentionally due to routine improvements in manufacturing processes. By getting the center of mass (approximately, the cork core of the baseball) closer to the exact geometric center of the ball, backspin originally “lost” to precession (in the form of wobble-inducing sidespin) could remain as backspin while conserving total angular momentum; increased backspin has been shown to increase the “carry” of a fly ball, therefore increasing the distance (potentially extending warning-track shots over the fence). A deeper discussion of angular momentum can be found in any mechanics textbook or online resource (such as MIT OCW handouts), but the key takeaway when considering a particular batted fly ball is that productive backspin gets converted to non-productive precession (roughly approximated as sidespin in one axis) when mass is not isotropically (uniformly from the center in every direction) distributed. This imparts a torque-free precession on the spinning ball, causing the rotational axis to trace out a coherent shape.

Precession in baseball has not been deeply studied; in fact, when explicitly mentioned in seminal baseball physics resources, it is noted as a potential factor that will be ignored to simplify the set of physical equations. Together, dear reader, we shall peek behind the anisotropic veil and explore how precession might impact fly-ball distance, and by extension, home-run rates.

***

For those of us with some experience throwing a football, even just in the park, we can picture the ideal “backyard Super Bowl” pass: a tight spiral that neatly falls into the outstretched hands of the intended receiver. The difficulty of executing such a perfect throw is evident in the number of nicknames for imperfect throws that wobble (precess) on their way up the field short of their intended target (see “throwing ducks” re: Peyton Manning). In football, the wobbly precession of a ball in flight is typically blamed on the passer or credited to a defender for deflecting it (or in some cases, allegedly, a camera fly wire). It’s not as easy to imagine such behavior in baseball: even in slow-motion video shots of fly balls, the net spin of the ball is dominated by backspin. In addition, the nearly-spherical shape of a spinning baseball has significantly different aerodynamics than the tapered ellipsoid used in football. However, even a small amount of precession has the potential to shave yards off the distance of a football pass; therefore, impacts of precession are certainly worth exploring in the game of baseball.

As a sometimes-teacher (I have taught two laboratory classes at MIT), I strongly believe in the power of simple physical models to qualitatively inform trends in the not-so-simple real world. Therefore, for the first step of exploring the effect of ball precession in the game of baseball, I have turned to the wonderful Trajectory Calculator developed by Dr. Alan Nathan. The Calculator numerically solves the trajectory of a batted ball by computing key physical properties in discrete time steps. While many physical attributes of the ball are calculated in the various colored fields, any of them can be overwritten with custom values.

Figure 1: Fly Ball Distance with Nathan Trajectory Calculator defaults, conversion of backspin to sidespin

In Figure 1, I use the Trajectory Calculator to explore the effect of sidespin conversion on a single fly ball with the same initial contact conditions as the default (100mph exit velocity, 30-degree launch angle, default meteorological conditions), with the total spin set to 240 radians per second. Backspin is not converted to sidespin in a one-to-one fashion: because of the Pythagorean relationship between these factors, total spin is equal to the square root of the sum of the squares of sidespin and backspin. Therefore, to conserve angular momentum, a 10% reduction in backspin (216 rad/s) yields 104.6 rad/s of sidespin, which together lead to a ~1% decrease in fly ball distance from 385.3 ft to 381.3 ft.
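The spin split quoted above follows directly from the Pythagorean relationship and can be checked in a few lines. The function name is hypothetical; the spin and distance figures are the ones quoted in the text.

```python
import math

TOTAL_SPIN = 240.0          # rad/s, total spin in the example
BACKSPIN_RETAINED = 0.90    # a 10% reduction in backspin


def split_spin(total_spin: float, backspin_retained: float):
    """Return (backspin, sidespin) with sqrt(backspin^2 + sidespin^2) = total_spin."""
    if not 0.0 <= backspin_retained <= 1.0:
        raise ValueError("backspin_retained must be between 0 and 1")
    backspin = total_spin * backspin_retained
    sidespin = math.sqrt(total_spin ** 2 - backspin ** 2)
    return backspin, sidespin


backspin, sidespin = split_spin(TOTAL_SPIN, BACKSPIN_RETAINED)
distance_change = (381.3 - 385.3) / 385.3   # distances reported by the Calculator

print(f"backspin = {backspin:.1f} rad/s, sidespin = {sidespin:.1f} rad/s")
# prints: backspin = 216.0 rad/s, sidespin = 104.6 rad/s
print(f"distance change = {distance_change:.1%}")
# prints: distance change = -1.0%
```

Because the relationship is quadratic, giving up 10% of the backspin produces a sidespin equal to roughly 44% of the total spin, which is why the conversion is not one-to-one.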

With all of the assumptions made here, notably that introduction of precession can be simulated as pure conversion to sidespin to conserve angular momentum, the effect of precession on the flight path is clear but rather modest in this simple approach. However, the Calculator results show that by reducing the “wobble” in a ball’s trajectory, it will carry further. A league-wide reduction in precession would mean that balls would, on average, travel further, leading to an uptick in home runs. If decreased precession would also decrease the effective drag the ball experiences in flight, the effect of increased fly-ball distance could be even further enhanced.

A more realistic exploration of precession will require further modification to the modeling tools at hand. Following Brancazio (1987), which studied the effects of precession on the trajectory of a football, and additional follow-on work, a precession-only physical model can be developed to explore more complex aspects of the problem posed here. Elements of this precession-only model can be fed back into the Nathan Trajectory Calculator, but without a full understanding of some unconstrained physical constants and mechanical aspects of the pitch-contact-trajectory sequence, a tidy figure in the style of Figure 1 will be difficult to produce.

Again, as I mentioned above, I find simple models to be effective tools for teaching concepts. Therefore, let’s consider a “perfect” baseball to be a completely uniform, isotropic sphere, as in Figure 2. This perfect ball is axially symmetric and should not have any precession in its trajectory due to changes in its moment of inertia (I). Now, let’s add a small “spot mass” (that doesn’t add roughness to the surface) on the surface of the ball along the axis of rotation corresponding to pure backspin (the x-axis here). This ball with a spot mass should approximately represent an otherwise-perfect sphere whose center of mass is slightly shifted in the x-direction.

Figure 2: (A) real baseball, (B) perfect sphere, (C) sphere with a point mass at the surface, and (D) sphere with slightly offset center of mass approximately equivalent to (C)

If the model ball has a mass m1 that is isotropically distributed through the entire sphere, and a point mass with mass m2 that is located on the surface along the x-axis, the moment of inertia can be calculated in each direction, summing the contributions from the bulk mass m1 and the point mass m2 (Figure 3).

Figure 3: Moments of inertia for isotropic ball (mass m1) with a point mass (m2) at the surface
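
The expressions in Figure 3 are not reproduced in this text. Under the model as described (a uniform solid sphere of mass m1 and radius R, plus a point mass m2 on the surface along the x-axis, with all axes passing through the geometric center of the sphere), the standard results would be as follows. This is a sketch under those stated assumptions, and the symbol R is introduced here for the ball radius.

```latex
% Uniform solid sphere (mass m_1, radius R) plus a point mass m_2
% located at (R, 0, 0); axes pass through the geometric center.
\begin{align}
  I_x    &= \tfrac{2}{5}\, m_1 R^2
           && \text{(the point mass lies on the $x$-axis and contributes nothing)} \\
  I_{yz} &= \tfrac{2}{5}\, m_1 R^2 + m_2 R^2
           && \text{(the point mass sits a distance $R$ from the $y$- and $z$-axes)} \\
  \frac{I_{yz}}{I_x} &= 1 + \frac{5\, m_2}{2\, m_1},
           \qquad m = m_1 + m_2 .
\end{align}
```

With m2 = 0 the two moments are equal and the ball is isotropic; any nonzero m2 makes the transverse moment larger than the moment about the x-axis.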

Of course, the mass of a real baseball isn’t isotropically distributed, and there is no such thing as a “point mass” in reality; however, by exploring different combinations of m1 and m2 that sum to the mass of an actual MLB baseball (5.125 oz, as used in the Nathan Trajectory Calculator), the ball can be distorted in a controlled manner to explore the effects on precession and fly-ball distance. Using a set of equations derived from Brancazio (1987) Equation #7, the initial backspin of a ball (omega_x0) can be calculated given an initial total spin (omega), the variable B (the “spin-to-wobble” ratio indicating the number of revolutions about the x-axis per precession-induced “wobble”, a function of the moments of inertia I_x and I_yz), and the angle of precession (built into the variable C, with theta being the angle between the x-axis and the vector of angular momentum when precessing, similar to the angle between a table and the “stem” of a spinning top).

Equation Block 1: Derivations from Brancazio (1987) used in a simple model of baseball precession

The limitation of this approach is that in order to explore the theta-m2 phase space, we must prescribe a priori an angle theta at which the precession occurs. By instead solving for theta from equation 5 above (Figure 4), we can get a sense of the possible values for theta by prescribing the fraction of omega that is converted to precession (the variable A, a mixture of omega_y and omega_z, also called “effective sidespin”).

Figure 4: Contour plot of theta (degrees) with respect to ranges of m2 and variable A (effective total sidespin)

Figure 4 shows that angles between 0 and 6 degrees are reasonable for the conditions explored using the approach from Brancazio (1987) as translated to baseball. So let’s turn to equation 6, using a range of angles from 0 to 6 degrees, to explore the effects of precession on backspin omega_x (Figure 5).

Figure 5: Contour plots of backspin (omega_x) and effective sidespin (variable A) with respect to m2 (as % of m) and theta (degrees)

Great, the effect of a point mass along the x-axis of the ball can be quantified in this model! The effect is modest, but it has the potential to slightly decrease fly-ball distance relative to an identically struck isotropic ball. But there is one major limitation to the model as currently shown: when the angle theta is chosen a priori, the model has no capacity to correct to a more physically stable angle. In fact, along the entire x-axis of the plots in Figure 5, where m2 = 0, the ball should be completely isotropic and therefore no precession would occur; a small initial theta would likely be damped out over a small number of time steps. In addition, the contours of constant omega_x in Figure 5a curve in the opposite sense from what might be expected: increasing m2 should lead to more pronounced precession. On the other hand, this very simple model does not take into account the possible effects of torque-induced precession caused by gravity (extending the effect of mass anisotropy alone), nor does it account for additional drag impacting a precessing ball. More study is needed to further elucidate the possibility of precession having a considerable impact on fly-ball distance; however, unlike the sometimes-empty calls for “further exploration” of minimally promising leads in academic journal articles, I intend to carry out such an investigation.

All of these limitations stem from the fact that, without outside data to constrain the physics of precession as it applies to baseball, the problem we are trying to solve with this simple model is ill-posed: there is no unique solution for a given set of initial conditions. Luckily for us, we live in the Statcast age, where position, velocity, and spin of the baseball are all continuously measured (if not fully publicly available). In addition to the benefits gained from Statcast data, this problem can be further constrained by experimental data on MLB balls. Finally, an opportunity to put my skills as an experiment-first, computational-modeling-second scientist to use! Stay tuned to these pages for follow-up experiments and data analysis in this vein.

The conspiratorial allure of an intentional ball modification directly induced by Commissioner Rob Manfred is visible in online comment sections far and wide; however, many of the most credible explanations for ball changes involve no ill intent on the Commissioner’s part and may simply be a byproduct of improvements in ball-manufacturing processes. In any case, there are likely multiple facets to the current home-run surge. Ball trajectory effects due to precession have traditionally been ignored to simplify the problem at hand; this initial exploration shows that, given the difficulty of the problem, that was likely a good trade-off with the data available in the past. In the future, however, past work in diverse areas from planetary dynamics to the mechanics of other sports can be used alongside new and emerging data streams to help determine the impact of precession on fly-ball distance.

 

Python code used to generate Figures 4-5 can be found at https://github.com/mcclellm/baseball-fg

Special thanks to Prof. Peko Hosoi (MIT) and Dr. Alan Nathan for providing feedback on early versions of this idea, which was born on a scrap of paper at Saberseminar 2017.


Jonathan Lucroy, the Rockie, Is Baseball’s Best Contact Hitter

It’s no secret that Jonathan Lucroy is having a subpar season.

The two-time NL All-Star was projected to be a top-three catcher in 2017. Before the start of the season, Steamer pegged his value at 3.6 wins above replacement, while ZiPS had him at 3.2. Instead, his .242/.297/.338 line and 66 wRC+ in 306 plate appearances as a member of the Texas Rangers produced just 0.2 WAR. No one really expected that.

Lucroy was eventually traded to the Colorado Rockies. The Rockies, who had the worst catching tandem in baseball, instantly viewed Lucroy as an upgrade, while many other playoff-bound teams would have viewed him as a liability. With the hitter-friendly environment of Coors Field and the poor pitching staffs of the San Francisco Giants and San Diego Padres, the team figured that Lucroy would return to his All-Star form. Although he has not returned to being the power threat that he once was, he has changed his game ever so slightly, such that he might have become the game’s best-hitting catcher.

His basic stat line is not reflective of his plate discipline as a member of the Rockies. His slash line has gone back up to near his career average (.279/.384/.377), but what is most impressive about him is his actual hitting ability. Always a good contact hitter, he has changed his game to be more selective, make more contact, and put the ball in play. His 92 percent contact rate ranks first in baseball since the trade, and his 88 percent contact rate on pitches outside the strike zone also ranks first. The result: a high walk rate (12.3 percent) and fewer strikeouts (just 6.3 percent of his plate appearances have ended in a strikeout). All of this has come while he has swung at fewer pitches outside the strike zone (18.6 percent) and swung less often in general (38 percent). You may be asking, “Why isn’t he leading the league in hitting with numbers like that?” Well, the answer is rather simple.

While he is making more contact than anyone in baseball, most of the balls in play are hit to the defense.  This season, he is hitting more ground balls than ever before.  As a Rockie, 50 percent of the balls he has hit in play have been ground balls, well above his career average of 42.8 percent.  As a result, he has hit fewer fly balls (28.7 percent) which has led to fewer home runs (3.2 percent HR/FB).  This explains his lack of power this year.

He has hit the ball in the wrong place more this season than in any other. For his career, Lucroy has tended to drive the ball up the middle, and that has not changed much this season, but he has hit the ball softer than in any previous season. His average exit velocity (85.0 miles per hour) is more in line with middle infielders and outfielders than catchers. In fact, he has the fourth-slowest average exit velocity among all qualified catchers. His average exit velocity last season was 87.6 miles per hour, and it was 88.6 in 2015. Without the wheels of a speedy outfielder or infielder capable of beating out a ground ball (or at the very least forcing the defense to rush the throw), a ground ball for Lucroy is as good as an out. The saying goes that “baseball is a game of inches,” but it’s a game of miles per hour, too.

Fewer ground balls are going through the holes in the infield, and fewer ground balls are becoming hits. His batting average on balls in play as a Rockie is similar to his career average (.308 as a Rockie and .306 for his career), but his RBBIP (the percentage of balls in play that go for a hit or an error) is .318. While it is above league average, it is well below his RBBIP numbers in both his All-Star seasons and in 2012, when he hit .320. Has Lucroy been entirely unlucky with his balls in play? No; pitchers have pitched to him largely down and away, which has resulted in a horrible contact percentage on those pitches, and he has also regressed slightly in every season since 2015. But if Lucroy can keep his contact percentage up, hit fewer ground balls, and stay selective at the plate, then he could be one of the best-hitting catchers in the game again.


Relief Pitcher Pitch Rankings

To follow the starting pitchers, we have the relief pitcher pitch rankings.

1. Top Ten Four-Seam Fastball (Min 300):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Craig Kimbrel | 94.74 | 2.34 | 0.23 | 1.80 | 4.15
Sean Doolittle | 90.81 | 1.91 | 0.22 | 2.02 | 3.93
Chad Green | 85.35 | 1.30 | 0.20 | 2.57 | 3.87
Anthony Swarzak | 78.77 | 0.58 | 0.20 | 2.37 | 2.95
Josh Fields | 89.12 | 1.72 | 0.27 | 0.89 | 2.61
Pedro Baez | 90.00 | 1.82 | 0.28 | 0.78 | 2.60
Tommy Kahnle | 84.53 | 1.21 | 0.25 | 1.34 | 2.56
Drew Steckenrider | 84.55 | 1.21 | 0.26 | 1.13 | 2.34
Seung Hwan Oh | 80.80 | 0.80 | 0.24 | 1.50 | 2.30
Josh Hader | 87.30 | 1.52 | 0.28 | 0.67 | 2.19
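The “Z” columns in these tables appear to be standard scores, and “Z Total” matches the sum of the two (for Kimbrel, 2.34 + 1.80 gives roughly the listed 4.15 after rounding). Below is a minimal sketch of that calculation. The function names and the three-pitcher sample are illustrative only. The mean and standard deviation the author used for each pitch type are not given here, so z-scores computed from this tiny sample will not reproduce the published values. The xwOBA score is negated on the assumption that a lower xwOBA allowed is better, which is consistent with the positive xwOBA Z values shown next to low xwOBA figures. Whether the author used a population or sample standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard scores: (value - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def z_totals(whiff_pcts, xwobas):
    """Add the whiff z-score and the negated xwOBA z-score for each pitcher."""
    whiff_z = z_scores(whiff_pcts)
    xwoba_z = [-z for z in z_scores(xwobas)]   # lower xwOBA allowed is better
    return [w + x for w, x in zip(whiff_z, xwoba_z)]


# Illustrative sample only: three rows copied from the four-seam table.
names = ["Craig Kimbrel", "Sean Doolittle", "Chad Green"]
sw_whf = [94.74, 90.81, 85.35]
xwoba = [0.23, 0.22, 0.20]

for name, total in zip(names, z_totals(sw_whf, xwoba)):
    print(f"{name}: {total:+.2f}")
```

Summing the two scores gives whiffs and contact quality equal weight, which seems to be the design choice behind the tables.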

The Stars: Craig Kimbrel, Sean Doolittle, Pedro Baez

Young and Coming: Chad Green, Drew Steckenrider, Josh Hader

Surprises: Anthony Swarzak, Josh Fields, Tommy Kahnle

No surprise that Kimbrel, probably the most dominant reliever of the past few years, is at the top. Jeff Sullivan recently discussed the immense success Green has had, both overall and with his fastball, in his second year with the Yankees. Steckenrider is an unknown rookie for the Marlins, but he has been exceptional for them. Hader is a top prospect for the Brewers and a future starter, but his stint in the bullpen has gone perfectly. Swarzak is having a career year, so much so that the Brewers traded for him in an attempt to contend. Kahnle has broken out with the White Sox and Yankees.

2. Top Five Two-Seam Fastball (Min 250):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Craig Stammen | 67.73 | 0.49 | 0.25 | 1.95 | 2.44
Kelvin Herrera | 81.71 | 2.71 | 0.36 | -0.52 | 2.18
Edwin Diaz | 75.76 | 1.76 | 0.32 | 0.42 | 2.18
Joe Kelly | 72.95 | 1.32 | 0.30 | 0.79 | 2.11
Ryan Madson | 68.80 | 0.66 | 0.28 | 1.23 | 1.89

The Stars: Kelvin Herrera, Ryan Madson

Young and Coming: Edwin Diaz

Surprises: Craig Stammen, Joe Kelly

Herrera has been mostly terrible this year, but his track record says he is still a star, and he clearly hasn’t lost anything from his two-seam fastball. Diaz dominated as a rookie but has slowed down a lot this season. He’s still 23, so there is no reason to worry. Stammen didn’t even pitch in MLB in 2016, but he is performing solidly for the Padres. Kelly is having a career year in Boston behind his high-heat fastball.

3. Top Five Cutter Fastball (Min 200):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Jacob Barnes | 104.09 | 1.99 | 0.22 | 1.21 | 3.20
Dominic Leone | 99.80 | 1.62 | 0.24 | 0.81 | 2.43
Kenley Jansen | 90.61 | 0.84 | 0.22 | 1.38 | 2.21
Alex Colome | 85.15 | 0.37 | 0.20 | 1.80 | 2.17
Tommy Hunter | 88.07 | 0.62 | 0.22 | 1.32 | 1.94

The Stars: Kenley Jansen, Alex Colome

Young and Coming: None

Surprises: Dominic Leone, Jacob Barnes, Tommy Hunter

The most famous cutter in the game makes the top five, coming from Dodgers closer Jansen. Colome has continued his 2016 breakout as the Rays closer. Leone had a great rookie season for the Mariners in 2014 but was knocked around in 2015 and 2016. He has come back nicely in 2017.

4. Top Five Sinker Fastball (Min 200):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Pat Neshek | 70.87 | 1.06 | 0.25 | 1.66 | 2.72
Matt Albers | 66.94 | 0.58 | 0.24 | 1.96 | 2.54
Tony Watson | 73.58 | 1.40 | 0.28 | 1.10 | 2.50
Scott Alexander | 76.57 | 1.77 | 0.30 | 0.47 | 2.24
Richard Bleier | 65.97 | 0.46 | 0.25 | 1.68 | 2.14

The Stars: Pat Neshek

Young and Coming: None

Surprises: Richard Bleier

Neshek, a two-time All-Star, has been spectacular for the Phillies. Bleier, a 30-year-old second-year player, has been unexpectedly good in the majors the past two years.

5. Top Two Splitter Fastball (Min 200):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Blake Parker | 101.30 | 1.30 | 0.18 | 1.48 | 2.78
Chasen Shreve | 97.50 | 0.79 | 0.18 | 1.48 | 2.27

Only nine relievers heavily used the splitter, so this is a small leaderboard. Parker has broken out for the Angels in 2017. Shreve is the third Yankee to appear.

6. Top Five Curveball (Min 200):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
David Robertson | 102.86 | 1.89 | 0.16 | 0.73 | 2.62
Jerry Blevins | 95.85 | 1.28 | 0.16 | 0.71 | 1.99
Ryan Pressly | 89.25 | 0.70 | 0.12 | 1.28 | 1.98
Cody Allen | 90.94 | 0.85 | 0.15 | 0.85 | 1.70
Keone Kela | 85.24 | 0.35 | 0.13 | 1.12 | 1.47

The Stars: David Robertson, Cody Allen

Young and Coming: Keone Kela

Surprises: None

Our fourth Yankee to appear on a leaderboard is Robertson, and none of those four has been Dellin Betances or Aroldis Chapman. Scary. Kela has been one of the few relievers keeping the Rangers bullpen afloat.

7. Top Ten Slider (Min 250):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Roberto Osuna | 108.02 | 1.97 | 0.16 | 1.52 | 3.49
Arodys Vizcaino | 105.81 | 1.78 | 0.16 | 1.54 | 3.32
Raisel Iglesias | 98.47 | 1.13 | 0.14 | 1.93 | 3.06
Blake Treinen | 105.37 | 1.74 | 0.17 | 1.23 | 2.97
Pedro Strop | 107.08 | 1.89 | 0.19 | 0.97 | 2.86
Ken Giles | 97.17 | 1.01 | 0.16 | 1.57 | 2.59
James Hoyt | 110.74 | 2.22 | 0.23 | 0.19 | 2.41
Edwin Diaz | 99.11 | 1.18 | 0.18 | 1.12 | 2.31
Adam Morgan | 108.19 | 1.99 | 0.23 | 0.16 | 2.15
Kyle Barraclough | 88.13 | 0.21 | 0.15 | 1.67 | 1.88

The Stars: Roberto Osuna, Pedro Strop, Ken Giles

Young and Coming: Raisel Iglesias, Edwin Diaz

Surprises: James Hoyt

Osuna has been nothing short of excellent for the Blue Jays, manning the closer job for all three of his professional seasons. He is still just 22 years old, so the best is likely yet to come. Strop is widely under-appreciated, but he has been a consistent force out of the Cubs bullpen for years. Mariners young stud Edwin Diaz makes his second leaderboard appearance. Hoyt has been terrible for the Astros, so his inclusion is unexpected.

8. Top Three Changeup (Min 200):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Tommy Kahnle | 99.96 | 1.16 | 0.18 | 1.59 | 2.75
Felipe Rivero | 105.68 | 1.86 | 0.22 | 0.47 | 2.33
Chris Devenski | 100.35 | 1.21 | 0.20 | 0.89 | 2.10

(the changeup is not much of a reliever pitch, so this leaderboard is small)

The Stars: Chris Devenski

Young and Coming: Felipe Rivero

Surprises: None

Kahnle appears again. With much-improved stuff, he has been striking out everybody en route to a big breakout season. Devenski is only in his second year, but it is also his second year of excellence. The unheralded minor-league starter turned long reliever turned dynamic, versatile setup man has been a star in Houston’s bullpen. His changeup is nicknamed the “Circle of Death,” so it is no surprise to see him here. Rivero has been dominant for the Pirates in his third year in the bigs.

Top Fifteen Overall:

Pitch | Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
4-Seam | Craig Kimbrel | 94.74 | 2.34 | 0.23 | 1.80 | 4.15
4-Seam | Sean Doolittle | 90.81 | 1.91 | 0.22 | 2.02 | 3.93
4-Seam | Chad Green | 85.35 | 1.30 | 0.20 | 2.57 | 3.87
Slider | Roberto Osuna | 108.02 | 1.97 | 0.16 | 1.52 | 3.49
Slider | Arodys Vizcaino | 105.81 | 1.78 | 0.16 | 1.54 | 3.32
Cutter | Jacob Barnes | 104.09 | 1.99 | 0.22 | 1.21 | 3.20
Slider | Raisel Iglesias | 98.47 | 1.13 | 0.14 | 1.93 | 3.06
Slider | Blake Treinen | 105.37 | 1.74 | 0.17 | 1.23 | 2.97
4-Seam | Anthony Swarzak | 78.77 | 0.58 | 0.20 | 2.37 | 2.95
Slider | Pedro Strop | 107.08 | 1.89 | 0.19 | 0.97 | 2.86
Splitter | Blake Parker | 101.30 | 1.30 | 0.18 | 1.48 | 2.78
Changeup | Tommy Kahnle | 99.96 | 1.16 | 0.18 | 1.59 | 2.75
Sinker | Pat Neshek | 70.87 | 1.06 | 0.25 | 1.66 | 2.72
Curveball | David Robertson | 102.86 | 1.89 | 0.16 | 0.73 | 2.62
4-Seam | Josh Fields | 89.12 | 1.72 | 0.27 | 0.89 | 2.61
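The overall table looks like the per-pitch leaderboards pooled together and re-sorted by Z Total. Here is a small sketch of that step, using a handful of rows copied from the tables above. The dictionary layout and the function name are hypothetical, not the author's.

```python
from heapq import nlargest

# Rows of (player, Z Total), keyed by pitch type, copied from the tables above.
leaderboards = {
    "4-Seam": [("Craig Kimbrel", 4.15), ("Sean Doolittle", 3.93), ("Chad Green", 3.87)],
    "Slider": [("Roberto Osuna", 3.49), ("Arodys Vizcaino", 3.32)],
    "Cutter": [("Jacob Barnes", 3.20), ("Dominic Leone", 2.43)],
}


def overall_top(boards, n):
    """Pool every pitch-type leaderboard and return the n highest Z Totals."""
    pooled = [
        (pitch, player, z_total)
        for pitch, rows in boards.items()
        for player, z_total in rows
    ]
    return nlargest(n, pooled, key=lambda row: row[2])


for pitch, player, z_total in overall_top(leaderboards, 5):
    print(f"{pitch:<7} {player:<16} {z_total:.2f}")
```

Because each pitch type is standardized against its own peers, pooling like this compares how far each pitch stands above others of its kind, not raw whiff rates across pitch types.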

Best Pitch: Craig Kimbrel, Boston Red Sox, Four-Seam

Biggest Surprise: Jacob Barnes, Milwaukee Brewers, Cutter

The top of the leaderboard is dominated by four-seam fastballs and sliders, which is unsurprising considering those are the favorite pitches of relievers. I’ve said this before, but there are three Yankees in the top 15, and neither of their supposed two best relievers is among them! That’s absurd. Seeing Kimbrel at the top is the exact opposite. Jacob Barnes’s ranking, however, is crazy too. The unheralded second-year man hasn’t shown much yet, with a 4.00 FIP in 2017. But that cutter is doing something to hitters.

I will add one more article, combining relievers and starters and including some interesting tidbits.


Starting Pitcher Pitch Rankings

As I stated in my earlier article, I will be posting data from the pitch-effectiveness measurement I introduced there. Let’s start with the starting pitchers.

1. Top Ten Four-Seam Fastballs (Min 500):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Chris Sale | 85.89 | 3.08 | 0.24 | 2.86 | 5.94
Jacob deGrom | 83.06 | 2.68 | 0.27 | 2.13 | 4.81
Jose Berrios | 74.74 | 1.51 | 0.27 | 1.97 | 3.48
Jimmy Nelson | 76.65 | 1.78 | 0.30 | 1.34 | 3.12
Jeff Samardzija | 75.97 | 1.68 | 0.30 | 1.34 | 3.02
Max Scherzer | 73.97 | 1.40 | 0.29 | 1.55 | 2.95
Chase Anderson | 74.24 | 1.44 | 0.29 | 1.45 | 2.89
Rick Porcello | 77.50 | 1.90 | 0.31 | 0.87 | 2.77
James Paxton | 73.32 | 1.31 | 0.29 | 1.42 | 2.73
Danny Salazar | 80.27 | 2.29 | 0.33 | 0.42 | 2.71

The Stars: Chris Sale, Jacob deGrom, Max Scherzer, James Paxton

Young and Coming: Jose Berrios

Surprises: Rick Porcello, Chase Anderson, Jeff Samardzija

This group includes some bona fide talent and some surprises. Porcello’s 1.90 Z-score on Sw+Whf% jumps out, considering his lack of stuff and general pitch-to-contact approach. Anderson is quietly putting together a solid season, with a 2.88 ERA in 122 innings of work. Samardzija’s incredible strikeout and walk peripherals have been well documented this year.

2. Top Ten Two-Seam Fastballs (Min 300):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Sonny Gray | 72.12 | 2.18 | 0.30 | 1.39 | 3.57
Jaime Garcia | 67.96 | 1.49 | 0.28 | 1.97 | 3.46
David Price | 72.83 | 2.29 | 0.32 | 0.86 | 3.15
Lance Lynn | 66.66 | 1.27 | 0.31 | 1.16 | 2.43
Matt Garza | 65.31 | 1.05 | 0.30 | 1.34 | 2.39
Luis Castillo | 64.66 | 0.94 | 0.30 | 1.44 | 2.38
Chris Sale | 65.23 | 1.04 | 0.30 | 1.34 | 2.38
Jameson Taillon | 69.98 | 1.82 | 0.34 | 0.40 | 2.23
J.A. Happ | 63.82 | 0.80 | 0.30 | 1.29 | 2.09
Julio Teheran | 69.27 | 1.71 | 0.35 | 0.20 | 1.91

The Stars: Sonny Gray, David Price, Chris Sale, Julio Teheran

Young and Coming: Jameson Taillon, Luis Castillo

Surprises: Jaime Garcia, Matt Garza

We see Sale again, which, considering what he has done this year, is not surprising. Garza has been generally terrible this year, so his inclusion in this list is unexpected. Castillo, a rookie for the Cincinnati Reds, has pieced together some quality starts out of the spotlight.

3. Top Five Cut Fastballs (Min 200):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
James Paxton | 89.03 | 1.81 | 0.20 | 2.03 | 3.84
Corey Kluber | 97.90 | 2.82 | 0.28 | 0.48 | 3.30
Tyler Chatwood | 84.08 | 1.25 | 0.21 | 1.81 | 3.06
John Lackey | 84.72 | 1.32 | 0.26 | 0.85 | 2.17
Zack Godley | 78.94 | 0.66 | 0.24 | 1.39 | 2.05

(Only five because of the limited use of cutters)

The Stars: James Paxton, Corey Kluber

Young and Coming: Zack Godley

Surprises: Tyler Chatwood

We see Paxton again, who has established himself as a star this season. Godley has been great for the Arizona Diamondbacks, and Tyler Chatwood has been poor for the Colorado Rockies.

4. Top Five Sinker Fastball (Min 200):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Trevor Williams | 68.72 | 1.87 | 0.30 | 1.73 | 3.61
Jimmy Nelson | 65.69 | 1.43 | 0.32 | 1.11 | 2.54
Jose Quintana | 64.77 | 1.29 | 0.32 | 1.18 | 2.47
Jon Lester | 61.89 | 0.87 | 0.31 | 1.29 | 2.17
Jake Arrieta | 58.31 | 0.35 | 0.31 | 1.43 | 1.78

(Only five because of the limited use of sinkers)

The Stars: Jake Arrieta, Jon Lester, Jose Quintana

Young and Coming: Trevor Williams

Surprises: None

An emerging starter for the Pittsburgh Pirates, an emerging ace for the Milwaukee Brewers, and…three Chicago Cubs. I gave the Cubs pitchers the benefit of the doubt and put them under “The Stars” category, but they may have pitched their way out of there this season.

5. Top Two Splitter Fastball (Min 200):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Kevin Gausman | 94.79 | 0.96 | 0.21 | 1.61 | 2.57
Ricky Nolasco | 95.42 | 1.02 | 0.22 | 1.35 | 2.37

The splitter leaderboard included only nine starters, so this one is short. Kevin Gausman has rebounded from a horrendous start to be solid, and Ricky Nolasco has continued to provide what he always has: mediocrity.

6. Top Ten Curveball (Min 300):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Corey Kluber | 109.61 | 3.16 | 0.12 | 2.26 | 5.42
Charlie Morton | 88.69 | 1.30 | 0.17 | 1.44 | 2.74
James Paxton | 84.54 | 0.93 | 0.16 | 1.49 | 2.42
Zack Godley | 93.67 | 1.74 | 0.22 | 0.60 | 2.35
Aaron Nola | 87.91 | 1.23 | 0.19 | 1.07 | 2.30
Carlos Carrasco | 88.65 | 1.30 | 0.19 | 0.99 | 2.28
Ivan Nova | 84.32 | 0.91 | 0.18 | 1.21 | 2.12
James Shields | 91.18 | 1.52 | 0.22 | 0.50 | 2.02
Alex Meyer | 82.68 | 0.76 | 0.19 | 1.07 | 1.84
Jon Lester | 89.57 | 1.38 | 0.22 | 0.45 | 1.82

The Stars: Corey Kluber, James Paxton, Carlos Carrasco

Young and Coming: Zack Godley

Surprises: James Shields, Alex Meyer, Jon Lester, Charlie Morton

We see Kluber again, and Godley again, and Paxton for a third time. No surprise considering the seasons they have put up. Shields’ days as a front-of-the-rotation starter are far behind him. Meyer has quietly put together some solid starts for the Los Angeles Angels as a complete unknown. Lester is a surprise here because this is his second leaderboard appearance, and he has not pitched well. Morton is mostly known for his injury problems, but he has developed some of the best “stuff” in the game in his first year in Houston.

7. Top Ten Slider (Min 300):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Carlos Carrasco | 108.62 | 2.51 | 0.15 | 2.06 | 4.56
Max Scherzer | 104.66 | 2.10 | 0.17 | 1.79 | 3.89
Sonny Gray | 97.27 | 1.35 | 0.16 | 1.87 | 3.22
Dylan Bundy | 99.46 | 1.58 | 0.19 | 1.28 | 2.85
Clayton Kershaw | 101.38 | 1.77 | 0.22 | 0.82 | 2.59
Patrick Corbin | 94.91 | 1.11 | 0.19 | 1.24 | 2.35
Marcus Stroman | 96.92 | 1.32 | 0.21 | 1.03 | 2.34
Zack Greinke | 104.05 | 2.04 | 0.24 | 0.30 | 2.34
Mike Clevinger | 96.96 | 1.32 | 0.21 | 1.01 | 2.33
Mike Leake | 96.40 | 1.27 | 0.21 | 0.93 | 2.20

The Stars: Carlos Carrasco, Max Scherzer, Sonny Gray, Clayton Kershaw, Marcus Stroman, Zack Greinke

Young and Coming: Dylan Bundy, Mike Clevinger

Surprises: Patrick Corbin

Finally! The man we have been waiting to see, Kershaw, makes his first appearance. So does Scherzer. The star power of this group is by far the strongest. Bundy has been “Young and Coming” for what seems like decades now, and no one knows if the flashes will ever become consistency. He is still just 24 years old, though, so I will keep my hopes up. Clevinger has been a nice surprise for the Cleveland Indians, and Corbin has bounced back from a miserable 2016 to be solid for the Arizona Diamondbacks.

8. Top Ten Changeup (Min 300):

Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
Stephen Strasburg | 104.30 | 2.31 | 0.15 | 2.76 | 5.07
Luis Castillo | 97.25 | 1.46 | 0.18 | 2.27 | 3.73
Danny Salazar | 102.60 | 2.11 | 0.23 | 1.01 | 3.12
Kyle Hendricks | 95.35 | 1.23 | 0.22 | 1.25 | 2.49
Max Scherzer | 90.38 | 0.63 | 0.20 | 1.72 | 2.35
Edinson Volquez | 91.28 | 0.74 | 0.21 | 1.54 | 2.28
Carlos Carrasco | 86.47 | 0.16 | 0.19 | 1.90 | 2.06
Eduardo Rodriguez | 95.70 | 1.28 | 0.26 | 0.48 | 1.76
Jason Vargas | 91.99 | 0.83 | 0.26 | 0.46 | 1.29
Cole Hamels | 93.09 | 0.96 | 0.27 | 0.24 | 1.20

The Stars: Stephen Strasburg, Kyle Hendricks, Max Scherzer, Carlos Carrasco, Cole Hamels

Young and Coming: Luis Castillo, Eduardo Rodriguez

Surprises: Edinson Volquez

Scherzer again, which makes me feel better about the validity of this work. Carrasco for the third time in a row. His breaking and offspeed stuff are killer. Very few people outside of Cincinnati know Castillo, but this is the rookie’s second leaderboard appearance. Rodriguez has continued to flash this year, but injuries and inconsistency continue for the young Red Sock. Volquez is still embracing his mediocrity.

Starters Top Fifteen Overall:

Pitch | Player | Sw+Whf% | Sw+Whf% Z | xwOBA | xwOBA Z | Z Total
4-Seam | Chris Sale | 85.89 | 3.08 | 0.24 | 2.86 | 5.94
Curveball | Corey Kluber | 109.61 | 3.16 | 0.12 | 2.26 | 5.42
Changeup | Stephen Strasburg | 104.30 | 2.31 | 0.15 | 2.76 | 5.07
4-Seam | Jacob deGrom | 83.06 | 2.68 | 0.27 | 2.13 | 4.81
Slider | Carlos Carrasco | 108.62 | 2.51 | 0.15 | 2.06 | 4.56
Slider | Max Scherzer | 104.66 | 2.10 | 0.17 | 1.79 | 3.89
Cutter | James Paxton | 89.03 | 1.81 | 0.20 | 2.03 | 3.84
Changeup | Luis Castillo | 97.25 | 1.46 | 0.18 | 2.27 | 3.73
Sinker | Trevor Williams | 68.72 | 1.87 | 0.30 | 1.73 | 3.61
2-Seam | Sonny Gray | 72.12 | 2.18 | 0.30 | 1.39 | 3.57
4-Seam | Jose Berrios | 74.74 | 1.51 | 0.27 | 1.97 | 3.48
2-Seam | Jaime Garcia | 67.96 | 1.49 | 0.28 | 1.97 | 3.46
Cutter | Corey Kluber | 97.90 | 2.82 | 0.28 | 0.48 | 3.30
Slider | Sonny Gray | 97.27 | 1.35 | 0.16 | 1.87 | 3.22
2-Seam | David Price | 72.83 | 2.29 | 0.32 | 0.86 | 3.15

Best Pitch: Chris Sale, Boston Red Sox, 4-Seam Fastball

Best Repertoire: Corey Kluber, Cleveland Indians

Biggest Surprise: Luis Castillo, Cincinnati Reds, Changeup

This list is almost all household names. In first and second, we have the AL Cy Young frontrunners. Jeff Sullivan recently wrote an article about Kluber’s curveball, and how it may be the best pitch in baseball. It isn’t number one here, but second place is not too shabby. His cutter also appears here, so his dominance is not hard to explain. Sonny Gray’s stuff is well known, and he shows up twice on this table, but his numbers are not spectacular this year. Lastly, watch out for Castillo. He’s a no-name rook, but he has been solid for the Reds, and the ranking of his changeup may be the evidence to support his success.

Next up are the relievers.


An Alternate Look at Ground Ball “Luckiness”

Earlier this season, Baseball Savant unveiled expected wOBA, which, around these parts of the Internet, has made some real waves. For those unfamiliar, expected wOBA, or xwOBA, predicts a batter’s wOBA from the launch angles and exit velocities of his in-play contact. Because certain speeds and angles are more conducive to hits (for instance, most consider an ideal launch angle to be around 25 degrees), xwOBA is often interpreted as a rough measure of luck. In particular, the difference between a player’s expected and actual wOBA (referred to as xwOBA-wOBA) is often cited in discussions of just how “lucky” that player has been. If a hitter’s xwOBA is significantly higher than his actual wOBA, for example, one can deduce that he’s hit the ball far better than his actual results imply.

A few months ago, Craig Edwards wrote an excellent piece on the new statistic, and discussed the interaction between xwOBA-wOBA and player speed. He noted that most of the “luckiest” batters — those with negative xwOBA-wOBA figures — were generally some of the faster players in the league, and the least lucky batters were among the slowest. Intuitively, this makes sense, as faster players are more likely to beat out infield hits and take extra bases when given the opportunity.

Edwards also charted players’ xwOBA-wOBA against their BsR scores, producing a linear-looking graph (with an R-squared of 0.27) which confirmed at least a moderate link between the two statistics. He noted that because there was no “perfect metric” for player speed at the time, he chose to use BsR as a proxy. While BsR serves this purpose well enough, I do find it problematic that the statistic, by definition, includes runners “taking the extra base,” as this information is also reflected in the wOBA element of xwOBA-wOBA (i.e. when a batter stretches a would-be single into a double, his wOBA is that of a double, while his xwOBA remains at a single). I’d be more comfortable, therefore, comparing xwOBA-wOBA against a more “pure” form of player speed.

It’s fortunate, then, that in the time since Edwards’s piece, Baseball Savant has also released sprint speed, which measures how many feet per second a player covers on a “maximum effort” play. Using a list of batters with at least 200 at-bats on the season, I’ve re-created the scatterplot used in Edwards’s article, replacing BsR with sprint speed:

[Figure: scatterplot of xwOBA-wOBA against sprint speed, all batted balls]

As it turns out, the results are fairly similar — there is a link, albeit not an incredibly strong one, between a hitter’s speed and his xwOBA-wOBA. The trend is downward-sloping, meaning that faster batters are luckier, but there’s still a lot of scatter around the line of best fit. The highest point on the graph, belonging to Tigers slugger Miguel Cabrera, is particularly far from the trend line, as his 66-point xwOBA-wOBA is far above the expected difference of around zero.

I should also note that the above scatterplot, with an R-squared of 0.16, shows a notably weaker fit than Edwards’s chart did. The plot did get me wondering, however, how much stronger or weaker the relationship would be for different hit types. Common sense suggests that batter speed, as it relates to xwOBA-wOBA, plays a much more significant role on ground balls than on balls hit in the air. After all, a lazy fly ball to left field will be caught whether hit by Byron Buxton (tied for the fastest batter in the league) or Albert Pujols (the slowest), but Buxton will reach base far more often on a weak ground ball to the pitcher:

[GIF: Byron Buxton reaching base on a weak ground ball]

Again using the all-powerful Baseball Savant search tool, I gathered separate xwOBA-wOBA figures for fly balls, line drives, and grounders. Now, let’s see how the interaction between player speed and xwOBA-wOBA changes based on hit category:

[Figure: xwOBA-wOBA against sprint speed, split by fly balls, line drives, and ground balls]

There’s virtually no relationship at all for either fly balls or line drives (indeed, neither simple linear regression has an R-squared significantly above zero), but ground balls are a different story. Not only is the smoothed line for grounders much steeper than for either of the other two hit types, but the R-squared is nearly 0.31. While this is by no means a high R-squared, it does confirm a link between ground-ball “luckiness” and player speed.
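For readers who want to reproduce this kind of comparison, here is a minimal sketch of a simple linear regression and its R-squared. The sprint-speed and xwOBA-wOBA numbers are made up for illustration, not pulled from Baseball Savant, and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)   # squared Pearson correlation
    return slope, intercept, r_squared


# Made-up ground-ball data: sprint speed (ft/s) and xwOBA-wOBA.
sprint_speed = [23.0, 25.5, 26.8, 27.4, 28.9, 30.1]
gb_diff = [0.060, 0.035, 0.010, 0.020, -0.015, -0.040]

slope, intercept, r2 = fit_line(sprint_speed, gb_diff)
print(f"slope = {slope:.4f} per ft/s, intercept = {intercept:.3f}, R^2 = {r2:.2f}")
```

Running the same function separately on fly balls, line drives, and grounders would give the three R-squared values being compared here. A negative slope corresponds to faster batters posting lower xwOBA-wOBA figures.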

Because we now know that we should expect faster players to outperform their respective xwOBAs on ground balls (and vice versa), it may also be appropriate to adjust batters’ xwOBA-wOBA figures accordingly. Using the results of the simple linear regression for ground balls, I’ve calculated the difference between each major-league batter’s actual xwOBA-wOBA and his expected xwOBA-wOBA as per the regression. I’ve called the stat “Actual Less Expected xwOBA-wOBA” (It’s a mouthful, I know; let’s just agree to call it ALE xwOBA-wOBA), and while it’s a pretty rough measure, it provides us with a speed-neutral valuation of batters’ ground-ball “luckiness.” A high ALE xwOBA-wOBA indicates misfortune; Brandon Belt, for instance, has an actual xwOBA-wOBA 161 points higher than his sprint speed would suggest. Full lists of batters with the highest and lowest ALE xwOBA-wOBAs are as follows:

[Table: batters with the highest and lowest ground-ball ALE xwOBA-wOBA]

Finally, I multiplied each batter’s ALE xwOBA-wOBA figure by his ground-ball rate, as per FanGraphs (multiplied by 100 for aesthetic purposes). This should show us which batters have been the most and least lucky in the context of their own respective batted-ball profiles. As shown below, there are a lot of familiar names in these weighted ALE xwOBA-wOBA lists, but there are also a few differences:

[Table: batters with the highest and lowest weighted ALE xwOBA-wOBA]
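The two steps described above, taking each batter's residual from the ground-ball regression and then scaling it by his ground-ball rate, can be sketched as follows. All numbers and batter labels are invented for illustration, and the function names are hypothetical. Following the article's sign convention, a positive residual means a batter's xwOBA-wOBA is higher than his sprint speed predicts, which indicates misfortune.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x


def ale(sprint_speeds, diffs):
    """Actual less expected xwOBA-wOBA: the residual from the regression."""
    slope, intercept = fit_line(sprint_speeds, diffs)
    return [d - (intercept + slope * s) for s, d in zip(sprint_speeds, diffs)]


def weighted_ale(ale_values, gb_rates):
    """Scale each residual by ground-ball rate (a fraction), then by 100."""
    return [a * gb * 100 for a, gb in zip(ale_values, gb_rates)]


# Invented data: (label, sprint speed in ft/s, ground-ball xwOBA-wOBA, GB rate).
batters = [
    ("Batter A", 24.0, 0.120, 0.38),
    ("Batter B", 26.5, 0.010, 0.45),
    ("Batter C", 28.0, -0.030, 0.52),
    ("Batter D", 29.5, -0.020, 0.41),
]
speeds = [b[1] for b in batters]
diffs = [b[2] for b in batters]
rates = [b[3] for b in batters]

residuals = ale(speeds, diffs)
for (label, *_), r, w in zip(batters, residuals, weighted_ale(residuals, rates)):
    print(f"{label}: ALE = {r:+.3f}, weighted = {w:+.2f}")
```

Because the regression includes an intercept, the residuals sum to zero across the sample, so ALE only says how lucky a batter has been relative to others of similar speed.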

As mentioned above, an R-squared of 0.31 isn’t big enough to draw any major conclusions. Even so, there’s value in controlling for player speed in any discussion of players outperforming or underperforming their expected wOBAs. By accounting for batters’ sprint speeds, we can get a purer look at which players have actually been the beneficiaries of good luck, and which batters’ negative xwOBA-wOBA on ground balls have resulted from their foot speed. Further, it helps to highlight players who actually have been unlucky; if a player has a ground-ball ALE xwOBA-wOBA close to zero, but a high overall xwOBA-wOBA, they’ve been hitting much higher-quality fly balls and line drives — neither of which are significantly impacted by player speed — than their results indicate. Miguel Cabrera, for instance, falls into that category; while his ground-ball ALE xwOBA-wOBA is relatively close to zero (indicating that he hasn’t benefited from any speed-neutral luck or unluck on grounders) his fly-ball xwOBA-wOBA is a whopping 0.166. So, even though Miggy isn’t one of the faster baserunners in the league, he’s still got a legitimate gripe against Lady Luck — and now, we can see which other batters do, too.


Reverse Engineering Swing Mechanics from Statcast Data

There’s no question that Statcast has revolutionized the way we think about hitting. Now in year three of the Statcast era, everyone from players to stat-heads to the average fan is talking about exit velocities and launch angles. But what can a player do to improve both their exit velocity and launch angle? It all comes down to the mechanics of the swing.

The next great revolution in baseball is leveraging data about swing mechanics to optimize exit velocities and launch angles. It’s a revolution that has already begun. Using technologies developed by companies like Zepp, Blast Motion, and Diamond Kinetics, players and coaches can now get detailed analyses of every swing during practice. Teams are already starting to integrate these swing analyses into their player-development programs. However, none of these sensors are currently being used during MLB games.

It’s only a matter of time before MLB starts tracking swing data during games, but until then we can use Statcast data and a little physics to reverse engineer the mechanics of the swing. A couple of weeks ago, Eno Sarris and Andrew Perpetua wrote some great articles about the importance of making contact out in front of the plate and how we can infer the contact point from Statcast data. Other than contact point, what are the other important characteristics of a swing? Well, let’s look at Eno’s favorite graphic, from the time Zepp analyzed his swing:

It all comes down to swing speed, attack angle, and timing! The time to impact is probably impossible to get from the Statcast data, so let’s focus on the two remaining metrics: swing speed and attack angle.

Swing speed

Statcast doesn’t measure swing speed directly, but nonetheless reports an estimated swing speed, computed using an algorithm with all the transparency of a black box. In fact, it’s so secretive that estimated swing speeds have all but disappeared from Baseball Savant in recent weeks. Just to find the data, I had to dig up a couple of the saved searches from Alex Chamberlain’s article from a few weeks ago on that topic. Here is the leaderboard of the fastest average estimated swing speeds as reported in that article:

Hitter Average Estimated Swing Speed, 2015-17
Player Year AB MPH
Giancarlo Stanton 2015 437 66.5
Aaron Judge 2017 406 66.1
Nelson Cruz 2016 325 65.5
Giancarlo Stanton 2016 192 64.8
Miguel Cabrera 2016 342 64.8
SOURCE: Baseball Savant/Statcast

Eno swings like Giancarlo Stanton!

Now, I don’t want to shatter anyone’s dreams of blasting a home run off of a Major League pitcher, but something is clearly off about the data. It turns out that not all reported bat speeds are equal. Physics tells us that as the bat rotates, the barrel (the end) of the bat moves the fastest and that the bat speed decreases in an approximately linear fashion as we move toward the hands. According to Patrick Cherveny, the lead biomechanist for Blast Motion, which is the official swing sensor of MLB, measuring the barrel speed is essentially meaningless:

“We see some swing speeds where people claim that you get into the 90s. That would make sense if it’s at the end of the bat, but if you hit it at the end of the bat, it’s not going to travel as far because some of the energy is lost in the bat’s vibration. So that kind of a swing speed is essentially ‘false.’ Swing speed is dependent on where you’re measuring on the bat. In order to maximize quality of contact, the best hitters want to hit the ball in the “sweet spot” of the bat.”

Measuring the speed of the bat at the sweet spot, a two-inch-long area whose center is located six inches from the barrel of the bat, Blast Motion reports that MLB players swing the bat between 65 and 85 MPH. Zepp, on the other hand, reports the barrel speed, which accounts for its elevated values. Still, none of the swing-tracking devices on the market report swing speeds as low as those estimated by Statcast.

Let’s see if we can uncover more information about the black-box algorithm used by Statcast to estimate swing speeds. A quick linear regression between average estimated swing speed and average exit velocity for all batters with at least 100 batted-ball events (BBE) in a season from 2015-2017 yields an R² of 0.99. Wow! Statcast estimates swing speeds almost entirely from exit-velocity data. No wonder the names at the top of the list are so obvious.
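For anyone who wants to repeat that check, here is a minimal sketch of the R² calculation for a one-variable least-squares fit, in plain Python. The function name `r_squared` and the toy numbers are placeholders of mine, not Statcast data or the code behind the figure above.

```python
from statistics import mean

def r_squared(x, y):
    """R^2 of the ordinary least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # For a single predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)

# Made-up inputs: average exit velocity vs. average estimated swing speed (MPH).
avg_ev = [85.0, 87.5, 89.0, 91.0, 93.5]
avg_swing = [58.1, 60.0, 61.4, 62.9, 65.0]
print(round(r_squared(avg_ev, avg_swing), 3))
```

With real data, `x` would be each hitter-season's average exit velocity and `y` the matching average estimated swing speed.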

Exit velocity, however, isn’t the only velocity measured by Statcast. We also know the speed of the pitch as it is released from the pitcher’s hand. Thinking about the physics, the bat transfers energy and momentum to the oncoming ball at the point where the bat collides with the ball. Thus, any estimation of swing speed based on Statcast’s EV and pitch speed data represents the speed of the bat at the point where it makes contact with the ball. Since hitters want to hit the ball at the sweet spot, swing speeds estimated from Statcast data should fall in approximately the same range as those measured by Blast Motion.

Much of the research on the physics of bat-ball collisions has been conducted by Dr. Alan Nathan, so let’s start with one of his equations:

$EV = e_A v_{ball} + (1 + e_A) v_{bat}$

where $EV$ is the exit velocity, $v_{ball}$ is the velocity of the ball just before it hits the bat, and $v_{bat}$ is the velocity of the bat. Here $e_A$ is a fudge factor called the collision efficiency; it depends on the COR of the ball (which was at the center of the juiced-ball controversy), the physical properties of the bat, and the point on the bat at which the bat strikes the ball. Thus, assuming all MLB players use a standard ball and bat, $e_A$ can be viewed as a measure of quality of contact. Nathan found that at the sweet spot of a wood bat, $e_A = 0.2$. Using that value of $e_A$ and the release speed and exit velocities from Statcast, we can estimate the bat speed for every ball in play. According to Nathan’s pitch-trajectory calculator, the average pitch slows down by 8.4% from the release point to when it crosses the plate, so we’ll also make that adjustment to the release speed reported by Statcast. Here’s the relationship between our physics-based model for swing speed and the estimated swing speed from Statcast/Baseball Savant:

Look at that! When you get a slope of 1 and an intercept of about 0, you know you’ve hit the nail on the head. This must be the equation that Statcast is using to estimate swing speed. After doing a little digging, it appears that Nathan gave them that exact formula, but assumed that the pitch slows down by 10% by the time it crosses the plate.
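The sweet-spot estimate described above amounts to solving Nathan's equation for the bat speed. Here is a minimal sketch in Python, assuming all speeds are in MPH; the function name and argument names are my own, and the defaults are the 0.2 collision efficiency and 8.4% slowdown quoted in the text (pass `slowdown=0.10` to mimic the Statcast version).

```python
def bat_speed_sweet_spot(exit_velo, release_speed, e_a=0.2, slowdown=0.084):
    """Solve EV = e_A*v_ball + (1 + e_A)*v_bat for v_bat (speeds in MPH)."""
    # Pitch speed at the plate, after the assumed loss from release.
    v_ball = release_speed * (1.0 - slowdown)
    return (exit_velo - e_a * v_ball) / (1.0 + e_a)

# A 100 MPH batted ball off a pitch released at 93 MPH.
print(round(bat_speed_sweet_spot(100.0, 93.0), 1))
```

Because the collision efficiency is fixed here, this estimate rises and falls almost in lockstep with exit velocity, which is consistent with the near-perfect correlation noted earlier.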

The problem with this algorithm is it assumes that the hitter always hits the ball at the sweet spot. Nathan’s paper actually shows that $e_A$ varies linearly as a function of EV, from about -0.1 for the weakest-hit balls to 0.21 for the best-hit ones, depending on how far from the barrel the bat collides with the ball. To get a good estimate of swing speed, we’ll need to get a better estimate of $e_A$. Unfortunately, $e_A$ must be computed independently for every hitter due to inherent differences in a hitter’s strength. For instance, when Giancarlo Stanton hits a ball with an EV of 100 MPH, he is making weaker contact than when Billy Hamilton hits a ball 100 MPH.

I calibrated $e_A$ for each hitter with at least 100 BBE in a season by assuming that the average of his top 15 BBE by exit velocity corresponds to $e_A = 0.21$ and the average of his bottom 15 BBE by exit velocity corresponds to $e_A = -0.1$. Since $e_A$ and EV are related linearly, we can compute $e_A$ from EV for each player. Finally, I will assume that every player uses a standard 34 in., 32 oz. bat. Since Nathan’s study used a 34 in., 31 oz. bat, I subtracted 0.42 MPH from the estimated swing speeds, because every extra ounce reduces bat speed by about 0.42 MPH. Here’s a look at our new average estimated swing speeds:

We see that swing speed still correlates strongly with exit velocity, but with a much more reasonable R² value of 0.81. Much of the remaining variance is due to the quality of contact, as estimated by $e_A$. The colors here show the soft-hit rates from FanGraphs. We can see not only that slower swing speeds result in more soft contact, but also that the regression line strongly divides hitters based on their soft-contact rates. Hitters above the line tend to make better contact and hit the ball more efficiently than those below the line, given their swing speeds.
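The per-hitter calibration can be sketched roughly as follows. This is my reading of the procedure rather than the original code: the function names are hypothetical, the collision efficiency is extrapolated along the same line beyond the two anchor points (no clamping), and the 8.4% pitch slowdown is carried over from the sweet-spot version.

```python
from statistics import mean

def calibrate_e_a(exit_velos, n=15, e_lo=-0.1, e_hi=0.21):
    """Build an EV -> e_A mapping for one hitter-season (speeds in MPH)."""
    evs = sorted(exit_velos)
    ev_lo = mean(evs[:n])    # average of the n weakest batted balls
    ev_hi = mean(evs[-n:])   # average of the n hardest batted balls
    slope = (e_hi - e_lo) / (ev_hi - ev_lo)
    # Straight line through (ev_lo, e_lo) and (ev_hi, e_hi); no clamping.
    return lambda ev: e_lo + slope * (ev - ev_lo)

def swing_speed(exit_velo, release_speed, e_a, slowdown=0.084, bat_adj=0.42):
    """Bat speed at the contact point, shifted down for a 32 oz. bat."""
    v_ball = release_speed * (1.0 - slowdown)
    return (exit_velo - e_a * v_ball) / (1.0 + e_a) - bat_adj

# Made-up hitter with 50 batted balls at 60, 61, ..., 109 MPH.
e_a_of = calibrate_e_a([60.0 + i for i in range(50)])
print(round(swing_speed(102.0, 93.0, e_a_of(102.0)), 1))
```

Averaging `swing_speed` over all of a hitter's batted balls in a season would give the kind of per-player figure plotted above.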

Knowing the value of $e_A$ also gives us an estimate of where the ball hit the bat in relation to the barrel. Nathan found that $e_A \sim d^2$, where $d$ is the distance from the barrel. Since a quadratic function is not one-to-one and so has no unique inverse, we’re forced to infer $d$ from our computed values of $e_A$ by assuming a linear relationship between the two variables. Once we know where the ball struck the bat, we can also estimate the barrel speed and hand speed, assuming that those speeds are proportional to distance from the axis of rotation.
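The last step is just a scaling by radius. Below is a minimal sketch, treating the swing as a rigid rotation about a fixed axis. The function name and every distance in the example (contact 6 in. from the barrel end, barrel end 36 in. from the axis, hands 12 in. from the axis) are illustrative assumptions of mine, not the geometry used for the table that follows.

```python
def barrel_and_hand_speed(contact_speed, d_contact, axis_to_barrel, axis_to_hands):
    """Scale speed linearly with distance from the rotation axis.

    contact_speed  -- bat speed at the point of contact (MPH)
    d_contact      -- distance of the contact point from the barrel end (in.)
    axis_to_barrel -- distance from the rotation axis to the barrel end (in.)
    axis_to_hands  -- distance from the rotation axis to the hands (in.)
    """
    r_contact = axis_to_barrel - d_contact
    per_inch = contact_speed / r_contact  # MPH per inch of radius
    return per_inch * axis_to_barrel, per_inch * axis_to_hands

# Illustrative geometry only; the distances are assumptions.
barrel, hands = barrel_and_hand_speed(72.0, 6.0, 36.0, 12.0)
print(round(barrel, 1), round(hands, 1))
```

The estimates are only as good as the assumed distances, which is one reason the barrel and hand speeds carry more uncertainty than the speed at the point of contact.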

League Average Estimated Swing Speeds (MPH), 2015-17
           Point of Contact       Barrel                 Hands
Year       Min    Avg    Max      Min    Avg    Max      Min    Avg    Max
2015       63.9   71.9   83.3     76.3   85.8   98.9     22.8   26.7   32.2
2016       63.7   72.2   80.8     76.2   86.2   95.5     22.9   26.8   31.0
2017       63.0   71.1   78.6     75.3   84.9   93.8     22.5   26.4   30.7
Overall    63.0   71.7   83.3     75.3   85.7   98.9     22.5   26.6   32.2
SOURCE: Baseball Savant/Statcast. Players with min 100 BBE in a season

I have no idea how accurate these estimates are, but they look pretty good! The swing speeds at the point of contact line up nicely with those from Blast Motion (65-85 MPH range and league average of 70 MPH), as do the barrel speeds (Zepp claims 75-95 MPH) and hand speeds (Blast Motion says 23-29 MPH). There’s a lot more uncertainty in the barrel and hand speeds than at the point of contact, because they require additional assumptions about bat size, axis of rotation, and distance from barrel of the point of contact. Even with all of those assumptions, the accuracy probably isn’t much worse than that of the swing-tracking devices on the market today, which claim an uncertainty of about 3-7 MPH for individual swings.

Here are the fastest and slowest average swing speeds in a season during the Statcast era:

Hitter Average Estimated Swing Speeds (MPH), 2015-17
Player Year BBE Point of Impact (MPH) Barrel (MPH) Hands (MPH)
Giancarlo Stanton 2015 187 83.3 98.9 32.2
Rickie Weeks Jr. 2016 127 80.8 95.5 29.5
Giancarlo Stanton 2016 275 80.3 95.5 31.0
Greg Bird 2015 107 80.2 95.2 30.4
Gary Sanchez 2016 145 80.1 95.0 29.9
Kelby Tomlinson 2017 131 63.8 76.3 24.1
Dee Gordon 2017 497 63.8 76.2 23.2
Shawn O’Malley 2016 152 63.7 76.2 23.2
Mallex Smith 2017 178 63.5 75.6 22.6
Billy Hamilton 2017 436 63.0 75.3 22.5
SOURCE: Baseball Savant/Statcast. Players with min 100 BBE in a season

At the top of the list we see some well-known sluggers and … Rickie Weeks? Who knew he had such elite bat speed? Unfortunately for him, his average $e_A$ in 2016 was the lowest of any player in the Statcast era, indicating that he was making a ton of weak contact. Weeks is the quintessential over-swinger, whose impressive bat speed is often nullified by a lack of bat control. That’s completely unsurprising for a player whose 2016 highlight reel features at least one hack that would make even Charlie Brown blush.

I was also going to include a table of all of the fastest individual swings, until it turned into an exercise in how many times I can write Giancarlo Stanton’s name. He has 18 of the top 19 swings by barrel speed, which tops out at 108 MPH.

Attack Angle

Unlike swing speed, Statcast doesn’t give us an estimate of attack angle. Instead, we’ll again turn to some research done by Dr. Alan Nathan, this time from his 2017 Saberseminar presentation. To better understand the geometry of the bat-ball collision, let’s look at a diagram from his presentation:

The attack angle, or swing plane, is the angle that the bat is moving at when it hits the ball. Drawing a line between the centers of the bat and ball at the time of impact defines a second angle, called the centerline angle. When a hitter swings the bat such that the attack angle lines up with the centerline angle, he generates his maximum exit velocity and launches the ball at an angle equal to that of the attack angle.

Armed with this information, we can compute the attack angle by looking at the launch angles when a hitter produces his highest exit velocities. Nathan does this by plotting EV against LA for each hitter (below is his figure for Khris Davis’s BBE, whose attack angle is about 20°). He then divides the data, presumably binning the data by launch angle and then pulling out the top few BBE by exit velocity in each bin (red points). Once the data has been divided, a parabola can be fit to the red points, such that the attack angle corresponds to the peak of the parabola.

I found that the computed attack angle is fairly sensitive to the number of bins and number of data points in each bin, so this method is far from perfect. Ultimately, I chose the number of bins based on each player’s standard deviation in launch angle (~3° bins), and selected the top 20% of data points by exit velocity. I then computed a second version of attack angle by averaging the launch angles of the top 15 BBE by exit velocity (just as I did when computing swing speeds). Finally, I averaged the values from the two different methods to get a final value for the attack angle.
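Here is a rough sketch of that two-method estimate in plain Python. It is my reconstruction, not the original code: the function names are hypothetical, bins are a fixed 3° wide, "top 20%" is taken within each bin, and the parabola is fit by ordinary least squares.

```python
from statistics import mean

def fit_parabola(xs, ys):
    """Least-squares y = a*x^2 + b*x + c via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    for i in range(3):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [u - f * v for u, v in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def attack_angle_parabola(las, evs, bin_width=3.0, top_frac=0.2):
    """Peak of a parabola fit to the hardest-hit balls in each LA bin."""
    bins = {}
    for la, ev in zip(las, evs):
        bins.setdefault(int(la // bin_width), []).append((ev, la))
    xs, ys = [], []
    for pts in bins.values():
        pts.sort(reverse=True)  # hardest-hit first
        keep = max(1, round(top_frac * len(pts)))
        for ev, la in pts[:keep]:
            xs.append(la)
            ys.append(ev)
    a, b, _ = fit_parabola(xs, ys)
    return -b / (2 * a)  # launch angle at the vertex

def attack_angle_top_n(las, evs, n=15):
    """Average launch angle of the n hardest-hit balls."""
    top = sorted(zip(evs, las), reverse=True)[:n]
    return mean(la for _, la in top)

def preferred_attack_angle(las, evs):
    """Average of the two estimates."""
    return 0.5 * (attack_angle_parabola(las, evs) + attack_angle_top_n(las, evs))
```

The parabola fit needs at least three distinct launch angles among the selected points, and with sparse or noisy data the fitted curve can open upward, in which case the vertex is not a meaningful peak. That is part of why the result is so sensitive to the binning choices.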

This method of computing the attack angle gives us what I’ll call the “preferred” attack angle. Batters change their attack angles slightly based on pitch location, but the preferred attack angle represents the plane of a hitter’s natural swing when he gets a good pitch to hit (à la batting practice).

A lot of digital ink has been spilled over the last few years trying to make sense of how to evaluate hitters using launch angles. While a ton of progress has been made, we still have a long way to go. Who knew launch angles could be so complicated? Here, we see a relatively weak correlation between attack angle and launch angle, because launch angle is also strongly dependent on a hitter’s aim, timing, and bat speed. While we don’t have any direct measurements of aim or timing, we can see from the color scale that players with flatter swings (lower attack angles) have more margin for error when it comes to timing, and therefore tend to have higher contact rates than players with uppercut swings (larger attack angles).

League Average Attack and Launch Angles (°), 2015-17
Year Launch Angle Attack Angle
2015 10.5 11.4
2016 11.1 12.0
2017 11.4 13.8
Overall 11.0 12.4
SOURCE: Baseball Savant/Statcast. Players with min 100 BBE in a season

The fly-ball revolution is even more evident when looking at league-wide attack angles instead of launch angles. There was a lot of buzz before this season about players reworking their swings to increase their launch angle. Not all of them were successful though, as the average launch angle only increased by 0.3°, despite a nearly 2° jump in attack angle.

Here are the highest and lowest preferred attack angles in a season during the Statcast era:

Hitter Preferred Attack Angle, 2015-17
Player Year BBE Attack Angle(°)
Brian Dozier 2017 433 29.2
Mike Napoli 2017 268 29.0
Ryan Schimpf 2016 351 27.6
Ryan Howard 2016 220 25.7
Chris Davis 2015 265 25.1
Jarrod Dyson 2016 269 -0.1
Jason Bourgeois 2015 164 -0.2
Justin Morneau 2015 143 -1.4
Billy Burns 2016 279 -1.7
Jonathan Herrera 2015 107 -4.5
SOURCE: Baseball Savant/Statcast. Players with min 100 BBE in a season

It’s good confirmation to see Ryan Schimpf’s name on this list, though it’s interesting that his attack angle isn’t the extreme outlier that his GB/FB ratio and LA are. An analysis of attack angle may also finally give us an answer to why Brian Dozier’s home runs have gone missing this season. His 2017 batting line is almost identical to that of 2016, except his ISO (and HRs) have plummeted. The biggest difference is that his attack angle has skyrocketed from 20° to 29°. We know that the optimal LA for hitting home runs is about 24°, so he’s probably getting too much loft on his fly balls this year. All of these guys at the top of the list would probably benefit from flattening out their swings a bit. Interestingly, Joey Gallo, everyone’s other favorite extreme fly-ball hitter, has an attack angle right at 24° this year. He has built the perfect swing for his batted-ball profile, which explains why he is among the league leaders in HR/FB ratio.

This turned out to be an extremely lengthy primer on swing mechanics, but there are plenty of questions that can be tackled with estimates of swing metrics. For instance, can we use swing speed and attack angle to predict future exit velocities and launch angles? How much do hitters reduce their swing speeds on two-strike counts? How do attack angles change with pitch location? But, alas, those questions will have to be answered at a later time.

A complete list of swing speeds and attack angles for players with at least 100 BBE is available here.


A Metric for Home-Plate Umpire Consistency

When calling balls and strikes, consistency matters. As long as an umpire always calls borderline pitches the same way within a game, players seem to accept variations from the rule book strike zone. While there have been many excellent analyses of umpire accuracy, these studies tend to focus on conformity to a fixed zone, rather than on the dependability of those calls.

Disgruntled fans can turn to Brooks Baseball’s strike zone plots when they feel an umpire has had a bad game against their team. For example, the following zone map seems egregiously bad:

Inconsistent Zone

The calls seem very capricious, especially on the outside (right) of the zone. Balls (in green) are found in the same locations as strikes (in red), and some called strikes landed much further outside than pitches that were called balls.

On the other hand, the zone map below appears fairly consistent:

Consistent Zone

One might quibble with a couple of the outside calls, but the called strikes, for the most part, are contained within a ring of balls. Notice also that pitches in the lower-inside corner were consistently called balls. While this umpire didn’t establish a perfectly rectangular zone, he did establish a consistent zone; neither pitcher got those calls on the inside corner, and hitters on both teams generally knew what to expect.

In this post, I will propose a metric for assessing the inconsistency of an umpire’s strike zone. This metric does not assess how well the umpire conformed to the rule-book zone or the consensus MLB zone. Rather, it uses some tools from computational geometry to compare the overall shape formed by called strikes with the shape formed by the called balls.

Data from MLB Advanced Media describes each pitch as an ordered pair (px, pz), representing the left/right and up/down positions of the ball as it crosses the front of the plate. This pitch-tracking data includes measurements of each batter’s stance, which can be used to normalize the up/down positions to account for batters of different heights. If we draw a scatterplot of these adjusted positions corresponding to called strikes during a given game, the outline of the points represents what we define as the umpire’s established strike zone.

Convex Hull

More precisely, the established strike zone is what mathematicians call the “convex hull” of these points. If you draw the points on a sheet of paper, the convex hull is what would remain if you trimmed the paper as much as possible, without removing any points, using only straight cuts that go all the way across the sheet.
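To make the definition concrete, here is a small sketch in Python (the article's own graphs were produced in R). It builds the convex hull with Andrew's monotone-chain algorithm and tests whether a pitch falls strictly inside it. The function names are mine, and counting pitches exactly on the boundary as outside is an assumption, not something the metric specifies.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, p):
    """True if p lies strictly inside a counter-clockwise convex polygon."""
    n = len(hull)
    return n >= 3 and all(
        cross(hull[i], hull[(i + 1) % n], p) > 0 for i in range(n)
    )

# Made-up (px, pz) locations for called strikes and called balls.
strikes = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
balls = [(1.5, 0.5), (3.0, 1.0)]
zone = convex_hull(strikes)
print(sum(inside_hull(zone, b) for b in balls))  # balls inside the strike hull
```

Running the same test on every called ball in a game gives the count of balls that landed inside the established strike zone.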

A similar construction describes the alpha hull of a set of points: replace the paper cutter with a hole punch that can only punch out circular holes of a given radius. Punch out as much of the paper as possible, without removing any of the points, and what remains is the alpha hull. Unlike the convex hull, the alpha hull can have empty regions in its interior. We can therefore define an umpire’s established ball zone as the alpha hull of points corresponding to called balls.

Alpha Hull

A consistently called game should have the property that the established ball zone lies entirely outside of the established strike zone. Any called strikes that fall within the established ball zone (and any balls inside the established strike zone) are inconsistent calls. Since it is reasonable to expect that a consistent umpire will establish different zones depending on the handedness of the batter, we calculate established zones separately for left- and right-handed batters, and then count the number of inconsistent calls from each side of the plate.

Over the course of a game, an umpire’s inconsistency index is the ratio of inconsistent calls to the total number of calls made. For example, the plots below show the established strike and ball zones for the game between the Reds and the Giants on May 12, 2017. Of the 239 calls made that day by the home-plate umpire, 14 balls fell within the established strike zone, while 5 called strikes landed in the established ball zone, resulting in an inconsistency index of (14+5)/239 ≈ 0.0795.

Alpha Hull

How do MLB umpires fare under this metric? Quite well, actually. Using data for the 2017 season (through September 10), the average inconsistency index for all games called was 0.0396. Moreover, of the 2112 games analyzed, there were 183 games where the home-plate umpire scored an inconsistency index of 0.0, meaning that no called strike landed in the established ball zone and no called ball landed in the established strike zone. The 15 most consistent umpires, based on their average inconsistency index over all games called in 2017, are given in the table below.

Rank  Umpire  Inconsistency index (lower is better)
1.  John Libka  0.0239
2.  Mike DiMuro  0.0253
3.  Nick Mahrley  0.0274
4.  Carlos Torres  0.0275
5.  Chris Segal  0.0275
6.  Chad Fairchild  0.0281
7.  Ben May  0.0281
8.  Travis Eggert  0.0292
9.  Dale Scott  0.0301
10.  Gabe Morales  0.0308
11.  Jim Wolf  0.0310
12.  Sean Barber  0.0310
13.  Eric Cooper  0.0312
14.  Manny Gonzalez  0.0313
15.  Brian Knight  0.0314

While the strike zones of these umpires may not robotically correspond to the rectangles we see on MLB broadcasts, the zones they do establish are remarkably consistent.


Graphs and computations in this article were produced in R, using the PitchRx and alphahull packages. Source code for producing these examples is available on GitHub.