
Progressive Pitch Projections

When examining a batter’s strike zone judgment, the analysis is typically based on where each pitch crossed the plane of the front of the strike zone. However, such analysis usually ignores the pitch’s trajectory on its way to the plate, which influences whether or not a batter chooses to swing. The aim of this research is to apply a simple model that projects a pitch to the plane of the front of the strike zone from progressively closer distances to home plate, and to track how the projected location changes as the pitch nears the plate. To quantify the quality of a pitch’s projection as it approaches home plate, we will use a model of the probability of a pitch being called a strike to assess its attractiveness to a batter. While the focus here will be the projections and the results derived from them, a discussion of the strike zone probability model is given after the main article.

To begin, we can start with a single pitch to explain the methodology. The pitch we will use was thrown by Yu Darvish to Brett Wallace on April 2nd of 2013 (seen in the GIF below, screen-captured from the MLB.tv archives). [Note: I started working on this quite a while ago, so the data is from 2013, but the methodology could be run for any pitcher or any year.]

 photo Darvish_Wallace_P.gif

The pitch is classified by PITCHf/x as a slider and results in a swinging strikeout for Wallace. The pitch ends up inside on Wallace and, based purely on its final location, does not look like a good pitch to swing at, two strikes or not. To analyze this pitch in the proposed manner of projecting it to the front of the plate at progressively closer distances, we start at 50 feet from the back of home plate (from which all distances will be measured). From a given distance, we remove the remaining PITCHf/x-defined movement (as is calculated, for example, for the pfx_x and pfx_z variables at 40 feet), creating a projection in which the x-coordinate continues at constant velocity and only gravity deviates the z-coordinate from constant velocity. This methodology is adapted from a 2013 article by Alan Nathan about Mariano Rivera’s cut fastball. At a given distance from the back of home plate, the pitch trajectory between 50 feet and that point is as determined by PITCHf/x, and the remaining trajectory to the front of home plate is extrapolated using the method just described.
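As a concrete sketch, the projection scheme described above might be implemented as follows. The dictionary keys stand in for the standard 9-parameter PITCHf/x fit (initial position, velocity, and constant acceleration at 50 feet); the key names and any sample values are illustrative assumptions, not Darvish’s actual data:

```python
import math

G = 32.174       # gravitational acceleration, ft/s^2
Y0 = 50.0        # distance at which the PITCHf/x fit is parameterized
Y_FRONT = 1.417  # front of home plate, feet from the back of the plate

def time_to_y(y, y_start, vy, ay):
    """Flight time from y_start down to y under the quadratic y-fit (vy < 0)."""
    disc = vy * vy - 2.0 * ay * (y_start - y)
    return (-vy - math.sqrt(disc)) / ay

def project_to_plate(p, y_cut):
    """Project a pitch to the front of the plate from distance y_cut.

    p holds the 9-parameter PITCHf/x fit at y0 = 50 ft (x0, z0, vx0, vy0,
    vz0, ax, ay, az; key names are assumptions). The pitch follows its
    full PITCHf/x trajectory down to y_cut; from there the movement is
    stripped: constant velocity in x, gravity alone acting in z.
    """
    t = time_to_y(y_cut, Y0, p["vy0"], p["ay"])
    # State (position and velocity) at y_cut from the full quadratic fit
    x = p["x0"] + p["vx0"] * t + 0.5 * p["ax"] * t * t
    z = p["z0"] + p["vz0"] * t + 0.5 * p["az"] * t * t
    vx = p["vx0"] + p["ax"] * t
    vy = p["vy0"] + p["ay"] * t
    vz = p["vz0"] + p["az"] * t
    # Remaining flight time to the plate (keeping the PITCHf/x y-deceleration)
    dt = time_to_y(Y_FRONT, y_cut, vy, p["ay"])
    return (x + vx * dt,                      # straight line in x
            z + vz * dt - 0.5 * G * dt * dt)  # gravity only in z
```

Sweeping y_cut from 50 feet down to 1.417 feet traces out how the projected plate location evolves, which is what the GIFs below animate.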

If we examine the above Darvish-Wallace pitch in this manner, the projection looks like this from the catcher’s perspective:

 photo Darvish_Wallace_XZ_250ms.gif

In the GIF, the counter at the top, in feet, represents the distance from which we are projecting. The black rectangular shape is the 50% called-strike contour, along which 50% of taken pitches passing through a point were called strikes; we will call its interior our “strike zone” (for a complete explanation of this strike zone, see the end of the article). Within the GIF, the blue circle is the outline of the pitch and the blue dot inside is the PITCHf/x location of the pitch at the front of the plate. The projection appears in red/green, where red represents a lower-than-50% chance of a called strike for the projection and green a 50% or higher chance. As one can see, the pitch initially projects as a strike and, as it comes closer to the plate, projects farther and farther inside to the left-handed hitter. If we track the probability of the projection being called a strike, with the x-axis being the projection distance, we obtain:

 photo Darvish_Wallace_Probability.jpeg

Based on this graph, the pitch crosses the 50% called-strike threshold at approximately 29.389 feet (seen as a node on the graph). With this consideration, and the fact that the batter is not able to judge the location of the pitch with PITCHf/x precision, it seems reasonable that Brett Wallace might swing at this pitch.

We can also examine this from two other angles, but first we will present the actual pitch from behind as another point of reference:

 photo DarvishWallace_C.gif

Now we will look at an angle close to this new perspective: an overhead view.

 photo Darvish_Wallace_XY_250ms.gif

The color palette here is the same as in the previous GIF (blue is the actual trajectory in this case, and red/green is as defined above), with the added line at the front of home plate indicating the 50% called-strike zone for the lefty batter. Note that since the scales of the two axes are not the same, the left-to-right behavior of the pitch appears exaggerated. The pitch projects as having a high probability of being called a strike early on; around 30 feet, it starts to project more as a ball.

From the side, the pitch has minimal movement in the vertical direction, so the projection appears not to move. However, the color-coding of the projected pitch trajectory shows the transition from the 50%-plus called-strike region to the below-50% region.

 photo Darvish_Wallace_YZ_250ms.gif

With this idea in mind, we can apply the projection to all pitches of a single type for a pitcher and see what information can be gleaned. We will break the results down both by pitch type, as identified by PITCHf/x, and by the handedness of the batter. We will perform this analysis on Yu Darvish’s 2013 PITCHf/x data and compare with all other right-handed pitchers from the same year.

To begin, we will examine Yu Darvish’s slider, which, according to the data, was his most frequently thrown pitch in 2013. Since we are dealing with a data set of over 1,000 sliders, we will first condense the information into a single graph and then look at the data more in depth. We will separate the pitches into four categories based on their final location at the front of the strike zone, strike (50%+ chance of being called a strike) or ball (less than 50%), and on whether the pitch was swung at or taken. We will take the average called-strike probability of the projections in each of these four categories and plot it versus distance to the plate for the projection.
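A sketch of this four-way grouping and averaging step (the per-pitch data layout here is an assumption for illustration):

```python
from collections import defaultdict
from statistics import mean

def average_curves(pitches, distances):
    """Mean projected called-strike probability per category per distance.

    Each pitch is assumed to look like
    {"swung": bool, "cs_prob": {distance: probability}}, where the
    probability at the smallest distance is the final plate location's.
    Categories: (swing/take) x (strike/ball at the plate).
    """
    groups = defaultdict(list)
    plate = min(distances)
    for p in pitches:
        zone = "strike" if p["cs_prob"][plate] >= 0.5 else "ball"
        action = "swing" if p["swung"] else "take"
        groups[(action, zone)].append(p)
    return {cat: [mean(p["cs_prob"][d] for p in ps) for d in distances]
            for cat, ps in groups.items()}
```

Each of the four resulting lists is one curve in the graphs that follow.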

For left-handed batters versus Darvish in 2013:

 photo Darvish_ST_BS_SL_LHB.jpeg

The color-coding is: green = swing/strike, red = take/strike, blue = swing/ball, orange = take/ball. Looking at just the pitches that are likely to be called strikes, the pitches swung at have a higher probability of being called strikes throughout their projections, peaking at 12.167 feet (0.928 average called-strike probability for the projections) for swings and at the front of home plate, 1.417 feet (0.91), for pitches taken. The swings at pitches in the strike zone end at a 0.924 average called-strike probability. Both curves for pitches outside the strike zone peak very early and remain relatively low in probability throughout the projection.

We can also group all swings together and all pitches taken together to get a two-curve representation.

 photo Darvish_ST_SL_LHB.jpeg

For sliders to lefties, the probability of a called strike is higher throughout the projection for swings compared to sliders taken. Similar to the previous graph, the swing curve peaks before the plate, at 20 feet with a 0.627 average called-strike probability and ends at 0.613, whereas the pitches taken peak at the front of the plate with a called-strike probability of 0.402.

To examine this in more detail, we can look at the location of the projections as the pitches move toward the plate, similar to the GIFs for the single pitch to Wallace. Using the same color scheme as the four-curve graph, we will plot each pitch’s projection.

 photo Darvish_Pitch_Proj_SL_LHB_250ms.gif

Of interest in this GIF is that most swings outside the zone (blue) are down and to the right from the catcher’s perspective. In particular, based on the projections, there appears to be a subset of pitches with a strong downward component of movement that are swung at below the strike zone, while most other pitches have more left-to-right movement. In addition, the pitches taken are largely on the outer half of the strike zone to lefties. To better illustrate the progressive contribution of movement to the pitches, we will divide the area around the strike zone into nine regions: the strike zone itself plus the eight regions around it (up-and-left of the zone, directly above the zone, up-and-right of the zone, directly left of the zone, and so on). In each of these nine regions, we will display the number of swings and the number of pitches taken, as well as the average direction the projections move as more of the actual trajectory is added in; in other words, the direction that the movement carries the pitch away from a straight-line-plus-gravity trajectory in the x- and z-coordinates.
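The nine-region bucketing can be sketched with a rectangular stand-in for the 50% called-strike contour (the bounds used below are illustrative, not the fitted contour):

```python
def region(x, z, zone):
    """Classify a plate location into the strike zone or one of the eight
    surrounding regions. zone = (left, right, bottom, top) in feet; a
    rectangle stands in for the 50% called-strike contour."""
    left, right, bottom, top = zone
    col = "left" if x < left else "right" if x > right else "mid"
    row = "below" if z < bottom else "above" if z > top else "mid"
    if col == "mid" and row == "mid":
        return "zone"
    if col == "mid":
        return row                 # directly above or below the zone
    if row == "mid":
        return col                 # directly left or right of the zone
    return f"{row}-{col}"          # a corner region, e.g. "above-left"
```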

 photo Darvish_Pitch_Proj_Gp_SL_LHB_250ms.gif

Note that the movement of the pitches is predominantly to the right, from the catcher’s perspective, with some contribution in the downward direction. In the strike zone, the pitches taken have an average location to the left of those swung at. This may be due to the movement bringing the pitches into the strike zone too late for the hitter to react. Computing the percentage of swings in each region produces the following table:

 

Darvish – Sliders vs. LHB (swing percentage by region; center cell = strike zone)
10      25       0
12.9    62.8    12.5
33.3    65.4    49.2

 

From the table, where the middle cell is the strike zone, we can see that the slider is most effective at inducing swings directly below the strike zone, a region with a higher swing percentage than the strike zone itself (note that some of these regions may contain small samples, but these can be distinguished in the above GIFs). Next is the strike zone, followed by the region directly down-and-right of it. Going back to the projections, pitches in the two aforementioned non-strike-zone regions start by projecting near the bottom of the strike zone and, as they move closer to the plate, project into these two regions.

Putting these observations in context, the movement on the sliders from Yu Darvish to lefties may allow him to get pitches taken on the outer half of the plate, which is generally in the opposite direction of the movement, and swings on pitches down and inside, in the general direction of the pitch movement. This would signify that movement has a noticeable effect on the perception of sliders to lefties. Also of note is that the pitches up and left of the strike zone have very few swings among them, and those that were swung at are close to the zone. Again using movement as the explanation, the pitches project far outside initially and, as they near the plate, project closer to the strike zone, but not enough to incite a swing from a batter.

We can further illustrate these effects on the pitches outside the zone by treating the direction of the movement at 40 feet, taken from the PITCHf/x pfx_x and pfx_z variables, as a characteristic movement vector and finding its angle with the vector from the nearest point on the strike zone boundary to the pitch’s final location. So if the movement sends the pitch perpendicularly away from the strike zone, the angle will be 0 degrees; if the movement is parallel to the strike zone, the angle will be 90 degrees; and if the movement carries the pitch perpendicularly toward the strike zone, the angle will be 180 degrees. As an illustrative example, consider the aforementioned pitch from Darvish to Wallace:

 photo SZ_MVMT_Angle.jpeg

In this case, the movement vector of the pitch (red dashed vector) points in nearly the same direction as the vector pointing perpendicularly out from the strike zone (blue vector). This means that the angle between the two is going to be small (here, it is 0.276 degrees). If the movement vector in this case were nearly vertical, lying along the right edge of the zone, the angle would be close to 90 degrees.
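The angle itself is just the standard dot-product angle between two 2D vectors; a minimal sketch:

```python
import math

def movement_angle(pfx, outward):
    """Angle in degrees between the movement vector (pfx_x, pfx_z) and the
    outward vector from the nearest strike-zone point to the pitch's final
    location: 0 = movement straight away from the zone, 90 = parallel to
    its edge, 180 = straight toward the zone."""
    dot = pfx[0] * outward[0] + pfx[1] * outward[1]
    norm = math.hypot(pfx[0], pfx[1]) * math.hypot(outward[0], outward[1])
    cosang = max(-1.0, min(1.0, dot / norm))  # clamp for float safety
    return math.degrees(math.acos(cosang))
```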

Taking the movement for all sliders thrown to lefties in 2013 by Darvish and finding the angle it makes relative to the vector perpendicular to the zone, we get the following hexplot:

 photo Darvish_Out_SL_LHB.jpeg

Summing up the hexplot in terms of a table:

 

Darvish – Sliders Outside the Zone v. LHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    31.8         0.779
Less than 90 degrees    67.9         0.691
All                     n/a          0.608

 

So 31.8% of the sliders thrown outside the strike zone to lefties had an angle of less than 45 degrees between the movement and the vector perpendicular to the strike zone, and the average distance of these pitches from the strike zone was 0.779 feet. Relaxing the restriction to less than 90 degrees, meaning that some component of the movement points away from the strike zone, 67.9% of pitches outside the zone met the criterion, with an average distance from the zone of 0.691 feet. Finally, over all pitches outside the zone, the average distance was 0.608 feet.

As a point of comparison, for all MLB RHP in 2013, the analogous plot and table are:

 

 photo MLB_Out_SL_RHP_LHB.jpeg

 

MLB RHP 2013 – Sliders Outside the Zone v. LHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    25.3         0.652
Less than 90 degrees    52.6         0.624
All                     n/a          0.606

 

Note that the range of possible angles is 0 to 180 degrees, with 25.3% lying in the 0-45 degree range and 52.6% in the 0-90 degree range. So based on this and examining the hexplot visually, the pitches are fairly uniformly distributed across the range of angles.

Comparing Darvish to other RHP in 2013, he threw his slider outside the zone more in the direction of its movement. In particular, for angles less than 45 degrees, he threw his slider an average of 1.5 inches farther outside than other MLB RHP. That disparity shrinks when restricting to less than 90 degrees and is virtually nil over all pitches outside.

While this observation on its own does not have much significance, we can look to see if this was an effective strategy by looking only at swings and seeing the effects.

 

 photo Darvish_Swing_Out_LHB.jpeg

 

Darvish – Sliders Swung At Outside the Zone v. LHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    39.9         0.59
Less than 90 degrees    83.2         0.526
All                     n/a          0.478

 

Examining both the hexplot and the table, Darvish induced most of his swings outside of the strike zone with pitches whose movement was at an angle of less than 90 degrees relative to the strike zone. Note that when a pitch is thrown outside the zone in the general direction of its movement (an angle of less than 90 degrees), it can still induce the batter to swing, while pitches not thrown in this general direction are only swung at when very close to the zone. In particular, the majority of pitches that reach the farthest outside the zone and still lead to swings lie in the range of 30 to 60 degrees. This is due to many of the swings outside the zone being below the strike zone, where the angle with the down-and-to-the-right movement will be in the neighborhood of 45 degrees.

For all MLB RHP in 2013, the hexplot for swings produces a similar result:

 photo MLB_Swing_Out_SL_RHP_LHB.jpeg

 

MLB RHP 2013 – Sliders Swung At Outside the Zone v. LHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    31.8         0.436
Less than 90 degrees    64.3         0.421
All                     n/a          0.405

 

From the hexplot, we can see that the majority of pitches swung at are at an angle of 90 degrees or less; 64.3% to be precise. For less than a 45-degree angle, the percentage is 31.8%. These are both up from the percentages from all pitches. As seen with the Darvish data, as the angle decreases, the average distance tends to increase.

Finally, for pitches not swung at outside the zone, we get a complementary result to the swing data:

 photo Darvish_Take_Out_SL_LHB.jpeg

 

Darvish – Sliders Taken Outside the Zone v. LHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    26.3         0.976
Less than 90 degrees    57.4         0.854
All                     n/a          0.696

 

Here, the percentages are lower than for swings and, while the largest distances again occur at small angles, there is a grouping of pitches taken at angles greater than 90 degrees that is virtually nonexistent for swings. So for Darvish, throwing sliders outside the strike zone at an angle greater than 90 degrees does not appear to be a fruitful strategy, unless it plays a larger role in the context of pitch sequencing. To sum up, it would appear that pitching in the general direction of movement outside the strike zone is a necessary but not sufficient condition for inducing swings from left-handed batters.

For MLB right-handed pitchers, this observation appears to hold as well:

 photo MLB_Take_Out_SL_RHP_LHB.jpeg

 

MLB RHP 2013 – Sliders Taken Outside the Zone v. LHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    22.1         0.809
Less than 90 degrees    46.7         0.765
All                     n/a          0.708

 

As with Darvish, the percentages drop when comparing pitches taken to pitches swung at. The hexplot also bears this out, with the largest concentration of pitches taken outside the strike zone having an angle between movement and the strike zone vector of greater than 90 degrees. These results match in general with what we have seen with Darvish, and based on the numbers, Yu Darvish is able to play this effect to his advantage, with a larger-than-MLB-average percentage of sliders outside the zone to lefties with an acute angle.

Next, we will perform a similar analysis on sliders to righties. This will allow for comparison between the effects of the slider on batters from both sides of the plate.

 photo Darvish_ST_BS_SL_RHB.jpeg

Once again, for pitches in the strike zone, the sliders swung at by righties have a higher probability of being called strikes than those taken. The peak for swings at strikes occurs at 18.333 feet (v. 12.167 feet for LHB) with a 0.945 called-strike probability, ending at 0.931; for taken strikes, the peak is at 13.667 feet (v. 1.417 feet for LHB) with a 0.892 probability, ending at 0.885.

 photo Darvish_ST_SL_RHB.jpeg

Examining just swings versus pitches taken, the peak projected probability for swings comes earlier than for lefties, at 26.25 feet, with a 0.672 probability, finishing at 0.629. It also peaks earlier for pitches taken, at 23.147 feet, with peak and ending probabilities of 0.454 and 0.442, respectively. Compared with the results for lefties, RHB both swing at and take sliders with a higher probability of being called strikes, but with an earlier peak probability.

Breaking it down again in terms of the individual pitches:

 photo Darvish_Pitch_Proj_SL_RHB_250ms.gif

The plot here looks similar to that of the lefties. However, the pitches taken in the strike zone (red) appear more evenly distributed. In addition, the swings outside the zone (blue) appear to be more down and to the right and less directly below the strike zone. To confirm these observations, we can again simplify the plot to arrows indicating the direction of movement in each region and the number of each type of pitch in each region.

 photo Darvish_Pitch_Proj_Gp_SL_RHB_250ms.gif

The table below gives the percentage of swings on pitches in each of the nine regions for Yu Darvish’s sliders to RHB:

Darvish – Sliders vs. RHB (swing percentage by region; center cell = strike zone)
 4.3    15      16.7
 0      54.3    26.7
38.9    42.1    46.3

To confirm the first observation, note that the red arrow (pitches taken) virtually overlaps with the green arrow (pitches swung at) in the strike zone. Examining the table, the value that differs the most, among the reasonably populated regions, is directly below the strike zone (42.1% to RHB v. 65.4% to LHB). One possible explanation for this is that some of the sliders ending up in this region to LHB have a stronger downward component of the movement than for RHB. This can be seen by comparing the two GIFs.

Moving on to the results for the angle between the movement and the strike zone vector, the hexplot is heavily populated by pitches thrown in the direction of movement:

 photo Darvish_Out_SL_RHB.jpeg

Considering the same metrics for interpreting this plot as before:

Darvish – Sliders Outside the Zone v. RHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    42.3         0.587
Less than 90 degrees    78.9         0.618
All                     n/a          0.572

From the table, we see that Yu Darvish threw 42.3% of his sliders outside the zone to RHB with an angle of less than 45 degrees between the strike zone vector and the movement vector, up from 31.8% to LHB. Nearly 79% of his sliders outside the zone were thrown at an angle of less than 90 degrees, again up from 67.9% to lefties. However, the average distance is down across the board compared to lefties.

As a point of comparison, for MLB righties to right-handed batters, the distribution looks similar to that of Darvish:

 photo MLB_Out_SL_RHP_RHB.jpeg

MLB RHP 2013 – Sliders Outside the Zone v. RHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    31.6         0.671
Less than 90 degrees    62.4         0.664
All                     n/a          0.673

Compared to Darvish, MLB RHP tend to throw a lower percentage of sliders at angles of less than 45 degrees and less than 90 degrees. However, the MLB average distance from the strike zone is greater across the board.

Now, isolating only swings:

 photo Darvish_Swing_Out_RHB.jpeg

Darvish – Sliders Swung At Outside the Zone v. RHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    46.8         0.513
Less than 90 degrees    86.2         0.558
All                     n/a          0.512

For RHB versus LHB, Darvish’s percentages are up, if only by a few percent. The average distance for less than 45 degrees is down from 0.59 feet to LHB but up in the other two cases. This can be seen in the hexplot since the protrusion in the distribution is around 60 degrees rather than being closer to 45 degrees as before.

The 2013 MLB data shows a similar result, with a roughly triangular pattern in the hexplot, where the distance from the strike zone for swings increases as the angle between the strike zone vector and movement vector decreases.

 photo MLB_Swing_Out_SL_RHP_RHB.jpeg

MLB RHP 2013 – Sliders Swung At Outside the Zone v. RHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    32.3         0.437
Less than 90 degrees    64.8         0.427
All                     n/a          0.417

As in the case of lefties, all metrics for Darvish are above the MLB averages.

For the sliders taken by right-handed batters:

 photo Darvish_Take_Out_SL_RHB.jpeg

Darvish – Sliders Taken Outside the Zone v. RHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    39.8         0.634
Less than 90 degrees    74.9         0.656
All                     n/a          0.605

For angles less than 45 degrees, the percentage of sliders taken outside is noticeably up, as compared with LHB (39.8% v. 26.3%) as well as for less than 90 degrees (74.9% v. 57.4%). This is not surprising since the distribution for all pitches was markedly different between batters on either side of the plate and, in this case, skewed toward the less-than-90-degrees region. The average distances are, however, down from the case for lefties.

Comparing Darvish to other RHP in 2013, the results are similar:

 photo MLB_Take_Out_SL_RHP_RHB.jpeg

MLB RHP 2013 – Sliders Taken Outside the Zone v. RHB
Angle                   Percentage   Avg. Distance (ft)
Less than 45 degrees    31.3         0.781
Less than 90 degrees    61.3         0.777
All                     n/a          0.788

In contrast to MLB RHP, Darvish’s sliders that are taken outside the strike zone are closer to it across the three measures. As before, Darvish’s sliders taken are thrown more in the direction of movement as compared to MLB righties in 2013.

Discussion

When constructing this algorithm, we need to choose a metric by which to group the pitches at each increment. Here, we are using distance from the back of home plate. While this may be suitable for analyzing a single pitcher, when dealing with multiple pitchers, or when flipping the algorithm around to evaluate a hitter, the variance in pitch velocity between pitchers may affect the results. Therefore, when working with multiple pitchers or with a hitter, it may be better to use time as the metric instead: rather than tracking the projections at y feet from home plate, we would use t seconds from home plate.
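Converting a projection distance into a time-from-plate requires only the y-components of the quadratic fit; a sketch, assuming the usual PITCHf/x parameterization at 50 feet:

```python
import math

def seconds_from_plate(y, vy0, ay, y0=50.0, y_front=1.417):
    """Remaining flight time when the ball is y feet from the back of the
    plate, under the PITCHf/x quadratic y-fit (vy0 < 0 and ay, given at y0)."""
    def t_at(target):
        disc = vy0 * vy0 - 2.0 * ay * (y0 - target)
        return (-vy0 - math.sqrt(disc)) / ay
    return t_at(y_front) - t_at(y)
```

A slower pitch at the same distance has more time remaining, so grouping by time rather than distance puts pitchers with different velocities on a common footing.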

Using this method, with further refinement, we could potentially try to measure quantities such as “late break”. Granted, the PITCHf/x data is restricted by its parameterization to quadratic functions, so even if aberrant behavior occurred near the plate, PITCHf/x would not be able to represent it. However, if we define late break as x inches of movement over the final y feet to home plate (or the final t seconds), we could hope to quantify it. Depending on how we construct the projection, such as by including factors other than the PITCHf/x definition of movement, late break could be considered the difference between the perceived position at a distance and the location at the front of the plate. As seen in the swing/take curves, after a certain distance, the probability of a called strike starts to drop off for Darvish’s sliders, and we could choose, from that point on, to calculate late break for each pitcher. But to do this, we would first have to settle on all the elements, including movement, that make up pitch perception. As we have seen, for both Darvish and MLB RHP in general, throwing sliders outside of the strike zone in the general direction of movement (with less than a 90-degree angle between the movement vector and the vector perpendicular to the strike zone) elicits swings at a higher rate farther outside the strike zone. In the hexplots for swings, this takes the form of a roughly triangular shape in the data, which widens in the distance direction as the angle decreases. This can also be seen in the GIFs for the blue pitches (swings outside of the strike zone).

In addition, other elements could be added into this medley when attempting to model a hitter’s perception of a pitch as it approaches the plate. First, one could remove drag from the movement, leaving it in the projection. Without running the projections, we can see how this would affect the results by looking at how the “movement” differs at 40 feet with and without drag. Pictured below is a subsample of the movement vectors at 40 feet for Darvish’s sliders based on the PITCHf/x definition, in green, and the movement without drag, in blue. The blue vectors are found based on Alan Nathan’s paper on the subject. The dashed red lines connect the two versions of movement for the same pitch. We can see that the movement without drag is larger in magnitude and points more downward and to the right, meaning the projections would start higher and to the left. Comparing the movement vectors with and without drag, the average change in movement for the entire sample is 1.571 inches and the average change in angle between the pairs of vectors is 5.527 degrees. With drag left in the projection and removed from the movement, the swing hexplots would likely take a more triangular shape, with the angle between the vectors decreasing and the data shifting downward for the pitches outside the zone that were previously moving more laterally.

 photo Darvish_Slider_Movement.jpeg

One could also adjust the time to the plate for the pitches. As it stands, this approach assumes that hitters have perfect timing and track pitches using a simple extrapolation. If one were instead to assume that the remaining velocity in the y-direction (toward the plate) is perceived as constant, hitters would expect the pitches to arrive sooner than they actually do. This would lead to the projections appearing higher, since gravity would have less time to act.
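For a feel for the size of this effect, a back-of-the-envelope sketch (the velocity and deceleration values in any example run would be illustrative, not measured):

```python
import math

G = 32.174  # gravity, ft/s^2

def perceived_rise(vy_now, ay, y_remaining, g=G):
    """Height difference if the batter assumes the current y-speed stays
    constant: the perceived flight time is shorter than the true
    (decelerating) flight time, so gravity drops the perceived pitch less.
    vy_now < 0 (toward the plate)."""
    t_const = y_remaining / -vy_now                      # perceived time
    disc = vy_now * vy_now - 2.0 * ay * y_remaining
    t_true = (-vy_now - math.sqrt(disc)) / ay            # actual time
    return 0.5 * g * (t_true ** 2 - t_const ** 2)
```

For a pitch around 130 ft/s with a typical deceleration, this works out to roughly a couple of inches of apparent rise over the full flight.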

A rather large assumption that we are making is that batters can decouple vertical movement from gravity. Even in cases where the vertical movement is small, this will have an effect on the projected pitch location. This may also serve as an explanation as to why the sliders swung at below the strike zone do not always have a strong vertical component of movement.

Next time, we will look at Darvish’s four-seam fastballs, followed by his cut fastballs, in a similar manner. As we will see, certain pitches excel at inducing swings outside the strike zone when thrown in the general direction of movement while others show little to no benefit at all. We can also break down the pitches swung at by the result (in play, foul, swing-and-miss) to gain further insight.

Strike Zone Analysis

This section explains the calculation and the choice of model for the called-strike probability used in the above analysis. There have been many excellent articles analyzing the strike zone, by Matthew Carruth, Bill Petti, and Jon Roegele, among others, and this method is derivative of those previous works. Our goal is to create an explicit piecewise function that reasonably models the probability of a pitch being called a strike, based on empirical data. However, rather than treat the data as zero-dimensional (no height, width, or length for each datum), we represent each pitch as a two-dimensional circle with a three-inch diameter. Then, over a sufficiently refined grid, we divide the number of 2D pitches intersecting each grid point that were called strikes by the number of 2D pitches intersecting that point that were taken (ball or strike). This gives an empirical estimate of the probability that a pitch passing through each point is called a strike. The advantage of this approach is that we do not impose any a priori structure on the data, which can happen with methods such as binning or fitting a model to the zero-D data. It also conforms with using a 2D strike zone for the analysis by representing the data fully in 2D. Note that since we are using all MLB data from 2013 to generate these plots, the data set is large enough that we avoid the jumps and discontinuities in the strike zone that may occur for smaller data sets, such as for a single pitcher. As an example, the called-strike probability for LHB in 2013 looks like:

 photo SZ_Heat_LHB-1.jpeg

The colormap on the right gives the probability of a pitch at each location being called a strike, based on the data. The solid rectangle represents the textbook strike zone (with vertical bounds at 1.5 and 3.5 feet), and the two dashed lines will be explained along with the model.
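A direct, if unoptimized, sketch of the empirical grid computation described above (the input layout is an assumption):

```python
import math

BALL_RADIUS = 0.125  # a three-inch-diameter baseball, in feet

def called_strike_grid(taken, xs, zs):
    """Empirical called-strike probability over a grid.

    taken: list of (x, z, called_strike) tuples for pitches taken, with
    the PITCHf/x plate location at the center of a 2D disc. A grid
    point's probability is (# strike discs covering it) / (# discs
    covering it); points no disc covers get None.
    """
    grid = {}
    for gx in xs:
        for gz in zs:
            hits = [cs for (x, z, cs) in taken
                    if math.hypot(x - gx, z - gz) <= BALL_RADIUS]
            grid[(gx, gz)] = sum(hits) / len(hits) if hits else None
    return grid
```

In practice one would vectorize this, but the brute-force version makes the disc-overlap counting explicit.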

For the model, we assume a small region where the probability of a called strike is essentially 1, shown in the graph as the long-dashed line. Far outside the strike zone, we will assume that the probability of a called strike is essentially zero. In between, we need a way to model the transition between these two regions. To do this, we adopt a general exponential decay model of the form exp(-a x^b), where a and b are parameters and x is the minimum distance to the probability-1 region of the strike zone (the long-dashed line). Since there is some flexibility in how we choose the probability-1 region and the subsequent parameters, we will do this less rigorously than could be done, in order to keep things simple.

First, we examined slices of the empirical data in profile and found, by experimenting with the probability-1 region bounds and the a and b values, that a value around 4 for b matched the curvature well. A value of 4 for a was then found similarly via guess-and-check. Finally, the probability-1 region was adjusted to make the model match the data, based on a contour plot for each side (see below). For lefties, the probability-1 region is [-0.55,0.25] x [2.15,2.85] feet.
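Putting the pieces together, the model with the values above (b = 4, a = 4, and the LHB probability-1 rectangle) can be sketched as:

```python
import math

def called_strike_prob(x, z, region, a=4.0, b=4.0):
    """Modeled called-strike probability: exp(-a * d**b), where d is the
    minimum distance from (x, z) to the probability-1 rectangle
    region = (left, right, bottom, top), in feet. Inside the rectangle,
    d = 0 and the probability is exactly 1."""
    left, right, bottom, top = region
    dx = max(left - x, 0.0, x - right)   # horizontal distance to rectangle
    dz = max(bottom - z, 0.0, z - top)   # vertical distance to rectangle
    d = math.hypot(dx, dz)
    return math.exp(-a * d ** b)
```

With a = b = 4, the 50% contour sits about 0.65 feet outside the probability-1 rectangle, since exp(-4 d^4) = 0.5 at d = (ln 2 / 4)^(1/4) ≈ 0.645.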

 photo SZ_Contour_LHB.jpeg

Note that we do a decent job of matching the contours outside of the lower-right and upper-left regions, where there is some deviation. This can be adjusted for by changing the shape of the probability-1 area, but this increases the complexity of calculating the minimum distance. When plotting the model for the probability:

 photo SZ_Heat_LHB_Approx.jpeg

Here, the solid and long-dashed lines are as before, and the dotted line is the 50% called-strike contour from the model, which is used as the boundary of the strike zone in the above analysis. While the shape of the strike zone may seem unconventional, it is a natural approach for handling the zero-dimensional PITCHf/x data. For example, if we place a pitch on the edge of the rectangular textbook zone, a so-called borderline pitch, and track the path that the center would make as it moved around the rectangle, it would trace out a similar shape.

 photo SZAnimation.gif

For RHB, the heat map is much more balanced left to right, making the fit much closer than could be achieved for LHB.

 photo SZ_Heat_RHB.jpeg

Again, the top and bottom of the 50% called-strike contour lie near 3.5 and 1.5 feet, respectively. Examining the contour map:

Here, the identified contours fit well all around. The called-strike probability, with the model applied, is:

 photo SZ_Heat_RHB_Approx.jpeg

In this case the probability-1 region is [-0.43,0.40] x [2.15,2.83] feet.

So, overall, the RHB called-strike probability model fits much better, especially in the corners, than the LHB model. In order to properly fit the called-strike probability to such a model, one would first need a component of the algorithm that adjusts the probability-1 area, by location and size, and possibly by shape. The parameters for the decay of the strike probability could then be fit against the data, the probability-1 area adjusted and fit again, and so on, to see if the overall fit improves. This might work similarly to a simulated annealing process. However, for our purposes, sacrificing the corners for LHB seems a reasonable trade to keep the method and calculations simple.

In closing, if you made it this far, thank you for reading to the end.


The Baseball Fan’s Guide to Baby Naming

I’ve often wondered whether some sort of bizarre connection exists between names and athletic ability, specifically when it comes to the sport of baseball. Having grown up in the 90’s, I will always associate certain names with supreme baseball talent. Names like Ken (Griffey Jr.), Mike (Piazza), Randy (Johnson), Greg (Maddux) and Frank (Thomas) are just a few examples. With a wealth of statistical information available, I thought I’d investigate the possibility of an abnormal association between names and baseball skill.

I began by digging up the most popular given names, by decade, using the 1970’s, 80’s & 90’s as focal points. This information was easily accessible on the official website of the U.S. Social Security Administration, which provides the 200 most popular given names for male and female babies born during each decade. After scouring through all of the names listed, I found 278 unique names appearing during that timespan.

Having narrowed down the most popular names for the timeframe, I wandered over to FanGraphs.com to begin compiling the “skill” data. I used the statistic known as WAR (Wins Above Replacement) as my objective guide for evaluating talent. Sorting through all qualified players from 1970-1999 revealed 2,554 players eligible for inclusion. After combining all full names with their corresponding nicknames (i.e., Michael & Mike), the list was condensed down to 507 unique names.

By comparing the 278 unique names identified via the Social Security Administration’s most popular names data, with the 507 qualified ballplayer names collected through FanGraphs, it was discovered that 193 of the names were present on both lists. The following tables point out some of the more intriguing findings the research was able to provide.

The first table [Table 1], below, comprises the 25 most frequent birth names from 1970-1999. The second table [Table 2] consists of the 25 WAR leaders by name, meaning the highest aggregate WAR totals collected by all players sharing that name. Naturally, many of the names from the 25-most-common list reappear here as well; Ken, Gary, Ron, Greg, Frank, Don, Chuck, George and Pete are the newcomers. It’s interesting that these names tend to have a higher AVG WAR per 1,000 births (as seen in the final table), perhaps indicative of their supremacy as baseball names. The last table [Table 3] contains the top 25 names by AVG WAR per 1,000 births; here, some less common names finally begin to appear. These names provide the most proverbial bang (WAR) for your buck (name). Yes, some names, like Barry and Reggie, are inflated in the rankings, probably by the dominant play of Barry Bonds and Reggie Jackson, but could it not also mean these players were just byproducts of their birth names? Probably not, but it’s interesting nonetheless.
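The metric behind the tables is straightforward: aggregate WAR for a name divided by total births, scaled to per-1,000. As a quick check, this reproduces the Michael/Mike figure from Table 1 (note that the tables show WAR rounded to whole numbers, so some rows may differ slightly in the last decimal places):

```python
def war_per_1000_births(total_war, total_births):
    """Aggregate WAR for a name per 1,000 babies given that name."""
    return total_war / total_births * 1000

# Michael/Mike from Table 1: 1,138 WAR across 2,203,167 births
rate = war_per_1000_births(1138, 2203167)  # ~0.5165, matching the table
```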

So if you’re looking to increase the chances your child will make it professionally as a baseball player, then you might want to take a look at the names toward the top of the AVG WAR per 1,000 births table, choose your favorite, and hope for the best…OR, you could always just have a daughter.

Please post comments with your thoughts or questions. Charts can be found below.

25 Most Common Birth Names 1970-1999

Rank  Name                     Total Births  Total WAR  WAR per 1,000 Births
 1    Michael/Mike                2,203,167      1,138              0.516529
 2    Christopher/Chris           1,555,705        184              0.11821
 3    John                        1,374,102        799              0.581252
 4    James/Jim                   1,319,849        678              0.513316
 5    David/Dave                  1,275,295        859              0.673491
 6    Robert/Rob/Bob              1,244,602        873              0.70175
 7    Jason                       1,217,737         77              0.062904
 8    Joseph/Joe                  1,074,683        616              0.573006
 9    Matthew/Matt                1,033,326         95              0.091646
10    William/Will/Bill             967,204        838              0.866415
11    Steve(Steven/Stephen)         916,304        535              0.583649
12    Daniel/Dane                   912,098        233              0.255674
13    Brian                         879,592        154              0.174967
14    Anthony/Tony                  765,460        314              0.409819
15    Jeffrey/Jeff                  693,934        298              0.430012
16    Richard/Rich/Rick/Dick        683,124        888              1.29991
17    Joshua                        677,224          0              0
18    Eric                          627,323        122              0.194637
19    Kevin                         613,357        305              0.497426
20    Thomas/Tom                    583,811        505              0.86552
21    Andrew/Andy                   566,653        184              0.325243
22    Ryan                          558,252         17              0.030094
23    Jon/Jonathan                  540,500         61              0.112118
24    Timothy/Tim                   535,434        253              0.473074
25    Mark                          518,108        397              0.765477

25 Highest Cumulative WAR, by Name, 1970-1999

Rank  Name                     Total Births  Total WAR  WAR per 1,000 Births
 1    Michael/Mike                2,203,167      1,138              0.516529
 2    Richard/Rich/Rick/Dick        683,124        888              1.29991
 3    Robert/Rob/Bob              1,244,602        873              0.70175
 4    David/Dave                  1,275,295        859              0.673491
 5    William/Will/Bill             967,204        838              0.866415
 6    John                        1,374,102        799              0.581252
 7    James/Jim                   1,319,849        678              0.513316
 8    Joseph/Joe                  1,074,683        616              0.573006
 9    Steve(Steven/Stephen)         916,304        535              0.583649
10    Thomas/Tom                    583,811        505              0.86552
11    Kenneth/Ken                   312,170        439              1.405644
12    Mark                          518,108        397              0.765477
13    Gary                          176,811        353              1.998179
14    Ronald/Ron                    246,721        342              1.38456
15    Anthony/Tony                  765,460        314              0.409819
16    Kevin                         613,357        305              0.497426
17    Gregory/Greg                  324,880        303              0.931729
18    Jeffrey/Jeff                  693,934        298              0.430012
19    Donald                        215,772        298              1.380161
20    Frank                         176,720        298              1.687415
21    Charles/Chuck                 458,032        262              0.571357
22    Timothy/Tim                   535,434        253              0.473074
23    Lawrence                      220,557        248              1.126239
24    George                        226,108        246              1.090187
25    Peter                         181,358        246              1.357536

25 Highest WAR per 1,000 Births, by Name, 1970-1999

Rank  Name                     Total Births  Total WAR  WAR per 1,000 Births
 1    Barry                          34,534        175              5.079053
 2    Leonard                        31,626        123              3.895529
 3    Omar                           13,656         53              3.873755
 4    Fernando                       13,180         47              3.543247
 5    Theodore/Ted                   27,144         93              3.444592
 6    Jack                           53,079        176              3.323348
 7    Reginald/Reggie                47,883        157              3.283002
 8    Frederick/Fred                 54,529        146              2.681142
 9    Bruce                          56,609        141              2.487237
10    Calvin                         43,412        107              2.453239
11    Gary                          176,811        353              1.998179
12    Roger                          77,458        151              1.948153
13    Glenn                          33,794         65              1.929337
14    Darrell                        53,317        102              1.920588
15    Frank                         176,720        298              1.687415
16    Dennis                        131,577        218              1.653024
17    Jerry                         122,465        201              1.638019
18    Dale                           36,162         54              1.48775
19    Lee                            62,922         89              1.406503
20    Kenneth/Ken                   312,170        439              1.405644
21    Louis/Lou                     142,969        200              1.400304
22    Ronald/Ron                    246,721        342              1.38456
23    Roy                            59,004         82              1.382957
24    Donald                        215,772        298              1.380161
25    Jay                            63,795         87              1.368446


Is Velocity More Important Than We Think?

There is a reason that one of the first things scouts look for in pitching prospects is velocity. Higher velocity leads to a higher whiff rate, which leads to more strikeouts; it goes without saying that striking out batters is a good starting point to becoming a successful pitcher. While there are many other essential components to pitching, high velocity is always a plus. But does high velocity have other benefits besides improved whiff rate?

In my research I compared batted ball distance to velocity, using only batted balls classified as fly balls or popups. I used intervals of 1 mph between endpoints of 82 and 100 mph. Only pitches classified as four-seam fastballs, two-seam fastballs, cutters, and sinkers were used. My source was Baseball Savant, using the complete sample of its applicable PITCHf/x data from 2008-2014.
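The binning described above can be sketched as follows. The sample rows here are invented for illustration; the real input would be the Baseball Savant export, with each batted ball reduced to a (pitch velocity in mph, hit distance in ft) pair.

```python
from collections import defaultdict

def mean_distance_by_bin(rows, lo=82, hi=100):
    """Average batted-ball distance per 1 mph velocity bin.

    rows: iterable of (velocity_mph, distance_ft) pairs.
    Pitches at or above `hi` mph share one bin, as do pitches below `lo`.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for velo, dist in rows:
        if velo >= hi:
            key = f"{hi}+"
        elif velo < lo:
            key = f"0-{lo}"
        else:
            b = int(velo)
            key = f"{b}-{b + 1}"
        sums[key][0] += dist
        sums[key][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

# invented sample rows
sample = [(95.2, 238.0), (95.8, 241.0), (88.4, 246.0), (101.0, 230.0)]
table = mean_distance_by_bin(sample)
```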

Velocity (mph) Batted Ball Distance (ft.)
100+ 229.78
99-100 229.53
98-99 234.92
97-98 235.97
96-97 236.23
95-96 239.78
94-95 240.14
93-94 240.47
92-93 240.60
91-92 242.90
90-91 244.02
89-90 244.80
88-89 245.65
87-88 243.76
86-87 244.21
85-86 244.36
84-85 242.59
83-84 245.45
82-83 244.28
0-82 239.06

Velocity vs. Batted Ball Distance

On an individual level, there will always be large discrepancies due to sample size, but when we pool all the data we have, a clear trend appears: higher velocity generally leads to shorter batted ball distances on fly balls/popups. Below the 88 mph threshold, it is unclear whether lower velocity makes a difference in batted ball distance, as the distances plateau and even drop significantly in the sub-82 mph sample. But above 88 mph the trend is very clear: higher velocity leads to shorter batted ball distance.

Now that we see this trend, I have two theories as to why this might happen. The higher velocity could lead to a horizontal exit angle directed more towards the opposite field, where hitters have less power. Or the higher velocity could be harder to square up, leading to more weak contact and popups. Perhaps it is a combination of both.


BABIPf/x: A Predictive Pitch-Based Model


Jonathan Luman, September 2014

In recent years the strongest predictors of a pitcher’s future performance have been fielding-independent peripherals: homeruns, strikeouts, and walks. This is largely because of the difficulty of predicting the rate at which balls in play (BIP) (i.e., all other plate appearance outcomes) fall for hits (i.e., batting average on balls in play [BABIP]). A major problem with using BABIP statistics is isolating a pitcher’s “true talent” level, due in large part to the relatively low number of balls in play. A typical qualified season sees 550 or so BIP, which allows about a 0.030 uncertainty[1], well within the pitcher-to-pitcher talent variation.
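The 0.030 figure can be checked with a normal-approximation binomial interval (the author's exact interval construction isn't specified, but the approximation lands in the same place): at a league-ish .300 BABIP over 550 balls in play, the 90% half-width is roughly ±0.032.

```python
from math import sqrt

def binomial_ci_halfwidth(p, n, z=1.645):
    """Half-width of a normal-approximation binomial confidence interval.

    z = 1.645 corresponds to a 90% two-sided interval.
    """
    return z * sqrt(p * (1 - p) / n)

# ~550 BIP at a .300 BABIP: roughly +/- 0.03, in line with footnote [1]
half = binomial_ci_halfwidth(0.300, 550)
```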

It has long been known that batted ball types fall for hits at disparate rates (ground balls falling for hits more often than fly balls, and line drives far more often than either). Naturally, BABIP predictors have traditionally relied on these data. These data are a categorization of BIP results and, due to sample size limitations, are subject to significant year-to-year variation. They can nonetheless be applied innovatively to improve their utility (Max Weinstein recently claimed a predictive correlation of 0.37; Redefining Batted Balls to Predict BABIP, Hardball Times, Feb 2014).

An estimation of a pitcher’s BABIP can be made by categorizing pitches thrown with PITCHf/x data and comparing to league wide BABIP on similar pitches, shown conceptually in the MLB gameday screen grab in Figure 1.


Figure 1 MLB Gameday screen grab. Expected BABIP of each pitch differs based on pitch location, movement, velocity and other parameters [2].

Problem statement

Using pitcher-only data (i.e., not considering batted ball results) a model for predicted BABIP (BABIPf/x) is developed with the ability to predict a pitcher’s next season and long-term BABIPs.

Overview of Approach: BABIP thru League Averaged Pitch Categories

Conceptually, batted ball results are a function of the dynamics of contact. While there are limitless trajectories a pitch can fly toward the plate, there are, practically, a finite set of “ways” a ball can be thrown: a handful of pitch classes; 12 different counts; and bins of speed, location, movement, etc. The seven million or so batted balls for which we have PITCHf/x data (2008-2013) have been binned into categories of statistically relevant size (several thousand batted balls per category, 76 categories altogether) so that BABIP for a pitch category can be calculated with high precision[3]. A pitcher’s expected BABIP can then be modeled from the frequency with which his pitches match the league-wide pitch categories. Modeled BABIP then takes the form:


Where:

P%j: Pitches categorized into major categories per PITCHf/x auto-classification, a pitcher specific parameter.

Fastballs: FA, SI, FF, FS, FT, SF (two seam, four seam, sinker, split finger, others)

Changeup: CH

Slider/Cutter: SL, FC

Curveball: CU, KC (curve and knuckle curve)

fi: The fraction of pitches thrown by a pitcher matching a particular category, a pitcher specific parameter.

BABIPi: Batting average on balls in play for pitch category i, calculated league wide.

gi: The ball in play rate for the pitch category, calculated league wide.

C’: A correlation for the frequency a pitcher works in favorable (or unfavorable) counts.

cm: BABIP coefficient of each pitch count, derived similarly to BABIPf/x categories. Coefficients are the result of a net-count regression to handle low sample size counts, calculated league wide

fm: same as fi, a pitcher specific parameter.

BABIP (overbar): average actual BABIP, calculated league wide.

p0’: A regression on the release point similarity based on most frequently used pitches.

The abstraction of PITCHf/x auto-classification is more a convenience than a requirement. Because pitches will ultimately be binned together based on zone position, movement, and similar parameters, failures of PITCHf/x auto-classification are of small consequence. The auto-classification facilitated the establishment of pitch categories with different BABIP tendencies.

Neither the C’ nor the p0’ correction is fundamental to the process. Long-term BABIPf/x results are shown using the C’ correction; next-year BABIPf/x calculations exclude this term. p0’ has been preliminarily defined but not yet implemented.

This BABIP model based on pitch-category rates has several advantages. Pitch mix (and pitch-category mix) stabilizes quickly, so the BABIPf/x predictions stabilize with small pitch sample sizes and are independent of defense and opponents. This enables BABIP predictions earlier than was previously possible. Also, PITCHf/x data are independent of batted ball results; the two sources could be combined for an integrated BABIP model of greater accuracy[4].
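The category-weighting idea at the core of the model can be sketched as follows. This is a simplified reading that omits the C’ and p0’ corrections, and the category values below are invented: for each category i, the pitcher's modeled BABIP weights the league BABIP of that category by how often his pitches fall in it (f_i) and how often pitches in it become balls in play (g_i).

```python
def modeled_babip(categories):
    """Ball-in-play-weighted league BABIP over a pitcher's pitch categories.

    categories: list of (f_i, g_i, babip_i) where
      f_i     = fraction of the pitcher's pitches in category i,
      g_i     = league-wide ball-in-play rate for category i,
      babip_i = league-wide BABIP for category i.
    """
    weights = [f * g for f, g, _ in categories]
    total = sum(weights)
    return sum(w * b for w, (_, _, b) in zip(weights, categories)) / total

# invented two-category example: mostly low fastballs (higher BABIP),
# some elevated fastballs (lower BABIP)
cats = [(0.7, 0.18, 0.310), (0.3, 0.15, 0.270)]
```

With these numbers the model lands near .299, between the two category BABIPs and closer to the more frequent one, as expected of a weighted average.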

Fastball BABIPf/x Category Definition

To provide insight into the pitch categories, a discussion of the 30 fastball (FA, FF, FT, SI, FS, SF)[5] categories is provided. The process to develop the changeup, cutter/slider, and curveball BABIPf/x components was similar.

Figure 2 shows a histogram of the vertical pitch location (pz) of all fastballs put into play, normalized by the total number of fastballs and the bin size (so that the histogram integrates to 1.0). The brown line is a normal distribution with the same mean and standard deviation as the observed pz measurements. The close match demonstrates that vertical pitch location is normally distributed and centered on the strike zone.


Figure 2: Vertical pitch locations of all fastballs in play 2008-2013

BABIP can be computed for several groupings of vertical pitch location based on their position in the distribution, as shown conceptually in Figure 3. Pitches in the lower quarter of the distribution have a higher BABIP than pitches in the upper quarter[6].


Figure 3: Vertical pitch location divided into uneven tertiles

Figure 4 shows the BABIP of the uneven tertiles, with error bars depicting the 90% binomial confidence intervals. Not unexpectedly, pitches near the top of the strike zone fall for hits less frequently than pitches near the bottom of the strike zone. Recall that pitches down in the zone more frequently result in ground balls, which are associated with a relatively higher BABIP. The lack of overlap between the confidence intervals is a strong indication that a reliable effect is being demonstrated. Care should be taken to point out that this reduced BABIP does not necessarily mean that elevated pitches are preferable (for the pitcher) to low pitches. Elevated pitches may result in more homeruns and/or called pitches (i.e., called strikes and balls), which are excluded from BIP sets.


Figure 4: BABIP of fastballs including 90% confidence intervals for lower, mid, and upper tertiles of vertical pitch location

It was found that fastball BABIPf/x categories can be defined on six parameters in the PITCHf/x database: pz, pfx_z, px, count, start_speed, and the relative match between pitcher and batter handedness.[7] PITCHf/x parameters were ranked based on BABIP sensitivity, probing for key bilinear sensitivities. Categories were defined where measurable differences in BABIP were identified.

The fastball pitch categories comprising BABIPf/x are shown in Table 1. For continuously variable parameters the numerical values are percentiles on a normal cumulative distribution function. For example, a pz category of 0-0.75 indicates a pitch below 2.86 ft (the red and green regions of the PDF shown in Figure 3).
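The percentile-to-feet conversion above is just the inverse CDF of the fitted normal distribution of vertical pitch location. The mean and standard deviation below (2.5 ft and 0.53 ft) are assumed for illustration, not quoted from the article; with them, the 0.75 percentile bound lands near the 2.86 ft figure in the text.

```python
from statistics import NormalDist

# Fitted normal distribution of vertical pitch location (pz).
# mu and sigma here are illustrative assumptions.
pz_dist = NormalDist(mu=2.5, sigma=0.53)

# Upper bound, in feet, of a category defined as percentiles 0-0.75
edge_ft = pz_dist.inv_cdf(0.75)
```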

Table 1: Fastball pitch categories of BABIPf/x

Improvements in model effectiveness could be achieved by splitting categories with large populations further[8]. Non-elevated pitches with modest vertical break are broken down by insideness/outsideness for pitches off the plate (categories 1-4) or by pitch count for pitches over the plate (categories 5-16). The BABIP categories that include pitch count were the result of a regression accounting for relative pitcher or batter advantage (R2 = 0.72); the confidence interval size is an approximation. Pitch velocity becomes a significant factor for pitches breaking down out of the strike zone (17-19, 28-30). Counterintuitive, at least to the author, is that for these pitches increased velocity is correlated with increased BABIP. Categories 20 and 22 reflect pitches at the left and right extremes of the BIP zone; there is no statistical significance to the difference in BABIP between these categories. Fastballs with the lowest BABIP tended to be elevated pitches with significant downward break (23-27). Fastballs with the highest BABIP tended to be low pitches with modest vertical break thrown in hitter-friendly counts. Figure 5 shows the bins sorted by BABIP, with 90% binomial confidence intervals depicted by error bars.


Figure 5: BABIP and confidence intervals of fastballs BABIPf/x categories

Figure 6 is a graphical representation of the fastball pitch categories of BABIPf/x. The vertical axis is the vertical pitch location percentile, based on the fastball mean and standard deviation; the horizontal axis is the vertical movement percentile, similarly based. The “strike zone” covers almost all of the vertical axis; very few balls are put into play that are not within the vertical limits of the rulebook strike zone[9]. The larger regions were split into subcategories based on the BABIP parameters with the highest sensitivity. For example, for pitches high in the strike zone but with low vertical movement, horizontal location tends to drive BABIP (categories 20-22), while for pitches low in the strike zone with high vertical movement, pitch velocity tends to drive BABIP (categories 17-19). As few categories as possible were defined while maintaining approximately a 0.015 variation between adjacent regions, to preserve sample size and small confidence intervals.


Figure 6 Graphical depiction of fastball pitch categories of BABIPf/x

Long-Term Model Results

BABIPf/x was evaluated against the ball-in-play results for the 200 pitchers having thrown the most pitches in the 2008-2013 seasons. Table 2 shows these pitchers’ actual BABIP, BABIPf/x, and statistical-significance-test p-values, sorted by most pitches thrown. No pitcher threw fewer than 6000 pitches. The top 20 pitchers (by number of pitches thrown) have the same average p-value as the bottom 20, suggesting that 6000 pitches is sufficient for model stabilization; a smaller threshold is likely demonstrable. The null hypothesis is that the BIP results are consistent with the modeled BABIP, and it is rejected for low p-values. A crude model evaluation suggests that the model is “wrong” for p-values less than 0.05.

A more precise statement is that there is greater likelihood that a pitcher’s “true-talent BABIP” differs from the model for lower p-values. p-values computed from a league-average baseline can be compared to the BABIPf/x p-values for model evaluation. For the pitchers who differ substantially from league average, BABIPf/x results in about 2% greater accuracy; see Figure 7.

Table 2: Actual BABIP and BABIPf/x with binomial p-tests for 200 top pitchers by number of pitches thrown 2008-2013


Figure 7: p-values for BABIPf/x and BABIPleague average

Example: Comparison of Model to career-to-date

This model has been developed to reflect a pitcher’s “true talent” BABIP performance. “True talent” level can only be established over large BIP samples. For relatively infrequent events, like balls in play, this takes a long time, often many seasons. A pitcher throws many more pitches than balls are put into play, so a model based on pitch observations ought to converge more quickly than observed BABIP[10]. We can test this hypothesis anecdotally by looking at an example pitcher[11]. It is desirable for our example pitcher to have:

  • Thrown many pitches—to establish reliable “true talent” performance
  • Begun his career during the PITCHf/x era—so his career-to-date performance is contained in the database.
  • A modestly above or below average BABIP—so that the trivial solution (i.e., league average) can be rejected.
  • Had some significant year-to-year BABIP variation—to test the predictive nature of the model.
  • Had a BABIPf/x p-value between 0.2 and 0.6—that is, a fair, but not great match against “true talent” so as to not “cherry pick” favorable results.

Justin Masterson meets all these requirements, so he’ll serve as our illustrative example. Justin’s 2008-2013 career is broken down into 2-month segments, three per season. His career-to-date BABIP is the sum of all hits over balls in play from the beginning of 2008 until “now”, where “now” is varied parametrically. Stated another way, his 2008 career-to-date BABIP includes only his 2008 season, and his 2010 career-to-date includes all balls in play from his 2008, 2009, and 2010 seasons. Career-to-date BABIP is plotted in red in Figure 8. Justin’s 2008 BABIP was a very low 0.243, suppressed by his amazing debut months, in which his BABIP was a mere 0.143. Not surprisingly, his career BABIP has risen and has more-or-less stabilized at slightly higher than average (0.301 at the end of 2013). Figure 8 also contains each two-month BABIPf/x prediction for Justin in green; these are not career-to-date predictions, but each is based on only 2 months of pitching. Each prediction is a fair reflection of Justin’s long-term “true talent” level. 2014 was a “disappointing BABIP year” for Justin, 0.346 as of this writing (1 September 2014), raising his career-to-date BABIP to 0.306.


Figure 8 Justin Masterson’s Career-to-date BABIP compared with his two-month BABIPf/x predictions

This anecdote doesn’t prove much, but it does suggest that the BABIPf/x model might have predictive ability for future performance. Evaluating “true talent” level from small samples is powerful in its own right, and can be inferred from the long-term modeling results. Predicting next year’s performance is valuable for other purposes and is a natural use case.

Predictive Model Results

Predicting future performance is a challenging use for any model. In addition to model error from uncertain sources, predictive modeling is complicated by the measurement uncertainty in the future value. This is especially true of BABIP modeling, which has large year-to-year variation. Predictive BABIP modeling has no ability to anticipate changes in a pitcher’s approach, either intentional (e.g., pitch mix) or unintentional (e.g., injury).

Predictive modeling baseline

For the years 2008-2013, sequential 6-month BABIPs[12] have been tested for statistical significance. The sequential 6-month BABIP (year 2) is tested against the preceding 6-month BABIP (year 1); the binomial p-values[13] for year-to-year BABIP variation are shown in Figure 9. This serves as a baseline against which to compare the BABIPf/x p-values. The predictive period is regressed toward league-mean BABIP in an attempt to increase the predictive value.
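A binomial significance test of this kind can be sketched with a normal approximation (the author's exact binomial test may differ slightly in the tails): given the observed hits and balls in play in the sequential period, how surprising are they if the true rate equals the predictive period's BABIP?

```python
from math import sqrt, erfc

def two_sided_p(hits, bip, p0):
    """Normal-approximation two-sided binomial p-value.

    Tests whether `hits` in `bip` balls in play is consistent with an
    expected BABIP of p0.
    """
    se = sqrt(p0 * (1 - p0) / bip)
    z = (hits / bip - p0) / se
    return erfc(abs(z) / sqrt(2))

# 350 BIP (the qualification threshold in footnote [13]) at an observed
# .340 BABIP, tested against a .300 prediction
p = two_sided_p(hits=119, bip=350, p0=0.300)
```

At this sample size even a 40-point BABIP miss yields p around 0.10, illustrating why the year-to-year baseline is so noisy.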


Figure 9 p-values testing statistical significance

The predictive value of raw BABIP is very low: 15.9% of p-values were lower than 0.05, a strong presumption against the null hypothesis (i.e., the sequential sample was not consistent with the mean of the predictive sample), and a further 8.3% of p-values were less than 0.1, a low presumption against the null hypothesis (a total of more than 24% with a presumption that the sequential BABIP is not consistent with the preceding BABIP). These fractions did not improve greatly when the predictive sample was regressed to the league average (also demonstrated in Figure 10; 20% with p-values less than 0.1). This is because the measurement uncertainty in future-year BABIP is a major uncertainty contributor. To combat this, the sequential sample was also regressed to league average, and improved p-values resulted[14]; see Figure 10. The corollary is that league-average BABIP is more predictive of future BABIP than is previous-year BABIP.


Figure 10 p-values testing statistical significance of year-to-year BABIP

Predictive Modeling using BABIPf/x

p-values are recomputed comparing the sequential sample against the BABIPf/x prediction from the prior 6-month period, both with and without a Bayesian regression of ball-in-play results in the predictive sample. Figure 11 shows the BABIPf/x p-value distributions overlaid on the baseline year-to-year BABIP significance distributions (from Figure 10), with less than 2% of comparisons having a strong presumption against the null hypothesis (and less than 4% of p-values below 0.1). In general, at any significance level greater than 0.1, 10% fewer pitcher seasons have a presumption against the null hypothesis. That is, the BABIPf/x values are consistently more predictive than previous-year BABIP results. This is a similar level of predictability to xBABIP (Zimmerman, 2014) or pBABIP (Weinstein, 2014).

Utilizing the actual BABIP in the predictive sample did not significantly improve the predictive capability (i.e., the Bayesian inference). A Bayesian regression over a longer period would provide greater utility; however, over long enough samples the career-to-date sample becomes the dominant term. The major drawback of career-to-date as the dominant term is the inability to identify changes in the pitcher’s “true talent” level. A Bayesian regression utilizing batted ball data is expected to improve results considerably, as the data sources are independent.


Figure 11 BABIPf/x p-values compared to year-to-year BABIP p-values

Conclusion

BABIPf/x correlates well with long-term BABIP, better than league-average results do. BABIPf/x is more predictive of next-year BABIP than is the previous year’s BABIP. Because batted ball results (GB, LD, and FB rates) are a data source independent of the PITCHf/x categories (i.e., location, movement, etc.), these sources could be combined to form a multi-source predictive BABIP model of better quality than either source alone. Additional work could be done to improve the count and release-location corrections to BABIPf/x, as well as to refine the BABIPf/x categories.

Bibliography

Weinstein, M. (2014, February 17). Redefining Batted Balls to Predict BABIP. Retrieved August 30, 2014, from The Hardball Times: http://www.hardballtimes.com/redefining-batted-balls-to-predict-babip/

Zimmerman, J. (2014, July 25). Updated xBABIP Values. Retrieved August 30, 2014, from Fangraphs: http://www.fangraphs.com/fantasy/updated-xbabip-values/


[1] 90% binomial confidence interval

[2] Expected BABIPs from Table 1. Pitch 1 and 2 match category 17. Pitch 3 matches category 26. Pitch 4 matches category 23. Pitch 5 matches category 7.

[3] Binomial uncertainty is a function only of mean and number of observations.

[4] Multiple techniques exist for this sort of integration. Two data sources can result in accuracies better than either data source separately.

[5] There are some indications that Sinkers and Splitters need to be broken out separately.

[6] The regions shown are not equally sized; the middle region contains half of the area.

[7] Derivative fields were considered; it was found that the native PITCHf/x fields were entirely suitable.

[8] One of the current shortcomings is the lack of categories with low BABIP. Splitting categories with excess sample size will provide greater diversity and dynamic range of model results.

[9] The BABIPf/x model accounts for pitchers who frequently pitch above or below the strike zone with the gi term (the league wide rate that pitches in a category are put into play).

[10] Observed BABIP may never actually “converge”, as a pitcher’s pitch selection or ability may evolve more rapidly than an adequate sample size to precisely compute his BABIP can accrue.

[11] Predictive capability will be tested more thoroughly in the next section.

[12] Three two-month samples per season, to get more “seasons”. For example, a “season” might be August 2009-July 2010, spanning the off-season.

[13] To qualify for a p-test, both current and sequential 6-month periods had to have 350 balls in play, 2/3 of a qualified season.

[14] Naturally. League average successes and failures are being added to both populations.


Why Is Brandon Finnegan So Unique?

On September 30, Royals 2014 1st Round Draft Pick Brandon Finnegan was brought into the AL Wild Card Game against the Oakland A’s, just under 4 months after being drafted out of Texas Christian University. Manager Ned Yost had little choice but to take a leap of faith with the rookie Finnegan, having already used pitchers like Kelvin Herrera, Wade Davis, and Greg Holland. Finnegan pitched very well, allowing 2 baserunners in 2.1 innings and striking out 3 Oakland batters. He was removed with a runner on base and was charged with a run when the runner scored, but otherwise had a great outing.

I found it ironic and puzzling that the only team to utilize this approach of drafting a college pitcher, rushing him up the farm system, and giving him a shot at the postseason was the team that already had the likes of Herrera, Davis and Holland. After all, it seems like every playoff team could use some help out of the bullpen. When compared to other positions, predicting a relief pitcher’s success in the big leagues really doesn’t seem too hard either.

In 2014, 12 relievers pitched more than 60 innings with an FIP under 2.50. Aside from the sinker-oriented Steve Cishek and Pat Neshek, all of them averaged at least 92.5 mph on their fastballs. Everyone except Cishek generated swinging strikes at least 11% of the time, almost 2% more than the 9.4% league average. Simply put, pitchers with high velocity are safe bets when it comes to building a bullpen.

I can understand why a team might be stingy with its first-round draft pick. The first rounder is supposed to be the future of the franchise, the one who fans envision 25 years older, making his Hall of Fame induction speech. But looking at the 93 2nd round draft picks from 2006-2008 (an arbitrary time period which I felt gave players sufficient time to reach the big leagues), it is clear that players selected this late in the draft are no sure thing.

Forty-eight picks have yet to make their major-league debut, and another 21 have career WARs of 0 or less*. There are exceptions like Giancarlo Stanton, Jordan Zimmermann and Freddie Freeman, but the data look even worse after the 15th pick of the second round. Of the 48 picks in the 16-32 slots, only eight players have career WARs greater than 0*; 29 have yet to make their MLB debut.

Since 2011, 10 relievers have posted FIPs under 2.50 with at least 100 innings pitched. Of those drafted in the American amateur draft, only Sean Doolittle was picked before the 3rd round, and he was drafted in the first round as a first baseman. While overpaying for an elite reliever can be appealing for teams like the Angels or Tigers, both in win-now mode, a possible fallback option is taking a chance on the best reliever available in the draft with the second-round pick. Chances are, that pitcher will still be on the board.

Of course, there are relievers who throw hard but still do not succeed at the big-league level. Also, stats like average fastball velocity and swinging-strike rates might not be available for college players; the former is virtually impossible to get without PITCHf/x. If this is the case, GMs can consider reverting to the eye test to determine how hard a pitcher throws and what his command and movement look like. Generally accepted measures of command such as K-BB% can be derived from box scores.

For traditional fans who still value the human element of baseball, there are ways to gauge an NCAA pitcher’s ability to pitch in the spotlight. Stats like opposing batting average with runners on base and inherited runners stranded can be determined by simply looking at play-by-play recaps. Both measure a pitcher’s ability to perform under pressure, even if only in a limited sample size. I do not know what kinds of information are given to baseball operations teams, but I would be surprised if a college pitcher’s WPA in high-leverage situations was available.

If I were Tigers GM Dave Dombrowski or Angels GM Jerry Dipoto circa July, I would make the trade for Joakim Soria or Huston Street without hesitation. Both teams, one could argue, were a bullpen arm away from being World Series favorites. But for teams that don’t have the resources Detroit and Los Angeles have, or don’t want to give up too many prospects, the best midseason bullpen pickup might not have even thrown his first professional pitch yet.

*I had to use rWAR, not fWAR in the interest of time. Baseball Reference has the draft results with career WAR readily available. Of course, data not from FanGraphs was taken from baseball-reference.com.


How Well Did the FanGraphs Playoff Odds Work?

One of the more fan-accessible advanced stats is playoff odds [technically postseason probabilities]. Playoff odds range from 0% to 100%, telling the fan the probability that a certain team will reach the MLB postseason. They are determined by a Monte Carlo simulation, which runs the baseball season thousands of times [10,000 times, specifically, for FanGraphs]. If a team reaches the postseason in 5,000 of those simulated seasons, the team is given a 50% probability of making the postseason. FanGraphs runs these every day, so playoff odds can be collected daily and, graphed over time, tell the story of a team’s season.
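The Monte Carlo idea above can be sketched in a few lines. This is a toy illustration, not the FanGraphs model: it assumes a fixed per-game win probability and a hypothetical 90-win playoff cutoff, whereas the real simulation accounts for opponents, schedules, and divisional races.

```python
import random

def playoff_odds(win_prob, games=162, cutoff=90, sims=5000, seed=42):
    """Estimate how often a team with a fixed per-game win probability
    reaches a (hypothetical) win cutoff across many simulated seasons."""
    rng = random.Random(seed)
    made_it = 0
    for _ in range(sims):
        wins = sum(rng.random() < win_prob for _ in range(games))
        if wins >= cutoff:
            made_it += 1
    return made_it / sims

# A .570 team clears a 90-win bar far more often than a .500 team.
print(playoff_odds(0.570), playoff_odds(0.500))
```

Running the full simulation daily, with updated team strengths, is what turns these one-off probabilities into a season-long storyline.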

2014 Playoff Probability Season

Above is a composite graph of three different types of teams. The Dodgers were identified as a good team early in the season, and their playoff odds stayed high because of consistently good play. The Brewers started their season off strong but had two steep drop-offs, in early July and early September. Even though the Brewers had more wins than the Dodgers, the FanGraphs playoff odds never valued the Brewers more than the Dodgers. The Royals started slow and had a strong finish to secure their first postseason berth since 1985. All these seasons are different, and their stories are captured by the graph. Generally, this is how fans will remember their team’s season — by the storyline.

Since the playoff odds change every day and become either 100% or 0% by the end of the season, the projections need to be compared to the actual results at the end of the season. A playoff probability of 85% means that, 85% of the time, teams with the given parameters will make the postseason.

I gathered the entire 2014 season of playoff odds from FanGraphs and put the predictions into buckets covering 10% increments of playoff probability. If the model is well calibrated, 20% of the predictions in the 20% bucket should correspond to teams that go on to the postseason, and the same logic applies to the 0%, 10%, 30%, and all other buckets.
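The bucketing step can be sketched as follows. The data here are made-up (prediction, made-postseason) pairs, not the actual 2014 odds:

```python
def calibration_buckets(predictions):
    """Group (predicted_probability, made_postseason) pairs into 10%
    buckets and return each bucket's observed postseason rate."""
    buckets = {}
    for prob, made in predictions:
        b = min(int(prob * 10) * 10, 90)   # bucket labels 0, 10, ..., 90
        hits, total = buckets.get(b, (0, 0))
        buckets[b] = (hits + made, total + 1)
    return {b: hits / total for b, (hits, total) in sorted(buckets.items())}

# Toy data: predictions near 20% should convert about 20% of the time.
data = [(0.22, 1), (0.25, 0), (0.21, 0), (0.24, 0), (0.23, 0),
        (0.85, 1), (0.88, 1), (0.82, 0)]
print(calibration_buckets(data))
```

Comparing each bucket's label to its observed rate is exactly the chart described next.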

Fangraphs Playoff Evaluation

Above is a chart comparing the buckets to the actual results. Since this uses only one year of data and only 10 teams made the playoffs, the results don’t quite match up to the buckets. The general pattern is encouraging, but I would insist on looking at multiple years before drawing any real conclusions. The results for any given year are subject to the ‘stories’ of the 30 teams that played that season. For example, the 2014 season did not have a team like the 2011 Red Sox, who failed to make the postseason after having a >95% playoff probability. This is colloquially considered an epic ‘collapse’, but a 95% probability prediction not only implies there’s a chance the team might fail, it PREDICTS that 5% of such teams will fail. So there would be nothing wrong with the playoff odds model if ‘collapses’ like the Red Sox’s happened only once in a while.

The playoff probability model relies on an expected winning percentage. Unlike a binary variable like making the postseason, a winning percentage is more continuous, which makes the evaluation of the model easier. For the most part, teams stay near the initially predicted winning percentage, coming really close to the prediction by the end of the season. Not every prediction is correct, but if there are enough good predictions, the predictive model is useful.

Teams also aren’t static: they can get worse by trading away players at the trade deadline or improve by acquiring those good players who were traded. There are also factors, like injuries or player improvement, that the prediction system can’t account for because they are unpredictable by definition. The following line graph allows you to pick a team and check how it did relative to the predicted winning percentage. Some teams are spot-on, like the Pirates, but a few, like the Orioles, are really far off.

Pirates Expected Win Percentage

Orioles Expected Win Percentage

The residual distribution [the actual values minus the predicted values] should be a normal distribution centered around 0 wins. The following graph shows the residual distribution in number of wins; the teams in the middle had actual results close to the predicted values, while the values on the edges of the distribution are more extreme deviations. You would expect improved teams to balance out the teams that got worse. However, the graph is skewed toward the teams that became much worse, implying some mechanism that makes bad teams lose even more often. This is where attitude, trades, and changes in strategy would come into play. I would go so far as to say this is evidence that a team’s soft skills, like chemistry, can break down.
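A minimal way to check the center and skew of those residuals, using population formulas and invented win totals:

```python
def residual_summary(actual_wins, predicted_wins):
    """Return the mean residual and a simple skewness estimate for
    actual-minus-predicted win totals."""
    residuals = [a - p for a, p in zip(actual_wins, predicted_wins)]
    n = len(residuals)
    mean = sum(residuals) / n
    sd = (sum((r - mean) ** 2 for r in residuals) / n) ** 0.5
    skew = sum(((r - mean) / sd) ** 3 for r in residuals) / n
    return mean, skew

# Toy data: one badly collapsed team drags the left tail down.
actual    = [88, 90, 79, 95, 62, 84]
predicted = [85, 88, 82, 93, 80, 83]
mean, skew = residual_summary(actual, predicted)
print(round(mean, 2), round(skew, 2))
```

A negative skew here corresponds to the left-leaning tail described in the text: a few teams underperform their projections by far more than any team overperforms.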

Difference Between Wins and Predicted Wins

Since I don’t have access to more years of FanGraphs projections or other projection systems, I can’t do a full evaluation of the team projections. More years of playoff odds should yield probability buckets that reflect the expectation much better than a single year. This would allow for more than 10 different paths to the postseason to be present in the data. In the absence of this, I would say the playoff odds and predicted win expectancy are on the right track and are good predictors of how a team will perform.


Evaluating the Eno Sarris Pitcher Analysis Method

For regular listeners of the Sleeper and the Bust podcast, I do not need to tell you what the Eno Sarris Pitcher Analysis Method is (let’s drop the Eno and keep the Sarris so we can call it SPAM). For those who aren’t familiar, you can see it at work in this article and this one over here. Basically, it is based on the idea that a pitcher can be evaluated by comparing his performance in several key metrics against league averages. We are primarily looking at swinging-strike rates and groundball rates by pitch type.

I wanted to see how well this method works, so I grabbed my handy Excel toolkit and pulled down lots of pitching data. Unfortunately, pitch-type PITCHf/x data is not on the FanGraphs leaderboard (come on, Appelman!), so I headed on over to Baseball Prospectus to use their PITCHf/x leaderboards. I pulled the GB/BIP, swing%, whiff/swing, and velocity data for all starters that threw at least 50 of each pitch type in a given season. Is 50 pitches an arbitrary cut-off? Yes, yes it is.

I included four-seam fastballs, two-seam fastballs, cut fastballs, curves, sliders, changeups, and splitfingers. I used all the data that was available, which goes back to 2007. And, because I am impatient and couldn’t wait until the 2014 season was in the books, I didn’t include data from the last two weeks of this season. I calculated the swinging-strike % by multiplying the swing % and the whiff/swing values together. After this, I pulled the K%, ERA, and WHIP data from the FanGraphs leaderboards. In all, I analyzed 1,851 pitcher-seasons.
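The swinging-strike calculation is just the product of the two Baseball Prospectus rates; a trivial sketch with illustrative numbers:

```python
def swinging_strike_rate(swing_pct, whiff_per_swing):
    """Whiffs per pitch = share of pitches swung at x whiffs per swing."""
    return swing_pct * whiff_per_swing

# A slider swung at 48% of the time with a 30% whiff-per-swing rate
# produces swinging strikes on 14.4% of pitches.
print(round(swinging_strike_rate(0.48, 0.30), 3))
```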

Note: the swinging strike rates I calculated do differ from those on the player pages at FanGraphs. I’m not sure why there is a discrepancy since they are both based on PITCHf/x data, but there is one. Therefore, I did not use the FanGraphs pitch-type benchmarks in this analysis.

I pulled K%, ERA, and WHIP because I wanted to use these as proxies for pitching outcomes (i.e. my dependent variables). I amended SPAM to include four-seam velocity, because we all know how much of an effect velocity has on run prevention.

Here’s how I did this. I first calculated the league averages for each metric for each season to account for the pitching environment of that season. The table below shows the league average values for each of the metrics for each season.

FF FT FC CU SL CH FS
Year SwStr% SwStr% SwStr% SwStr% SwStr% SwStr% SwStr%
2007 6.1% 4.6% 10.0% 10.2% 13.4% 13.1% 13.9%
2008 5.9% 4.5% 9.7% 9.7% 14.2% 13.0% 14.1%
2009 6.1% 4.7% 9.7% 10.1% 14.1% 12.5% 15.2%
2010 6.0% 4.8% 9.8% 9.5% 14.1% 13.5% 14.5%
2011 6.3% 4.5% 9.1% 9.9% 14.9% 12.8% 14.7%
2012 6.6% 5.0% 10.3% 10.9% 15.6% 13.1% 15.5%
2013 6.7% 5.1% 9.3% 10.5% 15.0% 13.8% 17.2%
2014 6.6% 5.1% 9.8% 10.7% 15.5% 14.3% 17.4%

 

FF FT FC CU SL CH FS FF
Year GB% GB% GB% GB% GB% GB% GB% Velocity BB%
2007 33.8% 49.8% 44.8% 47.2% 42.9% 48.1% 52.9% 91.06 8.92%
2008 33.2% 49.9% 44.1% 48.7% 44.1% 46.8% 52.0% 90.87 9.17%
2009 33.1% 48.9% 42.9% 50.6% 43.7% 47.2% 53.5% 91.17 9.13%
2010 35.6% 48.9% 43.9% 50.1% 44.0% 47.6% 52.9% 91.22 8.61%
2011 33.8% 49.9% 45.2% 48.9% 45.8% 47.3% 54.7% 91.57 8.23%
2012 34.0% 50.9% 43.8% 52.2% 43.9% 48.6% 53.2% 91.76 8.36%
2013 34.6% 51.4% 45.0% 50.2% 45.8% 47.4% 54.6% 92.02 8.33%
2014 35.8% 50.6% 46.1% 49.9% 45.3% 50.3% 52.7% 92.24 7.84%

 

I then gave each pitcher one point for each metric that was above the league average. For example, King Felix this year gets above-average whiffs on five pitches, gets above-average grounders on four pitches, and has above-average four-seam velocity, so he gets ten points. I then computed the SPAM score for each pitcher in each season by summing the points across the individual metrics.
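The scoring step can be sketched like so. The metric names and league-average values below are illustrative stand-ins for the per-pitch columns described above, not the exact spreadsheet layout:

```python
def spam_score(pitcher, league_avg):
    """One point per metric above the league average for that season."""
    return sum(1 for metric, value in pitcher.items()
               if metric in league_avg and value > league_avg[metric])

# Hypothetical league averages and one hypothetical pitcher-season.
league_avg = {"FF_swstr": 0.066, "SL_swstr": 0.155, "FF_gb": 0.358,
              "SL_gb": 0.453, "FF_velo": 92.24}
pitcher = {"FF_swstr": 0.081, "SL_swstr": 0.210, "FF_gb": 0.330,
           "SL_gb": 0.470, "FF_velo": 94.1}
print(spam_score(pitcher, league_avg))   # 4: above average on all but FF_gb
```

The same comparison repeated across every qualifying pitch type, season by season, yields the SPAM scores binned in the tables below.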

Here is a table of some randomly-selected pitcher-seasons to give you an idea of the types of SPAM scores I found. This table shows you that there are certainly outliers, guys with good results and bad scores or vice-versa.

Player Year Score ERA WHIP
Felix Hernandez 2014 10 2.07 0.91
Zach McAllister 2014 6 5.51 1.49
Yu Darvish 2012 11 3.90 1.28
Bronson Arroyo 2011 2 5.07 1.37
Drew Pomeranz 2011 1 5.40 1.31
Johan Santana 2008 7 2.53 1.15
Zack Greinke 2008 8 3.47 1.28
Edinson Volquez 2010 9 4.31 1.50

Before we dive into the results, a caveat: I am not a statistician, but I am an engineer, so maybe I’m not completely off the hook. I am looking at these results from a high level and a simple perspective. Maybe I can build off these results and look for deeper connections in the future. First, let’s just look at some averages.

SPAM without BB%
Averages in Each SPAM Bin
SPAM Score ERA WHIP K% # of Pitcher Seasons
0 7.27 1.77 13.3% 44
1 5.92 1.62 14.1% 120
2 5.64 1.56 15.4% 218
3 5.05 1.49 15.9% 298
4 4.72 1.42 16.9% 297
5 4.52 1.40 17.2% 293
6 4.14 1.34 18.7% 226
7 4.02 1.31 19.7% 182
8 3.79 1.30 19.9% 110
9 3.60 1.27 20.9% 38
10 3.39 1.20 22.1% 17
11 3.42 1.21 23.0% 7
12 3.45 1.12 26.8% 1

The above table shows the average K%, ERA, WHIP for each SPAM score, along with the number of pitcher-seasons that earned that score.

Finally, onto the scatter plots! First up, we have the K% vs. SPAM score graph. We expect this one to have a strong positive correlation, since whiff rates and velocity normally correspond to strikeouts (ground balls, not so much). I used a simple linear regression, since it seemed to be the best fit and the easiest to understand.

Here is the WHIP vs. SPAM score graph.

Here is the ERA vs. SPAM score graph.

Obviously, none of these show strong R² values, but the table of averages above and these graphs do show a clear trend: higher scores mostly lead to lower ERAs and WHIPs, and higher K%.

None of the above accounts for control directly, so I thought I would try adding BB% as another metric to the SPAM score. I computed the league average walk rate for each season and handed out the points. The addition of BB% changed the values, but didn’t really impact the trends. Below is the averages table for the SPAM scores with BB%. Below that, you will find the three graphs again. The linear trend lines are a little better fit now, but nothing earth-shattering.

SPAM with BB%
Averages in Each SPAM Bin
SPAM Score ERA WHIP K% # of Pitcher Seasons
0 7.70 1.88 13.1% 27
1 6.38 1.73 13.7% 73
2 6.02 1.65 15.0% 160
3 5.24 1.52 15.7% 254
4 4.89 1.45 16.2% 287
5 4.69 1.42 17.2% 289
6 4.34 1.37 17.4% 270
7 4.03 1.31 19.1% 197
8 3.90 1.29 20.0% 160
9 3.71 1.27 20.1% 80
10 3.44 1.22 21.0% 34
11 3.41 1.20 22.5% 14
12 3.36 1.20 21.5% 5
13 3.45 1.12 26.8% 1



So, what does all this tell us? Well, it seems that Eno’s SPAM method does a pretty good job of identifying pitchers who will be successful, which makes it useful for spotting breakout pitchers. The beauty of this method is that it does not require a lot of data. Per-pitch metrics stabilize faster than per-plate-appearance ones, so we can start to evaluate pitchers after only a start or two instead of waiting for the 170 PA required for BB% or the 70 PA for K%. I plan on digging deeper into this data over the offseason to see if I can pull any more insights from it. Please let me know in the comments if you think of something worth investigating further. Eno, if you are reading this, I hope I gave your method the treatment it deserves. And, as I do in all of my online ramblings, I will end with Tschüs!


Another Look at Momentum in October

It’s coming to that time of the year when baseball fans hunker down into their deep-seated trenches of pro-momentum and anti-momentum factions regarding playoff baseball. Dave Cameron wrote, just the other day, about how teams that do better in the second half don’t do any better come playoff time, and just about every Baseball Prospectus article these days will mention that narratives can be written either way after the fact; before the fact, we simply don’t know whether the hot team will “stay hot” or the struggling team will manage “to right the ship” come playoff time.

Since this post is showing up on FanGraphs, the readers will likely surmise (correctly) that I have historically sided with the anti-momentum crowd. However, an interesting thing happened the other day.

I was trying to make my case to a friend about how hot teams don’t have an inherent advantage, so I made my way over to Baseball-Reference to check in on some of the most recent World Series winners. I wanted to see how they had performed in September (or those few regular-season games that spill into October in certain years) to prove that you didn’t need to end the season hot in order to capture baseball’s biggest prize. So I started with last season.

As it turns out, the Red Sox went 16-9 over baseball’s final month, which was their second-best month of the season. Well, that doesn’t prove anything; it’s simply one year. Then I went to 2012. As it turns out, the Giants went 20-10 from the beginning of September, their best month of baseball all season. Still, that’s only two. The Cardinals of 2011 would certainly be different. Nope. Eighteen and eight, in what was their best month of baseball of the season. This could go on for a while, but let’s simply go to the chart:

 

World Series Champions’ Late Season Success 2002-2013

WS Champ Year Sept/Oct W-L Month W/L% Season W-L Season W/L% Month Rank
Red Sox 2013 16-9 0.640 97-65 0.599 2nd
Giants 2012 20-10 0.667 94-68 0.580 1st
Cardinals 2011 18-8 0.692 90-72 0.556 1st
Giants 2010 19-10 0.655 92-70 0.568 2nd
Yankees 2009 20-11 0.645 103-59 0.636 3rd
Phillies 2008 17-8 0.680 92-70 0.568 1st
Red Sox 2007 16-11 0.593 96-66 0.593 3rd
Cardinals 2006 12-17 0.414 83-78 0.516 5th
White Sox 2005 19-12 0.613 99-63 0.611 4th
Red Sox 2004 21-11 0.656 98-64 0.605 3rd
Marlins 2003 18-8 0.692 91-71 0.562 2nd
Angels 2002 18-9 0.667 99-63 0.611 2nd
214-124 0.633 1134-809 0.584

 

As the reader can see, the World Series champs of the past twelve years were almost always playing some of their best baseball in the final month (with an occasional October nubbin) of the season. Sure, the 2006 Cardinals were under .500, but they were a pretty fluky team in general, owning the fewest regular-season wins ever of a championship team. Other than those Cardinals, however, every other team was above .500 during those final four-to-five weeks. And sure, some of that can be explained by the fact that these are top teams who are likely to be nearly .500 or above every month, but in seven of the last twelve years, these teams had either their best or second-best month right at the end of the season. Their September/October winning percentage was nearly fifty points higher than their season totals, and the gap would be even larger if those strong September/October games were removed from the season totals.

I began to wonder if I had stumbled onto something.

Sure, Cameron and the Prospectus gang had shown that, in a large-N analysis of the entire second half and the playoffs as a whole, momentum didn’t matter, but maybe on a smaller scale this phenomenon did hold some water. So I expanded my search back to the beginning of the Wild Card era, which seemed a natural breaking point given that before 1995 (well, technically 1994, but we all know how that played out) only four teams made the playoffs (itself an expansion from the two teams that made it throughout baseball history until 1969). Let’s check out the chart:

 

World Series Champions’ Late Season Success 1995-2013

WS Champ Year Sept/Oct W-L Month W/L% Season W-L Season W/L% Month Rank
Red Sox 2013 16-9 0.640 97-65 0.599 2nd
Giants 2012 20-10 0.667 94-68 0.580 1st
Cardinals 2011 18-8 0.692 90-72 0.556 1st
Giants 2010 19-10 0.655 92-70 0.568 2nd
Yankees 2009 20-11 0.645 103-59 0.636 3rd
Phillies 2008 17-8 0.680 92-70 0.568 1st
Red Sox 2007 16-11 0.593 96-66 0.593 3rd
Cardinals 2006 12-17 0.414 83-78 0.516 5th
White Sox 2005 19-12 0.613 99-63 0.611 4th
Red Sox 2004 21-11 0.656 98-64 0.605 3rd
Marlins 2003 18-8 0.692 91-71 0.562 2nd
Angels 2002 18-9 0.667 99-63 0.611 2nd
Diamondbacks 2001 14-13 0.519 92-70 0.568 5th
Yankees 2000 13-18 0.419 87-74 0.540 5th
Yankees 1999 17-14 0.548 98-64 0.605 5th
Yankees 1998 16-11 0.593 114-48 0.704 6th
Marlins 1997 12-15 0.444 92-70 0.568 6th
Yankees 1996 16-11 0.593 92-70 0.568 t-3rd
Braves 1995 16-12 0.571 90-54 0.625 4th
318-218 0.593 1799-1259 0.588

 

And wouldn’t you know it. Yet more proof that a big enough sample size can debunk almost any baseball myth. From 1995-2001, there was a pair of losing records, and the best any team did was its tied-for-third-best month of the season. With the addition of only those seven years, the winning percentage that had shown such a big gap before is now nearly exactly even in September/October compared to the season as a whole. Now, if the World Series rolls around in a month and the Orioles and Cardinals (the two best September records in 2014, so far) are playing, maybe we can pay a little bit of attention to this trend, since it has been prevalent for over a decade. But if we get to the World Series and it ends up as a 1989 Bay Bridge Series rematch, with the ice-cold A’s and the only slightly warmer Giants squaring off, we’ll know that once again the large-sample-size guys have won.


Your One-Stop Shop for Postseason Narrative Debunking

I, like you, have been hearing and reading a lot about the postseason and which teams are best positioned to go deep into October. The rationales aren’t always based on more tangible factors — like, say, which teams are good — but rather on “hidden” or “insider” clues (use of scare quotes completely intentional) drawn from other qualities. I decided to test each of the factors I’ve heard or read about.

Full disclosure: This isn’t exactly original research. Well, it is, in that I made a list of various hypotheses to test, decided how to test them, and then spent hours pulling down data from Baseball-Reference and FanGraphs in order to create an unwieldy 275×23 spreadsheet full of logical operators. But it isn’t original in that some of the questions I’m addressing have been addressed elsewhere. For example, I’m going to consider whether postseason experience matters. David Gassko at the Hardball Times addressed the impact of players’ postseason experience on postseason outcomes in 2008, and Russell Carleton at Baseball Prospectus provided an analysis recently. I’m not claiming to be the first person to have thought of these items or of putting them to the test. What I’ve got here, though, is an attempt to combine a lot of narratives in one place, and to bring the research up to date through the 2013 postseason.

I’m going to look at six questions:

  • Does prior postseason experience matter?
  • Do veteran players have an edge?
  • How important is momentum leading into the postseason?
  • Does good pitching stop good hitting?
  • Are teams reliant on home runs at a disadvantage?
  • Is having one or more ace starters an advantage?

For each question, I’ll present my research methodology and my results. Then, once I’ve presented all the conclusions, I’ll follow it up with a deeper discussion of my research methodology for those of you who care. (I imagine a lot of you do. This is, after all, FanGraphs.) In all cases, I’ve looked at every postseason series since the advent of the current Divisional Series-League Championship Series-World Series format in 1995. (I’m ignoring the wild card play-in/coin flip game.) That’s 19 years, seven series per year (four DS, two LCS, one WS), 133 series in total.

DOES POSTSEASON EXPERIENCE MATTER?

The Narrative: Teams that have been through the crucible of baseball’s postseason know what to expect and are better equipped to handle the pressures–national TV every game, zillions of reporters in the clubhouse, distant relations asking for tickets–than teams that haven’t been there before.

The Methodology: For each team, I checked the number of postseason series they played over the prior three years. The team with the most series was deemed the most experienced. If there was a tie, no team was more experienced. I also excluded series in which the more experienced team had just one postseason series under its belt, i.e., a Divisional Series elimination. I figured a team had to do more than one one-and-done in the past three years to qualify as experienced. In last year’s NLCS, for example, the Cardinals had played in five series over the past three years (three in 2011, two in 2012), while the Dodgers had played in none. So St. Louis got the nod. In the Dodgers’ prior Divisional Series, LA played an Atlanta team that lost a Divisional Series in 2010, its only postseason appearance in the prior three years, so neither team got credit for experience.
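This comparison reduces to a small function; a sketch of the article's rules (most prior series wins; ties and single one-and-dones don't count):

```python
def experience_edge(series_a, series_b, minimum=2):
    """Return 'A', 'B', or None: which team played more postseason series
    over the prior three years, requiring at least `minimum` series
    (more than a single one-and-done) to count as experienced."""
    if series_a == series_b:
        return None
    leader, count = ("A", series_a) if series_a > series_b else ("B", series_b)
    return leader if count >= minimum else None

# 2013 NLCS: Cardinals (5 prior series) vs. Dodgers (0) -> Cardinals edge.
print(experience_edge(5, 0))   # 'A'
print(experience_edge(1, 0))   # None: a lone one-and-done doesn't qualify
```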

The Result: Narrative debunked. There have been 101 series in which one team was more experienced than the other, per my definition. The more experienced team won 50 of those series, or 49.5%. There is, at least since 1995, no relationship between postseason experience and success in the postseason.

DO VETERAN PLAYERS HAVE AN EDGE?

The Narrative: The pressure on players grows exponentially in October. A veteran presence helps keep the clubhouse relaxed and helps players perform up to their capabilities, yet stay within themselves. Teams lacking that presence can play tight, trying to throw every pitch past the opposing batters and trying to hit a three-run homer with the bases empty on every at bat. (Sorry, I know, I’m laying it on thick, but that’s what you hear.)

The Methodology: For each team, I took the average of the batters’ weighted (by at-bats + games played) age and the pitchers’ weighted (by 3 x games started + games + saves) age. I considered one team older than the other if its average age was 1.5 years older than that of its opponent. For example, in the 2012 ALCS, the Yankees’ average age was 31.5, and the Tigers’ was 28.1, so the Yankees had a veteran edge. When the Tigers advanced to the World Series against San Francisco, the Giants’ average age was 28.9, so neither team had an advantage.
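The weighted-age computation can be sketched as below with a made-up roster; the weights follow the formulas in the text (AB + G for batters, 3×GS + G + SV for pitchers):

```python
def weighted_team_age(batters, pitchers):
    """Average of the batter and pitcher weighted mean ages.
    Batters weighted by AB + G; pitchers by 3*GS + G + SV."""
    def wmean(pairs):   # pairs of (age, weight)
        total_w = sum(w for _, w in pairs)
        return sum(a * w for a, w in pairs) / total_w

    bat_age = wmean([(age, ab + g) for age, ab, g in batters])
    pit_age = wmean([(age, 3 * gs + g + sv) for age, gs, g, sv in pitchers])
    return (bat_age + pit_age) / 2

# Hypothetical roster: (age, AB, G) for batters; (age, GS, G, SV) for pitchers.
batters  = [(34, 550, 150), (27, 480, 140), (31, 300, 110)]
pitchers = [(36, 33, 33, 0), (25, 30, 30, 0), (29, 0, 70, 40)]
print(round(weighted_team_age(batters, pitchers), 1))
```

Playing-time weighting keeps a 40-year-old September call-up from skewing the team's "veteran presence" number.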

The Result: Narrative in doubt. There have been 51 series in which one team’s average age was 1.5 or more years greater than the other. The older team won 27 of those series, or 53%. That’s not enough to make a definite call. And if you take away just one year–2009, when the aging Yankees took their most recent World Series–the percentage drops to 50%–no impact at all.

HOW IMPORTANT IS MOMENTUM LEADING INTO THE POSTSEASON?

The Narrative: Teams that end the year on a hot streak can carry that momentum right into the postseason. By contrast, a team that plays mediocre ball leading up to October develops bad habits, or forgets how to win, or something. (Sorry, but I have a really hard time with this one. We’re hearing it a lot this year–think of the hot Pirates or the cold A’s–but there are other teams, like the Orioles, who have the luxury of resting their players and lining up their starting rotation. I have a hard time believing that the O’s 3-3 record since Sept. 17 means anything.)

The Methodology: I looked up each team’s won-lost percentage over the last 30 days of the season and deemed a team as having more momentum if its winning percentage was 100 or more percentage points higher than that of its opponent. For example, in one of last year’s ALDS, the A’s were 19-8 (.704 winning percentage) over their last 30 days and the Tigers were 13-13 (.500), so the A’s had momentum. The Red Sox entered the other series on a 16-9 run (.640) and the Rays were 17-12 (.586), so neither team had an edge.
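A sketch of the momentum rule, using the winning percentages from the two 2013 ALDS examples above:

```python
def momentum_edge(pct_a, pct_b, threshold=0.100):
    """Return 'A', 'B', or None depending on whether one team's winning
    percentage over the last 30 days exceeds the other's by 100+ points."""
    diff = pct_a - pct_b
    if diff >= threshold:
        return "A"
    if -diff >= threshold:
        return "B"
    return None

# A's 19-8 (.704) vs. Tigers 13-13 (.500): a 204-point gap, so the A's.
print(momentum_edge(19 / 27, 13 / 26))   # 'A'
# Red Sox 16-9 (.640) vs. Rays 17-12 (.586): only 54 points, no edge.
print(momentum_edge(16 / 25, 17 / 29))   # None
```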

The Result: Narrative in doubt, and then only for the Divisional Series. There have been 64 series in which one team’s winning percentage over its past 30 days was 100 or more percentage points higher than that of its opponent. In those series, the team with the better record won 33, or 52% of the time. That’s not much of an edge. And when you consider that a lot of those were in the Divisional Series, where the rules are slanted in favor of the better team (the team with the better record generally gets home-field advantage), it goes away completely. Looking just at the ALCS, NLCS, and World Series, the team with the better record over the last 30 days of the season won 13 of 27 series, or 48%, debunking the narrative. In the Divisional Series, the hotter team over the last 30 days won 20 of 37 series, or 54%. That’s an edge, but not much of one.

DOES GOOD PITCHING STOP GOOD HITTING?

The Narrative: Pitching and defense win in October. Teams that hit a lot get shut down in the postseason.

The Methodology: I struggled with a methodology for this one. I came up with this: When a team whose hitting (measured by park-adjusted OPS) was 5% better than average faced a team whose pitching (by park-adjusted ERA) was 5% better than average, I deemed it as a good-hitting team meeting a good-pitching team. For example, the 2012 ALCS featured a good-hitting Yankees team (112 OPS+) against a good-pitching Tigers team (113 ERA+). The Yankees were also good-pitching (110 ERA+), but the Tigers weren’t good-hitting (103 OPS+).

The Result: Narrative in doubt. There have been 65 series in which a good-hitting team faced a good-pitching team, as defined above. (There were four in which both teams qualified as good-hitting and good-pitching; in those cases, I went with the better-hitting team for compiling my results.) In those series, the better-hitting team won 32 times, or 49%. That is, good hitting beat good pitching about half the time. That pretty much says it.

ARE TEAMS RELIANT ON HOME RUNS AT A DISADVANTAGE?

The Narrative: Teams that sit back and wait for home runs are at a disadvantage in the postseason, when better pitching makes run manufacture more important. Scrappy teams advance, sluggers go home.

The Methodology: I calculated each team’s percentage of runs derived from home runs. In every series, if one team’s share of runs from homers was at least five percentage points greater than its opponent’s, I deemed that team as reliant on home runs. For example, in last year’s NLCS, the Cardinals scored 204 of their 783 runs on homers (26%). The Dodgers scored 207 of their 649 via the long ball (32%). So the Dodgers were more reliant on home runs. In the ALCS, the Red Sox scored 36% of their runs (305/853) on homers compared to 38% for the Tigers (301/796), so neither team had an edge.
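The home-run-reliance rule is simple arithmetic; a sketch using the run totals quoted above:

```python
def hr_reliance_edge(hr_runs_a, runs_a, hr_runs_b, runs_b, threshold=0.05):
    """Flag the team whose share of runs from homers exceeds its
    opponent's by five percentage points or more; None if neither does."""
    share_a = hr_runs_a / runs_a
    share_b = hr_runs_b / runs_b
    if share_a - share_b >= threshold:
        return "A"
    if share_b - share_a >= threshold:
        return "B"
    return None

# 2013 NLCS: Cardinals 204/783 (26%) vs. Dodgers 207/649 (32%) -> Dodgers.
print(hr_reliance_edge(204, 783, 207, 649))   # 'B'
# 2013 ALCS: Red Sox 305/853 (36%) vs. Tigers 301/796 (38%) -> no edge.
print(hr_reliance_edge(305, 853, 301, 796))   # None
```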

The Result: Narrative in doubt. There have been 60 series in which one team’s share of runs from homers was at least five percentage points greater than its opponent’s. In those series, the more homer-happy team won 27 series, or 45% of the time. So the less homer-reliant team won 55%, which is OK, but certainly not a strong majority. And if you remove just one year–2012, when the less homer-reliant team won six series (three of those victories were by the Giants)–the percentage drops to 50%.

IS HAVING ONE OR MORE ACE STARTERS AN ADVANTAGE?

The Narrative: An ace starter can get two starts in a postseason series (three if he goes on short rest in the seventh game of a Championship or World Series). Assuming he wins, that means his team needs to win only one of three remaining games in a Divisional Series, and only two of five or one of four in a Championship or World Series. A team lacking such a lights-out starter is at a disadvantage.

The Methodology: This is another one I struggled with. Defining an “ace” isn’t easy. I arrived at this: I totaled the Cy Young Award points for each team’s starters. If one team’s total exceeded the other’s by 60 or more points — the difference between the total number of first- and second-place votes since 2010 — I determined that team had an edge in aces. (The threshold was half that prior to 2010, because the voting system changed in 2010: ballots went from three deep to five deep, and the difference between a first- and second-place vote rose from one point to two.) For example, in last year’s Boston-Tampa Bay Divisional Series, the only starter to receive Cy Young consideration was Tampa Bay’s Matt Moore, who got four points for two fourth-place votes. That’s not enough to give the Rays an edge. But in the other series, the Tigers’ Max Scherzer (203 points) and Anibal Sanchez (46) combined for 249 points, while the A’s got 25 points for Bartolo Colon. That gives an edge to the Tigers.
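The ace-edge test can be sketched with the vote totals quoted above, using the 60-point threshold for 2010-onward ballots:

```python
def ace_edge(points_a, points_b, threshold=60):
    """Return 'A', 'B', or None: which staff's Cy Young vote total
    exceeds the other's by the given threshold."""
    diff = sum(points_a) - sum(points_b)
    if diff >= threshold:
        return "A"
    if -diff >= threshold:
        return "B"
    return None

# 2013 ALDS: Tigers (Scherzer 203 + Sanchez 46) vs. A's (Colon 25) -> Tigers.
print(ace_edge([203, 46], [25]))   # 'A'
# Moore's 4 points alone don't clear the bar against a staff with none.
print(ace_edge([4], []))           # None
```

For pre-2010 seasons, the same function would be called with `threshold=30` to match the older three-deep ballots.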

The Result: Narrative in doubt. There have been 82 series in which one team’s starters got significantly more Cy Young Award vote points than its opponent’s. The team with the higher total won 44 series, or just under 54%. That’s not much better than a coin flip. And again, one year — in this case, 2001, when the team with the significantly higher Cy Young tally won six series — tipped the balance. Without the contributions of Randy Johnson, Curt Schilling, Roger Clemens, Freddy Garcia, Jamie Moyer, and Mike Mussina to that year’s postseason, the team with the apparent aces has won just 38 of 76 series, exactly half.

Conclusion: None of the narratives I examined stand up to scrutiny. Maybe the team that wins in the postseason, you know, just plays better.


Now, About the Methodology: I know there are limitations and valid criticisms of how I analyzed these data. Let me explain myself.

For postseason experience, I feel pretty good about counting the number of series each team played over the prior three years. One could argue that I should’ve looked at the postseason experience of the players rather than the franchise, but I’ll defend my method. There isn’t so much roster and coaching staff turnover from year to year as to render franchise comparisons meaningless.

For defining veteran players, there are two issues. First, my choice of an age difference of 1.5 years is admittedly arbitrary. My thinking was pretty simple: one year doesn’t amount to much, and there were only 35 series in which the age difference was greater than two years. So 1.5 was a good compromise. Second, I know, age isn’t the same as years of experience. But it’s an OK proxy, it’s readily available, and it’s the kind of thing that the narrative’s built on. Bryce Harper has more plate appearances than J.D. Martinez, but he’s also over five years younger. Whom do you think the announcers will describe as the veteran?

For momentum, I think the 30-day split’s appropriate. I could’ve chosen 14 days instead of 30 — FanGraphs’ splits on its Leaders boards totally rock — but I thought that’d include too many less meaningful late-season games when teams, as I mentioned, might be resting players and setting up their rotations. As for the difference of 100 points for winning percentage, that’s also a case of an admittedly arbitrary number that yields a reasonable sample size. A difference of 150 points, for example, would yield similar results but a sample size of only 39 compared to the 64 I got with 100 points.

For good hitting and good pitching, I realize that there are better measures of “good” than OPS+ and ERA+: wRC+ and FIP-, of course, among others. But I wanted to pick statistics that were consistent with the narrative. When a sportswriter or TV announcer says “good pitching beats good hitting,” I’ll bet you that at least 99 times out of a hundred that isn’t shorthand for “low FIP- beats high wRC+.” If you and I were asked to test whether good pitching beats good hitting, that’s probably how we’d do it. But that’s not what we’re looking at here; OPS and ERA are more consistent with the narrative.

For reliance on home runs, it seems pretty clear to me that the right measure is percentage of runs scored via the long ball. Again, my choice of a difference of five percentage points is arbitrary, but it’s a nice round number that yields a reasonable sample size.

Finally, my use of Cy Young voting to determine a team’s ace or aces: Go ahead, open fire. I didn’t like it, either. But once again, we’re looking at a narrative, which may not be the objective truth. Look, Roger Clemens won the AL Cy Young Award in 2001 because he went 20-3. He was fourth in the league in WAR. He was ninth in ERA. He was third in FIP. He was, pretty clearly to me, not only not the best pitcher in the league, but only the third-best pitcher on his own team (I’d take Mussina and Pettitte first). But I’ll bet you that when the Yankees played the Mariners in the ALCS that year (too long ago for me to remember clearly), part of the storyline was how the Yankees got stretched to five games in the Divisional Series and therefore wouldn’t have their ace, Roger Clemens, available until the fourth game against the Mariners. Never mind that Pettitte was the MVP of the ALCS. The ace narrative is based on who’s perceived as the ace, not who actually is. (And a technical note: Until the Astros moved from the NL to the AL, the difference between first- and second-place votes in the two leagues was different, since there were 28 voters in the AL and 32 in the NL. The results I listed aren’t affected by that small difference. I checked.)


Defining Balanced Lineups

We’re used to hearing about teams having balanced or deep lineups. Other teams are described as “stars and scrubs.” While I think we all know what these terms mean, it’s not something that’s ever been quantified (at least, not to my knowledge). Since the issue of depth is an interesting one to me, I thought it’d be fun to tackle it using wOBA.

For each team, I calculated wOBA at the team level, then the weighted standard deviation across all its players. This produces each team’s distribution, but since the size of the standard deviation depends on the average (meaning it isn’t standardized when comparing teams), I used the coefficient of variation (aka CV, simply standard deviation divided by average) as the final measure of consistency. The lower the CV, the smaller the spread of wOBA performance.
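A minimal Python sketch of that calculation, weighting each player by plate appearances (the lineup numbers below are hypothetical, purely for illustration):

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_std(values, weights):
    # Standard deviation of values, weighted by plate appearances
    mean = weighted_mean(values, weights)
    variance = sum(w * (v - mean) ** 2
                   for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(variance)

def lineup_cv(wobas, plate_appearances):
    """Coefficient of variation of a lineup's wOBAs (std / mean).
    A lower CV means a smaller spread, i.e. a more balanced lineup."""
    return (weighted_std(wobas, plate_appearances)
            / weighted_mean(wobas, plate_appearances))

# Hypothetical nine-man lineup: each player's wOBA and PA
wobas = [0.400, 0.370, 0.350, 0.330, 0.320, 0.310, 0.300, 0.290, 0.280]
pas   = [650, 640, 630, 600, 580, 560, 540, 500, 450]
print(round(lineup_cv(wobas, pas), 3))
```

A “stars and scrubs” roster (a few very high wOBAs over a sea of low ones) produces a larger CV than a lineup of nine league-average hitters, which is exactly the balanced-versus-top-heavy distinction the measure is after.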
