Archive for Player Analysis

Estimating Pitcher Release Point Distance from PITCHf/x Data

For PITCHf/x data, the starting point for pitches, in terms of location, velocity, and acceleration, is set at 50 feet from the back of home plate. This is effectively the time-zero location of each pitch. However, 55 feet seems to be the consensus for setting an actual release point distance from home plate, and is used for all pitchers. While this is a reasonable estimate for handling PITCHf/x data en masse, it would be interesting to see if we can calculate this at the level of individual pitchers, since their release point distances will probably vary based on a number of parameters (height, stride, throwing motion, etc.). The goal here is to use PITCHf/x data to estimate the average distance from home plate at which each pitcher releases his pitches, conceding that each pitch is going to be released from a slightly different distance. Since we are operating in the blind, we first have to define what it means to find a pitcher’s release point distance based solely on PITCHf/x data. This definition will set the course by which we will go about calculating the release point distance mathematically.

We will define the release point distance as the y-location (along the direction from home plate to the pitching mound) at which the pitches from a specific pitcher are “closest together”. This definition makes sense, as we would expect the point of origin to be the location where the pitches are closer together than at any later point in their trajectories. It also gives us a way to look for this point: treat the pitch locations at a specified distance as a cluster and find the distance at which they are closest. In order to do this, we will make a few assumptions. First, we will assume that the pitches near the release point come from a single bivariate normal (or two-dimensional Gaussian) distribution, from which we can compute a sample mean and covariance. This assumption seems reasonable for most pitchers, but for others we will have to do a little more work.

Next we need to define a metric for measuring this idea of closeness. The previous assumption gives us a possible way to do this: compute the ellipse, based on the data at a fixed distance from home plate, that accounts for two standard deviations in each direction along the principal axes of the cluster. This provides a two-dimensional figure which encloses most of the data and has an associated area we can calculate. The one-dimensional analogue is the length of the interval within two standard deviations of the mean of a univariate normal distribution. Such a calculation in two dimensions amounts to finding the sample covariance, which, for this problem, will be a 2×2 matrix, finding its eigenvalues and eigenvectors, and using these to find the area of the ellipse. Here, each eigenvector defines a principal axis and its corresponding eigenvalue the variance along that axis (taking the square root of each eigenvalue gives the standard deviation along that axis). The formula for the area of an ellipse is Area = pi*a*b, where a is half the length of the major axis and b half the length of the minor axis. Since each semi-axis spans two standard deviations, the area of the ellipse we are interested in is four times pi times the square root of the product of the eigenvalues. Note that since we want to find the distance corresponding to the minimum area, the choice of two standard deviations, in lieu of one or three, is irrelevant: it plays the role of a scale factor and will not affect the location of the minimum, only the value of the functional.
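As a concrete sketch of this closeness metric (in Python with NumPy, an implementation choice on my part rather than anything specified in the original analysis), the ellipse area falls straight out of the sample covariance:

```python
import numpy as np

def ellipse_area(points, n_std=2.0):
    # points: (N, 2) array of (x, z) pitch locations at a fixed y-distance.
    cov = np.cov(points, rowvar=False)    # 2x2 sample covariance
    eigvals = np.linalg.eigvalsh(cov)     # variances along the principal axes
    a, b = n_std * np.sqrt(eigvals)       # semi-axis lengths, n_std sigmas each
    return np.pi * a * b                  # for n_std=2: 4*pi*sqrt(l1*l2)
```

As noted above, n_std only rescales every area by the same factor, so it cannot move the location of the minimum.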

With this definition of closeness in order, we can now set up the algorithm. To be safe, we will search a wide window around y=55 when calculating the ellipses. Based on trial and error, y=45 to y=65 seems more than sufficient. Starting at one end, say y=45, we use the PITCHf/x location, velocity, and acceleration data to calculate the x (horizontal) and z (vertical) position of each pitch at 45 feet. We can then compute the sample covariance and, from it, the area of the ellipse. Working in increments, say one inch, we move toward y=65. This produces a discrete function with a minimum value. We can then find where the minimum occurs (choosing the smallest value in a finite set) and thus the estimate of the release point distance for the pitcher.
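A sketch of the full sweep (Python/NumPy; the dictionary keys follow the usual PITCHf/x nine-parameter naming, and the constant-acceleration trajectory model is the standard PITCHf/x fit, with everything referenced to y0 = 50 feet):

```python
import numpy as np

def position_at_y(p, y):
    # (x, z) location of one pitch when it crosses distance y from the plate.
    y0 = 50.0
    # Solve 0.5*ay*t^2 + vy0*t + (y0 - y) = 0 for the crossing time;
    # t is negative when extrapolating back beyond 50 feet.
    t = (-p["vy0"] - np.sqrt(p["vy0"] ** 2 - 2.0 * p["ay"] * (y0 - y))) / p["ay"]
    x = p["x0"] + p["vx0"] * t + 0.5 * p["ax"] * t ** 2
    z = p["z0"] + p["vz0"] * t + 0.5 * p["az"] * t ** 2
    return x, z

def release_distance(pitches, lo=45.0, hi=65.0, step=1.0 / 12.0):
    # Sweep y in one-inch steps and return the distance at which the
    # two-sigma ellipse around the pitch locations is smallest.
    ys = np.arange(lo, hi + 1e-9, step)
    areas = []
    for y in ys:
        pts = np.array([position_at_y(p, y) for p in pitches])
        cov = np.cov(pts, rowvar=False)
        areas.append(4.0 * np.pi * np.sqrt(np.linalg.det(cov)))  # 4*pi*sqrt(l1*l2)
    return ys[int(np.argmin(areas))]
```

The determinant of the covariance equals the product of its eigenvalues, so this is the same area formula as before without an explicit eigendecomposition.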

Earlier we assumed that the data at a fixed y-location comes from a bivariate normal distribution. While this is a reasonable assumption, one can still run into difficulties with noisy or inaccurate data, or with multiple clusters. This can happen for myriad reasons: an in-season change in pitching mechanics, a change in location on the pitching rubber, etc. Since data sets with these factors present will still produce results via the outlined algorithm despite violating our assumptions, the results may be spurious. To handle this, we will fit the data at 55 feet to a Gaussian mixture model via an incremental k-means algorithm. This will approximate the distribution of the data with a probability density function (pdf) that is the sum of k bivariate normal distributions, referred to as components, weighted by their contributions to the pdf, where the weights sum to unity. The number of components, k, is determined by the algorithm based on the distribution of the data.
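The original analysis fits the mixture with an incremental k-means algorithm; as a stand-in sketch, scikit-learn’s EM-based GaussianMixture with BIC-driven selection of k plays the same role of letting the data decide the number of components:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_mixture(points, max_k=6):
    # Try k = 1..max_k components and keep the fit with the lowest BIC.
    best, best_bic = None, np.inf
    for k in range(1, max_k + 1):
        gm = GaussianMixture(n_components=k, random_state=0).fit(points)
        bic = gm.bic(points)
        if bic < best_bic:
            best, best_bic = gm, bic
    return best
```

The fitted component weights (gm.weights_) sum to unity, matching the description above.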

With the mixture model in hand, we then are faced with how to assign each data point to a cluster. This is not so much a problem as a choice and there are a few reasonable ways to do it. In the process of determining the pdf, each data point is assigned a conditional probability that it belongs to each component. Based on these probabilities, we can assign each data point to a component, thus forming clusters (from here on, we will use the term “cluster” generically to refer to the number of components in the pdf as well as the groupings of data to simplify the terminology). The easiest way to assign the data would be to associate each point with the cluster that it has the highest probability of belonging to. We could then take the largest cluster and perform the analysis on it. However, this becomes troublesome for cases like overlapping clusters.

A better assumption would be that there is one dominant cluster and to treat the rest as “noise”. Then we would keep only the points that have at least a fixed probability or better of belonging to the dominant cluster, say five percent. This will throw away less data and fits better with the previous assumption of a single bivariate normal cluster. Both of these methods will also handle the problem of having disjoint clusters by choosing only the one with the most data. In demonstrating the algorithm, we will try these two methods for sorting the data as well as including all data, bivariate normal or not. We will also explore a temporal sorting of the data, as this may do a better job than spatial clustering and is much cheaper to perform.
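Given a fitted mixture, both assignment rules reduce to a few lines (sketched here with scikit-learn, whose predict_proba returns exactly the conditional probability of each point belonging to each component):

```python
import numpy as np

def most_likely_filter(gm, points):
    # Keep points whose most probable component is the dominant one.
    dominant = int(np.argmax(gm.weights_))
    return points[gm.predict(points) == dominant]

def five_percent_filter(gm, points, threshold=0.05):
    # Keep points with at least a 5% conditional probability of
    # belonging to the dominant (highest-weight) component.
    dominant = int(np.argmax(gm.weights_))
    post = gm.predict_proba(points)[:, dominant]
    return points[post >= threshold]
```

For a two-component mixture, the five-percent set always contains the most-likely set, which is why it trims less aggressively at cluster edges.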

To demonstrate this algorithm, we will choose three pitchers with unique data sets from the 2012 season and see how it performs on them: Clayton Kershaw, Lance Lynn, and Cole Hamels.

Case 1: Clayton Kershaw

Kershaw Clusters photo Kershaw_Clusters.jpeg

At 55 feet, the Gaussian mixture model identifies five clusters for Kershaw’s data. The green stars represent the center of each cluster and the red ellipses indicate two standard deviations from center along the principal axes. The largest cluster in this group has a weight of .64, meaning it accounts for 64% of the mixture model’s distribution. This is the cluster around the point (1.56,6.44). We will work off of this cluster and remove the data that has a low probability of coming from it. This will include dispensing with the sparse cluster to the upper-right and some data on the periphery of the main cluster. We can see how Kershaw’s clusters are generated by taking a rolling average of his pitch locations at 55 feet (the standard distance used for release points) over the course of 300 pitches (about three starts).

Kershaw Rolling Average photo Kershaw_Average.jpeg

The green square indicates the average of the first 300 pitches and the red the last 300. From the plot, we can see that Kershaw’s data at 55 feet has very little variation in the vertical direction but, over the course of the season, drifts about 0.4 feet horizontally, with a large part of the rolling average living between 1.5 and 1.6 feet (measured from the center of home plate). For future reference, we will define a “move” of release point as a 9-inch change between consecutive, disjoint 300-pitch averages (this is the “0 Moves” that shows up in the title of the plot; a move would have been denoted by a blue square). The values of 300 pitches and 9 inches were chosen to provide a large enough sample and enough distance for the clusters to be noticeably disjoint, but one could choose, for example, 100 pitches and 6 inches or any other reasonable values. So, we can conclude that Kershaw never made a significant change in his release point during 2012, and therefore treating the data as a single cluster is justifiable.
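A hedged sketch of this move detector (the source only specifies “a 9-inch change in consecutive, disjoint 300-pitch averages”, so this is one reasonable reading of that rule):

```python
import numpy as np

def count_moves(x, window=300, move_size=0.75):
    # x: horizontal pitch locations at 55 ft, in chronological order.
    # A "move" is a 9-inch (0.75 ft) change between the means of two
    # consecutive, disjoint `window`-pitch blocks.
    moves = []
    i = window
    while i + window <= len(x):
        if abs(np.mean(x[i:i + window]) - np.mean(x[i - window:i])) >= move_size:
            moves.append(i)
            i += window  # skip ahead so one shift is not counted twice
        else:
            i += 1
    return moves
```

On data like Kershaw’s this should return an empty list (0 moves); on data like Lynn’s, discussed below, two.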

From the spatial clustering results, the first way we will clean up the data set is to take only the data which is most likely from the dominant cluster (based on the conditional probabilities from the clustering algorithm). We can then take this data and approximate the release point distance via the previously discussed algorithm. The release point for this set is estimated at 54 feet, 5 inches. We can also estimate the arm release angle, the angle a pitcher’s arm would make with a horizontal line when viewed from the catcher’s perspective (0 degrees would be a sidearm delivery, increasing as the arm is raised, up to 90 degrees). This can be accomplished by taking the angle, from horizontal, of the eigenvector which corresponds to the smaller variance. This works under the assumption that a pitcher’s release point will vary more perpendicular to the arm than parallel to it. In this case, the arm angle is estimated at 90 degrees, likely because we have blunted the edges of the cluster too much, making it closer to circular than the original data: the clusters to the left and right of the dominant one no longer contribute data, so this way of sorting creates sharp transitions at the edges of the cluster.
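The arm-angle estimate described here is just the orientation of the minor principal axis (a sketch under the stated assumption that scatter perpendicular to the arm exceeds scatter along it):

```python
import numpy as np

def arm_angle(points):
    # Angle from horizontal, in degrees, of the principal axis with the
    # smaller variance (assumed to point along the pitcher's arm).
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    vx, vz = eigvecs[:, 0]                  # eigenvector of the smaller variance
    return np.degrees(np.arctan2(abs(vz), abs(vx)))
```

Taking absolute values folds the angle into [0, 90] degrees, matching the sidearm-to-overhand scale described above.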

Kershaw Most Likely photo Kershaw_Likely_Final.jpeg

As discussed above, we run the algorithm from 45 to 65 feet, in one-inch increments, and find the location corresponding to the smallest ellipse. We can look at the functional that tracks the area of the ellipses at different distances in the aforementioned case.

Kershaw Most Likely Functional photo Kershaw_Likely_Fcn.jpeg

This area method produces a functional (in our case, it has been discretized to each inch) that can be minimized easily. It is clear from the plot that the minimum occurs at slightly less than 55 feet. Since all of the plots for the functional essentially look parabolic, we will forgo any future plots of this nature.

The next method is to assume that the data is all from one cluster and remove any data points that have a lower than five-percent probability of coming from the dominant cluster. This produces slightly better visual results.

Kershaw Five Percent photo Kershaw_Five_Pct_Final.jpeg

For this choice, the data gets trimmed at the edges, but not as severely as in the previous case. The release point is at 54 feet, 3 inches, which is very close to our previous estimate. The arm angle is more realistic, at 82 degrees, since we maintain the elliptical shape of the data.

Kershaw Original photo Kershaw_Orig_Final.jpeg

Finally, we will run the algorithm with the data as-is. We get an ellipse that fits the original data well and indicates a release point of 54 feet, 9 inches. The arm angle, for the original data set, is 79 degrees.

Examining the results, the original data set may be the one of choice for running the algorithm. The shape of the data is already elliptical and, for all intents and purposes, one cluster. However, one may still want to manually remove the handful of outliers before performing the estimation.

Case 2: Lance Lynn

Clayton Kershaw’s data set is much cleaner than most, consisting of a single cluster and a few outliers. Lance Lynn’s data has a different structure.

Lynn Clusters photo Lynn_Clusters.jpeg

The algorithm produces three clusters, two of which share some overlap and the third disjoint from the others. Immediately, it is obvious that running the algorithm on the original data will not produce good results because we do not have a single cluster like with Kershaw. One of our other choices will likely do better. Looking at the rolling average of release points, we can get an idea of what is going on with the data set.

Lynn Rolling Average photo Lynn_Average.jpeg

From the rolling average, we see that Lynn’s release point started around -2.3 feet, jumped to -3.4 feet and moved back to -2.3 feet. The moves discussed in the Kershaw section of 9 inches over consecutive, disjoint 300-pitch sequences are indicated by the two blue squares. So around Pitch #1518, Lynn moved about a foot to the left (from the catcher’s perspective) and later moved back, around Pitch #2239. So it makes sense that Lynn might have three clusters since there were two moves. However his first and third clusters could be considered the same since they are very similar in spatial location.

Lynn’s dominant cluster is the middle one, accounting for about 48% of the distribution. Running any sort of analysis on this will likely draw data from the right cluster as well. First up is the most-likely method:

Lynn Most Likely photo Lynn_Likely_Final.jpeg

Since we have two clusters that overlap, this method sharply cuts the data on the right hand side. The release point is at 54 feet, 4 inches and the release angle is 33 degrees. For the five-percent method, the cluster will be better shaped since the transition between clusters will not be so sharp.

Lynn Five Percent photo Lynn_Five_Pct_Final.jpeg

This produces a well-shaped single cluster which is free of all of the data on the left and some of the data from the far right cluster. The release point is at 53 feet, 11 inches and at an angle of 49 degrees.

As opposed to Kershaw, who had a single cluster, Lynn has at least two clusters. Therefore, running this method on the original data set probably will not fare well.

Lynn Original photo Lynn_Orig_Final.jpeg

Having more than one cluster and analyzing it as only one causes both a problem with the release point and release angle. Since the data has disjoint clusters, it violates our bivariate normal assumption. Also, the angle will likely be incorrect since the ellipse will not properly fit the data (in this instance, it is 82 degrees). Note that the release point distance is not in line with the estimates from the other two methods, being 51 feet, 5 inches instead of around 54 feet.

In this case, as opposed to Kershaw, who only had one pitch cluster, we can temporally sort the data based on the rolling average at the blue square (where the largest difference between the consecutive rolling averages is located).

Lynn Time Clusters photo Lynn_Time_Clusters.jpeg

Since there are two moves in release point, this generates three clusters, two of which overlap, as expected from the analysis of the rolling averages. As before, we can work with the dominant cluster, which is the red data. We will refer to this as the largest method, since it is the largest in terms of number of data points. Note that with spatial clustering, we would pick up some of the green and red data in the dominant cluster. Running the same algorithm for finding the release point distance and angle, we get:

Lynn Largest photo Lynn_Large_Final.jpeg

The distance from home plate of 53 feet, 9 inches matches our other estimates of about 54 feet. The angle in this case is 55 degrees, which is also in agreement. To finish our case study, we will look at another data set that has more than one cluster.

Case 3: Cole Hamels

Hamels Clusters photo Hamels_Clusters.jpeg

For Cole Hamels, we get two dense clusters and two sparse clusters. The two dense clusters appear to have a similar shape, with one shifted a little over a foot away from the other. The middle of the three consecutive clusters accounts for only 14% of the distribution, and the long cluster running diagonally through the graph is mostly picking up the handful of outliers, consisting of less than 1% of the distribution. We will work with the cluster with the largest weight, about 0.48, which is the cluster on the far right. If we look at the rolling average for Hamels’ release point, we can see that he switched his release point somewhere around Pitch #1359 last season.

Hamels Rolling Average photo Hamels_Average.jpeg

As in the clustered data, Hamels’ release point moves horizontally by just over a foot to the right during the season. As before, we will start by taking only the data which most likely belongs to the cluster on the right.

Hamels Most Likely photo Hamels_Likely_Final.jpeg

The release point distance is estimated at 52 feet, 11 inches using this method. In this case, the release angle is approximately 71 degrees. Note that on the top and the left the data has been noticeably trimmed away due to assigning data to the most likely cluster. The five-percent method produces:

Hamels Five Percent photo Hamels_Five_Pct_Final.jpeg

For this method of sorting through the data, we get 52 feet, 10 inches for the release point distance. The cluster has a better shape than the most-likely method and gives a release angle of 74 degrees. So far, both estimates are very close. Using just the original data set, we expect that the method will not perform well because there are two disjoint clusters.

Hamels Original photo Hamels_Orig_Final.jpeg

We run into the problem of treating two clusters as one, and the angle of release goes to 89 degrees: both clusters sit at about the same vertical level, so the variation in the data is almost entirely horizontal.

Just like with Lance Lynn, we can do a temporal splitting of the data. In this case, we get two clusters since he changed his release point once.

Hamels Time Clusters photo Hamels_Time_Clusters.jpeg

Working with the dominant cluster, the blue data, we obtain a release point at 53 feet, 2 inches and a release angle of 75 degrees.

Hamels Largest photo Hamels_Large_Final.jpeg

All three methods that sort the data before performing the algorithm lead to similar results.

Conclusions:

Examining the results of these three cases, we can draw a few conclusions. First, regardless of the accuracy of the method, it does produce results within the realm of possibility. We do not get release point distances at the boundary of our search space of 45 to 65 feet, or something that would definitely be incorrect, such as 60 feet. So while these release point distances have some error in them, this algorithm can likely be refined to be more accurate. Another interesting result is that, provided the data is predominantly one cluster, the results do not change dramatically with how we remove outliers or smaller additional clusters; in most cases, the change is only a few inches. For the release angles, the five-percent method or largest method probably produces the best results, because neither misshapes the clusters like the most-likely method does, nor runs into the problem of multiple clusters that may plague the original data. Overall, the five-percent method is probably the best bet for running the algorithm and getting decent results for cases of overlapping clusters (Lance Lynn), and the largest method will work best for disjoint clusters (Cole Hamels). If just one cluster exists, then working with the original data would seem preferable (Clayton Kershaw).

Moving forward, the goal is to settle on a single method for sorting the data before running the algorithm. The largest method seems the best choice for a robust algorithm since it is inexpensive and, based on limited results, performs on par with the best spatial clustering methods. One problem that comes up in running the simulations that does not show up in the data is the cost of the clustering algorithm. Since the method for finding the clusters is incremental, it can be slow, depending on the number of clusters. One must also iterate to find the covariance matrices and weights for each cluster, which can also be expensive. In addition, spatial clustering only has the advantages of removing outliers and maintaining repeated clusters, as in Lance Lynn’s case. Given the difference in run time (a few seconds for temporal splitting versus a few hours for spatial clustering), giving those up seems a small price to pay. There are also other approaches that can be taken. The data could be broken down by start and sorted that way as well, with some criterion assigned to determine when data from two starts belong to the same cluster.

Another problem exists that we may not be able to account for. Since the data for the path of a pitch starts at 50 feet and is meant for tracking the pitch toward home plate, we are essentially extrapolating to get the position of the pitch before (at larger values than) 50 feet. While this may hold for a small distance, we do not know exactly how far back this trajectory is correct. The location of the pitch prior to its individual release point, which we may not know, is essentially hypothetical data, since the pitch never existed at that distance from home plate. This is why it might be important to get a good estimate of a pitcher’s release point distance.

There are certainly many other ways to go about estimating release point distance, such as other ways to judge “closeness” of the pitches or sort the data. By mathematizing the problem, and depending on the implementation choices, we have a means to find a distinct release point distance. This is a first attempt at solving this problem which shows some potential. The goal now is to refine it and make it more robust.

Once the algorithm is finalized, it would be interesting to go through video and see how well the results match reality, in terms of release point distance and angle. As it is, we are essentially operating blind since we are using nothing but the PITCHf/x data and some reasonable assumptions. While this worked to produce decent results, it would be best to create a single, robust algorithm that does not require visual inspection of the data for each case. When that is completed, we could then run the algorithm on a large sample of pitchers and compare the results.


The Ten Highest BABIPs Since 1945

Earlier this season I looked at the ten lowest BABIPs since 1945, investigating what, exactly, this statistic can teach us about hitters. The conclusions ranged from clear to not-so-much: your batting average on balls in play will be lower if you’re too slow to beat out infield grounders, if you hit an unusually low number of line drives, if you’re getting poor contact by swinging at bad pitches, and if you’re just plain unlucky. Sometimes players saw their power numbers drop along with their BABIPs, most likely because of an inferior approach at the plate which caused weak hits, but sometimes players saw their power numbers rise sharply: one of the ten lowest BABIPs ever belongs to Roger Maris, because he put 61 balls out of play and over the outfield fences.

Will our high scorers clear things up?

What is BABIP? (Copied from the First Post)

Batting average on balls in play is exactly that: when you hit the ball and it’s not a home run, what’s your batting average? Imagine you’d only ever batted twice; first you hit a single and then you struck out. Your BABIP would be 1.000. If a single and a groundout, .500. After seven games of the 2013 season, Rick Ankiel had two home runs but no singles, doubles, or triples, so his BABIP was .000.

Across any given season, the average BABIP tends to be about .300. All this means is that, when you hit the ball at professional defenders, there’s a 70% chance they’ll get you out.
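For reference, the standard formula (the sacrifice-fly term in the denominator is the common convention) reproduces the examples above:

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    # Hits that stayed in the park, divided by at-bats (plus sac flies)
    # that ended with the ball in play.
    return (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)
```

A single plus a strikeout gives 1.000; a single plus a groundout gives .500; two homers and nothing else gives .000.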

The Ten Highest BABIPs Since 1945

Leaderboard

10. Willie McGee, 1985 (.395). McGee’s presence here isn’t surprising, since his hallmarks, aside from excellent hitting skills (and not much power), were speedy outfield defense and quality baserunning. It’s easy to imagine McGee beating out infield grounders, hustling out hits, or being above average at driving the ball, even though some of those statistics weren’t tracked at the time.

9. Derek Jeter, 1999 (.396). Jeter’s 2006 ranks 17th on the list, too. Jeter’s 266 infield hits since 2002, when batted-ball data started being counted, ranks second among all hitters in that decade-plus. First place? You’ll find out who that is in a minute (if you don’t know already).

8. Wade Boggs, 1985 (.396). Hey look, two top-ten BABIP seasons in the exact same year! Boggs edges McGee and the whole league with 240 hits in 161 games, 187 (77.9%) of them singles. During all his batting-title years, his BABIP was high, bottoming out at .361. Lucky? No: more like extremely good contact skills.

7. Austin Jackson, 2010 (.396). Jackson’s breakout season in center field for Detroit (that .396 BABIP led him to a .293 average) was followed by a breakup 2011 when his BABIP dropped 56 points (still above average!) and his batting average and on-base percentage fell 54 and 28 points, respectively. So far in 2013 Jackson’s at a career low on balls in play, but he’s also dramatically reduced his previously ugly strikeout rate, which has bolstered his return to the ranks of the truly outstanding.

6. Andres Galarraga, 1993 (.399). Before Galarraga cranked out 47 home runs at the age of 35, he had an equally improbable 1993. Triple slash, 1989-1992 (509 games): .246/.301/.399. Home runs in those 509 games: 62. Triple slash in 1993: .370/.403/.602.

Three observations. First, Galarraga’s batting average never came within fifty (!) points of that again. Second, this was his first season in Colorado, although it wasn’t a full one, as he only played 120 games. The Coors boost to his power was minimal, at first. Third, the guy could not take a walk.

5. Ichiro Suzuki, 2004 (.399). Will anyone be surprised to see Ichiro here? Speedy, with a near-mythical gift for hitting, Ichiro also has a gift for avoiding fly balls (23.8% flyballs, fourteenth-lowest in baseball since we started counting in 2002). And another thing we’ve been counting since 2002: Ichiro has 463 infield hits, 40% more than second-place Derek Jeter. In 2004, Ichiro had 57 infield hits in 161 games, or about one every series. Since 2002, Mark Sweeney has 12 infield hits in 690 games.

4. Roberto Clemente, 1967 (.403). Clemente was in the middle of a run of six consecutive 6.0+ WAR years. His high batting average on balls in play made this one his most valuable of all (7.7), 40 points above his career average (which was identical to his BABIP the year before). Clemente hit six fewer homers and five fewer doubles but 19 more singles, explaining the paradox that his slugging percentage rose while his power actually dropped.

3. Manny Ramirez, 2000 (.403). This is one of seven seasons in which Manny posted a BABIP above .350. I looked at batted ball data, available from 2002 onward, and found that Manny’s 22.6% line drives ranked 31st among the 481 hitters who’ve racked up more than 1,500 plate appearances since. Of course, Manny was inconsistent in that stretch. His .373 BABIP in 2002 coincided (or not!) with a line-drive rate of 25.3%. (Mark Loretta sits at first since ’02, 26.0%, while at second with 25.2% is Joey Votto, more on whom shortly.)

2. Jose Hernandez, 2002 (.404). I was alive and watching baseball in 2002 and I had never heard of Jose Hernandez. The Brewers shortstop had four pretty good seasons (1998-99, 2001, 2004), three terrible ones (1996, 2000, 2003), and a rather miraculous 2002 which found Hernandez riding a tidal wave of good luck on balls in play. His average rose 39 points, and dropped by 63 the next season; he struck out in literally one-third of his at-bats (188 Ks); his power numbers were unchanged. But, aside from luck, there was another big change. This was the first year batted-ball data is available, and the only year where Hernandez’ flyball rate was below 30%. Between Hernandez, Ichiro, and Jeter, flyball rate is a significant predictor of BABIP.

1. Rod Carew, 1977 (.408). What does it take to have the highest career BABIP of any finished career since 1945? (“Hang on,” you say, “what’s with this ‘finished career’ business?” “Ah,” I say, “Austin Jackson and Joey Votto are in the lead.”) Carew’s career BABIP is .359. Carew’s 1974 ranks 19th on this list (.391). So the guy was a great hitter: but his 1977 was extraordinary. An 8.5 WAR season, it saw a dramatic spike in singles, plus career highs in doubles, triples, and (tied with 1975) home runs. There was also an MVP award.

Conclusions

Again, some of the things we learned are unsurprising: speed is good; being an all-time great contact hitter is good. But there’s a twist: Jose Hernandez benefited from a whole lot of luck, and Rod Carew had the year of his life, but most of the guys here are obviously disposed to high BABIPs based on their skills. We were able to blame a lot of the bottom-ten seasons on hard times and bad breaks, but most of these guys are exceptional hitters with speed and contact ability.

And there’s a new factor begging for our attention.

When we looked at the ten lowest BABIPs, we were unwittingly at a disadvantage, because only one of those low seasons took place while batted-ball data was documented. Three of our ten highest have happened since 2002, though, as well as #13, 14, and 17, which means we have evidence of a new factor.

Hit more line drives, and your batting average on balls in play goes up.

Hit more fly balls, and it goes down–fast.

As a Community Research writer, I can’t insert a chart here; as a lazy person, I don’t have a chart to insert. But the next step in our inquiry is very, very clear. Does fly ball hitting suppress BABIP? Is it because of the increase in home runs, the ease with which defenders catch the ball, both, or neither?

Even More Pertinent Conclusion

We live in the golden age of BABIP. If I had done this “Ten Highest” post including 2013, the present season would have accounted for 40% of the list.

Among the top 20 BABIP guys with more than 700 games played in their careers, there are some retirees: Rod Carew (#2), Ron LeFlore (#7), Wade Boggs, Roberto Clemente, Kirby Puckett, Tony Gwynn, Willie McGee, and John Kruk. But 12 of the top 20 guys are currently active: Joey Votto (#1), Derek Jeter (#3), Shin-Soo Choo (#4), Matt Kemp (#5), Joe Mauer, Miguel Cabrera, Ichiro Suzuki, Matt Holliday, Michael Bourn, Ryan Braun, Wilson Betemit, David Wright.

As commenter Ferd pointed out last time, the league average BABIP was .260 in 1968; when I started the series, I relied on research which assured me that BABIP was consistent over time, but this is clearly not true. This means that there are two more lines of inquiry we should follow.

1. Why are so many BABIP leaders currently active? Is it a change in hitting style? Is it a change in pitching style? Is it a change in the data being used or the calculations being made? Or is it simply because most of them haven’t gotten older, slower, and less talented at the plate, and once they all age and retire order will be restored?

2. Wilson Betemit? How did that happen?


Chris Davis’s Oddly Historic Season So Far

A lot of ink (and pixels) has been spilled about Chris Davis’ great season. It’s hard to overstate just how great a .337/.432/.721 start through roughly one-third of the season is, especially in this renewed era of depressed offense. MLB’s .722 OPS so far this year ranks as baseball’s second-lowest since 1992’s .700 (2011: .720). Put simply, Davis is having the best offensive season in the American League of any player whose first name is not some variation of “Michael”.

Here’s yet another data point for you to chew on: Chris Davis is on track to have one of the highest extra-base hit (XBH) to plate appearance (PA) ratios in history.

As of the morning of Memorial Day 2013, Davis has hit an XBH in 16.5% of his PAs.  In conversational terms, he hits an XBH about once every six times he steps to the plate.

If Davis were to end the season with this ratio and qualify for a batting championship, it would rank second in history behind this other guy’s pretty good season.

In fact, only nine qualified players in modern history have ever had an XBH-PA ratio of greater than 15% over the course of an entire season.  Here is the list, with Davis’s 2013 added for context:

Rk  Player        Year  XBH  PA   XBH%
 1  Babe Ruth     1921  119  693  17.2%
 2  Chris Davis   2013   34  206  16.5%
 3  Albert Belle  1995  103  631  16.3%
 4  Lou Gehrig    1927  117  717  16.3%
 5  Barry Bonds   2001  107  664  16.1%
 6  Babe Ruth     1920   99  616  16.1%
 7  Jeff Bagwell  1994   73  479  15.2%
 8  Al Simmons    1930   93  611  15.2%
 9  Albert Belle  1994   73  480  15.2%
10  Todd Helton   2001  105  697  15.1%

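To make the arithmetic behind the table concrete, here is a minimal sketch (in Python) of the ratio being ranked; the numbers are taken straight from the table above.

```python
# Extra-base hits per plate appearance, the ratio ranked in the table.
def xbh_rate(xbh, pa):
    """Fraction of plate appearances ending in an extra-base hit."""
    return xbh / pa

# Davis's line as of Memorial Day 2013: 34 XBH in 206 PA.
print(f"{xbh_rate(34, 206):.1%}")   # 16.5%

# Ruth's 1921 season, the all-time leader: 119 XBH in 693 PA.
print(f"{xbh_rate(119, 693):.1%}")  # 17.2%
```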
You may have noticed that 30% of the players on this list are named either Al or Albert, but none of them are named Pujols.  None of them are named Miguel, either.  In fact, the closest the reigning American League Triple Crown winner has come to cracking this list was in 2010 with a 13.0% XBH-PA ratio, and as of this morning he sits well out of range in 2013 at 12.5%, despite his own empirically otherworldly start.

This is, without a doubt, a most exclusive list of a most consistently slugging nature.  It's enough to send pitchers into grand mal seizures at the very contemplation of it.  Or rather, it might, if they were even aware of it.  This data point has probably not yet been illuminated in quite this way—this here article is the closest I myself have found so far, and Davis is not even the star of the piece.  But that does not make the feat any less impressive.

This is not to say that Chris Davis is a better hitter than Miguel Cabrera, or Albert Pujols or Joey Votto or even Shin-Soo Choo, for that matter.  But even if this does turn out to be a world-class fluke season for him, Davis has a chance to crack an elite list inhabited only by the greatest of the great, even if he never knows it.


The Ten Lowest BABIPs Since 1945

For hitters, BABIP is often an explanation for unusually good or bad seasons. But what causes a great or poor BABIP? And are we right to simply blame BABIP whenever a bizarre season happens? It might help to look at some extreme cases. Even if we don’t learn something about how to interpret hitters’ BABIP, we can at least have fun. Nerdy, nerdy fun.

What is BABIP?

Batting average on balls in play is exactly that: when you hit the ball and it’s not a home run, what’s your batting average? Imagine you’d only ever batted twice; first you hit a single and then you struck out. Your BABIP would be 1.000. If a single and a groundout, .500. After seven games of the 2013 season, Rick Ankiel had two home runs but no singles, doubles, or triples, so his BABIP was .000.

Across any given season, the average BABIP tends to be about .300. All this means is that, when you hit the ball at professional defenders, there’s a 70% chance they’ll get you out.
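For readers who want the exact definition behind the toy examples above, here is the standard BABIP formula as a small sketch; the sacrifice-fly term is part of the usual formula, though the two-PA examples don't need it.

```python
def babip(h, hr, ab, k, sf=0):
    """Batting average on balls in play: hits that stayed in the park,
    divided by at-bats that put a ball in play (plus sacrifice flies)."""
    return (h - hr) / (ab - k - hr + sf)

# The two-PA example from the text: one single, one strikeout.
# The strikeout isn't a ball in play, so BABIP is 1.000.
print(babip(h=1, hr=0, ab=2, k=1))  # 1.0

# A single and a groundout: two balls in play, one hit.
print(babip(h=1, hr=0, ab=2, k=0))  # 0.5
```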

What influences BABIP?

The enemy. Defense and to some extent pitching are factors, but over the course of a full year, as you face the entire league, this averages out.

Power. If you hit twenty balls to the warning track, and a lot of them fall for hits, your BABIP will increase. But if they all carry right over the fence for home runs, they will stop counting for this purpose, meaning your BABIP will probably decrease since more of your hits will be excluded from the stat.

Hitting style. There are six defenders positioned in the infield (the four infielders plus the pitcher and catcher), so ground balls tend to get fielded; this is why pitchers, who are wimpy at hitting, tend to have low BABIPs. Fly balls are often caught, so the best scores go to line-drive hitters.

Speed. If you’re fast enough to beat throws and bunt for singles, your BABIP will be higher. If you run like I do, probably not so much.

Luck. Maybe the biggest single factor is: are you lucky? We all see hard-hit balls straight at defenders, or guys who go on “hot streaks” where the ball “finds all the holes.” That’s called “luck,” and BABIP can quantify it. Believe it or not, you really can have good or bad luck that lasts an entire year.

Let’s illustrate these principles by looking at some hitters with very low BABIPs.

The Ten Lowest BABIPs Since 1945

10. Roger Maris, 1961 (.209). 38.4% of Roger Maris’ hits that year were home runs. (Stop now to think about that.) If the ball stayed in the park, somebody probably caught it. On the other hand, if the ball had a chance of leaving the park, it did. 61 of them did.

9. Jim King, 1963 (.208). Although somewhat powerful (24 homers), Jim King was also something else: bad. His BABIP never came close to league average, and in partial seasons after ’63 it would be .207 and .209. He was known as a power-hitting bench bat, and only found regular playing time on the miserable Washington Senators (106 losses that year).

8. Dave Kingman, 1982 (.207). Dave Kingman hit homers (37) and struck out a whole lot, and based on his terrible, terrible fielding metrics, he was a mighty slow fellow. There’s also another factor here: he was old. “But he was only 33,” you say. “If there was something to this age thing, he’d get worse as he got even older.” “Aha,” I reply, “that’s why you’re supposed to keep reading!”

7. Dick McAuliffe, 1971 (.206). Here’s our first plausible “bad luck” guy. A career .264 BABIP, and indeed the following year he had a .264 BABIP. A career .247 hitter, and the following year he hit .240. A career .343 OBP, and the following year his OBP was .339. So Dick McAuliffe bounced back just fine, but it’s worth noting two things: first, a career .247 hitter is not that good, and second, for whatever reason his walk rate did decline sharply during his “unlucky” year. Was he swinging more aggressively? If so, he was still striking out less than usual.

6. Roy Cullenbine, 1947 (.206). I mentioned Roy Cullenbine in my first post on these venerable pages: a man who combined all-time bad luck with a truly incredible batting eye, walking 22.6% of the time despite being a distinctly non-intimidating hitter. The only guy in 1947 who walked more was Triple Crown winner Ted Williams, and Williams was frequently being walked on purpose. Cullenbine’s possibly all-time-great ability to take a walk was rewarded with–well, never playing in another major league game.

He did hit 24 homers, but this is another bad luck year. Heck, Cullenbine’s BABIP in 1946 was .347.

5. Dave Kingman, 1986 (.204). Toldja so! Here’s Kingman, age 37, hitting home runs (35) but nothing else. A full-time DH by now, he (like Cullenbine) never played in the big leagues again.

4. Brooks Robinson, 1975 (.204). Only six home runs to his name, still manning third base, Brooks Robinson is another example of what’s becoming a clear trend: he was 38 years old. He played partial seasons after this, but not full ones. This was a truly godawful year: .201/.267/.274, good for a wRC+ of 54.

3. Ted Simmons, 1981 (.200). A catcher and a fairly slow runner turning 32, Simmons saw a small drop in power, which he partially recovered the next year, and a 97-point drop in his BABIP, hard to explain just from the power outage. The traditional explanation for his poor 1981 is that he had just moved to Milwaukee and the American League. Luck might have hurt him, too.

2. Curt Blefary, 1968 (.198). Carson Cistulli previously highlighted Blefary on this site. After winning Rookie of the Year in 1965, the young outfielder posted two more above-average seasons before falling off a metaphorical cliff in 1968. He was being bounced around between positions, and he was never a speedster: his defense inspired the nicknames Clank and Buffalo.

Part of it must be bad luck. His BABIP, .045 below his career average, bounced back in 1969, when he moved to catcher and had a fairly good season for the Astros; a power decline turned out to be real, but his other numbers recovered. And yet Blefary would play his last major league game at age 29, moving on to a career as a “sheriff, bartender, truck driver, and night club owner.”

1. Aaron Hill, 2010 (.196). Aaron Hill’s notoriously lost season is the only one here from the last twenty-five years–and the most dramatic of all. Interestingly, a RotoGraphs article on Hill attributes his 2010 to pure awfulness but his recovery in 2011 to an “inflated” BABIP. But a .196 BABIP, a full hundred points below average, counts as deflated, right? Hill sucked in 2010 despite 26 homers and a slightly increased walk rate.

The advantage of recency is that we have more data. Here the culprit is obvious: he had previously been, and would soon be again, very good at hitting line drives, but in 2010 his line-drive percentage dropped by half (just 10.6%) and more than half of the balls he hit all year became fly balls. Some of those drifted out of the park, but most drifted over a waiting defender. And even though Hill was walking more, he was also swinging more frequently at pitches outside the strike zone. Hill’s new approach in 2010 didn’t hurt his ability to take a walk, but it hurt his ability to drive the ball. Still, to earn the lowest BABIP in modern history, he also suffered from an entire season of some of the worst luck any batter’s ever had.

Conclusion

The BABIP losers here didn’t do badly over their careers: combined, these “bottom 10” earned 42 All-Star appearances (18 by Brooks Robinson), 3 MVP awards, and a Rookie of the Year prize.

This unscientific survey confirms a lot of preconceived ideas:
– slower players don’t create their own luck on balls hit in fair territory
– aging players often lose their speed or power or both
– swinging at balls outside the strike zone means you make inferior contact
– sometimes, good luck isn’t enough to save a terrible hitter
– sometimes, terrible luck is enough to end a good hitter’s career

But there’s an interesting question to be raised here. Some of these guys–Maris, Kingman–hit homers like crazy, thus suppressing their BABIPs. On the other hand, Blefary and Simmons lost home run power in their hard-luck years. Simmons was playing in a new ballpark and Blefary at a new position. Maybe they were the Aaron Hills of their times, adjusting their approaches in deleterious ways (probably swinging at more pitches). Maybe they hit the ball poorly for unknown, reversible reasons. Maybe they had bad luck.

If I were counseling hitters on how to maximize their batting average on balls in play, I would say this: cultivate speed and athleticism, swing at better pitches, and try to hit line drives. I don’t know if BABIP can or should be learned, however. Ultimately, BABIP is the baseball version of a zen koan or hippie bumper sticker. BABIP: Stuff Happens. Or, more accurately, sometimes in baseball you make your own fate, but sometimes your fate makes you.


The True Dickey Effect

Most people who try to analyze this Dickey effect group all the pitchers who follow him into one pool, with one ERA, and compare it to the total ERA of the bullpen or rotation. This is a simplistic, non-descriptive way of analyzing the effect, and it ignores how often those same pitchers pitched when they were not following Dickey.

I decided to determine whether there truly is an effect on the statistics (ERA, WHIP, K%, BB%) of pitchers who follow Dickey in relief, and of the starters of the next game against the same team. I went through every game Dickey has pitched and recorded the stats (IP, TBF, H, ER, BB, K) of each reliever individually, and of the next starting pitcher if the next game was against the same team. I did this for each season.

I then took each pitcher's stats for the whole year and subtracted his following-Dickey stats, giving his stats when he did not follow Dickey. I summed the following-Dickey stats, weighting each pitcher by his share of the total batters faced after Dickey, and calculated the rate stats from those totals. The same weights were then applied to the not-after-Dickey stats: for example, if Francisco faced 19.11% of the batters after Dickey, his numbers were adjusted so that he also faced 19.11% of the batters not after Dickey. This makes the two pools directly comparable. The not-after-Dickey stats were then summed and their rate stats calculated as well. Finally, the two sets of rate stats were compared using the formula (afterDickeySTAT-notafterDickeySTAT)/notafterDickeySTAT, which gives the percentage by which relievers or starters did better or worse when following Dickey.

I then added the after-Dickey stats for starters and relievers from all three years, along with the not-after-Dickey stats, and applied the same weighting technique: if Niese '12 faced 10.9% of all batters faced by starters following a Dickey start against the same team, he was adjusted so that he also faced 10.9% of the batters faced by starters not after Dickey (counting only the starters who pitched after Dickey that season). The same comparison used in the year-by-year analysis was then applied to calculate a total percentage for each stat.
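The weighting scheme described above can be sketched roughly as follows; the pitcher entries and rate numbers here are made-up placeholders, not the actual data from the study.

```python
# Sketch of the TBF-weighted comparison described in the text.
# Each pitcher contributes an after-Dickey rate, a not-after-Dickey rate,
# and the batters he faced after Dickey. The not-after rates are reweighted
# by each pitcher's share of after-Dickey TBF so the two pools match.

def weighted_effect(pitchers):
    total_tbf = sum(p["tbf_after"] for p in pitchers)
    after = sum(p["rate_after"] * p["tbf_after"] / total_tbf
                for p in pitchers)
    not_after = sum(p["rate_not_after"] * p["tbf_after"] / total_tbf
                    for p in pitchers)
    # Negative means the stat improved (fell) after Dickey.
    return (after - not_after) / not_after

# Placeholder numbers, for illustration only:
sample = [
    {"rate_after": 3.20, "rate_not_after": 4.00, "tbf_after": 191},
    {"rate_after": 3.80, "rate_not_after": 4.20, "tbf_after": 120},
]
print(f"{weighted_effect(sample):+.1%}")  # -15.8%
```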

Here is the weighted year by year breakdown of the starters’ statistics following Dickey and a total (- indicates a decrease which is desired for all stats except K%):

2012:
ERA: -46.94%  with 5/5 starters seeing a decrease
WHIP: -16.16% with 4/5 seeing a decrease
K%: 47.04% with 4/5 seeing an increase
BB%: 6.50% with 3/5 seeing a decrease
HR%: -50.53% with 5/5 seeing a decrease
BABIP: -14.08% with 4/5 seeing a decrease
FIP: -25.17% with 5/5 seeing a decrease

2011:
ERA: 17.92%  with 0/3 seeing a decrease
WHIP: -9.63% with 2/3 seeing a decrease
K%: -2.64% with 2/3 seeing an increase
BB%: -15.94% with 2/3 seeing a decrease
HR%: -9.21% with 2/3 seeing a decrease
BABIP: -15.14% with 2/3 seeing a decrease
FIP: -5.58% with 2/3 seeing a decrease

2010:
ERA: -23.82%  with 5/7 seeing a decrease
WHIP: 1.68% with 5/7 seeing a decrease
K%: -22.91% with 1/7 seeing an increase
BB%: -2.34% with 5/7 seeing a decrease
HR%: -43.61% with 5/7 seeing a decrease
BABIP: -3.61% with 4/7 seeing a decrease
FIP: -10.61% with 5/7 seeing a decrease

Total:
ERA: -17.21%  with 10/15 seeing a decrease
WHIP: -8.10% with 11/15 seeing a decrease
K%: -3.38% with 7/15 seeing an increase
BB%: -5.17% with 10/15 seeing a decrease
HR%: -32.96% with 12/15 seeing a decrease
BABIP: -11.04% with 10/15 seeing a decrease
FIP: -13.34% with 12/15 seeing a decrease

So for starters that pitch in games following Dickey against the same team, it can be concluded that there is an effect on ERA, WHIP, BABIP, and FIP and a slight effect on BB% and on K%. There is also a large effect on HR rates which we can attribute the ERA effect to. This also tells us that batters are making worse contact the day after Dickey.

So a starter (like Morrow) who follows Dickey against the same team can expect to see around a 17.2% reduction in his ERA that game compared to if he was not following Dickey against the same opponent. For example if Morrow had a 3.00 ERA in games not after Dickey he can expect a 2.48 ERA in games after Dickey.

So in a full season where Morrow follows Dickey against the same team 66% of the time (games 2 and 3 of a series), in which he normally would have a 3.00 ERA without Dickey ahead of him, he could expect a 2.66 ERA for the season. This seems to be a significant improvement, and would equate to a 7.6-run difference (or 0.8 WAR) over 200 innings.
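The projection above can be reproduced in a few lines; the 17.2% reduction and 66% share are the figures from the text, while the 3.00 ERA is the hypothetical baseline.

```python
# Blend a pitcher's normal ERA with his reduced after-Dickey ERA,
# weighted by the share of starts that follow Dickey.
def blended_era(base_era, reduction=0.172, share_after=0.66):
    after = base_era * (1 - reduction)
    return share_after * after + (1 - share_after) * base_era

def runs_saved(base_era, innings=200.0, **kw):
    """Run difference between the normal and blended ERA over a workload."""
    return (base_era - blended_era(base_era, **kw)) / 9 * innings

print(round(blended_era(3.00), 2))  # 2.66
print(round(runs_saved(3.00), 1))   # 7.6 runs over 200 IP
```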

Here is a year by year breakdown of relievers after Dickey (these are smaller sample sizes so I will not include how many relievers saw an increase or decrease):

2012:
ERA: -25.51%
WHIP: -1.57%
K%: 27.04%
BB%: -49.25%
HR%: -34.66%
BABIP: 30.23%
FIP: -38.34%

2011:
ERA: -17.43%
WHIP: 8.45%
K%: 6.74%
BB%: -5.14%
HR%: 7.34%
BABIP: 9.75%
FIP: -2.05%

2010:
ERA: -2.55%
WHIP: 7.69%
K%: -9.28%
BB%: 10.84%
HR%: 2.11%
BABIP: 4.23%
FIP: 9.43%

Total:
ERA: -16.61%
WHIP: 5.38%
K%: 7.50%
BB%: -12.65%
HR%: -8.53%
BABIP: 13.38%
FIP: -10.40%

As expected there was a good effect on the relievers' ERA, FIP, K%, and BB%, but WHIP and BABIP were affected negatively. This tells me that batters were more free-swinging right after seeing Dickey (more hits, fewer walks, more strikeouts).

So in a season where there are 55 IP after Dickey in games (like in 2012) there would be a 16.6% reduction in runs given up in those 55 innings. If the bullpen’s ERA is 4.20 without Dickey it can be expected to be 3.50 after Dickey. Over 55 IP this difference would save 4.3 runs (or 0.4 WAR).

Combine this with the saved starter runs and you get 11.9 runs saved (or 1.2 WAR). This is the underlying value Dickey creates for his team by baffling hitters. That 1.2 WAR assumes Morrow has a 3.00 ERA normally and the bullpen a 4.20 ERA. If Morrow normally had a 4.00 ERA, then his ERA would fall to 3.54 over the season, saving 10.2 runs over 200 innings (1.0 WAR); and if the bullpen normally had a 4.00 ERA as well, 4.1 runs would be saved there, for a total of 14.3 runs saved, or 1.4 WAR, over a season.


Introducing BERA: Another ERA Estimator to Confuse You All

Coming up with BERA… like its [almost] namesake might say, it was 90% mental, and the other half was physical.  OK, maybe he’d say something more along the lines of “what the hell is this…” but that’s beside the point.    By BERA, I mean BABIP-estimating ERA (or something like that… maybe one of you can come up with something fancier).  It’s an ERA estimator that’s along the lines of SIERA, only it’s simpler, and—dare I say—better.

You know, I started out not knowing where I was going, so I was worried I might not get there.  As you may recall, I’ve been pondering pitcher BABIPs for a little while here (see article 1 and article 2), and whereas my focus thus far had been on explaining big-picture, long-term BABIP stuff in terms of batted ball data, one question that remained was how well this info could be used to predict future BABIPs.  After monkeying around with answering that question, though, I saw that SIERA’s BABIP component could be improved upon, so I set to work in coming up with BERA.  In doing so, I definitely piggybacked off of FIP and a little of what SIERA had already done.  You can observe a lot just by watching, you know.   I’m also a believer in “less is more” (except for when it comes to the size of my articles, obviously), so I tried to go for the best compromise of simplicity and accuracy that I could.

Read the rest of this entry »


BABIP and Innings Pitched (Plus, Explaining Popups)

In my last post on explaining pitchers’ BABIPs by way of their batted ball rates, I was very careful to say that it was applicable in the long run, as it’s hard to be accurate over a short number of innings pitched, due to all the “noise” in BABIP (Batting Average on Balls In Play).  I only used pitchers with a qualifying number of innings pitched (IP) in the calculations, for that reason.  After writing the post, I did some messing around with the data, to find out just how much of an effect IP had on the predictability of BABIP.

Hold on to your propeller beanies, fellow stat geeks: the correlation between xBABIP and BABIP went from 0.805 when the minimum IP was set to 1500, to 0.632 at a 200 IP minimum, down to 0.518 at 50 IP.  OK, maybe it’s not that surprising.  Still, I thought I’d better show you how confident you can be in my xBABIP formula’s accuracy when you take the pitcher’s innings pitched into account.

The formula, again: xBABIP = 0.4*LD% – 0.6*FB%*IFFB% + 0.235

And remember, that formula is primarily meant to be a backwards-looking estimator of “true,” defense-neutral BABIP.  My next article will (probably) discuss another formula I’ve come up with that’s more forward-looking.
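As a convenience, the formula above can be wrapped in a function; the input rates below are made-up values for illustration, and IFFB% is assumed to be measured as a share of fly balls, per the usual FanGraphs convention.

```python
# The xBABIP formula from the text. All rates are fractions
# (e.g. a 21% line-drive rate is passed as 0.21).
def xbabip(ld_pct, fb_pct, iffb_pct):
    """Backwards-looking, defense-neutral BABIP estimate."""
    return 0.4 * ld_pct - 0.6 * fb_pct * iffb_pct + 0.235

# Illustrative (not real) inputs: 20% line drives, 35% fly balls,
# 10% of those fly balls being infield popups.
print(round(xbabip(0.20, 0.35, 0.10), 3))  # 0.294
```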

Read the rest of this entry »


Projecting BABIP Using Batted Ball Data

Hi everybody, this is my first post here. Today, I’ll be sharing some of my BABIP research with you. There will probably be several more in the near future.

Now, I don’t know about you, but Voros McCracken’s famous thesis stating that pitchers have practically no control over their batting average on balls in play (BABIP) always seemed counterintuitive to me, ever since I heard it about 10 years ago. Basically, my thought this whole time was that if an Average Joe were pitching to an MLB lineup, the hitters would rarely be fooled by the pitches, and would be crushing most of them, making it very tough on the fielders. Think Home Run Derby (only with a lot more walks). Now, the worst MLB pitcher is a lot closer in ability to the best pitcher than he is to an Average Joe, but there still must be a spectrum amongst MLB pitchers relating to their BABIP, I figured. After crunching some numbers, I have to say that intuition hasn’t completely failed me.

This is going to be a long article, so if you want the main point right here, right now, it’s this: in the long run, about 40% or more of the difference in pitchers’ BABIPs can be explained by two factors that are independent of their team’s defense: how often batters hit infield fly balls and line drives off of them. It is more difficult to predict on a yearly basis, where I can only say that those factors can predict over 22% of the difference. Line drive rates are fairly inconsistent, but pop fly rates are among the more predictable pitching stats (about as much as K/BB). I’ll explain the formula at the very end of the article.
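As a rough illustration of the kind of fit described above (not the author's actual regression), here is how one might fit a two-factor linear model with NumPy; the five data points are invented placeholders.

```python
import numpy as np

# Made-up per-pitcher rates, standing in for real long-run data.
ld = np.array([0.18, 0.20, 0.22, 0.19, 0.21])          # line-drive rates
popup = np.array([0.040, 0.025, 0.015, 0.035, 0.020])  # infield-fly rates
babip = np.array([0.280, 0.295, 0.310, 0.285, 0.300])  # observed BABIPs

# Least-squares fit of babip ~ a*LD + b*popup + c.
X = np.column_stack([ld, popup, np.ones_like(ld)])
coefs, *_ = np.linalg.lstsq(X, babip, rcond=None)

# R^2 of the fit: the fraction of BABIP variance the two factors explain.
pred = X @ coefs
r = np.corrcoef(pred, babip)[0, 1]
print(coefs, round(r**2, 2))
```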

Read the rest of this entry »


On Zach Britton’s “Pitching to Contact” Comments

This past October, David Laurila conducted an interview with Zach Britton, the 23-year-old lefty who just finished up his rookie season with the Orioles. As a highly touted prospect, Britton didn’t put up impressive strikeout totals, but his groundball-inducing heavy sinker allowed him to enjoy much success in the minors. When Laurila asked Britton for his thoughts on his underwhelming major league 1.56 K/BB ratio, Britton responded as follows:

“I know that it could be better, obviously. I’m not going to be a guy who strikes out a ton of people; I’ll never lead the league in strikeouts. And with the movement I have, I’m going to walk guys. That’s something I can improve upon as I get older and more experienced, though. I can learn to make better adjustments… I pitch to contact. If I get a guy 0-2, I’m not necessarily looking to strike him out; I’m looking to get him to hit a ground ball. It’s a mindset. I’m not a huge believer in having to strike guys out in order to be successful. I’d rather keep my defense on their toes and get outs. Most times, when I strike guys out, it’s not on three or four pitches; it usually takes five, six or seven. Pitching to contact allows me to be more efficient.”

My first instinct was to be a bit skeptical of the effectiveness of this “mindset.”  Numerous studies have indicated that it is issuing walks, not striking batters out, that ultimately drives pitch counts up to the point of being “inefficient.” Yet in sabermetric analysis, it is not uncommon to find outliers in these aggregate models — some players simply don’t fit the mold of generally accepted principles. Britton, after all, ought to know his own tendencies better than anyone else.

Read the rest of this entry »


Plugging the Cardinals’ Shortstop Hole

It’s been nine months since the trade that brought Ryan Theriot to St. Louis, and the shortstop picture for the Cardinals is no clearer today than it was then. With their playoff hopes all but officially extinct, the prospect of another offseason spent looking for up-the-middle help looms large.

The trio of players who have garnered playing time at short for the Cards this season have been unimpressive, producing a combined 0.4 WAR in approximately a season’s worth of plate appearances. Theriot is an obvious non-tender candidate, while newly acquired Rafael Furcal will almost certainly have his $12 million option declined and become a free agent at the end of the season. This leaves the Cards with only Tyler Greene as an internal option, and the free-agent market for shortstops is thin (the obvious exception being Jose Reyes, whom the Cardinals have almost no hope of signing if they expect to keep Chris Carpenter and/or Albert Pujols). While the Cardinals will likely either give Greene a shot to hold down the job or pick up another bargain during free agency, I’d like to propose that the Cardinals consider a radical alternative that could provide the team with a definitive edge: Albert Pujols.

Read the rest of this entry »