Learning a Lesson From Basketball Analytics

I read an interesting article here by Brian Woolley which attempted to adjust batter performance for the quality of pitching they face. It’s interesting because we tend to assume when we look at a player’s performance that they faced more or less the same quality of competition as everyone else, despite the fact we know, especially in small samples, that may not be the case. This is even more evident when we look at minor league performance, where the quality of competition can vary wildly from one prospect to another. How can we discard the assumption of equal quality of competition and try to get a more accurate picture of a player’s performance? In basketball analytics, this quality of competition piece is an even more pronounced issue because of the fact that players are selected to play in specific situations by a coach, unlike the lineup card which dictates when everyone bats.

There is a metric in basketball called Regularized Adjusted Plus-Minus (RAPM) which attempts to value individual players based on their contribution to the outcome of a game while accounting for the quality of the teammates and opponents when on the court. The initial idea in the public sphere came from Dan T. Rosenbaum in a 2004 article detailing Adjusted Plus-Minus (APM). You can read more about the basketball variant in the linked article, but I’ll describe how I adapted it to a baseball context.

To setup the system, I created a linear regression model which takes each player as an independent variable, every plate appearance as an observation, and the outcome of the plate appearance as the dependent variable we’re trying to predict. Specifically, if a player is not part of a plate appearance, the value for their independent variable is a 0, if they are the pitcher, they are a -1, and if they are the batter, they are a 1. Note that for players who appear as both a pitcher and hitter, they are given two independent variables so we can measure their impact on both sides of the ball separately. The outcome of the plate appearance is defined in terms of weighted On-Base Average (wOBA).

However, the wOBA value of a plate appearance is not just dependent upon the pitcher and the hitter, but also the fielders and the ballpark. We could try to include more independent variables, but that would mean creating another independent variable for each player who played the field, which would more or less double our already large model. We could even try creating a unique independent variable for every player and for each position they played to be more accurate, but that is simply infeasible for my computational power. Creating a ballpark variable would be much more reasonable, but I think that without the fielders, the model doesn’t really pass muster.

Instead I took the role of fielding out of the dependent variable. In the case where the ball was put in play, I used the expected wOBA (xwOBA) based on the launch angle and exit velocity created by Baseball Savant, and when not in play, the ordinary wOBA value. I ran this regression using data from the entire 2020 regular season and then again separately with the entire 2019 season.

The scores for the individual players are the estimates of the coefficients in the model, where high coefficients for batters imply that they are able to increase the dependent variable and thus they are a good hitter while accounting for their opponent. For a pitcher, a high estimate implies that they are able to suppress the dependent variable and again means they are a good pitcher while accounting for opponent.

Notice that I keep saying independent variables when talking about the players who face each other in a plate appearance. This is an assumption of linear regression, but we are clearly violating it because players are not randomly assigned to face each other as batters or pitchers. Instead, we have players who are specialized as one or the other organized into teams which play games more frequently against other teams in their division or league, so this is not independence. All this works to create a model with very large variation on its estimates for coefficients. To combat this, we use a regularization technique called ridge regression, which acknowledges the lack of independence between variables as what’s known as multicollinearity and introduces bias to the estimates in order to reduce the variation and ultimately reduce the total error. You can read more about ridge regression here, but I have Jeremias Engelmann to thank for his work introducing it into basketball analytics.

Here’s a link to the code on my GitHub and the complete list of scores derived from the model for 2019 and 2020 split by batter and pitcher. One more thing to note is that scores in small samples can be very misleading, which is obviously true of any statistic. In the face of this, I made a couple of versions of the statistic. Since the model outputs very small numbers for the estimates of coefficients, I scaled them in order to make 100 the league average and 0 the floor for anyone with multiple plate appearances. This number will be the raw RAPM, and it correlates very nicely with statistics which we commonly use to measure performance. Splitting batters for the 2019 and 2020 campaigns gives us 1,564 batter seasons, and the raw RAPM correlates with wOBA for that player season at .852. Doing the same for pitchers gives us 1,561 pitcher seasons and the raw RAPM correlates with FIP- at -.65, which implies that good players score low in FIP- and high in raw RAPM, as expected.

The second metric I derived was what I deem to be RAPM, which regresses raw RAPM back to the mean in proportion to how many plate appearances a player had. The weighting function which does this regression is arbitrary but should follow the principle that an additional plate appearance to a player with a low number of plate appearances should tell us more than an additional plate appearance to a player with many plate appearances already. I went with a log function on plate appearances which follows this pattern.

On the y-axis is the proportion of the raw RAPM, which is not regressed to the mean as compared to the x-axis, which is number of plate appearances. In total, the RAPM score is a weighted sum of the proportion not regressed times the raw RAPM and one minus that proportion times the league average raw RAPM.

RAPM = proportion * rawRAPM + (1-proportion) * league average rawRAPM

With that, we can produce a leaderboard of top performers by our new opponent quality weighted metric for both pitchers and batters. Here are the pitchers first:

RAPM Pitcher Leaderboard
Season Pitcher PA Proportion Raw RAPM RAPM
2019 Gerrit Cole 957 80.6% 123.85 119.27
2019 Liam Hendriks 330 68.1% 127.88 119.05
2019 Kirby Yates 242 64.5% 127.52 117.81
2019 Seth Lugo 310 67.4% 125.26 117.09
2019 Jacob deGrom 805 78.6% 120.48 116.14
2019 Josh Hader 293 66.7% 123.79 115.93
2019 Emilio Pagán 283 66.3% 123.89 115.91
2019 Justin Verlander 994 81.1% 119.39 115.76
2019 Tyler Glasnow 263 65.5% 123.12 115.20
2019 J.B. Wendelken 129 57.1% 126.43 115.17
2019 Aaron Bummer 261 65.4% 122.71 114.91
2019 Jordan Hicks 110 55.2% 126.86 114.91
2019 Tyler Rogers 71 50.1% 129.57 114.90
2019 Max Scherzer 815 78.8% 118.58 114.67
2019 Ken Giles 208 62.7% 123.03 114.51
2020 Devin Williams 200 62.3% 122.94 114.28
2019 Taylor Rogers 283 66.3% 121.21 114.13
2019 Roberto Osuna 295 66.8% 120.93 114.04
2020 Trevor Bauer 608 75.3% 118.16 113.68
2019 Cole Sulser 29 39.6% 134.22 113.64

You’ll notice that almost all of these come from the 2019 season because of the weighting function for regressing players to the mean. You’ll also note a large number of relievers. It may be that relievers and starters should be viewed separately or perhaps that the weighting function should be less logarithmic and down-weight small sample sizes even more. Regardless, I think this is a fair leaderboard and captures some of the best seasons, including Gerrit Cole, Jacob deGrom, Justin Verlander, and Max Scherzer. Now the batters:

RAPM Batter Leaderboard
Season Batter PA Proportion Raw RAPM RAPM
2019 Mike Trout 586 77.3% 181.43 161.70
2019 Cody Bellinger 661 78.7% 169.74 153.76
2019 Christian Yelich 564 76.8% 168.12 151.07
2019 Anthony Rendon 711 79.6% 164.78 150.48
2019 Juan Soto 729 79.9% 164.49 150.46
2019 Mookie Betts 702 79.5% 163.83 149.62
2019 Nelson Cruz 529 76.0% 166.30 149.12
2019 Howie Kendrick 434 73.6% 166.47 147.52
2019 J.D. Martinez 649 78.5% 161.84 147.39
2019 Freddie Freeman 704 79.5% 160.17 146.73
2019 Aaron Judge 485 75.0% 162.42 145.45
2019 Jorge Soler 677 79.0% 158.64 145.21
2020 Freddie Freeman 528 76.0% 158.58 144.52
2019 George Springer 640 78.3% 157.74 144.06
2019 Ronald Acuña Jr. 736 80.0% 155.85 143.62
2019 Yordan Alvarez 429 73.5% 161.04 143.43
2019 Anthony Rizzo 610 77.8% 157.25 143.32
2019 Bryce Harper 672 78.9% 155.27 142.49
2019 Max Muncy 609 77.7% 155.70 142.10
2019 Josh Donaldson 679 79.1% 154.39 141.87

When creating a batting metric in a completely new way, there’s a fear that the methodology might make sense but the results just don’t accord with reality. However, it looks like we passed the ultimate reality check and that is that Mike Trout is indeed the best hitter in baseball. So does this method actually do what it is setting out to do? Is RAPM really a function of just how good a player is and not the quality of their competition? To investigate this, I calculated the average opponents’ FIP- for batters and wOBA for pitchers and used them to try to predict the RAPM along with their own wOBA and FIP-, respectively. Unsurprisingly, the individuals’ own performance statistic was highly significant in trying to predict RAPM in a linear regression, but the opponent quality story is more mixed.

For pitchers, looking at the full dataset, the p-value of average opponent wOBA is 0.006 and has a negative coefficient, which implies that pitchers with weaker opponents are still rated more highly. However, despite regressing RAPM scores to the mean, there still may be reason to remove players from the dataset who have a low number of plate appearances, as these estimates may be poor. Here is the data on how different cutoff points for plate appearances affect the significance level of opponent performance.

What is the optimal cutoff point to balance the validity of the observation and the number of observations in the dataset? I think it’s some relatively low number since we did already regress our scores to the mean, but no exact number is obvious. Since the methodology makes sense in accounting for opponent performance, and a high percentage of the plate appearance cutoff points fail to show the significance of opponent performance to predict RAPM, I think I’ve done a good job creating a metric for pitchers which accounts for opponent performance.

Interestingly, we see the inverse picture with batters. The regression on the whole dataset of hitters has opponent average FIP- p-value for predicting RAPM at .9, which suggests that there’s no evidence to say RAPM is a function of opponent quality which is what we desire. However, by moving the plate appearance cut offs for batters, we see a different story.

This one seems much clearer in that a reasonable plate appearance cut off would imply that opponent performance is a factor in predicting batter RAPM. Moreover, since the coefficients on these regressions are always positive, it suggests that when a batter faces pitchers with a higher FIP- on average, i.e. worse pitchers, their RAPM goes up as well. Perhaps this method is accounting for opponent performance but not heavily enough to account for how important it is for the outcomes of hitters.

Despite the shortcoming in this respect, is the metric still useful at predicting future outcomes? To test this, I compared how well RAPM in 2019 correlated with a player’s 2020 wOBA and FIP- for individual batters and pitchers respectively. I then compared these correlations to the ability of the actual 2019 wOBA and FIP- figures themselves at predicting the 2020 scores to see if the RAPM can more accurately reflect a player’s future than the unadjusted statistics.

For pitchers, a player’s 2019 FIP- correlated with their 2020 FIP- at only around .04 while the 2019 RAPM correlates with 2020 FIP- at -.13, which again means that good pitchers are good in both metrics. This implies RAPM is a better future predictor of FIP- than FIP- itself by about nine points of correlation. Note that these numbers can also be adjusted for how many plate appearances a player has to have in both seasons to be in the data. Here is the difference between the two correlations, where negative values imply that RAPM is a better predictor of FIP-.

For every plate appearance cutoff up to 200 in both seasons, RAPM does indeed perform better, and if you take the average over all cutoff points, RAPM performs on average of three points of correlation better.

When it comes to batters on the full dataset, 2019 wOBA correlates with 2020 wOBA for individuals at a rate of .288, and 2019 RAPM correlates with 2020 wOBA at .372. In this case, the performance gap is about the same, but here is the view for different cutoff points, where positive implies RAPM performs better than wOBA.

This one is even more clear that RAPM was a better future predictor of wOBA than wOBA itself, where the median difference across all cutoff points is 10 points of correlation.

All in all, RAPM does appear to be a very useful metric for understanding player performance while attempting to account for the quality of their opponent. How effective this method is at precisely accounting opponent performance especially for batters is less clear. Regardless of this fact, it certainly is doing something which allows it to strip out some noise in performance demonstrated by its increased ability to predict the future of a player over less nuanced methods.

And all this from basketball analytics. While baseball may have been the first to the party in terms of using advanced statistical methods to understand the game, there are many sports analytics communities which can inform our research in 2021. Understanding more intricate and fast-moving games is a challenge for other sports but one that many have caught on, which has allowed them to implement really complex yet elegant solutions. Ultimately, I recommend branching out, as you may find that you’ll learn a little bit even about the game you thought you weren’t thinking about.

newest oldest most voted
Green Mountain Boy
Green Mountain Boy

Excellent and informative work! I have a couple of questions which are kind of related to each other:

1. How does recent individual performance factor in? In other words, how does being on a hot or cold streak affect the numbers, if at all?
2. How sticky are the numbers year to year? Or, put another way, how many individual plate appearances (or batters faced) before one can confidently assert the true skill level of any player?