Looking at bWAR’s Defensive Adjustment for Pitcher WAR
In 2018, the Philadelphia Phillies defense combined for -93 DRS, costing the team over half a run per game compared to an average defense. Though the pitching staff combined for a 3.83 FIP (seventh in MLB), defensive mistakes led to a middling 4.49 runs against per game. Much of the rotation underperformed its peripheral statistics, with Nick Pivetta and Vince Velasquez each posting an ERA that exceeded his FIP by more than a run. Against this backdrop, Aaron Nola’s performance was miraculous: a 2.37 ERA (fourth in MLB) over 212.1 innings pitched. Nola’s elite run prevention despite Philadelphia’s historically bad defense resulted in 10.2 bWAR, the second-best single season among all currently active pitchers.
According to FanGraphs, however, Nola’s performance was more All-Star-worthy than historic, as his 3.01 FIP notched him just 5.5 fWAR. Perhaps this case merely illustrates why one should opt for fWAR over bWAR and FIP over ERA; however, I don’t believe that bWAR is beyond salvaging, and philosophically I believe that to assess a pitcher’s value to a team we must examine the entirety of his work rather than just balls not in play. What this case illustrates is the need to rethink bWAR’s defensive component, which assumes that all pitchers are equally affected by a team’s good or bad defensive performance. This is obviously not the case. Defense is capricious, and a few bad or great plays behind a pitcher can cost or save several runs.
Thanks to Statcast’s Outs Above Average, which allows us to isolate a defense’s performance behind a specific pitcher, we can develop a new defensive adjustment that avoids outlier performances like Nola’s while not simply ignoring quality of contact on balls in play.
A Primer on Baseball-Reference’s Pitching WAR
Wins Above Replacement is a black box to many fans, but the calculations that go into Baseball-Reference’s WAR for pitchers are straightforward. First, one calculates how many runs per game an average pitcher would give up in the same situation as the given pitcher:
RA9avg = PPFp/100 * (oppRA9 – RA9def + RA9role).
PPFp is the three-year park factor for the given pitcher weighted by batters faced in each park. oppRA9 adjusts for the quality of competition by estimating how many runs opposing batters average per nine innings. RA9role adjusts for whether a pitcher is a starter or reliever, as relievers are generally more dominant.
The most important component here for this analysis is RA9def. To find this number, a team’s defensive runs saved (DRS) is divided by the number of team innings then scaled to give the number of runs saved per nine innings. A positive RA9def indicates an above-average defense; a negative RA9def indicates a below-average defense. This component assumes that defensive performance is uniform among a team’s pitchers, which again clearly is not the case and is what this article attempts to address.
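As a minimal sketch of the scaling described above, the team-wide defensive term can be computed as follows. The function name and the sample inputs are hypothetical and chosen only for illustration; they are not Baseball-Reference's code or actual team totals.

```python
def ra9_def(team_drs: float, team_innings: float) -> float:
    """Team defensive runs saved per nine innings.

    Positive values indicate an above-average defense,
    negative values a below-average one.
    """
    if team_innings <= 0:
        raise ValueError("team_innings must be positive")
    return team_drs / team_innings * 9.0


# Hypothetical inputs: a team with -90 DRS over 1,440 innings.
print(ra9_def(-90, 1440))  # -0.5625
```

Because the only inputs are team totals, every pitcher on the staff receives the same value, which is the uniformity assumption questioned in this article.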
We now have an estimate of how an average starter or reliever would perform against a given set of opponents within a given offensive environment. We then compare the pitcher’s performance to this theoretical average pitcher’s performance over the number of innings pitched to get his Runs Above Average:
RAA = IP*(RA9avg – RA9)/9.
There are also adjustments for the new extra innings rule and high-leverage situations for relievers, but this is the main calculation. From here, we convert RAA to Wins Above Average through a “Pythagorean” formula called PythagenPat, which generally shows a run to be worth 1/10 to 1/8 of a win. Here I will set the value of a run to .11 wins, which is close to the true value this season. Finally, we add the value of an average player over a replacement player (around two wins over a full season) to get the player’s WAR.
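The main calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Baseball-Reference's implementation: the function names are my own, the flat 0.11 wins per run stands in for PythagenPat as in the text, the extra-innings, leverage, and replacement-level steps are left out, and the sample inputs are hypothetical. Innings are expected as true fractions (212.1 in box-score notation means 212 and one third).

```python
RUNS_TO_WINS = 0.11  # flat approximation used in this article, not PythagenPat


def ra9_avg(ppf: float, opp_ra9: float, ra9_def: float, ra9_role: float) -> float:
    """Runs per nine innings an average pitcher would allow in the same context."""
    return ppf / 100.0 * (opp_ra9 - ra9_def + ra9_role)


def runs_above_average(ip: float, ra9_avg_value: float, ra9: float) -> float:
    """RAA = IP * (RA9avg - RA9) / 9."""
    return ip * (ra9_avg_value - ra9) / 9.0


def wins_above_average(raa: float, runs_to_wins: float = RUNS_TO_WINS) -> float:
    """Convert runs to wins with a constant run value."""
    return raa * runs_to_wins


# Hypothetical pitcher: neutral park, 4.50 opponent RA9, a defense worth
# -0.50 runs per nine, no role adjustment, 180 IP at a 3.00 RA9.
avg = ra9_avg(ppf=100, opp_ra9=4.50, ra9_def=-0.50, ra9_role=0.0)
raa = runs_above_average(ip=180, ra9_avg_value=avg, ra9=3.00)
print(avg, raa, round(wins_above_average(raa), 2))  # 5.0 40.0 4.4
```

Note how the defensive term enters with a minus sign: a poor defense (negative RA9def) raises the average pitcher's expected runs allowed and therefore raises the actual pitcher's RAA.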
Fixing the Defensive Adjustment
In Nola’s case, the defensive adjustment vastly inflated his bWAR in 2018. The Phillies defense cost its pitchers an average of 0.61 runs per game, but Nola allowed only a .251 BABIP (tied for fourth-lowest in baseball), suggesting that the Phillies defense made far fewer mistakes behind Nola than behind the pitching staff as a whole.
Unfortunately, we do not have access to the number of defensive runs saved while a pitcher is on the mound; however, Statcast allows one to filter its Outs Above Average metric for a given pitcher to see how the defense actually performed behind him. OAA estimates the percentage of balls hit toward a player that would be converted into outs by an average fielder and compares this to the percentage actually converted. We can accumulate these results for all fielders in all innings when the pitcher is on the mound to estimate the number of outs saved by the defense for a given pitcher. Not all outs are equally valuable, but Statcast handily estimates the fielding runs prevented (FRP) from these outs. Replacing DRS with FRP in the pitcher bWAR calculation is straightforward: we simply replace the RA9def component with FRP scaled to nine innings.
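A rough sketch of the substitution, assuming the RA9avg formula given earlier: because the defensive term enters RA9avg with a minus sign, swapping it changes RA9avg by PPFp/100 times the difference between the old and new terms. The function names and sample numbers below are hypothetical.

```python
def ra9_def_from_frp(frp: float, pitcher_ip: float) -> float:
    """Fielding runs prevented behind one pitcher, scaled to nine innings."""
    if pitcher_ip <= 0:
        raise ValueError("pitcher_ip must be positive")
    return frp / pitcher_ip * 9.0


def swap_defense_term(ra9_avg_old: float, ra9_def_team: float,
                      ra9_def_pitcher: float, ppf: float = 100.0) -> float:
    """RA9avg after replacing the team-wide defense term with a pitcher-specific one."""
    return ra9_avg_old + ppf / 100.0 * (ra9_def_team - ra9_def_pitcher)


# Hypothetical case: the team defense rates at -0.50 runs per nine, but the
# fielders prevented 5 runs in this pitcher's 100 innings.
pitcher_term = ra9_def_from_frp(frp=5, pitcher_ip=100)
new_avg = swap_defense_term(ra9_avg_old=5.00, ra9_def_team=-0.50,
                            ra9_def_pitcher=pitcher_term)
print(round(pitcher_term, 2), round(new_avg, 2))  # 0.45 4.05
```

In this made-up case the average-pitcher baseline drops from 5.00 to 4.05 runs per nine, so the pitcher gets less credit once the defense's actual work behind him is taken into account.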
Results (2021)
The following tables display the largest improvements and declines in bWAR when substituting FRP for DRS for the first half of the 2021 MLB season. DRS adjustment refers to team DRS prorated to the player’s innings to be on the same scale as FRP. Please note that some of these results may be off by a rounding error, as I am pulling numbers from publicly available tables on Baseball-Reference.
Player | DRS Adjustment | Fielding Runs Prevented | fWAR | Old bWAR | New bWAR | Change in bWAR |
---|---|---|---|---|---|---|
Germán Márquez | 6.62 | 0 | 2.8 | 2.8 | 3.6 | 0.8 |
Adrian Houser | 4.14 | -3 | 0.5 | 0.5 | 1.3 | 0.8 |
Antonio Senzatela | 7.13 | 1 | 1.7 | 0.1 | 0.9 | 0.8 |
Nathan Eovaldi | 2.53 | -4 | 3.4 | 2.5 | 3.3 | 0.8 |
Jon Lester | 3.80 | -4 | -0.1 | -1.4 | -0.6 | 0.8 |
Chi Chi González | 6.17 | 0 | 0.3 | -0.2 | 0.6 | 0.8 |
Ryan Yarbrough | 7.92 | 1 | 1.0 | -0.4 | 0.3 | 0.7 |
Brett Anderson | 2.89 | -3 | 0.3 | -0.1 | 0.6 | 0.7 |
Yu Darvish | 2.10 | -4 | 2.6 | 2.1 | 2.7 | 0.6 |
Chris Flexen | 1.85 | -4 | 1.9 | 1.4 | 2.0 | 0.6 |
No pitcher has been more unfairly penalized this season by using team DRS than Germán Márquez, who has been nearly a full win better this season than bWAR indicates. While the Rockies have been the second-best defense in MLB by DRS, they have been merely average with Márquez on the mound according to Outs Above Average. The story is similar for the rest of the group. Each pitcher plays in front of an above-average defense by DRS, but none of the pitchers has received a commensurate benefit while on the mound.
Player | DRS Adjustment | Fielding Runs Prevented | fWAR | Old bWAR | New bWAR | Change in bWAR |
---|---|---|---|---|---|---|
Zack Wheeler | -5.45 | 5 | 4.5 | 4.8 | 3.7 | -1.1 |
Yusei Kikuchi | 1.64 | 8 | 1.0 | 2.1 | 1.4 | -0.7 |
Shohei Ohtani | -1.19 | 4 | 1.5 | 1.9 | 1.3 | -0.6 |
Adam Wainwright | 1.29 | 7 | 1.5 | 1.4 | 0.8 | -0.6 |
Aaron Nola | -5.07 | 0 | 2.3 | 1.3 | 0.7 | -0.6 |
Jameson Taillon | -1.65 | 3 | 0.9 | 0.8 | 0.3 | -0.5 |
Casey Mize | -4.34 | 0 | 0.9 | 2.8 | 2.3 | -0.5 |
José Ureña | -4.19 | 0 | 0.0 | -0.7 | -1.2 | -0.5 |
Spencer Turnbull | -2.22 | 2 | 1.5 | 1.4 | 0.9 | -0.5 |
JT Brubaker | 0.10 | 4 | 0.6 | 0.9 | 0.5 | -0.4 |
As in 2018, Nola has not been hurt by his team’s poor defense; however, his teammate Zack Wheeler is by far the biggest beneficiary of Baseball-Reference’s use of DRS. While bWAR assumes that the Phillies’ defense has been terrible behind Wheeler, in reality it has been well above average, resulting in more than a full extra win of value attributed to Wheeler.
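Wheeler's row can be roughly checked by hand. Since the DRS adjustment and FRP are on the same run scale, the change in bWAR should be approximately the gap between them multiplied by the run value of .11 wins. The sketch below is my own back-of-the-envelope approximation: the function name is hypothetical, and the park factor is assumed neutral because it does not appear in the table.

```python
RUNS_TO_WINS = 0.11  # flat run value used in this article


def war_change(drs_adjustment: float, frp: float,
               runs_to_wins: float = RUNS_TO_WINS, ppf: float = 100.0) -> float:
    """Approximate change in bWAR when FRP replaces the prorated team DRS."""
    return ppf / 100.0 * (drs_adjustment - frp) * runs_to_wins


# Zack Wheeler, first half of 2021: DRS adjustment -5.45, FRP 5.
print(round(war_change(-5.45, 5), 2))  # -1.15
```

The result lands close to the -1.1 shown in the table; the remaining gap is within what the park factor and rounding in the published inputs could account for.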
With these adjustments, we can calculate our new OAA-adjusted bWAR leaderboard for 2021.
Player | fWAR | Old bWAR | New bWAR |
---|---|---|---|
Kevin Gausman | 3.4 | 4.8 | 4.6 |
Brandon Woodruff | 3.3 | 4.4 | 4.6 |
Wade Miley | 2.4 | 4.2 | 4.4 |
Jacob deGrom | 4.8 | 4.4 | 4.4 |
Gerrit Cole | 3.3 | 4.3 | 4.1 |
Kyle Gibson | 2.0 | 4.2 | 4.0 |
Robbie Ray | 1.5 | 3.3 | 3.8 |
Zack Wheeler | 4.5 | 4.8 | 3.7 |
Germán Márquez | 2.8 | 2.8 | 3.6 |
Lance Lynn | 2.5 | 3.4 | 3.4 |
Results (2019-2021)
Statcast provides statistics on OAA and FRP for pitchers for each season since 2019, so we can aggregate these statistics to see who is affected the most by tailoring bWAR’s fielding adjustment to a defense’s actual performance behind a pitcher.
Player | DRS Adjustment | Fielding Runs Prevented | fWAR | Old bWAR | New bWAR | Change in bWAR |
---|---|---|---|---|---|---|
Walker Buehler | 20.94 | -4 | 7.8 | 5.4 | 7.9 | 2.5 |
Trevor Bauer | 15.20 | -6 | 7.3 | 7.1 | 9.4 | 2.3 |
Clayton Kershaw | 21.36 | -2 | 7.6 | 6.5 | 8.8 | 2.3 |
Shane Bieber | 13.93 | -3 | 11.1 | 10.8 | 12.7 | 1.9 |
Robbie Ray | 10.00 | -6 | 3.5 | 4.5 | 6.2 | 1.7 |
Zack Greinke | 22.40 | 6 | 9.1 | 8.5 | 10.2 | 1.7 |
Wade Miley | 10.54 | -5 | 4.4 | 5.7 | 7.4 | 1.7 |
Julio Urías | 13.72 | 0 | 4.9 | 3.3 | 4.7 | 1.4 |
Adam Wainwright | 16.33 | 3 | 4.8 | 3.3 | 4.6 | 1.3 |
Ross Stripling | 9.17 | -4 | 1.7 | 1.5 | 2.8 | 1.3 |
Since the start of 2019, Walker Buehler has been the most undervalued pitcher by bWAR, as the Dodgers defense has prevented about 25 fewer runs behind him than bWAR implies. Adjusting bWAR for OAA puts his valuation in line with that of fWAR as one of the 20 most valuable pitchers in baseball. Of the 10 pitchers in this leaderboard, half have pitched at least in part for the Dodgers. This is unsurprising, as the Dodgers rank second in baseball in DRS since 2019 but are 15th in FRP at an even 0. The goal here is merely to isolate performance behind a pitcher, but large differences between DRS and FRP for a team may result in significant changes as well.
Player | DRS Adjustment | Fielding Runs Prevented | fWAR | Old bWAR | New bWAR | Change in bWAR |
---|---|---|---|---|---|---|
Zack Wheeler | -21.43 | 4 | 11.2 | 11.6 | 9.0 | -2.6 |
Yusei Kikuchi | -7.55 | 11 | 2.4 | 2.7 | 0.8 | -1.9 |
Jacob deGrom | -9.00 | 8 | 14.4 | 15.1 | 13.4 | -1.7 |
Lucas Giolito | -4.30 | 8 | 9.1 | 7.7 | 6.4 | -1.3 |
Lance Lynn | -4.20 | 6 | 10.5 | 13.1 | 11.9 | -1.2 |
Spencer Turnbull | -10.36 | 0 | 5.8 | 4.7 | 3.6 | -1.1 |
Erick Fedde | -1.58 | 9 | 0.5 | 2.3 | 1.2 | -1.1 |
Tyler Chatwood | -0.12 | 9 | 1.1 | 1.2 | 0.2 | -1.0 |
John Means | -9.53 | -1 | 4.8 | 9.0 | 8.1 | -0.9 |
Junior Guerra | 1.78 | 10 | 0.1 | 2.3 | 1.4 | -0.9 |
For some reason, Philadelphia’s poor defense never seems to be a problem behind Wheeler, who has garnered an extra 2.6 bWAR as a result. I do not have a good explanation for why Philadelphia has performed so much better behind Wheeler than it has on average. Both DRS and OAA rate Philadelphia as one of the worst defenses in baseball since the start of 2019. Wheeler earns a high percentage of ground balls, but Philadelphia’s infield defense (-29 FRP since 2019) has been far worse than its outfield defense (-4 FRP since 2019). Perhaps a deeper dive would better illuminate what is driving this disparity, but it may just be random variation.
With these adjustments, we can calculate our new OAA-adjusted bWAR leaderboard for 2019-2021.
Player | fWAR | Old bWAR | New bWAR |
---|---|---|---|
Gerrit Cole | 12.1 | 13.2 | 13.6 |
Jacob deGrom | 14.4 | 15.1 | 13.4 |
Shane Bieber | 11.1 | 10.8 | 12.7 |
Lance Lynn | 10.5 | 13.1 | 11.9 |
Zack Greinke | 9.1 | 8.5 | 10.2 |
Max Scherzer | 10.8 | 9.6 | 9.8 |
Brandon Woodruff | 8.6 | 9.5 | 9.7 |
Trevor Bauer | 7.3 | 7.1 | 9.4 |
Sonny Gray | 7.5 | 9.2 | 9.4 |
Zack Wheeler | 11.2 | 11.6 | 9.0 |
Conclusions
OAA provides a powerful tool to lend more legitimacy to RA9-based evaluation of pitchers. This OAA-adjusted bWAR is intended to estimate a pitcher’s value retrospectively. There is an element of luck involved in OAA-adjusted bWAR that is perhaps less reflective of true talent than fWAR, but we are only attempting to measure value to a team in a given year. In the short term, we should not expect it to be more predictive of a pitcher’s future value than FanGraphs’ FIP-based WAR, as of course FIP is more predictive of ERA than ERA itself in smaller samples.
Unfortunately, Outs Above Average is only available for pitchers from 2019 onwards, so we cannot accurately assess Nola’s 2018 season with this method — but perhaps by using OAA we can avoid such outlier valuations in the future.
BREF Pitcher WAR is based on a framework that compares what the pitcher did vs. an Average Pitcher in that league within the same environment (same competition, same park, same fielders behind him). What you are doing is isolating the performance of those fielders to what they did when the pitcher was on the mound, but that is not necessarily what would happen if an Average Pitcher were on the mound. The only issue I have with the BREF defense adjustment is that it should prorate the adjustment to the average Balls in Play of an Average Pitcher, not the Balls in Play of the Actual Pitcher. If you don’t do this, you are comparing a high-strikeout pitcher who would have lower BIP to an Average Pitcher with the same lower BIP. My adjustment would use the Average BIP distribution (flyballs, linedrives and groundouts) and apply the actual Statcast OAA to that average distribution. Then you will have a true comparison to the Average Pitcher in the same environment.
I agree that the bbref defensive adjustment should prorate to balls in play (though obviously for the OAA-based adjustment here that isn’t necessary). I don’t think that what you are suggesting is any closer to what would happen when an average pitcher is on the mound. An average pitcher would not give up the same batted ball profile. What I am more interested in is isolating the performance of the pitcher from the defense’s performance on his balls in play. By using OAA, we can see how much credit to assign to the pitcher vs. a fielder for outs. Obviously this incorporates some luck, but I’m less interested in true talent level than past outcomes.