Adjusting Batter Performance by the Quality of the Opposing Pitcher

February 12, 2021

In the 2020 season, American League MVP José Abreu faced 107 different pitchers, including the top four in Cy Young voting point totals (Shane Bieber, Trevor Bauer, Yu Darvish, and Kenta Maeda). Bauer was the only one of the four not to allow a home run to Abreu in 2020. In comparison, MVP runner-up José Ramírez faced 69 of the pitchers that Abreu faced. Third-place DJ LeMahieu faced a completely different set of pitchers, with not a single one overlapping with Abreu’s.

These batters were compared on their offensive production, but Abreu appears to have faced more challenging pitching. Using FanGraphs’s xFIP- (for which a lower number is better) as a measure of pitcher quality, Abreu was up against a 96.75 xFIP- on average, while LeMahieu faced pitchers with a 105.93 mark. Both LeMahieu’s weighted on-base average (wOBA) of .429 and Abreu’s .411 were exceptional, but is the 18-point difference truly reflective of the difference between the two players’ seasons?

Overview

To answer the question, I derived a value similar in intuition to Baseball Prospectus’s Deserved Run Average (DRA). DRA adjusts a pitcher’s performance for the quality of the batters they face, and it also accounts for numerous context factors to better isolate the pitcher’s contribution. The premise behind DRA is that batter quality influences a pitcher’s performance, so it makes sense that pitcher quality likewise influences a batter’s performance.

As for the statistic I will be working with, I chose to call it “pitcher-adjusted weighted on-base average,” or pwOBA. The intuition is simple: a batter should get credit for offensive production against challenging pitching. The formula for pwOBA is based on the formula for wOBA. In wOBA, every event has a run value (e.g., 1.979 for home runs in 2020), and a batter gets credit for these values accumulated over the course of the season. The sum of these values is then divided by (AB + BB – IBB + SF + HBP).
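A minimal sketch of the wOBA calculation just described, assuming season counting stats are available as a dictionary. The `woba` helper and its key names are hypothetical, and the run values are passed in rather than hard-coded, because only the 2020 home-run value (1.979) is quoted here; the season's published linear weights would need to be supplied.

```python
def woba(counts: dict[str, int], run_values: dict[str, float]) -> float:
    """Season wOBA from counting stats.

    counts: season totals keyed by "AB", "BB", "IBB", "SF", "HBP",
            "1B", "2B", "3B", "HR" (missing keys are treated as 0).
    run_values: linear weights keyed by "uBB", "HBP", "1B", "2B", "3B", "HR"
                (hypothetical layout; supply the season's published values).
    """
    keys = ("AB", "BB", "IBB", "SF", "HBP", "1B", "2B", "3B", "HR")
    c = {k: counts.get(k, 0) for k in keys}
    # Intentional walks earn no credit and are removed from the denominator.
    event_counts = {
        "uBB": c["BB"] - c["IBB"],
        "HBP": c["HBP"],
        "1B": c["1B"],
        "2B": c["2B"],
        "3B": c["3B"],
        "HR": c["HR"],
    }
    numerator = sum(run_values.get(event, 0.0) * n for event, n in event_counts.items())
    denominator = c["AB"] + c["BB"] - c["IBB"] + c["SF"] + c["HBP"]
    return numerator / denominator
```

For example, with made-up weights of 0.7 for an unintentional walk and 2.0 for a home run, one homer and one walk in four at-bats gives (0.7 + 2.0) / 5 = 0.54.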

In the case of pwOBA, both the run values and the denominator are adjusted by a weight associated with the opposing pitcher. This means a player facing more challenging pitching will have a larger denominator but also more opportunity to increase their numerator. This allows plate appearances against tougher pitching to be more influential in the player’s pwOBA. Without adjusting the denominator, the results would largely reflect the level of competition the player faced that year. With both adjusted, the results can identify players who performed well when given the opportunity to face above-average pitchers.
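One way to express that adjustment, as a sketch: each plate appearance carries the run value of its outcome and the weight of the pitcher it came against, and both the numerator and the denominator are weighted. The `PlateAppearance` record and its fields are hypothetical, and intentional walks and other plate appearances excluded from wOBA are assumed to have been filtered out beforehand. With every weight equal to 1, the result reduces to ordinary wOBA.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    run_value: float       # linear-weight value of the outcome (0.0 for an out)
    pitcher_weight: float  # weight of the opposing pitcher; 1.0 means average


def pwoba(pas: list[PlateAppearance]) -> float:
    """Pitcher-adjusted wOBA: weighted run values over weighted opportunities.

    Assumes `pas` already excludes intentional walks, sacrifice bunts, and any
    other plate appearance that is left out of the wOBA denominator.
    """
    numerator = sum(pa.run_value * pa.pitcher_weight for pa in pas)
    denominator = sum(pa.pitcher_weight for pa in pas)
    return numerator / denominator
```

For instance, a home run (run value 2.0, made up for illustration) off a pitcher weighted 2.0 plus an out against a pitcher weighted 0.5 gives (2.0 × 2.0) / (2.0 + 0.5) = 1.6, whereas the unweighted value for the same two plate appearances is 1.0: the homer off the tougher pitcher counts for more, and the out against the weaker pitcher counts for less.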

The next challenge is creating a method to produce the weights. Many methods could work, but I tried to stick to two rules throughout. First, an average pitcher should have a weight of 1, regardless of which statistic is used to define an average pitcher. Second, no weight can be negative. Even if the opposing pitcher were so bad that they gave up a home run on every single pitch, no batter should lose value by producing on offense; that would completely undermine the point of wOBA.

I used data from FanGraphs and Baseball Savant, scraped using Bill Petti’s baseballr package. I chose to weight by xFIP- as a simple first iteration, since it is already expressed relative to the league average. I also liked that it does not rely on defense and that it accounts for ballpark and other potentially confounding variables. Unlike counting stats such as wins and strikeouts, which overvalue starters with more opportunities, xFIP- is a rate stat, but it is susceptible to large variation in small samples. This problem will be addressed later. Because xFIP- is interpreted by whether it is above or below 100, the weights come from a piecewise function with one formula when xFIP- is greater than 100 and another when it is less than 100.

Weighting Function

Since a higher xFIP- means a worse pitcher, the weight should be less than 1 when xFIP- is greater than 100. Given that xFIP- could theoretically be infinitely high (0 IP) and the weight can never be negative, we want the function to approach 0 as xFIP- approaches infinity. The function I chose is (100/xFIP-)². This satisfies the requirements that an average pitcher has a weight of 1 and that no weight is negative, since it is only applied when xFIP- is above 100. For a pitcher with an xFIP- of 150, meaning they are 50% worse than an average pitcher, the weight is (100/150)² ≈ 0.44, so the run values and opportunities against this pitcher are multiplied by 0.44. When xFIP- is less than 100, the function is instead (2 – xFIP-/100)². Dating back to 2002, the lowest qualified xFIP- was 45 (2020 Shane Bieber), which corresponds to a weight of (2 – 0.45)² ≈ 2.4, the highest weight for any qualified pitcher. While the function allows for extremely high weights, no qualified pitcher has posted an xFIP- low enough for this to be a concern.
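The piecewise weighting function translates directly into code. This is a minimal Python sketch (the `pitcher_weight` name is illustrative); the loop at the end prints the weights for the three xFIP- values discussed in the text, which should come out to roughly 2.4, 1.0, and 0.44.

```python
def pitcher_weight(xfip_minus: float) -> float:
    """Weight for a plate appearance against a pitcher with the given xFIP-.

    Above 100 (worse than average): (100 / xFIP-)^2, which falls toward 0
    as xFIP- grows. At or below 100: (2 - xFIP-/100)^2, which equals 1 at
    exactly 100 and grows as xFIP- falls. Both branches are positive for any
    xFIP- (negative values included); squaring widens the spread of weights.
    """
    if xfip_minus > 100:
        return (100 / xfip_minus) ** 2
    return (2 - xfip_minus / 100) ** 2


for x in (45, 100, 150):
    print(x, round(pitcher_weight(x), 2))
```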

These functions can be adjusted in numerous ways, but the more spread out the weights are, the further apart a player’s wOBA and pwOBA can drift. The first iteration of this analysis did not square the formulas, but the resulting weights varied very little. In future analyses, I will look for the best weighting functions and the optimal exponent by maximizing how well one year’s pwOBA predicts future performance. This becomes a balancing act: weights with more variability help distinguish batters but can overstate how much harder some pitchers are to hit than others. The goal should be to roughly capture how much more difficult it was to hit against a pitcher, so extreme weights may be incorrect.
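The exponent search described above could be run as a simple grid search. In this sketch, each candidate exponent k is assumed to have already produced every batter’s year-N pwOBA (using (100/xFIP-)^k and (2 – xFIP-/100)^k as the two branches), and the exponent whose values correlate best with the same batters’ year-N+1 wOBA is kept. Using Pearson correlation as the objective, the helper names, and the data layout are all assumptions for illustration, not a description of the planned analysis.

```python
from math import sqrt
from statistics import fmean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_exponent(
    pwoba_by_exponent: dict[float, list[float]],
    next_year_woba: list[float],
) -> float:
    """Return the candidate exponent whose year-N pwOBA best tracks year-N+1 wOBA.

    pwoba_by_exponent maps each candidate exponent (e.g. 1.0, 2.0, 3.0) to the
    year-N pwOBA of every batter computed with that exponent; next_year_woba
    holds the same batters' year-N+1 wOBA in the same order.
    """
    return max(
        pwoba_by_exponent,
        key=lambda k: pearson(pwoba_by_exponent[k], next_year_woba),
    )
```

A real version would also need to guard against overfitting, for example by checking the chosen exponent on seasons not used in the search.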

The challenge with this weighting function is that the fewer batters a pitcher has faced, the more volatile their xFIP- is. For instance, Tampa Bay Rays position player Mike Brosseau had an xFIP- of -64, the best mark of 2020. Brosseau only faced one batter and struck him out. With no disrespect to Brosseau, I suspect that given more pitching opportunities, he would not strike out every batter. The scatterplot below shows the variability that low sample sizes create in xFIP-.

xFIP- vs. Batters Faced (2018 – 2020)

As pitchers face more batters, the variability in xFIP- decreases considerably. It makes sense that batters faced and xFIP- show a slight negative correlation, as we would expect better pitchers to pitch more often. One option is to include batters faced in the weighting formula to address the variability concern, but this would give a serious advantage to starting pitchers. While an end-of-the-rotation starter will pitch more innings than a top-tier reliever, that difference in innings does not necessarily reflect a difference in ability. Splitting by pitching role (starter, closer, opener, etc.) is also a challenge now that pitchers do not always fit into easily identified groups. Of the 2,365 pitchers analyzed since 2018, 594 (25%) made both starting and relief appearances.

Instead of adjusting weights by batters faced, I chose a threshold at which the weighting formula is applied. Any pitcher with too few batters faced gets a weight of 1, as there is insufficient data to suggest that their xFIP- reflects their pitching ability. There is no single number of batters faced at which we become certain of a pitcher’s ability, but the standard deviation of xFIP- appears to level out around 110 batters faced. I found this by splitting pitchers from 2018 to 2020 into 50 groups by batters faced: the standard deviation within these groups falls below 25 at around 110 batters faced and stays below 25 from there on.
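The threshold search can be reproduced roughly as follows. This sketch assumes the 50 groups are equal-count bins of pitchers sorted by batters faced (the article does not say whether the bins were equal-count or equal-width), and it reports the smallest batters-faced value in the first group from which every later group’s standard deviation stays under 25. The function name and the (batters_faced, xfip_minus) tuple layout are hypothetical.

```python
from statistics import stdev
from typing import Optional


def stabilization_threshold(
    pitchers: list[tuple[int, float]],
    n_groups: int = 50,
    sd_cutoff: float = 25.0,
) -> Optional[int]:
    """Smallest batters-faced value after which xFIP- spread stays below sd_cutoff.

    pitchers: (batters_faced, xfip_minus) pairs, e.g. all pitcher-seasons 2018-2020.
    Pitchers are sorted by batters faced and cut into n_groups equal-count groups.
    Returns None if even the highest-volume group is at or above the cutoff.
    """
    if len(pitchers) < 2 * n_groups:
        raise ValueError("need at least two pitchers per group")
    ordered = sorted(pitchers, key=lambda p: p[0])
    n = len(ordered)
    groups = [ordered[i * n // n_groups:(i + 1) * n // n_groups] for i in range(n_groups)]
    sds = [stdev([x for _, x in group]) for group in groups]

    # Walk back from the highest-volume group while the spread stays under the cutoff.
    start = None
    for i in range(n_groups - 1, -1, -1):
        if sds[i] >= sd_cutoff:
            break
        start = i
    return None if start is None else groups[start][0][0]
```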

Variability of xFIP- vs. Batters Faced

A plate appearance against a pitcher will have a weight of 1 unless the pitcher faced 110 or more batters that season. While a pitcher with fewer than 110 batters faced may be particularly good, there is not enough data to be confident about their ability, so we assume they are average. This is not a problem when pwOBA is calculated retroactively, but it would not work in-season. I am comfortable with this, since the goal is to reflect a pitcher’s ability with some level of certainty, and that certainty will not be reached early in the season.
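Combining the threshold with the piecewise weights gives a rule like the following sketch (the function and constant names are hypothetical). Under it, Brosseau’s one-batter season with an xFIP- of -64 receives a weight of 1 rather than the roughly 7 that the lower branch alone would assign.

```python
MIN_BATTERS_FACED = 110  # threshold chosen in the article


def thresholded_weight(
    xfip_minus: float, batters_faced: int, min_bf: int = MIN_BATTERS_FACED
) -> float:
    """Weight of 1 below the batters-faced threshold, else the piecewise formula."""
    if batters_faced < min_bf:
        return 1.0  # too little data: treat the pitcher as average
    if xfip_minus > 100:
        return (100 / xfip_minus) ** 2
    return (2 - xfip_minus / 100) ** 2
```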

The threshold posed a challenge in the shortened 2020 season, when only 203 pitchers reached 110 batters faced. Some major pitching threats fell short of it, including Devin Williams and Liam Hendriks. Fortunately, the threshold is met far more often in a normal year: 502 pitchers reached it in 2019 and 495 in 2018.

Assigning a weight of 1 to pitchers with few batters faced slightly skewed the distribution of weights. In 2020, rather than being centered at 1, the mean weight was 1.08. To adjust for this, all weights were divided by 1.08 to force a center of 1. Pitchers who would previously have had a weight of 1 now have a weight of 0.93, but this keeps pwOBA on the same scale as wOBA. The same process was applied to each year. This slightly changes the assumption that underqualified pitchers are average, but the difference is negligible, and the slight lowering accounts for the negative correlation between batters faced and xFIP-. The result is a median pwOBA of 0.337, just below the median wOBA of 0.338. The graph below shows how similar the distributions of pwOBA and wOBA are, meaning that the two can be interpreted on the same scale.
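The rescaling step amounts to dividing every weight by the season’s mean weight, run separately for each season. The sketch below assumes the mean is taken over plate appearances (each plate appearance contributing its pitcher’s weight); the article does not specify whether the average was over plate appearances or over pitchers, and the function name is hypothetical. With a mean of 1.08, a previously neutral weight of 1 becomes 1/1.08, or about 0.93.

```python
from statistics import fmean


def normalize_weights(pa_weights: list[float]) -> list[float]:
    """Rescale one season's plate-appearance weights so their mean is exactly 1."""
    mean_weight = fmean(pa_weights)
    return [w / mean_weight for w in pa_weights]
```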

Distribution of wOBA and pwOBA (2020)

Analysis

As interesting as it would be to consider individual at-bats, the best part about pwOBA is seeing how it affects players across a season. José Abreu (10th in wOBA) falls only slightly, to 12th, once we account for the opposing pitchers’ ability. There is, however, a major shake-up among the top 10, with four players falling out and Nelson Cruz dropping all the way from 9th to 21st.

2020 wOBA Leaders

Name	wOBA Rank	wOBA	pwOBA Rank	pwOBA
Juan Soto	1	.478	1	.458
Freddie Freeman	2	.456	2	.442
Marcell Ozuna	3	.444	3	.434
DJ LeMahieu	4	.429	10	.405
José Ramírez	5	.415	6	.413
Trea Turner	6	.413	15	.394
Ronald Acuña Jr.	7	.413	14	.396
Dominic Smith	8	.412	8	.410
Nelson Cruz	9	.411	21	.386
José Abreu	10	.411	12	.404

Another fascinating part of the results is who saw major differences between their wOBA and pwOBA. Among qualified players, Mike Yastrzemski (+0.025), Anthony Rendon (+0.021), Anthony Rizzo (+0.020), and Max Kepler (+0.017) saw the largest increases. On the other hand, the largest drop belonged to Brandon Lowe (-0.026), followed by Franmil Reyes (-0.025) and Cruz (-0.025). Three players had exactly the same wOBA and pwOBA: Bryce Harper (0.400), Brandon Crawford (0.334), and J.D. Martinez (0.290). It would be worthwhile to examine how different weighting functions change the gaps between pwOBA and wOBA, but alternative possibilities are saved for another analysis.
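Finding the biggest movers is a matter of sorting players by pwOBA minus wOBA. As a sketch (the `movers` helper is hypothetical), here it is applied to the ten 2020 wOBA leaders from the table above; within that group the largest drop belongs to Nelson Cruz at -0.025, matching the figure in the text. Reproducing the other movers would require the full leaderboard of qualified hitters.

```python
def movers(rows: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort players from largest pwOBA gain to largest pwOBA loss.

    rows maps player name -> (wOBA, pwOBA).
    """
    diffs = [(name, pw - w) for name, (w, pw) in rows.items()]
    return sorted(diffs, key=lambda item: item[1], reverse=True)


leaders_2020 = {
    "Juan Soto": (0.478, 0.458),
    "Freddie Freeman": (0.456, 0.442),
    "Marcell Ozuna": (0.444, 0.434),
    "DJ LeMahieu": (0.429, 0.405),
    "José Ramírez": (0.415, 0.413),
    "Trea Turner": (0.413, 0.394),
    "Ronald Acuña Jr.": (0.413, 0.396),
    "Dominic Smith": (0.412, 0.410),
    "Nelson Cruz": (0.411, 0.386),
    "José Abreu": (0.411, 0.404),
}

ranked = movers(leaders_2020)
name, change = ranked[-1]
print(f"Largest drop: {name} ({change:+.3f})")
```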

The difference in competition levels is more pronounced in 2020 due to the format of the schedule and the smaller number of games. In 2019, the gap between players’ wOBA and pwOBA is slightly smaller (as expected across more than 2.5 times as many games), but it is still relevant. As in 2020, there is some shuffling within the wOBA leaderboard.

2019 wOBA Leaders

Name	wOBA Rank	wOBA	pwOBA Rank	pwOBA
Christian Yelich	1	.442	1	.431
Mike Trout	2	.436	2	.430
Alex Bregman	3	.418	3	.408
Nelson Cruz	4	.417	8	.391
Cody Bellinger	5	.415	4	.408
Anthony Rendon	6	.413	5	.403
Ketel Marte	7	.405	9	.390
George Springer	8	.400	11	.387
Juan Soto	9	.394	6	.395
Nolan Arenado	10	.392	16	.378

Conclusion

Overall, while this was a simple analysis, it is apparent that adjusting for a player’s opponents yields results that are both interesting and informative. The differences between wOBA and pwOBA support the suspicion that pitcher quality is a relevant factor in a batter’s success. As for LeMahieu and Abreu, wOBA alone appears to overstate the difference between the two players. LeMahieu still performed at an elite level after adjusting for pitcher quality, but his edge shrinks to a single point (.405 to .404). The MVP voters chose Abreu regardless, but pitcher adjustments have the potential to change how key players are perceived.

This analysis examines just one simple way to calculate pwOBA, and there is room for improvement. While I used this piecewise function to derive the weights, many other functions would work. This function creates weights with only slight variation from the mean; more variation in the weights will create greater differences between wOBA and pwOBA, while less variation will shrink them. Another adjustment could be to use measures other than xFIP-, such as WAR, wOBA allowed, or ERA-. Another useful adjustment would be to introduce context factors such as ballpark factor, further isolating the batter’s pitcher-adjusted offensive production. Additionally, a classification algorithm could divide pitchers into groups that are then included in the weight. Lastly, the weights could track a pitcher’s performance on a rolling basis. This would account for the fact that a pitcher’s performance may not be constant throughout a season, though such a process would face the sample size problem once again.

The concept of weighting results by the quality of the opponent has numerous other applications across baseball. For example, we can apply the same weighting function to home runs in 2020, and the results are still quite interesting. The largest decreases from HR to pitcher-adjusted HR are -3.64 for Corey Seager and -2.54 for LeMahieu. The largest increase was +2.12 for Abreu, followed by Kepler (+1.59) and Jesús Aguilar (+1.56). I can also imagine this concept being applied to WAR, given that wOBA is part of the WAR calculation. The concept has already been applied to pitching with Baseball Prospectus’s DRA, and a similar batter adjustment could be applied to many other established pitching statistics.
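A pitcher-adjusted counting stat can be sketched by letting each home run count as its pitcher’s weight instead of 1. This is an assumption about how the adjustment was applied (the article does not spell out the details, for example whether the normalized weights were used), and the function name and sample season below are hypothetical.

```python
def pitcher_adjusted_total(
    events: list[tuple[str, float]], event_type: str = "HR"
) -> tuple[int, float]:
    """Return (raw count, pitcher-adjusted count) of one event type for a batter.

    events holds (event_type, opposing_pitcher_weight) pairs for the batter's
    season; each matching event contributes its pitcher's weight instead of 1.
    """
    matching = [weight for event, weight in events if event == event_type]
    return len(matching), sum(matching)


# Hypothetical season: three home runs and one single against pitchers of varying quality.
season = [("HR", 1.2), ("HR", 0.9), ("1B", 1.5), ("HR", 0.5)]
raw, adjusted = pitcher_adjusted_total(season)
print(raw, round(adjusted, 2), round(adjusted - raw, 2))
```

The difference between the adjusted and raw totals is the kind of change quoted above for Seager, LeMahieu, and Abreu.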

As for how pwOBA can be used, there are many possibilities to examine. My next step is to see whether pitcher-adjusted statistics have any increased predictive power. That will take further analysis, but I am optimistic. First, I am curious to see whether predictive power increases at the major league level. Next, I am interested in the concept’s applications to amateur and minor league prospects. Lastly, I would like to see whether pitcher adjustments can isolate players who perform better in the postseason relative to their peers. These are all avenues with important implications, and I welcome any suggestions or potential ideas on adjusting my methods. Thank you for reading, and I hope you found it as interesting as I did!

5 Comments


weekendatbidens


Thank you for the read! It goes to show that the depth of figuring, tinkering, and work these projects require is impressive. I’m glad you addressed one thing at the end, “another useful adjustment would be to introduce context factors such as ballpark factor, further isolating the batter’s pitcher-adjusted offensive production,” since that was an area I was concerned was missing from the outset, considering AL East pitchers have run-boosting parks.

One other thing does give me pause about the data. It pertains to whether the xFIP- for 2020 contains information that delineates whether the quality of hitters in each division is intra-divisionally controlled. If this isn’t controlled, I’m worried that it would amplify any bias relating to the quality of hitters by boosting pitchers divisionally and therefore boosting those same hitters. Any clarification would be really helpful so I can thoroughly comprehend and have the confidence to utilize these conclusions.

Bryan Woolley

Reply to weekendatbidens

Thank you for your response! I completely agree; there is definitely a lot of tinkering that this project needs before it can be a trustworthy metric. The ballpark factor is one of the more important adjustments I need to make. Even more important, I think, is your second point about the use of xFIP- for pitchers. Just as hitters’ performances appear to depend on the quality of the pitcher, I am sure that pitchers’ performances depend on the quality of the hitters they face. The challenge, then, is that the problem is cyclical, but I believe an MCMC algorithm could be designed to get around it. This is something I am emphasizing as I work to improve this concept, so thank you for pointing out the problem. I hope that these adjustments maintain or even amplify the difference between wOBA and pwOBA, but it is entirely possible that such adjustments narrow the difference and diminish the relevance of this concept. For now, I think that pwOBA and the structure of the analysis reveal that adjusting for the quality of competition is a potentially relevant concept, but the exact values of pwOBA might not be ready just yet. Thank you again for your response and help!

Erik Larsen

I like the approach and loved the idea. Tough to address in the anomaly of 2020, but these results make sense. The pwOBA density is very interesting (more guys below .300, fewer above .375).
The batters faced issue is tricky. Would love to dig into optimizing weights. Also though, how to apply to pitchers who don’t face 100 batters in a season (i.e. most relievers or guys shuttled from the minors) and/or appropriately optimize those weights. This could also dig at pitching quality within an appearance.

patsen29 (member since 2018)

Reply to Erik Larsen

Maybe it could be the modern answer to MLEs. Instead of having broad league adjustments, you can actually compare players across levels directly.

The anomaly of 2020 is difficult, but I think it was really interesting for the analysis given the structure of the schedule. Since not every team played every other team, the differences in quality faced were more drastic than in any other year. As for the pitchers who don’t face 100 batters, someone made a really interesting suggestion about using preseason projections. These are ideally a good estimate of the player’s talent level and avoid abnormal performances.
