Utility in a Pinch: Does Versatility Bring an Offensive Benefit?

Mookie Betts or Ronald Acuña Jr.? This was the question 2023 MVP voters were tasked with answering. There were cases for both: Acuña had just completed an incredible, record-breaking 40/70 season, while Betts had something others argued could not be quantified — positional versatility. The 2023 season marked his professional return to significant time in the infield, complementing his esteemed right field defense that had already earned him multiple Gold Glove awards.
Yet, as this debate unfolded, I found myself wondering: How, in the most quantified sport in the world, was there such a gap in our numerical understanding of player value? Surprisingly little research has been done on the value of positional versatility beyond its defensive benefits, which are at least partly accounted for in existing metrics. With that in mind, I set out to quantify versatility in a different way: by assessing its impact on a team’s ability to use pinch-hitters. In principle, positional versatility allows a manager to make more substitutions in favor of stronger hitters off the bench, potentially adding wins in a way not currently reflected in WAR. Alternatively, versatility may have no such effect, or may provide only a marginal benefit that does not meaningfully affect wins. Is versatility truly undervalued in today’s analytical climate? Let’s take a look.
I began by collecting 2021 and 2022 data from Baseball Reference, including OPS+, WAR, runs, and roster construction by position, to calculate correlations between offensive indicators and roster versatility. I chose 2021 and 2022 to avoid potential confounding effects from recent rule changes. The 2020 season was impacted by COVID-19, and the introduction of the universal designated hitter rule and roster expansion changed the dynamics of pinch-hitting. Any data from before 2021 could be significantly influenced by these factors, confounding the impact of positional versatility on pinch-hit opportunities (PHO).
The most difficult aspect of this project was defining versatility. Baseball fans have an intuitive sense of what it means for a player to be versatile: A player who can competently play multiple, meaningfully different positions when needed is versatile.
However, quantifying this is challenging. How does one differentiate the versatility of a player who can play left and right field from one who can play both catcher and center field? The latter is undoubtedly more impressive and rare, but how can this be numerically represented? My initial approach was binary: classifying players as utility players or not based on an arbitrary threshold of positional appearances. However, this approach was ineffective, as it failed to correlate with meaningful statistics such as wins, OPS, or PHO.
I then moved to a broader, non-binary metric by recording the positions per player (p/p) for each team, grouping all outfield positions under a single “OF” category. While a player’s versatility may occasionally come from being able to play a specific outfield position, outfielders are generally more interchangeable than players at any other position. Grouping them therefore avoids treating a player who can merely play all three outfield positions as a versatile utility player. This approach provided a more general measure of team versatility and yielded a moderate correlation with PHO, with R = 0.4, R² = 0.16, and p = 0.001. While an R value of 0.4 does not indicate a particularly strong correlation, it does suggest a meaningful relationship between positional versatility and pinch-hit opportunities. The p-value, well below the α = 0.05 threshold, indicates that this correlation is unlikely to be due to random chance. PHO, simply defined, is a tally of all pinch-hit appearances recorded in the Retrosheet season logs. Other pinch-hitting statistics referenced in this study also come from Retrosheet data.
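As a rough illustration of the p/p idea, the sketch below merges the three outfield spots into a single "OF" label before counting positions, then computes a plain Pearson correlation. The position codes, the helper names `positions_per_player` and `pearson_r`, and the sample roster are all hypothetical; none of this is the Baseball Reference or Retrosheet data behind the figures above.

```python
import math

OUTFIELD = {"LF", "CF", "RF"}  # assumed position codes


def positions_per_player(roster):
    """Average distinct positions per player, with LF/CF/RF merged into 'OF'.

    `roster` maps a player name to the list of positions he appeared at.
    """
    counts = []
    for positions in roster.values():
        grouped = {"OF" if p in OUTFIELD else p for p in positions}
        counts.append(len(grouped))
    return sum(counts) / len(counts)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical four-player roster.
roster = {
    "Player A": ["LF", "CF", "RF"],  # all outfield, so counts as 1 position
    "Player B": ["2B", "SS", "LF"],  # counts as 3 positions
    "Player C": ["C"],               # counts as 1 position
    "Player D": ["1B", "3B"],        # counts as 2 positions
}
team_pp = positions_per_player(roster)  # (1 + 3 + 1 + 2) / 4
print(team_pp)
```

With one p/p value and one PHO total per team-season, `pearson_r(pp_values, pho_values)` would give the kind of R reported above; the p-value would need a separate significance test.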
To investigate whether specific positions impact PHO, I tallied the number of players at each position while also recording the position replaced in each PHO (denoted as POS-sub). This allowed me to examine whether teams with an excess of a particular position were more likely to substitute players at that position. While certain positions may offer additional value, these are largely accounted for in WAR calculations. Therefore, I focused on whether a team’s roster composition affected PHO at the positional level. However, I found no significant correlation between specific positional roster makeup and pinch-hitting metrics, leading me to conclude that general versatility — rather than the specific positions played — has the greater impact on PHO.
Now that some correlation was established, I considered how best to incorporate these findings into WAR. The relationship between p/p and PHO is likely non-linear, reflecting diminishing marginal returns as players add more positions. Additionally, I needed to transition from a team-level metric (p/p) to an individual-level metric, akin to WAR. To accomplish this, I revisited Baseball Reference and recorded the number of players per team who played n positions. For example, a player who played second base, third base, and shortstop contributed to the team’s 3-pos count. If that same player later played left field, he would be included in the 4-pos tally. This approach allowed me to isolate the impact of increasing each n-pos value on PHO. While I also collected data for players who played five or six positions, the sample size was too small to draw meaningful conclusions, with only one player registering five positions and none registering six.
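The n-pos tally described above can be sketched as follows. The roster format and the `n_pos_counts` helper are hypothetical, and whether outfield spots were merged for this tally is not stated, so the `group_outfield` flag is an assumption left switchable. The output keys are chosen to match the `1pos` through `4pos` column names used in the regression code.

```python
from collections import Counter

OUTFIELD = {"LF", "CF", "RF"}  # assumed position codes


def n_pos_counts(roster, group_outfield=True):
    """Count how many players on a team played exactly n distinct positions.

    `roster` maps a player name to the list of positions he appeared at.
    Returns a dict such as {'1pos': 2, '3pos': 1}.
    """
    tally = Counter()
    for positions in roster.values():
        if group_outfield:
            distinct = {"OF" if p in OUTFIELD else p for p in positions}
        else:
            distinct = set(positions)
        tally[f"{len(distinct)}pos"] += 1
    return dict(tally)


# The infielder from the example above: three positions, then a fourth.
before = n_pos_counts({"Infielder": ["2B", "3B", "SS"]})
after = n_pos_counts({"Infielder": ["2B", "3B", "SS", "LF"]})
print(before, after)
```

Each player lands in exactly one bucket, so adding left field moves the infielder from the 3-pos count to the 4-pos count rather than counting him in both.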
To model PHO, I ran an OLS regression with PHO as the dependent variable and the number of players in each n-pos category as independent variables:
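import numpy as np
import pandas as pd
import statsmodels.api as sm


def fit_pho(df):
    """Fit PHO on the n-pos counts and add an intercept-centered PHO column."""
    # Predictors: number of players on each team who played n positions.
    X = df[['1pos', '2pos', '3pos', '4pos']]
    X = sm.add_constant(X)  # adds the intercept column, named 'const'
    y = df['Pho']
    model = sm.OLS(y, X).fit()
    coeffs = model.params.to_dict()
    intercept = coeffs.pop('const')
    # Subtract the intercept so a team can land above or below the baseline.
    # Note: this writes a new column into the caller's DataFrame.
    df['Normalized_Pho'] = y - intercept
    return {"Coefficients": coeffs, "Intercept": intercept}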
The separate coefficients for each position count allow for non-linear differences in their impact. This distinction is important: Under p/p, a team with one player who plays three positions and another who plays one is considered identical to a team with two players who each play two positions, since both average two positions per player. By separating coefficients, this regression helps determine whether these scenarios produce different effects on PHO. I tested several other variations, including a curved regression and a model normalized for roster size, but none outperformed this linear approach.
The final regression equation was:
PHO = 276.61 + (−1.15 × 1pos) + (27.90 × 2pos) + (40.67 × 3pos) + (54.87 × 4pos)
With an R value of 0.42 and R² of 0.18, this model explained about two percentage points more of the variance in PHO than p/p alone (18% vs. 16%). The coefficients align with expectations, rising as n-pos rises: each two-position player is associated with about 28 more PHO, each three-position player with about 41, and each four-position player with about 55. The largest step is from one position to two (about 29 PHO), while the steps from two to three and from three to four are smaller (about 13 and 14 PHO), consistent with diminishing marginal returns beyond two positions. The 1-pos coefficient is slightly negative, suggesting that players limited to one position may be detrimental to a team’s pinch-hitting flexibility. We would expect the benefit of each added position to keep shrinking as the count gets higher. However, as previously stated, the sample was not large enough to examine the relationship beyond four positions.
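To make the equation concrete, here is a small sketch that plugs n-pos counts into the reported coefficients and prints the step between adjacent coefficients. The coefficients and intercept are the ones quoted above; the `predict_pho` and `marginal_steps` helpers and the two two-player rosters are hypothetical, chosen only because both rosters total four positions.

```python
INTERCEPT = 276.61
COEFFS = {"1pos": -1.15, "2pos": 27.90, "3pos": 40.67, "4pos": 54.87}


def predict_pho(counts):
    """Predicted PHO for a team given its n-pos player counts."""
    return INTERCEPT + sum(COEFFS[k] * counts.get(k, 0) for k in COEFFS)


def marginal_steps():
    """Difference between each coefficient and the one before it."""
    keys = ["1pos", "2pos", "3pos", "4pos"]
    return [round(COEFFS[b] - COEFFS[a], 2) for a, b in zip(keys, keys[1:])]


# Two hypothetical rosters with the same total number of positions.
roster_a = {"3pos": 1, "1pos": 1}  # one three-position player, one specialist
roster_b = {"2pos": 2}             # two two-position players

print(round(predict_pho(roster_a), 2))  # 316.13
print(round(predict_pho(roster_b), 2))  # 332.41
print(marginal_steps())                 # [29.05, 12.77, 14.2]
```

Under these coefficients the roster with two two-position players projects to more PHO than the roster that concentrates the same number of positions in one player.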
Even with a statistically significant relationship in hand, there is still no evidence connecting positional versatility to wins, the statistic of interest here. Converting PHO to wins is difficult because the value of a pinch-hitter is still up for debate. While existing research has convincingly shown that hitters perform worse when pinch-hitting, that does not mean they fail to offer a better chance of success than the player being replaced. To quantify this added benefit, we must determine the difference in wOBA between the typical hitter coming off the bench and the hitter who would otherwise have batted. For this, I referenced a Baseball Prospectus article on whether having good bats come off the bench offers a benefit. That article’s model suggests that a PHO yields, on average, about .017 more expected wOBA from the pinch-hitter than from the batter he replaces:
(same-hand PH value × same-hand PH proportion) + (opposite-hand PH value × opposite-hand PH proportion)
= (.011 × .22) + (.019 × .78) = .01724 wOBA
Because the Baseball Prospectus figures come from different seasons, this is not the soundest method for calculating the wOBA per pinch-hit at-bat. I normalized for era using the wOBA scale and league-average wOBA to get an RAA value for both 2021 and 2022, but given the looseness of these calculations, I treated the result as a starting point rather than a constant. I moved up and down in intervals of .003. The value of wOBA per PHO that worked best was .035: it offered the most explanatory power and the lowest root mean squared error (RMSE). To turn that into a win value that can be added to WAR, which we will call the versatility adjustment, we divide the wOBA value by the wOBA scale and then by the runs per win for the given year, roughly 9.97 for 2021 and 9.57 for 2022. Now we have a number for wins per pinch-hit.
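The arithmetic in this step can be sketched as below. The handedness figures (.011, .019, .22, .78), the .003 step, the .035 result, and the 9.97 runs per win are taken from the text; the helper names are hypothetical, the candidate grid assumes the search started near .017 and only shows the upward steps, and `ASSUMED_WOBA_SCALE` is a placeholder because the wOBA scale values actually used are not given here.

```python
def blended_woba_gain(same_gain, same_share, opp_gain, opp_share):
    """Weighted-average wOBA gain per pinch-hit opportunity."""
    return same_gain * same_share + opp_gain * opp_share


def wins_per_pho(woba_gain, woba_scale, runs_per_win):
    """Convert a per-PHO wOBA gain to wins: wOBA -> runs -> wins."""
    runs = woba_gain / woba_scale
    return runs / runs_per_win


# Starting point from the handedness split quoted in the text.
start = blended_woba_gain(0.011, 0.22, 0.019, 0.78)

# Candidate values stepping up from .017 in intervals of .003.
candidates = [round(0.017 + 0.003 * k, 3) for k in range(7)]

# Placeholder only: substitute the published wOBA scale for the season.
ASSUMED_WOBA_SCALE = 1.2
wins_2021 = wins_per_pho(0.035, ASSUMED_WOBA_SCALE, 9.97)

print(round(start, 5))
print(candidates)
print(wins_2021)
```

Each candidate in the grid would be converted to wins this way and scored by RMSE against actual team wins; the text reports that .035 scored best.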
The final step is to center PHO around a baseline. This is necessary because otherwise all teams would gain “wins” from the PHO that is assumed to be a given. This is the same reason WAR calculations assume a zero-WAR team will still win about 48 games. A team gains versatility value relative to other teams only when it outperforms that baseline. This is why, as you may have noticed in the Python code, I subtracted the y-intercept from PHO. Doing so normalizes PHO to a statistic that can be either negative or positive, more accurately reflecting the negative impact that a lack of positional versatility has on wins. It also lets PHO be incorporated into WAR without giving every team a boost for the assumed baseline PHO, which reflects neither positive nor negative versatility.
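One way to read the centering step is sketched below at the team level. The baseline is the regression intercept quoted earlier; the `versatility_adjustment` helper, the wins-per-PHO figure of 0.003, and the two team PHO totals are hypothetical values chosen only to show the sign behavior.

```python
BASELINE_PHO = 276.61  # regression intercept reported in the text


def versatility_adjustment(team_pho, wins_per_pho, baseline_pho=BASELINE_PHO):
    """Wins credited (or debited) for PHO above (or below) the baseline."""
    return (team_pho - baseline_pho) * wins_per_pho


# Hypothetical teams and a hypothetical wins-per-PHO conversion.
above = versatility_adjustment(300, 0.003)  # more PHO than baseline
below = versatility_adjustment(250, 0.003)  # fewer PHO than baseline
level = versatility_adjustment(276.61, 0.003)

print(above, below, level)
```

A team sitting exactly on the baseline receives no adjustment, which is the point of centering: only departures from the assumed PHO move a team's win total.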
With all of that taken into account, we can put together versatility-adjusted WAR (va-WAR) and compare its predictive value for wins against Baseball Reference’s current model. Overall, the models performed nearly identically: va-WAR had an R² less than 1% higher, at equal RMSE. Ultimately, this indicates that while versatility does offer more opportunities to pinch-hit, the value gained is negligible: it adds almost no predictive power toward wins and thus has minimal usefulness in WAR calculations.
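A comparison of this kind could be sketched as follows. The 48-win replacement baseline comes from the text; the `rmse` and `r_squared` helpers and every number in the three-team sample are synthetic, so the output says nothing about the real result and is only meant to show the mechanics.

```python
import math

REPLACEMENT_WINS = 48  # approximate wins for a zero-WAR team, per the text


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted wins."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Synthetic three-team sample, not real data.
actual_wins = [91, 80, 70]
team_war = [42.0, 32.5, 21.0]
adjustment = [0.07, -0.02, -0.08]  # hypothetical versatility adjustments

war_pred = [REPLACEMENT_WINS + w for w in team_war]
va_pred = [REPLACEMENT_WINS + w + a for w, a in zip(team_war, adjustment)]

print(rmse(actual_wins, war_pred), r_squared(actual_wins, war_pred))
print(rmse(actual_wins, va_pred), r_squared(actual_wins, va_pred))
```

Because the adjustments are fractions of a win while team WAR spans dozens of wins, the two sets of predictions barely differ, which is the same pattern the study reports.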
One possible explanation is that while versatility may bring a marginal benefit, the variance in versatility between teams is so small that it does not make a difference in a team’s win column. This likely stems from the availability of versatility in the minor leagues and the relative ease of teaching a player a new position, which keeps versatility from being a scarce quality. Compared to strong hitters or strong fielders, versatile players are abundant. None of this is to say that versatility has no value. Other benefits include the less tangible ability to fill in quickly in the event of an injury to an everyday player, something much harder to quantify, or simply the ability to play difficult positions well whenever needed, which is already reflected in defensive WAR.
In terms of application, this study runs counter to the popular opinion that versatility frees up the roster in a way that benefits the offense, and that WAR therefore undervalues versatile players. More likely, the benefit brought by utility players comes from their defensive ability to play multiple difficult positions well, which is already mostly accounted for in current WAR calculations. I found no significant evidence that Mookie Betts was undervalued by MVP voters because of a lack of consideration for his positional utility.