When 1 + 1 Doesn’t Equal 2

By Bryan Woolley, JP Wong, and Nick Skiera.

Baseball, like all sports, is exciting because of the concept of variance. No team scores the exact same number of runs every game. That is why the Dodgers (5.82 runs/game) were not 60-0 in 2020. Runs per game strongly correlates with winning percentage for obvious reasons, but a team’s variance (essentially their consistency) plays a crucial role in their ability to win baseball games.

Relating to this, we came across an interesting game theory concept. Given certain properties of the run-scoring distributions, the competitor with the lower output can increase their win probability by increasing the variance in their output. Conversely, the competitor with the higher output can increase their win probability by decreasing the variance in their output. Were this to apply to baseball, lower-scoring teams could win more games by becoming more inconsistent. Of course this is all just in theory, so the requirements for it to be relevant in reality to baseball might not be met.
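To make the concept concrete, here is a minimal sketch under a strong simplifying assumption: each competitor's output is an independent normal random variable. Real run scoring is discrete and skewed, so this only illustrates the theory and is not a model of baseball. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def win_probability(mean_a, sd_a, mean_b, sd_b):
    """P(A outscores B), ASSUMING independent normal outputs (illustration only)."""
    # The difference of two independent normals is normal, and variances add.
    diff = NormalDist(mean_a - mean_b, (sd_a ** 2 + sd_b ** 2) ** 0.5)
    return 1.0 - diff.cdf(0.0)


# Hypothetical matchup: the underdog averages 4.0, the favorite averages 5.0.
for underdog_sd in (1.0, 2.0, 4.0):
    p = win_probability(4.0, underdog_sd, 5.0, 2.0)
    print(f"underdog sd = {underdog_sd}: P(underdog wins) = {p:.3f}")
```

Under these assumptions, the underdog's win probability rises toward 50% as its spread grows, and the favorite's falls toward 50% as its own spread grows, which is the pattern the theory describes.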

We will examine the importance of variance in baseball both to test the theory and to attempt to uncover interesting trends in the sport. In our analysis we find that variance plays a significant role in a team’s success, suggesting that roster and lineup construction can be optimized by going beyond mean production. So as our title proposes, 1 WAR + 1 WAR and 2 WAR might not always be worth the same amount to a team if they are produced with different consistencies.

To test the importance of variance, we fit two logistic regression models. The data included single-season values for each of the 30 teams for the 2018, 2019, and 2020 seasons. The first response variable is a binary indicator of whether the team won the game. The predictor variables are the team’s mean runs scored across the season, mean runs allowed, variance in runs scored, and variance in runs allowed. We also included an indicator variable denoting whether the team has a higher mean run differential than their opponent. This indicator is then interacted with the variance terms to account for the proposed theory. The coefficients and p-values are included in the table below, but note that the model is linear in the log odds of victory, so the coefficients do not have a linear relationship with win probability.

MLB 2018-2020

| Variable | Coefficient | P-Value |
| --- | --- | --- |
| Mean Runs Scored | .2734 | <.001 |
| Mean Runs Allowed | -.2954 | <.001 |
| Variance in Runs Scored (VRS) | -.0327 | .046 |
| Variance in Runs Allowed (VRA) | .0343 | .004 |
| I{Higher Run Differential} | .6342 | .024 |
| VRS * I{Higher Run Differential} | .0046 | .807 |
| VRA * I{Higher Run Differential} | -.0091 | .618 |
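Because the coefficients act on the log odds, the same change in a predictor moves win probability by a different amount depending on where a team starts. The sketch below uses the Mean Runs Scored coefficient from the table. The baseline probabilities are hypothetical, since the model's intercept is not reported here, and the helper names are ours.

```python
import math

COEF_MEAN_RUNS_SCORED = 0.2734  # coefficient from the table above


def logit(p):
    """Convert a probability to log odds."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def shifted_probability(baseline_p, coef, delta):
    """Win probability after a predictor moves by delta, starting from baseline_p."""
    return inv_logit(logit(baseline_p) + coef * delta)


# Hypothetical baselines: what does one extra run per game (on average) buy?
for baseline in (0.30, 0.50, 0.70):
    new_p = shifted_probability(baseline, COEF_MEAN_RUNS_SCORED, 1.0)
    print(f"baseline {baseline:.2f} -> {new_p:.3f} (gain {new_p - baseline:+.3f})")
```

The gain is largest near a 50% baseline and shrinks as the baseline moves toward either extreme, which is why the coefficients cannot be read as fixed changes in win probability.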

A couple things to note here. First, the coefficients for the mean values are significant and have unsurprising signs. The values suggest that when a team scores more runs on average they are more likely to win, and when they give up more runs on average they are less likely to win. We also find that the two variance terms are statistically significant, although the .046 for VRS has us concerned about multiple hypothesis testing. We do not utilize any model selection techniques yet, as this will come when we apply these concepts to specific players. The negative value for VRS suggests that the more inconsistent a team’s offense is, the less likely they are to win the game. Conversely, the positive value for VRA suggests that having an inconsistent pitching staff is positively correlated with a team’s probability of winning.

One intuitive interpretation of these results is that a similar subset of hitters plays every day for an offense, but a different starter pitches each day. Therefore successful teams can expect strong outings from their Nos. 1, 2, and 3 starters each week and conversely expect to allow more runs from the bottom of their rotation.

However, scoring runs must be a lot more consistent in order to get wins. Since mostly the same eight or nine players stay in the lineup each day, there is no accounting for top-heavy output. Over the course of the season, teams must stay relatively consistent in run production in order to support their starters when they pitch well and maximize their wins.

The final three terms are the ones intended to account for the game theory we discussed previously. The indicator was included on its own as a control to ensure that the coefficients for the interaction terms were not picking up on the fact that the team was already expected to be likely to win the game. The insignificance of the interaction terms appears to suggest that the theory does not apply in practice. While a team’s variance is important, the importance is not dependent on the team’s status as the better or worse team. The theory could be unsupported by reality based on the shape of run-scoring distributions, the magnitude of the variance, and/or the slight difference in mean run scoring.

Another possibility is that the theory does not apply at a per-game level but shows up across the span of a season. With that in mind, we fit a logistic model once again, this time with season win percentage as the response variable. This means that the response variable is no longer a binary indicator but a proportion between 0 and 1. The same predictor variables were included in this model with the exception of the indicator for whether the team had a higher run differential, as this doesn’t apply in the case of this response variable. Instead, the analogous variable we use is the team’s average run differential, as we would expect a team with a positive value to be favored on average.

This model does not show anything drastically different from the previous one, as the interaction terms are insignificant and the mean/variance terms are significant. The signs of the coefficients are the same as well; the only difference is in their magnitudes. So once again, the theory does not show up in application.

While we found no evidence of the game theory concept in reality, we were satisfied to find that variance was a significant predictor of both win probability and win percentage. This means that there is potential for hidden value in players that goes unnoticed when only considering mean production. For example, consistent hitters are more valuable than their inconsistent counterparts. With wOBA as the variable quantifying run production, we examine a player’s production on a per-game basis through mean and variance. We found that the higher a player’s mean production, the higher their variance.

Considering the types of games that the best players have, the positive correlation of variance and mean makes sense. Every player in MLB has had bad stretches of games, even Mike Trout or Jacob deGrom. What separates the best players from the rest are the outstanding performances that happen more often with these players. Those huge games result in the increased variance in the better players, who have a wider range of performances than the lower-tier ones.

Instead of examining who the most and least consistent players are, it is more interesting to look at how consistent a player is relative to their peers with similar mean production. We looked at the residuals of a linear regression to see which players are the most unusual compared to the average expected variance for a player with the same mean production. This would mean that a negative residual is a player more consistent than expected, something that benefits their team. For the combined 2018, 2019, and 2020 seasons, of the players who played in 250 or more games, the most negative residuals are shown in the table below.
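The residual approach described above can be sketched as follows. This is a minimal illustration with made-up players, not the data or code used for the analysis: it fits an ordinary least squares line of per-game variance on per-game mean and ranks players by how far below the line they fall. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def variance_residuals(players):
    """players: list of (name, mean, variance) tuples.

    Returns (name, residual) pairs, most negative residual first. A negative
    residual means the player is more consistent than the line predicts for
    their mean production.
    """
    means = [m for _, m, _ in players]
    variances = [v for _, _, v in players]
    slope, intercept = fit_line(means, variances)
    residuals = [(name, v - (intercept + slope * m)) for name, m, v in players]
    return sorted(residuals, key=lambda pair: pair[1])


# Made-up players for illustration only.
toy_players = [("A", 0.80, 0.60), ("B", 1.00, 0.70), ("C", 1.20, 0.75), ("D", 1.30, 0.90)]
for name, resid in variance_residuals(toy_players):
    print(f"{name}: {resid:+.3f}")
```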

Most Consistent Batters, 2018-2020

| Player | Mean | Variance | Residual |
| --- | --- | --- | --- |
| Mike Moustakas | .960 | .600 | -.183 |
| Alex Bregman | 1.335 | .809 | -.165 |
| Jason Kipnis | .985 | .662 | -.134 |
| Max Muncy | .978 | .669 | -.123 |
| Kris Bryant | 1.237 | .812 | -.112 |
| Albert Pujols | .929 | .657 | -.110 |
| Xander Bogaerts | 1.262 | .827 | -.109 |
| Josh Bell | 1.075 | .736 | -.105 |
| Justin Smoak | .996 | .696 | -.105 |
| Khris Davis | .870 | .636 | -.101 |

The issue with the magnitude of these variances is that given the coefficient associated with mean and variance in terms of team winning percentage, variance is not relevant enough to warrant substantial value. If we design a variable called “Variance-Adjusted Value” (VaV) with the coefficients taken from the logistic regression…

`VaV = 0.1095*(Mean) - 0.0056*(Variance)`

… then we compare VaV to mean production and find a correlation of 0.9997. This means that a player’s mean production is so much more important than their variance that ranking players by VaV is almost indistinguishable from ranking them by mean production.
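As a rough check on how little the variance term moves the needle, the sketch below applies the VaV formula to the ten players in the Most Consistent Batters table and correlates the result with their means. Because this is only the ten-player subset rather than the full sample, it should not be expected to reproduce the 0.9997 figure exactly. The helper names are ours.

```python
def vav(mean, variance):
    """Variance-Adjusted Value, using the coefficients quoted in the article."""
    return 0.1095 * mean - 0.0056 * variance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# (mean, variance) pairs copied from the Most Consistent Batters table.
table = {
    "Mike Moustakas": (0.960, 0.600), "Alex Bregman": (1.335, 0.809),
    "Jason Kipnis": (0.985, 0.662), "Max Muncy": (0.978, 0.669),
    "Kris Bryant": (1.237, 0.812), "Albert Pujols": (0.929, 0.657),
    "Xander Bogaerts": (1.262, 0.827), "Josh Bell": (1.075, 0.736),
    "Justin Smoak": (0.996, 0.696), "Khris Davis": (0.870, 0.636),
}

means = [m for m, _ in table.values()]
vavs = [vav(m, v) for m, v in table.values()]
print(f"correlation between mean and VaV (ten-player subset): {pearson(means, vavs):.4f}")
```

The mean coefficient is roughly twenty times the size of the variance coefficient, so differences in variance of a few tenths barely shift the ordering.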

So at first order, introducing variance to an individual player’s value does little to nothing to change our understanding of their value. One concern with our approach is that in logistic regression, the coefficients do not apply directly to win probability but to the log odds. This means that the relationship is nonlinear, so an increase in the mean affects the win probability by different amounts depending on the previous value of the mean. When applied to players, this would mean that their value to the team depends on the team’s ability without the player. We do not examine this avenue in this analysis, but it is a worthwhile extension.

Although we did not find relevance at an individual level for variance, that is not to say that variance is unimportant, as we saw quite definitively that variance was a significant predictor of both win probability and winning percentage. Rather, the effect is seen at the team level. One next step to consider would be covariance across players. It is conceivable that an individual has a low or negative covariance with others, meaning that they decrease the variance of the lineup. This would mean that different lineup constructions could take advantage of the relevance of variance.
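The role of covariance follows from the standard identity for the variance of a sum. As a sketch, let X_i be the per-game production of hitter i in an n-player lineup (our notation, not something estimated in this analysis):

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right)
  = \sum_{i=1}^{n} \operatorname{Var}(X_i)
  + 2 \sum_{1 \le i < j \le n} \operatorname{Cov}(X_i, X_j)
```

Holding each hitter's own variance fixed, lineup variance falls when the pairwise covariances are negative and rises when they are positive, so two hitters with identical individual variance can affect lineup consistency differently depending on how their good and bad games line up with their teammates'.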

Another avenue to pursue is other measurements of consistency. Rather than variance, one could quantify consistency by the weight in the tails of the run-scoring distribution. Or for another example, a statistic other than wOBA could be used to quantify a specific player’s consistency. One possibility we are particularly curious about is the concept of hot and cold streaks, meaning that the magnitude or duration of these streaks could capture consistency.

Regardless, it is clear that variability provides a potentially exploitable aspect of the game. While we were not able to find anything interesting at the individual level, that is not to say that the concept could not be useful moving forward. The importance of variance might be most relevant at the roster construction level. Since increased variance was beneficial in runs allowed, this would suggest that inconsistent pitching staffs might be more successful across a season than a consistent pitching staff of the same mean talent. Conversely, a consistent lineup would be expected to be more successful than an inconsistent lineup, which suggests the importance of depth and balance.

Regardless of the specific uses of variance in baseball, we are thrilled to see the statistical significance of the values and are fascinated by the avenues this might present moving forward.

MLBtoPDX#2024

This is spot on

newsense

It might be easier to show the impact of variance in individual player performance or covariance among players through simulation

kevo8

Really loved this piece. I’ve got two thoughts to add: 1) I think another plausible explanation for the positive impact of offensive consistency and negative impact of pitching consistency relates to blowouts. The marginal value of scoring presumably falls at high levels in a given game – if you are up 14-2, scoring a 15th run doesn’t help you much, and giving up that run doesn’t hurt the pitching team. 2) I suspect that looking at batter variance on the individual level is the wrong lens. Team-level variance presumably wouldn’t be too impacted by just one player’s consistency or inconsistency,…


Very interesting article. On the hitting side, there are some elements of consistency that lead to increased performance – namely a hitter’s std dev. of launch angle. I know this is a different direction than the article but just pointing out that examining consistency over a larger number of batted balls (don’t recall the number needed but want to say it’s around 100 BBEs), there is a connection between consistency and wOBA – I haven’t checked but it would be interesting to see if the wOBAs for those rolling BBEs have a lower variance in addition to having a higher…

jts19

Very cool work. One additional reason that run-scoring variance may decrease win probability is that the distribution of runs per game is likely right-skewed, such that a team with a high variance will have a lower median (and a handful more blowout wins) than a low-variance team with a higher median. To another commenter’s point, the marginal value of each run decreases the higher you get.

Lanidrac

Simple, you use binary, and 1 + 1 = 10.


I have to say, this is maybe one of the top 10 papers I’ve read here. A question, does z-scoring the data have any impact?