Team Construction, OBP, and the Importance of Variance

December 19, 2013

A recent article by ncarrington brought up an interesting point, and it’s one that merits further investigation. The article argues that even when two teams have similar team on-base percentages, a lack of consistency within one lineup will cause that team to underperform its collective numbers when it comes to run production. A balanced team, on the other hand, will score more runs. That’s our hypothesis.

How does the scientific method work again? Er, never mind, let’s just look at the data.

To gain an initial understanding, we’re going to start by looking at how teams fared in 2013. We’ll calculate a league average runs/OBP number that will work as a proxy for how many runs a team should be expected to score based on its OBP. Then we’ll calculate the standard deviation of each team’s OBP (weighted by plate appearances) and compare that to the league average standard deviation. If our hypothesis is true, teams with relatively low OBP deviations will outperform their expected runs scored number.
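For readers who want to follow along, here is a minimal sketch of the two per-team numbers described above. The article does not spell out its exact weighting, so the formula below (a population standard deviation with plate appearances as weights) is an assumption, and the function names and the two-player roster are made up purely for illustration.

```python
import math


def weighted_obp_std(players):
    """PA-weighted standard deviation of OBP.

    `players` is a list of (obp, plate_appearances) pairs.
    Assumption: population-style weighting, not necessarily the article's.
    """
    total_pa = sum(pa for _, pa in players)
    mean_obp = sum(obp * pa for obp, pa in players) / total_pa
    variance = sum(pa * (obp - mean_obp) ** 2 for obp, pa in players) / total_pa
    return math.sqrt(variance)


def obp_adjusted_runs(runs, team_obp, league_obp):
    """Runs / (OBP / LeagueOBP): runs scaled by how the team's OBP compares to the league's."""
    return runs / (team_obp / league_obp)


# Hypothetical two-player "roster" with equal playing time.
roster = [(0.400, 600), (0.300, 600)]
roster_std = weighted_obp_std(roster)

# A team with exactly league-average OBP keeps its raw run total.
average_team = obp_adjusted_runs(700, 0.320, 0.320)
```

A team with an OBP above the league mark has its run total scaled down, and a team below it has its total scaled up, which is what lets the adjusted figure be compared against a single league baseline.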

Of course, there’s a lot more to team production than OBP. We’re going to conquer that later. Bear with me–here’s 2013.

A few things to keep in mind while dissecting this chart: 668.5 is the baseline number for Runs/(OBP/LeagueOBP). Any team number above this means that the team is outperforming, while any number below it represents underperformance. The league average team OBP standard deviation is .162.

Team	Runs/(OBP/LeagueOBP)	OBP Standard Deviation
Royals	647.71	0.1
Rangers	710.22	0.17
Padres	632.53	0.14
Mariners	642.88	0.15
Angels	700.75	0.17
Twins	618.61	0.16
Tigers	723.95	0.12
Astros	642.5	0.15
Giants	620.1	0.15
Dodgers	627.18	0.21
Reds	673.82	0.19
Mets	638.45	0.18
Diamondbacks	668.02	0.16
Braves	675.02	0.16
Blue Jays	705.27	0.17
White Sox	622.92	0.15
Red Sox	768.53	0.19
Cubs	631.74	0.12
Athletics	738.61	0.15
Nationals	662.76	0.18
Brewers	650.02	0.16
Rays	669.46	0.18
Orioles	749.95	0.19
Rockies	689.93	0.18
Phillies	627.95	0.14
Indians	717.08	0.18
Pirates	637.87	0.17
Cardinals	744.3	0.2
Marlins	552.48	0.14
Yankees	666.17	0.14

That chart’s kind of a bear, so I’m going to break it up into buckets. In 2013 there were 16 teams that exhibited above-average variances. Of those, 11 outperformed expectations while only 5 underperformed expectations. Now for the flipside–of the 14 teams that exhibited below-average variances, only 2 outperformed expectations while a shocking 12(!) teams underperformed.

That absolutely flies in the face of our hypothesis. A startling 23 out of 30 teams suggest that a high variance will actually help a team score more runs while a low variance will cause a team to score fewer.

Before we get all comfy with our conclusions, however, we’re going to acknowledge how complicated baseball is. It’s so complicated that we have to worry about this thing called sample size, since we have no idea what’s going on until we’ve seen a lot of things go on. So I’m going to open up the floodgates on this particular study, and we’re going to use every team’s season since 1920. League average OBP standard deviation and runs/OBP numbers will be calculated for each year, and we’ll use the aforementioned bucket approach to examine the results.
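The bucket approach across many seasons can be sketched roughly like this. It assumes each team season has already been reduced to an adjusted runs figure and an OBP standard deviation, and that each year's baselines are simple averages over that year's teams; the field names and the four sample rows are hypothetical, not the real data set.

```python
from collections import Counter, defaultdict


def bucket_team_seasons(rows):
    """Count team seasons in each (variance, result) bucket.

    `rows` is a list of dicts with keys "year", "adj_runs", "obp_std".
    Each season is compared only against its own year's averages.
    Ties with the average fall into the "low" / "underperformed" side.
    """
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row)

    counts = Counter()
    for teams in by_year.values():
        avg_runs = sum(t["adj_runs"] for t in teams) / len(teams)
        avg_std = sum(t["obp_std"] for t in teams) / len(teams)
        for t in teams:
            variance = "high" if t["obp_std"] > avg_std else "low"
            result = "outperformed" if t["adj_runs"] > avg_runs else "underperformed"
            counts[(variance, result)] += 1
    return counts


# Hypothetical rows, two teams per year, just to exercise the function.
sample_rows = [
    {"year": 2012, "adj_runs": 700, "obp_std": 0.18},
    {"year": 2012, "adj_runs": 600, "obp_std": 0.14},
    {"year": 2013, "adj_runs": 720, "obp_std": 0.12},
    {"year": 2013, "adj_runs": 640, "obp_std": 0.20},
]
sample_counts = bucket_team_seasons(sample_rows)
```

Computing the averages year by year matters because league run environments have shifted a great deal since 1920, so a single all-time baseline would mostly sort teams by era.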

Team Seasons 1920-2013

Result	Occurrences
High variance, outperformed expectations	504
High variance, underperformed expectations	508
Low variance, outperformed expectations	492
Low variance, underperformed expectations	538

Small sample size strikes again. Will there ever be a sabermetric article that doesn’t talk about sample size? Maybe, but it probably won’t be written by me. Anyways, the point is that variance in team OBP has little to no effect on actual results when you up your sample size to 2000+. As a side note of some interest, I wondered if teams with high variances would tend to have bigger power numbers than their low variance counterparts. High variance teams have averaged an ISO of .132 since 1920. Low variance teams? .131. So, uh, not really.
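To put a number on "little to no effect," the 1920-2013 bucket counts can be turned into outperformance rates. This is just arithmetic on the table above, not an extra analysis from the original study, and the helper name is made up.

```python
# Bucket counts copied from the 1920-2013 table.
counts = {
    ("high", "outperformed"): 504,
    ("high", "underperformed"): 508,
    ("low", "outperformed"): 492,
    ("low", "underperformed"): 538,
}


def outperform_rate(counts, variance):
    """Share of team seasons in a variance bucket that beat expectations."""
    out = counts[(variance, "outperformed")]
    under = counts[(variance, "underperformed")]
    return out / (out + under)


total_seasons = sum(counts.values())         # 2042 team seasons
high_rate = outperform_rate(counts, "high")  # 504 / 1012
low_rate = outperform_rate(counts, "low")    # 492 / 1030

print(f"{total_seasons} team seasons")
print(f"high variance: {high_rate:.1%} outperformed")
print(f"low variance:  {low_rate:.1%} outperformed")
```

The two rates land at roughly 49.8% and 47.8%, a gap of about two percentage points across more than 2,000 team seasons.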

If you want to examine the ISO numbers a little more, here’s this: outperforming teams had an ISO of .144 while underperforming teams had an ISO of .120. These numbers remain the same for both high and low variance teams. It appears that overachieving/underachieving OBP expectations can be almost entirely explained by ISO.

I’m not satisfied with that answer, though. Was 2013 really just an aberration? What if we limit our sample to only teams that significantly outperformed or underperformed expectations (by 50 runs) while having a significantly large or small team OBP standard deviation?

Team Seasons 1920-2013, significant values only

Result	Occurrences
High variance, outperformed expectations	117
High variance, underperformed expectations	93
Low variance, outperformed expectations	101
Low variance, underperformed expectations	119

The numbers here do point a little bit more towards high variance leading to outperformance. High-variance teams are more likely to strongly outperform their expectations to the tune of about 20%, and the same is true for low-variance teams regarding underperforming. Bear in mind, however, that that is not a huge number, and that is not a huge sample size. If you’re trying to predict whether a team should outperform or underperform their collective means then variance is something to consider, but it isn’t the first place you should look.
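One way to see where a figure like "about 20%" can come from is to compare the two outcomes within each variance bucket. The article doesn't show its own calculation, so treat this as one plausible reading of the table rather than the method actually used.

```python
# Counts copied from the "significant values only" table.
high_out, high_under = 117, 93
low_out, low_under = 101, 119

# How much more common the "expected" outcome is within each bucket.
high_ratio = high_out / high_under  # high variance: outperform vs. underperform
low_ratio = low_under / low_out     # low variance: underperform vs. outperform

# The same information expressed as shares of each bucket.
high_share = high_out / (high_out + high_under)
low_share = low_under / (low_out + low_under)

print(f"high variance: {high_ratio:.2f}x as many outperformers ({high_share:.1%} of bucket)")
print(f"low variance:  {low_ratio:.2f}x as many underperformers ({low_share:.1%} of bucket)")
```

The ratios work out to roughly 1.26 and 1.18, which brackets the 20% figure, while the shares sit in the mid-50s rather than anywhere near a lopsided split.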

Being balanced is nice. Being consistent is nice. It’s something we have a natural inclination towards as humans–it’s why we invented farming, civilization, the light bulb, etc. But when you’re building a baseball team it’s not something that’s going to help you win games. You win games with good players.

Brandon Reppert is a computer "scientist" who finds talking about himself in the third-person peculiar.

9 Comments

Tom

11 years ago

Great work! Fascinating stuff.

evo34

11 years ago

Interesting article. A couple notes:

1) ISO is not a good comparative stat (for anything, really). The reason is that it does not actually isolate power. A .220 hitter with a .400 SLG (.180 ISO) has a ton more true isolated power than a .320 hitter with a .500 SLG (.180 ISO). Why? Because the .320 hitter put more balls into play, and plenty of weakly hit balls in play turn into doubles and triples, essentially inflating the player’s perceived raw power vs. that of a lower average hitter.

2) A similar issue exists with your use of standard deviation of OBP within a team. A .300 OBP team with a .015 stdev is more volatile than a .350 OBP team with a .015 stdev. But your analysis treats them as equally volatile (as far as I can tell).

So, basically, I think there is a selection bias when you find that teams with higher raw variance have outperformed. I.e., better teams should naturally have higher raw statistical variance than average simply because of scaling.

Brandon Reppert

11 years ago

Reply to evo34

Thanks for the feedback!

Regarding point (1), ISO tested what I wanted it to, since I didn’t want to test just raw power, but how that power was utilized. I wanted to see why teams outperformed, and it was basically because they gained more bases each time they got on base. Which makes sense. What you say is correct, though–ISO does not measure raw power.

(2) is a good point, but it doesn’t make much of a difference. I originally used a quick-and-dirty constant rather than a variable coefficient, but I just now redid the results using a variable coefficient for significance based upon the team’s OBP. Here are the edited results for the last table:

Results | # | ISO
High variance, increases runs | 116 | 0.15
High variance, decreases runs | 95 | 0.109
Low variance, increases runs | 100 | 0.155
Low variance, decreases runs | 124 | 0.1081

Dre

11 years ago

Great point, evo. Perhaps coefficient of variation would be a better measure of consistency.
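As a quick illustration of the coefficient of variation idea, here it is applied to evo34's two example teams. The function name is made up, and the inputs are just the figures from the comment above, not real team data.

```python
def coefficient_of_variation(std, mean):
    """Standard deviation expressed as a fraction of the mean."""
    return std / mean


# Same raw standard deviation (.015), different team OBP levels.
low_obp_team = coefficient_of_variation(0.015, 0.300)
high_obp_team = coefficient_of_variation(0.015, 0.350)

print(f".300 OBP team: CV = {low_obp_team:.4f}")
print(f".350 OBP team: CV = {high_obp_team:.4f}")
```

The .300 OBP team comes out with the larger value (about 0.050 versus about 0.043), which matches the point that it is the more volatile of the two despite the identical raw standard deviation.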

Smell the Glove

11 years ago

Ok, think about this.

On the low-scoring side of the coin, variation would be low because you can’t go below zero. On the high-scoring side, there’s really no limit. A blow-out can be 8, 10, 12 runs.

So it’s really the act of underperforming that begets low variance, not the other way around.

Brandon Reppert

11 years ago

Reply to Smell the Glove

Excellent thought, that’s absolutely true.

studstats_13

11 years ago

Good article; could have explained the statistics a little bit more.

Mark McCluskey

11 years ago

I’m not sure I understand the hypothesis that variance is bad. The Hall of Fame is not filled with average players; rather, it is filled with players far above the average, and that gap between average and elite is variance. Now, possessing variance is not the key to success, since you can be as far below average as above it. The key is managing variance.

There is a concept in banking and insurance called “diversification benefit”. Essentially, the good performance of some assets will cancel the bad performance of others, thus lowering the total risk of the portfolio in aggregate. If you have two portfolios with the same total variance, the one with the lower correlation stands a better chance of higher returns, since the likelihood of sustaining significant losses across multiple assets is greatly reduced.

The same can be said of a baseball team, which itself is a portfolio of risky assets. A team stands a better chance of scoring runs when the performance of hitters is not conditional on the performance of those ahead of or behind them in the batting order. If you have high total variance in OBP, but low correlation between individual performances day-to-day, that team stands a good chance to score a lot of runs. It won’t have a lot of 10-run outings, but it won’t have a lot of shutouts either.

DaveO

10 years ago

This entire hypothesis depends on the assumption that Runs/(OBP/LeagueOBP) is somewhat of a constant. This is not the case. Teams do a better job of converting OBP into Runs if they also have a high number of XBH, especially Home Runs.
