Team Construction, OBP, and the Importance of Variance

by Brandon Firstname, December 19, 2013

A recent article by ncarrington brought up an interesting point, and it’s one that merits further investigation. The article argues that even though two teams may have similar team-average on-base percentages (OBP), a lack of consistency within one team will cause it to underperform its collective numbers when it comes to run production. A balanced team, on the other hand, will score more runs. That’s our hypothesis.

How does the scientific method work again? Er, never mind, let’s just look at the data.

To gain an initial understanding, we’re going to start by looking at how teams fared in 2013. We’ll calculate a league-average runs/OBP number that will work as a proxy for how many runs a team should be expected to score based on its OBP. Then we’ll calculate the standard deviation of each team’s OBP (weighted by plate appearances) and compare that to the league-average standard deviation. If our hypothesis is true, teams with relatively low OBP deviations will outperform their expected runs-scored number.

Of course, there’s a lot more to team production than OBP. We’re going to conquer that later. Bear with me; here’s 2013.

A few things to keep in mind while dissecting this chart:

668.5 is the baseline number for Runs/(OBP/LeagueOBP). Any team number above this means that the team is outperforming, while any number below represents underperformance.
The league-average team OBP standard deviation is .162.

| Team | Runs/(OBP/LeagueOBP) | OBP Standard Deviation |
|---|---|---|
| Royals | 647.71 | 0.10 |
| Rangers | 710.22 | 0.17 |
| Padres | 632.53 | 0.14 |
| Mariners | 642.88 | 0.15 |
| Angels | 700.75 | 0.17 |
| Twins | 618.61 | 0.16 |
| Tigers | 723.95 | 0.12 |
| Astros | 642.50 | 0.15 |
| Giants | 620.10 | 0.15 |
| Dodgers | 627.18 | 0.21 |
| Reds | 673.82 | 0.19 |
| Mets | 638.45 | 0.18 |
| Diamondbacks | 668.02 | 0.16 |
| Braves | 675.02 | 0.16 |
| Blue Jays | 705.27 | 0.17 |
| White Sox | 622.92 | 0.15 |
| Red Sox | 768.53 | 0.19 |
| Cubs | 631.74 | 0.12 |
| Athletics | 738.61 | 0.15 |
| Nationals | 662.76 | 0.18 |
| Brewers | 650.02 | 0.16 |
| Rays | 669.46 | 0.18 |
| Orioles | 749.95 | 0.19 |
| Rockies | 689.93 | 0.18 |
| Phillies | 627.95 | 0.14 |
| Indians | 717.08 | 0.18 |
| Pirates | 637.87 | 0.17 |
| Cardinals | 744.30 | 0.20 |
| Marlins | 552.48 | 0.14 |
| Yankees | 666.17 | 0.14 |

That chart’s kind of a bear, so I’m going to break it up into buckets. In 2013 there were 16 teams that exhibited above-average variances. Of those, 11 outperformed expectations while only 5 underperformed. Now for the flip side: of the 14 teams that exhibited below-average variances, only 2 outperformed expectations while a shocking 12(!) teams underperformed.

That absolutely flies in the face of our hypothesis. A startling 23 out of 30 teams suggest that a high variance will actually help a team score more runs while a low variance will cause a team to score fewer.

Before we get all comfy with our conclusions, however, we’re going to acknowledge how complicated baseball is. It’s so complicated that we have to worry about this thing called sample size, since we have no idea what’s going on until we’ve seen a lot of things go on. So I’m going to open up the floodgates on this particular study, and we’re going to use every team season since 1920. League-average OBP standard deviation and runs/OBP numbers will be calculated for each year, and we’ll use the aforementioned bucket approach to examine the results.
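For anyone who wants to reproduce the bucketing, here is a minimal Python sketch of the steps described above. It is an illustration, not the code used for this article: the function names are made up, the sample lineup is hypothetical, and the weighted standard deviation uses the population form with plate appearances as weights, which is an assumption about how the weighting was done.

```python
import math


def weighted_std(obps, plate_appearances):
    """Population standard deviation of player OBPs, weighted by PA.

    Assumption: the article only says "weighted to plate appearances";
    this is one common way to do that.
    """
    total_pa = sum(plate_appearances)
    mean = sum(o * pa for o, pa in zip(obps, plate_appearances)) / total_pa
    variance = sum(
        pa * (o - mean) ** 2 for o, pa in zip(obps, plate_appearances)
    ) / total_pa
    return math.sqrt(variance)


def obp_adjusted_runs(runs, team_obp, league_obp):
    """Runs/(OBP/LeagueOBP): runs scaled by the team's OBP relative to league."""
    return runs / (team_obp / league_obp)


def bucket(adjusted_runs, baseline, obp_sd, league_sd):
    """Place a team season into one of the four buckets used in the article."""
    spread = "High variance" if obp_sd > league_sd else "Low variance"
    result = "outperformed" if adjusted_runs > baseline else "underperformed"
    return f"{spread}, {result} expectations"


# Hypothetical three-player lineup, purely to show the call shape.
sample_sd = weighted_std([0.300, 0.350, 0.400], [600, 500, 400])
print(f"weighted OBP standard deviation: {sample_sd:.3f}")

# Two rows from the 2013 table, with the 2013 baselines (668.5 and .162).
print(bucket(723.95, 668.5, 0.12, 0.162))  # Tigers
print(bucket(768.53, 668.5, 0.19, 0.162))  # Red Sox
```

Running `bucket` over every team season and counting the four labels would give a table in the same shape as the one that follows, provided the baselines are recalculated for each year.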
Team Seasons 1920-2013

| Result | Occurrences |
|---|---|
| High variance, outperformed expectations | 504 |
| High variance, underperformed expectations | 508 |
| Low variance, outperformed expectations | 492 |
| Low variance, underperformed expectations | 538 |

Small sample size strikes again. Will there ever be a sabermetric article that doesn’t talk about sample size? Maybe, but it probably won’t be written by me. Anyway, the point is that variance in team OBP has little to no effect on actual results when you up your sample size to 2,000+ team seasons.

As a side note of some interest, I wondered if teams with high variances would tend to have bigger power numbers than their low-variance counterparts. High-variance teams have averaged an ISO (isolated power) of .132 since 1920. Low-variance teams? .131. So, uh, not really. If you want to examine the ISO numbers a little more, here’s this: outperforming teams had an ISO of .144 while underperforming teams had an ISO of .120. These numbers remain the same for both high- and low-variance teams. It appears that overachieving or underachieving OBP expectations can be almost entirely explained by ISO.

I’m not satisfied with that answer, though. Was 2013 really just an aberration? What if we limit our sample to only teams that significantly outperformed or underperformed expectations (by 50 runs) while having a significantly large or small team OBP standard deviation?

Team Seasons 1920-2013, significant values only

| Result | Occurrences |
|---|---|
| High variance, outperformed expectations | 117 |
| High variance, underperformed expectations | 93 |
| Low variance, outperformed expectations | 101 |
| Low variance, underperformed expectations | 119 |

The numbers here do point a little bit more towards high variance leading to outperformance. High-variance teams are more likely to strongly outperform their expectations, to the tune of about 20%, and the same is true for low-variance teams when it comes to underperforming. Bear in mind, however, that that is not a huge number, and that is not a huge sample size.
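The "about 20%" figure can be sanity-checked from the four counts in the significant-values table. The sketch below shows one plausible reading: how much more often the favored outcome occurred than its opposite within each variance group. The article does not spell out the exact calculation, so treat the helper `ratio_edge` and this interpretation as assumptions.

```python
# Counts copied from the "significant values only" table.
counts = {
    ("high", "outperformed"): 117,
    ("high", "underperformed"): 93,
    ("low", "outperformed"): 101,
    ("low", "underperformed"): 119,
}


def ratio_edge(favored, other):
    """Fraction by which `favored` occurrences exceed `other` occurrences."""
    return favored / other - 1


# High-variance teams: strong outperformance relative to strong underperformance.
high_edge = ratio_edge(
    counts[("high", "outperformed")], counts[("high", "underperformed")]
)
# Low-variance teams: strong underperformance relative to strong outperformance.
low_edge = ratio_edge(
    counts[("low", "underperformed")], counts[("low", "outperformed")]
)

total = sum(counts.values())
print(f"team seasons in this sample: {total}")    # 430
print(f"high-variance edge: {high_edge:.1%}")     # 25.8%
print(f"low-variance edge:  {low_edge:.1%}")      # 17.8%
```

The two edges bracket the 20% mark, and the whole comparison rests on only 430 team seasons, which is why the caveat about sample size applies.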
If you’re trying to predict whether a team should outperform or underperform its collective means, then variance is something to consider, but it isn’t the first place you should look.

Being balanced is nice. Being consistent is nice. It’s something we have a natural inclination towards as humans; it’s why we invented farming, civilization, the light bulb, etc. But when you’re building a baseball team, it’s not something that’s going to help you win games. You win games with good players.