How Often Is the “Best Team” Really the Best?

by Cavarretta

January 31, 2017

We know the playoffs are a crapshoot. A 5- or 7-game series tells us very little about which team is actually the better team. But it is easy to forget that the regular season is a crapshoot, too, just with a larger sample size. Teams go into a given game with a certain probability of winning, based on their true-talent levels (i.e., their probability of winning a game against a .500 team). And then, as luck decides, one team wins and the other loses. A season is just the sum total of 162 luck-based games for each team, and there is no guarantee that the luck must even out in the end.
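To put a rough number on that luck, here is a back-of-the-envelope calculation. It assumes, as a simplification, that a team has the same win probability p in every one of its n = 162 games, so that its win total is binomial. For a true .500 team:

```latex
\sigma_{\text{wins}} = \sqrt{n\,p\,(1-p)} = \sqrt{162 \times 0.5 \times 0.5} = \sqrt{40.5} \approx 6.4
```

Under that assumption, luck alone moves a season win total by about six wins in either direction in a typical year, and a swing of a dozen wins is not unusual.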

After the regular season, the team with the best record is usually proclaimed “the best team in baseball.” It was the Cubs in 2016, the Cardinals the year before, and the Angels the year before that. But were those teams really the best? We can’t tell just by looking at their records. It would be great if we knew the true-talent level of every team. But baseball doesn’t give us probabilities of teams winning; it only gives us outcomes. The same flaw exists for Pythagorean record, BaseRuns, or any other metric you might use to evaluate a team at season’s end. BaseRuns gets the closest to a team’s true-talent level, because it uses a sample size of thousands of plate appearances, but it’s still an estimate based on outcomes, and not the underlying probabilities of those outcomes.

I wanted to know the probability that the team with the most true talent finishes the regular season with the best record in baseball. Since there’s no way to test that empirically, I ran a simulation in R. For each trial of the simulation, every team was assigned a random true-talent level drawn from a normal distribution (see Phil Birnbaum’s blog post for my methodology, although I based my calculations for true-talent variance on win totals from the two-wild-card era). The teams then played through the 2017 schedule, with each game simulated using Bill James’ log5 formula. If the team with the most wins matched the team with the most true talent, that trial counted as a success. Trials in which two or more teams tied for the most wins were thrown out altogether.
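For readers who want to experiment, here is a minimal sketch of that kind of simulation. It is written in Python rather than R and is not the code behind the numbers in this post. Several pieces are assumptions made for illustration: the talent spread `talent_sd=0.06` is a placeholder rather than the value derived from actual win totals, and the real 2017 schedule is replaced by a balanced round robin in which every pair of teams meets `games_per_pair` times (174 games per team with the defaults). The function names are hypothetical as well.

```python
import random


def log5(p_a, p_b):
    """Bill James' log5: chance that team A beats team B, where each input
    is a team's win probability against a .500 opponent."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


def simulate_season(talents, games_per_pair, rng):
    """Play a balanced round robin (an assumption, not the real schedule)
    and return each team's win total."""
    n = len(talents)
    wins = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            p = log5(talents[i], talents[j])
            for _ in range(games_per_pair):
                if rng.random() < p:
                    wins[i] += 1
                else:
                    wins[j] += 1
    return wins


def run_trials(n_trials, n_teams=30, talent_sd=0.06, games_per_pair=6, seed=0):
    """Return (untied, matches): seasons with a single best record, and how
    many of those were won by the most talented team."""
    rng = random.Random(seed)
    untied = matches = 0
    for _ in range(n_trials):
        # talent_sd is a placeholder spread, not an estimate from real data
        talents = [rng.gauss(0.5, talent_sd) for _ in range(n_teams)]
        wins = simulate_season(talents, games_per_pair, rng)
        top = max(wins)
        leaders = [i for i, w in enumerate(wins) if w == top]
        if len(leaders) > 1:
            continue  # ties for the best record are thrown out
        untied += 1
        if leaders[0] == talents.index(max(talents)):
            matches += 1
    return untied, matches


if __name__ == "__main__":
    untied, matches = run_trials(2000)
    print(f"untied seasons: {untied}, best record was best talent: {matches}")
```

Because the schedule and the talent spread here are stand-ins, the ratio `matches / untied` should not be expected to reproduce the figures reported in this post. The sketch is only meant to show the shape of the procedure.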

I ran through one million simulated seasons using this method. In 91.2% of them, a single team finished with the best record in baseball. But in those seasons, the team with the best record matched the team with the most true talent only 43.1% of the time.
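As a quick check on how much sampling noise is left in that 43.1% figure, the sketch below computes the usual binomial standard error of a proportion. It assumes the trials are independent and takes the number of untied seasons to be 91.2% of one million. The variable names are illustrative.

```python
import math

n_seasons = 1_000_000
share_untied = 0.912   # seasons with a single best record
p_match = 0.431        # best record matched best true talent

n_untied = n_seasons * share_untied
# Standard error of a proportion estimated from n_untied independent trials
std_err = math.sqrt(p_match * (1 - p_match) / n_untied)

print(f"standard error: {std_err:.5f}")
```

The standard error works out to roughly 0.0005, or about 0.05 percentage points, so with this many trials the simulation noise is small next to the estimate itself. Any remaining uncertainty comes from the modeling assumptions, not from the number of simulated seasons.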

So, given that a team finishes with the best record in baseball, there is a 43.1% chance that it is actually the best team. More likely than not, some other team was more talented. Even after 162 games, we can’t really be sure who deserved to come out on top.

3 Comments

Bill but not Ted

8 years ago

I really enjoyed this post. Logical premise and execution. Stuff like this is why baseball is appealing to me: no guarantees. Chance matters.

bunslow

The next obvious question is how many games does it typically take to reach a given level of “confidence” that the best record is also the best true talent?

Cavarretta

Reply to bunslow

Interesting question. It would depend on how those games are scheduled, as the limited number of interleague games means that comparisons between NL and AL teams aren’t very reliable.
