Throughout the baseball season, I like to estimate teams' wins, but I don't do it the traditional way. Some time ago, I discovered that I could use innings pitched to get a close estimate. Here's what I do:
1) Take the team's games played and divide by 2;
2) Take the team's innings pitched and subtract its opponents' innings pitched;
3) Add the results of steps 1 and 2.
For example, the Washington Nationals, as of the All-Star break, have played 88 games. They have 789.33 IP, and their opponents have 781.33 IP. So I take 88 divided by 2, which gives me 44. Then I take 789.33 minus 781.33, which gives me 8. Then 44 plus 8 gives me an estimate of 52 team wins. Checking the standings, I see that Washington indeed has 52 wins.
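If you'd rather let a computer do the arithmetic, here's a minimal Python sketch of the three steps. The function name `estimate_wins_ip` is just an illustrative choice, and the sketch assumes innings are entered as decimals (789.33 for 789 1/3 innings), the way they appear above.

```python
def estimate_wins_ip(games: int, ip: float, opp_ip: float) -> float:
    """Estimate wins as half of games played plus the innings-pitched differential.

    Assumes decimal innings (789.33 means 789 1/3), not the .1/.2 "thirds" notation.
    """
    half_games = games / 2   # step 1: games played divided by 2
    ip_diff = ip - opp_ip    # step 2: team IP minus opponents' IP
    return half_games + ip_diff  # step 3: add the two


# Washington Nationals at the All-Star break (figures from the example above)
print(round(estimate_wins_ip(88, 789.33, 781.33), 2))  # 52.0
```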
How does my method compare with the traditional Pythagorean method? (The Pythagorean method, of course, divides runs scored squared by the sum of runs scored squared and runs allowed squared.) I've set up some charts to demonstrate. First, let me present the relevant statistics for all teams as of the All-Star break (all statistics courtesy of CBS Sportsline):
| Team | G | IP | Opp. IP | R | RA |
|---|---|---|---|---|---|
| Chi. White Sox | 87 | 760.33 | 771.33 | 397 | 429 |
Now let me present a chart showing how many wins each team is predicted to have under my method and under the Pythagorean method (for the Pythagorean method, I'm using 1.82 as my exponent, as shown by MLB on its Standings page):
| Team | EST W (IP) | EST W (R) | Actual W |
|---|---|---|---|
| Chi. White Sox | 32.50 | 40.44 | 38 |
My method appears in the second column, and the Pythagorean method appears in the third column, with actual team wins in the last column. My method, as shown above, gives estimated wins directly. The Pythagorean method actually computes winning percentage. To get the estimated wins for the Pythagorean method, I multiplied the team’s estimated winning percentage by the team’s games played.
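As a cross-check on the chart, here's a minimal Python sketch that reproduces the White Sox row. It assumes the 1.82 exponent mentioned above, so the winning percentage is R^1.82 / (R^1.82 + RA^1.82); the function names are illustrative, not from any library.

```python
def pythagorean_wins(games: int, runs: int, runs_allowed: int,
                     exponent: float = 1.82) -> float:
    """Pythagorean estimate: R^x / (R^x + RA^x) is the winning percentage,
    which is multiplied by games played to get estimated wins."""
    scored = runs ** exponent
    allowed = runs_allowed ** exponent
    win_pct = scored / (scored + allowed)
    return win_pct * games


def estimate_wins_ip(games: int, ip: float, opp_ip: float) -> float:
    """Innings-pitched method: half of games played plus the IP differential."""
    return games / 2 + (ip - opp_ip)


# Chicago White Sox row from the statistics table
games, ip, opp_ip, runs, runs_allowed = 87, 760.33, 771.33, 397, 429
print(f"{estimate_wins_ip(games, ip, opp_ip):.2f}")           # 32.50
print(f"{pythagorean_wins(games, runs, runs_allowed):.2f}")   # 40.44
```

Passing `exponent=2` gives the classic squared version described earlier.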
The methods are pretty close! For a couple of teams, though, the methods miss by a wide margin. I'm way off on the Angels, for example, while Pythagoras is off on the Giants. But which of these methods is closer overall? I computed the r-squared between each of the estimated-win columns and actual wins and got these results:
| RSQ (IP) | RSQ (R) |
|---|---|
Mine’s a little higher, but let’s use mean squared error (MSE) as a cross-check. Here are my numbers:
| Team | MSE (IP) | MSE (R) |
|---|---|---|
| Chi. White Sox | 30.25 | 5.94 |
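For anyone who wants to check my numbers, here's a minimal Python sketch of both measures. Each team's figure in the table is its squared error, (estimated wins minus actual wins) squared, and the MSE is the average of those figures across all teams. For r-squared, the sketch assumes the square of the Pearson correlation, which is what a spreadsheet RSQ function returns; the function names are hypothetical.

```python
def squared_errors(predicted: list[float], actual: list[float]) -> list[float]:
    """Per-team squared error: (estimated wins - actual wins) ** 2."""
    return [(p - a) ** 2 for p, a in zip(predicted, actual)]


def mean_squared_error(predicted: list[float], actual: list[float]) -> float:
    """Average of the per-team squared errors."""
    errors = squared_errors(predicted, actual)
    return sum(errors) / len(errors)


def r_squared(predicted: list[float], actual: list[float]) -> float:
    """Squared Pearson correlation between estimates and actual wins.

    Assumed to match a spreadsheet RSQ function; needs at least two teams
    with differing values. The 1/n factors cancel, so raw sums suffice.
    """
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov ** 2 / (var_p * var_a)


# Chicago White Sox, IP method: (32.50 - 38) ** 2
print(squared_errors([32.50], [38]))  # [30.25]
```

To reproduce the overall comparison, you'd pass every team's estimates and actual wins; only the White Sox row is shown here, so the example checks just that team's squared error.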
I'm not a numbers person, so if I've made any errors in my calculations, please let me know, and I will never, ever trouble you fine readers again with another post. But I've published previous studies of both methods (in other places, under other names) and have found each time that my method edges out the Pythagorean in both r-squared and MSE.
If my method works at all, it's because better teams typically end up getting more outs than their opponents. If the Dodgers, say, are at home against the Phillies, chances are they're already winning when the bottom of the ninth comes around, so the Dodgers don't have to come to bat. That means the Dodgers had to get 27 outs and the Phillies had to get only 24. Conversely, on the road, if the Dodgers are leading the Phillies, the Phillies still come to bat in the bottom of the ninth, and the Dodgers have to get the full 27 outs to end the game.
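Under a few simplifying assumptions, this reasoning becomes an exact identity. As a sketch, assume the team has played the same number of home and road games (H of each, so G = 2H), with no walk-off wins and no shortened games; write W_H and L_H for home wins and losses and W_R and L_R for road wins and losses (notation introduced here for illustration). A home win adds one inning to the team's innings-pitched differential (the team never bats in the final bottom half), a road loss subtracts one, and every other result leaves both sides with the same number of innings:

$$
\begin{aligned}
\mathrm{IP} - \mathrm{IP}_{\mathrm{opp}} &= (+1)\,W_H + 0 \cdot L_H + 0 \cdot W_R + (-1)\,L_R = W_H - L_R, \\
W - \frac{G}{2} &= (W_H + W_R) - H = (W_H + W_R) - (W_R + L_R) = W_H - L_R, \\
\text{so} \quad W &= \frac{G}{2} + \left(\mathrm{IP} - \mathrm{IP}_{\mathrm{opp}}\right).
\end{aligned}
$$

Real seasons break these assumptions: walk-off wins (which include every extra-inning home win) can give the home team less than a full inning's edge, rain-shortened games change the counts, and at midseason a team rarely has exactly as many home games as road games. That is presumably why the estimate is usually close rather than always exact.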
One caveat: my method tends to be more descriptive than predictive, so it's a better measure of how a team has performed than of how it will perform in the future. The Pythagorean method is much better as a predictive tool.
So there it is: my method for estimating team wins. I hope you find it useful.