Estimating Team Wins With Innings Pitched

Throughout the baseball season, I like to estimate team wins, but I don’t do it in the traditional way. Some time ago, I discovered that I could use innings pitched to get a close estimate. Here’s what I do:

1) Take team games played and divide by 2;

2) Take the team’s innings pitched and subtract the team opponents’ innings pitched;

3) Add the results of steps 1 and 2.

For example, the Washington Nationals, as of the All-Star break, have played 88 games. They have 789.33 IP, and their opponents have 781.33 IP. So I take 88 divided by 2, which gives me 44. Then I take 789.33 minus 781.33, which gives me 8. Then 44 plus 8 gives me an estimate of 52 team wins. Checking the standings, I see that Washington indeed has 52 wins.
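The three steps can be sketched in a few lines of Python. The function name `estimate_wins_ip` and its parameter names are labels chosen here for illustration, not part of any library; the inputs are Washington's numbers from the example above.

```python
def estimate_wins_ip(games_played, innings_pitched, opp_innings_pitched):
    """Estimate team wins as GP / 2 plus the innings-pitched differential."""
    half_games = games_played / 2                            # step 1
    ip_differential = innings_pitched - opp_innings_pitched  # step 2
    return half_games + ip_differential                      # step 3


# Washington at the All-Star break: 88 GP, 789.33 IP, 781.33 opponents' IP
washington_estimate = estimate_wins_ip(88, 789.33, 781.33)
print(round(washington_estimate, 2))  # 52.0
```

The sketch takes innings in the decimal form used in this post, where .33 and .67 stand for one and two outs.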

How does my method compare with the traditional Pythagorean method? (The Pythagorean method, of course, takes runs scored squared and divides by runs scored squared plus runs allowed squared.) I’ve set up some charts to demonstrate. First, let me present the relevant statistics for all teams as of the All-Star break, where GP is games played, IP is innings pitched, IPA is opponents’ innings pitched, R is runs scored, and RA is runs allowed (all statistics courtesy CBS Sportsline):

Team             GP      IP     IPA    R   RA
Arizona          89     797     787  446  344
Atlanta          87     783  787.67  405  449
Baltimore        88  782.67  790.67  392  470
Boston           89  794.67     795  431  366
Chi. Cubs        88     785     787  399  399
Chi. White Sox   87  760.33  771.33  397  429
Cincinnati       88  781.67  786.67  424  463
Cleveland        87  768.67  763.67  421  347
Colorado         91  812.33  806.67  461  419
Detroit          87  762.67  766.67  409  440
Houston          89     800  784.33  527  365
Kansas City      87  775.33  775.67  362  387
L.A. Angels      92     817  824.33  377  399
L.A. Dodgers     90  806.33  786.67  463  300
Miami            87  771.67     777  410  429
Milwaukee        91  818.67  809.33  451  406
Minnesota        88  785.67     781  403  463
N.Y. Mets        86     773     775  406  455
N.Y. Yankees     86     768  765.33  477  379
Oakland          89     784  790.67  382  470
Philadelphia     87     775  790.33  332  424
Pittsburgh       89  800.67     802  378  403
San Diego        88  776.33     781  312  440
San Francisco    90  813.33  827.33  431  435
Seattle          90     800  797.67  354  453
St. Louis        88     798     793  402  389
Tampa Bay        90     805  802.33  428  412
Texas            88  783.67     783  444  415
Toronto          88     789  788.33  366  430
Washington       88  789.33  781.33  486  396

Now let me present a chart showing how many team wins are predicted by my method and the Pythagorean method (for the Pythagorean method, I’m using 1.82 as my exponent, as shown by MLB on their Standings page):

Team            EST W (IP)  EST W (R)  Actual W
Arizona              54.50      54.82        53
Atlanta              38.83      39.43        42
Baltimore            36.00      36.80        42
Boston               44.17      51.07        50
Chi. Cubs            42.00      44.00        43
Chi. White Sox       32.50      40.44        38
Cincinnati           39.00      40.48        39
Cleveland            48.50      51.07        47
Colorado             51.16      49.45        52
Detroit              39.50      40.61        39
Houston              60.17      58.84        60
Kansas City          43.16      40.86        44
L.A. Angels          38.67      43.63        45
L.A. Dodgers         64.66      61.90        61
Miami                38.17      41.71        41
Milwaukee            54.84      49.84        50
Minnesota            48.67      38.47        45
N.Y. Mets            41.00      38.56        39
N.Y. Yankees         45.67      51.87        45
Oakland              37.83      36.20        39
Philadelphia         28.17      33.97        29
Pittsburgh           43.17      41.91        42
San Diego            39.33      30.67        38
San Francisco        31.00      44.62        34
Seattle              47.33      35.07        43
St. Louis            49.00      45.32        43
Tampa Bay            47.67      46.56        47
Texas                44.67      46.70        43
Toronto              44.67      37.59        41
Washington           52.00      52.11        52

My method appears in the second column, and the Pythagorean method appears in the third column, with actual team wins in the last column. My method, as shown above, gives estimated wins directly. The Pythagorean method actually computes winning percentage. To get the estimated wins for the Pythagorean method, I multiplied the team’s estimated winning percentage by the team’s games played.
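For readers who want to check the Pythagorean column, here is a minimal sketch of that calculation, assuming the 1.82 exponent mentioned above. The function name `pythagorean_wins` is an illustrative label, not an official formula implementation.

```python
def pythagorean_wins(runs_scored, runs_allowed, games_played, exponent=1.82):
    """Pythagorean winning percentage scaled up to games played."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    win_pct = scored / (scored + allowed)  # estimated winning percentage
    return win_pct * games_played          # estimated wins


# Chicago Cubs: 399 runs scored, 399 runs allowed, 88 games played
cubs_estimate = pythagorean_wins(399, 399, 88)
print(cubs_estimate)  # 44.0

# Washington: 486 runs scored, 396 runs allowed, 88 games played
washington_estimate = pythagorean_wins(486, 396, 88)
```

A team with equal runs scored and allowed comes out at exactly half its games played, which is why the Cubs land on 44.00 in the chart.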

The methods are pretty close! On a couple of teams, though, the methods miss by a wide margin. I’m way off on the Angels, for example, while Pythagoras is off on the Giants. But which of these methods is closer overall? I computed r-squared between each of the estimated win columns and the actual wins and got these results:

RSQ (IP) RSQ (R)
0.8497 0.7147
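The r-squared figures can be checked with a short function. This sketch assumes r-squared here means the squared Pearson correlation between estimated and actual wins (what a spreadsheet RSQ function returns); the post does not say which tool was used. The sample uses only the first five teams from the chart, so it will not reproduce the 30-team figures.

```python
def r_squared(estimates, actuals):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    mean_act = sum(actuals) / n
    covariance = sum((e - mean_est) * (a - mean_act)
                     for e, a in zip(estimates, actuals))
    var_est = sum((e - mean_est) ** 2 for e in estimates)
    var_act = sum((a - mean_act) ** 2 for a in actuals)
    return covariance ** 2 / (var_est * var_act)


# First five teams only (Arizona through Chi. Cubs), IP method vs. actual wins
ip_estimates = [54.50, 38.83, 36.00, 44.17, 42.00]
actual_wins = [53, 42, 42, 50, 43]
sample_rsq = r_squared(ip_estimates, actual_wins)
print(round(sample_rsq, 4))
```

Feeding in all 30 rows of either estimate column alongside the actual wins would give the full comparison.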

Mine’s a little higher, but let’s use mean squared error (MSE) as a cross-check, where lower is better. Here are the squared errors for each team, with the average (the MSE) in the last row:

Team            MSE (IP)  MSE (R)
Arizona             2.25     3.33
Atlanta            10.05     6.61
Baltimore          36.00    27.05
Boston             33.99     1.15
Chi. Cubs           1.00     1.00
Chi. White Sox     30.25     5.94
Cincinnati          0.00     2.20
Cleveland           2.25    16.60
Colorado            0.71     6.53
Detroit             0.25     2.60
Houston             0.03     1.34
Kansas City         0.71     9.86
L.A. Angels        40.07     1.88
L.A. Dodgers       13.40     0.81
Miami               8.01     0.50
Milwaukee          23.43     0.03
Minnesota          13.47    42.61
N.Y. Mets           4.00     0.20
N.Y. Yankees        0.45    47.20
Oakland             1.37     7.82
Philadelphia        0.69    24.74
Pittsburgh          1.37     0.01
San Diego           1.77    53.77
San Francisco       9.00   112.82
Seattle            18.75    62.92
St. Louis          36.00     5.36
Tampa Bay           0.45     0.19
Texas               2.79    13.70
Toronto            13.47    11.61
Washington          0.00     0.01
AVG                10.20    15.68
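Each row above is the squared difference between an estimate and the actual win total, and the AVG row is the mean of those values. A minimal sketch of that arithmetic, with illustrative function names and only the first three teams as input:

```python
def squared_errors(estimates, actuals):
    """Per-team squared difference between estimated and actual wins."""
    return [(e - a) ** 2 for e, a in zip(estimates, actuals)]


def mean_squared_error(estimates, actuals):
    """Average of the per-team squared errors."""
    errors = squared_errors(estimates, actuals)
    return sum(errors) / len(errors)


# Arizona, Atlanta, Baltimore: IP-method estimates vs. actual wins
ip_estimates = [54.50, 38.83, 36.00]
actual_wins = [53, 42, 42]
sample_errors = squared_errors(ip_estimates, actual_wins)
print([round(e, 2) for e in sample_errors])  # [2.25, 10.05, 36.0]
sample_mse = mean_squared_error(ip_estimates, actual_wins)
```

Running the same functions over all 30 teams should reproduce the AVG row, up to rounding in the published estimates.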

I’m not a numbers person, so if I’ve made errors in my calculations, please let me know, and I will never, ever trouble you fine readers again with another post. But I’ve published previous studies of both methods (in other places, under other names) and have found each time that my method edges out the Pythagorean in both r-squared and MSE.

If my method works at all, it’s because better teams typically have to get more outs to finish off their opponents. If the Dodgers, say, are at home against the Phillies, chances are they’re already winning when they go to the bottom of the ninth, and so the Dodgers don’t have to come to bat. That means the Dodgers had to get 27 outs and the Phillies had to get only 24. Conversely, on the road, if the Dodgers are leading the Phillies, the Phillies have to come to bat in the bottom of the ninth, and the Dodgers have to get the full 27 outs to end the game. A weaker team gets the opposite treatment: when it trails on the road after the top of the ninth, the home team skips its last at-bat, so the loser records only 24 outs.
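The bookkeeping behind that explanation can be sketched game by game. The sketch below is a simplification under stated assumptions: every game goes exactly nine innings, there are no walk-offs, extra innings, or shortened games, and the team plays as many road games as home games. The 16-game record is hypothetical, made up purely for illustration.

```python
def ip_differential(is_home, won):
    """Innings pitched minus opponents' innings pitched for one game."""
    if is_home and won:
        return 1    # home winner pitches 9 innings, opponent pitches only 8
    if not is_home and not won:
        return -1   # road loser pitches only 8 innings, opponent pitches 9
    return 0        # home loss or road win: both teams pitch 9 innings


# Hypothetical record as (is_home, won) pairs: 5-3 at home, 4-4 on the road
games = ([(True, True)] * 5 + [(True, False)] * 3
         + [(False, True)] * 4 + [(False, False)] * 4)

actual_wins = sum(1 for _, won in games if won)
estimate = len(games) / 2 + sum(ip_differential(h, w) for h, w in games)
print(actual_wins, estimate)  # 9 9.0
```

Under these assumptions the innings differential equals home wins minus road losses, so adding half the games played recovers the win total exactly. Walk-offs, extra innings, and uneven home and road schedules are where the real-world estimate drifts.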

One caveat: my method tends to be more descriptive than predictive, so it’s a better measure of how a team has performed than a predictor of how it will perform in the future. The Pythagorean method is much better as a predictive tool.

So there it is! My estimated team wins method. I hope you find it useful.





2 Comments
Jim Melichar
6 years ago

Your caveat is really the entire article. Not pitching the bottom of the last inning more means you’ve won more. Should be plain and simple like that. The only noise that should enter the correlation is the road wins vs. home wins a team has.

In order for it to be predictive you’d have to first predict home victories and that doesn’t feel like something you want to get into 🙂

evo34
6 years ago

Apparently, there is no review process in Community Research.