Estimating Team Wins With Innings Pitched

by Envy Angel

July 14, 2017

Throughout the baseball season, I like to estimate team wins, but I don’t do it in the traditional way. Some time ago, I discovered that I could use innings pitched to get a close estimate. Here’s what I do:

1) Take team games played and divide by 2;

2) Take the team’s innings pitched and subtract the team opponents’ innings pitched;

3) Add the results of steps 1 and 2.

For example, the Washington Nationals, as of the All-Star break, have played 88 games. They have 789.33 IP, and their opponents have 781.33 IP. So I take 88 divided by 2, which gives me 44. Then I take 789.33 minus 781.33, which gives me 8. Then 44 plus 8 gives me an estimate of 52 team wins. Checking the standings, I see that Washington indeed has 52 wins.
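The three steps above can be sketched in a few lines of Python. The function name and argument names are my own labels for illustration, and fractional innings are written the way the tables here write them (.33 and .67 for one and two outs).

```python
def estimate_wins_ip(games_played, innings_pitched, opp_innings_pitched):
    """Estimate wins as half of games played plus the innings-pitched differential."""
    half_games = games_played / 2                               # step 1
    innings_diff = innings_pitched - opp_innings_pitched        # step 2
    return half_games + innings_diff                            # step 3


# Washington at the All-Star break: 88 GP, 789.33 IP, 781.33 opponents' IP
nationals = estimate_wins_ip(88, 789.33, 781.33)
print(round(nationals, 2))
```

Rounding to two decimals keeps floating-point noise out of the printed estimate.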

How does my method compare with the traditional Pythagorean? (The Pythagorean method, of course, takes runs scored squared and divides by runs scored squared plus runs allowed squared.) I’ve set up some charts to demonstrate. First, let me present the relevant statistics for all teams as of the All-Star break, where IPA is opponents’ innings pitched and RA is runs allowed (all statistics courtesy CBS Sportsline):

Team	GP	IP	IPA	R	RA
Arizona	89	797	787	446	344
Atlanta	87	783	787.67	405	449
Baltimore	88	782.67	790.67	392	470
Boston	89	794.67	795	431	366
Chi. Cubs	88	785	787	399	399
Chi. White Sox	87	760.33	771.33	397	429
Cincinnati	88	781.67	786.67	424	463
Cleveland	87	768.67	763.67	421	347
Colorado	91	812.33	806.67	461	419
Detroit	87	762.67	766.67	409	440
Houston	89	800	784.33	527	365
Kansas City	87	775.33	775.67	362	387
L.A. Angels	92	817	824.33	377	399
L.A. Dodgers	90	806.33	786.67	463	300
Miami	87	771.67	777	410	429
Milwaukee	91	818.67	809.33	451	406
Minnesota	88	785.67	781	403	463
N.Y. Mets	86	773	775	406	455
N.Y. Yankees	86	768	765.33	477	379
Oakland	89	784	790.67	382	470
Philadelphia	87	775	790.33	332	424
Pittsburgh	89	800.67	802	378	403
San Diego	88	776.33	781	312	440
San Francisco	90	813.33	827.33	431	435
Seattle	90	800	797.67	354	453
St. Louis	88	798	793	402	389
Tampa Bay	90	805	802.33	428	412
Texas	88	783.67	783	444	415
Toronto	88	789	788.33	366	430
Washington	88	789.33	781.33	486	396

Now let me present a chart showing how many team wins are predicted by my method and the Pythagorean method (for the Pythagorean method, I’m using 1.82 as my exponent, as shown by MLB on their Standings page):

Team	EST W (IP)	EST W (R)	Actual W
Arizona	54.50	54.82	53
Atlanta	38.83	39.43	42
Baltimore	36.00	36.80	42
Boston	44.17	51.07	50
Chi. Cubs	42.00	44.00	43
Chi. White Sox	32.50	40.44	38
Cincinnati	39.00	40.48	39
Cleveland	48.50	51.07	47
Colorado	51.16	49.45	52
Detroit	39.50	40.61	39
Houston	60.17	58.84	60
Kansas City	43.16	40.86	44
L.A. Angels	38.67	43.63	45
L.A. Dodgers	64.66	61.90	61
Miami	38.17	41.71	41
Milwaukee	54.84	49.84	50
Minnesota	48.67	38.47	45
N.Y. Mets	41.00	38.56	39
N.Y. Yankees	45.67	51.87	45
Oakland	37.83	36.20	39
Philadelphia	28.17	33.97	29
Pittsburgh	43.17	41.91	42
San Diego	39.33	30.67	38
San Francisco	31.00	44.62	34
Seattle	47.33	35.07	43
St. Louis	49.00	45.32	43
Tampa Bay	47.67	46.56	47
Texas	44.67	46.70	43
Toronto	44.67	37.59	41
Washington	52.00	52.11	52

My method appears in the second column, and the Pythagorean method appears in the third column, with actual team wins in the last column. My method, as shown above, gives estimated wins directly. The Pythagorean method actually computes winning percentage. To get the estimated wins for the Pythagorean method, I multiplied the team’s estimated winning percentage by the team’s games played.
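For readers who want to check the Pythagorean column, here is a minimal Python sketch of the calculation described above: an estimated winning percentage from runs scored and runs allowed with the 1.82 exponent, multiplied by games played. The function name is a label chosen for this example.

```python
def estimate_wins_pythagorean(games_played, runs, runs_allowed, exponent=1.82):
    """Pythagorean winning percentage times games played."""
    scored = runs ** exponent
    allowed = runs_allowed ** exponent
    win_pct = scored / (scored + allowed)   # estimated winning percentage
    return win_pct * games_played           # convert to estimated wins


# Washington: 88 GP, 486 R, 396 RA
washington = estimate_wins_pythagorean(88, 486, 396)
# Chi. Cubs: 88 GP, 399 R, 399 RA (equal runs give a .500 estimate)
cubs = estimate_wins_pythagorean(88, 399, 399)
print(round(washington, 2), round(cubs, 2))
```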

The methods are pretty close! On a couple of teams, though, the methods miss by a wide margin. I’m way off on the Angels, for example, while Pythagoras is off on the Giants. But which of these methods is closer overall? I did an r-squared between each of the estimated win columns and the actual wins and got these results:

RSQ (IP)	RSQ (R)
0.8497	0.7147
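A rough sketch of the r-squared step in Python follows, assuming r-squared here means the squared Pearson correlation between an estimate column and actual wins (what a spreadsheet RSQ function returns). The sample uses only the first five teams from the chart, so it will not reproduce the league-wide figures above.

```python
def r_squared(estimates, actuals):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_a = sum(actuals) / n
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimates, actuals))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_a = sum((a - mean_a) ** 2 for a in actuals)
    return cov ** 2 / (var_e * var_a)


# Arizona, Atlanta, Baltimore, Boston, Chi. Cubs (illustrative subset)
ip_estimates = [54.50, 38.83, 36.00, 44.17, 42.00]
actual_wins = [53, 42, 42, 50, 43]
rsq_subset = r_squared(ip_estimates, actual_wins)
print(round(rsq_subset, 4))
```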

Mine’s a little higher, but let’s use mean squared error (MSE) as a cross-check. Here are my numbers, with each team’s squared error in its row and the mean across all teams in the last row:

Team	MSE (IP)	MSE (R)
Arizona	2.25	3.33
Atlanta	10.05	6.61
Baltimore	36.00	27.05
Boston	33.99	1.15
Chi. Cubs	1.00	1.00
Chi. White Sox	30.25	5.94
Cincinnati	0.00	2.20
Cleveland	2.25	16.60
Colorado	0.71	6.53
Detroit	0.25	2.60
Houston	0.03	1.34
Kansas City	0.71	9.86
L.A. Angels	40.07	1.88
L.A. Dodgers	13.40	0.81
Miami	8.01	0.50
Milwaukee	23.43	0.03
Minnesota	13.47	42.61
N.Y. Mets	4.00	0.20
N.Y. Yankees	0.45	47.20
Oakland	1.37	7.82
Philadelphia	0.69	24.74
Pittsburgh	1.37	0.01
San Diego	1.77	53.77
San Francisco	9.00	112.82
Seattle	18.75	62.92
St. Louis	36.00	5.36
Tampa Bay	0.45	0.19
Texas	2.79	13.70
Toronto	13.47	11.61
Washington	0.00	0.01
AVG	10.20	15.68
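The error figures in this chart can be sketched in Python as follows: each team's value is the squared difference between estimated and actual wins, and the AVG row is the mean of those values. The function names are illustrative, and the sample covers only three teams, so its mean differs from the AVG row.

```python
def squared_errors(estimates, actuals):
    """Squared difference between each estimate and the actual value."""
    return [(e - a) ** 2 for e, a in zip(estimates, actuals)]


def mean_squared_error(estimates, actuals):
    """Average of the squared errors."""
    errors = squared_errors(estimates, actuals)
    return sum(errors) / len(errors)


# Arizona, Atlanta, Baltimore: IP-based estimates and actual wins
ip_estimates = [54.50, 38.83, 36.00]
actual_wins = [53, 42, 42]
errors = squared_errors(ip_estimates, actual_wins)
print([round(x, 2) for x in errors])
print(round(mean_squared_error(ip_estimates, actual_wins), 2))
```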

I’m not a numbers person, so if I’ve made errors in my calculations, please let me know, and I will never, ever trouble you fine readers again with another post. But I’ve published previous studies of both methods (in other places, under other names) and have found each time that my method edges out the Pythagorean in both r-squared and MSE.

If my method works at all, it’s because better teams typically have to get more outs to finish off their opponents. If the Dodgers, say, are at home against the Phillies, chances are they’re already winning when they go to the bottom of the ninth, and so the Dodgers don’t have to come to bat. That means the Dodgers had to get 27 outs and the Phillies had to get only 24. Conversely, on the road, if the Dodgers are leading the Phillies, the Phillies have to come to bat in the bottom of the ninth, and the Dodgers have to get the full 27 outs to end the game.
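The reasoning above can be sketched in Python under some simplifying assumptions that are mine, not measured data: every game lasts exactly nine innings, the home team never bats in the bottom of the ninth when it is ahead, and there are no walk-offs, extra innings, or shortened games. Under those assumptions a home win is worth +1 inning pitched relative to the opponent, a road loss is worth -1, and home losses and road wins are worth 0. The records below are hypothetical.

```python
def innings_differential(home_wins, home_losses, road_wins, road_losses):
    """Innings pitched minus opponents' innings pitched, idealized nine-inning games."""
    # Home win: team pitches 9, opponent pitches 8 -> +1
    # Road loss: team pitches 8, opponent pitches 9 -> -1
    # Home loss or road win: both sides pitch 9 -> 0
    return home_wins - road_losses


def estimate_from_record(home_wins, home_losses, road_wins, road_losses):
    """Apply the games/2 + innings differential estimate to a home/road record."""
    games = home_wins + home_losses + road_wins + road_losses
    diff = innings_differential(home_wins, home_losses, road_wins, road_losses)
    return games / 2 + diff


# Hypothetical team with 44 home games and 44 road games: 25-19 home, 27-17 road
balanced = estimate_from_record(25, 19, 27, 17)
# Hypothetical team with 50 home games and 40 road games: 30-20 home, 20-20 road
unbalanced = estimate_from_record(30, 20, 20, 20)
print(balanced, unbalanced)
```

With equal home and road games the estimate matches the 52 actual wins exactly. In the second case the estimate is 55 against 50 actual wins, so it overshoots by half the gap between home and road games. In this idealized model, an uneven home/road schedule is one source of error; walk-offs and extra innings, which the sketch ignores, would add more.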

One caveat: my method tends to be more descriptive than predictive, so it’s a better measure of how a team has performed, not a good predictor of how a team will perform in the future. The Pythagorean method is much better as a predictive tool.

So there it is! My estimated team wins method. I hope you find it useful.

2 Comments

Jim Melichar

6 years ago

Your caveat is really the entire article. Not pitching the bottom of the last inning more means you’ve won more. Should be plain and simple like that. The only noise that should enter the correlation is the road wins vs. home wins a team has.

In order for it to be predictive you’d have to first predict home victories and that doesn’t feel like something you want to get into 🙂

evo34

Apparently, there is no review process in Community Research.
