Update: The previous version of this post, published last week, contained a data error that has now been fixed. Steamer/Razzball and Pod projections have been added and the hitter sample has been corrected from the prior version of this article.
Welcome to my 5th annual forecast review. Each year, every projection submitted to me at http://www.bbprojectionproject.com is tested for error (RMSE) and overall predictive power (R^2), and is then ranked. I present both RMSE and R^2 because each has its uses. RMSE is a standard measure of forecast error, but it penalizes general optimism or pessimism about the run environment, even if a forecast has low error once that bias is controlled for. For instance, Marcel is very good at predicting the run environment and the FanGraphs Fans are pretty terrible at it, so Marcel will usually have a better RMSE than the Fans. R^2, on the other hand, is a better test of the relative performance of players because it ignores any general bias that pervades a forecasting system. Marcel tends to rank lower on this metric than other systems due to its rigid formula, whereas more sophisticated methods like ZIPS or Steamer tend to do better.
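To make the distinction concrete, here is a minimal sketch (the numbers are invented for illustration, not data from the competition) of how a uniformly optimistic forecast fares under each metric:

```python
import numpy as np

def rmse(forecast, actual):
    """Root mean squared error: penalizes systematic bias directly."""
    return np.sqrt(np.mean((forecast - actual) ** 2))

def r_squared(forecast, actual):
    """Squared correlation between forecast and actual: a constant
    bias shifts the forecast but leaves the correlation untouched."""
    return np.corrcoef(forecast, actual)[0, 1] ** 2

actual = np.array([10.0, 20.0, 30.0, 40.0])   # hypothetical HR totals
biased = actual + 5.0                          # perfect ordering, 5-HR optimism
noisy = np.array([12.0, 17.0, 33.0, 38.0])     # roughly unbiased, but noisier

# The biased forecast is "worse" by RMSE (5.0 vs ~2.5) yet "better"
# by R^2 (1.0 vs ~0.95) -- the Marcel-vs-Fans pattern described above.
print(rmse(biased, actual), r_squared(biased, actual))
print(rmse(noisy, actual), r_squared(noisy, actual))
```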
Comparisons are based on the set of players that every system projected. This amounts to 70 pitchers and 141 hitters for 2014. This is certainly limiting, but there is an inherent tradeoff in the number of projection systems that can be analyzed vs. the number of players that are projected by all systems. My policy is to consider as many projection systems as possible, as long as the number of players doesn’t get too low.
Now, on to the contest!
This year certainly saw some interesting results. By the R^2 metric, the best forecaster for hitters (Dan Rosenheck) only published forecasts for hitter categories; evidently there’s some benefit to specialization when it comes to projecting baseball players. The best pitcher forecasts came from Mike Podhorzer’s Pod forecasts. The best composite score came from my own personal forecast brew, which is computed by an algorithm that estimates weights for the other main-line forecasts. In a sense, this is not an original forecast, so I now mark forecasts that I know use other forecasts as inputs with an “*” (I realize that, to some degree, most everyone calibrates their forecasts to what they see other people doing). The next two forecasts are of this same type: the AggPro and Steamer/Razzball forecasts. The top “structural” forecast was Pod, followed by ZIPS, Rotovalue, and CBS.
In terms of RMSE, Dan Rosenheck ran away with the hitters, and my weighted average did the best among pitchers. The top overall performers across categories were MORPS, Marcel, Rotovalue, and AggPro.
Overall, there are a few interesting comparisons to be made between projection systems across different years. Among the open-source stats community, Steamer vs. ZIPS is always interesting to watch. In prior years, Steamer has been better; this year, however, ZIPS made huge gains and beat Steamer. Marcel had a typical year, with a very favorable ranking on RMSE but not R^2. The FanGraphs Fans had a down year, finishing near the bottom in most metrics. CBS Sportsline was the top forecast from a major media company; such forecasts, in general, tend to do poorly. Finally, nearly every projection submitted beat the naïve previous-season benchmark, in which the 2014 forecast is simply the player’s actual 2013 performance. At least we’re all doing something right.
Thank you again to all who submitted projections. I invite anyone who is interested to submit their top-line hitter and pitcher projections to me at email@example.com. Your projection will be put up on http://www.bbprojectionproject.com as soon as I receive it, unless you want me to embargo it until the end of the season, which some people choose to do for fantasy baseball or other proprietary reasons. All the code (STATA) and data for these evaluations are available upon request. If I’m using the wrong version of anyone’s projections (which can happen!), please let me know.
Welcome to the 3rd annual forecast competition, where each forecaster who submits projections to bbprojectionproject.com is evaluated on RMSE and model R^2 relative to actuals (see last year’s results here). The categories evaluated are AVG, Runs, HR, RBI, and SB for hitters, and Wins, ERA, WHIP, and Strikeouts for pitchers. RMSE is a popular metric for evaluating forecast accuracy, but I actually prefer R^2. R^2 removes average bias (see here) and effectively evaluates forecasted player-by-player variation, making it more useful when attempting to rank players (i.e., for fantasy baseball purposes).
Here are the winners for 2014 for R^2 (more detailed tables are below):
And here are the winners for the RMSE portion of the competition:
I’m beginning to notice some trends in the results across years. First, systems that include averaging do particularly well. This is pretty well established by now, but it’s always useful to reflect upon. I’ve been asked in the past to perform evaluations separating forecasts computed by averaging from more “structural” forecasts that do not incorporate information from others’ forecasts. I decided not to do this because the nature of the baseball forecasting “season” makes it impossible to be sure a forecast was created without taking others’ forecasts into account. This influence can be direct (forecasting as a weighted average of others’ forecasts), but it can also occur in subtler ways, such as selecting a model based on forecasts that others have put forward. Second, the FanGraphs Fans always fascinate me: they can be heavily biased, yet they contain some of the best unique and relevant information for forecasting player variation. The takeaway from the Fans is that crowdsourced averaging works, as long as you can remove the bias in some way, or sidestep it by focusing on ordinal ranks instead.
Some additional notes: it would be interesting to decompose these aggregate stats into rates multiplied by playing time, but it’s difficult to gather all of this for each projection system, so I focus on top-line output metrics. Also, absolute rankings are presented, but many of them are likely statistically indistinguishable from one another. If someone wants to run Diebold-Mariano tests, the data used in this comparison can be downloaded from bbprojectionproject.com.
Thanks for reading, and please submit your projections for next year! Also, as always, I welcome any comments, and I’ll do my best to respond.
R^2 Detailed Tables
RMSE Detailed Tables
Evaluating 2012 Projections
Hello loyal readers. It’s time for the annual evaluation of last year’s player projections. Last year saw Gore, Snapp, and Highly’s AggPro forecasts win among hitter projections (http://www.fangraphs.com/community/comparing-2011-hitter-forecasts/) and Baseball Dope win among pitchers (http://www.fangraphs.com/community/comparing-2011-pitcher-forecasts/). In general, projections computed using averages or weighted averages tended to perform best among hitters, while for pitchers, structural models computed using “deep” statistics (K/9, HR/FB%, etc.) did better.
In 2012, there were 12 projections submitted for hitters and 12 for pitchers (11 submitted projections for both). The evaluation only considers players where every projection system has a projection.
This article is the second of a two-part series evaluating 2011 baseball player forecasts. The first looked at hitters and found that forecast averages outperform any particular forecasting system. For pitchers, the results appear somewhat reversed: structural forecasts computed using “deep” statistics (K/9, HR/FB%, etc.) seem to have done particularly well.
As with the other article, I will look at two main bases of comparison: Root Mean Squared Error both with and without bias. Bias is important to consider because it is easily removed from a forecast and can mask an otherwise good forecasting approach. For example, Fangraphs Fan hitter projections are often quite biased, but are very good at predicting numbers once this bias is removed.
This article is an update to the article I wrote last year on Fangraphs.
This year, I’m going to look at the forecasting performance of 12 different baseball player forecasting systems. I will look at two main bases of comparison: Root Mean Squared Error both with and without bias. Bias is important to consider because it is easily removed from a forecast and it can mask an otherwise good forecasting approach. For example, Fangraphs Fan projections are often quite biased, but are very good at predicting numbers when this bias is removed.
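As a sketch of what “removing bias” means here (the numbers are invented for illustration, not taken from any of the systems evaluated):

```python
import numpy as np

# Hypothetical runs-scored forecasts for five players vs. actuals
forecast = np.array([85.0, 95.0, 70.0, 110.0, 60.0])
actual = np.array([78.0, 88.0, 65.0, 101.0, 53.0])

bias = np.mean(forecast - actual)  # average over-prediction (7 runs here)

rmse_raw = np.sqrt(np.mean((forecast - actual) ** 2))
rmse_debiased = np.sqrt(np.mean(((forecast - bias) - actual) ** 2))

# Subtracting the constant bias shrinks the error substantially,
# revealing how good the forecast's player-to-player ordering really is.
print(bias, rmse_raw, rmse_debiased)
```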
In two previous articles, I considered the ability of freely available forecasts to predict hitter performance (part 1 and part 2), and how forecasts can be used to predict player randomness (here). In this article, I look at the performance of the same six forecasts as before (ZIPS, Marcel, CHONE, Fangraphs Fans, ESPN, CBS), but instead look at starting pitchers’ wins, strikeouts, ERA, and WHIP.
Results are quite different than for hitters. ESPN is the clear winner here, with the most accurate forecasts and the ones with the most unique and relevant information. Fangraphs Fan projections are highly biased, as with the hitters, yet they add a large amount of distinct information, and thus are quite useful. Surprisingly, the mechanical forecasts are, for the most part, failures. While ZIPS has the least bias, it is encompassed by other models in every statistic.* Marcel and CHONE are also poor performers with no useful and unique information, but with higher bias.
This article explores the ability to predict the randomness of players’ performance in 5 standard hitting categories: HRs, Runs, RBIs, SBs, and AVG. There have been efforts to do so by forecasters, most notably by Tango’s “reliability score.” (See Matt Klaassen’s article) I also test the idea that variation among forecasts (among ESPN, CHONE, Fangraphs Fans, ZIPS, Marcel, and CBS Sportsline) can predict player randomness as well.
I find that 1) variance among forecasts is a strong predictor of actual forecast error variance for HRs, Runs, RBIs and Steals, but a weak one for batting average, 2) Tango’s reliability score serves as a weak predictor of all 5 stats, and that 3), the forecast variance information dominates Tango’s measures in all categories but AVG.
Now let’s set up the analysis. Say, for example, that three forecasts say that Player A will hit 19, 20, and 21 home runs, respectively, and Player B will hit 10, 20, and 30 home runs. Does the fact that there is agreement in Player A’s forecast and disagreement in Player B’s provide some information about the randomness of Player A’s eventual performance relative to Player B’s?
To answer this, we first need a measure of dispersion of the forecasts. I define the forecast variance as the variance of the six forecasts for each stat, for each player; taking the square root of this number gives the standard deviation of the forecasts. So, the standard deviation of the forecasts of Player A’s HRs would be 1, and the standard deviation of the forecasts for Player B would be 10.
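The dispersion measure can be computed in a few lines; here is a sketch using the Player A and Player B numbers from the example (three forecasts shown for brevity, where the article uses six):

```python
import statistics

# HR forecasts for the two example players
player_a = [19, 20, 21]   # forecasters agree
player_b = [10, 20, 30]   # forecasters disagree

# Consensus forecast: the simple average across systems
consensus_a = statistics.mean(player_a)   # 20
consensus_b = statistics.mean(player_b)   # 20

# Forecast dispersion: the sample standard deviation across systems
sd_a = statistics.stdev(player_a)   # 1.0
sd_b = statistics.stdev(player_b)   # 10.0
```

Both players have the same consensus forecast, but very different dispersion, which is exactly the information being tested as a predictor of randomness.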
Next we turn to some regression analysis.* The dependent variable is the absolute error of a particular player’s consensus forecast, where the consensus is defined as the average of the six different forecasts (for both Players A and B in the example, the consensus forecast is 20 home runs). This absolute error is my measure of performance randomness. Controlling for the projected counting stats, we can estimate it as a function of some measure of forecast reliability.
Tango’s reliability score is one such measure, and the forecast standard deviation is another. What we would predict is that Tango’s score (where 0 means least reliable and 1 means most) would have a negative effect on the error. We would also predict that the forecast standard deviation would have a positive effect on the error. Now let’s see what the data tell us:
We see that HRs are the statistic for which errors are most easily forecasted, errors for Rs, RBIs, and SBs are moderately forecastable, and errors for AVG are not very forecastable. We see this because of the negative and statistically significant coefficients for Tango’s score and the positive and statistically significant coefficients on the standard deviation measure. In regressions with both measures, the standard deviation measure encompasses Tango’s measure, except in the AVG equation.
So what does this all mean? If you’re looking at rival forecasts, 80% of the standard deviation between the HR forecasts and about 50% of the standard deviation of the forecasts of the other stats is legitimate randomness. This means that you can tell how random a player’s performance will be by the variation in the forecasts, especially home runs. If you don’t have time to compare different forecasts, then Tango’s reliability score is a rough approximation, but a pretty imprecise measure.
*For those of you unfamiliar with regression analysis, imagine a graph of dots and drawing a line through it. Now imagine the graph has 3 or 4 dimensions and do the same, where the line is drawn such that the sum of squared distances between the dots and the line is minimized.
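The regression described above can be sketched as follows. Everything here is simulated (no real forecast data): players are generated so that the absolute consensus error truly grows with forecast disagreement, and ordinary least squares is then used to recover a positive coefficient on the forecast standard deviation, controlling for the projected stat.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

proj_hr = rng.uniform(5, 40, n)     # projected HR totals (control variable)
fcast_sd = rng.uniform(0.5, 8, n)   # std dev across rival forecasts

# Stylized absolute consensus error: grows with disagreement by construction
abs_error = 2.0 + 0.2 * proj_hr + 1.2 * fcast_sd + rng.normal(0, 2, n)

# OLS of abs_error on a constant, proj_hr, and fcast_sd
X = np.column_stack([np.ones(n), proj_hr, fcast_sd])
beta, *_ = np.linalg.lstsq(X, abs_error, rcond=None)

# beta[2] estimates the effect of forecast disagreement on error (~1.2)
print(beta)
```

A positive, significant `beta[2]` is the pattern the article reports for HRs, Runs, RBIs, and SBs.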
In Part 1 of this article, I looked at the ability of individual projection systems to forecast hitter performance. The six different projection systems considered are Zips, CHONE, Marcel, CBS Sportsline, ESPN, and Fangraphs Fans, and each is freely available online. It turns out that when we control for bias in the forecasts, each of the forecasting systems is, on average, pretty much the same. In what follows here, I show that the Fangraphs Fan projections and the Marcel projections contain the most unique, useful information. Also, I show that a weighted average of the six forecasts predicts hitter performance much better than any individual projection.
Forecast encompassing tests can be used to determine which of a set of individual projections contains the most valuable information. Based on the forecast encompassing test results, we can calculate a weighted average of the six forecasts that outperforms any individual forecast.
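Here is a sketch of the encompassing idea on simulated data (the two forecasts and their noise levels are invented). Regressing actuals on a constant and the rival forecasts yields combination weights; in sample, that OLS combination can never fit worse than any single forecast it combines, and a near-zero weight on one forecast would indicate it is encompassed by the other.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

talent = rng.uniform(5, 35, n)           # latent "true" HR talent
f1 = talent + rng.normal(0, 3, n)        # forecast 1: less noisy
f2 = talent + rng.normal(0, 5, n)        # forecast 2: noisier, but distinct info
actual = talent + rng.normal(0, 4, n)    # realized performance

# Encompassing-style regression: actual ~ const + f1 + f2
X = np.column_stack([np.ones(n), f1, f2])
w, *_ = np.linalg.lstsq(X, actual, rcond=None)
combined = X @ w   # the weighted-average forecast

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# The weighted average beats either forecast alone (in sample)
print(rmse(combined, actual), rmse(f1, actual), rmse(f2, actual))
```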
There are a number of published baseball player forecasts that are freely available online. As Dave Allen notes in his article on Fangraphs Fan Projections, and as I find as well, some projections are definitely better than others. Part 1 of this article examines the overall fit of each of six different player forecasts: Zips, CHONE, Marcel, CBS Sportsline, ESPN, and Fangraphs Fans. I find that the Marcel projections are the best based on average error, followed by the Zips and CHONE projections. However, if we control for the over-optimism of each of these projection systems, the forecasts are virtually indistinguishable.
This second result is important in that it requires us to dig a little deeper to see how much each of these forecasts is actually helping to predict player performance. This is addressed in Part 2 of this article.
The tool generally used to compare the average fit of a set of forecasts is Root Mean Squared Forecasting Error (RMSFE). This measure is imperfect in that it doesn’t consider the relative value of an over-projection versus an under-projection; for example, in earlier rounds of a fantasy draft we may be drafting to limit risk, while in later rounds we may be seeking risk. That being said, RMSFE is pretty easy to understand and is thus the standard for comparing the average fit of a projection.
Table 1 shows the RMSFE of each of the projection systems in each of the five main fantasy categories for hitters. Here, we see that the “mechanical” projection systems (Marcel, Zips, and CHONE) do best compared to the three “human” projections. Each value is the standard deviation of the error of a particular forecast. In other words, about 2/3 of the time, a player projected by Marcel to score 100 runs will score between 75 and 125 runs.
Table 1. Root Mean Squared Forecasting Error
Another important measure is bias. Bias occurs when a projection consistently over- or under-predicts. Bias inflates the RMSFE, so a simple bias correction may improve a forecast’s fit substantially. In Table 2, we see that the human projection systems exhibit substantially more bias than the mechanical ones.
Table 2. Average Bias
We can get a better picture of which forecasting system is best by correcting for bias in the individual forecasts. Table 3 presents the bias-corrected RMSFEs. What we see is a tightening of the results across the forecasting systems: each one performs about the same.
Table 3. Bias-corrected Root Mean Squared Forecasting Error
So where does this leave us if these six forecasts are basically indistinguishable? As it turns out, evaluating the performance of individual forecasts doesn’t tell the whole story. There may be useful information in each of the different forecasting systems, so an average or a weighted average of forecasts may prove a better predictor than any individual forecast. Part 2 of this article examines this in some detail. Stay tuned!