Evaluating 2013 Projections
Welcome to the 3rd annual forecast competition, where each forecaster who submits projections to bbprojectionproject.com is evaluated based on RMSE and model R^2 relative to actuals (see last year’s results here). The categories evaluated for hitters are AVG, Runs, HR, RBI, and SB; for pitchers, they are Wins, ERA, WHIP, and Strikeouts. RMSE is a popular metric for evaluating forecast accuracy, but I actually prefer R^2. R^2 removes average bias (see here) and effectively evaluates forecasted player-by-player variation, which makes it more useful when attempting to rank players (e.g., for fantasy baseball purposes).
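To make the difference between the two metrics concrete, here is a minimal sketch. It assumes that "model R^2" means the R^2 from a simple linear regression of actuals on a forecast, which equals the squared correlation between the two; the exact specification used in the competition may differ. The function names and the home run numbers are made up for illustration.

```python
import math

def rmse(forecast, actual):
    """Root mean squared error between paired forecasts and actuals."""
    n = len(actual)
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / n)

def r_squared(forecast, actual):
    """R^2 of a simple linear regression of actuals on forecasts.

    With a single regressor this equals the squared Pearson correlation
    (an assumption about how "model R^2" is defined, not a confirmed spec).
    """
    n = len(actual)
    mean_f = sum(forecast) / n
    mean_a = sum(actual) / n
    cov = sum((f - mean_f) * (a - mean_a) for f, a in zip(forecast, actual))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov ** 2 / (var_f * var_a)

# Hypothetical home run totals for five players (made-up numbers).
actual = [30, 22, 15, 8, 25]
forecast = [27, 24, 13, 11, 21]
biased = [f + 5 for f in forecast]  # same forecast, 5 HR too optimistic for everyone

print("unbiased:", rmse(forecast, actual), r_squared(forecast, actual))
print("biased:  ", rmse(biased, actual), r_squared(biased, actual))
```

Adding a constant to every forecast makes the RMSE worse but leaves this R^2 unchanged, which is the sense in which R^2 ignores average bias and rewards getting the player-by-player variation right.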
Here are the winners for 2013 for R^2 (more detailed tables are below):
Place | Forecast System | Hitters | Pitchers | Average
1st | Dan Rosenheck | 2.80 | 2.50 | 2.65
2nd | Steamer | 1.60 | 6.00 | 3.80
3rd | FanGraphs Fans | 5.80 | 2.75 | 4.28
4th | Will Larson | 6.60 | 3.00 | 4.80
5th | AggPro | 6.40 | 4.25 | 5.33
6th | CBS Sportsline | 5.40 | 8.00 | 6.70
7th | ESPN | 6.60 | 7.50 | 7.05
8th | John Grenci | | 8.00 | 8.00
9th | ZiPS | 9.80 | 7.25 | 8.53
10th | Razzball | 6.80 | 10.25 | 8.53
11th | Rotochamp | 8.60 | 9.00 | 8.80
12th | Sports Illustrated | 8.80 | 12.00 | 10.40
13th | Guru | 10.60 | 12.00 | 11.30
14th | Marcel | 11.20 | 12.50 | 11.85
And here are the winners for the RMSE portion of the competition:
Place | Forecast System | Hitters | Pitchers | Average
1st | Dan Rosenheck | 2.60 | 2.00 | 2.30
2nd | Will Larson | 3.60 | 2.50 | 3.05
3rd | Steamer | 1.80 | 5.00 | 3.40
4th | AggPro | 4.00 | 3.00 | 3.50
5th | ZiPS | 6.00 | 5.75 | 5.88
6th | Guru | 4.80 | 7.25 | 6.03
7th | Marcel | 6.20 | 8.50 | 7.35
8th | John Grenci | | 7.50 | 7.50
9th | Rotochamp | 9.40 | 9.00 | 9.20
10th | ESPN | 9.20 | 10.50 | 9.85
11th | FanGraphs Fans | 11.80 | 8.75 | 10.28
12th | Razzball | 9.40 | 11.25 | 10.33
13th | Sports Illustrated | 10.60 | 11.75 | 11.18
14th | CBS Sportsline | 11.60 | 12.25 | 11.93
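The placements in the summary tables appear to be simple averages of the per-category ranks in the detailed tables further down. As a worked example, and assuming that reading is right, the sketch below recomputes the R^2 summary row for Dan Rosenheck from his category ranks. The function name is made up for illustration.

```python
def average_rank(ranks):
    """Mean of a list of per-category ranks."""
    return sum(ranks) / len(ranks)

# Category ranks copied from the R^2 detailed tables for Dan Rosenheck.
hitter_ranks = [3, 1, 3, 3, 4]  # R, HR, RBI, AVG, SB
pitcher_ranks = [1, 2, 2, 5]    # W, ERA, WHIP, SO

hitters = average_rank(hitter_ranks)
pitchers = average_rank(pitcher_ranks)
overall = (hitters + pitchers) / 2  # hitters and pitchers weighted equally

print(round(hitters, 2), round(pitchers, 2), round(overall, 2))
```

This reproduces the 2.80, 2.50, and 2.65 reported for him in the R^2 summary. Note that the overall figure weights the hitter and pitcher averages equally, even though hitters have five categories and pitchers have four.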
I’m beginning to notice some trends in the results across years. First, systems that include averaging do particularly well. This is pretty well established by now, but it’s always useful to reflect upon. I’ve been asked in the past to evaluate forecasts computed by averaging separately from those that do not use information from others’ forecasts (more “structural” forecasts). I decided not to do this because the nature of the baseball forecasting “season” makes it impossible to be sure that any forecast was created without taking others’ forecasts into account. The influence can be direct (forecasting as a weighted average of others’ forecasts), but it can also occur in more subtle ways, such as model selection based on forecasts that others have put forward.
Second, the FanGraphs Fans are always fascinating to me: their forecasts are heavily biased, yet they contain some of the best unique and relevant information for forecasting player-by-player variation. The takeaway from the Fans forecast set is that crowdsourced averaging works, as long as you can remove the bias in some way or ignore it by focusing on ordinal ranks instead.
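Both points can be illustrated with a small sketch. It uses an equal-weight average as the composite, which is only one of many ways to combine forecasts, and all system names and run totals below are made up. The last system plays the role of an optimistic crowd: every projection is about 11 runs too high, yet the player ordering matches the actuals.

```python
import math

def rmse(forecast, actual):
    """Root mean squared error between paired forecasts and actuals."""
    n = len(actual)
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / n)

def composite(forecasts):
    """Equal-weight average of several forecast lists, player by player."""
    return [sum(vals) / len(vals) for vals in zip(*forecasts)]

def ordinal_ranks(values):
    """Rank players from highest (1) to lowest; ties keep list order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for place, i in enumerate(order, start=1):
        ranks[i] = place
    return ranks

# Hypothetical run totals for five players (made-up numbers).
actual = [95, 80, 60, 72, 88]
system_a = [90, 85, 70, 65, 80]
system_b = [100, 70, 55, 80, 95]
crowd = [105, 92, 71, 83, 99]  # optimistic for every player

blended = composite([system_a, system_b, crowd])
individual = [rmse(s, actual) for s in (system_a, system_b, crowd)]

print("individual RMSEs:", individual)
print("blended RMSE:    ", rmse(blended, actual))
print("crowd ranks: ", ordinal_ranks(crowd))
print("actual ranks:", ordinal_ranks(actual))
```

The RMSE of an equal-weight average can never exceed the average of the individual RMSEs, which is part of why averaging is hard to beat. Converting the crowd forecast to ordinal ranks discards its level entirely, so a uniform optimism bias does no harm to the ranking.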
Some additional notes: it would be interesting to decompose these aggregate stats into rates multiplied by playing time, but it’s difficult to gather all of this for each projection system. Therefore, I focus on top-line output metrics. Also, absolute rankings are presented, but many of these are likely statistically indistinguishable from each other. If anyone wants to run Diebold-Mariano tests, the data used in this comparison can be downloaded from bbprojectionproject.com.
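For anyone who wants to try this, here is a rough sketch of a Diebold-Mariano style comparison of two systems under squared-error loss. It is a simplified version: it treats the loss differentials as independent across players and skips the autocorrelation correction that the standard time-series form of the test uses. The function name and the numbers are made up for illustration.

```python
import math

def diebold_mariano(forecast_1, forecast_2, actual):
    """Simplified DM statistic and two-sided p-value for squared-error loss.

    Assumes the per-player loss differentials are independent, so no
    autocorrelation (HAC) adjustment is applied to the variance.
    """
    # Loss differential per player: negative values favor forecast_1.
    d = [(f1 - a) ** 2 - (f2 - a) ** 2
         for f1, f2, a in zip(forecast_1, forecast_2, actual)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    stat = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value from the standard normal approximation.
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value

# Hypothetical home run totals and two competing forecasts (made-up numbers).
actual = [30, 22, 15, 8, 25, 18]
forecast_1 = [27, 24, 13, 11, 21, 20]
forecast_2 = [33, 20, 19, 5, 24, 12]

stat, p_value = diebold_mariano(forecast_1, forecast_2, actual)
print("DM statistic:", stat, "p-value:", p_value)
```

A large p-value means the two systems cannot be distinguished on this sample. With only a handful of players the normal approximation is poor, so a real comparison should use the full set of players from the downloadable data.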
Thanks for reading, and please submit your projections for next year! Also, as always, I welcome any comments, and I’ll do my best to respond.
R^2 Detailed Tables
System | R | Rank | HR | Rank | RBI | Rank | AVG | Rank | SB | Rank | AVG rank
AggPro | 0.250 | 6 | 0.42 | 9 | 0.308 | 8 | 0.32 | 1 | 0.538 | 8 | 6.4
Dan Rosenheck | 0.296 | 3 | 0.45 | 1 | 0.340 | 3 | 0.3 | 3 | 0.568 | 4 | 2.8
Steamer | 0.376 | 1 | 0.45 | 2 | 0.393 | 1 | 0.31 | 2 | 0.572 | 2 | 1.6
Will Larson | 0.336 | 2 | 0.43 | 6 | 0.345 | 2 | 0.21 | 13 | 0.509 | 10 | 6.6
Marcel | 0.146 | 12 | 0.36 | 12 | 0.236 | 12 | 0.27 | 8 | 0.477 | 12 | 11.2
ZiPS | 0.118 | 13 | 0.42 | 8 | 0.230 | 13 | 0.3 | 4 | 0.504 | 11 | 9.8
CBS Sportsline | 0.278 | 4 | 0.44 | 3 | 0.320 | 4 | 0.25 | 10 | 0.542 | 6 | 5.4
ESPN | 0.241 | 7 | 0.43 | 5 | 0.317 | 5 | 0.29 | 7 | 0.532 | 9 | 6.6
Razzball | 0.239 | 8 | 0.43 | 4 | 0.314 | 6 | 0.24 | 11 | 0.553 | 5 | 6.8
Rotochamp | 0.234 | 9 | 0.41 | 10 | 0.287 | 9 | 0.23 | 12 | 0.569 | 3 | 8.6
FanGraphs Fans | 0.268 | 5 | 0.42 | 7 | 0.272 | 10 | 0.3 | 6 | 0.574 | 1 | 5.8
Guru | 0.186 | 11 | 0.33 | 13 | 0.263 | 11 | 0.3 | 5 | 0.476 | 13 | 10.6
Sports Illustrated | 0.221 | 10 | 0.4 | 11 | 0.314 | 7 | 0.27 | 9 | 0.541 | 7 | 8.8
System | W | Rank | ERA | Rank | WHIP | Rank | SO | Rank | AVG rank
AggPro | 0.13 | 3 | 0.15 | 4 | 0.25 | 4 | 0.402 | 6 | 4.25
Dan Rosenheck | 0.17 | 1 | 0.19 | 2 | 0.27 | 2 | 0.406 | 5 | 2.5
Steamer | 0.09 | 6 | 0.15 | 3 | 0.26 | 3 | 0.341 | 12 | 6
Will Larson | 0.16 | 2 | 0.19 | 1 | 0.24 | 5 | 0.413 | 4 | 3
Marcel | 0.05 | 14 | 0.02 | 13 | 0.17 | 9 | 0.293 | 14 | 12.5
ZiPS | 0.09 | 7 | 0.07 | 9 | 0.21 | 6 | 0.375 | 7 | 7.25
CBS Sportsline | 0.1 | 5 | 0.08 | 7 | 0.15 | 10 | 0.359 | 10 | 8
ESPN | 0.08 | 10 | 0.05 | 11 | 0.2 | 7 | 0.43 | 2 | 7.5
Razzball | 0.06 | 13 | 0.07 | 8 | 0.14 | 12 | 0.374 | 8 | 10.3
Rotochamp | 0.08 | 9 | 0.06 | 10 | 0.17 | 8 | 0.359 | 9 | 9
FanGraphs Fans | 0.11 | 4 | 0.08 | 5 | 0.28 | 1 | 0.435 | 1 | 2.75
Guru | 0.07 | 11 | 0.05 | 12 | 0.11 | 14 | 0.343 | 11 | 12
Sports Illustrated | 0.09 | 8 | 0.02 | 14 | 0.14 | 13 | 0.338 | 13 | 12
John Grenci | 0.07 | 12 | 0.08 | 6 | 0.15 | 11 | 0.42 | 3 | 8
RMSE Detailed Tables
System | R | Rank | HR | Rank | RBI | Rank | AVG | Rank | SB | Rank | AVG rank
AggPro | 22.495 | 4 | 7.34 | 4 | 23.217 | 4 | 0.03 | 4 | 7.096 | 4 | 4
Dan Rosenheck | 20.792 | 3 | 6.91 | 1 | 21.867 | 2 | 0.03 | 5 | 6.467 | 2 | 2.6
Steamer | 20.355 | 2 | 7.02 | 2 | 21.817 | 1 | 0.03 | 3 | 6.258 | 1 | 1.8
Will Larson | 20.091 | 1 | 7.2 | 3 | 22.234 | 3 | 0.03 | 8 | 6.864 | 3 | 3.6
Marcel | 23.473 | 6 | 7.51 | 6 | 23.831 | 6 | 0.03 | 7 | 7.334 | 6 | 6.2
ZiPS | 25.380 | 7 | 7.43 | 5 | 25.662 | 7 | 0.03 | 1 | 8.048 | 10 | 6
CBS Sportsline | 25.866 | 10 | 8.63 | 13 | 26.837 | 10 | 0.03 | 12 | 8.527 | 13 | 11.6
ESPN | 25.698 | 8 | 8.37 | 12 | 26.418 | 9 | 0.03 | 6 | 8.120 | 11 | 9.2
Razzball | 25.831 | 9 | 8.01 | 9 | 27.842 | 12 | 0.03 | 9 | 7.920 | 8 | 9.4
Rotochamp | 26.199 | 11 | 8 | 8 | 25.995 | 8 | 0.04 | 13 | 7.686 | 7 | 9.4
FanGraphs Fans | 26.854 | 13 | 8.12 | 10 | 30.804 | 13 | 0.03 | 11 | 8.289 | 12 | 11.8
Guru | 23.187 | 5 | 7.58 | 7 | 23.608 | 5 | 0.03 | 2 | 7.198 | 5 | 4.8
Sports Illustrated | 26.609 | 12 | 8.24 | 11 | 27.173 | 11 | 0.03 | 10 | 8.009 | 9 | 10.6
System | W | Rank | ERA | Rank | WHIP | Rank | SO | Rank | AVG rank
AggPro | 4.4 | 3 | 1.031 | 4 | 0.17 | 4 | 47.01 | 1 | 3
Dan Rosenheck | 4.25 | 1 | 1.014 | 1 | 0.17 | 1 | 47.9 | 5 | 2
Steamer | 5.02 | 8 | 1.030 | 3 | 0.17 | 2 | 49.45 | 7 | 5
Will Larson | 4.34 | 2 | 1.017 | 2 | 0.17 | 3 | 47.44 | 3 | 2.5
Marcel | 4.62 | 5 | 1.158 | 13 | 0.18 | 8 | 50.84 | 8 | 8.5
ZiPS | 4.78 | 7 | 1.101 | 7 | 0.17 | 5 | 47.85 | 4 | 5.75
CBS Sportsline | 5.56 | 13 | 1.134 | 11 | 0.19 | 11 | 57.14 | 14 | 12.3
ESPN | 5.81 | 14 | 1.126 | 10 | 0.18 | 7 | 53.54 | 11 | 10.5
Razzball | 5.39 | 12 | 1.115 | 8 | 0.19 | 12 | 55.55 | 13 | 11.3
Rotochamp | 4.71 | 6 | 1.138 | 12 | 0.18 | 9 | 51.81 | 9 | 9
FanGraphs Fans | 5.29 | 10 | 1.123 | 9 | 0.17 | 6 | 52.57 | 10 | 8.75
Guru | 4.51 | 4 | 1.093 | 6 | 0.19 | 13 | 48.79 | 6 | 7.25
Sports Illustrated | 5.33 | 11 | 1.176 | 14 | 0.18 | 10 | 55.32 | 12 | 11.8
John Grenci | 5.14 | 9 | 1.080 | 5 | 0.19 | 14 | 47.26 | 2 | 7.5
Thanks for doing this yet again, Will. It’s one of the most interesting articles on FanGraphs all year.
Which of these projection systems will be publicly available? It seems like the highest-performing ones (Rosenheck, AggPro, Larson) are not available, at least not yet.
So here’s my question: we know that Fangraphs Fan projections have some systematic biases – at least as of a few years ago, they were too optimistic about performance for almost all players. There may be other systematic biases. What happens to the Fangraphs Fan projections if you correct for the systematic biases?
No problem! In the next several weeks, I’ll begin to put up this year’s projections. It’s a bit early in the season to gather all of them–we still don’t know about all of the pre-season trades/injuries/signings yet. I know Dan Rosenheck’s won’t be publicly available until after the season, but I’ll publish mine.
In terms of the Fan projections, the R^2 ranking shows how good they are when the bias is removed. I haven’t looked at the relative magnitude of the bias over the last few years, so I’d hesitate to guess how much the bias will be this year. I would suggest that the ordering of the players is probably pretty good, and better than most other publicly available sources out there.
Should we infer that the FanGraphs Fans projections are doing a better job of projecting IP for pitchers and thus are pretty good at projecting Ks? Or am I missing something?
Pitcher rate stats look great for Steamer, but Ks look terrible, so IP projection problems jumped to mind for me.
He’s no ape, Marcel clearly loves RMSE!
Hey, nice analysis, Will. I was wondering if there is an easy way for you to test how well a composite of all the projection systems would do in your analysis. Or, say, how a combination of the top three public forecasts (yours plus Steamer plus ZiPS) would have done in the rankings. I assume they would be better; I’m just curious how they would have done.
I was surprised that Oliver was not in the study, since its data is on FanGraphs. Why wasn’t it?
Any feel for how PECOTA might stack up?