Author Archive

2014 Projection Review (Updated)

Update: The previous version of this post, published last week, contained a data error that has now been fixed. Steamer/Razzball and Pod projections have also been added, and the hitter sample has been corrected.

Welcome to my 5th annual forecast review. Each year, every projection submitted to me at http://www.bbprojectionproject.com is tested for error (RMSE) and overall predictive power (R^2), and is then ranked. I present both RMSE and R^2 because both have their uses. RMSE is a standard measure of forecast error, but it penalizes general optimism or pessimism about the run environment, even if a forecast has low error after controlling for that bias. For instance, Marcel is very good at predicting the run environment and the FanGraphs Fans are pretty terrible at it, so Marcel will usually have a better RMSE than the Fans. R^2, on the other hand, is a better test of the relative performance of players because it ignores any systematic bias in a forecasting system. Marcel tends to rank lower on this metric than other systems due to its rigid formula, whereas more sophisticated methods like ZIPS or Steamer tend to do better.
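To make the two metrics concrete, here is a minimal sketch in Python (the actual evaluation code is in Stata). I read the R^2 metric as the fit from regressing actuals on a single forecast, which equals the squared correlation; the toy numbers below are made up:

```python
import numpy as np

def rmse(projected, actual):
    # Root mean squared error: any systematic bias counts against you.
    projected, actual = np.asarray(projected, float), np.asarray(actual, float)
    return np.sqrt(np.mean((projected - actual) ** 2))

def r_squared(projected, actual):
    # Squared correlation: a constant (or proportional) bias drops out,
    # so this measures how well a system orders players relative to each other.
    return np.corrcoef(projected, actual)[0, 1] ** 2

# Toy example: a forecast that is 5 HR too optimistic for every player.
actual = np.array([10.0, 20.0, 30.0, 40.0])
forecast = actual + 5.0
print(rmse(forecast, actual))       # 5.0 -- the bias shows up in RMSE
print(r_squared(forecast, actual))  # 1.0 -- the relative ordering is perfect
```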

Comparisons are based on the set of players that every system projected, which amounts to 70 pitchers and 141 hitters for 2014. This is certainly limiting, but there is an inherent tradeoff between the number of projection systems that can be analyzed and the number of players projected by all of them. My policy is to consider as many projection systems as possible, as long as the number of players doesn’t get too low.
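As a sketch of how the common-player sample can be built, assuming each system’s projections live in a CSV keyed by a player_id column (the file names here are hypothetical):

```python
import pandas as pd
from functools import reduce

# Hypothetical file names; each CSV holds one system's hitter projections.
systems = ["marcel", "zips", "steamer", "pod"]
frames = {s: pd.read_csv(f"{s}_2014_hitters.csv") for s in systems}

# Keep only the players that appear in every system's file.
common = reduce(set.intersection, (set(df["player_id"]) for df in frames.values()))
frames = {s: df[df["player_id"].isin(common)] for s, df in frames.items()}
print(f"{len(common)} hitters projected by all {len(systems)} systems")
```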

Now, on to the contest!

This year certainly saw some interesting results. By the R^2 metric, the best forecaster for hitters (Dan Rosenheck) only published forecasts for hitter categories; evidently there’s some benefit in specialization when it comes to projecting baseball players. The best pitcher forecasts came from Mike Podhorzer’s Pod projections. The best composite score came from my own personal forecast brew, which is computed by an algorithm that estimates weights for other main-line forecasts. In a sense, this is not an original forecast, so forecasts that I know use other forecasts as inputs are now marked with an “*” (I realize that, to some degree, most everyone calibrates their forecasts against what they see other people doing). The next two finishers, AggPro and Steamer/Razzball, are of this same type. The top “structural” forecast was Pod, followed by ZIPS, Rotovalue, and CBS.

In terms of RMSE, Dan Rosenheck ran away with the hitters, and my weighted average did the best among pitchers.  The top overall performers across categories were MORPS, Marcel, Rotovalue, and AggPro.

Overall, a few interesting comparisons can be made between projection systems across different years. Among the open-source stats community, Steamer vs. ZIPS is always interesting to watch. In prior years, Steamer has been better; this year, however, ZIPS made huge gains and beat Steamer. Marcel had a typical year, with a very favorable ranking on RMSE but not on R^2. The FanGraphs Fans had a down year, finishing near the bottom in most metrics. CBS Sportsline was the top forecast from a major media company, which in general tend to do poorly. Finally, almost every projection submitted beat the naïve previous-season benchmark, where the 2014 forecast is simply the player’s actual 2013 performance. At least we’re all doing something right.

Thank you again to all who submitted projections. I invite anyone who is interested to submit their top-line hitter and pitcher projections to me at larsonwd@gmail.com. Your projection will be put up on http://www.bbprojectionproject.com as soon as I receive it, unless you want me to embargo it until the end of the season, which some people choose to do for fantasy baseball or other proprietary reasons. All the code (Stata) and data for these evaluations are available upon request. If I’m using the wrong version of anyone’s projections (which can happen!), please let me know.

 

R^2 Rankings:

Place Forecast System Hitters Pitchers Average
N/A Dan Rosenheck* 1.60 n/a 1.60
N/A Beans n/a 5.00 5.00
1st Will Larson* 6.60 5.25 5.93
2nd AggPro* 8.40 6.25 7.33
3rd Steamer/Razzball* 6.20 9.00 7.60
4th Pod 11.20 4.75 7.98
5th ZIPS 10.00 7.25 8.63
6th Rotovalue 9.00 8.25 8.63
7th CBS Sportsline 10.20 8.00 9.10
8th ESPN 9.40 10.50 9.95
9th Steamer 9.60 11.50 10.55
10th Fangraphs Fans 13.60 9.00 11.30
11th Rotochamp 7.60 15.25 11.43
12th Razzball 11.60 12.25 11.93
13th MORPS 13.20 11.00 12.10
14th Clay Davenport 14.60 11.50 13.05
15th Cairo 8.20 18.00 13.10
16th Marcel 16.60 10.00 13.30
17th Bayesball 9.80 20.50 15.15
18th Guru 16.80 14.00 15.40
19th Oliver 16.40 15.00 15.70
20th Prior Season 20.40 18.75 19.58

 

RMSE Rankings:

Place System Hitters Pitchers Average
N/A Dan Rosenheck* 1.40 n/a 1.40
1st MORPS 4.20 8.50 6.35
N/A Beans n/a 6.50 6.50
2nd Marcel 8.00 7.00 7.50
3rd Rotovalue 8.60 7.00 7.80
4th AggPro* 7.60 8.25 7.93
5th ZIPS 9.60 7.75 8.68
6th Clay Davenport 6.60 10.75 8.68
7th Steamer 7.80 11.00 9.40
8th Cairo 4.80 14.00 9.40
9th Steamer/Razzball* 9.80 10.00 9.90
10th Will Larson* 15.60 4.75 10.18
11th Guru 7.80 13.00 10.40
12th Rotochamp 10.20 11.50 10.85
13th Bayesball 7.20 15.25 11.23
14th Pod 15.80 8.75 12.28
15th Razzball 16.20 13.00 14.60
16th Oliver 14.40 15.25 14.83
17th ESPN 18.40 11.50 14.95
18th CBS Sportsline 17.40 13.50 15.45
19th Fangraphs Fans 19.40 13.25 16.33
20th Prior Season 20.00 20.50 20.25

 

RMSE, Hitters:

system R rank HR rank RBI rank AVG rank SB rank mean rank
Dan Rosenheck* 19.22 1 7.07 1 20.91 1 0.024 2 6.24 2 1.40
MORPS 20.56 2 7.70 3 22.35 2 0.027 13 6.13 1 4.20
Cairo 21.55 3 7.87 6 22.53 3 0.025 9 6.30 3 4.80
Clay Davenport 21.91 6 7.92 7 23.74 8 0.025 8 6.33 4 6.60
Bayesball 22.47 9 8.24 10 24.03 10 0.022 1 6.39 6 7.20
AggPro* 22.64 12 8.23 9 23.34 6 0.024 3 6.42 8 7.60
Steamer 22.58 10 8.22 8 23.37 7 0.025 7 6.41 7 7.80
Guru 22.62 11 7.74 4 23.76 9 0.025 6 6.88 9 7.80
Marcel 21.67 4 7.62 2 22.76 4 0.027 16 7.04 14 8.00
Rotovalue 22.03 7 7.77 5 23.02 5 0.026 10 7.07 16 8.60
ZIPS 22.11 8 8.46 11 25.30 14 0.024 4 6.94 11 9.60
Steamer/Razzball* 23.87 13 8.73 13 24.75 13 0.024 5 6.35 5 9.80
Rotochamp 21.73 5 8.49 12 24.60 12 0.026 12 6.93 10 10.20
Oliver 24.67 16 9.26 18 26.86 16 0.026 11 6.94 11 14.40
Will Larson* 24.88 17 8.75 14 24.37 11 0.029 19 7.08 17 15.60
Pod 24.23 14 9.10 16 26.54 15 0.035 21 7.04 13 15.80
Razzball 24.57 15 8.90 15 27.45 19 0.027 14 7.14 18 16.20
CBS Sportsline 26.28 19 9.94 21 26.90 17 0.027 15 7.06 15 17.40
ESPN 25.88 18 9.88 20 27.25 18 0.028 17 7.32 19 18.40
Fangraphs Fans 27.20 21 9.24 17 28.98 21 0.029 18 7.62 20 19.40
Prior Season 26.56 20 9.39 19 28.77 20 0.033 20 7.84 21 20.00

 

R^2, Hitters:

system R rank HR rank RBI rank AVG rank SB rank mean rank
Dan Rosenheck* 0.267 1 0.329 1 0.181 1 0.373 2 0.679 3 1.60
Steamer/Razzball* 0.143 12 0.270 5 0.150 8 0.325 5 0.689 1 6.20
Will Larson* 0.162 10 0.263 8 0.165 5 0.320 6 0.676 4 6.60
Rotochamp 0.227 2 0.268 7 0.127 15 0.293 9 0.675 5 7.60
Cairo 0.166 7 0.259 10 0.165 4 0.288 12 0.659 8 8.20
AggPro* 0.129 15 0.269 6 0.141 11 0.352 3 0.660 7 8.40
Rotovalue 0.164 8 0.272 3 0.167 2 0.278 14 0.574 18 9.00
ESPN 0.166 6 0.253 12 0.166 3 0.273 16 0.656 10 9.40
Steamer 0.130 14 0.260 9 0.135 12 0.317 7 0.661 6 9.60
Bayesball 0.144 11 0.235 17 0.148 9 0.424 1 0.655 11 9.80
ZIPS 0.180 4 0.244 14 0.124 16 0.347 4 0.652 12 10.00
CBS Sportsline 0.162 9 0.243 15 0.151 7 0.266 18 0.682 2 10.20
Pod 0.183 3 0.271 4 0.128 14 0.111 21 0.641 14 11.20
Razzball 0.128 16 0.281 2 0.159 6 0.256 19 0.639 15 11.60
MORPS 0.174 5 0.217 19 0.132 13 0.288 13 0.636 16 13.20
Fangraphs Fans 0.103 19 0.255 11 0.116 18 0.289 11 0.657 9 13.60
Clay Davenport 0.134 13 0.237 16 0.143 10 0.271 17 0.622 17 14.60
Oliver 0.065 21 0.223 18 0.101 20 0.289 10 0.648 13 16.40
Marcel 0.119 17 0.250 13 0.122 17 0.275 15 0.515 21 16.60
Guru 0.118 18 0.210 20 0.109 19 0.311 8 0.555 19 16.80
Prior Season 0.094 20 0.206 21 0.093 21 0.197 20 0.525 20 20.40

 

RMSE, Pitchers:

system W rank ERA rank WHIP rank SO rank mean rank
Will Larson* 4.77 2 0.992 6 0.148 10 56.62 1 4.75
Beans 4.82 4 0.983 3 0.148 11 58.88 8 6.50
Marcel 4.90 8 1.003 11 0.143 4 57.93 5 7.00
Rotovalue 4.83 6 0.978 2 0.151 17 57.26 3 7.00
ZIPS 5.06 15 0.965 1 0.139 1 60.06 14 7.75
AggPro* 4.94 9 0.992 7 0.144 7 59.18 10 8.25
MORPS 4.71 1 1.026 18 0.149 13 56.69 2 8.50
Pod 4.82 5 0.995 10 0.144 8 59.75 12 8.75
Steamer/Razzball* 4.89 7 1.004 12 0.150 15 58.20 6 10.00
Clay Davenport 4.78 3 1.015 15 0.148 12 59.80 13 10.75
Steamer 4.94 10 1.006 14 0.150 16 57.89 4 11.00
ESPN 5.40 18 0.994 8 0.141 3 63.31 17 11.50
Rotochamp 5.04 14 0.989 4 0.145 9 64.18 19 11.50
Razzball 5.25 17 0.990 5 0.149 14 62.89 16 13.00
Guru 4.96 12 1.055 19 0.144 6 61.96 15 13.00
Fangraphs Fans 5.56 20 1.005 13 0.141 2 64.09 18 13.25
CBS Sportsline 5.47 19 0.995 9 0.143 5 67.18 21 13.50
Cairo 4.96 11 1.022 17 0.170 21 58.76 7 14.00
Oliver 5.12 16 1.019 16 0.151 18 59.73 11 15.25
Bayesball 5.04 13 1.082 20 0.163 19 59.11 9 15.25
Prior Season 5.64 21 1.157 21 0.169 20 64.99 20 20.50

 

R^2, Pitchers:

system W rank ERA rank WHIP rank SO rank mean rank
Pod 0.229 1 0.174 9 0.302 5 0.134 4 4.75
Beans 0.184 5 0.196 3 0.269 10 0.136 2 5.00
Will Larson* 0.194 3 0.199 2 0.269 11 0.133 5 5.25
AggPro* 0.190 4 0.190 6 0.287 7 0.121 8 6.25
ZIPS 0.137 12 0.207 1 0.331 2 0.102 14 7.25
CBS Sportsline 0.222 2 0.176 8 0.330 3 0.079 19 8.00
Rotovalue 0.158 9 0.183 7 0.242 16 0.179 1 8.25
Fangraphs Fans 0.122 16 0.161 13 0.372 1 0.125 6 9.00
Steamer/Razzball* 0.167 8 0.192 4 0.254 14 0.111 10 9.00
Marcel 0.137 13 0.146 14 0.302 6 0.122 7 10.00
ESPN 0.146 11 0.171 11 0.309 4 0.101 16 10.50
MORPS 0.181 6 0.112 18 0.236 17 0.134 3 11.00
Steamer 0.128 15 0.192 5 0.254 13 0.104 13 11.50
Clay Davenport 0.177 7 0.120 15 0.252 15 0.117 9 11.50
Razzball 0.154 10 0.174 10 0.257 12 0.097 17 12.25
Guru 0.115 17 0.106 19 0.281 9 0.109 11 14.00
Oliver 0.133 14 0.119 16 0.225 18 0.107 12 15.00
Rotochamp 0.079 20 0.170 12 0.283 8 0.037 21 15.25
Cairo 0.115 18 0.118 17 0.178 19 0.097 18 18.00
Prior Season 0.088 19 0.028 21 0.164 20 0.102 15 18.75
Bayesball 0.077 21 0.103 20 0.159 21 0.060 20 20.50

 


Evaluating 2013 Projections

Welcome to the 3rd annual forecast competition, where each forecaster who submits projections to bbprojectionproject.com is evaluated on RMSE and model R^2 relative to actuals (see last year’s results here). Categories evaluated for hitters are AVG, Runs, HR, RBI, and SB; for pitchers, Wins, ERA, WHIP, and Strikeouts. RMSE is a popular metric for evaluating forecast accuracy, but I actually prefer R^2. That metric removes average bias (see here) and effectively evaluates forecasted player-by-player variation, making it more useful when attempting to rank players (i.e., for fantasy baseball purposes).
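The scores reported below are average ranks: each system is ranked within each category, and those per-category ranks are then averaged. A small sketch of that bookkeeping, with made-up numbers:

```python
import pandas as pd

# Made-up R^2 values: rows are systems, columns are categories.
r2 = pd.DataFrame(
    {"HR": [0.45, 0.36, 0.42], "RBI": [0.39, 0.24, 0.23]},
    index=["Steamer", "Marcel", "ZIPS"],
)

ranks = r2.rank(ascending=False)  # 1 = highest R^2; for RMSE use ascending=True
mean_rank = ranks.mean(axis=1).sort_values()
print(mean_rank)  # these averages are the "Hitters"/"Pitchers" scores below
```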

Here are the winners for 2013 for R^2 (more detailed tables are below):

Place Forecast System Hitters Pitchers Average
1st Dan Rosenheck 2.80 2.50 2.65
2nd Steamer 1.60 6.00 3.80
3rd FanGraphs Fans 5.80 2.75 4.28
4th Will Larson 6.60 3.00 4.80
5th AggPro 6.40 4.25 5.33
6th CBS Sportsline 5.40 8.00 6.70
7th ESPN 6.60 7.50 7.05
8th John Grenci n/a 8.00 8.00
9th ZiPS 9.80 7.25 8.53
10th Razzball 6.80 10.25 8.53
11th Rotochamp 8.60 9.00 8.80
12th Sports Illustrated 8.80 12.00 10.40
13th Guru 10.60 12.00 11.30
14th Marcel 11.20 12.50 11.85

 

And here are the winners for the RMSE portion of the competition:

Place Forecast System Hitters Pitchers Average
1st Dan Rosenheck 2.60 2.00 2.30
2nd Will Larson 3.60 2.50 3.05
3rd Steamer 1.80 5.00 3.40
4th AggPro 4.00 3.00 3.50
5th ZIPS 6.00 5.75 5.88
6th Guru 4.80 7.25 6.03
7th Marcel 6.20 8.50 7.35
8th John Grenci n/a 7.50 7.50
9th Rotochamp 9.40 9.00 9.20
10th ESPN 9.20 10.50 9.85
11th Fangraphs Fans 11.80 8.75 10.28
12th Razzball 9.40 11.25 10.33
13th Sports Illustrated 10.60 11.75 11.18
14th CBS Sportsline 11.60 12.25 11.93

 

I’m beginning to notice some trends in the results across years. First, systems that include averaging do particularly well. This is pretty well established by now, but it’s always useful to reflect upon. I’ve been asked in the past to evaluate separately the forecasts computed by averaging and the more “structural” forecasts that don’t incorporate information from others’ forecasts. I decided not to do this because the nature of the baseball forecasting “season” makes it impossible to be sure a forecast was created without taking into account information from others’ forecasts. This can happen through direct influence (forecasting as a weighted average of others’ forecasts), but also in more subtle ways, such as model selection based on forecasts that others have put forward. Second, the FanGraphs Fans always fascinate me: they can be so biased, yet contain some of the best unique and relevant information for forecasting player variation. The takeaway from the Fans forecast set is that crowdsourced averaging works, as long as you can remove the bias in some way, or ignore it by focusing on ordinal ranks instead.

Some additional notes: it would be interesting to decompose these aggregate stats into rates multiplied by playing time, but it’s difficult to gather all of this for each projection system, so I focus on top-line output metrics. Also, absolute rankings are presented, but many of them are likely statistically indistinguishable from each other. If someone wants to run Diebold-Mariano tests, you can download the data used in this comparison from bbprojectionproject.com.
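For anyone who wants to try, here is a minimal sketch of such a test, under the simplifying assumption that player-level forecast errors are uncorrelated across players, in which case the Diebold-Mariano statistic reduces to a paired t-test on squared-error differentials:

```python
import numpy as np
from scipy import stats

def dm_test(errors_a, errors_b):
    # errors_a, errors_b: (forecast - actual) for the same players, two systems.
    # Loss differential under squared-error loss:
    d = np.asarray(errors_a) ** 2 - np.asarray(errors_b) ** 2
    dm = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    p_value = 2 * stats.norm.sf(abs(dm))  # two-sided p-value
    return dm, p_value
```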

Thanks for reading, and please submit your projections for next year! Also, as always, I welcome any comments, and I’ll do my best to respond.

R^2 Detailed Tables

system R rank HR rank RBI rank AVG rank SB rank mean rank
AggPro 0.250 6 0.42 9 0.308 8 0.32 1 0.538 8 6.4
Dan Rosenheck 0.296 3 0.45 1 0.340 3 0.30 3 0.568 4 2.8
Steamer 0.376 1 0.45 2 0.393 1 0.31 2 0.572 2 1.6
Will Larson 0.336 2 0.43 6 0.345 2 0.21 13 0.509 10 6.6
Marcel 0.146 12 0.36 12 0.236 12 0.27 8 0.477 12 11.2
ZIPS 0.118 13 0.42 8 0.230 13 0.30 4 0.504 11 9.8
CBS Sportsline 0.278 4 0.44 3 0.320 4 0.25 10 0.542 6 5.4
ESPN 0.241 7 0.43 5 0.317 5 0.29 7 0.532 9 6.6
Razzball 0.239 8 0.43 4 0.314 6 0.24 11 0.553 5 6.8
Rotochamp 0.234 9 0.41 10 0.287 9 0.23 12 0.569 3 8.6
Fangraphs Fans 0.268 5 0.42 7 0.272 10 0.30 6 0.574 1 5.8
Guru 0.186 11 0.33 13 0.263 11 0.30 5 0.476 13 10.6
Sports Illustrated 0.221 10 0.40 11 0.314 7 0.27 9 0.541 7 8.8

 

system W rank ERA rank WHIP rank SO rank mean rank
AggPro 0.13 3 0.15 4 0.25 4 0.402 6 4.25
Dan Rosenheck 0.17 1 0.19 2 0.27 2 0.406 5 2.5
Steamer 0.09 6 0.15 3 0.26 3 0.341 12 6
Will Larson 0.16 2 0.19 1 0.24 5 0.413 4 3
Marcel 0.05 14 0.02 13 0.17 9 0.293 14 12.5
ZIPS 0.09 7 0.07 9 0.21 6 0.375 7 7.25
CBS Sportsline 0.10 5 0.08 7 0.15 10 0.359 10 8
ESPN 0.08 10 0.05 11 0.20 7 0.43 2 7.5
Razzball 0.06 13 0.07 8 0.14 12 0.374 8 10.3
Rotochamp 0.08 9 0.06 10 0.17 8 0.359 9 9
Fangraphs Fans 0.11 4 0.08 5 0.28 1 0.435 1 2.75
Guru 0.07 11 0.05 12 0.11 14 0.343 11 12
Sports Illustrated 0.09 8 0.02 14 0.14 13 0.338 13 12
John Grenci 0.07 12 0.08 6 0.15 11 0.42 3 8

 

RMSE Detailed Tables

system R rank HR rank RBI rank AVG rank SB rank mean rank
AggPro 22.495 4 7.34 4 23.217 4 0.03 4 7.096 4 4
Dan Rosenheck 20.792 3 6.91 1 21.867 2 0.03 5 6.467 2 2.6
Steamer 20.355 2 7.02 2 21.817 1 0.03 3 6.258 1 1.8
Will Larson 20.091 1 7.20 3 22.234 3 0.03 8 6.864 3 3.6
Marcel 23.473 6 7.51 6 23.831 6 0.03 7 7.334 6 6.2
ZIPS 25.380 7 7.43 5 25.662 7 0.03 1 8.048 10 6
CBS Sportsline 25.866 10 8.63 13 26.837 10 0.03 12 8.527 13 11.6
ESPN 25.698 8 8.37 12 26.418 9 0.03 6 8.120 11 9.2
Razzball 25.831 9 8.01 9 27.842 12 0.03 9 7.920 8 9.4
Rotochamp 26.199 11 8.00 8 25.995 8 0.04 13 7.686 7 9.4
Fangraphs Fans 26.854 13 8.12 10 30.804 13 0.03 11 8.289 12 11.8
Guru 23.187 5 7.58 7 23.608 5 0.03 2 7.198 5 4.8
Sports Illustrated 26.609 12 8.24 11 27.173 11 0.03 10 8.009 9 10.6

 

system W rank ERA rank WHIP rank SO rank mean rank
AggPro 4.40 3 1.031 4 0.17 4 47.01 1 3
Dan Rosenheck 4.25 1 1.014 1 0.17 1 47.90 5 2
Steamer 5.02 8 1.030 3 0.17 2 49.45 7 5
Will Larson 4.34 2 1.017 2 0.17 3 47.44 3 2.5
Marcel 4.62 5 1.158 13 0.18 8 50.84 8 8.5
ZIPS 4.78 7 1.101 7 0.17 5 47.85 4 5.75
CBS Sportsline 5.56 13 1.134 11 0.19 11 57.14 14 12.3
ESPN 5.81 14 1.126 10 0.18 7 53.54 11 10.5
Razzball 5.39 12 1.115 8 0.19 12 55.55 13 11.3
Rotochamp 4.71 6 1.138 12 0.18 9 51.81 9 9
Fangraphs Fans 5.29 10 1.123 9 0.17 6 52.57 10 8.75
Guru 4.51 4 1.093 6 0.19 13 48.79 6 7.25
Sports Illustrated 5.33 11 1.176 14 0.18 10 55.32 12 11.8
John Grenci 5.14 9 1.080 5 0.19 14 47.26 2 7.5

 


Evaluating 2012 Projections

Hello, loyal readers. It’s time for the annual evaluation of last year’s player projections. Last year saw Gore, Snapp, and Highly’s AggPro forecasts win among hitter projections (http://www.fangraphs.com/community/comparing-2011-hitter-forecasts/) and Baseball Dope win among pitchers (http://www.fangraphs.com/community/comparing-2011-pitcher-forecasts/). In general, projections computed using averages or weighted averages tended to perform best among hitters, while for pitchers, structural models computed using “deep” statistics (K/9, HR/FB%, etc.) did better.

2012 Summary

In 2012, there were 12 projections submitted for hitters and 12 for pitchers (11 forecasters submitted projections for both). The evaluation only considers players for whom every projection system has a projection.



Comparing 2011 Pitcher Forecasts

This article is the second of a two-part series evaluating 2011 baseball player forecasts. The first looked at hitters and found that forecast averages outperform any particular forecasting system. For pitchers, the results appear to be somewhat reversed: structural forecasts computed using “deep” statistics (K/9, HR/FB%, etc.) seem to have done particularly well.

As with the other article, I will look at two main bases of comparison: Root Mean Squared Error both with and without bias. Bias is important to consider because it is easily removed from a forecast, and it can mask an otherwise good forecasting approach. For example, Fangraphs Fans hitter projections are often quite biased, but are very good at predicting numbers when this bias is removed.



Comparing 2011 Hitter Forecasts

This article is an update to the article I wrote last year on Fangraphs.

This year, I’m going to look at the forecasting performance of 12 different baseball player forecasting systems. I will look at two main bases of comparison: Root Mean Squared Error both with and without bias. Bias is important to consider because it is easily removed from a forecast and it can mask an otherwise good forecasting approach. For example, Fangraphs Fan projections are often quite biased, but are very good at predicting numbers when this bias is removed.



Comparing 2010 Pitcher Forecasts

In two previous articles, I considered the ability of freely available forecasts to predict hitter performance (part 1 and part 2), and how forecasts can be used to predict player randomness (here).  In this article, I look at the performance of the same six forecasts as before (ZIPS, Marcel, CHONE, Fangraphs Fans, ESPN, CBS), but instead look at starting pitchers’ wins, strikeouts, ERA, and WHIP.

Results are quite different than for hitters. ESPN is the clear winner here, with the most accurate forecasts and the ones with the most unique and relevant information. Fangraphs Fan projections are highly biased, as with the hitters, yet they add a large amount of distinct information, and thus are quite useful.  Surprisingly, the mechanical forecasts are, for the most part, failures. While ZIPS has the least bias, it is encompassed by other models in every statistic.*  Marcel and CHONE are also poor performers with no useful and unique information, but with higher bias.



Projecting Uncertainty

This article explores the ability to predict the randomness of players’ performance in 5 standard hitting categories: HRs, Runs, RBIs, SBs, and AVG. There have been efforts to do so by forecasters, most notably Tango’s “reliability score” (see Matt Klaassen’s article). I also test the idea that variation among forecasts (among ESPN, CHONE, Fangraphs Fans, ZIPS, Marcel, and CBS Sportsline) can predict player randomness.

I find that 1) variance among forecasts is a strong predictor of actual forecast error variance for HRs, Runs, RBIs, and Steals, but a weak one for batting average, 2) Tango’s reliability score serves as a weak predictor of all 5 stats, and 3) the forecast variance information dominates Tango’s measure in all categories but AVG.

Now let’s set up the analysis. Say, for example, that three forecasts say that Player A will hit 19, 20, and 21 home runs, respectively, and Player B will hit 10, 20, and 30 home runs. Does the fact that there is agreement in Player A’s forecast and disagreement in Player B’s provide some information about the randomness of Player A’s eventual performance relative to Player B’s?

To answer this, we need to do a few things first. We need a measure of the dispersion of the forecasts, so I define the forecast variance as the variance of the six forecasts for each stat, for each player. Taking the square root of this number gives the standard deviation of the forecasts. So, the standard deviation of the forecasts of Player A’s HRs would be 1, and the standard deviation of the forecasts for Player B would be 10.

Next we turn to some regression analysis.* The dependent variable is the absolute error of a particular player’s consensus forecast, the consensus being the average of the six different forecasts (for both players A and B in the example, the consensus would be 20). This absolute error is my measure of performance randomness. Controlling for the projected counting stats, we can estimate this absolute error as a function of some measure of forecast reliability.
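A sketch of that regression, assuming the per-player inputs have already been assembled into arrays (the variable names are mine, not from the original analysis):

```python
import numpy as np
import statsmodels.api as sm

def error_regression(abs_err, fcst_sd, fcst_mean, tango):
    # abs_err:   |actual - consensus forecast| for one stat, per player
    # fcst_sd:   standard deviation across the six forecasts
    # fcst_mean: consensus (mean) forecast, a control for scale
    # tango:     Tango's reliability score (0 = least, 1 = most reliable)
    X = sm.add_constant(np.column_stack([fcst_sd, fcst_mean, tango]))
    return sm.OLS(abs_err, X).fit()

# res = error_regression(abs_err, fcst_sd, fcst_mean, tango)
# print(res.params); print(res.bse)  # coefficients and standard errors
```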

Tango’s reliability score is one such measure, and the forecast standard deviation is another. What we would predict is that Tango’s score (where 0 means least reliable and 1 means most) would have a negative effect on the error, and that the forecast standard deviation would have a positive effect on the error. Now let’s see what the data tell us (standard errors appear in parentheses below each coefficient):

Runs:

R absolute error
                                 [1]        [2]        [3]
R Standard Deviation            0.45                  0.44
                               (0.27)                (0.32)
R mean forecast                 0.05       0.02       0.03
                               (0.06)     (0.05)     (0.06)
Tango’s reliability measure               -8.15      -0.59
                                          (9.09)    (10.60)
Constant                       22.94      14.93      15.36

HRs:

HR absolute error
                                 [1]        [2]        [3]
HR Standard Deviation           0.82                  0.78
                               (0.30)                (0.32)
HR mean forecast                0.20       0.12       0.13
                               (0.03)     (0.04)     (0.04)
Tango’s reliability measure               -3.26      -0.84
                                          (2.52)     (2.69)
Constant                        5.32       2.31       2.94

RBIs:

RBI absolute error
                                 [1]        [2]        [3]
RBI Standard Deviation          0.44                  0.34
                               (0.28)                (0.31)
RBI mean forecast               0.09       0.05       0.08
                               (0.05)     (0.05)     (0.05)
Tango’s reliability measure              -12.52      -7.83
                                          (9.12)    (10.08)
Constant                       23.78      12.66      18.37

SBs:

SB absolute error
                                 [1]        [2]        [3]
SB Standard Deviation           0.50                  0.41
                               (0.24)                (0.27)
SB mean forecast                0.37       0.30       0.31
                               (0.03)     (0.04)     (0.04)
Tango’s reliability measure               -3.47      -1.90
                                          (2.19)     (2.42)
Constant                        3.80       0.75       2.30

AVG:

AVG absolute error
                                 [1]        [2]        [3]
AVG Standard Deviation          0.567                 0.287
                               (0.689)               (0.713)
AVG mean forecast              -0.085     -0.107     -0.083
                               (0.091)    (0.090)    (0.092)
Tango’s reliability measure               -0.023     -0.022
                                          (0.014)    (0.015)
Constant                        0.069      0.054      0.066

We see that HRs are the statistic for which errors are most easily forecasted, errors for Rs, RBIs, and SBs are moderately forecastable, and errors for AVG are not very forecastable. We see this because of the negative and statistically significant coefficients for Tango’s score and the positive and statistically significant coefficients on the standard deviation measure.  In regressions with both measures, the standard deviation measure encompasses Tango’s measure, except in the AVG equation.

So what does this all mean? If you’re looking at rival forecasts, 80% of the standard deviation between the HR forecasts and about 50% of the standard deviation of the forecasts of the other stats is legitimate randomness. This means that you can tell how random a player’s performance will be by the variation in the forecasts, especially home runs. If you don’t have time to compare different forecasts, then Tango’s reliability score is a rough approximation, but a pretty imprecise measure.

*For those of you unfamiliar with regression analysis, imagine a graph of dots and drawing a line through it. Now imagine the graph is 3 or 4 dimensions and doing the same, and the line is drawn such that the (sum of squares of) the distance between the dots and the line is minimized.


Comparing 2010 Hitter Forecasts Part 2: Creating Better Forecasts

In Part 1 of this article, I looked at the ability of individual projection systems to forecast hitter performance. The six different projection systems considered are Zips, CHONE, Marcel, CBS Sportsline, ESPN, and Fangraphs Fans, and each is freely available online.  It turns out that when we control for bias in the forecasts, each of the forecasting systems is, on average, pretty much the same.  In what follows here, I show that the Fangraphs Fan projections and the Marcel projections contain the most unique, useful information. Also, I show that a weighted average of the six forecasts predicts hitter performance much better than any individual projection.

Forecast encompassing tests can be used to determine which of a set of individual projections contain the most valuable information. Based on the forecast encompassing test results, we can calculate a forecast that is a weighted average of the six forecasts that will outperform any individual forecast.
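One standard way to estimate such combination weights is to regress actuals on the individual forecasts (a Granger–Ramanathan-style combination). This sketch shows the idea; it is not necessarily the exact procedure used for the encompassing tests here:

```python
import numpy as np

def combination_weights(forecasts, actual):
    # forecasts: (n_players, n_systems) array, one column per system.
    # Least-squares weights (plus an intercept) for combining the systems.
    X = np.column_stack([np.ones(len(actual)), forecasts])
    beta, *_ = np.linalg.lstsq(X, actual, rcond=None)
    return beta  # beta[0] is the intercept; beta[1:] are the system weights

# combined = np.column_stack([np.ones(len(actual)), forecasts]) @ beta
```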



Comparing 2010 Hitter Forecasts Part 1: Which System is the Best?

There are a number of published baseball player forecasts that are freely available online. As Dave Allen notes in his article on Fangraphs Fan Projections, and as I find as well, some projections are definitely better than others. Part 1 of this article examines the overall fit of each of six different player forecasts: Zips, CHONE, Marcel, CBS Sportsline, ESPN, and Fangraphs Fans. What I find is that the Marcel projections are the best based on average error, followed by the Zips and CHONE projections. However, if we control for the over-optimism of each of these projection systems, the forecasts are virtually indistinguishable.

This second result is important in that it requires us to dig a little deeper to see how much each of these forecasts is actually helping to predict player performance.  This is addressed in Part 2 of this article.

The tool that is generally used to compare the average fit of a set of forecasts is Root Mean Squared Forecasting Error (RMSFE). This measure is imperfect in that it doesn’t consider the relative value of an over-projection versus an under-projection; for example, in earlier rounds of a fantasy draft we may be drafting to limit risk, while in later rounds we may be seeking risk. That being said, RMSE is pretty easy to understand and is thus the standard for comparing the average fit of a projection.

Table 1 shows the RMSFE of each projection system in each of the main five fantasy categories for hitters. Here, we see that the “mechanical” projection systems (Marcel, Zips, and CHONE) do best compared to the three “human” projections. The value shown is the standard deviation of the error of a particular forecast; in other words, roughly 2/3rds of the time, a player projected by Marcel to score 100 runs will score between 75 and 125 runs.

Table 1. Root Mean Squared Forecasting Error

Runs HRs RBIs SBs AVG
Marcel 24.43 7.14 23.54 7.37 0.0381
Zips 25.59 7.47 26.23 7.63 0.0368
CHONE 25.35 7.35 24.12 7.26 0.0369
Fangraphs Fans 29.24 7.98 32.91 7.61 0.0396
ESPN 26.58 8.20 26.32 7.28 0.0397
CBS 27.43 8.36 27.79 7.55 0.0388

Another important measure is bias, which occurs when a projection consistently over- or under-predicts. Bias inflates the RMSFE, so a simple bias correction may improve a forecast’s fit substantially. In Table 2, we see that the human projection systems exhibit substantially more bias than the mechanical ones.

Table 2. Average Bias

Runs HRs RBIs SBs AVG
Marcel 7.12 2.09 5.82 1.16 0.0155
Zips 11.24 2.55 11.62 0.73 0.0138
CHONE 10.75 2.67 9.14 0.61 0.0140
Fangraphs Fans 17.75 4.03 23.01 2.80 0.0203
ESPN 13.26 3.78 11.59 1.42 0.0173
CBS 15.09 4.08 14.17 2.05 0.0173

We can get a better picture of which forecasting system is best by correcting for bias in the individual forecasts. Table 3 presents bias-corrected RMSFEs. What we see is a tightening of the results across the forecasting systems: after the correction, each one performs about the same.

Table 3. Bias-corrected Root Mean Squared Forecasting Error

Runs HRs RBIs SBs AVG
Marcel 23.36 6.83 22.81 7.28 0.0348
Zips 22.98 7.02 23.52 7.59 0.0341
CHONE 22.96 6.85 22.33 7.24 0.0341
Fangraphs Fans 23.24 6.88 23.53 7.08 0.0340
ESPN 23.03 7.27 23.62 7.14 0.0357
CBS 22.91 7.29 23.90 7.27 0.0347
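Mechanically, the correction behind Tables 2 and 3 just subtracts each system’s mean error before computing the RMSFE; a minimal sketch:

```python
import numpy as np

def bias_and_corrected_rmsfe(projected, actual):
    # Returns the average bias (Table 2) and the RMSFE that remains
    # after that bias is removed (Table 3).
    err = np.asarray(projected, float) - np.asarray(actual, float)
    bias = err.mean()
    corrected = np.sqrt(np.mean((err - bias) ** 2))
    return bias, corrected

# Note the identity rmsfe**2 == bias**2 + corrected**2, which is why
# bias mechanically inflates the raw RMSFE figures in Table 1.
```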

So where does this leave us, if these six forecasts are basically indistinguishable? As it turns out, evaluating the performance of individual forecasts doesn’t tell the whole story. There may be useful information in each of the different forecasting systems, so an average or a weighted average of forecasts may prove to be a better predictor than any individual forecast. Part 2 of this article examines this in some detail. Stay tuned!