Comparing 2010 Pitcher Forecasts

In two previous articles, I considered the ability of freely available forecasts to predict hitter performance (part 1 and part 2), and how forecasts can be used to predict player randomness (here). In this article, I look at the same six forecasts as before (ZIPS, Marcel, CHONE, Fangraphs Fans, ESPN, CBS), but this time for starting pitchers’ wins, strikeouts, ERA, and WHIP.

Results are quite different from those for hitters. ESPN is the clear winner here: its forecasts are among the most accurate, and they carry the most distinct and relevant information. Fangraphs Fans projections are highly biased, as with the hitters, yet they add a large amount of distinct information, and thus are quite useful. Surprisingly, the mechanical forecasts are, for the most part, failures. While ZIPS has the least bias, it is encompassed by other models in every statistic.* Marcel and CHONE also perform poorly: they carry no useful unique information and have higher bias than ZIPS.

We see from Table 1 that each forecasting system performs about the same when it comes to forecasting wins. A simple average of the forecasts or an optimally weighted average (see Table 4) does much better.  For strikeouts, the results are similar, with CBS as a bit of an outlier. For ERA, the non-technical forecasts (ESPN, Fans, and CBS) each perform better than the mechanical forecasts. For WHIP, all are about the same, with the Fans at the bottom.

Table 1: RMSE

Forecast          Wins   Ks     ERA    WHIP
Marcel            2.94   30.43  0.671  0.115
ZIPS              3.02   29.85  0.684  0.116
CHONE             3.01   28.85  0.678  0.111
Fangraphs Fans    2.95   28.81  0.652  0.125
ESPN              2.97   28.04  0.657  0.107
CBS               3.01   32.28  0.666  0.111
Simple Average    2.73   27.36  0.653  0.108
Weighted Average  2.65   26.48  0.640  0.105
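
For readers who want to reproduce this kind of comparison, here is a minimal Python sketch of the RMSE calculation and of a simple average of forecasts. The function names and the win totals are made up for illustration; they are not the 2010 data or the code behind Table 1.

```python
import math

def rmse(forecasts, actuals):
    """Root mean squared error of paired forecasts and outcomes."""
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))

def simple_average(*systems):
    """Per-player mean across any number of forecasting systems."""
    return [sum(player) / len(player) for player in zip(*systems)]

# Made-up win totals for four pitchers (illustrative only).
actual = [12, 15, 9, 14]
system_a = [14, 13, 11, 12]
system_b = [11, 16, 8, 17]
combined = simple_average(system_a, system_b)

print(rmse(system_a, actual))   # 2.0
print(rmse(system_b, actual))   # about 1.73
print(rmse(combined, actual))   # 0.5
```

In this toy sample the two systems miss in partly opposite directions, so their errors partly cancel in the average. That is one way a simple average can beat every individual system, as it does in Table 1.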

Table 2 shows that, as a whole, bias is only a small part of the forecasting error, unlike for hitters, where it can be quite large. The one exception is ERA, where the non-technical forecasts are over-optimistic, projecting ERAs roughly 0.05 to 0.08 runs too low. There isn’t much to see here, which, frankly, is a good thing.

Table 2: Bias

Forecast          Wins   Ks     ERA     WHIP
Marcel            -0.45  -5.13  -0.012  -0.011
ZIPS              0.17   0.79   0.032   -0.006
CHONE             -0.59  -3.31  0.043   0.002
Fangraphs Fans    0.72   6.62   -0.079  0.023
ESPN              0.66   5.37   -0.065  -0.011
CBS               0.86   9.89   -0.055  -0.018
Simple Average    0.23   2.37   -0.023  -0.003
Weighted Average  0.00   0.00   0.000   0.000
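
As a rough sketch of how a bias figure like those in Table 2 can be computed, the Python below takes the mean of forecast minus actual. That sign convention is an assumption on my part, inferred from the text calling negative ERA biases "over-optimistic"; the ERA values are invented for illustration.

```python
def bias(forecasts, actuals):
    """Mean signed error; positive means the system projected too high."""
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(f - a for f, a in zip(forecasts, actuals)) / len(actuals)

# Made-up ERA projections and outcomes for three pitchers.
projected_era = [3.40, 3.90, 4.20]
actual_era = [3.50, 4.00, 4.25]

# Negative bias on ERA means the projections were too low (too optimistic).
print(round(bias(projected_era, actual_era), 3))   # -0.083
```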

Table 3 presents the bias-corrected RMSEs for each stat. The bias correction is done by subtracting the bias from each forecast, then re-computing the forecast errors. Unlike the hitters, where bias corrections mattered quite a bit, they don’t seem to affect the forecast rankings for pitchers. We still see that CBS, ESPN, and the Fangraphs Fans seem to do the best.

Table 3: Bias-corrected RMSE

Forecast          Wins   Ks     ERA    WHIP
Marcel            2.88   29.74  0.671  0.114
ZIPS              3.01   29.83  0.683  0.115
CHONE             2.92   28.54  0.676  0.111
Fangraphs Fans    2.80   27.57  0.644  0.122
ESPN              2.85   27.21  0.652  0.106
CBS               2.80   29.78  0.663  0.109
Simple Average    2.71   27.20  0.653  0.108
Weighted Average  2.65   26.48  0.640  0.105
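
The bias correction described above (subtract the bias from each forecast, then recompute the errors) can be sketched in Python as follows. The helper names and win totals are hypothetical, and the bias sign convention (forecast minus actual) is assumed rather than taken from the article.

```python
import math

def rmse(forecasts, actuals):
    """Root mean squared error of paired forecasts and outcomes."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))

def bias(forecasts, actuals):
    """Mean signed error (forecast minus actual)."""
    return sum(f - a for f, a in zip(forecasts, actuals)) / len(actuals)

def bias_corrected_rmse(forecasts, actuals):
    """Shift every forecast by the system's bias, then recompute RMSE."""
    b = bias(forecasts, actuals)
    shifted = [f - b for f in forecasts]
    return rmse(shifted, actuals)

# Made-up win totals: this system runs one win too high on average.
projected = [14, 13, 11, 14]
actual = [12, 15, 9, 12]

raw = rmse(projected, actual)                        # 2.0
b = bias(projected, actual)                          # 1.0
corrected = bias_corrected_rmse(projected, actual)   # sqrt(3), about 1.73

# Within a single fixed sample: raw^2 = bias^2 + corrected^2.
print(raw, b, corrected)
```

Within one fixed sample the squared RMSE splits exactly into squared bias plus squared bias-corrected RMSE, which is why a system with little bias gains almost nothing from the correction.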

Table 4 shows the optimal forecast weights. These are the result of forecast encompassing tests that recursively drop the forecasts that have the least amount of unique information in them. By this metric, the mechanical forecasts are nearly worthless. Marcel, ZIPS, and CHONE forecasts have no unique information for any statistic when compared to the Fans, ESPN, and CBS forecasts. Put another way, if someone had the Fangraphs Fans, ESPN, and CBS forecasts, they couldn’t add any value by adding one of the mechanical forecasts.

Table 4: Optimal forecast weights

Forecast          Wins   Ks     ERA    WHIP
Marcel            0      0      0      0
ZIPS              0      0      0      0
CHONE             0      0      0      0
Fangraphs Fans    0.29   0.45   0.88   0
ESPN              0.22   0.55   0.12   0.71
CBS               0.49   0      0      0.29
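
To give a feel for what an optimal weight means, here is a simplified two-forecast sketch in Python. It is not the recursive encompassing procedure used for Table 4. It only finds the weight w on one forecast (and 1 - w on the other) that minimizes squared error, clipped to the range 0 to 1. The function names and ERA values are invented for illustration.

```python
def optimal_weight(f1, f2, actuals):
    """Least-squares weight on f1 in w*f1 + (1-w)*f2, clipped to [0, 1]."""
    num = sum((a - y) * (x - y) for x, y, a in zip(f1, f2, actuals))
    den = sum((x - y) ** 2 for x, y in zip(f1, f2))
    if den == 0:
        raise ValueError("forecasts are identical; the weight is undefined")
    return min(1.0, max(0.0, num / den))

def combine(f1, f2, w):
    """Weighted average of two forecasts, player by player."""
    return [w * x + (1 - w) * y for x, y in zip(f1, f2)]

# Made-up ERA forecasts and outcomes for four pitchers.
actual = [3.50, 4.20, 3.90, 4.60]
system_1 = [3.40, 4.00, 3.80, 4.50]
system_2 = [3.90, 4.10, 4.30, 4.20]

w = optimal_weight(system_1, system_2, actual)
print(round(w, 2))   # 0.85
blended = combine(system_1, system_2, w)
```

A weight of 0 on a forecast in this setup means the other forecast already contains everything useful in it, which is the intuition behind the zeros in Table 4.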

So what does this article tell us?

1) ESPN is really good at predicting pitcher performance.

2) Mechanical forecasts are bad at predicting pitcher performance.

3) Fangraphs Fans projections add a large amount of information that you can’t get anywhere else.

Thanks for reading!

Next up: 2011 forecasts!

*For descriptions of some of the technical terms and concepts here, please consult the earlier articles in this series, here and here.





10 Comments
Tyler
13 years ago

Interesting…thanks for the info!

philosofool
13 years ago

ESPN is the best pitcher forecasting system? Are you sure there isn’t some sort of weird selection bias here? How do you determine the player pools that you assess?

By the way, it seems incorrect to me to assess different pools for different systems. You have to pick a player pool and see how each system does with that pool. Then the problem becomes “what do you do with players that a system doesn’t project?” If CBS and ESPN aren’t projecting certain players, I’m worried that they’re getting an unfair edge because the players they aren’t projecting are the hardest to project.

Half Full
13 years ago

Nice, I’ve been looking forward to your pitching weights. Since reading your hitting articles I’ve begun compiling a simple average of MARCEL, FANS, ESPN, RotoChamps, CAIRO, and Bill James (ESPN.csv from your site). It looks like I’ll be dropping the mechanical forecasts for pitching thanks to this latest finding. I decided to exclude CBS since their Excel format for names was {last name, first name}, unlike all the others. Is there any way to get around this? I’ve been using AVERAGEIF in Excel to compile averages to one sheet. One final note: in the comments of your last hitters forecast article you mentioned you’d be posting the weighted projections on your site. Any idea when that will be?

evo34
13 years ago

I like the idea here, and appreciate the work. But the forward-looking conclusions simply aren’t supported by looking at a single year of data. To say, “If someone had the Fangraphs Fans, ESPN, and CBS forecasts, they couldn’t add any value by adding one of the mechanical forecasts.” and “1) ESPN is really good at predicting pitcher performance. 2) Mechanical forecasts are bad at predicting pitcher performance,” is just reckless. You can say in the past tense that systems X, Y and Z provided no value over the competition in 2010; but in the grand scheme of things, this is a very small sample, and certainly not enough to declare the future value of any system.

Half Full
13 years ago

@Will No problem, I can put those together for you. I’ll try submitting them through your website. RotoChamp projections are available on the FanGraphs projection page already if you hadn’t noticed.

Wade8813
13 years ago

Just to clarify – this only means ESPN did the best in 2010, correct? So they could be worse than the others, but in the year of the pitcher happened to have guessed pitchers the best?

Joel
12 years ago

Will, did you graduate? Wondering what happened to your intriguing 2011 projection comparisons..