Comparing 2010 Pitcher Forecasts
In two previous articles, I considered the ability of freely available forecasts to predict hitter performance (part 1 and part 2), and how forecasts can be used to predict player randomness (here). In this article, I look at the performance of the same six forecasts as before (ZIPS, Marcel, CHONE, Fangraphs Fans, ESPN, CBS), but instead look at starting pitchers’ wins, strikeouts, ERA, and WHIP.
The results are quite different from those for hitters. ESPN is the clear winner here, with the most accurate forecasts and the most unique, relevant information. Fangraphs Fans projections are highly biased, as with the hitters, yet they add a large amount of distinct information, and thus are quite useful. Surprisingly, the mechanical forecasts are, for the most part, failures. While ZIPS has the least bias, it is encompassed by other models in every statistic.* Marcel and CHONE are also poor performers, with no useful and unique information and with higher bias.
We see from Table 1 that each forecasting system performs about the same when it comes to forecasting wins. A simple average of the forecasts or an optimally weighted average (see Table 4) does much better. For strikeouts, the results are similar, with CBS as a bit of an outlier. For ERA, the non-technical forecasts (ESPN, Fans, and CBS) each perform better than the mechanical forecasts. For WHIP, all are about the same, with the Fans at the bottom.
Table 1: RMSE
Forecast | Wins | Ks | ERA | WHIP |
Marcel | 2.94 | 30.43 | 0.671 | 0.115 |
Zips | 3.02 | 29.85 | 0.684 | 0.116 |
CHONE | 3.01 | 28.85 | 0.678 | 0.111 |
Fangraphs Fans | 2.95 | 28.81 | 0.652 | 0.125 |
ESPN | 2.97 | 28.04 | 0.657 | 0.107 |
CBS | 3.01 | 32.28 | 0.666 | 0.111 |
Simple Average | 2.73 | 27.36 | 0.653 | 0.108 |
Weighted Average | 2.65 | 26.48 | 0.640 | 0.105 |
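For readers who want to run this kind of comparison themselves, here is a minimal sketch of how an RMSE and a simple average of forecasts might be computed. The function names and the win totals are made up for illustration; they are not the data behind Table 1.

```python
import math

def rmse(forecasts, actuals):
    """Root mean squared error between paired forecasts and outcomes."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def simple_average(*systems):
    """Average several systems' forecasts, player by player."""
    return [sum(vals) / len(vals) for vals in zip(*systems)]

# Hypothetical win totals for four pitchers (illustrative only).
actual = [15, 9, 12, 7]
system_a = [13, 11, 12, 10]
system_b = [16, 8, 15, 6]

combined = simple_average(system_a, system_b)

for name, fc in [("A", system_a), ("B", system_b), ("Average", combined)]:
    print(name, round(rmse(fc, actual), 3))
```

In this toy data the two systems miss in opposite directions for most pitchers, so their errors partly cancel and the average beats both. That is the same mechanism that lets the Simple Average row do better than any single system in Table 1.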
Table 2 shows that, as a whole, bias is only a small part of the forecasting error, unlike for hitters, where it can be quite large. The one exception is ERA, where the non-technical forecasts are over-optimistic by 0.05-0.08 points. There isn’t much to see here, which, frankly, is a good thing.
Table 2: Bias
Forecast | Wins | Ks | ERA | WHIP |
Marcel | -0.45 | -5.13 | -0.012 | -0.011 |
Zips | 0.17 | 0.79 | 0.032 | -0.006 |
CHONE | -0.59 | -3.31 | 0.043 | 0.002 |
Fangraphs Fans | 0.72 | 6.62 | -0.079 | 0.023 |
ESPN | 0.66 | 5.37 | -0.065 | -0.011 |
CBS | 0.86 | 9.89 | -0.055 | -0.018 |
Simple Average | 0.23 | 2.37 | -0.023 | -0.003 |
Weighted Average | 0.00 | 0.00 | 0.000 | 0.000 |
Table 3 presents the bias-corrected RMSEs for each stat. The bias correction is done by subtracting the bias from each forecast, then re-computing the forecast errors. Unlike the hitters, where bias corrections mattered quite a bit, they don’t seem to affect the forecast rankings for pitchers. We still see that CBS, ESPN, and the Fangraphs Fans seem to do the best.
Table 3: Bias-corrected RMSE
Forecast | Wins | Ks | ERA | WHIP |
Marcel | 2.88 | 29.74 | 0.671 | 0.114 |
Zips | 3.01 | 29.83 | 0.683 | 0.115 |
CHONE | 2.92 | 28.54 | 0.676 | 0.111 |
Fangraphs Fans | 2.80 | 27.57 | 0.644 | 0.122 |
ESPN | 2.85 | 27.21 | 0.652 | 0.106 |
CBS | 2.80 | 29.78 | 0.663 | 0.109 |
Simple Average | 2.71 | 27.20 | 0.653 | 0.108 |
Weighted Average | 2.65 | 26.48 | 0.640 | 0.105 |
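The bias and the bias correction can be sketched as follows. This is one reading of the procedure described above, not necessarily the exact calculation behind Tables 2 and 3. It assumes bias is the average of forecast minus actual, which matches a negative ERA bias being called over-optimistic. The ERA values are hypothetical.

```python
import math

def bias(forecasts, actuals):
    """Mean forecast error, with error defined as forecast minus actual (assumed convention)."""
    return sum(f - a for f, a in zip(forecasts, actuals)) / len(forecasts)

def rmse(forecasts, actuals):
    """Root mean squared error between paired forecasts and outcomes."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(forecasts))

def bias_corrected_rmse(forecasts, actuals):
    """Subtract the bias from every forecast, then recompute the RMSE."""
    b = bias(forecasts, actuals)
    corrected = [f - b for f in forecasts]
    return rmse(corrected, actuals)

# Hypothetical ERA forecasts and outcomes for four pitchers (illustrative only).
actual_era = [3.20, 4.10, 3.75, 4.60]
forecast_era = [3.00, 3.90, 3.80, 4.30]

b = bias(forecast_era, actual_era)
raw = rmse(forecast_era, actual_era)
corrected = bias_corrected_rmse(forecast_era, actual_era)
print(round(b, 4), round(raw, 4), round(corrected, 4))
```

Under this definition the squared RMSE equals the squared bias plus the squared bias-corrected RMSE, so the correction can never make a forecast look worse. Note that the bias used here is only known after the season, so this measures how good a system would have been without its systematic lean, not something available at draft time.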
Table 4 shows the optimal forecast weights. These are the result of forecast encompassing tests that recursively drop the forecasts that have the least amount of unique information in them. By this metric, the mechanical forecasts are nearly worthless. Marcel, ZIPS, and CHONE forecasts have no unique information for any statistic when compared to the Fans, ESPN, and CBS forecasts. Put another way—if someone had the Fangraphs Fans, ESPN, and CBS forecasts, they couldn’t add any value by adding one of the mechanical forecasts.
Table 4: Optimal forecast weights
Forecast | Wins | Ks | ERA | WHIP |
Marcel | 0 | 0 | 0 | 0 |
Zips | 0 | 0 | 0 | 0 |
CHONE | 0 | 0 | 0 | 0 |
Fangraphs Fans | 0.29 | 0.45 | 0.88 | 0 |
ESPN | 0.22 | 0.55 | 0.12 | 0.71 |
CBS | 0.49 | 0 | 0 | 0.29 |
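To give a feel for where weights like these come from, here is a simplified two-forecast sketch. It is a stand-in, not the recursive encompassing procedure used for Table 4: it finds the single weight w that minimizes the squared error of w * A + (1 - w) * B, with the weights forced to sum to one and no intercept. The function names and data are hypothetical.

```python
import math

def optimal_weight(f1, f2, actuals):
    """Least-squares weight on f1 when combining as w*f1 + (1-w)*f2."""
    diffs = [a - b for a, b in zip(f1, f2)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        raise ValueError("forecasts are identical; the weight is not identified")
    numer = sum((y - b) * d for y, b, d in zip(actuals, f2, diffs))
    return numer / denom

def combine(f1, f2, w):
    """Weighted combination of two forecast lists."""
    return [w * a + (1 - w) * b for a, b in zip(f1, f2)]

def rmse(forecasts, actuals):
    """Root mean squared error between paired forecasts and outcomes."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(forecasts))

# Hypothetical win totals for four pitchers (illustrative only).
actual = [15, 9, 12, 7]
system_a = [13, 11, 12, 10]
system_b = [16, 8, 15, 6]

w = optimal_weight(system_a, system_b, actual)
weighted = combine(system_a, system_b, w)
equal = combine(system_a, system_b, 0.5)
print(round(w, 3), round(rmse(weighted, actual), 3), round(rmse(equal, actual), 3))
```

A weight near 0 or 1 in a setup like this suggests that one forecast adds little beyond the other, which is the intuition behind the zeros in Table 4. Because the weight is fitted to the same season it is scored on, it will always look at least as good in-sample as a simple average.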
So what does this article tell us?
1) ESPN is really good at predicting pitcher performance.
2) Mechanical forecasts are bad at predicting pitcher performance.
3) Fangraphs Fans projections add a large amount of information that you can’t get anywhere else.
Thanks for reading!
Next up: 2011 forecasts!
*For descriptions of some of the technical terms and concepts here, please consult the earlier articles in this series, here and here.
Interesting…thanks for the info!
ESPN is the best pitcher forecasting system? Are you sure there isn’t some sort of weird selection bias here? How do you determine the player pools that you assess?
By the way, it seems incorrect to me to assess different pools for different systems. You have to pick a player pool and see how each system does with that pool. Then the problem becomes “what do you do with players that a system doesn’t project?” If CBS and ESPN aren’t projecting certain players, I’m worried that they’re getting an unfair edge because the players they aren’t projecting are the hardest to project.
@philo: I was pretty surprised myself. ESPN is the best? Really??
You’re absolutely right that you can’t consider different pools of players and then try to compare the metrics. That’s why the pitchers I consider are those with forecasts from each projection system, so the player pool is the same for each metric (146 pitchers total). Hopefully this eliminates any selection bias across different projection systems.
Nice, I’ve been looking forward to your pitching weights. Since reading your hitting articles I’ve begun compiling a simple average of MARCEL, FANS, ESPN, RotoChamps, CAIRO, and Bill James (ESPN.csv from your site). It looks like I’ll be dropping the mechanical forecasts for pitching thanks to this latest finding. I decided to exclude CBS since their Excel format for names was {last name, first name}, unlike all the others. Is there any way to get around this? I’ve been using AVERAGEIF in Excel to compile averages onto one sheet. One final note: in the comments of your last hitters forecast article you mentioned you’d be posting the weighted projections on your site. Any idea when that will be?
I like the idea here, and appreciate the work. But the forward-looking conclusions simply aren’t supported by looking at a single year of data. To say, “If someone had the Fangraphs Fans, ESPN, and CBS forecasts, they couldn’t add any value by adding one of the mechanical forecasts,” and “1) ESPN is really good at predicting pitcher performance. 2) Mechanical forecasts are bad at predicting pitcher performance,” is just reckless. You can say in the past tense that systems X, Y and Z provided no value over the competition in 2010; but in the grand scheme of things, this is a very small sample, and certainly not enough to declare the future value of any system.
@Half Full: Wow, I didn’t realize anyone was looking at that yet. It’s still in development, and I haven’t put all the forecasts up there yet. I’ve finished my hitter forecasts for next year and am working on the pitchers. They should be up next week sometime. At that point, I’ll write something up for Fangraphs and hopefully they’ll give me some press. For those of you who don’t know what I’m talking about, go to http://www.williamlarson.com/projections
As for your Excel issue, go to Data -> Text to Columns, then choose “,” as the delimiter. Can you email me the CAIRO, CBS and RotoChamps forecasts so I can put them on my site? I haven’t had the time to get them yet.
@evo34: You’re absolutely right that these are backward looking, but past optimal forecast weights have performed well in other areas, and so I think they might have some use going forward. This article isn’t the end of the analysis. For next year, I’ll compute forecasts based on these weights to see whether they outperform the individual forecasts and a simple average, and whether the “optimal weights” change. Thanks for the comment!
@Will No problem, I can put those together for you. I’ll try submitting them through your website. RotoChamp projections are available on the FanGraphs projection page already if you hadn’t noticed.
Just to clarify – this only means ESPN did the best in 2010, correct? So they could be worse than the others, but in the year of the pitcher happened to have guessed pitchers the best?
@Wade: Yes, you’re right. I’m only looking at 2010. I’ll need to look at other years to see if ESPN consistently outperforms the others or not. That being said, in other areas where people look at these weighted-average forecasts, the weights tend to be correlated from period to period.
Will, did you graduate? Wondering what happened to your intriguing 2011 projection comparisons.