Projecting Uncertainty
by Will Larson
February 4, 2011

This article explores whether we can predict the randomness of players’ performance in five standard hitting categories: HRs, Runs, RBIs, SBs, and AVG. Forecasters have made efforts to do so, most notably with Tango’s “reliability score” (see Matt Klaassen’s article). I also test the idea that variation among forecasts (ESPN, CHONE, Fangraphs Fans, ZIPS, Marcel, and CBS Sportsline) can predict player randomness as well. I find that 1) variance among forecasts is a strong predictor of actual forecast error variance for HRs, Runs, RBIs, and SBs, but a weak one for AVG; 2) Tango’s reliability score is a weak predictor for all five stats; and 3) the forecast variance information dominates Tango’s measure in every category but AVG.

Now let’s set up the analysis. Say, for example, that three forecasts say that Player A will hit 19, 20, and 21 home runs, respectively, and that Player B will hit 10, 20, and 30 home runs. Does the fact that the forecasts agree for Player A and disagree for Player B tell us something about the randomness of Player A’s eventual performance relative to Player B’s?

To answer this, we need to do a few things first. We need a measure of the dispersion of the forecasts. I define the forecast variance as the variance of the six forecasts for each stat, for each player. If we take the square root of this number, we get the standard deviation of the forecasts. In the example, the standard deviation of the forecasts of Player A’s HRs would be 1, and the standard deviation of the forecasts for Player B would be 10.

Next we turn to some regression analysis.* The dependent variable is the absolute error of a particular player’s consensus forecast, where the consensus forecast is defined as the average of the six different forecasts. For both Player A and Player B in the example, the consensus forecast would be 20 home runs, so the absolute error is the distance between 20 and the player’s actual total. This is my measure of performance randomness.
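The setup above can be sketched in a few lines of Python. This is only an illustration using the two example players, with three forecasts each rather than six. The `actual` home run totals and the function name `forecast_summary` are made up for demonstration and do not come from the data used in this article.

```python
from statistics import mean, stdev


def forecast_summary(forecasts, actual):
    """Return (consensus, forecast_sd, abs_error) for one player and one stat."""
    consensus = mean(forecasts)          # average of the rival forecasts
    forecast_sd = stdev(forecasts)       # sample standard deviation (n - 1 denominator)
    abs_error = abs(actual - consensus)  # the dependent variable in the regressions
    return consensus, forecast_sd, abs_error


# Forecasts come from the example in the text; the actual totals are hypothetical.
player_a = forecast_summary([19, 20, 21], actual=22)
player_b = forecast_summary([10, 20, 30], actual=27)

print("Player A:", player_a)
print("Player B:", player_b)
```

Both players share the same consensus forecast, so any difference in how far their actual totals land from it is what the forecast standard deviation is being asked to explain.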
Controlling for the level of the projected stat (the mean forecast), we can estimate this absolute error as a function of some measure of forecast reliability. Tango’s reliability score is one such measure, and the forecast standard deviation is another. We would predict that Tango’s score (where 0 means least reliable and 1 means most) has a negative effect on the error, and that the forecast standard deviation has a positive effect on the error.

Now let’s see what the data tell us. In each table the dependent variable is the absolute error of the consensus forecast, each column is a separate regression, and standard errors are in parentheses.

Runs (dependent variable: R absolute error)

                                     (1)       (2)       (3)
R standard deviation                          0.45      0.44
                                             (0.27)    (0.32)
R mean forecast                     0.05      0.02      0.03
                                   (0.06)    (0.05)    (0.06)
Tango’s reliability measure        -8.15               -0.59
                                   (9.09)             (10.60)
Constant                           22.94     14.93     15.36

HRs (dependent variable: HR absolute error)

                                     (1)       (2)       (3)
HR standard deviation                         0.82      0.78
                                             (0.30)    (0.32)
HR mean forecast                    0.20      0.12      0.13
                                   (0.03)    (0.04)    (0.04)
Tango’s reliability measure        -3.26               -0.84
                                   (2.52)              (2.69)
Constant                            5.32      2.31      2.94

RBIs (dependent variable: RBI absolute error)

                                     (1)       (2)       (3)
RBI standard deviation                        0.44      0.34
                                             (0.28)    (0.31)
RBI mean forecast                   0.09      0.05      0.08
                                   (0.05)    (0.05)    (0.05)
Tango’s reliability measure       -12.52               -7.83
                                   (9.12)             (10.08)
Constant                           23.78     12.66     18.37

SBs (dependent variable: SB absolute error)

                                     (1)       (2)       (3)
SB standard deviation                         0.50      0.41
                                             (0.24)    (0.27)
SB mean forecast                    0.37      0.30      0.31
                                   (0.03)    (0.04)    (0.04)
Tango’s reliability measure        -3.47               -1.90
                                   (2.19)              (2.42)
Constant                            3.80      0.75      2.30

AVG (dependent variable: AVG absolute error)

                                     (1)       (2)       (3)
AVG standard deviation                       0.567     0.287
                                            (0.689)   (0.713)
AVG mean forecast                 -0.085    -0.107    -0.083
                                  (0.091)   (0.090)   (0.092)
Tango’s reliability measure       -0.023              -0.022
                                  (0.014)             (0.015)
Constant                           0.069     0.054     0.066

HRs are the statistic whose forecast errors are most easily predicted; errors for Runs, RBIs, and SBs are moderately predictable, and errors for AVG are not very predictable. The evidence is in the signs and sizes of the coefficients. Tango’s score has the predicted negative coefficient in every regression, although its standard errors are large. The standard deviation measure has the predicted positive coefficient in every regression, and that coefficient is largest relative to its standard error for HRs and SBs.
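One rough way to read these results is to divide each coefficient by its standard error, assuming the figures in parentheses are standard errors. A ratio of about 2 or more in absolute value is the usual rule of thumb for significance at the 5% level. The sketch below does this for the first estimate reported for each measure in each category. The dictionary layout and the names `estimates` and `ratios` are illustrative choices, not part of the original analysis.

```python
# (coefficient, standard error) pairs copied from the results above:
# the first estimate reported for each measure, for each stat.
estimates = {
    "R":   {"sd": (0.45, 0.27),   "tango": (-8.15, 9.09)},
    "HR":  {"sd": (0.82, 0.30),   "tango": (-3.26, 2.52)},
    "RBI": {"sd": (0.44, 0.28),   "tango": (-12.52, 9.12)},
    "SB":  {"sd": (0.50, 0.24),   "tango": (-3.47, 2.19)},
    "AVG": {"sd": (0.567, 0.689), "tango": (-0.023, 0.014)},
}

# Ratio of each coefficient to its standard error.
ratios = {
    stat: {name: coef / se for name, (coef, se) in measures.items()}
    for stat, measures in estimates.items()
}

for stat, r in ratios.items():
    print(f"{stat:>3}: SD ratio = {r['sd']:5.2f}, Tango ratio = {r['tango']:6.2f}")
```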
In regressions that include both measures, the standard deviation measure encompasses Tango’s measure (that is, Tango’s score adds little once forecast disagreement is accounted for), except in the AVG equation.

So what does this all mean? If you’re looking at rival forecasts, about 80% of the standard deviation among the HR forecasts, and about 50% of the standard deviation among the forecasts of the other stats, is legitimate randomness. These shares are the standard deviation coefficients in the regressions: roughly 0.8 for HRs and roughly 0.5 elsewhere. This means that you can tell how random a player’s performance will be from the variation in the forecasts, especially for home runs. If you don’t have time to compare different forecasts, then Tango’s reliability score is a rough approximation, but a pretty imprecise one.

*For those of you unfamiliar with regression analysis, imagine a scatter of dots on a graph and drawing a line through them. Now imagine the graph has 3 or 4 dimensions and doing the same; the line is drawn so that the sum of the squared distances between the dots and the line is minimized.
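For readers who want to see the footnote’s idea in practice, here is a minimal least-squares sketch for the two-dimensional case: one line through a scatter of dots. The five data points are hypothetical and are not taken from the forecasts analyzed in this article. The function names `fit_line` and `sum_squared_residuals` are likewise just illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the dots and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: forecast standard deviation (x) and absolute error (y).
forecast_sd = [1.0, 2.0, 3.0, 4.0, 5.0]
abs_error = [2.0, 2.5, 4.5, 4.0, 6.0]

intercept, slope = fit_line(forecast_sd, abs_error)
best = sum_squared_residuals(forecast_sd, abs_error, intercept, slope)

# Any other line, such as one with a slightly steeper slope, fits worse.
other = sum_squared_residuals(forecast_sd, abs_error, intercept, slope + 0.1)

print("intercept:", intercept, "slope:", slope)
print("sum of squares, fitted line:", best, "steeper line:", other)
```

The regressions in this article add more explanatory variables (the mean forecast and Tango’s score), but the principle is the same: pick the coefficients that make the sum of squared distances as small as possible.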