# Weighting Past Results: Hitters

By Brandon Reppert, November 15, 2013

We all know by now that we should look at more than one year of player data when we evaluate players. Looking at the past three years is the most common way to do this, and it makes sense why: three years is a reasonable time frame to increase your sample size while not reaching back so far that you're evaluating an essentially different player.

The advice for looking at previous years of player data, however, usually comes with a caveat. "Weight them", they'll say. And then you'll hear some semi-arbitrary numbers such as "20%, 30%, 50%", or something in that range. Well, buckle up, because we're about to get a little less arbitrary.

Some limitations: the point of this study isn't to replace projection systems, and we're not trying to project declines or improvements here. We're simply trying to understand how past data tends to translate into future data.

The methodology is pretty simple. We take three years of player data (I'm going to use wRC+, since it's park- and league-adjusted and I'm only trying to measure offensive production), weight the years to get an expected fourth-year wRC+, and then compare that expected wRC+ against the actual wRC+*. The closer the expected is to the actual, the better the weights.

*Note: I am using four-year spans of player data from 2008-2013, limited to players with at least 400 PA in each of four consecutive years. This should help throw out outliers and give more consistent results. Our initial sample size is 244, which is good enough to give meaningful results.

I'll start with the "dumb" case: weight all of the years equally, so that each year counts for 33.3% of our expected outcome.

**Expected vs. Actual wRC+, unweighted**

| Weight1 | Weight2 | Weight3 | Average Inaccuracy |
|---------|---------|---------|--------------------|
| 33.3%   | 33.3%   | 33.3%   | 16.55              |

Okay, so on average we're missing the actual wRC+ by roughly 16.5.
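To make the error metric concrete, here is a minimal sketch of how "average inaccuracy" can be computed: take a weighted average of three prior seasons of wRC+ as the expectation, then average the absolute misses against the actual fourth year. The player lines below are made up for illustration; the article's real sample is the 400-PA population described above.

```python
def expected_wrc_plus(years, weights):
    """Weighted expectation of year-4 wRC+ from three prior seasons.

    years   -- (y1, y2, y3), oldest season first
    weights -- three weights that sum to 1
    """
    return sum(w * y for w, y in zip(weights, years))

def average_inaccuracy(players, weights):
    """Mean absolute miss across (y1, y2, y3, actual_y4) tuples."""
    misses = [abs(expected_wrc_plus(p[:3], weights) - p[3]) for p in players]
    return sum(misses) / len(misses)

# Hypothetical four-year wRC+ lines, oldest season first.
players = [
    (110, 118, 125, 121),
    (95, 102, 88, 97),
    (130, 124, 119, 128),
]

unweighted = (1 / 3, 1 / 3, 1 / 3)
print(round(average_inaccuracy(players, unweighted), 2))  # mean absolute miss
```

Any weighting scheme, including the 20/30/50 schemes tested below, plugs into `average_inaccuracy` the same way.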
Since wRC+ is scaled so that 100 is league average, that means we're averaging roughly 16.5% inaccuracy when extrapolating the past into the future with no weights. Now let's try being a little smarter about it and test some different weights.

**Expected vs. Actual wRC+, various weights**

| Weight1 | Weight2 | Weight3 | Average Inaccuracy |
|---------|---------|---------|--------------------|
| 20%     | 30%     | 50%     | 16.73              |
| 25%     | 30%     | 45%     | 16.64              |
| 30%     | 30%     | 40%     | 16.58              |
| 15%     | 40%     | 45%     | 16.62              |
| 0%      | 50%     | 50%     | 16.94              |
| 0%      | 0%      | 100%    | 20.15              |

Huh! It seems that no matter what we do, "intelligently weighting" each year never actually increases our accuracy. If you're just trying to extrapolate several past years of wRC+ data to predict a fourth year of wRC+, your best bet is a plain unweighted average. Now, the differences are small (for example, our weights of [.3, .3, .4] were only .03 worse in accuracy than the unweighted total, which is statistically insignificant), but the point remains: weighting data from past years simply does not increase your accuracy. Pretty counter-intuitive.

Let's dive a little deeper now: is there any situation in which weighting a player's past does help? We'll test this by limiting the sample by age. For example, are players younger than 30 better served by weighting their most recent years heavily? This would make sense, since younger players are the most likely to experience a true-talent change. (Sample size: 106)

**Expected vs. Actual wRC+, players younger than 30**

| Weight1 | Weight2 | Weight3 | Average Inaccuracy |
|---------|---------|---------|--------------------|
| 33.3%   | 33.3%   | 33.3%   | 16.17              |
| 20%     | 30%     | 50%     | 16.37              |
| 25%     | 30%     | 45%     | 16.29              |
| 30%     | 30%     | 40%     | 16.26              |
| 15%     | 40%     | 45%     | 16.20              |
| 0%      | 50%     | 50%     | 16.50              |
| 0%      | 0%      | 100%    | 20.16              |

Okay, so that didn't work either. Even with young players, using unweighted totals is the best way to go. What about old players? Surely with aging players the recent seasons would best represent a player's decline. Let's find out (sample size: 63).
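Rather than testing a handful of hand-picked weight triples, the comparison above can be automated with a brute-force grid search over all weights that sum to 100%. This is a sketch of that idea, not the author's actual procedure; the player lines are again hypothetical, and `step` (the 5% grid spacing) is an assumption.

```python
from itertools import product

def average_inaccuracy(players, weights):
    """Mean absolute miss for (y1, y2, y3, actual_y4) tuples."""
    misses = [abs(sum(w * y for w, y in zip(weights, p[:3])) - p[3])
              for p in players]
    return sum(misses) / len(misses)

# Hypothetical player lines; a real run would use the 400-PA sample.
players = [
    (110, 118, 125, 121),
    (95, 102, 88, 97),
    (130, 124, 119, 128),
]

# Enumerate every weight triple in 5% steps that sums to 100%,
# and keep the one with the lowest average inaccuracy.
step = 5
candidates = ((w1 / 100, w2 / 100, w3 / 100)
              for w1, w2, w3 in product(range(0, 101, step), repeat=3)
              if w1 + w2 + w3 == 100)
best = min(candidates, key=lambda w: average_inaccuracy(players, w))
print(best, round(average_inaccuracy(players, best), 2))
```

Running the same search separately on an under-30 subset and an over-32 subset is how the age splits below could be reproduced.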
**Expected vs. Actual wRC+, players older than 32**

| Weight1 | Weight2 | Weight3 | Average Inaccuracy |
|---------|---------|---------|--------------------|
| 33.3%   | 33.3%   | 33.3%   | 16.52              |
| 20%     | 30%     | 50%     | 16.18              |
| 25%     | 30%     | 45%     | 16.27              |
| 30%     | 30%     | 40%     | 16.37              |
| 15%     | 40%     | 45%     | 16.00              |
| 0%      | 50%     | 50%     | 15.77              |
| 0%      | 55%     | 45%     | 15.84              |
| 0%      | 45%     | 55%     | 15.77              |
| 0%      | 0%      | 100%    | 18.46              |

Hey, we found something! With aging players you should weight a player's last two seasons equally, and you shouldn't even worry about three seasons ago. Again, notice that the difference is small: you'll be roughly 0.8 wRC+ points more accurate this way than with unweighted totals. And as with any stat, you should always think about why you're reaching the conclusion you're reaching. You might want to weight some players more aggressively than others, especially if they're older, since older players are the group most likely to experience a real talent change.

In the end, it just really doesn't matter that much. You should, however, generally use unweighted averages, since differences in wRC+ are pretty much always the result of random fluctuation and very rarely the result of actual talent change. That's what the data shows. So next time you hear someone say "weight their past three years 3/4/5" (or similar), you can snicker a little. Because you know better.