Weighting Past Results: Hitters

November 15, 2013

We all know by now that we should look at more than one year of player data when we evaluate players. Looking at the past three years is the most common way to do this, and it makes sense why: three years is a reasonable time frame to try and increase your sample size while not reaching back so far that you’re evaluating an essentially different player.

The advice for looking at previous years of player data, however, usually comes with a caveat. “Weight them,” they’ll say. And then you’ll hear some semi-arbitrary numbers such as “20%, 30%, 50%”, or something in that range. Well, buckle up, because we’re about to get a little less arbitrary.

Some limitations: The point of this study isn’t to replace projection systems—we’re not trying to project declines/improvements here. We’re simply trying to understand how past data tends to translate into future data.

The methodology is pretty simple. We’re going to take three years of player data (I’m going to use wRC+ since it’s league-adjusted etc., and I’m only trying to measure offensive production), and then weight the years so that we can get an expected fourth-year wRC+. We’re then going to compare our expected wRC+ against the actual wRC+*. The closer the expected value is to the actual one, the better the weights.

*Note: I am using four-year spans of player data from 2008-2013, and limiting the sample to players who had at least 400 PA in four consecutive years. This should help throw out outliers and give more consistent results. Our initial sample size is 244, which is good enough to give meaningful results.
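For anyone who wants to follow along at home, here’s a minimal sketch of that calculation in Python. It assumes that “average inaccuracy” means the mean absolute difference between expected and actual wRC+, and that weights are listed oldest season first. The function names and the two sample players are made up for illustration; they are not the study’s data.

```python
from typing import Sequence, Tuple

# One record: (wRC+ in years 1-3, oldest first; actual wRC+ in year 4)
Player = Tuple[Sequence[float], float]


def expected_wrc(past: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of past wRC+ values (weights are normalized to sum to 1)."""
    if len(past) != len(weights):
        raise ValueError("need exactly one weight per past season")
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, past)) / total


def average_inaccuracy(players: Sequence[Player], weights: Sequence[float]) -> float:
    """Mean absolute gap between expected and actual year-4 wRC+ (assumed metric)."""
    errors = [abs(expected_wrc(past, weights) - actual) for past, actual in players]
    return sum(errors) / len(errors)


# Hypothetical players, for illustration only.
players = [
    ((100.0, 110.0, 120.0), 115.0),
    ((130.0, 120.0, 110.0), 100.0),
]

print(average_inaccuracy(players, (1, 1, 1)))        # equal weights
print(average_inaccuracy(players, (0.2, 0.3, 0.5)))  # 20% / 30% / 50%
```

Swapping in a different weight tuple is all it takes to reproduce a row of a table like the ones in this article, given real player data.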

I’ll start with the “dumb” case. Let’s just weight all of the years equally, so that each year counts for 33.3% of our expected outcome.

Expected vs. Actual wRC+, unweighted

Weight1	Weight2	Weight3	Average Inaccuracy
33.3%	33.3%	33.3%	16.55

Okay, so on average we’re missing the actual wRC+ by roughly 16.5 points. Since a point of wRC+ is one percentage point relative to league average, that’s about 16.5 percentage points of error when extrapolating the past into the future with no weights. Now let’s try being a little smarter about it and try out some different weights.

Expected vs. Actual wRC+, various weights

Weight1	Weight2	Weight3	Average Inaccuracy
20%	30%	50%	16.73
25%	30%	45%	16.64
30%	30%	40%	16.58
15%	40%	45%	16.62
0%	50%	50%	16.94
0%	0%	100%	20.15

Huh! It seems that no matter what we do, “intelligently weighting” each year never actually increases our accuracy. If you’re just generally trying to extrapolate several past years of wRC+ data to predict a fourth year of wRC+, your best bet is to take a simple unweighted average of the past wRC+ data. Now, the differences are small (for example, our weights of [.3, .3, .4] were only 0.03 less accurate than the unweighted average, which is statistically insignificant), but the point remains: weighting data from past years simply does not increase your accuracy. Pretty counter-intuitive.
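Rather than picking weights by hand, you could also sweep a whole grid of them. The sketch below is one hypothetical way to do that, again assuming mean absolute error as the yardstick and weights listed oldest season first. The toy players are constructed so that 25% / 25% / 50% fits them perfectly; they are not real data. One caveat: choosing the “best” weights on the same sample you score them on will flatter those weights, so treat any winner as a hint rather than a finding.

```python
from itertools import product


def mean_abs_error(players, weights):
    """Mean absolute gap between the weighted expectation and actual year-4 wRC+."""
    total = 0.0
    for past, actual in players:
        expected = sum(w * x for w, x in zip(weights, past))
        total += abs(expected - actual)
    return total / len(players)


def weight_grid(step_pct=5):
    """Yield every (w1, w2, w3) built from whole-percent steps that sums to 100%."""
    levels = range(0, 101, step_pct)
    for w1, w2 in product(levels, repeat=2):
        w3 = 100 - w1 - w2
        if w3 >= 0:
            yield (w1 / 100, w2 / 100, w3 / 100)


def best_weights(players, step_pct=5):
    """Return the grid point with the lowest mean absolute error."""
    return min(weight_grid(step_pct), key=lambda w: mean_abs_error(players, w))


# Toy data (hypothetical): both players are matched exactly by 25/25/50.
players = [
    ((100.0, 100.0, 120.0), 110.0),
    ((80.0, 120.0, 100.0), 100.0),
]

print(best_weights(players))
```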

Let’s dive a little deeper now: is there any situation in which weighting a player’s past does help? We’ll test this by limiting our ages. For example: are players who are younger than 30 better served by weighting their most recent years more heavily? This would make sense, since younger players are most likely to experience a true-talent change. (Sample size: 106)

Expected vs. Actual wRC+, players younger than 30

Weight1	Weight2	Weight3	Average Inaccuracy
33.3%	33.3%	33.3%	16.17
20%	30%	50%	16.37
25%	30%	45%	16.29
30%	30%	40%	16.26
15%	40%	45%	16.20
0%	50%	50%	16.50
0%	0%	100%	20.16

OK, so that didn’t work either. Even with young players, using unweighted totals is the best way to go. What about old players? Surely with aging players the most recent years would best represent a player’s decline. Let’s find out (Sample size: 63).

Expected vs. Actual wRC+, players older than 32

Weight1	Weight2	Weight3	Average Inaccuracy
33.3%	33.3%	33.3%	16.52
20%	30%	50%	16.18
25%	30%	45%	16.27
30%	30%	40%	16.37
15%	40%	45%	16.00
0%	50%	50%	15.77
0%	55%	45%	15.84
0%	45%	55%	15.77
0%	0%	100%	18.46

Hey, we found something! With aging players you should weight a player’s last two seasons equally, and you should not even worry about three seasons ago! Again, notice that the difference is small (you’ll be about 0.75 points of wRC+ closer by doing this than by using unweighted totals). And as with any stat, you should always think about why you’re reaching the conclusion you’re reaching. You might want to weight some players more aggressively than others, especially if they’re older.

In the end, it just really doesn’t matter that much. You should, however, generally use equal weights, since for established regulars like the ones in this sample, differences in wRC+ are pretty much always the result of random fluctuation and very rarely the result of actual talent change. That’s what the data shows. So next time you hear someone say “weight their past three years 3/4/5” (or similar), you can snicker a little. Because you know better.
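One way to see why equal weights are so hard to beat is a back-of-the-envelope noise model. This is a simplified sketch under stated assumptions, not something fitted to the data above: suppose a hitter’s true talent $T$ stays constant over the three years, and each season’s wRC+ is $X_i = T + \varepsilon_i$, where the $\varepsilon_i$ are independent noise terms with the same variance $\sigma^2$.

```latex
\begin{align*}
\hat{X}_4 &= w_1 X_1 + w_2 X_2 + w_3 X_3, \qquad w_1 + w_2 + w_3 = 1 \\
\mathbb{E}\bigl[\hat{X}_4\bigr] &= T \\
\operatorname{Var}\bigl(\hat{X}_4\bigr) &= \sigma^2 \left( w_1^2 + w_2^2 + w_3^2 \right)
\end{align*}
```

Every set of weights is right on average, so the only thing the weights change is how much noise leaks through, and that depends on the sum of squared weights. Equal weights give $3 \cdot (1/3)^2 \approx 0.333$, the smallest value possible; 20/30/50 gives $0.04 + 0.09 + 0.25 = 0.38$; 0/50/50 gives $0.50$; and 0/0/100 gives $1$. Under these assumptions, piling weight onto one season can only add noise, which is broadly what the all-ages table shows. Uneven weights can only pay off when true talent actually moves, which may be what is happening with the older players.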

Brandon Reppert is a computer "scientist" who finds talking about himself in the third-person peculiar.

12 Comments

J. Cross

11 years ago

I replicated this with seasons post 1960 to get a larger dataset (4020 predicted seasons)

unweighted by PA:

system, RMSE
even weights, 19.75
5:3:2, 19.72
1:1:0, 20.38
5:4:3, 19.65

weighted by PA:

system, RMSE
even weights, 19.73
5:3:2, 19.69
1:1:0, 20.43
5:4:3, 19.63

so, in this dataset 5:3:2 has a slight edge over even weights and 5:4:3 does a bit better than that.

Brandon

11 years ago

Reply to J. Cross

Awesome! The larger sample size is definitely a good thing.

The point remains, though, that weighting years doesn’t dramatically improve your predictions. An increase of (roughly) .1% accuracy is pretty darn small, even though it’s significant with that sample size. The benefits of weighting seem to be mostly cancelled out by the fact that you’re increasing the noise of single-season data.

J. Cross

11 years ago

Reply to Brandon

Agreed. Using absolute error instead of RMSE, the even weighting edges out 5/3/2 although 5/4/3 is still a touch ahead.

Garbanzo

11 years ago

Only using players with 400+ PA in all 4 years basically guarantees your pool is all at-least-decent players playing at-least-decently, and among that group of players, most of the fluctuation is evidently just noise. You’re basically cutting off all the players who imploded in year 4 and were kept from 400 PA because they sucked (or got seriously hurt, which is less interesting), and those are the guys it would be most interesting to test on.

Brandon

11 years ago

Reply to Garbanzo

Definitely, and this is something I knew when I was preparing the numbers. The numbers presented above should be considered more strongly when evaluating players that are similar to the players I sampled (players with roughly 400 PA several years in a row). I should have stated that more explicitly in my sampling note.

It would be interesting to do a follow-up study on ONLY players that had large fluctuations in production/playing time, to see the extent of predictive value to recent fluctuations for those types of players. I’ve got some other pieces I want to get to first, though.

samyoung

11 years ago

Good stuff! I’m curious also about how much to weight the current season versus the start-of-season prior. (Obviously for “mid-season” use.)

Brandon Reppert

11 years ago

Reply to samyoung

This would be interesting indeed. I imagine the answer is to weight the current season’s plate appearances very, very slightly more than past seasons.

A better way to diagnose current seasons though is probably to just look at peripheral stats that stabilize much faster than wRC+ to determine true talent change. There’s just so much noise in wRC+ due to it being affected by the results of batted balls that it takes a pretty big sample size to start becoming reliable.

samyoung

11 years ago

Dude, this is really cool! Good work, Brandon.

Jonathan Judge

11 years ago

Nice job.
