Can Past Calendar Year Stats be Trusted?

April 1, 2015

To me, the first few weeks of baseball each year are small sample size season. It seems that every article is either a) drawing wildly irresponsible conclusions based on a few dozen plate appearances or innings (either with or without the routine “This is a small sample, but…” disclaimer), or b) showing why those claims are wildly irresponsible and not very useful. This is how we get articles comparing Charlie Blackmon and Mike Trout. It gets a little repetitive, but writing this in March, when the closest thing to real baseball I can experience is play-by-play tweeting of a spring training game, it honestly sounds lovely.

Fairly often in those early articles, I see analyses that use past calendar year stats, which combine the first x games of the current season with the last 162-x games of the previous season. The idea is to rely on more than a few games of evidence, but still incorporate hot first months in some way. I’m always conflicted about how much trust to put in those stats and the resulting conclusions.
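For concreteness, here is a minimal sketch of how a past calendar year line could be put together, assuming per-game logs are available as lists of dicts. The function names and the PA/BB/K keys are hypothetical and not tied to any particular data source.

```python
def past_calendar_year(prev_season, curr_season, season_length=162):
    """Return the most recent `season_length` games: the first x games of the
    current season plus the last (season_length - x) games of the previous one."""
    x = len(curr_season)
    if x >= season_length:
        # The current season alone already fills the window.
        return list(curr_season[-season_length:])
    carry = season_length - x  # games borrowed from the previous season
    return list(prev_season[-carry:]) + list(curr_season)


def rate(window, stat):
    """Rate of `stat` per plate appearance over the window (e.g. BB% or K%)."""
    pa = sum(game["PA"] for game in window)
    return sum(game[stat] for game in window) / pa if pa else float("nan")
```

Note that the window treats the last game of one season and the first game of the next as adjacent, which is exactly the assumption being questioned here.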

On the one hand, they have a reasonable sample size, and aren’t drawing any crazy conclusions off a few good games. Including a large portion of the prior season limits the effect a first month can have on the results, which is probably a good thing. On the other hand, it seems like a lot of changes could be made in the offseason, and those changes could have major effects on a player’s performance basically immediately. If that were the case, stat lines that treat game 1 of 2014 as following game 162 of 2013, in the same way that game 162 of 2013 followed game 161 of 2013, would not present an accurate picture of skill.

Consider the case of Brandon McCarthy, who made a lot of changes to his offseason training regimen between the 2013 and 2014 seasons (detailed in this Eno Sarris article). He went on to record his healthiest season to date in 2014, hitting 200 innings exactly with the second-best WAR (3.0) and best xFIP (2.87) of his career. Combining his results from September/October 2013 (42.0 IP, 7.6% K-BB%, 3.74 xFIP) and March/April 2014 (37.1 IP, 15.2% K-BB%, 2.89 xFIP) would not give an accurate sense of McCarthy going into 2014. But is he the exception, or the rule?

To test this, I looked at the correlations between players’ stats in the first and second halves of 2014, and compared that to the correlation between their stats in the second half of 2013 and the first half of 2014. I expect the six-month discontinuity in the second case to make the correlations weaker, but by how much? If it’s a lot, that’s a sign that analysis relying on stats from the last calendar year probably shouldn’t be trusted; if it’s not, then incorporating the last few months of the previous season to boost sample size is more likely to be a good idea. I also looked at the correlations between stats in 2013 and 2014, to provide a sort of baseline for how predictable each statistic is from season-to-season.
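The comparison described above boils down to computing a linear R² between the same players’ numbers in two time periods. A minimal sketch in plain Python, assuming the two lists are already aligned player by player (the function name is my own, not from any library):

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists of rates."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two aligned lists with at least two players")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a stat with no variation has no defined correlation")
    return (sxy * sxy) / (sxx * syy)
```

Running this once on (2nd half 2013, 1st half 2014) and once on (1st half 2014, 2nd half 2014) for each stat gives the two numbers being compared.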

I tried to choose stats that reflect primarily the skill of each player, but that they can control to some extent. Hopefully these are stats that won’t change due to a player switching teams, but might if he changes his approach. I settled on BB%, K%, ISO, and BsR for batters, and BB%, K%, GB%, and HR% for pitchers. Those look reasonable to me, but I’d welcome any suggestions.

I set a minimum of 400 PAs or 160 IP for the full-year samples, and 200 PAs or 80 IP for the half-year samples, and looked at all the players that showed up in both of the time frames being compared. I’m going to look at position players first, then starters. In the following table, the value in each cell is the linear R² of the stats in the two time periods, except in the last row, which shows the number of players in the sample. I bolded the stronger of the two half vs. half correlations.
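The sample selection described above (a playing-time minimum in each period, keeping only players who qualify in both) might look something like this sketch. The dict layout, the `paired_sample` name, and the "PA" key are assumptions for illustration; for pitchers the same idea would apply with an innings threshold.

```python
def paired_sample(period_a, period_b, stat, min_pa=200):
    """Keep players who meet the playing-time minimum in BOTH periods.

    period_a / period_b map player name -> {"PA": int, stat: float}.
    Returns (names, values_a, values_b), aligned player by player.
    """
    names = sorted(
        player
        for player in period_a
        if player in period_b
        and period_a[player]["PA"] >= min_pa
        and period_b[player]["PA"] >= min_pa
    )
    values_a = [period_a[player][stat] for player in names]
    values_b = [period_b[player][stat] for player in names]
    return names, values_a, values_b
```

Because a player has to clear the bar in both periods, the sample size differs a little from one comparison to the next, which is why n varies across the columns.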

	2nd Half ’13 v. 1st Half ’14	1st Half ’14 v. 2nd Half ’14	Full ’13 v. Full ’14
BB%	.552	.481	.608
K%	.672	.661	.771
ISO	.572	.519	.654
BsR	.565	.849	.605
n	140	138	142

So these are some seriously unintuitive results, to the point that I went back and triple-checked the data, but they’re accurate. BB%, K%, and ISO all tracked better from player to player from the second half of 2013 to the first half of 2014 than they did from the first half of 2014 to the second half of 2014. Of the four selected stats, only BsR had a stronger correlation inside 2014, but it was odd in its own way, as it was also the only stat for which the full-year correlation wasn’t the strongest.

What could explain this? First, it’s possible that this is just randomness, and if we looked at this over a larger sample, the in-year correlations would tend to be stronger. But even if that’s the case, the fact that randomness can make the cross-year correlations stronger (as opposed to just making the lead of the in-year correlations larger) suggests that the difference between the two is relatively small. One possible explanation is survivor bias – perhaps players that get a lot worse between the first and second halves are still likely to see playing time until the end of the season, while players who get substantially worse between seasons might be benched in the first month or two and not get to the 200 PA/80 IP minimum. There’s no doubt that there is survivor bias in this sample, but I’m not convinced by that explanation. Settling on randomness always feels half-hearted, but I really have no idea what else it could be. If anyone has any thoughts, post them in the comments!
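One rough way to get a feel for how much randomness could be in play is to put an approximate interval around a single R² using the Fisher z-transformation. This is only a sketch under textbook assumptions (roughly bivariate-normal data, a positive correlation), and it is not a formal test of the difference between the two columns, since those share the first half of 2014 and are not independent. The function name is my own.

```python
from math import atanh, sqrt, tanh


def r_squared_interval(r2, n, z_crit=1.96):
    """Approximate 95% interval for R^2 via the Fisher z-transformation.

    Assumes the underlying correlation r is positive, so r = sqrt(r2).
    """
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    if n <= 3:
        raise ValueError("need more than 3 paired observations")
    z = atanh(sqrt(r2))
    half_width = z_crit / sqrt(n - 3)
    low = max(tanh(z - half_width), 0.0)  # clamp so squaring stays monotone
    high = tanh(z + half_width)
    return low * low, high * high
```

For the cross-year BB% figure (.552 with n = 140), `r_squared_interval(0.552, 140)` comes out to roughly 0.43 to 0.65, a range wide enough to include the in-year .481, which fits the idea that the gap between the two is small relative to the noise.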

The table for the pitchers is set up in the same way.

	2nd Half ’13 v. 1st Half ’14	1st Half ’14 v. 2nd Half ’14	Full ’13 v. Full ’14
BB%	.533	.663	.738
K%	.489	.844	.723
GB%	.742	.799	.779
HR%	.243	.213	.357
n	38	45	47

This looks a lot more like I expected. Three of the four stats are more strongly correlated in season than between seasons, and the exception (HR%) also has the smallest gap between the two correlations, making me inclined to chalk that up to random variation. Interestingly, the gap between the season-to-season correlations and the half-to-half correlations is relatively small (again with the exception of HR%), which fits with my perception of BB%, K%, and GB% as stats that stabilize relatively quickly.

It also doesn’t surprise me that pitchers are less predictable than hitters from the second half of one season to the first half of the next, relative to their in-season predictability. Intuitively, pitchers seem to have a lot more control over their approach, and a much greater ability to shift significantly in the offseason by adding a new pitch, changing a grip, or just getting healthy for the first time in a while. Hitters, on the other hand, seem like they have less ability to change their approach drastically. Even when they can make a change, it’s not necessarily the sort of thing that has to happen in the offseason; if a hitter wants to be more aggressive, he can just decide to be more aggressive, whereas a pitcher looking to throw more strikes is probably going to have to work at that. If true, hitter changes would happen throughout the season and offseason, while pitcher changes would be clustered in the offseason. These correlations don’t provide nearly enough evidence to conclude that’s true, but they do fit these perceptions, which is encouraging.

Overall, this suggests that while going back to last season to get a year’s worth of PAs for a hitter might be a good way to beef up your sample size, it’s probably not as good an idea for a pitcher, and also less necessary. After the first few starts, most starters have thrown enough innings that the interesting metrics – BB%, K%, Zone%, etc. – are more signal than noise, and not a lot is added by going to the previous season. This analysis also suggests that adding old stats may even reduce accuracy, by ignoring the potentially significant shifts made by pitchers in the offseason. So the next time you read about a starter’s performance in his last 30 starts, stretching back to May 2014, beware! Or at least be skeptical.
