Can One Month of Statcast Data Be Used To Evaluate Hitters?
by Ryan Brock
September 16, 2020

You have probably done this if you know about Statcast, "xStats," and Baseball Savant. You pull up the xStats list, sort by under- or over-performers, and use it to draw broad, sweeping conclusions about your fantasy teams. Which of your fantasy players are poised for a quick resurgence, and which of your opponents' players are prime trade targets? Which guys should you be selling high on before the bottom drops out?

But in the same way that you can't really sort the FanGraphs leaderboards by ERA minus FIP and just magically find pitching diamonds in the rough (home run rates complicate things...), this is maybe not the best way to apply our vast wealth of fancy Statcast-based metrics. I've personally found early-season Statcast data difficult to trust, so I decided to dive in and see what exactly we can learn from one month of xStats. It turns out there may be something useful here: the method I arrived at after this work would have advised you to buy in on José Ramírez after his rough start to 2019! But we'll get to that.

I will dive into the gritty details below, but first, to quickly outline, here are the major questions I'm setting out to answer (and what I ended up finding):

- Do one-month xStat under-/over-performers play better/worse rest-of-season? (Yes!)
- Do one-month xStats have more predictive power than Steamer projections? (No...)
- Do one-month xStats help to evaluate players Steamer projected inaccurately? (Not really.)

The general takeaway is that players who underperform their xStats in the first month seem to be good targets to acquire.

To collect the data for this study, I made use of Alex Chamberlain's version of xStats, available here. Alex's version of these leaderboards focuses on xwOBAcon, which is xwOBA on contact; in other words, it ignores K/BB rates. This is fine by me for the purpose of this study.
I am trying to understand the batted-ball changes that players are showing, and at this point in the season, changes in K%/BB% are both prone to noise and perhaps more easily analyzed via "SABR 1.0" approaches anyway. I pulled the set of players that posted at least 300 plate appearances in 2019 and at least 75 PA before May 1 of that year, a total of 208 hitters.

As a brief aside, I made one more tweak to the data used in this work. One common complaint about xStats is that they don't account for everything and can therefore indicate xwOBA-wOBA differences that are inherent to the player. For example, foot speed, even when accounted for, appears to be a factor. Some of the patron saints of "look, he's about to break out based on xStats" are guys with Spd scores below 2.0: Kendrys Morales, Justin Smoak, Miguel Cabrera. Conversely, check the other end of the list and you'll find that Mallex Smith, Victor Robles, and Adalberto Mondesi are always able to outrun their xStats.

Lacking the time to develop my own xStats framework to fully address these issues, the tweak I made is simple: Alex's data goes back to 2017, so I downloaded the full dataset and used each player's career xwOBAcon-wOBAcon difference as a "correction factor" for his xwOBAcon. All xwOBAcons referenced from here on are actually "corrected" xwOBAcons.

No. 1: One-Month xStats vs. Rest-of-Season

A total of 37 players underperformed their xwOBAcon by at least .050 in April of 2019 (for reference, the average wOBAcon for this group of players is about .390). Of those 37 players, 30 performed better (+.066 wOBAcon on average) rest-of-season, indicating that this group mostly overcame its bad-luck April. In fact, on average, these 37 players still slightly beat their Steamer-projected wOBAcons for the season!
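To make the two steps above concrete, here is a minimal sketch of the correction-factor adjustment and the ±.050 classification. The function names and all of the numbers are invented for illustration; this is not the actual code or data behind the study.

```python
# Hypothetical sketch of the "correction factor" and the .050 screen.
# All names and numbers here are made up, not real Statcast data.

def corrected_xwobacon(april_xwobacon, career_xwobacon, career_wobacon):
    """Shift a player's April xwOBAcon by his career xwOBAcon-wOBAcon gap,
    so hitters who chronically beat (or trail) their xStats are re-centered."""
    correction = career_xwobacon - career_wobacon
    return april_xwobacon - correction

def classify(april_wobacon, corrected_x, threshold=0.050):
    """Flag hitters whose April results diverge from expectation by >= .050."""
    diff = corrected_x - april_wobacon
    if diff >= threshold:
        return "underperformer"   # results worse than contact quality
    if diff <= -threshold:
        return "overperformer"    # results better than contact quality
    return "neutral"

# Toy example: a speedster whose career wOBAcon runs .030 above his xwOBAcon
x = corrected_xwobacon(april_xwobacon=0.380,
                       career_xwobacon=0.360, career_wobacon=0.390)
print(round(x, 3))          # 0.41: corrected upward by the career gap
print(classify(0.350, x))   # underperformer: corrected x beats results by .060
```

The sign convention matters: a player who always outruns his xStats has a negative career xwOBAcon-minus-wOBAcon gap, so subtracting it raises his corrected expectation, which is the intended effect.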
On the other side of the spectrum, 19 players overperformed their xwOBAcon by at least .050, and 18 of those 19 performed worse (-.081 wOBAcon on average) rest-of-season! It's not that these players did poorly, either: on average they beat their projections, including some big breakouts like Omar Narváez and Austin Meadows. But xStats correctly identified that these players ran way hot in April and got a little lucky.

This is a good baseline to work from. As we might hope, xStats can identify players who are getting unfair results on their batted balls during the first month of the season, and that behavior tends to regress over the rest of the season. The nuance is that "over/under-performing xStats" does not actually equate to players being good or bad.

No. 2: One-Month xStats vs. Steamer

Here's the bummer for xStats: even if you just looked at the preseason Steamer projections, they would still tell you more about rest-of-season performance than one month of xStats data would. From May onward, the preseason Steamer projections averaged .037 points of wOBAcon error across all players. Meanwhile, if you pro-rated the April wOBAcons forward, you'd get .063 points of error, and if you did the same with the April xwOBAcons, you'd get .053 points of error. So xwOBAcon is closer than pro-rating the raw April stats, but still far inferior to an actual projection system, as one might expect.

If you isolate just the xStat underperformers, the results are largely the same; however, something interesting jumps out for the xStat overperformers. For that particular group of players, pro-rating their xwOBAcons led to just .041 points of error, not far from Steamer's .036 points. It's not entirely clear to me why; perhaps being an overperformer simply increases the likelihood of having stats that regress toward your xStats?
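The error comparison above boils down to a mean absolute error over players, computed three times with different May-onward predictions (Steamer, pro-rated April wOBAcon, pro-rated April xwOBAcon). A minimal sketch with invented numbers:

```python
# Sketch of the wOBAcon error comparison; every number here is fabricated
# for illustration, not drawn from the actual 2019 player pool.

# Each tuple: (steamer_proj, april_wobacon, april_xwobacon, ros_wobacon)
players = [
    (0.390, 0.450, 0.410, 0.400),
    (0.360, 0.300, 0.340, 0.355),
    (0.380, 0.420, 0.395, 0.370),
]

def mean_abs_error(predictions, actuals):
    """Average absolute wOBAcon error across players."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(predictions)

ros = [p[3] for p in players]
print(mean_abs_error([p[0] for p in players], ros))  # Steamer error
print(mean_abs_error([p[1] for p in players], ros))  # pro-rated April wOBAcon
print(mean_abs_error([p[2] for p in players], ros))  # pro-rated April xwOBAcon
```

With the real 2019 sample, the article's three figures (.037, .063, and .053) come out of exactly this kind of calculation, one call per prediction source.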
This could be something driven mostly by sample size, as opposed to underperformers, whose reasons for underperforming might be more complicated and harder to overcome, like a loss of foot speed or increases in defensive shifting.

One could reasonably conclude from these findings that the next best option would be a projection system that incorporates expected outcomes. Maybe pro-rating one month of stats is not a good idea (it's not), but incorporating that new information into an existing weighted average of past performances could yield some useful tweaks! I'm never clear on which in-season datasets are used to create the "rest-of-season" projections for the publicly available systems, but my best guess is that Derek Carty's THE BAT X is already doing this. I'm very curious to see how that new system performs this year.

No. 3: One-Month xStats vs. Inaccurately-Projected Players

The final thing we can do is turn this around and look at the groups of players that the projection system "missed" in 2019. It turns out that the big breakouts and busts actually show some pretty clear signs in April of how the rest of their season will go, as many of the Steamer overperformers in April are the Steamer overperformers rest-of-season (and the same goes for underperformers). However, there are also a ton of players who have hot/cold Aprils and finish the season either as projected or in the opposite cold/hot direction. I've sliced and diced this every which way, and it's pretty much noise. The hot April players by xStats are pretty much the hot April players by regular stats, and it can be difficult to tease out who is having a breakout vs. who was just hot for a month without deeper analysis.

Buys/Sells Based on April 2019 xStats

Even though your best bet, based on the above analysis, is to rely on the projections, it could still be worth pairing them with xStats to understand which players are having legitimate breakouts or busts vs.
which players are running into some large wOBA vs. xwOBA disparities. You can find the full list of players used for this study in the spreadsheet here if you want to follow along at home.

First up are the underperformers, where we want to find players with very bad current performance (wOBAcon) but high expected performance. Limiting our list to the 20 largest April-minus-xApril differences, combined with the worst wOBAcon values, here are the top three guys you might have targeted in fantasy trades:

Buy No. 1: José Ramírez (.226 April wOBAcon, .344 April xwOBAcon, .385 RoS wOBAcon)
Would have been a great buy. The RoS wOBAcon is a little above the initial Steamer projection. He went .276/.340/.536 the rest of the way with 21 HR and 15 SB.

Buy No. 2: Marwin Gonzalez (.242 April wOBAcon, .343 April xwOBAcon, .393 RoS wOBAcon)
Another pretty good one; I seem to recall him sitting on waiver wires after hitting .167 through the first month. Again, the RoS wOBAcon is a little above the initial Steamer projection. He was good for .285/.340/.450 with 13 homers from May onward.

Buy No. 3: Jesús Aguilar (.254 April wOBAcon, .350 April xwOBAcon, .368 RoS wOBAcon)
Steamer had him down for a .409 wOBAcon, so this was a rough one if you bought in on Aguilar. Sure, he at least rebounded to something much better than the dreadful April, but the nine homers and .261 AVG from May onward probably did not get it done for you.

Honorable "Buy" Mentions: Jurickson Profar, Jackie Bradley Jr., Niko Goodrum, Ramón Laureano.
Dishonorable "Buy" Mentions: Yonder Alonso, Ryan O'Hearn.

For the overperformers, this is all about the guys you would have tried to sell high on. Ideally, we wouldn't sell high on players with high xwOBAcon numbers, so this is more about guys with low xwOBAcons but fraudulently high wOBAcons. Otherwise we would "sell" Austin Meadows because his .499 April xwOBAcon was 60 points lower than his actual performance (yikes!).
Limiting our list to the 20 largest April-minus-xApril differences, combined with the lowest xwOBAcon values, here are the top three guys you might have sold in fantasy trades in 2019:

Sell No. 1: Jarrod Dyson (.408 April wOBAcon, .322 April xwOBAcon, .269 RoS wOBAcon)
By the end of April he had hit over .300 with three homers and three stolen bases, and you might have been daring to dream on a 30+ SB season paired with suddenly improved AVG and power from Dyson. He did end up nabbing those 30 steals, but everything else was pretty putrid. If someone was buying, this was a good sell, assuming you had other stolen-base options.

Sell No. 2: Robinson Chirinos (.435 April wOBAcon, .337 April xwOBAcon, .407 RoS wOBAcon)
This would have been a bad sell. By the end of the season, his xStats and regular stats actually converged, and he wOBA'd .047 points higher than Steamer projected when all was said and done. Seventeen homers is nothing to sneeze at.

Sell No. 3: Nick Ahmed (.347 April wOBAcon, .298 April xwOBAcon, .348 RoS wOBAcon)
Another bad sell: he was remarkably consistent between April and the rest of the season, at about .020 points higher than what Steamer originally projected. He was probably only rostered in deep fantasy leagues, but he finished pretty decently with 19 homers and eight stolen bases.

Honorable "Sell" Mentions: Dan Vogelbach, Rhys Hoskins, Joey Votto.
Dishonorable "Sell" Mentions: Jorge Soler, Max Muncy, Wil Myers.

All in all, the "buy" recommendations look a lot more credible than the "sells." I might guess that this is because teams are also using this data very intelligently: players who perform poorly, both in real stats and in xStats, are not given as much rope as they might have been in the pre-data-driven era of baseball.
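For readers who want to reproduce the screens above on their own data, the two-stage filter (take the largest wOBAcon-vs-xwOBAcon gaps, then rank by raw results) can be sketched as follows. The player names and stat lines are invented placeholders, and the cutoffs are shrunk to fit the toy list; the article itself used the top 20 gaps.

```python
# Hypothetical "buy" screen: among the biggest x-over-actual gaps, surface
# the hitters with the worst raw results. All names/stats are invented.

players = [
    # (name, april_wobacon, corrected_april_xwobacon)
    ("Hitter A", 0.226, 0.344),
    ("Hitter B", 0.242, 0.343),
    ("Hitter C", 0.310, 0.330),
    ("Hitter D", 0.254, 0.350),
]

def buy_targets(players, n_gap=3, n_final=2):
    """Stage 1: keep the n_gap largest xwOBAcon-minus-wOBAcon gaps.
    Stage 2: of those, return the n_final worst raw wOBAcons."""
    by_gap = sorted(players, key=lambda p: p[2] - p[1], reverse=True)[:n_gap]
    return sorted(by_gap, key=lambda p: p[1])[:n_final]

for name, woba, xwoba in buy_targets(players):
    print(f"{name}: {woba:.3f} wOBAcon vs {xwoba:.3f} xwOBAcon")
```

The "sell" screen is the mirror image: sort by the largest actual-over-x gaps in stage 1, then rank by the lowest xwOBAcon in stage 2, which is what keeps a legitimately scorching hitter like the Meadows example off the sell list.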