When I was playing in the Arizona Fall League in 2012, I led the league in line-outs. At least it seemed like it. It was the fall before I was Rule 5 eligible, and I was hoping to show the Padres I could hit high-level pitching. Unfortunately, a .726 OPS in the desert wasn’t going to have them breaking down my door with a team-friendly extension in hand.
If only there were x-stats! XwOBA is the shiny new eight-figure toy that we hitters can play with after an 0-15 slump. “But I was hitting the ball hard. See, look!” Back in the pre-Statcast dark-ages, a lineout might have had some anecdotal benefit buried in the bottom of a report. Now we have the data.
Judge’s study compared season xwOBA for pitchers with the following season. Tango explored the correlations of small sample sizes of xwOBA to a larger sample.
I looked at this through the lens of a player. When a guy is getting lots of hits but they are bloopers and seeing-eye grounders (remember when ground balls went through the infield?), it’s a soft hot streak. Likewise, a guy might be hitting the loudest .220 in the history of the PCL.
If you’re hitting the ball hard, they’ll start falling. Right? I wanted to test this theory by measuring xwOBA’s predictive capability month-to-month.
(All data from BaseballSavant)
I started by getting data for each month of the regular season in the Statcast Era (2015-) for players with 50 PA in that month. I then did a series of inner joins in R to get what I’ll call “double-months.” A double month is when a player has 50 PA in two consecutive months. So Aaron Judge in April-May 2017 is one player-double-month.
The column labels in the Double Month data frame were: “wOBA,” “xwOBA,” and “Next month wOBA.” I ended up with 3,173 data points. Running these correlations gives us an idea of how your month might predict your next month.
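To make the double-month join concrete, here is a minimal sketch in plain Python rather than R. The tuple layout, the function name double_months, and every number in the sample rows are made up for illustration; the real input was monthly Baseball Savant data. It also assumes months are numbered consecutively within a season.

```python
# Hypothetical monthly rows: (player, year, month, PA, wOBA, xwOBA).
# All values are invented for illustration, not real Statcast data.
MIN_PA = 50

monthly = [
    ("Player A", 2017, 4, 95, 0.410, 0.430),
    ("Player A", 2017, 5, 110, 0.450, 0.440),
    ("Player A", 2017, 6, 40, 0.300, 0.350),  # under 50 PA, dropped
    ("Player B", 2017, 4, 80, 0.290, 0.330),
    ("Player B", 2017, 6, 85, 0.340, 0.320),  # no qualifying May, so no pair
]


def double_months(rows, min_pa=MIN_PA):
    """Pair each qualifying month with the same player's next month."""
    # Keep only player-months that meet the PA minimum, keyed for lookup.
    qualified = {
        (player, year, month): (woba, xwoba)
        for player, year, month, pa, woba, xwoba in rows
        if pa >= min_pa
    }
    pairs = []
    for (player, year, month), (woba, xwoba) in sorted(qualified.items()):
        following = qualified.get((player, year, month + 1))
        if following is not None:  # inner join: both months must qualify
            pairs.append({
                "player": player,
                "year": year,
                "month": month,
                "woba": woba,
                "xwoba": xwoba,
                "next_woba": following[0],
            })
    return pairs


result = double_months(monthly)
```

With these sample rows, only Player A's April-May pair survives, which is the inner-join behavior described above: a month without a qualifying neighbor simply drops out.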
I also wanted to see whether you’d be better off using your entire previous season to predict the next month. For this I got full-season data (min 200 PA) for 2015 and 2016 and did another series of inner joins to get a data frame representing the previous full-season metrics and the current month metric. These columns would look like this:
“Previous season wOBA,” “Previous Season xwOBA,” “Current season month wOBA.”
I got 2,311 of these data points.
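The season-to-month join works the same way, except that one previous season can match several months. A rough sketch, again in plain Python with invented names and numbers; it assumes the 50 PA monthly minimum carries over to this data frame, which is my assumption here and not something spelled out above.

```python
# Hypothetical inputs; every number here is invented for illustration.
# seasons: (player, year, PA, wOBA, xwOBA)
# months:  (player, year, month, PA, wOBA)
MIN_SEASON_PA = 200
MIN_MONTH_PA = 50  # assumed to carry over from the monthly data

seasons = [
    ("Player A", 2015, 600, 0.360, 0.370),
    ("Player B", 2015, 150, 0.310, 0.300),  # under 200 PA, dropped
]
months = [
    ("Player A", 2016, 4, 90, 0.380),
    ("Player A", 2016, 5, 100, 0.340),
    ("Player B", 2016, 4, 70, 0.320),  # no qualifying 2015 season
]


def season_to_month(season_rows, month_rows):
    """Attach each qualifying month to the player's previous full season."""
    previous = {
        (player, year): (woba, xwoba)
        for player, year, pa, woba, xwoba in season_rows
        if pa >= MIN_SEASON_PA
    }
    joined = []
    for player, year, month, pa, woba in month_rows:
        if pa < MIN_MONTH_PA:
            continue
        key = (player, year - 1)
        if key in previous:  # inner join on player and prior year
            prev_woba, prev_xwoba = previous[key]
            joined.append((prev_woba, prev_xwoba, woba))
    return joined


season_month_rows = season_to_month(seasons, months)
```

Each output tuple lines up with the three columns listed above: previous season wOBA, previous season xwOBA, and current season month wOBA.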
For good measure, I also created a data frame for double-seasons. If you had 200 PA in two consecutive seasons, congratulations: you just got a double-season. There ended up being 532 of them.
Finally, I ran all the correlations.
Month to next month:
wOBA to Next Month wOBA: r=0.203
xwOBA to Next Month wOBA: r=0.274
Previous season to current month:
wOBA to wOBA: r=0.238
xwOBA to wOBA: r=0.25
Season to next season:
wOBA to wOBA: r=0.403
xwOBA to wOBA: r=0.451
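For anyone who wants to reproduce numbers like these, the sketch below shows how a correlation coefficient can be computed by hand. It assumes the r values are Pearson correlations, which is the default method of R's cor function; the function name pearson_r is my own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / sqrt(spread_x * spread_y)
```

In practice you would pass one column as xs (say, a month's xwOBA) and the outcome column as ys (next month's wOBA). A value near 0 means the first column tells you little about the second; a value near 1 means it tracks it closely.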
The differences are small, but they are consistent. xwOBA appears to be a better short-term predictor than wOBA. What interested me the most was that while wOBA predicts your next month better when it is drawn from a full season (r=0.238) than from a single month (r=0.203), the opposite is true for xwOBA (r=0.25 from a full season, r=0.274 from a single month). If you want to use xwOBA, you’re (slightly) better off using the most recent data.
Let’s talk about this in baseball terms. Baseball is so complex that a couple of broken-bat bloopers here and there can give you a really good month. Maybe you’re getting shifted but the pitcher doesn’t execute his spot and misses away, and you shoot the wide-open side of the infield a couple of times. Maybe you made the mistake of hitting the ball hard in the middle of the field against the Cubs. Stats like wOBA practically scream regression to the mean.
But there’s no hiding from Statcast. If you’re hitting the ball hard it probably means you’re seeing the ball well and are consistently on time. Plate appearances aren’t independent events; we feel things in the cage one day that might get us locked in for a week. Or the other way around.
2013-2014 Oakland A’s