Updating Hitter xISO and Second-Half Predictions
In late May, I posted a version of expected ISO (xISO), inspired by Alex Chamberlain’s work, which incorporated the publicly available Statcast data, easily accessible from the Baseball Savant leaderboard. I’ve been tinkering with it since, and figured I would post an updated version, as well as some second-half predictions based on the current “leaders and laggards”.
The original version of xISO was a simple linear regression model using GB% and average LD/FB exit velocity (LDFBEV). The only feature of any real note was the inclusion of the square of LDFBEV as an additional term. I knew then that I could get better correlation to the data if I used LD% and FB% and removed GB% from the model, but I thought the simpler model would be better. I also thought it would be weird to have LD% and FB% as separate terms, and then one combined term for average exit velocity. I guess I just changed my mind. Whatever, it’s all empirical, and the only rule is it has to…predict better. Let’s examine the model, again trained on 2015 qualified hitters, and using LD% and FB% instead of GB%.
As you can see, the coefficient of determination went up a little bit from the previous version. It’s not a big deal, but it’s basically free, so we’ll take it. The updated model equation is as follows:
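The fitted coefficients are not reproduced in this text. For anyone who wants to tinker with a model of the same form, here is a minimal sketch of an ordinary least-squares fit on LD%, FB%, LDFBEV, and LDFBEV squared. The function names are hypothetical, and the data are synthetic stand-ins (not 2015 hitters), so the printed coefficients and R² mean nothing beyond showing the mechanics.

```python
import numpy as np

def design_matrix(ld_pct, fb_pct, ldfbev):
    """Columns: intercept, LD%, FB%, LDFBEV, LDFBEV squared."""
    ld_pct = np.asarray(ld_pct, dtype=float)
    fb_pct = np.asarray(fb_pct, dtype=float)
    ldfbev = np.asarray(ldfbev, dtype=float)
    return np.column_stack(
        [np.ones_like(ldfbev), ld_pct, fb_pct, ldfbev, ldfbev ** 2]
    )

def fit_xiso(ld_pct, fb_pct, ldfbev, iso):
    """Least-squares coefficients for the five-term model."""
    X = design_matrix(ld_pct, fb_pct, ldfbev)
    coefs, *_ = np.linalg.lstsq(X, np.asarray(iso, dtype=float), rcond=None)
    return coefs

def predict_xiso(coefs, ld_pct, fb_pct, ldfbev):
    return design_matrix(ld_pct, fb_pct, ldfbev) @ coefs

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data: the generating formula below is made up.
rng = np.random.default_rng(0)
n = 140
ld = rng.uniform(0.15, 0.28, n)   # line-drive rate as a fraction
fb = rng.uniform(0.25, 0.45, n)   # fly-ball rate as a fraction
ev = rng.uniform(88.0, 98.0, n)   # average LD/FB exit velocity, mph
iso = 0.5 * ld + 0.4 * fb + 0.001 * (ev - 88.0) ** 2 + rng.normal(0, 0.02, n)

coefs = fit_xiso(ld, fb, ev, iso)
r2 = r_squared(iso, predict_xiso(coefs, ld, fb, ev))
print("coefficients:", coefs)
print("in-sample R^2:", round(r2, 3))
```

The squared term is what lets power grow faster than linearly with exit velocity. Swapping in real LD%, FB%, and LDFBEV columns for the synthetic arrays is the only change needed to try it on actual leaderboard data.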
Now, we also have a fair bit of data for this year. I don’t yet want to update the model parameters using 2015 and 2016 data to train, but I will at least check how the model correlates to this year’s outcomes so far. I arbitrarily selected a minimum of 175 batted ball events (BBE), which limits the pool to 141 players, as of July 8th.
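As a rough sketch of that check, assuming you already have BBE, ISO, and xISO for each hitter, the filter and the correlation look something like this. The six rows below are invented for illustration, and the helper name is hypothetical.

```python
import numpy as np

MIN_BBE = 175  # minimum batted ball events, the cutoff mentioned above

def qualified_mask(bbe, min_bbe=MIN_BBE):
    """True for hitters with at least min_bbe batted ball events."""
    return np.asarray(bbe) >= min_bbe

# Invented rows, not real players.
bbe = np.array([210, 180, 95, 175, 240, 60])
iso = np.array([0.250, 0.140, 0.300, 0.180, 0.205, 0.090])
xiso = np.array([0.230, 0.150, 0.200, 0.190, 0.210, 0.160])

mask = qualified_mask(bbe)
# Pearson correlation between actual and expected ISO for qualifiers only.
r = np.corrcoef(iso[mask], xiso[mask])[0, 1]

print("qualified hitters:", int(mask.sum()))
print("Pearson r:", round(r, 3))
```

The small-sample hitters are dropped before the correlation is computed, since a handful of batted balls can produce an ISO that no model should be expected to match.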
Look at that! Not too bad overall. Armed with some confidence in the method, let’s now take a look at some of the hitters who most over- and under-performed xISO in the first half (numbers current as of July 9). I will also attempt to avoid talking about any of the players I mentioned previously, or that Alex mentioned in his June xISO report.
Jay Bruce: ISO = .274, xISO = .187
Bruce is actually hitting his line drives and fly balls with less authority than last year (92.8 mph down from 93.2). His overall batted-ball profile looks similar as well. After a couple down years, it’s nice to see Bruce succeeding, but I’m not betting on it to continue.
Anthony Rizzo: ISO = .282, xISO = .201
At the risk of enraging my pal, league-mate, and curator of Harper Wallbanger, we might need to calm down a little bit on Rizzo. Don’t get me wrong, I think he’s a very good player, but odds are he won’t continue to hit for quite this much power.
Jake Lamb: ISO = .330, xISO = .256
Right now, Jake Lamb is second in the majors in ISO behind David Ortiz. He does hit the ball hard (97.9 mph LDFBEV), but he hits 46% of his batted balls on the ground. Even a .256 ISO would be quite good, given his decent walk rate. This will likely go down as a true breakout season for Lamb.
Wil Myers: ISO = .242, xISO = .188
While some of the guys on this list play in hitters’ parks, Myers is an example of a first-half overperformer in a pitcher’s park. Between expected power regression and his spotty injury history, I’m nervous about the second half.
Andrew McCutchen: ISO = .165, xISO = .233
Now, ‘Cutch is hitting more popups this year than last year, which could be fooling xISO a bit. Still, I like his ISO to get back to around .200. Of more concern might be his spike in strikeouts.
Ryan Zimmerman: ISO = .181, xISO = .236
Zimmerman’s exit velocity is up from last year (96.8 mph from 95.0). He probably won’t hit for average, but if he continues making hard contact, he should accumulate plenty of RBIs in the second half.
Yasiel Puig: ISO = .133, xISO = .188
xISO basically expects Puig to get back to his career average of .183. My main worry with the burly Cuban is his struggle to maintain a healthy pair of hamstrings.
Colby Rasmus: ISO = .157, xISO = .211
At this point, we basically know who Rasmus is. He is a player who consistently sports an ISO over .200. After a bump in fly balls last year, he’s sitting below his career average this season. That’s not ideal for power output, but he’s also hitting the ball a bit harder. I’ll still bet on him doubling his homer total over the remainder of the season, and surpassing 20 for the second season in Houston.
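For a quick side-by-side, the ISO and xISO figures quoted above can be turned into gaps (ISO minus xISO) and sorted. This is just arithmetic on the numbers already listed; the variable names are arbitrary.

```python
# (player, ISO, xISO) exactly as quoted in the write-ups above.
players = [
    ("Jay Bruce", 0.274, 0.187),
    ("Anthony Rizzo", 0.282, 0.201),
    ("Jake Lamb", 0.330, 0.256),
    ("Wil Myers", 0.242, 0.188),
    ("Andrew McCutchen", 0.165, 0.233),
    ("Ryan Zimmerman", 0.181, 0.236),
    ("Yasiel Puig", 0.133, 0.188),
    ("Colby Rasmus", 0.157, 0.211),
]

# Positive gap: ISO is ahead of xISO. Negative gap: ISO is behind xISO.
gaps = sorted(
    ((name, round(iso - xiso, 3)) for name, iso, xiso in players),
    key=lambda row: row[1],
    reverse=True,
)

for name, gap in gaps:
    print(f"{name:<18}{gap:+.3f}")
```

Sorted this way, the overperformers sit at the top and the underperformers at the bottom, which makes it easy to see whose first half strayed furthest from the model in either direction.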
That’s it! Please feel free to leave comments, questions, or suggestions for improvement. I’m working on a public document with the xISO calculation available for every player, updated daily-ish. Feel free to follow me on Twitter for updates, or badger me in the comments.
Andrew is a research engineer from Waltham, Massachusetts. He has contributed to the FanGraphs Community blog, presented at Saberseminar, and appeared as an analytical correspondent on Japanese television. He can be found on Twitter @ADominijanni, where he'll happily talk science, sports, beer, and dogs.
Check out the following Google doc to look up any player. Please let me know on Twitter if you see any issues!
You’re a saint.
Thanks for that!
I will temper my Rizzo-induced rage by using your spreadsheet to defeat you.
Also nice work!
Your xISO work is excellent! Really concise and presentable. I have been working on a model of my own, but have run into a statistical roadblock. I am also something of an autodidact but I hope to take AP Stats next year. There are a lot of things I don’t quite understand and I was hoping you would be able to answer a few questions. Thanks.
Thanks for reading, Jackson. As you allude to, I’m mostly self-taught, but through my job and my baseball hobby, I’ve managed to delve into a few areas of statistics to some depth. I’ve also got a good amount of computational experience in Matlab and Python, which is what I use for this project.
If you are on Twitter, please DM me, and I’d be happy to lend any advice I can. If not, let me know and we’ll find a way to get in touch.
That would be great. I do not have Twitter or Instagram. Would email work?
I was originally intending not to give out my email address publicly, but really, it’s not too different from giving out a Twitter handle, plus it’s not too hard to guess anyway.