Using Double-A Stats to Predict Future Performance
Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
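To make the mechanics a bit more concrete, here is a minimal sketch of how a probit model turns a player’s inputs into a probability: the inputs are combined into a single weighted sum, and the standard normal CDF maps that sum to a number between 0 and 1. The coefficient values, variable names, and the two sample players below are all made up for illustration; they are not KATOH’s fitted estimates.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, the link function a probit model uses."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical coefficients, chosen only for illustration (NOT KATOH's estimates).
HYPOTHETICAL_COEFS = {
    "intercept": 1.0,
    "age": -0.30,     # years relative to league-average age
    "k_rate": -0.05,  # strikeout rate, percentage points vs. league average
    "iso": 4.0,       # isolated power vs. league average
    "top_100": 0.9,   # 1 if a preseason top 100 prospect, else 0
}


def mlb_probability(player: dict) -> float:
    """Weighted sum of the inputs, passed through the normal CDF."""
    z = HYPOTHETICAL_COEFS["intercept"]
    for name, value in player.items():
        z += HYPOTHETICAL_COEFS[name] * value
    return normal_cdf(z)


# Two invented players, purely to show the direction of the effects.
young_prospect = {"age": -2.0, "k_rate": -3.0, "iso": 0.030, "top_100": 1}
older_non_prospect = {"age": 2.0, "k_rate": 4.0, "iso": -0.020, "top_100": 0}

print(f"{mlb_probability(young_prospect):.1%}")
print(f"{mlb_probability(older_non_prospect):.1%}")
```

With these invented numbers, the young, low-strikeout top 100 prospect comes out well ahead of the older non-prospect, which is the kind of ordering the real model is built to produce.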
Things that were predictive for players in low-A and high-A included age, strikeout rate, ISO, BABIP, and whether or not the player was deemed a top 100 prospect by Baseball America in the pre-season. However, a player’s walk rate was not significant in predicting his ascension to the majors. Today, I’ll look into what KATOH has to say about players in double-A leagues. For those interested, here’s the R output based on all players with at least 400 plate appearances in a season in double-A from 1995-2010. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted to reflect his league’s average for that year.
Unlike in the A-ball iterations of KATOH, a player’s double-A walk rate is predictive — albeit only slightly — of whether or not he’ll make it to the show. While walk rate is statistically significant, it still matters much less than the other stats: it takes 3 or 4 percentage points on a player’s walk rate to match what 1 percentage point of strikeout rate does to a player’s MLB probability.
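The “3 or 4 points of walk rate per point of strikeout rate” comparison comes from the ratio of the two coefficients, since each stat moves the probit’s weighted sum by its coefficient times the change in the stat. A rough sketch of that arithmetic follows; the two coefficient values are hypothetical placeholders, not the ones in the R output.

```python
def equivalent_walk_points(k_coef: float, bb_coef: float, k_points: float = 1.0) -> float:
    """Walk-rate points needed to shift the probit index as much as
    `k_points` percentage points of strikeout rate."""
    if bb_coef == 0:
        raise ValueError("walk-rate coefficient must be nonzero")
    return abs(k_coef) * k_points / abs(bb_coef)


# Hypothetical per-percentage-point coefficients (NOT KATOH's fitted values).
K_COEF = -0.07   # more strikeouts lowers the index
BB_COEF = 0.02   # more walks raises it, but by less

print(round(equivalent_walk_points(K_COEF, BB_COEF), 2))
```

With these placeholder values the ratio works out to roughly 3.5, in line with the 3-to-4 range described above.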
This version is also different in that there are a couple of significant interaction terms, signified by the last two coefficients in the R output. The “I(Age^2)” term adds a little bit of nuance to how a player’s age can predict his future success, while the “ISO:BA.Top.100.Prospect” term basically says that if you’re a top 100 prospect, hitting for power is slightly less important than it would be otherwise. Hitting for power and making Baseball America’s top 100 list both make a player much more likely to make it to the majors, but if he does both, he’s a tad less likely to make it than his power output and prospect status would suggest independently. Put another way, a few top 100 prospects hit for power in double-A but never cracked the majors, such as Jason Stokes (.241 ISO), Nick Weglarz (.204 ISO) and Eric Duncan (.173 ISO). But virtually all of the low-power top 100 guys made it, including Elvis Andrus (.073 ISO), Luis Castillo (.076 ISO), and Carl Crawford (.078 ISO). For non-top 100 guys, many more punchless hitters topped out in double-A and triple-A.
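One way to read the two extra terms is as adjustments to a slope: the interaction term changes how much a point of ISO is worth depending on prospect status, and the squared term lets the effect of an extra year change with age. The sketch below illustrates that idea only; every coefficient is invented, including the signs on the age terms, and none is taken from the R output.

```python
# Hypothetical coefficients for illustration only (NOT KATOH's estimates).
B_ISO = 4.0             # effect of ISO on the probit index
B_ISO_X_TOP_100 = -2.5  # interaction: dampens the ISO effect for top 100 prospects
B_AGE = -0.9            # linear age term
B_AGE_SQ = 0.01         # squared age term, i.e. I(Age^2)


def iso_slope(top_100: int) -> float:
    """Change in the index per unit of ISO, given prospect status (0 or 1)."""
    return B_ISO + B_ISO_X_TOP_100 * top_100


def age_slope(age: float) -> float:
    """Change in the index per extra year at a given age:
    the derivative of B_AGE * age + B_AGE_SQ * age**2."""
    return B_AGE + 2.0 * B_AGE_SQ * age


print(iso_slope(0), iso_slope(1))
print(age_slope(20), age_slope(25))
```

In this toy version, power still helps a top 100 prospect, just by less than it helps everyone else, and the penalty for an extra year of age is not the same at 20 as it is at 25.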
By clicking here, you can see what KATOH spits out for all current prospects who logged at least 250 PAs in double-A as of July 7th, as well as a few who fell short of the cutoff, most notably Joey Gallo, Kevin Plawecki, and Robert Refsnyder. Topping the list is Mookie Betts with a probability of 99.95%, and of course the prophecy was fulfilled when the Red Sox called up the 21-year-old last month. Here’s an excerpt of the top players from double-A this year:
Player | Organization | Age | MLB Probability |
---|---|---|---|
Mookie Betts | BOS | 21 | 100% |
Francisco Lindor | CLE | 20 | 100% |
Gary Sanchez | NYY | 21 | 99% |
Austin Hedges | SDP | 21 | 99% |
Alen Hanson | PIT | 21 | 99% |
Jorge Bonifacio | KCR | 21 | 98% |
Blake Swihart | BOS | 22 | 98% |
Kris Bryant | CHC | 22 | 93% |
Ketel Marte | SEA | 20 | 91% |
Rangel Ravelo | CHW | 22 | 90% |
Robert Refsnyder | NYY | 23 | 86% |
Jake Lamb | ARI | 23 | 85% |
Jake Hager | TBR | 21 | 84% |
Darnell Sweeney | LAD | 23 | 83% |
Joey Gallo | TEX | 20 | 82% |
Preston Tucker | HOU | 23 | 81% |
Scott Schebler | LAD | 23 | 79% |
Kevin Plawecki | NYM | 23 | 79% |
Cheslor Cuthbert | KCR | 21 | 78% |
Kyle Kubitza | ATL | 23 | 77% |
Michael Taylor | WSN | 23 | 76% |
Christian Walker | BAL | 23 | 76% |
Ryan Brett | TBR | 22 | 75% |
Keep an eye out for the next installment, which will dive into what KATOH says about hitters at the triple-A level.
Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.
Chris works in economic development by day, but spends most of his nights thinking about baseball. He writes for Pinstripe Pundits, FanGraphs and The Hardball Times. He's also on the twitter machine: @_chris_mitchell. None of the views expressed in his articles reflect those of his daytime employer.
I very much like the approach in general but to live up to the title of the post I think you ought to consider stripping out the BA.Top.100.Prospect variables entirely.
Seconded
That makes sense. But I think layering in the prospect ranking piece improves the prediction since it doesn’t entirely “scout the stat line.” Maybe I just need to come up with a better title 🙂
This is similar to Nate Silver’s reasoning for his March Madness model, when he incorporates the preseason poll rankings as having some predictive value.
In situations where there are limited data points and/or weaker than normal connections between historical data and the analysis being projected, it can be useful to throw in stuff like this.
This is incredibly intriguing research. I’m very interested in this article and the upcoming one for AAA. This has huge advantages for fantasy purposes as well. Keep it up!
Thanks! Should be able to finish up the AAA one within the next few days.
I’m just not getting the Alen Hanson thing. The kid’s an error magnet, and has 100 of them at SS in the last 3 seasons alone, along with a career .934 Fld% and 3.97 RF/G. I realize the bat is decent for the age/level (assuming we *know his real age), but seriously, his agent must be working overtime with a high powered marketing firm to jack his stock way up. Where is the excitement coming from for this kid, and how on earth does he sport a 99% chance of reaching the bigs? If he does make it, my money says he’s benched after his first week for clanking plays.
Oh wait, can we just pinch hit him?
I think once you get to this level, the test of “making it” or not becomes a bit less meaningful, in that most players will at least make it. Is it possible to look at those who make it and do well vs. those who do not?
Yeah, that’s more than fair. I’ve thought about doing something like that, but it’s a little harder to quantify. This considers players at various points of their careers, so it wouldn’t make sense to use straight WAR, but something centered around WAR/PA might work. I’ll probably do something like that (in addition to the “making it” probability) when I re-do this exercise with end-of-season stats.
It’d be interesting to see what this method says when applied to last year’s AA crop, or the year before.
Agreed. I’ll put it on my to-do list!
I like this series, good work. It would take some more time, but what about accounting for where the prospect was ranked, and giving some credit to the prospects who just missed?
Glad you’re enjoying it. I did try incorporating a player’s rankings for the A-ball iterations, but I found that simply specifying whether or not a player made the list was more predictive. For consistency’s sake, I just stuck with that approach for the higher levels. It’s something worth exploring further, though. And accounting for the near misses is a good idea too.