Using Double-A Stats to Predict Future Performance

July 23, 2014

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I used a methodology I named KATOH (after Yankees prospect Gosuke Katoh), which is built on a probit regression. In a nutshell, a probit regression estimates how a set of inputs predicts the probability of an event with two possible outcomes, such as whether or not a player reaches the majors. It does this by passing a weighted sum of the inputs through the standard normal cumulative distribution function. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success: if something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
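To make the probit idea concrete, here is a minimal Python sketch of the prediction side only: the model computes a linear index from the inputs and converts it to a probability with the standard normal CDF. The feature names, coefficients, and intercept are hypothetical placeholders, not KATOH’s fitted values, and the fitting step (done in R for this post) is not shown.

```python
from statistics import NormalDist

# Standard normal CDF, Phi(z): the probit link that maps the linear index
# to a probability between 0 and 1.
PHI = NormalDist(mu=0.0, sigma=1.0).cdf


def probit_probability(features, coefficients, intercept):
    """Return Phi(intercept + sum of coefficient * feature value).

    `features` and `coefficients` are dicts keyed by the same stat names.
    """
    index = intercept + sum(coefficients[name] * value for name, value in features.items())
    return PHI(index)


# HYPOTHETICAL coefficients for illustration only; not KATOH's estimates.
coefs = {"age": -0.30, "k_pct": -0.07}
player = {"age": 21, "k_pct": 18.0}
print(f"P(reaches MLB) = {probit_probability(player, coefs, intercept=8.0):.3f}")
```

Because Phi is increasing, any input with a negative coefficient (strikeout rate in this toy example) lowers the predicted probability, and how much it lowers it depends on where the player’s index already sits.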

Things that were predictive for players in low-A and high-A included age, strikeout rate, ISO, BABIP, and whether or not the player was deemed a top 100 prospect by Baseball America in the pre-season. Walk rate, however, was not a significant predictor of reaching the majors. Today, I’ll look at what KATOH has to say about players in double-A leagues. For those interested, here’s the R output based on all players with at least 400 plate appearances in a double-A season from 1995-2010. Because offensive environments vary by year and league, each player’s stats were adjusted relative to his league’s average for that year.
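The post doesn’t spell out the exact adjustment, so the following Python sketch assumes one plausible version: keep seasons with at least 400 double-A PA and express each rate stat as an index of its league-year average (100 = league average). The league-year, the season records, and the players are hypothetical, and the index approach is an assumption, not KATOH’s documented method.

```python
MIN_PA = 400  # double-A seasons below this many PA are excluded
RATE_STATS = ("k_pct", "bb_pct", "iso", "babip")


def league_index(player_rate, league_rate):
    """ASSUMED adjustment: 100 * player rate / league-year average."""
    return 100.0 * player_rate / league_rate


def adjust_seasons(seasons, league_averages):
    """Drop short seasons and convert rate stats to league-relative indexes.

    `seasons`: list of dicts with 'player', 'league', 'year', 'pa' and rate stats.
    `league_averages`: maps (league, year) to a dict of league-average rates.
    """
    adjusted = []
    for season in seasons:
        if season["pa"] < MIN_PA:
            continue
        avg = league_averages[(season["league"], season["year"])]
        row = {"player": season["player"]}
        row.update({stat: league_index(season[stat], avg[stat]) for stat in RATE_STATS})
        adjusted.append(row)
    return adjusted


# Hypothetical league-year and players, for illustration only.
league_averages = {("EL", 2005): {"k_pct": 18.0, "bb_pct": 8.0, "iso": 0.140, "babip": 0.300}}
seasons = [
    {"player": "Hypothetical A", "league": "EL", "year": 2005, "pa": 520,
     "k_pct": 13.5, "bb_pct": 10.0, "iso": 0.210, "babip": 0.330},
    {"player": "Hypothetical B", "league": "EL", "year": 2005, "pa": 310,
     "k_pct": 25.0, "bb_pct": 6.0, "iso": 0.100, "babip": 0.290},
]
for row in adjust_seasons(seasons, league_averages):
    print(row)
```

With this convention a strikeout index below 100 means fewer strikeouts than the league average. A difference-from-average adjustment would slot into `league_index` just as easily and is an equally plausible reading of the post.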

Unlike in the A-ball iterations of KATOH, a player’s double-A walk rate is predictive, albeit only slightly, of whether or not he’ll make it to the show. While walk rate is statistically significant, it matters much less than the other stats: it takes 3 or 4 percentage points of walk rate to match the effect that 1 percentage point of strikeout rate has on a player’s MLB probability.
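The “3 or 4 points of walk rate per point of strikeout rate” comparison is a ratio of coefficients. In a probit, each stat moves the linear index by its coefficient times the change, so two changes that move the index by the same amount move the predicted probability by the same amount. The Python sketch below uses hypothetical per-percentage-point coefficients chosen to land in that range; the real values would come from the R output.

```python
# HYPOTHETICAL probit coefficients per percentage point; not KATOH's estimates.
BETA_K_PCT = -0.070   # higher strikeout rate lowers the linear index
BETA_BB_PCT = 0.020   # higher walk rate raises it, but by less per point


def walk_points_per_strikeout_point(beta_k, beta_bb):
    """Walk-rate points that offset a 1-point rise in strikeout rate."""
    return abs(beta_k) / beta_bb


ratio = walk_points_per_strikeout_point(BETA_K_PCT, BETA_BB_PCT)
# +1 K% point together with +ratio BB% points leaves the index (and so the
# predicted probability) unchanged, up to floating-point rounding.
net_index_change = BETA_K_PCT * 1.0 + BETA_BB_PCT * ratio
print(f"{ratio:.1f} BB% points offset 1 K% point")
```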

This version also differs in that it includes two significant higher-order terms, the last two coefficients in the above output. The “I(Age^2)” term, a squared age term, adds a bit of nuance to how a player’s age predicts his future success. The “ISO:BA.Top.100.Prospect” interaction term says that if you’re a top 100 prospect, hitting for power is slightly less important than it would be otherwise. Hitting for power and making Baseball America’s top 100 list each make a player much more likely to make it to the majors, but if he does both, he’s a tad less likely to make it than his power output and prospect status would suggest independently. Put another way, a few top 100 prospects hit for power in double-A but never cracked the majors, such as Jason Stokes (.241 ISO), Nick Weglarz (.204 ISO), and Eric Duncan (.173 ISO). But virtually all of the low-power top 100 guys made it, including Elvis Andrus (.073 ISO), Luis Castillo (.076 ISO), and Carl Crawford (.078 ISO). Among non-top 100 guys, many more punchless hitters topped out in double-A or triple-A.
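Here is a Python sketch of how those two terms enter a probit’s linear index. In R formula notation they are I(Age^2) and ISO:BA.Top.100.Prospect. Every coefficient value below is hypothetical, chosen only so the interaction is negative as the post describes, not taken from KATOH. The key function is `iso_slope`: with the interaction, the index rises by `iso + iso_x_top100` per unit of ISO for a top 100 prospect, versus just `iso` for everyone else.

```python
from statistics import NormalDist

# HYPOTHETICAL coefficients for illustration; not KATOH's estimates.
COEFS = {
    "intercept": 13.0,
    "age": -0.90,
    "age_sq": 0.012,       # I(Age^2): lets the age effect curve instead of staying linear
    "iso": 6.0,            # more power raises the index
    "top100": 1.2,         # BA top 100 status raises the index
    "iso_x_top100": -3.0,  # ISO:BA.Top.100.Prospect: power counts for less among top 100 guys
}


def linear_index(age, iso, top100, c=COEFS):
    """Probit linear index with a squared age term and an ISO x top 100 interaction."""
    t = 1 if top100 else 0
    return (c["intercept"] + c["age"] * age + c["age_sq"] * age ** 2
            + c["iso"] * iso + c["top100"] * t + c["iso_x_top100"] * iso * t)


def iso_slope(top100, c=COEFS):
    """Change in the index per unit of ISO; it depends on prospect status."""
    return c["iso"] + c["iso_x_top100"] * (1 if top100 else 0)


def mlb_probability(age, iso, top100):
    return NormalDist().cdf(linear_index(age, iso, top100))


for top100 in (False, True):
    print(f"top100={top100}: ISO slope {iso_slope(top100):.1f}, "
          f"P(MLB | age 22, .200 ISO) = {mlb_probability(22, 0.200, top100):.3f}")
```

A negative interaction doesn’t make power a bad thing for top 100 prospects in this sketch: their index still rises with ISO, just half as fast under these made-up numbers.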

By clicking here, you can see what KATOH spits out for all current prospects who logged at least 250 PA in double-A as of July 7th, as well as a few who fell short of the cutoff, most notably Joey Gallo, Kevin Plawecki, and Robert Refsnyder. Topping the list is Mookie Betts with a probability of 99.95%, and of course the prophecy was fulfilled when the Red Sox called up the 21-year-old last month. Here’s an excerpt of the top players from double-A this year:

Player	Organization	Age	MLB Probability
Mookie Betts	BOS	21	100%
Francisco Lindor	CLE	20	100%
Gary Sanchez	NYY	21	99%
Austin Hedges	SDP	21	99%
Alen Hanson	PIT	21	99%
Jorge Bonifacio	KCR	21	98%
Blake Swihart	BOS	22	98%
Kris Bryant	CHC	22	93%
Ketel Marte	SEA	20	91%
Rangel Ravelo	CHW	22	90%
Robert Refsnyder	NYY	23	86%
Jake Lamb	ARI	23	85%
Jake Hager	TBR	21	84%
Darnell Sweeney	LAD	23	83%
Joey Gallo	TEX	20	82%
Preston Tucker	HOU	23	81%
Scott Schebler	LAD	23	79%
Kevin Plawecki	NYM	23	79%
Cheslor Cuthbert	KCR	21	78%
Kyle Kubitza	ATL	23	77%
Michael Taylor	WSN	23	76%
Christian Walker	BAL	23	76%
Ryan Brett	TBR	22	75%
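The scoring step behind the table (apply the fitted model to current double-A hitters with at least 250 PA, hand-add a few exceptions, and sort by probability) can be sketched in Python as follows. The coefficients and the placeholder players are hypothetical; only the 250 PA cutoff and the idea of including a few short-of-cutoff players come from the post.

```python
from statistics import NormalDist

MIN_CURRENT_PA = 250
PHI = NormalDist().cdf


def score(player, coefs, intercept):
    """Probit probability Phi(intercept + sum of coefficient * stat)."""
    return PHI(intercept + sum(coefs[stat] * player[stat] for stat in coefs))


def rank_prospects(players, coefs, intercept, min_pa=MIN_CURRENT_PA, extra_names=()):
    """Return (name, probability) pairs, highest probability first.

    Players under `min_pa` are skipped unless named in `extra_names`,
    like the few short-of-cutoff players the post chose to include.
    """
    eligible = [p for p in players if p["pa"] >= min_pa or p["name"] in extra_names]
    scored = [(p["name"], score(p, coefs, intercept)) for p in eligible]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# HYPOTHETICAL model and players, for illustration only.
coefs = {"age": -0.30, "k_pct": -0.07}
players = [
    {"name": "Player A", "pa": 320, "age": 21, "k_pct": 14.0},
    {"name": "Player B", "pa": 410, "age": 23, "k_pct": 24.0},
    {"name": "Player C", "pa": 180, "age": 20, "k_pct": 20.0},
]
for name, prob in rank_prospects(players, coefs, intercept=8.0):
    print(f"{name}\t{prob:.0%}")
```

Rounding to whole percentages, as in the table, turns a 99.95% probability like Betts’s into 100%.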

Keep an eye out for the next installment, which will dive into what KATOH says about hitters at the triple-A level.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.

13 Comments

Aaron (UK)

I very much like the approach in general, but to live up to the title of the post I think you ought to consider stripping out the BA.Top.100.Prospect variables entirely.

Spencer00

Reply to Aaron (UK)

Seconded

Chris Mitchell

That makes sense. But I think layering in the prospect ranking piece improves the prediction since it doesn’t entirely “scout the stat line.” Maybe I just need to come up with a better title 🙂

Reply to Chris Mitchell

This is similar to Nate Silver’s reasoning for his March Madness model, when he incorporates the preseason poll rankings as having some predictive value.

In situations where there are limited data points and/or weaker than normal connections between historical data and the analysis being projected, it can be useful to throw in stuff like this.

Mark Williamson

This is incredibly intriguing research. I’m very interested in this article and the upcoming one for AAA. This has huge advantages for fantasy purposes as well. Keep it up!

Reply to Mark Williamson

Thanks! Should be able to finish up the AAA one within the next few days.

GLars

I’m just not getting the Alen Hanson thing. The kid’s an error magnet, and has 100 of them at SS in the last 3 seasons alone, along with a career .934 Fld% and 3.97 RF/G. I realize the bat is decent for the age/level (assuming we know his real age), but seriously, his agent must be working overtime with a high-powered marketing firm to jack his stock way up. Where is the excitement coming from for this kid, and how on earth does he sport a 99% chance of reaching the bigs? If he does make it, my money says he’s benched after his first week for clanking plays.

Oh wait, can we just pinch hit him?

evo34

I think once you get to this level, the test of “making it” or not becomes a bit less meaningful, in that most players will at least make it. Is it possible to look at those who make it and do well vs. those who do not?

Reply to evo34

Yeah, that’s more than fair. I’ve thought about doing something like that, but it’s a little harder to quantify. This considers players at various points of their careers, so it wouldn’t make sense to use straight WAR, but something centered around WAR/PA might work. I’ll probably do something like that (in addition to the “making it” probability) when I redo this exercise with end-of-season stats.

Fish

It’d be interesting to see what this method says when applied to last year’s AA crop, or the year before.

Reply to Fish

Agreed. I’ll put it on my to-do list!

Josh

I like this series, good work. It will take some more time, but what about factoring in where the prospect was ranked and giving some credit to the prospects who just missed?

Reply to Josh

Glad you’re enjoying it. I did try incorporating a player’s rankings for the A-ball iterations, but I found that simply specifying whether or not a player made the list was more predictive. For consistency’s sake, I just stuck with that approach for the higher levels. It’s something worth exploring further, though. And accounting for the near misses is a good idea too.