Using Short-Season A Stats to Predict Future Performance

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. So far, I’ve analyzed hitters in Rookie leagues, Low-A, High-A, Double-A and Triple-A using a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
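To make the mechanics a bit more concrete, here is a minimal Python sketch of how a fitted probit model turns a player's inputs into a probability. The coefficient values, the feature names, and the `probit_probability` helper are all made up for illustration; they are not KATOH's fitted estimates.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, the link function a probit model uses."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept, coefficients, features):
    """Return P(event) = Phi(intercept + sum of coefficient * feature)."""
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return normal_cdf(z)


# Hypothetical coefficients and inputs, NOT the fitted KATOH values.
example_coefficients = {"age": -0.20, "strikeout_rate": -2.0, "iso": 3.0}
example_player = {"age": 19, "strikeout_rate": 0.18, "iso": 0.150}

probability = probit_probability(3.5, example_coefficients, example_player)
print(f"Estimated chance of reaching the majors: {probability:.0%}")
```

The weighted sum can be any real number, and the normal CDF squeezes it into the 0-to-1 range, which is why the model's output can be read as a probability.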

For hitters in Low-A and High-A, age, strikeout rate, ISO, BABIP, and whether or not the player was deemed a top 100 prospect by Baseball America all played a role in forecasting future success. And walk rate, while not predictive for players in Rookie ball, Low-A, or High-A, added a little bit to the model for Double-A and Triple-A hitters. Today, I’ll look into what KATOH has to say about players in Short-Season A-ball. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted to reflect his league’s average for that year. For those interested, here’s the R output based on all players with at least 200 plate appearances in a season in SS A-ball from 1995-2007.

[Figure: Short Season Output]
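The article doesn't spell out exactly how the league adjustment was done, so the following Python sketch is only one plausible reading: divide each player's stat by the average for his league and year. The `league_adjust` helper, the field names, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict


def league_adjust(rows, stat):
    """Add `<stat>_vs_league`: each player's stat divided by his league-year average."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = (row["league"], row["year"])
        totals[key][0] += row[stat]
        totals[key][1] += 1

    adjusted = []
    for row in rows:
        total, count = totals[(row["league"], row["year"])]
        league_average = total / count  # simple unweighted mean (an assumption)
        adjusted.append({**row, f"{stat}_vs_league": row[stat] / league_average})
    return adjusted


# Made-up sample rows for illustration only.
sample = [
    {"player": "A", "league": "NYP", "year": 2006, "iso": 0.100},
    {"player": "B", "league": "NYP", "year": 2006, "iso": 0.300},
    {"player": "C", "league": "NWL", "year": 2006, "iso": 0.150},
]
for row in league_adjust(sample, "iso"):
    print(row["player"], round(row["iso_vs_league"], 2))
```

A value above 1 means the player beat his league's average in that stat. A more careful version would probably weight the league average by plate appearances and guard against a league average of zero.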

Just like we saw with hitters in Rookie ball, a player’s Baseball America prospect status couldn’t tell us anything about his future as a big leaguer. This was entirely due to the scarcity of top 100 prospects in the sample, as only a handful of players spent the year in SS A-ball after making BA’s top 100 list. Somewhat surprisingly, walk rate is predictive for players in SS-A, despite being statistically insignificant for hitters in Rookie ball and the more advanced A-ball levels. Another interesting wrinkle is the “Strikeout_Rate:Age” variable. Basically, this says that strikeout rate matters more for younger players than for older players at this level, although frequent strikeouts are obviously a bad thing no matter how old you are:

[Figure: R plot]
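In an R formula, a term like `Strikeout_Rate:Age` is the product of the two variables, so the effect of strikeout rate on the model's index depends on age. Here is a small Python sketch of that idea. The two coefficients are hypothetical values picked only to mimic the pattern described above, not the fitted ones.

```python
def strikeout_slope(age, b_strikeout, b_interaction):
    """Change in the probit index per unit of strikeout rate at a given age."""
    return b_strikeout + b_interaction * age


# Hypothetical coefficients, NOT taken from the KATOH output.
B_STRIKEOUT = -12.0
B_INTERACTION = 0.45

for age in (18, 20, 22):
    print(age, round(strikeout_slope(age, B_STRIKEOUT, B_INTERACTION), 2))
```

With a negative strikeout coefficient and a positive interaction coefficient, the slope stays negative at every age in this range but shrinks in size as age goes up, which is the "matters more for younger players" pattern.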

The season is less than 50 games old for most teams in the New York-Penn and Northwest Leagues, which makes it a little premature to start analyzing players’ stats. But just for kicks, here’s a look at what KATOH says about this year’s crop of players with at least 100 plate appearances through July 28th. The full list of players can be found here, and you’ll find an excerpt of those who broke the 40% barrier below:

Player            Organization  Age  MLB Probability
Rowan Wick        STL           21   82%
Eduard Pinto      TEX           19   68%
Marcus Greene     TEX           19   60%
Mauricio Dubon    BOS           19   59%
Franklin Barreto  TOR           18   57%
Christian Arroyo  SFG           19   57%
Skyler Ewing      SFG           21   56%
Taylor Gushue     PIT           20   55%
Domingo Leyba     DET           18   55%
Raudy Read        WSN           20   53%
Nick Longhi       BOS           18   52%
Andrew Reed       HOU           21   52%
Danny Mars        BOS           20   51%
Amed Rosario      NYM           18   49%
Yairo Munoz       OAK           19   48%
Seth Spivey       TEX           21   47%
Mike Gerber       DET           21   47%
Mark Zagunis      CHC           21   47%
Kevin Krause      PIT           21   46%
Leo Castillo      CLE           20   45%
Jordan Luplow     PIT           20   45%
Mason Davis       MIA           21   40%
Kevin Ross        PIT           20   40%
Franklin Navarro  DET           19   40%

As we saw with Rookie league hitters, KATOH doesn’t think any of these players are shoo-ins to make it to the majors. Even Rowan Wick, who hit a Bondsian .378/.475/.815 before getting promoted, gets just 82%. This goes to show that SS A-ball stats just aren’t all that meaningful.

Once the season’s over, I’ll re-run everything using the final 2014 stats, which will give us a better sense of which prospects had the most promising years statistically. I also plan to engineer an alternative methodology — to supplement this one — that will take into account how a player performs in the majors, rather than his just getting there. Additionally, I hope to create something similar for projecting pitchers based on their statistical performance. In the meantime, I’ll apply the KATOH model to historical prospects and highlight some of its biggest “hits” and “misses” from years past. Keep an eye out for the next post in the coming days.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.

Chris works in economic development by day, but spends most of his nights thinking about baseball. He writes for Pinstripe Pundits, FanGraphs and The Hardball Times. He's also on the twitter machine: @_chris_mitchell. None of the views expressed in his articles reflect those of his daytime employer.

8 Comments
nicknowsky
9 years ago

I would have added Rob Refsnyder to this mix; he’s having a heck of a season in AA/AAA.

nicknowsky
9 years ago
Reply to  nicknowsky

Actually, he doesn’t really apply to this article now that I’ve read it thoroughly. Never mind.

nicknowsky
9 years ago
Reply to  Chris Mitchell

That’s better than I thought. Either way, great piece. The eye test for scouts nowadays is hit or miss.

Nathaniel Dawson
9 years ago

I’m curious, Chris. Did you run any comparable tests that exclude the Baseball America rankings to gauge whether they were adding anything beyond what you could get with the stats alone? I’m unsure of the mechanics of what you’re doing, so if that’s something that’s inherent within what you’re doing, I guess just disregard. From what little I can make out of what you’re doing, it seems maybe your regression analysis accounts for that.

Would comparing players both on and off the list with or without their BA100 ranking included tell us anything different? I’m speaking of course of the higher levels where the BA100 ranking seems to have some predictive power.

Clowery
9 years ago
Reply to  Chris Mitchell

Did you run a multiple regression? y = MLB appearance, y = f(SO rate, walk rate, etc.), and if so, what was the r-squared?