Using Short-Season A Stats to Predict Future Performance
Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. So far, I’ve analyzed hitters in Rookie leagues, Low-A, High-A, Double-A and Triple-A using a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
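For readers who want to see the mechanics, here is a minimal sketch of how a probit model turns a set of inputs into a probability: it adds up the weighted inputs and passes the total through the standard normal CDF. The coefficient values, the feature names, and the `probit_probability` helper are all made up for illustration; they are not KATOH's fitted values.

```python
from statistics import NormalDist

# Hypothetical coefficients for illustration only; NOT KATOH's fitted values.
EXAMPLE_INTERCEPT = -0.5
EXAMPLE_COEFFICIENTS = {
    "age": -0.20,             # older for the level -> lower probability
    "strikeout_rate": -1.00,  # more strikeouts -> lower probability
    "iso": 0.80,              # more power -> higher probability
}


def probit_probability(intercept, coefficients, features):
    """Return Phi(intercept + sum of coefficient * feature value)."""
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    # The standard normal CDF maps any real number onto the 0-1 range.
    return NormalDist().cdf(z)


# Assumed scale: each feature is a deviation from league average (0.0 = average).
average_hitter = {"age": 0.0, "strikeout_rate": 0.0, "iso": 0.0}
power_hitter = {"age": 0.0, "strikeout_rate": 0.0, "iso": 1.0}

p_average = probit_probability(EXAMPLE_INTERCEPT, EXAMPLE_COEFFICIENTS, average_hitter)
p_power = probit_probability(EXAMPLE_INTERCEPT, EXAMPLE_COEFFICIENTS, power_hitter)
print(round(p_average, 3), round(p_power, 3))
```

Because the normal CDF flattens out at both ends, the same bump in a stat moves a borderline player's probability more than it moves a player who is already near 0% or 100%.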
For hitters in Low-A and High-A, a player’s age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America all played a role in forecasting future success. And walk rate, while not predictive for players in Rookie ball, Low-A, or High-A, added a little bit to the model for Double-A and Triple-A hitters. Today, I’ll look into what KATOH has to say about players in Short-Season A-ball. Because offensive environments vary across years and leagues, each player’s stats were adjusted relative to his league’s average for that year. For those interested, here’s the R output based on all players with at least 200 plate appearances in a season in SS A-ball from 1995-2007.
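The exact adjustment isn't spelled out here, but one simple way to put stats from different leagues and years on the same footing is to divide a player's rate by his league's rate for that season. The sketch below assumes that ratio approach; the `league_adjust` helper and the numbers are hypothetical.

```python
def league_adjust(player_rate, league_rate):
    """Express a player's rate as a ratio to his league's rate for that season.

    Assumption: a plain ratio, where 1.0 means exactly league average.
    """
    if league_rate <= 0:
        raise ValueError("league_rate must be positive")
    return player_rate / league_rate


# The same raw .150 ISO in two hypothetical league-seasons.
in_hitter_friendly_league = league_adjust(0.150, 0.150)
in_pitcher_friendly_league = league_adjust(0.150, 0.100)
print(in_hitter_friendly_league, in_pitcher_friendly_league)
```

Under this assumption, a .150 ISO counts for more in a league where the average hitter manages only .100 than in one where .150 is the norm.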
Just like we saw with hitters in Rookie ball, a player’s Baseball America prospect status couldn’t tell us anything about his future as a big leaguer. This was entirely due to the scarcity of top 100 prospects in the sample, as only a handful of players spent the year in SS A-ball after making BA’s top 100 list. Somewhat surprisingly, walk rate is predictive for players in SS-A, despite being statistically insignificant for hitters in Rookie ball and the more advanced A-ball levels. Another interesting wrinkle is the “Strikeout_Rate:Age” variable. Basically, this says that strikeout rate matters more for younger players than for older players at this level, although frequent strikeouts are obviously a bad thing no matter how old you are.
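In R's formula syntax, a term like `Strikeout_Rate:Age` is the product of the two variables, so the weight on strikeout rate shifts with age. Here is a small sketch of that idea. The two coefficients are invented to reproduce the pattern described above and are not the model's actual estimates.

```python
# Hypothetical coefficients for illustration only; NOT the fitted values.
B_STRIKEOUT = -4.0     # weight on strikeout rate by itself
B_INTERACTION = 0.15   # weight on the strikeout_rate * age product


def strikeout_effect_at_age(age):
    """Change in the probit score per unit of strikeout rate at a given age.

    With score = ... + B_STRIKEOUT * k + B_INTERACTION * k * age, the effect
    of k is B_STRIKEOUT + B_INTERACTION * age.
    """
    return B_STRIKEOUT + B_INTERACTION * age


effect_at_18 = strikeout_effect_at_age(18)
effect_at_22 = strikeout_effect_at_age(22)
print(effect_at_18, effect_at_22)
```

With these made-up numbers the effect is negative at both ages but larger in magnitude at 18 than at 22: strikeouts hurt everyone, and they hurt younger players more.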
The season is less than 50 games old for most teams in the New York-Penn and Northwest Leagues, which makes it a little premature to start analyzing players’ stats. But just for kicks, here’s a look at what KATOH says about this year’s crop of players with at least 100 plate appearances through July 28th. The full list of players can be found here, and you’ll find an excerpt of those who broke the 40% barrier below:
Player | Organization | Age | MLB Probability |
---|---|---|---|
Rowan Wick | STL | 21 | 82% |
Eduard Pinto | TEX | 19 | 68% |
Marcus Greene | TEX | 19 | 60% |
Mauricio Dubon | BOS | 19 | 59% |
Franklin Barreto | TOR | 18 | 57% |
Christian Arroyo | SFG | 19 | 57% |
Skyler Ewing | SFG | 21 | 56% |
Taylor Gushue | PIT | 20 | 55% |
Domingo Leyba | DET | 18 | 55% |
Raudy Read | WSN | 20 | 53% |
Nick Longhi | BOS | 18 | 52% |
Andrew Reed | HOU | 21 | 52% |
Danny Mars | BOS | 20 | 51% |
Amed Rosario | NYM | 18 | 49% |
Yairo Munoz | OAK | 19 | 48% |
Seth Spivey | TEX | 21 | 47% |
Mike Gerber | DET | 21 | 47% |
Mark Zagunis | CHC | 21 | 47% |
Kevin Krause | PIT | 21 | 46% |
Leo Castillo | CLE | 20 | 45% |
Jordan Luplow | PIT | 20 | 45% |
Mason Davis | MIA | 21 | 40% |
Kevin Ross | PIT | 20 | 40% |
Franklin Navarro | DET | 19 | 40% |
As we saw with Rookie league hitters, KATOH doesn’t think any of these players are shoo-ins to make it to the majors. Even Rowan Wick, who hit a Bondsian .378/.475/.815 before getting promoted, gets just 82%. This goes to show that SS A-ball stats just aren’t all that meaningful.
Once the season’s over, I’ll re-run everything using the final 2014 stats, which will give us a better sense of which prospects had the most promising years statistically. I also plan to engineer an alternative methodology — to supplement this one — that will take into account how a player performs in the majors, rather than his just getting there. Additionally, I hope to create something similar for projecting pitchers based on their statistical performance. In the meantime, I’ll apply the KATOH model to historical prospects and highlight some of its biggest “hits” and “misses” from years past. Keep an eye out for the next post in the coming days.
Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.
Chris works in economic development by day, but spends most of his nights thinking about baseball. He writes for Pinstripe Pundits, FanGraphs and The Hardball Times. He's also on the twitter machine: @_chris_mitchell. None of the views expressed in his articles reflect those of his daytime employer.
I would have added Rob Refsnyder to this mix. He's having a heck of a season in AA/AAA.
Actually, he doesn't really apply to this article now that I've read it thoroughly. Never mind.
I covered Refsnyder in the AA and AAA articles. He graded out very well: 86% for his time in AA and 94% in AAA.
That's better than I thought. Either way, great piece. The eye test for scouts nowadays is hit or miss.
Agreed. Scouts can obviously add some insight, but I feel like statistical analysis of minor league players is really lacking, especially compared to all that’s out there for MLB players.
I’m curious, Chris. Did you run any comparable tests that exclude the Baseball America rankings to gauge whether they were adding anything beyond what you could get with the stats alone? I’m unsure of the mechanics of what you’re doing, so if that’s something that’s inherent in it, I guess just disregard. From what little I can make out, it seems maybe your regression analysis accounts for that.
Would comparing players both on and off the list with or without their BA100 ranking included tell us anything different? I’m speaking of course of the higher levels where the BA100 ranking seems to have some predictive power.
I did test both with and without the BA 100 ranking included. The stats alone do a good job of predicting a player’s MLB probability, but factoring in the player’s prospect status makes the model a little more accurate.
Did you run a multiple regression? y = MLB appearance, y = f(SO rate, walk rate, etc.), and if so, what was the R-squared?