Using Short-Season A Stats to Predict Future Performance

August 3, 2014

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. So far, I’ve analyzed hitters in Rookie leagues, Low-A, High-A, Double-A and Triple-A using a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
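To make the probit idea concrete, here is a minimal Python sketch of how such a model turns a player's inputs into a probability. The coefficient values and the `mlb_probability` helper are hypothetical placeholders, not KATOH's fitted values. Only the structure matters: a weighted sum of the inputs passed through the standard normal CDF.

```python
from statistics import NormalDist

# Hypothetical coefficients for illustration only; NOT KATOH's fitted values.
COEFFS = {
    "intercept": 0.5,
    "age": -0.10,
    "strikeout_rate": -1.5,
    "iso": 2.0,
}


def mlb_probability(age, strikeout_rate, iso, coeffs=COEFFS):
    """Probit link: P(reaches MLB) = Phi(linear index of the inputs)."""
    index = (
        coeffs["intercept"]
        + coeffs["age"] * age
        + coeffs["strikeout_rate"] * strikeout_rate
        + coeffs["iso"] * iso
    )
    # The standard normal CDF squeezes any real-valued index into (0, 1).
    return NormalDist().cdf(index)


# Two otherwise identical hitters (league-relative stats of 1.0) at different ages.
young = mlb_probability(age=18, strikeout_rate=1.0, iso=1.0)
old = mlb_probability(age=22, strikeout_rate=1.0, iso=1.0)
print(f"18-year-old: {young:.2f}, 22-year-old: {old:.2f}")
```

With these made-up numbers the younger hitter grades out higher, because the assumed age coefficient is negative. In a real fit, the coefficients would be estimated from historical players rather than chosen by hand.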

For hitters in Low-A and High-A, age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America all played a role in forecasting future success. And walk rate, while not predictive for players in Rookie ball, Low-A, or High-A, added a little bit to the model for Double-A and Triple-A hitters. Today, I’ll look into what KATOH has to say about players in Short-Season A-ball. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted relative to his league’s average for that year. For those interested, here’s the R output based on all players with at least 200 plate appearances in a season in SS A-ball from 1995-2007.
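The league adjustment mentioned above could be done in several ways, and the article does not spell out which one was used. The sketch below assumes the simplest version, dividing each player's stat by the unweighted average for his league and year. The `league_adjust` function, the field names, and the sample rows are all made up for illustration.

```python
from collections import defaultdict


def league_adjust(rows, stat):
    """Return copies of rows with stat divided by the league-year average.

    Assumes a simple unweighted ratio; the actual KATOH adjustment may differ.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (league, year) -> [sum, count]
    for r in rows:
        key = (r["league"], r["year"])
        totals[key][0] += r[stat]
        totals[key][1] += 1

    adjusted = []
    for r in rows:
        total, n = totals[(r["league"], r["year"])]
        league_avg = total / n
        adjusted.append({**r, f"{stat}_adj": r[stat] / league_avg})
    return adjusted


# Made-up sample rows, not real player data.
rows = [
    {"player": "A", "league": "NYP", "year": 2005, "iso": 0.150},
    {"player": "B", "league": "NYP", "year": 2005, "iso": 0.050},
    {"player": "C", "league": "NWL", "year": 2005, "iso": 0.150},
    {"player": "D", "league": "NWL", "year": 2005, "iso": 0.250},
]

for r in league_adjust(rows, "iso"):
    print(r["player"], round(r["iso_adj"], 2))
```

Players A and C have the same raw ISO, but A comes out above average and C below, because their leagues hit for different amounts of power in this toy data. A weighted average (by plate appearances, say) would be a reasonable alternative.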

Just like we saw with hitters in Rookie ball, a player’s Baseball America prospect status couldn’t tell us anything about his future as a big leaguer. This was entirely due to the scarcity of top 100 prospects in the sample, as only a handful of players spent the year in SS A-ball after making BA’s top 100 list. Somewhat surprisingly, walk rate is predictive for players in SS-A, despite being statistically insignificant for hitters in Rookie ball and the more advanced A-ball levels. Another interesting wrinkle is the “Strikeout_Rate:Age” variable. Basically, this says that strikeout rate matters more for younger players than for older players at this level, although frequent strikeouts are obviously a bad thing no matter how old you are.
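One way to read an interaction term like “Strikeout_Rate:Age” is that the penalty per unit of strikeout rate is itself a function of age. The sketch below shows that arithmetic. The coefficients `B_K` and `B_K_AGE` are hypothetical, chosen only to reproduce the pattern described above, and the slope is on the probit index rather than on the probability directly.

```python
def strikeout_slope(age, b_k, b_k_age):
    """Marginal effect of strikeout rate on the probit index at a given age.

    index = ... + b_k * K + b_k_age * K * age
    so d(index)/dK = b_k + b_k_age * age
    """
    return b_k + b_k_age * age


# Hypothetical coefficients for illustration only; not taken from the KATOH output.
B_K = -6.0      # main effect of strikeout rate
B_K_AGE = 0.25  # Strikeout_Rate:Age interaction

for age in (18, 20, 22):
    print(age, strikeout_slope(age, B_K, B_K_AGE))
```

Under these assumed values the slope stays negative at every age shown, so strikeouts always hurt, but it is steepest for the 18-year-old and flattens as age rises.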

The season is less than 50 games old for most teams in the New York-Penn and Northwest Leagues, which makes it a little premature to start analyzing players’ stats. But just for kicks, here’s a look at what KATOH says about this year’s crop of players with at least 100 plate appearances through July 28th. The full list of players can be found here, and you’ll find an excerpt of those who broke the 40% barrier below:

Player	Organization	Age	MLB Probability
Rowan Wick	STL	21	82%
Eduard Pinto	TEX	19	68%
Marcus Greene	TEX	19	60%
Mauricio Dubon	BOS	19	59%
Franklin Barreto	TOR	18	57%
Christian Arroyo	SFG	19	57%
Skyler Ewing	SFG	21	56%
Taylor Gushue	PIT	20	55%
Domingo Leyba	DET	18	55%
Raudy Read	WSN	20	53%
Nick Longhi	BOS	18	52%
Andrew Reed	HOU	21	52%
Danny Mars	BOS	20	51%
Amed Rosario	NYM	18	49%
Yairo Munoz	OAK	19	48%
Seth Spivey	TEX	21	47%
Mike Gerber	DET	21	47%
Mark Zagunis	CHC	21	47%
Kevin Krause	PIT	21	46%
Leo Castillo	CLE	20	45%
Jordan Luplow	PIT	20	45%
Mason Davis	MIA	21	40%
Kevin Ross	PIT	20	40%
Franklin Navarro	DET	19	40%

As we saw with Rookie league hitters, KATOH doesn’t think any of these players are shoo-ins to make it to the majors. Even Rowan Wick, who hit a Bondsian .378/.475/.815 before getting promoted, gets just 82%. This goes to show that SS A-ball stats just aren’t all that meaningful.

Once the season’s over, I’ll re-run everything using the final 2014 stats, which will give us a better sense of which prospects had the most promising years statistically. I also plan to engineer an alternative methodology — to supplement this one — that will take into account how a player performs in the majors, rather than his just getting there. Additionally, I hope to create something similar for projecting pitchers based on their statistical performance. In the meantime, I’ll apply the KATOH model to historical prospects and highlight some of its biggest “hits” and “misses” from years past. Keep an eye out for the next post in the coming days.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.

8 Comments

nicknowsky

10 years ago

I would have added Rob Refsnyder to this mix; he’s having a heck of a season in AA/AAA.

Reply to nicknowsky

Actually, he doesn’t really apply to this article now that I’ve read it thoroughly. Never mind.

Chris Mitchell

I covered Refsnyder in the AA and AAA articles. He graded out very well: 86% for his time in AA and 94% in AAA.

Reply to Chris Mitchell

That’s better than I thought. Either way, great piece. The eye test for scouts nowadays is hit or miss.

Agreed. Scouts can obviously add some insight, but I feel like statistical analysis of minor league players is really lacking, especially compared to all that’s out there for MLB players.

Nathaniel Dawson

I’m curious, Chris. Did you run any comparable tests that exclude the Baseball America rankings to gauge whether they were adding anything beyond what you could get with the stats alone? I’m unsure of the mechanics of what you’re doing, so if that’s something that’s inherent within what you’re doing, I guess just disregard. From what little I can make out of what you’re doing, it seems maybe your regression analysis accounts for that.

Would comparing players both on and off the list with or without their BA100 ranking included tell us anything different? I’m speaking of course of the higher levels where the BA100 ranking seems to have some predictive power.

Reply to Nathaniel Dawson

I did test both with and without the BA 100 ranking included. The stats alone do a good job of predicting a player’s MLB probability, but factoring in the player’s prospect status makes the model a little more accurate.

Clowery

Did you run a multiple regression? y=MLB appearance, y=f(SO rate, Walk Rate, etc.) and if so, what was the r-squared?
