Applying KATOH to Historical Prospects

Over the last few weeks, I have written a series of posts looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I analyzed hitters in Rookie leagues, Short-Season A, Low-A, High-A, Double-A, and Triple-A using a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
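For readers who want to see the mechanics, here is a minimal Python sketch of how a probit model turns inputs into a probability: the inputs are combined into a single index, and the standard normal CDF maps that index to a value between 0 and 1. The coefficients, intercept, and feature names below are made up for illustration and are not KATOH’s actual estimates.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept: float, coefs: dict, features: dict) -> float:
    """P(outcome = 1) = Phi(intercept + sum of coef * feature)."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return normal_cdf(z)


# Hypothetical coefficients, invented for illustration only (not KATOH's).
coefs = {"age": -0.30, "k_rate": -4.0, "iso": 5.0, "top_100": 1.2}
intercept = 6.0

# Two otherwise identical hypothetical hitters; only prospect status differs.
prospect = {"age": 20, "k_rate": 0.18, "iso": 0.200, "top_100": 1}
non_prospect = dict(prospect, top_100=0)

print(round(probit_probability(intercept, coefs, prospect), 3))
print(round(probit_probability(intercept, coefs, non_prospect), 3))
```

Because the normal CDF flattens out at the extremes, the same bump in the index moves a 50% player far more than it moves a 99% player.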

After receiving a few requests, I decided to apply the model to players of years past. In what follows, I dive into what KATOH would have said about recent top prospects, look at the highest KATOH scores of the last 20 years, and highlight some instances where KATOH missed the boat on a prospect. If you’re feeling really ambitious, here’s a giant Google doc of KATOH scores for all 40,051 player seasons since 1995 (minimum 100 plate appearances in a short-season league or 200 in full-season ball).

Before I delve into the parade of lists, I want to point out one disclaimer to what I’m doing here. KATOH was derived from the performances of historical players, so applying the model to those same players might make it look a little better than it is. Take a player like Jason Stokes, for example. Although he was a very well-regarded prospect in the early 2000s (#15 and #51 per Baseball America in 2003 and 2004), KATOH consistently gave him probabilities in the 70s and 80s. But part of that is likely because Stokes’ data points were incorporated into the model. If I had created KATOH in 2005, Stokes’ MLB% might have been a few percentage points higher. Even so, a few data points generally aren’t enough to substantially change a model that incorporates thousands. In other words, it’s probably safe to assume that a player’s MLB% using today’s KATOH is roughly in line with what he would have received at the time.

Now, onto the results. Here’s what KATOH thought about some of the most recent top 100 prospects:

2013 Top 100 Prospects

Player Year Age Level MLB Probability
Xander Bogaerts 2013 20 AA 99.888%
Xander Bogaerts 2013 20 AAA 99.869%
George Springer 2013 23 AAA 99.816%
Gregory Polanco 2013 21 AA 99.614%
Nick Castellanos 2013 21 AAA 99.608%
Kolten Wong 2013 22 AAA 99.428%
Wil Myers 2013 22 AAA 99.418%
Miguel Sano 2013 20 A+ 99.335%
Tyler Austin 2013 21 AA 99.194%
Jackie Bradley 2013 23 AAA 99.079%
Kaleb Cowart 2013 21 AA 99%
Byron Buxton 2013 19 A+ 98%
Francisco Lindor 2013 19 A+ 98%
Christian Yelich 2013 21 AA 97%
Byron Buxton 2013 19 A 97%
Addison Russell 2013 19 A+ 97%
Billy Hamilton 2013 22 AAA 96%
Brian Goodwin 2013 22 AA 96%
Carlos Correa 2013 18 A 96%
Slade Heathcott 2013 22 AA 96%
Javier Baez 2013 20 A+ 95%
Jake Marisnick 2013 22 AA 95%
Albert Almora 2013 19 A 95%
Jonathan Singleton 2013 21 AAA 94%
Mike Zunino 2013 22 AAA 94%
Alen Hanson 2013 20 A+ 94%
Gregory Polanco 2013 21 A+ 92%
Javier Baez 2013 20 AA 91%
Jorge Soler 2013 21 A+ 90%
Gary Sanchez 2013 20 A+ 89%
Austin Hedges 2013 20 A+ 89%
Mike Olt 2013 24 AAA 87%
Miguel Sano 2013 20 AA 83%
George Springer 2013 23 AA 82%
Mason Williams 2013 21 A+ 78%
Trevor Story 2013 20 A+ 61%
Bubba Starling 2013 20 A 61%
Courtney Hawkins 2013 19 A+ 58%
Roman Quinn 2013 20 A 58%

2012 Top 100 Prospects

Player Year Age Level MLB Probability
Jurickson Profar 2012 19 AA 99.975%
Anthony Rizzo 2012 22 AAA 99.947%
Manny Machado 2012 19 AA 99.937%
Billy Hamilton 2012 21 AA 99.856%
Oscar Taveras 2012 20 AA 99.827%
Kolten Wong 2012 21 AA 99.824%
Nolan Arenado 2012 21 AA 99.759%
Leonys Martin 2012 24 AAA 99.737%
Nick Franklin 2012 21 AA 99.737%
Yasmani Grandal 2012 23 AAA 99.714%
Wil Myers 2012 21 AAA 99.659%
Andrelton Simmons 2012 22 AA 99.566%
Travis D’Arnaud 2012 23 AAA 99.512%
Jedd Gyorko 2012 23 AAA 99.493%
Hak-Ju Lee 2012 21 AA 99.492%
Jonathan Singleton 2012 20 AA 99.482%
Nick Castellanos 2012 20 AA 99.465%
Jonathan Schoop 2012 20 AA 99.443%
Jean Segura 2012 22 AA 99.423%
Nick Castellanos 2012 20 A+ 99.051%
Starling Marte 2012 23 AAA 99.015%
Anthony Gose 2012 21 AAA 99%
Rymer Liriano 2012 21 AA 99%
Jake Marisnick 2012 21 AA 99%
Xander Bogaerts 2012 19 A+ 98%
Michael Choice 2012 22 AA 98%
Gary Brown 2012 23 AA 98%
Christian Yelich 2012 20 A+ 98%
Nick Franklin 2012 21 AAA 97%
Javier Baez 2012 19 A 97%
Brett Jackson 2012 23 AAA 96%
Zack Cox 2012 23 AAA 92%
Mason Williams 2012 20 A 91%
Gary Sanchez 2012 19 A 89%
Jake Marisnick 2012 21 A+ 88%
Francisco Lindor 2012 18 A 88%
Cheslor Cuthbert 2012 19 A+ 87%
Miguel Sano 2012 19 A 86%
Billy Hamilton 2012 21 A+ 83%
George Springer 2012 22 A+ 80%
Christian Villanueva 2012 21 A+ 80%
Mike Olt 2012 23 AA 79%
Matt Szczur 2012 22 A+ 78%
Rymer Liriano 2012 21 A+ 76%
Blake Swihart 2012 20 A 66%
Cory Spangenberg 2012 21 A+ 64%
Bubba Starling 2012 19 R 17%

2011 Top 100 Prospects

Player Year Age Level MLB Probability
Mike Trout 2011 19 AA 99.973%
Brett Lawrie 2011 21 AAA 99.969%
Anthony Rizzo 2011 21 AAA 99.911%
Wil Myers 2011 20 AA 99.654%
Christian Colon 2011 22 AA 99.495%
Brandon Belt 2011 23 AAA 99.414%
Austin Romine 2011 22 AA 99.393%
Jesus Montero 2011 21 AAA 99.379%
Devin Mesoraco 2011 23 AAA 99.205%
Brett Jackson 2011 22 AAA 99.199%
Dustin Ackley 2011 23 AAA 99.196%
Yonder Alonso 2011 24 AAA 99%
Lonnie Chisenhall 2011 22 AAA 99%
Zack Cox 2011 22 AA 98%
Jason Kipnis 2011 24 AAA 98%
Mike Moustakas 2011 22 AAA 98%
Desmond Jennings 2011 24 AAA 98%
Jonathan Villar 2011 20 AA 98%
Matt Dominguez 2011 21 AAA 98%
Jurickson Profar 2011 18 A 97%
Bryce Harper 2011 18 A 97%
Tony Sanchez 2011 23 AA 97%
Dee Gordon 2011 23 AAA 97%
Grant Green 2011 23 AA 97%
Manny Machado 2011 18 A+ 97%
Nolan Arenado 2011 20 A+ 96%
Chris Carter 2011 24 AAA 96%
Travis D’Arnaud 2011 22 AA 96%
Wilmer Flores 2011 19 A+ 95%
Jose Iglesias 2011 21 AAA 95%
Hak-Ju Lee 2011 20 A+ 94%
Brett Jackson 2011 22 AA 93%
Jonathan Singleton 2011 19 A+ 92%
Joe Benson 2011 23 AA 91%
Gary Sanchez 2011 18 A 86%
Wilin Rosario 2011 22 AA 86%
Nick Castellanos 2011 19 A 85%
Nick Franklin 2011 20 A+ 83%
Jean Segura 2011 21 A+ 82%
Cesar Puello 2011 20 A+ 82%
Derek Norris 2011 22 AA 76%
Jonathan Villar 2011 20 A+ 73%
Aaron Hicks 2011 21 A+ 68%
Billy Hamilton 2011 20 A 61%
Miguel Sano 2011 18 R 44%
Josh Sale 2011 19 R 15%

Next, let’s take a look at some of the highest KATOH scores of all time, namely those who received a score of at least 99.9%. There aren’t any complete busts among these players, as virtually all of them went on to play in the majors.

All-Time Top KATOH Scores

Player Year Age Level MLB Probability
Sean Burroughs 2000 19 AA 99.998%
Luis Castillo 1996 20 AA 99.995%
Fernando Martinez 2007 18 AA 99.994%
Daric Barton 2005 19 AA 99.992%
Alex Rodriguez 1995 19 AAA 99.992%
Carl Crawford 2001 19 AA 99.992%
Elvis Andrus 2008 19 AA 99.992%
Adam Dunn 2001 21 AAA 99.990%
Joe Mauer 2003 20 AA 99.989%
Ryan Sweeney 2005 20 AA 99.984%
Nick Johnson 1999 20 AA 99.984%
Jose Tabata 2009 20 AA 99.983%
Jose Tabata 2008 19 AA 99.983%
Travis Snider 2009 21 AAA 99.981%
Joaquin Arias 2005 20 AA 99.980%
Matt Kemp 2006 21 AAA 99.979%
Jose Reyes 2002 19 AA 99.979%
Jurickson Profar 2012 19 AA 99.975%
Mike Trout 2011 19 AA 99.973%
Jay Bruce 2008 21 AAA 99.971%
Brett Lawrie 2011 21 AAA 99.969%
B.J. Upton 2004 19 AAA 99.959%
Howie Kendrick 2006 22 AAA 99.951%
Ryan Howard 2005 25 AAA 99.951%
Dioner Navarro 2004 20 AA 99.950%
Luis Rivas 1999 19 AA 99.949%
Lastings Milledge 2005 20 AA 99.948%
Anthony Rizzo 2012 22 AAA 99.947%
Billy Butler 2006 20 AA 99.946%
Fernando Martinez 2008 19 AA 99.944%
Alberto Callaspo 2004 21 AA 99.944%
Jose Lopez 2003 19 AA 99.939%
Freddie Freeman 2010 20 AAA 99.939%
Manny Machado 2012 19 AA 99.937%
Rickie Weeks 2005 22 AAA 99.935%
Casey Kotchman 2004 21 AAA 99.932%
Eric Chavez 1998 20 AAA 99.930%
Adrian Beltre 1998 19 AA 99.927%
Shannon Stewart 1995 21 AA 99.917%
Anthony Rizzo 2011 21 AAA 99.911%
Karim Garcia 1995 19 AAA 99.910%
Jay Bruce 2007 20 AAA 99.907%
Jeff Clement 2008 24 AAA 99.902%
Miguel Cabrera 2003 20 AA 99.900%

All of the players who registered a KATOH score of at least 99.9% did so while playing in either Double- or Triple-A. This isn’t all that surprising since these are the levels closest to the big leagues. But what about the lower levels? Like we saw in Double- and Triple-A, there weren’t any complete busts among the highest ranking hitters from full-season A-ball. For both full-season A-ball levels, each of the 20 top ranked players has either made it to the majors or, in the case of Carlos Correa, is young enough to still have an excellent chance to do so. But on the bottom two rungs of the minor league ladder, we come across a few instances where KATOH whiffed, most notably in Garrett Guzman (74%), Richard Stuart (72%), and Pat Manning (72%).

Top KATOH Scores for Seasons in High-A

Player Year Age Level MLB Probability
Adrian Beltre 1997 18 A+ 99.863%
Andruw Jones 1996 19 A+ 99.568%
Giancarlo Stanton 2009 19 A+ 99.405%
Billy Butler 2005 19 A+ 99.348%
Miguel Sano 2013 20 A+ 99.335%
Chris Snelling 2001 19 A+ 99.241%
Jason Heyward 2009 19 A+ 99.097%
Andy LaRoche 2005 21 A+ 99.091%
Wilmer Flores 2010 18 A+ 99.075%
Nick Castellanos 2012 20 A+ 99.051%
Jose Reyes 2002 19 A+ 99%
Casey Kotchman 2003 20 A+ 99%
Vernon Wells 1999 20 A+ 99%
Travis Lee 1997 22 A+ 99%
Brandon Wood 2005 20 A+ 98%
Xander Bogaerts 2012 19 A+ 98%
Justin Huber 2003 20 A+ 98%
Aramis Ramirez 1997 19 A+ 98%
Jay Bruce 2007 20 A+ 98%
Byron Buxton 2013 19 A+ 98%

Top KATOH Scores for Seasons in Low-A

Player Year Age Level MLB Probability
Mike Trout 2010 18 A 99%
Adrian Beltre 1996 17 A 98%
Jurickson Profar 2011 18 A 97%
Bryce Harper 2011 18 A 97%
Sean Burroughs 1999 18 A 97%
Andruw Jones 1995 18 A 97%
Byron Buxton 2013 19 A 97%
Jason Heyward 2008 18 A 97%
Corey Patterson 1999 19 A 97%
Vladimir Guerrero 1995 20 A 97%
Javier Baez 2012 19 A 97%
Ian Stewart 2004 19 A 96%
Lastings Milledge 2004 19 A 96%
Carlos Correa 2013 18 A 96%
Prince Fielder 2003 19 A 96%
Delmon Young 2004 18 A 96%
Josh Vitters 2009 19 A 96%
Chad Hermansen 1996 18 A 95%
Wilmer Flores 2010 18 A 95%
B.J. Upton 2003 18 A 95%

Top KATOH Scores for Seasons in Short-Season A

Player Year Age Level MLB Probability Played in Majors
Chris Snelling 1999 17 A- 82% 1
Richard Stuart 1996 19 A- 72% 0
Aramis Ramirez 1996 18 A- 71% 1
Ryan Kalish 2007 19 A- 71% 1
Cory Spangenberg 2011 20 A- 66% 0
Hanley Ramirez 2002 18 A- 66% 1
Wilson Betemit 2000 18 A- 65% 1
Ismael Castro 2002 18 A- 65% 0
Vernon Wells 1997 18 A- 64% 1
Carlos Figueroa 2000 17 A- 61% 0
Carson Kelly 2013 18 A- 61% 0
Pablo Sandoval 2005 18 A- 60% 1
Dan Vogelbach 2012 19 A- 59% 0
Manny Ravelo 2000 18 A- 57% 0
Chip Ambres 1999 19 A- 57% 1
Maikel Franco 2011 18 A- 55% 0
Jurickson Profar 2010 17 A- 55% 1
Derek Norris 2008 19 A- 54% 1
Cesar Saba 1999 17 A- 54% 0
Edinson Rincon 2009 18 A- 52% 0

Top KATOH Scores for Seasons in Rookie ball

Player Year Age Level MLB Probability Played in Majors
Jeff Bianchi 2005 18 R 76% 1
Justin Morneau 2000 19 R 74% 1
Addison Russell 2012 18 R 74% 0
Garrett Guzman 2001 18 R 74% 0
James Loney 2002 18 R 74% 1
Prince Fielder 2002 18 R 73% 1
Pat Manning 1999 19 R 72% 0
Wilmer Flores 2008 16 R 70% 1
Alex Fernandez 1998 17 R 70% 0
Dorssys Paulino 2012 17 R 69% 0
Tony Blanco 2000 18 R 69% 1
Hank Blalock 1999 18 R 69% 1
Joe Mauer 2001 18 R 69% 1
Hanley Ramirez 2002 18 R 69% 1
Ramon Hernandez 1995 19 R 68% 1
Angel Salome 2005 19 R 68% 1
Marcos Vechionacci 2004 17 R 67% 0
Gary Sanchez 2010 17 R 66% 0
Scott Heard 2000 18 R 65% 0
Jose Tabata 2005 16 R 65% 1

Now for KATOH’s biggest whiffs. Looking at seasons prior to 2011, the following players had very high KATOH ratings, but never made it to baseball’s highest level. The biggest miss was Cesar King, a defensive-minded catcher from the Rangers organization. Though to KATOH’s credit, King did spend five days on the Kansas City Royals’ roster in 2001 without getting into a game. Following King are a couple of busted Yankees prospects in Jackson Melian and Eric Duncan. Not to make excuses for KATOH, but these guys’ high scores may have had something to do with the way the Yankees over-hyped their prospects back then. If those two weren’t on Baseball America’s top 100 list, KATOH would have pegged them in the 70s, rather than in the high 90s.

KATOH’s Biggest Misses

Player Year Age Level MLB Probability
Cesar King 1998 20 AA 99.427%
Jackson Melian 2000 20 AA 99%
Eric Duncan 2005 20 AA 98%
Matt Moses 2006 21 AA 98%
Juan Williams 1995 21 AA 98%
Jeff Natale 2005 22 AA 97%
Eric Duncan 2006 21 AA 97%
Nick Weglarz 2010 22 AAA 96%
Nick Weglarz 2009 21 AA 96%
Tony Mota 1999 21 AA 95%
Micah Franklin 1998 26 AAA 94%
Billy Martin 2003 27 AAA 94%
Bill McCarthy 2004 24 AAA 94%
Jackson Melian 1999 19 A+ 94%
Tagg Bozied 2004 24 AAA 94%
Kevin Grijak 1995 23 AAA 93%
Angel Villalona 2008 17 A 93%
Danny Dorn 2010 25 AAA 93%
Nic Jackson 2003 23 AAA 92%
Pat Cline 1997 22 AA 92%
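The selection behind a list like this can be sketched in a few lines of Python: keep seasons before a cutoff year (presumably so that recent players still have time to debut), keep only players who never appeared in the majors, and sort by KATOH score. The `PlayerSeason` record and the `biggest_misses` helper are hypothetical names used only for this sketch; the sample rows are taken from the tables in this post.

```python
from typing import List, NamedTuple


class PlayerSeason(NamedTuple):
    player: str
    year: int
    mlb_probability: float   # KATOH score as a fraction, e.g. 0.99427
    played_in_majors: bool


def biggest_misses(seasons: List[PlayerSeason],
                   before_year: int = 2011,
                   top_n: int = 20) -> List[PlayerSeason]:
    """Highest-scoring seasons before `before_year` by players who never reached MLB."""
    misses = [s for s in seasons
              if s.year < before_year and not s.played_in_majors]
    return sorted(misses, key=lambda s: s.mlb_probability, reverse=True)[:top_n]


# A handful of rows from the tables in this post.
sample = [
    PlayerSeason("Sean Burroughs", 2000, 0.99998, True),
    PlayerSeason("Cesar King", 1998, 0.99427, False),
    PlayerSeason("Jackson Melian", 2000, 0.99, False),
    PlayerSeason("Joe Mauer", 2003, 0.99989, True),
    PlayerSeason("Eric Duncan", 2005, 0.98, False),
]

for s in biggest_misses(sample, top_n=3):
    print(f"{s.player} {s.year} {s.mlb_probability:.3%}")
```

Flipping the filter to players who did reach the majors and sorting in ascending order would give the mirror-image list of lowest scores among eventual big leaguers.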

And here are the major leaguers whom KATOH deemed least likely to make it when they were in the minors. It’s worth noting that a couple of them, Jorge Sosa and Jason Roach, made it as pitchers.

Worst KATOH Scores Who Made it to the Majors

Player Year Age Level MLB Probability
Justin Christian 2004 24 A- 0.017%
Jorge Sosa 1999 21 A- 0.027%
Tyler Graham 2006 22 A- 0.087%
Gary Johnson 1999 23 A- 0.136%
Bo Hart 1999 22 A- 0.155%
Tommy Manzella 2005 22 A- 0.181%
Michael Martinez 2006 23 A- 0.185%
Eddy Rodriguez 2012 26 A+ 0.194%
Kevin Mahar 2004 23 A- 0.215%
Will Venable 2005 22 A- 0.232%
Brent Dlugach 2004 21 A- 0.268%
Sean Barker 2002 22 A- 0.270%
Steve Holm 2002 22 A- 0.301%
Edgar V. Gonzalez 2000 22 A- 0.315%
Peter Zoccolillo 1999 22 A- 0.328%
Konrad Schmidt 2007 22 A- 0.337%
Tommy Medica 2010 22 A- 0.365%
Brian Esposito 2008 29 AA 0.392%
Jason Roach 1997 21 A- 0.396%
Jorge Sosa 2000 22 A- 0.439%

KATOH’s far from perfect, but overall, I think it does a pretty decent job of forecasting which players will make it to the majors. That being said, it’s still a work in progress, and I have a few ideas rolling around in my head to improve on the model. Furthermore, I’m working to develop something that will forecast how a minor leaguer will perform upon reaching the majors, to complement his MLB%. I’ll be dropping these new and improved KATOH projections (for both hitters and pitchers) after this year’s World Series, when we’ll all be desperate for something baseball-related to get us through the winter.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.


Using Short-Season A Stats to Predict Future Performance

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. So far, I’ve analyzed hitters in Rookie leagues, Low-A, High-A, Double-A, and Triple-A using a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.

For hitters in Low-A and High-A, age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America all played a role in forecasting future success. And walk rate, while not predictive for players in Rookie ball, Low-A, or High-A, added a little bit to the model for Double-A and Triple-A hitters. Today, I’ll look into what KATOH has to say about players in Short-Season A-ball. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted relative to his league’s average for that year. For those interested, here’s the R output based on all players with at least 200 plate appearances in a season in SS A-ball from 1995-2007.

[Figure: R output for the Short-Season A model]

Just like we saw with hitters in Rookie ball, a player’s Baseball America prospect status couldn’t tell us anything about his future as a big leaguer. This was entirely due to the scarcity of top 100 prospects in the sample, as only a handful of players spent the year in SS A-ball after making BA’s top 100 list. Somewhat surprisingly, walk rate is predictive for players in SS-A, despite being statistically insignificant for hitters in Rookie ball and the more advanced A-ball levels. Another interesting wrinkle is the “Strikeout_Rate:Age” variable. Basically, this says that strikeout rate matters more for younger players than for older players at this level, although frequent strikeouts are obviously a bad thing no matter how old you are:

[Figure: R plot of the strikeout rate and age interaction]
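To make the interaction concrete, here is a small Python sketch of how a “Strikeout_Rate:Age” term changes the weight on strikeout rate as a player gets older. The two coefficients are hypothetical values chosen only so that the pattern matches the description above (a penalty that is always negative but larger for younger players); they are not KATOH’s estimates.

```python
# Hypothetical coefficients for illustration only; not KATOH's estimates.
B_K_RATE = -12.0      # main effect of strikeout rate
B_K_RATE_AGE = 0.4    # Strikeout_Rate:Age interaction


def k_rate_slope(age: float) -> float:
    """Change in the probit index per unit of strikeout rate at a given age."""
    return B_K_RATE + B_K_RATE_AGE * age


for age in (18, 20, 22):
    print(age, k_rate_slope(age))
```

With these made-up numbers the slope stays below zero at every realistic age for the level, so strikeouts always hurt, but they hurt an 18-year-old more than a 22-year-old.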

The season is less than 50 games old for most teams in the New York-Penn and Northwest Leagues, which makes it a little premature to start analyzing players’ stats. But just for kicks, here’s a look at what KATOH says about this year’s crop of players with at least 100 plate appearances through July 28th. The full list of players can be found here, and you’ll find an excerpt of those who broke the 40% barrier below:

Player Organization Age MLB Probability
Rowan Wick STL 21 82%
Eduard Pinto TEX 19 68%
Marcus Greene TEX 19 60%
Mauricio Dubon BOS 19 59%
Franklin Barreto TOR 18 57%
Christian Arroyo SFG 19 57%
Skyler Ewing SFG 21 56%
Taylor Gushue PIT 20 55%
Domingo Leyba DET 18 55%
Raudy Read WSN 20 53%
Nick Longhi BOS 18 52%
Andrew Reed HOU 21 52%
Danny Mars BOS 20 51%
Amed Rosario NYM 18 49%
Yairo Munoz OAK 19 48%
Seth Spivey TEX 21 47%
Mike Gerber DET 21 47%
Mark Zagunis CHC 21 47%
Kevin Krause PIT 21 46%
Leo Castillo CLE 20 45%
Jordan Luplow PIT 20 45%
Mason Davis MIA 21 40%
Kevin Ross PIT 20 40%
Franklin Navarro DET 19 40%

As we saw with Rookie league hitters, KATOH doesn’t think any of these players are shoo-ins to make it to the majors. Even Rowan Wick, who hit a Bondsian .378/.475/.815 before getting promoted, gets just 82%. This goes to show that SS A-ball stats just aren’t all that meaningful.

Once the season’s over, I’ll re-run everything using the final 2014 stats, which will give us a better sense of which prospects had the most promising years statistically. I also plan to engineer an alternative methodology — to supplement this one — that will take into account how a player performs in the majors, rather than his just getting there. Additionally, I hope to create something similar for projecting pitchers based on their statistical performance. In the meantime, I’ll apply the KATOH model to historical prospects and highlight some of its biggest “hits” and “misses” from years past. Keep an eye out for the next post in the coming days.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.


Using Rookie League Stats to Predict Future Performance

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there. In the future, I plan to engineer an alternative methodology to go along with this one, that takes into account how a player performs in the majors, rather than his just getting there.

For hitters in Low-A and High-A, age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America all played a role in forecasting future success. And walk rate, while not predictive for players in A-ball, added a little bit to the model for Double-A and Triple-A hitters. Today, I’ll look into what KATOH has to say about players in Rookie leagues. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted relative to his league’s average for that year. For those interested, here’s the R output based on all players with at least 200 plate appearances in a season in Rookie ball from 1995-2007.

[Figure: R output for the Rookie ball model]

Just like we saw with hitters in the A-ball leagues, a player’s walk rate is not at all predictive of whether or not he’ll crack the majors. Unlike all of the other levels I’ve looked at so far, a player’s Baseball America prospect status couldn’t tell us anything about his future as a big-leaguer. This was entirely due to the scarcity of top-100 prospects in the sample, as only a handful of players spent the year in rookie ball after making BA’s top-100 list.
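The league adjustment mentioned earlier isn’t spelled out in the post, so the Python sketch below shows just one plausible way to do it: expressing each raw stat as a ratio to the league average for that year. The ratio approach, the league names, and the averages are all assumptions made for illustration.

```python
# Hypothetical league-average ISO values, keyed by (league, year).
LEAGUE_AVG_ISO = {
    ("Hitter-Friendly League", 2005): 0.150,
    ("Pitcher-Friendly League", 2005): 0.100,
}


def league_adjusted(stat: float, league: str, year: int,
                    league_averages: dict) -> float:
    """Express a raw stat as a ratio to the league average for that year.

    This ratio method is an assumption; the post does not say how the
    adjustment was actually done.
    """
    return stat / league_averages[(league, year)]


raw_iso = 0.150
for league in ("Hitter-Friendly League", "Pitcher-Friendly League"):
    adjusted = league_adjusted(raw_iso, league, 2005, LEAGUE_AVG_ISO)
    print(f"{league}: {adjusted:.2f}")
```

Under this assumption, the same raw .150 ISO counts as league average in the hitter-friendly league and as 50% better than average in the pitcher-friendly one.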

The season is less than 40 games old for most rookie league teams, which makes it a little premature to start analyzing players’ stats. But just for kicks, here’s a look at what KATOH says about this year’s crop of rookie-ballers with at least 80 plate appearances through July 28th. This only considers players in the American rookie leagues (the Appalachian, Arizona, Gulf Coast, and Pioneer Leagues), meaning it excludes the Dominican and Venezuelan Summer Leagues. The full list of players can be found here, and you’ll find an excerpt of those who broke the 40% barrier below:

Player Organization Age MLB Probability
Kevin Padlo COL 17 73%
Bobby Bradley CLE 18 67%
Alex Verdugo LAD 18 65%
Luke Dykstra ATL 18 64%
Yu-Cheng Chang CLE 18 59%
Magneuris Sierra STL 18 56%
Juan Santana HOU 19 54%
Joshua Morgan TEX 18 50%
Jason Martin HOU 18 49%
Edmundo Sosa STL 18 48%
Oliver Caraballo TEX 19 46%
Sthervin Matos MIL 20 46%
Alexander Palma NYY 18 45%
Eloy Jimenez CHC 17 45%
Javier Guerra BOS 18 44%
Zach Shepherd DET 18 44%
Tito Polo PIT 19 44%
Jose Godoy STL 19 43%
Henry Castillo ARI 19 42%
David Gonzalez DET 20 42%
Dan Jansen TOR 19 42%
Max George COL 18 42%
Gleyber Torres CHC 17 42%
Luis Guzman WSN 18 41%
Jose Martinez KCR 17 41%
Alex Jackson SEA 18 40%
Emmanuel Tapia CLE 18 40%

What stands out most is that KATOH doesn’t think any of these players are shoo-ins to make it to the majors. Even those who are hitting the snot out of the ball get probabilities that fall short of what we saw for unremarkable performances in Double-A. Kevin Padlo, for example, gets just 73%, despite hitting a ridiculous .317/.463/.619 as a 17-year-old. It’s hard to do much better than that. I think this really speaks to how little rookie ball stats matter in the grand scheme of things. A good offensive showing is obviously better than a poor one, but numbers from this level need to be taken with a huge grain of salt. A hitter’s performance against pitchers who are fresh out of high school just can’t tell us much about how he’ll fare when matched up against more advanced pitching at the higher levels.

Next up, I’ll complete the series by looking at stats from short-season A-ball. Teams at that level are also only a few weeks into their season, but at the very least, it will be interesting to see how KATOH feels about SS A-ballers in general. Next week, I’ll apply the KATOH model to historical prospects and highlight some of its biggest “hits” and “misses” from the past.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.


Using Triple-A Stats to Predict Future Performance

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there. This hypothesis may be less true for players at the Triple-A level since such a high proportion of these players make it to the majors, but I still think it provides some insight. To address this issue, I plan to engineer an alternative methodology that takes into account how a player performs in the majors, rather than his just getting there.

For hitters in Low-A and High-A, age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America all played a role in forecasting future success. And walk rate, while not predictive for players in A-ball, added a little bit to the model for Double-A hitters. Today, I’ll look into what KATOH has to say about players in Triple-A leagues. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted relative to his league’s average for that year. I also only considered what happened during or after the sample season. So if a former big leaguer spends the full season in Triple-A, he’s only considered to have “made it to the majors” if he resurfaces again. For those interested, here’s the R output based on all players with at least 400 plate appearances in a season in Triple-A from 1995-2011.

[Figure: R output for the Triple-A model]

This output looks pretty similar to what we saw for Double-A hitters, including the “I(Age^2)” coefficient, which adds a bit of nuance into how a player’s age can predict his future success. But in this version, there’s also an interaction between ISO and age. Basically, this says that the ability to hit for power is much more important for older players than younger players at the Triple-A level.

[Figure: R plot of the ISO and age interaction]
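The outcome definition described earlier for Triple-A hitters (only major league time during or after the sample season counts) can be sketched in Python as follows. The function name and the example players are hypothetical; only the rule itself comes from the post.

```python
from typing import Iterable


def made_majors(sample_year: int, mlb_years: Iterable[int]) -> bool:
    """True if the player appeared in MLB during or after the sample season."""
    return any(year >= sample_year for year in mlb_years)


# Hypothetical players, each with a 2006 Triple-A sample season.
print(made_majors(2006, [2003, 2004]))   # former big leaguer who never resurfaced
print(made_majors(2006, [2004, 2007]))   # former big leaguer who resurfaced in 2007
print(made_majors(2006, []))             # never reached the majors
```

Labeling the outcome this way keeps earlier big league time from counting as a success for a season that came after it.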

By clicking here, you can see what KATOH spits out for all players who logged at least 250 PAs in Triple-A as of July 7th. I also included a few interesting players who missed the 250 PA cutoff, including Mookie Betts, Rob Refsnyder, Ramon Flores, and Kris Bryant. Here’s an excerpt of the top players from Triple-A this year. Joc Pederson tops the charts with an impressive 99.91% probability (shown rounded to 100% below). Many of these players have already played in the majors, so these values can be interpreted as the odds that said player will play in the majors in the future.

Player Organization Age MLB Probability
Joc Pederson LAD 22 100%
Gregory Polanco PIT 22 100%
Kris Bryant CHC 22 100%
Mookie Betts BOS 21 100%
Arismendy Alcantara CHC 22 100%
Oscar Taveras STL 22 99%
Stephen Piscotty STL 23 98%
Steven Souza WSN 25 98%
Javier Baez CHC 21 98%
Maikel Franco PHI 21 97%
Taylor Lindsey LAA 22 97%
Domingo Santana HOU 21 97%
Enrique Hernandez HOU 22 96%
Chris Taylor SEA 23 95%
Jake Marisnick MIA 23 95%
Mikie Mahtook TBR 24 94%
Rob Refsnyder NYY 23 94%
Alfredo Marte ARI 25 93%
Carlos Sanchez CHW 22 93%
Nick Franklin SEA 23 93%
Ramon Flores NYY 22 92%
Ronald Torreyes HOU 21 92%
Joe Panik SFG 23 91%
Tyler Saladino CHW 24 91%
Giovanny Urshela CLE 22 90%

Now that I’ve gone through all levels of full-season ball, I’ll start at the bottom and cycle through the short-season leagues. These samples will be pretty small, but perhaps not completely useless now that those players have a few weeks’ worth of games under their belts. At the very least, it will be interesting to see what KATOH’s able to tell us about batters so far away from the big leagues, even if it’s a little premature to ask KATOH about 2014’s players.


Using Double-A Stats to Predict Future Performance

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.

Things that were predictive for players in low-A and high-A included age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America in the pre-season. However, a player’s walk rate was not significant in predicting a player’s ascension to the majors. Today, I’ll look into what KATOH has to say about players in double-A leagues. For those interested, here’s the R output based on all players with at least 400 plate appearances in a season in double-A from 1995-2010. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted relative to his league’s average for that year.

[Figure: R output for the Double-A model]

Unlike in the A-ball iterations of KATOH, a player’s double-A walk rate is predictive — albeit only slightly — of whether or not he’ll make it to the show. While walk rate is statistically significant, it still matters much less than the other stats: it takes 3 or 4 percentage points on a player’s walk rate to match what 1 percentage point of strikeout rate does to a player’s MLB probability.
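The walk rate versus strikeout rate comparison above boils down to a ratio of coefficients. Here is a minimal Python sketch of that arithmetic, using hypothetical coefficients picked so the ratio lands between 3 and 4; they are not the values from the actual R output.

```python
# Hypothetical probit coefficients for illustration only; not KATOH's estimates.
B_WALK_RATE = 2.0
B_K_RATE = -7.0


def index_contribution(walk_rate: float, k_rate: float) -> float:
    """Combined contribution of walk rate and strikeout rate to the probit index."""
    return B_WALK_RATE * walk_rate + B_K_RATE * k_rate


# Points of walk rate needed to offset one extra point of strikeout rate.
tradeoff = abs(B_K_RATE) / abs(B_WALK_RATE)

base = index_contribution(walk_rate=0.08, k_rate=0.18)
offset = index_contribution(walk_rate=0.08 + tradeoff * 0.01, k_rate=0.18 + 0.01)

print(tradeoff)
print(abs(base - offset) < 1e-9)
```

Since both stats enter the index linearly in this sketch, the trade-off is the same no matter where the player starts.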

This version is also different in that there are a couple of significant additional terms, signified by the last two coefficients in the above output. The “I(Age^2)” term adds a little bit of nuance into how a player’s age can predict his future success. The “ISO:BA.Top.100.Prospect” term, meanwhile, basically says that if you’re a top 100 prospect, hitting for power is slightly less important than it would be otherwise. Hitting for power and making Baseball America’s top 100 list both make a player much more likely to make it to the majors, but if he does both, he’s a tad less likely to make it than his power output and prospect status would suggest independently. Put another way, a few top 100 prospects hit for power in double-A but never cracked the majors, such as Jason Stokes (.241 ISO), Nick Weglarz (.204 ISO), and Eric Duncan (.173 ISO). But virtually all of the low-power top 100 guys made it, including Elvis Andrus (.073 ISO), Luis Castillo (.076 ISO), and Carl Crawford (.078 ISO). For non-top 100 guys, many more punchless hitters topped out in double-A and triple-A.
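Here is a short Python sketch of what a negative “ISO:BA.Top.100.Prospect” coefficient does to the probit index. The three coefficients are hypothetical and chosen only to reproduce the pattern described above: power helps, prospect status helps, and having both helps a little less than the two would add up to on their own.

```python
# Hypothetical coefficients for illustration only; not KATOH's estimates.
B_ISO = 6.0
B_TOP_100 = 1.5
B_ISO_TOP_100 = -3.0   # ISO:BA.Top.100.Prospect interaction


def index_contribution(iso: float, top_100: int) -> float:
    """Contribution of ISO and prospect status to the probit index."""
    return B_ISO * iso + B_TOP_100 * top_100 + B_ISO_TOP_100 * iso * top_100


def iso_slope(top_100: int) -> float:
    """Weight on ISO, which depends on whether the hitter is a top 100 prospect."""
    return B_ISO + B_ISO_TOP_100 * top_100


power_only = index_contribution(0.200, 0)
prospect_only = index_contribution(0.0, 1)
both = index_contribution(0.200, 1)

print(iso_slope(0), iso_slope(1))
print(both < power_only + prospect_only)
```

The weight on ISO stays positive for top 100 prospects in this sketch; it is just smaller than the weight for everyone else.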

By clicking here, you can see what KATOH spits out for all current prospects who logged at least 250 PAs in double-A as of July 7th, as well as a few that fell short of the cutoff, most notably Joey Gallo, Kevin Plawecki, and Robert Refsnyder. Topping the list is Mookie Betts with a probability of 99.95% (shown rounded to 100% below), and of course the prophecy was fulfilled when the Red Sox called up the 21-year-old last month. Here’s an excerpt of the top players from double-A this year:

Player Organization Age MLB Probability
Mookie Betts BOS 21 100%
Francisco Lindor CLE 20 100%
Gary Sanchez NYY 21 99%
Austin Hedges SDP 21 99%
Alen Hanson PIT 21 99%
Jorge Bonifacio KCR 21 98%
Blake Swihart BOS 22 98%
Kris Bryant CHC 22 93%
Ketel Marte SEA 20 91%
Rangel Ravelo CHW 22 90%
Robert Refsnyder NYY 23 86%
Jake Lamb ARI 23 85%
Jake Hager TBR 21 84%
Darnell Sweeney LAD 23 83%
Joey Gallo TEX 20 82%
Preston Tucker HOU 23 81%
Scott Schebler LAD 23 79%
Kevin Plawecki NYM 23 79%
Cheslor Cuthbert KCR 21 78%
Kyle Kubitza ATL 23 77%
Michael Taylor WSN 23 76%
Christian Walker BAL 23 76%
Ryan Brett TBR 22 75%

Keep an eye out for the next installment, which will dive into what KATOH says about hitters at the triple-A level.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.


Using High-A Stats to Predict Future Performance

Last week, I looked into how a player’s low-A stats — along with his age and prospect status at the time — can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
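As a rough illustration of what a probit model computes, here is a minimal Python sketch. The coefficients, intercept, and player stat line are all hypothetical and are not taken from any KATOH output. The model forms a weighted sum of the inputs and passes it through the standard normal cumulative distribution function, which always returns a value between 0 and 1.

```python
import math


def standard_normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(features, coefficients, intercept):
    """Probability of the event under a probit model: CDF(intercept + sum of coef * x)."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return standard_normal_cdf(z)


# Hypothetical coefficients and intercept, for illustration only.
coefficients = {"age": -0.25, "k_pct": -0.04, "iso": 5.0, "babip": 4.0, "top100": 1.2}
intercept = 4.0

# Hypothetical 21-year-old stat line, with and without a top 100 ranking.
player = {"age": 21, "k_pct": 18.0, "iso": 0.150, "babip": 0.320, "top100": 0}
ranked_player = dict(player, top100=1)

p_unranked = probit_probability(player, coefficients, intercept)
p_ranked = probit_probability(ranked_player, coefficients, intercept)
print(f"unranked: {p_unranked:.3f}, ranked: {p_ranked:.3f}")
```

The sign of each coefficient gives the direction of its effect: in this made-up example, age and strikeout rate pull the probability down, while ISO, BABIP, and a top 100 ranking push it up.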

Things that were predictive for players in low-A included: age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America in the pre-season. However, a player’s walk rate was not significant in predicting a player’s ascension to the majors. Today, I’ll analyze what KATOH has to say about players in class-A-advanced leagues. Here’s the R output based on all players with at least 400 plate appearances in a season in high-A from 1995-2009:

High-A Output

This looks very similar to what I found for low-A players: Walk rate isn’t significant, and everything else has very similar effects on the final probability. However, the coefficients from this model are all a tad bigger than those from the low-A version, implying that high-A stats might be a bit more telling of a player’s future. Intuitively, this makes sense: The closer a player is to the big leagues, the more his stats start to reflect his future potential.

By clicking here, you can see what KATOH spits out for all current prospects who logged at least 250 PAs in high-A as of July 7th. I also included a few notable players who fell short of the threshold, namely Joey Gallo (who checks in at a remarkable 99.8%), Peter O’Brien, and Jesse Winker. Here’s an excerpt of the top-ranking players:

Player Organization Age MLB Probability
Joey Gallo TEX 20 100%
Corey Seager LAD 20 99%
Carlos Correa HOU 19 99%
Albert Almora CHC 20 93%
Nick Williams TEX 20 93%
D.J. Peterson SEA 22 93%
Jesse Winker CIN 20 91%
Orlando Arcia MIL 19 88%
Jose Peraza ATL 20 87%
Colin Moran MIA 21 87%
Renato Nunez OAK 20 86%
Tyrone Taylor MIL 20 85%
Hunter Renfroe SDP 22 84%
Josh Bell PIT 21 84%
Raul Mondesi KCR 18 83%
Daniel Robertson OAK 20 83%
Jorge Polanco MIN 20 81%
Dilson Herrera NYM 20 77%
Breyvic Valera STL 21 77%
Peter O’Brien NYY 23 76%
Matt Olson OAK 20 75%
Jorge Alfaro TEX 21 75%
Patrick Leonard TBR 21 75%
Dalton Pompey TOR 21 73%
Billy McKinney OAK 19 73%
Teoscar Hernandez HOU 21 73%
Brandon Nimmo NYM 21 72%
Jose Rondon LAA 20 70%
Rio Ruiz HOU 20 70%
Brandon Drury ARI 21 70%

Next up will be double-A. Unlike A-ball, double-A tends to be a random mishmash of prospects and minor-league lifers, so it will be interesting to see how KATOH handles this wide array of players. And perhaps double-A is where a player’s walk rate finally starts to tell us something about his future success.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.


Using Low-A Stats to Predict Future Performance

For a piece I wrote a couple of weeks ago, I used historical minor league stats to construct a model that predicts how likely it is that a teenager in A-ball will make it to the major leagues. While this method produced some interesting results, it also had some flaws, most notably that it didn’t take scouting or defense into account. This basically meant that a great defensive player (or a raw, toolsy player) could easily get an undeservedly low rating if he had a poor year at the plate. Another drawback was that it only applied to teenaged players in low-A, who represent a pretty small portion of players at the level, and just a sliver of the prospect population.

With these shortcomings in mind, I’ve taken another stab at predicting which players from the South Atlantic and Midwest leagues are most and least likely to make it to the show. Like last time, I ran a probit regression, which tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. But instead of limiting my analysis to players under the age of 20, I considered all players and included age as a variable in my model. I also attempted to quantify scouting by taking into account whether or not a player made Baseball America’s pre-season prospect rankings. The model still relies heavily on offensive performance, but isn’t entirely guilty of “scouting the stat line.”
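For anyone curious about what fitting a model like this involves, here is a minimal pure-Python sketch on synthetic data. It is not the code behind KATOH, which was fit in R. The "true" coefficients, the two inputs (age relative to 21 and a top 100 flag), the sample size, and the simple gradient-ascent optimizer are all assumptions chosen for illustration; a statistics package would normally do this step with a faster routine.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def clamp_probability(p):
    """Keep probabilities away from 0 and 1 so the logs stay finite."""
    return min(max(p, 1e-12), 1.0 - 1e-12)


def log_likelihood(beta, rows, outcomes):
    total = 0.0
    for x, y in zip(rows, outcomes):
        p = clamp_probability(normal_cdf(sum(b * v for b, v in zip(beta, x))))
        total += math.log(p if y else 1.0 - p)
    return total


def gradient(beta, rows, outcomes):
    grad = [0.0] * len(beta)
    for x, y in zip(rows, outcomes):
        z = sum(b * v for b, v in zip(beta, x))
        p = clamp_probability(normal_cdf(z))
        weight = normal_pdf(z) / p if y else -normal_pdf(z) / (1.0 - p)
        for j, v in enumerate(x):
            grad[j] += weight * v
    return grad


def fit_probit(rows, outcomes, iterations=200):
    """Maximize the probit log-likelihood by gradient ascent with step halving."""
    n = len(rows)
    beta = [0.0] * len(rows[0])
    best = log_likelihood(beta, rows, outcomes)
    for _ in range(iterations):
        grad = gradient(beta, rows, outcomes)
        step, improved = 1.0, False
        while step > 1e-8:
            candidate = [b + step * g / n for b, g in zip(beta, grad)]
            candidate_ll = log_likelihood(candidate, rows, outcomes)
            if candidate_ll > best:
                beta, best, improved = candidate, candidate_ll, True
                break
            step /= 2.0
        if not improved:
            break  # no step size improves the fit, so stop early
    return beta, best


# Synthetic players: columns are [intercept, age minus 21, top 100 flag].
rng = random.Random(0)
TRUE_BETA = [-0.3, -0.5, 1.5]  # assumed values used only to simulate outcomes
rows, outcomes = [], []
for _ in range(400):
    x = [1.0, float(rng.choice([-3, -2, -1, 0, 1, 2, 3])), 1.0 if rng.random() < 0.15 else 0.0]
    z = sum(b * v for b, v in zip(TRUE_BETA, x))
    rows.append(x)
    outcomes.append(1 if rng.random() < normal_cdf(z) else 0)

null_ll = log_likelihood([0.0, 0.0, 0.0], rows, outcomes)
beta_hat, fitted_ll = fit_probit(rows, outcomes)
print("estimated coefficients:", [round(b, 2) for b in beta_hat])
```

The estimated coefficients should land in the neighborhood of the simulated ones, with a negative sign on age and a positive sign on the top 100 flag, though the exact values depend on the random draw.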

It’s come to my attention that Chris St. John of Beyond the Boxscore is doing something very similar with his JAVIER projection system, and it will be interesting to see where his model and mine agree and disagree once I repeat this exercise for all minor leaguers. Chris named his system after Chicago Cubs prospect Javier Baez, so I’ll follow suit and also name mine after a prospect. Yankees prospect Gosuke Katoh was my original inspiration for this idea, so I’ll call my methodology KATOH. Without further ado, here’s the resulting R output if you’re into that kind of stuff:

Low-A Output
All hitting stats were taken relative to league average and then scaled to 2014 low-A league averages.
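One plausible reading of that adjustment, sketched in Python: divide each stat by the average of the league and year it came from, then multiply by the 2014 low-A average. The ratio-based form is an assumption (a difference-based adjustment is also possible), and the league averages below are invented for the example.

```python
def league_adjust(raw_value, league_average, reference_average):
    """Express a stat relative to its own league, then rescale to a reference environment.

    Assumes a ratio-based adjustment: raw / league average * reference average.
    """
    if league_average <= 0:
        raise ValueError("league_average must be positive")
    return raw_value / league_average * reference_average


# Hypothetical numbers: a .180 ISO in a league that averaged .120,
# rescaled to a reference environment that averages .125.
adjusted_iso = league_adjust(0.180, 0.120, 0.125)
print(f"adjusted ISO: {adjusted_iso:.4f}")
```

Under this approach, a league-average hitter always maps to the reference average, so players from high-offense and low-offense leagues end up on the same scale.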

A player’s age, prospect status, strikeout rate, ISO, and even BABIP all proved to be predictive in the direction you’d expect. But the show-stopper here is that a player’s walk rate isn’t at all predictive of whether or not he’ll make it to the majors. One possible explanation is that — unlike power or speed — plate discipline is a skill that can be learned, and many players in low-A are still developing their batting eye and learning to lay off pitches. As one example, Brian McCann walked less than 5% of the time as a 19-year-old in the Sally League, but still developed into a relatively patient big leaguer.

Another possibility is that you don’t have to be a particularly good hitter to run a high walk rate in low-A. Pitchers at that level often have little idea where the ball’s going, which enables hitters to take an ultra-passive approach in the hopes that they’ll see four balls before they see three strikes. That strategy might work in the low minors, but can lose its effectiveness in the upper levels where pitchers have a better handle on their control. I’ve included an excerpt of what KATOH spits out for modern-day players in low-A who logged at least 250 plate appearances through July 7th. The full list of qualifying players can be seen here.

Player Organization Age MLB Probability
David Dahl COL 20 89%
Jake Bauers SDP 18 89%
J.P. Crawford PHI 19 87%
Dominic Smith NYM 19 79%
Willy Adames DET 18 78%
Chance Sisco BAL 19 74%
Reese McGuire PIT 19 73%
Andrew Velazquez ARI 19 70%
Manuel Margot BOS 19 69%
Ryan McMahon COL 19 68%
Franmil Reyes SDP 18 66%
Brett Phillips HOU 20 65%
Wendell Rijo BOS 18 64%
Carson Kelly STL 19 63%
Kean Wong TBR 19 63%
Trey Michalczewski CHW 19 62%
Clint Frazier CLE 19 62%
Clint Coulter MIL 20 62%
Evan Van Hoosier TEX 20 59%
Austin Dean MIA 20 59%
Drew Ward WSN 19 58%
Raimel Tapia COL 20 56%
Tanner Rahier CIN 20 55%
Correlle Prime COL 20 55%
Carlos Asuaje BOS 22 54%
Dustin Peterson SDP 19 54%
Jesmuel Valentin LAD 20 54%
Dawel Lugo TOR 19 54%
Avery Romero MIA 21 53%
Chad Wallach MIA 22 53%
Nomar Mazara TEX 19 52%

Over the next couple of weeks, I plan to repeat this exercise for all levels of minor league play. As I climb the minor league ladder, it will be interesting to see when — or even if — a hitter’s walk rate starts to be predictive of whether or not he’ll make it to the majors. Keep an eye out for the next iteration, which will look at high-A stats and slap probabilities on current high-A players.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.