Archive for Research

Using Short-Season A Stats to Predict Future Performance

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. So far, I’ve analyzed hitters in Rookie leagues, Low-A, High-A, Double-A, and Triple-A using a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
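To make the probit idea concrete, here is a minimal Python sketch of how a fitted probit model turns inputs into a probability: the inputs are combined linearly, and the result is passed through the standard normal CDF. The coefficients and inputs below are hypothetical placeholders, not KATOH’s actual estimates (those are in the R output), and the function names are illustrative.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept: float, coefs: dict, inputs: dict) -> float:
    """Probit model: P(reaches majors) = Phi(intercept + sum(coef * input))."""
    z = intercept + sum(coefs[name] * inputs[name] for name in coefs)
    return std_normal_cdf(z)


# Hypothetical coefficients for illustration only; NOT KATOH's fitted values.
example_coefs = {"age": -0.30, "k_rate": -1.5, "iso": 2.0}
# Hypothetical 19-year-old; rate stats relative to league average (1.0 = average).
example_player = {"age": 19.0, "k_rate": 1.0, "iso": 1.2}

p = probit_probability(6.0, example_coefs, example_player)
print(f"P(reaches majors) = {p:.3f}")
```

Because Phi is monotonic, a positive coefficient always pushes the probability up and a negative one pushes it down; how much the probability moves depends on where the player already sits on the curve.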

For hitters in Low-A and High-A, age, strikeout rate, ISO, BABIP, and whether or not a player was deemed a top-100 prospect by Baseball America all played a role in forecasting future success. Walk rate, while not predictive for players in Rookie ball, Low-A, or High-A, added a little bit to the model for Double-A and Triple-A hitters. Today, I’ll look into what KATOH has to say about players in Short-Season A-ball. Because offensive environments vary across years and leagues, each player’s stats were adjusted relative to his league’s average for that year. For those interested, here’s the R output based on all players with at least 200 plate appearances in a season in SS A-ball from 1995-2007.

Short Season Output
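The league adjustment mentioned above can be done in several ways, and the post doesn’t spell out which one KATOH uses. One simple option, sketched here as an assumption, is to divide each rate stat by the league-season average so that 1.0 means league average. The stat values are hypothetical.

```python
def league_adjust(player_rates: dict, league_rates: dict) -> dict:
    """Express each of a player's rate stats as a ratio to his league-season average.

    1.0 = league average; 1.25 = 25% above average. This is one possible
    normalization, assumed here for illustration.
    """
    return {stat: player_rates[stat] / league_rates[stat] for stat in player_rates}


# Hypothetical numbers for one player-season and his league that year.
player = {"k_rate": 0.18, "bb_rate": 0.10, "iso": 0.180, "babip": 0.330}
league = {"k_rate": 0.20, "bb_rate": 0.08, "iso": 0.120, "babip": 0.300}

adjusted = league_adjust(player, league)
for stat, value in adjusted.items():
    print(f"{stat}: {value:.2f}")
```

Ratios like these let a .180 ISO in a pitcher-friendly league and a higher ISO in a hitter-friendly one be compared on the same footing before they enter the regression.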

Just as we saw with hitters in Rookie ball, a player’s Baseball America prospect status couldn’t tell us anything about his future as a big leaguer. This was entirely due to the scarcity of top-100 prospects in the sample, as only a handful of players spent the year in SS A-ball after making BA’s top-100 list. Somewhat surprisingly, walk rate is predictive for players in SS A-ball, despite being statistically insignificant for hitters in Rookie ball and the more advanced A-ball levels. Another interesting wrinkle is the “Strikeout_Rate:Age” variable, an interaction between strikeout rate and age. Basically, it says that strikeout rate matters more for younger players than for older players at this level. That said, frequent strikeouts are obviously a bad thing no matter how old you are:

Rplot
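A “Strikeout_Rate:Age” term is an interaction: the model multiplies strikeout rate by age and gives that product its own coefficient, so the effect of strikeout rate on the probit index depends on age. The sketch below uses hypothetical coefficients, chosen only so that the signs match the behavior described above (a negative strikeout-rate effect that shrinks as age rises); the real values are in the R output.

```python
def k_rate_slope(b_k: float, b_k_age: float, age: float) -> float:
    """Marginal effect of strikeout rate on the probit index.

    With index = ... + b_k * K + b_age * Age + b_k_age * (K * Age),
    the derivative with respect to K is b_k + b_k_age * Age.
    """
    return b_k + b_k_age * age


# Hypothetical coefficients (not KATOH's): strikeouts hurt, but less as age rises.
B_K, B_K_AGE = -6.0, 0.2

for age in (18, 20, 22):
    print(age, round(k_rate_slope(B_K, B_K_AGE, age), 2))
```

Both slopes stay negative, so strikeouts hurt at every age; the younger hitter’s slope is just steeper. Note that these are effects on the probit index, not directly on the probability.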

The season is less than 50 games old for most teams in the New York-Penn and Northwest Leagues, which makes it a little premature to start analyzing players’ stats. But just for kicks, here’s a look at what KATOH says about this year’s crop of players with at least 100 plate appearances through July 28th. The full list of players can be found here, and you’ll find an excerpt of those who broke the 40% barrier below:

Player Organization Age MLB Probability
Rowan Wick STL 21 82%
Eduard Pinto TEX 19 68%
Marcus Greene TEX 19 60%
Mauricio Dubon BOS 19 59%
Franklin Barreto TOR 18 57%
Christian Arroyo SFG 19 57%
Skyler Ewing SFG 21 56%
Taylor Gushue PIT 20 55%
Domingo Leyba DET 18 55%
Raudy Read WSN 20 53%
Nick Longhi BOS 18 52%
Andrew Reed HOU 21 52%
Danny Mars BOS 20 51%
Amed Rosario NYM 18 49%
Yairo Munoz OAK 19 48%
Seth Spivey TEX 21 47%
Mike Gerber DET 21 47%
Mark Zagunis CHC 21 47%
Kevin Krause PIT 21 46%
Leo Castillo CLE 20 45%
Jordan Luplow PIT 20 45%
Mason Davis MIA 21 40%
Kevin Ross PIT 20 40%
Franklin Navarro DET 19 40%

As we saw with Rookie league hitters, KATOH doesn’t think any of these players are shoo-ins to make it to the majors. Even Rowan Wick, who hit a Bondsian .378/.475/.815 before getting promoted, gets just 82%. That suggests SS A-ball stats, on their own, just aren’t all that meaningful.

Once the season’s over, I’ll re-run everything using the final 2014 stats, which will give us a better sense of which prospects had the most promising years statistically. I also plan to engineer an alternative methodology — to supplement this one — that will take into account how a player performs in the majors, rather than his just getting there. Additionally, I hope to create something similar for projecting pitchers based on their statistical performance. In the meantime, I’ll apply the KATOH model to historical prospects and highlight some of its biggest “hits” and “misses” from years past. Keep an eye out for the next post in the coming days.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.


Pitch Win Values for Starting Pitchers – July 2014

Introduction

A couple of months back, I introduced a new method of calculating pitch values using a FIP-based WAR methodology. That post details the basic framework of these calculations and can be found here. The May and June updates can be found here and here, respectively. This post is simply the July 2014 update of the same data. What follows is predominantly data-heavy but should still provide useful talking points for discussion. Let’s dive in and see what we can find. Please note that the same caveats apply as in previous months. We’re at the mercy of pitch classification. I’m sure your favorite pitcher doesn’t actually throw the pitch that has been rated as incredibly below average, but we have to go off of the data that is available. In addition, Baseball Prospectus’s PITCHf/x leaderboards list only nine pitches (Four-Seam Fastball, Sinker, Cutter, Splitter, Curveball, Slider, Changeup, Screwball, and Knuckleball). Anything classified outside of these categories is not included. Also, anything classified as a “slow curve” is not included in Baseball Prospectus’s curveball data.

Constants

Before we begin, we must first update the constants used in the calculations for July. As a refresher, we need three different constants: strikes per strikeout, balls per walk, and a FIP constant to bring the values onto the right scale. We will tackle each of them individually.

First, let’s discuss the strikeout constant. In July, there were 47,449 strikes thrown by starting pitchers. Of these 47,449 strikes, 4,585 were turned into hits, and 13,750 outs were recorded. Of these 13,750 outs, 3,725 came via the strikeout, leaving us with 10,025 ball-in-play outs. Those 10,025 ball-in-play outs and 4,585 hits sum to 14,610 balls in play. Subtracting 14,610 balls in play from our original 47,449 strikes leaves us with 32,839 strikes to distribute over our 3,725 strikeouts. That’s a ratio of 8.82 strikes per strikeout, unchanged from the 8.82 strikes per strikeout in June.
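The strikeout-constant arithmetic above can be reproduced in a few lines. This sketch follows the steps as described (each ball in play uses up one strike, and the remaining strikes are spread over the strikeouts); the function name is illustrative.

```python
def strikes_per_strikeout(strikes: int, hits: int, outs: int, strikeouts: int) -> float:
    """Strikes per strikeout, following the steps described in the text."""
    bip_outs = outs - strikeouts          # outs not recorded via the strikeout
    balls_in_play = bip_outs + hits       # each ball in play uses up one strike
    remaining_strikes = strikes - balls_in_play
    return remaining_strikes / strikeouts


# July 2014 starting-pitcher totals quoted in the text.
ratio = strikes_per_strikeout(strikes=47_449, hits=4_585, outs=13_750, strikeouts=3_725)
print(f"{ratio:.2f} strikes per strikeout")  # 8.82 strikes per strikeout
```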

The next two constants are much easier to ascertain. In July, there were 26,244 balls thrown by starters and 1,328 walked batters. That’s a ratio of 19.76 balls per walk, up from 19.36 balls per walk in June, which suggests that a given ball was slightly less likely to end in a walk in July than previously. The FIP subtotal for all pitches in July was 0.52. The MLB run average for July was 4.17, meaning our FIP constant for July is 3.65.

Constant Value
Strikes/K 8.82
Balls/BB 19.76
cFIP 3.65
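The other two constants in the table come from a ratio and a subtraction. A minimal sketch using the July figures from the text (function names are illustrative):

```python
def balls_per_walk(balls: int, walks: int) -> float:
    """Balls thrown per walk issued."""
    return balls / walks


def fip_constant(run_average: float, fip_subtotal: float) -> float:
    """cFIP: the amount needed to put the FIP subtotal on the run-average scale."""
    return run_average - fip_subtotal


bb_ratio = balls_per_walk(26_244, 1_328)
c_fip = fip_constant(4.17, 0.52)
print(f"Balls/BB = {bb_ratio:.2f}, cFIP = {c_fip:.2f}")  # Balls/BB = 19.76, cFIP = 3.65
```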

The following table details how the constants have changed month-to-month.

Month K BB cFIP
March/April 8.47 18.50 3.68
May 8.88 18.77 3.58
June 8.82 19.36 3.59
July 8.82 19.76 3.65

Pitch Values – July 2014

For reference, the following table details the FIP for each pitch type in the month of July.

Pitch FIP
Four-Seam 4.06
Sinker 4.20
Cutter 4.42
Splitter 3.50
Curveball 4.08
Slider 3.87
Changeup 4.79
Screwball 3.58
Knuckleball 3.97
MLB RA 4.16
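Because FIP and run average sit on the same scale, a pitch type counts as below average when its FIP is higher than the MLB run average. A quick check using the July values from the table (the constants section quotes 4.17 for the MLB run average while the table shows 4.16; either value gives the same split):

```python
# July 2014 FIP by pitch type, from the table above.
JULY_PITCH_FIP = {
    "Four-Seam": 4.06,
    "Sinker": 4.20,
    "Cutter": 4.42,
    "Splitter": 3.50,
    "Curveball": 4.08,
    "Slider": 3.87,
    "Changeup": 4.79,
    "Screwball": 3.58,
    "Knuckleball": 3.97,
}
MLB_RA = 4.16

# A higher FIP than the league run average means worse than average.
below_average = sorted(p for p, fip in JULY_PITCH_FIP.items() if fip > MLB_RA)
print(below_average)  # ['Changeup', 'Cutter', 'Sinker']
```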

As we can see, only three pitches would be classified as below average for the month of July: sinkers, cutters, and changeups. Four-seam fastballs and curveballs came in only slightly better than league average. Pitchers who were able to stand out in these categories tended to have better overall months than pitchers who excelled at the other pitches. Now, let’s proceed to the data for the month of July.

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Ian Kennedy 0.6 180 Brad Peacock -0.3
2 Clayton Kershaw 0.6 181 Jake Odorizzi -0.3
3 Jose Quintana 0.6 182 Jason Hammel -0.3
4 Drew Hutchison 0.5 183 Edwin Jackson -0.3
5 Jacob deGrom 0.5 184 Chris Young -0.3

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Brandon McCarthy 0.4 167 Chase Whitley -0.2
2 Roberto Hernandez 0.4 168 Andrew Heaney -0.2
3 Doug Fister 0.4 169 Jon Niese -0.2
4 Hisashi Iwakuma 0.4 170 David Buchanan -0.2
5 Wade Miley 0.3 171 Nick Tepesch -0.3

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Josh Collmenter 0.3 77 Brandon McCarthy -0.2
2 Jon Lester 0.3 78 Drew Smyly -0.2
3 Kevin Correia 0.2 79 Brandon Workman -0.2
4 Jarred Cosart 0.2 80 Dan Haren -0.3
5 Adam Wainwright 0.2 81 Hector Noesi -0.4

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Hisashi Iwakuma 0.3 27 Daisuke Matsuzaka 0.0
2 Hiroki Kuroda 0.3 28 Ubaldo Jimenez 0.0
3 Jake Odorizzi 0.2 29 Tim Lincecum -0.1
4 Alex Cobb 0.2 30 Doug Fister -0.1
5 Tim Hudson 0.2 31 Clay Buchholz -0.1

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Sonny Gray 0.3 155 Hiroki Kuroda -0.1
2 Clay Buchholz 0.2 156 Josh Tomlin -0.2
3 Jesse Hahn 0.2 157 Kevin Correia -0.2
4 Adam Wainwright 0.2 158 Eric Stults -0.3
5 Jose Quintana 0.2 159 Josh Beckett -0.3

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Garrett Richards 0.5 125 Jair Jurrjens -0.1
2 Tyson Ross 0.4 126 Jason Lane -0.1
3 Jake Arrieta 0.3 127 Jake Buchanan -0.1
4 Brett Anderson 0.3 128 Matt Cain -0.1
5 Kyle Lohse 0.3 129 C.J. Wilson -0.1

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Cole Hamels 0.3 156 Rubby de la Rosa -0.2
2 David Price 0.3 157 David Holmberg -0.2
3 Chris Sale 0.2 158 Mike Minor -0.2
4 Zack Greinke 0.2 159 Jeff Locke -0.3
5 James Shields 0.2 160 Drew Hutchison -0.4

Screwball

Rank Pitcher Pitch Value
1 Trevor Bauer 0.0
2 Julio Teheran 0.0
3 Hector Santiago 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 0.4

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Cole Hamels 1.0 187 Jair Jurrjens -0.4
2 Jacob deGrom 0.9 188 Erik Bedard -0.4
3 Tyson Ross 0.9 189 Jason Hammel -0.4
4 Jose Quintana 0.9 190 Brad Peacock -0.4
5 Chris Sale 0.9 191 Nick Tepesch -0.4

Pitch Ratings – July 2014

Four-Seam Fastball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Drew Hutchison 59 83 Jake Odorizzi 38
2 Jose Quintana 59 84 Jake Peavy 38
3 Cole Hamels 58 85 Josh Tomlin 36
4 Mark Buehrle 58 86 Brad Peacock 35
5 Tim Lincecum 58 87 Jason Hammel 34

Sinker

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Travis Wood 58 73 Kevin Correia 36
2 Scott Kazmir 57 74 John Danks 36
3 Matt Garza 57 75 Jeff Samardzija 35
4 Brandon McCarthy 57 76 Dan Haren 32
5 Doug Fister 57 77 Nick Tepesch 25

Cutter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Marcus Stroman 58 32 Mike Minor 33
2 Jon Lester 58 33 Tim Hudson 33
3 Daisuke Matsuzaka 57 34 Brandon McCarthy 32
4 Phil Hughes 57 35 Dan Haren 28
5 Franklin Morales 57 36 Hector Noesi 20

Splitter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Tim Hudson 57 8 Jorge de la Rosa 53
2 Kyle Kendrick 56 9 Alfredo Simon 53
3 Hisashi Iwakuma 56 10 Jeff Samardzija 53
4 Kevin Gausman 56 11 Alex Cobb 52
5 Hiroki Kuroda 56 12 Tim Lincecum 42

Curveball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jacob deGrom 59 65 Franklin Morales 38
2 Felix Hernandez 59 66 Chase Anderson 38
3 Clay Buchholz 58 67 Jered Weaver 37
4 Brandon McCarthy 58 68 Kevin Correia 26
5 David Phelps 58 69 Josh Beckett 20

Slider

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jordan Zimmermann 59 55 Zack Wheeler 44
2 Brett Anderson 59 56 Miles Mikolas 43
3 Wei-Yin Chen 58 57 Miguel Gonzalez 42
4 Kyle Lohse 58 58 Carlos Martinez 40
5 Corey Kluber 58 59 Yu Darvish 39

Changeup

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Chase Whitley 60 65 Jeff Locke 30
2 Cole Hamels 59 66 Joe Kelly 27
3 Chase Anderson 59 67 Rubby de la Rosa 26
4 Hector Santiago 58 68(t) Drew Hutchison 20
5 Jered Weaver 57 68(t) Mike Minor 20

Screwball

Rank Pitcher Pitch Rating
1 Trevor Bauer 52

Knuckleball

Rank Pitcher Pitch Rating
1 R.A. Dickey 52

Monthly Discussion

As we can see, Cole Hamels takes the top spot for this month due to the strength of his overall repertoire. Hamels was classified as throwing five different pitches in July (Four-Seam, Sinker, Cutter, Curveball, and Changeup) and managed to earn at least 0.1 WAR from all five. The most valuable pitch overall in July was Ian Kennedy’s four-seam fastball. The least valuable was Drew Hutchison’s changeup. Among offspeed pitches, Garrett Richards’s slider led the way with 0.5 WAR. The least valuable fastball was Hector Noesi’s cutter.

On our 20-80 scale pitch ratings, the highest-rated qualifying pitch was Chase Whitley’s changeup. The lowest-rated pitches were the changeups thrown by Drew Hutchison and Mike Minor, Hector Noesi’s cutter, and Josh Beckett’s curveball. The highest-rated fastball was Drew Hutchison’s four-seam fastball, tied with Jose Quintana’s at 59.

Pitch Values – 2014 Season

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Ian Kennedy 1.9 247 Masahiro Tanaka -0.4
2 Jose Quintana 1.7 248 Dan Straily -0.4
3 Phil Hughes 1.6 249 Nick Martinez -0.4
4 Jordan Zimmermann 1.6 250 Juan Nicasio -0.4
5 Clayton Kershaw 1.5 251 Marco Estrada -0.7

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Charlie Morton 1.5 236 John Danks -0.3
2 Felix Hernandez 1.3 237 Wandy Rodriguez -0.3
3 David Price 1.1 238 Vidal Nuno -0.3
4 Chris Archer 1.1 239 Nick Tepesch -0.4
5 Cliff Lee 1.1 240 Andrew Heaney -0.4

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Madison Bumgarner 1.2 110 Dan Haren -0.2
2 Adam Wainwright 1.2 111 Felipe Paulino -0.2
3 Corey Kluber 1.2 112 Hector Noesi -0.3
4 Jarred Cosart 1.2 113 C.J. Wilson -0.3
5 Josh Collmenter 1.0 114 Brandon McCarthy -0.5

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Masahiro Tanaka 0.8 32 Jake Peavy -0.1
2 Alex Cobb 0.6 33 Franklin Morales -0.2
3 Hisashi Iwakuma 0.6 34 Miguel Gonzalez -0.2
4 Hiroki Kuroda 0.6 35 Danny Salazar -0.2
5 Tim Hudson 0.4 36 Clay Buchholz -0.4

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Sonny Gray 1.1 210 Homer Bailey -0.2
2 A.J. Burnett 0.9 211 Alfredo Simon -0.2
3 Brandon McCarthy 0.8 212 Felipe Paulino -0.3
4 Adam Wainwright 0.7 213 Franklin Morales -0.3
5 Jose Fernandez 0.6 214 Eric Stults -0.4

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Garrett Richards 1.3 179 Roberto Hernandez -0.2
2 Tyson Ross 1.1 180 Liam Hendriks -0.2
3 Kyle Lohse 0.8 181 Erasmo Ramirez -0.3
4 Corey Kluber 0.8 182 Danny Salazar -0.3
5 Ervin Santana 0.8 183 Travis Wood -0.4

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 0.9 232 Wandy Rodriguez -0.4
2 Stephen Strasburg 0.6 233 Matt Cain -0.4
3 Cole Hamels 0.6 234 Jordan Zimmermann -0.5
4 Chris Sale 0.5 235 Drew Hutchison -0.6
5 Roberto Hernandez 0.5 236 Marco Estrada -0.6

Screwball

Rank Pitcher Pitch Value
1 Trevor Bauer 0.1
2 Alfredo Simon 0.0
3 Hector Santiago 0.0
4 Julio Teheran 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 1.2
2 C.J. Wilson 0.0

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 3.5 254 Felipe Paulino -0.5
2 Adam Wainwright 3.2 255 Juan Nicasio -0.5
3 Garrett Richards 2.9 256 Nick Martinez -0.6
4 Corey Kluber 2.9 257 Wandy Rodriguez -0.8
5 Jose Quintana 2.7 258 Marco Estrada -1.2

Year-to-Date Discussion

If we look at the year-to-date numbers, AL FIP and MLB WAR leader Felix Hernandez still sits in the top spot. Current MLB FIP leader Clayton Kershaw ranks ninth. The least valuable starter has been Marco Estrada. On a per-pitch basis, the most valuable pitch has been Ian Kennedy’s four-seam fastball. The most valuable offspeed pitch has been Garrett Richards’s slider. The least valuable pitch has been Marco Estrada’s four-seam fastball. The least valuable offspeed pitch has been Marco Estrada’s changeup. Needless to say, it’s been a rough year for Marco. Qualitatively, I feel fairly encouraged by the year-to-date results. The leaderboard is topped by two no-doubt aces, both of whom currently rank in the top two in FIP in their respective leagues, and Marco Estrada comes in at the bottom after posting the highest FIP among qualified starters so far. For reference, the top five in the year-to-date overall rankings are currently 1st, 12th, 10th, 2nd, and 9th, respectively, on the FanGraphs WAR leaderboards.


Using Rookie League Stats to Predict Future Performance

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there. In the future, I plan to engineer an alternative methodology to go along with this one that takes into account how a player performs in the majors, rather than just whether he gets there.

For hitters in Low-A and High-A, age, strikeout rate, ISO, BABIP, and whether or not a player was deemed a top-100 prospect by Baseball America all played a role in forecasting future success. Walk rate, while not predictive for players in A-ball, added a little bit to the model for Double-A and Triple-A hitters. Today, I’ll look into what KATOH has to say about players in Rookie leagues. Because offensive environments vary across years and leagues, each player’s stats were adjusted relative to his league’s average for that year. For those interested, here’s the R output based on all players with at least 200 plate appearances in a season in Rookie ball from 1995-2007.

Rookie Output

Just as we saw with hitters in the A-ball leagues, a player’s walk rate is not at all predictive of whether or not he’ll crack the majors. Unlike at all of the other levels I’ve looked at so far, a player’s Baseball America prospect status couldn’t tell us anything about his future as a big leaguer. This was entirely due to the scarcity of top-100 prospects in the sample, as only a handful of players spent the year in Rookie ball after making BA’s top-100 list.

The season is less than 40 games old for most Rookie league teams, which makes it a little premature to start analyzing players’ stats. But just for kicks, here’s a look at what KATOH says about this year’s crop of rookie-ballers with at least 80 plate appearances through July 28th. This only considers players in the American rookie leagues (the Appalachian, Arizona, Gulf Coast, and Pioneer Leagues), so it excludes the Dominican and Venezuelan Summer Leagues. The full list of players can be found here, and you’ll find an excerpt of those who broke the 40% barrier below:

Player Organization Age MLB Probability
Kevin Padlo COL 17 73%
Bobby Bradley CLE 18 67%
Alex Verdugo LAD 18 65%
Luke Dykstra ATL 18 64%
Yu-Cheng Chang CLE 18 59%
Magneuris Sierra STL 18 56%
Juan Santana HOU 19 54%
Joshua Morgan TEX 18 50%
Jason Martin HOU 18 49%
Edmundo Sosa STL 18 48%
Oliver Caraballo TEX 19 46%
Sthervin Matos MIL 20 46%
Alexander Palma NYY 18 45%
Eloy Jimenez CHC 17 45%
Javier Guerra BOS 18 44%
Zach Shepherd DET 18 44%
Tito Polo PIT 19 44%
Jose Godoy STL 19 43%
Henry Castillo ARI 19 42%
David Gonzalez DET 20 42%
Dan Jansen TOR 19 42%
Max George COL 18 42%
Gleyber Torres CHC 17 42%
Luis Guzman WSN 18 41%
Jose Martinez KCR 17 41%
Alex Jackson SEA 18 40%
Emmanuel Tapia CLE 18 40%

What stands out most is that KATOH doesn’t think any of these players are shoo-ins to make it to the majors. Even those who are hitting the snot out of the ball get probabilities that fall short of what we saw for unremarkable performances in Double-A. Kevin Padlo, for example, gets just 73%, despite hitting a ridiculous .317/.463/.619 as a 17-year-old. It’s hard to do much better than that. I think this really speaks to how little rookie ball stats matter in the grand scheme of things. A good offensive showing is obviously better than a poor one, but numbers from this level need to be taken with a huge grain of salt. A hitter’s performance against pitchers who are fresh out of high school just can’t tell us much about how he’ll fare against more advanced pitching at the higher levels.

Next up, I’ll complete the series by looking at stats from short-season A-ball. Teams at that level are also only a few weeks into their season, but at the very least, it will be interesting to see how KATOH feels about SS A-ballers in general. Next week, I’ll apply the KATOH model to historical prospects and highlight some of its biggest “hits” and “misses” from the past.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.


Sonny Gray, Perfecting What Works


With his final turn in the rotation for July completed, we’ve now had almost exactly one full year of Sonny Gray – one year of the 24-year-old starting pitcher, the up-and-coming staff ace, the dueler of Playoff Verlanders. In that year, we’ve seen him do some great things, like going eight innings with nine Ks and no runs against the Tigers in Game 2 of the 2013 ALDS. We’ve also seen MLB Fan Cave forcing him to prank New Yorkers as a result of some unknown fine print embedded in his rookie contract. Above all else, the one thing we’ve always known is that Sonny Gray has a really good curveball. Let’s take a look at it in all of its 12-to-6, 80-MPH Uncle Charlie glory, from a game against the Astros in August of last year:

Gray_Curve_Early_2

How good is his curveball? He has never given up a home run off of the pitch, with the only extra-base hits against the curve in his career being four doubles. In the past calendar year, Sonny Gray has saved more runs with his curveball than any other pitcher in baseball, and is behind only Corey Kluber and Yu Darvish in Runs Saved/100 curveballs. Having watched Kluber a lot, I suspect his slider/slurve is actually being classified as a curveball; I think it looks like a slider, but PITCHf/x doesn’t, so I will defer to the all-knowing pitch computer. Regardless, with the metrics we’re about to examine, Sonny Gray has one of the best curveballs in the game. What we’re going to focus on specifically are the advances in his curve’s effectiveness, spurred on by an adjustment in the way he throws the pitch.
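A per-100 rate stat such as Runs Saved/100 curveballs (wCB/C on FanGraphs) just scales a pitch’s total run value by how often it was thrown. The sketch below, with hypothetical run totals and pitch counts, shows why two pitchers with the same total value can look very different per pitch; the function name is illustrative.

```python
def runs_per_100(pitch_runs: float, pitches_thrown: int) -> float:
    """Run value per 100 pitches of a given type (e.g. wCB/C from wCB)."""
    return 100.0 * pitch_runs / pitches_thrown


# Hypothetical pitchers: same total curveball value, different usage.
heavy_user = runs_per_100(pitch_runs=10.0, pitches_thrown=800)
light_user = runs_per_100(pitch_runs=10.0, pitches_thrown=400)
print(heavy_user, light_user)  # 1.25 2.5
```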

To start, let’s take a look at the top-15 starters by wCB and wCB/C for the past calendar year:

wCB_Leaders

As stated before, Gray is at the top in both of these categories. We should put a little more stock into wCB/C, as it normalizes all pitchers to runs saved per 100 pitches, taking away the advantage that one player might have due to throwing a certain pitch more frequently than another player. This is important for what we’re looking at, because Sonny Gray throws a lot of curveballs. How frequently does he throw curveballs? Here are the leaders for percentage of curveballs thrown over the last calendar year:

[Image: leaders in curveball percentage over the last calendar year]

The words “second only to Scott Feldman” don’t come up very often, but here they are. Gray throws his curveball a ton. Not only has he always leaned on the curve as a major weapon in his arsenal, but his curveball usage has actually increased in every month since he came into the league, with two exceptions this year: May, when his usage held steady, and June, when he seemed to temporarily lose feel for the pitch and threw more changeups. However, after holding Toronto to one run over seven innings in his first start of July, Gray had this to say:

“That was the idea, to really get (it) going again,” Gray said of the curveball. “I think the last five or six starts it’s been OK, but it hasn’t been a big factor. We did some things a little different this week and I was able to find that again.”

Over the last 30 days, Gray has thrown the curveball more than ever, up to over 32% for the month. Not only that, he has found more effectiveness in the pitch, with his whiff % on the curve up to a career-best 19.2% during July. There’s also reason to believe that this isn’t simply a good month for Sonny Gray’s curveball – what we are now seeing is the fruition of a change of approach with the way he throws the pitch that has been coming for some time now. Let’s take a look.

Here we have the release speed of Sonny Gray’s curveball for every start since he was called up:

Release_Speed

He’s throwing the curve harder than he ever has, adding over three miles per hour since he started pitching in the majors. That’s not a small change. On top of the speed increase, he’s cut about 2.5 inches of vertical movement off his curve between his first start in the majors and now:

Vertical_Movement

Finally, he’s added more three-dimensional depth to his curve: its horizontal movement ranks third among curveballs over the past calendar year, behind only those of Corey Kluber and Charlie Morton.

Add all of that up, and we have this 84-MPH curve from his last start against the Orioles:

Gray_Curve_Late_2

It now looks more like a slurve, with its high release speed and nasty late break away from right-handed hitters. As Eno Sarris noted in his great article from October of last year, Gray said he “adds and subtracts” with the same grip on his curve to move between the 12-to-6 and slurve (which is sometimes classified as a slider) varieties. However, it seems as if he has leaned more toward the slurve option as time has gone on.

One question that arises out of this is “why throw the slurve more?”

Given his whiff % on the curve has increased as he has added velocity, I’d say that fact alone has supported the move to the slurve over the 12-to-6. However, there’s another potential reason that isn’t strictly rooted in statistics, and could be more about what goes into an elite pitching approach: by increasing his arm speed and flattening out the vertical movement of his curve, Gray can further deceive batters into thinking he’s throwing hard pitches before the bottom drops out. His struggles to find consistency with the changeup are well documented, so why shouldn’t he adjust his best breaking pitch to better fool hitters for whiffs and weak contact? As we’ve seen with Yu Darvish, the pinnacle of an ace approach may be one that includes a “great convergence” of arm slots and release points, in which every pitch looks hard until it’s not, or until it is.

Gray’s horizontal release points for all of his pitches are closer to one another than they have ever been during his major league career. Not surprisingly, his curveball and fastball were released, on average, at almost the same horizontal point during his May and July starts, when he posted career-best whiff rates on his curveball (18.6% and 19.2%, respectively). June was an aberration, as Gray seemed to lose his release point in general and was tinkering with his delivery, leaning more on the changeup:

Release_Points
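Comparing release points by pitch type comes down to averaging the horizontal release coordinate for each pitch and looking at the gap between the averages. The sample values below are made up purely to show the computation (PITCHf/x reports positions in feet); they are not Gray’s actual data, and the function name is illustrative.

```python
from statistics import mean


def release_gap(releases: list, pitch_a: str, pitch_b: str) -> float:
    """Absolute gap between the average horizontal release points of two pitch types."""
    xs_a = [x for pitch, x in releases if pitch == pitch_a]
    xs_b = [x for pitch, x in releases if pitch == pitch_b]
    return abs(mean(xs_a) - mean(xs_b))


# Hypothetical (pitch_type, horizontal_release_ft) pairs from a single start.
start = [("FF", -2.10), ("FF", -2.14), ("CU", -2.08), ("CU", -2.12), ("CH", -2.30)]

gap = release_gap(start, "FF", "CU")
print(f"FF vs. CU gap: {gap:.2f} ft")
```

A smaller gap means the curveball leaves the hand from nearly the same spot as the fastball, which is the “convergence” idea discussed above.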

Sonny Gray has work to do on parts of his game before he takes the next step into the true elite of starting pitchers. His walk rate has actually increased this year to 8.5%, owing mostly to a lack of fastball command in deep counts, and his changeup is still very much a work in progress as a third pitch. However, his adoption of the hard curve and syncing of arm angles is a positive step toward dominance, and is a sign that he knows what works; he’s now perfecting it.

And now, my first go at a DShep Darvish-like GIF of Sonny Gray’s 12-to-6 curve from last August along with his harder slurve from his last start to compare:

Sonny_Curves_Final


xHitting (Part 4): 2014 Fantasy Edition!

Welcome to the fourth installment of xHitting!  As always, reader comments and feedback are super encouraged and appreciated.  (Links to parts one, two, and three)

Briefly recapping the method, the gist is to estimate the expected rate of each individual hit type based on a player’s underlying peripherals, and in turn recover all the components needed to compute expected versions of wOBA, OPS, etc. The only real change to the model since last time is that I now use a “hybrid” predicted home run rate, which averages actual and (raw) predicted home run rate, with the weight on actual HR rate increasing with the number of plate appearances. (This is explained in part three, for those curious.)
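The exact weighting lives in part three and isn’t reproduced here, so this is a hypothetical sketch: the weight on actual HR rate is PA / (PA + k), which rises toward 1 as plate appearances pile up, with k = 300 picked arbitrarily. The second function shows the general idea of turning expected per-PA event rates into an expected wOBA; its linear weights are illustrative values of roughly the right size rather than any season’s official ones (FanGraphs publishes those on its Guts! page), and it approximates the wOBA denominator with plate appearances.

```python
def hybrid_hr_rate(actual: float, predicted: float, pa: int, k: float = 300.0) -> float:
    """Blend actual and predicted HR/PA; the weight on actual grows with PA.

    The PA / (PA + k) form and k = 300 are assumptions for illustration.
    """
    w = pa / (pa + k)
    return w * actual + (1.0 - w) * predicted


# Illustrative linear weights (approximate magnitudes, not official values).
WEIGHTS = {"bb": 0.69, "hbp": 0.72, "1b": 0.89, "2b": 1.27, "3b": 1.62, "hr": 2.10}


def expected_woba(rates: dict) -> float:
    """Expected wOBA from expected per-PA event rates (denominator approximated by PA)."""
    return sum(WEIGHTS[event] * rates[event] for event in WEIGHTS)


hr = hybrid_hr_rate(actual=0.05, predicted=0.03, pa=300)  # w = 0.5, so 0.04
rates = {"bb": 0.08, "hbp": 0.01, "1b": 0.15, "2b": 0.05, "3b": 0.005, "hr": hr}
xw = expected_woba(rates)
print(round(hr, 3), round(xw, 4))
```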

Perhaps the more exciting change, though, is that this time I actually have results for an ongoing season, which can potentially help for fantasy purposes. (Not that most readers necessarily need my help.) Related to fantasy usage, there were a few requests to see a full spreadsheet of past results (2010-2013 seasons), which I have posted here. Again, feel free to take it or leave it at your leisure.

Note: I collected most of these data at the All-Star Break, so the numbers may be a few weeks behind, but they’re still broadly representative. Also, for time considerations I only fetched 2014 stats for qualified leaders. This leaves out even a few big names, but I couldn’t justify the time to fetch every player.

So far, I’ve typically posted the biggest “over-” and “underachievers” for a given season, and I suppose I’ll continue that tradition today. But while these lists are useful for highlighting which players seem most likely to regress, they overlook another main use of the model: assessing how real a player’s apparent “breakout” or “decline” is, at least in-sample. (In some cases, the model may think that a player’s breakout is entirely justified given his peripherals, while in others it may view the breakout more skeptically.) Thus, today I’ll also post a second list, of players who seem to have taken a pronounced step forward or backward this season, and what the model thinks of their season-to-date performance.

Okay, time for results!  I’ll start with the list of “over-” and “underachievers.”

2014 Underachievers (1st half) 2014 Overachievers (1st half)
Name wOBA xWOBA Diff Name wOBA xWOBA Diff
Jean Segura 0.256 0.305 -0.049 Casey McGehee 0.345 0.277 0.068
Chris Davis 0.306 0.353 -0.047 Yasiel Puig 0.398 0.340 0.058
Mark Teixeira 0.352 0.397 -0.045 Matt Adams 0.376 0.324 0.052
Gerardo Parra 0.289 0.327 -0.038 Mike Trout 0.428 0.381 0.047
Brian McCann 0.298 0.330 -0.032 Marcell Ozuna 0.343 0.300 0.043
Torii Hunter 0.323 0.355 -0.032 Lonnie Chisenhall 0.396 0.359 0.037
Joe Mauer 0.308 0.340 -0.032 Scooter Gennett 0.355 0.320 0.035
Jimmy Rollins 0.320 0.352 -0.032 Marlon Byrd 0.344 0.309 0.035
Brian Roberts 0.304 0.334 -0.030 Giancarlo Stanton 0.397 0.363 0.034
Buster Posey 0.326 0.352 -0.026 Hunter Pence 0.359 0.325 0.034

Having worked with this model for a while now, I’ve noticed that some players seem to give it trouble and show up on these lists disproportionately often from year to year. A few of those players appear here… more on that later.

Partly for that reason, I wouldn’t necessarily say to “buy low” the guys on the left, nor “sell high” the guys on the right; although you can if you want.  I won’t address every player, but I have some scattered comments:

  • For readers who prefer OPS, .020 wOBA translates to about .050 OPS, on the margin.
  • .397 predicted for Teixeira?  Not sure where that came from…
  • Poor Segura.  All things considered, I think nobody deserves a big second half more than he does.
  • Whatever happened to Casey McGehee’s power?  The guy once hit 23 home runs in a season, but now has an ISO of .073, with surprisingly low fly ball distance.
  • Although Chisenhall’s breakout is less impressive once you take out what the model thinks is luck, it’s still a substantial improvement.
  • Chris Davis is sort of the reverse of Chisenhall.  Adding back in what the model thinks has been bad luck, he’s still way down from what he did last year, but not nearly as disappointing as he probably has been to many owners thus far.
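The rule of thumb in the comments above (.020 wOBA is about .050 OPS) works out to a marginal ratio of roughly 2.5.  Here is a minimal sketch that applies that ratio to the first-half gaps.  The 2.5 factor is simply inferred from the rule of thumb, and it assumes the relationship is roughly linear for small differences.  It is not a published constant.

```python
# Marginal wOBA-to-OPS ratio implied by the rule of thumb ".020 wOBA ~ .050 OPS".
# Assumption: the relationship is roughly linear for small differences.
WOBA_TO_OPS = 0.050 / 0.020


def woba_diff_to_ops(diff: float) -> float:
    """Convert a wOBA difference into an approximate OPS difference."""
    return diff * WOBA_TO_OPS


# First-half gaps (wOBA minus xWOBA) taken from the tables above.
gaps = {"Jean Segura": -0.049, "Casey McGehee": 0.068}
for name, gap in gaps.items():
    print(f"{name}: {gap:+.3f} wOBA is roughly {woba_diff_to_ops(gap):+.3f} OPS")
```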

As mentioned, certain players do seem able to over- or underperform the model somewhat consistently, the same way we think some pitchers are usually better or worse than their FIP.  With 4.5 years of data now to work with, I think I can make educated guesses about which players systematically deviate from the model’s predictions.  I’ll term this deviation the “player fixed effect.”
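The article doesn’t say exactly how the fixed effects are estimated.  One simple way to get such a number is a plate-appearance-weighted average of a player’s residuals (actual wOBA minus xWOBA) across seasons.  The sketch below shows that approach as an illustrative assumption, not as the author’s procedure.  The season lines are hypothetical, and the 1000 PA minimum mirrors the requirement stated for the tables.

```python
from dataclasses import dataclass


@dataclass
class Season:
    pa: int        # plate appearances
    woba: float    # actual wOBA
    xwoba: float   # model-predicted wOBA


MIN_PA = 1000  # mirrors the "at least 1000 PA" requirement for the tables


def player_fixed_effect(seasons: list[Season], min_pa: int = MIN_PA) -> float:
    """PA-weighted mean of (wOBA - xWOBA) across a player's seasons.

    Hypothetical estimator: the article does not specify how its
    fixed effects were computed.
    """
    total_pa = sum(s.pa for s in seasons)
    if total_pa < min_pa:
        raise ValueError(f"only {total_pa} PA; need at least {min_pa}")
    return sum(s.pa * (s.woba - s.xwoba) for s in seasons) / total_pa


# Hypothetical seasons for an illustrative player (not real data).
seasons = [
    Season(600, 0.380, 0.350),
    Season(650, 0.370, 0.345),
    Season(300, 0.360, 0.340),
]
print(f"estimated fixed effect: {player_fixed_effect(seasons):+.3f}")
```

A weighted mean keeps a short, noisy season from counting as much as a full one.  A proper fixed-effects regression would fit these terms jointly with the rest of the model.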

(Requiring at least 1000 PA from 2010 through 2014 first half)

Model loves too much
Name Player FE estimate (wOBA)
Brian Roberts -0.033
Todd Helton -0.026
Jean Segura -0.026
Jose Lopez -0.025
Mark Teixeira -0.025
Russell Martin -0.024
Darwin Barney -0.023
Chris Getz -0.023
Jimmy Rollins -0.021
Jason Bay -0.020

Model loves too little
Name Player FE estimate (wOBA)
Wilson Betemit 0.032
Brandon Moss 0.032
Ryan Sweeney 0.028
Mike Trout 0.027
Peter Bourjos 0.026
Matt Carpenter 0.025
Brandon Belt 0.025
Melky Cabrera 0.025
Carlos Ruiz 0.024
Chris Johnson 0.024

Comments:

  • Again, .020 wOBA is equivalent to about .050 OPS, on the margin.
  • Taking out their apparent fixed effect, Teixeira is only underperforming his xWOBA by about .020, and Brian Roberts is actually doing about par.
  • On the reverse side, Mike Trout’s “adjusted” xWOBA jumps up to .408, and it probably doesn’t surprise us that he’s outperforming even that, since he’s Mike Trout.  And although Giancarlo Stanton misses the Top 10 cutoff above, his apparent fixed effect of +.022 would rank 11th, so his “adjusted” xWOBA is more like .385.
  • Yasiel Puig (.058) would also be on the list of “positive fixed effects” if we relaxed the PA requirement (he has 826 PA during this time).  And Matt Adams (~.040) might be well on his way to that list, though he has even fewer plate appearances than Puig.
  • I don’t have good explanations for, or see any common themes among, the players with negative fixed effects.  Maybe readers can help?
  • For Trout, home runs are pretty clearly the area where the model underestimates him.  In any given season (2010-2014), he hits about twice as many HR as the model thinks he should in the “raw” prediction.
  • And Trout’s not the only “HR rate defier,” either; he’s just the most salient.  In general, the model has never done as well with home runs as it does with singles, doubles, and triples.  It seems there are other important determinants of home run hitting that really should be in the model but currently are not.  Intuitively, I would like to include the velocity and angle of the ball off the bat, but so far I have not found a good data source for them.  (Maybe that will change in the coming years as MLBAM releases “Hit F/X” style data?)  Until then, reader suggestions are super welcome here, too.
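The “adjusted” figures in these comments appear to be simple arithmetic: the raw xWOBA plus the player’s fixed-effect estimate.  Here is a minimal sketch that assumes that additive adjustment and uses only numbers from the tables above.

```python
def adjusted_xwoba(raw_xwoba: float, fixed_effect: float) -> float:
    """Raw model prediction plus the player's estimated fixed effect."""
    return raw_xwoba + fixed_effect


# (2014 first-half wOBA, raw xWOBA, fixed-effect estimate), from the tables above
players = {
    "Mark Teixeira": (0.352, 0.397, -0.025),
    "Brian Roberts": (0.304, 0.334, -0.033),
    "Mike Trout": (0.428, 0.381, 0.027),
    "Giancarlo Stanton": (0.397, 0.363, 0.022),
}
for name, (woba, raw, fe) in players.items():
    adj = adjusted_xwoba(raw, fe)
    print(f"{name}: adjusted xWOBA {adj:.3f}, actual minus adjusted {woba - adj:+.3f}")
```

This reproduces the comments: Teixeira is about .020 below his adjusted xWOBA, Roberts is roughly at par, Trout’s adjusted figure is .408, and Stanton’s is .385.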

And now, finally, for the other usage: here’s a partial list of players who have taken either a pronounced step forward or back this season, relative to established norms.

2014 “Decliners”
Name Career wOBA 2014 wOBA 2014 xWOBA
Nick Swisher 0.352 0.285 0.305
Joe Mauer 0.373 0.308 0.340
Allen Craig 0.350 0.289 0.309
Billy Butler 0.352 0.300 0.309
Evan Longoria 0.365 0.315 0.323
Domonic Brown 0.315 0.267 0.267
Chris Davis 0.351 0.306 0.353
Matt Holliday* 0.385 0.342 0.318
Jean Segura 0.299 0.256 0.305
David Wright 0.377 0.335 0.305
Buster Posey 0.366 0.326 0.352
Shin-Soo Choo 0.369 0.333 0.346
Dustin Pedroia 0.356 0.325 0.337
Jed Lowrie 0.327 0.297 0.305
Jay Bruce 0.343 0.315 0.326

2014 “Improvers”
Name Career wOBA 2014 wOBA 2014 xWOBA
Michael Brantley 0.324 0.394 0.404
Lonnie Chisenhall 0.328 0.396 0.359
Seth Smith* 0.334 0.389 0.356
Victor Martinez 0.362 0.416 0.422
Jonathan Lucroy 0.342 0.383 0.354
Anthony Rizzo 0.342 0.382 0.382
Nelson Cruz 0.356 0.393 0.380
Jose Altuve 0.319 0.356 0.325
Brian Dozier 0.311 0.344 0.362
Kyle Seager 0.334 0.367 0.344
Dee Gordon 0.297 0.329 0.318
Alcides Escobar 0.284 0.312 0.300
Casey McGehee 0.321 0.345 0.277

* – To avoid inflation from Coors Field, for these players I’ve taken the total from 2011-13 seasons only

Comments:

  • At least in-sample, Brantley’s breakout seems to be pretty much entirely justified.  Of course this doesn’t mean that he won’t regress somewhat, but if I were to guess, I’m a little more optimistic than ZiPS and Steamer (which currently project .341 and .333 RoS, respectively).  Similar deal for some others.
  • “Yikes” for Billy Butler and Domonic Brown, whose declines this season seem (at least in-sample) to be entirely justified.
  • I’m not sure why the model dislikes Casey McGehee so much.  Obviously his fly ball distance (mentioned earlier) isn’t doing him any favors, and his .369 first-half BABIP is probably unsustainable.  Still, .277 xWOBA?  Seems harsh.

As with any fantasy advice, don’t take any of this too literally…  Take it or leave it as you see fit.

Lastly, although I framed this piece from a fantasy perspective, the overall goal remains the same: I would love to see more work done to de-luck hitter stats, the way people so often do for pitchers.  (FIP for pitchers, and xWOBA or xWRC+ for hitters: that is the dream.)

Reader thoughts on how to improve the model, or requests for players not already mentioned?


Looking at Attendance after Aces are Dealt

As baseball season and the summer months heat up, so too do the trade rumors. Almost every year, baseball media and fans postulate and prognosticate who might be traded before the annual trading deadline.

This year, the big fish on the market is Rays left-hander David Price. With only one year left on his contract, it is unlikely the Rays can afford to keep the former Cy Young Award winner. But with the team winning eight in a row and 19 of their last 24, trading their ace doesn’t seem like a sure deal anymore. The most recent reports say Rays management will wait until the absolute last minute to decide whether, where, and for whom the popular lefty will be traded.

Given the Rays’ small market and modest popularity, some of the talk about trading David Price has turned to attendance. The Rays are currently last in the Major Leagues in attendance, and some are concerned attendance could drop even lower if they traded their best pitcher. Some think Rays fans would read the trade as a message from ownership to wait until next year. And if that’s the message, why not wait until next year to buy a ticket?

To estimate how Rays attendance might react to a possible trade of David Price, I looked at 12 prior trades of ace pitchers over the last 37 years. Via Baseball-Reference.com, I looked at attendance before and after each trade. I also looked at winning percentage before and after.

My goal is to see if two maxims hold true:

  1. Attendance goes up when teams win and goes down when teams lose.
  2. A team that trades its best pitcher will have a worse record after the trade.

Hence, if attendance is attached to winning and ace pitchers are attached to winning, attendance should drop after ace pitchers are traded.

Is this really the case? Or is attendance in some cities more sensitive to major trades than others?

Let’s begin by looking at the granddaddy of superstar pitcher trades: the Tom Seaver trade. On June 15, 1977, after a slight tiff with ownership, the Mets shipped the franchise’s first ace to the Reds for Steve Henderson, Doug Flynn, Pat Zachry, and Dan Norman. The Mets were bad before the trade but worse after it, and attendance followed suit.

Twelve years later, in 1989, two aces were traded during the season. On May 25th, the Mariners moved ace Mark Langston to the Expos for a bevy of prospects headlined by future ace Randy Johnson. Mariners attendance fell by nearly the same amount as Mets attendance had in 1977. Although the Mariners had been playing .500 baseball prior to the trade, their winning percentage dropped significantly afterward.

Two months after the Langston trade, the Minnesota Twins traded 1988 Cy Young Award winner Frank Viola to the Mets for Rick Aguilera, Kevin Tapani, and three other pitchers. The Twins were two games under .500 at the time of the trade, and then played .500 after the trade. Despite their slight improvement, attendance dropped 12.95% after the Viola trade.

We fast-forward to 1998 and another Mariners trade. During the 1998 season, the Mariners dealt the aforementioned Johnson to the Astros for Freddy Garcia, Carlos Guillen, and John Halama. While Johnson immediately did well in Houston, the Mariners played better after his departure, going 28-25 after the trade. Like the 1989 Twins, however, they did not see their improved play lead to higher attendance, as average per-game attendance went down after the trade.

Our next trade is the Bartolo Colon trade in 2002. On June 27, 2002, the Indians shipped Colon and Tim Drew to the Expos for Cliff Lee, Grady Sizemore, Brandon Phillips, and Lee Stevens. The Indians played .467 baseball before the trade and a lesser .447 clip following the deal. Attendance, however, jumped after the trade, up 10.04% over the team’s final 45 games.

We look at Cleveland again in 2008, when the Indians moved CC Sabathia to the Milwaukee Brewers for Michael Brantley, Matt LaPorta, and two other players. After trading Sabathia, the Indians vastly improved their record, going 44-30 the rest of the season. Attendance also went up after the Sabathia trade, from 25,964 to 27,766 per game, an increase of 6.94%.

The 2009 season saw trades of three high-profile pitchers. Two were legitimate aces, and the third was a former ace who might give us insight into how Rays attendance would react.

The first major pitcher trade in 2009 again involved the Indians. On July 29th, the Tribe shipped Cliff Lee and Ben Francisco to Philadelphia for Jason Knapp, Carlos Carrasco, Jason Donald, and Lou Marson. Unlike after the Colon and Sabathia trades, both the Indians’ winning percentage and their per-game attendance decreased after the Lee trade.

Two days after the Indians traded Lee, the San Diego Padres moved right-hander Jake Peavy to the Chicago White Sox for Clayton Richard and three other players. Like the Twins in 1989 and the Mariners in 1998, the Padres played better after moving their ace, finishing the remaining 59 games with a 34-25 record. Unfortunately, also like the ’89 Twins and ’98 Mariners, fewer fans came out to see their now-winning team.

Our final pitcher trade of 2009 occurred on August 29th, when the Rays moved former ace Scott Kazmir to the Angels for Sean Rodriguez, Alex Torres, and Matthew Sweeney. Kazmir was no longer the Rays’ ace in 2009, having handed the title over to James Shields and the up-and-coming David Price. But Kazmir still had name value in the Tampa Bay area despite his decreased effectiveness.

After trading Kazmir, the Rays stumbled to a 15-20 finish. They went from being 4.5 games out of the wild card to finishing 11 games out of the playoffs. Per-game attendance also dropped considerably after the Kazmir trade, from 24,169 to 19,574. This attendance decrease of 19.01% is the biggest drop of any of our surveyed trades.

The next year, two of our most frequent subjects collided when the Mariners traded Cliff Lee. After acquiring Lee in an offseason trade, Seattle sent him to the Rangers for the stretch run. The Mariners had played .400 baseball before trading Lee. Afterward, they played at a .350 clip for the rest of the season, and attendance dropped 4.99% over their last 39 home games.

In 2012, the Brewers were on the dealing side when they sent Zack Greinke to the Angels for Jean Segura and two other players. The Brewers were 10 games under .500 before the trade, but they reversed their fortunes after the deal, going 39-25, a .609 clip. Attendance also increased after Greinke left, though by only 124 fans per game, or 0.3%.

In our final trade, we look at the Chicago Cubs. Before trading Matt Garza on July 22, 2013, the Cubs were 10 games under .500 and averaging exactly 33,000 fans per game. After trading Garza, the Cubs fell to 30 games under .500 and drew 919 fewer fans per game, a 2.78% decrease.
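The attendance changes quoted in this walkthrough are simple percent changes in per-game attendance.  Here is a minimal sketch that recomputes three of them from the before-and-after figures in the text.  The Garza “after” value is 33,000 minus the 919-fan drop.

```python
def pct_change(before: float, after: float) -> float:
    """Percent change in average per-game attendance."""
    return 100.0 * (after - before) / before


# Per-game attendance (before trade, after trade), as reported above.
trades = {
    "2008 Indians (Sabathia)": (25_964, 27_766),
    "2009 Rays (Kazmir)": (24_169, 19_574),
    "2013 Cubs (Garza)": (33_000, 33_000 - 919),
}
for label, (before, after) in trades.items():
    print(f"{label}: {pct_change(before, after):+.2f}%")
```

Rounded to two decimals, these come out to +6.94%, -19.01%, and -2.78%, matching the figures above.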

There are many other trades and fan bases I could have looked at (the Ubaldo Jimenez trade in 2011 comes to mind), but this small sample covers a wide spectrum of possible outcomes from trading an ace pitcher. From the 12 trades we looked at, we found:

  • 50% (6 of 12) saw both record and attendance decrease
  • 25% (3 of 12) saw record improve but attendance decrease
  • 17% (2 of 12) saw both record and attendance increase after trading their ace
  • 8% (1 of 12) saw record decrease but attendance increase
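The four buckets above amount to a tally of two directions per trade.  Here is a minimal sketch of that tally.  The directions (+1 for improved or increased, -1 for worse or decreased) are read from the trade-by-trade descriptions in this article.  The Viola-era Twins count as improved because they went from two games under .500 to .500.

```python
from collections import Counter

# (record direction, attendance direction) after each trade, per the text above:
# +1 = improved/increased, -1 = worsened/decreased.
outcomes = {
    "1977 Mets (Seaver)": (-1, -1),
    "1989 Mariners (Langston)": (-1, -1),
    "1989 Twins (Viola)": (+1, -1),
    "1998 Mariners (Johnson)": (+1, -1),
    "2002 Indians (Colon)": (-1, +1),
    "2008 Indians (Sabathia)": (+1, +1),
    "2009 Indians (Lee)": (-1, -1),
    "2009 Padres (Peavy)": (+1, -1),
    "2009 Rays (Kazmir)": (-1, -1),
    "2010 Mariners (Lee)": (-1, -1),
    "2012 Brewers (Greinke)": (+1, +1),
    "2013 Cubs (Garza)": (-1, -1),
}

LABELS = {
    (-1, -1): "record down, attendance down",
    (+1, -1): "record up, attendance down",
    (+1, +1): "record up, attendance up",
    (-1, +1): "record down, attendance up",
}

counts = Counter(outcomes.values())
n = len(outcomes)
for key, label in LABELS.items():
    print(f"{label}: {counts[key]} of {n} ({100 * counts[key] / n:.0f}%)")
```

With only 12 trades, each one is worth about 8 percentage points, so these shares are a description of the sample rather than a reliable rate.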

The Indians are particularly interesting, seeing a different outcome each time they traded an ace. The Mariners saw an attendance drop after both the Langston and Johnson trades, but they played better after trading Johnson and worse after moving Langston. Perhaps Langston had a bigger effect on the team in 1989 than Johnson did in 1998.

So what would happen if the Rays traded David Price? Given their current winning streak and the attendance sensitivity seen after the Kazmir trade, my initial estimate would put them in the same category as the 1989 Twins, 1998 Mariners, and 2009 Padres: an improved winning percentage but lower attendance. A better record after the trade might not be difficult, considering the Rays’ season began as a disaster marred by injuries to players who are slowly returning (Alex Cobb, Jeremy Hellickson, David DeJesus, and possibly Wil Myers).

But with the Rays struggling to fill seats, moving fan favorite David Price might be a bad public relations move. From the studies I have done, games David Price has pitched in have drawn 6% more than average. That could be because Joe Maddon sometimes aligns the rotation so Price faces prime opponents such as the Yankees and Red Sox, teams that traditionally draw well at Tropicana Field. But some of Price’s “bump” could be the allure of seeing one of the best pitchers in the American League.

My estimate is the Rays would suffer an initial attendance drop if they traded David Price. Games against the Red Sox and Yankees (especially Jeter’s last series in Tampa Bay) will continue to do well. Bobbleheads and other promotions will also do well (expect a good turnout for the Don Zimmer sno-globe). And if the team plays well enough to contend, attendance may recover, but even then, the Rays won’t average over 20,000 per game.

Then again, it’s doubtful they would average 20,000 per game even with David Price in the rotation.


Using Double-A Stats to Predict Future Performance

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
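At prediction time, a probit model passes a weighted sum of the inputs through the standard normal CDF, which turns it into a probability between 0 and 1.  Here is a minimal sketch of that step.  The coefficients and the stat line are hypothetical and chosen only for illustration.  They are not KATOH’s estimates, which come from the R output referenced in this series.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z); the probit model's link function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept: float, coefs: dict[str, float],
                       features: dict[str, float]) -> float:
    """P(reaches majors) = Phi(intercept + sum of coef * feature)."""
    z = intercept + sum(coef * features[name] for name, coef in coefs.items())
    return normal_cdf(z)


# Hypothetical coefficients and stat line, for illustration only.
coefs = {"age": -0.30, "k_rate": -0.08, "iso": 6.0}
prospect = {"age": 21.0, "k_rate": 18.0, "iso": 0.180}
print(f"P(MLB) = {probit_probability(7.0, coefs, prospect):.3f}")
```

In R, fitting such a model would typically use `glm()` with `family = binomial(link = "probit")`.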

Things that were predictive for players in low-A and high-A included age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America in the pre-season. However, a player’s walk rate was not significant in predicting a player’s ascension to the majors. Today, I’ll look into what KATOH has to say about players in double-A leagues. For those interested, here’s the R output based on all players with at least 400 plate appearances in a season in double-A from 1995-2010. Due to varying offensive environments in different years and leagues, all players’ stats were adjusted to reflect his league’s average for that year.

AA Output

Unlike in the A-ball iterations of KATOH, a player’s double-A walk rate is predictive, albeit only slightly, of whether or not he’ll make it to the show. While walk rate is statistically significant, it still matters much less than the other stats: it takes 3 or 4 percentage points of walk rate to match what 1 percentage point of strikeout rate does to a player’s MLB probability.
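Walk rate and strikeout rate both enter the same linear index inside the probit.  That means the trade-off described here is just a ratio of their coefficients: a 1-point rise in K% moves the index by the strikeout coefficient, and it takes |b_K| / b_BB points of BB% to move it back.  Here is a minimal sketch with hypothetical per-percentage-point coefficients.  The actual KATOH values are not reproduced in this text.

```python
def walk_points_to_offset(b_k: float, b_bb: float, k_points: float = 1.0) -> float:
    """Percentage points of BB% that offset `k_points` points of K%.

    Both rates enter the same probit index z, so offsetting the change
    in z from strikeouts only requires matching the coefficient-weighted
    changes.
    """
    if b_bb <= 0:
        raise ValueError("expected a positive walk-rate coefficient")
    return k_points * abs(b_k) / b_bb


# Hypothetical coefficients per percentage point, not KATOH's estimates.
print(f"{walk_points_to_offset(b_k=-0.070, b_bb=0.020):.1f} points of BB% per point of K%")
```

With these made-up values, the ratio is 3.5, which falls within the 3-to-4 range described above.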

This version is also different in that it includes a couple of additional significant terms, shown as the last two coefficients in the above output. The “I(Age^2)” term, a squared age term, adds a little bit of nuance to how a player’s age can predict his future success. The “ISO:BA.Top.100.Prospect” interaction term basically says that if you’re a top 100 prospect, hitting for power is slightly less important than it would be otherwise. Hitting for power and making Baseball America’s top 100 list both make a player much more likely to make it to the majors, but if he does both, he’s a tad less likely to make it than his power output and prospect status would suggest independently. Put another way, a few top 100 prospects hit for power in double-A but never cracked the majors, such as Jason Stokes (.241 ISO), Nick Weglarz (.204 ISO), and Eric Duncan (.173 ISO). But virtually all of the low-power guys made it, including Elvis Andrus (.073 ISO), Luis Castillo (.076 ISO), and Carl Crawford (.078 ISO). Among non-top 100 guys, many more punchless hitters topped out in double-A and triple-A.
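The two extra terms change the shape of the linear index.  The squared age term bends the age effect, and the ISO-by-top-100 interaction shrinks the ISO slope for top 100 prospects.  Here is a minimal sketch that shows both effects.  Every coefficient here is hypothetical and chosen only to mimic the signs described above.  None of them are the KATOH estimates.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF (the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical coefficients; only the signs mirror the description above.
B = {
    "intercept": -9.0,
    "age": 0.9,
    "age_sq": -0.025,      # squared age term: bends the age effect
    "iso": 8.0,
    "top100": 1.2,
    "iso_x_top100": -3.0,  # interaction: power matters less for top 100 guys
}


def linear_index(age: float, iso: float, top100: bool) -> float:
    """Probit index z for a hitter; P(MLB) = normal_cdf(z)."""
    t = 1.0 if top100 else 0.0
    return (B["intercept"]
            + B["age"] * age + B["age_sq"] * age ** 2
            + B["iso"] * iso + B["top100"] * t
            + B["iso_x_top100"] * iso * t)


def iso_slope(top100: bool) -> float:
    """Change in z per 1.000 of ISO: b_iso, plus the interaction if top 100."""
    return B["iso"] + (B["iso_x_top100"] if top100 else 0.0)


for top in (False, True):
    low = normal_cdf(linear_index(22, 0.100, top))
    high = normal_cdf(linear_index(22, 0.200, top))
    print(f"top100={top}: ISO slope {iso_slope(top):.1f}, "
          f"P(MLB) at ISO .100 -> .200: {low:.2f} -> {high:.2f}")
```

With these made-up values, the extra power still helps a top 100 prospect, just less than it helps everyone else.  That is the “tad less likely” pattern described above.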

By clicking here, you can see what KATOH spits out for all current prospects who logged at least 250 PA in double-A as of July 7th, as well as a few who fell short of the cutoff, most notably Joey Gallo, Kevin Plawecki, and Robert Refsnyder. Topping the list is Mookie Betts with a probability of 99.95%, and of course the prophecy was fulfilled when the Red Sox called up the 21-year-old last month. Here’s an excerpt of the top players from double-A this year:

Player Organization Age MLB Probability
Mookie Betts BOS 21 100%
Francisco Lindor CLE 20 100%
Gary Sanchez NYY 21 99%
Austin Hedges SDP 21 99%
Alen Hanson PIT 21 99%
Jorge Bonifacio KCR 21 98%
Blake Swihart BOS 22 98%
Kris Bryant CHC 22 93%
Ketel Marte SEA 20 91%
Rangel Ravelo CHW 22 90%
Robert Refsnyder NYY 23 86%
Jake Lamb ARI 23 85%
Jake Hager TBR 21 84%
Darnell Sweeney LAD 23 83%
Joey Gallo TEX 20 82%
Preston Tucker HOU 23 81%
Scott Schebler LAD 23 79%
Kevin Plawecki NYM 23 79%
Cheslor Cuthbert KCR 21 78%
Kyle Kubitza ATL 23 77%
Michael Taylor WSN 23 76%
Christian Walker BAL 23 76%
Ryan Brett TBR 22 75%

Keep an eye out for the next installment, which will dive into what KATOH says about hitters at the triple-A level.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.


Do Rookie Hitters Decline in the Second Half?

Do rookies perform worse after the All-Star break?

This idea is not my own; it was brought to my attention by Adam Aizer on the CBS Fantasy Baseball Podcast.

Skeptical of the claim, I thought it would be worth the effort to look into its validity.

On the offensive side, rookies rarely make enough of an impact to matter in the league sizes (i.e., 10-team and 12-team leagues) that casual fantasy baseball players play in. In leagues of that size, a rookie hitter worth owning is either an elite prospect or a player who has performed beyond his true talent level. The former is rare. The latter is more common, and it would make sense for those players to regress to their true talent level. The idea that rookie hitters decline over the course of the year is really just a misevaluation of the player’s true talent level.

To put it another way, the same logic applies to a recent event: the Home Run Derby. Players who participate in the Home Run Derby have had exceptional first halves, which are often beyond their true talent level. These players often perform worse in the second half than they did in the first. That isn’t because they participated in the monotonous and dated event the Home Run Derby has become; it’s because they have regressed toward their true talent level. The same is true of rookies who perform worse in the second half than in the first. The difference is that when rookies regress, they often regress to the point where they are no longer ownable.
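The regression-to-the-mean argument can be illustrated with a small simulation.  It rests on made-up assumptions: a league-wide spread of true-talent wOBA, a per-plate-appearance spread of 0.5, and a first-half cutoff that stands in for “good enough to roster.”  It is not a model of real rookies.  Each simulated player has the same true talent in both halves, yet the group selected for a hot first half still “declines.”

```python
import random
import statistics


def simulate_selection(n_players=2000, pa_per_half=250, cutoff=0.350, seed=2014):
    """Compare first/second-half means overall and for players selected on a hot first half.

    Each player has one fixed true talent for the whole season; any
    second-half drop among selected players comes only from selecting
    on a noisy first half.
    """
    rng = random.Random(seed)
    noise_sd = 0.5 / pa_per_half ** 0.5  # hypothetical per-PA spread of 0.5
    first_all, second_all, first_sel, second_sel = [], [], [], []
    for _ in range(n_players):
        talent = rng.gauss(0.315, 0.020)  # hypothetical true-talent wOBA spread
        first = talent + rng.gauss(0.0, noise_sd)
        second = talent + rng.gauss(0.0, noise_sd)
        first_all.append(first)
        second_all.append(second)
        if first >= cutoff:  # the hot first half that gets a rookie rostered
            first_sel.append(first)
            second_sel.append(second)
    return {
        "all": (statistics.mean(first_all), statistics.mean(second_all)),
        "selected": (statistics.mean(first_sel), statistics.mean(second_sel)),
        "n_selected": len(first_sel),
    }


result = simulate_selection()
for group in ("all", "selected"):
    first, second = result[group]
    print(f"{group}: first half {first:.3f}, second half {second:.3f}")
```

Across all simulated players, the two halves come out nearly equal.  Only the selected group shows a second-half drop.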

The research looks at all player seasons between 1988 and 2013 in which a batter was in his first season and had at least 250 plate appearances in both the first and second halves of the season.
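The sample definition above is a simple filter, followed by a paired comparison of each player’s two halves.  Here is a minimal sketch of both steps.  The records are hypothetical, the filter assumes “250 plate appearances” means at least 250, and wOBA is only a stand-in for whatever rate stat the original chart used.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RookieSeason:
    player: str
    year: int
    pa_first: int
    pa_second: int
    woba_first: float   # stand-in rate stat (assumption)
    woba_second: float


def study_sample(seasons, min_pa=250, first_year=1988, last_year=2013):
    """Keep rookie seasons in the year range with at least min_pa PA in each half."""
    return [
        s for s in seasons
        if first_year <= s.year <= last_year
        and s.pa_first >= min_pa
        and s.pa_second >= min_pa
    ]


def mean_second_half_change(sample):
    """Average of (second half minus first half) across the sample."""
    return mean(s.woba_second - s.woba_first for s in sample)


# Hypothetical rookie seasons (made-up names and numbers).
seasons = [
    RookieSeason("Rookie A", 1995, 310, 290, 0.340, 0.325),
    RookieSeason("Rookie B", 2004, 280, 300, 0.310, 0.322),
    RookieSeason("Rookie C", 2010, 200, 320, 0.360, 0.330),  # too few first-half PA
    RookieSeason("Rookie D", 1985, 300, 300, 0.330, 0.300),  # before 1988
]
sample = study_sample(seasons)
print(len(sample), f"{mean_second_half_change(sample):+.4f}")
```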


The rookie second-half decline and the post-Home Run Derby slump intuitively make sense, but intuition is not always right. Through cognitive ease, we rationalize them: “Swinging that hard for that long throws off your timing”; “A rookie is too young to be able to make it through the long hot summer.”

Because most fantasy leagues are small, the only reason a typical rookie was on our team to begin with is that he played beyond his ability in the first half of the season. The rookie who is on our team right now, unless he is a reputable prospect, is probably a safe bet to decline. But taken as a whole, rookies show no decline in performance based on first-half/second-half splits.

Our desire to perceive a decline is just our desire to hold onto our ability as talent evaluators. We “know” that Yangervis Solarte is a great player, and the only reason he hasn’t been able to sustain his performance is that he is a rookie who can’t play out the season: common baseball logic. In actuality, Solarte was not as good as some originally thought, and his true talent was never good enough to be rostered in a 10- or 12-team league.

Summary:

As a generalization, rookie hitters are not good enough to be rostered in 10- or 12-team leagues, and those who are rostered in such leagues generally regress to their true talent level, which is not valuable enough to be ownable.

Devin Jordan is obsessed with statistical analysis, non-fiction literature, and electronic music. If you enjoyed reading him, follow him on Twitter @devinjjordan.


Bringing Bill James’ Famous Arbitration Case to 2014

“I helped prepare arbitration cases for George three straight years in the 1980’s… George had led the American League in errors the first year that we prepared a case for him. We were wondering what to do about that, so I drew up an exhibit entitled ‘What Was the Cost of George Bell’s Errors?’ The exhibit showed that while Bell had led the league in errors with 11, none of the errors had actually cost his team anything. Of the 11 errors, only about three led to unearned runs, all had occurred in games which Toronto had won anyway, and in those three games, Bell had driven in something like seven runs.”

Bill James, The New Bill James Historical Abstract

 

The case that Bill James made for George Bell in 1985, and later informed his readers about when he released his Historical Abstract, always fascinated me. As someone who is a big believer that fielding metrics have a long way to go (especially behind the plate), this arbitration case was my Zihuatanejo, that far away place that always gave me hope that errors were really as pointless a statistic as they seemed.

However, as Bill James points out in the rest of George Bell’s player ranking, the fact that nothing came of Bell’s errors in 1985 (his first arbitration year), as well as 1986 and 1987, when James used the same exhibit, was rather noteworthy. Although errors are definitely not the be all and end all of fielding statistics, one would have to imagine that some ill had to come of them, at some point, right?

With the All-Star break upon us, and sadly no real baseball for the last four days, I finally had the chance to look into how much errors actually cost the erring player’s team. At the halfway point, exactly 20 players had committed 10 or more errors in 2014. Since there was time to kill without baseball on, I decided to pore over some box scores and figure out just how much each of those leading “error-men” had cost their teams. Baseball-Reference’s fielding game logs made it easy to find the games in which each player had made his errors, and the play-by-play made it (usually) straightforward to tell whether an error led to a run.

For this study, I created a chart with columns for all of the parts mentioned in Bill James’s arbitration case: total errors, unearned runs resulting from those errors, games the team lost when that player committed an error, and RBI in the games that were lost. The final column was tweaked slightly because I added one more column, called “true losses,” so it counts RBI only in true losses. A true loss is a game the team lost by a margin equal to or smaller than the number of runs the player’s error cost. For example, if Pedro Alvarez made an error that cost his team three runs, and the Pirates lost 4-3, that would be a true loss. Or, if Derek Dietrich made an error that cost his team one run, and the Marlins lost 3-2, that would also be a true loss. Finally, if the game went to extra innings and was a loss, any error worth one run or more was counted as a true loss. Therefore, if Josh Donaldson committed an error that cost his team only one run and the A’s lost 10-8 in extra innings, that would still count as a true loss, because (hypothetically) the extra innings would never have occurred.
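The “true loss” rule just described is a small decision procedure.  Here is a minimal sketch of it.  The function name and arguments are illustrative, and the checks use the article’s own examples.

```python
def is_true_loss(runs_from_error: int, team_runs: int, opp_runs: int,
                 extra_innings: bool) -> bool:
    """Apply the article's "true loss" rule to one game with an error.

    A loss counts as a true loss if the error cost at least one run and
    either the game went to extra innings or the losing margin was no
    larger than the runs the error cost.
    """
    if team_runs >= opp_runs or runs_from_error < 1:
        return False  # not a loss, or the error cost nothing
    if extra_innings:
        return True   # without the run(s), extras (hypothetically) never happen
    return opp_runs - team_runs <= runs_from_error


# The article's examples:
print(is_true_loss(3, 3, 4, extra_innings=False))   # Alvarez: 3 runs, lost 4-3
print(is_true_loss(1, 2, 3, extra_innings=False))   # Dietrich: 1 run, lost 3-2
print(is_true_loss(1, 8, 10, extra_innings=True))   # Donaldson: 1 run, lost 10-8 in extras
```

All three of these examples print True.  A one-run error in a 5-2 regulation loss would not qualify.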

Now, this is obviously not a foolproof study. There is no way to say for sure that an error that cost one run was any more the cause of the loss than the pitcher who gave up a home run the next inning. It also starts to get into a bit of a messy “Butterfly Effect” situation: there is no way of knowing how the rest of the game (or our lives, bro) would be different if Jose Reyes hadn’t booted that grounder in the fifth inning.

However, it was a fun study to put together, and it can be revealing into how little (or in poor Starlin Castro’s case, how much) errors truly change a game. Here’s the official chart:

What Was the Cost of Player X’s Errors?

Name Errors UER from E Team L’s True L’s RBI in True L’s
Pedro Alvarez 3B 20 11 11 4 4
Josh Donaldson 3B 15 6 5 1 0
Ian Desmond SS 15 10 8 2 2
Asdrubal Cabrera SS 14 12 9 1 0
Jose Reyes SS 13 7 9 2 0
Brandon Crawford SS 13 6 5 0 0
Lonnie Chisenhall 3B 13 6 5 0 0
Everth Cabrera SS 13 7 6 0 0
Brad Miller SS 13 7 5 1 0
Martin Prado 3B 12 13 8 2 2
Jonathan Villar SS 12 14 8 0 0
David Wright 3B 11 5 4 1 0
Starlin Castro SS 11 12 6 5 0
Jean Segura SS 11 8 1 0 0
Elvis Andrus SS 11 7 8 0 0
Yan Gomes C 11 4 6 0 0
Chris Owings SS 11 8 7 2 1
Derek Dietrich 2B 11 6 5 1 0
Jarrod Saltalamacchia C 10 5 7 1 0
Hanley Ramirez SS 10 7 7 1 0

Key: UER from E – unearned runs from errors; Team L’s – team losses; True L’s – true losses (described above); RBI in True L’s – how many RBIs the player had in said True Loss games

 

Let’s tackle this table column by column.

Well, I don’t think a historiography of each player’s name is necessary in today’s article, so let’s skip over to the position column. It is interesting to note how many left-side infielders there are atop the error leaderboard. There’s nobody from the outfield to be found (the “top” outfielder in errors is Sports Illustrated cover boy George Springer, with seven), and only three players don’t hail from third base or shortstop as their main position. One offshoot of this study that could be interesting would be to look at whether there is a correlation between a player’s position on the diamond and how frequently his errors led to runs or “true losses.” My gut instinct would be to guess no, but maybe errors in the outfield are often for more bases, and therefore more likely to lead to a run; that’s just a hypothesis.

Jumping over to the errors column, Alvarez’s 20 errors stand out: the gap between his total and the second-place total is the same as the gap between second place and the bottom of our table. In fact, seeing that high total made me curious about just how many errors it would take to get into the record books. Well, if you’re including the entire history of baseball, the answer is: like a bajillion. Obviously the game was entirely different then, but it’s hard to imagine that Herman Long’s 122 errors in 1889 weren’t embarrassing even back then. The record for errors in a single season since 1952 is 44, by Robin Yount in 1975, and the record since 1980 is 42, by Jose Offerman in 1992. So while Alvarez’s 20 errors may be pacing the league by a good margin now, it’s fair to say he won’t be joining even the modern record books this season.

The next column looks at unearned runs resulting from each player’s errors, and the variance is quite extreme. With a range from only four runs (interestingly, the catchers have two of the lowest unearned run tallies, so maybe that positional study would provide some insight after all) all the way up to 14, there doesn’t seem to be a close connection between the number of errors and the number of unearned runs. For instance, Josh Donaldson has committed three more errors than Jonathan Villar in 2014, but Villar’s errors have led to eight more runs. This raises the question of whether unearned run prevention is simply luck, or whether some teams (and pitchers) respond better after an error is committed in the field.

The A’s are one of baseball’s best teams and have an excellent pitching staff, so it isn’t too surprising that Donaldson’s unearned runs are among the lowest, especially given how many errors he has committed. On the other end of the spectrum are players like Villar and Castro, who play on rebuilding teams, and it is unsurprising to see their names next to some of the highest unearned run totals. However, there is certainly a lot to be said for luck playing a role in how many unearned runs follow an error. For example, teammates Asdrubal Cabrera and Lonnie Chisenhall find themselves on opposite ends of the spectrum in unearned runs after errors, a strong sign of the role random chance plays in unearned run prevention.

One other note on the extreme variance in unearned runs tied to errors: it could also be the result of what kind of error was made. A bobbled ball that never even gets thrown across the infield does only one base of harm, whereas an overthrow (many of Alvarez’s errors) may do two bases of harm. One could also dig deep into this data to see whether younger, less experienced players were more likely to commit errors late in games, when the pressure was ratcheted up, and whether those errors were more likely to be costly. However, the idea of this study is simply to get a feel for another way of looking at errors, and the main point that remains is that there is a lot of luck in whether a player’s error costs his team a run.

There isn’t a whole lot to be said about the team losses column. Committing an error does indeed swing the pendulum (or WPA chart) towards a loss, but so minimally that it wouldn’t even bother one of Poe’s victims. For instance, implying that Jean Segura (only one team loss in games in which he committed an error) timed his errors better than Elvis Andrus (eight team losses in games in which he committed an error) is really just saying that the Brewers are better than the Rangers, which they are, but that doesn’t reflect on the individual player at all. That comparison is especially interesting given that Andrus’ errors have actually led to fewer unearned runs than Segura’s.

The next column, “true losses,” is where the fallacy of the error as a statistic truly shows its colors. The only players who cost their teams more than two wins in the first half (with teams having played well over 90 games so far in 2014) were the league leader in errors, Alvarez, and the incredibly unlucky Starlin Castro. Castro’s case could be an entire article in itself, and the poor timing of his errors is remarkable. The fact that the Cubs have lost only six games in which he has committed an error, and that five of those can be considered “true losses,” is very much a statistical anomaly. Consider that this chart contains 124 team losses outside of Castro’s Cubs. Of those 124 losses, 19 were true losses, or just over 15 percent. In Castro’s case, over 83 percent of his team losses were true losses, an outlier so extreme that it warrants special attention.

Even when including Castro’s remarkable true-loss numbers, the share of losses that could be considered, even hypothetically, the erring player’s fault is merely 18.5 percent, and that’s not even accounting for all the games that teams still won in which one of the listed players committed an error. This is a good time to point out that this study obviously does not take into account any of the good, run-saving plays these fielders make, and even so, the total impact on a team is minimal. As Pedro Alvarez’s row shows, he drove in plenty of runs in the games in which he cost his team, and given his strong range, some of his errors likely came on balls that most third basemen would never have reached, which would have gone for singles anyway. Josh Donaldson and David Wright stand out as particularly strong cases of top-notch fielders who, because of their range, get to more groundballs but often reach them in difficult positions, which increases the likelihood of an error.
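As a quick check, the percentages in the last two paragraphs can be recomputed from the counts quoted in the text (124 team losses outside the Cubs with 19 true losses, plus Castro’s six team losses with five true losses). A minimal Python sketch, using only those figures:

```python
# Counts quoted in the text; nothing else is assumed.
other_team_losses = 124   # team losses outside Castro's Cubs
other_true_losses = 19    # of those, "true losses"
castro_team_losses = 6
castro_true_losses = 5


def pct(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


others = pct(other_true_losses, other_team_losses)
castro = pct(castro_true_losses, castro_team_losses)
combined = pct(other_true_losses + castro_true_losses,
               other_team_losses + castro_team_losses)

print(f"Everyone but Castro: {others:.1f}%")
print(f"Castro alone:        {castro:.1f}%")
print(f"All players:         {combined:.1f}%")
```

Pooling the counts (24 true losses out of 130 team losses) rather than averaging the two percentages is what produces the 18.5 percent figure.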

All of this being said, let’s not take too much away from the potential impact of an error. It is indeed a mistake, and it can hurt the team in more ways than just on the scoreboard. For instance, every error means an extra batter the pitcher has to face, and therefore more pitches on his final pitch count. If the bases were empty before the error, the pitcher now has to pitch out of the stretch, and the threat of a stolen base comes into play. If a certain player is prone to errors, his pitcher may also lose confidence in the defense behind him and get himself in trouble by trying to do too much on the mound. Other fielders may feel they have to cheat toward the error-prone fielder if a mistake is likely, which can throw off a team’s defensive positioning. Finally, for all of us here at FanGraphs who realize the harm in relying too heavily on errors as a statistic, there are still those in baseball who do rely on them, and committing enough errors in the field may lead to a player riding the pine for a few days.

In the end, it’s fair to say that errors are one metric out of many. They have historically been overused, and hopefully the chart above has made it clear that an error frequently won’t cost the team anything.

And if your error did cost your team, well, you’re probably Starlin Castro.


Using High-A Stats to Predict Future Performance

Last week, I looked into how a player’s low-A stats — along with his age and prospect status at the time — can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.

Things that were predictive for players in low-A included age, strikeout rate, ISO, BABIP, and whether or not a player was deemed a top 100 prospect by Baseball America in the pre-season. However, a player’s walk rate was not significant in predicting his ascension to the majors. Today, I’ll analyze what KATOH has to say about players in class-A-advanced leagues. Here’s the R output based on all players with at least 400 plate appearances in a season in high-A from 1995-2009:
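To make the mechanics concrete, here is a minimal Python sketch of how a probit model turns inputs like these into a probability: it forms a weighted sum of the inputs and passes the result through the standard normal CDF. The coefficients, the sample player, and the names `HYPOTHETICAL_COEFS` and `mlb_probability` are illustrative assumptions, not the fitted KATOH model; the real coefficients are the ones in the R output.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# HYPOTHETICAL coefficients chosen for illustration only. They are NOT the
# fitted KATOH values, which come from the R output referenced in the text.
HYPOTHETICAL_COEFS = {
    "intercept": 4.0,
    "age": -0.25,    # older for the level -> lower probability
    "k_rate": -4.0,  # strikeouts per plate appearance
    "iso": 5.0,      # isolated power
    "babip": 3.0,    # batting average on balls in play
    "top100": 1.0,   # 1 if on Baseball America's pre-season top 100, else 0
}


def mlb_probability(player: dict, coefs: dict = HYPOTHETICAL_COEFS) -> float:
    """Probit prediction: P(reaches MLB) = Phi(b0 + sum of b_i * x_i)."""
    z = coefs["intercept"] + sum(
        beta * player[name] for name, beta in coefs.items() if name != "intercept"
    )
    return normal_cdf(z)


# A hypothetical 20-year-old who is not on the top 100 list.
prospect = {"age": 20, "k_rate": 0.20, "iso": 0.15, "babip": 0.32, "top100": 0}
print(f"Not top 100: {mlb_probability(prospect):.2f}")
print(f"Top 100:     {mlb_probability({**prospect, 'top100': 1}):.2f}")
```

The signs are chosen to match common sense (being older for the level and striking out more lower the probability; more power, a higher BABIP, and top 100 status raise it), but the magnitudes are arbitrary.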

High-A Output

This looks very similar to what I found for low-A players: Walk rate isn’t significant, and everything else has very similar effects on the final probability. However, the coefficients from this model are all a tad bigger than those from the low-A version, implying that high-A stats might be a bit more telling of a player’s future. Intuitively, this makes sense: The closer a player is to the big leagues, the more his stats start to reflect his future potential.
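One way to see why slightly bigger coefficients make high-A stats more telling: in a probit model, a change in one input shifts the linear index by the coefficient times that change, so a larger coefficient moves the predicted probability further for the same change in the stat. The sketch below assumes two hypothetical strikeout-rate coefficients (not the fitted values) and a player whose baseline probability is 50%; `probability_shift` and `K_RATE_BETA` are illustrative names.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_shift(beta: float, delta_x: float, baseline_z: float = 0.0) -> float:
    """Change in probit probability when one input moves by delta_x,
    with all other inputs held fixed at a linear index of baseline_z."""
    return normal_cdf(baseline_z + beta * delta_x) - normal_cdf(baseline_z)


# HYPOTHETICAL strikeout-rate coefficients, not the fitted KATOH values;
# the high-A one is simply "a tad bigger" in magnitude than the low-A one.
K_RATE_BETA = {"low-A": -4.0, "high-A": -4.5}

# A five-point jump in strikeout rate for a player starting at 50% (z = 0).
for level, beta in K_RATE_BETA.items():
    print(f"{level}: {probability_shift(beta, 0.05):+.3f}")
```

Because the normal CDF is steepest around 50%, the gap between levels is largest for borderline players and shrinks for players already near 0% or 100%.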

By clicking here, you can see what KATOH spits out for all current prospects who logged at least 250 PAs in high-A as of July 7th. I also included a few notable players who fell short of that threshold, namely Joey Gallo (who checks in at a remarkable 99.8%), Peter O’Brien, and Jesse Winker. Here’s an excerpt of the top-ranking players:

Player             Organization  Age  MLB Probability
Joey Gallo         TEX           20   100%
Corey Seager       LAD           20   99%
Carlos Correa      HOU           19   99%
Albert Almora      CHC           20   93%
Nick Williams      TEX           20   93%
D.J. Peterson      SEA           22   93%
Jesse Winker       CIN           20   91%
Orlando Arcia      MIL           19   88%
Jose Peraza        ATL           20   87%
Colin Moran        MIA           21   87%
Renato Nunez       OAK           20   86%
Tyrone Taylor      MIL           20   85%
Hunter Renfroe     SDP           22   84%
Josh Bell          PIT           21   84%
Raul Mondesi       KCR           18   83%
Daniel Robertson   OAK           20   83%
Jorge Polanco      MIN           20   81%
Dilson Herrera     NYM           20   77%
Breyvic Valera     STL           21   77%
Peter O’Brien      NYY           23   76%
Matt Olson         OAK           20   75%
Jorge Alfaro       TEX           21   75%
Patrick Leonard    TBR           21   75%
Dalton Pompey      TOR           21   73%
Billy McKinney     OAK           19   73%
Teoscar Hernandez  HOU           21   73%
Brandon Nimmo      NYM           21   72%
Jose Rondon        LAA           20   70%
Rio Ruiz           HOU           20   70%
Brandon Drury      ARI           21   70%

Next up will be double-A. Unlike A-ball, double-A tends to be a random mishmash of prospects and minor-league lifers, so it will be interesting to see how KATOH handles this wide array of players. And perhaps double-A is where a player’s walk rate finally starts to tell us something about his future success.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.