Archive for Research

Using Short-Season A Stats to Predict Future Performance

August 3, 2014

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. So far, I’ve analyzed hitters in Rookie leagues, Low-A, High-A, Double-A and Triple-A using a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.

For hitters in Low-A and High-A, age, strikeout rate, ISO, BABIP, and whether or not a player was deemed a top 100 prospect by Baseball America all played a role in forecasting future success. And walk rate, while not predictive for players in Rookie ball, Low-A, or High-A, added a little bit to the model for Double-A and Triple-A hitters. Today, I’ll look into what KATOH has to say about players in Short-Season A-ball. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted to reflect his league’s average for that year. For those interested, here’s the R output based on all players with at least 200 plate appearances in a season in SS A-ball from 1995-2007.
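The post does not spell out how the league adjustment was done. One common approach, shown here purely as an assumption, is to divide a player's rate stat by his league's average for that year, so that 1.0 means league average. The function name and the sample numbers are hypothetical.

```python
def league_adjusted(player_rate: float, league_rate: float) -> float:
    """Express a rate stat relative to league average (1.0 = average).

    Assumption: a simple ratio to the league mean. The original model
    may have used a different adjustment.
    """
    if league_rate <= 0:
        raise ValueError("league_rate must be positive")
    return player_rate / league_rate


# Hypothetical example: the same raw ISO in two different run environments.
hitter_friendly = league_adjusted(0.150, 0.150)   # league ISO of .150
pitcher_friendly = league_adjusted(0.150, 0.100)  # league ISO of .100
print(hitter_friendly, round(pitcher_friendly, 2))
```

Under this kind of adjustment, the same raw number counts for more in the league where offense is harder to come by.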

Just like we saw with hitters in Rookie ball, a player’s Baseball America prospect status couldn’t tell us anything about his future as a big leaguer. This was entirely due to the scarcity of top 100 prospects in the sample, as only a handful of players spent the year in SS A-ball after making BA’s top 100 list. Somewhat surprisingly, walk rate is predictive for players in SS-A, despite being statistically insignificant for hitters in Rookie ball and the more advanced A-ball levels. Another interesting wrinkle is the “Strikeout_Rate:Age” variable. Basically, this says that strikeout rate matters more for younger players than for older players at this level, although frequent strikeouts are obviously a bad thing no matter how old you are.
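To make the interaction term concrete, here is a minimal sketch of how a probit model turns inputs into a probability. The coefficients are invented for illustration and are not the fitted KATOH values; only age and strikeout rate are included, and the strikeout rate is left unadjusted for simplicity.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, the link function a probit model uses."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical coefficients, NOT the actual KATOH estimates.
COEFS = {
    "intercept": 1.0,
    "age": -0.08,
    "strikeout_rate": -6.0,
    "strikeout_rate:age": 0.2,  # interaction term
}


def mlb_probability(age: float, strikeout_rate: float) -> float:
    """Probability of reaching the majors under the toy model."""
    z = (
        COEFS["intercept"]
        + COEFS["age"] * age
        + COEFS["strikeout_rate"] * strikeout_rate
        + COEFS["strikeout_rate:age"] * strikeout_rate * age
    )
    return normal_cdf(z)


def strikeout_slope(age: float) -> float:
    """Effect of strikeout rate on the probit index at a given age."""
    return COEFS["strikeout_rate"] + COEFS["strikeout_rate:age"] * age


for age in (18, 21):
    print(age, round(strikeout_slope(age), 2))
```

With a negative strikeout coefficient and a positive interaction, the slope is negative at every age in the range but steeper for the younger player, which is the pattern the paragraph describes.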

The season is less than 50 games old for most teams in the New York-Penn and Northwest Leagues, which makes it a little premature to start analyzing players’ stats. But just for kicks, here’s a look at what KATOH says about this year’s crop of players with at least 100 plate appearances through July 28th. The full list of players can be found here, and you’ll find an excerpt of those who broke the 40% barrier below:

Player	Organization	Age	MLB Probability
Rowan Wick	STL	21	82%
Eduard Pinto	TEX	19	68%
Marcus Greene	TEX	19	60%
Mauricio Dubon	BOS	19	59%
Franklin Barreto	TOR	18	57%
Christian Arroyo	SFG	19	57%
Skyler Ewing	SFG	21	56%
Taylor Gushue	PIT	20	55%
Domingo Leyba	DET	18	55%
Raudy Read	WSN	20	53%
Nick Longhi	BOS	18	52%
Andrew Reed	HOU	21	52%
Danny Mars	BOS	20	51%
Amed Rosario	NYM	18	49%
Yairo Munoz	OAK	19	48%
Seth Spivey	TEX	21	47%
Mike Gerber	DET	21	47%
Mark Zagunis	CHC	21	47%
Kevin Krause	PIT	21	46%
Leo Castillo	CLE	20	45%
Jordan Luplow	PIT	20	45%
Mason Davis	MIA	21	40%
Kevin Ross	PIT	20	40%
Franklin Navarro	DET	19	40%

As we saw with Rookie league hitters, KATOH doesn’t think any of these players are shoo-ins to make it to the majors. Even Rowan Wick, who hit a Bondsian .378/.475/.815 before getting promoted, gets just 82%. This goes to show that SS A-ball stats just aren’t all that meaningful.

Once the season’s over, I’ll re-run everything using the final 2014 stats, which will give us a better sense of which prospects had the most promising years statistically. I also plan to engineer an alternative methodology — to supplement this one — that will take into account how a player performs in the majors, rather than his just getting there. Additionally, I hope to create something similar for projecting pitchers based on their statistical performance. In the meantime, I’ll apply the KATOH model to historical prospects and highlight some of its biggest “hits” and “misses” from years past. Keep an eye out for the next post in the coming days.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.

Pitch Win Values for Starting Pitchers – July 2014

by Stats All Folks

August 2, 2014

Introduction

A couple months back, I introduced a new method of calculating pitch values using a FIP-based WAR methodology. That post details the basic framework of these calculations and can be found here. The May and June updates can be found here and here respectively. This post is simply the July 2014 update of the same data. What follows is predominantly data-heavy but should still provide useful talking points for discussion. Let’s dive in and see what we can find. Please note that the same caveats apply as in previous months. We’re at the mercy of pitch classification. I’m sure your favorite pitcher doesn’t throw that pitch that has been rated as incredibly below average, but we have to go off of the data that is available. Also, Baseball Prospectus’s PitchF/x leaderboards list only nine pitches (Four-Seam Fastball, Sinker, Cutter, Splitter, Curveball, Slider, Changeup, Screwball, and Knuckleball). Anything that may be classified outside of these categories is not included. Also, anything classified as a “slow curve” is not included in Baseball Prospectus’s curveball data.

Constants

Before we begin, we must first update the constants used in calculation for July. As a refresher, we need three different constants for calculation: strikes per strikeout, balls per walk, and a FIP constant to bring the values onto the right scale. We will tackle them each individually.

First, let’s discuss the strikeout constant. In July, there were 47,449 strikes thrown by starting pitchers. Of these 47,449 strikes, 4,585 were turned into hits and 13,750 outs were recorded. Of these 13,750 outs, 3,725 were converted via the strikeout, leaving us with 10,025 ball-in-play outs. 10,025 ball-in-play outs and 4,585 hits sum to 14,610 balls-in-play. Subtracting 14,610 balls-in-play from our original 47,449 strikes leaves us with 32,839 strikes to distribute over our 3,725 strikeouts. That’s a ratio of 8.82 strikes per strikeout. This is exactly the same as our 8.82 strikes per strikeout in June.
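The arithmetic above can be reproduced in a few lines. This is only a sketch of the steps as described in the paragraph; the function and variable names are my own.

```python
def strikes_per_strikeout(strikes: int, hits: int, outs: int, strikeouts: int) -> float:
    """Strikes left over after removing balls in play, per strikeout."""
    ball_in_play_outs = outs - strikeouts          # outs not recorded by strikeout
    balls_in_play = ball_in_play_outs + hits       # every strike that was put in play
    remaining_strikes = strikes - balls_in_play    # strikes to spread over strikeouts
    return remaining_strikes / strikeouts


# July 2014 totals for starting pitchers, as quoted in the text.
july_k_constant = strikes_per_strikeout(
    strikes=47_449, hits=4_585, outs=13_750, strikeouts=3_725
)
print(round(july_k_constant, 2))
```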

The next two constants are much easier to ascertain. In July, there were 26,244 balls thrown by starters and 1,328 walked batters. That’s a ratio of 19.76 balls per walk, up from 19.36 balls per walk in June. This data would suggest that hitters were slightly less likely to walk in July than previously. The FIP subtotal for all pitches in July was 0.52. The MLB Run Average for July was 4.17, meaning our FIP constant for July is 3.65.
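The same kind of check works for the other two constants. The sketch below assumes, as the paragraph implies, that the FIP constant is simply the league run average minus the FIP subtotal; the names are mine.

```python
# July 2014 totals for starting pitchers, as quoted in the text.
balls_thrown = 26_244
walks = 1_328
fip_subtotal = 0.52
mlb_run_average = 4.17

balls_per_walk = balls_thrown / walks
fip_constant = mlb_run_average - fip_subtotal  # shifts FIP onto the RA scale

print(round(balls_per_walk, 2), round(fip_constant, 2))
```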

Constant	Value
Strikes/K	8.82
Balls/BB	19.76
cFIP	3.65

The following table details how the constants have changed month-to-month.

Month	K	BB	cFIP
March/April	8.47	18.50	3.68
May	8.88	18.77	3.58
June	8.82	19.36	3.59
July	8.82	19.76	3.65

Pitch Values – July 2014

For reference, the following table details the FIP for each pitch type in the month of July.

Pitch	FIP
Four-Seam	4.06
Sinker	4.20
Cutter	4.42
Splitter	3.50
Curveball	4.08
Slider	3.87
Changeup	4.79
Screwball	3.58
Knuckleball	3.97
MLB RA	4.16

As we can see, only three pitches would be classified as below average for the month of July: sinkers, cutters, and changeups. Four-Seam Fastballs and curveballs also came in right around league average. Pitchers that were able to stand out in these categories tended to have better overall months than pitchers who excelled at the other pitches. Now, let’s proceed to the data for the month of July.
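As a quick check on that reading of the table, the sketch below compares each pitch type's FIP with the MLB RA row. The numbers are copied from the table above; a FIP above the league figure is treated as below average.

```python
# Pitch-type FIP for July 2014, copied from the table above.
PITCH_FIP = {
    "Four-Seam": 4.06,
    "Sinker": 4.20,
    "Cutter": 4.42,
    "Splitter": 3.50,
    "Curveball": 4.08,
    "Slider": 3.87,
    "Changeup": 4.79,
    "Screwball": 3.58,
    "Knuckleball": 3.97,
}
MLB_RA = 4.16  # the "MLB RA" row of the same table

# A higher FIP than the league run average means a below-average pitch type.
below_average = sorted(p for p, fip in PITCH_FIP.items() if fip > MLB_RA)

# Distance from league average, closest first.
gap_from_average = sorted(
    ((round(abs(fip - MLB_RA), 2), p) for p, fip in PITCH_FIP.items())
)

print(below_average)
print(gap_from_average[:3])
```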

Four-Seam Fastball

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Ian Kennedy	0.6	180	Brad Peacock	-0.3
2	Clayton Kershaw	0.6	181	Jake Odorizzi	-0.3
3	Jose Quintana	0.6	182	Jason Hammel	-0.3
4	Drew Hutchison	0.5	183	Edwin Jackson	-0.3
5	Jacob deGrom	0.5	184	Chris Young	-0.3

Sinker

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Brandon McCarthy	0.4	167	Chase Whitley	-0.2
2	Roberto Hernandez	0.4	168	Andrew Heaney	-0.2
3	Doug Fister	0.4	169	Jon Niese	-0.2
4	Hisashi Iwakuma	0.4	170	David Buchanan	-0.2
5	Wade Miley	0.3	171	Nick Tepesch	-0.3

Cutter

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Josh Collmenter	0.3	77	Brandon McCarthy	-0.2
2	Jon Lester	0.3	78	Drew Smyly	-0.2
3	Kevin Correia	0.2	79	Brandon Workman	-0.2
4	Jarred Cosart	0.2	80	Dan Haren	-0.3
5	Adam Wainwright	0.2	81	Hector Noesi	-0.4

Splitter

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Hisashi Iwakuma	0.3	27	Daisuke Matsuzaka	0.0
2	Hiroki Kuroda	0.3	28	Ubaldo Jimenez	0.0
3	Jake Odorizzi	0.2	29	Tim Lincecum	-0.1
4	Alex Cobb	0.2	30	Doug Fister	-0.1
5	Tim Hudson	0.2	31	Clay Buchholz	-0.1

Curveball

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Sonny Gray	0.3	155	Hiroki Kuroda	-0.1
2	Clay Buchholz	0.2	156	Josh Tomlin	-0.2
3	Jesse Hahn	0.2	157	Kevin Correia	-0.2
4	Adam Wainwright	0.2	158	Eric Stults	-0.3
5	Jose Quintana	0.2	159	Josh Beckett	-0.3

Slider

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Garrett Richards	0.5	125	Jair Jurrjens	-0.1
2	Tyson Ross	0.4	126	Jason Lane	-0.1
3	Jake Arrieta	0.3	127	Jake Buchanan	-0.1
4	Brett Anderson	0.3	128	Matt Cain	-0.1
5	Kyle Lohse	0.3	129	C.J. Wilson	-0.1

Changeup

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Cole Hamels	0.3	156	Rubby de la Rosa	-0.2
2	David Price	0.3	157	David Holmberg	-0.2
3	Chris Sale	0.2	158	Mike Minor	-0.2
4	Zack Greinke	0.2	159	Jeff Locke	-0.3
5	James Shields	0.2	160	Drew Hutchison	-0.4

Screwball

Rank	Pitcher	Pitch Value
1	Trevor Bauer	0.0
2	Julio Teheran	0.0
3	Hector Santiago	0.0

Knuckleball

Rank	Pitcher	Pitch Value
1	R.A. Dickey	0.4

Overall

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Cole Hamels	1.0	187	Jair Jurrjens	-0.4
2	Jacob deGrom	0.9	188	Erik Bedard	-0.4
3	Tyson Ross	0.9	189	Jason Hammel	-0.4
4	Jose Quintana	0.9	190	Brad Peacock	-0.4
5	Chris Sale	0.9	191	Nick Tepesch	-0.4

Pitch Ratings – July 2014

Four-Seam Fastball

Rank	Pitcher	Pitch Rating	Rank	Pitcher	Pitch Rating
1	Drew Hutchison	59	83	Jake Odorizzi	38
2	Jose Quintana	59	84	Jake Peavy	38
3	Cole Hamels	58	85	Josh Tomlin	36
4	Mark Buehrle	58	86	Brad Peacock	35
5	Tim Lincecum	58	87	Jason Hammel	34

Sinker

Rank	Pitcher	Pitch Rating	Rank	Pitcher	Pitch Rating
1	Travis Wood	58	73	Kevin Correia	36
2	Scott Kazmir	57	74	John Danks	36
3	Matt Garza	57	75	Jeff Samardzija	35
4	Brandon McCarthy	57	76	Dan Haren	32
5	Doug Fister	57	77	Nick Tepesch	25

Cutter

Rank	Pitcher	Pitch Rating	Rank	Pitcher	Pitch Rating
1	Marcus Stroman	58	32	Mike Minor	33
2	Jon Lester	58	33	Tim Hudson	33
3	Daisuke Matsuzaka	57	34	Brandon McCarthy	32
4	Phil Hughes	57	35	Dan Haren	28
5	Franklin Morales	57	36	Hector Noesi	20

Splitter

Rank	Pitcher	Pitch Rating	Rank	Pitcher	Pitch Rating
1	Tim Hudson	57	8	Jorge de la Rosa	53
2	Kyle Kendrick	56	9	Alfredo Simon	53
3	Hisashi Iwakuma	56	10	Jeff Samardzija	53
4	Kevin Gausman	56	11	Alex Cobb	52
5	Hiroki Kuroda	56	12	Tim Lincecum	42

Curveball

Rank	Pitcher	Pitch Rating	Rank	Pitcher	Pitch Rating
1	Jacob deGrom	59	65	Franklin Morales	38
2	Felix Hernandez	59	66	Chase Anderson	38
3	Clay Buchholz	58	67	Jered Weaver	37
4	Brandon McCarthy	58	68	Kevin Correia	26
5	David Phelps	58	69	Josh Beckett	20

Slider

Rank	Pitcher	Pitch Rating	Rank	Pitcher	Pitch Rating
1	Jordan Zimmermann	59	55	Zack Wheeler	44
2	Brett Anderson	59	56	Miles Mikolas	43
3	Wei-Yin Chen	58	57	Miguel Gonzalez	42
4	Kyle Lohse	58	58	Carlos Martinez	40
5	Corey Kluber	58	59	Yu Darvish	39

Changeup

Rank	Pitcher	Pitch Rating	Rank	Pitcher	Pitch Rating
1	Chase Whitley	60	65	Jeff Locke	30
2	Cole Hamels	59	66	Joe Kelly	27
3	Chase Anderson	59	67	Rubby de la Rosa	26
4	Hector Santiago	58	68(t)	Drew Hutchison	20
5	Jered Weaver	57	68(t)	Mike Minor	20

Screwball

Rank	Pitcher	Pitch Rating
1	Trevor Bauer	52

Knuckleball

Rank	Pitcher	Pitch Rating
1	R.A. Dickey	52

Monthly Discussion

As we can see, Cole Hamels takes the top spot for this month due to the strength of his overall repertoire. Hamels was classified as throwing five different pitches in July (Four-Seam, Sinker, Cutter, Curveball, and Changeup) and managed to earn at least 0.1 WAR from all five. The most valuable pitch overall in July was Ian Kennedy’s Four-Seam Fastball. The least valuable was Drew Hutchison’s Changeup. As far as offspeed pitches, Garrett Richards’s 0.5 WAR from his slider led the way. The least valuable fastball was Hector Noesi’s cutter.

On our 20-80 scale pitch ratings, the highest rated qualifying pitch was Chase Whitley’s changeup. The lowest rated pitches were the changeups thrown by Drew Hutchison and Mike Minor, Hector Noesi’s cutter, and Josh Beckett’s curveball. The highest rated fastball was Drew Hutchison’s four-seam fastball.

Pitch Values – 2014 Season

Four-Seam Fastball

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Ian Kennedy	1.9	247	Masahiro Tanaka	-0.4
2	Jose Quintana	1.7	248	Dan Straily	-0.4
3	Phil Hughes	1.6	249	Nick Martinez	-0.4
4	Jordan Zimmermann	1.6	250	Juan Nicasio	-0.4
5	Clayton Kershaw	1.5	251	Marco Estrada	-0.7

Sinker

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Charlie Morton	1.5	236	John Danks	-0.3
2	Felix Hernandez	1.3	237	Wandy Rodriguez	-0.3
3	David Price	1.1	238	Vidal Nuno	-0.3
4	Chris Archer	1.1	239	Nick Tepesch	-0.4
5	Cliff Lee	1.1	240	Andrew Heaney	-0.4

Cutter

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Madison Bumgarner	1.2	110	Dan Haren	-0.2
2	Adam Wainwright	1.2	111	Felipe Paulino	-0.2
3	Corey Kluber	1.2	112	Hector Noesi	-0.3
4	Jarred Cosart	1.2	113	C.J. Wilson	-0.3
5	Josh Collmenter	1.0	114	Brandon McCarthy	-0.5

Splitter

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Masahiro Tanaka	0.8	32	Jake Peavy	-0.1
2	Alex Cobb	0.6	33	Franklin Morales	-0.2
3	Hisashi Iwakuma	0.6	34	Miguel Gonzalez	-0.2
4	Hiroki Kuroda	0.6	35	Danny Salazar	-0.2
5	Tim Hudson	0.4	36	Clay Buchholz	-0.4

Curveball

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Sonny Gray	1.1	210	Homer Bailey	-0.2
2	A.J. Burnett	0.9	211	Alfredo Simon	-0.2
3	Brandon McCarthy	0.8	212	Felipe Paulino	-0.3
4	Adam Wainwright	0.7	213	Franklin Morales	-0.3
5	Jose Fernandez	0.6	214	Eric Stults	-0.4

Slider

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Garrett Richards	1.3	179	Roberto Hernandez	-0.2
2	Tyson Ross	1.1	180	Liam Hendriks	-0.2
3	Kyle Lohse	0.8	181	Erasmo Ramirez	-0.3
4	Corey Kluber	0.8	182	Danny Salazar	-0.3
5	Ervin Santana	0.8	183	Travis Wood	-0.4

Changeup

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Felix Hernandez	0.9	232	Wandy Rodriguez	-0.4
2	Stephen Strasburg	0.6	233	Matt Cain	-0.4
3	Cole Hamels	0.6	234	Jordan Zimmermann	-0.5
4	Chris Sale	0.5	235	Drew Hutchison	-0.6
5	Roberto Hernandez	0.5	236	Marco Estrada	-0.6

Screwball

Rank	Pitcher	Pitch Value
1	Trevor Bauer	0.1
2	Alfredo Simon	0.0
3	Hector Santiago	0.0
4	Julio Teheran	0.0

Knuckleball

Rank	Pitcher	Pitch Value
1	R.A. Dickey	1.2
2	C.J. Wilson	0.0

Overall

Rank	Pitcher	Pitch Value	Rank	Pitcher	Pitch Value
1	Felix Hernandez	3.5	254	Felipe Paulino	-0.5
2	Adam Wainwright	3.2	255	Juan Nicasio	-0.5
3	Garrett Richards	2.9	256	Nick Martinez	-0.6
4	Corey Kluber	2.9	257	Wandy Rodriguez	-0.8
5	Jose Quintana	2.7	258	Marco Estrada	-1.2

Year-to-Date Discussion

If we look at the year-to-date numbers, AL FIP and MLB WAR leader Felix Hernandez still sits in the top spot. Current MLB FIP leader Clayton Kershaw ranks ninth. The least valuable starter has been Marco Estrada. On a per-pitch basis, the most valuable pitch has been Ian Kennedy’s four-seam fastball. The most valuable offspeed pitch has been Garrett Richards’s slider. The least valuable pitch has been Marco Estrada’s four-seam fastball. The least valuable offspeed pitch has been Marco Estrada’s changeup. Needless to say, it’s been a rough year for Marco. Qualitatively, I feel fairly encouraged by the year-to-date results so far. The leaderboard is topped by two no-doubt aces, both of whom are currently in the top two in their respective leagues in FIP, and Marco Estrada comes in at the bottom after posting the highest FIP among qualified starters so far. For reference, the top five in the year-to-date overall rankings are currently 1st, 12th, 10th, 2nd, and 9th on the FanGraphs WAR leaderboards respectively.

Using Rookie League Stats to Predict Future Performance

by Chris Mitchell

August 2, 2014

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there. In the future, I plan to engineer an alternative methodology to go along with this one, that takes into account how a player performs in the majors, rather than his just getting there.

For hitters in Low-A and High-A, age, strikeout rate, ISO, BABIP, and whether or not a player was deemed a top 100 prospect by Baseball America all played a role in forecasting future success. And walk rate, while not predictive for players in A-ball, added a little bit to the model for Double-A and Triple-A hitters. Today, I’ll look into what KATOH has to say about players in Rookie leagues. Due to varying offensive environments in different years and leagues, each player’s stats were adjusted to reflect his league’s average for that year. For those interested, here’s the R output based on all players with at least 200 plate appearances in a season in Rookie ball from 1995-2007.

Just like we saw with hitters in the A-ball leagues, a player’s walk rate is not at all predictive of whether or not he’ll crack the majors. Unlike at all of the other levels I’ve looked at so far, a player’s Baseball America prospect status couldn’t tell us anything about his future as a big-leaguer. This was entirely due to the scarcity of top-100 prospects in the sample, as only a handful of players spent the year in rookie ball after making BA’s top-100 list.

The season is less than 40 games old for most rookie league teams, which makes it a little premature to start analyzing players’ stats. But just for kicks, here’s a look at what KATOH says about this year’s crop of rookie-ballers with at least 80 plate appearances through July 28th. This only considers players in the American rookie leagues (the Appalachian, Arizona, Gulf Coast, and Pioneer Leagues), meaning it excludes the Dominican and Venezuelan Summer Leagues. The full list of players can be found here, and you’ll find an excerpt of those who broke the 40% barrier below:

Player	Organization	Age	MLB Probability
Kevin Padlo	COL	17	73%
Bobby Bradley	CLE	18	67%
Alex Verdugo	LAD	18	65%
Luke Dykstra	ATL	18	64%
Yu-Cheng Chang	CLE	18	59%
Magneuris Sierra	STL	18	56%
Juan Santana	HOU	19	54%
Joshua Morgan	TEX	18	50%
Jason Martin	HOU	18	49%
Edmundo Sosa	STL	18	48%
Oliver Caraballo	TEX	19	46%
Sthervin Matos	MIL	20	46%
Alexander Palma	NYY	18	45%
Eloy Jimenez	CHC	17	45%
Javier Guerra	BOS	18	44%
Zach Shepherd	DET	18	44%
Tito Polo	PIT	19	44%
Jose Godoy	STL	19	43%
Henry Castillo	ARI	19	42%
David Gonzalez	DET	20	42%
Dan Jansen	TOR	19	42%
Max George	COL	18	42%
Gleyber Torres	CHC	17	42%
Luis Guzman	WSN	18	41%
Jose Martinez	KCR	17	41%
Alex Jackson	SEA	18	40%
Emmanuel Tapia	CLE	18	40%

What stands out most is that KATOH doesn’t think any of these players are shoo-ins to make it to the majors. Even those who are hitting the snot out of the ball get probabilities that fall short of what we saw for unremarkable performances in Double-A. Kevin Padlo, for example, gets just a 73%, despite hitting a ridiculous .317/.463/.619 as a 17-year-old. It’s hard to do much better than that. I think this really speaks to how little rookie ball stats matter in the grand scheme of things. A good offensive showing is obviously better than a poor one, but numbers from this level need to be taken with a huge grain of salt. A hitter’s performance against pitchers who are fresh out of high school just can’t tell us much about how he’ll fare when matched up against more advanced pitching at the higher levels.

Next up, I’ll complete the series by looking at stats from short-season A-ball. Teams at that level are also only a few weeks into their season, but at the very least, it will be interesting to see how KATOH feels about SS A-ballers in general. Next week, I’ll apply the KATOH model to historical prospects and highlight some of its biggest “hits” and “misses” from the past.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.

Sonny Gray, Perfecting What Works

by Owen Watson

August 1, 2014

With his final turn in the rotation for July completed, we’ve now had almost exactly one full year of Sonny Gray – one year of the 24-year-old starting pitcher, the up-and-coming staff ace, the dueler of Playoff Verlanders. In that year, we’ve seen him do some great things, like going eight innings with nine Ks and no runs against the Tigers in Game 2 of the 2013 ALDS. We’ve also seen MLB Fan Cave forcing him to prank New Yorkers as a result of some unknown fine print embedded in his rookie contract. Above all else, the one thing we’ve always known is that Sonny Gray has a really good curveball. Let’s take a look at it for all of its 12 to 6, 80-MPH Uncle Charlie glory, from a game against the Astros in August of last year:

Gray_Curve_Early_2

How good is his curveball? He has never given up a home run off of the pitch, with the only extra-base hits against the curve in his career being four doubles. In the past calendar year, Sonny Gray has saved more runs with his curveball than any other pitcher in baseball, and is behind only Corey Kluber and Yu Darvish in Runs Saved/100 curveballs. Having watched Kluber a lot, I suspect his slider/slurve is actually being classified as a curveball; I think it looks like a slider, but PITCHf/x doesn’t, so I will defer to the all-knowing pitch computer. Regardless, with the metrics we’re about to examine, Sonny Gray has one of the best curveballs in the game. What we’re going to focus on specifically are the advances in his curve’s effectiveness, spurred on by an adjustment in the way he throws the pitch.

To start, let’s take a look at the top-15 starters by wCB and wCB/C for the past calendar year:

wCB_Leaders

As stated before, Gray is at the top in both of these categories. We should put a little more stock into wCB/C, as it normalizes all pitchers to runs saved per 100 pitches, taking away the advantage that one player might have due to throwing a certain pitch more frequently than another player. This is important for what we’re looking at, because Sonny Gray throws a lot of curveballs. How frequently does he throw curveballs? Here are the leaders for percentage of curveballs thrown over the last calendar year:
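The normalization is simple enough to sketch. The function below scales total runs saved to a per-100-pitches rate; the two pitchers and their totals are hypothetical and chosen only to show why the rate stat can reorder a leaderboard.

```python
def runs_saved_per_100(total_runs_saved: float, pitches_thrown: int) -> float:
    """Scale a cumulative pitch value (such as wCB) to runs per 100 pitches."""
    if pitches_thrown <= 0:
        raise ValueError("pitches_thrown must be positive")
    return 100.0 * total_runs_saved / pitches_thrown


# Hypothetical pitchers, not real data.
heavy_user = runs_saved_per_100(total_runs_saved=12.0, pitches_thrown=800)
light_user = runs_saved_per_100(total_runs_saved=6.0, pitches_thrown=300)

print(heavy_user, light_user)
```

The heavy user saves twice as many total runs, yet the light user's curveball is the more effective pitch on a per-pitch basis.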

[Image: leaderboard of curveball percentage thrown over the last calendar year]

The words “second only to Scott Feldman” don’t come up very often, but here they are. Gray throws his curveball a ton. Not only has he always leaned on the curve as a major weapon in his arsenal, but he has actually increased his curveball usage every month since he came into the league, except for May (when he maintained his % thrown) and June of this year, when he seemed to temporarily lose a feel for the pitch and threw more changeups. However, his first start of July had Gray saying this after holding Toronto to one run over seven innings:

“That was the idea, to really get (it) going again,” Gray said of the curveball. “I think the last five or six starts it’s been OK, but it hasn’t been a big factor. We did some things a little different this week and I was able to find that again.”

Over the last 30 days, Gray has thrown the curveball more than ever, up to over 32% for the month. Not only that, he has found more effectiveness in the pitch, with his whiff % on the curve up to a career-best 19.2% during July. There’s also reason to believe that this isn’t simply a good month for Sonny Gray’s curveball – what we are now seeing is the fruition of a change of approach with the way he throws the pitch that has been coming for some time now. Let’s take a look.

Here we have the release speed of Sonny Gray’s curveball for every start since he was called up:

He’s throwing the curve harder than he ever has, adding over three miles per hour since he started pitching in the majors. That’s not a small change. On top of the speed increase, he’s cut about 2.5 inches of vertical movement off his curve between his first start in the majors and now:

Finally, he’s added more three-dimensional depth to his curve in the form of top-three horizontal movement over the past calendar year. Only Corey Kluber and Charlie Morton have had better horizontal movement on their curves in that time period.

Add all of that up, and we have this 84-MPH curve from his last start against the Orioles:

Gray_Curve_Late_2

It now looks more like a slurve, with its high release speed and nasty late break away from right-handed hitters. As Eno Sarris included in his great article from October of last year, Gray said he “adds and subtracts” with the same grip on his curve to move between the 12-to-6 and slurve (which is sometimes classified as a slider) varieties. However, it seems as if he has leaned more toward the slurve option as time has gone on.

One question that arises out of this is “why throw the slurve more?”

Given his whiff % on the curve has increased as he has added velocity, I’d say that fact alone has supported the move to the slurve over the 12-to-6. However, there’s another potential reason that isn’t strictly rooted in statistics, and could be more about what goes into an elite pitching approach: by increasing his arm speed and flattening out the vertical movement of his curve, Gray can further deceive batters into thinking he’s throwing hard pitches before the bottom drops out. His struggles to find consistency with the changeup are well documented, so why shouldn’t he adjust his best breaking pitch to better fool hitters for whiffs and weak contact? As we’ve seen with Yu Darvish, the pinnacle of an ace approach may be one that includes a “great convergence” of arm slots and release points, in which every pitch looks hard until it’s not, or until it is.

Gray’s horizontal release points for all of his pitches are closer to one another than they ever have been during his major league career. Not surprisingly, his curveball and fastball were released on average at almost the identical horizontal point during his May and July starts, when he posted career-best whiff rates on his curveball (18.6% & 19.2%, respectively). June was an aberration, as Gray seemed to lose his release point in general and was tinkering with his delivery, leaning more on the changeup:

Release_Points

Sonny Gray has work to do on parts of his game before he takes the next step into the true elite of starting pitchers. His walk rate has actually increased this year to 8.5%, owing mostly to a lack of fastball command in deep counts, and his changeup is still very much a work in progress as a third pitch. However, his adoption of the hard curve and syncing of arm angles is a positive step toward dominance, and is a sign that he knows what works; he’s now perfecting it.

And now, my first go at a DShep Darvish-like GIF of Sonny Gray’s 12-to-6 curve from last August along with his harder slurve from his last start to compare:

xHitting (Part 4): 2014 Fantasy Edition!

by samyoung

July 29, 2014

Welcome to the fourth installment of xHitting! As always, reader comments and feedback are super encouraged and appreciated. (Links to parts one, two, and three)

Briefly recapping the method, the gist is to estimate the expected rate of each individual hit type based on a player’s underlying peripherals, and in turn recover all the needed components to compute expected versions of wOBA, OPS, etc. The only real change to the model since last time is that I now utilize a “hybrid” predicted home run rate, that averages between actual and (raw) predicted home run rate, with the weight given to actual HR rate increasing in the number of plate appearances. (This is explained in part three, for those curious.)
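For readers who want to see the shape of that “hybrid” rate, here is a minimal sketch. The actual weighting is explained in part three and is not reproduced here; the weight PA / (PA + k) and the value k = 300 are assumptions picked only because they increase with plate appearances.

```python
def hybrid_hr_rate(actual: float, predicted: float, pa: int, k: float = 300.0) -> float:
    """Blend actual and raw predicted HR rate, trusting actual more as PA grows.

    Assumption: weight on actual = pa / (pa + k). Both the functional form
    and k are hypothetical stand-ins for the model's real weighting.
    """
    if pa < 0:
        raise ValueError("pa must be non-negative")
    weight_actual = pa / (pa + k)
    return weight_actual * actual + (1.0 - weight_actual) * predicted


# Hypothetical hitter: 4% actual HR rate against a 2% raw prediction.
for pa in (0, 300, 3000):
    print(pa, round(hybrid_hr_rate(0.04, 0.02, pa), 4))
```

With no plate appearances the blend is just the raw prediction, and it drifts toward the actual rate as the sample grows.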

Perhaps the more exciting change, though, is that this time I actually have results for an ongoing season, which potentially can help for fantasy purposes. (Not that most readers need my help necessarily.) Related to fantasy usage, there were a few requests to see a full spreadsheet of past results (2010-2013 seasons), which I have posted here. Again feel free to take it or leave it at your leisure.

Note: I collected most of these data at the All-Star Break, so numbers may be a few weeks behind, but they’re still mostly true. Also, for time considerations I only fetched 2014 stats for qualified leaders. This even leaves out a few big names, but I couldn’t justify time to fetch every player.

So far, I’ve typically posted the biggest “over-” and “under”-achievers for a given season. And I suppose I’ll continue that tradition today. But while these lists are useful for highlighting which players seem most likely to regress, they overlook another main use of the model, which is to assess the realness of a player’s apparent “breakout” or “decline,” at least in-sample. (In some cases, the model may think that a player’s breakout is entirely justified, given peripherals, while others it may view more skeptically.) Thus, today I’ll also post a second list, of players who seem to have taken a pronounced step forward/step back this season, and what the model thinks of their season-to-date performance.

Okay, time for results! I’ll start with the list of “over-” and “underachievers.”

2014 Underachievers (1st half)				2014 Overachievers (1st half)
Name	wOBA	xWOBA	Diff	Name	wOBA	xWOBA	Diff
Jean Segura	0.256	0.305	-0.049	Casey McGehee	0.345	0.277	0.068
Chris Davis	0.306	0.353	-0.047	Yasiel Puig	0.398	0.340	0.058
Mark Teixeira	0.352	0.397	-0.045	Matt Adams	0.376	0.324	0.052
Gerardo Parra	0.289	0.327	-0.038	Mike Trout	0.428	0.381	0.047
Brian McCann	0.298	0.330	-0.032	Marcell Ozuna	0.343	0.300	0.043
Torii Hunter	0.323	0.355	-0.032	Lonnie Chisenhall	0.396	0.359	0.037
Joe Mauer	0.308	0.340	-0.032	Scooter Gennett	0.355	0.320	0.035
Jimmy Rollins	0.320	0.352	-0.032	Marlon Byrd	0.344	0.309	0.035
Brian Roberts	0.304	0.334	-0.030	Giancarlo Stanton	0.397	0.363	0.034
Buster Posey	0.326	0.352	-0.026	Hunter Pence	0.359	0.325	0.034

Having worked with this model for a while now, I notice a general pattern: some players give the model trouble and have a disproportionate tendency to appear on these lists from year to year. A few of them appear here… more on that later.

Partly for that reason, I wouldn’t necessarily say to “buy low” the guys on the left, nor “sell high” the guys on the right; although you can if you want. I won’t address every player, but I have some scattered comments:

For readers who prefer OPS, .020 wOBA translates to about .050 OPS, on the margin.
.397 predicted for Teixeira? Not sure where that came from…
Poor Segura. All things considered, I think nobody deserves a big second half more than he does.
Whatever happened to Casey McGehee’s power? The guy once hit 23 home runs in a season, but now has an ISO of .073, with surprisingly low fly ball distance.
Although Chisenhall’s breakout is not as impressive if you take out what the model thinks is luck, it’s still a pretty impressive improvement.
Chris Davis is sort of the reverse of Chisenhall. Adding back in what the model thinks has been bad luck, he’s still way down from what he did last year, but not nearly as disappointing as he probably has been to many owners thus far.

As mentioned, certain players do seem to be able to over/underperform the model somewhat consistently; the same way we think some pitchers are usually better or worse than their FIP. With now 4.5 years of data to work with, however, I think I can make educated guesses about which players systematically deviate from the model predictions. I’ll term this deviation the “player fixed effect.”

(Requiring at least 1000 PA from 2010 through 2014 first half)

Model loves too much		Model loves too little

Name	Player FE estimate (wOBA)	Name	Player FE estimate (wOBA)
Brian Roberts	-0.033	Wilson Betemit	0.032
Todd Helton	-0.026	Brandon Moss	0.032
Jean Segura	-0.026	Ryan Sweeney	0.028
Jose Lopez	-0.025	Mike Trout	0.027
Mark Teixeira	-0.025	Peter Bourjos	0.026
Russell Martin	-0.024	Matt Carpenter	0.025
Darwin Barney	-0.023	Brandon Belt	0.025
Chris Getz	-0.023	Melky Cabrera	0.025
Jimmy Rollins	-0.021	Carlos Ruiz	0.024
Jason Bay	-0.020	Chris Johnson	0.024

Comments:

Again, .020 wOBA is equivalent to about .050 OPS, on the margin.
Taking out their apparent fixed effect, Teixeira is only underperforming his xWOBA by about .020, and Brian Roberts is actually doing about par.
On the reverse side, Mike Trout’s “adjusted” xWOBA jumps up to .408, and it probably doesn’t surprise us that he’s outperforming even that, since he’s Mike Trout. And although Giancarlo Stanton misses the Top 10 cutoff above, his apparent fixed effect of +.022 would be 11th, so his “adjusted” xWOBA is more like .385.
Yasiel Puig (.058) would also be on the list of “positive fixed effects” if we relaxed the PA requirement (he has 826 during this time). And Matt Adams (~.040) might also be well on his way to that list, although he has even fewer plate appearances than Puig.
I don’t really have good explanations/know any common themes for players with negative fixed effects. Maybe readers can help?
For Trout, home runs are pretty clearly the area where the model underestimates him. In any given season (2010-2014), he hits about twice as many HR as the model thinks he should in the “raw” prediction.
And Trout’s not the only “HR rate defier,” either; just the most salient. In general, the model has never done as well with home runs as it does with singles, doubles, and triples. It seems there are other important determinants of home run hitting that really should be in the model, but currently are not. Intuitively, I sort of would like velocity and angle of the ball off the bat, but so far have not found a good data source to actually include these. (Maybe that will change in the coming years as MLBAM releases “Hit F/X” style data?) Until then, reader suggestions are also super welcome here.
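
For readers who want to reproduce the “adjusted” figures in the comments above, here is a minimal sketch of the arithmetic: the adjusted xWOBA is just the raw xWOBA plus the player’s estimated fixed effect. The numbers are copied from the tables and comments in this post; the function and variable names are my own and are not part of the underlying model.

```python
# Sketch: apply a player's estimated fixed effect to the raw xWOBA.
# All values are taken from the tables and comments in this post.
raw_xwoba = {
    "Mark Teixeira": 0.397,
    "Brian Roberts": 0.334,
    "Mike Trout": 0.381,
    "Giancarlo Stanton": 0.363,
}
fixed_effect = {
    "Mark Teixeira": -0.025,
    "Brian Roberts": -0.033,
    "Mike Trout": 0.027,
    "Giancarlo Stanton": 0.022,
}
actual_woba = {
    "Mark Teixeira": 0.352,
    "Brian Roberts": 0.304,
    "Mike Trout": 0.428,
    "Giancarlo Stanton": 0.397,
}


def adjusted_xwoba(raw, effect):
    """Raw model prediction plus the player's fixed effect, to 3 decimals."""
    return round(raw + effect, 3)


for name, raw in raw_xwoba.items():
    adjusted = adjusted_xwoba(raw, fixed_effect[name])
    # Positive gap: the player is still beating the adjusted prediction.
    gap = round(actual_woba[name] - adjusted, 3)
    print(f"{name}: adjusted xWOBA {adjusted:.3f}, actual minus adjusted {gap:+.3f}")
```

Teixeira’s remaining gap of about .020 and Roberts landing near par both fall out of the same subtraction.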

And now, finally, for the other usage: here’s a partial list of players who have taken either a pronounced step forward or back this season, relative to established norms.

2014 “Decliners”				2014 “Improvers”
Name	Career wOBA	2014 wOBA	2014 xWOBA	Name	Career wOBA	2014 wOBA	2014 xWOBA
Nick Swisher	0.352	0.285	0.305	Michael Brantley	0.324	0.394	0.404
Joe Mauer	0.373	0.308	0.340	Lonnie Chisenhall	0.328	0.396	0.359
Allen Craig	0.350	0.289	0.309	Seth Smith*	0.334	0.389	0.356
Billy Butler	0.352	0.300	0.309	Victor Martinez	0.362	0.416	0.422
Evan Longoria	0.365	0.315	0.323	Jonathan Lucroy	0.342	0.383	0.354
Domonic Brown	0.315	0.267	0.267	Anthony Rizzo	0.342	0.382	0.382
Chris Davis	0.351	0.306	0.353	Nelson Cruz	0.356	0.393	0.380
Matt Holliday*	0.385	0.342	0.318	Jose Altuve	0.319	0.356	0.325
Jean Segura	0.299	0.256	0.305	Brian Dozier	0.311	0.344	0.362
David Wright	0.377	0.335	0.305	Kyle Seager	0.334	0.367	0.344
Buster Posey	0.366	0.326	0.352	Dee Gordon	0.297	0.329	0.318
Shin-Soo Choo	0.369	0.333	0.346	Alcides Escobar	0.284	0.312	0.300
Dustin Pedroia	0.356	0.325	0.337	Casey McGehee	0.321	0.345	0.277
Jed Lowrie	0.327	0.297	0.305
Jay Bruce	0.343	0.315	0.326

* – To avoid inflation from Coors Field, for these players I’ve taken the total from 2011-13 seasons only

Comments:

At least in-sample, Brantley’s breakout seems to be pretty much entirely justified. Of course this doesn’t mean that he won’t regress somewhat, but if I were to guess, I’m a little more optimistic than ZiPS and Steamer (which currently project .341 and .333 RoS, respectively). Similar deal for some others.
“Yikes” for Billy Butler and Domonic Brown, whose declines this season seem (at least in-sample) to be entirely justified.
I’m not sure why the model dislikes Casey McGehee so much. Obviously his fly ball distance (mentioned earlier) isn’t doing him any favors, and his .369 first-half BABIP is probably unsustainable. Still, .277 xWOBA? Seems harsh.

As with any fantasy advice, don’t take any of this too literally… Take it or leave it as you see fit.

Lastly, although I hyped this piece from a fantasy perspective, the overall goal remains that I would love to see more work done to de-luck hitter stats, the way people do so often for pitchers. (FIP for pitchers, and xWOBA or xWRC+ for hitters! Is the dream.)

Reader thoughts on how to improve the model, or requests for players not already mentioned?

Looking at Attendance after Aces are Dealt

by Michael Lortz

July 27, 2014

As baseball season and the summer months heat up, so too do the trade rumors. Almost every year, baseball media and fans postulate and prognosticate who might be traded before the annual trading deadline.

This year, the big fish on the market is Rays left-hander David Price. With only one year left on his contract, it is unlikely the Rays can afford to keep the former Cy Young Award Winner. But with the team winning eight in a row and 19 of their last 24, trading their ace doesn’t seem like a sure deal anymore. Most recent reports say the Rays management will wait until the absolute last minute to make a decision on if, where, and for whom the popular lefty will be traded.

Given the Rays’ popularity and market, some of the talk about trading David Price has wound into the realm of attendance. The Rays are currently last in the Major Leagues in attendance, and some are concerned attendance could drop even lower if they traded their best pitcher. There are those who think Rays fans would consider the trade a message from ownership to wait until next year. And if that’s the message, why not wait until next year to buy a ticket?

To estimate how Rays attendance might react to a possible trade of David Price, I looked at 12 prior trades of ace pitchers over the last 37 years. Via Baseball-Reference.com, I looked at attendance before and after each trade. I also looked at winning percentage before and after.

My goal is to see if two maxims hold true:

Attendance goes up when teams win and goes down when teams lose.
A team that trades its best pitcher will have a worse record after the trade.

Hence, if attendance is attached to winning and ace pitchers are attached to winning, attendance should drop after ace pitchers are traded.

Is this really the case? Or is attendance in some cities more sensitive to major trades than others?

Let’s begin by looking at the granddaddy of superstar pitcher trades: the Tom Seaver trade. On June 15, 1977, after a slight tiff with ownership, the Mets shipped the franchise’s first ace to the Reds for Steve Henderson, Doug Flynn, Pat Zachry, and Dan Norman. The Mets were bad before the trade but worse after, and attendance followed suit.

Twelve years later, in 1989, two aces were traded during the season. On May 25th, the Mariners moved ace Mark Langston to the Expos for a bevy of prospects headlined by future ace Randy Johnson. Mariners fans reduced their attendance by nearly the same amount Mets fans did in 1977. Although the Mariners were playing .500 baseball prior to the trade, their winning percentage dropped significantly afterward.

Two months after the Langston trade, the Minnesota Twins traded 1988 Cy Young Award winner Frank Viola to the Mets for Rick Aguilera, Kevin Tapani, and three other pitchers. The Twins were two games under .500 at the time of the trade, and then played .500 after the trade. Despite their slight improvement, attendance dropped 12.95% after the Viola trade.

We fast-forward to 1998 and another Mariners trade. During the 1998 season, the Mariners dealt the aforementioned Johnson to the Astros for Freddy Garcia, Carlos Guillen, and John Halama. While Johnson immediately did well in Houston, the Mariners played better after his departure, going 28-25 after the trade. Like the 1989 Twins, however, the positive play did not lead to an increase in attendance, as the average per-game attendance went down after the trade.

Our next trade is the Bartolo Colon trade in 2002. On June 27, 2002, the Indians shipped Colon and Tim Drew to the Expos for Cliff Lee, Grady Sizemore, Brandon Phillips, and Lee Stevens. The Indians played .467 baseball before the trade and a lesser .447 clip following the deal. Attendance, however, jumped after the trade, up 10.04% over the team’s final 45 games.

We look at Cleveland again in 2008, when the Indians moved CC Sabathia to the Milwaukee Brewers for Michael Brantley, Matt LaPorta, and three other players. After trading Sabathia, the Indians vastly improved their record, finishing the season 44-30. Attendance also went up after the Sabathia trade, from 25,964 to 27,766 per game, an increase of 6.94%.

The 2009 season saw the trade of three high-profile pitchers. Two were legitimate aces, and the other was a former ace who might give us insight into a Rays attendance prediction.

The first major pitcher trade in 2009 again involved the Indians. On July 29th, the Tribe shipped Cliff Lee and Ben Francisco to Philadelphia for Jason Knapp, Carlos Carrasco, Jason Donald and Lou Marson. Unlike after the Colon or Sabathia trades, the Indians’ winning percentage and attendance per game both decreased following the Lee trade.

Two days after the Indians traded Lee, the San Diego Padres moved right-hander Jake Peavy to the Chicago White Sox for Clayton Richard and three other players. Like the Twins in 1989 and the Mariners in 1998, the Padres played better after moving their ace, finishing the remaining 59 games with a 34-25 record. Unfortunately, also like the ’89 Twins and ’98 Mariners, fewer fans came out to see their now-winning team.

Our final pitcher trade of 2009 occurred on August 29th, when the Rays moved former ace Scott Kazmir to the Angels for Sean Rodriguez, Alex Torres, and Matthew Sweeney. Kazmir was no longer the Rays’ ace in 2009, having handed over the title to James Shields and the up-and-coming David Price. But Kazmir still had name value in the Tampa Bay area, despite his decreased effectiveness.

After trading Kazmir, the Rays stumbled to a 15-20 finish. They went from being 4.5 games out of the wildcard to finishing 11 games out of the playoffs. Per game attendance following the Kazmir trade also dropped considerably, from 24,169 per game to 19,574 per game. This attendance decrease of 19.01% is the biggest drop of any of our surveyed trades.

The next year, two of our most frequent subjects collided when the Mariners traded Cliff Lee. After signing with Seattle in the offseason, Lee was sent to the Rangers for the stretch run. After the trade, the Mariners, who had played .400 baseball prior to trading Lee, finished the season with a .350 winning percentage and saw attendance drop 4.99% over the last 39 home games.

In 2012, the Brewers were on the dealing side when they sent Zack Greinke to the Angels for Jean Segura and two other players. While the Brewers were 10 games under .500 before the trade, they reversed fortune after the deal, going 39-25, a .609 clip. Attendance also increased after moving Greinke, albeit by 124 fans per game, or only 0.3%.

In our final trade, we look at the Chicago Cubs. Prior to trading Matt Garza on July 22, 2013, the Cubs were 10 games under .500 and averaging exactly 33,000 fans per game. After trading Garza, the Cubs dropped to 30 games under .500 and lost 919 fans per game in the seats, a 2.78% decrease.
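
The percentage changes quoted for these trades are simple before-and-after comparisons of average attendance per game. A minimal sketch of that calculation, using only the three trades for which this article gives raw per-game figures (the function name and the dictionary layout are my own):

```python
def pct_change(before, after):
    """Percent change in average per-game attendance after a trade."""
    return (after - before) / before * 100


# (attendance per game before trade, attendance per game after trade)
per_game = {
    "2008 Indians (Sabathia)": (25964, 27766),
    "2009 Rays (Kazmir)": (24169, 19574),
    "2013 Cubs (Garza)": (33000, 33000 - 919),
}

for trade, (before, after) in per_game.items():
    print(f"{trade}: {pct_change(before, after):+.2f}%")
```

The same function would apply to winning percentage before and after a trade, which is the other comparison used throughout.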

There are many other trades and fanbases I could have looked at (the Ubaldo Jimenez trade in 2011 comes to mind), but this small sample set gives a wide spectrum of possible outcomes resulting from trading an ace pitcher. From what we looked at, we found:

50% of the data set decreased in both record and attendance
25% increased in record and decreased in attendance
17% increased in both record and attendance after trading their ace
8% decreased in record but increased in attendance
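
The breakdown above can be rebuilt by tagging each of the 12 trades with the direction of its record and attendance change, as described in the write-ups. This is a rough sketch; the tuple layout and labels are mine, and the True/False flags only encode the directions stated in the text, not the underlying Baseball-Reference numbers.

```python
from collections import Counter

# (trading team and ace, record improved?, attendance improved?)
trades = [
    ("1977 Mets (Seaver)", False, False),
    ("1989 Mariners (Langston)", False, False),
    ("1989 Twins (Viola)", True, False),
    ("1998 Mariners (Johnson)", True, False),
    ("2002 Indians (Colon)", False, True),
    ("2008 Indians (Sabathia)", True, True),
    ("2009 Indians (Lee)", False, False),
    ("2009 Padres (Peavy)", True, False),
    ("2009 Rays (Kazmir)", False, False),
    ("2010 Mariners (Lee)", False, False),
    ("2012 Brewers (Greinke)", True, True),
    ("2013 Cubs (Garza)", False, False),
]

# Count how many trades fall into each (record, attendance) combination.
counts = Counter((record_up, attendance_up) for _, record_up, attendance_up in trades)

for (record_up, attendance_up), n in counts.most_common():
    record = "record up" if record_up else "record down"
    attendance = "attendance up" if attendance_up else "attendance down"
    print(f"{record}, {attendance}: {n} of {len(trades)} ({100 * n / len(trades):.1f}%)")
```

With only 12 trades, each one moves a category by more than eight percentage points, so the shares are best read as counts.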

The Indians are particularly interesting, seeing a different outcome each time they traded an ace. The Mariners saw an attendance drop after both the Langston and Johnson trades but played better after trading Johnson and worse after moving Langston. Perhaps Langston had a bigger effect on the team in 1989 than Johnson did in 1998.

So what would happen if the Rays traded David Price? Given their current winning streak and the attendance sensitivity seen after the Kazmir trade, my initial estimate would have them in the same category as the 1989 Twins, 2009 Padres, and 1998 Mariners – an improved winning percentage but lower attendance. A better record post-trade might not be difficult considering the beginning of the Rays season was a disaster marred by injured players who are slowly returning (Alex Cobb, Jeremy Hellickson, David DeJesus, and possibly Wil Myers).

But with the Rays struggling to fill seats, moving fan favorite David Price might be a bad public relations move. From the studies I have done, games David Price has pitched in have drawn 6% more than average. That could be because Joe Maddon sometimes aligns the rotation so Price faces prime opponents such as the Yankees and Red Sox, teams that traditionally draw well at Tropicana Field. But some of Price’s “bump” could be the allure of seeing one of the best pitchers in the American League.

My estimate is the Rays would suffer an initial attendance drop if they traded David Price. Games against the Red Sox and Yankees (especially Jeter’s last series in Tampa Bay) will continue to do well. Bobbleheads and other promotions will also do well (expect a good turnout for the Don Zimmer sno-globe). And if the team plays well enough to contend, attendance may recover, but even then, the Rays won’t average over 20,000 per game.

Then again, doubtful they would draw 20K on average even with David Price in the rotation.

Using Double-A Stats to Predict Future Performance

by Chris Mitchell

July 23, 2014

Over the last couple of weeks, I’ve been looking into how a player’s stats, age, and prospect status can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes, such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.

Things that were predictive for players in low-A and high-A included age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America in the pre-season. However, a player’s walk rate was not significant in predicting a player’s ascension to the majors. Today, I’ll look into what KATOH has to say about players in double-A leagues. For those interested, here’s the R output based on all players with at least 400 plate appearances in a season in double-A from 1995-2010. Due to varying offensive environments in different years and leagues, all players’ stats were adjusted to reflect his league’s average for that year.

Unlike in the A-ball iterations of KATOH, a player’s double-A walk rate is predictive — albeit only slightly — of whether or not he’ll make it to the show. While walk rate is statistically significant, it still matters much less than the other stats: it takes 3 or 4 percentage points on a player’s walk rate to match what 1 percentage point of strikeout rate does to a player’s MLB probability.

This version is also different in that there are a couple of significant interaction terms, signified by the last two coefficients in the above output. The “I(Age^2)” term adds a little bit of nuance to how a player’s age can predict his future success, while the “ISO:BA.Top.100.Prospect” term basically says that if you’re a top 100 prospect, hitting for power is slightly less important than it would be otherwise. Hitting for power and making Baseball America’s top 100 list both make a player much more likely to make it to the majors, but if he does both, he’s a tad less likely to make it than his power output and prospect status would suggest independently. Put another way, a few top 100 prospects hit for power in double-A but never cracked the majors, such as Jason Stokes (.241 ISO), Nick Weglarz (.204 ISO) and Eric Duncan (.173 ISO). But virtually all of the low-power guys made it, including Elvis Andrus (.073 ISO), Luis Castillo (.076 ISO), and Carl Crawford (.078 ISO). For non-top 100 guys, many more punchless hitters topped out in double-A and triple-A.
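
To make the mechanics a little more concrete, here is a rough sketch of how a probit model with these terms turns a player’s inputs into a probability: the inputs are combined into a linear score, and the standard normal CDF maps that score onto a 0-to-1 scale. Every coefficient below is a made-up placeholder chosen only to have a plausible sign; none of them are the fitted KATOH values, and the inputs are treated as plain rates rather than the league-adjusted stats the real model uses.

```python
import math

# HYPOTHETICAL coefficients for illustration only; not the fitted KATOH output.
COEF = {
    "intercept": 17.0,
    "age": -1.2,
    "age_sq": 0.02,          # stands in for the I(Age^2) term
    "k_rate": -7.0,
    "bb_rate": 2.0,          # about 3.5x weaker than strikeout rate, per the text
    "iso": 6.0,
    "babip": 3.0,
    "top100": 1.5,
    "iso_x_top100": -3.0,    # stands in for the ISO:BA.Top.100.Prospect term
}


def normal_cdf(z):
    """Standard normal CDF, the link function in a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_score(age, k_rate, bb_rate, iso, babip, top100):
    """Linear predictor: intercept plus coefficient-weighted inputs."""
    flag = 1.0 if top100 else 0.0
    return (
        COEF["intercept"]
        + COEF["age"] * age
        + COEF["age_sq"] * age ** 2
        + COEF["k_rate"] * k_rate
        + COEF["bb_rate"] * bb_rate
        + COEF["iso"] * iso
        + COEF["babip"] * babip
        + COEF["top100"] * flag
        + COEF["iso_x_top100"] * iso * flag
    )


def mlb_probability(age, k_rate, bb_rate, iso, babip, top100):
    """Probability of reaching the majors under the placeholder coefficients."""
    return normal_cdf(linear_score(age, k_rate, bb_rate, iso, babip, top100))


if __name__ == "__main__":
    for top100 in (False, True):
        p = mlb_probability(22, 0.20, 0.10, 0.150, 0.320, top100)
        print(f"top 100 prospect = {top100}: {p:.1%}")
```

Because the interaction coefficient is negative, an extra point of ISO moves the score less for a top 100 prospect than for anyone else, which is the behavior described above.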

By clicking here, you can see what KATOH spits out for all current prospects who logged at least 250 PAs in double-A as of July 7th, as well as a few who fell short of the cutoff, most notably Joey Gallo, Kevin Plawecki, and Robert Refsnyder. Topping the list is Mookie Betts with a probability of 99.95%, and of course the prophecy was fulfilled when the Red Sox called up the 21-year-old last month. Here’s an excerpt of the top players from double-A this year:

Player	Organization	Age	MLB Probability
Mookie Betts	BOS	21	100%
Francisco Lindor	CLE	20	100%
Gary Sanchez	NYY	21	99%
Austin Hedges	SDP	21	99%
Alen Hanson	PIT	21	99%
Jorge Bonifacio	KCR	21	98%
Blake Swihart	BOS	22	98%
Kris Bryant	CHC	22	93%
Ketel Marte	SEA	20	91%
Rangel Ravelo	CHW	22	90%
Robert Refsnyder	NYY	23	86%
Jake Lamb	ARI	23	85%
Jake Hager	TBR	21	84%
Darnell Sweeney	LAD	23	83%
Joey Gallo	TEX	20	82%
Preston Tucker	HOU	23	81%
Scott Schebler	LAD	23	79%
Kevin Plawecki	NYM	23	79%
Cheslor Cuthbert	KCR	21	78%
Kyle Kubitza	ATL	23	77%
Michael Taylor	WSN	23	76%
Christian Walker	BAL	23	76%
Ryan Brett	TBR	22	75%

Keep an eye out for the next installment, which will dive into what KATOH says about hitters at the triple-A level.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; Pre-season prospect lists courtesy of Baseball America.

Do Rookie Hitters Decline in the Second Half?

by Devin Jordan

July 23, 2014

Do rookies perform worse after the All-Star break?

I can’t claim this idea as my own; it was brought to my attention by Adam Aizer on the CBS Fantasy Baseball Podcast.

Not persuaded either way, I thought it would be worth the effort to look into the validity of the statement.

From the perspective of an offensive player, rookies infrequently make enough of an impact in the league sizes (i.e. 10-team and 12-team leagues) that pedestrian Fantasy Baseball players occupy. In leagues of that size, a rookie hitter who is worth owning is either an elite prospect or a player who has performed beyond his true talent level. The former is rare. The latter is more common, and it would make sense for him to regress to his true talent level. The idea that rookie hitters decline throughout the year is just a misevaluation of the player’s true talent level.

To put it another way, it is the same logic that comes into play with a recent event: the Home Run Derby. Players who participate in the Home Run Derby are players who have had exceptional first halves, which are often beyond their true talent level. These players often perform worse in the second half than they did in the first half, not because they participated in the monotonous and dated event that the Home Run Derby has become, but because, just like the rookies who perform worse in the second half of the season than the first, they have regressed toward their true talent level. When the rookies regress, they have just regressed to the point where they are not ownable.
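
The selection effect described here is easy to see in a toy simulation. The sketch below is not the study’s data: it invents a population of hitters with a fixed true talent, adds independent random noise to each half, and then looks only at the hitters whose first halves were good enough to get them “owned.” The talent and noise spreads, the 10% ownership cutoff, and the seed are all assumptions picked for illustration.

```python
import random
import statistics


def simulate(n_players=5000, owned_share=0.10, seed=2014):
    """Toy regression-to-the-mean simulation with made-up talent and noise."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        talent = rng.gauss(0.310, 0.020)         # true-talent wOBA (assumed)
        first = talent + rng.gauss(0.0, 0.030)   # first-half result = talent + luck
        second = talent + rng.gauss(0.0, 0.030)  # second-half result, fresh luck
        players.append((first, second))

    # "Owned" hitters: the best first halves, regardless of true talent.
    players.sort(key=lambda p: p[0], reverse=True)
    owned = players[: int(n_players * owned_share)]

    return {
        "all_first": statistics.mean(p[0] for p in players),
        "all_second": statistics.mean(p[1] for p in players),
        "owned_first": statistics.mean(p[0] for p in owned),
        "owned_second": statistics.mean(p[1] for p in owned),
    }


if __name__ == "__main__":
    for label, value in simulate().items():
        print(f"{label}: {value:.3f}")
```

In this setup the full population shows essentially no first-half/second-half difference, while the owned group drops off noticeably, even though nobody’s talent changed at the break.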

The research looks at all player seasons between 1988 and 2013 where a batter was in their first season, had 250 plate appearances in the first half of the season, and had 250 plate appearances in the second half of the season.

[Figure: Screen Shot 2014-07-20 at 8.48.48 PM]

The rookie second half decline and the post Home Run Derby slump intuitively make sense, but intuition does not always bear truth. Through cognitive ease we rationalize that “Swinging that hard for that long throws off your timing”; “A rookie is too young to be able to make it through the long hot summer.”

Because most fantasy leagues are small, the only reason that the common rookie was on our teams to begin with is because they had to play beyond their ability in the first half of the season. The rookie who is on our team right now, unless he is a reputable prospect, is probably a safe bet to decline. But as a whole, we can see that there is no decline in rookie performance based on first half/second half splits.

Our desire to perceive a decline is just our desire to hold onto our ability as talent evaluators. We know that Yangervis Solarte is a great player, and the only reason he hasn’t been able to sustain his performance is because he is a rookie who can’t play out the season: common baseball logic. In actuality, Solarte was not as good as some originally thought, and his true talent was never good enough to be on a team in a 10- or 12-team league.

Summary:

Rookie hitters, as a generalization, are not good enough to play in 10- or 12-team leagues, and, as a generalization, those that do play in 10-team leagues regress to their true talent level, which is not valuable enough to be ownable.

Devin Jordan is obsessed with statistical analysis, non-fiction literature, and electronic music. If you enjoyed reading him, follow him on Twitter @devinjjordan.

Bringing Bill James’ Famous Arbitration Case to 2014

by Jim_Turvey

July 20, 2014

“I helped prepare arbitration cases for George three straight years in the 1980’s… George had led the American League in errors the first year that we prepared a case for him. We were wondering what to do about that, so I drew up an exhibit entitled ‘What Was the Cost of George Bell’s Errors?’ The exhibit showed that while Bell had led the league in errors with 11, none of the errors had actually cost his team anything. Of the 11 errors, only about three led to unearned runs, all had occurred in games which Toronto had won anyway, and in those three games, Bell had driven in something like seven runs.”

Bill James, The New Bill James Historical Abstract

The case that Bill James made for George Bell in 1985, and later informed his readers about when he released his Historical Abstract, always fascinated me. As someone who is a big believer that fielding metrics have a long way to go (especially behind the plate), this arbitration case was my Zihuatanejo, that far away place that always gave me hope that errors were really as pointless a statistic as they seemed.

However, as Bill James points out in the rest of George Bell’s player ranking, the fact that nothing came of Bell’s errors in 1985 (his first arbitration year), as well as 1986 and 1987, when James used the same exhibit, was rather noteworthy. Although errors are definitely not the be all and end all of fielding statistics, one would have to imagine that some ill had to come of them, at some point, right?

With the All-Star break upon us, and sadly no real baseball for the last four days, the chance to finally look into how much errors actually cost the erring player’s team presented itself. At the halfway point, there were exactly 20 players who had committed 10 or more errors in 2014. Since there was time to kill without baseball on, I decided to pore over some box scores and figure out just how much each of those leading “error-men” had cost their teams. Using Baseball-Reference’s fielding game logs, it was easy to find the games in which each player had made his errors, and going through the play-by-play then made it (usually) straightforward to tell whether an error led to a run or not.

For this study, I created a chart with columns for all of the parts mentioned in Bill James’ arbitration case: total errors, unearned runs as a result of those errors, games that the team lost when that player committed an error, and RBI in those games that were lost. The final column (RBI in games lost) was tweaked a tiny bit due to the inclusion of one other column, called “true losses.” This is the number of games the team lost by a margin equal to or smaller than the number of runs the player’s error cost the team. For example, if Pedro Alvarez made an error that cost his team three runs, and the Pirates lost 4-3, that would be a true loss. Or, if Derek Dietrich made an error that cost his team one run, and the Marlins lost 3-2, that would also be a true loss. Finally, if the game went to extra innings and was a loss, any error worth one run or more was counted as a true loss. Therefore, if Josh Donaldson committed an error which cost his team only one run and then the A’s lost 10-8, but that final came in extra innings, then that would still count as a true loss because the extra innings would have never occurred (hypothetically).
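
The “true loss” rule described above can be written down as a small function. This is only a sketch of the definition as I’ve stated it; the function name and argument names are made up for illustration, and the actual tallying for this study was done by hand from the play-by-play.

```python
def is_true_loss(runs_for, runs_against, error_runs, extra_innings=False):
    """Return True if a loss can (hypothetically) be pinned on the error.

    runs_for / runs_against: final score from the erring player's team's view.
    error_runs: unearned runs that resulted from the player's error(s).
    extra_innings: whether the game was decided in extra innings.
    """
    if runs_for >= runs_against:
        return False  # the team did not lose
    if error_runs < 1:
        return False  # the error(s) did not lead to a run
    if extra_innings:
        return True   # any run-producing error counts in an extra-inning loss
    # Regulation loss: the margin must be no larger than the runs the error cost.
    return (runs_against - runs_for) <= error_runs


# The three examples from the paragraph above.
print(is_true_loss(3, 4, 3))                       # Alvarez: three-run error, 4-3 loss
print(is_true_loss(2, 3, 1))                       # Dietrich: one-run error, 3-2 loss
print(is_true_loss(8, 10, 1, extra_innings=True))  # Donaldson: one-run error, 10-8 in extras
```

The same 10-8 loss in nine innings would not count, since a one-run error cannot account for a two-run margin.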

Now this is obviously not a foolproof study. There is no way to say for sure that the error committed for one run was any more the cause of the loss than the pitcher who gave up the home run the next inning. It is also starting to get into a bit of a messy “Butterfly Effect” situation, meaning that there is no way of knowing how the rest of the game (or our lives, bro) would be different if Jose Reyes hadn’t booted that grounder in the fifth inning.

However, it was a fun study to put together, and it can be revealing as to how little (or in poor Starlin Castro’s case, how much) errors truly change a game. Here’s the official chart:

What Was the Cost of Player X’s Errors?

Name	Pos	Errors	UER from E	Team L’s	True L’s	RBI in True L’s
Pedro Alvarez	3B	20	11	11	4	4
Josh Donaldson	3B	15	6	5	1	0
Ian Desmond	SS	15	10	8	2	2
Asdrubal Cabrera	SS	14	12	9	1	0
Jose Reyes	SS	13	7	9	2	0
Brandon Crawford	SS	13	6	5	0	0
Lonnie Chisenhall	3B	13	6	5	0	0
Everth Cabrera	SS	13	7	6	0	0
Brad Miller	SS	13	7	5	1	0
Martin Prado	3B	12	13	8	2	2
Jonathan Villar	SS	12	14	8	0	0
David Wright	3B	11	5	4	1	0
Starlin Castro	SS	11	12	6	5	0
Jean Segura	SS	11	8	1	0	0
Elvis Andrus	SS	11	7	8	0	0
Yan Gomes	C	11	4	6	0	0
Chris Owings	SS	11	8	7	2	1
Derek Dietrich	2B	11	6	5	1	0
Jarrod Saltalamacchia	C	10	5	7	1	0
Hanley Ramirez	SS	10	7	7	1	0

Key: UER from E – unearned runs from errors; Team L’s – team losses; True L’s – true losses (described above); RBI in True L’s – how many RBIs the player had in said True Loss games

Let’s tackle this table column by column.

Well, I don’t think a historiography of each player’s name is necessary in today’s article, so let’s skip over to the position column. It is interesting to note how many left-side-of-the-infield players there are atop the error leaderboard. There’s nobody from the outfield to be found (the “top” outfielder per errors is Sports Illustrated cover boy George Springer, with seven), and there are only three players who don’t hail from third base or shortstop as their main position. One branch off of this study that could be interesting would be to look at whether or not there is a correlation between a player’s position on the diamond and how frequently an error leads to runs or “true losses.” My gut instinct would be to guess no, but maybe errors in the outfield are often for more bases, and therefore more likely to lead to a run – just a hypothesis.

Jumping over to the errors column, Alvarez’s 20 errors stood out, as the difference between his total and the second place total is the same as the difference between second place total and the bottom of our table. In fact, seeing that high total made me curious as to just how many errors it would take to get into the record books. Well, if you’re including the entire history of baseball, the answer is: like a bajillion. Obviously the game was entirely different, but it’s hard to imagine that Herman Long’s 122 errors in 1889 weren’t embarrassing even back then. The record for errors in a single season since 1952 is 44 by Robin Yount in 1975, and the record since 1980 is Jose Offerman with 42 in 1992. So while Alvarez’s 20 errors may be pacing the league by a good margin now, it’s fair to say he won’t be joining even the modern record books this season.

The next column looks at unearned runs derived from each player’s errors, and the variance is quite extreme. With a range from only four runs (it’s interesting to note that the catchers have the two lowest unearned runs tallies, so maybe that positional study would provide some analysis after all) all the way up to 14, there doesn’t seem to be too close of a connection between the number of errors and the number of unearned runs. For instance, Josh Donaldson has committed three more errors than Jonathan Villar in 2014, but Villar’s errors have led to eight more runs. This brings up the question of whether unearned run prevention is simply luck, or whether some teams (and pitchers) respond better after an error is committed in the field.

The A’s are one of baseball’s best teams, and have an excellent pitching staff, so it isn’t too surprising that Donaldson’s unearned runs are among the lowest, especially in comparison to how many errors he has committed. On the other end of the spectrum are players like Villar and Castro who play on rebuilding teams, and it is unsurprising to see their names next to some of the highest unearned run totals. However, there is most certainly a lot to be said for luck playing a role in how many unearned runs come along after an error. For example, teammates Asdrubal Cabrera and Lonnie Chisenhall find themselves on opposite ends of the spectrum in terms of unearned runs after errors, a definite sign of the role random chance plays in unearned run prevention.

One other note on the extreme variance in unearned runs tied to errors. The variance could also come as the result of what kind of error was made. A bobbled ball that never even gets thrown across the infield does only one base of harm; whereas, an overthrow (many of Alvarez’s errors) may lead to two bases of harm. One could also try to really dig deep into this data and see if younger, more inexperienced players were more likely to commit errors late in games, when the pressure was ratcheted up, and maybe those errors were more likely to be costly. However, with this study, the idea is simply to get a feel for another way of looking at errors, and the main point that remains here is that there is a lot of luck to whether a player’s error costs his team a run or not.

There isn’t a whole lot to be said about the team losses column. Committing an error does indeed swing the pendulum (or WPA chart) towards a loss, but so minimally that it wouldn’t even bother one of Poe’s victims. For instance, implying that Jean Segura (only one team loss in games in which he committed an error) timed his errors better than Elvis Andrus (eight team losses in such games) is really just saying that the Brewers are better than the Rangers. They are, but that doesn’t reflect on the individual player at all. The comparison is especially interesting given that Andrus’ errors have actually led to fewer unearned runs than Segura’s.

The next column, the “true losses” column, is where the fallacy of the error as a statistic truly shows its colors. The only players who cost their teams more than two wins in the first half (with teams having played well over 90 games in 2014 so far) were the league leader, Alvarez, and the incredibly unlucky Starlin Castro. Castro’s case could be an entire article in itself, and the poor timing of his errors is remarkable. The fact that the Cubs have lost only six games in which he has committed an error, and that five of those can be considered “true losses,” is very much a statistical anomaly. Consider that in this chart there are 124 team losses outside of Castro’s Cubs. Of those 124 losses, 19 were true losses, or just over 15 percent. In Castro’s case, over 83 percent of his team losses were true losses, which is such an outlier that it warrants special attention.
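To put those percentages side by side, here is a minimal Python sketch that recomputes them from the counts quoted above. It also adds one rough check that is not part of the original study: the binomial tail treats each team loss as an independent draw at the everyone-else rate, an assumption made purely for illustration.

```python
from math import comb

# Counts quoted in the paragraph above.
other_team_losses = 124   # team losses in the chart, excluding Castro's Cubs
other_true_losses = 19    # how many of those count as "true losses"
castro_team_losses = 6    # Cubs losses in games where Castro committed an error
castro_true_losses = 5    # how many of those count as "true losses"

baseline_rate = other_true_losses / other_team_losses
castro_rate = castro_true_losses / castro_team_losses
# Share of true losses once Castro is pooled back in with everyone else.
pooled_rate = (other_true_losses + castro_true_losses) / (
    other_team_losses + castro_team_losses
)


def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials at rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# ASSUMPTION (illustrative only): every team loss is an independent draw at
# the baseline rate. Real games are not that tidy.
chance_of_castro_split = binomial_tail(
    castro_team_losses, castro_true_losses, baseline_rate
)

print(f"Everyone else:       {baseline_rate:.1%}")
print(f"Castro:              {castro_rate:.1%}")
print(f"Pooled with Castro:  {pooled_rate:.1%}")
print(f"Chance of 5+ of 6 at the baseline rate: {chance_of_castro_split:.3%}")
```

Under that simplified assumption, a five-of-six split comes out at well under one percent, which is one way of quantifying how unusual Castro’s line is.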

Even when including Castro’s remarkable true loss numbers, the share of losses that could be considered, even hypothetically, the erring player’s fault is merely 18.5 percent, and that’s not even accounting for all the games that the team still won in which one of the listed players committed an error. This is a good time to point out that this study obviously does not take into account any of the good, run-saving plays that these fielders make, and even so the total impact on a team is minimal. As seen in Pedro Alvarez’s row, he drove in plenty of runs in the games in which he cost his team, and given his strong range, some of the errors he made likely would have been singles otherwise, since the majority of third basemen would have failed to even get to the ball. Josh Donaldson and David Wright stand out as particularly strong cases of top-notch fielders who, because of their strong range, get to more groundballs but get to them in difficult positions, thus increasing the likelihood of an error.

All of this being said, let’s not take too much away from the potential impact of an error. It is indeed a mistake, and it can hurt the team in more ways than just on the scoreboard. For instance, every error means an extra batter that the pitcher has to face, and therefore more pitches on his final pitch count. If the bases were clear before the error, the pitcher now has to pitch out of the stretch, and the threat of a steal is in play. If a certain player is prone to errors, his pitcher may also lose confidence in the defense behind him and get himself in trouble by trying to do too much on the mound. Other fielders may feel that they have to cheat in the error-prone fielder’s direction in case a mistake is made, which can mess up a team’s defensive positioning. Finally, there’s the fact that for all of us here at FanGraphs who realize the harm in relying on errors too much as a statistic, there are still those in baseball who do rely on them, and committing enough errors in the field may lead to a player riding the pine for a few days.

In the end, it’s fair to say that errors are one metric out of many. They have historically been overused, and hopefully the chart above has made it clear that an error frequently won’t cost the team anything.

And if your error did cost your team, well, you’re probably Starlin Castro.

Using High-A Stats to Predict Future Performance

by Chris Mitchell

July 20, 2014

Last week, I looked into how a player’s low-A stats — along with his age and prospect status at the time — can be used to predict whether he’ll ever play in the majors. I used a methodology that I named KATOH (after Yankees prospect Gosuke Katoh), which consists of running a probit regression analysis. In a nutshell, a probit regression tells us how a variety of inputs can predict the probability of an event that has two possible outcomes — such as whether or not a player will make it to the majors. While KATOH technically predicts the likelihood that a player will reach the majors, I’d argue it can also serve as a decent proxy for major league success. If something makes a player more likely to make the majors, there’s a good chance it also makes him more likely to succeed there.
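For readers who want to see the mechanics, here is a minimal Python sketch of how a probit model turns inputs into a probability: the inputs are combined into a single weighted sum, which is then passed through the standard normal CDF. The coefficient values, the input scaling, and the names `COEFFICIENTS` and `mlb_probability` are all hypothetical and chosen only for illustration. They are not KATOH’s fitted values.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, the link function of a probit model."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# HYPOTHETICAL coefficients, invented for illustration; not the fitted model.
# Inputs are assumed to be expressed relative to league average, except the
# top-100 flag, which is 0 or 1.
COEFFICIENTS = {
    "age": -0.30,             # older than the league average lowers the odds
    "strikeout_rate": -0.25,  # more strikeouts lowers the odds
    "iso": 0.20,
    "babip": 0.15,
    "top_100": 0.80,
}
INTERCEPT = -0.50  # also hypothetical


def mlb_probability(player, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Probit prediction: normal CDF of the weighted sum of the inputs."""
    index = intercept + sum(
        coefficients[name] * value for name, value in player.items()
    )
    return normal_cdf(index)


# Two made-up players who differ only in age relative to their league.
young = {"age": -1.0, "strikeout_rate": 0.0, "iso": 0.5, "babip": 0.0, "top_100": 0}
old = {"age": 1.0, "strikeout_rate": 0.0, "iso": 0.5, "babip": 0.0, "top_100": 0}

print(f"Younger player: {mlb_probability(young):.0%}")
print(f"Older player:   {mlb_probability(old):.0%}")
```

Because the normal CDF is bounded between 0 and 1, the output can always be read as a probability, no matter how extreme the inputs are.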

Things that were predictive for players in low-A included: age, strikeout rate, ISO, BABIP, and whether or not he was deemed a top 100 prospect by Baseball America in the pre-season. However, a player’s walk rate was not significant in predicting his ascension to the majors. Today, I’ll analyze what KATOH has to say about players in class-A-advanced leagues. Here’s the R output based on all players with at least 400 plate appearances in a season in high-A from 1995-2009:

This looks very similar to what I found for low-A players: Walk rate isn’t significant, and everything else has very similar effects on the final probability. However, the coefficients from this model are all a tad bigger than those from the low-A version, implying that high-A stats might be a bit more telling of a player’s future. Intuitively, this makes sense: The closer a player is to the big leagues, the more his stats start to reflect his future potential.
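As a rough illustration of why bigger coefficients suggest more telling stats, the Python sketch below compares how much the predicted probability moves for the same change in one input under two coefficient sizes. Both coefficient values are hypothetical stand-ins for “low-A” and “a tad bigger,” not the fitted numbers, and the comparison assumes the input is measured on the same scale at both levels.

```python
from statistics import NormalDist

# Standard normal CDF, the link function of a probit model.
phi = NormalDist().cdf


def probability_gap(coefficient, stat_difference, baseline_index=0.0):
    """Change in predicted probability when one input moves by
    `stat_difference`, with everything else held at `baseline_index`."""
    return phi(baseline_index + coefficient * stat_difference) - phi(baseline_index)


# HYPOTHETICAL coefficients on the same stat at two levels; not fitted values.
low_a_coefficient = 0.40
high_a_coefficient = 0.48  # "a tad bigger"

low_a_gap = probability_gap(low_a_coefficient, 1.0)
high_a_gap = probability_gap(high_a_coefficient, 1.0)

print(f"Low-A:  one unit of the stat moves the probability by {low_a_gap:.1%}")
print(f"High-A: one unit of the stat moves the probability by {high_a_gap:.1%}")
```

With a larger coefficient, the same difference between two players’ stats translates into a wider gap in their predicted probabilities, which is the sense in which the stats are “more telling.”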

By clicking here, you can see what KATOH spits out for all current prospects who logged at least 250 PAs in high-A as of July 7th. I also included a few notable players who fell short of the threshold, namely Joey Gallo (who checks in at a remarkable 99.8%), Peter O’Brien, and Jesse Winker. Here’s an excerpt of the top-ranking players:

Player	Organization	Age	MLB Probability
Joey Gallo	TEX	20	100%
Corey Seager	LAD	20	99%
Carlos Correa	HOU	19	99%
Albert Almora	CHC	20	93%
Nick Williams	TEX	20	93%
D.J. Peterson	SEA	22	93%
Jesse Winker	CIN	20	91%
Orlando Arcia	MIL	19	88%
Jose Peraza	ATL	20	87%
Colin Moran	MIA	21	87%
Renato Nunez	OAK	20	86%
Tyrone Taylor	MIL	20	85%
Hunter Renfroe	SDP	22	84%
Josh Bell	PIT	21	84%
Raul Mondesi	KCR	18	83%
Daniel Robertson	OAK	20	83%
Jorge Polanco	MIN	20	81%
Dilson Herrera	NYM	20	77%
Breyvic Valera	STL	21	77%
Peter O’Brien	NYY	23	76%
Matt Olson	OAK	20	75%
Jorge Alfaro	TEX	21	75%
Patrick Leonard	TBR	21	75%
Dalton Pompey	TOR	21	73%
Billy McKinney	OAK	19	73%
Teoscar Hernandez	HOU	21	73%
Brandon Nimmo	NYM	21	72%
Jose Rondon	LAA	20	70%
Rio Ruiz	HOU	20	70%
Brandon Drury	ARI	21	70%

Next up will be double-A. Unlike A-ball, double-A tends to be a random mishmash of prospects and minor-league lifers, so it will be interesting to see how KATOH handles this wide array of players. And perhaps double-A is where a player’s walk rate finally starts to tell us something about his future success.

Statistics courtesy of FanGraphs, Baseball-Reference, and The Baseball Cube; pre-season prospect lists courtesy of Baseball America.
