The Cubs Hope Lightning Can Strike Twice

In the 2013 offseason, the Cubs did something smart. They signed RHP Scott Feldman. Feldman had a rough 2012 season in Texas, posting an ERA of 5.09. However, his peripherals indicated that he was fairly unlucky during that season, leading him to be vastly undervalued. FanGraphs’ own Dave Cameron opined that Scott Feldman was the poor man’s Brandon McCarthy. Feldman was a nice, cheap addition for one year, $6 million.

The Cubs’ strategy of betting on FIP and xFIP seemed to pay off, as Feldman had become a valuable asset by the time the trade deadline rolled around. In a move that flew under the radar, the Cubs traded Feldman and Steve Clevenger to the Orioles for Pedro Strop, international bonus slots, and a struggling Jake Arrieta.

It hasn’t taken the Cubs long to see the fruits of their return as Jake Arrieta has become a bright spot on an otherwise struggling Cubs team. In 64 innings, he has compiled a 2.4 WAR and an ERA/FIP/xFIP line of 1.81/1.97/2.50.

Arrieta has been downright filthy for the Cubs in the 64 innings that he has pitched this season. While this is a small sample, it’s indicative that there has been a change in Arrieta’s approach to pitching that is proving to be successful.

While the acquisition of Arrieta didn’t make headlines last year, the Cubs definitely made headlines when they completed potentially the largest blockbuster trade of the season, sending pitchers Jeff Samardzija and Jason Hammel to the Oakland Athletics in return for Addison Russell, Billy McKinney, Dan Straily and a PTBNL.

While Russell and Samardzija are the main components of the trade, there is something interesting about the other acquisitions.  If you break the trade into two parts, there’s McKinney and Russell for Samardzija, and then there’s Straily and a PTBNL for Hammel.

It looks as though the Cubs are hoping that history can repeat itself.

The Cubs signed Hammel — for not a lot of money — hoping that he would perform well, and that he could be used as ‘trade bait’ midway through the season. Hammel exceeded expectations during his time with the Cubs, and now he is netting another reclamation project for the Cubs. Sounds an awful lot like the Feldman trade the Cubs made a year ago.

Straily has struggled this year, posting an ERA/FIP/xFIP line of 4.93/5.64/4.43. This is a small sample of only seven starts, but the projection systems don’t rate him too favorably for the rest of the year. ZiPS projects Straily to have an ERA of 4.44 and a FIP of 4.80 by the end of this year. Steamer projects Straily to have an ERA of 4.45 and a FIP of 4.93.

Straily has been getting a decent number of strikeouts, but the root of his struggles has been keeping the ball in the park (16.4% HR/FB) and keeping his walks down. It’s reasonable to think that Straily’s HR/FB will come down: this is a small sample, he’s not nearly this bad at keeping the ball in the park, and regression to the mean is expected.

Unlike Arrieta, Straily doesn’t necessarily have the blazing raw stuff. Arrieta flashed a 94 MPH fastball even through his struggles with the Orioles. You could definitely see some raw talent. Straily is in the midst of a velocity decline in which his fastball has declined from 90 MPH in 2013 to 88 MPH this year, and he has lost at least a mile and a half on each of his other pitches. However,  Straily does appear to have a good slider and decent changeup which — combined with regression back to the mean — is a good enough reason for the Cubs to think that there is some talent that can be unlocked.

It’s unlikely that the Cubs will be able to turn Straily into a potential ace, but it’s hard to bet against their track record. They have managed to turn Feldman, Hammel, and Arrieta into something. They have proved that they are good at scouting, as they boast arguably the best farm system in the league. Maybe they see something in Straily that they think they can work with, and figure that he is worth buying low in the hope that he turns into an asset. The Cubs trust their ability to turn pitchers who look like nothing into something. While Russell, Samardzija, and Hammel may be grabbing all the headlines, it might just be Straily who surprises us in a year or two.


Pitch Win Values for Starting Pitchers – June 2014

Introduction

A couple months back, I introduced a new method of calculating pitch values using a FIP-based WAR methodology.  That post details the basic framework of these calculations and  can be found here.  The May update can be found here.  This post is simply the June 2014 update of the same data.  What follows is predominantly data-heavy but should still provide useful talking points for discussion.  Let’s dive in and see what we can find.  Please note that the same caveats apply as previous months.  We’re at the mercy of pitch classification.  I’m sure your favorite pitcher doesn’t throw that pitch that has been rated as incredibly below average, but we have to go off of the data that is available.  Also, Baseball Prospectus’s PitchF/x leaderboards list only nine pitches (Four-Seam Fastball, Sinker, Cutter, Splitter, Curveball, Slider, Changeup, Screwball, and Knuckleball).  Anything that may be classified outside of these categories is not included.  Also, anything classified as a “slow curve” is not included in Baseball Prospectus’s curveball data.

Constants

Before we begin, we must first update the constants used in calculation for June.  As a refresher, we need three different constants for calculation: strikes per strikeout, balls per walk, and a FIP constant to bring the values onto the right scale.  We will tackle them each individually.

First, let’s discuss the strikeout constant.  In June, there were 50,861 strikes thrown by starting pitchers.  Of these 50,861 strikes, 4,837 were turned into hits and 14,888 outs were recorded.  Of these 14,888 outs, 3,981 were converted via the strikeout, leaving us with 10,907 ball-in-play outs.  10,907 ball-in-play outs and 4,837 hits sum to 15,744 balls-in-play.  Subtracting 15,744 balls-in-play from our original 50,861 strikes leaves us with 35,117 strikes to distribute over our 3,981 strikeouts.  That’s a ratio of 8.82 strikes per strikeout.  This is down from 8.88 strikes per strikeout in May.  Hitters were slightly easier to strike out in June than they were in May.

The next two constants are much easier to ascertain.  In June, there were 28,442 balls thrown by starters and 1,469 walked batters.  That’s a ratio of 19.36 balls per walk, up from 18.77 balls per walk in May.  This data would suggest that hitters were slightly less likely to walk in June than previously.  The FIP subtotal for all pitches in June was 0.57.  The MLB Run Average for June was 4.16, meaning our FIP constant for June is 3.59.

Constant Value
Strikes/K 8.82
Balls/BB 19.36
cFIP 3.59
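
The arithmetic behind these three constants can be reproduced in a few lines. The following is a minimal sketch, not the author's actual code: the function names are hypothetical, and the inputs are the June totals quoted above.

```python
# Sketch of the June 2014 constant calculations described in the text.
# Function names are illustrative assumptions, not the author's code.

def strikes_per_strikeout(strikes, hits, outs, strikeouts):
    """Strikes not put in play, spread over the strikeouts recorded."""
    ball_in_play_outs = outs - strikeouts        # outs that were not strikeouts
    balls_in_play = ball_in_play_outs + hits     # every strike that was put in play
    return (strikes - balls_in_play) / strikeouts

def balls_per_walk(balls, walks):
    """Balls thrown per walk issued."""
    return balls / walks

def fip_constant(run_average, fip_subtotal):
    """Constant that puts the FIP subtotal on the run-average scale."""
    return run_average - fip_subtotal

# June 2014 totals for starting pitchers, as quoted in the text.
k_const = strikes_per_strikeout(strikes=50861, hits=4837, outs=14888, strikeouts=3981)
bb_const = balls_per_walk(balls=28442, walks=1469)
c_fip = fip_constant(run_average=4.16, fip_subtotal=0.57)

print(round(k_const, 2), round(bb_const, 2), round(c_fip, 2))
```

Keeping each constant in its own function makes it easy to rerun the same calculation on another month's totals.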

The following table details how the constants have changed month-to-month.

Month K BB cFIP
March/April 8.47 18.50 3.68
May 8.88 18.77 3.58
June 8.82 19.36 3.59

Pitch Values – June 2014

For reference, the following table details the FIP for each pitch type in the month of June.

Pitch FIP
Four-Seam 4.16
Sinker 4.14
Cutter 4.00
Splitter 4.43
Curveball 3.98
Slider 4.03
Changeup 4.64
Screwball 3.24
Knuckleball 6.30
MLB RA 4.16

As we can see, only three pitches would be classified as below average for the month of June: splitters, changeups, and knuckleballs.  Four-Seam Fastballs and Sinkers also came in right around league average.  Pitchers that were able to stand out in these categories tended to have better overall months than pitchers who excelled at the other pitches.  Now, let’s proceed to the data for the month of June.

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jordan Zimmermann 0.8 171 Marco Estrada -0.3
2 Brandon Cumpton 0.6 172 Masahiro Tanaka -0.3
3 Clayton Kershaw 0.6 173 Juan Nicasio -0.3
4 Matt Garza 0.5 174 Edwin Jackson -0.3
5 Nathan Eovaldi 0.5 175 Nick Martinez -0.3

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Tanner Roark 0.5 160 Wei-Yin Chen -0.2
2 Chris Archer 0.5 161 Andrew Heaney -0.2
3 Charlie Morton 0.5 162 Jake Peavy -0.2
4 Alfredo Simon 0.4 163 Jered Weaver -0.2
5 Brandon McCarthy 0.4 164 Dan Haren -0.4

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jarred Cosart 0.6 73 Chris Tillman -0.1
2 Madison Bumgarner 0.4 74 Brandon McCarthy -0.1
3 Corey Kluber 0.3 75 Mike Minor -0.1
4 Adam Wainwright 0.3 76 Brad Mills -0.1
5 Josh Collmenter 0.3 77 Scott Feldman -0.2

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Alex Cobb 0.3 26 Tim Hudson -0.1
2 Masahiro Tanaka 0.3 27 Charlie Morton -0.1
3 Tim Lincecum 0.2 28 Jake Peavy -0.1
4 Kyle Kendrick 0.2 29 Ubaldo Jimenez -0.2
5 Alfredo Simon 0.2 30 Miguel Gonzalez -0.3

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jered Weaver 0.2 150 Vance Worley -0.1
2 Edinson Volquez 0.2 151 Christian Bergman -0.1
3 Roenis Elias 0.2 152 Alfredo Simon -0.2
4 Collin McHugh 0.2 153 Marcus Stroman -0.2
5 A.J. Burnett 0.2 154 David Price -0.3

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Garrett Richards 0.4 113 Aaron Harang -0.2
2 Ervin Santana 0.4 114 Wily Peralta -0.2
3 Chris Archer 0.3 115 Wei-Yin Chen -0.2
4 Homer Bailey 0.3 116 Juan Nicasio -0.2
5 Tyson Ross 0.3 117 Vidal Nuno -0.3

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 0.3 154 Ervin Santana -0.2
2 Jeff Locke 0.3 155 Mark Buehrle -0.2
3 Henderson Alvarez 0.3 156 David Buchanan -0.3
4 Jeremy Guthrie 0.2 157 Hyun-Jin Ryu -0.3
5 Jason Vargas 0.2 158 Scott Kazmir -0.3

Screwball

Rank Pitcher Pitch Value
1 Trevor Bauer 0.0

Knuckleball

Rank Pitcher Pitch Value
1 C.J. Wilson 0.0
2 R.A. Dickey -0.4

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jordan Zimmermann 1.0 177 Dan Haren -0.4
2 Felix Hernandez 1.0 178 Miguel Gonzalez -0.4
3 Chris Archer 0.9 179 Joe Saunders -0.4
4 Clayton Kershaw 0.9 180 Juan Nicasio -0.5
5 Matt Garza 0.9 181 R.A. Dickey -0.6

Pitch Ratings – June 2014

Four-Seam Fastball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Drew Smyly 60 80 Samuel Deduno 36
2 Drew Hutchison 59 81 Wade Miley 34
3 Matt Garza 59 82 Nick Martinez 34
4 Hector Santiago 59 83 Tony Cingrani 33
5 J.A. Happ 59 84 Ricky Nolasco 33

Sinker

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 J.A. Happ 61 62 Andrew Heaney 38
2 Jeff Samardzija 59 63 Jered Weaver 38
3 Jake Arrieta 59 64 Tommy Milone 35
4 Jesse Hahn 58 65 Jake Peavy 32
5 Felix Hernandez 58 66 Dan Haren 24

Cutter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 David Price 59 28 Brandon Workman 46
2 Corey Kluber 59 29 Mike Bolsinger 44
3 Jarred Cosart 57 30 Scott Feldman 40
4 Mike Leake 57 31 Dan Haren 39
5 Phil Hughes 57 32 Mike Minor 34

Splitter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Masahiro Tanaka 59 12 Dan Haren 42
2 Doug Fister 58 13 Wei-Yin Chen 40
3 Kevin Gausman 58 14 Jake Odorizzi 40
4 Alfredo Simon 58 15 Tim Hudson 36
5 Alex Cobb 57 16 Ubaldo Jimenez 25

Curveball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Stephen Strasburg 60 63 David Phelps 42
2 Erik Bedard 59 64 Aaron Harang 38
3 Drew Pomeranz 59 65 Alfredo Simon 34
4 Collin McHugh 59 66 Marcus Stroman 28
5 Josh Tomlin 58 67 David Price 20

Slider

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jeff Samardzija 62 50 Zack Greinke 37
2 Max Scherzer 60 51 Matt Cain 32
3 Tanner Roark 59 52 Wei-Yin Chen 30
4 Vance Worley 59 53 Aaron Harang 29
5 Jhoulys Chacin 59 54 Vidal Nuno 27

Changeup

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Gio Gonzalez 61 58 Scott Kazmir 24
2 Jeff Locke 59 59 Drew Hutchison 22
3 Jeremy Guthrie 58 60 Ervin Santana 22
4 Josh Collmenter 58 61 T.J. House 22
5 Sonny Gray 58 62 Hyun-Jin Ryu 20

Screwball

Rank Pitcher Pitch Rating
1 Trevor Bauer 54

Knuckleball

Rank Pitcher Pitch Rating
1 R.A. Dickey 41

Monthly Discussion

As we can see, Jordan Zimmermann takes the top spot for this month, mostly due to the quality of his Four-Seam Fastball.  Zimmermann was classified as throwing five different pitches in June (Four-Seam, Sinker, Curveball, Slider, and Changeup) and managed to earn at least 0.1 WAR from the Four-Seam, Curveball, and Slider.  The most valuable pitch overall in June was Zimmermann’s Four-Seam Fastball.  The least valuable was R.A. Dickey’s Knuckleball.  As far as offspeed pitches go, Garrett Richards’s 0.4 WAR from his slider led the way.  The least valuable fastball was Dan Haren’s sinker.

On our 20-80 scale pitch ratings, the highest rated qualifying pitch was Jeff Samardzija’s slider.  Somewhat surprisingly, the lowest rated was David Price’s curveball.  The highest rated fastball was J.A. Happ’s sinker, and the lowest rated fastball was Dan Haren’s sinker.

Pitch Values – 2014 Season

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jordan Zimmermann 1.5 228 Nick Martinez -0.3
2 Phil Hughes 1.3 229 Dan Straily -0.4
3 Ian Kennedy 1.3 230 Doug Fister -0.4
4 Michael Wacha 1.2 231 Juan Nicasio -0.4
5 Jose Quintana 1.2 232 Marco Estrada -0.6

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Charlie Morton 1.4 216 Vidal Nuno -0.3
2 Felix Hernandez 1.2 217 Dan Straily -0.3
3 Chris Archer 1.1 218 Jake Peavy -0.3
4 Cliff Lee 1.0 219 Erasmo Ramirez -0.3
5 Justin Masterson 1.0 220 Wandy Rodriguez -0.3

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Madison Bumgarner 1.2 102 Cliff Lee -0.2
2 Corey Kluber 1.0 103 Felipe Paulino -0.3
3 Adam Wainwright 1.0 104 Johnny Cueto -0.3
4 Jarred Cosart 0.9 105 C.J. Wilson -0.3
5 Josh Collmenter 0.7 106 Brandon McCarthy -0.3

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Masahiro Tanaka 0.7 32 Charlie Morton -0.2
2 Alex Cobb 0.4 33 Franklin Morales -0.2
3 Tim Lincecum 0.4 34 Clay Buchholz -0.2
4 Hisashi Iwakuma 0.3 35 Danny Salazar -0.3
5 Hiroki Kuroda 0.3 36 Miguel Gonzalez -0.3

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Sonny Gray 0.8 197 J.A. Happ -0.2
2 A.J. Burnett 0.7 198 Erasmo Ramirez -0.2
3 Jose Fernandez 0.6 199 David Price -0.2
4 Brandon McCarthy 0.6 200 Franklin Morales -0.2
5 Stephen Strasburg 0.5 201 Felipe Paulino -0.3

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Garrett Richards 0.8 159 Jered Weaver -0.2
2 Tyson Ross 0.6 160 Liam Hendriks -0.2
3 Jason Hammel 0.6 161 Travis Wood -0.3
4 Ervin Santana 0.6 162 Erasmo Ramirez -0.3
5 Corey Kluber 0.6 163 Danny Salazar -0.4

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 0.7 211 Jordan Zimmermann -0.3
2 Henderson Alvarez 0.6 212 Tony Cingrani -0.3
3 Stephen Strasburg 0.6 213 Matt Cain -0.3
4 Francisco Liriano 0.5 214 Wandy Rodriguez -0.4
5 John Danks 0.5 215 Marco Estrada -0.6

Screwball

Rank Pitcher Pitch Value
1 Trevor Bauer 0.0
2 Alfredo Simon 0.0
3 Hector Santiago 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 0.7
2 C.J. Wilson 0.0

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 2.8 235 Dan Straily -0.4
2 Adam Wainwright 2.5 236 Felipe Paulino -0.5
3 Chris Archer 2.1 237 Juan Nicasio -0.5
4 Corey Kluber 2.1 238 Wandy Rodriguez -0.8
5 Garrett Richards 2.1 239 Marco Estrada -1.0

Year-to-Date Discussion

If we look at the year-to-date numbers, MLB FIP and WAR leader Felix Hernandez still sits in the top spot.  Current NL FIP leader Adam Wainwright ranks second.  The least valuable starter has been Marco Estrada.  On a per-pitch basis, the most valuable pitch has been Jordan Zimmermann’s four-seam fastball.  The most valuable offspeed pitch has been Garrett Richards’s slider.  The least valuable pitch has been Marco Estrada’s four-seam fastball.  The least valuable offspeed pitch has been Marco Estrada’s changeup.  Needless to say, it’s been a rough year for Marco.  Qualitatively, I feel fairly encouraged by the year-to-date results so far.  The leaderboard is topped by two no-doubt aces, both of whom currently lead their respective leagues in FIP, and Marco Estrada comes in at the bottom after posting the highest FIP among qualified starters so far.  For reference, the top five in the year-to-date overall rankings are currently 1st, 6th, 23rd, 3rd, and 7th on the FanGraphs WAR leaderboards respectively.


Baseball Analytics, Arthritis, and the Search for Better Health Forecasts

This article originally appeared on my blog “Biotech, Baseball, Big Data, Business, Biology…”

It’s Fourth of July weekend in Seattle as I write this. Which means it’s overcast. This was predictable, just as it’s predictable that for the two months after July 4th the Pacific Northwest will be beautiful, sunny and warm. Mostly.

Too bad forecasting so many other things–baseball, earthquakes, health outcomes–isn’t nearly as easy. But that doesn’t mean people have given up. There’s a lot to be gained from better forecasting, even if the improvement is just by a little bit.

And so I was eager to see the results from a recent research competition in health forecasting. The challenge, which was organized as a crowdsourcing competition, was to find a classifier for whether and how rheumatoid arthritis (RA) patients will respond to a specific drug treatment. The winning methods are able to predict drug response to a degree significantly better than chance, which is a nice advance over previous research.

And imagine my surprise when I saw that the winning entries also have an algorithmic relationship to tools that have been used for forecasting baseball performance for years.

The best predictor was a first cousin of PECOTA.


Quantifying “Good” and “Bad” Pitches

I found Jeff’s recent post on Jake Arrieta fascinating, because he goes into a game and pulls out Arrieta’s eight worst pitches from that game. This is something I’d never really thought deeply about before. We all know what bad pitches look like, right? An 0-2 fastball down the heart of the plate, a hanging slider, a pitch in the dirt on a full count, sure. But can we quantify this? Is there a way to say mathematically (in a way that makes some sort of sense) whether one pitch was better than another? Follow me beyond the jump and I’ll share some thoughts about how we might do this.


Finding the Ideal Leadoff Hitter

We know, in 2014, that lineup construction has little effect on winning. And yet, it’s not any less frustrating when managers set their batting orders in ways that seem to defy any semblance of logic. Lineup construction matters to us. We may know it’s not terribly important, but we’re fascinated in spite of ourselves.

The lineup position subject to the most debate is probably leadoff. Multiple writers and analysts have noted that players who would make the best leadoff hitters are normally too valuable to use in the leadoff position. Bill James wrote in his New Historical Abstract, “All of the greatest leadoff men … would be guys who aren’t leadoff men, starting with Ted Williams … if you had two Ted Williamses, and could afford to use one of them as a leadoff man, he would be the greatest leadoff man who ever lived.”

Every method I’ve seen to determine great leadoff batters produces names like Ted Williams, Barry Bonds, Mickey Mantle, Ty Cobb … players who are probably better suited to the second through fourth spots in the batting order. I think I’ve found a simple method that solves the problem. I’ve always been interested in singles hitters who walk. It’s a skill set that matches our image of the prototypical leadoff batter.

Most fans agree that a good leadoff man should get on base and run the bases well. Most fans further agree that a player who both gets on base and hits with power is more valuable a little later in the order, where he can drive in runs. If we accept that we probably can’t have two Ted Williamses, a realistic ideal of the leadoff batter has a high on-base percentage but doesn’t hit with a lot of power.

With this in mind, I’m adapting a stat I’ve talked about elsewhere to identify optimal leadoff men: OBP minus ISO. In my head, I’ve always called this reverse ISO, but that’s sort of a misnomer, and it’s a little unwieldy, so from here on let’s call this stat combination Leadoff Rating, or LOR. We know a good leadoff man gets on base, but most players with high on-base percentage are great all-around hitters. We know power hitters are usually better suited to other spots in the batting order, but many players with low ISO just aren’t that great. By subtracting isolated power from OBP, we can identify players specially suited to hitting leadoff.
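
As a quick illustration, LOR is just on-base percentage minus isolated power, where ISO is slugging percentage minus batting average. The sketch below is a minimal example with hypothetical function names; the inputs are Sliding Billy Hamilton's career line (.344/.455/.432) and Eddie Collins's .328 LOR, both cited later in this post.

```python
# Minimal sketch of the Leadoff Rating (LOR) stat: OBP minus ISO.
# Function names are illustrative assumptions.

def isolated_power(avg, slg):
    """ISO: slugging percentage minus batting average."""
    return slg - avg

def leadoff_rating(avg, obp, slg):
    """LOR: on-base percentage minus isolated power."""
    return obp - isolated_power(avg, slg)

# Billy Hamilton's career line as cited in this post: .344/.455/.432
hamilton_lor = leadoff_rating(avg=0.344, obp=0.455, slg=0.432)

# Eddie Collins's LOR as cited in this post.
collins_lor = 0.328
margin = hamilton_lor / collins_lor - 1  # relative lead over second place

print(f"Hamilton LOR: {hamilton_lor:.3f}, lead over Collins: {margin:.0%}")
```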

This stat does not include baserunning (because I have no idea how to incorporate it with two percentages) but it turns out not to matter very much. A significant majority of players who rank well in LOR were also accomplished baserunners, and base stealers in particular. Among the top 300 hitters of all time (basically everyone with 2,000 career hits), I found a fairly strong positive correlation between LOR and SB (r=.465). The relationship is weaker if you only look at 1947-present (r=.356), but a degree of positive correlation is clear. In both data sets, n=300.

When you calculate LOR for the all-time top 300 hitters, the leader is Billy Hamilton. That’s Sliding Billy Hamilton, the Hall of Fame outfielder for Philadelphia and Boston in the 1890s, not the rookie phenom for the Cincinnati Reds. The original Hamilton retired with 1,782 singles, 1,187 bases on balls, and 376 extra-base hits. He hit .344/.455/.432, with an ISO of just .088, and an OBP higher than his slugging percentage. Hamilton also stole 912 bases. He is a superb example of the hitter we’re looking for, and he leads the new stat by a huge margin. His .367 LOR rates 12% higher than second-place Eddie Collins (.328). Here’s the top 75:


Breaking Down the Aging Curve: Late 20s

This post covers the last set of cohorts. Click the links for parts 1, 2, and 3 if you want more info on what I am doing, or read on if you are already up to speed.

Age 27 Cohort:

This group started at 173 players with 54 only playing one season leaving 119 for my purposes, and they averaged 5 full seasons each.  Out of the 119, 49 (41%) maxed out their wRC+ in their first full season and 44 (37%) maxed WAR.  Both of the groups that maxed out in year one averaged 3.2 full seasons in the big leagues.

[Chart: percent of max wRC+ and WAR by age, age 27 cohort]

The same thing we have seen since the age 25 cohort continues, a clearly declining performance trend in aggregate from the time they show up until they leave.  In year 1, these players are hitting on average at nearly 90% of their max, so there is almost no chance of a large increase in subsequent seasons.

Age 28 Cohort:

Sample sizes are going to start becoming a big issue again as only 110 started and 38 only played one 300+ PA season.  The remaining 72 averaged only 3.7 full seasons.  For those that were maxing wRC+ or WAR in year one, both groups included 32 of the 72 (44%) and averaged 2.7 seasons and 3 full seasons respectively.

[Chart: percent of max wRC+ and WAR by age, age 28 cohort]

The chart does show an increase in WAR from year 1 to year 2 due to an anomaly, but the hitting starts at about 90% of peak on average and decreases from there.  You can ignore the spikes in the age 40 and 41 seasons, as only one player is accounted for there, Davey Lopes, who happened to hit pretty well in those two seasons.  Without him the curve drops off like all of the others and ends at age 38.  You can see that by WAR the entirety of their decline is pretty much done by 30 years old, which is only their third season, hence the short careers.

Age 29 Cohort:

This group is nearing the point where it might be worth ignoring anything you see, with a starting group of 62 that gets whittled down to 41 players with more than one full season.  Those 41 averaged 4.6 full seasons in their careers, longer than the 28-year-olds because of the small sample and a few guys who hung around awhile.  One was Hideki Matsui, who was a professional long before 29, but not in the United States.  I will discuss two others in a moment.  Out of our 41 players here, 23 (56%) had their max wRC+ in their first full season, and almost 50% (20 out of the 41) had their best WAR.  When it is a coin flip whether we have already seen a player’s best in his first season, we have definitely hit the point where any real growth as a player is unlikely or purely luck driven.  Those two groups of year one max wRC+ and WAR had average career lengths of 3.7 and 3.1 years respectively.
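
The cohort percentages quoted above are simple shares of the players left after dropping the one-season players. Here is a minimal sketch that reproduces them from the counts given in the text; the function and dictionary names are hypothetical, and the one-season count for the age 29 cohort (21) is derived from 62 minus 41 rather than stated directly.

```python
# Sketch reproducing the cohort shares from the counts quoted in the text.
# Names are illustrative; this is not the author's code.

def cohort_summary(started, one_season_only, max_wrc_year1, max_war_year1):
    """Share of multi-season players whose best wRC+ / WAR came in year one."""
    remaining = started - one_season_only
    return {
        "remaining": remaining,
        "wrc_share": max_wrc_year1 / remaining,
        "war_share": max_war_year1 / remaining,
    }

cohorts = {
    27: cohort_summary(started=173, one_season_only=54, max_wrc_year1=49, max_war_year1=44),
    28: cohort_summary(started=110, one_season_only=38, max_wrc_year1=32, max_war_year1=32),
    # 21 one-season players is derived: 62 starters minus the 41 who remained.
    29: cohort_summary(started=62, one_season_only=21, max_wrc_year1=23, max_war_year1=20),
}

for age, summary in cohorts.items():
    print(age, summary["remaining"],
          f"{summary['wrc_share']:.0%}", f"{summary['war_share']:.0%}")
```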

[Chart: percent of max wRC+ and WAR by age, age 29 cohort]

Like the last group, we see a little uptick at the end, and it comes from two of the odd players from this group who hung around.  Actually, Raul Ibanez is still hanging around currently in Minnesota with the Royals, and the other is a former Royal too in Matt Stairs.  Again, in reality this group is pretty much all declining from year 1 on, and almost all are finished by their late 30s.  There is a large spike in WAR for ages 32 and 33, and a smaller corresponding one in wRC+, because 4 players had their best season at 32 and 5 players at 33, which is a significant number out of a pool of 41 players.  Those two years along with the first full season of the cohort account for over 70% of the players’ best seasons, so it is probably just a sample size issue that we see the early 30s uptick here.

I am done with the cohorts, or at least running through them all the first time.  Players that play their first full season at 30 or older were mostly ignored.  There were 92 of them total and about 80% of them maxed in year one or only had one full season, so to chart a growth pattern would be ludicrous for the other 18 to 20 players who didn’t all come up at the same age.  Next I will summarize this all and try and point out several other things that I learned from breaking these cohorts apart so that you can get the full picture, or at least as much of the picture as I have managed to see.


How Telling are a Teenager’s A-ball Stats?

The Charleston RiverDogs, the Yankees’ low-A affiliate, have rostered several of the team’s more interesting prospects this year, with Luis Severino, Ian Clarkin, Aaron Judge, Abiatal Avelino, Miguel Andujar, Luis Torrens, Gosuke Katoh, and Tyler Wade all having spent time in Charleston thus far. A few of these players are still teenagers and, despite their promising potential, are very raw in terms of their overall development. Despite being just 19 years old, infielders Avelino, Andujar, Katoh, and Wade spent the entire first half in Charleston with varying degrees of success. Avelino (108 wRC+) hit fairly well before going down with injury, but Andujar (78 wRC+), Katoh (79 wRC+), and Wade (98 wRC+) have looked a bit over-matched at the plate so far.

These players have been facing pitchers two or three years older than them, so it’s hard to be too critical of their poor batting lines; and the fact that they’re even playing in full season ball as teenagers is an accomplishment on its own. Still, performance obviously matters, and you’d prefer to see them hit well than not. But it’s hard to know how much weight should be put on their stat lines. Should we be worried that Gosuke Katoh’s striking out 35% of the time? Or should we still be more focused on the tools that got him drafted in the second round last year? It’s a little hard to say.

To get a better idea of what to make of these guys’ performances, I turned to the reams of minor league data compiled over the last couple of decades. Below, you’ll find some heat maps representing the likelihood that a player will play in the majors based on his low-A stats as a teenager. “Average Power” refers to players with an ISO within .025 of their league’s average, and within each panel, walk rate above league average and strikeout rate above league average occupy the X and Y axes respectively. I considered all 321 player seasons where a teenager logged at least 400 PA’s from 1995-2008.
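
The power grouping described above can be sketched as a simple bucketing rule. This is only an illustration under stated assumptions: the function name is hypothetical, the example ISO figures are made up, and treating a difference of exactly .025 as "Average Power" is my assumption since the text does not say how ties are handled.

```python
# Sketch of the power buckets: "Average Power" means an ISO within .025
# of the league average. Boundary handling is an assumption.

def power_bucket(iso, league_iso, band=0.025):
    """Classify a player season as Low, Average, or High Power."""
    diff = iso - league_iso
    if diff > band:
        return "High Power"
    if diff < -band:
        return "Low Power"
    return "Average Power"

# Hypothetical player seasons in a league with a .120 average ISO.
league_iso = 0.120
for iso in (0.080, 0.130, 0.180):
    print(iso, power_bucket(iso, league_iso))
```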

A couple things to keep in mind before I delve into the results:

1) This methodology measures the likelihood that a player made it to the majors and doesn’t take into account how well he played upon arriving. So a player with one big league game is counted the same as a player who went on to have a Hall of Fame-caliber career. A stat that predicts a player’s making the majors may also predict his level of big league success, but that’s not something I attempt to quantify here.

2) This methodology does not account for a player’s defensive skill or position. Obviously, a weak-hitting catcher or shortstop is more likely to crack the majors than a weak-hitting first baseman, but defensive skill is a little hard to quantify for minor leaguers. Prospects change positions all the time and there’s a good chance any given A-baller won’t stick at his current position as he navigates through three more minor league levels.

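[Heat maps: likelihood of reaching the majors by walk rate (X axis) and strikeout rate (Y axis) relative to league average. Panels: All Players, Low Power, Average Power, High Power]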

Overall, there’s not a ton of predictability here: There are examples of players who made it, or didn’t make it, from nearly every corner of every heat map. Players like Rocco Baldelli, Austin Jackson, Jhonny Peralta, and Pablo Sandoval turned into fine hitters despite scuffling as teenagers, yet plenty of others hit for good power and put up healthy plate discipline numbers, only to flop at the higher levels. Jeff Goldbach, Nick Weglarz, and Mike Whitlock all raked in A-ball, but never made it to the show. Stats alone can’t tell us everything, but there are definitely some obvious trends. Most notably, players who hit for power appear to be much more likely to play in the majors than those who don’t. Completely ignoring strikeouts and walks, 70% of players from the high power group made it to the bigs, compared to 65% of players with average power and just 44% from the low power demographic. Plate discipline stats seem to matter a little, but power is clearly king.

The heat maps give a nice visual of what’s happening, but don’t really give us a precise estimate of how likely these players are to make it. To better quantify each player’s chances, I ran a probit regression analysis on this group of players. In a nutshell, a probit tells us how a variety of inputs can predict the probability of an event that has two possible outcomes. In this case, it shows that a hitter’s strikeout rate, isolated power, and BABIP are predictive of whether or not he’ll play in the majors.

It’s worth pointing out that there are some obvious flaws in this model. As previously mentioned, it doesn’t consider defense, so if an elite defensive shortstop and a lumbering first baseman had the same batting line, they would receive the same probability, which obviously doesn’t seem right. It also doesn’t take scouting reports into account. We all know that there’s more to a player’s potential than his stat line, especially for minor leaguers; and in some cases, a good scouting report is worth more than a dog’s age of statistical regressions. Still, I think it does a good job of slapping an unbiased probability on a player’s MLB chances. For those interested, here’s the R output from my model:

[Image: R output from the probit model]

Without getting too technical, the “Estimate” column basically tells us (in Z-scores) how a change in each stat affects a player’s MLB likelihood. As you’d expect, players with higher strikeout rates are less likely to crack the majors, while players with higher power and higher BABIPs have a better shot. Interestingly, walk rate was not statistically significant in predicting whether or not a player will reach the big leagues. This might be partly due to the relatively small sample of players, but it’s probably safe to say that a player’s walk rate isn’t make-or-break. Several players, including Erick Aybar, Michael Barrett, Engel Beltre, and A.J. Pierzynski, managed to reach baseball’s highest level despite walking around 3% of the time in their first tastes of full-season ball, while Mike Whitlock and Nick Weglarz fizzled after walking over 15% of the time.
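
To make the “Estimate in Z-scores” idea concrete, here is a minimal sketch of how a fitted probit turns a stat line into a probability: the estimates are summed into a Z-score, which is then passed through the standard normal CDF. The coefficients and the sample stat line below are HYPOTHETICAL placeholders, not the values from my model; only the signs (negative for strikeout rate, positive for ISO and BABIP) follow the results described above.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(intercept, coefs, features):
    """P(reaches MLB) = Phi(intercept + sum of coefficient * stat)."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return normal_cdf(z)

# HYPOTHETICAL estimates for illustration only (not the fitted model).
intercept = -1.5
coefs = {"k_rate": -4.0, "iso": 6.0, "babip": 5.0}

# HYPOTHETICAL stat line: 20% strikeouts, .120 ISO, .310 BABIP.
player = {"k_rate": 0.20, "iso": 0.120, "babip": 0.310}
high_k_player = dict(player, k_rate=0.35)

p_base = probit_probability(intercept, coefs, player)
p_high_k = probit_probability(intercept, coefs, high_k_player)
print(f"baseline: {p_base:.2f}, with a 35% strikeout rate: {p_high_k:.2f}")
```

Because the normal CDF is increasing, any stat with a negative estimate lowers the probability as it rises, which is why the higher strikeout rate produces the smaller number here.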

That’s well and good, but what does it tell us about today’s prospects? Here’s what we get by applying my model to all low-A teenagers with at least 200 PA (I also included Abi Avelino, who’s logged 131). What stands out to me is how few players are true long-shots. Of the 36 players, two-thirds are more likely than not to make it to the bigs and only one player gets less than a 27% chance. If a player’s talented enough to play in full-season ball at 19, there’s a good chance he’ll make it to the majors one way or another.

Player Organization MLB Probability
Jake Bauers Padres 96%
Chance Sisco Orioles 87
Ryan McMahon Rockies 86
Andrew Velazquez Diamondbacks 84
Trey Michalczewski White Sox 80
Drew Ward Nationals 79
Manuel Margot Red Sox 78
J.P. Crawford Phillies 77
Abiatal Avelino Yankees 75
Kean Wong Rays 74
Willy Adames Tigers 74
Carson Kelly Cardinals 73
Harold Ramirez Pirates 71
Nomar Mazara Rangers 67
Travis Demeritte Rangers 65
Dustin Peterson Padres 62
Wendell Rijo Red Sox 61
Franmil Reyes Padres 61
Dawel Lugo Blue Jays 60
Reese McGuire Pirates 57
Dominic Smith Mets 54
Jamie Westbrook Diamondbacks 52
Javier Betancourt Tigers 52
Tyler Wade Yankees 52
Miguel Andujar Yankees 48
Elier Hernandez Royals 48
Victor Reyes Braves 47
Clint Frazier Indians 46
Alfredo Escalera-Maldonado Royals 42
Dorssys Paulino Indians 41
Ronald Guzman Rangers 41
Josh Van Meter Padres 39
Carlos Tocci Phillies 37
D.J. Davis Blue Jays 27
Gosuke Katoh Yankees 27
Jairo Beras Rangers 16

 

There are some highly-touted prospects on this list, but other than J.P. Crawford, they aren’t among the names listed near the top. Reese McGuire, Dominic Smith, and Clint Frazier all graced top 100 lists in the pre-season, but have had disappointing power outputs this year, which has led to such mediocre probabilities. Instead, most of the top-ranked players are relatively fringy prospects who have broken out in a big way this year.

As for the Yankees’ prospects, the model thinks Avelino has a pretty good shot at making it, but is relatively low on the others. Even so, Andujar and Wade both have around a 50-50 chance, which isn’t too bad, especially when you consider the model ignores their defensive skills. Things don’t look as promising for Katoh, who’s struck out a ton and hit for only modest power. The lone bright spot in Katoh’s line is his 12% walk rate, which, unfortunately for him, proved un-predictive of a player’s big league future.


Baseball’s Most Under-Popular Hitters

Lists of baseball’s most underrated players are often interesting and thought-provoking exercises, because by definition they focus on players that tend to get less attention than they should. However, there isn’t an easy way to definitively say how players are “rated” by baseball followers. Writers often just list off players who have the attributes that they are looking for (grit, plate discipline, small market players, etc.), which isn’t a bad way of doing it.

However, there is a more scientific way of approaching a list like this. We could look at how many people are doing Google searches for specific players. It wouldn’t exactly tell us what players are most underrated, but it can tell us which players should be getting more attention; these two things are very tightly correlated. The key difference is that plenty of players get attention for things that don’t necessarily mean they are considered good players. Ryan Braun got a lot of attention during his steroid drama, Robinson Cano was heavily talked about during free agency, and people search for Carlos Santana because of this and this. But when good players draw very little interest from fans, they’re probably underrated. The term I’ll use, though, is under-popular.

Using Google’s AdWords Keyword Tool, I gathered the data on every player who has achieved a WAR of at least 3.0 since the beginning of the 2013 season. A regression model with those 132 players showed that each additional 1 WAR was worth an additional 6,000 Google searches per month, which is not too shabby.
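To make the regression step concrete, here is a small Python sketch of a one-variable least-squares fit. The four data points are made up and chosen so the slope comes out to 6,000 searches per win; they are not the real 132-player data set, and the exact specification of the regression is an assumption. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up (WAR, monthly searches) pairs, NOT the actual keyword data.
war = [3.0, 4.0, 5.0, 6.0]
searches = [20_000, 26_000, 32_000, 38_000]

slope, intercept = fit_line(war, searches)

def expected_searches(player_war):
    """Searches per month the fitted line predicts for a given WAR."""
    return intercept + slope * player_war

print(f"slope: {slope:,.0f} searches per win")
print(f"expected searches at 5.0 WAR: {expected_searches(5.0):,.0f}")
```

The fitted line is what supplies the “expected” search count for each player; everything that follows compares that expectation with what the keyword tool actually reported.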

Here is a plot of these players, with the expected amount of Google searches on the horizontal axis, and the actual amount of searches on the vertical. While the keyword tool was incredibly useful, it rounds numbers when they get too high, and you can see a handful of players were rounded off to exactly 165,000 searches per month (FYI, these players were Mike Trout, Miguel Cabrera, David Ortiz, Robinson Cano, Bryce Harper, and Yasiel Puig). Derek Jeter has roughly double that amount, but his WAR did not qualify him for this list.

Searches vs. Expected

There are a lot of players who have played very well the last two years who are by no means household names. Welington Castillo has put up 3.8 WAR since the start of 2013, A.J. Pollock has been worth 6.1 wins, and Brian Dozier 5.8. In order to really measure who the most under-popular players are, I’ll use two methods. The first is simply to subtract the actual number of Google searches from the expected number.

difference

According to this measurement, Josh Donaldson is the most under-popular player in baseball, because he should have been looked up 53,000 times per month more often than he was (68k vs. 15k). That’s a big difference. There are some excellent players on this list, many of whom have an argument as the best, or one of the best, at their position. But for the most part, these are well-known players who should simply be better known.

A different way to measure under-popularity, and the way I think is more telling, is to find the ratio between expected and actual searches, as opposed to just subtracting. For instance, is Edwin Encarnacion more under-popular than, say, Luis Valbuena? Encarnacion should have gotten 41,000 searches per month, but actually only got 18,000. Valbuena, however, played like someone who should have been searched 20,000 times, but was only Googled 2,400 per month. Since I believe Valbuena’s numbers are more out of whack, I prefer the second method.
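The two measurements can be put side by side in a few lines of Python, using the Encarnacion and Valbuena figures quoted above. The function names are hypothetical; the arithmetic is just subtraction and division.

```python
# (expected, actual) monthly searches, as quoted in the text above.
players = {
    "Edwin Encarnacion": (41_000, 18_000),
    "Luis Valbuena": (20_000, 2_400),
}

def search_gap(expected, actual):
    """Method 1: how many searches per month a player is missing."""
    return expected - actual

def search_share(expected, actual):
    """Method 2: actual searches as a fraction of expected searches."""
    return actual / expected

for name, (expected, actual) in players.items():
    print(f"{name}: gap {search_gap(expected, actual):,}, "
          f"share {search_share(expected, actual):.0%}")

# The two methods disagree about which player is more under-popular.
by_gap = max(players, key=lambda n: search_gap(*players[n]))
by_share = min(players, key=lambda n: search_share(*players[n]))
```

By raw difference Encarnacion comes out more under-popular, while by ratio Valbuena does, which is the disagreement the second method is meant to resolve.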

Here are the top 20 players using that measurement, where we see how many times a player was searched as a percentage of how many times you would expect them to be:

Jarrod Dyson has quietly become a well above average baseball player. In about 800 career PA, Dyson has a WAR of 6.8. That is All-Star level production. His elite fielding and baserunning skills (which have combined to be worth more than 3 wins these last two years) make his wRC+ of 91 more than acceptable.

A.J. Pollock appears high on both lists, and for great reason. This year he is quietly hitting .316/.366/.554, after putting up 3.6 WAR last year.

This method of establishing players who deserve more credit for their play certainly has some flaws. WAR is not the only way to measure how good a player is, and Google searches are not a perfect representation of how popular or famous players are. However, it takes away the guesswork and opinions from the standard underrated player lists, and in that there is some value.


Ottoneu Tools: Advanced Standings Part Two

In early May I introduced Ottoneu players to the Advanced Standings Dashboard, a tool that allows team owners to decipher the early season standings in an effort to better gauge where their team might be headed as the 2014 season comes together. You can download that tool here (http://goo.gl/pbXI5), but now that we’ve just entered July, the traditional halfway point of the baseball season, it’s time to take a deeper look at a few ways this tool can be used to effectively manage your team into contention in the second half.

Since the tool can be updated easily with just a couple of copy/paste actions, I use this tool almost daily in my own FGPoints Ottoneu league.  But for fun, let’s walk through a few features as they apply to the FanGraphs Staff League, with a special focus on Eno Sarris’ team, “It’s A Perm”.

Eno enters July as a 3rd place team, nearly 400 points out of 1st place, and 150 out of 2nd.  In general, with at least seven teams over the 8,000 point mark, this league looks competitive at a glance.  But with the recent pickup of Ryan Braun, Eno clearly has his sights set on a title (https://twitter.com/enosarris/status/483016142831644672), so let’s break down the standings using the tool to see if Eno has the momentum to win it all in the 2nd half.

The first tab of the tool is simply the statistical breakdown of the Ottoneu standings into some common sabermetric calculations.  While we can easily see Eno leads the league offensively at 5.44 P/G, the underlying statistics also support it, showing he maintains a (slight) advantage in OPS, OPS+, wOBA, Runs Created, and Total Bases.  What may be more interesting is that Eno has more points scored from his offense than any other team in the league.  In fact, just over 58% of his points have come from his hitters (tab 3, “Projected Finish”). With roughly 55% of league scoring in Ottoneu coming from offense, Eno is clearly banking on this approach of shoring up the side of the ledger that carries the most weight.  The acquisition of Braun will only help.

So It’s A Perm is built on bats, but what about the pitching? Unfortunately, this is a weak spot, as Eno’s FIP, WHIP, and BB/9 are all higher than those of the two teams he’s chasing.  I’m sure he knows this instinctively as his 5.03 P/IP is below the league average of 5.13 P/IP (and further below the 5.19 P/IP of the top 7 teams), but the dashboard makes it quicker and easier to point out these pitching deficiencies.  One possible area of improvement: the bullpen.  Without looking at his roster, I can tell you pretty quickly he’s probably frustrated with his bullpen, which has been almost 42% less effective (“PEN” = Saves + Holds/IP) than that of the league leader, A Little Out of Context.  Shoring up a bullpen is often easier and cheaper than finding an ace SP midseason, so does Eno speculate on the eventual Sergio Romo replacement? Does he approach John Heyman’s Last Sirloin about shedding some of his bullpen pieces in a plea to “deal from strength”?

Once you’ve taken the time to digest some of the traditional sabermetric outputs in the Dashboard, your eyes will naturally gravitate toward the end of the first tab into the “League Projections” section, which is where the real power of the tool comes alive.  The key takeaways here are the “Otto” score and the “Pace” columns.  The Otto score can be better explained here by Chad Young (http://goo.gl/KK4Xy), while the “Pace” attempts to project the season-ending point totals for each team based upon a range of factors, including current P/G and P/IP values, remaining IP and GP, and league averages in these areas.  In many leagues these are the columns that can better separate contenders from pretenders, but for the FanGraphs Staff league we see more evidence that the actual standings are, for the most part, very accurate, as Eno is also projected to end the season with the 3rd most points (18,042, or about 400 points out of 1st place).

There are a few interesting things to note here, however. First, John Heyman’s Last Sirloin actually has the third highest Otto score (13.66), but is still projected for 4th place, most likely due to his slower pace in IP (1,416 projected).  If this team can pick up the IP pace in the 2nd half with similar quality IP (5.54 P/IP), this team could make up ground quickly.  This team is clearly riding a league-best bullpen and trying to maximize its RP innings as much as possible.

Second, Ground Rule Double Helmet, despite sitting in 4th place with a strong 9,000 points, has had to overtax a very weak pitching staff (4.74 P/IP) just to get there (1,577 IP projected).  The tool sees as much and projects this team to end the season in 5th place, but unless the pitching staff sees a significant improvement in the 2nd half, I’d expect this team to possibly fall even further as the season shakes out.

And that’s just the first tab… Once you get familiar with the tool, you’ll actually find the third tab, “Projected Finish”, to be the most useful summary of some of the features described above, as it will give you a daily update of the projected champion for the league.  With Eno just 400 points out of 1st place in both the actual and projected season-ending standings, this league is just too close to call on July 1st, but there are at least four clear contenders here, and It’s A Perm is one of them.  Will Ryan Braun help the cause? Just for fun, let’s say Braun increases Eno’s offense by just 2.00% (from 5.44 to 5.55).  Well, that could be all it takes, as that small increase moves the needle enough for It’s A Perm to overtake Johan Santa Claus by 100 points in the projected season-ending standings and pull within 200 points of 1st place.  Of course, that’s if everything else stays the same, and, as in life, the only thing constant in baseball is change.  This will be a fun league to watch as the summer heats up, so enjoy the tool and use it where possible to get that 2% edge.
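For a rough feel of how a pace projection and the 2% what-if fit together, here is a simplified Python sketch. It is not the dashboard’s actual formula: it ignores the league-average adjustment the tool applies, and the points-so-far, games-remaining, and innings-remaining figures are invented placeholders. Only the 5.44 P/G and 5.03 P/IP rates come from the discussion above, and the function name is hypothetical.

```python
def projected_points(points_so_far, p_per_game, games_left, p_per_ip, ip_left):
    """Current points plus the remaining games and innings played at the given rates."""
    return points_so_far + p_per_game * games_left + p_per_ip * ip_left

# P/G and P/IP are the It's A Perm rates quoted above; everything else is made up.
POINTS_SO_FAR = 9_000   # hypothetical
GAMES_LEFT = 900        # hypothetical hitter games remaining
IP_LEFT = 700           # hypothetical innings remaining

baseline = projected_points(POINTS_SO_FAR, 5.44, GAMES_LEFT, 5.03, IP_LEFT)
with_bump = projected_points(POINTS_SO_FAR, 5.44 * 1.02, GAMES_LEFT, 5.03, IP_LEFT)

print(f"baseline: {baseline:,.0f}")
print(f"with 2% offensive bump: {with_bump:,.0f}")
print(f"gain: {with_bump - baseline:,.0f}")
```

Under these assumptions the gain from the bump depends only on the number of hitter games left, so the same 2% is worth more the earlier in the season it arrives.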


Breaking Down the Aging Curve: Mid 20s

In case you missed parts 1 and 2, you can follow the links back to them, especially part 1, if you want to see what I am doing.  Otherwise, it is time to look at the 24-year-old cohort:

There were 362 players in this group, 64 of whom only had one season of 300+ PAs, leaving us with 298 in the sample.  Those 298 averaged 7.2 full seasons.  Almost 21% of them (62 total) had their best season in year one according to wRC+, and for WAR it was just below 20% (59).  For those players the average career length was 4.3 and 4 years, respectively.  I’m going to start speeding up the discussion, only highlighting things of interest, so that we can get to a more comprehensive picture.
Chart: age 24 cohort, percent of max
The 24 cohort chart shows a couple of years of modest improvement before the group starts its decline, though wRC+ stays pretty flat until age 30 or so.  We have seen some similar patterns up to this point, but those are going to end with the next group.
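The cohort bookkeeping described above (drop the one-season players, count how many of the rest peaked in year one, and express each season as a percent of the career max) can be sketched in Python as follows. The careers are made-up wRC+ lines, the function names are hypothetical, and counting a tie with a later season as a first-year peak is an assumption.

```python
def first_year_peak_share(careers, min_seasons=2):
    """Return (peaked_in_year_one, qualified, share) for a cohort.

    Each career is a list of one stat (wRC+ here) per full season, in order.
    Players with fewer than min_seasons full seasons are dropped first.
    """
    qualified = [c for c in careers if len(c) >= min_seasons]
    peaked = [c for c in qualified if c[0] == max(c)]  # ties count as year-one peaks
    return len(peaked), len(qualified), len(peaked) / len(qualified)

def percent_of_max(career):
    """Each season as a fraction of that player's best season."""
    best = max(career)
    return [season / best for season in career]

# Made-up wRC+ by full season for four hypothetical players.
careers = [
    [120, 110, 100],   # peaked in year one
    [90, 105, 100],    # peaked in year two
    [95],              # only one full season, dropped from the sample
    [100, 130],        # peaked in year two
]

peaked, qualified, share = first_year_peak_share(careers)
print(f"{peaked} of {qualified} peaked in year one ({share:.0%})")
print([round(p, 2) for p in percent_of_max(careers[0])])
```

Averaging the percent-of-max values by age across everyone in a cohort is one way to arrive at a curve like the ones charted here.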

Age 25 Cohort:

This group comprised 343 players in total.  After taking out the 59 that only had one season, I had 284 left, who averaged 5.9 full seasons.  About 30% of those players had their best season in their first full big league chance (86 for wRC+ and 87 for WAR), with an average career length for the 1st year max group of 4 years for wRC+ and 3.7 for WAR.

Chart: age 25 cohort, percent of max

 

This is where the cohorts start getting more interesting.  This group seems to only decline after its first full season.  There doesn’t seem to be any appreciable increase in hitting or overall performance throughout their careers.  As a result, they are also nearer their max as a group right out of the gate.  Once I am through all of the cohorts, we can discuss overall threshold of performance relative to these, which should help us understand everything that is going on.

Age 26 Cohort:

Here is where the sample sizes start to shrink again as we get to ages where a lot of players have either quit or will never make it.  There are still 238 players in this group, so it is relatively large (4th largest cohort), and 64 had only one full season, leaving a group of 174 players who on average had 5.2 full seasons.  65 (37%) maxed out their wRC+ in year 1, along with 54 (31%) maxing WAR right off the bat.  Those groups averaged 3.6 and 3.3 full seasons, respectively.

Chart: age 26 cohort, percent of max

 

Like the last group, this group seems to max out on average in its first year and is declining by its late 20s.  They keep up 80% or so of their max in hitting into their mid 30s, but I think that is going to prove to be the result of two things.  The first is survivorship, since on average most of this group retired or were forced out of the game around age 31; the second is that their starting threshold won’t be as high and will be easier to stay near.

We are getting close.  I will try and blow through the late 20s before the end of the week so I can summarize and give some things that I think are of interest overall.