Run Distribution Using the Negative Binomial Distribution

In this post I use the negative binomial distribution to better model how MLB teams score runs in an inning or in a game. I wrote a primer on the math of the different distributions mentioned in the post for reference, and this post is divided into a baseball-centric section and a math-centric section.

The Baseball Side

A team in the American League will average .4830 runs per inning, but does this mean they will score a run every two innings? This seems intuitive if you apply math from Algebra I [1 run / 2 innings ~ .4830 runs/inning]. However, if you attend a baseball game, the vast majority of innings you’ll watch will be scoreless. This large number of scoreless innings can be described by discrete probability distributions that account for teams scoring none, one, or multiple runs in one inning.

Runs in baseball are considered rare events and count data, so they will follow a discrete probability distribution if they are random. The overall goal of this post is to describe the random process that arises with scoring runs in baseball. Previously, I’ve used the Poisson distribution (PD) to describe the probability of getting a certain number of runs within an inning. The Poisson distribution describes count data like car crashes or earthquakes over a given period of time and defined space. This worked reasonably well to get the general shape of the distribution, but it didn’t capture all the variance that the real data set contained. It predicted fewer scoreless innings and many more 1-run innings than what really occurred. The PD assumes that the mean and variance are equal. In both runs per inning and runs per game, the variance is about twice the mean, so the real data will ‘spread out’ more than a PD predicts.

Negative Binomial Fit

The graph above shows an example of the application of count data distributions. The actual data is in gray and the Poisson distribution is in yellow. It’s not a terrible way to approximate the data or to conceptually understand the randomness behind baseball scoring, but the negative binomial distribution (NBD) works much better. The NBD is also a discrete probability distribution, but it finds the probability of a certain number of failures occurring before a certain number of successes. It would answer the question, what’s the probability that I get 3 TAILS before I get 5 HEADS when I continue to flip a coin. This doesn’t at first intuitively seem like it relates to a baseball game or an inning, but that will be explained later.

From a conceptual standpoint, the two distributions are closely related. So if you are trying to explain to a friend over a beer why 73% of all MLB innings are scoreless, either will work. I’ve plotted both distributions for comparison throughout the post. The second section of the post will discuss the specific equations and their application to baseball.

Runs per Inning

Because of the difference in rules regarding the designated hitter between the two leagues, there will be a different expected value [average] and variance of runs/inning for each league. I separated the two leagues to get a better fit for the data. Using data from 2011-2013, the American League had an expected value of 0.4830 runs/inning with a 1.0136 variance, while the National League had 0.4468 runs/inning as the expected value with a 0.9037 variance. [So NL games are shorter and more boring to watch.] Using only the expected value and the variance, the negative binomial distribution [the red line in the graph] approximates the distribution of runs per inning more accurately than the Poisson distribution.
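As a rough sketch of how those two numbers turn into a fitted curve, the snippet below converts each league's mean and variance into NBD parameters and compares the predicted share of scoreless innings with the Poisson prediction. The method-of-moments conversion (p = mean/variance, r = mean²/(variance − mean)) and the helper names are assumptions made for illustration, not necessarily the exact procedure behind the graph.

```python
import math

# Mean and variance of runs per inning, 2011-2013 (values quoted in the post).
LEAGUES = {"AL": (0.4830, 1.0136), "NL": (0.4468, 0.9037)}

def nb_params(mean, var):
    """Method-of-moments NBD parameters (assumed approach); needs var > mean."""
    p = mean / var                  # probability of a "success"
    r = mean ** 2 / (var - mean)    # possibly fractional number of successes
    return r, p

def scoreless_share_nb(mean, var):
    """P(0 runs) under the NBD: zero failures before the r-th success."""
    r, p = nb_params(mean, var)
    return p ** r

def scoreless_share_poisson(mean):
    """P(0 runs) under the Poisson distribution: exp(-lambda)."""
    return math.exp(-mean)

for league, (mean, var) in LEAGUES.items():
    print(f"{league}: NBD {scoreless_share_nb(mean, var):.3f}, "
          f"Poisson {scoreless_share_poisson(mean):.3f}")
```

Under these assumptions the NBD puts the AL's scoreless share at roughly 72%, while the Poisson version lands closer to 62%, which is the gap the overdispersion argument predicts.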

Runs Per Inning -- 2011-2013

It’s clear that there are a lot of scoreless innings, and very few innings with multiple runs scored. The NBD allows someone to calculate the probability of an MLB team scoring more than 7 runs in an inning, or the probability that the home team, down by a run in the bottom of the 9th, forces extra innings. Using a pitcher’s expected runs/inning, the NBD could be used to approximate the pitcher’s chances of throwing a shutout, assuming he pitches all 9 innings.
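The kinds of questions listed above can be sketched in a few lines. This is a minimal illustration using the AL runs-per-inning figures; the method-of-moments fit, the variable names, and the treatment of innings as independent of one another are all assumptions, and league-average numbers stand in for any particular team or pitcher.

```python
import math

def nb_pmf(k, mean, var):
    """P(exactly k runs) for an NBD matched to a mean and variance (assumed fit)."""
    p = mean / var
    r = mean ** 2 / (var - mean)
    # Gamma-function form of the binomial coefficient, computed in log space.
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1 - p))

AL_MEAN, AL_VAR = 0.4830, 1.0136  # AL runs per inning, 2011-2013

# More than 7 runs in an inning: complement of scoring 0 through 7.
p_more_than_7 = 1.0 - sum(nb_pmf(k, AL_MEAN, AL_VAR) for k in range(8))

# A team trailing by one scores exactly one run in its half of the 9th.
p_exactly_1 = nb_pmf(1, AL_MEAN, AL_VAR)

# Nine straight scoreless innings, treating innings as independent (assumption).
p_nine_scoreless = nb_pmf(0, AL_MEAN, AL_VAR) ** 9

print(f"P(more than 7 runs in an inning) = {p_more_than_7:.5f}")
print(f"P(exactly 1 run in an inning)    = {p_exactly_1:.4f}")
print(f"P(nine scoreless innings)        = {p_nine_scoreless:.4f}")
```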

Runs Per Game

The NBD and PD can be used to describe the runs scored in a game by a team as well. Once again, I separated the AL and NL, because the AL had an expected run value of 4.4995 runs/game and a 9.9989 variance, and the NL had 4.2577 runs/game expected value and 9.1394 variance. This data is taken from 2008-2013. I used a larger span of years to increase the total number of games.

Runs Per Game 2008-2013

Even though MLB teams average more than 4 runs in a game, the single most likely run total for one team in a game is actually 3 runs. The negative binomial distribution once again modeled the empirical distribution well, but the PD fit far worse than it did in the previous graph. Both models, however, underestimate the shutout rate. A remedy for this is to adjust for zero-inflation. This would increase the likelihood of a shutout in the model and adjust the rest of the probabilities accordingly. The need for zero-inflation implies that baseball scoring isn’t completely random: a manager is more likely to use his best pitchers to preserve a shutout than to assign pitchers from the bullpen at random.
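To make the zero-inflation idea concrete, here is a minimal sketch that mixes a small extra probability of a shutout into the NBD and scales the other run totals down to compensate. The 2% extra weight is a made-up placeholder rather than a fitted value, and the function names are hypothetical.

```python
import math

def nb_pmf(k, mean, var):
    """NBD probability of exactly k runs, matched to a mean and variance."""
    p = mean / var
    r = mean ** 2 / (var - mean)
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1 - p))

def zero_inflated_pmf(k, mean, var, extra_zero):
    """Mixture: weight extra_zero on a sure shutout, the rest on the NBD."""
    base = (1.0 - extra_zero) * nb_pmf(k, mean, var)
    return extra_zero + base if k == 0 else base

AL_MEAN, AL_VAR = 4.4995, 9.9989  # AL runs per game, 2008-2013
EXTRA_ZERO = 0.02                 # hypothetical inflation weight, not fitted

for runs in range(6):
    plain = nb_pmf(runs, AL_MEAN, AL_VAR)
    inflated = zero_inflated_pmf(runs, AL_MEAN, AL_VAR, EXTRA_ZERO)
    print(f"{runs} runs: NBD {plain:.4f}, zero-inflated {inflated:.4f}")
```

Note that mixing in extra zeros this way lowers the model's overall mean slightly, so a real fit would re-estimate the NBD parameters together with the inflation weight.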

Hits Per Inning

It turns out the NBD/PD are useful with many other baseball statistics like hits per inning.

Hits Per Inning 2011-2013

The distribution for hits per inning is somewhat similar to that of runs per inning, except the expected value is higher and the variance is lower relative to it. [AL: .9769 hits/inning, 1.2847 variance | NL: .9677 hits/inning, 1.2579 variance (2011-2013)] Since the variance is much closer to the expected value, hits per inning has more values in the middle and fewer at the extremes than the runs per inning distribution.

I could spend all day finding more applications of the NBD and PD, because there are really a lot of examples within baseball. Understanding these discrete probability distributions will help you understand how the game works, and they could be used to model outcomes within baseball.

The Math Side

Hopefully, you skipped down to this section right away if you are curious about the math behind this. I’ve compiled the numbers used in the graphs for the American League for those curious enough to look at examples of the actual values.

The Poisson distribution is given by the equation:

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

There are two parameters for this equation: expected value [λ] and the number of runs you are looking to calculate [x]. To determine the probability of a team scoring exactly three runs in a game, you would set x = 3 and, using the AL expected runs per game [λ = 4.4995], you’d calculate:

$$P(X = 3) = \frac{4.4995^{3}\, e^{-4.4995}}{3!} \approx 0.169$$

This is repeated for the entire set of x = {0, 1, 2, 3, 4, 5, 6, … } to get the Poisson distribution used throughout the post.
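Repeating the calculation over a range of run totals is easy to script. The sketch below uses the standard Poisson probability mass function with the AL runs-per-game mean quoted earlier; the function and variable names are just illustrative.

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected count is lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

AL_RUNS_PER_GAME = 4.4995  # AL expected runs per game, 2008-2013

# Probability of each run total from 0 through 10.
poisson_table = {x: poisson_pmf(x, AL_RUNS_PER_GAME) for x in range(11)}

for runs, prob in poisson_table.items():
    print(f"{runs:2d} runs: {prob:.4f}")
```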

One of the assumptions the PD makes is that the mean and the variance are equal. For these examples, this assumption doesn’t hold true, so the empirical data from actual baseball results doesn’t quite fit the PD and is overdispersed. The NBD accounts for the variance by including it in the parameters.

The negative binomial distribution is usually symbolized by the following equation:

$$P(X = k) = \binom{k + r - 1}{k}\, p^{r} (1 - p)^{k}$$

where r is the number of successes, k is the number of failures, and p is the probability of success. A key restriction is that a success has to be the last event in the series of successes and failures.

Unfortunately, we don’t have a clear value for p or a clear concept on what will be measured, because the NBD measures the probability of binary, Bernoulli trials. It’s helpful to view this problem from the vantage point of the fielding team or pitcher, because a SUCCESS will be defined as getting out of the inning or game, and a FAILURE will be allowing 1 run to score. This will conform to the restriction by having a success [getting out of the inning/game] being the ultimate event of the series.

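In order to make this work the NBD needs to be parameterized differently for mean [μ], variance [σ²], and number of runs allowed [failures, k]. The NBD can be written as

$$P(X = k) = \frac{\Gamma(r + k)}{\Gamma(r)\, k!}\, p^{r} (1 - p)^{k}$$

where

$$p = \frac{\mu}{\sigma^{2}}, \qquad r = \frac{\mu^{2}}{\sigma^{2} - \mu}$$

So using the same example as the Poisson distribution [AL runs per game, μ = 4.4995, σ² = 9.9989, k = 3], this gives p ≈ 0.4500 and r ≈ 3.6814, which would yield:

$$P(X = 3) = \frac{\Gamma(6.6814)}{\Gamma(3.6814)\, 3!}\, (0.4500)^{3.6814} (0.5500)^{3} \approx 0.144$$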

The above equations are adapted from this blog about negative binomials and this one about applying the distribution to baseball. The Γ function is used in the equation instead of a combination operator because the combination operator can’t handle the non-whole numbers we are using to describe the number of successes.
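The point about the Γ function can be checked numerically. The sketch below assumes the usual method-of-moments conversion from mean and variance to p and r (an assumption about the fit, as are the function names), confirms that the gamma form agrees with an ordinary combination when r is a whole number, and then evaluates the AL runs-per-game example with its fractional r.

```python
import math

def nb_coefficient(k, r):
    """Gamma-function form of C(k + r - 1, k); also works for fractional r."""
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1))

def nb_pmf(k, r, p):
    """P(k failures before the r-th success) with success probability p."""
    return nb_coefficient(k, r) * p ** r * (1 - p) ** k

# With a whole-number r, the gamma form matches the combination operator.
whole_r_matches = round(nb_coefficient(3, 4)) == math.comb(3 + 4 - 1, 3)

# AL runs per game, 2008-2013: r comes out fractional, so math.comb cannot be used.
mean, var = 4.4995, 9.9989
p = mean / var
r = mean ** 2 / (var - mean)

probs = [nb_pmf(k, r, p) for k in range(16)]
most_likely = max(range(16), key=probs.__getitem__)

print(f"p = {p:.4f}, r = {r:.4f}")
print(f"P(exactly 3 runs) = {probs[3]:.4f}")
print(f"Most likely run total: {most_likely}")
```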

Conclusion

The negative binomial distribution is really useful in modeling the distribution of discrete count data from baseball for a given inning or game. The most interesting aspect of the NBD is that a success is considered getting out of the inning/game, while a failure would be letting a run score. This is a little counterintuitive if you approach modeling the distribution from the perspective of the batting team. While the NBD has a better fit, the Poisson distribution has a simpler concept to explain: the count of discrete events over a given period of time, which might make it better to discuss over beers with your friends.

The fit of the NBD suggests that run scoring is a negative binomial process, but inconsistencies, especially with shutouts, indicate that elements of the game aren’t completely random. I attribute the underestimated number of shutouts to the increased use of the best relievers in shutout games compared with other games, which raises the total number of shutouts and subsequently lowers the frequency of other run totals.

All MLB data is from retrosheet.org. It’s available free of charge from there. So please check it out, because it’s a great data set. If there are any errors or if you have questions, comments, or want to grab a beer to talk about the Poisson distribution please feel free to tweet me @seandolinar.


Pitch Win Values for Starting Pitchers — August 2014

Introduction

A couple months back, I introduced a new method of calculating pitch values using a FIP-based WAR methodology. That post details the basic framework of these calculations and can be found here. The May, June, and July updates can be found here, here, and here respectively. This post is simply the August 2014 update of the same data. What follows is predominantly data-heavy but should still provide useful talking points for discussion. Let’s dive in and see what we can find. Please note that the same caveats apply as in previous months. We’re at the mercy of pitch classification. I’m sure your favorite pitcher doesn’t throw that pitch that has been rated as incredibly below average, but we have to go off of the data that is available. Also, Baseball Prospectus’s PitchF/x leaderboards list only nine pitches (Four-Seam Fastball, Sinker, Cutter, Splitter, Curveball, Slider, Changeup, Screwball, and Knuckleball). Anything that may be classified outside of these categories is not included. Also, anything classified as a “slow curve” is not included in Baseball Prospectus’s curveball data.

Constants

Before we begin, we must first update the constants used in calculation for August.  As a refresher, we need three different constants for calculation: strikes per strikeout, balls per walk, and a FIP constant to bring the values onto the right scale.  We will tackle them each individually.

First, let’s discuss the strikeout constant. In August, there were 52,238 strikes thrown by starting pitchers. Of these 52,238 strikes, 4,887 were turned into hits and 15,293 outs were recorded. Of these 15,293 outs, 4,118 were converted via the strikeout, leaving us with 11,175 ball-in-play outs. 11,175 ball-in-play outs and 4,887 hits sum to 16,062 balls-in-play. Subtracting 16,062 balls-in-play from our original 52,238 strikes leaves us with 36,176 strikes to distribute over our 4,118 strikeouts. That’s a ratio of 8.78 strikes per strikeout. This is slightly lower than the 8.82 strikes per strikeout in June and July, meaning batters were slightly easier to strike out in August.

The next two constants are much easier to ascertain. In August, there were 28,957 balls thrown by starters and 1,521 walked batters. That’s a ratio of 19.04 balls per walk, down from 19.76 balls per walk in July. This data would suggest that hitters were more likely to walk in August than previously. The FIP subtotal for all pitches in August was 0.48. The MLB Run Average for August was 4.12, meaning our FIP constant for August is 3.65.

Constant Value
Strikes/K 8.78
Balls/BB 19.04
cFIP 3.65
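The arithmetic behind these three constants can be retraced from the counts quoted above. This is only a sketch with illustrative variable names, and it treats the FIP constant as the run average minus the FIP subtotal, which is how the text reads.

```python
# August 2014 starting-pitcher counts quoted in the post.
strikes = 52_238
hits = 4_887
outs = 15_293
strikeouts = 4_118
balls = 28_957
walks = 1_521

ball_in_play_outs = outs - strikeouts           # outs not recorded by strikeout
balls_in_play = ball_in_play_outs + hits        # every strike put into play
remaining_strikes = strikes - balls_in_play     # strikes left to spread over strikeouts

strikes_per_k = remaining_strikes / strikeouts
balls_per_bb = balls / walks

# FIP constant read as run average minus FIP subtotal (rounded inputs from the post).
run_average = 4.12
fip_subtotal = 0.48
c_fip = run_average - fip_subtotal

print(f"Strikes/K: {strikes_per_k:.2f}")
print(f"Balls/BB:  {balls_per_bb:.2f}")
print(f"cFIP:      {c_fip:.2f}")
```

With the rounded inputs the last line comes out to 3.64, one hundredth below the 3.65 in the table; the difference presumably comes from rounding in the published run average and subtotal.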

The following table details how the constants have changed month-to-month.

Month Strikes/K Balls/BB cFIP
March/April 8.47 18.50 3.68
May 8.88 18.77 3.58
June 8.82 19.36 3.59
July 8.82 19.76 3.65
August 8.78 19.04 3.65

Pitch Values – August 2014

For reference, the following table details the FIP for each pitch type in the month of August.

Pitch FIP
Four-Seam 4.03
Sinker 4.17
Cutter 4.14
Splitter 4.48
Curveball 4.21
Slider 4.15
Changeup 4.47
Screwball 2.22
Knuckleball 4.56
MLB RA 4.12

As we can see, only two pitches would be classified as above average for the month of August: four-seam fastballs and screwballs. Sinkers, cutters, and sliders also came in right around league average. Pitchers that were able to stand out in other categories tended to have better overall months than pitchers who excelled at these pitches. Now, let’s proceed to the data for the month of August.

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Chris Tillman 0.7 183 Sean O’Sullivan -0.2
2 Jose Quintana 0.6 184 John Danks -0.2
3 Phil Hughes 0.6 185 Anthony Ranaudo -0.3
4 Max Scherzer 0.6 186 Jason Hammel -0.3
5 Madison Bumgarner 0.5 187 Stephen Strasburg -0.4

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Mike Leake 0.5 169 Shelby Miller -0.2
2 Rick Porcello 0.4 170 Travis Wood -0.2
3 Kyle Hendricks 0.4 171 Mat Latos -0.3
4 Dallas Keuchel 0.3 172 Tsuyoshi Wada -0.3
5 Jimmy Nelson 0.3 173 Kyle Kendrick -0.3

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jarred Cosart 0.6 74 Scott Carroll -0.1
2 Josh Collmenter 0.4 75 Jorge de la Rosa -0.1
3 Corey Kluber 0.3 76 J.A. Happ -0.1
4 James Shields 0.3 77 Kevin Correia -0.2
5 Jerome Williams 0.2 78 Dan Haren -0.2

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Alex Cobb 0.4 26 Miguel Gonzalez -0.1
2 Mat Latos 0.2 27 Hisashi Iwakuma -0.1
3 Alfredo Simon 0.1 28 Felix Hernandez -0.1
4 Hiroki Kuroda 0.1 29 Jorge de la Rosa -0.1
5 Kyle Kendrick 0.1 30 Tim Hudson -0.2

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Alex Wood 0.3 157 James Shields -0.2
2 Brandon McCarthy 0.3 158 Jesse Hahn -0.2
3 Adam Wainwright 0.3 159 Max Scherzer -0.2
4 Clay Buchholz 0.2 160 Zack Greinke -0.3
5 Scott Feldman 0.2 161 Nick Martinez -0.3

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Clayton Kershaw 0.4 123 Dallas Keuchel -0.2
2 Chris Archer 0.3 124 Scott Baker -0.2
3 Tyler Matzek 0.3 125 Rubby de la Rosa -0.2
4 Collin McHugh 0.3 126 Bartolo Colon -0.2
5 Kyle Gibson 0.2 127 Rafael Montero -0.2

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Chris Capuano 0.4 154 Jon Niese -0.2
2 Jeremy Guthrie 0.3 155 Henderson Alvarez -0.2
3 Roberto Hernandez 0.2 156 Zack Greinke -0.2
4 David Price 0.2 157 Brad Peacock -0.3
5 Max Scherzer 0.2 158 Brad Hand -0.4

Screwball

Rank Pitcher Pitch Value
1 Trevor Bauer 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 0.1

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Alex Cobb 0.9 186 Jason Hammel -0.2
2 Jordan Zimmermann 0.8 187 Justin Masterson -0.2
3 Corey Kluber 0.8 188 Sean O’Sullivan -0.3
4 Jarred Cosart 0.8 189 Kyle Lohse -0.4
5 Collin McHugh 0.8 190 Brad Hand -0.4

Pitch Ratings – August 2014

Four-Seam Fastball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jose Quintana 59 87 Vance Worley 39
2 Brad Peacock 59 88 Stephen Strasburg 37
3 Michael Pineda 59 89 Justin Masterson 36
4 Phil Hughes 58 90 Anthony Ranaudo 35
5 Franklin Morales 58 91 John Danks 35

Sinker

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Rick Porcello 58 68 Travis Wood 37
2 Jake Arrieta 58 69 Kyle Kendrick 36
3 Gio Gonzalez 57 70 John Lackey 35
4 J.A. Happ 57 71 Mat Latos 35
5 Marcus Stroman 57 72 Tsuyoshi Wada 33

Cutter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Franklin Morales 58 27 Brandon McCarthy 43
2 Corey Kluber 58 28 Jake Peavy 40
3 James Shields 58 29 Ryan Vogelsong 39
4 Jerome Williams 57 30 Dan Haren 38
5 Tim Hudson 56 31 Kevin Correia 33

Splitter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Mat Latos 58 7 Matt Shoemaker 50
2 Alex Cobb 56 8 Jake Odorizzi 49
3 Kyle Kendrick 55 9 Jorge de la Rosa 45
4 Tsuyoshi Wada 54 10 Kevin Gausman 42
5 Alfredo Simon 54 11 Hisashi Iwakuma 41

Curveball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Felix Hernandez 60 66 Dillon Gee 37
2 Brandon McCarthy 58 67 Scott Carroll 37
3 Jacob deGrom 58 68 James Shields 33
4 Brandon Workman 57 69 Jesse Hahn 24
5 Jeremy Hellickson 57 70 Max Scherzer 22

Slider

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Max Scherzer 59 54 Tanner Roark 40
2 Wei-Yin Chen 59 55 Kyle Lohse 38
3 Jordan Zimmermann 59 56 Vance Worley 37
4 Corey Kluber 59 57 Dallas Keuchel 35
5 Tyler Matzek 58 58 Tim Lincecum 27

Changeup

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Chris Capuano 58 59 Wade Miley 38
2 Roberto Hernandez 58 60 Robbie Ray 36
3 Allen Webster 57 61 Trevor May 32
4 Yohan Flande 57 62 Zack Greinke 28
5 Jeremy Guthrie 57 63 Jon Niese 28

Screwball

Rank Pitcher Pitch Rating
1 Trevor Bauer 59

Knuckleball

Rank Pitcher Pitch Rating
1 R.A. Dickey 49

Monthly Discussion

As we can see, Alex Cobb takes the top spot for this month, mainly due to the strength of his sinker and splitter. Cobb was classified as throwing four different pitches in August (Four-Seam, Sinker, Splitter, and Curveball) and managed to earn at least 0.1 WAR from all four. The most valuable pitch overall in August was Chris Tillman’s Four-Seam Fastball. The least valuable was Stephen Strasburg’s Four-Seam Fastball. As far as offspeed pitches, Chris Capuano’s 0.4 WAR from his changeup led the way. The least valuable offspeed pitch was Brad Hand’s changeup.

On our 20-80 scale pitch ratings, the highest rated qualifying pitch was Felix Hernandez’s curveball.  The lowest rated pitch was the curveball thrown by Max Scherzer.  The highest rated fastball was Jose Quintana’s four-seam fastball.  The lowest rated fastball was Tsuyoshi Wada’s sinker.

Pitch Values – 2014 Season

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jose Quintana 2.4 262 Dan Straily -0.3
2 Ian Kennedy 2.4 263 Edwin Jackson -0.3
3 Phil Hughes 2.2 264 Masahiro Tanaka -0.4
4 Jordan Zimmermann 2.1 265 Juan Nicasio -0.4
5 Chris Tillman 1.9 266 Marco Estrada -0.7

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Charlie Morton 1.7 251 Mike Pelfrey -0.3
2 Dallas Keuchel 1.4 252 Dan Straily -0.3
3 Chris Archer 1.3 253 John Danks -0.3
4 Mike Leake 1.3 254 Wandy Rodriguez -0.3
5 Felix Hernandez 1.2 255 Andrew Heaney -0.4

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jarred Cosart 1.8 118 Felipe Paulino -0.2
2 Corey Kluber 1.5 119 C.J. Wilson -0.3
3 Madison Bumgarner 1.4 120 Dan Haren -0.3
4 Josh Collmenter 1.4 121 Hector Noesi -0.4
5 Adam Wainwright 1.3 122 Brandon McCarthy -0.6

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Alex Cobb 1.0 35 Jake Peavy -0.1
2 Masahiro Tanaka 0.8 36 Franklin Morales -0.2
3 Hiroki Kuroda 0.7 37 Danny Salazar -0.2
4 Hisashi Iwakuma 0.5 38 Miguel Gonzalez -0.3
5 Kyle Kendrick 0.4 39 Clay Buchholz -0.3

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Sonny Gray 1.1 225 Homer Bailey -0.2
2 A.J. Burnett 1.1 226 Josh Collmenter -0.2
3 Brandon McCarthy 1.0 227 Franklin Morales -0.3
4 Adam Wainwright 1.0 228 Felipe Paulino -0.3
5 Felix Hernandez 0.8 229 Eric Stults -0.5

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Garrett Richards 1.5 192 Liam Hendriks -0.2
2 Tyson Ross 1.2 193 Rafael Montero -0.3
3 Chris Archer 1.0 194 Danny Salazar -0.3
4 Corey Kluber 1.0 195 Erasmo Ramirez -0.4
5 Jordan Zimmermann 1.0 196 Travis Wood -0.5

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 0.8 245 Wandy Rodriguez -0.4
2 Stephen Strasburg 0.8 246 Jordan Zimmermann -0.4
3 Roberto Hernandez 0.7 247 Matt Cain -0.4
4 Cole Hamels 0.7 248 Marco Estrada -0.6
5 Chris Sale 0.6 249 Drew Hutchison -0.7

Screwball

Rank Pitcher Pitch Value
1 Trevor Bauer 0.1
2 Alfredo Simon 0.0
3 Hector Santiago 0.0
4 Julio Teheran 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 1.3
2 C.J. Wilson 0.0

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Corey Kluber 3.7 270 David Holmberg -0.4
2 Adam Wainwright 3.6 271 Felipe Paulino -0.5
3 Garrett Richards 3.5 272 Juan Nicasio -0.5
4 Jose Quintana 3.4 273 Wandy Rodriguez -0.8
5 Felix Hernandez 3.3 274 Marco Estrada -1.2

Year-to-Date Discussion

If we look at the year-to-date numbers, Indians ace and Cistulli favorite Corey Kluber has claimed the top spot. Current MLB FIP and WAR leader Clayton Kershaw ranks eighth, with every pitcher ranked above him having made at least three more starts. The least valuable starter has been Marco Estrada. On a per-pitch basis, the most valuable pitch has been Jose Quintana’s four-seam fastball. The most valuable offspeed pitch has been Garrett Richards’s slider. The least valuable pitch has been Marco Estrada’s four-seam fastball. The least valuable offspeed pitch has been Drew Hutchison’s changeup.


The Remarkable Control of Phil Hughes and Hisashi Iwakuma

Phil Hughes of the Minnesota Twins and Hisashi Iwakuma of the Seattle Mariners both pitched over the Labor Day weekend and both picked up wins without issuing any walks. While not unusual as single-game performances, consider that Hughes now has 15 wins for the season and has allowed only 15 walks, while Iwakuma has 13 wins and 13 walks. They both have the opportunity to achieve the rarest of feats if they can finish the season with as many wins as walks. Granted, pitcher wins are a poor measure of baseball excellence and are generally out of favor with most readers on this site, but the rarity of their accomplishments is quite astounding and worthy of attention.

How rare? It’s rarer than a perfect game, a 4-homer game, an unassisted triple play, and a batting triple crown. The last time a qualified starter had as many wins as walks was Carlos Silva of the Twins in 2005. Silva recorded only 9 wins in his best pro season by WAR, but he also walked only nine batters. And it wasn’t a small sample size situation either. The dude started 27 games and pitched 188 innings. Unfortunately his team didn’t reward him very often in the win column. Amazingly, 2 of his 9 walks were intentional.

Before that, Bret Saberhagen recorded 14 wins and allowed a mere 13 walks with the New York Mets in 1994. Interestingly, Saberhagen’s season included zero intentional walks while Iwakuma and Hughes have both issued one IBB so far, which leads one to wonder how many walks by these control artists were actually due to wildness (or a stingy strike zone) and how many were because they were merely pitching around a batter? There could literally be zero wild walks by these four, but it’s hard to even estimate without analyzing all the gifs and then guessing.

Also of note, Hughes has hit 3 batters so far this year, which has the same effect as a walk, while Iwakuma has hit 2 all season. Both of Iwakuma’s HBPs actually happened in the same game, against Boston in his 24 August start, against back-to-back batters. Silva hit a surprisingly high 3 batters in his 2005 season and Saberhagen hit 4 in 1994. Again, it’s hard to say which of these HBPs were due to wildness and which were statements or retaliation, although I personally watched Iwakuma’s two HBPs on MLB.TV and they were definitely not intentional.

Prior to Saberhagen? You have to go all the way back to Slim Sallee in 1919 to find someone with as many wins as walks. Remember him? Me neither. He had 21 wins and 20 walks that year for the Cincinnati Reds over 228 IP. In baseball terms, 1919 was before Babe Ruth became a Yankee. He was still pitching for the Red Sox and now he’s extremely dead. So in the last 95 MLB seasons, among thousands of qualified starting pitchers, only four people have had as many wins as walks, and two of them are doing it this year! Here’s the all-time leaderboard going back to 1900, sorted by wins minus walks.

Table 1: MLB Single Season Control by Qualified Starters Ranked by Wins-Walks, 1900-2014

Rank Name Team W L IP BB BB/9 ERA WAR YR W-BB IBB* HBP
1 Christy Mathewson Giants 25 11 306 21 0.62 2.06 5.8 1913 4 n/a 0
2 Christy Mathewson Giants 24 13 312 23 0.66 3.00 3.2 1914 1 n/a 2
3 Slim Sallee Reds 21 7 227 20 0.79 2.06 2.5 1919 1 n/a 1
4 Bret Saberhagen Mets 14 4 177 13 0.66 2.74 5.1 1994 1 0 4
5 Phil Hughes Twins 15 9 180 15 0.75 3.54 5.3 2014 0 1 3
6 Hisashi Iwakuma Mariners 13 6 155 13 0.75 2.90 3.0 2014 0 1 2
7 Carlos Silva Twins 9 8 188 9 0.43 3.44 2.6 2005 0 2 3
8 Greg Maddux Braves 19 4 232 20 0.77 2.20 8.0 1997 -1 6 6
9 Babe Adams Pirates 17 13 263 18 0.62 2.16 4.8 1920 -1 n/a 1
10 Walter Johnson Senators 36 7 346 38 0.99 1.14 8.5 1913 -2 n/a 9
11 Cy Young Americans 26 16 380 29 0.69 1.97 7.5 1904 -3 n/a 4
12 Tiny Bonham Yankees 21 5 226 24 0.96 2.27 5.3 1942 -3 n/a 1
13 Bob Tewksbury Cardinals 17 10 213 20 0.84 3.83 4.3 1993 -3 1 6
14 Cy Young Americans 33 10 371 37 0.90 1.62 9.0 1901 -4 n/a 8
15 Deacon Phillippe Pirates 25 9 289 29 0.90 2.43 6.4 1903 -4 n/a 4
16 Greg Maddux Braves 19 2 209 23 0.99 1.63 7.9 1995 -4 3 4
17 Bob Tewksbury Cardinals 16 5 233 20 0.77 2.16 3.9 1992 -4 0 3
18 La Marr Hoyt Padres 16 8 210 20 0.86 3.47 2.8 1985 -4 2 2
19 Jon Lieber Yankees 14 8 176 18 0.92 4.33 3.7 2004 -4 2 2
20 Babe Adams Pirates 14 5 160 18 1.01 2.64 3.1 1921 -4 n/a 0

 

Christy Mathewson is the clear stud in this statistical category with a +4 in 1913 (with zero hit batters) and +1 the following year. Look at all the hall of famers like Cy Young, Walter Johnson and Greg Maddux mixed in with guys that had great control but less than HOF careers like Bob Tewksbury, Babe Adams, Jon Lieber and La Marr Hoyt. Now look at and appreciate some of the innings pitched by these early control artists, led by Cy Young’s incredible 380 IPs in 1904 with only 29 walks.

This being a sabermetric site, the more generally accepted advanced baseball metric for pitcher control is probably BB/9 which takes the subjectivity of wins out of the equation. By that measure, here’s the all time leaderboard since 1900.

Table 2: MLB Single Season Control by Qualified Starters Ranked by Walks per 9 Innings, 1900-2014

Rank Name Team W L IP BB BB/9 ERA WAR YR W-BB IBB* HBP
1 Carlos Silva Twins 9 8 188 9 0.43 3.44 2.6 2005 0 2 3
2 Christy Mathewson Giants 25 11 306 21 0.62 2.06 5.8 1913 4 n/a 0
3 Babe Adams Pirates 17 13 263 18 0.62 2.16 4.8 1920 -1 n/a 1
4 Christy Mathewson Giants 24 13 312 23 0.66 3.00 3.2 1914 1 n/a 2
5 Bret Saberhagen Mets 14 4 177 13 0.66 2.74 5.1 1994 1 0 4
6 Cy Young Americans 26 16 380 29 0.69 1.97 7.5 1904 -3 n/a 4
7 Red Lucas Reds 10 16 219 18 0.74 3.40 2.3 1933 -8 n/a 2
8 Phil Hughes Twins 15 9 180 15 0.75 3.54 5.3 2014 0 1 3
9 Hisashi Iwakuma Mariners 13 6 155 13 0.75 2.90 3.0 2014 0 1 2
10 Cliff Lee 2 Teams 12 9 212 18 0.76 3.18 7.0 2010 -6 2 1
11 Greg Maddux Braves 19 4 232 20 0.77 2.20 8.0 1997 -1 6 6
12 Bob Tewksbury Cardinals 16 5 233 20 0.77 2.16 3.9 1992 -4 0 3
13 Cy Young Americans 13 21 287 25 0.78 3.19 6.2 1906 -12 n/a 8
14 Slim Sallee Reds 21 7 227 20 0.79 2.06 2.5 1919 1 n/a 1
15 Babe Adams Pirates 17 10 263 23 0.79 1.98 5.6 1919 -6 n/a 3
16 Babe Adams Pirates 8 11 171 15 0.79 3.57 4.2 1922 -7 n/a 4
17 Slim Sallee Giants 8 8 132 12 0.82 2.25 2.1 1918 -4 n/a 0
18 Addie Joss Naps 24 11 325 30 0.83 1.16 6.8 1908 -6 n/a 2
19 Cy Young Americans 18 19 320 30 0.84 1.82 7.6 1905 -12 n/a 10
20 Bob Tewksbury Cardinals 17 10 213 20 0.84 3.83 4.3 1993 -3 1 6
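Both ranking columns are simple to recompute from the W, BB, and IP columns. The sketch below does so for four of the seasons discussed in the text; it assumes the IP column is rounded to whole innings, so the results are only expected to match the table to two decimals, and the names are illustrative.

```python
def walks_per_nine(walks, innings):
    """BB/9: walks allowed scaled to a nine-inning game."""
    return 9 * walks / innings

# (name, year, wins, walks, innings pitched) taken from the tables above.
seasons = [
    ("Carlos Silva", 2005, 9, 9, 188),
    ("Bret Saberhagen", 1994, 14, 13, 177),
    ("Phil Hughes", 2014, 15, 15, 180),
    ("Hisashi Iwakuma", 2014, 13, 13, 155),
]

for name, year, wins, walks, innings in seasons:
    print(f"{name} ({year}): W-BB = {wins - walks:+d}, "
          f"BB/9 = {walks_per_nine(walks, innings):.2f}")
```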

 

Who would have ever guessed that the ALL-TIME LEADER in single-season BB/9 is Carlos Silva in 2005? By a significant margin! Notice also that even with the elimination of wins from the discussion, Hughes and Iwakuma are still having truly historic seasons, tied for eighth on the all-time list. It’s time they start getting some recognition for their accomplishments. Miguel Cabrera won the Triple Crown in 2012 and rightfully received acclaim for achieving a traditional statistical feat. Hughes and Iwakuma are on the verge of doing something similarly extraordinary and deserve some credit as well. I for one am going to watch closely and root for them to continue their excellence and go into the record books with at least as many wins as walks.

* Intentional walks weren’t recorded as an official statistic until 1955


Cat Days of Summer: The Tigers and Schedule Effects

If you’ve been on the internet in the last few weeks (or within earshot of a Michigander) you may have heard about the Tigers. Specifically, you may have heard about how the odds in favor of a Detroit appearance in the 2014 ALDS dropped from 21-to-1 on July 25 to under break-even by August 23 before a slight rebound to finish out the month. Even more specifically, you may have read Mike Petriello’s article about that on this very website. Or at the very least, you may have heard their struggles described in a less quantitative fashion. Regardless, the month of August was not kind to the Bengals.

As Petriello pointed out, this has been less of a Tigers collapse than a Royals surge. But there’s still something to the idea that the Tigers were playing worse in August than they had been previously. Let’s start with the basics:

2014 First Half August
R/G 4.80 4.58
RA/G 4.25 4.74
W% .582 .516
Pythagenpat .557 .484

In August, the Tigers scored fewer runs, allowed more runs, and won fewer games than in the first half. On some level, that’s all that really matters. On another level, something else is different about August for these Tigers.
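The Pythagenpat row can be retraced from the R/G and RA/G rows. The sketch below assumes the commonly cited form of the formula, in which the exponent is total runs per game raised to the 0.287 power; the post does not say which constant was used, so treat 0.287 and the function name as assumptions.

```python
def pythagenpat(runs_per_game, runs_allowed_per_game, power=0.287):
    """Expected winning percentage with a run-environment-dependent exponent."""
    exponent = (runs_per_game + runs_allowed_per_game) ** power
    scored = runs_per_game ** exponent
    allowed = runs_allowed_per_game ** exponent
    return scored / (scored + allowed)

first_half = pythagenpat(4.80, 4.25)
august = pythagenpat(4.58, 4.74)

print(f"First half: {first_half:.3f}")
print(f"August:     {august:.3f}")
```

With that exponent, the two results agree with the table's .557 and .484 to three decimals.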

Back on July 14, Buster Olney and Jeff Sullivan both wrote articles about schedule strength. Olney called the Tigers’ schedule the second-most difficult of 17 “contending” teams (paywall), while Sullivan said it was the easiest in all of MLB. One of the key reasons for the discrepancy was that Sullivan was using projections to determine the difficulty of a particular opponent, while Olney was using actual results. Score one for Sullivan. Another key difference was that as of July 14, the Tigers were about to play 55 games in 56 days, which did not factor into Sullivan’s analysis.

A point for Olney? Perhaps. But first, what would we expect to see if this was a result of schedule fatigue? Or put another way, which groups of players might be hurt most or least by not having a day off? Based on conventional wisdom, the bullpen would probably be the most affected, and the starters the least. So how does this match up to the Tigers?


Brandon Moss has Become a Little Too Patient

Brandon Moss has wielded an immensely potent bat since joining the Athletics’ lineup in June of 2012. Between 2012 and 2013, he put up a remarkable 146 wRC+ and clubbed a homer once every 15.7 PA, placing him third in baseball behind Chris Davis and Miguel Cabrera over that span. Moss kept up the hot hitting to start the 2014 season, as well. The 30-year-old 1B/OF/DH posted a 162 wRC+ in the season’s first two months, further establishing himself as a key cog in one of baseball’s most potent lineups.

But Brandon Moss hasn’t been himself lately. Since his last home run on July 24th, he’s only managed three extra-base hits, resulting in a laughable .168/.317/.198 batting line. Moss’s slump has also coincided with a change in his hitting approach. Moss appears to have gotten a bit more passive at the plate, swinging at way fewer pitches both inside and outside of the strike zone. This new-found passivity took a turn for the extreme once the calendar turned to August, when his O-Swing% and Z-Swing% fell to 27% and 65%, respectively — both around six percentage points lower than his career norms.

[Image: Swing]

Moss’s decision to lay off more pitches has unsurprisingly led to a spike in both his walk and strikeout numbers, but it’s also resulted in his power completely flat-lining. Moss has basically been Adam Dunn without the power these last couple of months. That’s a pretty terrible hitter, and is part of the reason why the A’s went out and got the real Adam Dunn to help their sputtering offense.

[Image: BBK]

[Image: ISOO]

The new swing profile is something that’s recently changed, making it the obvious culprit for Moss’s drop-off in production, but we shouldn’t immediately rule out the possibility that pitchers have changed the way they’re approaching him. It could just be that he’s swinging at fewer pitches because he’s getting fewer pitches to hit. That doesn’t seem to be the case, though, as Moss’s zone breakdown from August looks nearly identical to what it was over the season’s first four months. For whatever reason, Moss just isn’t swinging as often as he used to.

It’s not entirely clear what’s spurred Moss’s sudden reluctance to swing the bat, but all indications are that it’s done a number on his offensive performance. Unlike the Brandon Moss who, up until recently, could be counted on for a wRC+ north of 130, this latest iteration seems to be letting a few too many hittable pitches float down the heart of the plate. And based on what’s transpired over the last month or two, Moss’s best bet is probably to re-discover the more aggressive approach that’s worked so well for him in the past.

Statistics courtesy of FanGraphs; Zone breakdowns courtesy of Baseball Savant.


O Xander, Where Art Thou?

Coming into this season, the Boston Red Sox had high hopes. Obviously, they were coming off a World Series title, and they had every reason to expect that they could contend again. Jarrod Saltalamacchia was gone, but he could be replaced by A.J. Pierzynski; the drop-off there wouldn’t be too large. Ryan Dempster was gone, but the Red Sox’s rotation of Jon Lester, John Lackey, Clay Buchholz, Jake Peavy, and Felix Doubront was what they had gone with during last year’s stretch run anyway. Jacoby Ellsbury was gone, but Jackie Bradley Jr. (and Grady Sizemore!) should have been able to play well enough to make his departure bearable. And Stephen Drew was gone, but uber-prospect Xander Bogaerts was ready to take over the Red Sox’s shortstop position and dominate the league.

Needless to say, none of those really worked out like the Red Sox and their fans had hoped or planned. Boston currently resides in the AL East cellar, all but certain to go from first to worst just the year after they had done the very opposite. And perhaps no individual part of that failure this season has been a bigger disappointment than Bogaerts. Instead of being the hitter he was supposed to be, he has struggled mightily at the plate, to the tune of a .223/.293/.333 slash line — good for a 74 wRC+ (as of September 1) and a major contributor to his negative WAR.

Where do we start in trying to assess the reasons for Bogaerts’s struggles? Well, time-wise, we can place a pretty neat cutoff point at June 4: That is when Bogaerts started to slump (I think I cursed him). For the first two months of the season, actually, Xander was quite good: he had a 140 wRC+ through April and May, and that figure would have been higher if not for a mini-slump that came towards the very beginning of the season. He was drawing walks roughly 11% of the time (above average) and striking out at a clip a shade below 22% (not much below average). And then came June. It started out OK — he went 4-for-13 in his first 3 June games. But after that, for the rest of the month, he recorded a mere 9 hits and 3 walks in 88 plate appearances. July was better, but not good: Bogaerts managed just a .228/.253/.342 line, and now through most of August he has been even worse than he was the previous two months, with a paltry .123/.195/.164 triple slash.

His wRC+, by month:

March/April 120
May 151
June 11
July 60
August -3

Yeesh. Not the way you want to be trending. So what happened? Well, the easy answer is to point to BABIP:

March/April 0.364
May 0.421
June 0.149
July 0.286
August 0.170

This looks right, right? His best month by wRC+ was his best month by BABIP. His two worst months by wRC+ (June and August) were his two worst by BABIP, even if they swap places at the very bottom. And the months in between line up in the same order. But that, of course, doesn’t tell the whole story. Why is his BABIP from the first two months so much higher? What can he do to fix it? Will he fix it? Can he? Let’s explore.
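As a quick check on how closely the two lists track each other, here is a small Python sketch that ranks the months by each stat. The numbers are copied from the two lists above; the helper name `ranked` is only for illustration.

```python
# Monthly figures copied from the wRC+ and BABIP lists above.
wrc_plus = {"March/April": 120, "May": 151, "June": 11, "July": 60, "August": -3}
babip = {"March/April": 0.364, "May": 0.421, "June": 0.149, "July": 0.286, "August": 0.170}


def ranked(stat):
    """Return month names ordered from best to worst for the given stat."""
    return sorted(stat, key=stat.get, reverse=True)


by_wrc = ranked(wrc_plus)
by_babip = ranked(babip)

# Print the two orderings side by side and flag any position where they differ.
for position, (a, b) in enumerate(zip(by_wrc, by_babip), start=1):
    marker = "" if a == b else "  <- differs"
    print(f"{position}. wRC+: {a:<12} BABIP: {b:<12}{marker}")
```

Running it shows the top three months in the same order under both stats, with June and August occupying the bottom two spots of each list.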

A .364 BABIP like Bogaerts had in April is unsustainable. The .421 BABIP he had the following month is way too high for even the best players to keep up. So naturally, we would expect some regression from him. But his batted ball profile did suggest a decent BABIP – high line drive rate and low popup rate. The only thing overly suspect was his 17.1% infield hit rate in June. Nothing there would suggest such an outrageously high BABIP for the first two months, but nothing would suggest the low BABIPs that were to come later either. So something must have changed. What was it?

It wasn’t Bogaerts’s average flyball distance; that stayed more or less intact. But he did start hitting many fewer line drives…

March/April 22.4%
May 24.4%
June 15.7%
July 19.0%
August 14.6%

…and started striking out more, which didn’t affect his BABIP directly but did have an impact on his overall hitting (somewhat astonishingly and coincidentally, his K% has been the exact same – to one decimal – each of the past 3 months):

March/April 21.7%
May 22.0%
June 26.5%
July 26.5%
August 26.5%

And in the same vein, he walked much less, which helped contribute to his very low wRC+ as well:

March/April 12.3%
May 10.2%
June 2.9%
July 3.6%
August 7.2%

So while it may be easy to ascribe Bogaerts’s recent struggles to his abnormally low BABIPs, there is more to the story. He simply isn’t hitting anywhere near as well as he did earlier in the season. I can think of a few potential reasons for this:

1. Pitchers are pitching to him differently, and he will have to adjust

2. He is in a prolonged slump, and will snap out of it eventually

3. He isn’t actually that good, and his first few months were just very lucky

4. He was playing third base

I think we can ignore the last two. Bogaerts, after all, was ranked a top-5 prospect coming into the season by almost anyone worth listening to, and he has hit very well in the majors before; he’s almost certainly not actually bad at hitting. As for the last one — that was a theory many people floated out when Bogaerts stopped hitting well at almost the exact same time as Stephen Drew returned and kicked Bogaerts over to third. The argument was that since short was Bogaerts’s natural position, and he felt most comfortable there and could focus on his hitting, he would do better when playing there.

And that theory holds some water: this season, his wRC+ as a third baseman is 37 (in 180 PA), and as a shortstop it is 95 (in 312). That is too large a difference to dismiss offhand. But here’s the problem: when Drew was traded, and Bogaerts returned to shortstop, he continued to hit poorly. In fact, throughout the entire month of August, Bogaerts played shortstop, and he had a -3 wRC+. I am going to say that the theory, while compelling, doesn’t really explain Bogaerts’s struggles at all. He’d tell you that himself.

So what does? Pitchers pitching him differently? Yes, to an extent. Here is how Bogaerts has done all season long against certain pitches:

Pitch RAA BABIP Contact%
Fourseam 5.0 0.342 82.8%
Cutter 2.5 0.150 83.0%
Changeup 0.3 0.324 69.6%
Curveball -0.2 0.200 70.9%
Sinker -4.6 0.290 79.6%
Slider -12.2 0.205 57.9%

And here is how he has been pitched:

[GIF: Bogaerts pitches]

The pitches in that gif are ordered by how many runs above average Bogaerts has been against them, descending. You can see that from June 4 (the date of the start of Bogaerts’s extended slump) on, he has seen many fewer fastballs and many more sinkers and sliders than before. That could be the cause of his BABIP, strikeout, and general hitting struggles since he excels against fastballs and cannot hit sliders or sinkers (sliders more so).

But there’s only one issue: the problem isn’t that Bogaerts is getting fewer pitches he can hit, it’s that he’s not hitting the pitches he used to. Here’s Bogaerts against four-seam fastballs (from Brooks Baseball; BIP means balls in play):

Time                  Count  Foul/Swing  Whiff/Swing  GB/BIP  LD/BIP  FB/BIP  PU/BIP
March 31 – June 3     404    44.8%       18.8%        30.3%   27.3%   37.9%   4.6%
June 4 – September 1  292    41.4%       15.0%        24.1%   13.8%   46.6%   15.5%

He’s cut down a bit on his swings and misses, but everything else looks bad. He’s drastically decreased his line drive rate and drastically increased his popup rate. His groundball rate has gone down a bit, which can be good or bad (in this case I don’t think it’s had a huge effect on anything), and his flyball rate has gone up a lot — which could be good, but Bogaerts is averaging a mere 266.75 feet on his fly balls — 230th out of 284 qualified hitters. So how has this changed his results? Again, Bogaerts against fastballs:

Time                  Count  AVG   SLG   ISO   BABIP  wOBA
March 31 – June 3     404    .386  .590  .205  .469   .471
June 4 – September 1  292    .179  .328  .149  .189   .253

Wow. That is quite the drop in production. League average wOBA against four-seamers this year is .416 (which makes you question why they are thrown so much, but that’s a different article) and so Bogaerts’s wRC+ relative to other fastballs went from a 113 to a 61 in those two timeframes (park-unadjusted).
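The 113 and 61 figures are consistent with simply dividing Bogaerts’s wOBA by the league mark and multiplying by 100. The sketch below assumes that is how they were derived; it is not the full wRC+ formula, which also involves run scaling and park factors. The function name `woba_index` is made up for this example.

```python
# League-average wOBA against four-seamers, as quoted in the paragraph above.
LEAGUE_WOBA_VS_FOURSEAM = 0.416


def woba_index(player_woba, league_woba=LEAGUE_WOBA_VS_FOURSEAM):
    """Scale a wOBA so that 100 equals league average (no park adjustment)."""
    return round(100 * player_woba / league_woba)


# wOBA values taken from the table above for the two timeframes.
early = woba_index(0.471)  # March 31 – June 3
late = woba_index(0.253)   # June 4 – September 1
print(early, late)
```

Under that assumption the two timeframes come out to 113 and 61, matching the numbers in the text.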

And look where Bogaerts is hitting balls, too. The following charts aren’t only fastballs — it’s all balls put in play by him. In the beginning of the year, he was sending line drives to all fields, getting grounders through the infield, and pulling balls deep. In the second part, you see lots of shallow line drives and fly balls — in fact, in the three months covered in the second half of the gif, there are all of TWO ground balls that make it through the infield, and only one opposite-field line drive that makes it to the outfield. There are more popups, too, and the fly balls seem to be shallower generally.

[GIF: Bogaerts BIP]

Now, some of the things you’re seeing here could be a result of teams shifting on him more as the year goes on, which is why no ground balls are getting to the outfield. But more likely it is Bogaerts making weaker contact and allowing fielders to get to his ground balls; in addition, he isn’t hitting many ground balls up the middle, where you’re more likely to get hits.

[GIF: Bogaerts hits]

Take a look at the gif above. What you’re seeing is the same thing as the last one, only with the at-bat result instead of the batted ball type. In the first part of the year, you see Bogaerts getting lots of hits to all parts of the outfield, including deep balls that end up as home runs or doubles. Then, many more balls end up in the infield and most of his hits are shallow balls to the outfield.

This doesn’t look good, especially since it’s been going on for so long. I’m no expert in swing mechanics, so I can’t tell you why Bogaerts has suddenly stopped hitting everything, fastballs especially. My guess is that it’s just a long, long slump that is happening because he’s only 21 years old. I don’t think this means that we should give up on him. He has already proven that he can hit, albeit in a very small sample.

Take a look at the list of all the players who had a wRC+ below 100 in a year where they were listed as top-10 prospects by Baseball America (since 1997):

Name wRC+ PA WAR Rank Year Age
Brandon Phillips 44 393 -0.7 7 2003 22
Todd Walker 62 171 0.2 7 1997 24
Paul Konerko 63 239 -0.4 2 1998 22
Hank Blalock 64 172 -0.3 3 2002 21
Aramis Ramirez 70 275 -1 5 1998 20
Xander Bogaerts 74 485 -0.2 2 2014 21
Lastings Milledge 74 185 -0.5 9 2006 21
Adrian Beltre 75 214 0.2 3 1998 19
Jurickson Profar 75 324 -0.4 1 2013 20
Sean Burroughs 77 206 0 4 2002 21
Miguel Tejada 78 407 -0.5 10 1998 24
Mike Moustakas 84 365 0.2 9 2011 22
Jeremy Hermida 84 348 -0.8 4 2006 22
Alex Gordon 87 601 2 2 2007 23
Alex Rios 87 460 2 6 2004 23
Colby Rasmus 89 520 2.6 3 2009 22
Cameron Maybin 89 199 0.9 8 2009 22
Delmon Young 89 681 0 3 2007 21
Jesus Montero 90 553 -0.4 6 2012 22
B.J. Upton 91 177 0.1 2 2004 19
Rickie Weeks 92 414 -0.3 8 2005 22
Ruben Mateo 94 222 0.8 6 2000 22
Eric Chavez 94 402 1.2 3 1999 21
Rocco Baldelli 94 684 1.7 2 2003 21
Matt Wieters 95 385 1.3 1 2009 23
J.D. Drew 95 430 2.5 1 1999 23
Andruw Jones 96 467 3.7 1 1997 20
Travis Snider 96 276 -0.3 6 2009 21
Michael Barrett 96 469 0 6 1999 22
Jay Bruce 97 452 0.7 1 2008 21
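For a rough sense of where Bogaerts sits on this list by age and prospect rank, here is a small Python tally. The (name, rank, age) tuples are copied from the table above; the variable names are only for illustration, and the counts say nothing about how any of these careers turned out.

```python
# (name, Baseball America rank, age) copied from the table above.
prospects = [
    ("Brandon Phillips", 7, 22), ("Todd Walker", 7, 24), ("Paul Konerko", 2, 22),
    ("Hank Blalock", 3, 21), ("Aramis Ramirez", 5, 20), ("Xander Bogaerts", 2, 21),
    ("Lastings Milledge", 9, 21), ("Adrian Beltre", 3, 19), ("Jurickson Profar", 1, 20),
    ("Sean Burroughs", 4, 21), ("Miguel Tejada", 10, 24), ("Mike Moustakas", 9, 22),
    ("Jeremy Hermida", 4, 22), ("Alex Gordon", 2, 23), ("Alex Rios", 6, 23),
    ("Colby Rasmus", 3, 22), ("Cameron Maybin", 8, 22), ("Delmon Young", 3, 21),
    ("Jesus Montero", 6, 22), ("B.J. Upton", 2, 19), ("Rickie Weeks", 8, 22),
    ("Ruben Mateo", 6, 22), ("Eric Chavez", 3, 21), ("Rocco Baldelli", 2, 21),
    ("Matt Wieters", 1, 23), ("J.D. Drew", 1, 23), ("Andruw Jones", 1, 20),
    ("Travis Snider", 6, 21), ("Michael Barrett", 6, 22), ("Jay Bruce", 1, 21),
]

bogaerts = next(p for p in prospects if p[0] == "Xander Bogaerts")
others = [p for p in prospects if p is not bogaerts]

# How many of the other players were older, or ranked lower (a bigger number)?
older = sum(1 for _, _, age in others if age > bogaerts[2])
lower_ranked = sum(1 for _, rank, _ in others if rank > bogaerts[1])

print(f"Older than Bogaerts: {older} of {len(others)}")
print(f"Ranked lower than Bogaerts: {lower_ranked} of {len(others)}")
```

By this count, a little over half of the other players were older than Bogaerts in their listed season, and roughly two-thirds carried a lower prospect ranking.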

There are a lot of really good players on that list. Bogaerts is one of the worst there in terms of wRC+ that year, but he’s also younger and higher-ranked than most. That doesn’t concern me. What concerns me is that almost all of the ones on that list from the past few years haven’t succeeded: all of the ones that have are from 2009 or earlier. This is consistent with semi-recent findings by Jeff Zimmerman that the aging curve is changing: hitters don’t improve with age anymore. Further research by Brian Henry shows that players who start in the big leagues at 21 tend to stay steady with their production for a while, then decline at around 30. This does not bode well for the young Red Sox shortstop.

But who knows? If I had to guess, I would say that Bogaerts regains his stroke and starts driving the ball more. He’s too good of a hitter to be so bad against fastballs. After all, he is only 21 years old. Plus… I mean, look at that swing. Number two prospects go far. All the prospects on the list above ranked first or second had some degree of success in the majors, with the exception of Rocco Baldelli, who was good until injuries ruined his career. (Brandon Wood didn’t have enough plate appearances to qualify for the list.) If he was playing a little over his head in April and May, he’s been playing well below his feet for the past three months, and those kinds of things tend to right themselves in time.

Note: This was written before Bogaerts played today, Monday 9/1. He went 1 for 4 with a double and two strikeouts.


Mike Trout and the MVP

In 2012 and 2013, Mike Trout was considered by most in the sabermetric community to be the most valuable player in the American League.  That Miguel Cabrera ended up winning in both years was the source of much debate and consternation, to say the least.  Analytically-inclined fans and writers were fed up, frustrated, and outright angry with the “old school” writers voting for Cabrera based on a different set of values.  Now, in an amusing twist, it appears that this year Trout has his best chance yet to wind up with the award, in large part by having a season that is less aligned with what the sabermetric community values, and more aligned with what the majority of the voting population values.  I took a look at the changes in various aspects of Trout’s game and analyzed how the regressions/improvements will impact his candidacy, based on what voters traditionally have cared about.

Defense

A large part of Trout’s previous MVP candidacy (particularly in 2012) centered on his defense — an area that traditionally has had fewer metrics to quantify a player’s value (as compared to say, hitting).  In 2012, DRS had Trout as worth 21 runs above average; UZR had him at 13.3.

In 2013, Trout’s defensive value declined to the point where he was worth -9 runs by DRS and +4.4 runs by UZR.  This discrepancy was a major reason why Baseball-Reference’s DRS-based WAR for Trout was 8.9 while FanGraphs’ UZR-based WAR was 10.5.

This year, Trout’s worth -6 by DRS and -7.2 by UZR.

In actuality, it didn’t take a rocket scientist to predict this regression; Trout’s arm has been consistently slightly below average, and his range ended up over-contributing in 2012 thanks to a handful of plays that broke his way.  Interestingly enough, the sabermetric crowd didn’t call any attention to this detail in 2012, choosing instead to use Trout’s defensive numbers to bolster their MVP case; now this year they’re bending over backwards to try to discredit Alex Gordon’s defensive numbers so they can justify giving the MVP to Trout as they’ve hoped to be able to do all season long…but that’s a post for a different day.

Baserunning

Likewise in 2012, Trout’s baserunning was valued at 12 runs above average, which included his other-worldly 49 SB and 5 CS.  In 2013, his baserunning added 8.1 runs, including 33 SB and 7 CS — still a great 82.5% success rate.

This year, Trout’s been worth all of 1.5 runs on the bases, with just 13 SB and 2 CS.
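The stolen-base success rates behind those numbers are easy to reproduce. A minimal Python sketch, using the SB and CS totals quoted above (the function name `sb_success_rate` is just for illustration):

```python
def sb_success_rate(sb, cs):
    """Stolen-base success rate as a percentage of total attempts."""
    return 100 * sb / (sb + cs)


# (SB, CS) per season, as quoted in the text above.
seasons = {2012: (49, 5), 2013: (33, 7), 2014: (13, 2)}

for year, (sb, cs) in seasons.items():
    rate = sb_success_rate(sb, cs)
    print(f"{year}: {sb} SB, {cs} CS, {rate:.1f}% on {sb + cs} attempts")
```

On these totals the success rate has held up in 2014; what has dropped is the number of attempts, from 54 and 40 in the previous two seasons to 15 so far this year.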

Hitting

Trout’s offense is down slightly, but not nearly to the extent that his defense and baserunning have been.  Like his defense, this regression was fairly predictable, given Trout’s unsustainably high BABIP in 2012 and 2013.  His OPS is down to 0.934 compared to 0.963 and 0.988 in 2012 and 2013, but he still has plenty else to hang his hat on: he leads the league in total bases; he’s already hit 30 homers, a total he hasn’t surpassed before; and, with 94 RBIs, he’ll easily pass that magical/meaningless 100 threshold soon as well.  The voters as a whole still like HRs, RBIs, and round numbers.

Clutch Hitting

In previous years, Trout was criticized (at least by me!) for not getting hits in key situations.  Here are Trout’s offensive splits with Bases Empty versus with Runners on Base:

 Year  Split  BABIP  OPS  tOPS+
 2012  Empty  0.403  0.985
 2012  RoB  0.343  0.917  90
 2013  Empty  0.399  1.023
 2013  RoB  0.339  0.934  90
 2014  Empty  0.343  0.916
 2014  RoB  0.348  0.944  104

In 2012-2013, he performed significantly worse with runners on.  Most folks here would no doubt cling to the notion that this is entirely luck, and that sequencing like this is entirely unpredictable and out of players’ control.  I argue that even if so, if we’re talking about how much value a player added to his team in a given year, he’s adding more value in years when he gets clutch hits than in years when he doesn’t.  And this year, he’s actually reversed the trend.  His 2014 WPA of 5.52 has already exceeded his 2012 and 2013 marks of 5.32 and 4.60.

The Field

Fortunately for Trout this year, there haven’t been many other position players giving him a run for his money.  Josh Donaldson has cooled off as expected after a hot start.  Alex Gordon’s case is even more heavily dependent on defensive metrics than Trout’s was in 2012, and I don’t see many voters slotting him above Trout.  After that, I just don’t see the award going to Robinson Cano or Kyle Seager (the only other 2 AL players in the top 10 for position player WAR as of this writing), unless Cano truly catches fire in September and leads the Mariners to the playoffs.  In fact Trout’s best competition for the MVP may well end up being a pitcher (another Mariner, no less!), Felix Hernandez.  And we know how hard it is for a pitcher to win the MVP even when his WAR outpaces that of position players (“They only pitch every 5 days!”).

Playoffs?!

Last and perhaps most importantly, I present the Angels’ records and division finishes over the past 3 seasons:

2012: 89-73, 3rd

2013: 78-84, 3rd

2014: 81-53, 1st (through 8/30)

FanGraphs gives the Angels a 99.9% chance of making the playoffs.  In fact, as of this writing, no other team in baseball has more than 78 wins, while the Angels have 81.  This should finally appease the “MVPs should lead their team to the playoffs” voters.

The Vote

So Trout’s hitting is slightly down and his defense and baserunning are way down from when he had his previous “MVP-caliber” seasons.  Fortunately for Trout, the voters by and large don’t value defense and baserunning as much as they probably should (though that’s starting to change, albeit slowly).  And as for hitting being down, 2014 Trout is doing more of what they value: hitting homers and driving in runs.  The only thing that might work against him is if he doesn’t bat .300 (he’s at .290 as of now), and the voters like nice round numbers (and they value BA over newfangled mumbo-jumbo like OBP and OPS).  Overall though, with the Angels in line for their first playoff spot since 2009 and no other traditional MVP-makeup players in the field, Trout seems like a shoo-in.

 Criteria  As Compared to 2012-2013  Do Voters care?
 Defense  Way Down  Not much
 Baserunning  Way Down  Not much
 Overall Hitting  Somewhat down  Somewhat
 HRs, RBIs  Up  Yes
 Playoffs  Angels in much better position  Yes
 Field  Not as many standouts as 2012-2013 (Alex Gordon != Miguel Cabrera)  Yes

So there you have it: Trout will win the AL MVP award for all the wrong reasons.


Is Samardzija Really an Ace?

Jeff Samardzija will be a free agent this winter after turning down an offer from the Cubs in the range of 5 years/$85m, and being subsequently dealt to the Oakland A’s. One may reasonably assume he is looking for a payday more in the 7/100 range, and one may reasonably assume he’ll get pretty close to that. That’s ace money, but is he worth it?

To assess this question we need to have a good working definition of ace. My definition, unrigorously explored here, is that an ace is a pitcher who has a reasonable chance of achieving an ace-caliber season. I didn’t define the latter in my previous post, but one way to look at it is to say that an ace-caliber season is one in which the pitcher finishes in the top ten in pitcher WAR. Ten is a bit random (if most humans had six fingers and a thumb I’d probably be talking top 14), but it at least roughly quantifies the point that ace seasons are something of a rarity.

Under the Baseball Reference WAR system, a 5.0+ WAR season means an All-Star performance. Unsatisfied with this seemingly arbitrary number, I took the average WAR for the 10th-best pitcher in the majors over the last 10 complete seasons, and after the determined application of math and stuff, came up with … 5.0. So for the purposes of this post, that will be my definition of an ace-caliber season for a starter: a WAR of 5.0 or better.

Samardzija hasn’t come close to that in his career. This year will be his best: he’s at 2.7 right now and presumably will finish somewhere around 3.0. Indeed, Samardzija’s career WAR total is just 5.8. In contrast, here is the number of 5+ seasons each of the Shark’s principal trade and/or free-agent competitors has amassed:

Jon Lester: 3

Cole Hamels: 2 (and on his way to a 3rd this year)

Max Scherzer: 2 (including this year)

But things get a bit more complicated when we remember that there is more than one type of WAR, and no, I’m not talking about wars of necessity vs. wars of choice. Rather, I’m referring to the differences between Baseball Reference’s WAR calculation and FanGraphs’, which has its own methodology for calculating WAR. This explains the differences between the two stats; my purpose here is not to laud or condemn either approach, but to use both to get a sense of how ace-like Samardzija might be. To do this, I compared Shark’s three seasons as a starter with the first three seasons of the guys mentioned above. I also devised a remarkably creative name for this stat: WAR(3).

Pitcher                            rWAR(3)            fWAR(3)

Shark                                   5.0                      8.1

Lester                                 14.8                    13.0

Hamels                              10.7                    10.4

Scherzer                              5.9                      9.4

Samardzija is the least impressive of the four, but he is not far off Max Scherzer’s numbers, regardless of which WAR you choose. (Note: I left out Scherzer’s first seven starts, which he made in Arizona in 2008 when he also served as a reliever.) While Shark and Scherzer are about the same age, Shark got his starting career under way three full years after Scherzer. The latter has turned into an outstanding pitcher during the years you would expect a player to blossom (ages 27-29). The good news is that Shark has less mileage on his arm than Scherzer. The bad news is that Shark has already passed through the years when careers often take off. So this admittedly microscopic sample suggests that the Shark does have a platform, but a shaky one, from which he could launch an ace season or two.

Maybe there are other comps for Samardzija that could shed some light on this issue. A look at pitchers with high similarity scores to Samardzija through age 28 on Baseball Reference reveals a fairly grim list:

Juan Cruz
Calvin Schiraldi
Kevin Correia
Carlos Villanueva
Bill Swift
Dave Stewart
Mark Grant
Ron Schueler
Renie Martin
Willie Fraser

This group amassed a total of 100 pitching seasons, and managed just one ace-caliber season: Dave Stewart’s remarkable age-33 campaign with Oakland. If this list is predictive, it predicts that Shark will be hosting a regional cable network pre-game show within five years. But you may be saying to yourself, “Self, most of the people on that list don’t remind me of the Jeff Samardzija I’ve seen at all. And isn’t Renie Martin some kind of hard liquor?” All true. The majority of guys on this list lurked (or still do, in the case of Correia and Villanueva) at the edge of the rotation’s campfire, just beyond the flame’s light. Whatever one’s view of the Shark, no one would equate him with Calvin Schiraldi.

One problem with assessing Samardzija’s prospects is his highly unusual career trajectory. He bounced between starting and relieving in the minors, and early on in his major-league career was mostly a reliever. He didn’t become a full-time starter until 2012, at age 27. This partly accounts for his low career WAR, although he also put up 54 craptastic innings in 2009 and 2010 that might have killed a lesser man’s career. But that’s part of Shark’s story — so much physical talent that many in the Cubs’ organization were willing to put up with the setbacks, and keep tinkering with him until they found something that worked.

So it’s safe to say that Shark’s future is a little harder to predict than most. His defenders may hope that, like Kevin Brown and Curt Schilling, he has a run of early-30s excellence in him, and he might. But Brown and Schilling were already good before age 30, and they had a lot more starts under their belts.  The one guy who does have a career trajectory somewhat similar to Shark’s is the one guy on the list above with an ace season: Dave Stewart.

Stewart walked a very hard road, overcoming a battalion of personal demons to become a rotation anchor in Oakland at age 30. (A good book could be written about the baseball souls Tony La Russa saved: Eck and Stewart would feature prominently, while McGwire would present a more complicated story.) Stewart’s career WAR to that point was an insignificant 6.1, slightly higher than Shark’s is today, but spread over more seasons. In the next four years Stewart would accumulate 17.8 WAR, including the dramatic 1990 World Series year, where he posted a career-best 2.56 ERA in a league-leading 267 innings. Stewart would soldier on for four more years, losing effectiveness as the strike zone increasingly eluded him. But flags fly forever, and Stewart’s late-career surge may offer hope for Samardzija. Like Shark, Stewart threw hard and was very durable. Shark gets more strikeouts than Stewart did, but everyone is striking guys out in today’s game. It’s like, you know, a thing. Samardzija has not had anywhere near the off-field trouble that Stewart had early in his career, but both are similar in that chance and circumstances conspired to keep them out of the rotation until relatively late along the age curve.

Samardzija does have velo. He is seventh in 4-seam speed for starters, at 94.5 mph. But speed doesn’t guarantee dominance: only two of the top ten WAR pitchers this year are also in the top 10 in velocity (King Felix and Garrett Richards). Two more have very modest velocities in the 90 mph range (Adam Wainwright and Rick Porcello). I’d rather have velocity than not, but past radar gun performance is no guarantee of future ace success. It’s a close call, but I think Samardzija probably isn’t an ace, even though some team is going to pay him like one. You should probably hope it isn’t your team, although there are worse mistakes your team could, and probably will, make this winter.

And if your team does ink the Shark, remember to leave a light on for Dave Stewart.


The Search for a Good Approach

Last week I explored the strategic effect of seeing more pitches per plate appearance. I love the ten-pitch walk as much as the next guy, but what I love even more is seeing a guy be able to change that approach to beat a scouting report. Let’s take a look at June 5, 2014, when the A’s went to see Masahiro Tanaka for the first time. The first batter is Coco Crisp:

Pitcher: M. Tanaka
Batter: C. Crisp

#  Speed  Pitch                 Result
1  91     Sinker                Ball
2  90     Sinker                Ball
3  91     Fastball (Four-seam)  Ball
4  90     Fastball (Four-seam)  Called Strike
5  91     Fastball (Four-seam)  Foul
6  92     Fastball (Four-seam)  In play, out(s)
So Crisp doesn’t get the best of Tanaka, but he makes Tanaka labor a bit through six pitches. If you’re going to make an out to start the game, it might as well be a long one. For the next batter, John Jaso, Tanaka decides to go right after him:

Pitcher: M. Tanaka
Batter: J. Jaso

#  Speed  Pitch   Result
1  90     Sinker  In play, run(s)

I may be looking too deeply into the narrative here, but I love to imagine Tanaka getting a bit frustrated here. Perhaps the scouting report said that Coco is aggressive early, while Jaso’s 15% walk rates in 2012 and 2013 suggest that he’s more patient.  Tanaka has to throw six pitches in order to get Crisp out, but after deciding to go right after Jaso, he gets taken deep.

So I wondered if there are players who are able to fulfill both ends of this spectrum. Are there any players that are capable of prolonging their time at the plate until they see the pitch they want, but are also aggressive and willing enough to hit the gas on the first pitch? I used FanGraphs for the pitches/plate appearance data, but used baseball-reference’s play index to look up all instances of first-pitch hits this season. Originally I was going to use first-pitch swings, but I decided to just stick to times when the pitcher gets punished for trying to get ahead early. After all, if your decision is to get ahead early in the count, and the guy swings but all he does is foul it off or hit into an out, then that doesn’t change your approach as a pitcher. I wanted to see guys whom the book isn’t written on yet.  Advance Warning: These stats will be about a week old by the time you see them, as I am a slow, slow man.

Best P/PA Rank + FPH Rank (I have no idea how to pitch to them)

Name             FPH%  P/PA  FPHR  PPAR  FPHR + PPAR  wOBA
Scott Van Slyke  5.94  4.14  26    45    71           0.385
Eric Campbell    4.24  4.25  117   18    135          0.326
Jesus Guzman     4.29  4.18  111   33    144          0.247
Daniel Murphy    4.58  4.11  87    58    145          0.305
Joey Votto       4.04  4.33  135   12    147          0.359
Mark Reynolds    5.04  4.04  59    91    150          0.307

(For Reference: FPH% = First Pitch Hit Percentage, or how often a batter gets a hit on the first pitch they see.  P/PA = Pitches per Plate Appearance.  FPHR = First Pitch Hit Ranking, or how they rank in this category compared to the rest of the league.  PPAR = Pitches per Plate Appearance Ranking.  FPHR + PPAR = the sum of these two rankings.)

I like this table!  I have wondered at times what has caused Scott Van Slyke’s resurgence this year. Perhaps this table gives us a bit of a clue.  Van Slyke is the only person in MLB to rank in the top 50 in both FPHR and PPAR.  That’s pretty neat.  Daniel Murphy is also quite balanced, but he’s been much more consistent over the last few years.  He’s particularly interesting in that he doesn’t have a particularly high walk rate or strikeout rate.  I guess he’s just selective at times.  Jesus Guzman’s presence on this list goes to show that a good approach doesn’t necessarily mean success; it just means that he may not head back to the bench in any predictable fashion.  I stretched out the table one spot to include Mark Reynolds, because his name on this table makes me feel better about drafting him in Fantasy Baseball for the past five years.
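The combined ranking is nothing more than the two league ranks added together, with a lower total meaning a more balanced hitter. A minimal Python sketch, using the FPHR and PPAR values from the table above (the function name `combined_rank` is made up for this example):

```python
# (FPHR, PPAR) league ranks copied from the table above.
ranks = {
    "Scott Van Slyke": (26, 45),
    "Eric Campbell": (117, 18),
    "Jesus Guzman": (111, 33),
    "Daniel Murphy": (87, 58),
    "Joey Votto": (135, 12),
    "Mark Reynolds": (59, 91),
}


def combined_rank(fphr, ppar):
    """Sum of the first-pitch-hit rank and the pitches-per-PA rank."""
    return fphr + ppar


# Lower totals first: good at punishing first pitches AND at working counts.
balanced = sorted(ranks, key=lambda name: combined_rank(*ranks[name]))

for name in balanced:
    fphr, ppar = ranks[name]
    print(f"{name:<16} FPHR {fphr:>3}  PPAR {ppar:>3}  total {combined_rank(fphr, ppar):>3}")

# Hitters inside the top 50 of both rankings.
top_50_both = [name for name, (fphr, ppar) in ranks.items() if fphr <= 50 and ppar <= 50]
print(top_50_both)
```

Adding ranks treats a one-spot gain in either category as equally valuable, which is a simplification; it is a quick way to surface candidates rather than a measure of hitting quality.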

I also wanted to look at the flip-side.  Who are the guys who don’t tend to take a lot of pitches, but also don’t tend to make any decent contact on first pitches?

Highest P/PA Rank + FPH Rank (Pick your poison)
Player | FPH% | P/PA | FPHR | PPAR | FPHR + PPAR | wOBA
Joaquin Arias | 0.6451612903 | 3.55483871 | 370 | 400 | 770 | 0.221
Ben Revere | 1.629327902 | 3.563636364 | 365 | 368 | 733 | 0.307
Endy Chavez | 0.9345794393 | 3.674311927 | 321 | 393 | 714 | 0.301
Conor Gillaspie | 2.168674699 | 3.587112172 | 359 | 329 | 688 | 0.353
Jean Segura | 2.564102564 | 3.42462845 | 396 | 289 | 685 | 0.262

Here we have a much less impressive list.  Joaquin Arias has been one of the worst hitters in the majors this year, and his dominance atop this leaderboard makes a bit of sense.  However, Conor Gillaspie is having an excellent season for the Pale Hose, despite the fact that he doesn’t seem to excel in either of the areas this article is interested in.  One peculiar note is that this group is pretty poor at hitting for power in general; these five guys have 13 home runs between them on the year, and six of those are Gillaspie’s.

So now let’s look at the weird ones.  I would think it stands to reason that if certain players tend to take a lot of pitches and also never seem to square up the first pitch, then we know our game plan: get ahead early on these batters.  We can view that by simply looking at each player’s FPHR minus their PPAR.  This is the same as looking at the absolute value of their PPAR minus their FPHR.  Here are the top five in that respect:

Worst in FPHR, Best in PPAR (Groove it Early)
Player | FPH% | P/PA | FPHR | PPAR | FPHR - PPAR | wOBA
Jason Kubel | 1.136363636 | 4.471590909 | 387 | 4 | 383 | 0.278
Aaron Hicks | 0.641025641 | 4.224358974 | 401 | 21 | 380 | 0.286
Mike Trout | 1.217391304 | 4.418965517 | 385 | 6 | 379 | 0.401
Matt Carpenter | 1.376936317 | 4.357264957 | 380 | 8 | 372 | 0.343
A.J. Ellis | 1.181102362 | 4.255813953 | 386 | 17 | 369 | 0.264

Golly; I’ve figured out Mike Trout!  Trout ranks very highly in PPAR (6th) but unfortunately sits near the bottom of the league (385th) when it comes to the first-pitch punish.  All of these guys actually fit this mold.  We have three relatively poor hitters accompanied by the best player in baseball and an above-average infielder on a winning team.  So we can tell that being patient isn’t necessarily a good or bad thing; it’s just that hitter’s style.  Now let’s take a look at the reverse:

Best in FPHR, Worst in PPAR (Don’t throw it in the zone early)
Player | FPH% | P/PA | FPHR | PPAR | PPAR - FPHR | wOBA
Jose Altuve | 8.159722222 | 3.175862069 | 5 | 407 | 402 | 0.355
Wilson Ramos | 7.169811321 | 3.293680297 | 6 | 405 | 399 | 0.327
Erick Aybar | 6.628787879 | 3.347091932 | 12 | 401 | 389 | 0.312
Ender Inciarte | 8.360128617 | 3.471518987 | 3 | 391 | 388 | 0.284
A.J. Pierzynski | 6.413994169 | 3.391930836 | 16 | 399 | 383 | 0.283

It’s always satisfying when the data shows what you expect it to.  I imagined Jose Altuve as being among the more aggressive hitters, and this shows that at least.  Altuve ranks 5th in the league in FPH% but just 407th in P/PA.  It’s interesting to see that this top five is also sorted by wOBA; Altuve is the best hitter on the list, and Pierzynski is the worst.  So there’s nothing necessarily wrong with an aggressive approach, but it does give us a clue as to a possible plan of attack.

So all this is to say, like my last article, that no particular approach is best.  One can look to swing at the first pitch, or one can be patient and wait for their pitch to come.  That said, everybody does have an approach, and that means they’ve got something they’re not looking for.  Stats like FPHR and PPAR may just give us, as fans, more clues about what teams put together in their scouting reports.
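
For anyone who wants to play along at home, the rank arithmetic behind the four tables can be sketched in a few lines of Python.  This is only an illustration, not the spreadsheet I actually used: the (FPHR, PPAR) pairs are copied from the tables above, and the function names are made up for this example.

```python
# (FPHR, PPAR) pairs copied from the tables in this post.
# Lower rank = better in that category.
RANKS = {
    "Scott Van Slyke": (26, 45),
    "Joaquin Arias": (370, 400),
    "Mike Trout": (385, 6),
    "Jose Altuve": (5, 407),
}


def combined_rank(fphr, ppar):
    """Sum of the two ranks: low = good at both, high = poor at both."""
    return fphr + ppar


def rank_gap(fphr, ppar):
    """Absolute difference between the ranks: large = lopsided approach."""
    return abs(fphr - ppar)


def lean(fphr, ppar):
    """Hypothetical label: which of the two ranks is the better one."""
    if ppar < fphr:
        return "patient"
    if fphr < ppar:
        return "aggressive"
    return "balanced"


for name, (fphr, ppar) in RANKS.items():
    print(name, combined_rank(fphr, ppar), rank_gap(fphr, ppar), lean(fphr, ppar))
```

A low sum puts a hitter in the first table, a high sum in the second, and a big gap in one of the last two, depending on which way he leans.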

So to conclude by going back to our first example, perhaps Tanaka should have read this data before his start against the A’s.  Coco ranks 266th in the league in FPHR, but a respectable 76th in PPAR.  Conversely, Jaso ranks 80th in the league in FPHR, but just 225th in PPAR.  Tanaka might have been better served by going after the aging Crisp and saving his energy for the somewhat aggressive Jaso.


Is Nolan Ryan Overrated by FIP?

Nolan Ryan was a singular pitcher. He’s unique in baseball history, so distinct that it’s hard to know where to start. I’m going to begin with the obvious: strikeouts. Nolan Ryan struck out 5,714 batters, 17% more than second-place Randy Johnson. Only 16 pitchers in history recorded half as many strikeouts as Nolan Ryan. He led his league in strikeouts 11 times, the most since Walter Johnson (12).

Ryan also walked the most batters in history — 2,795. Steve Carlton is second on that list, with 1,833. Ryan averaged 4.67 BB/9 and 12.4 BB%. Both figures are higher than anyone else who pitched even half as many innings. Ryan led his league in walks eight times.

Ryan also threw 277 wild pitches, most since 1900. He allowed 757 stolen bases, almost 40% more than second-place Greg Maddux. Ryan led AL pitchers in errors four times, and retired with a ghastly .895 fielding percentage. Joe Posnanski summed up Ryan’s career, “He’s the most extraordinary pitcher who ever lived, I think. But I also think he’s not especially close to the best.”

Nolan Ryan is unique, and it makes him hard to evaluate. Casual fans and the old-school crowd have always worshiped Nolan Ryan. His uniform number was retired by three different teams, and he was the leading vote-getter, among pitchers, for the MLB All-Century Team. He got more than twice as many votes as Walter Johnson. But when you really look at his stats, Ryan doesn’t come off well.

Take wins. Yes, the pitcher win, because this is surprising. In a career that spanned 26 seasons (not including 1966, when he had only one decision), Ryan only led his team in wins 7 times. Actually, it’s 5 times outright — 7 counts two years he tied for the lead. In 11 of his 27 seasons (41%), Ryan had a lower winning percentage than the team. He lost more games (292) than anyone but Cy Young and Walter Johnson. What about ERA? Ryan led his league in ERA twice, but in one of those years, he went 8-16. The other year, strike-shortened 1981, he didn’t lead the league in strikeouts, but did lead the majors in wild pitches (16). His 1.25 WHIP ranks 278th all-time. Ryan never won a Cy Young Award and never finished among the top 10 in MVP voting.

They say a little knowledge is a dangerous thing. When you look at stats like wins and ERA, Ryan looks more like a good pitcher than a great one. He’s almost a compiler, just a guy who played forever, rather than a true standout. Then you look at FIP. Ryan had a FIP of 2.97 (84 FIP-), and he pitched 5,386 innings, giving him 106.6 WAR. By FIP, Nolan Ryan is the 6th-most valuable pitcher of all time: Roger Clemens, Cy Young, Walter Johnson, Greg Maddux, Randy Johnson, Nolan Ryan.

I suspect the percentage of FanGraphs readers who believe Nolan Ryan was one of the six best pitchers ever is south of 5%, maybe less than 1%. He rates considerably worse by RA9-WAR, 89.5 instead of 106.6, 25th all-time. Even that would seem high to many stat-oriented fans. It’s better than Bob Feller, basically equal to Pedro Martinez. Ryan also ranks 20th in rWAR (83.8), again much lower than when judged by FIP.
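
A couple of the figures quoted so far can be checked with simple arithmetic. The sketch below uses only numbers from this post (2,795 walks, 5,386 innings, 106.6 WAR by FIP, 89.5 RA9-WAR); the variable and function names are just for illustration.

```python
# Career totals quoted in this post.
WALKS = 2795
INNINGS = 5386
FIP_WAR = 106.6   # WAR based on FIP
RA9_WAR = 89.5    # WAR based on runs allowed


def per_nine(count, innings):
    """Rate of an event per nine innings pitched."""
    return 9 * count / innings


bb9 = per_nine(WALKS, INNINGS)
war_gap = FIP_WAR - RA9_WAR

print(f"BB/9: {bb9:.2f}")
print(f"FIP-based WAR minus RA9-WAR: {war_gap:.1f}")
```

The walk rate matches the 4.67 BB/9 cited earlier, and the gap between the two versions of WAR comes to about 17 wins.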

I gave this post a stupid title, with an obvious answer. Is Nolan Ryan overrated by FIP? Yes, clearly. His ERA was 20 points higher — in a 28-year, 807-game, 5,400-inning career. I think the numbers stabilize before 5,000 innings. Ryan’s RA9-WAR is 17 points lower than his fWAR, the biggest deficit of any pitcher in history. Ryan is overrated by FIP. That’s not a major revelation. The interesting question is why Nolan Ryan is overrated by FIP — and whether he is underrated by RA and ERA.