Archive for Research

When Should I Steal?

The Stolen Base

Some consider the stolen base a “lost art.” Gone are the days of Vince Coleman’s back-to-back-to-back 100+ stolen base seasons of Whitey-ball folklore. Teams are stealing at the lowest rates (per game) since the 1950s.

Stolen Bases by Year

Aside from the 2011 outlier, stolen base rates have trended downward at a serious pace. Stolen bases still have their place in the game, especially as run environments shrink. But at what point is the value added from a stolen base worth the risk of an out?

Run Expectancy

Tom Tango’s handy-dandy run expectancy chart can give us this answer. In his run expectancy matrix, we can see how run expectancy changes from one base-out state to another after a given event. The basic guide that saberists abide by is that you need to steal successfully about twice as often as you get caught (roughly a 67% success rate) to break even in expected runs, but every situation is different. With runners on first and third and two outs, you would actually have to succeed at an almost 6:1 ratio (roughly 85% of the time) to break even.

This is because of three factors: you are not adding any value to the runner that is already on third, making an out takes the bat out of someone’s hands, and making an out with someone already in scoring position is the most detrimental kind of out. Also, in any given situation, you are facing a battery with different characteristics. Stealing a base off of Kyle Lohse and Yadier Molina was nearly impossible back in 2011. On the other hand, stealing a base off of John Lackey and Jarrod Saltalamacchia would have been a lot easier. Accounting for the risk of your own baserunner, the defense, league rates, and base-out situation will lead to the most informed decision.
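The break-even idea from the run expectancy discussion can be sketched in a few lines of Python. This is only an illustration: the run expectancy values passed in below are made-up placeholders, not figures from Tango’s matrix, and `break_even_rate` and `ratio_to_rate` are hypothetical helper names.

```python
def break_even_rate(re_now, re_success, re_caught):
    """Success rate at which attempting a steal leaves expected runs unchanged.

    Solves p * re_success + (1 - p) * re_caught = re_now for p.
    """
    if re_success <= re_caught:
        raise ValueError("a successful steal must be worth more than a caught stealing")
    return (re_now - re_caught) / (re_success - re_caught)


def ratio_to_rate(successes, failures=1):
    """Convert a success:failure ratio such as 2:1 into a success rate."""
    return successes / (successes + failures)


# Placeholder run expectancies (hypothetical, for illustration only).
example = break_even_rate(re_now=0.50, re_success=0.70, re_caught=0.10)

print(f"break-even rate for the placeholder state: {example:.3f}")
print(f"2:1 ratio as a success rate: {ratio_to_rate(2):.3f}")
print(f"6:1 ratio as a success rate: {ratio_to_rate(6):.3f}")
```

Plugging in the three run expectancies for a real base-out state (before the attempt, after a successful steal, after a caught stealing) would give that state’s break-even rate.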

In the tool below, begin by picking your situation (the strings go: outs, first base, second base, third base, where “x” means no runner and a number means a runner occupies that base, e.g. 0x2x means no outs and a runner on second base). Then evaluate your baserunner’s steal rate against an average opponent (Steamer’s updated projection gives Kolten Wong a 21/24 chance of stealing a base). After that, evaluate the steal rate against your opponent’s battery (lefty or righty pitcher, strong-armed catcher). Then plug in the league average steal rate, and you should have an expected stolen base percentage for your given situation and the given change in run expectancy (RE24).
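As a rough sketch of the inputs described above, the snippet below parses a situation string such as 0x2x and combines the three steal rates into one expected rate. The post does not say how the tool combines the rates; the odds-ratio method used here is an assumption on my part, and the function names are hypothetical.

```python
def parse_state(state):
    """Split a string like '0x2x' into (outs, (on_first, on_second, on_third))."""
    if len(state) != 4:
        raise ValueError("expected four characters: outs, first, second, third")
    outs = int(state[0])
    runners = tuple(ch != "x" for ch in state[1:])
    return outs, runners


def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


def expected_steal_rate(runner_rate, battery_rate, league_rate):
    """Odds-ratio combination of runner, battery, and league steal rates.

    Assumed method: runner odds times battery odds, divided by league odds.
    """
    combined = odds(runner_rate) * odds(battery_rate) / odds(league_rate)
    return combined / (1 + combined)


outs, runners = parse_state("0x2x")
print(outs, runners)

# Kolten Wong's 21/24 projection from the text; the battery and league
# rates below are placeholders, not real figures.
print(round(expected_steal_rate(21 / 24, 0.65, 0.72), 3))
```

With this method, a battery that allows steals at exactly the league rate leaves the runner’s own rate unchanged, which is a useful sanity check.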

LINK


Billy Butler In: The Good, The Slightly Above Average, And The Ugly

For the past two years or so, Kansas City has been torn about breakfast… Billy “Big Country Breakfast” Butler, that is. During this past offseason there were many rumors that the Royals were going to trade him, and a deal seemed inevitable once the team entered talks with then-free agent Carlos Beltran. Billy Butler is part of the home-grown youth movement in Kansas City, along with Alex Gordon and later followed by Salvy Perez, Mike Moustakas, Eric Hosmer, and company. From 2009 through 2013, Butler was above average offensively, and at times even great! However, after failing to meet expectations last year, and in some opinions already being in decline at the age of 28, Billy came out and struggled mightily to start the 2014 season.

But he has turned it around somewhat, and with the Royals making headlines this August, Big Country played a big part. So I wanted to look at what he did differently, comparing his April dud to his career average and to his stud-again August. We will measure his overall offensive prowess with wRC+, which was 50 in March/April, 118 for his career, and 126 in August. So let’s look at the more telling process stats.

Split BB% K% BB/K BABIP GB/FB LD% GB% FB% HR/FB
April 8.3% 18.3% 0.45 0.275 2.82 18.8% 60.0% 21.3% 0.0%
Career Average 8.9% 14.4% 0.62 0.325 1.51 19.9% 48.3% 31.9% 11.1%
August 5.8% 13.2% 0.44 0.308 1.35 23.2% 44.2% 32.6% 12.9%

One of the first things to pop out at you is the BB/K ratio. While under his career mark (and by a decent margin too), his BB/K rate is nearly identical in April and August. A lot of times credit for a hitter’s success is given to an increase in BB% and a decrease in K%, but from April to August Butler cut down on both, increasing the number of balls he put into play. That brings us to BABIP. Both his April (.275) and August (.308) marks are below his career norm of .325. Perhaps he was dealing with a little bad luck? Or just weak contact? The fact is that in August, even with his BABIP down and his home run rate close to his career norm, he still created above-average production.

Now comes the most telling part: the type of balls he hits. As an AL DH, Billy Butler is expected not only to hit, but to slug. That big goose egg for HRs in April is an absolute killer, and the culprit is the GB%. It is no wonder that a big, SLOW (we all know about his base running and uncanny attraction to double plays), gap-to-gap power hitter had one of the worst months of his career when his GB% was up almost 12 percentage points and his FB% was down more than 10. Billy Butler will never be Aoki. He has to get the ball in the air. He lives on hitting doubles into the deep gaps at Kauffman Stadium, and with ratios like those it is no surprise he put up a wRC+ of 50.
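The comparisons above are simple arithmetic on the first table, and a short Python sketch makes them easy to check. The numbers are copied from that table; the dictionary layout and helper names are just illustrative choices.

```python
# Percentages copied from the first splits table (BB%, K%, GB%, FB%).
splits = {
    "April": {"bb": 8.3, "k": 18.3, "gb": 60.0, "fb": 21.3},
    "Career": {"bb": 8.9, "k": 14.4, "gb": 48.3, "fb": 31.9},
    "August": {"bb": 5.8, "k": 13.2, "gb": 44.2, "fb": 32.6},
}


def bb_per_k(split):
    """Walk-to-strikeout ratio from the BB% and K% columns."""
    return splits[split]["bb"] / splits[split]["k"]


def point_change(split, stat, baseline="Career"):
    """Difference from the baseline split, in percentage points."""
    return splits[split][stat] - splits[baseline][stat]


for name in splits:
    print(f"{name}: BB/K = {bb_per_k(name):.2f}")

print(f"April GB% vs career: {point_change('April', 'gb'):+.1f} points")
print(f"April FB% vs career: {point_change('April', 'fb'):+.1f} points")
```

Note that these are differences in percentage points, not relative percent changes.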

When the BB/K ratios are nearly identical and yet the results are so drastically different, not to mention the fluctuations in his BABIP, it has to come back to his swing mechanics and getting to a consistently good contact position where he can drive the ball.

Split O-Swing% Z-Swing% Swing% O-Contact% Z-Contact% Contact% Zone% F-Strike% SwStr%
April 30.0% 58.6% 43.7% 77.4% 92.9% 87.4% 48.0% 57.8% 5.5%
Career Average 28.0% 63.0% 44.3% 69.4% 90.0% 83.1% 46.7% 56.0% 7.2%
August 37.8% 62.1% 49.5% 70.1% 91.5% 83.1% 48.2% 71.9% 8.5%

Billy’s discipline at the plate has been waning. But the month he really lacked discipline is the same month he did so well in: August. In April he was within his career norms for all of his discipline stats except O-Contact%. Overall he was swinging less and missing less. And that is where the problem may lie! It is not so much that he was struggling with pitch selection, because clearly he was even worse with discipline in August, but the fact that he didn’t miss when he swung.

In a sense Butler was too good at making contact! With his swing percentage up along with increasingly bad pitch selection, the higher his swinging strike percentage, the better! And perhaps with his swing percentage, his first-pitch strike percentage, and his O-Swing percentage all up, he has changed to a more aggressive approach? Again, all of this leads back to the assumption that Butler was making poor contact in April, which raises the question: what has he done differently, if anything, with his swing?

Split Fastball% Slider% Cutter% Curveball% Changeup% Splitfinger%
April 52.5% 19.5% 8.5% 10.3% 8.8% 0.5%
Career Average 56.3% 18.1% 5.6% 8.6% 9.9% 1.0%
August 50.4% 22.9% 8.3% 9.5% 8.5% 0.7%

Split Fastball% wFB/c wSL/c wCT/c wCB/c wCH/c wSF/c
April 52.5% -2.45 -0.92 0.56 1.96 0.86 -11.47
Career Average 56.3% 1.09 -0.81 0.16 0.29 0.16 -1.45
August 50.4% 2.00 1.89 -1.74 -5.10 -2.11 25.04

Now the main reason I bring these stats up is that I am a huge believer in fastball hunting. These charts may not be the most reliable indicators of pitch selection, but they do tell you whether he has been seeing certain pitches better and the rates at which he has been seeing them. So I wanted to look closely at his fastball numbers in particular just to see if there was anything funky going on. And what was so funky is that in August he was crushing fastballs (a wFB/c of 2.00, against -2.45 in April)! The more fastballs you see, the better chance you have to hit well. While I am not sure of the exact number of fastballs he faced, for the most part he has been seeing the same mix of pitches he always has, and he has definitely done one of his better jobs of taking advantage of the fastballs he has seen. Can a connection be drawn between his April failures and August success against fastballs and a possible new approach and/or adjustment in his swing mechanics? Or was it just unlucky, bad contact?

After searching through the KC Star (hometown newspaper) as well as other media report outlets, I have not been able to find much of anything indicating adjustments being made. There was some talk of just his timing being off, but other than that there are not many clues. I wish I knew how to make video clips of swings and find a couple angles of Billy Butler’s swing in April compared to his swing in August and dissect them both. I would like to see what, if anything, is different. If we could see his timing and especially his bat path, I believe we can tell a lot about what he is doing wrong or right. If anyone can provide those, or teach how to make them, please do and send to me!

However, going off of what I have seen here, everything points back to weak contact consistently being made. Whether that is due to timing or mechanics, I am not sure. Normally I would say this is due to poor pitch selection, but as I showed above, he had even worse discipline and pitch selection in August than in April and still put up stellar numbers. To be clear, hard contact alone is not good enough for a player of Billy Butler’s style. He NEEDS to get air under the ball. Now they say that this is a game of adjustments. I would love to know what, if any, adjustments Billy “Big Country Breakfast” Butler has made. After all, could it really have just been a string of bad luck?


Run Distribution Using the Negative Binomial Distribution

In this post I use the negative binomial distribution to better model how MLB teams score runs in an inning or in a game. I wrote a primer on the math of the different distributions mentioned in the post for reference, and this post is divided into a baseball-centric section and a math-centric section.

The Baseball Side

A team in the American League will average .4830 runs per inning, but does this mean they will score a run every two innings? This seems intuitive if you apply math from Algebra I [1 run / 2 innings ~ .4830 runs/inning]. However, if you attend a baseball game, the vast majority of innings you’ll watch will be scoreless. This large number of scoreless innings can be described by discrete probability distributions that account for teams scoring none, one, or multiple runs in one inning.

Runs in baseball are considered rare events and count data, so they will follow a discrete probability distribution if they are random. The overall goal of this post is to describe the random process that arises with scoring runs in baseball. Previously, I’ve used the Poisson distribution (PD) to describe the probability of getting a certain number of runs within an inning. The Poisson distribution describes count data like car crashes or earthquakes over a given period of time and defined space. This worked reasonably well to get the general shape of the distribution, but it didn’t capture all the variance that the real data set contained. It predicted fewer scoreless innings and many more 1-run innings than what really occurred. The PD makes an assumption that the mean and variance are equal. In both runs per inning and runs per game, the variance is about twice as much as the mean, so the real data will ‘spread out’ more than a PD predicts.

Negative Binomial Fit

The graph above shows an example of the application of count data distributions. The actual data is in gray and the Poisson distribution is in yellow. It’s not a terrible way to approximate the data or to conceptually understand the randomness behind baseball scoring, but the negative binomial distribution (NBD) works much better. The NBD is also a discrete probability distribution, but it finds the probability of a certain number of failures occurring before a certain number of successes. It would answer the question: what’s the probability that I get 3 TAILS before I get 5 HEADS if I keep flipping a coin? This doesn’t at first intuitively seem like it relates to a baseball game or an inning, but that will be explained later.

From a conceptual standpoint, the two distributions are closely related. So if you are trying to describe why 73% of all MLB innings are scoreless to a friend over a beer, either will work. I’ve plotted both distributions for comparison throughout the post. The second section of the post will discuss the specific equations and their application to baseball.

Runs per Inning

Because of the difference in rules regarding the designated hitter between the two leagues, there will be a different expected value [average] and variance of runs/inning for each league. I separated the two leagues to get a better fit for the data. Using data from 2011-2013, the American League had an expected value of 0.4830 runs/inning with a 1.0136 variance, while the National League had 0.4468 runs/inning as the expected value with a .9037 variance. [So NL games are shorter and more boring to watch.] Using only the expected value and the variance, the negative binomial distribution [the red line in the graph] approximates the distribution of runs per inning more accurately than the Poisson distribution.

Runs Per Inning -- 2011-2013

It’s clear that there are a lot of scoreless innings, and very few innings with multiple runs scored. The NBD allows someone to calculate the probability of an MLB team scoring more than 7 runs in an inning, or the probability that the home team forces extra innings when down by a run in the bottom of the 9th. Using a pitcher’s expected runs/inning, the NBD could be used to approximate the pitcher’s chances of throwing a shutout, assuming he pitches all 9 innings; the same approach with expected hits/inning would approximate his chances of a no-hitter.
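As a minimal sketch of that kind of calculation, the Python below estimates the chance of nine straight scoreless innings from a runs/inning mean and variance. It assumes the NBD is fit by matching the mean and variance (r = mean² / (variance − mean), p = mean / variance) and that innings are independent, which is a simplification; the function names are hypothetical. Feeding in hits/inning instead of runs/inning would give a no-hit estimate rather than a shutout estimate.

```python
def nb_zero_prob(mean, variance):
    """P(0 events) under a negative binomial matched to the mean and variance."""
    if variance <= mean:
        raise ValueError("the negative binomial needs variance greater than the mean")
    p = mean / variance
    r = mean ** 2 / (variance - mean)
    return p ** r


def scoreless_game_prob(mean, variance, innings=9):
    """Chance of `innings` consecutive scoreless innings, assuming independence."""
    return nb_zero_prob(mean, variance) ** innings


# AL runs/inning figures quoted in the post (2011-2013).
al_mean, al_variance = 0.4830, 1.0136

print(f"P(scoreless inning) = {nb_zero_prob(al_mean, al_variance):.3f}")
print(f"P(nine scoreless innings) = {scoreless_game_prob(al_mean, al_variance):.3f}")
```

The single-inning figure lands close to the roughly 73% scoreless-inning rate mentioned earlier, which is a quick check that the fit is behaving sensibly.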

Runs Per Game

The NBD and PD can be used to describe the runs scored in a game by a team as well. Once again, I separated the AL and NL, because the AL had an expected run value of 4.4995 runs/game and a 9.9989 variance, and the NL had 4.2577 runs/game expected value and 9.1394 variance. This data is taken from 2008-2013. I used a larger span of years to increase the total number of games.

Runs Per Game 2008-2013

Even though MLB teams average more than 4 runs in a game, the single most likely run total for one team in a game is actually 3 runs. The negative binomial distribution once again modeled the empirical distribution well, but the PD fit much worse here than it did in the previous graph. Both models, however, underestimate the shut-out rate. A remedy for this is to adjust for zero-inflation. This would increase the likelihood of getting a shut out in the model and adjust the rest of the probabilities accordingly. The need for zero-inflation implies that baseball scoring isn’t completely random. A manager is more likely to use his best pitchers to continue a shut out rather than randomly assign pitchers from the bullpen.
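A zero-inflation adjustment can be sketched as follows. This is the textbook form (mix in an extra point mass at zero and scale everything else down), not necessarily the exact adjustment the author had in mind, and the mixing weight `pi` and the example probabilities are made up for illustration.

```python
def zero_inflate(base_probs, pi):
    """Return a zero-inflated version of a discrete distribution.

    base_probs[k] is the base model's P(X = k); pi is the extra probability
    mass placed on zero. The result sums to the same total as the input.
    """
    if not 0 <= pi < 1:
        raise ValueError("pi must be in [0, 1)")
    inflated = [(1 - pi) * q for q in base_probs]
    inflated[0] += pi
    return inflated


# Toy base distribution and mixing weight (both hypothetical).
base = [0.05, 0.10, 0.15, 0.20, 0.50]
adjusted = zero_inflate(base, pi=0.02)

print([round(q, 4) for q in adjusted])
print(round(sum(adjusted), 6))
```

In practice `pi` would be estimated from the data, for example by choosing the value that makes the model’s shut-out rate match the observed one.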

Hits Per Inning

It turns out the NBD/PD are useful with many other baseball statistics like hits per inning.

Hits Per Inning 2011-2013

The distribution for hits per inning is somewhat similar to runs per inning, except the expected value is higher and the variance is lower relative to it. [AL: .9769 hits/inning, 1.2847 variance | NL: .9677 hits/inning, 1.2579 variance (2011-2013)] Since the variance is much closer to the expected value, hits per inning has more values in the middle and fewer at the extremes than the runs per inning distribution.

I could spend all day finding more applications of the NBD and PD, because there are really a lot of examples within baseball. Understanding these discrete probability distributions will help you understand how the game works, and they could be used to model outcomes within baseball.

The Math Side

Hopefully, you skipped down to this section right away if you are curious about the math behind this. I’ve compiled the numbers used in the graphs for the American League for those curious enough to look at examples of the actual values.

The Poisson distribution is given by the equation:

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

There are two parameters for this equation: the expected value [λ] and the number of runs you are looking to calculate [x]. To determine the probability of a team scoring exactly three runs in a game, you would set x = 3 and, using the AL expected runs per game, calculate:

$$P(X = 3) = \frac{4.4995^{3}\, e^{-4.4995}}{3!} \approx 0.169$$

This is repeated for the entire set of x = {0, 1, 2, 3, 4, 5, 6, … } to get the Poisson distribution used throughout the post.
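The repeated calculation is easy to reproduce in Python. The sketch below uses the standard Poisson probability mass function with the AL runs-per-game average from the post; the function name and the cutoff of 16 runs are arbitrary choices for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with expected value lam."""
    return lam ** x * exp(-lam) / factorial(x)


AL_RUNS_PER_GAME = 4.4995  # expected value quoted in the post (2008-2013)

# Probability of each run total from 0 through 15.
distribution = [poisson_pmf(x, AL_RUNS_PER_GAME) for x in range(16)]

for runs, prob in enumerate(distribution[:7]):
    print(f"P({runs} runs) = {prob:.4f}")
```

Because a Poisson distribution has variance equal to its mean, this model has a variance of about 4.5 built in, well short of the roughly 10.0 observed for AL runs per game.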

One of the assumptions the PD makes is that the mean and the variance are equal. For these examples, this assumption doesn’t hold true, so the empirical data from actual baseball results doesn’t quite fit the PD and is overdispersed. The NBD accounts for the variance by including it in the parameters.

The negative binomial distribution is usually symbolized by the following equation:

$$P(X = k) = \binom{k + r - 1}{k}\, p^{r} (1 - p)^{k}$$

where r is the number of successes, k is the number of failures, and p is the probability of success. A key restriction is that a success has to be the last event in the series of successes and failures.

Unfortunately, we don’t have a clear value for p or a clear concept on what will be measured, because the NBD measures the probability of binary, Bernoulli trials. It’s helpful to view this problem from the vantage point of the fielding team or pitcher, because a SUCCESS will be defined as getting out of the inning or game, and a FAILURE will be allowing 1 run to score. This will conform to the restriction by having a success [getting out of the inning/game] being the ultimate event of the series.

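In order to make this work the NBD needs to be parameterized differently for mean [μ], variance [σ²], and number of runs allowed [failures, k]. The NBD can be written as

$$P(X = k) = \frac{\Gamma(r + k)}{k!\,\Gamma(r)}\, p^{r} (1 - p)^{k}$$

where

$$r = \frac{\mu^{2}}{\sigma^{2} - \mu}, \qquad p = \frac{\mu}{\sigma^{2}}$$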

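So using the same example as the PD (three runs in a game with the AL mean of 4.4995 and variance of 9.9989, which give r ≈ 3.68 and p ≈ 0.45), this would yield:

$$P(X = 3) = \frac{\Gamma(3.68 + 3)}{3!\,\Gamma(3.68)}\, (0.45)^{3.68} (0.55)^{3} \approx 0.144$$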

The above equations are adapted from this blog about negative binomials and this one about applying the distribution to baseball. The Γ function is used in the equation instead of a combination operator because the combination operator can’t handle the non-whole numbers we are using to describe the number of successes.
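A mean-and-variance version of the NBD can be sketched in Python with the log-gamma function from the standard library. It assumes the usual moment matching, r = mean² / (variance − mean) and p = mean / variance, which may differ in form from the equations the author adapted; `nb_params` and `nb_pmf` are hypothetical names.

```python
from math import exp, lgamma, log


def nb_params(mean, variance):
    """Moment-matched (r, p): r successes, success probability p."""
    if variance <= mean:
        raise ValueError("the negative binomial needs variance greater than the mean")
    return mean ** 2 / (variance - mean), mean / variance


def nb_pmf(k, mean, variance):
    """P(X = k) failures (runs allowed), with gamma functions for non-integer r."""
    r, p = nb_params(mean, variance)
    log_prob = (
        lgamma(r + k) - lgamma(k + 1) - lgamma(r)
        + r * log(p) + k * log(1 - p)
    )
    return exp(log_prob)


# AL runs per game, 2008-2013, as quoted in the post.
al_mean, al_variance = 4.4995, 9.9989

for runs in range(7):
    print(f"P({runs} runs) = {nb_pmf(runs, al_mean, al_variance):.4f}")
```

Working in log space with `lgamma` avoids overflow for large run totals and handles the non-whole number of successes directly.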

Conclusion

The negative binomial distribution is really useful in modeling the distribution of discrete count data from baseball for a given inning or game. The most interesting aspect of the NBD is that a success is considered getting out of the inning/game, while a failure would be letting a run score. This is a little counterintuitive if you approach modeling the distribution from the perspective of the batting team. While the NBD has a better fit, the Poisson distribution has a simpler concept to explain: the count of discrete events over a given period of time, which might make it better to discuss over beers with your friends.

The fit of the NBD suggests that run scoring is a negative binomial process, but inconsistencies, especially with shut outs, indicate that elements of the game aren’t completely random. I’m explaining the underestimation of the number of shut outs as the increased use of the best relievers in shut out games compared with other games, which increases the total number of shut outs and subsequently decreases the frequency of other run-total games.

All MLB data is from retrosheet.org. It’s available free of charge from there. So please check it out, because it’s a great data set. If there are any errors or if you have questions, comments, or want to grab a beer to talk about the Poisson distribution please feel free to tweet me @seandolinar.


Pitch Win Values for Starting Pitchers — August 2014

Introduction

A couple months back, I introduced a new method of calculating pitch values using a FIP-based WAR methodology. That post details the basic framework of these calculations and can be found here. The May, June, and July updates can be found here, here, and here respectively. This post is simply the August 2014 update of the same data. What follows is predominantly data-heavy but should still provide useful talking points for discussion. Let’s dive in and see what we can find. Please note that the same caveats apply as in previous months. We’re at the mercy of pitch classification. I’m sure your favorite pitcher doesn’t throw that pitch that has been rated as incredibly below average, but we have to go off of the data that is available. Also, Baseball Prospectus’s PitchF/x leaderboards list only nine pitches (Four-Seam Fastball, Sinker, Cutter, Splitter, Curveball, Slider, Changeup, Screwball, and Knuckleball). Anything that may be classified outside of these categories is not included. Also, anything classified as a “slow curve” is not included in Baseball Prospectus’s curveball data.

Constants

Before we begin, we must first update the constants used in calculation for August.  As a refresher, we need three different constants for calculation: strikes per strikeout, balls per walk, and a FIP constant to bring the values onto the right scale.  We will tackle them each individually.

First, let’s discuss the strikeout constant. In August, there were 52,238 strikes thrown by starting pitchers. Of these 52,238 strikes, 4,887 were turned into hits and 15,293 outs were recorded. Of these 15,293 outs, 4,118 were converted via the strikeout, leaving us with 11,175 ball-in-play outs. 11,175 ball-in-play outs and 4,887 hits sum to 16,062 balls in play. Subtracting 16,062 balls in play from our original 52,238 strikes leaves us with 36,176 strikes to distribute over our 4,118 strikeouts. That’s a ratio of 8.78 strikes per strikeout. This is slightly lower than the 8.82 strikes per strikeout in June and July, meaning batters were slightly easier to strike out in August.
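The strikeout-constant arithmetic can be checked step by step in Python. The inputs are the August totals quoted above; the variable names are my own.

```python
# August 2014 totals for starting pitchers, as quoted in the text.
strikes = 52_238
hits = 4_887
outs = 15_293
strikeouts = 4_118

# Outs that came on balls in play rather than strikeouts.
ball_in_play_outs = outs - strikeouts

# Every ball in play (hit or out) used up exactly one strike.
balls_in_play = ball_in_play_outs + hits

# The remaining strikes are spread over the strikeouts.
remaining_strikes = strikes - balls_in_play
strikes_per_strikeout = remaining_strikes / strikeouts

print(ball_in_play_outs, balls_in_play, remaining_strikes)
print(f"{strikes_per_strikeout:.2f} strikes per strikeout")
```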

The next two constants are much easier to ascertain. In August, there were 28,957 balls thrown by starters and 1,521 walked batters. That’s a ratio of 19.04 balls per walk, down from 19.76 balls per walk in July. This data would suggest that hitters were more likely to walk in August than previously. The FIP subtotal for all pitches in August was 0.48. The MLB Run Average for August was 4.12, meaning our FIP constant for August is 3.65.
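The other two constants follow the same pattern. In the sketch below, the FIP constant is taken as the MLB run average minus the FIP subtotal, which is my reading of the paragraph above. With the rounded inputs quoted in the text this gives 3.64 rather than the reported 3.65, presumably because the published figure was computed from unrounded values.

```python
def balls_per_walk(balls, walks):
    """Average number of balls thrown per walk issued."""
    return balls / walks


def fip_constant(run_average, fip_subtotal):
    """Constant that puts the FIP subtotal on the run-average scale."""
    return run_average - fip_subtotal


august_balls_per_walk = balls_per_walk(28_957, 1_521)
august_cfip = fip_constant(4.12, 0.48)

print(f"{august_balls_per_walk:.2f} balls per walk")
print(f"cFIP from rounded inputs: {august_cfip:.2f}")
```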

Constant Value
Strikes/K 8.78
Balls/BB 19.04
cFIP 3.65

The following table details how the constants have changed month-to-month.

Month Strikes/K Balls/BB cFIP
March/April 8.47 18.50 3.68
May 8.88 18.77 3.58
June 8.82 19.36 3.59
July 8.82 19.76 3.65
August 8.78 19.04 3.65

Pitch Values – August 2014

For reference, the following table details the FIP for each pitch type in the month of August.

Pitch FIP
Four-Seam 4.03
Sinker 4.17
Cutter 4.14
Splitter 4.48
Curveball 4.21
Slider 4.15
Changeup 4.47
Screwball 2.22
Knuckleball 4.56
MLB RA 4.12

As we can see, only two pitches would be classified as above average for the month of August: four-seam fastballs and screwballs. Sinkers, cutters, and sliders also came in right around league average. Pitchers who were able to stand out in other categories tended to have better overall months than pitchers who excelled at these pitches. Now, let’s proceed to the data for the month of August.
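The above-average call is just a comparison of each pitch type’s FIP against the MLB run average, where lower is better. A small Python sketch with the values from the table above; the dictionary and variable names are illustrative.

```python
MLB_RA = 4.12  # MLB run average for August, from the table

pitch_fip = {
    "Four-Seam": 4.03,
    "Sinker": 4.17,
    "Cutter": 4.14,
    "Splitter": 4.48,
    "Curveball": 4.21,
    "Slider": 4.15,
    "Changeup": 4.47,
    "Screwball": 2.22,
    "Knuckleball": 4.56,
}

# A pitch type is above average when its FIP is below the run average.
above_average = [pitch for pitch, fip in pitch_fip.items() if fip < MLB_RA]

# Rank every pitch type from best to worst by FIP.
ranked = sorted(pitch_fip, key=pitch_fip.get)

print("Above average:", above_average)
print("Best to worst:", ranked)
```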

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Chris Tillman 0.7 183 Sean O’Sullivan -0.2
2 Jose Quintana 0.6 184 John Danks -0.2
3 Phil Hughes 0.6 185 Anthony Ranaudo -0.3
4 Max Scherzer 0.6 186 Jason Hammel -0.3
5 Madison Bumgarner 0.5 187 Stephen Strasburg -0.4

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Mike Leake 0.5 169 Shelby Miller -0.2
2 Rick Porcello 0.4 170 Travis Wood -0.2
3 Kyle Hendricks 0.4 171 Mat Latos -0.3
4 Dallas Keuchel 0.3 172 Tsuyoshi Wada -0.3
5 Jimmy Nelson 0.3 173 Kyle Kendrick -0.3

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jarred Cosart 0.6 74 Scott Carroll -0.1
2 Josh Collmenter 0.4 75 Jorge de la Rosa -0.1
3 Corey Kluber 0.3 76 J.A. Happ -0.1
4 James Shields 0.3 77 Kevin Correia -0.2
5 Jerome Williams 0.2 78 Dan Haren -0.2

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Alex Cobb 0.4 26 Miguel Gonzalez -0.1
2 Mat Latos 0.2 27 Hisashi Iwakuma -0.1
3 Alfredo Simon 0.1 28 Felix Hernandez -0.1
4 Hiroki Kuroda 0.1 29 Jorge de la Rosa -0.1
5 Kyle Kendrick 0.1 30 Tim Hudson -0.2

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Alex Wood 0.3 157 James Shields -0.2
2 Brandon McCarthy 0.3 158 Jesse Hahn -0.2
3 Adam Wainwright 0.3 159 Max Scherzer -0.2
4 Clay Buchholz 0.2 160 Zack Greinke -0.3
5 Scott Feldman 0.2 161 Nick Martinez -0.3

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Clayton Kershaw 0.4 123 Dallas Keuchel -0.2
2 Chris Archer 0.3 124 Scott Baker -0.2
3 Tyler Matzek 0.3 125 Rubby de la Rosa -0.2
4 Collin McHugh 0.3 126 Bartolo Colon -0.2
5 Kyle Gibson 0.2 127 Rafael Montero -0.2

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Chris Capuano 0.4 154 Jon Niese -0.2
2 Jeremy Guthrie 0.3 155 Henderson Alvarez -0.2
3 Roberto Hernandez 0.2 156 Zack Greinke -0.2
4 David Price 0.2 157 Brad Peacock -0.3
5 Max Scherzer 0.2 158 Brad Hand -0.4

Screwball

Rank Pitcher Pitch Value
1 Trevor Bauer 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 0.1

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Alex Cobb 0.9 186 Jason Hammel -0.2
2 Jordan Zimmermann 0.8 187 Justin Masterson -0.2
3 Corey Kluber 0.8 188 Sean O’Sullivan -0.3
4 Jarred Cosart 0.8 189 Kyle Lohse -0.4
5 Collin McHugh 0.8 190 Brad Hand -0.4

Pitch Ratings – August 2014

Four-Seam Fastball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Jose Quintana 59 87 Vance Worley 39
2 Brad Peacock 59 88 Stephen Strasburg 37
3 Michael Pineda 59 89 Justin Masterson 36
4 Phil Hughes 58 90 Anthony Ranaudo 35
5 Franklin Morales 58 91 John Danks 35

Sinker

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Rick Porcello 58 68 Travis Wood 37
2 Jake Arrieta 58 69 Kyle Kendrick 36
3 Gio Gonzalez 57 70 John Lackey 35
4 J.A. Happ 57 71 Mat Latos 35
5 Marcus Stroman 57 72 Tsuyoshi Wada 33

Cutter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Franklin Morales 58 27 Brandon McCarthy 43
2 Corey Kluber 58 28 Jake Peavy 40
3 James Shields 58 29 Ryan Vogelsong 39
4 Jerome Williams 57 30 Dan Haren 38
5 Tim Hudson 56 31 Kevin Correia 33

Splitter

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Mat Latos 58 7 Matt Shoemaker 50
2 Alex Cobb 56 8 Jake Odorizzi 49
3 Kyle Kendrick 55 9 Jorge de la Rosa 45
4 Tsuyoshi Wada 54 10 Kevin Gausman 42
5 Alfredo Simon 54 11 Hisashi Iwakuma 41

Curveball

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Felix Hernandez 60 66 Dillon Gee 37
2 Brandon McCarthy 58 67 Scott Carroll 37
3 Jacob deGrom 58 68 James Shields 33
4 Brandon Workman 57 69 Jesse Hahn 24
5 Jeremy Hellickson 57 70 Max Scherzer 22

Slider

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Max Scherzer 59 54 Tanner Roark 40
2 Wei-Yin Chen 59 55 Kyle Lohse 38
3 Jordan Zimmermann 59 56 Vance Worley 37
4 Corey Kluber 59 57 Dallas Keuchel 35
5 Tyler Matzek 58 58 Tim Lincecum 27

Changeup

Rank Pitcher Pitch Rating Rank Pitcher Pitch Rating
1 Chris Capuano 58 59 Wade Miley 38
2 Roberto Hernandez 58 60 Robbie Ray 36
3 Allen Webster 57 61 Trevor May 32
4 Yohan Flande 57 62 Zack Greinke 28
5 Jeremy Guthrie 57 63 Jon Niese 28

Screwball

Rank Pitcher Pitch Rating
1 Trevor Bauer 59

Knuckleball

Rank Pitcher Pitch Rating
1 R.A. Dickey 49

Monthly Discussion

As we can see, Alex Cobb takes the top spot for this month mainly due to the strength of his sinker and splitter. Cobb was classified as throwing four different pitches in August (Four-Seam, Sinker, Splitter, and Curveball) and managed to earn at least 0.1 WAR from all four. The most valuable pitch overall in August was Chris Tillman’s Four-Seam Fastball. The least valuable was Stephen Strasburg’s Four-Seam Fastball. As far as offspeed pitches, Chris Capuano’s 0.4 WAR from his changeup led the way. The least valuable offspeed pitch was Brad Hand’s changeup.

On our 20-80 scale pitch ratings, the highest rated qualifying pitch was Felix Hernandez’s curveball.  The lowest rated pitch was the curveball thrown by Max Scherzer.  The highest rated fastball was Jose Quintana’s four-seam fastball.  The lowest rated fastball was Tsuyoshi Wada’s sinker.

Pitch Values – 2014 Season

Four-Seam Fastball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jose Quintana 2.4 262 Dan Straily -0.3
2 Ian Kennedy 2.4 263 Edwin Jackson -0.3
3 Phil Hughes 2.2 264 Masahiro Tanaka -0.4
4 Jordan Zimmermann 2.1 265 Juan Nicasio -0.4
5 Chris Tillman 1.9 266 Marco Estrada -0.7

Sinker

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Charlie Morton 1.7 251 Mike Pelfrey -0.3
2 Dallas Keuchel 1.4 252 Dan Straily -0.3
3 Chris Archer 1.3 253 John Danks -0.3
4 Mike Leake 1.3 254 Wandy Rodriguez -0.3
5 Felix Hernandez 1.2 255 Andrew Heaney -0.4

Cutter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Jarred Cosart 1.8 118 Felipe Paulino -0.2
2 Corey Kluber 1.5 119 C.J. Wilson -0.3
3 Madison Bumgarner 1.4 120 Dan Haren -0.3
4 Josh Collmenter 1.4 121 Hector Noesi -0.4
5 Adam Wainwright 1.3 122 Brandon McCarthy -0.6

Splitter

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Alex Cobb 1.0 35 Jake Peavy -0.1
2 Masahiro Tanaka 0.8 36 Franklin Morales -0.2
3 Hiroki Kuroda 0.7 37 Danny Salazar -0.2
4 Hisashi Iwakuma 0.5 38 Miguel Gonzalez -0.3
5 Kyle Kendrick 0.4 39 Clay Buchholz -0.3

Curveball

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Sonny Gray 1.1 225 Homer Bailey -0.2
2 A.J. Burnett 1.1 226 Josh Collmenter -0.2
3 Brandon McCarthy 1.0 227 Franklin Morales -0.3
4 Adam Wainwright 1.0 228 Felipe Paulino -0.3
5 Felix Hernandez 0.8 229 Eric Stults -0.5

Slider

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Garrett Richards 1.5 192 Liam Hendriks -0.2
2 Tyson Ross 1.2 193 Rafael Montero -0.3
3 Chris Archer 1.0 194 Danny Salazar -0.3
4 Corey Kluber 1.0 195 Erasmo Ramirez -0.4
5 Jordan Zimmermann 1.0 196 Travis Wood -0.5

Changeup

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Felix Hernandez 0.8 245 Wandy Rodriguez -0.4
2 Stephen Strasburg 0.8 246 Jordan Zimmermann -0.4
3 Roberto Hernandez 0.7 247 Matt Cain -0.4
4 Cole Hamels 0.7 248 Marco Estrada -0.6
5 Chris Sale 0.6 249 Drew Hutchison -0.7

Screwball

Rank Pitcher Pitch Value
1 Trevor Bauer 0.1
2 Alfredo Simon 0.0
3 Hector Santiago 0.0
4 Julio Teheran 0.0

Knuckleball

Rank Pitcher Pitch Value
1 R.A. Dickey 1.3
2 C.J. Wilson 0.0

Overall

Rank Pitcher Pitch Value Rank Pitcher Pitch Value
1 Corey Kluber 3.7 270 David Holmberg -0.4
2 Adam Wainwright 3.6 271 Felipe Paulino -0.5
3 Garrett Richards 3.5 272 Juan Nicasio -0.5
4 Jose Quintana 3.4 273 Wandy Rodriguez -0.8
5 Felix Hernandez 3.3 274 Marco Estrada -1.2

Year-to-Date Discussion

If we look at the year-to-date numbers, Indians ace and Cistulli favorite Corey Kluber has claimed the top spot. Current MLB FIP and WAR leader Clayton Kershaw ranks eighth, with every pitcher ranked above him having made at least three more starts. The least valuable starter has been Marco Estrada. On a per-pitch basis, the most valuable pitch has been Jose Quintana’s four-seam fastball. The most valuable offspeed pitch has been Garrett Richards’s slider. The least valuable pitch has been Marco Estrada’s four-seam fastball. The least valuable offspeed pitch has been Drew Hutchison’s changeup.


Cat Days of Summer: The Tigers and Schedule Effects

If you’ve been on the internet in the last few weeks (or within earshot of a Michigander) you may have heard about the Tigers. Specifically, you may have heard about how the odds in favor of a Detroit appearance in the 2014 ALDS dropped from 21-to-1 on July 25 to under break-even by August 23 before a slight rebound to finish out the month. Even more specifically, you may have read Mike Petriello’s article about that on this very website. Or at the very least, you may have heard their struggles described in a less quantitative fashion. Regardless, the month of August was not kind to the Bengals.

As Petriello pointed out, this has been less of a Tigers collapse than a Royals surge. But there’s still something to the idea that the Tigers were playing worse in August than they had been previously. Let’s start with the basics:

2014 First Half August
R/G 4.80 4.58
RA/G 4.25 4.74
W% .582 .516
Pythagenpat .557 .484
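
For anyone who wants to check the Pythagenpat row, here is a rough sketch. It assumes the common form of the estimator, in which the exponent is total runs per game raised to the 0.287 power; that constant and the function name are my assumptions, since the table does not say which variant was used.

```python
def pythagenpat(runs_per_game: float, runs_allowed_per_game: float,
                exponent_power: float = 0.287) -> float:
    """Estimate winning percentage from per-game run rates.

    exponent_power=0.287 is an assumed (commonly quoted) constant.
    """
    # The exponent floats with the run environment: more runs, bigger exponent.
    x = (runs_per_game + runs_allowed_per_game) ** exponent_power
    scored = runs_per_game ** x
    allowed = runs_allowed_per_game ** x
    return scored / (scored + allowed)


# R/G and RA/G taken from the table above.
first_half = pythagenpat(4.80, 4.25)
august = pythagenpat(4.58, 4.74)
print(f"First half: {first_half:.3f}  August: {august:.3f}")
```

Under that assumption the two estimates land very close to the .557 and .484 shown in the table.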

In August, the Tigers scored fewer runs, allowed more runs, and won fewer games than in the first half. On some level, that’s all that really matters. On another level, something else is different about August for these Tigers.

Back on July 14, Buster Olney and Jeff Sullivan both wrote articles about schedule strength. Olney called the Tigers’ schedule the second-most difficult of 17 “contending” teams (paywall), while Sullivan said it was the easiest in all of MLB. One of the key reasons for the discrepancy was that Sullivan was using projections to determine the difficulty of a particular opponent, while Olney was using actual results. Score one for Sullivan. Another key difference was that as of July 14, the Tigers were about to play 55 games in 56 days, which did not factor into Sullivan’s analysis.

A point for Olney? Perhaps. But first, what would we expect to see if this was a result of schedule fatigue? Or put another way, which groups of players might be hurt most or least by not having a day off? Based on conventional wisdom, the bullpen would probably be the most affected, and the starters the least. So how does this match up to the Tigers?


Mike Trout and the MVP

In 2012 and 2013, Mike Trout was considered by most in the sabermetric community to be the most valuable player in the American League.  That Miguel Cabrera ended up winning in both years was the source of much debate and consternation, to say the least.  Analytically-inclined fans and writers were fed up, frustrated, and outright angry with the “old school” writers voting for Cabrera based on a different set of values.  Now, in an amusing twist, it appears that this year Trout has his best chance yet to wind up with the award, in large part by having a season that is less aligned with what the sabermetric community values, and more aligned with what the majority of the voting population values.  I took a look at the changes in various aspects of Trout’s game and analyzed how the regressions/improvements will impact his candidacy, based on what voters traditionally have cared about.

Defense

A large part of Trout’s previous MVP candidacy (particularly in 2012) centered on his defense — an area that traditionally has had fewer metrics to quantify a player’s value (as compared to say, hitting).  In 2012, DRS had Trout as worth 21 runs above average; UZR had him at 13.3.

In 2013, Trout’s defensive value declined to the point where he was worth -9 runs by DRS and +4.4 runs by UZR.  This discrepancy was a major reason why Baseball-Reference’s DRS-based WAR for Trout was 8.9 while FanGraphs’ UZR-based WAR was 10.5.

This year, Trout’s worth -6 by DRS and -7.2 by UZR.

In actuality, it didn’t take a rocket scientist to predict this regression; Trout’s arm has been consistently slightly below average, and his range ended up over-contributing in 2012 thanks to a handful of plays that broke his way.  Interestingly enough, the sabermetric crowd didn’t call any attention to this detail in 2012, choosing instead to use Trout’s defensive numbers to bolster their MVP case; now this year they’re bending over backwards to try to discredit Alex Gordon’s defensive numbers so they can justify giving the MVP to Trout as they’ve hoped to be able to do all season long…but that’s a post for a different day.

Baserunning

Likewise in 2012, Trout’s baserunning was valued at 12 runs above average, which included his other-worldly 49 SB and 5 CS.  In 2013, his baserunning added 8.1 runs, including 33 SB and 7 CS — still a great 82.5% success rate.

This year, Trout’s been worth all of 1.5 runs on the bases, with just 13 SB and 2 CS.

Hitting

Trout’s offense is down slightly, but not nearly to the extent that his defense and baserunning have been.  Like his defense, this regression was fairly predictable, given Trout’s unsustainably high BABIP in 2012 and 2013.  His OPS is down to 0.934 compared to 0.963 and 0.988 in 2012 and 2013, but he still has plenty else to hang his hat on: he leads the league in total bases; he’s already hit 30 homers, a total he hasn’t surpassed before; and, with 94 RBIs, he’ll easily pass that magical/meaningless 100 threshold soon as well.  The voters as a whole still like HRs, RBIs, and round numbers.

Clutch Hitting

In previous years, Trout was criticized (at least by me!) for not getting hits in key situations.  Here are Trout’s offensive splits with Bases Empty versus with Runners on Base:

 Year  Split  BABIP  OPS  tOPS+
 2012  Empty  0.403  0.985
 2012  RoB  0.343  0.917  90
 2013  Empty  0.399  1.023
 2013  RoB  0.339  0.934  90
 2014  Empty  0.343  0.916
 2014  RoB  0.348  0.944  104

In 2012-2013, he performed significantly worse with runners on.  Most folks here would no doubt cling to the notion that this is entirely luck, and that sequencing like this is entirely unpredictable and out of players’ control.  I argue that even if so, if we’re talking about how much value a player added to his team in a given year, he’s adding more value in years when he gets clutch hits than in years when he doesn’t.  And this year, he’s actually reversed the trend.  His 2014 WPA of 5.52 has already exceeded his 2012 and 2013 marks of 5.32 and 4.60.

The Field

Fortunately for Trout this year, there haven’t been many other position players giving him a run for his money.  Josh Donaldson has cooled off as expected after a hot start.  Alex Gordon’s case is even more heavily dependent on defensive metrics than Trout’s was in 2012, and I don’t see many voters slotting him above Trout.  After that, I just don’t see the award going to Robinson Cano or Kyle Seager (the only other 2 AL players in the top 10 for position player WAR as of this writing), unless Cano truly catches fire in September and leads the Mariners to the playoffs.  In fact Trout’s best competition for the MVP may well end up being a pitcher (another Mariner, no less!), Felix Hernandez.  And we know how hard it is for a pitcher to win the MVP even when his WAR outpaces that of position players (“They only pitch every 5 days!”).

Playoffs?!

Last and perhaps most importantly, I present the Angels’ records and division finishes over the past 3 seasons:

2012: 89-73, 3rd

2013: 78-84, 3rd

2014: 81-53, 1st (through 8/30)

FanGraphs gives the Angels a 99.9% chance of making the playoffs.  In fact, as of this writing, no other team in baseball has more than 78 wins, while the Angels have 81.  This should finally appease the “MVPs should lead their team to the playoffs” voters.

The Vote

So Trout’s hitting is slightly down and his defense and baserunning are way down from when he had his previous “MVP-caliber” seasons.  Fortunately for Trout, the voters by and large don’t value defense and baserunning as much as they probably should (though that’s starting to change, albeit slowly).  And as for hitting being down, 2014 Trout is doing more of what they value: hitting homers and driving in runs.  The only thing that might work against him is if he doesn’t bat .300 (he’s at .290 as of now), and the voters like nice round numbers (and they value BA over newfangled mumbo-jumbo like OBP and OPS).  Overall though, with the Angels in line for their first playoff spot since 2009 and no other traditional MVP-makeup players in the field, Trout seems like a shoo-in.

 Criteria  As Compared to 2012-2013  Do Voters care?
 Defense  Way Down  Not much
 Baserunning  Way Down  Not much
 Overall Hitting  Somewhat down  Somewhat
 HRs, RBIs  Up  Yes
 Playoffs  Angels in much better position  Yes
 Field  Not as many standouts as 2012-2013 (Alex Gordon != Miguel Cabrera)  Yes

So there you have it: Trout will win the AL MVP award for all the wrong reasons.


The Search for a Good Approach

Last week I explored the strategic effect of seeing more pitches per plate appearance. I love the ten-pitch walk as much as the next guy, but what I love even more is seeing a guy be able to change that approach to beat a scouting report. Let’s take a look at June 5, 2014, when the A’s went to see Masahiro Tanaka for the first time. The first batter is Coco Crisp:

Pitcher: M. Tanaka
Batter: C. Crisp

# Speed (mph) Pitch Result
1 91 Sinker Ball
2 90 Sinker Ball
3 91 Fastball (Four-seam) Ball
4 90 Fastball (Four-seam) Called Strike
5 91 Fastball (Four-seam) Foul
6 92 Fastball (Four-seam) In play, out(s)

So Crisp doesn’t get the best of Tanaka, but he makes Tanaka labor a bit through six pitches. If you’re going to make an out to start the game, it might as well be a long one. For the next batter, John Jaso, Tanaka decides to go right after him:

Pitcher: M. Tanaka
Batter: J. Jaso

# Speed (mph) Pitch Result
1 90 Sinker In play, run(s)

I may be looking too deeply into the narrative here, but I love to imagine Tanaka getting a bit frustrated here. Perhaps the scouting report said that Coco is aggressive early, while Jaso’s running 15% walk rates in 2012 and 2013 suggest that he’s more patient.  Tanaka has to throw six pitches in order to get Crisp out, but after deciding to go right after Jaso, he gets taken deep.

So I wondered if there are players who are able to fulfill both ends of this spectrum. Are there any players that are capable of prolonging their time at the plate until they see the pitch they want, but are also aggressive and willing enough to hit the gas on the first pitch? I used FanGraphs for the pitches/plate appearance data, but used baseball-reference’s play index to look up all instances of first-pitch hits this season. Originally I was going to use first-pitch swings, but I decided to just stick to times when the pitcher gets punished for trying to get ahead early. After all, if your decision is to get ahead early in the count, and the guy swings but all he does is foul it off or hit into an out, then that doesn’t change your approach as a pitcher. I wanted to see guys whom the book isn’t written on yet.  Advance Warning: These stats will be about a week old by the time you see them, as I am a slow, slow man.

Best P/PA Rank + FPH Rank (I have no idea how to pitch to them) FPH% P/PA FPHR PPAR FPHR + PPAR wOBA
Scott Van Slyke 5.940594059 4.143564356 26 45 71 0.385
Eric Campbell 4.2424242424 4.248520710 117 18 135 0.326
Jesus Guzman 4.294478528 4.17791411 111 33 144 0.247
Daniel Murphy 4.577464789 4.111842105 87 58 145 0.305
Joey Votto 4.044117647 4.334558824 135 12 147 0.359
Mark Reynolds 5.037783375 4.0375 59 91 150 0.307

(For Reference: FPH% = First Pitch Hit Percentage, or how often a batter gets a hit on the first pitch they see.  P/PA = Pitches per Plate Appearance. FPHR = First Pitch Hit Ranking, or how they rank in this category compared to the rest of the league.  PPAR = Pitches per Plate Appearance Ranking.  FPHR + PPAR = The addition of these two numbers.)
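
To make the ranking arithmetic concrete, here is a small sketch of how FPHR, PPAR, and their sum could be computed. The four players and their numbers are made up for illustration, and the tie-breaking rule is my own simplification, not necessarily how the tables here were built.

```python
from typing import Dict


def rank_descending(values: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 = highest value. Ties are broken alphabetically (an assumption)."""
    ordered = sorted(values, key=lambda name: (-values[name], name))
    return {name: position + 1 for position, name in enumerate(ordered)}


# Hypothetical mini-league, NOT real data.
fph_pct = {"A": 6.0, "B": 1.0, "C": 4.5, "D": 8.0}   # first-pitch hit %
p_per_pa = {"A": 4.2, "B": 4.4, "C": 3.9, "D": 3.2}  # pitches per PA

fphr = rank_descending(fph_pct)
ppar = rank_descending(p_per_pa)

# Low sum: good at both. Large positive or negative gap: one-sided approach.
combined = {name: fphr[name] + ppar[name] for name in fph_pct}
gap = {name: fphr[name] - ppar[name] for name in fph_pct}

for name in sorted(combined, key=combined.get):
    print(name, fphr[name], ppar[name], combined[name], gap[name])
```

The same two rank columns feed both the "sum" tables (balanced hitters) and the "difference" tables (one-sided hitters) in this post.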

I like this table!  I have wondered at times what has caused Scott Van Slyke’s resurgence this year. Perhaps this table gives us a bit of a clue.  Van Slyke is the only person in the MLB to rank in the top 50 in both FPHR and PPAR.  That’s pretty neat.  Daniel Murphy is also quite balanced, but he’s been much more consistent over the last few years.  He’s particularly interesting in that he doesn’t have a particularly high walk rate or strikeout rate.  I guess he’s just selective at times.  Jesus Guzman’s presence on this list goes to show that a good approach doesn’t necessarily mean success; it just means that he may not head back to the bench in any predictable fashion.  I stretched out the table one spot to include Mark Reynolds, because his name on this table makes me feel better about drafting him in Fantasy Baseball for the past five years.

I also wanted to look at the flip-side.  Who are the guys who don’t tend to take a lot of pitches, but also don’t tend to make any decent contact on first pitches?

Highest P/PA Rank + FPH Rank (Pick your poison) FPH% P/PA PPAR FPHR FPHR+PPAR wOBA
Joaquin Arias 0.6451612903 3.55483871 370 400 770 0.221
Ben Revere 1.629327902 3.563636364 365 368 733 0.307
Endy Chavez 0.9345794393 3.674311927 321 393 714 0.301
Conor Gillaspie 2.168674699 3.587112172 359 329 688 0.353
Jean Segura 2.564102564 3.42462845 396 289 685 0.262

Here we have a much less impressive list.  Joaquin Arias has been one of the worst hitters in the majors this year, and his dominance atop this leaderboard makes a bit of sense.  However, Conor Gillaspie is having an excellent season for the Pale Hose, despite the fact that he doesn’t seem to excel in either of the areas this article is interested in.  One peculiar note is that this group is pretty poor at hitting for power in general; these 5 guys have 13 home runs between them on the year, and six of those are Gillaspie’s.

So now let’s look at the weird ones.  I would think that it stands that if there are certain players who tend to take a lot of pitches and who also never seem to square up the first pitch, then we know our game plan.  Get ahead early on these batters.  We can try to view that by simply looking at each player’s FPH ranking minus their P/PA ranking.  For these hitters, that is the same as looking at the absolute value of their PPAR minus their FPHR.  Here are the top five in that respect:

Worst in FPHR, Best in PPAR (Groove it Early) FPH% P/PA FPHR PPAR FPHR-PPAR wOBA
Jason Kubel 1.136363636 4.471590909 387 4 383 0.278
Aaron Hicks 0.641025641 4.224358974 401 21 380 0.286
Mike Trout 1.217391304 4.418965517 385 6 379 0.401
Matt Carpenter 1.376936317 4.357264957 380 8 372 0.343
A.J. Ellis 1.181102362 4.255813953 386 17 369 0.264

Golly; I’ve figured out Mike Trout!  Mike Trout ranks very highly on our list of PPAR (6th) but unfortunately sits near the bottom (385th) when it comes to the first-pitch punish.  All of these guys actually fit this mold.  We have three relatively poor hitters accompanied by the best player in baseball and an above average infielder on a winning team.  So we can tell that being patient isn’t necessarily a good or bad thing; it’s just that hitter’s style.  Now let’s take a look at the reverse:

Best in FPHR, Worst in PPAR (Don’t throw it in the zone early) FPH% P/PA FPHR  PPAR PPAR-FPHR wOBA
Jose Altuve 8.159722222 3.175862069 5 407 402 0.355
Wilson Ramos 7.169811321 3.293680297 6 405 399 0.327
Erick Aybar 6.628787879 3.347091932 12 401 389 0.312
Ender Inciarte 8.360128617 3.471518987 3 391 388 0.284
A.J. Pierzynski 6.413994169 3.391930836 16 399 383 0.283

It’s always satisfying when the data shows what you expect it to.  I imagined Jose Altuve as being among the more aggressive hitters, and this shows that at least.  Altuve ranks 5th in the league in FPH% and sits near the bottom of the league (407th) in the P/PA category.  Interesting to see that this top five is also sorted by wOBA; Altuve is the best hitter on the list, and Pierzynski is the worst.  So there’s nothing necessarily wrong with an aggressive approach, but it does give us a clue as to a possible plan of attack.

So all this is to say, like my last article, that no particular approach is best.  One can look to swing at the first pitch, or one can be patient and wait for their pitch to come.  That said, everybody does have an approach, and that means they’ve got something they’re not looking for.  Stats like FPH and PPAR may just give us more clues as fans as to what teams put together with scouting reports.

So to conclude by going back to our first example, perhaps Tanaka should have read this data before his start against the A’s.  Coco ranks 266th in the league in FPHR, but a respectable 76th in PPAR.  Conversely, Jaso ranks 80th in the league in FPHR, but just 225th in PPAR.  Tanaka might have been better served by going after the aging Crisp and saving his energy for the somewhat aggressive Jaso.


Is Nolan Ryan Overrated by FIP?

Nolan Ryan was a singular pitcher. He’s unique in baseball history, so distinct that it’s hard to know where to start. I’m going to begin with the obvious: strikeouts. Nolan Ryan struck out 5,714 batters, 17% more than second-place Randy Johnson. Only 16 pitchers in history recorded half as many strikeouts as Nolan Ryan. He led his league in strikeouts 11 times, the most since Walter Johnson (12).

Ryan also walked the most batters in history — 2,795. Steve Carlton is second on that list, with 1,833. Ryan averaged 4.67 BB/9 and 12.4 BB%. Both figures are higher than anyone else who pitched even half as many innings. Ryan led his league in walks eight times.

Ryan also threw 277 wild pitches, most since 1900. He allowed 757 stolen bases, almost 40% more than second-place Greg Maddux. Ryan led AL pitchers in errors four times, and retired with a ghastly .895 fielding percentage. Joe Posnanski summed up Ryan’s career, “He’s the most extraordinary pitcher who ever lived, I think. But I also think he’s not especially close to the best.”

Nolan Ryan is unique, and it makes him hard to evaluate. Casual fans and the old-school crowd have always worshiped Nolan Ryan. His uniform number was retired by three different teams, and he was the leading vote-getter, among pitchers, for the MLB All-Century Team. He got more than twice as many votes as Walter Johnson. But when you really look at his stats, Ryan doesn’t come off well.

Take wins. Yes, the pitcher win, because this is surprising. In a career that spanned 26 seasons (not including 1966, when he had only one decision), Ryan only led his team in wins 7 times. Actually, it’s 5 times outright — 7 counts two years he tied for the lead. In 11 of his 27 seasons (41%), Ryan had a lower winning percentage than the team. He lost more games (292) than anyone but Cy Young and Walter Johnson. What about ERA? Ryan led his league in ERA twice, but in one of those years, he went 8-16. The other year, strike-shortened 1981, he didn’t lead the league in strikeouts, but did lead the majors in wild pitches (16). His 1.25 WHIP ranks 278th all-time. Ryan never won a Cy Young Award and never finished among the top 10 in MVP voting.

They say a little knowledge is a dangerous thing. When you look at stats like wins and ERA, Ryan looks more like a good pitcher than a great one. He’s almost a compiler, just a guy who played forever, rather than a true standout. Then you look at FIP. Ryan had a FIP of 2.97 (84 FIP-), and he pitched 5,386 innings, giving him 106.6 WAR. By FIP, Nolan Ryan is the 6th-most valuable pitcher of all time: Roger Clemens, Cy Young, Walter Johnson, Greg Maddux, Randy Johnson, Nolan Ryan.

I suspect the percentage of FanGraphs readers who believe Nolan Ryan was one of the six best pitchers ever is south of 5%, maybe less than 1%. He rates considerably worse by RA9-WAR, 89.5 instead of 106.6, 25th all-time. Even that would seem high to many stat-oriented fans. It’s better than Bob Feller, basically equal to Pedro Martinez. Ryan also ranks 20th in rWAR (83.8), again much lower than when judged by FIP.

I gave this post a stupid title, with an obvious answer. Is Nolan Ryan overrated by FIP? Yes, clearly. His ERA was 20 points higher — in a 28-year, 807-game, 5,400-inning career. I think the numbers stabilize before 5,000 innings. Ryan’s RA9-WAR is 17 points lower than his fWAR, the biggest deficit of any pitcher in history. Ryan is overrated by FIP. That’s not a major revelation. The interesting question is why Nolan Ryan is overrated by FIP — and whether he is underrated by RA and ERA.


The A’s and Hitting With Men On Base

Earlier this month I wrote about how the A’s front office is currently outpacing their competition when it comes to roster construction.  I focused primarily on how they’ve taken the platoon advantage to another level, loading up on defensively versatile players to allow for day-to-day lineup construction that maximizes the number of plate appearances where their hitters have the platoon advantage.  As a result of this, they get 70% of their PAs with the platoon advantage, as compared to the league average of 55%.  As part of my investigation into the platoon splits of A’s players, I also noticed another split of interest: offensive performance with runners on base as compared to with the bases empty.  After investigation, I’ve concluded that the A’s have identified and targeted players that have higher offensive production with runners on base.

League-wide trends
First, it should be noted that in general, everyone hits better with runners on base.  There are two primary reasons for this.  The first is sampling bias: if runners are on base, you’re more likely to be facing an inferior pitcher, as such pitchers allow more baserunners and hence face proportionally more batters with runners already on base.  Second, the defense is concerned with more than just the current batter.  With the bases empty, the defense presumably aligns themselves to maximize the chances of getting the batter out (or, more precisely, to minimize the overall output of the batter).  With runners on, there are other considerations – ensuring that the runners don’t steal, for example – that change the defensive alignment.  As a result, a given ball in play is more likely to be a hit if there are runners on base.  League-wide in 2014, the numbers look like this:

  PA OPS BAbip tOPS+
Bases Empty 80375 0.687 0.296 95
Runners on Base 61905 0.725 0.302 106

tOPS+ is a measure of the split, relative to average.  Roughly speaking, the above numbers mean that on average, hitters’ OPS is 6% higher (tOPS+ = 106) with runners on base compared to OPS in all scenarios.
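
As a rough illustration of what sits behind that number, here is a sketch of the tOPS+ calculation as I understand Baseball-Reference's definition: the split's OBP and SLG are each divided by the overall OBP and SLG, the two ratios are added, 1 is subtracted, and the result is scaled to 100. The OBP and SLG inputs below are hypothetical, since the tables here only list OPS.

```python
def tops_plus(split_obp: float, split_slg: float,
              total_obp: float, total_slg: float) -> float:
    """Split performance relative to overall performance (100 = same as overall).

    Formula assumed from the Baseball-Reference glossary definition.
    """
    return 100 * (split_obp / total_obp + split_slg / total_slg - 1)


# Hypothetical slash-line components, NOT taken from the tables in this post.
overall_obp, overall_slg = 0.314, 0.386
rob_obp, rob_slg = 0.325, 0.400

print(round(tops_plus(rob_obp, rob_slg, overall_obp, overall_slg)))
```

Because the two ratios are added rather than the OPS values divided directly, tOPS+ will not always match a simple "OPS in the split divided by overall OPS" calculation, which is why the 6% reading is only approximate.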

Some teams have been better than others when it comes to hitting with runners on base:

Team OPS (Empty) OPS (RoB) OPS Diff BAbip (Empty) BAbip (RoB) BAbip Diff tOPS+
OAK 0.672 0.789 0.117 0.264 0.306 0.042 118
SEA 0.633 0.740 0.107 0.281 0.312 0.031 118
NYM 0.622 0.713 0.091 0.284 0.288 0.004 116
COL 0.740 0.820 0.080 0.319 0.332 0.013 112
CIN 0.648 0.719 0.071 0.291 0.288 -0.003 112
CLE 0.688 0.756 0.068 0.288 0.304 0.016 111
BAL 0.705 0.771 0.066 0.288 0.310 0.022 111
ATL 0.662 0.716 0.054 0.296 0.317 0.021 109
BOS 0.664 0.713 0.049 0.294 0.297 0.003 108
MIA 0.675 0.724 0.049 0.313 0.318 0.005 108
PHI 0.650 0.694 0.044 0.294 0.290 -0.004 107
CHW 0.700 0.743 0.043 0.308 0.311 0.003 107
LAA 0.717 0.752 0.035 0.290 0.327 0.037 106
PIT 0.710 0.744 0.034 0.302 0.313 0.011 106
CHC 0.666 0.700 0.034 0.300 0.279 -0.021 106
KCR 0.681 0.715 0.034 0.306 0.297 -0.009 106
MIL 0.710 0.740 0.030 0.299 0.295 -0.004 106
ARI 0.677 0.709 0.032 0.293 0.298 0.005 105
SFG 0.670 0.698 0.028 0.283 0.310 0.027 105
WSN 0.691 0.718 0.027 0.303 0.302 -0.001 104
MIN 0.691 0.710 0.019 0.293 0.308 0.015 103
HOU 0.696 0.711 0.015 0.292 0.294 0.002 102
NYY 0.686 0.697 0.011 0.282 0.293 0.011 102
DET 0.750 0.760 0.010 0.309 0.319 0.010 102
TBR 0.698 0.707 0.009 0.298 0.287 -0.011 102
SDP 0.637 0.644 0.007 0.278 0.274 -0.004 102
TEX 0.694 0.689 -0.005 0.308 0.299 -0.009 99
TOR 0.746 0.740 -0.006 0.303 0.291 -0.012 99
LAD 0.726 0.715 -0.011 0.313 0.310 -0.003 99
STL 0.699 0.688 -0.011 0.307 0.290 -0.017 98

Here, tOPS+ is the measure of the split relative to that team’s average.  So for example, the Tigers’ OPS with Runners on Base (RoB) is 0.760, vs. 0.750 with Bases Empty for a tOPS+ of 102.  The Reds on the other hand have a split of 0.648 vs. 0.719 for a tOPS+ of 112.  The Tigers are a better offensive team overall than the Reds, but the Reds’ split with runners on base is larger.

The A’s
The A’s and Mariners top the list as having the largest split with runners on base.  Let’s take a look at the A’s individual players and how they perform with RoB:

Name PA OPS BAbip tOPS+
Josh Donaldson 242 0.953 0.318 138
Brandon Moss 239 0.933 0.348 130
Yoenis Cespedes 208 0.798 0.310 114
Jed Lowrie 202 0.563 0.250 69
Alberto Callaspo 183 0.656 0.264 116
Derek Norris 147 0.878 0.316 109
John Jaso 144 0.842 0.351 120
Coco Crisp 143 0.857 0.333 130
Josh Reddick 134 0.837 0.283 122
Eric Sogard 115 0.587 0.258 108
Stephen Vogt 96 0.887 0.338 106
Nick Punto 94 0.679 0.368 135
Craig Gentry 85 0.676 0.333 116

Again, the tOPS+ column represents how well the player performs with runners on base relative to that player’s average performance.  We can see that across the board, with the notable exception of Jed Lowrie, all the A’s have been performing better with runners on this year.

Now typically this is where you’d say the A’s are just getting lucky, and expect them to regress to the mean.  Certainly some regression is expected, but I’m not sold on the idea that this is entirely luck-driven.  We know that there are some players who routinely and consistently perform better with runners on base – sometimes dramatically so.  Let’s take a look at these players’ career numbers to see if they might be such players:

Name PA OPS BAbip tOPS+
Donaldson – Empty 861 0.701 0.259 74
Donaldson – RoB 675 0.945 0.351 134
Moss – Empty 1084 0.737 0.263 85
Moss – RoB 944 0.864 0.348 117
Cespedes – Empty 844 0.746 0.277 90
Cespedes – RoB 768 0.824 0.304 111
Lowrie – Empty 1338 0.732 0.283 98
Lowrie – RoB 1096 0.756 0.299 104
Callaspo – Empty 2045 0.678 0.281 92
Callaspo – RoB 1580 0.741 0.287 110
Norris – Empty 471 0.694 0.292 87
Norris – RoB 390 0.813 0.309 116
Jaso – Empty 940 0.702 0.275 85
Jaso – RoB 697 0.835 0.308 120
Crisp – Empty 3609 0.742 0.298 100
Crisp – RoB 2237 0.739 0.291 100
Reddick – Empty 992 0.761 0.291 109
Reddick – RoB 820 0.692 0.249 89
Sogard – Empty 488 0.591 0.253 91
Sogard – RoB 362 0.654 0.274 112
Vogt – Empty 206 0.716 0.288 93
Vogt – RoB 183 0.773 0.300 107
Punto – Empty 2087 0.633 0.298 96
Punto – RoB 1627 0.664 0.298 106
Gentry – Empty 549 0.692 0.350 98
Gentry – RoB 432 0.709 0.325 103

Almost all of them have put up large splits with runners on.  Of course, it can take upwards of 1000 PAs for something like BABIP to stabilize (and even then you still need to account for regression to the mean), and many of these players aren’t at that threshold.  Nevertheless, taking these players’ careers in aggregate gives us 27,000 plate appearances; across these, the players show an increase of 14 points of BABIP and 53 points of OPS with runners aboard.  When compared to league average (6 points of BABIP and 38 points of OPS), it really looks like the A’s are targeting players that have some inherent, non-random ability to perform better with runners on base (to a greater extent than average).
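
The aggregate figures can be approximated straight from the career table above. The sketch below weights each player's OPS and BABIP by plate appearances; that weighting is my assumption about how to pool the lines (strictly speaking, BABIP should be weighted by balls in play, which the table does not list).

```python
# (name, PA empty, OPS empty, BABIP empty, PA RoB, OPS RoB, BABIP RoB)
CAREER_SPLITS = [
    ("Donaldson", 861, 0.701, 0.259, 675, 0.945, 0.351),
    ("Moss", 1084, 0.737, 0.263, 944, 0.864, 0.348),
    ("Cespedes", 844, 0.746, 0.277, 768, 0.824, 0.304),
    ("Lowrie", 1338, 0.732, 0.283, 1096, 0.756, 0.299),
    ("Callaspo", 2045, 0.678, 0.281, 1580, 0.741, 0.287),
    ("Norris", 471, 0.694, 0.292, 390, 0.813, 0.309),
    ("Jaso", 940, 0.702, 0.275, 697, 0.835, 0.308),
    ("Crisp", 3609, 0.742, 0.298, 2237, 0.739, 0.291),
    ("Reddick", 992, 0.761, 0.291, 820, 0.692, 0.249),
    ("Sogard", 488, 0.591, 0.253, 362, 0.654, 0.274),
    ("Vogt", 206, 0.716, 0.288, 183, 0.773, 0.300),
    ("Punto", 2087, 0.633, 0.298, 1627, 0.664, 0.298),
    ("Gentry", 549, 0.692, 0.350, 432, 0.709, 0.325),
]


def weighted_mean(pairs):
    """PA-weighted mean of (weight, value) pairs."""
    pairs = list(pairs)
    total_weight = sum(weight for weight, _ in pairs)
    return sum(weight * value for weight, value in pairs) / total_weight


total_pa = sum(row[1] + row[4] for row in CAREER_SPLITS)
ops_diff = (weighted_mean((r[4], r[5]) for r in CAREER_SPLITS)
            - weighted_mean((r[1], r[2]) for r in CAREER_SPLITS))
babip_diff = (weighted_mean((r[4], r[6]) for r in CAREER_SPLITS)
              - weighted_mean((r[1], r[3]) for r in CAREER_SPLITS))

print(total_pa, round(ops_diff, 3), round(babip_diff, 3))
```

Pooled this way, the gaps come out close to the 53 points of OPS and 14 points of BABIP quoted above, over a little more than 27,000 plate appearances.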

A quick look at the Mariners
The other team leading the league in the split is the Mariners.  What’s going on there?  A look at the individual players’ splits shows:

Name PA OPS BAbip tOPS+
Robinson Cano 221 1.032 0.327 137
Kyle Seager 219 0.905 0.336 120
Dustin Ackley 177 0.702 0.310 104
Mike Zunino 167 0.640 0.247 88
Brad Miller 139 0.619 0.293 108
Justin Smoak 117 0.697 0.268 119
James Jones 115 0.634 0.366 112
Logan Morrison 106 0.671 0.244 106
Corey Hart 101 0.580 0.269 97

The two biggest contributors, by far, are Cano and Seager.  If a genie granted you one very specific wish, namely that you could pick 2 players on your team to magically perform dramatically better with runners on base, you’d want to pick the 2 guys who a) are clearly the best hitters on your team and b) get the most plate appearances.  For the Mariners, that’s Cano and Seager.

Here, I absolutely expect regression to the mean.  I don’t think the Mariners keep this up.  In fact, looking at Cano’s career numbers (over 6000 PAs), he’s actually been better with the bases empty: OPS of 0.873 vs. 0.845, and BABIP of 0.335 vs. 0.313.  For some reason, so far this year he’s been far better with runners on.

What does it all mean?
The A’s have figured it out.  The Mariners have been lucky.  The Mariners will regress heavily to the mean for the remainder of the season.  The A’s might regress somewhat, but they’re on to something.  By building a roster of players that are more productive with runners on base, they score more runs.

This explains why the A’s are outperforming their Expected Runs, or BaseRuns.  BaseRuns predicts how many runs a team scores based purely on their aggregate totals (hits, homers, total bases, etc.), removing all sequencing from the picture entirely.  Based on BaseRuns, FanGraphs says they “should have” only scored 4.54 runs per game, when they’ve actually been scoring 4.82 runs per game.  If we can do a better job quantifying how much of this sequencing is luck-based versus skill-based, we can do a better job projecting run scoring, and by extension, win percentages.
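
To show what "removing sequencing" means in practice, here is a sketch of the simplest published form of BaseRuns (David Smyth's original A*B/(B+C)+D construction). This is an assumption on my part: FanGraphs uses a more detailed version with additional inputs, and the team totals below are hypothetical, not the A's actual line.

```python
def base_runs(ab: int, h: int, bb: int, hr: int, tb: int) -> float:
    """Simple BaseRuns estimate from season totals (no sequencing information)."""
    a = h + bb - hr                                      # baserunners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02  # advancement
    c = ab - h                                           # outs
    d = hr                                               # automatic runs
    return a * b / (b + c) + d


# Hypothetical team totals, for illustration only.
estimate = base_runs(ab=5500, h=1400, bb=500, hr=150, tb=2200)
print(round(estimate, 1))

# The gap the article describes, in runs per game.
actual_rpg, baseruns_rpg = 4.82, 4.54
print(round(actual_rpg - baseruns_rpg, 2))
```

Nothing in the inputs says when the hits happened, so a lineup that bunches its hits with runners on will score more than this estimate, which is the 0.28 runs per game gap described above.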


Baseball’s 10 Most Unusual Hitters

Baseball, more than any other major team sport, has the reputation for having the least athletic athletes. Jose Molina is obligated to, at times, sprint. Jorge de la Rosa must swing a baseball bat. David Ortiz sometimes has to play in the field. Having skills like catcher defense, pitching, and hitting with power will earn you playing time, and many players have such elite strengths that it’s worth it just to deal with those weaknesses. So many of baseball’s skills are unrelated that players have to spend a lot of time doing things they aren’t good at, at least relative to other MLB talent. A good way to make anyone look unathletic is to make them perform a long list of skills that have little to do with one another and compare them to the best in the world at those tasks.

I wanted to assemble a list of players who experienced something like this phenomenon the most frequently. Essentially, I wanted to see which players’ strengths and weaknesses were the farthest apart. To determine those players whose skills varied the most, I gathered what I consider to be the six stats that best describe a player’s strengths and weaknesses: BABIP and K% for contact, BB% for discipline, ISO for power, and Fielding and Baserunning values. I then gathered stats from 2011-2014 to better control for less reliable fielding metrics, assigned each player’s stats a percentile rank, and calculated the standard deviation of those six percentile ranks for each player.
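
The method can be sketched in a few lines. Everything below is illustrative: the three players and their numbers are invented, I treat a lower K% as better, and I use the population standard deviation, none of which is spelled out above.

```python
from statistics import pstdev


def percentile_ranks(values, higher_is_better=True):
    """Map each player's value to a 0-100 percentile within the group (ties ignored)."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    last = len(ordered) - 1
    return {name: 100 * index / last for index, name in enumerate(ordered)}


# Hypothetical players, NOT real data.
players = {
    "Slugger":   {"BABIP": 0.290, "K%": 30.0, "BB%": 14.0, "ISO": 0.260, "Fld": -12.0, "BsR": -5.0},
    "Speedster": {"BABIP": 0.330, "K%": 9.0,  "BB%": 4.0,  "ISO": 0.050, "Fld": 10.0,  "BsR": 8.0},
    "Balanced":  {"BABIP": 0.300, "K%": 18.0, "BB%": 8.0,  "ISO": 0.150, "Fld": 1.0,   "BsR": 1.0},
}
stats = ["BABIP", "K%", "BB%", "ISO", "Fld", "BsR"]
lower_is_better = {"K%"}  # assumption: fewer strikeouts is the strength

pct = {
    stat: percentile_ranks({name: line[stat] for name, line in players.items()},
                           higher_is_better=stat not in lower_is_better)
    for stat in stats
}

# A large spread means extreme strengths paired with extreme weaknesses.
spread = {name: pstdev([pct[stat][name] for stat in stats]) for name in players}
for name in sorted(spread, key=spread.get, reverse=True):
    print(name, round(spread[name], 1))
```

A player who sits in the middle of every category gets a spread near zero, while a player who is elite at some things and terrible at others gets a large one.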

For instance, Mike Trout’s attributes look like this:

[Chart: Mike Trout]

His strikeout rate has been higher than MLB average, but he is otherwise an exceptionally well rounded player, as we know.

The most evenly talented player in baseball has been Kyle Seager, who is almost in the middle third at every stat.

[Chart: Kyle Seager]

Many players have much more severe strengths and weaknesses. Here are the 10 players whose stats show the greatest variation from one another.

10. Dexter Fowler

[Chart: Dexter Fowler]

9. Ichiro Suzuki

[Chart: Ichiro Suzuki]

8. Jose Altuve

[Chart: Jose Altuve]

7. Curtis Granderson

[Chart: Curtis Granderson]

6. Mark Reynolds

[Chart: Mark Reynolds]

5. Giancarlo Stanton

[Chart: Giancarlo Stanton]

4. Miguel Cabrera

[Chart: Miguel Cabrera]

3. Darwin Barney

[Chart: Darwin Barney]

2. Adam Dunn

[Chart: Adam Dunn]

1. Ben Revere

[Chart: Ben Revere]

The whole list is fun to look through and play around with, so feel free to click here and look through all the qualifying players.