The Cascading Bias of ERA
There are so many problems with ERA that it’s unbelievable. I’m not going to sit here and tell you what’s wrong with ERA, though, because you’re probably smart. But there’s a problem with ERA, and it’s a problem that transcends ERA. It’s a problem that trickles down through FIP, xFIP, SIERA, TIPS, and whatever your favorite stat happens to be, and it’s something I don’t see talked about much.
All of our advanced pitcher metrics are trying to predict or estimate ERA. They’re trying to figure out what a pitcher’s ERA should be, and herein lies the problem: they could be exactly right about ERA and still be a little off about the pitcher’s value, because of one little assumption.
This assumption–that pitchers have no control over whether or not the fielders behind them make errors–seems easy to make. Like most assumptions, however, this one is subtly incorrect. Thankfully, the reason is pretty simple. Ground balls are pretty hard to field without making an error, and fly balls aren’t. And the difficulty gap is pretty huge.
How big? Well in 2013 there were precisely 58,388 ground balls, 1,344 of which resulted in errors. On the other hand a mere 98 out of 39,328 fly balls resulted in errors. That means that 2.3% of ground balls result in errors while a tiny 0.25% of fly balls do. It’s time to stop pretending that this gap doesn’t exist, because it does.
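As a quick check on the arithmetic, the rates above can be reproduced directly from the 2013 counts quoted in this article. This is only a sketch; the helper name `error_rate` is mine and not part of any library.

```python
# 2013 league totals quoted in the article.
GROUND_BALLS, GROUND_BALL_ERRORS = 58_388, 1_344
FLY_BALLS, FLY_BALL_ERRORS = 39_328, 98


def error_rate(errors: int, batted_balls: int) -> float:
    """Share of batted balls of one type that ended in an error."""
    return errors / batted_balls


gb_rate = error_rate(GROUND_BALL_ERRORS, GROUND_BALLS)
fb_rate = error_rate(FLY_BALL_ERRORS, FLY_BALLS)

print(f"GB error rate: {gb_rate:.2%}")
print(f"FB error rate: {fb_rate:.2%}")
print(f"GB-to-FB error rate ratio: {gb_rate / fb_rate:.1f}")
```

Put another way, a ground ball was roughly nine times as likely as a fly ball to end in an error.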
So now that we know this, what does it mean? Well it means this: ground-ball pitchers will have an ERA that suggests they are better than their actual value, while the opposite holds for fly-ball pitchers. Pitchers who allow contact, additionally, are worse off because every time they allow contact they put pressure on their defense. They’re giving themselves a chance to stockpile unearned runs which nobody will count against them if they’re only looking at ERA derivatives. When it comes to winning baseball games, however, earned runs don’t matter. Runs matter.
I am going to call this the “pressure on the defense” effect: some pitchers are more prone to unearned runs than others. How big is this effect? Well, not huge. The gap between the best pitcher and worst pitcher in the league is roughly three runs over the course of the season. But keep in mind that three runs is about a third of a win, and a third of a win is worth about $2 million. We’re not discussing mere minutiae here.
In order to better quantify this effect I have developed the xUR/180 metric, which estimates how many unearned runs should have taken place behind each pitcher with an average defense. Below is a table of all qualified starting pitchers from 2013, ranked according to this metric. I have also included how many unearned runs they actually allowed in 2013, scaled to 180 innings for comparative purposes.
# | Name | xUR/180 | UR/180 |
---|---|---|---|
1 | Joe Saunders | 7.24 | 9.84 |
2 | Jeff Locke | 7.11 | 4.33 |
3 | Wily Peralta | 6.97 | 17.7 |
4 | Edwin Jackson | 6.88 | 13.36 |
5 | Edinson Volquez | 6.81 | 6.35 |
6 | Kyle Kendrick | 6.77 | 8.9 |
7 | Justin Masterson | 6.66 | 0.93 |
8 | Doug Fister | 6.58 | 5.19 |
9 | Wade Miley | 6.57 | 7.12 |
10 | Rick Porcello | 6.51 | 2.03 |
11 | Jerome Williams | 6.47 | 7.45 |
12 | Jorge de la Rosa | 6.43 | 5.38 |
13 | Yovani Gallardo | 6.42 | 7.99 |
14 | A.J. Burnett | 6.35 | 8.48 |
15 | Scott Feldman | 6.32 | 8.94 |
16 | Mike Leake | 6.26 | 5.62 |
17 | Andrew Cashner | 6.25 | 8.23 |
18 | Felix Doubront | 6.22 | 6.66 |
19 | Jhoulys Chacin | 6.13 | 5.48 |
20 | Kevin Correia | 6.13 | 2.92 |
21 | Jeremy Guthrie | 6.13 | 3.41 |
22 | Mark Buehrle | 6.11 | 5.31 |
23 | Andy Pettitte | 6.05 | 7.78 |
24 | Hyun-Jin Ryu | 6.01 | 2.81 |
25 | Jeff Samardzija | 6.0 | 5.07 |
26 | C.J. Wilson | 5.93 | 11.03 |
27 | CC Sabathia | 5.9 | 8.53 |
28 | Jon Lester | 5.84 | 4.22 |
29 | Ryan Dempster | 5.8 | 10.52 |
30 | Tim Lincecum | 5.77 | 5.48 |
31 | Hiroki Kuroda | 5.72 | 4.48 |
32 | Bud Norris | 5.72 | 7.15 |
33 | Jordan Zimmermann | 5.69 | 3.38 |
34 | Patrick Corbin | 5.68 | 1.73 |
35 | Dillon Gee | 5.67 | 3.62 |
36 | Ervin Santana | 5.67 | 7.68 |
37 | Kris Medlen | 5.66 | 8.22 |
38 | Bronson Arroyo | 5.63 | 2.67 |
39 | Stephen Strasburg | 5.62 | 9.84 |
40 | Mat Latos | 5.62 | 6.85 |
41 | Ubaldo Jimenez | 5.61 | 7.9 |
42 | Jarrod Parker | 5.61 | 4.57 |
43 | John Lackey | 5.6 | 5.71 |
44 | Gio Gonzalez | 5.55 | 5.53 |
45 | Lance Lynn | 5.55 | 2.68 |
46 | Eric Stults | 5.5 | 7.09 |
47 | Felix Hernandez | 5.49 | 4.41 |
48 | Zack Greinke | 5.48 | 2.03 |
49 | Hisashi Iwakuma | 5.47 | 3.28 |
50 | Jose Quintana | 5.46 | 4.5 |
51 | Ian Kennedy | 5.46 | 8.95 |
52 | Ricky Nolasco | 5.45 | 7.23 |
53 | R.A. Dickey | 5.44 | 6.42 |
54 | Jeremy Hellickson | 5.4 | 3.1 |
55 | Homer Bailey | 5.38 | 3.44 |
56 | Miguel Gonzalez | 5.36 | 9.47 |
57 | Madison Bumgarner | 5.34 | 5.37 |
58 | James Shields | 5.32 | 1.58 |
59 | Adam Wainwright | 5.32 | 2.99 |
60 | Bartolo Colon | 5.32 | 3.79 |
61 | Derek Holland | 5.3 | 7.61 |
62 | Kyle Lohse | 5.26 | 3.63 |
63 | Cole Hamels | 5.18 | 4.91 |
64 | Anibal Sanchez | 5.18 | 3.96 |
65 | David Price | 5.18 | 8.7 |
66 | Chris Sale | 5.14 | 6.73 |
67 | Justin Verlander | 5.06 | 8.25 |
68 | Chris Tillman | 5.04 | 1.75 |
69 | Jose Fernandez | 5.03 | 5.23 |
70 | Shelby Miller | 4.98 | 6.24 |
71 | Matt Cain | 4.97 | 2.93 |
72 | Clayton Kershaw | 4.9 | 5.34 |
73 | Julio Teheran | 4.9 | 2.92 |
74 | Matt Harvey | 4.86 | 1.01 |
75 | Cliff Lee | 4.79 | 4.86 |
76 | Travis Wood | 4.78 | 3.6 |
77 | Dan Haren | 4.78 | 4.26 |
78 | Yu Darvish | 4.53 | 1.72 |
79 | A.J. Griffin | 4.46 | 5.4 |
80 | Mike Minor | 4.46 | 5.29 |
81 | Max Scherzer | 4.15 | 3.36 |
Some notes:
- Groundballs are still good, they’re just not as good.
- A combination of groundballs and contact leads to more unearned runs. The pitchers at the top of the board demonstrate this.
- A combination of strikeouts and fly balls will tend to limit the impact of unearned runs, as demonstrated by the bottom of the board.
- Errors that occur on fly balls tend to be more costly than errors on ground balls. This metric accounts for that gap, but the low likelihood of fly-ball errors makes this bullet point’s effect relatively negligible.
- Line drives are similar to fly balls in terms of error rate, but line-drive errors tend to be less costly than fly-ball errors.
I’m sure there is more to be gleaned, but the point is this: we need to stop trying to predict ERA, because ERA is not a pure value stat. We should be trying to figure out how many runs a pitcher should/should have given up, because that’s what matters. Runs matter, and who cares if they’re unearned? They’re kind of the pitcher’s fault, anyways.
Brandon Reppert is a computer "scientist" who finds talking about himself in the third-person peculiar.
An excellent idea well-supported and well-written.
Very good article: concise, and a good attempt at putting a value on something that should be measured. Could we potentially see a new SIERA-type metric that predicts RA? I would guess that the FB/GB values would only need to be edited slightly to account for the difference, with the numbers scaled up to reflect RA instead of ER.
It’s possible I’ll turn this into a follow-up article, but the calculations here are actually pretty simple.
A quick and easy way to find the RA equivalent of any ERA estimator for a pitcher is to just add (xUR/180) / 20.
So Joe Saunders’s raFIP would be something like FIP + .35 and Max Scherzer’s would be FIP + .20. You could scale different metrics to include what you want, but this would get you close.
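The adjustment described here can be sketched in a few lines. I am assuming the divisor of 20 is simply 180 innings expressed as nine-inning games (180 / 9), which the comment does not state outright; the function names are hypothetical, and the two xUR/180 values are taken from the table in the article.

```python
INNINGS_PER_GAME = 9
SCALE_INNINGS = 180  # xUR/180 is expressed per 180 innings


def unearned_run_adjustment(xur_per_180: float) -> float:
    """Convert xUR/180 into expected unearned runs per nine innings."""
    games = SCALE_INNINGS / INNINGS_PER_GAME  # 180 / 9 = 20 (assumed origin of the 20)
    return xur_per_180 / games


def ra_equivalent(era_estimator: float, xur_per_180: float) -> float:
    """Shift an ERA-scale estimator (FIP, xFIP, SIERA, ...) onto an RA scale."""
    return era_estimator + unearned_run_adjustment(xur_per_180)


for name, xur in [("Joe Saunders", 7.24), ("Max Scherzer", 4.15)]:
    print(f"{name}: add {unearned_run_adjustment(xur):.3f} runs per nine innings")
```

The two adjustments come out near 0.36 and 0.2 runs per nine innings, in line with the rough figures quoted above.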
Thanks for the kind remarks everyone!
The creators of SIERA already created SIRA:
http://www.fangraphs.com/blogs/new-siera-part-five-of-five-what-didnt-work/
Oh cool, I hadn’t seen that.
It seems that the decision was to opt for familiarity over precision. Which, well, doesn’t seem like something we should be doing.
You know an article’s good when you’re frustrated that it’s ending. 🙂
Can you tell us how you get xUR? I’m not sure what to make of these numbers. For example, Masterson should have given up about 5.5 more unearned runs than he did. Why? Did those runs disappear, or were they earned runs?
The metric assumes a neutral defense. So deviation +/- the xUR means the defense behind that pitcher performed above or below average.
“means the defense behind that pitcher performed above or below average.”
That’s better than average for errors only, right? Could be high BABIP, low errors, no?
Yes this is only for errors.
Metrics like FIP already do a good job of regressing BABIP, but what they don’t do a good job of is accounting for plays on which it is harder for the defense to avoid errors (hence this article). In regards to your comment further down, xUR basically does all of the things you request.
Masterson gave up fewer unearned runs than expected simply because the defense made plays behind him, even though he gave his defense a lot of the most challenging play type (ground balls). There’s a lot of year-to-year fluctuation in unearned runs allowed, so xUR looks to take out that fluctuation by using batted ball types and assuming league-average error rates.
The formula is a bit complicated, but it’s basically this (I know it isn’t going to be formatted well in the comments section):
xUR = (gb_ratio * balls_in_play) * lg_gb_error_rate * avg_UR/GB_Error
    + (ld_ratio * balls_in_play) * lg_ld_error_rate * avg_UR/LD_Error
    + (fb_ratio * balls_in_play) * lg_fb_error_rate * avg_UR/FB_Error
Then some additional math is done to scale xUR to a 180-inning scale, and we’ve got our number. Note that avg_UR/XX_Error is a linear weight for how many unearned runs each type of batted ball error costs on average. Overall, each error tends to lead to about one unearned run.
Clarification: avg_UR/XX_Error is not literally a linear weight in the sense that each error will cost a team X runs. It is the number of unearned runs that can be expected to occur after the error.
For this reason the avg_UR/XX_Error will be higher than normal ROE linear weights. For example, if an error is made that would have been the third out of the inning then every run after that would be unearned. This is different than traditional linear weights, where only the effect of the error play on overall run expectancy is captured. I hope that makes some sense.
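For anyone who wants to play with the formula, here is a minimal sketch of it. Several inputs are assumptions on my part and not the values actually used for the table: the line-drive error rate is a placeholder set equal to the fly-ball rate (the article only says the two are similar), every avg_UR weight is set to 1.0 (the comment gives only the overall average of about one unearned run per error), and the "additional math" for the 180-inning scale is assumed to be a plain linear rescaling by innings pitched. The two pitchers are hypothetical.

```python
from dataclasses import dataclass

# League error rates per batted ball. GB and FB use the 2013 counts from the
# article; the LD rate is a PLACEHOLDER assumed equal to the FB rate.
LG_ERROR_RATE = {"gb": 1_344 / 58_388, "ld": 98 / 39_328, "fb": 98 / 39_328}

# Expected unearned runs following one error of each type. PLACEHOLDERS: the
# real weights differ by type; only "about one per error" is given.
AVG_UR_PER_ERROR = {"gb": 1.0, "ld": 1.0, "fb": 1.0}


@dataclass
class BattedBallProfile:
    balls_in_play: int
    gb_ratio: float
    ld_ratio: float
    fb_ratio: float
    innings: float


def expected_unearned_runs(p: BattedBallProfile) -> float:
    """Sum over batted-ball types: count * league error rate * UR per error."""
    counts = {
        "gb": p.gb_ratio * p.balls_in_play,
        "ld": p.ld_ratio * p.balls_in_play,
        "fb": p.fb_ratio * p.balls_in_play,
    }
    return sum(counts[t] * LG_ERROR_RATE[t] * AVG_UR_PER_ERROR[t] for t in counts)


def xur_per_180(p: BattedBallProfile) -> float:
    """Assumed scaling step: linear rescale to 180 innings."""
    return expected_unearned_runs(p) * 180 / p.innings


# Two hypothetical pitchers who differ only in batted-ball mix.
groundballer = BattedBallProfile(600, gb_ratio=0.55, ld_ratio=0.20, fb_ratio=0.25, innings=200)
flyballer = BattedBallProfile(600, gb_ratio=0.35, ld_ratio=0.20, fb_ratio=0.45, innings=200)

print(f"groundballer xUR/180: {xur_per_180(groundballer):.2f}")
print(f"flyballer    xUR/180: {xur_per_180(flyballer):.2f}")
```

With everything else held equal, the ground-ball pitcher's expected unearned runs come out higher, which is the whole point of the article.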
Can guys have lower or higher ER than they should because the defense did not get to balls that ‘should’ have been errors? Maybe look at types of balls in play, number of errors made on each type, number of expected errors, number of expected men getting to each base, on balls in play by type? Or something like that.
So really you’re saying we should test against RA9? This is already known, but the common theme is to scale to ERA, so testing against ERA is simple and easy. The differences are fairly small in testing. It’s not going to change which estimator wins in a sample if you use RA9 or ERA. If you think every stat should already be scaled to RA9, then I agree with you, but for the sake of conformity and familiarity, ERA scale is used.
I’m not saying that this will change which estimator wins, but that all estimators are losing a little bit because they’re all scaled to ERA. Weights should be modified to conform to RA9 instead of ERA, since ERA has a bias. The “common theme” needs to be changed.
All estimators are a little bit wrong in the same way, so when comparing two estimators it’s not going to change who wins because their errors are the same (unintended pun).
I’ve been screaming this at my computer for a year or two. I’m glad you put into print what I couldn’t. Great job making a huge point with simple ideas!
I’m glad I could scream softly into the keyboard instead 🙂
I think you are underestimating the value of ground balls. Ground balls are much more likely to result in a double play than a fly ball. It seems pitchers should at least get partial credit for inducing double plays.
Additionally, I suspect that errors on fly balls to the outfield have more severe consequences than ground ball errors.
Your first point is true, but metrics like SIERA already account for that. Ground balls are still the best batted ball type (other than IFFB), but since they lead to a lot of unearned runs they aren’t quite as good as many of our metrics say they are.
Your second point is also true, but it’s already included in the xUR metric as specified by bullet point #4 above.
This was a great idea and a great article. I look forward to seeing this tool used to assess how particular pitchers are misvalued. Already it’s helping to show how a pitcher can win the Cy Young Award with an infield of mostly DHs.
Unless I am mistaken in thinking that you are talking about Scherzer, according to Brandon’s numbers Scherzer allowed fewer unearned runs (UR/180 3.36) than would be expected with an average defense (xUR/180 4.15). The numbers do not support your claim that it was unearned runs by DH-like infielders which earned Scherzer the Cy Young Award.
Good work Brandon, I liked the article. Thanks for keeping it concise, it really made your point that much more powerful.
Run value of an error: 0.24
Run value of a HR: 1.39
HR/FB% ~ 9.5%
Percent of ground balls that turn into HR: 0.000001%
I’ll take grounders all day…
As would I. The argument I’m making isn’t that fly balls are better than ground balls–it’s that ground balls are slightly over-rated by our metrics because our metrics don’t account for unearned runs.
Also, the run value I have for reaching base on an error is 0.546 (the possible difference is that your run value uses all errors, not just reaching base on errors?). http://www.tangotiger.net/RE9902event.html
What’s the standard deviation between the xUR/180 and UR/180? I’m wondering if that could be used to “jiggle” cleaned data +/- to get a range of ERA that is more realistic to what pitchers do.
http://www.insidethebook.com/ee/index.php/site/comments/run_values_of_events/
Not sure about the difference, he’s also got an error listed @0.47 runs on the same page. I missed that the first time…
The correlation is a little less than .4 between the two, but that’s because only a single season’s data is being used. One season’s worth of innings isn’t enough to factor out the huge randomness of unearned runs. Over a larger sample of innings the two numbers will become more consistent.
As with anything, the +/- range will decrease as the sample goes up. I don’t have the numbers in front of me right now, though, so I can’t give exact values.
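If you want to check the correlation yourself, a sketch like the following would do it. The ten rows are just the top five and bottom five of the table in the article, chosen by me to keep the example short, so the result will not reproduce the figure computed on all 81 pitchers; the `pearson` helper is written out by hand and is not a library call.

```python
from math import sqrt

# (name, xUR/180, UR/180) copied from the table: top five and bottom five only.
ROWS = [
    ("Joe Saunders", 7.24, 9.84),
    ("Jeff Locke", 7.11, 4.33),
    ("Wily Peralta", 6.97, 17.7),
    ("Edwin Jackson", 6.88, 13.36),
    ("Edinson Volquez", 6.81, 6.35),
    ("Dan Haren", 4.78, 4.26),
    ("Yu Darvish", 4.53, 1.72),
    ("A.J. Griffin", 4.46, 5.4),
    ("Mike Minor", 4.46, 5.29),
    ("Max Scherzer", 4.15, 3.36),
]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


expected = [row[1] for row in ROWS]
actual = [row[2] for row in ROWS]
r = pearson(expected, actual)
print(f"correlation on this 10-pitcher subset: {r:.2f}")
```

Swapping in the full table, or several seasons of it, is the way to see how the relationship tightens as the sample grows.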
Yes totally agree
Damn you, Brandon. Stop making me rethink my current baseball philosophies. Well done.
One question: Isn’t it unfair to bundle FIP in with the rest of the advanced metrics? FIP blatantly ignores any batted ball in play, and in my observation it gets used in contexts it was never intended for. Metrics like xFIP and SIERA do take batted ball types into account, so they are the ones making the assumption you describe.
That’s mostly true, although strikeouts would still have to be up-tweaked in value a little bit since strikeouts lead to very few unearned runs.
Interesting article. I like a lot of where it goes with the data, but unfortunately the entire premise is based on faulty assumptions. If ERA is a bad stat (which it is), trying to improve it using errors (an even worse stat) makes little sense. How many misplayed fly balls are officially recorded as singles, doubles, or triples? That just affects the counting numbers. What about the weight of fly ball mistakes? An error in the outfield can be costlier than an infield error in terms of advancing bases and runners scoring. The gist of my long-winded point is that I would love to see the same(ish) numbers run using other fielding metrics.
“Errors that occur on fly balls tend to be more costly than errors on ground balls. This metric accounts for that gap.”
–quote from article appearing on this page.
As for your other point, other metrics already do a good job of regressing BABIP/other fielding factors. The point of this particular study was not to look at BABIP, but to isolate and then regress EOBIP, or errors on balls in play, based on batted ball data.
Totally agree.
I’d actually like to get rid of errors entirely. As was pointed out during this year’s AL MVP debate, some players (fast ones like Mike Trout) reach base on error far more often than other players (slow ones like Miguel Cabrera). But OBP gives no credit for reaching on an error. It should. The stat should simply be times on base divided by plate appearances. Shouldn’t be hard. I actually had never realized reached on errors weren’t included.