The Cascading Bias of ERA

December 26, 2013

There are so many problems with ERA that it’s unbelievable. I’m not going to sit here and tell you what’s wrong with ERA, though, because you’re probably smart. But there’s a problem with ERA, and it’s a problem that transcends ERA. It’s a problem that trickles down through FIP, xFIP, SIERA, TIPS, etc. etc. name your favorite stat, etc., and it’s something I don’t see talked about much.

All of our advanced pitcher metrics are trying to predict or estimate ERA. They’re trying to figure out what a pitcher’s ERA should be, and herein lies the problem: they could be exactly right about ERA and still be a little wrong about the pitcher, all due to one little assumption.

This assumption–that pitchers have no control over whether or not the fielders behind them make errors–seems easy to make. Like most assumptions, however, this one is subtly incorrect. Thankfully, the reason is pretty simple. Ground balls are pretty hard to field without making an error, and fly balls aren’t. And the difficulty gap is pretty huge.

How big? Well in 2013 there were precisely 58,388 ground balls, 1,344 of which resulted in errors. On the other hand a mere 98 out of 39,328 fly balls resulted in errors. That means that 2.3% of ground balls result in errors while a tiny 0.25% of fly balls do. It’s time to stop pretending that this gap doesn’t exist, because it does.
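For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes both rates from the counts quoted above. The constant and function names are made up for illustration.

```python
# Batted-ball and error counts for 2013, as quoted in the paragraph above.
GROUND_BALLS, GROUND_BALL_ERRORS = 58_388, 1_344
FLY_BALLS, FLY_BALL_ERRORS = 39_328, 98


def error_rate(errors: int, batted_balls: int) -> float:
    """Share of batted balls of one type that ended in an error."""
    return errors / batted_balls


gb_rate = error_rate(GROUND_BALL_ERRORS, GROUND_BALLS)
fb_rate = error_rate(FLY_BALL_ERRORS, FLY_BALLS)

print(f"Ground-ball error rate: {gb_rate:.2%}")
print(f"Fly-ball error rate:    {fb_rate:.2%}")
print(f"Ground balls are about {gb_rate / fb_rate:.1f} times as error-prone")
```

The two rates round to the 2.3% and 0.25% above, which makes a ground ball a bit more than nine times as likely to end in an error as a fly ball.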

So now that we know this, what does it mean? Well it means this: a ground-ball pitcher’s ERA will suggest he is better than he actually is, while a fly-ball pitcher’s ERA will undersell him. Pitchers who allow contact, additionally, are worse off because every time they allow contact they put pressure on their defense. They’re giving themselves a chance to stockpile unearned runs which nobody will count against them if they’re only looking at ERA derivatives. When it comes to winning baseball games, however, earned runs don’t matter. Runs matter.

I am going to call this the “pressure on the defense” effect, which will cause some pitchers to be more prone to unearned runs than other pitchers. How big is this effect? Well, not huge. The gap between the best pitcher and worst pitcher in the league is roughly three runs over the course of the season. But keep in mind that three runs is about a third of a win, and a third of a win is worth about $2 million. We’re not discussing mere minutiae here.
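As a rough check on that conversion, the sketch below takes the highest and lowest xUR/180 values from the 2013 table in this article and turns the gap into wins and dollars. The runs-per-win and dollars-per-win figures are assumptions: ten runs per win is a common rule of thumb, and $6 million per win is simply what "a third of a win is worth about $2 million" implies.

```python
# Highest and lowest xUR/180 values from the 2013 table in this article.
HIGHEST_XUR_180 = 7.24  # Joe Saunders
LOWEST_XUR_180 = 4.15   # Max Scherzer

# Assumptions, not figures stated in the article:
RUNS_PER_WIN = 10.0          # common rule of thumb
DOLLARS_PER_WIN = 6_000_000  # implied by "a third of a win is about $2 million"

gap_runs = HIGHEST_XUR_180 - LOWEST_XUR_180
gap_wins = gap_runs / RUNS_PER_WIN
gap_dollars = gap_wins * DOLLARS_PER_WIN

print(f"Gap: {gap_runs:.2f} runs, {gap_wins:.2f} wins, ${gap_dollars:,.0f}")
```

Under those assumptions the gap comes out a little over three runs and a little under $2 million, in line with the rough figures above.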

In order to better quantify this effect I have developed the xUR/180 metric, which estimates how many unearned runs should have taken place behind each pitcher with an average defense. Below is a table of all qualified starting pitchers from 2013, ranked according to this metric. I have also included how many unearned runs they actually allowed in 2013, scaled to 180 innings for comparative purposes.

#	Name	xUR/180	UR/180
1	Joe Saunders	7.24	9.84
2	Jeff Locke	7.11	4.33
3	Wily Peralta	6.97	17.7
4	Edwin Jackson	6.88	13.36
5	Edinson Volquez	6.81	6.35
6	Kyle Kendrick	6.77	8.9
7	Justin Masterson	6.66	0.93
8	Doug Fister	6.58	5.19
9	Wade Miley	6.57	7.12
10	Rick Porcello	6.51	2.03
11	Jerome Williams	6.47	7.45
12	Jorge de la Rosa	6.43	5.38
13	Yovani Gallardo	6.42	7.99
14	A.J. Burnett	6.35	8.48
15	Scott Feldman	6.32	8.94
16	Mike Leake	6.26	5.62
17	Andrew Cashner	6.25	8.23
18	Felix Doubront	6.22	6.66
19	Jhoulys Chacin	6.13	5.48
20	Kevin Correia	6.13	2.92
21	Jeremy Guthrie	6.13	3.41
22	Mark Buehrle	6.11	5.31
23	Andy Pettitte	6.05	7.78
24	Hyun-Jin Ryu	6.01	2.81
25	Jeff Samardzija	6.0	5.07
26	C.J. Wilson	5.93	11.03
27	CC Sabathia	5.9	8.53
28	Jon Lester	5.84	4.22
29	Ryan Dempster	5.8	10.52
30	Tim Lincecum	5.77	5.48
31	Hiroki Kuroda	5.72	4.48
32	Bud Norris	5.72	7.15
33	Jordan Zimmermann	5.69	3.38
34	Patrick Corbin	5.68	1.73
35	Dillon Gee	5.67	3.62
36	Ervin Santana	5.67	7.68
37	Kris Medlen	5.66	8.22
38	Bronson Arroyo	5.63	2.67
39	Stephen Strasburg	5.62	9.84
40	Mat Latos	5.62	6.85
41	Ubaldo Jimenez	5.61	7.9
42	Jarrod Parker	5.61	4.57
43	John Lackey	5.6	5.71
44	Gio Gonzalez	5.55	5.53
45	Lance Lynn	5.55	2.68
46	Eric Stults	5.5	7.09
47	Felix Hernandez	5.49	4.41
48	Zack Greinke	5.48	2.03
49	Hisashi Iwakuma	5.47	3.28
50	Jose Quintana	5.46	4.5
51	Ian Kennedy	5.46	8.95
52	Ricky Nolasco	5.45	7.23
53	R.A. Dickey	5.44	6.42
54	Jeremy Hellickson	5.4	3.1
55	Homer Bailey	5.38	3.44
56	Miguel Gonzalez	5.36	9.47
57	Madison Bumgarner	5.34	5.37
58	James Shields	5.32	1.58
59	Adam Wainwright	5.32	2.99
60	Bartolo Colon	5.32	3.79
61	Derek Holland	5.3	7.61
62	Kyle Lohse	5.26	3.63
63	Cole Hamels	5.18	4.91
64	Anibal Sanchez	5.18	3.96
65	David Price	5.18	8.7
66	Chris Sale	5.14	6.73
67	Justin Verlander	5.06	8.25
68	Chris Tillman	5.04	1.75
69	Jose Fernandez	5.03	5.23
70	Shelby Miller	4.98	6.24
71	Matt Cain	4.97	2.93
72	Clayton Kershaw	4.9	5.34
73	Julio Teheran	4.9	2.92
74	Matt Harvey	4.86	1.01
75	Cliff Lee	4.79	4.86
76	Travis Wood	4.78	3.6
77	Dan Haren	4.78	4.26
78	Yu Darvish	4.53	1.72
79	A.J. Griffin	4.46	5.4
80	Mike Minor	4.46	5.29
81	Max Scherzer	4.15	3.36

Some notes:

Ground balls are still good, they’re just not as good.
A combination of ground balls and contact leads to more unearned runs. The pitchers at the top of the board demonstrate this.
A combination of strikeouts and fly balls will tend to limit the impact of unearned runs, as demonstrated by the bottom of the board.
Errors that occur on fly balls tend to be more costly than errors on ground balls. This metric accounts for that gap, but the low likelihood of fly-ball errors makes this bullet point’s effect relatively negligible.
Line drives are similar to fly balls in terms of error rate, but line-drive errors tend to be less costly than fly-ball errors.

I’m sure there is more to be gleaned, but the point is this: we need to stop trying to predict ERA, because ERA is not a pure value stat. We should be trying to figure out how many runs a pitcher should/should have given up, because that’s what matters. Runs matter, and who cares if they’re unearned? They’re kind of the pitcher’s fault, anyways.

Brandon Reppert is a computer "scientist" who finds talking about himself in the third-person peculiar.

31 Comments

Baltar

12 years ago

An excellent idea well-supported and well-written.

Spencer

12 years ago

Very good article: concise, and a good attempt at putting a value on something that should be measured. Could we potentially see a new SIERA-type metric that predicts RA? I would guess that the FB/GB values would only need to be edited slightly to account for the difference, along with the numbers being scaled up to reflect RA instead of ER.

Brandon Reppert

12 years ago

Reply to Spencer

It’s possible I’ll turn this into a follow-up article, but the calculations here are actually pretty simple.
A quick and easy way to find the RA equivalent of any ERA estimator for a pitcher is to just add (xUR/180) / 20.

So Joe Saunders’s raFIP would be something like FIP + .35 and Max Scherzer’s would be FIP + .20. You could scale different metrics to include what you want, but this would get you close.
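In code, that adjustment might look like the sketch below. The divisor of 20 is just 180 innings expressed as nine-inning games; the function names and the 4.00 FIP in the usage line are placeholders, not real values.

```python
INNINGS_SCALE = 180   # xUR/180 is expressed per 180 innings
INNINGS_PER_GAME = 9  # ERA-style stats are expressed per nine innings


def ra_adjustment(xur_per_180: float) -> float:
    """Expected unearned runs per nine innings: (xUR/180) / 20."""
    return xur_per_180 / (INNINGS_SCALE / INNINGS_PER_GAME)


def ra_estimate(era_estimator: float, xur_per_180: float) -> float:
    """Shift any ERA-scale estimator (FIP, xFIP, SIERA, ...) onto an RA scale."""
    return era_estimator + ra_adjustment(xur_per_180)


# xUR/180 values from the 2013 table in the article.
print(f"Saunders adjustment: +{ra_adjustment(7.24):.2f}")
print(f"Scherzer adjustment: +{ra_adjustment(4.15):.2f}")

# Placeholder FIP of 4.00, purely to show the call.
print(f"raFIP for a 4.00 FIP and 7.24 xUR/180: {ra_estimate(4.00, 7.24):.2f}")
```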

Thanks for the kind remarks everyone!

Bryce

12 years ago

Reply to Spencer

The creators of SIERA already created SIRA:

http://www.fangraphs.com/blogs/new-siera-part-five-of-five-what-didnt-work/

Brandon Reppert

12 years ago

Reply to Bryce

Oh cool, I hadn’t seen that.

It seems that the decision was to opt for familiarity over precision. Which, well, doesn’t seem like something we should be doing.

The Foils

12 years ago

You know an article’s good when you’re frustrated that it’s ending. 🙂

jss

12 years ago

Can you tell us how you get xUR? I’m not sure what to make of these numbers. For example, Masterson should have given up about 5.5 more UnEarned runs than he did. Why? Did those runs disappear, or were they Earned Runs?

olethros

12 years ago

Reply to jss

The metric assumes a neutral defense. So deviation +/- the xUR means the defense behind that pitcher performed above or below average.

jss

12 years ago

Reply to olethros

“means the defense behind that pitcher performed above or below average.”

That’s better than average for Errors only, right? Could be high BaBip, low errors, no?

Brandon Reppert

12 years ago

Reply to jss

Yes this is only for errors.

Metrics like FIP already do a good job of regressing BABIP, but what they don’t do a good job of is accounting for the types of plays on which the defense is more likely to commit errors (hence this article). In regards to your comment further down, xUR basically does all of the things you request.

Brandon Reppert

12 years ago

Reply to jss

Masterson gave up fewer unearned runs than expected simply because the defense made plays behind him, even though he gave his defense a lot of the most challenging play type (ground balls). There’s a lot of year-to-year fluctuation in unearned runs allowed, so xUR looks to take out that fluctuation by using batted ball types and assuming league-average error rates.

The formula is a bit complicated, but it’s basically this (I know it isn’t going to be formatted well in the comments section):
(gb_ratio * balls_in_play) * lg_gb_error_rate * avg_UR/GB_Error +
(ld_ratio * balls_in_play) * lg_ld_error_rate * avg_UR/LD_Error +
(fb_ratio * balls_in_play) * lg_fb_error_rate * avg_UR/FB_Error =
xUR

Then some additional math is done to scale xUR to a 180 innings scale, and we’ve got our number. Note that avg_UR/XX_Error is a linear weight for how many runs each type of batted ball error costs on average. Overall each error tends to lead to about one unearned run, on average.
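Here is a minimal Python sketch of the formula above, not the exact code behind the table. The ground-ball and fly-ball error rates are the 2013 league rates from the article; the line-drive rate, all three run weights, the sample pitcher, and the simple proportional scaling to 180 innings are assumptions made only so the example runs.

```python
from dataclasses import dataclass


@dataclass
class BattedBallType:
    ratio: float             # share of the pitcher's balls in play of this type
    lg_error_rate: float     # league errors per ball in play of this type
    avg_ur_per_error: float  # average unearned runs that follow such an error


def expected_unearned_runs(balls_in_play: float, types: list[BattedBallType]) -> float:
    """Sum of (ratio * balls_in_play) * lg_error_rate * avg_UR/Error over all types."""
    return sum(
        t.ratio * balls_in_play * t.lg_error_rate * t.avg_ur_per_error
        for t in types
    )


def scale_to_180(xur: float, innings: float) -> float:
    """Assumed scaling: simple proportion of innings pitched to 180."""
    return xur * 180 / innings


# Hypothetical pitcher: 600 balls in play over 200 innings.
profile = [
    BattedBallType(ratio=0.50, lg_error_rate=1_344 / 58_388, avg_ur_per_error=1.0),  # GB, weight assumed
    BattedBallType(ratio=0.20, lg_error_rate=0.0025, avg_ur_per_error=1.0),          # LD, rate and weight assumed
    BattedBallType(ratio=0.30, lg_error_rate=98 / 39_328, avg_ur_per_error=1.2),     # FB, weight assumed
]

xur = expected_unearned_runs(600, profile)
print(f"xUR: {xur:.2f}, xUR/180: {scale_to_180(xur, 200):.2f}")
```

Swapping in real batted-ball ratios and properly derived run weights would be needed to reproduce the numbers in the table.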

Brandon Reppert

12 years ago

Reply to Brandon Reppert

Clarification: the avg_UR/XX_Error is not literally a linear weight in the sense that each error will cost a team X amount of runs. It is the amount of unearned runs that can be expected to occur after the error.

For this reason the avg_UR/XX_Error will be higher than normal ROE linear weights. For example, if an error is made that would have been the third out of the inning then every run after that would be unearned. This is different than traditional linear weights, where only the effect of the error play on overall run expectancy is captured. I hope that makes some sense.

jss

12 years ago

Reply to Brandon Reppert

Can guys have lower or higher ER than they should because the defense did not get to balls that ‘should’ have been errors? Maybe look at types of balls in play, number of errors made on each type, number of expected errors, number of expected men getting to each base, on balls in play by type? Or something like that.

Christopher Carruthers

12 years ago

So really you’re saying we should test against RA9? This is already known, but the common theme is to scale to ERA, so testing against ERA is simple and easy. The differences are fairly small in testing. It’s not going to change which estimator wins in a sample if you use RA9 or ERA. If you think every stat should already be scaled to RA9, then I agree with you, but for the sake of conformity and familiarity, ERA scale is used.

Brandon Reppert

12 years ago

Reply to Christopher Carruthers

I’m not saying that this will change which estimator wins, but that all estimators are losing a little bit because they’re all scaled to ERA. Weights should be modified to conform to RA9 instead of ERA, since ERA has a bias. The “common theme” needs to be changed.

All estimators are a little bit wrong in the same way, so when comparing two estimators it’s not going to change who wins because their errors are the same (unintended pun).

Dan Farnsworth

12 years ago

I’ve been screaming this at my computer for a year or two. I’m glad you put into print what I couldn’t. Great job making a huge point with simple ideas!

Brandon Reppert

12 years ago

Reply to Dan Farnsworth

I’m glad I could scream softly into the keyboard instead 🙂

Ralph

12 years ago

I think you are underestimating the value of ground balls. Ground balls are much more likely to result in a double play than a fly ball. It seems pitchers should at least get partial credit for inducing double plays.

Additionally, I suspect that errors on fly balls to the outfield have more severe consequences than ground ball errors.

Brandon Reppert

12 years ago

Reply to Ralph

Your first point is true, but metrics like SIERA already account for that. Ground balls are still the best batted ball type (other than IFFB), but since they lead to a lot of unearned runs they aren’t quite as good as many of our metrics say they are.

Your second point is also true, but it’s already included in the xUR metric as specified by bullet point #4 above.

Jon L.

12 years ago

This was a great idea and a great article. I look forward to seeing this tool used to assess how particular pitchers are misvalued. Already it’s helping to show how a pitcher can win the Cy Young Award with an infield of mostly DH’s.

MustBunique

12 years ago

Reply to Jon L.

Unless I am mistaken in thinking that you are talking about Scherzer, according to Brandon’s numbers Scherzer allowed fewer unearned runs (UR/180 3.36) than would be expected with an average defense (xUR/180 4.15). The numbers do not support your claim that it was unearned runs by DH-like infielders which earned Scherzer the Cy Young Award.

Good work Brandon, I liked the article. Thanks for keeping it concise, it really made your point that much more powerful.

randhyllcho

12 years ago

Run value of an error: 0.24
Run value of a HR: 1.39
HR/FB% ~ 9.5%
Percent of ground balls that turn into HR: 0.000001%
I’ll take grounders all day…

Brandon Reppert

12 years ago

Reply to randhyllcho

As would I. The argument I’m making isn’t that fly balls are better than ground balls–it’s that ground balls are slightly over-rated by our metrics because our metrics don’t account for unearned runs.

Also, the run value I have for reaching base on an error is 0.546 (the possible difference is that your run value uses all errors, not just reaching base on errors?). http://www.tangotiger.net/RE9902event.html

randhyllcho

12 years ago

Reply to Brandon Reppert

What’s the std dev between the xUR/180 and UR/180? I’m wondering if that could be used to “jiggle” cleaned data +/- to get a range of ERA that is more realistic to what pitchers do.

http://www.insidethebook.com/ee/index.php/site/comments/run_values_of_events/

Not sure about the difference, he’s also got an error listed @0.47 runs on the same page. I missed that the first time…

Brandon Reppert

12 years ago

Reply to randhyllcho

The correlation is a little less than .4 between the two, but that’s because only a single season’s data is being used. One season’s worth of innings isn’t enough to factor out the huge randomness of unearned runs. Over a larger sample of innings the two numbers will become more consistent.

As with anything, the +/- range will decrease as the sample goes up. I don’t have the numbers in front of me right now, though, so I can’t give exact values.

studstats_13

12 years ago

Yes totally agree

Charlie

12 years ago

Damn you, Brandon. Stop making me rethink my current baseball philosophies. Well done.

One question: Isn’t it unfair to bundle in FIP with the rest of the advanced metrics? FIP blatantly ignores any ball in play. In my observation, FIP gets used in contexts it was never intended for. Metrics like xFIP and SIERA do take batted-ball types into account, and those are the metrics making the assumption you describe.

Brandon Reppert

12 years ago

Reply to Charlie

That’s mostly true, although strikeouts would still have to be up-tweaked in value a little bit since strikeouts lead to very few unearned runs.

Asa

12 years ago

Interesting article. I like a lot of where it goes with the data, but unfortunately the entire premise is based on faulty assumptions. If ERA is a bad stat (which it is), trying to improve it using errors (an even worse stat) makes little sense. How many misplayed fly balls are officially recorded as singles, doubles, or triples? That just affects the counting numbers. What about the weight of fly-ball mistakes? An error in the outfield can be costlier than an infield error in terms of advancing bases and runners scoring. The gist of my long-winded point is that I would love to see the same(ish) numbers run using other fielding metrics.

Brandon Reppert

12 years ago

Reply to Asa

“Errors that occur on fly balls tend to be more costly than errors on ground balls. This metric accounts for that gap.”
–quote from article appearing on this page.

As for your other point, other metrics already do a good job of regressing BABIP/other fielding factors. The point of this particular study was not to look at BABIP, but to isolate and then regress EOBIP, or errors on balls in play, based on batted ball data.

cass

12 years ago

Totally agree.

I’d actually like to get rid of errors entirely. As was pointed out during this year’s AL MVP debate, some players (fast ones like Mike Trout) reach base on error far more often than other players (slow ones like Miguel Cabrera). But OBP gives no credit for reaching on an error. It should. The stat should simply be times on base divided by plate appearances. Shouldn’t be hard. I actually had never realized reached on errors weren’t included.
