Evaluating the Gap Between ERA and FIP

Fielding Independent Pitching (FIP) has displayed an ability to accurately measure a pitcher’s true skill. FanGraphs describes FIP succinctly as “a measurement of a pitcher’s performance that strips out the role of defense, luck, and sequencing, making it a more stable indicator of how a pitcher actually performed over a given period of time than a runs allowed based statistic that would be highly dependent on the quality of defense played behind him…”

This definition recognizes three factors that may differentiate the runs a pitcher is expected to surrender (FIP) versus the runs a pitcher actually surrenders.

  • Defense
  • Sequencing
  • Luck

FIP removes these factors by only measuring the events that are within control of the pitcher and therefore accurately reflect the skill of the pitcher. These events are strikeouts, walks, batters hit by pitch and home runs. All other events, which are balls put into play, may result in outs, bases, runs, or errors, but are outside the pitcher’s complete control.

The general measure of a pitcher’s over- or under-performance relative to his true skill is ERA-FIP. ERA measures the earned runs given up by a pitcher based on all the events that happen, as opposed to FIP’s estimate of runs based only on the limited events over which a pitcher has complete control. Therefore, the difference between ERA and FIP is attributed to the three factors noted above: defense, sequencing, and luck.
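To make the two quantities concrete, here is a minimal Python sketch of the standard public FIP formula alongside ERA. The function names and the sample stat line are made up for illustration, and the FIP constant (3.10 here) is an assumption: it is recalculated each season so that league FIP matches league ERA.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from the four pitcher-controlled events.

    The constant is an assumed, roughly typical value; the real one
    changes from season to season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def era(earned_runs, ip):
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / ip


def era_minus_fip(earned_runs, hr, bb, hbp, k, ip, constant=3.10):
    """Positive means results were worse than the FIP estimate."""
    return era(earned_runs, ip) - fip(hr, bb, hbp, k, ip, constant)


# Hypothetical pitcher season: 200 IP, 70 ER, 20 HR, 50 BB, 5 HBP, 200 K
gap = era_minus_fip(earned_runs=70, hr=20, bb=50, hbp=5, k=200, ip=200.0)
print(round(gap, 3))
```

A negative gap, as in this made-up line, means the pitcher allowed fewer earned runs than his strikeouts, walks, hit batters, and home runs alone would suggest.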

But how much of the difference between pitching results and pitching skill is attributable to defense, sequencing, and luck, respectively? And shouldn’t the opponent get some credit for widening the gap between ERA and FIP, either to the benefit or detriment of the pitcher?

I compared Ultimate Zone Rating (UZR), Defensive Runs Saved (DRS), and FanGraphs’ Defensive Runs Above Average (DEF) to ERA-FIP for each team season from 2005 through 2015 to try to understand the effect of defense on pitching results.

All the metrics have similar correlations, but DRS has the highest adjusted r-squared value (.39). Adjusted r-squared, strictly the coefficient of determination though I refer to it loosely here as a correlation coefficient, measures how much of the variance in ERA-FIP is explained by the defensive metric. FanGraphs’ DEF was right behind DRS (.37), and UZR had an adjusted r-squared of .34.
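For readers who want to see what sits behind a figure like .39, the sketch below computes the correlation coefficient r, r-squared, and adjusted r-squared for a single predictor. It is a generic illustration, not the exact calculation used here; the function names and the five toy data points are assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def adjusted_r_squared(r_squared, n, p):
    """Penalize r-squared for using p predictors on n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Toy data only: five made-up "team seasons"
defense_metric = [1, 2, 3, 4, 5]
era_fip_gap = [2, 1, 4, 3, 5]

r = pearson_r(defense_metric, era_fip_gap)
r2 = r ** 2
print(round(r, 2), round(r2, 2), round(adjusted_r_squared(r2, n=5, p=1), 2))
```

The distinction matters when reading the numbers: r-squared is the share of variance explained, while r is not. An r of .53, for instance, corresponds to an r-squared of only about .28.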

The result was somewhat surprising, because DRS and UZR do not factor in positional adjustments (UZR also does not measure catcher or pitcher defense). These metrics measure a player against the average player at that player’s position. They do not measure the difficulty of the position in comparison to other positions.

DEF does apply positional adjustments. FanGraphs applies those adjustments to UZR, not DRS, to determine DEF. (See notes below for further explanation of positional adjustments.)

Still, the non-positionally adjusted DRS correlates most closely with ERA-FIP, though its advantage over DEF appears negligible.

All in all, defense, considered alone, appears to explain 35–40% of a team’s ERA-FIP.

I chose to use a team’s Run Expectancy based on 24 base-out states (RE24) to measure the effects of sequencing. RE24 measures the change in run expectancy from the moment a batter comes to the plate to the end of the plate appearance, plus any runs that score on the play. The ups and downs of these changes reflect the sequence of events experienced by each team (see notes below for further explanation of RE24).
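The mechanics of RE24 can be sketched in a few lines of Python. The run-expectancy values below are illustrative placeholders, not an official table, and the names `RUN_EXP` and `re24_change` are hypothetical. Base-out states are written as (bases occupied, outs), and the values are from the batting team’s point of view, so the sign flips for the pitching team.

```python
# Illustrative run-expectancy values for a few of the 24 base-out states.
# These numbers are placeholders chosen for the example, not real data.
RUN_EXP = {
    ("---", 0): 0.48,
    ("1--", 0): 0.85,
    ("---", 1): 0.25,
    ("1--", 1): 0.50,
    ("---", 2): 0.10,
    ("1--", 2): 0.22,
}


def re24_change(start, end, runs_scored):
    """Run-expectancy change for one plate appearance.

    `end` is None when the plate appearance ends the inning, in which
    case the remaining run expectancy is zero.
    """
    re_start = RUN_EXP[start]
    re_end = 0.0 if end is None else RUN_EXP[end]
    return re_end - re_start + runs_scored


# One made-up half inning: single, double play, solo home run, strikeout
plays = [
    (("---", 0), ("1--", 0), 0),
    (("1--", 0), ("---", 2), 0),
    (("---", 2), ("---", 2), 1),
    (("---", 2), None, 0),
]

inning_re24 = sum(re24_change(s, e, r) for s, e, r in plays)
print(round(inning_re24, 2))
```

Summed over a complete half inning, the changes collapse to the runs actually scored minus the run expectancy at the start of the inning, which is worth keeping in mind when interpreting team-level RE24 totals.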

The relationship between ERA-FIP and RE24 has a correlation coefficient (.38) similar to those between ERA-FIP and the defensive metrics. Sequencing seems to play a role nearly equal to defense in determining the over- or under-performance of pitchers.

Defense and sequencing are not independent of each other, though. A single in the bottom of the 9th, for example, may well have happened because the shortstop and/or third baseman did not have enough range to get to the groundball hit between them. Therefore, I measured the correlation of ERA-FIP to defense and sequencing combined.

Again, DRS+RE24 (.54), DEF+RE24 (.53), and UZR+RE24 (.51) all yielded similar adjusted correlation coefficients.
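One way to see why combining two overlapping predictors yields less than the sum of their individual values is the textbook formula for r-squared with two predictors, which subtracts the part they share. The sketch below is a generic illustration with made-up numbers, not the regression actually run for this article; all names and data are assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared_two_predictors(y, x1, x2):
    """R-squared of y regressed on x1 and x2, from pairwise correlations.

    Undefined when x1 and x2 are perfectly correlated (division by zero).
    """
    r_y1 = pearson_r(x1, y)
    r_y2 = pearson_r(x2, y)
    r_12 = pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def adjusted_r_squared(r_squared, n, p):
    """Penalize r-squared for using p predictors on n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Made-up team seasons: ERA-FIP gap, a defensive metric, and RE24 allowed
gap = [0.30, -0.10, 0.05, -0.25, 0.15, -0.20]
drs = [-20, 10, -5, 25, -15, 12]
re24 = [15, -8, 2, -18, 5, -12]

r2 = r_squared_two_predictors(gap, drs, re24)
print(round(r2, 3), round(adjusted_r_squared(r2, n=len(gap), p=2), 3))
```

When the two predictors are correlated with each other, the shared portion is counted only once, which is consistent with a combined figure near .54 rather than the .77 that simply adding .39 and .38 would give.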

This suggests roughly 50% of the difference between ERA and FIP is explained by defense and sequencing. The other half of the difference is not the great unknown, but it’s (sort of) immeasurable.

Luck is part of the other half of the gap between ERA and FIP, but is luck really 50% of what separates a pitcher’s result from a pitcher’s skill?

The skill of the opponent in running the bases is probably a greater part of the other 50% than luck is. This was on display throughout the playoffs, whether it was Lorenzo Cain scoring from first on a single, Daniel Murphy taking third base from first on a walk, or any of the other examples of aggressive (and smart) baserunning. These events change run probabilities and create runs. Such baserunning events tend to be less noticed during the 162-game season, but they still happen.

Some of the ability of catchers and pitchers to prevent stolen bases is baked into the defensive metrics, but not much else is. FanGraphs’ Base Running (BsR) measures the baserunning abilities of players and teams from an offensive perspective, but to my knowledge there is no accumulated stat to measure opponents’ BsR. The data is out there. The same measures used to determine BsR would only have to be aggregated from the perspective of the pitching team.

A measure of Opponents’ BsR would likely account for a good amount of the unexplained gap between ERA and FIP. There would still be a lot of luck left in play, but probably not as much as there is thought to be now.

You can read more of my thoughts, opinions, and research on baseball at https://medium.com/simply-bases. Twitter: @simplybases.

23 Comments
evo34
8 years ago

FIP is a fair approximation for normal run environments. In extreme parks, however, it breaks down. The Rockies, for example, will usually have an ERA well above their FIP — and it’s not all due to defense. I.e., even though FIP accounts for the park-driven component of HR, BB and K rates in a given park, it does not necessarily account for the ways in which these events produce runs in that particular park.

scotman144
8 years ago

I ended up really liking this article and finding those correlations interesting enough to look into this more myself.

However I kept saying….but popups….to myself while reading this.

If you ran this again folding popups in with each pitcher’s K’s to get a “pFIP” figure I wonder how much “luck” would collapse out.

scotman144
8 years ago
Reply to  scotman144

Also a brief blurb on why popups which aren’t strictly fielding independent still deserve to get lumped in with K’s:

A lot of the consistent FIP out performers are frequent popup inducers like Chris Young, the Good Jered Weaver, and Huston Street. I believe the stat is something like 99.9% of infield popups in play get caught at the MLB level. I think batters reaching on a dropped 3rd strike is actually more common than a dropped infield fly or at least on par. There’s plenty out there about popups being a skill that’s pretty sticky year to year as well.

scotman144
8 years ago

Thanks for the reply. Perhaps rather than creating an alternate popup-inclusive FIP you could just run the correlation of popup% (IFFB / balls in play) with the ERA-FIP gap. The idea would be that, regardless of sustainability, popups are automatic outs that FIP does not capture.

Some articles about the sustainability of popups:

http://www.fangraphs.com/blogs/infield-flies-fip-and-war/

http://www.fangraphs.com/blogs/hitting-em-where-they-are/

http://www.fangraphs.com/community/babip-and-innings-pitched-plus-explaining-popups/

scotman144
8 years ago
Reply to  scotman144

Christopher, thanks so much for taking a look. That’s too bad about popups; however, I guess you’d likely need a whole staff of Chris Youngs to accumulate enough popups for them to show up in a team-level ERA-FIP calculation.

Paul Clarke
8 years ago

“The result was somewhat surprising, because DRS and UZR do not factor in positional adjustments”

This isn’t really surprising. The total positional adjustment for the fielders on each team is the same (and is equal to zero).

“I chose to use a team’s Run Expectancy based on 24 base-out states (RE24) to measure the effects of sequencing”

I don’t think this works. RE24 doesn’t isolate the effect of sequencing from overall offensive performance (e.g. wOBA). RE24 will correlate very strongly (*) with RA and so will obviously correlate very strongly with ERA. Teams with a high ERA will obviously be more likely to have a positive ERA-FIP gap, which is what your correlation is showing. Something like RE24-wRAA might work, though there are park adjustment issues there.

(*) Perfectly if you include all events, e.g. stolen bases or advancing on wild pitches.

Paul Clarke
8 years ago
Reply to  Paul Clarke

On further thought, RE24-wRAA doesn’t really work either because of the non-linear nature of run scoring. Maybe the best bet is to use a BaseRuns estimator and use RE24 – BaseRuns.

foxinsox
8 years ago

Great article! I hope you will post another article about what the makeup of the remaining 50% is 🙂

Tony Blengino does a lot of articles about contact quality suppression by certain pitchers; do you think this is part of the ‘unknown 50%’?

What exactly do we mean by luck? Does luck not apply to K, HR, HBP, and BB? Then of course we have the catcher framing problem…

foxinsox
8 years ago

Thanks for taking the time to respond! I’m not trying to play gotcha with your comments, just question some logic in case it may help with future research 😉

“You could argue HR have some element of luck, as a lot of pitchers have HR rates that vary from season to season; however, K, BB, and HBP are directly related to pitcher skills of control, ‘stuff’, and pitching know-how.”

It’s certainly not going out on a limb to say HR has a major element of luck (xFIP exists), and given xFIP’s imperfect year-to-year correlation, there is some (perhaps small) element of luck in BB/K/HBP too. I guess I was trying to say that using ‘luck’ to describe the remaining unknown isn’t much different than using ‘luck’ to describe the ERA-FIP gap in its entirety, and you’ve already disproved that.

“So, catcher framing could affect FIP itself, but is unlikely to affect ERA-FIP.”

Actually, if FIP is affected, then by definition ERA-FIP is also affected. The only way for ERA-FIP to *not* be affected is if catcher framing affects ERA and FIP in exactly the same magnitude.

dbminn
8 years ago

I like the idea of using RE24 as a partial sequencing surrogate.

One measurable parameter that might affect RE24 is a pitcher’s ability from the stretch. As a rough check, I reviewed REW during 2005-2015. Starters have an REW range of roughly -17 to -60. Relievers, who almost always pitch from the stretch, have an REW of 7 to 52. Pitching from the stretch isn’t the only variable affecting REW, but I would guess it has an effect.

dbminn
8 years ago

*A High ERA-FIP pitcher’s “relative” ability from the stretch.

Mr Punch
8 years ago

“Outside the pitcher’s complete control” is not exactly the same as “outside the pitcher’s control.” The pitcher plays defense, has a good deal to do with sequencing, and as for luck, that’s a slippery concept as used here. This is a useful piece, especially the part on defense. It does seem to me to assume that FIP is “right” as opposed to “a better approximation” – what we need, beyond the general decomposition of variance, is analysis of more specific cases/dynamics as you suggest re the Rockies.

Nathaniel Dawson
8 years ago

In addition to the excellent comments by Paul above, I would add that treating sequencing as unrelated to luck probably misses the boat. I think it’s fair to assume that a pitcher has some control over sequencing (as does the batting order); I also think it’s fair to assume that there’s still a lot out of his control. Perhaps I’m just reading you wrong, but you seem to be making the point that the effects of sequencing are outside of the realm of luck: “Luck is part of the other half of the gap between ERA and FIP, but is luck really 50% of what separates a pitcher’s result from a pitcher’s skill?” So, yeah, you can attempt to suss out how much effect sequencing might have had on the difference between FIP and R/9 or ERA, but that doesn’t mean there wasn’t quite a bit of luck involved in the sequencing itself, even when you’re spreading things out by looking at team-level results.

Nathaniel Dawson
8 years ago

Basically put, RE24 is not going to get you where you want to go. RE24 for a whole inning is going to equal the number of runs scored during that inning minus the number of runs a typical team would be expected to score during that inning. So for a whole game, RE24 plus the league average of runs scored is going to equal R/9, which we already know. RE24 compared to FIP is not going to tell you anything with regard to sequencing that R/9 compared to FIP wouldn’t tell you. It’s only a matter of scale, because RE24 measures the difference between what a team actually scores compared to what a typical team could expect to score, rather than just the total number of runs that team scored. If there’s a difference between R/9 and FIP, you’re going to find a difference between RE24 and FIP. That’s what Paul meant when he said “RE24 will correlate very strongly (*) with RA and so will obviously correlate very strongly with ERA.” Measuring RE24 compared to FIP doesn’t isolate the effects of sequencing any more than measuring R/9 (or ERA) to FIP would do.

Now, you’re using ERA rather than R/9, but any difference between those two is going to be attributable to the official scorer or the defense.

I suppose I could be more helpful if I could tell you a sure-fire way to measure the effects of sequencing independent of defense and luck, but I’m not smart enough to know how to do that, and I’m not sure anyone else has figured out how to do it, either.

Paul Clarke
8 years ago

On the positional adjustments: the point is that when you’re looking at how team defence relates to something it doesn’t matter whether or not you include the positional adjustments, because they’ll be the same for all teams – it’s not like some teams play two outfielders and an extra shortstop. That’s why you get similar results regardless of whether you use Def, which includes the adjustments, or UZR/DRS, which don’t.

McKay
8 years ago

Great article!

I notice you show correlation coefficients as, say, .53, then say that roughly 50% of variance is explained by predictors. I think you’re confusing r and r-squared. If your coefficient (r) is .53, then your predictors explain 28% of the variance. I’d love it if you can clarify here, since you may have been rightly reporting r-squared but calling it a coefficient. Seems a big difference in the takeaway message … Is half of the discrepancy unaccounted for or is it 72%?