Evaluating the Gap Between ERA and FIP

Fielding Independent Pitching (FIP) has displayed an ability to accurately measure a pitcher’s true skill. FanGraphs describes FIP succinctly as “a measurement of a pitcher’s performance that strips out the role of defense, luck, and sequencing, making it a more stable indicator of how a pitcher actually performed over a given period of time than a runs allowed based statistic that would be highly dependent on the quality of defense played behind him…”

This definition identifies three factors that can separate the runs a pitcher is expected to surrender (FIP) from the runs he actually surrenders.

  • Defense
  • Sequencing
  • Luck

FIP removes these factors by measuring only the events that are within the pitcher’s control and therefore reflect his skill: strikeouts, walks, hit batters, and home runs. All other events involve balls put into play, which may result in outs, bases, runs, or errors, but are outside the pitcher’s complete control.
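As a rough illustration of how those four events turn into a number, here is a minimal sketch of the commonly published form of the FIP formula. The function name and the sample stat line are made up for illustration, and the constant (which rescales FIP to the league ERA and changes every season) is only a placeholder value, not one taken from this article.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from the four pitcher-controlled events.

    ip is innings pitched as a true decimal (200 1/3 innings is 200.333,
    not the box-score shorthand 200.1).
    constant is a league- and season-specific scaling term; 3.10 is an
    illustrative placeholder only.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Made-up season line: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings.
example_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0)
print(f"FIP: {example_fip:.3f}")
```

Nothing about balls in play enters the calculation, which is why the gap between this number and ERA can be read as the combined footprint of everything else.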

The general measure of how far a pitcher’s results over- or under-shoot his true skill is ERA-FIP. ERA measures the earned runs a pitcher gives up based on all the events that happen, as opposed to FIP, which estimates runs from the limited set of events over which a pitcher has complete control. The gap between ERA and FIP is therefore attributed to the three factors noted above: defense, sequencing, and luck.

But how much of the difference between pitching results and pitching skill is attributable to defense, sequencing, and luck, respectively? And shouldn’t the opponent get some credit for widening the gap between ERA and FIP, either to the benefit or detriment of the pitcher?

I compared Ultimate Zone Rating (UZR), Defensive Runs Saved (DRS), and FanGraphs’ Defensive Runs Above Average (DEF) to ERA-FIP for each team season from 2005 through 2015 to try to understand the effect of defense on pitching results.

All the metrics correlate similarly, but DRS has the highest adjusted r-squared (coefficient of determination) value (.39), which measures how much of the variance in ERA-FIP is explained by the defensive metric. FanGraphs’ DEF was right behind DRS (.37), and UZR had an adjusted r-squared of .34.
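Because r and r-squared are easy to mix up, here is a small sketch of how the two relate and how the adjustment works. The numbers are invented team seasons, not the data used in this article, and the helper names are mine.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def adjusted_r_squared(r_squared, n, p):
    """Penalize r-squared for using p predictors on n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Made-up team seasons (NOT the article's data): DRS and ERA-FIP.
drs = [40, 25, 10, 0, -10, -25, -40, 15]
era_minus_fip = [-0.35, -0.10, -0.15, 0.05, 0.00, 0.30, 0.25, -0.20]

r = pearson_r(drs, era_minus_fip)
r_squared = r ** 2  # share of the variance in ERA-FIP explained by DRS
adj = adjusted_r_squared(r_squared, n=len(drs), p=1)
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}, adjusted r^2 = {adj:.2f}")
```

By these definitions, an r-squared of .39 corresponds to an r of roughly .62 in absolute value. With about 330 team seasons (30 teams over 11 seasons) and a single predictor, the adjustment barely moves the number.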

The result was somewhat surprising, because DRS and UZR do not factor in positional adjustments (UZR also does not measure catcher or pitcher defense). These metrics measure a player against the average player at that player’s position. They do not measure the difficulty of the position in comparison to other positions.

DEF does apply positional adjustments. FanGraphs applies them to UZR, not DRS, to determine DEF (see notes below for further explanation of positional adjustments).

Still, the non-positionally adjusted DRS correlates most closely to ERA-FIP. However, it does seem that the advantage over DEF is negligible.

All in all, defense, considered alone, appears to explain 35–40% of a team’s ERA-FIP.

I chose to use a team’s Run Expectancy based on 24 base-out states (RE24) to measure the effects of sequencing. RE24 measures the change in run expectancy between the time a batter comes to the plate and the run expectancy after the plate appearance. The up and down of these changes will reflect the sequence of events experienced by each team (see notes below for further explanation of RE24).
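To make the bookkeeping concrete, here is a minimal sketch of the per-plate-appearance calculation: run expectancy after the play, minus run expectancy before it, plus any runs that scored. The run expectancy values below are rounded placeholders for a handful of base-out states, not an official table, and the state encoding is my own.

```python
# Illustrative run expectancy by (bases occupied, outs).
# Placeholder values for a few of the 24 states, NOT an official table.
RUN_EXPECTANCY = {
    ("---", 0): 0.48,
    ("1--", 0): 0.86,
    ("---", 1): 0.25,
    ("1--", 1): 0.51,
    ("-2-", 1): 0.66,
    ("---", 2): 0.10,
}


def expectancy(state):
    """Run expectancy of a base-out state; three outs ends the inning."""
    _bases, outs = state
    return 0.0 if outs >= 3 else RUN_EXPECTANCY[state]


def re24_change(start, end, runs_scored):
    """RE24 credit for one plate appearance, from the batting team's view."""
    return expectancy(end) - expectancy(start) + runs_scored


# A made-up half inning: leadoff single, sacrifice bunt, RBI single.
plays = [
    (("---", 0), ("1--", 0), 0),
    (("1--", 0), ("-2-", 1), 0),
    (("-2-", 1), ("1--", 1), 1),
]
total = sum(re24_change(start, end, runs) for start, end, runs in plays)
print(f"RE24 for the sequence: {total:+.2f}")
```

The same three events in a different order would pass through different base-out states and add up to a different total, which is the sense in which RE24 is sensitive to sequencing.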

The relationship between ERA-FIP and RE24 has an adjusted r-squared (.38) similar to that between ERA-FIP and the defensive metrics. Sequencing seems to play a role nearly equal to defense in determining the over- or under-performance of pitchers.

Defense and sequencing are not mutually exclusive, though. A single in the bottom of the 9th, for example, may well have happened because the shortstop and/or third baseman did not have enough range to get to the groundball hit between them. Therefore, I also measured how ERA-FIP relates to defense and sequencing combined.

Again, DRS+RE24 (.54), DEF+RE24 (.53), and UZR+RE24 (.51) all yielded similar adjusted r-squared values.
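The combined figures come in below the sum of the individual ones, which is what you would expect when two predictors overlap. A small sketch using the standard two-predictor formula shows the effect. The correlations plugged in are hypothetical round numbers, not values measured for this article.

```python
def two_predictor_r_squared(r_y1, r_y2, r_12):
    """R-squared of a regression of y on two predictors.

    r_y1, r_y2: correlation of each predictor with y.
    r_12: correlation between the two predictors (must not be +/-1).
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12
    return numerator / (1 - r_12 ** 2)


# Hypothetical inputs: each predictor alone explains 36% of the variance.
independent = two_predictor_r_squared(0.6, 0.6, 0.0)   # no overlap
overlapping = two_predictor_r_squared(0.6, 0.6, 0.5)   # shared information
print(f"uncorrelated predictors: {independent:.2f}")
print(f"correlated predictors:   {overlapping:.2f}")
```

When the two predictors are uncorrelated, their individual shares simply add. Once they carry some of the same information, the combined share shrinks, so part of what defense explains and part of what sequencing explains is the same variance counted once.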

This suggests roughly 50% of the difference between ERA and FIP is explained by defense and sequencing. The other half of the difference is not the great unknown, but it’s (sort of) immeasurable.

Luck is part of the other half of the gap between ERA and FIP, but is luck really 50% of what separates a pitcher’s result from a pitcher’s skill?

The skill of the opponent in running the bases is probably a greater part of the other 50% than luck is. This was on display in the playoffs, whether it was Lorenzo Cain scoring from first on a single, Daniel Murphy taking third base from first base on a walk, or one of the other examples of aggressive (and smart) baserunning witnessed throughout the playoffs. These events change run probabilities and create runs. These baserunning events tend to be less noticed during the 162-game season, but they still happen.

Some of the ability for catchers and pitchers to prevent stolen bases is cooked into the defensive metrics, but not much else is. FanGraphs’ Base Running (BsR) measures the baserunning abilities of players and teams, from an offensive perspective, but to my knowledge there is no accumulated stat to measure opponents’ BsR. The data is out there. The same measures used to determine BsR would only have to be aggregated from the perspective of the pitching team.

A measure of Opponents’ BsR would likely cover a good amount of the uncorrelated variance between ERA and FIP. There would still be a lot of luck left in play, but probably not as much as there is thought to be now.
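The aggregation itself would be simple. As a sketch, assume each baserunning event has already been assigned a run value from the offense's point of view (the event records, team codes, and values below are invented, and this is not how FanGraphs stores BsR); the only change is which team the runs are summed under.

```python
from collections import defaultdict

# Hypothetical baserunning events: (batting team, pitching team, runs added).
# Positive values mean the runners gained runs for the offense.
events = [
    ("KCR", "TOR", 0.25),
    ("NYM", "LAD", 0.30),
    ("KCR", "NYM", 0.40),
    ("TOR", "KCR", -0.45),
]


def opponents_bsr(baserunning_events):
    """Sum baserunning runs by the team that was pitching when they occurred."""
    totals = defaultdict(float)
    for _batting_team, pitching_team, runs in baserunning_events:
        totals[pitching_team] += runs
    return dict(totals)


for team, runs in sorted(opponents_bsr(events).items()):
    print(f"{team}: {runs:+.2f} baserunning runs allowed")
```

A team total built this way could then be compared to ERA-FIP in the same manner as the defensive metrics and RE24.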





You can read more of my thoughts, opinions, and research on baseball at https://medium.com/simply-bases. Twitter: @simplybases.

evo34
Member

FIP is a fair approximation for normal run environments. In extreme parks, however, it breaks down. The Rockies, for example, will usually have an ERA well above their FIP — and it’s not all due to defense. I.e., even though FIP accounts for the park-driven component of HR, BB and K rates in a given park, it does not necessarily account for the ways in which these events produce runs in that particular park.

scotman144
Member

I ended up really liking this article and finding those correlations interesting enough to look into this more myself.

However I kept saying….but popups….to myself while reading this.

If you ran this again folding popups in with each pitcher’s K’s to get a “pFIP” figure I wonder how much “luck” would collapse out.

scotman144
Member

Also a brief blurb on why popups, which aren’t strictly fielding independent, still deserve to get lumped in with K’s: A lot of the consistent FIP outperformers are frequent popup inducers like Chris Young, the Good Jered Weaver, and Huston Street. I believe the stat is something like 99.9% of infield popups in play get caught at the MLB level. I think batters reaching on a dropped 3rd strike is actually more common than a dropped infield fly, or at least on par. There’s plenty out there about popups being a skill that’s pretty sticky year to year as…

Paul Clarke
Guest

“The result was somewhat surprising, because DRS and UZR do not factor in positional adjustments”

This isn’t really surprising. The total positional adjustment for the fielders on each team is the same (and is equal to zero).

“I chose to use a team’s Run Expectancy based on 24 base-out states (RE24) to measure the effects of sequencing”

I don’t think this works. RE24 doesn’t isolate the effect of sequencing from overall offensive performance (e.g. wOBA). RE24 will correlate very strongly (*) with RA and so will obviously correlate very strongly with ERA. Teams with a high ERA will obviously be…

Paul Clarke
Guest

On further thought, RE24-wRAA doesn’t really work either because of the non-linear nature of run scoring. Maybe the best bet is to use a BaseRuns estimator and use RE24 – BaseRuns.

foxinsox
Member

Great article! I hope you will post another article about what the makeup of the remaining 50% is 🙂

Tony Blengino does a lot of articles about contact quality suppression by certain pitchers; do you think this is part of the ‘unknown 50%’?

What exactly do we mean by luck? Does luck not apply to K, HR, HBP, and BB? Then of course we have the catcher framing problem…

dbminn
Guest

I like the idea of using RE24 as a partial sequencing surrogate.

One measurable parameter that might affect RE24 is a pitcher’s ability from the stretch. As a rough check, I reviewed REW during 2005-2015. Starters have an REW range of roughly -17 to -60. Relievers, who almost always pitch from the stretch, have an REW of 7 to 52. Pitching from the stretch isn’t the only variable affecting REW, but I would guess it has an effect.

dbminn
Guest

*A High ERA-FIP pitcher’s “relative” ability from the stretch.

Mr Punch
Guest

“Outside the pitcher’s complete control” is not exactly the same as “outside the pitcher’s control.” The pitcher plays defense, has a good deal to do with sequencing, and as for luck, that’s a slippery concept as used here. This is a useful piece, especially the part on defense. It does seem to me to assume that FIP is “right” as opposed to “a better approximation” – what we need, beyond the general decomposition of variance, is analysis of more specific cases/dynamics as you suggest re the Rockies.

Nathaniel Dawson
Guest

In addition to the excellent comments by Paul above, I would add that treating sequencing as unrelated to luck probably misses the boat. I think it’s fair to assume that a pitcher has some control over sequencing (as does the batting order), but I also think it’s fair to assume that there’s still a lot out of his control. Perhaps I’m just reading you wrong, but you seem to be making the point that the effects of sequencing are outside of the realm of luck: “Luck is part of the other half of the gap between ERA and FIP, but is…

McKay
Guest

Great article!

I notice you show correlation coefficients as, say, .53, then say that roughly 50% of variance is explained by predictors. I think you’re confusing r and r-squared. If your coefficient (r) is .53, then your predictors explain 28% of the variance. I’d love it if you can clarify here, since you may have been rightly reporting r-squared but calling it a coefficient. Seems a big difference in the takeaway message … Is half of the discrepancy unaccounted for or is it 72%?