Trying to Improve fWAR: Part 1

FanGraphs Wins Above Replacement is considered by many in the sabermetric community to be the holy grail of WAR. And, even though I’m writing a piece that is critical of fWAR, FanGraphs is still the first website I go to when I want to get a basic understanding of a specific player or team’s value. Don’t view this article as an attack on fWAR or FanGraphs, both of which I use frequently; instead, consider it constructive criticism.

fWAR, specifically for pitchers, is riddled with minor problems that together make the metric less valuable.  In Part 1 of the series, we’re going to look at a hotly debated issue regarding fWAR that has been brought up by other readers before: the fWAR park factors.

According to the FanGraphs glossary, a basic runs park factor is used when calculating fWAR. The reasoning is that, because FIP models ERA, applying runs park factors to FIP shouldn’t be a problem.

Unfortunately, this idea simply isn’t true. The inputs of FIP (HR/9, BB/9, and K/9) cover only about 30% of plate appearances. Some ballparks (Citi Field, for example) inflate HR/9 and FIP despite suppressing runs in general. If Pitcher fWAR is based on FIP, then FIP park factors, not runs park factors, must be used. Below is a table comparing runs and FIP park factors for different teams/ballparks, with the FIP park factor equaling ((13*HRPF)+(3*BBPF)-(2*SOPF))/(14) and all of the data coming from the FanGraphs park factors.

Season  Team          Basic PF  FIP PF  Difference (Basic - FIP)
2014    Reds          101       112     -11
2014    Brewers       103       111     -8
2014    White Sox     104       111     -7
2014    Yankees       103       110     -7
2014    Mets          95        102     -7
2014    Phillies      100       106     -6
2014    Dodgers       96        101     -5
2014    Orioles       102       107     -5
2014    Blue Jays     103       108     -5
2014    Astros        100       104     -4
2014    Indians       97        100     -3
2014    Padres        94        96      -2
2014    Mariners      97        97      0
2014    Rays          95        95      0
2014    Rangers       106       106     0
2014    Braves        99        99      0
2014    Diamondbacks  104       103     1
2014    Cubs          102       101     1
2014    Rockies       117       116     1
2014    Tigers        102       101     2
2014    Nationals     100       97      3
2014    Angels        95        92      3
2014    Athletics     97        93      4
2014    Cardinals     98        94      4
2014    Giants        93        88      5
2014    Royals        101       96      5
2014    Twins         101       95      6
2014    Pirates       97        89      8
2014    Red Sox       104       96      8
2014    Marlins       101       90      11
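
The weighting in the FIP park factor formula can be sketched in a few lines of Python. This is only an illustration: the function name and the component factors passed to it are hypothetical, not FanGraphs values, and all park factors are on the 100 = neutral scale used in the table.

```python
def weighted_fip_pf(hr_pf: float, bb_pf: float, so_pf: float) -> float:
    """Combine component park factors (100 = neutral) with the article's formula."""
    # The weights 13, 3 and -2 sum to 14, so a fully neutral park returns 100.
    return (13 * hr_pf + 3 * bb_pf - 2 * so_pf) / 14


# Hypothetical inputs for illustration only (not real park factors).
neutral_park = weighted_fip_pf(100, 100, 100)
homer_friendly_park = weighted_fip_pf(110, 100, 100)

print(neutral_park)
print(round(homer_friendly_park, 1))
```

Under this formula, a park that inflates home runs by 10% and is otherwise neutral comes out with a FIP park factor of roughly 109, which shows how heavily the home run component drives the result.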

In addition, the standard deviation of the difference between the Basic and FIP park factors was a staggering 5.5. Clearly, using runs park factors on FIP significantly helps some teams’ Pitcher fWAR and hurts others’.
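
The 5.5 figure can be checked against the table. The sketch below assumes it is the sample standard deviation of the Difference column, which is one reading of the sentence above; the variable names are just for illustration.

```python
import statistics

# Difference column (Basic minus FIP park factor) exactly as published in the table.
differences = [-11, -8, -7, -7, -7, -6, -5, -5, -5, -4, -3, -2, 0, 0, 0, 0,
               1, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 8, 8, 11]

# Sample standard deviation (n - 1 in the denominator); this choice is an assumption.
sample_sd = statistics.stdev(differences)
print(round(sample_sd, 1))  # 5.5
```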

The Marlins, Red Sox, Pirates, Twins, and Royals benefit from park factors that overestimate their ballparks’ FIP-inflating ability, which falsely increases their Pitcher fWAR. The Reds, Brewers, White Sox, Yankees, and Mets experience the opposite effect, which falsely decreases theirs.

Looking at the team pitching leaderboards, the effect of this mistake is pronounced on several teams’ fWAR.  For example, the Mets, despite ranking 9th in the National League in FIP while playing in a ballpark that inflates FIP by 2%, rank dead last in the National League in Pitcher fWAR.  Similarly, the Red Sox rank 5th in the AL in Pitcher fWAR despite ranking 10th in the AL in FIP and playing in a ballpark that suppresses FIP by 4%.

Using FIP park factors instead of runs park factors is a simple change that would vastly improve the accuracy of Pitcher fWAR.  In the next segment of “Trying to Improve fWAR”, I’ll examine the league adjustments (or lack thereof) in both Position Player and Pitcher fWAR.

Founder of NothingButNumbers.com

Matthew Tobin
Member

Good stuff. I’ve noticed this before but never really knew what to attribute it to. Really interesting that it might be park factors.

RMR
Guest

I wonder if this is why the Reds routinely have seasonal projections of RA that miss the mark on the high side? I had always chalked that up to an expected regression of team defensive performance that simply didn’t happen. But perhaps it’s also not giving their pitchers enough credit.

Wobatus
Guest

Great work, Noah.

Albert Einstein
Guest

why didn’t i think of this

jserline
Guest

The Tigers should be 1, unless it’s a rounding thing, right?

PPP
Guest

This is interesting stuff. I’m thinking now about what FIP and WAR are supposed to indicate. FIP is supposed to be a statistic that describes how a pitcher has done, and is by design park neutral. Translating that into WAR would then be where park factors are added in to decide the pitcher’s performance value. Yet one problem with FIP is that by being park neutral, it doesn’t account for the possibility that its inputs may be affected by the park. So if Fenway Park, for example, increases the total number of batters that will get on base relative to…

Senor_Met
Member

Oh my god. Thank you so much for this. I’ve been trying to figure out why fWAR hates Mets pitchers so much – I assumed it was something wonky in the park factors, but I didn’t know enough about it to research it. I KNEW Zack Wheeler and Jon Niese weren’t below average.

I still want to know why the Mets bullpen is a full 2 wins worse than the Astros (and in the negatives) when they have the same FIP-.

Psy Jung
Guest

Another couple big lacunae are handedness and batted ball profile for batters, I hope you look at those! I have a question, though: if there are effects like those above, let’s say you have a player that is worth 2 WAR in a neutral park, but he was acquired by a team whose stadium makes him a 3 WAR player because of his tendencies. Should we aim to fix WAR by adding in the unaccounted factors until players are suspended in the grey jelly of neutral true talent, or should WAR reflect the fact that some players are better tailored…

Erik
Guest

This is the reason why they use the runs park factors to begin with. WAR is meant to put a value on what actually happened, not on what would happen in a context neutral state. A player who hits 30 homers contributes 30 homers. When GMs are putting a value on two players of equal talent, who both would hit 25 homers in a context neutral stadium, the GM will pay more for and choose the player who can turn his true talent 25 homers into an actual 30 homers in his stadium due to extreme home park effects. The…

The grey jelly of neutral true talent
Guest

I heard you were talking about me?

Psy Jung
Guest

hey man don’t give me none of that neutralizing teleology

Tangotiger
Guest

I agree with you. This should be changed immediately.

In addition, while I use IP in the denominator, it would be more precise to use PA instead, namely PA/4.3.

Daniel
Member

Yeah, I hope this gets changed ASAP.

Lanidrac
Guest

FIP is highly flawed, anyway. Why not create a new pitching metric to replace FIP that includes GroundBall%, LineDrive%, and InfieldFly%? Then you can go ahead and use the runs park factor.

Sylvan
Guest

…or just use RA9.

Oscar
Guest

SIERA accounts for batted ball types.

Tangotiger
Guest

FIP has no flaw. It properly combines the statistics it is interested in.

You may be interested in Batted Ball FIP (bbFIP). That also properly combines the statistics it is interested in.

OBP has no flaw. It also properly combines the stats it’s interested in.

The flaw is the user who tries to use the metrics in unintended ways.

David Appelman
Admin
Member

We’ll definitely incorporate this change in some changes we make to fWAR this off-season. Though we may calculate the FIP park factor from scratch as opposed to using the already yearly regressed components. I’ll have to take a look and see what kind of difference it makes. We may also make the change Tangotiger suggests using PA. Just to make note, we definitely wanted to see what the community said about this approach as this issue has occasionally come up and I think this quote from the post more or less summed up our response: “According to the FanGraphs glossary,…

Erik
Guest

Am I missing something? WAR is designed to measure value to the team. It’s not meant to measure value to the average team, or theoretical context neutral value. We seem to have made a decision in the past to not have the stat be an estimation of true talent level, nor on the other side of things have we wanted the stat to incorporate sequencing or ‘clutch’ aspects of players’ performance. That said, the reason for using FIP is to factor out the defensive aspect of run prevention for the purpose of allotting that value to defenders. The reason we…

David Appelman
Admin
Member

I ended up running just FIP park factors (not the weighted average of the various park factors that could be involved in FIP), and you actually get much more compressed numbers. I think this is most likely due to FIP having some compression effects on the high / low end, and with HRs perhaps being weighted too heavily in the weighted average:

Team     PF    FIP_PF  Weighted FIP PF
Rockies  1.17  1.08    1.15
Yankees  1.03  1.04    1.10
Reds     1.01  1.03    1.12
Cubs     1.02  1.03    1.01
Wht Sox  1.04  1.02    1.11
Rangers  1.06  1.02    1.06
Orioles  1.02  1.02    1.07
Phill    1.00  1.02    1.06
Diamo    1.04  1.02    1.03
Brewers  1.03  1.02    1.10
BluJays  1.03  1.01    1.08
Astros   1.00  1.01    1.05
Mets     0.95  1.00    1.01
Angels   0.95  1.00    0.92
Tigers   1.02  1.00    1.00
Nats     1.00  0.99    0.97
Rays     0.95  0.99    0.95
Royals   1.01  0.99    0.95
Indians  0.97  0.99    1.00
Red Sox  1.04  0.99    0.96
A's      0.97  0.99    0.93
Twins    1.01  0.99    0.95
Braves   0.99  0.99    0.98
Padres   0.94  0.99    0.96
Dodgers  0.96  0.99    1.01
Pirates  0.90  0.98    0.89
Cards    0.98  0.98    0.94
Marine   0.97  0.98    0.98
Marlins  1.01  0.97    0.90
Giants   0.93  0.96    0.88

Tangotiger
Guest

Terrific.

Coors has a huge BIP park factor, which is why you might see a big difference between the runs factor and the FIP factor.

Tangotiger
Guest

This is wrong:

((13*HRPF)+(3*BBPF)-(2*SOPF))/(14)

You can’t weight the factors this way. You have to weight them by the coefficient times the frequency of those events.

Imagine for example that this is 1908 and there are almost no HR in the league. We obviously would want that to have a tiny impact overall.
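
One way to read this suggestion is to scale each event's league rate by its park factor, rebuild FIP, and compare it with the neutral FIP. The sketch below is only that reading, not FanGraphs' or Tangotiger's actual method. The function name, the per-inning rates, and the FIP constants are all hypothetical, and including the constant in the ratio is an assumption.

```python
def frequency_weighted_fip_pf(hr_pf, bb_pf, so_pf,
                              hr_rate, bb_rate, so_rate, fip_constant):
    """Ratio of park-adjusted FIP to neutral FIP.

    Park factors use 1.00 = neutral; rates are league events per inning.
    Each event's influence is its FIP coefficient times its frequency.
    """
    neutral_fip = 13 * hr_rate + 3 * bb_rate - 2 * so_rate + fip_constant
    park_fip = (13 * hr_rate * hr_pf + 3 * bb_rate * bb_pf
                - 2 * so_rate * so_pf + fip_constant)
    return park_fip / neutral_fip


# The same hypothetical park (HR +10%, otherwise neutral) in two made-up leagues.
modern = frequency_weighted_fip_pf(1.10, 1.00, 1.00,
                                   hr_rate=0.10, bb_rate=0.33, so_rate=0.85,
                                   fip_constant=3.10)
low_hr_era = frequency_weighted_fip_pf(1.10, 1.00, 1.00,
                                       hr_rate=0.01, bb_rate=0.30, so_rate=0.40,
                                       fip_constant=2.00)
print(round(modern, 3), round(low_hr_era, 3))
```

With these made-up inputs the park factor is about 1.035 in the modern league and under 1.01 in the low-HR league, which matches the 1908 intuition: when home runs are rare, a home run park factor should barely move FIP.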

Cliff Otto
Guest

As a matter of curiosity in regard to infield fly balls, how do you separate routine catches from !catches?

What do you do with fly balls that could be caught routinely by either an infielder or an outfielder? You sometimes see three players who could make the catch.

Is there a way to separate outfielders who play shallow or deep, possibly affecting the number of infielder chances?

Although it is possible that there won’t be a major bearing on the matter, I think factors like I mentioned above need to be considered.