Trying to Improve fWAR: Part 1

FanGraphs Wins Above Replacement is considered by many in the sabermetric community to be the holy grail of WAR.  And, even though I’m writing a piece that is critical of fWAR, FanGraphs is still the first website I go to when I want to get a basic understanding of a specific player or team’s value.  Don’t view this article as an attack on fWAR or FanGraphs, both of which I use frequently; instead, consider this article as constructive criticism.

fWAR, specifically for pitchers, is riddled with minor problems that together make the metric less valuable.  In Part 1 of the series, we’re going to look at a hotly debated issue regarding fWAR that has been brought up by other readers before: the fWAR park factors.

According to the FanGraphs glossary, a basic runs park factor is used when calculating fWAR.  Because FIP models ERA, using runs park factors for FIP shouldn’t be a problem.

Unfortunately, this idea simply isn’t true.  The inputs of FIP (HR/9, BB/9, and K/9) cover only about 30% of plate appearances.  Some ballparks (Citi Field, for example) inflate HR/9 and FIP despite suppressing runs in general.  If Pitcher fWAR is based on FIP, then FIP park factors, not runs park factors, must be used.  Below is a table comparing runs and FIP park factors for different teams/ballparks, with the FIP park factor equaling ((13*HRPF)+(3*BBPF)-(2*SOPF))/(14) and all of the data coming from the FanGraphs park factors.

Season  Team          Basic  FIP  Difference
2014    Reds          101    112  -11
2014    Brewers       103    111  -8
2014    White Sox     104    111  -7
2014    Yankees       103    110  -7
2014    Mets          95     102  -7
2014    Phillies      100    106  -6
2014    Dodgers       96     101  -5
2014    Orioles       102    107  -5
2014    Blue Jays     103    108  -5
2014    Astros        100    104  -4
2014    Indians       97     100  -3
2014    Padres        94     96   -2
2014    Mariners      97     97   0
2014    Rays          95     95   0
2014    Rangers       106    106  0
2014    Braves        99     99   0
2014    Diamondbacks  104    103  1
2014    Cubs          102    101  1
2014    Rockies       117    116  1
2014    Tigers        102    101  2
2014    Nationals     100    97   3
2014    Angels        95     92   3
2014    Athletics     97     93   4
2014    Cardinals     98     94   4
2014    Giants        93     88   5
2014    Royals        101    96   5
2014    Twins         101    95   6
2014    Pirates       97     89   8
2014    Red Sox       104    96   8
2014    Marlins       101    90   11
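
For readers who want to reproduce the FIP column, the formula given before the table can be written as a short Python function.  This is only a sketch: the function name is made up for illustration, and the example inputs are hypothetical component factors rather than values from the FanGraphs park factor tables.

```python
def naive_fip_park_factor(hr_pf, bb_pf, so_pf):
    """FIP park factor as defined in the article, on the 100 = neutral scale."""
    return (13 * hr_pf + 3 * bb_pf - 2 * so_pf) / 14


# Hypothetical park: inflates home runs by 10%, neutral for walks and strikeouts.
example = naive_fip_park_factor(hr_pf=110, bb_pf=100, so_pf=100)
print(round(example, 1))  # 109.3

# A park that is neutral in every component comes out neutral overall.
print(naive_fip_park_factor(100, 100, 100))  # 100.0
```

Because the home run factor carries a weight of 13 out of 14, the result tracks the HR park factor very closely.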

In addition, the standard deviation of the difference between the Basic and FIP park factors was a staggering 5.5 points.  Clearly, using runs park factors on FIP significantly benefits and hurts certain teams’ Pitcher fWAR.
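
Assuming the 5.5 figure refers to the sample standard deviation of the Difference column, it can be checked against the rounded values printed in the table.  The sketch below uses only those rounded differences, so it will not match a calculation done on unrounded park factors exactly.

```python
from statistics import mean, stdev

# Difference column from the table above, in table order (Reds ... Marlins).
differences = [
    -11, -8, -7, -7, -7, -6, -5, -5, -5, -4,
    -3, -2, 0, 0, 0, 0, 1, 1, 1, 2,
    3, 3, 4, 4, 5, 5, 6, 8, 8, 11,
]

spread = stdev(differences)   # sample standard deviation (n - 1 denominator)
center = mean(differences)    # average gap between Basic and FIP factors

print(f"mean difference: {center:.2f}")
print(f"standard deviation: {spread:.2f}")
```

On the rounded table values the spread comes out at roughly 5.45, in line with the 5.5 quoted in the text.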

The Marlins, Red Sox, Pirates, Twins, and Royals benefit from park factors that overestimate their ballparks’ FIP-inflating ability, which falsely increases their Pitcher fWAR.  The Reds, Brewers, White Sox, Yankees, and Mets experience the opposite effect, which falsely decreases theirs.

Looking at the team pitching leaderboards, the effect of this mistake is pronounced on several teams’ fWAR.  For example, the Mets, despite ranking 9th in the National League in FIP while playing in a ballpark that inflates FIP by 2%, rank dead last in the National League in Pitcher fWAR.  Similarly, the Red Sox rank 5th in the AL in Pitcher fWAR despite ranking 10th in the AL in FIP and playing in a ballpark that suppresses FIP by 4%.

Using FIP park factors instead of runs park factors is a simple change that would vastly improve the accuracy of Pitcher fWAR.  In the next segment of “Trying to Improve fWAR”, I’ll examine the league adjustments (or lack thereof) in both Position Player and Pitcher fWAR.





Founder of NothingButNumbers.com

43 Comments
Matthew Tobin
7 years ago

Good Stuff. I’ve noticed this before but never really knew what to attribute it to. Really interesting that it might be park factors.

RMR
7 years ago

I wonder if this is why the Reds routinely have seasonal projections of RA that miss the mark on the high side? I had always chalked that up to an expected regression of team defensive performance that simply didn’t happen. But perhaps it’s also not giving their pitchers enough credit.

Wobatus
7 years ago

Great work, Noah.

Albert Einstein
7 years ago

why didn’t i think of this

jserline
7 years ago

The Tigers should be 1, unless it’s a rounding thing, right?

scott
7 years ago
Reply to  Noah Baron

If 100.5 rounds to 101, then 1.5 must round down. This will avoid this issue. Evens round up, odds round down. Works just as well the other way too.

PPP
7 years ago

This is interesting stuff.

I’m thinking now about what FIP and WAR are supposed to indicate. FIP is supposed to be a statistic that describes how a pitcher has done, and is by design park neutral. Translating that into WAR would then be where park factors are added in to decide the pitcher’s performance value. Yet one problem with FIP is that by being park neutral, it doesn’t account for the possibility that its inputs may be affected by the park. So if Fenway park, for example, increases the total amount of batters that will get on base relative to a different one, the FIP will be different.

Looking at an easy example: Let’s pretend that (apparently superhuman) pitcher A carries a 50 K%, 0 BB%, 0 HR%, and 0 HBP%. All of his innings are pitched at these exact rates. He pitches 1 inning and gives up 1 hit, resulting in 2 K. His FIP would be -4 + Constant. Let’s say he pitches another inning and gives up 3 hits, resulting in 3 K. His FIP would be -6 + Constant. In both cases, his rate stats are the same yet something like his ballpark could end up affecting his FIP.
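
The arithmetic in this example can be checked with a few lines of Python. This is a sketch using the standard FIP formula with the league constant left out; the function name is made up for illustration.

```python
def fip_core(hr, bb, hbp, k, ip):
    """FIP without the league constant: (13*HR + 3*(BB + HBP) - 2*K) / IP."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


# Inning 1: four batters faced (3 outs + 1 hit), half of them struck out.
one_hit_inning = fip_core(hr=0, bb=0, hbp=0, k=2, ip=1)

# Inning 2: six batters faced (3 outs + 3 hits), half of them struck out.
three_hit_inning = fip_core(hr=0, bb=0, hbp=0, k=3, ip=1)

print(one_hit_inning, three_hit_inning)  # -4.0 -6.0
```

The strikeout rate per batter is identical in both innings, yet the per-inning result differs because more hits mean more batters faced in the same inning.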

Getting back to park factors translating from FIP to WAR, relying solely on the FIP factors might be less accurate than: adjusting the pitcher’s FIP to match his ballpark (i.e. adjust the K/BB/HR/HBP inputs to match his ballpark, such as a park increasing batters that reach base by 5% so reduce all of those inputs by 5.25%); and then translate FIP into WAR based on how you stated above.

PPP
7 years ago
Reply to  Noah Baron

What I meant by park neutral is probably not how it is generally. As you said, the inputs of FIP are affected by the park. I meant that the statistic itself does not account for these differences in producing a statistic that can be compared across parks without being affected by them. Poor wording on my behalf.

What I was trying to say in the second half is that the parks can actually affect BABIP, which affects the total amount of batters faced. Even if a pitcher maintains the exact same rate stats (K%/BB%/HR%/HBP%), their raw totals (K/BB/HR/HBP) will be affected by their BABIP, and, thus, by their park to some extent. Since FIP deals with raw totals instead of rate stats, and since different raw totals result in different FIPs despite equivalent rates (i.e. 100 K/50 BB/10 HR/10 HBP produces a different FIP than 50 K/25 BB/5 HR/5 HBP), reducing the raw totals by how much a park factor adjusts the batters that reach base might be more accurate.

It’s a much more minor quibble than the one you pointed out in this article, and it’s probably a bit pedantic of me to think so much about. In fact, it’s probably more of a critique of FIP than of your evaluation. Regardless, I think it’s something that can be improved upon.

Senor_Met
7 years ago

Oh my god. Thank you so much for this. I’ve been trying to figure out why fWAR hates Mets pitchers so much – I assumed it was something wonky in the park factors, but I didn’t know enough about it to research it. I KNEW Zack Wheeler and Jon Niese weren’t below average.

I still want to know why the Mets bullpen is a full 2 wins worse than the Astros (and in the negatives) when they have the same FIP-.

Thanks, Comcast
7 years ago
Reply to  Noah Baron

Does that mean FIP ballpark factors take popups into account as well? The amount of foul territory seems pretty significant in that regard.

Psy Jung
7 years ago

Another couple big lacunae are handedness and batted ball profile for batters, I hope you look at those!

I have a question, though: if there are effects like those above, let’s say you have a player that is worth 2 WAR in a neutral park, but he was acquired by a team whose stadium makes him a 3 WAR player because of his tendencies. Should we aim to fix WAR by adding in the unaccounted factors until players are suspended in the grey jelly of neutral true talent, or should WAR reflect the fact that some players are better tailored for some environments and teams know how to exploit this? If park factors aren’t just blanket phenomena that affect all players equally, should there be two WARs? one that’s totally context neutral and one that reflects the value that is created through the interaction of player and environment that is specific to that player?

for example Pablo Sandoval. I think there was an Eno Sarris piece recently about him being elite at fouling off the ball in two strike counts. If you play him in a park with a huge foul area, how much more does that affect him compared to the player with the lowest foul ball rate? here’s a hypothetical: he plays for a team with zero foul area, his teammate is the aforementioned player with the lowest foul ball rate, and Sandoval’s offense is better than his teammate’s by exactly the same amount that his foul balls never getting caught improve it, while having exactly the same defensive value. By the current neutralizing teleology of WAR you’d isolate Sandoval’s production from its context and say that he had the same value as his teammate, which seems problematic.

Does this make any sense? I mean it’s probably not a big deal anyway because if these things exist they’re probably marginal.

Erik
7 years ago
Reply to  Psy Jung

This is the reason why they use the runs park factors to begin with. WAR is meant to put a value on what actually happened, not on what would happen in a context neutral state.

A player who hits 30 homers contributes 30 homers. When GMs are putting a value on two players of equal talent, who both would hit 25 homers in a context neutral stadium, the GM will pay more for and choose the player who can turn his true talent 25 homers into an actual 30 homers in his stadium due to extreme home park effects.

The problems illustrated with FIP in this article mistake poor roster construction for problems with WAR. WAR does not underrate high fly ball pitchers playing in homer happy stadiums; it is baseball itself that is rough on these players.

The bottom line is that WAR is not meant to be predictive of anything. What it is trying to do is act as an accounting tool to simply describe what has happened. Changes to the WAR formula should be things that bring the final numbers closer to lining up with the final totals at season’s end.

The grey jelly of neutral true talent
7 years ago

I heard you were talking about me?

Psy Jung
7 years ago

hey man don’t give me none of that neutralizing teleology

Tangotiger
7 years ago

I agree with you. This should be changed immediately.

In addition, while I use IP in the denominator, it would be more precise to use PA instead, namely PA/4.3.
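
To illustrate what the PA/4.3 denominator changes, here is a small Python sketch. The function name and the two hypothetical innings are assumptions made for the example: a pitcher who strikes out exactly half of the batters he faces, once in a four-batter inning and once in a six-batter inning.

```python
def fip_core_per_pa(hr, bb, hbp, k, pa, pa_per_inning=4.3):
    """FIP without the league constant, using PA / 4.3 in place of IP."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / (pa / pa_per_inning)


# Hypothetical innings with identical per-batter rates (50% strikeouts).
four_batter_inning = fip_core_per_pa(hr=0, bb=0, hbp=0, k=2, pa=4)
six_batter_inning = fip_core_per_pa(hr=0, bb=0, hbp=0, k=3, pa=6)

print(round(four_batter_inning, 2), round(six_batter_inning, 2))  # -4.3 -4.3
```

With innings in the denominator these two innings would score -4 and -6. With PA/4.3 they score the same, because the result then depends only on the per-batter rates.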

Daniel
7 years ago
Reply to  Tangotiger

Yeah, I hope this gets changed ASAP.

Paul Kasiński (member)
7 years ago
Reply to  Noah Baron

JDOL

Lanidrac
7 years ago

FIP is highly flawed, anyway. Why not create a new pitching metric to replace FIP that includes GroundBall%, LineDrive%, and InfieldFly%? Then you can go ahead and use the runs park factor.

Sylvan
7 years ago
Reply to  Lanidrac

…or just use RA9.

Oscar
7 years ago
Reply to  Lanidrac

SIERA accounts for batted ball types.

Tangotiger
7 years ago
Reply to  Lanidrac

FIP has no flaw. It properly combines the statistics it is interested in.

You may be interested in Batted Ball FIP (bbFIP). That also properly combines the statistics it is interested in.

OBP has no flaw. It also properly combines the stats it’s interested in.

The flaw is the user who tries to use the metrics in unintended ways.

Tangotiger
7 years ago
Reply to  Noah Baron

FIP is making a conscious decision to combine statistics that do not involve fielders at all, or any involvement of the fielders is at a minimum. So, HR that just clear the wall, or could be caught, or a catcher saving a strike, etc. That’s why HR, HB, BB, and SO are included.

In some version, infield flies are also included.

But you CANNOT include any other batted balls. It goes against the idea of the “I” in FIP.

That Fangraphs has created a WAR version that uses FIP has NOTHING to do with FIP. FIP is FIP. It’s what it is.

If we want to reweight OBP so that walks count less and HR more, then we don’t rework OBP: we instead create a new metric called wOBA.

And that’s why I have Batted Ball FIP to address the issue of batted balls.

If someone wants to further make it more complicated by adding different adjustment factors based on the percentage of knuckleballs thrown, then call it FIPknuck.

None of that has any bearing on FIP itself.

David Appelman (member)
7 years ago

We’ll definitely incorporate this change in some changes we make to fWAR this off-season. Though we may calculate the FIP park factor from scratch as opposed to using the already yearly regressed components. I’ll have to take a look and see what kind of difference it makes.

We may also make the change Tangotiger suggests using PA.

Just to make note, we definitely wanted to see what the community said about this approach as this issue has occasionally come up and I think this quote from the post more or less summed up our response: “According to the FanGraphs glossary, a basic runs park factor is used when calculating fWAR. Because FIP models ERA, using runs park factors for FIP shouldn’t be a problem.”

Anyway, I’m glad this post was well received and thanks Noah as this will help improve our WAR formula. I’ve actually seen your second post in the hopper and we will publish that too. I am curious to hear feedback on that as well and will hold my own comments on it until later.

Erik
7 years ago
Reply to  David Appelman

Am I missing something?

WAR is designed to measure value to the team. It’s not meant to measure value to the average team, or theoretical context neutral value. We seem to have made a decision in the past to not have the stat be an estimation of true talent level, nor on the other side of things have we wanted the stat to incorporate sequencing or ‘clutch’ aspects of a player’s performance.

That said the reason for using FIP is to factor out the defensive aspect of run prevention for the purpose of allotting that value to defenders. The reason we use park factors in other stats is to measure out true talent performance. Its place in WAR is simply to provide context on the value of a run given the games played. Contributing 3 runs in Coors Field is not as valuable as contributing 3 runs in PetCo.

Tango’s suggestion of using PAs makes some sense, since the distinction gives credit for caught stealing and double plays to the defense, though pitchers do deserve some credit for that as well (so the change doesn’t seem like too big of a deal).

Moving to FIP based park factors will probably make WAR line up a bit more with true talent, but then we’ll end up talking about how WAR at the team level is overrating teams with a lot of fly ball pitchers in fly ball parks.

We should be crediting pitchers for all batters faced, giving full credit for strikeouts and walks (until the days we include pitch framing) as well as infield pop ups and hit by pitch. They should be losing credit for every home run (not fly ball), and the value of each home run should be weighted based on the BABIP park factor for the stadium (i.e. the likelihood of additional baserunners) and the pitcher’s strikeout rate. The run output value should then be scaled based on the runs park factor.

David Appelman (member)
7 years ago

I ended up running just FIP park factors (not the weighted average of the various park factors that could be involved in FIP), and you actually get much more compressed numbers. I think this is most likely due to FIP having some compression effects on the high / low end, and with HRs perhaps being weighted too heavily in the weighted average:

Team     PF    FIP_PF  Weighted FIP PF
Rockies  1.17  1.08    1.15
Yankees  1.03  1.04    1.10
Reds     1.01  1.03    1.12
Cubs     1.02  1.03    1.01
Wht Sox  1.04  1.02    1.11
Rangers  1.06  1.02    1.06
Orioles  1.02  1.02    1.07
Phill    1.00  1.02    1.06
Diamo    1.04  1.02    1.03
Brewers  1.03  1.02    1.10
BluJays  1.03  1.01    1.08
Astros   1.00  1.01    1.05
Mets     0.95  1.00    1.01
Angels   0.95  1.00    0.92
Tigers   1.02  1.00    1.00
Nats     1.00  0.99    0.97
Rays     0.95  0.99    0.95
Royals   1.01  0.99    0.95
Indians  0.97  0.99    1.00
Red Sox  1.04  0.99    0.96
A's      0.97  0.99    0.93
Twins    1.01  0.99    0.95
Braves   0.99  0.99    0.98
Padres   0.94  0.99    0.96
Dodgers  0.96  0.99    1.01
Pirates  0.90  0.98    0.89
Cards    0.98  0.98    0.94
Marine   0.97  0.98    0.98
Marlins  1.01  0.97    0.90
Giants   0.93  0.96    0.88

Tangotiger
7 years ago
Reply to  David Appelman

Terrific.

Coors has a huge BIP park factor, which is why you might see a big difference between the runs factor and the FIP factor.

Tangotiger
7 years ago

This is wrong:

((13*HRPF)+(3*BBPF)-(2*SOPF))/(14)

You can’t weight the factors this way. You have to weight them by the coefficient times the frequency of those events.

Imagine for example that this is 1908 and there are almost no HR in the league. We obviously would want that to have a tiny impact overall.
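
One possible reading of this suggestion, sketched in Python: scale each event's league frequency by its park factor, apply the FIP coefficients, and compare the in-park total to the neutral total. The function name, the per-inning rates, and the constant of 3.1 are illustrative assumptions and not actual league data. Keeping the same constant and the same walk and strikeout rates in both eras is a simplification to isolate the home run effect.

```python
def weighted_fip_park_factor(hr_pf, bb_pf, so_pf,
                             hr_rate, bb_rate, so_rate, fip_constant):
    """Ratio of in-park FIP to neutral FIP (1.00 = neutral).

    Park factors are on the 1.00 = neutral scale; rates are events per inning.
    Each component contributes in proportion to coefficient * frequency.
    """
    neutral = 13 * hr_rate + 3 * bb_rate - 2 * so_rate + fip_constant
    in_park = (13 * hr_rate * hr_pf + 3 * bb_rate * bb_pf
               - 2 * so_rate * so_pf + fip_constant)
    return in_park / neutral


# Hypothetical park that inflates home runs by 10% and nothing else.
park = dict(hr_pf=1.10, bb_pf=1.00, so_pf=1.00)

# Illustrative round-number rates per inning (assumed, not real league data).
modern = weighted_fip_park_factor(**park, hr_rate=0.10, bb_rate=0.33,
                                  so_rate=0.85, fip_constant=3.1)
dead_ball = weighted_fip_park_factor(**park, hr_rate=0.01, bb_rate=0.33,
                                     so_rate=0.85, fip_constant=3.1)

print(f"modern: {modern:.3f}, dead ball: {dead_ball:.3f}")
```

Under these assumptions the same 10% home run boost moves FIP far less when home runs are rare, which is the 1908 point made above.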

Tangotiger
7 years ago
Reply to  Noah Baron

I agree, it’s a great point, and that something so obvious (now that it’s pointed out) took this many years to be highlighted shows that we need more careful thinkers out there.

Cliff Otto
7 years ago

As a matter of curiosity in regard to infield fly balls, how do you separate routine catches from !catches?

What do you do with fly balls that could be caught routinely by either an infielder or an outfielder? You sometimes see three players who could make the catch.

Is there a way to separate outfielders who play shallow or deep, possibly affecting the number of infielder chances?

Although it is possible that there won’t be a major bearing on the matter, I think factors like I mentioned above need to be considered.