Trying to Improve fWAR Part 2: League and Divisional Factors

In Part 1 of the “Trying to Improve fWAR” series, we focused on how using runs park factors in a FIP-based WAR causes problems when calculating fWAR, and suggested using FIP park factors instead. Today we’ll analyze a different yet equally important problem with the current construction of FanGraphs Wins Above Replacement for both position players and pitchers: league adjustments. The reason we adjust for league when calculating WAR is simple: the two leagues aren’t equal. The American League has been the superior league for some time now, and since every team plays about 88% of its games within its own league, the relative strength of the leagues matters when putting a value on individual players. If a player moved from the American League, the stronger league, to the National League, the weaker league, we’d expect his basic numbers to improve; if we properly adjust for quality of league, however, his WAR shouldn’t change significantly.

The adjustments that FanGraphs makes for strength of league are unclear. The glossary entry “What is WAR?” and the links within it don’t seem to reference adjusting for the strength of a player’s league or division at all. The only league adjustment is within position player fWAR, where it is described as “a small correction to make it so that each league’s runs above average balances out to zero”. Not exactly a major adjustment. Rather than evaluating FanGraphs’ methods of adjusting for league, let’s instead look at how the two leagues compared in fWAR for both pitchers and position players in 2014:

League | Position Player fWAR | Pitcher fWAR | Total fWAR
AL | 285.7 | 242.3 | 528
NL | 284.3 | 187.7 | 472
AL fWAR / League Average | 1.002 | 1.127 | 1.056
NL fWAR / League Average | .998 | .873 | .944
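The “fWAR / League Average” rows are each league’s fWAR divided by the average of the two leagues. A minimal Python sketch of that arithmetic, using only the figures in the table above (the dictionary and function names are illustrative, not anything FanGraphs publishes):

```python
# 2014 fWAR totals from the table above, by category and league.
FWAR_2014 = {
    "Position Player": {"AL": 285.7, "NL": 284.3},
    "Pitcher": {"AL": 242.3, "NL": 187.7},
    "Total": {"AL": 528.0, "NL": 472.0},
}


def ratio_to_league_average(by_league: dict[str, float]) -> dict[str, float]:
    """Divide each league's fWAR by the two-league average."""
    average = sum(by_league.values()) / len(by_league)
    return {league: war / average for league, war in by_league.items()}


for category, by_league in FWAR_2014.items():
    ratios = ratio_to_league_average(by_league)
    print(category, {league: round(r, 3) for league, r in ratios.items()})
```

Because the ratios are taken against the two-league average, the AL and NL values in each column always average to exactly 1.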

Interestingly, AL pitchers seem to get a much greater boost than AL position players from playing in the superior league. Yes, the AL has a DH, but the effect of the DH should show up as a higher AL replacement level RA/9 than NL replacement level RA/9. Having a DH (and hence a higher run environment) does not mean the league should have more pitching fWAR. Somewhere in the calculation and implementation of fWAR, the WAR of AL pitchers is being inflated by around 13% relative to the league average, and the WAR of NL pitchers is being deflated by the same amount. Meanwhile, AL position players get essentially no benefit from playing in the superior league. To account for league strength, the entire American League should benefit from playing in the stronger league, not just its pitchers. To find out what the league adjustment should be (at least for the 2015 season), let’s look at each league’s interleague performance since 2013:

League | Wins | Losses | Interleague WP% | Regressed WP%
AL | 317 | 283 | 0.528 | 0.5255
NL | 283 | 317 | 0.472 | 0.4745

The “Regressed WP%” is simply the league’s interleague winning percentage regressed toward .500 by a factor of .1, meaning that 90% of the league’s deviation from .500 is assumed to reflect skill. We regress slightly to ensure that we aren’t overestimating the difference between the two leagues. Part of the reason for regressing is that the interleague system is admittedly not perfect: NL teams believe the AL has an inherent advantage because of its everyday DH, while AL teams complain that their pitchers can’t bunt and that the NL style of managing is strategically difficult for their managers. While both sides have valid points, interleague games probably don’t hurt one side significantly more than the other, so the large amount of data from interleague games is reliable as long as it is properly regressed.
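The regression step can be read as keeping 90% of a league’s distance from .500, which is the same as 0.9 × raw WP% + 0.1 × .500. A minimal sketch in Python, assuming that reading of “regressed to the mean by a factor of .1” (the function name `regress_to_mean` is hypothetical):

```python
def regress_to_mean(win_pct: float, regression: float = 0.1, mean: float = 0.500) -> float:
    """Shrink a winning percentage toward the mean.

    With regression=0.1, 90% of the distance from .500 is treated as skill.
    """
    return mean + (1 - regression) * (win_pct - mean)


# Interleague records, 2013-2014 (wins, losses), from the table above.
records = {"AL": (317, 283), "NL": (283, 317)}
for league, (wins, losses) in records.items():
    raw = wins / (wins + losses)
    print(f"{league}: raw {raw:.3f}, regressed {regress_to_mean(raw):.4f}")
```

Raising `regression` would shrink the gap between the leagues further; the article’s choice of .1 keeps most of the observed difference.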

Just knowing each league’s regressed interleague winning percentage, however, is not enough. We also need to know the percentage of games each league plays within its own league. Why? The more games a league plays against the other league, the less playing in a superior league matters; the only reason we adjust for strength of league in the first place is the disparity in competition between the leagues. In a 162-game season, a team plays exactly 20 games against interleague opponents, meaning that 142 of 162 games, or 87.7% of a team’s schedule, are intra-league. Therefore, each league’s multiplier is found with the following equation:

League Multiplier = 2 * ((.877 * Regressed WP%) + ((1-.877) * Opponent Regressed WP%))

In this calculation, the “Opponent Regressed WP%” is simply the opposing league’s Regressed WP%.  This is incorporated into the formula because each league plays 12.3% of its games (20 games) against the other league.  Without further ado, here are the league multipliers:

League | Regressed WP% | Percent of Games Intra-league | Interleague Opponent Regressed WP% | League Multiplier
AL | 0.5255 | 0.877 | 0.4745 | 1.0384
NL | 0.4745 | 0.877 | 0.5255 | 0.9616
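The league multiplier formula can be checked against the table with a short Python sketch. The names are hypothetical, and the intra-league share is taken as the exact 142/162 rather than the rounded .877: with .877 the results round to 1.0385 and 0.9615, while 142/162 appears to reproduce the table’s fourth decimal.

```python
# Share of a 162-game schedule played within the league (142 of 162 games).
INTRA_LEAGUE_SHARE = 142 / 162


def league_multiplier(own_wp: float, opponent_wp: float,
                      intra_share: float = INTRA_LEAGUE_SHARE) -> float:
    """2 * (intra_share * own Regressed WP% + (1 - intra_share) * opponent Regressed WP%)."""
    return 2 * (intra_share * own_wp + (1 - intra_share) * opponent_wp)


# Regressed interleague WP% from the table above.
regressed = {"AL": 0.5255, "NL": 0.4745}
for league, other in (("AL", "NL"), ("NL", "AL")):
    print(league, f"{league_multiplier(regressed[league], regressed[other]):.4f}")
```

Since the two regressed winning percentages sum to 1, the two league multipliers always sum to exactly 2, whatever share is used.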

As expected, the American League comes out as the stronger league, albeit by a smaller margin than its advantage in fWAR (remember, the AL’s total fWAR relative to the league average was 1.056). Still, there are other adjustments to make besides league. Just as the superiority of the American League is no secret, the fact that not all divisions are created equal is obvious to most baseball fans. The AL East has long been considered the best division in baseball, and its inter-division record backs up that reputation: a .530 inter-division winning percentage over the last two seasons (counting only games within its own league), the best in the American League. Using the same process we used for the league multipliers, division multipliers were calculated from 2013-2014 data as shown below:

Division | W | L | Inter-division WP% | Regressed WP% | Percent of Non-Interleague Games Intra-division | Inter-division Opponent Regressed WP% | Division Multiplier
AL East | 350 | 311 | 0.530 | 0.527 | 0.535 | 0.487 | 1.041
AL Central | 322 | 338 | 0.488 | 0.489 | 0.535 | 0.505 | 0.983
AL West | 319 | 342 | 0.483 | 0.484 | 0.535 | 0.508 | 0.976
NL East | 318 | 342 | 0.482 | 0.484 | 0.535 | 0.508 | 0.975
NL Central | 350 | 310 | 0.530 | 0.527 | 0.535 | 0.486 | 1.042
NL West | 322 | 338 | 0.488 | 0.489 | 0.535 | 0.505 | 0.983

One difference from the league multiplier calculation is that not all games were used to determine what percent of a division’s games were intra-division: because we already adjusted for league, the 20 interleague games on each team’s schedule were excluded. The .535 figure in column 6 is simply the number of games each team plays against its own division, 76, divided by the number of non-interleague games each team plays, 142. In addition, the “Inter-division Opponent Regressed WP%” is the average opponent each division faces in out-of-division, non-interleague games. The AL East, for example, plays the AL Central and AL West in its remaining intra-league games, so its .487 inter-division opponent regressed WP% is a simple average of the AL Central’s Regressed WP%, .489, and the AL West’s Regressed WP%, .484.
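The intermediate columns of the division table (Inter-division WP%, Regressed WP%, the .535 intra-division share, and the inter-division opponent average) can be reproduced from the raw records with a sketch like the one below. It assumes the same .1 regression as the league calculation, the names are hypothetical, and it stops short of the Division Multiplier column.

```python
# Inter-division records in same-league games, 2013-2014 (wins, losses).
RECORDS = {
    "AL East": (350, 311), "AL Central": (322, 338), "AL West": (319, 342),
    "NL East": (318, 342), "NL Central": (350, 310), "NL West": (322, 338),
}

# 76 intra-division games out of 142 non-interleague games.
INTRA_DIVISION_SHARE = 76 / 142


def regress_to_mean(win_pct: float, regression: float = 0.1, mean: float = 0.500) -> float:
    """Keep (1 - regression) of the distance from the mean."""
    return mean + (1 - regression) * (win_pct - mean)


raw_wp = {div: w / (w + l) for div, (w, l) in RECORDS.items()}
regressed = {div: regress_to_mean(wp) for div, wp in raw_wp.items()}


def opponent_regressed_wp(division: str) -> float:
    """Simple average of the Regressed WP% of the other two divisions in the same league."""
    league = division.split()[0]  # "AL East" -> "AL"
    others = [d for d in regressed if d.split()[0] == league and d != division]
    return sum(regressed[d] for d in others) / len(others)


print(f"Intra-division share: {INTRA_DIVISION_SHARE:.3f}")
for div in RECORDS:
    print(f"{div:<11} {raw_wp[div]:.3f} {regressed[div]:.3f} {opponent_regressed_wp(div):.3f}")
```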

Now that we have both divisional and league multipliers, we can derive each division’s total (observed) multiplier by simply multiplying the two:

Division | Division Multiplier | League Multiplier | Total Multiplier
AL East | 1.0408 | 1.0384 | 1.081
AL Central | 0.9833 | 1.0384 | 1.021
AL West | 0.9760 | 1.0384 | 1.013
NL East | 0.9749 | 0.9616 | 0.937
NL Central | 1.0419 | 0.9616 | 1.002
NL West | 0.9833 | 0.9616 | 0.945
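The total multiplier is just the product of the division and league multipliers. A minimal sketch using the four-decimal values from the table (the names are hypothetical):

```python
# Four-decimal multipliers from the table above.
DIVISION_MULTIPLIER = {
    "AL East": 1.0408, "AL Central": 0.9833, "AL West": 0.9760,
    "NL East": 0.9749, "NL Central": 1.0419, "NL West": 0.9833,
}
LEAGUE_MULTIPLIER = {"AL": 1.0384, "NL": 0.9616}


def total_multiplier(division: str) -> float:
    """Division multiplier times the multiplier of the division's league."""
    league = division.split()[0]  # "AL East" -> "AL"
    return DIVISION_MULTIPLIER[division] * LEAGUE_MULTIPLIER[league]


for division in DIVISION_MULTIPLIER:
    print(division, round(total_multiplier(division), 3))
```

Because the inputs are the rounded values shown in the table, the last digit can differ slightly from the Total Multiplier column; NL West, for instance, comes out at 0.946 rather than 0.945.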

How do these multipliers, which were fairly easy to calculate, compare with the multipliers implied by FanGraphs’ WAR calculations? Below, the multipliers are compared in bar graph form:

[Figure: bar graph comparing each division’s calculated multiplier with the multiplier implied by fWAR]

As you can see, the current construction of fWAR artificially helps certain divisions while hurting others.  Let’s get a closer look at the problem by graphing how much fWAR inflates each division’s pitchers and position players relative to the multipliers we just calculated:

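[Figure: how much fWAR inflates each division’s pitchers and position players relative to the calculated multipliers]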

The chart reveals a theme: Pitching WAR at FanGraphs is in need of serious repair. Pitching fWAR dramatically overvalues the American League. All three American League divisions have Pitching fWAR Multipliers at least 4.5% higher than they should be, while all three National League divisions have Pitching fWAR Multipliers at least 6% lower than they should be.

Is this just a random aberration for 2014? Probably not: in 2013, the American League’s Pitching fWAR Multiplier was 1.095, not much lower than 2014’s 1.127 (and nowhere near the 1.038 value we calculated). For whatever reason, Pitching fWAR overvalues American League pitchers and undervalues their National League counterparts. The strongest National League division, the NL Central, suffers the most from this calculation error, while the weaker American League divisions (the AL Central and AL West) benefit the most. Fans of the Reds and Brewers in particular can take solace in the fact that their teams were hurt the most not only by the errors discussed here but also by the park factor miscalculation discussed in Part 1 (hint: fWAR seriously undervalues Cueto).

As the chart shows, position player fWAR overvalues the National League, albeit to a lesser extent. Position player fWAR suffers from an almost entirely different problem than pitcher fWAR: unlike pitcher fWAR, which seems to over-adjust for league, position player fWAR doesn’t adjust for strength of league or division at all. This inflates the fWAR of players and teams in weaker divisions, such as the NL East and NL West, while deflating the fWAR of players in stronger divisions like the AL East.
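The charts work at the division level, but the same comparison can be roughly sketched at the league level using only numbers already in this article: divide fWAR’s implied ratio from the first table by the calculated league multiplier. Treating “inflation” as a ratio (rather than a difference) is an assumption of this sketch, and the names are hypothetical.

```python
# fWAR / league-average ratios for 2014 (first table) and calculated league multipliers.
FWAR_RATIO = {
    "Pitcher": {"AL": 1.127, "NL": 0.873},
    "Position Player": {"AL": 1.002, "NL": 0.998},
}
CALCULATED = {"AL": 1.0384, "NL": 0.9616}


def inflation(fwar_ratio: float, calculated: float) -> float:
    """Above 1: fWAR credits the league more than its calculated multiplier does."""
    return fwar_ratio / calculated


for group, by_league in FWAR_RATIO.items():
    for league, ratio in by_league.items():
        print(group, league, f"{inflation(ratio, CALCULATED[league]) - 1:+.1%}")
```

With these inputs, AL pitchers come out roughly 8.5% high and NL pitchers roughly 9.2% low, while AL position players are about 3.5% low and NL position players about 3.8% high, matching the direction of the division-level chart.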

While the issue with position player fWAR is more obvious (a lack of league and divisional factors), the problem with pitching fWAR is less clear. Perhaps part of the problem is how replacement level is calculated. I am not familiar enough with FanGraphs’ process for calculating WAR to know whether there is a clear, fixable mistake. Either way, I hope this article inspires changes in the way fWAR is calculated for both pitchers and position players, with the changes to position player fWAR being much simpler to incorporate.





Founder of NothingButNumbers.com

tz
Guest

Good stuff again Noah. We need to kick the tires on areas like the league adjustment in fWAR that might contain some systematic bias or material inaccuracy. Another approach to reflect the league/strength of schedule impact on WAR components might look something like this: – For each team, adjust the hitters’ wOBA by the weighted average of the wRC+ allowed by the pitching staffs that team had faced. – Also for each team, adjust the pitchers’ FIP by the weighted average of the bbFIP of the hitters on the teams they had faced (as a % of the MLB average…

Anon
Guest

Some nice food for thought. Hopefully, it makes some of the decision makers consider an update to WAR.

Even knowing the formula/structure of WAR has flaws, my biggest complaint is the repeated advice (from Dave Cameron among others) that WAR is not accurate to a tenth of a win. If WAR shouldn’t be used to that precision, Fangraphs shouldn’t display it. It wouldn’t be difficult to change the displayed precision to 0.5 or 1.0 (whatever is accurate enough for general use).

Jared Cross
Member

I don’t really agree. I don’t like “sig figs” as a way of trying to capture uncertainty. Also, I think people are too vague about what they mean by “accuracy” for the term to have any meaning. Ideally, we’d be very clear about exactly what uncertainty we’re talking about and use a +/- instead of limiting the number of decimal places.

Anon
Guest

I agree that showing a confidence interval (or error bars or whatever you want to call it) is a better solution than reducing the displayed precision. However, I highly doubt that Fangraphs would display this information. A ‘sig figs’ type conceptual approach has a much better chance of actually being publicly implemented.

AC_Butcha_AC
Member

Hey Noah, I appreciate your work here. I actually played around with FIP park factors myself a couple of years ago, so we are on the same page in that department 🙂 Some things I wanted to address: I think the league average RA/9 is ALWAYS the league average FIP scaled to RA/9 in any given year. I might be mistaken, but I think the scale is a quotient, not a difference. You take the quotient of all league earned runs and all league runs (ER/R), which should come out as something close to .92. This should be the…

Matthew Cornwell
Guest

On a related note: BBRef has been doing AL vs. NL adjustments from the onset. Here is the interesting part. Despite the NL actually beating the AL in interleague play from 1997-2004 to the tune of close to .510-.490, BBRef still gives the AL a .520-.480 or so advantage. I wonder why nobody has noticed this. Tying it back in, since BBRef and FG came to some common ground re: replacement level, this seems to be another area in which the two could line up pretty easily. At that point, the only major difference would be FIP vs. RA (adjusted…

Tangotiger
Guest

I didn’t follow all the “multipliers” that you were doing, but you have to do it relative to replacement level. If there was no league adjustment needed, we get to 1000 WAR by doing: (.500 – .294) x 162 x 30 = 1000. In 2013-2014, the AL has a .528 record. Given past history and the Astros moving in, that’s just about right. But that’s a .528 record against NL teams. If we treat the .528 as a true talent against NL teams, then it would be .514 true talent against .500 teams. That’s because a .514 team facing a .486…

AC_Butcha_AC
Member

I have an article coming up here which I wrote some days ago that will be about EXACTLY this, tango.

Tangotiger
Guest

Email me when it’s online. Looking forward to it!