Trying to Improve fWAR Part 2: League and Divisional Factors

In Part 1 of the “Trying to Improve fWAR” series, we focused on how using run-based park factors in a FIP-based WAR leads to problems when calculating fWAR, and suggested the use of FIP park factors instead. Today we’ll analyze a different yet equally important problem with the current construction of FanGraphs Wins Above Replacement for both position players and pitchers: league adjustments. When calculating WAR, the reason we adjust for league is simple: the two leagues aren’t equal. The American League has been the superior league for some time now, and since every team plays about 88% of its games within its own league, the relative strength of the leagues matters when trying to put a value on individual players. If a player moved from the American League, the stronger league, to the National League, the weaker league, we’d expect his basic numbers to improve; yet if we properly adjust for quality of league when calculating WAR, his WAR shouldn’t change significantly after the move.

The adjustments that FanGraphs makes for strength of league are unclear. The glossary entry “What is WAR?” and the links within it don’t seem to reference adjusting for the strength of a player’s league or division at all. The only league adjustment is within position player fWAR, and is described as “a small correction to make it so that each league’s runs above average balances out to zero”. Not exactly a major adjustment. Rather than evaluating FanGraphs’ methods of adjusting for league, let’s instead look at how the two leagues compared in fWAR for both pitchers and position players in 2014:

League                      Position Player fWAR   Pitcher fWAR   Total fWAR
AL                          285.7                  242.3          528
NL                          284.3                  187.7          472
AL fWAR / League Average    1.002                  1.127          1.056
NL fWAR / League Average    .998                   .873           .944
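
Each ratio row divides a league’s fWAR by the two-league average for that column. A minimal Python sketch of that arithmetic, using the 2014 totals above (the dictionary layout and function names are illustrative, not FanGraphs code):

```python
# 2014 fWAR totals from the table above (illustrative layout, not a FanGraphs API).
FWAR_2014 = {
    "AL": {"position": 285.7, "pitcher": 242.3},
    "NL": {"position": 284.3, "pitcher": 187.7},
}


def component(league: str, kind: str) -> float:
    """Return a league's fWAR for 'position', 'pitcher', or 'total'."""
    parts = FWAR_2014[league]
    return sum(parts.values()) if kind == "total" else parts[kind]


def ratio_to_league_average(league: str, kind: str) -> float:
    """League fWAR divided by the mean of the AL and NL values for the same column."""
    average = sum(component(lg, kind) for lg in FWAR_2014) / len(FWAR_2014)
    return component(league, kind) / average


for lg in FWAR_2014:
    row = [f"{ratio_to_league_average(lg, k):.3f}" for k in ("position", "pitcher", "total")]
    print(lg, *row)
```

With only two leagues, the AL and NL ratios in each column mirror each other around 1.000, which is why the 13% pitching gap shows up symmetrically.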

Interestingly, AL pitchers seem to get a much greater advantage than AL position players from playing in a superior league. Yes, the AL does have a DH, but the effect of having a DH should show up as a higher AL replacement-level RA/9 than NL replacement-level RA/9. Having a DH (and hence a higher run environment) does not mean that the league should have more pitching fWAR. Essentially, somewhere in the calculation and implementation of fWAR, the WAR of AL pitchers is being inflated by around 13% and the WAR of NL pitchers is being deflated by the same amount. Meanwhile, AL position players barely benefit at all from playing in a superior league (a ratio of just 1.002). To account for league strength, the entire American League should benefit from playing in the stronger league, not just the pitchers. To find out what the league adjustment should be (at least for the 2015 season), let’s look at each league’s interleague performance over 2013-2014:

League Wins Losses Interleague WP% Regressed WP%
AL 317 283 0.528 0.5255
NL 283 317 0.472 0.4745

The “Regressed Winning Percentage” is simply the league’s interleague winning percentage regressed toward .500 by a factor of .1, meaning that 90% of the league’s deviation from .500 is assumed to reflect skill. Each league’s interleague winning percentage is regressed slightly to ensure that we aren’t overestimating the differences between the two leagues. Part of the reason for regressing is that the interleague system is admittedly not perfect: while NL teams believe that the AL has an inherent advantage because of its everyday DH, AL teams complain about having pitchers who can’t bunt and about a managerial style that their managers find strategically difficult. While both sides have valid points, interleague games probably don’t hurt one side significantly more than the other, meaning that the large amount of data from interleague games is reliable as long as it is properly regressed.
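
Regressing by a factor of .1 keeps 90% of each league’s distance from .500. A small Python sketch of that step, using the 2013-2014 interleague record from the table above (`regress` is a hypothetical helper name):

```python
def regress(wp: float, factor: float = 0.1) -> float:
    """Shrink a winning percentage toward .500, keeping (1 - factor) of the gap."""
    return 0.5 + (1 - factor) * (wp - 0.5)


al_wins, nl_wins = 317, 283          # 2013-2014 interleague wins by league
games = al_wins + nl_wins            # 600 games, each with one AL and one NL team

al_wp = al_wins / games              # observed interleague WP% for the AL
nl_wp = nl_wins / games

print(f"AL {al_wp:.3f} -> {regress(al_wp):.4f}")
print(f"NL {nl_wp:.3f} -> {regress(nl_wp):.4f}")
```

Because every interleague game has one AL team and one NL team, the two regressed figures still sum to exactly 1.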

Just knowing each league’s regressed interleague winning percentage, however, is not enough.  We also need to know the percent of games each league plays within its own league.  Why?  The more games the league plays against the other league, the less playing in a superior league matters; the only reason we have to adjust for strength of league in the first place is because of the disparity in competition between the leagues. In a 162-game season, a team plays exactly 20 games against interleague opponents, meaning that 142 of 162 games, or 87.7% of a team’s schedule, is intra-league.  Therefore, in order to find each league’s multiplier, the following equation is used:

League Multiplier = 2 * ((.877 * Regressed WP%) + ((1-.877) * Opponent Regressed WP%))

In this calculation, the “Opponent Regressed WP%” is simply the opposing league’s Regressed WP%.  This is incorporated into the formula because each league plays 12.3% of its games (20 games) against the other league.  Without further ado, here are the league multipliers:

League   Regressed WP%   Percent of Games Intra-league   Interleague Opponent Regressed WP%   League Multiplier
AL       0.5255          0.877                           0.4745                               1.0384
NL       0.4745          0.877                           0.5255                               0.9616
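
The league formula is easy to check directly. This Python sketch assumes the exact schedule share 142/162 rather than the rounded .877; with exactly .877 the AL value rounds to 1.0385 instead of the tabulated 1.0384. The function and constant names are illustrative.

```python
INTRA_LEAGUE_SHARE = 142 / 162   # 142 of 162 games are intra-league (~.877)


def league_multiplier(own_wp: float, opp_wp: float,
                      intra_share: float = INTRA_LEAGUE_SHARE) -> float:
    """2 * (intra-league share * own regressed WP% + interleague share * opponent's)."""
    return 2 * (intra_share * own_wp + (1 - intra_share) * opp_wp)


AL_REGRESSED, NL_REGRESSED = 0.5255, 0.4745   # regressed interleague WP% by league

al_mult = league_multiplier(AL_REGRESSED, NL_REGRESSED)
nl_mult = league_multiplier(NL_REGRESSED, AL_REGRESSED)
print(f"AL {al_mult:.4f}  NL {nl_mult:.4f}")
```

With only two leagues, each opponent figure is one minus the league’s own, so the two multipliers always sum to 2. Feeding the AL’s unregressed .5283 into the NL row instead would give about 0.9623, so the 0.9616 NL multiplier corresponds to the AL’s regressed .5255.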

As expected, the American League comes out as the stronger league, albeit by a smaller margin than its advantage in fWAR (remember, the AL’s league multiplier in fWAR was 1.056).  Still, there are other adjustments that can be made besides adjusting for league. In the same way that the superiority of the American League is no secret, the fact that all divisions are not created equal is relatively obvious to most baseball fans.  The AL East has long been considered the best division in baseball, and their inter-division record backs up that reputation; they have a .530 inter-division winning percentage over the last two seasons (only including games in their own league), best in the American League.  Using the same process we used to calculate the league multipliers, division multipliers were calculated as shown below, with the data from the 2013-2014 seasons:

Division     W     L     Inter-division WP%   Regressed WP%   Percent of Non-Interleague Games Intra-division   Inter-division Opponent Regressed WP%   Division Multiplier
AL East      350   311   0.530                0.527           0.535                                             0.487                                   1.041
AL Central   322   338   0.488                0.489           0.535                                             0.505                                   0.983
AL West      319   342   0.483                0.484           0.535                                             0.508                                   0.976
NL East      318   342   0.482                0.484           0.535                                             0.508                                   0.975
NL Central   350   310   0.530                0.527           0.535                                             0.486                                   1.042
NL West      322   338   0.488                0.489           0.535                                             0.505                                   0.983

One difference between this calculation and the league multiplier calculation is that not all games were used when determining what percent of a division’s games were intra-division; because we already adjusted for league, the 20 interleague games on each team’s schedule were excluded. The .535 figure in column 6 is simply the number of games each team plays against its own division, 76, divided by the number of non-interleague games each team plays, 142. In addition, the “Inter-division Opponent Regressed WP%” is the average opponent each division faces while playing out of division in non-interleague games. The AL East, for example, plays the AL Central and AL West in its remaining intra-league games, so its .487 inter-division opponent regressed WP% is a simple average of the AL Central’s Regressed WP%, .489, and the AL West’s Regressed WP%, .484.
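
The division table can be reproduced with a short Python sketch. One observation, offered as a check rather than a correction: the published Division Multiplier column is matched to four decimals only when the out-of-division term uses one minus the inter-division opponent figure. Using the opponent figure directly, as in the league formula, gives about 1.016 for the AL East instead of 1.041. The hypothetical `invert_opponent` flag switches between the two readings; all names here are illustrative.

```python
# Out-of-division, intra-league records for 2013-2014, from the division table.
RECORDS = {
    "AL East": (350, 311),
    "AL Central": (322, 338),
    "AL West": (319, 342),
    "NL East": (318, 342),
    "NL Central": (350, 310),
    "NL West": (322, 338),
}
INTRA_DIVISION_SHARE = 76 / 142   # division games / non-interleague games (~.535)


def regress(wp: float, factor: float = 0.1) -> float:
    """Shrink a winning percentage toward .500, keeping (1 - factor) of the gap."""
    return 0.5 + (1 - factor) * (wp - 0.5)


def division_multipliers(invert_opponent: bool = True) -> dict:
    """Division multipliers; invert_opponent uses (1 - opp) instead of opp out of division."""
    regressed = {d: regress(w / (w + l)) for d, (w, l) in RECORDS.items()}
    share = INTRA_DIVISION_SHARE
    result = {}
    for div, own in regressed.items():
        league = div[:2]  # "AL" or "NL"
        rivals = [d for d in regressed if d[:2] == league and d != div]
        opp = sum(regressed[d] for d in rivals) / len(rivals)  # simple average of rivals
        opp_term = 1 - opp if invert_opponent else opp
        result[div] = 2 * (share * own + (1 - share) * opp_term)
    return result


for div, mult in division_multipliers().items():
    print(f"{div:<11} {mult:.4f}")
```

Both readings average to roughly 1.000 within each league; they differ in how far apart the strongest and weakest divisions end up.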

Now that we have both divisional and league multipliers, we can derive each division’s total (observed) multiplier by simply multiplying the two:

Division Division Multiplier League Multiplier Total Multiplier
AL East 1.0408 1.0384 1.081
AL Central 0.9833 1.0384 1.021
AL West 0.9760 1.0384 1.013
NL East 0.9749 0.9616 0.937
NL Central 1.0419 0.9616 1.002
NL West 0.9833 0.9616 0.945
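
Combining the two is a single multiplication per division. A minimal Python version using the four-decimal values tabulated above (dictionary names are illustrative):

```python
# Four-decimal multipliers from the division and league tables.
DIVISION_MULT = {
    "AL East": 1.0408, "AL Central": 0.9833, "AL West": 0.9760,
    "NL East": 0.9749, "NL Central": 1.0419, "NL West": 0.9833,
}
LEAGUE_MULT = {"AL": 1.0384, "NL": 0.9616}

# Total multiplier = division multiplier * league multiplier.
total = {div: m * LEAGUE_MULT[div[:2]] for div, m in DIVISION_MULT.items()}

for div, t in total.items():
    print(f"{div:<11} {t:.3f}")
```

Starting from the rounded four-decimal inputs, the NL West lands at 0.946 rather than 0.945; the unrounded multipliers give 0.945, so the gap is rounding in the inputs, not in the method.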

How do these multipliers, which were fairly easy to calculate, compare with the multipliers implied in FanGraphs’ WAR calculations?  Below, the multipliers are compared in bar graph form:

[Figure: L and D 1. Bar graph comparing the calculated multipliers with those implied by fWAR]

As you can see, the current construction of fWAR artificially helps certain divisions while hurting others. Let’s take a closer look at the problem by graphing how much fWAR inflates each division’s pitchers and position players relative to the multipliers we just calculated:

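[Figure: L and D 4. fWAR inflation of each division’s pitchers and position players relative to the calculated multipliers]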

Upon viewing the chart, a theme emerges: Pitching WAR at FanGraphs is in need of serious repair. Pitching fWAR dramatically overvalues the American League. All three American League divisions have Pitching fWAR Multipliers at least 4.5% higher than they should be, while the Pitching fWAR Multipliers for all three National League divisions are at least 6% lower than they should be.

Is this just a random aberration for 2014?  Probably not; in 2013, the American League’s Pitching fWAR Multiplier was 1.095, not much lower than 2014’s 1.127 (and nowhere near the 1.038 value we got).  For whatever reason, Pitching fWAR overvalues American League pitchers and undervalues their National League counterparts.  The strongest National League division, the NL Central, suffers the most from this calculation error, while the weaker American League divisions (the AL Central and AL West) experience the greatest benefit.  Fans of the Reds and Brewers in particular should take solace in the fact that their teams were hurt the most by not only the errors discussed here but also the park factor miscalculation discussed in Part 1 (hint: fWAR seriously undervalues Cueto).

As the chart shows, position player fWAR overvalues the National League, albeit to a lesser extent. Position player fWAR suffers from an almost entirely different problem than pitcher fWAR: unlike pitcher fWAR, which seems to over-adjust for league, position player fWAR doesn’t adjust for strength of league and division at all. This inflates the fWAR of players/teams in weaker divisions – the NL East and NL West, for example – while deflating the fWAR of players in stronger divisions, like the AL East.

While the issue with position player fWAR is more obvious – a lack of league and divisional factors – the problem with pitching fWAR is less clear. Perhaps part of the problem is how replacement level is calculated. I am not familiar enough with FanGraphs’ process of calculating WAR to know whether there is a clear, fixable mistake. Either way, hopefully this article will inspire change in the way that fWAR is calculated for both pitchers and position players, with the changes to position player fWAR being much simpler to incorporate.





Founder of NothingButNumbers.com

40 Comments
tz
9 years ago

Good stuff again Noah. We need to kick the tires on areas like the league adjustment in fWAR that might contain some systematic bias or material inaccuracy.

Another approach to reflect the league/strength of schedule impact on WAR components might look something like this:

– For each team, adjust the hitters’ wOBA by the weighted average of the wRC+ allowed by the pitching staffs that team had faced.

– Also for each team, adjust the pitchers’ FIP by the weighted average of the bbFIP of the hitters on the teams they had faced (as a % of the MLB average bbFIP)

– Use the schedule-adjusted Runs Above Average from the above steps to calculate Wins Above Average in the usual fashion. However, there is absolutely no reason to force the WAA to equal zero for each league individually.

– Determine the replacement level for WAR using the WAA for all players in both leagues in the usual fashion.

I’d be curious how this approach would compare to your approach here.

Anon
9 years ago

Some nice food for thought. Hopefully, it makes some of the decision makers consider an update to WAR.

Even knowing the formula/structure of WAR has flaws, my biggest complaint is the repeated advice (from Dave Cameron among others) that WAR is not accurate to 0.1 value. If WAR shouldn’t be used to that precision, Fangraphs shouldn’t display it. It wouldn’t be difficult to change the displayed precision to 0.5 or 1.0 (whatever is accurate enough for general use).

Jared Cross
9 years ago
Reply to  Anon

I don’t really agree. I don’t like “sig figs” as a way of trying to capture uncertainty, and I also think people are too vague about what they mean by “accuracy” for the term to have any meaning. Ideally, we’d be very clear about exactly what uncertainty we’re talking about and use a +/- instead of limiting the number of decimal places.

Anon
9 years ago
Reply to  Jared Cross

I agree that showing a confidence interval (or error bars or whatever you want to call it) is a better solution than reducing the displayed precision. However, I highly doubt that Fangraphs would display this information. A ‘sig figs’ type conceptual approach has a much better chance of actually being publicly implemented.

tz
9 years ago
Reply to  Noah Baron

Great catch. Just goes to show the importance of using a consistent basis for the components of a metric so you don’t introduce a distortion (just like OPS had the structural issue of different denominators for the OBA and SLG components).

AC_Butcha_AC
9 years ago

Hey Noah,

I appreciate your work here. I actually played around with the FIP-park factors myself a couple of years ago so we are on the same page in that department 🙂

Some things I wanted to address. I think the league average RA/9 is ALWAYS the league average FIP scaled to RA/9 in any given year. I might be mistaken but I think the scale is a quotient – not a difference. You take the quotient of all league earned runs and all league runs (ER/R), which should come out as something close to .92. This should be the scale IIRC and therefore the RA/9-FIP and RA/9 should always match.

For batting value there is already a league adjustment worked into the formula: wRAA uses the run environment of both leagues but compares the actual output to the respective league the player plays in. Additionally, pitcher batting is removed – thus both leagues’ batting runs should be a lot closer together than one would think when taking a first look at a league’s overall batting line.

AC_Butcha_AC
9 years ago
Reply to  Noah Baron

Well, the .92 is not set in stone. It is whatever the league’s (runs allowed/earned runs allowed) is that given year.

And I am not talking about the “league adjustment” which I think you refer to. I am talking about “batting runs” which you can find in the value section on a player page, which is basically a player’s wRAA adjusted for park and league.

Actually I am writing a community piece at this moment that deals with a solution to the league adjustment.

Lanidrac
9 years ago
Reply to  AC_Butcha_AC

It’s just another reason to use SIERA instead of FIP. Some guys like Kyle Lohse will always tend to outperform (or underperform) their FIP simply due to their batted ball profiles. In Lohse’s case, he doesn’t get very many strikeouts (even compared to his low walk rates), but he makes up for it by getting a lot of ground balls and weak contact.

Paul Kasiński
9 years ago
Reply to  AC_Butcha_AC

I agree with you that SIERA is better, and should be used in WAR, but Lohse has actually seemed to outpitch his SIERA by more than he’s outpitched his FIP in recent years.

Matthew Cornwell
9 years ago

On a related note: BBRef has been doing AL vs. NL adjustments from the onset. Here is the interesting part. Despite the NL actually beating the AL in interleague play from 1997-2004 to the tune of close to .510-.490, BBRef still gives the AL a .520-.480 or so advantage. I wonder why nobody has noticed this?

Tying it back in, since BBref and FG came to some common ground re: replacement level, this seems to be another area in which the two could line up pretty easily. At that point, the only major difference would be FIP vs. RA (adjusted for defense) for pitchers.

Matthew Cornwell
9 years ago
Reply to  Noah Baron

Yeah, UZR vs. DRS is another big difference.

Here are the BBRef adjustments…down at the bottom. It seems like NLers are being hurt from 1997-2004 a bit.

http://www.baseball-reference.com/about/war_explained_position.shtml

Paul Kasiński
9 years ago
Reply to  Noah Baron

YOU DIFFERENCE

Tangotiger
9 years ago

I didn’t follow all the “multipliers” that you were doing, but you have to do it relative to replacement level.

If there was no league adjustment needed, we get to 1000 WAR by doing:

(.500 – .294) x 162 x 30 = 1000

In 2013-2014, the AL has a .528 record. Given past history and the Astros moving in, that’s just about right. But that’s a .528 record against NL teams.

If we treat the .528 as a true talent against NL teams, then it would be .514 true talent against .500 teams. That’s because a .514 team facing a .486 team will have a .528 win%.

Therefore, AL would be:

(.514 – .294) x 162 x 15 = 535

And NL is 465.

Fangraphs numbers look about right.
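
Tango’s arithmetic can be traced in a short Python sketch. The step from a .528 head-to-head record to .514 talent against a .500 team assumes the log5 (odds-ratio) method, which is consistent with his statement that a .514 team facing a .486 team wins at a .528 clip; the .294 replacement win% and 15 teams per league come from the comment itself. Function names are illustrative.

```python
import math

REPLACEMENT_WP = 0.294   # replacement-level win% used in the comment
GAMES = 162


def log5(p_a: float, p_b: float) -> float:
    """Expected win% for a team of true talent p_a facing one of true talent p_b."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


def talent_vs_average(head_to_head_wp: float) -> float:
    """Talent p such that log5(p, 1 - p) equals the observed head-to-head win%."""
    root_odds = math.sqrt(head_to_head_wp / (1 - head_to_head_wp))
    return root_odds / (1 + root_odds)


def war_pool(talent_wp: float, teams: int) -> float:
    """Wins above replacement available to `teams` clubs of the given talent level."""
    return (talent_wp - REPLACEMENT_WP) * GAMES * teams


al_talent = talent_vs_average(0.528)   # roughly .514
al_war = war_pool(al_talent, 15)
nl_war = war_pool(1 - al_talent, 15)
print(f"AL talent {al_talent:.3f}, AL {al_war:.0f} WAR, NL {nl_war:.0f} WAR")
```

Computed directly, the NL pool comes out near 467 and the two leagues together total about 1,001 WAR; the comment’s 465 is 1000 minus 535.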

AC_Butcha_AC
9 years ago
Reply to  Tangotiger

I have an article coming up here which I wrote some days ago that will be about EXACTLY this, tango.

Tangotiger
9 years ago
Reply to  AC_Butcha_AC

Email me when it’s online. Looking forward to it!

Tangotiger
9 years ago
Reply to  Noah Baron

“If the AL is superior than the NL, both pitchers and hitters should benefit equally.”

That’s not true. You have to figure out what the split is. Check out MGL’s three-parter from 10 years ago on a few ways to do that:

http://www.hardballtimes.com/author/mgl59/

Tangotiger
9 years ago
Reply to  Tangotiger

I’m not suggesting Fangraphs is correct in how they do it. I’m simply making the obvious point that if in 2015 Kershaw, Lee, Hamels, Strasburg, Bumgarner, and Zimmermann get traded to the AL, then the shift in WAR will have been disproportionately by pitchers.

Given that we’re only talking about 35 WAR in shift in 2014 between AL and NL, it’s pretty easy to see that if you get say 3 star pitchers moving one way, and 2 star hitters moving the other way, you’ll get an imbalance.

Tangotiger
9 years ago
Reply to  Tangotiger

International league: again, you can’t assume that. It could very well be that you have some minor league in some year that has a lot more pitching talent than hitting talent, and there’s a shift year to year.

Tangotiger
9 years ago
Reply to  Tangotiger

“To me, it’s always meant performance above league average. … Which is why I don’t really think there ever is such a thing as a league having “more hitting talent” or “more pitching talent”. ”

Well, you are simply wrong. You have to accept that you are wrong, otherwise, we’re not going to have a discussion. If you are going to say you are right, and I say you are wrong, then let’s stop right here. Otherwise, continue reading…

…ah, good… thank god you admitted you are wrong. As a matter of CONVENIENCE, I set the WAR ratio of 4:3 for nonpitchers:pitchers every year. But that doesn’t mean I believe it’s static year to year. It can’t possibly be. Now, it can be close enough year to year that it’s not worth the trouble, so we make it a SYSTEM LIMITATION. That’s fine. But let’s just remember that we are trying to take shortcuts here to get to a desirable point.

AC_Butcha_AC
9 years ago
Reply to  Tangotiger

Noah, the 4:3 or 57/43 split isn’t really explained but mentioned in the WAR pages. Play around with the league leaderboards and see for yourself, that position players get 57% or 570 WAR and pitchers get 43% or 430 WAR.

The split is coming out of a seemingly tough to understand calculation (I say seemingly tough because I remember TONS of posts over at tango’s blog in which he explained himself to his readers who couldn’t really grasp the concept)

The long answer is to go over to his blog and read through tons of comments and posts about this.

The short answer is: The split is a result of different variances. Run scoring (offense) vs. run prevention (pitching + defense). Baseball is 50% offense and 50% defense. Position players take care of all the offense and part of the defense. The pitcher only takes care of some amount of the defense aspect. This can get long and mathy so I will stop here.

One thing to note: The 4:3 or 57/43 split can be observed in real world MLB organization’s spending on position players and pitchers. So it is really beautiful.

Tangotiger
9 years ago
Reply to  Tangotiger

“…add in replacement level runs…”

Well, that’s the key. You don’t add the same number per PA for pitchers and nonpitchers. It has to be preset somehow.

Hence, I preset it based on the idea that nonpitchers have a .380 win%, SP have a .380 win% and relievers have a .470 win%.

Tangotiger
9 years ago
Reply to  Noah Baron

I agree and that is how I handle it (furthermore, I handle it based on the number of games against each opponent, and so, takes care of the divisional issue as well).