Application of my fWAR League Adjustment Method
This article is a follow-up to my previous one, and in it I will work through some examples so you can build an intuition for the method. If the concept seems too complicated, I have to apologize for not explaining myself well, because I sincerely think it is very straightforward, involves no voodoo, and could help improve fWAR even further… which is mind-boggling if you think about it. It could improve projection systems as well as the correlation between WAR and actual wins, while also handling players who change from the AL to the NL (or vice versa) more elegantly.
I will simply follow my steps 1-4 from my previous article to figure out the proper league adjustment and continue with some WAR calculations. I will use the 2014 season as my guinea pig.
While playing around with it I also stumbled upon a wRC+ adjustment that has to be done because of a) the independence of both leagues and b) the differing league strengths. I will tackle this issue in my next article.
All right, here are steps 1) through 4).
1) I need to figure out the wOBA values, R/PA, FIP, R/W and cFIP for each league individually. These can normally be found here. I will not list every single wOBA value here because that wouldn’t add much to the explanation, and leaving them out saves me some time.
AL (2014):
wOBA: .312
R/PA: .110
FIP: 3.82
R/W: 9.25
cFIP: 3.16
NL (2014):
wOBA: .308
R/PA: .105
FIP: 3.66
R/W: 8.97
cFIP: 3.10
The values for all of MLB found on the Guts! page are, conveniently, exactly the arithmetic mean of my AL and NL values.
2) All right, we now move on to step 2, which is to figure out the interleague record. I suggested that a 3-year rolling regressed average could be a possibility, with years N-1, N and N+1 as inputs. Since I cannot see into the future, I will simply use the 2012-2014 interleague record based on pythagenpat. This comes out to a .539 W% for the AL. Conveniently, the actual W% is exactly the same. For demonstration purposes let’s just do a farmer’s regression and call that a “true talent” .530 W%.
3) This is the seemingly tricky part, but once you get your head around it, it is very easy to grasp. As a reminder, three “true” replacement levels are needed for all WAR calculations: .294 in general for teams (this is where the fixed 1,000 WAR each year comes from), .380 for starting pitchers and .470 for relievers.
Imagine an NL team that is a .500 team within the NL. This team plays a .500 AL team within the AL. That needs to be stressed. Those teams are NOT of equal strength, even if both have a .500 record. Why, you ask? Because if they were, we would not see an advantage for the AL in interleague play. We would see a balanced .500 interleague record. That is not our reality and we can confidently conclude that the NL is the weaker league as of today.
Following this line of thought, what happens if two replacement teams out of each league play each other? Well, this means a .294 NL team plays a .294 AL team. What would the outcome be? A .530 winning percentage in favor of the AL. This comes straight out of the interleague record.
How much better than a .294 W% would this NL team have to be in order to win exactly half of its games against this .294 AL team? This is where the odds ratio comes into play and it spits out a .320 winning percentage. That means if a .320 NL team faces a .294 AL team in an environment, in which the AL wins 53% of all interleague games, we would finally expect parity. A .500 interleague record. This .320 is our new “artificial” replacement level for the NL in 2014.
On the other hand we have to ask the question: How much worse than a .294 can an AL team be when facing a .294 NL team and still win half of its games? Odds ratio says a .270 AL team would still win 50% of all games against a .294 NL team in a context where the AL wins 53% of all interleague games. This .270 is our new “artificial” replacement level for the AL in 2014.
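The odds-ratio step can be sketched in a few lines of Python. This is only my reading of the calculation described above, not a published implementation: the function names are made up for illustration, and the inputs are the .294 baseline and the regressed .530 interleague W% from step 2.

```python
def odds(p):
    """Convert a winning percentage into odds (wins per loss)."""
    return p / (1.0 - p)


def from_odds(o):
    """Convert odds back into a winning percentage."""
    return o / (1.0 + o)


def interleague_win_prob(al_team, nl_team, al_edge):
    """Expected W% of an AL team against an NL team.

    al_team and nl_team are intraleague winning percentages; al_edge is the
    AL's interleague W% when teams with equal intraleague records meet.
    """
    return from_odds(odds(al_team) / odds(nl_team) * odds(al_edge))


def artificial_replacement_levels(base=0.294, al_edge=0.530):
    """Return (AL level, NL level) that reach parity against a `base` team."""
    al_level = from_odds(odds(base) / odds(al_edge))
    nl_level = from_odds(odds(base) * odds(al_edge))
    return al_level, nl_level


al_rep, nl_rep = artificial_replacement_levels()
print(f"AL: {al_rep:.3f}  NL: {nl_rep:.3f}")
```

With these inputs the sketch lands on roughly .270 for the AL and .320 for the NL, and `interleague_win_prob(0.294, nl_rep, 0.530)` comes back as .500, which is the parity condition described in the text.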
4) Remember that our “regressed” interleague record suggests the AL is the stronger league and thus deserves a larger share of the WAR pie. Now it is time to figure out how much more it deserves.
We figured out a .270 “artificial” replacement level for the AL. Therefore, we can distribute (.500-.270)*15*162 = 559 WAR towards the AL. This is split up 57/43 between position players and pitchers.
In the National League we found a .320 “artificial” replacement level. Therefore, we can distribute (.500-.320)*15*162 = 437 WAR towards the NL. Same 57/43 split.
Now 559+437 = 996, which is not equal to 1,000. I believe this is because the odds ratio is non-linear, and more so the closer it gets to the extremes, but I might be totally mistaken here. This usually is where Tangotiger appears out of the dark and helps out with fancy math or steps in when the math gets hurt. I don’t really see it as a problem.
We could either distribute the remaining 4 WAR 50/50 between both leagues or adjust the replacement levels slightly to arrive at exactly 1,000 WAR. Both would change individual WAR figures only on an atomic level.
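As a rough sketch of step 4, the snippet below recomputes the league totals from the rounded .270 and .320 levels and then applies the first option mentioned above, splitting the shortfall 50/50 between the leagues. The constant and variable names are my own, and the 57/43 split is simply applied to each league total as described in the text.

```python
TEAMS_PER_LEAGUE = 15
GAMES_PER_TEAM = 162
TOTAL_WAR = 1000.0
POSITION_PLAYER_SHARE = 0.57  # the 57/43 split from the article


def league_war(replacement_level):
    """WAR available to a league whose average team plays .500 ball."""
    return (0.500 - replacement_level) * TEAMS_PER_LEAGUE * GAMES_PER_TEAM


al_war = league_war(0.270)
nl_war = league_war(0.320)
raw_total = al_war + nl_war

# Option 1 from the text: hand each league half of the missing WAR.
shortfall = TOTAL_WAR - raw_total
al_final = al_war + shortfall / 2
nl_final = nl_war + shortfall / 2

for name, war in (("AL", al_final), ("NL", nl_final)):
    hitters = war * POSITION_PLAYER_SHARE
    pitchers = war - hitters
    print(f"{name}: {war:.0f} WAR ({hitters:.0f} position players, {pitchers:.0f} pitchers)")
```

The raw totals come out near 559 and 437, and after the 50/50 top-up the leagues sit at about 561 and 439, so the correction moves each league by less than 2 WAR.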
I want to point out that this kind of inconsistency is very common in the implementations of WAR. rWAR and fWAR both have some adjustment runs to match inconsistencies like that. This doesn’t even make a difference on a player level. It would not even change a team’s WAR figure by 1/10 I guess.
WAR calculations
After you have come this far you are probably interested in how much a certain player’s WAR figure might change. Again, I won’t list every necessary step, only the actual results. If you ask yourself how I have done it, you should take a look here, here and here. If that doesn’t help, just comment with your question and I will walk you through it.
My example will be Mike Trout. I will show the differences of some of the more important and interesting stats as (OLD/NEW). Forgive me for not being a formatting wizard.
NOTE: For the sake of a better comparison I will present the “new” run values with an exchange rate of 9.117 R/W (currently used). Otherwise 1 run wouldn’t have the same meaning, since in my WAR calculations 1 win equals 9.25 runs (see step 1). This makes it an apples-to-apples comparison.
Trout:
wOBA: (.403 / .402)
wRC+*: (167 / 170)
WAR**: (7.8 / 8.0)
batting: (52.1 / 54.0)
UBR: (3.0 / 3.0) unchanged
wSB: (1.8 / 1.7)
Fld: (-9.8 / -9.8) unchanged
Pos: (1.4 / 1.4) unchanged
Lg: (2.9 / 2.9)
Rep***: (19.9 / 19.9)
* I use a slightly different wRC+ calculation here. My league adjustment method would also improve the accuracy of wRC+ as a comparison tool between the two leagues. I will write another article dealing with the modified wRC+ calculation, as well as the wRAA and replacement runs modifications to improve the accuracy of fWAR.
** Fielding runs, UBR and positional adjustment were not changed. These three will never change. The league adjustment, however, will undoubtedly change, as will wSB, although the changes would be tiny. The calculation involves complete league stats, i.e. every single player’s stats.
*** The value of replacement runs will never be affected by my league adjustments, even though I use different replacement levels for my calculations. Replacement runs will always be based on the .294 baseline. I hope this makes sense to you. If not, I refer you to my upcoming article.
Outlook
In my next article I will lay out the modifications that have to be applied to wRAA, wRC+, batting runs and the replacement runs. I will show why my modifications make wRC+ more accurate in comparing both leagues and explain why this new league adjustment influences position player WAR more than pitcher WAR: right now, the fWAR process for pitchers already leans heavily, though not entirely, towards treating both leagues independently, which is a cornerstone of my league adjustments.
Also look forward to a table of the players with the biggest and the smallest increase in WAR and the corresponding losses. In both the AL and NL there are players who gain or lose more than others. My best educated guess so far is that this has to do with the different run environments. In the NL, the lower-scoring league, extra-base hits become slightly more valuable. So does base-stealing. The opposite holds for the AL. So look forward to my next piece, fellows!
A few thoughts:
1. Is it really necessary to use three years of data when calculating interleague play? Does the AL-NL interleague record in 2012 really have anything to do with the qualities of each league in 2014 and beyond? I think I would want to use the least amount of data possible where you still have a significant sample size – season N-1 and Season N.
2. I think your WAR calculations for each league are incorrect. While I’m somewhat unfamiliar with the particulars of your calculation, if you prorate to 1000 WAR you have the AL at 561 and the NL at 439. This is definitely too extreme. If we take the AL’s .530 winning percentage at face value that implies a .515 WP% against .500 opponents using Bill James’ log5 method. (.515-.294)*15*162 gets you 537 WAR for the AL, well below your 559 value.
3. Even if you think I’m underestimating the AL’s advantage, the AL and the NL’s WAR should add up to 1000. The fact that it doesn’t shows that there may be a problem with your method.
Hey Noah, thanks for your thoughts.
1) I just used 2012-2014 as a solution because, as I said, I can’t look into the future to use 2015 season stats. Normally, you would use 2013-2015, i.e. the data surrounding the 2014 season, and then regress. Why not weight them 2-3-2 or something? The final method can still be tweaked.
2) What kind of .500 opponents are you talking about? .500 teams that are .500 in the NL or AL?
This is what you seem to have incorrect in your calculation. The .530 interleague record is already against .500 NL competition. Because when you evaluate a league independently, they will always and in every case be .500. Every win means a loss for the other. There are always as many wins as losses in a league >>> .500 record.
The AL in the 2012-2014 timeframe won 53.9%. (.530 in my eyeball regression) This was the .500 NL vs the .500 AL. You have to ask: If there were a .294 NL team playing against a .294 AL team in a world, where AL teams of equal intraleague records win 53% of their games, how much worse than a .294 could this AL team possibly be while still winning at .500 versus this .294 NL team. I know this sounds complicated but try to read it a couple of times and see if it makes sense to you.
Following your line of thought I think I would arrive at (.530-.294)*15*162 = 573 WAR, which is even MORE than what I advocate. You should try to estimate the winning percentage via odds ratio rather than log5.
Also, if the WAR split is larger than expected… maybe there is something new to learn. I mean it surprised me too at first but the numbers do make sense. Also remember that every WAR gained for one league is a WAR lost for the other. A 440/560 WAR split means the NL is 60 WAR below equilibrium, not 120. That’s about 600 runs at 10 R/W, or 40 runs less per team: approx. 23 runs less for position players and 17 for pitchers. That’s about 2-3 runs per 600 PA, or something like 0.2-0.3 WAR, which is very reasonable IMO.
3) As I mentioned in the article, rWAR and fWAR both add or subtract “adjustment” runs to make everything sum up at the end. I have to admit it bugged me a little that it didn’t turn out to be exactly 1,000 WAR, but I think this is because the relationship is almost linear close to the mean and starts to lose its linearity a little toward the extremes. 996/1,000 is still 99.6% though.
3)
I’m going to quote from Tom Tango:
“If there was no league adjustment needed, we get to 1000 WAR by doing:
(.500 – .294) x 162 x 30 = 1000
In 2013-2014, the AL has a .528 record. Given past history, the Astros moving in, that’s just about right. But, that’s a .528 record against NL teams.
If we treat the .528 as a true talent against NL teams, then it would be .514 true talent against .500 teams. That’s because a .514 team facing a .486 team will have a .528 win%.
Therefore, AL would be:
(.514 – .294) x 162 x 15 = 535
And NL is 465.”
In my opinion this is actually OVER-estimating the league adjustment, because each league actually plays 12.3% of their games against the other league.
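For reference, the conversion in the quote can be sketched as follows. It assumes the two leagues sit symmetrically around .500 (x against 1 - x) and uses the standard log5 formula; the helper names are hypothetical, and the NL figure is taken as the remainder of 1,000, as in the quote.

```python
import math


def strength_vs_500(interleague_wpct):
    """True talent against .500 teams implied by a head-to-head W%.

    Assumes the leagues are x and 1 - x, so log5(x, 1 - x) equals the input.
    """
    ratio = math.sqrt(interleague_wpct / (1.0 - interleague_wpct))
    return ratio / (1.0 + ratio)


def log5(a, b):
    """Bill James' log5: chance that a team of strength a beats one of strength b."""
    return a * (1.0 - b) / (a * (1.0 - b) + b * (1.0 - a))


al_talent = strength_vs_500(0.528)
nl_talent = 1.0 - al_talent

al_war = (al_talent - 0.294) * 162 * 15
nl_war = 1000 - al_war  # remainder of the fixed 1,000 WAR, as in the quote

print(f"AL talent {al_talent:.3f}, check {log5(al_talent, nl_talent):.3f}")
print(f"AL {al_war:.0f} WAR, NL {nl_war:.0f} WAR")
```

This reproduces the .514 talent level and the 535/465 split from the quote.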
I played around with it and I am really struggling, going back and forth because both methods seem valid to me and make sense.
I want to ask tangotiger a question:
Using your method above (the one quoted by Noah), what would the interleague record be between AL and NL if the NL somehow disappeared in a hole in spacetime and was replaced entirely with replacement players? (Assume AL and NL equally strong before hole in spacetime)
Would you say it were
a) .294 NL – .706 AL
b) .392 NL – .608 AL
c) something different
thanks a lot!
After playing around with it a little more I am pretty convinced now that my method would be the right solution.
tango’s method above for such a replacement league would give you (.608-.294)*15*162 = 763 WAR for the superior league. But it has to be 1,000 WAR.
It seems logical at first because a .706 team vs a .500 team would be a .608 team. But the issue is that you all of a sudden mix up the W% by using a .500 for the “entire” MLB as a standard. You would always have to strictly separate both leagues’ W%.
Now with my artificial replacement levels this issue is gone. An average team in the AL would have 500/15 = 33.3 WAR. Add .294*162 = 47.6 and you have 81 wins = .500.
In an unbalanced world with a stronger league (.530 W% for AL) my method gives you 559/15 = 37.3 WAR for the average AL team. Add that to the “artificial” replacement level of .270 my method suggests, which means .270*162 = 43.7 wins. So 37.3 + 43.7 = 81 wins = .500.
This means an average AL team has a .500 record even if there is a gap of 120 WAR between the leagues. Which is correct in reality.
The NL would have an artificial replacement level of .320. So they will receive (.500-.320)*15*162 = 437.4 WAR. Thus, the average NL team gets 437.4/15 = 29.2 WAR. Add that to the replacement level wins of .320*162 = 51.8 wins and you will get 29.2 + 51.8 = 81 wins = .500.
Which is also true. And now the interleague happens in which we see that the AL wins 53% of all games. Perfect! That means the artificial replacement levels worked.
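The per-team bookkeeping in this comment can be checked with a small sketch. The function name is made up for illustration; the inputs are the .294, .270 and .320 levels used above.

```python
def average_team_wins(replacement_level, teams=15, games=162):
    """Replacement-level wins plus an equal share of the league's WAR."""
    league_war = (0.500 - replacement_level) * teams * games
    war_per_team = league_war / teams
    replacement_wins = replacement_level * games
    return war_per_team + replacement_wins


for label, level in (("MLB baseline", 0.294), ("AL", 0.270), ("NL", 0.320)):
    print(f"{label}: {average_team_wins(level):.1f} wins")
```

All three cases come out to 81 wins. Note that the sum equals 0.500 * 162 for any replacement level, so this check confirms that the accounting is internally consistent rather than that .270 and .320 are the right levels; that part rests on the odds-ratio step.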
I think you’re making this more complicated than it needs to be.
The AL is winning 52.8% of its games while the NL is winning 47.2% of its interleague games (from 2013-2014). We could regress by the variance in interleague winning percentage, but I don’t have that now, so we’ll just regress by about 10%. That gives us .526 for the AL and .474 for the NL.
But that’s .526 against a .474 league. Against .500 opponents the AL is a .513 and the NL is .487.
Finally, when we’re calculating WAR we want to know who an individual player is playing against. An AL player plays 12.3% of their games against the NL. In order to adjust, we calculate an estimated “average opponent”:
(.877*.513) + (.123*.487) = .510 for the AL
(.877*.487) + (.123*.513) = .490 for the NL
Then we plug that into replacement level and calculate WAR for each league. It doesn’t have to be any more complicated than that. This process makes sense mathematically and, perhaps more importantly, intuitively.
Having a league that wins less than 53% of its interleague games have more than 56% of the WAR just doesn’t make sense.
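The schedule-weighted calculation in the comment above can be sketched like this. It starts from the regressed .526 given in the comment, assumes the same symmetric conversion to strength against .500 opponents, and uses the 12.3% interleague share; the function names are hypothetical.

```python
import math

INTERLEAGUE_SHARE = 0.123  # share of games played against the other league


def strength_vs_500(interleague_wpct):
    """Strength against .500 opponents, assuming leagues of x and 1 - x."""
    ratio = math.sqrt(interleague_wpct / (1.0 - interleague_wpct))
    return ratio / (1.0 + ratio)


def schedule_weighted(own_league, other_league, share=INTERLEAGUE_SHARE):
    """Blend both league strengths by how often each one is faced."""
    return (1.0 - share) * own_league + share * other_league


al = strength_vs_500(0.526)
nl = 1.0 - al

al_adj = schedule_weighted(al, nl)
nl_adj = schedule_weighted(nl, al)
print(f"AL {al:.3f} -> {al_adj:.3f}, NL {nl:.3f} -> {nl_adj:.3f}")
```

This gives about .513 and .487 before weighting and about .510 and .490 after, matching the figures in the comment.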
Noah,
I am currently rethinking the 88% intraleague thing. I don’t know yet if my solution comes out the same as before or different. So thanks for challenging that aspect because it keeps things moving.
***
Now I have 3 questions I hope you would answer me. Could you tell me your WAR distribution via your method above for
a) a .500 interleague record
b) a .600 interleague record for the AL
c) a .706 interleague record for the AL
thanks =)
Also, I think you are incorrect when you say it wouldn’t make sense that the AL gets 56% of all WAR while winning only 53% of the games.
This is why I asked you c). Would it then make sense to you that the AL would get 100% of the WAR if they had a .706 record vs the NL?
Actually the AL could get MORE than 100% of WAR if their record vs NL teams is above .706. This would make the entire NL a below replacement level team.
Even in your calculation, where you arrive at a .510 for the AL, this comes out to (.510-.294)*15*162 = 525 WAR for the AL.
Here’s my issue with your method:
We could theoretically make up a league consisting solely of the Dodgers, Nationals, and Cardinals. Let’s call it “A League”. This league has a .600 Winning Percentage against “B League”, which we’ll call the remaining 27 teams, meaning that against .500 opponents A League is a roughly .550 true talent.
Under your method we’d make a huge league adjustment. Is that actually necessary? No. Because A League is made up – and therefore doesn’t play an abnormally large percentage of its games against itself – there’s no need to make an adjustment for league.
The percent of games a league plays outside of its league is important when calculating league adjustments. Another, real-world example is the NFL. NFL teams play 25% of their games against the other conference, making it less important whether a given team is in the better or worse conference. Why? Well, a team in the NFC (let’s say the Giants) is playing 25% of its games against the weaker AFC.
Why should we give them full credit for playing in the tougher conference when they play a quarter of their games against the weaker conference? That’s what your method (and Tango’s initially) does, which is problematic if the purpose of WAR is to evaluate a player in a contextually neutral environment.
I would have to disagree. It seems like we don’t reach the same page when I talk about a .500 team.
You said:
” …This league has a .600 Winning Percentage against “B League”…..meaning that against .500 opponents A League is a roughly .550 true talent. ”
You have to distinguish what you mean by a .500 team. What context is this .500 in?
All right. What is your A-league’s overall winning percentage within its league? If it is not .500 we would not be talking about baseball.
What is your B-league’s overall winning percentage within its league? It HAS to be .500. It HAS. Period. That is not an assumption or a concept, it is just the truth.
You said that the A-league dominates the B-league with a .600 winning percentage.
Let me ask you this question: What is your expected winning percentage when a .300 team from the A-league plays a .300 team from the B-league?
***
If there is still 1,000 WAR for the 30 teams, we have a default distribution of 100/900 WAR.
How much WAR would you re-distribute towards the A-league based on the .600 interleague record?
My guess is you would do: (.550-.294)*3*162 = 124 WAR
My method would do: (.500-.218)*3*162 = 137 WAR
And I hope you read this with a friendly voice because I actually like that you try to convince me. That moves the discussion forward =)
Actually now that I think about it I agree with your way of weighting interleague record in a given season. Using future performance, while inconvenient and somewhat annoying, is more accurate.
Oops, that ^^^^ comment was meant to be somewhere else!
Anyway, I think you may be misinterpreting the point of my “A League” comment. The point I was trying to make was that when we’re making a league ADJUSTMENT – that is adding or subtracting WAR to a league because of a talent differential between leagues – the amount of games that league plays against the other league needs to be considered.
If each league plays 50% of its schedule against the other league, there is no longer a need to make a league adjustment because each league is playing a balanced schedule. In this example, being in a more difficult league has no tangible effect on individual player production. With the way MLB is set up, where each league plays 12.3% of its games against the other league, the effect of being in a stronger league is diminished because of the slightly more balanced schedule.
These 12.3% of games need to be accounted for. That was my entire point; the “A League, B League” thing was just an attempt to make it easier to understand (clearly it only made things more complicated).
Noah,
regarding your made-up leagues… I got you wrong. My bad. The confusing thing to me: Why would you even have two different “leagues” when in fact they are just one, with no interleague play at all?
And in my method it is irrelevant whether 12% is played in interleague. This is because EVERY team in your own division plays 12% against the other league. They ALL get the benefit of playing against this weaker league. Pitchers see their ERA fall collectively, hitters see their production rise collectively.
In my WAR calculation with my league adjustment all hitters are only compared to their own league. Therefore it is not necessary to adjust for games played against the other league.
In your method it would actually be necessary to adjust for % played vs the weaker league because your value runs (batting and pitching to calculate WAR) would be based on a run environment of all MLB not just strict AL and NL.
Another way to think of it:
Say 5% are interleague games. Both league environments would be rather independent.
Say 30% are interleague games. Both leagues’ environments would still be mostly independent, but they would come closer together.
Say 100% are interleague games (so no 2 different leagues, really). Both leagues’ run environments will be exactly the same. They have merged to become one. The same.
This is why there is no need to adjust for % of interleague games in my method.
It’s not about the league run environment. We need to account for the amount of interleague games because we’re making an ADJUSTMENT for strength of league. There’s no need to make an adjustment if the two leagues play balanced schedules. The more games each league plays against itself, the greater the adjustment has to be.
In terms of the interleague data I would use, I would probably do one of the following:
1. Use 2013-2014 Interleague Data, regressed by 10%.
2. Use 2014-2015 Interleague Data, regressed by 10%. Obviously we don’t have all the data now, but the interleague adjustment could be updated automatically during the season (kind of like how the cFIP changes over the course of the season).
My issue with using data from 2012 is that the interleague record in 2012 probably doesn’t reflect the skill level of the league in 2015. It’s just too far in the past, especially with the vast amount of talent that switches leagues each year. Some people have even told me that using 2013 data is going too far into the past.
For the 2014 season, only data from 2013-2015 would be used. I also offered an arbitrary 2-3-2 weighting. Let’s ignore regression for a moment.
For example:
interleague record
2013: .540 AL
2014: .550 AL
2015: .520 AL
(all made up)
for the 2014 season we would get (2*.540 + 3*.550 + 2*.520)/7 = .539
This is what I mentioned in my article: a rolling average for years n-1, n and n+1. Regress that based on the observed variance and there you have it.
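A minimal sketch of the 2-3-2 weighting, using the made-up records from the example above. The function name is my own, and the regression step is left out because its amount would depend on the observed variance, which is not given here.

```python
def weighted_interleague_record(records, weights):
    """Weighted average of interleague W% over seasons n-1, n and n+1."""
    total_weight = sum(weights)
    return sum(r * w for r, w in zip(records, weights)) / total_weight


# Made-up AL interleague records for 2013, 2014 and 2015 (from the example).
records = [0.540, 0.550, 0.520]
weights = [2, 3, 2]

al_2014 = weighted_interleague_record(records, weights)
print(f"Weighted AL interleague W% for 2014: {al_2014:.3f}")
```

Dividing by the sum of the weights (7) is what brings the result back onto the winning-percentage scale, giving about .539 here.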
How could you use data from 2015 for the 2014 season? That would change 2014 WAR values a full year after the season ends (!), which doesn’t make any sense. It also doesn’t make sense to weight some seasons more than the other in my opinion.
Yes that is actually happening right now anyways due to the park factors. If you recall in my first article I mentioned the park factors. They sometimes change WAR figures AFTER the season ended. Even 2 years after a season ended is possible.
You would use the surrounding years to get a better estimate of the year in question because you increase your sample size, yet the farthest you move away from the status quo is 1 year. And you get 3 years of data.
If you took the season in question and the two prior ones, you would also have 3 years of data, but now you move away from the status quo by 2 years.
For the park factors:
Think about a new stadium built. WAR is park adjusted. How much weight can be put into ONE season worth of data for that park? Now FG uses 5 years rolling regressed park factors. So when FG gathers more data on how the park actually plays, it will show up in a changed WAR figure.
It is no different with my league adjustment.
It’s correct to change the past estimates as future information comes in.
Question about the park factors: Does FanGraphs use data from before fences are moved at a stadium? For example, when calculating the 2012 Mets Park Factor, do they look at 2010 and 2011 even though those years were played in a ballpark that had different dimensions?
No, AFAIK they start over new when dimensions change or a new park is built.
Just a quick question.
In step two, you find that the American League has a .539 winning percentage against the National League and then regress that to .530. If you keep using this value of .530 in your calculations, doesn’t that imply that in inter-league games, it’s just one random AL team against another random NL team?
IRL though, it’s not random scheduling of inter-league play. The league chooses teams that may have had a rivalry in the past or are within close proximity to each other in terms of geography.
The last point may mean something if you look at some examples:
CHICAGO WHITE SOX VS CHICAGO CUBS-
SOX: 221-265 record last three years (.455 winning percentage)
CUBS: 200-286 record last three years (.412 winning percentage)
NEW YORK YANKEES VS NEW YORK METS:
YANKEES: 264-222 record last three years (.543 winning percentage)
METS: 227-259 record last three years (.467 winning percentage)
Maybe that .530 winning percentage is too high given that MLB matches teams to play against each other to gain more revenue and it just so happens that in the last three years, the American League teams are better than their National League counterparts.
And I know, I only looked at two rivalries when there are other rivalries such as Dodgers/Angels, A’s/Giants etc. Plus, you would also have to look at the matchups for the teams that are non-rivals as well. Also, using actual wins/losses probably isn’t that useful but it’s kind of late where I am, so I was too lazy to find expected wins.
Great Work Though, nonetheless!
Thanks, Rahi!
Yes, it is obviously an assumption that the interleague record reflects two random teams out of each league (which should be around .500 teams). It is one that should come pretty close to the truth, but I will check and post a comment here.
One thing you had incorrect though was that all the winning percentages you showed were with interleague play included. You would want to exclude it. But I will take a look and post the results.
I took a look at the 2014 interleague schedule for the divisions.
AL only W% —- opponents NL only W%
(all weighted by games played)
NYY: .500 — .503
BOS: .437 — .505
BAL: .592 — .533
TOR: .493 — .502
TBR: .472 — .506
DET: .549 — .502
KCR: .521 — .501
CLE: .528 — .483
CHW: .437 — .482
MIN: .430 — .479
OAK: .528 — .510
LAA: .606 — .516
SEA: .549 — .497
HOU: .458 — .487
TEX: .401 — .482
interpretation help: NYY had a .500 record against AL teams only. Their interleague opponents had a combined record of .503 against NL teams only. This is already weighted by games played against.
On the aggregate, all the weighted match-ups turn out to:
AL: .500 — .499
This is only 2014 data but very very encouraging, that the assumption of an entire interleague schedule being random and coming very close to .500 vs .500. Great.
Sorry messed up my last sentence. Should read:
*This is only 2014 data but very, very encouraging that the assumption of a random interleague schedule holds true. It does in fact come very close to .500 vs .500 which is great news for a model based on this assumption.
Thanks!
Very encouraging indeed; looking forward to the follow-up to this article.