This article is a follow-up to my previous one, and in it I will work through some examples so you can build an intuition for the method. If the concept seems too complicated, I apologize for not explaining myself well, because I sincerely think it is very straightforward, involves no voodoo and could help improve fWAR even further… which is mind-boggling if you think about it. It could improve projection systems as well as the correlation between WAR and actual wins, while also handling players who move from the AL to the NL or vice versa more elegantly.
I will simply follow my steps 1-4 from my previous article to figure out the proper league adjustment and continue with some WAR calculations. I will use the 2014 season as my guinea pig.
While playing around with it I also stumbled upon a wRC+ adjustment that has to be done because of a) the independence of both leagues and b) the differing league strengths. I will tackle this issue in my next article.
All right, here are steps 1) through 4).
1) I need to figure out the wOBA values, R/PA, FIP, R/W and cFIP for each league individually. These can normally be found here. I will not list every single wOBA value here because listing them doesn’t add much to the explanation and skipping them saves me some time.
The exact values for all of MLB found on the Guts! page are conveniently exactly the arithmetic mean of my AL and NL values.
2) All right, we now move on to step 2, which is to figure out the interleague record. I suggested that a three-year regressed rolling average with years N-1, N and N+1 as inputs could be a possibility. I cannot see into the future, so I will simply use the 2012-2014 interleague record based on pythagenpat. This comes out to a .539 W% for the AL. Conveniently, the actual W% is exactly the same. For demonstration purposes let’s just do a farmer’s regression and call that a “true talent” .530 W%.
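For readers who have not met pythagenpat before, here is a minimal sketch of how such an expected W% and a crude regression could be computed. The exponent power of 0.287 is a commonly cited value and an assumption here, the run totals are made up for illustration (they are not the real 2012-2014 interleague figures), and `regress_to_mean` with its `weight` parameter is a hypothetical helper, not the regression actually used for the .530 figure.

```python
def pythagenpat_wpct(runs_scored, runs_allowed, games, exponent_power=0.287):
    """Pythagenpat expected W%: the exponent grows with the run environment."""
    runs_per_game = (runs_scored + runs_allowed) / games
    x = runs_per_game ** exponent_power
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)


def regress_to_mean(observed, weight, mean=0.500):
    """Shrink an observed W% toward the mean.

    weight=1 keeps the observation as-is, weight=0 returns the mean.
    """
    return mean + weight * (observed - mean)


# Hypothetical interleague run totals, for illustration only.
al_wpct = pythagenpat_wpct(runs_scored=4000, runs_allowed=3700, games=900)
print(f"pythagenpat W% (made-up inputs): {al_wpct:.3f}")

# Going from .539 to .530 amounts to keeping roughly 77% of the edge over .500.
print(f"regressed W%: {regress_to_mean(0.539, weight=0.77):.3f}")
```

The weight of 0.77 is simply what reproduces the step from .539 to .530; a real implementation would derive the amount of regression from the sample size.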
3) This is the seemingly tricky part, but once you get your head around it, it is very easy to grasp. As a reminder, three “true” replacement levels are needed for all WAR calculations: .294 for teams in general (this is where the fixed 1,000 WAR each year comes from), .380 for starting pitchers and .470 for relievers.
Imagine an NL team that is a .500 team within the NL. This team plays an AL team that is a .500 team within the AL. The “within” needs to be stressed. Those teams are NOT of equal strength, even if both have a .500 record. Why, you ask? Because if they were, we would not see an advantage for the AL in interleague play. We would see a balanced .500 interleague record. That is not our reality, and we can confidently conclude that the NL is the weaker league as of today.
Following this line of thought, what happens if two replacement teams, one from each league, play each other? Well, this means a .294 NL team plays a .294 AL team. What would the outcome be? A .530 winning percentage in favor of the AL. This comes straight out of the interleague record.
How much better than a .294 W% would this NL team have to be in order to win exactly half of its games against this .294 AL team? This is where the odds ratio comes into play, and it spits out a .320 winning percentage. That means if a .320 NL team faces a .294 AL team in an environment in which the AL wins 53% of all interleague games, we would finally expect parity: a .500 interleague record. This .320 is our new “artificial” replacement level for the NL in 2014.
On the other hand we have to ask the question: How much worse than a .294 can an AL team be when facing a .294 NL team and still win half of its games? Odds ratio says a .270 AL team would still win 50% of all games against a .294 NL team in a context where the AL wins 53% of all interleague games. This .270 is our new “artificial” replacement level for the AL in 2014.
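The odds-ratio step can be sketched in a few lines. This is a minimal sketch under the assumption that the league edge is applied multiplicatively on the odds scale, which reproduces the .320 and .270 figures above; the function names are mine and purely illustrative.

```python
def odds(p):
    """Convert a winning percentage into odds."""
    return p / (1.0 - p)


def from_odds(o):
    """Convert odds back into a winning percentage."""
    return o / (1.0 + o)


def artificial_replacement_levels(true_level, stronger_league_wpct):
    """Return (stronger league level, weaker league level)."""
    edge = odds(stronger_league_wpct)
    stronger = from_odds(odds(true_level) / edge)
    weaker = from_odds(odds(true_level) * edge)
    return stronger, weaker


def interleague_wpct(stronger_team, weaker_team, stronger_league_wpct):
    """Expected W% of the stronger-league team; both inputs are in-league W%."""
    o = odds(stronger_team) / odds(weaker_team) * odds(stronger_league_wpct)
    return from_odds(o)


al_level, nl_level = artificial_replacement_levels(0.294, 0.530)
print(f"AL: {al_level:.3f}  NL: {nl_level:.3f}")

# Sanity checks: two .294 teams give the AL .530, the adjusted teams give parity.
print(f"{interleague_wpct(0.294, 0.294, 0.530):.3f}")
print(f"{interleague_wpct(0.294, nl_level, 0.530):.3f}")
print(f"{interleague_wpct(al_level, 0.294, 0.530):.3f}")
```

The last two checks are the definition of the artificial levels turned around: a .320 NL team against a .294 AL team, or a .294 NL team against a .270 AL team, should both come out at .500.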
4) Remember that our “regressed” interleague record suggests the AL is the stronger league and thus deserves a larger share of the WAR pie. Now it is time to figure out how much more it deserves.
We figured out a .270 “artificial” replacement level for the AL. Therefore, we can distribute (.500-.270)*15*162 = 559 WAR towards the AL. This is split up 57/43 between position players and pitchers.
In the National League we found a .320 “artificial” replacement level. Therefore, we can distribute (.500-.320)*15*162 = 437 WAR towards the NL. Same 57/43 split.
Now 559+437 = 996, which is not equal to 1,000. I suspect this is because the odds ratio is non-linear, more so the closer it gets to the extremes, but I might be totally mistaken here. This usually is where Tangotiger appears out of the dark and helps out with fancy math or steps in when the math gets hurt. I don’t really see it as a problem.
We could either distribute the remaining 4 WAR 50/50 between both leagues or adjust the replacement levels slightly to arrive at exactly 1,000 WAR. Both would change individual WAR figures only on an atomic level.
I want to point out that this kind of inconsistency is very common in the implementations of WAR. rWAR and fWAR both have some adjustment runs to reconcile inconsistencies like that. This doesn’t even make a difference on a player level. My guess is it would not even change a team’s WAR figure by a tenth of a win.
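The pool arithmetic in step 4 and the two ways of closing the gap to 1,000 WAR can be sketched as follows. The even split follows the text directly; the proportional rescaling is only one assumed reading of “adjust the replacement levels slightly”, and all names are illustrative.

```python
TEAMS_PER_LEAGUE = 15
GAMES_PER_TEAM = 162
TARGET_TOTAL_WAR = 1000.0


def league_war_pool(replacement_level):
    """WAR available to a league: wins above replacement for 15 average teams."""
    return (0.500 - replacement_level) * TEAMS_PER_LEAGUE * GAMES_PER_TEAM


def implied_level(war_pool):
    """Replacement level that would produce a given league WAR pool."""
    return 0.500 - war_pool / (TEAMS_PER_LEAGUE * GAMES_PER_TEAM)


al_raw = league_war_pool(0.270)  # rounded level from step 3
nl_raw = league_war_pool(0.320)
gap = TARGET_TOTAL_WAR - (al_raw + nl_raw)
print(f"raw: AL {al_raw:.0f}, NL {nl_raw:.0f}, missing {gap:.1f}")

# Option A: hand the missing WAR out 50/50.
al_even, nl_even = al_raw + gap / 2, nl_raw + gap / 2

# Option B (assumption): rescale both pools proportionally, then read off
# the slightly shifted replacement levels.
scale = TARGET_TOTAL_WAR / (al_raw + nl_raw)
al_scaled, nl_scaled = al_raw * scale, nl_raw * scale

print(f"A: AL {al_even:.1f}, NL {nl_even:.1f}")
print(f"B: AL {al_scaled:.1f}, NL {nl_scaled:.1f}")
print(f"B levels: AL {implied_level(al_scaled):.4f}, NL {implied_level(nl_scaled):.4f}")

# The 57/43 split between position players and pitchers, applied to option A.
print(f"AL position players {0.57 * al_even:.1f}, AL pitchers {0.43 * al_even:.1f}")
```

Either option moves a league total by only a couple of WAR. Note also that carrying the unrounded levels from the odds ratio (about .2697 and .3195) through the same arithmetic gives roughly 998 rather than 996, so part of the gap appears to come from rounding the levels to three decimals.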
Having come this far, you are probably interested in how much a given player’s WAR figure might change. Again, I won’t list every necessary step, only the actual results. If you ask yourself how I have done it, you should take a look here, here and here. If that doesn’t help, just comment with your question and I will walk you through it.
My example will be Mike Trout. I will show the differences of some of the more important and interesting stats as (OLD/NEW). Forgive me for not being a formatting wizard.
NOTE: For the sake of a better comparison I will present the “new” run values at an exchange rate of 9.117 R/W (the one currently used). Otherwise 1 run wouldn’t have the same meaning, since in my WAR calculations 1 win equals 9.25 runs (see step 1). This makes it an apples-to-apples comparison.
wOBA: (.403 / .402)
wRC+*: (167 / 170)
WAR**: (7.8 / 8.0)
Batting: (52.1 / 54.0)
UBR: (3.0 / 3.0) unchanged
wSB: (1.8 / 1.7)
Fld: (-9.8 / -9.8) unchanged
Pos: (1.4 / 1.4) unchanged
Lg: (2.9 / 2.9)
Rep***: (19.9 / 19.9)
* I use a slightly different wRC+ calculation here. My league adjustment method would also improve the accuracy of wRC+ as a comparison tool between the two leagues. I will write another article dealing with the modified wRC+ calculation, as well as the wRAA and replacement runs modifications to improve the accuracy of fWAR.
** Fielding runs, UBR and the positional adjustment were not changed. These three will never change. The league adjustment, however, will undoubtedly change, as will wSB, although the changes would be tiny. Calculating them involves complete league stats, i.e. every single player’s stats.
*** The value of replacement runs will never be affected by my league adjustments, even though I use different replacement levels for my calculations. Replacement runs will always be based on the .294 baseline. I hope this makes sense to you. If not, I refer you to my upcoming article.
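As a quick consistency check on the Trout numbers, the run components listed above can be added up and divided by the 9.117 R/W exchange rate. This sketch assumes WAR is simply the sum of the listed components over runs per win, which happens to reproduce both the 7.8 and the 8.0; the class and field names are illustrative.

```python
from dataclasses import dataclass, astuple

RUNS_PER_WIN = 9.117  # exchange rate used for the comparison above


@dataclass
class RunComponents:
    batting: float
    ubr: float
    wsb: float
    fielding: float
    positional: float
    league: float
    replacement: float

    def war(self, runs_per_win=RUNS_PER_WIN):
        """Sum all run components and convert runs to wins."""
        return sum(astuple(self)) / runs_per_win


old = RunComponents(52.1, 3.0, 1.8, -9.8, 1.4, 2.9, 19.9)
new = RunComponents(54.0, 3.0, 1.7, -9.8, 1.4, 2.9, 19.9)

print(f"old WAR: {old.war():.1f}")
print(f"new WAR: {new.war():.1f}")
print(f"difference in runs: {sum(astuple(new)) - sum(astuple(old)):.1f}")
```

The whole change comes from batting runs (+1.9) and wSB (-0.1), a net of 1.8 runs or about 0.2 wins.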
In my next article I will lay out the modifications that have to be applied to wRAA, wRC+, batting runs and the replacement runs. I will show why my modifications make wRC+ more accurate in comparing both leagues and explain why this new league adjustment influences position player WAR more than pitcher WAR. The reason is that right now the fWAR process for pitchers already leans heavily, though not entirely, towards treating both leagues as independent, which is a cornerstone of my league adjustments.
Also look forward to a table of the players with the biggest and the smallest increase in WAR and the corresponding losses. In both the AL and NL there are players who gain or lose more than others. My best educated guess so far is that this has to do with the different run environments. In the NL, the lower scoring league, extra-base hits become slightly more valuable. So does base-stealing. The opposite holds for the AL. So look forward to my next piece, fellows!
This article is a response to Noah’s thought-provoking articles about a modification to the FIP-based pitching fWAR and his issues with the fWAR league adjustments. In it I want to lay out a possible solution to the somewhat “flawed” league adjustments currently used. My method could be applied to a divisional context as well, so I won’t address that specifically. I am not a native speaker, so please excuse any grammar or spelling mistakes.
Let’s start with the basics of the current concept. 1,000 WAR has to be given out each year to all players implying a replacement level of .294. Even if for some reason every player on all the current 25-man rosters happened to be abducted by aliens this would not change. Even if both leagues consisted entirely of “replacement” players, 1,000 WAR would be handed out. This is our model and it is a great one because it includes context so beautifully and effortlessly.
Here is a little thought experiment: Say these aliens are huge fans of the NL for some reason and decide to abduct the entire league’s player population. We would be left with the untouched AL (we assume the AL and NL are of exactly equal strength for this thought experiment). Again, 1,000 WAR has to be distributed among all big league players. If our current model is handling league adjustments correctly we would expect to see 0 WAR in the NL and 1,000 WAR in the AL. Unfortunately, the current fWAR model wouldn’t spit out a result coming close to this.
Here is why: Even in a reality where about 88% of all games are played internally in a given league, a great portion of the fWAR calculation is based on treating MLB as ONE league instead of two rather independent leagues. The consequences can be clearly seen in my thought experiment. Because every player in the NL would be a replacement player, we could hardly find a hint of the changed talent level in the NL’s stats. This is because replacement level hitters are facing replacement level pitching, and my guess would be that the NL’s overall batting line and R/G would barely change, even if the talent changed dramatically. Now wOBA is calculated using both leagues, and the offensive output of these replacement hitters would be weighted as if they had put up these numbers against actual major league competition. Thus, the NL’s hitters would be undeservedly credited with batting runs and its pitchers with run prevention (again versus replacement hitters).
This is certainly an exaggeration, but the same logic holds whenever one league is weaker. The only way we would notice the changed talent level would be the interleague record against the AL. In a perfectly balanced world with two equally strong and talented leagues we would see a .500 record, and our 1,000 WAR could be handed out 50/50 between the AL and NL and 57/43 between position players and pitchers. But what would the interleague record be in my thought experiment? What would it have to be? The answer is pretty easy: .294, aka replacement level. Now this is interesting and it seems like we are going somewhere. Here seems to lie the key to the proper league adjustments, because how much WAR should be handed out to a league that wins at a replacement level against a “true” major league? That sounds pretty darn like a league full of replacement players, who are by definition worth 0 WAR. And this 0 WAR should be the correct answer based on our assumptions in this thought experiment.
How do we get there?
1) Calculate every aspect that goes into WAR (R/PA, wOBA, FIP, etc.) separately for both leagues. In fact we have to treat both leagues as independent. This would mean 500 WAR for each league by default, distributed 57/43 between position players and pitchers.
2) Figure out the interleague record. I would suggest using something like a three-year regressed rolling average (just like the five-year regressed rolling park factors on FG, which can actually change a player’s WAR retroactively if his home park happens to play very hitter- or pitcher-friendly in the immediate future). I will use a .525 record in favor of the AL for an example later on.
3) Based on the “true” replacement levels of .294 for teams, .380 for starters and .470 for relievers, we calculate an “artificial replacement level” for the weaker and the stronger league via the odds ratio. Using the .525 interleague record for the AL as an example, this comes out to artificial replacement levels of
.315 for NL teams / .274 for AL teams
.404 for NL starting pitchers / .357 for AL starting pitchers
.495 for NL relievers / .445 for AL relievers.
To help interpret these numbers think about it this way: The NL, at .475 in interleague play, is the weaker league. A “replacement team” would have a .294 record in the NL (forget about interleague for a moment). If this team plays against a .294 AL team, we would expect a .500 W% IF both leagues were equally strong. But we already established that the AL wins at a .525 clip when two teams with “equal” records IN their respective leagues match up. The .315 “artificial” replacement level for the NL means that we expect a .315 NL team to win 50% of all games against a .294 AL team. Thus, we can conclude that the replacement level bar to clear should be put a little higher in the NL, because it seems easier to accumulate value in the weaker league. The opposite is true for the AL, where the replacement level bar should be put a little lower for the mirror-image reason and to be consistent with handing out 1,000 WAR each year.
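Written out, the odds-ratio step behind these levels looks roughly like this. The symbols are notation introduced here for illustration: r is the “true” replacement level, w is the interleague W% of the stronger league (the AL), and o(p) converts a winning percentage into odds. The worked numbers use r = .294 and w = .525.

```latex
\[
o(p) = \frac{p}{1-p}, \qquad o^{-1}(x) = \frac{x}{1+x}
\]
\[
r_{\mathrm{NL}} = o^{-1}\bigl(o(r)\, o(w)\bigr), \qquad
r_{\mathrm{AL}} = o^{-1}\!\left(\frac{o(r)}{o(w)}\right)
\]
% Worked example for teams: r = .294, w = .525
\[
o(.294) = \frac{.294}{.706} \approx 0.4164, \qquad
o(.525) = \frac{.525}{.475} \approx 1.1053
\]
\[
r_{\mathrm{NL}} \approx \frac{0.4603}{1.4603} \approx .315, \qquad
r_{\mathrm{AL}} \approx \frac{0.3768}{1.3768} \approx .274
\]
```

Plugging in r = .380 and r = .470 the same way gives the starter and reliever levels listed in step 3.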
4) Derive the correct distribution of WAR for both leagues based on the artificial replacement levels. In my thought experiment at the beginning we would have a 0/1,000 WAR distribution, because replacement level would actually be .500 for the NL using my methodology in 3). A balanced league would have a 500/500 WAR distribution with a replacement level of .294 for both leagues. With the AL winning at a .525 clip against the NL this means a WAR distribution close to 450/550 in favor of the AL.
The WAR distribution for 2014 on FG was 472/528 in favor of the AL.
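Putting steps 3) and 4) together, the league split can be computed directly from the interleague W%. A minimal sketch, assuming 15 teams per league, 162 games and the multiplicative odds-ratio adjustment; the function names are illustrative.

```python
def odds(p):
    """Convert a winning percentage into odds."""
    return p / (1.0 - p)


def from_odds(o):
    """Convert odds back into a winning percentage."""
    return o / (1.0 + o)


def league_war_split(stronger_league_wpct, true_level=0.294, teams=15, games=162):
    """Return (stronger league WAR, weaker league WAR) before any rescaling."""
    edge = odds(stronger_league_wpct)
    stronger_level = from_odds(odds(true_level) / edge)
    weaker_level = from_odds(odds(true_level) * edge)
    stronger_war = (0.500 - stronger_level) * teams * games
    weaker_war = (0.500 - weaker_level) * teams * games
    return stronger_war, weaker_war


for wpct in (0.500, 0.525):
    al_war, nl_war = league_war_split(wpct)
    print(f"interleague {wpct:.3f}: AL {al_war:.0f} / NL {nl_war:.0f}")
```

For the balanced case this lands at roughly 500 per league, and for .525 at roughly 550 for the AL and 449 for the NL, which is noticeably wider than the 472/528 split FG showed for 2014. The raw totals are not exactly 1,000, so a small final rescaling would still be needed.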
There are really some beautiful and elegant side effects. The independence of both leagues’ calculations would mean interleague adjustments are not necessary at all. This is because even with about 12% interleague games, pitchers and hitters are only compared to the stats that other players in the same league have put up, interleague included. The adjustment takes place when we evaluate the interleague record, because this is the only direct way to measure the difference in strength/talent. The current league adjustments are a little bit flawed in my opinion, because wOBA and the run environment are calculated for the entire MLB and interleague records are not taken into consideration at all. Therefore a fixed replacement level is used for all years. My methodology addresses these problems and scales an artificial replacement level for each year and league based on a multi-year regressed interleague record, while still keeping the overall replacement level for all of MLB at .294 and 1,000 WAR each year.
To be honest with you, I am not a huge fan of divisional adjustments because of small samples and differing opponents. In an entire season’s interleague schedule there should be a lot more signal. I think when applying divisional adjustments we would have to regress heavily. I am not yet sold on including a possibly very complicated divisional adjustment when the heavy regression it needs doesn’t leave us much to learn from anyway. But I am open to being convinced otherwise.
Look forward to a follow-up in which I walk through some real life examples and present some of the changes my methodology brings. Feel free to comment and discuss! Prost!