Offering a Solution to the fWAR League Adjustments

by AC_Butcha_AC

January 7, 2015

This article is a response to Noah’s thought-provoking articles about a modification to the FIP-based pitching fWAR and about his issues with the fWAR league adjustments. In it I want to lay out a possible solution to the somewhat “flawed” league adjustments currently in use. My method could be applied to a divisional context as well, so I won’t address that case specifically. I am not a native speaker, so please do not take offense at any grammar or spelling mistakes.

Let’s start with the basics of the current concept. 1,000 WAR has to be given out each year to all players, which implies a replacement level of .294. Even if for some reason every player on all the current 25-man rosters happened to be abducted by aliens, this would not change. Even if both leagues consisted entirely of “replacement” players, 1,000 WAR would be handed out. This is our model, and it is a great one because it includes context so beautifully and effortlessly.
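As a rough sanity check of how the 1,000 WAR total and the .294 replacement level fit together, here is a minimal sketch. It assumes 30 teams playing 162 games each, and it assumes that the WAR pool is simply the gap between an average (.500) team and a replacement team, summed over all team-games. The constant and function names are illustrative only.

```python
TEAMS = 30            # assumption: 15 AL + 15 NL teams
GAMES_PER_TEAM = 162  # assumption: full regular-season schedule


def war_pool(replacement_wpct, team_games):
    """Wins above replacement earned by .500 teams over `team_games` games."""
    return (0.500 - replacement_wpct) * team_games


total = war_pool(0.294, TEAMS * GAMES_PER_TEAM)
print(round(total))  # 1001, i.e. roughly the 1,000 WAR handed out per season
```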

Here is a little thought experiment: Say these aliens are huge fans of the NL for some reason and decide to abduct the entire league’s player population, so that every NL roster has to be refilled with replacement players. We would be left with the untouched AL (we assume the AL and NL were of exactly equal strength beforehand). Again, 1,000 WAR has to be distributed among all big league players. If our current model handled league adjustments correctly, we would expect to see 0 WAR in the NL and 1,000 WAR in the AL. Unfortunately, the current fWAR model wouldn’t spit out a result anywhere close to this.

Here is why: Even though about 88% of all games are played within a given league, a large part of the fWAR calculation treats MLB as ONE league instead of two rather independent leagues. The consequences show up strongly in my thought experiment. Although every player in the NL would be a replacement player, we could hardly find a hint of the changed talent level in the NL’s stats. This is because replacement level hitters would be facing replacement level pitching, and my guess is that the NL’s overall batting line and R/G would barely change, even though the talent changed dramatically. Now wOBA is calculated using both leagues, so the offensive output of these replacement hitters would be weighted as if they had put up these numbers against actual major league competition. Thus, NL hitters would be undeservedly credited with batting runs, and NL pitchers with run prevention (again, achieved against replacement hitters).

This is certainly an exaggeration, but the same holds to a lesser degree whenever one league is weaker. The only place we would notice the changed talent level is the interleague record against the AL. In a perfectly balanced world with two equally strong and talented leagues we would see a .500 interleague record, and our 1,000 WAR could be handed out 50/50 between the AL and NL and 57/43 between position players and pitchers. So what would the NL’s interleague record be in the thought experiment? What would it have to be? The answer is pretty easy: .294, aka replacement level. Now this is interesting, and it seems like we are getting somewhere. Here seems to lie the key to proper league adjustments: how much WAR should be handed out to a league that wins at a replacement level against a “true” major league? That sounds pretty darn much like a league full of replacement players, who are by definition worth 0 WAR. And this 0 WAR should be the correct answer based on our assumptions in this thought experiment.

How do we get there?

1) Calculate every component that goes into WAR (R/PA, wOBA, FIP, etc.) separately for each league. In effect, we have to treat the two leagues as independent. This would mean 500 WAR for each league by default, distributed 57/43 between position players and pitchers.

2) Figure out the interleague record. I would suggest using something like a 3-year regressed rolling average (just like the 5-year regressed rolling park factors on FG, which can change a player’s WAR retroactively if his home park happens to play very hitter- or pitcher-friendly in the immediate future). I will use a .525 record in favor of the AL as an example later on.
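One possible way to build such a regressed multi-year record is sketched below. Everything specific in it is an assumption: the regression is done by adding a fixed number of .500 “phantom” games, the amount of regression (300 games) is arbitrary, and the three win/loss lines are made-up placeholders rather than real interleague results.

```python
def regressed_interleague_wpct(seasons, regression_games=300):
    """Pool several seasons of (wins, losses) and regress toward .500.

    `regression_games` is the number of phantom .500 games added to the
    sample; its value here is an arbitrary assumption, not a fitted one.
    """
    wins = sum(w for w, _ in seasons)
    games = sum(w + l for w, l in seasons)
    return (wins + 0.5 * regression_games) / (games + regression_games)


# Hypothetical AL interleague records for three seasons (NOT real data).
hypothetical_al_seasons = [(160, 140), (165, 135), (158, 142)]

al_wpct = regressed_interleague_wpct(hypothetical_al_seasons)
print(f"regressed AL interleague W%: {al_wpct:.4f}")
```

A larger `regression_games` pulls the estimate closer to .500, which is the knob one would tune when deciding how much signal a three-year sample really carries.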

3) Based on the “true” replacement levels of .294 for teams, .380 for starters and .470 for relievers, we calculate an “artificial replacement level” for the weaker and the stronger league via the odds ratio. Using the .525 interleague record for the AL as an example, this comes out to artificial replacement levels of

.315 for NL teams / .274 for AL teams

.404 for NL starting pitchers / .357 for AL starting pitchers

.495 for NL relievers / .445 for AL relievers.
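The figures above can be reproduced with a short sketch. It assumes that “via the odds ratio” means multiplying (weaker league) or dividing (stronger league) the odds of the true replacement level by the odds of the stronger league’s interleague record. The function names are illustrative and not part of any published fWAR code.

```python
def odds(p):
    """Convert a winning percentage into odds (wins per loss)."""
    return p / (1.0 - p)


def from_odds(o):
    """Convert odds back into a winning percentage."""
    return o / (1.0 + o)


def artificial_replacement(true_level, al_interleague_wpct):
    """Return (nl_level, al_level) for a given AL interleague record."""
    league_odds = odds(al_interleague_wpct)
    nl = from_odds(odds(true_level) * league_odds)  # bar moves up if the AL is stronger
    al = from_odds(odds(true_level) / league_odds)  # bar moves down if the AL is stronger
    return nl, al


for label, level in [("teams", 0.294), ("starters", 0.380), ("relievers", 0.470)]:
    nl, al = artificial_replacement(level, 0.525)
    print(f"{label}: NL {nl:.3f} / AL {al:.3f}")
# teams: NL 0.315 / AL 0.274
# starters: NL 0.404 / AL 0.357
# relievers: NL 0.495 / AL 0.445
```

Under the same assumption, an AL that wins interleague games at a .706 clip (the NL winning at .294) pushes the NL team replacement level to exactly .500, which matches the alien thought experiment.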

To help interpret these numbers, think about it this way: The NL, at .475 in interleague play, is the weaker league. A “replacement team” would have a .294 record in the NL (forget about interleague for a moment). If this team played against a .294 AL team, we would expect a .500 W% IF both leagues were equally strong. But we already established that the AL wins at a .525 clip when two teams with “equal” records IN their respective leagues match up. The .315 “artificial” replacement level for the NL means that we expect a .315 NL team to win 50% of its games against a .294 AL team. Thus, we can conclude that the replacement level bar should be set a little higher in the NL, because it is easier to accumulate value in the weaker league. The opposite is true for the AL, where the bar should be set a little lower for the mirror-image reason and to stay consistent with handing out 1,000 WAR each year.

4) Derive the correct distribution of WAR for both leagues based on the artificial replacement levels. In my thought experiment at the beginning we would have a 0/1,000 WAR distribution, because the replacement level for the NL would actually be .500 using my methodology in 3). Two balanced leagues would have a 500/500 WAR distribution with a replacement level of .294 for both. With the AL winning at a .525 clip against the NL, this means a WAR distribution close to 450/550 in favor of the AL.
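The 450/550 split can be approximated with a small sketch. It assumes 15 teams per league playing 162 games each, and it assumes that a league’s WAR is the gap between a .500 team and that league’s artificial replacement level, summed over all of the league’s team-games. That formula is an inference from the numbers in this article, not a documented FanGraphs calculation.

```python
TEAM_GAMES_PER_LEAGUE = 15 * 162  # assumption: 15 teams, 162 games each


def league_war(artificial_replacement_wpct):
    """WAR available to one league given its artificial replacement level."""
    return (0.500 - artificial_replacement_wpct) * TEAM_GAMES_PER_LEAGUE


nl_war = league_war(0.315)  # NL artificial replacement level from step 3
al_war = league_war(0.274)  # AL artificial replacement level from step 3
print(f"NL {nl_war:.0f} / AL {al_war:.0f}")  # close to 450/550
print(league_war(0.294))  # balanced case: about 500 per league
print(league_war(0.500))  # NL in the thought experiment: 0.0
```

This simple version reproduces the NL side of the thought experiment (0 WAR at a .500 replacement level). How the AL side would be rescaled so that the total stays at exactly 1,000 in such an extreme case is not covered by this sketch.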

The WAR distribution for 2014 on FG was 472/528 in favor of the AL.

Conclusion

There are some really beautiful and elegant side effects. The independence of the two leagues’ calculations would mean interleague adjustments are not necessary at all. This is because, even with about 12% interleague games, pitchers and hitters are only compared to the stats that other players in the same league have put up, interleague included. The adjustment takes place when we evaluate the interleague record, because this is the only direct way to measure a difference in strength/talent. The current league adjustments are a little bit flawed in my opinion because wOBA and the run environment are calculated for all of MLB and interleague records are not taken into consideration at all. As a result, the same fixed replacement level is used for every year. My methodology addresses these problems and scales an artificial replacement level for each year and league based on a multi-year regressed interleague record, while still keeping the overall replacement level for all of MLB at .294 and 1,000 WAR each year.

To be honest with you, I am not a huge fan of divisional adjustments because of small samples and differing opponents. In an entire season’s interleague schedule there should be a lot more signal. I think we would have to regress heavily when applying divisional adjustments. I am not yet sold on including a possibly very complicated divisional adjustment when its heavy regression doesn’t leave us much to learn from anyway. But I am open to being convinced otherwise.

Look forward to a follow-up in which I walk through some real life examples and present some of the changes my methodology brings. Feel free to comment and discuss! Prost!


17 Comments


jim S.

10 years ago

Ah, that was my question. German, then.

AC_Butcha_AC

10 years ago

Yes, I am from Germany 😉

Jim S.

10 years ago

And very well written, we should also say.

10 years ago

Your English >>> my German.

Noah Baron

10 years ago

While I don’t think I understand your method well enough to know if it’s technically sound, I think a much easier way to adjust for league would be as follows:

1. Calculate WAR for pitchers and position players separately and relative to the league they are playing in. This would mean that the AL and NL have roughly equal WAR for both position players and pitchers.

2. Use two years of interleague winning percentage (slightly regressed) to create league multipliers. Don’t forget to account for the fact that each league plays 12.3% of its games against the other league, which moves the multipliers closer to 1.

3. To adjust for league, multiply a player’s WAR by that league’s multiplier. For example, say Giancarlo Stanton has 6.1 WAR before adjusting for league. Using the .9521 NL multiplier I found in the comment section of my post, his new WAR is 5.8.

AC_Butcha_AC

10 years ago

Hey Noah,
IMO it is not beneficial to use a multiplier because this would treat every player the same. It would probably still be more accurate than the current iteration of fWAR. My upcoming second article includes an example, and there I observe that not all players receive the same credit for being in the tougher league. This is because in a lower run environment the relative value of extra bases goes up, the value of getting on base goes down, and more risk can be taken in base-stealing. This affects certain player types more than others, which a fixed multiplier for all players wouldn’t catch.
But feel free to share any more thoughts because I am in favor of knowledge and your post gave me the final push to publish here.

Noah Baron

10 years ago

Reply to AC_Butcha_AC

That’s a good point that I hadn’t thought about. Still, I doubt that this would make league multipliers inaccurate by more than a few thousandths.

I mean, how much does playing in a league that scores .24 more runs per 9 innings really affect the relative values of each event? While I’m sure the effect exists, I can’t imagine it’s too significant.

Noah Baron

10 years ago

I also disagree with the following:

“I am not a huge fan of divisional adjustments because of small samples and differing opponents. In an entire season’s interleague schedule there should be a lot more signal.”

In a given season, each league plays 300 interleague games. Likewise, each division plays 330 inter-division games within its league. There is no reason that we can’t do divisional adjustments, especially if we regress two seasons of data.

It’s a fact that certain teams benefit/suffer from playing in strong/weak divisions. Why can’t we implement a real-life phenomenon we all know exists into WAR?

AC_Butcha_AC

10 years ago

With my league adjustment, Stanton also comes in at 5.8 WAR. At the extremes, the difference in WAR using my method is something like -0.3 to +0.2. But this is the topic of my third article.

AC_Butcha_AC

10 years ago

I am certainly open to divisional adjustments. There were two reasons for not including them. One, I wanted to get my concept out there without overwhelming readers with another adjustment. Two, I may have expressed myself badly on the short-sample point.
What I wanted to say is that interleague play between the AL and NL will almost always come down to two sets of teams that are, on average, .500 teams. This will be true for all years. In divisional play there can be huge swings in talent for certain teams if they go all-in to win now or blow up their roster (Oakland). Thus we would see more variance, which means more regression. Don’t get me wrong, I would actually be in favor of a divisional adjustment because there are advantages and disadvantages, no question. But because of the variance I feel like we would have to regress so much that we would always end up close to no effect. This is unfortunately especially true because it wouldn’t make sense to use 5 years of data: you would sometimes be talking about completely different rosters, and that is not what we want to reflect.

Tangotiger

10 years ago

Excellent!

***

Noah: As for whether to multiply or add the adjustments: you have to add. Otherwise, you are saying that a 0 WAR player will remain 0 WAR regardless of which league he goes to. WAR is not some absolute number, like scoring runs. It is itself relative to something. Don’t multiply it.

***

Keep up the good work guys, I am enjoying what I see.

Noah Baron

10 years ago

Reply to Tangotiger

Good point. I’m glad you mentioned this.

Tangotiger

10 years ago

Reply to Tangotiger

Even worse: think about the negative WAR.

You would have come across these scenarios yourself, so, just saving you the time and trouble that I went through.

Noah Baron

10 years ago

Reply to Tangotiger

So it looks like you have your own, privately-operated WAR that’s only available to you and the Cubs? 🙂

I did think of the problem actually. I just couldn’t think of a better solution (like an additive league/divisional adjustment).

Tangotiger

10 years ago

Reply to Noah Baron

I encountered these kinds of issues 20-30 years ago. This happens when you deal with stuff like Linear Weights.

Noah Baron

10 years ago

Also I forgot to say (amid my criticism) that this is great work. We need as many people as possible questioning the way fWAR is calculated to improve upon it.

AC_Butcha_AC

10 years ago

I don’t take criticism as a bad thing, really. As long as you keep my mom out of it or something lol. I am just here to increase knowledge and share ideas to improve our models. Like you did with the FIP park factors. I loved it. I respect every reasonable person’s criticism and am always in favor of communication to move forward together.
