Thoughts on the MVP Award: Team-Based Value and Voter Bias
by Jonah Pemstein
November 30, 2013

You are reading this right now. That is a fact. Since you are reading this right now, many things can be reasonably inferred:

1. You probably read FanGraphs at least fairly often
2. Since you probably read FanGraphs at least fairly often, you probably know that there are a lot of differing opinions on the MVP award and that many articles here in the past week have been devoted to it.
3. You probably are quite familiar with sabermetrics
4. You probably are either a Tigers fan or think that Mike Trout should have won MVP, or both
5. You might know that Josh Donaldson got one first-place vote
6. You might even know that the first-place vote he got was by a voter from Oakland
7. You might know that Yadier Molina got two first-place votes, and they both came from voters from St. Louis
8. You might even know that one of the voters who put Molina first on his ballot put Matt Carpenter second
9. You might be wondering if there is any truth to the idea that Miguel Cabrera is much more important to his team than Mike Trout is

I have thought about many of those things myself. So, in this very long 2-part article, I am going to discuss them. Ready? Here goes:

Part 1: How much of an impact does a player have on his team?

Lots of people wanted Miguel Cabrera to win the MVP award. Some of you reading this may be shocked, but it’s actually true. One of the biggest arguments for Miguel Cabrera over Mike Trout for MVP is that Cabrera was much more important and “valuable” than Trout. Cabrera’s team made the playoffs. Trout’s team did not. Therefore anything Trout did cannot have been important. Well, let’s say too important. I don’t think that anybody’s claiming that Trout had zero impact on the game of baseball or the MLB standings whatsoever.

OK. That’s reasonable. There’s nothing flawed about that thinking when it’s not a rationale for voting Cabrera ahead of Trout for MVP.
As just a general idea, it makes sense: Cabrera had a bigger impact on baseball this year than Trout did. I, along with many other people in the sabermetric community, disagree that that’s a reason to vote for Cabrera, though. But the question I’m going to ask is this: did Cabrera have a bigger impact on his own team than Trout did?

WAR tells us no. Trout had 10.4 WAR, tops in MLB. Cabrera had 7.6 – a fantastic number, good for 5th in baseball and 3rd in the AL, as well as his own career high – but clearly not as high as Trout. Miggy’s hitting was out of this world, at least until September, and it’s pretty clear that he could have at least topped 8 WAR easily had he stayed healthy through the final month and been just as productive as he was April through August. But, fact is, he did get hurt, and did not finish with a WAR as high as Trout. So if they were both replaced with a replacement player, the Angels would suffer more than the Tigers.

Cabrera was certainly valuable – if he were replaced by a replacement player, the Tigers would lose 7 or 8 wins and would probably not have won the AL Central. But take Trout out, and the Angels go from a mediocre-to-poor team to a really bad one. The Angels had 78 wins this year, and that would have been around 68 (if we trust WAR) without Trout. That would have been the 6th worst total in the league. So, by WAR, Trout meant more to his team than Cabrera did.

But WAR is not the be all and end all of statistics (though we may like to think it is sometimes). Let’s look at this from another angle. Here’s a theory for you: the loss of a key player on a good team would probably not hurt that team as much because they’re already good to begin with. If a not-so-good team loses a key player, though, the other players on the team aren’t as good so they can’t carry the team very well.

How do we test this theory? Well, we have at our disposal a fairly accurate and useful tool to determine how many wins a team should get.
That tool is pythagorean expectation – a way of predicting wins and losses based on runs scored and allowed. So let’s see if replacing Trout with an average player (I am using average and not replacement because all the player run values given on FanGraphs are above or below average, not replacement) is more detrimental to the Angels than replacing Cabrera with an average player is to the Tigers.

The Angels, this year, scored 733 runs and allowed 737. Using the Pythagenpat (sorry to link to BP but I had to) formula, I calculated their expected win percentage, and it came out to .497 – roughly 80.6 wins and 81.4 losses*. That’s actually significantly better than they did this year, which is good news for Angels fans. But that’s not the focus right here.

Trout, this year, added 61.1 runs above average at the plate and 8.1 on the bases for a total of 69.2 runs of offense. He also saved 4.4 runs in the field (per UZR). So, using the Pythagenpat formula again with adjusted run values for if Trout were replaced by an average hitter and defender (663.8 runs scored and 741.4 runs allowed), I again calculated the Angels’ expected win percentage. This came out to be .449 – roughly 72.7 wins and 89.3 losses. 7.9 fewer wins than the original one. That’s the difference, for that specific Angels team, that Trout made.

Now, keep in mind, this is above average, not replacement, so it will be lower than WAR by a couple wins (about two WAR signifies an average player, so wins above average will be about two less than wins above replacement). 7.9 wins is a lot. But is it more than Cabrera? Let’s see.

This year, the Tigers scored 796 runs and allowed 624. This gives them a pythagorean expectation (again, Pythagenpat formula) of a win percentage of .612 – roughly 99.1 wins and 62.9 losses. Again much better than what they did this year, but also not the focus of this article.
Cabrera contributed 72.1 runs above average hitting and 4.4 runs below average on the bases for a total of 67.7 runs above average on offense. His defense was a terrible 16.8 runs below average. Now take Cabrera out of the equation. With those adjusted run totals (728.3 runs scored and 607.2 runs allowed) we get a win percentage of .583 – 94.4 wins and 67.6 losses. A difference of 4.7 wins from the original.

Talk about anticlimactic. Trout completely blew Cabrera out of the water (I would say no pun intended, but that was intended). This makes sense if we think about it – a team with more runs scored will be hurt less by x fewer runs because they are losing a lower percentage of their runs. In fact, if we pretend the Angels scored 900 runs this year instead of 733, they go from a 96.5-win team with Trout to an 89.8-win team without. Obviously, they are better in both cases, but the difference Trout makes is only 6.7 wins – pretty far from the nearly 8 he makes in real life.

The thing about this statistic is that it penalizes players on good teams. Generally, statistics such as the “Win” for pitchers are frowned upon because they measure things that the pitcher can’t control – just like this one. But if we want to measure how much a team really needs a player, which is pretty much the definition of value, I think this does a pretty good job. Obviously, it isn’t perfect: the numbers that go into it, especially the baserunning and fielding ones, aren’t always completely accurate, and when looking at the team level, straight linear weights aren’t always the way to go; overall, though, this stat gives a fairly accurate picture. The numbers aren’t totally wrong.

Here’s a look at the top four vote-getters from each league by team-adjusted wins above average (I’ll call it tWAA):

| Player | tWAA |
|---|---|
| Mike Trout | 7.9 |
| Andrew McCutchen | 6.4 |
| Paul Goldschmidt | 6.2 |
| Chris Davis | 6.1 |
| Josh Donaldson | 4.9 |
| Miguel Cabrera | 4.7 |
| Matt Carpenter | 4.0 |
| Yadier Molina | 3.1 |

This is interesting.
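For readers who want to reproduce the tWAA arithmetic, here is a minimal sketch of the with-and-without calculation described above. It assumes the common form of the Pythagenpat formula, where the exponent is runs per game raised to the 0.287 power; the article does not say which constant was used, so that value (and the helper names `pythagenpat_win_pct` and `twaa`) are assumptions for illustration. Under that assumption the results should land close to the 7.9 and 4.7 figures quoted for Trout and Cabrera.

```python
def pythagenpat_win_pct(runs_scored, runs_allowed, games=162, power=0.287):
    """Expected winning percentage from runs scored and allowed.

    `power` is an assumed constant (0.287 is a commonly cited value);
    the article does not state which one was used.
    """
    # The exponent scales with the run environment: total runs per game.
    x = ((runs_scored + runs_allowed) / games) ** power
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)


def twaa(team_rs, team_ra, offense_raa, defense_raa, games=162):
    """Expected wins lost when a player is swapped for an average one."""
    wins_with = pythagenpat_win_pct(team_rs, team_ra, games) * games
    # An average player scores `offense_raa` fewer runs and allows
    # `defense_raa` more runs (fewer, if the fielder was below average).
    wins_without = pythagenpat_win_pct(
        team_rs - offense_raa, team_ra + defense_raa, games
    ) * games
    return wins_with - wins_without


# Run values taken from the article (batting + baserunning, then fielding).
trout = twaa(733, 737, offense_raa=61.1 + 8.1, defense_raa=4.4)
cabrera = twaa(796, 624, offense_raa=72.1 - 4.4, defense_raa=-16.8)
print(f"Trout tWAA:   {trout:.1f}")
print(f"Cabrera tWAA: {cabrera:.1f}")
```

Passing a different `team_rs` (for example 900 for the hypothetical high-scoring Angels) shows how the same player is worth fewer wins on a team that already scores a lot.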
As expected, the players on better teams have a lower tWAA than the ones on worse teams, just as we discussed earlier. One notable player is Yadier Molina, who despite being considered one of the best catchers in the game, if not the best, has the lowest tWAA of anyone on that list. This may be because he missed some time. But let’s look at it a little closer: if we add the 2 wins that an average player would provide over a replacement-level player, we get 5.1 WAR, which isn’t so far off of his 5.6 total from this year. And the Cardinals’ pythagorean expectation was 101 wins, so obviously under this system he won’t be credited as much because his runs aren’t as valuable to his team.

Another factor is that we’re not adjusting by position here (except for the fielding part), and Molina is worth more runs offensively above the average catcher than he is above the average hitter, since catchers generally aren’t as good at hitting. But if Molina was replaced with an average catcher, I’m fairly certain that the Cardinals would lose more than the 3 games that this number suggests. They might miss Molina’s game calling skills – if such a thing exists – and there’s no way to quantify how much Molina has helped the Cardinal pitchers improve, especially since they have so many rookies.

But there’s also something else, something we can quantify, even if not perfectly. And that’s pitch framing. Let’s add the 19.8 runs that Molina saved (measured by Statcorner) to Molina’s defensive runs saved (for which, by the way, I used the Fielding Bible’s DRS, since there is no UZR for catchers – that may be another reason Molina’s number may seem out of place, because DRS and UZR don’t always agree; Trout’s 2013 UZR was 4.4, and his DRS was -9. Molina did play 18 innings at first base, where he had a UZR of -0.2. We’ll ignore that, though, since it is such a small sample size and won’t make such a big difference).
Here is the table with only Molina’s tWAA changed, to account for pitch framing:

| Player | tWAA |
|---|---|
| Mike Trout | 7.9 |
| Andrew McCutchen | 6.4 |
| Paul Goldschmidt | 6.2 |
| Chris Davis | 6.1 |
| Yadier Molina | 5.4 |
| Josh Donaldson | 4.9 |
| Miguel Cabrera | 4.7 |
| Matt Carpenter | 3.9 |

Now we see Molina move up into 5th place out of 8 with a much better tWAA of 5.4 – more than 2 wins better than without the pitch framing, and about 7.4 WAR if we want to convert from wins above average to wins above replacement. Interesting. I don’t want to get into a whole argument now about whether pitch framing is accurate or actually based mostly on skill instead of luck, or whether it should be included in a catcher’s defensive numbers when we talk about their total defense. I’m just putting that data out there for you to think about.

But as I mentioned before, I used DRS for Molina and not UZR. What if we try to make this list more consistent and use DRS for everyone? (We can’t use UZR for everyone.) Let’s see:

| Player | tWAA | DRS | UZR |
|---|---|---|---|
| Mike Trout | 6.5 | -9 | 4.4 |
| Andrew McCutchen | 6.4 | 7 | 6.9 |
| Paul Goldschmidt | 7.0 | 13 | 5.4 |
| Chris Davis | 5.5 | -7 | -1.2 |
| Molina w/ Framing | 5.4 | 31.8 | N/A |
| Josh Donaldson | 5.0 | 11 | 9.9 |
| Miguel Cabrera | 4.6 | -18 | -16.8 |
| Matt Carpenter | 4.1 | 0 | -0.9 |
| Yadier Molina | 3.1 | 12 | N/A |

We see Trout go down by almost a win and a half here. I don’t really trust that, though, because I really don’t think that Mike Trout is a significantly below average fielder, despite what DRS tells me. DRS actually gave Trout a rating of 21 in 2012, and a 30-run swing from one year to the next makes me trust it less. But for the sake of consistency, I’m showing you those numbers too, with the DRS and UZR comparison so you can see why certain people lost/gained wins.

OK. So I think we have a pretty good sense for who was most valuable to their teams. But I also think we can improve this statistic a little bit more. Like I said earlier, the hitting number I use – wRAA – is based on league average, not on position average.
In other words, if Chris Davis is 56.3 runs better than the average hitter, but we replace him with the average first baseman, that average first baseman is already going to be a few runs better than the average player. So what if we use weighted runs above position average? wRAA is calculated by subtracting the league-average wOBA from a player’s wOBA, dividing by the wOBA scale, and multiplying by plate appearances. What I did was subtract the position average wOBA from the player’s wOBA instead. So that penalizes players at positions where the position average wOBA is high. Here’s your data (for the defensive numbers I used UZR because I think it was better than DRS, even though the metric wasn’t the same for everyone):

| Player | Position-adj. tWAA | Pos-adj. wRAA | wRAA |
|---|---|---|---|
| Trout | 7.7 | 59.4 | 61.1 |
| McCutchen | 6.2 | 40.1 | 41.7 |
| Molina w/ Framing | 5.6 | 23.3 | 20.5 |
| Goldschmidt | 5.0 | 39.5 | 50.1 |
| Davis | 5.0 | 46.4 | 56.3 |
| Donaldson | 4.9 | 36.6 | 36.7 |
| Cabrera | 4.7 | 72.0 | 72.1 |
| Carpenter** | 4.3 | 41.7 | 37.8 |
| Molina | 3.4 | 23.3 | 20.5 |

I included here both the regular and position-adjusted wRAA for all players for reference. Chris Davis and Paul Goldschmidt suffered pretty heavily – each lost over a win of production – because the average first baseman is a much better hitter than the average player. Molina got a little better, as did Carpenter, because they play positions where the average player isn’t as good offensively. Everyone else stayed almost the same, though.

I think this position-adjusted tWAA is probably the most accurate. And I would also use the number with pitch framing included for Molina. It’s up to you to decide which one you like best – if you like any of them at all. Maybe you have a better idea, in which case you should let me know in the comments.

Part 2: Determining voter bias in the MVP award

As I mentioned in my introduction, Josh Donaldson got one first-place MVP vote – from an Oakland writer. Yadier Molina got 2 – both from St. Louis writers.
Matt Carpenter got 1 second-place vote – also from a St. Louis writer. Obviously, voters have their bias when it comes to voting for MVP. But how much does that actually matter?

The way MVP voting works is that for each league, AL and NL, two sportswriters who are members of the BBWAA are chosen from each location that has a team in that league – 15 locations per league times 2 voters per location equals 30 voters total for each league. That way you won’t end up with a lot of voters or very few voters from one place who may be biased one way or another.

But is there really voter bias? In order to answer this question, I took all players who received MVP votes this year (of which there were 49) and measured how many points each of them got per 2 voters***. Then I took the amount of points that each of them got from the voters from their chapter and found the difference. Here’s what I found:

AL:

| Player, Club | City | Points | Points/2 voters | Points from city voters | % Homer votes | Homer difference |
|---|---|---|---|---|---|---|
| Josh Donaldson, Athletics | OAK | 222 | 14.80 | 22 | 9.91% | 7.20 |
| Mike Trout, Angels | LA | 282 | 18.80 | 23 | 8.16% | 4.20 |
| Evan Longoria, Rays | TB | 103 | 6.87 | 11 | 10.68% | 4.13 |
| David Ortiz, Red Sox | BOS | 47 | 3.13 | 7 | 14.89% | 3.87 |
| Adam Jones, Orioles | BAL | 9 | 0.60 | 3 | 33.33% | 2.40 |
| Miguel Cabrera, Tigers | DET | 385 | 25.67 | 28 | 7.27% | 2.33 |
| Coco Crisp, Athletics | OAK | 3 | 0.20 | 2 | 66.67% | 1.80 |
| Edwin Encarnacion, Blue Jays | TOR | 7 | 0.47 | 2 | 28.57% | 1.53 |
| Max Scherzer, Tigers | DET | 25 | 1.67 | 3 | 12.00% | 1.33 |
| Salvador Perez, Royals | KC | 1 | 0.07 | 1 | 100.00% | 0.93 |
| Koji Uehara, Red Sox | BOS | 2 | 0.13 | 1 | 50.00% | 0.87 |
| Chris Davis, Orioles | BAL | 232 | 15.47 | 16 | 6.90% | 0.53 |
| Adrian Beltre, Rangers | TEX | 99 | 6.60 | 7 | 7.07% | 0.40 |
| Yu Darvish, Rangers | TEX | 1 | 0.07 | 0 | 0.00% | -0.07 |
| Felix Hernandez, Mariners | SEA | 1 | 0.07 | 0 | 0.00% | -0.07 |
| Shane Victorino, Red Sox | BOS | 1 | 0.07 | 0 | 0.00% | -0.07 |
| Jason Kipnis, Indians | CLE | 31 | 2.07 | 2 | 6.45% | -0.07 |
| Torii Hunter, Tigers | DET | 2 | 0.13 | 0 | 0.00% | -0.13 |
| Hisashi Iwakuma, Mariners | SEA | 2 | 0.13 | 0 | 0.00% | -0.13 |
| Greg Holland, Royals | KC | 3 | 0.20 | 0 | 0.00% | -0.20 |
| Carlos Santana, Indians | CLE | 3 | 0.20 | 0 | 0.00% | -0.20 |
| Jacoby Ellsbury, Red Sox | BOS | 3 | 0.20 | 0 | 0.00% | -0.20 |
| Dustin Pedroia, Red Sox | BOS | 99 | 6.60 | 5 | 5.05% | -1.60 |
| Manny Machado, Orioles | BAL | 57 | 3.80 | 2 | 3.51% | -1.80 |
| Robinson Cano, Yankees | NY | 150 | 10.00 | 8 | 5.33% | -2.00 |

NL:

| Player, Club | City | Points | Points/2 voters | Points from city voters | % Homer votes | Homer difference |
|---|---|---|---|---|---|---|
| Yadier Molina, Cardinals | STL | 219 | 14.60 | 28 | 12.79% | 13.40 |
| Hanley Ramirez, Dodgers | LA | 58 | 3.87 | 7 | 12.07% | 3.13 |
| Joey Votto, Reds | CIN | 149 | 9.93 | 13 | 8.72% | 3.07 |
| Allen Craig, Cardinals | STL | 4 | 0.27 | 3 | 75.00% | 2.73 |
| Jayson Werth, Nationals | WAS | 20 | 1.33 | 4 | 20.00% | 2.67 |
| Hunter Pence, Giants | SF | 7 | 0.47 | 3 | 42.86% | 2.53 |
| Yasiel Puig, Dodgers | LA | 10 | 0.67 | 3 | 30.00% | 2.33 |
| Matt Carpenter, Cardinals | STL | 194 | 12.93 | 15 | 7.73% | 2.07 |
| Andrelton Simmons, Braves | ATL | 14 | 0.93 | 2 | 14.29% | 1.07 |
| Paul Goldschmidt, D-backs | ARI | 242 | 16.13 | 17 | 7.02% | 0.87 |
| Michael Cuddyer, Rockies | COL | 3 | 0.20 | 1 | 33.33% | 0.80 |
| Andrew McCutchen, Pirates | PIT | 409 | 27.27 | 28 | 6.85% | 0.73 |
| Clayton Kershaw, Dodgers | LA | 146 | 9.73 | 10 | 6.85% | 0.27 |
| Craig Kimbrel, Braves | ATL | 27 | 1.80 | 2 | 7.41% | 0.20 |
| Russell Martin, Pirates | PIT | 1 | 0.07 | 0 | 0.00% | -0.07 |
| Matt Holliday, Cardinals | STL | 2 | 0.13 | 0 | 0.00% | -0.13 |
| Buster Posey, Giants | SF | 3 | 0.20 | 0 | 0.00% | -0.20 |
| Adam Wainwright, Cardinals | STL | 3 | 0.20 | 0 | 0.00% | -0.20 |
| Adrian Gonzalez, Dodgers | LA | 4 | 0.27 | 0 | 0.00% | -0.27 |
| Troy Tulowitzki, Rockies | COL | 5 | 0.33 | 0 | 0.00% | -0.33 |
| Shin Soo Choo, Reds | CIN | 23 | 1.53 | 1 | 4.35% | -0.53 |
| Jay Bruce, Reds | CIN | 30 | 2.00 | 1 | 3.33% | -1.00 |
| Carlos Gomez, Brewers | MIL | 43 | 2.87 | 1 | 2.33% | -1.87 |
| Freddie Freeman, Braves | ATL | 154 | 10.27 | 8 | 5.19% | -2.27 |

Where points is total points received, points/2 voters is points per two voters (points/15), points from city voters is points received from the voters in the player’s city, % homer votes is the percentage of a player’s points that came from voters in his city, and homer difference is the difference between points/2 voters and points from city voters. Charts are sorted by homer difference.

I don’t know that there’s all that much we can draw from this.
Obviously, voters are more likely to vote for players from their own city, but that’s to be expected. Voting was a little bit less biased in the AL – the average player received exactly 1 point more from voters in their city than from all voters in the AL, whereas that number in the NL was 1.21. 8.08% of all votes in the AL came from homers compared to 8.31% in the NL.

If you’re wondering which cities were the most biased, here’s a look:

AL:

| City | Points | Points/2 voters | Points from city voters | Difference |
|---|---|---|---|---|
| OAK | 225 | 15.00 | 24 | 9.00 |
| LA | 282 | 18.80 | 23 | 4.20 |
| TB | 103 | 6.87 | 11 | 4.13 |
| DET | 412 | 27.47 | 31 | 3.53 |
| BOS | 152 | 10.13 | 13 | 2.87 |
| TOR | 7 | 0.47 | 2 | 1.53 |
| BAL | 298 | 19.87 | 21 | 1.13 |
| KC | 4 | 0.27 | 1 | 0.73 |
| TEX | 100 | 6.67 | 7 | 0.33 |
| SEA | 3 | 0.20 | 0 | -0.20 |
| CLE | 34 | 2.27 | 2 | -0.27 |
| NY | 150 | 10.00 | 8 | -2.00 |

NL:

| City | Points | Points/2 voters | Points from city voters | Difference |
|---|---|---|---|---|
| STL | 422 | 28.13 | 46 | 17.87 |
| LA | 218 | 14.53 | 20 | 5.47 |
| WAS | 20 | 1.33 | 4 | 2.67 |
| SF | 10 | 0.67 | 3 | 2.33 |
| CIN | 202 | 13.47 | 15 | 1.53 |
| ARI | 242 | 16.13 | 17 | 0.87 |
| PIT | 410 | 27.33 | 28 | 0.67 |
| COL | 8 | 0.53 | 1 | 0.47 |
| ATL | 195 | 13.00 | 12 | -1.00 |
| MIL | 43 | 2.87 | 1 | -1.87 |

Where all these numbers are just the sum of the individual numbers for all players in that city.

If you’re wondering what players have benefited the most from homers in the past 2 years, check out this article by Reuben Fischer-Baum over at Deadspin’s Regressing that I found while looking up more info. He basically used the same method I did, only for 2012 as well (the first year that individual voting data was publicized).

So that’s all for this article. Hope you enjoyed.

----------

*I’m using fractions of wins because that gives us a more accurate number for the statistic I introduce by measuring it to the tenth and not to the single digit. Obviously a team can’t win .6 games in real life but we aren’t concerned with how many games the team won in real life, only their runs scored and allowed.

**Carpenter spent time both at second base and third base, so I used the equation (Innings played at 3B*average wOBA for 3rd basemen + Innings played at 2B*average wOBA for 2nd basemen)/(Innings played at 3B + Innings played at 2B) to get Carpenter’s “custom” position-average wOBA. He did play some other positions too, but very few innings at each of them so I didn’t include those. It came out to about .307.

***Voting works as follows: each voter puts 10 people on their ballot, with the points going 14-9-8-7-6-5-4-3-2-1.
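
As a rough illustration of how the ballot points feed into the homer numbers in Part 2, here is a minimal sketch of the per-player calculation. The function name `homer_stats` and the constant names are made up for this example; the 30-voter league size, the points scale and the sample inputs (Molina's 219 total points and 28 points from St. Louis voters) come from the article.

```python
# Points awarded for ballot slots 1 through 10, per the footnote above.
BALLOT_POINTS = [14, 9, 8, 7, 6, 5, 4, 3, 2, 1]
VOTERS_PER_LEAGUE = 30  # 15 cities times 2 voters each


def homer_stats(total_points, city_points, voters=VOTERS_PER_LEAGUE):
    """Return (points per 2 voters, % homer points, homer difference)."""
    # Average points a player received from any pair of voters.
    per_two_voters = total_points / (voters / 2)
    # Share of the player's points that came from his own city's voters.
    pct_homer = 100 * city_points / total_points
    # How far the home-city pair sits above (or below) the average pair.
    homer_difference = city_points - per_two_voters
    return per_two_voters, pct_homer, homer_difference


# Two first-place votes are worth 2 * 14 = 28 points, which is what
# Molina received from the two St. Louis voters.
per_two, pct, diff = homer_stats(total_points=219, city_points=2 * BALLOT_POINTS[0])
print(f"{per_two:.2f} points per 2 voters, {pct:.2f}% homer, difference {diff:.2f}")
```

The city-level tables can be reproduced the same way by summing total points and city points over every player from that city before calling the function.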