“Alex Bregman is the runaway AL MVP for 2019” – MVP voters from the 1950s & 1960s
“Mike Trout finishes a disappointing 5th in 2019 AL MVP voting” – MVP voters from the 1960s & 1970s
“Christian Yelich is the near-unanimous 2019 NL MVP” – MVP voters from the 1980s & 1990s
“Xander Bogaerts narrowly misses the 2019 AL MVP” – MVP voters from the 1960s & 1970s
The Evolving MVP Voter’s Criteria
The winner of each MLB MVP award is a function of two factors: how the players performed, and how the electorate evaluated that performance.
Much attention is paid to how players perform and how they stack up historically against peers from different eras. For MVP selection, though, little attention has been paid to how the electorate has changed and, with it, the definition of the Most Valuable Player.
Since 1931, the Baseball Writers’ Association of America (BBWAA) has voted on and awarded each league’s MVP award. Over this period of time, the world’s understanding of player performance and what contributes to winning has changed dramatically. The 1931 voters probably looked at the home run, RBI, and batting average leaderboards printed at year-end in their daily newspaper before filling out their ballots. That’s not to accuse them of being narrow-minded; it was just all they had available to them and all the baseball world knew to look at.
On the other hand, the 2019 voter (hopefully) spent at least a few minutes on FanGraphs or a similar site looking at things like WAR, wRC+, and DRS. At best, they also considered advanced Statcast data, and maybe even built their own AI-powered simulations to model a season without the player and see how much worse his team would have performed. At least, that’s what I would do if I had a ballot, and that’s what I would call “responsible voting” in 2019.
Seeing 2019 Through the Eyes of Prior Periods
I believe the lens through which the MVP voter sees the world has changed since 1931. What might have been deemed an MVP-level performance in the 1930s might not be so highly valued today, and players who deserve MVP status in 2019 might have been much less appreciated in the 1940s.
To test that, I turned to AI tools from DataRobot to predict who MVP voters from different eras would’ve voted for in the highly competitive 2019 races. Using player performance statistics from FanGraphs and actual MVP results, I built AI models that predicted the probability that a single player would be elected MVP within 20-year periods between 1931 and 2018. In other words, one model would predict MVPs based on the criteria established by the 1940-1959 electorate, and a different model would replicate the criteria of the 1960-1979 electorate. I then fed actual 2019 performance results into each of those models to see who the voters from those periods would’ve elected if they had a ballot in 2019.
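For readers who want to see the mechanics, here is a minimal sketch of the one-model-per-period idea. It is not the DataRobot pipeline behind this post: it swaps in a tiny hand-rolled logistic regression, the rows are made-up standardized numbers rather than FanGraphs data, and names such as `period_for`, `train_by_period`, and the `rbi`/`off` columns are hypothetical.

```python
import math
from collections import defaultdict

# Voting periods as they appear in the results (assumed inclusive year ranges).
PERIODS = [(1940, 1959), (1960, 1979), (1980, 1999), (2000, 2018)]
FEATURES = ["rbi", "off"]  # hypothetical, already-standardized stat columns


def period_for(year):
    """Return the 'start-end' label of the period containing year, else None."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, features, epochs=2000, lr=0.1):
    """Fit a plain logistic regression with batch gradient descent."""
    weights, bias, n = [0.0] * len(features), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(features), 0.0
        for row in rows:
            x = [row[f] for f in features]
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - row["mvp"]
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(model, row, features):
    """Probability that this player-season wins MVP under one period's model."""
    weights, bias = model
    return sigmoid(bias + sum(w * row[f] for w, f in zip(weights, features)))


def train_by_period(seasons, features):
    """Group player-seasons by voting period and fit one model per period."""
    grouped = defaultdict(list)
    for row in seasons:
        label = period_for(row["year"])
        if label is not None:
            grouped[label].append(row)
    return {label: train_logistic(rows, features) for label, rows in grouped.items()}


# Synthetic player-seasons (NOT real data): the earlier voters reward RBI,
# the later voters reward all-around offense.
toy_seasons = [
    {"year": 1965, "rbi": 1.5, "off": 0.2, "mvp": 1},
    {"year": 1965, "rbi": -0.5, "off": 1.4, "mvp": 0},
    {"year": 1972, "rbi": 1.2, "off": -0.3, "mvp": 1},
    {"year": 1972, "rbi": -1.0, "off": 1.0, "mvp": 0},
    {"year": 2005, "rbi": 1.4, "off": 0.1, "mvp": 0},
    {"year": 2005, "rbi": -0.4, "off": 1.6, "mvp": 1},
    {"year": 2012, "rbi": 1.0, "off": -0.2, "mvp": 0},
    {"year": 2012, "rbi": -0.8, "off": 1.3, "mvp": 1},
]

models = train_by_period(toy_seasons, FEATURES)

# Two made-up "2019" stat lines scored by every period's model.
candidates = {
    "Slugger A": {"rbi": 1.3, "off": 0.0},
    "All-rounder B": {"rbi": -0.6, "off": 1.5},
}
for label, model in sorted(models.items()):
    scores = {name: predict(model, stats, FEATURES) for name, stats in candidates.items()}
    print(label, "favors", max(scores, key=scores.get))
```

The same stat line gets a different MVP probability depending on which period's model scores it, which is the whole point of the exercise.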
How Period Voters Would’ve Voted in 2019
The results of these simulations are below. Each 20-year period has its own column, and the bars indicate the likelihood of MVP election for that player.
Note: For simplicity, I only considered position players for MVP voting. Adding pitchers would’ve made it overly complex, if not impossible, to establish cross-position, cross-period criteria.
Mike Trout would’ve been selected AL MVP by the 1980-1999 voters and the 2000-2018 voters, but he would’ve been far less appreciated by the voters of 1940-1979, who would’ve selected Alex Bregman. Just as interesting, the 1960-1979 voters would’ve loved 2019 Xander Bogaerts and given him a close second-place finish.
In the NL, the actual selection of Cody Bellinger should’ve been a major surprise in 2019, since the recent trends of the 2000-2018 electorate pointed towards Christian Yelich instead, as did those of the 1980-1999 voters. The 1960-1979 voters would’ve selected Bellinger, but the 1940-1959 voters would’ve actually favored Anthony Rendon narrowly over Nolan Arenado and Christian Yelich.
What Voters Cared About by Period
So how have voter criteria changed over time? I charted each player performance stat that went into the AI models to show its relative importance by period. I’ve pre-selected the most important stats (aka features), but you can also add or remove any stat to see how its importance changed over time.
- Def (Defensive Runs Above Average): This has steadily declined in importance over time, and can explain the early-period preferences for Bregman over Trout.
- RBI (Runs Batted In): The most important stat in the 1960-1979 period, but it has become all but irrelevant recently. This tended to give players on better teams (e.g. Astros, Nationals, Dodgers, Brewers) more of an advantage since they’d have more RBI opportunities.
- Off (Offensive Runs Above Average): This advanced, all-encompassing stat was all but invisible in earlier periods, but it has taken on paramount importance in the “Moneyball era”. Similar stories can be told for wOBA and wRC+.
- WAR (Wins Above Replacement): Interestingly, this has taken on less importance recently. That’s not because it doesn’t matter but, I think, because it has been crowded out by other advanced stats (e.g. Offensive Runs Above Average and wRC+).
- AVG (Batting Average): What used to be a key stat in all matters of baseball discourse has become a non-factor in MVP selection.
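One common way to put a number on “relative importance” is permutation importance: shuffle one stat’s column, see how much worse the model’s predictions get, and scale the results so the top stat scores 1.0. The sketch below illustrates that idea only. Whether the modeling tool behind this post computes importance exactly this way is an assumption, and `hypothetical_model` and the toy rows are invented for the example.

```python
import math
import random


def hypothetical_model(row):
    """Stand-in for one period's fitted model: leans on RBI, a little on AVG, ignores Def."""
    z = 3.0 * row["rbi"] + 0.5 * row["avg"]
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(model, rows, labels):
    """Average log loss of the model's MVP probabilities (lower is better)."""
    eps = 1e-12
    total = 0.0
    for row, y in zip(rows, labels):
        p = min(max(model(row), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)


def permutation_importance(model, rows, labels, features, repeats=20, seed=0):
    """Mean increase in log loss when a feature is shuffled, scaled so the top feature is 1.0."""
    rng = random.Random(seed)
    baseline = log_loss(model, rows, labels)
    raw = {}
    for feature in features:
        total_increase = 0.0
        for _ in range(repeats):
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            permuted = [{**row, feature: value} for row, value in zip(rows, column)]
            total_increase += log_loss(model, permuted, labels) - baseline
        # A negative average means shuffling did no harm; treat it as zero importance.
        raw[feature] = max(0.0, total_increase / repeats)
    top = max(raw.values()) or 1.0
    return {feature: value / top for feature, value in raw.items()}


# Synthetic, standardized stat lines (NOT real data); label 1 marks an MVP season.
rbi = [1.5, 1.0, 0.8, 0.3, -0.2, -0.7, -1.1, -1.6]
avg = [0.2, -0.5, 1.0, -1.2, 0.7, -0.3, 1.1, -0.9]
dfn = [0.4, -1.0, 0.9, 0.1, -0.6, 1.3, -0.2, 0.5]
rows = [{"rbi": r, "avg": a, "def": d} for r, a, d in zip(rbi, avg, dfn)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

importance = permutation_importance(hypothetical_model, rows, labels, ["rbi", "avg", "def"])
for feature, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {score:.2f}")
```

Running the same calculation against each period’s model and lining up the scores side by side gives the kind of by-period comparison described in the list above.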
Detailed Statistic Importance by Period
Lastly, I’ve included the detailed relative importance from my AI models for each period of voters. The charts below show how much each stat affects the final prediction, with larger blue bars showing more importance. This is a detailed break-out of the “Feature Importance” interactive chart above.
[Charts not reproduced here: detailed stat importance for the 1940 – 1959, 1960 – 1979, and 1980 – 1999 voters]
So What?
MVP voting criteria have definitely changed. This has probably been influenced by contextual factors such as pitcher strength, ballpark design, baseball and bat liveliness, and even rule changes such as changing the height of the mound in 1969. However, a large part of the effect has to be our increased understanding of what contributes to winning and which players actually score and prevent runs the best.
I also think there is no doubt that these criteria will continue to improve as we learn more and remove “luck and chance” from our understanding of how outcomes happen. Perhaps Statcast data will improve, as will our understanding of it and our computational power to harness it; or perhaps biometric data will start creeping in and we’ll be able to evaluate offensive performance vs. pitcher fatigue. Who knows, perhaps David Bote was actually grossly under-appreciated in 2019 because he hit the best when controlling for the relative tendon flexibility of the RHPs he faced. The 2019 voters wouldn’t know this, but the 2050 voters might.
This post originally ran with interactive charts at baseball-pop.com
I write about baseball because it's fun. My writing is very data-driven, which is often not fun. I aspire to someday write the 'Infinite Jest' of baseball advanced analytics. I'm also a technology transformation consultant and owner of a very shed-y dog.