Do MVP Voters Look at Some Stats Above Others?

The regression I ran tests whether sabermetric statistics, specifically Wins Above Replacement (WAR), have a greater impact on MVP voting than traditional statistics. This matters to the sport because MVP voting shapes a player's reputation. It also affects how each major-league front office goes about acquiring players. In fact, a player's salary can be affected by MVP voting, especially if he is in the last year of his contract and about to become a free agent. In turn, acquiring high-level or MVP-type players can improve overall team performance, which can increase attendance and, therefore, team revenue.

The data set I chose is the 2014 MVP voting results for both the American and National Leagues, together with the individual statistics of each player who received votes. The independent variables are the player statistics (batting average, home runs, RBI, SLG, OBP, stolen bases, and WAR), and the dependent variable is the number of voting points each player received. The idea is that certain statistics are bound to affect whether one player receives more votes than another. Essentially, what I am trying to show is that one set of statistics is a better indicator of player ability and contribution than the other.

Bill James was one of the first to expound upon sabermetrics when he wrote a series of books known as Baseball Abstract in the 1980s. Other baseball historians, such as Pete Palmer and John Thorn, have written books introducing and detailing sabermetric statistics. While many books and studies have covered sabermetrics, I have not found one that examines how accurate sabermetric statistics are and how much influence they have on statisticians, writers, fans, and teams.

For this regression, I analyzed only position players (non-pitchers), because pitchers are evaluated with a different set of statistics and would need a separate analysis. After running the regression, it appears that WAR has a greater impact on MVP voting than home runs, RBI, batting average, and stolen bases. However, the two statistics that seem to have the greatest impact on MVP voting are on-base percentage (OBP) and slugging percentage (SLG). WAR has a positive slope of 35.9, while SLG has a positive slope of 2,535.7. These slopes are in different units, though: SLG in this sample runs only from .430 to .581, while WAR runs from 3.3 to 7.9, so the raw slopes are not directly comparable. The coefficient of correlation (R) is 0.87, which indicates a strong positive relationship between the statistics, taken together, and MVP voting. The coefficient of determination (R^2) is 0.76, which means that about 76% of the variation in MVP voting points is explained by the statistics in the model. For instance, in the American League, Mike Trout led the league in WAR and RBI and was third in SLG. Since WAR and SLG were among the most impactful statistics, they likely contributed to Trout being named the MVP. Overall, the relationship is positive, and some statistics have a noticeably higher impact on MVP voting than others. Going by slope alone, SLG seems to be the most impactful statistic, and stolen bases the least.
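As a quick check on those figures, here is a minimal Python sketch that recomputes R^2, R, and adjusted R^2 from the sums of squares in the ANOVA table of the summary output. The variable names are my own, and the adjusted R^2 formula is the standard one, which I am assuming is what the spreadsheet used.

```python
import math

# Sums of squares copied from the ANOVA table in the summary output.
ss_regression = 222087.3
ss_total = 291788.2
n_obs = 29         # players in the regression
n_predictors = 7   # statistics used as x variables

# R^2 is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total
multiple_r = math.sqrt(r_squared)

# Adjusted R^2 penalizes the fit for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"R^2          = {r_squared:.4f}")
print(f"R            = {multiple_r:.4f}")
print(f"Adjusted R^2 = {adjusted_r_squared:.4f}")
```

The three values should line up with the R Square, Multiple R, and Adjusted R Square lines of the summary output.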

After analyzing the results of the regression, I ran a hypothesis test on the population coefficient of correlation (ρ) at the 0.05 level of significance. The null hypothesis was that ρ = 0; in other words, that there is no significant relationship between a given statistic and MVP voting. The alternative hypothesis was that ρ ≠ 0 (either ρ > 0 or ρ < 0), meaning that there is a significant relationship. With 21 degrees of freedom, the t-critical value is about 2.1. I tested each statistic individually and found a significant relationship between MVP voting and both SLG and WAR, whose t-calc values (2.93 and 3.76) are greater than 2.1. No other variable in the regression output clears that cutoff; the closest has a t-calc of 2.05 and falls just short.
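The individual t tests can be sketched in a few lines of Python. Each t-calc is the coefficient divided by its standard error, both copied from the summary output, which labels the predictors only as X Variable 1 through 7 (the slopes quoted earlier identify X Variable 6 as SLG and X Variable 7 as WAR). The critical value 2.080 is the two-tailed 0.05 value for 21 degrees of freedom from a standard t table; the function and variable names are hypothetical.

```python
# (coefficient, standard error) pairs copied from the summary output.
coefficients = {
    "X Variable 1": (-9.48922, 4.745055),
    "X Variable 2": (2.36734, 1.153175),
    "X Variable 3": (1.520176, 1.123229),
    "X Variable 4": (-2356.53, 1163.345),
    "X Variable 5": (461.8825, 539.878),
    "X Variable 6": (2535.698, 864.2199),
    "X Variable 7": (35.88267, 9.544159),
}

T_CRITICAL = 2.080  # two-tailed, alpha = 0.05, 21 degrees of freedom (t table)


def t_statistic(coef, std_err):
    """t-calc for testing whether a single slope differs from zero."""
    return coef / std_err


significant = []
for name, (coef, std_err) in coefficients.items():
    t_calc = t_statistic(coef, std_err)
    is_significant = abs(t_calc) > T_CRITICAL
    if is_significant:
        significant.append(name)
    print(f"{name}: t-calc = {t_calc:7.3f}  significant = {is_significant}")
```

Comparing the absolute value of t-calc against the cutoff treats the test as two-tailed, which matches an alternative hypothesis that allows either sign.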

To further test this theory, I also ran an ANOVA F test at the 0.05 level of significance. Here the null hypothesis is that none of the statistics, taken together, explains the variation in MVP voting. The numerator degrees of freedom are 7 (one for each statistic) and the denominator degrees of freedom are 21, so the F-critical value is about 2.5. F-calc from the ANOVA was 9.6. Since F-calc is greater than the critical value, we reject the null hypothesis and conclude, once again, that there is a significant relationship between certain statistics and MVP voting.
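For anyone who wants to retrace the F test, this short Python sketch rebuilds F-calc from the sums of squares and degrees of freedom in the ANOVA table. The critical value of 2.5 is the rounded figure quoted above rather than something the code computes, and the names are my own.

```python
# Values copied from the ANOVA table in the summary output.
ss_regression = 222087.3
ss_residual = 69700.89
df_regression = 7   # one per statistic in the model
df_residual = 21    # 29 observations - 7 predictors - 1

F_CRITICAL = 2.5    # approximate upper 5% point of F(7, 21), as quoted in the text

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_calc = ms_regression / ms_residual
reject_null = f_calc > F_CRITICAL

print(f"MS regression = {ms_regression:.2f}")
print(f"MS residual   = {ms_residual:.2f}")
print(f"F-calc        = {f_calc:.2f}")
print(f"Reject null   = {reject_null}")
```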

Next, I checked the assumptions behind least squares regression. There are three: normality, homoscedasticity, and independence of the residuals. To test for normality, I looked at the normal probability plot. The points on this plot curve slightly instead of following a straight line, which suggests that the residuals are not normally distributed. To test for homoscedasticity, I looked at the residual plot for each of the x variables. Since the spread of the residuals neither increases nor decreases as x increases for most of these variables, the residuals appear to be homoscedastic. To test for independence, you have to run another regression. This time it is a simple regression on the residuals, in which each residual is the x variable for the next one. This also calls for a hypothesis test. The null hypothesis is that bi = 0, and the alternative hypothesis is that bi is not 0 (bi > 0 or bi < 0). If bi is equal to 0, then the residuals are independent. The level of significance is 0.05 and the degrees of freedom would be 30. The t-critical value came out to be about 1.7. T-calc turned out to be greater, which means that the residuals are not independent.
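The independence check can be sketched as follows: regress each residual on the one before it and compute t-calc for the slope. The residuals are copied from the residual output, in the order listed there (American League players first, then National League). This is only one way to set up the test. In particular, the degrees of freedom here are taken as the number of residual pairs minus two, which is an assumption of this sketch and may not match the setup behind the figures quoted above. The function name is hypothetical.

```python
import math

# Residuals copied from the residual output, observations 1 through 29.
residuals = [
    78.55718, 69.7867, -17.4449, -63.8821, 38.02878, -47.1591, 12.58622,
    -43.984, -70.777, -28.813, -44.0505, 32.60158, -88.0989, -7.87683,
    44.30355, 38.80738, 46.3006, 21.37027, 93.59133, -21.5518, -17.8934,
    -67.3838, 78.53756, 28.78478, 25.32685, -24.8482, 13.7288, -48.2716,
    -30.2761,
]


def lag1_regression(resid):
    """Simple regression of each residual (y) on the previous residual (x).

    Returns the slope, its t statistic, and the degrees of freedom.
    """
    x = resid[:-1]
    y = resid[1:]
    m = len(x)
    x_mean = sum(x) / m
    y_mean = sum(y) / m
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = m - 2
    std_err = math.sqrt(sse / df / sxx)
    return slope, slope / std_err, df


slope, t_calc, df = lag1_regression(residuals)
print(f"slope = {slope:.4f}, t-calc = {t_calc:.3f}, df = {df}")
```

One caveat with this setup is that the pair formed by observations 14 and 15 links the last American League player to the first National League player, so it does not represent neighbors in any single ranking.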

In conclusion, the initial multiple regression showed a significant relationship between certain statistics and MVP voting. Even though the residuals were not independent, the follow-up tests repeatedly pointed to the same statistics that the regression identified as impactful. Thus, it seems that the sabermetric statistic WAR did have more of an impact on MVP voting than most of the traditional statistics, such as batting average and home runs. While sabermetric statistics are a newer trend in baseball analytics, they are unlikely to replace traditional statistics such as batting average, home runs, and runs batted in, simply because those statistics have been used since the early days of baseball. Fans and statisticians alike will continue to use both traditional and sabermetric statistics to analyze player performance.

There are many other statistics that I could have analyzed for this regression. Pitching statistics, for one, are completely different from the statistics I used here for position players. However, the statistics I did use were enough to show that some statistics have a considerably greater impact on MVP voting than others, including some that people assume are not relevant or needed for analyzing player performance and contributions. Also, for this regression, I analyzed only the offensive statistics of the position players. Defensive statistics such as defensive runs saved (DRS) and defensive WAR are also important to many baseball statisticians when they evaluate player performance. Overall, there are many directions this regression could be taken, and even though there may never be one definitive statistic that everyone agrees on for analyzing player performance, the statistics I used here, as well as many others, will remain relevant in the game of baseball for many years to come.

2014 American League MVP Voting Results

Player, Team 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th Voting Points
Mike Trout, Angels 30 420
Victor Martinez, Tigers 16 4 3 3 2 1 229
Michael Brantley, Indians 8 6 5 4 1 1 1 1 191
Jose Abreu, White Sox 1 6 3 1 6 5 2 2 1 145
Robinson Cano, Mariners 1 1 6 5 2 4 2 1 1 124
Jose Bautista, Blue Jays 1 1 3 8 4 1 5 3 122
Nelson Cruz, Orioles 6 3 2 2 2 1 1 102
Josh Donaldson, Athletics 1 2 2 3 3 6 5 2 96
Miguel Cabrera, Tigers 1 2 2 2 2 1 6 5 82
Alex Gordon, Royals 1 1 2 2 3 1 2 44
Jose Altuve, Astros 1 3 3 3 9 41
Adam Jones, Orioles 1 3 1 1 2 2 34
Adrian Beltre, Rangers 1 5 1 1 22
Albert Pujols, Angels 1 1 5

2014 National League MVP Voting Results

Player, Team 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th  Voting Points
Giancarlo Stanton, Marlins 8 10 12 298
Andrew McCutchen, Pirates 4 10 15 1 271
Jonathan Lucroy, Brewers 1 13 6 7 1 167
Anthony Rendon, Nationals 1 5 8 10 2 1 1 1 155
Buster Posey, Giants 1 6 9 6 3 1 1 1 152
Adrian Gonzalez, Dodgers 1 4 2 3 3 1 57
Josh Harrison, Pirates 1 2 5 1 4 4 52
Anthony Rizzo, Cubs 1 4 2 3 4 37
Hunter Pence, Giants 1 3 2 3 1 34
Russell Martin, Pirates 2 3 1 2 21
Matt Holliday, Cardinals 1 1 2 17
Jhonny Peralta, Cardinals 1 2 3 1 17
Carlos Gomez, Brewers 2 3 1 13
Justin Upton, Braves 1 1 4 10
Jayson Werth, Nationals 1 1 3 9
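The Voting Points column can be reproduced from the ballot counts. The sketch below assumes the BBWAA scoring of 14 points for a first-place vote and 9 down to 1 for second through tenth, which is not stated in the tables themselves. It also assumes that Trout's 30 votes are all first-place votes and that Stanton's three counts are first-, second-, and third-place votes. Both assumptions reproduce the listed totals. The function name is my own.

```python
# Points per ballot position, first through tenth (assumed BBWAA MVP scoring).
POINTS_BY_PLACE = [14, 9, 8, 7, 6, 5, 4, 3, 2, 1]


def voting_points(votes_by_place):
    """Total points, where votes_by_place[i] is the number of votes for place i + 1."""
    return sum(votes * points for votes, points in zip(votes_by_place, POINTS_BY_PLACE))


# Mike Trout: 30 first-place votes and nothing else.
trout = [30, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Giancarlo Stanton: 8 first-, 10 second-, and 12 third-place votes (assumed placement).
stanton = [8, 10, 12, 0, 0, 0, 0, 0, 0, 0]

print("Trout:", voting_points(trout))      # 420, matching the table
print("Stanton:", voting_points(stanton))  # 298, matching the table
```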

American League MVP candidate statistics (league rank for each statistic in parentheses):

PLAYER NAME BA HR RBI SLG OBP SB WAR
Mike Trout .287 (15) 36 (4) 111 (1) .561 (3) .377 (8) 16 (25) 7.9 (1)
Victor Martinez .335 (2) 32 (8) 103 (8) .565 (2) .409 (1) 3 (104) 5.3 (14)
Michael Brantley .327 (3) 20 (29) 97 (12) .506 (9) .385 (4) 23 (11) 7 (4)
Jose Abreu .317 (5) 36 (3) 107 (4) .581 (1) .383 (5) 3 (103) 5.5 (12)
Robinson Cano .314 (6) 14 (50) 82 (20) .454 (17) .382 (6) 10 (41) 6.4 (6)
Jose Bautista .286 (16) 35 (5) 103 (7) .524 (6) .403 (2) 6 (60) 6 (7)
Nelson Cruz .271 (38) 40 (1) 108 (3) .525 (5) .333 (35) 4 (87) 4.7 (23)
Josh Donaldson .255 (56) 29 (9) 98 (11) .456 (16) .342 (25) 8 (49) 7.4 (2)
Miguel Cabrera .313 (7) 25 (14) 109 (2) .524 (7) .371 (10) 1 (158) 4.9 (20)
Alex Gordon .266 (44) 19 (32) 74 (28) .432 (24) .351 (18) 12 (35) 6.6 (5)
Jose Altuve .341 (1) 7 (99) 59 (47) .453 (19) .377 (7) 56 (1) 6 (8)
Adam Jones .281 (21) 29 (10) 96 (13) .469 (13) .311 (58) 7 (54) 4.9 (19)
Adrian Beltre .324 (4) 19 (31) 77 (23) .492 (10) .388 (3) 1 (160) 7 (3)
Albert Pujols .272 (35) 28 (11) 105 (5) .466 (14) .324 (42) 5 (70) 3.9 (30)

National League MVP candidate statistics (league rank for each statistic in parentheses):

PLAYER NAME BA HR RBI SLG OBP SB WAR
Giancarlo Stanton .288 (15) 37 (1) 105 (2) .555 (1) .395 (3) 13 (34) 6.5 (3)
Andrew McCutchen .314 (3) 25 (10) 83 (13) .542 (2) .410 (1) 18 (22) 6.4 (4)
Jonathan Lucroy .301 (7) 13 (53) 69 (36) .465 (15) .373 (9) 4 (91) 6.7 (1)
Anthony Rendon .287 (18) 21 (23) 83 (14) .473 (13) .351 (21) 17 (24) 6.5 (2)
Buster Posey .311 (4) 22 (20) 89 (10) .490 (7) .364 (14) 0 (539) 5.2 (13)
Adrian Gonzalez .276 (29) 27 (6) 116 (1) .482 (9) .335 (34) 1 (161) 3.9 (27)
Josh Harrison .315 (2) 13 (52) 52 (65) .490 (8) .347 (24) 18 (23) 5.3 (12)
Anthony Rizzo .286 (21) 32 (2) 78 (20) .527 (3) .386 (6) 5 (79) 5.1 (15)
Hunter Pence .277 (27) 20 (27) 74 (27) .445 (26) .332 (37) 13 (33) 3.6 (34)
Russell Martin .290 (12) 11 (68) 67 (39) .430 (35) .402 (2) 4 (90) 4.1 (8)
Matt Holliday .272 (32) 20 (26) 90 (8) .441 (29) .370 (10) 4 (88) 3.4 (39)
Jhonny Peralta .263 (44) 21 (22) 75 (26) .443 (28) .336 (32) 3 (118) 5.8 (6)
Carlos Gomez .284 (23) 23 (14) 73 (28) .477 (12) .356 (18) 34 (4) 4.8 (17)
Justin Upton .270 (36) 29 (5) 102 (3) .492 (6) .342 (27) 8 (55) 3.3 (41)
Jayson Werth .292 (9) 16 (41) 82 (16) .455 (20) .394 (4) 9 (46) 4 (23)

http://www.seanlahman.com/baseball-archive/sabermetrics/sabermetric-manifesto/

www.baseball-reference.com

http://sabr.org/sabermetrics/statistics

http://bbwaa.com/14-al-mvp/

http://bbwaa.com/14-nl-mvp/

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.872425
R Square 0.761125
Adjusted R Square 0.6815
Standard Error 57.61154
Observations 29

ANOVA
Source df SS MS F Significance F
Regression 7 222087.3 31726.76 9.558873 2.52E-05
Residual 21 69700.89 3319.09
Total 28 291788.2

Term Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -807.848 179.9347 -4.48967 0.000202 -1182.04 -433.653 -1182.04 -433.653
X Variable 1 -9.48922 4.745055 -1.99981 0.058622 -19.3571 0.378659 -19.3571 0.378659
X Variable 2 2.36734 1.153175 2.052889 0.052752 -0.03082 4.765499 -0.03082 4.765499
X Variable 3 1.520176 1.123229 1.353399 0.190318 -0.81571 3.856058 -0.81571 3.856058
X Variable 4 -2356.53 1163.345 -2.02565 0.055695 -4775.84 62.77839 -4775.84 62.77839
X Variable 5 461.8825 539.878 0.855531 0.401913 -660.855 1584.62 -660.855 1584.62
X Variable 6 2535.698 864.2199 2.934089 0.007927 738.4541 4332.941 738.4541 4332.941
X Variable 7 35.88267 9.544159 3.759648 0.001153 16.03451 55.73084 16.03451 55.73084

RESIDUAL OUTPUT (first four columns) and PROBABILITY OUTPUT (last two columns)
Observation, Predicted Y, Residuals, Standard Residuals, Percentile, Y (sorted from lowest to highest)
1 341.4428 78.55718 1.574511 1.724138 5
2 159.2133 69.7867 1.398726 5.172414 9
3 208.4449 -17.4449 -0.34965 8.62069 10
4 208.8821 -63.8821 -1.28038 12.06897 13
5 85.97122 38.02878 0.762206 15.51724 17
6 169.1591 -47.1591 -0.9452 18.96552 17
7 89.41378 12.58622 0.252264 22.41379 21
8 139.984 -43.984 -0.88157 25.86207 22
9 152.777 -70.777 -1.41857 29.31034 34
10 72.81304 -28.813 -0.5775 32.75862 34
11 85.05055 -44.0505 -0.8829 36.2069 37
12 1.398422 32.60158 0.653429 39.65517 41
13 110.0989 -88.0989 -1.76576 43.10345 44
14 12.87683 -7.87683 -0.15787 46.55172 52
15 253.6965 44.30355 0.88797 50 57
16 232.1926 38.80738 0.777811 53.44828 82
17 120.6994 46.3006 0.927997 56.89655 96
18 133.6297 21.37027 0.428322 60.34483 102
19 58.40867 93.59133 1.875839 63.7931 122
20 78.55184 -21.5518 -0.43196 67.24138 124
21 69.89341 -17.8934 -0.35864 70.68966 145
22 104.3838 -67.3838 -1.35056 74.13793 152
23 -44.5376 78.53756 1.574118 77.58621 155
24 -7.78478 28.78478 0.57693 81.03448 167
25 -8.32685 25.32685 0.507623 84.48276 191
26 41.84824 -24.8482 -0.49803 87.93103 229
27 -0.7288 13.7288 0.275165 91.37931 271
28 58.2716 -48.2716 -0.9675 94.82759 298
29 39.27614 -30.2761 -0.60682 98.27586 420
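
The Percentile column in the probability output above matches the formula 100 × (i − 0.5) / n for n = 29 observations. The Python sketch below reproduces it and also pairs each percentile with a standard normal quantile, which is one common way to build a normal probability plot. That second step is my own assumption about how such a plot is drawn, not something shown in the output, and the function names are hypothetical.

```python
from statistics import NormalDist


def plotting_percentiles(n):
    """Sample percentiles for n sorted observations: 100 * (i - 0.5) / n."""
    return [100 * (i - 0.5) / n for i in range(1, n + 1)]


def normal_scores(n):
    """Standard normal quantiles that correspond to the plotting percentiles."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [standard_normal.inv_cdf(p / 100) for p in plotting_percentiles(n)]


percentiles = plotting_percentiles(29)
scores = normal_scores(29)

# First, middle, and last percentiles, for comparison with the output above.
print(round(percentiles[0], 6), round(percentiles[14], 6), round(percentiles[28], 6))
```

If sorted values plotted against these normal scores fall close to a straight line, that is usually read as consistent with normality; a visible curve suggests otherwise.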

