The American League, perceived as being bad this year, was actually a good deal better than the National League overall, and
The perception of the American League’s weakness was due to a near-record level of parity, with neither great nor bad teams.
Let’s start with the second point. At the time of the post, through games of September 5, the standard deviation of winning percentages among American League clubs was the lowest it has been in the 30-team era. Projected onto a 162-game season, the standard deviation of wins for American League teams was 7.8, barely eking out 2007’s 7.9 as the most egalitarian distribution of wins since 1998.
Since September 5, a .500 record has become a black hole, exerting irresistible gravity throughout the American League galaxy:
Of the teams with the six best records in the league on that date–the Royals, Blue Jays, Yankees, Astros, Rangers, and Twins–only Toronto and Texas had a winning record the rest of the season.
Baltimore, the sixth-worst team in the league as of the morning of September 6, tied the Jays for the best record in the East thereafter. Boston, then the third-worst team, went 15-12 the rest of the way.
Cleveland, four games below .500 at the time, scrambled to finish 81-80.
Overall, parity in the already-equality-loving Junior Circuit increased, by so much that I looked beyond the post-1998 30-team era. I calculated the standard deviation of winning percentages for every league-season since 1901. I then multiplied the standard deviations by 162 to arrive at the standard deviation of wins over a 162-game season. Yes, I know, most of those seasons were shorter than 162 games, but that’s OK; I’m just looking to turn the standard deviation of winning percentages, which is not an intuitive figure (e.g., American League, 1930, 0.1107), into something that is recognizable (17.9 wins). Here are the ten seasons in baseball history with the highest parity, that is, the lowest standard deviation of wins:
The 2015 American League is the most egalitarian, populist, tax the rich/feed the poor, Kumbaya-singing league in baseball history. As I suggested in September, it’s the Sweden of leagues.
(The National League finished 2015 with a standard deviation of 13.1 wins, ranking it 102 out of 230 league-seasons in terms of parity. It was the ninth-most unequal among 36 league-seasons since the expansion to 30 teams in 1998. For Gini coefficient detractors, the most unequal league ever was the 1909 National League, which featured the 110-42 Pirates, 104-49 Cubs, and 92-61 Giants, along with the 55-98 Dodgers, 54-98 Cardinals (Yadi was hurt), and 45-108 Braves.)
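The conversion described above (standard deviation of winning percentage, scaled to a 162-game season) can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the use of the population standard deviation rather than the sample version is an assumption, since the post does not say which was used.

```python
from statistics import pstdev

GAMES_PER_SEASON = 162  # scaling factor used in the post


def pct_spread_to_wins(sd_win_pct, games=GAMES_PER_SEASON):
    """Turn a standard deviation of winning percentage into wins."""
    return sd_win_pct * games


def league_wins_spread(win_pcts, games=GAMES_PER_SEASON):
    """Standard deviation of wins for one league-season.

    Assumes the population standard deviation (pstdev); the post does
    not specify population vs. sample.
    """
    return pct_spread_to_wins(pstdev(win_pcts), games)


# The 1930 American League figure quoted above: 0.1107 becomes about 17.9 wins.
print(round(pct_spread_to_wins(0.1107), 1))
```

Passing a list of each club's winning percentage to `league_wins_spread` gives the same kind of figure for any league-season, regardless of how many games were actually played that year.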
Now, as to the other point, the American League’s superiority over the National League despite its group hug ethic, here’s a chart.
1. The Pirates finished the year with a 98-64 record, the second best in all of baseball. That ties them with the 1979 and 1908 clubs for the third most wins in franchise history. (The 1909 Pirates won 110 and the 1902 club won 103.) The Pirates’ record, however, included a losing record against two of the worst teams in the game, the Cincinnati Reds (8-11) and the Milwaukee Brewers (9-10).
Let’s break that down. In games in which the Reds didn’t play the Pirates, they were 53-90. In games in which the Brewers didn’t play the Pirates, they were 58-85. So in their non-Pirates games, the two clubs combined for a 111-175 record, a .388 winning percentage. Had they played at that pace in their 38 games against the Pirates, they would have won .388 x 38 games = 15 games, losing 23. Turned around, the Pirates would have gone 23-15 against the Reds and Brewers.
The Pirates were 81-43 in their games that weren’t against Cincinnati or Milwaukee. Had they gone 23-15 against the two clubs (that is, had they been as successful as the rest of the teams in the majors were), their record would have been 104-58. That would have given the Pirates the best record in baseball. They would be enjoying four off days, looking forward to Wednesday’s wild-card game between the Cardinals and Cubs to see whom they’d face at home to kick off the Division Series on Friday.
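For anyone who wants to check the arithmetic, here is a minimal sketch of the what-if calculation using only the records quoted above. The function and variable names are invented for this illustration.

```python
def hypothetical_pirates_record():
    """Re-run the Reds/Brewers what-if with the records quoted above."""
    reds_w, reds_l = 53, 90          # Reds in non-Pirates games
    brewers_w, brewers_l = 58, 85    # Brewers in non-Pirates games

    combined_w = reds_w + brewers_w  # 111
    combined_l = reds_l + brewers_l  # 175
    pace = combined_w / (combined_w + combined_l)

    games_vs_pirates = 38
    # Wins the two clubs would have had against Pittsburgh at that pace,
    # rounded to whole games.
    opponent_wins = round(pace * games_vs_pirates)
    pirates_wins = games_vs_pirates - opponent_wins

    other_w, other_l = 81, 43        # Pirates vs. everyone else
    return pace, (other_w + pirates_wins, other_l + opponent_wins)


pace, record = hypothetical_pirates_record()
print(f"pace: {pace:.3f}, hypothetical record: {record[0]}-{record[1]}")
```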
In “Hardball Retrospective: Evaluating Scouting and Development Outcomes for the Modern-Era Franchises”, I placed every ballplayer in the modern era (from 1901-present) on their original team. Therefore, Tony Perez is listed on the Reds roster for the duration of his career while the Red Sox declare Wade Boggs and the Rockies claim Troy Tulowitzki. I calculated revised standings for every season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition. Season standings, WAR and Win Shares totals for the “original” teams are compared against the “actual” team results to assess each franchise’s scouting, development and general management skills.
Expanding on my research for the book, the following series of articles will reveal the finest single-season rosters for every Major League organization based on overall rankings in OWAR and OWS along with the general managers and scouting directors who constructed the teams. “Hardball Retrospective” is available in digital format on Amazon, Barnes and Noble, GooglePlay, iTunes and KoboBooks. The paperback edition is available on Amazon, Barnes and Noble and CreateSpace. Supplemental statistics, charts and graphs along with a discussion forum are offered at TuataraSoftware.com.
Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.
Terminology
OWAR – Wins Above Replacement for players on “original” teams
OWS – Win Shares for players on “original” teams
OPW% – Pythagorean Won-Loss record for the “original” teams
Assessment
The 1979 Montreal Expos OWAR: 53.9 OWS: 327 OPW%: .572
GM Jim Fanning acquired 88% (23/26) of the ballplayers on the 1979 Expos roster. Based on the revised standings the “Original” 1979 Expos captured the first pennant in franchise history with 93 victories while topping the National League in OWAR and OWS.
Gary “Kid” Carter paced Montreal with 28 Win Shares and 5.2 WAR. The Hall of Fame backstop slugged 22 round-trippers and commenced a run of 10 consecutive All-Star appearances. Third-sacker Larry Parrish (.307/30/82) clubbed 39 two-baggers en route to a fourth-place finish in the N.L. MVP balloting. Andre “The Hawk” Dawson displayed his five-tool talent, blasting 25 long balls and nabbing 35 bags. Gary Roenicke swatted 25 big-flies while platooning in left field. Warren Cromartie delivered career-highs with 181 base knocks and 46 doubles. Ellis Valentine contributed 21 jacks and Tony Scott swiped 37 bases.
Tim Raines received the proverbial “cup of coffee” in 1979 with six pinch-running appearances. “Rock” pilfered 808 bases during a career that spanned 23 seasons. He ranks eighth among left fielders according to Bill James in “The New Bill James Historical Baseball Abstract.” Teammates listed in the “NBJHBA” top 100 rankings include Carter (8th-C), Dawson (19th-RF) and Parrish (53rd-3B).
LINEUP | POS | WAR | WS
Tony Scott | RF/CF | 1.21 | 13.93
Warren Cromartie | 1B/LF | 3.28 | 17.18
Andre Dawson | CF | 2.74 | 24.01
Gary Carter | C | 5.25 | 28.95
Larry Parrish | 3B | 4.07 | 27.34
Gary Roenicke | LF | 3.33 | 18.9
Tony Bernazard | 2B | 0.6 | 2.56
(none listed) | SS | |

BENCH | POS | WAR | WS
Jerry White | RF | 0.79 | 6.09
Barry Foote | C | 1.58 | 12.2
Bombo Rivera | LF | 0.55 | 5.17
Ellis Valentine | RF | 0.4 | 14.41
Tim Raines | – | 0 | 0
Terry Humphrey | C | -0.22 | 0.21
Steve Rogers (13-12, 3.00), the Expos’ first-round selection in the June 1971 Amateur Draft, hurled a league-leading 5 shutouts and earned his third All-Star invitation. Dan Schatzeder posted a 10-5 mark with a 2.83 ERA. David Palmer fashioned a 2.64 ERA with a record of 10-2 in his rookie campaign. Scott Sanderson contributed 9 victories along with a 3.43 ERA. Byron McLaughlin collected 7 wins and 14 saves working in a variety of roles while portsider Shane Rawley saved 11 contests.
ROTATION | POS | WAR | WS
Steve Rogers | SP | 3.78 | 16.61
Dan Schatzeder | SP | 3.31 | 13.13
David Palmer | SP | 2.25 | 11.23
Scott Sanderson | SP | 1.89 | 10.21
Balor Moore | SP | 0.01 | 5.7

BULLPEN | POS | WAR | WS
Byron McLaughlin | SW | 1.29 | 11.04
Shane Rawley | RP | 0.78 | 7.84
Bill Atkinson | RP | 0.22 | 2.08
Dale Murray | RP | -1.09 | 3.15
Bill Gullickson | RP | 0.02 | 0.14
Gerry Hannahs | SP | 0.03 | 0.74
Bob James | RP | -0.21 | 0
Craig Minetto | SP | -2 | 0.47
The “Original” 1979 Montreal Expos roster

NAME | POS | WAR | WS | General Manager | Scouting Director
Gary Carter | C | 5.25 | 28.95 | Jim Fanning | Mel Didier
Larry Parrish | 3B | 4.07 | 27.34 | Jim Fanning | Mel Didier
Steve Rogers | SP | 3.78 | 16.61 | Jim Fanning | Mel Didier
Gary Roenicke | LF | 3.33 | 18.9 | Jim Fanning | Mel Didier
Dan Schatzeder | SP | 3.31 | 13.13 | Jim Fanning | Danny Menendez
Warren Cromartie | LF | 3.28 | 17.18 | Jim Fanning | Mel Didier
Andre Dawson | CF | 2.74 | 24.01 | Jim Fanning | Mel Didier
David Palmer | SP | 2.25 | 11.23 | Jim Fanning | Danny Menendez
Scott Sanderson | SP | 1.89 | 10.21 | Charlie Fox | Danny Menendez
Barry Foote | C | 1.58 | 12.2 | Jim Fanning | Mel Didier
Byron McLaughlin | SW | 1.29 | 11.04 | Jim Fanning | Mel Didier
Tony Scott | CF | 1.21 | 13.93 | Jim Fanning |
Jerry White | RF | 0.79 | 6.09 | Jim Fanning | Mel Didier
Shane Rawley | RP | 0.78 | 7.84 | Jim Fanning | Mel Didier
Tony Bernazard | 2B | 0.6 | 2.56 | Jim Fanning | Mel Didier
Bombo Rivera | LF | 0.55 | 5.17 | Jim Fanning | Mel Didier
Ellis Valentine | RF | 0.4 | 14.41 | Jim Fanning | Mel Didier
Bill Atkinson | RP | 0.22 | 2.08 | Jim Fanning | Mel Didier
Gerry Hannahs | SP | 0.03 | 0.74 | Jim Fanning | Mel Didier
Bill Gullickson | RP | 0.02 | 0.14 | Charlie Fox | Danny Menendez
Balor Moore | SP | 0.01 | 5.7 | Jim Fanning |
Tim Raines | – | 0 | 0 | Charlie Fox | Danny Menendez
Bob James | RP | -0.21 | 0 | Jim Fanning | Danny Menendez
Terry Humphrey | C | -0.22 | 0.21 | Jim Fanning |
Dale Murray | RP | -1.09 | 3.15 | Jim Fanning | Mel Didier
Craig Minetto | SP | -2 | 0.47 | Jim Fanning | Mel Didier
Honorable Mention
The “Original” 1985 Expos OWAR: 55.8 OWS: 320 OPW%: .556
Montreal claimed the National League East division title by a five-game margin over New York while pacing the Senior Circuit in OWAR and OWS. Tim Raines stole 70 bases in 79 tries and batted .320 with 115 runs scored. Raines (35 WS) and Gary Carter (33 WS) surpassed the 30 Win Share plateau as the “Kid” blasted 32 moon-shots. Tim Wallach dialed long-distance 22 times and earned his first Gold Glove Award. Tony Bernazard supplied career-bests with a .301 BA, 169 hits, 17 home runs and 73 RBI. Andre Dawson collected his sixth consecutive Gold Glove Award and drove in 91 runs. Bob James anchored the bullpen staff with 32 saves, 8 victories and a 2.13 ERA. Shane Rawley provided 13 wins with a 3.31 ERA in 31 starts while Joe Hesketh delivered a 2.49 ERA and a record of 10-5 in his freshman year.
The news that Barry Zito has been called up to start against Tim Hudson, with Mark Mulder in attendance, has rightfully thrown the baseball world into a mini-frenzy. Jonah Keri covered the meat of it spectacularly here. Here’s a dirty secret though: as we evaluate pitchers today, they weren’t great pitchers. An even dirtier secret: I don’t think it matters.
Zito, Hudson, and Mulder were undoubtedly good pitchers, racking up a Cy Young trophy and four more top-10 finishes in their time in Oakland. But were they great pitchers? Let’s take a look at their FIP- during their Oakland careers:
At their peak, they were well above-average pitchers, but, combined, they only had two top-10 finishes in FIP-, with Mulder finishing 10th in 2001 and Hudson 10th in 2004. Good, but not transcendent. If that’s worse than you remember, it’s probably because their ERAs consistently undershot their FIPs:
The Big Three were among the last players before sabermetrics exploded in popularity with casual fans, and our analysis of them reflects that. If they had come up today, would we label them as three guys who are above-average, rather than the cultural phenomenon they became? It’s very likely. That the cultural relevance of the Big Three has carried into the sabermetric era is a delightful reminder of how recently we crossed the frontier.
Does the fact that their accomplishments don’t hold up as well in the FIP era diminish their place in baseball history? I say no. Even though baseball has seen a number of better three-man rotations, the “Big Three” label feels at home in Oakland. In 2008, Dan Haren, Brandon Webb, and Randy Johnson averaged a 76 FIP- for the Diamondbacks, better than any year of the real Big Three. But would Dan Haren starting against Brandon Webb on Saturday be a headline event (forgetting about the medical miracle required)? I doubt it. Zito, Hudson, and Mulder evoke something in us beyond their raw performance.
I always recited the order as Zito, Hudson, Mulder. Zito always comes first because, as a fellow lefty who didn’t throw very hard but thought he had a big hook, I emulated him both in real life and in MVP Baseball, where I spent countless hours dropping his curve in against hapless computer foes. Zito was my guy, and Hudson and Mulder fell in line after. Everyone had their own order, shaped by their personal biases. The combination of youth, talent, and personality made them relatable in a way that other, greater pitchers just weren’t.
The Big Three were also the rock on which Moneyball was built. For fans of small-market teams, they represented what was possible. If your team scouted, drafted, and developed well, you too could have your own set of homegrown stars. The 2001 A’s-Yankees ALDS was, in my opinion, the pinnacle of the era. Zito, Hudson, and Mulder combined to throw 28.2 innings and give up just five earned runs, but it wasn’t enough. The 2001 A’s were one of the most likable teams of all time and the Big Three were the dominant reason why.
Although they had their best years in Oakland, when they were forced to move on, there was a sense that they were headed for greater things. The potential they left behind in Oakland still tantalizes. Although the greatness never materialized in their new homes, it still feels like they left something on the table when they left. We never had the closure of seeing them grow old and decline together, which is why finally getting that closure on Saturday feels so comforting.
You will note that Keri’s article does not once mention FIP. It’s a defensible choice because that’s not how we evaluated them at the time so it’s not how we remember them now. None of the reasons why we loved them are because they were the very best pitchers in the game or sabermetric darlings. It was a confluence of harder-to-quantify factors.
Baseball is a funny game. None of the Big Three ever had a season as good as Jake Arrieta’s this season. But ask me who I’m going to remember in 20, 30, 40 years? No contest. Baseball is an analytical nostalgia factory, a game that runs on both numbers and feelings without ever feeling like it contradicts itself. Perhaps no one represents that dichotomy better than the legendary Big Three.
Baltimore Orioles first baseman Chris Davis is in the final year of a contract paying him $12 million. At age 29, Davis has had a roller-coaster career. It started in Texas, where he burst onto the scene by hitting 17 and 21 homers in his first two seasons. He declined dramatically the next season, hitting just under .200 for Texas. The following season he was traded to Baltimore, where he revived his career and finally lived up to his long-awaited potential. Today, Davis is one of the biggest power hitters in the game: he hit 53 homers in 2013 and has 43 thus far in ’15. With Davis on the market, clubs will be interested in his bat along with other big free-agent names such as Yoenis Cespedes, Justin Upton and Alex Gordon.
The problem with Chris Davis is that he’s somewhat inconsistent. Because he is a power hitter, slugging percentage is a good place to look, since it captures his extra-base hits and power. In seasons with at least 300 at-bats he has averaged a .507 slugging percentage, with a standard deviation of .091. He has struggled to find consistency, especially last season, when he hit .194 with 23 homers and a .404 slugging mark. Compared with his .286/53 HR/.634 campaign in 2013, that is a huge difference.
If we look at a similar power hitter, Nelson Cruz, in his seasons with at least 300 ABs he slugged a similar .515 on average, with an SD of only .04. Cruz is more of a model of consistency and has been less risky than Davis. Besides his one season of slugging .460, Cruz was always in the mid-.500s, which is great for a power hitter. This is a big reason I am not a huge fan of Chris Davis. He just hasn’t shown a high level of consistency.
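One way to put the two hitters on the same scale is to divide each standard deviation by its mean (the coefficient of variation). This measure is not used in the post; it is offered here only as a rough sketch that relies on the four figures quoted above.

```python
def coefficient_of_variation(mean, sd):
    """Spread relative to the average: lower means more consistent."""
    return sd / mean


# Slugging mean and standard deviation in 300+ AB seasons, as quoted above.
davis_cv = coefficient_of_variation(0.507, 0.091)
cruz_cv = coefficient_of_variation(0.515, 0.04)

print(f"Davis: {davis_cv:.1%} of his average slugging")
print(f"Cruz:  {cruz_cv:.1%} of his average slugging")
```

By this measure, Davis's season-to-season swing in slugging is more than twice as large as Cruz's relative to what each hitter typically produces.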
Another is his strikeout rate, which is extremely high. Since his first full season in Baltimore, Davis has struck out 169, 199, 173 and 182 times over the last four seasons. That’s good for a 31% K-rate, easily one of the worst in the league. His 196 strikeouts this season also happen to lead the league! Although he strikes out a ton, he gets the job done by driving in runs, which at the end of the day could be seen as more important. Davis drove in 138 runs in 2013, 86 in ’14 and so far 110 in ’15. Did I mention he’s also eighth in the league with 118 runs created, a stat used to measure how much a player contributes to his team’s run scoring? So with great power comes great responsibility. Davis may strike out but he can really drive the ball. To me, he’s a high-reward/high-risk guy.
Interested teams:
Chris Davis is so valuable to the Orioles in terms of producing runs it’s hard to imagine him being let go without a fight. I think the Orioles will absolutely make him an offer. He’s already making $12m/yr and the O’s have a lot of money coming off the books this winter (Orioles’ Payroll). Only $41m is committed to next season ($119m payroll this year). So I can see them raising their price tag to about $18m/yr. The O’s are in a position to win now with Machado/Jones and a fairly young team so if they aren’t getting Davis they will no doubt be spending on others.
Other teams I can see having interest in Davis would be Seattle, San Diego, and Houston. Seattle is the kind of team to pay up for hitting; I could see them doing that with Davis as they did with Cano/Cruz. They need offense but they already have Trumbo at first base who has been decent. If they could move Trumbo I could see them making a play for Davis. Having Cano-Cruz-Davis would be quite powerful. They’re losing some money with Rodney & Jackson coming off the books. Seattle could be really interesting to watch.
The Padres I could see showing some interest but only if they lose Justin Upton and keep Wil Myers in CF. I think they’ll try to re-sign Upton who has had a good year playing in Petco Park. They’ve played Myers at 1B occasionally because of his injury concerns. With no DH, it’s harder to maneuver players around. Yet, again, AJ Preller is a magician so no one can really predict what he will do next. I think the Padres’ concern would be as Davis gets older he could regress on the defensive side of the ball and offensively. Petco Park is a pitcher’s palace so if Davis’ power dropped off his value would really take a hit. Putting Davis in as a full-time DH later in his career would help him maintain his power and consistency like it has for David Ortiz and A-Rod.
The Astros are a wild card, I think. As I said in the Cespedes post, they have a ton of cash to spend but only if they’re willing to spend it. They love guys who hit home runs. That’s basically the back end of their lineup (Carter, Valbuena, Rasmus). Their 1B Chris Carter ($4.5m) is a mini Chris Davis (low avg/high power) and he will be headed to arbitration. In the offseason I think they will look to upgrade on him. But Davis will cost them a lot; as a more analytics-driven front office, I’m not sure they would see the value in paying up for him. Then again, pairing Davis, who hits lefty, with Correa/Altuve would really help them score runs and give them more ways to mix and match their lineup.
Honorable Mentions:
The last two teams I looked into with 1B trouble were the Cardinals and Pirates. St. Louis has Matt Adams coming off the DL and we’re not sure if he’s 100% just yet. We know Brandon Moss is not a long-term solution. The Pirates’ Pedro Alvarez has been super inconsistent in Pittsburgh. I think they’ll look to upgrade or float around some other names during the offseason. To be honest, I think Adam Lind would be a great addition to the Cardinals or Pirates instead of Davis. Lind has a club option for $8m this offseason; he hits lefty and has had a solid year for Milwaukee. Overall, I don’t think these two teams will end up throwing money at Chris Davis, but they may need 1B help next season. It’s baseball; anything can happen.
In the end, I believe Davis enjoys playing in Baltimore. Given his success there, the favorable ballpark and the DH factor, I think he should stay. For the sake of his long-term career, he should seriously consider staying in the AL, where the DH is available. But I think another club will come in and make a play to acquire Davis. Power/RBIs come at a premium these days. Dan Duquette, GM of the Orioles, has experience and knows what he’s doing. If the price tag becomes too high, I think he will definitely consider looking elsewhere on the market, possibly at an Adam Lind or Mark Trumbo.
Similar players offer a comparison: Nelson Cruz signed a 4yr/$57m deal (roughly $14m AAV) at age 34, and Albert Pujols signed a 10yr/$240m deal ($24m AAV) in 2011. This sets a decent basis for Davis. In terms of his contract, I think Davis could get a 5-year deal worth about $18-21 million a year. His WAR for this season is 4.2, which puts him in the ballpark for this, so it’s near our estimation. Personally, I would not give Davis $20m a year for 5 years. I think that’s going overboard, but some teams are more into his skill than others. Power really comes at a premium in today’s game and Davis has a ton of it. I wouldn’t be surprised if it went to 6 years, but I just don’t see as many teams bidding on Davis right now. Scott Boras is his agent, which will probably drive up the asking price. That may turn off the Orioles, which could lead to another club coming in and swooping up Crash Davis. I think it’s favored to be the Orioles or Mariners come signing day.
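The contract figures above boil down to simple division and multiplication. A quick sketch, with helper names invented for this example and all dollar amounts in millions:

```python
def average_annual_value(total_millions, years):
    """AAV: total contract value spread evenly over its length."""
    return total_millions / years


def total_value(aav_millions, years):
    """Total commitment implied by a given AAV and contract length."""
    return aav_millions * years


# Comparable deals quoted above.
print(average_annual_value(57, 4))    # Cruz: 14.25
print(average_annual_value(240, 10))  # Pujols: 24.0

# Projected Davis range: 5 years at $18m to $21m per year.
print(total_value(18, 5), total_value(21, 5))  # 90 105
```

At the projected range, a five-year deal for Davis would total somewhere between $90 million and $105 million.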
There is a treasure trove of data sitting on FanGraphs which to my (limited) knowledge is little used. These data are the Inside Edge fielding stats. We have UZR and DRS, but no IERS (Inside Edge Runs Saved), despite the general availability of the data.
UZR essentially guesses, based on batted-ball profile, what the probability is that a play will be made, which given the lack of true batted-ball data will take time to stabilize. Inside Edge has the benefit of stacking each play in a probability bucket. Here is a list of the probabilities by POS, Probability Bucket and Year that we have IE data:
I then took each player’s stats and, based on their position and year, computed the expected number of plays they should have made and compared that to the actual number of plays they made. In other words, a RF in 2014 should make 86% of his plays in the 60-90% range, so if he had 100 plays there and made 92, he made 6 extra plays. Here’s what the 2015 Top 30 looks like through that lens:
Note that IE seems to like Arenado and Longoria a lot more than DRS does. Otherwise the list is pretty consistent with DRS, especially with Simmons and Hechavarria in the 2/3 spots. I didn’t control for team bias in the results, so it may be favouring certain teams (Jays players seem to be getting a large boost; see Martin, Revere, Pillar, Tulo and Donaldson all in the top 30). Go Jays Go! Yankees Suck!
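The expected-versus-actual comparison described above can be sketched as follows. The function names and the tuple layout are hypothetical; the only numbers taken from the post are the right-field example (100 chances, 92 made, 86% league conversion rate in the 60-90% bucket).

```python
def plays_above_expected(chances, plays_made, league_rate):
    """Plays made minus the plays an average fielder would have made."""
    return plays_made - chances * league_rate


def total_plays_above_expected(buckets):
    """Sum the surplus across probability buckets.

    `buckets` is an iterable of (chances, plays_made, league_rate)
    tuples, one per Inside Edge probability bucket.
    """
    return sum(plays_above_expected(c, m, r) for c, m, r in buckets)


# The 2014 right fielder from the example above.
extra = plays_above_expected(chances=100, plays_made=92, league_rate=0.86)
print(round(extra, 1))
```

The league rate has to be looked up by position, bucket and year, as in the table of probabilities above, before the per-bucket surpluses are summed for a player.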
The next step is a little less mathematical, in that I attempt to ascribe an average run saved based on position. Based on linear weights, a single is worth roughly .5, a double roughly .75 and a triple roughly 1. Thus, a catcher and pitcher can save at most .5 runs each play they make. Second basemen and shortstops will save .5 on most plays, but will get a bump when they convert a double play. A third baseman will prevent some doubles as well as convert some double plays. Outfielders will be preventing some mix of singles, doubles and triples (and the occasional home run). So, based just on my gut feelings on the matter, I ascribed the following run values to each position:
C/P: .5 Runs Saved
1B/2B/SS: .6 Runs Saved
3B: .65 Runs Saved
OF: .75 Runs Saved
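Applying these run values is a single multiplication per player. In the sketch below, the dictionary and function names are invented for illustration, and mapping "OF" to LF, CF and RF individually is my assumption about how the outfield value is applied.

```python
# Gut-feel run values per extra play, as listed above.
RUNS_PER_PLAY = {
    "C": 0.5, "P": 0.5,
    "1B": 0.6, "2B": 0.6, "SS": 0.6,
    "3B": 0.65,
    # Assumption: the single "OF" value applies to each outfield spot.
    "LF": 0.75, "CF": 0.75, "RF": 0.75,
}


def estimated_runs_saved(position, extra_plays):
    """Convert plays above expected into estimated runs saved."""
    return RUNS_PER_PLAY[position] * extra_plays


# The right fielder who made 6 extra plays in the earlier example.
print(estimated_runs_saved("RF", 6))  # 4.5
```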
Based on these values (estimated runs saved), these are the top fielders (catchers excluded) from 2012-2015 and 2015, respectively:
The timeless struggle between pitcher and batter is one of dominance — who holds it and how. Both players use a repertoire of techniques to adapt to each other’s strategies in order to gain advantage, thereby winning the at-bat and, ultimately, the game.
These strategies can rely on everything from experience to data. In fact, baseball players rely heavily on data analytics in order to tell them how they’re swinging their bats, how well they’ll do in college, how they’ll perform at Wrigley versus Miller.
Big data has been used in baseball for decades, as early as the 60s. Bill James, however, was the first prominent sabermetrician, writing about the field in his Bill James Baseball Abstracts during the 80s. Sabermetrics measures in-game performance and is often used by teams to evaluate prospects.
Baseball fans familiar with sabermetrics, the A’s, and Brad Pitt have likely seen Moneyball, the Hollywood adaptation of Michael Lewis’ book. The book told the story of A’s general manager Billy Beane’s use of sabermetrics to assemble a winning team.
Sabermetrics is one way baseball teams use big data to leverage game theory on a team-wide scale. However, by applying their data through the concepts of game theory on a smaller scale, baseball teams can help their men on the mound out-duel those at the plate.
Game theory studies strategic decision making, not just in sports or games, but in any situation in which a decision must be made against another decision maker. In other words, it is the study of conflict.
Game theory uses mathematical models to analyze decisions. Most sports are zero-sum games, in which one player’s (or team’s) gain is the opposing player’s (or team’s) loss. If a team scores a run, it is at the expense of the opposing team, likely through an error by a fielder or a hit off a pitcher. The central solution concept for such games is the Nash equilibrium, named for the mathematician John Forbes Nash.
In the case of pitching, game theory — especially the use of the Nash equilibrium — can be used to predict pitch optimization for strategic purposes. Neil Paine of FiveThirtyEight advocates using big data and sabermetrics to analyze each pitch in a hurler’s armory, then cultivating the pitcher’s equilibrium — the perfect blend of pitches that will result in the highest number of strikeouts, etc.
Paine has gone so far as to create his own formula, the Nash Score, to predict which pitcher should throw which pitches in order to outwit batters.
In game theory, a Nash equilibrium is a state in which each player uses a mix of strategies so effective that neither has an incentive to change. For pitchers, Paine’s Nash Score uses their data to find the optimal combination of pitches to combat batters, including frequency.
Paine does point out that creating this kind of equilibrium in baseball can be detrimental to a pitcher. He is, after all, playing against another human being who is just as capable of using game theory to adapt strategies to upset the equilibrium.
If a pitcher’s fastball is his best, and his Nash Score shows that he should be using it more often, savvy hitters are going to notice. “ . . . In time, the fastball will lose its effectiveness if it’s not balanced against, say, a change-up — even if the fastball is a far better pitch on paper,” writes Paine.
In this case, a mixed strategy is best: in game theory, mixed strategies are used when a player intends to keep his opponent guessing. Though pitch optimization using Paine’s Nash Score could lead to efficiency, allowing pitchers to throw fewer pitches over more innings, it could also lead to batters adapting much more quickly to patterns, thus negating all the work.
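To make the idea of a mixed strategy concrete, here is a textbook two-pitch, zero-sum example. It is not Paine’s Nash Score, whose formula is not given here, and the payoff numbers are purely hypothetical. Each entry is the batter’s expected payoff when the pitcher throws one pitch and the batter sits on another; at equilibrium each side mixes so that the other is indifferent between its two options.

```python
def solve_2x2_zero_sum(payoff):
    """Mixed-strategy equilibrium of a 2x2 zero-sum game.

    payoff[i][j] is the batter's expected payoff when the pitcher throws
    pitch i and the batter sits on pitch j. The pitcher wants it low,
    the batter wants it high.
    Returns (p, q, value): p is the probability the pitcher throws
    pitch 0, q is the probability the batter sits on pitch 0.
    """
    (a, b), (c, d) = payoff

    # If a pure-strategy saddle point exists, neither side needs to mix.
    minimax = min(max(a, b), max(c, d))  # lowest payoff the pitcher can hold the batter to
    maximin = max(min(a, c), min(b, d))  # highest payoff the batter can guarantee
    if minimax == maximin:
        raise ValueError("pure-strategy saddle point; no mixing needed")

    denom = a - b - c + d
    p = (d - c) / denom   # makes the batter indifferent between guesses
    q = (d - b) / denom   # makes the pitcher indifferent between pitches
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical payoffs: row 0 = fastball, row 1 = change-up;
# column 0 = batter sits fastball, column 1 = batter sits change-up.
hypothetical = [[0.300, 0.150],
                [0.200, 0.400]]
p, q, value = solve_2x2_zero_sum(hypothetical)
print(f"throw fastball {p:.1%}, sit fastball {q:.1%}, value {value:.3f}")
```

With these made-up numbers the pitcher throws the fastball a bit more than half the time even though it is the better pitch on paper, which is the balance the Paine quote above is pointing at.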
In his first year of arbitration eligibility, Matt Harvey will be able to substantially increase his salary for the 2016 season. Since beginning his career with the New York Mets in 2012, he has taken off to become an All-Star pitcher and fan favorite. His agent, Scott Boras, and the Mets’ front office will negotiate a one-year salary based on his success in 2015. We’ll cut right to the chase and get into the hard numbers, which will help us form a rough projection of what we would expect Matt Harvey to receive this coming winter.
Overall performance:
Since debuting in 2012, Matt Harvey, now 26, has a career 2.59 ERA with a 24-17 win/loss record. During his 2013 season Harvey was on a tear with a 2.27 ERA and became one of the leading NL Cy Young candidates before his injury. He also started the 2013 All-Star Game, which happened to be at Citi Field that year. After tearing his UCL and missing the entire 2014 season, Harvey came back strong this year and has pitched in 26 games thus far with a 2.88 ERA through 171 innings (11th best in the league). He has a 12-7 record and gives up less than a hit per inning (which ranks 9th in all of MLB). His 1.03 WHIP is also among the league’s top 10, so he rarely allows runners on base, and he is averaging 8.6 strikeouts per nine innings.
His W/L record this season does not show his true value, as the Mets spent the first half of the season with one of the worst offenses in the league. Since acquiring premier Major League hitters such as Yoenis Cespedes and Juan Uribe, the Mets have led the league in runs scored, giving their starters big run support. Since those acquisitions, Harvey has pitched in 7 games, winning 3 and losing 0. But the Mets’ bullpen blew Harvey’s lead in 3 other games in which he had outpitched the other team. Had it not been for a mediocre bullpen, Harvey could have been 6-0 in 7 games since August 1st. Clearly, Harvey is an ace for this team and the backbone of a staff that has propelled the Mets to first place. He is a consistent pitcher and shows no signs of letting up even after TJ surgery. Without Harvey, the Mets would lose a dominant, consistent ace, which is obviously hard to come by.
Leadership/Public appeal: As one of the older members on the New York Mets’ young pitching staff, Harvey is one the leaders on this team. After fighting his way back from injury rehab, he has become a consistent stronghold to the Mets’ rotation. Although Dr. Andrews, who performed Tommy John surgery on Harvey, has stated he should not exceed 180 innings due to his injury, Harvey is continuing to pitch on an innings watch to help the Mets win, especially through the postseason. Even if it hurts his chances at re-injuring himself, he is going out there to pitch.
As a leader, you need to show guts and heart; Harvey has definitely displayed that, battling out there everyday. Matt Harvey also is a fan favorite. He ranks 9th in all of Major League Baseball and 1st with the Mets in 2015 top jersey sales. Many fans across the country are purchasing his jersey, thus showing how popular he is with people. When he returned to the mound this season to pitch, his first game back drew the biggest crowd (39,000 fans) for the second home game of the season since Citi Field opened in 2009. That was 10,000 more fans in attendance than last year and 20,000 more than two years ago. During the 2013 All-Star game at Citi Field, which Harvey started, the Mets drew their most fans in history at 45,000. When he’s the night’s starting pitcher, fans flock to the ballpark to see Matt Harvey. At the same time he’s able to strikeout hitters, captivate a crowd and draw extra revenue in from ticket sales than if he wouldn’t be pitching. The Mets fans also have a popular nickname for Harvey: The Dark Knight. Symbolizing his leadership skills and journey back from Tommy John surgery, Harvey symbolizes the 2015 Mets team and has dramatically changed the mood of the fan base since his arrival/return. There’s no denying this.
Injury history: As stated earlier, Matt Harvey missed all of the 2014 season after undergoing Tommy John surgery to repair his torn UCL. His recovery has been a success thus far, but the elbow will always be a cause for concern in the future. But arbitration cases do not really debate the future, only previous success. He has shown no discomfort and has spent 0 days on the disabled list this year. To head off future problems, the Mets went to a 6-man rotation, which has caused Harvey (and other Mets pitchers) to skip a couple of starts. Harvey has consistently said he feels good, and he shows no signs of slowing down unless Mets management shuts him down.
Performance of club:
The Mets are currently in first place by 6 games and it looks like it will stay that way come October. That is due in large part to Harvey’s success on the mound; the Mets would not be in the same situation without him or his 12 wins this season. When the playoffs arrive, Harvey will easily be the Game 1 or Game 2 starter, depending on how he finishes the season.
Record of the player’s past compensation:
Harvey made MLB’s minimum salary in 2013 at $498,000 and this year at around $510,000. This will be his first year of arbitration eligibility (Arbitration 1). His value to the team over the last couple of years has been sky-high, but he’s been grossly underpaid.
Comparative salaries: Tyson Ross was arbitration 1 last year for the San Diego Padres. In his 2014 campaign he pitched to a 2.81 ERA / 1.211 WHIP with 13 wins in 191 innings pitched. He also struck out 9 hitters per nine innings and was named an All-Star that same season. But Ross pitches in a very pitcher-friendly ballpark. His stats at home included a 1.88 ERA with an 8-5 record, but his away stats included a 3.79 ERA with a 5-9 record. Clearly, Ross does not pitch as well on the road, and his results could have been affected by where he pitched. Harvey’s career numbers show that he pitches more consistently than Ross both at home (12-7, 2.15 ERA) and away (12-10, 3.14 ERA). From our previous numbers we know that Harvey has been a better pitcher overall this season in ERA, WHIP, wins and many other pitching statistics than Ross was in his 2014 season. Following his 2014 season, Ross was able to negotiate a 1yr/$5.25m deal in January. Ross is not as consistent or as skilled as Matt Harvey. Since Harvey has outperformed Ross, he is due much more in salary as well.
Chris Tillman is the next player we can compare him to. Although a little less successful, Tillman was able to get a 1yr/$4.3m deal. In the season prior to his arbitration, Tillman had a 13-6 record with a 3.34 ERA and struck out only 6.5 batters per nine innings in 207 innings. Tillman is on the lower end of the comparison, as he agreed to almost a million dollars less than Tyson Ross.
Summary:
These players give us the best guidelines and most recent examples, in terms of numbers/dollars, for estimating what Harvey should be owed for the 2016 season. Harvey is definitely much better than Ross and Tillman. He brings more to the table than just numbers, as he is a figurehead in New York, one of the largest markets in baseball. The first-place Mets would not be where they are if it were not for Harvey. His health was a concern earlier this year, but he hasn’t had any setbacks this entire season except for skipping a start here or there. We can expect Harvey to easily surpass Tyson Ross and his $5.25m deal.
Due to the pizzazz of the Dark Knight, the revenues generated from his starts/jersey sales and the recent success of the team, Harvey should be able to negotiate a deal of around 1yr/$6.3m. If we talk about fairness in terms of his contract, I think this is “fair” to both parties. We have to take into account everything that Harvey brings to the table, and I think he’s more valuable than Ross and most previous pitchers who went to arbitration 1 and did not sign a multi-year deal. The one factor that could haunt Harvey’s dollar amount is his elbow, due to TJ surgery. If it happens to wear out during the last couple of weeks of September and the postseason, we can easily make a case that he should be owed less. But as of now he’s been Harvey-esque and is back to where he was before the surgery. Next year his innings limit should be lifted or increased dramatically, so there won’t be as much cause for concern as there would be if he had spent time on the DL this season. Obviously, he isn’t a sure bet to remain healthy, but arbitration does not greatly take into consideration future success/problems, only previous ones. That is why we project him to get approximately $6.3m.
Overall, both sides will negotiate and the Mets will offer less than what I project. I could definitely see the Mets offering $5.5 to $6m. But Scott Boras will clearly try to get more for Harvey; I think around $7m. Both arguments will be justified. In the end, I think an arbitrator would agree that 1yr/$6.3m is common ground, a good midpoint and a fit for an agreement by both parties. Stay tuned for more…
Since Baseball Info Solutions’ contact-quality data was uploaded here on FanGraphs, many attempts have been made to predict BABIP using said data without a great deal of success. So I tried breaking down the data by type of ball in play using the splits function on the leaderboards and results seem promising (for fly balls at least).
Data from the 2012 season onward was used, for hitters with a minimum of 250 PA as a qualifier (completely arbitrary).
Data on fly balls showed the best fit, with an r-squared of 0.79 between xAVG and actual average on fly balls; the explanatory variables were hard%, soft%, pull% and speed score.
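For readers who want to try this kind of fit themselves, here is a minimal sketch in Python of an ordinary least-squares regression of fly-ball average on hard%, soft%, pull% and speed score, followed by the r-squared calculation. The post does not spell out the exact method or coefficients, so the plain linear model, the function names and the eight sample rows are all assumptions made for illustration; the numbers are invented and are not real player data.

```python
# Hypothetical sketch: ordinary least squares in pure Python.
# The sample rows are made up; they are NOT real FanGraphs data.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares fit of y on an intercept plus the predictors in rows."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)  # normal equations


def predict(coefs, row):
    """Fitted value (the xAVG analogue) for one hitter."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(actual, predicted):
    """Share of the variance in actual that the fitted values explain."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Columns: hard%, soft%, pull% (as fractions) and speed score.
hitters = [
    (0.45, 0.10, 0.30, 3.5),
    (0.40, 0.16, 0.22, 4.0),
    (0.38, 0.12, 0.35, 2.5),
    (0.33, 0.18, 0.20, 6.5),
    (0.30, 0.14, 0.28, 5.0),
    (0.28, 0.22, 0.15, 7.0),
    (0.36, 0.20, 0.25, 4.5),
    (0.42, 0.13, 0.18, 3.0),
]
fly_ball_avg = [0.310, 0.270, 0.300, 0.230, 0.225, 0.190, 0.255, 0.265]

coefs = fit_ols(hitters, fly_ball_avg)
x_avg = [predict(coefs, row) for row in hitters]
r2 = r_squared(fly_ball_avg, x_avg)
print("coefficients:", [round(c, 3) for c in coefs])
print("r-squared:", round(r2, 3))
```

With only eight invented rows and five coefficients the fit is badly overdetermined by the model, so the printed r-squared says nothing about the real relationship; the point is only the mechanics of turning contact-quality inputs into an expected average and scoring it against the actual one.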
The usual suspects top the xAVG list: Paul Goldschmidt, Joey Votto, Chris Davis, Ryan Braun and Miguel Cabrera. But the most puzzling fact was Mike Trout’s .266 xAVG vs a .342 AVG. What does Trout do differently to beat the formula? I don’t know.
xAVG on groundballs correlated less well with average on grounders, with an r-squared of 0.48, though if one sets the PA qualification to 600, r-squared improves to 0.52. The lower r-squared on groundballs probably has to do with the fact that success on groundballs depends not only on hitting them hard but also on hitting them through the gaps in the infield, and no variable captures that effectively.
Mike Trout is restored to the place where he belongs, the top of the xAVG list, with A.J. Pollock, Adam Eaton, Carlos Gomez and Willie Bloomquist rounding out the top five. Yasiel Puig’s xAVG shows the biggest difference from his average, probably because he has mastered hitting balls in the gaps.
Data on liners was the least promising, with an r-squared of 0.21 between xAVG and average. Moreover, the constant in the linear equation was the biggest term, meaning average on liners is mostly random. So hitting liners hard has only a slight positive effect on average on liners.
Overall, contact-quality data is promising and we can get better estimates as we get more and more years’ worth of data. Data from 2002-2010 wasn’t used because it was manually collected and results-based while 2011 seems to differ from 2012-2015 data as league-average hard% seems to be 5% lower than normal.
My name is Rich Rieders and I am a 2015 graduate of Rutgers School of Law. Over the winter, I participated in Tulane University’s 9th Annual Baseball Arbitration Competition and we finished in 2nd place overall out of 40 teams. The arbitration cases used in the competition were Jenrry Mejia v. New York Mets, Lorenzo Cain v. Kansas City Royals, and Mark Trumbo v. Arizona Diamondbacks. My team represented the Royals, Mets and Mark Trumbo in those cases. It was a great experience and I learned a tremendous amount. Those of you who are in law school should absolutely participate. Being in New Orleans is an amazing bonus as well! You can read more about the competition from Tulane’s website and Jerry Crasnick’s ESPN article.
Instead of explaining how arbitration works, I highly recommend reading this article as it will give you an excellent basis for understanding the arbitration process. Just ignore the part about free agency since that’s been done away with now.
In order to prepare for the competition, I created a database (going back to 2008) consisting of all arbitration awards and players who signed 1-year contracts avoiding arbitration, along with their respective statistics (note that multi-year contracts are not allowable as player comps for arbitration purposes). Using regression analysis, I was able to determine which statistics correlate most with salary.
Here on FanGraphs we pride ourselves on the use of metrics and the abandonment of traditional stats. That all goes out the window for the arbitration process. The arbitrators jointly selected by the league and the union have a background in labor law, not baseball. And those who are baseball fans probably aren’t avid FanGraphs readers, and their exposure is likely to be limited to Wins, Losses, ERA, H, HR, BB, SO, etc. Each side gets 30 minutes to present its case, plus another 15 minutes of rebuttal. You simply don’t have time to teach the panel sabermetrics and argue your case at the same time. And as I will discuss later, the use of predictive stats largely falls outside the scope of an arbitration hearing anyway. However, by using regression analysis we can pinpoint exactly which stats correlate most with eventual salary and which ones don’t.
SP: W (.6099), IP (.5401), SO (.5368), RA9-WAR (.5166), GS (.4598)
Now that’s not to say only the stats with the highest RSQ matter. Traditional rate stats like K/9 and ERA are still important. Try arguing to a casual fan that a pitcher with an ERA of 2.50 was not as productive as a pitcher with an ERA of 4.00 and see how that goes.
What we can take away from this is that:
Traditional stats have a strong correlation, metrics do not.
Counting stats have a strong correlation, rate stats do not.
Offense, particularly power, has a strong correlation; defense and baserunning do not.
The more playing time you receive (PA, IP, G), the more money you are likely to make.
In essence, the overarching principle behind baseball arbitration is that salary is almost wholly dependent on the accumulation of traditional counting stats, with traditional rate stats used to highlight the differences between comparable players; in my formula, the rate stats also help to limit outliers.
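To make the RSQ figures quoted above (for example, W at .6099 for starting pitchers) more concrete, here is a minimal Python sketch that computes the r-squared between a single stat and salary and then ranks the stats by it. My actual database and formula are not reproduced here: the function name, the six pitchers and every number in the sample are invented for illustration, and treating RSQ as the squared Pearson correlation of a one-variable linear fit is an assumption.

```python
# Hypothetical sketch: r-squared between one stat and salary.
# All numbers below are invented for illustration.

def single_stat_r_squared(xs, ys):
    """Squared Pearson correlation, i.e. r-squared of a one-variable linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


# Platform-year stats for six made-up starting pitchers.
stats = {
    "W": [14, 8, 16, 6, 11, 9],
    "IP": [195, 150, 210, 120, 180, 165],
    "ERA": [3.10, 4.10, 3.45, 3.90, 3.05, 4.40],
}
salary_millions = [5.0, 2.9, 4.6, 2.1, 4.0, 3.0]

# Rank the stats from strongest to weakest relationship with salary.
ranked = sorted(
    ((name, single_stat_r_squared(values, salary_millions))
     for name, values in stats.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, rsq in ranked:
    print(f"{name}: {rsq:.4f}")
```

Because r-squared is the square of the correlation, a stat that moves against salary (like ERA) can still rank highly; the sign of the relationship has to be read from the correlation itself.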
Individual awards also matter a great deal. In my hearing, it was extremely difficult to argue against Lorenzo Cain when he had won the ALCS MVP and his breakout postseason was fresh in everyone’s mind. Those types of factors are extremely difficult to overcome. For a real-life example, I heard a story from one of our judges that the Giants were planning on going to arbitration with Tim Lincecum in 2010. Lincecum showed up with a Cy Young Award under each arm and within a few hours, a two-year contract was agreed upon.
Keep in mind that for players going through arbitration for the first time, we consider their career numbers as well. The correlations are fairly similar for career stats, but with slight improvement for career rate stats. For players going through the process for a second, third or fourth time, we pretty much ignore career statistics.
Before I introduce the model, I want to stress the importance of understanding the purpose of the baseball arbitration process. During the final round at Tulane, we represented the Kansas City Royals against Lorenzo Cain. One of our principal arguments was that Lorenzo Cain had an unsustainable .380 BABIP (highest in MLB, mind you), which is why he batted .300, and that his BA (and the rest of his offensive numbers) would likely regress towards his career averages. The expected regression, along with his low walk rate, would limit his value to the club going forward. It is an argument most of us on FanGraphs would surely have made at the time, but Lorenzo Cain’s awesomeness is a topic for another day.
While this type of logic works perfectly well for free-agent signings or whether to acquire the player via trade, it does not work for arbitration purposes. The underlying purpose of the arbitration process is to compensate the player for his performance in the previous season, NOT to compensate him based on what we expect he will do the following season. This is absolutely critical. Hence, for arbitration purposes, the fact that a player was lucky, his performance was unsustainable or anything along the lines of “he won’t be as good as he was last season” is not permissible. This works the same for underachievers too as teams will get the benefit at arbitration when a player was “unlucky.”
Keeping all this in mind, I have been able to determine which statistics (and other factors) matter the most when it comes to arbitration salaries, and I have created a formula that can accurately predict the salaries of future players by plugging in certain statistics. You may have seen similar work featured on MLBTradeRumors.com; however, the raw numbers produced by my formula are more accurate and contain less variance than their model’s adjusted projections. The 2015 arbitration projections on MLBTradeRumors featured an average error of $303,061 with a standard deviation of $334,102. My unadjusted projections yield an average error of $283,094 with a standard deviation of $255,174. Not to mention that my formula does not have any built-in constraints or adjustments, which would certainly help increase its accuracy even more.
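For anyone who wants to score a set of projections the same way, here is a minimal Python sketch of the two accuracy figures. It assumes "average error" means the mean absolute dollar difference between projected and actual salary, and that the standard deviation is the population standard deviation of those absolute errors; the function name and the four salary pairs are invented for illustration and are not taken from either model.

```python
import statistics

# Hypothetical sketch: the salaries below are invented, not real projections.

def error_summary(projected, actual):
    """Return (mean absolute error, population std dev of the absolute errors)."""
    abs_errors = [abs(p - a) for p, a in zip(projected, actual)]
    return statistics.mean(abs_errors), statistics.pstdev(abs_errors)


projected = [3_000_000, 4_500_000, 1_200_000, 7_000_000]
actual = [3_250_000, 4_300_000, 1_500_000, 6_800_000]

mean_error, error_sd = error_summary(projected, actual)
print(f"average error: ${mean_error:,.0f}")
print(f"standard deviation: ${error_sd:,.0f}")
```

A lower average error means the projections land closer to the real salaries on the whole, while a lower standard deviation means the misses are more uniform in size, with fewer big outliers.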
While these projections aren’t perfect, we can get a pretty good idea of what arbitration-eligible players will receive. Using these projections we should be able to not only predict a player’s salary for the upcoming season, but with good long-range statistical modeling, we can reasonably project a player’s subsequent arbitration salaries as well.
How much will Matt Harvey earn before he reaches free agency? How many millions will TJS wind up costing him?
Should Kris Bryant sign an extension this winter or should he try to reach free agency as early as possible? What should each side do? What about someone coming to arbitration for the first time like Nolan Arenado?
How much money does a team stand to save by avoiding Super-2 or delaying free agency by a year? Should the type of hitter/pitcher influence the decision?
Were the Reds or Todd Frazier better off by agreeing to a 2-year, $12-million deal this winter instead of going through arbitration twice? What about a defense-first player like Juan Lagares?
How much money is a rebuilding team like the Phillies costing themselves over the next few years by using Ken Giles as a closer instead of as a “high-leverage reliever?” Should the Marlins not make Carter Capps their closer in 2016?
Which teams do the best when it comes to arbitration? Which ones do the worst? (More on that next time). What about the agencies?
Using my formula, these are the questions we can begin to answer now.