Archive for Outside the Box

Measuring Team Chemistry with Social Science Theory

Every athlete, professional or otherwise, talks about that feeling of being on a team. There’s something that happens when a team “clicks” – a united feeling of team spirit that propels team members to compete, most often referred to as team chemistry. The social sciences have no measure of team chemistry per se, but they do have team cohesion, which is defined as:

A dynamic process that is reflected in the tendency of a group to stick together and remain united in the pursuit of its instrumental objectives and/or for the satisfaction of member affective needs [1].

Team cohesion has been shown to exist across multiple work-group settings (organizational, military, and sport) [2], as well as across multiple sports (basketball, golf [3], softball, and baseball [4]). Perhaps more interestingly, cohesion has also been bi-directionally linked to performance: when teams perform better, they are more cohesive, and when they are more cohesive, they perform better [2,5]. And while the research on this relationship is clear, it has mostly been conducted with non-professional teams. Indeed, team cohesion is one of many “unobservable” properties that remain untapped within professional sports.

How can we measure team cohesion in professional sports?

As researchers, we would normally use a validated survey to measure team cohesion – an instrument we could rely on to measure the construct accurately. Unfortunately, without access to a team, I’m forced to use alternative methods. The first step is to examine the literature, which brings to light a few key indicators of team cohesion:

  • Team cohesion is related to the extent to which members accept the roles on their team (captain, motivator, leader, follower, etc.) [6].
  • Charismatic leaders will refer to their teams more often than referring to themselves [7].
  • The higher the level of team cohesion, the better the team performance [2,5].

So, if I can somehow measure how often leaders refer to their teams (vs. themselves), then I can use this as an approximation of their leadership characteristics. And if leaders are acting like leaders, they may also be helping to solidify roles within their team. Therefore we might expect that:

Hypothesis 1: As leaders reference their team more, we should see increased team cohesion – and as team cohesion increases, we should see better performance.

A charismatic leader does not typically arise without a contextual or conditional trigger. Crisis often prompts the emergence of charismatic leadership – a setting that allows a charismatic leader to propose an ambitious goal [8]. The context and the charismatic leader influence one another, almost as if the leader requires crisis as an occasion to exemplify charismatic leadership [9]. Additionally, at the group level, team members have been shown to become more attached to the leader in times of crisis, prompting greater cohesion as followers rally around the charismatic leader [10].

In baseball, teams experience all types of crises throughout the long season, including injuries, losing streaks, playoff races, and team conflicts. Perhaps the most common and least context-dependent of these crises is the race to the playoffs as the season comes to an end. With an understanding of how and when playoff races begin to make an impression, I can expect to observe a temporal effect of charismatic leadership by using our previous indicator of team reference. That is, it may not only be that “there is a positive relationship between a leader’s team references and the amount of wins his team will have at the end of the regular season”, but also:

Hypothesis 2: The timing of when a team leader references his team can determine the effectiveness of his leadership.

Methods

As the first component of the measure, I needed to assess team leaders’ references to themselves or to their teams. I used the most popular newspaper from each team’s city to extract quotations (e.g., the San Francisco Chronicle for the Giants; the New York Times for the Yankees). A team leader was someone identified by teammates, coaches, or the front office as a “leader”, a “captain”, or as having either of these qualities. If a team had more than one identified leader, I chose one at random. I tracked quotes from 8 randomly selected team leaders on 8 randomly selected teams across an entire regular season (April 4, 2012 – October 3, 2012). Statement settings included comments made in the locker room after games, during the All-Star break, before a game started, or in any other setting; any quote from the leader that appeared in the newspaper was documented for analysis. Leader quotes were qualitatively coded by 3 independent coders, with each quote coded as containing “self-reference”, “team-reference”, and/or “other reference” (the 3 coders had 97% agreement on their final codes). I began this study in 2013, so I used the 2012 season, the latest complete season at my disposal.
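An agreement figure like the 97% reported above can be computed as simple pairwise percent agreement. Here is a minimal sketch; the three-coder example data are hypothetical stand-ins, since the actual coded quotes are not reproduced in the article:

```python
from itertools import combinations

# Hypothetical coding data: for each quote, the set of codes assigned by
# each of the three coders (codes: "team", "self", "other").
codings = [
    [{"team"}, {"team"}, {"team"}],
    [{"self"}, {"self"}, {"self", "other"}],
    [{"team", "self"}, {"team", "self"}, {"team", "self"}],
]

def percent_agreement(codings):
    """Share of coder pairs, across all quotes, that assigned identical code sets."""
    agree = total = 0
    for quote in codings:
        for a, b in combinations(quote, 2):
            total += 1
            agree += (a == b)
    return agree / total

print(round(percent_agreement(codings), 2))
```

More rigorous studies would report a chance-corrected statistic such as Fleiss’ kappa, but raw percent agreement is the figure quoted here.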

Due to the disparity in responses, the sample was aggregated based on how team leaders’ teams finished the season. Since 1996, no AL team has made the playoffs with fewer than 86 wins [11], and over the same period no NL team has made the playoffs with fewer than 82 wins [12]. For this study, leaders were therefore categorized by whether their teams reached these win marks (86 or more wins for AL teams; 82 or more for NL teams). Those at or above the mark were titled “high team leaders” (HTLs) and those below it “low team leaders” (LTLs). Four teams in the sample met the HTL criteria; their combined record was 368 – 280 (.568 winning percentage). Not all HTLs were on teams that made the playoffs in 2012, but each of the four teams was competing for a playoff spot in August and September. Four teams met the LTL criteria; their combined record was 296 – 352 (.457 winning percentage).

 

High or low team leader classification

Team        League   2012 Regular Season Record   Team Leader       High or Low Team Leader
Angels      AL       89-73                        Torii Hunter      HTL
Giants      NL       94-68                        Buster Posey      HTL
Yankees     AL       95-67                        Derek Jeter       HTL
Rays        AL       90-72                        Evan Longoria     HTL
Rockies     NL       64-98                        Michael Cuddyer   LTL
Twins       AL       66-96                        Justin Morneau    LTL
White Sox   AL       85-77                        Paul Konerko      LTL
Phillies    NL       81-81                        Jimmy Rollins     LTL

Table 1. Classification of high or low team leaders based on their team’s 2012 regular season record

Results

There was no significant correlation between the total number of team references and the total number of wins that a leader’s team had at the end of the regular season (r = .237, p > .05). Nor was there an indication of a negative correlation between self-references and total number of team wins (r = -.086, p > .05).
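For readers who want to see the shape of this test, here is a hedged sketch of the correlation computation. The wins are the real 2012 totals from Table 1, but the per-leader team-reference totals are hypothetical placeholders, since the raw counts are not published here:

```python
import numpy as np

# 2012 regular-season wins from Table 1 (Angels, Giants, Yankees, Rays,
# Rockies, Twins, White Sox, Phillies).
wins = np.array([89, 94, 95, 90, 64, 66, 85, 81])

# Hypothetical season totals of team references per leader -- illustrative
# stand-ins for the actual coded counts.
team_refs = np.array([120, 140, 155, 130, 90, 85, 100, 95])

# Pearson correlation between references and wins.
r = np.corrcoef(team_refs, wins)[0, 1]
print(round(r, 3))
```

With n = 8, even a moderate r will not reach significance, which is consistent with the null results reported above.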

Leader responses were then aggregated for LTLs and HTLs. Of the 490 total responses, 252 were made after, or in reference to, a previous game. These post-game responses were then split by whether the leader’s team had won the game (162 total) or lost it (90 total). After a loss, both HTLs and LTLs referred to their teams much more often than to themselves: LTLs were 7.20 times as likely to reference their team after a loss as to reference themselves, while HTLs were less likely than LTLs to refer to their team after a loss (4.42:1). After a win, LTLs were 1.41 times as likely to reference their team as themselves; HTLs, on the other hand, were 2.32 times as likely (Table 2).

Reference to team or self as ratio

Leader   Loss            Win
HTL      31:7 (4.42:1)   65:28 (2.32:1)
LTL      36:5 (7.20:1)   45:32 (1.41:1)

Table 2. Ratios of team vs. self references for each type of leader

The monthly distribution of team references for LTLs was relatively even across the regular season. The highest percentage came in July (19.9%) and the lowest in August (12%), a difference of 7.9 percentage points (Figure 1). The overall standard deviation for team references by month was σ = 2.88. In contrast, team references for HTLs were much more dynamic. The highest percentage came in September (39.6%) and the lowest in June (5.8%), a difference of 33.8 points; September team references for HTLs were more than double those of any other month. The overall standard deviation was σ = 12.2, with the resulting distribution becoming much more parabolic (Figure 2). The quadratic trend line used to represent the HTL team-reference distribution showed a very good fit (R² = .91).

Figure 1. Percentage of team references by month for LTLs
Figure 2. Percentage of team references by month for HTLs, with quadratic trend line
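A quadratic trend line and its R² of the kind shown in Figure 2 can be computed in a few lines. In this sketch, only September (39.6%) and June (5.8%) come from the text; the other monthly percentages are illustrative placeholders:

```python
import numpy as np

# HTL team references by month (%). April..September coded as 1..6.
# Only 5.8 (June) and 39.6 (September) are from the article; the rest
# are made-up stand-ins.
months = np.arange(1, 7)
pct = np.array([10.5, 8.2, 5.8, 12.0, 19.5, 39.6])

coeffs = np.polyfit(months, pct, deg=2)   # quadratic trend line
fitted = np.polyval(coeffs, months)

# Coefficient of determination for the fit.
ss_res = np.sum((pct - fitted) ** 2)
ss_tot = np.sum((pct - pct.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 2))
```

With the real monthly percentages in place of the placeholders, this is exactly the R² = .91 computation described above.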

 

Discussion

The increased rate of team reference by HTLs as compared to LTLs may have helped to establish better role clarity – a characteristic of more cohesive teams. This was further marked by the fact that HTLs were on higher performing teams than LTLs. The direction of the team cohesion to performance relationship in this case is still unknown.

HTLs also referred to their teams most often during the end of the regular season. This relates to the theory that charismatic leaders will “activate” in times of crisis. In turn, this helps to create more team cohesion as members attach themselves to leaders in times of crisis.

 

[1] Carron, A.V., Colman, M.M., Wheeler, J., & Stevens D. (2002). Cohesion and Performance in Sport: A Meta Analysis. Journal of Sport & Exercise Psychology, 24, 168-188.

[2] Mullen, B., & Copper, C. (1994). The relation between group cohesiveness and performance: An integration. Psychological Bulletin, 115, 210-227.

[3] Vincer, D., & Loughead, T.M. (2010). The Relationship Among Athlete Leadership Behaviors and Cohesion in Team Sports. The Sport Psychologist, 24, 448-467.

[4] Carron, A.V., Bray, S.R., & Eys, M.A. (2002). Team Cohesion and Team Success in Sport. Journal of Sports Sciences, 20(2), 119-126.

[5] Oliver, L.W., Harman, J., Hoover, E., Hayes, S.M., & Pandhi, N.A. (2003) A quantitative integration of the military cohesion literature. Military Psychology, 11, 57-83.

[6] Carron, A. V., & Eys, M. A. (2012). Group dynamics in sport (4th ed.). Morgantown, WV: Fitness Information Technology.

[7] Shamir, B., Arthur, M.B., & House, R.J. (1994). The rhetoric of charismatic leadership: A theoretical extension, a case study, and implications for research. The Leadership Quarterly, 5(1), 25-42.

[8] Poon, J. & Fatt, T. (2000). Charismatic Leadership. Equal Opportunities International. 19(8), 24-28.

[9] Conger, J. A. (1999). Charismatic and transformational leadership in organizations: An insider’s perspective on these developing streams of research. The Leadership Quarterly, 10, 145-179.

[10] Kets de Vries, M. F. R. (1988). Prisoners of leadership. Human Relations, 41, 261-280.

[11] Gaines, C. (2011, April 21). Chart of the Day: What it takes to make the playoffs in Baseball. Business Insider. Retrieved from http://www.businessinsider.com/chart-of-the-day-what-it-takes-to-make-the-playoffs-in-baseball-2011-4

[12] Bloom, B.M. (2005). Padres Try to Recover from 82-80 Record. San Diego Padres. Retrieved from http://m.padres.mlb.com/news/article/1236830/


Give Me a Rise

It is well established that having more rise on your four-seam fastball is a good thing. The question then becomes: can we identify the optimal amount of rise, relative to the league-average fastball? For the purposes of this analysis, we will look at swinging-strike rate on all four-seam fastballs thrown in regular-season action since the dawn of the PITCHf/x era.

We in the sabermetrically inclined community tend to pooh-pooh popular baseball concepts, particularly ones where the science, on the surface, doesn’t appear to jibe with the age-old baseball wisdom. Don’t worry, this is not a DIPS discussion, nor a discussion of a pitcher’s ability to manage contact. I bring up this concept in relation to the term “late life”, as in movement later in the pitch’s trajectory. Physics tells us that the ball will have a very predictable trajectory from the moment it leaves the pitcher’s hand until it reaches the front of the plate. That, however, is merely half the story. There are two important points I want to bring up:

  1. Batters cannot compute vertical trajectory explicitly; they essentially tap into a huge vault of experience telling them how far a pitch will drop, based on pitches of similar velocity they have seen before.
  2. A hitter’s swing is largely ballistic (very difficult to change mid-swing) and takes about 0.18 seconds to execute. That means that a hitter has roughly 0.2 seconds post-release of the ball to gather information and form an educated guess as to where the ball will end up.

Based on these assumptions, I computed late movement in both the vertical and horizontal directions. I then compared this to the expected vertical movement based on the velocity (more velocity, less drop, obviously). To me this is the optimal way to look at movement, since the hitter presumably cannot gather any more information after those first 0.2 seconds. A great hitter may be able to factor in his knowledge of a pitcher’s ability to make the fastball rise, but he is fighting his memories of all the other fastballs he’s seen – it’s more difficult than you would think.
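To make the computation concrete, here is a minimal sketch of one way to define “late rise” under these assumptions: extrapolate the ball’s path from the 0.2-second decision point using gravity alone (the hitter’s naive prediction), and compare it to the actual constant-acceleration PITCHf/x-style trajectory. The release parameters and flight time below are illustrative, not real pitch data:

```python
# PITCHf/x-style vertical parameters: z0 (release height, ft),
# vz0 (vertical velocity, ft/s), az (vertical acceleration, ft/s^2,
# which includes gravity plus Magnus lift from backspin).
G = -32.174  # gravity alone, ft/s^2

def z_at(t, z0, vz0, az):
    """Vertical position under constant acceleration."""
    return z0 + vz0 * t + 0.5 * az * t * t

def late_rise(z0, vz0, az, t_decide=0.2, t_plate=0.42):
    """Feet of 'rise' vs. a gravity-only extrapolation over the last part of flight."""
    dt = t_plate - t_decide
    # Hitter's extrapolation: same position and velocity at t_decide, gravity only.
    z_d = z_at(t_decide, z0, vz0, az)
    vz_d = vz0 + az * t_decide
    expected = z_d + vz_d * dt + 0.5 * G * dt * dt
    actual = z_at(t_plate, z0, vz0, az)
    return actual - expected

# A four-seamer with strong backspin (az well above gravity's -32.174):
print(round(late_rise(z0=6.0, vz0=-5.0, az=-18.0), 2))  # → 0.34
```

Algebraically this reduces to 0.5 · (az − G) · dt², i.e., late rise is just the lift acceleration acting over the unreactable final fraction of flight.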

Which brings us to a very interesting graph: the heights and colours in the histogram reflect the magnitude of the swinging-strike rates, shown in sequential order of velocity. If you scroll all the way to the bottom, you’ll see that the center of the histogram is somewhere around -0.6 – that is, 0.6 feet more rise (less drop) than the average four-seam fastball, when looking at the pitch from 0.2 seconds after release until it crosses home plate.

We see a very clear normal curve, with more “normal” at higher n. Thus we can now compute the value of rise in a four-seam fastball, as distributed by a normal curve centered around 0.6 feet above the mean drop. I’m not really a stats guy, so I’m not sure how to do that exactly. What I find interesting is that the 7 inches or so of rise is pretty consistent across the velocity spectrum. I’m not sure why it peaks at this point, though I would surmise it’s the sweet spot where the hitter feels like he can make contact, but can’t – as opposed to extreme rise, which would freeze the hitter.
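One simple way to do the fit gestured at above: if whiff rate is roughly normal in “rise vs. the average fastball”, then log(whiff rate) is a parabola, and the parabola’s vertex is the whiff-maximizing amount of rise. The (rise, whiff%) pairs below are illustrative stand-ins, not the real PITCHf/x aggregates:

```python
import numpy as np

# Illustrative whiff rates by rise relative to the average four-seamer (ft).
rise = np.linspace(-0.6, 1.8, 13)
whiff = np.array([.04, .06, .09, .13, .17, .20, .22,
                  .20, .17, .13, .09, .06, .04])

# Fit a parabola to log(whiff); for a Gaussian-shaped curve this is exact.
a, b, c = np.polyfit(rise, np.log(whiff), deg=2)
mu = -b / (2 * a)   # vertex = rise at which whiff rate peaks
print(round(mu, 2))  # → 0.6
```

On the real data, `mu` would land at the ~0.6 feet (7 inches or so) of rise described in the text; here it does so by construction of the placeholder values.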

This leads us to our last graph (warning: this one scrolls for a while). You’ll see the same graph as above, but you’ll see Whiff%, GB% and HR% stacked one on top of the other.

This actually paints a very intuitive picture. If there is more rise than average, you’ll get swinging strikes. If it drops more than average, you’ll get groundballs; and if it drops about what you’d expect, you’ll get some groundballs, but also homers. Ignore the SSS noise with homers at the higher velocities. Again, what is interesting about the GB% and Whiff% histograms is how consistent they are irrespective of velocity. So… if velocity doesn’t impact this analysis, let’s collapse it all into one final graph:

This paints a very clear picture: if your four-seam fastball isn’t getting at least 5 inches of late rise, you are going to give up a lot of homers. Note that swing% (swings/total pitches) is normally distributed around a mean of .2 feet of rise and appears to track pretty closely to HR%, implying that hard contact is not affected within 1 standard deviation.

Looking forward to the feedback.


Vertical Command – Or Lack Thereof

I read a great book by Mike Stadler called The Psychology of Baseball. In it he notes that it is far more difficult for humans to control where a ball ends up vertically (due to the need for advanced spatial reasoning) than horizontally. You can find his discussion starting on page 86.

I’m going to show you three pictures which will illustrate this quite well. Data is inclusive of all pitches thrown in regular season games since 2010. The first is a heat map of sorts which maps vertical distance from the center of the zone (from PITCHf/x data sz_top and sz_bottom) on the y axis and velocity on the x axis. What we see quite clearly is that it is *much* better to throw a four-seam fastball up in the zone than down in the zone, almost irrespective of velocity. In fact, a 92 MPH four-seam fastball thrown 0.8 feet above the center of the zone will get about 13% swings and misses; a 98 mph four-seam fastball thrown below the center of the zone will get 12% swings and misses. Behold the graph, from a fan:

Four-Seam Fastball, Depth x Velocity
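A heat map like this can be built with one pivot: bucket each pitch by its vertical distance from the center of the zone ((sz_top + sz_bottom) / 2) and by velocity, then average a whiff flag. The eight rows below are made-up stand-ins for the real PITCHf/x data:

```python
import pandas as pd

# Hypothetical PITCHf/x-style rows: plate height (pz), zone bounds,
# velocity, and whether the pitch drew a swing-and-miss.
pitches = pd.DataFrame({
    "pz":     [3.1, 3.3, 1.6, 2.9, 1.4, 3.2, 1.7, 2.5],
    "sz_top": [3.5] * 8,
    "sz_bot": [1.5] * 8,
    "velo":   [92, 92, 92, 98, 98, 98, 92, 98],
    "whiff":  [1, 1, 0, 0, 0, 1, 0, 0],
})

# Vertical distance from the center of the zone (+ = above center).
center = (pitches["sz_top"] + pitches["sz_bot"]) / 2
pitches["depth"] = (pitches["pz"] - center).round(1)

# Whiff rate by depth and velocity bucket -- the heat map's cell values.
heat = pitches.pivot_table(index="depth", columns="velo",
                           values="whiff", aggfunc="mean")
print(heat)
```

On real data you would bin `depth` and `velo` more coarsely (e.g., 0.2-foot and 1-MPH buckets) so each cell has a meaningful sample.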

The question then becomes, if a pitcher throws the ball up in the zone, how will the probability of a HR change? This brings us to picture #2, where we have the same x and y axes (apparently that’s the plural of axis, thanks google), but instead we have HR% (# of HRs/Total Pitches). I’ve removed 99+ MPHs from the graph as they were displaying SSS noise.

HR% by Depth and Velocity

So interestingly, if you look at the totals on the right, the visual shows that HRs are NOT hit on high fastballs, but rather on fastballs closer to the heart of the zone (vertically). In fact (and a story for another day), the correlation between distance from the center of the zone and HR% has an R² of .97. As an aside, this also reproduces other research indicating that faster fastballs yield fewer home runs. That trend is also quite linear (I don’t have a computed R² for it, but that’s old news anyway).

Now: you are far more likely to get a swinging strike by throwing up in the zone, and you aren’t putting yourself at much greater risk of a home run. So if we looked at a distribution of four-seam fastballs, we should see a higher proportion of four-seamers up in the zone – ideally right at the top, 0.8 to 1.0 feet above the center, where whiffs are plentiful and HRs are scarce. Beware SSS at some of the higher velocities, but note that a 95 MPH fastball only .4 feet above the center of the zone will yield more HRs than an 88 MPH fastball thrown at the top of the zone (the 95 MPH fastball will still yield more whiffs; it just goes to show how important command is). This is what we actually see:

A nearly uniform distribution across all velocities, slightly skewed to below the center of the zone. I’m not ready to conclude that pitchers are incapable of pitching up in the zone with four-seam fastballs; it may just be old-school “pitch down in the zone” thinking. I still find it astonishing how consistent the data is across the velocity spectrum. It almost appears that if a pitcher can simply pitch higher in the zone with a four-seam fastball, he can make his stuff play up a lot – sort of like MadBum:

Still not pitching at the top end of the zone, but definitely skewed higher, with his distribution centered around .3 feet above the heart of the zone.


How Game Theory Is Applied to Pitch Optimization

The timeless struggle between pitcher and batter is one of dominance — who holds it and how. Both players use a repertoire of techniques to adapt to each other’s strategies in order to gain advantage, thereby winning the at-bat and, ultimately, the game.

These strategies can rely on everything from experience to data. In fact, baseball players rely heavily on data analytics in order to tell them how they’re swinging their bats, how well they’ll do in college, how they’ll perform at Wrigley versus Miller.

Big data has been used in baseball for decades — as early as the ’60s. Bill James, however, was the first prominent sabermetrician, writing about the field in his Bill James Baseball Abstracts during the ’80s. Sabermetrics is used to measure in-game performance and is often used by teams to evaluate prospective players.

Baseball fans familiar with sabermetrics, the A’s, and Brad Pitt have likely seen Moneyball, the Hollywood adaptation of Michael Lewis’ book, which told the story of A’s general manager Billy Beane’s use of sabermetrics to amass a winning team.

Sabermetrics is one way baseball teams use big data to leverage game theory in baseball — on a team-wide scale. However, by leveraging their data through the concepts of game theory on a smaller scale, baseball teams can help their men on the mound out-duel those at the plate.

Game theory studies strategic decision making, not just in sports or games, but in any situation in which a decision must be made against another decision maker. In other words, it is the study of conflict.

Game theory uses mathematical models to analyze decisions. Most sports are zero-sum games, in which the decisions of one player (or team) have a direct effect on the opposing player (or team): if a team scores a run, it usually comes at the expense of the opposing team, whether off an error by a fielder or a hit off a pitcher. Such games have a stable solution known as the Nash equilibrium, named for the mathematician John Forbes Nash.

In the case of pitching, game theory — especially the use of the Nash equilibrium — can be used to predict pitch optimization for strategic purposes. Neil Paine of FiveThirtyEight advocates using big data and sabermetrics to analyze each pitch in a hurler’s armory, then cultivating the pitcher’s equilibrium — the perfect blend of pitches that will result in the highest number of strikeouts, etc.

Paine has gone so far as to create his own formula, the Nash Score, to predict which pitcher should throw which pitches in order to outwit batters.

In game theory, a Nash equilibrium is reached when each player uses a mix of strategies so effective that neither has an incentive to change strategies. For pitchers, Paine’s Nash Score uses their data to find the optimal combination of pitches (including how frequently to throw each one) to combat batters.

Paine does point out that creating this kind of equilibrium in baseball can be detrimental to a pitcher. He is, after all, playing against another human being who is just as capable of using game theory to adapt strategies to upset the equilibrium.

If a pitcher’s fastball is his best, and his Nash Score shows that he should be using it more often, savvy hitters are going to notice. “ . . . In time, the fastball will lose its effectiveness if it’s not balanced against, say, a change-up — even if the fastball is a far better pitch on paper,” writes Paine.

In this case, a mixed strategy is best — in game theory, mixed strategies are used when a player intends to keep his opponent guessing. Though pitch optimization using Paine’s Nash Score could lead to efficiency, allowing pitchers to throw fewer pitches over more innings, it could also lead to batters adapting much more quickly to patterns, thus negating all the work.
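The mixed-strategy logic is easy to see in a toy 2x2 zero-sum game. This is not Paine’s actual Nash Score formula (which isn’t reproduced here), just the textbook equilibrium calculation; the whiff probabilities are illustrative:

```python
# Pitcher (row) vs. hitter (column) payoffs: probability of a whiff.
# The fastball is the better pitch "on paper", but only when the hitter
# isn't sitting on it.
#                      hitter sits FB   hitter sits CH
payoff = [[0.08,            0.30],    # pitcher throws fastball
          [0.25,            0.05]]    # pitcher throws change-up

def mixed_equilibrium(m):
    """Optimal fastball frequency in a 2x2 zero-sum game with no saddle point."""
    (a, b), (c, d) = m
    p_fb = (d - c) / (a - b - c + d)    # makes the hitter indifferent
    value = a * p_fb + c * (1 - p_fb)   # game value (whiff rate) at equilibrium
    return p_fb, value

p_fb, v = mixed_equilibrium(payoff)
print(round(p_fb, 3), round(v, 3))  # → 0.476 0.169
```

Note the punchline: even though the fastball dominates the change-up head-to-head against a guessing-wrong hitter, the equilibrium throws it less than half the time — exactly Paine’s point about balancing a great pitch against a lesser one.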


Stop Thinking Like a GM; Start Thinking Like a Player

Like many baseball fans, I have played a lot of baseball in my life. I wasn’t anything special—Just A Guy in HS-age select ball, a starter in college only by virtue of attending a notoriously nerdy institution, and a player in the kind of adult league where a typical pitcher throws 80 and a double play ball has about a 50/50 shot of actually becoming a double play. What might be atypical about me is that as both a player and fan of baseball, I never had to struggle with sabermetrics upending conventional wisdom. For me sabermetrics was conventional wisdom from the very beginning. I grew up in a house with every single Bill James book ever published on the bookshelves and knew who Pete Palmer was when I was twelve.

Here’s the honest truth: Sabermetrics provided essentially no help in making me a better baseball player.

If a sabermetrician (or saber-partisan) wonders why the larger baseball world has not discarded Medieval Superstition for Enlightened Science, foregoing the burning of witches to instead guillotine the likes of Hawk Harrelson, he should think about all that is implied by the above.

Sabermetrics has immeasurably improved the management of baseball, but has done comparatively little to improve the playing of baseball. The management of baseball (meant generically to encompass front office as well as in-game management) is primarily an analytical task, but the playing of baseball is at heart an intuitive one. Getting better at managing involves mastering and applying abstract concepts. Getting better at playing involves countless mechanical repetitions with the goal of honing one’s neurology to the point at which certain tasks no longer require conscious attention to perform.

It is not terribly surprising that sabermetricians, being almost by definition analytically inclined, have gravitated towards finding management to be a more interesting problem than playing. That attitude has gotten sabermetrics a long way but is now a problem. Traditional sabermetric lines of inquiry are on multiple fronts running into limits, beyond which sabermetricians are declaring, “Everything past here is just luck!” Breaking new ground is most definitely possible, but it will require sabermetricians to ask different questions. To ask those questions, a perspective change has to occur: going forward, the sabermetrician will need to look at baseball through the eyes of a player, not the GM.

The Cultural Divide

To come at this dichotomy from another, roundabout direction, let’s consider a hypothetical player who has just been through a 3-for-20 (with a walk) slump. Two statements are made about him:

Statement A: 21 PAs is far too small a sample size to make any definite judgment about him. His anomalously low .200 BABIP is driven by an IFFB% well above his career average, so in all likelihood he’ll regress toward his projection.

Statement B: He is letting his hands drift too far away from his body, so pitchers are busting him inside, and he’s popping up what he isn’t whiffing.
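Statement A’s sample-size point is easy to quantify, incidentally. A quick normal-approximation (Wald) interval, sketched below, shows how little 20 at-bats constrain a hitter’s true rate:

```python
from math import sqrt

# 95% normal-approximation interval for a batting rate of 3-for-20 (.150).
# Exact (Clopper-Pearson) intervals are wider still at this sample size.
def wald_interval(hits, ab, z=1.96):
    p = hits / ab
    half = z * sqrt(p * (1 - p) / ab)
    return max(0.0, p - half), min(1.0, p + half)

lo, hi = wald_interval(3, 20)
print(round(lo, 3), round(hi, 3))
```

The interval runs from .000 to above .300 — consistent with anything from a replacement-level bat to a star, which is the whole point of Statement A.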

Start with the obvious: The reader does not require n = 600 to expect with 95% confidence that he is more likely to read statement A rather than B at FanGraphs, Baseball Prospectus, Grantland, or FiveThirtyEight, and that with nearly equal confidence he would expect to hear statement B rather than A from a color announcer on a broadcast. Furthermore, someone making statement A will often imply or suggest that Statement A is Responsible Analysis and that Statement B is an attempt to Construct a Narrative (“Construct a Narrative” being the polite datasplainer way to say, “Bullshit”). Most people making statement B look at statement A and roll their (glazed) eyes.

Tribal affiliations established, let’s analyze the two statements in the critical literary sense. Who is the intended audience of the respective statements? A is a probabilistic statement about the future that implies lack of direct control but supposes its audience needing to make a decision about the player. The appropriate audience for such a statement is a manager or general manager. B is a definite statement about the present that implies direct, controllable causality and implicitly suggests a course of action to take. The appropriate audience for such a statement is the player himself.

Now of course, neither statement is really made for the GM or player but both are rather made for the fan who vicariously stands in for one or the other. What fundamentally defines a fan is that he identifies with the team and internalizes such statements as if he were actually a participant. The faux-audience of the two statements thus reveals a difference in how the real audience identifies with the team: A is made for fans who primarily identify with the GM, or more likely, fans who have fantasy teams (a variation on the theme).  B is for fans who primarily identify with the players. The use of “primarily” implies that the division suggested is of degree rather than kind—any fan of a mind to be critical, from the bleacher creature-est to the most R-proficient, will do both—but to implicitly adopt the viewpoint of management carries an inherent elitism.

To say the viewpoint of sabermetrics is elitist is not to say it is wrong—quite the opposite. As a system for framing and evaluating management decisions it has proven spectacularly right. It has been over a decade now since Bill James got his ring, and today every single MLB franchise employs people whose sole job is to produce proprietary statistical analysis. The premier saber-oriented publications have difficulty retaining talent because said talent is routinely poached by said franchises. Were an alien to arrive on earth and learn Western Civ from Brad Pitt movies he would judge Billy Beane a greater hero than Achilles. The revolution is over, and the new regime is firmly ensconced. To point at any remaining Talleyrands who have managed to survive the turnover is to ignore the amount of adaptation that has been required of them to do so.

No, to say sabermetrics is elitist is instead to say merely that its assumed perspective is managerial. It asks and answers questions like, What is the optimal strategy? or, How do I compare the value of different skillsets? or the real, ultimate, bottom-line bone of contention: How much does this guy deserve to get paid? That sabermetrics adopted this perspective was not necessarily inevitable. Sabermetrics grew out of the oldest of fan arguments: Who is the (second) greatest? Who deserves the MVP this year? Should this guy be in the Hall of Fame? These questions are about status, and status ultimately rests on subjective values. The declared purpose of sabermetrics is to answer those questions objectively. More modestly stated, the purpose is to force people arguing over subjective values to do so in the context of what actually wins baseball games. More cynically stated, it can be a way of humbugging that dispute by presenting a conclusion dependent upon a particular value judgement as precise, objective truth and its detractors as retrograde obscurantists.

The cynical way of stating sabermetric purpose is unfair, but it is made possible because the sabermetric solution to this problem of trying to referee aesthetics with numbers was to assert a specific conception of value as normative: that of a general manager whose job is to assemble a team to win the most baseball games in the specific context of free-agency era Major League Baseball’s talent pool and collectively-bargained labor and roster rules. When Keith Woolner looked at the talent distribution of players and proposed that there was a more or less uniform level of talent that was so ubiquitous and readily available that players of that skill level should be considered to possess zero scarcity value, he established something that could serve as an objective basis for value comparison. The existence of such a talent level meant that an optimally-operating GM should evaluate players by their skill level in reference to that baseline and naturally allocate the franchise’s finite resources according to this measure of talent scarcity. Woolner didn’t merely propose the idea. He demonstrated, quantified, and named it: VORP. Value Over Replacement Player. Regardless of how an MVP voter wished to philosophize “value”, this was clearly the correct way for a general manager to conceive of it.

“Replacement Level” is one of those ideas that, once one understands it, one immediately recognizes its intuitive obviousness and is embarrassed to have not thought of it before. It cannot be un-thought, and the difficulty of re-imagining what it was like to lack it in one’s mental toolkit makes it easy to forget how revolutionary it was. Overstating this revolutionary impact is exceedingly difficult, so here’s a go: In an alternate universe where Woolner chose to stay at MIT to become an economist instead of going to Silicon Valley, in which he published VORP about a normal profession in an economics journal with Robert Solow as his advisor rather than doing it as a baseball nerd in his spare time at Baseball Prospectus, he’d probably have a Nobel Prize (shared with Tom Tango and Sean Smith). That VORP as a statistic has been superseded by the more comprehensive WAR should not diminish its revolutionary status; VORP is to WAR what the National Convention is to Napoleon. “Replacement Level” labor was the most analytically powerful conceptual advance in economics since Rational Expectations. That some actual labor economists have had difficulty with it and have yet to adopt it as a common principle of labor economics is nothing short of mind-blowing. While it was developed to explain such a unique and weird labor environment, with minor modifications it could be applied widely.
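The replacement-level idea is simple enough to sketch in a few lines. This is not Woolner’s full VORP formula (which adjusts for position, park, and era); it is just the core baseline comparison, with illustrative numbers:

```python
# Core of the replacement-level concept: value run production against a
# freely available baseline rather than against zero or the league average.
# All numbers below are illustrative, not real VORP inputs.
def runs_above_replacement(player_runs, player_pa, repl_runs_per_pa):
    """Runs contributed beyond what a replacement-level player would
    produce in the same playing time."""
    return player_runs - repl_runs_per_pa * player_pa

# A hitter creating 85 runs in 600 PA, against a hypothetical replacement
# baseline of 0.11 runs per PA (i.e., 66 runs in the same playing time):
print(round(runs_above_replacement(85, 600, 0.11), 1))  # → 19.0
```

Divide the result by runs-per-win (roughly 10 in the modern game) and you are at the doorstep of WAR — which is why the text calls VORP its National Convention.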

WAR of the Worlds

WAR has conquered the baseball world, but no war of conquest is ever won cleanly. Amongst the common vices: looting. The best example of such is catcher defense. Establishing the level and value of pitch-framing ability has been a hot project in sabermetrics for several years now, enabled by a sufficiently large PITCHf/x database. Quantifying this ability may be a new thing, but anyone who claims the discovery of its existence belongs in the sabermetric trophy case is like a Frenchman claiming the Louvre as the rightful place of Veronese’s Wedding at Cana. The old-school baseball guys shoehorned into the role of bad guys in Moneyball were nearly uniform in their insistence on the value of a catcher’s defensive ability. The great unwritten story of sabermetrics of the last five to seven years is how much of the previously-derided, old-timey wisdom of the tobacco chewers has been validated, vindicated, and… appropriated. There is little better way to see this (r)evolution in opinion than reading the player blurbs on Jose Molina from several editions of the Baseball Prospectus Annual:

2003: My God, there are two of them. Jose has a little more pop than Ben, which is among the faintest praise you’ll read in this book. The Angels would be well served to go out and find a left-handed hitting catcher with some sock, just to bring off the bench and have a different option available. No, not Jorge Fabregas.

2004: Gauging catchers’ defense is an inexact science. We can measure aspects of it, but there’s enough gray area to make pure opinion a part of any analysis. So consider that a number of people think that Jose, the middle brother of the Backstopping Molinas, is a better defender than his Gold Glove-laden sibling. Although the two make a great story, the Angels would be better served by having at least one catcher who can hit right-handers and outrun the manager.

2005: At bat, both Molinas combined weren’t as productive as Gregg Zaun was by himself. That’s the value of getting on base; the difference from the best defensive catcher to the worst isn’t nearly as wide as the gulf created when one player uses his plate appearances effectively and the other toasts them like marshmallows. The younger Molina is a poor fit to back up his bro, given their too-similar skill sets.

2009: Since 2001, 66 catchers including Molina have had a minimum of 750 PAs in the majors. Of those, exactly two—John Flaherty and Brandon Inge—have had lower OBPs than Molina’s .275 (as a catcher only, Inge is lowest at .260). If OPS is your preferred stat, then just three backstops have been lower than Molina’s 614. Compared to Molina, Henry Blanco is Mickey Cochrane. The wealthiest franchise in sports could have had anyone as their reserve catcher, but in December 2007, Cashman decided they would have Molina for two years. He then climbed Mt. Sinai, shook his fist at the Almighty, and shouted, “I dare you to take Jorge Posada away from us, because we have JOSE MOLINA!” Thus goaded, the Almighty struck Posada with a bolt of lightning, and the Yankees hit the golf courses early. The moral of the story is that hubris sucks. P.S.: Molina threw out an excellent 44 percent of attempting basestealers, which is why he rates seven-tenths of a win above replacement.

2010: Nothing about Molina surprises. He could be caught in a hot-tub tryst with two porn starlets and a Dallas Cowboys linebacker and you’d still yawn, because it wouldn’t change a thing: he’s a glove man who can’t hit. In the last two years, he has posted identical 51 OPS+ marks, batting .217/.273/.298 in 452 PAs. He accumulated that much playing time because of Posada’s various injuries and scheduled days off. Though Molina’s good defense stands in direct contrast to Posada’s complete immobility behind the plate (so much so that Molina was used as A.J. Burnett’s personal catcher during the postseason), the offensive price was too high to pay. Molina is a free agent at press time; the Yankees are ready to turn his job over to Cervelli.

2013: Molina owes Mike Fast big-time. Fast’s 2011 research at Baseball Prospectus showed Molina to be by far the best pitch-framer in the business, turning him (and Fast, in fact) into a revered hero almost overnight. The Rays pounced for $1.5 million, and Molina rewarded them by setting a career high for games played (102) at age 37. He’d have played a few more were it not for a late-season hamstring strain, which also interrupted a Yadier-like, week-long hitting spree that separated the offensively challenged Molina from the Mendoza line for good. The Rays were glad to pick up his $1.8 million option in 2013 and hope for similar production.

2014: Arguably the best carpenter in the business because of his noted framework (*rimshot* *crickets*), Molina continued to handle a steady workload for Tampa Bay as he creeps toward his 40th birthday. The middle Molina receives a lot of praise for his work behind the plate, but his best attributes might be imaginary. He has been the stabilizing force for a pitching staff that perennially infuses youth as well as a role model for the organization’s young backstops. These traits are likely to keep him around the game long after he has stolen his last strike. For now, the framing alone is enough—the Rays inked Molina to a new two-year deal last November.

There is much to unpack from these blurbs, too much in fact to do systematically here. I selected them not to pick on Baseball Prospectus specifically (they did after all correctly identify the moral of the story), but because BP is a flagship sabermetric publication whose opinions can serve as a rough proxy for all of sabermetrics and because Jose Molina can serve as the avatar of catcher defense. I have omitted 2006-8 and 2011-12 partially for brevity and partially because the selection brings into high relief distinct eras of sabermetric consensus: In 2003-5, there is an acknowledgement that he might be a truly elite defensive catcher, but this view is a) not actually endorsed, and b) assumed to be of minimal importance even if true, given the then-saber consensus that OBP trumps all. In 2009-10, the opinion of him hasn’t really changed but the tone has—the writers acknowledge no uncertainty and are openly offended at his continued employment. By 2013-14 there has been a complete sea change in attitude. Not only does the writer appreciate the value of Molina’s skill, he confidently claims that it was because of Baseball Prospectus that he was now properly appreciated by an MLB franchise!

Fast’s research was genuinely outstanding (as was Max Marchi’s). He deserves enormous credit for it and has received (as has Marchi) the ultimate in sabervalidation: to be hired by a franchise to keep his future work exclusive. What he doesn’t deserve credit for is Jose Molina remaining employed. For someone (it wasn’t Fast) to claim that Molina owed BP a thank-you note for being paid less than he had been as a Yankee is astonishing on several levels, even granting that such blurbs are supposed to be cheeky and entertainingly irreverent. For starters, BP is confident that the overlap between front offices and saberworld is tight enough (and BP influential enough) that someone at every single franchise would have read Fast’s work. This part is at least true. The claim of being so influential as to be the primary reason Jose Molina was signed by the Rays is most likely false.

In February, Ben Lindbergh wrote at Grantland about his experience as an intern at the Yankees, during which time he had firsthand knowledge that the Yankees baseball ops department seriously debated as early as 2009 the possibility that Jose Molina was better at helping the Yankees win games than Jorge Posada, possessor of a HOF-worthy (for a catcher) .273/.374/.474 career slash line. Not only did he witness this argument, he proofread the final internal report that demonstrated this possibility to be reality. When Fast published his research at BP in 2011, Lindbergh was an editor there. Fast’s result was already known to him (although possibly NDA’d). When the blurb in the 2013 annual was published, Lindbergh had risen to Managing Editor. For BP to claim that Fast’s research drove Tampa Bay’s decision (as opposed to their own) was to claim that a front office renowned for its forward-thinking and sabermetric savvy was two years behind two of its division rivals (Molina having just finished a stint in Toronto).

About two weeks before the Rays signed Molina in November 2011, DRaysBay (the SBNation Rays site) had a Q&A with Andrew Friedman, which touched on framing (my emphasis):

Erik Hahnmann [writer at DRaysBay]: Recently there was a study by Mike Fast at Baseball Prospectus on a catcher’s ability to frame pitches and how many runs that can save/cost a team over the course of a season. A catcher being able to frame a pitch so that it switches a ball into a strike on a close pitch was worth 0.13 runs on average. The best can save their team many runs a year while the worst cost their team runs by turning strikes into balls. Is this a study you’ve looked at, and is receiving the ball and framing a pitch a skill that is valued and taught within the organization?

Andrew Friedman: We place a huge emphasis on how our catchers receive the ball. Jamie Nelson, our catching coordinator, pays close attention to each catcher’s technique from day one, and he and our catching instructors have drills to address different issues in that area. As with any skill, some players have to work more at it than others. The recent studies confirm what baseball people have been saying for decades: technique matters, and there’s more to catcher defense than throwing runners out.

To some extent every GM is a politician when it comes to communicating with the fanbase, so we can’t necessarily take what Friedman said at face value. Friedman did after all employ Dioner Navarro for years. With that caveat though, those are not the words of a recent convert. Friedman is also the guy who traded for the defensively superb Gregg Zaun in 2009 and for whom Zaun most wanted to play after the 2010 season (he ultimately retired, unable to get an offer coming off of labrum surgery at 39). The weight of evidence, most heavily that the famously low-budget franchise had a full-time employee whose title was “Catching Coordinator”, is that the Rays front office valued catcher defense before it was cool.

The point is not to be too hard on Lindbergh, who is a joy to read and whose linked article above is in part a personal mea culpa for his original skepticism. The point is to be hard on sabermetricians as a tribe who, having discovered for themselves the value of pitch framing in 2011 and refined their techniques subsequently, rarely if ever made a similar mea culpa for belittling the folks who were right about it all along. Imagine the view from the other side: you’re a grizzled scout, a career baseball guy, a former-player color announcer who knew in your bones and always insisted that a catcher’s receiving ability was crucial. Your name might be Mike Scioscia. You were castigated as an ignoramus for more than a decade by a bunch of nerds who couldn’t see the dot on a slider if it Dickie Thon-ed them and who relied almost exclusively on CERA, a statistic so quaintly simplistic it was created before anyone would have thought to construct it as C-FIP. Then all of a sudden, one day the statheads not only show that you were right the whole time, they also show that you are good at judging this ability, and they make no apologies. One can perhaps forgive such a person for not bowing too deeply to his new overlords.

Science?

While Michael Lewis no doubt exaggerated the scout/sabermetric culture clash, especially within actual front offices, he certainly did not invent it either. It is epistemological at heart—whether or not one prefers an intuitive or analytical basis for knowledge. Keith Woolner (can’t win ‘em all) in his above-linked 1999 research on catcher defense stated the sabermetric viewpoint most succinctly, “Currently, the most common way to evaluate game calling in the majors right now is expert evaluation — in other words, managers’ and coaches’ opinions and assessments. Ultimately, this approach is contrary to the spirit of sabermetric investigation, which is to find objective (not subjective) knowledge about baseball.” Given that attitude and the evidence available in 1999, Woolner was, in a limited sense, correct. The best evidence available did not show much differentiation in catcher defensive value. Where he (and saberworld generally) erred was in succumbing to the empiricist’s seductive temptation: declaring absence of evidence to be evidence of absence. It is oh-so-easy to say, “The answer is no” when the technically correct statement ought to be, “There is no answer.” What makes this subtle sleight-of-hand tempting is that on some level everyone understands what’s at stake: Saying, “There is no answer” when a rival epistemology plausibly claims otherwise amounts to betting the entire belief structure that the rival is wrong, a bet for which, by construction, an empiricist has insufficient evidence to make. Authority is up for grabs, and pilgrims do not tolerate silence from their oracles.

Woolner’s apt summation of the sabermetric viewpoint implies the grander ambition: Sabermetrics aspires to Science. Unfortunately, it cannot be Science in the most rigorous sense of the word. It is like economics, faced with complicated systems producing enormous amounts of data, nearly all of which is tainted by selection bias. One can wield the mathematical tools of science, but one is unable to run controlled experiments. Worse, also like economics, in order to produce results of even remote usefulness one must often make unfalsifiable assumptions of questionable validity.

For a more concrete illustration of this problem, let’s continue drawing from the catcher framing well. We can measure with high precision the first-order effect of a catcher’s impact on called balls and strikes with PITCHf/x, and with linear weights we can calculate good context-independent estimates of the consequent run & win values. We do this calculation and tacitly assume that this first-order effect is, if not the whole story, at least 70-80% of it. We also know that a catcher’s receiving ability affects pitch selection (type and targeted location), both because we have testimonial evidence to that effect from actual major league pitchers and because it is intuitively obvious. Anyone who has ever toed the rubber with a runner on 3rd has at some point gotten queasy when the catcher signals a deuce and shaken it off. While this effect is openly acknowledged by absolutely everyone who studies framing, it is just as soon ignored or dismissed with prejudice by hand-wavy arguments. Should it be? Who knows? Certainly not anyone who considers Sabermetrics to be Science, because there has never been any rigorous attempt in saberworld to quantify the selection effect. No one has yet laid out a convincing methodology to do so with the extant data.
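The first-order calculation described above is simple enough to sketch. The ~0.13 runs per switched call comes from the Fast study mentioned in the Q&A earlier; the 150-call season and the 10 runs-per-win conversion are illustrative assumptions, not measured values for any particular catcher.

```python
# Back-of-envelope first-order framing value. The ~0.13 runs per switched
# call figure is the one cited in the DRaysBay Q&A above; the 150-call
# season and 10 runs/win are illustrative assumptions.

RUNS_PER_SWITCHED_CALL = 0.13
RUNS_PER_WIN = 10.0

def framing_wins(net_extra_strikes: int) -> float:
    """First-order framing value in wins, from net balls turned into strikes."""
    return net_extra_strikes * RUNS_PER_SWITCHED_CALL / RUNS_PER_WIN

# An elite framer credited with ~150 net extra strikes over a season:
print(round(framing_wins(150), 2))  # 1.95
```

This is exactly the tacit "70-80% of the story" calculation: precise about called pitches, silent about everything downstream of them.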

Yet, the potential second-order effect of pitch selection dwarfs the first-order one: only a small fraction of pitches thrown form the basis of the first-order calculation, and by definition this sample excludes every single pitch on which a batter swings. One logical possibility would be supposing that a pitcher who knows he has a good catcher is more likely to test the edges of the zone and less likely to inadvertently leave pitches over the middle of the plate. From 2012-present the team-level standard deviation of HR/9 allowed is 0.15. At 10 runs/win and a 1.41 R/HR linear weight, over a 120-game catcher-season it would only take a 0.06 difference in HR/9 to make for a whole win of value. 0.06 HR/9 equates to 1 HR per 17 games, during which time a typical starting catcher will be behind the dish for 2400 pitches, give or take. To repeat: +/- 1 meatball every 2400 pitches could drive 1 win of value. Raise your hand if you want to bet your reputation, with zero statistical evidence to back you up, on the triviality of something that we know exists and only takes 1 HR per 2400 pitches to equate to 1 WAR, let alone whatever effects it has on balls in play. The selection effect could easily be that big and be completely lost in the noise. It could be thrice that big and still look like randomness. Yet, because we can’t measure it, we ignore it. How many Molina-caught pitching staffs (any Molina) would you guess have been on the wrong side of average in HR/9?
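The arithmetic in that paragraph is worth making explicit. A quick sketch using its stated assumptions (a 120-game catcher-season of 9-inning games, 10 runs/win, a 1.41 runs-per-HR linear weight):

```python
# Reproducing the paragraph's back-of-envelope numbers under its stated
# assumptions: 120-game catcher-season, 10 runs/win, 1.41 runs per HR.

GAMES = 120
RUNS_PER_HR = 1.41
RUNS_PER_WIN = 10.0

hr9_diff = 0.06                     # hypothesized difference in HR per 9 innings
extra_hr = hr9_diff * GAMES         # 9-inning games, so HR/9 * games; ~7.2 HR
extra_wins = extra_hr * RUNS_PER_HR / RUNS_PER_WIN   # ~1.0 win
games_per_hr = GAMES / extra_hr     # ~17 games per extra HR

print(round(extra_hr, 1), round(extra_wins, 2), round(games_per_hr, 1))
```

Roughly 7 home runs spread over a season, one per 17 games, is all it takes; that is the scale of effect that would be invisible in the noise.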

The issue of known-but-unmeasurable effects is a big enough practical problem, but the issue of falsifiability is the sub-surface rest of the iceberg. Scroll back to the beginning of this essay and compare the two hypothetical statements, this time not from a sociological or literary standpoint but rather from a Popperian, scientific one. Which is falsifiable? The “sabermetric” piece of analysis (A) is a single, probabilistic statement about the future. “The future” has sample size n = 1, much too small to reject any distributional hypothesis. Any single statement about the future becomes impossible to falsify once it is hedged with the word “likely”. That by no means makes such statements incorrect, but it does mean that in order to believe it one must implicitly suspend the strict epistemology of Science for the purpose in question. That’s the cost of shifting into a probabilistic view of the world. A set of probabilistic statements made under identical methodologies can potentially be subject to falsification, but that has no bearing on any individual one. That such statements most likely (oh, snap! meta-meta!) are indeed correct ought to present any saberperson with a troubling level of cognitive dissonance.

We’re deep into bizarro world when we’re declaring statements correct but their underlying epistemology questionable, so let’s get a little less abstract and ask what ought to have been the most straightforward question about our hypothetical statements A and B: Are they true? Being hypothetical, there’s of course no way to know, but anyone who has followed baseball ought to be comfortable with the idea that either, neither, or both could be true. If either, neither, or both could be true, does that mean the truth values of the two statements are independent of each other? NO!

Wait, huh? Dig into the assumptions. Statement A is premised upon a body of research that shows that over small sample sizes, performance can vary widely, and that as a statistical matter career-to-date performance is vastly more predictive of future performance than is the most recent 21 PA. All of the data forming the basis of that research has a common feature: It was generated by actual professional hitters on actual professional teams, all of whom have had managers, hitting coaches, and teammates observing them, precisely so that flaws get spotted as soon as possible. When a hitter goes into a slump, it is the hitting coach’s job to point out flaws that might be a factor. A hitting coach who makes Statement A to the player instead of B is simply not doing his job. If he doesn’t say statement B exactly, he will say something like statement B. Being strictly hypothetical, it’s all the same. If a mechanical flaw is the cause of the slump, then the player or his coach will discover it, and the combined forces of survival instinct, competitiveness, income maximization, and simple professional pride will lead the player to correct it. This is the normal ebb and flow of baseball. This normal ebb and flow of baseball forms the entire sample for the research upon which statement A relies. Hello again, Selection Bias, glad you came back! Statement A is true only if Statement B, or something like Statement B, is true. Furthermore, if B is true, then A is true only if the player realizes the truth of B, either by being told by a coach or discovering it himself. Alternatively, if the real reason a hitter has started popping up and missing a lot of pitches is instead that he’s lost batspeed due to aging or injury, then statement A is false. Near-term mean-reversion is not likely in those cases.
To say that statement A is likely true is simply to say that correctable flaws are much more common than uncorrectable skill declines, and that as a historical matter, players have been expeditious about correcting the easily correctable before generating large sample sizes.

Let’s resume our Popperian examination, this time with “narrative-constructing” Statement B. On close examination, it very much is falsifiable, on several levels: 1) It makes definite, unhedged assertions about observable reality that can be objectively and transparently evaluated, and 2) it proposes a causal mechanism that can be tested and begs for an experiment.  That sounds a lot like proper science. Ah, but there’s a catch: only the player himself has the ability to run the suggested experiment. The literary and the sociological factors return! The saber-inclined reader can easily miss the testability of the statement if he identifies not with the player but with management, because management cannot run such a test.

If the reader began this essay agreeing with the “sabermetric” view that statement ‘A’ is the scientific, responsible piece of analysis and ‘B’ the empty bullshit and hasn’t gotten the point yet, it’s time to lower the boom: The truth is the reverse; it is statement ‘B’ that is genuinely scientific and ‘A’ that is the empty bullshit.

The Way Forward

What should be done in light of this truth? If there is a single phrase that expresses the ‘progressive’ management model to which most of saberworld adheres, it is “Process over Results”. That phrase, and the sentiment it expresses, are now sufficiently ubiquitous to be entering the MBA lexicon. Nike sells that T-shirt. It is a good general principle to live by, but once consultants figure out that it is also an infinite excuse generator for mediocrity and outright failure, it will shortly thereafter occupy a spot on the business buzzword bingo board alongside “Synergy” and “Leveraging Core Competencies.” Before that sad day arrives, cutting-edge baseball analysis ought to apply it in a way it has not yet done.

Sabermetric analysis has been very good at applying that principle in the evaluation of management decisions. That’s the easy part, since saberworld identifies with that process closely enough, and feels sufficiently knowledgeable about it to pass judgement. Conversely, sabermetrics has rarely if ever taken that viewpoint regarding its evaluation of players. On that front it has always been and remains resolutely results-oriented. Shifting from AVG to OBP to wRC+, or ERA to FIP, or E to UZR is not shifting from results to process. It is merely identifying a superior, more fundamental, more predictive result upon which to make judgements. Even at the most fundamental level possible—batted ball speed / launch angle / spin—one is still looking at a result instead of a process.

Players themselves, even the most saber-friendly, when asked about advanced stats typically give a highly noncommittal answer. Usually, it’s something along the lines of, “The numbers don’t really tell the whole story.” Saberfans usually assume this response is the meathead’s answer to Barbie. Math class is tough! Let’s go hit fungos! The post-structuralist-inclined will also usually think that the players’ refusal to unreservedly accept the definitiveness of sabermetrics is driven by a subconscious, defensive instinct to retain “control of the narrative.” That both of these explanations have an element of truth makes it easy to think they are the whole truth. They are not. Players are just operating on the same premise we have already endorsed: Process over Results. Because they are young, unacademic, and routinely measured against a ruthlessly tough standard, it is easy to forget that they are professionals operating at the most elite end of the spectrum. The difference between the players and the sabermetricians is that the players see Process in a way the rest of us can scarcely imagine and make their judgements accordingly. Should we accept those judgements uncritically? Of course not. Players like everyone are subject to all the biases datasplainers love to bring up when they are losing arguments (Decrying, “Confirmation Bias!” every time someone presents evidence one dislikes should be a punchinthefaceable offense). We should instead try to figure out how to test them. That means looking at Process through their eyes.

What does Process mean to a player? It means two things: mechanics and psychology. The psychological may always remain opaque to the outside observer, but the mechanics need not. On the contrary, the mechanics are there, open for all to see, and nowadays recorded from multiple angles at 240 fps. There is a wealth of data waiting to be captured there. When conjoined with PITCHf/x and Statcast, we can now have a complete physical picture, literally and figuratively, of what goes on for every single pitch in MLB. We should make use of it.

The gif-ability of pitches has already rapidly changed online baseball writing. No longer must a writer attempt to invent new superlatives to describe the filthiness of a slider when a gif can do it far better than words. It has also opened a new seam of sabermetric inquiry that has only barely begun to attract pickaxes–How do mechanics lead to batted-ball outcomes? Dan Farnsworth has written some great posts at FanGraphs starting down that path, as has Ryan Parker at BP. Doug Thorburn, also at BP, writes articles along these lines on the pitching side. As fascinating as those articles are, the problem they all share is that they take the form of case study rather than systematic compilation. The latter ought to be attempted.

It is fortunate that sabermetric semantics has settled on “luck” rather than “randomness” as the converse of “skill,” because nothing that transpires on a baseball diamond is truly random, and to insist otherwise is fatalistic laziness. Baseball exists in the Newtonian realm; the dance of a knuckleball is an aerodynamic rather than quantum phenomenon. “Random” in baseball is just a placeholder for anything with results that seem to adhere to a distribution but whose process remains mysterious. The goal of sabermetrics going forward ought to be shrinking that zone of mystery. Between physicists, biomechanical experts, hitting & pitching coaches, and statisticians it should be possible to answer some important questions–Is there such a thing as an optimal swing plane? If not, what are the trade-offs? Can we backward-engineer from outcomes the amount of torque in a swing and identify what hitters are doing to generate it? Ash or Maple? Is topspin/backspin something a hitter can actually affect? On the pitching side, can we actually identify a performance difference from a “strong downward plane”? Is Drop & Drive a bad idea? All of these questions are susceptible to scientific analysis, because they are fundamentally physical questions. With high speed HD cameras, PITCHf/x, and Statcast the answers may be out there.

Answering questions such as these will not only make for interesting SABR conferences. It would go a long way to bridging the gap between saberfans and ordinary fans. It would improve everyone’s understanding of the game. Above all, it would improve the actual quality of baseball at all levels. Anyone who has been involved in competitive baseball has encountered dozens of hitting and pitching “philosophies” and has had no way other than personal trial and error to judge between them. At present there is just no way to tell if the medicine a coach is prescribing is penicillin or snake oil. That “philosophies” of pitching & hitting are promoted as such is an implicit attempt to wall them off from empirical rigor. This shouldn’t be tolerated by the saber set any longer than it has to be. Sabermetrics began as an attempt to measure greatness. Its greatest legacy to baseball could be in helping create it.


Quantifying Outlier Seasons

I’ve always been fascinated by the outlier season where a guy puts up numbers well above or below his career pattern (Mark Reynolds’ 2009 steals total is one of my favorite examples). I wanted to take a look at the biggest outlier seasons in baseball history. To do this, I ran the data on every player-season since 1950 and calculated a z-score for each season based on the player’s career mean and standard deviation for that stat (only including qualified seasons). While the results were interesting, in my first pass through I did not control for age and the results were largely what you would expect – lots of guys at the beginning or ends of their careers.
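The procedure described above is straightforward to sketch. This is a minimal reconstruction, not the author's actual code; the column names and toy data are invented for illustration:

```python
# A sketch of the outlier-season method described above: z-score each
# qualified player-season against that player's own career mean and
# standard deviation. Column names and the toy data are illustrative.

import pandas as pd

def season_z_scores(df: pd.DataFrame, stat: str) -> pd.DataFrame:
    """df: one row per qualified player-season, with 'player' and stat columns."""
    grouped = df.groupby("player")[stat]
    out = df.copy()
    out["z"] = (df[stat] - grouped.transform("mean")) / grouped.transform("std")
    return out

# Toy career: a light hitter with one big power year stands out immediately.
seasons = pd.DataFrame({
    "player": ["Campaneris"] * 5,
    "HR": [4, 2, 22, 5, 6],
})
print(season_z_scores(seasons, "HR").sort_values("z", ascending=False).head(1))
```

Note the quirk this method bakes in: because the outlier season is included in the player's own career mean and standard deviation, a single huge year inflates both, which caps how extreme its own z-score can get.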

On my second pass, I rather arbitrarily restricted the age to 25-32 to attempt to get guys in the middles of their careers. I think these results ended up being pretty interesting. The full list is here, but I’ll highlight a few below:

[Table: largest positive and negative home-run outlier seasons, by z-score]

I had never heard of Bert Campaneris, but it turns out he was a pretty good player who put up 45 career WAR, mostly as a speedy, light-hitting, great-fielding shortstop. But in 1970, he briefly turned into a power hitter. He hit 22 home runs, his only season in double digits. He hit two in 1969 and five in 1971, playing full seasons both years. So this wasn’t even a mini-plateau. This was a ridiculous peak that he would never come close to again. We don’t have the batted ball data to dig further, but I would love to know just what was going on that year.

Dawson, on the other hand, was a pretty good home run hitter who usually hit 20-30 a season, except in 1987 when he blasted 49. Usually guys hitting crazy amounts of home runs in the late 80s through the 90s wouldn’t be that interesting, but these guys played for a long time after, never coming close to their 1987 totals again.

The guys on the downside are all fantastic home run hitters. With guys playing a full season and falling this short of their numbers, it’s always a possibility that they were playing hurt. Schmidt did indeed play hurt in ’78, but a quick Google for Thomas and Carter brought up nothing, making it all the more inexplicable.

[Table: largest positive and negative stolen-base outlier seasons, by z-score]

As I mentioned above, in 2009 Mark Reynolds went 44 HR/24 steals. That was Reynolds’ only season stealing more than 11, but it “only” registered a z-score of 2.0. The three guys listed here blow that out of the water. Zeile had his season early in his career so it could have been a case of a guy losing speed or getting caught too many times and then being told to stay put. But Palmeiro and Yaz did it right in the middle of their careers. Palmeiro’s stolen base record consists of usually stealing 3-7, and getting caught 3-5 times. But in 1993, he decided to steal 22 while only getting caught 3 times. The next year he was back to his plodding ways.

On the negative side, Crawford’s struggles have been well documented. Driven by a .289 OBP and possibly declining health, Crawford’s 18 steals in his dismal 2011 season were the lowest amount of his career in a qualified season by far. We knew it was a shocking performance at the time, but I didn’t fully grasp its historical significance.

[Table: largest plate-discipline (walk-rate and strikeout-rate) outlier seasons, by z-score]

The last things I will look at are plate discipline numbers. They differ from home runs and steals because they represent hundreds of interactions, thousands if you consider individual pitches, rather than the dozens that the former two represent.

Mantle’s 1957 season deserves some attention (although he put up 11.4 WAR so it probably gets plenty of attention). That year, he put up the second best walk rate and the best strikeout rate of his career, at age 25. After that he went right back to being the great player he was before, albeit with slightly worse plate discipline stats.

Except for Money, who was early in his career and working his way into better walk rates, these are results I don’t have a great explanation for, so I’d love to hear theories. Why did Ripken in 1988, right in the middle of his career, take a bunch of walks and then never do it again to that degree? Likewise, how was Brett Butler able to cut his strikeout rate from 8.7% to 6.3% in 1985 then jump back up to 8-10% for the rest of his career?

Before I corrected for age, I got a bunch of results of guys at the tail end of their careers doing what you would expect. I do want to highlight one of them, however. In 1971 at age 40, Willie Mays had a 3.7z walk rate and a 3.1z strikeout rate. He walked a ton, but also struck out a ton. Added with his 18 home runs, that season he had a robust 47% three true outcome percentage. As the z scores show, it was a radical shift from anything he had done in his career and impressively, he used this new approach to put up a 157 wRC+ and 5.9 WAR. Apparently that guy was pretty good.
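The outlier search behind all of these names can be sketched in a few lines: for each player, measure every season’s rate against that player’s own career mean and standard deviation, then flag the seasons that land far out in the tails. This is my own minimal reconstruction of the approach, not the article’s actual code, and the sample data below is hypothetical.

```python
# Per-player z-scores: each season's rate measured against the player's
# own career distribution. Seasons beyond the threshold are flagged as
# outliers. Sketch only; the real study also corrected for age.
from statistics import mean, stdev

def season_z_scores(seasons):
    """Map each (year, rate) pair to its z-score within the career."""
    rates = [rate for _, rate in seasons]
    mu, sigma = mean(rates), stdev(rates)
    return {year: (rate - mu) / sigma for year, rate in seasons}

def outlier_seasons(seasons, threshold=3.0):
    """Return only the seasons whose |z| meets or exceeds the threshold."""
    return {y: z for y, z in season_z_scores(seasons).items()
            if abs(z) >= threshold}

# Hypothetical walk rates for a ten-year career with one strange season.
career = list(zip(range(2001, 2011),
                  [0.07, 0.08, 0.075, 0.08, 0.078,
                   0.082, 0.076, 0.079, 0.081, 0.15]))
weird_years = outlier_seasons(career, threshold=2.5)
```

One quirk worth noting: with a short career of n seasons, a single outlier season included in its own sample can never exceed a z-score of (n-1)/sqrt(n), which is about 2.85 for ten seasons, so the 3z seasons in this piece come from long careers or truly extreme jumps.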

This piece identifies the biggest outlier seasons in history, but is crucially missing the why. And unfortunately, for most of these that’s not something I have a great answer for. If you have enough player-seasons, you’re going to expect some 3z outcomes. But historical oddities are one of the joys of baseball and each of the 3z outcomes is the product of a radical departure in underlying performance. I think it would be fascinating to talk to some of these guys and see what they have to say about why things went so differently for one season.


The Risk of Long Contracts for Middle-Market Teams

Middle-market teams have historically tried to play the game like they are mini-large-market teams. They develop talent, and when they have enough to make a run at the playoffs they make moves: they buy free agents, extend players through their age-27-to-33 seasons, and trade for proven talent. Unfortunately this usually does not work, and we see one of the six most expensive teams (or the Cardinals) in the playoffs year after year. Then the middle-market team’s “window” closes, and the wait starts over.

It is time for middle-market teams, the Texas Rangers included, to break with this tradition.

The focus should not be on operating within a “window” of time when a World Series run is possible, but on building a team for which there are very few years when that window is not open. The Cardinals are a good example of executing this plan. They rotate talent in and out thanks to a solid player-development system while making very few large free-agent signings. The result is a team that never has too much money tied up in one or two players, so it can afford short-term deals or trades for players who add immediate value without tying up long-term cash.

Let’s talk about how this relates to the Rangers, specifically Elvis Andrus and his extension, since this issue extends to all of the contracts the Rangers have given out. Most people look back and ask the wrong question; it was never about whether the Rangers thought Elvis would be good enough to justify his contract. They obviously thought he would be. The question the Rangers should have asked themselves is: should a middle-market team take a large risk by giving a player whose peak will probably come around age 26 an eight-year extension that runs well past that peak? For a middle-market team, such a contract is nearly impossible to move if the player does not reach the expected level of success.

Other deals, like Adrian Beltre’s, have worked. But can you imagine a world where the Rangers spent all that money on Beltre, only to have him be awful? Of course you can, and it would have been miserable. The Rangers were fortunate that Beltre had a second peak at 31 that has lasted five years. Beltre is the exception, not the rule, and the Rangers should not expect to get that lucky very often. It was a very high-risk offer that happened to work out. Unfortunately, we have the opposite end of the spectrum as well. Shin-Soo Choo was given a similar contract at a similar age, yet his production has fallen flat and the Rangers are already looking for a way to move him.

The Rangers made a series of high-risk contract moves when they had players in the minors who were only a year or two away from being able to contribute on a major-league team, which led to a large amount of money being tied up. This is not to say that all long-term contracts are bad. If the Rangers were able to find a franchise player who brings extreme value consistently with a skill set that ages well, the risk would be worth the shot as long as a reasonable deal could be achieved.

The ultimate conclusion is that, as a middle-market team, the Rangers should shift their focus from spending money on long-term contracts, which are huge risks, to using money and trades to assemble a solid supporting cast on shorter contracts. Those players would support a group of younger, cost-controlled players whose risk of failure is not tied to large amounts of cash. That is a superior strategy to hoping the team makes the playoffs a few times during a window when its long-term-contract players are not yet past their primes. Played correctly, with the Rangers’ amazing farm system and development staff, this approach could keep the team consistently good for long periods of time.


Devon Travis, Sign Stealer?

Devon Travis has been a pleasant surprise for the Jays this season, hitting better than anyone could have expected out of the gate. Despite a horrible month of May, when he tried to play through a shoulder injury, he has hit to a 129 wRC+ so far with solid defense at second base. He may also be helping the Jays in other ways, as it seems he may be involved in stealing signs.

I was watching the Jays’ game against Oakland on July 22nd, and after Devon Travis hit a double off A’s closer Tyler Clippard in the top of the 9th inning, I began to notice Travis making some obvious movements at second base. Sometimes he would clap his hands together enthusiastically; other times he would hop up and down a few times. I then paid attention to the pitches that followed and noticed a pattern: whenever Travis clapped his hands, Clippard threw a fastball, and whenever Travis hopped, Clippard threw an offspeed pitch. I went back to the MLB.tv game archive to confirm what I thought I had seen live, and here is what I found:

Batter – Jose Reyes

Travis did not make any motions during the first five pitches to Reyes (likely, he was learning the signs). On the sixth pitch, he clapped, but Clippard stepped off and they ran through the signs again.

Batter – Josh Donaldson

Like with Reyes, Travis did not make any motions right away, as he looked at four pitches to get the signs down. The fun starts with pitch five:

Travis Motion – Clap

Clippard then steps off, followed by:

Travis Motion – Clap

Pitch – Fastball (92 mph)

Pitch six:

Travis Motion – Hop

Pitch – Offspeed (83 mph)

Pitch seven:

Travis Motion – Hop

Pitch – Offspeed (76 mph)

Batter – Jose Bautista

Pitch one:

Travis Motion – Clap

Pitch – Fastball (91 mph)

Pitch two:

Travis Motion – Clap

Pitch – Fastball (90 mph)

Sadly, after the second pitch to Bautista, the catcher visited the mound, and for the remaining three pitches of the at-bat (which ended in a walk, moving Travis to third base) Travis did not make any motions; again, he probably figured they had changed the signs.

So what we’re left with is five pitches (three fastballs, two offspeed) where the pattern holds, plus logical stretches where Travis does not clap or hop (i.e. right after reaching second base and right after the mound visit, when the signs could have changed). Given all the evidence, I don’t think Travis’s actions were coincidental; I’m fairly certain he was stealing signs.
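As a rough sanity check on "coincidental" (my own back-of-the-envelope estimate, not anything from the broadcast): if the claps and hops carried no information and we treat each pitch as a coin flip between fastball and offspeed, the odds of all five signals matching by chance are about 3%. The 50/50 guess rate is a simplifying assumption, not Clippard's measured fastball rate.

```python
# Back-of-the-envelope check: if Travis's claps and hops carried no
# information, and each pitch were an independent 50/50 guess between
# fastball and offspeed, how likely is it that all five signals match?
# The 0.5 guess probability is an illustrative assumption.
def p_all_match(n_pitches, p_correct=0.5):
    """Probability that n independent guesses are all correct."""
    return p_correct ** n_pitches

chance = p_all_match(5)   # 0.5 ** 5 = 0.03125, roughly a 3% fluke
```

A 3% fluke is not proof on its own, but combined with the well-timed pauses after reaching base and after the mound visit, it makes coincidence a hard sell.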

I was curious if this was a one-time thing, or something that Travis has done in the past, so I had a look at some other games in July in which Travis reached second base and was there for a few batters (i.e. long enough for him to pick up the signs).  Unfortunately, I wasn’t able to spot any patterns that would indicate he was stealing signs in those games that I checked.

As a Jays fan, Devon Travis is already one of my favourite players, as he’s having a fantastic rookie season at a position that has long been a black hole for the Jays.  Now, he’s given me further reason to appreciate him, and a definite incentive to watch his at-bats and times on base a little more closely from now on.


Impacting “Pace of Action”

In 2015, MLB implemented changes to shorten the length of games. As has been widely reported, game times have been reduced. Less widely reported is that the majority of the reduction is due to shorter breaks between innings, and the time between pitches has not decreased.

There is concern that rule changes directly addressing the time between pitches would impact the game negatively, including worries about the logistics of a pitch clock and about tasking umpires with managing the situations that require exceptions. I am a wholehearted proponent of reducing the time between pitches, but I have a hard time envisioning how a pitch clock would work with a fast runner on first base, or when the pitcher has mud in his cleats, or when the batter gets dust in his eyes.

I propose an effective and non-invasive method for reducing the time between pitches: focus on player averages. A pitcher (or hitter) could be judged over a rolling sample of pitches, with escalating fines and penalties administered to the player and/or team. This method would not dictate any specific in-game action or penalty, and it would not require involvement by umpires. It would be invisible to fans, other than producing less yawning and fewer urges to check email.

While a simple rolling average would be…simple, improvements to the methodology can easily be envisioned. A player’s time score could be adjusted based upon the batter/pitcher faced, foul balls, stolen base opportunity, etc.
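The basic mechanism can be sketched in a few lines. Everything numeric here is invented for illustration: the 100-pitch window, the 20-second target, and the doubling fine schedule are placeholder values, not proposed MLB policy.

```python
from collections import deque

# Sketch of the rolling-average pace proposal: track a player's recent
# inter-pitch times, and assess an escalating fine whenever the rolling
# average exceeds the target. Window size, target, and fine amounts are
# hypothetical values chosen for illustration.
class PaceTracker:
    def __init__(self, window=100, target_seconds=20.0):
        self.times = deque(maxlen=window)   # most recent inter-pitch gaps
        self.target = target_seconds
        self.violations = 0                 # drives the escalation

    def record(self, seconds_between_pitches):
        self.times.append(seconds_between_pitches)

    def rolling_average(self):
        return sum(self.times) / len(self.times)

    def assess(self):
        """Return the fine owed after the latest review, in dollars."""
        if len(self.times) < self.times.maxlen:
            return 0            # not enough pitches sampled yet
        if self.rolling_average() <= self.target:
            return 0
        self.violations += 1
        return 500 * 2 ** (self.violations - 1)   # $500, $1000, $2000, ...
```

Because only the average over a long window matters, a pitcher can still take extra time with a fast runner on first or after a mound-dirt delay, as long as he makes it up elsewhere; that is the whole appeal over a per-pitch clock.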

I’m surprised this type of method has not gotten much discussion in the media. I think it would allow MLB to steer behavior change without the negative impact of trying to take action in-game on a per-pitch basis.


Who to Root for In the Nats’ Presidents Race

If you’ve ever attended a Nationals home game, you’ve probably seen the best promotional event held in the Washington D.C. area: the Presidents Race. Beginning as a cartoon race shown on the video board of old RFK Stadium in 2005, the first live race was held on July 21, 2006. The 10-foot-tall presidents run the length of the field, across the warning track, down the foul lines, and around the diamond, often avoiding obstacles such as traffic cones and competing teams’ mascots. Interest in the race reached a fever pitch in the community and media in 2012, when Teddy Roosevelt finally broke his humiliating losing streak of more than 500 races. The original competitors, Teddy, George Washington, Abraham Lincoln, and Thomas Jefferson, were joined by William Howard Taft in 2013 and Calvin Coolidge earlier this month.

As we approach the ninth anniversary of the Presidents Race, I thought it would be interesting to look for a correlation between Presidents Race winners and the Washington Nationals’ on-field performance. Let’s begin with a few caveats. I’ll be looking at data from the beginning of 2013 to July 2, 2015. I chose 2013 as a starting point because it marked the end of Teddy’s losing streak and the beginning of William’s running career. I did not include any data from Calvin’s career because of small-sample-size issues. Regarding the racing record in relation to the Nats’ performance, I include data from 4th-inning races, extra-inning races, both races of a double-header, and all playoff races. Finally, I want to give a big thanks to Let Teddy Win!, a tremendous wealth of Presidents Race knowledge, data, and video.

Abraham is the easy race champion over this time period, finishing 2nd in the final standings in 2013 and 2014. Teddy was carried by his impressive 29-win campaign in 2014, while let’s just say that Thomas is better at writing declarations than at running races. It should be noted that Teddy has been disqualified many times in his racing career because of infractions like unnecessary roughness and cutting the outfield corner.

From 2013-2015, the Nationals were 123-80 (.606) at home, the 3rd best home record in MLB, trailing only St. Louis (.667) and Pittsburgh (.632) over the same time period. To fully appreciate the influence (for better or worse) that the Presidents Race winners had over the Nationals’ on-field performance, we need to look for the winning percentages farthest from .606.
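The split the rest of this piece relies on is a simple group-by: bucket each home game by that day's race winner and compute a winning percentage per president. The sketch below shows the shape of that computation; the game results in it are made up for illustration, while the real data comes from Let Teddy Win! and the Nats' game logs.

```python
from collections import defaultdict

# Group home-game results by that day's Presidents Race winner and
# compute a winning percentage for each president. Sample games below
# are hypothetical, not the actual 2013-2015 results.
def win_pct_by_race_winner(games):
    """games: iterable of (race_winner, nats_won) pairs."""
    tally = defaultdict(lambda: [0, 0])        # winner -> [wins, games]
    for winner, nats_won in games:
        tally[winner][0] += int(nats_won)
        tally[winner][1] += 1
    return {w: wins / total for w, (wins, total) in tally.items()}

sample = [("George", True), ("George", True), ("George", False),
          ("Teddy", False), ("Teddy", True)]
pcts = win_pct_by_race_winner(sample)          # e.g. George at .667
```

The same grouping, keyed to Harper's at-bats or a pitcher's starts instead of team results, produces the player splits discussed below.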

Unsurprisingly, the father of our nation has the biggest positive influence over the Nationals ballclub, leading the squad to a crushing .697 winning percentage. The newcomer, William, also inspired the Nats to play well, despite their mediocre run differential after his race victories. And while Nats fans and opponents may love Teddy (first as a lovable loser and now as a legit competitor), Nats players have not been inspired on the nights he crosses the finish line first. (Teddy went undefeated in the 2014 playoffs, and the Nats went winless in those games.)

The front-runner for this year’s National League Most Valuable Player award is clearly inspired by the nation’s front-runner for Most Valuable President. At the plate after a George victory, Harper mashes to the tune of .325 while on pace for a 50+ HR season. Teddy and Abraham again bring up the rear, and William has another strong showing, reinforcing the idea that “as Harper goes, so go the Nationals.”

Not only does Zimmermann pitch more often on George-victory days than on other days, but he also puts up his best numbers after George pulls out a win. Teddy upsets the pattern by inspiring Zimmermann to a 2.35 ERA and a 9 K/9 mark, the best of the five.

Taft famously threw the first-ever presidential first-pitch, yet both Nats pitchers remain uninspired on William’s victory days. Thomas remains the least influential president (perhaps due to the rarity of his victories), inspiring the team, Harper, and both pitchers to average winning percentages and average career numbers. Lincoln inspires Storen’s lowest ERA and 2nd best K/9.

The results of this data crunch are clear: while Teddy may be a lovable loser, some of that losing might be rubbing off on the Nationals. And if you’re a Nats fan, you probably want to root for George. Bryce Harper and Jordan Zimmermann clearly do.