Are Analysts Affecting the Behavior They’re Observing?

by jgossels

January 14, 2019

Introduction and Hypothesis

One of the longest-standing tenets of sabermetrics, stemming from Voros McCracken’s seminal 2001 work on DIPS (Defense Independent Pitching Stats) theory, is that pitchers ought to try for strikeouts rather than focusing on inducing weak contact. McCracken asserted that pitchers have little control over the quality of contact they allow. However, they do control whether they strike the batter out (good), walk him (bad), or allow a home run (even worse). Put another way, McCracken found a strong negative correlation between a pitcher’s strikeout rate (K%) and his runs allowed per nine innings (RA9). It is a simple logical step from here to conclude that pitchers ought to try to strike batters out.

Or is it?

Might McCracken’s DIPS observations only hold as long as pitchers are trying to generate weak contact? If they begin to focus solely on strikeouts, might this observed correlation weaken? Might we find more pitchers who are able to generate strikeouts but are not particularly successful at preventing runs?

As an analogy, consider a farmer whose goal is to get a big harvest of high-quality crops. To this end, he regularly waters and fertilizes his plants. He hires a consultant who does some studies and points out that fertilizing is closely correlated with the quality and quantity of the harvest. As a result, the farmer shifts all of his efforts to fertilizing and ignores watering altogether. Clearly this is not the best strategy. In the same way, might a pitcher be hurt by focusing on strikeouts and ignoring the quality of contact his pitches will generate if the batter does make contact?

With this in mind, might we, as analysts, in fact be affecting the very phenomena that we’re observing?

Investigation

How can we go about answering this question?

Though sabermetricians have emphasized strikeouts for years, “old school” baseball people have held on longer to the idea that a pitcher can control the quality of contact against him. It is only in the past several years that the vast majority of teams have trended toward a strikeout-focused pitching style. We acknowledge that in recent years, the analytical community has softened its stance that pitchers have little-to-no control over batted balls once the batter makes contact. The veracity of McCracken’s original observation is orthogonal to our investigation here, however; what matters is that the baseball world has potentially acted as if it is true by increasing its emphasis on strikeout pitchers.

If our hypothesis is correct, the negative correlation between RA9 and strikeouts should be weaker for the most recent five or so years than for earlier in baseball history. Hence, we calculated the correlation between individual pitcher K% and RA9 for every year from 1969 through 2018, inclusive. We started at 1969 because it was the year the mound was lowered to its current height. Figure 1 shows our results. All pitchers with at least 50 innings pitched (IP) in a given year are included in that year’s calculations.¹
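The per-season calculation described above can be sketched roughly as follows. This is a minimal illustration rather than the code behind Figure 1: the function names (`pearson_r`, `yearly_fit`), the row layout, and the toy rows are all assumptions made for the example. It relies on the fact that, for a regression with a single predictor, R² is the square of the Pearson correlation.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # no variation in one variable, correlation undefined
    return sxy / sqrt(sxx * syy)


def yearly_fit(rows, min_ip=50.0):
    """Map each year to (r, R^2) for RA9 vs. K% among qualifying pitchers.

    rows: iterable of (year, innings_pitched, k_pct, ra9) tuples,
    one per pitcher-season (a hypothetical layout for this sketch).
    """
    by_year = defaultdict(list)
    for year, ip, k_pct, ra9 in rows:
        if ip >= min_ip:  # the 50 IP cutoff used in the article
            by_year[year].append((k_pct, ra9))

    out = {}
    for year in sorted(by_year):
        ks, ras = zip(*by_year[year])
        r = pearson_r(ks, ras)
        if r is not None:
            out[year] = (r, r * r)
    return out


# Invented toy rows, NOT real pitcher data.
toy_rows = [
    (2018, 100.0, 0.20, 4.0),
    (2018, 120.0, 0.25, 3.5),
    (2018, 60.0, 0.30, 3.0),
    (2018, 30.0, 0.40, 9.0),  # below the 50 IP cutoff, so excluded
]
fit = yearly_fit(toy_rows)
```

Plotting the second element of each tuple against year would give a chart shaped like Figure 1; the first element keeps the sign, which confirms that the relationship is negative.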

In fact, we find the opposite effect — a strong increase over the years in the strength of the relationship between a pitcher’s K% and RA9.

Figure 1: Yearly R² for regression of pitcher RA9 on K% from 1969 through 2018.

More specifically, Figure 1 shows the R² for pitcher strikeout rate as the predictor of runs allowed per nine innings for each individual year from 1969 through 2018, inclusive. Regressing these yearly R² values on year, we find that year alone explains 71.2% of the variance in them (R² for this second regression = 0.712). We also find a p-value of less than 0.0001; if there were indeed no relationship between year and the yearly R², there would be less than a 0.01% chance of observing data at least this extreme.
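As a rough sanity check on the reported p-value, the textbook t-statistic for the slope of a one-predictor regression can be recovered from R² and the sample size alone. The sketch below assumes one data point per season from 1969 through 2018 (n = 50); the function name is hypothetical, and this is not necessarily how the figure above was computed.

```python
from math import sqrt


def slope_t_statistic(r_squared, n):
    """Magnitude of the slope t-statistic in a simple linear regression.

    Uses |t| = |r| * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
    """
    return sqrt(r_squared) * sqrt(n - 2) / sqrt(1.0 - r_squared)


# R^2 = 0.712 is taken from the text; n = 50 is an assumption (one point per season).
t_stat = slope_t_statistic(0.712, 50)
```

Under these assumptions the statistic comes out at about 10.9 on 48 degrees of freedom, far beyond common two-sided significance thresholds, which is consistent with a p-value below 0.0001.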

A New Question

Why do we observe a strong trend of increasing R² over the years?

Statistical Artifact of Higher K% in Recent Years

One theory is that the relationship between K% and RA9 is stronger when K% itself is higher, because knowing K% entails knowing the results of more plate appearances (PA). One potential confounding variable in this study is that K% itself has been increasing over time. Figure 3 adds league-average² K% to the Figure 1 plot of yearly R² for K% as a predictor of RA9. Might it be the case that, in general, higher K% correlates more strongly with RA9 because, when K% is higher, we know for sure the result (a strikeout) of a higher percentage of plate appearances?

Figure 3: Yearly R² for regression of pitcher RA9 on K% as in Figure 1, along with the mean K% for all pitchers in the sample in a given year.

If this explanation is correct, then we should observe the same effect for individual pitchers within a single year; in a given year, K% should be a better predictor of RA9 (and, by extension, ERA³) for those pitchers with a higher K%. We run a single linear regression of ERA on K% across all pitchers with at least 50 IP in 2018 (Figure 4). According to this explanation, the points farther right on the graph (i.e. those with higher K%) should be clustered closer to the regression line than those toward the left. But this is not the case: Figure 5 plots the residuals of the regression in Figure 4, and they show no pattern. Therefore, we conclude that our hypothesized explanation for the observed trend in Figure 1 does not apply.⁴

Figure 4: Regression of ERA on K% for all pitchers with at least 50 IP in 2018.

Figure 5: Residuals for data and regression line in Figure 4.
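The residual check described above was done by eye, but a crude numeric version can be sketched as follows: fit the line, split the pitchers into a lower-K% half and a higher-K% half, and compare the spread of the residuals in each. The function names and the four toy points are assumptions for illustration only; a formal heteroscedasticity test would be more rigorous than this comparison.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def residual_spread_by_half(k_pcts, eras):
    """Return (spread_low_k, spread_high_k): residual std. dev. in each K% half."""
    slope, intercept = fit_line(k_pcts, eras)
    # Pair each pitcher's K% with his residual, then order by K%.
    pairs = sorted(
        (k, era - (intercept + slope * k)) for k, era in zip(k_pcts, eras)
    )
    half = len(pairs) // 2
    low = [res for _, res in pairs[:half]]
    high = [res for _, res in pairs[half:]]
    return pstdev(low), pstdev(high)


# Invented toy points, NOT real 2018 data.
toy_k = [0.10, 0.10, 0.30, 0.30]
toy_era = [5.0, 4.0, 3.0, 2.0]
spread_low, spread_high = residual_spread_by_half(toy_k, toy_era)
```

If the "higher K% means tighter fit" explanation held, the second value would be noticeably smaller than the first. In the toy data the two spreads are equal, which is the numeric counterpart of a residual plot with no pattern.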

Changing Hitting Strategies

Thus far, we have considered pitchers’ increased focus on strikeouts while assuming hitters’ approaches haven’t changed. In reality, as pitchers have tried harder to strike out their opponents, batters have placed less emphasis on avoiding striking out. Hitters have realized that the increase in power they derive from swinging hard in (almost) all counts and situations more than offsets the negative effects this strategy has on their strikeout rates.

Holding pitching talent and strategy constant, we would expect this more optimal offensive strategy to lead to offenses scoring more runs (i.e. higher RA9 for opposing pitchers). But we also know this offensive strategy leads to higher K%. Thus, we should see an increase in both RA9 and K%, and therefore a weakening of the negative correlation between the two. However, this is the opposite of what we observe.

Conclusion

Twenty years ago, we analysts were outside observers describing how pieces of a baseball game appeared to fit together, as if we were archaeologists piecing together an understanding of ancient societies from limited surviving evidence. Just as archaeologists today have no actual effect on the people they are trying to understand, we gave no thought to the possibility that we might be interfering. Indeed, in the early days, our shouts of “stop bunting” and our grumbles that there is no such thing as a “clutch hitter” fell on deaf ears. But now analytics have become widely accepted throughout the industry, by front office decision-makers, coaches, and players alike. Is it possible that we are now affecting the phenomena we’re observing?

The first place we looked for evidence of this feedback loop was in the correlation between high-K% pitchers and pitchers who allow few runs. Surprisingly, not only did we find no evidence that the correlation between high strikeout rates and low RA9 has weakened as the game has become more strikeout-focused; we found that it has actually strengthened. We explored several possible explanations for why this might be the case, but none seems to fit. We welcome any and all suggestions for new potential causes to investigate.

Even though pitcher K%-RA9 correlations over the years provided no evidence for our initial hypothesis, that we as analysts may be affecting the very phenomena we’re observing, we need not immediately conclude that our hypothesis was wrong. As the saying goes, absence of evidence is not evidence of absence. Where else might we look?

One idea is the negative correlation between fastball velocity and RA9. We observe that, in general, pitchers who throw harder are more successful. However, scouts’ selections and coaching staffs’ development of players are driven by this very tenet. The baseball world is mesmerized by Hunter Greene routinely breaking 100 mph in high school, and he gets $7.23 million to sign as the second overall pick in the June Draft. What would happen if the Draft and international amateur bonuses selected for more Wade LeBlanc-like crafty lefties?

Does the community have any other ideas about how we may be inadvertently affecting the phenomena that we’re observing?

Footnotes

¹ Figure 1 takes some time to digest, because it is a plot of R² values, and not of the relationship between a sample of pitchers’ K% and their RA9. Figure 2 helps illustrate how we get each point on Figure 1 by running a regression of RA9 on K% for a single year.

Figure 2: Process for creating Figure 1.

² Technically, the graph shows a simple average of K% of all pitchers included in our sample, rather than (total league strikeouts) / (total league PA).
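To make the distinction in this footnote concrete, here is a small sketch with two invented pitchers. The function names and numbers are hypothetical. The simple average weights every pitcher equally, while the pooled rate weights each pitcher by his plate appearances.

```python
def simple_mean_k_pct(pitchers):
    """Unweighted mean of individual K% values (the quantity plotted in Figure 3)."""
    return sum(so / pa for so, pa in pitchers) / len(pitchers)


def pooled_k_pct(pitchers):
    """(total strikeouts) / (total plate appearances) across the same pitchers."""
    total_so = sum(so for so, _ in pitchers)
    total_pa = sum(pa for _, pa in pitchers)
    return total_so / total_pa


# Invented (strikeouts, plate appearances) pairs, NOT real data.
toy_pitchers = [(200, 800), (50, 250)]
simple = simple_mean_k_pct(toy_pitchers)  # (0.25 + 0.20) / 2 = 0.225
pooled = pooled_k_pct(toy_pitchers)       # 250 / 1050, about 0.238
```

The two measures diverge whenever workload and strikeout rate are related, as they are in this toy example.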

³ We switch the dependent variable from RA9 to ERA for ease of data access. We do not believe this materially affects our findings.

⁴ To ensure that 2018 is not somehow an outlier, we check the residuals of several other years. In each case we find no pattern.

3 Comments


Russell Eassom

6 years ago

Great piece, Jennifer. Have you read this piece by James on the push/pull effect of strikeouts? I think it goes together very well with your piece.

https://www.billjamesonline.com/the_strikeout_pushpull_effect/?AuthorId=3

Cool Cool Cool

One possible route for further analysis:

Since your hypothesis is focused on measuring the impact of a change in behavior (increasing strikeouts), you could group pitchers into two categories, those that changed behavior vs those that stayed the same*. Then compare the correlations between K% and RA9 between these groups. If we see a lower correlation among the group that changed compared to the group that didn’t change, it may provide some evidence for your hypothesis.

*How to determine those groups is a potential challenge. The simplest way would be based on change in K% between the pre/post time periods. I think this would be a good starting point, but is ultimately flawed since K% isn’t actually a measure of behavior, it is measuring an end result of the change in behavior (or lack thereof). The better way to create the desired buckets would be to identify some underlying metric(s) that are closer to measuring behavior. There’s an endless number of ways one could go about this, but the first idea that came to my mind was to look at change in pitch selection towards pitches with higher swinging strike %.

kenai kings

Reading this article allowed me two moments of reflection. 1) Simply put, hitters are swinging for the fences more often (that includes a more upper-cut swing). 2) Devers crushing a Chapman fastball he had no right to take out of the park.
