When 1 + 1 Doesn’t Equal 2

By Bryan Woolley, JP Wong, and Nick Skiera.

Baseball, like all sports, is exciting because of the concept of variance. No team scores the exact same number of runs every game. That is why the Dodgers (5.82 runs/game) were not 60-0 in 2020. Runs per game strongly correlates with winning percentage for obvious reasons, but a team’s variance (essentially their consistency) plays a crucial role in their ability to win baseball games.

Relating to this, we came across an interesting game theory concept. Given certain properties of the run-scoring distributions, the competitor with the lower output can increase their win probability by increasing the variance in their output. Conversely, the competitor with the higher output can increase their win probability by decreasing the variance in their output. Were this to apply to baseball, lower-scoring teams could win more games by becoming more inconsistent. Of course this is all just in theory, so the requirements for it to be relevant in reality to baseball might not be met.
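To make the variance claim concrete, here is a minimal sketch under a simplifying assumption that is not from the article: each team's runs in a game are treated as independent normal random variables, so the run differential is also normal and the win probability has a closed form. The function name and the means and standard deviations are hypothetical, chosen only for illustration; real run distributions are discrete and skewed, and ties are ignored.

```python
from math import sqrt
from statistics import NormalDist


def win_probability(mean_a, sd_a, mean_b, sd_b):
    """P(team A outscores team B), assuming independent normal run totals."""
    diff_mean = mean_a - mean_b
    diff_sd = sqrt(sd_a ** 2 + sd_b ** 2)
    # A wins when the run differential (A minus B) is above zero.
    return 1.0 - NormalDist(diff_mean, diff_sd).cdf(0.0)


# Hypothetical teams: the underdog averages 4 runs, the favorite 5.
underdog_steady = win_probability(4.0, 2.0, 5.0, 3.0)
underdog_streaky = win_probability(4.0, 4.0, 5.0, 3.0)
favorite_steady = win_probability(5.0, 2.0, 4.0, 3.0)
favorite_streaky = win_probability(5.0, 4.0, 4.0, 3.0)

print(f"underdog: steady {underdog_steady:.3f}, streaky {underdog_streaky:.3f}")
print(f"favorite: steady {favorite_steady:.3f}, streaky {favorite_streaky:.3f}")
```

With these made-up inputs, the underdog's win probability rises from about 0.39 to about 0.42 when its own variance increases, while the favorite's falls from about 0.61 to about 0.58 under the same change. Whether real run-scoring distributions behave this way is exactly the open question.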

We will examine the importance of variance in baseball both to test the theory and to attempt to uncover interesting trends in the sport. In our analysis we find that variance plays a significant role in a team’s success, suggesting that roster and lineup construction can be optimized by going beyond mean production. So as our title proposes, 1 WAR + 1 WAR and 2 WAR might not always be worth the same amount to a team if they are produced with different consistencies.


Which Pitch Should Be Thrown Next?

There are few things I enjoy in baseball more than the pitcher vs. hitter dynamic. Everyone likes to see highlight plays like a great catch or a mammoth home run, but those plays are few and far between. I believe that the tension created in a drawn-out plate appearance is where baseball is most enjoyable. Every pitch is meaningful, and the strategy of the game is on full display. The pitcher is trying to decide the best way to get the hitter to produce an out and the hitter is doing everything he can to thwart the pitcher.

This dynamic of baseball has always fascinated me. I was curious how pitchers and catchers decided which pitch was correct to throw in a situation. There are plenty of tools available to them that were not readily available when I was a child, like heat maps made from pitch-tracking data, but they show results without the context of what previous pitches were thrown in the plate appearance. Heat maps provide useful data, but the real art of pitching is being able to set up a hitter to take advantage of their weaknesses. If a pitcher throws the same pitch in the same location every time, eventually the hitter is going to catch on and change his strategy accordingly. So which sequence of pitches is the most effective at retiring hitters? This is the question I attempted to answer with this article.


Looking at xK% and xBB% Using StatCast Zones

Mike Podhorzer has recently been publishing articles regarding his xK%, primarily using Baseball-Reference’s strike rates. I have been trying for a few years now to come up with my own using StatCast numbers, and you’d be surprised how little data is required to get a very close xK% and xBB% for both hitters and pitchers.

I love Mike’s work, but that xK% equation is a bit unwieldy, including seven inputs, two of which have a correlation of -.829. It also uses roughly 63% of the pitches seen by hitters by accounting for all strikes, in addition to three other variables on top of that. I wanted to come up with something simpler that didn’t require so much input and wasn’t so overly constrained. After all, if we throw so much data in there that we basically know the outcome before we start, what good is it going forward? To that end, I dove into StatCast to see what might work.


Looking at bWAR’s Defensive Adjustment for Pitcher WAR

In 2018, the Philadelphia Phillies defense combined for -93 DRS, costing the team over half a run per game compared to an average defense. Though the pitching staff combined for a 3.83 FIP (seventh in MLB), defensive mistakes led to a middling 4.49 runs against per game. Much of the rotation underperformed their peripheral statistics, with Nick Pivetta and Vince Velasquez each having ERAs that exceeded their FIPs by more than a run. Against this backdrop, Aaron Nola’s performance was miraculous: a 2.37 ERA (fourth in MLB) over 212.1 innings pitched. Nola’s elite run prevention despite Philadelphia’s historically bad defense resulted in 10.2 bWAR, the second-best single season among all currently active pitchers.

According to FanGraphs, however, Nola’s performance was more All-Star-worthy than historic, as his 3.01 FIP notched him just 5.5 fWAR. Perhaps this case merely illustrates why one should opt for fWAR over bWAR and FIP over ERA; however, I don’t believe that bWAR is beyond salvaging, and philosophically I believe that to assess a pitcher’s value to a team we must examine the entirety of his work rather than just balls not in play. What this case illustrates is the need to rethink bWAR’s defensive component, which assumes that all pitchers are equally affected by a team’s good or bad defensive performance. This is obviously not the case. Defense is capricious, and a few bad or great plays behind a pitcher can cost or save several runs.

Thanks to Statcast’s Outs Above Average, which allows us to isolate a defense’s performance behind a specific pitcher, we can develop a new defensive adjustment that avoids outlier performances like Nola’s while not simply ignoring quality of contact on balls in play.


Using Decision Trees To Classify Yu Darvish Pitch Types

Last year, I wrote a post which outlined the application of a K Nearest Neighbors algorithm to make pitch classifications. This post will be, in some ways, an extension of that as pitches will yet again be classified using a machine learning model. However, as one might have presumed given this post’s title, the learner of choice here will be a decision tree. Additionally, this time around, instead of classifying pitches thrown over the course of a single game I will aim to classify pitches thrown by a single pitcher over the course of an entire season.

What follows will be divided into three sections: a brief conceptual explanation of decision tree learners, a description of the data and steps taken to train the decision tree model of choice here, and finally a run-through of the model’s results. I am not an expert on machine learning, but I believe that this is an interesting exercise that (very, very basically) highlights a powerful model using interesting baseball data. The work to support this post was conducted in the scripting language R and with the direction of the book Machine Learning with R by Brett Lantz.


A Proposal for the “Veteran Player-Coach” Position

Our beloved pastime has a long history of over-the-hill veteran players serving important mentor roles around the game, but the primacy of the Competitive Balance Tax and the perpetual crush of roster spot competition and “efficiency” has rendered these players largely moot. Players like the 40-something version of Jason Giambi, a bench bat for years on the strength of his contributions to his team as a leader beyond just his metric value, have grown frightfully rare. It is sad to see that sort of quasi-player/coach fade to memory.

As I look over the U.S. Olympic Roster, I see an awful lot of well-loved veterans who have lost a step over the years and, with that lost step, any serious hope of a consistent job under the new normal of roster construction. But I am convinced there remains value to the game of baseball to have players like Todd Frazier, Edwin Jackson, Scott Kazmir, and David Robertson around the sport beyond what they contribute to the back of the baseball card. A glance at the current free agent list reveals a small glut of other interesting, memorable players, such as Matt Kemp, Ryan Braun, Matt Wieters, and Neil Walker, to name a few.


Predicting wOBA Using Process-Based Statistics

When trying to determine a batter’s overall offensive value using a single statistic, one of the most popular metrics to use is weighted on-base average (wOBA). wOBA is calculated as a ratio of a linear combination of “outcome” statistics (unintentional walks, hit-by-pitches, singles, doubles, triples, and home runs) divided by, essentially, the number of plate appearances.
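As a rough illustration of that ratio, the sketch below computes wOBA for a single made-up batting line. The linear weights are approximate, illustrative values (the real weights are recalculated every season), and the `BattingLine` type and the sample numbers are hypothetical. The denominator used here is one common form of "essentially plate appearances": at-bats plus unintentional walks, sacrifice flies, and hit-by-pitches.

```python
from dataclasses import dataclass

# Illustrative linear weights only; actual wOBA weights change by season.
WEIGHTS = {
    "ubb": 0.69,
    "hbp": 0.72,
    "single": 0.89,
    "double": 1.27,
    "triple": 1.62,
    "hr": 2.10,
}


@dataclass
class BattingLine:
    ab: int
    bb: int
    ibb: int
    hbp: int
    sf: int
    singles: int
    doubles: int
    triples: int
    hr: int


def woba(line: BattingLine) -> float:
    """Weighted on-base average: weighted outcomes over (roughly) plate appearances."""
    unintentional_bb = line.bb - line.ibb
    numerator = (
        WEIGHTS["ubb"] * unintentional_bb
        + WEIGHTS["hbp"] * line.hbp
        + WEIGHTS["single"] * line.singles
        + WEIGHTS["double"] * line.doubles
        + WEIGHTS["triple"] * line.triples
        + WEIGHTS["hr"] * line.hr
    )
    denominator = line.ab + unintentional_bb + line.sf + line.hbp
    return numerator / denominator


# Hypothetical season line, not a real player.
sample = BattingLine(ab=500, bb=60, ibb=5, hbp=5, sf=5,
                     singles=90, doubles=30, triples=3, hr=25)
sample_woba = woba(sample)
print(f"wOBA = {sample_woba:.3f}")
```

For this made-up line the result is roughly .384.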

With that being said, could one predict whether a given player’s wOBA will be above a certain threshold using “process” statistics such as plate discipline and batted ball parameters? In particular, if we know a player’s zone contact rate, chase rate, and average exit velocity, could we predict with any confidence whether that particular player’s wOBA will be above, say, .320?

Using Statcast data and a bit of machine learning, I have decided to train a shallow neural network to try to do just that. I’ll be using snapshots of the Jupyter Notebook throughout the analysis to try and make it a little easier to follow.


Thinking About My Baby: Does Paternity Leave Affect Performance?

As an ardent follower of the Baltimore Orioles, I’ve experienced a lot of bad baseball over the past few years, and one specific bit of bad baseball caught my eye recently. On April 5th, Shawn Armstrong returned from the paternity list after his wife gave birth just a few days before. He was continually demolished over the next week, giving up six earned runs in two innings of relief. He wasn’t getting too unlucky either, even if his FIP (20.12) was below his ERA (27.00).

As the parent of a three-year-old, I thought back to my first week back at work following a month-long paternity leave. I was distracted, tired, and couldn’t wait to get home at the end of the day. Of course Armstrong got lit up; he had just become a dad a few days before! Maybe professional athletes aren’t staying up all night changing diapers, but it still seems like they would perform worse after a trip to the paternity list as they reorient their lives. Is that true, though? Do athletes perform worse after returning from the paternity list?

Fortunately, Baseball Prospectus tracks all paternity leave going back to MLB’s implementation of the policy in 2011. Instead of parsing through every season of data, I just focused on the most recent ones: 2017-2020. This still provided 115 different trips to the paternity leave list, enough to give an idea of trends and differences. I separated these individuals into pitchers (62) and hitters (53) to make for easier comparison. For hitters, I used wRC+ as my key metric and tracked it across 7-day, 14-day, and full season time frames. For pitchers, I used ERA and FIP and tracked those across the same time frames.

There are a few quick caveats. Occasionally players will make an appearance before promptly being demoted or ending up on the injured list. I’ve kept the same number for the 7- and 14-day span even if the player didn’t make an appearance during that time frame. This only accounts for six players (five pitchers and a hitter), a fairly small amount of the sample.

Pitchers Returning From Paternity Leave
Time Frame     ERA            Mean Added ERA to Total     FIP            Mean Added FIP to Total
7 Days         5.7604021      .09293                      4.706660804    .07591
14 Days        4.667755914    .07529                      4.446168986    .071712
Full Season    3.922911761    .06327                      3.99527033     .063349

The impact on hitters isn’t quite as noticeable, and there is no clear trend. Hitters’ wRC+ is actually higher within seven days of a paternity list visit than over the full season. wRC+ in the 14 days after a paternity list visit is about 6 points below the full-season figure (90.86 vs. 96.78), but it’s hard to see how this squares with the 7-day performance. None of the differences are statistically significant at the 95% confidence level.

Hitters Returning From Paternity Leave
Time Frame     wRC+     Mean Added wRC+ to Total
7 Days         98.58    1.86
14 Days        90.86    1.71
Full Season    96.78    1.83

In summary, a trip to the paternity list doesn’t seem to have much of an impact on players. Maybe Shawn Armstrong was pitching badly just because he’s a bad pitcher; there’s a reason he’s since been DFA’d and passed through waivers. The performance for pitchers does still pique my interest, as there is a consistent trend when looking at 7-day, 14-day, and full-season performance across ERA and FIP. Despite this, the only statistically significant difference is between 7-day ERA and full-season ERA, far from anything conclusive.

The small sample (263.2 innings) and other confounding variables leave it far from conclusive in any direction for pitchers, especially given the other t-test results. It does appear, however, interesting enough to look at in a larger sample. A future quantitative analysis incorporating additional years of data may be able to provide more comprehensive answers. It may also be an area where qualitative research can provide answers on the impact of pitcher preparation, stamina, and overall performance.
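For readers who want to try this kind of comparison on a larger sample, here is a minimal sketch of a paired t-test, which is one reasonable way to compare each pitcher's post-leave ERA with his own full-season ERA. The post does not say which form of t-test was used, so the paired design is an assumption, and the five ERA pairs below are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(after, baseline):
    """t statistic for the mean per-player difference (after minus baseline)."""
    if len(after) != len(baseline):
        raise ValueError("each player needs one value in each list")
    diffs = [a - b for a, b in zip(after, baseline)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


# Hypothetical ERAs for five pitchers, not values from the article.
seven_day_era = [6.0, 4.5, 9.0, 3.0, 5.5]
full_season_era = [4.0, 4.0, 5.0, 3.5, 4.5]

t_stat = paired_t_statistic(seven_day_era, full_season_era)

# Two-sided 5% critical value of Student's t with n - 1 = 4 degrees of freedom.
CRITICAL_T_DF4 = 2.776
is_significant = abs(t_stat) > CRITICAL_T_DF4
print(f"t = {t_stat:.2f}, significant at the 5% level: {is_significant}")
```

In this toy sample the average 7-day ERA is 1.4 runs higher, yet the t statistic (about 1.83) falls short of the critical value, which shows how a visible gap can still fail a significance test when the sample is small and noisy.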


Using the Toxicological Prioritization Index To Visualize Baseball

Major League Baseball is awash in advanced statistics that more reliably describe key aspects of players’ offensive and defensive performance. It has been reported that through the use of Statcast, the MLB Advanced Media group can supply teams with 70 fields x 1.5 billion rows of data per season [i]. Yes, billion with a b. This flood of information has supercharged MLB teams’ and the sabermetric community’s development of ever-more useful statistics for describing player performance.

However, this amount of data brings significant challenges. Perhaps chief among them is that while certain individuals may be comfortable with reams of tables and ever-increasing numbers of descriptive statistics, many others prefer or require analyses and visualization tools that convert disparate metrics into informative and readily interpretable graphics.

MLB’s situation has certain similarities to the discipline of safety toxicology, where the use of high-information content assays for characterizing chemicals’ toxicological profiles has exploded [ii]. Drawing conclusions from multiple biomarkers and test systems is challenging, as it requires synthesis of large amounts of dissimilar data sets. One tool that toxicologists have found useful is the Toxicological Prioritization Index, or ToxPi for short [iii]. ToxPi is an analytical software package that was developed to combine multiple sources of evidence by transforming data into integrated, visual profiles.


What if the Mound Was Moved Back?

Moving the mound back is a proposed solution to the ever-increasing rate of strikeouts in the modern game of baseball. The effect of moving the mound back one foot will be tested in the Atlantic League from August this year. Without the results of this test, we don’t know much about how this rule change could affect the delicate balance between pitchers and hitters. There are many unknowns such as:

  • How much will the perceived velocity decrease benefit hitters?
  • Will the added break on pitches benefit pitchers?
  • Will throwing a further distance add injury risk or cause a loss of pitcher control?
  • Will batters change their approach if it is easier to make contact?
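The first question can be given a rough scale with back-of-the-envelope arithmetic. The sketch below is not the author's pitch-outcome model; it assumes the ball travels at constant speed (ignoring drag) and that the pitcher releases it about 6 feet in front of the rubber, so roughly 54.5 feet from the plate given the standard 60.5-foot distance. The release extension and the function names are hypothetical.

```python
MPH_TO_FPS = 5280 / 3600  # feet per second in one mile per hour


def flight_time(distance_ft, velocity_mph):
    """Seconds for a pitch to cover distance_ft at a constant velocity."""
    return distance_ft / (velocity_mph * MPH_TO_FPS)


def equivalent_velocity(velocity_mph, current_distance_ft, new_distance_ft):
    """Velocity at the current distance that gives the hitter the same
    flight time as velocity_mph thrown from the new distance."""
    return velocity_mph * current_distance_ft / new_distance_ft


# Assumed release distances: 60.5 ft minus about 6 ft of extension, then one foot farther.
current_release_ft = 54.5
moved_release_ft = 55.5
pitch_mph = 95.0

old_time = flight_time(current_release_ft, pitch_mph)
new_time = flight_time(moved_release_ft, pitch_mph)
perceived_mph = equivalent_velocity(pitch_mph, current_release_ft, moved_release_ft)

print(f"extra reaction time: {(new_time - old_time) * 1000:.1f} ms")
print(f"a 95 mph pitch would look like about {perceived_mph:.1f} mph")
```

Under these assumptions a 95 mph fastball gains the hitter about 7 milliseconds and plays like a pitch of roughly 93.3 mph from the current distance. The effects on pitch movement and command are not captured by this arithmetic at all.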

In this article I aim to use my model of predicted pitch outcomes to investigate how moving the mound back may change the game. I’ve written previously about modeling the deadened baseball and I shall take a similar approach here.