A Regional View of the MiLB Housing Crisis

Like millions across the country, minor league players are facing a housing crisis. The practice of using host families to house prospects was put on hold due to the pandemic, leaving players responsible for obtaining their own housing. Things have not gone well. While stories have come to light bit-by-bit, team-by-team, a piece last month by Brittany Ghiroli of The Athletic is one of the more comprehensive looks at the minor league housing crisis to date.

Ghiroli’s story details a number of ways in which minor league players get squeezed by housing, all of which is best summed up by this quote from catcher Caleb Joseph: “Finding a place to put your head at night is the hardest, most stressful thing to do as a minor leaguer.” Joseph would know, as he slept in his team’s clubhouse one year to save on housing.

The comments by Joseph, who spent 2014-2020 in the majors, also underscore that while the situation with host families is specific to this season, housing has long been an issue for minor leaguers. But in light of Ghiroli’s piece and the amount of reporting on this issue recently, I was interested in putting some numbers to the stories players have shared, particularly since housing costs can vary greatly from market to market and minor league teams are scattered across the country. Read the rest of this entry »


James McCann Has Lost His Progress

For Mets catcher James McCann, 2019 represented a career-altering triumph over the struggles that had plagued him through his first five big-league seasons in Detroit. With the Tigers, McCann’s abject lack of success at the plate led him to yo-yo between batting stances and approaches. In June 2016, he replaced his leg kick with a quieter front-foot step, struck out in a career-high 29.2% of his at-bats, and tweaked his stance again in the offseason. McCann closed the book on his rookie contract with a 2018 season from hell: an abysmal triple-slash of .220/.267/.314 and a wRC+ of 56, good for second-worst among all hitters with at least 450 plate appearances.

2018 Worst wRC+ (450+ PA)
Player             PA    wRC+
Chris Davis        522   46
James McCann       457   56
Alcides Escobar    531   59
Scott Kingery      484   61
Billy Hamilton     556   68
JaCoby Jones       467   68
Adam Engel         463   68
Wilmer Difo        456   71
Jonathan Lucroy    454   72
Victor Martinez    508   73

Things opened up (literally) for McCann in Chicago. After signing a one-year, $2.5 million “prove it” deal with the White Sox, McCann’s most radical tweak struck gold. Opening his stance and bringing his hands closer to load unlocked an entirely different hitter in the once-struggling backstop. McCann became a legit power threat, popping 18 homers in just 118 games, and his wRC+ jumped to 108, placing him eighth among all catchers with at least 300 PAs. He continued this trend in 2020’s short-season madness: a slash of .289/.360/.460, a wRC+ within the top-40 of all hitters with as many at-bats, and even a positive grade as a framer. Read the rest of this entry »


Using Clustering To Generate Bullpen Matchups

In today’s game, reliever usage may be more important than ever. As starters go less deep into games, more emphasis is placed on bullpen strategy to survive the mid-to-late innings. Teams can use data to streamline this process, strategizing relief pitcher usage based on their pitch repertoires and batter ability. My goal is to produce a matchup tool that can potentially give us some insight as to how the big league teams “play the matchups.”

The basis of a bullpen matchup recommender will be at the pitch level: what types of pitches does a particular hitter struggle against, and how do they align with what a particular pitcher throws? To do this, I will first use clustering methods in order to redefine pitcher arsenals based on pitch flight characteristics. Matchups will then be selected according to which pitcher is expected to perform the best against a given batter, optimizing pitcher strengths against batter weaknesses.
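As a rough illustration of the clustering step described above, here is a minimal k-means sketch in plain Python. It is not the author's actual model: the choice of k-means, the farthest-first starting centroids, the three features used, and every pitch value in `PITCHES` are assumptions made up for this example. The idea is only to show how pitches can be regrouped by flight characteristics rather than by their labeled pitch type.

```python
from statistics import mean, pstdev

# Invented pitches for illustration only:
# (horizontal movement in feet, vertical movement in feet, effective speed in mph).
PITCHES = [
    (-0.70, 1.40, 95.1),
    (-0.62, 1.52, 94.3),
    (-0.75, 1.35, 96.0),
    (0.35, 0.10, 85.2),
    (0.42, -0.05, 84.1),
    (0.30, 0.15, 86.0),
]


def standardize(rows):
    """Z-score each column so speed (mph) does not swamp movement (feet)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all picks."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain k-means; returns one cluster label per input point."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each pitch joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(mean(col) for col in zip(*members)) if members else centroids[c])
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels


labels = kmeans(standardize(PITCHES), k=2)
print(labels)
```

With this toy data, the three hard, rising pitches and the three slower, flatter pitches end up in separate clusters. A real arsenal would need more features, far more pitches, and some principled way of choosing the number of clusters.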

Data

To conduct this research I used available Statcast data from 2016-2021 (through this year’s trade deadline). My variables of interest are as follows: pitch location (plate_x & plate_z), perceived pitch speed derived from release extension (effective_speed), pitch movement (pfx_x & pfx_z), spin rate (release_spin_rate), and the newly introduced spin axis (spin_axis). I elected to include spin axis in order to account for how the batter may see the pitch as it’s released. All in all, the variables selected measure the stuff and location of each pitch so that we may classify them more accurately beyond the basic pitch type labels. After cleaning this dataset and removing outliers, I was ready to move on to the modeling process. Read the rest of this entry »


When 1 + 1 Doesn’t Equal 2

By Bryan Woolley, JP Wong, and Nick Skiera.

Baseball, like all sports, is exciting because of the concept of variance. No team scores the exact same number of runs every game. That is why the Dodgers (5.82 runs/game) were not 60-0 in 2020. Runs per game strongly correlates with winning percentage for obvious reasons, but a team’s variance (essentially their consistency) plays a crucial role in their ability to win baseball games.

Relating to this, we came across an interesting game theory concept. Given certain properties of the run-scoring distributions, the competitor with the lower output can increase their win probability by increasing the variance in their output. Conversely, the competitor with the higher output can increase their win probability by decreasing the variance in their output. Were this to apply to baseball, lower-scoring teams could win more games by becoming more inconsistent. Of course this is all just in theory, so the requirements for it to be relevant in reality to baseball might not be met.
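To make the idea concrete, here is a small sketch under a strong simplifying assumption that is ours, not necessarily the authors': each competitor's output is an independent normal random variable. In that case the difference of the two outputs is also normal, and the probability that A outscores B is Φ((μ_A − μ_B) / √(σ_A² + σ_B²)). Real run scoring is discrete and skewed, so the numbers below (all invented) only illustrate the direction of the effect.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def win_probability(mean_a, sd_a, mean_b, sd_b):
    """P(A outscores B), assuming independent normal outputs (a simplification)."""
    return normal_cdf((mean_a - mean_b) / sqrt(sd_a ** 2 + sd_b ** 2))


# Hypothetical underdog (4.0 runs/game) against a favorite (5.0 runs/game, sd 3.0).
steady_underdog = win_probability(4.0, 2.0, 5.0, 3.0)
streaky_underdog = win_probability(4.0, 4.0, 5.0, 3.0)

# The same favorite, viewed from its side, against the steady underdog.
streaky_favorite = win_probability(5.0, 3.0, 4.0, 2.0)
steady_favorite = win_probability(5.0, 2.0, 4.0, 2.0)

print(f"underdog wins: steady {steady_underdog:.3f}, streaky {streaky_underdog:.3f}")
print(f"favorite wins: streaky {streaky_favorite:.3f}, steady {steady_favorite:.3f}")
```

Under these assumptions the underdog's chances rise as its own spread grows, and the favorite's chances rise as its own spread shrinks, because the sign of the mean gap decides which way a larger denominator pushes the probability.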

We will examine the importance of variance in baseball both to test the theory and to attempt to uncover interesting trends in the sport. In our analysis we find that variance plays a significant role in a team’s success, suggesting that roster and lineup construction can be optimized by going beyond mean production. So as our title proposes, 1 WAR + 1 WAR and 2 WAR might not always be worth the same amount to a team if they are produced with different consistencies. Read the rest of this entry »


Which Pitch Should Be Thrown Next?

There are few things I enjoy in baseball more than the pitcher vs. hitter dynamic. Everyone likes to see highlight plays like a great catch or a mammoth home run, but those plays are few and far between. I believe that the tension created in a drawn-out plate appearance is where baseball is most enjoyable. Every pitch is meaningful, and the strategy of the game is on full display. The pitcher is trying to decide the best way to get the hitter to produce an out and the hitter is doing everything he can to thwart the pitcher.

This dynamic of baseball has always fascinated me. I was curious how pitchers and catchers decided which pitch was correct to throw in a situation. There are plenty of tools available to them that were not readily available when I was a child, like heat maps made from pitch-tracking data, but they show results without the context of what previous pitches were thrown in the plate appearance. Heat maps provide useful data, but the real art of pitching is being able to set up a hitter to take advantage of their weaknesses. If a pitcher throws the same pitch in the same location every time, eventually the hitter is going to catch on and change his strategy accordingly. So which sequence of pitches is the most effective at retiring hitters? This is the question I attempted to answer with this article. Read the rest of this entry »


Looking at xK% and xBB% Using Statcast Zones

Mike Podhorzer has recently been publishing articles regarding his xK%, primarily using Baseball-Reference’s strike rates. I have been trying for a few years now to come up with my own using Statcast numbers, and you’d be surprised how little data is required to get a very close xK% and xBB% for both hitters and pitchers.

I love Mike’s work, but that xK% equation is a bit unwieldy, including seven inputs, two of which have a correlation of -.829. It also uses roughly 63% of the pitches seen by hitters by accounting for all strikes, in addition to three other variables on top of that. I wanted to come up with something simpler that didn’t require so much input and wasn’t so overly constrained. After all, if we throw so much data in there that we basically know the outcome before we start, what good is it going forward? To that end, I dove into Statcast to see what might work. Read the rest of this entry »


Looking at bWAR’s Defensive Adjustment for Pitcher WAR

In 2018, the Philadelphia Phillies defense combined for -93 DRS, costing the team over half a run per game compared to an average defense. Though the pitching staff combined for a 3.83 FIP (seventh in MLB), defensive mistakes led to a middling 4.49 runs against per game. Much of the rotation underperformed their peripheral statistics, with Nick Pivetta and Vince Velasquez each having ERAs that exceeded their FIPs by more than a run. Against this backdrop, Aaron Nola’s performance was miraculous: a 2.37 ERA (fourth in MLB) over 212.1 innings pitched. Nola’s elite run prevention despite Philadelphia’s historically bad defense resulted in 10.2 bWAR, the second-best single season among all currently active pitchers.

According to FanGraphs, however, Nola’s performance was more All-Star-worthy than historic, as his 3.01 FIP notched him just 5.5 fWAR. Perhaps this case merely illustrates why one should opt for fWAR over bWAR and FIP over ERA; however, I don’t believe that bWAR is beyond salvaging, and philosophically I believe that to assess a pitcher’s value to a team we must examine the entirety of his work rather than just balls not in play. What this case illustrates is the need to rethink bWAR’s defensive component, which assumes that all pitchers are equally affected by a team’s good or bad defensive performance. This is obviously not the case. Defense is capricious, and a few bad or great plays behind a pitcher can cost or save several runs.

Thanks to Statcast’s Outs Above Average, which allows us to isolate a defense’s performance behind a specific pitcher, we can develop a new defensive adjustment that avoids outlier performances like Nola’s while not simply ignoring quality of contact on balls in play. Read the rest of this entry »


Using Decision Trees To Classify Yu Darvish Pitch Types

Last year, I wrote a post which outlined the application of a K Nearest Neighbors algorithm to make pitch classifications. This post will be, in some ways, an extension of that as pitches will yet again be classified using a machine learning model. However, as one might have presumed given this post’s title, the learner of choice here will be a decision tree. Additionally, this time around, instead of classifying pitches thrown over the course of a single game I will aim to classify pitches thrown by a single pitcher over the course of an entire season.

What follows will be divided into three sections: a brief conceptual explanation of decision tree learners, a description of the data and steps taken to train the decision tree model of choice here, and finally a run-through of the model’s results. I am not an expert on machine learning, but I believe that this is an interesting exercise that (very, very basically) highlights a powerful model using interesting baseball data. The work to support this post was conducted in the scripting language R and with the direction of the book Machine Learning with R by Brett Lantz. Read the rest of this entry »


A Proposal for the “Veteran Player-Coach” Position

Our beloved pastime has a long history of over-the-hill veteran players serving important mentor roles around the game, but the primacy of the Competitive Balance Tax and the perpetual crush of roster spot competition and “efficiency” have rendered these players largely moot. Players like the 40-something version of Jason Giambi, a bench bat for years on the strength of his contributions to his team as a leader beyond just his metric value, have grown frightfully rare. It is sad to see that sort of quasi-player/coach fade to memory.

As I look over the U.S. Olympic roster, I see an awful lot of well-loved veterans who have lost a step over the years and, with that lost step, any serious hope of a consistent job under the new normal of roster construction. But I am convinced there remains value to the game of baseball to have players like Todd Frazier, Edwin Jackson, Scott Kazmir, and David Robertson around the sport beyond what they contribute to the back of the baseball card. A glance at the current free agent list reveals a small glut of other interesting, memorable players, such as Matt Kemp, Ryan Braun, Matt Wieters, and Neil Walker, to name a few. Read the rest of this entry »


Predicting wOBA Using Process-Based Statistics

When trying to determine a batter’s overall offensive value using a single statistic, one of the most popular metrics to use is weighted on-base average (wOBA). wOBA is calculated as a linear combination of “outcome” statistics (unintentional walks, hit-by-pitches, singles, doubles, triples, and home runs) divided by, essentially, the number of plate appearances.
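For readers who want to see that calculation spelled out, here is a minimal sketch. The denominator follows the commonly cited form AB + BB − IBB + SF + HBP. The weights in `ILLUSTRATIVE_WEIGHTS` are placeholders roughly in line with values published for a past season; the real weights are recalculated every year, so treat both the weights and the sample stat line as assumptions for illustration.

```python
# Placeholder linear weights (assumption): actual values change season to season.
ILLUSTRATIVE_WEIGHTS = {
    "ubb": 0.690,
    "hbp": 0.722,
    "single": 0.888,
    "double": 1.271,
    "triple": 1.616,
    "hr": 2.101,
}


def woba(ab, bb, ibb, hbp, sf, singles, doubles, triples, hr, weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted on-base average from counting stats; returns 0.0 for an empty line."""
    unintentional_walks = bb - ibb
    numerator = (
        weights["ubb"] * unintentional_walks
        + weights["hbp"] * hbp
        + weights["single"] * singles
        + weights["double"] * doubles
        + weights["triple"] * triples
        + weights["hr"] * hr
    )
    # "Essentially" plate appearances: intentional walks are excluded.
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator if denominator else 0.0


# Hypothetical season line, invented for this example.
example = woba(ab=500, bb=50, ibb=5, hbp=5, sf=5, singles=90, doubles=30, triples=3, hr=20)
print(f"{example:.3f}")
```

Passing the weights as a parameter keeps the function usable for any season once the appropriate values are looked up.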

With that being said, could one predict whether a given player’s wOBA will be above a certain threshold using “process” statistics such as plate discipline and batted ball parameters? In particular, if we know a player’s zone contact rate, chase rate, and average exit velocity, could we predict with any confidence whether that particular player’s wOBA will be above, say, .320?

Using Statcast data and a bit of machine learning, I have decided to train a shallow neural network to try to do just that. I’ll be using snapshots of the Jupyter Notebook throughout the analysis to try and make it a little easier to follow. Read the rest of this entry »