Archive for Research

Maybe It’s Better To Never Swing at Shane Bieber’s Pitches

You don’t need me to tell you how effective Shane Bieber was in 2020. He led the majors in ERA, FIP, K/9, overall strikeouts, and of course was the unanimous winner of the AL Cy Young Award. The underlying pitch-tracking data all back up the quality of his skillset. He’s very good. So you’re probably wondering how this all jibes with a title suggesting it may be better for hitters to not swing at Bieber’s pitches, right?

I’ll start with this: Bieber’s 34% zone rate ranks 316th out of 323 pitchers who threw a minimum of 20 innings in 2020. That’s dead last among qualified starters. How is this possible? The simple answer is that, once again, he’s very good. The slightly less simple answer is that batters swing at unhittable pitches and don’t swing at hittable pitches. Bieber throws almost twice as many pitches out of the zone as he throws in the zone, so what if hitters just stopped swinging at his offerings? Surely he would just change his approach if a batter didn’t swing at his pitches, right? Read the rest of this entry »
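As a quick check on the "almost twice as many" claim, here is a minimal Python sketch that uses only the 34% zone rate and the 316th-of-323 ranking quoted above. The function and variable names are my own.

```python
def out_to_in_ratio(zone_rate):
    """Pitches thrown outside the zone per pitch thrown inside it."""
    if not 0 < zone_rate < 1:
        raise ValueError("zone_rate must be strictly between 0 and 1")
    return (1.0 - zone_rate) / zone_rate


zone_rate = 0.34                      # Bieber's 2020 zone rate from the text
ratio = out_to_in_ratio(zone_rate)    # roughly 1.94
pitchers_below = 323 - 316            # pitchers in the sample with a lower rank

print(f"{ratio:.2f} pitches out of the zone for every pitch in it")
print(f"{pitchers_below} of 323 pitchers ranked lower")
```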


Calculating the Odds of Mike Brosseau’s Magic Moment

After watching the great matchup between the Yankees and Rays in the 2020 ALDS, including Mike Brosseau’s epic at-bat against Aroldis Chapman in the deciding Game 5 of that series, I couldn’t help but take a look at the characteristics of the pitch he hit. Chapman is known for having one of the best fastballs in the game and has a long track record of success as a closer. After battling back from 0-2, on the 10th pitch of the at-bat, Brosseau hit a 100.2-mph fastball thrown with 2,386 rpm of spin and 7.4 feet of extension over the left-field wall, allowing the Rays to advance to the ALCS.

This pitch was 6.9 mph, 80 rpm, and 1.1 feet above the average velocity, spin rate, and extension for four-seam fastballs in 2020. Given the same location, if the pitch had been a little faster, had more spin, or been released even closer to home plate, would the result have changed? The aim of this article is to create a model to estimate what the chances were of Mike Brosseau hitting that home run.

Using Baseball Savant and its wealth of Statcast data and more typical statistics, we can select all the four-seam fastballs thrown in 2020 and their related metrics. The data was cleaned for missing values, four-seam fastballs thrown by position players, eephus pitches, and four-seamers that may have been mislabeled as sliders or changeups. For the latter category, a minimum velocity of 87 mph was used to remove these potential label errors, and pitches with negative pfx_z values were removed as four-seam fastballs are expected to drop less relative to gravity. For pfx_x, the absolute value of the given value was used, as I want to look at the magnitude of the horizontal break as opposed to which side of the plate the movement is going towards. Read the rest of this entry »
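The cleaning rules described above can be sketched in plain Python. This is a rough illustration, not the author's code. The dictionary keys follow Statcast column names, but treating the data as a list of dicts is my assumption, and the sample rows are invented except for the velocity, spin, and extension of the Brosseau pitch. How position players and eephus pitches were identified is not stated, so only the velocity floor, the pfx_z filter, the missing-value check, and the pfx_x absolute value are shown.

```python
REQUIRED = ("release_speed", "release_spin_rate",
            "release_extension", "pfx_x", "pfx_z")


def clean_four_seamers(pitches, min_velocity=87.0):
    """Apply the filters from the text to a list of pitch dicts."""
    cleaned = []
    for pitch in pitches:
        # Drop rows with a missing value in any field we rely on.
        if any(pitch.get(key) is None for key in REQUIRED):
            continue
        # Velocity floor aimed at mislabeled sliders and changeups.
        if pitch["release_speed"] < min_velocity:
            continue
        # Four-seamers are expected to have non-negative vertical movement.
        if pitch["pfx_z"] < 0:
            continue
        row = dict(pitch)
        # Keep only the magnitude of the horizontal break.
        row["pfx_x"] = abs(row["pfx_x"])
        cleaned.append(row)
    return cleaned


# Invented sample rows (pfx values are placeholders).
sample = [
    {"release_speed": 100.2, "release_spin_rate": 2386,
     "release_extension": 7.4, "pfx_x": -0.8, "pfx_z": 1.5},
    {"release_speed": 84.0, "release_spin_rate": 2100,
     "release_extension": 6.0, "pfx_x": 0.3, "pfx_z": 1.0},   # too slow
    {"release_speed": 93.0, "release_spin_rate": 2300,
     "release_extension": 6.3, "pfx_x": 0.5, "pfx_z": -0.2},  # negative pfx_z
    {"release_speed": 95.0, "release_spin_rate": None,
     "release_extension": 6.5, "pfx_x": 0.4, "pfx_z": 1.2},   # missing spin
]
cleaned = clean_four_seamers(sample)
print(len(cleaned), "pitch(es) kept")
```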


Stars or Depth? What Is the Best Way To Build an MLB Roster?

Building an MLB roster is anything but simple, to say the least.

It would be very convenient if it were as easy as playing MLB: The Show, but as we are well aware, there are many complexities to roster construction. Not only do organizations need to have high-end talent, but they also need to have 26 quality big-leaguers as well as other players in the pipeline for when adversity hits.

In a perfect world, teams would be able to have tons of star talent as well as intriguing depth. However, we do not live in a perfect world, and for that reason, teams need to adopt a specific strategy when it comes to building the best roster possible in the most efficient way imaginable.

Teams generally have two courses: will they prioritize star talent, or will they look to have as deep a team as possible? The first option is typically known as the “stars and scrubs” approach, and it is one that you often see in basketball. Meanwhile, the latter approach is one that you’ll see in sports with deeper rosters, primarily football. Overall, both methods are used frequently by teams, but it is unclear which one is more efficient when it comes to roster building.

What good is there to posing a problem if we aren’t going to find the answer for it? We need to dig deep into these two approaches! Should teams prioritize star talent even if it means their depth is lacking? Or is quantity more valuable than quality? Let us try to discover the answer to this critical question! Read the rest of this entry »


The Utility of “Going For It” in the Offseason

For fans of the 29 teams whose autumns aren’t highlighted by a World Series parade (in a normal year at least), the offseason is a time of equality, when every team is zero games back from a playoff spot and hope springs eternal. Front offices have four months to write checks and strike deals with the hope of blocking off the streets come November, or at least selling some tickets along the way. Baseball Twitter and internet forums everywhere are filled with catchphrases like “winning the offseason,” “making a splash,” and of course, “going for it.”

In a perfect world, every team would try its hardest and “go for it” every year, but in today’s MLB, no offseason is without a large swathe of teams sitting on their hands if not outright tanking. The merits of managing a team for the sake of the bottom line or stockpiling prospects for some future championship run can be debated ad nauseam, but the teams that deserve our attention are the ones that spend the winter months actively trying to improve their on-field products and win the whole damn thing.

But what exactly does it look like when a team decides to go for it? A simple look at which teams sign the most free agents could be a start, but a team that signs an army of relievers to minor league contracts shouldn’t be regarded as trying harder than a team that adds a pair of high-profile bats. New dollars committed might be a step closer, but one massive long-term contract would skew the results and heavily outweigh a team signing multiple short-term deals.

The best way, then, to judge to what extent a team “went for it” in an offseason would be to look at the perceived short-term value of the players added via trade or free agency compared to those who departed by those same avenues. Read the rest of this entry »
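One way to picture that comparison is as a simple net total: value coming in minus value going out. The sketch below assumes "perceived short-term value" can be stood in for by a single projected number per player (something like a next-season WAR projection). That choice, the player labels, and the numbers are all hypothetical, since the excerpt does not say which measure is used.

```python
def net_short_term_value(added, departed):
    """Projected value of players added minus that of players who left.

    Both arguments map a player label to a projected short-term value.
    """
    return sum(added.values()) - sum(departed.values())


# Hypothetical offseason for one team (placeholder numbers).
added = {"Free agent A": 3.0, "Trade acquisition B": 2.0}
departed = {"Departed free agent C": 1.5}

net = net_short_term_value(added, departed)
print(f"Net short-term value added: {net:+.1f}")
```

A positive total would suggest the team "went for it" and a negative one that it stepped back, under these assumptions.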


Studying Release Point Standard Deviation From Center

A few summers ago, Walker Buehler and the Los Angeles Dodgers came to Baltimore at the very end of the season. That night my buddy and I couldn’t figure out why the Dodgers, and the overwhelming mass of their fans in attendance, were so pumped about winning a single game in Baltimore. Once we saw staffers in ties and headsets running out with the “Division Champions” t-shirts, we realized what was going on.

Needless to say, Buehler was excellent, going 7 innings with 11 Ks and, because it was the 2019 Orioles, giving up no runs on four hits. During the game, while surrounded by very excited Dodgers fans, I mentioned that Buehler’s delivery seemed so efficient that his motion looked exactly the same every time he threw the ball. If you’ve ever worked on physical mechanics of any kind, be it baseball swings, golf swings, freestyle swim stroke, running stride, or maybe just proper form sitting at a desk to avoid that “work from home/pandemic backache,” you know how hard it can be to exactly replicate a motion over and over again. Buehler amazed us with his ability to do just that. We know that repetition in delivery mechanics leads to success in various forms, so with that in mind, the point of this analysis is to look at release point consistency and how that correlates with resulting pitching metrics. Read the rest of this entry »
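The excerpt does not define "standard deviation from center," so the following is only one plausible reading: treat each release point as a horizontal and vertical coordinate, find the average release point, and measure the root-mean-square distance of the pitches from it. The coordinates below are invented for illustration.

```python
import math


def release_point_spread(points):
    """Root-mean-square distance of (x, z) release points from their centroid."""
    if not points:
        raise ValueError("need at least one release point")
    n = len(points)
    center_x = sum(x for x, _ in points) / n
    center_z = sum(z for _, z in points) / n
    mean_sq_dist = sum((x - center_x) ** 2 + (z - center_z) ** 2
                       for x, z in points) / n
    return math.sqrt(mean_sq_dist)


# Invented release points in feet: (horizontal, vertical).
tight = [(-1.50, 5.80), (-1.52, 5.82), (-1.48, 5.79), (-1.51, 5.81)]
loose = [(-1.30, 5.60), (-1.70, 6.00), (-1.45, 5.95), (-1.60, 5.65)]

print(f"tight delivery spread: {release_point_spread(tight):.3f} ft")
print(f"loose delivery spread: {release_point_spread(loose):.3f} ft")
```

A smaller number means a more repeatable release point under this definition.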


Adjusting Batter Performance by the Quality of the Opposing Pitcher

In the 2020 season, American League MVP José Abreu faced 107 different pitchers, including the top four in Cy Young voting point totals (Shane Bieber, Trevor Bauer, Yu Darvish, and Kenta Maeda). Bauer was the only one of the four not to allow a home run to Abreu in 2020. In comparison, MVP runner-up José Ramírez faced 69 of the pitchers that Abreu faced. The third-place DJ LeMahieu faced a completely different set of pitchers, not a single one overlapping with Abreu’s.

While these batters were compared by their offensive production, it appears Abreu faced more challenging pitching. Using FanGraphs’s xFIP- (for which a lower number is better) as a measure of a pitcher’s quality, Abreu was up against a 96.75 xFIP- on average while LeMahieu faced pitchers with a 105.93 mark. Both LeMahieu’s weighted on-base average (wOBA) of .429 and Abreu’s .411 were exceptional, but is the 18-point difference truly reflective of the difference between the two players’ seasons?
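An average like "96.75 xFIP- faced" could be computed in a few ways. The sketch below weights each opposing pitcher's xFIP- by the number of plate appearances the batter had against him. That weighting is my assumption (the text only says "on average"), and the matchup numbers are made up.

```python
def average_xfip_minus_faced(matchups):
    """Plate-appearance-weighted mean xFIP- of the pitchers a batter faced.

    matchups: list of (plate_appearances, pitcher_xfip_minus) pairs.
    """
    total_pa = sum(pa for pa, _ in matchups)
    if total_pa == 0:
        raise ValueError("batter has no plate appearances")
    return sum(pa * xfip for pa, xfip in matchups) / total_pa


# Made-up matchups: (PA against the pitcher, that pitcher's xFIP-).
matchups = [(10, 80.0), (30, 100.0)]
faced = average_xfip_minus_faced(matchups)
print(f"Average xFIP- faced: {faced:.2f}")
```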

Overview

To answer the question, I derived a value with a similar intuition to Baseball Prospectus’s Deserved Run Average (DRA). DRA is a measure that adjusts a pitcher’s performance by the quality of the batters they are facing. This statistic also accounts for numerous context factors to attempt to better isolate the pitcher’s contribution. DRA shows that the quality of the batter can be influential in a pitcher’s performance, so it makes sense that the quality of pitcher is influential in a batter’s performance.

As for the statistic I will be working with, I choose to refer to this as “pitcher-adjusted weighted on-base average,” or pwOBA. The intuition is simple: a batter should get credit for offensive production against challenging pitching. The formula for pwOBA is based on the formula for wOBA. With wOBA, every event has a run value (e.g., 1.979 for home runs in 2020) and a batter gets credit for these values accumulated over the course of the season. The sum of these values is then divided by (AB + BB – IBB + SF + HBP). Read the rest of this entry »
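For reference, the plain wOBA calculation described above can be sketched as follows. Only the 1.979 home run value comes from the text. The other weights are rough placeholders I am supplying so the example runs, and the stat line is invented. The pitcher adjustment that turns wOBA into pwOBA is not shown, because the excerpt ends before describing it.

```python
def woba(counts, weights):
    """Weighted on-base average from a batter's season counts.

    The numerator sums run values over events; walks are credited as
    unintentional walks (BB - IBB). The denominator is
    AB + BB - IBB + SF + HBP, as given in the text.
    """
    events = dict(counts)
    events["uBB"] = counts["BB"] - counts["IBB"]
    numerator = sum(value * events.get(event, 0)
                    for event, value in weights.items())
    denominator = (counts["AB"] + counts["BB"] - counts["IBB"]
                   + counts["SF"] + counts["HBP"])
    if denominator == 0:
        raise ValueError("no qualifying plate appearances")
    return numerator / denominator


# HR value is from the text; every other weight is a placeholder.
weights = {"uBB": 0.7, "HBP": 0.7, "1B": 0.9,
           "2B": 1.25, "3B": 1.6, "HR": 1.979}

# Invented stat line.
counts = {"AB": 100, "BB": 10, "IBB": 1, "SF": 1, "HBP": 1,
          "1B": 20, "2B": 5, "3B": 1, "HR": 5}

result = woba(counts, weights)
print(f"wOBA: {result:.3f}")
```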


PitchingBot: Using Machine Learning To Understand What Makes a Good Pitch

People have always been looking to understand what makes a good pitch. With advances in pitch tracking technology and computing power, we can begin to use large amounts of data to answer this question more definitively. I’ve created a model called PitchingBot that uses machine learning to try to find what makes a good pitch.

Machine learning describes a general class of algorithms that are very flexible and “learn” patterns from large amounts of data. This means I don’t have to tell PitchingBot what I think a good pitch is, but instead I can give it a load of pitches (and the results of those pitches) and it will train itself to recognize a pitch that gives good results.

I intend to investigate a couple of key questions:

Does PitchingBot reach the same conclusions as conventional wisdom about what makes a good pitch?

Naively, I would expect a good pitch to have the following qualities: high velocity, plenty of movement, and good location in the corner of the strike zone. I will look at whether these are true for PitchingBot and how the definition of a good pitch changes with the ball/strike count.

Can we meaningfully compare and evaluate pitchers using PitchingBot?

Are the pitchers who are best according to PitchingBot those who get the best results? PitchingBot isn’t very useful if it does not agree with real pitcher performance. Read the rest of this entry »


Building a Hitting Prospect Projection Model

How well do you think you can predict the future of a minor leaguer? My computer may be able to help. Towards the end of the regular season, I found the prospects page at FanGraphs and started experimenting with it. I have always had a lot of fun thinking about the future and predicting outcomes, so I decided to try to build a model to predict whether or not a prospect would make it to the majors. I had all the data I needed thanks to FanGraphs, and I had recently been looking into similar models built by others to figure out how I could accomplish this project. I realized that all these articles I was reading detailed the results of their models, but not the code and behind-the-scenes work that goes into creating them.

With that in mind, I decided to figure it out on my own. I had a good idea of what statistics I wanted to use, but there were a few issues I needed to consider before I started throwing data around:

  1. Prospects can play multiple years at a single level.
  2. Not all prospects play at all levels of the minor leagues.
  3. What do I do with players who skipped levels?
  4. How can I make this model useful and practical?

Prospects playing multiple years at a single level isn’t too difficult to deal with because I can just aggregate the stats from those seasons. The fact that not all prospects play in every level of the minor leagues before reaching the majors is tough, however, because that makes for a lot of missing data that needs to be handled before building the model. I decided to replace all the missing values with the means of the existing data, and I created variables to indicate whether or not a player’s season stats for that particular level of the minor leagues were real. To make this model useful, I would want to take out certain variables. For example, I figured I wouldn’t need or want Triple-A stats included in the model because typically once a player has reached that level of the minors, you are more interested in how well they will do in the majors. Read the rest of this entry »
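The two data-preparation steps described above (combining repeated seasons at a level, and filling gaps with the mean plus an indicator of whether the value was real) can be sketched without any libraries. The field names, the choice of stats, and the sample values are all hypothetical.

```python
def aggregate_by_level(seasons):
    """Combine a player's repeated seasons at one level into a single row."""
    totals = {}
    for season in seasons:
        key = (season["player"], season["level"])
        row = totals.setdefault(key, {"PA": 0, "H": 0})
        row["PA"] += season["PA"]
        row["H"] += season["H"]
    return totals


def impute_with_flag(values):
    """Fill missing values (None) with the observed mean and flag real ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute without at least one observed value")
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    is_real = [v is not None for v in values]
    return filled, is_real


# Hypothetical prospect who spent two seasons at the same level.
seasons = [
    {"player": "Prospect A", "level": "A+", "PA": 200, "H": 50},
    {"player": "Prospect A", "level": "A+", "PA": 300, "H": 90},
]
totals = aggregate_by_level(seasons)

# A rate stat at one level for three prospects; None means the level was skipped.
filled, is_real = impute_with_flag([0.300, None, 0.400])
print(totals)
print(filled, is_real)
```

The indicator list lets a model tell an imputed league-average value apart from a season the player actually had.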


Applying Peta’s Wagering Methodology in 2020

For those unfamiliar with Joe Peta’s groundbreaking 2013 book Trading Bases, the author is a successful financial analyst and former Wall Street trader. After Peta was seriously injured in a traffic accident, his long and painful recovery included employing his professional skills to develop a baseball wagering methodology. His book is about more than that though, including observations about the 2008 economic meltdown and sports wagering writ large. Peta’s anecdotes alone make it worth the read: imagine being hit by a NYC ambulance and then being billed by the city for the ride to the hospital.

At its highest level, the Peta methodology is based on the utilization of a team’s previous season performance adjusted for cluster luck (a regression of OBP/SLG/ISO to arrive at “hits per run”) and WAR, as well as upcoming-season projected WAR. The resulting estimate of a team’s season win total is then used to identify and capitalize on inefficiencies between the model’s estimates and wagering lines.

Peta’s work produces two products: a season-long projection of wins (the long game) and the ability to handicap individual games through adjustments to each team’s lineup, starting pitcher, and home field. While conceptually straightforward, it is time-consuming to operate, requiring familiarity with Excel (particularly the ability to link sheets). In lieu of Peta’s regression calculation of cluster luck, I utilized FanGraphs’ calculation of BaseRuns, convinced of its utility as a proxy after reading a 2019 article at samkonmodels.com arguing it was one of a number of comparable and readily available such calculations. Read the rest of this entry »


The 3-0 Count Dilemma

While it might not appear so, baseball games constantly put economic thinking on display, particularly the mathematical models of game theory. Game theory shows up in many ways, but a classic example is the prisoner’s dilemma. Imagine a police officer is interrogating two suspects accused of robbing a bank together. The police officer has some evidence to put them in jail, but a confession would go a long way. Each suspect is contemplating confessing to the crime. If both suspects keep quiet, they will each receive five years in jail. If one suspect confesses and the other keeps quiet, the one who kept quiet will receive 20 years in jail while the suspect who confessed will receive just one year. If both confess, they each receive 10 years in jail. A choice that is best for a suspect no matter what the other one does is called a dominant strategy. A combination of decisions from which neither suspect can do better by changing only their own choice is called a Nash Equilibrium. By using game theory, we come to the conclusion that each suspect should confess to the crime, meaning they will each get 10 years in prison. I won’t go much into why this is the case, but feel free to research more about game theory and the Nash Equilibrium on your own.
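For readers who would like to see why confessing wins out, here is a small Python sketch that encodes the jail terms from the example and checks every combination of choices. The function names are my own, and fewer years in jail is treated as the better outcome.

```python
from itertools import product

ACTIONS = ("quiet", "confess")

# Years in jail for (suspect A, suspect B), keyed by (A's choice, B's choice).
YEARS = {
    ("quiet", "quiet"): (5, 5),
    ("quiet", "confess"): (20, 1),
    ("confess", "quiet"): (1, 20),
    ("confess", "confess"): (10, 10),
}


def dominant_action_for_a():
    """Return A's choice that gives fewer years whatever B does, if any."""
    for action in ACTIONS:
        others = [a for a in ACTIONS if a != action]
        if all(YEARS[(action, b)][0] < YEARS[(other, b)][0]
               for other in others for b in ACTIONS):
            return action
    return None


def pure_nash_equilibria():
    """Outcomes where neither suspect can cut their term by switching alone."""
    equilibria = []
    for a, b in product(ACTIONS, ACTIONS):
        a_cannot_improve = all(YEARS[(a, b)][0] <= YEARS[(alt, b)][0]
                               for alt in ACTIONS)
        b_cannot_improve = all(YEARS[(a, b)][1] <= YEARS[(a, alt)][1]
                               for alt in ACTIONS)
        if a_cannot_improve and b_cannot_improve:
            equilibria.append((a, b))
    return equilibria


print(dominant_action_for_a())
print(pure_nash_equilibria())
```

Confessing saves a suspect four years if the partner stays quiet (1 instead of 5) and ten years if the partner confesses (10 instead of 20), which is why both end up confessing even though both staying quiet would have been better for the pair.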

What does this have to do with baseball? We can think of each pitch as a game, with the pitcher and batter in the roles of the two suspects. Instead of confessing to a crime, the pitcher is contemplating throwing a ball in the strike zone while the batter is contemplating swinging. While the prisoner’s dilemma has a Nash Equilibrium in which each side always makes the same choice, a pitch to a batter does not, and the combination of decisions is constantly changing. If the batter’s strategy is to swing, then pitchers will throw more balls outside the batter’s reach. If the pitcher’s strategy is to throw a ball, then the batter will take more pitches.
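To illustrate why no single pairing of choices is stable here, the sketch below sets up a toy pitcher-versus-batter game and searches for an outcome that neither side would want to change. The payoffs are invented: +1 for the batter when he guesses right (swings at a strike or takes a ball) and -1 when he guesses wrong, with the pitcher's payoff being the opposite. Real run values would differ.

```python
from itertools import product

PITCHER = ("strike", "ball")
BATTER = ("swing", "take")

# Hypothetical batter payoffs; the pitcher's payoff is the negative of these.
BATTER_PAYOFF = {
    ("strike", "swing"): 1,
    ("strike", "take"): -1,
    ("ball", "swing"): -1,
    ("ball", "take"): 1,
}


def stable_outcomes():
    """Outcomes where neither player gains by changing only their own choice."""
    stable = []
    for pitch, swing in product(PITCHER, BATTER):
        value = BATTER_PAYOFF[(pitch, swing)]
        # The batter wants the payoff as high as possible.
        batter_content = all(value >= BATTER_PAYOFF[(pitch, alt)]
                             for alt in BATTER)
        # The pitcher wants the batter's payoff as low as possible.
        pitcher_content = all(value <= BATTER_PAYOFF[(alt, swing)]
                              for alt in PITCHER)
        if batter_content and pitcher_content:
            stable.append((pitch, swing))
    return stable


print(stable_outcomes())
```

With these payoffs the search comes back empty: whatever the current pairing, one of the two players would rather switch, which matches the back-and-forth described above.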

We could observe this thought process for every pitch thrown. However, let’s look at one situation: pitches thrown in 3-0 counts. If you are the batter, it might seem obvious to take the pitch. The worst-case scenario is you end up with a 3-1 count. If you are the pitcher, it might seem obvious to throw an easy strike. You do not want to walk the batter, and you know the batter doesn’t want to swing and risk giving you an easy popup to get out of a good count. So I guess the batter should take every pitch and the pitcher should throw the ball right down the middle every time. Read the rest of this entry »