Archive for Research

No Pitch Is an Island: Pitch Prediction With Sequence-to-Sequence Deep Learning

One of the signature dishes of baseball-related machine learning is pitch prediction, in which the analysis aims to predict what type of pitch will be thrown next in a game. The strategic advantage of knowing what a pitcher will throw beforehand is obvious from the lengths teams go to (both legal and illegal) to gain such information. Analysts who have tackled the problem with data have taken various approaches in the past, but here are some commonalities among them:

  • Supervised learning is incorporated with numerous variables (batter-handedness, count, inning, etc.) to fit models on training data, which are then used to make predictions on test data.
  • The models are fit on a pitcher-by-pitcher basis. That is, algorithms are applied to each pitcher individually to account for their unique tendencies and repertoire. Results are reported as an aggregate of all these individual models.
  • There is a minimum cut-off for the number of pitches thrown. For a pitcher’s work to be considered, he must have crossed that threshold.

An example can be found here. The goal of this study is not to reproduce or match those strong results, but to introduce a new, natural-fitting ingredient that can address their limitations. The most constraining restriction in other works is the sample size requirement; by only including pitchers with substantial histories, the scope of the pitch prediction task is drastically reduced. We hope to produce a model capable of making predictions for all pitchers regardless of their individual sample size. Read the rest of this entry »


The Effect of Fastball Velocity on the Slider

I’ve heard it said in the past that a batter should take care of the pitcher’s fastball first and then deal with the breaking ball. If this is true, then the faster the pitcher’s fastball is, the more the batter needs to be aware of the fastball when at the plate. I want to look at how this affects the most popular pitch in baseball: the slider.

First, I calculated the average fastball velocity of every pitcher who threw at least 100 fastballs (FF, FT, SI) in a given major league season from 2017 to 2021. Based on that average, I divided the pitchers into three groups: 143-148 km/h, 148-153 km/h, and 153-158 km/h. I then further divided each group according to the velocity and movement of the sliders thrown by its pitchers.
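The grouping step described above can be sketched in a few lines. This is only an illustration, not the author's code: the input layout (one tuple per pitch for a single season), the function name `fastball_groups`, and the choice to treat each velocity band as closed on the low end and open on the high end are all assumptions.

```python
from collections import defaultdict

FASTBALL_TYPES = {"FF", "FT", "SI"}   # four-seam, two-seam, sinker
MIN_FASTBALLS = 100                   # minimum sample from the text
# Velocity bands in km/h; treating each as [low, high) is an assumption.
BANDS = [(143, 148), (148, 153), (153, 158)]

def fastball_groups(pitches):
    """Map each qualifying pitcher to a velocity band.

    pitches: iterable of (pitcher, pitch_type, velocity_kmh) for one season
    (a hypothetical layout chosen for this sketch).
    """
    velocities = defaultdict(list)
    for pitcher, pitch_type, velocity in pitches:
        if pitch_type in FASTBALL_TYPES:
            velocities[pitcher].append(velocity)

    groups = {}
    for pitcher, values in velocities.items():
        if len(values) < MIN_FASTBALLS:
            continue  # too few fastballs to qualify
        average = sum(values) / len(values)
        for low, high in BANDS:
            if low <= average < high:
                groups[pitcher] = (low, high)
                break  # averages outside every band are left ungrouped
    return groups
```

Sliders are ignored when computing the average, so a pitcher's band depends only on his fastballs.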

Then I calculated the Run Value/100 for each group. Let’s start with the velocity group between 143 and 148 km/h. Read the rest of this entry »


Pitch Mix Variation and Ways to Measure It

Earlier this year, I took a hack at defining what I referred to as pitch mix variation. Pitch mix variation, as I conceived of it at least, would be a single number to capture how much any given pitcher mixes his offerings. A higher pitch mix variation (PMV) would indicate first that a pitcher has a relatively diverse mix of pitches and, second, throws each pitch roughly as much as any other. A lower PMV would indicate a pitcher has fewer pitches and relies on just one or maybe two of those the vast majority of the time.

Among other things, baseball types are quick to measure the quality of stuff, command, control, and the number of offerings of pitchers. That said, to my knowledge there doesn’t appear to be a standardized catch-all metric for how often those pitches are utilized. There also seems to be value in such a metric. For instance, a college starter might have a 3,000-rpm curveball that plays up in models, but if he doesn’t trust it and therefore throws it just ~5% of the time, that elite spin might somewhat belie long-term bullpen risk.

Put simply, a pitcher who throws a four-seamer, sinker, curveball, and changeup all 25% of the time is quite possibly tougher to square up than one who throws just a four-seam (80%) and curveball (20%), all else held equal.

However, this post isn’t about assigning value or finding an optimal PMV (surely that depends on the individual pitcher), but rather juxtaposing various potential measures. To that end, this post will include the following: (1) a recap of the original formula and logic I previously cobbled together, (2) an overview of two more formalized models for quantifying variation, and (3) a comparison of those three measures across several hundred pitchers in 2021. Read the rest of this entry »
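To make the idea of a single "mix" number concrete, here is a minimal sketch using Shannon entropy, a standard measure of how evenly a set of shares is spread. It is not necessarily the author's original formula or either of the formalized models mentioned above; the function names and the usage dictionaries are illustrative, with the two example mixes taken from the four-pitch and two-pitch pitchers described earlier.

```python
import math

def shannon_entropy(usage):
    """Entropy (in nats) of a pitch mix.

    usage: dict mapping pitch type -> share of pitches thrown,
    with shares summing to 1.
    """
    return -sum(share * math.log(share) for share in usage.values() if share > 0)

def effective_pitches(usage):
    """Entropy converted into an 'effective number of pitches'.

    Equals the pitch count when usage is perfectly even and shrinks
    toward 1 as a single pitch dominates.
    """
    return math.exp(shannon_entropy(usage))

even_mix = {"FF": 0.25, "SI": 0.25, "CU": 0.25, "CH": 0.25}
two_pitch = {"FF": 0.80, "CU": 0.20}

print(round(effective_pitches(even_mix), 2))   # 4.0
print(round(effective_pitches(two_pitch), 2))  # 1.65
```

Under this measure, the 80/20 pitcher behaves like someone with well under two evenly used pitches, which matches the intuition that a higher value should reward both a deeper repertoire and more balanced usage.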


Weighted Runs Batted In Efficiency

Imagine that throughout high school, teachers gave their favorite students easier tests than the rest of the class. The results would be clear: the majority of the favored students would come out with stronger scores. However, one would question whether those strong scores were the result of high intellect or of an easy test. Conversely, there would be other students who would still score well despite being given a difficult test. Now there’s an issue. If the teachers want to know which of the students know the material best, how should they figure it out? They can’t simply take the highest score, because they know the scores are not an accurate representation due to the skewed tests. This is the situation in which the RBI has put the baseball world.

When the RBI was first documented as an official statistic in 1920, the wording of the definition in Rule 86, Section 8 of the Official Baseball Rules was “The number of runs batted in by each batsman.” Although this definition was slightly vague, its intention was to quantify which batter is the best at batting in runs. For years, this statistic has been praised. The RBI is always one of the first statistics to be mentioned when summarizing a player’s year and career. The RBI is even a component of the most prestigious hitting honor, the Triple Crown. Despite its strong reputation, over the last few years it has become clear that the RBI doesn’t answer “Which batsman is the best at batting in runs?” The RBI only answers “Who has batted in the most runs?” Although that may seem like a small wording change, the two questions are tremendously different. Read the rest of this entry »


Looking for a Breakout Performance

Every franchise is looking for that player who seems to come out of nowhere to become a major contributor in its lineup. Think of players like José Bautista, who went from 1.8 WAR in 2009 to 6.5 WAR in 2010, or Justin Turner, who jumped from 0.5 WAR in 2013 to 3.4 WAR in 2014. The cost of acquiring these players was affordable because they were no longer prospects and most of the league had written them off as potential everyday players.

If a team had the ability to identify which players are most likely to exceed industry expectations, they would have a significant advantage over their competition. That is why I decided to create a model that tries to identify potential breakout performers.

Methodology

The first thing I needed to do was define what constitutes a breakout performance. I thought of several different definitions, but I decided to define a breakout as any season in which a player exceeded his previous career-high single-season WAR by at least 2.0 WAR. So if a player’s best season had been 0.0 WAR, he would need at least a 2.0 WAR season. If his best season had been 1.0 WAR, he would need at least a 3.0 WAR season, and so on. Read the rest of this entry »
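The breakout definition above translates directly into a small check. The sketch below is an illustration rather than the author's model code; the function name `is_breakout` and its argument layout are assumptions, while the 2.0 WAR margin comes from the text.

```python
def is_breakout(prior_wars, season_war, margin=2.0):
    """Return True if season_war beats the prior career high by at least margin.

    prior_wars: single-season WAR totals from the player's earlier seasons.
    season_war: WAR in the season being evaluated.
    """
    if not prior_wars:
        # A player with no earlier seasons has no career high to exceed.
        raise ValueError("at least one prior season is required")
    return season_war >= max(prior_wars) + margin

print(is_breakout([0.0], 2.0))        # True
print(is_breakout([1.0], 2.9))        # False
print(is_breakout([0.4, 1.0], 3.0))   # True
```

Only the best prior season matters here, so a player with several mediocre years and one strong one is held to the strong one.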


David Fletcher’s 2021 Was Missing Something

What’s 0/573?

Baseball Savant knows, but they also know it’s useless information. This is precisely why they do not display it. And it’s a shame that they don’t display it.

If they did, it would show that David Fletcher is in the zeroth percentile for Barrels.

For a quick refresher, a Barrel is “a batted ball with the perfect combination of exit velocity and launch angle.” To qualify, a ball must be hit at least 98 mph. At that exit velocity, a launch angle between 26 and 30 degrees is required. For every additional mph, the range of acceptable launch angles widens by two or three degrees, up until 116 mph. At that level, any ball hit with a launch angle between 8 and 50 degrees is considered Barreled.

Fletcher didn’t do that once in 2021. Instead, he mustered eight “close calls” among his 573 batted ball events. Read the rest of this entry »


Giving Away At-Bats

One piece from FanGraphs this season has stayed with me more than any other article on the website. In early September, Kevin Goldstein wrote a piece called The Rays’ Unique Ability To Mitigate Risk.

For most of the piece, Goldstein examined why the Rays pitch effectively even though they use so many relief pitchers. Most of the time, a team that cycles through relief pitchers in bunches is a bad one, like the Baltimore Orioles this year. But the Rays, as they often do, defy common practice.

I actually did not remember that part of Goldstein’s article; I was only reminded of it when I re-read the piece before writing this. What stuck with me was a short section at the beginning in which he explained why the Rays score so many runs.

Goldstein’s question was this: how does a team with no high-priced free agent slugger, like Bryce Harper or Manny Machado, and no home-grown young stud, like Juan Soto or Fernando Tatis Jr., score so many runs? (You will see in a moment why I am ignoring the Rays’ young phenom Wander Franco.) Read the rest of this entry »


Pitchers Had Another Bad Year Hitting. But No. 9 Hitters…

October 4, 1972: Yankees righty Larry Gowell hits a double off of Milwaukee Brewers pitcher Jim Lonborg. The American League Brewers played that game in an American League park in the Bronx, with no designated hitter on either side.

October 3, 2021: Dodgers righty Andre Jackson hits for himself, in relief, grounding out against Milwaukee Brewers pitcher Daniel Norris. The National League Brewers played that game in a National League park in Los Angeles, with no designated hitter on either side.

Gowell started his game and went five innings. In the third he led off, got his double, and advanced on a 6-3 groundout before being stranded at third base. In the bottom of the inning, he gave up a sac fly to John Briggs. That proved to be the only run, tagging Gowell with the loss.

Jackson was in relief of Phil Bickford, himself in relief of Walker Buehler. When a reliever hits for himself, rarely is the game competitive: here, Jackson had already pitched two innings with a nice lead. Immediately before Jackson’s spot in the batting order came up, outfielder Matt Beaty drove in catcher Will Smith, utility man Chris Taylor, and himself. Dodgers skipper Dave Roberts surely saw the score in LA, the score in San Francisco, and Jackson’s roster status for the playoffs, and let him hit and finish out the ninth. (Jackson collected a save for his three-inning effort, the first of his career.)

Gowell probably wasn’t the last AL pitcher to bat before the DH. The Angels and Royals had night games on the same day with pitcher at-bats in the Pacific and Central time zones. If baseball should adopt the designated hitter rule for the National League effective next year, Jackson will probably be the last NL pitcher to bat under these rules. The Reds’ Reiver Sanmartin collected three at-bats before being lifted for a pinch-hitter on the same day, but two of those came to start and end the fifth inning in his game against the Pirates. The Giants’ Logan Webb collected three at-bats too, but his day as a batter ended after a home run in the fifth. All those games began around the same time, so Jackson’s appearance in the eighth inning was a bit later on. Read the rest of this entry »


Dominican Major Leaguers and the Provinces They Hail From

It shouldn’t come as any great surprise to a typical baseball fan that Dominican players play an outsized role in Major League Baseball today. In fact, the Dominican Republic, which has a population roughly just 3.3% that of the United States, supplies MLB with upwards of 10% of its players. Major League Baseball and baseball fans are better off because of this. After all, who wants to live in a baseball world without Nelson Cruz or Fernando Tatis Jr., for instance?

With this point in mind, the following takes a look at players from the Dominican Republic. More specifically, it examines where in the D.R. players were born and when they made their way to MLB. What follows will be split into three brief sections: a description of the data utilized, some insights into the growth of the D.R.’s influence in MLB, and finally some map-based depictions of the players’ provinces of birth within the Dominican Republic. Read the rest of this entry »


Are Third Base Coaches Too Hesitant in Sacrifice Fly Situations?

Imagine you are coaching third base. Your team is at bat with a runner on third and one out. A fly ball is caught in marginally shallow left field. You think your runner has about a 50/50 chance of scoring if you send him. Do you send him?

Many of you would probably say no. This is a risky call. There is a 50% chance the runner would be out, which would be a huge momentum killer. Furthermore, if he gets caught and your team loses by a run, you are going to be the person blamed by the media.

My hypothesis is that third base coaches are leaving runs on the table. Over the past four seasons, runners on third scored 98% of the time when sent in sac fly situations, suggesting that coaches send them only when they have a very high degree of confidence in success. I hypothesize they won’t send runners unless they feel they have at least an 80% chance of scoring, but my analysis says they should be sent even with much lower chances. Read the rest of this entry »
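One common way to frame a send-or-hold decision is a break-even calculation on run expectancy: send the runner whenever his chance of scoring exceeds the probability at which sending and holding are worth the same number of expected runs. The sketch below assumes that framing, which may differ from the author's actual analysis. The function name is hypothetical, and the numbers in the example are illustrative placeholders rather than measured run-expectancy values.

```python
def break_even_probability(re_hold, re_score, re_out):
    """Success probability at which sending equals holding in expected runs.

    re_hold:  expected runs for the rest of the inning if the runner stays.
    re_score: expected runs if he is sent and scores (including his run).
    re_out:   expected runs if he is sent and thrown out.

    Solves p * re_score + (1 - p) * re_out = re_hold for p.
    """
    if re_score <= re_out:
        raise ValueError("scoring must be worth more than being thrown out")
    return (re_hold - re_out) / (re_score - re_out)

# Placeholder inputs, NOT real run-expectancy data. With one out before the
# catch, a runner thrown out at home ends the inning, so re_out is 0.
p = break_even_probability(re_hold=0.35, re_score=1.10, re_out=0.0)
print(round(p, 3))
```

With inputs like these, the break-even point sits far below a coin flip, which is the shape of result the hypothesis above predicts. A fuller treatment would use win probability instead of run expectancy so that score and inning are taken into account.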