A Peta Perspective on the Hot Stove So Far

Cold, snowy days here in our nation’s capital, combined with the owners’ and players’ seeming determination to kill the golden goose, provide an opportunity for me to look at the hot stove (pre-lockout) through the lens of the Peta methodology. For those unfamiliar with it, I refer you to this deeper dive published here on the Community Blog last January. Based on Joe Peta’s groundbreaking 2013 book Trading Bases, the methodology derives each team’s upcoming-season win-loss record from its previous-season performance (runs scored/runs allowed), adjusted for cluster luck (my proxy is FG BaseRuns), and the team’s upcoming-season projected WAR.

Just before Opening Day, the product of this calculation is compared to the money line. Peta suggests that in a 162-game season, win totals produced by the model that deviate from the money line by more than four games (1.5 games in a 60-game season) represent “unrepeatable results” and are therefore worth a possible wager.
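As a rough illustration of that final comparison step, here is a minimal Python sketch. It assumes the four-game threshold scales in proportion to season length, which is my assumption rather than anything stated above, although it does land on roughly 1.5 games for a 60-game season. The function names and the sample win totals are hypothetical.

```python
def wager_threshold(games: int,
                    full_season_threshold: float = 4.0,
                    full_season_games: int = 162) -> float:
    """Scale the four-game threshold to a shorter season (assumed proportional)."""
    return full_season_threshold * games / full_season_games


def is_wager_candidate(model_wins: float, line_wins: float, games: int = 162) -> bool:
    """True when the model and the line disagree by more than the threshold."""
    return abs(model_wins - line_wins) > wager_threshold(games)


# Hypothetical win totals, for illustration only.
print(round(wager_threshold(60), 1))   # threshold for a 60-game season
print(is_wager_candidate(90.0, 85.0))  # five-game gap in a 162-game season
print(is_wager_candidate(88.0, 85.0))  # three-game gap in a 162-game season
```

A gap in either direction counts, since the model can flag a team as overvalued or undervalued by the line.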


Analyzing Joc Pederson’s Free Agency

Joc Pederson is selling pearls, but is anyone buying? After earning his second World Series ring, your favorite bedazzled outfielder is on the market and could be coming to a city near you.

This past year, Pederson posted a .238 batting average, a .310 on-base percentage, and a .732 OPS between his stints with the Cubs and Braves, finishing the season with 18 home runs, 61 RBIs, two stolen bases, and an OPS+ of 93. If we dive a bit deeper, we can see that despite modest traditional numbers, Baseball Savant has him ranked in the 80th percentile for average exit velocity and the 90th percentile for max exit velocity. His ability to hit the ball hard is nearing an elite level, and though a subpar batting average certainly does not help his case, his skill at driving the ball may entice a team in need of some lefty power.

Although many fans consider Pederson a clutch postseason hitter, primarily because of his self-proclaimed “Joctober,” his stats looked grim as this postseason wound down. Pederson went 5-for-22 with one bomb in the NLCS, which is not especially great for a postseason power hitter of his caliber. He was worse in the World Series, going 1-for-15 with no homers. Despite his pair of championships, “Joctober” seems to have been a classic case of small sample size. His recent performance, or lack thereof, will weigh heavily on the minds of executives and undoubtedly bring down his value.


Predicting Hall of Famers with Machine Learning

The questions of who should and who will make it into the Baseball Hall of Fame have inspired countless debates, books, articles, and statistics. From the early days of statistical milestones like 3,000 hits, 500 home runs, and 300 wins to more advanced measurements like WAR and JAWS, and throughout baseball’s many eras, plenty of people have attempted to tackle the task. The discussion is more or less ongoing but peaks whenever a prominent player retires and during every winter ballot season. Innovations like the Hall of Fame Tracker have only added fuel to the fire.

I wanted to see if machine learning was up to the task of predicting who’ll get enshrined. I trained and evaluated a prediction model and used it to predict induction chances for current and recently retired players. I specifically wanted to see if I could get a sense of how some of the game’s younger superstars are doing, because who doesn’t want to talk about how good Juan Soto is?

In this article I discuss building and evaluating the model and show the predictions it makes. If you’re interested in the former, continue reading; if you’re interested only in the predictions, feel free to skip to the end.


Modeling One-Run and Extra-Inning Games

When the 2021 regular season concluded, there was the following exchange in the “Hey Bill” section of the Bill James Online baseball community, with Bill’s response starting at “Answered:”

Hey Bill! 

 Is it possible to calculate an expected number of 1-run games for a team in a season? The reason I ask is that the Mets played in 66 1-run games this year, 40.7% of their games. That seems like a whopping big number . . . but is it? 

Thanks 

Kevin

Asked by: kgh

Answered: 10/4/2021

It’s a very large number, but I wouldn’t know how to calculate an expected number. I don’t even know what the variables would be. I suppose one-run games are slightly more common among teams which are near .500, and obviously they would be significantly more common in a low-run environment than in a high-run environment.

Inspired by this interaction, I built a dataset to answer those questions and a few more that popped up along the way. Let’s start with the easiest one:

The 2021 Mets played 66 one-run games, or 40.7% of their contests. Is that a whopping big number?

Yes, that is a big number, but not “whopping” big.

The Mets did play 66 one-run games, with 13 of those in extra innings and 53 in “regulation.” They played 18 total extra-inning games. This gave them a total of 71 games that were decided by one run or in extra innings. Several teams listed below played more one-run games than the Mets did in 2021.
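The arithmetic behind those Mets totals can be checked in a few lines of Python. This is only a sketch of the bookkeeping: it assumes a 162-game schedule, and the variable names are mine.

```python
GAMES = 162                # assumption: full 162-game regular season
ONE_RUN_GAMES = 66         # one-run games, from the text
ONE_RUN_EXTRA_INNING = 13  # one-run games that went to extra innings
EXTRA_INNING_GAMES = 18    # all extra-inning games

# One-run games decided in regulation.
one_run_regulation = ONE_RUN_GAMES - ONE_RUN_EXTRA_INNING

# Share of the schedule decided by one run.
one_run_share = ONE_RUN_GAMES / GAMES

# Games decided by one run OR in extra innings: add only the extra-inning
# games that were not already counted as one-run games.
one_run_or_extras = ONE_RUN_GAMES + (EXTRA_INNING_GAMES - ONE_RUN_EXTRA_INNING)

print(one_run_regulation)       # 53
print(f"{one_run_share:.1%}")   # 40.7%
print(one_run_or_extras)        # 71
```

The 13 one-run extra-inning games belong to both categories, so they are counted once rather than twice.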


No Pitch Is an Island: Pitch Prediction With Sequence-to-Sequence Deep Learning

One of the signature dishes of baseball-related machine learning is pitch prediction, whereby the analysis aims to predict what type of pitch will be thrown next in a game. The strategic advantages of knowing what a pitcher will throw beforehand are obvious, given the lengths (both legal and illegal) to which teams go to gain such information. Analysts who tackle the problem through data have taken various approaches in the past, but here are some commonalities among them:

  • Supervised learning is incorporated with numerous variables (batter-handedness, count, inning, etc.) to fit models on training data, which are then used to make predictions on test data.
  • The models are fit on a pitcher-by-pitcher basis. That is, algorithms are applied to each pitcher individually to account for their unique tendencies and repertoire. Results are reported as an aggregate of all these individual models.
  • There is a minimum cut-off for the number of pitches thrown. In order for a pitcher’s work to be considered they must have crossed that threshold.

An example can be found here. The goal of this study is not to reproduce or match those strong results, but to introduce a new, natural-fitting ingredient that can address their limitations. The most constraining restriction in other works is the sample size requirement; by only including pitchers with substantial histories, the scope of the pitch prediction task is drastically reduced. We hope to produce a model capable of making predictions for all pitchers regardless of their individual sample size.
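To make the sample-size restriction concrete, here is a deliberately simple Python sketch of the pitcher-by-pitcher setup described in the list above. It is not the method used in any of the prior studies: in place of a real supervised model it uses a most-frequent-pitch-by-count baseline, and the cutoff value, function names, and toy data are all hypothetical.

```python
from collections import Counter, defaultdict

MIN_PITCHES = 5  # hypothetical cutoff; real studies use far larger thresholds


def fit_per_pitcher(pitches, min_pitches=MIN_PITCHES):
    """Fit one frequency 'model' per pitcher from (pitcher, count, pitch_type) rows."""
    by_pitcher = defaultdict(list)
    for pitcher, count, pitch_type in pitches:
        by_pitcher[pitcher].append((count, pitch_type))

    models = {}
    for pitcher, rows in by_pitcher.items():
        if len(rows) < min_pitches:
            continue  # below the cutoff: this pitcher gets no model at all
        by_count = defaultdict(Counter)
        overall = Counter()
        for count, pitch_type in rows:
            by_count[count][pitch_type] += 1
            overall[pitch_type] += 1
        models[pitcher] = {
            "by_count": {c: ctr.most_common(1)[0][0] for c, ctr in by_count.items()},
            "fallback": overall.most_common(1)[0][0],
        }
    return models


def predict(models, pitcher, count):
    """Most frequent pitch for this pitcher in this count, or None if no model exists."""
    model = models.get(pitcher)
    if model is None:
        return None
    return model["by_count"].get(count, model["fallback"])


# Toy training data: pitcher "A" clears the cutoff, pitcher "B" does not.
train = [
    ("A", "0-0", "FF"), ("A", "0-0", "FF"), ("A", "0-2", "SL"),
    ("A", "0-2", "SL"), ("A", "0-2", "FF"), ("A", "3-1", "FF"),
    ("B", "0-0", "CU"), ("B", "0-0", "CU"),
]
models = fit_per_pitcher(train)
print(predict(models, "A", "0-2"))  # SL
print(predict(models, "B", "0-0"))  # None
```

Pitcher "B" receives no prediction at all, which is the gap a model covering every pitcher would need to close.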


The Effect of Fastball Velocity on the Slider

I’ve heard it said in the past that a batter should take care of the pitcher’s fastball first and then deal with the breaking ball. If this is true, then the faster the pitcher’s fastball is, the more the batter needs to be aware of the fastball when at the plate. I want to look at how this affects the most popular pitch in baseball: the slider.

First I calculated the average velocity of each pitcher’s fastball for pitchers who threw at least 100 fastballs (FF, FT, SI) in each major league season from 2017-2021. Based on the calculated average fastball velocity, I divided the pitchers into three groups: 143-148 km/h, 148-153 km/h, and 153-158 km/h. I then further divided the groups according to the velocity and movement of the slider thrown in each.
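The grouping step might look something like the following Python sketch. Several details are my assumptions rather than the author's: averages are taken per pitcher-season, the bins are half-open (a 148.0 km/h average falls in the 148-153 group), and the row layout and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

FASTBALL_TYPES = {"FF", "FT", "SI"}  # four-seam, two-seam, sinker
MIN_FASTBALLS = 100
BINS_KMH = [(143, 148), (148, 153), (153, 158)]


def velocity_group(avg_kmh):
    """Label for the bin containing avg_kmh, or None if it falls outside all bins."""
    for low, high in BINS_KMH:
        if low <= avg_kmh < high:  # assumed half-open bins
            return f"{low}-{high} km/h"
    return None


def group_pitchers(pitches):
    """Map (pitcher, season) to a velocity group from (pitcher, season, pitch_type, kmh) rows."""
    velocities = defaultdict(list)
    for pitcher, season, pitch_type, kmh in pitches:
        if pitch_type in FASTBALL_TYPES:
            velocities[(pitcher, season)].append(kmh)
    return {
        key: velocity_group(mean(values))
        for key, values in velocities.items()
        if len(values) >= MIN_FASTBALLS  # drop small fastball samples
    }


# Toy data: P1 qualifies with 100 fastballs, P2 falls one short.
rows = (
    [("P1", 2021, "FF", 150.0)] * 100
    + [("P1", 2021, "SL", 135.0)] * 50
    + [("P2", 2021, "SI", 145.0)] * 99
)
print(group_pitchers(rows))
```

Sliders are ignored when computing the average, so only the fastball types set the group.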

Then I calculated the Run Value/100 for each group. Let’s start with the velocity group between 143 and 148 km/h.


Pitch Mix Variation and Ways to Measure It

Earlier this year, I took a hack at defining what I referred to as pitch mix variation. Pitch mix variation, as I conceived of it at least, would be a single number to capture how much any given pitcher mixes his offerings. A higher pitch mix variation (PMV) would indicate, first, that a pitcher has a relatively diverse mix of pitches and, second, that he throws each pitch roughly as often as any other. A lower PMV would indicate that a pitcher has fewer pitches and relies on just one or maybe two of them the vast majority of the time.

Among other things, baseball types are quick to measure the quality of stuff, command, control, and the number of offerings of pitchers. That said, to my knowledge there doesn’t appear to be a standardized catch-all metric for how often those pitches are utilized. There also seems to be value in such a metric. For instance, a college starter might have a 3,000-rpm curveball that plays up in models, but if he doesn’t trust it and therefore throws it just ~5% of the time, that elite spin might somewhat belie long-term bullpen risk.

Put simply, a pitcher who throws a four-seamer, sinker, curveball, and changeup all 25% of the time is quite possibly tougher to square up than one who throws just a four-seam (80%) and curveball (20%), all else held equal.
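To put numbers on that comparison, here is a small Python sketch using two standard concentration measures, Shannon entropy and the Herfindahl-Hirschman index. Choosing these two is my own assumption for illustration; they are not necessarily the measures used in this post, and the function names are hypothetical.

```python
import math


def _check_shares(shares):
    """Usage shares must be non-negative and sum to 1."""
    if any(p < 0 for p in shares) or not math.isclose(sum(shares), 1.0):
        raise ValueError("shares must be non-negative and sum to 1")


def shannon_entropy_bits(shares):
    """Shannon entropy in bits; higher means a more even, more varied mix."""
    _check_shares(shares)
    return -sum(p * math.log2(p) for p in shares if p > 0)


def herfindahl_index(shares):
    """Sum of squared shares; higher means heavier reliance on fewer pitches."""
    _check_shares(shares)
    return sum(p * p for p in shares)


even_mix = [0.25, 0.25, 0.25, 0.25]  # four-seamer, sinker, curveball, changeup
two_pitch = [0.80, 0.20]             # four-seamer, curveball

for name, mix in [("even four-pitch mix", even_mix), ("80/20 two-pitch mix", two_pitch)]:
    print(name, round(shannon_entropy_bits(mix), 3), round(herfindahl_index(mix), 3))
```

The two measures move in opposite directions: the even four-pitch mix has the higher entropy and the lower Herfindahl index, so either could serve as a single-number summary of variation.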

However, this post isn’t about assigning value or finding an optimal PMV (surely that depends on the individual pitcher), but rather juxtaposing various potential measures. To that end, this post will include the following: (1) a recap of the original formula and logic I previously cobbled together, (2) an overview of two more formalized models for quantifying variation, and (3) a comparison of those three measures across several hundred pitchers in 2021.


Weighted Runs Batted In Efficiency

Imagine that throughout high school, teachers gave their favorite students easier tests than the rest of the class. The results would be clear: the majority of the favored students would come out with stronger scores. However, one would question whether those strong scores were the result of high intellect or of an easy test. Conversely, there would be other students who still scored well despite being given a difficult test. Now there’s an issue. If the teachers want to know which students know the material best, how should they figure it out? They can’t simply take the highest score, because they are aware that the scores are not an accurate representation due to the skewed tests. This is the situation in which the RBI has put the baseball world.

When the RBI was first documented as an official statistic in 1920, the wording of the definition in Rule 86, Section 8 of the Official Baseball Rules was “The number of runs batted in by each batsman.” Although this definition was slightly vague, its intention was to quantify which batter is the best at batting in runs. For years, this statistic has been praised. The RBI is always one of the first statistics to be mentioned when summarizing a player’s year and career. The RBI is even part of the most prestigious hitting award, the Triple Crown. Despite its strong reputation, over the last few years it has become clear that the RBI doesn’t answer “Which batsman is the best at batting in runs?” The RBI only answers “Who has batted in the most runs?” Although that may seem like a small wording change, the two questions are tremendously different.


Looking for a Breakout Performance

Every franchise is looking for that player who seems to come out of nowhere to be a major contributor in its lineup. Think of players like José Bautista, who went from 1.8 WAR in 2009 to 6.5 WAR in 2010, or Justin Turner, who jumped from 0.5 WAR in 2013 to 3.4 WAR in 2014. The cost of acquiring these players was affordable because they were no longer prospects and most of the league had written them off as potential everyday players.

If a team had the ability to identify which players are most likely to exceed industry expectations, they would have a significant advantage over their competition. That is why I decided to create a model that tries to identify potential breakout performers.

Methodology

The first thing I needed to do was to define what constitutes a breakout performance. I thought of several different definitions, but I decided to define a breakout performance as any season in which a player exceeded his previous career-high single-season WAR by at least 2.0 WAR. So if a player’s career high was a season of 0.0 WAR, he would need to have at least a 2.0 WAR season. If his career high was a season of 1.0 WAR, he would need to have at least a 3.0 WAR season, and so on.
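That definition translates fairly directly into code. The Python sketch below is one possible reading of it, not the author's implementation: it assumes seasons are supplied in chronological order and that a player's first recorded season can never count as a breakout because there is no prior career high. The function name and sample careers are hypothetical.

```python
def breakout_seasons(war_by_season, margin=2.0):
    """Return seasons whose WAR beats the prior career high by at least `margin`.

    war_by_season: list of (season, war) tuples in chronological order.
    """
    breakouts = []
    career_high = None  # no career high exists before the first season
    for season, war in war_by_season:
        if career_high is not None and war >= career_high + margin:
            breakouts.append(season)
        career_high = war if career_high is None else max(career_high, war)
    return breakouts


# Hypothetical careers, for illustration only.
print(breakout_seasons([(2018, 0.0), (2019, 2.0)]))               # [2019]
print(breakout_seasons([(2018, 1.0), (2019, 2.5), (2020, 3.0)]))  # []
print(breakout_seasons([(2018, 1.0), (2019, 0.5), (2020, 3.0)]))  # [2020]
```

In the second career, the 3.0 WAR season does not qualify because the 2.5 WAR season had already raised the career high, so the bar for 2020 is 4.5 WAR.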


Previewing the CBA Deadline

MLB’s Collective Bargaining Agreement (CBA) is set to expire on December 1st. Unfortunately, with all of the disagreements over issues including rule changes, profit sharing, and minor league living conditions, it’s possible that we could see a work stoppage similar to the one we saw in 1994.

The MLBPA’s website says the purpose of the CBA “is to set forth their agreement on certain terms and conditions of employment of all Major League Baseball Players for the duration of this Agreement.” This is vital to the league and many other major U.S. sports because it sets fair and ethical rules for players and teams to abide by. However, owners have historically dominated negotiations and kept the lion’s share of profits. In recent years, players have been much more open to speaking out, and there has been significant pushback in the media. If there aren’t substantial changes made by December, it would not be surprising to see another lockout or strike.

If there is no new CBA by December 1st, MLB rules say major league play will stop until it is renewed and there will be no moves allowed by any club. This will play a significant role this offseason regardless of whether the CBA gets renewed, as the threat of a delayed CBA may force teams to rush moves or wait longer on them.