Author Archive

Predicting wOBA Using Process-Based Statistics

When trying to determine a batter’s overall offensive value using a single statistic, one of the most popular metrics is weighted on-base average (wOBA). wOBA is calculated as a linear combination of “outcome” statistics (unintentional walks, hit-by-pitches, singles, doubles, triples, and home runs) divided by, essentially, the number of plate appearances.
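As a rough illustration of that calculation, here is a minimal Python sketch. The function name, the dictionary keys, and especially the weights are assumptions made for the example: the real linear weights change from season to season, so the numbers below are placeholders rather than official values. The denominator used here (at-bats plus unintentional walks, sacrifice flies, and hit-by-pitches) is the usual stand-in for “essentially, the number of plate appearances.”

```python
# Placeholder weights for illustration only; official wOBA weights vary by season.
ILLUSTRATIVE_WEIGHTS = {
    "uBB": 0.70,  # unintentional walks
    "HBP": 0.70,  # hit-by-pitches
    "1B": 0.90,   # singles
    "2B": 1.25,   # doubles
    "3B": 1.60,   # triples
    "HR": 2.00,   # home runs
}


def woba(counts, weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted sum of outcome counts divided by (roughly) plate appearances."""
    numerator = sum(weight * counts[event] for event, weight in weights.items())
    # Intentional walks are excluded from both the numerator and the denominator.
    denominator = counts["AB"] + counts["uBB"] + counts["SF"] + counts["HBP"]
    return numerator / denominator


# A made-up stat line for a hypothetical batter.
example_counts = {
    "AB": 500, "SF": 5,
    "uBB": 50, "HBP": 5, "1B": 90, "2B": 30, "3B": 3, "HR": 20,
}
example_woba = woba(example_counts)
```

Swapping in a given season's published weights is just a matter of passing a different `weights` dictionary.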

With that being said, could one predict whether a given player’s wOBA will be above a certain threshold using “process” statistics such as plate discipline and batted ball parameters? In particular, if we know a player’s zone contact rate, chase rate, and average exit velocity, could we predict with any confidence whether that particular player’s wOBA will be above, say, .320?

Using Statcast data and a bit of machine learning, I have decided to train a shallow neural network to try to do just that. I’ll be using snapshots of the Jupyter Notebook throughout the analysis to try and make it a little easier to follow. Read the rest of this entry »
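To make the setup concrete, here is a small sketch of what a shallow network for this yes/no question could look like. It is not the notebook from the full post: the data are synthetic, the three features are assumed to be already standardized (zone contact rate, chase rate, average exit velocity), and the architecture (one tanh hidden layer, a sigmoid output, plain gradient descent on log loss) is my own assumption, written with only the Python standard library.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Return the hidden activations and the predicted probability for one batter."""
    W1, b1, W2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    prob = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, prob


def train(X, y, n_hidden=4, lr=0.1, epochs=500, seed=0):
    """Fit a one-hidden-layer network with per-sample gradient descent."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            hidden, prob = forward(x, (W1, b1, W2, b2))
            d_out = prob - target  # gradient of log loss at the output pre-activation
            for j in range(n_hidden):
                # Backpropagate through tanh before updating the output weight.
                d_hidden = d_out * W2[j] * (1.0 - hidden[j] ** 2)
                W2[j] -= lr * d_out * hidden[j]
                b1[j] -= lr * d_hidden
                for i in range(n_in):
                    W1[j][i] -= lr * d_hidden * x[i]
            b2 -= lr * d_out
    return W1, b1, W2, b2


def predict(x, params):
    """Probability that this batter's wOBA clears the threshold."""
    return forward(x, params)[1]


# Synthetic, standardized features: [zone contact, chase rate, avg exit velocity].
X = [
    [1.0, -1.0, 1.2], [0.5, -0.5, 0.8], [0.8, -0.2, 1.0],
    [-1.0, 1.0, -1.1], [-0.5, 0.8, -0.9], [-0.7, 0.3, -1.2],
]
# Made-up wOBA values, converted to labels with the .320 threshold from above.
wobas = [0.350, 0.340, 0.335, 0.290, 0.300, 0.280]
y = [1 if w > 0.320 else 0 for w in wobas]

params = train(X, y)
probabilities = [predict(x, params) for x in X]
```

With real data one would hold out a test set and judge the model on batters it has not seen; the toy example only checks that the pieces fit together.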


The MVP Batter Through the First Month

In one of the later chapters of The MVP Machine, the authors describe a working relationship between an unnamed position player and a writer at an “analytically inclined” baseball website. The player felt that his club’s advanced scouting data wasn’t granular enough and asked the writer to supplement the information he was given by the club with additional detail. The writer was eventually preparing scouting reports on the player himself, opposing pitchers, and the home plate umpires’ strike zones. As for how the player evaluated his own performance, the writer summarized that he was basically looking at three things: “Am I squaring up the ball? Am I swinging and missing? Am I swinging at strikes?”

With the first month of the season in the books, who have been some of the best-performing hitters in the league according to this particular player’s criteria? Thanks to Statcast, we have the tools at our disposal to try to figure out just that. Note that the dataset I used for this exercise consists of all qualified batters as of the morning of April 30th, 2021.

First, we need to decide which parameters to use to represent each of the three questions posed by the player. Two of the three are pretty easy. “Am I swinging and missing?” We can look up a player’s whiff percentage on Statcast. “Am I swinging at strikes?” That information is represented in a player’s chase percentage. “Am I squaring up the ball?” The natural candidates here would be, if we’re using just one number, the average exit velocity, hard hit percentage, and barrel percentage. I decided to go with the average exit velocity because it takes into account every batted ball put in play by the batter. Let me explain. Read the rest of this entry »
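One way to read the point about average exit velocity is sketched below. This is my own illustration, not the explanation from the full post: the two batters and their exit velocities are invented, and the 95 mph cutoff is the usual Statcast hard-hit definition. A threshold statistic such as hard hit percentage only records whether a ball cleared the cutoff, while the average moves with every batted ball.

```python
from statistics import mean

HARD_HIT_MPH = 95.0  # Statcast counts a batted ball at 95+ mph as hard hit


def average_exit_velocity(exit_velocities):
    """Every batted ball contributes to the average."""
    return mean(exit_velocities)


def hard_hit_rate(exit_velocities):
    """Only whether each ball cleared the threshold matters."""
    return sum(ev >= HARD_HIT_MPH for ev in exit_velocities) / len(exit_velocities)


# Two hypothetical batters with invented exit velocities (mph).
batter_a = [96.0, 101.0, 70.0, 72.0]  # weak contact when he misses the barrel
batter_b = [96.0, 101.0, 93.0, 94.0]  # near misses are still well struck

same_hard_hit_rate = hard_hit_rate(batter_a) == hard_hit_rate(batter_b)
ev_gap = average_exit_velocity(batter_b) - average_exit_velocity(batter_a)
```

Both batters have the same hard hit rate, yet their average exit velocities differ by more than 10 mph.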


Pound the Knees, Steven

Now that the Toronto Blue Jays have traded for left-handed pitcher Steven Matz, he is projected to slide into the bottom of the starting rotation and pitch about 115 innings this year. Matz’s 2020 was a year to forget (join the club, Steven), but let’s take a look at who Matz is as a pitcher and why a change in fastball location is something the Jays coaching staff might consider.

Matz pitched only about 30 innings last year, so in the interest of sample size, I will also be using statistics from 2019 and 2018. Here is what those last three seasons looked like, courtesy of Baseball Savant: Read the rest of this entry »


Aaron Nola Will Make You Question Yourself

In one of the later chapters of The MVP Machine, the authors describe a working relationship between a professional baseball player (an unnamed position player) and a writer at an “analytically inclined” baseball website. The player felt that his club’s advanced scouting data wasn’t granular enough and asked the writer to supplement the information with more detail. The writer summarized that the player was basically looking at three things: “Am I squaring up the ball? Am I swinging and missing? Am I swinging at strikes?”

That last question got me thinking. As a pitcher, it is rarely a bad idea to have batters look at called strikes and swing at balls. Which pitchers, in 2020, were particularly effective at doing just that? To make that determination, I looked at Statcast data for all pitchers who threw at least 60 innings in 2020. Specifically, I looked at their outside-zone swing rate and their zone take rate – calculated as just (1 – zone swing rate) – and took the average of the two. Note that this analysis completely omits what happens if contact is made with the ball; we’re merely interested in strikes that were taken and balls that were swung at. (If you’re interested in the Statcast query and the few lines of code for this, click here.) The top 10 were as follows: Read the rest of this entry »
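The metric described above is simple enough to sketch in a few lines of Python. This is not the code linked in the post: the function name, the pitcher labels, and the swing rates are all placeholders chosen for illustration, and rates are assumed to be expressed as fractions between 0 and 1.

```python
def take_and_chase_score(outside_zone_swing_rate, zone_swing_rate):
    """Average of outside-zone swing rate and zone take rate (1 - zone swing rate)."""
    zone_take_rate = 1.0 - zone_swing_rate
    return (outside_zone_swing_rate + zone_take_rate) / 2.0


# Hypothetical pitchers with made-up rates: (outside-zone swing rate, zone swing rate).
pitchers = {
    "Pitcher A": (0.30, 0.60),
    "Pitcher B": (0.35, 0.70),
    "Pitcher C": (0.28, 0.68),
}

# Rank from best to worst by the combined score.
ranking = sorted(
    pitchers,
    key=lambda name: take_and_chase_score(*pitchers[name]),
    reverse=True,
)
```

Because the score is a plain average, a pitcher can rank well by excelling at either half: getting chases outside the zone or getting takes inside it.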