Author Archive

Starting Pitchers Aren’t Leaning On Their Best Pitches

Nathan Ray Seebeck / USA TODAY Sports

The title of this post does not mince words. If that is all the context you need (TL;DR), it would be fair to move on. For those looking for a fuller explanation, though, what follows offers the qualifications and nuance behind that statement.

The impetus for doing some digging and eventually choosing this topic (and title) is pretty simple: I wondered whether starting pitchers, over the course of a long season, throw their best pitches more often than their less effective pitches.

Starters were the focus for a reason. Relievers, who most often face mere subsets of an opposing lineup (and, crucially, face that subset just once) in any given outing, are likely more inclined to lean on their strongest offerings at higher rates. Starting pitchers, meanwhile, often have to grapple with diminishing returns on pitch usage. Should an opposing hitter see that “best” pitch over and over, what made it effective in the first place loses some of its value to the hitter’s heightened recognition. Starting pitchers, it turns out, probably should practice some moderation.


Are Hitters Hitting It Where It’s Being Pitched?

If you watch baseball games, which you probably do if you find yourself reading this, then you’re likely familiar with announcers employing phrases like “he just went with it,” or “hit it where it was pitched.” These phrases suggest hitters have made contact with the baseball such that outside pitches are hit to the opposite field and pitches on the inner half are put in play to the hitter’s pull side.

These comments raise the question: are hitters “going with” the pitches they are thrown with any discernible frequency? In today’s game, wherein the value of tapping into pull power and raising average launch angles has been well established, are hitters still hitting it where it’s pitched? To what extent do teams’ defensive alignments correspond to how their pitchers will approach any given hitter, should that hitter go with pitches? Given that pitchers who throw higher in the zone more often allow fly ball contact and those who throw lower induce more groundballs, does something similar apply for hitters given how they are pitched on a horizontal plane, i.e., inside and outside?

Pitch Mix Variation and Ways to Measure It

Earlier this year, I took a hack at defining what I referred to as pitch mix variation. Pitch mix variation, as I conceived of it at least, would be a single number to capture how much any given pitcher mixes his offerings. A higher pitch mix variation (PMV) would indicate, first, that a pitcher has a relatively diverse mix of pitches and, second, that he throws each pitch roughly as often as any other. A lower PMV would indicate a pitcher has fewer pitches and relies on just one or maybe two of those the vast majority of the time.

Among other things, baseball types are quick to measure pitchers’ quality of stuff, command, control, and number of offerings. That said, to my knowledge there doesn’t appear to be a standardized catch-all metric for how often those pitches are utilized. There also seems to be value in such a metric. For instance, a college starter might have a 3,000-rpm curveball that plays up in models, but if he doesn’t trust it and therefore throws it just ~5% of the time, that elite spin might somewhat belie long-term bullpen risk.

Put simply, a pitcher who throws a four-seamer, sinker, curveball, and changeup each 25% of the time is quite possibly tougher to square up than one who throws just a four-seamer (80%) and curveball (20%), all else held equal.
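To make that comparison concrete, here is a minimal sketch in Python. It is not the PMV formula from my earlier post. It assumes Shannon entropy as a stand-in measure of variation and converts it to an "effective number of pitches," and the function name `effective_pitch_count` is made up for illustration.

```python
import math


def effective_pitch_count(usage):
    """Return exp(Shannon entropy) of a pitch-usage distribution.

    `usage` maps pitch name -> share of pitches thrown. This is an
    assumed stand-in for a pitch mix variation measure, not the
    formula from the original post.
    """
    total = sum(usage.values())
    if total <= 0:
        raise ValueError("usage shares must sum to a positive number")
    # Normalize so the shares sum to 1, and skip pitches never thrown.
    shares = [v / total for v in usage.values() if v > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return math.exp(entropy)


# The two hypothetical pitchers described above.
balanced = {"four-seamer": 0.25, "sinker": 0.25, "curveball": 0.25, "changeup": 0.25}
two_pitch = {"four-seamer": 0.80, "curveball": 0.20}

print(round(effective_pitch_count(balanced), 2))   # 4.0
print(round(effective_pitch_count(two_pitch), 2))  # 1.65
```

Under this assumed measure, the balanced four-pitch arsenal scores as four effective pitches, while the 80/20 pitcher scores well under two, which matches the intuition that both the number of pitches and the evenness of their usage matter.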

However, this post isn’t about assigning value or finding an optimal PMV (surely that depends on the individual pitcher), but rather juxtaposing various potential measures. To that end, this post will include the following: (1) a recap of the original formula and logic I previously cobbled together, (2) an overview of two more formalized models for quantifying variation, and (3) a comparison of those three measures across several hundred pitchers in 2021.

Dominican Major Leaguers and the Provinces They Hail From

It shouldn’t come as any great surprise to a typical baseball fan that Dominican players play an outsized role in Major League Baseball today. In fact, the Dominican Republic, which has a population only about 3.3% that of the United States, supplies MLB with upwards of 10% of its players. Major League Baseball and baseball fans are better off because of this. After all, who wants to live in a baseball world without Nelson Cruz or Fernando Tatis Jr., for instance?

With this point in mind, the following takes a look at players from the Dominican Republic: more specifically, where in the D.R. players were born and when they made their way to MLB. What follows will be split into three brief sections: a description of the data utilized, some insights into the growth of the D.R.’s influence in MLB, and finally some map-based depictions of the players’ provinces of birth within the Dominican Republic.

Using Decision Trees To Classify Yu Darvish Pitch Types

Last year, I wrote a post that outlined the application of a K Nearest Neighbors algorithm to make pitch classifications. This post will be, in some ways, an extension of that, as pitches will yet again be classified using a machine learning model. However, as one might have presumed given this post’s title, the learner of choice here will be a decision tree. Additionally, this time around, instead of classifying pitches thrown over the course of a single game, I will aim to classify pitches thrown by a single pitcher over the course of an entire season.

What follows will be divided into three sections: a brief conceptual explanation of decision tree learners, a description of the data and steps taken to train the decision tree model of choice here, and finally a run-through of the model’s results. I am not an expert on machine learning, but I believe that this is an interesting exercise that (very, very basically) highlights a powerful model using interesting baseball data. The work to support this post was conducted in the scripting language R and with the direction of the book Machine Learning with R by Brett Lantz.
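For a rough sense of what a decision tree does at each node, here is a minimal sketch in Python rather than R. It assumes entropy-based information gain as the splitting criterion, which is a common choice but not necessarily the one used in the full post, and the velocities and labels are made up for illustration, not real Darvish data. The helper names `entropy` and `best_threshold` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Find the single cut point on one feature with the highest
    information gain. Returns (threshold, gain)."""
    base = entropy(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best[1]:
            best = (t, gain)
    return best


# Made-up velocities (mph) and pitch labels, for illustration only.
velocities = [95.1, 94.6, 96.0, 93.8, 82.3, 81.0, 83.5, 80.2]
pitch_types = ["FF", "FF", "FF", "FF", "SL", "SL", "SL", "SL"]

threshold, gain = best_threshold(velocities, pitch_types)
print(threshold, gain)
```

With this toy data, the best cut falls between the slowest fastball and the fastest slider and separates the two classes completely. A full tree repeats this search across every feature and then recurses on each resulting subset.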