If you were feeling charitable, you could say this post owes a lot to Jeff Sullivan’s recent set of articles examining pitch comps. If you weren’t feeling charitable, you could say this post is a shameless appropriation of his ideas. Either way, you should read those articles! They were very good, and very entertaining, and directly inspired this post. There were seven, in total: here, here, here, here, here, here, and here. I’ll wait.
Back? Good! In the comments of the third article, someone asked Jeff about finding the “most signature” pitch, or the pitch with the worst/fewest comps. Jeff said: “Wouldn’t be surprised if it was Dickey or the Chapman fastball. That math… I’m afraid of that math, but I might make an attempt.” Jeff has looked at unique pitches twice (Carlos Carrasco’s changeup and Odrisamer Despaigne’s changeup, the last two articles linked above), but I wanted to attack the question in a less ad-hoc fashion, looking at all pitches rather than singling some out.
Jeff wasn’t wrong, though – the math is not simple. His methodology doesn’t really work here for a couple of reasons. First, I’m looking for uniqueness rather than similarity. I could just flip Jeff’s method around and look for high comp scores, as he did for the Carrasco/Despaigne changeups. Second, though, I also want to consider all pitch types at once. Again, Jeff sort of did this in the Despaigne article, by comparing his changeup to a few different pitch types, but that is not really feasible for every pitch thrown.
What this means is that a new method is needed to directly calculate dissimilarity. We could find the maximum distances from the mean (basically Jeff’s method), which would work for a single pitch type: if all the pitches are clustered together, with similar velocities and breaks, calculating the distance from the mean to find the weirdest pitch makes sense. But consider this hypothetical set of pitches, graphed on two axes for simplicity:
Obviously, the pitch that corresponds to the red point is the sort of thing we’d like to identify as unique. It’s also exactly at the center of that dataset, and would show up as the least unique pitch if distance from the mean were used to determine uniqueness. Luckily, there’s an algorithm that is designed to find outliers in a more rigorous way.
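To make that failure concrete, here is a minimal sketch. The data is an assumption made up for illustration (a ring of points with one point in the middle), not the dataset from the graph, and the nearest-neighbor distance is used only as a crude stand-in for isolation.

```python
import math

# Hypothetical 2-D data: 12 points evenly spaced on a unit ring,
# plus one "red" point sitting alone in the middle.
ring = [(math.cos(2 * math.pi * i / 12), math.sin(2 * math.pi * i / 12))
        for i in range(12)]
points = ring + [(0.0, 0.0)]  # index 12 is the lone center point

# The mean of this dataset lands (almost exactly) on the center point.
mean = (sum(x for x, _ in points) / len(points),
        sum(y for _, y in points) / len(points))

# Measure 1: distance from the mean. Small values look "ordinary".
dist_from_mean = [math.dist(p, mean) for p in points]

# Measure 2: distance to the nearest other point. Large values look isolated.
nearest_neighbor = [
    min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    for i, p in enumerate(points)
]

least_unique_by_mean = min(range(len(points)), key=dist_from_mean.__getitem__)
most_isolated = max(range(len(points)), key=nearest_neighbor.__getitem__)

print(least_unique_by_mean, most_isolated)
```

Both indices point at the same center point: distance from the mean ranks it as the most ordinary pitch in the set, while its nearest neighbor is farther away than anyone else's.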
This is where the math gets scary. The algorithm is called Local Outlier Factor (LOF) analysis. It identifies outliers by comparing the density of data around each point with the density around that point’s nearest neighbors. In this context, the density around a pitch is a function of how close its best comps are. Each point gets a score: anything near 1 indicates a normal point, and higher values indicate greater isolation. I’m not going to go into detail, but if anyone wants to learn more, feel free to ask in the comments, or just Google it. It’s fairly simple to run it on all pitches, with the relevant variables of velocity, horizontal break, and vertical break.
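For anyone who would rather read code than Google, here is a minimal pure-Python sketch of the standard LOF calculation. It is not necessarily the exact implementation or settings used for this post: the neighborhood size `k=3`, the use of raw (unscaled) units, and the pitch values themselves are all assumptions made up for illustration. Libraries such as scikit-learn also provide a `LocalOutlierFactor` class that does the same job.

```python
import math

def lof_scores(points, k=3):
    """Local Outlier Factor for each point; near 1 is normal, higher is more isolated.

    Assumes more than k points and no large groups of exact duplicates
    (duplicates can make a density infinite).
    """
    k_dist, neighbors = [], []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        kd = ranked[k - 1][0]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        neighbors.append([(d, j) for d, j in ranked if d <= kd])  # keep ties

    # Local reachability density: inverse of the average reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(len(points)):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average neighbor density divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(len(points))
    ]

# Made-up pitches: (velocity in mph, horizontal break in inches, vertical break in inches).
pitches = [
    (92.0, -5.0, 9.0),
    (93.0, -4.0, 10.0),
    (91.5, -6.0, 8.5),
    (92.5, -5.5, 9.5),
    (93.5, -4.5, 8.0),
    (91.0, -5.0, 10.5),
    (94.0, -6.5, 9.0),
    (76.0, 0.0, 1.0),  # slow and nearly neutral, like a knuckleball's averages
]

scores = lof_scores(pitches, k=3)
for pitch, score in zip(pitches, scores):
    print(pitch, round(score, 2))
```

The seven fastball-like pitches come out close to 1, while the slow, neutral pitch scores several times higher. Whether to rescale the three variables before measuring distance is a design choice this sketch leaves open.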
Any pitch thrown more than 100 times in 2014 was included, and righties and lefties were considered separately (since pitches that move the same way obviously are very different based on what side of the rubber they come from). But enough about methodology! Here are the top five most signature pitches, for righties and lefties, along with their LOF scores, followed by some gratuitous gifs.
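The filtering step can be sketched as below. The row layout, the pitcher labels, and every number are hypothetical placeholders, not the data behind this post; the only things taken from the text are the "more than 100 pitches" cutoff and the split by handedness.

```python
from collections import defaultdict

# Hypothetical season rows:
# (pitcher, throws, pitch_type, count, velocity_mph, hbreak_in, vbreak_in)
rows = [
    ("Pitcher A", "R", "FF", 850, 93.1, -4.8, 9.6),
    ("Pitcher A", "R", "CH", 64, 84.0, -7.5, 4.1),
    ("Pitcher B", "L", "SI", 412, 91.7, 8.9, 5.2),
    ("Pitcher C", "L", "CU", 100, 77.3, -3.0, -6.8),
]

MIN_PITCHES = 100

# Group qualifying pitches by throwing hand so each side is scored separately.
by_hand = defaultdict(list)
for pitcher, throws, pitch_type, count, velo, hbreak, vbreak in rows:
    if count > MIN_PITCHES:  # "more than 100 times", so exactly 100 is excluded
        by_hand[throws].append(((pitcher, pitch_type), (velo, hbreak, vbreak)))

for hand, pitches in by_hand.items():
    print(hand, [label for label, _ in pitches])
```

Each group's feature tuples would then be scored on their own, so a lefty's sinker is only ever compared against other lefties' pitches.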
It’s nice when things work exactly like you expect them to. The top pitches on the two lists are incredible, and incredibly unique, and while it’s not a surprise to see them here, it does provide some reassurance that this measure is doing what it’s supposed to. Everyone knows about Dickey’s knuckleball, and if anything, it’s underrated by this measure. Since it moves so randomly, the knuckleball’s season averages end up being slow and pretty much neutral horizontally and vertically. While that’s enough to make the pitch show up as very odd under this measure, the individual pitches don’t often follow that straight trajectory, as seen in the above gif. The same can be said for Steven Wright’s knuckleball in third, but it’s nice that this measure still picks both out as unique pitches.
As for Chapman, there’s not that much to say about his fastball that hasn’t already been said. It feels wrong in some way to call his fastball strange, since it is disturbingly direct in practice, but there was truly no pitch like it in 2014. The velocity is the carrying factor behind the massive outlier score, almost a full 2 MPH greater than the next fastest pitch. Interestingly, Chapman’s pitch was the only one in either top five with notably high velocity.
Looking at the weirdest pitches in baseball, what can we conclude about them as a group? First, the pitchers throwing them are generally not bad. While you’d expect someone to be at least halfway decent to get in the position to throw 100 pitches of a single type, the owners of these pitches averaged about 1 WAR in 2014. With eight of these 10 throwing primarily in relief, and the group totaling only 710.2 innings, that comes out to a very respectable 2.4 WAR per 200 innings.
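As a back-of-the-envelope check on that rate, the sketch below works backward from the rounded figures above. It assumes "710.2" is standard innings notation (710 innings plus two outs), and the implied totals are derived from the rounded 2.4 figure, not taken from the underlying data.

```python
def innings_from_notation(ip_text):
    """Convert baseball innings notation ('710.2' = 710 and 2/3) to a float."""
    whole, _, outs = ip_text.partition(".")
    return int(whole) + int(outs or 0) / 3

def war_per_200(total_war, innings):
    """Scale a WAR total to a 200-inning workload."""
    return total_war / innings * 200

innings = innings_from_notation("710.2")

# Invert the rate to see what group total the quoted 2.4 WAR/200 implies.
implied_total_war = 2.4 * innings / 200
implied_war_each = implied_total_war / 10  # ten pitchers across the two lists

print(round(implied_total_war, 2))  # 8.53
print(round(implied_war_each, 2))   # 0.85
```

Roughly 8.5 WAR across ten pitchers is a bit under 0.9 each, which squares with "about 1 WAR" apiece.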
The pitches themselves varied in usage, from Neshek’s change, thrown 13.4% of the time, to Britton’s sinker, thrown 89.3% of the time. They also varied in effectiveness, as measured by run values, from Neshek’s 3.6/100 to Marshall’s -1.63/100. Overall, the best pitch is probably Chapman’s fastball, followed by Britton’s sinker, given both the results on those pitches and how often they use them, but as a group, these pitches are pretty good. Maybe that isn’t totally surprising, but weird does not necessarily equal effective. Any pitcher could immediately have the weirdest pitch in baseball, if he threw 40 MPH meatballs, but less absurdly, mix and control matter just as much as the movement of the pitch.
Finally, all this stuff tracks fairly well with what Jeff identified previously. Obviously, he called Dickey and Chapman, but he also wrote this article about how Zach Britton’s sinker is pretty much comp-less, and we see that very pitch in fifth for lefthanders. Odrisamer Despaigne’s change was 12th for righthanders. Interestingly, Carrasco’s change is 98th on that same list, indicating this method doesn’t think it’s especially unique. Overall, this was mostly just a fun exercise, but maybe there’s more to this list, so if you want to poke around, it’s in a public Google Doc here. And like I said, if you have any questions about the methodology or anything like that, I’d be glad to answer them in the comments.
Henry is a very-part-time baseball writer whose past work has appeared at Beyond the Box Score and Baseball Prospectus. Find him on Twitter @henrydruschel, and find his other writing at medium.com/@henrydruschel.