Nick Punto was Right: Evaluating the Game’s Dramatic Bullpen Evolution via Machine Learning

When I played for Oakland, the guys who weren’t playing tended to congregate at the far end of the dugout, next to the bat rack. Mind you, that was usually me. It’s kind of a weird place to stand since we were pretty much always in the way, but there weren’t a ton of options.

One of those days, I was down there with Nick Punto. I didn’t spend much time with him, but he was one of the funniest guys I’ve played with. He had just dispatched Billy Burns up the approximately fifteen flights of stairs to the clubhouse to make him a PB&J. While we were waiting for Billy, I was asking Nick about how the game had changed since he’d started playing. He debuted in 2001. It was 2014.

I wasn’t taking notes, but I’d paraphrase what he said as, “Bullpens are way nastier than they used to be.”

Side note: It was probably fate that the first thing he thought of was the bullpen. I still can’t think of bullpens without thinking of the 2014 Royals. For those of us in the dugout, the Wild Card Game that year was heartbreaking. We had a four-run lead in the eighth. I knew I’d come off the playoff roster if we won and went to Anaheim, but I’d get to make the trip, not to mention collect a full playoff share. What I didn’t know was that it would be my last game in the big leagues.

I was trying to fathom what playing fourteen years in the big leagues would be like when Billy got back down. He had just gotten called up for September and (like me) wanted to be on the veterans’ good side. He was walking towards us when Punto gave us a quick wink.

“I said CRUNCHY peanut butter!” he yelled. “Go get another one.” And he took the sandwich and stomped on it.

Bullpens Aren’t Created Equal

As the right-handed hitting half of a first base platoon, I needed to be ready for lefty relievers. I’d get to the field and watch video on all the lefties in the other team’s pen. And in the fifth inning, I’d go inside and start getting loose and hitting flips in case I got to pinch hit. I was always asking for flips. I probably annoyed the hell out of Chili Davis, our hitting coach, especially since there was usually little chance I’d actually pinch hit.

There was a lot of variation in what we could see out of a bullpen.

We’ve talked a lot about how league average has changed. Do I even need to link to a story about how strikeouts and velocity have been rising? You’re reading the community blog at FanGraphs. You already knew.

With all the talk about aggregate changes, I think something that gets lost in the discussion is how some teams just have nastier pens than others. It’s tempting to see league average fastball velocity and forget that it’s just an average.

I’ve been thinking about what Shredder said that day (Nick Punto’s nickname is Shredder). Yes, bullpens as a whole have changed, but can we look at individual ones? Can we assign a “beginning”, “middle” and “end” to this story? Can we categorize the bullpens by where in the story they fall?

Let’s Try

It just so happens that FanGraphs has velocity and plate discipline stats going back to 2002, which is basically when Nick Punto started playing. That’s the data I used for this post. I did the analysis, and made the graphs, in R.

Our first chart represents fastball velocity in four seasons: 2002, 2008, 2013 and 2018.

It’s clear that relievers are throwing harder. What’s interesting is the 2002 curve is much more spread out. There was more team-to-team variation in what you’d see out of the pen. See that little blip all the way at the left? That’s the 2002 Expos, averaging 86.2 mph. Yes, with the fastball.

Fast forward to 2018. The curves are filled in with 80% opacity so we can see what’s behind them. Sure enough, all the way at the right, we’re in pretty uncharted territory. That’s the Yankees and Pirates, both averaging 95 mph.

In case you’re wondering, that purple outlier hovering by itself at 90 mph is the Padres (and their 87 bullpen ERA-).

More Than Velocity

It’s fun to look at how velocity has evolved, but I’d like to try looking at more variables. In fact, I’d like to look at ten variables and try to see how they fit together. We’re only going to be looking at input variables such as velocity and swing percentages. I’m not going to use results variables like ERA or WAR. I gathered the data from FanGraphs and built a correlation chart:

It’s not surprising to see that fastball velocity has a 0.65 correlation with O-Swing% (the share of pitches outside the zone that hitters swing at). As a hitter, it’s pretty simple. The faster the ball comes in, the less time you have to make a good decision. It’s also pretty straightforward to see that O-Swing% and Zone% (percent of pitches that are in the zone) have a strong negative correlation. If you’re going to swing outside the zone, I’m going to throw it outside the zone.

It also looks like hard-throwing pitchers sacrifice control for velo (Zone and FBv correlate at -0.62). That or they take advantage of the higher O-Swing% afforded by said velo and throw more pitches for chase. That’s so 2018.

I was interested to see that fastball velocity has a -0.29 correlation with fastball percentage. Brandon Moss used to say that pitchers who throw the hardest seem to use their fastballs the least. He may have been on to something.
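For anyone who wants to check a number like that by hand, here is a minimal sketch of the Pearson correlation that a correlation chart is built from. It is written in Python rather than the R used for this post, and the two short lists are made-up illustrative values, not FanGraphs data. The names `pearson`, `fb_velo` and `fb_pct` are placeholders chosen for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    # A constant column has zero variance, so the division below would fail.
    return cov / sqrt(var_x * var_y)


# Made-up illustrative numbers, NOT FanGraphs data:
# average fastball velocity (mph) and fastball usage (%) for six bullpens.
fb_velo = [90.1, 91.4, 92.0, 92.8, 93.5, 94.2]
fb_pct = [66.0, 63.5, 64.0, 60.5, 61.0, 58.0]

r = pearson(fb_velo, fb_pct)
print(round(r, 2))  # negative here: in this toy data harder throwers use the fastball less
```

A full correlation chart is just this calculation repeated for every pair of the ten variables.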

Fun with Dimensionality Reduction

Now let’s use these variables to make a ten-dimensional graph! In order to do this, we’ll need to start with a principal components analysis. PCA creates new variables, called principal components, that are linear combinations of our original ten. What’s nice is we can now express our data in terms of these new variables. Because each principal component draws from all ten of the original variables, we can actually graph our ten-dimensional data using just two axes: Principal Components 1 and 2.
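To make the idea concrete, here is a rough sketch of PCA in plain Python. It is not the R code behind this post. It assumes the variables are standardized first (a common choice when mixing mph with percentages, but an assumption here), it finds the components by power iteration instead of a full eigendecomposition, and it uses three made-up variables for six made-up bullpens rather than the ten FanGraphs variables. All function names are placeholders.

```python
import random
from math import sqrt


def standardize(rows):
    """Center each column at 0 and scale it to unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)] for i in range(p)]


def top_component(matrix, iters=500, seed=0):
    """Leading eigenvector and eigenvalue of a symmetric matrix by power iteration."""
    p = len(matrix)
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(p)]  # random start avoids symmetric traps
    for _ in range(iters):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the unit vector gives the matching eigenvalue.
    eigval = sum(vec[i] * matrix[i][j] * vec[j] for i in range(p) for j in range(p))
    return vec, eigval


def pca(rows, n_components=2):
    """Return (components, variance shares, scores) for the first few components."""
    z = standardize(rows)
    cov = covariance(z)
    p = len(cov)
    total_var = sum(cov[i][i] for i in range(p))
    components, shares = [], []
    for _ in range(n_components):
        vec, eigval = top_component(cov)
        components.append(vec)
        shares.append(eigval / total_var)
        # Deflate: remove the variance already explained by this component.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(p)] for i in range(p)]
    # Each score is a linear combination of one bullpen's standardized variables.
    scores = [[sum(w * v for w, v in zip(comp, row)) for comp in components] for row in z]
    return components, shares, scores


# Made-up illustrative rows, NOT FanGraphs data: (fastball mph, O-Swing%, Zone%)
data = [
    [90.1, 27.0, 52.0],
    [91.4, 28.5, 50.5],
    [92.0, 29.0, 50.0],
    [92.8, 30.5, 48.0],
    [93.5, 31.0, 47.5],
    [94.2, 32.5, 45.0],
]

components, shares, scores = pca(data)
for row in scores:
    print([round(v, 2) for v in row])  # one (PC1, PC2) point per bullpen
```

The sign of each component is arbitrary, so a plot made this way can come out mirrored relative to one made with another tool. In practice a library routine does this work in one call.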

Before we move on, let’s take a look at our new variables:

In the correlation circle above, the horizontal axis is Principal Component 1 and the vertical axis is Principal Component 2. Each arrow corresponds to one of our original variables from FanGraphs. In order to interpret the arrows, we’ll start by looking at how far they go horizontally. Let’s look at O.Swing%. It points very far to the right, but only a little bit down. That means that Principal Component 1 (horizontal axis) has a strong positive correlation with O.Swing. In other words, if you have a high score for PC1, it’s associated with having a high O.Swing rate. The fact that it only points a little bit downward means PC2 has only a weak negative correlation with it.

We can see that PC1 is going to be associated with arrows that point far to the right (positive) or left (negative). So PC1 looks like it’s going to be associated with high O.Swing rates, high fastball velocity, and high swinging strike rates. It will also be negatively associated with high zone rate, high fastball percentage and high contact rate. In summary, if you score high on PC1, you throw hard, throw a lot of offspeed, get lots of swinging strikes and throw lots of pitches out of the zone. Sounds familiar.

Let’s look at PC2. This one looks like it’s most associated with low contact rates.

One more point to make. Those percentages on the axis labels represent the percentage of the total variance that each PC captures. So by using PC1 and PC2 together, we can see over half the variance of our ten-dimensional data.
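Those axis percentages come from a simple ratio: each component’s variance divided by the total. A tiny sketch of the arithmetic follows. The eigenvalues are hypothetical, chosen only so that the first two add up to more than half, and are not the values from this analysis.

```python
def variance_shares(eigenvalues):
    """Fraction of total variance captured by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Hypothetical eigenvalues for ten standardized variables (they sum to 10).
# These are NOT the values from the analysis in this post.
eigs = [3.6, 1.9, 1.2, 0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]

shares = variance_shares(eigs)
first_two = shares[0] + shares[1]
print(f"PC1: {shares[0]:.0%}, PC2: {shares[1]:.0%}, together: {first_two:.0%}")
```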

K-Means and PCA Chart

I said earlier that I hoped our story would have a beginning, a middle and an end. I wanted to see if there were three distinct phases to the evolution of bullpens since the beginning of Punto’s career. To help visualize this, I ran a machine learning algorithm called K-means. It “learns” the data and generates clusters centered at different points. In order to run the algorithm, you have to specify how many clusters you want. I picked three (k=3). Ideally, the three clusters would represent some kind of narrative. (I got the idea for this method here.)
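For the curious, here is a bare-bones sketch of what k-means does under the hood, written in Python as Lloyd’s algorithm. It is an illustration, not the R routine used for this post. The starting centers are picked with a simple farthest-point rule so the example is repeatable (that rule is my assumption, and real implementations usually use random starts), and the points are made-up (PC1, PC2) scores.

```python
from math import dist


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k=3, iters=100):
    """Lloyd's algorithm: alternate assigning points and moving centers until stable."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # nothing moved, so the clusters are stable
        centers = new_centers
    return centers, labels


# Made-up (PC1, PC2) scores in three loose clumps, NOT the real bullpen data.
points = [
    (-3.0, 0.2), (-2.6, -0.1), (-2.8, 0.4),
    (0.1, -1.0), (0.4, -1.3), (-0.2, -0.8),
    (2.9, 0.3), (3.2, 0.0), (2.7, 0.5),
]

centers, labels = kmeans(points, k=3)
print(labels)   # cluster index for each point
print(centers)  # one center per cluster
```

The cluster numbers themselves carry no meaning. Any ordering like “Phase 1, 2, 3” has to be assigned afterward by looking at where the centers sit.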

Finally, here’s the graph:

There’s a lot going on here. We’re looking at a two-dimensional representation of ten-dimensional data. The dots represent each team bullpen since 2002. The circles contain the bullpens in four different seasons: 2002, 2008, 2014 and 2018. Finally, the colors are our clusters. Sure enough, the clusters give us a pretty decent story. The points are basically moving from left to right.

These axes are our principal components. Like we said earlier, having a high score in PC1 means you throw hard, throw a lot of offspeed, throw lots of pitches for chase, and get lots of swinging strikes. The data is clearly moving to the right as the years go by, which means all of these things are increasing.

What’s cool is that the k-means algorithm settled on three clusters that definitely demonstrate an evolution in bullpens. We can call these “Phase 1,” “Phase 2,” and “Phase 3.” The names are arbitrary, and so was picking three clusters, but they can help tell a story. Intuitively, a team in Phase 1 pitches like a 2002 bullpen, whatever that means. A team in Phase 3 pitches like a 2018 bullpen.

To simplify, I made another graph with just the four years we’ve been talking about.

The three cluster centers are in red. The 2014 Royals are their own color, as are the 2018 Yankees.

Phase 1 is associated with the lowest values of PC1. In Phase 2, the values of PC1 are higher but the PC2 values are lower. In Phase 3, the PC1 values are the highest, while PC2 is approximately the same as in Phase 1. Again, these are abstract, but just meant to tell a story.

Every team in 2002 was in Phase 1. By 2008, the game had clearly changed. The circles hardly overlap and while the 2002 circle contained all bullpens in Phase 1, the 2008 circle has bullpens in all three phases. 2002 to 2008 appears to have the most drastic changes.

I figured that the 2014 Royals would be some type of temporal outlier. They were one of the only teams that didn’t try to play matchups to get those last nine outs. They didn’t need to. Herrera, Davis, Holland. I’d be hitting flips in the cage next to the visitors’ dugout in Kauffman, but once those guys came in the game the righty pinch hitters could pretty much sit back down.

It turns out that they are a Phase 2 bullpen right in the middle of the other 2014 teams. They had some guys that threw gas, but in terms of the way they attacked hitters, it was still a 2014 approach.

The 2018 circle is much more spread out. Twenty-three bullpens look like they could be at home in 2014 or even 2008, but there are seven outliers:

Rather than point to outliers in one variable such as velocity, we can look at these seven bullpens and say that using all ten of the original FanGraphs variables, these are some of the most unique bullpens we’ve seen.

In 2018, twenty-five out of the thirty teams are pitching in Phase 3. Again, this has nothing to do with success variables like WAR or ERA. It’s more about their velocity, their mix of pitches, and how they attack the strike zone.

If you’re interested, the five Phase 2 bullpens of 2018:

Cardinals, DBacks, Marlins, Reds and Royals.

And the point is?

It would be interesting to explore PCA and k-means further, maybe even look at starting rotations. PCA is pretty abstract, especially compared to something like ERA- or FIP. I wanted to dive into this to see if we could visualize the way things have changed. The k-means gave us a cool breakdown of the story, which we arbitrarily called Phases 1, 2 and 3. It was a fun way to represent how the game has changed.

Thanks, Shredder.

2013-2014 Oakland A’s


Wait, when did the conversation with Punto happen if he was reminiscing from a big league dugout about his last big league game the prior fall?

Travis L

This is pretty interesting! But what separates the usefulness of this work (standard ML analysis) from the linear stuff that dominates sabermetrics is the interpretation. Unless we can find some interpretability that makes sense and supports better decision making, I’m afraid this type of research will always end with an unsatisfying conclusion. Please don’t take this as a criticism of your excellent work. We need more of this kind of thing! But I’d like to challenge all the baseball ML practitioners to go beyond a K-Means tutorial and actually provide decisioning support or unique insight into the game.


FG: hire this man….