Author Archive

Nick Punto was Right: Evaluating the Game’s Dramatic Bullpen Evolution via Machine Learning

When I played for Oakland, the guys who weren’t playing tended to congregate at the far end of the dugout, next to the bat rack. Mind you, that was usually me. It’s kind of a weird place to stand since we were pretty much always in the way, but there weren’t a ton of options.

One of those days, I was down there with Nick Punto. I didn’t spend much time with him, but he was one of the funniest guys I’ve played with. He had just dispatched Billy Burns up the approximately fifteen flights of stairs to the clubhouse to make him a PB&J. While we were waiting for Billy, I was asking Nick about how the game had changed since he’d started playing. He debuted in 2001. It was 2014.

I wasn’t taking notes, but I’d paraphrase what he said as, “Bullpens are way nastier than they used to be.”

Side note: It was probably fate that the first thing he thought of was the bullpen. I still can’t think of bullpens without thinking of the 2014 Royals. For those of us in the dugout, the Wild Card Game that year was heartbreaking. We had a four-run lead in the eighth. I knew I’d come off the playoff roster if we won and went to Anaheim, but I’d get to make the trip, not to mention collect a full playoff share. What I didn’t know was that it would be my last game in the big leagues.

I was trying to fathom what playing fourteen years in the big leagues would be like when Billy got back down. He had just gotten called up for September and (like me) wanted to be on the veterans’ good side. He was walking towards us when Punto gave us a quick wink.

“I said CRUNCHY peanut butter!” he yelled. “Go get another one.” And he took the sandwich and stomped on it.

Bullpens Aren’t Created Equal

As the right-handed hitting half of a first base platoon, I needed to be ready for lefty relievers. I’d get to the field and watch video on all the lefties in the other team’s pen. And in the fifth inning, I’d go inside and start getting loose and hitting flips in case I got to pinch hit. I was always asking for flips. I probably annoyed the hell out of Chili Davis, our hitting coach, especially since there was usually little chance I’d actually pinch hit.

There was a lot of variation in what we could see out of a bullpen.

We’ve talked a lot about how league average has changed. Do I even need to link to a story about how strikeouts and velocity have been rising? You’re reading the community blog at FanGraphs. You already knew.

With all the talk about aggregate changes, I think something that gets lost in the discussion is how some teams just have nastier pens than others. It’s tempting to see league average fastball velocity and forget that it’s just an average.

I’ve been thinking about what Shredder said that day (Nick Punto’s nickname is Shredder). Yes, bullpens as a whole have changed, but can we look at individual ones? Can we assign a “beginning”, “middle” and “end” to this story? Can we categorize the bullpens by where in the story they fall?

Let’s Try

It just so happens that FanGraphs has velocity and plate discipline stats going back to 2002, which is basically when Nick Punto started playing. That’s the data I used for this post. I did the analysis, and made the graphs, in R.

Our first chart represents fastball velocity in four seasons: 2002, 2008, 2013 and 2018.

It’s clear that relievers are throwing harder. What’s interesting is the 2002 curve is much more spread out. There was more team-to-team variation in what you’d see out of the pen. See that little blip all the way at the left? That’s the 2002 Expos, averaging 86.2 mph. Yes, with the fastball.

Fast forward to 2018. The curves are filled in with 80% opacity so we can see what’s behind them. Sure enough, all the way at the right, we’re in pretty uncharted territory. That’s the Yankees and Pirates, both averaging 95 mph.

In case you’re wondering, that purple outlier hovering by itself at 90 mph is the Padres (and their 87 bullpen ERA-).

More Than Velocity

It’s fun to look at how velocity has evolved, but I’d like to try looking at more variables. In fact, I’d like to look at ten variables and try to see how they fit together. We’re only going to be looking at input variables such as velocity and swing percentages. I’m not going to use results variables like ERA or WAR. I gathered the data from FanGraphs and built a correlation chart:

It’s not surprising to see that fastball velocity has a 0.65 correlation with o-swing percentage. As a hitter, it’s pretty simple. The faster the ball comes in, the less time you have to make a good decision. It’s also pretty straightforward to see that o-swing% and zone% (percent of pitches that are in the zone) have a strong negative correlation. If you’re going to swing outside the zone, I’m going to throw it outside the zone.

It also looks like hard-throwing pitchers sacrifice control for velo (Zone% and FBv correlate at -0.62). That or they take advantage of the higher o-swing% afforded by said velo and throw more pitches for chase. That’s so 2018.

I was interested to see that fastball velocity has a -0.29 correlation with fastball percentage. Brandon Moss used to say that pitchers who throw the hardest seem to use their fastballs the least. He may have been on to something.
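For anyone who wants to build a similar correlation chart, here is a minimal sketch of the underlying calculation: the Pearson correlation between every pair of columns. It is written in plain Python rather than the R used for this post, and the five bullpens and their numbers are invented purely for illustration, so the values it prints are not the ones quoted above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Correlate every column with every other column (and with itself)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Invented numbers for five hypothetical bullpens, NOT real FanGraphs data.
bullpens = {
    "FBv": [89.5, 91.0, 92.4, 93.1, 94.8],       # average fastball velocity (mph)
    "O-Swing%": [24.0, 27.5, 28.0, 30.5, 32.0],  # swings at pitches outside the zone
    "Zone%": [52.0, 50.5, 48.0, 46.5, 44.0],     # pitches thrown in the zone
}

matrix = correlation_matrix(bullpens)
for a in bullpens:
    for b in bullpens:
        print(f"{a:>9} vs {b:<9} r={matrix[a][b]:+.2f}")
```

In this toy data, velocity and O-Swing% rise together while Zone% falls, so the signs match the pattern described above even though the magnitudes are made up.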

Fun with Dimensionality Reduction

Now let’s use these variables to make a ten-dimensional graph! In order to do this, we’ll need to start with a principal components analysis. PCA creates new variables, called principal components, that are linear combinations of our original ten. What’s nice is we can now express our data in terms of these new variables. Because each principal component draws from all ten of the original variables, we can actually graph our ten-dimensional data using just two axes: Principal Components 1 and 2.

Before we move on, let’s take a look at our new variables:

In the correlation circle above, the horizontal axis is Principal Component 1 and the vertical axis is Principal Component 2. Each arrow corresponds to one of our original variables from FanGraphs. In order to interpret the arrows, we’ll start by looking at how far they go horizontally. Let’s look at O.Swing%. It points very far to the right, but only a little bit down. That means that Principal Component 1 (horizontal axis) has a strong positive correlation with O.Swing%. In other words, if you have a high score for PC1, it’s associated with having a high O.Swing rate. The fact that it only points a little bit downward means PC2 has only a weak negative correlation with O.Swing%.

We can see that PC1 is going to be associated with arrows that point far to the right (positive) or left (negative). So PC1 looks like it’s going to be associated with high O.Swing rates, high fastball velocity, and high swinging strike rates. It will also be negatively associated with zone rate, fastball percentage and contact rate. In summary, if you score high on PC1, you throw hard, throw a lot of offspeed, get lots of swinging strikes and throw lots of pitches out of the zone. Sounds familiar.

Let’s look at PC2. This one looks like it’s most associated with low contact rates.

One more point to make. Those percentages on the axis labels represent the percentage of the total variance that each PC captures. So by using PC1 and PC2 together, we can see over half the variance of our ten-dimensional data.
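To make the idea of “linear combinations” and “percentage of variance” concrete, here is a rough sketch of PCA in plain Python. A few things are assumptions on my part: the six bullpens and four variables are invented, the columns are standardized first, and the components are found with power iteration plus deflation, which is a simple stand-in for whatever a statistics package does internally. It is not the R code behind the chart above.

```python
from math import sqrt

def standardize(rows):
    """Center every column and scale it to unit variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]

def correlation_of_standardized(z):
    """Covariance of standardized columns, which is their correlation matrix."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]

def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    p = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(p) for j in range(p))
    return value, vec

def pca(rows, n_components=2):
    """Return (scores, components, share of total variance per component)."""
    z = standardize(rows)
    mat = correlation_of_standardized(z)
    p = len(mat)
    total_variance = sum(mat[i][i] for i in range(p))  # equals p after standardizing
    components, explained = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(mat)
        components.append(vec)
        explained.append(value / total_variance)
        # Deflation: remove the component just found so the next pass finds the runner-up.
        mat = [[mat[i][j] - value * vec[i] * vec[j] for j in range(p)] for i in range(p)]
    # Each score is a linear combination of a bullpen's standardized variables.
    scores = [[sum(row[j] * comp[j] for j in range(p)) for comp in components] for row in z]
    return scores, components, explained

# Invented rows: FBv, O-Swing%, Zone%, Contact% for six hypothetical bullpens.
rows = [
    [89.5, 24.0, 52.0, 80.5],
    [91.0, 27.5, 50.5, 78.0],
    [92.4, 28.0, 48.0, 79.5],
    [93.1, 30.5, 46.5, 76.0],
    [94.8, 32.0, 44.0, 77.5],
    [92.0, 29.0, 49.0, 74.0],
]

scores, components, explained = pca(rows)
print("share of variance, PC1 and PC2:", [round(e, 3) for e in explained])
print("PC1 loadings:", [round(c, 2) for c in components[0]])
```

The sign of each component is arbitrary, so a chart built from these scores could come out mirrored left to right without changing the interpretation.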

K-Means and PCA Chart

I said earlier that I hoped our story would have a beginning, a middle and an end. I wanted to see if there were three distinct phases to the evolution of bullpens since the beginning of Punto’s career. To help visualize this, I ran a machine learning algorithm called K-means. It “learns” the data and generates clusters centered at different points. In order to run the algorithm, you have to specify how many clusters you want. I chose three (k=3). Ideally, the three clusters would represent some kind of narrative. (I got the idea for this method here.)
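If you’re curious what K-means is doing under the hood, here is a small sketch of the standard assign-then-recompute loop (Lloyd’s algorithm) in plain Python. The nine points are invented stand-ins for (PC1, PC2) scores, and the farthest-first seeding is my own choice to keep the example deterministic; library implementations typically start from random centers instead.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then keep adding
    the point farthest from every center chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iterations=100):
    """Alternate between assigning points and recomputing centers until stable."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # nothing moved, so the clustering has converged
        centers = new_centers
    return labels, centers

# Invented (PC1, PC2) scores for nine hypothetical bullpens.
points = [
    (-3.0, 0.5), (-2.6, 0.2), (-3.2, -0.1),   # low PC1
    (0.0, -1.5), (0.3, -1.2), (-0.2, -1.8),   # middle PC1, lower PC2
    (3.0, 0.4), (2.7, 0.1), (3.3, 0.6),       # high PC1
]

labels, centers = kmeans(points, k=3)
for point, label in zip(points, labels):
    print(point, "-> cluster", label)
```

The cluster numbers themselves carry no meaning; only which points end up grouped together does.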

Finally, here’s the graph:

There’s a lot going on here. We’re looking at a two-dimensional representation of ten-dimensional data. The dots represent each team bullpen since 2002. The circles contain the bullpens in four different seasons: 2002, 2008, 2014 and 2018. Finally, the colors are our clusters. Sure enough, the clusters give us a pretty decent story. The points are basically moving from left to right.

These axes are our principal components. Like we said earlier, having a high score in PC1 means you throw hard, throw a lot of offspeed, throw lots of pitches for chase, and get lots of swinging strikes. The data is clearly moving to the right as the years go by, which means all of these things are increasing.

What’s cool is that the k-means algorithm settled on three clusters that definitely demonstrate an evolution in bullpens. We can call these “Phase 1,” “Phase 2,” and “Phase 3.” These are arbitrary names, and even the choice of three clusters was arbitrary, but it can help tell a story. Intuitively, a team in Phase 1 pitches like a 2002 bullpen, whatever that means. A team in Phase 3 pitches like a 2018 bullpen.

To simplify, I made another graph with just the four years we’ve been talking about.

The three cluster centers are in red. The 2014 Royals are their own color, as are the 2018 Yankees.

Phase 1 is associated with the lowest values of PC1. In Phase 2, the values of PC1 are higher but the PC2 values are lower. In Phase 3, the PC1 values are the highest, while PC2 is approximately back to its Phase 1 level. Again, these are abstract, but just meant to tell a story.

Every team in 2002 was in Phase 1. By 2008, the game had clearly changed. The circles hardly overlap, and while the 2002 circle contained only Phase 1 bullpens, the 2008 circle has bullpens in all three phases. The stretch from 2002 to 2008 appears to have seen the most drastic changes.

I figured that the 2014 Royals would be some type of temporal outlier. They were one of the only teams that didn’t try to play matchups to get those last nine outs. They didn’t need to. Herrera, Davis, Holland. I’d be hitting flips in the cage next to the visitors’ dugout in Kauffman, but once those guys came in the game, the righty pinch hitters could pretty much sit back down.

It turns out that they are a Phase 2 bullpen right in the middle of the other 2014 teams. They had some guys that threw gas, but in terms of the way they attacked hitters, it was still a 2014 approach.

The 2018 circle is much more spread out. Twenty-three bullpens look like they could be at home in 2014 or even 2008, but there are seven outliers:

Rather than point to outliers in one variable such as velocity, we can look at these seven bullpens and say that using all ten of the original FanGraphs variables, these are some of the most unique bullpens we’ve seen.

In 2018, twenty-five out of the thirty teams are pitching in Phase 3. Again, this has nothing to do with success variables like WAR or ERA. It’s more about their velocity, their mix of pitches, and how they attack the strike zone.

If you’re interested, the five Phase 2 bullpens of 2018:

Cardinals, DBacks, Marlins, Reds and Royals.

And the point is?

It would be interesting to explore PCA and k-means further, maybe even look at starting rotations. PCA is pretty abstract, especially compared to something like ERA- or FIP. I wanted to dive into this to see if we could visualize the way things have changed. The k-means gave us a cool breakdown of the story, which we arbitrarily called Phases 1, 2 and 3. It was a fun way to represent how the game has changed.

Thanks, Shredder.


A player’s take on xwOBA

When I was playing in the Arizona Fall League in 2012, I led the league in line-outs. At least it seemed like it. It was the fall before I was Rule-5 eligible and I was hoping to show the Padres I could hit high level pitching. Unfortunately, a .726 OPS in the desert wasn’t going to have them breaking down my door with a team-friendly extension in hand.

If only there were x-stats! xwOBA is the shiny new eight-figure toy that we hitters can play with after an 0-for-15 slump. “But I was hitting the ball hard. See, look!” Back in the pre-Statcast dark ages, a lineout might have had some anecdotal benefit buried in the bottom of a report. Now we have the data.

There’s been a lot written about xwOBA this week. Craig Edwards, Tom Tango and Jonathan Judge have all weighed in. I was especially interested in the ways they addressed its predictive capabilities.

Judge’s study compared season xwOBA for pitchers with the following season. Tango explored the correlations of small sample sizes of xwOBA to a larger sample.

I looked at this through the lens of a player. When a guy is getting lots of hits but they are bloopers and seeing-eye grounders (remember when ground balls went through the infield?) it’s a soft hot streak. Likewise, a guy might be hitting the loudest .220 in the history of the PCL.

If you’re hitting the ball hard, they’ll start falling. Right? I wanted to test this theory by measuring xwOBA’s predictive capability month-to-month.

Methodology

(All data from BaseballSavant)

I started by getting data for each month of the regular season in the Statcast Era (2015-) for players with 50 PA in that month. I then did a series of inner joins in R to get what I’ll call “double-months.” A double-month is when a player has 50 PA in two consecutive months. So Aaron Judge in April-May 2017 is one player-double-month.

The column labels in the Double Month data frame were: “wOBA,” “xwOBA,” and “Next month wOBA.” I ended up with 3,173 data points. Running these correlations gives us an idea of how your month might predict your next month.
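For readers who want to reproduce the double-month idea, here is a minimal sketch of that join in plain Python rather than R. The player names, months and stat lines are all invented, and representing a month as an integer within a season is my own simplification, not necessarily how the BaseballSavant export is laid out.

```python
MIN_PA = 50  # plate-appearance threshold for a month to qualify

# (player, year, month) -> stats. Every number here is invented for illustration.
monthly = {
    ("Player A", 2017, 4): {"pa": 95, "woba": 0.410, "xwoba": 0.395},
    ("Player A", 2017, 5): {"pa": 110, "woba": 0.380, "xwoba": 0.402},
    ("Player A", 2017, 6): {"pa": 40, "woba": 0.300, "xwoba": 0.350},  # under 50 PA
    ("Player B", 2017, 4): {"pa": 60, "woba": 0.290, "xwoba": 0.330},
    ("Player B", 2017, 6): {"pa": 70, "woba": 0.340, "xwoba": 0.325},  # no May row
    ("Player C", 2016, 8): {"pa": 88, "woba": 0.355, "xwoba": 0.360},
    ("Player C", 2016, 9): {"pa": 75, "woba": 0.365, "xwoba": 0.348},
}

def double_months(monthly, min_pa=MIN_PA):
    """Inner-join each qualifying month to the same player's next qualifying month."""
    qualified = {key: stats for key, stats in monthly.items() if stats["pa"] >= min_pa}
    rows = []
    for (player, year, month), stats in sorted(qualified.items()):
        following = qualified.get((player, year, month + 1))
        if following is None:
            continue  # inner join: drop months with no qualifying successor
        rows.append({
            "player": player,
            "year": year,
            "month": month,
            "woba": stats["woba"],
            "xwoba": stats["xwoba"],
            "next_woba": following["woba"],
        })
    return rows

rows = double_months(monthly)
for row in rows:
    print(row)
```

Only two double-months survive here: Player A’s April-May and Player C’s August-September. Player A’s May-June pair drops out because June falls short of 50 PA, and Player B has no consecutive months at all.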

I also wanted to see whether you’d be better off using your entire previous season to predict the next month. For this I got full-season data (min 200 PA) for 2015 and 2016 and did another series of inner joins to get a data frame representing the previous full-season metrics and the current month metric. These columns would look like this:

“Previous season wOBA,” “Previous season xwOBA,” “Current season month wOBA.”

I got 2,311 of these data points.

For good measure, I also created a data frame for double-seasons. If you had 200 PA in two consecutive seasons, congratulations: you just got a double-season. There ended up being 532 of them.

Finally, I ran all the correlations.

Results

Double-Months

wOBA to Next Month wOBA: r=0.203

xwOBA to Next Month wOBA: r=0.274

Previous Season to Current Month

wOBA to wOBA: r=0.238

xwOBA to wOBA: r=0.25

Double-Seasons

wOBA to wOBA: r=0.403

xwOBA to wOBA: r=0.451

The differences are small, but they are consistent. xwOBA appears to be a better short-term predictor than wOBA. What interested me the most was that while wOBA predicts your next month better when it comes from a larger sample (the full previous season rather than the previous month), the opposite is true for xwOBA. If you want to use xwOBA, you’re (slightly) better off using the most recent data.

Let’s talk about this in baseball terms. Baseball is so complex that a couple of broken-bat bloopers here and there can give you a really good month. Maybe you’re getting shifted but the pitcher doesn’t execute his spot and misses away and you shoot the wide open side of the infield a couple times. Maybe you made the mistake of hitting the ball hard in the middle of the field against the Cubs. Stats like wOBA practically scream regression to the mean.

But there’s no hiding from Statcast. If you’re hitting the ball hard it probably means you’re seeing the ball well and are consistently on time. Plate appearances aren’t independent events; we feel things in the cage one day that might get us locked in for a week. Or the other way around.