One of the founding notions of sabermetrics has been its emphasis on the walk. Before sabermetrics, in the dark ages, people hardly paid attention to the walk. Teams paid players based on their batting average, HR, and RBIs, and no one put much stock in the “scrappy” player who drew walks and got on base. Sabermetrics essentially started around the mid-1900s, and one of its founding principles was that the walk was badly undervalued. Today the walk is regarded as an extremely valuable tool, and organizations will often pay heavily for a player with a good walk rate. But what if the value of the walk is dropping? What if a walk in today’s game is not nearly as valuable as it used to be? Baseball, you see, is a living organism and is prone to change; just because something was valuable in the past doesn’t mean it’s valuable in the present. We constantly need to reassess the value of certain strategies and skills in order to stay ahead of the game.
This all started when I looked at the correlation between pitches per plate appearance (Pit/PA) and runs scored per game (R/G) for 2014 and found no real correlation (I covered this in a previous article). I therefore decided to expand the data pool and look at a twenty-year span, to see whether 2014 was an anomaly, part of a consistent trend, or a sign that Pit/PA never really had any correlation with R/G.
So I calculated the correlation coefficient of Pit/PA and R/G for each individual year, going all the way back to 1994. If you don’t know what a correlation coefficient is, or what counts as a strong or weak one, I explain it in my previous article. Anyway, the data I found had a high level of variance. I did, however, label two points: the largest correlation coefficient of the last twenty years and the smallest. Why? Because although the data vary a lot from year to year, and it wouldn’t be unreasonable to believe that Pit/PA could have a much higher correlation with R/G in 2015, the data still display a downward trend.
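For readers who want to reproduce year-by-year numbers like these, here is a minimal Python sketch of the calculation. It assumes each season’s data is a list of (Pit/PA, R/G) pairs, one per team; that layout and the function names `pearson_r`, `yearly_correlations`, and `extremes` are hypothetical, not the exact process behind the chart, and the real values would have to be pulled from Baseball-Reference. The coefficient itself is the standard Pearson correlation.

```python
import math
from typing import Dict, List, Tuple


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def yearly_correlations(seasons: Dict[int, List[Tuple[float, float]]]) -> Dict[int, float]:
    """Map each year to r(Pit/PA, R/G) across that year's teams.

    `seasons` is a hypothetical layout: {year: [(pit_pa, r_g), ...]}, one tuple per team.
    """
    result = {}
    for year, rows in sorted(seasons.items()):
        xs = [pit_pa for pit_pa, _ in rows]
        ys = [r_g for _, r_g in rows]
        result[year] = pearson_r(xs, ys)
    return result


def extremes(correlations: Dict[int, float]) -> Tuple[int, int]:
    """Return (year with the largest r, year with the smallest r)."""
    high = max(correlations, key=correlations.get)
    low = min(correlations, key=correlations.get)
    return high, low


# Made-up toy seasons with three teams each (not real data)
toy = {
    2000: [(3.7, 4.1), (3.8, 4.5), (3.9, 4.9)],  # perfectly positive
    2001: [(3.7, 4.9), (3.8, 4.5), (3.9, 4.1)],  # perfectly negative
}
print(extremes(yearly_correlations(toy)))
```

Writing the formula out by hand keeps the sketch free of dependencies; on Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient.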
1994 had the highest correlation, while 2014 had the lowest. At this point you’ve probably noticed both the variation and the downward trend. Essentially, what this tells us is that Pit/PA’s correlation with R/G is unpredictable. If your team sees a lot of pitches, for example, that doesn’t mean it will have a good offense. In fact, if someone says a team sees a lot of pitches and that it’s a good thing, he’s probably just blurting crap out. This is not to say that he is wrong; it is rather to say that seeing pitches doesn’t have a consistent correlation with runs scored. It is therefore difficult, if not impractical, to draw any conclusion from this data set.
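One way to put a single number on a “downward trend” is to fit a least-squares line through the yearly coefficients and read off its slope. The sketch below is a hypothetical illustration of that idea, not the method behind the chart; it takes a mapping from year to correlation coefficient, and the example values are made up. A negative slope means the correlation tends to shrink over time, although with year-to-year swings as large as these, the slope says little about any single season.

```python
from typing import Dict


def trend_slope(by_year: Dict[int, float]) -> float:
    """Ordinary least-squares slope of a yearly value (e.g. r) regressed on year."""
    years = list(by_year)
    values = [by_year[y] for y in years]
    n = len(years)
    if n < 2:
        raise ValueError("need at least two years")
    mean_year = sum(years) / n
    mean_val = sum(values) / n
    num = sum((y - mean_year) * (v - mean_val) for y, v in zip(years, values))
    den = sum((y - mean_year) ** 2 for y in years)
    return num / den  # change in the value per year


# Made-up coefficients for illustration only (not the article's data)
example = {2000: 0.5, 2001: 0.4, 2002: 0.3}
print(round(trend_slope(example), 3))  # -0.1
```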
Next comes an examination of similar, and stronger, trends in the data. Oh, and I almost forgot: you’re probably also wondering, what about the base on balls? What was the point of that introduction? Well, after I looked at the correlation between Pit/PA and R/G, I took a look at the correlation between BB% and R/G for 2014.
The 2014 chart shows no distinct relationship between BB% and R/G. To get an exact number, I calculated the correlation coefficient and got R = 0.0908, which means there was essentially no correlation between BB% and R/G in 2014.
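For reference, here is how the two rates can be computed from team season totals, plus one extra way to read R = 0.0908: squaring r gives the share of the variance in R/G that a straight-line fit on BB% would explain. BB% is taken here as walks divided by plate appearances, the usual definition; the `TeamSeason` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    # Hypothetical record of one team's season totals; field names are illustrative
    team: str
    plate_appearances: int
    walks: int
    runs: int
    games: int


def bb_pct(t: TeamSeason) -> float:
    """Walk rate: walks divided by plate appearances."""
    return t.walks / t.plate_appearances


def runs_per_game(t: TeamSeason) -> float:
    """Runs scored per game played."""
    return t.runs / t.games


def r_squared(r: float) -> float:
    """Share of variance explained by a simple linear fit with correlation r."""
    return r * r


# The 2014 BB% vs. R/G coefficient reported above
print(round(r_squared(0.0908), 4))  # 0.0082
```

In other words, a linear fit on BB% would account for less than 1% of the team-to-team variation in R/G that year.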
I therefore ran the numbers again for 20 years, to see whether this was just an abnormality in the data and to get a sense of whether there was a specific trend.
For this chart I decided to display all the data sets, to give you an idea of what the correlations looked like. The two I really want you to focus on, however, are the 2012 (R = 0.083) and 2014 (R = 0.0908) correlations. Both years show a sharp drop-off in the correlation between BB% and R/G. In every other year there was a positive correlation between the two statistics, at times a strong one. In 2012 and 2014, however, there was essentially no correlation between BB% and R/G.
So what does this mean? Why the sudden drop in correlation, and will it continue? I also found it odd that in 2013 the correlation went all the way back up to R = 0.4749, which is not the strongest correlation but still a good one.
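One way to judge whether a swing such as 2013’s R = 0.4749 versus 2014’s R = 0.0908 is larger than ordinary sampling noise is to compare the two coefficients with the Fisher z-transformation. This is a standard test for two independent correlations, swapped in here as an illustration rather than something done for this article. It assumes n = 30 teams per season and treats each team-season as an independent observation; both are simplifying assumptions.

```python
import math
from typing import Tuple


def fisher_z_compare(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Compare two independent Pearson correlations.

    Returns (z statistic, two-sided p-value). Requires n1, n2 > 3 and |r| < 1.
    """
    z1 = math.atanh(r1)  # Fisher transform: 0.5 * ln((1 + r) / (1 - r))
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail
    return z, p


# 2013 vs. 2014 BB% / R/G coefficients from the text, assuming 30 teams each season
z, p = fisher_z_compare(0.4749, 30, 0.0908, 30)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With only 30 data points per season, individual coefficients carry wide uncertainty, which is worth keeping in mind when reading any single year’s value.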
Before we try to answer those two questions, however, let’s look at another set of correlation data: the correlation between BB% and OBP. Why? My hypothesis was that if the correlation between BB% and OBP is getting smaller, then naturally the correlation between BB% and R/G would get smaller as well.
As you might be able to tell, the correlation between BB% and OBP shows results similar to, though less drastic than, the correlation between BB% and R/G. Again, the part of the graph to focus on is the two outliers: 2012 (R = 0.2317) and 2014 (R = 0.3570). This gives us at least a partial explanation for the two outliers in the previous graph.
Essentially, what one needs to understand from this is that since BB% is becoming less correlated with OBP, it is likely to have a weaker correlation with R/G as well, because the primary value of a walk (though obviously not its only value) is its effect on OBP. Over the 20 years of data there has generally been a strong correlation between BB% and OBP, apart from 2012 and 2014, where the correlation is weaker, although still positive.
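To see why walks feed OBP so directly, recall the standard on-base percentage formula, OBP = (H + BB + HBP) / (AB + BB + HBP + SF). A walk adds one to both the numerator and the denominator, so it raises OBP for any hitter whose OBP is below 1.000. The small Python sketch below simply encodes that formula; the batting line in the example is made up.

```python
def obp(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = h + bb + hbp
    denominator = ab + bb + hbp + sf
    return times_on_base / denominator


# Hypothetical batting line, then the same line with one extra walk
base = obp(h=150, bb=50, hbp=5, ab=550, sf=5)
plus_walk = obp(h=150, bb=51, hbp=5, ab=550, sf=5)
print(f"{base:.3f} -> {plus_walk:.3f}")  # 0.336 -> 0.337
```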
So now we need to understand this: if the walk has only a weak correlation with OBP, its value will be significantly affected. The problem is figuring out why there was a sudden drop in its correlation with OBP in 2012 and 2014. My first hypothesis was that it had something to do with the overall BB% of the league.
In hindsight, this was probably a simplistic hypothesis, and at this point you’ve probably figured out that it was not the answer. Yes, the overall BB% is trending down, just like the previous charts, but unlike them it doesn’t have outliers in 2012 and 2014. (I included this chart to rule out an easy but tempting explanation.)
There are in fact several possible reasons for the drop in correlation between BB% and OBP. Perhaps it’s the shift, perhaps it’s the low run environment, perhaps it’s the sharp rise in strikeouts. Another interesting element to look at is how hitters are doing later in the count. Given the rise in strikeouts, it’s probably not unreasonable to assume that hitters are performing worse than ever with two strikes, although this of course is just a hypothesis. The answer to that question is for another study, on another day. What is certain, however, is that the upcoming season will be a fascinating data point. Will the correlations keep getting smaller, or are these two data points truly just abnormalities? In any case, I think it’s important to keep this in mind: baseball is an ever-changing game, and just because something has value one year doesn’t mean it has value the next. Teams need to keep adjusting and mixing their strategies in order to stay ahead in this wacky game.
Finally, something to note: these data sets are not meant to arrive at any conclusion, and I have not drawn any conclusions about baseball from them. Instead, they raise questions for further, more detailed and elaborate studies. For example, it would be interesting to look at Pit/PA from a pitcher’s point of view, although I’m not sure that would give us different results. These data sets are also general; they give us a broad idea of the situation. Perhaps there are specific teams or players that thrive on seeing a lot of pitches, or that do translate a high number of walks into runs. Also, and this might be the most important point, correlation does not imply causation. For example, pop flies may have a positive correlation with Pit/PA, but that doesn’t mean pop flies cause a higher Pit/PA. What correlations can do, however, is point us in the right direction toward finding the cause. They are a way of advancing toward more elaborate and specific research.
So I’ll close with a question: now that you’ve digested all this data, is it time to re-evaluate the value of a walk?
All data courtesy of Baseball-Reference.