I was watching the Twins game a few weeks ago when veteran Jamey Carroll effortlessly took an outside pitch to right field, as one might hope he would. The announcers were quick to praise his ability to “go with the pitch”. I’ve seen this play out time after time, often followed by praise for “going with the pitch” and “not trying to do too much”. That got me thinking: do some hitters go with the pitch better than others? Is this a desirable skill, or does it leave the hitter vulnerable? Can a defense exploit this trait with a defensive shift, much like the shifts we see against straight pull hitters?
To dive into this, I captured the angle of each hit ball since 2010 and compared it against the angle at which I expected the pitch to be hit. For example, an inside pitch to a right-handed batter could be expected to be hit near the left field line, while an outside pitch could be expected to be hit near the right field line. Everything in between would be spread evenly across the field, relative to the pitch’s location across the plate.
To make it a little more accurate for right-handed versus left-handed hitters, I analyzed the actual pitch placement for pitches that become hit balls. As you can see below, all hitters prefer the ball just a touch on the outside part of the plate. I took two standard deviations of the hit pitches’ locations and treated that as the range we’ll map to the field, with separate values for right- and left-handed hitters. We’ll call this our hit zone.
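The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the post: it assumes the hit zone is the mean horizontal location plus or minus two standard deviations, that locations are measured from the catcher's view (negative toward the third-base side), that 0 degrees is the left field line and 90 degrees is the right field line, and that pitches outside the hit zone are clamped to the nearest foul line. The function names are hypothetical.

```python
from statistics import mean, pstdev


def hit_zone(pitch_xs):
    """Return (low, high) bounds of the hit zone.

    Assumption: the zone is the mean horizontal location of hit pitches
    plus or minus two standard deviations. Compute it separately for
    right-handed and left-handed hitters.
    """
    center = mean(pitch_xs)
    spread = pstdev(pitch_xs)
    return center - 2 * spread, center + 2 * spread


def expected_angle(pitch_x, zone):
    """Map a horizontal pitch location linearly onto the 0-90 degree field.

    Assumption: 0 is the left field line, 90 is the right field line, and
    pitches outside the zone are clamped to the nearest line.
    """
    low, high = zone
    fraction = (pitch_x - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))
    return 90.0 * fraction
```

With these conventions a pitch at the center of the hit zone maps to 45 degrees, straight through center field, for hitters on either side of the plate. Only the zone bounds change with handedness.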
The players that made it to the top of the data below are the ones that tend to go with the pitch. That is, they take the outside pitch to the opposite field, they pull an inside pitch, and they take a pitch down the middle of the plate straight through center field. They are less random and more predictable.
With that, here are the most predictable hitters of 2013 through August 10th.
Table columns: Average Absolute Angle Difference, Mean Angle Difference (Pull Tendency)
For comparison’s sake, here are the 10 least predictable hitters.
Let’s explain this data before we go any further.
First off, the field spans 90 degrees from foul line to foul line, so all of the values are in degrees.
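As a rough sketch of how the two columns could be computed for one hitter, the Python below takes pairs of actual and expected angles in degrees. The function name and the input layout are assumptions made for illustration. Which sign of the mean difference means "pull" depends on the hitter's handedness and the angle convention, and the post does not specify it.

```python
def angle_metrics(batted_balls):
    """Summarize how closely a hitter's batted balls follow the expected angle.

    batted_balls: iterable of (actual_angle, expected_angle) pairs in degrees.
    Returns (average absolute difference, mean signed difference).
    """
    diffs = [actual - expected for actual, expected in batted_balls]
    if not diffs:
        raise ValueError("need at least one batted ball")
    # Small values mean the hitter reliably goes with the pitch.
    avg_abs_diff = sum(abs(d) for d in diffs) / len(diffs)
    # A consistent sign suggests a lean toward one side of the field.
    mean_diff = sum(diffs) / len(diffs)
    return avg_abs_diff, mean_diff
```

A hitter can have a mean difference near zero and still be unpredictable if large misses in both directions cancel out, which is why the absolute version is the one to rank by.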
Looking at the data back to 2010, I found these players consistently near the top. It seems they have always hit this way and can be expected to continue to do so.
Now, what can we do with this knowledge? Can a defense use the left-handed shift on a right-handed hitter? To answer this we’ll look at spray charts, but with a very important distinction from a standard spray chart: we’ll limit the hit balls to those hit on pitches on the outside of the hit zone.
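The filter for such a spray chart might look like the sketch below. It assumes the hit zone is given as a (low, high) pair of horizontal locations from the catcher's view, so the outside edge is the high end for a right-handed hitter and the low end for a left-handed hitter. The default of one third matches the outer third used for the Scutaro chart. The function name and parameters are hypothetical.

```python
def is_outside_pitch(pitch_x, zone, bats, fraction=1 / 3):
    """Return True if the pitch falls in the outer part of the hit zone.

    zone: (low, high) horizontal bounds, catcher's view (assumption).
    bats: "R" or "L" for the hitter's side of the plate.
    fraction: share of the zone counted as outside (default: outer third).
    """
    low, high = zone
    width = (high - low) * fraction
    if bats == "R":
        # Away from a right-handed hitter is the high (first-base) side.
        return high - width <= pitch_x <= high
    if bats == "L":
        # Away from a left-handed hitter is the low (third-base) side.
        return low <= pitch_x <= low + width
    raise ValueError("bats must be 'R' or 'L'")
```

Only batted balls for which this returns True would be plotted.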
I’ll start you off with a spray chart for someone not on our list – Jose Bautista. This chart shows where he hits outside pitches. He looks like a good spray hitter when you look at only the outside pitches. As a defense, you wouldn’t shift on Bautista AND pitch him outside.
Let’s move on to someone who was continually at the top of our list, Marco Scutaro. You’ll see Scutaro reliably hits balls on the outer third of the hit zone to the right side. He still hits a fair number of ground balls across the infield, so an infield shift wouldn’t be advised. But liners and fly balls in the outfield are heavily weighted to the right. With a control pitcher working the outer third of the hit zone, you could reliably shade the outfield toward right field.
The same applies for Jamey Carroll, another player who, like Scutaro, shows up on our list year after year.
I’ve found that, among our leaders, pushing the ball on outside pitches is much more predictable than pulling the ball on inside pitches. There’s surely more to be gleaned from this data, but the outfield shift on these predictable push hitters is definitely the most interesting.
Data Collection & Mining Techniques
The metrics for all hitters, year-by-year back to 2010 can be found here: https://docs.google.com/spreadsheet/ccc?key=0AtERgAQ83pATdDItUzAxXzhMZm41cGFPRjgxOEdZa0E&usp=sharing
All of the data used in this post was loaded from MLB’s gameday servers into a MongoDB database using my atbat-mongodb project. This project is open source code that anybody can use, modify, contribute to, etc. Fork me please!
The following programs were used to mine and plot the data from the mlbatbat MongoDB database.
We all know that Miguel Cabrera had a phenomenal year in 2012, winning the Triple Crown and later being named the American League MVP. His 44 home runs and .330 batting average are all his own but the 139 RBI he amassed are a shared number, as he couldn’t accumulate RBI without the R (runners). What if everybody had Cabrera’s opportunities? Would others have eclipsed his RBI total?
To analyze this I calculated a percentage measure called the Runner Movement Indicator, or RMI for short. It’s a simple calculation once you have the data. Each time a batter comes to the plate with a runner on base, the potential bases that the runners can move are added together. A runner on 1st can move three total bases, a runner on 2nd can move two, and a runner on 3rd can move one. Then, at the end of the at-bat, the final positions of the runners are compared with their starting positions to determine the total bases moved out of the potential bases. For example, if Cabrera gets a single with a runner on 1st, moving the runner to 3rd base, he is awarded two of the possible three bases, for a 0.667 clip. By calculating RMI as a percentage of the opportunities, we’re factoring out the increased benefit Cabrera gets from his stellar teammates.
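The calculation can be sketched in Python as follows. This is a minimal illustration, not the code used for the post. The input layout is an assumption: each plate appearance is a list of (start_base, end_base) pairs, one per runner, with home counted as base 4. A runner who is put out is assumed to be recorded with end_base equal to start_base, since the post does not say how outs on the bases are handled.

```python
HOME = 4  # home plate counted as the fourth base


def rmi(plate_appearances):
    """Runner Movement Indicator: bases moved divided by potential bases.

    plate_appearances: list of plate appearances, each a list of
    (start_base, end_base) pairs for the runners on base. The batter's
    own bases are not included.
    """
    actual = 0
    potential = 0
    for runners in plate_appearances:
        for start_base, end_base in runners:
            potential += HOME - start_base  # 1st -> 3, 2nd -> 2, 3rd -> 1
            actual += end_base - start_base
    return actual / potential if potential else 0.0
```

For the single in the example above, `rmi([[(1, 3)]])` is two bases out of three, which rounds to 0.667.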
One of the beautiful things about RMI is not just that it is a simple calculation, but that it reads nearly like a batting average. This makes it immediately easy to tell the good from the bad. Below is a histogram of the RMI for all qualifying players in 2012.
Now let’s overlay that with the batting averages from the same year in red. You’ll see the distribution is quite similar.
One might think that players with high batting averages also have high RMI, but that’s not quite the case. If we try to correlate RMI with Batting Average, OBP or SLG, R² stays below 0.5 in each case, though each fit shows the expected positive slope.
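For readers who want to reproduce this kind of check, here is a small standard-library sketch of a least-squares fit that returns the slope and R². The function name is hypothetical, and the numbers in the usage note are toy values, not player data. In a scipy-based workflow, `scipy.stats.linregress` should report the same slope, with R² being the square of its `rvalue`.

```python
def slope_and_r_squared(xs, ys):
    """Least-squares slope and R^2 for paired samples (e.g. BA vs RMI)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)  # square of Pearson's r
    return slope, r_squared
```

With the toy inputs `[1, 2, 3, 4]` and `[1, 3, 2, 4]` this gives a slope of 0.8 and an R² of 0.64: a positive trend with visible scatter.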
RMI vs BA
RMI vs OBP
RMI vs SLG
Now that we know a little about RMI, let’s look at the leaders from 2012.
Table columns: Actual Bases Moved, Potential Bases Moved
We see that Cabrera is 7th on the list for 2012. Still great, but not the best. We also see that Joey Votto moved runners around the bases at the highest rate, 26 points higher than Cabrera. So let’s use the RMI data above to see if anybody would have taken over the RBI lead given the same opportunities as Cabrera.
To do this we first subtract home runs from RBI, as the batter’s own bases aren’t used in RMI. Of Cabrera’s 139 RBI in 2012, 44 came from driving himself in on his own home runs. This means he had 95 RMI-influenced RBI based on a 0.316 RMI. If we scale those 95 RBI by the ratio of Votto’s RMI of 0.342 to Cabrera’s 0.316, we get 103 RBI. Votto’s 14 home runs bring him up to 117 RBI, still well shy of Cabrera.
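The arithmetic can be written out as a short Python sketch. The function and parameter names are hypothetical, and rounding to the nearest whole RBI before adding home runs is an assumption. Results computed from the rounded three-decimal RMI values may differ by a run from figures based on unrounded RMI.

```python
def equivalent_rbi(ref_rbi, ref_hr, ref_rmi, player_rmi, player_hr):
    """Estimate a player's RBI given the reference hitter's opportunities.

    The reference hitter's RBI minus his home runs are the runs credited
    to moving runners; those are scaled by the ratio of the two RMIs,
    and the player's own home runs are added back.
    """
    runner_rbi = ref_rbi - ref_hr
    scaled = runner_rbi * player_rmi / ref_rmi
    return round(scaled) + player_hr
```

Using Cabrera as the reference (139 RBI, 44 HR, 0.316 RMI) and Votto's 0.342 RMI and 14 home runs, `equivalent_rbi(139, 44, 0.316, 0.342, 14)` returns 117.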
Of course we know that Josh Hamilton was the one chasing Cabrera’s home run total in 2012, so let’s do the same calculation with him. Hamilton’s 0.323 RMI would give him 98 equivalent RBI. Adding in his 43 home runs brings him to 141 RBI, 2 higher than Cabrera. Too close to call? Nah… Hamilton wins.
The ability to get on base is one of the best predictive factors of runs and therefore wins. The prediction gets better if you add RMI, but the two should be considered distinct contributions. RMI leaders may not have great batting averages, and vice versa. Undervalued players can be found among those with a high RMI but only average OBP and BA.
Complete player and team RMI stats can be found at the links below.
Player RMIs from 2010 to 2013
Team RMIs from 2010 to 2013
Data Collection & Mining Techniques
All data aggregation code and charts are written in Python using pymongo’s MongoClient and the matplotlib, scipy and numpy modules. You can find that code on GitHub as well. https://github.com/kruser/mlb-research
Other Notes on RMI
After collecting my data I ran across Gary Hardegree’s Base-Advance Average paper from 2005, which does a very similar calculation, with the exception that it gives batters credit for moving themselves. I prefer to keep this a clutch stat and remove the batter’s bases.
The RMI data does not correlate with team run production as strongly as Batting Average, Slugging Percentage or On-Base Percentage. Adding OBP to RMI correlates much more strongly, but then again, that’s what a run is: getting on base and moving around to home. So there isn’t anything noteworthy enough there to post numbers.
In order to qualify for my list, a batter must have a minimum of two potential base movement opportunities per game. Opportunities vary widely among regular players, so it is important not to set this requirement too low.
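One way to read this rule is sketched below. It assumes that "opportunities" means potential bases and that "per game" means per team game played; both are interpretations, not something stated above. The function name and parameters are hypothetical.

```python
def qualifies(potential_bases, team_games, per_game_minimum=2.0):
    """Return True if the batter meets the minimum-opportunity threshold.

    potential_bases: total potential bases the batter's runners could move.
    team_games: games the batter's team has played (assumption).
    """
    return potential_bases >= per_game_minimum * team_games
```

Under this reading, a full 162-game season would require at least 324 potential bases.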