When Slugging Percentage Beats On-Base Percentage
What’s the single most important offensive statistic? I imagine most of us who have bookmarked FanGraphs would not say batting average or RBIs. A lot of us would name wOBA or wRC+. But neither of those is the type of thing you can calculate in your head. If I go to a game, and a batter goes 1-for-4 with a double and a walk, I know that he batted .250 with a .400 on-base percentage and a .500 slugging percentage. I can do that in my head.
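For anyone who wants to check that mental arithmetic, here is a minimal sketch of the three calculations using the standard definitions. The function name `slash_line` and its parameters are made up for illustration, and hit-by-pitches and sacrifice flies are assumed to be zero for the example game.

```python
def slash_line(ab, singles, doubles, triples, homers, walks, hbp=0, sf=0):
    """Return (AVG, OBP, SLG) for a batting line.

    hbp and sf default to zero; that is an assumption for the
    1-for-4 example, not something stated in the article.
    """
    hits = singles + doubles + triples + homers
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    avg = hits / ab
    obp = (hits + walks + hbp) / (ab + walks + hbp + sf)
    slg = total_bases / ab
    return avg, obp, slg


# The game from the text: 1-for-4 with a double, plus a walk.
avg, obp, slg = slash_line(ab=4, singles=0, doubles=1, triples=0, homers=0, walks=1)
print(f"{avg:.3f} {obp:.3f} {slg:.3f}")  # 0.250 0.400 0.500
```

The walk counts toward on-base percentage but not toward at-bats, which is why OBP uses five plate appearances while the other two use four at-bats.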
So of the easily calculated numbers — the ones you might see on a TV broadcast, or on your local Jumbotron — what’s the best? I’d guess that if you polled a bunch of knowledgeable fans, on-base percentage would get a plurality of the votes. There’d be some support for OPS too, I imagine, though OPS is on the brink of can’t-do-it-in-your-head. Slugging percentage would be in the mix, too. Batting average would be pretty far down the list.
I think there are two reasons for on-base percentage’s popularity. First, of course, is Moneyball. Michael Lewis demonstrated how there was a market inefficiency in valuing players with good on-base skills in 2002. The second reason is that it makes intuitive sense. You get on base, you mess with the pitcher’s windup and the fielders’ alignment, and good things can happen, scoring-wise.
To check, I looked at every team from 1914 through 2015 — the entire Retrosheet era, encompassing 2,198 team-seasons. I calculated the correlation coefficient between a team’s on-base percentage and its runs per game. And, it turns out, it’s pretty high — 0.890. That means, roughly, that you can explain nearly 80% of a team’s scoring by looking at its on-base percentage. Slugging percentage is close behind, at 0.867. Batting average, unsurprisingly, is worse (0.812), while OPS, also unsurprisingly, is better (0.944).
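The "nearly 80%" figure comes from squaring the correlation coefficient. As a rough sketch (not the code actually used for the article), the function below computes a Pearson correlation from two lists, and the dictionary squares the four coefficients quoted above. The names `pearson_r`, `reported_r`, and `r_squared` are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Correlations with runs per game as quoted in the text (1914-2015).
reported_r = {"OBP": 0.890, "SLG": 0.867, "AVG": 0.812, "OPS": 0.944}

# Squaring r gives the share of variance in scoring "explained" by each stat.
r_squared = {stat: r ** 2 for stat, r in reported_r.items()}

for stat, r2 in r_squared.items():
    print(f"{stat}: r = {reported_r[stat]:.3f}, r^2 = {r2:.3f}")
```

With real data, `xs` would be the 2,198 team on-base percentages and `ys` the matching runs per game.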
But that difference doesn’t mean that OBP>SLG is an iron rule. Take 2015, for example. The correlation coefficient between on-base percentage and runs per game for the 30 teams last year was just 0.644, compared to 0.875 for slugging percentage. Slugging won in 2014 too, 0.857-0.797. And 2013, 0.896-0.894. And 2012, and 2011, and 2010, and 2009, and every single year starting in the Moneyball season of 2002. Slugging percentage, not on-base percentage, is on a 14-year run as the best predictor of offense.
And it turns out that the choice of endpoints matters. On-base percentage has a higher correlation coefficient to scoring than slugging percentage for the period 1914-2015. But slugging percentage explains scoring better in the period 1939-2015 and every subsequent span ending in the present. Slugging percentage, not on-base percentage, is most closely linked to run scoring in modern baseball.
Let me show that graphically. I calculated the correlation coefficient between slugging percentage and scoring, minus the correlation coefficient between on-base percentage and scoring. A positive number means that slugging percentage did a better job of explaining scoring, and a negative number means that on-base percentage did better. I looked at three-year periods (to smooth out the data) from 1914 to 2015, so on the graph below, the label 1916 represents the years 1914-1916.
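The three-year calculation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the article's actual code: the row layout `(year, obp, slg, runs_per_game)` and the function name `rolling_gap` are hypothetical, and the six rows are made-up toy numbers, not real team-seasons.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rolling_gap(rows, window=3):
    """Map each window's final year to corr(SLG, R/G) minus corr(OBP, R/G).

    rows: iterable of (year, obp, slg, runs_per_game), one per team-season.
    A positive value means slugging tracked scoring better in that window.
    """
    rows = list(rows)
    years = sorted({r[0] for r in rows})
    gaps = {}
    for end in years:
        start = end - window + 1
        if start < years[0]:
            continue  # not enough earlier seasons to fill the window
        span = [r for r in rows if start <= r[0] <= end]
        obp = [r[1] for r in span]
        slg = [r[2] for r in span]
        rpg = [r[3] for r in span]
        gaps[end] = pearson_r(slg, rpg) - pearson_r(obp, rpg)
    return gaps


# Toy, made-up team-seasons purely to exercise the function.
toy_rows = [
    (2001, 0.320, 0.400, 4.0),
    (2001, 0.340, 0.420, 4.2),
    (2002, 0.330, 0.440, 4.4),
    (2002, 0.310, 0.460, 4.6),
    (2003, 0.350, 0.480, 4.8),
    (2003, 0.300, 0.500, 5.0),
]
gaps = rolling_gap(toy_rows)
print(gaps)
```

Keying each result by the final year of its window matches the labeling convention in the text, where 1916 stands for 1914-1916.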
A few obvious observations:
- The Deadball years were extreme outliers. There were dilution-of-talent issues through 1915, when the Federal League operated. World War I shortened the season in 1918 and 1919. And nobody hit home runs back then. The Giants led the majors with 39 home runs in 1917. Three Blue Jays matched or beat that number last year.
- Since World War II, slugging percentage has been, pretty clearly, the more important driver of offense. Beginning with 1946-1948, there have been 68 three-year spans, and in only 19 of them (28%) did on-base percentage do a better job of explaining run scoring than slugging percentage.
- The one notable exception: the years 1995-1997 through 2000-2002, during which on-base percentage ruled. Ol’ Billy Beane, he knew what he was doing. (You probably already knew that.)
This raises two obvious questions. The first one is: Why? The graph isn’t random; there are somewhat distinct periods during which either on-base percentage or slugging percentage is better correlated to scoring. What’s going on in those periods?
To try to answer that question, I ran another set of correlations, comparing the slugging percentage minus on-base percentage correlations to various per-game measures: runs, hits, home runs, doubles, triples, etc. Nothing really correlates all that well. I tossed out the four clear outliers on the left side of the graph (1914-16, 1915-17, 1916-18, 1917-19), and the best correlations I got were still less than 0.40. Here’s runs per game, with a correlation coefficient of -0.35. The negative number means that the more runs scored per game, the more on-base percentage, rather than slugging percentage, correlates to scoring.
That makes intuitive sense, in a way. When there are a lot of runs being scored (the 1930s, the Steroid Era), all you need to do is get guys on base, because the batters behind them stand a good chance of driving them in. When runs are harder to come by (Deadball II, or the current game), it’s harder to bring around a runner to score without the longball. Again, this isn’t a really strong relationship, but you can kind of see it.
The second question is, what does this mean? Well, I suppose we shouldn’t look at on-base percentage in a vacuum, because OBP alone isn’t the best descriptor of scoring. A player with good on-base skills but limited power works at the top or bottom of a lineup, but if you want to score runs in today’s game, you need guys who can slug.
Taking that a step further, if Beane exploited a market inefficiency in on-base percentage at the beginning of the century, might there be a market inefficiency in slugging percentage today? It doesn’t seem that way. First, there’s obviously an overlap between slugging percentage and on-base percentage (i.e., hits), and just hitting the ball hard on contact doesn’t fill the bill if you don’t make enough contact. Recall the correlation coefficient between run-scoring and on-base percentage is 0.89 and between runs and slugging is 0.87. The correlation between run-scoring and pure power, as measured by isolated slugging, is just 0.66. That’s considerably lower than batting average (0.81). ISO alone doesn’t drive scoring.
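For reference, isolated slugging is simply slugging percentage minus batting average, which strips out singles and leaves only the extra bases. A minimal sketch, with a hypothetical function name and the same 1-for-4 double used as the example line:

```python
def isolated_power(ab, singles, doubles, triples, homers):
    """ISO = SLG - AVG, i.e. extra bases per at-bat."""
    hits = singles + doubles + triples + homers
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    avg = hits / ab
    slg = total_bases / ab
    return slg - avg


# A 1-for-4 game with a double: SLG .500 minus AVG .250.
iso = isolated_power(ab=4, singles=0, doubles=1, triples=0, homers=0)
print(f"{iso:.3f}")  # 0.250
```

Because a single adds equally to both slugging and average, a hitter with nothing but singles has an ISO of zero no matter how often he hits.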
The second reason there probably isn’t a market inefficiency in slugging percentage is that inefficiencies, by definition, assume that the market as a whole is missing something. In the Moneyball example, other clubs didn’t see the value in Scott Hatteberg and his ilk. It’s harder to believe, fifteen years later, with teams employing directors of baseball systems development and posting for quantitative analysts, that all 30 teams are missing the boat on players who slug but don’t contribute a lot otherwise. Or, put another way, there’s a reason Pedro Alvarez and Chris Carter were non-tendered, and it’s not market inefficiency.
Writer for Baseball Prospectus
Nice work. I think there is a market inefficiency but it would be something like slg on Babip. Home runs get paid for, singles and doubles not so much.
Thanks, Jon. While some teams are obviously more analytically advanced than others, I don’t know whether a team can maintain a long-term advantage in terms of information processing anymore. The people the teams are hiring are GOOD.
Great work and thought provoking. One thing about doing correlations over a long period of time is that the error rate changes a lot. Back in the 20s and 30s, fielding pct was lower, putting more guys on base and driving guys in and that is not captured in either SLG or OBP. So a given OBP or SLG should lead to more runs back then. Of course, you did years individually and in 3 year groups, so that probably avoids the problem.
That’s a great observation. I’d guess that ROEs were not symmetrical – some teams and batters more prone than others – so it would, in aggregate, dilute the impact of OBP and SLG. My use of three-year spans wouldn’t correct for, e.g., the higher frequency of ROEs in 1918-20 compared to 2013-15.
It might be interesting if you could compare what you did to what Bill James did recently in his article “The Contact Theory and the Power Theory” which is at
http://www.billjamesonline.com/the_contacct_theory_and_the_power_theory/
Thanks, Cyril. I saw that headline flash by on my email from BJOL. I’m working today and tomorrow but look forward to reading it.
Finally got around to reading this–busy weekend. I’d say that Bill James reaches an opposite conclusion, of sorts. My article suggests that the best predictor of scoring is slugging percentage. I think it’d be pretty hard to refute that, given the correlation coefficients, though admittedly the difference between the SLG and OBP correlations is pretty small. James is saying that we’re at a point where contact hitting, on the margin, would drive more offense than slugging, on the margin. That’s somewhat opposed to what I wrote, but I don’t think the two positions are mutually exclusive. The three players from the 1970s whom he points out as contact hitters–Rod Carew, Pete Rose, and Lou Brock–as well as more contemporary guys like Jeter, Ichiro, and Gwynn–all generated a lot of extra-base hits. They were doubles instead of homers, but they all had career SLG well in excess of .400. Dee Gordon, Jose Iglesias, and Juan Pierre, also mentioned in the article, do not. So I think that the idea of a player who has contact skills with a sprinkling of power–Michael Brantley, Daniel Murphy, Jose Altuve, Ender Inciarte, Ian Kinsler were all top-10 for contact last year with SLG over .400–may add more, at the margin, than low-contact sluggers–Joc Pederson and Ryan Howard were bottom-10 for contact (though so were Kris Bryant, Chris Davis, Justin Upton, and Alex Rodriguez). James’ argument that the marginal contribution of contact hitters increases as more are added, while the marginal contribution of sluggers decreases, is pretty provocative but makes sense.
Interesting. Thanks
There is the bias of recent seasons playing out here, and I mean seasons 1998+
Contact hitting typically means a singles hitter, but don’t forget that singles add to slugging percentage too, not simply on base percentage, and a contact out or ROE does not get added to ave.obp.slg.ops. Most of the time when you think of slugging percentage, you think of selling out for homers. There is no worse example of this than Matt Carpenter the last 3 years. He has decided to go for homers and he has been a worse player for it.
Whether you want to go by WAR or some other metric Matt Carpenter NEEDS to go back to being a guy that could get you 180+ hits a year and forego the HRs. He was at his best then.
Matt Carpenter 2013-2015
WAR 6.6, 3.0, 4.0
Total Hits 199, 162, 156
Contact outs 330, 325, 261
HR 11, 8, 28
2B 55, 33, 44
XBH 73, 43, 75
SO 98, 111, 151
BB 82, 105, 92
GB/FB total contacts 537, 495, 427
LOB 144, 178, 169
ROE 5, 5, 5
R+RBI 204, 158, 185
(R+RBI)-HR 193, 150, 157
BABIP 359, 318, 321
AVE 318, 272, 272
OBP 392, 375, 365
SLG 481, 375, 505
OPS 873, 750, 871
This is exactly why I created HEWCO, CCR, and BSM: they capture offensive value that none of the above metrics do. OPS makes it seem like Matt Carpenter’s production didn’t drop off, but it did drastically, both in terms of runs and RBIs and in terms of contact. The idea that slugging wins is a fallacy because the trade-off is clogging the bases consistently with contact. And look, slugging went up for Matt between 2013 and 2015, but because his hit total overall cratered by more than 40 and contact outs fell by about 70, his runs and RBIs fell.
Matt Carpenter, 2013 – 2015
HEWCO 717, 656, 643
HEWCO/G 4.567, 4.152, 4.175
HEWCO/PA 1.000, .925, .967
CCR .745, .964, .635
The added benefit of all the homers, WAS NOT an added benefit.
The game itself is taking contact hitters out of the game, and by extension lowering overall hits, run scoring, and total plate appearances in the game.
All this “power” comes at a cost: the rise of the solo home run, which is practically useless, and I can prove that too.
If you want the info, email me at thecrazybaseballcoach@gmail.com
Eric, as you know, I don’t agree with your assessment of Carpenter. He clearly had his best year in 2013, but that was an age-27 career year for him, and he was way better in 2015 as a slugger with 80% contact (139 wRC+, just 7 off his 2013 peak) than in 2014 with 90% contact (117 wRC+).
During the period 1998 to the present that you cite–that covers 540 team-seasons, a decent sample–the correlation coefficient of runs per game to OBP is 0.905, R/G to SLG is 0.914, and R/G to OPS is 0.957. How do HEWCO, CCR, and BSM do?
I guarantee you calc contact % way differently than I do. I probably wouldn’t use 1998 as my starting point given the steroid era, either, but no biggie either way.
Better, because it captures everything, including the value of a contact out and error which NONE of the traditional stat lines do.
When you capture 100% of offense or 100% of anything it would stand to reason the model would be better.
But while we are on the subject of errors, did anyone like the ERROR HR Alcides Escobar hit in the bottom of the first inning of the first game of the 2015 World Series? You cannot tell me that was a legitimate hit.
I don’t care how you carve that one up, it’s nothing but a 4 base error. You can give Matt Harvey at least 1 base of it for throwing a fastball down the heart of the plate when all the broadcasters were saying not to do that because he swings at everything. Then you can give 1 base mental error to Cespedes and 2 base physical error to him as well.
But I digress, the Royals won that game 5-4, by a single Run, and there is your run.