When Slugging Percentage Beats On-Base Percentage

What’s the single most important offensive statistic? I imagine most of us who have bookmarked FanGraphs would not say batting average or RBIs. A lot of us would name wOBA or wRC+. But neither of those is the kind of thing you can calculate in your head. If I go to a game, and a batter goes 1-for-4 with a double and a walk, I know that he batted .250 with a .400 on-base percentage and a .500 slugging percentage. I can do that in my head.
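
Just to spell out that back-of-the-envelope arithmetic, here is a minimal Python sketch of the same slash line (it ignores hit-by-pitches and sacrifice flies, which the official on-base percentage formula includes):

```python
# 1-for-4 with a double and a walk: the double is the lone hit, worth two total bases.
hits, at_bats, walks, total_bases = 1, 4, 1, 2

avg = hits / at_bats                      # 1/4 = .250
obp = (hits + walks) / (at_bats + walks)  # 2/5 = .400 (simplified: no HBP or sac flies)
slg = total_bases / at_bats               # 2/4 = .500

print(f"{avg:.3f} / {obp:.3f} / {slg:.3f}")  # 0.250 / 0.400 / 0.500
```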

So of the easily calculated numbers — the ones you might see on a TV broadcast, or on your local Jumbotron — what’s the best? I’d guess that if you polled a bunch of knowledgeable fans, on-base percentage would get a plurality of the votes. There’d be some support for OPS too, I imagine, though OPS is on the brink of can’t-do-it-in-your-head. Slugging percentage would be in the mix, too. Batting average would be pretty far down the list.

I think there are two reasons for on-base percentage’s popularity. First, of course, is Moneyball. Michael Lewis demonstrated how, in 2002, there was a market inefficiency in valuing players with good on-base skills. The second reason is that it makes intuitive sense. You get on base, you mess with the pitcher’s windup and the fielders’ alignment, and good things can happen, scoring-wise.

To check, I looked at every team from 1914 through 2015 — the entire Retrosheet era, encompassing 2,198 team-seasons. I calculated the correlation coefficient between a team’s on-base percentage and its runs per game. And, it turns out, it’s pretty high — 0.890. That means, roughly, that you can explain nearly 80% of the variation in a team’s scoring by looking at its on-base percentage (r² ≈ 0.79). Slugging percentage is close behind, at 0.867. Batting average, unsurprisingly, is worse (0.812), while OPS, also unsurprisingly, is better (0.944).
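
For the curious, here’s a sketch of that calculation in Python with pandas. It assumes a hypothetical CSV of Retrosheet-derived team-seasons with columns year, obp, slg, avg, ops, and runs_per_game; the file name and column names are placeholders, not the actual data set behind the article:

```python
import pandas as pd

# Hypothetical file: one row per team-season, 1914-2015 (2,198 rows).
teams = pd.read_csv("team_seasons_1914_2015.csv")

for stat in ["obp", "slg", "avg", "ops"]:
    r = teams[stat].corr(teams["runs_per_game"])  # Pearson correlation coefficient
    print(f"{stat.upper():>4}: r = {r:.3f}, r^2 = {r**2:.3f}")
```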

But that difference doesn’t mean that OBP>SLG is an iron rule. Take 2015, for example. The correlation coefficient between on-base percentage and runs per game for the 30 teams last year was just 0.644, compared to 0.875 for slugging percentage. Slugging won in 2014 too, 0.857-0.797. And 2013, 0.896-0.894. And 2012, and 2011, and 2010, and 2009, and every single year starting in the Moneyball season of 2002. Slugging percentage, not on-base percentage, is on a 14-year run as the best predictor of offense.
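
Checking any single season is the same idea with a groupby, continuing the hypothetical teams table from the sketch above:

```python
# Correlations within each season, e.g. 2015: SLG 0.875 vs. OBP 0.644 across 30 teams.
by_year = teams.groupby("year").apply(
    lambda g: pd.Series({
        "obp_r": g["obp"].corr(g["runs_per_game"]),
        "slg_r": g["slg"].corr(g["runs_per_game"]),
    })
)
print(by_year.loc[2002:2015])  # slugging has led every year since 2002
```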

And it turns out that the choice of endpoints matters. On-base percentage has a higher correlation coefficient to scoring than slugging percentage for the full period 1914-2015. But slugging percentage explains scoring better over 1939-2015, and over every span with a later starting point that runs through the present. Slugging percentage, not on-base percentage, is most closely linked to run scoring in modern baseball.

Let me show that graphically. I calculated the correlation coefficient between slugging percentage and scoring, minus the correlation coefficient between on-base percentage and scoring. A positive number means that slugging percentage did a better job of explaining scoring, and a negative number means that on-base percentage did better. I looked at three-year periods (to smooth out the data) from 1914 to 2015, so on the graph below, the label 1916 represents the years 1914-1916.
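
The rolling three-year version, again continuing the hypothetical teams table, might look something like this:

```python
import pandas as pd

def window_diff(window):
    """Correlation with scoring for SLG, minus the same for OBP, over one slice of team-seasons."""
    slg_r = window["slg"].corr(window["runs_per_game"])
    obp_r = window["obp"].corr(window["runs_per_game"])
    return slg_r - obp_r

# Label each window by its final season, so 1916 covers 1914-1916.
diffs = {}
for end_year in range(1916, 2016):
    span = teams[teams["year"].between(end_year - 2, end_year)]
    diffs[end_year] = window_diff(span)

diff_series = pd.Series(diffs)  # positive: SLG explained scoring better; negative: OBP did
```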

A few obvious observations:

  • The Deadball years were extreme outliers. There were dilution-of-talent issues through 1915, when the Federal League operated. World War I shortened the season in 1918 and 1919. And nobody hit home runs back then. The Giants led the majors with 39 home runs in 1917. Three Blue Jays matched or beat that number last year.
  • Since World War II, slugging percentage has been, pretty clearly, the more important driver of offense. Beginning with 1946-1948, there have been 68 three-year spans, and in only 19 of them (28%) did on-base percentage do a better job of explaining run scoring than slugging percentage.
  • The one notable exception: the years 1995-1997 through 2000-2002, during which on-base percentage ruled. Ol’ Billy Beane, he knew what he was doing. (You probably already knew that.)

This raises two obvious questions. The first one is: Why? The graph isn’t random; there are somewhat distinct periods during which either on-base percentage or slugging percentage is better correlated to scoring. What’s going on in those periods?

To try to answer that question, I ran another set of correlations, comparing the slugging percentage minus on-base percentage correlations to various per-game measures: runs, hits, home runs, doubles, triples, etc. Nothing really correlates all that well. I tossed out the four clear outliers on the left side of the graph (1914-16, 1915-17, 1916-18, 1917-19), and the best correlations I got were still less than 0.40. Here’s runs per game, with a correlation coefficient of -0.35. The negative number means that the more runs scored per game, the more on-base percentage, rather than slugging percentage, correlates to scoring.
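
Here is a sketch of that second-order check, using the diff_series from the rolling-window sketch above and average runs per game over each window as the per-game measure (the exact windowing and averaging here are my assumptions, not necessarily the method used for the article):

```python
# Average runs per game over each three-year window, keyed by the window's final season.
rpg_by_window = {
    end: teams[teams["year"].between(end - 2, end)]["runs_per_game"].mean()
    for end in range(1916, 2016)
}
rpg_series = pd.Series(rpg_by_window)

# Drop the four Deadball-era outliers (windows ending 1916-1919), then correlate.
keep = diff_series.index.difference([1916, 1917, 1918, 1919])
print(diff_series[keep].corr(rpg_series[keep]))  # about -0.35 in the article's data
```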

That makes intuitive sense, in a way. When there are a lot of runs being scored — the 1930s, the Steroid Era — all you need to do is get guys on base, because the batters behind them stand a good chance of driving them in. When runs are harder to come by — Deadball II, or the current game — it’s harder to bring a runner around to score without the longball. Again, this isn’t a really strong relationship, but you can kind of see it.

The second question is, what does this mean? Well, I suppose we shouldn’t look at on-base percentage in a vacuum, because OBP alone isn’t the best descriptor of scoring. A player with good on-base skills but limited power works at the top or bottom of a lineup, but if you want to score runs in today’s game, you need guys who can slug.

Taking that a step further, if Beane exploited a market inefficiency in on-base percentage at the beginning of the century, might there be a market inefficiency in slugging percentage today? It doesn’t seem that way. First, there’s obviously an overlap between slugging percentage and on-base percentage (i.e., hits), and just hitting the ball hard on contact doesn’t fill the bill if you don’t make enough contact. Recall the correlation coefficient between run-scoring and on-base percentage is 0.89 and between runs and slugging is 0.87. The correlation between run-scoring and pure power, as measured by isolated slugging, is just 0.66. That’s considerably lower than batting average (0.81). ISO alone doesn’t drive scoring.
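
Since isolated slugging is just slugging percentage minus batting average, that comparison falls out of the same hypothetical table:

```python
# ISO = SLG - AVG: extra bases per at-bat, stripped of the batting-average component.
teams["iso"] = teams["slg"] - teams["avg"]
print(teams["iso"].corr(teams["runs_per_game"]))  # the article reports about 0.66
print(teams["avg"].corr(teams["runs_per_game"]))  # versus roughly 0.81 for batting average
```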

The second reason there probably isn’t a market inefficiency in slugging percentage is that inefficiencies, by definition, assume that the market as a whole is missing something. In the Moneyball example, other clubs didn’t see the value in Scott Hatteberg and his ilk. It’s harder to believe, fifteen years later, with teams employing directors of baseball systems development and posting openings for quantitative analysts, that all 30 teams are missing the boat on players who slug but don’t contribute a lot otherwise. Or, put another way, there’s a reason Pedro Alvarez and Chris Carter were non-tendered, and it’s not market inefficiency.





Writer for Baseball Prospectus

12 Comments
Jon C
8 years ago

Nice work. I think there is a market inefficiency, but it would be something like SLG on BABIP. Home runs get paid for, singles and doubles not so much.

Cyril Morong
8 years ago

Great work and thought provoking. One thing about doing correlations over a long period of time is that the error rate changes a lot. Back in the 20s and 30s, fielding pct was lower, putting more guys on base and driving guys in, and that is not captured in either SLG or OBP. So a given OBP or SLG should lead to more runs back then. Of course, you did years individually and in 3 year groups, so that probably avoids the problem.

Cyril Morong
8 years ago

It might be interesting if you could compare what you did to what Bill James did recently in his article “The Contact Theory and the Power Theory” which is at

http://www.billjamesonline.com/the_contacct_theory_and_the_power_theory/

Cyril Morong
8 years ago
Reply to  Rob Mains

Interesting. Thanks

Eric
8 years ago

There is the bias of recent seasons playing out here, and I mean seasons 1998+

Contact hitting typically means a singles hitter, but don’t forget that singles add to slugging percentage too, not simply on-base percentage, and a contact out or ROE does not get added to AVG/OBP/SLG/OPS. Most of the time when you think of slugging percentage, you think of selling out for homers. There is no worse example of this than Matt Carpenter the last 3 years. He has decided to go for homers and he has been a worse player for it.

Whether you want to go by WAR or some other metric Matt Carpenter NEEDS to go back to being a guy that could get you 180+ hits a year and forego the HRs. He was at his best then.

Matt Carpenter 2013-2015
WAR 6.6, 3.0, 4.0
Total Hits 199, 162, 156
Contact outs 330, 325, 261
HR 11, 8, 28
2B 55, 33, 44
XBH 73, 43, 75
SO 98, 111, 151
BB 82, 105, 92
GB/FB total contacts, 537, 495, 427
LOB 144, 178, 169
ROE 5,5,5
R+RBI 204, 158, 185
(R+RBI)-HR 193, 150, 157
BABIP 359, 318, 321
AVE 318, 272, 272
OBP 392, 375, 365
SLG 481, 375, 505
OPS 873, 750, 871

This is exactly why I created HEWCO, CCR, and BSM: they capture offensive value that none of the metrics above do. OPS makes it seem like Matt Carpenter’s production didn’t drop off, but it did drastically, both in terms of runs and RBI and in terms of contact. The idea that slugging wins is a fallacy, because the trade-off is consistently clogging the bases with contact. And look, slugging went up for Matt between 2013 and 2015, but because his hit total overall cratered by more than 40 and contact outs fell by about 70, his runs and RBI fell.

Matt Carpenter, 2013 – 2015

HEWCO 717, 656, 643
HEWCO/G 4.567, 4.152, 4.175
HEWCO/PA 1.000, .925, .967
CCR .745, .964, .635

The added benefit of all the homers WAS NOT an added benefit.

The game itself is taking contact hitters out of the game, and by extension lowering overall hits, run scoring, and total plate appearances in the game.

All this “power” comes at a cost: the rise of the solo home run, which is practically useless, and I can prove that too.

If you want the info, email me at thecrazybaseballcoach@gmail.com

Eric
8 years ago
Reply to  Rob Mains

Better, because it captures everything, including the value of a contact out and an error, which NONE of the traditional stat lines do.

When you capture 100% of offense or 100% of anything it would stand to reason the model would be better.

But while we are on the subject of errors, did anyone like the ERROR HR Alcides Escobar hit in the bottom of the first inning of the first game of the 2015 World Series? You cannot tell me that was a legitimate hit.

I don’t care how you carve that one up, it’s nothing but a 4-base error. You can give Matt Harvey at least 1 base of it for throwing a fastball down the heart of the plate when all the broadcasters were saying not to do that because he swings at everything. Then you can give a 1-base mental error to Cespedes and a 2-base physical error to him as well.

But I digress: the Royals won that game 5-4, by a single run, and there is your run.

Eric
8 years ago
Reply to  Rob Mains

I guarantee you calc contact % way differently than I do. I probably wouldn’t use 1998 as my starting point given the steroid era, either, but no biggie either way.