Home Run Skewness, Babe Ruth, and Maybe PEDs

The breaking of baseball that ended the dead-ball era is generally traced to Babe Ruth’s 1919 season, when he hit a record 29 homers for the Red Sox. That was a good year, but not a jaw-dropping one: three players had already managed 25 or more homers, and Ned Williamson’s record from 1884 was only two behind Babe. The next season brought the unprecedented explosion, as Ruth redefined power by posting 54 home runs, doubling the best season total anyone else had ever put up in the big leagues.

It took only a few years for the trajectory of offense, and especially of home run production, to change dramatically. In 1922 Rogers Hornsby hit 42, Ken Williams 39, and Tilly Walker 37, all besting The Bambino’s paltry 35 that season. Over the next several decades home run production changed markedly as power reshaped the game.

[Chart: skewness of home run totals by season]

Skewness here is calculated with Excel’s skewness formula. I use the common rule of thumb that anything between -1 and 1 is not meaningfully skewed. Since none of the seasons here show negative skewness, we will focus on values above 1, or positive skewness (a long right tail). As you can see, skewness in HR production peaked in that 1920 season, when Ruth was an extreme outlier; see below:

[Chart: distribution of 1920 home run totals]

You can see the skewness, a long right tail, and most of it is driven by a single observation. Positive skewness was always present in early baseball because of the large cluster of players at or slightly above 0, but Ruth’s 1920 season took it to a new level. If you go back to the previous chart, though, you will see that the skewness quickly dissipated as the league started hitting more long balls, and by the late 1940s it had disappeared. Only twice since 1949 has skewness exceeded 1, in 1981 and 1981, with values of 1.05 and 1.04 respectively. That is right on the dividing line between skewed and not skewed. Interestingly, the skewness disappears, and stays away, shortly after the talent pool widened with an influx of players from the Negro Leagues, which may have cut out some of the low end that was causing it.
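For readers who want to reproduce these numbers outside a spreadsheet, here is a minimal Python sketch. It assumes the standard Excel SKEW formula (the adjusted Fisher-Pearson sample skewness) and the -1 to 1 rule of thumb used above. It also checks how much of a season's skewness comes from its single largest value. The helper names are hypothetical, and the home run totals are invented for illustration; they are not the actual 1920 data.

```python
from statistics import mean, stdev


def excel_skew(values):
    """Sample skewness in the form used by Excel's SKEW():
    n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3),
    where s is the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def skew_label(g):
    """Rule of thumb from the article: values between -1 and 1 are not skewed."""
    if g > 1:
        return "positively skewed (long right tail)"
    if g < -1:
        return "negatively skewed (long left tail)"
    return "not skewed"


def skew_without_max(values):
    """Skewness after dropping the single largest value (e.g. one outlier)."""
    return excel_skew(sorted(values)[:-1])


# Hypothetical season: most hitters between 0 and 8 HRs, one slugger with 30.
season = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 30]

with_outlier = excel_skew(season)
without_outlier = skew_without_max(season)
print(f"with outlier:    {with_outlier:.2f} ({skew_label(with_outlier)})")
print(f"without outlier: {without_outlier:.2f} ({skew_label(without_outlier)})")
```

With these made-up numbers, skewness is well above 1 with the slugger included and drops below 1 once he is removed. That is the one-observation pattern described for 1920.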

One thing to keep in mind through all of this is that many people see the steroid era as another period when baseball was broken, with scientifically enhanced freaks blasting far more home runs than should be possible. Yet the data show no large spike in skewness during that period. That leaves plenty of ambiguity and no firm answers, because the result can be read in several ways, including two extreme views:

1) See, EVERYONE was cheating in the steroid era, so the entire distribution shifted enough that even 1998’s home run chase, which ended with two players breaking the single-season record, did not produce a skewed distribution.

2) Despite the cheating, nothing was all that greatly affected. A couple of cheaters happened to succeed, but most of them stayed with the pack, and thus we see no skewness.
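View 1 can be made a little more concrete. Skewness does not change if every hitter gains the same number of home runs, or if every total is multiplied by the same factor. An across-the-board boost by itself would therefore not erase a long right tail; the tail shrinks only when the pack gains relative to the leader. Below is a minimal Python sketch with invented totals that reuses the Excel-style formula. The scenarios are hypothetical and are not a model of actual PED effects.

```python
from statistics import mean, stdev


def excel_skew(values):
    """Adjusted Fisher-Pearson sample skewness, the form used by Excel's SKEW()."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


# Hypothetical baseline season: a pack near zero and one slugger far ahead.
baseline = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 30]

# Scenario A: every hitter gains 10 HRs (a pure shift of the distribution).
shifted = [x + 10 for x in baseline]

# Scenario B: every total doubles (a pure rescaling).
scaled = [2 * x for x in baseline]

# Scenario C: the pack triples its output while the slugger stays at 30,
# so the shape of the distribution changes.
pack_closes_gap = [x * 3 if x < 30 else x for x in baseline]

for name, data in [("baseline", baseline), ("shifted", shifted),
                   ("scaled", scaled), ("pack closes gap", pack_closes_gap)]:
    print(f"{name:>16}: {excel_skew(data):.2f}")
```

With these toy numbers, the shifted and scaled seasons report the same skewness as the baseline. The season in which the pack closes the gap falls below 1.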

So what did the distribution look like in 1998?

[Chart: distribution of 1998 home run totals]

Rather than the highest frequencies falling at 0 to 4 home runs and then tapering off quickly as in 1920, we now see that every qualified batter hit at least 1 HR and that the largest mass falls between 9 and 23 home runs. Mark McGwire’s 70 HRs were about 3.4 times the average of 20.7 and 3.5 times the median of 20 for the year. In comparison, Babe Ruth hit 10 times the 1920 average of 5.3 HRs and 18 times the median of 3, so you can see how much farther from the pack he was.
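The ratios in this paragraph come from simple division. A short Python sketch using only the averages and medians quoted in the text reproduces them. The helper name `times_baseline` is hypothetical.

```python
def times_baseline(total, baseline):
    """How many times a league baseline (average or median) a player's total is."""
    return total / baseline


# Figures quoted in the paragraph above.
cases = [
    ("McGwire 1998", 70, {"average": 20.7, "median": 20}),
    ("Ruth 1920", 54, {"average": 5.3, "median": 3}),
]

for player, total, baselines in cases:
    for stat, value in baselines.items():
        ratio = times_baseline(total, value)
        print(f"{player}: {ratio:.1f}x the league {stat}")
```

Running it prints 3.4 and 3.5 for McGwire and 10.2 and 18.0 for Ruth.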

Whether or not PEDs broke baseball again is not something I am prepared to answer here, but we can at least say they didn’t break it to the degree that Babe Ruth did when he signaled the end of the dead-ball era. What home run production does tell us is that it seems to be distributed fairly symmetrically, without a long right tail. It has been for more than half a century of baseball, a span in which we have seen many changes to the game. In reality, all that leaves me with is more questions, and that is just fine by me.





10 Comments
Cyril Morong
9 years ago

Those HR totals of 1884 were helped by playing in a park with something like a 220-foot distance to the wall. I think in most years a ball hit over that wall was a double, but that year it was a HR.

Cyril Morong
9 years ago
Reply to  Cyril Morong

I think the article I link below says the wall was 230 feet from home plate.

Cyril Morong
9 years ago

All 4 of the players who hit 20+ HRs in 1884 played for Chicago.

Also, see this link for an explanation of what happened in 1884

http://sabr.org/research/clarifying-early-home-run-record

Cyril Morong
9 years ago

Over the two seasons, 198-19, Ruth hit 31 HRs in 114 road games. Compared to what guys had done before, that is jaw-dropping. He hit all 11 of his HRs on the road in 1918 and 20 of his 29 on the road in 1919. It used to take a long poke to hit a HR at Fenway before they put in those bullpens in RF.

Cyril Morong
9 years ago
Reply to  Cyril Morong

That should say “Over the two seasons, 1918-19”

Cyril Morong
9 years ago

Here are the top 5 HR%’s for guys with 200+ PAs through 1919

1 Babe Ruth 1919 6.71
2 Ned Williamson 1884 6.47
3 Gavvy Cravath 1919 5.61
4 Fred Pfeffer 1884 5.35
5 Bill Joyce 1894 4.79

We know how dubious the 1884 numbers are. And Cravath in 1919 hit 10 of his 12 HRs at home and in his career he hit 72 of his 87 HRs at home. Philadelphia was a great place to hit HRs.

So then we have Joyce. Ruth beats him by almost 2 percentage points and again, he was severely hurt by his park.

Cyril Morong
9 years ago

You say “That was a good year, but not something jaw dropping as three players had managed 25+ homers at that point and Ned Williamson’s record from 1884 was only two behind Babe.”

On something like that, I think it is important to know how unusual that was and why. It was not just the park; it was also the rule change for just that year.

DavidKB
9 years ago

What happens if you calculate the skewness for Ruth’s era without Ruth? I imagine you’ll see an “echo” of skewness as other players picked up his approach. It would also be interesting to look at the skewness and the median vs. the year in a single graph. I wonder if you would see the rate of change of the median settle down right as the skewness does as well. Nice article!