Archive for April, 2013

The Ten Lowest BABIPs Since 1945

For hitters, BABIP is often an explanation for unusually good or bad seasons. But what causes a great or poor BABIP? And are we right to simply blame BABIP whenever a bizarre season happens? It might help to look at some extreme cases. Even if we don’t learn something about how to interpret hitters’ BABIP, we can at least have fun. Nerdy, nerdy fun.

What is BABIP?

Batting average on balls in play is exactly that: when you hit the ball and it’s not a home run, what’s your batting average? Imagine you’d only ever batted twice; first you hit a single and then you struck out. Your BABIP would be 1.000. If a single and a groundout, .500. After seven games of the 2013 season, Rick Ankiel had two home runs but no singles, doubles, or triples, so his BABIP was .000.

Across any given season, the average BABIP tends to be about .300. All this means is that, when you hit the ball at professional defenders, there’s a 70% chance they’ll get you out.
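For the numerically inclined, here is a minimal sketch of the calculation. It uses the standard formula, (H - HR) / (AB - K - HR + SF), which also counts sacrifice flies as balls in play, a detail the plain-English version above skips. The function name and the at-bat and strikeout totals in the last example are made up for illustration; only the "two homers, no other hits" part comes from the Ankiel example.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        return None  # undefined: nothing was put in play
    return (hits - home_runs) / balls_in_play


# A single and a strikeout: one ball in play, one hit.
print(babip(hits=1, home_runs=0, at_bats=2, strikeouts=1))    # 1.0
# A single and a groundout: two balls in play, one hit.
print(babip(hits=1, home_runs=0, at_bats=2, strikeouts=0))    # 0.5
# Two homers and no other hits (hypothetical AB and K totals).
print(babip(hits=2, home_runs=2, at_bats=20, strikeouts=12))  # 0.0
```

Note that the home runs disappear from both the numerator and the denominator, which is why a slugger's BABIP can sit well below his batting average.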

What influences BABIP?

The enemy. Defense and to some extent pitching are factors, but over the course of a full year, as you face the entire league, this averages out.

Power. If you hit twenty balls to the warning track, and a lot of them fall for hits, your BABIP will increase. But if they all carry right over the fence for home runs, they will stop counting for this purpose, meaning your BABIP will probably decrease since more of your hits will be excluded from the stat.

Hitting style. There are six infielders, so more ground balls tend to be fielded; this is why pitchers, who are wimpy at hitting, tend to have low BABIPs. Fly balls are often caught, so the best scores go to line-drive hitters.

Speed. If you’re fast enough to beat throws and bunt for singles, your BABIP will be higher. If you run like I do, probably not so much.

Luck. Maybe the biggest single factor is: are you lucky? We all see hard-hit balls straight at defenders, or guys who go on “hot streaks” where the ball “finds all the holes.” That’s called “luck,” and BABIP can quantify it. Believe it or not, you really can have good or bad luck that lasts an entire year.
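To get a feel for how much luck alone can move the number, here is a rough sketch that treats every ball in play as an independent coin flip. That independence, the "true" .300 talent level, and the 450 balls in play per season are all simplifying assumptions of mine, not measurements.

```python
import math
import random


def babip_luck_sd(true_babip, balls_in_play):
    """Standard deviation of observed BABIP if every ball in play is an
    independent trial with probability true_babip of falling for a hit."""
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)


def simulate_season(true_babip, balls_in_play, rng):
    """Observed BABIP for one simulated season."""
    hits = sum(rng.random() < true_babip for _ in range(balls_in_play))
    return hits / balls_in_play


rng = random.Random(2013)  # fixed seed so the run is repeatable
seasons = [simulate_season(0.300, 450, rng) for _ in range(1000)]

print(round(babip_luck_sd(0.300, 450), 4))  # 0.0216
print(min(seasons), max(seasons))           # spread of 1000 simulated seasons
```

Under these assumptions, one standard deviation of pure luck is about 22 points of BABIP, so a perfectly average hitter can land well above or below .300 without changing anything about his game.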

Let’s illustrate these principles by looking at some hitters with very low BABIPs.

The Ten Lowest BABIPs Since 1945

10. Roger Maris, 1961 (.209). 38.4% of Roger Maris’ hits that year were home runs. (Stop now to think about that.) If the ball stayed in the park, somebody probably caught it. On the other hand, if the ball had a chance of leaving the park, it did. 61 of them did.

9. Jim King, 1963 (.208). Although somewhat powerful (24 homers), Jim King was also something else: bad. His BABIP never came close to league average, and in partial seasons after ’63 it would be .207 and .209. He was known as a power-hitting bench bat, and only found regular playing time on the miserable Washington Senators (106 losses that year).

8. Dave Kingman, 1982 (.207). Dave Kingman hit homers (37) and struck out a whole lot, and based on his terrible, terrible fielding metrics, he was a mighty slow fellow. There’s also another factor here: he was old. “But he was only 33,” you say. “If there was something to this age thing, he’d get worse as he got even older.” “Aha,” I reply, “that’s why you’re supposed to keep reading!”

7. Dick McAuliffe, 1971 (.206). Here’s our first plausible “bad luck” guy. A career .264 BABIP, and indeed the following year he had a .264 BABIP. A career .247 hitter, and the following year he hit .240. A career .343 OBP, and the following year his OBP was .339. So Dick McAuliffe bounced back just fine, but it’s worth noting two things: first, a career .247 hitter is not that good, and second, for whatever reason his walk rate did decline sharply during his “unlucky” year. Was he swinging more aggressively? If so, he was still striking out less than usual.

6. Roy Cullenbine, 1947 (.206). I mentioned Roy Cullenbine in my first post on these venerable pages: a man who combined all-time bad luck with a truly incredible batting eye, walking 22.6% of the time despite being a distinctly non-intimidating hitter. The only guy in 1947 who walked more was Triple Crown winner Ted Williams, and Williams was frequently being walked on purpose. Cullenbine’s possibly all-time-great ability to take a walk was rewarded with–well, never playing in another major league game.

He did hit 24 homers, but this is another bad luck year. Heck, Cullenbine’s BABIP in 1946 was .347.

5. Dave Kingman, 1986 (.204). Toldja so! Here’s Kingman, age 37, hitting home runs (35) but nothing else. A full-time DH by now, he (like Cullenbine) never played in the big leagues again.

4. Brooks Robinson, 1975 (.204). Only six home runs to his name, still manning third base, Brooks Robinson is another example of what’s becoming a clear trend: he was 38 years old. He played partial seasons after this, but not full ones. This was a truly godawful year: .201/.267/.274, good for a wRC+ of 54.

3. Ted Simmons, 1981 (.200). A catcher and a fairly slow runner turning 32, Simmons saw a small drop in power, which he partially recovered the next year, and a 97-point drop in his BABIP, hard to explain just from the power outage. The traditional explanation for his poor 1981 is that he had just moved to Milwaukee and the American League. Luck might have hurt him, too.

2. Curt Blefary, 1968 (.198). Carson Cistulli previously highlighted Blefary on this site. After winning Rookie of the Year in 1965, the young outfielder posted two more above-average seasons before falling off a metaphorical cliff in 1968. He was being bounced around between positions, and he was never a speedster: his defense inspired the nicknames Clank and Buffalo.

Part of it must be bad luck. The BABIP .045 below his career average bounced back in 1969, when he moved to catcher and had a fairly good season for the Astros; a power decline turned out to be real, but his other numbers recovered. And yet Blefary would play his last major league game at age 29, moving on to a career as a “sheriff, bartender, truck driver, and night club owner.”

1. Aaron Hill, 2010 (.196). Aaron Hill’s notoriously lost season is the only one here from the last twenty-five years–and the most dramatic of all. Interestingly, a RotoGraphs article on Hill attributes his 2010 to pure awfulness but his recovery in 2011 to an “inflated” BABIP. But a .196 BABIP, a full hundred points below average, counts as deflated, right? Hill sucked in 2010 despite 26 homers and a slightly increased walk rate.

The advantage of recency is that we have more data. Here the culprit is obvious: he had previously been, and would soon be again, very good at hitting line drives, but in 2010 his line-drive percentage dropped by half (just 10.6%) and more than half of the balls he hit all year became fly balls. Some of those drifted out of the park, but most drifted over a waiting defender. And even though Hill was walking more, he was also swinging more frequently at pitches outside the strike zone. Hill’s new approach in 2010 didn’t hurt his ability to take a walk, but it hurt his ability to drive the ball. Still, to earn the lowest BABIP in modern history, he also suffered from an entire season of some of the worst luck any batter’s ever had.


The BABIP losers here didn’t do badly over their careers: combined, these “bottom 10” earned 42 All-Star appearances (18 by Brooks Robinson), 3 MVP awards, and a Rookie of the Year prize.

This unscientific survey confirms a lot of preconceived ideas:
– slower players don’t create their own luck on balls hit in fair territory
– aging players often lose their speed or power or both
– swinging at balls outside the strike zone means you make inferior contact
– sometimes, good luck isn’t enough to save a terrible hitter
– sometimes, terrible luck is enough to end a good hitter’s career

But there’s an interesting question to be raised here. Some of these guys–Maris, Kingman–hit homers like crazy, thus suppressing their BABIPs. On the other hand, Blefary and Simmons lost home run power in their hard-luck years. Simmons was playing in a new ballpark and Blefary at a new position. Maybe they were the Aaron Hills of their times, adjusting their approaches in deleterious ways (probably swinging at more pitches). Maybe they hit the ball poorly for unknown, reversible reasons. Maybe they had bad luck.

If I were counseling hitters on how to maximize their batting average on balls in play, I would say this: cultivate speed and athleticism, swing at better pitches, and try to hit line drives. I don’t know if BABIP can or should be learned, however. Ultimately, BABIP is the baseball version of a zen koan or hippie bumper sticker. BABIP: Stuff Happens. Or, more accurately, sometimes in baseball you make your own fate, but sometimes your fate makes you.

Does it matter which side of the pitching rubber a pitcher starts from throwing a sinker?

As we start a new baseball season, I start a new season of my own. This is my first – of many I hope – analysis and write-up on baseball that I am submitting. I am an avid fan, a numbers geek, an aspiring writer and lastly a bored software engineer. I am also very fortunate. I have a close connection with a former major league player and the ability to leverage his vast experience and knowledge of the game. Hopefully, I can parlay the knowledge I have learned from many years of observation along with the knowledge I have gleaned from my connection to realize my goal as a contributor to the sabermetric community and to the enjoyment of baseball fans everywhere. Here we go!


Is the effectiveness of a sinker dependent on from which side of the rubber the pitcher throws?

I was in Florida in mid March for spring training, talking with a minor league coach when he mentioned that he and a former all star pitcher were in a disagreement about how to throw a sinker. Their debate centers on where a pitcher should stand on the rubber to throw a sinker most effectively. We all understand that a pitcher should not move all over the rubber to become more effective on a single pitch. This would obviously tip off the hitters as to what type of pitch might be coming. But for argument’s sake, a team might have some newly transformed position players learning to throw different pitches. Wouldn’t a team want to know if, for some pitches, it was more beneficial to stand on one side of the rubber than another?

I consider myself a pretty observant guy, but I will have to admit that I never really paid much attention to where a pitcher stood on the rubber. To me the juicy part is watching the ball just after it is released. The dance, dip, duck and dive a pitcher is able to command of the ball is where the action is as far as I am concerned. So watching what a pitcher does before he even starts his motion was asking a little much. Nonetheless, I was certain that, with so many pitchers in the majors, a breakdown of the data would show that there was no single starting point on the rubber. Every pitcher is different, right?


I started my analysis by downloading the last four years (2009-2012) of PitchFx data. Most of us know this already, but PitchFx data imposes some limitations on the analysis. Unlike Trackman, PitchFx first records each pitch at 50’ from home plate, not at the actual release point. PitchFx calls this data point “x0”, and for all intents and purposes it is pretty good data: most pitchers stride approximately 5 to 6’ from the rubber, and with arm’s length added in we are talking about a difference of a couple of percentage points from Trackman’s release point metric. But, full disclosure, it is not exactly the release point. Another factor that I didn’t measure is a pitcher’s motion to the plate. Some pitchers throw “across” their bodies rather than down a straight line, and even fewer open up their body to the batter (stepping to the stride leg’s baseline). There is also probably a bit to glean from the difference between the stretch and the wind-up, but without a very in-depth study I assume it is not a factor in the analysis. Lastly, arm length is an unmeasured factor. For example, I didn’t check whether any right-handed pitchers with extra-long arms standing on the first-base side of the rubber were distorting the data.

I started by combining the PitchFx Sinker (SI) and Two-seam fastball (FT) data into a single database. I combined them because the grips for the two pitches are the same, because a two-seam fastball and a sinker can break the same way (down and in to a RH batter from a RH pitcher), and because the two terms are somewhat synonymous in major league vernacular. Maybe somewhere along the line the pitch was invented twice (north and south), and the name depends on region, like asking for a Coke… it’s a “soda”, a “pop”, or a “tonic” depending on where you are in the States. Maybe in the South it was labeled a sinker and in the North it was taught as a “two-seamer”? Either way it’s the same pitch as far as I am concerned, and the etymology of pitch naming is a different topic for a different time.

Back to the question above about every pitcher being different: I was wrong. Using the 2012 data I created a frequency distribution for right-handed pitchers (figure 1), and as you can see there is a definite focal area at around the -2’ point from the centerline of the pitching rubber (and home plate).


Figure 1 – Right-handed pitchers in 2012

This shows that most pitchers start from about the same side, which I determined to be the right side of the rubber (3rd base side). I determined this by adding 9” to one-half the length of the pitching rubber (24”), which comes to 21” (9”+12”). Add in arm length and you can see that an x0 at or beyond 2’ from the centerline (remember we are using negatives here, so -2 or lower) should indicate that the pitcher is throwing from the right side. The 9” used above is half the shoulder width of an average man, which is around 18”. This figure is based on studies of the “biacromial diameter” of male shoulders in 1970 (pg. 28 Vital and Health Statistics – Data from the National Health Survey). I think we can all agree that 18” is probably conservative by today’s growth standards.

As I mentioned in the limitations above, I don’t account for arm length or pitcher motion. Therefore I needed to make sure that there really are right-handed pitchers throwing from the left-hand side of the rubber, and not just a bunch of super-long-armed, cross-body throwers. With the data in hand I was able to identify which pitchers had thrown the ball closer to the centerline of the rubber and therefore would be good candidates for standing on the left side of the rubber. The first pitcher with a higher (>-2) x0 value was Yovani Gallardo of the Milwaukee Brewers. Without knowing Gallardo’s motion I needed to go to the video. From the video, you can clearly see that Gallardo starts on the left side of the rubber and throws fairly conventionally, straight down the line to the batter.

I wanted to keep this as simple as possible, breaking up the pitchers into two categories – Left side or Right side. Without looking at video for each pitcher I had to come up with a tipping point for classifying the side based on the x0 data I had available. If we simply take what we determined above and apply it to the left-hand side we will come up with 1 (starting on left side of rubber) and an x0 of 0. But it isn’t quite that simple. The frequency chart shows that fewer than 1,000 pitches were thrown in 2012 with an x0 greater than or equal to 0. Gallardo threw 504 pitches himself in 2012. So we have to increase the scope a bit. Arranging the x0 data into quartiles, we see that the upper or lower quartile – depending on handedness – is around -1 or 1 (remember we are using negatives), so for a right-handed pitcher the x0 splits are:

For left handers:

Because I am trying to stay conservative, and because these are not release point numbers, I use 1 and -1 as the cut-offs for classification, depending on the handedness of the pitcher. Using these numbers provided a pretty clean break in the distributions (90-10%).
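For anyone who wants to repeat the classification, here is a minimal sketch of the rule. The field names (`p_throws`, `pitch_type`, `x0`), the sample pitches, and the function names are illustrative assumptions, as is my reading of the cut-offs: a right-hander with x0 above -1 counts as left rubber, and a left-hander with x0 below 1 counts as right rubber. Where exactly a pitch sitting right on the cut-off belongs is also a guess.

```python
SINKER_TYPES = {"SI", "FT"}  # sinkers and two-seamers treated as one pitch


def rubber_side(throws, x0, cutoff=1.0):
    """Return 'RR' (right rubber, 3rd base side) or 'LR' (left rubber).

    x0 is in feet from the centerline; negative values are toward the
    3rd base side. The direction of each test is an assumed reading.
    """
    if throws == "R":
        return "LR" if x0 > -cutoff else "RR"
    if throws == "L":
        return "RR" if x0 < cutoff else "LR"
    raise ValueError(f"unknown handedness: {throws!r}")


def matchup(throws, side):
    """Same side is RH-RR or LH-LR; anything else is cross side (XSide)."""
    same = (throws == "R" and side == "RR") or (throws == "L" and side == "LR")
    return "Same side" if same else "XSide"


# Hypothetical pitches, not real PitchFx rows.
pitches = [
    {"pitcher": "A", "p_throws": "R", "pitch_type": "SI", "x0": -2.1},
    {"pitcher": "B", "p_throws": "R", "pitch_type": "FT", "x0": -0.4},
    {"pitcher": "C", "p_throws": "L", "pitch_type": "SI", "x0": 2.0},
    {"pitcher": "D", "p_throws": "L", "pitch_type": "CU", "x0": 0.3},  # not a sinker
]

for p in pitches:
    if p["pitch_type"] in SINKER_TYPES:
        side = rubber_side(p["p_throws"], p["x0"])
        print(p["pitcher"], p["p_throws"], side, matchup(p["p_throws"], side))
```

Classifying pitch by pitch, rather than pitcher by pitcher, keeps things simple but means a pitcher whose x0 hovers near the cut-off can land in both groups.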


So who was right, the all star pitcher or the minor league pitching coach? Is there an advantage depending on where the pitcher stands on the rubber? Neither – both of them. It’s a tie.

What can I say; my initial analysis is a bit anticlimactic, but not for lack of effort. The labels used below are:

  • LH or RH (Handedness)
  • RR or LR (Right or Left Rubber)
  • B – Balls
  • K – Strikes
  • P – In play (No Outs)
  • O – In play (Outs)
  • BackK – Called Strikes
  • FT – Two seam fastballs
  • SI – Sinkers
  • Efficiency – O/(P+O)
  • XSide – Cross Side (i.e. RH-LR or LH-RR)
  • Same side – LH-LR or RH-RR
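The efficiency label is the only one that involves arithmetic, so here is a small sketch of it. The function name and the counts are invented purely to show the calculation; they are not the numbers from my tables.

```python
def efficiency(in_play_no_out, in_play_out):
    """Efficiency = O / (P + O): share of balls put in play that became outs."""
    total = in_play_no_out + in_play_out
    if total == 0:
        raise ValueError("no balls in play")
    return in_play_out / total


# Hypothetical counts for two groups of pitches.
xside = efficiency(in_play_no_out=3000, in_play_out=7000)
same_side = efficiency(in_play_no_out=3100, in_play_out=6900)

print(xside)      # 0.7
print(same_side)  # 0.69
print(round((xside - same_side) * 100, 2))  # difference in percentage points: 1.0
```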


Xside  667519


Same Side

The efficiency is so very close. Twelve-hundredths (.12) of a percent is not a lot – 169 outs out of 140,678 – but give any Chicago Cubs fan five of those outs in 2003 and Mr. Bartman would be an afterthought. Which, I am sure, is the way he and all Cubs fans around the world would like it. The efficiency is the same, no other way to put it, which is the beauty of statistics and sabermetrics. Numbers can say so much, even when they are equal.

But the analysis wasn’t all for naught; there are some nuggets to glean from the numbers above. As a segue, I am currently watching Derek Lowe of the Texas Rangers pitch on opening night, and from the left side of the rubber he throws a sinker that dips back over the rear part of the plate for a called strike. For all of the similarities in my analysis, the most striking observation is the difference in called strikes depending on the side of the rubber. If a pitcher, coach or manager could get a strike or a strikeout without the fear of a batter getting a hit or moving a runner forward, they would do it every time. A five percent difference in getting a strike, without the worry of the ball being put into play, would be an interesting thing to know in some tight situations with runners on base. My thought is that the difference comes from the back door being open a little wider when it comes to getting called strikes. With a pitcher throwing X-side you can definitely see a pattern of called strikes on the same side of the plate as the side the pitcher throws from. Positive numbers in the figures below indicate the right side of the plate (1st base side).


With today’s specialization, where pitchers are matched up to batters based on handedness, the ability of a pitcher to throw a strike as it tails back over the plate or close to the plate (or maybe not even close for some of the pitches above) is essential. It appears that umpires are a little more flexible with their perception of the strike zone for these pitchers as well.


I didn’t get the results that I anticipated when I started this analysis, and that is great! As a society we are determined to have a winner! Just as there is “no crying in baseball”, there are no ties in baseball. Even when there is a tie, like on a close play at first, it proverbially goes to the runner. We can’t settle for a tie…. Hockey reduced ties by adding a shootout after overtime. College football removed the tie by introducing sudden death (hopefully the bowl playoff will help eliminate the subjective BCS tie). With no clear-cut advantage (read – TIE) identified in my analysis, a more in-depth analysis could/should be performed to validate the result. Maybe expanding the percentage of X-side pitchers to 15-20, or identifying when pitchers are throwing from the stretch and removing those instances, would alter the results and provide a much-needed winner? If, after all analytical and statistical avenues have been exhausted, there’s still no proven advantage, we can always resort to having the coach and player settle it with a coin flip.

A Case Study in Lineup Construction

Controversy and speculation have surrounded the Texas Rangers’ lineup for the better part of a year. First, Michael Young was a consistent presence in the middle of the Rangers’ order despite lackluster performance. More recently, the departures of Josh Hamilton and Mike Napoli have led many to speculate that the Rangers’ offense would take a step back in 2013. But how did Ron Washington’s lineups compare to an optimized lineup? How will the loss of Hamilton and Napoli affect the Rangers’ run production?

To find out, I wrote a Monte Carlo program which simulated 50 seasons of games for all 362,880 (9!) lineup combinations. It takes as input, for each batter in the lineup, the percentage of plate appearances that end in a single, double, triple, home run, walk, or strikeout. The outcome of each at-bat is determined by a random number generator as if each batter faces a league-average pitcher, and base runners advance according to the league averages for taking extra bases. While it does not include all the variations of pitcher quality, player speed and defensive quality, it gives an adequate picture of the effectiveness of various lineups.
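To give a flavor of the approach, here is a heavily simplified sketch of this kind of simulation. It is not the program used for the results below: the function names, the single `extra_base_prob` standing in for league-average baserunning, the rule that outs never move runners, and the identical hypothetical batters are all assumptions made to keep the example short.

```python
import random

OUTCOMES = ("1B", "2B", "3B", "HR", "BB", "K")


def plate_appearance(rates, rng):
    """Draw one outcome; leftover probability is an out on a ball in play."""
    r = rng.random()
    cumulative = 0.0
    for name in OUTCOMES:
        cumulative += rates[name]
        if r < cumulative:
            return name
    return "OUT"


def advance(bases, outcome, rng, extra_base_prob=0.4):
    """Return (new_bases, runs); bases = [first, second, third] occupancy."""
    first, second, third = bases
    runs = 0
    if outcome == "BB":  # only forced runners move
        if first and second and third:
            runs = 1
        third = third or (first and second)
        second = second or first
        first = True
    elif outcome == "1B":
        runs += third
        new_second = new_third = False
        if second:  # runner on second scores or holds at third
            if rng.random() < extra_base_prob:
                runs += 1
            else:
                new_third = True
        if first:  # runner on first takes third only if it is open
            if not new_third and rng.random() < extra_base_prob:
                new_third = True
            else:
                new_second = True
        first, second, third = True, new_second, new_third
    elif outcome == "2B":
        runs += second + third
        new_third = False
        if first:  # runner on first scores or holds at third
            if rng.random() < extra_base_prob:
                runs += 1
            else:
                new_third = True
        first, second, third = False, True, new_third
    elif outcome == "3B":
        runs += first + second + third
        first, second, third = False, False, True
    elif outcome == "HR":
        runs += first + second + third + 1
        first = second = third = False
    return [first, second, third], runs


def simulate_game(lineup, rng, innings=9):
    """Runs scored in one full game; the batting order carries across innings."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]
        while outs < 3:
            outcome = plate_appearance(lineup[batter % len(lineup)], rng)
            batter += 1
            if outcome in ("K", "OUT"):
                outs += 1  # simplification: runners never advance on outs
            else:
                bases, scored = advance(bases, outcome, rng)
                runs += scored
    return runs


def runs_per_game(lineup, games, rng):
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games


# Nine identical hypothetical hitters (per-plate-appearance rates).
average_hitter = {"1B": 0.155, "2B": 0.045, "3B": 0.005,
                  "HR": 0.027, "BB": 0.080, "K": 0.190}
lineup = [average_hitter] * 9
rng = random.Random(2013)
rpg = runs_per_game(lineup, games=2000, rng=rng)
print(round(rpg, 2))
```

Comparing lineups then amounts to calling `runs_per_game` on each ordering of nine different batters (for example via `itertools.permutations`) and comparing the averages. With 362,880 orderings that is a lot of games, so expect a long run.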

Let’s first look at the effect of moving Young from the 5th spot to the 9th spot. We’ll start with the most frequently occurring lineup from 2012:

Ian Kinsler
Elvis Andrus
Josh Hamilton
Adrian Beltre
Michael Young
Nelson Cruz
David Murphy
Mike Napoli
Mitch Moreland

We’ll plot a histogram of the runs per game (labeled rpg in the plots, always full 9 innings games) scored by all 362,880 possible lineup combinations, all 40,320 lineup combinations with Young batting 5th, and all 40,320 lineup combinations with Young batting 9th (y-axis is frequency of occurrence, note the logarithmic scale).

2012 Lineup distribution, Young in 5 slot vs 9 slot

Most possible lineup combinations produce the same number of runs to within 0.1 runs per game. No matter the lineup combination, the variation in runs scored is around 16 runs a year. For the Rangers’ lineup, lineup optimization is a relatively small effect. Lineups with different hitters may show a greater or lesser dependence of run scoring on lineup construction.

The difference between batting Michael Young 5th in the order and batting him 9th is smaller still: 0.02 runs per game, or 3 runs over the course of a year. Given the hitters in the Rangers lineup, batting Young 5th in the order did not make a significant difference. But there was another option: Ron Washington could have substituted Craig Gentry for Michael Young. We again plot a histogram of the runs per game scored for all possible lineup combinations with Gentry batting (red) or Michael Young batting (blue).

Rangers Lineup Distribution, Young vs. Gentry

Again, we find the difference to be minimal; this time roughly 0.01 runs per game, or a mere 1.6 runs per season. While it was painful to watch Young batting 5th in 2012, the increased production at the bottom of the lineup largely offset the loss of production in the middle of the lineup. So what happens now that the Rangers’ lineup has lost Hamilton, Napoli and Young in exchange for AJ Pierzynski, Lance Berkman, and Leonys Martin/Craig Gentry? Based on Ron Washington’s lineups in spring training, a likely common lineup for the Rangers in 2013 is as follows:

Ian Kinsler
Elvis Andrus
Lance Berkman
Adrian Beltre
Nelson Cruz
AJ Pierzynski
David Murphy
Mitch Moreland
Leonys Martin

I ran all possible lineup combinations in which Adrian Beltre batted 2nd, 3rd or 4th for both the 2012 and likely 2013 Rangers’ lineup. For the 2013 Rangers’ lineup, I used projections (ZiPS, Steamer, Oliver, Bill James) for the upcoming season to seed the simulation with the hitters’ likely production. Again, a histogram of runs scored per game for all these lineup combinations, with 2012 in blue and 2013 in red.

2013 Rangers Lineup Distribution vs 2012 Lineup Distribution

The fitted peaks predict a 0.22 runs per game increase for the Rangers in 2013, or roughly 36 runs over the course of the year. The non-Gaussian (that is, non-normal) tail of the 2013 distribution indicates it might be possible to improve even more.

We will finish with comparisons of the optimized lineups for 2012 and 2013 to the most usual/expected lineups for those years.

2012 Lineup    | 2012 Optimized | 2013 Lineup    | 2013 Optimized
5.03 rpg       | 5.11 rpg       | 5.29 rpg       | 5.34 rpg
Ian Kinsler    | David Murphy   | Ian Kinsler    | Ian Kinsler
Elvis Andrus   | Adrian Beltre  | Elvis Andrus   | Lance Berkman
Josh Hamilton  | Josh Hamilton  | Lance Berkman  | Leonys Martin
Adrian Beltre  | Mitch Moreland | Adrian Beltre  | Adrian Beltre
Michael Young  | Nelson Cruz    | Nelson Cruz    | Nelson Cruz
Nelson Cruz    | Mike Napoli    | AJ Pierzynski  | Mitch Moreland
David Murphy   | Ian Kinsler    | David Murphy   | AJ Pierzynski
Mike Napoli    | Michael Young  | Mitch Moreland | David Murphy
Mitch Moreland | Elvis Andrus   | Leonys Martin  | Elvis Andrus

We’ll start with the big picture. While moving/substituting for Michael Young in 2012 would have made little difference in run production, an optimized lineup would have increased the Rangers’ run total by 13 runs over the course of the year. Not much, but it would likely have been enough to have won the division instead of losing to the A’s. Of course, it is much easier to optimize a lineup when you already know how everyone is going to perform; using an optimized lineup based on 2012 projections wouldn’t have netted the 13 run increase. Most notably, leading off with Murphy (in his breakout year) instead of Kinsler (in his down year) to increase production is not a move one could expect an organization to predict before any games had been played in 2012.

Second, the probable lineup for the Rangers in 2013 is projected to score 8 runs a year less than an optimized lineup. Given the large variance in the production of a hitter as compared to his projections, these lineups seem virtually equivalent.
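The season totals quoted above are just the runs-per-game gaps from the table scaled up to a 162-game schedule. A quick sketch of that arithmetic (the function name is mine):

```python
GAMES_PER_SEASON = 162


def season_run_gap(rpg_actual, rpg_optimized, games=GAMES_PER_SEASON):
    """Extra runs over a full season implied by a runs-per-game difference."""
    return (rpg_optimized - rpg_actual) * games


print(round(season_run_gap(5.03, 5.11)))  # 2012: 13
print(round(season_run_gap(5.29, 5.34)))  # 2013: 8
```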

The optimized lineups show different characteristics than the lineups generated by Ron Washington. The optimized lineups forgo batting Elvis Andrus second in favor of a power hitter with a good average. Elvis Andrus is instead relegated to the 9th spot. The 2013 optimized lineup puts a lot of faith in rookie Leonys Martin, due entirely to some very respectable projections for the coming year (and not knowing he’s a rookie). Given the uncertainty about how much offense Martin will produce in 2013, having Martin bat at the bottom of the order, as in Ron Washington’s lineup, seems prudent. Finally, the optimized lineups prefer Mitch Moreland in the middle of the lineup instead of at the bottom of the order as in Washington’s lineups.

If the Rangers are looking to optimize their lineup for 2013, this simulation indicates two main points to consider: moving Moreland to the middle of the order, and batting Andrus 9th.