A Case Study in Lineup Construction
Controversy and speculation have surrounded the Texas Rangers’ lineup for the better part of a year. First, Michael Young was a consistent presence in the middle of the Rangers’ order despite lackluster performance. More recently, the departures of Josh Hamilton and Mike Napoli have led many to speculate that the Rangers’ offense would take a step back in 2013. But how did Ron Washington’s lineups compare to an optimized lineup? How will the loss of Hamilton and Napoli affect the Rangers’ run production?
To find out, I wrote a Monte Carlo program that simulated 50 seasons of games for all 362,880 (9!) lineup combinations. It takes as input each batter’s rates of singles, doubles, triples, home runs, walks, and strikeouts per plate appearance. The outcome of each at bat is determined by a random number generator as if each batter faces a league average pitcher, and base runners advance according to the league averages for taking extra bases. While the model does not include all the variations of pitcher quality, player speed and defensive quality, it allows for an adequate picture of the effectiveness of various lineups.
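As a rough illustration of this kind of simulation, here is a minimal Python sketch. It is not the program used for the results in this article: the outcome rates in `EXAMPLE_BATTER` are made up, all function names are hypothetical, and base runners simply advance as many bases as the batter instead of following league-average extra-base rates.

```python
import random

# Hypothetical per-plate-appearance rates for one batter (illustrative, not real data).
# Whatever probability is left over is treated as an out on a ball in play.
EXAMPLE_BATTER = {"1B": 0.15, "2B": 0.05, "3B": 0.005, "HR": 0.03, "BB": 0.08, "K": 0.18}

def draw_outcome(rates, rng):
    """Pick one plate-appearance outcome from a batter's rate table."""
    r = rng.random()
    cumulative = 0.0
    for outcome, p in rates.items():
        cumulative += p
        if r < cumulative:
            return outcome
    return "OUT"

def advance(bases, outcome):
    """Return (new_bases, runs); bases is [first, second, third] as booleans.

    Assumption: on a hit, every runner advances exactly as many bases as the batter.
    """
    if outcome == "BB":
        first, second, third = bases
        runs = 1 if (first and second and third) else 0
        # Only forced runners move on a walk.
        return [True, second or first, third or (first and second)], runs
    n = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}[outcome]
    runs = 0
    new_bases = [False, False, False]
    for i, occupied in enumerate(bases):
        if occupied:
            position = i + 1 + n  # 1 = first base, 4 or more = scored
            if position >= 4:
                runs += 1
            else:
                new_bases[position - 1] = True
    if n == 4:
        runs += 1  # the batter scores on a home run
    else:
        new_bases[n - 1] = True
    return new_bases, runs

def simulate_game(lineup, rng, innings=9):
    """Simulate one full game and return the runs scored by this lineup."""
    runs = 0
    batter = 0
    for _ in range(innings):
        outs = 0
        bases = [False, False, False]
        while outs < 3:
            outcome = draw_outcome(lineup[batter], rng)
            batter = (batter + 1) % len(lineup)  # batting order carries over between innings
            if outcome in ("K", "OUT"):
                outs += 1
            else:
                bases, scored = advance(bases, outcome)
                runs += scored
    return runs

def runs_per_game(lineup, games, seed=0):
    """Average runs per game over a number of simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games
```

Calling `runs_per_game([EXAMPLE_BATTER] * 9, games=8100)` would correspond to 50 seasons of 162 games for a lineup of nine identical hitters. A real comparison would pass nine different rate tables in batting order.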
Let’s first look at the effect of moving Young from the 5th spot to the 9th spot. We’ll start with the most frequently occurring lineup from 2012:
Ian Kinsler |
Elvis Andrus |
Josh Hamilton |
Adrian Beltre |
Michael Young |
Nelson Cruz |
David Murphy |
Mike Napoli |
Mitch Moreland |
We’ll plot a histogram of the runs per game (labeled rpg in the plots, always for full 9-inning games) scored by all 362,880 possible lineup combinations, all 40,320 lineup combinations with Young batting 5th, and all 40,320 lineup combinations with Young batting 9th (the y-axis is frequency of occurrence; note the logarithmic scale).
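The lineup counts above can be checked directly: nine hitters give 9! orderings, and pinning one hitter to a slot leaves 8! orderings of the other eight. A short Python sketch, where the helper name `lineups_with_fixed_slot` is hypothetical and surnames stand in for the nine hitters:

```python
from itertools import permutations
from math import factorial

PLAYERS = ["Kinsler", "Andrus", "Hamilton", "Beltre", "Young",
           "Cruz", "Murphy", "Napoli", "Moreland"]

def lineups_with_fixed_slot(players, player, slot):
    """Yield every batting order with `player` pinned to 0-based index `slot`."""
    others = [p for p in players if p != player]
    for order in permutations(others):
        yield order[:slot] + (player,) + order[slot:]

total_lineups = factorial(len(PLAYERS))      # 9! = 362880
young_fifth = sum(1 for _ in lineups_with_fixed_slot(PLAYERS, "Young", 4))
young_ninth = sum(1 for _ in lineups_with_fixed_slot(PLAYERS, "Young", 8))
print(total_lineups, young_fifth, young_ninth)  # prints 362880 40320 40320
```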
Most possible lineup combinations produce the same number of runs to within 0.1 runs per game. No matter the lineup combination, the variation in runs scored is around 16 runs a year. For the Rangers’ lineup, lineup optimization is a relatively small effect. Lineups with different hitters may show a greater or lesser dependence of run scoring on lineup construction.
The difference between batting Michael Young 5th in the order and batting him 9th is smaller still: 0.02 runs per game, or 3 runs over the course of a year. Given the hitters in the Rangers’ lineup, batting Young 5th in the order did not make a significant difference. But there was another option: Ron Washington could have substituted Craig Gentry for Michael Young. We again plot a histogram of the runs per game scored for all possible lineup combinations with Gentry batting (red) or Michael Young batting (blue).
Again, we find the difference to be minimal; this time roughly 0.01 runs per game, or a mere 1.6 runs per season. While it was painful to watch Young batting 5th in 2012, the increased production at the bottom of the lineup largely offset the loss of production in the middle of the lineup. So what happens now that the Rangers’ lineup has lost Hamilton, Napoli and Young in exchange for AJ Pierzynski, Lance Berkman, and Leonys Martin/Craig Gentry? Based on Ron Washington’s lineups in spring training, a likely common lineup for the Rangers in 2013 is as follows:
Ian Kinsler |
Elvis Andrus |
Lance Berkman |
Adrian Beltre |
Nelson Cruz |
AJ Pierzynski |
David Murphy |
Mitch Moreland |
Leonys Martin |
I ran all possible lineup combinations in which Adrian Beltre batted 2nd, 3rd or 4th for both the 2012 and likely 2013 Rangers’ lineups. For the 2013 Rangers’ lineup, I used projections (ZiPS, Steamer, Oliver, Bill James) for the upcoming season to seed the simulation with the hitters’ likely production. Again, we plot a histogram of runs scored per game for all these lineup combinations, with 2012 in blue and 2013 in red.
The fitted peaks predict a 0.22 runs per game increase for the Rangers in 2013, or roughly 36 runs over the course of the year. The non-Gaussian (that is, non-normal) tail of the 2013 distribution indicates it might be possible to improve even more.
We will finish with comparisons of the optimized lineups for 2012 and 2013 to the most usual/expected lineups for those years.
2012 Lineup | 2012 Optimized | 2013 Lineup | 2013 Optimized |
5.03 rpg | 5.11 rpg | 5.29 rpg | 5.34 rpg |
Ian Kinsler | David Murphy | Ian Kinsler | Ian Kinsler |
Elvis Andrus | Adrian Beltre | Elvis Andrus | Lance Berkman |
Josh Hamilton | Josh Hamilton | Lance Berkman | Leonys Martin |
Adrian Beltre | Mitch Moreland | Adrian Beltre | Adrian Beltre |
Michael Young | Nelson Cruz | Nelson Cruz | Nelson Cruz |
Nelson Cruz | Mike Napoli | AJ Pierzynski | Mitch Moreland |
David Murphy | Ian Kinsler | David Murphy | AJ Pierzynski |
Mike Napoli | Michael Young | Mitch Moreland | David Murphy |
Mitch Moreland | Elvis Andrus | Leonys Martin | Elvis Andrus |
We’ll start with the big picture. While moving or substituting for Michael Young in 2012 would have made little difference in run production, an optimized lineup would have increased the Rangers’ run total by 13 runs over the course of the year. Not much, but it would likely have been enough to have won the division instead of losing to the A’s. Of course, it is much easier to optimize a lineup when you already know how everyone is going to perform; using an optimized lineup based on 2012 projections wouldn’t have netted the 13-run increase. Most notably, leading off with Murphy (in his breakout year) instead of Kinsler (in his down year) to increase production is not a move one could expect an organization to predict before any games had been played in 2012.
Second, the probable lineup for the Rangers in 2013 is projected to score 8 fewer runs a year than an optimized lineup. Given the large variance in a hitter’s production relative to his projections, these lineups seem virtually equivalent.
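The per-season figures quoted here follow from the runs-per-game values in the table, assuming a 162-game schedule. A small Python check, with illustrative names:

```python
GAMES_PER_SEASON = 162  # assumed regular-season length

def season_runs(rpg_actual, rpg_optimized, games=GAMES_PER_SEASON):
    """Runs gained over a season by moving from the actual to the optimized lineup."""
    return (rpg_optimized - rpg_actual) * games

gain_2012 = season_runs(5.03, 5.11)  # 0.08 rpg
gain_2013 = season_runs(5.29, 5.34)  # 0.05 rpg
print(round(gain_2012), round(gain_2013))  # prints 13 8
```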
The optimized lineups show different characteristics than the lineups generated by Ron Washington. The optimized lineups forgo Elvis Andrus batting second in favor of a power hitter with a good average. Elvis Andrus is instead relegated to the 9th spot. The 2013 optimized lineup puts a lot of faith in rookie Leonys Martin, due entirely to some very respectable projections for the coming year (and not knowing he’s a rookie). Given the uncertainty of how much offense Martin will produce in 2013, having Martin bat in the bottom of the order, as in Ron Washington’s lineup, seems prudent. Finally, Mitch Moreland is preferred in the middle of the lineup in the optimized lineups instead of the bottom of the order as in Washington’s lineups.
If the Rangers are looking to optimize their lineup for 2013, this simulation indicates two main points to consider: moving Moreland to the middle of the order and batting Andrus 9th.
This is good. But the Rangers missed the real playoffs by one game and lineup optimization would have been a nice way to increase their chances at making it.
yes, but as phys noted it is much easier to optimize a lineup when you already know the results. what would be interesting is seeing how far the rangers lineup was from optimal using their 2012 projections, and also comparing how the actual optimal was originally projected. im sure in the 2012 projection having Young bat 5th would have been much closer to optimal.
Would you be willing to share your model?
In your simulation, how do you account for different handed pitching? Shouldn’t there be two optimized lineups, for play against each hand? Also, in game simulations, do the lineups face relievers of different hands? I.E., would a lineup that stacked righties or lefties see a penalty for doing that, something managers like to avoid in real life?
Great stuff overall!
leeroy: Optimizing the 2012 projections is a good idea; will have to see if I can still find those somewhere.
Marc: Have thought about sharing it, but would need to clean it up and make it user friendly first. It’s in C, so you’d need an appropriate compiler to make it work.
Will: There’s no optimization for different handed pitching or relievers; each batter faces a ‘league average’ pitcher. This is one of the things to implement going forward, as well as getting a better handle on how speed affects base running and thus scoring. But for getting a feel for the runs per game distribution/shape, as well as relative scoring efficiency of various lineups, it should do fine.
I loved this article. Have always been interested in lineup construction. Any way a series could be made with lineup analysis for each team? That would be awesome.
Considering the size of the effect, making any substantive comments is woefully premature. Do you know that protection and/or spot hitting in the order counts for basically nothing? Considering how minuscule the effects are, claiming they are greater than the unknowns is a bold and ignorant step.
I agree with Will. The Rangers, in particular, would probably end up with two very different optimized lineups because of Mitch Moreland who has a career .341 wOBA vs RHP and a career .269 wOBA vs LHP. I understand this would be more difficult to do, but I’d expect Moreland to be optimized, as you say, at #5 vs RHP and possibly not even in the lineup against LHP.
Great article, though! Lineup construction talk is always fun.
50 seasons of games? Is that 162×50=8100 games for each lineup? That may seem like a lot of iterations but I have found that it is not. You are going to get a ton of noise from only simulating 8100 games. Try changing your random number seed and look at how different your results come up.
Roy: Thanks, I’m looking at doing at least one NL team, preferably a team with a larger disparity in productivity between players.
df: Not sure what you’re trying to get at; none of the conclusions in the article rest on small differences between the optimizations. The limiting factor for these optimizations will always be variability in player performance vs. player projections, and any of the second order effects will be small in comparison. As noted early in the article, it is a rough first order model, without some of the nuances of second order effects.
DR: You’re quite right with Moreland; he’s not a middle of the order guy with lefties. Without R/L split projections, I demurred on the platoon issue for the time being.
Xeifrank: The statistical error for each simulation was (+/-)0.01 rpg (1-sigma).
Lineup analysis for each team you say? With a simulator? Written in C/C++. Man, you guys missed my whole off-season series on this? 🙂
http://thesoberangels.blogspot.com/2013_01_01_archive.html
I was wondering if you could give me the code for this because I would like to experiment with it as well…
Show your work.