## Constructing a Lineup for the Blue Jays

Jose Bautista recently came off the DL for a second time this year. John Gibbons has stated that he will predominantly DH and not occupy his usual spot in RF (this is good news if you’re a fan of the Blue Jays or a fan of outfield defense in general). But perhaps the bigger question is, where in the lineup is he going to hit? Last year he was their No. 3. He then moved to the leadoff spot, and he’s even hit second for 10 games this year.

The reason this is even a perceived issue is that Devon Travis has looked quite decent in the leadoff spot in Bautista’s absence. But let’s get something straight: Travis is no Bautista. Coming into the 2016 season, Bautista’s numbers since 2010, compared to every other human in MLB, have him 1st in HR, 4th in wRC+, 3rd in wOBA, 3rd in runs scored, and 2nd in BB%. That’s spectacular. The question has become: do you keep *both* Travis and Bautista at the top of the lineup and simply shift Josh Donaldson and Edwin Encarnacion down one slot?

That’s the reason I started thinking about this; I was perplexed that *both* Bautista and Travis were going to be put ahead of Donaldson and Encarnacion. The idea that the Blue Jays’ two best hitters would be moved down the lineup for one player with just about a full season of MLB under his belt and another who’s had a little more than 80 plate appearances since late June didn’t add up. This is a pennant race, supposedly the most critical time of the year.

Don’t get me wrong, this isn’t about dumping on Bautista, or even Travis. They’re great and good hitters, respectively. It’s about maximizing the production of your lineup. So before I moan and let everyone know my opinion is best, I thought I’d look at the data and let the numbers speak for themselves.

Hitting statistics from 2002 to 2015 were gathered and filtered by batting order. This produced 420 cases (30 teams over 14 seasons) with six variables per place in the batting order (wOBA, BB%, ISO, wRC+, OBP, and HR). The data were then analyzed with multiple regression to identify which metrics at which spots in the order best predicted team runs. Results can be seen below.

Figure 1. R² values for total team runs with wOBA values for each spot in the batting order.

The most obvious component of Figure 1 is the drop in R² for the 3rd-place hitter. At first this may seem counter-intuitive, as it’s typically assumed that your 3rd-place hitter is the team’s best, and that as that player goes, so goes the team. But the most likely rationale is that most teams, regardless of how awful or great they are, can typically muster at least one decent hitter. They place that hitter 3rd and away they go. Think about the Blue Jays and the Tigers in 2015: they had Jose Bautista and Miguel Cabrera, respectively. Comparing their 2015 stats shows an edge for Cabrera, with a .413 wOBA to Bautista’s .389. Yet the Blue Jays vastly outscored the Tigers. This is because good-to-great teams have more than one “3rd-place hitter,” and they apparently stack them 2nd and 4th in the order. In fact, the wOBA of the 2nd- and 4th-place hitters combined to account for slightly more than 50% of the variance explained when analyzing team runs.
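For readers who want to see what a per-slot R² like the ones in Figure 1 involves, here is a minimal sketch. It is not the analysis behind the figure: the data below are randomly generated placeholders (the real inputs would be the 420 team-seasons described above), and the function name `r_squared` is my own. For a one-predictor regression with an intercept, R² is simply the squared correlation between the predictor and the outcome.

```python
import random

def r_squared(x, y):
    """R^2 of a simple least-squares fit of y on x (with intercept).

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# ASSUMPTION: synthetic stand-in data, NOT real MLB numbers.
# 420 rows (team-seasons), 9 columns (wOBA for each lineup slot).
random.seed(0)
woba = [[random.gauss(0.320, 0.025) for _ in range(9)] for _ in range(420)]

# ASSUMPTION: a made-up relationship between lineup wOBA and team runs,
# used only so the example has something to fit.
runs = [700 + 1500 * (sum(row) / 9 - 0.320) + random.gauss(0, 25) for row in woba]

# One R^2 per lineup slot: team runs regressed on that slot's wOBA alone.
r2_by_slot = [r_squared([row[i] for row in woba], runs) for i in range(9)]

for slot, r2 in enumerate(r2_by_slot, start=1):
    print(f"slot {slot}: R^2 = {r2:.2f}")
```

Because the placeholder data treat every slot identically, the printed values say nothing about real lineups; only the mechanics carry over.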

So if a team like the Blue Jays can afford to have Jose Bautista taking up the “menial” 3rd spot in the order, shouldn’t they do it? The mean wRC+ and wOBA for No. 3 hitters in the AL last year were 116 and .351, respectively. Bautista, who’s having a down year by his standards, has a wRC+ of 115 and a wOBA of .346 as of August 29th. ZiPS has him closing out the year with a wRC+ of 132 and a wOBA of .369, well above league average. So it could work.

But does Bautista hitting 3rd help them?

Well, since Donaldson has the best offensive numbers (and is the reigning MVP), the offense should be built around him. And since the 2nd and 4th spots in the order are the most crucial in the team-runs analysis presented above, we’re going to move forward with the idea that Donaldson hits 2nd. A runs-scored model that predicts the number of runs a player will score when batting in the #2 spot can be seen in Figure 2.

Figure 2. Predicted runs for the #2 hitter by actual runs scored. Using wOBA values for all 9 hitters: predicted runs = (315.5 × wOBA2) + (132.15 × wOBA3) + (65.6 × wOBA4) + (71.63 × wOBA5) − 100.

This analysis produces an R² value of .7 and incorporates, in order of model entry, the wOBA of the 2nd, 3rd, 4th, and 5th hitters. The variables that enter are quite sensible: the greatest source of variability in the 2nd hitter’s runs scored is the wOBA of *that* hitter, followed by the next three spots in the lineup, all in sequential order. If this is done for each spot in the lineup, the same pattern emerges: the hitter’s own wOBA is the greatest source of runs-scored variability, and the 2-3 hitters following him account for an additional ~20%. So essentially, if you want to score runs, bunch your hitters together. Don’t spread them out, and don’t try to place a poor hitter in the middle to get him more fastballs. Stick all of your threats in a row.
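To make the Figure 2 equation concrete, here is a small sketch that plugs wOBA values into it. The coefficients are the ones from the caption; the function name and the wOBA inputs are hypothetical, chosen only to illustrate the arithmetic, and are not any player’s actual numbers.

```python
def predicted_runs_no2(woba2, woba3, woba4, woba5):
    """Predicted season runs for the #2 hitter, per the Figure 2 equation."""
    return (315.5 * woba2
            + 132.15 * woba3
            + 65.6 * woba4
            + 71.63 * woba5
            - 100)

# ASSUMPTION: illustrative wOBA values, not real player stats.
base = predicted_runs_no2(woba2=0.400, woba3=0.370, woba4=0.360, woba5=0.330)

# Same four hitters, with the 3rd and 4th hitters swapped.
swapped = predicted_runs_no2(woba2=0.400, woba3=0.360, woba4=0.370, woba5=0.330)

print(f"original order: {base:.1f} runs")
print(f"3rd/4th swapped: {swapped:.1f} runs")
print(f"difference: {base - swapped:.1f} runs")
```

With inputs like these, swapping the two hitters behind the #2 man moves the prediction by less than a run, while the #2 hitter’s own wOBA carries by far the largest coefficient.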

But is there a specific combination of clustering? Predicting how many runs Donaldson will score in a season, depending on the order of Bautista and Encarnacion around him, reveals the following results: *132 runs/season* where the order is JB – JD – EE, *130 runs/season* where the order is JD – EE – JB, and *129 runs/season* where the order is JD – JB – EE. Again, this is the number of runs scored specifically by Donaldson. The differences between combinations are within the model’s error, and they wouldn’t be meaningful over the remaining ~30 games of the season. So as long as the three are clustered together, it’s fine. This indicates that sensible lineup options have JB – JD – EE batting in succession, with Donaldson occupying the 2nd spot, and with some combination of Travis, Troy Tulowitzki, and Russell Martin surrounding them (depending on matchups/splits/who’s hot/etc.).

If clustering these three hitters together is the option, the question that follows will invariably be: who leads off? The answer: it really doesn’t seem to matter (as long as that person isn’t Kevin Pillar). A similar run-prediction model has Travis, Martin, Tulo, Saunders, Bautista, and Upton all averaging over 110 runs/season when batting leadoff with JD and EE behind them.