The Value and Consistency of Pitcher Inconsistency

March 13, 2015

There was an article published in 2013 on FanGraphs that focused on the value of starter inconsistency. The basic idea is relatively simple – a starter who does terribly in one start and very well in the next (e.g., 8 runs in 2 innings followed by 2 runs in 8 innings) gives his team better chances to win than one who is mediocre in two starts (5 runs in 5 innings both outings). Mr. Hunter did some math to illustrate the fact, and quantify it somewhat, but it was a relatively rough measure, and I think the concept is intuitive enough not to gain a ton from a rough demonstration. Definitely read that article, though!

I think the first question that comes to mind upon reading that is: is this sustainable? Is consistent inconsistency possible? To find out, I came up with a relatively simple measure of inconsistency within a season. For every pitcher, I calculated the standard deviation of the Game Scores across their starts. If you’re not familiar with Game Score, it’s a Bill James-developed metric that gives pitchers points for outs and strikeouts and docks them points for hits, walks, and runs. It’s mostly a narrative stat, but I think it does a good job of illustrating the quality of a given start. The best start of 2014 by Game Score: Clayton Kershaw’s no-hitter against the Rockies on June 18th, in which he didn’t allow a hit or walk (damn you Hanley Ramirez) and struck out 15, good for a Game Score of 102. The worst: Colby Lewis’s July 10th start, in which he went 2.1 innings and gave up 13 hits and 13 runs. Didn’t walk anybody! Still had the abysmal Game Score of -12. The 2014 Rangers, ladies and gentlemen.
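For readers who want the arithmetic, here is a minimal sketch of the original Bill James Game Score formula. The function name and argument names are my own, and this is not necessarily the exact code behind the numbers in this article.

```python
def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Original Bill James Game Score for a single start.

    Start at 50, add 1 per out, 2 per inning completed after the fourth,
    and 1 per strikeout; subtract 2 per hit, 4 per earned run,
    2 per unearned run, and 1 per walk.
    """
    completed_innings = outs // 3
    late_innings = max(0, completed_innings - 4)
    return (
        50
        + outs
        + 2 * late_innings
        + strikeouts
        - 2 * hits
        - 4 * earned_runs
        - 2 * unearned_runs
        - walks
    )


# Kershaw's no-hitter as described above: 27 outs, 15 strikeouts,
# no hits, runs, or walks.
kershaw = game_score(outs=27, strikeouts=15, hits=0,
                     earned_runs=0, unearned_runs=0, walks=0)
```

Plugging in the Kershaw line gives 50 + 27 + 10 + 15 = 102, the same figure quoted above.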

By looking at the standard deviation of a season’s worth of Game Scores, we get a measure of the inconsistency of their quality. I set a minimum of 10 starts to qualify, which ensures no one is being labeled consistent off a single week of pitching. The usual caveats apply – pitchers needed to be good enough to pick up 10 starts, so this is a snapshot of usage, not just skill. Before looking at the year-to-year correlation, I want to look at the most consistent and inconsistent starters of 2014.
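The measure itself can be sketched in a few lines. The pitcher names and Game Scores below are made up for illustration, and the use of the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption on my part.

```python
from statistics import stdev

MIN_STARTS = 10  # qualification threshold described above


def inconsistency_by_pitcher(game_scores, min_starts=MIN_STARTS):
    """Map each qualifying pitcher to the standard deviation of his Game Scores.

    game_scores: dict of pitcher name -> list of Game Scores, one per start.
    Pitchers with fewer than min_starts starts are left out.
    """
    return {
        name: stdev(scores)
        for name, scores in game_scores.items()
        if len(scores) >= min_starts
    }


def rank_most_inconsistent(game_scores, min_starts=MIN_STARTS):
    """Return (name, stdev) pairs sorted from most to least inconsistent."""
    spreads = inconsistency_by_pitcher(game_scores, min_starts)
    return sorted(spreads.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data, not real 2014 Game Scores.
example = {
    "Pitcher A": [80, 5, 62, 30, 71, 18, 55, 90, 25, 48],   # erratic
    "Pitcher B": [52, 48, 55, 45, 50, 58, 42, 51, 49, 53],  # steady
    "Pitcher C": [70, 20, 65],                              # too few starts
}
ranking = rank_most_inconsistent(example)
```

In this toy data, Pitcher A ranks first, Pitcher B second, and Pitcher C is dropped for falling short of 10 starts.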

Rank	Name	Games Started	FIP	Game Score StDev
1	Miles Mikolas	10	4.77	24.73
2	Jerome Williams	11	4.09	23.25
3	Brandon Cumpton	10	3.22	21.97
4	Robbie Ross	12	4.74	20.88
5	Juan Nicasio	14	4.18	20.84
…
178	Jordan Lyles	22	4.22	11.40
179	Kyle Hendricks	13	3.32	10.98
180	Marco Estrada	18	4.88	10.46
181	Mike Fiers	10	2.99	10.00
182	David Buchanan	20	4.27	9.85

Not surprisingly, we see a lot of starters with fairly low numbers of starts, since extreme values (either high or low) are likely to regress toward the standard deviation for the whole sample (15.53 in 2014) as the number of starts increases. On the consistent end, David Buchanan started his first game for the Phillies on May 20th, and between then and the end of the season, his worst start by Game Score came on June 3rd, when he gave up 7 runs in 6 innings, striking out 2 and walking 6, good for a Game Score of 28. But for a worst start, that’s not that awful, and his best wasn’t that great either – about two weeks later, on June 19th, he threw 7.2 innings of 1-run ball, with 1 walk and 4 strikeouts, and a Game Score of 70. The rest of his season was extremely consistent in its mediocrity, with 16 of his 20 starts having Game Scores between 40 and 60, so it’s no surprise that he takes the bottom spot on this list.

Miles Mikolas was worse, but also much more erratic, with outings like his on August 25th (8 innings, 1 walk, 5 strikeouts, and no runs, Game Score of 80) and on July 7th (3.1 innings, 0 walks, 5 strikeouts (looks fine so far!) and 9 runs (oh), Game Score of 5). Between those two starts, he had an RA9 of 7.15, but my guess is he gave the Rangers a much higher expected win percentage than if he had evenly distributed those runs across two 6-inning outings.
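To put rough numbers on that guess, here is a toy sketch. The RA9 arithmetic comes straight from the two lines above, but everything in the win-probability part is an assumption for illustration: a hypothetical offense that scores 4.5 runs, a hypothetical bullpen that allows half a run per inning, and a Pythagorean-style formula applied to a single game, which is a crude shortcut.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings such as '3.1' (3 innings plus 1 out) to outs."""
    whole, _, partial = ip.partition(".")
    return int(whole) * 3 + int(partial or 0)


def ra9(runs: float, outs: int) -> float:
    """Runs allowed per nine innings (27 outs)."""
    return 27 * runs / outs


def win_prob(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean-style estimate, used per game here as a rough assumption."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical context, not taken from the article or from real data.
TEAM_RUNS = 4.5              # assumed runs scored by the offense
PEN_RUNS_PER_OUT = 0.5 / 3   # assumed bullpen rate: half a run per inning


def start_win_prob(runs: float, ip: str) -> float:
    """Win estimate when the bullpen covers the rest of a nine-inning game."""
    pen_outs = 27 - innings_to_outs(ip)
    return win_prob(TEAM_RUNS, runs + pen_outs * PEN_RUNS_PER_OUT)


# The two Mikolas starts: 8 innings with 0 runs, 3.1 innings with 9 runs.
mikolas_outs = innings_to_outs("8.0") + innings_to_outs("3.1")
mikolas_ra9 = ra9(9, mikolas_outs)

erratic = (start_win_prob(0, "8.0") + start_win_prob(9, "3.1")) / 2
even = start_win_prob(4.5, "6.0")  # same 9 runs split over two 6-inning starts
```

Under these made-up conditions the erratic pair comes out well ahead of the even pair, though the size of the gap depends entirely on the assumed offense and bullpen.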

But does this mean anything when it comes to evaluation? Should a GM view one of the inconsistent starters with a little more optimism for 2015 than one of the consistent starters? In a word, no.

[Figure: year-to-year Game Score standard deviation, 2013 vs. 2014]

That is a pile of random points, and a resulting R² value that is basically zero. The inconsistency of a pitcher in 2013 had almost nothing to do with their inconsistency in 2014, so while inconsistency is a hidden way for a pitcher’s results to be better than they look, it doesn’t appear to be a skill.
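For anyone who wants to run the same check on other seasons, a minimal sketch of the year-to-year R² follows. It assumes each season has already been reduced to a dict of pitcher name to Game Score standard deviation; the function names are my own, and R² here is simply the squared Pearson correlation from a one-variable linear fit.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return sxy / sqrt(sxx * syy)


def year_to_year_r2(season_a, season_b):
    """R-squared between two seasons, using only pitchers present in both."""
    common = sorted(set(season_a) & set(season_b))
    xs = [season_a[name] for name in common]
    ys = [season_b[name] for name in common]
    return pearson_r(xs, ys) ** 2
```

A value near 1 would mean one season's inconsistency predicts the next; a value near 0, as in the plot above, means it tells you almost nothing.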

Even if this were predictable, though, this doesn’t seem to be the sort of thing that would move the needle too far in either direction. The theoretical argument makes sense, but in practice, there are lots of mitigating factors that might make consistency more valuable. Maybe the starter the day before got bombed, and the bullpen really just needs a day off, and a 100% chance of 6 innings/4 runs is more valuable to the team that day than a 50% chance of 8 innings/1 run and a 50% chance of 4 innings/7 runs. There’s also just a lot of randomness, probably enough to drown out the small effect. Inconsistency isn’t consistent year-to-year, and it also isn’t predictable. If a pitcher could control which games he was bad in, and bank some great innings to use when he needed them, that would be a big deal. They can’t.

Managers, however, can. They can use their bad innings in games where the outcome is already practically decided, and save their best innings for the tightest of moments, with optimal bullpen use. Day-to-day inconsistency of a pitcher isn’t predictable, but pitcher-to-pitcher inconsistency of a bullpen is, and a similar argument for its value applies. A team with a lights-out closer (FIP of 2.00) and a pretty terrible long man (FIP of 5.00) is going to win more games than a team with two okay relievers (FIP of 3.50 for both), if the manager of the first team deploys his closer in close games and lets the other pitcher eat innings in blowouts. The ability to choose those spots makes the effect potentially much larger than among starters.
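A toy sketch of that argument, using the FIPs from the example above. The innings totals and leverage weights are invented for illustration, and treating FIP / 9 as expected runs per inning is a simplifying assumption.

```python
def raw_runs(assignments):
    """Expected runs allowed, ignoring game situation.

    assignments: list of (fip, innings, leverage) tuples.
    """
    return sum(fip / 9 * innings for fip, innings, _ in assignments)


def leverage_weighted_runs(assignments):
    """Expected runs allowed, with each inning weighted by how much it matters."""
    return sum(fip / 9 * innings * leverage for fip, innings, leverage in assignments)


# Hypothetical usage: 60 innings each, with the high-leverage role weighted
# 1.8 and the mop-up role weighted 0.4. These weights are made up.
split_pen = [(2.00, 60, 1.8), (5.00, 60, 0.4)]   # closer plus long man
even_pen = [(3.50, 60, 1.8), (3.50, 60, 0.4)]    # two okay relievers
```

Both bullpens allow the same raw number of expected runs, but the split bullpen allows fewer once innings are weighted by importance. That only holds if the manager actually puts the closer in the high-leverage role.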

Balancing that, however, is the fact that relievers just have a much smaller effect on the game, so this still might not be big enough to matter. However, if it did have a noticeable effect, it would give a team an edge that wouldn’t be reflected in measures of collective performance, and so this could be one reason a team beat its BaseRuns estimated record. To see if that was perhaps the case in 2014, I developed a simple measure of bullpen-wide inconsistency. After discarding some more complicated ideas, I settled on calculating the standard deviation of FIP among each team’s eight relief pitchers who threw the most innings. This picks up most of each bullpen’s regulars and semi-regulars, and should be an okay measure of the distribution of skill in a bullpen.

Again, I wanted to first look at the most and least consistent bullpens of 2014 by this measure.

Rank	Team	Innings	FIP	WAR	StDev
1	KCR	464.0	3.29	5.9	1.65
2	HOU	468.2	4.11	0.4	1.54
3	OAK	467.1	3.47	4.0	1.35
…
28	MIN	521.2	3.88	2.0	0.51
29	MIA	510.1	3.20	4.6	0.50
30	SEA	498.1	3.24	4.5	0.50

Seeing the Royals as the most inconsistent bullpen of 2014 is not a surprise. On the one hand, Wade Davis (1.19 FIP), Kelvin Herrera (2.69) and Greg Holland (1.83) combined to throw over 200 innings of absurdly good relief. The next five most-used relievers, however, were Aaron Crow (5.40 FIP), Louis Coleman (5.69), Francisley Bueno (3.84), Michael Mariot (3.93), and Tim Collins (4.80). Those are not good pitchers, and that’s a huge gap between the two groups, but by using the top three in close games and letting the other five eat as many non-crucial innings as possible, Kansas City might have been able to win a lot more games than a bullpen with eight relievers with FIPs around 3.30 (the figure for the bullpen as a whole). The Royals are also a good example of why the advantages of inconsistency might just not show up – Ned Yost was (in-)famous for not using his bullpen optimally, and sticking to strictly defined roles with his relievers, which is the sort of thing that could nullify this effect.
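The Royals’ entry in the table can be checked against the eight FIPs just listed. The sketch below assumes the sample standard deviation (`statistics.stdev`); the helper names are my own, and the selection helper takes innings in any consistent numeric form.

```python
from statistics import stdev


def most_used(relievers, n=8):
    """Pick the n relievers with the most innings.

    relievers: list of (name, innings, fip) tuples.
    """
    return sorted(relievers, key=lambda r: r[1], reverse=True)[:n]


def fip_spread(fips):
    """Sample standard deviation of a list of FIPs."""
    return stdev(fips)


# FIPs quoted above for the Royals' eight most-used relievers:
# Davis, Herrera, Holland, Crow, Coleman, Bueno, Mariot, Collins.
ROYALS_FIPS = [1.19, 2.69, 1.83, 5.40, 5.69, 3.84, 3.93, 4.80]
royals_spread = fip_spread(ROYALS_FIPS)
```

The result lands right around the 1.65 shown for Kansas City in the table.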

The consistent bullpens are pretty boring, so I won’t spend much time on them. Seattle’s worst reliever by FIP among the eight most-used was Joe Beimel (4.18), and the best was Charlie Furbush (2.80), with the other six spread fairly evenly between them. Consistency has advantages, but not being able to turn to a true shutdown reliever when needed, or having to use a fairly valuable arm even in a blowout, might have its own costs, even compared to a bullpen with similar overall skill, such as Kansas City’s.

Unfortunately, either because of manager incompetence, the smallness of the effect, or something else entirely, bullpen inconsistency does very little to explain BaseRuns over- or under-performance in 2014. In the graph below, teams that beat their BaseRuns record are on the right, while those that fell below are on the left, and more inconsistent bullpens sit higher while more consistent bullpens sit lower.

[Figure: BaseRuns over- or under-performance vs. bullpen inconsistency, 2014]

That, again, is basically a random collection of points. In the top right are the Royals, with both the most inconsistent bullpen and the biggest positive gap between their actual winning percentage and the BaseRuns estimate (5.0%). But in the top left is Houston, with the second-most inconsistent bullpen and the second-largest negative gap between their actual and BaseRuns winning percentages (-4.6%).

At best, this is inconclusive, but I find the idea really interesting. This does at least show that, on an individual pitcher basis, inconsistency is not predictable, even when looking at previous years, which I think bucks conventional wisdom in a real way. Seeing which bullpens and pitchers were particularly erratic in 2014 is fun, and it’s something I’ll be keeping an eye on in 2015.
