BB and K%

by Slava Heretz

April 19, 2014

The great debate has been raging for years: which strikeout-related metric is the best predictor of actual pitching success? Some would say there is no right or wrong answer, that each metric has its own unique merit and value, and that one must look at certain strikeout-related metrics in combination with others. Unfortunately, as tragic as it may seem, the statistical evidence begs to differ. Statistics tell us there is in fact a right answer, and it’s a whopper.

Let’s start with K/9. Looking at all 2013 pitchers with 80+ innings, the correlation (R²) between strikeouts per nine innings and ERA is a solid .1081. This figure has been consistent, plus or minus a few hundredths, for the past five years, so looking at other seasons turns up nothing exciting or anomalous. Yu Darvish leads the category, with Tony Cingrani, Max Scherzer, Anibal Sanchez, and A.J. Burnett rounding out the top five. Additionally, eight of the top ten K/9 leaders finished with sub-3.10 ERAs. So K/9 is a decent indicator all around.

K/BB gets a bit more interesting. The linear correlation jumps to .1671, more than a 50% increase over K/9. Clayton Kershaw, Cliff Lee, and Adam Wainwright all leap into the top ten of this metric, and Hisashi Iwakuma climbs into the top fifteen: four elite hurlers of 2013 who were left off the K/9 leaderboard.

But the real gem is K%. Its correlation is nearly double that of K/9. Plus, every pitcher in the top fifteen in this category ended the year with a sub-3.30 ERA, whereas Scott Kazmir (4.04) and Josh Johnson (6.20) smeared the good name of the K/9 leaderboard, and Kevin Slowey (4.11) and Dan Haren (4.67) loitered unpleasantly on the K/BB board.
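The leaderboard check described above (rank pitchers by a metric, then look at the worst ERA among the leaders) can be sketched like this. The rows and the helper name `worst_era_in_top` are hypothetical, and the values are invented so the example stays small, using a top three instead of a top fifteen.

```python
# Hypothetical rows: (name, K%, K/9, ERA). Invented values, not real stats.
ROWS = [
    ("Pitcher A", 0.30, 11.0, 2.80),
    ("Pitcher B", 0.27, 9.8, 3.10),
    ("Pitcher C", 0.21, 10.1, 4.40),  # high K/9 but modest K%: faces many batters
    ("Pitcher D", 0.24, 8.0, 3.20),
    ("Pitcher E", 0.18, 6.5, 4.10),
]

K_PCT, K_PER_9, ERA = 1, 2, 3  # column positions in each row


def worst_era_in_top(rows, metric, n):
    """Rank rows by the given metric column (highest first) and return the worst ERA among the top n."""
    leaders = sorted(rows, key=lambda row: row[metric], reverse=True)[:n]
    return max(row[ERA] for row in leaders)


print("Worst ERA, top 3 by K%: ", worst_era_in_top(ROWS, K_PCT, 3))
print("Worst ERA, top 3 by K/9:", worst_era_in_top(ROWS, K_PER_9, 3))
```

With these toy rows, the K% top three tops out at a 3.20 ERA, while the K/9 top three includes Pitcher C's 4.40, the same kind of leaderboard blemish the paragraph describes.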

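The reason K% is so powerful is that it directly measures how effective a pitcher is at striking out each batter he faces. K/9 is tied to innings, and every inning holds three outs no matter how many hits a pitcher allows. A pitcher with a high BABIP therefore faces more batters per inning and gets more chances to pile up strikeouts, so K/9 rewards him even if he’s giving up, say, 10+ hits per game. When BABIP gets involved this way, the value of each strikeout is severely reduced. K%, which divides strikeouts by batters faced, sidesteps the problem.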
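A small worked example makes the denominator problem concrete. The two pitchers below are hypothetical, with identical strikeout and innings totals; the only difference is that the second allows more baserunners and therefore faces more batters. The function names are illustrative, and batters faced is assumed to be available as a counting stat.

```python
def k_per_9(strikeouts, innings):
    """Strikeouts per nine innings; the denominator counts outs (3 per inning), not batters."""
    return 9 * strikeouts / innings


def k_rate(strikeouts, batters_faced):
    """Strikeout percentage: the share of all batters faced who struck out."""
    return strikeouts / batters_faced


# Hypothetical pitchers: same K and IP, but "Contact" faces 80 more batters.
PITCHERS = [("Stingy", 180, 180, 720), ("Contact", 180, 180, 800)]

for name, k, ip, bf in PITCHERS:
    print(f"{name}: K/9 = {k_per_9(k, ip):.1f}, K% = {k_rate(k, bf):.1%}")
```

Both pitchers come out at a 9.0 K/9, but K% separates them: 25.0% versus 22.5%.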

To recap:

Metric (2013)	R² (correlation to ERA)
K/9	.1081
K/BB	.1671
K%	.2089
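The percentage comparisons in the text can be checked directly from the table. The sketch below reuses only the three published values, computing each metric's gain over K/9 and the magnitude of Pearson's r implied by each R². Because R² discards the sign, the sketch reports |r|; the actual correlations with ERA would presumably be negative, since more strikeouts should go with fewer runs, but that sign is an assumption here.

```python
from math import sqrt

# R^2 values from the 2013 table (80+ IP pitchers).
R2 = {"K/9": 0.1081, "K/BB": 0.1671, "K%": 0.2089}

baseline = R2["K/9"]
for metric, r2 in R2.items():
    gain = (r2 / baseline - 1) * 100  # percent change in R^2 relative to K/9
    # sqrt gives |r| only; the sign (expected negative vs. ERA) is not recoverable from R^2.
    print(f"{metric}: R^2 = {r2:.4f}, |r| = {sqrt(r2):.3f}, {gain:+.0f}% vs. K/9")
```

K/BB comes out about 55% higher than K/9 and K% about 93% higher (a ratio of roughly 1.93). On the |r| scale the gap is smaller: roughly 0.329 for K/9 versus 0.457 for K%.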

So should we end the debate completely? No. But if you asked me to put money on Tim Lincecum, a career 25.8 K% pitcher with no decline in the stat over the past two years, over Tyler Chatwood, a career 13.0 K% pitcher whose 2013 breakout year rode a freakish 76.3% LOB, I would bet on Lincecum every doggone time.

9 Comments


Noah Baron

Great analysis. My only quibble would be a preference to see the graphs in a scatter plot form, but I think the bar graph was pretty good as well.

Also, maybe we would get different results if we set a higher minimum innings threshold. There’s a reason Josh Johnson is the huge outlier: he only pitched 81 innings. I feel like the correlations end up being more reflective of where the outliers fall, rather than the overall trend.

Either way, interesting results. I’m somewhat surprised K/BB doesn’t have the greatest correlation with ERA.

Slava Heretz

Reply to Noah Baron

Interesting thought. The reasoning behind 80+ innings was to create a larger sample size. My gut says results would stay the same, but I’d be curious to run data on totals over the course of, say, the past 5 seasons with, say again, guys with 300+ innings. This would add relievers to the data set as well.

KK-Swizzle

Awesome! I’ve been thinking K% was the way to go for quite some time now, for the exact reasons you ended up specifying… It’s nice to see my hunch verified by some simple, effective analysis!

AE1324

Hate to nitpick, but R squared is not correlation. R squared is a measure of how well the data fits around your model or line of best fit (correlation). The higher the number, the better predictor and more accurate your model or line is.

Correlation is a number between -1 and 1.

Reply to AE1324

Forgot to add that R squared explains variability. The more variability that’s explained, the more accurate your line or model!

Interesting analysis though. I’ve always used K/BB…

Peter Jensen

Why the neglect of K-B in your analysis?

Nathaniel Dawson

Reply to Peter Jensen

Yeah, if you want to use two components, K rate minus BB rate should have a better correlation than K to BB ratio.

Spitball McPhee

What about k%/bb%…?

a eskpert

Reply to Spitball McPhee

It’s the same as k/bb
