Author: Wisconsins West Coast


Evaluating the Eno Sarris Pitcher Analysis Method

September 27, 2014

Regular listeners of The Sleeper and the Bust podcast need no introduction to the Eno Sarris Pitcher Analysis Method (let’s drop the Eno and keep the Sarris so we can call it SPAM). For those who aren’t familiar, you can see it at work in this article and this one over here. Basically, the idea is that a pitcher can be evaluated by comparing his performance in several key metrics against league averages. We are primarily looking at swinging strike rates and groundball rates by pitch type.

I wanted to see how well this method works, so I grabbed my handy Excel toolkit and pulled down lots of pitching data. Unfortunately, pitch-type PITCHf/x data is not on the FanGraphs leaderboard (come on, Appelman!), so I headed on over to Baseball Prospectus to use their PITCHf/x leaderboards. I pulled the GB/BIP, swing%, whiff/swing, and velocity data for all starters that threw at least 50 of each pitch type in a given season. Is 50 pitches an arbitrary cut-off? Yes, yes it is.

I included four-seam fastballs, two-seam fastballs, cut fastballs, curves, sliders, changeups, and splitters. I used all the data that was available, which goes back to 2007. And, because I am impatient and couldn’t wait until the 2014 season was in the books, I didn’t include data from the last two weeks of this season. I calculated the swinging strike % for each pitch type by multiplying the swing % and the whiff/swing values together. After this, I pulled the K%, ERA, and WHIP data from the FanGraphs leaderboards. In all, I analyzed 1,851 pitcher-seasons.
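The original work was done in Excel, but the swinging-strike arithmetic is easy to sketch in Python. The function name and the slider numbers below are made up for illustration; only the formula (swing % times whiff/swing) comes from the text.

```python
def swinging_strike_rate(swing_pct: float, whiff_per_swing: float) -> float:
    """Swinging strikes per pitch = (swings / pitches) * (whiffs / swings)."""
    return swing_pct * whiff_per_swing

# Hypothetical slider: batters swing at 48% of them and miss on 31% of swings.
slider_swstr = swinging_strike_rate(0.48, 0.31)
print(f"{slider_swstr:.1%}")  # 14.9%
```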

Note: the swinging strike rates I calculated do differ from those on the player pages at FanGraphs. I’m not sure why there is a discrepancy since they are both based on PITCHf/x data, but there is one. Therefore, I did not use the FanGraphs pitch-type benchmarks in this analysis.

I pulled K%, ERA, and WHIP because I wanted to use these as proxies for pitching outcomes (i.e. my dependent variables). I amended SPAM to include four-seam velocity, because we all know how much of an effect velocity has on run prevention.

Here’s how I did this. I first calculated the league averages for each metric for each season to account for the pitching environment of that season. The table below shows the league average values for each of the metrics for each season.

	FF	FT	FC	CU	SL	CH	FS
Year	SwStr%	SwStr%	SwStr%	SwStr%	SwStr%	SwStr%	SwStr%
2007	6.1%	4.6%	10.0%	10.2%	13.4%	13.1%	13.9%
2008	5.9%	4.5%	9.7%	9.7%	14.2%	13.0%	14.1%
2009	6.1%	4.7%	9.7%	10.1%	14.1%	12.5%	15.2%
2010	6.0%	4.8%	9.8%	9.5%	14.1%	13.5%	14.5%
2011	6.3%	4.5%	9.1%	9.9%	14.9%	12.8%	14.7%
2012	6.6%	5.0%	10.3%	10.9%	15.6%	13.1%	15.5%
2013	6.7%	5.1%	9.3%	10.5%	15.0%	13.8%	17.2%
2014	6.6%	5.1%	9.8%	10.7%	15.5%	14.3%	17.4%

	FF	FT	FC	CU	SL	CH	FS	FF	Overall
Year	GB%	GB%	GB%	GB%	GB%	GB%	GB%	Velocity (mph)	BB%
2007	33.8%	49.8%	44.8%	47.2%	42.9%	48.1%	52.9%	91.06	8.92%
2008	33.2%	49.9%	44.1%	48.7%	44.1%	46.8%	52.0%	90.87	9.17%
2009	33.1%	48.9%	42.9%	50.6%	43.7%	47.2%	53.5%	91.17	9.13%
2010	35.6%	48.9%	43.9%	50.1%	44.0%	47.6%	52.9%	91.22	8.61%
2011	33.8%	49.9%	45.2%	48.9%	45.8%	47.3%	54.7%	91.57	8.23%
2012	34.0%	50.9%	43.8%	52.2%	43.9%	48.6%	53.2%	91.76	8.36%
2013	34.6%	51.4%	45.0%	50.2%	45.8%	47.4%	54.6%	92.02	8.33%
2014	35.8%	50.6%	46.1%	49.9%	45.3%	50.3%	52.7%	92.24	7.84%
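For anyone who wants to reproduce per-season baselines like the ones in these tables, here is a rough Python sketch. It assumes a simple unweighted mean across qualifying pitchers, which may or may not match how the figures above were computed, and the rows, pitcher names, and metric labels are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (season, pitcher, metric, value). Not real data.
rows = [
    (2013, "Pitcher A", "FF_SwStr", 0.070),
    (2013, "Pitcher B", "FF_SwStr", 0.060),
    (2014, "Pitcher A", "FF_SwStr", 0.068),
    (2014, "Pitcher C", "FF_SwStr", 0.062),
]

def league_averages(rows):
    """Average each metric within each season (unweighted; an assumption)."""
    grouped = defaultdict(list)
    for season, _pitcher, metric, value in rows:
        grouped[(season, metric)].append(value)
    return {key: mean(values) for key, values in grouped.items()}

averages = league_averages(rows)
for (season, metric), value in sorted(averages.items()):
    print(season, metric, f"{value:.1%}")
```

Keying the averages by season and metric keeps each pitcher-season compared against its own run environment rather than a multi-year baseline.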

I then gave each pitcher one point for each metric that was above league average. For example, King Felix this year gets above-average whiffs on five pitches, gets above-average grounders on four pitches, and has above-average four-seam velocity, so he gets ten points (5 + 4 + 1). I then computed the SPAM score for each pitcher in each season by summing the scores for the individual metrics.
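The scoring rule itself is just a count of above-average metrics. A minimal Python sketch follows; the league values are 2014 figures from the tables above, while the pitcher, the metric labels, and the `spam_score` name are hypothetical.

```python
# 2014 league averages for a few metrics, taken from the tables above.
league_2014 = {
    "CU_SwStr": 0.107,
    "SL_SwStr": 0.155,
    "FF_GB": 0.358,
    "FF_velo": 92.24,
}

# Hypothetical pitcher: values invented for illustration only.
pitcher = {
    "CU_SwStr": 0.125,  # above average -> 1 point
    "SL_SwStr": 0.140,  # below average -> 0 points
    "FF_GB": 0.400,     # above average -> 1 point
    "FF_velo": 93.1,    # above average -> 1 point
}

def spam_score(pitcher_metrics, league_avg):
    """Count the metrics on which the pitcher is above the league average.

    Metrics missing from pitcher_metrics (for example, a pitch he did not
    throw often enough to qualify) simply earn no point.
    """
    return sum(
        1
        for metric, avg in league_avg.items()
        if metric in pitcher_metrics and pitcher_metrics[metric] > avg
    )

print(spam_score(pitcher, league_2014))  # 3
```

With seven pitch types, two metrics per pitch, and four-seam velocity, the full version tops out at 15 points.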

Here is a table of some randomly-selected pitcher-seasons to give you an idea of the types of SPAM scores I found. This table shows you that there are certainly outliers, guys with good results and bad scores or vice-versa.

Player	Year	Score	ERA	WHIP
Felix Hernandez	2014	10	2.07	0.91
Zach McAllister	2014	6	5.51	1.49
Yu Darvish	2012	11	3.90	1.28
Bronson Arroyo	2011	2	5.07	1.37
Drew Pomeranz	2011	1	5.40	1.31
Johan Santana	2008	7	2.53	1.15
Zack Greinke	2008	8	3.47	1.28
Edinson Volquez	2010	9	4.31	1.50

Before we dive into the results, a caveat: I am not a statistician, but I am an engineer, so maybe I’m not completely off the hook. I am looking at these results from a high level and a simple perspective. Maybe I can build off these results and look for deeper connections in the future. First, let’s just look at some averages.

	SPAM without BB%
	Averages in Each SPAM Bin
SPAM Score	ERA	WHIP	K%	# of Pitcher Seasons
0	7.27	1.77	13.3%	44
1	5.92	1.62	14.1%	120
2	5.64	1.56	15.4%	218
3	5.05	1.49	15.9%	298
4	4.72	1.42	16.9%	297
5	4.52	1.40	17.2%	293
6	4.14	1.34	18.7%	226
7	4.02	1.31	19.7%	182
8	3.79	1.30	19.9%	110
9	3.60	1.27	20.9%	38
10	3.39	1.20	22.1%	17
11	3.42	1.21	23.0%	7
12	3.45	1.12	26.8%	1

The above table shows the average K%, ERA, and WHIP for each SPAM score, along with the number of pitcher-seasons that earned that score.
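As a quick sanity check on the table, the bin counts should add up to the 1,851 pitcher-seasons in the sample. The sketch below copies the score, ERA, and count columns from the table and also computes a count-weighted ERA across bins. That figure is the mean of individual pitcher-season ERAs (up to rounding), not an innings-weighted league ERA, and the variable names are mine.

```python
# (SPAM score, average ERA, pitcher-seasons) copied from the table above.
bins = [
    (0, 7.27, 44), (1, 5.92, 120), (2, 5.64, 218), (3, 5.05, 298),
    (4, 4.72, 297), (5, 4.52, 293), (6, 4.14, 226), (7, 4.02, 182),
    (8, 3.79, 110), (9, 3.60, 38), (10, 3.39, 17), (11, 3.42, 7),
    (12, 3.45, 1),
]

# Total pitcher-seasons across all bins.
total = sum(n for _score, _era, n in bins)

# Weighting each bin's average by its count recovers the overall mean.
overall_era = sum(era * n for _score, era, n in bins) / total

print(total)  # 1851
print(round(overall_era, 2))
```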

Finally, onto the scatter plots! First up, we have the K% vs. SPAM score graph. We expect this one to have a strong positive correlation, since whiff rates and velocity normally correspond to strikeouts (ground balls, not so much). I used a simple linear regression, since it seemed to be the best fit and the easiest to understand.

Here is the WHIP vs. SPAM score graph.

Here is the ERA vs. SPAM score graph.

Obviously, none of these show strong R² values, but the table of averages above and these graphs do show a clear trend: higher scores mostly go with lower ERAs and WHIPs and higher K%.
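For readers who want to try the fit themselves, here is a small least-squares sketch in plain Python. It uses only the eight sample pitcher-seasons from the earlier table, not the full 1,851, so its slope and R² will not match the graphs; the `linear_fit` helper is my own naming.

```python
# (SPAM score, ERA) for the eight sample pitcher-seasons listed earlier.
sample = [(10, 2.07), (6, 5.51), (11, 3.90), (2, 5.07),
          (1, 5.40), (7, 2.53), (8, 3.47), (9, 4.31)]

def linear_fit(points):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable linear fit, R^2 is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = linear_fit(sample)
print(f"ERA = {intercept:.2f} + ({slope:.3f} x SPAM), R^2 = {r_squared:.2f}")
```

A negative slope here means ERA falls as the SPAM score rises, which is the direction the averages table suggests.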

None of the above accounts for control directly, so I thought I would try adding BB% as another metric to the SPAM score. I computed the league average walk rate for each season and handed out the points. The addition of BB% changed the values, but didn’t really impact the trends. Below is the averages table for the SPAM scores with BB%. Below that, you will find the three graphs again. The linear trend lines fit a little better now, but the improvement is nothing earth-shattering.

	SPAM with BB%
	Averages in Each SPAM Bin
SPAM Score	ERA	WHIP	K%	# of Pitcher Seasons
0	7.70	1.88	13.1%	27
1	6.38	1.73	13.7%	73
2	6.02	1.65	15.0%	160
3	5.24	1.52	15.7%	254
4	4.89	1.45	16.2%	287
5	4.69	1.42	17.2%	289
6	4.34	1.37	17.4%	270
7	4.03	1.31	19.1%	197
8	3.90	1.29	20.0%	160
9	3.71	1.27	20.1%	80
10	3.44	1.22	21.0%	34
11	3.41	1.20	22.5%	14
12	3.36	1.20	21.5%	5
13	3.45	1.12	26.8%	1
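The walk-rate point can be sketched the same way as the other metrics, with one twist. The sketch assumes a pitcher earns the point for a walk rate below league average, since fewer walks is the good direction; the text does not state the direction explicitly. The base score and pitcher walk rate are hypothetical.

```python
def walk_rate_point(pitcher_bb_pct, league_bb_pct):
    """Return 1 if the pitcher walks fewer batters than league average.

    Assumption: a lower walk rate is what earns the point; the article
    does not spell out the direction.
    """
    return 1 if pitcher_bb_pct < league_bb_pct else 0

league_bb_2014 = 0.0784  # 2014 league BB% from the league-average table
base_score = 6           # hypothetical SPAM score without BB%
pitcher_bb = 0.061       # hypothetical walk rate

print(base_score + walk_rate_point(pitcher_bb, league_bb_2014))  # 7
```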

So, what does all this tell us? Well, it seems that Eno’s SPAM method does a pretty good job of identifying pitchers that will be successful and is useful for identifying breakout pitchers. The beauty of this method is that it does not require a lot of data. Per-pitch metrics stabilize faster than per-plate-appearance ones, so we can start to evaluate pitchers after only a start or two instead of waiting for the 170 PA required for BB% or the 70 PA for K%. I plan on digging deeper into this data over the offseason to see if I can pull any more insights from it. Please let me know in the comments if you think of something worth investigating further. Eno, if you are reading this, I hope I gave your method the treatment it deserves. And, as I do in all of my online ramblings, I will end with Tschüs!
