Identifying Impact Hitters: Proof of Concept

by Snerd

December 8, 2017

Earlier this season I set out to build a tool similar in nature to my dSCORE tool, except this one was meant to identify swing-change hitters. Over the course of its construction and early-alpha testing, it morphed into something different, and maybe something more useful. What I ended up with was a tool called cHit (“change Hit”, named for swing changers, but really I was just too lazy to come up with a more apt acronym for what the tool actually does). cHit, in its current beta form, aims to identify hitters that tend to profile for “impact production”, simply defined as hitting balls hard and hitting them in the air. Other research has identified those traits as ideal for extra-base hits (XBH), so I really didn’t need to reinvent the wheel. Although I’d really like to pull Statcast data into a more refined form of this tool, the simple batted-ball data offered here on FanGraphs does the trick nicely.

Under the hood, the tool takes six data points (BB%, GB%, FB%, Hard%, Soft%, Spd), compares each player’s stat against a league midpoint for that stat, then scales the difference using a multiplier that normalizes each stat based on its importance to ISO. I chose ISO as it’s a pretty clean catch-all for power output.
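The article does not publish the league midpoints or the multipliers, so the following Python sketch only illustrates the structure described above: take each stat's distance from a league midpoint, scale it by a weight, and add the pieces up. Every value in MIDPOINTS and WEIGHTS is a made-up placeholder and chit_score is a hypothetical name, so the output will not reproduce the scores in the table.

```python
# Hypothetical league midpoints for each input (placeholders, not the author's values).
MIDPOINTS = {"BB%": 8.5, "GB%": 44.0, "FB%": 35.5, "Hard%": 32.0, "Soft%": 19.0, "Spd": 4.0}

# Hypothetical multipliers; the sign is a guess at whether a stat helps or hurts ISO.
WEIGHTS = {"BB%": 0.3, "GB%": -0.4, "FB%": 0.4, "Hard%": 1.0, "Soft%": -0.5, "Spd": -0.2}


def chit_score(stats):
    """Sum of weighted deviations from the league midpoint, one term per stat."""
    return sum(WEIGHTS[key] * (stats[key] - MIDPOINTS[key]) for key in MIDPOINTS)


# Batted-ball lines copied from the 2017 table (percentages as plain numbers).
gallo = {"BB%": 14.1, "GB%": 27.9, "FB%": 54.2, "Hard%": 46.4, "Soft%": 14.7, "Spd": 5.5}
gordon = {"BB%": 3.6, "GB%": 57.6, "FB%": 19.6, "Hard%": 16.1, "Soft%": 24.7, "Spd": 8.5}

print(round(chit_score(gallo), 2), round(chit_score(gordon), 2))
```

Even with placeholder weights, a hard-contact fly-ball hitter lands well above zero and a soft-contact ground-ball hitter well below it, which is the ordering the tool is after.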

Now here’s the trick of this tool: it’s not going to identify “good” hitters from “bad” hitters. Quality sticks like Jean Segura, Dee Gordon, Cesar Hernandez, and others show up at the bottom of the results because their game doesn’t base itself on the long ball. They do just fine for themselves hitting softer liners or ground balls and using their legs for production. Frankly, chances are if a player at the bottom of the list has a high Speed component, they’ve got a decent chance of success despite a low cHit. Nuance needs to be accounted for by the user.

Here’s how I use it to identify swing-changers (and/or regression candidates): I pulled in data for previous years, back to 2014. I compared 2017 data to 2016 data (I’ll add in comparisons for previous years in later iterations) and simply checked to see who were cHit risers or fallers. The results were telling — players we have on record as swing changers show up with significant positive gains, and players that endured some significant regression fell.
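A minimal sketch of that year-over-year comparison, assuming each season's scores are held in a dictionary keyed by player name. The function name chit_changes and all of the scores below are invented for illustration; none of them are real cHit values.

```python
def chit_changes(prev, curr):
    """Return (name, change) pairs for players in both seasons, biggest gain first."""
    shared = prev.keys() & curr.keys()
    deltas = [(name, round(curr[name] - prev[name], 2)) for name in shared]
    return sorted(deltas, key=lambda pair: pair[1], reverse=True)


# Invented scores for hypothetical players, for illustration only.
chit_2016 = {"Player A": 1.50, "Player B": 9.00, "Player C": 4.00}
chit_2017 = {"Player A": 7.42, "Player B": 3.10, "Player C": 4.25, "Player D": 12.00}

for name, change in chit_changes(chit_2016, chit_2017):
    print(f"{name}: {change:+.2f}")
```

Players who appear in only one season (Player D here) are dropped, since there is nothing to compare them against.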

There’s an unintended, possible third use for this tool: identifying injured hitters. Gregory Polanco, Freddie Freeman, and Matt Holliday all suffered/played through injury this year, and they all fell precipitously in the rankings. I’ll need a larger sample size to see whether injuries and a fall in cHit are related or if that’s just noise.

Data!

cHit 2017

Name	Team	Age	AB	cHit Score	BB%	GB%	FB%	Hard%	Soft%	Spd	ISO
Joey Gallo	Rangers	23	449	27.56	14.10%	27.90%	54.20%	46.40%	14.70%	5.5	0.327
J.D. Martinez	– – –	29	432	23.52	10.80%	38.30%	43.20%	49.00%	14.00%	4.7	0.387
Matt Carpenter	Cardinals	31	497	22.46	17.50%	26.90%	50.80%	42.20%	12.10%	3.1	0.209
Aaron Judge	Yankees	25	542	21.56	18.70%	34.90%	43.20%	45.30%	11.20%	4.8	0.343
Lucas Duda	– – –	31	423	19.69	12.20%	30.30%	48.60%	42.10%	14.50%	0.5	0.279
Cody Bellinger	Dodgers	21	480	19.26	11.70%	35.30%	47.10%	43.00%	14.00%	5.5	0.315
Miguel Sano	Twins	24	424	17.73	11.20%	38.90%	40.50%	44.80%	13.50%	2.9	0.243
Jay Bruce	– – –	30	555	16.50	9.20%	32.50%	46.70%	40.30%	11.70%	2.6	0.254
Trevor Story	Rockies	24	503	16.39	8.80%	33.70%	47.90%	40.30%	14.40%	4.7	0.219
Justin Turner	Dodgers	32	457	16.16	10.90%	31.40%	47.80%	38.90%	9.80%	3.3	0.208
Khris Davis	Athletics	29	566	15.64	11.20%	38.40%	42.30%	42.10%	13.50%	3.4	0.281
Brandon Belt	Giants	29	382	15.38	14.60%	29.70%	46.90%	38.40%	14.00%	4.2	0.228
Nick Castellanos	Tigers	25	614	14.94	6.20%	37.30%	38.20%	43.40%	11.50%	4.6	0.218
Eric Thames	Brewers	30	469	14.52	13.60%	38.40%	41.30%	41.50%	16.00%	4.6	0.271
Justin Upton	– – –	29	557	14.43	11.70%	36.80%	43.70%	41.00%	19.80%	4	0.268
Justin Smoak	Blue Jays	30	560	14.38	11.50%	34.30%	44.50%	39.40%	13.10%	1.7	0.259
Wil Myers	Padres	26	567	14.32	10.80%	37.50%	42.90%	41.40%	19.50%	5.3	0.220
Paul Goldschmidt	Diamondbacks	29	558	14.31	14.10%	46.30%	34.90%	44.30%	11.30%	5.6	0.265
Chris Davis	Orioles	31	456	14.28	11.60%	36.70%	39.80%	41.50%	12.80%	2.7	0.208
Kyle Seager	Mariners	29	578	13.57	8.90%	31.30%	51.60%	35.70%	13.10%	2.2	0.201
Nelson Cruz	Mariners	36	556	13.35	10.90%	40.40%	41.80%	40.70%	14.70%	1.7	0.261
Mike Zunino	Mariners	26	387	13.31	9.00%	32.00%	45.60%	38.60%	17.50%	1.9	0.258
Mike Trout	Angels	25	402	13.16	18.50%	36.70%	44.90%	38.30%	19.00%	6.2	0.323
Corey Seager	Dodgers	23	539	13.08	10.90%	42.10%	33.10%	44.00%	12.90%	2.7	0.184
Logan Morrison	Rays	29	512	12.74	13.50%	33.30%	46.20%	37.40%	17.50%	2.4	0.270
Randal Grichuk	Cardinals	25	412	12.61	5.90%	35.90%	42.70%	40.20%	18.20%	5.2	0.235
Salvador Perez	Royals	27	471	12.50	3.40%	33.30%	47.00%	38.10%	16.50%	2.4	0.227
Michael Conforto	Mets	24	373	12.42	13.00%	37.80%	37.80%	41.60%	20.20%	3.6	0.276
Matt Davidson	White Sox	26	414	12.19	4.30%	36.20%	46.50%	38.20%	15.80%	1.8	0.232
Mike Napoli	Rangers	35	425	12.15	10.10%	33.20%	52.10%	35.50%	21.90%	2.7	0.235
Miguel Cabrera	Tigers	34	469	12.03	10.20%	39.80%	32.90%	42.50%	9.90%	1.1	0.149
Brandon Moss	Royals	33	362	11.83	9.20%	33.10%	44.50%	37.30%	13.60%	2.3	0.221
Curtis Granderson	– – –	36	449	11.69	13.50%	32.60%	48.80%	35.30%	17.60%	4.8	0.241
Ian Kinsler	Tigers	35	551	11.64	9.00%	32.90%	46.50%	37.00%	18.70%	5.6	0.176
Edwin Encarnacion	Indians	34	554	11.01	15.50%	37.10%	41.80%	37.60%	15.50%	2.7	0.245
Manny Machado	Orioles	24	630	10.79	7.20%	42.10%	42.10%	39.50%	18.50%	3.3	0.213
Freddie Freeman	Braves	27	440	10.72	12.60%	34.90%	40.60%	37.50%	12.40%	4.3	0.280
Nolan Arenado	Rockies	26	606	10.60	9.10%	34.00%	44.90%	36.70%	17.60%	4.1	0.277
Anthony Rendon	Nationals	27	508	10.41	13.90%	34.00%	47.20%	34.30%	13.00%	3.5	0.232
Yonder Alonso	– – –	30	451	10.34	13.10%	33.90%	43.20%	36.00%	13.20%	2.4	0.235
Kyle Schwarber	Cubs	24	422	10.24	12.10%	38.30%	46.50%	36.40%	21.30%	2.8	0.256
Carlos Gomez	Rangers	31	368	10.19	7.30%	39.10%	40.30%	39.00%	16.50%	5	0.207
Luis Valbuena	Angels	31	347	9.81	12.00%	38.40%	47.30%	35.80%	22.00%	1.3	0.233
Dexter Fowler	Cardinals	31	420	9.61	12.80%	39.40%	38.20%	38.10%	12.70%	5.9	0.224
Jed Lowrie	Athletics	33	567	9.40	11.30%	29.40%	43.50%	34.50%	12.10%	2.7	0.171
Giancarlo Stanton	Marlins	27	597	8.96	12.30%	44.60%	39.40%	38.90%	20.80%	2.3	0.350
Jose Abreu	White Sox	30	621	8.95	5.20%	45.30%	36.40%	40.50%	15.80%	4.4	0.248
Josh Donaldson	Blue Jays	31	415	8.92	15.30%	41.00%	42.30%	36.30%	17.30%	1.6	0.289
Joey Votto	Reds	33	559	8.87	19.00%	39.00%	38.00%	36.30%	10.40%	2.8	0.258
Victor Martinez	Tigers	38	392	8.75	8.30%	42.10%	34.20%	39.90%	12.40%	0.9	0.117
Charlie Blackmon	Rockies	31	644	8.63	9.00%	40.70%	37.00%	39.00%	17.10%	6.4	0.270
Mitch Moreland	Red Sox	31	508	8.43	9.90%	43.40%	36.20%	38.90%	13.50%	1.7	0.197
Scott Schebler	Reds	26	473	8.29	7.30%	45.60%	38.20%	39.40%	19.30%	3.9	0.252
Paul DeJong	Cardinals	23	417	8.19	4.70%	33.70%	42.90%	36.40%	21.40%	2.5	0.247
Ryan Zimmerman	Nationals	32	524	8.18	7.60%	46.40%	33.70%	40.50%	14.10%	2.2	0.269
Mookie Betts	Red Sox	24	628	7.76	10.80%	40.40%	42.80%	35.70%	18.20%	5.5	0.194
Rougned Odor	Rangers	23	607	7.61	4.90%	41.50%	42.20%	36.80%	18.50%	5.6	0.193
Francisco Lindor	Indians	23	651	7.42	8.30%	39.20%	42.40%	35.20%	14.30%	5.1	0.232
Brad Miller	Rays	27	338	7.39	15.50%	47.40%	36.10%	38.40%	18.10%	4.6	0.136
Daniel Murphy	Nationals	32	534	6.97	8.80%	33.50%	38.90%	35.70%	16.70%	3.8	0.221
Travis Shaw	Brewers	27	538	6.87	9.90%	42.50%	37.60%	37.10%	15.80%	4.5	0.240
Jake Lamb	Diamondbacks	26	536	6.86	13.70%	41.10%	38.30%	35.70%	12.90%	4.4	0.239
Todd Frazier	– – –	31	474	6.75	14.40%	34.20%	47.50%	32.20%	23.20%	3.1	0.215
Yasmani Grandal	Dodgers	28	438	6.63	8.30%	43.50%	40.00%	36.50%	17.60%	1.1	0.212
Brian Dozier	Twins	30	617	6.60	11.10%	38.40%	42.60%	34.10%	15.90%	5.2	0.227
Adam Duvall	Reds	28	587	6.55	6.00%	33.20%	48.60%	31.80%	17.50%	3.9	0.232
Hunter Renfroe	Padres	25	445	6.52	5.60%	37.90%	45.40%	34.60%	23.50%	3.2	0.236
Justin Bour	Marlins	29	377	6.40	11.00%	43.40%	33.60%	38.80%	19.60%	1.6	0.247
Carlos Correa	Astros	22	422	6.33	11.00%	47.90%	31.70%	39.50%	15.00%	3.2	0.235
Marcell Ozuna	Marlins	26	613	6.09	9.40%	47.10%	33.50%	39.10%	18.30%	2.3	0.237
Domingo Santana	Brewers	24	525	5.85	12.00%	44.90%	27.70%	39.70%	11.70%	4	0.227
Kris Bryant	Cubs	25	549	5.83	14.30%	37.70%	42.40%	32.80%	14.80%	4.4	0.242
Gary Sanchez	Yankees	24	471	5.47	7.60%	42.30%	36.60%	36.90%	18.60%	2.6	0.253
Asdrubal Cabrera	Mets	31	479	5.46	9.30%	43.50%	36.20%	36.80%	17.20%	2.5	0.154
Austin Hedges	Padres	24	387	5.37	5.50%	36.60%	45.70%	33.10%	22.30%	2.7	0.183
Logan Forsythe	Dodgers	30	361	5.33	15.70%	44.00%	33.10%	36.60%	13.20%	2.8	0.102
Yadier Molina	Cardinals	34	501	5.25	5.20%	42.20%	37.40%	36.40%	16.50%	3.9	0.166
Bryce Harper	Nationals	24	420	5.07	13.80%	40.40%	37.60%	34.30%	13.30%	3.7	0.276
Neil Walker	– – –	31	385	5.01	12.30%	36.20%	41.70%	32.80%	17.70%	2.8	0.174
Aaron Altherr	Phillies	26	372	5.01	7.80%	43.10%	37.50%	36.40%	20.10%	5.5	0.245
Andrew McCutchen	Pirates	30	570	4.90	11.20%	40.70%	37.40%	35.20%	17.50%	4.3	0.207
Eduardo Escobar	Twins	28	457	4.86	6.60%	33.70%	45.30%	31.40%	16.00%	5.1	0.195
Anthony Rizzo	Cubs	27	572	4.79	13.20%	40.70%	39.20%	34.40%	19.80%	4.4	0.234
Ryan Braun	Brewers	33	380	4.73	8.90%	49.20%	31.90%	39.00%	19.20%	5.3	0.218
Kendrys Morales	Blue Jays	34	557	4.56	7.10%	48.40%	33.20%	37.90%	15.20%	1.1	0.196
Jose Ramirez	Indians	24	585	4.54	8.10%	38.90%	39.70%	34.00%	16.70%	6	0.265
Mike Moustakas	Royals	28	555	4.51	5.70%	34.80%	45.70%	31.90%	21.20%	1.1	0.249
Andrew Benintendi	Red Sox	22	573	4.50	10.60%	40.10%	38.40%	34.30%	16.60%	4.5	0.154
Jose Bautista	Blue Jays	36	587	4.47	12.20%	37.70%	45.80%	31.40%	21.70%	3.4	0.164
Jason Castro	Twins	30	356	4.36	11.10%	41.90%	33.50%	36.00%	14.00%	1.5	0.146
Albert Pujols	Angels	37	593	4.12	5.80%	43.50%	38.10%	35.10%	15.90%	2.1	0.145
Hanley Ramirez	Red Sox	33	496	4.04	9.20%	41.80%	37.10%	35.30%	20.00%	1.5	0.188
Tommy Joseph	Phillies	25	495	3.99	6.20%	41.70%	39.00%	35.00%	20.90%	2.2	0.192
Tim Beckham	– – –	27	533	3.99	6.30%	48.80%	29.50%	39.10%	15.50%	4.4	0.176
Jonathan Schoop	Orioles	25	622	3.90	5.20%	41.90%	37.20%	36.10%	23.00%	2.2	0.211
George Springer	Astros	27	548	3.58	10.20%	48.30%	33.80%	36.70%	17.90%	3.1	0.239
Carlos Beltran	Astros	40	467	3.54	6.50%	43.10%	40.40%	33.70%	17.50%	1.8	0.152
Alex Bregman	Astros	23	556	3.52	8.80%	38.40%	39.90%	33.00%	18.00%	5.9	0.191
Carlos Santana	Indians	31	571	3.49	13.20%	40.80%	39.30%	33.00%	18.40%	4	0.196
Eugenio Suarez	Reds	25	534	3.33	13.30%	38.90%	37.10%	33.80%	20.70%	3.1	0.200
Scooter Gennett	Reds	27	461	3.29	6.00%	41.30%	37.60%	34.40%	17.20%	4.3	0.236
Mark Reynolds	Rockies	33	520	3.26	11.60%	42.10%	36.30%	34.50%	19.00%	2.7	0.219
Josh Reddick	Astros	30	477	3.23	8.00%	33.60%	42.30%	31.10%	17.20%	4.8	0.170
Mitch Haniger	Mariners	26	369	2.97	7.60%	44.00%	36.70%	34.70%	17.70%	4.3	0.209
Ian Happ	Cubs	22	364	2.92	9.40%	40.20%	39.70%	32.80%	18.70%	5.7	0.261
Josh Harrison	Pirates	29	486	2.90	5.20%	36.50%	40.80%	32.40%	18.70%	4.9	0.160
Keon Broxton	Brewers	27	414	2.78	8.60%	45.10%	34.60%	35.30%	17.00%	7.4	0.200
Matt Joyce	Athletics	32	469	2.69	12.10%	37.80%	42.80%	30.30%	16.30%	3.2	0.230
Derek Dietrich	Marlins	27	406	2.65	7.80%	36.50%	40.70%	32.10%	20.50%	3.9	0.175
Ryon Healy	Athletics	25	576	2.56	3.80%	42.80%	38.20%	33.90%	16.50%	1.4	0.181
Evan Longoria	Rays	31	613	2.50	6.80%	43.40%	36.80%	34.30%	18.00%	3.8	0.163
Zack Cozart	Reds	31	438	2.49	12.20%	38.20%	42.30%	30.80%	19.50%	5.3	0.251
Robinson Cano	Mariners	34	592	2.48	7.60%	50.00%	30.60%	36.90%	12.80%	2	0.172
Max Kepler	Twins	24	511	2.39	8.30%	42.80%	39.50%	32.90%	18.70%	4.2	0.182
Steven Souza Jr.	Rays	28	523	2.22	13.60%	44.60%	34.30%	34.10%	16.50%	4.8	0.220
Michael Taylor	Nationals	26	399	2.17	6.70%	42.90%	36.70%	34.00%	18.10%	5.9	0.216
Yulieski Gurriel	Astros	33	529	2.12	3.90%	46.20%	35.20%	35.10%	15.90%	2.8	0.187
Corey Dickerson	Rays	28	588	1.24	5.60%	41.80%	35.80%	33.60%	18.70%	4	0.207
Whit Merrifield	Royals	28	587	1.01	4.60%	37.70%	40.50%	30.60%	15.40%	6.7	0.172
Chris Taylor	Dodgers	26	514	0.88	8.80%	41.50%	35.80%	32.40%	15.80%	6.4	0.208
A.J. Pollock	Diamondbacks	29	425	0.81	7.50%	44.60%	32.10%	35.00%	19.80%	7.5	0.205
Marwin Gonzalez	Astros	28	455	0.71	9.50%	43.90%	36.20%	32.70%	18.60%	3.2	0.226
Yangervis Solarte	Padres	29	466	0.62	7.20%	41.60%	42.10%	31.10%	25.20%	2.4	0.161
Shin-Soo Choo	Rangers	34	544	0.57	12.10%	48.80%	26.20%	36.10%	12.20%	4.7	0.162
Buster Posey	Giants	30	494	0.50	10.70%	43.60%	33.00%	33.00%	14.10%	2.8	0.142
Jedd Gyorko	Cardinals	28	426	0.48	9.80%	40.50%	39.30%	30.80%	19.20%	3.8	0.200
Yasiel Puig	Dodgers	26	499	0.30	11.20%	48.30%	35.60%	32.90%	18.30%	4.4	0.224
Eddie Rosario	Twins	25	542	0.12	5.90%	42.40%	37.40%	31.70%	16.70%	3.9	0.218
J.T. Realmuto	Marlins	26	532	-0.01	6.20%	47.80%	34.30%	33.30%	14.90%	5	0.173
Jorge Bonifacio	Royals	24	384	-0.20	8.30%	39.30%	34.80%	32.20%	20.20%	2.9	0.177
Gerardo Parra	Rockies	30	392	-0.27	4.70%	46.80%	30.30%	34.70%	14.40%	3	0.143
Willson Contreras	Cubs	25	377	-0.34	10.50%	53.30%	29.30%	35.50%	17.00%	2.4	0.223
Kole Calhoun	Angels	29	569	-0.37	10.90%	43.90%	35.00%	31.80%	17.00%	3.7	0.148
Robbie Grossman	Twins	27	382	-0.43	14.70%	40.70%	34.40%	30.90%	16.00%	3.5	0.134
Matt Holliday	Yankees	37	373	-0.46	10.80%	47.70%	37.50%	31.80%	21.20%	2.1	0.201
Mark Trumbo	Orioles	31	559	-0.47	7.00%	43.30%	40.60%	30.40%	20.90%	2.5	0.163
Stephen Piscotty	Cardinals	26	341	-0.80	13.00%	49.20%	33.20%	32.70%	17.90%	2.7	0.132
Tommy Pham	Cardinals	29	444	-0.86	13.40%	51.70%	26.10%	35.50%	15.40%	6	0.214
Joe Mauer	Twins	34	525	-0.92	11.10%	51.50%	23.60%	36.40%	12.80%	2.4	0.112
Jackie Bradley Jr.	Red Sox	27	482	-0.94	8.90%	49.00%	32.60%	33.30%	17.50%	4.5	0.158
Brandon Crawford	Giants	30	518	-0.98	7.40%	46.20%	34.40%	32.60%	19.30%	2.5	0.151
Nomar Mazara	Rangers	22	554	-1.13	8.90%	46.50%	34.20%	32.60%	20.90%	2.6	0.170
Ben Zobrist	Cubs	36	435	-1.35	10.90%	51.10%	33.30%	32.30%	14.90%	3.6	0.143
Javier Baez	Cubs	24	469	-1.36	5.90%	48.60%	36.00%	32.40%	21.30%	5.3	0.207
Jorge Polanco	Twins	23	488	-1.42	7.50%	37.90%	42.80%	27.70%	19.90%	4.9	0.154
Avisail Garcia	White Sox	26	518	-1.70	5.90%	52.20%	27.50%	35.30%	15.70%	4.3	0.176
Matt Kemp	Braves	32	438	-1.76	5.80%	48.50%	28.20%	34.70%	17.40%	1.7	0.187
Maikel Franco	Phillies	24	575	-2.04	6.60%	45.40%	36.70%	30.90%	20.80%	1.5	0.179
Nick Markakis	Braves	33	593	-2.17	10.10%	48.60%	29.20%	33.10%	15.60%	1.9	0.110
Tucker Barnhart	Reds	26	370	-2.46	9.90%	46.00%	27.80%	33.20%	16.50%	3.4	0.132
Trey Mancini	Orioles	25	543	-2.48	5.60%	51.00%	29.70%	34.10%	19.60%	3.2	0.195
Christian Yelich	Marlins	25	602	-2.51	11.50%	55.40%	25.20%	35.20%	15.90%	5.2	0.156
Lorenzo Cain	Royals	31	584	-2.79	8.40%	44.40%	32.90%	31.10%	18.70%	6.5	0.140
Josh Bell	Pirates	24	549	-2.87	10.60%	51.10%	31.20%	32.60%	20.60%	3.5	0.211
Jose Reyes	Mets	34	501	-3.00	8.90%	37.20%	43.10%	26.70%	26.10%	7.2	0.168
Carlos Gonzalez	Rockies	31	470	-3.04	10.50%	48.60%	31.70%	31.90%	20.50%	3.2	0.162
Adam Jones	Orioles	31	597	-3.27	4.30%	44.80%	34.30%	30.90%	20.10%	2.7	0.181
Byron Buxton	Twins	23	462	-3.57	7.40%	38.70%	38.00%	27.60%	18.20%	8.2	0.160
Kevin Kiermaier	Rays	27	380	-3.81	7.40%	49.60%	32.10%	31.80%	22.00%	5.9	0.174
Chase Headley	Yankees	33	512	-3.90	10.20%	43.50%	31.70%	30.00%	17.10%	4.3	0.133
Xander Bogaerts	Red Sox	24	571	-4.31	8.80%	48.90%	30.50%	31.40%	19.70%	6.7	0.130
Jordy Mercer	Pirates	30	502	-4.33	9.10%	48.30%	30.90%	31.00%	19.00%	2.9	0.151
Brandon Drury	Diamondbacks	24	445	-4.44	5.80%	48.80%	29.40%	31.70%	16.60%	2.4	0.180
Alex Gordon	Royals	33	476	-4.69	8.30%	42.60%	33.00%	29.20%	19.40%	4.3	0.107
Ben Gamel	Mariners	25	509	-4.84	6.50%	44.90%	33.30%	29.40%	18.70%	4.9	0.138
Hernan Perez	Brewers	26	432	-4.85	4.40%	48.30%	33.50%	30.40%	21.20%	5.3	0.155
Matt Wieters	Nationals	31	422	-4.94	8.20%	42.50%	36.40%	27.40%	18.10%	2	0.118
Brett Gardner	Yankees	33	594	-5.07	10.60%	44.50%	33.20%	28.80%	20.00%	6	0.163
Odubel Herrera	Phillies	25	526	-5.10	5.50%	44.10%	34.70%	29.40%	24.40%	4.3	0.171
Freddy Galvis	Phillies	27	608	-5.11	6.80%	36.70%	39.20%	25.50%	18.10%	5.3	0.127
Elvis Andrus	Rangers	28	643	-5.13	5.50%	48.50%	31.50%	30.50%	18.70%	5.7	0.174
Danny Valencia	Mariners	32	450	-5.93	8.00%	47.90%	31.00%	29.80%	20.50%	3.3	0.156
Kevin Pillar	Blue Jays	28	587	-6.25	5.20%	43.10%	36.40%	27.30%	22.50%	4.4	0.148
Dansby Swanson	Braves	23	488	-6.35	10.70%	47.40%	29.40%	29.30%	18.00%	3.2	0.092
Jose Altuve	Astros	27	590	-6.45	8.80%	47.00%	32.70%	28.20%	19.00%	6.4	0.202
Alcides Escobar	Royals	30	599	-6.47	2.40%	40.80%	37.40%	26.80%	22.80%	4.3	0.107
Andrelton Simmons	Angels	27	589	-6.62	7.30%	49.50%	31.50%	29.30%	20.60%	5	0.143
Didi Gregorius	Yankees	27	534	-6.91	4.40%	36.20%	43.80%	23.10%	24.40%	2.7	0.191
Ryan Goins	Blue Jays	29	418	-6.94	6.80%	50.30%	34.80%	27.70%	19.60%	2.7	0.120
Gregory Polanco	Pirates	25	379	-7.00	6.60%	42.20%	37.50%	25.90%	22.80%	3.7	0.140
David Peralta	Diamondbacks	29	525	-7.02	7.50%	55.10%	26.50%	31.80%	21.20%	4.6	0.150
Kolten Wong	Cardinals	26	354	-7.11	10.00%	48.10%	31.80%	28.20%	20.80%	5.4	0.127
Orlando Arcia	Brewers	22	506	-7.74	6.60%	51.60%	28.50%	30.20%	22.90%	4.1	0.130
Martin Maldonado	Angels	30	429	-7.80	3.20%	48.50%	36.60%	26.70%	21.60%	2.3	0.147
Cory Spangenberg	Padres	26	444	-7.85	7.00%	49.30%	27.80%	29.20%	16.90%	5	0.137
Joe Panik	Giants	26	511	-7.96	8.00%	44.00%	34.10%	26.10%	20.10%	4.2	0.133
David Freese	Pirates	34	426	-8.08	11.50%	57.00%	22.60%	31.90%	19.40%	1	0.108
Melky Cabrera	– – –	32	620	-8.14	5.40%	48.90%	29.00%	28.90%	19.00%	2.3	0.137
Hunter Pence	Giants	34	493	-8.28	7.40%	57.20%	29.40%	29.40%	18.50%	3.6	0.126
Manuel Margot	Padres	22	487	-8.30	6.60%	40.50%	36.30%	25.40%	25.90%	6.1	0.146
Trea Turner	Nationals	24	412	-8.61	6.70%	51.70%	33.50%	26.70%	18.00%	8.9	0.167
Jonathan Villar	Brewers	26	403	-8.85	6.90%	57.40%	21.90%	33.20%	27.00%	5.4	0.132
Starlin Castro	Yankees	27	443	-9.19	4.90%	51.80%	28.00%	29.20%	21.80%	3.5	0.153
Denard Span	Giants	33	497	-9.30	7.40%	45.00%	33.60%	25.10%	18.60%	5.5	0.155
Jacoby Ellsbury	Yankees	33	356	-9.73	10.00%	45.90%	31.00%	26.10%	22.70%	7.7	0.138
Delino DeShields	Rangers	24	376	-9.93	10.00%	45.10%	34.80%	23.90%	20.10%	7.1	0.098
Adam Frazier	Pirates	25	406	-9.98	7.90%	47.90%	26.80%	27.50%	17.90%	5.7	0.123
DJ LeMahieu	Rockies	28	609	-10.42	8.70%	55.60%	19.70%	30.60%	15.40%	3.9	0.099
Yolmer Sanchez	White Sox	25	484	-10.53	6.60%	44.50%	33.90%	24.00%	19.30%	5.3	0.147
Jason Heyward	Cubs	27	432	-10.54	8.50%	47.40%	32.70%	25.50%	25.80%	4.3	0.130
Tim Anderson	White Sox	24	587	-10.66	2.10%	52.70%	28.00%	28.30%	21.30%	6.2	0.145
Jean Segura	Mariners	27	524	-10.79	6.00%	54.30%	26.40%	28.30%	19.70%	5.5	0.128
Cameron Maybin	– – –	30	395	-10.88	11.30%	57.70%	27.90%	27.40%	20.10%	6.9	0.137
Dustin Pedroia	Red Sox	33	406	-10.90	10.60%	48.80%	28.80%	25.90%	20.10%	2.2	0.099
Jose Iglesias	Tigers	27	463	-10.91	4.30%	50.40%	26.40%	28.40%	23.40%	4.2	0.114
Eric Hosmer	Royals	27	603	-11.30	9.80%	55.60%	22.20%	29.50%	21.80%	3.4	0.179
Eduardo Nunez	– – –	30	467	-12.27	3.70%	53.40%	29.10%	26.70%	24.50%	4.8	0.148
Jon Jay	Cubs	32	379	-12.53	8.50%	47.10%	23.90%	25.30%	11.50%	5.3	0.079
Brandon Phillips	– – –	36	572	-12.97	3.50%	49.50%	28.30%	25.50%	21.70%	4.1	0.131
Guillermo Heredia	Mariners	26	386	-15.19	6.30%	47.40%	34.90%	20.40%	23.80%	2.2	0.088
Ender Inciarte	Braves	26	662	-15.36	6.80%	47.00%	29.10%	22.10%	20.90%	5.4	0.106
Jonathan Lucroy	– – –	31	423	-16.18	9.60%	53.50%	27.90%	22.30%	20.50%	3.1	0.106
Jose Peraza	Reds	23	487	-16.45	3.90%	47.10%	31.30%	21.40%	26.60%	5.8	0.066
Cesar Hernandez	Phillies	27	511	-18.08	10.60%	52.80%	24.60%	22.10%	23.50%	6	0.127
Billy Hamilton	Reds	26	582	-21.80	7.00%	45.80%	30.60%	16.00%	25.00%	9	0.088
Dee Gordon	Marlins	29	653	-28.88	3.60%	57.60%	19.60%	16.10%	24.70%	8.5	0.067

Okay, so here’s the breakdown. I pulled all 2017 hitters with 400 plate appearances or more so I could capture some significant hitters that didn’t have qualifying numbers of ABs due to injury. Ball-bludgeon extraordinaire Joey Gallo is a pretty solid name to have heading up this list, as he’s pretty much the human definition of what this tool is trying to identify. Having J.D. Martinez, Aaron Judge, Cody Bellinger, Miguel Sano, Trevor Story, and Justin Turner all in the top 10 is pretty much all the proof-of-concept I needed.

Interesting notes:

Brandon Belt at 12 — Someone needs to tell the Giants to trade him to literally any other team, stat.

Giancarlo Stanton at 46 — Surprisingly, the MVP fell off from his stats in 2016. His grounders and soft contact each rose by 3 or more percentage points, shaving the equivalent off his hard contact and fly balls. His output was fueled by adding almost 200 ABs to his season, so he could actually get better if he can stay healthy and add those hard flies back in!

Francisco Lindor at 58 — The interesting part of this is even though Lindor is still a decent way down the list, he actually was the biggest gainer from last season to this, adding 9.52 points to his cHit. We knew he was gunning for flies from the outset of the season, and it looks like his mission was accomplished.

Mike Moustakas at 87 — Frankly, being bookended by Jose Ramirez and Andrew Benintendi should, in a vacuum, be great company. But this is a prime example of how cHit requires users to not take the numbers at face value. Ramirez and Benintendi aren’t slug-first hitters like Moose. They’ve got significantly better Speed scores, plus they aren’t as prone to soft contact. I’d be very wary of Moose regressing, as he seems to rely on sneaking some less-than-ideal homers over fences. If he goes to San Francisco I could see his value crater (see Belt, Brandon).

Eric Hosmer at 206 — Nope, negative, pass, I’m trying to sign quality hitters here <— Suggested responses for GMs when approached this offseason by Scott Boras on behalf of Hosmer.

Final Notes:

Batted-ball distribution data is noticeably absent. In one of my iterations I added in those stats, and found that they actually regressed the accuracy of the formula. It doesn’t matter where you hit the ball, as long as you hit it hard.
Medium% and LD% are noisy stats. They also regressed the formula.
I may look to replace BB% in future iterations. For now though, it does a decent job of capturing plate discipline and selectivity.
K% doesn’t seem to have much of an impact on cHit (see Gallo, Joey).
R-squared numbers over the last four years of data hold pretty steady between .65 and .75, which is really encouraging. Also, the bigger the pool of data per year (number of batters analyzed), the higher R-squared goes, which is ultimately the most encouraging result of this whole endeavor.
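The article doesn't say how R-squared was computed. One common reading, assumed here, is a simple linear fit of ISO on cHit, in which case R-squared is the squared Pearson correlation. The sketch below uses six (cHit, ISO) pairs copied from the 2017 table; with so few rows the result is only illustrative and should not be expected to land in the .65 to .75 range quoted for the full data set.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R-squared of a one-variable linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)


# (cHit, ISO) for Gallo, Trout, Stanton, J. Ramirez, Altuve, Gordon from the 2017 table.
sample = [(27.56, 0.327), (13.16, 0.323), (8.96, 0.350),
          (4.54, 0.265), (-6.45, 0.202), (-28.88, 0.067)]
chit = [c for c, _ in sample]
iso = [i for _, i in sample]

print(round(r_squared(chit, iso), 3))
```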

Input is greatly appreciated! I’m not a mathematician by any stretch of the imagination, so if there’s a better way of going about this I’d love to hear it. I’ll do a writeup about my swing-change findings at a later date.

Looking for Evidence of a Change to the Ball

by Dominikk85

December 8, 2017

We saw an unprecedented jump in home runs in the last few years. What made it so strange was that most of it happened after the 2015 All-Star break. There is an increased awareness of launch angle and bat path, and 2015 was the first year with public in-game feedback, but you would still expect such an adjustment to take longer, especially since in-season swing changes are really hard to make. With a whole offseason to work on it, it might have been slightly more believable.

There have been multi-factor explanations like a great rookie class of power hitters in the second half of 2015, changed approach, and other stuff like a slightly smaller zone, but really you would not expect such a multi-factor cause to happen that quickly and distinctly between two season halves. That made most sabermetric writers, including most of the FanGraphs staff, believe in a single-factor cause, most likely the ball.

There is some evidence for a changed ball, and there is also anecdotal evidence of minor-league players called up claiming the MLB ball flies farther. However, MLB so far has rejected that, and supported that with the credible name of professor Alan Nathan, albeit without really publishing the data, which further increased the suspicion.

We also did see an increase in launch angle: In 2015 in the first half, the LA of the league was 9.6, and in the second half it was 10.3, which further slightly increased in the first half of 2016 (10.4) and 2017 (10.8). The biggest jump, however, occurred between the season halves of 2015. So were the players really able to increase their LA with a single focus cue without really having much time to work on swing mechanics by just aiming higher after getting the first-half feedback? Those are the most talented athletes in the world, but still that sounds incredible.

But of course increased elevation alone doesn’t explain the surge. The number of balls hit between 20 and 35 degrees (the usual HR range) increased from roughly 8200 in the first half of 2015 to roughly 8600 in the first half of 2016, but the number of HRs increased from 2521 to 3082. Since fewer than half of the FBs between 20 and 35 degrees go out of the park (I don’t have the exact number, but I estimate 30% from the numbers I have), the roughly 400 additional batted balls in that range don’t explain more than 500 additional HRs. That means that, apart from there being more FBs, those FBs also left the park more often, and the league saw a jump in HR/FB rate (9.5% in 2014 and 12.8% in 2016).
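The arithmetic behind that argument can be checked in a few lines, using the rounded counts quoted above and the author's rough 30% estimate for how often a ball in that launch-angle range leaves the park. The variable names are made up for this sketch.

```python
balls_2015, balls_2016 = 8200, 8600  # batted balls at 20-35 degrees, first halves (rounded)
hr_2015, hr_2016 = 2521, 3082        # first-half home runs
hr_share = 0.30                      # the author's rough estimate, not a measured rate

extra_balls = balls_2016 - balls_2015
extra_hr_observed = hr_2016 - hr_2015
extra_hr_from_volume = extra_balls * hr_share  # HRs explained by more balls in the range

print(extra_balls, extra_hr_observed, round(extra_hr_from_volume))
# prints: 400 561 120
```

Under these assumptions, extra volume accounts for only about a fifth of the added home runs, so the rest has to come from a higher share of those balls clearing the fence.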

To research that, I looked into some Statcast stats. All stats here are just first halves of the respective seasons, because the first half of 2015 was the last “normal” HR half. Also I want to lessen weather effects.

This table shows that balls hit between 20 and 35 degrees do indeed fly farther and also come off the bat faster.

Average distance and exit velocity (20-35 LA)

Year	Avg distance (ft)	Avg EV (mph)
2015	326	89.9
2016	331	91.6
2017	332	91.3

So does this jump in HR/FB prove a juiced ball? Not necessarily. To explain this, we have to get into swing mechanics. The attack angle is the direction the bat’s sweet spot is traveling just before contact. You can hit higher LAs (launch angles) by just hitting the bottom of the ball, but while some backspin is good, too much of it will slow down the ball. Generally, the more closely LA and attack angle match, the higher the exit velo. That means players that try to swing up more might shift their highest velos to higher LAs. So while players couldn’t really change their swings that fast, just the intent of a higher LA might have unconsciously caused a higher attack angle and thus more “flush hit” fly balls.

Evidence for the ball not being a factor is that average league EV in 2017 is actually down a tiny bit from 2015. However, if the attack-angle theory is true, you would also expect the EV of balls between 0 and 10 degrees to drop a little bit, and that hasn’t really happened.

Year	Avg EV	EV (0-10 LA)
2015	87.1	93.3
2016	87.8	93.3
2017	86.9	93.1

Another theory came from Tom Tango. He assumed that harder swinging and increased attack angles lead to higher peak EVs but also more weak mis-hits.

We do indeed see a big increase in balls hit above 105 MPH. On the other side (and there have to be more weak hits to explain why overall EV is not up), there are more weakly hit balls in 2017, but not in 2016.

Year	EV >85	Balls <85	Balls >105
2015	96.2	19210	2960
2016	96.9	19075	3917
2017	96.7	20436	3635
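To put those counts on a common scale, the sketch below converts them to percent changes against the first half of 2015. It reads the last two columns of the table as the number of balls hit under 85 mph and over 105 mph, which is an interpretation of the header and not something the table states outright; pct_change is a hypothetical helper.

```python
# First half of each season -> (balls under 85 mph, balls over 105 mph), as read from the table.
counts = {2015: (19210, 2960), 2016: (19075, 3917), 2017: (20436, 3635)}


def pct_change(new, old):
    """Percent change from old to new."""
    return 100.0 * (new - old) / old


for year in (2016, 2017):
    weak = pct_change(counts[year][0], counts[2015][0])
    hard = pct_change(counts[year][1], counts[2015][1])
    print(year, f"weak {weak:+.1f}%", f"hard {hard:+.1f}%")
```

On this reading, hard-hit balls jump in both later seasons, while weakly hit balls rise only in 2017.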

To see if there is an aerodynamic effect — one theory of the juiced ball is reduced air drag due to lower seams — I looked at the average distance of balls hit at 20-25 degree LA in different velocity buckets.

EV Range	95-100	100-105	105-110
2015	366	391	415
2016	362	387	408
2017	363	391	411

You can’t really see an effect here. Balls hit at the same EV (which is measured right after exit so that air drag hasn’t done its work yet) don’t fly farther in 2016 or 2017 than they did in the first half of 2015. That means there likely isn’t really an effect of aerodynamics, at least not a big one.
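The same comparison, written out as differences from the 2015 baseline. The figures are copied from the table above and are assumed to be average distances in feet; change_vs_baseline is a name invented for this sketch.

```python
# Average distance at 20-25 degrees LA, by exit-velocity bucket (mph), first halves only.
distance = {
    2015: {"95-100": 366, "100-105": 391, "105-110": 415},
    2016: {"95-100": 362, "100-105": 387, "105-110": 408},
    2017: {"95-100": 363, "100-105": 391, "105-110": 411},
}


def change_vs_baseline(year, baseline=2015):
    """Distance difference per EV bucket relative to the baseline season."""
    return {bucket: distance[year][bucket] - distance[baseline][bucket]
            for bucket in distance[baseline]}


print(change_vs_baseline(2016))  # {'95-100': -4, '100-105': -4, '105-110': -7}
print(change_vs_baseline(2017))  # {'95-100': -3, '100-105': 0, '105-110': -4}
```

No bucket gains distance in either later season, which is the opposite of what reduced drag would produce.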

So the reason for increased HRs seems to be mostly that fly balls fly faster and farther for whatever reason. We don’t see an across-the-board increase of EV, however, but simple explanations like a shift of max EVs to other launch angles don’t seem to really work either, as LAs from 0-10 (and also lower than minus 5 for that matter) haven’t really changed in their EV.

It remains mysterious what did actually happen. We do know LAs have increased some, but that doesn’t explain the whole story. But I couldn’t find real evidence for a changed ball in Statcast either. Could a super fast on-the-fly adjustment of the league between season halves based on the Statcast data really be the driving factor here?

Intellectually I really want to believe the juiced-ball theory, as it is the most elegant explanation for such a quick turnaround, but maybe it isn’t that easy.

Yasiel Puig Was a Terrible, Terrible Baserunner

by Henry Still

December 7, 2017

Yasiel Puig had an impressive rebound season in 2017. He responded to disappointing, injury-marred seasons in 2015-16 with a solid 2.9 WAR this year. Puig greatly improved his plate discipline, increasing his selectivity and his contact rates en route to an 11.2 BB% and 17.5 K%. He has been known for his free-swinging ways since entering the league, but he may have changed that reputation this past season. Puig was not the reckless hitter he had been in the past. However, he may have decided to channel that recklessness to the base paths.

Puig is a good athlete, but has never been much of a base-stealer. In his first two years, he converted a poor 22/37 of his steal attempts, and mostly quit trying to steal in 2015-16. Despite the failures of his base-stealing, he had actually been a slight positive on the bases in his career, accumulating 0.5 runs above average in 2013-16, per FanGraphs’ base-running metric. Puig reverted to his aggressive base-stealing in 2017, and his 15 stolen bases indicated success with the approach. His 71.4% conversion rate was not exceptional, but not horrible. But his base-running had no semblance of success.

Puig was the sixth-worst player on the bases in 2017, accumulating -7.6 runs. He was surrounded by names like Albert Pujols, Miguel Cabrera, and Edwin Encarnacion. Not exactly names you want to be grouped with when talking about base-running.

FanGraphs’ base-running metric encompasses three things: wSB, wGDP, and UBR. wSB measures the run value a player produced based off attempting steals. Puig produced a mediocre mark of 0.1, which lines up with his stolen-base numbers. wGDP measures the ability of a player to avoid double plays. Puig ranked 13th-worst in 2017 with -2.4 runs produced, but wGDP is more related to avoiding ground balls with men on base and beating out throws to first. UBR (Ultimate Base Running), measures the value of a player with respect to non-stealing base-running, like taking an extra base. Puig produced -5.3 runs per UBR, sixth-worst in the league.
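As a quick check, the three components quoted above add up to the overall base-running figure, and the stolen-base rate follows from 15 steals. The six times caught stealing is inferred from the 71.4% conversion rate and is not stated in the text.

```python
# Puig's 2017 base-running components, in runs, as quoted in the text.
components = {"wSB": 0.1, "wGDP": -2.4, "UBR": -5.3}
bsr = round(sum(components.values()), 1)

stolen, caught = 15, 6  # caught-stealing count inferred from the 71.4% rate
steal_rate = round(100 * stolen / (stolen + caught), 1)

print(bsr, steal_rate)
# prints: -7.6 71.4
```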

Let’s focus on that UBR. In context, that figure is a whole lot worse than merely sixth-lowest in the league. Puig had a speed score of 4.4 in 2017, placing him 80th in the league among players with at least 400 PA. The five players with a lower UBR than Puig had an average speed score of 2.3, which would rank 185th. Considering speed, Puig was likely the worst base-runner in the league. He did things like this, which you probably remember from the World Series:

Puig was probably the worst base-runner in 2017. But how bad was he on a historical level?

Of all individual seasons (min 400 PAs) since 2002, when UBR was introduced, Puig’s UBR ranks lower than the 3rd percentile out of 3393 seasons. Of those individual seasons with a speed score within one standard deviation of Puig’s, his UBR ranks lower than the 1st percentile.
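A sketch of how such a speed-adjusted percentile could be computed, assuming "within one standard deviation" means the standard deviation of speed score across all seasons in the pool. Only the first row uses Puig's quoted figures (speed score 4.4, UBR -5.3); the other rows are invented, and both function names are hypothetical.

```python
from statistics import pstdev


def percentile_rank(value, population):
    """Percent of the population strictly below value."""
    below = sum(1 for v in population if v < value)
    return 100.0 * below / len(population)


def percentile_among_similar_speed(target, seasons):
    """Rank target's UBR among seasons whose speed score is within one SD of target's."""
    sd = pstdev([spd for spd, _ in seasons])
    peers = [ubr for spd, ubr in seasons if abs(spd - target[0]) <= sd]
    return percentile_rank(target[1], peers)


# (speed score, UBR); first row is Puig 2017, the rest are made-up seasons.
seasons = [(4.4, -5.3), (4.0, 1.2), (5.0, 0.4), (3.8, -1.0),
           (4.9, 2.5), (8.5, 6.0), (1.0, -6.0)]

print(percentile_rank(-5.3, [ubr for _, ubr in seasons]))
print(percentile_among_similar_speed(seasons[0], seasons))
```

Restricting the pool to similar speed scores removes the slow runners who would otherwise sit below the target, which is why the second figure comes out lower than the first.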

Here is a plot of every one of those seasons, with each player’s stolen-base total versus their UBR. Puig in 2017 is highlighted in yellow.

Obviously, players with higher stolen-base totals are generally faster, and thus produce more value on the bases. As with anything, though, there are outliers. Puig is definitely an outlier. Only one player with as many stolen bases has produced a UBR lower than Puig: Juan Encarnacion in 2003. Here is another chart, with speed score plotted against UBR. Puig again is in yellow.

Puig is again an extreme outlier, even historically. Considering his athleticism, Puig had one of the worst base-running seasons of the last 15 years. This does not mean a ton. Puig has not always been a terrible base-runner, and he was still a quite effective player in 2017, woes on the basepaths aside. He can easily turn it around and produce a solid base-running season with the physical gifts he has. However, in 2017, Puig’s base-running was really, really terrible.

On Starling Marte and Steroids

by Lance Brozdowski

December 6, 2017

Each baseball fan has a set of specific events throughout time they remember fondly. Some exist in said group because of their emotional impact on your fandom. Others remain on the periphery of importance because of a random characteristic that still stands out.

Those peripheral events, for me, are often those I’ve seen on live television. I don’t think of these events often, nor do I keep a record of them, or have some strict guideline for what sticks in my head, but when a story in the present day sparks my memory, a picture often emerges. My teenage years watching baseball were spent in one of two ways: sitting on the ground in front of my laptop with MLB.tv fading in and out, or scouring local stations for a good matchup. These two primary settings allowed for many one-off memories to accumulate.

When I began to think about Pittsburgh Pirates’ outfielder Starling Marte — due to this offseason’s stagnation — I thought back to the first pitch he saw in his major-league career. Just over five years ago, 23-year-old Starling Marte took the first pitch Dallas Keuchel threw on July 26 out of Minute Maid Park. The rarity of that event — a prospect’s debut, leading off a game, first-pitch home run — forces me to remember that bomb whenever Marte steps into a batter’s box. Because I happened to see it live, that memory has stuck.

For the wider population of fans, what now supersedes that milestone is Marte’s run-in with performance-enhancing drugs. He was suspended for 80 games during the 2017 season, and that mistake will couple itself with any other success he has.

Predicting, during his 2017 layoff, how Marte would fare upon his return raised some interesting, PED-related questions. Would his power drop? Would his speed deteriorate? What about his overall durability?

Nestled within all those asks is what exactly the effect of PEDs on an athlete’s body is after stopping use. Much more intriguing is this question: does any use at all matter as much as stopping that use? In other words, do the effects of PED use in the first place help prolong success?

I mention this because Marte joins Dee Gordon as one of the more prominent speed-first users of prohibited substances in recent years. The drugs Gordon and Marte took were different, from my understanding (nandrolone versus a stacked dose with clostebol), but maybe some intrigue exists in the stats before and after use?

The overall comparison doesn’t show us much. Even what I highlighted with darkened gridlines, slugging percentage and wRC+, has more noise within it than signal. Two main questions exist, among many others, that don’t have answers.

What portion of the “before” PED use window contains tainted statistics?
What portion of the drop is due specifically to the lack of steroids in the body?

But perhaps our intentions with those questions are incorrect. Think back to the question I asked before showing this dataset: do the effects of PED use in the first place help prolong success?

What if the muscle memory and learning that takes place while a player is under the influence of the drug extends beyond the window where a player can run a positive test?

With some high-level Googling, I found one instance where this idea might be a reasonable rabbit hole to dig into (BBC News). Certainty around this topic, however, is impossible, given all the variables. Some selection bias brings us, as average fans, to Nelson Cruz and Bartolo Colon as examples of this idea. But assuming that two players with demonstrable skills outside of steroid use represent a wider population is not appropriate. We’re left in limbo regarding how much one positive test early on can affect one’s long-term production.

***

Let’s leave the uncertainty around long-term effects of Marte’s steroid use alone for now and focus on what has happened in Marte’s career.

The attribute his value has been tied to for most of his career, like Dee Gordon’s, is speed. But for Marte, age-induced deterioration of that attribute may be underway as he heads into his age-29 season with the Pirates.

It wasn’t too long ago we were concerned about the viability of McCutchen’s long-term impact, yet speed has a much greater weight on the impact of Marte as a player than McCutchen. I remain perplexed as to how Marte intends to turn around this decline in sprint speed as he starts to fall away from elite towards the 27.0 feet-per-second average the standard MLB player possesses.

Marte can still produce with his bat, but after seeing this decrease in peak sprint speed, I wonder if he becomes less reliant on his wheels to buoy his BABIP and the resulting average he’ll post. The Pirates’ outfielder might need to adjust.

To counteract this potential speed regression, Marte might want to adjust back to his approach from 2015, where he popped 19 home runs.

What we do know from that year resides in his tendency to pull the ball more often than his career average, which resulted in the majority of his home runs landing somewhere near the corner in a park’s left-field seats. He was also more aggressive in 2015 than he had ever been in his career, but since then Marte has reverted to a contact-based approach, raising his zone-contact rate by two percent and overall contact rate by three percent.

With all this said, the form and substance of Marte’s swing have been largely the same since the early days of his career. Each of the four clips embedded within the GIF below shows a base hit to left field for Marte. Instead of focusing on the moments just before contact, where most hitters look identical, focus on his pre-pitch rhythm and timing.

Marte has a unique pulse when it comes to the timing mechanism in his hands, as his bat moves towards the first-base line twice prior to his load. The speed at which he executes this varies slightly based on the pitch, but his front foot’s inward turn and hip rotation remain unaltered from this selection of swing in our four-year sample.

My worry is that pushing Marte towards the 2015 version of himself, with pull-happy tendencies and a little bit more aggression, may not lead to the power result we want. With his speed possibly deteriorating, the balls he rolls over on with his sights set on the bleachers will turn into hits less often. We might want Marte to trade some of his contact for power, but my inclination is that such a trade, at present, is not one-for-one and would result in a net-negative effect.

This contact approach of Marte’s may be the new normal, and I remain worried about what the ceiling of productivity can be if he doesn’t find a second wind in the speed department. Marte can still be an asset to the Pirates, and isn’t a financial burden, but it might be too late to expect 2015’s power-speed combo that had the chance to nudge Marte towards the elite bracket of outfielders in baseball.

Bill Brink of the Pittsburgh Post-Gazette reports that Marte is making up his lost at-bats in the Dominican Winter League for Leones del Escogido. The results so far, in a small sample, have not been great:

.197/.244/.316 in 76 at-bats, with a 21:3 strikeout-to-walk ratio.

Marte’s evolution as a hitter will become clearer as our post-PED sample size increases. The Pirates’ outfield, once considered the best in baseball with McCutchen, Marte, and Polanco, now finds itself in a pickle, especially if Cutch is traded, Marte’s speed continues to trend south, and Polanco can’t stay healthy.

An Exercise in Generating Similarity Scores

by John Edwards

December 5, 2017

In the process of writing an article, one of the more frustrating things to do is generate comparisons to a given player. Whether I’m trying to figure out who most closely aligns with Rougned Odor or Miguel Sano, it’s a time-consuming and inexact process to find good comparisons. So I tried to simplify the process and make it more exact — using similarity scores.

An Introduction to Similarity Scores

The concept of a similarity score was first introduced by Bill James in his book The Politics of Glory (later republished as Whatever Happened to the Hall of Fame?) as a way of comparing players who were not in the Hall of Fame to those who were, to determine which non-HOFers deserved a spot in Cooperstown. For example, since Phil Rizzuto’s most similar players per James’ metric are not in the HOF, Rizzuto’s case for enshrinement is questionable.

James’ similarity scores work as such: given one player, to compare them to another player, start at 1000 and subtract one point for every difference of 20 games played between the two players. Then, subtract one point for every difference of 75 at-bats. Subtract a point for every difference of 10 runs scored…and so on.
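As a rough sketch of the subtraction scheme just described, limited to the three categories quoted above: the function name, the dictionary keys, and the sample lines are hypothetical, and truncating each penalty to whole points is an assumption. James' full method covers more categories, which are left out here rather than guessed.

```python
# Points-per-difference divisors quoted in the text. The remaining
# categories in James' method are deliberately omitted.
DIVISORS = {"games": 20, "at_bats": 75, "runs": 10}

def james_similarity(a, b, divisors=DIVISORS):
    """Start at 1000 and subtract one point per full `divisor` of difference.

    Assumption: partial increments are truncated, not rounded.
    """
    score = 1000
    for stat, per_point in divisors.items():
        score -= abs(a[stat] - b[stat]) // per_point
    return score

# Hypothetical career lines, for illustration only.
player_a = {"games": 1500, "at_bats": 5400, "runs": 800}
player_b = {"games": 1440, "at_bats": 5250, "runs": 770}
print(james_similarity(player_a, player_b))
```

Two identical stat lines score 1000, and every gap between the players chips away at that total.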

James’ methodology is flawed and inexact, and he’s aware of it: “Similarity scores are a method of asking, imperfectly but at least objectively, whether two players are truly similar, or whether the distance between them is considerable” (WHHF, Chapter 7). But it doesn’t have to be perfect and exact. James is simply looking to find which players are most alike and compare their other numbers, not their similarity scores.

Yes, there are other similarity-score metrics that have built upon James’ methodology, ones that turn those similarities into projections: PECOTA, ZiPS, and KUBIAK come to mind. I’m not interested in making a clone of those because these metrics are obsessed with the accuracy of their score and spitting out a useful number. I’m more interested in the spirit of James’ metric: it doesn’t care for accuracy, only for finding similarities.

Approaching the Similarity Problem

There is a very distinct difference between what James wants to do and what I want to do, however. James is interested in result-based metrics like hits, doubles, singles, etc. I’m more interested in finding player similarities based on peripherals, specifically a batted-ball profile. Thus, I need to develop some methodology for finding players with similar batted-ball profiles.

In determining a player’s batted-ball profile, I’m going to use three measures of batted-ball frequencies: launch angle, spray angle, and quality of contact. For launch angle, I will use GB%/LD%/FB%; for spray angle, I will use Pull%/Cent%/Oppo%; and for quality of contact, I will use Soft%, Med%, Hard%, and HR/FB (more on why I’m using HR/FB later).

In addition to the batted-ball profiles, I can get a complete picture of a player’s offensive profile by looking at their BB% and K%. To do this, I will create two separate similarity scores — one that measures similarity based solely upon batted balls, and another based upon batted balls and K% and BB%. All of our measures for these tendencies will come from FanGraphs.

Essentially, I want to find which player is closest to which overall in terms of ALL of the metrics that I’m using. The term “closest” is usually used to convey position, and it serves us well in describing what I want to do.

Gettin’ Geometrical

In order to find the most similar player, I’m going to treat every metric (GB%, LD%, FB%, Pull%, and so on) as an axis in a positioning system. Each player has a unique “position” along that axis based on their number in that corresponding metric. Then, I want to find the player nearest to a given player’s position within our coordinate system; that player will be the most similar to our given player.

I can visualize this up to the third dimension. Imagine that I want to find how similar Dee Gordon and Daniel Murphy are in terms of batted balls. I could first plot their LD% values and find the differences.

1-D visualization of Daniel Murphy's and Dee Gordon's batted ball profiles

So the distance between Murphy and Gordon, based on this, is 4.8%. Next, I could introduce the second axis into our geometry, GB%.

2-D visualization of Daniel Murphy's and Dee Gordon's batted ball profiles

The distance between the two players is given by the Pythagorean formula for distance — sqrt(ΔX^2 + ΔY^2), where X is LD% and Y is GB%. To take this visualization to a third dimension and incorporate FB%…

3-d visualization of Daniel Murphy's and Dee Gordon's batted ball profiles

… I would add another term to the distance calculation: sqrt(ΔX^2 + ΔY^2 + ΔZ^2). And so on, for each subsequent term. You’ll just have to use your imagination to plot the remaining axes, because nothing beyond three dimensions can be drawn without some really weird projections, but essentially, once I find the distance between those two points in our 10- or 12-dimensional coordinate system, I have an idea how similar they are. Then, if I want to find the most similar batter to Daniel Murphy, I would find the distance between him and every other player in a given sample, and pick the player with the smallest distance.
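The distance calculation generalizes to any number of axes by adding one squared difference per metric. A minimal sketch, with a hypothetical function name and placeholder profiles that are not any real player's rates:

```python
import math

def profile_distance(a, b, axes):
    """Euclidean distance between two profiles over the chosen axes."""
    return math.sqrt(sum((a[axis] - b[axis]) ** 2 for axis in axes))

# Placeholder batted-ball rates (percent), for illustration only.
player_a = {"LD%": 20.0, "GB%": 40.0, "FB%": 40.0}
player_b = {"LD%": 23.0, "GB%": 44.0, "FB%": 33.0}

# One axis, then two, then three: each added axis contributes one more
# squared term under the square root.
for axes in (["LD%"], ["LD%", "GB%"], ["LD%", "GB%", "FB%"]):
    print(axes, profile_distance(player_a, player_b, axes))
```

Nothing changes at 10 or 12 axes except the length of the list passed as `axes`.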

If you’ve taken a computer science course before, this problem might sound awfully familiar to you — it’s a nearest-neighbor search problem. The NNS problem is about finding the best way to determine the closest neighbor point to a given point in some space, given a set of points and their position in that space. The “naive” solution, or the brute-force solution, would be to find the distance between our player and every other player in our dataset, then sort the distances. However, there exists a more optimized solution to the NNS problem, called a k-d tree, which progressively splits our n-dimensional space into smaller and smaller subspaces and then finds the nearest neighbor. I’ll use the k-d tree approach to tackling this.
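To make the two approaches concrete, here is a small pure-Python sketch of a brute-force search next to a basic k-d tree. It is a teaching sketch under stated assumptions, not the code from the R package: the function names and the two-dimensional sample points are hypothetical, and the target player is assumed to be left out of the point set before searching. In practice a library implementation such as SciPy's `KDTree` would be the usual choice.

```python
import math

def brute_force_nearest(points, target):
    """Return (distance, name) for the point closest to target."""
    return min((math.dist(coords, target), name) for name, coords in points.items())

def build_kd_tree(items, depth=0):
    """Build a k-d tree from a list of (name, coords) pairs."""
    if not items:
        return None
    axis = depth % len(items[0][1])          # cycle through the axes
    items = sorted(items, key=lambda item: item[1][axis])
    mid = len(items) // 2                    # median splits the space in two
    return {
        "item": items[mid],
        "axis": axis,
        "left": build_kd_tree(items[:mid], depth + 1),
        "right": build_kd_tree(items[mid + 1:], depth + 1),
    }

def kd_nearest(node, target, best=None):
    """Return (distance, name) of the nearest neighbor to target."""
    if node is None:
        return best
    name, coords = node["item"]
    candidate = (math.dist(coords, target), name)
    if best is None or candidate < best:
        best = candidate
    gap = target[node["axis"]] - coords[node["axis"]]
    near, far = (node["left"], node["right"]) if gap < 0 else (node["right"], node["left"])
    best = kd_nearest(near, target, best)
    # Only search the far side if the splitting plane is close enough
    # that a better point could be hiding behind it.
    if abs(gap) <= best[0]:
        best = kd_nearest(far, target, best)
    return best

# Hypothetical normalized two-axis profiles, for illustration only.
points = {"A": (0.0, 0.0), "B": (1.0, 1.0), "C": (3.0, 0.5), "D": (-2.0, 2.0), "E": (0.5, -1.5)}
target = (0.9, 0.6)
tree = build_kd_tree(list(points.items()))
print(brute_force_nearest(points, target))
print(kd_nearest(tree, target))
```

Both searches return the same neighbor. The tree simply skips whole regions of the space that cannot contain a closer point, which matters more as the player pool grows.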

Why It’s Important to Normalize

I used raw data values above in an example calculation of the distance between two players. However, I would like to issue caution against using those raw values because of the scale that some of these numbers fall upon.

Consider that in 2017, the difference between the largest LD% and smallest LD% among qualified hitters was only 14.2%. For GB%, however, that figure was 30.7%! Clearly, there is a greater spread with GB% than there is with LD% — and a difference in GB% of 1% is much less significant than a difference in LD% of 1%. But in using the raw values, I weight that 1% difference the same, so LD% is not treated as being of equal importance to GB%.

To resolve this issue, I need to “normalize” the values. To normalize a series of values is to place differing sets of data all on the same scale. LD% and GB% will now have roughly the same range, but each will retain their distribution and the individual LD% and GB% scores, relative to each other, will remain unchanged.
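One common way to put every metric on the same scale is a z-score: subtract the column mean and divide by the column standard deviation. The text does not say which normalization the tool uses, so treat the choice of z-scores, the function name, and the sample rates below as assumptions made for illustration.

```python
from statistics import mean, pstdev

def zscore_columns(rows, columns):
    """Rescale each column to mean 0 and standard deviation 1.

    rows maps a player name to a dict of metric values. A column where
    every player has the same value has zero spread and cannot be scaled.
    """
    centers, spreads = {}, {}
    for col in columns:
        values = [row[col] for row in rows.values()]
        centers[col] = mean(values)
        spreads[col] = pstdev(values)
    return {
        name: {col: (row[col] - centers[col]) / spreads[col] for col in columns}
        for name, row in rows.items()
    }

# Placeholder rates: GB% spans a much wider range than LD%.
raw = {
    "a": {"LD%": 18.0, "GB%": 30.0},
    "b": {"LD%": 20.0, "GB%": 45.0},
    "c": {"LD%": 22.0, "GB%": 60.0},
}
for name, scaled in zscore_columns(raw, ["LD%", "GB%"]).items():
    print(name, scaled)
```

After scaling, a player who is unusually low in LD% sits just as far from the pack as one who is unusually low in GB%, even though the raw gaps differ by a wide margin.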

Now, here’s the really big assumption that I’m going to make. After normalizing the values, I won’t scale any particular metric further. Why? Because personally, I don’t believe that in determining similarity, a player’s LD% is any more important than the other metrics I’m measuring. This is my personal assumption, and it may not be true — there’s not really a way to tell otherwise. If I believed LD% was really important, I might apply some scaling factor and weigh it differently than the rest of the values, but I won’t, simply out of personal preference.

Putting it All Together

I’ve identified what needs to happen, now it’s just a matter of making it happen.

So, go ahead, get to work. I expect this on my desk by Monday. Snap to it!

…

Oh, you’re still here.

If you want to compare answers, I went ahead and wrote up an R package containing the function that performs this search (as well as a few other dog tricks). I can do this in two ways, either using solely batted-ball data or using batted-ball data with K% and BB%. For the rest of this section, I’ll use the second method.

Taking FanGraphs batted-ball data and the name of the target player, the function returns a number of players with similar batted-ball profiles, as well as a score for how similar they are to that player.

For similarity scores, use the following rule of thumb:

0-1 -> The same player having similar seasons.

1-2 -> Players that are very much alike.

2-3 -> Players who are similar in profile.

3-4 -> Players sharing some qualities, but are distinct.

4+ -> Distinct players with distinct offensive profiles.

Note that because of normalization, similarity scores can vary based on the dataset used. Similarity scores shouldn’t be used as strict numbers — their only use should be to rank players based on how similar they are to each other.

To show the tool in action, let’s get someone at random, generate similarity scores for them, and provide their comparisons.

Here’s the offensive data for Elvis Andrus in 2017, his five neighbors in 12-dimensional space (all from 2017), and their similarity scores.

Elvis Andrus Most Similar Batters (2017)

The lower the similarity score, the better, and the guy with the lowest similarity score, J.T. Realmuto, is almost a dead ringer for Andrus in terms of batted-ball data. Mercer, Gurriel, Pujols, and Cabrera aren’t too far off either.

After extensively testing it, the tool seems to work really well in finding batters with similar profiles — Yonder Alonso is very similar to Justin Smoak, Alex Bregman is similar to Andrew McCutchen, Evan Longoria is similar to Xander Bogaerts, etc.

Keep in mind, however, that not every batter has a good comparison waiting in the wings. Consider poor, lonely Aaron Judge, whose nearest neighbor is the second furthest away of any other player in baseball in 2017 — Chris Davis is closest to him with a similarity score of 3.773. Only DJ LeMahieu had a further nearest-neighbor (similarity score of 3.921!).

The HR/FB Dilemma

While I’m on the subject of Aaron Judge, let’s talk really quickly about HR/FB and why it’s included in the function.

When I first implemented my search function, I designed it to only include batted-ball data and not BB%, K%, and HR/FB. I ran it on a couple players to eye-test it and make sure that it made sense. But when I ran it on Aaron Judge, something stuck out like a sore thumb.

Aaron Judge Similarity Scores

Players 2-5 I could easily see as reasonable comparisons to Judge’s batted balls. But Nick Castellanos? Nick Castellanos? The perpetual sleeper pick?

But there he was, and his batted balls were eerily similar to Judge’s.

Aaron Judge Most Similar Batters (2017)

Judge hits a few more fly balls, Castellanos hits a few more liners, but aside from that, they’re practically twins!

Except that they’re not. Here’s that same chart with HR/FB thrown in.

Aaron Judge Most Similar Batters (2017)

There’s one big difference between Judge and Castellanos, aside from their plate discipline — exit velocity. Judge averages 100+ MPH EV on fly balls and line drives, the highest in the majors. Castellanos posted a meek 93.2 MPH AEV on fly balls and line drives, and that’s with a juiced radar gun in Comerica Park. Indeed, after incorporating HR/FB into the equation, Castellanos drops to the 14th-most similar player to Judge.

HR/FB is partially considered a stat that measures luck, and sure, Judge was getting lucky with some of his home runs, especially with Yankee Stadium’s homer-friendly dimensions. But luck can only carry you so far along the road to 50+ HR, and Judge was making great contact the whole season through, and his HR/FB is representative of that.

In that vein, I feel that it is necessary to include HR/FB even though it has a significant randomness component, which is very much in contrast with the rest of the metrics used in making this tool. The skill-based component of the stat makes it a necessary inclusion nevertheless.

Using this Tool

If you want to use this tool, you are more than welcome to do so! The code for this tool can be found on GitHub here, along with instructions on how to download it and use it in R. I’m going to mess around with it and keep developing it and hopefully do some cool things with it, so watch this space…

Although I’ve done some bug testing (thanks, Matt!), this code is still far from perfect. I’ve done, like, zero error-catching with it. If in using it, you encounter any issues, please @ me on twitter (@John_Edwards_) and let me know so I can fix them ASAP. Feel free to @ me with any suggestions, improvements, or features as well. Otherwise, use it responsibly!

Hack Wilson: The Most Interesting Player You’ve Sorta-Kinda Heard of Before

by mohallor

December 5, 2017

Lewis Robert “Hack” Wilson was an outfielder for the New York Giants, Chicago Cubs, Brooklyn Dodgers, and Philadelphia Phillies in the early 20th century. Wilson was a very good ballplayer, and was enshrined in Cooperstown in 1979.

As my title suggests, you have probably heard the name Hack Wilson before, but I’m guessing you probably don’t know much about him, because his most popular claim to fame is considered by many to be irrelevant today. This claim to fame is his record-setting 191 RBI in 1930. This remains the single-season record for the stat to this day, and it’s hard to believe that anyone will come along who can break it. In that 1930 campaign, Hack also slugged 56 home runs, walked 105 times, struck out 84 times, and slashed .356/.454/.723 with a 1.177 OPS and a 177 OPS+. These were all league highs, excluding average and OBP.

That’s a great season, but it gets a whole lot more interesting when you look a little closer. 56 home runs is a lot. That mark is tied with Ken Griffey Jr.’s pair of 56-home-run campaigns for 17th-most all-time in a single season, and was the best non-Ruth mark at the time (although this would last just two years, when Jimmie Foxx hit 58 home runs in 1932).

Just hitting home runs isn’t what makes Hack Wilson so interesting to me, though. It’s who he was. Hack Wilson stood at just 5’6. The same height as our favorite short player today, Jose Altuve. In fact, at 5’6, Altuve and Hack are both the shortest players to ever hit 20 or more home runs in a single season. Hack alone is the shortest player to ever slug 30, 40, or 50 in a single season. Hack also holds the single-season home-run record for anyone under 6’0. Hack, Mantle (5’11), Mays (5’10), and Prince Fielder (5’11) are the only men to hit 50 or more home runs while being less than 6’0.

However, with that enormous home-run total comes strikeouts. You may have noticed that he struck out just 84 times in that 56 home-run season, and he even walked more than he struck out. But 84 was a lot in 1930. In fact, Hack Wilson led the league in strikeouts.

In 2017, just 25 qualified hitters struck out 84 times or fewer. Of these 25, just one (Mookie Betts) matched or exceeded Hack’s 709 plate appearances. This tidbit really speaks more to the two eras in discussion, but it’s interesting nonetheless.

Some other Hack Wilson fun facts:

Hack received MVP votes in five years. Amazingly, his monstrous 1930 season (undoubtedly his best) was not one of the five. However, this was due to the fact that the MVP was not awarded in 1930. Had it been, Wilson likely would have won in a landslide.

Despite having the single-season record for most RBI, he is tied for just the sixth-most seasons of 150 or more RBI with two, behind Lou Gehrig (7), Babe Ruth (6), Jimmie Foxx (4), Hank Greenberg (3), and Al Simmons (3), and tied with Sosa, DiMaggio, and Sam Thompson.

Despite the legendary 1930 season, Hack’s career was significantly below that of a typical Hall of Famer. His Gray Ink score is 110 (average HOF’s is 144), and his “Hall of Fame Standards” is 39 (average HOF’s is 50). His 38.8 career bWAR is barely more than half of the 71.2 average for Hall of Fame center fielders.

That’s all I have on Lewis Wilson. He may still seem like a relatively mundane player, but imagine if Altuve came out in 2018 and kept up with Stanton and Judge in the home-run race. That is what Hack Wilson did in 1930, belting 56 homers as a man who stood 5’6″ tall (how can you not be romantic about baseball?).

Overcoming Imperfect Information

by NHL14

December 2, 2017

When a team trades a veteran for a package of prospects, only minor-league data and the keen eye of scouts can be used to assess the likely future major-league contributions from those particular players. Teams have relied on the trained eyes of scouts for generations, but of course the analytics community wants its foot in the game too. Developments such as Chris Mitchell’s KATOH systems make some strides, as it is helpful to compare historical information. Does a prospect’s rank on MLB.com’s or Baseball America’s top-prospect list really indicate how productive a player will be in the major leagues? Of course, baseball players are human, and production will always vary because of numerous factors that could potentially change the course of someone’s career. Perhaps a player meets a coach who dramatically turns his game around, or a pitcher discovers a new-found talent for an impressive curveball that jumps him from low fringe prospect to MLB-ready. The dilemma of imperfect information will always be present, so teams must use the best resources available to them to tackle the problem.

To start my analysis of imperfect information, I look at the top 100 position prospects from 2009 using data from BaseballReference.com. I break up the prospects into three groups based on their prospect ranking: position players ranked 1-10, 11-20 and 21-100. I then look at the value that those prospects contributed in their first six seasons in the major leagues, as well as their to-date total contributions using fWAR. I choose to look at the first six seasons of a player’s career because that is how long a player is under team control before reaching free agency. This study does not take into account any contract extensions that may have been given before a player reached free-agent eligibility. For players who have not been in MLB for six full seasons, I look at their total contributions so far. The general idea for this study was inspired by a 2008 article by Victor Wang that looked at imperfect prospect information.

I convert the prospects’ production into monetary value based on the relative WAR values that were commanded in the free-agent market that year. I use fWAR as the best measure of total value. When teams trade for prospects, they understand that they are trading wins today for wins in the future. Since baseball is a business and teams care about their performance on the field each year, I need to account for that fact in my analysis. In order to do that, I assume that, all else equal, a win today is more valuable than a win in the future. I apply an 8% discount rate to each prospect’s WAR value and create a discounted WAR value (dWAR). The value of the discount rate can be debated, but the 8% rate seems appropriate for the time frame looked at.
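A minimal sketch of the discounting step, assuming the discount compounds once per season and the first major-league season is left undiscounted (the text does not specify either detail). The function name and the sample WAR line are hypothetical.

```python
def discounted_war(war_by_season, rate=0.08):
    """Sum of seasonal WAR, each discounted back to the first season.

    Assumption: season 0 is undiscounted, season t is divided by (1 + rate)**t.
    """
    return sum(war / (1 + rate) ** t for t, war in enumerate(war_by_season))

# Hypothetical six-season WAR line, for illustration only.
seasons = [1.0, 2.5, 3.0, 4.0, 3.5, 2.0]
total = discounted_war(seasons)
print(round(total, 2), round(total / len(seasons), 2))  # dWAR and dWAR per year
```

Because later seasons are divided by a larger factor, a prospect who produces early is worth more in dWAR than one who posts the same raw WAR total late in his team-control window.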

From here, I break up the prospects into a few different subgroups based on their average WAR contributed over their first six seasons in the major leagues. I follow some of the guidelines laid out in other studies with some slight modifications. Players with 0 or negative WAR per year are labeled as busts. Players with slightly above 0 to 2 WAR are contributors. Players with 2-4 WAR are starters and players with 4+ WAR are stars. As described previously, I estimate the players’ monetary savings to their team by taking their monetary value based on WAR performance and comparing it to what similar production would command in the free-agent market for that year. There seems to be some debate on the value of one WAR in the free-agent market; however, my calculations show that about $7 million bought one WAR leading up to the 2009 season. Victor Wang suggests that the price for one WAR had about a 10% inflation rate from year to year. I find the present value of each player’s WAR, then multiply it by the $7 million per WAR that would have been commanded in the free-agent market in order to find a player’s effective savings to their team based on production.
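The grouping and the pricing of a win can be sketched as below. The function names are hypothetical, the handling of values that land exactly on 2 or 4 is an assumption, and the valuation prices WAR at a flat $7 million per win, ignoring salary paid and year-to-year inflation, so it will not reproduce the savings figures in the tables.

```python
def classify(avg_war_per_year):
    """Bucket a prospect by average WAR per year.

    Assumption: exactly 2 counts as a starter and exactly 4 as a star.
    """
    if avg_war_per_year <= 0:
        return "bust"
    if avg_war_per_year < 2:
        return "contributor"
    if avg_war_per_year < 4:
        return "starter"
    return "star"

def market_value_millions(war, price_per_war=7.0):
    """Free-agent price of the given WAR, in millions of dollars."""
    return war * price_per_war

# Hypothetical averages, for illustration only.
for avg in (-0.3, 1.5, 2.7, 5.2):
    print(avg, classify(avg), market_value_millions(avg))
```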

Position Prospects Ranked 1-10

	Bust	Contributor	Starter	Star
Count	1	2	5	2
Probability	10.00%	20.00%	50.00%	20.00%
WAR/Y	0.43	1.53	2.73	5.17
PV Savings/y (in millions)	1.88	8.46	10.91	27.98

Average WAR/Y for the group: 2.83

Interestingly enough, this prospect class panned out quite well compared to some other recent draft classes. The only bust in terms of discounted WAR turned out to be Travis Snider of Toronto, who was ranked the sixth-best prospect in 2009 but only managed to accumulate a cumulative WAR slightly above 0 in his first six seasons. Though the top 10 position-player prospects from this class feature names such as Jason Heyward and Mike Moustakas, the player among the top 10 who contributed the greatest WAR over his first six seasons was Buster Posey of San Francisco, who posted nearly 6 WAR a year. It is important to understand that the savings a player gives to his team based on his production does not indicate any “deserved” salary for that player. Instead, it merely indicates the amount of money the team would have had to spend in the free-agent market to acquire that exact same production. The top 10 position-player prospects from this prospect class turned out to be very productive for their respective teams, with a 70% chance of being either a starter or a star.

Position Prospects Ranked 11-20

	Bust	Contributor	Starter	Star
Count	5	2	1	2
Probability	50.00%	20.00%	10.00%	20.00%
WAR/Y	0.67	1.60	3.56	5.71
PV Savings/y (in millions)	3.21	8.36	19.10	30.90

Average WAR/Y for the group: 2.16

The next group is the 11-20 ranked position players. As perhaps expected, there are more busts in this group of ranked prospects. The variation in the small sample is spread through the rest of the categories. Giancarlo Stanton, the 16th-ranked prospect, and Andrew McCutchen, the 33rd-ranked prospect, turned out to be the two stars from the list. As the chart shows, the probability of getting a bust at this ranking of prospects is much higher than in the 1-10 rankings. The variance does show, however, that expected player outcomes can also be promising at this ranking level. There was an identical chance of a player becoming a star in this group compared to the first group, and a 50% chance of him being at least a contributor. In total, four of the top 20 prospects from 2009 turned out to be stars to this point in their careers, though not all have reached six full service years in the majors.

Position Prospects Ranked 21-100

	Bust	Contributor	Starter	Star
Count	12	7	3	1
Probability	38.71%	22.58%	9.68%	3.23%
WAR/Y	0.35	1.46	3.22	3.87
PV Savings/y (in millions)	1.48	7.52	17.18	20.80

The next group of charts shows the rest of the top 100 ranked position players. The chart shows there is much more potential for busts to be found in this ranking; however, we must keep in mind that the variance will be different in this group automatically because of the larger sample size than the first two groups. Nearly 40% of position players ranked 21-100 turned out to be busts. In addition, only Freddie Freeman of Atlanta managed to get above the 4+ dWAR/year threshold to qualify as a star. In fact, the most common category for these ranked position players is bust. When drafting a player, a team never knows for certain what the pick will produce in the major leagues, no matter where in the draft he is taken. In addition, prospect rankings based on minor-league performance are still not a completely accurate indicator of future MLB productivity. Higher-ranked prospects in 2009 did have a higher probability of contributing more to their major-league club, though rankings are understandably volatile. A variety of factors play into the volatile nature of prospect outcomes and the prospect risk premium. Part of the reason I chose to only look at position players is that they are traditionally safer from injury than pitchers, and therefore carry slightly less of a risk premium.

Looking at the variance of dWAR for the prospect group, the distribution is skewed left, which is to be expected because not all prospects will turn out to be equally strong, and most will not become stars. It also makes sense because in any given year, only a few top prospects will become very strong players, while most will hover around average. We also see that the interquartile range runs from about 0.5 dWAR per year to slightly above 2.5 dWAR per year. Therefore, it could be expected that a team would get production in that range from a given prospect ranked 1-100, varying slightly by rank group. A useful analysis would be to make a distribution chart of each rank group, but in the interest of brevity, I do not do that here.

New ways of evaluating both minor-league and amateur players to relieve some of the prospect-risk premium are useful, although risk will always be present. In the next part of this study, I will try to discover statistically significant correlations between college and major-league performance in order to reduce the noise of the prospect-risk premium. One of the great things about the baseball player-development structure is that it allows players with the right work ethic and dedication, as well as others who were overlooked in high rounds of the draft, to prove themselves in the minor leagues. That can seldom be said in other professional sports. The famous example of this was Mike Piazza, who was one of the last overall picks in his draft class and worked his way to a Hall of Fame career.

With perfect information, the graph would be perfectly skewed left, with each ranked prospect achieving a higher dWAR than the next ranked prospect. Some may attribute the imperfect-information dilemma to drafting or the evaluation of minor-league performance, and some may attribute it to differences in player-development systems. Some may also rationally say that both the players and the scouts are humans and will not be perfect. Prospect rankings for a given year are based on several factors, including a player’s proximity to contributing on the major-league level. The most talented minor-league players could be at a lower ranking in a given year because of their age or development level, which could cause some unwanted variance in the data. Looking at just the top 100 prospects helps somewhat eliminate this problem, but will not make the problem completely disappear. It is difficult to know when teams plan on calling up prospects anyway, and it really depends on the needs of the team. Some make the jump at 20, while others make the jump at 25, or even later.

This type of analysis could be useful for things like estimating the opportunity cost of a trade involving prospects, both for financial trade-offs and for present versus future on-field production. A lot of factors play into the success of a prospect. When evaluating any player, things such as makeup and work ethic are just as big of factors as measurable statistics. Evaluating college and high-school players for the annual Rule 4 draft can be especially difficult because of the limited statistical information that is accessible. Team scouts work very hard to accurately evaluate the top amateur players in the United States and around the world in order to put their team in a good position for the draft. Despite the immense baseball knowledge that scouts bring to player evaluation, statistical analysis on college players is still explored and used to complement traditional scouting reports. Prospect-risk premium will always be something teams must deal with, but efficiently allocating players into a major-league pipeline is essential for every front office.

There have been a few other articles on sites such as FanGraphs and The Hardball Times on statistical analysis of college players. Cubs president Theo Epstein told writer Tom Verducci that the Cubs analytics team has developed a specific algorithm for evaluating college players. The process involved sending interns to photocopy old stat sheets on college players from before the data was recorded electronically.

Though I do not doubt the Cubs have a very accurate and useful algorithm for such a goal, the algorithm is not publicly available for review, and understandably so. However, for the several articles which tackle this question on other baseball statistical websites, I think there is some room for improvement. First, the many different complex statistical techniques used to compare college and MLB statistics all yield about the same disappointing results, meaning that some of the models are probably unnecessarily complicated. Second, though the authors may imply it by default, statistical models in no way account for the character and makeup of a college player and prospect. Even in the age of advanced analytics, the human and leadership elements of the game still hold great value. Therefore, statistical rankings should not be taken as a precise recommended draft order. In addition, they do not take into account a player’s injury history and risk. Teams can increase their odds of adding a future starter or star over a player’s first six seasons by drafting position players, who have been historically shown to be safer bets than pitchers due to a lesser injury risk.

The model in this post attempts to find statistically significant correlations between a player’s college stats and his stats for his first six seasons in the MLB. As discussed earlier, six seasons is the amount of time a team has a drafted player under control before he reaches free agency and is granted negotiating powers with any team. However, the relationship between college batting statistics and MLB fWAR can only go so far because of the lack of fielding and other data for college players.

The first thing I did was merge databases of Division I college players for years 2002-2007 with their statistics for their first six years in the MLB. There is some noise in the model since some players in the MLB who were drafted in later years in my sample have not spent six years in the MLB, which is accounted for. I only look at the first 100 players drafted each year. I then calculate each player’s college career wOBA per the methods recommended by Victor Wang in his 2009 article on a similar topic. However, since wOBA weights are not recorded for college players, the statistic is more of an arbitrary wOBA that uses the weights from the 2013 MLB season. Since wOBA weights do not vary heavily from year to year, it will do the trick for the purpose of this analysis. For MLB players, wOBA has a 97% correlation with wRC and wRC+ (varying slightly with the size of the sample), so I did not feel it was necessary to calculate wRC in addition to wOBA. In fact, when using ordinary least squares and multiple least squares regression techniques, I would have experienced problems with pairwise collinearity, so calculating both statistics would have proved pointless. Along with an ordinary least squares regression technique, I also use multiple least squares and change the functional form to double logarithmic. (A future study I hope to tackle soon is to use logistic regression techniques to calculate the odds of a college player ending up in each of the four WAR groups for their first six seasons in the majors.)
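To make the "arbitrary wOBA" idea concrete, here is a minimal sketch of applying MLB linear weights to a college stat line. The weights are the 2013 values as I recall them from the FanGraphs Guts! page and should be treated as an assumption to verify; the function name, argument names, and the sample stat line are all hypothetical, and many college stat sheets do not record intentional walks, in which case `ibb` can be left at zero.

```python
# Assumed 2013 MLB linear weights; verify against the published table before reuse.
WEIGHTS_2013 = {"BB": 0.690, "HBP": 0.722, "1B": 0.888,
                "2B": 1.271, "3B": 1.616, "HR": 2.101}


def woba(ab, h, doubles, triples, hr, bb, hbp, sf, ibb=0, weights=WEIGHTS_2013):
    """wOBA from a counting-stat line using the supplied linear weights."""
    singles = h - doubles - triples - hr
    unintentional_bb = bb - ibb
    numerator = (weights["BB"] * unintentional_bb
                 + weights["HBP"] * hbp
                 + weights["1B"] * singles
                 + weights["2B"] * doubles
                 + weights["3B"] * triples
                 + weights["HR"] * hr)
    denominator = ab + bb - ibb + sf + hbp
    if denominator <= 0:
        raise ValueError("stat line has no qualifying plate appearances")
    return numerator / denominator


# Hypothetical college career line, invented purely for illustration.
career = woba(ab=600, h=210, doubles=45, triples=5, hr=30,
              bb=90, hbp=12, sf=8, ibb=10)
print(f"college career wOBA (MLB weights): {career:.3f}")
```

Because the weights come from a different run environment, the output is best read as a relative ranking tool among college hitters rather than a true wOBA.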

Due to the limitations in the data as well as the restrictions on the number of top-100 picks that actually make it to the MLB, the analysis is somewhat limited, yet still produces some valuable results. Interestingly, though perhaps unsurprisingly, my calculated wOBA for each player’s college career showed a strong and statistically significant relationship with wOBA produced in the MLB. To a lesser extent, college wOBA also indicates a statistically significant relationship with MLB-produced WAR, even though this study does not take into account defense, baserunning, etc. Looking at a collinearity matrix, I find that college wOBA and MLB wOBA have about a 25% pairwise collinearity. In addition, the matrix shows a similar pairwise collinearity of about 25% between college wOBA and MLB WAR, though at a lower level of confidence. Using an ordinary least squares regression, I use different functional forms to further evaluate the strength of the relationship between college and MLB statistics.

The first model confirms a fairly strong and statistically significant relationship at the 1% level between college and MLB wOBA with a correlation coefficient of about .25. College strikeout-to-walk ratio is also statistically significant at the 1% level, albeit without a strong correlation coefficient. Even so, the matrix indicates that players who are less prone to the strikeout in college, on average, see better success in the MLB. Interestingly enough, college wOBA and strikeout-to-walk ratio are about the only two statistically significant statistics that I can find by running several models with different functional forms. Per the model, we can also say that it is likely that college hitters with extra-base-hit ability have better prospects in the majors. The R-squared for model one is about .20, which is not terrible, but certainly not enough information to provide a set-in-stone model. The constant in the regressions seems to capture noise that is difficult to replicate, lending insight into the extreme variance and unpredictability of the draft.

For model 2, I use a double logarithmic functional form with a multiple least squares linear regression in order to see how MLB wOBA varies with college wOBA and strikeout-to-walk ratio. The results of this regression are slightly stronger and lend a bit more support to the conclusion that the calculated college wOBA is a strong predictor of MLB wOBA.

According to the results of the double log model, a one percent increase in MLB wOBA corresponds to about a 36% increase in college wOBA, all else equal. (Since the model is in double log form, the interpretation is done by percent and percentage points.) We can more simply interpret this to mean that a player, on average and all else equal, will have a one percent higher wOBA in MLB for every 36% increase in their college wOBA compared to other players. The coefficient is significant at the one percent level. In addition, a one percent increase in MLB wOBA corresponds to about a six percent decrease in college strikeout-to-walk ratio. Again, I get an R-squared of about 0.20.
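In a double-log model of the form ln(MLB wOBA) = a + b * ln(college wOBA) + ..., the slope b is an elasticity: the percent change in MLB wOBA associated with a one percent change in college wOBA, all else equal. The estimated coefficient itself is not restated here, so the sketch below only shows the arithmetic, using the "one percent per 36%" reading from the paragraph above as input. The function names are hypothetical.

```python
import math


def pct_change_in_y(beta, dx_pct):
    """Exact % change in y implied by ln(y) = a + beta * ln(x) for a dx_pct % change in x."""
    return ((1 + dx_pct / 100) ** beta - 1) * 100


def implied_elasticity(dy_pct, dx_pct):
    """Elasticity beta consistent with a dy_pct % change in y per dx_pct % change in x."""
    return math.log(1 + dy_pct / 100) / math.log(1 + dx_pct / 100)


# Reading from the text: +1% MLB wOBA goes with +36% college wOBA.
beta = implied_elasticity(dy_pct=1.0, dx_pct=36.0)
print(f"implied elasticity: {beta:.4f}")

# Sanity check: feeding the elasticity back in recovers the 1% figure.
print(f"MLB wOBA change for +36% college wOBA: {pct_change_in_y(beta, 36.0):.2f}%")

# For comparison, what a placeholder coefficient of 0.36 would mean instead.
print(f"with beta = 0.36, +1% college wOBA -> {pct_change_in_y(0.36, 1.0):.2f}% MLB wOBA")
```

The last line is only a contrast case: a coefficient of 0.36 read directly off a regression table would describe a much stronger relationship than the one-percent-per-36% reading, so it matters which of the two the output actually reports.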

Perhaps the most interesting thing that these regressions have shown is that college batting average has almost no correlation with MLB success. This may be a little misleading because hitters who get drafted in high rounds and who do well in the MLB will likely have high college batting averages, but the regressions show that there are other things teams should look for in their draft picks besides a good batting average. Traits such as low strikeout totals, especially relative to the number of walks, help indicate a player’s pure ability to get on base. When evaluating college players, factors such as character, work ethic and leadership abilities will be just as good indicators of success for strong college ball players. Perhaps the linear weights measurements used in wOBA calculations are on to something. Accurate weights can obviously not be applied to college statistics without the proper data, but the comparisons using MLB weights for college players can still be useful.

In addition, it is also well known that position players are traditionally safer higher-round picks than pitchers due to injury risk. I would argue that strong college hitters are oftentimes the most productive top prospects, while younger pitchers who can develop in a team’s player-development system can be beneficial for a strong farm system and pipeline to the major leagues. Many high-upside arms can be found coming out of high school, rather than among power college pitchers. In addition, arms from smaller schools are oftentimes overlooked due to the competitive environment they play in. Nevertheless, hidden and undervalued talent exists that could result in high-upside rewards, both financially and productively for teams.

Let’s Find the Giants 88 Wins

by Lance Brozdowski

December 1, 2017

We find ourselves in the midst of an exceptionally intriguing offseason. Rarely is there an opportunity to acquire a prior year’s MVP and remain in position to nab the number-two asset on the market: Shohei Ohtani. Given Ohtani’s decision to forego a contract that syncs up with his open-market value when he turns 25, he’ll hold a Black Friday-esque price-tag when posted. Virtually any team in baseball can make a play to acquire the former star from the Hokkaido Nippon-Ham Fighters, regardless of wallet size. That makes this particular campaign for a generational talent so intriguing.

Whether your team meets Ohtani’s duo of wants — independent of a passing grade on his questionnaire — is another story.

The San Francisco Giants are in a precarious position heading into 2018. Coming off a 64-win season, the lowest win total for their franchise since 1994, and the lowest of Bruce Bochy’s tenure by seven games, a rebound seems imminent. The current state of their roster, however, casts doubt on how relevant a rebound can make their team.

So, I sent out a tweet entertaining the possibility that one team lands the two biggest names of the offseason.

In the theoretical universe where the Giants land #Stanton and #Ohtani – what’s their win total in 2018? #MLB (64 Ws in ’17 for reference)

— Lance Brozdowski (@LanceBrozdow) November 19, 2017

A little bit of mental math brought my over/under to 87.5 wins. Imprecise? Sure, but only three times since 2014 has one team improved on their prior-year win total by 24 or more games: the Minnesota Twins (2016 to 2017, +26 wins), Arizona Diamondbacks (2016 to 2017, +24 wins), and Chicago Cubs (2014 to 2015, +25 wins). Whether a signal or mere noise, each of those improvements came without lavish acquisitions during winter (I used my subjective definition of “lavish”). Each was propelled to relevance by internal talent (Buxton/Sano, Ray/Godley, Arrieta/Bryant, etc.), superb management, and other favorable nods from the Baseball Gods. Each of the 29 responses to my poll came with three elements of consideration: Ohtani, Stanton, and everything else.

Ohtani

The pitching side of Ohtani’s value is interesting. ZiPS and Dan Szymborski were the first to throw their hat in the ring, giving Ohtani a 3.55 ERA over 139 innings of work, with 161 strikeouts, and a walk rate of 3.9 BB/9. It’s lukewarm, considering the hype around Ohtani and knowledge of his sub-1.1 WHIP over in the NPB. Do I agree with it? Not from a control standpoint, but we can work with it and my disagreement isn’t dismissal of a labor-intensive statistical model’s projection.

Taking the three essential components of FIP (walks, strikeouts, and homers), and our knowledge that pitcher fWAR is derived from FIP, we can backtrack from Ohtani’s ZiPS projection in an anti-statistician kind of way. By comparing Ohtani’s per-nine peripherals to last year’s performers, we can infer his fWAR might be around 3.0 as a pitcher in 2018 (139 IP, 10.4 K/9, 3.9 BB/9, 1.0 HR/9). This ZiPS and fWAR magic says he’ll be slightly worse than 2017 Brad Peacock (that was a weird sentence to write).
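For anyone who wants a number to go with that eyeball comparison, the per-nine rates above can be pushed through the standard FIP formula. This is only a sketch: the league constant is the 2017 value as I recall it (an assumption worth checking), hit batters are left out because the projection quoted above does not list them, and FIP is only one input to fWAR, so this does not reproduce the 3.0 fWAR estimate. The function name is hypothetical.

```python
# Assumed league FIP constant for 2017; verify against a published table before reuse.
FIP_CONSTANT_2017 = 3.158


def fip_from_rates(k9, bb9, hr9, constant, hbp9=0.0):
    """FIP computed from per-nine-inning rates.

    Equivalent to (13*HR + 3*(BB+HBP) - 2*K) / IP + constant, because
    dividing every per-nine rate by 9 gives the per-inning rate.
    """
    return (13 * hr9 + 3 * (bb9 + hbp9) - 2 * k9) / 9 + constant


# Rates from the ZiPS projection quoted above (HBP omitted, an assumption).
zips_fip = fip_from_rates(k9=10.4, bb9=3.9, hr9=1.0, constant=FIP_CONSTANT_2017)
print(f"FIP implied by the ZiPS peripherals: {zips_fip:.2f}")  # 3.59 with the assumed constant
```

With the assumed constant, the implied FIP sits very close to the projected 3.55 ERA, which is at least consistent with using FIP-based comparables to ballpark the fWAR.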

Ohtani’s potential 3.0 fWAR is backed up when you look at his 2016 in the NPB. The righty posted 137 1/3 innings of work, with a 9.2 K/9, 2.9 BB/9, and a HR/9 just north of 1.0. This gives Ohtani something slightly better than Jose Berrios’ 2.8 fWAR 2017 campaign (an equally weird sentence to write).

Value for Ohtani with his bat on the Giants, a team obviously absent of a DH, is where confusion starts.

I want to keep this as simple as possible. It’s unlikely that he goes to the NL if contributing significantly on the mound and in the box are his main goals. The inherent risk for the lottery-winning club would be too high and uncertainty around whether Ohtani would prefer such a role plays an equally large factor. Travis Sawchik breaks Ohtani’s NL hitting value down better than I ever could, so I’ll only give you the product of his analysis.

Ohtani could have about 1.6 fWAR as a hitter. This is composed of 1.1 fWAR in his standard pitcher plate appearances, plus another .5 fWAR from regular pinch-hitting chances (emphasis on the word “regular”).

In total, we have a 4.6 fWAR player in Shohei Ohtani in the National League. Our 3.0 fWAR on the mound and an aggressive — but feasible — 1.6 fWAR in the box.

To find 88 wins for the Giants that my poll responders believe in, we need to start somewhere. It’s too easy to begin at a projection already circulating for the Giants’ 2018 win total, so I’ll make this hard for myself to execute, and likely, for you to rationalize. Let’s start with those 64 hard-fought wins Bochy’s squad scratched and clawed their way to. We’ll work backwards from there.

64 wins, plus roughly five we’re going to attribute to Ohtani, brings us to 69.

Stanton

Now onto Stanton.

Eno Sarris, a familiar name to many, looked through the surplus value on a trade that would send Stanton to the Bay Area. The names included in that analysis revolve around the following:

To SF: Stanton, Dee Gordon

To MIA: Joe Panik, Tyler Beede, Chris Shaw

We don’t have confirmation this would be the package, but I remain adamant Miami wants contract relief more than anything. Centering an offer around the eight FanGraphs wins above replacement (fWAR) Panik has accumulated in his career feels like a proper balancing of sides, given how much money the Giants would take on in a scenario like this. Whether Stanton opts out or stays through the length of his contract muddies just how much money the Giants, or any team, will tie up through 2027. Although it seems like a risk teams are willing to take, how that opt-out risk factors into offerings is another confounding input.

However, Stanton’s value to teams from a performance standpoint is less cloudy than his monetary value. He’s good. Very good. Completing two 6-fWAR seasons before turning 28 is a desirable trait for any player. One of the first projections kicking around, FanGraphs’ Steamer, holds Stanton somewhat steady with his torrid 2017. It calls for 5.3 fWAR, buoyed by another 45+ homer season, and a wRC+ that holds up to his career standard. I have little objection to this, even if worry consumes you that a healthy season for Stanton was an anomaly.

Ohtani brought us to 69 wins and now Stanton will take us north of the only number above 15 anybody is ever excited to see. We’re at 74 for the Giants by taking WAR and interpreting them as literal wins, something I probably shouldn’t do given the debate the industry just had, but I’ll test my luck.

Everything else

This subheading encompasses a lot of assumptions. In my tweet asking my loyal followers to quickly gauge whether the Giants could get above the 87.5 wins, this considered everything from a (hopefully) full season of good Madison Bumgarner and paying a priest to rid the bad juju from the Giants’ clubhouse, to a minor investment in separate baseballs juiced specifically for AT&T Park.

We could venture another 1,000 words on the improvements of San Fran, but there are far more qualified Giants fans on this website and others (shoutout to Grant Brisbee at McCovey Chronicles) that have surely detailed this difference with more care and a deeper knowledge of the Giants’ issues and internal fixes.

Cutting to the chase, let’s make a simple push to the 88-win mark. FanGraphs’ depth-chart projections currently have the Giants as a 78-win team. That’s 14 wins better than 2017. It is also exactly what we need to go from 74 wins to 88.

Sometimes, things work out better than anybody could have ever planned.

We found our 88 wins.
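For the record, here is the back-of-the-napkin tally spelled out, using only the figures quoted in this post. The variable names are mine, and treating fWAR as literal wins (and stacking the depth-chart improvement on top of both acquisitions) is the same loose assumption made throughout.

```python
baseline_2017_wins = 64

ohtani_fwar = 3.0 + 1.6          # pitching estimate plus NL hitting estimate
stanton_fwar = 5.3               # Steamer projection quoted above
depth_chart_projection = 78      # FanGraphs depth-chart win total quoted above

after_ohtani = baseline_2017_wins + round(ohtani_fwar)              # 64 + 5 = 69
after_stanton = after_ohtani + round(stanton_fwar)                  # 69 + 5 = 74
everything_else = depth_chart_projection - baseline_2017_wins       # 78 - 64 = 14
total_wins = after_stanton + everything_else                        # 74 + 14 = 88

print(after_ohtani, after_stanton, everything_else, total_wins)
```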

The only thing I’m left wondering is whether my tweet and over/under projection at 87.5 inspired hopes of 90-plus-win seasons in voters’ minds. If Bochy & Co. can accomplish that feat without even one of Ohtani or Stanton, I commit to paying the shipping fee for Bochy’s Manager of the Year Award.

A version of this post can be found on my site, BigThreeSports.com, by following this link.

Who Are the Top “Pound-for-Pound” Power Hitters?

by djer2xa

December 1, 2017

We all know that Aaron Judge hit for more power this year than Jose Altuve. But, whose power was more impressive? Aaron Judge, who is 6’7 and 282 pounds, has a considerable size advantage over Jose Altuve, at 5’6 and 164 pounds. Perhaps Altuve is actually a better power hitter for his size than is Judge. Let’s expand this idea to the entire league: who is the pound-for-pound top power hitter?

Role of Height and Weight in Batter Power

Using simultaneous linear regression, I estimated the effects of two physical characteristics — height and weight — on batter power. Measures of batter height and weight were taken from MLB.com. For batter power, I used Isolated Power.

As shown in the figures below, weight and height have positive relationships with power.

Height and Weight

Weight has a stronger relationship with power than height, though it is difficult to see in the figures alone. (It’s also not intuitively clear exactly how height affects power.) In subsequent analyses, I consider both weight and height.

Who are the top pound-for-pound power hitters?

Using the model, one can predict a batter’s expected power (based on height and weight) and compare it to their actual power.
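As a rough illustration of that predicted-versus-actual comparison, the sketch below fits ISO on height and weight by ordinary least squares and ranks hitters by how far their actual ISO sits above the fitted value. It is not the exact model used for the rankings here: the helper names are hypothetical, the player rows are invented placeholders rather than real measurements, and height and weight are centered only to keep the arithmetic well behaved.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# (name, height in inches, weight in pounds, ISO): invented values for illustration only.
players = [
    ("Hitter A", 66, 165, 0.200),
    ("Hitter B", 70, 195, 0.280),
    ("Hitter C", 72, 190, 0.150),
    ("Hitter D", 73, 220, 0.210),
    ("Hitter E", 75, 225, 0.170),
    ("Hitter F", 79, 282, 0.340),
]

mean_h = sum(p[1] for p in players) / len(players)
mean_w = sum(p[2] for p in players) / len(players)
features = [(p[1] - mean_h, p[2] - mean_w) for p in players]
isos = [p[3] for p in players]

coef = fit_ols(features, isos)
expected = [coef[0] + coef[1] * fh + coef[2] * fw for fh, fw in features]
residuals = [actual - exp for actual, exp in zip(isos, expected)]

# Largest positive residual = most power relative to size.
ranking = sorted(zip((p[0] for p in players), residuals), key=lambda t: t[1], reverse=True)
for name, resid in ranking:
    print(f"{name}: {resid:+.3f} ISO vs. size-based expectation")
```

Ranking on the residual rather than on raw ISO is what lets a small hitter with modest raw power outrank a large hitter with more of it.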

Who are the top pound-for-pound power hitters? See below for the results.

Top 10 hitters

Khris Davis, formerly the #9 top power hitter, emerges as the #1 pound-for-pound power hitter in baseball. Davis, who is three inches and over 30 pounds below average for a Major League hitter, hit a remarkable 43 home runs in 2017, with an ISO of .281. Nolan Arenado and Josh Donaldson made similar jumps in the rankings, from #7 to #2, and #10 to #3, respectively.

Notable power hitters have fallen slightly on this list, though remain in the top 10. For example, Aaron Judge fell from the top spot to #8, while Giancarlo Stanton dropped three spots (#2 to #5). It is important to note here that these power hitters are still impressive – continuing to hold spots in the top 10, regardless of their size.

Biggest improvements in rankings

Which players showed the most improvement in the list? Below are results from the top 50 players on the list.

Top 3 improved rank players

Andrew Benintendi showed the largest increase in rankings (from 184 to 43). Jose Altuve nearly broke into the top 10, jumping from 132 to 12. Lastly, Eddie Rosario improved 68 spots (100 to 32). Altuve, in particular, has recently shown increases in power (from .146 to .194 to .202 in 2015-2017); as a result, his pound-for-pound status may continually increase in upcoming years.

Who was more impressive?

To reference the initial question in this article: was Jose Altuve’s or Aaron Judge’s power more impressive? Results from the above analyses were compiled from 2015 to 2017 seasons. To compare Altuve and Judge’s recent season, take a look below.

Altuve vs Judge

Aaron Judge tops Jose Altuve in the pound-for-pound hitter rankings, by a very thin margin, in 2017. Judge’s power performance exceeded expectations (as predicted by his height and weight) to a slightly higher degree than Altuve’s did.

Full Rankings

If you want to see the full list of hitters for this dataset, including the worst pound-for-pound power hitters (poor Jason Heyward!), click here.

Analysis


Alex Cobb Will Be One of the Gems of This Free Agent Class

by Matthew Mocarsky

November 30, 2017

Of the pitchers hitting the free-agent market this winter, Alex Cobb is not likely to receive the most fanfare.

Aces Yu Darvish and Jake Arrieta will command contracts north of $100 million. Closers Wade Davis and Greg Holland will do their best to secure four-year deals with big price tags. The whole world is watching every development in the Shohei Ohtani saga. Hell, among midmarket starting pitchers, MLB Trade Rumors predicts Lance Lynn to receive a more lucrative contract than Alex Cobb.

Cobb, who broke in as a full-time starter with Tampa Bay in 2012, has historically shown great promise and good-but-not-great results. He averaged 2.5 fWAR from 2012-2014, lost the next two seasons to Tommy John surgery, then came back with a 2.4 fWAR season in 2017. Cobb has never started 30 games in a season, nor has he ever thrown 200 innings. These facts are concerning to some, but I would argue that he is one of the wisest investments one can make this offseason.

Alex Cobb has evolved as a pitcher through pitch selection. Cobb has a great curveball. You either already know that, or you’re about to find out. He also mixes in a four-seam fastball, a splitter, and a sinker. Right now, curveballs are all the rage in baseball, resulting in tremendous success for pitchers like Rich Hill, Trevor Bauer, and Lance McCullers. They throw their curveballs so often that we can consider the breaking ball, not the fastball, to be their primary pitch. Like Hill, Bauer, and McCullers, Cobb has a quality breaking ball, so it stands to reason he should throw it more often and perhaps eschew his mediocre offerings. With Brooks Baseball, we can track the usage rate on each of his pitches throughout the season.

Look at the first couple data points for the usage rates on his pitches, and then compare them to his points at the end of the season. It’s clear that Cobb began to realize he works best by using the fastball and the curveball exclusively, so he increased his usage rate on those pitches and gradually phased out the splitter and sinker.

The question for Cobb is whether this was a good idea. In Cobb’s career, he’s only posted a strikeout-minus-walk percentage (K-BB%) above 15% twice, and only ever so slightly so. He’s not bad in that regard, but it’s not where he makes his bread and butter. Fortunately for Cobb, he is one of the better pitchers in the league at inducing ground balls, which we know is favorable contact. The more grounders Cobb induces, the better he gets, and his curveball is a ground-ball machine. Consider the correlation between the rate at which Cobb increased his curveball usage and his ground-ball rate (GB%) throughout the season:

source: imgur.com

That’s a pretty strong correlation. It seems that Cobb is ready to join the Hills, Bauers, and McCullerses of the world and ride a high breaking-ball-usage rate to breakout success. Of course, it’s never going to be that easy for Cobb or anybody, but let’s go through one of his starts and parse what we can from the good and bad.
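If you want to put a number on the usage-versus-ground-ball relationship charted above, a Pearson correlation over the monthly splits is the simplest option. The sketch below shows the calculation; the two lists are placeholder values invented for illustration and are NOT Cobb’s actual monthly figures, which would need to be pulled from Brooks Baseball and FanGraphs.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Placeholder monthly values (April through September), not real data.
curveball_usage = [0.25, 0.28, 0.30, 0.33, 0.36, 0.40]
ground_ball_rate = [0.44, 0.45, 0.47, 0.46, 0.50, 0.52]

r = pearson(curveball_usage, ground_ball_rate)
print(f"correlation between curveball usage and GB%: {r:.2f}")
```

With only six monthly points, even a high coefficient should be read as suggestive rather than conclusive.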

On September 4, Cobb pitched against a red-hot Minnesota Twins lineup and had one of his better starts of the season. His first batter of the game was second-half monster and fly-ball connoisseur Brian Dozier, and he managed to get him out on the first pitch.

It’s been proven that batters from the “fly-ball revolution” can be neutralized if you throw them high fastballs. These hitters are swinging up to lift the ball, but it’s difficult to put much lift on a high pitch coming in fast.

We’re going to focus on the curveball throughout this piece, but here is a fun fact about his fastball. Cobb’s heater sits at 92 MPH and had a spin rate of 2101 RPM this season, which seems pretty pedestrian. However, among starting pitchers with at least 100 batted-ball events involving fastballs, Cobb’s fastball has the 31st-lowest exit velocity (87.1 MPH). To put this in perspective, that’s a better mark than James Paxton, Chris Sale, Max Scherzer, Jon Gray, Justin Verlander, and Luis Severino.

Cobb was smart to bait Dozier here, and he reaped the benefits with a first-pitch out to begin the ballgame.

In the second inning, we see Cobb pitching out of the stretch and unleashing a curveball that Ehire Adrianza buries into the ground. This will be the common theme today.

I mentioned earlier that Cobb doesn’t have the K-BB% of Chris Sale or Corey Kluber, so every once in a while he walks batters. The common thought is that Cobb, who throws so many breaking balls, might end up behind in the count thanks to misplaced curves. Then, to get back in the count, he throws his 93 MPH fastball in the zone, which gets crushed by every hitter expecting it.

This would be a bad habit for Cobb to fall into, but he certainly didn’t in 2017. Consider the list of pitchers who threw the most curveballs while behind in the count this season (via Baseball Savant):

source: imgur.com

There’s Cobb, in fifth place, not far behind Rich Hill himself. All five of these guys have great curveballs, so it makes sense for them to Trust the Process™ and continue dropping the hammer rather than submitting to doom and throwing a predictable fastball in the zone.

After walking the leadoff batter to start the third inning, Cobb knew Joe Mauer could make him pay. So rather than giving Mauer the fastball he wanted, Cobb began the at-bat by dropping a curveball for a strike that even froze the great Mauer.

This changed the whole at-bat, because now Mauer didn’t know whether Cobb would be coming at him with the curve or the fastball. Cobb took advantage of his opportunity, used the fastball to get him in an ideal 1-2 count, and then went back to the curveball and got Mauer to ground into a double play.

Cobb is comfortable throwing the curveball both behind in the count and with runners on base, so he can reap the rewards and induce quite a few double plays. That is an asset. Additionally, Cobb is comfortable throwing his curve from both the stretch (as we saw against Adrianza and Mauer) and from his big windup, as you can see here.

Eddie Rosario is a good hitter who made great strides late in the season, but even he found himself to be another ground-ball victim of Cobb’s curveball.

By the fifth inning, Cobb was almost through his second time against the Twins’ batting order. At this point, they weren’t sure whether to expect the curveball or the fastball, so Cobb was often ahead in the count. Here, he has Eduardo Escobar in a 1-2 count and throws a high fastball that Escobar swings right through.

Everyone in the park was expecting Cobb to throw the curveball to finish Escobar off. From a look at Escobar’s swing, it’s safe to say he was expecting a curveball himself. Cobb’s fastball isn’t necessarily anything special, but the way he uses it to pitch off the curveball can be.

With two outs in the inning, Cobb faced his 18th batter (which would complete his second time through against the opposing batting order). He quickly got Ehire Adrianza into an 0-2 count and then unleashed his best curveball of the night, which Adrianza pounded into the ground for another easy out.

At this point, Cobb had gone through the opposing order twice, pitched five innings, and only given up one run. Teams around the league are beginning to realize that most of their starters simply shouldn’t go out for the third time through the order, even if they are rolling. The Houston Astros just rode tandems of Lance McCullers, Brad Peacock, and Charlie Morton all the way to the World Series. Those three guys are valuable pieces, and if Cobb is utilized like this, so is he.

Unfortunately for Cobb, his pitch count was at 85, so his manager decided to bring him out for another inning. The Twins got their third look at Cobb, and I don’t need to cite the statistics to you about what happens at this point. Hitters are smart, so they can pick up on the tendencies of a pitcher if they see him so many times. Alex Cobb, as great as he was through five innings and two times through the order, is no exception to this rule.

Here is Joe Mauer taking an 0-2 curveball from Cobb and driving it into the gap in center for a double.

The important question here is, “was that Cobb’s fault or just a good piece of hitting from Joe Mauer?” Of course, the answer in baseball is always going to be both, but you can see in the embedded GIF that Cobb doesn’t necessarily leave the pitch up. In fact, if you compare it to the curveball that Cobb threw earlier in the game to get Mauer to ground into a double play, it doesn’t look much different — maybe an inch or two higher, at worst. The bigger change is Mauer, who swings like a guy fighting to stay alive in the first GIF, then like he knew exactly what was coming and how to handle it in the second.

This is the “third time through the order” effect in a microcosm. Pitches that fool batters earlier in the game become cookies, so the key is to relieve your pitcher while his pitches still fool the batters. We should not penalize Cobb for giving up a double to Mauer there; in 2018, analytical teams will be bringing in a new pitcher in these situations.

In this sense, Cobb is the first free-agent test case for the newest pitching trend in the industry — the tandem starter — one who pitches twice through the order, hopefully gets 15-18 outs, and then gives way to someone else. The Mets, who hired progressive Indians pitching coach Mickey Callaway to be their new manager, have made it clear that all starters not named deGrom or Syndergaard will be shielded from facing lineups more than twice in a game. Baseball has never experienced a shortage of five-inning pitchers in its history, but these changes in pitcher usage are leading to new premiums for these specialists.

It’s as simple as this: every team wants to stock their pitching staff with Alex Cobbs. To be clear, every team wants a Justin Verlander, but there is only one Justin Verlander; even horses Chris Sale and Corey Kluber showed significant wear and tear in October. To combat this dilemma, the Houston Astros deployed Lance McCullers, Brad Peacock, and Charlie Morton in five-inning tandems and rode them all the way to the last out of Game 7.

I expect Alex Cobb will fit into this role quite nicely for whichever team he signs with.
